🚀 NoKV — Not Only KV Store


Not Only KV Store • LSM Tree • ValueLog • MVCC • Multi-Raft Regions • Redis-Compatible

NoKV stands for Not Only KV Store. It is a Go-native storage system that starts as a serious standalone engine and grows into a multi-Raft distributed KV cluster without changing its underlying data plane.

The interesting part is not just that it has WAL, LSM, MVCC, Redis compatibility, or Raft. The interesting part is that these pieces are built as one system: a single storage substrate that can be embedded locally, migrated into a seeded distributed node, and then expanded into a replicated cluster with an explicit protocol.

NoKV is not trying to be "yet another KV". It is trying to make the path from standalone storage to distributed replication coherent, inspectable, and testable.

✨ Why NoKV

  • Standalone to Cluster
    Start with an embedded engine, keep the same workdir, then migrate into a distributed seed and expand into a replicated region.

  • Correctness First
    Mode gates, logical region snapshots, local recovery metadata, and a clean split between execution plane and control plane keep lifecycle semantics explicit.

  • Tested as a System
    The project is validated with migration flow tests, restart recovery, Coordinator degradation, transport chaos, context propagation, and publish-boundary failpoints.

🚦 Quick Start

Start an end-to-end playground with either the local script or Docker Compose. Both spin up a three-node Raft cluster with a Coordinator service; the Redis-compatible gateway runs as part of Compose or, for local processes, in a separate shell.


```bash
# Option A: local processes
./scripts/dev/cluster.sh --config ./raft_config.example.json
# In another shell: launch the Redis gateway on top of the running cluster
go run ./cmd/nokv-redis \
  --addr 127.0.0.1:6380 \
  --raft-config ./raft_config.example.json \
  --metrics-addr 127.0.0.1:9100

# Option B: Docker Compose (cluster + gateway + Coordinator)
docker compose up --build
# Tear down
docker compose down -v
```

Once the cluster is running you can point any Redis client at 127.0.0.1:6380 (or the address exposed by Compose).

For quick CLI checks:

```bash
# Online stats from a running node
go run ./cmd/nokv stats --expvar http://127.0.0.1:9100

# Offline forensics from a stopped node workdir
go run ./cmd/nokv stats --workdir ./artifacts/cluster/store-1
```

Minimal embedded snippet:

```go
package main

import (
	"fmt"
	"log"

	NoKV "github.com/feichai0017/NoKV"
)

func main() {
	opt := NoKV.NewDefaultOptions()
	opt.WorkDir = "./workdir-demo"

	db, err := NoKV.Open(opt)
	if err != nil {
		log.Fatalf("open failed: %v", err)
	}
	defer db.Close()

	key := []byte("hello")
	if err := db.Set(key, []byte("world")); err != nil {
		log.Fatalf("set failed: %v", err)
	}

	entry, err := db.Get(key)
	if err != nil {
		log.Fatalf("get failed: %v", err)
	}
	fmt.Printf("value=%s\n", entry.Value)
}
```

Note:

  • DB.Get returns detached entries (do not call DecrRef).
  • DB.GetInternalEntry returns borrowed entries and callers must call DecrRef exactly once.
  • DB.SetWithTTL accepts time.Duration (relative TTL). DB.Set/DB.SetBatch/DB.SetWithTTL reject nil values; use DB.Del or DB.DeleteRange(start,end) for deletes.
  • DB.NewIterator exposes user-facing entries, while DB.NewInternalIterator scans raw internal keys (cf+user_key+ts).

ℹ️ scripts/dev/cluster.sh rebuilds nokv and nokv-config, seeds local peer catalogs via nokv-config catalog, starts Coordinator (nokv coordinator), streams Coordinator/store logs to the current terminal, and also writes them under artifacts/cluster/store-<id>/server.log and artifacts/cluster/coordinator.log. Use Ctrl+C to exit cleanly; if the process crashes, wipe the workdir (rm -rf ./artifacts/cluster) before restarting to avoid WAL replay errors.


🧭 Topology & Configuration

Everything hangs off a single file: raft_config.example.json.

"coordinator": { "addr": "127.0.0.1:2379", "docker_addr": "nokv-coordinator:2379" },
"stores": [
  { "store_id": 1, "listen_addr": "127.0.0.1:20170", ... },
  { "store_id": 2, "listen_addr": "127.0.0.1:20171", ... },
  { "store_id": 3, "listen_addr": "127.0.0.1:20172", ... }
],
"regions": [
  { "id": 1, "range": [-inf,"m"), peers: 101/201/301, leader: store 1 },
  { "id": 2, "range": ["m",+inf), peers: 102/202/302, leader: store 2 }
]
  • Local scripts (scripts/dev/cluster.sh, scripts/dev/serve-store.sh, scripts/dev/bootstrap.sh) ingest the same JSON, so local runs match production layouts.
  • Docker Compose mounts the file into each container; manifests, transports, and Redis gateway all stay in sync.
  • Need more stores or regions? Update the JSON and re-run the script/Compose—no code changes required.
  • Programmatic access: import github.com/feichai0017/NoKV/config and call config.LoadFile / Validate for a single source of truth across tools.

🧬 Tech Stack Snapshot

| Layer | Tech/Package | Why it matters |
| --- | --- | --- |
| Storage Core | `lsm/`, `wal/`, `vlog/` | Hybrid log-structured design with manifest-backed durability and value separation. |
| Concurrency | `percolator/`, `raftstore/client` | Distributed 2PC, lock management, and MVCC version semantics in raft mode. |
| Replication | `raftstore/*` + `coordinator/*` | Multi-Raft data plane plus Coordinator-backed control plane (routing, TSO, heartbeats). |
| Tooling | `cmd/nokv`, `cmd/nokv-config`, `cmd/nokv-redis` | CLI, config helper, and Redis-compatible gateway share the same topology file. |
| Observability | `stats`, `hotring`, `expvar` | Built-in metrics, hot-key analytics, and crash recovery traces. |

🧱 Architecture Overview

```mermaid
%%{init: {
  "themeVariables": { "fontSize": "17px" },
  "flowchart": { "nodeSpacing": 42, "rankSpacing": 58, "curve": "basis" }
}}%%
flowchart TD
    App["App / CLI / Redis Client"]

    subgraph Standalone["Standalone Shape"]
        Embedded["Embedded NoKV DB API"]
    end

    subgraph Distributed["Distributed Shape"]
        Gateway["NoKV RPC / Redis Gateway"]
        Client["raftstore/client"]
        Coordinator["Coordinator<br/>route / tso / heartbeats"]
        Server["Node Server"]
        Store["Store runtime root"]
        Peer["Peer runtime"]
        Admin["RaftAdmin<br/>execution plane"]
        Meta["raftstore/localmeta<br/>local recovery metadata"]
        RaftEngine["raftstore/engine<br/>raft durable state"]
        Snap["logical region snapshot"]
    end

    subgraph DataPlane["Shared Storage Core"]
        DB["NoKV DB"]
        WAL["WAL"]
        LSM["LSM + SST"]
        VLog["ValueLog"]
        MVCC["Percolator / MVCC"]
        Manifest["Manifest"]
    end

    subgraph Migration["Standalone → Cluster Bridge"]
        Plan["migrate plan"]
        Init["migrate init"]
        Seed["seeded workdir"]
        Expand["expand / remove-peer / transfer-leader"]
    end

    App --> Embedded
    App --> Gateway
    Gateway --> Client
    Client --> Coordinator
    Client --> Server
    Server --> Store
    Store --> Peer
    Store --> Admin
    Store --> Meta
    Peer --> RaftEngine
    Peer --> Snap
    Embedded --> DB
    Peer --> DB
    Snap --> DB
    DB --> WAL
    DB --> LSM
    DB --> VLog
    DB --> MVCC
    DB --> Manifest
    Embedded -.same data plane.- DB
    Plan --> Init
    Init --> Seed
    Seed --> Server
    Seed --> Expand
```
What makes this layout distinctive:

  • One storage core, two deployment shapes – embedded mode and raft mode both sit on the same DB substrate instead of splitting into separate engines.
  • Migration is a protocol, not a dump/import hack – plan → init → seeded → expand turns an existing standalone workdir into a replicated cluster path with explicit lifecycle state.
  • Execution plane and control plane are split on purpose – RaftAdmin executes leader-side membership changes, while Coordinator stays responsible for routing, allocation, timestamps, and cluster view.
  • Recovery metadata is not mixed with engine metadata – manifest, local recovery catalog, raft durable state, and logical region snapshots each have distinct ownership.

Key ideas:

  • Durability path – WAL first, memtable second. ValueLog writes occur before WAL append so crash replay can fully rebuild state.
  • Metadata – manifest stores SST topology, WAL checkpoints, and vlog head/deletion metadata.
  • Background workers – flush manager handles Prepare → Build → Install → Release, compaction reduces level overlap, and value log GC rewrites segments based on discard stats.
  • Distributed transactions – Percolator 2PC runs in raft mode; embedded mode exposes non-transactional DB APIs.

Dive deeper in docs/architecture.md.


📊 CI Benchmark Snapshot

Benchmarks matter here, but they are not the whole story. NoKV is trying to be fast and structurally coherent: durability, migration, control-plane separation, and recovery semantics come first.

This is the latest public benchmark snapshot checked into the repository, taken from the most recent successful main CI YCSB run at the time of update (run #23701742757). The run used the then-current benchmark profile: workloads A–F, records=1,000,000, ops=1,000,000, value_size=1000, value_threshold=2048, conc=16.

Methodology and harness details live in benchmark/README.md.

| Engine | Workload | Mode | Ops/s | Avg Latency | P95 | P99 |
| --- | --- | --- | --- | --- | --- | --- |
| NoKV | YCSB-A | 50/50 read/update | 175,905 | 5.684µs | 204.039µs | 307.851µs |
| NoKV | YCSB-B | 95/5 read/update | 525,631 | 1.902µs | 24.115µs | 750.413µs |
| NoKV | YCSB-C | 100% read | 409,136 | 2.444µs | 15.077µs | 25.658µs |
| NoKV | YCSB-D | 95% read, 5% insert (latest) | 632,031 | 1.582µs | 21.811µs | 638.457µs |
| NoKV | YCSB-E | 95% scan, 5% insert | 45,620 | 21.92µs | 139.449µs | 9.203945ms |
| NoKV | YCSB-F | read-modify-write | 157,732 | 6.339µs | 232.743µs | 371.209µs |
| Badger | YCSB-A | 50/50 read/update | 108,232 | 9.239µs | 285.74µs | 483.139µs |
| Badger | YCSB-B | 95/5 read/update | 188,893 | 5.294µs | 274.549µs | 566.042µs |
| Badger | YCSB-C | 100% read | 242,463 | 4.124µs | 36.549µs | 1.862803ms |
| Badger | YCSB-D | 95% read, 5% insert (latest) | 284,205 | 3.518µs | 233.414µs | 479.801µs |
| Badger | YCSB-E | 95% scan, 5% insert | 15,027 | 66.547µs | 4.064653ms | 7.534558ms |
| Badger | YCSB-F | read-modify-write | 84,601 | 11.82µs | 407.624µs | 645.491µs |
| Pebble | YCSB-A | 50/50 read/update | 169,792 | 5.889µs | 491.322µs | 1.65907ms |
| Pebble | YCSB-B | 95/5 read/update | 137,483 | 7.273µs | 658.763µs | 1.415039ms |
| Pebble | YCSB-C | 100% read | 90,474 | 11.052µs | 878.733µs | 1.817526ms |
| Pebble | YCSB-D | 95% read, 5% insert (latest) | 198,139 | 5.046µs | 491.515µs | 1.282231ms |
| Pebble | YCSB-E | 95% scan, 5% insert | 40,793 | 24.513µs | 1.332974ms | 2.301008ms |
| Pebble | YCSB-F | read-modify-write | 122,192 | 8.183µs | 760.934µs | 1.71655ms |

🧩 Module Breakdown

| Module | Responsibilities | Source | Docs |
| --- | --- | --- | --- |
| WAL | Append-only segments with CRC, rotation, replay (`wal.Manager`). | `wal/` | WAL internals |
| LSM | MemTable, flush pipeline, leveled compactions, iterator merging. | `lsm/` | Memtable, Flush pipeline, Cache, Range filter |
| Manifest | VersionEdit log + CURRENT handling, WAL/vlog checkpoints, value-log metadata. | `manifest/` | Manifest semantics |
| ValueLog | Large value storage, GC, discard stats integration. | `vlog.go`, `vlog/` | Value log design |
| Percolator | Distributed MVCC 2PC primitives (prewrite/commit/rollback/resolve/status). | `percolator/` | Percolator transactions |
| RaftStore | Multi-Raft Region management, hooks, metrics, transport. | `raftstore/` | RaftStore overview |
| HotRing | Hot key tracking, throttling helpers. | `hotring/` | HotRing overview |
| Observability | Periodic stats, hot key tracking, CLI integration. | `stats.go`, `cmd/nokv` | Stats & observability, CLI reference |
| Filesystem | Pebble-inspired `vfs` abstraction + mmap-backed file helpers shared by SST/vlog, WAL, and manifest. | `vfs/`, `file/` | VFS, File abstractions |

Each module has a dedicated document under docs/ describing APIs, diagrams, and recovery notes.


📡 Observability & CLI

  • Stats.StartStats publishes metrics via expvar (flush backlog, WAL segments, value log GC stats, raft/region/cache/hot metrics).
  • cmd/nokv gives you:
    • nokv stats --workdir <dir> [--json] [--no-region-metrics]
    • nokv manifest --workdir <dir>
    • nokv regions --workdir <dir> [--json]
    • nokv vlog --workdir <dir>
  • hotring continuously surfaces hot keys in stats + CLI so you can pre-warm caches or debug skewed workloads.

More in docs/cli.md and docs/testing.md.


🔌 Redis Gateway

  • cmd/nokv-redis exposes a RESP-compatible endpoint. In embedded mode (--workdir) commands execute through regular DB APIs; in distributed mode (--raft-config) calls are routed through raftstore/client and committed with TwoPhaseCommit.
  • In raft mode, TTL is persisted directly in each value entry (expires_at) through the same 2PC write path as the value payload.
  • --metrics-addr exposes Redis gateway metrics under NoKV.Stats.redis via expvar. In raft mode, --coordinator-addr can override config.coordinator when you need a non-default Coordinator endpoint.
  • A ready-to-use cluster configuration is available at raft_config.example.json, matching both scripts/dev/cluster.sh and the Docker Compose setup.

For the complete command matrix, configuration and deployment guides, see docs/nokv-redis.md.


📄 License

Apache-2.0. See LICENSE.