🚀 NoKV — Not Only KV Store

High-performance distributed KV storage based on LSM Tree
NoKV stands for Not Only KV Store. It is a Go-native storage system that starts as a serious standalone engine and grows into a multi-Raft distributed KV cluster without changing its underlying data plane.
The interesting part is not just that it has WAL, LSM, MVCC, Redis compatibility, or Raft. The interesting part is that these pieces are built as one system: a single storage substrate that can be embedded locally, migrated into a seeded distributed node, and then expanded into a replicated cluster with an explicit protocol.
NoKV is not trying to be "yet another KV". It is trying to make the path from standalone storage to distributed replication coherent, inspectable, and testable.
✨ Why NoKV
- **Standalone to Cluster** – Start with an embedded engine, keep the same workdir, then migrate into a distributed seed and expand into a replicated region.
- **Correctness First** – Mode gates, logical region snapshots, local recovery metadata, and a clean split between execution plane and control plane keep lifecycle semantics explicit.
- **Tested as a System** – The project is validated with migration flow tests, restart recovery, Coordinator degradation, transport chaos, context propagation, and publish-boundary failpoints.
🚦 Quick Start
Start an end-to-end playground with either the local script or Docker Compose. Both spin up a three-node Raft cluster with a Coordinator service and expose the Redis-compatible gateway.

```bash
# Option A: local processes
./scripts/dev/cluster.sh --config ./raft_config.example.json

# In another shell: launch the Redis gateway on top of the running cluster
go run ./cmd/nokv-redis \
  --addr 127.0.0.1:6380 \
  --raft-config ./raft_config.example.json \
  --metrics-addr 127.0.0.1:9100

# Option B: Docker Compose (cluster + gateway + Coordinator)
docker compose up --build

# Tear down
docker compose down -v
```
Once the cluster is running you can point any Redis client at 127.0.0.1:6380 (or the address exposed by Compose).
For quick CLI checks:
```bash
# Online stats from a running node
go run ./cmd/nokv stats --expvar http://127.0.0.1:9100

# Offline forensics from a stopped node workdir
go run ./cmd/nokv stats --workdir ./artifacts/cluster/store-1
```
Minimal embedded snippet:
```go
package main

import (
	"fmt"
	"log"

	NoKV "github.com/feichai0017/NoKV"
)

func main() {
	opt := NoKV.NewDefaultOptions()
	opt.WorkDir = "./workdir-demo"

	db, err := NoKV.Open(opt)
	if err != nil {
		log.Fatalf("open failed: %v", err)
	}
	defer db.Close()

	key := []byte("hello")
	if err := db.Set(key, []byte("world")); err != nil {
		log.Fatalf("set failed: %v", err)
	}

	entry, err := db.Get(key)
	if err != nil {
		log.Fatalf("get failed: %v", err)
	}
	fmt.Printf("value=%s\n", entry.Value)
}
```
Note:
- `DB.Get` returns detached entries (do not call `DecrRef`). `DB.GetInternalEntry` returns borrowed entries; callers must call `DecrRef` exactly once.
- `DB.SetWithTTL` accepts a `time.Duration` (relative TTL).
- `DB.Set` / `DB.SetBatch` / `DB.SetWithTTL` reject `nil` values; use `DB.Del` or `DB.DeleteRange(start, end)` for deletes.
- `DB.NewIterator` exposes user-facing entries, while `DB.NewInternalIterator` scans raw internal keys (`cf + user_key + ts`).
ℹ️ `scripts/dev/cluster.sh` rebuilds `nokv` and `nokv-config`, seeds local peer catalogs via `nokv-config catalog`, starts the Coordinator (`nokv coordinator`), streams Coordinator/store logs to the current terminal, and also writes them under `artifacts/cluster/store-<id>/server.log` and `artifacts/cluster/coordinator.log`. Use `Ctrl+C` to exit cleanly; if the process crashes, wipe the workdir (`rm -rf ./artifacts/cluster`) before restarting to avoid WAL replay errors.
🧭 Topology & Configuration
Everything hangs off a single file: `raft_config.example.json`.
```jsonc
"coordinator": { "addr": "127.0.0.1:2379", "docker_addr": "nokv-coordinator:2379" },
"stores": [
  { "store_id": 1, "listen_addr": "127.0.0.1:20170", ... },
  { "store_id": 2, "listen_addr": "127.0.0.1:20171", ... },
  { "store_id": 3, "listen_addr": "127.0.0.1:20172", ... }
],
"regions": [
  { "id": 1, "range": [-inf, "m"), "peers": 101/201/301, "leader": store 1 },
  { "id": 2, "range": ["m", +inf), "peers": 102/202/302, "leader": store 2 }
]
```
- Local scripts (`scripts/dev/cluster.sh`, `scripts/dev/serve-store.sh`, `scripts/dev/bootstrap.sh`) ingest the same JSON, so local runs match production layouts.
- Docker Compose mounts the file into each container; manifests, transports, and the Redis gateway all stay in sync.
- Need more stores or regions? Update the JSON and re-run the script/Compose; no code changes required.
- Programmatic access: import `github.com/feichai0017/NoKV/config` and call `config.LoadFile` / `Validate` for a single source of truth across tools.
🧬 Tech Stack Snapshot
| Layer | Tech/Package | Why it matters |
|---|---|---|
| Storage Core | lsm/, wal/, vlog/ | Hybrid log-structured design with manifest-backed durability and value separation. |
| Concurrency | percolator/, raftstore/client | Distributed 2PC, lock management, and MVCC version semantics in raft mode. |
| Replication | raftstore/* + coordinator/* | Multi-Raft data plane plus Coordinator-backed control plane (routing, TSO, heartbeats). |
| Tooling | cmd/nokv, cmd/nokv-config, cmd/nokv-redis | CLI, config helper, Redis-compatible gateway share the same topology file. |
| Observability | stats, hotring, expvar | Built-in metrics, hot-key analytics, and crash recovery traces. |
🧱 Architecture Overview
```mermaid
%%{init: {
  "themeVariables": { "fontSize": "17px" },
  "flowchart": { "nodeSpacing": 42, "rankSpacing": 58, "curve": "basis" }
}}%%
flowchart TD
    App["App / CLI / Redis Client"]

    subgraph Standalone["Standalone Shape"]
        Embedded["Embedded NoKV DB API"]
    end

    subgraph Distributed["Distributed Shape"]
        Gateway["NoKV RPC / Redis Gateway"]
        Client["raftstore/client"]
        Coordinator["Coordinator<br/>route / tso / heartbeats"]
        Server["Node Server"]
        Store["Store runtime root"]
        Peer["Peer runtime"]
        Admin["RaftAdmin<br/>execution plane"]
        Meta["raftstore/localmeta<br/>local recovery metadata"]
        RaftEngine["raftstore/engine<br/>raft durable state"]
        Snap["logical region snapshot"]
    end

    subgraph DataPlane["Shared Storage Core"]
        DB["NoKV DB"]
        WAL["WAL"]
        LSM["LSM + SST"]
        VLog["ValueLog"]
        MVCC["Percolator / MVCC"]
        Manifest["Manifest"]
    end

    subgraph Migration["Standalone → Cluster Bridge"]
        Plan["migrate plan"]
        Init["migrate init"]
        Seed["seeded workdir"]
        Expand["expand / remove-peer / transfer-leader"]
    end

    App --> Embedded
    App --> Gateway
    Gateway --> Client
    Client --> Coordinator
    Client --> Server
    Server --> Store
    Store --> Peer
    Store --> Admin
    Store --> Meta
    Peer --> RaftEngine
    Peer --> Snap
    Embedded --> DB
    Peer --> DB
    Snap --> DB
    DB --> WAL
    DB --> LSM
    DB --> VLog
    DB --> MVCC
    DB --> Manifest
    Embedded -.same data plane.- DB
    Plan --> Init
    Init --> Seed
    Seed --> Server
    Seed --> Expand
```
What makes this layout distinctive:
- One storage core, two deployment shapes – embedded mode and raft mode both sit on the same `DB` substrate instead of splitting into separate engines.
- Migration is a protocol, not a dump/import hack – `plan → init → seeded → expand` turns an existing standalone workdir into a replicated cluster path with explicit lifecycle state.
- Execution plane and control plane are split on purpose – `RaftAdmin` executes leader-side membership changes, while `Coordinator` stays responsible for routing, allocation, timestamps, and cluster view.
- Recovery metadata is not mixed with engine metadata – manifest, local recovery catalog, raft durable state, and logical region snapshots each have distinct ownership.
Key ideas:
- Durability path – WAL first, memtable second. ValueLog writes occur before WAL append so crash replay can fully rebuild state.
- Metadata – manifest stores SST topology, WAL checkpoints, and vlog head/deletion metadata.
- Background workers – the flush manager handles `Prepare → Build → Install → Release`, compaction reduces level overlap, and value log GC rewrites segments based on discard stats.
- Distributed transactions – Percolator 2PC runs in raft mode; embedded mode exposes non-transactional DB APIs.
Dive deeper in docs/architecture.md.
📊 CI Benchmark Snapshot
Benchmarks matter here, but they are not the whole story. NoKV is trying to be fast and structurally coherent: durability, migration, control-plane separation, and recovery semantics come first.
The latest public benchmark snapshot checked into the repository was taken from the most recent successful `main` CI YCSB run available at the time of update (run #23701742757). It used the then-current benchmark profile: workloads A-F, records=1,000,000, ops=1,000,000, value_size=1000, value_threshold=2048, conc=16. Methodology and harness details live in benchmark/README.md.
| Engine | Workload | Mode | Ops/s | Avg Latency | P95 | P99 |
|---|---|---|---|---|---|---|
| NoKV | YCSB-A | 50/50 read/update | 175,905 | 5.684µs | 204.039µs | 307.851µs |
| NoKV | YCSB-B | 95/5 read/update | 525,631 | 1.902µs | 24.115µs | 750.413µs |
| NoKV | YCSB-C | 100% read | 409,136 | 2.444µs | 15.077µs | 25.658µs |
| NoKV | YCSB-D | 95% read, 5% insert (latest) | 632,031 | 1.582µs | 21.811µs | 638.457µs |
| NoKV | YCSB-E | 95% scan, 5% insert | 45,620 | 21.92µs | 139.449µs | 9.203945ms |
| NoKV | YCSB-F | read-modify-write | 157,732 | 6.339µs | 232.743µs | 371.209µs |
| Badger | YCSB-A | 50/50 read/update | 108,232 | 9.239µs | 285.74µs | 483.139µs |
| Badger | YCSB-B | 95/5 read/update | 188,893 | 5.294µs | 274.549µs | 566.042µs |
| Badger | YCSB-C | 100% read | 242,463 | 4.124µs | 36.549µs | 1.862803ms |
| Badger | YCSB-D | 95% read, 5% insert (latest) | 284,205 | 3.518µs | 233.414µs | 479.801µs |
| Badger | YCSB-E | 95% scan, 5% insert | 15,027 | 66.547µs | 4.064653ms | 7.534558ms |
| Badger | YCSB-F | read-modify-write | 84,601 | 11.82µs | 407.624µs | 645.491µs |
| Pebble | YCSB-A | 50/50 read/update | 169,792 | 5.889µs | 491.322µs | 1.65907ms |
| Pebble | YCSB-B | 95/5 read/update | 137,483 | 7.273µs | 658.763µs | 1.415039ms |
| Pebble | YCSB-C | 100% read | 90,474 | 11.052µs | 878.733µs | 1.817526ms |
| Pebble | YCSB-D | 95% read, 5% insert (latest) | 198,139 | 5.046µs | 491.515µs | 1.282231ms |
| Pebble | YCSB-E | 95% scan, 5% insert | 40,793 | 24.513µs | 1.332974ms | 2.301008ms |
| Pebble | YCSB-F | read-modify-write | 122,192 | 8.183µs | 760.934µs | 1.71655ms |
🧩 Module Breakdown
| Module | Responsibilities | Source | Docs |
|---|---|---|---|
| WAL | Append-only segments with CRC, rotation, replay (wal.Manager). | wal/ | WAL internals |
| LSM | MemTable, flush pipeline, leveled compactions, iterator merging. | lsm/ | Memtable Flush pipeline Cache Range filter |
| Manifest | VersionEdit log + CURRENT handling, WAL/vlog checkpoints, value-log metadata. | manifest/ | Manifest semantics |
| ValueLog | Large value storage, GC, discard stats integration. | vlog.go, vlog/ | Value log design |
| Percolator | Distributed MVCC 2PC primitives (prewrite/commit/rollback/resolve/status). | percolator/ | Percolator transactions |
| RaftStore | Multi-Raft Region management, hooks, metrics, transport. | raftstore/ | RaftStore overview |
| HotRing | Hot key tracking, throttling helpers. | hotring/ | HotRing overview |
| Observability | Periodic stats, hot key tracking, CLI integration. | stats.go, cmd/nokv | Stats & observability CLI reference |
| Filesystem | Pebble-inspired vfs abstraction + mmap-backed file helpers shared by SST/vlog, WAL, and manifest. | vfs/, file/ | VFS File abstractions |
Each module has a dedicated document under docs/ describing APIs, diagrams, and recovery notes.
📡 Observability & CLI
`Stats.StartStats` publishes metrics via `expvar` (flush backlog, WAL segments, value log GC stats, raft/region/cache/hot metrics). `cmd/nokv` gives you:
- `nokv stats --workdir <dir> [--json] [--no-region-metrics]`
- `nokv manifest --workdir <dir>`
- `nokv regions --workdir <dir> [--json]`
- `nokv vlog --workdir <dir>`
`hotring` continuously surfaces hot keys in stats + CLI so you can pre-warm caches or debug skewed workloads.
More in docs/cli.md and docs/testing.md.
🔌 Redis Gateway
- `cmd/nokv-redis` exposes a RESP-compatible endpoint. In embedded mode (`--workdir`) commands execute through regular DB APIs; in distributed mode (`--raft-config`) calls are routed through `raftstore/client` and committed with TwoPhaseCommit.
- In raft mode, TTL is persisted directly in each value entry (`expires_at`) through the same 2PC write path as the value payload.
- `--metrics-addr` exposes Redis gateway metrics under `NoKV.Stats.redis` via expvar. In raft mode, `--coordinator-addr` can override `config.coordinator` when you need a non-default Coordinator endpoint.
- A ready-to-use cluster configuration is available at `raft_config.example.json`, matching both `scripts/dev/cluster.sh` and the Docker Compose setup.
For the complete command matrix, configuration and deployment guides, see docs/nokv-redis.md.
📄 License
Apache-2.0. See LICENSE.