localaik - Awesome Go Library for Artificial Intelligence

Local compatibility proxy for the Gemini and OpenAI APIs. Run one container locally and test both SDK protocol shapes on the same port without API keys or network access.
Detailed Description of localaik
localaik
A local compatibility server for the Gemini and OpenAI APIs. Run one container, point your SDK at http://localhost:8090, and get both protocol shapes on the same port for tests and development.
Motivation
Testing code that calls Gemini or OpenAI is painful: real API calls are slow, cost money, and need network access. localaik gives you a single Docker container that speaks both protocols, backed by a local model: no API key, no internet, deterministic enough for CI.
Architecture
```
┌──────────────────────────────────────────────────────────┐
│ localaik container                                       │
│                                                          │
│  ┌──────────────────────────┐    ┌──────────────────┐    │
│  │ localaik proxy (:8090)   │    │ llama.cpp (:8080)│    │
│  │                          │    │                  │    │
│  │ /v1beta/* (Gemini) ──────┼──▶ │ Gemma 3 model    │    │
│  │ /v1/*     (OpenAI) ──────┼──▶ │                  │    │
│  │                          │    └──────────────────┘    │
│  │                          │                            │
│  │                          │    ┌──────────────────┐    │
│  │ PDF uploads ─────────────┼──▶ │ pdftoppm         │    │
│  │                          │    │ PDF ──▶ images   │    │
│  └──────────────────────────┘    └──────────────────┘    │
└──────────────────────────────────────────────────────────┘
```
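SDK requests hit the localaik proxy. Gemini requests are translated into OpenAI-style chat completions, OpenAI chat-completions requests are forwarded, and both go to the local llama.cpp server running a Gemma 3 model. PDF inputs are first rendered to page images with pdftoppm.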
Quick start
```sh
docker run -d -p 8090:8090 gokhalh/localaik
```
Or with Docker Compose:
```yaml
services:
  localaik:
    image: gokhalh/localaik
    ports:
      - "8090:8090"
```
localaik is a plain HTTP server, so any language or SDK that can set a base URL will work.
More runnable samples (curl, Go, Python, JavaScript, Java) live under examples/.
Gemini SDK
Go:
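```go
ctx := context.Background()
client, err := genai.NewClient(ctx, &genai.ClientConfig{
	APIKey:      "test", // placeholder; localaik does not need a real key
	HTTPOptions: genai.HTTPOptions{BaseURL: "http://localhost:8090"},
})
if err != nil {
	log.Fatal(err)
}
```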
Python:
```python
from google import genai

client = genai.Client(
    api_key="test",
    http_options=genai.types.HttpOptions(api_version="v1beta", base_url="http://localhost:8090"),
)
```
Or set the environment variable for any language:
```sh
export GOOGLE_GEMINI_BASE_URL=http://localhost:8090
```
OpenAI SDK
Python:
```python
from openai import OpenAI

client = OpenAI(api_key="test", base_url="http://localhost:8090/v1")
```
Go:
```go
client := openai.NewClient(
	option.WithAPIKey("test"),
	option.WithBaseURL("http://localhost:8090/v1"),
)
```
Docker tags
| Tag | Model | Image size |
|---|---|---|
| latest, gemma3-4b | Gemma 3 4B Q4_K_M | ~3 GB |
| gemma3-12b | Gemma 3 12B Q4_K_M | ~7 GB |
Version-pinned tags follow the pattern v0.1.1-gemma3-4b, v0.1.1-gemma3-12b.
Tuning (v0.1.3 onwards)
Pass environment variables to tune the underlying model server:
```sh
docker run -d -p 8090:8090 \
  -e LK_THREADS=8 \
  -e LK_CTX_SIZE=4096 \
  -e LK_FLASH_ATTN=1 \
  -e LK_CONT_BATCHING=1 \
  -e LK_PARALLEL=2 \
  gokhalh/localaik
```
Or with Docker Compose:
```yaml
services:
  localaik:
    image: gokhalh/localaik
    ports:
      - "8090:8090"
    environment:
      LK_THREADS: 8
      LK_CTX_SIZE: 4096
      LK_FLASH_ATTN: 1
      LK_CONT_BATCHING: 1
      LK_PARALLEL: 2
```
| Variable | Default | Description |
|---|---|---|
| LK_CTX_SIZE | 8192 | Context window in tokens |
| LK_THREADS | auto | CPU threads for inference |
| LK_THREADS_BATCH | same as LK_THREADS | CPU threads for prompt processing |
| LK_BATCH_SIZE | 2048 | Prompt processing batch size |
| LK_UBATCH_SIZE | 512 | Micro-batch size |
| LK_GPU_LAYERS | 0 | Layers offloaded to GPU (99 = all) |
| LK_PARALLEL | 1 | Max concurrent request slots |
| LK_FLASH_ATTN | 0 (off) | Flash attention (1 to enable) |
| LK_CONT_BATCHING | 0 (off) | Continuous batching (1 to enable) |
| LK_MLOCK | 0 (off) | Lock model in RAM (1 to enable) |
Implemented routes
| Route | Used by | Notes |
|---|---|---|
| POST /v1beta/models/{model}:generateContent | Gemini GenerateContent | Translated to upstream chat completions |
| POST /v1beta/models/{model}:streamGenerateContent | Gemini GenerateContentStream | Gemini-style SSE (typically ?alt=sse) |
| POST /v1/chat/completions | OpenAI chat completions | Forwarded to upstream |
| GET /health | Health checks | Custom route |
All other API routes return 404.
Tested SDKs
Automated contract tests validate against:
| SDK | Version |
|---|---|
| google.golang.org/genai | v1.51.0 |
| github.com/openai/openai-go/v3 | v3.30.0 |
Other SDK versions and languages may work if they emit the same HTTP shapes.
Use in CI
Run localaik as a GitHub Actions service container so your tests hit a real local model instead of mocks:
```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      localaik:
        image: gokhalh/localaik
        ports:
          - 8090:8090
        options: >-
          --health-cmd "curl -f http://localhost:8090/health"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 30
    steps:
      - uses: actions/checkout@v4
      - run: go test ./...
        env:
          GOOGLE_GEMINI_BASE_URL: http://localhost:8090
          OPENAI_BASE_URL: http://localhost:8090/v1
```
Gemini compatibility
Supported features:
- Text, image (inlineData), and PDF input (auto-converted to page images)
- fileData for image URLs and local/data:-URI PDF/text files
- systemInstruction
- generationConfig: temperature, topP, topK, candidateCount, maxOutputTokens, stopSequences, responseLogprobs, logprobs, presencePenalty, frequencyPenalty, seed
- Structured output via responseMimeType, responseSchema, responseJsonSchema
- Function declarations via tools, function calling config via toolConfig
- functionCall and functionResponse parts
- Streaming SSE responses
- Usage metadata and finish reasons
Partial support:
- Behavior of top_k, n, logprobs, and tool choice depends on the upstream runtime
- executableCode, codeExecutionResult, toolCall, and toolResponse parts are preserved as text context
Not supported:
- SDK methods outside GenerateContent/GenerateContentStream
- Non-function tools (Google Search, Maps, URL context, code execution)
- Embeddings, token counting, cached content, live/bidi sessions, uploads
OpenAI compatibility
Supported: text chat completions, structured output, vision inputs, tool-related fields (all passed through to upstream).
Not supported: Responses API, Assistants, Embeddings, Images, Audio, Files, Vector stores.
Development
Tip: Run make docker-up to build and start the localaik container, which includes a local llama.cpp server with a bundled model. This is the easiest way to get a working upstream for development.
```sh
# Run the proxy locally (requires a running llama.cpp server)
go run ./cmd/localaik --port 8090 --upstream http://127.0.0.1:8080/v1

# Common commands
make help              # Show all targets
make lint              # Format check + go vet
make test-unit         # Unit tests
make test-integration  # Integration tests (requires docker-up)
make test              # All of the above
make docker-up         # Build and start container
make docker-down       # Stop container
```
Building the image
```sh
# Default (Gemma 3 4B)
docker build -t gokhalh/localaik .

# Custom model
docker build \
  --build-arg MODEL_URL=... \
  --build-arg MODEL_SHA256=... \
  --build-arg MMPROJ_URL=... \
  --build-arg MMPROJ_SHA256=... \
  -t gokhalh/localaik:custom .
```
Limitations
- Intended for tests and development, not production
- Image size is dominated by model weights
- Cold starts can take tens of seconds while the model loads
- PDF rendering adds latency per page