šŸ“š localaik - Awesome Go Library for Artificial Intelligence

Go Gopher mascot for localaik

Local compatibility proxy for the Gemini and OpenAI APIs. Run one container locally and test both SDK protocol shapes on the same port without API keys or network access

šŸ·ļø Artificial Intelligence
šŸ“‚ Artificial Intelligence
⭐ 0 stars
View on GitHub šŸ”—

Detailed Description of localaik

localaik

CI Docker Hub License: MIT Go Report Card Go Version Go Reference

A local compatibility server for the Gemini and OpenAI APIs. Run one container, point your SDK at http://localhost:8090, and get both protocol shapes on the same port for tests and development.

Motivation

Testing code that calls Gemini or OpenAI is painful: real API calls are slow, cost money, and need network access. localaik gives you a single Docker container that speaks both protocols backed by a local model — no API key, no internet, deterministic enough for CI.

Architecture

ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
│  localaik container                                    │
│                                                        │
│  ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”    ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”  │
│  │  localaik proxy (:8090)  │    │ llama.cpp (:8080)│  │
│  │                          │    │                  │  │
│  │  /v1beta/* (Gemini)  ────┼──▶ │  Gemma 3 model   │  │
│  │  /v1/*     (OpenAI)  ────┼──▶ │                  │  │
│  │                          │    ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜  │
│  │                          │                          │
│  │                          │    ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”  │
│  │  PDF uploads ────────────┼──▶ │    pdftoppm      │  │
│  │                          │    │  PDF ─▶ images   │  │
│  ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜    ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜  │
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜

SDK requests hit the localaik proxy, which translates Gemini or OpenAI wire format and forwards to the local llama.cpp server running a Gemma 3 model.

Quick start

docker run -d -p 8090:8090 gokhalh/localaik

Or with Docker Compose:

services:
  localaik:
    image: gokhalh/localaik
    ports:
      - "8090:8090"

localaik is a plain HTTP server, so any language or SDK that can set a base URL will work.

More runnable samples (curl, Go, Python, JavaScript, Java) live under examples/.

Gemini SDK

Go:

client, err := genai.NewClient(ctx, &genai.ClientConfig{
    APIKey:      "test",
    HTTPOptions: genai.HTTPOptions{BaseURL: "http://localhost:8090"},
})

Python:

from google import genai

client = genai.Client(
    api_key="test",
    http_options=genai.types.HttpOptions(api_version="v1beta", base_url="http://localhost:8090"),
)

Or set the environment variable for any language:

export GOOGLE_GEMINI_BASE_URL=http://localhost:8090

OpenAI SDK

Python:

from openai import OpenAI

client = OpenAI(api_key="test", base_url="http://localhost:8090/v1")

Go:

client := openai.NewClient(
    option.WithAPIKey("test"),
    option.WithBaseURL("http://localhost:8090/v1"),
)

Docker tags

TagModelImage size
latest, gemma3-4bGemma 3 4B Q4_K_M~3 GB
gemma3-12bGemma 3 12B Q4_K_M~7 GB

Version-pinned tags follow the pattern v0.1.1-gemma3-4b, v0.1.1-gemma3-12b.

Tuning (v0.1.3 onwards)

Pass environment variables to tune the underlying model server:

docker run -d -p 8090:8090 \
  -e LK_THREADS=8 \
  -e LK_CTX_SIZE=4096 \
  -e LK_FLASH_ATTN=1 \
  -e LK_CONT_BATCHING=1 \
  -e LK_PARALLEL=2 \
  gokhalh/localaik

Or with Docker Compose:

services:
  localaik:
    image: gokhalh/localaik
    ports:
      - "8090:8090"
    environment:
      LK_THREADS: 8
      LK_CTX_SIZE: 4096
      LK_FLASH_ATTN: 1
      LK_CONT_BATCHING: 1
      LK_PARALLEL: 2
VariableDefaultDescription
LK_CTX_SIZE8192Context window in tokens
LK_THREADSautoCPU threads for inference
LK_THREADS_BATCHsame as threadsCPU threads for prompt processing
LK_BATCH_SIZE2048Prompt processing batch size
LK_UBATCH_SIZE512Micro-batch size
LK_GPU_LAYERS0Layers offloaded to GPU (99 = all)
LK_PARALLEL1Max concurrent request slots
LK_FLASH_ATTN0 (off)Flash attention (1 to enable)
LK_CONT_BATCHING0 (off)Continuous batching (1 to enable)
LK_MLOCK0 (off)Lock model in RAM (1 to enable)

Implemented routes

RouteUsed byNotes
POST /v1beta/models/{model}:generateContentGemini GenerateContentTranslated to upstream chat completions
POST /v1beta/models/{model}:streamGenerateContentGemini GenerateContentStreamGemini-style SSE (typically ?alt=sse)
POST /v1/chat/completionsOpenAI chat completionsForwarded to upstream
GET /healthHealth checksCustom route

All other API routes return 404.

Tested SDKs

Automated contract tests validate against:

  • google.golang.org/genai v1.51.0
  • github.com/openai/openai-go/v3 v3.30.0

Other SDK versions and languages may work if they emit the same HTTP shapes.

Use in CI

Run localaik as a GitHub Actions service container so your tests hit a real local model instead of mocks:

jobs:
  test:
    runs-on: ubuntu-latest
    services:
      localaik:
        image: gokhalh/localaik
        ports:
          - 8090:8090
        options: >-
          --health-cmd "curl -f http://localhost:8090/health"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 30
    steps:
      - uses: actions/checkout@v4
      - run: go test ./...
        env:
          GOOGLE_GEMINI_BASE_URL: http://localhost:8090
          OPENAI_BASE_URL: http://localhost:8090/v1

Gemini compatibility

Supported features:

  • Text, image (inlineData), and PDF input (auto-converted to page images)
  • fileData for image URLs and local/data:-URI PDF/text files
  • systemInstruction
  • generationConfig: temperature, topP, topK, candidateCount, maxOutputTokens, stopSequences, responseLogprobs, logprobs, presencePenalty, frequencyPenalty, seed
  • Structured output via responseMimeType, responseSchema, responseJsonSchema
  • Function declarations via tools, function calling config via toolConfig
  • functionCall and functionResponse parts
  • Streaming SSE responses
  • Usage metadata and finish reasons

Partial support:

  • top_k, n, logprobs, and tool choice behavior depends on the upstream runtime
  • executableCode, codeExecutionResult, toolCall, toolResponse parts preserved as text context

Not supported:

  • SDK methods outside GenerateContent / GenerateContentStream
  • Non-function tools (Google Search, Maps, URL context, code execution)
  • Embeddings, token counting, cached content, live/bidi sessions, uploads

OpenAI compatibility

Supported: text chat completions, structured output, vision inputs, tool-related fields (all passed through to upstream).

Not supported: Responses API, Assistants, Embeddings, Images, Audio, Files, Vector stores.

Development

Tip: Run make docker-up to build and start the localaik container, which includes a local llama.cpp server with a bundled model. This is the easiest way to get a working upstream for development.

# Run the proxy locally (requires a running llama.cpp server)
go run ./cmd/localaik --port 8090 --upstream http://127.0.0.1:8080/v1

# Common commands
make help              # Show all targets
make lint              # Format check + go vet
make test-unit         # Unit tests
make test-integration  # Integration tests (requires docker-up)
make test              # All of the above
make docker-up         # Build and start container
make docker-down       # Stop container

Building the image

# Default (Gemma 3 4B)
docker build -t gokhalh/localaik .

# Custom model
docker build \
  --build-arg MODEL_URL=... \
  --build-arg MODEL_SHA256=... \
  --build-arg MMPROJ_URL=... \
  --build-arg MMPROJ_SHA256=... \
  -t gokhalh/localaik:custom .

Limitations

  • Intended for tests and development, not production
  • Image size is dominated by model weights
  • Cold starts can take tens of seconds while the model loads
  • PDF rendering adds latency per page