goffi - Awesome Go Library for Miscellaneous


Pure Go FFI with libffi-style typed call interface and structured error handling for calling C libraries without CGO


Detailed Description of goffi

goffi: Zero-CGO FFI for Go


Pure Go Foreign Function Interface for calling C libraries without CGO. Designed for WebGPU and GPU computing: zero C dependencies, zero per-call allocations, 88–114 ns overhead.

Deep dive: How We Call C Libraries Without a C Compiler (architecture, assembly, callbacks, and ecosystem).

// Load library, prepare once, call many times; no CGO required
handle, _ := ffi.LoadLibrary("wgpu_native.dll")
sym, _ := ffi.GetSymbol(handle, "wgpuCreateInstance")

cif := &types.CallInterface{}
ffi.PrepareCallInterface(cif, types.DefaultCall, returnType, argTypes)
ffi.CallFunction(cif, sym, unsafe.Pointer(&result), args)

Features

| Feature | Summary | Details |
|---|---|---|
| Zero CGO | Pure Go | No C compiler needed. go get and build. |
| Fast | 88–114 ns/op | Pre-computed CIF, zero per-call allocations |
| Cross-platform | 7 targets | Windows, Linux, macOS, FreeBSD × AMD64 + ARM64 |
| Callbacks | C→Go safe | crosscall2 integration, works from any C thread |
| Type-safe | Runtime validation | 5 typed error types with errors.As() support |
| Struct passing | Full ABI | ≤8B (RAX), 9–16B (RAX+RDX), >16B (sret) |
| Context | Timeouts | CallFunctionContext(ctx, ...) cancellation |
| Tested | 89% coverage | CI on Linux, Windows, macOS |
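
The struct-passing thresholds in the table can be read as a simple size-based rule. The sketch below only illustrates that rule for integer-class structs; `classifyStructReturn` and the `returnClass` names are hypothetical and not part of goffi. The full System V classification also looks at field types (floating-point fields travel in XMM registers), which this sketch ignores.

```go
package main

import "fmt"

// returnClass names where a struct return value travels, following the
// size thresholds listed in the feature table.
type returnClass string

const (
	inRAX    returnClass = "RAX"
	inRAXRDX returnClass = "RAX+RDX"
	viaSret  returnClass = "sret"
)

// classifyStructReturn is a hypothetical helper, not goffi API.
// It assumes an integer-class struct and looks only at its size in bytes.
func classifyStructReturn(size uintptr) returnClass {
	switch {
	case size <= 8:
		return inRAX
	case size <= 16:
		return inRAXRDX
	default:
		// The caller allocates a buffer and passes a hidden pointer to it.
		return viaSret
	}
}

func main() {
	for _, size := range []uintptr{4, 8, 12, 16, 24} {
		fmt.Printf("%2d bytes -> %s\n", size, classifyStructReturn(size))
	}
}
```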

Quick Start

Installation

go get github.com/go-webgpu/goffi

Requirements

goffi requires CGO_ENABLED=0. This is automatic when no C compiler is installed or when cross-compiling. If gcc or clang is installed, set it explicitly:

CGO_ENABLED=0 go build ./...

Why? goffi uses Go's cgo_import_dynamic for dynamic library loading, which only activates when CGO is disabled.

Example: Calling strlen

package main

import (
	"fmt"
	"runtime"
	"unsafe"

	"github.com/go-webgpu/goffi/ffi"
	"github.com/go-webgpu/goffi/types"
)

func main() {
	// Load platform-specific C library
	libName := "libc.so.6"
	if runtime.GOOS == "windows" {
		libName = "msvcrt.dll"
	}

	handle, err := ffi.LoadLibrary(libName)
	if err != nil {
		panic(err)
	}
	defer ffi.FreeLibrary(handle)

	strlen, err := ffi.GetSymbol(handle, "strlen")
	if err != nil {
		panic(err)
	}

	// Prepare call interface once; reuse for all subsequent calls
	cif := &types.CallInterface{}
	err = ffi.PrepareCallInterface(
		cif,
		types.DefaultCall,                                     // auto-detects platform ABI
		types.UInt64TypeDescriptor,                            // return: size_t
		[]*types.TypeDescriptor{types.PointerTypeDescriptor},  // arg: const char*
	)
	if err != nil {
		panic(err)
	}

	// Call strlen. Each avalue element is a pointer TO an argument value, not the value itself
	testStr := "Hello, goffi!\x00"
	strPtr := uintptr(unsafe.Pointer(unsafe.StringData(testStr)))
	var length uint64

	err = ffi.CallFunction(cif, strlen, unsafe.Pointer(&length), []unsafe.Pointer{unsafe.Pointer(&strPtr)})
	if err != nil {
		panic(err)
	}

	fmt.Printf("strlen(%q) = %d\n", testStr[:len(testStr)-1], length)
	// Output: strlen("Hello, goffi!") = 13
}

Performance

FFI overhead: 88–114 ns/op (Windows AMD64, Intel i7-1255U)

| Benchmark | Time | Allocations |
|---|---|---|
| Empty function (getpid) | 88 ns | 2 allocs |
| Integer argument (abs) | 114 ns | 3 allocs |
| String processing (strlen) | 98 ns | 3 allocs |

At 60 FPS with ~50 FFI calls per frame, overhead is 5 µs per frame, or 0.03% of the 16.6 ms budget. Unmeasurable in profiling.

See docs/PERFORMANCE.md for detailed analysis, optimization strategies, and when NOT to use goffi.


Architecture

goffi transitions from Go's managed runtime to C code through three layers:

Go Code
  │  ffi.CallFunction()
  ▼
runtime.cgocall               ← Go runtime: system stack switch, GC coordination
  │
  ▼
Assembly Wrapper              ← Hand-written: load GP/SSE registers per ABI
  │  CALL target_function
  ▼
C Function                    ← External library

Three ABIs, hand-written assembly for each:

| ABI | GP Registers | FP Registers | Notes |
|---|---|---|---|
| System V AMD64 | RDI, RSI, RDX, RCX, R8, R9 | XMM0–XMM7 | Linux, macOS, FreeBSD |
| Win64 | RCX, RDX, R8, R9 | XMM0–XMM3 | 32-byte shadow space mandatory |
| AAPCS64 | X0–X7 | D0–D7 | HFA support for ARM64 |
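
To make the register table concrete, the sketch below maps integer or pointer arguments to their locations under each ABI. It is a simplified illustration, not goffi's implementation: `assignIntArgs` and the ABI keys are hypothetical, only integer-class arguments are modeled, and stack slots are shown as plain indices. One difference it glosses over is that Win64 assigns registers by argument position across integer and floating-point arguments, while System V and AAPCS64 count the two classes separately.

```go
package main

import "fmt"

// gpRegs lists the integer argument registers from the ABI table.
var gpRegs = map[string][]string{
	"sysv":    {"RDI", "RSI", "RDX", "RCX", "R8", "R9"},
	"win64":   {"RCX", "RDX", "R8", "R9"},
	"aapcs64": {"X0", "X1", "X2", "X3", "X4", "X5", "X6", "X7"},
}

// assignIntArgs is a hypothetical helper. It reports where each of n
// integer or pointer arguments would be placed; arguments beyond the
// register set spill to the stack.
func assignIntArgs(abi string, n int) ([]string, error) {
	regs, ok := gpRegs[abi]
	if !ok {
		return nil, fmt.Errorf("unknown ABI %q", abi)
	}
	out := make([]string, n)
	for i := range out {
		if i < len(regs) {
			out[i] = regs[i]
		} else {
			out[i] = fmt.Sprintf("stack[%d]", i-len(regs))
		}
	}
	return out, nil
}

func main() {
	for _, abi := range []string{"sysv", "win64", "aapcs64"} {
		locs, err := assignIntArgs(abi, 7)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-8s %v\n", abi, locs)
	}
}
```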

See docs/ARCHITECTURE.md for the full technical deep dive.


Callbacks (C → Go)

WebGPU fires async callbacks from internal Metal/Vulkan threads. These threads have no goroutine, so calling Go directly would crash.

goffi uses crosscall2 for safe C→Go transitions from any thread:

cb := ffi.NewCallback(func(status uint32, adapter uintptr, msg uintptr, ud uintptr) {
    // Safe even when called from a C thread
    result.handle = adapter
    close(done)
})

ffi.CallFunction(cif, wgpuRequestAdapter, nil, args)
<-done // Wait for GPU driver callback

2000 pre-compiled trampoline entries per process. AMD64: 5 bytes/entry. ARM64: 8 bytes/entry.


Error Handling

Five typed error types for precise diagnostics:

handle, err := ffi.LoadLibrary("nonexistent.dll")
if err != nil {
	var libErr *ffi.LibraryError
	if errors.As(err, &libErr) {
		fmt.Printf("Failed to %s %q: %v\n", libErr.Operation, libErr.Name, libErr.Err)
	}
}

| Error Type | When |
|---|---|
| InvalidCallInterfaceError | CIF preparation failures |
| LibraryError | Library loading / symbol lookup |
| CallingConventionError | Unsupported calling convention |
| TypeValidationError | Invalid type descriptor |
| UnsupportedPlatformError | Platform not supported |

Comparison: goffi vs purego vs CGO

| Feature | goffi | purego | CGO |
|---|---|---|---|
| C compiler required | No | No | Yes |
| API style | libffi-like (prepare once, call many) | reflect-based (RegisterFunc) | Native |
| Per-call allocations | Zero (CIF reusable) | reflect + sync.Pool per call | Zero |
| Struct pass/return | Full (RAX+RDX, sret) | Partial (no Windows structs) | Full |
| Callback float returns | XMM0 in asm | Not supported (panic) | Full |
| ARM64 HFA detection | Recursive (nested structs) | Partial (bug in nested path) | Full |
| Typed errors | 5 types + errors.As() | Generic | N/A |
| Context support | Timeouts/cancellation | No | No |
| C-thread callbacks | crosscall2 | crosscall2 | Full |
| String/bool/slice args | Raw pointers only | Auto-marshaling | Full |
| Platform breadth | 7 targets | 8 GOARCH / 20+ OS×ARCH | All |
| AMD64 overhead | 88–114 ns | Not published | ~140 ns (Go 1.26 claims ~30% reduction) |

Choose goffi for GPU/real-time workloads: struct passing, zero per-call overhead, callback float returns, typed errors.

Choose purego for general-purpose bindings: string auto-marshaling, broad architecture support, less boilerplate.

See also: JupiterRider/ffi, a pure Go binding for libffi via purego. Supports struct pass/return and variadic functions; requires libffi at runtime.


Known Limitations

Windows: C++ exceptions may crash the program (#12516)

  • Go runtime limitation, not goffi-specific. Go 1.22+ added partial SEH support (#58542), but edge cases remain.
  • Workaround: build native libraries with panic=abort.

Windows: float return values not captured from XMM0

  • syscall.SyscallN returns RAX only. Go syscall package limitation.

Variadic functions not supported (printf, sprintf)

  • Use non-variadic wrappers. Planned for v0.5.0.

Struct packing follows System V ABI only

  • Windows #pragma pack not honored. Manually specify Size/Alignment in TypeDescriptor.
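
When a packed layout has to be specified by hand, the size and alignment can be worked out from the field sizes. The sketch below is a generic layout calculator, not goffi code: `layout` and `roundUp` are hypothetical helpers, and it assumes every field is a scalar whose natural alignment equals its size. The `pack` argument caps each field's alignment in the way `#pragma pack(n)` does.

```go
package main

import "fmt"

// roundUp rounds n up to the next multiple of a (a must be non-zero).
func roundUp(n, a uintptr) uintptr { return (n + a - 1) / a * a }

// layout computes the total size and alignment of a struct made of scalar
// fields. Field sizes must be non-zero. pack caps each field's alignment;
// pass 0 for the natural, unpacked layout.
func layout(fieldSizes []uintptr, pack uintptr) (size, align uintptr) {
	align = 1
	for _, fieldSize := range fieldSizes {
		a := fieldSize
		if pack != 0 && a > pack {
			a = pack
		}
		if a > align {
			align = a
		}
		size = roundUp(size, a) // insert padding before the field
		size += fieldSize
	}
	return roundUp(size, align), align // tail padding
}

func main() {
	// Example: struct { char a; int32_t b; int16_t c; }
	fields := []uintptr{1, 4, 2}

	size, align := layout(fields, 0)
	fmt.Printf("natural: size=%d align=%d\n", size, align) // size=12 align=4

	size, align = layout(fields, 1)
	fmt.Printf("pack(1): size=%d align=%d\n", size, align) // size=7 align=1
}
```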

No bitfields in struct types.

Unix: duplicate symbol conflict with purego (#22)

  • When using goffi and purego in the same binary with CGO_ENABLED=0, the linker reports duplicated definition of symbol _cgo_init. Both libraries include internal/fakecgo which defines identical runtime symbols.
  • Workaround: build with -tags nofakecgo to disable goffi's fakecgo, relying on purego's copy:
    CGO_ENABLED=0 go build -tags nofakecgo ./...
    

Platform Support

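| Platform | Arch | ABI | Since | CI |
|---|---|---|---|---|
| Windows | amd64 | Win64 | v0.1.0 | Tested |
| Windows | arm64 | AAPCS64 | v0.5.0 | Tested (Snapdragon X) |
| Linux | amd64 | System V | v0.1.0 | Tested |
| Linux | arm64 | AAPCS64 | v0.3.0 | Cross-compile verified |
| macOS | amd64 | System V | v0.1.1 | Tested |
| macOS | arm64 | AAPCS64 | v0.3.7 | Tested (M3 Pro) |
| FreeBSD | amd64 | System V | v0.5.0 | Cross-compile verified |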

Roadmap

| Version | Status | Highlights |
|---|---|---|
| v0.2.0 | Released | Callback API, 2000-entry trampoline table |
| v0.3.x | Released | ARM64 (AAPCS64), HFA, Apple Silicon |
| v0.4.0 | Released | crosscall2 for C-thread callbacks |
| v0.4.1 | Released | ABI compliance audit: 10/11 gaps fixed |
| v0.4.2 | Released | purego compatibility (-tags nofakecgo) |
| v0.5.0 | Next | Windows ARM64, FreeBSD, variadic functions, builder API |
| v1.0.0 | Planned | API stability (SemVer 2.0), security audit |

See CHANGELOG.md for version history and ROADMAP.md for the full plan.


Testing

go test ./...                          # all tests
go test -cover ./...                   # with coverage (89%)
go test -bench=. -benchmem ./ffi       # benchmarks
go test -v ./ffi                       # verbose, auto-detects platform

Documentation

| Document | Description |
|---|---|
| docs/ARCHITECTURE.md | Technical architecture: assembly, ABIs, callbacks |
| docs/PERFORMANCE.md | Benchmarks, optimization strategies, Go 1.26 |
| CHANGELOG.md | Version history, migration guides |
| ROADMAP.md | Development roadmap to v1.0 |
| CONTRIBUTING.md | Contribution guidelines |
| SECURITY.md | Security policy |
| examples/ | Working code examples |

Contributing

See CONTRIBUTING.md for guidelines.

  1. Fork → feature branch → tests (80%+ coverage) → lint → PR
  2. Conventional commits: feat:, fix:, docs:, test:

Acknowledgments

  • purego: proved that pure Go FFI is possible. The crosscall2 callback mechanism, fakecgo approach, and assembly trampoline patterns were pioneered by purego. goffi exists because purego cleared the path.
  • libffi: reference for FFI architecture patterns and CIF design.
  • Go runtime: runtime.cgocall for GC-safe stack switching, crosscall2 for C→Go transitions.

Ecosystem

goffi powers an ecosystem of pure Go GPU libraries:

| Project | Description |
|---|---|
| go-webgpu/webgpu | Zero-CGO WebGPU bindings (wgpu-native) |
| born-ml/born | ML framework for Go, GPU-accelerated |
| gogpu | GPU computing platform with dual Rust + Pure Go backends |
| wgpu-native | Native WebGPU implementation (upstream) |

License

MIT. See LICENSE.


goffi v0.4.1 | GitHub | pkg.go.dev | Dev.to