# hdf5 - Awesome Go Library for Science and Data Analysis

Pure Go implementation of the HDF5 file format for scientific data storage and exchange.

A modern, pure Go library for reading and writing HDF5 files without CGo dependencies. HDF5 2.0.0 compatible, with production-ready read/write support.
## ✨ Features

- ✅ Pure Go - No CGo, no C dependencies, cross-platform
- ✅ Modern Design - Built with Go 1.25+ best practices
- ✅ HDF5 2.0.0 Compatibility - Read/Write: v0, v2, v3 superblocks | Format Spec v4.0 with checksum validation
- ✅ Full Dataset Reading - Compact, contiguous, chunked layouts with GZIP
- ✅ Rich Datatypes - Integers, floats, strings (fixed/variable), compounds
- ✅ Memory Efficient - Buffer pooling and smart memory management
- ✅ Production Ready - Read support feature-complete
- ⚙️ Comprehensive Write Support - Datasets, groups, attributes + Smart Rebalancing!
## 🚀 Quick Start

### Installation

```bash
go get github.com/scigolib/hdf5
```
### Basic Usage

```go
package main

import (
	"fmt"
	"log"

	"github.com/scigolib/hdf5"
)

func main() {
	// Open HDF5 file
	file, err := hdf5.Open("data.h5")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	// Walk through file structure
	file.Walk(func(path string, obj hdf5.Object) {
		switch v := obj.(type) {
		case *hdf5.Group:
			fmt.Printf("📁 %s (%d children)\n", path, len(v.Children()))
		case *hdf5.Dataset:
			fmt.Printf("📄 %s\n", path)
		}
	})
}
```
Output:

```
📁 / (2 children)
📄 /temperature
📁 /experiments/ (3 children)
```
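Under the hood, `Walk` is a depth-first visitor over groups and datasets. The pattern can be sketched with a toy tree; the `Object`, `Group`, and `Dataset` types below are stand-ins for illustration, not the library's actual definitions:

```go
package main

import "fmt"

// Object is anything addressable by path: a group or a dataset.
type Object interface{ Name() string }

// Group holds named children; Dataset is a leaf.
type Group struct {
	name     string
	children []Object
}
type Dataset struct{ name string }

func (g *Group) Name() string   { return g.name }
func (d *Dataset) Name() string { return d.name }

// walk calls fn for every object, depth-first, with its full path.
func walk(path string, obj Object, fn func(path string, obj Object)) {
	fn(path, obj)
	if g, ok := obj.(*Group); ok {
		for _, c := range g.children {
			walk(path+c.Name(), c, fn)
		}
	}
}

func main() {
	root := &Group{name: "/", children: []Object{
		&Dataset{name: "temperature"},
		&Group{name: "experiments/", children: []Object{&Dataset{name: "run1"}}},
	}}
	walk("/", root, func(path string, obj Object) {
		switch obj.(type) {
		case *Group:
			fmt.Println("group:  ", path)
		case *Dataset:
			fmt.Println("dataset:", path)
		}
	})
}
```

The callback receives every object exactly once, so aggregations (counting datasets, collecting paths) fall out naturally.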
## 📖 Documentation

### Getting Started

- Installation Guide - Install and verify the library
- Quick Start Guide - Get started in 5 minutes
- Reading Data - Comprehensive guide to reading datasets and attributes

### Reference

- Datatypes Guide - HDF5 to Go type mapping
- Troubleshooting - Common issues and solutions
- FAQ - Frequently asked questions
- API Reference - GoDoc documentation

### Advanced

- Architecture Overview - How it works internally
- Performance Tuning - B-tree rebalancing strategies for optimal performance
- Rebalancing API - Complete API reference for rebalancing options
- Examples - Working code examples (7 examples with detailed documentation)
## ⚡ Performance Tuning

When deleting many attributes, B-trees can become sparse (wasted disk space, slower searches). This library offers four rebalancing strategies:

### 1. Default (No Rebalancing)

Fast deletions, but the B-tree may become sparse.

```go
// No options = no rebalancing (like the HDF5 C library)
fw, err := hdf5.CreateForWrite("data.h5", hdf5.CreateTruncate)
```

Use for: Append-only workloads, small files (<100MB)
### 2. Lazy Rebalancing (10-100x faster than immediate)

Batch processing: rebalances when a threshold is reached.

```go
fw, err := hdf5.CreateForWrite("data.h5", hdf5.CreateTruncate,
	hdf5.WithLazyRebalancing(
		hdf5.LazyThreshold(0.05),         // Trigger at 5% underflow
		hdf5.LazyMaxDelay(5*time.Minute), // Force rebalance after 5 min
	),
)
```

Use for: Batch deletion workloads, medium/large files (100-500MB)
Performance: ~2% overhead, occasional 100-500ms pauses
### 3. Incremental Rebalancing (Zero Pause)

Background processing: rebalances in a background goroutine.

```go
fw, err := hdf5.CreateForWrite("data.h5", hdf5.CreateTruncate,
	hdf5.WithLazyRebalancing(), // Prerequisite!
	hdf5.WithIncrementalRebalancing(
		hdf5.IncrementalBudget(100*time.Millisecond),
		hdf5.IncrementalInterval(5*time.Second),
	),
)
defer fw.Close() // Stops the background goroutine
```

Use for: Large files (>500MB), continuous operations, TB-scale data
Performance: ~4% overhead, zero user-visible pauses
### 4. Smart Rebalancing (Auto-Pilot)

Auto-tuning: the library detects the workload and selects the optimal mode.

```go
fw, err := hdf5.CreateForWrite("data.h5", hdf5.CreateTruncate,
	hdf5.WithSmartRebalancing(
		hdf5.SmartAutoDetect(true),
		hdf5.SmartAutoSwitch(true),
	),
)
```

Use for: Unknown workloads, mixed operations, research environments
Performance: ~6% overhead, adapts automatically
### Performance Comparison

| Mode | Deletion Speed | Pause Time | Use Case |
|---|---|---|---|
| Default | 100% (baseline) | None | Append-only, small files |
| Lazy | 95% (rebalancing itself 10-100x faster than immediate) | 100-500ms batches | Batch deletions |
| Incremental | 92% | None (background) | Large files, continuous ops |
| Smart | 88% | Varies | Unknown workloads |
Learn more:
- Performance Tuning Guide: Comprehensive guide with benchmarks, recommendations, troubleshooting
- Rebalancing API Reference: Complete API documentation
- Examples: 4 working examples demonstrating each mode
## 🎯 Current Status

HDF5 2.0.0 ready with 88%+ library coverage! 🎉

### ✅ Fully Implemented
**File Structure:**

- Superblock parsing (v0, v2, v3) with checksum validation (CRC32)
- Object headers v1 (legacy HDF5 < 1.8) with continuations
- Object headers v2 (modern HDF5 >= 1.8) with continuations
- Groups (traditional symbol tables + modern object headers)
- B-trees (leaf + non-leaf nodes for large files)
- Local heaps (string storage)
- Global heap (variable-length data)
- Fractal heap (direct blocks for dense attributes) ✨ NEW
**Dataset Reading:**

- Compact layout (data in object header)
- Contiguous layout (sequential storage)
- Chunked layout with B-tree indexing
- GZIP/Deflate compression
- LZF compression (h5py/PyTables compatible) ✨ NEW
- Filter pipeline for compressed data
**Datatypes (Read + Write):**

- Basic types: int8-64, uint8-64, float32/64
- AI/ML types: FP8 (E4M3, E5M2), bfloat16 - IEEE 754 compliant ✨ NEW
- Strings: fixed-length (null-terminated, null-padded, or space-padded), variable-length (via Global Heap)
- Advanced types: arrays, enums, references (object/region), opaque
- Compound types: struct-like with nested members
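For intuition on the bfloat16 support: bfloat16 keeps a float32's sign and full 8-bit exponent but only the top 7 mantissa bits, trading precision for float32's range. A minimal, self-contained round-trip (a sketch of the standard truncate-with-round-to-nearest-even conversion; the library's own codec also has to handle NaN payloads and the FP8 formats):

```go
package main

import (
	"fmt"
	"math"
)

// float32ToBF16 keeps the sign, the 8-bit exponent, and the top 7
// mantissa bits of a float32, rounding to nearest even on the 16 bits
// being dropped.
func float32ToBF16(f float32) uint16 {
	bits := math.Float32bits(f)
	// round-to-nearest-even: add 0x7FFF plus the lowest surviving bit
	rounding := uint32(0x7FFF) + ((bits >> 16) & 1)
	return uint16((bits + rounding) >> 16)
}

// bf16ToFloat32 widens by padding the dropped mantissa bits with zeros.
func bf16ToFloat32(b uint16) float32 {
	return math.Float32frombits(uint32(b) << 16)
}

func main() {
	f := float32(3.1415927)
	b := float32ToBF16(f)
	fmt.Printf("%v -> 0x%04X -> %v\n", f, b, bf16ToFloat32(b))
}
```

Values whose mantissa fits in 7 bits (powers of two, small integers, 2.5, etc.) round-trip exactly; everything else lands on the nearest representable bfloat16.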
**Attributes:**

- Compact attributes (in object header) ✨ NEW
- Dense attributes (fractal heap foundation) ✨ NEW
- Attribute reading for groups and datasets ✨ NEW
- Full attribute API (Group.Attributes(), Dataset.Attributes()) ✨ NEW
**Navigation:** Full file tree traversal via Walk()
**Code Quality:**

- Test coverage: 88%+ in library packages (target: >70%) ✅
- Lint issues: 0 (34+ linters) ✅
- TODO items: 0 (all resolved) ✅
- Official HDF5 test suite: 433 files, 100% pass rate ✅
**Security** ✨ NEW:

- 4 CVEs fixed (CVE-2025-7067, CVE-2025-6269, CVE-2025-2926, CVE-2025-44905) ✅
- Overflow protection throughout (SafeMultiply, buffer validation) ✅
- Security limits: 1GB chunks, 64MB attributes, 16MB strings ✅
- 39 security test cases, all passing ✅
### ⚙️ Write Support - Feature Complete!

Production-ready write support with all features! ✅
**Dataset Operations:**

- ✅ Create datasets (all layouts: contiguous, chunked, compact)
- ✅ Write data (all datatypes, including compound)
- ✅ Dataset resizing with unlimited dimensions
- ✅ Variable-length datatypes: strings, ragged arrays
- ✅ Compression (GZIP, Shuffle, Fletcher32)
- ✅ Array and enum datatypes
- ✅ References and opaque types
- ✅ Attribute writing (dense & compact storage)
- ✅ Attribute modification/deletion
**Links:**

- ✅ Hard links (full support)
- ✅ Soft links (symbolic references - full support)
- ✅ External links (cross-file references - full support)
**Read Enhancements:**

- ✅ Hyperslab selection (data slicing) - 10-250x faster!
- ✅ Efficient partial dataset reading
- ✅ Stride and block support
- ✅ Chunk-aware reading (reads ONLY the needed chunks)
- ✅ ChunkIterator API - memory-efficient iteration over large datasets
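The speedup from chunk-aware reading comes from simple index arithmetic: a hyperslab only touches the chunks its bounds intersect, and everything else is never read or decompressed. A minimal sketch of that arithmetic (an illustrative helper, not the library's API):

```go
package main

import "fmt"

// chunksTouched lists, per dimension, which chunk indices a hyperslab
// [start, start+count) intersects. Reading only those chunks is what
// makes sliced reads fast.
func chunksTouched(start, count, chunk []uint64) [][]uint64 {
	out := make([][]uint64, len(start))
	for d := range start {
		first := start[d] / chunk[d]                // chunk holding the first element
		last := (start[d] + count[d] - 1) / chunk[d] // chunk holding the last element
		for c := first; c <= last; c++ {
			out[d] = append(out[d], c)
		}
	}
	return out
}

func main() {
	// A 1000x1000 dataset stored in 100x100 chunks: selecting rows
	// 250..349 and columns 0..99 touches only 2 of the 100 chunks.
	idx := chunksTouched([]uint64{250, 0}, []uint64{100, 100}, []uint64{100, 100})
	fmt.Println("row chunks:", idx[0], "col chunks:", idx[1])
}
```

The chunks to read are the cross product of the per-dimension index lists, which for a thin slice of a large chunked dataset is a tiny fraction of the file.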
**Validation:**

- ✅ Official HDF5 Test Suite: 100% pass rate (378/378 files)
- ✅ Production quality confirmed
**Filters & Future Enhancements:**

- ✅ LZF filter (read + write, pure Go) ✨ NEW
- ✅ BZIP2 filter (read only, stdlib)
- ⚠️ SZIP filter (stub - requires libaec)
- ⚠️ Thread-safety with mutexes + SWMR mode (planned)
- ⚠️ Parallel I/O (planned)
### ❌ Planned Features

Next steps: see ROADMAP.md for the complete timeline and versioning strategy.
## 🔧 Development

### Requirements

- Go 1.25 or later
- No external dependencies for the library

### Building

```bash
# Clone repository
git clone https://github.com/scigolib/hdf5.git
cd hdf5

# Run tests
go test ./...

# Build examples
go build ./examples/...

# Build tools
go build ./cmd/...
```
### Testing

```bash
# Run all tests
go test ./...

# Run with race detector
go test -race ./...

# Run with coverage
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out
```
## 🤝 Contributing

Contributions are welcome! This is an early-stage project and we'd love your help.

Before contributing:

- Read CONTRIBUTING.md - Git workflow and development guidelines
- Check open issues
- Review the Architecture Overview

Ways to contribute:

- 🐛 Report bugs
- 💡 Suggest features
- 📝 Improve documentation
- 🔧 Submit pull requests
- ⭐ Star the project
## 🗺️ Comparison with Other Libraries

| Feature | This Library | gonum/hdf5 | go-hdf5/hdf5 |
|---|---|---|---|
| Pure Go | ✅ Yes | ❌ CGo wrapper | ✅ Yes |
| Reading | ✅ Full | ✅ Full | ❌ Limited |
| Writing | ✅ Full | ✅ Full | ❌ No |
| HDF5 1.8+ | ✅ Yes | ⚠️ Limited | ❌ No |
| Advanced Datatypes | ✅ All | ✅ Yes | ❌ No |
| Test Suite Validation | ✅ 100% (378/378) | ⚠️ Unknown | ❌ No |
| Maintained | ✅ Active | ⚠️ Slow | ❌ Inactive |
| Thread-safe | ⚠️ User must sync* | ⚠️ Conditional | ❌ No |

\* Different File instances are independent. Concurrent access to the same File requires user synchronization (standard Go practice). Full thread-safety with mutexes + SWMR mode is planned for future releases.
## 📚 HDF5 Resources
## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
## 🙏 Acknowledgments

- The HDF Group for the HDF5 format specification
- gonum/hdf5 for inspiration
- All contributors to this project

### Special Thanks

Professor Ancha Baranova - this project would not have been possible without her invaluable help and support in bringing this library to life.
## 📞 Support

- 📖 Documentation - Architecture and guides
- 🐛 Issue Tracker
- 💬 Discussions - Community Q&A and announcements
- 🌐 HDF Group Forum - Official HDF5 community discussion

**Status:** Stable - HDF5 2.0.0 compatible with security hardening

Built with ❤️ by the HDF5 Go community. Recognized on the HDF Group Forum ⭐