binary encoding
patterns for encoding/decoding binary wire formats (CBOR, CAR, protocol frames). distinct from JSON - you're working with raw bytes and need to handle endianness, varints, and content addressing.
anytype writer for encoders
the core pattern: an encoder function that accepts any writer via anytype. this lets the same encoder write to fixed buffers, ArrayLists, or any other writer:
pub fn encode(allocator: Allocator, writer: anytype, value: Value) !void {
    switch (value) {
        .unsigned => |v| try writeArgument(writer, 0, v),
        .text => |t| {
            try writeArgument(writer, 3, t.len);
            try writer.writeAll(t);
        },
        .map => |entries| {
            // sort keys (DAG-CBOR determinism), needs allocator
            const sorted = try allocator.dupe(MapEntry, entries);
            defer allocator.free(sorted);
            std.mem.sort(MapEntry, sorted, {}, keyLessThan);
            // ...
        },
        // ...
    }
}
the allocator parameter is separate from the writer - needed for temporary allocations during encoding (sorting map keys, building intermediate buffers), not for the output itself.
usage with different writers:
// fixed buffer (no allocation for output)
var buf: [1024]u8 = undefined;
var stream = std.io.fixedBufferStream(&buf);
try encode(alloc, stream.writer(), value);
const result = stream.getWritten();
// growable buffer
var list: std.ArrayList(u8) = .{};
defer list.deinit(alloc);
try encode(alloc, list.writer(alloc), value);
note: std.io.fixedBufferStream is deprecated in 0.15 — the stdlib says to use std.Io.Writer.fixed / std.Io.Reader.fixed instead. the old API still compiles (zat uses it in 3 files) but new code should prefer the non-deprecated form. the anytype writer pattern itself is fine either way — the encoder doesn't care which writer type backs it.
see: zat/cbor.zig
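because the encoder only calls methods on the writer, any type with matching writeByte and writeAll methods will do. a minimal sketch, assuming a hypothetical CountingWriter and a stand-in encodeText (neither is part of zat), that measures the encoded size without producing any output:

```zig
const std = @import("std");

/// Hypothetical writer that discards bytes and only counts them.
const CountingWriter = struct {
    count: usize = 0,

    pub fn writeByte(self: *CountingWriter, byte: u8) !void {
        _ = byte;
        self.count += 1;
    }

    pub fn writeAll(self: *CountingWriter, bytes: []const u8) !void {
        self.count += bytes.len;
    }
};

/// Stand-in for a real encoder: a CBOR text head with the length stored
/// inline (so text.len must be below 24), followed by the text itself.
fn encodeText(writer: anytype, text: []const u8) !void {
    std.debug.assert(text.len < 24);
    try writer.writeByte(0x60 | @as(u8, @intCast(text.len)));
    try writer.writeAll(text);
}

test "anytype accepts a hand-rolled writer" {
    var counter = CountingWriter{};
    try encodeText(&counter, "hello");
    // one head byte plus five text bytes
    try std.testing.expectEqual(@as(usize, 6), counter.count);
}
```

the same trick is handy for pre-sizing a buffer: run the encoder once against a counting writer, then again against a fixed buffer of exactly that size.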
encodeAlloc convenience
wrap the growable-buffer pattern into a helper:
pub fn encodeAlloc(allocator: Allocator, value: Value) ![]u8 {
    var list: std.ArrayList(u8) = .{};
    errdefer list.deinit(allocator);
    try encode(allocator, list.writer(allocator), value);
    return try list.toOwnedSlice(allocator);
}
caller owns the returned slice. errdefer ensures cleanup if encoding fails partway through.
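the ownership contract can be checked with std.testing.allocator, which reports leaks when a test ends. a sketch using a hypothetical encodePair (a stand-in with the same allocate, errdefer, return shape as encodeAlloc, not zat code):

```zig
const std = @import("std");

/// Hypothetical two-step encoder: the buffer is allocated first, then a later
/// step may fail. errdefer releases the buffer only on the error path.
fn encodePair(allocator: std.mem.Allocator, key: []const u8, value: []const u8) ![]u8 {
    const out = try allocator.alloc(u8, key.len + value.len);
    errdefer allocator.free(out);

    // stand-in for a failure partway through encoding
    if (value.len == 0) return error.EmptyValue;

    @memcpy(out[0..key.len], key);
    @memcpy(out[key.len..], value);
    return out; // success: ownership passes to the caller
}

test "caller frees on success, errdefer frees on failure" {
    const alloc = std.testing.allocator;

    const bytes = try encodePair(alloc, "k", "v");
    defer alloc.free(bytes); // the caller's half of the contract
    try std.testing.expectEqualStrings("kv", bytes);

    // the failing call must not leak; the testing allocator would flag it
    try std.testing.expectError(error.EmptyValue, encodePair(alloc, "k", ""));
}
```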
big-endian integers without writeInt
when writing fixed-width big-endian integers to an anytype writer, build the bytes manually rather than depending on writeInt (which may not be available on all writer types):
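fn writeArgument(writer: anytype, major: u3, val: u64) !void {
    const prefix: u8 = @as(u8, major) << 5;
    // ... (checks for smaller values come first so the shortest form wins)
    if (val <= 0xffff) {
        // additional info 25: a 2-byte big-endian argument follows
        try writer.writeByte(prefix | 25);
        const v: u16 = @intCast(val);
        try writer.writeAll(&[2]u8{ @truncate(v >> 8), @truncate(v) });
    }
    // ...
}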
@truncate on shifted values is the idiomatic way to extract individual bytes.
unsigned varint (LEB128)
used by CID, CAR, and other IPLD formats for variable-length integers:
// write
pub fn writeUvarint(writer: anytype, val: u64) !void {
    var v = val;
    while (v >= 0x80) {
        try writer.writeByte(@as(u8, @truncate(v)) | 0x80);
        v >>= 7;
    }
    try writer.writeByte(@as(u8, @truncate(v)));
}
// read
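fn readUvarint(data: []const u8, pos: *usize) ?u64 {
    var result: u64 = 0;
    var shift: u6 = 0;
    while (pos.* < data.len) {
        const byte = data[pos.*];
        pos.* += 1;
        result |= @as(u64, byte & 0x7f) << shift;
        if (byte & 0x80 == 0) return result;
        // shift 63 is the 10th byte, the last one a u64 can use;
        // a u6 cannot hold 64, so compare against 63 before adding
        if (shift == 63) return null;
        shift +|= 7;
    }
    return null;
}
note: +|= (saturating add) keeps the u6 shift counter from overflowing. because a u6 can never reach 64, the overlong check runs before the add and compares against 63, the shift used for the 10th and final byte of a u64. running out of input mid-varint also returns null.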
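a worked example: 300 is 0b1_0010_1100, so the low 7 bits (0x2c) go out first with the continuation bit set (0xac), followed by the remaining bits (0x02). the sketch below assumes a hypothetical slice-based putUvarint so it needs no writer, and a reader that rejects input still asking for more after 10 bytes:

```zig
const std = @import("std");

/// Hypothetical slice-based twin of writeUvarint; returns the bytes written.
/// `buf` must hold at least 10 bytes, the longest encoding of a u64.
fn putUvarint(buf: []u8, val: u64) usize {
    var v = val;
    var i: usize = 0;
    while (v >= 0x80) : (i += 1) {
        buf[i] = @as(u8, @truncate(v)) | 0x80;
        v >>= 7;
    }
    buf[i] = @truncate(v);
    return i + 1;
}

fn readUvarint(data: []const u8, pos: *usize) ?u64 {
    var result: u64 = 0;
    var shift: u6 = 0;
    while (pos.* < data.len) {
        const byte = data[pos.*];
        pos.* += 1;
        result |= @as(u64, byte & 0x7f) << shift;
        if (byte & 0x80 == 0) return result;
        if (shift == 63) return null; // 10 bytes consumed and still continuing
        shift +|= 7;
    }
    return null; // input ended mid-varint
}

test "uvarint round trip and rejection" {
    var buf: [10]u8 = undefined;
    const n = putUvarint(&buf, 300);
    try std.testing.expectEqualSlices(u8, &[_]u8{ 0xac, 0x02 }, buf[0..n]);

    var pos: usize = 0;
    try std.testing.expectEqual(@as(?u64, 300), readUvarint(buf[0..n], &pos));
    try std.testing.expectEqual(@as(usize, 2), pos);

    // eleven continuation bytes: rejected after the 10th
    const overlong = [_]u8{0x80} ** 11;
    pos = 0;
    try std.testing.expect(readUvarint(&overlong, &pos) == null);
}
```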
arena per message
for streaming protocols, create an arena per incoming message. all decoding allocations go into it, then free everything at once:
pub fn serverMessage(self: *Self, data: []const u8) !void {
    var arena = std.heap.ArenaAllocator.init(self.allocator);
    defer arena.deinit();
    const event = decodeFrame(arena.allocator(), data) catch |err| {
        log.debug("decode error: {s}", .{@errorName(err)});
        return;
    };
    self.handler.onEvent(event);
    // arena freed here: all decoded data is gone
}
this means the handler's onEvent must not hold references to event data past the call. if it needs to, it must copy into its own allocator.
see: zat/firehose.zig, zat/jetstream.zig
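a sketch of the copy-out rule, using hypothetical Event and Handler types and a dupe standing in for decodeFrame (none of these are zat's real definitions; here onEvent returns an error union because it allocates):

```zig
const std = @import("std");

/// Hypothetical decoded event; `name` points into arena memory.
const Event = struct { name: []const u8 };

/// Hypothetical handler that keeps the latest event name past the callback.
const Handler = struct {
    allocator: std.mem.Allocator,
    last_name: ?[]u8 = null,

    fn onEvent(self: *Handler, event: Event) !void {
        // copy into the handler's own allocator; event.name dies with the arena
        const copy = try self.allocator.dupe(u8, event.name);
        if (self.last_name) |old| self.allocator.free(old);
        self.last_name = copy;
    }

    fn deinit(self: *Handler) void {
        if (self.last_name) |old| self.allocator.free(old);
    }
};

fn handleMessage(gpa: std.mem.Allocator, handler: *Handler, data: []const u8) !void {
    var arena = std.heap.ArenaAllocator.init(gpa);
    defer arena.deinit();

    // stand-in for decodeFrame: the "decoded" event lives in the arena
    const event = Event{ .name = try arena.allocator().dupe(u8, data) };
    try handler.onEvent(event);
    // arena freed on return; only the handler's copy survives
}

test "handler copy outlives the per-message arena" {
    var handler = Handler{ .allocator = std.testing.allocator };
    defer handler.deinit();

    try handleMessage(std.testing.allocator, &handler, "commit");
    try std.testing.expectEqualStrings("commit", handler.last_name.?);
}
```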
specialized decoders
when generic decoding is too expensive, write a purpose-built parser for a known schema. the generic path builds Value unions, MapEntry arrays, and handles every CBOR type. if you know the exact shape, skip all that.
example: MST nodes are always map(2) { "e": array[entries...], "l": CID|null }. instead of cbor.decodeAll() → extract fields from Value unions, parse the CBOR bytes directly:
pub fn decodeMstNode(allocator: Allocator, data: []const u8) MstDecodeError!MstNodeData {
    // expect map(2), key "e", array(n) (a known byte sequence)
    // parse entries inline, zero-copy slicing into input buffer
    // only allocation: the entries array itself
}
pub const MstNodeData = struct {
    left: ?[]const u8, // raw CID bytes (borrowed from input)
    entries: []MstEntryData, // heap-allocated array
};
pub const MstEntryData = struct {
    prefix_len: usize,
    key_suffix: []const u8, // borrowed from input
    value_cid: []const u8, // borrowed from input
    tree: ?[]const u8, // borrowed from input
};
the result: MST walk went from 45.5ms (generic decode per node) to 39.3ms (specialized decode) on 243k blocks. the bigger win was avoiding the full tree rebuild (218ms → 39ms total) by verifying structure during the walk.
when to use this pattern:
- you decode the same schema thousands of times (MST nodes, CBOR blocks)
- the schema is stable and well-known
- profiling shows decode time dominates
when NOT to use it:
- the schema varies or is user-defined
- you only decode a handful of times
- generic decode is fast enough
see: zat/mst.zig decodeMstNode
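to make the idea concrete without guessing at the MST byte layout, here is a sketch for a toy schema (an assumption for illustration only): map(1) { "k": bytes } with a byte string shorter than 24 bytes. the fixed prefix is compared as raw bytes and the payload is returned as a slice of the input:

```zig
const std = @import("std");

const DecodeError = error{ UnexpectedByte, Truncated };

/// Hypothetical purpose-built decoder for the toy schema map(1) { "k": bytes }.
/// Returns the byte string as a slice borrowed from `data`; nothing is allocated.
fn decodeKeyOnly(data: []const u8) DecodeError![]const u8 {
    // 0xa1 = map(1), 0x61 = text(1), then the key itself
    const header = [_]u8{ 0xa1, 0x61, 'k' };
    if (data.len < header.len + 1) return error.Truncated;
    if (!std.mem.eql(u8, data[0..header.len], &header)) return error.UnexpectedByte;

    // major type 2 (byte string) with the length stored inline (0..23)
    const head = data[header.len];
    if (head >> 5 != 2 or (head & 0x1f) >= 24) return error.UnexpectedByte;

    const len: usize = head & 0x1f;
    const start = header.len + 1;
    if (data.len < start + len) return error.Truncated;
    return data[start .. start + len];
}

test "toy specialized decode is zero-copy" {
    const input = [_]u8{ 0xa1, 0x61, 'k', 0x43, 1, 2, 3 };
    const out = try decodeKeyOnly(&input);
    try std.testing.expectEqualSlices(u8, &[_]u8{ 1, 2, 3 }, out);
    // same memory as the input, not a copy
    try std.testing.expect(&out[0] == &input[4]);
}
```

the cost of skipping the generic path is rigidity: any input outside the expected shape is an error, so callers need a fallback or a hard failure policy.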
deterministic encoding
DAG-CBOR requires deterministic output (same value → same bytes). the main rules:
- shortest integer encoding: 0-23 inline, 24-255 in 1 byte, etc.
- map keys sorted: by byte length first, then lexicographically
- no indefinite lengths, and no floats in atproto data (DAG-CBOR itself permits floats only as 64-bit doubles, never NaN or Infinity)
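the shortest-integer rule can be sketched as a head encoder that picks the smallest argument width. encodeHead is a hypothetical helper that writes into a fixed 9-byte array rather than a writer; the additional-info values 24 to 27 come from the CBOR spec:

```zig
const std = @import("std");

/// Hypothetical helper: writes the shortest CBOR head for (major, val) into
/// `out` and returns the number of bytes used (1, 2, 3, 5, or 9).
fn encodeHead(out: *[9]u8, major: u3, val: u64) usize {
    const prefix: u8 = @as(u8, major) << 5;
    if (val < 24) {
        // 0..23 fit in the low five bits of the first byte
        out[0] = prefix | @as(u8, @intCast(val));
        return 1;
    }

    // additional info 24..27 selects a 1, 2, 4, or 8 byte big-endian argument
    var info: u8 = 27;
    var width: usize = 8;
    if (val <= 0xff) {
        info = 24;
        width = 1;
    } else if (val <= 0xffff) {
        info = 25;
        width = 2;
    } else if (val <= 0xffff_ffff) {
        info = 26;
        width = 4;
    }
    out[0] = prefix | info;

    // most significant byte first
    var i: usize = 0;
    while (i < width) : (i += 1) {
        const shift: u6 = @intCast(8 * (width - 1 - i));
        out[1 + i] = @truncate(val >> shift);
    }
    return 1 + width;
}

test "heads use the shortest form" {
    var buf: [9]u8 = undefined;

    var n = encodeHead(&buf, 0, 23);
    try std.testing.expectEqualSlices(u8, &[_]u8{0x17}, buf[0..n]);

    n = encodeHead(&buf, 0, 24);
    try std.testing.expectEqualSlices(u8, &[_]u8{ 0x18, 0x18 }, buf[0..n]);

    n = encodeHead(&buf, 0, 256);
    try std.testing.expectEqualSlices(u8, &[_]u8{ 0x19, 0x01, 0x00 }, buf[0..n]);
}
```

a strict decoder applies the same table in reverse and rejects, for example, 0x18 0x05, since 5 fits inline.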
sorting map keys during encoding:
fn dagCborKeyLessThan(_: void, a: MapEntry, b: MapEntry) bool {
    if (a.key.len != b.key.len) return a.key.len < b.key.len;
    return std.mem.order(u8, a.key, b.key) == .lt;
}
// in encoder:
const sorted = try allocator.dupe(MapEntry, entries);
defer allocator.free(sorted);
std.mem.sort(MapEntry, sorted, {}, dagCborKeyLessThan);
the dupe + sort pattern avoids mutating the input — the caller's entries slice stays unchanged.
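length-first ordering differs from plain lexicographic order: "c" sorts before "ab" because it is shorter. a self-contained sketch, assuming a minimal MapEntry with a u64 value and a hypothetical sortedCopy wrapper around the dupe + sort steps:

```zig
const std = @import("std");

/// Minimal stand-in for the real MapEntry; the value type is an assumption.
const MapEntry = struct { key: []const u8, value: u64 };

fn dagCborKeyLessThan(_: void, a: MapEntry, b: MapEntry) bool {
    if (a.key.len != b.key.len) return a.key.len < b.key.len;
    return std.mem.order(u8, a.key, b.key) == .lt;
}

/// Hypothetical wrapper: returns a sorted copy and leaves `entries` untouched.
/// The caller frees the returned slice.
fn sortedCopy(allocator: std.mem.Allocator, entries: []const MapEntry) ![]MapEntry {
    const sorted = try allocator.dupe(MapEntry, entries);
    std.mem.sort(MapEntry, sorted, {}, dagCborKeyLessThan);
    return sorted;
}

test "keys sort by length, then bytewise" {
    const input = [_]MapEntry{
        .{ .key = "bb", .value = 1 },
        .{ .key = "a", .value = 2 },
        .{ .key = "ab", .value = 3 },
        .{ .key = "c", .value = 4 },
    };
    const sorted = try sortedCopy(std.testing.allocator, &input);
    defer std.testing.allocator.free(sorted);

    try std.testing.expectEqualStrings("a", sorted[0].key);
    try std.testing.expectEqualStrings("c", sorted[1].key);
    try std.testing.expectEqualStrings("ab", sorted[2].key);
    try std.testing.expectEqualStrings("bb", sorted[3].key);

    // the input order is unchanged
    try std.testing.expectEqualStrings("bb", input[0].key);
}
```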