Version: 1.0.0 Date: November 13, 2025 Status: Draft Specification
This document specifies the Zyntax Bytecode Format, a platform-independent binary serialization format for HIR (High-level Intermediate Representation). This enables language implementers to target Zyntax without using Rust, by emitting bytecode that can be deserialized and compiled.
Key Benefits:
- Language-agnostic compiler frontend implementation
- Cross-platform HIR distribution
- Hot-reloading and dynamic code loading
- Compilation caching and incremental compilation
- Remote compilation and code distribution
- Language Independence: Any language can emit Zyntax bytecode
- Compact Representation: Efficient binary encoding
- Fast Deserialization: Minimal parsing overhead
- Version Compatibility: Forward and backward compatibility via versioning
- Complete Fidelity: Preserve all HIR semantics
- Debuggability: Support for debug info and source maps
- Security: Validation and sandboxing support
- Binary:
.zbc(Zyntax ByteCode) - Text/Debug:
.zbct(Zyntax ByteCode Text) - JSON representation
┌─────────────────────────────────────┐
│ File Header (32 bytes) │
├─────────────────────────────────────┤
│ String Table Section │
├─────────────────────────────────────┤
│ Type Table Section │
├─────────────────────────────────────┤
│ Constant Pool Section │
├─────────────────────────────────────┤
│ Global Definitions Section │
├─────────────────────────────────────┤
│ Function Definitions Section │
├─────────────────────────────────────┤
│ Import/Export Section │
├─────────────────────────────────────┤
│ Debug Info Section (optional) │
├─────────────────────────────────────┤
│ Metadata Section (optional) │
└─────────────────────────────────────┘
| Offset | Size | Field | Description |
|---|---|---|---|
| 0x00 | 4 | magic | Magic number: 0x5A424300 ("ZBC\0") |
| 0x04 | 2 | major_version | Major version (1) |
| 0x06 | 2 | minor_version | Minor version (0) |
| 0x08 | 4 | flags | Compilation flags (bitfield) |
| 0x0C | 8 | module_id | UUID of module (16 bytes in v2) |
| 0x14 | 4 | string_table_offset | Offset to string table |
| 0x18 | 4 | string_table_size | Size of string table |
| 0x1C | 4 | checksum | CRC32 checksum of entire file (excluding this field) |
| Bit | Name | Description |
|---|---|---|
| 0 | DEBUG_INFO | Debug information included |
| 1 | OPTIMIZED | Code has been optimized |
| 2 | POSITION_INDEPENDENT | Position-independent code |
| 3 | HOT_RELOAD | Supports hot-reloading |
| 4-7 | RESERVED | Reserved for future use |
struct BytecodeHeader {
magic: u32, // 0x5A424300
major_version: u16, // 1
minor_version: u16, // 0
flags: u32, // Bitfield
module_id: [u8; 16], // UUID bytes
string_table_offset: u32,
string_table_size: u32,
checksum: u32,
}Stores all string literals and identifiers used in the module.
┌────────────────────────┐
│ string_count: u32 │ Number of strings
├────────────────────────┤
│ String 0 │
│ length: u32 │ UTF-8 byte length
│ data: [u8; length] │ UTF-8 bytes
├────────────────────────┤
│ String 1 │
│ ... │
├────────────────────────┤
│ String N │
└────────────────────────┘
Throughout the bytecode, strings are referenced by index into the string table (0-based).
Example: String index 5 refers to the 6th string in the table.
Defines all types used in the module.
┌────────────────────────┐
│ type_count: u32 │ Number of type definitions
├────────────────────────┤
│ Type 0 │
├────────────────────────┤
│ Type 1 │
├────────────────────────┤
│ Type N │
└────────────────────────┘
Each type is encoded as:
┌────────────────────────┐
│ type_tag: u8 │ Type discriminator
├────────────────────────┤
│ type_data: varies │ Type-specific data
└────────────────────────┘
| Tag | Type | Payload |
|---|---|---|
| 0x00 | Void | None |
| 0x01 | Bool | None |
| 0x02 | I8 | None |
| 0x03 | I16 | None |
| 0x04 | I32 | None |
| 0x05 | I64 | None |
| 0x06 | I128 | None |
| 0x07 | U8 | None |
| 0x08 | U16 | None |
| 0x09 | U32 | None |
| 0x0A | U64 | None |
| 0x0B | U128 | None |
| 0x0C | F32 | None |
| 0x0D | F64 | None |
| 0x10 | Ptr | type_index: u32 |
| 0x11 | Ref | lifetime_index: u32, type_index: u32, mutable: u8 |
| 0x12 | Array | element_type: u32, length: u64 |
| 0x13 | Vector | element_type: u32, length: u32 |
| 0x20 | Struct | name_index: u32, field_count: u32, fields: [u32; count], packed: u8 |
| 0x21 | Union | name_index: u32, variant_count: u32, variants: [...], discriminant_type: u32, is_c_union: u8 |
| 0x30 | Function | param_count: u32, params: [u32], return_count: u32, returns: [u32], is_variadic: u8 |
| 0x31 | Closure | function_type: u32, capture_count: u32, captures: [...], call_mode: u8 |
| 0x40 | TraitObject | trait_id: UUID (16 bytes), vtable_index: u32 |
| 0x41 | Interface | method_count: u32, methods: [...], is_structural: u8 |
| 0x42 | AssociatedType | trait_id: UUID, self_type: u32, name_index: u32 |
| 0x50 | Opaque | name_index: u32 |
| 0x51 | ConstGeneric | name_index: u32 |
| 0x52 | Generic | base_type: u32, type_arg_count: u32, type_args: [u32], const_arg_count: u32, const_args: [u32] |
Stores constant values referenced by instructions.
┌────────────────────────┐
│ constant_count: u32 │
├────────────────────────┤
│ Constant 0 │
├────────────────────────┤
│ Constant 1 │
├────────────────────────┤
│ Constant N │
└────────────────────────┘
┌────────────────────────┐
│ const_tag: u8 │ Constant type discriminator
├────────────────────────┤
│ const_data: varies │ Constant-specific data
└────────────────────────┘
| Tag | Type | Payload |
|---|---|---|
| 0x00 | Bool | value: u8 (0 or 1) |
| 0x01 | I8 | value: i8 |
| 0x02 | I16 | value: i16 (little-endian) |
| 0x03 | I32 | value: i32 (little-endian) |
| 0x04 | I64 | value: i64 (little-endian) |
| 0x05 | I128 | value: i128 (little-endian, 16 bytes) |
| 0x06 | U8 | value: u8 |
| 0x07 | U16 | value: u16 (little-endian) |
| 0x08 | U32 | value: u32 (little-endian) |
| 0x09 | U64 | value: u64 (little-endian) |
| 0x0A | U128 | value: u128 (little-endian, 16 bytes) |
| 0x0B | F32 | value: f32 (IEEE 754, little-endian) |
| 0x0C | F64 | value: f64 (IEEE 754, little-endian) |
| 0x10 | Null | type_index: u32 |
| 0x20 | Array | element_count: u32, elements: [const_index; count] |
| 0x21 | Struct | field_count: u32, fields: [const_index; count] |
| 0x30 | String | string_index: u32 |
| 0x40 | VTable | vtable_id: UUID, method_count: u32, methods: [function_index; count] |
Defines global variables and constants.
┌────────────────────────┐
│ global_count: u32 │
├────────────────────────┤
│ Global 0 │
│ id: UUID (16 bytes) │
│ name_index: u32 │
│ type_index: u32 │
│ is_mutable: u8 │
│ is_external: u8 │
│ initializer: u32 │ Constant index (0xFFFFFFFF if none)
│ linkage: u8 │ 0=private, 1=public, 2=external
├────────────────────────┤
│ Global 1 │
│ ... │
└────────────────────────┘
The core section containing all function definitions.
┌────────────────────────┐
│ function_count: u32 │
├────────────────────────┤
│ Function 0 │
├────────────────────────┤
│ Function 1 │
├────────────────────────┤
│ Function N │
└────────────────────────┘
┌───────────────────────────────────┐
│ Function Header │
│ id: UUID (16 bytes) │
│ name_index: u32 │
│ is_external: u8 │
│ calling_convention: u8 │
│ signature_offset: u32 │ Offset to signature data
│ body_offset: u32 │ Offset to body (0 if external)
├───────────────────────────────────┤
│ Function Signature │
│ param_count: u32 │
│ params: [Param; count] │
│ return_count: u32 │
│ returns: [u32; count] │ Type indices
│ type_param_count: u32 │
│ type_params: [TypeParam; count] │
│ const_param_count: u32 │
│ const_params: [ConstParam; count]│
│ lifetime_param_count: u32 │
│ lifetime_params: [Lifetime; count]│
│ is_variadic: u8 │
│ is_async: u8 │
├───────────────────────────────────┤
│ Function Body (if not external) │
│ entry_block: UUID │
│ block_count: u32 │
│ blocks: [Block; count] │
│ local_count: u32 │
│ locals: [Local; count] │
│ value_count: u32 │
│ values: [Value; count] │
└───────────────────────────────────┘
struct Param {
id: UUID, // 16 bytes
name_index: u32, // String table index
type_index: u32, // Type table index
attributes: u16, // Bitfield: by_ref, sret, zext, sext, noalias, nonnull, readonly
}
┌───────────────────────────────────┐
│ Block Header │
│ id: UUID (16 bytes) │
│ label_index: u32 │ String table index (0xFFFFFFFF if none)
│ phi_count: u32 │
│ instruction_count: u32 │
│ predecessor_count: u32 │
│ predecessors: [UUID; count] │
│ successor_count: u32 │
│ successors: [UUID; count] │
├───────────────────────────────────┤
│ Phi Nodes │
│ phi_0: PhiNode │
│ phi_1: PhiNode │
│ ... │
├───────────────────────────────────┤
│ Instructions │
│ instr_0: Instruction │
│ instr_1: Instruction │
│ ... │
├───────────────────────────────────┤
│ Terminator │
│ terminator: Terminator │
└───────────────────────────────────┘
struct PhiNode {
result_id: UUID, // 16 bytes
type_index: u32,
incoming_count: u32,
incoming: [(UUID, UUID); count], // (value_id, block_id) pairs
}
Each instruction is encoded with an opcode followed by operands.
┌────────────────────────┐
│ opcode: u8 │ Instruction opcode
├────────────────────────┤
│ operands: varies │ Opcode-specific operands
└────────────────────────┘
| Opcode | Instruction | Operands |
|---|---|---|
| 0x00 | Binary | op: u8, result: UUID, type: u32, left: UUID, right: UUID |
| 0x01 | Unary | op: u8, result: UUID, type: u32, operand: UUID |
| 0x02 | Alloca | result: UUID, type: u32, count: UUID (or NULL), align: u32 |
| 0x03 | Load | result: UUID, type: u32, ptr: UUID, align: u32, volatile: u8 |
| 0x04 | Store | value: UUID, ptr: UUID, align: u32, volatile: u8 |
| 0x05 | GetElementPtr | result: UUID, type: u32, ptr: UUID, index_count: u32, indices: [UUID] |
| 0x06 | Call | result: UUID (or NULL), callee_type: u8, callee: varies, arg_count: u32, args: [UUID], type_arg_count: u32, type_args: [u32], const_arg_count: u32, const_args: [u32], is_tail: u8 |
| 0x07 | IndirectCall | result: UUID (or NULL), func_ptr: UUID, arg_count: u32, args: [UUID], return_type: u32 |
| 0x08 | Cast | op: u8, result: UUID, type: u32, operand: UUID |
| 0x09 | Select | result: UUID, type: u32, condition: UUID, true_val: UUID, false_val: UUID |
| 0x0A | ExtractValue | result: UUID, type: u32, aggregate: UUID, index_count: u32, indices: [u32] |
| 0x0B | InsertValue | result: UUID, type: u32, aggregate: UUID, value: UUID, index_count: u32, indices: [u32] |
| 0x0C | Atomic | op: u8, result: UUID, type: u32, ptr: UUID, value: UUID (or NULL), ordering: u8 |
| 0x0D | Fence | ordering: u8 |
| 0x10 | CreateUnion | result: UUID, union_type: u32, variant_index: u32, value: UUID |
| 0x11 | GetUnionDiscriminant | result: UUID, union_val: UUID |
| 0x12 | ExtractUnionValue | result: UUID, type: u32, union_val: UUID, variant_index: u32 |
| 0x20 | CreateTraitObject | result: UUID, trait_id: UUID, data_ptr: UUID, vtable_id: UUID |
| 0x21 | UpcastTraitObject | result: UUID, sub_trait_object: UUID, sub_trait_id: UUID, super_trait_id: UUID, super_vtable_id: UUID |
| 0x22 | TraitMethodCall | result: UUID (or NULL), trait_object: UUID, method_index: u32, method_sig: MethodSignature, arg_count: u32, args: [UUID], return_type: u32 |
| 0x30 | CreateClosure | result: UUID, closure_type: u32, function: UUID, capture_count: u32, captures: [UUID] |
| 0x31 | CallClosure | result: UUID (or NULL), closure: UUID, arg_count: u32, args: [UUID] |
| 0x40 | CreateRef | result: UUID, value: UUID, lifetime_index: u32, mutable: u8 |
| 0x41 | Deref | result: UUID, type: u32, reference: UUID |
| 0x42 | Move | result: UUID, type: u32, source: UUID |
| 0x43 | Copy | result: UUID, type: u32, source: UUID |
| 0x50 | BeginLifetime | lifetime_index: u32 |
| 0x51 | EndLifetime | lifetime_index: u32 |
| 0x52 | LifetimeConstraint | longer: u32, shorter: u32 |
BinaryOp (for opcode 0x00):
| Tag | Operation |
|---|---|
| 0x00 | Add |
| 0x01 | Sub |
| 0x02 | Mul |
| 0x03 | Div |
| 0x04 | Rem |
| 0x05 | And |
| 0x06 | Or |
| 0x07 | Xor |
| 0x08 | Shl |
| 0x09 | Shr |
| 0x0A | Eq |
| 0x0B | Ne |
| 0x0C | Lt |
| 0x0D | Le |
| 0x0E | Gt |
| 0x0F | Ge |
| 0x10 | FAdd |
| 0x11 | FSub |
| 0x12 | FMul |
| 0x13 | FDiv |
| 0x14 | FRem |
| 0x15 | FEq |
| 0x16 | FNe |
| 0x17 | FLt |
| 0x18 | FLe |
| 0x19 | FGt |
| 0x1A | FGe |
UnaryOp (for opcode 0x01):
| Tag | Operation |
|---|---|
| 0x00 | Neg |
| 0x01 | Not |
| 0x02 | FNeg |
CastOp (for opcode 0x08):
| Tag | Operation |
|---|---|
| 0x00 | BitCast |
| 0x01 | ZExt |
| 0x02 | SExt |
| 0x03 | Trunc |
| 0x04 | FPTrunc |
| 0x05 | FPExt |
| 0x06 | FPToUI |
| 0x07 | FPToSI |
| 0x08 | UIToFP |
| 0x09 | SIToFP |
| 0x0A | PtrToInt |
| 0x0B | IntToPtr |
Block terminators control control flow.
┌────────────────────────┐
│ terminator_tag: u8 │ Terminator type
├────────────────────────┤
│ terminator_data: varies│ Terminator-specific data
└────────────────────────┘
| Tag | Terminator | Payload |
|---|---|---|
| 0x00 | Return | value_count: u32, values: [UUID; count] |
| 0x01 | Branch | target: UUID |
| 0x02 | CondBranch | condition: UUID, true_target: UUID, false_target: UUID |
| 0x03 | Switch | value: UUID, default: UUID, case_count: u32, cases: [(const_index, block_id); count] |
| 0x04 | Unreachable | None |
| 0x05 | Invoke | callee_type: u8, callee: varies, arg_count: u32, args: [UUID], normal: UUID, unwind: UUID |
| 0x06 | PatternMatch | value: UUID, pattern_count: u32, patterns: [Pattern; count], default: UUID (or NULL) |
Defines module imports and exports.
┌────────────────────────┐
│ import_count: u32 │
├────────────────────────┤
│ Import 0 │
│ name_index: u32 │
│ module_name: u32 │
│ kind: u8 │ 0=function, 1=global, 2=type
│ id: UUID │
├────────────────────────┤
│ ... │
├────────────────────────┤
│ export_count: u32 │
├────────────────────────┤
│ Export 0 │
│ name_index: u32 │
│ kind: u8 │ 0=function, 1=global, 2=type
│ id: UUID │
├────────────────────────┤
│ ... │
└────────────────────────┘
Provides source location mapping and debug symbols.
┌────────────────────────┐
│ debug_magic: u32 │ 0x44424700 ("DBG\0")
├────────────────────────┤
│ source_file_count: u32 │
├────────────────────────┤
│ Source Files │
│ file_0: │
│ path_index: u32 │
│ hash: [u8; 32] │ SHA-256 of file contents
├────────────────────────┤
│ location_count: u32 │
├────────────────────────┤
│ Locations │
│ loc_0: │
│ instruction_id: UUID│
│ file_index: u32 │
│ line: u32 │
│ column: u32 │
│ length: u32 │
└────────────────────────┘
Custom metadata for tooling, profiling, etc.
┌────────────────────────┐
│ metadata_magic: u32 │ 0x4D455400 ("MET\0")
├────────────────────────┤
│ entry_count: u32 │
├────────────────────────┤
│ Metadata Entry 0 │
│ key_index: u32 │ String table index
│ value_size: u32 │
│ value_data: [u8] │
├────────────────────────┤
│ ... │
└────────────────────────┘
Zyntax Source:
fn add(a: i32, b: i32) -> i32 {
a + b
}Bytecode (Conceptual):
Header: [magic=ZBC, version=1.0, ...]
StringTable:
0: "add"
1: "a"
2: "b"
TypeTable:
0: I32
Functions:
Function {
id: <uuid>,
name: 0, // "add"
signature: {
params: [
Param { id: <uuid-a>, name: 1, type: 0 }, // a: i32
Param { id: <uuid-b>, name: 2, type: 0 }, // b: i32
],
returns: [0], // i32
},
blocks: [
Block {
id: <uuid-entry>,
instructions: [
Binary {
op: Add,
result: <uuid-result>,
type: 0, // i32
left: <uuid-a>,
right: <uuid-b>,
}
],
terminator: Return { values: [<uuid-result>] }
}
]
}
Zyntax Source:
fn identity<T>(x: T) -> T {
x
}Bytecode would include:
- Type parameter
Tin signature - Move or Copy instruction depending on trait bounds
- Monomorphization handled at link time
- Build String Table First: Collect all identifiers and strings
- Build Type Table: Encode all types with proper indices
- Build Constant Pool: Encode all constant values
- Build Functions: Encode function signatures and bodies
- Generate UUIDs: Use UUID v4 for all HirIds
- Validate: Ensure all references are valid
- Calculate Checksum: CRC32 of entire file
- Write Binary: Write sections in order
- Validate Header: Check magic number and version
- Verify Checksum: Ensure file integrity
- Load String Table: Deserialize all strings
- Load Type Table: Reconstruct type definitions
- Load Constant Pool: Deserialize constants
- Load Functions: Reconstruct HIR functions
- Validate References: Ensure all UUIDs and indices are valid
- Build HIR Module: Construct complete HirModule
Major.Minor (e.g., 1.0)
- Major: Incompatible changes (breaking)
- Minor: Backward-compatible additions
- Forward Compatibility: Older compilers can ignore unknown minor features
- Backward Compatibility: Newer compilers must support older bytecode
- Deprecation: Features deprecated for 2 major versions before removal
| Version | Date | Changes |
|---|---|---|
| 1.0 | 2025-11-13 | Initial specification |
All bytecode must be validated before execution:
- Bounds Checking: All indices must be within valid ranges
- Type Safety: All operations must be type-correct
- CFG Validation: Control flow graph must be well-formed
- SSA Validation: SSA form must be valid
- Lifetime Validation: Lifetime constraints must be satisfiable
Bytecode can be sandboxed:
- Memory Limits: Restrict allocations
- Capability-based Security: Limit system calls
- Resource Quotas: CPU time, memory, I/O
- zbcdump: Disassembler for
.zbcfiles - zbcvalid: Validator for bytecode
- zbc2json: Convert binary to JSON representation
- json2zbc: Convert JSON to binary bytecode
- WebAssembly Binary Format: Inspiration for compact encoding
- LLVM Bitcode: Similar IR serialization approach
- JVM Class File Format: Constant pool and string table design
- Protocol Buffers: Binary serialization best practices
See Section 3 for complete type tag listing.
See Section 7 for complete instruction opcode listing.
Last Updated: November 13, 2025 Specification Version: 1.0.0 Status: Draft - Ready for Implementation