Async Runtime Design for Zyntax

Problem: Implementing async/await in Cranelift JIT is complex due to state machine management.

Key Question: Should we build the reactor/executor in Cranelift-generated code, or call an external runtime over FFI?

Current Approach (Problematic)

The current async_support.rs attempts to:

1. Transform async functions into HIR state machines with switch statements
2. Compile these directly to Cranelift IR

Issues with this approach:

❌ Cranelift IR offers only basic blocks, branches, and jump tables (br_table), so HIR switch statements are awkward to lower
❌ State management is complex in JIT context
❌ No clear way to suspend/resume execution
❌ Waker/poll mechanism needs runtime support
❌ Hard to integrate with existing async ecosystems (Tokio, async-std)

Proposed Solution: External Runtime + FFI

Based on your experience, here's a better approach:

Architecture: Hybrid Model

┌─────────────────────────────────────────┐
│   Zyntax Async Function (Source)       │
│   async fn fetch() -> String           │
└──────────────┬──────────────────────────┘
               │ Compile ↓
┌──────────────┴──────────────────────────┐
│   HIR Transformation                    │
│   1. Extract await points               │
│   2. Create state struct                │
│   3. Generate poll() function           │
└──────────────┬──────────────────────────┘
               │ Codegen ↓
┌──────────────┴──────────────────────────┐
│   Cranelift JIT Code                    │
│   - State struct (in memory)            │
│   - poll(state*) -> Poll<T>            │
│   - FFI calls to runtime                │
└──────────────┬──────────────────────────┘
               │ Runtime ↓
┌──────────────┴──────────────────────────┐
│   External Async Runtime (C/Rust)       │
│   - Tokio, async-std, or custom         │
│   - Executor/reactor                    │
│   - Waker mechanism                     │
│   - I/O polling (epoll/kqueue)          │
└─────────────────────────────────────────┘

Key Design Decisions

1. Compile to poll() Functions, NOT State Machines

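Instead of lowering complex switch-based state machines from HIR, generate one poll() function per async function that dispatches on a state_id field: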

// Source
async fn fetch_data(url: String) -> String {
    let response = http_get(url).await;  // Await point 1
    let body = read_body(response).await; // Await point 2
    body
}

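// Compiles to a state struct plus a poll function (Rust-like pseudocode: a real C ABI
// would need #[repr(C)] types in place of String and Poll<String>)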
struct FetchDataState {
    state_id: u32,
    url: String,
    response: Option<Response>,
    body: Option<String>,
}

// Cranelift emits machine code equivalent to this:
extern "C" fn fetch_data_poll(state: *mut FetchDataState, waker: *const Waker) -> Poll<String> {
    unsafe {
        match (*state).state_id {
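            0 => {
                // Initial state: start http_get
                let fut = http_get((*state).url.clone());
                // Pseudocode gap: `fut` must be stored in the state struct (field not
                // shown) and polled on re-entry; here it is simply dropped
                (*state).state_id = 1;
                Poll::Pending
            }
            1 => {
                // After first await: `response` is assumed to have been filled in
                // when the http_get future completed
                if let Some(response) = (*state).response.take() {
                    let fut = read_body(response); // same gap: must be stored and polled
                    (*state).state_id = 2;
                    Poll::Pending
                } else {
                    Poll::Pending
                }
            }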
            2 => {
                // After second await
                if let Some(body) = (*state).body.take() {
                    Poll::Ready(body)
                } else {
                    Poll::Pending
                }
            }
            _ => unreachable!()
        }
    }
}

Advantages:

✅ Simple C ABI interface
✅ Runtime handles all complexity
✅ Cranelift only needs basic control flow
✅ State is just a struct (easy in Cranelift)
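
To make the shape concrete, here is a small self-contained sketch of such a poll function. It is an illustration under assumptions, not the generated code: all names (`TwoStepState`, `two_step_poll`, `Countdown`) are hypothetical, a nested future is modelled as a countdown that becomes ready after a fixed number of polls, the result is returned through an out-pointer because `Poll<String>` has no stable C layout, and the waker argument is left out to keep it short.

```rust
// Sketch only: every name here is hypothetical and stands in for what the
// compiler would emit.

#[repr(C)]
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum PollStatus {
    Ready = 0,
    Pending = 1,
}

/// Stand-in for a nested future such as `http_get(..)`: it reports
/// "not ready" `remaining` times, then yields `value`.
#[repr(C)]
pub struct Countdown {
    pub remaining: u32,
    pub value: u64,
}

impl Countdown {
    fn poll(&mut self) -> Option<u64> {
        if self.remaining == 0 {
            Some(self.value)
        } else {
            self.remaining -= 1;
            None
        }
    }
}

/// State struct: the discriminant, the sub-futures, and one saved local.
#[repr(C)]
pub struct TwoStepState {
    pub state_id: u32,
    pub first: Countdown,
    pub second: Countdown,
    pub response: u64,
}

pub extern "C" fn two_step_poll(state: *mut TwoStepState, out: *mut u64) -> PollStatus {
    // Safety: the caller must pass valid, exclusive pointers.
    let (s, out) = unsafe { (&mut *state, &mut *out) };
    loop {
        match s.state_id {
            // Await point 1: poll the first sub-future.
            0 => match s.first.poll() {
                Some(v) => {
                    s.response = v;
                    s.state_id = 1; // loop again to enter state 1 right away
                }
                None => return PollStatus::Pending,
            },
            // Await point 2: poll the second sub-future, then finish.
            1 => match s.second.poll() {
                Some(v) => {
                    *out = s.response + v;
                    s.state_id = 2;
                    return PollStatus::Ready;
                }
                None => return PollStatus::Pending,
            },
            // Polling a finished future is a caller bug.
            _ => panic!("polled after completion"),
        }
    }
}
```

The `loop` lets a state that completes immediately fall through to the next one without a round trip through the runtime. Unlike the pseudocode above, each sub-future lives inside the state struct, so it survives between calls.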

2. External Runtime via Shared Library

Create a minimal async runtime as a C library:

// zyntax_async_runtime.h

typedef enum {
    POLL_READY,
    POLL_PENDING
} PollStatus;

typedef struct {
    void* data;
    void (*wake)(void*);
} Waker;

typedef void* FutureState;
typedef PollStatus (*PollFn)(FutureState, Waker*);

// Runtime functions (implemented in Rust or C++)
void* zyntax_runtime_create(void);
void zyntax_runtime_destroy(void* runtime);
void* zyntax_runtime_spawn(void* runtime, FutureState state, PollFn poll);
void zyntax_runtime_block_on(void* runtime, void* task);
void zyntax_runtime_run(void* runtime);
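
On the Rust side, the header's types could be mirrored with `#[repr(C)]` so that both ends agree on layout. The following is a sketch under assumptions: the type names follow the header, while `count_wake`, `ready_on_second_poll`, and `drive` are hypothetical helpers that only exist to exercise the interface.

```rust
use std::ffi::c_void;
use std::sync::atomic::{AtomicUsize, Ordering};

#[repr(C)]
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum PollStatus {
    Ready = 0,   // POLL_READY
    Pending = 1, // POLL_PENDING
}

#[repr(C)]
pub struct Waker {
    pub data: *mut c_void,
    // Option<fn> has the same layout as a nullable C function pointer.
    pub wake: Option<extern "C" fn(*mut c_void)>,
}

pub type FutureState = *mut c_void;
pub type PollFn = extern "C" fn(FutureState, *mut Waker) -> PollStatus;

/// Hypothetical wake callback: `data` points at a counter that it bumps.
pub extern "C" fn count_wake(data: *mut c_void) {
    let counter = unsafe { &*(data as *const AtomicUsize) };
    counter.fetch_add(1, Ordering::SeqCst);
}

/// Hypothetical poll function: `state` points at a bool. The first call
/// requests a wake-up and returns Pending, later calls return Ready.
pub extern "C" fn ready_on_second_poll(state: FutureState, waker: *mut Waker) -> PollStatus {
    let polled_before = unsafe { &mut *(state as *mut bool) };
    if *polled_before {
        return PollStatus::Ready;
    }
    *polled_before = true;
    let w = unsafe { &*waker };
    if let Some(wake) = w.wake {
        wake(w.data);
    }
    PollStatus::Pending
}

/// Busy-polls until Ready and returns how many polls that took.
/// A real runtime would wait for the wake callback instead of spinning.
pub fn drive(poll: PollFn, state: FutureState, waker: &mut Waker) -> usize {
    let waker_ptr: *mut Waker = waker;
    let mut polls = 1;
    while poll(state, waker_ptr) == PollStatus::Pending {
        polls += 1;
    }
    polls
}
```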

Implementation Options:

Option A: Rust-based Runtime (Recommended)

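// zyntax-async-runtime crate (separate from compiler)
use std::ffi::c_void;
use tokio::runtime::Runtime;

// Rust mirrors of the C header types
#[repr(C)]
pub enum PollStatus {
    Ready,
    Pending,
}

#[repr(C)]
pub struct Waker {
    pub data: *mut c_void,
    pub wake: extern "C" fn(*mut c_void),
}

#[no_mangle]
pub extern "C" fn zyntax_runtime_create() -> *mut c_void {
    // Runtime::new() returns an io::Result; hand back null on failure
    // instead of panicking inside an extern "C" function.
    match Runtime::new() {
        Ok(runtime) => Box::into_raw(Box::new(runtime)) as *mut c_void,
        Err(_) => std::ptr::null_mut(),
    }
}

#[no_mangle]
pub extern "C" fn zyntax_runtime_spawn(
    runtime: *mut c_void,
    state: *mut c_void,
    poll_fn: extern "C" fn(*mut c_void, *const Waker) -> PollStatus,
) -> *mut c_void {
    // Wrap the poll function in a Rust Future
    // Spawn it on the Tokio runtime
    // Return task handle
    todo!("not yet implemented")
}

// etc.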
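
The "wrap the poll function in a Rust Future" step could look roughly like the adapter below. This is a sketch under assumptions, using only the standard library so it runs without Tokio: `FfiFuture`, `CWaker`, `wake_trampoline`, `block_on`, and `three_polls` are hypothetical names. With Tokio, the adapter would instead be handed to `Runtime::spawn`, which additionally requires the future to be `Send + 'static`.

```rust
use std::ffi::c_void;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

#[repr(C)]
#[derive(PartialEq, Clone, Copy)]
pub enum PollStatus { Ready = 0, Pending = 1 }

#[repr(C)]
pub struct CWaker {
    pub data: *mut c_void,
    pub wake: extern "C" fn(*mut c_void),
}

pub type PollFn = extern "C" fn(*mut c_void, *const CWaker) -> PollStatus;

/// Adapter: makes a (state, poll_fn) pair look like a Rust future.
pub struct FfiFuture {
    pub state: *mut c_void,
    pub poll_fn: PollFn,
}

// `data` points at the std Waker of the current poll call.
extern "C" fn wake_trampoline(data: *mut c_void) {
    let waker = unsafe { &*(data as *const Waker) };
    waker.wake_by_ref();
}

impl Future for FfiFuture {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        // Limitation of this sketch: the C waker is only valid during this
        // call. Waking later would need clone/drop callbacks as well.
        let c_waker = CWaker {
            data: cx.waker() as *const Waker as *mut c_void,
            wake: wake_trampoline,
        };
        match (self.poll_fn)(self.state, &c_waker) {
            PollStatus::Ready => Poll::Ready(()),
            PollStatus::Pending => Poll::Pending,
        }
    }
}

/// Minimal single-future executor: parks the thread until woken.
struct ThreadWaker(Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

pub fn block_on<F: Future + Unpin>(mut fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match Pin::new(&mut fut).poll(&mut cx) {
            Poll::Ready(v) => return v,
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical generated poll function: `state` points at a poll counter.
/// It asks to be polled again until the third call.
pub extern "C" fn three_polls(state: *mut c_void, waker: *const CWaker) -> PollStatus {
    let count = unsafe { &mut *(state as *mut u32) };
    *count += 1;
    if *count >= 3 {
        return PollStatus::Ready;
    }
    let w = unsafe { &*waker };
    (w.wake)(w.data);
    PollStatus::Pending
}
```

Because the waker pointer only lives for one call, JIT code that registers interest in I/O and returns would need a richer waker ABI than the two-field struct in the header.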

Option B: Minimal Custom Runtime (More control)

// Lightweight executor with epoll/kqueue
// No external dependencies
// ~500 lines of code
pub struct MinimalRuntime {
    ready_queue: VecDeque<TaskHandle>,
    io_poller: IoPoller, // epoll on Linux, kqueue on macOS
    tasks: HashMap<TaskId, Task>,
}
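
As a rough illustration of what the scheduling core of such a runtime involves, here is a sketch of the ready queue and task table only, with the I/O poller left out. The names (`MiniExecutor`, `TaskWaker`, `YieldOnce`) are hypothetical, and tasks are plain Rust futures rather than FFI poll functions to keep it short.

```rust
use std::collections::{HashMap, VecDeque};
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

type TaskId = u64;
type ReadyQueue = Arc<Mutex<VecDeque<TaskId>>>;

/// Waking a task just puts its id back on the ready queue.
struct TaskWaker {
    id: TaskId,
    queue: ReadyQueue,
}

impl Wake for TaskWaker {
    fn wake(self: Arc<Self>) {
        self.queue.lock().unwrap().push_back(self.id);
    }
}

pub struct MiniExecutor {
    ready_queue: ReadyQueue,
    tasks: HashMap<TaskId, Pin<Box<dyn Future<Output = ()>>>>,
    next_id: TaskId,
}

impl MiniExecutor {
    pub fn new() -> Self {
        Self {
            ready_queue: Arc::new(Mutex::new(VecDeque::new())),
            tasks: HashMap::new(),
            next_id: 0,
        }
    }

    pub fn spawn(&mut self, fut: impl Future<Output = ()> + 'static) -> TaskId {
        let id = self.next_id;
        self.next_id += 1;
        self.tasks.insert(id, Box::pin(fut));
        self.ready_queue.lock().unwrap().push_back(id);
        id
    }

    pub fn task_count(&self) -> usize {
        self.tasks.len()
    }

    /// Polls ready tasks until the queue drains; returns the number of polls.
    pub fn run(&mut self) -> usize {
        let mut polls = 0;
        loop {
            // The lock is released at the end of this statement, so a task
            // may wake itself while it is being polled.
            let next = self.ready_queue.lock().unwrap().pop_front();
            let id = match next {
                Some(id) => id,
                None => break,
            };
            let task = match self.tasks.get_mut(&id) {
                Some(task) => task,
                None => continue, // wake-up for a task that already finished
            };
            let waker = Waker::from(Arc::new(TaskWaker { id, queue: self.ready_queue.clone() }));
            let mut cx = Context::from_waker(&waker);
            polls += 1;
            if task.as_mut().poll(&mut cx).is_ready() {
                self.tasks.remove(&id);
            }
        }
        polls
    }
}

/// Demo future: returns Pending once (after waking itself), then Ready.
pub struct YieldOnce(pub bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}
```

A real runtime would block in the I/O poller when the queue is empty instead of returning from `run`.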

3. Cranelift Code Generation Strategy

The Cranelift backend needs to generate:

State Struct

fn generate_async_state_struct(&mut self, state_machine: &AsyncStateMachine) -> CompilerResult<()> {
    // Allocate struct with:
    // - u32 state_id (which await point we're at)
    // - Captured variables
    // - Intermediate results between awaits

    // This is straightforward - just struct allocation
}
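
Cranelift IR has no aggregate types, so in practice the "struct" is a block of memory plus a table of field offsets that the backend uses for loads and stores. A minimal sketch of that layout step, assuming C-like layout rules; `FieldLayout`, `StructLayout`, and `layout_state_struct` are hypothetical names, not existing compiler APIs:

```rust
#[derive(Debug, PartialEq)]
pub struct FieldLayout {
    pub name: String,
    pub offset: u32,
    pub size: u32,
}

#[derive(Debug, PartialEq)]
pub struct StructLayout {
    pub fields: Vec<FieldLayout>,
    pub size: u32,
    pub align: u32,
}

/// Rounds `offset` up to the next multiple of `align` (a power of two).
fn align_up(offset: u32, align: u32) -> u32 {
    (offset + align - 1) & !(align - 1)
}

/// Lays out fields in declaration order with C-like padding.
/// Each input entry is (name, size in bytes, alignment in bytes).
pub fn layout_state_struct(fields: &[(&str, u32, u32)]) -> StructLayout {
    let mut offset = 0;
    let mut max_align = 1;
    let mut out = Vec::new();
    for &(name, size, align) in fields {
        assert!(align.is_power_of_two(), "alignment must be a power of two");
        offset = align_up(offset, align);
        out.push(FieldLayout { name: name.to_string(), offset, size });
        offset += size;
        max_align = max_align.max(align);
    }
    StructLayout {
        fields: out,
        // Pad the tail so arrays of this struct stay aligned.
        size: align_up(offset, max_align),
        align: max_align,
    }
}
```

For a state with a `u32` state_id, a captured `u64`, and a one-byte flag, this places the fields at offsets 0, 8, and 16 with a total size of 24 bytes. The poll function would then read `state_id` with a 32-bit load at offset 0.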

Poll Function

fn generate_poll_function(&mut self, state_machine: &AsyncStateMachine) -> CompilerResult<Value> {
    // Generate a function with signature:
    // extern "C" fn(state: *mut State, waker: *const Waker) -> Poll<T>

    // Load state->state_id
    // Simple if-else chain or jump table (NOT switch statement)
    // Each state is a separate basic block
    // Call nested futures' poll functions

    // Much simpler than trying to do full state machine in HIR
}

Runtime FFI Calls

fn generate_spawn_call(&mut self, future: Value) -> CompilerResult<Value> {
    // Import zyntax_runtime_spawn
    // Pass state pointer and poll function pointer
    // Return task handle

    // Just a normal FFI call - Cranelift handles this well
}

Comparison of Approaches

Approach	Complexity	Performance	Integration	Maintainability
HIR State Machine	⚠️ Very High	⚠️ Medium	❌ Poor	❌ Hard
External Runtime + FFI	✅ Low	✅ High	✅ Excellent	✅ Easy
Built-in Reactor	❌ Extreme	✅ Highest	⚠️ Medium	❌ Very Hard

Why External Runtime Wins

1. Leverage Existing Work
   - Tokio is battle-tested
   - async-std is proven
   - Don't reinvent the wheel
2. Simpler Cranelift Integration
   - Just generate poll() functions
   - FFI calls are straightforward
   - No complex control flow needed
3. Flexibility
   - Users can choose runtime (Tokio/async-std/custom)
   - Can swap runtimes without recompiling
   - Testing is easier
4. Performance
   - FFI overhead is minimal (nanoseconds)
   - Runtime is optimized for async I/O
   - JIT code is still fast

Implementation Plan

Phase 1: Minimal Async Support (roughly 900 lines, per the breakdown below)

Goal: Get basic async/await working with external runtime

1. Create Runtime C API (150 lines)
   - Header file with basic runtime interface
   - Poll/Ready/Pending types
   - Task spawning
2. Build Minimal Rust Runtime (300 lines)
   - Tokio-based or custom
   - C API wrapper
   - Shared library (.so/.dylib/.dll)
3. Cranelift State Struct Generation (150 lines)
   - Allocate state struct
   - Store captured variables
   - Track current await point
4. Cranelift Poll Function Generation (200 lines)
   - Generate poll() function
   - Simple state dispatch (if-else or jump table)
   - Call nested futures
5. FFI Glue (100 lines)
   - Import runtime functions
   - Generate spawn/block_on calls
   - Handle task results

Phase 2: Enhanced Features (300-500 lines)

- Async Closures
- Async Iterators/Streams
- Cancellation Support
- Multiple Runtime Support (Tokio/async-std/custom)

Example: End-to-End Flow

// 1. Source code
async fn fetch_user(id: u64) -> User {
    let response = http_get(format!("/users/{}", id)).await;
    parse_json(response).await
}

// 2. Compiler generates HIR with async metadata
// AsyncStateMachine {
//     states: [
//         State0: { call http_get, await },
//         State1: { call parse_json, await },
//         State2: { return result }
//     ],
//     captures: [id: u64],
// }

// 3. Cranelift generates:
struct FetchUserState {
    state_id: u32,
    id: u64,
    response: Option<Response>,
}

extern "C" fn fetch_user_poll(state: *mut FetchUserState, waker: *const Waker) -> Poll<User> {
    // ... state machine logic
}

// 4. Runtime wraps this:
let state = Box::new(FetchUserState { state_id: 0, id: 123, response: None });
let task = runtime.spawn(state, fetch_user_poll);
let user = runtime.block_on(task);

Performance Considerations

FFI Overhead

Expected cost: roughly 5-10 ns per call (negligible)
Compared to: I/O latency (microseconds to milliseconds)
Verdict: Not a concern for async code

State Struct Allocation

Stack: For sync calling async (when possible)
Heap: For spawned tasks (required anyway)
Optimization: Arena allocator for task states
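
The arena idea can be sketched as a slab with a free list, so that slots of finished tasks are reused instead of going back to the global allocator. `TaskArena` is a hypothetical name. Note one assumption that matters for FFI: this version stores states in a `Vec`, which may move them when it grows, so handles are indices rather than pointers. State that JIT code addresses by raw pointer would need chunked storage that never moves.

```rust
/// Slab-style arena: O(1) insert and remove, slots are recycled.
pub struct TaskArena<T> {
    slots: Vec<Option<T>>,
    free: Vec<usize>, // indices of empty slots available for reuse
}

impl<T> TaskArena<T> {
    pub fn new() -> Self {
        Self { slots: Vec::new(), free: Vec::new() }
    }

    /// Stores `value` and returns its slot index.
    pub fn insert(&mut self, value: T) -> usize {
        match self.free.pop() {
            Some(i) => {
                self.slots[i] = Some(value);
                i
            }
            None => {
                self.slots.push(Some(value));
                self.slots.len() - 1
            }
        }
    }

    pub fn get_mut(&mut self, i: usize) -> Option<&mut T> {
        self.slots.get_mut(i).and_then(|slot| slot.as_mut())
    }

    /// Takes the value out and marks the slot as reusable.
    pub fn remove(&mut self, i: usize) -> Option<T> {
        let value = self.slots.get_mut(i)?.take();
        if value.is_some() {
            self.free.push(i);
        }
        value
    }

    /// Number of slots ever allocated (occupied or free).
    pub fn capacity(&self) -> usize {
        self.slots.len()
    }
}
```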

Poll Function Calls

Fast path: Inline poll when future is ready
Slow path: Runtime reschedules
Typical: 1-3 polls per I/O operation

Alternatives Considered

1. Stackful Coroutines (fibers)

Pro: Simpler state management
Con: Large memory overhead (stack per coroutine)
Con: Difficult to implement in JIT
Verdict: Not suitable for Cranelift

2. CPS (Continuation Passing Style) Transformation

Pro: Theoretically elegant
Con: Extremely complex for JIT
Con: Poor debuggability
Verdict: Academic exercise, not practical

3. Green Threads

Pro: Works well in some languages (Go, Erlang)
Con: Requires runtime scheduler in every binary
Con: Needs its own scheduler-integrated I/O layer (as Go's netpoller provides) instead of reusing an existing async runtime
Verdict: Wrong model for systems language

Recommendation

Use External Runtime + FFI Approach

Why?

✅ Simplest integration with Cranelift (just generate poll functions)
✅ Leverage existing runtimes (Tokio/async-std)
✅ Matches Rust's model (proven to work)
✅ Easy to test (runtime is separate)
✅ Flexible (swap runtimes without recompiling)

Implementation Effort

Minimal: roughly 900 lines for Phase 1 (sum of the Phase 1 breakdown)
Time: 2-3 weeks for an experienced developer
Risk: Low (proven approach)

Next Steps

1. Create zyntax-async-runtime crate with C API
2. Implement basic Tokio wrapper
3. Add Cranelift codegen for state structs
4. Generate poll() functions
5. Test with simple async/await examples
6. Document and optimize

Questions?

Q: Can we avoid the external dependency?
- A: Yes: build a minimal custom runtime (~500 lines), at the cost of ecosystem integration
Q: What about WASM?
- A: Same approach works - WASM has async proposals that match this model
Q: Performance vs hand-written?
- A: Expected to be within 1-5% of hand-written Rust async code, since FFI overhead is negligible next to I/O latency
Q: Can we do better than Tokio?
- A: For the general case, no. For specific workloads, a custom runtime might be 10-20% faster.

Decision

Recommended: External Runtime (Tokio-based) + FFI

This matches your experience that "async state was really hard" in Cranelift - we avoid the hard part by delegating to a proven runtime!
