Problem: Implementing async/await in Cranelift JIT is complex due to state machine management.
Key Question: Should we build the reactor/executor in Cranelift, or call external runtime via syscalls?
The current async_support.rs attempts to:
- Transform async functions into HIR state machines with switch statements
- Compile these directly to Cranelift
Issues with this approach:
- ❌ Cranelift has limited control flow (switch statements are awkward)
- ❌ State management is complex in JIT context
- ❌ No clear way to suspend/resume execution
- ❌ Waker/poll mechanism needs runtime support
- ❌ Hard to integrate with existing async ecosystems (Tokio, async-std)
Based on your experience, here's a better approach:
┌─────────────────────────────────────────┐
│ Zyntax Async Function (Source) │
│ async fn fetch() -> String │
└──────────────┬──────────────────────────┘
│ Compile ↓
┌──────────────┴──────────────────────────┐
│ HIR Transformation │
│ 1. Extract await points │
│ 2. Create state struct │
│ 3. Generate poll() function │
└──────────────┬──────────────────────────┘
│ Codegen ↓
┌──────────────┴──────────────────────────┐
│ Cranelift JIT Code │
│ - State struct (in memory) │
│ - poll(state*) -> Poll<T> │
│ - FFI calls to runtime │
└──────────────┬──────────────────────────┘
│ Runtime ↓
┌──────────────┴──────────────────────────┐
│ External Async Runtime (C/Rust) │
│ - Tokio, async-std, or custom │
│ - Executor/reactor │
│ - Waker mechanism │
│ - I/O polling (epoll/kqueue) │
└─────────────────────────────────────────┘
Instead of complex switch-based state machines:
// Source
async fn fetch_data(url: String) -> String {
let response = http_get(url).await; // Await point 1
let body = read_body(response).await; // Await point 2
body
}
// Compile to C-compatible poll function
struct FetchDataState {
state_id: u32,
url: String,
response: Option<Response>,
body: Option<String>,
}
// Cranelift generates this:
extern "C" fn fetch_data_poll(state: *mut FetchDataState, waker: *const Waker) -> Poll<String> {
unsafe {
match (*state).state_id {
0 => {
// Initial state: start http_get
let fut = http_get((*state).url.clone());
// Store future somehow... or just call poll immediately
(*state).state_id = 1;
Poll::Pending
}
1 => {
// After first await
if let Some(response) = (*state).response.take() {
let fut = read_body(response);
(*state).state_id = 2;
Poll::Pending
} else {
Poll::Pending
}
}
2 => {
// After second await
if let Some(body) = (*state).body.take() {
Poll::Ready(body)
} else {
Poll::Pending
}
}
_ => unreachable!()
}
}
}Advantages:
- ✅ Simple C ABI interface
- ✅ Runtime handles all complexity
- ✅ Cranelift only needs basic control flow
- ✅ State is just a struct (easy in Cranelift)
Create a minimal async runtime as a C library:
// zyntax_async_runtime.h
typedef enum {
POLL_READY,
POLL_PENDING
} PollStatus;
typedef struct {
void* data;
void (*wake)(void*);
} Waker;
typedef void* FutureState;
typedef PollStatus (*PollFn)(FutureState, Waker*);
// Runtime functions (implemented in Rust or C++)
void* zyntax_runtime_create(void);
void zyntax_runtime_destroy(void* runtime);
void* zyntax_runtime_spawn(void* runtime, FutureState state, PollFn poll);
void zyntax_runtime_block_on(void* runtime, void* task);
void zyntax_runtime_run(void* runtime);Implementation Options:
Option A: Rust-based Runtime (Recommended)
// zyntax-async-runtime crate (separate from compiler)
use tokio::runtime::Runtime;
use std::ffi::c_void;
use std::ptr;
#[no_mangle]
pub extern "C" fn zyntax_runtime_create() -> *mut c_void {
let runtime = Box::new(Runtime::new().unwrap());
Box::into_raw(runtime) as *mut c_void
}
#[no_mangle]
pub extern "C" fn zyntax_runtime_spawn(
runtime: *mut c_void,
state: *mut c_void,
poll_fn: extern "C" fn(*mut c_void, *const Waker) -> PollStatus
) -> *mut c_void {
// Wrap the poll function in a Rust Future
// Spawn it on the Tokio runtime
// Return task handle
}
// etc.Option B: Minimal Custom Runtime (More control)
// Lightweight executor with epoll/kqueue
// No external dependencies
// ~500 lines of code
pub struct MinimalRuntime {
ready_queue: VecDeque<TaskHandle>,
io_poller: IoPoller, // epoll on Linux, kqueue on macOS
tasks: HashMap<TaskId, Task>,
}The Cranelift backend needs to generate:
-
State Struct
fn generate_async_state_struct(&mut self, state_machine: &AsyncStateMachine) -> CompilerResult<()> { // Allocate struct with: // - u32 state_id (which await point we're at) // - Captured variables // - Intermediate results between awaits // This is straightforward - just struct allocation }
-
Poll Function
fn generate_poll_function(&mut self, state_machine: &AsyncStateMachine) -> CompilerResult<Value> { // Generate a function with signature: // extern "C" fn(state: *mut State, waker: *const Waker) -> Poll<T> // Load state->state_id // Simple if-else chain or jump table (NOT switch statement) // Each state is a separate basic block // Call nested futures' poll functions // Much simpler than trying to do full state machine in HIR }
-
Runtime FFI Calls
fn generate_spawn_call(&mut self, future: Value) -> CompilerResult<Value> { // Import zyntax_runtime_spawn // Pass state pointer and poll function pointer // Return task handle // Just a normal FFI call - Cranelift handles this well }
| Approach | Complexity | Performance | Integration | Maintainability |
|---|---|---|---|---|
| HIR State Machine | ❌ Poor | ❌ Hard | ||
| External Runtime + FFI | ✅ Low | ✅ High | ✅ Excellent | ✅ Easy |
| Built-in Reactor | ❌ Extreme | ✅ Highest | ❌ Very Hard |
-
Leverage Existing Work
- Tokio is battle-tested
- async-std is proven
- Don't reinvent the wheel
-
Simpler Cranelift Integration
- Just generate poll() functions
- FFI calls are straightforward
- No complex control flow needed
-
Flexibility
- Users can choose runtime (Tokio/async-std/custom)
- Can swap runtimes without recompiling
- Testing is easier
-
Performance
- FFI overhead is minimal (nanoseconds)
- Runtime is optimized for async I/O
- JIT code is still fast
Goal: Get basic async/await working with external runtime
-
Create Runtime C API (150 lines)
- Header file with basic runtime interface
- Poll/Ready/Pending types
- Task spawning
-
Build Minimal Rust Runtime (300 lines)
- Tokio-based or custom
- C API wrapper
- Shared library (.so/.dylib/.dll)
-
Cranelift State Struct Generation (150 lines)
- Allocate state struct
- Store captured variables
- Track current await point
-
Cranelift Poll Function Generation (200 lines)
- Generate poll() function
- Simple state dispatch (if-else or jump table)
- Call nested futures
-
FFI Glue (100 lines)
- Import runtime functions
- Generate spawn/block_on calls
- Handle task results
- Async Closures
- Async Iterators/Streams
- Cancellation Support
- Multiple Runtime Support (Tokio/async-std/custom)
// 1. Source code
async fn fetch_user(id: u64) -> User {
let response = http_get(format!("/users/{}", id)).await;
parse_json(response).await
}
// 2. Compiler generates HIR with async metadata
// AsyncStateMachine {
// states: [
// State0: { call http_get, await },
// State1: { call parse_json, await },
// State2: { return result }
// ],
// captures: [id: u64],
// }
// 3. Cranelift generates:
struct FetchUserState {
state_id: u32,
id: u64,
response: Option<Response>,
}
extern "C" fn fetch_user_poll(state: *mut FetchUserState, waker: *const Waker) -> Poll<User> {
// ... state machine logic
}
// 4. Runtime wraps this:
let state = Box::new(FetchUserState { state_id: 0, id: 123, response: None });
let task = runtime.spawn(state, fetch_user_poll);
let user = runtime.block_on(task);- Cost: ~5-10ns per call (negligible)
- Compared to: I/O latency (microseconds to milliseconds)
- Verdict: Not a concern for async code
- Stack: For sync calling async (when possible)
- Heap: For spawned tasks (required anyway)
- Optimization: Arena allocator for task states
- Fast path: Inline poll when future is ready
- Slow path: Runtime reschedules
- Typical: 1-3 polls per I/O operation
- Pro: Simpler state management
- Con: Large memory overhead (stack per coroutine)
- Con: Difficult to implement in JIT
- Verdict: Not suitable for Cranelift
- Pro: Theoretically elegant
- Con: Extremely complex for JIT
- Con: Poor debuggability
- Verdict: Academic exercise, not practical
- Pro: Works well in some languages (Go, Erlang)
- Con: Requires runtime scheduler in every binary
- Con: Doesn't integrate with OS async I/O
- Verdict: Wrong model for systems language
Use External Runtime + FFI Approach
- ✅ Simplest integration with Cranelift (just generate poll functions)
- ✅ Leverage existing runtimes (Tokio/async-std)
- ✅ Matches Rust's model (proven to work)
- ✅ Easy to test (runtime is separate)
- ✅ Flexible (swap runtimes without recompiling)
- Minimal: 500-800 lines
- Time: 2-3 weeks for experienced developer
- Risk: Low (proven approach)
- Create
zyntax-async-runtimecrate with C API - Implement basic Tokio wrapper
- Add Cranelift codegen for state structs
- Generate poll() functions
- Test with simple async/await examples
- Document and optimize
-
Q: Can we avoid the external dependency?
- A: Yes, build minimal custom runtime (~500 lines), but loses ecosystem integration
-
Q: What about WASM?
- A: Same approach works - WASM has async proposals that match this model
-
Q: Performance vs hand-written?
- A: Within 1-5% of hand-written Rust async code (FFI overhead is negligible)
-
Q: Can we do better than Tokio?
- A: For general case, no. For specific workloads, custom runtime could be 10-20% faster.
Recommended: External Runtime (Tokio-based) + FFI
This matches your experience that "async state was really hard" in Cranelift - we avoid the hard part by delegating to a proven runtime!