
Add RSA/DH SP non-blocking support for C/Small 2048/3072/4096 #10394

Open
dgarske wants to merge 1 commit into wolfSSL:master from dgarske:sp_nonblock_rsa_dh


Member

@dgarske dgarske commented May 4, 2026

Summary

Extends WOLFSSL_SP_NONBLOCK (already covering ECC and Curve25519) to RSA and Diffie-Hellman so a bare-metal loop or async-driven TLS handshake never blocks for more than ~1 ms / 100 MHz on a single big-int op. Targets the WOLFSSL_SP_SMALL C32/C64 backend; assembly variants are unchanged.

The chunked state machine yields once per Montgomery op and once per inner bit-extract step (mirrors TFM's fp_exptmod_nb design). A 2048-bit RSA private op yields ~10240 times, i.e. about 5 yields per exponent bit (5 * 2048 = 10240) plus a handful of setup and finish steps.
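To make the yield-per-step idea concrete, here is a minimal sketch of a resumable square-and-multiply exponentiation. It is not the PR's code: the names (`modexp_nb_ctx`, `modexp_nb`, `NB_WOULDBLOCK`) are hypothetical, the operands are 32-bit, plain `%` arithmetic stands in for the Montgomery ops, and no attempt is made at constant-time behavior. The point is only that all progress lives in a caller-owned context, and each call does one small step before returning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes for this sketch (not wolfSSL's values). */
enum { NB_OK = 0, NB_WOULDBLOCK = 1 };

/* Hypothetical resumable context: all progress lives here, not on the stack. */
typedef struct {
    int      state;  /* 0 init, 1 next bit, 2 square, 3 multiply, 4 done */
    int      bit;    /* index of the next exponent bit to process */
    int      y;      /* current exponent bit */
    uint64_t r;      /* running result, always < m */
    uint32_t yields; /* how many times NB_WOULDBLOCK was returned */
} modexp_nb_ctx;

/* Computes b^e mod m (m != 0), one small step per call. */
static int modexp_nb(modexp_nb_ctx* ctx, uint32_t b, uint32_t e, uint32_t m,
                     uint32_t* out)
{
    assert(ctx != NULL && out != NULL && m != 0);
    switch (ctx->state) {
    case 0: /* INIT */
        ctx->r = 1 % m;
        ctx->bit = 31;
        ctx->state = 1;
        break;
    case 1: /* BIT_NEXT: peel one exponent bit, or finish */
        if (ctx->bit < 0) {
            *out = (uint32_t)ctx->r;
            ctx->state = 4;
            return NB_OK;
        }
        ctx->y = (int)((e >> ctx->bit) & 1u);
        ctx->bit--;
        ctx->state = 2;
        break;
    case 2: /* SQR */
        ctx->r = (ctx->r * ctx->r) % m;
        ctx->state = 3;
        break;
    case 3: /* MUL: multiply by b when the bit is set, by 1 otherwise */
        ctx->r = (ctx->r * (ctx->y ? (uint64_t)(b % m) : 1u)) % m;
        ctx->state = 1;
        break;
    default: /* already finished */
        return NB_OK;
    }
    ctx->yields++;
    return NB_WOULDBLOCK;
}

/* Blocking wrapper: a real caller would do other work between calls. */
static uint32_t modexp_blocking(uint32_t b, uint32_t e, uint32_t m,
                                uint32_t* yields)
{
    modexp_nb_ctx ctx = {0};
    uint32_t out = 0;
    while (modexp_nb(&ctx, b, e, m, &out) == NB_WOULDBLOCK) {
        /* yield point */
    }
    if (yields != NULL) {
        *yields = ctx.yields;
    }
    return out;
}
```

In this toy version each exponent bit costs three yields (bit extract, square, multiply), so a 32-bit exponent yields 1 + 3 * 32 = 97 times before the final call returns `NB_OK`. The same accounting, with more states per bit and a 2048-bit exponent, is what produces yield counts in the thousands.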

What's added

  • sp_RsaPublic_<bits>_nb and sp_RsaPrivate_<bits>_nb: D-only path (RSA_LOW_MEM / SP_RSA_PRIVATE_EXP_D); CRT private is unsupported in non-block mode (configure-time #error).
  • sp_DhExp_<bits>_nb and sp_ModExp_<bits>_nb — byte-buffer base for nb-friendly TLS use.
  • New WC_DH_NONBLOCK API: DhNb context + wc_DhSetNonBlock(DhKey*, DhNb*), parallel to WC_RSA_NONBLOCK / WC_ECC_NONBLOCK.
  • wc_RsaFunctionAsync / wc_DhAgree_Sync dispatch to the SP non-block state machine in their compute path. wc_AsyncSimulate already translates per-yield FP_WOULDBLOCK into WC_PENDING_E, so the TLS state machine drives completion via the standard async-event loop.
  • TLS layer per-SSL nb-context allocation in AllocKey / FreeKey and the TLS 1.3 keyShare paths, gated on ssl->devId != INVALID_DEVID (mirrors the ECC / Curve25519 hooks).
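The flow described in this list (attach a caller-owned context to a key, have the compute path yield, let a software shim turn each yield into a pending status, and re-enter from an event loop) can be sketched as follows. Everything here is a stand-in: the `sim_*` names, the struct layouts, and the numeric codes are hypothetical and are not wolfSSL's API or values. Only the overall shape is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in status codes for this sketch only. */
enum { SIM_OK = 0, SIM_WOULDBLOCK = -1, SIM_PENDING = -2, SIM_BAD_ARG = -3 };

/* Hypothetical caller-owned non-block context (plays the role of DhNb/RsaNb). */
typedef struct {
    int steps_left; /* chunks of work still to do */
} sim_nb_ctx;

/* Hypothetical key object; nb == NULL means "run blocking". */
typedef struct {
    int         value; /* result the operation produces */
    sim_nb_ctx* nb;
} sim_key;

/* Same shape as a SetNonBlock call: bind caller-owned state to a key. */
static int sim_set_nonblock(sim_key* key, sim_nb_ctx* nb)
{
    if (key == NULL) {
        return SIM_BAD_ARG;
    }
    key->nb = nb;
    return SIM_OK;
}

/* Compute path: one chunk per call when a context is attached. */
static int sim_compute(sim_key* key, int* out)
{
    assert(key != NULL && out != NULL);
    if (key->nb != NULL && key->nb->steps_left > 0) {
        key->nb->steps_left--;
        return SIM_WOULDBLOCK;
    }
    *out = key->value;
    return SIM_OK;
}

/* Software async shim: translate the low-level yield into "pending". */
static int sim_async_simulate(sim_key* key, int* out)
{
    int ret = sim_compute(key, out);
    if (ret == SIM_WOULDBLOCK) {
        ret = SIM_PENDING;
    }
    return ret;
}

/* Event loop: re-enter until no longer pending; returns the poll count. */
static int sim_drive(sim_key* key, int* out)
{
    int polls = 0;
    while (sim_async_simulate(key, out) == SIM_PENDING) {
        polls++; /* a real loop would service other work here */
    }
    return polls;
}
```

With no context attached, `sim_drive` completes on the first call; with a context holding five steps, it reports five pending polls before the result appears. The caller never sees the low-level would-block code, only the pending status.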

Configure

./configure --enable-rsa=nonblock --enable-dh=nonblock \
            --enable-ecc=nonblock --enable-curve25519=nonblock \
            --enable-sp=yes,nonblock \
            CPPFLAGS="-DWOLFSSL_PUBLIC_MP -DWOLFSSL_DEBUG_NONBLOCK -DRSA_LOW_MEM"

--enable-rsa=nonblock and --enable-dh=nonblock auto-enable --enable-asynccrypt and --enable-asynccrypt-sw, and define RSA_LOW_MEM. Build-time #error checks in rsa.h / dh.h enforce the required WOLFSSL_SP_SMALL, WOLFSSL_SP_NO_MALLOC, and !WOLFSSL_SP_FAST_MODEXP companion flags.
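As an illustration of what such build-time guards look like, here is a sketch using the macro names mentioned in this description. The exact conditions and messages in rsa.h / dh.h are not shown in this PR text, so the gating expression and the error strings below are assumptions; the `#define` lines at the top merely simulate what the configure options would set.

```c
/* Stand-ins for what the configure line would define (assumed for this sketch). */
#define WC_RSA_NONBLOCK
#define WOLFSSL_SP_NONBLOCK
#define WOLFSSL_SP_SMALL
#define WOLFSSL_SP_NO_MALLOC
#define RSA_LOW_MEM

#if defined(WC_RSA_NONBLOCK) && defined(WOLFSSL_SP_NONBLOCK)
    #if !defined(WOLFSSL_SP_SMALL)
        #error "RSA SP non-block requires WOLFSSL_SP_SMALL"
    #endif
    #if !defined(WOLFSSL_SP_NO_MALLOC)
        #error "RSA SP non-block requires WOLFSSL_SP_NO_MALLOC"
    #endif
    #if defined(WOLFSSL_SP_FAST_MODEXP)
        #error "RSA SP non-block is incompatible with WOLFSSL_SP_FAST_MODEXP"
    #endif
    #if !defined(RSA_LOW_MEM) && !defined(SP_RSA_PRIVATE_EXP_D)
        #error "RSA SP non-block supports the D-only private path, not CRT"
    #endif
#endif

/* Only reachable at compile time when every guard above passed. */
enum { RSA_NONBLOCK_CONFIG_OK = 1 };
```

Removing any one of the simulated defines (or adding `WOLFSSL_SP_FAST_MODEXP`) makes the translation unit fail to compile, which is the intent: a bad flag combination is caught at build time rather than at run time.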

Tests

  • wolfcrypt/test/testwolfcrypt: RSA non-block sign: 10249 times, verify 94 times, inline verify 94 times; DH non-block agree: 1 times (drives via wc_AsyncWait); both RSA test passed! and DH test passed!.
  • make check: 5 PASS / 3 SKIP / 0 FAIL.
  • TLS 1.2 handshake (TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) verified end-to-end with examples/server + examples/client; server's RSA private signature is driven through ~10437 SP non-block yields by the SW-shim translation in wc_AsyncSimulate.

CI

.github/workflows/os-check.yml extends the existing nonblock matrix entry to add --enable-rsa=nonblock --enable-dh=nonblock, plus a companion entry that forces -DSP_WORD_SIZE=32 so both sp_c32.c and sp_c64.c are exercised on every push.

@dgarske dgarske self-assigned this May 4, 2026
Copilot AI review requested due to automatic review settings May 4, 2026 22:58
Contributor

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Extends wolfCrypt’s non-blocking (*_NONBLOCK) support to RSA and Diffie-Hellman using the SP “small” backends, enabling async/bare-metal TLS handshakes to make incremental progress without long big-int stalls.

Changes:

  • Adds SP non-blocking RSA public/private and DH/modexp APIs for 2048/3072/4096 (C32/C64) and wires them into RSA/DH compute paths.
  • Introduces DH non-block API (DhNb, wc_DhSetNonBlock) and TLS-layer per-SSL nb-context allocation/freeing.
  • Updates build/config and CI matrix to exercise the new non-block configurations (including forced SP_WORD_SIZE=32).

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| wolfssl/wolfcrypt/sp_int.h | Adds fixed-size SP non-blocking context buffers for modexp/RSA/DH. |
| wolfssl/wolfcrypt/sp.h | Declares new SP non-blocking RSA/DH entry points. |
| wolfssl/wolfcrypt/rsa.h | Expands WC_RSA_NONBLOCK backend validation and extends RsaNb for SP context. |
| wolfssl/wolfcrypt/dh.h | Adds WC_DH_NONBLOCK validation, DhNb, and wc_DhSetNonBlock() API. |
| wolfcrypt/test/test.c | Adds DH non-block test path driven via wc_DhSetNonBlock() / async wait loop. |
| wolfcrypt/src/sp_c64.c | Implements SP small non-blocking modexp + RSA/DH wrappers for 2048/3072/4096 (C64). |
| wolfcrypt/src/sp_c32.c | Implements SP small non-blocking modexp + RSA/DH wrappers for 2048/3072/4096 (C32). |
| wolfcrypt/src/rsa.c | Dispatches RSA non-block compute through SP non-block state machines when available. |
| wolfcrypt/src/dh.c | Adds DH non-block context binding and routes agree() through SP non-block wrapper. |
| src/tls.c | Allocates/binds DhNb contexts for TLS 1.3 key share paths when async SW is active. |
| src/internal.c | Allocates/frees per-SSL RSA/DH nb contexts alongside existing key allocation flows. |
| configure.ac | Adds --enable-rsa=nonblock / --enable-dh=nonblock and auto-enables async SW shim. |
| .github/workflows/os-check.yml | Extends CI matrix to build/test RSA/DH non-block (including SP_WORD_SIZE=32). |
Comments suppressed due to low confidence (2)

wolfcrypt/src/sp_c64.c:1

  • The BIT_INIT logic can read the wrong exponent limb (and potentially out-of-bounds) when bits is an exact multiple of the limb bit-size (e.g., bits == 61 makes i == 1 and reads e[1] even though the top bits are in e[0]). A safer derivation is to base indexing on bits - 1 (e.g., i = (bits - 1) / WORD_BITS, c = (bits - 1) % WORD_BITS + 1) or explicitly handle c == 0 by decrementing i and setting c = WORD_BITS before loading e[i]. The same pattern appears in the other *_mod_exp_*_nb implementations (sp_c64.c/sp_c32.c) and should be updated consistently.

wolfcrypt/src/sp_c64.c:1

  • Casting &ctx->mod_exp_ctx (a concrete state struct) to sp_modexp_ctx_t* is undefined behavior in C because the pointed-to object is not actually an sp_modexp_ctx_t (even if the first byte address matches). This is especially risky under optimization/strict-aliasing rules. Prefer storing a real sp_modexp_ctx_t in the wrapper and placing the state struct inside its data buffer, or change sp_*_mod_exp_*_nb to accept a void*/byte-buffer and cast directly to the concrete ctx type without dereferencing as sp_modexp_ctx_t.
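The limb-index concern in the first suppressed comment can be checked with a few lines of arithmetic. The sketch below is not the PR's code: `top_limb_naive` is a simplified stand-in for the pattern the comment describes, `top_limb_safe` is the `bits - 1` derivation the comment proposes, and `WORD_BITS` is set to 61 only because that is the figure used in the comment's example.

```c
#include <assert.h>
#include <stddef.h>

#define WORD_BITS 61 /* bits per limb in this sketch, per the review example */

/* Simplified stand-in for the flagged pattern: index and bit count of the
 * top limb taken straight from bits. Wrong when bits % WORD_BITS == 0. */
static void top_limb_naive(int bits, int* i, int* c)
{
    assert(i != NULL && c != NULL);
    *i = bits / WORD_BITS;
    *c = bits % WORD_BITS;
}

/* Derivation based on bits - 1, as suggested in the review comment.
 * For bits >= 1 this always gives 1 <= c <= WORD_BITS and a valid index. */
static void top_limb_safe(int bits, int* i, int* c)
{
    assert(i != NULL && c != NULL && bits >= 1);
    *i = (bits - 1) / WORD_BITS;
    *c = (bits - 1) % WORD_BITS + 1;
}
```

For bits == 61 the naive form gives i == 1, c == 0 (one limb past the data), while the safe form gives i == 0, c == 61. For lengths that are not a multiple of the limb size, such as 2048, both agree (i == 33, c == 35), which is why the problem only shows up at exact multiples.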


Comment thread wolfcrypt/src/dh.c Outdated
Comment thread configure.ac Outdated
Comment thread wolfcrypt/src/rsa.c Outdated
@dgarske dgarske force-pushed the sp_nonblock_rsa_dh branch from df805fd to 2309476 Compare May 4, 2026 23:22
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 14 comments.



Comment thread wolfcrypt/src/sp_c64.c
Comment on lines +2197 to +2201
ctx->n = e[ctx->i--] << (61 - ctx->c);
ctx->state = 5;
break;
case 5: /* BIT_NEXT: refill on word boundary, peel one exponent bit */
if (ctx->c == 0) {
Comment thread wolfcrypt/src/sp_c64.c
Comment on lines +5961 to +5965
break;
}
ctx->n = e[ctx->i--];
ctx->c = 57;
}
Comment thread wolfcrypt/src/sp_c64.c
Comment on lines +9031 to +9035
ctx->c = 60;
}
ctx->y = (byte)((ctx->n >> 59) & 1);
ctx->n <<= 1;
ctx->state = 6;
Comment thread wolfcrypt/src/sp_c64.c
Comment on lines +12944 to +12948
sp_3072_mont_mul_54(ctx->t[ctx->y ^ 1], ctx->t[0], ctx->t[1],
m, ctx->mp);
ctx->state = 7;
break;
case 7: /* COPY_OUT: constant-time copy &t[y] -> t[2] */
Comment thread wolfcrypt/src/sp_c64.c
Comment on lines +16051 to +16055
break;
case 7: /* COPY_OUT: constant-time copy &t[y] -> t[2] */
XMEMCPY(ctx->t[2], (void*)(((size_t)ctx->t[0] & addr_mask[ctx->y ^ 1]) +
((size_t)ctx->t[1] & addr_mask[ctx->y])),
sizeof(sp_digit) * 70 * 2);
Comment thread wolfcrypt/src/sp_c32.c
Comment on lines +14824 to +14828
sp_4096_mont_mul_142(ctx->t[ctx->y ^ 1], ctx->t[0], ctx->t[1],
m, ctx->mp);
ctx->state = 7;
break;
case 7: /* COPY_OUT: constant-time copy &t[y] -> t[2] */
Comment thread wolfcrypt/src/sp_c32.c
Comment on lines +19014 to +19018
sizeof(sp_digit) * 162 * 2);
ctx->state = 8;
break;
case 8: /* SQR: t[2] = t[2]^2 in Montgomery form */
sp_4096_mont_sqr_162(ctx->t[2], ctx->t[2], m, ctx->mp);
Comment thread wolfcrypt/src/dh.c
Comment on lines 2061 to 2065
/* Always validate peer public key (2 <= y <= p-2) per SP 800-56A */
if (wc_DhCheckPubKey(key, otherPub, pubSz) != 0) {
WOLFSSL_MSG("wc_DhAgree wc_DhCheckPubKey failed");
return DH_CHECK_PUB_E;
}
Comment thread wolfcrypt/src/rsa.c
Comment on lines +3210 to +3214
* is available, drive the chunked state machine here. wc_AsyncSimulate
* (line "if (ret == MP_WOULDBLOCK) ret = WC_PENDING_E;" at the bottom
* of the SW switch in wolfcrypt/src/async.c) translates per-yield
* MP_WOULDBLOCK into WC_PENDING_E so the TLS / async event loop can
* drive the operation to completion. */
Comment thread wolfcrypt/src/dh.c
Comment on lines +2358 to +2361
/* Async marker takes precedence: when wc_AsyncSimulate re-enters the
* compute path, wc_DhAgree_Async dispatches to the SP nonblock wrapper
* if key->nb is attached, and per-yield MP_WOULDBLOCK is translated to
* WC_PENDING_E by wc_AsyncSimulate so the TLS event loop drives it. */

github-actions Bot commented May 5, 2026

MemBrowse Memory Report

No memory changes detected for:
