Manual edits

dstebila · dstebila · commit a304ee9ea292 · 2026-04-11T14:26:39.000-04:00
diff --git a/researchers/soundness.md b/researchers/soundness.md
@@ -6,13 +6,12 @@ nav_order: 3
 ---
 
 # Soundness
-
-## Summary
+{: .no_toc }
 
 ProofFrog has no formal soundness proof. Each transformation in the canonicalization
-pipeline is written by hand and intended to be semantics-preserving, but correctness has
-not been mechanically verified. Treat ProofFrog as a *proof-finding aid*, not a
-*proof certifier*.
+pipeline is written by hand (and possibly with the assistance of coding agents) and is
+intended to be semantics-preserving, but its correctness and soundness have not been
+formally verified.
 
 When ProofFrog validates a game hop, that is evidence the hop is correct -- not a
 machine-checked certificate. Serious use requires additional external checking: manual
@@ -21,9 +20,15 @@ preference for small hops that can be individually inspected. A user treating Pr
 as a rubber stamp is misusing the tool. The validation means the engine did not find a
 counterexample using its current pipeline; it does not mean no counterexample exists.
 
+It is a future goal to link ProofFrog with existing formal verification tools, such as
+EasyCrypt, to improve assurance.
+
+- TOC
+{:toc}
+
 ---
 
-## The claim
+## ProofFrog's checking methodology
 
 ProofFrog's engine attempts to verify three kinds of proof steps.
 
@@ -34,10 +39,10 @@ adversaries, the probability distribution over adversary outputs is identical. F
 Pr[A interacting with Game1 outputs 1] = Pr[A interacting with Game2 outputs 1]
 ```
 
-This definition is stated precisely on the [Execution Model]({% link
+Adversary interaction with games is described in more detail on the [Execution Model]({% link
 manual/language-reference/execution-model.md %}) page. ProofFrog attempts to verify
 interchangeability by canonicalizing both games and comparing their canonical forms. The
-canonicalization pipeline is a deterministic, hand-written rewrite sequence: inlining,
+canonicalization pipeline is a deterministic rewrite sequence: inlining,
 algebraic simplification, dead code elimination, sampling normalization, and others. If
 the canonical forms are structurally identical (up to variable renaming), the engine
 reports the hop valid. When the canonical forms differ only in the conditions of `if`
@@ -57,22 +62,23 @@ proof.
 
 ---
 
-## What is in the trust base
+## The trust base
 
 Every component listed below can harbor a bug that causes the engine to validate an
 invalid hop.
 
 **The parser** (`proof_frog/frog_parser.py` and the ANTLR grammars under
 `proof_frog/parsing/`). A parser bug could miscategorize a construct and feed the wrong
-AST into the engine. The grammars are not formally specified; their correctness relative
-to the intended language semantics is not proved.
+AST into the engine. The grammars are not formally specified beyond the ANTLR grammar 
+files; their correctness relative to the intended language semantics is not proved.
 
 **The type checker and semantic analysis** (`proof_frog/semantic_analysis.py`). A
-semantic-analysis bug could allow a malformed program through, or could annotate a well-
-formed program with incorrect type information that downstream transforms rely on.
+semantic-analysis bug could allow a malformed program through, or could annotate a 
+well-formed program with incorrect type information that downstream transforms rely on.
 
 **The transformation pipeline** (all of `proof_frog/transforms/`). Each transform is a
-hand-written Python function intended to be semantics-preserving. This is the largest
+Python function, written by hand or with the help of coding agents, that is intended to 
+be semantics-preserving. This is the largest
 component of the trust base and the most likely source of soundness failures. A transform
 that is almost correct -- one that is semantics-preserving in 999 out of 1000 inputs and
 wrong on the 1000th -- could validate an incorrect hop without any diagnostic signal. The
@@ -92,26 +98,20 @@ is an external dependency.
 **The Python runtime.** Interpreter bugs, floating-point behavior, dictionary ordering
 semantics, and similar runtime properties are all implicitly trusted.
 
-Be honest about the size of this trust base. EasyCrypt and CryptoVerif also have trust
-bases, but those tools have pen-and-paper formalizations of their program logics, with
-meta-theoretic soundness arguments for those logics. ProofFrog has neither a
-formalization nor a meta-theoretic soundness argument. The published ProofFrog eprint
-explicitly notes that it "does not provide any formal proofs of correctness for the
-transformations ProofFrog uses or for the correctness of the engine's implementation."
-
 ---
 
 ## What is NOT claimed
 
-The following are things a reader might reasonably assume that ProofFrog does not claim.
+The following are things that a user seeking machine-checked proofs of cryptographic
+arguments might hope for but that ProofFrog does not claim to provide.
 
 **Soundness of individual transforms.** No individual transform in
 `proof_frog/transforms/` is proved correct in isolation or in composition. The transforms
 are tested, not verified.
 
 **Completeness of the engine.** Some interchangeable games will fail to canonicalize to
 the same form. The engine is incomplete: it cannot find all valid interchangeability
-relationships. Failures are capability limitations, not soundness issues -- the engine
+relationships. Such failures are capability limitations, not soundness issues -- the engine
 does not accept invalid hops just because it cannot verify valid ones. See the
 [Limitations]({% link manual/limitations.md %}) page for a catalogue of known gaps.
 
@@ -137,66 +137,83 @@ not formally established.
 
 ---
 
-## Mitigations a careful user can apply
-
-These practices reduce the trust load without eliminating it.
-
-**Keep hops small.** The smaller a hop, the fewer transforms fire and the easier it is
-to manually inspect what changed. A hop that inlines one function and cancels one XOR is
-straightforwardly checkable by hand. A hop that fires fifteen transforms across a complex
-game body is not. When in doubt, split a large hop into two or three smaller ones.
-
-**Inspect canonical forms directly.** The `step-detail` and `canonicalization-trace`
-CLI commands expose the canonical form of a specific game step and the sequence of
-intermediate rewrites the pipeline applied. `step-after-transform` shows the game AST
-after all transforms up to a named pass. Reviewing the canonical form before and after
-a hop gives you direct evidence of what the engine is claiming. (`proof_frog prove -v`
-adds game-level output to a proof run; `-vv` adds per-transform tracing.)
+## Writing and interpreting ProofFrog proofs in light of the lack of formal soundness guarantees
+
+Because ProofFrog lacks a soundness proof and has a large trust base, validation using 
+ProofFrog is evidence but not a guarantee, and responsibility for
+believing a proof sits with the people writing and reading it. The practices below don't
+eliminate that responsibility, but they give a proof author and a proof reviewer concrete
+ways to form a judgement.
+
+**When writing a proof, keep hops small.** The smaller a hop, the fewer transforms fire
+and the easier it is to manually inspect what changed. When a hop applies a large number
+of transforms, it may be helpful to split it into a few smaller hops -- both for your 
+own assurance while constructing the proof and to give later readers a sequence of claims
+they can each individually check.
+
+**When writing a proof, make intermediate games explicit.** Writing
+out each intermediate game explicitly, rather than relying on ProofFrog to accept a large
+implicit hop, forces you to state precisely what you think the intermediate game is.
+That statement then becomes a piece of the proof a reader -- or future you -- can check
+independently, without having to trust that the engine's implicit reasoning matched your
+own.
+
+**When reviewing a proof, inspect canonical forms directly.** When you are unsure
+whether the engine is validating a hop for the reason you think it is, the `step-detail`
+and `canonicalization-trace` CLI commands (or the hop inspector in the web editor) 
+expose the canonical form of a specific game
+step and the sequence of intermediate rewrites the pipeline applied. `step-after-transform`
+shows the game AST after all transforms up to a named pass. Reviewing the canonical form
+before and after a hop gives you direct evidence of what the engine is claiming, which
+you can then reconcile with what the proof author intended. (`proof_frog prove -v` adds
+game-level output to a proof run; `-vv` adds per-transform tracing.)
+
+**Cross-check against pen-and-paper arguments.** If a textbook or a previously
+published pen-and-paper proof agrees with what ProofFrog accepts, the two checks
+reinforce each other: an automated check and a manual one, each covering errors the
+other might miss. If they disagree, one of them is wrong and the divergence itself is
+valuable information. When reviewing someone else's ProofFrog proof, try to reconstruct
+the argument on paper for at least the hops you find most load-bearing.
+
+**Consider validating tricky ProofFrog hops in another tool.** If a ProofFrog proof
+introduces a bespoke assumption to bridge a tricky hop, or leaves the hop unverified,
+it may be possible to formalize that specific hop in another tool, such as EasyCrypt.
 
-**Cross-check against pen-and-paper proofs.** If a textbook says the proof works and
-ProofFrog says the proof works, the two checks reinforce each other. If they disagree,
-one of them is wrong and you need to determine which. The agreement of two independent
-checks -- one automated, one manual -- is stronger evidence than either alone.
+---
 
-**Prefer named intermediate games over implicit games.** Writing out each intermediate
-game explicitly, rather than relying on ProofFrog to accept a large implicit hop, forces
-you to state precisely what you think the intermediate game is. That statement is then
-independently checkable.
+## Soundness issues
 
-**Report suspicious validations.** If you suspect the engine has accepted an incorrect
-hop -- for example, if ProofFrog validates a hop that you believe is mathematically
-invalid -- file an issue at
-[https://github.com/ProofFrog/ProofFrog/issues](https://github.com/ProofFrog/ProofFrog/issues).
+**Report suspicious validations.** If, while writing or reviewing a proof, you suspect
+the engine has accepted an incorrect hop -- for example, if ProofFrog validates a hop
+that you believe is mathematically invalid -- file an issue on the
+[ProofFrog issue tracker](https://github.com/ProofFrog/ProofFrog/issues) and apply the
+`soundness` label.
 Include the proof file, the specific hop, and your analysis of why you think the hop is
 wrong. A validated hop that is actually invalid is a soundness bug and should be treated
 as high priority.
 
+Issues suspected of being soundness concerns are tagged with the
+[`soundness`](https://github.com/ProofFrog/ProofFrog/issues?q=label%3Asoundness) label
+on the issue tracker. These are distinct from 
+[capability limitations]({% link manual/limitations.md %}), where the engine
+correctly rejects a valid hop because its current pipeline cannot show the hop is in
+fact valid.
+
 ---
 
-## Comparison framing
+## Comparison with other tools
 
-EasyCrypt and CryptoVerif have deeper trust bases: their program logics and tactic
+Other, more established formal verification tools for cryptography, such as EasyCrypt and
+CryptoVerif, have stronger trust bases: their program logics and tactic
 languages have pen-and-paper formalizations with meta-theoretic soundness arguments for
-those logics. ProofFrog makes no such claims. This is a genuine gap,
-not a rhetorical understatement. For high-assurance cryptographic work -- standards,
+those logics. ProofFrog lacks such soundness arguments. For high-assurance cryptographic work -- standards,
 deployed protocols, production code -- the more established tools remain the appropriate
-choice. ProofFrog's niche is earlier in the pipeline: exploration, education, and
+choice. ProofFrog may be suitable for more preliminary work: exploration, education, and
 iterative proof development where the ease of writing and checking a game-hopping proof
 is worth the weaker soundness guarantee.
 
 One concrete direction that could narrow this gap is an export functionality that encodes
 ProofFrog's automated transformations into the syntax of a more established engine such
 as EasyCrypt, so that individual hops could be discharged by a tool with a stronger
-logical foundation. This is identified as future work in the published paper but is not
-yet implemented.
-
----
-
-## Known soundness issues
-
-There is currently no dedicated `soundness` label on the issue tracker. If you file an
-issue that you believe is a soundness concern -- as opposed to a capability limitation
-where the engine correctly rejects a valid hop -- please mention it explicitly in the
-issue body and we will tag it accordingly. The distinction matters: a capability failure
-is expected and documented; a soundness failure means the engine has accepted something
-it should not have, which is a qualitatively different problem.
+logical foundation. While this is not yet implemented, we hope to explore this direction
+in the future.