@@ -6,13 +6,12 @@ nav_order: 3
66---
77
88# Soundness
9-
10- ## Summary
9+ {: .no_toc }
1110
1211ProofFrog has no formal soundness proof. Each transformation in the canonicalization
13- pipeline is written by hand and intended to be semantics-preserving, but correctness has
14- not been mechanically verified. Treat ProofFrog as a * proof-finding aid * , not a
15- * proof certifier * .
12+ pipeline is written by hand ( and possibly with the assistance of coding agents) and
13+ intended to be semantics-preserving, but correctness and soundness have not been
14+ formally verified.
1615
1716When ProofFrog validates a game hop, that is evidence the hop is correct -- not a
1817machine-checked certificate. Serious use requires additional external checking: manual
@@ -21,9 +20,15 @@ preference for small hops that can be individually inspected. A user treating Pr
2120as a rubber stamp is misusing the tool. The validation means the engine did not find a
2221counterexample using its current pipeline; it does not mean no counterexample exists.
2322
23+ It is a future goal to link ProofFrog with existing formal verification tools, such as
24+ EasyCrypt, to improve assurance.
25+
26+ - TOC
27+ {: toc }
28+
2429---
2530
26- ## The claim
31+ ## ProofFrog's checking methodology
2732
2833ProofFrog's engine attempts to verify three kinds of proof steps.
2934
@@ -34,10 +39,10 @@ adversaries, the probability distribution over adversary outputs is identical. F
3439Pr[A interacting with Game1 outputs 1] = Pr[A interacting with Game2 outputs 1]
3540```
3641
37- This definition is stated precisely on the [ Execution Model] ({% link
42+ Adversary interaction with games is described in more detail on the [ Execution Model] ({% link
3843manual/language-reference/execution-model.md %}) page. ProofFrog attempts to verify
3944interchangeability by canonicalizing both games and comparing their canonical forms. The
40- canonicalization pipeline is a deterministic, hand-written rewrite sequence: inlining,
45+ canonicalization pipeline is a deterministic rewrite sequence: inlining,
4146algebraic simplification, dead code elimination, sampling normalization, and others. If
4247the canonical forms are structurally identical (up to variable renaming), the engine
4348reports the hop valid. When the canonical forms differ only in the conditions of ` if `
@@ -57,22 +62,23 @@ proof.
5762
5863---
5964
60- ## What is in the trust base
65+ ## The trust base
6166
6267Every component listed below can harbor a bug that causes the engine to validate an
6368invalid hop.
6469
6570** The parser** (` proof_frog/frog_parser.py ` and the ANTLR grammars under
6671` proof_frog/parsing/ ` ). A parser bug could miscategorize a construct and feed the wrong
67- AST into the engine. The grammars are not formally specified; their correctness relative
68- to the intended language semantics is not proved.
72+ AST into the engine. The grammars are not formally specified beyond the ANTLR grammar
73+ files; their correctness relative to the intended language semantics is not proved.
6974
7075** The type checker and semantic analysis** (` proof_frog/semantic_analysis.py ` ). A
71- semantic-analysis bug could allow a malformed program through, or could annotate a well-
72- formed program with incorrect type information that downstream transforms rely on.
76+ semantic-analysis bug could allow a malformed program through, or could annotate a
77+ well- formed program with incorrect type information that downstream transforms rely on.
7378
7479** The transformation pipeline** (all of ` proof_frog/transforms/ ` ). Each transform is a
75- hand-written Python function intended to be semantics-preserving. This is the largest
80+ Python function, written by hand or with the help of coding agents, that is intended to
81+ be semantics-preserving. This is the largest
7682component of the trust base and the most likely source of soundness failures. A transform
7783that is almost correct -- one that is semantics-preserving in 999 out of 1000 inputs and
7884wrong on the 1000th -- could validate an incorrect hop without any diagnostic signal. The
@@ -92,26 +98,20 @@ is an external dependency.
9298** The Python runtime.** Interpreter bugs, floating-point behavior, dictionary ordering
9399semantics, and similar runtime properties are all implicitly trusted.
94100
95- Be honest about the size of this trust base. EasyCrypt and CryptoVerif also have trust
96- bases, but those tools have pen-and-paper formalizations of their program logics, with
97- meta-theoretic soundness arguments for those logics. ProofFrog has neither a
98- formalization nor a meta-theoretic soundness argument. The published ProofFrog eprint
99- explicitly notes that it "does not provide any formal proofs of correctness for the
100- transformations ProofFrog uses or for the correctness of the engine's implementation."
101-
102101---
103102
104103## What is NOT claimed
105104
106- The following are things a reader might reasonably assume that ProofFrog does not claim.
105+ The following are things that a user seeking machine-checked proofs of cryptographic
106+ arguments might hope for, but ProofFrog does not claim to provide.
107107
108108** Soundness of individual transforms.** No individual transform in
109109` proof_frog/transforms/ ` is proved correct in isolation or in composition. The transforms
110110are tested, not verified.
111111
112112** Completeness of the engine.** Some interchangeable games will fail to canonicalize to
113113the same form. The engine is incomplete: it cannot find all valid interchangeability
114- relationships. Failures are capability limitations, not soundness issues -- the engine
114+ relationships. Such failures are capability limitations, not soundness issues -- the engine
115115does not accept invalid hops just because it cannot verify valid ones. See the
116116[ Limitations] ({% link manual/limitations.md %}) page for a catalogue of known gaps.
117117
@@ -137,66 +137,83 @@ not formally established.
137137
138138---
139139
140- ## Mitigations a careful user can apply
141-
142- These practices reduce the trust load without eliminating it.
143-
144- ** Keep hops small.** The smaller a hop, the fewer transforms fire and the easier it is
145- to manually inspect what changed. A hop that inlines one function and cancels one XOR is
146- straightforwardly checkable by hand. A hop that fires fifteen transforms across a complex
147- game body is not. When in doubt, split a large hop into two or three smaller ones.
148-
149- ** Inspect canonical forms directly.** The ` step-detail ` and ` canonicalization-trace `
150- CLI commands expose the canonical form of a specific game step and the sequence of
151- intermediate rewrites the pipeline applied. ` step-after-transform ` shows the game AST
152- after all transforms up to a named pass. Reviewing the canonical form before and after
153- a hop gives you direct evidence of what the engine is claiming. (` proof_frog prove -v `
154- adds game-level output to a proof run; ` -vv ` adds per-transform tracing.)
140+ ## Writing and interpreting ProofFrog proofs in light of lack of formal soundness guarantees
141+
142+ Because ProofFrog lacks a soundness proof and has a large trust base, validation using
143+ ProofFrog is evidence but not a guarantee, and responsibility for
144+ believing a proof sits with the people writing and reading it. The practices below don't
145+ eliminate that responsibility, but they give a proof author and a proof reviewer concrete
146+ ways to form a judgement.
147+
148+ ** When writing a proof, keep hops small.** The smaller a hop, the fewer transforms fire
149+ and the easier it is to manually inspect what changed. When a hop applies a large number
150+ of transforms, it may be helpful to split it into a few smaller hops -- both for your
151+ own assurance while constructing the proof and to give later readers a sequence of claims
152+ they can each individually check.
153+
154+ ** When writing a proof, explicit intermediate games help understanding.** Writing
155+ out each intermediate game explicitly, rather than relying on ProofFrog to accept a large
156+ implicit hop, forces you to state precisely what you think the intermediate game is.
157+ That statement then becomes a piece of the proof a reader -- or future you -- can check
158+ independently, without having to trust that the engine's implicit reasoning matched your
159+ own.
160+
161+ ** When reviewing a proof, inspect canonical forms directly.** When you are unsure
162+ whether the engine is validating a hop for the reason you think it is, the ` step-detail `
163+ and ` canonicalization-trace ` CLI commands (or the hop inspector in the web editor)
164+ expose the canonical form of a specific game
165+ step and the sequence of intermediate rewrites the pipeline applied. ` step-after-transform `
166+ shows the game AST after all transforms up to a named pass. Reviewing the canonical form
167+ before and after a hop gives you direct evidence of what the engine is claiming, which
168+ you can then reconcile with what the proof author intended. (` proof_frog prove -v ` adds
169+ game-level output to a proof run; ` -vv ` adds per-transform tracing.)
170+
171+ ** Cross-check against pen-and-paper arguments.** If a textbook or a previously
172+ published pen-and-paper proof agrees with what ProofFrog accepts, the two checks
173+ reinforce each other: an automated check and a manual one, each covering errors the
174+ other might miss. If they disagree, one of them is wrong and the divergence itself is
175+ valuable information. When reviewing someone else's ProofFrog proof, try to reconstruct
176+ the argument on paper for at least the hops you find most load-bearing.
177+
178+ ** Consider validating tricky ProofFrog hops in another tool.** If a ProofFrog proof
179+ introduces a bespoke assumption to bridge a tricky hop, or leaves the hop unverified,
180+ it may be possible to formalize that specific hop in another tool, such as EasyCrypt.
155181
156- ** Cross-check against pen-and-paper proofs.** If a textbook says the proof works and
157- ProofFrog says the proof works, the two checks reinforce each other. If they disagree,
158- one of them is wrong and you need to determine which. The agreement of two independent
159- checks -- one automated, one manual -- is stronger evidence than either alone.
182+ ---
160183
161- ** Prefer named intermediate games over implicit games.** Writing out each intermediate
162- game explicitly, rather than relying on ProofFrog to accept a large implicit hop, forces
163- you to state precisely what you think the intermediate game is. That statement is then
164- independently checkable.
184+ ## Soundness issues
165185
166- ** Report suspicious validations.** If you suspect the engine has accepted an incorrect
167- hop -- for example, if ProofFrog validates a hop that you believe is mathematically
168- invalid -- file an issue at
169- [ https://github.com/ProofFrog/ProofFrog/issues ] ( https://github.com/ProofFrog/ProofFrog/issues ) .
186+ ** Report suspicious validations.** If while writing or reviewing a proof you suspect
187+ the engine has accepted an incorrect hop -- for example, if ProofFrog validates a hop
188+ that you believe is mathematically invalid -- file an issue on the
189+ [ ProofFrog issue tracker] ( https://github.com/ProofFrog/ProofFrog/issues ) and apply the
190+ ` soundness ` label.
170191Include the proof file, the specific hop, and your analysis of why you think the hop is
171192wrong. A validated hop that is actually invalid is a soundness bug and should be treated
172193as high priority.
173194
195+ Issues suspected of being soundness concerns are tagged with the
196+ [ ` soundness ` ] ( https://github.com/ProofFrog/ProofFrog/issues?q=label%3Asoundness ) label
197+ on the issue tracker. These are distinct from
198+ [ capability limitations] ({% link manual/limitations.md %}), where the engine
199+ correctly rejects a valid hop because its current pipeline cannot show the hop is in
200+ fact valid.
201+
174202---
175203
176- ## Comparison framing
204+ ## Comparison with other tools
177205
178- EasyCrypt and CryptoVerif have deeper trust bases: their program logics and tactic
206+ Other more established formal verification tools for cryptography like EasyCrypt and
207+ CryptoVerif have stronger trust bases: their program logics and tactic
179208languages have pen-and-paper formalizations with meta-theoretic soundness arguments for
180- those logics. ProofFrog makes no such claims. This is a genuine gap,
181- not a rhetorical understatement. For high-assurance cryptographic work -- standards,
209+ those logics. ProofFrog lacks such soundness arguments. For high-assurance cryptographic work -- standards,
182210deployed protocols, production code -- the more established tools remain the appropriate
183- choice. ProofFrog's niche is earlier in the pipeline : exploration, education, and
211+ choice. ProofFrog's may be suitable for more preliminary work : exploration, education, and
184212iterative proof development where the ease of writing and checking a game-hopping proof
185213is worth the weaker soundness guarantee.
186214
187215One concrete direction that could narrow this gap is an export functionality that encodes
188216ProofFrog's automated transformations into the syntax of a more established engine such
189217as EasyCrypt, so that individual hops could be discharged by a tool with a stronger
190- logical foundation. This is identified as future work in the published paper but is not
191- yet implemented.
192-
193- ---
194-
195- ## Known soundness issues
196-
197- There is currently no dedicated ` soundness ` label on the issue tracker. If you file an
198- issue that you believe is a soundness concern -- as opposed to a capability limitation
199- where the engine correctly rejects a valid hop -- please mention it explicitly in the
200- issue body and we will tag it accordingly. The distinction matters: a capability failure
201- is expected and documented; a soundness failure means the engine has accepted something
202- it should not have, which is a qualitatively different problem.
218+ logical foundation. While this is not yet implemented, we hope to explore this direction
219+ in the future.
0 commit comments