Skip to content

Commit a304ee9

Browse files
committed
Manual edits
1 parent 90ba281 commit a304ee9

1 file changed

Lines changed: 85 additions & 68 deletions

File tree

researchers/soundness.md

Lines changed: 85 additions & 68 deletions
Original file line numberDiff line numberDiff line change
@@ -6,13 +6,12 @@ nav_order: 3
66
---
77

88
# Soundness
9-
10-
## Summary
9+
{: .no_toc }
1110

1211
ProofFrog has no formal soundness proof. Each transformation in the canonicalization
13-
pipeline is written by hand and intended to be semantics-preserving, but correctness has
14-
not been mechanically verified. Treat ProofFrog as a *proof-finding aid*, not a
15-
*proof certifier*.
12+
pipeline is written by hand (and possibly with the assistance of coding agents) and
13+
intended to be semantics-preserving, but correctness and soundness have not been
14+
formally verified.
1615

1716
When ProofFrog validates a game hop, that is evidence the hop is correct -- not a
1817
machine-checked certificate. Serious use requires additional external checking: manual
@@ -21,9 +20,15 @@ preference for small hops that can be individually inspected. A user treating Pr
2120
as a rubber stamp is misusing the tool. The validation means the engine did not find a
2221
counterexample using its current pipeline; it does not mean no counterexample exists.
2322

23+
It is a future goal to link ProofFrog with existing formal verification tools, such as
24+
EasyCrypt, to improve assurance.
25+
26+
- TOC
27+
{:toc}
28+
2429
---
2530

26-
## The claim
31+
## ProofFrog's checking methodology
2732

2833
ProofFrog's engine attempts to verify three kinds of proof steps.
2934

@@ -34,10 +39,10 @@ adversaries, the probability distribution over adversary outputs is identical. F
3439
Pr[A interacting with Game1 outputs 1] = Pr[A interacting with Game2 outputs 1]
3540
```
3641

37-
This definition is stated precisely on the [Execution Model]({% link
42+
Adversary interaction with games is described in more detail on the [Execution Model]({% link
3843
manual/language-reference/execution-model.md %}) page. ProofFrog attempts to verify
3944
interchangeability by canonicalizing both games and comparing their canonical forms. The
40-
canonicalization pipeline is a deterministic, hand-written rewrite sequence: inlining,
45+
canonicalization pipeline is a deterministic rewrite sequence: inlining,
4146
algebraic simplification, dead code elimination, sampling normalization, and others. If
4247
the canonical forms are structurally identical (up to variable renaming), the engine
4348
reports the hop valid. When the canonical forms differ only in the conditions of `if`
@@ -57,22 +62,23 @@ proof.
5762

5863
---
5964

60-
## What is in the trust base
65+
## The trust base
6166

6267
Every component listed below can harbor a bug that causes the engine to validate an
6368
invalid hop.
6469

6570
**The parser** (`proof_frog/frog_parser.py` and the ANTLR grammars under
6671
`proof_frog/parsing/`). A parser bug could miscategorize a construct and feed the wrong
67-
AST into the engine. The grammars are not formally specified; their correctness relative
68-
to the intended language semantics is not proved.
72+
AST into the engine. The grammars are not formally specified beyond the ANTLR grammar
73+
files; their correctness relative to the intended language semantics is not proved.
6974

7075
**The type checker and semantic analysis** (`proof_frog/semantic_analysis.py`). A
71-
semantic-analysis bug could allow a malformed program through, or could annotate a well-
72-
formed program with incorrect type information that downstream transforms rely on.
76+
semantic-analysis bug could allow a malformed program through, or could annotate a
77+
well-formed program with incorrect type information that downstream transforms rely on.
7378

7479
**The transformation pipeline** (all of `proof_frog/transforms/`). Each transform is a
75-
hand-written Python function intended to be semantics-preserving. This is the largest
80+
Python function, written by hand or with the help of coding agents, that is intended to
81+
be semantics-preserving. This is the largest
7682
component of the trust base and the most likely source of soundness failures. A transform
7783
that is almost correct -- one that is semantics-preserving in 999 out of 1000 inputs and
7884
wrong on the 1000th -- could validate an incorrect hop without any diagnostic signal. The
@@ -92,26 +98,20 @@ is an external dependency.
9298
**The Python runtime.** Interpreter bugs, floating-point behavior, dictionary ordering
9399
semantics, and similar runtime properties are all implicitly trusted.
94100

95-
Be honest about the size of this trust base. EasyCrypt and CryptoVerif also have trust
96-
bases, but those tools have pen-and-paper formalizations of their program logics, with
97-
meta-theoretic soundness arguments for those logics. ProofFrog has neither a
98-
formalization nor a meta-theoretic soundness argument. The published ProofFrog eprint
99-
explicitly notes that it "does not provide any formal proofs of correctness for the
100-
transformations ProofFrog uses or for the correctness of the engine's implementation."
101-
102101
---
103102

104103
## What is NOT claimed
105104

106-
The following are things a reader might reasonably assume that ProofFrog does not claim.
105+
The following are things that a user seeking machine-checked proofs of cryptographic
106+
arguments might hope for, but ProofFrog does not claim to provide.
107107

108108
**Soundness of individual transforms.** No individual transform in
109109
`proof_frog/transforms/` is proved correct in isolation or in composition. The transforms
110110
are tested, not verified.
111111

112112
**Completeness of the engine.** Some interchangeable games will fail to canonicalize to
113113
the same form. The engine is incomplete: it cannot find all valid interchangeability
114-
relationships. Failures are capability limitations, not soundness issues -- the engine
114+
relationships. Such failures are capability limitations, not soundness issues -- the engine
115115
does not accept invalid hops just because it cannot verify valid ones. See the
116116
[Limitations]({% link manual/limitations.md %}) page for a catalogue of known gaps.
117117

@@ -137,66 +137,83 @@ not formally established.
137137

138138
---
139139

140-
## Mitigations a careful user can apply
141-
142-
These practices reduce the trust load without eliminating it.
143-
144-
**Keep hops small.** The smaller a hop, the fewer transforms fire and the easier it is
145-
to manually inspect what changed. A hop that inlines one function and cancels one XOR is
146-
straightforwardly checkable by hand. A hop that fires fifteen transforms across a complex
147-
game body is not. When in doubt, split a large hop into two or three smaller ones.
148-
149-
**Inspect canonical forms directly.** The `step-detail` and `canonicalization-trace`
150-
CLI commands expose the canonical form of a specific game step and the sequence of
151-
intermediate rewrites the pipeline applied. `step-after-transform` shows the game AST
152-
after all transforms up to a named pass. Reviewing the canonical form before and after
153-
a hop gives you direct evidence of what the engine is claiming. (`proof_frog prove -v`
154-
adds game-level output to a proof run; `-vv` adds per-transform tracing.)
140+
## Writing and interpreting ProofFrog proofs in light of lack of formal soundness guarantees
141+
142+
Because ProofFrog lacks a soundness proof and has a large trust base, validation using
143+
ProofFrog is evidence but not a guarantee, and responsibility for
144+
believing a proof sits with the people writing and reading it. The practices below don't
145+
eliminate that responsibility, but they give a proof author and a proof reviewer concrete
146+
ways to form a judgement.
147+
148+
**When writing a proof, keep hops small.** The smaller a hop, the fewer transforms fire
149+
and the easier it is to manually inspect what changed. When a hop applies a large number
150+
of transforms, it may be helpful to split it into a few smaller hops -- both for your
151+
own assurance while constructing the proof and to give later readers a sequence of claims
152+
they can each individually check.
153+
154+
**When writing a proof, explicit intermediate games help understanding.** Writing
155+
out each intermediate game explicitly, rather than relying on ProofFrog to accept a large
156+
implicit hop, forces you to state precisely what you think the intermediate game is.
157+
That statement then becomes a piece of the proof a reader -- or future you -- can check
158+
independently, without having to trust that the engine's implicit reasoning matched your
159+
own.
160+
161+
**When reviewing a proof, inspect canonical forms directly.** When you are unsure
162+
whether the engine is validating a hop for the reason you think it is, the `step-detail`
163+
and `canonicalization-trace` CLI commands (or the hop inspector in the web editor)
164+
expose the canonical form of a specific game
165+
step and the sequence of intermediate rewrites the pipeline applied. `step-after-transform`
166+
shows the game AST after all transforms up to a named pass. Reviewing the canonical form
167+
before and after a hop gives you direct evidence of what the engine is claiming, which
168+
you can then reconcile with what the proof author intended. (`proof_frog prove -v` adds
169+
game-level output to a proof run; `-vv` adds per-transform tracing.)
170+
171+
**Cross-check against pen-and-paper arguments.** If a textbook or a previously
172+
published pen-and-paper proof agrees with what ProofFrog accepts, the two checks
173+
reinforce each other: an automated check and a manual one, each covering errors the
174+
other might miss. If they disagree, one of them is wrong and the divergence itself is
175+
valuable information. When reviewing someone else's ProofFrog proof, try to reconstruct
176+
the argument on paper for at least the hops you find most load-bearing.
177+
178+
**Consider validating tricky ProofFrog hops in another tool.** If a ProofFrog proof
179+
introduces a bespoke assumption to bridge a tricky hop, or leaves the hop unverified,
180+
it may be possible to formalize that specific hop in another tool, such as EasyCrypt.
155181

156-
**Cross-check against pen-and-paper proofs.** If a textbook says the proof works and
157-
ProofFrog says the proof works, the two checks reinforce each other. If they disagree,
158-
one of them is wrong and you need to determine which. The agreement of two independent
159-
checks -- one automated, one manual -- is stronger evidence than either alone.
182+
---
160183

161-
**Prefer named intermediate games over implicit games.** Writing out each intermediate
162-
game explicitly, rather than relying on ProofFrog to accept a large implicit hop, forces
163-
you to state precisely what you think the intermediate game is. That statement is then
164-
independently checkable.
184+
## Soundness issues
165185

166-
**Report suspicious validations.** If you suspect the engine has accepted an incorrect
167-
hop -- for example, if ProofFrog validates a hop that you believe is mathematically
168-
invalid -- file an issue at
169-
[https://github.com/ProofFrog/ProofFrog/issues](https://github.com/ProofFrog/ProofFrog/issues).
186+
**Report suspicious validations.** If while writing or reviewing a proof you suspect
187+
the engine has accepted an incorrect hop -- for example, if ProofFrog validates a hop
188+
that you believe is mathematically invalid -- file an issue on the
189+
[ProofFrog issue tracker](https://github.com/ProofFrog/ProofFrog/issues) and apply the
190+
`soundness` label.
170191
Include the proof file, the specific hop, and your analysis of why you think the hop is
171192
wrong. A validated hop that is actually invalid is a soundness bug and should be treated
172193
as high priority.
173194

195+
Issues suspected of being soundness concerns are tagged with the
196+
[`soundness`](https://github.com/ProofFrog/ProofFrog/issues?q=label%3Asoundness) label
197+
on the issue tracker. These are distinct from
198+
[capability limitations]({% link manual/limitations.md %}), where the engine
199+
correctly rejects a valid hop because its current pipeline cannot show the hop is in
200+
fact valid.
201+
174202
---
175203

176-
## Comparison framing
204+
## Comparison with other tools
177205

178-
EasyCrypt and CryptoVerif have deeper trust bases: their program logics and tactic
206+
Other more established formal verification tools for cryptography like EasyCrypt and
207+
CryptoVerif have stronger trust bases: their program logics and tactic
179208
languages have pen-and-paper formalizations with meta-theoretic soundness arguments for
180-
those logics. ProofFrog makes no such claims. This is a genuine gap,
181-
not a rhetorical understatement. For high-assurance cryptographic work -- standards,
209+
those logics. ProofFrog lacks such soundness arguments. For high-assurance cryptographic work -- standards,
182210
deployed protocols, production code -- the more established tools remain the appropriate
183-
choice. ProofFrog's niche is earlier in the pipeline: exploration, education, and
211+
choice. ProofFrog's may be suitable for more preliminary work: exploration, education, and
184212
iterative proof development where the ease of writing and checking a game-hopping proof
185213
is worth the weaker soundness guarantee.
186214

187215
One concrete direction that could narrow this gap is an export functionality that encodes
188216
ProofFrog's automated transformations into the syntax of a more established engine such
189217
as EasyCrypt, so that individual hops could be discharged by a tool with a stronger
190-
logical foundation. This is identified as future work in the published paper but is not
191-
yet implemented.
192-
193-
---
194-
195-
## Known soundness issues
196-
197-
There is currently no dedicated `soundness` label on the issue tracker. If you file an
198-
issue that you believe is a soundness concern -- as opposed to a capability limitation
199-
where the engine correctly rejects a valid hop -- please mention it explicitly in the
200-
issue body and we will tag it accordingly. The distinction matters: a capability failure
201-
is expected and documented; a soundness failure means the engine has accepted something
202-
it should not have, which is a qualitatively different problem.
218+
logical foundation. While this is not yet implemented, we hope to explore this direction
219+
in the future.

0 commit comments

Comments
 (0)