Single entry for onboarding, local CI without just, reuse, review, verification boundaries, Verso, schema migrations, and Gate 7 releases. High-level rules: CONTRIBUTING.md. Pipeline stages and publication hooks: pipeline-extension-points.md. Maintainer operations: maintainers.md.
- Public alpha and repository state
- Running from a release
- Expectations
- Contribution model
- Domain policy
- Role-specific tracks
- One-time setup
- Local CI checklist (green before merge)
- Adding a paper and a claim
- When validation or benchmarks fail
- Reviewer guide
- Verification boundary
- Reusing Scientific Memory
- Verso integration (optional)
- Schema versioning and migration notes
- Release integrity (Gate 7)
- Summary
This section describes what "public alpha" means and how the repository is set up today.
- The repository is public. Eight papers are in `corpus/index.json` (six formalized core slices plus two hard-dimension stress scaffolds; domains and narrative in README). Per-paper manifest facts (machine-checked declaration counts, `build_hash_version`, dependency-graph edge counts) are generated in docs/status/repo-snapshot.md (`just repo-snapshot`); do not rely on hand-maintained counts in this bullet. Adsorption kernels carry Hypothesis-based property tests plus pytest numeric witnesses in CI.
- Contributors can add a claim, run the pipeline, and pass CI. All seven CI gates are in place (Lean build; schema validation including normalization, `extraction_run` for papers with claims, and kernel/theorem-card referential integrity; provenance; coverage; portal build; benchmark regression with proof-success snapshot, runtime budgets, minimum thresholds under `tasks`, and `tasks_ceiling` (e.g. source-span alignment error rate on `tasks.gold`); release integrity with checksums and Sigstore keyless signing). Validation is centralized in `sm_pipeline.validate.gate_engine`; use `validate-all --report-json` for a machine-readable report after a successful run. Test counts: run `just test` or `uv run pytest --collect-only -q` from the repo root (the workspace includes `pipeline/tests` and `kernels/adsorption/tests`). Ruff runs on the pipeline. Run `just doctor` if setup fails; use `just lake-build` or `just lake-build-verbose LOG=file.log` for Lean diagnostics.
- Release and tagging are the single source of truth for "what is released." No separate artifact tarball is required for the alpha; cloning at a release tag is the canonical way to obtain a reproducible snapshot. Gate 7 is documented under Release integrity (Gate 7) in this playbook.
- Maintainers preparing to go public should follow maintainers.md.
From the repo root:
- Clone and bootstrap:

  ```
  git clone <repo-url>
  cd scientific-memory
  just bootstrap
  ```

- Build:

  ```
  just build
  ```

  This runs the Lean build, portal build, and pipeline/kernel tests. Ensure all pass before changing corpus or formal code.
- Run the portal locally:

  ```
  just portal
  ```

  Open the URL shown (e.g. http://localhost:3000) to browse papers, claims, theorem cards, kernels, the coverage dashboard, search, and the diff page (baseline vs current corpus). Run `just export-diff-baseline` before a release to refresh the diff baseline. Run `just metrics` for derived metrics (median intake, dependency, symbol conflict, proof completion, axiom count, research-value, source-span alignment, normalization visibility, assumption-suggestions, dimension-visibility, dimension-suggestions). Optional: `just check-tooling` (pandoc), `just extract-from-source <paper_id>` (output includes macro_context from preamble), `just build-verso`, `just mcp-server` (tools: list_declarations_for_paper, list_declarations_in_file, get_dependency_graph_for_declaration; requires `uv sync --extra mcp`).

- Contributing: Use Adding a paper and a claim below. See CONTRIBUTING.md and Reviewer guide for workflow and review expectations.
To use a specific released version:
- Clone at tag: `git clone --branch v0.1.0 <repo-url>` (or, after clone, `git checkout v0.1.0`). Then run `just bootstrap` and `just build` as above.
- Release tags trigger a workflow that publishes a GitHub Release with changelog, checksums, Sigstore signature files, and `release-bundle.zip` (full `dist/`). Cloning at the tag remains the primary developer path; release assets support deterministic verification (see Release integrity (Gate 7)).
- The first success criterion is one full paper artifact (claims, formalization, theorem cards, manifest, portal). Platform completeness is secondary.
- Schema and corpus changes must follow Schema versioning and migration notes and the rules in SPEC; no merge without passing CI.
Contributions flow through:
- Paper admission: Add a paper directory under corpus/papers with metadata and (initially empty or stub) claims/assumptions/symbols.
- Extraction: Run extraction pipelines to populate claims, assumptions, symbols from source-adjacent inputs.
- Formalization: Add Lean declarations under formal/ScientificMemory, linked via mapping.json and theorem cards.
- Kernels: Add or extend executable kernels with declared contracts and linked theorem cards.
- Validation: All changes must pass the unified gate sequence (`sm_pipeline.validate.gate_engine`): schema validation; normalization (unique IDs, resolved edges; unresolved link IDs live on `linked_*_unresolved`); provenance; graph integrity (theorem-card and kernel linkage); coverage; the extraction_run requirement (papers with claims must have `extraction_run.json`); claim status and disputed-`review_notes` rules; a migration doc check when schemas change; and non-blocking warnings (snapshot baseline quality; optional dependency-graph bootstrap hints for multi-card papers with empty `dependency_ids`; optional invalid LLM/suggestion sidecar schemas). Optional machine-readable output: `validate-all --report-json <path>`. See trust-boundary-and-extraction.md and testing/trust-hardening-e2e-scenarios.md.
- Publish: Manifests and portal data are generated from corpus and formal artifacts only. Portal export uses `portal_read_model.build_portal_bundle` (`just export-portal-data`). Regenerate status/repo-snapshot.md after corpus changes: `just repo-snapshot`.
Optional: run `just metrics` for derived metrics (includes normalization-policy report, reviewer_report, assumption-suggestions, dimension-visibility, dimension-suggestions); optional policy file `benchmarks/normalization_policy.json`; `just check-paper-blueprint <paper_id>`; `just export-diff-baseline` (or `sm-pipeline export-diff-baseline --baseline-id <id> --title ... --narrative ...`); `just check-tooling` (pandoc); `just extract-from-source <paper_id>`; `just build-verso`; `just mcp-server` (requires `uv sync --extra mcp`); proof-repair proposals via `sm-pipeline proof-repair-proposals -o path` (human-review only); human-gated `proof-repair-apply` after review (formal tree only; never in CI); bulk skeleton admission via `sm-pipeline batch-admit` (see paper-intake); stage-shaped automation via `sm_pipeline.pipeline_orchestrator` for SPEC 8.x steps where applicable.
No hand-authored portal truth; no public theorem card without a linked claim ID.
Domains are used for taxonomy and routing. Each paper has a single domain in metadata (the schema enum includes chemistry, mathematics, physics, probability, control, quantum_information, other). Namespaces under formal/ScientificMemory are organized by domain (e.g. Chemistry.Adsorption, Mathematics.SumEvens). Cross-domain references are allowed but should be explicit in provenance and docs. When you introduce a new domain value, extend schemas/paper.schema.json and the pipeline Paper model, add a migration note under Schema versioning and migration notes, and extend this section if the policy needs documenting.
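As a sketch of what the schema gate enforces for the domain field (function and field names here are illustrative; schemas/paper.schema.json and the pipeline Paper model are authoritative):

```python
# Hedged sketch: minimal domain validation mirroring the enum described
# above. The real check lives in the JSON schema / Pydantic model, not here.
ALLOWED_DOMAINS = {
    "chemistry", "mathematics", "physics", "probability",
    "control", "quantum_information", "other",
}

def check_domain(metadata: dict) -> list[str]:
    """Return a list of error strings; an empty list means the domain is valid."""
    domain = metadata.get("domain")
    if domain is None:
        return ["metadata is missing the required 'domain' field"]
    if domain not in ALLOWED_DOMAINS:
        return [f"unknown domain {domain!r}; extend paper.schema.json first"]
    return []
```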
Follow a role-specific path for a focused workflow and gate checklist:
| Role | Playbook | Use when |
|---|---|---|
| Formalizer | playbooks/formalizer.md | Adding or extending Lean formalizations and linking to claims. |
| Reviewer | playbooks/reviewer.md | Reviewing PRs; verifying gates and artifact impact. |
| Domain expander | playbooks/domain-expander.md | Adding a paper in a new domain (e.g. first mathematics or physics paper). |
| Release manager | playbooks/release-manager.md | Cutting a release; tagging, signing, and publishing artifacts. |
Each playbook includes a gate checklist, end-to-end path, and failure troubleshooting so you can complete the workflow without maintainer help.
From the repo root:
```
git clone <repo-url>
cd scientific-memory
just bootstrap
just build
just portal
```

In another terminal, run validation and benchmarks to confirm the corpus and pipeline are healthy:

```
just validate-corpus
just benchmark-smoke
```

If any of these fail, run `just doctor` to verify tool versions (uv, pnpm, lean, lake). If lake or lean is not found, ensure elan is installed and the repo is your working directory. For Lean build diagnostics, use `just lake-build-verbose LOG=lake-build.log` to capture full output. Then fix the reported errors (schema, provenance, coverage, or benchmark thresholds) before changing the corpus.
Run from the repository root.
Windows: just is configured to use Git for Windows Bash (C:/Program Files/Git/bin/bash.exe in JUSTFILE). Install Git for Windows if that path is missing. Avoid running just with WSL as the recipe shell: Windows node/pnpm and corepack paths break under WSL bash. Recipes use scripts/just_shell_wrapper.sh, which sources scripts/just_env.sh so uv and Node stay on PATH in non-interactive Bash (no nested just subprocesses).
You can still mirror the main gates without just using the commands below.
Preferred path (with just available):
```
just bootstrap
just check
just benchmark
```
The canonical fallback when you prefer not to use just is the command sequence in this section (validate, tests, Lean build, portal build, benchmark) using direct uv/lake/pnpm commands from PowerShell or any shell.
```
uv run --project pipeline python -m sm_pipeline.cli validate-all
```

Optional machine-readable gate report (after success):

```
uv run --project pipeline python -m sm_pipeline.cli validate-all --report-json /tmp/gate-report.json
```

Or: `just validate` (Bash).

Regenerate status/repo-snapshot.md: `just repo-snapshot` (requires uv; Bash via just).
Pipeline and kernel unit tests:

```
uv run --project pipeline pytest
uv run --project kernels/adsorption pytest
```

MCP contract tests (requires `uv sync --extra mcp`):

```
uv run --project pipeline pytest pipeline/tests/test_mcp_server_contract.py pipeline/tests/test_mcp_server_data_paths.py
```

Optional smoke (Unix): `bash tests/smoke/test_repo_bootstrap.sh` (also run via `just test`).

Current status: run pytest locally for current counts; pipeline and kernel suites include gate-engine, portal read-model, and orchestrator tests.
```
lake build
pnpm --dir portal lint
pnpm --dir portal build
```

Or: `just build` (runs Lean, portal build, then both pytest suites).

Note: Portal lint checks Prettier formatting and TypeScript types. Run `pnpm --dir portal format` to auto-fix formatting issues.
```
uv run --project pipeline python -m sm_pipeline.cli benchmark
```

Fails if metrics fall below minima in `tasks` or exceed maxima in `tasks_ceiling` (e.g. `tasks_ceiling.gold.source_span_alignment_error_rate`). Raise or lower thresholds only when the team agrees metrics should tighten.
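The min/max threshold logic can be sketched as follows (a simplification for orientation; metric key names are illustrative, and benchmarks/baseline_thresholds.json plus the benchmark CLI are authoritative):

```python
# Hedged sketch of the benchmark gate: metrics must be >= each minimum in
# `tasks` and <= each maximum in `tasks_ceiling`. Missing metrics default
# to 0.0 here for simplicity; the real gate may treat them differently.
def check_thresholds(metrics: dict, tasks: dict, tasks_ceiling: dict) -> list[str]:
    failures = []
    for key, minimum in tasks.items():
        value = metrics.get(key, 0.0)
        if value < minimum:
            failures.append(f"{key}={value} below minimum {minimum}")
    for key, maximum in tasks_ceiling.items():
        value = metrics.get(key, 0.0)
        if value > maximum:
            failures.append(f"{key}={value} above maximum {maximum}")
    return failures
```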
```
uv run --project pipeline ruff check pipeline kernels kernels/conformance scripts tests
```

Ruff checks for code-quality issues (unused imports, variables, etc.). All checks should pass before merge.
```
uv run --project pipeline python -m sm_pipeline.cli publish --paper-id <paper_id>
uv run --project pipeline python -m sm_pipeline.cli export-portal-data
pnpm --dir portal build
```

The portal build regenerates portal/public/search-index.json when configured in portal/package.json.
See also: SPEC.md, ROADMAP.md, CONTRIBUTING.md.
- Admit the paper:

  ```
  just add-paper langmuir_1918_adsorption
  ```

  This creates `corpus/papers/langmuir_1918_adsorption/` with initial `metadata.json`, `claims.json`, `assumptions.json`, `symbols.json`, `mapping.json`, and `manifest.json`. Use a short, stable `PAPER_ID` (e.g. `author_year_topic`). When adding a paper, if you have a DOI or arXiv ID, set `metadata.source.doi` or `metadata.source.arxiv_id` in the paper's metadata.

  Hard-paper admission metadata: after `just add-paper`, edit `corpus/papers/<paper_id>/metadata.json` and append tags recording a single primary difficulty axis (and optional secondary axes) plus a short admission rationale. See paper intake – Hard-paper intake matrix and admission policy for the required tag format and vocabulary.
- Extract claims (scaffold):

  ```
  just extract-claims langmuir_1918_adsorption
  ```

  If `claims.json` is missing or empty, this writes a single placeholder claim (default `--mode scaffold_only`). Otherwise it leaves the file unchanged. Use `--mode deterministic` or `llm_sidecar` to skip placeholder scaffolding; see trust-boundary-and-extraction.md.

  Optional: with a Prime Intellect API key in the root `.env`, run `just llm-claim-proposals <PAPER_ID>` (claim drafts), `just llm-mapping-proposals <PAPER_ID>` (mapping suggestions), or, after `mapping.json` and Lean exist, `just llm-lean-proposals <PAPER_ID>` (surgical Lean find/replace sidecar). All are human-reviewed before apply; see prime-intellect-llm.md.
- Scaffold formal mapping:

  ```
  just scaffold-formal langmuir_1918_adsorption
  ```

  This ensures `mapping.json` exists with a valid `paper_id`, `namespace`, and `claim_to_decl` map. Existing entries are preserved.
- Edit corpus and Lean:
  - Edit `corpus/papers/<PAPER_ID>/claims.json`: set `source_span`, `informal_text`, `claim_type`, `status`, and the required `value_kind` (scientific value classification), plus `linked_assumptions` and `linked_symbols`.
  - Edit `corpus/papers/<PAPER_ID>/assumptions.json` and `symbols.json` as needed.
  - Edit `corpus/papers/<PAPER_ID>/mapping.json`: add entries in `claim_to_decl` mapping each claim ID to the corresponding Lean declaration name.
  - Add or update Lean code under `formal/` so that the declarations compile and (if applicable) proofs are machine-checked.
- Check the paper:

  ```
  just check-paper langmuir_1918_adsorption
  ```

  Optionally, compare the paper's blueprint to `mapping.json`: `just check-paper-blueprint langmuir_1918_adsorption`. This reports claim IDs in the blueprint that are missing or differ in mapping (mapping is authoritative for the build).

  This runs full corpus validation (schema, normalization, provenance, coverage). Fix any reported errors:

  - Schema: Ensure every required field is present and types match the schemas in `schemas/`.
  - Normalization: No duplicate claim/assumption/symbol IDs per paper; every `linked_assumptions` and `linked_symbols` entry must resolve to an existing ID.
  - Provenance: Every claim must have a `source_span`; every declaration in the manifest must be linked from a claim via the mapping.
  - Coverage: Regenerate the manifest so that `coverage_metrics` match the current claims (do not edit `manifest.json` by hand for coverage).
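The normalization rules above (unique IDs, resolved links) can be sketched as a small check; field names follow the schemas, but the real implementation lives in `sm_pipeline.validate.gate_engine`:

```python
# Hedged sketch of the normalization gate: per-paper ID uniqueness and
# resolution of every linked_assumptions / linked_symbols entry.
def check_normalization(claims: list, assumptions: list, symbols: list) -> list[str]:
    errors = []
    # IDs must be unique across the paper's claims, assumptions, and symbols.
    ids = [c["id"] for c in claims] + [a["id"] for a in assumptions] + [s["id"] for s in symbols]
    if len(ids) != len(set(ids)):
        errors.append("duplicate IDs within paper")
    # Every link on a claim must resolve to an existing assumption or symbol.
    known = {a["id"] for a in assumptions} | {s["id"] for s in symbols}
    for claim in claims:
        for link in claim.get("linked_assumptions", []) + claim.get("linked_symbols", []):
            if link not in known:
                errors.append(f"claim {claim['id']}: unresolved link {link!r}")
    return errors
```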
- Publish artifacts:

  ```
  just publish-artifacts langmuir_1918_adsorption
  ```

  This regenerates `theorem_cards.json` and `manifest.json` (including `dependency_graph` from theorem cards' `dependency_ids`, `generated_pages`, and `kernel_index` when applicable).

- Run full check:

  ```
  just check
  ```

  This runs format, lint, validate, test, and build (Lean, portal, pipeline, kernels). All must pass before opening a PR.
- Schema validation: Check the failing file against the JSON schema in `schemas/`. Use the pipeline's `validate-all` output to see which file and field failed. After a successful run, `validate-all --report-json <path>` writes the ordered gate checklist from `gate_engine`.
- Normalization: No duplicate claim/assumption/symbol IDs per paper; every `linked_assumptions` and `linked_symbols` reference must resolve to an existing ID in that paper.
- Provenance (Gate 3): Add or fix `source_span` on claims; ensure every declaration in the manifest is reachable from some claim via `mapping.json`'s `claim_to_decl`.
- Hardness scaffolds: For papers tagged `hardness.primary:*` with empty `claims.json`, Gate 3 allows a temporary `manifest.json` while `metadata.source.sha256` is still all-zero. For all normal papers, manifest + sentinel hash remains a hard error.
- Coverage (Gate 4): Run `just publish-artifacts <paper_id>` so that `manifest.json`'s `coverage_metrics` are recomputed from the current claims. Do not edit `coverage_metrics` manually.
- Benchmark regression: If extraction/mapping/theorem_cards/gold metrics fall below minima in `tasks` or above maxima in `tasks_ceiling` (e.g. source-span alignment error rate), either fix corpus/gold drift or adjust `benchmarks/baseline_thresholds.json` (with team agreement). The same applies to `tasks.llm_eval` when reference fixtures under `benchmarks/llm_eval/` drift after prompt changes.
How to review a pull request so artifact impact, provenance, verification boundaries, and coverage are explicit and checks pass.
Allowed statuses (must match common.schema.json): `unparsed`, `parsed`, `mapped`, `stubbed`, `compiles_with_sorries`, `machine_checked`, `linked_to_kernel`, `disputed`, `superseded`.
Required audit fields:
- disputed: When a claim has `status: "disputed"`, `review_notes` must be non-empty (justification for the dispute). Enforced by `just validate` (gate engine in `sm_pipeline.validate.gate_engine`); failure blocks merge.
Lifecycle: The typical flow is forward (`unparsed` → `parsed` → `mapped` → `machine_checked`). `disputed` is a side state requiring human resolution; once resolved, move the claim back to an appropriate status and retain or clear `review_notes` as needed. Invalid status values (not in the allowed set) are rejected by validation.
A machine-readable reviewer report (claims by status, disputed with/without notes, invalid status list) is available via `just metrics` (included in the full report) or the pipeline `compute_reviewer_report` API.
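The two claim-level rules above (valid status enum, disputed requires notes) can be sketched as a standalone check; the authoritative enforcement is the gate engine, and this is only an orientation aid:

```python
# Hedged sketch of claim-status validation: status must be in the allowed
# enum from common.schema.json, and "disputed" requires non-empty review_notes.
ALLOWED_STATUSES = {
    "unparsed", "parsed", "mapped", "stubbed", "compiles_with_sorries",
    "machine_checked", "linked_to_kernel", "disputed", "superseded",
}

def check_claim_status(claim: dict) -> list[str]:
    errors = []
    status = claim.get("status")
    if status not in ALLOWED_STATUSES:
        errors.append(f"invalid status {status!r}")
    if status == "disputed" and not (claim.get("review_notes") or "").strip():
        errors.append("disputed claim requires non-empty review_notes")
    return errors
```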
Theorem cards carry required reviewer_status (schema: theorem_card.schema.json): unreviewed, reviewed, accepted, or blocked. Rules enforced by validation:
- Values must be one of the schema enum entries.
- `blocked` requires non-empty `notes` on the card.
- `accepted` requires `proof_status: "machine_checked"`.
Publishing merges prior `reviewer_status` and notes so human edits are not dropped on regenerate. After formal review, set `accepted` or `reviewed` on the card JSON (or rely on publish defaults derived from claim `proof_status`). Cross-paper disputed-formalization counts appear under `just metrics --research-value` as `disputed_claims_with_formal_targets` (claim-level). Theorem-card reviewer queue counts (blocked, unreviewed, reviewed, accepted) are emitted by `just metrics` under `reviewer_status` and surfaced on the portal dashboard.
SPEC 0.2: this is the intended human workflow for reviewer_status on theorem cards. Enforcement is staged: metrics warnings first; stricter CI or merge rules are optional and must be adopted explicitly by maintainers.
States
| Status | Meaning |
|---|---|
| `unreviewed` | Card generated or updated; no human reviewer sign-off. |
| `reviewed` | A reviewer inspected mapping, proof boundary, and claim link; not necessarily final. |
| `accepted` | Approved for publication scope; requires `proof_status: "machine_checked"` (validated). |
| `blocked` | Do not treat as trusted; requires non-empty `notes` (validated). |
Allowed transitions
- `unreviewed` → `reviewed` → `accepted` (typical forward path for machine-checked cards).
- `unreviewed` → `blocked` when formalization or mapping is disputed.
- `blocked` → `reviewed` or `accepted` after resolution (clear or update `notes`).
- `reviewed` → `unreviewed` is allowed only for regeneration hygiene (e.g. republish reset); document in the PR why status was pulled back.
- `accepted` → `reviewed` or `unreviewed` is a conscious rollback (e.g. proof regression); fix proof or status in the same change set.
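The transitions above can be captured as a lookup table that a reviewer checklist script might use (a sketch only; the pipeline enforces the schema enum plus the blocked/notes and accepted/machine-checked rules, not transitions):

```python
# Hedged sketch: allowed reviewer_status transitions from the list above.
# blocked -> accepted assumes the accepted-side machine-checked rule still holds.
ALLOWED_TRANSITIONS = {
    "unreviewed": {"reviewed", "blocked"},
    "reviewed": {"accepted", "unreviewed"},   # unreviewed only for regeneration hygiene
    "blocked": {"reviewed", "accepted"},
    "accepted": {"reviewed", "unreviewed"},   # conscious rollback
}

def is_allowed_transition(old: str, new: str) -> bool:
    """True when moving a card from `old` to `new` matches the documented paths."""
    return new in ALLOWED_TRANSITIONS.get(old, set())
```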
Schema and just validate (via the gate engine) already enforce: valid enum, blocked + notes, accepted + machine-checked.
Staged enforcement (warn to block)
- Phase 1 (current): `just metrics` includes `reviewer_status` counts and emits a warning when any card is `machine_checked` but still `unreviewed` (see `reviewer_status.machine_checked_but_unreviewed` in the JSON report). This does not fail CI.
- Phase 2 (optional): Treat a high count of `unreviewed` machine-checked cards as a merge recommendation (branch protection or reviewer checklist).
- Phase 3 (optional): CI fails if papers in a named allowlist have public-facing cards not in `accepted` or `reviewed`. Requires maintaining that list and excluding WIP papers.
Relation to claim status: Claim-level status and disputed rules are in Claim status lifecycle and required audit fields above. Theorem-card reviewer status is orthogonal: it tracks human trust in the card bundle, not only Lean compilation.
Every PR should clearly state:
- Artifact impact: papers, claim bundles, declarations, kernels, or portal routes changed.
- Provenance impact: new/edited claims have source spans; declarations are mapped from claims.
- Verification-boundary impact: proof status, axioms/sorry usage, kernel witness status, reviewer status.
- Coverage impact: before/after metrics where corpus/formal changes occur.
- Schema impact: modified schemas and migration notes if needed.
- Risk impact: new risks introduced (or explicitly none).
Before approval, ensure these pass:
```
just validate
just test
just build
```
If local setup/checks fail, run just doctor to validate toolchain availability.
- schema validation (including theorem cards and kernels when present),
- normalization integrity (unique IDs and resolved links),
- provenance integrity,
- graph integrity:
- theorem-card dependency IDs resolve,
- kernel linked theorem cards resolve,
- theorem-card executable links resolve to kernels,
- manifest kernel index resolves to kernels,
- coverage integrity,
- extraction run requirement: papers with non-empty claims have `extraction_run.json` (run `just extraction-report <paper_id>` if missing),
- migration note in Schema versioning and migration notes in this playbook when schemas change.
- Claim added/changed without `source_span`.
- Declaration added/changed but not represented in paper mapping.
- Hand-edited manifest coverage metrics.
- Dangling theorem-card or kernel references.
- Schema changes without migration notes.
- Generated artifacts changed without clear generator command.
- Paper has claims but no `extraction_run.json`.
Use PR comments to separate blocking issues from follow-up suggestions.
Every formal declaration and executable kernel has a verification boundary:
- fully_machine_checked: Full machine proof, no axioms.
- machine_checked_plus_axioms: Proof complete modulo stated axioms.
- numerically_witnessed: Checked by tests/simulations only.
- schema_valid_only: Structure validated, no semantic check.
- human_review_only: Human-verified only.
The boundary is stored in theorem cards and kernel manifests. The portal and reports surface it so readers know what is guaranteed.
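One way a report could summarize linked artifacts is to surface the weakest guarantee among them. The ordering below is an illustration derived from the list above, not a pipeline API:

```python
# Hedged sketch: rank the verification boundaries from strongest to
# weakest and pick the weakest across a set of linked artifacts.
BOUNDARY_STRENGTH = [
    "fully_machine_checked",
    "machine_checked_plus_axioms",
    "numerically_witnessed",
    "schema_valid_only",
    "human_review_only",
]

def weakest_boundary(boundaries: list[str]) -> str:
    """Return the least-trusted boundary in the list (highest rank index)."""
    return max(boundaries, key=BOUNDARY_STRENGTH.index)
```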
The proof-repair workflow generates proposal artifacts only; it never modifies the corpus. Run:
```
uv run --project pipeline python -m sm_pipeline.cli proof-repair-proposals -o path/to/proposals.json
uv run --project pipeline python -m sm_pipeline.cli proof-repair-proposals --paper-id PAPER_ID -o out.json
```

Output conforms to schemas/proof_repair_proposal.schema.json. Every proposal has `verification_boundary: "human_review_only"`. Apply any change to the formal tree or theorem cards only after human review; the pipeline does not auto-apply repair proposals in CI.
After reviewing proposals, a maintainer may apply an auditable bundle of find/replace patches only under formal/ using:
```
uv run --project pipeline python -m sm_pipeline.cli proof-repair-apply --bundle path/to/bundle.json --i-understand-human-reviewed
```

The bundle schema is schemas/proof_repair_apply_bundle.schema.json. CI must never run `proof-repair-apply`. Regenerate theorem cards and manifests (`just publish-artifacts`) after Lean edits; run `just validate` and `just lake-build` before merge.
This section describes supported ways third parties can reuse the repository and add value. For extending the pipeline in-tree, see pipeline-extension-points.md.
Use case: Build a custom UI, analytics, or tooling that reads the same JSON the portal uses.
- Clone the repository.
- Install uv (and the Python toolchain) per Local CI checklist. You do not need pnpm or Lean if you obtain `corpus-export.json` from a release artifact or another machine that ran the export.
- From the repo root, generate the bundle:

  ```
  uv sync --all-packages
  uv run --project pipeline python -m sm_pipeline.cli export-portal-data
  ```

- Read `portal/.generated/corpus-export.json`. The shape is defined only by `build_portal_bundle` in pipeline/src/sm_pipeline/publish/portal_read_model.py (`sm_pipeline.publish.portal_read_model`).
Version field: `version` in the JSON matches `PORTAL_BUNDLE_VERSION` in that module; treat unknown versions as potentially incompatible.
Also useful: JSON Schemas under schemas/ and examples under schemas/examples/ for contract-stable parsing.
Use case: Admit papers, formalize in Lean, publish manifests, and ship through the portal using the full monorepo.
Follow this playbook (Adding a paper and a claim), paper-intake.md, and Domain policy. All merge paths expect gate engine green and (for substantive changes) portal + Lean CI parity.
Use case: Add numerical or computational witnesses linked to theorem cards.
- Existing pattern: `kernels/adsorption/` with tests depending on the workspace package `kernels/conformance/`.
- New kernel family: add `kernels/<family>/` with its own `pyproject.toml`, depend on `kernel-conformance`, register in `corpus/kernels.json`, and ensure CI runs your pytest (extend the corpus-validation workflow or document a separate workflow if you split jobs).
Today the supported path is “run from repo root” with uv run --project pipeline ...: paths assume corpus/, schemas/, and formal/ layout.
A time-boxed spike to reuse sm_pipeline as a library in another repo would need:
- vendoring or a `uv` path dependency on `pipeline/`;
- explicit `repo_root` arguments (many CLIs use `Path(".")`);
- copies of or symlinks to `schemas/` and corpus inputs.
Document findings in a discussion or ADR if you pursue this; upstream may later split a publishable package if demand is clear.
Use case: Extraction quality, proof-completion trends, gold labels.
- Gold: `benchmarks/gold/<paper_id>/` and `just scaffold-gold` (metrics.md).
- Regression: `just benchmark` and `benchmarks/baseline_thresholds.json` (min `tasks`, max `tasks_ceiling`).
- Derived metrics: `just metrics` for reviewer queues, symbol conflicts, dependency reuse, etc.
External evaluators can cite frozen release tags and dist/ checksums per Release integrity (Gate 7).
- Hand-edited `manifest.json` coverage or portal-only "truth" (violates project invariants).
- Claiming formal support in kernel manifests without linked theorem cards (engineering rules).
See also: maintainers.md.
SPEC Section 1 lists "optional Verso-backed long-form technical docs." This section defines the integration contract and current setup.
Verso is used by the Lean project for long-form documentation (e.g. The Hitchhiker's Guide to Logical Verification). It can render Markdown and Lean code blocks with cross-references. See SPEC.md and ROADMAP.md. Human-oriented Markdown under docs/verso/ complements the Lean manual sources under scripts/verso/.
- Dependency: Verso is listed in lakefile.toml with `rev = "v4.29.0-rc6"`, aligned with lean-toolchain (`leanprover/lean4:v4.29.0-rc6`). Do not pin Verso to a tag that does not match the toolchain.
- Build: scripts/verso/Main.lean defines the `generate-site` executable and calls Verso's manual generator. The manual content is authored in Lean (e.g. ScientificMemoryVerso.lean), not only in `docs/verso/*.md`.
- Output: From the repo root, run `just build-verso` or `lake exe generate-site`. HTML is emitted to `docs/verso/_build/` (gitignored; see generated-artifacts.md).
- CI: The Verso site build is not a required CI gate; corpus validation and the main `lake build` for `formal/` remain the merge bar.
Verso tags follow Lean releases (see Verso tags). When you change lean-toolchain, update the rev for both verso and mathlib in lakefile.toml, then run lake update and lake build.
Treat Verso as partially integrated: the Lake dependency and generate-site path exist, but richer manual content, stable URLs for portal links (e.g. optional theorem_card.verso_url), and optional CI for the static site are still incremental. Track scope in ROADMAP.md.
A working network connection to the Lean package reservoir is required for lake update. Lake fetches pre-built dependencies (e.g. mathlib) from the reservoir; without access, lake update fails.
Common failure: If you see external command 'curl' exited with code 35 and leanprover-community/mathlib: Reservoir lookup failed, this usually indicates a TLS or network problem (e.g. unreachable host, corporate proxy, or firewall blocking the reservoir). Resolve by running lake update (and lake build) in an environment with outbound HTTPS to the reservoir, or retry when the network is available; transient reservoir failures can sometimes be overcome with a retry.
Workaround when the reservoir is unreachable: This repo configures mathlib via git instead of the reservoir (lakefile.toml: [[require]] name = "mathlib" git = "..." rev = "v4.29.0-rc6"). Lake then clones mathlib4 from GitHub, so lake update does not need the Lean reservoir. The first run may take longer (cloning and, if used, building cache). If you prefer to use the reservoir when it is reachable, you can switch back to scope = "leanprover-community" and remove the git/rev fields for mathlib.
Windows: certificate revocation check (curl 35 CRYPT_E_NO_REVOCATION_CHECK): If lake update fails during mathlib's post-update hook with CRYPT_E_NO_REVOCATION_CHECK (e.g. when fetching the Lean cache/leantar), the Schannel SSL backend cannot verify revocation. Two options:
- Skip the cache fetch: Set `MATHLIB_NO_CACHE_ON_UPDATE=1` before running `lake update`. The update will complete without downloading pre-built cache; the first `lake build` will be slower (mathlib compiles from source). On Windows (PowerShell): `$env:MATHLIB_NO_CACHE_ON_UPDATE="1"; lake update`. On Windows (cmd): `set MATHLIB_NO_CACHE_ON_UPDATE=1` then `lake update`. From Git Bash, if `just lake-update-no-cache` reports "lake: command not found", run `MATHLIB_NO_CACHE_ON_UPDATE=1 lake update` in the same terminal where `lake` works (e.g. where elan is in PATH).
- Allow curl to skip revocation: Create or edit `%USERPROFILE%\.curlrc` and add the line `--ssl-no-revoke`. Then run `lake update` again. This reduces revocation checking only for curl; use it only if your environment requires it.
Disk space: Building the generate-site executable compiles Verso and its dependencies (SubVerso, VersoManual, etc.) under .lake/packages/. The first lake exe generate-site can need several hundred MB of free space. If you see resource exhausted (error code: 28, no space left on device) or clang "No space left on device", free disk space (or move the repo to a drive with more space), then run lake exe generate-site again. lake build alone does not build the exe, so it can succeed even when the exe build would run out of space.
- Input: Verso-compatible sources under `scripts/verso/` (Lean manual) and optional Markdown under `docs/verso/` for contributors.
- Build: `just build-verso` or `lake exe generate-site` produces static HTML under `docs/verso/_build/`.
- Output: When published (local preview, separate hosting, or future portal links), long-form pages should be addressable in a stable way so paper or theorem-card views can deep-link.
- No schema change: Verso content is supplementary. Corpus, manifests, and portal JSON remain the source of truth. Optional metadata (e.g. `theorem_card.verso_url`) may be added later without changing the proof pipeline contract.
- CI (optional): A workflow may build or link-check the site; it must not block corpus validation or the main Lean library build.
- After any `lean-toolchain` bump, align the `verso` and `mathlib` revisions in lakefile.toml and run `lake update` / `lake build`.
- Extend the manual in `scripts/verso/` (and keep `docs/verso/` in sync for human readers if desired).
- Run `just build-verso` and confirm `docs/verso/_build/` contains the expected HTML.
- When portal linking or optional CI is added, document URLs and workflows here and in ROADMAP.md.
References: SPEC Section 1; ADR index (no Verso ADR yet).
Canonical schemas live in schemas/*.schema.json. They use JSON Schema draft 2020-12 and $id for resolution.
- Backward compatibility: New optional properties and new enum values are allowed; removing or renaming properties requires a new major version and migration.
- Migration: When changing a schema, update the schema file, the corresponding Pydantic model in the pipeline, example fixtures in schemas/examples/, and add a dated entry under Migration notes below.
SPEC 8.3 Normalization: Validation (via the unified gate runner behind validate-all) ensures no duplicate claim/assumption/symbol IDs per paper and that every claim's linked_assumptions and linked_symbols resolve to existing IDs. Visibility tooling: just metrics --normalization-report (cross-paper duplicate normalized_name; report includes suggest_merge for human triage), just metrics --normalization-visibility (symbols with role_unclear, claims without linked_assumptions), just metrics --assumption-suggestions (candidate assumptions for claims with none; text overlap), just metrics --dimension-visibility (symbols with vs without dimensional_metadata), just metrics --dimension-suggestions (heuristic suggested unit/dimension from kernels and symbol names; human triage only, no auto-edit). Full automation (auto-merge, auto-tag) is accepted out of scope; incremental normalization work is tracked in ROADMAP.md and SPEC.md (normalization / metrics sections).
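The v0.1 invariants enforced here can be sketched as follows. This is an illustrative re-implementation, not the actual `validate_normalization` code in the gate runner; the list-of-dict row shapes are assumed for the sketch:

```python
# Illustrative sketch of the v0.1 normalization invariants: no duplicate
# IDs per paper, and every claim edge resolves to an existing ID.
# Row shapes (dicts with "id", "linked_assumptions", "linked_symbols")
# are assumptions of this sketch, not the pipeline's internal types.
def check_normalization(claims, assumptions, symbols):
    errors = []
    # Duplicate-ID check per entity kind within a single paper.
    for kind, rows in (("claim", claims), ("assumption", assumptions), ("symbol", symbols)):
        ids = [row["id"] for row in rows]
        for dup in {i for i in ids if ids.count(i) > 1}:
            errors.append(f"duplicate {kind} id: {dup}")
    # Edge resolution: linked_assumptions / linked_symbols must point at real IDs.
    assumption_ids = {row["id"] for row in assumptions}
    symbol_ids = {row["id"] for row in symbols}
    for claim in claims:
        for aid in claim.get("linked_assumptions", []):
            if aid not in assumption_ids:
                errors.append(f"{claim['id']}: unresolved assumption {aid}")
        for sid in claim.get("linked_symbols", []):
            if sid not in symbol_ids:
                errors.append(f"{claim['id']}: unresolved symbol {sid}")
    return errors
```

An empty return value corresponds to the fail-fast gate passing for that paper.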
| Task | Status | Notes |
|---|---|---|
| No duplicate IDs (claims, assumptions, symbols per paper) | Done | validate_normalization in pipeline; CI. |
| All edges resolve (linked_assumptions, linked_symbols) | Done | Same. |
| Symbol deduplication across papers | Visibility + suggest merge | Cross-paper duplicate normalized_name via just metrics --normalization-report; report includes suggest_merge (human triage); no auto-merge. |
| Variable role disambiguation | Visibility | Schema has ambiguity_flags (role_unclear); just metrics --normalization-visibility lists symbols with role_unclear; resolution remains manual. |
| Unit/dimension tagging | Visibility + suggestions | Schema supports dimensional_metadata; just metrics --dimension-visibility lists with vs without; just metrics --dimension-suggestions emits heuristic suggested unit/dimension from kernels and symbol names (human triage, no auto-edit). |
| Assumption lifting | Visibility + suggestions | just metrics --normalization-visibility lists claims with no linked_assumptions; just metrics --assumption-suggestions suggests candidates by text overlap (human triage). |
| Dependency edge inference | Partial | Theorem cards have dependency_ids; manifest derives dependency_graph from them. Lean source-based extraction populates same-file declaration dependencies. |
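The dependency-edge derivation in the last row can be sketched as follows (a minimal sketch under assumed card shapes; the real publish step in the pipeline may differ, e.g. in how cross-file or external dependencies are handled):

```python
# Hedged sketch: derive manifest dependency_graph edges from theorem card
# dependency_ids. Card shape ({"id": ..., "dependency_ids": [...]}) is an
# assumption of this sketch; field names follow the table above.
def derive_dependency_graph(theorem_cards):
    known = {card["id"] for card in theorem_cards}
    edges = []
    for card in theorem_cards:
        for dep in card.get("dependency_ids", []):
            if dep in known:  # only emit edges whose endpoints both exist
                edges.append({"from": card["id"], "to": dep})
    return edges
```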
| Level | Criteria | Enforcement |
|---|---|---|
| v0.1 | No duplicate IDs per paper; all `linked_assumptions` and `linked_symbols` resolve. | `validate_normalization` in CI; fail-fast. |
| v0.2 | Cross-paper symbol deduplication (normalized_name) applied or explicitly waived; unit/dimension tagging complete for symbols used in kernels. | Visibility and normalization_report; acceptance gate may require zero unwaived duplicates or documented waivers. |
| v0.3 | Assumption lifting coverage: every claim has at least one linked assumption or an explicit waiver; kernel-linked symbols have dimensional_metadata. | Metrics and reports; optional CI threshold (e.g. coverage percentage). |
Progression: v0.1 is required for merge. v0.2/v0.3 are incremental; thresholds and waiver policies are set per milestone (see ROADMAP.md and SPEC.md).
Optional policy file: benchmarks/normalization_policy.json. When present, just metrics --normalization-policy (or just metrics with default run-all) produces a machine-readable report:
- `duplicate_waivers`: list of `normalized_name` strings that are allowed to appear in multiple papers (waived); any other cross-paper duplicate is reported as `unwaived_duplicates`.
- `assumption_coverage_max_unlinked`: maximum allowed count of claims without linked assumptions; the report sets `assumption_coverage_ok` to false and adds a warning when exceeded.
- `max_symbols_without_dimension`: maximum allowed count of symbols without `dimensional_metadata`; violations are appended to `kernel_dimension_violations`.
Report keys: policy_loaded, unwaived_duplicates, waived_duplicates, assumption_coverage_ok, assumption_coverage_message, kernel_dimension_violations, warnings. Checks run in warn-only mode (warnings echoed to stderr); to enforce, add a CI step that fails when unwaived_duplicates is non-empty or assumption_coverage_ok is false, or when warnings is non-empty.
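An enforcing CI step over this warn-only report might look like the following sketch. The key names come from this section; the function itself is hypothetical, not a pipeline API:

```python
# Hypothetical CI enforcement over the normalization policy report
# (keys as documented above). A non-empty return value means the CI
# step should fail; warn-only behavior just prints the same list.
def policy_violations(report):
    problems = []
    if report.get("unwaived_duplicates"):
        problems.append(f"unwaived duplicates: {report['unwaived_duplicates']}")
    if report.get("assumption_coverage_ok") is False:
        problems.append(report.get("assumption_coverage_message", "assumption coverage exceeded"))
    problems.extend(report.get("warnings", []))
    return problems
```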
Blueprint docs (e.g. docs/blueprints/<paper>.md) list claim IDs and target declarations; the canonical mapping lives in corpus/papers/<paper_id>/mapping.json. When they diverge, mapping.json is authoritative for the build. Use the optional check-paper-blueprint step (see Adding a paper and a claim) to report mismatches.
- 2026-03: Added optional `first_artifact_at` (date-time) to paper.schema.json; set by publish-artifacts when first writing the manifest (median time intake→artifact can be computed from this). No migration required.
- 2026-03: Added optional `ingestion_status` to paper.schema.json (enum: pending, ingested, failed) for SPEC 7.1. No migration required; existing corpus metadata need not populate it until tooling uses it.
- 2026-03: Added `mapping.schema.json` for claim-to-declaration mapping (`paper_id`, `namespace`, `claim_to_decl`). No change to existing corpus; existing mapping.json files are now validated in `validate-all`. Extended the unified gate runner behind `validate-all` to validate assumptions.json, symbols.json, and mapping.json per paper when present (SPEC Segment B).
- 2026-03: Added optional `literature_error.schema.json` for per-paper `literature_errors.json` (array of { claim_id?, description, severity?, source }). Hand-maintained; used by research-value metrics when present. No migration required.
- 2026-03: Added optional `target_file` to mapping.schema.json (path to the Lean file containing declarations). The theorem card generator infers it from the namespace when absent (`formal/Namespace/To/File.lean`). Re-run `just publish-artifacts` (or publish-manifest per paper) to populate the theorem_cards.json `file_path` from mapping.
- 2026-03: Added `proof_repair_proposal.schema.json` for proof-repair proposal artifacts (human_review_only). Generated by `sm-pipeline proof-repair-proposals`; no corpus migration.
- 2026-03: Added `mathematics` to the paper.schema.json domain enum and pipeline Paper model for the multi-domain slice (non-adsorption). No migration required for existing papers.
- 2026-03-18: Added `proof_repair_apply_bundle.schema.json` for human-gated `proof-repair-apply` (find/replace patches under `formal/` only). No corpus migration; optional review JSON path in the bundle.
- 2026-03-18: Tightened the theorem card reviewer contract: `reviewer_status` is now required in `theorem_card.schema.json` (enum unchanged), the pipeline TheoremCard model was updated to require reviewer_status, and `schemas/examples/theorem_card.example.json` was updated to canonical Langmuir IDs plus reviewer_status. Validator rules now additionally enforce `accepted -> proof_status=machine_checked`. Existing generated theorem cards already include reviewer_status via publish.
- 2026-03-20: Added `llm_claim_proposals.schema.json` and `llm_mapping_proposals.schema.json` for optional Prime Intellect proposal sidecars; pipeline `LlmClaimProposalsBundle` / `LlmMappingProposalsBundle` models; examples under `schemas/examples/`. Sidecars are optional; `validate-all` warns only if present and invalid. See prime-intellect-llm.md and ADR 0011.
- 2026-03-20 (trust hardening): Added trust-boundary-and-extraction.md and ADR 0012. Claims: optional `linked_assumptions_unresolved` / `linked_symbols_unresolved` (normalize preserves broken link IDs off the resolved edge lists). Manifest: optional `build_hash_version` (`2` = content-addressed over corpus JSON + theorem cards + kernel_index + source sha256); `dependency_graph` and `kernel_index` are recomputed on each publish unless `SM_PUBLISH_REUSE_MANIFEST_GRAPHS=1`. Theorem cards: optional `dependency_extraction_method` (`lean_source_regex_tier0`). Extraction: `extract-claims --mode scaffold_only|deterministic|llm_sidecar`; `extraction_run.json` may include `extraction_mode` and `placeholder_claim_written`. Suggestion schemas: `llm_run_provenance.schema.json`, `suggested_assumptions.schema.json`, `suggested_symbols.schema.json`, `suggested_mapping.schema.json`. Benchmark: `tasks.llm_suggestions` in `baseline_thresholds.json`. Re-run `just publish-artifacts` per paper and `just repo-snapshot` after pulling.
- 2026-03-20: Added `llm_lean_proposals.schema.json` for optional surgical Lean edit suggestions (`llm_lean_proposals.json` per paper). Pipeline `LlmLeanProposalsBundle` model; example `schemas/examples/llm_lean_proposals.example.json`. CLI: `sm-pipeline llm-lean-proposals`, `sm-pipeline llm-lean-proposals-to-apply-bundle` (produces a `proof_repair_apply_bundle` for human-gated `proof-repair-apply`). The sidecar is optional; `validate-all` warns only if present and invalid. Env: `SM_LLM_MODEL_LEAN`. Benchmark: `tasks.llm_lean_suggestions`. See prime-intellect-llm.md.
- 2026-03-21: LLM evaluation hardening (no new corpus schema): `pipeline/.../llm/prompt_templates.py` with `metadata.prompt_template_id` / `prompt_template_sha256` / `input_artifact_sha256` on generated proposal bundles; benchmark top-level `llm_prompt_templates`; task `tasks.llm_eval` over reviewed `benchmarks/llm_eval/<paper_id>/reference_llm_*_proposals.json`; `benchmarks/baseline_thresholds.json` minima for `llm_eval`; scripts `scripts/llm_live_eval.py`, `scripts/llm_eval_lean_build_optional.py`; docs testing/llm-human-eval-rubric.md, ADR 0013. Example `schemas/examples/llm_run_provenance.example.json` updated for template id and input keys.
- 2026-03-21 (claim value policy): Added a `value_kind` enum to `schemas/claim.schema.json` (pipeline `Claim` model) to classify the scientific value of each claim (`foundational_law`, `bridge_lemma`, `executable_contract`, `experimental_invariant`, `cross_paper_anchor`). 2026-03-22: `value_kind` is required on every claim row (corpus and gold fixtures updated). Validation includes `claim_value_policy` (near-duplicate algebraic variant detection); enforcement for duplicate pairs still requires both claims to carry explicit reuse kinds where applicable.
- 2026-03-22 (kernel contract schema v1): Added an optional `contract_v1` structured object to `schemas/executable_kernel.schema.json` and the pipeline `ExecutableKernel` model. `validate_kernel_contracts_v1` now enforces `contract_v1` completeness for `corpus/kernels.json` entries with `test_status: "tested"` (witness_type, units/domains, tolerances, and formal obligations coverage).
- 2026-03-22 (kernel IO typing): Required `io_typing` on each kernel (`inputs`/`outputs` as typed numeric bindings). Legacy `input_schema`/`output_schema` strings are optional documentation only. `validate_kernel_contracts_v1` checks that `io_typing` names match `contract_v1.units` keys and that nonnegative/positive numeric kinds align with domain strings.
- 2026-03-22 (manifest dependency edges): Optional `confidence` (high|medium|low) on each `dependency_graph` edge in `schemas/artifact_manifest.schema.json`, derived from `dependency_extraction_method` on the consumer theorem card (e.g. tier1 → `high`). Re-run `just publish-artifacts <paper_id>` to refresh manifests.
- 2026-03-22 (LLM eval provenance): Optional `reviewer_time_seconds` on `schemas/llm_run_provenance.schema.json` / `LlmRunMetadata`; `benchmarks/tasks/llm_eval/scorer.py` aggregates `reviewer_time_seconds_total` and `reviewer_time_observations`.
- 2026-03-22 (provenance scaffold exception): Gate 3 provenance still rejects `manifest.json` when `metadata.source.sha256` is the all-zero sentinel for normal papers. Narrow exception: papers tagged `hardness.primary:*` with an empty `claims.json` may keep sentinel manifests during intake scaffolding.
SPEC Gate 7 requires: "Manifest signed; changelog generated; release artifact hash emitted." This section describes how the project satisfies Gate 7.
When a release is created (push to a tag `v*`), the release workflow runs and `scripts/release_artifacts.sh` produces:
- Changelog: `dist/CHANGELOG.md` is generated from git log (from the previous tag to HEAD, or the last 20 commits if no prior tag). The workflow emits a short preview in the job log.
- Checksums: `dist/checksums.txt` contains:
  - `manifest_checksum`: SHA-256 of the concatenated contents of all paper `manifest.json` files (sorted), so any change to manifests is detectable.
  - Per-file SHA-256 hashes of everything under `dist/`.
  - `release_artifact_sha256`: digest of the full release artifact content.
- Manifest signed: The release workflow signs `dist/checksums.txt` with Sigstore (cosign) keyless signing. The signature and certificate are written to `dist/checksums.txt.sig` and `dist/checksums.txt.pem`. To verify a release: recompute checksums as above; if you have the `.sig` and `.pem` files, run `cosign verify-blob dist/checksums.txt --signature=dist/checksums.txt.sig --certificate=dist/checksums.txt.pem`. Checksum recomputation remains the primary tamper check; signing attests the origin (GitHub Actions OIDC).
When a version tag (v*) is pushed, the release workflow creates a GitHub Release and uploads publish-ready assets:
- `CHANGELOG.md` – changelog (previous tag to HEAD)
- `checksums.txt` – manifest and per-file checksums plus `release_artifact_sha256`
- `checksums.txt.sig` – Sigstore signature of `checksums.txt`
- `checksums.txt.pem` – Sigstore certificate
- `release-bundle.zip` – full `dist/` contents (corpus, changelog, checksums, signatures)
Verification instructions are deterministic: recompute checksums from the unpacked bundle and compare to `checksums.txt`; verify the signature with cosign as above.
To verify a release locally:
- Clone at the release tag, run `just bootstrap` and `just build`.
- Optionally recompute hashes of `corpus/papers/*/manifest.json` and compare to the `manifest_checksum` published in the release’s `dist/checksums.txt`.
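Recomputing the manifest checksum yourself can be sketched as below. This assumes SHA-256 over the raw bytes of all paper manifest.json files concatenated in sorted path order; the authoritative recipe is whatever `scripts/release_artifacts.sh` actually does:

```python
# Hedged sketch of manifest_checksum recomputation: SHA-256 over the
# concatenated bytes of corpus/papers/*/manifest.json in sorted path
# order. Exact ordering and any separator must match release_artifacts.sh.
import hashlib
from pathlib import Path

def manifest_checksum(corpus_root):
    digest = hashlib.sha256()
    for path in sorted(Path(corpus_root).glob("papers/*/manifest.json")):
        digest.update(path.read_bytes())
    return digest.hexdigest()
```

Compare the result against the `manifest_checksum` line in the release's `checksums.txt`.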
See Running from a release for how to run from a release tag.
| Goal | Command |
|---|---|
| Setup | just bootstrap then just build |
| Validate corpus | just validate-corpus |
| Benchmark smoke | just benchmark-smoke |
| Add paper | just add-paper <paper_id> |
| Extract claims (scaffold) | just extract-claims <paper_id> |
| Scaffold formal | just scaffold-formal <paper_id> |
| Check one paper | just check-paper <paper_id> |
| Blueprint vs mapping | just check-paper-blueprint <paper_id> |
| Publish artifacts | just publish-artifacts <paper_id> |
| Scaffold gold | just scaffold-gold <paper_id> (for each paper in corpus/index.json; update baseline papers_with_gold when adding papers) |
| Derived metrics | just metrics (optional -o report.json; includes normalization-report with suggest_merge, dimension-suggestions) |
| Export diff baseline | just export-diff-baseline or uv run --project pipeline python -m sm_pipeline.cli export-diff-baseline --baseline-id <id> --title ... --narrative ... (repeat --highlight for each bullet; see release-policy) |
| Check tooling (pandoc) | just check-tooling |
| Extract from source | just extract-from-source <paper_id> (pandoc + source/main.tex) |
| Build Verso | just build-verso (see Verso integration (optional)) |
| MCP server | just mcp-server (optional; tools: list_declarations_for_paper, list_declarations_in_file, get_dependency_graph_for_declaration; requires uv sync --extra mcp) |
| LLM Lean suggestions (optional) | just llm-lean-proposals <paper_id> then review llm_lean_proposals.json; convert + proof-repair-apply per prime-intellect-llm.md |
| Full pre-PR check | just check |
Milestone 3 (20–40 machine-checked declarations) is exceeded at 62 machine-checked declarations across six formalized core papers (with two additional stress scaffolds currently intake-only; see status/repo-snapshot.md, regenerated with just repo-snapshot). See ROADMAP for the content sprint checklist and stretch goals. Use the "Content sprint (Milestone 3)" issue template when opening a content-focused issue. The portal dashboard and just metrics --proof-completion reflect declaration totals.