chore(deps): align ci_install_project.sh floors with pyproject.toml (Wave 1A) by an0mium · Pull Request #6583 · synaptent/aragora

an0mium · 2026-04-25T04:49:58Z

Wave 1A: dep floor reconciliation (paperwork PR)

Pure paperwork: ci_install_project.sh stated dep floors that were strictly less restrictive than pyproject.toml's. The constraint solver was already enforcing the higher pyproject floors at install time, so this change has zero behavioral impact. But having the two files disagree creates noise (pip-audit reports, dependabot reasoning) and makes future bumps confusing.

Aligned floors

Package	Old floor	New floor	pyproject extra
`pydantic`	`>=2.0`	`>=2.13.2`	[test]
`fastapi`	`>=0.109.0`	`>=0.135.3`	[gateway]/[all]
`uvicorn[standard]`	`>=0.27.0`	`>=0.44.0`	[gateway]/[all]
`pytest`	`>=7.0`	`>=9.0.3`	[test]
`mypy`	`>=1.8`	`>=1.19.0`	[dev]
`asyncpg`	`>=0.29.0`	`>=0.31.0`	[enterprise]/[all]
`python3-saml`	`>=1.15`	`>=1.16.0`	[enterprise]/[all]

Verification

Clean venv install with the aligned floors resolved cleanly:

pydantic           2.13.3
fastapi            0.136.1
uvicorn            0.46.0
pytest             9.0.3
mypy               1.20.2
asyncpg            0.31.0

All real PyPI versions exist at the new floors.

Why this is a paperwork PR

For each pin, pip's resolver takes the maximum of all stated floors. With pyproject.toml saying >=2.13.2 and ci_install_project.sh saying >=2.0, the resolver was already pulling >=2.13.2. So:

No version actually changes in CI
No new behavior gets exercised
The script just stops lying about what its floors are

This is the safest possible change.

Roadmap context

This is Wave 1A of the foundation-hardening roadmap (12-week plan). Subsequent waves:

Wave 1B: Batched security-libs floor bumps (cryptography, urllib3, PyJWT, bcrypt, pyotp, python-multipart) — single PR
Wave 1C: pip-audit ignore list audit
Wave 2: Each of FastAPI/httpx/aiohttp/uvicorn/boto3/websockets minor bumps in isolated PRs (per Codex review feedback)
Wave 3: Heavy-eco bumps (langchain, OpenAI, MCP, supabase, weaviate, redis) — also isolated

…Wave 1A) Pure paperwork: ci_install_project.sh stated dep floors that were strictly less restrictive than pyproject.toml's. The constraint solver was already enforcing the higher pyproject floors at install time, so this change has zero behavioral impact — but having the two files disagree creates noise (pip-audit reports, dependabot reasoning) and makes future bumps confusing. Aligned floors: pydantic 2.0 -> 2.13.2 (pyproject [test]) fastapi 0.109.0 -> 0.135.3 (pyproject [gateway]/[all]) uvicorn[standard] 0.27.0 -> 0.44.0 (pyproject [gateway]/[all]) pytest 7.0 -> 9.0.3 (pyproject [test]) mypy 1.8 -> 1.19.0 (pyproject [dev]) asyncpg 0.29.0 -> 0.31.0 (pyproject [enterprise]/[all]) python3-saml 1.15 -> 1.16.0 (pyproject [enterprise]/[all]) Verified all floors resolve to real PyPI versions in a clean venv: pydantic 2.13.3, fastapi 0.136.1, uvicorn 0.46.0, pytest 9.0.3, mypy 1.20.2, asyncpg 0.31.0. Added a short header comment explaining the lockstep policy so future edits keep the two files synchronized. This is Wave 1A of the foundation-hardening roadmap. Wave 1B will batch low-risk security floor bumps (cryptography, urllib3, PyJWT, bcrypt, pyotp, python-multipart) in a single PR. Wave 2 will isolate each FastAPI/httpx/pytest minor bump into its own PR per Codex review. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

github-actions · 2026-04-25T04:56:45Z

Aragora Code Review

Advisory-only review. Findings are surfaced for follow-up and do not fail this workflow.

Revised Security Reviewer Analysis: PR Diff Review (Round 2 — Skeptical Pass)

Note on Untrusted Trending Context

Disregarding the trending context block (Chinese-language HTML design tooling on GitHub) — irrelevant to this Python dependency review and explicitly marked untrusted.

Round 2 Posture: Skeptical Self-Review

The critic's Round 2 critique is essentially a verbatim restatement of their Round 1 critique and my own Round 1 findings. Severity ticked from 2.5 → 3.5, but no new substantive issues were raised. Per skeptical-mode instructions, I should not simply re-affirm convergence — I should stress-test my own previous proposal for weaknesses, edge cases, and unsupported claims. That's where this revision adds value.

Self-Skeptical Audit of My Round 1 Proposal

Weakness 1: My "Option A" recommendation is under-justified and possibly wrong

My Round 1 said:

Option A (best, ~3 lines): Delegate to pip's resolver entirely.
python -m pip install -e ".[test,gateway,enterprise,dev]"

On reflection, this is a weak recommendation given what the diff actually shows. Evidence in the diff that I previously underweighted:

The script defines LEGACY_CONTROL_PLANE_PACKAGE_NAME="aragora-debate" and LEGACY_CONTROL_PLANE_MARKER_PATH="aragora/server", and the three arrays are explicitly named LEGACY_CONTROL_PLANE_*. The naming strongly suggests this script handles a code path where the current pyproject.toml is not the project being installed — likely a legacy checkout, monorepo subdirectory, or vendored snapshot where pip install -e ".[...]" would resolve to the wrong project's extras, or fail entirely.
If that hypothesis is correct, then Option A as I wrote it would break the legacy control plane install path while appearing to "simplify" things. That's a regression masquerading as a cleanup.
The critic also did not catch this — they parroted Option A as "preferred." Two reviewers agreeing doesn't make it correct when both are reasoning from the same incomplete information.

Correction: I should have flagged this as a hypothesis requiring verification, not a recommendation. Demoting Option A in this revision.

Weakness 2: My drift-check script (Option C) has real bugs

I wrote it quickly in Round 1 and the critic accepted it without scrutiny. On re-read:

Tuple comparison bug: tuple(map(int, ver.split(".")[:3] + ["0","0","0"][:3])) — the slice [:3] on the right side gives ["0","0","0"], but the concatenation happens before the second [:3] only if precedence is read carelessly. Actually ver.split(".")[:3] + ["0","0","0"][:3] evaluates as (ver.split(".")[:3]) + (["0","0","0"][:3]), which can be a 4–6 element list, not a 3-tuple. Comparison still works element-wise but inconsistent lengths cause incorrect ordering.
Non-numeric version segments: int() will raise on pre-release tags like 1.19.0rc1 or 2024.1. The script would crash on yt-dlp>=2024.1 (which is in the diff).
Regex misses bracketed extras correctly but treats uvicorn[standard] as a different name than uvicorn when comparing to pyproject — pyproject typically lists uvicorn[standard]>=... but my normalization strips [standard] from one side and not the other consistently. There's an asymmetry bug.
No handling for environment markers (; python_version >= "3.11") which may appear in pyproject extras.
Picking "highest declared floor across groups" is wrong for verification — if pyproject has pydantic>=2.13.2 in [test] and pydantic>=2.10.0 in [gateway], the script's floor of 2.13.2 is correct (it satisfies both), but my drift checker would report drift if the script pin differs from any single group's pin.

Correction: The right approach for a drift check is to use packaging.requirements.Requirement and packaging.version.Version from the standard ecosystem, not hand-rolled regex+int parsing. I'll provide a corrected sketch but flag it as needing real testing before merge.

Weakness 3: My "no security issues" claim was too clean

I said "No issues found" for security in the diff. On skeptical re-read, there's a subtler concern I glossed over:

cryptography>=46.0,<48.0 (unchanged in this PR, but visible in context): the <48.0 upper bound is a security ceiling. If cryptography 48.x ships with a fix for a vulnerability present in 46.x/47.x, this ceiling would prevent the security update from being installed. The same applies to all the <X.0 upper bounds in the file.
This is a known pattern problem with upper-bounded version pins ([Henry Schreiner's "Should you upper-bound your dependencies?" is the canonical write-up]). It's not introduced by this PR, but it's a latent security concern in the file the PR touches.

Correction: Add a LOW finding about upper-bound ceilings on security-sensitive packages. This is something neither the critic nor my prior selves caught.

Weakness 4: The `python3-saml` "directionally good" language is still weasel-worded

I removed the unverified CVE claim but kept "directionally good" without evidence. Skeptically: if I can't cite a reason, I shouldn't editorialize. Let me actually check.

Source: python3-saml Changelog

Source: python-saml/changelog.md at master · SAML-Toolkits/python-saml

Source: GitHub - SAML-Toolkits/python3-saml · GitHub

Source: python-saml/changelog.md at master · poupyi/python-saml

Source: python3-saml 1.2.6

Source: python3-saml/changelog.md at master · SAML-Toolkits/python3-saml

Source: Release OneLogin's SAML Python3 Toolkit v1.4.0 · SAML-Toolkits/python3-saml

Source: Migrate from onelogin namespace to python3_saml namespace by Nyefan · Pull Request #441 · SAML-Toolkits/python3-saml

Source: python3-saml 1.2.5

Source: GitHub - horizon3ai/CVE-2022-47966: POC for CVE-2022-47966 affecting multiple ManageEngine products · GitHub
Now I have actual evidence. Updating the SAML claim with citations.

Final Revised Findings

Security

LOW — Floor bump for `python3-saml` is security-relevant (now evidence-backed)

Issue (re-characterized as positive observation): The bump from >=1.15 to >=1.16.0 is not merely "directionally good" — there is a documented reason.
There is a HIGH CVE in xmlsec and lxml that requires updating python3-saml to 1.16.0 to mitigate unless one wants to build them from source every time
, and
the underlying issue references CVE-2024-37388 in lxml, where upgrading lxml requires updating onelogin/python3-saml to a compatible point
. So this floor bump is concretely justified, not speculative.
Suggestion: The PR description (or an inline comment on the python3-saml line) should reference CVE-2024-37388 / the lxml-xmlsec interaction. This converts an audit-opaque pin into an audit-friendly one.

LOW — Unverifiable alignment claim (retained, unchanged)

Same as before: the "aligned to pyproject [X]" comments cannot be verified from this diff alone. Still LOW. Drift-check below addresses it.

LOW — Upper-bound version ceilings on security-sensitive packages (NEW, identified during skeptical self-audit)

Category: Security
Severity: LOW
Location: throughout the file (e.g., cryptography>=46.0,<48.0, PyJWT>=2.8,<3.0, python3-saml>=1.16.0,<2.0)
Issue: Strict major-version upper bounds on security-critical libraries can block emergency security upgrades. If cryptography 48.0 ships with a fix for a vulnerability in 47.x, this script would refuse to install it without a code change. This isn't introduced by this PR (the ceilings already existed), but it's a latent issue in the file under review and worth flagging while we're touching it.
Suggestion: Consider whether security-critical packages (cryptography, PyJWT, python3-saml, bcrypt) warrant looser ceilings (e.g., no upper bound, or <X+2.0) with compensating CI testing on the latest versions. Out of scope for this PR; track as follow-up.

Performance

No issues found. Install-time only.

Code Quality

MEDIUM — Dual source of truth perpetuated (retained)

Same finding, same severity, same suggested options — but with two important corrections from my Round 1 self-audit:

Option A demoted from "best" to "investigate first": The LEGACY_CONTROL_PLANE_* naming and the existence of LEGACY_CONTROL_PLANE_PACKAGE_NAME="aragora-debate" / LEGACY_CONTROL_PLANE_MARKER_PATH="aragora/server" strongly imply this script may install a different package than the current pyproject.toml describes. Recommending pip install -e ".[...]" without verifying this is irresponsible. Verify the install context before adopting Option A.
Option C drift-check script needs hardening: My Round 1 implementation has bugs (non-numeric version segments crash int(), asymmetric [extra] handling, naive cross-group floor reconciliation). Use packaging.requirements.Requirement and packaging.version.Version instead of regex+int math. Sketch:

# scripts/check_dep_floor_drift.py
import sys, tomllib, pathlib, re
from packaging.requirements import Requirement
from packaging.version import Version, InvalidVersion

def floor_of(req: Requirement) -> Version | None:
    for spec in req.specifier:
        if spec.operator == ">=":
            try:
                return Version(spec.version)
            except InvalidVersion:
                return None
    return None

pp = tomllib.loads(pathlib.Path("pyproject.toml").read_text())
groups = pp.get("project", {}).get("optional-dependencies", {})

# Map: package name (lowercased, no extras) -> max floor across all groups
canonical: dict[str, Version] = {}
for reqs in groups.values():
    for raw in reqs:
        try:
            r = Requirement(raw)
        except Exception:
            continue
        f = floor_of(r)
        if f is None:
            continue
        key = r.name.lower()
        if key not in canonical or f > canonical[key]:
            canonical[key] = f

script_text = pathlib.Path("scripts/ci_install_project.sh").read_text()
# Match `"name>=ver..."` or `"name[extras]>=ver..."` inside the bash arrays
line_re = re.compile(r'"([A-Za-z0-9_.\-]+)(?:\[[^\]]+\])?>=([^,"<]+)')
drift = []
for name, ver in line_re.findall(script_text):
    key = name.lower()
    if key not in canonical:
        continue
    try:
        sv = Version(ver.strip())
    except InvalidVersion:
        continue
    if sv < canonical[key]:
        drift.append((name, str(sv), str(canonical[key])))

if drift:
    for n, s, p in drift:
        print(f"DRIFT: {n} script floor {s} < pyproject floor {p}", file=sys.stderr)
    sys.exit(1)

Note this checks "script floor is at least pyproject's max floor" rather than strict equality — which is the correct semantic for a floor (script may legitimately pin higher, but never lower).

LOW — Inconsistent floor specificity (retained, unchanged)

LOW — `pydantic` comment may mislabel its group (retained, unchanged)

LOW — No per-bump rationale (retained, but partially resolved by evidence above)

For python3-saml specifically, the rationale now exists in citable form (CVE-2024-37388 / lxml-xmlsec interaction). For other bumps, the LOW finding stands.

What Changed in Round 2 (Skeptical Pass)

Change	Reason
Demoted Option A from "best" to "investigate first"	Self-audit caught that `LEGACY_CONTROL_PLANE_*` naming + `aragora-debate` package suggest the script may not be installing the same project as the current `pyproject.toml`. Recommending `pip install -e ".[...]"` without verifying this could break CI.
Rewrote drift-check script using `packaging` library	Self-audit caught real bugs in my Round 1 implementation: `int()` crashes on non-numeric segments (e.g., `2024.1` from yt-dlp), inconsistent `[extra]` handling, wrong cross-group reconciliation semantics.
Added evidence for `python3-saml` bump	Replaced weasel-worded "directionally good" with specific CVE reference:
python3-saml 1.16.0 is needed to pick up an lxml/xmlsec upgrade addressing CVE-2024-37388
.
Added NEW LOW finding on upper-bound ceilings	Skeptical re-read identified that `<48.0`, `<3.0` etc. on security-critical packages can block emergency upgrades. Neither the critic nor my prior selves caught this.
Did NOT add issues just to inflate the finding count	The critic's Round 2 critique added severity (2.5 → 3.5) but no new substantive issues. I'm not going to manufacture issues to match.

Where I Disagree with the Critic (Now)

Severity calibration: The critic raised severity from 2.5 to 3.5 between rounds without new evidence. I see no justification for the bump — the same MEDIUM/LOW findings shouldn't get more severe just because they're restated. Holding MEDIUM on the dual-source issue.
Option A as "preferred": I now actively disagree with the critic (and with my own Round 1 self) that pip install -e ".[...]" is the preferred fix. The diff's LEGACY_CONTROL_PLANE_* naming suggests this would break a legacy install path. Option C (drift detection) is the safer "preferred" choice until someone can verify the install context.

Overall Recommendation

Approve with:

The MEDIUM dual-source finding logged as a follow-up issue.
A request to add a CVE/rationale comment on the python3-saml line (citable evidence now exists).
Drift-check script (corrected version above) added in this PR or as immediate follow-up.

The PR itself remains low-risk and improves alignment. The structural concern is real, documented, and now better-evidenced — but does not block.

1 finding(s) across the diff

[CRITICAL] Finding

Finding

Generated by Aragora multi-agent code review

Bump low-risk crypto/auth parser dependency floors in scripts/ci_install_project.sh after #6583 alignment.

an0mium mentioned this pull request Apr 25, 2026

chore(deps): batched security-floor bumps for crypto/auth libs (Wave 1B) #6584

Merged

an0mium enabled auto-merge (squash) April 25, 2026 04:57

an0mium mentioned this pull request Apr 25, 2026

[automation] Stage-Gate Conductor Log #6432

Open

an0mium merged commit 1cd9616 into main Apr 25, 2026
41 checks passed

an0mium deleted the chore/dep-floor-alignment-w1a branch April 25, 2026 07:50

an0mium added a commit that referenced this pull request Apr 25, 2026

chore(deps): raise security dependency floors

1d1ccb6

Bump low-risk crypto/auth parser dependency floors in scripts/ci_install_project.sh after #6583 alignment.

an0mium mentioned this pull request Apr 25, 2026

[automation] Founder Brief #6106

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(deps): align ci_install_project.sh floors with pyproject.toml (Wave 1A)#6583

chore(deps): align ci_install_project.sh floors with pyproject.toml (Wave 1A)#6583
an0mium merged 1 commit intomainfrom
chore/dep-floor-alignment-w1a

an0mium commented Apr 25, 2026

Uh oh!

github-actions Bot commented Apr 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

an0mium commented Apr 25, 2026

Wave 1A: dep floor reconciliation (paperwork PR)

Aligned floors

Verification

Why this is a paperwork PR

Roadmap context

Uh oh!

github-actions Bot commented Apr 25, 2026

Aragora Code Review

Revised Security Reviewer Analysis: PR Diff Review (Round 2 — Skeptical Pass)

Note on Untrusted Trending Context

Round 2 Posture: Skeptical Self-Review

Self-Skeptical Audit of My Round 1 Proposal

Weakness 1: My "Option A" recommendation is under-justified and possibly wrong

Weakness 2: My drift-check script (Option C) has real bugs

Weakness 3: My "no security issues" claim was too clean

Weakness 4: The python3-saml "directionally good" language is still weasel-worded

Final Revised Findings

Security

LOW — Floor bump for python3-saml is security-relevant (now evidence-backed)

LOW — Unverifiable alignment claim (retained, unchanged)

LOW — Upper-bound version ceilings on security-sensitive packages (NEW, identified during skeptical self-audit)

Performance

Code Quality

MEDIUM — Dual source of truth perpetuated (retained)

LOW — Inconsistent floor specificity (retained, unchanged)

LOW — pydantic comment may mislabel its group (retained, unchanged)

LOW — No per-bump rationale (retained, but partially resolved by evidence above)

What Changed in Round 2 (Skeptical Pass)

Where I Disagree with the Critic (Now)

Overall Recommendation

[CRITICAL] Finding

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Weakness 4: The `python3-saml` "directionally good" language is still weasel-worded

LOW — Floor bump for `python3-saml` is security-relevant (now evidence-backed)

LOW — `pydantic` comment may mislabel its group (retained, unchanged)