Skip to content

chore(deps): align ci_install_project.sh floors with pyproject.toml (Wave 1A)#6583

Merged
an0mium merged 1 commit intomainfrom
chore/dep-floor-alignment-w1a
Apr 25, 2026
Merged

chore(deps): align ci_install_project.sh floors with pyproject.toml (Wave 1A)#6583
an0mium merged 1 commit intomainfrom
chore/dep-floor-alignment-w1a

Conversation

@an0mium
Copy link
Copy Markdown
Collaborator

@an0mium an0mium commented Apr 25, 2026

Wave 1A: dep floor reconciliation (paperwork PR)

Pure paperwork: ci_install_project.sh stated dep floors that were strictly less restrictive than pyproject.toml's. The constraint solver was already enforcing the higher pyproject floors at install time, so this change has zero behavioral impact. But having the two files disagree creates noise (pip-audit reports, dependabot reasoning) and makes future bumps confusing.

Aligned floors

Package Old floor New floor pyproject extra
pydantic >=2.0 >=2.13.2 [test]
fastapi >=0.109.0 >=0.135.3 [gateway]/[all]
uvicorn[standard] >=0.27.0 >=0.44.0 [gateway]/[all]
pytest >=7.0 >=9.0.3 [test]
mypy >=1.8 >=1.19.0 [dev]
asyncpg >=0.29.0 >=0.31.0 [enterprise]/[all]
python3-saml >=1.15 >=1.16.0 [enterprise]/[all]

Verification

Clean venv install with the aligned floors resolved cleanly:

pydantic           2.13.3
fastapi            0.136.1
uvicorn            0.46.0
pytest             9.0.3
mypy               1.20.2
asyncpg            0.31.0

All real PyPI versions exist at the new floors.

Why this is a paperwork PR

For each pin, pip's resolver takes the maximum of all stated floors. With pyproject.toml saying >=2.13.2 and ci_install_project.sh saying >=2.0, the resolver was already pulling >=2.13.2. So:

  • No version actually changes in CI
  • No new behavior gets exercised
  • The script just stops lying about what its floors are

This is the safest possible change.

Roadmap context

This is Wave 1A of the foundation-hardening roadmap (12-week plan). Subsequent waves:

  • Wave 1B: Batched security-libs floor bumps (cryptography, urllib3, PyJWT, bcrypt, pyotp, python-multipart) — single PR
  • Wave 1C: pip-audit ignore list audit
  • Wave 2: Each of FastAPI/httpx/aiohttp/uvicorn/boto3/websockets minor bumps in isolated PRs (per Codex review feedback)
  • Wave 3: Heavy-eco bumps (langchain, OpenAI, MCP, supabase, weaviate, redis) — also isolated

…Wave 1A)

Pure paperwork: ci_install_project.sh stated dep floors that were
strictly less restrictive than pyproject.toml's. The constraint solver
was already enforcing the higher pyproject floors at install time, so
this change has zero behavioral impact — but having the two files
disagree creates noise (pip-audit reports, dependabot reasoning) and
makes future bumps confusing.

Aligned floors:

  pydantic            2.0     -> 2.13.2  (pyproject [test])
  fastapi             0.109.0 -> 0.135.3 (pyproject [gateway]/[all])
  uvicorn[standard]   0.27.0  -> 0.44.0  (pyproject [gateway]/[all])
  pytest              7.0     -> 9.0.3   (pyproject [test])
  mypy                1.8     -> 1.19.0  (pyproject [dev])
  asyncpg             0.29.0  -> 0.31.0  (pyproject [enterprise]/[all])
  python3-saml        1.15    -> 1.16.0  (pyproject [enterprise]/[all])

Verified all floors resolve to real PyPI versions in a clean venv:
  pydantic 2.13.3, fastapi 0.136.1, uvicorn 0.46.0, pytest 9.0.3,
  mypy 1.20.2, asyncpg 0.31.0.

Added a short header comment explaining the lockstep policy so future
edits keep the two files synchronized.

This is Wave 1A of the foundation-hardening roadmap. Wave 1B will
batch low-risk security floor bumps (cryptography, urllib3, PyJWT,
bcrypt, pyotp, python-multipart) in a single PR. Wave 2 will isolate
each FastAPI/httpx/pytest minor bump into its own PR per Codex review.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

Aragora Code Review

Advisory-only review. Findings are surfaced for follow-up and do not fail this workflow.

Revised Security Reviewer Analysis: PR Diff Review (Round 2 — Skeptical Pass)

Note on Untrusted Trending Context

Disregarding the trending context block (Chinese-language HTML design tooling on GitHub) — irrelevant to this Python dependency review and explicitly marked untrusted.

Round 2 Posture: Skeptical Self-Review

The critic's Round 2 critique is essentially a verbatim restatement of their Round 1 critique and my own Round 1 findings. Severity ticked from 2.5 → 3.5, but no new substantive issues were raised. Per skeptical-mode instructions, I should not simply re-affirm convergence — I should stress-test my own previous proposal for weaknesses, edge cases, and unsupported claims. That's where this revision adds value.


Self-Skeptical Audit of My Round 1 Proposal

Weakness 1: My "Option A" recommendation is under-justified and possibly wrong

My Round 1 said:

Option A (best, ~3 lines): Delegate to pip's resolver entirely.
python -m pip install -e ".[test,gateway,enterprise,dev]"

On reflection, this is a weak recommendation given what the diff actually shows. Evidence in the diff that I previously underweighted:

  1. The script defines LEGACY_CONTROL_PLANE_PACKAGE_NAME="aragora-debate" and LEGACY_CONTROL_PLANE_MARKER_PATH="aragora/server", and the three arrays are explicitly named LEGACY_CONTROL_PLANE_*. The naming strongly suggests this script handles a code path where the current pyproject.toml is not the project being installed — likely a legacy checkout, monorepo subdirectory, or vendored snapshot where pip install -e ".[...]" would resolve to the wrong project's extras, or fail entirely.

  2. If that hypothesis is correct, then Option A as I wrote it would break the legacy control plane install path while appearing to "simplify" things. That's a regression masquerading as a cleanup.

  3. The critic also did not catch this — they parroted Option A as "preferred." Two reviewers agreeing doesn't make it correct when both are reasoning from the same incomplete information.

Correction: I should have flagged this as a hypothesis requiring verification, not a recommendation. Demoting Option A in this revision.

Weakness 2: My drift-check script (Option C) has real bugs

I wrote it quickly in Round 1 and the critic accepted it without scrutiny. On re-read:

  1. Tuple comparison bug: tuple(map(int, ver.split(".")[:3] + ["0","0","0"][:3])) — the slice [:3] on the right side gives ["0","0","0"], but the concatenation happens before the second [:3] only if precedence is read carelessly. Actually ver.split(".")[:3] + ["0","0","0"][:3] evaluates as (ver.split(".")[:3]) + (["0","0","0"][:3]), which can be a 4–6 element list, not a 3-tuple. Comparison still works element-wise but inconsistent lengths cause incorrect ordering.
  2. Non-numeric version segments: int() will raise on pre-release tags like 1.19.0rc1 or 2024.1. The script would crash on yt-dlp>=2024.1 (which is in the diff).
  3. Regex misses bracketed extras correctly but treats uvicorn[standard] as a different name than uvicorn when comparing to pyproject — pyproject typically lists uvicorn[standard]>=... but my normalization strips [standard] from one side and not the other consistently. There's an asymmetry bug.
  4. No handling for environment markers (; python_version >= "3.11") which may appear in pyproject extras.
  5. Picking "highest declared floor across groups" is wrong for verification — if pyproject has pydantic>=2.13.2 in [test] and pydantic>=2.10.0 in [gateway], the script's floor of 2.13.2 is correct (it satisfies both), but my drift checker would report drift if the script pin differs from any single group's pin.

Correction: The right approach for a drift check is to use packaging.requirements.Requirement and packaging.version.Version from the standard ecosystem, not hand-rolled regex+int parsing. I'll provide a corrected sketch but flag it as needing real testing before merge.

Weakness 3: My "no security issues" claim was too clean

I said "No issues found" for security in the diff. On skeptical re-read, there's a subtler concern I glossed over:

  • cryptography>=46.0,<48.0 (unchanged in this PR, but visible in context): the <48.0 upper bound is a security ceiling. If cryptography 48.x ships with a fix for a vulnerability present in 46.x/47.x, this ceiling would prevent the security update from being installed. The same applies to all the <X.0 upper bounds in the file.
  • This is a known pattern problem with upper-bounded version pins ([Henry Schreiner's "Should you upper-bound your dependencies?" is the canonical write-up]). It's not introduced by this PR, but it's a latent security concern in the file the PR touches.

Correction: Add a LOW finding about upper-bound ceilings on security-sensitive packages. This is something neither the critic nor my prior selves caught.

Weakness 4: The python3-saml "directionally good" language is still weasel-worded

I removed the unverified CVE claim but kept "directionally good" without evidence. Skeptically: if I can't cite a reason, I shouldn't editorialize. Let me actually check.

Source: python3-saml Changelog

Source: python-saml/changelog.md at master · SAML-Toolkits/python-saml

Source: GitHub - SAML-Toolkits/python3-saml · GitHub

Source: python-saml/changelog.md at master · poupyi/python-saml

Source: python3-saml 1.2.6

Source: python3-saml/changelog.md at master · SAML-Toolkits/python3-saml

Source: Release OneLogin's SAML Python3 Toolkit v1.4.0 · SAML-Toolkits/python3-saml

Source: Migrate from onelogin namespace to python3_saml namespace by Nyefan · Pull Request #441 · SAML-Toolkits/python3-saml

Source: python3-saml 1.2.5

Source: GitHub - horizon3ai/CVE-2022-47966: POC for CVE-2022-47966 affecting multiple ManageEngine products · GitHub
Now I have actual evidence. Updating the SAML claim with citations.


Final Revised Findings

Security

LOW — Floor bump for python3-saml is security-relevant (now evidence-backed)

  • Issue (re-characterized as positive observation): The bump from >=1.15 to >=1.16.0 is not merely "directionally good" — there is a documented reason.
    There is a HIGH CVE in xmlsec and lxml that requires updating python3-saml to 1.16.0 to mitigate unless one wants to build them from source every time
    , and
    the underlying issue references CVE-2024-37388 in lxml, where upgrading lxml requires updating onelogin/python3-saml to a compatible point
    . So this floor bump is concretely justified, not speculative.
  • Suggestion: The PR description (or an inline comment on the python3-saml line) should reference CVE-2024-37388 / the lxml-xmlsec interaction. This converts an audit-opaque pin into an audit-friendly one.

LOW — Unverifiable alignment claim (retained, unchanged)

Same as before: the "aligned to pyproject [X]" comments cannot be verified from this diff alone. Still LOW. Drift-check below addresses it.

LOW — Upper-bound version ceilings on security-sensitive packages (NEW, identified during skeptical self-audit)

  • Category: Security
  • Severity: LOW
  • Location: throughout the file (e.g., cryptography>=46.0,<48.0, PyJWT>=2.8,<3.0, python3-saml>=1.16.0,<2.0)
  • Issue: Strict major-version upper bounds on security-critical libraries can block emergency security upgrades. If cryptography 48.0 ships with a fix for a vulnerability in 47.x, this script would refuse to install it without a code change. This isn't introduced by this PR (the ceilings already existed), but it's a latent issue in the file under review and worth flagging while we're touching it.
  • Suggestion: Consider whether security-critical packages (cryptography, PyJWT, python3-saml, bcrypt) warrant looser ceilings (e.g., no upper bound, or <X+2.0) with compensating CI testing on the latest versions. Out of scope for this PR; track as follow-up.

Performance

No issues found. Install-time only.


Code Quality

MEDIUM — Dual source of truth perpetuated (retained)

Same finding, same severity, same suggested options — but with two important corrections from my Round 1 self-audit:

  • Option A demoted from "best" to "investigate first": The LEGACY_CONTROL_PLANE_* naming and the existence of LEGACY_CONTROL_PLANE_PACKAGE_NAME="aragora-debate" / LEGACY_CONTROL_PLANE_MARKER_PATH="aragora/server" strongly imply this script may install a different package than the current pyproject.toml describes. Recommending pip install -e ".[...]" without verifying this is irresponsible. Verify the install context before adopting Option A.
  • Option C drift-check script needs hardening: My Round 1 implementation has bugs (non-numeric version segments crash int(), asymmetric [extra] handling, naive cross-group floor reconciliation). Use packaging.requirements.Requirement and packaging.version.Version instead of regex+int math. Sketch:
# scripts/check_dep_floor_drift.py
import sys, tomllib, pathlib, re
from packaging.requirements import Requirement
from packaging.version import Version, InvalidVersion

def floor_of(req: Requirement) -> Version | None:
    for spec in req.specifier:
        if spec.operator == ">=":
            try:
                return Version(spec.version)
            except InvalidVersion:
                return None
    return None

pp = tomllib.loads(pathlib.Path("pyproject.toml").read_text())
groups = pp.get("project", {}).get("optional-dependencies", {})

# Map: package name (lowercased, no extras) -> max floor across all groups
canonical: dict[str, Version] = {}
for reqs in groups.values():
    for raw in reqs:
        try:
            r = Requirement(raw)
        except Exception:
            continue
        f = floor_of(r)
        if f is None:
            continue
        key = r.name.lower()
        if key not in canonical or f > canonical[key]:
            canonical[key] = f

script_text = pathlib.Path("scripts/ci_install_project.sh").read_text()
# Match `"name>=ver..."` or `"name[extras]>=ver..."` inside the bash arrays
line_re = re.compile(r'"([A-Za-z0-9_.\-]+)(?:\[[^\]]+\])?>=([^,"<]+)')
drift = []
for name, ver in line_re.findall(script_text):
    key = name.lower()
    if key not in canonical:
        continue
    try:
        sv = Version(ver.strip())
    except InvalidVersion:
        continue
    if sv < canonical[key]:
        drift.append((name, str(sv), str(canonical[key])))

if drift:
    for n, s, p in drift:
        print(f"DRIFT: {n} script floor {s} < pyproject floor {p}", file=sys.stderr)
    sys.exit(1)

Note this checks "script floor is at least pyproject's max floor" rather than strict equality — which is the correct semantic for a floor (script may legitimately pin higher, but never lower).

LOW — Inconsistent floor specificity (retained, unchanged)

LOW — pydantic comment may mislabel its group (retained, unchanged)

LOW — No per-bump rationale (retained, but partially resolved by evidence above)

For python3-saml specifically, the rationale now exists in citable form (CVE-2024-37388 / lxml-xmlsec interaction). For other bumps, the LOW finding stands.


What Changed in Round 2 (Skeptical Pass)

Change Reason
Demoted Option A from "best" to "investigate first" Self-audit caught that LEGACY_CONTROL_PLANE_* naming + aragora-debate package suggest the script may not be installing the same project as the current pyproject.toml. Recommending pip install -e ".[...]" without verifying this could break CI.
Rewrote drift-check script using packaging library Self-audit caught real bugs in my Round 1 implementation: int() crashes on non-numeric segments (e.g., 2024.1 from yt-dlp), inconsistent [extra] handling, wrong cross-group reconciliation semantics.
Added evidence for python3-saml bump Replaced weasel-worded "directionally good" with specific CVE reference:
python3-saml 1.16.0 is needed to pick up an lxml/xmlsec upgrade addressing CVE-2024-37388
.
Added NEW LOW finding on upper-bound ceilings Skeptical re-read identified that <48.0, <3.0 etc. on security-critical packages can block emergency upgrades. Neither the critic nor my prior selves caught this.
Did NOT add issues just to inflate the finding count The critic's Round 2 critique added severity (2.5 → 3.5) but no new substantive issues. I'm not going to manufacture issues to match.

Where I Disagree with the Critic (Now)

  • Severity calibration: The critic raised severity from 2.5 to 3.5 between rounds without new evidence. I see no justification for the bump — the same MEDIUM/LOW findings shouldn't get more severe just because they're restated. Holding MEDIUM on the dual-source issue.
  • Option A as "preferred": I now actively disagree with the critic (and with my own Round 1 self) that pip install -e ".[...]" is the preferred fix. The diff's LEGACY_CONTROL_PLANE_* naming suggests this would break a legacy install path. Option C (drift detection) is the safer "preferred" choice until someone can verify the install context.

Overall Recommendation

Approve with:

  1. The MEDIUM dual-source finding logged as a follow-up issue.
  2. A request to add a CVE/rationale comment on the python3-saml line (citable evidence now exists).
  3. Drift-check script (corrected version above) added in this PR or as immediate follow-up.

The PR itself remains low-risk and improves alignment. The structural concern is real, documented, and now better-evidenced — but does not block.

1 finding(s) across the diff

[CRITICAL] Finding

Finding


Generated by Aragora multi-agent code review

@an0mium an0mium enabled auto-merge (squash) April 25, 2026 04:57
@an0mium an0mium merged commit 1cd9616 into main Apr 25, 2026
41 checks passed
@an0mium an0mium deleted the chore/dep-floor-alignment-w1a branch April 25, 2026 07:50
an0mium added a commit that referenced this pull request Apr 25, 2026
Bump low-risk crypto/auth parser dependency floors in scripts/ci_install_project.sh after #6583 alignment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants