fix: enforce correct pass thresholds for deterministic and approximate tests (core_r0.17.0) #4314

Draft
ko3n1g wants to merge 2 commits into NVIDIA:core_r0.17.0 from ko3n1g:ko3n1g/fix/enforce-pass-thresholds-core_r0.17.0

ko3n1g commented Apr 15, 2026

Summary

Cherry-picks from `main` onto `core_r0.17.0`:

| Commit | Description |
|---|---|
| `4e85d74` | fix: enforce correct pass thresholds for deterministic and approximate tests (`common.py` only) |
| `76371d4` | Fix UT timeout (`test_attention_variant_dsa.py`) |

`4e85d74` — Pass threshold fixes in `common.py`

  • Deterministic tests (`TypeOfTestResult.DETERMINISTIC`) and single-step evaluations now require every step to pass (`np.all(is_close)`) instead of using a loose ratio-based threshold.
  • Approximate tests keep the ratio-based check, but with the corrected formula: `>= 1 - (num_failing_steps_allowed / total_steps_evaluated)` instead of `>= (num_failing_steps_allowed / total_steps_evaluated)`.

Example (why the old formula was wrong), with `num_failing_steps_allowed = 1` and `total_steps_evaluated = 100`:
```
Old (wrong):   passes if mean(is_close) >= 1/100 = 0.01      → almost always passes
New (correct): passes if mean(is_close) >= 1 - 1/100 = 0.99  → requires at least 99 of 100 steps to pass
```
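
For reviewers, a minimal sketch of the two checks as described above. This is not the code in `common.py`: the function names `passes_new` and `passes_old` and the boolean `deterministic` flag are hypothetical stand-ins, and only `is_close`, `num_failing_steps_allowed`, and `total_steps_evaluated` are taken from this description.

```python
import numpy as np


def passes_old(is_close, num_failing_steps_allowed):
    """Old (wrong) ratio check: the allowed-failure ratio was used as the pass ratio."""
    is_close = np.asarray(is_close, dtype=bool)
    total_steps_evaluated = is_close.size
    return bool(np.mean(is_close) >= num_failing_steps_allowed / total_steps_evaluated)


def passes_new(is_close, num_failing_steps_allowed, deterministic=False):
    """Corrected check (hypothetical helper, assumed shape of the logic).

    `deterministic` stands in for a TypeOfTestResult.DETERMINISTIC test.
    """
    is_close = np.asarray(is_close, dtype=bool)
    total_steps_evaluated = is_close.size

    # Deterministic tests and single-step evaluations: every step must pass.
    if deterministic or total_steps_evaluated == 1:
        return bool(np.all(is_close))

    # Approximate tests: the passing fraction must be at least
    # 1 - (allowed failures / evaluated steps).
    required = 1 - (num_failing_steps_allowed / total_steps_evaluated)
    return bool(np.mean(is_close) >= required)


# 100 steps, only 2 of them within tolerance, 1 failure allowed.
mostly_failing = np.array([True, True] + [False] * 98)
old_result = passes_old(mostly_failing, num_failing_steps_allowed=1)
new_result = passes_new(mostly_failing, num_failing_steps_allowed=1)
```

In this sketch the old check accepts a run in which 98 of 100 steps fail, while the corrected check rejects it.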

`76371d4` — Fix UT timeout

Restructures `test_attention_variant_dsa.py` to fix unit test timeouts.

Test plan

  • CI on `core_r0.17.0` passes

🤖 Generated with Claude Code

…e tests

Cherry-pick of 4e85d74 (common.py only)
from main onto core_r0.17.0.

- Deterministic tests and single-step evaluations now require all steps to
  pass (np.all), rather than a loose ratio-based threshold.
- Approximate tests keep the ratio-based check but with a corrected formula:
  `>= 1 - (num_failing_steps_allowed / total_steps_evaluated)` instead of
  `>= (num_failing_steps_allowed / total_steps_evaluated)`.

Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>

copy-pr-bot bot commented Apr 15, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.


ko3n1g commented Apr 15, 2026

/ok to test

Co-authored-by: Kunlun Li <kunlunl@cw-dfw-cs-001-login-02.cm.cluster>

ko3n1g commented Apr 15, 2026

/ok to test
