feat(scripts): [WIP] Add dependency version scanner tool #16867
Draft
chalmerlowe wants to merge 36 commits into main from feat/add-version-scanner
Changes from 1 commit
36 commits
f446ff7 feat(scripts): Add dependency version scanner tool (chalmerlowe)
256b048 perf(search): Apply bot suggestions for regex optimization and imports (chalmerlowe)
1010399 refactor(benchmark): Use tempfile for unique names and safe cleanup (chalmerlowe)
68f61ee refactor(benchmark): Remove redundant directory check (chalmerlowe)
cc960b4 test(integration): Check exit code of subprocess in integration test (chalmerlowe)
a4ad9ce test(unit): Remove redundant and brittle test_regex_patterns (chalmerlowe)
2743957 test(unit): Move import yaml to top of file (chalmerlowe)
47450bb refactor(benchmark): Remove redundant directory check in main (chalmerlowe)
c777e44 test(unit): Remove duplicate import yaml from function (chalmerlowe)
8aab801 feat(version_scanner): handle invalid format strings in config and ad… (chalmerlowe)
f63053c feat(version_scanner): handle PermissionError when reading config fil… (chalmerlowe)
2af97b3 feat(version_scanner): extract read_package_file and handle file errors (chalmerlowe)
cb29438 refactor(version_scanner): simplify target resolution and remove dupl… (chalmerlowe)
ea0e8be feat(version_scanner): add format_match_for_csv helper and tests (chalmerlowe)
a8824af feat(version_scanner): integrate GitHub link generation into CSV report (chalmerlowe)
baafb74 feat(version_scanner): default output to results directory (chalmerlowe)
a1cc08e feat(version_scanner): ignore version_scanner directory during scan (chalmerlowe)
3ceea9b feat(version_scanner): broaden version regex and add case insensitivity (chalmerlowe)
d756c07 feat(version_scanner): strip newlines from matched strings (chalmerlowe)
075d04b feat(version_scanner): add word boundaries and truncate long context … (chalmerlowe)
85e9ff5 feat(version_scanner): add console summary table (chalmerlowe)
5c8f673 feat(version_scanner): add .scannerignore file support (chalmerlowe)
efb3331 feat(version_scanner): move ignore defaults to .scannerignore file (chalmerlowe)
bf39072 docs(version_scanner): add README.md (chalmerlowe)
9d9ce22 docs(version_scanner): update README options and CLI help strings (chalmerlowe)
14e4dcc feat(version_scanner): set default for --github-repo (chalmerlowe)
7fc03ca feat(version_scanner): default config path to script directory (chalmerlowe)
f64eac4 feat(version_scanner): support case-insensitive file ignores and add … (chalmerlowe)
fc47dd6 feat(version_scanner): update small package list for demos (chalmerlowe)
95f6f19 Merge remote-tracking branch 'origin/main' into feat/add-version-scanner (chalmerlowe)
761def6 Merge branch 'origin/main' into feat/add-version-scanner (chalmerlowe)
9289c8c feat(version_scanner): add combined_version_string rule and use word … (chalmerlowe)
d771258 feat(scanner): add ability to detect ignore pragma (chalmerlowe)
bafae70 feat(scanner): move .scannerignore to script directory and update loo… (chalmerlowe)
94174bb feat(scanner): ignore repositories.bzl in scanner (chalmerlowe)
d652dbf feat(scanner): add filename scanning support (chalmerlowe)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| .conductor/ | ||
| scanner_report.csv |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,172 @@ | ||
| import argparse | ||
| import os | ||
| import random | ||
| import subprocess | ||
| import sys | ||
| import time | ||
| from typing import List, Dict | ||
|
|
||
| def get_package_subset(packages_dir: str, count: int) -> List[str]: | ||
| """ | ||
| Get a randomized subset of package names from the specified directory. | ||
|
|
||
| Args: | ||
| packages_dir: Path to the directory containing packages. | ||
| count: Number of packages to return. | ||
|
|
||
| Returns: | ||
| A list of package directory names. | ||
| """ | ||
| try: | ||
| all_packages = [d for d in os.listdir(packages_dir) if os.path.isdir(os.path.join(packages_dir, d))] | ||
| except FileNotFoundError: | ||
| print(f"Error: Packages directory not found: {packages_dir}") | ||
| return [] | ||
|
|
||
| if count >= len(all_packages): | ||
| return all_packages | ||
|
|
||
| return random.sample(all_packages, count) | ||
|
|
||
| def run_benchmark( | ||
| scanner_path: str, | ||
| root_path: str, | ||
| package_file: str, | ||
| dependency: str, | ||
| version: str | ||
| ) -> float: | ||
| """ | ||
| Run the scanner and return the duration in seconds. | ||
| """ | ||
| cmd = [ | ||
| "python3", scanner_path, | ||
| "-d", dependency, | ||
| "-v", version, | ||
| "-p", root_path, | ||
| "--package-file", package_file | ||
| ] | ||
|
|
||
| start_time = time.perf_counter() | ||
|
|
||
| try: | ||
| result = subprocess.run(cmd, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True) | ||
| except subprocess.CalledProcessError as e: | ||
| print(f"Error running benchmark: {e}") | ||
| return -1.0 | ||
|
|
||
| duration = time.perf_counter() - start_time | ||
| return duration | ||
|
|
||
| def run_benchmarks( | ||
| scanner_path: str, | ||
| root_path: str, | ||
| packages_dir: str, | ||
| counts: List[int], | ||
| dependency: str, | ||
| version: str | ||
| ) -> Dict[int, float]: | ||
| """Runs benchmarks for specified counts and returns a dict of results.""" | ||
| results = {} | ||
|
|
||
| for count in counts: | ||
| subset = get_package_subset(packages_dir, count) | ||
| print(f" Testing {len(subset)} packages (e.g., {subset[:3]}...)") | ||
|
|
||
| # Create temp package file | ||
| pkg_file = "temp_packages.txt" | ||
| with open(pkg_file, 'w') as f: | ||
| for pkg in subset: | ||
| f.write(f"packages/{pkg}\n") | ||
|
|
||
| duration = run_benchmark(scanner_path, root_path, pkg_file, dependency, version) | ||
| results[count] = duration | ||
|
|
||
| # Clean up | ||
| if os.path.exists(pkg_file): | ||
| os.remove(pkg_file) | ||
|
|
||
| return results | ||
|
|
||
| def main(): | ||
| parser = argparse.ArgumentParser(description="Benchmark the version scanner.") | ||
|
|
||
| parser.add_argument( | ||
| "-s", "--scanner-path", | ||
| default="version_scanner.py", | ||
| help="Path to version_scanner.py" | ||
| ) | ||
|
|
||
| parser.add_argument( | ||
| "-r", "--root-path", | ||
| required=True, | ||
| help="Path to the monorepo root directory" | ||
| ) | ||
|
|
||
| parser.add_argument( | ||
| "-p", "--packages-dir", | ||
| help="Path to packages directory (defaults to <root-path>/packages)" | ||
| ) | ||
|
|
||
| parser.add_argument( | ||
| "-d", "--dependency", | ||
| default="python", | ||
| help="Dependency to search for" | ||
| ) | ||
|
|
||
| parser.add_argument( | ||
| "-v", "--version", | ||
| default="3.7", | ||
| help="Version to search for" | ||
| ) | ||
|
|
||
| parser.add_argument( | ||
| "-c", "--counts", | ||
| default="1,10,50", | ||
| help="Comma-separated list of package counts to test" | ||
| ) | ||
|
|
||
| args = parser.parse_args() | ||
|
|
||
| packages_dir = args.packages_dir or os.path.join(args.root_path, "packages") | ||
|
|
||
| if not os.path.exists(packages_dir): | ||
| print(f"Error: Packages directory not found: {packages_dir}", file=sys.stderr) | ||
| sys.exit(1) | ||
|
|
||
| counts = [int(c) for c in args.counts.split(',')] | ||
|
|
||
| try: | ||
| all_packages = [d for d in os.listdir(packages_dir) if os.path.isdir(os.path.join(packages_dir, d))] | ||
| except FileNotFoundError: | ||
| print(f"Error: Packages directory not found: {packages_dir}", file=sys.stderr) | ||
| sys.exit(1) | ||
|
|
||
|
|
||
| total_packages = len(all_packages) | ||
|
|
||
| print(f"Found {total_packages} packages in {packages_dir}") | ||
|
|
||
| # Filter counts that are greater than total packages | ||
| counts = [c for c in counts if c <= total_packages] | ||
| # Add total if not already there | ||
| if total_packages not in counts: | ||
| counts.append(total_packages) | ||
|
|
||
| print(f"Running benchmarks for counts: {counts}") | ||
|
|
||
| results = run_benchmarks( | ||
| scanner_path=args.scanner_path, | ||
| root_path=args.root_path, | ||
| packages_dir=packages_dir, | ||
| counts=counts, | ||
| dependency=args.dependency, | ||
| version=args.version | ||
| ) | ||
|
|
||
| print("\nBenchmark Results:") | ||
| print(f"{'Packages':<10} | {'Time (seconds)':<15}") | ||
| print("-" * 30) | ||
| for count, duration in results.items(): | ||
| print(f"{count:<10} | {duration:<15.4f}") | ||
|
|
||
| if __name__ == "__main__": | ||
| main() | ||
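
The benchmark above writes its package list to a fixed `temp_packages.txt` in the working directory, and the cleanup is skipped if `run_benchmark` raises. A later commit in this PR is titled "refactor(benchmark): Use tempfile for unique names and safe cleanup", but its diff is not shown in this view. The following is only a minimal sketch of one way to do that with the standard library; the helper name `temp_package_file` is hypothetical and not taken from the PR.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterable, Iterator


@contextmanager
def temp_package_file(packages: Iterable[str]) -> Iterator[str]:
    """Yield the path of a uniquely named file with one package path per line.

    Hypothetical helper: mirrors the "packages/<name>" line format used by
    run_benchmarks, but is not the PR's actual implementation.
    """
    # mkstemp creates the file with a unique name, so parallel runs do not collide.
    fd, path = tempfile.mkstemp(prefix="packages_", suffix=".txt")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            for pkg in packages:
                f.write(f"packages/{pkg}\n")
        yield path
    finally:
        # Runs even if the caller's block raises, so no stale file is left behind.
        if os.path.exists(path):
            os.remove(path)


if __name__ == "__main__":
    with temp_package_file(["pkg_a", "pkg_b"]) as pkg_file:
        print(f"would pass --package-file {pkg_file} to the scanner")
```

Inside `run_benchmarks`, the `with` block would wrap the call to `run_benchmark`, replacing the manual `os.path.exists` / `os.remove` cleanup.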
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,90 @@ | ||
| description: Search rules for identifying dependency versions | ||
| rules: | ||
| - name: explicit_version_string | ||
| description: Finds explicit version strings in code or configs. | ||
| examples: | ||
| - "'3.7'" | ||
| - '"3.7.1"' | ||
| - "'3.7.12'" | ||
| rules: | ||
| - | | ||
| ['"]{major}\.{minor}(\.\d+)?['"] | ||
|
|
||
| - name: python_requires | ||
| description: Finds various forms of python_requires declarations. | ||
| applies_to: [python] | ||
| examples: | ||
| - "python_requires = '==3.7'" | ||
| - "python_requires = '>=3.7'" | ||
| - "python_requires = '<=3.7'" | ||
| - "python_requires = '>3.6'" | ||
| - "python_requires = '<3.8'" | ||
| rules: | ||
| - | | ||
| python_requires\s*=\s*['"]==3\.{minor}['"] | ||
| - | | ||
| python_requires\s*=\s*['"]>=3\.{minor}['"] | ||
| - | | ||
| python_requires\s*=\s*['"]<=3\.{minor}['"] | ||
| - | | ||
| python_requires\s*=\s*['"]>3\.{minor_minus_one}['"] | ||
| - | | ||
| python_requires\s*=\s*['"]<3\.{minor_plus_one}['"] | ||
|
|
||
| - name: sys_version_info | ||
| description: Finds sys.version_info checks in code. | ||
| applies_to: [python] | ||
| examples: | ||
| - "sys.version_info == (3, 7)" | ||
| - "sys.version_info >= (3, 7)" | ||
| - "sys.version_info <= (3, 7)" | ||
| - "sys.version_info > (3, 6)" | ||
| - "sys.version_info < (3, 8)" | ||
| - "sys.version_info.minor == 7" | ||
| - "sys.version_info.minor >= 7" | ||
| - "sys.version_info.minor <= 7" | ||
| - "sys.version_info.minor > 6" | ||
| - "sys.version_info.minor < 8" | ||
| rules: | ||
| - | | ||
| sys\.version_info\s*==\s*\(3,\s*{minor}\) | ||
| - | | ||
| sys\.version_info\s*>=\s*\(3,\s*{minor}\) | ||
| - | | ||
| sys\.version_info\s*<=\s*\(3,\s*{minor}\) | ||
| - | | ||
| sys\.version_info\s*>\s*\(3,\s*{minor_minus_one}\) | ||
| - | | ||
| sys\.version_info\s*<\s*\(3,\s*{minor_plus_one}\) | ||
| - | | ||
| sys\.version_info\.minor\s*==\s*{minor} | ||
| - | | ||
| sys\.version_info\.minor\s*>=\s*{minor} | ||
| - | | ||
| sys\.version_info\.minor\s*<=\s*{minor} | ||
| - | | ||
| sys\.version_info\.minor\s*>\s*{minor_minus_one} | ||
| - | | ||
| sys\.version_info\.minor\s*<\s*{minor_plus_one} | ||
|
|
||
| - name: python_env_short | ||
| description: Finds short python environment names often used in tox or nox. | ||
| applies_to: [python] | ||
| examples: | ||
| - "py37" | ||
| - "py37-cover" | ||
| rules: | ||
| - | | ||
| py3{minor} | ||
|
|
||
| - name: explicit_python_command | ||
| description: Finds explicit python commands with version. | ||
| applies_to: [python] | ||
| examples: | ||
| - "python3.7" | ||
| - "python3.7 -m pip" | ||
| rules: | ||
| - | | ||
| python3\.{minor} | ||
|
|
||
|
|
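
The rules above are regex templates with `{major}`, `{minor}`, `{minor_minus_one}` and `{minor_plus_one}` placeholders. The scanner code that fills them in is not part of this view, so the following is a minimal sketch under the assumption that the placeholders are substituted with `str.format` before compiling; the function name `expand_rule` is hypothetical.

```python
import re


def expand_rule(template: str, version: str) -> "re.Pattern[str]":
    """Fill a rule template for a version such as "3.7" and compile it.

    Assumption: placeholders are str.format fields. This works for the rules
    shown here because none of them contain literal regex braces like \\d{2}.
    """
    major_text, minor_text = version.split(".")[:2]
    minor = int(minor_text)
    values = {
        "major": major_text,
        "minor": minor,
        "minor_minus_one": minor - 1,
        "minor_plus_one": minor + 1,
    }
    # YAML block scalars ("- |") keep a trailing newline, so strip it first.
    return re.compile(template.strip().format(**values))


if __name__ == "__main__":
    # One of the sys_version_info rules, with the trailing newline YAML would add.
    template = r"sys\.version_info\s*<\s*\(3,\s*{minor_plus_one}\)" + "\n"
    pattern = expand_rule(template, "3.7")
    print(bool(pattern.search("if sys.version_info < (3, 8):")))  # True
```

Note that a template containing a literal regex quantifier in braces would need those braces doubled (`{{2}}`) under this assumption, otherwise `str.format` raises an error.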
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| python3.7 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| python_requires = '>=3.7' |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| print("Hello") |
scripts/version_scanner/tests/integration/test_scanner_integration.py (35 changes: 35 additions & 0 deletions)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,35 @@ | ||
| import csv | ||
| import os | ||
| import subprocess | ||
| import pytest | ||
|
|
||
| def test_integration_scan(tmp_path): | ||
| # Paths to real tools | ||
| scanner_path = os.path.abspath("version_scanner.py") | ||
| config_path = os.path.abspath("regex_config.yaml") | ||
|
|
||
| # Static data directory (which we haven't created yet!) | ||
| data_dir = os.path.abspath("tests/data") | ||
|
|
||
| # Run the scanner in the tmp_path so the output file is created there | ||
| cmd = [ | ||
| "python3", scanner_path, | ||
| "-d", "python", | ||
| "-v", "3.7", | ||
| "-p", data_dir, | ||
| "--config", config_path, | ||
| "-o", "scanner_report.csv" | ||
| ] | ||
|
|
||
| # This will fail because tests/data doesn't exist or is empty! | ||
| result = subprocess.run(cmd, cwd=tmp_path, capture_output=True, text=True) | ||
|
|
||
|
|
||
| report_file = tmp_path / "scanner_report.csv" | ||
| assert report_file.exists(), f"Report file not found. Stderr: {result.stderr}" | ||
|
|
||
| with open(report_file, 'r', encoding='utf-8') as f: | ||
| reader = csv.DictReader(f) | ||
| rows = list(reader) | ||
|
|
||
| # We expect at least some matches when we build the data directory | ||
| assert len(rows) > 0 | ||
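
The integration test above only checks that the report file exists, so a scanner crash shows up as a missing file rather than as a failed process. A later commit in this PR is titled "test(integration): Check exit code of subprocess in integration test"; its diff is not shown here. As a rough sketch of that idea, the hypothetical helper `run_checked` below runs a command and fails with the captured stderr when the exit code is non-zero.

```python
import subprocess
import sys
from typing import List, Optional


def run_checked(cmd: List[str], cwd: Optional[str] = None) -> subprocess.CompletedProcess:
    """Run cmd and assert that it exited with status 0.

    Hypothetical test helper, not taken from the PR.
    """
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    # Surface stderr in the assertion message so a failure is easy to diagnose.
    assert result.returncode == 0, (
        f"Command failed with exit code {result.returncode}. Stderr: {result.stderr}"
    )
    return result


if __name__ == "__main__":
    # sys.executable avoids depending on a "python3" binary being on PATH.
    completed = run_checked([sys.executable, "-c", "print('ok')"])
    print(completed.stdout.strip())
```

In the test, the `subprocess.run(cmd, cwd=tmp_path, ...)` call could be routed through such a helper before the report file is inspected.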
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,77 @@ | ||
| import os | ||
| import pytest | ||
| from unittest.mock import patch | ||
| from benchmark import get_package_subset, run_benchmark, run_benchmarks | ||
|
|
||
| def test_get_package_subset(tmp_path): | ||
| # Create mock packages directory | ||
| packages_dir = tmp_path / "packages" | ||
| packages_dir.mkdir() | ||
|
|
||
| for i in range(10): | ||
| (packages_dir / f"pkg_{i}").mkdir() | ||
|
|
||
| # Test getting a subset of 5 | ||
| subset = get_package_subset(str(packages_dir), 5) | ||
| assert len(subset) == 5 | ||
| for pkg in subset: | ||
| assert pkg.startswith("pkg_") | ||
|
|
||
| def test_get_package_subset_all(tmp_path): | ||
| packages_dir = tmp_path / "packages" | ||
| packages_dir.mkdir() | ||
|
|
||
| for i in range(5): | ||
| (packages_dir / f"pkg_{i}").mkdir() | ||
|
|
||
| # Test getting all | ||
| subset = get_package_subset(str(packages_dir), 10) # Request more than available | ||
| assert len(subset) == 5 # Should return all available | ||
|
|
||
| def test_run_benchmark(tmp_path): | ||
| # Create a dummy package file | ||
| package_file = tmp_path / "packages.txt" | ||
| package_file.write_text("pkg1\n") | ||
|
|
||
| # Create dummy package directory | ||
| packages_dir = tmp_path / "packages" | ||
| packages_dir.mkdir() | ||
| (packages_dir / "pkg1").mkdir() | ||
| (packages_dir / "pkg1" / "test.py").write_text("version = '3.7'\n") | ||
|
|
||
| scanner_path = "version_scanner.py" | ||
|
|
||
| duration = run_benchmark( | ||
| scanner_path=scanner_path, | ||
| root_path=str(tmp_path), | ||
| package_file=str(package_file), | ||
| dependency="python", | ||
| version="3.7" | ||
| ) | ||
|
|
||
| assert isinstance(duration, float) | ||
| assert duration >= 0 | ||
|
|
||
| # Test run_benchmarks | ||
| @patch('benchmark.run_benchmark') | ||
| def test_run_benchmarks(mock_run, tmp_path): | ||
| mock_run.return_value = 1.5 | ||
|
|
||
| packages_dir = tmp_path / "packages" | ||
| packages_dir.mkdir() | ||
| for i in range(5): | ||
| (packages_dir / f"pkg_{i}").mkdir() | ||
|
|
||
| results = run_benchmarks( | ||
| scanner_path="dummy.py", | ||
| root_path=str(tmp_path), | ||
| packages_dir=str(packages_dir), | ||
| counts=[1, 3], | ||
| dependency="python", | ||
| version="3.7" | ||
| ) | ||
|
|
||
| assert len(results) == 2 | ||
| assert results[1] == 1.5 | ||
| assert results[3] == 1.5 | ||
| assert mock_run.call_count == 2 |