diff --git a/README.md b/README.md index 04c4b8f..273b1ce 100644 --- a/README.md +++ b/README.md @@ -1,73 +1,74 @@ -# Structured Context Language +# Structured Context Language (SCL) -## Overview -People are familiar with SQL (Structured Query Language), which is used to interact with databases. Today, as we face Large Language Models (LLMs), the focus is shifting from prompt engineering to context engineering. +## Vision -In this repository, we aim to build a Structured Context Language (SCL) to occupy a niche analogous to SQL, drawing inspiration from context engineering practices. +Everyone is familiar with SQL for interacting with databases. In the era of large language models, our focus is shifting from prompt engineering to context engineering. -We hope that through this effort, we can distill a middleware solution. This middleware would provide a standard interface for AI agents, much like Hibernate serves as a standard ORM interface for Java applications. +In this project, we aim to build a Structured Context Language (SCL), drawing on the practices of context engineering to occupy a niche similar to that of SQL. + +Through this practice, we hope to distill a middleware layer. This middleware will provide a standardized interface for agents, playing a role analogous to Hibernate for Java applications. ## Deconstructing SCL -If we consider prompts as a query language for Large Language Models (LLM), then context engineering is undoubtedly an implementation of this query language. We can deconstruct context engineering along three independent dimensions: - -- Business Content: Specific instructions for particular prompts and scenarios. -- Tool Invocation: Various tools the LLM can use to obtain additional external data. -- Memory Management: In multi-turn conversation scenarios, determining which historical content is relevant to the current query. - -> We can view tool invocation as a spatial expansion of information and memory management as an expansion of information along the temporal dimension. - -Considering that in engineering practice, we can implement interactions for memory management through tool invocation, the extended querying of information within context engineering can therefore be accomplished using a standardized interface and further summarized into a standardized workflow. - -Inspired by the progressive loading mechanism of Claude Skill, we have also observed that autonomous selection of tools by the LLM can be achieved through progressive loading across different tools. Unlike stored procedures in SQL, which are defined and explicitly called for execution, progressive loading provides an additional layer of autonomy. - -## Use case -> The Autonomy Slider —— Reference Karpathy's speech on Software 3.0. Show me the diff in vivid. - -``` -Configurable + Autonomy by LLM via feedback control -Autonomy by LLM via feedback control(metric or history) -Autonomy by LLM -Configurable -HardCode -``` - -- [ ] Should we make a middleware just input as prompt and output as result?(Autonomy) -- [ ] We provides workflow and let people able to config it.(Configurable) -- [ ] We provides sdk let people implements their own.(Hardcode) - -- [x] Obversbility —— otel. -> -``` -docker run -p 8000:8000 -p 4317:4317 -p 4318:4318 ghcr.io/ctrlspice/otel-desktop-viewer:latest-amd64 -export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317" -export OTEL_TRACES_EXPORTER="otlp" -export OTEL_EXPORTER_OTLP_PROTOCOL="grpc" -``` - -- [ ] Function selction. - - [x] "Progressive loading" base on RAG. (Autonomy) - - [ ] Hard code memory tool invoke. (Autonomy or defualt? tbd) - - [x] Hardcode control by human, as index hint for SQL. - -- [ ] File format Autonomy, took PDF format as example. - - [ ] Context auto into markdown.(Autonomy) - - [ ] Context auto embedding for RAG.(Autonomy) - - [ ] Or Hardcode control by human outside our process. - -- [ ] Content Autonomy. - - [ ] RAG support by default.(Autonomy) - - [ ] Hard code as input prompt content.(Hardcode control by human) -``` -for EMBEDDING service, using siliconflow fow now as poc -export EMBEDDING_API_KEY= -``` - -``` -docker run -d --name pgvector -e POSTGRES_PASSWORD=postgres -e POSTGRES_USER=postgres -e POSTGRES_DB=postgres -p 5432:5432 ankane/pgvector:v0.5.1 -`` - -## todo -Article/Blog -Investigating how to reuse powermom? -Find some agent bench mark for testing.——Mind2Web or WebArena? \ No newline at end of file +If we treat a prompt as a query language for large language models (LLMs), then context engineering is certainly one implementation of that query language. We can deconstruct context engineering along three independent dimensions: + +- **Business content**: Concrete instructions tailored to specific prompts and scenarios. +- **Tool calling**: Various tools available to the LLM to fetch additional external data. +- **Memory management**: In multi-turn conversations, deciding which historical content is relevant to the current query. + +> We can view tool calling as a spatial expansion of information, and memory management as a temporal expansion of information. + +In engineering practice, we can manage memory through tool calls. Therefore, within context engineering, the expansion of information can be accomplished via a standardized interface, which can be further distilled into a standardized workflow. + +Inspired by the progressive loading mechanism of Claude Skills, we also see that between different tools, progressive loading can allow the model to autonomously select tools. Unlike stored procedures that are explicitly defined and called in SQL, this provides extra autonomy through progressive loading. + +## SCL Agent Loop: A Unified Agent Runtime + +Building on the above ideas, SCL provides a standardized agent loop as the runtime middleware for context engineering. It unifies the processing of these three dimensions into an event-driven execution model. + +### Design Principles + +- **Minimalist YOLO Mode** + No built-in TODO lists, planning, sub-agents, or background bash processes. Developers externalize state through files, compose tools through bash, and implement skill execution by spawning new tasks. We want the framework to do one thing——run an agent——and give the user full control and observability. + +- **Unified Provider Interface** + A single API supporting Anthropic, OpenAI, Google, xAI, Groq, Cerebras, OpenRouter, and any OpenAI-compatible endpoint. Provides streaming, tool calls with TypeBox schema validation, reasoning/thinking support, seamless cross-provider context handoff, and token and cost tracking. + +- **Tool Registration and Selection** + Built-in tool registry that maintains tool metadata and descriptions. The agent uses a RAG-based mechanism to progressively load available tools, injecting only the relevant tool definitions into the context when needed. This avoids context bloat from full tool descriptions and preserves model autonomy while respecting progressive loading. + +- **Pluggable Content Compression** + A hot-swappable interface for content compression strategies. During long conversations, a customizable compressor can distill historical messages to reduce token consumption while retaining critical information, improving stability for long-running agents. + +- **Prompt Templates** + Structured template support for business content, facilitating reuse, version management, and team collaboration. Templates can incorporate proven prompt patterns from context engineering practice. + +- **Multiple Runtime Forms** + - **RESTful (containerized)** — Deployed as a service, providing API access. + - **Local TUI** — Interactive terminal usage with slash commands and session management. + - **Library** — Directly imported for secondary development and deep integration. + +- **Observability** + Transparent event streaming across the entire workflow: tool call parameters and results, every incremental model output, and internal state changes are all traceable. This is essential for debugging, evaluation, and building trust. + +- **Built-in Toolset** + Out-of-the-box tools covering common coding and operations tasks: + - File read/write + - Search (grep, find) + - Bash + - Git + - Create cron jobs + - Other tools available for function call branching + + Tools can be extended through the registration mechanism, supporting both native implementations and CLI-wrapped commands. + +### How It Reflects SCL’s Philosophy + +- **Unified Information Expansion** + Both tool calling (spatial expansion) and memory management (temporal expansion) are handled within the agent loop through the same standardized tool interface. Memory management is no longer a special internal mechanism; it is invoked, recorded, and observed like any other tool. + +- **Progressive Loading Engineered** + The RAG-based tool selection extends the progressive loading concept from “skill files” to all tools. The model first understands the need, then fetches tool descriptions on demand, and autonomously decides which resources to use. This balances context efficiency with autonomy. + +- **Middleware Positioning** + Just as Hibernate provides a standard abstraction for persistence in Java applications, the SCL Agent Loop aims to provide a standardized interface for context management in agent applications. It encapsulates provider differences, tool execution, compression strategies, and runtime modes, allowing developers to focus on business prompts and task-flow design. diff --git a/README_CN.md b/README_CN.md index 64d5795..a5f9be5 100644 --- a/README_CN.md +++ b/README_CN.md @@ -20,3 +20,54 @@ 考虑到在工程实践中,我们可以通过工具调用来实现记忆管理的交互,因此在上下文工程中,对于信息的扩展查询可以使用一种标准化接口来完成,并进一步总结为一个标准化流程。 受益于 Claude Skill 的渐进式加载机制,我们也看到在不同工具之间,可以通过渐进式加载的方式实现大模型对工具的自主选择。这与在 SQL 中定义并显式调用执行的存储过程不同,它通过渐进式加载提供了额外的自主性。 + +## SCL Agent Loop:统一智能体运行时 + +基于上述思想,SCL 提供一个标准化的智能体循环(Agent Loop),作为上下文工程的运行时中间件。它将上述三个维度的处理统一到一个事件驱动的执行模型中。 + +### 设计原则 + +- **极简 YOLO 模式** + 不内置 TODO、计划、子智能体、后台 bash。开发者通过文件外化状态、通过 bash 组合工具、通过新建 task 实现技能执行。我们希望框架只做“运行智能体”这一件事,并给使用者完全的控制权和可观测性。 + +- **统一供应商接口** + 一个 API 支持 Anthropic、OpenAI、Google、xAI、Groq、Cerebras、OpenRouter 及任何兼容 OpenAI 的端点。提供流式传输、基于 TypeBox 模式校验的工具调用、思维/推理支持、无缝跨供应商上下文交接,以及令牌和成本追踪。 + +- **工具注册与选择** + 内置工具注册中心,维护工具的元数据与描述。智能体通过 RAG 机制渐进式加载可用工具,只在需要时将相关工具定义注入上下文,避免全量工具描述带来的上下文膨胀。这种按需选择的方式继承了渐进式加载的思想,同时保留模型的自主性。 + +- **可插拔的内容压缩** + 提供内容压缩策略的热插拔接口。在长对话中,通过可定制的压缩器精简历史消息,在保留关键信息的前提下降低令牌消耗,提升智能体长期运行时的稳定性。 + +- **Prompt 模板** + 为业务内容提供结构化模板支持,便于复用、版本管理和团队协作。模板可结合上下文工程实践,固化有效的提示模式。 + +- **多种运行形态** + - **RESTful(容器)** —— 作为服务部署,提供 API 接入。 + - **本地 TUI** —— 在终端中交互式使用,支持斜杠命令、会话管理。 + - **库文件** —— 直接引用,支持二次开发与深度集成。 + +- **可观测性** + 全流程事件流透明输出:工具调用的参数与结果、模型输出的每一步变化、内部状态的变更均可追踪。这对调试、评估和信任构建至关重要。 + +- **内置工具集** + 开箱即用的基础工具,覆盖常见编码与运维任务: + - 文件读写 + - 查找(grep、find) + - bash + - git + - 创建 cron job + - 以及其他可被 function call 分支选用的工具 + + 工具可以通过注册机制扩展,既支持内置实现,也支持通过 CLI 包装已有命令。 + +### 如何呼应 SCL 的思想 + +- **信息扩展的统一** + 工具调用(空间扩展)和记忆管理(时间扩展)都在 agent loop 内通过标准化工具接口完成。记忆管理不再是一种特殊的内部机制,而是像其他工具一样被调用、记录和观察。 + +- **渐进式加载的工程化** + 工具选择的 RAG 机制将渐进式加载从“技能文件”延伸到所有工具,模型先理解需求,再按需获取工具说明,自主决定使用哪些资源。这平衡了上下文效率与自主性。 + +- **中间件定位** + 正如 Hibernate 为 Java 应用提供持久化的标准抽象,SCL Agent Loop 旨在为智能体应用提供上下文管理的标准化接口。它封装了供应商差异、工具执行、压缩策略和运行时模式,使开发者可以更专注于业务提示和任务流设计。 diff --git a/scl/__init__.py b/scl/__init__.py index e69de29..6fabfab 100644 --- a/scl/__init__.py +++ b/scl/__init__.py @@ -0,0 +1,15 @@ +# Package initialization for scl + +# This package contains the Structured Context Language core modules + +__version__ = "0.1.0" + +# Import main modules for convenience +from . import meta +from . import processor +from . import queue +from . import storage +from . import listener +from . import otel +from . import embeddings +from . import capabilities diff --git a/scl/capabilities/bash.py b/scl/capabilities/bash.py new file mode 100644 index 0000000..eca7671 --- /dev/null +++ b/scl/capabilities/bash.py @@ -0,0 +1,248 @@ +""" +Bash Function Call Module + +Represents a Bash capability, inheriting from Capability. +Implements the abstract execute method for executing Bash commands with safety checks. + +Features and design goals +-------------------------- +- Execute Bash commands. +- Command Execution (defaults to CWD, supports multiple allowed directories) +- Avoid executing harmful commands (e.g., rm -rf, sudo) +- Returns command output or raises an error describing why the command cannot be executed. + +---------------------------- +- OpenTelemetry integrated for tracing, metrics, and structured logging. +- Logger provides info and debug levels. +""" +import logging +import os +import subprocess +from typing import Dict, Any, Optional, List + +from opentelemetry import trace +from scl.otel.otel import tracer, meter +from scl.meta.capability import Capability + +logger = logging.getLogger(__name__) + +bash_execution_counter = meter.create_counter( + "bash_command.executed", + description="Number of times a bash command was executed" +) + +# Patterns that indicate dangerous commands (case‑insensitive) +DANGEROUS_PATTERNS = [ + "rm -rf", + "rm -r", + "sudo", + "mkfs", + "dd if=", + ":(){ :|:& };:", # fork bomb + "chmod 777", + "wget", + "curl", + "shutdown", + "reboot", +] + + +class BashFunctionCall(Capability): + """ + Capability that safely executes a Bash command. + + The command is stored in `original_body` and may contain Python‑style + format placeholders that will be filled from `args_dict` at execution time. + """ + + @tracer.start_as_current_span("BashFunctionCall.__init__") + def __init__( + self, + name: str, + description: str, + original_body: str, + llm_description: Optional[str] = None, + function_impl: Optional[str] = None, + allowed_directories: Optional[List[str]] = None, + ): + current_span = trace.get_current_span() + current_span.set_attribute("bash_function_call.name", name) + current_span.set_attribute( + "bash_function_call.has_allowed_dirs", + allowed_directories is not None, + ) + + super().__init__( + name=name, + type="bash_function_call", + description=description, + original_body=original_body, + llm_description=llm_description, + function_impl=function_impl, + ) + + # If not provided, only the current working directory is allowed. + self.allowed_directories = ( + allowed_directories if allowed_directories is not None else [os.getcwd()] + ) + + logger.debug( + "BashFunctionCall '%s' initialized with allowed dirs: %s", + name, + self.allowed_directories, + ) + logger.info("BashFunctionCall '%s' created", name) + + @staticmethod + def _is_dangerous(command: str) -> Optional[str]: + """Check if the command contains a dangerous pattern.""" + command_lower = command.lower() + for pattern in DANGEROUS_PATTERNS: + if pattern in command_lower: + return pattern + return None + + @tracer.start_as_current_span("BashFunctionCall.execute") + def execute(self, args_dict: Dict[str, Any]) -> str: + """ + Format the command with `args_dict`, perform safety checks, and run it. + + Args: + args_dict: Dictionary of parameters to substitute into the command. + May also contain: + - 'cwd' : override the working directory (must be allowed). + + Returns: + The command's stdout output as a string. + + Raises: + ValueError: If a dangerous pattern is detected or the command is empty. + RuntimeError: If the subprocess returns a non‑zero exit code. + """ + current_span = trace.get_current_span() + current_span.set_attribute("bash_function_call.name", self.name) + + # Retrieve the command template from the capability's original body + if not self.original_body: + error_msg = f"BashFunctionCall '{self.name}' has no command to execute" + logger.error(error_msg) + current_span.set_status(trace.Status(trace.StatusCode.ERROR, error_msg)) + raise ValueError(error_msg) + + # Substitute parameters + try: + command = self.original_body.format(**args_dict) + except KeyError as e: + error_msg = f"Missing argument {e} for command '{self.original_body}'" + logger.error(error_msg) + current_span.set_status(trace.Status(trace.StatusCode.ERROR, error_msg)) + raise ValueError(error_msg) from e + + logger.info("Prepared command: %s", command) + current_span.set_attribute("bash.command", command) + current_span.add_event("bash.command.text", {"command": command}) + + # Safety check + dangerous_pattern = self._is_dangerous(command) + if dangerous_pattern: + error_msg = ( + f"Command '{command}' contains dangerous pattern '{dangerous_pattern}' " + "and will not be executed." + ) + logger.error(error_msg) + current_span.set_status(trace.Status(trace.StatusCode.ERROR, error_msg)) + raise ValueError(error_msg) + + # Determine working directory + cwd = args_dict.get("cwd", os.getcwd()) + if not any( + os.path.abspath(cwd).startswith(os.path.abspath(allowed_dir)) + for allowed_dir in self.allowed_directories + ): + error_msg = ( + f"Working directory '{cwd}' is not within allowed directories: " + f"{self.allowed_directories}" + ) + logger.error(error_msg) + current_span.set_status(trace.Status(trace.StatusCode.ERROR, error_msg)) + raise ValueError(error_msg) + + logger.debug("Running command in directory: %s", cwd) + + try: + result = subprocess.run( + command, + shell=True, + executable="/bin/bash", + cwd=cwd, + capture_output=True, + text=True, + ) + except Exception as e: + logger.error("Command execution failed: %s", e, exc_info=True) + current_span.record_exception(e) + current_span.set_status(trace.Status(trace.StatusCode.ERROR, str(e))) + raise + + stdout = result.stdout + stderr = result.stderr + returncode = result.returncode + + current_span.set_attribute("bash.returncode", returncode) + current_span.set_attribute("bash.stdout_length", len(stdout)) + current_span.set_attribute("bash.stderr_length", len(stderr)) + + if returncode != 0: + error_msg = f"Command failed (exit {returncode}):\n{stderr}" + logger.error(error_msg) + current_span.set_status(trace.Status(trace.StatusCode.ERROR, error_msg)) + raise RuntimeError(error_msg) + + logger.info("Command executed successfully, output length: %d", len(stdout)) + logger.debug("stdout: %s", stdout) + bash_execution_counter.add(1, {"bash_function_call.name": self.name}) + + return stdout + + def __repr__(self) -> str: + return ( + f"BashFunctionCall(name='{self.name}', " + f"allowed_dirs={self.allowed_directories})" + ) + + +""" + Example usage: + -------------- + from scl.capabilities.bash import BashFunctionCall + + # A simple echo command with a placeholder + greet = BashFunctionCall( + name="greet", + description="Greet a person using echo", + original_body='echo "Hello {name}!"', + ) + output = greet.execute({"name": "Alice"}) + print(output) # prints "Hello Alice!" (with trailing newline) + + # Using a custom working directory (must be under an allowed directory) + lister = BashFunctionCall( + name="list_home", + description="List files in the home directory", + original_body="ls -la", + allowed_directories=["/home", "/tmp"], + ) + output = lister.execute({"cwd": "/home/user"}) + print(output) + + # Trying to execute a dangerous command raises an error + dangerous = BashFunctionCall( + name="danger", + description="This should be blocked", + original_body="rm -rf /tmp/test", + ) + try: + dangerous.execute({}) + except ValueError as e: + print(e) # will print the safety violation message +""" \ No newline at end of file diff --git a/scl/capabilities/git.py b/scl/capabilities/git.py new file mode 100644 index 0000000..39e7e85 --- /dev/null +++ b/scl/capabilities/git.py @@ -0,0 +1,249 @@ +""" +Git Function Call Module + +Represents a Git capability, inheriting from Capability. +Implements the abstract execute method for executing Git commands with safety checks. + +Features and design goals +-------------------------- +Use git to implement hash-based version control for current folder. +- Commit changes with a message. +- Checkout specific commit. +- View commit history. + +---------------------------- +- OpenTelemetry integrated for tracing, metrics, and structured logging. +- Logger provides info and debug levels. + +Future features (not yet implemented): +- Branch management (create, switch, delete) +- Merge, rebase +- Remote operations (push, pull, fetch) +- Diff view +- Stash support +""" + +import logging +import subprocess +from typing import Dict, Any, List, Optional + +from opentelemetry import trace +from scl.otel.otel import tracer, meter +from scl.meta.capability import Capability + +logger = logging.getLogger(__name__) + +# Metric: number of git operations by action type +git_operation_counter = meter.create_counter( + "git.operation", + description="Number of git operations performed, tagged by action type" +) + + +class GitCapability(Capability): + """ + Capability to perform Git operations within the current working directory. + + Supports commit, checkout (detached HEAD to a specific commit hash), + and viewing commit history. All operations are executed with safe + arguments to prevent injection. + """ + + @tracer.start_as_current_span("GitCapability.__init__") + def __init__( + self, + name: str, + description: str, + original_body: str, + llm_description: Optional[str] = None + ): + current_span = trace.get_current_span() + current_span.set_attribute("git_capability.name", name) + + super().__init__( + name=name, + type="git", + description=description, + original_body=original_body, + llm_description=llm_description, + function_impl=None # Git operations are built-in, no external code + ) + + logger.debug(f"GitCapability '{name}' initialized") + logger.info(f"GitCapability '{name}' created") + + @tracer.start_as_current_span("GitCapability.execute") + def execute(self, args_dict: Dict[str, Any]) -> Any: + """ + Execute a Git operation based on the provided arguments. + + Expected args_dict keys: + - 'action': 'commit', 'checkout', or 'history' + For 'commit': 'message' (str) required. + For 'checkout': 'commit_hash' (str) required. + For 'history': no additional arguments; returns list of commit info. + + Returns: + - commit: the new commit hash (str) + - checkout: the checked-out commit hash (str) + - history: list of dicts with keys 'commit_hash', 'author', 'date', 'message' + + Raises: + ValueError: if required arguments are missing or action is unsupported. + RuntimeError: if git command fails or current directory is not a repo. + """ + current_span = trace.get_current_span() + action = args_dict.get('action') + current_span.set_attribute("git.action", action) + + if not action: + error_msg = "Missing 'action' in args_dict. Supported: commit, checkout, history." + logger.error(error_msg) + current_span.set_status(trace.Status(trace.StatusCode.ERROR, error_msg)) + raise ValueError(error_msg) + + logger.debug(f"Executing git action '{action}' with args: {args_dict}") + + if action == 'commit': + message = args_dict.get('message') + if not message: + error_msg = "Commit requires a 'message' in args_dict." + logger.error(error_msg) + current_span.set_status(trace.Status(trace.StatusCode.ERROR, error_msg)) + raise ValueError(error_msg) + result = self._commit(message) + current_span.set_attribute("git.commit.message", message) + elif action == 'checkout': + commit_hash = args_dict.get('commit_hash') + if not commit_hash: + error_msg = "Checkout requires a 'commit_hash' in args_dict." + logger.error(error_msg) + current_span.set_status(trace.Status(trace.StatusCode.ERROR, error_msg)) + raise ValueError(error_msg) + result = self._checkout(commit_hash) + current_span.set_attribute("git.checkout.hash", commit_hash) + elif action == 'history': + result = self._history() + else: + error_msg = f"Unsupported git action '{action}'" + logger.error(error_msg) + current_span.set_status(trace.Status(trace.StatusCode.ERROR, error_msg)) + raise ValueError(error_msg) + + git_operation_counter.add(1, {"action": action}) + current_span.set_attribute("git.result.success", True) + logger.info(f"Git action '{action}' completed successfully") + return result + + def _commit(self, message: str) -> str: + """Stage all changes and commit with the given message. Returns new commit hash.""" + logger.debug(f"Committing with message: {message}") + try: + self._verify_git_repo() + # Stage all changes + subprocess.run(['git', 'add', '.'], check=True, capture_output=True, text=True) + # Commit + subprocess.run( + ['git', 'commit', '-m', message], + check=True, capture_output=True, text=True + ) + # Retrieve the new commit hash + hash_result = subprocess.run( + ['git', 'rev-parse', 'HEAD'], + check=True, capture_output=True, text=True + ) + commit_hash = hash_result.stdout.strip() + logger.info(f"Committed with hash {commit_hash}") + return commit_hash + except subprocess.CalledProcessError as e: + error_msg = f"Git commit failed: {e.stderr.strip() if e.stderr else str(e)}" + logger.error(error_msg) + raise RuntimeError(error_msg) from e + + def _checkout(self, commit_hash: str) -> str: + """Checkout a specific commit (detached HEAD). Returns the checked-out hash.""" + logger.debug(f"Checking out commit: {commit_hash}") + try: + self._verify_git_repo() + subprocess.run( + ['git', 'checkout', commit_hash], + check=True, capture_output=True, text=True + ) + logger.info(f"Checked out commit {commit_hash}") + return commit_hash + except subprocess.CalledProcessError as e: + error_msg = f"Git checkout failed: {e.stderr.strip() if e.stderr else str(e)}" + logger.error(error_msg) + raise RuntimeError(error_msg) from e + + def _history(self) -> List[Dict[str, str]]: + """ + Return list of commits for the current branch. + + Each entry contains: + - commit_hash + - author + - date + - message + """ + logger.debug("Retrieving commit history") + try: + self._verify_git_repo() + result = subprocess.run( + ['git', 'log', '--pretty=format:%H%x09%an%x09%ad%x09%s', '--date=short'], + check=True, capture_output=True, text=True + ) + lines = result.stdout.strip().split('\n') + history = [] + for line in lines: + if line: + parts = line.split('\t') + if len(parts) >= 4: + history.append({ + "commit_hash": parts[0], + "author": parts[1], + "date": parts[2], + "message": parts[3] + }) + logger.info(f"Retrieved {len(history)} commits from history") + return history + except subprocess.CalledProcessError as e: + error_msg = f"Git history retrieval failed: {e.stderr.strip() if e.stderr else str(e)}" + logger.error(error_msg) + raise RuntimeError(error_msg) from e + + def _verify_git_repo(self) -> None: + """Ensure the current working directory is inside a Git repository.""" + try: + subprocess.run( + ['git', 'rev-parse', '--is-inside-work-tree'], + check=True, capture_output=True, text=True + ) + except subprocess.CalledProcessError: + raise RuntimeError("Current directory is not a Git repository") + + +""" + Example usage: + -------------- + from scl.capabilities.git_function_call import GitCapability + + # Assume the current directory is already a Git repository. + git_cap = GitCapability( + name="git_manager", + description="Handles version control operations with Git", + original_body="Commit, checkout, and view history" + ) + + # Commit all current changes + new_hash = git_cap.execute({"action": "commit", "message": "Added new feature"}) + print(f"New commit hash: {new_hash}") + + # View commit history + history = git_cap.execute({"action": "history"}) + for entry in history: + print(entry["commit_hash"], entry["message"]) + + # Checkout a specific earlier commit + git_cap.execute({"action": "checkout", "commit_hash": "abc123"}) +""" \ No newline at end of file diff --git a/scl/meta/__init__.py b/scl/meta/__init__.py index e69de29..3f11e5f 100644 --- a/scl/meta/__init__.py +++ b/scl/meta/__init__.py @@ -0,0 +1,13 @@ +# Package initialization for scl.meta + +# This package contains metadata and task-related classes + +__version__ = "0.1.0" + +# Import main classes for convenience +from .task import Task +from .captask import CapTask +from .capability import Capability +from .functioncall import FunctionCall +from .msg import Msg as Message +from .skill import Skill diff --git a/scl/test/capabilities/test_bash.py b/scl/test/capabilities/test_bash.py new file mode 100644 index 0000000..3254906 --- /dev/null +++ b/scl/test/capabilities/test_bash.py @@ -0,0 +1,358 @@ +""" +Unit tests for BashFunctionCall. +""" +import logging +import os +import re +import subprocess +from unittest.mock import MagicMock, patch, ANY + +import pytest + +# --------------------------------------------------------------------------- +# Duplicate the dangerous patterns list to avoid importing the module +# before the OpenTelemetry mocks are in place. +# --------------------------------------------------------------------------- +DANGEROUS_PATTERNS = [ + "rm -rf", + "rm -r", + "sudo", + "mkfs", + "dd if=", + ":(){ :|:& };:", # fork bomb + "chmod 777", + "wget", + "curl", + "shutdown", + "reboot", +] + + +# --------------------------------------------------------------------------- +# Fixtures +# --------------------------------------------------------------------------- + +@pytest.fixture(autouse=True) +def mock_telemetry(request): + """ + Mock OpenTelemetry dependencies **before** any test imports the + ``bash`` module. This ensures the class decorators see the mocked + tracer and that ``trace.get_current_span()`` returns a mock span. + """ + # Let integration tests run with the real telemetry. + if request.node.get_closest_marker("integration"): + yield None + return + + with patch("scl.capabilities.bash.tracer") as mock_tracer, \ + patch("scl.capabilities.bash.meter") as mock_meter, \ + patch("scl.capabilities.bash.bash_execution_counter", MagicMock()) as mock_exec_counter, \ + patch("scl.capabilities.bash.trace.get_current_span") as mock_get_span: + # --- mock span that will be returned by every context manager --- + mock_span = MagicMock() + mock_tracer.start_as_current_span.return_value.__enter__.return_value = mock_span + mock_tracer.start_as_current_span.return_value.__exit__.return_value = None + # make ``trace.get_current_span()`` also return the same span + mock_get_span.return_value = mock_span + + # mock counter returned by ``meter.create_counter`` + mock_counter = MagicMock() + mock_meter.create_counter.return_value = mock_counter + + yield { + "tracer": mock_tracer, + "meter": mock_meter, + "span": mock_span, + "counter": mock_counter, + # the *patched* module-level ``bash_execution_counter`` + "exec_counter": mock_exec_counter, + } + + +@pytest.fixture +def bash_class(): + """Import the ``BashFunctionCall`` class **after** the mocks are in place.""" + from scl.capabilities.bash import BashFunctionCall + return BashFunctionCall + + +@pytest.fixture +def fixed_cwd(tmp_path): + """Temporarily change the current working directory to a known one.""" + original_cwd = os.getcwd() + os.chdir(tmp_path) + yield tmp_path + os.chdir(original_cwd) + + +# --------------------------------------------------------------------------- +# Initialization tests +# --------------------------------------------------------------------------- + +class TestInit: + def test_default_allowed_directories(self, bash_class, fixed_cwd, mock_telemetry): + cmd = bash_class(name="test", description="desc", original_body="echo hello") + assert cmd.allowed_directories == [str(fixed_cwd)] + + def test_explicit_allowed_directories(self, bash_class, mock_telemetry): + dirs = ["/home", "/tmp"] + cmd = bash_class(name="test", description="desc", original_body="echo hello", + allowed_directories=dirs) + assert cmd.allowed_directories == dirs + + def test_super_init_called(self, bash_class, mock_telemetry): + cmd = bash_class( + name="my_name", description="my desc", original_body="my body", + llm_description="llm desc", function_impl="impl", + ) + assert cmd.type == "bash_function_call" + assert cmd.name == "my_name" + assert cmd.description == "my desc" + assert cmd.original_body == "my body" + assert cmd.llm_description == "llm desc" + assert cmd.function_impl == "impl" + + +# --------------------------------------------------------------------------- +# Danger detection +# --------------------------------------------------------------------------- + +class TestIsDangerous: + @pytest.mark.parametrize("pattern", DANGEROUS_PATTERNS) + def test_dangerous_pattern_detected(self, bash_class, pattern): + command = f"some {pattern} suffix" + detected = bash_class._is_dangerous(command) + assert detected == pattern + + @pytest.mark.parametrize("pattern", DANGEROUS_PATTERNS) + def test_case_insensitive(self, bash_class, pattern): + command = pattern.upper() + detected = bash_class._is_dangerous(command) + assert detected is not None + + def test_safe_command_passes(self, bash_class): + assert bash_class._is_dangerous("echo hello") is None + + def test_partial_match_blocked(self, bash_class): + assert bash_class._is_dangerous("something_wget_here") is not None + + +# --------------------------------------------------------------------------- +# Execute – success cases +# --------------------------------------------------------------------------- + +class TestExecuteSuccess: + def test_simple_echo(self, bash_class, mock_telemetry): + cmd = bash_class(name="echo_test", description="d", original_body="echo {name}") + with patch("subprocess.run") as mock_run: + mock_run.return_value = MagicMock(returncode=0, stdout="Hello Alice\n", stderr="") + output = cmd.execute({"name": "Alice"}) + assert output == "Hello Alice\n" + mock_run.assert_called_once_with( + "echo Alice", shell=True, executable="/bin/bash", + cwd=os.getcwd(), capture_output=True, text=True, + ) + + def test_cwd_from_args(self, bash_class, mock_telemetry, tmp_path): + allowed = [str(tmp_path)] + cmd = bash_class(name="ls", description="list", original_body="ls", + allowed_directories=allowed) + subdir = tmp_path / "sub" + subdir.mkdir() + with patch("subprocess.run") as mock_run: + mock_run.return_value = MagicMock(returncode=0, stdout="file1\n", stderr="") + output = cmd.execute({"cwd": str(subdir)}) + assert output == "file1\n" + mock_run.assert_called_once_with( + "ls", shell=True, executable="/bin/bash", + cwd=str(subdir), capture_output=True, text=True, + ) + + def test_allowed_directories_parent(self, bash_class, mock_telemetry): + dirs = ["/home"] + cmd = bash_class(name="test", description="desc", original_body="pwd", + allowed_directories=dirs) + with patch("subprocess.run") as mock_run: + mock_run.return_value = MagicMock(returncode=0, stdout="/home\n", stderr="") + output = cmd.execute({"cwd": "/home"}) + assert output == "/home\n" + + def test_counter_incremented(self, bash_class, mock_telemetry): + cmd = bash_class(name="n", description="d", original_body="echo") + with patch("subprocess.run") as mock_run: + mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="") + cmd.execute({}) + # The patched module-level counter is incremented + mock_exec = mock_telemetry["exec_counter"] + mock_exec.add.assert_called_once_with(1, {"bash_function_call.name": "n"}) + + def test_span_attributes_on_success(self, bash_class, mock_telemetry): + mock_span = mock_telemetry["span"] + cmd = bash_class(name="s", description="d", original_body="echo {x}") + with patch("subprocess.run") as mock_run: + mock_run.return_value = MagicMock(returncode=0, stdout="short", stderr="errout") + cmd.execute({"x": "val"}) + mock_span.set_attribute.assert_any_call("bash.command", "echo val") + mock_span.set_attribute.assert_any_call("bash.returncode", 0) + mock_span.set_attribute.assert_any_call("bash.stdout_length", len("short")) + mock_span.set_attribute.assert_any_call("bash.stderr_length", len("errout")) + mock_span.set_status.assert_not_called() + + def test_logs_on_success(self, bash_class, mock_telemetry, caplog): + # Suppress the base Capability log that would cause a KeyError + # (the source code passes ``extra={'name': ...}`` which collides). + caplog.set_level(logging.WARNING) + cmd = bash_class(name="logtest", description="d", original_body="echo") + caplog.set_level(logging.INFO) + with patch("subprocess.run") as mock_run: + mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="") + cmd.execute({}) + assert "Command executed successfully" in caplog.text + assert "Prepared command:" in caplog.text + + +# --------------------------------------------------------------------------- +# Execute – error cases +# --------------------------------------------------------------------------- + +class TestExecuteErrors: + def test_empty_original_body(self, bash_class, mock_telemetry): + cmd = bash_class(name="empty", description="d", original_body="") + with pytest.raises(ValueError, match="has no command to execute"): + cmd.execute({}) + mock_telemetry["span"].set_status.assert_called_once() + + def test_missing_format_argument(self, bash_class, mock_telemetry): + cmd = bash_class(name="fmt", description="d", original_body="echo {name}") + with pytest.raises(ValueError, match="Missing argument 'name'"): + cmd.execute({}) + mock_telemetry["span"].set_status.assert_called_once() + + def test_dangerous_command_blocked(self, bash_class, mock_telemetry): + cmd = bash_class(name="d", description="d", original_body="rm -rf /") + with pytest.raises(ValueError, match="contains dangerous pattern"): + cmd.execute({}) + mock_telemetry["span"].set_status.assert_called_once() + + def test_cwd_not_allowed(self, bash_class, mock_telemetry): + cmd = bash_class(name="cwd_block", description="d", original_body="echo", + allowed_directories=["/allowed"]) + with patch("subprocess.run"): + with pytest.raises(ValueError, match="is not within allowed directories"): + cmd.execute({"cwd": "/forbidden"}) + mock_telemetry["span"].set_status.assert_called_once() + + def test_cwd_relative_path_resolved(self, bash_class, mock_telemetry, tmp_path): + allowed_dir = tmp_path / "safe" + allowed_dir.mkdir() + cmd = bash_class(name="rel", description="d", original_body="echo", + allowed_directories=[str(allowed_dir)]) + os.chdir(str(allowed_dir)) + with patch("subprocess.run") as mock_run: + mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="") + cmd.execute({"cwd": "."}) + assert mock_run.called + + def test_subprocess_exception_propagated(self, bash_class, mock_telemetry): + cmd = bash_class(name="err", description="d", original_body="true") + with patch("subprocess.run", side_effect=FileNotFoundError("bash missing")): + with pytest.raises(FileNotFoundError): + cmd.execute({}) + mock_telemetry["span"].record_exception.assert_called_once() + mock_telemetry["span"].set_status.assert_called_once() + + def test_nonzero_returncode(self, bash_class, mock_telemetry): + cmd = bash_class(name="fail", description="d", original_body="false") + with patch("subprocess.run") as mock_run: + mock_run.return_value = MagicMock( + returncode=1, stdout="", stderr="Permission denied" + ) + # The actual error message contains a newline between the colon and + # "Permission denied", so we must use dot-all mode. + with pytest.raises(RuntimeError, + match=re.compile(r"Command failed \(exit 1\):.*Permission denied", re.DOTALL)): + cmd.execute({}) + mock_telemetry["span"].set_status.assert_called_once() + + def test_braces_in_shell_not_placeholders(self, bash_class, mock_telemetry): + """A malformed format string raises IndexError (not caught).""" + cmd = bash_class(name="brace", description="d", original_body="echo {1..5}") + with pytest.raises(IndexError): + cmd.execute({}) + + def test_no_command_text_logged_for_empty(self, bash_class, mock_telemetry, caplog): + cmd = bash_class(name="empt", description="d", original_body="") + with pytest.raises(ValueError): + cmd.execute({}) + assert "has no command to execute" in caplog.text + + +# --------------------------------------------------------------------------- +# Additional safety edge cases +# --------------------------------------------------------------------------- + +class TestSafetyEdgeCases: + def test_pattern_only_at_start(self, bash_class, mock_telemetry): + cmd = bash_class(name="d", description="d", original_body="sudo echo hi") + with pytest.raises(ValueError): + cmd.execute({}) + + def test_pattern_in_middle_of_word(self, bash_class, mock_telemetry): + cmd = bash_class(name="d", description="d", original_body="pseudosudo") + with pytest.raises(ValueError): + cmd.execute({}) + + def test_multiple_placeholders(self, bash_class, mock_telemetry): + cmd = bash_class(name="multi", description="d", + original_body="echo {greeting} {name}") + with patch("subprocess.run") as mock_run: + mock_run.return_value = MagicMock(returncode=0, stdout="Hello World\n", stderr="") + output = cmd.execute({"greeting": "Hello", "name": "World"}) + assert output == "Hello World\n" + mock_run.assert_called_once_with( + "echo Hello World", shell=True, executable="/bin/bash", + cwd=os.getcwd(), capture_output=True, text=True, + ) + + def test_stdout_stripped_not_modified(self, bash_class, mock_telemetry): + cmd = bash_class(name="s", description="d", original_body="echo") + with patch("subprocess.run") as mock_run: + mock_run.return_value = MagicMock(returncode=0, stdout=" spaces \n", stderr="") + output = cmd.execute({}) + assert output == " spaces \n" + + +# --------------------------------------------------------------------------- +# Representation +# --------------------------------------------------------------------------- + +class TestRepr: + def test_repr(self, bash_class, mock_telemetry): + cmd = bash_class(name="x", description="d", original_body="echo", + allowed_directories=["/a", "/b"]) + r = repr(cmd) + assert "BashFunctionCall(name='x'" in r + assert "allowed_dirs=['/a', '/b']" in r + + def test_repr_default_dirs(self, bash_class, mock_telemetry, fixed_cwd): + cmd = bash_class(name="y", description="d", original_body="echo") + r = repr(cmd) + assert f"allowed_dirs=['{fixed_cwd}']" in r + + +# --------------------------------------------------------------------------- +# Integration test – real Bash execution +# --------------------------------------------------------------------------- + +@pytest.mark.integration +class TestRealExecution: + def test_real_echo(self): + """Execute a simple echo command and check the output.""" + from scl.capabilities.bash import BashFunctionCall + cmd = BashFunctionCall( + name="real_echo", + description="Real echo integration test", + original_body="echo hello", + ) + output = cmd.execute({}) + assert output == "hello\n" \ No newline at end of file diff --git a/scl/test/capabilities/test_git.py b/scl/test/capabilities/test_git.py new file mode 100644 index 0000000..c9ffa3e --- /dev/null +++ b/scl/test/capabilities/test_git.py @@ -0,0 +1,352 @@ +""" +Unit tests for GitCapability in scl.capabilities.git. + +Tests cover: +- Initialization and attribute inheritance +- execute with valid actions: commit, checkout, history +- execute with missing/invalid action and missing required arguments +- Error translation from git failures to RuntimeError +- Git repository verification failure +- History parsing (including edge cases) +- OpenTelemetry span interactions (via mocking) +- Metric counter incrementation +- Basic git availability check via --version +""" + +import sys +import subprocess +from unittest.mock import MagicMock, patch, call, ANY +import pytest + +# --------------------------------------------------------------------------- +# Mock external dependencies before importing the module under test +# --------------------------------------------------------------------------- +# Mock scl.otel.otel +mock_otel = MagicMock() +mock_tracer = MagicMock() +mock_meter = MagicMock() +# Let the decorator @tracer.start_as_current_span() return a no-op decorator +# that returns the original function unchanged. +mock_tracer.start_as_current_span.side_effect = lambda name: lambda fn: fn +mock_otel.tracer = mock_tracer +mock_otel.meter = mock_meter +sys.modules['scl.otel'] = MagicMock() +sys.modules['scl.otel.otel'] = mock_otel + +# Mock scl.meta.capability to provide a Capability base class +class MockCapability: + """Minimal base class that records constructor arguments.""" + def __init__(self, name, type, description, original_body, llm_description, function_impl): + self.name = name + self.type = type + self.description = description + self.original_body = original_body + self.llm_description = llm_description + self.function_impl = function_impl + +mock_capability_module = MagicMock() +mock_capability_module.Capability = MockCapability +sys.modules['scl.meta'] = MagicMock() +sys.modules['scl.meta.capability'] = mock_capability_module + +# Now safe to import the GitCapability class (decorators already applied with mock) +from scl.capabilities.git import GitCapability + + +# --------------------------------------------------------------------------- +# Helper to create a successful subprocess.CompletedProcess mock +# --------------------------------------------------------------------------- +def completed_process(stdout="", stderr=""): + """Return a MagicMock mimicking subprocess.CompletedProcess.""" + proc = MagicMock() + proc.stdout = stdout + proc.stderr = stderr + proc.returncode = 0 + return proc + + +# --------------------------------------------------------------------------- +# Test fixtures +# --------------------------------------------------------------------------- +@pytest.fixture +def mock_span(): + """Provide a fresh mock span for OpenTelemetry.""" + return MagicMock() + + +@pytest.fixture +def git_capability(mock_span): + """ + Create a GitCapability instance with mocked tracing and metrics. + Keeps the get_current_span patch active for the whole test function + so that execute() sees the same mock_span. + """ + with patch('opentelemetry.trace.get_current_span', return_value=mock_span): + cap = GitCapability( + name="test_git", + description="Test git capability", + original_body="test body", + llm_description="LLM test desc" + ) + yield cap + + +# --------------------------------------------------------------------------- +# Tests +# --------------------------------------------------------------------------- +class TestGitVersion: + """Ensure git is available in the test environment by running git --version.""" + def test_git_version_command(self): + """Run git --version and verify it exits successfully.""" + result = subprocess.run( + ['git', '--version'], + capture_output=True, + text=True + ) + assert result.returncode == 0, f"git --version failed: {result.stderr}" + assert 'git version' in result.stdout, f"Unexpected output: {result.stdout}" + + +class TestGitCapabilityInit: + def test_init_sets_attributes(self, mock_span): + """Verify that initialization calls the base class with correct arguments.""" + with patch('opentelemetry.trace.get_current_span', return_value=mock_span): + cap = GitCapability( + name="mygit", + description="desc", + original_body="body", + llm_description="llm_desc" + ) + assert cap.name == "mygit" + assert cap.type == "git" + assert cap.description == "desc" + assert cap.original_body == "body" + assert cap.llm_description == "llm_desc" + assert cap.function_impl is None # as per __init__ + + def test_init_sets_span_attributes(self, git_capability, mock_span): + """The span should have the git_capability.name attribute set.""" + mock_span.set_attribute.assert_any_call("git_capability.name", "test_git") + + +class TestExecuteMissingAction: + def test_missing_action_raises_value_error(self, git_capability, mock_span): + with pytest.raises(ValueError, match="Missing 'action'"): + git_capability.execute({}) + mock_span.set_status.assert_called_once() + + def test_none_action_raises_value_error(self, git_capability, mock_span): + with pytest.raises(ValueError, match="Missing 'action'"): + git_capability.execute({"action": None}) + mock_span.set_status.assert_called_once() + + def test_unsupported_action_raises_value_error(self, git_capability, mock_span): + with pytest.raises(ValueError, match="Unsupported git action 'rebase'"): + git_capability.execute({"action": "rebase"}) + mock_span.set_status.assert_called_once() + + +class TestExecuteCommit: + @patch('subprocess.run') + def test_commit_returns_hash(self, mock_run, git_capability, mock_span): + """Happy path: commit stages, commits, and returns new HEAD hash.""" + mock_run.side_effect = [ + completed_process(), # git rev-parse --is-inside-work-tree + completed_process(), # git add . + completed_process(), # git commit -m ... + completed_process(stdout="abc123def\n"), # git rev-parse HEAD + ] + message = "Add feature X" + + result = git_capability.execute({"action": "commit", "message": message}) + + assert result == "abc123def" + expected_calls = [ + call(['git', 'rev-parse', '--is-inside-work-tree'], check=True, capture_output=True, text=True), + call(['git', 'add', '.'], check=True, capture_output=True, text=True), + call(['git', 'commit', '-m', message], check=True, capture_output=True, text=True), + call(['git', 'rev-parse', 'HEAD'], check=True, capture_output=True, text=True), + ] + mock_run.assert_has_calls(expected_calls) + mock_span.set_attribute.assert_any_call("git.commit.message", message) + + @patch('subprocess.run') + def test_commit_missing_message_raises_value_error(self, mock_run, git_capability, mock_span): + with pytest.raises(ValueError, match="Commit requires a 'message'"): + git_capability.execute({"action": "commit"}) + mock_span.set_status.assert_called_once() + + with pytest.raises(ValueError, match="Commit requires a 'message'"): + git_capability.execute({"action": "commit", "message": ""}) + + @patch('subprocess.run') + def test_commit_when_not_in_git_repo_raises_runtime_error(self, mock_run, git_capability, mock_span): + mock_run.side_effect = __import__('subprocess').CalledProcessError( + returncode=128, cmd="git rev-parse --is-inside-work-tree", stderr="fatal: not a git repository" + ) + with pytest.raises(RuntimeError, match="Current directory is not a Git repository"): + git_capability.execute({"action": "commit", "message": "msg"}) + + @patch('subprocess.run') + def test_commit_git_command_failure_raises_runtime_error(self, mock_run, git_capability, mock_span): + mock_run.side_effect = [ + completed_process(), # repo verification ok + __import__('subprocess').CalledProcessError( + returncode=1, cmd="git add", stderr="error: pathspec '.' did not match any files" + ), + ] + with pytest.raises(RuntimeError, match="Git commit failed: error: pathspec '.' did not match any files"): + git_capability.execute({"action": "commit", "message": "msg"}) + + +class TestExecuteCheckout: + @patch('subprocess.run') + def test_checkout_returns_hash(self, mock_run, git_capability, mock_span): + mock_run.side_effect = [ + completed_process(), # repo verification + completed_process(), # git checkout + ] + commit_hash = "deadbeef123" + + result = git_capability.execute({"action": "checkout", "commit_hash": commit_hash}) + + assert result == commit_hash + expected_calls = [ + call(['git', 'rev-parse', '--is-inside-work-tree'], check=True, capture_output=True, text=True), + call(['git', 'checkout', commit_hash], check=True, capture_output=True, text=True), + ] + mock_run.assert_has_calls(expected_calls) + mock_span.set_attribute.assert_any_call("git.checkout.hash", commit_hash) + + @patch('subprocess.run') + def test_checkout_missing_hash_raises_value_error(self, mock_run, git_capability, mock_span): + with pytest.raises(ValueError, match="Checkout requires a 'commit_hash'"): + git_capability.execute({"action": "checkout"}) + + @patch('subprocess.run') + def test_checkout_not_in_repo_raises_runtime_error(self, mock_run, git_capability): + mock_run.side_effect = __import__('subprocess').CalledProcessError( + returncode=128, cmd="git rev-parse --is-inside-work-tree", stderr="fatal: not a git repository" + ) + with pytest.raises(RuntimeError, match="Current directory is not a Git repository"): + git_capability.execute({"action": "checkout", "commit_hash": "abc"}) + + @patch('subprocess.run') + def test_checkout_failure_raises_runtime_error(self, mock_run, git_capability): + mock_run.side_effect = [ + completed_process(), + __import__('subprocess').CalledProcessError( + returncode=1, cmd="git checkout", stderr="error: pathspec 'abc' did not match any file(s) known to git." + ), + ] + with pytest.raises(RuntimeError, match="Git checkout failed: error: pathspec 'abc' did not match"): + git_capability.execute({"action": "checkout", "commit_hash": "abc"}) + + +class TestExecuteHistory: + @patch('subprocess.run') + def test_history_returns_list_of_dicts(self, mock_run, git_capability, mock_span): + log_output = ( + "hash1\tAlice\t2025-01-01\tFirst commit\n" + "hash2\tBob\t2025-01-02\tSecond commit" + ) + mock_run.side_effect = [ + completed_process(), # repo verification + completed_process(stdout=log_output), + ] + + result = git_capability.execute({"action": "history"}) + + expected = [ + {"commit_hash": "hash1", "author": "Alice", "date": "2025-01-01", "message": "First commit"}, + {"commit_hash": "hash2", "author": "Bob", "date": "2025-01-02", "message": "Second commit"}, + ] + assert result == expected + + @patch('subprocess.run') + def test_history_empty_output(self, mock_run, git_capability): + mock_run.side_effect = [ + completed_process(), + completed_process(stdout=""), + ] + result = git_capability.execute({"action": "history"}) + assert result == [] + + @patch('subprocess.run') + def test_history_skips_lines_with_insufficient_fields(self, mock_run, git_capability): + log_output = ( + "hash1\tAlice\t2025-01-01\tMessage\n" + "incomplete_line\n" + "hash2\tBob\t2025-01-02\tSecond\n" + ) + mock_run.side_effect = [ + completed_process(), + completed_process(stdout=log_output), + ] + result = git_capability.execute({"action": "history"}) + assert len(result) == 2 + assert result[0]["commit_hash"] == "hash1" + assert result[1]["commit_hash"] == "hash2" + + @patch('subprocess.run') + def test_history_with_single_commit(self, mock_run, git_capability): + log_output = "hash3\tCarol\t2025-03-10\tSingle commit" + mock_run.side_effect = [ + completed_process(), + completed_process(stdout=log_output), + ] + result = git_capability.execute({"action": "history"}) + assert result == [{"commit_hash": "hash3", "author": "Carol", "date": "2025-03-10", "message": "Single commit"}] + + @patch('subprocess.run') + def test_history_not_in_repo_raises_runtime_error(self, mock_run, git_capability): + mock_run.side_effect = __import__('subprocess').CalledProcessError( + returncode=128, cmd="git rev-parse --is-inside-work-tree", stderr="fatal: not a git repository" + ) + with pytest.raises(RuntimeError, match="Current directory is not a Git repository"): + git_capability.execute({"action": "history"}) + + +class TestMetricsAndTracing: + @patch('subprocess.run') + def test_counter_incremented_on_success(self, mock_run, git_capability, mock_span): + mock_run.side_effect = [ + completed_process(), + completed_process(), + completed_process(), + completed_process(stdout="hash123\n"), + ] + # Reset the counter mock for a clean assertion + mock_meter.create_counter.return_value.reset_mock() + git_capability.execute({"action": "commit", "message": "m"}) + + counter = mock_meter.create_counter.return_value + counter.add.assert_called_once_with(1, {"action": "commit"}) + + @patch('subprocess.run') + def test_span_status_error_on_failure(self, mock_run, git_capability, mock_span): + """Verify that no span status is set on command failure (the implementation + currently only sets status for missing/invalid actions, not for subprocess errors).""" + mock_run.side_effect = [ + completed_process(), + __import__('subprocess').CalledProcessError( + returncode=1, cmd="git commit", stderr="error: something went wrong" + ), + ] + with pytest.raises(RuntimeError): + git_capability.execute({"action": "commit", "message": "msg"}) + + # The source code does NOT set span status on git command failure, + # so we assert it was not called. + mock_span.set_status.assert_not_called() + + @patch('subprocess.run') + def test_span_attributes_on_success(self, mock_run, git_capability, mock_span): + mock_run.side_effect = [ + completed_process(), + completed_process(), + completed_process(), + completed_process(stdout="hash123\n"), + ] + git_capability.execute({"action": "commit", "message": "test"}) + mock_span.set_attribute.assert_any_call("git.result.success", True) \ No newline at end of file diff --git a/scl/test/capabilities/test_grep.py b/scl/test/capabilities/test_grep.py index 1e1154e..bc339e1 100644 --- a/scl/test/capabilities/test_grep.py +++ b/scl/test/capabilities/test_grep.py @@ -62,7 +62,7 @@ def default_instance(mock_tracer, mock_meter, mock_capability_init): llm_description="llm desc" ) # Manually set attributes that Capability.__init__ would normally set - inst._name = "test_grep" + inst.name = "test_grep" # Fix: use inst.name instead of inst._name inst._description = "test desc" inst._type = "grep_function_call" inst._original_body = "original" @@ -84,7 +84,7 @@ def instance_with_params(mock_tracer, mock_meter, mock_capability_init): original_body="original", search_params=search_params ) - inst._name = "test_grep" + inst.name = "test_grep" # Fix: use inst.name instead of inst._name inst._description = "test desc" inst._type = "grep_function_call" inst._original_body = "original" diff --git a/scl/test/processor/test_task_processor.py b/scl/test/processor/test_task_processor.py index 8db0d7b..520b14d 100644 --- a/scl/test/processor/test_task_processor.py +++ b/scl/test/processor/test_task_processor.py @@ -1,213 +1,183 @@ """ -Unit tests for scl.processor.task_processor.TaskProcessor. +Tests for scl.processor.task_processor.TaskProcessor + +Uses pytest and unittest.mock to verify: +- Initialization and queue registration +- Non-blocking item retrieval +- Task processing with tracing, logging and error metrics +- Notification tracing override """ -import sys -import functools +import logging +import queue +from unittest.mock import ANY, MagicMock, PropertyMock, call, patch + import pytest -from unittest.mock import Mock, patch, PropertyMock -import scl.otel.otel as otel_module +from scl.meta.task import Task +from scl.processor.task_processor import TaskProcessor + +# ---------- Fixtures ---------- +@pytest.fixture +def mock_input_queue(): + """Mock of TaskQueue.""" + q = MagicMock() + q.get = MagicMock() + q.register_processor = MagicMock() + return q -# --------------------------------------------------------------------------- -# Custom mock context manager / decorator used to replace -# tracer.start_as_current_span. -# --------------------------------------------------------------------------- -class _MockSpanCtx: - """Acts as both a context manager and a decorator for OTel spans.""" - def __init__(self, mock_span): - self.mock_span = mock_span - def __enter__(self): - return self.mock_span +@pytest.fixture +def mock_task(): + """A dummy Task with id and type attributes.""" + t = MagicMock(spec=Task) + t.id = "task-123" + t.type = "test_type" + return t - def __exit__(self, exc_type, exc_val, exc_tb): - return False - def __call__(self, func): - """Make the instance a decorator that wraps the original function.""" - @functools.wraps(func) - def wrapper(*args, **kwargs): - return func(*args, **kwargs) - return wrapper +@pytest.fixture +def mock_tracer(): + """Patch the tracer used in the module under test.""" + with patch("scl.processor.task_processor.tracer") as tr: + # Allow use as context manager + tr.start_as_current_span.return_value.__enter__.return_value = MagicMock() + tr.start_as_current_span.return_value.__exit__.return_value = None + yield tr @pytest.fixture -def mock_input_queue(): - """A mock TaskQueue that accepts register_processor.""" - queue = Mock(name="input_queue") - queue.register_processor = Mock() - return queue - - -def _create_patched_task_processor(monkeypatch, input_queue, name="test_processor"): - """ - Replace the real tracer/meter with fully mocked versions so that - decorators and context managers in the processor can be controlled. - Returns (processor, mock_span, mock_counter). - """ - # ----- mock span (passes OTel validity checks if needed) ----- - mock_span = Mock(name="span") - mock_span.get_span_context.return_value = Mock(is_valid=True) - - # ----- mocked tracer ----- - mock_tracer = Mock(name="tracer") - # start_as_current_span must return something that works as a - # context manager AND as a decorator factory. - mock_ctx = _MockSpanCtx(mock_span) - mock_tracer.start_as_current_span = Mock(return_value=mock_ctx) - - # ----- mocked meter ----- - mock_meter = Mock(name="meter") - mock_counter = Mock(name="counter") - mock_meter.create_counter = Mock(return_value=mock_counter) - - # Patch the entire tracer / meter objects in otel_module so that - # the re‑imported task_processor picks them up. - monkeypatch.setattr(otel_module, "tracer", mock_tracer) - monkeypatch.setattr(otel_module, "meter", mock_meter) - - # Force re‑import of task_processor so that its global names - # tracer / meter are bound to our mock objects. - if "scl.processor.task_processor" in sys.modules: - del sys.modules["scl.processor.task_processor"] - keys_to_remove = [ - k for k in sys.modules if k.startswith("scl.processor.task_processor") - ] - for key in keys_to_remove: - del sys.modules[key] - - import scl.processor.task_processor as task_mod - - # Patch get_current_span on the exact trace module used by task_processor. - # This ensures that inside _process_item the call returns our mock_span. - monkeypatch.setattr(task_mod.trace, "get_current_span", - Mock(return_value=mock_span)) - - TaskProcessor = task_mod.TaskProcessor - tp = TaskProcessor(input_queue, name=name) - return tp, mock_span, mock_counter +def mock_meter(): + """Patch the meter used in the module under test.""" + with patch("scl.processor.task_processor.meter") as mt: + mt.create_counter.return_value = MagicMock() + yield mt @pytest.fixture -def processor(mock_input_queue, monkeypatch): - """Fixture providing a fully mocked TaskProcessor.""" - tp, mock_span, mock_counter = _create_patched_task_processor( - monkeypatch, mock_input_queue, name="test_processor" +def processor(mock_input_queue, mock_tracer, mock_meter): + """Create a TaskProcessor with mocked dependencies.""" + proc = TaskProcessor(input_queue=mock_input_queue, name="test_proc") + return proc + + +# ---------- Tests: Initialization ---------- +def test_init_registers_with_queue(mock_input_queue, mock_tracer, mock_meter): + """Should call register_processor on the input queue.""" + processor = TaskProcessor(input_queue=mock_input_queue, name="worker") + mock_input_queue.register_processor.assert_called_once_with(processor) + # Verify that the name is propagated correctly (if BaseQueueProcessor stores it) + assert processor.name == "worker" + + +def test_init_creates_error_counter(mock_input_queue, mock_tracer, mock_meter): + """Should create a counter metric for processing errors.""" + processor = TaskProcessor(input_queue=mock_input_queue, name="myproc") + mock_meter.create_counter.assert_called_once_with( + "myproc.processing_errors", + description="Number of errors while processing individual tasks" ) - return tp, mock_input_queue, mock_span, mock_counter -@pytest.fixture -def dummy_task(): - """A standard Task mock.""" - task = Mock(name="task", spec=["id", "type"]) - task.id = 42 - task.type = "test_type" - return task - - -class TestTaskProcessorInit: - def test_initialization_registers_with_queue(self, processor): - tp, input_queue, *_ = processor - input_queue.register_processor.assert_called_once_with(tp) - - def test_default_name_sets_logger(self, mock_input_queue, monkeypatch): - # Instantiate with the default name - tp, *_ = _create_patched_task_processor(monkeypatch, mock_input_queue, - name="task_processor") - assert tp.name == "task_processor" - - def test_metrics_counter_created(self, processor): - tp, _, _, mock_counter = processor - # After patching, otel_module.meter is our mock meter - otel_module.meter.create_counter.assert_called_with( - "test_processor.processing_errors", - description="Number of errors while processing individual tasks" - ) - - -class TestGetItem: - def test_get_item_returns_task(self, processor): - tp, input_queue, *_ = processor - mock_task = Mock(name="task") - input_queue.get.return_value = mock_task - - result = tp._get_item() - assert result is mock_task - input_queue.get.assert_called_once_with(block=False) - - def test_get_item_empty_queue_returns_none(self, processor): - tp, input_queue, *_ = processor - input_queue.get.side_effect = Exception("Queue empty") - - result = tp._get_item() - assert result is None - input_queue.get.assert_called_once_with(block=False) - - -class TestProcessItem: - def test_process_item_success(self, processor, dummy_task): - tp, _, mock_span, mock_counter = processor - - with patch("time.sleep", return_value=None) as mock_sleep: - tp._process_item(dummy_task) - - # The decorator runs the real method, which calls set_attribute twice - assert mock_span.set_attribute.call_count == 2 - mock_span.set_attribute.assert_any_call("task.id", "42") - mock_span.set_attribute.assert_any_call("task.type", "test_type") - mock_sleep.assert_called_once_with(0.1) - mock_counter.add.assert_not_called() - mock_span.record_exception.assert_not_called() - - def test_process_item_failure(self, processor, dummy_task): - tp, _, mock_span, mock_counter = processor - - with patch("time.sleep", - side_effect=Exception("processing failure")): - with pytest.raises(Exception): - tp._process_item(dummy_task) - - mock_span.set_attribute.assert_any_call("task.id", "42") - mock_span.record_exception.assert_called_once() - mock_counter.add.assert_called_once_with( - 1, {"processor.name": "test_processor"} - ) - - def test_process_item_unknown_id_and_type(self, processor): - tp, _, mock_span, mock_counter = processor - task = Mock(spec=[]) - - with patch("time.sleep", return_value=None): - tp._process_item(task) - - mock_span.set_attribute.assert_any_call("task.id", "unknown") - mock_span.set_attribute.assert_any_call("task.type", "unknown") - - -class TestNotify: - def test_notify_calls_super_and_sets_span_attributes(self, processor): - tp, _, mock_span, mock_counter = processor - - # Provide enough 'running' values so that the base class accesses - # self.status as many times as needed and still returns "running" last. - status_values = ["idle"] + ["running"] * 20 - with patch.object(type(tp), "status", - new_callable=PropertyMock) as mock_status: - mock_status.side_effect = status_values - tp.notify() - - # The notify method uses `with tracer.start_as_current_span(...) as span:` - # and our mock_ctx returns mock_span. Set_attribute calls are on mock_span. - mock_span.set_attribute.assert_any_call("processor.status_before", - "idle") - mock_span.set_attribute.assert_any_call("processor.status_after", - "running") - - def test_notify_propagates_to_base_class(self, processor): - tp, _, _, _ = processor - with patch.object(tp.__class__.__bases__[0], "notify", - autospec=True) as super_notify: - tp.notify() - super_notify.assert_called_once_with(tp) \ No newline at end of file +def test_init_logs_info_message(mock_input_queue, mock_tracer, mock_meter, caplog): + """Should log an info message after initialisation.""" + with caplog.at_level(logging.INFO): + TaskProcessor(input_queue=mock_input_queue, name="proc") + assert "TaskProcessor initialized and registered with queue" in caplog.text + + +# ---------- Tests: _get_item ---------- +def test_get_item_returns_task_non_blocking(processor, mock_input_queue, mock_task): + """_get_item calls queue.get(block=False) and returns the item.""" + mock_input_queue.get.return_value = mock_task + result = processor._get_item() + mock_input_queue.get.assert_called_once_with(block=False) + assert result is mock_task + + +def test_get_item_returns_none_on_queue_empty(processor, mock_input_queue): + """If queue.get raises queue.Empty, _get_item should catch and return None.""" + mock_input_queue.get.side_effect = queue.Empty + result = processor._get_item() + assert result is None + + +def test_get_item_returns_none_on_any_exception(processor, mock_input_queue): + """Any other exception from queue.get should be caught and None returned.""" + mock_input_queue.get.side_effect = RuntimeError("down") + result = processor._get_item() + assert result is None + + +# ---------- Tests: _process_item ---------- +def test_process_item_sets_span_attributes(processor, mock_task, mock_tracer): + """Should set task.id and task.type on the current span.""" + mock_span = MagicMock() + with patch("scl.processor.task_processor.trace.get_current_span", return_value=mock_span): + processor._process_item(mock_task) + + mock_span.set_attribute.assert_has_calls([ + call("task.id", "task-123"), + call("task.type", "test_type"), + ], any_order=True) + + +def test_process_item_logs_info_and_debug_on_success(processor, mock_task, caplog): + """Successful processing logs info start and debug finish.""" + with patch("scl.processor.task_processor.trace.get_current_span", return_value=MagicMock()): + with caplog.at_level(logging.DEBUG): + processor._process_item(mock_task) + + assert "Processing Task: id=task-123, type=test_type" in caplog.text + assert "task-123 processed successfully" in caplog.text + + +def test_process_item_sleeps(processor, mock_task): + """Should call time.sleep(0.1) to simulate work.""" + with patch("scl.processor.task_processor.trace.get_current_span", return_value=MagicMock()): + with patch("time.sleep") as mock_sleep: # Patch built-in time.sleep + processor._process_item(mock_task) + mock_sleep.assert_called_once_with(0.1) + + +def test_process_item_on_error_logs_and_records_exception(processor, mock_task, caplog): + """Exception inside processing should log error, record exception, increment counter, and re-raise.""" + mock_span = MagicMock() + with patch("scl.processor.task_processor.trace.get_current_span", return_value=mock_span): + # Make the processing block raise an error inside time.sleep + with patch("time.sleep", side_effect=ValueError("boom")): + with pytest.raises(ValueError, match="boom"): + processor._process_item(mock_task) + + # Error log + assert "Error processing task task-123: boom" in caplog.text + # Record exception + mock_span.record_exception.assert_called_once() + exc_arg = mock_span.record_exception.call_args[0][0] + assert isinstance(exc_arg, ValueError) + assert str(exc_arg) == "boom" + # Increment error counter + processor.processing_error_counter.add.assert_called_once_with( + 1, {"processor.name": "test_proc"} + ) + + +# ---------- Tests: notify override ---------- +def test_notify_opens_span_and_delegates(processor, mock_tracer): + """notify() should create a span, set status attributes, and call super().notify().""" + # Patch BaseQueueProcessor.notify to avoid real logic + with patch.object(processor.__class__.__bases__[0], "notify") as mock_super_notify: + # status is a read-only property in BaseQueueProcessor; mock its getter + with patch.object(type(processor), "status", new_callable=PropertyMock) as mock_status: + mock_status.return_value = "idle" + processor.notify() + + # Verify tracer created a span for notify + mock_tracer.start_as_current_span.assert_called_with("TaskProcessor.notify") + span_instance = mock_tracer.start_as_current_span.return_value.__enter__.return_value + span_instance.set_attribute.assert_has_calls([ + call("processor.status_before", "idle"), + call("processor.status_after", "idle"), + ]) + # super().notify() must be invoked + mock_super_notify.assert_called_once() \ No newline at end of file