PR Action Plan v2.0 — Agent Review

Open: P0–P3 unresolved

Fixed: PR #272, #274 partial

Partial: CLI decompose

New Context: P4 scaffolding merged

Sprint progress: 18%

Token Cost — This Document

This HTML file: ~4,200 tokens · ~16 KB

Equivalent Markdown: ~1,150 tokens · ~4.2 KB

HTML costs roughly 3.6× the output tokens of Markdown, about 3,050 extra tokens per document. At Claude Sonnet pricing (May 2026), the delta is ~$0.009 per document, roughly $0.90/day at 100 docs/day. One missed P0 finding costs more than a month of that overhead.

Keep the raw .md for version control and agent-to-agent context passing. HTML is the human-facing render layer only.

P0 Critical · Runtime crashes · Missing deps · Fail-fast gaps · 3 open

PR-P0-1a Guard bare OTEL import in observability runner

src/observability/runspan.py · L15

Bare top-level import crashes the entire agent process when the OpenTelemetry package is absent. The tracing module was patched in PR #272 — this file was missed.

from opentelemetry.trace import Status, StatusCode # ← no guard

Fix

Wrap in try/except ImportError, define fallback stubs for Status / StatusCode.

Open
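A minimal sketch of the guard described for PR-P0-1a, assuming the runner only constructs Status objects and reads StatusCode members. OTEL_AVAILABLE is a hypothetical flag name, and the stubs copy only the constructor shape of the real classes; the actual fallback in tracing.py may look different.

```python
# Guarded import: tracing degrades to inert stubs when opentelemetry is absent.
try:
    from opentelemetry.trace import Status, StatusCode
    OTEL_AVAILABLE = True  # hypothetical flag, not taken from the source
except ImportError:
    from enum import Enum

    OTEL_AVAILABLE = False

    class StatusCode(Enum):
        """Fallback stub mirroring the three OpenTelemetry status codes."""
        UNSET = 0
        OK = 1
        ERROR = 2

    class Status:
        """Fallback stub: stores the code and description, does nothing else."""

        def __init__(self, status_code=StatusCode.UNSET, description=None):
            self.status_code = status_code
            self.description = description


# Callers can build a Status either way without checking for the package.
status = Status(StatusCode.ERROR, "tool call failed")
```

With this shape, code that sets span status keeps working in both environments, and anything that truly needs the SDK can branch on the flag.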

PR-P0-1b Pin protobuf in [otel] dependency group

pyproject.toml · [dependency-groups].otel

protobuf is a transitive requirement of the OTLP HTTP exporter. Leaving its version unconstrained allows silent conflicts in isolated environments, which surface only on the first trace export.

Fix

Add protobuf>=4.25 (the PyPI name for Google's Protocol Buffers runtime) to the otel dependency group.

Open

PR-P0-2 Fail-fast API key validation at object construction

src/llm/litellm.py · L52 vs __init__

External API key accessed at call time, not at construction. A misconfigured deployment fails only on first LLM invocation — potentially 10+ minutes into a run.

Fix

Move key validation into __init__, raise ValueError on construction.

Open
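One way the fail-fast check for PR-P0-2 could look. LLMBackend, the env_var parameter, and the LLM_API_KEY variable name are all hypothetical stand-ins; the real wrapper's constructor and key source are not shown in this review.

```python
import os


class LLMBackend:
    """Hypothetical stand-in for the wrapper in the LLM module."""

    def __init__(self, model, api_key=None, env_var="LLM_API_KEY"):
        # Resolve and validate the key once, at construction time.
        key = api_key if api_key is not None else os.environ.get(env_var, "")
        if not key.strip():
            raise ValueError(
                f"Missing API key for model {model!r}: pass api_key or set {env_var}"
            )
        self.model = model
        self._api_key = key

    def complete(self, prompt):
        # The real call would go here; the key is already known to be present.
        return f"[{self.model}] would send {len(prompt)} characters"
```

A misconfigured deployment then fails when the object is built at startup rather than at the first invocation.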

P1 Resource leaks · God functions · Coupling · 3 open

PR-P1-1 Fix resource leak in async agent runner

src/agent/deep_agent/runner.py · L199

MCP client instantiated without async context manager or explicit close. Server subprocesses may persist after agent exits.

client = MultiServerMCPClient(connections) if connections else None
tools  = await client.get_tools() if client is not None else []
# client.close() never called

Fix

Wrap in async with MultiServerMCPClient(connections) as client, or close the client explicitly in a finally block. Verify that no subprocess remains after the CLI exits.

Open
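A sketch of the cleanup pattern for PR-P1-1 that also keeps the existing "no connections" branch. StubMCPClient is a hypothetical stand-in so the example runs on its own; whether the installed MultiServerMCPClient release supports async with should be checked before adopting this shape.

```python
import asyncio
from contextlib import AsyncExitStack


class StubMCPClient:
    """Hypothetical stand-in for MultiServerMCPClient; tracks open/closed state."""

    def __init__(self, connections):
        self.connections = connections
        self.closed = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.closed = True  # a real client would stop its server subprocesses here
        return False

    async def get_tools(self):
        return sorted(self.connections)


async def run_agent(connections):
    # AsyncExitStack keeps the optional client and its cleanup in one place.
    async with AsyncExitStack() as stack:
        client = None
        if connections:
            client = await stack.enter_async_context(StubMCPClient(connections))
        tools = await client.get_tools() if client is not None else []
        # ... agent work using tools would happen here ...
        return tools, client


tools, client = asyncio.run(run_agent({"vibration": {}, "wo": {}}))
```

The exit stack runs the client's cleanup on normal return and on exceptions alike, which is the property the finding asks for.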

PR-P1-2 Decompose DSP diagnostics god function

src/servers/vibration/main.py · L334–L546

Single function now spans 212 lines — up from 163 at previous baseline. Covers data loading, FFT/envelope DSP, bearing frequency, ISO classification, and summary generation in one call stack.

Size delta 163 → 212 lines (+49)

Suggested split

_load_and_validate_signal() · _run_dsp_pipeline() · _classify_and_summarise()

Open
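The suggested split for PR-P1-2 could take roughly this shape. The three helper names come from the finding; the bodies are placeholders (RMS and peak instead of FFT/envelope analysis), and alarm_rms is a made-up threshold, not an ISO zone boundary.

```python
import math


def _load_and_validate_signal(samples, sample_rate_hz):
    """Stage 1: reject empty or malformed input before any DSP runs."""
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    if not samples:
        raise ValueError("signal is empty")
    return [float(x) for x in samples]


def _run_dsp_pipeline(signal):
    """Stage 2: placeholder for FFT/envelope analysis; only RMS and peak here."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    return {"rms": rms, "peak": max(abs(x) for x in signal)}


def _classify_and_summarise(features, alarm_rms):
    """Stage 3: alarm_rms is a hypothetical threshold, not an ISO value."""
    severity = "alarm" if features["rms"] >= alarm_rms else "normal"
    return {"severity": severity, **features}


def diagnose(samples, sample_rate_hz, alarm_rms=1.0):
    """Thin orchestrator: each stage can be unit-tested on its own."""
    signal = _load_and_validate_signal(samples, sample_rate_hz)
    features = _run_dsp_pipeline(signal)
    return _classify_and_summarise(features, alarm_rms)


report = diagnose([3.0, -4.0, 3.0, -4.0], sample_rate_hz=1000)
```

The orchestrator stays a few lines long, so growth in any one stage no longer inflates the entry point.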

PR-P1-3 Extract MCP orchestration from executor

src/agent/plan_execute/executor.py · 368 lines

PR #274 extracted shared CLI boilerplate — unrelated to this finding. MCP tool argument parsing and execution logic remain coupled inside the Plan/Execute orchestrator.

Fix

Extract into _mcp_handler.py. Executor retains only Plan/Execute flow.

Open

P2 DRY violations · CLI decompose · Bare exceptions · 2 open · 1 partial

PR-P2-1 DRY refactor — repeated load-filter-error pattern

src/servers/wo/tools.py · 7 tool functions

Identical load_data → filter → return ErrorResult boilerplate duplicated across 7 functions. Any change to error handling requires 7 synchronized edits.

Open
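A possible consolidation for PR-P2-1, assuming the seven functions differ only in their filter and error message. ErrorResult's shape, the load_data stand-in, query_records, and the two example tools are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ErrorResult:
    """Hypothetical shape; the real ErrorResult may carry more fields."""
    error: str


def load_data():
    # Stand-in data source; the real loader is assumed to return records or None.
    return [
        {"id": "WO-1", "asset": "pump-7", "status": "open"},
        {"id": "WO-2", "asset": "fan-2", "status": "closed"},
    ]


def query_records(predicate, not_found_message):
    """Single home for the load -> filter -> ErrorResult pattern."""
    records = load_data()
    if records is None:
        return ErrorResult("work order data is unavailable")
    matches = [r for r in records if predicate(r)]
    if not matches:
        return ErrorResult(not_found_message)
    return matches


# Each tool function shrinks to its predicate and its error message.
def get_open_work_orders():
    return query_records(lambda r: r["status"] == "open", "no open work orders")


def get_work_orders_for_asset(asset):
    return query_records(lambda r: r["asset"] == asset, f"no work orders for {asset}")
```

A change to error handling then touches query_records once instead of seven call sites.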

PR-P2-2 Complete CLI run function decomposition

src/agent/cli.py · _run() · 146 lines

PR #274 extracted shared boilerplate to _cli_common.py. The plan-execute CLI's _run() itself still lacks internal stage split, blocking unit testing of the CLI entry point.

Remaining

Split into _setup_environment() / _parse_arguments() / _execute_agent().

Partial

PR-P2-3 Replace bare exception catches with typed errors

src/agent/plan_execute/executor.py · L74, L100, L177

Three except Exception blocks suppressed via # noqa: BLE001 rather than replaced. Masks TimeoutError, MCPProtocolError, and domain-specific failures.

Open
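A sketch of typed handling for PR-P2-3. MCPProtocolError is defined locally as a hypothetical placeholder for the project's own class, and call_tool_safely with its result dictionaries is illustrative only.

```python
import asyncio


class MCPProtocolError(Exception):
    """Hypothetical local definition; the project is assumed to have its own."""


async def call_tool_safely(tool, timeout_s=5.0):
    """Map each expected failure to a distinct outcome; let the rest propagate."""
    try:
        result = await asyncio.wait_for(tool(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return {"ok": False, "kind": "timeout"}
    except MCPProtocolError as exc:
        return {"ok": False, "kind": "protocol", "detail": str(exc)}
    return {"ok": True, "result": result}


async def slow_tool():
    await asyncio.sleep(1)


async def broken_tool():
    raise MCPProtocolError("malformed response")


async def buggy_tool():
    raise KeyError("unexpected")  # a programming error, deliberately not caught
```

Timeouts and protocol failures become distinguishable results, while unexpected errors surface instead of being swallowed, so the BLE001 suppression is no longer needed.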

P3 Dead code · Docstring inflation · 2 open

PR-P3-1 Remove or relocate dead utility functions

src/agent/plan_execute/executor.py · L333, L349

_resolve_args and _parse_tool_call retained via comment "kept for tests" but live in the production module. Move to test utilities or remove if superseded.

Open

PR-P3-2 Trim docstring inflation

src/llm/_litellm.py

14-line docstrings on 9-line functions. Compress to concise single-line or short-block format.

Open

✓ Fixed · Resolved this sprint · 1 item

PR-P0-1c OTEL SDK guards in tracing module

src/observability/tracing.py · closed by PR #272

Bare OTEL import wrapped with try/except ImportError. Fallback stubs defined. runspan.py was not in scope — tracked as PR-P0-1a.

Fixed

P4 Evidence Ledger: Governance Scoring · new context

PR-P4-1 Implement governance scorer — distinguish tool-path compliance from hallucinated answers

src/evaluation/scorers/ · src/evaluation/models.py

PR #280 merged evaluation scaffolding: Evaluator, scorers/ registry, metrics_from_trajectory(). The code_based.py scorer stub raises NotImplementedError.

Gap: Agent A (hallucinated correct answer) and Agent C (hallucinated safety-critical output) score identically to Agent B (correct tool chain) under current scorers.

Effort revised 3–4 days → 1–2 days

4-step action

01 Add GovernancePacket to models.py: mgp_coverage, hard_gate_triggered, process_score

02 Implement scorers/governance.py: compare required tool list against unique_tools

03 Add MGP metadata to scenario records (pattern type, required tools, gate condition)

04 Register scorer in src/evaluation/__init__.py

New ctx
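Steps 01 and 02 could be sketched as follows. The three field names come from the plan; the scoring rule (coverage as the fraction of required tools used, zeroed when a gate tool is skipped), the gate_tools parameter, and the tool names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GovernancePacket:
    mgp_coverage: float        # fraction of required tools actually called
    hard_gate_triggered: bool  # True when a gate tool was required but skipped
    process_score: float       # assumed here: coverage, zeroed by a hard gate


def score_governance(required_tools, unique_tools, gate_tools=()):
    """Compare the scenario's required tool list against the tools the agent used."""
    required = set(required_tools)
    used = set(unique_tools)
    coverage = len(required & used) / len(required) if required else 1.0
    gate_hit = any(tool not in used for tool in gate_tools)
    return GovernancePacket(coverage, gate_hit, 0.0 if gate_hit else coverage)


required = ["get_sensor_data", "run_fft", "classify_iso"]  # hypothetical tool names
agent_a = score_governance(required, [])        # right answer, no tools called
agent_b = score_governance(required, required)  # full tool chain
agent_c = score_governance(required, ["get_sensor_data"], gate_tools=["classify_iso"])
```

Under these assumptions Agent B scores 1.0, Agent A scores 0.0 on process, and Agent C is additionally flagged by the hard gate, which separates the three cases the current scorers conflate.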
