Agent Review Dashboard
PR Action Plan v2.0
Updated 2026-05-16 · Rebased on upstream main · Includes #272, #274, #278–280, #295–300 · a928284b
Open
9
P0–P3 unresolved
Fixed
2
PR #272, #274 partial
Partial
1
CLI decompose
New Context
1
P4 scaffolding merged
Sprint progress
18%

Token Cost — This Document

This HTML file
~4,200
tokens · ~16 KB
Equivalent Markdown
~1,150
tokens · ~4.2 KB
3.6× multiplier on output tokens. At Claude Sonnet pricing (May 2026), the delta is ~$0.009 per document — roughly $0.90/day at 100 docs/day. One missed P0 finding costs more than a month of that overhead.
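The arithmetic behind these figures can be reproduced with a short sketch. The token counts are the approximate values quoted above. The price of 3.0 USD per million tokens is an assumption back-derived from the ~$0.009 delta, not a quoted rate, so substitute the rate that actually applies.

```python
# Approximate token counts quoted in this document.
HTML_TOKENS = 4_200
MARKDOWN_TOKENS = 1_150


def overhead(price_per_million_usd: float, docs_per_day: int = 100) -> dict:
    """Return the HTML-vs-Markdown token overhead and its cost."""
    extra_tokens = HTML_TOKENS - MARKDOWN_TOKENS
    usd_per_doc = extra_tokens * price_per_million_usd / 1_000_000
    return {
        "multiplier": HTML_TOKENS / MARKDOWN_TOKENS,
        "extra_tokens": extra_tokens,
        "usd_per_doc": usd_per_doc,
        "usd_per_day": usd_per_doc * docs_per_day,
    }


# ASSUMPTION: 3.0 USD per million tokens is back-derived from the ~$0.009
# per-document figure; it is not taken from a price list.
result = overhead(price_per_million_usd=3.0)
print(result)
```

With that assumed rate, the 3,050 extra tokens cost about $0.009 per document and about $0.92 per day at 100 documents, in line with the rounded figures above.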

Keep the raw .md for version control and agent-to-agent context passing. HTML is the human-facing render layer only.
P0 Critical Runtime crashes · Missing deps · Fail-fast gaps 3 open
PR-P0-1a Guard bare OTEL import in observability runner
src/observability/runspan.py · L15
Bare top-level import crashes the entire agent process when the OpenTelemetry package is absent. The tracing module was patched in PR #272 — this file was missed.
from opentelemetry.trace import Status, StatusCode # ← no guard
Fix
Wrap in try/except ImportError, define fallback stubs for Status / StatusCode.
Open
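A minimal sketch of the guard described in PR-P0-1a, assuming the fallback only needs to mirror the constructor shape of Status and the three StatusCode members. The OTEL_AVAILABLE flag is a hypothetical name added for illustration; the real module may need richer stubs.

```python
try:
    from opentelemetry.trace import Status, StatusCode

    OTEL_AVAILABLE = True  # hypothetical flag, not from the original module
except ImportError:
    from enum import Enum

    OTEL_AVAILABLE = False

    class StatusCode(Enum):
        """Fallback stub mirroring the OpenTelemetry status codes."""

        UNSET = 0
        OK = 1
        ERROR = 2

    class Status:
        """Fallback stub that records the code and description only."""

        def __init__(self, status_code=StatusCode.UNSET, description=None):
            self.status_code = status_code
            self.description = description


# Callers can build a Status whether or not OpenTelemetry is installed.
failure = Status(StatusCode.ERROR, "tool call failed")
```

Importing the module now succeeds either way, so the agent process no longer crashes at import time when the package is absent.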
PR-P0-1b Pin google-protobuf in [otel] dependency group
pyproject.toml · [dependency-groups].otel
Transitive requirement of the OTLP HTTP exporter. Unpinned version causes silent conflicts in isolated environments — fails only on first trace export.
Fix
Add protobuf>=4.25 to the otel dependency group. (On PyPI the Protocol Buffers runtime is published as protobuf, not google-protobuf.)
Open
PR-P0-2 Fail-fast API key validation at object construction
src/llm/litellm.py · L52 vs __init__
External API key accessed at call time, not at construction. A misconfigured deployment fails only on first LLM invocation — potentially 10+ minutes into a run.
Fix
Move key validation into __init__, raise ValueError on construction.
Open
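The fail-fast behaviour in PR-P0-2 could look roughly like the sketch below. The class name LLMClient, the api_key parameter, and the LLM_API_KEY environment variable are hypothetical placeholders; the real class in the LLM module will differ.

```python
import os


class LLMClient:
    """Hypothetical wrapper that validates its API key at construction."""

    def __init__(self, model, api_key=None, env_var="LLM_API_KEY"):
        # Resolve the key once, up front, instead of at first invocation.
        key = api_key if api_key is not None else os.environ.get(env_var, "")
        if not key.strip():
            raise ValueError(
                f"Missing API key: pass api_key or set {env_var} before "
                "constructing the client."
            )
        self.model = model
        self._api_key = key


def build_or_report(model, api_key):
    """Return 'ok' if construction succeeds, else the error message."""
    try:
        LLMClient(model, api_key=api_key)
    except ValueError as exc:
        return str(exc)
    return "ok"
```

A misconfigured deployment then fails when the object is built, at startup, rather than minutes into a run.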
P1 Resource leaks · God functions · Coupling 3 open
PR-P1-1 Fix resource leak in async agent runner
src/agent/deep_agent/runner.py · L199
MCP client instantiated without async context manager or explicit close. Server subprocesses may persist after agent exits.
client = MultiServerMCPClient(connections) if connections else None
tools = await client.get_tools() if client is not None else []
# client.close() never called
Fix
Wrap the client in async with MultiServerMCPClient(connections) as client so it is closed on every exit path. Verify that no subprocess remains after the CLI exits.
Open
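The cleanup pattern for PR-P1-1 can be sketched with a stand-in client. StubClient is hypothetical and only imitates an async context manager with get_tools(). Whether the installed MultiServerMCPClient supports async with is an assumption to check against its version. AsyncExitStack keeps the "no connections, no client" branch from the original code.

```python
import asyncio
from contextlib import AsyncExitStack


class StubClient:
    """Hypothetical stand-in for an MCP client that owns subprocesses."""

    def __init__(self, connections):
        self.connections = connections
        self.closed = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.closed = True  # a real client would stop its servers here

    async def get_tools(self):
        return sorted(self.connections)


async def load_tools(connections, client_factory=StubClient):
    """Open the client only if needed and always close it on exit."""
    async with AsyncExitStack() as stack:
        client = None
        tools = []
        if connections:
            client = await stack.enter_async_context(client_factory(connections))
            tools = await client.get_tools()
        # Agent work that uses the tools would run here, inside the block.
        return tools, client


tools, client = asyncio.run(load_tools({"vibration": {}, "wo": {}}))
```

The client is returned here only so the example can show it was closed; production code would keep all use of the client inside the block.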
PR-P1-2 Decompose DSP diagnostics god function
src/servers/vibration/main.py · L334–L546
Single function now spans 212 lines — up from 163 at previous baseline. Covers data loading, FFT/envelope DSP, bearing frequency, ISO classification, and summary generation in one call stack.
Size delta 163 → 212 lines  (+49)
Suggested split
_load_and_validate_signal() · _run_dsp_pipeline() · _classify_and_summarise()
Open
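The suggested split for PR-P1-2 might take the following shape. Only the three function names come from the finding. The bodies are deliberately simplified placeholders: RMS and peak stand in for the FFT/envelope work, and a caller-supplied alarm_rms threshold stands in for ISO classification. The diagnose() wrapper and its parameters are hypothetical.

```python
import math


def _load_and_validate_signal(samples):
    """Stage 1: coerce input to floats and reject unusable signals."""
    signal = [float(x) for x in samples]
    if len(signal) < 2:
        raise ValueError("signal needs at least two samples")
    return signal


def _run_dsp_pipeline(signal):
    """Stage 2: placeholder features standing in for the real DSP steps."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    peak = max(abs(x) for x in signal)
    return {"rms": rms, "peak": peak}


def _classify_and_summarise(features, alarm_rms):
    """Stage 3: placeholder classification against a supplied threshold."""
    zone = "alarm" if features["rms"] >= alarm_rms else "ok"
    return {"zone": zone, **features}


def diagnose(samples, alarm_rms):
    """Thin orchestrator: each stage can now be unit tested on its own."""
    signal = _load_and_validate_signal(samples)
    features = _run_dsp_pipeline(signal)
    return _classify_and_summarise(features, alarm_rms)


summary = diagnose([1.0, -1.0, 1.0, -1.0], alarm_rms=0.5)
```

The point of the split is that each stage takes plain data in and returns plain data out, so a test can exercise classification without loading a file or running the DSP.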
PR-P1-3 Extract MCP orchestration from executor
src/agent/plan_execute/executor.py · 368 lines
PR #274 extracted shared CLI boilerplate — unrelated to this finding. MCP tool argument parsing and execution logic remain coupled inside the Plan/Execute orchestrator.
Fix
Extract into _mcp_handler.py. Executor retains only Plan/Execute flow.
Open
P2 DRY violations · CLI decompose · Bare exceptions 2 open · 1 partial
PR-P2-1 DRY refactor — repeated load-filter-error pattern
src/servers/wo/tools.py · 7 tool functions
Identical load_data → filter → return ErrorResult boilerplate duplicated across 7 functions. Any change to error handling requires 7 synchronized edits.
Open
PR-P2-2 Complete CLI run function decomposition
src/agent/cli.py · _run() · 146 lines
PR #274 extracted shared boilerplate to _cli_common.py. The plan-execute CLI's _run() itself still has no internal stage split, which blocks unit testing of the CLI entry point.
Remaining
Split into _setup_environment() / _parse_arguments() / _execute_agent().
Partial
PR-P2-3 Replace bare exception catches with typed errors
src/agent/plan_execute/executor.py · L74, L100, L177
Three except Exception blocks have their lint warning silenced with # noqa: BLE001 instead of being replaced with typed handlers. They mask TimeoutError, MCPProtocolError, and domain-specific failures.
Open
P3 Dead code · Docstring inflation 2 open
PR-P3-1 Remove or relocate dead utility functions
src/agent/plan_execute/executor.py · L333, L349
_resolve_args and _parse_tool_call retained via comment "kept for tests" but live in the production module. Move to test utilities or remove if superseded.
Open
PR-P3-2 Trim docstring inflation
src/llm/_litellm.py
14-line docstrings on 9-line functions. Compress to concise single-line or short-block format.
Open
✓ Fixed Resolved this sprint 1 item
PR-P0-1c OTEL SDK guards in tracing module
src/observability/tracing.py · closed by PR #272
Bare OTEL import wrapped with try/except ImportError. Fallback stubs defined. runspan.py was not in scope — tracked as PR-P0-1a.
Fixed
P4 Evidence Ledger — Governance Scoring new context
PR-P4-1 Implement governance scorer — distinguish tool-path compliance from hallucinated answers
src/evaluation/scorers/ · src/evaluation/models.py
PR #280 merged evaluation scaffolding: Evaluator, scorers/ registry, metrics_from_trajectory(). The code_based.py scorer stub raises NotImplementedError.

Gap: Agent A (hallucinated correct answer) and Agent C (hallucinated safety-critical output) score identically to Agent B (correct tool chain) under current scorers.
Effort revised 3–4 days → 1–2 days
4-step action
01 Add GovernancePacket to models.py: mgp_coverage, hard_gate_triggered, process_score
02 Implement scorers/governance.py — compare required tool list against unique_tools
03 Add MGP metadata to scenario records (pattern type, required tools, gate condition)
04 Register scorer in src/evaluation/__init__.py
New ctx
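Steps 01 and 02 can be sketched as follows. The three field names come from the plan. Everything else is an assumption made for illustration: coverage is taken as the fraction of required tools that appear in unique_tools, the hard gate fires when any gate tool is missing, and process_score is zero whenever the gate fires. The tool names in the usage lines are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class GovernancePacket:
    mgp_coverage: float        # assumed: share of required tools actually used
    hard_gate_triggered: bool  # assumed: True if a gate tool was skipped
    process_score: float       # assumed: 0.0 on hard gate, else coverage


def score_governance(
    required_tools: Iterable[str],
    unique_tools: Iterable[str],
    gate_tools: Iterable[str] = (),
) -> GovernancePacket:
    """Compare the required tool list against the tools the agent used."""
    required = set(required_tools)
    used = set(unique_tools)
    coverage = len(required & used) / len(required) if required else 1.0
    hard_gate = bool(set(gate_tools) - used)
    return GovernancePacket(
        mgp_coverage=coverage,
        hard_gate_triggered=hard_gate,
        process_score=0.0 if hard_gate else coverage,
    )


required = ["load_signal", "run_fft"]  # hypothetical tool names
tool_chain_agent = score_governance(required, ["load_signal", "run_fft"], ["run_fft"])
hallucinating_agent = score_governance(required, [], ["run_fft"])
```

Under these assumptions, an agent that follows the required tool chain scores 1.0 while one that answers without calling any tool scores 0.0, which is the separation the gap above asks for.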