# Test Runner

Use this scenario for continuous regression checks while developing a feature.

Primary goal:
1. Run relevant regression test plans by labels while development is in progress.
2. Provide fast feedback loops and tight failure triage.
3. Keep developers shipping while guarding against cross-feature regressions.

## Inputs to Collect

1. Feature or ticket context.
2. Target labels (example: `auth`, `billing`, `checkout`).
3. Optional plan filter text if multiple plans exist.
4. Override parameters in `key=value,key2=value2` format.
5. Desired cadence: one-off, after each commit slice, or pre-PR gate.

## MCP Behavior Notes

Ground workflow in current MCP implementation:
1. `run_test_plans` accepts:
   - `id` (plan id)
   - `labels` (array)
   - `overrideParameters` (comma-separated string)
2. `get_test_plan_run` returns run state and metadata.
3. `get_labels` returns label ids and descriptions (colors stripped).
4. `overrideParameters` parsing is strict and fails on malformed `key=value` pairs.

## Preflight Protocol

Execute before every run:
1. Confirm plan id.
2. Confirm exact labels to include.
3. Confirm override parameter string format.
4. Confirm whether this is baseline run or rerun.
5. If labels are unclear, call `get_labels` and propose a minimal high-signal set.

## Workflow

1. Resolve candidate plans with `search_test_plans`.
2. Confirm plan id and label set.
3. Trigger execution with `run_test_plans` using labels and overrides.
4. Track status with `get_test_plan_run` until terminal state.
5. Report summary with run id and run link.
6. On failure, identify failing suites and then drill into flow-level evidence when needed.
7. If additional detail is required, query recent flow runs with `search_test_suite_runs`.
8. Use `investigate_failed_test` prompt shape to present likely cause and next fix action.
9. Offer fast rerun loop with the same labels after code changes.

## Execution Guardrails

1. Never run without explicit label confirmation.
2. Validate override parameter format before running.
3. Prefer smaller, high-signal label sets for fast feedback loops.
4. Keep report concise and actionable for developers in active coding flow.
5. If run is still in progress, report partial state and explicit next poll time.
6. For repeat failures, recommend maintenance workflow to patch suite or feature behavior.

## Suggested Cadence During Feature Work

1. Run once after first implementation slice.
2. Re-run after each meaningful change set.
3. Run one final pass before PR handoff.
4. Re-run immediately after a fix to a previously failing flow.

## Failure Triage Protocol

When run fails:
1. List failing suites first.
2. List failing flows second.
3. Classify likely cause:
   - product regression
   - test fragility
   - environment/config issue
4. Suggest exactly one highest-value next action.
5. Include rerun recommendation and expected confirmation signal.

## Output Format

1. `Plan`: name and id.
2. `Labels`: exact list used.
3. `Overrides`: exact override string used, or `none`.
3. `Status`: pass/fail/running.
4. `Failures`: failing suites/flows only.
5. `Link`: test plan run URL.
6. `Next step`: rerun, investigate, maintenance fix, or merge gate decision.

## Example Operator Prompts

1. "Run regression for feature checkout with labels `checkout,payments`."
2. "Use the same labels as last run and rerun after my latest change."
3. "Run quick auth regression and tell me only blockers."

## Done Criteria

1. Plan run was triggered with confirmed labels.
2. Current run status is reported with run link.
3. Failures are triaged to actionable next step.
4. Rerun plan is proposed when appropriate.
