Plugin Perf Troubleshooting
Use this guide when plugin perf gates fail or CI compare output reports regressions.
1) Reproduce Locally With Artifacts
./scripts/perf-plugin-command-latency.sh \
--iterations 20 \
--warmup 5 \
--max-p95-ms 250 \
--max-p99-ms 350 \
--artifact-json /tmp/plugin-command-latency.json
./scripts/perf-plugin-runtime-matrix.sh \
--iterations 20 \
--warmup 5 \
--artifact-dir /tmp/plugin-runtime-matrix
For scale behavior:
./scripts/perf-plugin-runtime-matrix.sh \
--iterations 8 \
--warmup 2 \
--scale-profile medium \
--artifact-dir /tmp/plugin-runtime-matrix-scale
2) Distinguish Startup Noise vs Steady-State Regression
- Compare startup_ms to steady-state metrics (latency_steady_ms).
- A large startup spike with healthy steady-state can indicate cold cache/process startup noise.
- Re-run with warmup and multiple iterations before treating as a true regression.
3) Compare Against Baselines
./scripts/perf-plugin-artifact-compare.sh \
--candidate-dir /tmp/plugin-runtime-matrix \
--baseline-dir docs/perf-baselines/runtime-matrix \
--warn-regression-ms 20
Use scale baseline compare for scale scenarios:
./scripts/perf-plugin-artifact-compare.sh \
--candidate-dir /tmp/plugin-runtime-matrix-scale \
--baseline-dir docs/perf-baselines/runtime-matrix-scale \
--warn-regression-ms 30
For variance-focused comparison (all metrics: startup, p95, p99, avg, steady-state p95/p99/avg), run repeated samples:
./scripts/perf-plugin-variance.sh --runs 3 --iterations 8 --warmup 2
The compare output includes per-metric:
- delta_median, delta_mean, delta_min, delta_max, delta_stddev
- status (OK, IMPROVED, SPIKE, WARN)
- variance classification (stable, likely_noise, likely_regression, likely_regression_with_variance)
To evaluate compare output against policy (warn-only by default):
./scripts/perf-variance-policy-check.sh \
--report-dir /tmp/bmux-compare-reports \
--policy-file docs/perf-baselines/variance-policy.json \
--mode warn
Use --mode soft-fail to trial enforcement locally before promoting CI behavior. Allowlist suppressions require an expires_on date and CI warns when entries expire.
Scenario-level enforcement is controlled in docs/perf-baselines/variance-policy.json. Keep plugin-list-json and other known noisy paths in warn mode until they meet promotion criteria.
4) Baseline Update Policy
Refresh baselines only when behavior is intentionally changed or improved.
- Do not refresh baselines to mask accidental regressions.
- Keep baseline refresh in a dedicated PR when possible.
- Include before/after compare output in PR description.
Baseline refresh commands are documented in docs/perf-baselines/README.md. After refreshing baseline artifacts, refresh/check metadata with scripts/perf-baseline-metadata.sh so CI staleness checks stay meaningful.
5) What to Include in Perf PRs
- command lines used (iterations, warmup, scale profile/count)
- artifact file paths
- compare output (OK/IMPROVED/WARN lines)
- whether failures were startup-only or steady-state
- whether policy matches were allowlisted, warn-only, or soft-fail