Detection Metrics Dashboard

Tracks detector performance against two ground truths (GT): Python GT and Hand-Annotated GT.

Last updated: 2026-01-10 16:56:52 UTC

vs Python GT

Overall score (0–1), with change vs. previous
F1 (harmonic mean of precision and recall), with change vs. previous
Precision / Recall (P / R, latest)
FP / FN (counts of false positives / false negatives)

vs Hand-Annotated GT

Overall score (0–1), with change vs. previous
F1 (harmonic mean of precision and recall), with change vs. previous
Precision / Recall (P / R, latest)
FP / FN (counts of false positives / false negatives)
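
For readers checking the cards by hand, the sketch below shows the standard way precision, recall, and F1 follow from detection counts. The dashboard only displays FP and FN, so the true-positive count (tp) and the Counts class are assumptions made for illustration. How the "Overall score" is computed is not stated here, so it is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Hypothetical container for one commit's counts against one GT."""
    tp: int  # true positives (assumed; not shown on the dashboard)
    fp: int  # false positives
    fn: int  # false negatives


def precision(c: Counts) -> float:
    # Fraction of detections that match a ground-truth item.
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c: Counts) -> float:
    # Fraction of ground-truth items that were detected.
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def f1(c: Counts) -> float:
    # Harmonic mean of precision and recall; 0.0 when both are zero.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0
```

Returning 0.0 when a denominator is zero is one common convention; a dashboard could equally show such cells as blank.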
Overall score over time
Precision / Recall / F1: both GTs, per commit
False positives / negatives: both GTs, per commit
Mean IoU vs combined error: both GTs, matched detections only
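
As a rough illustration of the "Mean IoU" axis, the sketch below averages intersection-over-union across matched detection/ground-truth pairs. It assumes detections are axis-aligned boxes given as (x1, y1, x2, y2), which this page does not confirm. The function names are hypothetical, and the "combined error" axis is not defined here, so it is not modelled.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    # Overlap rectangle; width/height clamp to 0 when boxes are disjoint.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(matched: List[Tuple[Box, Box]]) -> Optional[float]:
    # Only matched (detection, ground truth) pairs contribute;
    # None signals that there were no matches to average over.
    if not matched:
        return None
    return sum(iou(det, gt) for det, gt in matched) / len(matched)
```

Because unmatched detections are excluded, this value says nothing about FP or FN counts and is best read alongside them.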
Per-commit metrics
Columns: Time (UTC), Commit, then Overall, P, R, F1, FP, FN for Python GT, followed by the same six columns for Hand-Annotated GT.