V5 Evaluation Protocol

The V5 evaluation protocol, which ties training loss, offline PM/DM, rollout, and real-world testing into a single official verdict flow.

Core Principles

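Qualitative evaluation is for discovery; quantitative evaluation is for verdicts. Training loss alone does not close a V5 experiment. It counts as complete only after it passes the perception → policy → rollout → real-world evaluation stages.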

For both policy and grounding experiments, the official status is determined by how far the experiment has progressed through the layers below.

Layer 1 (perception): BBox IoU, center offset, direction accuracy.

Layer 2 (offline policy): PM, DM, confusion matrix, forward bias, stop/turn recall.

Layer 3 (rollout): success rate, timeout, deviation, overshoot, recovery.

Layer 4 (real-world): success rate on fixed scenarios, intervention count, completion time.
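Several of the Layer 1 to 3 metrics can be computed with simple helpers. What follows is a minimal Python sketch under stated assumptions, not the V5 implementation. Boxes are assumed to be `(x1, y1, x2, y2)` tuples and actions are assumed to be string labels such as `"forward"` and `"stop"`. `forward_bias` uses a hypothetical definition (predicted forward rate minus ground-truth forward rate), and the `Episode` record and its fields are also hypothetical. PM, DM, direction accuracy, overshoot, and recovery are omitted because their definitions are not given here.

```python
import math
from collections import Counter
from dataclasses import dataclass

# Assumed box format: (x1, y1, x2, y2) in pixel coordinates.
Box = tuple[float, float, float, float]


def bbox_iou(a: Box, b: Box) -> float:
    """Layer 1: intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def center_offset(a: Box, b: Box) -> float:
    """Layer 1: Euclidean distance between the two box centers."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return math.hypot(ax - bx, ay - by)


def confusion_matrix(y_true: list[str], y_pred: list[str]) -> Counter:
    """Layer 2: counts of (true action, predicted action) pairs."""
    return Counter(zip(y_true, y_pred))


def recall(y_true: list[str], y_pred: list[str], label: str) -> float:
    """Layer 2: stop/turn recall = correct predictions of `label` / true occurrences."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    total = sum(1 for t in y_true if t == label)
    return hits / total if total else math.nan


def forward_bias(y_true: list[str], y_pred: list[str], forward: str = "forward") -> float:
    """Layer 2 (hypothetical definition): predicted forward rate minus true forward rate."""
    n = len(y_true)
    if n == 0:
        return math.nan
    return (sum(p == forward for p in y_pred) - sum(t == forward for t in y_true)) / n


@dataclass
class Episode:
    """Hypothetical Layer 3 rollout record."""
    success: bool
    timed_out: bool
    max_deviation: float  # assumed: max distance from the reference path


def rollout_summary(episodes: list[Episode]) -> dict[str, float]:
    """Layer 3: success rate, timeout rate, and mean max deviation over episodes."""
    n = len(episodes)
    return {
        "success_rate": sum(e.success for e in episodes) / n,
        "timeout_rate": sum(e.timed_out for e in episodes) / n,
        "mean_max_deviation": sum(e.max_deviation for e in episodes) / n,
    }
```

For example, `bbox_iou((0, 0, 2, 2), (1, 1, 3, 3))` gives 1/7, because the boxes overlap in a 1×1 square and their union area is 4 + 4 - 1 = 7.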

A policy experiment must pass both Layer 2 and Layer 3. For a grounding experiment, Layer 1 comes first, but claiming policy transfer requires evidence that connects the result to Layer 3.
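The gating rule above can be expressed as a small verdict function. This is a hedged sketch, not an official tool. `Kind`, `LayerResults`, `verdict`, and the returned strings are hypothetical names. The sketch encodes only the minimum gates stated in the previous paragraph and does not model Layer 4. Each layer result is `True` (passed), `False` (failed), or `None` (not yet evaluated).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Kind(Enum):
    POLICY = "policy"
    GROUNDING = "grounding"


@dataclass
class LayerResults:
    # Hypothetical: layer number -> True (passed), False (failed), None/absent (not evaluated).
    passed: dict[int, Optional[bool]] = field(default_factory=dict)


def verdict(kind: Kind, results: LayerResults, claims_transfer: bool = False) -> str:
    if kind is Kind.POLICY:
        # Policy experiments need both offline (Layer 2) and rollout (Layer 3) evidence.
        required = [2, 3]
    else:
        # Grounding experiments are judged on Layer 1 first; a policy-transfer
        # claim additionally needs Layer 3 evidence.
        required = [1, 3] if claims_transfer else [1]
    failed = [layer for layer in required if results.passed.get(layer) is False]
    missing = [layer for layer in required if results.passed.get(layer) is None]
    if failed:
        return f"failed at layer(s) {failed}"
    if missing:
        return f"incomplete: layer(s) {missing} not evaluated"
    return "passed"
```

With this rule, a grounding run that passes Layer 1 but has no Layer 3 link returns `"passed"` for the grounding claim alone. The same run returns `"incomplete: layer(s) [3] not evaluated"` once it also claims policy transfer.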

Exp	Split	Perception	PM	DM	Forward Bias	Stop Recall	Sim Success	Real Success	Verdict
Exp04	val	N/A	?	?	?	?	?	?	Incomplete
Exp09	val	N/A	85.7	?	High	Low (estimated)	?	?	Bias persists
Exp10	val	IoU 0.87	Indirect	Indirect	N/A	Stop discrepancy	Not connected	?	Grounding succeeded
Exp11	val	N/A	Not run	Not run	Not run	Not run	Not run	Not run	Planned

Failure tags: forward_collapse, false_stop, missed_stop, late_turn, early_turn, left_right_confusion, rotation_missing, oscillation, overshoot, perception_miss, trajectory_divergence
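You can keep the failure tags as a closed vocabulary so that labels stay consistent across experiments. This is a minimal sketch that assumes each episode is labeled with a list of tag strings. `FailureTag` and `tally` are hypothetical names, not part of any existing V5 tooling. Unknown tag strings raise `ValueError`, so typos surface immediately instead of creating new ad-hoc categories.

```python
from collections import Counter
from enum import Enum


class FailureTag(str, Enum):
    FORWARD_COLLAPSE = "forward_collapse"
    FALSE_STOP = "false_stop"
    MISSED_STOP = "missed_stop"
    LATE_TURN = "late_turn"
    EARLY_TURN = "early_turn"
    LEFT_RIGHT_CONFUSION = "left_right_confusion"
    ROTATION_MISSING = "rotation_missing"
    OSCILLATION = "oscillation"
    OVERSHOOT = "overshoot"
    PERCEPTION_MISS = "perception_miss"
    TRAJECTORY_DIVERGENCE = "trajectory_divergence"


def tally(tagged_episodes: list[list[str]]) -> Counter:
    """Count failure tags across episodes; unknown tag strings raise ValueError."""
    counts: Counter = Counter()
    for tags in tagged_episodes:
        counts.update(FailureTag(t) for t in tags)
    return counts
```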