Exp14 Step 1: BBox Feature MLP

Train a small MLP that predicts an 8-class action from (cx, cy, area, has_bbox) features over a history=3 window, obtained from pure HF Kosmos-2 grounding.
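A minimal sketch of how the history=3 window could be flattened into one MLP input vector (3 frames × 4 features = 12 values). The names `frame_features` and `window_features` are hypothetical. The zero encoding for a missing box and the "no box" padding before the episode start are assumptions, since the source does not state either convention.

```python
from typing import List, Optional, Sequence, Tuple

# One detection is (cx, cy, area), assumed to be normalized to [0, 1].
# None means grounding returned no box for that frame (assumed convention).
BBox = Optional[Tuple[float, float, float]]

HISTORY = 3          # window length stated in the text
FEATS_PER_FRAME = 4  # cx, cy, area, has_bbox


def frame_features(bbox: BBox) -> List[float]:
    """Encode one frame as [cx, cy, area, has_bbox]; all zeros when no box."""
    if bbox is None:
        return [0.0, 0.0, 0.0, 0.0]
    cx, cy, area = bbox
    return [cx, cy, area, 1.0]


def window_features(bboxes: Sequence[BBox], t: int) -> List[float]:
    """Concatenate features for frames t-2, t-1, t (oldest first).

    Frames before the start of the episode are treated as 'no box'
    (an assumption, not something the source specifies).
    """
    feats: List[float] = []
    for i in range(t - HISTORY + 1, t + 1):
        feats.extend(frame_features(bboxes[i] if i >= 0 else None))
    return feats


# Hypothetical episode: no box in frame 0, boxes in frames 1 and 2.
episode = [None, (0.5, 0.5, 0.1), (0.6, 0.5, 0.12)]
x = window_features(episode, t=2)  # 12-dimensional input for the MLP
```

Keeping has_bbox as an explicit flag lets the classifier distinguish "no detection" from a real box at the image origin, which the zero-filled coordinates alone would not.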

Rule-based: 41.1% (baseline)
MLP (learned): 74.5% (390/525)
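One rough way to gauge whether the gap over the baseline exceeds sampling noise is a Wilson score interval on the reported 390/525 count (about 74.3% as a raw ratio). This is a sketch, not the evaluation used in the experiment: it assumes 390/525 counts correct test predictions, treats test samples as independent (frames within an episode are likely correlated), and treats the 41.1% baseline as a fixed reference. The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = correct / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


BASELINE = 0.411  # rule-based figure from the report

lower, upper = wilson_interval(390, 525)
clearly_above_baseline = lower > BASELINE
print(f"MLP interval: [{lower:.3f}, {upper:.3f}], above baseline: {clearly_above_baseline}")
```

A paired test on per-sample predictions (for example McNemar's test) would be a stricter comparison, but it needs the rule-based predictions on the same 525 samples, which the report does not include.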

PM per Path Type (test split)

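Note: The train split is episode-level stratified 80/20. The MLP input is the BBox features of the 3 most recent frames. This step checks whether the MLP is meaningfully better than the rule-based baseline, which decides whether to proceed to Step 2 (combining image features).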
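An episode-level stratified 80/20 split can be sketched as follows. The point of splitting by episode is that all frames of one episode land on the same side, so overlapping 3-frame windows cannot leak between train and test. The function name `split_episodes` is hypothetical, and using path type as the stratification key is an assumption; the source does not say which variable is stratified.

```python
import random
from collections import defaultdict
from typing import Dict, Set, Tuple


def split_episodes(
    episode_to_stratum: Dict[str, str],
    test_frac: float = 0.2,
    seed: int = 0,
) -> Tuple[Set[str], Set[str]]:
    """Split episode ids into train/test, keeping each stratum near 80/20."""
    by_stratum = defaultdict(list)
    for episode_id, stratum in episode_to_stratum.items():
        by_stratum[stratum].append(episode_id)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train: Set[str] = set()
    test: Set[str] = set()
    for stratum in sorted(by_stratum):
        episodes = sorted(by_stratum[stratum])
        rng.shuffle(episodes)
        # Strata with a single episode stay entirely in train.
        n_test = max(1, round(len(episodes) * test_frac)) if len(episodes) > 1 else 0
        test.update(episodes[:n_test])
        train.update(episodes[n_test:])
    return train, test


# Hypothetical data: 10 episodes for each of two path types.
episodes = {f"ep{i:02d}": ("left" if i < 10 else "right") for i in range(20)}
train_ids, test_ids = split_episodes(episodes)
```

Samples are then assigned to a split by looking up the episode id of each window, never by shuffling individual frames.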