
Exp14 Feature Ablation: BBox vs Image

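This ablation holds the MLP capacity fixed (256→128→64) and varies only the input feature combination, to determine whether the +7.5 pp gain from Step 1 to Step 2 comes from the image features or from the increased architecture capacity. Each condition is reported as mean ± std over 5 split seeds.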
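To make "same capacity" concrete, the parameter count of the backbone can be written as a function of the input width d. The sketch below assumes the layer sizes listed in the backbone description further down this page (Linear(d, 256) through Linear(64, 8)); the function name `mlp_param_count` and the example input widths are hypothetical, since the report does not state the actual feature dimensions.

```python
def mlp_param_count(d, hidden=(256, 128, 64), n_out=8):
    """Count weights and biases of a fully connected stack d -> hidden -> n_out.

    ReLU and Dropout layers have no trainable parameters, so only the
    Linear layers contribute.
    """
    sizes = [d, *hidden, n_out]
    return sum(n_in * n_next + n_next for n_in, n_next in zip(sizes, sizes[1:]))


# Hypothetical input widths, for illustration only; the real bbox and image
# feature dimensions are not given in this report.
for d in (4, 1024):
    print(f"d={d}: {mlp_param_count(d)} parameters")

# Each extra input feature adds exactly one row of 256 weights to the first layer.
print(mlp_param_count(5) - mlp_param_count(4))
```

Under these assumptions, everything after the first Linear layer is identical across the three conditions; only the first layer grows with d.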

BBox-only: 67.4% ± 9.8% (5 seeds)
Image-only: 75.6% ± 0.8% (5 seeds)
BBox+Image: 76.7% ± 1.3% (5 seeds)
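One way to read the three summary numbers is as differences in percentage points. The sketch below only does that arithmetic on the reported means and standard deviations; the helper name `gain_pp` is made up for this example, and comparing a gain to a standard deviation is a rough sanity check, not a significance test.

```python
# Reported mean and std (in %) per condition, copied from the summary above.
results = {
    "BBox-only": (67.4, 9.8),
    "Image-only": (75.6, 0.8),
    "BBox+Image": (76.7, 1.3),
}


def gain_pp(baseline, other):
    """Gain of `other` over `baseline` in percentage points (1 decimal)."""
    return round(results[other][0] - results[baseline][0], 1)


image_gain = gain_pp("BBox-only", "Image-only")    # bbox features replaced by image features
fusion_gain = gain_pp("Image-only", "BBox+Image")  # bbox features added on top of image features

print(f"Image-only vs BBox-only:  +{image_gain} pp")
print(f"BBox+Image vs Image-only: +{fusion_gain} pp")

# Rough check: is the fusion gain smaller than the seed-to-seed std of BBox+Image?
print(fusion_gain <= results["BBox+Image"][1])
```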

PM per Path Type (seed-averaged)

Path Type	BBox-only	Image-only	BBox+Image
center_straight	90.0%	78.6%	81.4%
center_left	45.6%	64.4%	71.1%
center_right	43.3%	57.8%	47.8%
left_straight	83.3%	90.0%	91.1%
left_left	70.5%	69.5%	74.7%
left_right	67.4%	83.2%	82.1%
right_straight	83.3%	85.6%	88.9%
right_left	65.5%	77.5%	80.5%
right_right	62.1%	74.1%	72.7%
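The per-path table can also be summarized by asking which condition scores highest on each path type. The sketch below copies the seed-averaged values from the table; the names `pm`, `best_condition`, and `wins` are introduced here for illustration only.

```python
from collections import Counter

CONDITIONS = ("BBox-only", "Image-only", "BBox+Image")

# Seed-averaged PM (%) per path type, in the column order of CONDITIONS.
pm = {
    "center_straight": (90.0, 78.6, 81.4),
    "center_left": (45.6, 64.4, 71.1),
    "center_right": (43.3, 57.8, 47.8),
    "left_straight": (83.3, 90.0, 91.1),
    "left_left": (70.5, 69.5, 74.7),
    "left_right": (67.4, 83.2, 82.1),
    "right_straight": (83.3, 85.6, 88.9),
    "right_left": (65.5, 77.5, 80.5),
    "right_right": (62.1, 74.1, 72.7),
}


def best_condition(path_type):
    """Return the condition with the highest seed-averaged PM for a path type."""
    scores = pm[path_type]
    return CONDITIONS[scores.index(max(scores))]


# Count how many path types each condition wins.
wins = Counter(best_condition(p) for p in pm)

for path_type in pm:
    print(f"{path_type:16s} {best_condition(path_type)}")
print(dict(wins))
```

Because these are seed averages without per-cell standard deviations, small gaps between columns should be read with care.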

Settings
MLP backbone: Linear(d, 256)→ReLU→Dropout(0.25)→Linear(256,128)→ReLU→Dropout(0.2)→Linear(128,64)→ReLU→Linear(64,8)
lr=2e-3, epochs=220, batch=128, AdamW, inverse-frequency class weights.
Dataset: bbox_dataset.json (45 episodes, 794 frames, Pure HF Kosmos-2 grounding). Split: episode-level stratified 80/20, repeated with 5 seeds.
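Two of the settings above, inverse-frequency class weights and the episode-level stratified split, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code used for the experiment: the weight normalization N / (K * count), the rounding of the test fraction, the function names, and the toy episode labels are all assumptions, since the report does not specify them.

```python
import random
from collections import Counter, defaultdict


def inverse_frequency_weights(labels):
    """Weight each class by N / (K * count), so rarer classes weigh more.

    The normalization is an assumption; the report only says
    "inverse-frequency".
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def stratified_episode_split(episode_labels, test_frac=0.2, seed=0):
    """Split episode ids (not frames) into train/test within each label group.

    Splitting by episode keeps all frames of one episode on the same side,
    which avoids leakage between train and test.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for episode, label in sorted(episode_labels.items()):
        by_label[label].append(episode)
    train, test = [], []
    for label in sorted(by_label):
        episodes = by_label[label]
        rng.shuffle(episodes)
        n_test = max(1, round(len(episodes) * test_frac))
        test += episodes[:n_test]
        train += episodes[n_test:]
    return sorted(train), sorted(test)


# Toy data: 45 episodes with made-up labels "A", "B", "C" (15 each).
episodes = {f"ep{i:02d}": "ABC"[i % 3] for i in range(45)}

for seed in range(5):  # five split seeds, as in the report
    train, test = stratified_episode_split(episodes, seed=seed)
    print(seed, len(train), len(test))

print(inverse_frequency_weights(["a", "a", "a", "b"]))
```

In a real run the stratification label would be whatever per-episode attribute the experiment stratifies on, which the report does not name.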