Exp14 Feature Ablation: BBox vs Image
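With MLP capacity held fixed (256→128→64), only the input feature combination is varied, to separate whether the
+7.5 percentage-point gain from Step 1 to Step 2 comes from the image features or from the increase in architectural capacity.
Each condition is reported as mean ± std over 5 split seeds.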
BBox-only: 67.4% ± 9.8% (5 seeds)
Image-only: 75.6% ± 0.8% (5 seeds)
BBox+Image: 76.7% ± 1.3% (5 seeds)
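The "mean ± std (5 seeds)" figures above can be produced with a small helper like the one below. This is a minimal sketch: `summarize` and `placeholder_scores` are hypothetical names, the per-seed values are placeholders rather than results from this experiment, and the use of the sample standard deviation (n-1) is an assumption, since the report does not say which convention was used.

```python
from statistics import mean, stdev


def summarize(per_seed_scores):
    """Format per-seed scores (fractions in [0, 1]) as 'mean ± std (n seeds)'."""
    m = mean(per_seed_scores)
    # Assumption: sample std (n-1). Use statistics.pstdev for the population std.
    s = stdev(per_seed_scores)
    return f"{m * 100:.1f}% ± {s * 100:.1f}% ({len(per_seed_scores)} seeds)"


# Placeholder values only, NOT the per-seed results of Exp14.
placeholder_scores = [0.70, 0.72, 0.68, 0.74, 0.71]
summary = summarize(placeholder_scores)
print(summary)
```

With only 5 seeds the two std conventions differ noticeably (by a factor of about 1.12), so it may be worth stating which one is reported.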
PM per Path Type (seed-averaged)
| Path Type | BBox-only | Image-only | BBox+Image |
| --- | --- | --- | --- |
| center_straight | 90.0% | 78.6% | 81.4% |
| center_left | 45.6% | 64.4% | 71.1% |
| center_right | 43.3% | 57.8% | 47.8% |
| left_straight | 83.3% | 90.0% | 91.1% |
| left_left | 70.5% | 69.5% | 74.7% |
| left_right | 67.4% | 83.2% | 82.1% |
| right_straight | 83.3% | 85.6% | 88.9% |
| right_left | 65.5% | 77.5% | 80.5% |
| right_right | 62.1% | 74.1% | 72.7% |
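To see where image features help most, the per-path-type gain of BBox+Image over BBox-only can be computed directly from the table. The sketch below copies the table values as they stand. The dictionary layout and variable names are illustrative, and the unweighted mean over path types is only a sanity check, not necessarily how the headline numbers were aggregated.

```python
# PM (%) per path type, copied from the table above.
# Tuple order: (BBox-only, Image-only, BBox+Image)
pm = {
    "center_straight": (90.0, 78.6, 81.4),
    "center_left": (45.6, 64.4, 71.1),
    "center_right": (43.3, 57.8, 47.8),
    "left_straight": (83.3, 90.0, 91.1),
    "left_left": (70.5, 69.5, 74.7),
    "left_right": (67.4, 83.2, 82.1),
    "right_straight": (83.3, 85.6, 88.9),
    "right_left": (65.5, 77.5, 80.5),
    "right_right": (62.1, 74.1, 72.7),
}

# Gain in percentage points from adding image features to bbox features.
gain = {path: round(v[2] - v[0], 1) for path, v in pm.items()}
best_path = max(gain, key=gain.get)
regressions = [path for path, g in gain.items() if g < 0]

# Unweighted mean over the 9 path types for each condition.
macro = [round(sum(v[i] for v in pm.values()) / len(pm), 1) for i in range(3)]

for path, g in sorted(gain.items(), key=lambda kv: -kv[1]):
    print(f"{path:16s} {g:+.1f} pp")
print("unweighted means:", macro)
```

On these numbers the largest gain is on `center_left` (+25.5 pp), and `center_straight` is the only path type where BBox-only beats BBox+Image. The unweighted means come to 67.9%, 75.6%, and 76.7%, close to the headline figures.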
Settings
MLP backbone: Linear(d, 256)→ReLU→Dropout(0.25)→Linear(256,128)→ReLU→Dropout(0.2)→Linear(128,64)→ReLU→Linear(64,8)
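Because only the input width d changes between conditions, the trainable parameter count differs only in the first layer. The arithmetic can be checked with the sketch below. `mlp_param_count` is a hypothetical helper, and the example input widths are placeholders, since the report does not state the actual feature dimensions.

```python
def mlp_param_count(d, hidden=(256, 128, 64), n_out=8):
    """Count weights + biases of Linear(d,256)->Linear(256,128)->Linear(128,64)->Linear(64,8).

    ReLU and Dropout layers have no trainable parameters.
    """
    sizes = [d, *hidden, n_out]
    return sum(n_in * n_out_ + n_out_ for n_in, n_out_ in zip(sizes, sizes[1:]))


# Parameters that do not depend on the input width d.
fixed_part = mlp_param_count(0)

# Hypothetical input widths, for illustration only.
for name, d in [("small input", 4), ("large input", 1024)]:
    total = mlp_param_count(d)
    print(f"{name}: d={d}, params={total} (= 256*{d} + {fixed_part})")
```

The total is 256·d + 41,928, so everything after the first layer is identical across the three conditions.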
lr=2e-3, epochs=220, batch=128, AdamW, inverse-frequency class weights.
Dataset: bbox_dataset.json (45 episodes, 794 frames, Pure HF Kosmos-2 grounding).
Split: episode-level stratified 80/20, 5 seeds.
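An episode-level stratified split keeps all frames of one episode on the same side of the split, which avoids frame-level leakage between train and test. The sketch below shows one way to do this. It assumes that stratification is by a per-episode label such as path type (the report does not name the key), and the episode IDs and label assignment are hypothetical.

```python
import random
from collections import defaultdict


def stratified_episode_split(episode_labels, test_frac=0.2, seed=0):
    """Split episode IDs into (train, test), stratified by each episode's label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for episode_id, label in sorted(episode_labels.items()):
        by_label[label].append(episode_id)

    train, test = [], []
    for label in sorted(by_label):
        episodes = by_label[label]
        rng.shuffle(episodes)
        # Hold out at least one episode per stratum when it has more than one.
        n_test = max(1, round(len(episodes) * test_frac)) if len(episodes) > 1 else 0
        test.extend(episodes[:n_test])
        train.extend(episodes[n_test:])
    return train, test


# Hypothetical layout: 45 episodes spread evenly over 9 strata.
episode_labels = {f"ep{i:02d}": i % 9 for i in range(45)}
splits = [stratified_episode_split(episode_labels, seed=s) for s in range(5)]
for s, (train, test) in enumerate(splits):
    print(f"seed {s}: {len(train)} train / {len(test)} test episodes")
```

Frames are then assigned to train or test by looking up their episode ID. With strata this small, each seed holds out a single episode per stratum, which may partly explain a high seed-to-seed variance.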