Exp19: Step2 + Goal-Near Proxy Features
This experiment adds four non-leaky proxy signals
(area, center_error_x, abs_delta_cx, recent_bbox_consistency)
to the bbox history + 16x16 grayscale image features of Exp14 Step2.
Step 2 Reference: 75.9% (bbox + image baseline)
Exp19 Proxy: 76.6% (121/158)
Delta vs Step 2: +0.6% (same split protocol)
PM per Path Type (test split)
| Path Type | Correct/Total | PM |
| center_straight | 11/14 | 78.6% |
| center_left | 12/18 | 66.7% |
| center_right | 10/18 | 55.6% |
| left_straight | 17/18 | 94.4% |
| left_left | 15/19 | 78.9% |
| left_right | 13/19 | 68.4% |
| right_straight | 17/18 | 94.4% |
| right_left | 16/18 | 88.9% |
| right_right | 10/16 | 62.5% |
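As a quick consistency check, the per-path-type counts in the table above can be summed to recover the overall Exp19 figure. The sketch below treats PM as simply correct/total, which is an assumption based on the table columns; the variable and function names are illustrative only.

```python
# Per-path-type (correct, total) counts copied from the table above.
counts = {
    "center_straight": (11, 14),
    "center_left": (12, 18),
    "center_right": (10, 18),
    "left_straight": (17, 18),
    "left_left": (15, 19),
    "left_right": (13, 19),
    "right_straight": (17, 18),
    "right_left": (16, 18),
    "right_right": (10, 16),
}


def pm(correct: int, total: int) -> float:
    """Return correct/total as a percentage (assumed definition of PM)."""
    return 100.0 * correct / total


total_correct = sum(c for c, _ in counts.values())
total_samples = sum(t for _, t in counts.values())
overall = pm(total_correct, total_samples)

# Path type with the lowest PM.
worst = min(counts, key=lambda name: pm(*counts[name]))

print(f"overall: {total_correct}/{total_samples} = {overall:.1f}%")
print(f"weakest path type: {worst} ({pm(*counts[worst]):.1f}%)")
```

The totals come out to 121/158 (76.6%), matching the headline number, with center_right as the weakest path type at 55.6%.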
Proxy pack
area, center_error_x, abs_delta_cx, recent_bbox_consistency.
The split protocol and backbone capacity were kept identical to Step 2.
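The page names the four proxy signals but does not define them. The following is a minimal sketch of one plausible way to derive them from bbox history, using only the current and past frames (so no label or future information leaks in). Every definition here is an assumption: boxes are taken as normalized (x1, y1, x2, y2), center_error_x as the offset of the box center from the image center, and recent_bbox_consistency as the mean IoU between consecutive recent boxes. The helper names `iou` and `proxy_features` are hypothetical and may not match the actual Exp19 code.

```python
from typing import Dict, List, Tuple

# Assumed box format: (x1, y1, x2, y2), normalized to [0, 1].
BBox = Tuple[float, float, float, float]


def iou(a: BBox, b: BBox) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def proxy_features(history: List[BBox], window: int = 3) -> Dict[str, float]:
    """Compute the four proxy signals from past and current boxes only."""
    if not history:
        raise ValueError("history must contain at least one bbox")
    cur = history[-1]
    cx = (cur[0] + cur[2]) / 2.0

    # Horizontal movement of the box center since the previous frame.
    if len(history) >= 2:
        prev = history[-2]
        abs_delta_cx = abs(cx - (prev[0] + prev[2]) / 2.0)
    else:
        abs_delta_cx = 0.0

    # Assumed definition: mean IoU over consecutive pairs in the recent window.
    recent = history[-(window + 1):]
    pairs = list(zip(recent, recent[1:]))
    consistency = sum(iou(a, b) for a, b in pairs) / len(pairs) if pairs else 1.0

    return {
        "area": (cur[2] - cur[0]) * (cur[3] - cur[1]),
        "center_error_x": cx - 0.5,  # signed offset from the image center
        "abs_delta_cx": abs_delta_cx,
        "recent_bbox_consistency": consistency,
    }


feats = proxy_features([(0.4, 0.4, 0.6, 0.6), (0.5, 0.4, 0.7, 0.6)])
print(feats)
```

In this toy history the box shifts right by 0.1, so center_error_x and abs_delta_cx are both about 0.1 and the consistency (IoU of the two boxes) is about 1/3.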
Server Deployment (2026-05-01)
Workaround for the end-to-end collapse: Exp19 Proxy is deployed as a separate FastAPI server.
After the Exp35/36/38 evaluations confirmed that the end-to-end model collapses to 100% FORWARD, the decomposition baseline was promoted to the main production line.
Pipeline: Pure HF Kosmos-2 grounding → bbox+image MLP (Exp19 proxy features) → 8-class action.
Weights (full 150ep, 2026-05-01)
| Item | Value |
| Dataset | docs/v5/bbox_nav_step1/bbox_dataset_full.json (150ep, 2626 frames) |
| Train / Test windows | 2101 / 525 |
| Best test_acc | 76.4% (220 epoch best) |
| Weights | docs/v5/bbox_nav_exp19_proxy/exp19_proxy_mlp.pt (450KB) |
Run (minum server)
source .venv/bin/activate
export VLA_API_KEY=<your-key>
export VLA_PROXY_DATASET_FILE=$PWD/docs/v5/bbox_nav_step1/bbox_dataset_full.json
export VLA_PROXY_DEVICE=cuda # CPU is also sufficient for the MLP (~1ms)
export VLA_PROXY_GROUNDING_DEVICE=cuda # GPU recommended (CPU takes ~21 s per frame)
python3 robovlm_nav/serve/proxy_inference_server.py --port 8001
API
| Endpoint | Description |
| GET /health | Model load / test_acc / device status |
| POST /predict | Requires the X-API-Key header. Input {image: b64, instruction: str} → output {action, action_3d, predicted_class, predicted_label, bbox, grounding_caption, ...} |
| POST /reset | Resets the history (call at the start of an episode) |
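The endpoints above could be called from a client along the following lines. This is a sketch under several assumptions not confirmed by the page: the server is reachable at http://localhost:8001 (matching the --port 8001 run command), the body is sent as JSON, and the `image` field is plain base64 of the encoded image file bytes. The helper names are hypothetical, and only the standard library is used.

```python
import base64
import json
import urllib.request


def build_predict_request(
    base_url: str, api_key: str, image_bytes: bytes, instruction: str
) -> urllib.request.Request:
    """Build a POST /predict request carrying the X-API-Key header."""
    payload = {
        # Assumption: plain base64 of the image file, no data-URL prefix.
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "instruction": instruction,
    }
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (requires a running server, so it is left commented out):
# with open("frame_000.jpg", "rb") as f:   # hypothetical file name
#     req = build_predict_request("http://localhost:8001", "<your-key>",
#                                 f.read(), "<instruction text>")
# result = send(req)
# print(result["predicted_label"], result["bbox"])
```

Because the server keeps per-episode history, a client would normally call POST /reset before the first frame of each episode and then send frames in order.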
Smoke Test Results (frames 0/9/17, center_left episode)
| Frame | Predicted | BBox entity | Caption |
| 0 (start) | RIGHT | caption:center fallback | "the center of the image, with the white wall..." |
| 9 (mid) | FORWARD | "the gray air conditioner" (basket misrecognized) | "the end of the room, and the gray air conditioner..." |
| 17 (end) | FORWARD | large white wall (is_basket: false) | "the end of a hallway, with a white wall..." |
Class diversity confirmed: {FORWARD, RIGHT}, unlike the 100% FORWARD collapse of end-to-end Exp35/36/38.
Pure HF grounding labeling the gray basket as an "air conditioner" is the known 33% recognition issue
(RECOGNITION_PROOF_RESULT_20260428); the direction signals (cx, cy, area) are normal.
Two-Server Compatibility
Added an automatic path resolver for billy (/home/billy/25-1kp/MoNaVLA/ROS_action/...) ↔ minum (/home/minum/minum/26CS/MoNa-pi/...).
The path can be forced with the VLA_PROXY_DATA_DIR environment variable.
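The resolver behavior described above can be sketched roughly as follows: the environment variable wins if set, otherwise the first candidate directory that exists is used. This is an illustration, not the project's actual resolver. The function name `resolve_data_dir` is hypothetical, and the candidate paths are passed in by the caller because the real per-server paths are abbreviated on this page.

```python
import os
from pathlib import Path
from typing import Iterable, Mapping, Optional


def resolve_data_dir(
    candidates: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> Path:
    """Pick the data directory: env override first, then first existing candidate."""
    env = os.environ if env is None else env

    # An explicit override always wins; it is returned without an existence check.
    forced = env.get("VLA_PROXY_DATA_DIR")
    if forced:
        return Path(forced)

    # Otherwise use the first candidate that exists on this machine.
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir():
            return path

    raise FileNotFoundError(
        "No data directory found; set VLA_PROXY_DATA_DIR to force one."
    )


# Example: the first candidate does not exist, so the second is chosen.
print(resolve_data_dir(["/nonexistent-data-dir", os.getcwd()], env={}))
```

Passing `env` explicitly keeps the function easy to test; in normal use it falls back to `os.environ`.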