Attention Collapse Mechanism

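At each Kosmos LM layer, we measure the attention of the last real token. We compare pure Kosmos-2 (before training) with Exp11/Exp13 (after training) on two per-layer quantities: the image/text attention ratio, and the number of 'text-head alive' heads (heads whose attention summed over the text region is > 0.05).
The layer at which the text-heads-alive count drops to 0 after training is the layer-level signature of the collapse mechanism.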
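The per-layer measurement described above can be sketched as follows. This is a minimal illustration, not the script that produced the table: the function names, the mask arguments, and the choice to report Img%/Text% as raw attention mass averaged over heads are all assumptions. Only the 0.05 threshold and the "last real token" row come from the text.

```python
import numpy as np

TEXT_ALIVE_THRESHOLD = 0.05  # threshold stated in the text


def layer_summary(attn_row, image_mask, text_mask, threshold=TEXT_ALIVE_THRESHOLD):
    """Summarize one layer's attention for the last real token.

    attn_row:   array of shape (num_heads, seq_len); each row is the attention
                distribution of the last real token for one head.
    image_mask: boolean array of shape (seq_len,), True at image positions.
    text_mask:  boolean array of shape (seq_len,), True at text positions.
    """
    attn_row = np.asarray(attn_row, dtype=float)
    image_mask = np.asarray(image_mask, dtype=bool)
    text_mask = np.asarray(text_mask, dtype=bool)

    # Attention mass each head places on the image and text regions.
    image_mass = attn_row[:, image_mask].sum(axis=1)
    text_mass = attn_row[:, text_mask].sum(axis=1)

    return {
        # Assumption: percentages are head-averaged raw attention mass.
        "img_pct": 100.0 * image_mass.mean(),
        "text_pct": 100.0 * text_mass.mean(),
        # A head counts as "alive" if its text-region sum exceeds the threshold.
        "text_heads_alive": int((text_mass > threshold).sum()),
    }


def first_dead_layer(alive_per_layer):
    """Return the index of the first layer with zero alive text heads, or None."""
    for layer_idx, alive in enumerate(alive_per_layer):
        if alive == 0:
            return layer_idx
    return None
```

In practice `attn_row` would be taken from each layer's attention weights at the position of the last non-padding token. Averaging `text_heads_alive` over several samples would yield fractional counts of the kind reported per layer.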

Per-Layer Summary

Alive = text-heads alive / 32

       pure_kosmos             exp11                   exp13
Layer  Img%    Text%   Alive   Img%    Text%   Alive   Img%    Text%   Alive
0      74.0%   26.0%   30.7    55.1%   0.0%    0.0     50.3%   0.0%    0.0
1      42.6%   57.4%   31.3    80.9%   0.0%    0.0     43.2%   0.0%    0.0
2      27.7%   72.3%   28.0    69.1%   0.0%    0.0     27.9%   0.0%    0.0
3      47.5%   52.5%   29.0    73.9%   0.0%    0.0     44.3%   0.0%    0.0
4      57.9%   42.1%   27.7    79.6%   0.0%    0.0     55.2%   0.0%    0.0
5      55.9%   44.1%   26.7    73.1%   0.0%    0.0     45.0%   0.0%    0.0
6      66.0%   34.0%   30.7    78.0%   0.0%    0.0     58.9%   0.0%    0.0
7      61.0%   39.0%   32.0    75.9%   0.0%    0.0     64.5%   0.0%    0.0
8      57.1%   42.9%   31.0    82.8%   0.0%    0.0     65.7%   0.0%    0.0
9      59.9%   40.1%   31.0    88.4%   0.0%    0.0     78.9%   0.0%    0.0
10     56.9%   43.1%   32.0    88.6%   0.0%    0.0     79.2%   0.0%    0.0
11     59.3%   40.7%   32.0    94.1%   0.0%    0.0     81.4%   0.0%    0.0
12     50.7%   49.3%   32.0    92.7%   0.0%    0.0     81.5%   0.0%    0.0
13     59.3%   40.7%   31.7    92.8%   0.0%    0.0     87.6%   0.0%    0.0
14     62.9%   37.1%   32.0    97.1%   0.0%    0.0     94.0%   0.0%    0.0
15     65.2%   34.8%   32.0    96.8%   0.0%    0.0     94.1%   0.0%    0.0
16     62.1%   37.9%   31.0    97.8%   0.0%    0.0     93.5%   0.0%    0.0
17     63.4%   36.6%   32.0    97.3%   0.0%    0.0     90.7%   0.0%    0.0
18     76.3%   23.7%   27.7    96.9%   0.0%    0.0     82.7%   0.0%    0.0
19     76.3%   23.7%   27.0    98.4%   0.0%    0.0     94.2%   0.0%    0.0
20     73.9%   26.1%   21.3    98.1%   0.0%    0.0     91.2%   0.0%    0.0
21     76.0%   24.0%   23.0    99.6%   0.0%    0.0     86.3%   0.0%    0.0
22     73.5%   26.5%   23.3    98.9%   0.0%    0.0     88.9%   0.0%    0.0
23     77.4%   22.6%   27.3    91.8%   0.0%    0.0     85.3%   0.0%    0.0