GSPO Paper Notes
Reading notes on the motivation, objective function, and stability analysis of Group Sequence Policy Optimization.
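Here is a minimal sketch of the GSPO objective as described in the paper. The importance ratio is computed once per sequence and length-normalized, $s_i(\theta) = \left(\pi_\theta(y_i \mid x) / \pi_{\theta_\text{old}}(y_i \mid x)\right)^{1/|y_i|}$. It then goes into a PPO-style clipped surrogate with group-normalized advantages $\hat{A}_i = (r_i - \text{mean}(r)) / \text{std}(r)$. The inputs are per-token log-probabilities under the current and old policies. The function names, the use of population std, and the default `clip_eps` are illustrative assumptions, not the paper's code or hyperparameters.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """Group-normalized advantages A_i = (r_i - mean(r)) / std(r) over one group of G responses."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; the choice of std estimator is an assumption
    return [(r - mu) / (sigma + eps) for r in rewards]


def sequence_ratio(logp_new, logp_old):
    """Length-normalized sequence ratio s_i = exp(mean_t(log pi_new - log pi_old)),
    i.e. (pi_new(y_i|x) / pi_old(y_i|x)) ** (1 / |y_i|)."""
    diffs = [a - b for a, b in zip(logp_new, logp_old)]
    return math.exp(sum(diffs) / len(diffs))


def gspo_objective(logps_new, logps_old, rewards, clip_eps=0.2):
    """Clipped sequence-level surrogate averaged over the group (to be maximized).

    logps_new / logps_old: one list of per-token log-probs per response.
    clip_eps: placeholder value, not taken from the paper.
    """
    advantages = group_advantages(rewards)
    total = 0.0
    for lp_new, lp_old, adv in zip(logps_new, logps_old, advantages):
        s = sequence_ratio(lp_new, lp_old)
        s_clipped = min(max(s, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(s * adv, s_clipped * adv)
    return total / len(rewards)
```

Because $s_i$ is length-normalized, it stays on a comparable numerical scale across responses of different lengths. Clipping is also applied to whole responses rather than to individual tokens.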
Structured reading notes on the Transformer paper, covering self-attention, positional encoding, training strategy, and common "why" questions.
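To make the two core formulas concrete, here is a dependency-free sketch of both. The first is scaled dot-product attention, $\text{softmax}(QK^\top/\sqrt{d_k})V$. The second is the sinusoidal positional encoding, $PE_{(pos,2i)} = \sin(pos/10000^{2i/d_{\text{model}}})$ and $PE_{(pos,2i+1)} = \cos(pos/10000^{2i/d_{\text{model}}})$. For readability it uses plain Python lists, a single head, and no masking. These are simplifications for illustration, not how a production implementation is written.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, with Q, K, V given as lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


def sinusoidal_positional_encoding(max_len, d_model):
    """PE[pos][2i] = sin(pos / 10000^(2i/d_model)), PE[pos][2i+1] = cos(same angle)."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):  # i plays the role of 2i in the formula
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe
```

Dividing by $\sqrt{d_k}$ keeps the dot products from growing with the key dimension, which would otherwise push the softmax into saturated regions.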
Vision encoder + MLP-based vision-language merger + LLM
Cascade RL, introduced in InternVL3.5, enhances reasoning through a two-stage process: offline RL for stable convergence, efficiently…
Attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks whi…
A novel method for pre-training large-scale vision encoders, based on extending autoregressive pre-training to a multimodal setting (image and text)…
GLM-4.1V-Thinking (9B base/thinking) is a VLM designed to advance general-purpose multimodal reasoning. The model gains the upper capabili…
Three key dimensions of the approaches: data construction: diverse, scalable, extensive coverage of real-world scenarios, knowledge-based con…
Main contributions: 1. implement window attention in the visual encoder to optimize inference efficiency; 2. introduce dynamic FPS sampling,…
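Here is a rough sketch of why window attention in the visual encoder saves compute. The patch grid is split into non-overlapping windows, and attention scores are computed only inside each window. The number of query-key pairs therefore grows with the window area rather than with the square of the full patch count. The window size and function names are hypothetical, and the sketch ignores any encoder layers that still use full attention.

```python
def window_partition(h, w, win):
    """Group patch indices of an h x w grid (row-major) into non-overlapping win x win windows.
    Edge windows are smaller when h or w is not a multiple of win."""
    windows = []
    for r0 in range(0, h, win):
        for c0 in range(0, w, win):
            windows.append([r * w + c
                            for r in range(r0, min(r0 + win, h))
                            for c in range(c0, min(c0 + win, w))])
    return windows


def attention_pairs(windows):
    """Query-key pairs scored when each patch attends only within its own window."""
    return sum(len(idx) ** 2 for idx in windows)


def full_attention_pairs(h, w):
    """Query-key pairs scored by full self-attention over all h*w patches."""
    return (h * w) ** 2
```

For a 16×16 patch grid with 8×8 windows, this scores 4 × 64² = 16,384 pairs. Full attention over the same grid needs 256² = 65,536.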
CV systems that are trained to predict a fixed set of predetermined object categories are restricted by the supervision limitations of ge…
Use Vicuna (LLaMA-7B) as the LLM $f_{\phi}(\cdot)$ parameterized by $\phi$, use a pre-trained CLIP vision encoder ViT-L/14, provide the vis…
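In LLaVA, the CLIP grid features $Z_v = g(X_v)$ are mapped into the LLM's word-embedding space by a trainable projection matrix, $H_v = W \cdot Z_v$. The toy sketch below shows that step with made-up dimensions. The helper names are hypothetical, and simply prepending the visual tokens to the text embeddings is an assumption for illustration.

```python
def project_visual_features(Z_v, W):
    """H_v = W · Z_v, applied to each visual token: z (length d_v) -> W z (length d_h).
    W is a d_h x d_v matrix given as a list of rows."""
    return [[sum(w_ij * z_j for w_ij, z_j in zip(row, z)) for row in W] for z in Z_v]


def build_llm_input(H_v, text_embeddings):
    """Hypothetical helper: prepend projected visual tokens to the text-token embeddings."""
    return H_v + text_embeddings


# Toy shapes: 2 visual tokens with d_v = 2, LLM embedding size d_h = 3.
Z_v = [[2.0, 3.0], [1.0, -1.0]]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
H_v = project_visual_features(Z_v, W)
print(H_v)  # [[2.0, 3.0, 5.0], [1.0, -1.0, 0.0]]
```

After projection, every visual token has the same width as a word embedding, so the LLM can consume it like an ordinary input token.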
Qwen2-VL introduces a Naive Dynamic Resolution mechanism, which enables the model to dynamically process images of varying resolutions into differ…
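Here is a back-of-the-envelope sketch of how Naive Dynamic Resolution turns image size into a token count. The ViT uses a patch size of 14, and an MLP merges each 2×2 group of adjacent patches into one token. A 224×224 image therefore becomes 16×16 = 256 patches, then 64 tokens, plus two special tokens marking the start and end of the image, for 66 in total. Rounding each side to a multiple of 28 is an assumed simplification of the real preprocessing, which also clamps the total pixel count. The constant and function names are hypothetical.

```python
PATCH_SIZE = 14  # ViT patch size
MERGE = 2        # the MLP merges each MERGE x MERGE group of adjacent patches into one token


def num_visual_tokens(height, width, add_special_tokens=True):
    """Approximate LLM-side visual token count for one image under dynamic resolution.
    Assumption: each side is rounded to the nearest multiple of PATCH_SIZE * MERGE (= 28);
    the real preprocessing also clamps the total pixel count, which is ignored here."""
    factor = PATCH_SIZE * MERGE
    h = max(factor, round(height / factor) * factor)
    w = max(factor, round(width / factor) * factor)
    patches = (h // PATCH_SIZE) * (w // PATCH_SIZE)
    tokens = patches // (MERGE * MERGE)
    # +2 for the special tokens marking the start and end of the image
    return tokens + 2 if add_special_tokens else tokens


print(num_visual_tokens(224, 224))  # 66
print(num_visual_tokens(448, 224))  # 130
```

Because the token count follows the image size, larger or wider images get proportionally more visual tokens instead of being resized to one fixed resolution.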
阅读全文