Filter tags

#mllm

All articles: 10

2025-12-16 · 5 min · ZH

Vision encoder + MLP-based vision-language merger + LLM

2025-11-24 · 14 min · ZH

Cascade RL, introduced in InternVL3.5, enhances reasoning through a two-stage process: offline RL for stable convergence, efficiently…

2025-11-23 · 6 min · ZH

Attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks whi…

2025-10-29 · 3 min · ZH

A novel method for pre-training large-scale vision encoders that extends autoregressive pre-training to a multimodal setting (image and text)…

2025-08-02 · 2 min · EN

GLM-4.1V-Thinking (9B base/thinking) is a VLM designed to advance general-purpose multimodal reasoning. The model gains the upper capabili…

2025-05-12 · 6 min · ZH

Three key dimensions of the approach. Data construction: diverse, scalable, extensively covers real-world scenarios, knowledge-based con…

2025-04-17 · 5 min · ZH

Main contributions: 1. implement window attention in the visual encoder to optimize inference efficiency; 2. introduce dynamic FPS sampling,…

2025-03-13 · 9 min · ZH

CV systems trained to predict a fixed set of predetermined object categories are restricted by the supervision limitations of ge…

2025-03-13 · 1 min · ZH

Use Vicuna (LLaMA 7B) as the LLM $f_{\phi}(\cdot)$ parameterized by $\phi$, use a pre-trained CLIP vision encoder ViT-L/14, provide the vis…

2025-03-13 · 7 min · ZH

Qwen2-VL introduces the Naive Dynamic Resolution mechanism, which enables the model to dynamically process images of varying resolutions into differ…