#mllm

All posts: 10

2025-11-24 · 14 min · ZH

InternVL Series

Cascade RL, introduced in InternVL3.5, enhances reasoning through a two-stage process: offline RL for stable convergence, efficiently…

Read more
2025-11-23 · 6 min · ZH

ViT

Attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks whi…

Read more
2025-10-29 · 3 min · ZH

AIMv2

A novel method for pre-training large-scale vision encoders that extends autoregressive pre-training to a multimodal setting (image and text)…

Read more
2025-08-02 · 2 min · EN

GLM-4.1V-Thinking

GLM-4.1V-Thinking (9B base/thinking) is a VLM designed to advance general-purpose multimodal reasoning. The model gains the upper capabili…

Read more
2025-05-12 · 6 min · ZH

DeepSeek VL Series

Three key dimensions of the approach: data construction: diverse, scalable, extensively covers real-world scenarios, knowledge-based con…

Read more
2025-04-17 · 5 min · ZH

Qwen2.5 VL

Main contributions: 1. implement window attention in the visual encoder to optimize inference efficiency; 2. introduce dynamic FPS sampling,…

Read more
2025-03-13 · 9 min · ZH

CLIP

CV systems that are trained to predict a fixed set of predetermined object categories are restricted by the supervision limitations of ge…

Read more
2025-03-13 · 1 min · ZH

LLaVA Series

Use Vicuna (LLaMA 7B) as the LLM $f_{\phi}(\cdot)$ parameterized by $\phi$, use a pre-trained CLIP vision encoder ViT-L/14, provide the vis…

Read more
2025-03-13 · 7 min · ZH

Qwen2 VL

Qwen2-VL introduces the Naive Dynamic Resolution mechanism, which enables the model to dynamically process images of varying resolutions into differ…

Read more