Blog
Research notes / Markdown / Math / Code

Long-form writing, experiments, and clear thinking distilled step by step.

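Posts use Markdown as the underlying content format, with support for math formulas and code highlighting, and can later be imported directly in sync with Obsidian.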

12 live notes
6 tag clusters
MDX math-ready prose
Latest posts
Archived by research direction
2025-11-24 / 14 min / ZH

InternVL Series

Cascade RL, introduced in InternVL3.5, enhances reasoning through a two-stage process: offline RL for stable convergence, efficiently…

Read full post
2025-11-23 / 6 min / ZH

ViT

Attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks whi…

Read full post
2025-10-29 / 3 min / ZH

AIMv2

A novel method for pre-training large-scale vision encoders that extends autoregressive pre-training to a multimodal setting (image and text)…

Read full post
2025-08-02 / 2 min / EN

GLM-4.1V-Thinking

GLM-4.1V-Thinking (9B base/thinking) is a VLM designed to advance general-purpose multimodal reasoning. The model gains the upper capabili…

Read full post
2025-05-12 / 6 min / ZH

DeepSeek-VL Series

Three key dimensions of the approach: data construction: diverse, scalable, extensively covering real-world scenarios, knowledge-based con…

Read full post
2025-04-17 / 5 min / ZH

Qwen2.5-VL

Main contributions: 1. implement window attention in the visual encoder to optimize inference efficiency; 2. introduce dynamic FPS sampling,…

Read full post
2025-03-13 / 9 min / ZH

CLIP

CV systems that are trained to predict a fixed set of predetermined object categories are restricted by the supervision limitations of ge…

Read full post
2025-03-13 / 1 min / ZH

LLaVA Series

Use Vicuna (LLaMA 7B) as the LLM $f_{\phi}(\cdot)$ parameterized by $\phi$, use a pre-trained CLIP vision encoder ViT-L/14, provide the vis…

Read full post
2025-03-13 / 7 min / ZH

Qwen2-VL

Qwen2-VL introduces the Naive Dynamic Resolution mechanism, which enables the model to dynamically process images of varying resolutions into differ…

Read full post