Topics tagged vision-language

Topic	Replies	Views	Activity
FastVLM: Research on building a vision-language model (VLM) that runs fast and accurately even at high resolution (feat. Apple) Reading & Info Sharing apple , multimodal , vision-language , vision-transformer , fastvit , llava , mlx , on-device , fastvlm , fastvithd , time-to-first-token , pareto-optimal-curve , high-resolution	0	582	Jul 27, 2025
Real-time webcam-based vision-language model demo project using smolVLM and llama.cpp Reading & Info Sharing vision-language , llamacpp , small-multimodal , demo , smolvlm	0	623	May 14, 2025
GhostWriter: An AI handwriting assistant for the reMarkable2 device Reading & Info Sharing vision-language , ghostwriter , remarkable-tablet , handwriting-to-text , handwriting-to-image	0	247	Feb 9, 2025
Ollama adds Llama 3.2 Vision models, now available to use Reading & Info Sharing vision-language , ollama , llama-3-2 , llama-3-2-vision	0	1942	Nov 9, 2024
Apple publishes research results on its multimodal LLM 'MM1' (model not released) Reading & Info Sharing apple , vision-language , mllm , mm1 , apple-mm1 , axlearn , clip , dfn , dfn-2b , dfn-5b , dfn-clip , paper	3	3189	Nov 7, 2024
Molmo & PixMo: State-of-the-art multimodal models built from open weights (Molmo) and open data (PixMo) (feat. AllenAI) Reading & Info Sharing multimodal , vision-language , paper , allen-ai , open-weights , multimodal-dataset , molmo , pixmo	0	612	Oct 4, 2024
CAPTURE: A metric for evaluating the image captioning performance of multimodal LLMs (LVLMs) (benchmark & evaluation dataset) Reading & Info Sharing multimodal , vision-language , large-vision-language-model , benchmark , capture , evaluation , mllm-benchmark , lvlm , image-captioning	0	613	Sep 5, 2024
VLMs are blind: A study of visual tasks (easy for humans) that vision-language models fail at (feat. BlindTest) Reading & Info Sharing dataset , multimodal , vision-language , paper , large-vision-language-model , benchmark , blindtest	0	1145	Jul 13, 2024
[GN] How does GPT-4o encode images? Reading & Info Sharing transformer , geeknews , openai , vision-language , embedding , clip , llama3 , vq-vae , visual-encoder , gpt-4o , tesseract	1	909	Jun 15, 2024
Dragonfly: A Llama-3-based vision-language model with multi-resolution zoom (feat. TogetherAI) Reading & Info Sharing together , vision-language , paper , medical , llama3 , dragonfly , dragonfly-med , multi-resolution , open-weights	0	467	Jun 12, 2024
[2024/05/27 ~ 06/02] Top ML Papers of the Week Reading & Info Sharing vision-language , paper , top-ml-papers-of-the-week , survey-paper , aya , cope , symbolic-cot , embedding-for-arithmetic , gnn-rag , graph-neural-retrieval , aaren , attention-as-rnn , lc-boost , simpo , preference-optimization	0	630	Jun 3, 2024
Idefics2, an 8B multimodal (vision-language) model released by Hugging Face Reading & Info Sharing huggingface , multimodal , vision-language , idefics2	0	812	May 10, 2024
ScreenAI: A vision-language model for UI and visual language understanding (feat. Google) Reading & Info Sharing google , vision-language , detr , self-supervised , screenai , ui-understanding , pali , flexible-patching , pix2struct	0	873	Apr 10, 2024
Apple's research on VLMs (Vision-Language Models) and visual deductive reasoning Reading & Info Sharing apple , multimodal , vision-language , benchmark , visual-reasoning	0	1504	Mar 12, 2024
[2024/01/29 ~ 02/04] Top ML Papers of the Week Reading & Info Sharing multimodal , vision-language , paper , llm-compression , rag , survey-paper , hallucination , llm-math , olmo , mm-llm , crag , corrective-rag , moe-llava , wrap , slicegpt	0	1613	Feb 5, 2024
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models Reading & Info Sharing multimodal , vision-language , mixture-of-experts , llava , moe-llava	0	1422	Feb 6, 2024
[2024/01/22 ~ 01/28] Top ML Papers of the Week Reading & Info Sharing llm-agent , multimodal , vision-language , paper , top-ml-papers-of-the-week , survey-paper , medusa , llm-security , lumiere , depth-anything , knowledge-fusion , mambabyte , warm , rtvlm , agentboard	0	910	Jan 29, 2024
[2023/12/25 ~ 12/31] Top ML Papers of the Week Reading & Info Sharing multimodal , vision-language , paper , top-ml-papers-of-the-week , survey-paper , llm-reasoning , lmm , gemini , cogagent , q-star , promptbench , llm-fact-recall , mathpile , llara	0	889	Jan 1, 2024
[TLDR] Today's AI News, 2023-10-03: AI devices, Humane Ai Pin 🧷, Rewind Pendant 📿, efficient video model training 📹 Reading & Info Sharing tldr-ai , vision-language , kosmos-2 , unilm , workers-ai , humane-ai-pin , rewind-pendant , streamingllm , gaflow , gaussian-attention , ai-saas , opencompass , vespio , bestever	1	411	Dec 31, 2023
AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models Reading & Info Sharing vision-language , anomalygpt , iad , industrial-anomaly-detection , large-vision-language-model	0	964	Sep 19, 2023
[GN] XrayGPT: Chest radiograph summarization using a medical vision-language model (VLM) Reading & Info Sharing geeknews , vision-language , xraygpt , medical , x-ray	0	506	Jul 4, 2023
Salesforce releases the paper, code, and weights for the InstructBLIP model Reading & Info Sharing instructblip , multimodal , salesforce , lavis , blip2 , gpt-4 , minigpt-4 , vision-language	1	1640	May 17, 2023
