[2025/01/27 ~ 02/02] 이번 주의 주요 ML 논문 (Top ML Papers of the Week)

9bow · 2월 3, 2025, 7:00오전

[2025/01/27 ~ 02/02] 이번 주의 주요 ML 논문 (Top ML Papers of the Week)

PyTorchKR

이번 주에 선정된 논문들을 살펴보면, 다양한 주제와 기법들이 다루어지고 있는 것을 확인할 수 있습니다. 특히, 이번 주에는 AI/ML 모델의 개선과 최적화에 중점을 둔 연구가 많이 포함되어 있습니다. 여러 논문에서 공통적으로 다양한 모델의 최적화를 통해 성능을 높이고자 하는 시도가 관찰됩니다. 예를 들어, OpenAI의 o3-mini는 비용 효율성을 강화한 추론 모델로, 과학, 수학 및 코딩과 같은 분야에서 우수한 성능을 보입니다. 또한 Qwen은 긴 문맥 길이를 처리할 수 있는 LLM을 공개하여 성능 면에서 GPT-4o-mini를 능가하고 있습니다.
멀티모달 이해와 생성 분야에서도 새로운 발전이 있습니다. Janus-Pro 모델은 멀티모달 이해와 텍스트-이미지 생성 기능을 크게 개선한 버전으로, 다양한 벤치마크에서 높은 성과를 보였습니다. TokenVerse는 개인화된 이미지 생성을 위한 새로운 기술을 제안하며, 텍스트-이미지 확산 모델을 이용해 복잡한 시각적 개념을 추출하고 조합할 수 있게 해줍니다.
이번 주 논문들에서 뚜렷한 경향은 AI 모델의 성능을 높이기 위해 다양한 최적화 및 개선 기법을 적극 활용하고자 한다는 점입니다. 효율적인 리소스 사용을 위한 모델 경량화, 멀티에이전트 강화 학습을 통한 답변 생성의 질 향상, 그리고 다양한 개념을 통합할 수 있는 개인화된 이미지 생성 방안 등은 AI/ML 기술의 발전 방향을 잘 보여주고 있습니다. 이러한 연구들은 AI의 실질적인 활용 가능성을 높이고 사용자 경험을 개선하는 데 큰 기여를 할 것으로 보입니다. 특히, 다양한 데이터 처리 및 생성 작업에서 다양성과 정확성을 동시에 향상시키려는 시도가 돋보입니다. 이러한 흐름은 AI가 보다 복잡하고 다양한 현실 세계 문제들을 해결하는 데 중요한 역할을 할 것으로 기대됩니다.

OpenAI o3-mini 시스템 카드 / OpenAI o3-mini System Card

논문 소개

OpenAI는 비용 효율적인 최신 추론 모델인 o3-mini를 ChatGPT와 API로 출시했습니다. 이 모델은 특히 과학, 수학, 코딩 등 STEM 관련 작업에 탁월하며, 이전 모델인 o1-mini의 저렴한 비용과 짧은 지연 시간을 유지합니다. 함수 호출, 구조화된 출력, 개발자 메시지와 같은 주요 개발자 기능을 도입하여 출시와 동시에 프로덕션에 바로 사용할 수 있습니다.
o3-mini는 다양한 추론 노력 수준(낮음, 중간, 높음)을 포함하고 있으며 광범위한 작업에서 성능을 향상시킵니다. o1-mini보다 24% 더 빠른 응답을 제공하고 경시대회 수학, 박사급 과학 문제, 소프트웨어 엔지니어링 작업에서 주목할 만한 성과를 거두었습니다.

OpenAI has launched o3-mini, their newest cost-efficient reasoning model, available in ChatGPT and API. The model excels in STEM-related tasks, particularly in science, math, and coding, while maintaining the low cost and reduced latency of its predecessor o1-mini. It introduces key developer features like function calling, Structured Outputs, and developer messages, making it production-ready from launch. o3-mini includes different reasoning effort levels (low, medium, and high) and improves performance across a wide range of tasks. It delivered responses 24% faster than o1-mini and achieved notable results in competition math, PhD-level science questions, and software engineering tasks.

논문 링크

https://cdn.openai.com/o3-mini-system-card.pdf

더 읽어보기

OpenAI, 더 빠르고 더 저렴한 STEM 특화 모델 o3-mini 공개 읽을거리&정보공유

[OpenAI, 빠르고 저렴한 STEM 특화 모델 o3-mini 공개] OpenAI의 o3-mini 모델 소개 OpenAI가 새로운 소형 추론 모델인 o3-mini를 공개했습니다. 이 모델은 STEM(Science/과학, Technology/기술, Engineering/공학, Math/수학) 영역에 특화된 모델로, 수학, 코딩, 과학 분야에서 강력한 성능을 제공하면서도 낮은 비용과 짧은 응답 시간을 유지하는 것이 특징입니다. 특히, o3-mini 모델은 이전 모델인 o1-mini보다 응답 속도가 24%가량 빨라지고, 첫 번째 토큰 출력 속도도 2,500ms 향상되어 더욱 빠른 AI 경험을 제공합니다. 함수 호출, 구조화된 출력, 개발자 메시지 등의 기능을 지원해 실무에서 활용하기 좋습니다. o3-mini 모델의 주요한 특징은 다음과 같습니다: STEM(수학, 과학, 코딩) 특화 성능 빠른 응답 속도 (o1-mini 대비 24% 향상) 저렴한 비용 & 낮은 지연 시간 함수 호…

https://openai.com/index/openai-o3-mini/
https://x.com/OpenAI/status/1885406586136383634

Qwen2.5-1M

논문 소개

Qwen은 최대 100만 토큰의 컨텍스트 길이를 처리할 수 있는 두 가지 오픈소스 LLM, Qwen2.5-7B-Instruct-1M과 Qwen2.5-14B-Instruct-1M을 출시합니다.
이 모델은 4K 토큰으로 시작하여 256K 토큰까지 점진적으로 증가시킨 다음 길이 추정 기법을 사용하여 100만 토큰에 도달하는 점진적 학습 방식을 기반으로 구축되었습니다. 또한 스파스 어텐션 방법을 통해 긴 입력을 3~7배 더 빠르게 처리하는 vLLM 기반 추론 프레임워크도 출시했습니다.
이 모델은 긴 문맥과 짧은 텍스트 작업 모두에서 강력한 성능을 보여줍니다. 14B 모델은 여러 긴 문맥 데이터 세트에서 GPT-4o-mini보다 성능이 뛰어나면서도 짧은 작업에서는 비슷한 성능을 유지합니다.

Qwen releases two open-source LLMs, Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M, that can handle context lengths of up to 1 million tokens.
The models are built on a progressive training approach, starting with 4K tokens and gradually increasing to 256K tokens, then using length extrapolation techniques to reach 1M tokens. They've also released an inference framework based on vLLM that processes long inputs 3-7x faster through sparse attention methods.
The models show strong performance on both long-context and short-text tasks. The 14B model outperforms GPT-4o-mini across multiple long-context datasets while maintaining similar performance on shorter tasks.

논문 초록(Abstract)

이 기술 문서에는 컨텍스트 길이를 100만 토큰까지 확장하는 Qwen2.5-1M 시리즈를 소개합니다. 이전 128K 버전에 비해 Qwen2.5-1M 시리즈는 긴 컨텍스트 사전 학습과 사후 학습을 통해 긴 컨텍스트 기능을 크게 향상시켰습니다. 긴 데이터 합성, 점진적 사전 훈련, 다단계 감독 미세 조정과 같은 핵심 기술을 사용하여 훈련 비용을 줄이면서 긴 컨텍스트 성능을 효과적으로 향상시킵니다. 더 많은 사용자층이 긴 문맥 모델을 사용할 수 있도록 추론 프레임워크를 제시하고 오픈소스로 공개합니다. 이 프레임워크에는 추가 학습 없이도 모델 컨텍스트 길이를 최소 4배 이상 확장할 수 있는 길이 추정 방법이 포함되어 있습니다. 추론 비용을 줄이기 위해 배포 시나리오를 위한 청크 프리필 최적화와 함께 스파스 어텐션 방법을 구현하고, 정밀도를 향상시키기 위해 스파스 정제 방법을 구현합니다. 또한 커널 최적화, 파이프라인 병렬 처리, 최적화를 비롯한 추론 엔진의 세부적인 최적화를 통해 전반적인 추론 성능을 크게 향상시킵니다. 추론 프레임워크를 활용하여 Qwen2.5-1M 모델은 100만 토큰의 컨텍스트가 있는 시나리오에서 3배에서 7배의 놀라운 사전 채우기 속도 향상을 달성합니다. 이 프레임워크는 오픈 소스 모델을 사용하여 긴 컨텍스트 처리가 필요한 애플리케이션을 개발하기 위한 효율적이고 강력한 솔루션을 제공합니다. Qwen2.5-1M 시리즈에는 현재 오픈 소스 모델인 Qwen2.5-7B-Instruct1M과 Qwen2.5-14B-Instruct-1M, 그리고 API 액세스 모델인 Qwen2.5-Turbo가 포함되어 있습니다. 평가 결과, Qwen2.5-1M 모델은 짧은 컨텍스트 시나리오에서 성능 저하 없이 긴 컨텍스트 작업에서 크게 개선된 것으로 나타났습니다. 특히 Qwen2.5-14B-Instruct-1M 모델은 긴 컨텍스트 작업에서 GPT-4o-mini보다 성능이 훨씬 뛰어나며 8배 더 긴 컨텍스트를 지원합니다.

In this report, we introduce Qwen2.5-1M, a series of models that extend the context length to 1 million tokens. Compared to the previous 128K version, the Qwen2.5-1M series have significantly enhanced long-context capabilities through long-context pretraining and post-training. Key techniques such as long data synthesis, progressive pre-training, and multi-stage supervised fine-tuning are employed to effectively enhance long-context performance while reducing training costs. To promote the use of long-context models among a broader user base, we present and open-source our inference framework. This framework includes a length extrapolation method that can expand the model context lengths by at least four times, or even more, without additional training. To reduce inference costs, we implement a sparse attention method along with chunked prefill optimization for deployment scenarios and a sparsity refinement method to improve precision. Additionally, we detail our optimizations in the inference engine, including kernel optimization, pipeline parallelism, and optimization, which significantly enhance overall inference performance. By leveraging our inference framework, the Qwen2.5-1M models achieve a remarkable 3x to 7x prefill speedup in scenarios with 1 million tokens of context. This framework provides an efficient and powerful solution for developing applications that require long-context processing using open-source models. The Qwen2.5-1M series currently includes the open-source models Qwen2.5-7B-Instruct1M and Qwen2.5-14B-Instruct-1M, as well as the API-accessed model Qwen2.5-Turbo. Evaluations show that Qwen2.5-1M models have been greatly improved in long-context tasks without compromising performance in short-context scenarios. Specifically, the Qwen2.5-14B-Instruct-1M model significantly outperforms GPT-4o-mini in long-context tasks and supports contexts eight times longer.

논문 링크

더 읽어보기

https://x.com/omarsar0/status/1883905564004241789

Janus-Pro: 데이터 및 모델 확장을 통한 통합된 멀티모달 이해 및 생성 / Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

논문 소개

야누스-프로(Janus-Pro)는 멀티모달 이해 및 생성을 위한 기존 야누스(Janus) 모델의 향상된 버전입니다. 이 모델은 더 긴 초기 학습과 집중적인 미세 조정을 통해 최적화된 학습 전략, 이해를 위한 9천만 개의 새로운 샘플과 생성을 위한 7천2백만 개의 합성 미학 샘플을 포함한 확장된 학습 데이터, 최대 7B 파라미터까지 더 큰 모델 크기로 확장 등 세 가지 주요 개선 사항이 통합되어 있습니다. 야누스 프로는 멀티모달 이해와 텍스트-이미지 생성 기능 모두에서 상당한 개선을 이루었습니다. 이 모델은 다양한 벤치마크에서 기존 솔루션보다 뛰어난 성능을 발휘하며, 이해 작업의 경우 MMBench에서 79.2점을, 텍스트-이미지 생성의 경우 GenEval에서 80%의 정확도를 달성했습니다. 특히 짧은 프롬프트와 미세한 디테일에 대한 이미지 생성 안정성과 품질도 향상되었지만, 현재의 384x384 해상도는 특정 작업에서 여전히 한계로 남아 있습니다.

An enhanced version of the previous Janus model for multimodal understanding and generation. The model incorporates three key improvements: optimized training strategies with longer initial training and focused fine-tuning, expanded training data including 90 million new samples for understanding and 72 million synthetic aesthetic samples for generation, and scaling to larger model sizes up to 7B parameters.
Janus-Pro achieves significant improvements in both multimodal understanding and text-to-image generation capabilities. The model outperforms existing solutions on various benchmarks, scoring 79.2 on MMBench for understanding tasks and achieving 80% accuracy on GenEval for text-to-image generation. The improvements also enhance image generation stability and quality, particularly for short prompts and fine details, though the current 384x384 resolution remains a limitation for certain tasks.

논문 초록(Abstract)

이번 작업에서는 이전 작업인 야누스의 고급 버전인 야누스-Pro를 소개합니다. 구체적으로 Janus-Pro는 (1) 최적화된 학습 전략, (2) 확장된 학습 데이터, (3) 더 큰 모델 크기로의 확장 기능을 통합합니다. 이러한 개선을 통해 야누스-Pro는 다중 모드 이해와 텍스트-이미지 명령어 추종 기능 모두에서 상당한 발전을 이루었으며 텍스트-이미지 생성의 안정성도 향상되었습니다. 이 작업이 이 분야의 더 많은 탐구에 영감을 주기를 바랍니다. 코드와 모델은 공개적으로 사용할 수 있습니다.

In this work, we introduce Janus-Pro, an advanced version of the previous work Janus. Specifically, Janus-Pro incorporates (1) an optimized training strategy, (2) expanded training data, and (3) scaling to larger model size. With these improvements, Janus-Pro achieves significant advancements in both multimodal understanding and text-to-image instruction-following capabilities, while also enhancing the stability of text-to-image generation. We hope this work will inspire further exploration in the field. Code and models are publicly available.

논문 링크

더 읽어보기

https://x.com/giffmana/status/1884011657191637126

생각은 사방에 널려 있습니다: o1-Like LLM의 언더씽킹에 대하여 / Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

논문 소개

이 연구는 o1과 유사한 LLM의 '사고' 패턴을 보다 면밀히 살펴봅니다. 우리는 최근 몇 편의 논문에서 오버씽킹의 문제점을 지적하는 것을 보았습니다.
이제 언더씽킹이라는 새로운 현상이 등장했습니다! 무슨 내용일까요? 저자들은 o1형 LLM이 올바른 해결책에 도달하기 위한 유망한 경로를 충분히 탐색하지 않고 다른 추론적 사고 사이를 자주 전환한다는 사실을 발견했습니다.

This work looks more closely at the "thinking" patterns of o1-like LLMs. We have seen a few recent papers pointing out the issues with overthinking.
There is now a new phenomenon called underthinking! What is it about? The authors find that o1-like LLMs frequently switch between different reasoning thoughts without sufficiently exploring promising paths to reach a correct solution.

논문 초록(Abstract)

OpenAI의 o1과 같은 대규모 언어 모델(LLM)은 테스트 시간 계산을 확장하고 인간과 같은 깊은 사고를 보여줌으로써 복잡한 추론 작업에서 놀라운 능력을 보여줬습니다. 그러나 우리는 o1과 같은 LLM이 올바른 해법에 도달하기 위해 유망한 경로를 충분히 탐색하지 않고 다른 추론 사고 사이를 자주 전환하는 '언더씽킹'이라는 현상을 발견했습니다. 이러한 행동은 특히 어려운 수학 문제에서 추론의 깊이가 부족하고 성적이 저하되는 결과를 초래합니다. 이 문제를 체계적으로 분석하기 위해 세 가지 도전적인 테스트 세트와 두 가지 대표적인 오픈소스 O1 유사 모델에 대한 실험을 수행하여 잦은 사고 전환이 오답과 상관관계가 있음을 밝혀냈습니다. 그리고 오답의 토큰 효율성을 측정하여 과소사고를 정량화할 수 있는 새로운 지표를 소개합니다. 과소사고를 해결하기 위해 생각 전환 페널티 팁을 적용한 디코딩 전략을 제안하여 생각 간의 조기 전환을 방지하고 각 추론 경로를 더 깊이 탐구하도록 장려합니다. 실험 결과에 따르면 이 접근 방식은 모델을 미세 조정할 필요 없이 까다로운 데이터 세트에서 정확도를 향상시킵니다. 이러한 연구 결과는 o1과 같은 LLM의 추론 비효율성을 이해하는 데 기여하고 문제 해결 능력을 향상시킬 수 있는 실용적인 솔루션을 제공합니다.

Large language models (LLMs) such as OpenAI's o1 have demonstrated remarkable abilities in complex reasoning tasks by scaling test-time compute and exhibiting human-like deep thinking. However, we identify a phenomenon we term underthinking, where o1-like LLMs frequently switch between different reasoning thoughts without sufficiently exploring promising paths to reach a correct solution. This behavior leads to inadequate depth of reasoning and decreased performance, particularly on challenging mathematical problems. To systematically analyze this issue, we conduct experiments on three challenging test sets and two representative open-source o1-like models, revealing that frequent thought switching correlates with incorrect responses. We introduce a novel metric to quantify underthinking by measuring token efficiency in incorrect answers. To address underthinking, we propose a decoding strategy with thought switching penalty TIP that discourages premature transitions between thoughts, encouraging deeper exploration of each reasoning path. Experimental results demonstrate that our approach improves accuracy across challenging datasets without requiring model fine-tuning. Our findings contribute to understanding reasoning inefficiencies in o1-like LLMs and offer a practical solution to enhance their problem-solving capabilities.

논문 링크

더 읽어보기

https://x.com/omarsar0/status/1885349576456233177

다양한 환경 설정 최적화 / Diverse Preference Optimization

논문 소개

응답 품질을 유지하면서 언어 모델 출력의 다양성 부족 문제를 해결하는 것을 목표로 하는 새로운 학습 방법인 다양성 선호도 최적화(DivPO)를 소개합니다. 핵심 과제는 RLHF와 같은 현재의 선호도 최적화 기술이 출력 확률 분포를 날카롭게 하여 모델이 매우 유사한 응답을 생성하는 경향이 있다는 것입니다. 이는 다양한 출력이 필요한 창의적인 작업에서 특히 문제가 됩니다.
DivPO는 선호도 최적화 중에 학습 쌍을 선택하는 방식을 수정하는 방식으로 작동합니다. DivPO는 단순히 보상이 가장 높은 응답과 가장 낮은 응답을 선택하는 대신 품질 임계값을 충족하는 가장 다양한 응답을 선택하고 임계값 이하의 가장 다양하지 않은 응답과 대조합니다. 이 방법은 모델 확률, 단어 빈도 또는 LLM을 판단 기준으로 사용하는 등 다양한 방법으로 측정할 수 있는 다양성 기준을 도입합니다. 페르소나 생성 및 창의적 글쓰기 작업에 대한 실험 결과, DivPO는 기본 방법과 비교해 비슷한 품질 수준을 유지하면서 구조화된 작업에서 최대 45.6% 더 다양한 결과물을 산출하고 스토리 다양성을 81% 증가시키는 것으로 나타났습니다.

Introduces Diverse Preference Optimization (DivPO), a novel training method that aims to address the lack of diversity in language model outputs while maintaining response quality. The key challenge is that current preference optimization techniques like RLHF tend to sharpen the output probability distribution, causing models to generate very similar responses. This is particularly problematic for creative tasks where varied outputs are desired.
DivPO works by modifying how training pairs are selected during preference optimization. Rather than simply choosing the highest and lowest rewarded responses, DivPO selects the most diverse response that meets a quality threshold and contrasts it with the least diverse response below a threshold. The method introduces a diversity criterion that can be measured in different ways, including model probability, word frequency, or using an LLM as a judge. Experiments on persona generation and creative writing tasks show that DivPO achieves up to 45.6% more diverse outputs in structured tasks and an 81% increase in story diversity, while maintaining similar quality levels compared to baseline methods.

논문 초록(Abstract)

강화 학습, 선호도 최적화 또는 지도 미세 조정을 통해 언어 모델을 사후 학습하면 출력 확률 분포가 날카로워지고 생성된 응답의 다양성이 감소하는 경향이 있습니다. 이는 다양한 응답이 필요한 창의적인 생성 작업에서 특히 문제가 됩니다. 이 작업에서는 생성의 품질을 유지하면서 표준 파이프라인보다 훨씬 더 다양한 응답을 생성하는 방법을 학습하는 최적화 방법인 다양한 선호도 최적화(DivPO)를 소개합니다. DivPO에서는 먼저 응답 풀과 응답 간의 다양성 척도를 고려하여 선호도 쌍을 선택하고, 선택된 예는 더 드물지만 품질이 높은 것으로, 거부된 예는 더 흔하지만 품질이 낮은 것으로 선택합니다. DivPO를 사용하면 표준 기준과 비슷한 승률을 유지하면서 45.6% 더 다양한 페르소나 속성을 생성하고 스토리 다양성을 74.6% 증가시킬 수 있습니다.

Post-training of language models, either through reinforcement learning, preference optimization or supervised finetuning, tends to sharpen the output probability distribution and reduce the diversity of generated responses. This is particularly a problem for creative generative tasks where varied responses are desired. In this work we introduce Diverse Preference Optimization (DivPO), an optimization method which learns to generate much more diverse responses than standard pipelines, while maintaining the quality of the generations. In DivPO, preference pairs are selected by first considering a pool of responses, and a measure of diversity among them, and selecting chosen examples as being more rare but high quality, while rejected examples are more common, but low quality. DivPO results in generating 45.6% more diverse persona attributes, and an 74.6% increase in story diversity, while maintaining similar win rates as standard baselines.

논문 링크

더 읽어보기

https://x.com/jaseweston/status/1885399530419450257

DeepSeek-R1 모델에서 AI 안전성을 보장하기 위한 과제: 강화 학습 전략의 단점 / Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies

논문 소개

이 작업은 DeepSeek-R1 모델에 메시지를 표시하는 방법에 대한 일련의 권장 사항을 제공합니다. 다음은 주요 지침입니다:

프롬프트 엔지니어링:
- 명시적인 지침이 포함된 명확하고 구조화된 프롬프트 사용
- 몇 샷 프롬프트를 피하고 대신 제로 샷을 사용하세요.
출력 형식:
- 원하는 형식(JSON, 표, 마크다운)을 지정하세요.
- 추론 작업에 대한 단계별 설명 요청하기
언어:
- 혼용을 방지하기 위해 입력/출력 언어를 명시적으로 지정하세요.
  이 논문에는 다양한 모델 변형의 사용 시기, 미세 조정 시기 및 기타 안전 고려 사항도 요약되어 있습니다.

This work provides a set of recommendations for how to prompt the DeepSeek-R1 model. Below are the key guidelines:

Prompt Engineering:
- Use clear, structured prompts with explicit instructions
- Avoid few-shot prompting; use zero-shot instead
Output Formatting:
- Specify the desired format (JSON, tables, markdown)
- Request step-by-step explanations for reasoning tasks
Language:
- Explicitly specify input/output language to prevent mixing
  The paper also summarizes when to use the different model variants, when to fine-tune, and other safety considerations.

논문 초록(Abstract)

대규모 언어 모델(LLM)은 추론, 정렬 및 작업별 성능에서 괄목할 만한 발전을 이루었습니다. 그러나 이러한 시스템에서 무해성을 보장하는 것은 특히 DeepSeek-R1과 같은 고급 모델에서 중요한 과제로 남아 있습니다. 이 논문에서는 DeepSeek-R1에서 유해한 결과물을 줄이기 위한 주요 접근 방식인 강화 학습(RL)의 한계를 살펴보고 이를 감독 미세 조정(SFT)과 비교합니다. RL은 추론 능력을 향상시키지만 보상 해킹, 일반화 실패, 언어 혼용, 높은 계산 비용과 같은 문제에 직면해 있습니다. 우리는 강력한 무해성 감소를 달성하기 위해 RL과 SFT를 결합한 하이브리드 학습 접근법을 제안합니다. 또한, 책임감 있는 DeepSeek-R1 배포를 위한 사용 권장 사항과 향후 방향도 제시합니다.

Large Language Models (LLMs) have achieved remarkable progress in reasoning, alignment, and task-specific performance. However, ensuring harmlessness in these systems remains a critical challenge, particularly in advanced models like DeepSeek-R1. This paper examines the limitations of Reinforcement Learning (RL) as the primary approach for reducing harmful outputs in DeepSeek-R1 and compares it with Supervised Fine-Tuning (SFT). While RL improves reasoning capabilities, it faces challenges such as reward hacking, generalization failures, language mixing, and high computational costs. We propose hybrid training approaches combining RL and SFT to achieve robust harmlessness reduction. Usage recommendations and future directions for deploying DeepSeek-R1 responsibly are also presented.

논문 링크

더 읽어보기

https://x.com/omarsar0/status/1884624296368292083

Docling: AI 기반 문서 변환을 위한 효율적인 오픈소스 툴킷 / Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion

논문 소개

Docling은 여러 유형의 인기 있는 문서 형식을 통합된 풍부한 구조의 표현으로 구문 분석할 수 있는 오픈 소스 툴킷입니다.

Docling is an open-source toolkit that can parse several types of popular document formats into a unified, richly structured representation.

논문 초록(Abstract)

여러 유형의 인기 있는 문서 형식을 통합된 풍부한 구조의 표현으로 구문 분석할 수 있는 사용하기 쉬운 독립형 문서 변환용 오픈소스 툴킷으로, MIT 라이선스를 받은 오픈소스 툴킷인 Docling을 소개합니다. 레이아웃 분석(DocLayNet)과 표 구조 인식(TableFormer)을 위한 최첨단 전문 AI 모델로 구동되며, 적은 리소스 예산으로 상용 하드웨어에서 효율적으로 실행됩니다. Docling은 Python 패키지로 출시되며 Python API 또는 CLI 도구로 사용할 수 있습니다. Docling의 모듈식 아키텍처와 효율적인 문서 표현을 통해 확장 기능, 새로운 기능, 모델 및 사용자 지정을 쉽게 구현할 수 있습니다. Docling은 이미 다른 인기 있는 오픈소스 프레임워크(예: LangChain, LlamaIndex, spaCy)에 통합되어 있어 문서 처리와 고급 애플리케이션 개발에 자연스럽게 적합합니다. 오픈소스 커뮤니티는 Docling의 사용, 홍보, 개발에 적극 참여하여 한 달도 채 되지 않아 GitHub에서 1만 개의 별을 모았고 2024년 11월에는 전 세계 GitHub에서 인기 저장소 1위로 보고되었습니다.

We introduce Docling, an easy-to-use, self-contained, MIT-licensed, open-source toolkit for document conversion, that can parse several types of popular document formats into a unified, richly structured representation. It is powered by state-of-the-art specialized AI models for layout analysis (DocLayNet) and table structure recognition (TableFormer), and runs efficiently on commodity hardware in a small resource budget. Docling is released as a Python package and can be used as a Python API or as a CLI tool. Docling's modular architecture and efficient document representation make it easy to implement extensions, new features, models, and customizations. Docling has been already integrated in other popular open-source frameworks (e.g., LangChain, LlamaIndex, spaCy), making it a natural fit for the processing of documents and the development of high-end applications. The open-source community has fully engaged in using, promoting, and developing for Docling, which gathered 10k stars on GitHub in less than a month and was reported as the No. 1 trending repository in GitHub worldwide in November 2024.

논문 링크

다중 에이전트 강화 학습을 통한 검색-증강 생성 향상 / Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning

논문 소개

이 작업은 답변 생성 품질을 개선하기 위해 RAG를 다중 에이전트 협력 작업으로 처리합니다. 쿼리 재작성, 문서 선택, 답변 생성과 같은 RAG 구성 요소를 정확한 답변을 생성하기 위해 협력하는 강화 학습 에이전트로 모델링합니다. 멀티 에이전트 프록시멀 정책 최적화(MAPPO)를 적용하여 답변 품질에 따라 공유 보상을 통해 모든 에이전트를 공동으로 최적화합니다.
이 프레임워크는 널리 사용되는 벤치마크에 대한 개선 외에도 도메인 외 시나리오에서 강력한 일반화 기능을 보여주며 다양한 RAG 시스템 구성에서 효율성을 유지합니다.

This work treats RAG as a multi-agent cooperative task to improve answer generation quality. It models RAG components like query rewriting, document selection, and answer generation as reinforcement learning agents working together toward generating accurate answers. It applies Multi-Agent Proximal Policy Optimization (MAPPO) to jointly optimize all agents with a shared reward based on answer quality.
Besides improvements on popular benchmarks, the framework shows strong generalization capabilities in out-of-domain scenarios and maintains effectiveness across different RAG system configurations.

논문 초록(Abstract)

검색-증강 생성(RAG)은 외부의 최신 지식을 대규모 언어 모델에 통합하는 데 광범위하게 활용되며, 이를 통해 환상을 최소화합니다. 표준 RAG 파이프라인은 쿼리 재작성, 문서 검색, 문서 필터링 및 답변 생성과 같은 여러 구성 요소로 구성될 수 있습니다. 그러나 이러한 구성 요소는 일반적으로 감독된 미세 조정을 통해 개별적으로 최적화되기 때문에 개별 모듈의 목표와 질문 답변(QA) 작업에서 정확한 답변을 생성하는 중요한 목표 간에 불일치가 발생할 수 있습니다. 최근 특정 RAG 구성 요소를 최적화하기 위해 강화 학습(RL)을 탐색하는 노력이 있었지만, 이러한 접근 방식은 종종 두 가지 구성 요소만 있는 지나치게 단순한 파이프라인에 초점을 맞추거나 모듈 간의 복잡한 상호 의존성과 협업 상호 작용을 적절히 다루지 못합니다. 이러한 문제를 극복하기 위해 저희는 RAG 파이프라인을 다중 에이전트 협력 작업으로 취급하고 각 구성 요소를 RL 에이전트로 간주할 것을 제안합니다. 구체적으로, 최종 정답의 F1 점수와 같은 통일된 보상을 향해 모든 에이전트의 목표를 조화시키기 위해 다중 에이전트 강화 학습을 사용하는 RAG용 다중 모듈 공동 최적화 알고리즘인 MMOA-RAG를 제시합니다. 다양한 QA 데이터 세트에 대한 실험을 통해 MMOA-RAG가 전반적인 파이프라인 성능을 개선하고 기존 기준선보다 뛰어난 성능을 발휘한다는 것이 입증되었습니다. 또한 포괄적인 제거 연구를 통해 개별 구성 요소의 기여도와 다양한 RAG 구성 요소 및 데이터 세트에 대한 MMOA-RAG의 적응성을 검증했습니다. MMOA-RAG의 코드는 GitHub - chenyiqun/MMOA-RAG: This is the code of MMOA-RAG. 에서 확인할 수 있습니다.

Retrieval-augmented generation (RAG) is extensively utilized to incorporate external, current knowledge into large language models, thereby minimizing hallucinations. A standard RAG pipeline may comprise several components, such as query rewriting, document retrieval, document filtering, and answer generation. However, these components are typically optimized separately through supervised fine-tuning, which can lead to misalignments between the objectives of individual modules and the overarching aim of generating accurate answers in question-answering (QA) tasks. Although recent efforts have explored reinforcement learning (RL) to optimize specific RAG components, these approaches often focus on overly simplistic pipelines with only two components or do not adequately address the complex interdependencies and collaborative interactions among the modules. To overcome these challenges, we propose treating the RAG pipeline as a multi-agent cooperative task, with each component regarded as an RL agent. Specifically, we present MMOA-RAG, a Multi-Module joint Optimization Algorithm for RAG, which employs multi-agent reinforcement learning to harmonize all agents' goals towards a unified reward, such as the F1 score of the final answer. Experiments conducted on various QA datasets demonstrate that MMOA-RAG improves the overall pipeline performance and outperforms existing baselines. Furthermore, comprehensive ablation studies validate the contributions of individual components and the adaptability of MMOA-RAG across different RAG components and datasets. The code of MMOA-RAG is on GitHub - chenyiqun/MMOA-RAG: This is the code of MMOA-RAG..

논문 링크

더 읽어보기

https://x.com/omarsar0/status/1884249075467575362

TensorLLM: LLM에서 향상된 추론과 압축을 위한 멀티 헤드 어텐션 텐서화 / TensorLLM: Tensorising Multi-Head Attention for Enhanced Reasoning and Compression in LLMs

논문 소개

멀티 헤드 텐서화 프로세스와 터커 분해를 통해 MHA 압축을 수행하는 프레임워크를 제안합니다. 추가 데이터, 학습 또는 미세 조정 없이도 MHA 가중치에서 최대 ∼ 250배의 압축률을 달성합니다.

Proposes a framework that performs MHA compression through a multi-head tensorisation process and the Tucker decomposition. Achieves a compression rate of up to ∼ 250x in the MHA weights, without requiring any additional data, training, or fine-tuning.

논문 초록(Abstract)

대규모 언어 모델(LLM)의 추론 능력은 구조적으로 가중치 노이즈를 제거함으로써 향상시킬 수 있지만, 기존 기법은 주로 트랜스포머 블록의 피드 포워드 네트워크(FFN)의 노이즈 제거에 집중하고 있어 트랜스포머 아키텍처의 핵심인 멀티 헤드 어텐션(MHA) 블록을 효율적으로 활용하지 못합니다. 이 문제를 해결하기 위해 저희는 멀티 헤드 텐서화 프로세스와 터커 분해를 통해 MHA 압축을 수행하는 새로운 직관적인 프레임워크를 핵심으로 하는 새로운 프레임워크를 제안합니다. 이를 통해 여러 어텐션 헤드의 가중치에 걸쳐 공유된 고차원 부분공간을 적용함으로써 고차원 구조의 노이즈 제거와 MHA 가중치 압축을 모두 수행할 수 있습니다. 이 접근 방식이 여러 벤치마크 데이터 세트와 인코더 전용 및 디코더 전용 아키텍처 모두에서 LLM의 추론 기능을 일관되게 향상시키는 동시에 추가 데이터, 학습 또는 미세 조정 없이도 MHA 가중치에서 최대 \sim 250 배의 압축률을 달성한다는 것을 입증합니다. 또한 제안한 방법을 기존의 FFN 전용 기반 노이즈 제거 기법과 원활하게 결합하여 LLM 추론 성능을 더욱 향상시킬 수 있음을 보여줍니다.

The reasoning abilities of Large Language Models (LLMs) can be improved by structurally denoising their weights, yet existing techniques primarily focus on denoising the feed-forward network (FFN) of the transformer block, and can not efficiently utilise the Multi-head Attention (MHA) block, which is the core of transformer architectures. To address this issue, we propose a novel intuitive framework that, at its very core, performs MHA compression through a multi-head tensorisation process and the Tucker decomposition. This enables both higher-dimensional structured denoising and compression of the MHA weights, by enforcing a shared higher-dimensional subspace across the weights of the multiple attention heads. We demonstrate that this approach consistently enhances the reasoning capabilities of LLMs across multiple benchmark datasets, and for both encoder-only and decoder-only architectures, while achieving compression rates of up to \sim 250 times in the MHA weights, all without requiring any additional data, training, or fine-tuning. Furthermore, we show that the proposed method can be seamlessly combined with existing FFN-only-based denoising techniques to achieve further improvements in LLM reasoning performance.

논문 링크

더 읽어보기

https://x.com/omarsar0/status/1884246306224496729

토큰버스: 토큰 변조 공간에서 다목적 다중 개념 개인화 / TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space

논문 소개

학습된 개념에서 원하는 구성으로 새로운 이미지를 생성하는 새로운 기술을 제안합니다. 구글 딥마인드와 협력업체가 제안한 토큰버스는 사전 학습된 텍스트-이미지 확산 모델을 활용하여 여러 이미지에서 복잡한 시각적 개념을 분리하고 추출함으로써 다중 개념 개인화를 가능하게 합니다.
이 모델은 DiT의 변조 공간에서 작동하며 입력 캡션의 각 텍스트 토큰에 대해 개인화된 변조 벡터를 학습합니다. 이를 통해 물체, 재질, 조명, 포즈와 같은 고유한 개념을 유연하고 국지적으로 제어할 수 있습니다. 그런 다음 학습된 토큰 변조를 새로운 방식으로 결합하여 추가적인 분할 마스크 없이도 여러 개인화된 개념을 통합하는 새로운 이미지를 생성할 수 있습니다.

Proposes a new technique to generate new images from learned concepts in a desired configuration. Proposed by Google DeepMind and collaborators, TokenVerse enables multi-concept personalization by leveraging a pre-trained text-to-image diffusion model to disentangle and extract complex visual concepts from multiple images.
It operates in the modulation space of DiTs, learning a personalized modulation vector for each text token in an input caption. This allows flexible and localized control over distinct concepts such as objects, materials, lighting, and poses. The learned token modulations can then be combined in novel ways to generate new images that integrate multiple personalized concepts without requiring additional segmentation masks.

논문 초록(Abstract)

사전 학습된 텍스트-이미지 확산 모델을 활용하여 다중 개념 개인화를 위한 방법인 토큰버스를 소개합니다. 이 프레임워크는 최소 하나의 이미지에서 복잡한 시각적 요소와 속성을 추출하는 동시에 여러 이미지에서 추출한 개념 조합을 플러그 앤 플레이 방식으로 원활하게 생성할 수 있습니다. 기존 작업과 달리 토큰버스는 각각 여러 콘셉트를 가진 여러 이미지를 처리할 수 있으며, 오브젝트, 액세서리, 소재, 포즈, 조명 등 다양한 콘셉트를 지원합니다. 우리의 작업은 입력 텍스트가 어텐션과 변조(시프트 및 스케일)를 통해 생성에 영향을 미치는 DiT 기반 텍스트-이미지 모델을 활용합니다. 우리는 변조 공간이 의미론적이며 복잡한 개념을 국지적으로 제어할 수 있다는 것을 관찰했습니다. 이러한 인사이트를 바탕으로 이미지와 텍스트 설명을 입력으로 받아 변조 공간에서 각 단어의 뚜렷한 방향을 찾아내는 최적화 기반 프레임워크를 고안했습니다. 그런 다음 이러한 방향을 사용하여 학습된 개념을 원하는 구성으로 결합한 새로운 이미지를 생성할 수 있습니다. 까다로운 개인화 설정에서 토큰버스의 효과를 입증하고 기존 방식에 비해 토큰버스의 장점을 소개합니다. 프로젝트의 웹페이지 https://token-verse.github.io/ 입니다.

We present TokenVerse -- a method for multi-concept personalization, leveraging a pre-trained text-to-image diffusion model. Our framework can disentangle complex visual elements and attributes from as little as a single image, while enabling seamless plug-and-play generation of combinations of concepts extracted from multiple images. As opposed to existing works, TokenVerse can handle multiple images with multiple concepts each, and supports a wide-range of concepts, including objects, accessories, materials, pose, and lighting. Our work exploits a DiT-based text-to-image model, in which the input text affects the generation through both attention and modulation (shift and scale). We observe that the modulation space is semantic and enables localized control over complex concepts. Building on this insight, we devise an optimization-based framework that takes as input an image and a text description, and finds for each word a distinct direction in the modulation space. These directions can then be used to generate new images that combine the learned concepts in a desired configuration. We demonstrate the effectiveness of TokenVerse in challenging personalization settings, and showcase its advantages over existing methods. project's webpage in https://token-verse.github.io/

논문 링크

더 읽어보기

https://x.com/omarsar0/status/1884618510275592610

원문

이 글은 GPT 모델로 정리한 것으로, 잘못된 부분이 있을 수 있으니 글 아래쪽의 원문도 함께 참고해주세요! 읽으시면서 어색하거나 잘못된 내용을 발견하시면 덧글로 알려주시기를 부탁드립니다.*

파이토치 한국 사용자 모임이 정리한 이 글이 유용하셨나요? 회원으로 가입하시면 주요 글들을 이메일로 보내드립니다! (기본은 Weekly지만 Daily로 변경도 가능합니다.)

아래쪽에 좋아요를 눌러주시면 뉴스 발행에 힘이 됩니다~

[2025/01/27 ~ 02/02] 이번 주의 주요 ML 논문 (Top ML Papers of the Week)

PyTorchKR​

OpenAI o3-mini 시스템 카드 / OpenAI o3-mini System Card

논문 소개

논문 링크

더 읽어보기

Qwen2.5-1M

논문 소개

논문 초록(Abstract)

논문 링크

더 읽어보기

Janus-Pro: 데이터 및 모델 확장을 통한 통합된 멀티모달 이해 및 생성 / Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

논문 소개

논문 초록(Abstract)

논문 링크

더 읽어보기

생각은 사방에 널려 있습니다: o1-Like LLM의 언더씽킹에 대하여 / Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

논문 소개

논문 초록(Abstract)

논문 링크

더 읽어보기

다양한 환경 설정 최적화 / Diverse Preference Optimization

논문 소개

논문 초록(Abstract)

논문 링크

더 읽어보기

DeepSeek-R1 모델에서 AI 안전성을 보장하기 위한 과제: 강화 학습 전략의 단점 / Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies

논문 소개

논문 초록(Abstract)

논문 링크

더 읽어보기

Docling: AI 기반 문서 변환을 위한 효율적인 오픈소스 툴킷 / Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion

논문 소개

논문 초록(Abstract)

논문 링크

다중 에이전트 강화 학습을 통한 검색-증강 생성 향상 / Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning

논문 소개

논문 초록(Abstract)

논문 링크

더 읽어보기

TensorLLM: LLM에서 향상된 추론과 압축을 위한 멀티 헤드 어텐션 텐서화 / TensorLLM: Tensorising Multi-Head Attention for Enhanced Reasoning and Compression in LLMs

논문 소개

논문 초록(Abstract)

논문 링크

더 읽어보기

토큰버스: 토큰 변조 공간에서 다목적 다중 개념 개인화 / TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space

논문 소개

논문 초록(Abstract)

논문 링크

더 읽어보기

원문

PyTorchKR