[2024/12/23 ~ 12/29] Top ML Papers of the Week

9bow · December 30, 2024, 8:43 AM

PyTorchKR

This week's selection shows strong interest in natural language processing (NLP), large language models (LLMs), and multimodal learning. Papers such as 'Large Concept Models', 'ModernBERT', 'A Survey on LLM Inference-Time Self-Improvement', and 'LearnLM' focus on NLP and LLM research, while 'Empowering MLLM with o1-like Reasoning and Reflection' points to the potential of multimodal learning.
The overall trend is broad language-processing research built on large-scale data and complex models, a natural response to the growing demand for more sophisticated language understanding and generation. In particular, the rising interest in efficient LLM inference and self-improvement reflects researchers' concern with how effectively these models can be used in real-world settings.
The active work on multimodal learning, in turn, reflects a recent push to let AI systems interact with their environment through images, audio, and other inputs in addition to language. This points toward more integrated and intuitive AI systems. Such research should widen the range of applications for AI and help it deliver practical responses.

DeepSeek-V3

Paper summary

A 671B-parameter MoE language model that activates 37B parameters per token, utilizing MLA and DeepSeekMoE architectures for efficient operation; it introduces an auxiliary-loss-free load balancing approach and employs multi-token prediction during training to enhance performance; following pre-training on 14.8 trillion tokens, the model underwent SFT and RL stages, achieving performance comparable to leading closed-source models while surpassing other open-source alternatives; the model requires only 2.788M H800 GPU hours for training, with stable training that avoids any irrecoverable loss spikes.
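To put these figures in proportion, here is a quick back-of-the-envelope sketch. It uses only the numbers quoted in the summary; the tokens-per-GPU-hour value is a rough estimate under my own assumption that all quoted GPU hours are divided over the pre-training tokens, so treat it as an order of magnitude rather than a reported result.

```python
# Back-of-the-envelope arithmetic using only the figures quoted in the summary.
TOTAL_PARAMS = 671e9        # total parameters
ACTIVE_PARAMS = 37e9        # parameters activated per token
PRETRAIN_TOKENS = 14.8e12   # pre-training tokens
GPU_HOURS = 2.788e6         # H800 GPU hours quoted for training


def active_fraction(active: float, total: float) -> float:
    """Share of the parameters that is used for any single token."""
    return active / total


def tokens_per_gpu_hour(tokens: float, gpu_hours: float) -> float:
    """Rough throughput estimate (assumption: every GPU hour went to these tokens)."""
    return tokens / gpu_hours


if __name__ == "__main__":
    print(f"active share: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
    print(f"tokens per GPU hour: {tokens_per_gpu_hour(PRETRAIN_TOKENS, GPU_HOURS):,.0f}")
```

In other words, only a bit more than one parameter in twenty takes part in processing any given token, which is where the efficiency of the MoE design comes from.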

Paper link

Read more

https://x.com/deepseek_ai/status/1872242657348710721

Large Concept Models

Paper summary

Presents an approach that operates on sentence-level semantic representations called concepts, moving beyond the token-level processing typical of current LLMs; the model leverages SONAR sentence embeddings to support 200 languages across text and speech modalities, and is trained on autoregressive sentence prediction using various approaches, from MSE regression to diffusion-based generation; experiments with 1.6B and 7B parameter variants, trained on 1.3T and 7.7T tokens respectively, demonstrate strong performance on generative tasks like summarization and summary expansion.
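The MSE-regression variant of autoregressive sentence prediction can be pictured with a toy sketch. Everything here is an assumption for illustration: the 2-dimensional vectors stand in for SONAR embeddings, and `mean_context_predictor` is a hypothetical baseline, not the paper's architecture. The point is only the shape of the objective: predict the next sentence embedding from the preceding ones and score it with mean squared error.

```python
# Toy next-sentence-embedding regression: predict embedding t from embeddings 0..t-1.
# The "model" is a hypothetical baseline (mean of the context), not the paper's model.
from typing import List

Vector = List[float]


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("vectors must have the same dimension")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mean_context_predictor(context: List[Vector]) -> Vector:
    """Hypothetical stand-in model: average the preceding sentence embeddings."""
    dim = len(context[0])
    return [sum(vec[i] for vec in context) / len(context) for i in range(dim)]


def autoregressive_mse(sentences: List[Vector]) -> float:
    """Average MSE over all positions, predicting each sentence from its prefix."""
    losses = [
        mse(mean_context_predictor(sentences[:t]), sentences[t])
        for t in range(1, len(sentences))
    ]
    return sum(losses) / len(losses)


# Three made-up 2-d "sentence embeddings" standing in for real SONAR vectors.
doc = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]
loss = autoregressive_mse(doc)  # (2.0 + 4.5) / 2 = 3.25
```

A real model would replace the averaging baseline with a trained network and minimize this loss over a corpus.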

Paper link

Read more

https://x.com/AIatMeta/status/1871263650935365759

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper summary

A new encoder-only transformer model that achieves state-of-the-art performance on classification and retrieval tasks while being more efficient than previous encoders; it was trained on 2T tokens with 8192 sequence length and incorporates modern optimizations that represent a significant improvement over BERT; the model is specifically designed for practical deployment, offering superior speed and memory efficiency on common GPUs.

Abstract

Encoder-only transformer models such as BERT offer a great performance-size tradeoff for retrieval and classification tasks with respect to larger decoder-only models. Despite being the workhorse of numerous production pipelines, there have been limited Pareto improvements to BERT since its release. In this paper, we introduce ModernBERT, bringing modern model optimizations to encoder-only models and representing a major Pareto improvement over older encoders. Trained on 2 trillion tokens with a native 8192 sequence length, ModernBERT models exhibit state-of-the-art results on a large pool of evaluations encompassing diverse classification tasks and both single and multi-vector retrieval on different domains (including code). In addition to strong downstream performance, ModernBERT is also the most speed and memory efficient encoder and is designed for inference on common GPUs.
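Since the abstract leans on the idea of a "Pareto improvement", a small sketch of what that means may help. A model is a Pareto improvement over another if it is no worse on every axis and strictly better on at least one. The model names and numbers below are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
# Pareto dominance check for (quality, cost) pairs.
from typing import List, NamedTuple


class ModelPoint(NamedTuple):
    name: str
    score: float       # downstream quality, higher is better
    latency_ms: float  # inference cost, lower is better


def dominates(a: ModelPoint, b: ModelPoint) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.score >= b.score and a.latency_ms <= b.latency_ms
    strictly_better = a.score > b.score or a.latency_ms < b.latency_ms
    return no_worse and strictly_better


def pareto_front(points: List[ModelPoint]) -> List[ModelPoint]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical numbers for illustration only.
candidates = [
    ModelPoint("old-encoder", 80.0, 12.0),
    ModelPoint("new-encoder", 84.0, 9.0),
    ModelPoint("big-decoder", 86.0, 40.0),
]
front = [p.name for p in pareto_front(candidates)]
```

In this toy setup the newer encoder dominates the older one, while the large decoder stays on the front because it trades much higher cost for slightly higher quality.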

Paper link

Read more

https://x.com/jeremyphoward/status/1869786023963832509

Automating the Search for Artificial Life with Foundation Models

Paper summary

Presents a new approach that uses foundation models to automatically discover interesting artificial life simulations across multiple platforms like Boids, Lenia, and Game of Life; the system can find simulations that produce specific target behaviors, discover simulations that generate temporally open-ended novelty, and map out diverse simulation spaces; it discovers new lifeforms in Lenia and Boids, while also enabling quantitative measurement of previously qualitative phenomena in a human-aligned way.

Abstract

With the recent Nobel Prize awarded for radical advances in protein discovery, foundation models (FMs) for exploring large combinatorial spaces promise to revolutionize many scientific fields. Artificial Life (ALife) has not yet integrated FMs, thus presenting a major opportunity for the field to alleviate the historical burden of relying chiefly on manual design and trial-and-error to discover the configurations of lifelike simulations. This paper presents, for the first time, a successful realization of this opportunity using vision-language FMs. The proposed approach, called Automated Search for Artificial Life (ASAL), (1) finds simulations that produce target phenomena, (2) discovers simulations that generate temporally open-ended novelty, and (3) illuminates an entire space of interestingly diverse simulations. Because of the generality of FMs, ASAL works effectively across a diverse range of ALife substrates including Boids, Particle Life, Game of Life, Lenia, and Neural Cellular Automata. A major result highlighting the potential of this technique is the discovery of previously unseen Lenia and Boids lifeforms, as well as cellular automata that are open-ended like Conway's Game of Life. Additionally, the use of FMs allows for the quantification of previously qualitative phenomena in a human-aligned way. This new paradigm promises to accelerate ALife research beyond what is possible through human ingenuity alone.
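One way to picture the three search modes is as three scores computed over embeddings. The sketch below assumes that a vision-language model maps rendered simulation frames and text prompts into a shared vector space; that assumption, the cosine-based scores, and all function names are mine for illustration, and the hand-written vectors stand in for real embeddings. It is not the paper's implementation.

```python
# Three illustrative scores over embeddings: target match, novelty, diversity.
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def target_score(frame_emb: Vector, prompt_emb: Vector) -> float:
    """(1) Target search: how closely a frame matches a prompt embedding."""
    return cosine(frame_emb, prompt_emb)


def novelty_score(history: List[Vector], new_emb: Vector) -> float:
    """(2) Open-endedness: distance of a new frame from its nearest past frame."""
    return 1.0 - max(cosine(past, new_emb) for past in history)


def diversity_score(embs: List[Vector]) -> float:
    """(3) Illumination: mean nearest-neighbour distance across a set of simulations."""
    gaps = []
    for i, emb in enumerate(embs):
        others = [e for j, e in enumerate(embs) if j != i]
        gaps.append(1.0 - max(cosine(emb, other) for other in others))
    return sum(gaps) / len(gaps)


# Toy embeddings: two orthogonal directions.
frames = [[1.0, 0.0], [0.0, 1.0]]
match = target_score(frames[0], [1.0, 0.0])
novelty = novelty_score([frames[0]], frames[1])
spread = diversity_score(frames)
```

A search procedure would then adjust simulation parameters to maximize whichever score matches the goal at hand.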

Paper link

Read more

https://x.com/SakanaAILabs/status/1871385917342265592

A Survey on LLM Inference-Time Self-Improvement

Paper summary

Presents a survey that analyzes three categories of LLM inference-time self-improvement techniques: independent methods like enhanced decoding, context-aware approaches using external data, and model collaboration strategies.

Abstract

Techniques that enhance inference through increased computation at test-time have recently gained attention. In this survey, we investigate the current state of LLM Inference-Time Self-Improvement from three different perspectives: Independent Self-improvement, focusing on enhancements via decoding or sampling methods; Context-Aware Self-Improvement, leveraging additional context or datastore; and Model-Aided Self-Improvement, achieving improvement through model collaboration. We provide a comprehensive review of recent relevant studies, contribute an in-depth taxonomy, and discuss challenges and limitations, offering insights for future research.
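As a concrete picture of the "independent" category, here is a minimal sketch of one sampling-based technique: drawing several answers and keeping the most frequent one (often called self-consistency or majority voting). This particular technique is my choice of example and is not quoted from the survey; `sample_answer` is a hypothetical stand-in for a call to an LLM with sampling enabled.

```python
# Majority voting over sampled answers: spend more test-time compute, keep the mode.
from collections import Counter
from typing import Callable


def majority_vote(sample_answer: Callable[[], str], n_samples: int) -> str:
    """Draw n_samples answers and return the most frequent one."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    answers = [sample_answer() for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


# Deterministic stub in place of a real model, for illustration only.
_canned = iter(["42", "41", "42", "42", "40"])


def fake_sampler() -> str:
    """Hypothetical sampler that replays a fixed list of answers."""
    return next(_canned)


result = majority_vote(fake_sampler, 5)
```

The extra cost is linear in the number of samples, which is the kind of test-time compute tradeoff the survey is about.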

Paper link

Read more

https://x.com/omarsar0/status/1870129825282658752

Explore Theory-of-Mind

Paper summary

Introduces ExploreToM, a framework that uses A* search to generate diverse, complex theory-of-mind scenarios that reveal significant limitations in current LLMs' social intelligence capabilities; testing showed even advanced models like GPT-4 and Llama-3 perform poorly (as low as 5% accuracy) on these challenging scenarios, despite their strong performance on simpler benchmarks; fine-tuning on ExploreToM data improved performance on existing benchmarks by 27 points.

Paper link

Read more

https://x.com/AIatMeta/status/1869457933727416375

LearnLM

Paper summary

A new LearnLM model that can follow pedagogical instructions, allowing it to adapt its teaching approach based on specified educational needs rather than defaulting to simply presenting information; experimental results show that LearnLM is preferred over other leading models, outperforming GPT-4 by 31%, Claude 3.5 by 11%, and Gemini 1.5 Pro by 13%; this instruction-following approach avoids committing to a single pedagogical framework, instead enabling teachers and developers to specify their desired teaching behaviors while allowing for continuous improvement alongside other capabilities.

Paper link

Read more

https://x.com/Google/status/1869798188233699346

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper summary

Proposes a new learning-to-reason method called CoMCTS that enables multimodal language models to develop step-by-step reasoning capabilities by leveraging collective knowledge from multiple models; the approach was used to create Mulberry-260k, a dataset with explicit reasoning trees, which was then used to train the Mulberry model series; the method demonstrates strong performance on benchmarks, with the models showing improved reasoning and reflection capabilities.

Abstract

In this work, we aim to develop an MLLM that understands and solves questions by learning to create each intermediate step of the reasoning involved till the final answer. To this end, we propose Collective Monte Carlo Tree Search (CoMCTS), a new learning-to-reason method for MLLMs, which introduces the concept of collective learning into "tree search" for effective and efficient reasoning-path searching and learning. The core idea of CoMCTS is to leverage collective knowledge from multiple models to collaboratively conjecture, search and identify effective reasoning paths toward correct answers via four iterative operations including Expansion, Simulation and Error Positioning, Backpropagation, and Selection. Using CoMCTS, we construct Mulberry-260k, a multimodal dataset with a tree of rich, explicit and well-defined reasoning nodes for each question. With Mulberry-260k, we perform collective SFT to train our model, Mulberry, a series of MLLMs with o1-like step-by-step Reasoning and Reflection capabilities. Extensive experiments demonstrate the superiority of our proposed methods on various benchmarks. Code will be available on GitHub at HJYao00/Mulberry.
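The four operations named in the abstract can be sketched on a tiny tree. This is a generic MCTS-style skeleton under my own assumptions: the `Node` class, the UCB selection rule, and the hard-coded rewards are hypothetical placeholders, and the "models" are plain functions. It shows the control flow only and is not the authors' CoMCTS implementation.

```python
# Skeleton of expansion, scoring, backpropagation, and selection on a reasoning tree.
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    step: str
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0  # running mean of the rewards seen below this node


def expand(node: Node, models: List[Callable[[str], str]]) -> None:
    """Expansion: every model in the group proposes one candidate next step."""
    for propose in models:
        node.children.append(Node(step=propose(node.step), parent=node))


def backpropagate(node: Optional[Node], reward: float) -> None:
    """Backpropagation: update visit counts and mean values up to the root."""
    while node is not None:
        node.visits += 1
        node.value += (reward - node.value) / node.visits
        node = node.parent


def select(node: Node, c: float = 1.0) -> Node:
    """Selection: pick the child with the highest UCB score (assumed rule)."""
    def ucb(child: Node) -> float:
        if child.visits == 0:
            return math.inf
        return child.value + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children, key=ucb)


# Two hypothetical "models" and a stub evaluator standing in for
# Simulation and Error Positioning.
models = [lambda s: s + " -> A", lambda s: s + " -> B"]
root = Node("question")
expand(root, models)
for child in root.children:
    backpropagate(child, 1.0 if child.step.endswith("A") else 0.0)
best = select(root)
```

In a full system the stub evaluator would be replaced by the models jointly judging candidate steps, and the loop would repeat until a correct reasoning path is found.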

Paper link

Read more

https://x.com/_akhaliq/status/1872326647606841651

Reinforcement Learning: An Overview

Paper summary

Presents a comprehensive overview of reinforcement learning.

Abstract

This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement learning and sequential decision making, covering value-based RL, policy-gradient methods, model-based methods, and various other topics (including a very brief discussion of RL+LLMs).
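For readers new to the value-based family mentioned here, a minimal sketch of the tabular Q-learning update is a useful reference point. Q-learning is my choice of example and is not quoted from the manuscript; the state and action names and the hyperparameters are made up.

```python
# One tabular Q-learning update:
# Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_learning_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    next_actions: Iterable[Hashable],
    alpha: float = 0.5,
    gamma: float = 0.9,
) -> float:
    """Apply one temporal-difference update in place and return the new Q-value."""
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Made-up transition: from s0, taking "right" gives reward 1.0 and lands in s1.
q: QTable = defaultdict(float)
q[("s1", "right")] = 2.0
new_value = q_learning_update(q, "s0", "right", 1.0, "s1", ["left", "right"])
# td_target = 1.0 + 0.9 * 2.0 = 2.8, so the new value is 0.5 * 2.8 = 1.4
```

Policy-gradient and model-based methods, the other families named above, replace this table update with gradient steps on a policy or with planning in a learned model.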

Paper link

Read more

https://x.com/omarsar0/status/1866123264965419460

DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought

Paper summary

Applies long chain-of-thought reasoning to machine translation, particularly for handling metaphors and similes across different cultures; the system uses a multi-agent framework with a translator working iteratively with an advisor and evaluator to produce better translations; testing with Qwen2.5 models showed significant improvements in BLEU and CometScore metrics, with DRT-o1-7B outperforming larger models like QwQ-32B-Preview.

Abstract

Recently, O1-like models have emerged as representative examples, illustrating the effectiveness of long chain-of-thought (CoT) in reasoning tasks such as math and coding tasks. In this paper, we introduce DRT-o1, an attempt to bring the success of long CoT to neural machine translation (MT). Specifically, in view of the literature books that might involve similes and metaphors, translating these texts to a target language is very difficult in practice due to cultural differences. In such cases, literal translation often fails to convey the intended meaning effectively. Even for professional human translators, considerable thought must be given to preserving semantics throughout the translation process. To simulate LLMs' long thought ability in MT, we first mine sentences containing similes or metaphors from existing literature books, and then develop a multi-agent framework to translate these sentences via long thought. In the multi-agent framework, a translator is used to iteratively translate the source sentence under the suggestions provided by an advisor. To ensure the effectiveness of the long thoughts, an evaluator is also employed to judge whether the translation in the current round is better than the previous one or not. In this manner, we collect tens of thousands of long-thought MT data, which is used to train our DRT-o1. The experimental results on literature translation demonstrate the effectiveness of the DRT-o1. Using Qwen2.5-7B and Qwen2.5-14B as the backbones, the improvement brought by DRT-o1 achieves 7.33~8.26 BLEU and 1.66~3.36 CometScore. Besides, DRT-o1-7B can outperform QwQ-32B-Preview by 7.82 BLEU and 1.46 CometScore, showing its effectiveness. The project is available on GitHub at krystalan/DRT-o1.
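The translator, advisor, and evaluator loop described in the abstract can be sketched as follows. The function signatures, the numeric score, and the stopping rule (stop as soon as a round is judged no better than the previous best) are my assumptions; the three agents are replaced by deterministic stubs, whereas in the paper they are LLM-driven.

```python
# Iterative refinement loop with three roles: translator, advisor, evaluator.
from typing import Callable, List, Optional, Tuple


def refine_translation(
    source: str,
    translate: Callable[[str, Optional[str]], str],
    advise: Callable[[str, str], str],
    evaluate: Callable[[str, str], float],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Return the best translation and the full history of drafts."""
    best = translate(source, None)          # first draft, no suggestion yet
    best_score = evaluate(source, best)
    history = [best]
    for _ in range(max_rounds):
        suggestion = advise(source, best)   # advisor critiques the current best
        candidate = translate(source, suggestion)
        score = evaluate(source, candidate)
        history.append(candidate)
        if score <= best_score:             # evaluator: no improvement, so stop
            break
        best, best_score = candidate, score
    return best, history


# Deterministic stubs in place of LLM agents, for illustration only.
def stub_translate(source: str, suggestion: Optional[str]) -> str:
    return "literal draft" if suggestion is None else "idiomatic draft"


def stub_advise(source: str, translation: str) -> str:
    return "use idiom" if translation == "literal draft" else "polish"


def stub_evaluate(source: str, translation: str) -> float:
    return {"literal draft": 0.4, "idiomatic draft": 0.8}[translation]


best, history = refine_translation(
    "a simile-laden sentence", stub_translate, stub_advise, stub_evaluate
)
```

The recorded history of drafts is the kind of trace that could be turned into long-thought training data, which is how the abstract describes the collected corpus being used.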

Paper link

Read more

https://x.com/_akhaliq/status/1871455986189574320

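Original post

This post was compiled with a GPT model and may contain errors, so please also refer to the original post linked at the bottom. If you notice anything awkward or incorrect while reading, please let us know in the comments.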

