[2024/06/24 ~ 06/30] Top ML Papers of the Week

9bow · July 1, 2024, 2:32 AM

PyTorchKR

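Looking at the papers selected this week, the notable trend is that many of them focus on using and improving large language models (LLMs). For example, "LLM Compiler", "Enhancing RAG with Long-Context LLMs", "Improving Retrieval in LLMs through Synthetic Data", "Faster LLM Inference with Dynamic Draft Trees", and "On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation" all deal with improving language model performance, building efficient question-answering systems, and improving data retrieval and generation. This trend reflects the continuing growth in the importance of large language models within natural language processing (NLP).
In addition, the techniques and approaches for improving and using LLMs are diversifying. For example, "Following Length Constraints in Instructions" and "GraphReader" propose new methods that let language models process and deliver information in more specific and efficient ways. This suggests that language models are moving beyond simple text generation toward more complex tasks, and can therefore give users more accurate and useful information.
Taken together, these trends mean that the range of applications for language models keeps widening as AI technology advances. In particular, progress in LLMs enables innovation in natural language understanding, question answering, document summarization, and data generation and processing, and it is expected to remain an important research topic. This work will play an important role in continually improving language model performance and in developing AI systems that can solve more diverse and complex problems.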

ESM3: Simulating 500 million years of evolution with a language model

Paper Introduction

A new LLM-based biological model that generates a new green fluorescent protein called esmGFP; it builds on a bidirectional transformer, uses a masked language modeling objective, leverages geometric attention to represent atomic coordinates, and applies chain-of-thought prompting to generate fluorescent proteins; the authors estimate that esmGFP represents the equivalent of over 500 million years of natural evolution performed by an evolutionary simulator.

Abstract

More than three billion years of evolution have produced an image of biology encoded into the space of natural proteins. Here we show that language models trained on tokens generated by evolution can act as evolutionary simulators to generate functional proteins that are far away from known proteins. We present ESM3, a frontier multimodal generative language model that reasons over the sequence, structure, and function of proteins. ESM3 can follow complex prompts combining its modalities and is highly responsive to biological alignment. We have prompted ESM3 to generate fluorescent proteins with a chain of thought. Among the generations that we synthesized, we found a bright fluorescent protein at far distance (58% identity) from known fluorescent proteins. Similarly distant natural fluorescent proteins are separated by over five hundred million years of evolution.

Paper Link

Read More

https://x.com/alexrives/status/1805559211394277697

Gemma 2

Paper Introduction

Presents a family of open models ranging from 2B to 27B parameters; demonstrates strong capabilities in reasoning, math, and code generation, outperforming models twice their size.

Abstract

In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. The 9 billion and 27 billion parameter models are available today, with a 2 billion parameter model to be released shortly. In this new version, we provide several technical modifications to our architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3× bigger. We release all our models to the community.
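To make the "knowledge distillation instead of next token prediction" point concrete, here is a minimal sketch in plain Python. It is not Gemma 2's training code: the four-token vocabulary and the logit values are made up, and the function names are illustrative. It only contrasts a one-hot next-token loss with a loss computed against the teacher's full token distribution, which is the general idea of Hinton et al. (2015).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_loss(student_logits, target_index):
    """Standard next-token prediction: cross-entropy against a one-hot target."""
    probs = softmax(student_logits)
    return -math.log(probs[target_index])


def distillation_loss(student_logits, teacher_logits):
    """Cross-entropy of the student against the teacher's full distribution."""
    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))


# Toy vocabulary of 4 tokens; all numbers are assumptions for illustration.
teacher = [2.0, 1.0, 0.1, -1.0]
student = [1.5, 1.2, 0.0, -0.5]

kd = distillation_loss(student, teacher)        # soft-target loss
ntp = next_token_loss(student, target_index=0)  # one-hot loss
```

The soft-target loss is smallest when the student reproduces the teacher's distribution exactly, so the student receives a training signal on every vocabulary entry rather than only on the single observed token.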

Paper Link

Read More

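Google releases the Gemma 2 model family (9B & 27B) · Readings & Info Sharing

Google releases the Gemma-2 (9B & 27B) models. Introducing the Gemma 2 model family: the Gemma 2 series newly released by Google (2024/06/27) is a next-generation open model for researchers and developers, offered in 9B and 27B parameter sizes. It delivers higher performance and efficiency than the first generation and can run on a single NVIDIA H100 Tensor Core GPU or TPU host, which greatly reduces deployment costs. [Gemma 2 performance comparison] Gemma 2 is based on an architecture redesigned for strong performance and inference efficiency, and has the following characteristics: Outstanding performance: the Gemma 2 27B model delivers the best performance for its size, and the 9B model also offers best-in-class performance. Efficiency and cost savings: Gemma 2, on a single Google Cloud TPU host or NVIDIA A100/…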

https://x.com/omarsar0/status/1806352449956958501

LLM Compiler

Paper Introduction

A suite of open pre-trained models (7B and 13B parameters) designed for code optimization tasks; it is built on top of Code Llama and trained on a corpus of 546 billion tokens of LLVM-IR and assembly code; it is also instruction fine-tuned to interpret compiler behavior; the fine-tuned versions achieve 77% of the optimizing potential of an autotuning search and a 45% disassembly round trip (14% exact match).

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities across a variety of software engineering and coding tasks. However, their application in the domain of code and compiler optimization remains underexplored. Training LLMs is resource-intensive, requiring substantial GPU hours and extensive data collection, which can be prohibitive. To address this gap, we introduce Meta Large Language Model Compiler (LLM Compiler), a suite of robust, openly available, pre-trained models specifically designed for code optimization tasks. Built on the foundation of Code Llama, LLM Compiler enhances the understanding of compiler intermediate representations (IRs), assembly language, and optimization techniques. The model has been trained on a vast corpus of 546 billion tokens of LLVM-IR and assembly code and has undergone instruction fine-tuning to interpret compiler behavior. LLM Compiler is released under a bespoke commercial license to allow wide reuse and is available in two sizes: 7 billion and 13 billion parameters. We also present fine-tuned versions of the model, demonstrating its enhanced capabilities in optimizing code size and disassembling from x86_64 and ARM assembly back into LLVM-IR. These achieve 77% of the optimising potential of an autotuning search, and 45% disassembly round trip (14% exact match). This release aims to provide a scalable, cost-effective foundation for further research and development in compiler optimization by both academic researchers and industry practitioners.

Paper Link

Read More

https://x.com/AIatMeta/status/1806361623831171318

LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs

Paper Introduction

Proposes LongRAG, which combines RAG with long-context LLMs to enhance performance; uses a long retriever that operates on longer retrieval units and so significantly reduces the number of units to search; the long reader takes in the long retrieval units and leverages the zero-shot answer extraction capability of long-context LLMs to improve the performance of the overall system; claims to achieve 64.3% on HotpotQA (full-wiki), which is on par with the state-of-the-art model.

Abstract

In the traditional RAG framework, the basic retrieval units are normally short. The common retrievers like DPR normally work with 100-word Wikipedia paragraphs. Such a design forces the retriever to search over a large corpus to find the needle unit. In contrast, the readers only need to extract answers from the short retrieved units. Such an imbalanced heavy retriever and light reader design can lead to sub-optimal performance. In order to alleviate the imbalance, we propose a new framework LongRAG, consisting of a long retriever and a long reader. LongRAG processes the entire Wikipedia into 4K-token units, which is 30x longer than before. By increasing the unit size, we significantly reduce the total units from 22M to 700K. This significantly lowers the burden of the retriever, which leads to a remarkable retrieval score: answer recall@1=71% on NQ (previously 52%) and answer recall@2=72% (previously 47%) on HotpotQA (full-wiki). Then we feed the top-k retrieved units (≈ 30K tokens) to an existing long-context LLM to perform zero-shot answer extraction. Without requiring any training, LongRAG achieves an EM of 62.7% on NQ, which is the best known result. LongRAG also achieves 64.3% on HotpotQA (full-wiki), which is on par with the SoTA model. Our study offers insights into the future roadmap for combining RAG with long-context LLMs.
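The unit-size idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: passages are packed greedily in their given order, token counts are approximated by whitespace splitting, and `pack_into_long_units` is a hypothetical helper, not the paper's grouping procedure. The last line checks the abstract's own numbers: going from 22M to 700K units is roughly a 31x reduction, in line with the stated "30x longer" units.

```python
def pack_into_long_units(passages, max_tokens=4000):
    """Greedily pack consecutive passages into units of at most max_tokens.

    Token counts are approximated by whitespace splitting (an assumption).
    A single passage longer than max_tokens becomes its own unit.
    """
    units, current, current_len = [], [], 0
    for passage in passages:
        n = len(passage.split())
        # Flush the current unit if adding this passage would exceed the budget.
        if current and current_len + n > max_tokens:
            units.append(" ".join(current))
            current, current_len = [], 0
        current.append(passage)
        current_len += n
    if current:
        units.append(" ".join(current))
    return units


# 60 toy passages of 100 "tokens" each; a 4000-token budget holds 40 of them.
passages = [" ".join(["w"] * 100) for _ in range(60)]
units = pack_into_long_units(passages)

# Reduction factor implied by the abstract's unit counts (22M -> 700K).
reduction = 22_000_000 / 700_000
```

With fewer, longer units the retriever has a far smaller candidate set to rank, and the work of locating the exact answer span moves to the long-context reader.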

Paper Link

Read More

https://x.com/omarsar0/status/1805230323799560199

From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data

Paper Introduction

Proposes a fine-tuning approach to improve the accuracy of retrieving information in LLMs while maintaining reasoning capabilities over long-context inputs; the fine-tuning dataset comprises numerical dictionary key-value retrieval tasks (350 samples); finds that this approach mitigates the "lost-in-the-middle" phenomenon and improves performance on both information retrieval and long-context reasoning.

Abstract

Recent studies have shown that Large Language Models (LLMs) struggle to accurately retrieve information and maintain reasoning capabilities when processing long-context inputs. To address these limitations, we propose a finetuning approach utilizing a carefully designed synthetic dataset comprising numerical key-value retrieval tasks. Our experiments on models like GPT-3.5 Turbo and Mistral 7B demonstrate that finetuning LLMs on this dataset significantly improves LLMs' information retrieval and reasoning capabilities in longer-context settings. We present an analysis of the finetuned models, illustrating the transfer of skills from synthetic to real task evaluations (e.g., 10.5% improvement on 20 documents MDQA at position 10 for GPT-3.5 Turbo). We also find that finetuned LLMs' performance on general benchmarks remains almost constant while LLMs finetuned on other baseline long-context augmentation data can encourage hallucination (e.g., on TriviaQA, Mistral 7B finetuned on our synthetic data causes no performance drop while other baseline data can cause a drop that ranges from 2.33% to 6.19%). Our study highlights the potential of finetuning on synthetic data for improving the performance of LLMs on longer-context tasks.
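A numerical key-value retrieval sample of the kind described above could be generated roughly as follows. The prompt wording, the key and value ranges, and the helper name `make_kv_retrieval_sample` are assumptions for illustration; the paper's exact template is not reproduced here. Only the count of 350 samples comes from the summary above.

```python
import random


def make_kv_retrieval_sample(num_pairs=50, seed=0):
    """Build one synthetic sample: a dictionary of integers and a lookup question."""
    rng = random.Random(seed)
    keys = rng.sample(range(10_000, 100_000), num_pairs)  # unique integer keys
    store = {k: rng.randrange(10_000, 100_000) for k in keys}
    target_key = rng.choice(keys)
    prompt = (
        "Below is a dictionary of integer keys and values.\n"
        f"{store}\n"
        f"What is the value for key {target_key}?"
    )
    return {"prompt": prompt, "answer": str(store[target_key])}


sample = make_kv_retrieval_sample()

# A dataset of 350 samples, matching the size mentioned in the summary.
dataset = [make_kv_retrieval_sample(seed=i) for i in range(350)]
```

Because the data contains only random integers, it carries no factual content for the model to memorize, which is consistent with the abstract's observation that general benchmark performance stays almost constant after fine-tuning.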

Paper Link

Read More

https://x.com/omarsar0/status/1806738385039692033

GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models

Paper Introduction

Proposes a graph-based agent system to enhance the long-context abilities of LLMs; it structures long text into a graph and employs an agent to explore the graph (using predefined functions guided by a step-by-step rational plan) to effectively generate answers for questions; consistently outperforms GPT-4-128k across context lengths from 16k to 256k.

Abstract

Long-context capabilities are essential for large language models (LLMs) to tackle complex and long-input tasks. Despite numerous efforts made to optimize LLMs for long contexts, challenges persist in robustly processing long inputs. In this paper, we introduce GraphReader, a graph-based agent system designed to handle long texts by structuring them into a graph and employing an agent to explore this graph autonomously. Upon receiving a question, the agent first undertakes a step-by-step analysis and devises a rational plan. It then invokes a set of predefined functions to read node content and neighbors, facilitating a coarse-to-fine exploration of the graph. Throughout the exploration, the agent continuously records new insights and reflects on current circumstances to optimize the process until it has gathered sufficient information to generate an answer. Experimental results on the LV-Eval dataset reveal that GraphReader, using a 4k context window, consistently outperforms GPT-4-128k across context lengths from 16k to 256k by a large margin. Additionally, our approach demonstrates superior performance on four challenging single-hop and multi-hop benchmarks.
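The exploration loop described in the abstract can be pictured with a toy sketch. Everything here is an assumption made for illustration: the three-node graph, the function names `read_node` and `read_neighbors`, and above all the replacement of the LLM's planning and reflection with a plain breadth-first walk and a caller-supplied `is_sufficient` predicate. It shows only the shape of the loop: read a node, take notes, stop once the notes suffice, otherwise move on to neighbors.

```python
from collections import deque

# Toy graph: node -> (text attached to the node, neighbor nodes). Contents are made up.
GRAPH = {
    "Ada Lovelace": ("Ada Lovelace wrote notes on the Analytical Engine.",
                     ["Analytical Engine"]),
    "Analytical Engine": ("The Analytical Engine was designed by Charles Babbage.",
                          ["Charles Babbage", "Ada Lovelace"]),
    "Charles Babbage": ("Charles Babbage was born in London in 1791.",
                        ["Analytical Engine"]),
}


def read_node(node):
    """Predefined function: return the text attached to a node."""
    return GRAPH[node][0]


def read_neighbors(node):
    """Predefined function: return the names of adjacent nodes."""
    return GRAPH[node][1]


def explore(start, is_sufficient, max_steps=10):
    """Walk the graph from start, recording notes until is_sufficient(notes) holds."""
    notes, visited, queue = [], set(), deque([start])
    while queue and len(visited) < max_steps:
        node = queue.popleft()
        if node in visited:
            continue
        visited.add(node)
        notes.append(read_node(node))
        if is_sufficient(notes):  # stand-in for the agent's reflection step
            break
        queue.extend(n for n in read_neighbors(node) if n not in visited)
    return notes


# Stand-in question: "When was the designer of the engine Ada wrote about born?"
notes = explore("Ada Lovelace", lambda ns: any("born" in n for n in ns))
```

Each step reads only one node's text, which is how a system of this kind can work within a small context window while the underlying document is much longer.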

Paper Link

Read More

https://x.com/omarsar0/status/1806802925517218078

EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees

Paper Introduction

Presents a context-aware dynamic draft tree to increase the speed of inference; the previous speculative sampling method used a static draft tree for sampling, which depended only on position and lacked context awareness; achieves speedup ratios ranging from 3.05x to 4.26x, which is 20%-40% faster than previous work; these speedup ratios occur because the new method significantly increases the number of accepted draft tokens.

Abstract

Inference with modern Large Language Models (LLMs) is expensive and time-consuming, and speculative sampling has proven to be an effective solution. Most speculative sampling methods such as EAGLE use a static draft tree, implicitly assuming that the acceptance rate of draft tokens depends only on their position. Interestingly, we found that the acceptance rate of draft tokens is also context-dependent. In this paper, building upon EAGLE, we propose EAGLE-2, which introduces a new technique of context-aware dynamic draft tree into drafting modeling. This improvement leverages the fact that the draft model of EAGLE is well-calibrated: the confidence scores from the draft model approximate acceptance rates with small errors. We conducted extensive evaluations on three series of LLMs and six tasks, with EAGLE-2 achieving speedup ratios 3.05x-4.26x, which is 20%-40% faster than EAGLE-1. EAGLE-2 also ensures that the distribution of the generated text remains unchanged, making it a lossless acceleration algorithm.
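As a rough illustration of what a context-aware dynamic draft tree could look like, the sketch below grows a tree by expanding only the branches whose cumulative draft confidence is highest, and then keeps the best-scoring nodes overall. It is a simplified reading of the abstract, not the EAGLE-2 implementation: `toy_draft_model` is a hypothetical lookup table standing in for a real draft model, the parameter names are invented, and the verification step by the target model is omitted.

```python
import heapq


def toy_draft_model(context):
    """Hypothetical draft model: maps a context tuple to (token, confidence) pairs."""
    table = {
        (): [("the", 0.9), ("a", 0.1)],
        ("the",): [("cat", 0.6), ("dog", 0.4)],
        ("a",): [("cat", 0.5), ("dog", 0.5)],
    }
    return table.get(context, [("<eos>", 1.0)])


def build_dynamic_draft_tree(draft_model, depth=2, top_k=1, total_tokens=3):
    """Expand the top_k most promising nodes per layer, then keep the best total_tokens."""
    frontier = [((), 1.0)]  # (token path, cumulative confidence along the path)
    all_nodes = []
    for _ in range(depth):
        children = []
        for path, value in frontier:
            for token, conf in draft_model(path):
                # Cumulative confidence approximates the chance the whole path is accepted.
                children.append((path + (token,), value * conf))
        all_nodes.extend(children)
        # The tree shape depends on the context: only confident branches are grown.
        frontier = heapq.nlargest(top_k, children, key=lambda n: n[1])
    return heapq.nlargest(total_tokens, all_nodes, key=lambda n: n[1])


tree = build_dynamic_draft_tree(toy_draft_model)
paths = [path for path, _ in tree]
```

A static tree would spend the same budget on the low-confidence "a" branch regardless of context; here the budget goes to the continuations of "the", which is the intuition behind the higher number of accepted draft tokens.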

Paper Link

Read More

https://x.com/omarsar0/status/1805629496634294760

Following Length Constraints in Instructions

Paper Introduction

Presents an approach for dealing with length bias and training instruction-following language models that better follow length constraint instructions; fine-tunes a model using DPO with a length-instruction-augmented dataset and shows fewer length constraint violations while keeping a high response quality.

Abstract

Aligned instruction following models can better fulfill user requests than their unaligned counterparts. However, it has been shown that there is a length bias in evaluation of such models, and that training algorithms tend to exploit this bias by learning longer responses. In this work we show how to train models that can be controlled at inference time with instructions containing desired length constraints. Such models are superior in length instructed evaluations, outperforming standard instruction following models such as GPT4, Llama 3 and Mixtral.
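One way to picture a length-instruction-augmented preference dataset is sketched below. The prompt wording, the word-count proxy for length, and the helper names are assumptions for illustration, not the paper's exact recipe. The idea shown is that once a length limit is attached to the prompt, a response that exceeds it becomes the rejected side of a DPO-style pair.

```python
def add_length_instruction(instruction, max_words):
    """Prepend a length constraint to an instruction (the wording is an assumption)."""
    return (
        f"Answer the following instruction using {max_words} words or less.\n\n"
        f"{instruction}"
    )


def violates(response, max_words):
    """True if the response exceeds the limit, counting whitespace-separated words."""
    return len(response.split()) > max_words


def build_preference_pair(instruction, resp_a, resp_b, max_words):
    """Return (prompt, chosen, rejected), or None if the limit does not separate them."""
    a_bad, b_bad = violates(resp_a, max_words), violates(resp_b, max_words)
    if a_bad == b_bad:
        return None  # both satisfy or both violate: no length-based preference
    chosen, rejected = (resp_b, resp_a) if a_bad else (resp_a, resp_b)
    return add_length_instruction(instruction, max_words), chosen, rejected


short = "DPO optimizes preferences directly."
long = "DPO is a method that is " + "very " * 20 + "long."
pair = build_preference_pair("Explain DPO.", short, long, max_words=10)
no_pair = build_preference_pair("Explain DPO.", short, long, max_words=100)
```

The same `violates` check can double as an evaluation metric, counting how often a model's responses break the requested limit.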

Paper Link

Read More

https://x.com/jaseweston/status/1805771223747481690

On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey

Paper Introduction

A survey of LLM-based synthetic data generation, curation, and evaluation.

Abstract

Within the evolving landscape of deep learning, the dilemma of data quantity and quality has been a long-standing problem. The recent advent of Large Language Models (LLMs) offers a data-centric solution to alleviate the limitations of real-world data with synthetic data generation. However, current investigations into this field lack a unified framework and mostly stay on the surface. Therefore, this paper provides an organization of relevant studies based on a generic workflow of synthetic data generation. By doing so, we highlight the gaps within existing research and outline prospective avenues for future study. This work aims to shepherd the academic and industrial communities towards deeper, more methodical inquiries into the capabilities and applications of LLMs-driven synthetic data generation.

Paper Link

Read More

https://x.com/omarsar0/status/1805652404404207919

Adam-mini: Use Fewer Learning Rates To Gain More

Paper Introduction

A new optimizer that reduces memory footprint (45%-50% less) by using fewer learning rates and performs on par with or better than AdamW; it carefully partitions parameters into blocks and assigns each block a single high-quality learning rate; achieves consistent results on language models sized from 125M to 7B for pre-training, SFT, and RLHF.

Abstract

We propose Adam-mini, an optimizer that achieves on-par or better performance than AdamW with 45% to 50% less memory footprint. Adam-mini reduces memory by cutting down the learning rate resources in Adam (i.e., 1/√v). We find that ≥ 90% of these learning rates in v could be harmlessly removed if we (1) carefully partition the parameters into blocks following our proposed principle on Hessian structure; (2) assign a single but good learning rate to each parameter block. We further find that, for each of these parameter blocks, there exists a single high-quality learning rate that can outperform Adam, provided that sufficient resources are available to search it out. We then provide one cost-effective way to find good learning rates and propose Adam-mini. Empirically, we verify that Adam-mini performs on par or better than AdamW on various language models sized from 125M to 7B for pre-training, supervised fine-tuning, and RLHF. The reduced memory footprint of Adam-mini also alleviates communication overheads among GPUs and CPUs, thereby increasing throughput. For instance, Adam-mini achieves 49.6% higher throughput than AdamW when pre-training Llama2-7B on 2× A800-80GB GPUs, which saves 33% wall-clock time for pre-training.
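The "one learning rate per block" idea can be sketched as a single update step in plain Python. This is a simplified illustration under assumptions, not the released optimizer: the block partition below is arbitrary rather than derived from the Hessian structure, weight decay is omitted, and the function and variable names are made up. Per-parameter momentum `m` is kept, but each block shares one second-moment scalar `v` built from the mean of its squared gradients. The last line reproduces the abstract's arithmetic: 49.6% higher throughput implies about 33% less wall-clock time.

```python
import math


def adam_mini_step(params, grads, state, blocks,
                   lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One update step; `blocks` is a list of index lists partitioning the parameters.

    Keeps per-parameter momentum m but only ONE second-moment scalar v per block.
    """
    state["t"] += 1
    t = state["t"]
    for b, idx in enumerate(blocks):
        # A single scalar summarizes the squared gradients of the whole block.
        mean_sq = sum(grads[i] ** 2 for i in idx) / len(idx)
        state["v"][b] = beta2 * state["v"][b] + (1 - beta2) * mean_sq
        v_hat = state["v"][b] / (1 - beta2 ** t)  # bias correction
        for i in idx:
            state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * grads[i]
            m_hat = state["m"][i] / (1 - beta1 ** t)
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params


params = [1.0, 2.0, 3.0, 4.0]
grads = [0.1, 0.1, -0.2, -0.2]
blocks = [[0, 1], [2, 3]]  # hypothetical partition, not the Hessian-based one
state = {"t": 0, "m": [0.0] * len(params), "v": [0.0] * len(blocks)}
adam_mini_step(params, grads, state, blocks)

# Throughput is 1.496x, so time per unit of work is 1/1.496 of the baseline.
time_saved = 1 - 1 / 1.496
```

In this toy setup the second-moment state shrinks from four values to two; with realistic block sizes that state nearly disappears, which is the source of the memory saving the abstract reports.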

Paper Link

Read More

https://x.com/arankomatsuzaki/status/1805439246318125299

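Original Post

This post was compiled with a GPT model and may contain errors, so please also refer to the original post linked at the bottom. If you notice anything awkward or incorrect while reading, please let us know in the comments.

Did you find this post from the PyTorch Korea User Group useful? Sign up as a member and we will email you the key posts. (The default is weekly, but you can switch to daily.)

Pressing the like button below helps us keep publishing the news.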
