[2023/12/18 ~ 12/24] Top ML Papers of the Week

9bow · December 25, 2023, 4:41 AM

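This post was summarized automatically by a GPT model and may contain errors, so please refer to the original papers!
If you notice anything awkward or incorrect while reading, please let us know in the comments!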

Introduction

Most of this week's selected papers deal with natural language processing (NLP). As titles such as "Gemini's Language Abilities", "LLM in a Flash", "ReST Meets ReAct", "Adversarial Attacks on GPT-4", "RAG for LLM", and "Findings of the BabyLM Challenge" suggest, research is under way on a range of language models, on new techniques for them, and on issues such as security vulnerabilities. Work on recent large language models such as GPT-4 stands out in particular, and research on how to use or complement these models looks especially active.

Multimodal systems are another major trend, as titles such as "Multimodal Agents as Smartphone Users" and "VideoPoet" suggest. This can be read as part of an effort to broaden what AI can understand by using not only text but also other forms of data such as video and audio. Separately, "Discovery of a New Family of Antibiotics with Graph Deep Learning" uses graph deep learning to find a new class of antibiotics, which shows that AI applications in biotechnology are also gaining ground.

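These trends reflect the accelerating pace of AI development and researchers' attempts to find new uses for algorithms, or more refined modeling methods, across many industries. In NLP in particular, expectations are growing for large language models that understand and generate complex language the way humans do, and this is expected to bring innovation to human-machine interaction, information processing, and content generation. Meanwhile, advances in multimodal and graph deep learning should help AI perceive the world in ways closer to human senses and handle more complex data structures, so notable results can be expected across a variety of applications.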

An In-depth Look at Gemini's Language Abilities

Paper summary

Provides an impartial and reproducible study comparing several popular models such as Gemini, GPT, and Mixtral; Gemini Pro achieves comparable but slightly lower accuracy than the current version of GPT 3.5 Turbo; Gemini and GPT outperformed Mixtral.

Abstract

The recently released Google Gemini class of models are the first to comprehensively report results that rival the OpenAI GPT series across a wide variety of tasks. In this paper, we do an in-depth exploration of Gemini's language abilities, making two contributions. First, we provide a third-party, objective comparison of the abilities of the OpenAI GPT and Google Gemini models with reproducible code and fully transparent results. Second, we take a closer look at the results, identifying areas where one of the two model classes excels. We perform this analysis over 10 datasets testing a variety of language abilities, including reasoning, answering knowledge-based questions, solving math problems, translating between languages, generating code, and acting as instruction-following agents. From this analysis, we find that Gemini Pro achieves accuracy that is close but slightly inferior to the corresponding GPT 3.5 Turbo on all tasks that we benchmarked. We further provide explanations for some of this under-performance, including failures in mathematical reasoning with many digits, sensitivity to multiple-choice answer ordering, aggressive content filtering, and others. We also identify areas where Gemini demonstrates comparably high performance, including generation into non-English languages, and handling longer and more complex reasoning chains. Code and data for reproduction can be found at https://github.com/neulab/gemini-benchmark

Paper link

GitHub repository

https://github.com/neulab/gemini-benchmark

Further reading

https://x.com/gneubig/status/1737108966931673191

PowerInfer

Paper summary

A high-speed inference engine for deploying LLMs locally; exploits the high locality in LLM inference to design a GPU-CPU hybrid inference engine; hot-activated neurons are preloaded onto the GPU for fast access, while cold-activated neurons (the majority) are computed on the CPU; this approach significantly reduces GPU memory demands and CPU-GPU data transfer.
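The hot/cold split described above can be illustrated with a small sketch. This is not PowerInfer's implementation: the function name `split_hot_cold`, the `activation_log` format, and the fixed `gpu_budget` are assumptions made for illustration. It simply ranks neurons by how often they fired in a profiling trace and assigns the most frequent ones to the GPU.

```python
from collections import Counter


def split_hot_cold(activation_log, num_neurons, gpu_budget):
    """Return (hot, cold) neuron index lists.

    activation_log: iterable of sets, one per profiled token, holding the
    indices of neurons that fired (hypothetical trace format).
    gpu_budget: how many neurons fit on the GPU (hypothetical knob).
    """
    counts = Counter()
    for active in activation_log:
        counts.update(active)
    # Rank by firing frequency; break ties by index so the result is stable.
    ranked = sorted(range(num_neurons), key=lambda n: (-counts[n], n))
    hot = sorted(ranked[:gpu_budget])    # would be preloaded onto the GPU
    cold = sorted(ranked[gpu_budget:])   # would be computed on the CPU
    return hot, cold


activation_log = [{0, 3}, {0, 1}, {0, 3, 5}, {3}]
hot, cold = split_hot_cold(activation_log, num_neurons=6, gpu_budget=2)
print(hot, cold)
```

In this toy trace neurons 0 and 3 fire most often, so they land in the hot set while the remaining four stay on the CPU side.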

Paper link

GitHub repository

https://github.com/SJTU-IPADS/PowerInfer

Further reading

[GN] PowerInfer: serving LLMs quickly on a consumer-grade GPU

https://x.com/omarsar0/status/1737168751668187229

Discovery of a New Family of Antibiotics with Graph Deep Learning

Paper summary

Discovers a new structural class of antibiotics with explainable graph algorithms; the approach enables explainable, deep learning-guided discovery of structural classes of antibiotics and helps identify the chemical substructures that underlie antibiotic activity.

Abstract

The discovery of novel structural classes of antibiotics is urgently needed to address the ongoing antibiotic resistance crisis. Deep learning approaches have aided in exploring chemical spaces; these typically use black box models and do not provide chemical insights. Here we reasoned that the chemical substructures associated with antibiotic activity learned by neural network models can be identified and used to predict structural classes of antibiotics. We tested this hypothesis by developing an explainable, substructure-based approach for the efficient, deep learning-guided exploration of chemical spaces. We determined the antibiotic activities and human cell cytotoxicity profiles of 39,312 compounds and applied ensembles of graph neural networks to predict antibiotic activity and cytotoxicity for 12,076,365 compounds. Using explainable graph algorithms, we identified substructure-based rationales for compounds with high predicted antibiotic activity and low predicted cytotoxicity. We empirically tested 283 compounds and found that compounds exhibiting antibiotic activity against Staphylococcus aureus were enriched in putative structural classes arising from rationales. Of these structural classes of compounds, one is selective against methicillin-resistant S. aureus (MRSA) and vancomycin-resistant enterococci, evades substantial resistance, and reduces bacterial titres in mouse models of MRSA skin and systemic thigh infection.

Paper link

https://www.nature.com/articles/s41586-023-06887-8

Further reading

https://x.com/EricTopol/status/1737505177052348545

VideoPoet

Paper summary

Introduces a large language model for zero-shot video generation; it is capable of a variety of video generation tasks such as image-to-video and video stylization; trains an autoregressive model to learn across video, image, audio, and text modalities by using multiple tokenizers; shows that language models can synthesize and edit video with some degree of temporal consistency.

Paper link

Further reading

https://x.com/GoogleAI/status/1737235593078456389

AppAgent: Multimodal Agents as Smartphone Users

Paper summary

Introduces an LLM-based multimodal agent framework to operate smartphone applications; learns to navigate new apps through autonomous exploration or by observing human demonstrations; shows proficiency in handling diverse tasks across different applications such as email, social media, shopping, editing tools, and more.

Abstract

Recent advancements in large language models (LLMs) have led to the creation of intelligent agents capable of performing complex tasks. This paper introduces a novel LLM-based multimodal agent framework designed to operate smartphone applications. Our framework enables the agent to operate smartphone applications through a simplified action space, mimicking human-like interactions such as tapping and swiping. This novel approach bypasses the need for system back-end access, thereby broadening its applicability across diverse apps. Central to our agent's functionality is its innovative learning method. The agent learns to navigate and use new apps either through autonomous exploration or by observing human demonstrations. This process generates a knowledge base that the agent refers to for executing complex tasks across different applications. To demonstrate the practicality of our agent, we conducted extensive testing over 50 tasks in 10 different applications, including social media, email, maps, shopping, and sophisticated image editing tools. The results affirm our agent's proficiency in handling a diverse array of high-level tasks.

Paper link

Further reading

AppAgent, released by Tencent: a model that operates like a smartphone user [Reading & Information Sharing]

This post summarizes the AppAgent model introduced in PyTorchKR's Top ML Papers of the Week for 12/18~24. As large multimodal models (LMMs) continue to be researched and released, more "agents" are likely to appear. Introduction: AppAgent is a new multimodal agent framework that uses the capabilities of large language models (LLMs) to operate smartphone applications much as a human would. AppAgent learns how to use new smartphone apps through autonomous exploration or by observing human demonstrations, and through this builds knowledge it can refer to when executing tasks across applications (KB; Knowledg…

LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Paper summary

Proposes an approach that efficiently runs LLMs that exceed the available DRAM capacity by storing the model parameters on flash memory and bringing them into DRAM on demand; enables running models up to twice the size of the available DRAM, with a 4-5x and 20-25x increase in inference speed compared to naive loading approaches on CPU and GPU, respectively.

Abstract

Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks. However, their intensive computational and memory requirements present challenges, especially for devices with limited DRAM capacity. This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity by storing the model parameters on flash memory but bringing them on demand to DRAM. Our method involves constructing an inference cost model that harmonizes with the flash memory behavior, guiding us to optimize in two critical areas: reducing the volume of data transferred from flash and reading data in larger, more contiguous chunks. Within this flash memory-informed framework, we introduce two principal techniques. First, "windowing" strategically reduces data transfer by reusing previously activated neurons, and second, "row-column bundling", tailored to the sequential data access strengths of flash memory, increases the size of data chunks read from flash memory. These methods collectively enable running models up to twice the size of the available DRAM, with a 4-5x and 20-25x increase in inference speed compared to naive loading approaches in CPU and GPU, respectively. Our integration of sparsity awareness, context-adaptive loading, and a hardware-oriented design paves the way for effective inference of LLMs on devices with limited memory.
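To make the "windowing" idea concrete, here is a minimal sketch under simplifying assumptions; it is not the paper's code. It assumes neurons that were active for any of the previous `window` tokens are still cached in DRAM, so only the newly needed neurons are read from flash. The function name `neurons_to_load` and the input format are hypothetical.

```python
def neurons_to_load(active_per_token, window):
    """For each token, list the neuron ids that must be read from flash.

    active_per_token: list of sets of active neuron ids, one set per token
    (hypothetical format). Neurons used by any of the previous `window`
    tokens are treated as already resident in DRAM.
    """
    loads = []
    for t, active in enumerate(active_per_token):
        start = max(0, t - window)
        # Union of the neurons used inside the sliding window before token t.
        cached = set().union(*active_per_token[start:t])
        loads.append(sorted(set(active) - cached))
    return loads


active_per_token = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {1, 5}]
loads = neurons_to_load(active_per_token, window=2)
print(loads)
```

In this toy trace, the first token loads three neurons, while every later token loads only one, because most of its active neurons were already brought in for a recent token. Neuron 1 is loaded again at the last token because it has dropped out of the window by then.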

Paper link

Further reading

https://x.com/gabrielnocode/status/1737307286887133552

ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent

Paper summary

Proposes a ReAct-style agent with self-critique for improving on the task of long-form question answering; it shows that the agent can be improved through ReST-style (reinforced self-training) iterative fine-tuning on its reasoning traces; specifically, it uses growing-batch RL with AI feedback for continuous self-improvement and self-distillation; like a few other recent papers, it focuses on minimizing human involvement (i.e., it doesn't rely on human-labeled training data); it generates synthetic data with self-improvement from AI feedback, which can then be used to distill the agent into smaller models (one to two orders of magnitude smaller) with performance comparable to the pre-trained agent.

Abstract

Answering complex natural language questions often necessitates multi-step reasoning and integrating external information. Several systems have combined knowledge retrieval with a large language model (LLM) to answer such questions. These systems, however, suffer from various failure cases, and we cannot directly train them end-to-end to fix such failures, as interaction with external knowledge is non-differentiable. To address these deficiencies, we define a ReAct-style LLM agent with the ability to reason and act upon external knowledge. We further refine the agent through a ReST-like method that iteratively trains on previous trajectories, employing growing-batch reinforcement learning with AI feedback for continuous self-improvement and self-distillation. Starting from a prompted large model and after just two iterations of the algorithm, we can produce a fine-tuned small model that achieves comparable performance on challenging compositional question-answering benchmarks with two orders of magnitude fewer parameters.

Paper link

Further reading

https://x.com/omarsar0/status/1736587397830176910

Adversarial Attacks on GPT-4

Paper summary

Uses a simple random search algorithm to implement adversarial attacks on GPT-4; it achieves jailbreaking by appending an adversarial suffix to an original request, then iteratively making slight random changes to the suffix, and keeping a change if it increases the log probability of the token "Sure" at the first position of the response.
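The search loop itself is ordinary black-box random search (hill climbing). The sketch below shows only that generic loop, with a toy scoring function standing in for the model's log probability; `random_search`, `toy_score`, the alphabet, and all default parameters are assumptions for illustration, and no model is queried.

```python
import random
import string


def random_search(score, suffix_len=8, steps=200, seed=0):
    """Mutate one random position per step; keep the change only if the
    black-box `score` strictly improves."""
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase
    suffix = [rng.choice(alphabet) for _ in range(suffix_len)]
    best = score("".join(suffix))
    for _ in range(steps):
        candidate = list(suffix)
        candidate[rng.randrange(suffix_len)] = rng.choice(alphabet)
        value = score("".join(candidate))
        if value > best:  # accept only improving changes
            suffix, best = candidate, value
    return "".join(suffix), best


def toy_score(s):
    # Toy stand-in objective (NOT a model log probability): number of
    # positions that match a fixed target string.
    target = "abcdefgh"
    return sum(a == b for a, b in zip(s, target))


suffix, best = random_search(toy_score)
print(suffix, best)
```

Because a change is kept only when the score improves, the score never decreases over the run; the method needs nothing from the scorer beyond a number per candidate.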

Paper link

Further reading

https://x.com/maksym_andr/status/1737844601891983563

Retrieval-Augmented Generation for Large Language Models: A Survey

Paper summary

An overview of recent research on retrieval-augmented generation (RAG).

Abstract

Large language models (LLMs) demonstrate powerful capabilities, but they still face challenges in practical applications, such as hallucinations, slow knowledge updates, and lack of transparency in answers. Retrieval-Augmented Generation (RAG) refers to the retrieval of relevant information from external knowledge bases before answering questions with LLMs. RAG has been demonstrated to significantly enhance answer accuracy and reduce model hallucination, particularly for knowledge-intensive tasks. By citing sources, users can verify the accuracy of answers and increase trust in model outputs. It also facilitates knowledge updates and the introduction of domain-specific knowledge. RAG effectively combines the parameterized knowledge of LLMs with non-parameterized external knowledge bases, making it one of the most important methods for implementing large language models. This paper outlines the development paradigms of RAG in the era of LLMs, summarizing three paradigms: Naive RAG, Advanced RAG, and Modular RAG. It then provides a summary and organization of the three main components of RAG: retriever, generator, and augmentation methods, along with key technologies in each component. Furthermore, it discusses how to evaluate the effectiveness of RAG models, introducing two evaluation methods for RAG, emphasizing key metrics and abilities for evaluation, and presenting the latest automatic evaluation framework. Finally, potential future research directions are introduced from three aspects: vertical optimization, horizontal scalability, and the technical stack and ecosystem of RAG.
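The retrieve-then-generate flow can be sketched in a few lines. This is a toy illustration of the simplest, "naive" form of RAG and is not taken from the survey: the retriever here is a bag-of-words cosine similarity, the prompt wording is invented, and the final LLM call is left out. The helper names (`tokenize`, `cosine`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))),
                    reverse=True)
    return ranked[:k]


def build_prompt(query, docs, k=1):
    """Put the retrieved passages, numbered for citation, ahead of the question."""
    sources = retrieve(query, docs, k)
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(sources))
    return ("Answer using only the sources below and cite them.\n"
            f"{context}\nQuestion: {query}")


docs = [
    "PowerInfer is a GPU-CPU hybrid inference engine.",
    "VideoPoet is a language model for zero-shot video generation.",
    "AppAgent operates smartphone applications by tapping and swiping.",
]
prompt = build_prompt("Which model generates video?", docs)
print(prompt)
```

The resulting prompt would then be passed to a generator model. Numbering the sources is what lets a reader check an answer against its citations, and updating knowledge only requires editing `docs`, not retraining.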

Paper link

Further reading

Findings of the BabyLM Challenge

Paper summary

Presents results for a new challenge that involves sample-efficient pretraining on a developmentally plausible corpus; the winning submission, which uses the LTG-BERT architecture, beat Llama 2 70B on three of four evaluations; other approaches that saw good results included data preprocessing or training on shorter context.

Paper link

https://aclanthology.org/volumes/2023.conll-babylm/

Further reading

https://x.com/a_stadt/status/1737849248560066794