[2024/07/08 ~ 07/14] Top ML Papers of the Week

9bow · July 14, 2024, 9:00 PM

PyTorchKR

This week's selection shows two broad trends. First, many of the papers focus on the efficiency, reasoning, and error correction of large language models (LLMs). "Reasoning in Large Language Models: A Geometric Perspective", "Lookback Lens" (on detecting and mitigating contextual hallucinations), and "RouteLLM" are examples. This reflects how, as the development and deployment of LLMs have grown rapidly over the past few years, optimizing their performance and improving their accuracy have become pressing issues.
The second trend is the growing body of work on Mixture of Experts (MoE) models. "Mixture of A Million Experts" and "A Survey on Mixture of Experts" are the notable papers here. MoE models combine several expert networks to reach better performance, and they are especially useful at large scale, where compute efficiency and quality have to be balanced. The growth of this line of work suggests that the field is interested not only in improving model quality but also in reducing energy consumption and compute cost.
Together, these trends show where AI research is currently heading. LLMs now play an important role well beyond NLP, which puts increasing weight on performance optimization and error reduction. Work on MoE, in turn, reflects the research community's effort to make progress sustainable through more efficient and more scalable models.

FlashAttention-3

Paper summary

Adapts FlashAttention to take advantage of modern hardware. The techniques used to speed up attention on recent GPUs are producer-consumer asynchrony, interleaving of block-wise matmul and softmax operations, and block quantization with incoherent processing. On H100 GPUs this gives a 1.5-2.0x speedup, with FP16 reaching up to 740 TFLOPs/s (75% utilization) and FP8 reaching close to 1.2 PFLOPs/s.
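To make the "block-wise matmul and softmax" idea more concrete, here is a minimal pure-Python sketch of attention for a single query where keys and values are consumed one block at a time using a running (online) softmax. It only illustrates the numerical trick that block-wise processing relies on; it is not the FlashAttention-3 kernel and says nothing about asynchrony or FP8. The function names and the `block_size` parameter are assumptions made for this example.

```python
import math


def attention_reference(q, keys, values):
    """Plain softmax attention for one query vector (all keys at once)."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def attention_blockwise(q, keys, values, block_size=2):
    """Same result, but keys/values are processed block by block.

    A running max and running sum let us rescale the partial output
    whenever a later block contains a larger score.
    """
    dim = len(values[0])
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * dim  # unnormalized weighted sum of values so far
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(a * b for a, b in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        scale = math.exp(running_max - new_max)  # 0.0 on the first block
        weights = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * scale + sum(weights)
        acc = [a * scale + sum(w * v[d] for w, v in zip(weights, v_blk))
               for d, a in enumerate(acc)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Because each block only needs the running max, the running sum, and the accumulator, the full score matrix never has to be materialized, which is what makes it possible to overlap the matmul of one block with the softmax of another.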

Paper link

Further reading

https://x.com/tri_dao/status/1811453622070444071

RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs

Paper summary

Introduces a new instruction fine-tuning framework that performs both context ranking and answer generation to improve an LLM's RAG capabilities. It uses a small ranking dataset yet outperforms existing expert ranking models, and Llama3-RankRAG significantly outperforms Llama3-ChatQA-1.5 and GPT-4 models on nine knowledge-intensive benchmarks.

Abstract

Large language models (LLMs) typically utilize the top-k contexts from a retriever in retrieval-augmented generation (RAG). In this work, we propose a novel instruction fine-tuning framework RankRAG, which instruction-tunes a single LLM for the dual purpose of context ranking and answer generation in RAG. In particular, the instruction-tuned LLMs work surprisingly well by adding a small fraction of ranking data into the training blend, and outperform existing expert ranking models, including the same LLM exclusively fine-tuned on a large amount of ranking data. For generation, we compare our model with many strong baselines, including GPT-4-0613, GPT-4-turbo-2024-0409, and ChatQA-1.5, an open-sourced model with the state-of-the-art performance on RAG benchmarks. Specifically, our Llama3-RankRAG significantly outperforms Llama3-ChatQA-1.5 and GPT-4 models on nine knowledge-intensive benchmarks. In addition, it also performs comparably to GPT-4 on five RAG benchmarks in the biomedical domain without instruction fine-tuning on biomedical data, demonstrating its superb capability for generalization to new domains.
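The rank-then-generate flow described above can be sketched roughly as follows. This is only an illustration of the control flow: in RankRAG the same instruction-tuned LLM would judge relevance and write the answer, whereas here `score_relevance` is a hypothetical word-overlap stand-in and `build_prompt` simply assembles text. All function names are assumptions made for this example.

```python
def score_relevance(question, context):
    """Hypothetical stand-in for the LLM's relevance judgment.

    Returns the fraction of question words that appear in the context.
    """
    q_words = set(question.lower().split())
    c_words = set(context.lower().split())
    return len(q_words & c_words) / len(q_words)


def rerank(question, contexts, k):
    """Keep the k contexts the scorer considers most relevant."""
    ranked = sorted(contexts,
                    key=lambda c: score_relevance(question, c),
                    reverse=True)
    return ranked[:k]


def build_prompt(question, contexts):
    """Assemble the generation prompt from the reranked contexts."""
    numbered = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return f"Context:\n{numbered}\n\nQuestion: {question}\nAnswer:"


retrieved = [
    "the weather is mild today",
    "hamlet is a play that shakespeare wrote",
    "hamlet is a small village",
]
question = "who wrote hamlet"
print(build_prompt(question, rerank(question, retrieved, k=2)))
```

The point of the second stage is that the retriever can return a generous top-N list while the generator only sees the few contexts that survive reranking.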

Paper link

Further reading

https://x.com/_weiping/status/1808551184309104896

Mixture of A Million Experts

Paper summary

Introduces a parameter-efficient expert retrieval mechanism that uses the product key technique for sparse retrieval from a million tiny experts. It attempts to decouple computational cost from parameter count by routing efficiently to a very large number of tiny experts through a learned index structure, and it shows better efficiency than dense FFW, coarse-grained MoEs, and Product Key Memory (PKM) layers.

Abstract

The feedforward (FFW) layers in standard transformer architectures incur a linear increase in computational costs and activation memory as the hidden layer width grows. Sparse mixture-of-experts (MoE) architectures have emerged as a viable approach to address this issue by decoupling model size from computational cost. The recent discovery of the fine-grained MoE scaling law shows that higher granularity leads to better performance. However, existing MoE models are limited to a small number of experts due to computational and optimization challenges. This paper introduces PEER (parameter efficient expert retrieval), a novel layer design that utilizes the product key technique for sparse retrieval from a vast pool of tiny experts (over a million). Experiments on language modeling tasks demonstrate that PEER layers outperform dense FFWs and coarse-grained MoEs in terms of performance-compute trade-off. By enabling efficient utilization of a massive number of experts, PEER unlocks the potential for further scaling of transformer models while maintaining computational efficiency.
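A small sketch may help show why product keys make retrieval from a huge expert pool cheap. Assume, for illustration, that every expert key is the concatenation of one sub-key from set A and one from set B, so `n * n` experts are described by only `2 * n` sub-keys. The query is split in half, each half is scored against its own sub-key set, and the overall top-k is then found among just `k * k` candidate pairs. The function names, sizes, and random data below are assumptions for this example, not the PEER implementation.

```python
import heapq
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def product_key_topk(query, sub_keys_a, sub_keys_b, k):
    """Top-k experts over the Cartesian product without scoring every pair."""
    half = len(query) // 2
    q_a, q_b = query[:half], query[half:]
    scores_a = [dot(q_a, key) for key in sub_keys_a]
    scores_b = [dot(q_b, key) for key in sub_keys_b]
    top_a = heapq.nlargest(k, range(len(scores_a)), key=scores_a.__getitem__)
    top_b = heapq.nlargest(k, range(len(scores_b)), key=scores_b.__getitem__)
    n_b = len(sub_keys_b)
    # Only k * k candidate pairs are combined; expert id = i * n_b + j.
    candidates = [(scores_a[i] + scores_b[j], i * n_b + j)
                  for i in top_a for j in top_b]
    return heapq.nlargest(k, candidates)


def brute_force_topk(query, sub_keys_a, sub_keys_b, k):
    """Reference: score all n_a * n_b experts explicitly."""
    half = len(query) // 2
    q_a, q_b = query[:half], query[half:]
    n_b = len(sub_keys_b)
    scored = [(dot(q_a, ka) + dot(q_b, kb), i * n_b + j)
              for i, ka in enumerate(sub_keys_a)
              for j, kb in enumerate(sub_keys_b)]
    return heapq.nlargest(k, scored)


rng = random.Random(0)
keys_a = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(32)]
keys_b = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(32)]
query = [rng.gauss(0, 1) for _ in range(8)]
print([expert for _, expert in product_key_topk(query, keys_a, keys_b, k=4)])
```

With 32 sub-keys per set this indexes 1,024 experts; with 1,024 sub-keys per set the same code would index over a million, while the per-query scoring cost grows only with the number of sub-keys.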

Paper link

Further reading

https://x.com/omarsar0/status/1810389538340290724

Reasoning in Large Language Models: A Geometric Perspective

Paper summary

Explores the reasoning of LLMs from a geometric perspective. It reports that a higher intrinsic dimension implies greater expressive capacity of the LLM, establishes a connection between the expressive power of LLMs and the density of their self-attention graphs, and shows through analysis that the density of these graphs defines the intrinsic dimension of the inputs to the MLP blocks.

Abstract

The advancement of large language models (LLMs) for real-world applications hinges critically on enhancing their reasoning capabilities. In this work, we explore the reasoning abilities of large language models (LLMs) through their geometrical understanding. We establish a connection between the expressive power of LLMs and the density of their self-attention graphs. Our analysis demonstrates that the density of these graphs defines the intrinsic dimension of the inputs to the MLP blocks. We demonstrate through theoretical analysis and toy examples that a higher intrinsic dimension implies a greater expressive capacity of the LLM. We further provide empirical evidence linking this geometric framework to recent advancements in methods aimed at enhancing the reasoning capabilities of LLMs.

Paper link

Further reading

https://x.com/omarsar0/status/1810329294884741594

Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps

Paper summary

Proposes a new method that detects and significantly reduces contextual hallucinations in LLMs (e.g., by about 10% on the XSum summarization task). It builds a hallucination detection model whose input features are the ratio of attention weights on the context versus newly generated tokens, computed for each attention head. The hypothesis is that contextual hallucinations are related to how much an LLM attends to the provided context. The authors also propose a decoding strategy based on this detector that mitigates contextual hallucination, and the detector can be transferred across models without retraining.

Abstract

When asked to summarize articles or answer questions given a passage, large language models (LLMs) can hallucinate details and respond with unsubstantiated answers that are inaccurate with respect to the input context. This paper describes a simple approach for detecting such contextual hallucinations. We hypothesize that contextual hallucinations are related to the extent to which an LLM attends to information in the provided context versus its own generations. Based on this intuition, we propose a simple hallucination detection model whose input features are given by the ratio of attention weights on the context versus newly generated tokens (for each attention head). We find that a linear classifier based on these lookback ratio features is as effective as a richer detector that utilizes the entire hidden states of an LLM or a text-based entailment model. The lookback ratio-based detector, Lookback Lens, is found to transfer across tasks and even models, allowing a detector that is trained on a 7B model to be applied (without retraining) to a larger 13B model. We further apply this detector to mitigate contextual hallucinations, and find that a simple classifier-guided decoding approach is able to reduce the amount of hallucination, for example by 9.6% in the XSum summarization task.
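As a rough illustration of the lookback-ratio feature, the sketch below takes one head's attention weights at a decoding step, laid out as context tokens followed by already generated tokens, and computes how much of the average attention falls on the context. The exact normalization (averaging within each span and then dividing by the sum of the two averages) is an assumption made for this example, as are the function names and the classifier weights; the abstract only states that the feature is a ratio of attention on context versus new tokens, fed into a linear classifier.

```python
import math


def lookback_ratio(attn_row, num_context_tokens):
    """Share of average attention that one head places on the context.

    attn_row holds the weights over [context tokens | generated tokens].
    """
    ctx = attn_row[:num_context_tokens]
    new = attn_row[num_context_tokens:]
    avg_ctx = sum(ctx) / len(ctx)
    avg_new = sum(new) / len(new) if new else 0.0
    return avg_ctx / (avg_ctx + avg_new)


def lookback_features(attn_rows_per_head, num_context_tokens):
    """One ratio per attention head; this vector is the classifier input."""
    return [lookback_ratio(row, num_context_tokens)
            for row in attn_rows_per_head]


def classifier_score(features, weights, bias):
    """Logistic score of a linear classifier (weights are hypothetical)."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


heads = [
    [0.1, 0.1, 0.2, 0.6],   # head that mostly attends to generated text
    [0.3, 0.3, 0.3, 0.1],   # head that mostly attends to the context
]
feats = lookback_features(heads, num_context_tokens=3)
print(feats, classifier_score(feats, weights=[1.5, 1.5], bias=-1.0))
```

Because the features depend only on attention maps, the same feature extraction can be reused on a different model as long as a mapping between heads is available, which is presumably what makes cross-model transfer practical.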

Paper link

Further reading

https://x.com/omarsar0/status/1811072508637884750

RouteLLM: Learning to Route LLMs with Preference Data

Paper summary

Proposes efficient router models that dynamically select between a stronger and a weaker LLM during inference to balance cost and performance. The training framework uses human preference data and data augmentation techniques to improve performance, and the approach reduces costs by over 2x in certain cases while maintaining response quality.

Abstract

Large language models (LLMs) exhibit impressive capabilities across a wide range of tasks, yet the choice of which model to use often involves a trade-off between performance and cost. More powerful models, though effective, come with higher expenses, while less capable models are more cost-effective. To address this dilemma, we propose several efficient router models that dynamically select between a stronger and a weaker LLM during inference, aiming to optimize the balance between cost and response quality. We develop a training framework for these routers leveraging human preference data and data augmentation techniques to enhance performance. Our evaluation on widely-recognized benchmarks shows that our approach significantly reduces costs, by over 2 times in certain cases, without compromising the quality of responses. Interestingly, our router models also demonstrate significant transfer learning capabilities, maintaining their performance even when the strong and weak models are changed at test time. This highlights the potential of these routers to provide a cost-effective yet high-performance solution for deploying LLMs.
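The routing decision itself can be pictured as a threshold on a predicted score. The sketch below assumes a router that outputs, per query, an estimate of how likely the strong model is to be needed, and sends the query to the strong model only when that estimate reaches a threshold. `toy_win_probability` is a hypothetical stand-in based on query length, not a learned router, and none of these names come from the RouteLLM library.

```python
def toy_win_probability(query):
    """Hypothetical stand-in for a learned router.

    Treats longer queries as harder; a real router would be trained on
    preference data instead.
    """
    return min(1.0, len(query.split()) / 20)


def route(query, threshold, win_probability=toy_win_probability):
    """Send the query to the strong model only if the score is high enough."""
    return "strong" if win_probability(query) >= threshold else "weak"


def strong_call_fraction(queries, threshold):
    """Fraction of queries that would hit the expensive model."""
    decisions = [route(q, threshold) for q in queries]
    return decisions.count("strong") / len(decisions)


queries = [
    "hi there",
    "prove that the sum of two even integers is always even and explain why",
]
for t in (0.2, 0.5, 0.9):
    print(t, strong_call_fraction(queries, t))
```

Raising the threshold shifts more traffic to the cheaper model, so sweeping it traces out the cost versus quality trade-off that the paper evaluates.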

Paper link

Further reading

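[GN⁺] RouteLLM: a framework for serving and evaluating LLM routers (Reading & Information Sharing)

RouteLLM is a framework for serving and evaluating LLM routers, developed jointly by LMSys and Anyscale. Key features: it acts as a drop-in replacement for the OpenAI client that routes simple queries to cheaper models, ships trained routers, and supports adding new routers and comparing router performance on benchmarks. Model support: beyond GPT-4 and Mixtral 8x7B, other model pairs can be used by changing the strong-model and weak-model arguments; chat completions across open-source and closed models are supported through LiteLLM; OpenAI-compatible endpoints can also be used; and instructions are provided for setting API keys for various model providers. Motivation: when deploying LLMs that differ in cost and capability, always using the most powerful model for high-quality responses is expensive, while using a cheaper model can lower quality. LLM routing …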

https://x.com/lmsysorg/status/1807812671238258931

A Survey on Mixture of Experts

Paper summary

A survey paper on Mixture of Experts (MoE), covering the technical details of MoE, open-source implementations, evaluation techniques, and applications of MoE in practice.

Abstract

Large language models (LLMs) have garnered unprecedented advancements across diverse fields, ranging from natural language processing to computer vision and beyond. The prowess of LLMs is underpinned by their substantial model size, extensive and diverse datasets, and the vast computational power harnessed during training, all of which contribute to the emergent abilities of LLMs (e.g., in-context learning) that are not present in small models. Within this context, the mixture of experts (MoE) has emerged as an effective method for substantially scaling up model capacity with minimal computation overhead, gaining significant attention from academia and industry. Despite its growing prevalence, there lacks a systematic and comprehensive review of the literature on MoE. This survey seeks to bridge that gap, serving as an essential resource for researchers delving into the intricacies of MoE. We first briefly introduce the structure of the MoE layer, followed by proposing a new taxonomy of MoE. Next, we overview the core designs for various MoE models including both algorithmic and systemic aspects, alongside collections of available open-source implementations, hyperparameter configurations and empirical evaluations. Furthermore, we delineate the multifaceted applications of MoE in practice, and outline some potential directions for future research. To facilitate ongoing updates and the sharing of cutting-edge developments in MoE research, we have established a resource repository accessible at https://github.com/withinmiaov/A-Survey-on-Mixture-of-Experts.
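For readers new to the topic, the structure of a sparse MoE layer mentioned in the abstract can be sketched in a few lines: a gate scores every expert, only the top-k experts are evaluated, and their outputs are mixed with softmax weights computed over the selected experts. This is a generic top-k gating sketch under those assumptions, not code from the survey; the function names, gate weights, and toy experts are all made up for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_weights, experts, k):
    """Evaluate only the k experts with the highest gate logits."""
    logits = [dot(w, x) for w in gate_weights]
    top = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)[:k]
    probs = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for p, i in zip(probs, top):
        y = experts[i](x)  # experts not in `top` are never called
        out = [o + p * yi for o, yi in zip(out, y)]
    return out, top


# Toy experts: expert i scales its input by (i + 1).
experts = [lambda v, s=i + 1: [s * vi for vi in v] for i in range(4)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [0.5, 0.5]]
print(moe_forward([1.0, 2.0], gate_weights, experts, k=2))
```

Total parameter count grows with the number of experts, while per-token compute grows only with k, which is the capacity versus compute decoupling the abstract refers to.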

Paper link

Further reading

https://github.com/withinmiaov/A-Survey-on-Mixture-of-Experts

https://x.com/omarsar0/status/1811127876819026283

Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence

Paper summary

A new framework that addresses several limitations of multi-agent frameworks, such as integrating diverse third-party agents and adapting to dynamic task requirements. It introduces an agent integration protocol, an instant-messaging-style architecture, and dynamic mechanisms for effective collaboration among heterogeneous agents.

Abstract

The rapid advancement of large language models (LLMs) has paved the way for the development of highly capable autonomous agents. However, existing multi-agent frameworks often struggle with integrating diverse capable third-party agents due to reliance on agents defined within their own ecosystems. They also face challenges in simulating distributed environments, as most frameworks are limited to single-device setups. Furthermore, these frameworks often rely on hard-coded communication pipelines, limiting their adaptability to dynamic task requirements. Inspired by the concept of the Internet, we propose the Internet of Agents (IoA), a novel framework that addresses these limitations by providing a flexible and scalable platform for LLM-based multi-agent collaboration. IoA introduces an agent integration protocol, an instant-messaging-like architecture design, and dynamic mechanisms for agent teaming and conversation flow control. Through extensive experiments on general assistant tasks, embodied AI tasks, and retrieval-augmented generation benchmarks, we demonstrate that IoA consistently outperforms state-of-the-art baselines, showcasing its ability to facilitate effective collaboration among heterogeneous agents. IoA represents a step towards linking diverse agents in an Internet-like environment, where agents can seamlessly collaborate to achieve greater intelligence and capabilities. Our codebase has been released at https://github.com/OpenBMB/IoA.

Paper link

Further reading

https://github.com/OpenBMB/IoA

https://x.com/_akhaliq/status/1810872693501157855

3DGen

Paper summary

A new pipeline for end-to-end text-to-3D asset generation in under a minute. It integrates state-of-the-art components such as AssetGen and TextureGen to represent 3D objects in three ways: in view space, in volumetric space, and in UV space. It achieves a win rate of 68% against the single-stage model.

Paper link

Further reading

https://x.com/AIatMeta/status/1808157832497488201

Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Paper summary

Proposes new sequence modeling layers with linear complexity and an expressive hidden state. The hidden state is defined as an ML model itself, which keeps being updated even on test sequences. Hidden states based on a linear model and on a two-layer MLP match or exceed baselines such as a strong Transformer and Mamba, a modern RNN. The linear variant is faster than the Transformer at 8k context and matches Mamba in wall-clock time.

Abstract

Self-attention performs well in long context but has quadratic complexity. Existing RNN layers have linear complexity, but their performance in long context is limited by the expressive power of their hidden state. We propose a new class of sequence modeling layers with linear complexity and an expressive hidden state. The key idea is to make the hidden state a machine learning model itself, and the update rule a step of self-supervised learning. Since the hidden state is updated by training even on test sequences, our layers are called Test-Time Training (TTT) layers. We consider two instantiations: TTT-Linear and TTT-MLP, whose hidden state is a linear model and a two-layer MLP respectively. We evaluate our instantiations at the scale of 125M to 1.3B parameters, comparing with a strong Transformer and Mamba, a modern RNN. Both TTT-Linear and TTT-MLP match or exceed the baselines. Similar to Transformer, they can keep reducing perplexity by conditioning on more tokens, while Mamba cannot after 16k context. With preliminary systems optimization, TTT-Linear is already faster than Transformer at 8k context and matches Mamba in wall-clock time. TTT-MLP still faces challenges in memory I/O, but shows larger potential in long context, pointing to a promising direction for future research.
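The key idea, a hidden state that is itself a model and an update rule that is one step of self-supervised learning, can be sketched with a linear hidden state. In the sketch below the hidden state is a matrix `W`, and each incoming token triggers one gradient step on a reconstruction loss before the output is produced. Using plain reconstruction of the token itself as the self-supervised task, the zero initialization, and the learning rate are all assumptions made for brevity; the abstract does not specify these details, and this is not the authors' TTT-Linear implementation.

```python
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def ttt_linear_layer(tokens, lr=0.1):
    """Process a sequence with a linear model as the hidden state.

    For each token x: compute the loss ||W x - x||^2, take one gradient
    step on W, then emit W x using the updated state.
    """
    dim = len(tokens[0])
    W = [[0.0] * dim for _ in range(dim)]  # hidden state = model weights
    outputs, losses = [], []
    for x in tokens:
        pred = matvec(W, x)
        err = [p - xi for p, xi in zip(pred, x)]
        losses.append(sum(e * e for e in err))
        # Gradient of ||W x - x||^2 with respect to W is 2 * err * x^T.
        for i in range(dim):
            for j in range(dim):
                W[i][j] -= lr * 2.0 * err[i] * x[j]
        outputs.append(matvec(W, x))
    return outputs, losses


outputs, losses = ttt_linear_layer([[1.0, 0.0]] * 5)
print(losses)
```

Each token costs a fixed amount of work that depends only on the size of `W`, so the total cost is linear in sequence length, while the state can keep absorbing information from the sequence through its updates.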

Paper link

Further reading

https://x.com/arankomatsuzaki/status/1810148710258508046

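Source

This post was compiled with a GPT model and may contain errors, so please also check the original source linked at the bottom of the post. If you notice anything awkward or incorrect while reading, please let us know in the comments.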
