[2024/06/03 ~ 06/09] Top ML Papers of the Week

9bow · June 10, 2024, 5:51 AM

PyTorchKR

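Most of the papers selected this week focus on large language models (LLMs). Specifically, they cover extracting concepts from an LLM (Extracting Concepts from GPT-4), improving efficiency (MatMul-free LLMs), reasoning processes (Buffer of Thoughts), the geometric structure of LLM representations (The Geometry of Concepts in LLMs), and alignment (Aligning LLMs with Demonstrated Feedback, Towards Scalable Automated Alignment of LLMs). These topics reflect the field's current interest in understanding, improving, and applying LLMs. Not every paper was examined in detail for this overview, but the titles alone give a reasonable picture of recent research trends.
Several factors may explain this trend. First, interest in such models has surged across AI research following the success of LLMs such as GPT-4. These models play an important role in approaching human-level performance in natural language processing (NLP) and on a range of knowledge tasks. Second, understanding and advancing LLMs opens the way to AI systems that can handle more complex and creative work. Finally, this research can contribute to techniques for understanding and controlling model behavior, which are essential to the safe and ethical use of AI. In short, this week's papers reflect the research and experimentation taking place at the frontier of AI, and of large language models in particular.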

NLLB: Scaling neural machine translation to 200 languages

Paper introduction

Proposes a massively multilingual model that leverages transfer learning across 200 languages; it is based on a sparsely gated Mixture of Experts architecture and trained on data mined with an approach tailored for low-resource languages; evaluated on more than 40K translation directions, it achieves an average 44% improvement in translation quality as measured by BLEU.

Abstract

The development of neural techniques has opened up new avenues for research in machine translation. Today, neural machine translation (NMT) systems can leverage highly multilingual capacities and even perform zero-shot translation, delivering promising results in terms of language coverage and quality. However, scaling quality NMT requires large volumes of parallel bilingual data, which are not equally available for the 7,000+ languages in the world. Focusing on improving the translation qualities of a relatively small group of high-resource languages comes at the expense of directing research attention to low-resource languages, exacerbating digital inequities in the long run. To break this pattern, here we introduce No Language Left Behind—a single massively multilingual model that leverages transfer learning across languages. We developed a conditional computational model based on the Sparsely Gated Mixture of Experts architecture, which we trained on data obtained with new mining techniques tailored for low-resource languages. Furthermore, we devised multiple architectural and training improvements to counteract overfitting while training on thousands of tasks. We evaluated the performance of our model over 40,000 translation directions using tools created specifically for this purpose—an automatic benchmark (FLORES-200), a human evaluation metric (XSTS) and a toxicity detector that covers every language in our model. Compared with the previous state-of-the-art models, our model achieves an average of 44% improvement in translation quality as measured by BLEU. By demonstrating how to scale NMT to 200 languages and making all contributions in this effort freely available for non-commercial use, our work lays important groundwork for the development of a universal translation system.
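The abstract names a sparsely gated Mixture of Experts as the basis of the model. As a rough illustration of what "sparsely gated" means, the sketch below routes an input to the top-k experts by gate score and mixes only their outputs. The toy experts, the gate scores, and the choice to apply softmax over the selected scores are assumptions made for illustration; the abstract does not describe NLLB's actual router.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_moe(x, gate_scores, experts, k=2):
    """Route x to the k highest-scoring experts and mix their outputs."""
    # Indices of the k largest gate scores (ties keep their original order).
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    # Normalise only over the selected experts (an assumed variant).
    weights = softmax([gate_scores[i] for i in top])
    # Only the selected experts are evaluated; the others are skipped.
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top


# Hypothetical toy experts; in a real model each would be a feed-forward block.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
output, chosen = sparse_moe(3.0, gate_scores=[0.1, 2.0, -1.0, 2.0], experts=experts, k=2)
```

With k fixed, the compute per input stays roughly constant as more experts are added, which is the usual motivation for conditional computation.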

Paper link

https://www.nature.com/articles/s41586-024-07335-x

Further reading

https://x.com/AIatMeta/status/1798420492774432769

Extracting Concepts from GPT-4

Research introduction

Proposes a new scalable method based on sparse autoencoders to extract around 16 million interpretable patterns from GPT-4; the method demonstrates predictable scaling and is more efficient than previous techniques.

Abstract

Sparse autoencoders provide a promising unsupervised approach for extracting interpretable features from a language model by reconstructing activations from a sparse bottleneck layer. Since language models learn many concepts, autoencoders need to be very large to recover all relevant features. However, studying the properties of autoencoder scaling is difficult due to the need to balance reconstruction and sparsity objectives and the presence of dead latents. We propose using k-sparse autoencoders [Makhzani and Frey, 2013] to directly control sparsity, simplifying tuning and improving the reconstruction-sparsity frontier. Additionally, we find modifications that result in few dead latents, even at the largest scales we tried. Using these techniques, we find clean scaling laws with respect to autoencoder size and sparsity. We also introduce several new metrics for evaluating feature quality based on the recovery of hypothesized features, the explainability of activation patterns, and the sparsity of downstream effects. These metrics all generally improve with autoencoder size. To demonstrate the scalability of our approach, we train a 16 million latent autoencoder on GPT-4 activations for 40 billion tokens. We release code and autoencoders for open-source models, as well as a visualizer.
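To make the k-sparse idea concrete, here is a minimal sketch of a forward pass in which only the k largest encoder pre-activations are kept, so sparsity is set directly by k instead of by a penalty term. The tiny weight matrices are made-up values, and biases, normalisation, and training are omitted; this is not the released implementation.

```python
def top_k_activation(pre_acts, k):
    """Keep the k largest pre-activations and zero out the rest."""
    keep = set(sorted(range(len(pre_acts)), key=lambda i: pre_acts[i], reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(pre_acts)]


def matvec(matrix, vec):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def sae_forward(x, enc, dec, k):
    """Encode, apply the top-k bottleneck, then decode."""
    latents = top_k_activation(matvec(enc, x), k)
    recon = matvec(dec, latents)
    return latents, recon


# Hypothetical toy weights: a 2-dimensional activation and 4 latents.
enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
dec = [[0.5, 0.0, 0.25, 0.0],
       [0.0, 0.5, 0.25, 0.0]]
latents, recon = sae_forward([2.0, 1.0], enc, dec, k=2)
```

Because exactly k latents are active per input, there is no sparsity coefficient to tune against reconstruction loss, which is the simplification the abstract refers to.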

Research and paper link

https://openai.com/index/extracting-concepts-from-gpt-4/

Further reading

https://x.com/OpenAI/status/1798762092528586945

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Paper introduction

A new architecture that combines state space models (SSMs) and structured attention; it uses 8x larger states and trains 50% faster; the new state space duality layer is more efficient and scalable compared to the approach used in Mamba; it also improves results on tasks that require large state capacity.

Abstract

While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these families of models are actually quite closely related, and develop a rich framework of theoretical connections between SSMs and variants of attention, connected through various decompositions of a well-studied class of structured semiseparable matrices. Our state space duality (SSD) framework allows us to design a new architecture (Mamba-2) whose core layer is a refinement of Mamba's selective SSM that is 2-8X faster, while continuing to be competitive with Transformers on language modeling.
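The connection between SSMs and attention-like matrix forms can be illustrated in the simplest scalar case. The sketch below computes the same outputs two ways: as a linear recurrence, and as multiplication by a lower-triangular matrix whose entries are products of the per-step decay terms. The scalar state and the function names are simplifying assumptions; the paper works with more general structured semiseparable matrices.

```python
def ssm_recurrent(a, b, c, x):
    """Recurrent form: h_t = a_t * h_{t-1} + b_t * x_t, y_t = c_t * h_t."""
    h, ys = 0.0, []
    for a_t, b_t, c_t, x_t in zip(a, b, c, x):
        h = a_t * h + b_t * x_t
        ys.append(c_t * h)
    return ys


def ssm_matrix(a, b, c):
    """Matrix form: M[i][j] = c_i * (a_{j+1} * ... * a_i) * b_j for j <= i."""
    n = len(a)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            decay = 1.0
            for k in range(j + 1, i + 1):
                decay *= a[k]
            M[i][j] = c[i] * decay * b[j]
    return M


def ssm_matrix_form(a, b, c, x):
    """Apply the lower-triangular matrix to the whole sequence at once."""
    M = ssm_matrix(a, b, c)
    n = len(x)
    return [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]


a, b, c, x = [0.5, 0.5, 0.5], [1.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 2.0, 4.0]
y_rec = ssm_recurrent(a, b, c, x)
y_mat = ssm_matrix_form(a, b, c, x)
```

The recurrent form takes linear time in sequence length, while the matrix form resembles a masked attention computation; both give identical outputs here.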

Paper link

Further reading

https://x.com/_albertgu/status/1797651223035904355

Scalable MatMul-free Language Modeling

Paper introduction

Proposes an implementation that eliminates matrix multiplication operations from LLMs while maintaining performance at billion-parameter scales; the performance gap between full-precision Transformers and the MatMul-free models narrows as model size increases; claims that using an optimized kernel during inference reduces memory consumption by more than 10x.

Abstract

Matrix multiplication (MatMul) typically dominates the overall computational cost of large language models (LLMs). This cost only grows as LLMs scale to larger embedding dimensions and context lengths. In this work, we show that MatMul operations can be completely eliminated from LLMs while maintaining strong performance at billion-parameter scales. Our experiments show that our proposed MatMul-free models achieve performance on-par with state-of-the-art Transformers that require far more memory during inference at a scale up to at least 2.7B parameters. We investigate the scaling laws and find that the performance gap between our MatMul-free models and full precision Transformers narrows as the model size increases. We also provide a GPU-efficient implementation of this model which reduces memory usage by up to 61% over an unoptimized baseline during training. By utilizing an optimized kernel during inference, our model's memory consumption can be reduced by more than 10x compared to unoptimized models. To properly quantify the efficiency of our architecture, we build a custom hardware solution on an FPGA which exploits lightweight operations beyond what GPUs are capable of. We processed billion-parameter scale models at 13W beyond human readable throughput, moving LLMs closer to brain-like efficiency. This work not only shows how far LLMs can be stripped back while still performing effectively, but also points at the types of operations future accelerators should be optimized for in processing the next generation of lightweight LLMs. Our code implementation is available at https://github.com/ridgerchu/matmulfreellm.
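The abstract does not say how the multiplications are removed. One way a dense layer can avoid them, assumed here for illustration, is to restrict weights to -1, 0, and +1, so that each output is just a signed sum of inputs. The sketch below compares that add/subtract path against an ordinary multiply-accumulate reference; scaling factors, quantisation of activations, and the rest of the architecture are left out.

```python
def ternary_linear(x, weights):
    """Compute W @ x for weights in {-1, 0, +1} using only add and subtract."""
    out = []
    for row in weights:
        acc = 0.0
        for w, v in zip(row, x):
            if w == 1:
                acc += v
            elif w == -1:
                acc -= v
            # w == 0 contributes nothing, so the input is skipped.
        out.append(acc)
    return out


def dense_linear(x, weights):
    """Reference implementation with ordinary multiplications."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


# Hypothetical toy layer: 3 inputs, 2 outputs.
x = [0.5, -2.0, 3.0]
W = [[1, 0, -1],
     [-1, -1, 1]]
y_ternary = ternary_linear(x, W)
y_dense = dense_linear(x, W)
```

Additions and sign flips are cheaper than multiplications in hardware, which fits the abstract's point about accelerators optimised for lightweight operations.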

Paper link

Further reading

https://github.com/ridgerchu/matmulfreellm

https://x.com/omarsar0/status/1798373841741185261

Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

Paper introduction

Presents a thought-augmented reasoning approach to enhance the accuracy, efficiency, and robustness of LLM-based reasoning; it leverages a meta-buffer containing high-level thoughts (thought templates) distilled from problem-solving processes; the relevant thought template is then retrieved and instantiated with task-specific reasoning structures for the thought-augmented reasoning process; it demonstrates SOTA performance on 10 challenging tasks while requiring 12% of the cost of multi-query prompting methods like Tree-of-Thoughts.

Abstract

We introduce Buffer of Thoughts (BoT), a novel and versatile thought-augmented reasoning approach for enhancing accuracy, efficiency and robustness of large language models (LLMs). Specifically, we propose meta-buffer to store a series of informative high-level thoughts, namely thought-template, distilled from the problem-solving processes across various tasks. Then for each problem, we retrieve a relevant thought-template and adaptively instantiate it with specific reasoning structures to conduct efficient reasoning. To guarantee the scalability and stability, we further propose buffer-manager to dynamically update the meta-buffer, thus enhancing the capacity of meta-buffer as more tasks are solved. We conduct extensive experiments on 10 challenging reasoning-intensive tasks, and achieve significant performance improvements over previous SOTA methods: 11% on Game of 24, 20% on Geometric Shapes and 51% on Checkmate-in-One. Further analysis demonstrates the superior generalization ability and model robustness of our BoT, while requiring only 12% of the cost of multi-query prompting methods (e.g., tree/graph of thoughts) on average. Notably, we find that our Llama3-8B+BoT has the potential to surpass Llama3-70B model. Our project is available at: https://github.com/YangLing0818/buffer-of-thought-llm
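The retrieve-then-instantiate loop described in the abstract can be sketched in a few lines. Everything below is hypothetical: the buffer entries, the word-overlap similarity (a stand-in for whatever embedding-based retrieval a real system would use), the threshold, and the function names are illustrative and are not taken from the project's code.

```python
def similarity(a, b):
    """Jaccard overlap between word sets; a stand-in for embedding similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


# Hypothetical meta-buffer: each entry pairs a task description with a template.
META_BUFFER = [
    {"description": "solve arithmetic puzzle combine numbers reach target",
     "template": "Enumerate operator orderings over {inputs} and check which reach {goal}."},
    {"description": "chess find move that delivers checkmate",
     "template": "List legal moves for {inputs} and test each for {goal}."},
]


def retrieve_template(problem, buffer, threshold=0.1):
    """Return the most similar template, or None if nothing is close enough."""
    best = max(buffer, key=lambda t: similarity(problem, t["description"]))
    if similarity(problem, best["description"]) < threshold:
        return None
    return best


def instantiate(template, inputs, goal):
    """Fill the high-level template with task-specific details."""
    return template["template"].format(inputs=inputs, goal=goal)


problem = "combine the numbers 4 7 8 8 to reach target 24"
tmpl = retrieve_template(problem, META_BUFFER)
prompt = instantiate(tmpl, inputs="4, 7, 8, 8", goal="24")
```

A single retrieval plus one instantiated prompt is what would keep the cost below multi-query methods; a buffer manager would additionally add new templates when retrieval returns nothing.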

Paper link

Further reading

https://github.com/YangLing0818/buffer-of-thought-llm

https://x.com/omarsar0/status/1799113545696567416

SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales

Paper introduction

A training framework that teaches LLMs to express more accurate fine-grained confidence estimates and self-reflective rationales; it performs supervised finetuning on a dataset that contains summaries of the differences between multiple reasoning chains; reinforcement learning is then applied to calibrate confidence estimates, encouraging the LLM to produce accurate, high-confidence predictions and penalizing overconfidence in erroneous outputs.

Abstract

Large language models (LLMs) often generate inaccurate or fabricated information and generally fail to indicate their confidence, which limits their broader applications. Previous work elicits confidence from LLMs by direct or self-consistency prompting, or constructing specific datasets for supervised finetuning. The prompting-based approaches have inferior performance, and the training-based approaches are limited to binary or inaccurate group-level confidence estimates. In this work, we present the advanced SaySelf, a training framework that teaches LLMs to express more accurate fine-grained confidence estimates. In addition, beyond the confidence scores, SaySelf initiates the process of directing LLMs to produce self-reflective rationales that clearly identify gaps in their parametric knowledge and explain their uncertainty. This is achieved by using an LLM to automatically summarize the uncertainties in specific knowledge via natural language. The summarization is based on the analysis of the inconsistency in multiple sampled reasoning chains, and the resulting data is utilized for supervised fine-tuning. Moreover, we utilize reinforcement learning with a meticulously crafted reward function to calibrate the confidence estimates, motivating LLMs to deliver accurate, high-confidence predictions and to penalize overconfidence in erroneous outputs. Experimental results in both in-distribution and out-of-distribution datasets demonstrate the effectiveness of SaySelf in reducing the confidence calibration error and maintaining the task performance. We show that the generated self-reflective rationales are reasonable and can further contribute to the calibration. The code is made public at https://github.com/xu1868/SaySelf.
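The abstract reports reductions in confidence calibration error. A common way to measure this, shown here as a sketch, is expected calibration error (ECE): predictions are grouped into confidence bins and the gap between average confidence and accuracy is averaged over bins, weighted by bin size. The equal-width binning and the bin count are assumptions; the abstract does not state which variant the paper uses.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between confidence and accuracy over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += len(items) / total * abs(avg_conf - accuracy)
    return ece


# Overconfident model: claims 95% confidence but is right half the time.
ece_overconfident = expected_calibration_error([0.95] * 4, [True, False, False, True])
# Well-calibrated model: claims 75% confidence and is right 3 times out of 4.
ece_calibrated = expected_calibration_error([0.75] * 4, [True, True, True, False])
```

A lower value means stated confidence tracks actual accuracy more closely, which is the behaviour the reward function in the abstract is meant to encourage.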

Paper link

Further reading

https://github.com/xu1868/SaySelf

https://x.com/omarsar0/status/1797682549608833477

The Geometry of Categorical and Hierarchical Concepts in Large Language Models

Paper introduction

Studies the geometry of categorical concepts and how the hierarchical relations between them are encoded in LLMs; finds that simple categorical concepts are represented as simplices by the LLMs and complex concepts are represented as polytopes constructed from direct sums of simplices, which reflect the hierarchical structure.

Abstract

Understanding how semantic meaning is encoded in the representation spaces of large language models is a fundamental problem in interpretability. In this paper, we study the two foundational questions in this area. First, how are categorical concepts, such as {'mammal', 'bird', 'reptile', 'fish'}, represented? Second, how are hierarchical relations between concepts encoded? For example, how is the fact that 'dog' is a kind of 'mammal' encoded? We show how to extend the linear representation hypothesis to answer these questions. We find a remarkably simple structure: simple categorical concepts are represented as simplices, hierarchically related concepts are orthogonal in a sense we make precise, and (in consequence) complex concepts are represented as polytopes constructed from direct sums of simplices, reflecting the hierarchical structure. We validate these theoretical results on the Gemma large language model, estimating representations for 957 hierarchically related concepts using data from WordNet.

Paper link

Further reading

https://x.com/omarsar0/status/1798010546522103898

Show, Don't Tell: Aligning Language Models with Demonstrated Feedback

Paper introduction

Proposes a method to align LLMs to a specific setting via a very small number of demonstrations as feedback; it aligns LLM outputs to a user’s demonstrated behaviors and can learn fine-grained style and task alignment across domains; outperforms few-shot prompting, SFT, and self-play methods on the tested benchmarks.

Abstract

Language models are aligned to emulate the collective voice of many, resulting in outputs that align with no one in particular. Steering LLMs away from generic output is possible through supervised finetuning or RLHF, but requires prohibitively large datasets for new ad-hoc tasks. We argue that it is instead possible to align an LLM to a specific setting by leveraging a very small number (<10) of demonstrations as feedback. Our method, Demonstration ITerated Task Optimization (DITTO), directly aligns language model outputs to a user's demonstrated behaviors. Derived using ideas from online imitation learning, DITTO cheaply generates online comparison data by treating users' demonstrations as preferred over output from the LLM and its intermediate checkpoints. We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts. Additionally, we conduct a user study soliciting a range of demonstrations from participants (N=16). Across our benchmarks and user study, we find that win-rates for DITTO outperform few-shot prompting, supervised fine-tuning, and other self-play methods by an average of 19% points. By using demonstrations as feedback directly, DITTO offers a novel method for effective customization of LLMs.
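The way DITTO turns a handful of demonstrations into comparison data can be sketched as follows: every user demonstration is treated as preferred over every output sampled from the model and its intermediate checkpoints. The function name, the dictionary keys, and the sample strings are hypothetical, and the sketch covers only the preference rule stated in the abstract, not the training loop that consumes the pairs.

```python
def build_comparisons(demonstrations, samples_by_checkpoint):
    """Pair each demonstration (chosen) with each sampled model output (rejected).

    samples_by_checkpoint is a list of lists: one list of sampled outputs per
    checkpoint, including the initial model.
    """
    pairs = []
    for demo in demonstrations:
        for samples in samples_by_checkpoint:
            for sample in samples:
                pairs.append({"chosen": demo, "rejected": sample})
    return pairs


# Hypothetical data: 2 user demonstrations, samples from 2 checkpoints.
demos = ["user email A", "user email B"]
samples = [["ckpt0 draft 1", "ckpt0 draft 2"], ["ckpt1 draft 1"]]
pairs = build_comparisons(demos, samples)
```

Because the rejected side comes from the model itself, the comparison set grows with each checkpoint without any extra labeling from the user.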

Paper link

Further reading

https://x.com/arankomatsuzaki/status/1797833884463472653

Towards Scalable Automated Alignment of LLMs: A Survey

Paper introduction

Provides an overview of methods used for alignment of LLMs; explores the 4 following directions: 1) aligning through inductive bias, 2) aligning through behavior imitation, 3) aligning through model feedback, and 4) aligning through environment feedback.

Abstract

Alignment is the most critical step in building large language models (LLMs) that meet human needs. With the rapid development of LLMs gradually surpassing human capabilities, traditional alignment methods based on human-annotation are increasingly unable to meet the scalability demands. Therefore, there is an urgent need to explore new sources of automated alignment signals and technical approaches. In this paper, we systematically review the recently emerging methods of automated alignment, attempting to explore how to achieve effective, scalable, automated alignment once the capabilities of LLMs exceed those of humans. Specifically, we categorize existing automated alignment methods into 4 major categories based on the sources of alignment signals and discuss the current status and potential development of each category. Additionally, we explore the underlying mechanisms that enable automated alignment and discuss the essential factors that make automated alignment technologies feasible and effective from the fundamental role of alignment.

Paper link

Further reading

https://x.com/omarsar0/status/1798014572663583165

AgentGym: Evolving Large Language Model-based Agents across Diverse Environments

Paper introduction

A new framework featuring various environments and tasks for broad, real-time, and concurrent agent exploration; builds a generally capable LLM-based agent with self-evolution abilities and explores its potential beyond previously seen data across tasks and environments.

Abstract

Building generalist agents that can handle diverse tasks and evolve themselves across different environments is a long-term goal in the AI community. Large language models (LLMs) are considered a promising foundation to build such agents due to their generalized capabilities. Current approaches either have LLM-based agents imitate expert-provided trajectories step-by-step, requiring human supervision, which is hard to scale and limits environmental exploration; or they let agents explore and learn in isolated environments, resulting in specialist agents with limited generalization. In this paper, we take the first step towards building generally-capable LLM-based agents with self-evolution ability. We identify a trinity of ingredients: 1) diverse environments for agent exploration and learning, 2) a trajectory set to equip agents with basic capabilities and prior knowledge, and 3) an effective and scalable evolution method. We propose AgentGym, a new framework featuring a variety of environments and tasks for broad, real-time, uni-format, and concurrent agent exploration. AgentGym also includes a database with expanded instructions, a benchmark suite, and high-quality trajectories across environments. Next, we propose a novel method, AgentEvol, to investigate the potential of agent self-evolution beyond previously seen data across tasks and environments. Experimental results show that the evolved agents can achieve results comparable to SOTA models. We release the AgentGym suite, including the platform, dataset, benchmark, checkpoints, and algorithm implementations. The AgentGym suite is available at https://github.com/WooooDyy/AgentGym.

Paper link

Further reading

https://github.com/WooooDyy/AgentGym

https://x.com/arankomatsuzaki/status/1798904095669121443

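Original post

This post was compiled with a GPT model and may contain errors, so please also consult the original source. If you notice anything awkward or incorrect while reading, please let us know in the comments.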
