[GN] LIMA - Approaching GPT-4 performance with a 65B LLaMA model and just 1,000 prompts

9bow · May 23, 2023, 5:36 AM

With permission from xguru of GeekNews, I am sharing the AI-related items from among the posts on GN.

Introduction

"Less Is More for Alignment"
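LIMA = a LLaMA model fine-tuned on just 1,000 well-curated prompts and responses, with no reinforcement learning or preference modeling
In human evaluation, LIMA's responses were equivalent or preferred to GPT-4's in 43% of cases, to Bard's in 58% of cases, and in 65% of cases to those of DaVinci003, which was trained with human feedback
The paper's hypothesis is that almost all of an LLM's knowledge is learned during pretraining, and that alignment is a simple process of learning the format and style for interacting with users
A new paper from Meta AI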
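
To make the "equivalent or preferred" percentages concrete, here is a minimal sketch of how such a rate could be computed from pairwise judgments. The function name, the "win" / "tie" / "loss" labels, and the sample list are all hypothetical and chosen for illustration; they are not taken from the paper's evaluation code or data.

```python
from collections import Counter


def equivalent_or_preferred_rate(judgments):
    """Return the fraction of comparisons the candidate model won or tied.

    `judgments` is a list of "win", "tie", or "loss" labels, one per prompt,
    each given from the candidate model's point of view (hypothetical format).
    """
    counts = Counter(judgments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no judgments given")
    # Ties count toward the rate: "equivalent or preferred" = wins + ties.
    return (counts["win"] + counts["tie"]) / total


# Made-up labels for illustration only, not data from the paper.
example = ["win", "tie", "loss", "loss", "win"]
print(f"{equivalent_or_preferred_rate(example):.0%}")
```

Read this way, a figure such as 43% against GPT-4 counts ties together with wins, so it should not be taken as LIMA being strictly better in 43% of cases.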

Original article

Source / GeekNews

9bow · May 23, 2023, 5:37 AM

I ran only the final Discussion section through DeepL.

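We show that fine-tuning a strong pretrained language model on 1,000 carefully selected examples can achieve surprising and competitive results across a wide variety of prompts. However, this approach has limitations. First, building such examples takes considerable mental effort and is hard to scale. Second, LIMA is not as robust as product-grade models: it generally produces good responses, but an unlucky sample during decoding or an adversarial prompt can often lead to a weak response. Even so, the evidence presented in this work shows the potential of solving the complex problem of alignment with a simple approach.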

We show that fine-tuning a strong pretrained language model on 1,000 carefully curated examples can produce remarkable, competitive results on a wide range of prompts. However, there are limitations to this approach. Primarily, the mental effort in constructing such examples is significant and difficult to scale up. Secondly, LIMA is not as robust as product-grade models; while LIMA typically generates good responses, an unlucky sample during decoding or an adversarial prompt can often lead to a weak response. That said, the evidence presented in this work demonstrates the potential of tackling the complex issues of alignment with a simple approach.