[GN] The Current State of Open-Source Language Models

9bow · August 10, 2023, 6:18 AM

With permission from xguru of GeekNews, I am sharing the AI-related news from among the posts on GN.

Introduction

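Activity has heated up further since the LLaMA 2 release
- Nearly every open-source model group has updated its models to the new base model
  - WizardLM, Airoboros, Hermes, and others
- The strongest model at the moment is StabilityAI's Stable Beluga 2
  - Llama2 70B fine-tuned on an Orca-style dataset
  - Comparable to ChatGPT
Long-context models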
- LLaMA 7B 16K, LLaMA 13B 16K, LLaMA 7B 32K
Small but powerful models
- 3B-parameter models as strong as 7B models
- They have not hit their limit yet, and there is still a lot of room to improve
- SlimPajama, SwiGLU, ALiBi, Variable Sequence Length, Maximal update parameterization (muP)
An open model beats ChatGPT on MMLU: llama-2-70b-guanaco-qlora
Multi-turn chat: llama2-13b-orca-8k-3319
Chinese models are competing with each other: CodeGeex2
Have open models reached ChatGPT level?
- Not yet, but they may get there soon

9bow · August 10, 2023, 6:23 AM

I have compiled the models mentioned in the tweet above.

Model name	Description	URL
New WizardLM model	An updated version of the WizardLM model.	Link
New Airoboros model	An updated version of the Airoboros model.	Link
New Hermes model	An updated version of the Hermes model.	Link
Stable Beluga 2	A powerful model comparable to ChatGPT in most measurable metrics.	Link
LLaMA 7B 16K	A longer version of the base model with extended context window length.	Link
LLaMA 13B 16K	Another longer version of the base model with extended context window length.	Link
LLaMA 7B 32K	A longer version of the base model with even more extended context window length.	Link
BTLM-8K-Base	A 3B-parameter model as powerful as a 7B model, with various training tricks.	Link
llama-2-70b-guanaco-qlora	An open model that beat ChatGPT on MMLU.	Link
llama2-13b-orca-8k-3319	A model trained for long multi-turn chats following Orca's methods.	Link
CodeGeex2	A powerful programming model with a size of 6B parameters and high HumanEval score.	Link