[GN] Alibaba releases the open-source AI model series Qwen-7B

9bow · August 10, 2023, 6:13 AM

With permission from xguru of GeekNews, I am sharing AI-related news from the posts published on GN.

Introduction

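Tongyi Qianwen is the large language model Alibaba announced in April; it supports English and Chinese
The Qwen-7B and Qwen-7B-Chat models are to be released as open source
Companies with more than 100 million MAU must obtain a royalty-free license from Alibaba
Alibaba has built its own apps using this language model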

Original article

Technical document

github.com/QwenLM/Qwen: tech_memo.md (main branch)

# Introducing Qwen-7B: Open foundation and human-aligned models (of the state of the art)

Large language models have recently attracted a great deal of attention.
The boom of [ChatGPT](https://openai.com/blog/chatgpt) spurred the development of artificial general intelligence. It indicates that large language models compress world knowledge into neural networks, and that aligning them with human cognition can yield powerful conversational agents that assist human users through interaction.
Now, the latest version of ChatGPT, based on [GPT-4](https://arxiv.org/abs/2303.08774), demonstrates impressive performance across a wide range of capabilities, such as language understanding, logical reasoning, and planning. Combined with external tools and models, it can act as an agent that understands instructions, executes code, and uses tools to reach the objectives set by human users.

This significant progress indicates the importance of large language models as _the foundation of AI services_.

We are happy to release Qwen-7B, the 7B-parameter models of our large pretrained model series Qwen (short for Tongyi Qianwen).
This release includes model weights and code for the pretrained and human-aligned language models of 7B parameters:

- `Qwen-7B` is the pretrained language model, and `Qwen-7B-Chat` is fine-tuned to align with human intent.
- `Qwen-7B` is pretrained on over 2.2 trillion tokens with a context length of 2048. On the benchmarks we tested, Qwen-7B generally performs better than existing open models of similar scale and appears to be on par with some larger models.
- `Qwen-7B-Chat` is fine-tuned on curated data, including not only task-oriented data but also specific security- and service-oriented data, which seem to be lacking in existing open models.
- Example code for fine-tuning, evaluation, and inference is included. There are also guides on long-context inference and tool use.
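To make the 2048-token context length concrete, here is a minimal sketch of fitting a prompt into a fixed window while reserving room for generated tokens. It is not taken from the Qwen repository and is not the long-context technique its guides describe. The function name `fit_to_context` and the token IDs are hypothetical, and keeping the most recent tokens is only one simple truncation policy.

```python
CONTEXT_LENGTH = 2048  # context length stated in the excerpt above


def fit_to_context(prompt_ids, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Keep the most recent prompt tokens so prompt + generation fits the window."""
    if not 0 <= max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be in [0, context_length)")
    budget = context_length - max_new_tokens  # tokens left for the prompt
    return list(prompt_ids[-budget:])


# Hypothetical token IDs standing in for a tokenized prompt.
prompt_ids = list(range(3000))
kept = fit_to_context(prompt_ids, max_new_tokens=256)
print(len(kept), kept[0], kept[-1])
```

With 256 tokens reserved for generation, 1792 positions remain for the prompt, so the oldest 1208 of the 3000 placeholder tokens are dropped. A prompt shorter than the budget is returned unchanged.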

**Goal of release**:
We believe that while the recent waves of LLM releases may have deepened our understanding of model behavior under standard regimes, it remains unclear how the techniques that accompany today's LLMs affect the models as a whole. These techniques include 1) quantization and fine-tuning after quantization, 2) training-free long-context inference, and 3) fine-tuning with service-oriented data, including search and tool use.
The open release of Qwen-7B marks our first step towards fully understanding the real-world application of such techniques.
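As a rough illustration of why quantization matters for a model of this size, the sketch below estimates weight storage at several precisions. It assumes a nominal 7 billion parameters (the excerpt does not give the exact count) and counts weights only. Activations, the key-value cache, and quantization metadata such as scales are ignored, so real memory use will be higher. The function name `weight_memory_gb` is hypothetical.

```python
NUM_PARAMS = 7e9  # nominal "7B"; the exact parameter count is an assumption

# Bits used to store one weight at each precision.
BITS_PER_WEIGHT = {"fp32": 32, "fp16/bf16": 16, "int8": 8, "int4": 4}


def weight_memory_gb(num_params, bits):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:>9}: {weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions, halving the bits per weight halves the storage: about 14 GB at 16 bits versus about 3.5 GB at 4 bits.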


Web Demo

GitHub repository

https://github.com/QwenLM/Qwen-7B

CNBC article

Source / GeekNews

[TLDR] Today's AI news, 2023-08-07: Alibaba's open-source AI model, TPU maker starts chip company, zero-shot image classification 🖼️ (Reading & Info Sharing)
