
KevinTheRainmaker · July 31, 2024, 2:43 AM

Introducing torchchat: Accelerating Local LLM Inference on Laptop, Desktop and Mobile

Hello everyone,

PyTorch recently announced a new library called torchchat, so I thought I'd share it! It should be useful for anyone interested in running LLMs on local devices :)

torchchat is a library that lets you run LLMs such as Llama 3 and 3.1 locally on laptops, desktops, and even mobile devices. If I get the chance, I'd also like to compare it with Ollama.

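Supports a variety of environments:
- Python: use it through a REST API
- Desktop: a program built in C++
- Mobile: provides files that run on-device
Useful features:
- Model export
- Quantization (making models smaller and lighter)
- Evaluation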
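To give a feel for what "quantization" means in the list above, here is a minimal sketch of symmetric 4-bit group-wise quantization with plain round-to-nearest, in pure Python. This is not torchchat's API or implementation, and it is simpler than the GPTQ method mentioned for mobile; the function names and the group size of 4 are hypothetical, chosen only for illustration. Real 4-bit kernels also pack two codes into each byte, whereas this sketch keeps them as Python ints.

```python
from typing import List, Tuple


def quantize_int4_groupwise(
    weights: List[float], group_size: int = 4
) -> Tuple[List[int], List[float]]:
    """Symmetric round-to-nearest 4-bit quantization with one scale per group.

    Hypothetical helper for illustration; not part of torchchat.
    """
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7, the top of the
        # signed 4-bit range [-8, 7]; an all-zero group gets a dummy scale.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            # Round to the nearest integer code and clamp into [-8, 7].
            codes.append(max(-8, min(7, round(w / scale))))
    return codes, scales


def dequantize_int4_groupwise(
    codes: List[int], scales: List[float], group_size: int = 4
) -> List[float]:
    """Approximately reconstruct the original floats from codes and scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


if __name__ == "__main__":
    w = [0.7, -0.35, 0.14, 0.0, 2.1, -1.05, 0.3, -2.1]
    codes, scales = quantize_int4_groupwise(w)
    print(codes)
    print(dequantize_int4_groupwise(codes, scales))
```

Each weight now costs 4 bits plus a small share of its group's scale, instead of 16 or 32 bits, at the price of a rounding error of at most half a scale step. GPTQ improves on plain rounding by quantizing weights one column at a time and adjusting the not-yet-quantized weights to compensate for each rounding error, using calibration data.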

Llama 3 8B Instruct on Apple MacBook Pro M1 Max 64GB Laptop

Llama 3 8B Instruct on Linux x86 and CUDA
Intel(R) Xeon(R) Platinum 8339HC CPU @ 1.80GHz with 180 GB RAM + A100 (80 GB)

Llama 3 8B Instruct on Mobile
torchchat achieves more than 8 tokens/s on the Samsung Galaxy S23 and iPhone using 4-bit GPTQ via ExecuTorch.
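A rough back-of-the-envelope estimate shows why 4-bit weights matter on a phone. This assumes about 8 × 10^9 parameters and counts only the weights, ignoring quantization scales, the KV cache, and activations:

```latex
\begin{aligned}
\text{bf16:}\quad & 8\times10^{9}\ \text{params}\times\frac{16\ \text{bits}}{8\ \text{bits/byte}} = 16\times10^{9}\ \text{bytes}\approx 16\ \text{GB}\\
\text{4-bit:}\quad & 8\times10^{9}\ \text{params}\times\frac{4\ \text{bits}}{8\ \text{bits/byte}} = 4\times10^{9}\ \text{bytes}\approx 4\ \text{GB}
\end{aligned}
```

That is a 4x reduction in weight memory compared with bf16.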

9bow · July 31, 2024, 9:06 AM

Wow, thank you!
I was actually planning to translate the blog post and share it here, so I'll write it up soon. ^^