[TLDR] Today's AI News, 2023-08-21: OpenAI content moderation 🦺, a 3-trillion-token dataset 3️⃣, 3D detection in new environments 🚗

9bow · August 22, 2023, 11:20 AM

The PyTorch Korea User Group shares this AI news, translated with DeepL, with permission from the TLDR newsletter.

Want to share more AI news and information and grow together? Visit the PyTorch Korean community now!

Headlines & Launches

3-trillion-token open dataset released (4 minute read)

Many open datasets exist, but few are large enough to train a frontier model. The Allen Institute for AI’s Dolma dataset aims to let researchers study the effects of data at scale.

OpenAI’s content moderation (5 minute read)

OpenAI proposes using GPT-4 for content policy development and content moderation decisions across digital platforms. The aim is to reduce the burden on human moderators, label content more consistently, and speed up feedback loops. This raises the question: should AI be the authority on who can say what?

Research & Innovation

RL environment for robot pianist (GitHub Repo)

The robot pianist was an impressive project in which an agent learned to control a robot hand that played classical piano pieces. It is now open source, so you can train your own.

Code for Bayesian Flow Networks (GitHub Repo)

Bayesian Flow Networks are an exciting new architecture and training algorithm. This repository is a simple, unofficial reproduction of the paper. The goal is to build a codebase that can scale to GPT-2-sized models.

bayesian-flow-networks

txtai (GitHub Repo)

txtai is an all-in-one embeddings database for semantic search, LLM orchestration, and language model workflows.

Engineering & Resources

Sigmoid loss for Language Image Pretraining (30 minute read)

CLIP was somewhat revolutionary for computer vision. It relies on some extremely clever algorithmic tricks (e.g., large in-batch negatives) to perform well. This paper questions many of those tricks and finds a simpler training setup that outperforms CLIP on the same task.
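The difference between the two training objectives can be sketched in a few lines. The following is a minimal pure-Python illustration, not the paper's code. It assumes L2-normalized embeddings, a CLIP-style softmax loss that normalizes over the whole batch, and a sigmoid loss that scores every image-text pair independently. The function names and the default temperature and bias values are illustrative assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pair_logits(image_embs, text_embs, temperature, bias=0.0):
    # Scaled cosine similarity between every image and every text in the batch.
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    return [[temperature * dot(i, t) + bias for t in txts] for i in imgs]


def sigmoid_pairwise_loss(image_embs, text_embs, temperature=10.0, bias=-10.0):
    # Each pair is an independent binary problem: +1 on the diagonal
    # (matching pair), -1 everywhere else. No batch-wide normalization.
    z = pair_logits(image_embs, text_embs, temperature, bias)
    n = len(z)
    total = 0.0
    for i in range(n):
        for j in range(n):
            label = 1.0 if i == j else -1.0
            total -= log_sigmoid(label * z[i][j])
    return total / n


def softmax_contrastive_loss(image_embs, text_embs, temperature=10.0):
    # CLIP-style loss: softmax over each row (image -> text) and each
    # column (text -> image), so every term depends on the whole batch.
    z = pair_logits(image_embs, text_embs, temperature)
    n = len(z)

    def diagonal_log_softmax(rows):
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            lse = m + math.log(sum(math.exp(x - m) for x in row))
            total += row[i] - lse
        return total

    columns = [list(c) for c in zip(*z)]
    return -(diagonal_log_softmax(z) + diagonal_log_softmax(columns)) / (2 * n)
```

Because the sigmoid version sums independent per-pair terms, it does not need a normalization pass over the full similarity matrix, which is one plausible reading of why the summary calls the setup simpler.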

GPA-3D: Improving 3D Detection in New Environments (16 minute read)

Detecting objects in 3D with LiDAR is harder in new locations because the data looks different there. A new method, GPA-3D, uses special reference points to understand 3D shapes better, which helps the detector work well even in unfamiliar areas.

An Efficient Way to Evaluate Text-to-Image Generations (17 minute read)

Current methods for judging the quality of images generated from text have problems, such as failing to capture how good an image looks or how well it matches the text. This paper introduces a method that estimates how likely a generated image is to match a given text by focusing on the most important parts of the image. It does so efficiently, using fewer samples.

Miscellaneous

Anti-Hype LLM reading list (20 minute read)

It's hard to know what is real in the current AI kerfuffle. This list contains foundational papers, interesting open questions, and a guide for gaining deeper insight into the field.

You Probably Don’t Need To Fine Tune An LLM (8 minute read)

Fine-tuning is often unnecessary for LLM applications. Few-shot prompting or retrieval-augmented generation (RAG) may be a better choice. Few-shot prompting gives the LLM examples of the desired output, while RAG queries a vector database for information the LLM was not trained on.
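The two alternatives can be sketched as plain prompt construction. This is a minimal illustration under stated assumptions, not code from the linked article: the function names are hypothetical, the "vector database" is an in-memory list, and word counts stand in for a learned embedding model. Sending the resulting prompt to an LLM is left out.

```python
import math
import re
from collections import Counter


def build_few_shot_prompt(examples, query):
    # Few-shot prompting: show input/output pairs, then the new input.
    shots = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{shots}\n\nInput: {query}\nOutput:"


def embed(text):
    # Stand-in for an embedding model (assumption): bag-of-words counts.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    overlap = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return overlap / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(documents, query, k=1):
    # Stand-in for a vector database query: rank documents by similarity.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(embed(d), q), reverse=True)
    return ranked[:k]


def build_rag_prompt(documents, question, k=1):
    # RAG: put retrieved text in the prompt so the model can use
    # information it was not trained on.
    context = "\n".join(f"- {d}" for d in retrieve(documents, question, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

For example, `build_rag_prompt(docs, "How many tokens are in the Dolma dataset?")` would place the most similar entry of `docs` under `Context:` before the question. In both cases the model's weights stay untouched, which is the point the summary makes against fine-tuning by default.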