[TLDR] 오늘의 AI 뉴스, 2023-08-07: 알리바바의 오픈소스 AI 모델 💻, TPU 제조업체, 칩 회사 설립 💾, 제로-샷 이미지 분류 🖼️

9bow · 8월 8, 2023, 6:00오전

파이토치 한국 사용자 모임에서는 TLDR 뉴스레터의 승인을 받아 AI 소식을 DeepL로 번역하여 전합니다.

더 많은 AI 소식 및 정보를 공유하고 함께 성장하고 싶으신가요? 지금 파이토치 한국어 커뮤니티에 방문해주세요!

주요 뉴스 & 신규 출시 소식 / Headlines & Launches

TPU 제조업체, 칩 회사 설립 / TPU maker starts chip company (2 minute read)

컴퓨팅 병목현상에 대해 많은 이야기가 있었습니다. 이는 일반적으로 기업가들이 뛰어들어 해결책을 찾는다는 것을 의미합니다. 새로운 회사가 전용 트랜스포머 칩을 만들려고 시도하고 있습니다. 추론뿐만 아니라 학습에도 사용할 수 있기를 바랍니다. 그래도 팀은 최고 수준입니다!

There has been lots of talk about compute bottlenecks. This usually means entrepreneurs will dive in and find a solution. A new company is attempting to build a dedicated Transformer chip. I hope it works for training and not just inference. The team is top notch though!

Alibaba, 오픈소스 AI 모델 출시 / Alibaba Launches An Open-Source AI Model (1 minute read)

Alibaba는 타사 개발자에게 LLM 모델인 Tongyi Qianwen을 공개하고 있습니다.

Alibaba is opening up its LLM model Tongyi Qianwen to third-party developers.
alibaba

Google, ChatGPT 및 Bard와 유사한 AI로 어시스턴트를 '슈퍼차지'할 예정 / Google will ‘supercharge’ Assistant with AI that’s more like ChatGPT and Bard (3 minute read)

구글은 생성형 AI로 구동되는 기능으로 어시스턴트를 업데이트할 계획입니다. 이러한 변화의 일환으로 구글은 어시스턴트를 개발하던 팀의 일부를 해고했습니다.

Google is planning to update Assistant with features powered by generative AI. As part of this change, Google laid off parts of the team that worked on Assistant.

google

연구 & 혁신 관련 소식 / Research & Innovation

Functionary (GitHub Repo)

함수형은 함수/플러그인을 해석하고 실행할 수 있는 언어 모델입니다. 이 모델은 함수를 실행할 시기를 결정하고 그 출력을 이해할 수 있습니다. 필요할 때만 함수를 트리거합니다. 함수 정의는 OpenAI GPT 함수 호출과 유사하게 JSON 스키마 객체로 제공됩니다.

Functionary is a language model that can interpret and execute functions/plugins. The model determines when to execute a function and can understand its output. It only triggers functions as needed. Function definitions are given as JSON Schema Objects, similar to OpenAI GPT function calls.

Magic123 - 2D에서 3D 이미지(GitHub Repo)

연구 커뮤니티는 단일 샷 사진에서 3D 에셋을 생성하기 위해 열심히 노력해 왔습니다. 일반적으로 미묘한 차이가 있는 물체에는 실패하고 일반적으로 잘 작동하지 않습니다. 하지만 2D와 3D 선행 이미지를 사용하는 새로운 방법인 Magic123은 대표성이 크게 높아진 것으로 보입니다.

The research community has been hard at work trying to generate 3D assets from single shot photos. It usually falls over for slightly nuanced objects and doesn’t work well in general. However, Magic123, a new method that uses 2D and 3D priors, seems to have leapfrogged in representative power.

LLM을 사용한 추론 세그멘테이션 - LISA / Reasoning Segmentation with LLMS - LISA (GitHub Repo)

세그멘테이션은 이미지에서 객체에 속하는 모든 영역에 라벨을 붙이는 작업입니다. 기존의 분류보다 훨씬 어려운 문제입니다. 또한 미리 정의된 객체 목록 없이 일반적인 세그먼테이션을 수행하는 것도 어렵습니다. 일부 세계 표현을 포함하는 언어 모델은 세분화에 큰 도움이 될 수 있습니다. LISA는 복잡하고 종종 모호한 텍스트 쿼리가 주어지면 세분화 마스크를 반환할 수 있습니다. 이는 다소 BEIT와 Kosmos-2를 연상시킵니다

Segmentation is the process of labeling every region in an image that belongs to an object. It is a much harder problem than traditional classification. Also, it is challenging to do general segmentation without predefined lists of objects. Language models that contain some world representation can provide segmentation with a huge boost. LISA can return a segmentation mask given a complex and often vague text query. This is somewhat reminiscent of BEIT and Kosmos-2

lisa

엔지니어링 및 리소스 관련 소식 / Engineering & Resources

나는 당신이 Zoom에 무엇을 입력하는지 알고 있다 / I know what you’re typing over Zoom (25 minute read)

이 뉴스레터에서는 보안 관련 내용을 자주 강조하지 않지만, 이 내용은 그냥 지나치기에는 너무 좋은 내용입니다. 마이크를 음소거하지 않은 상태에서 입력하면 신경망이 93%의 정확도로 사용자가 입력하는 내용을 예측할 수 있습니다. 이는 이전의 어떤 방법보다 훨씬 더 나은 방법이며, 이 과대 포장된 기술의 흥미로운 활용입니다.

We don’t often highlight security work in this newsletter, but this is too good to pass up. If you type with your microphone unmuted, a neural network can predict what you are typing with 93% accuracy. This is significantly better than any previous method and a fascinating use of this hyped technology.

PerceptionCLIP: 제로샷 이미지 분류 향상 / Enhancing Zero-Shot Image Classification (GitHub Repo)

이 연구에서는 대표적인 비전 언어 모델인 CLIP의 기능을 더 잘 활용하기 위해 인간의 시각적 인식 과정을 모방한 이미지 분류 방법인 2단계 방법인 PerceptionCLIP을 소개합니다. 먼저 배경 속성을 식별하고 이를 사용하여 전경 객체를 구별함으로써 새로운 접근 방식은 이미지 분류 작업에서 일반화, 견고성 및 해석 가능성을 향상시킵니다.

This study introduces PerceptionCLIP, a two-step method for image classification that mimics the human visual perception process to better utilize the capabilities of CLIP, a prominent vision language model. By first identifying background attributes and using them to distinguish the foreground object, the new approach leads to improved generalization, robustness, and interpretability in image classification tasks.

Baby's CoThought: LLM 기법으로 아기 언어 모델 강화하기 / Boosting Baby Language Models with LLM Techniques (14 minute read)

이 백서의 저자들은 큰 언어 모델을 사용하여 작은 "아기" 모델의 학습을 개선하는 "CoThought"라는 방법을 개발했습니다. 이들은 GPT-3.5-turbo로 데이터셋을 재작업하고 RoBERTa와 유사한 방식으로 더 작은 모델을 학습하여 언어 테스트에서 더 나은 성능을 발휘하는 모델을 만들었습니다.

The authors of this paper have developed a method called "CoThought" that uses large language models to improve training for smaller "baby" models. They produced a model that performed better in language tests by reworking a dataset with GPT-3.5-turbo and training the smaller model in a RoBERTa-like manner.

그 외 소식 / Miscellaneous

전문가 모델 혼합(MoE)이 무엇인가요? / What are Mixtures of Expert models? (12 minute read)

현재 사용 중인 언어 모델에는 두 가지 클래스가 있습니다: 밀집형과 스파스형. 밀도 모델은 모든 토큰이 모든 모델 매개변수를 사용하는 기존의 2017년형 트랜스포머와 같습니다. 스파스 모델은 그 직후에 도입되었으며 라우팅 메커니즘(종종 학습)을 사용하므로 각 토큰은 모델 매개변수의 하위 집합만 사용합니다. 이는 더 효율적이며 실제로 더 강력한 모델을 생성합니다.

There are two classes of language models currently in use: Dense and Sparse. Dense models are like the traditional 2017 Transformer where every token uses every model parameter. Sparse models were introduced shortly thereafter and use a routing mechanism (often learned), which means each token only uses a subset of the model parameters. This is more efficient and actually produces stronger models.

mixture-of-experts

AI와 추론의 구조 / AI And The Structure Of Reasoning (15 minute read)

생성형 AI는 놀랍지만 추론 유형에 대한 근본적인 개념적 한계로 인해 아직 인간 지능 수준에는 미치지 못합니다. 이는 현재의 AI 기능뿐만 아니라 진정한 AGI를 만들기 위해 필요한 것에도 중요합니다.

Although generative AI is amazing, it’s not at the level of human intelligence yet due to fundamental conceptual limits on types of reasoning. This is important not just for current AI capabilities, but also for what is needed to create a true AGI.

더 읽어보기 / Quick Links

AI 발전이 느려질 것 같지 않은 이유 / Why AI Progress Is Unlikely To Slow Down (6 minute read)

AI의 발전이 조만간 둔화될 것 같지 않은 이유를 보여주는 4가지 차트.

4 charts that show why AI progress is unlikely to slow down anytime soon.

팀 쿡, 애플은 '수년간' 생성형 AI에 투자해 왔다고 발언 / Tim Cook Says Apple Has Been Investing In Generative AI For ‘Years’ (2 minute read)

로이터와의 인터뷰에서 팀 쿡 애플 CEO는 애플이 수년 동안 생성형 AI를 포함한 광범위한 AI 기술에 대한 연구를 해왔다고 말했습니다.

In an interview with Reuters, Apple CEO Tim Cook said that Apple has been doing research across a wide range of AI technologies, including generative AI, for years.

SlideSpeak (Product)

슬라이드스피크는 AI를 사용하여 파워포인트 프레젠테이션에서 요약을 생성하고, 질문에 답하고, 실행 항목을 가져옵니다.

SlideSpeak uses AI to generate summaries, answer questions, and get action items out of any PowerPoint presentation.