[TLDR] Today's AI News @ 2023-05-08

9bow · May 9, 2023, 12:34 AM

With permission from the TLDR newsletter, the PyTorch Korea User Group shares its AI news here alongside Korean translations produced with DeepL.

Headlines & Launches

위대한 개츠비를 컨텍스트에 담기 / Fitting the Great Gatsby in context (2 minute read)

언어 모델은 텍스트 쿼리를 입력으로 받아 텍스트 응답을 출력합니다. 쿼리와 출력의 길이를 모두 컨텍스트라고 합니다. 언어 모델에는 문맥 길이가 제한되어 있습니다(ChatGPT는 8k '단어'). 최근 몇 가지 멋진 알고리즘 변경을 통해 MosaicML은 65,000개의 '단어'로 작동할 수 있는 모델을 출시했습니다. 이는 소설 위대한 개츠비 전체를 문맥에 맞추고 모델이 에필로그를 작성하기에 충분한 양입니다.

Language models take text queries as input and output text responses. The combined length of the query and the output is referred to as the context. Language models have a limited context length (ChatGPT's is 8k 'words'). With some cool recent algorithmic changes, MosaicML has released a model that can operate on 65k 'words'. This is enough to fit the entire novel The Great Gatsby in context and have the model write an epilogue.
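As a rough illustration of what "fitting in context" means, the sketch below checks whether a query plus a reserved output budget stays within a model's context limit. The whitespace word count is only a crude stand-in for a real tokenizer, and the function names, the 47,000-word sample, and the use of the 8k and 65k figures quoted above as limits are illustrative assumptions, not any vendor's API.

```python
def count_words(text: str) -> int:
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


def fits_in_context(query: str, max_output_words: int, context_limit: int) -> bool:
    """Return True if the query plus the reserved output budget fits the limit."""
    return count_words(query) + max_output_words <= context_limit


# Hypothetical example: a 47,000-word text plus a 1,000-word epilogue budget.
novel = "word " * 47_000
print(fits_in_context(novel, 1_000, 8_000))   # False: 48,000 exceeds an 8k limit
print(fits_in_context(novel, 1_000, 65_000))  # True: 48,000 fits within a 65k limit
```

Real models count tokens rather than words, so an actual check would replace `count_words` with the tokenizer of the model in question.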

OpenAI의 규제 문제는 이제 시작일 뿐 / OpenAI’s Regulatory Troubles Are Just Beginning (10 minute read)

더버지(The Verge)는 ChatGPT에 대한 유럽연합의 일반 데이터 보호 규정(GDPR)을 준수하는 데 있어 OpenAI가 직면한 문제에 대해 논의합니다. 최근 이탈리아 데이터 보호 당국이 GDPR에 따라 AI가 생성한 콘텐츠의 사용에 대해 의문을 제기하면서, AI 기업이 EU 시장에서 어떻게 나아가야 하는지에 대한 우려가 커지고 있습니다. 이 글에서는 EU 내에서 사업을 운영하는 기업에게 GDPR 준수가 점점 더 중요해짐에 따라 AI의 혁신과 사용자 개인정보 및 데이터 보호 사이의 균형을 맞추는 것이 중요하다는 점을 강조합니다.

The Verge discusses the challenges OpenAI faces in complying with the European Union's General Data Protection Regulation (GDPR) for ChatGPT. The Italian Data Protection Authority recently questioned the use of AI-generated content under GDPR, leading to concerns about how AI companies should proceed in the EU market. The article highlights the importance of balancing innovation in AI with ensuring user privacy and data protection, as GDPR compliance becomes increasingly crucial for companies operating within the EU.

이 회사에서 AI를 도입한 후 (인간) 근로자들에게 일어난 일 / This Company Adopted AI. Here’s What Happened To Its Human Workers (7 minute read)

NPR의 이 기사에서는 AI를 도입한 한 회사가 인간 근로자에게 어떤 영향을 미쳤는지에 대해 설명합니다. AI를 도입한 결과 효율성이 향상되고 일부 직무 역할이 변경되었지만 실직으로 이어지지는 않았습니다. 대신 직원들은 AI와 함께 일할 수 있도록 숙련도를 높였습니다.

NPR's article discusses a company that adopted AI and how it impacted human workers. The implementation led to increased efficiency and changed some job roles, but it didn't result in job loss. Instead, employees were upskilled to work alongside AI.

Research & Innovation

HiPool: 그래프 신경망을 사용한 긴 문서 모델링 / HiPool: Modeling Long Documents Using Graph Neural Networks (13 minute read)

긴 텍스트 덩어리를 인코딩하는 것은 자연어 처리(NLP) 모델에 있어 어려운 작업입니다. 이 연구는 그래프와 새로운 어텐션(attention) 메커니즘을 사용하여 프로세스를 개선함으로써 모델이 문장 간의 관계를 이해하고 긴 시퀀스에서 더 나은 성능을 발휘할 수 있도록 도와줍니다.

Encoding long chunks of text is a tough task for natural language processing (NLP) models. This study improves the process by using graphs and a new attention mechanism, which helps the model understand relationships between sentences and perform better on longer sequences.

인공지능의 상식 이해도 향상시키기 / Improving AI Commonsense Understanding (30 minute read)

오늘날의 언어 모델은 상식적인 지식에 관해서는 여전히 어리석은 실수를 할 수 있습니다. VERA는 상식을 기반으로 진술의 타당성을 추정하는 새로운 모델로, 잘못된 정보를 걸러내고 실제 환경에서 성능을 개선하는 데 도움을 줍니다.

Today's language models can still make silly mistakes when it comes to commonsense knowledge. VERA is a new model that estimates the plausibility of statements based on commonsense, helping to filter out incorrect information and improve performance in real-world settings.

Otter: 컨텍스트 내 명령어 튜닝이 가능한 멀티모달 모델 / Otter: A Multi-Modal Model with In-Context Instruction Tuning (GitHub Repo)

이 연구에서는 Otter와 같은 모델이 이미지와 텍스트를 포함한 다양한 상황에서 지시를 이해하고 따르는 방식을 개선하기 위한 새로운 방법인 MIMIC-IT을 소개합니다. 이러한 모델에 대한 접근성을 높임으로써 연구자들은 더 나은 AI 시스템을 더 손쉽게 만들 수 있습니다.

This study introduces a new method, called MIMIC-IT, for improving how models like Otter understand and follow instructions in different situations, including images and text. Making these models more accessible lets researchers use them more easily to create better AI systems.

낮은 테스트 커버리지와 느린 QA 주기는 이제 안녕 (스폰서 링크) / Goodbye low test coverage and slow QA cycles (Sponsor)

인하우스 QA는 확장하는 데 몇 년이 걸릴 수 있습니다. QA Wolf를 사용하면 4개월 만에 80%의 자동화된 테스트 커버리지를 달성하고 이를 유지할 수 있습니다. CI/CD에 직접 통합되고 공급업체에 종속되지 않고 Playwright로 작성된 완전 자동화된 테스트가 자동으로 수행됩니다. 90일 파일럿으로 시작하세요.

In-house QA can take years to scale. QA Wolf gets you to 80% automated test coverage in 4 months and keeps you there. It’s zero-effort, fully automated testing that’s done for you — integrated directly into your CI/CD, and written in Playwright with no vendor lock-in whatsoever. Start with a 90-day pilot.

Engineering & Resources

인공지능 채팅 어시스턴트로 분쟁적인 주제에 대한 대화를 개선하는 방법 / AI chat assistants can improve conversations about divisive topics (23 minute read)

우리는 다소 분열된 세상에 살고 있어, 서로에 대한 이해가 증진되는 의미 있는 대화를 나누기가 어렵습니다. 흥미롭게도 어려운 주제들에 대해서 AI 챗봇이 대화를 중재하면 양쪽 모두 결과와 이해도가 향상되었다고 보고합니다.

We live in a somewhat divisive world, and it's hard to have meaningful conversations where understanding is fostered. Interestingly, when an AI chatbot mediates conversations about difficult topics, both parties report improved outcomes and understanding.

서로 다른 목적으로 훈련된 모델을 병합해주는 ZipIT / ZipIT, merging models trained on different tasks (28 minute read)

서로 다른 작업을 위해 학습된 동일한 아키텍처의 두 모델이 있다고 가정해 봅시다. 이 두 모델을 두 작업 모두에서 잘 작동하는 단일 모델로 병합할 수 있을까요? rebasin이나 model soup과 같은 현재의 방법은 작업별 특징이 서로 다르기 때문에 잘 작동하지 않습니다. 이 새로운 연구는 이전 알고리즘보다 최대 60% 개선된 2단계 프로세스를 제안합니다.

Let's say we have two different models of the same architecture trained on different tasks. Is it possible to merge them into a single model that works well on both tasks? Current methods like rebasin or model soup don't work because the models have different task-specific features. This new work proposes a way forward with a two-step process that improves on previous algorithms by as much as 60%.
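For context, the simplest kind of merging that this summary contrasts against is plain weight averaging, in the spirit of model soups. The sketch below interpolates two parameter dictionaries of identical shape. It is not the ZipIT method, and the dictionary layout, the toy values, and the `average_weights` name are assumptions made purely for illustration.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def average_weights(a: Weights, b: Weights, alpha: float = 0.5) -> Weights:
    """Interpolate two models' parameters: alpha * a + (1 - alpha) * b."""
    if a.keys() != b.keys():
        raise ValueError("models must share the same parameter names")
    merged: Weights = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [alpha * x + (1 - alpha) * y for x, y in zip(a[name], b[name])]
    return merged


# Toy parameters for two hypothetical models with the same architecture.
model_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
model_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
print(average_weights(model_a, model_b))
```

Averaging like this assumes that matching parameters in the two models encode comparable features, which is the assumption the summary says breaks down when the models were trained on different tasks.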

Shap-E로 텍스트를 3D로 변환하기 / Text to 3D with shap-e (18 minute read)

OpenAI는 몇 번의 릴리즈를 통해 텍스트를 3D로 변환하는 연구를 조용히 진행해 왔습니다. 이 새로운 연구는 그 연구 흐름에 더해진 흥미로운 결과물입니다. 표준적인 객체나 몇 가지 구성 요소가 결합된 객체에서 상당히 잘 작동합니다. 이 모델은 2D 이미지에서 3D 객체를 만들 수도 있지만, 그다지 잘 작동하지는 않는 것 같습니다.

OpenAI has quietly been working on text-to-3D generation across a few releases. This new work is a fun addition to that line of research. It works fairly well on standard objects or objects with a few compositional ideas. The model can also create 3D objects from 2D images, although that doesn't seem to work quite as well.

Miscellaneous

생물학 분야의 파운데이션 모델을 향하여 / Towards a foundation model for biology (16 minute read)

사전 훈련된 대규모 모델은 텍스트에서 흥미롭지만, 생물학에서 단일 세포 작업을 위한 모델을 구축할 수 있을까요? scGPT는 다양한 생물학 관련 작업에 탁월한 트랜스포머입니다. 현재 공개되어 있으며 가중치를 다운로드할 수 있습니다.

Large pretrained models are exciting in text, but can we build one for single-cell tasks in biology? scGPT is a transformer that excels at a range of biology-related tasks. It's open, and you can download the weights.

ChatGPT를 떠받치는 시급 $15의 계약직들 / ChatGPT Is Powered By These Contractors Making $15/Hour (6 minute read)

NBC 뉴스 기사에서는 OpenAI의 ChatGPT를 뒷받침하는 그림자 인력의 역할에 대해 자세히 설명합니다. 이 인력은 AI 시스템을 개선하고 유지보수하는 일을 담당하는 계약직으로 구성됩니다. 이 기사에서는 계약직 근로자들이 AI 모델을 학습시키기 위해 잠재적인 결과물을 검토하고 평가하는 등 ChatGPT와 같은 AI 모델의 개발과 기능에 수반되는 인간의 노력을 조명하고 있습니다. 이러한 계약직은 개발 프로세스의 중요한 부분임에도 불구하고 종종 눈에 띄지 않는 곳에 머물러 있으며 정규직에 비해 인정을 덜 받습니다.

The NBC News article delves into the role of the shadow workforce behind OpenAI's ChatGPT. This workforce consists of contractors responsible for refining and maintaining the AI system. The article sheds light on the human effort involved in the development and functioning of AI models like ChatGPT, as contractors review and rate potential outputs to train the AI model. Despite being a significant part of the development process, these contractors often remain in the background and are less acknowledged in comparison to full-time employees.

AI의 가장 큰 리스크는 AI를 통제하는 기업 / AI’s Biggest Risk Is The Corporations That Control Them (5 minute read)

Fast Company의 기사에는 AI Now Institute의 설립자이자 선도적인 AI 연구자인 메러디스 휘태커(Meredith Whittaker)와의 인터뷰가 실려 있습니다. 휘태커는 AI의 진정한 위험은 의식이나 초지능이 아니라 이러한 기술을 통제하는 기업에서 비롯된다고 강조합니다. 그녀는 AI가 사회 및 경제 시스템에 깊이 내재되어 있으며, 따라서 AI를 개발하고 배포하는 기업의 동기와 우선순위에 따라 그 영향이 크게 결정된다고 주장합니다. 이 글은 잠재적 위험을 이해하고 완화하기 위해 AI가 작동하는 권력 역학 관계와 사회 정치적 맥락을 고려하는 것이 중요하다는 점을 강조합니다.

Fast Company's article features an interview with Meredith Whittaker, a leading AI researcher and founder of the AI Now Institute. Whittaker emphasizes that the real risk in AI doesn't come from consciousness or superintelligence, but rather from the corporations that control these technologies. She argues that AI is deeply embedded in societal and economic systems, and thus, its impact is largely determined by the motivations and priorities of corporations that develop and deploy AI. The article highlights the importance of considering the power dynamics and socio-political context in which AI operates in order to understand and mitigate potential risks.

Quick Links

회사 내에서 ChatGPT를 어떻게 사용하고 있나요? (Hacker News 스레드) / How Are You Using ChatGPT Internally At Your Company? (HN Thread)

코딩 도우미부터 해외 고객을 위한 이메일 및 문서 번역에 이르기까지 다양한 답변과 함께 사람들이 직장에서 ChatGPT를 어떻게 사용하고 있는지에 대한 해커 뉴스 스레드입니다.

A Hacker News thread diving into how people are using ChatGPT at work, with answers ranging from coding assistance to translating emails and documents for international customers.

GPT4의 32k 모델이 곧 출시될 것 같습니다 (OpenAI 포럼) / It Looks Like GPT-4-32k Is Rolling Out (Forum Thread)

GPT-4-32k가 곧 출시될 것으로 보입니다.

GPT-4-32k appears to be on the way soon.

Kadoa (Product Launch)

Kadoa는 AI를 사용하여 웹 데이터를 탐색, 추출, 변환합니다. 웹 스크레이퍼를 만들고 유지 관리하는 데 드는 시간을 절약하세요. Kadoa로 필요한 데이터를 손쉽게 추출하세요.

Kadoa uses AI to explore, extract, and transform web data. Save hours of time creating and maintaining web scrapers. Extract the data you need effortlessly with Kadoa.

mRNA 백신 설계에 능숙한 AI / AI Is Very Good At Designing mRNA Vaccines (3 minute read)

mRNA 백신에서 발견되는 유전자 서열을 최적화하는 AI 도구는 전 세계에 배포할 수 있는 더 강력한 효능과 안정성을 갖춘 백신을 만드는 데 도움이 될 수 있습니다.

An AI tool that optimizes the gene sequences found in mRNA vaccines could help to create jabs with greater potency and stability that could be deployed across the globe.

Google, 더 개인화된 검색을 위한 AI 추가 중 / Google Is Adding AI To Make Search More Personable (3 minute read)

Google은 수십 년 동안 지배적인 검색 엔진으로 자리 잡은 웹 사이트 결과 목록에서 벗어나 짧은 동영상 및 소셜 미디어 게시물과 함께 인공 지능과의 대화를 통합하여 검색 결과를 표시하는 방식을 바꾸고 있습니다.

Google is shifting the way it presents search results to incorporate conversations with artificial intelligence, along with more short video and social-media posts, a departure from the list of website results that has made it the dominant search engine for decades.