[TLDR] AI News of the Day, 2023-07-04: Insights from OpenAI's Global Conversations 💭, AI and the Automation of Work 💼, Detect Any Deepfakes 🫥

9bow · July 5, 2023, 12:48 AM

With permission from the TLDR newsletter, the PyTorch Korea User Group translates its AI news with DeepL and shares it here.

Want to share more AI news and information and grow together? Visit the PyTorch Korea community today!

Headlines & Launches

Insights From Global Conversations (5 minute read)

OpenAI discusses the insights gained from its global conversations initiative, which aimed to gather public input on AI-related topics.

RunwayML raises $141m series C extension (1 minute read)

This short announcement shows that RunwayML remains dedicated to building creative tools. Much of the new funding will go to internal R&D.

Research & Innovation

Creating 3D Shapes from 2D Images and Texts (4 minute read)

This project introduces a method for generating 3D shapes from 2D images or text that reduces the inconsistencies caused by the gap between 2D and 3D representations. It uses a two-model framework, a Shape-Image-Text-Aligned Variational Auto-Encoder (SITA-VAE) and an Aligned Shape Latent Diffusion Model (ASLDM), to map 3D shapes into a space aligned with images and text.
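Aligning shapes with images and text in one space is typically done with a CLIP-style contrastive (InfoNCE) loss, which pulls each shape embedding toward its paired image or text embedding and away from the other items in the batch. The sketch below is our own illustration under stated assumptions: the embeddings are toy vectors, the name `info_nce` and the temperature value are hypothetical, and it is not the SITA-VAE training code. CLIP-style training usually averages both matching directions; only one direction is shown for brevity.

```python
import math

def normalize(v):
    # Scale a vector to unit L2 norm so dot products become cosine similarities.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def info_nce(shape_embs, other_embs, temperature=0.07):
    """Average cross-entropy of matching shape i to its paired embedding i.

    shape_embs[i] and other_embs[i] are assumed to describe the same object;
    every other pair in the batch acts as a negative example.
    """
    shapes = [normalize(v) for v in shape_embs]
    others = [normalize(v) for v in other_embs]
    total = 0.0
    for i, s in enumerate(shapes):
        # Similarity logits between shape i and every candidate in the batch.
        logits = [sum(a * b for a, b in zip(s, o)) / temperature for o in others]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the correct (diagonal) pairing.
        total += log_sum_exp - logits[i]
    return total / len(shapes)
```

Correctly paired embeddings give a loss near zero, while shuffled pairs give a large loss; minimizing it is what drives the shared space toward alignment.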

GraMMaR: A New Method for More Realistic 3D Movement Analysis (GitHub Repo)

The research introduces a new method that improves the understanding of 3D movement in relation to the environment, producing more realistic poses than existing techniques, which can yield physically implausible results.
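To make "implausible results" concrete: a common failure of 3D motion estimates is a body that floats above the ground or sinks into it. The sketch below measures each joint's signed distance to a ground plane and flags such frames. The plane representation, the tolerance, the joint format, and the function names are hypothetical choices for illustration; this is not GraMMaR's model, only the kind of body-environment relationship the summary refers to.

```python
import math

def signed_distance(point, normal, offset):
    """Signed distance from a 3D point to the plane dot(normal, x) = offset.

    Positive means above the ground, negative means below it.
    """
    length = math.sqrt(sum(n * n for n in normal))
    return (sum(p * n for p, n in zip(point, normal)) - offset) / length

def check_frame(joints, normal=(0.0, 1.0, 0.0), offset=0.0, tol=0.02):
    """Classify one pose frame as 'penetrating', 'floating', or 'ok'.

    joints: list of (x, y, z) joint positions in meters (hypothetical format).
    tol: allowed slack in meters before a frame is flagged (assumed value).
    """
    dists = [signed_distance(j, normal, offset) for j in joints]
    lowest = min(dists)
    if lowest < -tol:
        return "penetrating"  # at least one joint is below the ground
    if lowest > tol:
        return "floating"  # no joint is anywhere near the ground
    return "ok"
```

A real system has to allow for jumps and other legitimate airborne frames, which is why a fixed rule like this is only a heuristic illustration.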

Seamless Video Generation with Text (GitHub Repo)

This paper presents a new method for generating consistent, high-quality videos from text. It first creates key frames with an adapted image model, then propagates their style across the rest of the video using dedicated matching and blending techniques, so the result looks good and flows smoothly.
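The key-frame-then-propagate pipeline can be illustrated with a deliberately simplified stand-in: pick every k-th frame as a key frame, stylize only those, and fill in each remaining frame by linearly blending its two nearest stylized key frames. Frames are plain lists of numbers here, `stylize` is a hypothetical placeholder for the adapted image model, and linear interpolation is our own simplification; the paper's matching and blending are considerably more sophisticated.

```python
def key_frame_indices(num_frames, k):
    # Every k-th frame is a key frame; always include the last frame
    # so every in-between frame has a key frame on both sides.
    if num_frames == 0:
        return []
    idx = list(range(0, num_frames, k))
    if idx[-1] != num_frames - 1:
        idx.append(num_frames - 1)
    return idx

def propagate(frames, stylize, k=4):
    """Stylize key frames only, then linearly blend them into the other frames."""
    keys = key_frame_indices(len(frames), k)
    styled = {i: stylize(frames[i]) for i in keys}
    out = []
    for t in range(len(frames)):
        if t in styled:
            out.append(styled[t])
            continue
        # Nearest key frames before and after t.
        left = max(i for i in keys if i < t)
        right = min(i for i in keys if i > t)
        w = (t - left) / (right - left)  # 0 at the left key, 1 at the right key
        out.append([(1 - w) * a + w * b for a, b in zip(styled[left], styled[right])])
    return out
```

Calling the expensive image model only on key frames is what keeps this kind of pipeline affordable; the quality then depends entirely on how well the propagation step respects motion, which plain blending does not.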

Engineering & Resources

Classifier-Free Guidance for Language Models (21 minute read)

If you're looking for interesting trends in AI today, new ways of decoding tokens are an active area of research. Currently, a model spends the same amount of compute on every token, and sampling methods are fairly simple. If sampling and guidance can be improved, performance might improve dramatically without retraining. This paper suggests that classifier-free guidance, first explored for text-to-image diffusion models, can on its own double a model's effective size.
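Classifier-free guidance for a language model can be sketched in a few lines: run the model twice per step, once with the prompt c and once without it, then push the next-token log-probabilities away from the unconditional ones by a guidance strength γ, following log p̂(w|c) = log p(w) + γ(log p(w|c) − log p(w)). Setting γ = 1 recovers ordinary conditional decoding. The toy logits stand in for real model outputs, and the function names and γ = 1.5 are illustrative assumptions.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of logits.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def cfg_scores(cond_logits, uncond_logits, gamma=1.5):
    """Guided next-token scores: log p(w) + gamma * (log p(w|c) - log p(w)).

    cond_logits: logits computed with the prompt/context c.
    uncond_logits: logits without it (assumed to come from a second forward pass).
    """
    cond = log_softmax(cond_logits)
    uncond = log_softmax(uncond_logits)
    return [u + gamma * (c - u) for c, u in zip(cond, uncond)]

def greedy_pick(scores):
    # Index of the highest-scoring token.
    return max(range(len(scores)), key=lambda i: scores[i])

# Token 0 is likely with or without the prompt; token 1 becomes likely
# mainly because of the prompt.
cond = [2.0, 1.9, 0.0]
uncond = [2.0, 0.5, 0.0]
plain = greedy_pick(cfg_scores(cond, uncond, gamma=1.0))   # ordinary decoding
guided = greedy_pick(cfg_scores(cond, uncond, gamma=1.5))  # guided decoding
```

Here plain decoding picks token 0, while guidance amplifies what the prompt adds and picks token 1 instead. The cost is a second forward pass per generated token.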

A New Framework to Detect Any Deepfakes (21 minute read)

This research presents Detect Any Deepfakes (DADF), a new system for detecting manipulated facial images (deepfakes) and precisely localizing the manipulated regions. Built on the Segment Anything Model (SAM), it adds a multiscale adapter and Reconstruction Guided Attention (RGA) to improve deepfake detection performance.

High-resolution image reconstruction with latent diffusion models from human brain activity (12 minute read)

Researchers have proposed a new method that uses a latent diffusion model (LDM), specifically Stable Diffusion, to reconstruct high-resolution, high-fidelity images from human brain activity captured via fMRI. The method keeps computational cost low while maintaining generative performance, and it does not require training or fine-tuning complex deep learning models. The study also offers a neuroscientific interpretation of the individual LDM components, a promising new approach to understanding how the brain represents visual experiences.
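Part of why no deep network has to be trained is that, as we understand the paper, fMRI signals are mapped to Stable Diffusion's latent inputs with simple linear (ridge) regression, while the diffusion model itself stays frozen. The sketch below fits such a closed-form ridge map, W = (XᵀX + λI)⁻¹XᵀY, on toy data. The sample sizes, latent size, λ, and function names are illustrative assumptions; real use would involve thousands of voxels and a numerical library rather than pure Python.

```python
def solve(a, b):
    """Solve a @ x = b with Gauss-Jordan elimination (a is square, non-singular).

    a: n x n list of lists; b: n x m list of lists. Returns n x m.
    """
    n = len(a)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_ridge(x, y, lam=1.0):
    """Closed-form ridge regression W = (X^T X + lam*I)^-1 X^T Y.

    x: samples x voxels (toy fMRI responses); y: samples x latent dims.
    lam > 0 keeps X^T X + lam*I invertible.
    """
    d = len(x[0])
    xt = list(zip(*x))  # transpose: voxels x samples
    yt = list(zip(*y))  # transpose: latent dims x samples
    xtx = [[sum(a * b for a, b in zip(xt[i], xt[j])) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [[sum(a * b for a, b in zip(xt[i], yt[k])) for k in range(len(yt))]
           for i in range(d)]
    return solve(xtx, xty)

def predict(w, x):
    # Map each fMRI sample to a predicted latent vector.
    return [[sum(xi * w[i][k] for i, xi in enumerate(row)) for k in range(len(w[0]))]
            for row in x]
```

In the pipeline the summary describes, the predicted latents would then be handed to the frozen diffusion model for decoding; only the cheap linear maps are fitted.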

Miscellaneous

Podcast with the author of MEGABYTE (90 min listen)

Tokenization and decoding are two weak points of current language models. Tokenization in particular is a challenge because it hinders the move to truly multimodal solutions. Meta's recent MEGABYTE work on modeling million-byte sequences shows a way forward: it eliminates the need for tokenization and lets a single model operate on any kind of input data.
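The tokenizer-free idea starts with something simple: treat any input as raw bytes and group them into fixed-size patches, which MEGABYTE's larger global model processes while a smaller local model predicts the bytes inside each patch. The sketch below covers only the patching step. The patch size, the padding ID, and the name `to_patches` are illustrative assumptions, not Meta's code.

```python
def to_patches(data, patch_size=4, pad=256):
    """Split raw bytes into fixed-size patches, padding the last one.

    Works the same for UTF-8 text, image bytes, or audio bytes: the input
    vocabulary is just the 256 byte values. pad=256 lies outside 0-255 so
    padding can never be confused with a real byte (assumed convention).
    """
    if isinstance(data, str):
        data = data.encode("utf-8")
    values = list(data)
    remainder = len(values) % patch_size
    if remainder:
        values += [pad] * (patch_size - remainder)
    return [values[i:i + patch_size] for i in range(0, len(values), patch_size)]

patches = to_patches("hello")
# "hello" is 5 bytes, so it becomes two patches of 4, the second one padded.
```

Non-Latin text needs no special handling either: "안녕" encodes to 6 UTF-8 bytes and simply becomes two patches.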

AI And The Automation Of Work (12 minute read)

An exploration of how AI will affect work, and specifically whether it will automate away the need for humans.

OpenAI’s Code Interpreter Is About to Remake Finance (9 minute read)

This article details a breakthrough in AI automation of accounting minutiae brought by Code Interpreter, a ChatGPT plug-in, and how it could change the way financial knowledge work gets done.

Quick Links

Google Says It’ll Scrape Everything You Post Online For AI (2 minute read)

Google says it will scrape and index all publicly available online content, including forum posts and social media, to train its AI models. The move raises privacy concerns and the prospect of Google gaining access to vast amounts of personal information.

Are People In Tech In An AI Echo Chamber? (Hacker News Thread)

A Hacker News discussion of whether people in tech are overstating the power of AI.

Tech Workers Are Scrambling To Become AI Experts (4 minute read)

As AI continues to grow in importance, tech workers are scrambling to learn new skills to stay relevant in the job market. Many are turning to boot camps, online courses, and self-study to gain the AI skills that employers increasingly seek.

Wondercraft AI (Product Launch)

Wondercraft is a podcast builder that uses AI voices to let anyone go from idea to published podcast in minutes. For example, you can easily repurpose existing content (newsletters, blogs, interviews, or recordings) into engaging podcasts.