[TLDR] Today's AI News, 2023-11-17: DeepMind and YouTube Music Synthesis 🎵, Meta Video Editing Models 🎬, Microsoft Deepfake Creator 😶‍🌫️

9bow · November 20, 2023, 11:00 PM

The PyTorch Korea User Group shares AI news from the TLDR newsletter, translated with DeepL, with TLDR's permission.
Want to share more AI news and information and grow together? Visit the PyTorch Korea community today!

Headlines & Launches

DeepMind and YouTube Partner on Music Synthesis (12 minute read)

DeepMind has been working on music synthesis for several years and has just announced a powerful new system. Interestingly, much of the improvement came from a data partnership with revenue sharing: the model was trained on artists' music to perform better, while the artists were compensated. The model will be available in several forms, one of which is via YouTube Shorts Studio.

(Read more: [GN] Google DeepMind unveils Lyria, an AI music generation model)

Meta Announces Video Editing and Creation Models Emu Video and Emu Edit (6 minute read)

When you generate an image with a generative model, the output often isn't quite what you were looking for, yet editing that image with the same model is extremely challenging. Meta's key insight was that treating every generation task as an instruction allows editing capabilities to emerge. Combined with a simpler model architecture, this is a solid step forward.

(Read more: [2023/11/13 ~ 11/19] Top ML Papers of the Week)

Microsoft Launches A Deepfake Creator (2 minute read)

At Microsoft Ignite 2023, Microsoft launched the Azure AI Speech text-to-speech avatar, which lets users create photorealistic avatars that speak scripted text in various languages using text-to-speech technology.

(Read more: [GN] Summary of products announced at the Microsoft AI Ignite event (English/YouTube))

Research & Innovation

Flipped-VQA: Improving Video Question Answering with a New Method (16 minute read)

Researchers found that large language models sometimes make errors in video question answering (VideoQA) because they rely too heavily on the language and ignore the actual video content. To address this, they introduced Flipped-VQA, an approach that helps these models better understand the relationships among videos, questions, and answers, leading to more accurate results.
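To our understanding of the Flipped-VQA paper, the LLM is trained not only to predict the answer from the video and question (VQ->A) but also on "flipped" tasks: predicting the question from the video and answer (VA->Q), and the video from the question and answer (QA->V). The minimal sketch below only shows how one (video, question, answer) triplet could be turned into three training examples. The helper `flipped_examples` and the `"<video>"` placeholder are hypothetical; a real system would feed visual features to the model and combine the losses of the three objectives.

```python
from typing import List, NamedTuple, Tuple


class Example(NamedTuple):
    task: str                  # which element of the triplet is predicted
    context: Tuple[str, str]   # the two elements given to the model
    target: str                # the element the model must generate


def flipped_examples(video: str, question: str, answer: str) -> List[Example]:
    """Build one training example per prediction direction (hypothetical helper).

    `video` is a placeholder for the visual tokens a real system would use.
    """
    return [
        Example("VQ->A", (video, question), answer),   # standard VideoQA
        Example("VA->Q", (video, answer), question),   # flipped: recover the question
        Example("QA->V", (question, answer), video),   # flipped: recover the video
    ]


examples = flipped_examples("<video>", "Why did the man fall?", "He slipped on ice.")
for ex in examples:
    print(ex.task, "->", ex.target)
```

Because the flipped tasks force the model to explain the question and the video from the answer, it presumably cannot succeed by leaning on language priors alone, which is the failure mode described above.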

SCB-ST-Dataset4: A Dataset to Understand Student Behavior (6 minute read)

Researchers have expanded SCB-ST-Dataset4, a dataset that captures classroom activities such as hand-raising, reading, and writing, to help deep learning models better understand and detect students' classroom behavior.

SentAlign: Sentence Alignment for Large Documents (11 minute read)

SentAlign is a new tool for aligning sentences in large parallel documents; it can efficiently handle documents containing thousands to tens of thousands of sentences.
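SentAlign's own scoring and search strategy are not described in this summary. As a generic illustration of the underlying task, here is a classic dynamic-programming sentence aligner (a swapped-in textbook technique, not SentAlign's algorithm) that allows 1-1 matches and 1-0/0-1 skips. The `jaccard` word-overlap similarity and the `skip_penalty` value are hypothetical stand-ins; a real cross-lingual aligner would score sentence pairs with something like multilingual sentence embeddings.

```python
from typing import Callable, List, Optional, Tuple


def jaccard(a: str, b: str) -> float:
    """Toy similarity: word overlap between two sentences (hypothetical stand-in)."""
    ta = {w.strip(".,!?").lower() for w in a.split()}
    tb = {w.strip(".,!?").lower() for w in b.split()}
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def align(
    src: List[str],
    tgt: List[str],
    sim: Callable[[str, str], float],
    skip_penalty: float = 0.3,
) -> List[Tuple[int, int]]:
    """Return 1-1 index pairs maximizing total similarity minus skip penalties."""
    n, m = len(src), len(tgt)
    # score[i][j]: best score for aligning src[:i] with tgt[:j]
    score = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    # back[i][j]: the (di, dj) step that produced score[i][j]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:  # 1-1 match
                candidates.append((score[i - 1][j - 1] + sim(src[i - 1], tgt[j - 1]), (1, 1)))
            if i > 0:  # source sentence left unaligned
                candidates.append((score[i - 1][j] - skip_penalty, (1, 0)))
            if j > 0:  # target sentence left unaligned
                candidates.append((score[i][j - 1] - skip_penalty, (0, 1)))
            score[i][j], back[i][j] = max(candidates, key=lambda c: c[0])
    # Follow the back-pointers from the end to recover the matched pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        if di == 1 and dj == 1:
            pairs.append((i - 1, j - 1))
        i, j = i - di, j - dj
    pairs.reverse()
    return pairs


src = ["The cat sat.", "It was 2023.", "Goodbye."]
tgt = ["The cat sat.", "Extra line.", "It was 2023.", "Goodbye."]
print(align(src, tgt, jaccard))  # [(0, 0), (1, 2), (2, 3)]
```

The score table grows as len(src) × len(tgt), which illustrates why efficiency matters once documents reach tens of thousands of sentences.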

Engineering & Resources

Accelerating Segment Anything with Pure PyTorch (11 minute read)

Segment Anything can be sped up 8x using torch.compile, sparsity, custom kernels written in Triton, and a number of other PyTorch performance features.
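One of the sparsity techniques used in the PyTorch Segment Anything work is semi-structured (2:4) sparsity: in every group of four contiguous weights, at most two are nonzero, a pattern that recent NVIDIA GPUs can accelerate. The pure-Python sketch below only illustrates a magnitude-based pruning rule for that pattern (keep the two largest |w| in each group). `prune_2_4` is a hypothetical helper; any real speedup comes from PyTorch's sparse GPU kernels, not from this code.

```python
from typing import List


def prune_2_4(row: List[float]) -> List[float]:
    """Zero the two smallest-magnitude weights in every group of four (hypothetical helper)."""
    if len(row) % 4:
        raise ValueError("row length must be a multiple of 4")
    out = list(row)
    for start in range(0, len(row), 4):
        group = range(start, start + 4)
        # Indices of the two largest-magnitude weights in this group survive.
        keep = set(sorted(group, key=lambda k: abs(row[k]), reverse=True)[:2])
        for k in group:
            if k not in keep:
                out[k] = 0.0
    return out


weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
print(prune_2_4(weights))  # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Magnitude pruning is only one way to choose which weights to keep; in practice accuracy is usually checked (and sometimes recovered by fine-tuning) after the 2:4 mask is applied.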

Solve usage-based billing for your AI product (Sponsor)

Most companies are still figuring out how to charge for AI and LLM tools. Packages? Credits? Tokens? Whichever model you choose, Orb makes it as easy as possible to implement: just choose your pricing model and billable metric, and you're done! Track consumption, prevent fraud, and align pricing to value (including GPU runtime): Try the free sandbox

Deep Dive: Developer's Guide to Production LLMs (21 minute read)

Building tools with language models is an emerging engineering discipline that spans high-performance computing, GPU orchestration, and monitoring.

AI Exploits (GitHub Repo)

A collection of real-world AI/ML exploits for responsibly disclosed vulnerabilities.

Miscellaneous

Music ControlNet (8 minute read)

ControlNet introduced a new way to give fine-grained control over image synthesis models. There is now a somewhat analogous model for music generation that lets you control time-varying features such as melody, dynamics, and rhythm.

There's A Model For Democratizing AI (7 minute read)

OpenAI's call for proposals on implementing democratic processes in AI decision-making appears restrictive and seems to prefer handling sensitive political issues without taking responsibility for them, which could limit the scope and effectiveness of democracy in AI governance.

Copilot Is An Incumbent Business Model (2 minute read)

The Copilot AI business model makes existing workflows more efficient without creating new markets or disrupting the low end of existing ones. Its true disruptive potential lies in reimagining workflows, a harder challenge that could unlock significantly larger market opportunities.

Quick Links

GPT-4 Turbo Note Taker (Product)

Let AI automate your meeting notes using GPT-4 Turbo.

Google Is Embedding Inaudible Watermarks Right Into Its AI Generated Music (1 minute read)

Google DeepMind's Lyria AI model will use SynthID to watermark the audio tracks it generates, so they can be identified as AI-created even after compression or other modifications, without affecting the listening experience.

YouTube Will Show Labels On Videos That Use AI (1 minute read)

YouTube is implementing new guidelines that require creators to disclose the use of AI in their videos, covering both AI-altered and entirely synthetic content.