[TLDR] Today's AI News, 2023-11-13: Google in talks to invest in Character.AI 💰, FigJam AI 🖌️, adversarial attacks on language models 👊

9bow · November 14, 2023, 11:39 PM

With the approval of the TLDR newsletter, the PyTorch Korea User Group shares AI news translated with DeepL.
Want to share more AI news and information and grow together? Visit the PyTorch Korean community now!

Headlines & Launches

Google In Talks To Invest Hundreds Of Millions Into Character.AI (2 minute read)

Google is in talks to deepen its relationship with Character.AI by investing hundreds of millions of dollars.

Zapier AI Actions (2 minute read)

Zapier introduced AI Actions, a tool that lets developers run Zapier's 20,000+ automation actions from any AI platform. Users send a natural-language command to the AI platform, which then performs the requested action. The service supports several AI platforms, with simple setup and built-in API integrations.

Introducing AI to FigJam (3 minute read)

Figma has incorporated AI assistance into FigJam, its digital whiteboard tool, to simplify and enhance design collaboration. Utility-oriented enhancements, such as those derived from the AI-powered project Jambot, help users collaborate more effectively on a virtual canvas. Figma's goal is to serve a wider range of user needs by applying machine learning capabilities to visual design.

Research & Innovation

DG-SCT: Enhancing Audio-Visual Models with a New Attention (GitHub Repo)

This project introduces the Dual-Guided Spatial-Channel-Temporal (DG-SCT) attention mechanism, which enhances pre-trained audio-visual models for multi-modal tasks.

Deep Dive: Adversarial attacks on language models (24 minute read)

This blog post covers the different attacks that are emerging against language model systems. It describes the various attack types in detail, along with some mitigations that teams have found to be effective.

3DStyle-Diffusion: Stylization of 3D Meshes Using 2D Diffusion Models (16 minute read)

This research presents 3DStyle-Diffusion, a novel approach to detailed stylization of 3D meshes that integrates 2D diffusion models for added control over appearance and geometry. It first parameterizes a 3D mesh's texture into reflectance and lighting using implicit MLP networks, then uses a pre-trained 2D diffusion model to align the rendered images with the text prompt and ensure geometric consistency.

Engineering & Resources

Building trustworthy AI through systematic testing (Sponsor)

Rigorous unit testing for ML models increases trust in your AI/ML systems and enables deeper insights into model performance in a fraction of the time.
Discover how it's done in this upcoming webinar by Kolena. You'll learn how to design a model quality management process, standardize test coverage, and deal with robustness, fairness, and biases. RSVP to watch live or get the recording after the session.

AudioSR: Audio super resolution (GitHub Repo)

Audio super-resolution is the process of increasing the quality and fidelity of any audio, real or synthetic. Most super-resolution systems are task-specific, with a separate model trained for each type of audio data (e.g., speech or music). This new work is a notable step forward: a single model improves audio quality across tasks.

Tarsier: Toolkit for web agents (GitHub Repo)

With the advent of powerful new vision models, many groups are attempting to build agents that use vision to interact with web elements. The Tarsier toolkit introduces a standard set of tools (e.g., element tagging), so you can use any vision system to understand a web page and take action. It also includes utilities that let non-vision language models browse.
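
As a rough illustration of what element tagging can look like, the sketch below assigns a numeric ID to each interactive element of a page and emits a text rendering that a language model could refer to. This is not Tarsier's API: `ElementTagger`, `tag_page`, and the `INTERACTIVE_TAGS` set are hypothetical names chosen for this example, and the sketch works on static HTML with Python's standard `html.parser` rather than a live browser.

```python
from html.parser import HTMLParser

# Hypothetical choice of which tags count as "interactive" for this sketch.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class ElementTagger(HTMLParser):
    """Collects page text and marks each interactive element with an ID."""

    def __init__(self):
        super().__init__()
        self.tagged = {}    # ID -> (tag name, attribute dict)
        self.pieces = []    # text rendering, in document order
        self._next_id = 0

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            self.tagged[self._next_id] = (tag, dict(attrs))
            self.pieces.append(f"[{self._next_id}]")
            self._next_id += 1

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.pieces.append(text)


def tag_page(html):
    """Return (tagged text, ID-to-element mapping) for an HTML string."""
    tagger = ElementTagger()
    tagger.feed(html)
    tagger.close()
    return " ".join(tagger.pieces), tagger.tagged


page = '<p>Welcome</p><a href="/login">Log in</a><button id="go">Search</button>'
text, elements = tag_page(page)
print(text)
print(elements)
```

An agent built this way could answer with an ID such as `1`, and the mapping tells the caller which element to act on.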

HuggingFace alignment handbook (12 minute read)

With the recent release of the excellent Zephyr language model, the HuggingFace team showcases how you can train personalized models on top of the few powerful pre-trained open-source models available.

Miscellaneous

LLaVA-Plus: A model for language-vision skill acquisition (5 minute read)

LLaVA is an open-source model that combines language and vision. This new version allows the instruction-tuned model to use tools for image editing, generation, and more.

Extra fast text-to-speech generation with Bark (Jupyter Notebook)

An example of using the Bark text-to-speech system to generate coherent, fast, and extra-long audio output.

Run local AI town with RWKV (GitHub Repo)

AI Town is an experiment in which hundreds of agents live out their daily lives as prompt state in a language model. RWKV is a linear language model that requires fewer resources than standard Transformers. This repo uses this cheaper model to run AI Town on your local machine.
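
To see why a linear, recurrent formulation can need fewer resources, here is a minimal sketch. It assumes a simplified decayed weighted average, not RWKV's exact formulation, and the function names and the scalar `decay` parameter are hypothetical. The first function recomputes each output from the whole history, as an attention-style layer would; the second produces the same outputs from a constant-size running state.

```python
import math


def decayed_average_full(keys, values, decay):
    """Recompute every output from the full history: O(T^2) total work."""
    outputs = []
    for t in range(len(keys)):
        num = 0.0
        den = 0.0
        for i in range(t + 1):
            # Older positions are down-weighted by decay^(t - i).
            weight = (decay ** (t - i)) * math.exp(keys[i])
            num += weight * values[i]
            den += weight
        outputs.append(num / den)
    return outputs


def decayed_average_recurrent(keys, values, decay):
    """Same outputs from two running sums: O(T) work, O(1) state."""
    num = 0.0
    den = 0.0
    outputs = []
    for k, v in zip(keys, values):
        # Decay the old state, then add the current token's contribution.
        num = decay * num + math.exp(k) * v
        den = decay * den + math.exp(k)
        outputs.append(num / den)
    return outputs


keys = [0.1, -0.3, 0.5, 0.0]
values = [1.0, 2.0, -1.0, 4.0]
full = decayed_average_full(keys, values, decay=0.9)
recurrent = decayed_average_recurrent(keys, values, decay=0.9)
print(all(abs(a - b) < 1e-9 for a, b in zip(full, recurrent)))
```

Both functions agree up to floating-point error, but the recurrent one carries only two numbers from step to step instead of revisiting every earlier token.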

Quick Links

OHMYSYNT (Product)

Personalized AI content for brands.

Introducing creator monetization for Poe (3 minute read)

Poe has updated its AI platform to support creator monetization through subscription revenue sharing and per-message fees, aiming to empower bot developers and small AI companies. Monetization is live in the US, with global expansion planned.