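[TLDR] Today's AI News, 2023-07-28: a16z on AI for personal finance 💰, Stable Diffusion XL 1.0 🖼️, improved multimodal summarization 📃

9bow · July 29, 2023, 2:55 AM

With the approval of the TLDR newsletter, the PyTorch Korea User Group shares its AI news, translated with DeepL.

Want to share more AI news and information and grow together? Visit the PyTorch Korean community now!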

Headlines & Launches

Stability AI Announces Stable Diffusion XL 1.0 (4 minute read)

Stability AI has released its newest advanced text-to-image model, Stable Diffusion XL (SDXL) 1.0, available on Amazon Bedrock and via its own API. The new model offers improved color accuracy and detail and introduces a beta fine-tuning feature for specialized image generation.

[GN] Stability AI announces the Stable Diffusion XL 1.0 model

a16z explores whether AI will unlock personal finance management (4 minute read)

More and more people want to take control of their own finances. Fees charged by financial advisors, funds, and wealth managers, together with significant advances in money tech, are encouraging people to go it alone when it comes to personal finance. Is there an opportunity for AI to take the wheel on this trend?

Tabular LLM: Tabular foundation model (3 minute read)

While natural language is exciting, most businesses run on structured tabular data, which is noisy but valuable. This new tabular LLM, trained on 500B tabular tokens, can in-fill, synthesize, and balance structured data. It also works zero-shot if you don't have any data!

Research & Innovation

GrammarGPT: Leveraging Open-Source AI for Grammar Correction (GitHub Repo)

This repository presents "GrammarGPT", a powerful tool developed to improve the grammar of native Chinese text using an open-source AI system.

HQTrack: High-Quality Video Object Tracking and Segmentation Framework (GitHub Repo)

This repo introduces HQTrack, a high-quality video object tracking framework that leverages advanced perception algorithms to track both single and multiple objects and to refine their boundaries. Despite being trained on a limited dataset, HQTrack demonstrates its strength by securing second place in the Visual Object Tracking and Segmentation (VOTS2023) challenge without using any additional data augmentations or model ensembles.

Engineering & Resources

CFSum: Enhancing Multimodal Summarization (16 minute read)

This paper presents a new tool called the Coarse-to-Fine contribution network for multimodal Summarization (CFSum) that improves how summaries use images. CFSum filters out unnecessary images and uses helpful ones more effectively.

[Paper introduction] Coarse-to-Fine Contribution Network for Multimodal Summarization

PKU-GoodsAD: Anomaly Detection for Supermarkets and Smart Manufacturing (11 minute read)

This paper introduces the GoodsAD dataset, a large collection of high-resolution images of supermarket goods specifically designed to improve anomaly detection in unmanned supermarkets and smart manufacturing scenarios.

Miscellaneous

Why TinyML is still so hard to get excited about (6 minute read)

At the recent tinyML Summit, Sony demonstrated energy-efficient image sensors while Useful Sensors revealed a QR-code scanning device. The conference underlined the challenges of making TinyML applications discoverable and the reality that many successful applications are non-obvious or mundane. Industrial applications are considered promising, but are often kept under wraps for competitive reasons.

Building a second brain using AI (7 minute read)

I’m an active Notion user, largely for curating interesting reads, infographics, and thoughts. In this article, Azeem Azhar unpacks how you can use AI to level up your research skills and turn Notion into a valuable database where you can store knowledge and more easily spot thematic connections and patterns.

JourneyDB - Generative Image Benchmark (6 minute read)

More than 4 million high-quality, curated text and image pairs for visual question answering have been released. All of them were synthetically generated with Midjourney, so some have jokingly called this the Midjourney distillation dataset.

Quick Links

Introducing Tidepool, Product Analytics for AI Apps (4 minute read)

AI-based text interfaces unlock a new way to interact with software, but it's difficult to find insights in unstructured text. Tidepool finds patterns in user text interactions to help you make better product decisions.

Lingoedit (Product)

This AI-powered translation editor helps you overcome language barriers with its custom prompt fields, translation history view, and superb text editor.

ReliableGPT (GitHub Repo)

OpenAI often has model downtime or somewhat flaky service. This repo uses some clever tricks to get extremely high uptime from the API.
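
The entry does not say which tricks the repo uses. As a rough illustration of one common way to get higher effective uptime from a flaky API, the Python sketch below retries a call with exponential backoff and then falls back to an alternative backend. Everything in it is an assumption made for illustration: `call_with_fallback`, `flaky_primary`, and `stable_backup` are hypothetical names, the backends are plain local functions rather than real API clients, and none of it is taken from ReliableGPT itself.

```python
import time
from typing import Callable, Optional, Sequence


def call_with_fallback(
    backends: Sequence[Callable[[str], str]],
    prompt: str,
    retries_per_backend: int = 2,
    base_delay: float = 0.5,
) -> str:
    """Try each backend in order; retry each one with exponential backoff."""
    last_error: Optional[Exception] = None
    for backend in backends:
        for attempt in range(retries_per_backend):
            try:
                return backend(prompt)
            except Exception as exc:  # a real client would catch narrower errors
                last_error = exc
                # Wait base_delay, then 2 * base_delay, and so on.
                time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("all backends failed") from last_error


# Hypothetical stand-ins for a primary model and a backup model.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary model unavailable")


def stable_backup(prompt: str) -> str:
    return f"backup answer to: {prompt}"


if __name__ == "__main__":
    print(call_with_fallback([flaky_primary, stable_backup], "hello", base_delay=0.0))
```

In this sketch the caller only sees an error once every backend has used up its retries, which is the property that raises effective uptime. A production version would likely also need timeouts, narrower exception handling, and logging of which backend served each request.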