[2024/02/26 ~ 03/03] Top ML Papers of the Week

9bow · March 4, 2024, 9:58 AM

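Most of the papers selected this week focus on large language models (LLMs). Titles such as "Mistral Large", "The Era of 1-bit LLMs", "Datasets for Large Language Models", and "PlanGPT" show strong interest in improving language models and in applying them with new datasets. This likely reflects the rapid progress of language technology in recent years and the strong performance of large language models across natural language processing tasks.
Large language models such as GPT-4 are being applied across many industries and research fields, which drives demand for techniques that train them more efficiently, apply them to more diverse data, and maintain high performance even with fewer bits. A title like "On the Societal Impact of Open Foundation Models" also indicates that research on the societal effects of these models is under way, and that awareness of the resulting social change and responsibility is growing alongside the technology itself.
Meanwhile, "Genie" and "EMO" go beyond language models, covering generative interactive environments and audio-driven video generation, while "LearnAct" extends LLM agents with learned actions. Overall, new approaches to language modeling and the applications built on them are the central trend this week, reflecting the ongoing research and development effort on the field's key challenges.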

Genie: Generative Interactive Environments

Paper summary

A foundation model trained on internet videos that can generate a variety of action-controllable 2D worlds from an image prompt. Genie has 11B parameters and consists of a spatiotemporal video tokenizer, an autoregressive dynamics model, and a scalable latent action model. The latent action space enables training agents to imitate behaviors from unseen videos, which is promising for building more generalist agents.

Abstract

We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It is comprised of a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain-specific requirements typically found in the world model literature. Further, the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.
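The frame-by-frame control described above can be pictured with a toy data-flow sketch. Everything here is a stand-in chosen for illustration: the 16-code tokenizer, the arithmetic "dynamics" rule, and the assumed set of 8 discrete latent actions are not Genie's actual components, which are learned neural networks.

```python
from typing import List, Tuple

Tokens = Tuple[int, ...]   # toy stand-in for one frame's discrete video tokens
NUM_LATENT_ACTIONS = 8     # assumed size of the discrete latent action set


def tokenize_frame(pixels: List[int]) -> Tokens:
    """Toy tokenizer: bucket each pixel value (0-255) into one of 16 codes."""
    return tuple(p // 16 for p in pixels)


def dynamics_step(history: List[Tokens], action: int) -> Tokens:
    """Toy dynamics rule: derive next-frame tokens from the last frame and an action."""
    last = history[-1]
    return tuple((t + action) % 16 for t in last)


def rollout(prompt_pixels: List[int], actions: List[int]) -> List[Tokens]:
    """Start from a prompt frame and apply one latent action per generated frame."""
    history = [tokenize_frame(prompt_pixels)]
    for action in actions:
        if not 0 <= action < NUM_LATENT_ACTIONS:
            raise ValueError(f"unknown latent action: {action}")
        history.append(dynamics_step(history, action))
    return history


frames = rollout([0, 32, 255], actions=[1, 2])
print(frames)
```

The point of the sketch is only the loop shape: a prompt image is tokenized once, and each user-chosen latent action conditions the prediction of the next frame's tokens.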

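Paper link

Further reading

Genie: Generative Interactive Environments model released by DeepMind (Reading & Info Sharing)

Google DeepMind's Genie is a model trained via unsupervised learning on unlabeled internet videos. It takes text, synthetic images, photographs, and sketches as input and generates a variety of action-controllable virtual worlds. With 11B parameters, Genie is a foundation world model that integrates a spatiotemporal video tokenizer, an autoregressive dynamics model, and a scalable latent action model. Users can interact frame by frame with the generated environments without ground-truth (GT) action labels or domain-specific requirements. In addition, Genie's learned latent action space should help train agents that imitate behaviors from unseen videos. [Genie: Generative Interactive Environments model (Generative Inte…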

https://x.com/_rockt/status/1762026090262872161

Mistral Large

Paper summary

A new LLM with strong multilingual, reasoning, math, and code generation capabilities. Features include: 1) a 32k-token context window, 2) native multilingual capabilities, 3) strong results on reasoning, knowledge, math, and coding benchmarks, and 4) native support for function calling and JSON output.
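To make "function calling with JSON output" concrete, here is a minimal sketch of the application side: the model emits a JSON object naming a function and its arguments, and the caller parses and dispatches it. The JSON shape, the `get_weather` tool, and the `dispatch` helper are all hypothetical and are not Mistral's API.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would query a weather service."""
    return f"Weather lookup for {city} is not implemented in this sketch."


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}


def dispatch(model_output: str) -> str:
    """Parse a JSON tool call of the assumed form
    {"name": ..., "arguments": {...}} and run the named tool."""
    call = json.loads(model_output)
    func = TOOLS[call["name"]]
    return func(**call["arguments"])


# Stand-in for text a model might return when asked to call a tool.
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = dispatch(model_output)
print(result)
```

Constraining the model to valid JSON matters here because `json.loads` raises an error on anything else, so structured output is what makes this dispatch step reliable.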

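Paper link

Further reading

[GN⁺] Mistral AI releases (the API for) Mistral Large and Small, models with strong performance second only to GPT-4 (Reading & Info Sharing)

With permission from xguru of GeekNews, we share AI-related posts from GN. Introduction [[GN⁺] Mistral Large model released] Mistral Large is a state-of-the-art text generation model with top-tier reasoning capabilities. It can perform complex reasoning tasks in multiple languages and can be used for text understanding, transformation, and code generation. It shows strong performance on the MMLU benchmark and is the world's second-ranked model generally available via API: 81.2%, behind GPT-4 at 86.4%, with Claude 2 at 78.5% and Gemini Pro at 71.8%. New capabilities and strengths of Mistral Large: it is natively fluent in English, French, Spanish, German, and Italian, with a nuanced understanding of grammar and cultural context. Its 32K-token context window allows precise information recall from large documents. …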

https://x.com/omarsar0/status/1762140818654064721

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper summary

Introduces a high-performing and cost-effective 1-bit LLM variant called BitNet b1.58, in which every parameter is ternary {-1, 0, 1}. Given the same model size and training tokens, BitNet b1.58 can match the perplexity and task performance of a full-precision Transformer LLM (i.e., FP16). The benefits of this 1-bit LLM are significantly better latency, memory, throughput, and energy consumption.

Abstract

Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, the 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective. Furthermore, it enables a new computation paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.
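The "1.58" in the name follows from the ternary weights: three possible values carry log2(3) ≈ 1.58 bits of information each. The sketch below shows that arithmetic plus one simple way to map float weights to {-1, 0, 1}. The rounding scheme (scale by the mean absolute value, round, clip) and the function name are assumptions for illustration, not a procedure stated in the summary above.

```python
import math


def absmean_ternarize(weights, eps=1e-8):
    """Map float weights to {-1, 0, 1} and return them with one shared scale.

    Assumed scheme: divide by the mean absolute value, round to the
    nearest integer, then clip to the range [-1, 1].
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    ternary = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return ternary, scale


# Three states per weight -> log2(3) bits of information per weight.
bits_per_weight = math.log2(3)

weights = [0.9, -0.05, 0.4, -1.3, 0.0, 0.6]
ternary, scale = absmean_ternarize(weights)

print(f"bits per weight: {bits_per_weight:.3f}")
print("ternary weights:", ternary)
```

With weights restricted to -1, 0, and 1, a matrix-vector product needs only additions, subtractions, and skips, which is one intuition for the latency and energy savings the abstract claims.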

Paper link

Further reading

https://x.com/_akhaliq/status/1762729757454618720

Datasets for Large Language Models: A Comprehensive Survey

Paper summary

A comprehensive overview (180+ pages) and analysis of LLM datasets.

Abstract

This paper embarks on an exploration into the Large Language Model (LLM) datasets, which play a crucial role in the remarkable advancements of LLMs. The datasets serve as the foundational infrastructure analogous to a root system that sustains and nurtures the development of LLMs. Consequently, examination of these datasets emerges as a critical topic in research. In order to address the current lack of a comprehensive overview and thorough analysis of LLM datasets, and to gain insights into their current status and future trends, this survey consolidates and categorizes the fundamental aspects of LLM datasets from five perspectives: (1) Pre-training Corpora; (2) Instruction Fine-tuning Datasets; (3) Preference Datasets; (4) Evaluation Datasets; (5) Traditional Natural Language Processing (NLP) Datasets. The survey sheds light on the prevailing challenges and points out potential avenues for future investigation. Additionally, a comprehensive review of the existing available dataset resources is also provided, including statistics from 444 datasets, covering 8 language categories and spanning 32 domains. Information from 20 dimensions is incorporated into the dataset statistics. The total data size surveyed surpasses 774.5 TB for pre-training corpora and 700M instances for other datasets. We aim to present the entire landscape of LLM text datasets, serving as a comprehensive reference for researchers in this field and contributing to future studies. Related resources are available at: https://github.com/lmmlzn/Awesome-LLMs-Datasets.

Paper link

Further reading

https://github.com/lmmlzn/Awesome-LLMs-Datasets

https://x.com/omarsar0/status/1763233452852134001

Empowering Large Language Model Agents through Action Learning

Paper summary

Explores open-action learning for language agents through an iterative learning strategy that creates and improves actions as Python functions. On each iteration, the proposed framework (LearnAct) expands the action space and enhances action effectiveness by revising and updating available actions based on execution feedback. LearnAct was tested on Robotic Planning and AlfWorld environments; it improves agent performance by 32% in AlfWorld compared to ReAct+Reflexion.

Abstract

Large Language Model (LLM) Agents have recently garnered increasing interest yet they are limited in their ability to learn from trial and error, a key element of intelligent behavior. In this work, we argue that the capacity to learn new actions from experience is fundamental to the advancement of learning in LLM agents. While humans naturally expand their action spaces and develop skills through experiential learning, LLM agents typically operate within fixed action spaces, limiting their potential for growth. To address these challenges, our study explores open-action learning for language agents. We introduce a framework LearnAct with an iterative learning strategy to create and improve actions in the form of Python functions. In each iteration, LLM revises and updates the currently available actions based on the errors identified in unsuccessful training tasks, thereby enhancing action effectiveness. Our experimental evaluations across Robotic Planning and AlfWorld environments reveal that after learning on a few training task instances, our approach to open-action learning markedly improves agent performance for the type of task (by 32 percent in AlfWorld compared to ReAct+Reflexion, for instance) highlighting the importance of experiential action learning in the development of more intelligent LLM agents.
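The iterative loop described above (run training tasks, collect errors, revise the actions, repeat) can be sketched in a few lines. This is a toy illustration only: the task format, the `heat_item` actions, and `stub_reviser` are hypothetical, and the reviser is a hard-coded placeholder for the step where LearnAct would ask an LLM to rewrite an action from execution feedback.

```python
from typing import Callable, Dict, List, Tuple

ActionSpace = Dict[str, Callable[[dict], str]]  # action name -> Python function


def heat_item_v1(state: dict) -> str:
    # First draft: forgets that the item must be picked up first.
    return f"heat {state['item']}"


def heat_item_v2(state: dict) -> str:
    # Revised draft: composes lower-level steps into one higher-level action.
    return f"take {state['item']}; go to microwave; heat {state['item']}"


def run_task(actions: ActionSpace, task: dict) -> Tuple[bool, str]:
    """Execute the task's action and return (success, feedback text)."""
    try:
        result = actions[task["action"]](task["state"])
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return result == task["expected"], f"got {result!r}"


def stub_reviser(name: str, feedback: str) -> Callable[[dict], str]:
    """Placeholder for the LLM call that rewrites an action from feedback."""
    return {"heat_item": heat_item_v2}[name]


def learn(actions: ActionSpace, tasks: List[dict], reviser, max_iters: int = 3) -> ActionSpace:
    """Repeat: run all tasks, then revise every action that failed."""
    actions = dict(actions)  # keep the caller's action space unchanged
    for _ in range(max_iters):
        failures = []
        for task in tasks:
            ok, feedback = run_task(actions, task)
            if not ok:
                failures.append((task["action"], feedback))
        if not failures:
            break
        for name, feedback in failures:
            actions[name] = reviser(name, feedback)
    return actions


tasks = [{
    "action": "heat_item",
    "state": {"item": "egg"},
    "expected": "take egg; go to microwave; heat egg",
}]
learned = learn({"heat_item": heat_item_v1}, tasks, stub_reviser)
print(run_task(learned, tasks[0]))
```

Representing actions as plain functions in a dictionary is what makes the action space "open": revising or adding an action is just a dictionary update.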

Paper link

Further reading

https://x.com/omarsar0/status/1762533498492010761

EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Paper summary

A new framework for generating expressive portrait videos using a direct audio-to-video synthesis approach. By leveraging an Audio2Video diffusion model, it bypasses the need for intermediate 3D models or facial landmarks. EMO can produce convincing speaking and singing videos in various styles while outperforming existing methods in expressiveness and realism.

Abstract

In this work, we tackle the challenge of enhancing the realism and expressiveness in talking head video generation by focusing on the dynamic and nuanced relationship between audio cues and facial movements. We identify the limitations of traditional techniques that often fail to capture the full spectrum of human expressions and the uniqueness of individual facial styles. To address these issues, we propose EMO, a novel framework that utilizes a direct audio-to-video synthesis approach, bypassing the need for intermediate 3D models or facial landmarks. Our method ensures seamless frame transitions and consistent identity preservation throughout the video, resulting in highly expressive and lifelike animations. Experimental results demonstrate that EMO is able to produce not only convincing speaking videos but also singing videos in various styles, significantly outperforming existing state-of-the-art methodologies in terms of expressiveness and realism.

Paper link

Further reading

https://x.com/_akhaliq/status/1762686465777999932

On the Societal Impact of Open Foundation Models

Paper summary

A position paper on open foundation models and their impact, benefits, and risks. It proposes a risk assessment framework, explains why the marginal risk of open foundation models is low in some cases, and offers a more grounded assessment of their societal impact.

Paper link

https://crfm.stanford.edu/open-fms/

Further reading

https://x.com/sayashk/status/1762508812370551207

StarCoder 2

Paper summary

A family of open LLMs for code in three sizes (3B, 7B, and 15B). The 15B model was trained on 14 trillion tokens and 600+ programming languages, with a 16k-token context window and a fill-in-the-middle objective. It matches 33B+ models on many evaluations, such as code completion, code reasoning, and math reasoning aided by PAL.
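A fill-in-the-middle objective trains the model to generate a missing span given the text on both sides of it. One common way to build such a training example is sketched below: split a document into prefix, middle, and suffix, then reorder it so the middle comes last. The sentinel strings and the prefix-suffix-middle ordering are assumptions for illustration; the summary above does not specify StarCoder 2's exact format.

```python
import random

# Illustrative sentinel strings; real tokenizers define their own special tokens.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def make_fim_example(code: str, rng: random.Random):
    """Split code at two random cut points and move the middle span to the end."""
    i, j = sorted(rng.sample(range(len(code) + 1), 2))
    prefix, middle, suffix = code[:i], code[i:j], code[j:]
    prompt = f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"
    return prompt, middle  # the model is trained to emit `middle` after `prompt`


def reassemble(prompt: str, middle: str) -> str:
    """Invert make_fim_example (assumes the code contains no sentinel strings)."""
    body = prompt[len(FIM_PREFIX):-len(FIM_MIDDLE)]
    prefix, suffix = body.split(FIM_SUFFIX)
    return prefix + middle + suffix


code = "def add(a, b):\n    return a + b\n"
prompt, middle = make_fim_example(code, random.Random(0))
print(prompt)
print("target:", repr(middle))
```

Because the middle is placed last, an ordinary left-to-right model can learn infilling without any architectural change, which is useful for editor-style code completion.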

Paper link

Further reading

https://x.com/_philschmid/status/1762843489220296881

Large Language Models (LLMs) on Tabular Data: Prediction, Generation, and Understanding - A Survey

Paper summary

An overview of LLMs for tabular data tasks, including key techniques, metrics, datasets, models, and optimization approaches. It covers limitations and unexplored ideas, with insights for future research directions.

Abstract

Recent breakthroughs in large language modeling have facilitated rigorous exploration of their application in diverse tasks related to tabular data modeling, such as prediction, tabular data synthesis, question answering, and table understanding. Each task presents unique challenges and opportunities. However, there is currently a lack of comprehensive review that summarizes and compares the key techniques, metrics, datasets, models, and optimization approaches in this research domain. This survey aims to address this gap by consolidating recent progress in these areas, offering a thorough survey and taxonomy of the datasets, metrics, and methodologies utilized. It identifies strengths, limitations, unexplored territories, and gaps in the existing literature, while providing some insights for future research directions in this vital and rapidly evolving field. It also provides relevant code and datasets references. Through this comprehensive review, we hope to provide interested readers with pertinent references and insightful perspectives, empowering them with the necessary tools and knowledge to effectively navigate and address the prevailing challenges in the field.

Paper link

Further reading

https://x.com/omarsar0/status/1763187964501254492

PlanGPT: Enhancing Urban Planning with Tailored Language Model and Efficient Retrieval

Paper summary

Shows how to leverage LLMs and combine multiple approaches such as retrieval augmentation, fine-tuning, and tool usage. The proposed framework is applied to urban and spatial planning, but many of the insights and practical tips carry over to other domains.

Abstract

In the field of urban planning, general-purpose large language models often struggle to meet the specific needs of planners. Tasks like generating urban planning texts, retrieving related information, and evaluating planning documents pose unique challenges. To enhance the efficiency of urban professionals and overcome these obstacles, we introduce PlanGPT, the first specialized Large Language Model tailored for urban and spatial planning. Developed through collaborative efforts with institutions like the Chinese Academy of Urban Planning, PlanGPT leverages a customized local database retrieval framework, domain-specific fine-tuning of base models, and advanced tooling capabilities. Empirical tests demonstrate that PlanGPT has achieved advanced performance, delivering responses of superior quality precisely tailored to the intricacies of urban planning.
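Retrieval augmentation, one of the ingredients named above, means fetching relevant passages from a local collection and placing them in the prompt before the question. The sketch below uses simple word overlap as the relevance score. The documents, the scoring rule, and the prompt layout are all invented for illustration and say nothing about how PlanGPT's own retrieval framework works.

```python
def tokenize(text: str) -> set:
    """Lowercase whitespace tokenization; a deliberately crude stand-in."""
    return set(text.lower().split())


def retrieve(query: str, documents: list, k: int = 2) -> list:
    """Return the k documents sharing the most words with the query."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list, k: int = 2) -> str:
    """Place the retrieved passages ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical planning snippets, made up for this example.
documents = [
    "Zoning rules limit building height near the river to 20 meters.",
    "The bus network was extended to the northern district.",
    "Green space per resident must be at least 9 square meters.",
]
query = "What is the building height limit near the river?"
top = retrieve(query, documents, k=1)
prompt = build_prompt(query, documents, k=2)
print(prompt)
```

A production system would typically replace the word-overlap score with embedding similarity or a keyword index, but the prompt assembly step stays the same.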

Paper link

Further reading

https://x.com/omarsar0/status/1763424166890377691

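Original source

This post was compiled with a GPT model and may contain errors, so please also refer to the original source. If you notice anything awkward or incorrect while reading, please let us know in the comments.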

