OpenAI releases Transformer Debugger (TDB), a tool for investigating the behavior of language models

9bow · March 12, 2024, 11:43 PM

PyTorchKR

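OpenAI's Superalignment team has released Transformer Debugger (TDB), a tool it built to investigate the behavior of small language models. TDB is meant to help answer questions such as "Why does the model output token A instead of token B for this prompt?" or "Why does a given attention head attend to a particular token for this prompt?"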

Introduction

TDB (Transformer Debugger) is a tool for examining the forward pass that produces a language model's behavior. With TDB, developers and researchers should be able to better understand how language models work, diagnose problems, and modify model behavior.
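One common way to ask "why token A instead of token B?" is to measure the logit difference between the two tokens and see how it changes when one component's contribution to the forward pass is removed (zero-ablated). The sketch below illustrates that idea on a hypothetical toy model; the component names (`head_0`, `mlp_0`), the weights, and the `forward`/`logit_diff` helpers are assumptions made for illustration and are not TDB's API.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]

# Hypothetical components: each reads the residual stream and returns a write vector.
COMPONENTS: Dict[str, Callable[[Vector], Vector]] = {
    "head_0": lambda r: [0.5 * r[0], 0.0, 0.0],
    "mlp_0": lambda r: [0.0, max(0.0, r[0] - r[1]), 0.0],
}

# Hypothetical unembedding rows for the two candidate tokens.
UNEMBED: Dict[str, Vector] = {"A": [1.0, 2.0, 0.0], "B": [0.0, 1.0, 1.0]}


def forward(embedding: Vector, ablate: Optional[str] = None) -> Vector:
    """Add each component's write vector to the residual stream, in order.

    If `ablate` names a component, its write vector is replaced with zeros.
    """
    resid = list(embedding)
    for name, component in COMPONENTS.items():
        write = component(resid)
        if name == ablate:
            write = [0.0] * len(write)
        resid = [r + w for r, w in zip(resid, write)]
    return resid


def logit(resid: Vector, token: str) -> float:
    """Dot product of the final residual stream with a token's unembedding row."""
    return sum(r * u for r, u in zip(resid, UNEMBED[token]))


def logit_diff(resid: Vector) -> float:
    """Logit of token A minus logit of token B (positive means A is preferred)."""
    return logit(resid, "A") - logit(resid, "B")


if __name__ == "__main__":
    x = [1.0, 0.2, 0.3]  # hypothetical token embedding
    clean = logit_diff(forward(x))
    for name in COMPONENTS:
        effect = clean - logit_diff(forward(x, ablate=name))
        print(f"{name}: logit-diff drop when ablated = {effect:+.2f}")
```

Because later components see the ablated residual stream, this measures each component's total effect on the A-vs-B preference rather than only its direct contribution to the logits.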

OpenAI has also released a video that walks through how to use TDB with the GPT-2 small model.

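TDB takes a distinctive approach to analyzing language model behavior by combining automated interpretability techniques with sparse autoencoders. Compared with other language model debugging tools, this makes it easier to see how a specific component affects the model's behavior. TDB also lets users explore quickly before writing any code, which should speed up both development and research.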
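A sparse autoencoder maps a model activation to a larger set of latents, most of which are zero for any given input; TDB treats these autoencoder latents as components alongside MLP neurons and attention heads. Below is a minimal pure-Python sketch of one common formulation (ReLU encoder, linear decoder, reconstruction loss plus an L1 sparsity penalty). The class and parameter names are hypothetical, and the autoencoders OpenAI trained for TDB may differ in their details.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class SparseAutoencoder:
    """Toy sparse autoencoder (hypothetical; one common formulation).

    latents        = ReLU(W_enc @ (x - b_pre) + b_enc)
    reconstruction = W_dec @ latents + b_pre
    """

    def __init__(self, w_enc: Matrix, b_enc: Vector, w_dec: Matrix, b_pre: Vector):
        self.w_enc = w_enc  # n_latents rows, each of length d_model
        self.b_enc = b_enc  # length n_latents
        self.w_dec = w_dec  # d_model rows, each of length n_latents
        self.b_pre = b_pre  # length d_model

    def encode(self, x: Vector) -> Vector:
        centered = [xi - bi for xi, bi in zip(x, self.b_pre)]
        pre = [p + b for p, b in zip(matvec(self.w_enc, centered), self.b_enc)]
        return [max(0.0, p) for p in pre]  # ReLU keeps only positive latents

    def decode(self, latents: Vector) -> Vector:
        return [r + b for r, b in zip(matvec(self.w_dec, latents), self.b_pre)]

    def loss(self, x: Vector, l1_coeff: float) -> float:
        """Mean squared reconstruction error plus an L1 penalty that encourages sparsity."""
        latents = self.encode(x)
        recon = self.decode(latents)
        mse = sum((r - xi) ** 2 for r, xi in zip(recon, x)) / len(x)
        return mse + l1_coeff * sum(abs(z) for z in latents)


if __name__ == "__main__":
    sae = SparseAutoencoder(
        w_enc=[[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]],
        b_enc=[0.0, 0.0, -0.5],
        w_dec=[[1.0, 0.0, -0.5], [0.0, 1.0, -0.5]],
        b_pre=[0.0, 0.0],
    )
    latents = sae.encode([2.0, 0.0])
    print("latents:", latents)  # only one latent is active for this input
    print("reconstruction:", sae.decode(latents))
```

The L1 term is what pushes most latents to zero, so that each active latent can be inspected individually, much like a single neuron.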

Key components

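Neuron Viewer: A React app that hosts TDB. It also provides pages with information about individual model components (MLP neurons, attention heads, and autoencoder latents).
Activation Server: A backend server that runs inference on the subject model to supply data to TDB. It also reads and serves data from public Azure buckets.
Models: A simple inference library for GPT-2 models and their autoencoders, with hooks for capturing activations.
Collated Activation Datasets: Top-activating dataset examples for MLP neurons, attention heads, and autoencoder latents.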
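The Models and Collated Activation Datasets entries describe two related steps: capturing activations with hooks during inference, then keeping the inputs that activate each neuron or latent most strongly. The sketch below illustrates both with a hypothetical toy model whose two "neurons" simply respond to token length and vowel count. `HookedToyModel`, its `register_hook` method, the hook signature, and `collate_top_activations` are illustrative assumptions, not the API of TDB's inference library.

```python
import heapq
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hook signature (hypothetical): called with (token_position, activations_per_neuron).
Hook = Callable[[int, List[float]], None]
Record = Tuple[float, int, int]  # (activation, text_index, token_position)


class HookedToyModel:
    """Hypothetical stand-in for a model with activation hooks.

    Neuron 0 fires with token length, neuron 1 with the token's vowel count.
    """

    def __init__(self) -> None:
        self.hooks: List[Hook] = []

    def register_hook(self, hook: Hook) -> None:
        self.hooks.append(hook)

    def run(self, tokens: List[str]) -> None:
        for position, token in enumerate(tokens):
            acts = [float(len(token)), float(sum(c in "aeiou" for c in token))]
            for hook in self.hooks:
                hook(position, acts)


def collate_top_activations(
    model: HookedToyModel, texts: List[str], k: int
) -> Dict[int, List[Record]]:
    """Return, per neuron, the k highest records, strongest first."""
    heaps: Dict[int, List[Record]] = defaultdict(list)
    current = {"text": -1}  # index of the text currently being processed

    def record(position: int, acts: List[float]) -> None:
        for neuron, value in enumerate(acts):
            item = (value, current["text"], position)
            heap = heaps[neuron]  # min-heap: heap[0] is the weakest kept record
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item > heap[0]:  # ties on activation are broken by index
                heapq.heapreplace(heap, item)

    model.register_hook(record)
    for text_index, text in enumerate(texts):
        current["text"] = text_index
        model.run(text.split())  # whitespace "tokenizer" for illustration only
    return {neuron: sorted(heap, reverse=True) for neuron, heap in heaps.items()}


if __name__ == "__main__":
    texts = ["the cat sat", "an elephant ate"]
    top = collate_top_activations(HookedToyModel(), texts, k=2)
    for neuron, records in top.items():
        for value, text_index, position in records:
            print(neuron, value, texts[text_index].split()[position])
```

A size-k min-heap per neuron keeps memory bounded no matter how many texts are processed, which matters when collating examples over a large dataset.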

Getting started

To run the TDB app, you first need Python/pip and Node/npm. Using a virtual environment is recommended. After installation, follow the setup instructions for the Activation Server backend and the Neuron Viewer frontend.
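Before following the setup instructions in the repository, a quick preflight check can confirm that the prerequisites listed above are available. This is a hypothetical helper script, not part of TDB; it only inspects the current environment and does not install or configure anything.

```python
import shutil
import sys
from typing import Dict


def check_prerequisites() -> Dict[str, bool]:
    """Report whether pip, node, npm, and an active virtual environment are present.

    This only inspects the environment; it does not install anything.
    """
    return {
        "pip": (shutil.which("pip") or shutil.which("pip3")) is not None,
        "node": shutil.which("node") is not None,
        "npm": shutil.which("npm") is not None,
        # Inside a venv, sys.prefix points at the venv while sys.base_prefix does not.
        "virtualenv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    print(f"python: {sys.executable} ({sys.version.split()[0]})")
    for name, ok in check_prerequisites().items():
        print(f"{name:>10}: {'ok' if ok else 'missing'}")
```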

TDB GitHub repository

https://github.com/openai/transformer-debugger

Terminology page (terminology.md on the main branch of github.com/openai/transformer-debugger)

# TDB Terminology

**Component**

- An attention head or neuron, or autoencoder latent
- Has some set of weights that define what the component does
- Analogy:
    - Component is like the code for a function
    - Node is like the specific invocation of a function with specific input values and specific output values
- When invoked, each component produces nodes that read something from the normalized residual stream, then write some vector (the “write vector”) to the unnormalized residual stream
- Each component is independent from other components of the same type in the same layer
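The function/invocation analogy can be made concrete with a toy MLP neuron: the component holds fixed weights, and each call on one token's residual stream yields a node with its own activation and write vector. Everything here is a simplified, hypothetical illustration (the names `Neuron`, `Node`, and `apply_write` are not from TDB): the layer norm has no learned scale or bias, and ReLU stands in for GPT-2's GELU.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Simplified layer norm (no learned scale or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]


@dataclass
class Node:
    """One invocation of a component at one token (like a function call)."""
    component: str
    token_index: int
    activation: float  # intermediate value (post-activation)
    write_vector: Vector


@dataclass
class Neuron:
    """A component: fixed weights, like the code of a function (hypothetical toy)."""
    name: str
    w_in: Vector
    w_out: Vector

    def __call__(self, resid: Vector, token_index: int) -> Node:
        normed = layer_norm(resid)  # read from the normalized residual stream
        pre = sum(w * x for w, x in zip(self.w_in, normed))
        post = max(0.0, pre)  # ReLU stands in for GPT-2's GELU
        return Node(self.name, token_index, post, [post * w for w in self.w_out])


def apply_write(resid: Vector, node: Node) -> Vector:
    """Add the node's write vector to the unnormalized residual stream."""
    return [r + w for r, w in zip(resid, node.write_vector)]


if __name__ == "__main__":
    neuron = Neuron("mlp_neuron_0", w_in=[1.0, -1.0], w_out=[0.5, 0.5])
    stream = [[3.0, 1.0], [1.0, 3.0]]  # residual stream, one vector per token
    for i, resid in enumerate(stream):
        node = neuron(resid, i)  # same component, a different node per token
        print(node.token_index, round(node.activation, 3), apply_write(resid, node))
```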

**Node**

- Specific invocation of a component which reads from the normalized residual stream at one token, maybe produces some intermediate values, and then writes to the unnormalized residual stream at one token
- Comes from talking about nodes in a computation graph/causal graph
- Neurons/latents produce one node per sequence token. They read from/write to the same token
    - Neuron pre/post activations are intermediate values
- Attention heads produce one node per pair of sequence tokens (reading from same/earlier token, writing to later token)
    - Attention QK products, value vectors are intermediate values
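The node counts follow directly from these definitions: for a sequence of n tokens, each neuron or latent yields n nodes, while each causal attention head yields n(n+1)/2 nodes, one per (read token, write token) pair with the read token at or before the write token. The sketch below enumerates both and computes a per-node write vector for a toy attention head as softmax weight times the source token's value vector. It is a hypothetical illustration with made-up function names, and it omits the head's output projection.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def neuron_nodes(seq_len: int) -> List[int]:
    """One node per token; each reads from and writes to the same position."""
    return list(range(seq_len))


def attention_head_nodes(seq_len: int) -> List[Tuple[int, int]]:
    """One node per (read_token, write_token) pair with read_token <= write_token."""
    return [(src, dst) for dst in range(seq_len) for src in range(dst + 1)]


def head_node_writes(
    q: List[Vector], k: List[Vector], v: List[Vector]
) -> Dict[Tuple[int, int], Vector]:
    """Per-node write vectors for one toy causal attention head.

    QK scores and value vectors are the intermediate values; each node writes
    softmax_weight * v[src] to position dst. The output projection is omitted.
    """
    scale = math.sqrt(len(q[0]))
    writes: Dict[Tuple[int, int], Vector] = {}
    for dst in range(len(q)):
        scores = [sum(a * b for a, b in zip(q[dst], k[src])) / scale for src in range(dst + 1)]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        for src, e in enumerate(exps):
            writes[(src, dst)] = [e / total * x for x in v[src]]
    return writes


if __name__ == "__main__":
    n = 4
    print(len(neuron_nodes(n)), "nodes per neuron;", len(attention_head_nodes(n)), "nodes per head")
```

Summing the write vectors of all nodes that share the same write token gives that token's ordinary attention output (before the output projection); splitting it into nodes is what lets each source token's contribution be examined separately.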


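This post was summarized with a GPT model and may contain errors, so please also refer to the original source linked at the bottom of the post. If you spot anything awkward or incorrect while reading, please let us know in the comments!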

Did you find this post from the PyTorch Korea User Group useful? Sign up as a member and we'll email you the key posts! (Weekly by default, but you can switch to Daily.)

Hitting the like button below helps keep the news coming!