Meta AI releases the PUG (Photorealistic Unreal Graphics) datasets for vision models

9bow · August 13, 2023, 5:47 AM

Meta has been actively releasing datasets alongside its models lately.

This time they have released PUG, the Photorealistic Unreal Graphics datasets.

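Introduction

~~As usual, this is a field I don't know well, so I'll only skim the surface here~~

Photorealistic Unreal Graphics (PUG) is a research project from Meta AI. By controlling the distribution shift between training and test data, it is meant to help study how neural networks generalize across factors of variation.
(If anyone here knows this field, additional explanation would be appreciated.)

It is distributed under the CC-BY-NC license. It's hard to explain in words, so let's start with the demo video!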

The PUG Datasheet describes the purpose of the data as follows.

Q. For what purpose was the dataset created?
A. The 3 datasets we presented in this paper were created for representation learning research. PUG: Animals is a strong dataset for OOD research as well as being able to better probe the representation of vision models. PUG: ImageNet was designed as an additional benchmark for ImageNet pretrained models to offer a better comprehension of vision models' capabilities in terms of robustness to specific factors of variation. Lastly, PUG: SPAR showcases how synthetic data can be used to evaluate VLMs' understanding.

The release covers three datasets: Animals, ImageNet, and SPAR *(Scene, Position, Attribute, Relation)*.
(The GitHub repository also includes AR4T, so I've summarized it here as well.)

PUG: Animals (78GB)

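This dataset is intended for research on out-of-distribution generalization and on the representation space of foundation models. It contains 215,040 images rendered from 70 animal assets, varied over 64 backgrounds, 3 sizes, 4 textures, and 4 camera orientations.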

215,040 pre-rendered images using 70 animal assets

64 backgrounds

3 sizes

4 textures

4 different camera orientations
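
The image count can be checked against the factors listed above. If every animal asset is rendered under every combination of the listed factors (an assumption on my part, although the numbers agree with it), the product of the factor sizes should equal the number of images. A small sketch:

```python
from math import prod

# Factor sizes as listed for PUG: Animals.
animals_factors = {
    "animal_assets": 70,
    "backgrounds": 64,
    "sizes": 3,
    "textures": 4,
    "camera_orientations": 4,
}

# Size of the full grid: one image per combination of factor values.
full_grid = prod(animals_factors.values())
print(full_grid)  # 215040
```

The result matches the 215,040 pre-rendered images, which is what makes the dataset convenient for controlled train/test splits: any factor value can be held out cleanly.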

You can download it from the following link: PUG: Animals (78GB)
A DataLoader example that loads the PUG Animals dataset is available in a notebook in the following GitHub repository.

github.com/facebookresearch/PUG, PUG_Animals/PUG_Animals.ipynb (main branch)

The notebook's first code cell imports torch, pandas, matplotlib.pyplot, PIL.Image, numpy, and os.
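
To give a rough idea of what such a loader involves without opening the notebook, here is a minimal sketch of a map-style dataset index. It is not the code from the repository: the CSV file name (`labels.csv`) and the column names (`filename`, `label`) are hypothetical, so check the downloaded archive for the real layout. The class implements only `__len__` and `__getitem__`, which is the protocol `torch.utils.data.DataLoader` expects from a map-style dataset.

```python
import csv
from pathlib import Path


class PugIndex:
    """Map-style index over a PUG-like folder (hypothetical layout).

    Assumes root/labels.csv with columns 'filename' and 'label';
    these names are placeholders, not the real PUG schema.
    """

    def __init__(self, root, csv_name="labels.csv",
                 path_col="filename", label_col="label"):
        self.root = Path(root)
        with open(self.root / csv_name, newline="", encoding="utf-8") as f:
            self.rows = list(csv.DictReader(f))
        self.path_col = path_col
        self.label_col = label_col
        # Map each distinct label string to a stable integer id.
        labels = sorted({row[label_col] for row in self.rows})
        self.class_to_idx = {name: i for i, name in enumerate(labels)}

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows[idx]
        image_path = self.root / row[self.path_col]
        # Returns the path rather than pixels, so the sketch needs no image library.
        return image_path, self.class_to_idx[row[self.label_col]]
```

In practice `__getitem__` would open the image (for example with `PIL.Image.open(image_path).convert("RGB")`), apply a transform, and return a tensor, after which the object can be handed to `torch.utils.data.DataLoader`.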

[Example images from PUG: Animals]

PUG: ImageNet (27GB)

This dataset is a benchmark for fine-grained evaluation of image classifier robustness across several factors of variation. It contains 88,328 images rendered from 724 assets, covering 151 ImageNet classes, 64 backgrounds, and more.

88,328 pre-rendered images using 724 assets

151 ImageNet classes

64 backgrounds

7 sizes

10 textures

18 camera orientations

18 character orientations

7 light intensities
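
Multiplying the listed factor sizes gives a very different picture here than a simple one-image-per-combination grid. The sketch below compares that product with the 88,328 released images; the gap suggests that only a small subset of combinations was rendered. How that subset was chosen is not stated in this post, so the code only quantifies the gap.

```python
from math import prod

# Factor sizes as listed for PUG: ImageNet.
imagenet_factors = {
    "assets": 724,
    "backgrounds": 64,
    "sizes": 7,
    "textures": 10,
    "camera_orientations": 18,
    "character_orientations": 18,
    "light_intensities": 7,
}
rendered_images = 88_328

# Number of images a full cross-product of all factors would require.
full_grid = prod(imagenet_factors.values())
coverage = rendered_images / full_grid

print(f"full grid: {full_grid:,} combinations")
print(f"rendered:  {rendered_images:,} images ({coverage:.6%} of the grid)")
```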

You can download it from the following link: PUG: ImageNet (27GB)
A DataLoader example that loads the PUG ImageNet dataset is available in a notebook in the following GitHub repository.

github.com/facebookresearch/PUG, PUG_ImageNet/PUG_ImageNet.ipynb (main branch)

The notebook's first code cell imports torch, pandas, matplotlib.pyplot, PIL.Image, numpy, and os.

[Example images from PUG: ImageNet]

PUG: SPAR (Scene, Position, Attribute, Relation, 16GB)

This dataset is used to evaluate vision-language models. It contains 43,560 rendered images covering 10 backgrounds, 32 animals, 4 relations, and 4 attributes.

43,560 pre-rendered images

10 backgrounds

32 animals

4 relations (left/right, bottom/top)

4 attributes (blue/red, grass/stone)
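
One common way to use relation and attribute labels like these for evaluating a vision-language model is to pair each image with its correct caption and a "hard negative" that flips exactly one relation or attribute, then count how often the model prefers the correct caption. The sketch below illustrates that idea only. The caption wording, the `flip_caption` helper, and the `score` callable (standing in for a real image-text similarity model) are all assumptions, not the PUG: SPAR protocol.

```python
# Opposite pairs taken from the relation and attribute lists above.
OPPOSITES = {
    "left": "right", "right": "left",
    "bottom": "top", "top": "bottom",
    "blue": "red", "red": "blue",
    "grass": "stone", "stone": "grass",
}


def flip_caption(caption, word):
    """Return a hard negative: the caption with `word` swapped for its opposite."""
    tokens = caption.split()
    if word not in tokens:
        raise ValueError(f"{word!r} not found in caption")
    return " ".join(OPPOSITES[word] if t == word else t for t in tokens)


def pairwise_accuracy(samples, score):
    """Fraction of samples where the true caption outscores its hard negative.

    samples: iterable of (image, caption, word_to_flip)
    score:   callable (image, caption) -> float, e.g. a VLM similarity
    """
    samples = list(samples)
    hits = sum(
        score(image, caption) > score(image, flip_caption(caption, word))
        for image, caption, word in samples
    )
    return hits / len(samples)


# Toy scorer for demonstration: the "image" is just its ground-truth caption.
def toy_score(image, caption):
    return float(image == caption)


captions = [
    ("a blue elephant on the left of a red camel", "left"),
    ("a goat made of grass", "grass"),
]
samples = [(caption, caption, word) for caption, word in captions]
print(pairwise_accuracy(samples, toy_score))  # 1.0
```

Swapping `toy_score` for a real model's image-text similarity would turn this into a per-relation or per-attribute accuracy measurement.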

You can download it from the following link: PUG: SPAR (16GB)
A DataLoader example that loads the PUG SPAR dataset is available in a notebook in the following GitHub repository.

github.com/facebookresearch/PUG, PUG_SPAR/PUG_SPAR.ipynb (main branch)

The notebook's first code cell imports torch, pandas, numpy, matplotlib.pyplot, PIL.Image, os, and shutil.

[Example images from PUG: SPAR]

PUG: AR4T (Attribute and Relation for Training, 97GB)

A dataset for fine-tuning vision-language models. It consists of 249,986 rendered images, 23,216 test images, and captions.

249,986 pre-rendered images

23,216 test images

Captions

You can download it from the following link: PUG: AR4T (97GB)
A DataLoader example that loads the PUG AR4T dataset is available in a notebook in the following GitHub repository.

github.com/facebookresearch/PUG, PUG_AR4T/PUG_AR4T.ipynb (main branch)

The notebook's first code cell imports torch, pandas, matplotlib.pyplot, PIL.Image, os, shutil, and numpy.

[Example images from PUG: AR4T]

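Setting up the torchmultiverse environment to use the datasets

After downloading the Epic Games launcher, you reportedly install Unreal Engine and set up an environment such as a Pixel Streaming project in order to load the data. ~~(...even after writing that, I still don't quite get it)~~

For details, see the torchmultiverse introduction below!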

Further reading

PUG dataset homepage

Datasheet

Paper: PUG: Photorealistic and Semantically Controllable Synthetic Data for Representation Learning

GitHub repository (example code and overview)

https://github.com/facebookresearch/PUG