[Release] PyTorch 2.1이 공개되었습니다 🎉

아래는 릴리즈 노트의 일부분입니다.
원문은 아래에서 보실 수 있으며, 자세한 내용 정리는 이따 저녁에 정리해서 올리겠습니다~ :sweat_smile:


PyTorch 2.1: automatic dynamic shape compilation, distributed checkpointing

PyTorch 2.1 Release Notes

  • Highlights
  • Backwards Incompatible Change
  • Deprecations
  • New Features
  • Improvements
  • Bug fixes
  • Performance
  • Documentation
  • Developers
  • Security

Highlights

We are excited to announce the release of PyTorch® 2.1! PyTorch 2.1 offers automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API.

In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization.

Along with 2.1, we are also releasing a series of updates to the PyTorch domain libraries. More details can be found in the library updates blog.

This release is composed of 6,682 commits and 784 contributors since 2.0. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try these out and report any issues as we improve 2.1. More information about how to get started with the PyTorch 2-series can be found at our Getting Started page.

Summary:

  • torch.compile now includes automatic support for detecting and minimizing recompilations due to tensor shape changes using automatic dynamic shapes.
  • torch.distributed.checkpoint enables saving and loading models from multiple ranks in parallel, as well as resharding due to changes in cluster topology.
  • torch.compile can now compile NumPy operations via translating them into PyTorch-equivalent operations.
  • torch.compile now includes improved support for Python 3.11.
  • New CPU performance features include inductor improvements (e.g. bfloat16 support and dynamic shapes), AVX512 kernel support, and scaled-dot-product-attention kernels.
  • torch.export, a sound full-graph capture mechanism is introduced as a prototype feature, as well as torch.export-based quantization.
  • torch.sparse now includes prototype support for semi-structured (2:4) sparsity on NVIDIA® GPUs.
Stable Beta Prototype Performance Improvements
Automatic Dynamic Shapes torch.export() AVX512 kernel support
torch.distributed.checkpoint torch.export-based Quantization CPU optimizations for scaled-dot-product-attention (SDPA)
torch.compile + NumPy semi-structured (2:4) sparsity CPU optimizations for bfloat16
torch.compile + Python 3.11 cpp_wrapper for torchinductor
torch.compile + autograd.Function
third-party device integration: PrivateUse1

*To see a full list of public 2.1, 2.0, and 1.13 feature submissions click here.

For more details about these highlighted features, you can look at the release blogpost.
Below are the full release notes for this release.

2개의 좋아요

파이썬도 TF도 업데이트되고 볼게 많네요.