Below is an excerpt from the release notes.
You can find the original below, and I'll write up a detailed summary and post it later this evening~
PyTorch 2.1: automatic dynamic shape compilation, distributed checkpointing
PyTorch 2.1 Release Notes
- Highlights
- Backwards Incompatible Changes
- Deprecations
- New Features
- Improvements
- Bug fixes
- Performance
- Documentation
- Developers
- Security
Highlights
We are excited to announce the release of PyTorch® 2.1! PyTorch 2.1 offers automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API.
In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization.
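The following sketch illustrates the scaled-dot-product-attention (SDPA) support mentioned above, assuming PyTorch 2.0 or later. `torch.nn.functional.scaled_dot_product_attention` computes softmax(Q K^T / sqrt(d)) V. The check below compares it on CPU against a hand-written version. `reference_attention` and the tensor sizes are illustrative choices, not part of the release. Which fused kernel actually runs depends on the hardware and inputs.

```python
import math

import torch
import torch.nn.functional as F


def reference_attention(q, k, v):
    # softmax(Q K^T / sqrt(d)) V, written out explicitly for comparison
    scale = 1.0 / math.sqrt(q.size(-1))
    weights = torch.softmax((q @ k.transpose(-2, -1)) * scale, dim=-1)
    return weights @ v


torch.manual_seed(0)
# Shapes: (batch, heads, sequence length, head dim)
q = torch.randn(2, 4, 16, 32)
k = torch.randn(2, 4, 16, 32)
v = torch.randn(2, 4, 16, 32)

# Default scale is 1/sqrt(head dim), matching the reference above
fused = F.scaled_dot_product_attention(q, k, v)
reference = reference_attention(q, k, v)
```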
Along with 2.1, we are also releasing a series of updates to the PyTorch domain libraries. More details can be found in the library updates blog.
This release is composed of 6,682 commits from 784 contributors since 2.0. We sincerely thank our dedicated community for their contributions. As always, we encourage you to try these out and report any issues as we improve 2.1. More information about how to get started with the PyTorch 2-series can be found at our Getting Started page.
Summary:
- torch.compile now includes automatic support for detecting and minimizing recompilations due to tensor shape changes using automatic dynamic shapes.
- torch.distributed.checkpoint enables saving and loading models from multiple ranks in parallel, as well as resharding due to changes in cluster topology.
- torch.compile can now compile NumPy operations by translating them into PyTorch-equivalent operations.
- torch.compile now includes improved support for Python 3.11.
- New CPU performance features include inductor improvements (e.g. bfloat16 support and dynamic shapes), AVX512 kernel support, and scaled-dot-product-attention kernels.
- torch.export, a sound full-graph capture mechanism, is introduced as a prototype feature, as well as torch.export-based quantization.
- torch.sparse now includes prototype support for semi-structured (2:4) sparsity on NVIDIA® GPUs.
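Here is a minimal sketch of torch.compile tracing plain NumPy code, assuming PyTorch 2.1+ with NumPy installed. `numpy_softmax` is a hypothetical example function. The sketch uses `backend="eager"` only so that it runs without a C++ toolchain. Dynamo still captures the graph, but Inductor code generation is skipped. The default backend is `inductor`. The compiled function takes and returns NumPy arrays.

```python
import numpy as np
import torch


def numpy_softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis, in plain NumPy
    shifted = x - x.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


# Dynamo translates the NumPy calls into equivalent PyTorch operations
compiled_softmax = torch.compile(numpy_softmax, backend="eager")

x = np.random.default_rng(0).standard_normal((3, 5)).astype(np.float32)
out = compiled_softmax(x)
```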
| Stable | Beta | Prototype | Performance Improvements |
|---|---|---|---|
| | Automatic Dynamic Shapes | torch.export() | AVX512 kernel support |
| | torch.distributed.checkpoint | torch.export-based Quantization | CPU optimizations for scaled-dot-product-attention (SDPA) |
| | torch.compile + NumPy | semi-structured (2:4) sparsity | CPU optimizations for bfloat16 |
| | torch.compile + Python 3.11 | cpp_wrapper for torchinductor | |
| | torch.compile + autograd.Function | | |
| | third-party device integration: PrivateUse1 | | |
*To see a full list of public 2.1, 2.0, and 1.13 feature submissions click here.
For more details about these highlighted features, you can look at the release blogpost.
Below are the full release notes for this release.