Update on "Support different NSE in batches of CSR and CSC tensors"
This PR enables batched CSR/CSC tensors whose batches may have different NSE counts.

For instance, with the current master we have
```python
>>> a = torch.tensor([[[1, 2], [3, 4]], [[0, 12], [21, 0]]])
>>> a.to_sparse_csr()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Expect the same number of specified elements per batch.
```
because the NSE counts of the first and second batches differ: 4 and 2, respectively.
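
The two NSE counts can be checked by hand. This is a minimal plain-Python sketch, assuming that every nonzero entry counts as a specified element; the helper `nse_per_batch` is made up for illustration and is not a PyTorch API:

```python
# Same data as the tensor `a` above, written as nested Python lists.
a = [[[1, 2], [3, 4]], [[0, 12], [21, 0]]]


def nse_per_batch(batches):
    """Count the nonzero entries of each 2-D batch (hypothetical helper)."""
    return [sum(1 for row in batch for x in row if x != 0) for batch in batches]


counts = nse_per_batch(a)
print(counts)  # [4, 2]
```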

This PR implements a strided-to-sparse-CSR/CSC conversion algorithm that supports CSR/CSC batches with different NSE counts. For instance:
```python
>>> a = torch.tensor([[[1, 2], [3, 4]], [[0, 12], [21, 0]]])
>>> b = a.to_sparse_csr()
>>> b
tensor(crow_indices=tensor([[0, 2, 4],
                            [0, 1, 2]]),
       col_indices=tensor([[0, 1, 0, 1],
                           [1, 0, 0, 0]]),
       values=tensor([[ 1,  2,  3,  4],
                      [12, 21,  0,  0]]), size=(2, 2, 2), nnz=4,
       layout=torch.sparse_csr)
>>> b[0]
tensor(crow_indices=tensor([0, 2, 4]),
       col_indices=tensor([0, 1, 0, 1]),
       values=tensor([1, 2, 3, 4]), size=(2, 2), nnz=4,
       layout=torch.sparse_csr)
>>> b[1]
tensor(crow_indices=tensor([0, 1, 2]),
       col_indices=tensor([1, 0]),
       values=tensor([12, 21]), size=(2, 2), nnz=2, layout=torch.sparse_csr)
```
That is, when the NSE of a batch is smaller than the maximum NSE over all batches, the corresponding rows of `col_indices` and `values` are padded with zeros as placeholders. Algorithms on batched CSR/CSC tensors must not access these padded parts: they should take the NSE of a batch from the last element of the corresponding `crow_indices` row, not from the shape of `.values()`, whose NSE dimension holds the maximum NSE over all batches.
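
The padding scheme and the NSE lookup can be sketched in plain Python. This is an illustration under stated assumptions, not the implementation in this PR: it uses nested lists instead of tensors, treats every nonzero entry as a specified element, and the names `dense_to_padded_csr` and `unpadded_batch` are hypothetical.

```python
def dense_to_padded_csr(batches):
    """Convert a list of 2-D dense matrices to batched CSR lists with zero padding.

    Hypothetical illustration only, not the PyTorch implementation.
    """
    crow_indices, col_indices, values = [], [], []
    for matrix in batches:
        crow, col, val = [0], [], []
        for row in matrix:
            for j, x in enumerate(row):
                if x != 0:
                    col.append(j)
                    val.append(x)
            crow.append(len(col))  # running count of specified elements
        crow_indices.append(crow)
        col_indices.append(col)
        values.append(val)
    max_nse = max(crow[-1] for crow in crow_indices)
    # Pad shorter batches with zeros so that every row has length max_nse.
    for col, val in zip(col_indices, values):
        pad = max_nse - len(col)
        col.extend([0] * pad)
        val.extend([0] * pad)
    return crow_indices, col_indices, values


def unpadded_batch(crow_indices, col_indices, values, b):
    """Return the valid (unpadded) CSR components of batch b."""
    nse = crow_indices[b][-1]  # NSE of this batch, not the padded length
    return crow_indices[b], col_indices[b][:nse], values[b][:nse]


a = [[[1, 2], [3, 4]], [[0, 12], [21, 0]]]
crow, col, val = dense_to_padded_csr(a)
print(crow)  # [[0, 2, 4], [0, 1, 2]]
print(col)   # [[0, 1, 0, 1], [1, 0, 0, 0]]
print(val)   # [[1, 2, 3, 4], [12, 21, 0, 0]]
print(unpadded_batch(crow, col, val, 1))  # ([0, 1, 2], [1, 0], [12, 21])
```

Reading the NSE from `crow[-1]` keeps the trailing zeros of the second batch out of any computation, which matches what `b[1]` shows above.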

Performance-wise, the strided-to-sparse-CSR/CSC conversion algorithms in master and in this PR are roughly equivalent:
```python
# master branch:
In [2]: a = torch.rand(10, 10, 1000, 1000)

In [3]: a = torch.where(a==0, 0.1, a)  # required for master, optional for the PR

In [4]: %timeit a.to_sparse_csr()
2.25 s ± 9.84 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [5]: a_cuda = a.cuda()

In [6]: %timeit a_cuda.to_sparse_csr()
55.2 ms ± 6.95 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
```python
# this PR
In [2]: a = torch.rand(10, 10, 1000, 1000)

In [3]: a = torch.where(a==0, 0.1, a)  # required for master, optional for the PR

In [4]: %timeit a.to_sparse_csr()
2.12 s ± 2.13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [5]: a_cuda = a.cuda()

In [6]: %timeit a_cuda.to_sparse_csr(); torch.cuda.synchronize()
47.2 ms ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
In these measurements, `to_sparse_csr()` on CUDA tensors is about 15% faster with this PR (47.2 ms versus 55.2 ms).
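
The figure can be reproduced from the CUDA timings quoted above. This short sketch only redoes the arithmetic; the variable names are illustrative:

```python
master_ms = 55.2  # CUDA timing on master, from the benchmark above
pr_ms = 47.2      # CUDA timing with this PR, from the benchmark above

time_reduction = (master_ms - pr_ms) / master_ms  # fraction of run time saved
speedup = master_ms / pr_ms                        # ratio of the two run times

print(f"time reduction: {time_reduction:.1%}")  # time reduction: 14.5%
print(f"speedup: {speedup:.2f}x")               # speedup: 1.17x
```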
 
A strided-to-sparse-BSR/BSC conversion with variable NSE support will be implemented as a follow-up.




[ghstack-poisoned]
pearu committed Sep 19, 2022
2 parents 90fe9e9 + 60375a1 commit 18f4bfc
Showing 562 changed files with 16,463 additions and 14,294 deletions.
2 changes: 1 addition & 1 deletion .circleci/docker/build.sh
@@ -379,7 +379,7 @@ docker build \
--build-arg "NINJA_VERSION=${NINJA_VERSION:-}" \
--build-arg "KATEX=${KATEX:-}" \
--build-arg "ROCM_VERSION=${ROCM_VERSION:-}" \
--build-arg "PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH:-gfx900;gfx906}" \
--build-arg "PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH:-gfx906}" \
--build-arg "IMAGE_NAME=${IMAGE_NAME}" \
--build-arg "UCX_COMMIT=${UCX_COMMIT}" \
--build-arg "UCC_COMMIT=${UCC_COMMIT}" \
8 changes: 7 additions & 1 deletion .circleci/docker/common/install_cudnn.sh
@@ -4,7 +4,13 @@ if [[ ${CUDNN_VERSION} == 8 ]]; then
# cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
mkdir tmp_cudnn && cd tmp_cudnn
CUDNN_NAME="cudnn-linux-x86_64-8.3.2.44_cuda11.5-archive"
curl -OLs https://developer.download.nvidia.com/compute/redist/cudnn/v8.3.2/local_installers/11.5/${CUDNN_NAME}.tar.xz
if [[ ${CUDA_VERSION:0:4} == "11.7" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-8.5.0.96_cuda11-archive"
curl -OLs https://ossci-linux.s3.amazonaws.com/${CUDNN_NAME}.tar.xz
else
curl -OLs https://developer.download.nvidia.com/compute/redist/cudnn/v8.3.2/local_installers/11.5/${CUDNN_NAME}.tar.xz
fi

tar xf ${CUDNN_NAME}.tar.xz
cp -a ${CUDNN_NAME}/include/* /usr/include/
cp -a ${CUDNN_NAME}/include/* /usr/local/cuda/include/
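
The archive selection in this hunk can be restated as a small Python sketch. It assumes the flattened diff is read correctly, namely that the `if`/`else` branch is the added code and that it only runs when `CUDNN_VERSION` is 8. The function name `select_cudnn_archive` is hypothetical and does not exist in the repository.

```python
from typing import Tuple


def select_cudnn_archive(cuda_version: str) -> Tuple[str, str]:
    """Return (archive name, download URL), mirroring the shell branch above."""
    # Same test as the shell expression ${CUDA_VERSION:0:4} == "11.7".
    if cuda_version[:4] == "11.7":
        name = "cudnn-linux-x86_64-8.5.0.96_cuda11-archive"
        url = f"https://ossci-linux.s3.amazonaws.com/{name}.tar.xz"
    else:
        name = "cudnn-linux-x86_64-8.3.2.44_cuda11.5-archive"
        url = (
            "https://developer.download.nvidia.com/compute/redist/cudnn/"
            f"v8.3.2/local_installers/11.5/{name}.tar.xz"
        )
    return name, url


print(select_cudnn_archive("11.7.0")[0])
print(select_cudnn_archive("11.6.2")[0])
```

Because only the first four characters are compared, any patch release of CUDA 11.7 selects the 8.5.0.96 archive.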
1 change: 1 addition & 0 deletions .circleci/docker/ubuntu-cuda/Dockerfile
@@ -118,6 +118,7 @@ COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm

# Install CUDNN
ARG CUDNN_VERSION
ARG CUDA_VERSION
COPY ./common/install_cudnn.sh install_cudnn.sh
RUN if [ "${CUDNN_VERSION}" -eq 8 ]; then bash install_cudnn.sh; fi
RUN rm install_cudnn.sh
2 changes: 1 addition & 1 deletion .circleci/scripts/windows_cudnn_install.sh
@@ -18,7 +18,7 @@ case ${CUDA_VERSION} in
;;
11.7)
# Use cudnn8.3 with hard-coded cuda11.5 version
cudnn_file_name="cudnn-windows-x86_64-8.3.2.44_cuda11.5-archive"
cudnn_file_name="cudnn-windows-x86_64-8.5.0.96_cuda11-archive"
;;
*)
echo "CUDA_VERSION: ${CUDA_VERSION} not supported yet"
2 changes: 1 addition & 1 deletion .github/ci_commit_pins/torchdynamo.txt
@@ -1 +1 @@
ffd056dc1510bdfecafb689ed87601055694f3e6
41c44bc1d080d6cf063419a4166732b983b84eef
2 changes: 1 addition & 1 deletion .github/ci_commit_pins/vision.txt
@@ -1 +1 @@
a67cc87a33a3f713aebf5299bdeb2672c98e0bc5
a4f53308b2d0f1aa9191686e326f45c26053f686
2 changes: 1 addition & 1 deletion .github/ci_commit_pins/xla.txt
@@ -1 +1 @@
e0dcc3171c8024ab288551d105fba24fbfae7332
307af4313d2b0b0236618ef837959a41068cc272
4 changes: 4 additions & 0 deletions .github/merge_rules.yaml
@@ -3,6 +3,7 @@
- .jenkins/caffe2/*
- aten/src/ATen/core/interned_strings.h
- docs/source/onnx.rst
- docs/source/onnx*
- docs/source/scripts/onnx/**
- scripts/onnx/**
- test/jit/test_export_modes.py
@@ -15,6 +16,8 @@
- torch/csrc/jit/serialization/onnx.*
- torch/csrc/onnx/**
- torch/onnx/**
- third_party/onnx
- caffe2/python/onnx/**
approved_by:
- BowenBao
- abock
@@ -323,6 +326,7 @@
- '*'
approved_by:
- pytorch/metamates
- mruberry
mandatory_checks_name:
- Facebook CLA Check
- Lint
2 changes: 1 addition & 1 deletion .github/scripts/generate_binary_build_matrix.py
@@ -13,7 +13,7 @@
from typing import Dict, List, Tuple, Optional


CUDA_ARCHES = ["10.2", "11.3", "11.6", "11.7"]
CUDA_ARCHES = ["10.2", "11.6", "11.7"]


ROCM_ARCHES = ["5.1.1", "5.2"]
9 changes: 0 additions & 9 deletions .github/scripts/generate_ci_workflows.py
@@ -207,15 +207,6 @@ class OperatingSystem:
),
]
WINDOWS_BINARY_SMOKE_WORKFLOWS = [
BinaryBuildWorkflow(
os=OperatingSystem.WINDOWS,
package_type="wheel",
build_configs=generate_binary_build_matrix.generate_wheels_matrix(
OperatingSystem.WINDOWS,
arches=["11.3"],
python_versions=["3.7"]),
branches="master",
),
BinaryBuildWorkflow(
os=OperatingSystem.WINDOWS,
package_type="libtorch",
2 changes: 2 additions & 0 deletions .github/workflows/_linux-test.yml
@@ -117,6 +117,7 @@ jobs:
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PR_BODY: ${{ github.event.pull_request.body }}
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
SCCACHE_S3_KEY_PREFIX: ${{ github.workflow }}
SHM_SIZE: ${{ contains(inputs.build-environment, 'cuda') && '2g' || '1g' }}
DOCKER_IMAGE: ${{ inputs.docker-image }}
XLA_CUDA: ${{ contains(inputs.build-environment, 'xla') && '0' || '' }}
@@ -171,6 +172,7 @@ jobs:
-e PR_LABELS \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e SCCACHE_S3_KEY_PREFIX \
-e XLA_CUDA \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
33 changes: 30 additions & 3 deletions .github/workflows/_mac-build.yml
@@ -33,6 +33,21 @@ on:
default: "3.8"
description: |
The python version to be used. Will be 3.8 by default
test-matrix:
required: false
type: string
description: |
An option JSON description of what test configs to run later on. This
is moved here from the Linux test workflow so that we can apply filter
logic using test-config labels earlier and skip unnecessary builds
outputs:
test-matrix:
value: ${{ inputs.test-matrix }}
description: An optional JSON description of what test configs to run later on.
build-outcome:
value: ${{ jobs.build.outputs.build-outcome }}
description: The outcome of the build step. This is used to influence test filtering logic later on.

secrets:
MACOS_SCCACHE_S3_ACCESS_KEY_ID:
@@ -52,6 +67,8 @@ jobs:
AWS_ACCESS_KEY_ID: ${{ secrets.MACOS_SCCACHE_S3_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.MACOS_SCCACHE_S3_SECRET_ACCESS_KEY }}
BUILD_ENVIRONMENT: ${{ inputs.build-environment }}
outputs:
build-outcome: ${{ steps.build.outcome }}
steps:
# [see note: pytorch repo ref]
- name: Checkout PyTorch
@@ -90,21 +107,31 @@ jobs:
with:
github-token: ${{ secrets.GITHUB_TOKEN }}

# Apply the filter logic to the build step too if the test-config label is already there
- name: Select all requested test configurations (if the test matrix is available)
id: filter
uses: ./.github/actions/filter-test-configs
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
test-matrix: ${{ inputs.test-matrix }}

- name: Build
if: steps.filter.outputs.is-test-matrix-empty == 'False' || inputs.test-matrix == ''
id: build
env:
OUR_GITHUB_JOB_ID: ${{ steps.get-job-id.outputs.job-id }}
run: |
echo "CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}" >> "${GITHUB_ENV}"
${CONDA_RUN} .jenkins/pytorch/macos-build.sh
- name: Archive artifacts into zip
if: inputs.build-generates-artifacts
if: inputs.build-generates-artifacts && steps.build.outcome != 'skipped'
run: |
zip -1 -r artifacts.zip dist/ build/.ninja_log build/compile_commands.json .pytorch-test-times.json
- name: Store PyTorch Build Artifacts on GHA
uses: actions/upload-artifact@v2
if: inputs.build-generates-artifacts
if: inputs.build-generates-artifacts && steps.build.outcome != 'skipped'
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
@@ -114,7 +141,7 @@ jobs:
- name: Upload sccache stats to GHA
uses: actions/upload-artifact@v2
# Only if sccache is installed, see above
if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }}
if: ${{ (github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository) && steps.build.outcome != 'skipped' }}
with:
name: sccache-stats-${{ inputs.build-environment }}-runattempt${{ github.run_attempt }}-${{ steps.get-job-id.outputs.job-id }}
retention-days: 14
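
The `if:` conditions added to the build and artifact steps in this file can be read as two predicates. The Python below is only a restatement of those workflow expressions for illustration; the function names are hypothetical, and the string values `'False'` and `'skipped'` are taken from the conditions as written in the diff.

```python
def should_build(test_matrix: str, is_test_matrix_empty: str) -> bool:
    """Mirror of the Build step condition:
    steps.filter.outputs.is-test-matrix-empty == 'False' || inputs.test-matrix == ''
    """
    return is_test_matrix_empty == "False" or test_matrix == ""


def should_archive(build_generates_artifacts: bool, build_outcome: str) -> bool:
    """Mirror of: inputs.build-generates-artifacts && steps.build.outcome != 'skipped'"""
    return build_generates_artifacts and build_outcome != "skipped"


# No test matrix was passed in: build unconditionally.
print(should_build("", "True"))
# A matrix was passed but filtering left nothing to test: skip the build.
print(should_build('{"include": []}', "True"))
# A skipped build produces no artifacts to archive.
print(should_archive(True, "skipped"))
```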
28 changes: 25 additions & 3 deletions .github/workflows/_mac-test.yml
@@ -33,16 +33,38 @@ on:
description: secret acess key for test stats upload

jobs:
# This needs to be run right before the test starts so that it can gather the
# latest labels from the PR
filter:
runs-on: [self-hosted, linux.large]
outputs:
test-matrix: ${{ steps.filter.outputs.test-matrix }}
is-test-matrix-empty: ${{ steps.filter.outputs.is-test-matrix-empty }}
steps:
- name: Checkout PyTorch
uses: pytorch/pytorch/.github/actions/checkout-pytorch@master
with:
fetch-depth: 1
submodules: false

- name: Select all requested test configurations
id: filter
uses: ./.github/actions/filter-test-configs
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
test-matrix: ${{ inputs.test-matrix }}

test:
# Don't run on forked repos.
if: github.repository_owner == 'pytorch'
needs: filter
# Don't run on forked repos or empty test matrix
if: github.repository_owner == 'pytorch' && needs.filter.outputs.is-test-matrix-empty == 'False'
# For setup-miniconda, see https://github.com/conda-incubator/setup-miniconda/issues/179
# Also ensure that we always run with the right architecture
defaults:
run:
shell: arch -arch ${{ inputs.arch }} bash -e -l {0}
strategy:
matrix: ${{ fromJSON(inputs.test-matrix) }}
matrix: ${{ fromJSON(needs.filter.outputs.test-matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
timeout-minutes: 240
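
The gating of the `test` job on the `filter` job can be sketched in the same way. This is a hedged restatement, not code from the repository: `plan_test_job` is a hypothetical name, `json.loads` stands in for the workflow's `fromJSON`, and the example matrix contents are invented.

```python
import json


def plan_test_job(repository_owner, filtered_matrix_json, is_test_matrix_empty):
    """Return the parsed test matrix, or None when the test job would be skipped.

    Mirrors: github.repository_owner == 'pytorch'
             && needs.filter.outputs.is-test-matrix-empty == 'False'
    """
    if repository_owner != "pytorch" or is_test_matrix_empty != "False":
        return None
    return json.loads(filtered_matrix_json)


# Invented example matrix; the real keys and values come from the calling workflow.
example = '{"include": [{"config": "default", "shard": 1, "runner": "example-runner"}]}'
matrix = plan_test_job("pytorch", example, "False")
print([entry["runner"] for entry in matrix["include"]])
print(plan_test_job("some-fork", example, "False"))
```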