Pulse · pytorch/pytorch · GitHub

May 25, 2024 – June 1, 2024

Overview

203 Active pull requests

216 Active issues

7 Pull requests merged by 5 people

[EZ] Pin scipy to 1.12 for Py-3.12
#127322 merged May 29, 2024
Update hf_BirdBird periodic-dynamo-benchmarks results
#127312 merged May 29, 2024
Put back "[Release only] Release 2.3 start using triton package from pypi""
#127290 merged May 28, 2024
[DSD] Fix to remove non_persistent buffer in distributed state dict (#125337)
#127219 merged May 27, 2024
[DSD] Add a test to verify FSDP lazy initialization case (#127069)
#127130 merged May 27, 2024
[DCP][state_dict] Remove the check of FSDP has root (#121544)
#126557 merged May 27, 2024
[DSD] Correctly handle _extra_state (#125336)
#126567 merged May 27, 2024

196 Pull requests opened by 110 people

Warn unused functions
#127185 opened May 26, 2024
Quick Fix on #126854, deepcopy `lr` and other possible `base_parameters`
#127190 opened May 26, 2024
[inductor][cpp] BF16 AMX micro-gemm support
#127195 opened May 26, 2024
Reduce number of samples in {svd,pca}_lowrank OpInfos
#127199 opened May 26, 2024
Implement a generic function scheduler
#127200 opened May 26, 2024
[Inductor][CPP] Enable int8 GEMM Template
#127206 opened May 27, 2024
Add error checking for boolean beta and alpha for fake tensor impl of torch.addr
#127207 opened May 27, 2024
[Inductor] Skip model_fail_to_load and eager_fail_to_run models in inductor benchmarks test
#127210 opened May 27, 2024
[Inductor][CPP] Fix FP16 GEMM Template UT failure with FP16 instruction support
#127211 opened May 27, 2024
[Inductor][CPP] Fallback QLinear Binaryfusion from postop sum to binary add when others is view
#127212 opened May 27, 2024
Enable OP: convtranspose into operator benchmark
#127216 opened May 27, 2024
Add OpInfo entry for as_strided_copy
#127231 opened May 27, 2024
Add OpInfo entry for add_alias
#127232 opened May 27, 2024
[Clang Tidy] Fix misc-header-include-cycle errors in clang-tidy and ignore some files
#127233 opened May 27, 2024
[easy?] Move AsyncCompile to a different file
#127235 opened May 27, 2024
[Traceable FSDP2] Make ._unsharded_param creation traceable
#127238 opened May 27, 2024
[Do not review] Top of Traceable FSDP2 stack - to be broken up
#127239 opened May 27, 2024
[MPS] Fused Adam & AdamW
#127242 opened May 27, 2024
[Traceable FSDP2] Make ._unsharded_param creation traceable
#127243 opened May 28, 2024
[DO NOT REVIEW][NOT USED] Add queue_callback support
#127244 opened May 28, 2024
[Traceable FSDP2] Make ._unsharded_param creation traceable
#127245 opened May 28, 2024
[DO NOT REVIEW][NOT USED] Add queue_callback support
#127246 opened May 28, 2024
Add Dynamo support for run_with_rng_state HOP
#127247 opened May 28, 2024
[Traceable FSDP2] Improve FSDPManagedNNModuleVariable support
#127249 opened May 28, 2024
fix code
#127252 opened May 28, 2024
Support SetVariable mutation
#127253 opened May 28, 2024
Check hasattr before comparing source name
#127254 opened May 28, 2024
[Traceable FSDP2] Workaround in nn_module_proxy()
#127255 opened May 28, 2024
[WIP] Improve __bool__ access handling for UserDefinedObjectVariable
#127257 opened May 28, 2024
Trace TensorVariable attribute mutation if call_setattr is called
#127258 opened May 28, 2024
Add support for register_post_accumulate_grad_hook
#127259 opened May 28, 2024
Add tracing support for out-variant custom ops that return None
#127260 opened May 28, 2024
[NOT USED] support inplace all_gather
#127261 opened May 28, 2024
[Doc] fix some typos (found by codespell and typos)
#127267 opened May 28, 2024
Add new Inductor-friendly op _fp8_mm and lowering
#127268 opened May 28, 2024
[quant] Enable auto code completion by import during type checking
#127269 opened May 28, 2024
add xpu for amp
#127276 opened May 28, 2024
Enable deterministic support for oneDNN
#127277 opened May 28, 2024
update amp example to device-agnostic
#127278 opened May 28, 2024
add xpu to torch.compile
#127279 opened May 28, 2024
add xpu to torch.tensors
#127280 opened May 28, 2024
Select Runner Label Dynamically
#127287 opened May 28, 2024
[inductor] enable bf32 test for mkldnn conv
#127293 opened May 28, 2024
[inductor] enable bf32 for mkldnn linear pointwise/binary in inductor
#127294 opened May 28, 2024
functorch x torch.compile: Remove functools.wraps hack
#127302 opened May 28, 2024
Remove parts of AOTDedupWrapper logic since we have dynamo dedup
#127306 opened May 28, 2024
[halide-backend] Add test shard
#127308 opened May 28, 2024
Use by-column algorithm for fp16/bf16 CPUBlas gemm_transb kernels
#127318 opened May 28, 2024
Remove tensor storage_offset/storage_bytes from the cache key
#127319 opened May 28, 2024
[PT2Quant] Migrate move_exported_model_to_eval to torch.export._trace._export
#127323 opened May 28, 2024
[export] Fix fake mode detection with empty inputs.
#127327 opened May 28, 2024
[do-not-review][wip] remove TLS cudagraph tree manager for CA
#127330 opened May 28, 2024
[wip] mark compiled autograd params as static addresses for cudagraphs
#127331 opened May 28, 2024
[pipelining] rewrite interleaved 1f1b
#127332 opened May 28, 2024
[Inductor] Emit strided block pointer from ModularIndexing and FloorDiv
#127342 opened May 28, 2024
add auto-functionalize support for mutable list[Tensor]
#127347 opened May 28, 2024
Retire torch.distributed.pipeline
#127354 opened May 28, 2024
[Functorch][cuDNN] Bump tolerances for `test_vmapjvpvjp`
#127355 opened May 28, 2024
[dtensor][debug] added c10d reduce_scatter_ and reduce_scatter_tensor_coalesced tracing_ to CommDebugMode
#127358 opened May 28, 2024
[dtensor][debug] added c10d alltoall_ and alltoall_base_ to CommDebugMode
#127360 opened May 28, 2024
[c10d] guard gpu context during abort
#127363 opened May 29, 2024
UFMT format on test_fake_tensor.py test_futures.py test_fx.py
#127369 opened May 29, 2024
Migrate CalculateSmallVectorDefaultInlinedElements to constexpr Function for SmallVector
#127370 opened May 29, 2024
Remove cuda check in the CUDAGraph destructor
#127382 opened May 29, 2024
Masked scale meta function registration #119984
#127389 opened May 29, 2024
Build SYCL kernels for ATen XPU ops on Native Windows (take 2)
#127390 opened May 29, 2024
[sym_shapes] Allow check_is_size for backed symints
#127395 opened May 29, 2024
[inductor] Enable subprocess-based parallel compile internally
#127401 opened May 29, 2024
Handle unpacking during TorchScript to ExportedProgram conversion
#127419 opened May 29, 2024
[c10d] Add commCreateFromRanks to c10d
#127421 opened May 29, 2024
[inductor] Take absolute value of strides when picking loop order
#127425 opened May 29, 2024
Implement Graph Transform Observer
#127427 opened May 29, 2024
Add NaturalDiv to distinguish from FloorDiv/TruncDiv
#127430 opened May 29, 2024
[DRAFT] nested tensor subclass support
#127431 opened May 29, 2024
[export] provide refine function for automatically accepting dynamic shapes suggested fixes
#127436 opened May 29, 2024
[jit] Validate mobile module fields parsed by flatbuffer loader
#127437 opened May 29, 2024
Update to cuda 12.4.1
#127439 opened May 29, 2024
[TESTING] Run inductor CI on custom Triton pin
#127450 opened May 29, 2024
force_stride_order on fused_all_gather_matmul/fused_matmul_reduce_scatter's operands to avoid a copy due to layout transformation
#127454 opened May 29, 2024
Improve the scheduling for fused_matmul_reduce_scatter
#127455 opened May 29, 2024
fix post_grad pattern
#127457 opened May 29, 2024
Add typing annotations to pattern_matcher.py
#127458 opened May 29, 2024
documentation for pattern_matcher.py
#127459 opened May 29, 2024
Add registry for TorchScript to ExportedProgram conversion
#127464 opened May 29, 2024
update test_reformer_train test to handle nn module inlining
#127467 opened May 29, 2024
Add new convenience checks for PyTorch operators/kernels
#127469 opened May 29, 2024
Save quantization_tag in export graph serialization
#127473 opened May 29, 2024
[pipelining] Stress test schedules with multi iters
#127475 opened May 29, 2024
Onboard ARM bfloat16 to gemm-by-dot-product-for-gemm_transa_ infrastructure
#127477 opened May 29, 2024
Patch ARM Half use_gemv_fast_path gate to avoid kernel duplication
#127478 opened May 29, 2024
[pipelining] Add forward-only tests to mimic inference
#127479 opened May 29, 2024
Onboard ARM bfloat16 to gemv fast path
#127484 opened May 29, 2024
reset dynamo in test_do_not_skip_side_effects unit test loop to avoid dynamo cache limit hit
#127487 opened May 30, 2024
Use bfdot instruction in ARM bfloat16 dot product if compiling with it available
#127488 opened May 30, 2024
[caffe2][be] migrate global static initializer
#127489 opened May 30, 2024
[NestedTensor] Extend coverage for unbind when ragged_idx != 1
#127493 opened May 30, 2024
Properly detect nested torch function args
#127496 opened May 30, 2024
Check unused variables in tests
#127498 opened May 30, 2024
Use less A100.large shards for torchao perf benchmark
#127499 opened May 30, 2024
[cpuinfo] bump cpuinfo to the latest to support amx isa check
#127505 opened May 30, 2024
[halide-backend] Add GPU support
#127506 opened May 30, 2024
[Intel GPU]Enable fp64 GEMM
#127507 opened May 30, 2024
[Intel GPU]Enable fp64 double GEMM
#127508 opened May 30, 2024
[BE][ptd_fb_test][1/N] Enable testslide
#127512 opened May 30, 2024
Add linker script optimization flag to CMAKE rule for CUDA ARM wheel
#127514 opened May 30, 2024
Supervisor as a torchrun rendezvous impl
#127515 opened May 30, 2024
Try setting memory budget to 0.5 by default
#127520 opened May 30, 2024
[TORCH_FA2_flash_api] Update total_q to the reshaped query 0th dimension
#127524 opened May 30, 2024
Validate file and handle exceptions for weights_only unpickler
#127526 opened May 30, 2024
[ROCm] Fix error in torch.cuda initialisation if amdsmi is not available
#127528 opened May 30, 2024
[WIP] Add decomposition for aten.bernoulli.p
#127537 opened May 30, 2024
Always simplify sympy expressions before printing.
#127543 opened May 30, 2024
Handle aten::__contains__ during TorchScript to ExportedProgram conversion
#127544 opened May 30, 2024
Add noqa to prevent lint warnings
#127545 opened May 30, 2024
Update doc for nn.Module.load_state_dict(assign=True) to be more clear about semantic
#127549 opened May 30, 2024
add wider support for input mutations during backward in torch.compile, as long as they are hidden from autograd
#127551 opened May 30, 2024
update test_ddp_graphs to reflect inlined nn module behaviour, ddp graph splitting not triggered when modules inlined.
#127553 opened May 30, 2024
[export] only add guard-related runtime asserts once, silence if needed
#127554 opened May 30, 2024
Add profiler annotation for fused_all_gather_matmul and fused_matmul_reduce_scatter
#127556 opened May 30, 2024
[pipelining] test pipeline_order in schedule
#127559 opened May 30, 2024
Remove unstable ARC jobs
#127563 opened May 30, 2024
Joy dev
#127564 opened May 30, 2024
[Split Build] Make libtorch_global_deps accessible from libtorch wheel
#127570 opened May 30, 2024
[DO NOT MERGE] Testing new runners
#127573 opened May 30, 2024
[AOTInductor] [Tooling] Update NaN and INF Checker for AOTInductor
#127574 opened May 30, 2024
Use freshly traced jit-traced module to be used in export analysis
#127577 opened May 30, 2024
[DO NOT MERGE] Testing additional LF config labels
#127579 opened May 30, 2024
Handle custom op during TorchScript to ExportedProgram conversion
#127580 opened May 30, 2024
[inductor] fix redis-related env vars in remote_cache.py
#127583 opened May 30, 2024
[FSDP2] Fix submesh slicing to enable 3D parallelism
#127585 opened May 31, 2024
[WIP] mark NestedInts as symints instead of symfloats
#127587 opened May 31, 2024
[CI] Comment hf_T5_generate, hf_GPT2 and timm_efficientnet in inductor cpu smoketest for performance unstable issue
#127588 opened May 31, 2024
[Do not merge] [Test] Test builder cudnn v9 change
#127589 opened May 31, 2024
[Quant][Inductor] Add get_mutation_names in QLinearPointwiseBinaryPT2E IR
#127592 opened May 31, 2024
[ts migration] support aten::dim, aten::len, aten::__getitem__
#127593 opened May 31, 2024
Disable one of the GPT2 SDPA patterns for single-thread case
#127594 opened May 31, 2024
WIP: fake tensor SymInt support
#127596 opened May 31, 2024
test linear add bias
#127597 opened May 31, 2024
[Inductor][CPP] Support more than one LocalBuffer
#127598 opened May 31, 2024
[fbgemm_gpu] remove deleted embedding_backward_dense_host references
#127599 opened May 31, 2024
Don't find statically linked libs in TorchConfig.cmake.in
#127601 opened May 31, 2024
[WIP] Reuse UT for Intel GPU backend [Part1]
#127602 opened May 31, 2024
[export][unflatten] More strictly respect scope when removing inputs
#127607 opened May 31, 2024
Autoselect default device in FSDP construction.
#127609 opened May 31, 2024
[AOTI] align data_size of the constants
#127610 opened May 31, 2024
[CI] disable td for xpu ci test by default
#127611 opened May 31, 2024
[inductor] custom do_bench_gpu to reduce max-autotune overhead
#127613 opened May 31, 2024
Add functionality to make ViewAndMutationData (slightly more) cache safe
#127618 opened May 31, 2024
Validate tensor storage before a direct access to it
#127619 opened May 31, 2024
[caffe2][be][2/n] migrate global static initializer
#127620 opened May 31, 2024
Updating Module Tracker
#127624 opened May 31, 2024
[Testing only] Flip default on weights_only
#127627 opened May 31, 2024
[RFC] Introduce Checkpointable for DCP (#127540)
#127628 opened May 31, 2024
[Inductor UT][Intel GPU] Skip test case which doesn't currently work on the XPU stack but newly re-enabled by community.
#127629 opened May 31, 2024
[dtensor][debug] created example test to print module parameters
#127631 opened May 31, 2024
wip test change
#127632 opened May 31, 2024
[export] Handle serializing duplicate getitem nodes
#127633 opened May 31, 2024
[aten_cuda/flash_attn] Add typename to template argument Kernel_trait…
#127634 opened May 31, 2024
[DSD] Fixes various bugs for broadcast_from_rank0
#127635 opened May 31, 2024
[inductor] parallel-compile: call triton_key() before forking
#127639 opened May 31, 2024
[ONNX] Add quantized layer norm op to opset 17
#127640 opened May 31, 2024
View specialization
#127641 opened May 31, 2024
[ATen][Native] fixes sparse SPMV on aarch64
#127642 opened May 31, 2024
[FSDP] keep paras in torch.distributed.checkpoint.state_dict.set_optimizer_state_dict
#127644 opened May 31, 2024
GGML inspired int8 MM Metal shader
#127646 opened May 31, 2024
Test jit
#127647 opened May 31, 2024
[PT2][Optimus] Improve group batch fusion with same parent/users fusion enablement
#127648 opened May 31, 2024
Relax Symbol check on debug_compile mode
#127650 opened May 31, 2024
[Perf] Provide API to retrieve less data from recorder
#127651 opened May 31, 2024
[CI] Gen xml during run
#127653 opened May 31, 2024
[c10d][BE] fix test_init_pg_and_rpc_with_same_socket
#127654 opened May 31, 2024
[Caffe2] Remove Caffe2 proto and other files
#127655 opened May 31, 2024
[ts-migration] support aten::__is__, aten::__isnot__, aten::__not__, profiler::_record_function_enter_new, profiler::_record_function_exit
#127656 opened May 31, 2024
[inductor] simplify indexing
#127661 opened Jun 1, 2024
Allow symint inputs to aten.expand_copy and aten.view_copy
#127662 opened Jun 1, 2024
Inductor: Allow small sizes of m for mixed mm autotuning
#127663 opened Jun 1, 2024
adjust thresholds for gluon_inception_v3, beit_base_patch16_224, phli…
#127664 opened Jun 1, 2024
[c10d] add a simple test to demonstrate the user usage of collectives
#127665 opened Jun 1, 2024
[wip][cudagraphs] static input params flag
#127668 opened Jun 1, 2024
[DO NOT MERGE] Fuzzkatt/12 4 inductor tolerance fixes debug
#127669 opened Jun 1, 2024
declare 'static' if the function is not intended to be used outside o…
#127670 opened Jun 1, 2024
[Inductor][CI][CUDA 12.4] Update dynamic_inductor_timm_training.csv - change gluon_inception_v3 from fail_accuracy to pass
#127672 opened Jun 1, 2024
[pipelining][review-only] Simple 1F1B schedule
#127673 opened Jun 1, 2024
[AOTI] Switch to use shim v2
#127674 opened Jun 1, 2024
[WIP] add optional boolean arg "true" to cumsum signature
#127675 opened Jun 1, 2024
[Inductor][Flex-attention] Support different sequence lengths for Query and Key/Value
#127678 opened Jun 1, 2024
[symbolic shapes] if symbol not in var_to_range default to unknown range
#127681 opened Jun 1, 2024
wip
#127682 opened Jun 1, 2024
Simplify CMake code
#127683 opened Jun 1, 2024
[BE][Ez]: Enable ruff PYI019
#127684 opened Jun 1, 2024
[BE][Ez]: Apply PYI059 - Generic always come last
#127685 opened Jun 1, 2024
[inductor][cpp] align dtype of local buffer with the global one
#127687 opened Jun 1, 2024
[BE]: Enable ruff TCH rules and autofixes for better imports
#127688 opened Jun 1, 2024
[BE] wrap deprecated function/class with `typing_extensions.deprecated`
#127689 opened Jun 1, 2024
Deprecate `torch._utils.is_compiling()` and `torch._dynamo.external_utils.is_compiling()`
#127690 opened Jun 1, 2024
Retry of D58015187 Move AsyncCompile to a different file
#127691 opened Jun 1, 2024

112 Issues closed by 33 people

TreeSpec equal logic return False for the same tree in str format
#127447 closed Jun 1, 2024
Upgrade MacOS runner to 14
#127490 closed Jun 1, 2024
UNSTABLE trunk / macos-py3-arm64
#127542 closed Jun 1, 2024
When testing the scalar version, test_torchinductor.py will fail
#126763 closed Jun 1, 2024
[Feature] `_foreach_copy_` supports different src/dst dtype with fusion
#115171 closed Jun 1, 2024
Segmentation fault between Numpy and Pytorch using torch.bmm
#93161 closed May 31, 2024
Type conversion between float/complex
#97888 closed May 31, 2024
PyTorch Distributed Load Updates or Returns `state_dict`
#125096 closed May 31, 2024
[NestedTensor] RelaxedUnspecConstraint failures due to mark_dynamic in NT constructor
#127097 closed May 31, 2024
Build without MKL is not possible when MKL is installed
#32407 closed May 31, 2024
[RFC] rename allow_in_graph to unsafe_allow_in_graph
#122189 closed May 31, 2024
dcp resharding does not work for optimizer state_dict
#91205 closed May 31, 2024
Remove `coordinator_rank` from public fns in distributed checkpointing, e.g. `load`
#119205 closed May 31, 2024
Support torch.distributed.checkpoint.state_dict.get_model_state_dict/set_model_state_dict when torch dist is not initialized
#124942 closed May 31, 2024
DCP sees 1/2 of the expected size of each tensor in 3D parallel
#126595 closed May 31, 2024
Loading Old Checkpoints with DTensor
#127351 closed May 31, 2024
The training always freezes after some epochs.
#22671 closed May 31, 2024
[AOTI][cpp_wrapper] test_triton_heuristics.py::test_artificial_grid_cpp_wrapper disabled
#123210 closed May 31, 2024
[ONNX] export.export + dynamo_export fail on GLU because decompositions are not run
#125894 closed May 31, 2024
dynamo breaks when getting attributes of builtins
#127172 closed May 31, 2024
[optim] add missing 'maximize' parameter to LBFGS, NAdam and RAdam optimizers
#126642 closed May 31, 2024
[ONNX] Memory leak when exporting a jit model to onnx
#82532 closed May 31, 2024
[ONNX] Simplify fake tensor export logic to get rid of the redundant model argument
#127141 closed May 31, 2024
Do we need an N-dim sub-DeviceMesh?
#126530 closed May 30, 2024
Fix torch._dynamo.exc.Unsupported: builtin: index when nn module inlining enabled.
#127426 closed May 30, 2024
Incorrect output from inductor: tile -> as_strided -> add
#127474 closed May 30, 2024
saved_tensors_hooks auto delete custom attributes
#126676 closed May 30, 2024
Add a `descending` flag to `linalg.eigh` and `linalg.svd`
#58034 closed May 30, 2024
UNSTABLE periodic / linux-jammy-xpu-py3.8 / test (default)
#127539 closed May 30, 2024
Error while parsing dispatch dictionary for NativeFunctions YAML in torchgen
#127405 closed May 30, 2024
Error while parsing NativeFunction from YAML in torchgen
#127406 closed May 30, 2024
Error while parsing DeviceCheckType for NativeFunctions YAML in torchgen
#127407 closed May 30, 2024
Error while parsing function signature for NativeFunctions YAML in torchgen
#127408 closed May 30, 2024
Missing structured delegate function in YAML in torchgen
#127409 closed May 30, 2024
Error while parsing Yaml file in torchgen
#127410 closed May 30, 2024
Error while parsing precomputed field for NativeFunctions YAML in torchgen
#127411 closed May 30, 2024
Error while parsing function parameter with default value for NativeFunctions YAML in torchgen
#127404 closed May 30, 2024
Inconsistent gradients of Conv2d layers when training the same model using CPU and GPU
#127226 closed May 30, 2024
DISABLED test_non_contiguous_input_addmm (__main__.TestMaxAutotune)
#126176 closed May 30, 2024
DISABLED test_activations_bfloat16_half_cpu_cpu_float16 (__main__.TestNNDeviceTypeCPU)
#126177 closed May 30, 2024
DISABLED test_quantization_doc_fx (__main__.TestQuantizationDocs)
#125670 closed May 30, 2024
DISABLED test_quantization_doc_ptsq (__main__.TestQuantizationDocs)
#125669 closed May 30, 2024
DISABLED test_quantization_doc_custom (__main__.TestQuantizationDocs)
#125668 closed May 30, 2024
DISABLED test_quantization_doc_ptdq (__main__.TestQuantizationDocs)
#125667 closed May 30, 2024
PyTorch nightly docs build hasn't run since 4/8?
#127527 closed May 30, 2024
Check for error messages on torch.compile with pybind'ed functions
#126799 closed May 30, 2024
torch.svd_lowrank does not work for complex matrices.
#122188 closed May 30, 2024
error: ‘False’ was not declared in this scope
#127392 closed May 30, 2024
[Inductor] Masked `tl.load` operations should explicitly include `other` if the masked out values are expected to be used
#126535 closed May 30, 2024
Bad error message for aten::_local_scalar_dense on meta tensor
#119588 closed May 30, 2024
Document torch.Tensor legacy constructor
#122408 closed May 29, 2024
torch.fx.passes.split_module.split_module doesn't support dynamic shapes
#103539 closed May 29, 2024
[While_loop] How to use layer like `torch.nn.BatchNorm2d` with while_loop?
#127320 closed May 29, 2024
Add `str` type to `device` parameter of `torch.cuda.get_device_name()` on the doc
#126400 closed May 29, 2024
[MPS] `.to('mps')` zeroes out elements in tensors taking up >=2^32 bytes
#96716 closed May 29, 2024
Fix "failed running function instead of failed running module" when nn module inlining enabled.
#125605 closed May 29, 2024
[docs] scaled_dot_product_attention is_causal description is misleading
#126873 closed May 29, 2024
Fix torch._dynamo.exc.Unsupported: call_id with args (UnspecializedNNModuleVariable() when TORCHDYNAMO_INLINE_INBUILT_NN_MODULES=1
#127095 closed May 29, 2024
running opcheck leads to `Fail to import hypothesis in common_utils, tests are not derandomized` print
#126871 closed May 29, 2024
opcheck has dependency on expecttest, which is not a pytorch runtime dependency, leading to "module not found" error message
#126870 closed May 29, 2024
UNSTABLE linux-binary-manywheel / manywheel-py3_8-cuda11_8
#104727 closed May 29, 2024
UNSTABLE linux-binary-manywheel / manywheel-py3_8-cuda12_1-test / test
#127288 closed May 29, 2024
MAX-Autotune Compilation Time Regression Due To Added MM Configs
#125687 closed May 29, 2024
UNSTABLE linux-binary-manywheel / manywheel-py3_8-cuda12_4-test / test
#127289 closed May 29, 2024
AOTI doesn't produce meaningful response w/ CPU backend on Linux x86
#123990 closed May 29, 2024
[torch.compile] `index_select` out of bound read
#121251 closed May 29, 2024
[pipelining] Add back support for multi-use parameters/buffers
#126626 closed May 29, 2024
install cuda version always get cpuonly
#106565 closed May 28, 2024
test_view_dynamic_zero_dim no longer testing zero input
#105066 closed May 28, 2024
DISABLED test_some_outputs_dont_require_grad_view (__main__.TestAOTAutograd)
#125593 closed May 28, 2024
DISABLED test_some_output_requires_grad_input_doesnt (__main__.TestAOTAutograd)
#125402 closed May 28, 2024
DISABLED test_schema_correctness_nn_functional_conv3d_cuda_complex128 (__main__.TestSchemaCheckModeOpInfoCUDA)
#114573 closed May 28, 2024
DISABLED test_view_and_inplace_view (__main__.TestAOTAutograd)
#125671 closed May 28, 2024
DISABLED test_aot_export_multiple_outputs_require_grad_banned (__main__.TestAOTExport)
#124221 closed May 28, 2024
Profiler reports different # of Calls depending on group_by_stack_n
#83737 closed May 28, 2024
DISABLED test_profiler (__main__.TestJit)
#65521 closed May 28, 2024
In profiler, recorded block's total time can be less than the operators within the block
#43868 closed May 28, 2024
RuntimeError: MPS device does not support bmm for non-float inputs
#127178 closed May 28, 2024
Data race in RecordFunction::callbackShouldRun
#58452 closed May 28, 2024
Segmentation fault when importing `sklearn.model_selection`
#127192 closed May 28, 2024
UNSTABLE pull / linux-focal-cuda12.4-py3.10-gcc9 / build
#127108 closed May 28, 2024
pyyaml dependency error on Mac with source installation
#127158 closed May 28, 2024
Overly demanding PyTorch CLA
#127285 closed May 28, 2024
A UserWarning occurs after CBAM attention is added
#127198 closed May 28, 2024
When training done, the mode output same result each tensor input. ( I tried many way to debug, but can't find any way to fix it, so i guess this is a bug )
#127177 closed May 28, 2024
CLA required despite being existing contributor
#127272 closed May 28, 2024
Tensor slicing reducing dimensionality of tensor
#127236 closed May 28, 2024
torch.linalg.inv returns constantly the identity matrix for input on GPU
#127281 closed May 28, 2024
Tensor Parallel cannot work when tp mesh size is 1
#127213 closed May 28, 2024
Symbolic shapes unable to reason: Ne(Mod(u0*u2 + u1*u2, u0 + u1), 0)
#125307 closed May 28, 2024
fatal: not a git repository: '.git'
#127188 closed May 28, 2024
DISABLED test_transformerdecoder (__main__.TestNN)
#126043 closed May 28, 2024
DISABLED test_retrace_export_cond_simple_cuda_float32 (__main__.TestHOPCUDA)
#123564 closed May 27, 2024
DISABLED test_conv_empty_input_cpu_float32 (__main__.TestNNDeviceTypeCPU)
#126091 closed May 27, 2024
DISABLED test_activations_bfloat16_half_cpu_cpu_bfloat16 (__main__.TestNNDeviceTypeCPU)
#126090 closed May 27, 2024
DISABLED test_correctness_Adam_use_closure_False_cuda_float32 (__main__.CompiledOptimizerParityTestsCUDA)
#126076 closed May 27, 2024
DISABLED test_correctness_Rprop_use_closure_False_cuda_float32 (__main__.CompiledOptimizerParityTestsCUDA)
#126077 closed May 27, 2024
[Distributed] gloo backend, barrier operation is even slower than broadcast
#127179 closed May 27, 2024
Time taken to data loading increased in newer builds (ARM)
#124922 closed May 27, 2024
Importing torch after TensorFlow results in std::runtime_error
#101152 closed May 27, 2024
Improve discoverability of meta function registration in documentation
#126337 closed May 27, 2024
DISABLED test_min_cut_partitioner_output_tensor_shape_tensor (__main__.TestPartitioning)
#104326 closed May 27, 2024
DISABLED test_unbacked_cat_backwards_cuda (__main__.TestInductorDynamicCUDA)
#125019 closed May 27, 2024
DISABLED test_full_tensor_sync (__main__.DTensorTest)
#125366 closed May 27, 2024
DISABLED test_depthwise_convolution (__main__.DistConvolutionOpsTest)
#125565 closed May 27, 2024
DISABLED test_pad_dynamic_cuda (__main__.TestInductorDynamicCUDA)
#124276 closed May 27, 2024
DISABLED test_all_gather_object (__main__.TestObjectCollectives)
#125566 closed May 27, 2024
DISABLED test_dtensor_device_mesh_device_conversion (__main__.DTensorMeshTest)
#100126 closed May 27, 2024
Dynamo benchmarks direct arg passing doesn't work
#126543 closed May 26, 2024
ERROR: Could not find a version that satisfies the requirement torch (from versions: none)
#127181 closed May 26, 2024
torch compile failure with bool inputs with CPP backend
#126824 closed May 25, 2024
CMake -DBUILD_PYTHON=ON doesn't make & install Python bindings.
#126831 closed May 25, 2024

104 Issues opened by 79 people

UNSTABLE inductor / cuda12.4-py3.10-gcc9-sm86 / test (dynamic_inductor_timm)
#127680 opened Jun 1, 2024
RuntimeError: `jit.freeze` fails to find externally assigned attributes
#127679 opened Jun 1, 2024
Key error in index_propagation when looking up dynamic shape vr
#127677 opened Jun 1, 2024
module 'torch.mps' has no attribute 'device'
#127676 opened Jun 1, 2024
Add line number to ` _warn_capture_scalar_outputs():`
#127667 opened Jun 1, 2024
Strange clamp assert error when building on Fedora 40/gcc 14 in IndexKernel.hip
#127666 opened Jun 1, 2024
Inductor generates unnecessary allocation + copy operations for custom ops with mutable inputs
#127660 opened May 31, 2024
DISABLED test_workspace_allocation_error (__main__.CudaGraphTreeTests)
#127657 opened May 31, 2024
Illegal memory access resulted from pointwise autotuning of a cat-like kernel
#127652 opened May 31, 2024
OneCycleLR Example
#127649 opened May 31, 2024
[export] Errors out when unflattening TorchTitan
#127643 opened May 31, 2024
c++ library written with a lot of errors
#127638 opened May 31, 2024
SyntaxError: unterminated string literal (detected at line 1) (<unknown>, line 1)
#127637 opened May 31, 2024
dynamo minifier test test_cpu_cuda_module_after_dynamo fail with nn module inlining.
#127636 opened May 31, 2024
Fix accuracy regression for cspdarknet53 or flakiness associated with cu121 (and potentially cu124)
#127626 opened May 31, 2024
Segfault, possibly due to recursion limit
#127622 opened May 31, 2024
custom_op API: better type annotation for Tuple
#127621 opened May 31, 2024
TORCH_LIBRARY breaks when passing (unexpanded) macro as namespace argument
#127615 opened May 31, 2024
[Bug] Data on CPUs Are Not Synchronized Before Subsequent Operations
#127612 opened May 31, 2024
Error in gradient of CTCLoss
#127608 opened May 31, 2024
` to be a quantized tensor. Is this likely due to missing support for quantized `onnx::Pad`
#127606 opened May 31, 2024
RuntimeError: CUDA error: an illegal memory access was encountered with specific input shape and Conv1d
#127605 opened May 31, 2024
The data was supposed to be on the GPU, but was mistakenly placed on the CPU.
#127603 opened May 31, 2024
pip install torch==2.1.0 fails to be found
#127591 opened May 31, 2024
Equivalent index expression results in 50% perf difference in a triton kernel
#127581 opened May 30, 2024
AOTAutograd: allow input mutations in the bw that occur under no_grad
#127572 opened May 30, 2024
[export] Cannot mutate tensors with frozen storage
#127571 opened May 30, 2024
Nightly s390 manywheel-py3_8-cpu-s390x-build is failing
#127568 opened May 30, 2024
AOTAutograd: the partitioner should not move mutations from the backward into the forward.
#127561 opened May 30, 2024
[libtorch] duplicate symbol of vtable
#127560 opened May 30, 2024
Reserving this issue to test something, don't mind me
#127555 opened May 30, 2024
ddp graph splitting not happening when nn modules inlined.
#127552 opened May 30, 2024
TestNestedTensorSubclassCPU::test_chunk_cpu fails in debug mode
#127546 opened May 30, 2024
RuntimeError: derivative for aten::_spdiags is not implemented
#127541 opened May 30, 2024
DISABLED test_cusparse_multiple_threads_same_device (__main__.TestCuda)
#127536 opened May 30, 2024
Can we have a Y-Split module for torch.nn.Sequential?
#127535 opened May 30, 2024
Compilation from source fails (PYTORCH 1.13.1)
#127534 opened May 30, 2024
CUDA 12.5
#127532 opened May 30, 2024
Triton `kernel.run` segfaults when passed non-default stream
#127531 opened May 30, 2024
Run rstcheck on modified docstrings and docs as additional linter
#127530 opened May 30, 2024
Multiple unhandled exceptions in weights_only unpickler
#127525 opened May 30, 2024
SDPA memory efficient and flash attention kernels don't work with singleton dimensions
#127523 opened May 30, 2024
Modeling ViT does not support quantized models
#127521 opened May 30, 2024
[RFC] Load and register rendezvous backends dynamically as plugins at runtime
#127519 opened May 30, 2024
THPVariable_Check(list_elem) INTERNAL ASSERT FAILED
#127518 opened May 30, 2024
With pytest8.2.0 or later, test cases under test/distributed/ execute will meet issue "object has no attribute 'runTest'. Did you mean: 'run_test'"
#127517 opened May 30, 2024
[AOTI] Conv-BN folding on CPU not working anymore after benchmark script change in https://github.com/pytorch/pytorch/pull/123403
#127513 opened May 30, 2024
`assume_constant_result` does not work with method of `UnspecializedNNModuleVariable`
#127509 opened May 30, 2024
[BUG] Using custome backend for torch.compile give nothing outputs
#127502 opened May 30, 2024
Avoid Having to Register Op For ExternKernelChoice of Aten Refs
#127500 opened May 30, 2024
torch.compiler docstrings can be derived from _dynamo and _inductor docs
#127497 opened May 30, 2024
Multiplying a sparse CSR tensor by a strided vector nondeterministically fails on MacOS ARM64
#127491 opened May 30, 2024
investigate :torch._dynamo.exc.Unsupported: cache_size_limit reached when inlining nn module for test_do_not_skip_side_effects
#127483 opened May 29, 2024
Cannot fakeify a tensor who's .grad field is a tensor subclass
#127470 opened May 29, 2024
Add way to annotate a raw triton kernel using the triton kernel HOP so that it becomes make_fx-traceable
#127452 opened May 29, 2024
PyTorch version agnostic C++ extensions
#127445 opened May 29, 2024
address TODO: model is somehow not being freed when z3 is available
#127444 opened May 29, 2024
Ensure the Python custom ops tutorial actually runs in PyTorch 2.4
#127443 opened May 29, 2024
Release 2.3.1 validations checklist and cherry-picks
#127441 opened May 29, 2024
UNSTABLE inductor / cuda12.1-py3.10-gcc9-sm86 / test (dynamic_inductor_timm)
#127438 opened May 29, 2024
Crash in flatbuffer_loader during mobile model load from python API
#127434 opened May 29, 2024
index_select op not implemented on Vulkan backend
#127422 opened May 29, 2024
Fix torch._dynamo.exc.Unsupported: call_method GetAttrVariable(UnspecializedNNModuleVariable(CheckpointWrapper), __dict__) __contains__ [ConstantVariable()] {}
#127416 opened May 29, 2024
[inductor][cpu]abnormal performance improvement and accuracy drop for huggingface suit static/dynamic quantization
#127402 opened May 29, 2024
sys.maxsize special case doesn't work if you slightly offset the ranges
#127396 opened May 29, 2024
Unable to quantize resnet50 model with post training static quantization
#127391 opened May 29, 2024
torch=2.1, import torch; fails AttributeError: '_OpNamespace' 'aten' object has no attribute 'sym_constrain_range_for_size'
#127388 opened May 29, 2024
Backwards pass through Beta distribution rsample gives inf for 4 < alpha - 2**16 < 1040, beta = 3/2
#127387 opened May 29, 2024
4 GPT2 can't run into `_scaled_dot_product_flash_attention_for_cpu` using AOTI due to export related change in https://github.com/pytorch/pytorch/pull/123732
#127383 opened May 29, 2024
Different tensor strides can result in surprisingly large discrepancies in Conv2d outputs
#127375 opened May 29, 2024
Torch.compile produces Exception: Please convert all Tensors to FakeTensors first or instantiate
#127374 opened May 29, 2024
DISABLED test_dtensor_op_db_bmm_cpu_float32 (__main__.TestDTensorOpsCPU)
#127373 opened May 29, 2024
[Feature Request] switch amx isa detection in onednn to cpuinfo
#127368 opened May 29, 2024
torch.export.export() throws out an error when dealing weighttying model.
#127357 opened May 28, 2024
`flake8: noqa` disables flake8 linter for the whole file and it's not obvious
#127352 opened May 28, 2024
Dynamo should prune non-live captured variables
#127350 opened May 28, 2024
~PyTorch Docathon H1 2024!~
#127345 opened May 28, 2024
DISABLED test_comprehensive_fft_ifft_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#127344 opened May 28, 2024
DISABLED test_arange2_dynamic_shapes_cuda (__main__.DynamicShapesGPUTests)
#127343 opened May 28, 2024
Add option for custom ops to automatically get a FakeTensor kernel (during static shapes)
#127337 opened May 28, 2024
Inductor fails at assert self.symbol_to_source.get(expr)
#127328 opened May 28, 2024
torch.compile reorder_for_compute_comm_overlap sink_waits pass does not work
#127324 opened May 28, 2024
torch.compile (inductor) bug random signed number generation
#127310 opened May 28, 2024
I don’t know if it’s a problem with cuda or pytorch
#127299 opened May 28, 2024
Fused AdamW maybe should accept lr_dict directly?
#127284 opened May 28, 2024
[BUG][JIT] `torch.jit.script` is not compatible with `DeprecationWarning` and `FutureWarning`
#127283 opened May 28, 2024
torch.fx.symbolic_trace doesn't support many Callable types
#127282 opened May 28, 2024
Python sometimes crashes inexplicably, with "/var/log/kernel. log" displaying index errors
#127275 opened May 28, 2024
DISABLED test__int_mm_k_32_n_16_use_transpose_a_True_use_transpose_b_False_cuda (__main__.TestLinalgCUDA)
#127273 opened May 28, 2024
ImportError: cannot import name 'triton_key' from 'triton.compiler.compiler' when using triton 2.3.x
#127271 opened May 28, 2024
Not supporting higher protobuf versions
#127270 opened May 28, 2024
make_graphed_callables don't work with FSDP at all even on a simple network
#127225 opened May 27, 2024
RuntimeError: grid_sampler_2d_cpu not implemented for Half
#127224 opened May 27, 2024
[BUG] torch.linalg.lstsq returning wrong result.
#127223 opened May 27, 2024
Migrate `CalculateSmallVectorDefaultInlinedElements` to constexpr Function for SmallVector
#127222 opened May 27, 2024
lacking checking for ConvTranspose's parameters when running with GPUs
#127221 opened May 27, 2024
Comparing dynamic shapes fails with KeyError
#127217 opened May 27, 2024
No way to config profiling scope in torch.autograd.profile
#127215 opened May 27, 2024
DISABLED test_large_mmaped_weights_non_abi_compatible_cuda (__main__.AOTInductorTestNonABICompatibleCuda)
#127202 opened May 27, 2024
Tensors of the same index must be on the same device and the same dtype except step tensors that can be CPU and float32 notwithstanding.
#127197 opened May 26, 2024
torch.topk results differ on CPU and CUDA
#127196 opened May 26, 2024
Should `torch.Size` convert np.ndarrays to lists of ints?
#127194 opened May 26, 2024
Unify async_save and sync_save in state_dict_saver from distributed checkpointing
#127191 opened May 26, 2024
`store_param_remainders` from Apex DistributedFusedAdam
#127189 opened May 26, 2024

383 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

Complete revamp of float/promotion sympy handling
#126905 commented on May 31, 2024 • 78 new comments
[CI] Add freezing for cpu inductor accuracy test in inductor CI
#124715 commented on May 31, 2024 • 38 new comments
[Autograd] Cond Higher-Order Operation
#126911 commented on Jun 1, 2024 • 28 new comments
[quant][pt2e][quantizer] Support `set_module_name_qconfig` in X86InductorQuantizer
#126044 commented on May 31, 2024 • 28 new comments
[Inductor] support masked vectorization for the tail_loop
#126526 commented on May 30, 2024 • 24 new comments
[executorch hash update] update the pinned executorch hash
#123043 commented on Jun 1, 2024 • 21 new comments
[vision hash update] update the pinned vision hash
#125806 commented on Jun 1, 2024 • 21 new comments
ROCm 6.x appears Cannot find CO in the bundle libhipblaslt.so for ISA
#127169 commented on May 30, 2024 • 18 new comments
[CI] add xpu test in periodic workflow
#126410 commented on Jun 1, 2024 • 17 new comments
Memory Tracker for tracking Module wise memory
#124688 commented on Jun 1, 2024 • 16 new comments
TorchInductor CPU Performance Dashboard
#93531 commented on May 30, 2024 • 16 new comments
[BE]: Update cudnn to 9.1.0.70
#123475 commented on Jun 1, 2024 • 15 new comments
[aoti] Add initial custom op support
#127034 commented on May 31, 2024 • 15 new comments
[dtensor] local_map UX change: keep func signature and be compatible with Tensor input
#126924 commented on Jun 1, 2024 • 12 new comments
[inductor] online softmax
#127011 commented on May 28, 2024 • 12 new comments
[inductor][cpp] bf16/fp16 gemm template computed with fp32 w/o epilogue fusion
#126068 commented on Jun 1, 2024 • 12 new comments
Opt model save and load
#126374 commented on Jun 1, 2024 • 11 new comments
[RFC] Add support for device extension autoloading
#127074 commented on Jun 1, 2024 • 11 new comments
torch.export.export() fails with dynamic shapes when more than one shape is dynamic
#126127 commented on May 31, 2024 • 11 new comments
Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept
#126376 commented on May 31, 2024 • 10 new comments
General MPS op coverage tracking issue
#77764 commented on Jun 1, 2024 • 10 new comments
T5 -small Dynamic quantization in graviton3
#127062 commented on May 30, 2024 • 9 new comments
Enable UFMT on test_shape_ops.py test_show_pickle.py test_sort_and_select.py
#127165 commented on May 31, 2024 • 9 new comments
fix constant folding with buffer mutation
#123909 commented on May 28, 2024 • 9 new comments
Beef up the allow_in_graph docs
#127117 commented on Jun 1, 2024 • 9 new comments
[torchbind] always fakify script object by default in non-strict export
#127116 commented on Jun 1, 2024 • 9 new comments
[Inductor][CPP] Add Min/Max with VecMask
#126841 commented on May 27, 2024 • 8 new comments
skip hf_T5_generate in dynamic shape test
#121129 commented on May 31, 2024 • 8 new comments
Set simdlen based on ATEN_CPU_CAPABILITY
#123514 commented on May 30, 2024 • 8 new comments
First version of AOTAutogradCache
#126791 commented on May 31, 2024 • 8 new comments
enable device index check for all device types
#126767 commented on May 31, 2024 • 8 new comments
LambdaLR has incorrect multiplicative behavior when using torch.tensor LR
#126854 commented on May 30, 2024 • 8 new comments
Extend Fake Tensor Caching to Symints
#126411 commented on May 28, 2024 • 8 new comments
CUDNN not detected in official nightly devel image. We need to use `12.4.1` upstream images
#126005 commented on May 31, 2024 • 8 new comments
Update _dedup_save_plans.py
#126569 commented on Jun 1, 2024 • 8 new comments
[ROCm] TunableOp improvements
#124362 commented on Jun 1, 2024 • 7 new comments
Enable UFMT format on test/test_nn.py
#126663 commented on May 31, 2024 • 7 new comments
[dtensor] implement scatter op with simple replication
#126713 commented on Jun 1, 2024 • 7 new comments
Fix decorators skipping NCCL tests
#122397 commented on May 29, 2024 • 7 new comments
sdp::SDPBackend::flash_attention support PrivateUse1
#126392 commented on May 31, 2024 • 7 new comments
[inductor] add cpp builder code. (take 2)
#125849 commented on Jun 1, 2024 • 7 new comments
[feature request] `quantized::linear_dynamic` on CUDA/eager, and other quantized and low-level int8 operators (matmul, gemm etc) on CUDA + integrate LLM.int8 + integrate ZeroQuant?
#69364 commented on May 30, 2024 • 7 new comments
[cuDNN][SDPA] Remove `TORCH_CUDNN_SDPA_ENABLED=1`, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80
#125343 commented on Jun 1, 2024 • 7 new comments
[dynamo][nn-modules] Trace through nn.Module dunder methods for UnspecializedNNModule
#126578 commented on Jun 1, 2024 • 7 new comments
Support aten operations with out tensor
#124926 commented on May 31, 2024 • 7 new comments
Make TraceUtils.h to be device-agnostic
#126969 commented on Jun 1, 2024 • 6 new comments
Move the build of AOTriton to base ROCM docker image.
#127012 commented on Jun 1, 2024 • 6 new comments
extend `nonzero` to int64
#125850 commented on May 31, 2024 • 6 new comments
[CUDNN] Remove defunct cuDNN V8 API build flag
#120006 commented on Jun 1, 2024 • 6 new comments
[WIP] Warn on future divergent behavior for conditional views
#126129 commented on Jun 1, 2024 • 6 new comments
Feature: Implement support for `cudnn_batch_norm_out` kernel to replace the autogen approach.
#123020 commented on May 29, 2024 • 6 new comments
[Submodule] Remove glog dependency
#126768 commented on Jun 1, 2024 • 6 new comments
PT2 Inductor ComboKernels
#124969 commented on Jun 1, 2024 • 6 new comments
[compiled autograd][cudagraphs] Inputs runtime wrapper to move cpu scalars to cuda
#125382 commented on Jun 1, 2024 • 5 new comments
Introduce Inductor passes to micro-pipeline all-gather-matmul and matmul-reduce-scatter in certain cases
#126598 commented on Jun 1, 2024 • 5 new comments
CUDA 12.4 CI Inductor Issues
#126692 commented on Jun 1, 2024 • 5 new comments
[DTensor] `clip_grad_norm_` follow-ups
#121020 commented on May 31, 2024 • 5 new comments
Compiling extension on pytorch nighlty image is broken
#125879 commented on Jun 1, 2024 • 5 new comments
PyTorch's Execution Trace Observer Producing Invalid JSON
#126703 commented on Jun 1, 2024 • 5 new comments
refine fp32 precision api
#125888 commented on May 30, 2024 • 5 new comments
[Inductor][CPP] Enable Local Buffer for Outer loop fusion
#126967 commented on May 31, 2024 • 5 new comments
[ONNX] Skip assertion nodes
#126889 commented on May 29, 2024 • 5 new comments
[BE]: Update NCCL submodule to 2.21.5
#124014 commented on May 31, 2024 • 5 new comments
[Split Build] Test split build in CI
#126699 commented on Jun 1, 2024 • 5 new comments
Pytorch 2.0 installation tutorial does not work under Macbook
#96073 commented on May 30, 2024 • 4 new comments
[RFC] Per-Parameter-Sharding FSDP
#114299 commented on May 26, 2024 • 4 new comments
[dynamo] Automatically convert loop bodies to function calls
#113538 commented on May 31, 2024 • 4 new comments
MPS device can train or evaluate models producing unacceptable output due to "fast math" optimization
#84936 commented on May 31, 2024 • 4 new comments
PyTorch DataLoader improvements for Iterable Dataset
#127072 commented on Jun 1, 2024 • 4 new comments
Refresh OpOverloadPacket if a new OpOverload gets added
#126863 commented on May 29, 2024 • 4 new comments
dynamo doesn't support `__torch_function__` on non-tensor classes
#127174 commented on May 31, 2024 • 4 new comments
IPEX as TorchDynamo Backend Performance Dashboard
#101273 commented on May 27, 2024 • 4 new comments
allow to use bf16 as fp32 internal precision for mkldnn conv
#126050 commented on May 31, 2024 • 4 new comments
add core tag to linalg_vector_norm
#125789 commented on May 31, 2024 • 4 new comments
[torchbind] support torch.compile with aot_eager backend
#127114 commented on May 31, 2024 • 4 new comments
Setting static arguments in `torch.compile()`
#121299 commented on May 29, 2024 • 4 new comments
`torch.compiler.allow_in_graph` does not create a `call_module` op in fx.Graph in torch 2.3.0
#126566 commented on May 29, 2024 • 4 new comments
Add aten._unsafe_masked_index
#116491 commented on May 30, 2024 • 4 new comments
[dtensor] standardize multi mesh-dim strategy with utils
#126712 commented on Jun 1, 2024 • 3 new comments
[xla hash update] update the pinned xla hash
#126672 commented on May 27, 2024 • 3 new comments
fix multiprocessing.spawn doc issue
#126902 commented on May 28, 2024 • 3 new comments
Triton Matrix Multiplication example invalid results (return zeros) on Volta
#127157 commented on May 30, 2024 • 3 new comments
Move MKLDNN Specific IR to Separate File
#126504 commented on May 27, 2024 • 3 new comments
[inductor] `aten.index_put_` runtime shape mismatch on H100 but not on A100
#126614 commented on May 30, 2024 • 3 new comments
make_graphed_callables fails when Module or function does not contain parameters
#124582 commented on May 28, 2024 • 3 new comments
[dynamo][numpy] Add unsigned integer dtypes
#125717 commented on May 31, 2024 • 3 new comments
Invalidate StorageImpl instances when tensor is overwritten with cudagraphs
#125264 commented on Jun 1, 2024 • 3 new comments
delete inductor config.trace.compile_profile
#127143 commented on May 29, 2024 • 3 new comments
Add unfold support for MaskedTensor
#125262 commented on May 30, 2024 • 3 new comments
add uuid in cudaDeviceProperties
#125083 commented on May 29, 2024 • 3 new comments
move label_to_label.yml to an issue?
#126714 commented on May 29, 2024 • 3 new comments
[dynamo] Trace through invalid bool tensor operations properly
#127003 commented on Jun 1, 2024 • 3 new comments
[Dynamo] Log backward graph compilation metrics
#126629 commented on May 31, 2024 • 3 new comments
[NJT] Allow construction of NJT within graph using offsets from inputs
#124624 commented on May 28, 2024 • 3 new comments
[v2.3.1] Release Tracker
#125425 commented on May 28, 2024 • 3 new comments
ROCm loses some supported GPUs by requiring hipblaslt
#119081 commented on May 31, 2024 • 3 new comments
`@functools.wraps` graph breaks in many cases where we should be able to handle it
#123365 commented on May 28, 2024 • 3 new comments
Made some minor improvements to flexattention perf + added more autotune configs
#126811 commented on May 28, 2024 • 3 new comments
[Inductor][CPP] Add ne with VecMask
#126940 commented on May 27, 2024 • 3 new comments
s390x: build s390x binaries on each pull request
#125399 commented on May 29, 2024 • 2 new comments
Allow multiple cudagraph recordings per compiled graph
#126822 commented on Jun 1, 2024 • 2 new comments
inplace parameter in dropouts should function as expected regardless of the value of training(or train) paramter
#126755 commented on May 31, 2024 • 2 new comments
[XPU] Add xpu support of `make triton`
#126513 commented on May 29, 2024 • 2 new comments
[triton hash update] update the pinned triton hash
#115529 commented on May 27, 2024 • 2 new comments
Fused Linear and Cross-Entropy Loss `torch.nn.functional.linear_cross_entropy`
#124480 commented on May 31, 2024 • 2 new comments
torch._dynamo.exc.Unsupported: call_function args: UserDefinedObjectVariable(EasyDict)
#120219 commented on Jun 1, 2024 • 2 new comments
[Bazel] Build shared libraries to reduce total binary size
#126031 commented on May 30, 2024 • 2 new comments
[nn.functional] Set default value of Optional params of batch_norm to None
#122717 commented on May 28, 2024 • 2 new comments
[inductor][cpu]detectron2_maskrcnn_r_101_fpn detectron2_maskrcnn_r_50_c4 accuracy check failed when freezing flag is on.
#127073 commented on Jun 1, 2024 • 2 new comments
Disable atomic_add fallback for cpu in index_put lowering
#122873 commented on May 27, 2024 • 2 new comments
Add Ability to Skip Hipification When Building CUDA Extension
#127015 commented on May 28, 2024 • 2 new comments
[SDPA/memeff] Backport changes from xFormers to PT
#127090 commented on May 31, 2024 • 2 new comments
xpu: implement grid_sample op for XPU (fallback to CPU not possible for fp16 and bf16)
#127002 commented on May 30, 2024 • 2 new comments
```FlopCounterMode``` returns 0 when inference mode is on during forwardpropagation.
#126268 commented on May 30, 2024 • 2 new comments
[PT2E Quantization] `prepare_pt2e` produces inconsistent data types for primitive int
#127076 commented on May 30, 2024 • 2 new comments
Move loop ordering after fusion
#126255 commented on May 30, 2024 • 2 new comments
BUG: `torch.pow(-0.0)` wrongly returns `-0.0` (should be +0.0)
#127163 commented on May 28, 2024 • 2 new comments
libtorch mimalloc Debug Warning with WinError
#125840 commented on May 30, 2024 • 2 new comments
[VIT Model] [perf Degradation] [X86] [ARM] torch.compile + weight prepacking results in perf degradation for VIT Transformer model
#126391 commented on May 28, 2024 • 2 new comments
[feature][cudagraph] API to clear a bad recording
#127147 commented on May 29, 2024 • 2 new comments
[inductor][cpu]GPT2ForSequenceClassification AMP static/dynamic shape default/cpp wrapper single thread accuracy crash
#123503 commented on May 30, 2024 • 2 new comments
NCCL watchdog thread terminated with exception
#113128 commented on May 30, 2024 • 2 new comments
pytorch is built without _GLIBCXX_USE_CXX11_ABI and can cause std::regex crashes (probably)
#50779 commented on May 26, 2024 • 2 new comments
fixed torch.where and torch.nonzero issue
#123024 commented on May 28, 2024 • 1 new comment
Parts of https://download.pytorch.org not updated anymore, broken links and missing versions
#121837 commented on May 29, 2024 • 1 new comment
Register lowering for `_foreach_norm.Scalar`
#123051 commented on May 30, 2024 • 1 new comment
[BASE_MODULE] removing base_module for glow/fb/test:test_utils
#123092 commented on May 31, 2024 • 1 new comment
[TP] Add tests for wildcard support
#123101 commented on May 31, 2024 • 1 new comment
[Pytorch] Replace nn.module call to `train()` for executorch models (#757)
#122532 commented on Jun 1, 2024 • 1 new comment
add testing support for Vectorized<Half>
#123132 commented on Jun 1, 2024 • 1 new comment
Add quantized.linear_unpacked_dynamic_fp16
#122509 commented on May 26, 2024 • 1 new comment
[WIP] Add comm fusion pass
#122505 commented on May 26, 2024 • 1 new comment
dynamo_export fails for for NVIDIA NeMo framework classes decorated with wrapt package
#125462 commented on May 29, 2024 • 1 new comment
torch._inductor.config.max_autotune_gemm_backends = "TRITON" crashes with Convolution layer
#125728 commented on May 29, 2024 • 1 new comment
Add DtypeContext
#122481 commented on May 28, 2024 • 1 new comment
Torch 2.1 compile + FSDP (mixed precision) + LlamaForCausalLM: `RuntimeError: attempting to assign a gradient with dtype 'c10::BFloat16' to a tensor with dtype 'float'.`
#111317 commented on May 29, 2024 • 1 new comment
[export] Restore original placeholder names
#122452 commented on May 27, 2024 • 1 new comment
segfault when registering op with mismatched schema
#127102 commented on May 29, 2024 • 1 new comment
Ensure profiler record function callback always increments "record function id".
#123755 commented on May 27, 2024 • 1 new comment
[TEST] assume no aligned inputs
#122159 commented on May 28, 2024 • 1 new comment
Fix edge cases for gather in inductor
#126893 commented on May 29, 2024 • 1 new comment
[example] Example of properly optimizing mm=>layernorm=>gelu
#122687 commented on May 25, 2024 • 1 new comment
Modified pointless_convert pass to only apply if it's removing truly useless conversion
#122689 commented on May 27, 2024 • 1 new comment
[fx] Preserve Fx graph node order in partitioner across runs
#122696 commented on May 27, 2024 • 1 new comment
[wip][export] Add symint input support
#122712 commented on May 25, 2024 • 1 new comment
[POC][FSDP2] Ran parameter post acc grad hooks manually
#122719 commented on May 26, 2024 • 1 new comment
Allow flight recorder to call elasped_time again.
#122731 commented on May 27, 2024 • 1 new comment
[DRAFT] add int4_mm support for ARM using SIMDe
#122764 commented on May 26, 2024 • 1 new comment
add testing support for Vectorized<Half>
#122798 commented on May 27, 2024 • 1 new comment
[Inductor][Optimus] Fix group batch fusion for AFOC
#122839 commented on May 27, 2024 • 1 new comment
[ATen-Vulkan][EZ] Small fixes: fix gpu size calculation and Half scalartype ctype mapping
#122861 commented on Jun 1, 2024 • 1 new comment
[dynamo/inductor] guard on whether a tensor is a view, assume non-views are aligned
#122867 commented on May 27, 2024 • 1 new comment
Libtorch build for ROCM error: “aten/src/THH” not exist
#126640 commented on May 29, 2024 • 1 new comment
Optimize upsample_bilinear2d on channels last format
#122879 commented on May 31, 2024 • 1 new comment
[inductor][Autotune] Add matrix_instr_nonkdim to triton_meta (#122852)
#122906 commented on May 27, 2024 • 1 new comment
[Quant][PT2E] enable qlinear post op fusion for dynamic quant & qat
#122667 commented on May 28, 2024 • 1 new comment
[ROCm] skip some SDPA tests
#122914 commented on May 29, 2024 • 1 new comment
[AOTInductor] Support quantized linear on CPU with fbgemm
#122939 commented on May 30, 2024 • 1 new comment
Fix fuse linear bn for affine=False
#122537 commented on Jun 1, 2024 • 1 new comment
[dynamo] Dont run Dynamo on assert* functions
#122993 commented on May 29, 2024 • 1 new comment
[dynamo] Detect functionalized functions with HOPs with mutations at tracing time
#126988 commented on May 29, 2024 • 1 new comment
torch.compile does not work since 2.2.1 on MacOS for some models
#124497 commented on May 27, 2024 • 1 new comment
[checkpoint] Clean up selective activation checkpoint and make public
#125795 commented on May 31, 2024 • 1 new comment
JAX + PyTorch produces `OMP: Error #13: Assertion failure at kmp_affinity.cpp(532)`
#97635 commented on May 27, 2024 • 1 new comment
Run CompositeImplicit on torch fns during dynamo tracing
#125808 commented on Jun 1, 2024 • 1 new comment
Use sleef on macOS Apple silicon by default
#126509 commented on May 28, 2024 • 1 new comment
_convert_input_to_fake doesn't work when there are no inputs
#126770 commented on May 27, 2024 • 1 new comment
[NT] Implementing Multi-Head Attention with NestedTensors
#125214 commented on May 27, 2024 • 1 new comment
❓Different results between normal batching and `vmap` while using lower precision (e.g., bfloat16)
#125534 commented on May 27, 2024 • 1 new comment
Support for stereo audio data in from torch.utils.tensorboard.SummaryWriter
#56360 commented on May 27, 2024 • 1 new comment
Remove redundant device guard in Resize.h
#126498 commented on May 31, 2024 • 1 new comment
Box constraints for optimizers
#22281 commented on May 27, 2024 • 1 new comment
NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":830, please report a bug to PyTorch.
#122068 commented on May 27, 2024 • 1 new comment
TorchDynamo ONNX Export does not work as expected with masking (ScatterElements)
#126856 commented on May 27, 2024 • 1 new comment
Dynamo-based ONNX Export: Failed to produce a graph during tracing as no tensor operations were found.
#123973 commented on May 26, 2024 • 1 new comment
Update triton pin to improve throughput w/ assert
#126098 commented on May 30, 2024 • 1 new comment
Triton Error [CUDA]: invalid device context when autograd.backward a triton kernel
#124565 commented on May 26, 2024 • 1 new comment
Specify version
#105460 commented on May 26, 2024 • 1 new comment
[inductor][codegen] Codegen constexpr globals and constexpr annotated globals correctly.
#126195 commented on May 30, 2024 • 1 new comment
[aoti] Add support for more custom op input/output types
#126215 commented on May 28, 2024 • 1 new comment
[Traceable FSDP][Compiled Autograd] Add queue_callback() support
#126366 commented on May 29, 2024 • 1 new comment
Added memory budget to partitioner
#126320 commented on Jun 1, 2024 • 1 new comment
Optimize Vectorized<float> exp() with neon simd instructions
#126612 commented on May 29, 2024 • 1 new comment
[torchbind] remove test cases that don't fakify script objects
#127113 commented on May 31, 2024 • 1 new comment
DISABLED test_large_sampler_indices (__main__.TestDataLoaderPersistentWorkers)
#117784 commented on May 28, 2024 • 1 new comment
[dynamo] Handle inplace op aliasing errors
#126474 commented on May 28, 2024 • 1 new comment
torch.compile crash - Aborted exit code 134
#125804 commented on May 28, 2024 • 1 new comment
Many tests under test/distributed/elastic not running in OSS CI
#124317 commented on May 28, 2024 • 1 new comment
Tracker: Slow gradcheck failures possibly indicating incorrect gradients
#80411 commented on May 28, 2024 • 1 new comment
RuntimeError: "_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'
#127176 commented on May 28, 2024 • 1 new comment
`torch.einsum` docs don't mention that `opt_einsum` must be installed separately
#127109 commented on May 28, 2024 • 1 new comment
Modularize aten parameter parser and checker
#125308 commented on May 31, 2024 • 1 new comment
[dynamo] Improve support for hasattr for custom __getattr__
#125340 commented on Jun 1, 2024 • 1 new comment
A huge difference between the results of torch.round() on the GPU compared to its results on the CPU and other DL libraries
#126975 commented on May 28, 2024 • 1 new comment
torch_dispatch mode silent incorrectness with torch.compile
#115653 commented on May 28, 2024 • 1 new comment
Fails to compile with nvidia-cuda-toolkit-12.4.0
#122169 commented on May 28, 2024 • 1 new comment
[inductor][cpp] support bf16/fp16 gemm template epilogue fusion
#126545 commented on Jun 1, 2024 • 1 new comment
add wait_tensor to ops to skip in DCE pass
#126541 commented on May 31, 2024 • 1 new comment
aten.bernoulli.p is missing in core aten IR opset but does not get decomposed
#105519 commented on May 28, 2024 • 1 new comment
CUDA Extension Bug for Pytorch 2.2.0
#118842 commented on May 27, 2024 • 1 new comment
[1/N] Dynamic Shape: To enable export.aot_compile to switch between static shape and dynamic shape
#126517 commented on May 31, 2024 • 1 new comment
Segmentation Fault in `torch.nn.Conv1d` starting from torch 2.2.0
#121222 commented on May 27, 2024 • 1 new comment
dynamo test: remove the cuda hardcoding and detect device dynamically
#125761 commented on May 31, 2024 • 1 new comment
QAT using pytorch-quantization cause accuracy loss after exporting to onnx
#120166 commented on May 31, 2024 • 1 new comment
[inductor][cpu]tnt_s_patch16_224 and functorch_dp_cifar10 fP32 static default wrapper performance regression
#119178 commented on May 30, 2024 • 1 new comment
[MPS] Improve the performance of torch.linear()
#91737 commented on May 31, 2024 • 1 new comment
Why is AvgPool2D taking longer than Conv2D for the same input?
#93188 commented on May 31, 2024 • 1 new comment
[inductor][cpu]speech_transformer AMP single/multiple thread static/dynamic shape CPP/default wrapper performance regression in 2024-05-12 nightly release
#126274 commented on May 30, 2024 • 1 new comment
UNSTABLE inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (inductor_torchbench_cpu_smoketest_perf)
#126993 commented on May 30, 2024 • 1 new comment
RuntimeError: [1] is setting up NCCL communicator and retreiving ncclUniqueId from [0] via c10d key-value store by key '0', but store->get('0') got error: Timeout waiting for key: default_pg/0/0 after 1800000 ms
#82091 commented on May 30, 2024 • 1 new comment
[Split Build] Test split build in pull CI workflow
#126813 commented on Jun 1, 2024 • 1 new comment
Cudnn 9.1.1 is out!
#119400 commented on May 31, 2024 • 1 new comment
[DCP] `set_model_state_dict` errors on compiled module with non-persistent buffer
#122792 commented on May 31, 2024 • 1 new comment
aten::nonzero calls taking a huge amount of time when using MPS backend vs CPU
#124850 commented on May 31, 2024 • 1 new comment
Support backward hook optimizers in FSDP
#98419 commented on May 30, 2024 • 1 new comment
Investigate torch.compile Windows support.
#122094 commented on May 31, 2024 • 1 new comment
Previous version not found
#107611 commented on May 31, 2024 • 1 new comment
FSDP does not work on GLOO backend
#74041 commented on May 31, 2024 • 1 new comment
[bug] Dynamo graph break when using pyton module `heapq` (manipulates with `list`s), although succeeds when placing `heapq.py` near the test script
#106885 commented on May 31, 2024 • 1 new comment
torch.compile error
#124044 commented on May 31, 2024 • 1 new comment
xpu: provide a way to debug explicit CPU fallback
#126488 commented on May 30, 2024 • 1 new comment
xpu: python hangs on exit after check for xpu on multi-dev system
#126259 commented on May 30, 2024 • 1 new comment
torch.compile generates wrong code on CPU and compiled code replaces original function
#126848 commented on May 30, 2024 • 1 new comment
tensordict functional calls with nn.Module silently gives the wrong (non-functional) result
#127173 commented on May 30, 2024 • 1 new comment
ImportError `undefined symbol: iJIT_NotifyEvent` encountered when MKL 2024.1 is installed.
#123097 commented on May 30, 2024 • 1 new comment
Pytorch dataloader not loading first-available data with multiple workers
#105203 commented on May 30, 2024 • 1 new comment
[ONNX] view(dtype=dtype) is not supported by both onnx.export and onnx.dynamo_export
#126921 commented on May 30, 2024 • 1 new comment
Lowering after pointwise cat can lead to uncontiguous memory accesses
#124002 commented on May 30, 2024 • 1 new comment
The doc of `diagonal()` doesn't have such an explanation
#126827 commented on May 30, 2024 • 1 new comment
The dynamic compilation does not handle composite shapes with multiple dimensions very effectively
#127162 commented on May 30, 2024 • 1 new comment
MultiheadAttention returns NaNs when need_weights=False for long sequences with a mask that ignores old tokens
#127055 commented on May 30, 2024 • 1 new comment
[RFC] Enable PyTorch XPU on Native Windows on Intel GPUs
#126719 commented on May 30, 2024 • 1 new comment
[dynamo] Dynamic slicing on data-dependent value is not supported (regression from legacy ONNX)
#127154 commented on May 31, 2024 • 1 new comment
Multiple Inputs/Outputs with torch.onnx.dynamo_export
#127128 commented on May 31, 2024 • 1 new comment
[ONNX] pad_sequence() is not exportable, with neither legacy onnx.export nor with dynamo_export
#127153 commented on May 31, 2024 • 1 new comment
[feature request]: Update max onnx opset to 21 for onnxruntime==1.18 compatibility
#127167 commented on May 31, 2024 • 1 new comment
[WIP] fix calls to `super().__getattr__` nn.Module in dynamo
#126875 commented on May 31, 2024 • 1 new comment
UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR
#126605 commented on May 30, 2024 • 1 new comment
[torch.compile]: Enhanced Error Reporting and Performance Canary Mode
#126644 commented on May 31, 2024 • 1 new comment
Add parameter "half_pixel_center =False" to the Bilinear function
#48389 commented on May 30, 2024 • 1 new comment
FSDP `use_orig_params` + full sharding results in missing parameters in gathered state dict
#126310 commented on May 31, 2024 • 1 new comment
DISABLED test_comprehensive_special_bessel_y1_cuda_int32 (__main__.TestInductorOpInfoCUDA)
#127080 commented on May 29, 2024 • 1 new comment
[fx] Preserve Fx graph node order in partitioner across runs
#115621 commented on May 29, 2024 • 1 new comment
test_python_shard.bat -> .sh
#117194 commented on May 31, 2024 • 1 new comment
RuntimeError in torch.istft with center=False: Window Overlap Add Issue
#118507 commented on May 29, 2024 • 1 new comment
Make watchdog sleep interval auto-adaptive
#118293 commented on May 26, 2024 • 1 new comment
[optim] Add support for complex tensors in SparseAdam
#118653 commented on Jun 1, 2024 • 1 new comment
Fix jagged NT softmax semantics
#119459 commented on May 31, 2024 • 1 new comment
Add decompositions for copy variants of view ops
#119889 commented on May 30, 2024 • 1 new comment
[Caffe2]Remove Caffe2 scripts and benchmarks
#126747 commented on May 30, 2024 • 1 new comment
Change ATEN generator argument type to const std::optional<Generator>&
#120076 commented on May 25, 2024 • 1 new comment
Delete lazy ddp optimizer
#120727 commented on May 26, 2024 • 1 new comment
turned on matrix-multiplication => matrix-vector multiplication always on if reduction-dim is contiguous
#120954 commented on May 27, 2024 • 1 new comment
DISABLED test_circular_dependencies (__main__.TestImports)
#110040 commented on May 29, 2024 • 1 new comment
fix graph deepcopy to also copy `_tracer_extras`
#121171 commented on May 28, 2024 • 1 new comment
[Inductor] Add prologue fusion
#121211 commented on Jun 1, 2024 • 1 new comment
[NJT] Actually inline NT torch function during dynamo
#121445 commented on Jun 1, 2024 • 1 new comment
[DDP] Bucket handling: make first bucket size equal to bucket_cap_mb if it was set
#121640 commented on May 28, 2024 • 1 new comment
[no-ci] test pr
#122006 commented on May 28, 2024 • 1 new comment
[feature request] torch.kthvalue to support a new argument largest
#29398 commented on May 30, 2024 • 1 new comment
[dynamo] Fake tensor impl for Tensor.add_ not checking for errors
#127049 commented on Jun 1, 2024 • 1 new comment
RuntimeError: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "/opt/pytorch/pytorch/c10/cuda/CUDACachingAllocator.cpp":830, please report a bug to PyTorch.
#123834 commented on May 29, 2024 • 1 new comment
Support tracing through _get_current_dispatch_mode_stack
#126789 commented on May 28, 2024 • 1 new comment
Compile with non-default mode + triton kernel fails
#126864 commented on May 29, 2024 • 1 new comment
[benchmark] Rename the count field FunctionCount
#105471 commented on May 29, 2024 • 1 new comment
Switch to CXX11 ABI
#126778 commented on May 31, 2024 • 0 new comments
some logging changes
#127005 commented on May 29, 2024 • 0 new comments
[Split Build] Use single command to build both wheels
#126590 commented on May 29, 2024 • 0 new comments
[halide-backend] Initial implementation of HalideKernel and HalideScheduling
#126417 commented on Jun 1, 2024 • 0 new comments
[do-not-review][aot] explicitly pass number of static inputs (buffer/params) to backend
#127006 commented on Jun 1, 2024 • 0 new comments
Allow overriding per-dim group options via _MeshEnv.set_dim_group_options
#126599 commented on Jun 1, 2024 • 0 new comments
[Inductor] support masked vectorization for the tail_loop
#127166 commented on May 26, 2024 • 0 new comments
Enable LSAN
#127171 commented on May 29, 2024 • 0 new comments
[Split Build] Test non-linux manywheel builds
#126929 commented on May 29, 2024 • 0 new comments
[Split Build][WIP] Change global env variables to cmake args
#126930 commented on May 29, 2024 • 0 new comments
[inductor] [for benchmarking] Add a subprocess-based parallel compile
#127088 commented on May 27, 2024 • 0 new comments
Enable optimized dynamic quantization on aarch64
#126687 commented on May 30, 2024 • 0 new comments
Set NO_DELAY flag in TCPStore
#127042 commented on Jun 1, 2024 • 0 new comments
[2/N] Dynamic Shape: Enable dynamic shape support for aoti_eager
#126883 commented on May 31, 2024 • 0 new comments
[FSDP2] Added `share_comm_ctx`
#127032 commented on May 28, 2024 • 0 new comments
avoid size() dispatch on FunctionalTensor
#126784 commented on May 31, 2024 • 0 new comments
[inductor] Enable subprocess-based parallel compile as the default
#126817 commented on May 29, 2024 • 0 new comments
Collect static parameter metadata in aot
#126820 commented on May 30, 2024 • 0 new comments
Remove unused arg to GraphLowering
#126821 commented on May 30, 2024 • 0 new comments
[TD] Test removal on sm86
#127131 commented on May 29, 2024 • 0 new comments
[wip, dynamo] trace through FunctionalTensor
#126936 commented on May 30, 2024 • 0 new comments
Default traceable subclasses to use swap_tensors path for load_state_dict
#126788 commented on May 30, 2024 • 0 new comments
Enable XPU operator codegen via backend whitelist
#126977 commented on May 29, 2024 • 0 new comments
reset dynamo cache before each test
#126586 commented on May 25, 2024 • 0 new comments
[export] add nonstrict, retrace, serdes patching for _trace._export tests
#126810 commented on May 28, 2024 • 0 new comments
Initial commit of flight recorder analyzer
#126726 commented on Jun 1, 2024 • 0 new comments
Add private escape hatches to fall back to pre-swap tensors behavior
#126984 commented on May 30, 2024 • 0 new comments
[inductor][cpp] add vectorization support for double
#126858 commented on May 29, 2024 • 0 new comments
[CI] Ensure inductor/test_cpu_cpp_wrapper is actually run in inductor_cpp_wrapper_abi_compatible
#126717 commented on Jun 1, 2024 • 0 new comments
Avoid accessing storage in wrapper tensor
#126878 commented on May 31, 2024 • 0 new comments
HealthcheckNCCL
#127044 commented on May 29, 2024 • 0 new comments
Save backward graphs lazily to cache
#126999 commented on May 31, 2024 • 0 new comments
[RFC] Intel GPU Upstreaming
#114723 commented on May 27, 2024 • 0 new comments
Faster Pytorch dequantize() + matmul for quantized models
#115985 commented on May 29, 2024 • 0 new comments
[MPS] F.conv1d and F.conv2d produce incorrect gradients when minibatch >= 2^16
#96225 commented on May 29, 2024 • 0 new comments
DISABLED test_bmm_multithreaded (__main__.TestTorch)
#125240 commented on May 29, 2024 • 0 new comments
DISABLED test_memory_format_type_cuda (__main__.TestTorchDeviceTypeCUDA)
#126954 commented on May 29, 2024 • 0 new comments
Choose better configs for `tuned_mixed_mm`
#127056 commented on May 29, 2024 • 0 new comments
xpu: support torch.utils.data.DataLoader(pin_memory_device='xpu')
#126491 commented on May 30, 2024 • 0 new comments
xpu: can't build XPU backend without sourcing oneAPI environment variables (/opt/intel/oneapi/setvars.sh)
#127008 commented on May 30, 2024 • 0 new comments
When testing the scalar version, test_open_device_registration will fail
#126372 commented on May 30, 2024 • 0 new comments
When testing the scalar version, test_AllenaiLongformerBase_repro will fail
#126262 commented on May 30, 2024 • 0 new comments
[Inductor] Kill Mutation Layout
#118570 commented on May 30, 2024 • 0 new comments
Lowering for the Average Pooling 3D backward operation
#127101 commented on May 30, 2024 • 0 new comments
UNSTABLE inductor / cuda12.1-py3.10-gcc9-sm86 / test (inductor_timm)
#126884 commented on May 30, 2024 • 0 new comments
Invalid Reference to Class
#99107 commented on May 30, 2024 • 0 new comments
RReLU doc doesn't specify the eval mode behaving just like LeakyReLU
#82677 commented on May 30, 2024 • 0 new comments
[ONNX] Do not include concrete tensors when we add initializers
#127140 commented on May 31, 2024 • 0 new comments
[ONNX] Support external tensors in ONNXProgram.save()
#127142 commented on May 31, 2024 • 0 new comments
[ONNX] ExportedProgram weight serialization support
#127138 commented on May 31, 2024 • 0 new comments
[ONNX] Move the graph building logic
#127139 commented on May 31, 2024 • 0 new comments
[AOTI] Using `AOTI_TORCH_CHECK` will cause performance drop on several models compared with using `TORCH_CHECK`
#126665 commented on May 31, 2024 • 0 new comments
Cpp-wrapper mode issue tracker
#117363 commented on May 31, 2024 • 0 new comments
torchrun fails to run on Windows 11
#108602 commented on May 31, 2024 • 0 new comments
What is the processing principle when the complex64 input tensor contains nan or inf for addition?
#127075 commented on May 27, 2024 • 0 new comments
Label tracking meta-issue (edit me to get automatically CC'ed on issues! cc bot)
#24422 commented on May 28, 2024 • 0 new comments
randn generates different output for 4x4 tensor size sliced to match shape of direct 2x4 or 4x2 and compare output
#127164 commented on May 28, 2024 • 0 new comments
UNSTABLE pull / linux-focal-cuda12.4-py3.10-gcc9-sm86 / build
#127104 commented on May 28, 2024 • 0 new comments
function signature for multiprocessing.spawn is multiprocessing.spawn.spawn
#126899 commented on May 28, 2024 • 0 new comments
Segmentation error for torch==2.2.1 on MacOs
#121101 commented on May 28, 2024 • 0 new comments
[DSD] keep 'initial_lr' in `torch.distributed.checkpoint.state_dict.set_optimizer_state_dict`
#126948 commented on May 28, 2024 • 0 new comments
Python 3.12 CPU Build system cannot find MKL libraries
#119557 commented on May 28, 2024 • 0 new comments
Unexpected MYPY linter errors on CI
#126361 commented on May 28, 2024 • 0 new comments
Make the `sccache` cache easily available to all pytorch contributors in readonly mode
#125297 commented on May 28, 2024 • 0 new comments
[RFC] Autoload Device Extension
#122468 commented on May 29, 2024 • 0 new comments
[RFC] Raising minimal glibc support to: glibc2_28 . Deprecation support for Amazon Linux 2 support for PyTorch Release 2.5
#126551 commented on May 29, 2024 • 0 new comments
Add warning messages to provide info about expected performance improvement using cuda for a specific model
#126874 commented on May 29, 2024 • 0 new comments
Support for no-frills FP8 matmuls
#123761 commented on May 29, 2024 • 0 new comments
Inaccurate filename due to test class metaprogramming that doesn't set __file__ and __name__
#125467 commented on May 29, 2024 • 0 new comments
torch.cuda.BoolTensor uses 8 bits per element, not 1 bit as reported by element_size()
#41571 commented on May 29, 2024 • 0 new comments
[question] How hard would it be to implement 4-bit precision training?
#49298 commented on May 29, 2024 • 0 new comments
No factory functions for strided quantized tensors
#74540 commented on May 29, 2024 • 0 new comments
[discussion, idea] Batched, vectorized base64 decoding / encoding + maybe RLE decoding / encoding
#90560 commented on May 29, 2024 • 0 new comments
Inplace fused (leaky)relu+(leaky)dropout for memory savings (I think, can be made fully allocation-less if never fully allocating random mask in FlashAttention style and recover the mask from the output)
#92927 commented on May 29, 2024 • 0 new comments
[feature request] Specialized memory layouts and wide blocked/tiled dtypes for cublasLt/onednn: e.g. torch.float16x32 / torch.int8x32 / torch.bits1x512 (akin to torch.quint2x4)
#104702 commented on May 29, 2024 • 0 new comments
The performance of multiplication of two matrices is different between window and linux
#26345 commented on May 31, 2024 • 0 new comments
[WIP] [Inductor Intel GPU backend Upstream] Reuse inductor test for Intel GPU (PART 2)
#124147 commented on May 31, 2024 • 0 new comments
[DO NOT MERGE] Test new ROCm CI nodes
#124424 commented on May 31, 2024 • 0 new comments
[dynamo] Support ndarray.dtype attribute access
#124490 commented on May 31, 2024 • 0 new comments
While loop autograd
#124573 commented on Jun 1, 2024 • 0 new comments
[WIP][Inductor Intel GPU backend Upstream] Reuse inductor test for Intel GPU (PART 3)
#124702 commented on May 31, 2024 • 0 new comments
[WIP][Inductor] Update Intel GPU Triton commit pin.
#124842 commented on May 31, 2024 • 0 new comments
Add Efficient Attention support on ROCM
#124885 commented on May 28, 2024 • 0 new comments
S390x ci periodic tests
#125401 commented on May 30, 2024 • 0 new comments
[Inductor][ROCm] Composable Kernel backend for Inductor
#125453 commented on Jun 1, 2024 • 0 new comments
Uses memory pools for mixing CUDA allocators
#125722 commented on May 29, 2024 • 0 new comments
[AOTI][not for review] Test cpp_wrapper mode
#125733 commented on May 29, 2024 • 0 new comments
[CI]enable AMP accuracy test for inductor on SPR CPU
#125748 commented on May 29, 2024 • 0 new comments
Separate AOTI Eager utils as a single file
#125819 commented on May 31, 2024 • 0 new comments
[3/N] Non-Tensor: Support string parameter for aten operations
#125831 commented on May 31, 2024 • 0 new comments
[4/N] Non-Tensor: Support layout, device and dtype for aten operations
#125897 commented on May 31, 2024 • 0 new comments
Fix tensor subclass + dynamic shapes in torch.compile + aot autograd
#125941 commented on May 31, 2024 • 0 new comments
Remove deprecated _aminmax operator
#125995 commented on May 28, 2024 • 0 new comments
allow to use bf16 as fp32 internal precision for mkldnn conv backward
#126054 commented on May 30, 2024 • 0 new comments
Enable UFMT format on test/quantization
#126152 commented on May 27, 2024 • 0 new comments
[wip][inductor] move loop ordering after fusion
#126254 commented on May 29, 2024 • 0 new comments
[DONT MERGE][dynamo] Turn on inlining of inbuilt nn modules
#126304 commented on Jun 1, 2024 • 0 new comments
Support Delay Loading of c10.dll in when using libtorch as a thirdparty library.
#105058 commented on May 31, 2024 • 0 new comments
Cost & performance estimation for Windows Arm64 compilation
#92302 commented on May 31, 2024 • 0 new comments
torch.nn.AdaptiveAvgPool2d lacks checking of input dimension
#126673 commented on May 31, 2024 • 0 new comments
Broken Link and unfinished sentence in Frequently Asked Questions
#126367 commented on May 31, 2024 • 0 new comments
[Feature request] Exclusive prefix sum, `torch.cumsum(input, dim=0, exclusive=True)`
#76191 commented on May 31, 2024 • 0 new comments
[BE] wrap deprecated function/class with `typing_extensions.deprecated` for better IDE integration
#126888 commented on Jun 1, 2024 • 0 new comments
Automated submodule update: kineto
#106149 commented on Jun 1, 2024 • 0 new comments
[POC][pytree] test flattening dict in sorted order
#115014 commented on May 29, 2024 • 0 new comments
Automated submodule update: FBGEMM
#115316 commented on Jun 1, 2024 • 0 new comments
Factory function and basic .sizes() support for C++ NestedTensor
#117905 commented on May 28, 2024 • 0 new comments
Switch batch norm stack to consolidated ops
#119496 commented on May 30, 2024 • 0 new comments
[NJT] Store vec on nested ints
#119976 commented on May 28, 2024 • 0 new comments
[NJT] Factory function support
#119977 commented on May 28, 2024 • 0 new comments
[AMD] Turn on hipblaslt by default
#120480 commented on May 31, 2024 • 0 new comments
Optionally use hipblaslt
#120551 commented on May 30, 2024 • 0 new comments
Enable clang-tidy on c10/util/Float8*.h
#120573 commented on May 26, 2024 • 0 new comments
[do not review]
#120881 commented on May 28, 2024 • 0 new comments
Adds general tensor equality subsystem
#121481 commented on May 28, 2024 • 0 new comments
[draft] python 3.13 test
#121979 commented on May 31, 2024 • 0 new comments
Improve decomposition for constant_pad_nd
#123661 commented on May 29, 2024 • 0 new comments
Add auto-tuning for sparse semi-structured MM operator
#123742 commented on May 30, 2024 • 0 new comments