Insights: pytorch/pytorch
Overview
7 Pull requests merged by 5 people
-
[EZ] Pin scipy to 1.12 for Py-3.12
#127322 merged
May 29, 2024 -
Update hf_BirdBird periodic-dynamo-benchmarks results
#127312 merged
May 29, 2024 -
Put back "[Release only] Release 2.3 start using triton package from pypi""
#127290 merged
May 28, 2024 -
[DSD] Fix to remove non_persistent buffer in distributed state dict (#125337)
#127219 merged
May 27, 2024 -
[DSD] Add a test to verify FSDP lazy initialization case (#127069)
#127130 merged
May 27, 2024 -
[DCP][state_dict] Remove the check of FSDP has root (#121544)
#126557 merged
May 27, 2024 -
[DSD] Correctly handle _extra_state (#125336)
#126567 merged
May 27, 2024
196 Pull requests opened by 110 people
-
Warn unused functions
#127185 opened
May 26, 2024 -
Quick Fix on #126854, deepcopy `lr` and other possible `base_parameters`
#127190 opened
May 26, 2024 -
[inductor][cpp] BF16 AMX micro-gemm support
#127195 opened
May 26, 2024 -
Reduce number of samples in {svd,pca}_lowrank OpInfos
#127199 opened
May 26, 2024 -
Implement a generic function scheduler
#127200 opened
May 26, 2024 -
[Inductor][CPP] Enable int8 GEMM Template
#127206 opened
May 27, 2024 -
Add error checking for boolean beta and alpha for fake tensor impl of torch.addr
#127207 opened
May 27, 2024 -
[Inductor] Skip model_fail_to_load and eager_fail_to_run models in inductor benchmarks test
#127210 opened
May 27, 2024 -
[Inductor][CPP] Fix FP16 GEMM Template UT failure with FP16 instruction support
#127211 opened
May 27, 2024 -
[Inductor][CPP] Fallback QLinear Binaryfusion from postop sum to binary add when others is view
#127212 opened
May 27, 2024 -
Enable OP: convtranspose into operator benchmark
#127216 opened
May 27, 2024 -
Add OpInfo entry for as_strided_copy
#127231 opened
May 27, 2024 -
Add OpInfo entry for add_alias
#127232 opened
May 27, 2024 -
[Clang Tidy] Fix misc-header-include-cycle errors in clang-tidy and ignore some files
#127233 opened
May 27, 2024 -
[easy?] Move AsyncCompile to a different file
#127235 opened
May 27, 2024 -
[Traceable FSDP2] Make ._unsharded_param creation traceable
#127238 opened
May 27, 2024 -
[Do not review] Top of Traceable FSDP2 stack - to be broken up
#127239 opened
May 27, 2024 -
[MPS] Fused Adam & AdamW
#127242 opened
May 27, 2024 -
[Traceable FSDP2] Make ._unsharded_param creation traceable
#127243 opened
May 28, 2024 -
[DO NOT REVIEW][NOT USED] Add queue_callback support
#127244 opened
May 28, 2024 -
[Traceable FSDP2] Make ._unsharded_param creation traceable
#127245 opened
May 28, 2024 -
[DO NOT REVIEW][NOT USED] Add queue_callback support
#127246 opened
May 28, 2024 -
Add Dynamo support for run_with_rng_state HOP
#127247 opened
May 28, 2024 -
[Traceable FSDP2] Improve FSDPManagedNNModuleVariable support
#127249 opened
May 28, 2024 -
fix code
#127252 opened
May 28, 2024 -
Support SetVariable mutation
#127253 opened
May 28, 2024 -
Check hasattr before comparing source name
#127254 opened
May 28, 2024 -
[Traceable FSDP2] Workaround in nn_module_proxy()
#127255 opened
May 28, 2024 -
[WIP] Improve __bool__ access handling for UserDefinedObjectVariable
#127257 opened
May 28, 2024 -
Trace TensorVariable attribute mutation if call_setattr is called
#127258 opened
May 28, 2024 -
Add support for register_post_accumulate_grad_hook
#127259 opened
May 28, 2024 -
Add tracing support for out-variant custom ops that return None
#127260 opened
May 28, 2024 -
[NOT USED] support inplace all_gather
#127261 opened
May 28, 2024 -
[Doc] fix some typos (found by codespell and typos)
#127267 opened
May 28, 2024 -
Add new Inductor-friendly op _fp8_mm and lowering
#127268 opened
May 28, 2024 -
[quant] Enable auto code completion by import during type checking
#127269 opened
May 28, 2024 -
add xpu for amp
#127276 opened
May 28, 2024 -
Enable deterministic support for oneDNN
#127277 opened
May 28, 2024 -
update amp example to device-agnostic
#127278 opened
May 28, 2024 -
add xpu to torch.compile
#127279 opened
May 28, 2024 -
add xpu to torch.tensors
#127280 opened
May 28, 2024 -
Select Runner Label Dynamically
#127287 opened
May 28, 2024 -
[inductor] enable bf32 test for mkldnn conv
#127293 opened
May 28, 2024 -
[inductor] enable bf32 for mkldnn linear pointwise/binary in inductor
#127294 opened
May 28, 2024 -
functorch x torch.compile: Remove functools.wraps hack
#127302 opened
May 28, 2024 -
Remove parts of AOTDedupWrapper logic since we have dynamo dedup
#127306 opened
May 28, 2024 -
[halide-backend] Add test shard
#127308 opened
May 28, 2024 -
Use by-column algorithm for fp16/bf16 CPUBlas gemm_transb kernels
#127318 opened
May 28, 2024 -
Remove tensor storage_offset/storage_bytes from the cache key
#127319 opened
May 28, 2024 -
[PT2Quant] Migrate move_exported_model_to_eval to torch.export._trace._export
#127323 opened
May 28, 2024 -
[export] Fix fake mode detection with empty inputs.
#127327 opened
May 28, 2024 -
[do-not-review][wip] remove TLS cudagraph tree manager for CA
#127330 opened
May 28, 2024 -
[wip] mark compiled autograd params as static addresses for cudagraphs
#127331 opened
May 28, 2024 -
[pipelining] rewrite interleaved 1f1b
#127332 opened
May 28, 2024 -
[Inductor] Emit strided block pointer from ModularIndexing and FloorDiv
#127342 opened
May 28, 2024 -
add auto-functionalize support for mutable list[Tensor]
#127347 opened
May 28, 2024 -
Retire torch.distributed.pipeline
#127354 opened
May 28, 2024 -
[Functorch][cuDNN] Bump tolerances for `test_vmapjvpvjp`
#127355 opened
May 28, 2024 -
[dtensor][debug] added c10d reduce_scatter_ and reduce_scatter_tensor_coalesced tracing_ to CommDebugMode
#127358 opened
May 28, 2024 -
[dtensor][debug] added c10d alltoall_ and alltoall_base_ to CommDebugMode
#127360 opened
May 28, 2024 -
[c10d] guard gpu context during abort
#127363 opened
May 29, 2024 -
UFMT format on test_fake_tesnor.py test_futures.py test_fx.py
#127369 opened
May 29, 2024 -
Migrate CalculateSmallVectorDefaultInlinedElements to constexpr Function for SmallVector
#127370 opened
May 29, 2024 -
Remove cuda check in the CUDAGraph destrcutor
#127382 opened
May 29, 2024 -
Masked scale meta function registration #119984
#127389 opened
May 29, 2024 -
Build SYCL kernels for ATen XPU ops on Native Windows (take 2)
#127390 opened
May 29, 2024 -
[sym_shapes] Allow check_is_size for backed symints
#127395 opened
May 29, 2024 -
[inductor] Enable subprocess-based parallel compile internally
#127401 opened
May 29, 2024 -
Handle unpacking during TorchScript to ExportedProgram conversion
#127419 opened
May 29, 2024 -
[c10d] Add commCreateFromRanks to c10d
#127421 opened
May 29, 2024 -
[inductor] Take absolute value of strides when picking loop order
#127425 opened
May 29, 2024 -
Implement Graph Transform Observer
#127427 opened
May 29, 2024 -
Add NaturalDiv to distinguish from FloorDiv/TruncDiv
#127430 opened
May 29, 2024 -
[DRAFT] nested tensor subclass support
#127431 opened
May 29, 2024 -
[export] provide refine function for automatically accepting dynamic shapes suggested fixes
#127436 opened
May 29, 2024 -
[jit] Validate mobile module fields parsed by flatbuffer loader
#127437 opened
May 29, 2024 -
Update to cuda 12.4.1
#127439 opened
May 29, 2024 -
[TESTING] Run inductor CI on custom Triton pin
#127450 opened
May 29, 2024 -
Improve the scheduling for fused_matmul_reduce_scatter
#127455 opened
May 29, 2024 -
fix post_grad pattern
#127457 opened
May 29, 2024 -
Add typing annotations to pattern_matcher.py
#127458 opened
May 29, 2024 -
documentation for pattern_matcher.py
#127459 opened
May 29, 2024 -
Add registry for TorchScript to ExportedProgram conversion
#127464 opened
May 29, 2024 -
update test_reformer_train test to handle nn module inlining
#127467 opened
May 29, 2024 -
Add new convenience checks for PyTorch operators/kernels
#127469 opened
May 29, 2024 -
Save quantization_tag in export graph serialization
#127473 opened
May 29, 2024 -
[pipelining] Stress test schedules with multi iters
#127475 opened
May 29, 2024 -
Onboard ARM bfloat16 to gemm-by-dot-product-for-gemm_transa_ infrastructure
#127477 opened
May 29, 2024 -
Patch ARM Half use_gemv_fast_path gate to avoid kernel duplication
#127478 opened
May 29, 2024 -
[pipelining] Add forward-only tests to mimic inference
#127479 opened
May 29, 2024 -
Onboard ARM bfloat16 to gemv fast path
#127484 opened
May 29, 2024 -
reset dynamo in test_do_not_skip_side_effects unit test loop to avoid dynamo cache limit hit
#127487 opened
May 30, 2024 -
Use bfdot instruction in ARM bfloat16 dot product if compiling with it available
#127488 opened
May 30, 2024 -
[caffe2][be] migrate global static initializer
#127489 opened
May 30, 2024 -
[NestedTensor] Extend coverage for unbind when ragged_idx != 1
#127493 opened
May 30, 2024 -
Properly detect nested torch function args
#127496 opened
May 30, 2024 -
Check unused variables in tests
#127498 opened
May 30, 2024 -
Use less A100.large shards for torchao perf benchmark
#127499 opened
May 30, 2024 -
[cpuinfo] bump cpuinfo to the latest to support amx isa check
#127505 opened
May 30, 2024 -
[halide-backend] Add GPU support
#127506 opened
May 30, 2024 -
[Intel GPU]Enable fp64 GEMM
#127507 opened
May 30, 2024 -
[Intel GPU]Enable fp64 double GEMM
#127508 opened
May 30, 2024 -
[BE][ptd_fb_test][1/N] Enable testslide
#127512 opened
May 30, 2024 -
Add linker script optimization flag to CMAKE rule for CUDA ARM wheel
#127514 opened
May 30, 2024 -
Supervisor as a torchrun rendezvous impl
#127515 opened
May 30, 2024 -
Try setting memory budget to 0.5 by default
#127520 opened
May 30, 2024 -
[TORCH_FA2_flash_api] Update total_q to the reshaped query 0th dimension
#127524 opened
May 30, 2024 -
Validate file and handle exceptions for weights_only unpickler
#127526 opened
May 30, 2024 -
[ROCm] Fix error in torch.cuda initialisation if amdsmi is not available
#127528 opened
May 30, 2024 -
[WIP] Add decomposition for aten.bernoulli.p
#127537 opened
May 30, 2024 -
Always simplify sympy expressions before printing.
#127543 opened
May 30, 2024 -
Handle aten::__contains__ during TorchScript to ExportedProgram conversion
#127544 opened
May 30, 2024 -
Add noqa to prevent lint warnings
#127545 opened
May 30, 2024 -
Update doc for nn.Module.load_state_dict(assign=True) to be more clear about semantic
#127549 opened
May 30, 2024 -
add wider support for input mutations during backward in torch.compile, as long as they are hidden from autograd
#127551 opened
May 30, 2024 -
[export] only add guard-related runtime asserts once, silence if needed
#127554 opened
May 30, 2024 -
Add profiler annotation for fused_all_gather_matmul and fused_matmul_reduce_scatter
#127556 opened
May 30, 2024 -
[pipelining] test pipeline_order in schedule
#127559 opened
May 30, 2024 -
Remove unstable ARC jobs
#127563 opened
May 30, 2024 -
Joy dev
#127564 opened
May 30, 2024 -
[Split Build] Make libtorch_global_deps accessible from libtorch wheel
#127570 opened
May 30, 2024 -
[DO NOT MERGE] Testing new runners
#127573 opened
May 30, 2024 -
[AOTInductor] [Tooling] Update NaN and INF Checker for AOTInductor
#127574 opened
May 30, 2024 -
Use freshly traced jit-traced module to be used in export analysis
#127577 opened
May 30, 2024 -
[DO NOT MERGE] Testing additional LF config labels
#127579 opened
May 30, 2024 -
Handle custom op during TorchScript to ExportedProgram conversion
#127580 opened
May 30, 2024 -
[inductor] fix redis-related env vars in remote_cache.py
#127583 opened
May 30, 2024 -
[FSDP2] Fix submesh slicing to enable 3D parallelism
#127585 opened
May 31, 2024 -
[WIP] mark NestedInts as symints instead of symfloats
#127587 opened
May 31, 2024 -
[Do not merge] [Test] Test builder cudnn v9 change
#127589 opened
May 31, 2024 -
[Quant][Inductor] Add get_mutation_names in QLinearPointwiseBinaryPT2E IR
#127592 opened
May 31, 2024 -
[ts migration] support aten::dim, aten::len, aten::__getitem__
#127593 opened
May 31, 2024 -
Disable one of the GPT2 SDPA patterns for single-thread case
#127594 opened
May 31, 2024 -
WIP: fake tensor SymInt support
#127596 opened
May 31, 2024 -
test linear add bias
#127597 opened
May 31, 2024 -
[Inductor][CPP] Support more than one LocalBuffer
#127598 opened
May 31, 2024 -
[fbgemm_gpu] remove deleted embedding_backward_dense_host references
#127599 opened
May 31, 2024 -
Don't find statically linked libs in TorchConfig.cmake.in
#127601 opened
May 31, 2024 -
[WIP] Reuse UT for Intel GPU backend [Part1]
#127602 opened
May 31, 2024 -
[export][unflatten] More strictly respect scope when removing inputs
#127607 opened
May 31, 2024 -
Autoselect default device in FSDP construction.
#127609 opened
May 31, 2024 -
[AOTI] align data_size of the constants
#127610 opened
May 31, 2024 -
[CI] disable td for xpu ci test by default
#127611 opened
May 31, 2024 -
[inductor] custom do_bench_gpu to reduce max-autotune overhead
#127613 opened
May 31, 2024 -
Add functionality to make ViewAndMutationData (slightly more) cache safe
#127618 opened
May 31, 2024 -
Validate tensor storage before a direct access to it
#127619 opened
May 31, 2024 -
[caffe2][be][2/n] migrate gloabl static initializer
#127620 opened
May 31, 2024 -
Updating Module Tracker
#127624 opened
May 31, 2024 -
[Testing only] Flip default on weights_only
#127627 opened
May 31, 2024 -
[RFC] Introduce Checkpointable for DCP (#127540)
#127628 opened
May 31, 2024 -
[dtensor][debug] created example test to print module parameters
#127631 opened
May 31, 2024 -
wip test change
#127632 opened
May 31, 2024 -
[export] Handle serializing duplicate getitem nodes
#127633 opened
May 31, 2024 -
[aten_cuda/flash_attn] Add typename to template argument Kernel_trait…
#127634 opened
May 31, 2024 -
[DSD] Fixes various bugs for broadcast_from_rank0
#127635 opened
May 31, 2024 -
[inductor] parallel-compile: call triton_key() before forking
#127639 opened
May 31, 2024 -
[ONNX] Add quantized layer norm op to opset 17
#127640 opened
May 31, 2024 -
View specialization
#127641 opened
May 31, 2024 -
[ATen][Native] fixes sparse SPMV on aarch64
#127642 opened
May 31, 2024 -
[FSDP] keep paras in torch.distributed.checkpoint.state_dict.set_optimizer_state_dict
#127644 opened
May 31, 2024 -
GGML inspired int8 MM Metal shader
#127646 opened
May 31, 2024 -
Test jit
#127647 opened
May 31, 2024 -
[PT2][Optimus] Improve group batch fusion with same parent/users fusion enablement
#127648 opened
May 31, 2024 -
Relax Symbol check on debug_compile mode
#127650 opened
May 31, 2024 -
[Perf] Provide API to retrieve less data from recorder
#127651 opened
May 31, 2024 -
[CI] Gen xml during run
#127653 opened
May 31, 2024 -
[c10d][BE] fix test_init_pg_and_rpc_with_same_socket
#127654 opened
May 31, 2024 -
[Caffe2] Remove Caffe2 proto and other files
#127655 opened
May 31, 2024 -
[inductor] simplify indexing
#127661 opened
Jun 1, 2024 -
Allow symint inputs to aten.expand_copy and aten.view_copy
#127662 opened
Jun 1, 2024 -
Inductor: Allow small sizes of m for mixed mm autotuning
#127663 opened
Jun 1, 2024 -
adjust thresholds for gluon_inception_v3, beit_base_patch16_224, phli…
#127664 opened
Jun 1, 2024 -
[c10d] add a simple test to demonstrate the user usage of collectives
#127665 opened
Jun 1, 2024 -
[wip][cudagraphs] static input params flag
#127668 opened
Jun 1, 2024 -
[DO NOT MERGE] Fuzzkatt/12 4 inductor tolerance fixes debug
#127669 opened
Jun 1, 2024 -
declare 'static' if the function is not intended to be used outside o…
#127670 opened
Jun 1, 2024 -
[pipelining][review-only] Simple 1F1B schedule
#127673 opened
Jun 1, 2024 -
[AOTI] Switch to use shim v2
#127674 opened
Jun 1, 2024 -
[WIP] add optional boolean arg "true" to cumsum signature
#127675 opened
Jun 1, 2024 -
[Inductor][Flex-attention] Support different sequence lengths for Query and Key/Value
#127678 opened
Jun 1, 2024 -
[symbolic shapes] if symbol not in var_to_range default to unknown range
#127681 opened
Jun 1, 2024 -
wip
#127682 opened
Jun 1, 2024 -
Simplify CMake code
#127683 opened
Jun 1, 2024 -
[BE][Ez]: Enable ruff PYI019
#127684 opened
Jun 1, 2024 -
[BE][Ez]: Apply PYI059 - Generic always come last
#127685 opened
Jun 1, 2024 -
[inductor][cpp] align dtype of local buffer with the global one
#127687 opened
Jun 1, 2024 -
[BE]: Enable ruff TCH rules and autofixes for better imports
#127688 opened
Jun 1, 2024 -
[BE] wrap deprecated function/class with `typing_extensions.deprecated`
#127689 opened
Jun 1, 2024 -
Deprecate `torch._utils.is_compiling()` and `torch._dynamo.external_utils.is_compiling()`
#127690 opened
Jun 1, 2024 -
Retry of D58015187 Move AsyncCompile to a different file
#127691 opened
Jun 1, 2024
112 Issues closed by 33 people
-
TreeSpec equal logic return False for the same tree in str format
#127447 closed
Jun 1, 2024 -
Upgrade MacOS runner to 14
#127490 closed
Jun 1, 2024 -
UNSTABLE trunk / macos-py3-arm64
#127542 closed
Jun 1, 2024 -
When testing the scalar version, test_torchinductor.py will fail
#126763 closed
Jun 1, 2024 -
[Feature] `_foreach_copy_` supports different src/dst dtype with fusion
#115171 closed
Jun 1, 2024 -
Segmentation fault between Numpy and Pytorch using torch.bmm
#93161 closed
May 31, 2024 -
Type conversion between float/complex
#97888 closed
May 31, 2024 -
PyTorch Distributed Load Updates or Returns `state_dict`
#125096 closed
May 31, 2024 -
[NestedTensor] RelaxedUnspecConstraint failures due to mark_dynamic in NT constructor
#127097 closed
May 31, 2024 -
Build without MKL is not possible when MKL is installed
#32407 closed
May 31, 2024 -
[RFC] rename allow_in_graph to unsafe_allow_in_graph
#122189 closed
May 31, 2024 -
dcp resharding does not work for optimizer state_dict
#91205 closed
May 31, 2024 -
Remove `coordinator_rank` from public fns in distributed checkpointing, e.g. `load`
#119205 closed
May 31, 2024 -
DCP sees 1/2 of the expected size of each tensor in 3D parallel
#126595 closed
May 31, 2024 -
Loading Old Checkpoints with DTensor
#127351 closed
May 31, 2024 -
The training always freezes after some epochs.
#22671 closed
May 31, 2024 -
[AOTI][cpp_wrapper] test_triton_heuristics.py::test_artificial_grid_cpp_wrapper disabled
#123210 closed
May 31, 2024 -
[ONNX] export.export + dynamo_export fail on GLU because decompositions are not run
#125894 closed
May 31, 2024 -
dynamo breaks when getting attributes of builtins
#127172 closed
May 31, 2024 -
[optim] add missing 'maximize' parameter to LBFGS, NAdam and RAdam optimizers
#126642 closed
May 31, 2024 -
[ONNX] Memory leak when exporting a jit model to onnx
#82532 closed
May 31, 2024 -
[ONNX] Simplify fake tensor export logic to get rid of the redundant model argument
#127141 closed
May 31, 2024 -
Do we need an N-dim sub-DeviceMesh?
#126530 closed
May 30, 2024 -
Fix torch._dynamo.exc.Unsupported: builtin: index when nn module inlining enabled.
#127426 closed
May 30, 2024 -
Incorrect output from inductor: tile -> as_strided -> add
#127474 closed
May 30, 2024 -
saved_tensors_hooks auto delete custom attributes
#126676 closed
May 30, 2024 -
Add a `descending` flag to `linalg.eigh` and `linalg.svd`
#58034 closed
May 30, 2024 -
UNSTABLE periodic / linux-jammy-xpu-py3.8 / test (default)
#127539 closed
May 30, 2024 -
Error while parsing dispatch dictionary for NativeFunctions YAML in torchgen
#127405 closed
May 30, 2024 -
Error while parsing NativeFunction from YAML in torchgen
#127406 closed
May 30, 2024 -
Error while parsing DeviceCheckType for NativeFunctions YAML in torchgen
#127407 closed
May 30, 2024 -
Error while parsing function signature for NativeFunctions YAML in torchgen
#127408 closed
May 30, 2024 -
Missing structured delegate function in YAML in torchgen
#127409 closed
May 30, 2024 -
Error while parsing Yaml file in torchgen
#127410 closed
May 30, 2024 -
Error while parsing precomputed field for NativeFunctions YAML in torchgen
#127411 closed
May 30, 2024 -
Error while parsing function parameter with default value for NativeFunctions YAML in torchgen
#127404 closed
May 30, 2024 -
Inconsistent gradients of Conv2d layers when training the same model using CPU and GPU
#127226 closed
May 30, 2024 -
DISABLED test_non_contiguous_input_addmm (__main__.TestMaxAutotune)
#126176 closed
May 30, 2024 -
DISABLED test_activations_bfloat16_half_cpu_cpu_float16 (__main__.TestNNDeviceTypeCPU)
#126177 closed
May 30, 2024 -
DISABLED test_quantization_doc_fx (__main__.TestQuantizationDocs)
#125670 closed
May 30, 2024 -
DISABLED test_quantization_doc_ptsq (__main__.TestQuantizationDocs)
#125669 closed
May 30, 2024 -
DISABLED test_quantization_doc_custom (__main__.TestQuantizationDocs)
#125668 closed
May 30, 2024 -
DISABLED test_quantization_doc_ptdq (__main__.TestQuantizationDocs)
#125667 closed
May 30, 2024 -
PyTorch nightly docs build hasn't run since 4/8?
#127527 closed
May 30, 2024 -
Check for error messages on torch.compile with pybind'ed functions
#126799 closed
May 30, 2024 -
torch.svd_lowrank does not work for complex matrices.
#122188 closed
May 30, 2024 -
error: ‘False’ was not declared in this scope
#127392 closed
May 30, 2024 -
Bad error message for aten::_local_scalar_dense on meta tensor
#119588 closed
May 30, 2024 -
Document torch.Tensor legacy constructor
#122408 closed
May 29, 2024 -
torch.fx.passes.split_module.split_module doesn't support dynamic shapes
#103539 closed
May 29, 2024 -
[While_loop] How to use layer like `torch.nn.BatchNorm2d` with while_loop?
#127320 closed
May 29, 2024 -
Add `str` type to `device` parameter of `torch.cuda.get_device_name()` on the doc
#126400 closed
May 29, 2024 -
[MPS] `.to('mps')` zeroes out elements in tensors taking up >=2^32 bytes
#96716 closed
May 29, 2024 -
Fix "failed running function instead of failed running module" when nn module inlining enabled.
#125605 closed
May 29, 2024 -
[docs] scaled_dot_product_attention is_causal description is misleading
#126873 closed
May 29, 2024 -
running opcheck leads to `Fail to import hypothesis in common_utils, tests are not derandomized` print
#126871 closed
May 29, 2024 -
UNSTABLE linux-binary-manywheel / manywheel-py3_8-cuda11_8
#104727 closed
May 29, 2024 -
UNSTABLE linux-binary-manywheel / manywheel-py3_8-cuda12_1-test / test
#127288 closed
May 29, 2024 -
MAX-Autotune Compilation Time Regression Due To Added MM Configs
#125687 closed
May 29, 2024 -
UNSTABLE linux-binary-manywheel / manywheel-py3_8-cuda12_4-test / test
#127289 closed
May 29, 2024 -
AOTI doesn't produce meaningful response w/ CPU backend on Linux x86
#123990 closed
May 29, 2024 -
[torch.compile] `index_select` out of bound read
#121251 closed
May 29, 2024 -
[pipelining] Add back support for multi-use parameters/buffers
#126626 closed
May 29, 2024 -
install cuda version always get cpuonly
#106565 closed
May 28, 2024 -
test_view_dynamic_zero_dim no longer testing zero input
#105066 closed
May 28, 2024 -
DISABLED test_some_outputs_dont_require_grad_view (__main__.TestAOTAutograd)
#125593 closed
May 28, 2024 -
DISABLED test_some_output_requires_grad_input_doesnt (__main__.TestAOTAutograd)
#125402 closed
May 28, 2024 -
DISABLED test_schema_correctness_nn_functional_conv3d_cuda_complex128 (__main__.TestSchemaCheckModeOpInfoCUDA)
#114573 closed
May 28, 2024 -
DISABLED test_view_and_inplace_view (__main__.TestAOTAutograd)
#125671 closed
May 28, 2024 -
DISABLED test_aot_export_multiple_outputs_require_grad_banned (__main__.TestAOTExport)
#124221 closed
May 28, 2024 -
Profiler reports different # of Calls depending on group_by_stack_n
#83737 closed
May 28, 2024 -
DISABLED test_profiler (__main__.TestJit)
#65521 closed
May 28, 2024 -
In profiler, recorded block's total time can be less than the operators within the block
#43868 closed
May 28, 2024 -
RuntimeError: MPS device does not support bmm for non-float inputs
#127178 closed
May 28, 2024 -
Data race in RecordFunction::callbackShouldRun
#58452 closed
May 28, 2024 -
Segmentation fault when importing `sklearn.model_selection`
#127192 closed
May 28, 2024 -
UNSTABLE pull / linux-focal-cuda12.4-py3.10-gcc9 / build
#127108 closed
May 28, 2024 -
pyyaml dependency error on Mac with source installation
#127158 closed
May 28, 2024 -
Overly demanding PyTorch CLA
#127285 closed
May 28, 2024 -
A UserWarning occurs after CBAM attention is added
#127198 closed
May 28, 2024 -
CLA required despite being existing contributor
#127272 closed
May 28, 2024 -
Tensor slicing reducing dimensionality of tensor
#127236 closed
May 28, 2024 -
torch.linalg.inv returns constantly the identity matrix for input on GPU
#127281 closed
May 28, 2024 -
Tensor Parallel cannot work when tp mesh size is 1
#127213 closed
May 28, 2024 -
Symbolic shapes unable to reason: Ne(Mod(u0*u2 + u1*u2, u0 + u1), 0)
#125307 closed
May 28, 2024 -
fatal: not a git repository: '.git'
#127188 closed
May 28, 2024 -
DISABLED test_transformerdecoder (__main__.TestNN)
#126043 closed
May 28, 2024 -
DISABLED test_retrace_export_cond_simple_cuda_float32 (__main__.TestHOPCUDA)
#123564 closed
May 27, 2024 -
DISABLED test_conv_empty_input_cpu_float32 (__main__.TestNNDeviceTypeCPU)
#126091 closed
May 27, 2024 -
DISABLED test_activations_bfloat16_half_cpu_cpu_bfloat16 (__main__.TestNNDeviceTypeCPU)
#126090 closed
May 27, 2024 -
DISABLED test_correctness_Adam_use_closure_False_cuda_float32 (__main__.CompiledOptimizerParityTestsCUDA)
#126076 closed
May 27, 2024 -
DISABLED test_correctness_Rprop_use_closure_False_cuda_float32 (__main__.CompiledOptimizerParityTestsCUDA)
#126077 closed
May 27, 2024 -
[Distributed] gloo backend, barrier operation is even slower than broadcast
#127179 closed
May 27, 2024 -
Time taken to data loading increased in newer builds (ARM)
#124922 closed
May 27, 2024 -
Importing torch after TensorFlow results in std::runtime_error
#101152 closed
May 27, 2024 -
Improve discoverability of meta function registration in documentation
#126337 closed
May 27, 2024 -
DISABLED test_min_cut_partitioner_output_tensor_shape_tensor (__main__.TestPartitioning)
#104326 closed
May 27, 2024 -
DISABLED test_unbacked_cat_backwards_cuda (__main__.TestInductorDynamicCUDA)
#125019 closed
May 27, 2024 -
DISABLED test_full_tensor_sync (__main__.DTensorTest)
#125366 closed
May 27, 2024 -
DISABLED test_depthwise_convolution (__main__.DistConvolutionOpsTest)
#125565 closed
May 27, 2024 -
DISABLED test_pad_dynamic_cuda (__main__.TestInductorDynamicCUDA)
#124276 closed
May 27, 2024 -
DISABLED test_all_gather_object (__main__.TestObjectCollectives)
#125566 closed
May 27, 2024 -
DISABLED test_dtensor_device_mesh_device_conversion (__main__.DTensorMeshTest)
#100126 closed
May 27, 2024 -
Dynamo benchmarks direct arg passing doesn't work
#126543 closed
May 26, 2024 -
ERROR: Could not find a version that satisfies the requirement torch (from versions: none)
#127181 closed
May 26, 2024 -
torch compile failure with bool inputs with CPP backend
#126824 closed
May 25, 2024 -
CMake -DBUILD_PYTHON=ON doesn't make & install Python bindings.
#126831 closed
May 25, 2024
104 Issues opened by 79 people
-
UNSTABLE inductor / cuda12.4-py3.10-gcc9-sm86 / test (dynamic_inductor_timm)
#127680 opened
Jun 1, 2024 -
RuntimeError: `jit.freeze` fails to find externally assigned attributes
#127679 opened
Jun 1, 2024 -
Key error in index_propagation when looking up dynamic shape vr
#127677 opened
Jun 1, 2024 -
module 'torch.mps' has no attribute 'device'
#127676 opened
Jun 1, 2024 -
Add line number to ` _warn_capture_scalar_outputs():`
#127667 opened
Jun 1, 2024 -
Strange clamp assert error when building on Fedora 40/gcc 14 in IndexKernel.hip
#127666 opened
Jun 1, 2024 -
Inductor generates unnecessary allocation + copy operations for custom ops with mutable inputs
#127660 opened
May 31, 2024 -
DISABLED test_workspace_allocation_error (__main__.CudaGraphTreeTests)
#127657 opened
May 31, 2024 -
Illegal memory access resulted from pointwise autotuning of a cat-like kernel
#127652 opened
May 31, 2024 -
OneCycleLR Example
#127649 opened
May 31, 2024 -
[export] Errors out when unflattening TorchTitan
#127643 opened
May 31, 2024 -
c++ library written with a lot of errors
#127638 opened
May 31, 2024 -
SyntaxError: unterminated string literal (detected at line 1) (<unknown>, line 1)
#127637 opened
May 31, 2024 -
dynamo minifier test test_cpu_cuda_module_after_dynamo fail with nn module inlining.
#127636 opened
May 31, 2024 -
Fix accuracy regression for cspdarknet53 or flakiness associated with cu121 (and potentially cu124)
#127626 opened
May 31, 2024 -
Segfault, possibly due to recursion limit
#127622 opened
May 31, 2024 -
custom_op API: better type anntation for Tuple
#127621 opened
May 31, 2024 -
TORCH_LIBRARY breaks when passing (unexpanded) macro as namespace argument
#127615 opened
May 31, 2024 -
[Bug] Data on CPUs Are Not Synchronized Before Subsequent Operations
#127612 opened
May 31, 2024 -
Error in gradient of CTCLoss
#127608 opened
May 31, 2024 -
` to be a quantized tensor. Is this likely due to missing support for quantized `onnx::Pad`
#127606 opened
May 31, 2024 -
RuntimeError: CUDA error: an illegal memory access was encountered with specific input shape and Conv1d
#127605 opened
May 31, 2024 -
The data was supposed to be on the GPU, but was mistakenly placed on the CPU.
#127603 opened
May 31, 2024 -
pip install torch==2.1.0 fails to be found
#127591 opened
May 31, 2024 -
Equivalent index expression results in 50% perf difference in a triton kernel
#127581 opened
May 30, 2024 -
AOTAutograd: allow input mutations in the bw that occur under no_grad
#127572 opened
May 30, 2024 -
[export] Cannot mutate tensors with frozen storage
#127571 opened
May 30, 2024 -
Nightly s390 manywheel-py3_8-cpu-s390x-build is failing
#127568 opened
May 30, 2024 -
AOTAutograd: the partitioner should not move mutations from the backward into the forward.
#127561 opened
May 30, 2024 -
[libtorch] duplicate symbol of vtable
#127560 opened
May 30, 2024 -
Reserving this issue to test something, don't mind me
#127555 opened
May 30, 2024 -
ddp graph splitting not happening when nn modules inlined.
#127552 opened
May 30, 2024 -
TestNestedTensorSubclassCPU::test_chunk_cpu fails in debug mode
#127546 opened
May 30, 2024 -
RuntimeError: derivative for aten::_spdiags is not implemented
#127541 opened
May 30, 2024 -
DISABLED test_cusparse_multiple_threads_same_device (__main__.TestCuda)
#127536 opened
May 30, 2024 -
Can we have a Y-Split module for torch.nn.Sequential?
#127535 opened
May 30, 2024 -
Compilation from source fails (PYTORCH 1.13.1)
#127534 opened
May 30, 2024 -
CUDA 12.5
#127532 opened
May 30, 2024 -
Triton `kernel.run` segfaults when passed non-default stream
#127531 opened
May 30, 2024 -
Run rstcheck on modified docstrings and docs as additional linter
#127530 opened
May 30, 2024 -
Multiple unhandled exceptions in weights_only unpickler
#127525 opened
May 30, 2024 -
SDPA memory efficient and flash attention kernels don't work with singleton dimensions
#127523 opened
May 30, 2024 -
Modeling ViT does not support quantized models
#127521 opened
May 30, 2024 -
[RFC] Load and register rendezvous backends dynamically as plugins at runtime
#127519 opened
May 30, 2024 -
THPVariable_Check(list_elem) INTERNAL ASSERT FAILED
#127518 opened
May 30, 2024 -
`assume_constant_result` does not work with method of `UnspecializedNNModuleVariable`
#127509 opened
May 30, 2024 -
[BUG] Using custome backend for torch.compile give nothing outputs
#127502 opened
May 30, 2024 -
Avoid Having to Register Op For ExternKernelChoice of Aten Refs
#127500 opened
May 30, 2024 -
torch.compiler docstrings can be derived from _dynamo and _inductor docs
#127497 opened
May 30, 2024 -
Multiplying a sparse CSR tensor by a strided vector nondeterministically fails on MacOS ARM64
#127491 opened
May 30, 2024 -
Cannot fakeify a tensor who's .grad field is a tensor subclass
#127470 opened
May 29, 2024 -
Add way to annotate a raw triton kernel using the triton kernel HOP so that it becomes make_fx-traceable
#127452 opened
May 29, 2024 -
PyTorch version agnostic C++ extensions
#127445 opened
May 29, 2024 -
address TODO: model is somehow not being freed when z3 is available
#127444 opened
May 29, 2024 -
Ensure the Python custom ops tutorial actually runs in PyTorch 2.4
#127443 opened
May 29, 2024 -
Release 2.3.1 validations checklist and cherry-picks
#127441 opened
May 29, 2024 -
UNSTABLE inductor / cuda12.1-py3.10-gcc9-sm86 / test (dynamic_inductor_timm)
#127438 opened
May 29, 2024 -
Crash in flatbuffer_loader during mobile model load from python API
#127434 opened
May 29, 2024 -
index_select op not implemented on Vulkan backend
#127422 opened
May 29, 2024 -
sys.maxsize special case doesn't work if you slightly offset the ranges
#127396 opened
May 29, 2024 -
Unable to quantize resnet50 model with post training static quantization
#127391 opened
May 29, 2024 -
Backwards pass through Beta distribution rsample gives inf for 4 < alpha - 2**16 < 1040, beta = 3/2
#127387 opened
May 29, 2024 -
Different tensor strides can result in surprisingly large discrepancies in Conv2d outputs
#127375 opened
May 29, 2024 -
Torch.compile produces Exception: Please convert all Tensors to FakeTensors first or instantiate
#127374 opened
May 29, 2024 -
DISABLED test_dtensor_op_db_bmm_cpu_float32 (__main__.TestDTensorOpsCPU)
#127373 opened
May 29, 2024 -
[Feature Request] switch amx isa detection in onednn to cpuinfo
#127368 opened
May 29, 2024 -
torch.export.export() throws an error when dealing with a model with weight tying.
#127357 opened
May 28, 2024 -
`flake8: noqa` disables flake8 linter for the whole file and it's not obvious
#127352 opened
May 28, 2024 -
Dynamo should prune non-live captured variables
#127350 opened
May 28, 2024 -
~PyTorch Docathon H1 2024!~
#127345 opened
May 28, 2024 -
DISABLED test_comprehensive_fft_ifft_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#127344 opened
May 28, 2024 -
DISABLED test_arange2_dynamic_shapes_cuda (__main__.DynamicShapesGPUTests)
#127343 opened
May 28, 2024 -
Add option for custom ops to automatically get a FakeTensor kernel (during static shapes)
#127337 opened
May 28, 2024 -
Inductor fails at assert self.symbol_to_source.get(expr)
#127328 opened
May 28, 2024 -
torch.compile reorder_for_compute_comm_overlap sink_waits pass does not work
#127324 opened
May 28, 2024 -
torch.compile (inductor) bug random signed number generation
#127310 opened
May 28, 2024 -
I don’t know if it’s a problem with cuda or pytorch
#127299 opened
May 28, 2024 -
Fused AdamW maybe should accept lr_dict directly?
#127284 opened
May 28, 2024 -
[BUG][JIT] `torch.jit.script` is not compatible with `DeprecationWarning` and `FutureWarning`
#127283 opened
May 28, 2024 -
torch.fx.symbolic_trace doesn't support many Callable types
#127282 opened
May 28, 2024 -
Python sometimes crashes inexplicably, with "/var/log/kernel.log" displaying index errors
#127275 opened
May 28, 2024 -
DISABLED test__int_mm_k_32_n_16_use_transpose_a_True_use_transpose_b_False_cuda (__main__.TestLinalgCUDA)
#127273 opened
May 28, 2024 -
ImportError: cannot import name 'triton_key' from 'triton.compiler.compiler' when using triton 2.3.x
#127271 opened
May 28, 2024 -
Not supporting higher protobuf versions
#127270 opened
May 28, 2024 -
make_graphed_callables doesn't work with FSDP at all even on a simple network
#127225 opened
May 27, 2024 -
RuntimeError: grid_sampler_2d_cpu not implemented for Half
#127224 opened
May 27, 2024 -
[BUG] torch.linalg.lstsq returning wrong result.
#127223 opened
May 27, 2024 -
Migrate `CalculateSmallVectorDefaultInlinedElements` to constexpr Function for SmallVector
#127222 opened
May 27, 2024 -
lacking checking for ConvTranspose's parameters when running with GPUs
#127221 opened
May 27, 2024 -
Comparing dynamic shapes fails with KeyError
#127217 opened
May 27, 2024 -
No way to config profiling scope in torch.autograd.profile
#127215 opened
May 27, 2024 -
DISABLED test_large_mmaped_weights_non_abi_compatible_cuda (__main__.AOTInductorTestNonABICompatibleCuda)
#127202 opened
May 27, 2024 -
torch.topk results differ on CPU and CUDA
#127196 opened
May 26, 2024 -
Should `torch.Size` convert np.ndarrays to lists of ints?
#127194 opened
May 26, 2024 -
Unify async_save and sync_save in state_dict_saver from distributed checkpointing
#127191 opened
May 26, 2024 -
`store_param_remainders` from Apex DistributedFusedAdam
#127189 opened
May 26, 2024
383 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Complete revamp of float/promotion sympy handling
#126905 commented on
May 31, 2024 • 78 new comments -
[CI] Add freezing for cpu inductor accuracy test in inductor CI
#124715 commented on
May 31, 2024 • 38 new comments -
[Autograd] Cond Higher-Order Operation
#126911 commented on
Jun 1, 2024 • 28 new comments -
[quant][pt2e][quantizer] Support `set_module_name_qconfig` in X86InductorQuantizer
#126044 commented on
May 31, 2024 • 28 new comments -
[Inductor] support masked vectorization for the tail_loop
#126526 commented on
May 30, 2024 • 24 new comments -
[executorch hash update] update the pinned executorch hash
#123043 commented on
Jun 1, 2024 • 21 new comments -
[vision hash update] update the pinned vision hash
#125806 commented on
Jun 1, 2024 • 21 new comments -
ROCm 6.x appears Cannot find CO in the bundle libhipblaslt.so for ISA
#127169 commented on
May 30, 2024 • 18 new comments -
[CI] add xpu test in periodic workflow
#126410 commented on
Jun 1, 2024 • 17 new comments -
Memory Tracker for tracking Module wise memory
#124688 commented on
Jun 1, 2024 • 16 new comments -
TorchInductor CPU Performance Dashboard
#93531 commented on
May 30, 2024 • 16 new comments -
[BE]: Update cudnn to 9.1.0.70
#123475 commented on
Jun 1, 2024 • 15 new comments -
[aoti] Add initial custom op support
#127034 commented on
May 31, 2024 • 15 new comments -
[dtensor] local_map UX change: keep func signature and be compatible with Tensor input
#126924 commented on
Jun 1, 2024 • 12 new comments -
[inductor] online softmax
#127011 commented on
May 28, 2024 • 12 new comments -
[inductor][cpp] bf16/fp16 gemm template computed with fp32 w/o epilogue fusion
#126068 commented on
Jun 1, 2024 • 12 new comments -
Opt model save and load
#126374 commented on
Jun 1, 2024 • 11 new comments -
[RFC] Add support for device extension autoloading
#127074 commented on
Jun 1, 2024 • 11 new comments -
torch.export.export() fails with dynamic shapes when more than one shape is dynamic
#126127 commented on
May 31, 2024 • 11 new comments -
Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept
#126376 commented on
May 31, 2024 • 10 new comments -
General MPS op coverage tracking issue
#77764 commented on
Jun 1, 2024 • 10 new comments -
T5 -small Dynamic quantization in graviton3
#127062 commented on
May 30, 2024 • 9 new comments -
Enable UFMT on test_shape_ops.py test_show_pickle.py test_sort_and_select.py
#127165 commented on
May 31, 2024 • 9 new comments -
fix constant folding with buffer mutation
#123909 commented on
May 28, 2024 • 9 new comments -
Beef up the allow_in_graph docs
#127117 commented on
Jun 1, 2024 • 9 new comments -
[torchbind] always fakify script object by default in non-strict export
#127116 commented on
Jun 1, 2024 • 9 new comments -
[Inductor][CPP] Add Min/Max with VecMask
#126841 commented on
May 27, 2024 • 8 new comments -
skip hf_T5_generate in dynamic shape test
#121129 commented on
May 31, 2024 • 8 new comments -
Set simdlen based on ATEN_CPU_CAPABILITY
#123514 commented on
May 30, 2024 • 8 new comments -
First version of AOTAutogradCache
#126791 commented on
May 31, 2024 • 8 new comments -
enable device index check for all device types
#126767 commented on
May 31, 2024 • 8 new comments -
LambdaLR has incorrect multiplicative behavior when using torch.tensor LR
#126854 commented on
May 30, 2024 • 8 new comments -
Extend Fake Tensor Caching to Symints
#126411 commented on
May 28, 2024 • 8 new comments -
CUDNN not detected in official nightly devel image. We need to use `12.4.1` upstream images
#126005 commented on
May 31, 2024 • 8 new comments -
Update _dedup_save_plans.py
#126569 commented on
Jun 1, 2024 • 8 new comments -
[ROCm] TunableOp improvements
#124362 commented on
Jun 1, 2024 • 7 new comments -
Enable UFMT format on test/test_nn.py
#126663 commented on
May 31, 2024 • 7 new comments -
[dtensor] implement scatter op with simple replication
#126713 commented on
Jun 1, 2024 • 7 new comments -
Fix decorators skipping NCCL tests
#122397 commented on
May 29, 2024 • 7 new comments -
sdp::SDPBackend::flash_attention support PrivateUse1
#126392 commented on
May 31, 2024 • 7 new comments -
[inductor] add cpp builder code. (take 2)
#125849 commented on
Jun 1, 2024 • 7 new comments -
[feature request] `quantized::linear_dynamic` on CUDA/eager, and other quantized and low-level int8 operators (matmul, gemm etc) on CUDA + integrate LLM.int8 + integrate ZeroQuant?
#69364 commented on
May 30, 2024 • 7 new comments -
[cuDNN][SDPA] Remove `TORCH_CUDNN_SDPA_ENABLED=1`, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80
#125343 commented on
Jun 1, 2024 • 7 new comments -
[dynamo][nn-modules] Trace through nn.Module dunder methods for UnspecializedNNModule
#126578 commented on
Jun 1, 2024 • 7 new comments -
Support aten operations with out tensor
#124926 commented on
May 31, 2024 • 7 new comments -
Make TraceUtils.h to be device-agnostic
#126969 commented on
Jun 1, 2024 • 6 new comments -
Move the build of AOTriton to base ROCM docker image.
#127012 commented on
Jun 1, 2024 • 6 new comments -
extend `nonzero` to int64
#125850 commented on
May 31, 2024 • 6 new comments -
[CUDNN] Remove defunct cuDNN V8 API build flag
#120006 commented on
Jun 1, 2024 • 6 new comments -
[WIP] Warn on future divergent behavior for conditional views
#126129 commented on
Jun 1, 2024 • 6 new comments -
Feature: Implement support for `cudnn_batch_norm_out` kernel to replace the autogen approach.
#123020 commented on
May 29, 2024 • 6 new comments -
[Submodule] Remove glog dependency
#126768 commented on
Jun 1, 2024 • 6 new comments -
PT2 Inductor ComboKernels
#124969 commented on
Jun 1, 2024 • 6 new comments -
[compiled autograd][cudagraphs] Inputs runtime wrapper to move cpu scalars to cuda
#125382 commented on
Jun 1, 2024 • 5 new comments -
Introduce Inductor passes to micro-pipeline all-gather-matmul and matmul-reduce-scatter in certain cases
#126598 commented on
Jun 1, 2024 • 5 new comments -
CUDA 12.4 CI Inductor Issues
#126692 commented on
Jun 1, 2024 • 5 new comments -
[DTensor] `clip_grad_norm_` follow-ups
#121020 commented on
May 31, 2024 • 5 new comments -
Compiling extension on pytorch nightly image is broken
#125879 commented on
Jun 1, 2024 • 5 new comments -
PyTorch's Execution Trace Observer Producing Invalid JSON
#126703 commented on
Jun 1, 2024 • 5 new comments -
refine fp32 precision api
#125888 commented on
May 30, 2024 • 5 new comments -
[Inductor][CPP] Enable Local Buffer for Outer loop fusion
#126967 commented on
May 31, 2024 • 5 new comments -
[ONNX] Skip assertion nodes
#126889 commented on
May 29, 2024 • 5 new comments -
[BE]: Update NCCL submodule to 2.21.5
#124014 commented on
May 31, 2024 • 5 new comments -
[Split Build] Test split build in CI
#126699 commented on
Jun 1, 2024 • 5 new comments -
Pytorch 2.0 installation tutorial does not work under Macbook
#96073 commented on
May 30, 2024 • 4 new comments -
[RFC] Per-Parameter-Sharding FSDP
#114299 commented on
May 26, 2024 • 4 new comments -
[dynamo] Automatically convert loop bodies to function calls
#113538 commented on
May 31, 2024 • 4 new comments -
MPS device can train or evaluate models producing unacceptable output due to "fast math" optimization
#84936 commented on
May 31, 2024 • 4 new comments -
PyTorch DataLoader improvements for Iterable Dataset
#127072 commented on
Jun 1, 2024 • 4 new comments -
Refresh OpOverloadPacket if a new OpOverload gets added
#126863 commented on
May 29, 2024 • 4 new comments -
dynamo doesn't support `__torch_function__` on non-tensor classes
#127174 commented on
May 31, 2024 • 4 new comments -
IPEX as TorchDynamo Backend Performance Dashboard
#101273 commented on
May 27, 2024 • 4 new comments -
allow to use bf16 as fp32 internal precision for mkldnn conv
#126050 commented on
May 31, 2024 • 4 new comments -
add core tag to linalg_vector_norm
#125789 commented on
May 31, 2024 • 4 new comments -
[torchbind] support torch.compile with aot_eager backend
#127114 commented on
May 31, 2024 • 4 new comments -
Setting static arguments in `torch.compile()`
#121299 commented on
May 29, 2024 • 4 new comments -
`torch.compiler.allow_in_graph` does not create a `call_module` op in fx.Graph in torch 2.3.0
#126566 commented on
May 29, 2024 • 4 new comments -
Add aten._unsafe_masked_index
#116491 commented on
May 30, 2024 • 4 new comments -
[dtensor] standardize multi mesh-dim strategy with utils
#126712 commented on
Jun 1, 2024 • 3 new comments -
[xla hash update] update the pinned xla hash
#126672 commented on
May 27, 2024 • 3 new comments -
fix multiprocessing.spawn doc issue
#126902 commented on
May 28, 2024 • 3 new comments -
Triton Matrix Multiplication example invalid results (return zeros) on Volta
#127157 commented on
May 30, 2024 • 3 new comments -
Move MKLDNN Specific IR to Separate File
#126504 commented on
May 27, 2024 • 3 new comments -
[inductor] `aten.index_put_` runtime shape mismatch on H100 but not on A100
#126614 commented on
May 30, 2024 • 3 new comments -
make_graphed_callables fails when Module or function does not contain parameters
#124582 commented on
May 28, 2024 • 3 new comments -
[dynamo][numpy] Add unsigned integer dtypes
#125717 commented on
May 31, 2024 • 3 new comments -
Invalidate StorageImpl instances when tensor is overwritten with cudagraphs
#125264 commented on
Jun 1, 2024 • 3 new comments -
delete inductor config.trace.compile_profile
#127143 commented on
May 29, 2024 • 3 new comments -
Add unfold support for MaskedTensor
#125262 commented on
May 30, 2024 • 3 new comments -
add uuid in cudaDeviceProperties
#125083 commented on
May 29, 2024 • 3 new comments -
move label_to_label.yml to an issue?
#126714 commented on
May 29, 2024 • 3 new comments -
[dynamo] Trace through invalid bool tensor operations properly
#127003 commented on
Jun 1, 2024 • 3 new comments -
[Dynamo] Log backward graph compilation metrics
#126629 commented on
May 31, 2024 • 3 new comments -
[NJT] Allow construction of NJT within graph using offsets from inputs
#124624 commented on
May 28, 2024 • 3 new comments -
[v2.3.1] Release Tracker
#125425 commented on
May 28, 2024 • 3 new comments -
ROCm loses some supported GPUs by requiring hipblaslt
#119081 commented on
May 31, 2024 • 3 new comments -
`@functools.wraps` graph breaks in many cases where we should be able to handle it
#123365 commented on
May 28, 2024 • 3 new comments -
Made some minor improvements to flexattention perf + added more autotune configs
#126811 commented on
May 28, 2024 • 3 new comments -
[Inductor][CPP] Add ne with VecMask
#126940 commented on
May 27, 2024 • 3 new comments -
s390x: build s390x binaries on each pull request
#125399 commented on
May 29, 2024 • 2 new comments -
Allow multiple cudagraph recordings per compiled graph
#126822 commented on
Jun 1, 2024 • 2 new comments -
inplace parameter in dropouts should function as expected regardless of the value of training (or train) parameter
#126755 commented on
May 31, 2024 • 2 new comments -
[XPU] Add xpu support of `make triton`
#126513 commented on
May 29, 2024 • 2 new comments -
[triton hash update] update the pinned triton hash
#115529 commented on
May 27, 2024 • 2 new comments -
Fused Linear and Cross-Entropy Loss `torch.nn.functional.linear_cross_entropy`
#124480 commented on
May 31, 2024 • 2 new comments -
torch._dynamo.exc.Unsupported: call_function args: UserDefinedObjectVariable(EasyDict)
#120219 commented on
Jun 1, 2024 • 2 new comments -
[Bazel] Build shared libraries to reduce total binary size
#126031 commented on
May 30, 2024 • 2 new comments -
[nn.functional] Set default value of Optional params of batch_norm to None
#122717 commented on
May 28, 2024 • 2 new comments -
[inductor][cpu]detectron2_maskrcnn_r_101_fpn detectron2_maskrcnn_r_50_c4 accuracy check failed when freezing flag is on.
#127073 commented on
Jun 1, 2024 • 2 new comments -
Disable atomic_add fallback for cpu in index_put lowering
#122873 commented on
May 27, 2024 • 2 new comments -
Add Ability to Skip Hipification When Building CUDA Extension
#127015 commented on
May 28, 2024 • 2 new comments -
[SDPA/memeff] Backport changes from xFormers to PT
#127090 commented on
May 31, 2024 • 2 new comments -
xpu: implement grid_sample op for XPU (fallback to CPU not possible for fp16 and bf16)
#127002 commented on
May 30, 2024 • 2 new comments -
`FlopCounterMode` returns 0 when inference mode is on during forward propagation.
#126268 commented on
May 30, 2024 • 2 new comments -
[PT2E Quantization] `prepare_pt2e` produces inconsistent data types for primitive int
#127076 commented on
May 30, 2024 • 2 new comments -
Move loop ordering after fusion
#126255 commented on
May 30, 2024 • 2 new comments -
BUG: `torch.pow(-0.0)` wrongly returns `-0.0` (should be +0.0)
#127163 commented on
May 28, 2024 • 2 new comments -
libtorch mimalloc Debug Warning with WinError
#125840 commented on
May 30, 2024 • 2 new comments -
[VIT Model] [perf Degradation] [X86] [ARM] torch.compile + weight prepacking results in perf degradation for VIT Transformer model
#126391 commented on
May 28, 2024 • 2 new comments -
[feature][cudagraph] API to clear a bad recording
#127147 commented on
May 29, 2024 • 2 new comments -
[inductor][cpu]GPT2ForSequenceClassification AMP static/dynamic shape default/cpp wrapper single thread accuracy crash
#123503 commented on
May 30, 2024 • 2 new comments -
NCCL watchdog thread terminated with exception
#113128 commented on
May 30, 2024 • 2 new comments -
pytorch is built without _GLIBCXX_USE_CXX11_ABI and can cause std::regex crashes (probably)
#50779 commented on
May 26, 2024 • 2 new comments -
fixed torch.where and torch.nonzero issue
#123024 commented on
May 28, 2024 • 1 new comment -
Parts of https://download.pytorch.org not updated anymore, broken links and missing versions
#121837 commented on
May 29, 2024 • 1 new comment -
Register lowering for `_foreach_norm.Scalar`
#123051 commented on
May 30, 2024 • 1 new comment -
[BASE_MODULE] removing base_module for glow/fb/test:test_utils
#123092 commented on
May 31, 2024 • 1 new comment -
[TP] Add tests for wildcard support
#123101 commented on
May 31, 2024 • 1 new comment -
[Pytorch] Replace nn.module call to `train()` for executorch models (#757)
#122532 commented on
Jun 1, 2024 • 1 new comment -
add testing support for Vectorized<Half>
#123132 commented on
Jun 1, 2024 • 1 new comment -
Add quantized.linear_unpacked_dynamic_fp16
#122509 commented on
May 26, 2024 • 1 new comment -
[WIP] Add comm fusion pass
#122505 commented on
May 26, 2024 • 1 new comment -
dynamo_export fails for NVIDIA NeMo framework classes decorated with wrapt package
#125462 commented on
May 29, 2024 • 1 new comment -
torch._inductor.config.max_autotune_gemm_backends = "TRITON" crashes with Convolution layer
#125728 commented on
May 29, 2024 • 1 new comment -
Add DtypeContext
#122481 commented on
May 28, 2024 • 1 new comment -
Torch 2.1 compile + FSDP (mixed precision) + LlamaForCausalLM: `RuntimeError: attempting to assign a gradient with dtype 'c10::BFloat16' to a tensor with dtype 'float'.`
#111317 commented on
May 29, 2024 • 1 new comment -
[export] Restore original placeholder names
#122452 commented on
May 27, 2024 • 1 new comment -
segfault when registering op with mismatched schema
#127102 commented on
May 29, 2024 • 1 new comment -
Ensure profiler record function callback always increments "record function id".
#123755 commented on
May 27, 2024 • 1 new comment -
[TEST] assume no aligned inputs
#122159 commented on
May 28, 2024 • 1 new comment -
Fix edge cases for gather in inductor
#126893 commented on
May 29, 2024 • 1 new comment -
[example] Example of properly optimizing mm=>layernorm=>gelu
#122687 commented on
May 25, 2024 • 1 new comment -
Modified pointless_convert pass to only apply if it's removing truly useless conversion
#122689 commented on
May 27, 2024 • 1 new comment -
[fx] Preserve Fx graph node order in partitioner across runs
#122696 commented on
May 27, 2024 • 1 new comment -
[wip][export] Add symint input support
#122712 commented on
May 25, 2024 • 1 new comment -
[POC][FSDP2] Ran parameter post acc grad hooks manually
#122719 commented on
May 26, 2024 • 1 new comment -
Allow flight recorder to call elasped_time again.
#122731 commented on
May 27, 2024 • 1 new comment -
[DRAFT] add int4_mm support for ARM using SIMDe
#122764 commented on
May 26, 2024 • 1 new comment -
add testing support for Vectorized<Half>
#122798 commented on
May 27, 2024 • 1 new comment -
[Inductor][Optimus] Fix group batch fusion for AFOC
#122839 commented on
May 27, 2024 • 1 new comment -
[ATen-Vulkan][EZ] Small fixes: fix gpu size calculation and Half scalartype ctype mapping
#122861 commented on
Jun 1, 2024 • 1 new comment -
[dynamo/inductor] guard on whether a tensor is a view, assume non-views are aligned
#122867 commented on
May 27, 2024 • 1 new comment -
Libtorch build for ROCM error: “aten/src/THH” not exist
#126640 commented on
May 29, 2024 • 1 new comment -
Optimize upsample_bilinear2d on channels last format
#122879 commented on
May 31, 2024 • 1 new comment -
[inductor][Autotune] Add matrix_instr_nonkdim to triton_meta (#122852)
#122906 commented on
May 27, 2024 • 1 new comment -
[Quant][PT2E] enable qlinear post op fusion for dynamic quant & qat
#122667 commented on
May 28, 2024 • 1 new comment -
[ROCm] skip some SDPA tests
#122914 commented on
May 29, 2024 • 1 new comment -
[AOTInductor] Support quantized linear on CPU with fbgemm
#122939 commented on
May 30, 2024 • 1 new comment -
Fix fuse linear bn for affine=False
#122537 commented on
Jun 1, 2024 • 1 new comment -
[dynamo] Dont run Dynamo on assert* functions
#122993 commented on
May 29, 2024 • 1 new comment -
[dynamo] Detect functionalized functions with HOPs with mutations at tracing time
#126988 commented on
May 29, 2024 • 1 new comment -
torch.compile does not work since 2.2.1 on MacOS for some models
#124497 commented on
May 27, 2024 • 1 new comment -
[checkpoint] Clean up selective activation checkpoint and make public
#125795 commented on
May 31, 2024 • 1 new comment -
JAX + PyTorch produces `OMP: Error #13: Assertion failure at kmp_affinity.cpp(532)`
#97635 commented on
May 27, 2024 • 1 new comment -
Run CompositeImplicit on torch fns during dynamo tracing
#125808 commented on
Jun 1, 2024 • 1 new comment -
Use sleef on macOS Apple silicon by default
#126509 commented on
May 28, 2024 • 1 new comment -
_convert_input_to_fake doesn't work when there are no inputs
#126770 commented on
May 27, 2024 • 1 new comment -
[NT] Implementing Multi-Head Attention with NestedTensors
#125214 commented on
May 27, 2024 • 1 new comment -
❓Different results between normal batching and `vmap` while using lower precision (e.g., bfloat16)
#125534 commented on
May 27, 2024 • 1 new comment -
Support for stereo audio data in from torch.utils.tensorboard.SummaryWriter
#56360 commented on
May 27, 2024 • 1 new comment -
Remove redundant device guard in Resize.h
#126498 commented on
May 31, 2024 • 1 new comment -
Box constraints for optimizers
#22281 commented on
May 27, 2024 • 1 new comment -
NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":830, please report a bug to PyTorch.
#122068 commented on
May 27, 2024 • 1 new comment -
TorchDynamo ONNX Export does not work as expected with masking (ScatterElements)
#126856 commented on
May 27, 2024 • 1 new comment -
Dynamo-based ONNX Export: Failed to produce a graph during tracing as no tensor operations were found.
#123973 commented on
May 26, 2024 • 1 new comment -
Update triton pin to improve throughput w/ assert
#126098 commented on
May 30, 2024 • 1 new comment -
Triton Error [CUDA]: invalid device context when autograd.backward a triton kernel
#124565 commented on
May 26, 2024 • 1 new comment -
Specify version
#105460 commented on
May 26, 2024 • 1 new comment -
[inductor][codegen] Codegen constexpr globals and constexpr annotated globals correctly.
#126195 commented on
May 30, 2024 • 1 new comment -
[aoti] Add support for more custom op input/output types
#126215 commented on
May 28, 2024 • 1 new comment -
[Traceable FSDP][Compiled Autograd] Add queue_callback() support
#126366 commented on
May 29, 2024 • 1 new comment -
Added memory budget to partitioner
#126320 commented on
Jun 1, 2024 • 1 new comment -
Optimize Vectorized<float> exp() with neon simd instructions
#126612 commented on
May 29, 2024 • 1 new comment -
[torchbind] remove test cases that don't fakify script objects
#127113 commented on
May 31, 2024 • 1 new comment -
DISABLED test_large_sampler_indices (__main__.TestDataLoaderPersistentWorkers)
#117784 commented on
May 28, 2024 • 1 new comment -
[dynamo] Handle inplace op aliasing errors
#126474 commented on
May 28, 2024 • 1 new comment -
torch.compile crash - Aborted exit code 134
#125804 commented on
May 28, 2024 • 1 new comment -
Many tests under test/distributed/elastic not running in OSS CI
#124317 commented on
May 28, 2024 • 1 new comment -
Tracker: Slow gradcheck failures possibly indicating incorrect gradients
#80411 commented on
May 28, 2024 • 1 new comment -
RuntimeError: "_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'
#127176 commented on
May 28, 2024 • 1 new comment -
`torch.einsum` docs don't mention that `opt_einsum` must be installed separately
#127109 commented on
May 28, 2024 • 1 new comment -
Modularize aten parameter parser and checker
#125308 commented on
May 31, 2024 • 1 new comment -
[dynamo] Improve support for hasattr for custom __getattr__
#125340 commented on
Jun 1, 2024 • 1 new comment -
A huge difference between the results of torch.round() on the GPU compared to its results on the CPU and other DL libraries
#126975 commented on
May 28, 2024 • 1 new comment -
torch_dispatch mode silent incorrectness with torch.compile
#115653 commented on
May 28, 2024 • 1 new comment -
Fails to compile with nvidia-cuda-toolkit-12.4.0
#122169 commented on
May 28, 2024 • 1 new comment -
[inductor][cpp] support bf16/fp16 gemm template epilogue fusion
#126545 commented on
Jun 1, 2024 • 1 new comment -
add wait_tensor to ops to skip in DCE pass
#126541 commented on
May 31, 2024 • 1 new comment -
aten.bernoulli.p is missing in core aten IR opset but does not get decomposed
#105519 commented on
May 28, 2024 • 1 new comment -
CUDA Extension Bug for Pytorch 2.2.0
#118842 commented on
May 27, 2024 • 1 new comment -
[1/N] Dynamic Shape: To enable export.aot_compile to switch between static shape and dynamic shape
#126517 commented on
May 31, 2024 • 1 new comment -
Segmentation Fault in `torch.nn.Conv1d` starting from torch 2.2.0
#121222 commented on
May 27, 2024 • 1 new comment -
dynamo test: remove the cuda hardcoding and detect device dynamically
#125761 commented on
May 31, 2024 • 1 new comment -
QAT using pytorch-quantization cause accuracy loss after exporting to onnx
#120166 commented on
May 31, 2024 • 1 new comment -
[inductor][cpu]tnt_s_patch16_224 and functorch_dp_cifar10 fP32 static default wrapper performance regression
#119178 commented on
May 30, 2024 • 1 new comment -
[MPS] Improve the performance of torch.linear()
#91737 commented on
May 31, 2024 • 1 new comment -
Why is AvgPool2D taking longer than Conv2D for the same input?
#93188 commented on
May 31, 2024 • 1 new comment -
[inductor][cpu]speech_transformer AMP single/multiple thread static/dynamic shape CPP/default wrapper performance regression in 2024-05-12 nightly release
#126274 commented on
May 30, 2024 • 1 new comment -
UNSTABLE inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (inductor_torchbench_cpu_smoketest_perf)
#126993 commented on
May 30, 2024 • 1 new comment -
[Split Build] Test split build in pull CI workflow
#126813 commented on
Jun 1, 2024 • 1 new comment -
Cudnn 9.1.1 is out!
#119400 commented on
May 31, 2024 • 1 new comment -
[DCP] `set_model_state_dict` errors on compiled module with non-persistent buffer
#122792 commented on
May 31, 2024 • 1 new comment -
aten::nonzero calls taking a huge amount of time when using MPS backend vs CPU
#124850 commented on
May 31, 2024 • 1 new comment -
Support backward hook optimizers in FSDP
#98419 commented on
May 30, 2024 • 1 new comment -
Investigate torch.compile Windows support.
#122094 commented on
May 31, 2024 • 1 new comment -
Previous version not found
#107611 commented on
May 31, 2024 • 1 new comment -
FSDP does not work on GLOO backend
#74041 commented on
May 31, 2024 • 1 new comment -
[bug] Dynamo graph break when using Python module `heapq` (manipulates with `list`s), although succeeds when placing `heapq.py` near the test script
#106885 commented on
May 31, 2024 • 1 new comment -
torch.compile error
#124044 commented on
May 31, 2024 • 1 new comment -
xpu: provide a way to debug explicit CPU fallback
#126488 commented on
May 30, 2024 • 1 new comment -
xpu: python hangs on exit after check for xpu on multi-dev system
#126259 commented on
May 30, 2024 • 1 new comment -
torch.compile generates wrong code on CPU and compiled code replaces original function
#126848 commented on
May 30, 2024 • 1 new comment -
tensordict functional calls with nn.Module silently gives the wrong (non-functional) result
#127173 commented on
May 30, 2024 • 1 new comment -
ImportError `undefined symbol: iJIT_NotifyEvent` encountered when MKL 2024.1 is installed.
#123097 commented on
May 30, 2024 • 1 new comment -
Pytorch dataloader not loading first-available data with multiple workers
#105203 commented on
May 30, 2024 • 1 new comment -
[ONNX] view(dtype=dtype) is not supported by both onnx.export and onnx.dynamo_export
#126921 commented on
May 30, 2024 • 1 new comment -
Lowering after pointwise cat can lead to uncontiguous memory accesses
#124002 commented on
May 30, 2024 • 1 new comment -
The doc of `diagonal()` doesn't have such an explanation
#126827 commented on
May 30, 2024 • 1 new comment -
The dynamic compilation does not handle composite shapes with multiple dimensions very effectively
#127162 commented on
May 30, 2024 • 1 new comment -
MultiheadAttention returns NaNs when need_weights=False for long sequences with a mask that ignores old tokens
#127055 commented on
May 30, 2024 • 1 new comment -
[RFC] Enable PyTorch XPU on Native Windows on Intel GPUs
#126719 commented on
May 30, 2024 • 1 new comment -
[dynamo] Dynamic slicing on data-dependent value is not supported (regression from legacy ONNX)
#127154 commented on
May 31, 2024 • 1 new comment -
Multiple Inputs/Outputs with torch.onnx.dynamo_export
#127128 commented on
May 31, 2024 • 1 new comment -
[ONNX] pad_sequence() is not exportable, with neither legacy onnx.export nor with dynamo_export
#127153 commented on
May 31, 2024 • 1 new comment -
[feature request]: Update max onnx opset to 21 for onnxruntime==1.18 compatibility
#127167 commented on
May 31, 2024 • 1 new comment -
[WIP] fix calls to `super().__getattr__` nn.Module in dynamo
#126875 commented on
May 31, 2024 • 1 new comment -
UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR
#126605 commented on
May 30, 2024 • 1 new comment -
[torch.compile]: Enhanced Error Reporting and Performance Canary Mode
#126644 commented on
May 31, 2024 • 1 new comment -
Add parameter "half_pixel_center =False" to the Bilinear function
#48389 commented on
May 30, 2024 • 1 new comment -
FSDP `use_orig_params` + full sharding results in missing parameters in gathered state dict
#126310 commented on
May 31, 2024 • 1 new comment -
DISABLED test_comprehensive_special_bessel_y1_cuda_int32 (__main__.TestInductorOpInfoCUDA)
#127080 commented on
May 29, 2024 • 1 new comment -
[fx] Preserve Fx graph node order in partitioner across runs
#115621 commented on
May 29, 2024 • 1 new comment -
test_python_shard.bat -> .sh
#117194 commented on
May 31, 2024 • 1 new comment -
RuntimeError in torch.istft with center=False: Window Overlap Add Issue
#118507 commented on
May 29, 2024 • 1 new comment -
Make watchdog sleep interval auto-adaptive
#118293 commented on
May 26, 2024 • 1 new comment -
[optim] Add support for complex tensors in SparseAdam
#118653 commented on
Jun 1, 2024 • 1 new comment -
Fix jagged NT softmax semantics
#119459 commented on
May 31, 2024 • 1 new comment -
Add decompositions for copy variants of view ops
#119889 commented on
May 30, 2024 • 1 new comment -
[Caffe2]Remove Caffe2 scripts and benchmarks
#126747 commented on
May 30, 2024 • 1 new comment -
Change ATEN generator argument type to const std::optional<Generator>&
#120076 commented on
May 25, 2024 • 1 new comment -
Delete lazy ddp optimizer
#120727 commented on
May 26, 2024 • 1 new comment -
turned on matrix-multiplication => matrix-vector multiplication always on if reduction-dim is contiguous
#120954 commented on
May 27, 2024 • 1 new comment -
DISABLED test_circular_dependencies (__main__.TestImports)
#110040 commented on
May 29, 2024 • 1 new comment -
fix graph deepcopy to also copy `_tracer_extras`
#121171 commented on
May 28, 2024 • 1 new comment -
[Inductor] Add prologue fusion
#121211 commented on
Jun 1, 2024 • 1 new comment -
[NJT] Actually inline NT torch function during dynamo
#121445 commented on
Jun 1, 2024 • 1 new comment -
[DDP] Bucket handling: make first bucket size equal to bucket_cap_mb if it was set
#121640 commented on
May 28, 2024 • 1 new comment -
[no-ci] test pr
#122006 commented on
May 28, 2024 • 1 new comment -
[feature request] torch.kthvalue to support a new argument largest
#29398 commented on
May 30, 2024 • 1 new comment -
[dynamo] Fake tensor impl for Tensor.add_ not checking for errors
#127049 commented on
Jun 1, 2024 • 1 new comment -
RuntimeError: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "/opt/pytorch/pytorch/c10/cuda/CUDACachingAllocator.cpp":830, please report a bug to PyTorch.
#123834 commented on
May 29, 2024 • 1 new comment -
Support tracing through _get_current_dispatch_mode_stack
#126789 commented on
May 28, 2024 • 1 new comment -
Compile with non-default mode + triton kernel fails
#126864 commented on
May 29, 2024 • 1 new comment -
[benchmark] Rename the count field FunctionCount
#105471 commented on
May 29, 2024 • 1 new comment -
Switch to CXX11 ABI
#126778 commented on
May 31, 2024 • 0 new comments -
some logging changes
#127005 commented on
May 29, 2024 • 0 new comments -
[Split Build] Use single command to build both wheels
#126590 commented on
May 29, 2024 • 0 new comments -
[halide-backend] Initial implementation of HalideKernel and HalideScheduling
#126417 commented on
Jun 1, 2024 • 0 new comments -
[do-not-review][aot] explicitly pass number of static inputs (buffer/params) to backend
#127006 commented on
Jun 1, 2024 • 0 new comments -
Allow overriding per-dim group options via _MeshEnv.set_dim_group_options
#126599 commented on
Jun 1, 2024 • 0 new comments -
[Inductor] support masked vectorization for the tail_loop
#127166 commented on
May 26, 2024 • 0 new comments -
Enable LSAN
#127171 commented on
May 29, 2024 • 0 new comments -
[Split Build] Test non-linux manywheel builds
#126929 commented on
May 29, 2024 • 0 new comments -
[Split Build][WIP] Change global env variables to cmake args
#126930 commented on
May 29, 2024 • 0 new comments -
[inductor] [for benchmarking] Add a subprocess-based parallel compile
#127088 commented on
May 27, 2024 • 0 new comments -
Enable optimized dynamic quantization on aarch64
#126687 commented on
May 30, 2024 • 0 new comments -
Set NO_DELAY flag in TCPStore
#127042 commented on
Jun 1, 2024 • 0 new comments -
[2/N] Dynamic Shape: Enable dynamic shape support for aoti_eager
#126883 commented on
May 31, 2024 • 0 new comments -
[FSDP2] Added `share_comm_ctx`
#127032 commented on
May 28, 2024 • 0 new comments -
avoid size() dispatch on FunctionalTensor
#126784 commented on
May 31, 2024 • 0 new comments -
[inductor] Enable subprocess-based parallel compile as the default
#126817 commented on
May 29, 2024 • 0 new comments -
Collect static parameter metadata in aot
#126820 commented on
May 30, 2024 • 0 new comments -
Remove unused arg to GraphLowering
#126821 commented on
May 30, 2024 • 0 new comments -
[TD] Test removal on sm86
#127131 commented on
May 29, 2024 • 0 new comments -
[wip, dynamo] trace through FunctionalTensor
#126936 commented on
May 30, 2024 • 0 new comments -
Default traceable subclasses to use swap_tensors path for load_state_dict
#126788 commented on
May 30, 2024 • 0 new comments -
Enable XPU operator codegen via backend whitelist
#126977 commented on
May 29, 2024 • 0 new comments -
reset dynamo cache before each test
#126586 commented on
May 25, 2024 • 0 new comments -
[export] add nonstrict, retrace, serdes patching for _trace._export tests
#126810 commented on
May 28, 2024 • 0 new comments -
Initial commit of flight recorder analyzer
#126726 commented on
Jun 1, 2024 • 0 new comments -
Add private escape hatches to fall back to pre-swap tensors behavior
#126984 commented on
May 30, 2024 • 0 new comments -
[inductor][cpp] add vectorization support for double
#126858 commented on
May 29, 2024 • 0 new comments -
[CI] Ensure inductor/test_cpu_cpp_wrapper is actually run in inductor_cpp_wrapper_abi_compatible
#126717 commented on
Jun 1, 2024 • 0 new comments -
Avoid accessing storage in wrapper tensor
#126878 commented on
May 31, 2024 • 0 new comments -
HealthcheckNCCL
#127044 commented on
May 29, 2024 • 0 new comments -
Save backward graphs lazily to cache
#126999 commented on
May 31, 2024 • 0 new comments -
[RFC] Intel GPU Upstreaming
#114723 commented on
May 27, 2024 • 0 new comments -
Faster Pytorch dequantize() + matmul for quantized models
#115985 commented on
May 29, 2024 • 0 new comments -
[MPS] F.conv1d and F.conv2d produce incorrect gradients when minibatch >= 2^16
#96225 commented on
May 29, 2024 • 0 new comments -
DISABLED test_bmm_multithreaded (__main__.TestTorch)
#125240 commented on
May 29, 2024 • 0 new comments -
DISABLED test_memory_format_type_cuda (__main__.TestTorchDeviceTypeCUDA)
#126954 commented on
May 29, 2024 • 0 new comments -
Choose better configs for `tuned_mixed_mm`
#127056 commented on
May 29, 2024 • 0 new comments -
xpu: support torch.utils.data.DataLoader(pin_memory_device='xpu')
#126491 commented on
May 30, 2024 • 0 new comments -
xpu: can't build XPU backend without sourcing oneAPI environment variables (/opt/intel/oneapi/setvars.sh)
#127008 commented on
May 30, 2024 • 0 new comments -
When testing the scalar version, test_open_device_registration will fail
#126372 commented on
May 30, 2024 • 0 new comments -
When testing the scalar version, test_AllenaiLongformerBase_repro will fail
#126262 commented on
May 30, 2024 • 0 new comments -
[Inductor] Kill Mutation Layout
#118570 commented on
May 30, 2024 • 0 new comments -
Lowering for the Average Pooling 3D backward operation
#127101 commented on
May 30, 2024 • 0 new comments -
UNSTABLE inductor / cuda12.1-py3.10-gcc9-sm86 / test (inductor_timm)
#126884 commented on
May 30, 2024 • 0 new comments -
Invalid Reference to Class
#99107 commented on
May 30, 2024 • 0 new comments -
RReLU doc doesn't specify the eval mode behaving just like LeakyReLU
#82677 commented on
May 30, 2024 • 0 new comments -
[ONNX] Do not include concrete tensors when we add initializers
#127140 commented on
May 31, 2024 • 0 new comments -
[ONNX] Support external tensors in ONNXProgram.save()
#127142 commented on
May 31, 2024 • 0 new comments -
[ONNX] ExportedProgram weight serialization support
#127138 commented on
May 31, 2024 • 0 new comments -
[ONNX] Move the graph building logic
#127139 commented on
May 31, 2024 • 0 new comments -
[AOTI] Using `AOTI_TORCH_CHECK` will cause performance drop on several models compared with using `TORCH_CHECK`
#126665 commented on
May 31, 2024 • 0 new comments -
Cpp-wrapper mode issue tracker
#117363 commented on
May 31, 2024 • 0 new comments -
torchrun fails to run on Windows 11
#108602 commented on
May 31, 2024 • 0 new comments -
What is the processing principle when the complex64 input tensor contains nan or inf for addition?
#127075 commented on
May 27, 2024 • 0 new comments -
Label tracking meta-issue (edit me to get automatically CC'ed on issues! cc bot)
#24422 commented on
May 28, 2024 • 0 new comments -
randn generates different output for 4x4 tensor size sliced to match shape of direct 2x4 or 4x2 and compare output
#127164 commented on
May 28, 2024 • 0 new comments -
UNSTABLE pull / linux-focal-cuda12.4-py3.10-gcc9-sm86 / build
#127104 commented on
May 28, 2024 • 0 new comments -
function signature for multiprocessing.spawn is multiprocessing.spawn.spawn
#126899 commented on
May 28, 2024 • 0 new comments -
Segmentation error for torch==2.2.1 on MacOs
#121101 commented on
May 28, 2024 • 0 new comments -
[DSD] keep 'initial_lr' in `torch.distributed.checkpoint.state_dict.set_optimizer_state_dict`
#126948 commented on
May 28, 2024 • 0 new comments -
Python 3.12 CPU Build system cannot find MKL libraries
#119557 commented on
May 28, 2024 • 0 new comments -
Unexpected MYPY linter errors on CI
#126361 commented on
May 28, 2024 • 0 new comments -
Make the `sccache` cache easily available to all pytorch contributors in readonly mode
#125297 commented on
May 28, 2024 • 0 new comments -
[RFC] Autoload Device Extension
#122468 commented on
May 29, 2024 • 0 new comments -
[RFC] Raising minimal glibc support to: glibc2_28 . Deprecation support for Amazon Linux 2 support for PyTorch Release 2.5
#126551 commented on
May 29, 2024 • 0 new comments -
Add warning messages to provide info about expected performance improvement using cuda for a specific model
#126874 commented on
May 29, 2024 • 0 new comments -
Support for no-frills FP8 matmuls
#123761 commented on
May 29, 2024 • 0 new comments -
Inaccurate filename due to test class metaprogramming that doesn't set __file__ and __name__
#125467 commented on
May 29, 2024 • 0 new comments -
torch.cuda.BoolTensor uses 8 bits per element, not 1 bit as reported by element_size()
#41571 commented on
May 29, 2024 • 0 new comments -
[question] How hard would it be to implement 4-bit precision training?
#49298 commented on
May 29, 2024 • 0 new comments -
No factory functions for strided quantized tensors
#74540 commented on
May 29, 2024 • 0 new comments -
[discussion, idea] Batched, vectorized base64 decoding / encoding + maybe RLE decoding / encoding
#90560 commented on
May 29, 2024 • 0 new comments -
[feature request] Specialized memory layouts and wide blocked/tiled dtypes for cublasLt/onednn: e.g. torch.float16x32 / torch.int8x32 / torch.bits1x512 (akin to torch.quint2x4)
#104702 commented on
May 29, 2024 • 0 new comments -
The performance of multiplication of two matrices is different between Windows and Linux
#26345 commented on
May 31, 2024 • 0 new comments -
[WIP] [Inductor Intel GPU backend Upstream] Reuse inductor test for Intel GPU (PART 2)
#124147 commented on
May 31, 2024 • 0 new comments -
[DO NOT MERGE] Test new ROCm CI nodes
#124424 commented on
May 31, 2024 • 0 new comments -
[dynamo] Support ndarray.dtype attribute access
#124490 commented on
May 31, 2024 • 0 new comments -
While loop autograd
#124573 commented on
Jun 1, 2024 • 0 new comments -
[WIP][Inductor Intel GPU backend Upstream] Reuse inductor test for Intel GPU (PART 3)
#124702 commented on
May 31, 2024 • 0 new comments -
[WIP][Inductor] Update Intel GPU Triton commit pin.
#124842 commented on
May 31, 2024 • 0 new comments -
Add Efficient Attention support on ROCM
#124885 commented on
May 28, 2024 • 0 new comments -
S390x ci periodic tests
#125401 commented on
May 30, 2024 • 0 new comments -
[Inductor][ROCm] Composable Kernel backend for Inductor
#125453 commented on
Jun 1, 2024 • 0 new comments -
Uses memory pools for mixing CUDA allocators
#125722 commented on
May 29, 2024 • 0 new comments -
[AOTI][not for review] Test cpp_wrapper mode
#125733 commented on
May 29, 2024 • 0 new comments -
[CI]enable AMP accuracy test for inductor on SPR CPU
#125748 commented on
May 29, 2024 • 0 new comments -
Separate AOTI Eager utils as a single file
#125819 commented on
May 31, 2024 • 0 new comments -
[3/N] Non-Tensor: Support string parameter for aten operations
#125831 commented on
May 31, 2024 • 0 new comments -
[4/N] Non-Tensor: Support layout, device and dtype for aten operations
#125897 commented on
May 31, 2024 • 0 new comments -
Fix tensor subclass + dynamic shapes in torch.compile + aot autograd
#125941 commented on
May 31, 2024 • 0 new comments -
Remove deprecated _aminmax operator
#125995 commented on
May 28, 2024 • 0 new comments -
allow to use bf16 as fp32 internal precision for mkldnn conv backward
#126054 commented on
May 30, 2024 • 0 new comments -
Enable UFMT format on test/quantization
#126152 commented on
May 27, 2024 • 0 new comments -
[wip][inductor] move loop ordering after fusion
#126254 commented on
May 29, 2024 • 0 new comments -
[DONT MERGE][dynamo] Turn on inlining of inbuilt nn modules
#126304 commented on
Jun 1, 2024 • 0 new comments -
Support Delay Loading of c10.dll when using libtorch as a third-party library.
#105058 commented on
May 31, 2024 • 0 new comments -
Cost & performance estimation for Windows Arm64 compilation
#92302 commented on
May 31, 2024 • 0 new comments -
torch.nn.AdaptiveAvgPool2d lacks checking of input dimension
#126673 commented on
May 31, 2024 • 0 new comments -
Broken Link and unfinished sentence in Frequently Asked Questions
#126367 commented on
May 31, 2024 • 0 new comments -
[Feature request] Exclusive prefix sum, `torch.cumsum(input, dim=0, exclusive=True)`
#76191 commented on
May 31, 2024 • 0 new comments -
[BE] wrap deprecated function/class with `typing_extensions.deprecated` for better IDE integration
#126888 commented on
Jun 1, 2024 • 0 new comments -
Automated submodule update: kineto
#106149 commented on
Jun 1, 2024 • 0 new comments -
[POC][pytree] test flattening dict in sorted order
#115014 commented on
May 29, 2024 • 0 new comments -
Automated submodule update: FBGEMM
#115316 commented on
Jun 1, 2024 • 0 new comments -
Factory function and basic .sizes() support for C++ NestedTensor
#117905 commented on
May 28, 2024 • 0 new comments -
Switch batch norm stack to consolidated ops
#119496 commented on
May 30, 2024 • 0 new comments -
[NJT] Store vec on nested ints
#119976 commented on
May 28, 2024 • 0 new comments -
[NJT] Factory function support
#119977 commented on
May 28, 2024 • 0 new comments -
[AMD] Turn on hipblaslt by default
#120480 commented on
May 31, 2024 • 0 new comments -
Optionally use hipblaslt
#120551 commented on
May 30, 2024 • 0 new comments -
Enable clang-tidy on c10/util/Float8*.h
#120573 commented on
May 26, 2024 • 0 new comments -
[do not review]
#120881 commented on
May 28, 2024 • 0 new comments -
Adds general tensor equality subsystem
#121481 commented on
May 28, 2024 • 0 new comments -
[draft] python 3.13 test
#121979 commented on
May 31, 2024 • 0 new comments -
Improve decomposition for constant_pad_nd
#123661 commented on
May 29, 2024 • 0 new comments -
Add auto-tuning for sparse semi-structured MM operator
#123742 commented on
May 30, 2024 • 0 new comments