Insights: triton-inference-server/server
Overview
4 Pull requests merged by 4 people
- build: Update README and versions for 2.47.0 / 24.06 (#7334, merged Jun 12, 2024)
- test: Add test for sequence flags in ensemble streaming inference (#7344, merged Jun 12, 2024)
- test: Python models filtering outputs based on requested outputs (#7338, merged Jun 12, 2024)
- test: Fix the test to expect updated error messages (#7340, merged Jun 12, 2024)
4 Pull requests opened by 3 people
- revert: "Change TensorRT-LLM (#7143)" (#7341, opened Jun 12, 2024)
- test: Remove AWS bucket on test failure (#7342, opened Jun 12, 2024)
- test: Python models filtering outputs based on requested outputs (#7338) (#7348, opened Jun 12, 2024)
- Add tests for infer_request.cc byte_size check (#7351, opened Jun 13, 2024)
5 Issues closed by 5 people
- unexpected datatype TYPE_INT64 for inference input, expecting TYPE_INT32 (#7307, closed Jun 14, 2024)
- Segmentation fault when sending multiple requests to triton-vllm (#7332, closed Jun 13, 2024)
- How does Triton implement one instance to handle multiple requests simultaneously? (#7295, closed Jun 12, 2024)
- Configurable Parallel Model Loading (Python backend) (#7094, closed Jun 11, 2024)
9 Issues opened by 9 people
- Add torch.set_float32_matmul_precision setting in Libtorch backend (#7352, opened Jun 14, 2024)
- Triton/vllm_backend launches model on incorrect GPU (#7349, opened Jun 13, 2024)
- Regression from 23.07 to 24.05 on model count lifecycle/restarts (#7347, opened Jun 12, 2024)
- The TensorRT-LLM container does not have the other backends (#7346, opened Jun 12, 2024)
- tritonserver log problem (#7345, opened Jun 12, 2024)
- Large latency when using `tritonclient.http.aio.infer` (#7343, opened Jun 12, 2024)
- Could you give some examples of ragged input config for the TensorRT backend? (#7339, opened Jun 11, 2024)
- Triton server crash when running a large model with an ONNX/CPU backend (#7337, opened Jun 10, 2024)
- Triton TensorRT-LLM 24.04 and 24.05 are very large (#7335, opened Jun 8, 2024)
23 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- TensorRT model low throughput (with CUDA shmem or system shmem) (#6978, commented on Jun 13, 2024 • 4 new comments)
- Does ensemble model release CUDA cache? (#5237, commented on Jun 12, 2024 • 4 new comments)
- Uneven QPS leads to low throughput and high latency as well as low GPU utilization (#7318, commented on Jun 12, 2024 • 4 new comments)
- Tritonserver physical RAM grows over time (#6781, commented on Jun 13, 2024 • 3 new comments)
- Segmentation fault (core dumped) - Server version 2.46.0 (#7330, commented on Jun 11, 2024 • 3 new comments)
- Does Triton Server support dynamic request batching for models that have sparse tensors as inputs? (#7333, commented on Jun 11, 2024 • 3 new comments)
- ci: Add INT64 Datatype Support for Shape Tensors in TensorRT Backend (#7329, commented on Jun 14, 2024 • 2 new comments)
- Triton Server 24.05 can't initialize CUDA drivers if the host system has Nvidia driver 555.85 installed (#7319, commented on Jun 12, 2024 • 2 new comments)
- triton malloc fail (#7308, commented on Jun 12, 2024 • 2 new comments)
- Single docker layer is too large (#7314, commented on Jun 12, 2024 • 2 new comments)
- Improve L0_io to test for peer access (#3893, commented on Jun 10, 2024 • 1 new comment)
- How to send binary data (audio file) in perf_analyzer? (#6701, commented on Jun 14, 2024 • 1 new comment)
- Unable to use the PyTorch library with the libtorch backend when using the Triton Inference Server in-process Python API (#7222, commented on Jun 13, 2024 • 1 new comment)
- CUDA runtime API error raised when using only CPU on Mac M3 (#7324, commented on Jun 11, 2024 • 1 new comment)
- Unfixed bug (issue #5783): inaccurate request handling when configuring queue policy (#6796, commented on Jun 11, 2024 • 1 new comment)
- Triton Server crash with signal (11) (#6720, commented on Jun 11, 2024 • 1 new comment)
- Why is my model in an ensemble receiving out-of-order input? (#7303, commented on Jun 11, 2024 • 1 new comment)
- Add TT-Metalium as a backend (#7305, commented on Jun 11, 2024 • 1 new comment)
- Low QPS with momentary traffic surges causes significant increases in inference TP99 latency (#7313, commented on Jun 11, 2024 • 1 new comment)
- Memory over 100% with decoupled DALI video model (#7315, commented on Jun 11, 2024 • 1 new comment)
- When the request is large, the Triton server has a very high TTFT (#7316, commented on Jun 11, 2024 • 1 new comment)
- Building and developing with libtritonserver.so (#7320, commented on Jun 11, 2024 • 1 new comment)
- fix: Fix version for setuptools (#7331, commented on Jun 13, 2024 • 0 new comments)