[TRTLLM-6019] feat: Remove cutlass min latency code from AutoTuner. #5394

Open: wants to merge 1 commit into base: main

hyukn (Collaborator) commented Jun 21, 2025

Removes the AutoTuner tuning process for the CUTLASS fused MoE path when min_latency_mode is enabled.

hyukn requested a review from hlu1 (June 21, 2025 13:26)
hyukn requested a review from a team as a code owner (June 21, 2025 13:26)
hyukn requested a review from liji-nv (June 21, 2025 13:26)
hyukn (Collaborator, Author) commented Jun 21, 2025

/bot run

tensorrt-cicd (Collaborator)

PR_Github #9573 [ run ] triggered by Bot

tensorrt-cicd (Collaborator)

PR_Github #9573 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #7031 completed with status: 'FAILURE'

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
hyukn force-pushed the feat/remove_min_latency_mode_autotune branch from e48de61 to e4925e8 (June 22, 2025 00:46)
hyukn (Collaborator, Author) commented Jun 22, 2025

/bot run

tensorrt-cicd (Collaborator)

PR_Github #9578 [ run ] triggered by Bot

tensorrt-cicd (Collaborator)

PR_Github #9578 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7034 completed with status: 'SUCCESS'
Pipeline passed with automatically retried tests. Check the rerun report for details.

@@ -44,6 +41,7 @@ def __init__(
enable_alltoall: bool,
use_deepseek_fp8_block_scale: bool,
use_w4a8_group_scaling: bool,
min_latency_mode: bool,
Collaborator commented on this line:

Why do you need to pass the argument here? Should self.min_latency_mode just be false?
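To make the reviewer's question concrete, here is a minimal sketch contrasting the two constructor designs. The class names and bodies are hypothetical and are not the actual TensorRT-LLM runner; only the three boolean parameters and `min_latency_mode` come from the diff hunk above. It assumes that, after this PR, the runner used by the AutoTuner is never tuned in min-latency mode.

```python
class MoERunnerWithFlag:
    """Hypothetical runner: min_latency_mode is threaded through the constructor."""

    def __init__(self, enable_alltoall: bool, use_deepseek_fp8_block_scale: bool,
                 use_w4a8_group_scaling: bool, min_latency_mode: bool):
        self.enable_alltoall = enable_alltoall
        self.use_deepseek_fp8_block_scale = use_deepseek_fp8_block_scale
        self.use_w4a8_group_scaling = use_w4a8_group_scaling
        # Every caller must supply a value, even if only False is ever valid.
        self.min_latency_mode = min_latency_mode


class MoERunnerFixedFlag:
    """Hypothetical runner following the reviewer's suggestion."""

    def __init__(self, enable_alltoall: bool, use_deepseek_fp8_block_scale: bool,
                 use_w4a8_group_scaling: bool):
        self.enable_alltoall = enable_alltoall
        self.use_deepseek_fp8_block_scale = use_deepseek_fp8_block_scale
        self.use_w4a8_group_scaling = use_w4a8_group_scaling
        # Assumption: the tuned path no longer covers min-latency mode,
        # so the attribute is fixed here instead of being passed in.
        self.min_latency_mode = False


if __name__ == "__main__":
    runner = MoERunnerFixedFlag(enable_alltoall=False,
                                use_deepseek_fp8_block_scale=False,
                                use_w4a8_group_scaling=False)
    print(runner.min_latency_mode)  # False
```

Fixing the attribute inside the constructor keeps call sites from passing a value the tuned path no longer supports; the trade-off is that re-enabling min-latency tuning later would mean reintroducing the parameter.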


3 participants