
correct attention_head_dim for JointTransformerBlock #8608


Merged · 5 commits · Jul 2, 2024
Changes from 3 commits
4 changes: 2 additions & 2 deletions src/diffusers/models/attention.py
@@ -128,9 +128,9 @@ def __init__(self, dim, num_attention_heads, attention_head_dim, context_pre_only
query_dim=dim,
cross_attention_dim=None,
added_kv_proj_dim=dim,
- dim_head=attention_head_dim // num_attention_heads,
Collaborator: This won't break? Wouldn't the value of dim_head be computed differently?

@yiyixuxu (Collaborator, Author) commented on Jul 1, 2024:
Well, no.

Currently dim_head=attention_head_dim // num_attention_heads, with attention_head_dim and num_attention_heads passed down from SD3ControlNetModel like this:

* attention_head_dim=self.inner_dim
* self.inner_dim = self.config.num_attention_heads * self.config.attention_head_dim

So the attention_head_dim the block receives is really num_attention_heads * attention_head_dim, while num_attention_heads is just num_attention_heads. The dim_head computed here is therefore exactly the attention_head_dim we used to configure the model, and if we pass it down correctly we can use it directly (see the sketch below).
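A minimal sketch (plain arithmetic, illustrative numbers only, not the model's real config) of why the old and the new wiring end up with the same per-head dimension:

```python
# Hypothetical values, chosen only to illustrate the arithmetic.
num_attention_heads = 24
attention_head_dim = 64

# Before this PR: the block received inner_dim (= heads * head_dim) under the
# name `attention_head_dim` and divided it back down when building Attention.
inner_dim = num_attention_heads * attention_head_dim
dim_head_before = inner_dim // num_attention_heads

# After this PR: the block receives the per-head dim and uses it directly.
dim_head_after = attention_head_dim

assert dim_head_before == dim_head_after  # both 64, so the computed value is unchanged
```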

Collaborator: Ahh, I see. Thanks for explaining! 🙏🏽

+ dim_head=attention_head_dim,
heads=num_attention_heads,
- out_dim=attention_head_dim,
+ out_dim=dim,
context_pre_only=context_pre_only,
bias=True,
processor=processor,
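The same reasoning covers the out_dim change in this hunk: previously out_dim=attention_head_dim, where that argument actually held inner_dim; now it is out_dim=dim, and dim is inner_dim, so the output projection size is unchanged. Continuing the illustrative numbers from the sketch above:

```python
# Illustrative numbers only (same assumptions as the sketch above).
num_attention_heads = 24
attention_head_dim = 64
inner_dim = num_attention_heads * attention_head_dim

out_dim_before = inner_dim  # out_dim=attention_head_dim, but that argument held inner_dim
out_dim_after = inner_dim   # out_dim=dim, and dim is inner_dim

assert out_dim_before == out_dim_after
```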
2 changes: 1 addition & 1 deletion src/diffusers/models/controlnet_sd3.py
@@ -81,7 +81,7 @@ def __init__(
JointTransformerBlock(
dim=self.inner_dim,
num_attention_heads=num_attention_heads,
- attention_head_dim=self.inner_dim,
+ attention_head_dim=attention_head_dim,
Collaborator: Should this also be self.config.attention_head_dim to match transformer_sd3.py?

context_pre_only=False,
)
for i in range(num_layers)
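Regarding the question above: inside __init__ the local argument attention_head_dim and self.config.attention_head_dim should hold the same value, because the @register_to_config decorator records the constructor arguments into self.config before the body of __init__ runs (which is how these models can read self.config.* inside __init__ at all). A toy sketch under that assumption, not the real SD3ControlNetModel:

```python
from diffusers.configuration_utils import ConfigMixin, register_to_config
from diffusers.models.modeling_utils import ModelMixin


class ToyModel(ModelMixin, ConfigMixin):
    config_name = "config.json"  # required by ConfigMixin

    @register_to_config
    def __init__(self, num_attention_heads: int = 24, attention_head_dim: int = 64):
        super().__init__()
        # The decorator has already registered the arguments into self.config,
        # so the local argument and the config attribute agree here.
        assert attention_head_dim == self.config.attention_head_dim


ToyModel()  # constructing it exercises the assert
```

So the two spellings are interchangeable here; using self.config.attention_head_dim would only make the style match transformer_sd3.py.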
2 changes: 1 addition & 1 deletion src/diffusers/models/transformers/transformer_sd3.py
@@ -97,7 +97,7 @@ def __init__(
JointTransformerBlock(
dim=self.inner_dim,
num_attention_heads=self.config.num_attention_heads,
- attention_head_dim=self.inner_dim,
+ attention_head_dim=self.config.attention_head_dim,
context_pre_only=i == num_layers - 1,
)
for i in range(self.config.num_layers)
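For reference, a hedged sketch (illustrative sizes, not the model's actual defaults) of how the block is constructed after this change: dim carries the full model width while attention_head_dim is the per-head size. Requires PyTorch 2.0+ for the joint attention processor.

```python
from diffusers.models.attention import JointTransformerBlock

# Illustrative sizes only; the real models take these from their config.
num_attention_heads = 24
attention_head_dim = 64
inner_dim = num_attention_heads * attention_head_dim  # 1536

block = JointTransformerBlock(
    dim=inner_dim,                          # full model width
    num_attention_heads=num_attention_heads,
    attention_head_dim=attention_head_dim,  # per-head dim, no longer inner_dim
    context_pre_only=False,
)
```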