Docs: fp16 page #404

Merged: 12 commits, Sep 8, 2022

pcuenca (Member) commented Sep 7, 2022

Part of #293.

HuggingFaceDocBuilderDev commented Sep 7, 2022

The documentation is not available anymore as the PR was closed or merged.

keturn (Contributor) commented Sep 7, 2022

LGTM. float16 made all the difference for me in terms of being able to run it on my hardware; it's good to have examples like this of its use.

A few other details to consider (but could also follow in a future update):

  • Is attention slicing useful only for batch processing, or does it still give benefits for a single prompt with a batch size of 1?
  • Do these options change the inference result, or are they purely a speed/memory tradeoff that arrives at the same place in the end?
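
A minimal sketch of why both questions can have a reassuring answer, assuming that slicing is done along the attention-head dimension (this is an illustration in plain Python, not the diffusers implementation, and every name in it is hypothetical). Each head's attention is independent of the others, so a single prompt with several heads can still be processed a few heads at a time, and the outputs match the all-at-once computation:

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, k, v):
    """Scaled dot-product attention for a single head (lists of vectors)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    weights = [softmax(r) for r in scores]
    dim_v = len(v[0])
    return [[sum(w * vj[d] for w, vj in zip(wr, v)) for d in range(dim_v)]
            for wr in weights]


def attend_heads(qs, ks, vs, slice_size=None):
    """Run attention over every head, optionally slice_size heads at a time."""
    n = len(qs)
    step = n if slice_size is None else slice_size
    out = []
    for start in range(0, n, step):
        chunk = range(start, min(start + step, n))
        # Only the score matrices for the heads in `chunk` must exist at once.
        out.extend(attend(qs[h], ks[h], vs[h]) for h in chunk)
    return out


def peak_score_entries(num_heads, tokens, slice_size=None):
    """Score-matrix entries alive at once (a rough proxy for peak memory)."""
    heads_at_once = num_heads if slice_size is None else min(slice_size, num_heads)
    return heads_at_once * tokens * tokens


# One prompt (batch size 1) with several heads; sizes are arbitrary.
HEADS, TOKENS, DIM = 8, 6, 4
rng = random.Random(0)


def rand_heads():
    return [[[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(TOKENS)]
            for _ in range(HEADS)]


qs, ks, vs = rand_heads(), rand_heads(), rand_heads()
full = attend_heads(qs, ks, vs)
sliced = attend_heads(qs, ks, vs, slice_size=HEADS // 2)

max_diff = max(abs(a - b)
               for hf, hs in zip(full, sliced)
               for rf, rs in zip(hf, hs)
               for a, b in zip(rf, rs))
print("max difference:", max_diff)
print("peak score entries:", peak_score_entries(HEADS, TOKENS),
      "vs", peak_score_entries(HEADS, TOKENS, HEADS // 2))
```

In this toy version the sliced and unsliced paths perform the same arithmetic per head, so the outputs are identical while the number of score entries held at once is halved. On a real GPU the sliced path runs in more sequential steps, so some slowdown is the expected price for the memory saving.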

patrickvonplaten (Contributor) left a comment

Looks good to me - feel free to merge. Left some suggestions as comments :-)

pcuenca and others added 7 commits September 8, 2022 08:35

pcuenca (Member, Author) commented Sep 8, 2022

Added a couple of tweaks on top of your suggestions. Thanks a lot @patrickvonplaten and @keturn, very useful observations!

patrickvonplaten (Contributor) commented

Very nice!

patrickvonplaten merged commit c29d81c into main on Sep 8, 2022
patrickvonplaten deleted the docs-optim-fp16 branch on September 8, 2022 at 07:21
PhaneeshB pushed a commit to nod-ai/diffusers that referenced this pull request on Mar 1, 2023
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request on Dec 25, 2023

* Initial version of `fp16` page.

* Fix typo in README.

* Change titles of fp16 section in toctree.

* PR suggestion

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* PR suggestion

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Clarify attention slicing is useful even for batches of 1

Explained by @patrickvonplaten after a suggestion by @keturn.

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Do not talk about `batches` in `enable_attention_slicing`.

* Use Tip (just for fun), add link to method.

* Comment about fp16 results looking the same as float32 in practice.

* Style: docstring line wrapping.

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

4 participants