Hello
flash_attn provides support for CUDA 12.1 but not for CUDA 12.4.
If I install it with pip, it causes the following error at inference time:
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found
So how did you install and use flash_attn in your case?
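
For reference, this is a minimal diagnostic sketch I would run to confirm the mismatch between the prebuilt wheel and the local toolchain (it only assumes torch is already installed; the printed values below are illustrative):

```python
# Minimal sketch: check whether the installed flash_attn wheel matches the
# local CUDA toolkit and GLIBC version (assumes torch is installed).
import platform
import torch

print("torch:", torch.__version__)        # e.g. 2.x.y+cu121
print("torch CUDA:", torch.version.cuda)  # CUDA version torch was built with
print("GLIBC:", platform.libc_ver())      # e.g. ('glibc', '2.31')

# If the prebuilt flash_attn binary was linked against a newer GLIBC than the
# system provides, the import fails with the GLIBC_2.32 error shown above.
try:
    import flash_attn
    print("flash_attn:", flash_attn.__version__)
except (ImportError, OSError) as e:
    print("flash_attn import failed:", e)
```

If the GLIBC versions disagree, building the wheel locally (e.g. `pip install flash-attn --no-build-isolation`, as suggested in the flash-attn README) might avoid pulling a prebuilt binary linked against a newer GLIBC, but I have not confirmed this on CUDA 12.4 yet.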