
[BUG] Exllamav2 quickly devolves into endless repetition in versions newer than 0.2.8 #793

Open
@ZhenyaPav

Description


OS

Linux

GPU Library

AMD ROCm

Python version

3.12

Pytorch version

2.8.0

Model

dakkidaze/Cydonia-22B-v1.3-4.5bpw-h6-exl2

Describe the bug

Exllamav2 versions newer than 0.2.8 seem to start generating nonsense very quickly.

Example output (same model as above, identical request in both cases):
Old version (0.2.8):

Once upon a time, in a lush green meadow, there lived a curious little rabbit named Benny. Benny loved exploring the meadow, hopping from one patch of clover to another, nibbling on the sweet leaves. One sunny morning, as Benny was enjoying his breakfast, he noticed a sleek black cat lounging under a nearby tree. (it goes on to write several paragraphs of coherent text)

New version (0.3.1):

Once upon a time, a cat named Tom and a rabbit named named named named (continues to write the same word until the length limit is hit)

Regarding models: the bug seems to manifest differently depending on the model (I'm not sure whether it affects all models or only Mistral 22B). Mistral-22B-based finetunes (I tested several) start repeating the same word, while Llama 3 8B returns generally coherent text most of the time (with the appropriate instruct template). Rocinante 12B is also unaffected.

Reproduction steps

Clone TabbyAPI into 2 folders with different names.

Inside both, create a venv and clone exllamav2.

Install PyTorch via:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3
(I tested on a system with ROCm 6.4 installed, but torch built for 6.3 appears to be compatible with 6.4; if my memory serves me well, this issue was also present on at least ROCm 6.3 and 6.2.4.)

In repo 1 (old):
For TabbyAPI, check out commit 3960612d38b231017cd72e5fd19db855fe3bd371.
In exllamav2, check out v0.2.8.
Build both via pip install . (exllamav2 first, then tabby).

In repo 2 (new):
Simply build both at the latest version via pip install . (exllamav2 first, then tabby).

Launch the old one, generate a response with some deterministic preset (I used SillyTavern), close.
Launch the new one, generate a response with the same prompt and preset.
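The "deterministic preset" comparison can be sketched as a plain HTTP request against TabbyAPI's OpenAI-compatible completions endpoint. The endpoint URL, port, prompt, and exact sampler field names below are my assumptions for illustration, not values taken from this report; the idea is simply that greedy sampling (temperature 0, top_k 1) makes the outputs of the two builds directly comparable:

```python
import json

# Assumed default TabbyAPI address and OpenAI-compatible endpoint path.
ENDPOINT = "http://localhost:5000/v1/completions"

# Hypothetical deterministic request: greedy sampling should produce the
# same completion on every run, so any divergence between the old and new
# exllamav2 builds points at the inference code, not the sampler.
payload = {
    "model": "dakkidaze/Cydonia-22B-v1.3-4.5bpw-h6-exl2",
    "prompt": "Write a short story about a cat and a rabbit.",  # example prompt
    "max_tokens": 300,
    "temperature": 0,  # greedy decoding
    "top_k": 1,
}

print(json.dumps(payload, indent=2))

# To actually send it (requires a running TabbyAPI server), e.g.:
#   import urllib.request
#   req = urllib.request.Request(ENDPOINT, json.dumps(payload).encode(),
#                                {"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())
```

Sending the same payload to both builds and diffing the responses makes the repetition failure easy to demonstrate.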

I have encountered this bug several times when trying to move to newer versions of exllamav2; 0.2.8 is the latest version I'm confident works fine.

Expected behavior

In both cases, a coherent result should be generated

Logs

No response

Additional context

No response

Acknowledgements

  • I have looked for similar issues before submitting this one.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will ask my questions politely.
