If I want to obtain the probabilities of 8 emotions:
"<|HAPPY|>",
"<|SAD|>",
"<|ANGRY|>",
"<|NEUTRAL|>",
"<|FEARFUL|>",
"<|DISGUSTED|>",
"<|SURPRISED|>",
"<|OTHER|>",
should I take the logits at the corresponding token ID positions from the second frame?
for i in range(b):
    x = ctc_logits[i, : encoder_out_lens[i].item(), :]
    yseq = x.argmax(dim=-1)
    emotion_logits = x[1, :]
Can I do this?
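
To make the question concrete, here is a minimal sketch of the step that would follow `emotion_logits = x[1, :]`: keep only the scores at the 8 emotion token IDs and renormalize them with a softmax. It assumes the emotion tag really is emitted at frame index 1, which is exactly what is being asked. The token IDs in `EMOTION_TOKEN_IDS` and the helper name `emotion_probs` are hypothetical placeholders; the real IDs have to be looked up in the model's vocabulary.

```python
import math

# Hypothetical token IDs, for illustration only.
# Replace them with the IDs from the model's actual vocabulary.
EMOTION_TOKEN_IDS = {
    "<|HAPPY|>": 10,
    "<|SAD|>": 11,
    "<|ANGRY|>": 12,
    "<|NEUTRAL|>": 13,
    "<|FEARFUL|>": 14,
    "<|DISGUSTED|>": 15,
    "<|SURPRISED|>": 16,
    "<|OTHER|>": 17,
}


def emotion_probs(frame_scores, token_ids=EMOTION_TOKEN_IDS):
    """Softmax over one frame's scores, restricted to the emotion tokens.

    frame_scores: indexable sequence of per-token scores for a single frame
    (a Python list, or one row of a tensor such as x[1, :]).
    """
    scores = {label: float(frame_scores[idx]) for label, idx in token_ids.items()}
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}
```

Inside the loop this would be called as `probs = emotion_probs(emotion_logits)`. Because a softmax is unchanged when the same constant is added to every input, the result is the same whether `ctc_logits` holds raw logits or log-softmax outputs. Note that the probabilities are relative to the 8 emotion tokens only; any mass the model puts on other tokens at that frame is discarded.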