normalizers.Replace able to support regex group capture #1760

Open
@nrv

Description

Is it possible for normalizers.Replace to support capture-group references (such as \1) in the replacement string?

In the following dummy example, I want to insert a space between the letters l and e. This works with the re package but not with normalizers.Replace, which inserts the replacement string literally.

import re
from tokenizers import normalizers, Regex

pattern = r"(l)(e)"
replacement = r"\1 \2"

text = "le travail est totalement pénible"

text1 = normalizers.Replace(Regex(pattern), replacement).normalize_str(text)
text2 = re.sub(pattern, replacement, text)

print(f"{text  = }")
print(f"{text1 = }")
print(f"{text2 = }")

Execution result:

text  = 'le travail est totalement pénible'
text1 = '\\1 \\2 travail est tota\\1 \\2ment pénib\\1 \\2'
text2 = 'l e travail est total ement pénibl e'
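
A possible stopgap, sketched here only as an illustration: apply the capture-group replacement with the standard re module before the text reaches the tokenizer. The helper name pre_normalize is hypothetical and not part of tokenizers. Because this step runs outside the normalizer pipeline, any offsets the tokenizer reports would presumably refer to the already-modified string rather than the original text.

```python
import re

# Hypothetical helper (not part of tokenizers): perform the capture-group
# replacement with the standard library before tokenization.
PATTERN = re.compile(r"(l)(e)")
REPLACEMENT = r"\1 \2"


def pre_normalize(text: str) -> str:
    # Pattern.sub expands \1 and \2 to the captured groups.
    return PATTERN.sub(REPLACEMENT, text)


text = "le travail est totalement pénible"
print(pre_normalize(text))
```

The result of pre_normalize(text) can then be passed to the tokenizer in place of the raw string; this does not address the feature request itself, which is for Replace to handle group references natively.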
