Error must install unstructured_pytesseract when using paddleocr

Versions:
unstructured: 0.17.2
unstructured-client: 0.36.0
unstructured-inference: 1.0.2
unstructured_paddleocr: 2.10.0
paddlepaddle: 3.0.0

Environment variable set before partitioning:
`os.environ["OCR_AGENT"] = "unstructured.partition.utils.ocr_models.paddle_ocr.OCRAgentPaddle"`

I'm using Paddle as my OCR model, but when I run this code:

`from unstructured.partition.pdf import partition_pdf

# filepath is the path to the input PDF
raw_pdf = partition_pdf(
    filename=filepath,
    strategy="hi_res",
    infer_table_structure=True,
    extract_images_in_pdf=True,
    # extract_image_block_types=["Image", "Table"],
    # extract_image_block_output_dir=path,
    chunking_strategy="by_title",
    max_characters=4000,
    new_after_n_chars=3800,
    combine_text_under_n_chars=2000,
)`

it raises this error:
`ModuleNotFoundError: No module named 'unstructured_pytesseract'`

Why do I have to install unstructured_pytesseract when I already have unstructured_paddleocr?
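In case it helps with reproducing this, here is a minimal diagnostic sketch that reports which OCR-related modules can be located in the current environment and what `OCR_AGENT` is set to. It uses only the standard library. The helper name `missing_modules` is hypothetical, and the module names passed to it are assumptions taken from the error message and the package list above; the actual import names may differ.

```python
import importlib.util
import os


def missing_modules(names):
    """Return the subset of `names` that cannot be located for import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises when the parent package of a dotted name is absent
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Show which OCR agent the environment currently selects
    print("OCR_AGENT =", os.environ.get("OCR_AGENT", "<unset>"))
    # Assumed import names; adjust them to match the installed packages
    candidates = ["unstructured_pytesseract", "unstructured_paddleocr", "paddle"]
    print("missing:", missing_modules(candidates))
```

If `unstructured_pytesseract` is the only entry listed as missing, that would be consistent with the traceback above: the environment has the Paddle packages, and something in the `hi_res` path still tries to import the Tesseract wrapper.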

Error must install unstructured_pytesseract when using paddleocr #4007
