Open PDF

Note: Users must procure and maintain the applicable open source tools to integrate this DE tool with the Istari Digital platform. Please contact your local IT administrator for assistance.

Supported Functions:

extract

Getting Started

The Open PDF integration allows users to extract data from .pdf files.

Methods to Link to Istari Digital Platform

Upload: Yes

Link: No

Files Supported

The Istari Digital Platform can extract data from the following file type: .pdf
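Because .pdf is the only supported type, it can help to check locally that a file really is a PDF, and not just a renamed file, before uploading it. The sketch below is a hypothetical helper, `looks_like_pdf`, which is not part of the Istari Digital SDK. It checks the extension and looks for the `%PDF-` header marker in the first 1024 bytes. Searching that window, rather than only byte 0, is a lenient choice because some readers tolerate a few leading bytes. Passing this check does not guarantee that the platform can process the file.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"   # every PDF file header starts with this marker
HEADER_WINDOW = 1024   # lenient: look for the marker in the first 1 KiB


def looks_like_pdf(path: str) -> bool:
    """Return True if path is an existing .pdf file whose header contains %PDF-."""
    p = Path(path)
    if p.suffix.lower() != ".pdf" or not p.is_file():
        return False
    with p.open("rb") as f:
        return PDF_MAGIC in f.read(HEADER_WINDOW)
```

For example, `looks_like_pdf("example_document.pdf")` should return True for the downloadable example file.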

Example Files

Download Example Document: example_document.pdf

Setup for Administrators

Ensure that the Istari Digital Agent and the appropriate Istari Digital Software are installed on the machine.

Version Compatibility

This software is intended to run in a Windows environment. It was tested on a Windows 11 machine.

Function Coverage and Outputs

The Open PDF integration can produce the artifacts listed below from a PDF document. The table below describes each output artifact and its file type.

Route	Coverage	Artifact Content Example
Extract all text - TXT	Yes
Extract text sections - JSON	Yes
Extract JSON sections - JSON	Yes
Extract document metadata - JSON	Yes
Extract separate pages - PNG	Yes
Extract embedded images - PNG/JPEG	Yes
Extract separate pages - PDF	Yes
Extract document - HTML	Yes
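After an extraction finishes, you may want to confirm which of the output types in the table were actually produced. The sketch below tallies artifact file names by extension and reports any extension that does not appear in the table. It assumes you have a list of artifact names (for example, collected from `artifact.name` in Step 3) and that those names end in the extensions shown in the table; this document does not guarantee either naming convention. Note that a PDF without embedded images will legitimately produce no embedded-image artifacts.

```python
from collections import Counter
from pathlib import PurePath

# File types listed in the coverage table; "jpg" is included because JPEG
# files are commonly saved with either suffix (assumption).
TABLE_TYPES = {"txt", "json", "png", "jpeg", "jpg", "pdf", "html"}


def count_by_extension(artifact_names):
    """Count artifact file names by lower-cased extension, without the dot."""
    return Counter(PurePath(n).suffix.lower().lstrip(".") for n in artifact_names)


def unexpected_types(artifact_names):
    """Return the sorted extensions that are not among the table's output types."""
    return sorted(set(count_by_extension(artifact_names)) - TABLE_TYPES)
```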

Detailed SDK Reference

Prerequisite: Install the Istari Digital SDK and initialize the Istari Digital Client following the instructions here.

Step 1: Upload and Extract the File(s)

Upload the file as a model

model = client.add_model(
    path="example_document.pdf",
    description="Open PDF example Model",
    display_name="Open PDF Model Name",
)
print(f"Uploaded base model with ID {model.id}")
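Step 1 mentions uploading one or more files. To upload several PDFs, you could first collect them from a folder. The sketch below defines a hypothetical helper, `find_pdfs`, that is not part of the SDK. It returns the .pdf files directly inside a directory, matching the suffix case-insensitively and skipping subdirectories.

```python
from pathlib import Path


def find_pdfs(directory):
    """Return the .pdf files directly inside directory, sorted by path.

    The suffix match is case-insensitive, so "report.PDF" is included.
    Subdirectories (even ones named like "*.pdf") are skipped.
    """
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

Each returned path can then be passed to `client.add_model(path=str(pdf), ...)` as in the upload example, for instance with `display_name=pdf.stem`.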

Extract once you have the model ID

extraction_job = client.add_job(
    model_id=model.id,
    function="@istari:extract",
    tool_name="open_pdf",
    tool_version="1.0.0",
    operating_system="Windows Server 2019",
)
print(f"Extraction started for model ID {model.id}, job ID: {extraction_job.id}")

The call above is an example. Choose the tool_name, tool_version, and operating_system values that match your installation of this software.
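If you submit extraction jobs from several scripts, keeping the installation-specific values in one place can prevent mismatches. The sketch below is a hypothetical helper, not part of the Istari Digital SDK. `ToolSelection` and `extract_job_kwargs` are illustrative names. Only the keyword names and the `"@istari:extract"` function string come from the `add_job` example above. The example values must still be replaced with those that match your agent's installation.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ToolSelection:
    """Hypothetical holder for the tool fields passed to client.add_job."""
    tool_name: str
    tool_version: str
    operating_system: str

    def __post_init__(self):
        # Reject blank values early rather than submitting an unmatchable job.
        for field_name, value in asdict(self).items():
            if not value.strip():
                raise ValueError(f"{field_name} must not be empty")


def extract_job_kwargs(model_id, tools: ToolSelection) -> dict:
    """Build keyword arguments that mirror the add_job example above."""
    return {"model_id": model_id, "function": "@istari:extract", **asdict(tools)}
```

Usage would then look like `client.add_job(**extract_job_kwargs(model.id, ToolSelection("open_pdf", "1.0.0", "Windows Server 2019")))`.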

Step 2: Check the Job Status

extraction_job.poll_job()
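This document does not describe how `poll_job()` behaves, for example whether it blocks or has a timeout. If a script needs a bounded wait of its own, a generic polling loop like the one below works with any status check. `wait_until` is a hypothetical helper, not the SDK's `poll_job` implementation. The `is_done` callable is a placeholder you supply; no SDK job-status attributes are assumed here.

```python
import time
from typing import Callable


def wait_until(is_done: Callable[[], bool], timeout_s: float = 600.0,
               interval_s: float = 5.0) -> bool:
    """Call is_done() every interval_s seconds until it returns True.

    Returns True as soon as is_done() succeeds, or False once timeout_s
    seconds have elapsed without success.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_done():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval_s, remaining))
```

Using `time.monotonic()` rather than wall-clock time keeps the timeout correct even if the system clock is adjusted while waiting.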

Step 3: Retrieve Results

Example

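import os

output_dir = r"c:\extracts"
os.makedirs(output_dir, exist_ok=True)  # create the output folder if it does not exist

for artifact in model.artifacts:
    output_file_path = os.path.join(output_dir, artifact.name)

    # Text-based artifacts are written as UTF-8 text; all others (PNG, JPEG, PDF) as raw bytes.
    if artifact.extension in ["txt", "csv", "md", "json", "html"]:
        with open(output_file_path, "w", encoding="utf-8") as f:
            f.write(artifact.read_text())
    else:
        with open(output_file_path, "wb") as f:
            f.write(artifact.read_bytes())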
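The retrieval loop builds each output path directly from `artifact.name`. If a name ever contained path separators, a file could be written outside the target folder. The defensive sketch below uses a hypothetical helper, `safe_output_path`, that keeps only the final component of the name. It treats "/", "\" and ":" (a Windows drive separator) as separators and rejects names that leave nothing usable. Whether artifact names can contain such characters is not stated in this document; this is a precaution, not a documented requirement.

```python
import os
import re


def safe_output_path(out_dir: str, artifact_name: str) -> str:
    """Join out_dir with only the last component of artifact_name.

    "/", "\\" and ":" are all treated as separators, so names such as
    "../x.txt", "..\\pages\\page_1.png" or "C:x.txt" stay inside out_dir.
    """
    base = re.split(r"[\\/:]", artifact_name)[-1]
    if base in ("", ".", ".."):
        raise ValueError(f"unusable artifact name: {artifact_name!r}")
    return os.path.join(out_dir, base)
```

In the loop above, `output_file_path = safe_output_path(output_dir, artifact.name)` would replace the direct `os.path.join` call.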

Troubleshooting

For general Agent and software troubleshooting, click here.
