Open Spreadsheet
Note: Users must procure and maintain the applicable open source tools to integrate this DE tool with the Istari Digital platform. Please contact your local IT administrator for assistance.
Supported Functions:
Getting Started
The Open Spreadsheet integration allows users to extract data from .xlsx files.
Methods to Link to Istari Digital Platform
Upload: Yes
Link: No
Files Supported
The Istari Digital Platform can extract data from the following file types:
.xlsx
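Before uploading, you can check locally that a file really is an .xlsx workbook. The sketch below is an illustration and not part of the Istari Digital SDK; `looks_like_xlsx` is a hypothetical helper. It relies on the fact that .xlsx files are ZIP archives in the Office Open XML format. It also assumes the main workbook part sits at the conventional `xl/workbook.xml` path, where Excel and LibreOffice place it.

```python
import zipfile
from pathlib import Path


def looks_like_xlsx(path: str) -> bool:
    """Return True if `path` has an .xlsx extension and looks like an OOXML workbook.

    Heuristic only: an .xlsx file is a ZIP archive whose main workbook part is
    conventionally stored at "xl/workbook.xml". This is not a full validation.
    """
    p = Path(path)
    # Wrong extension or missing file: not something the platform accepts
    if p.suffix.lower() != ".xlsx" or not p.is_file():
        return False
    # Legacy .xls files and plain-text files are not ZIP archives
    if not zipfile.is_zipfile(p):
        return False
    with zipfile.ZipFile(p) as zf:
        return "xl/workbook.xml" in zf.namelist()
```

For example, `looks_like_xlsx("example.xlsx")` can be called before `client.add_model(...)` to catch a mislabeled file early.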
Example Files
Setup for Administrators
Ensure that LibreOffice is installed on a virtual machine (VM) that also runs the Istari Digital Agent and the appropriate Istari Digital software. Verify that the LibreOffice installation has the latest updates applied.
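To confirm that LibreOffice is installed on the VM and see which version is present, you can ask the executable for its version with the `--version` option. This is a minimal sketch with a hypothetical helper name. It assumes the LibreOffice executable is named `soffice` and is on the VM's PATH; on Windows it usually lives in the LibreOffice `program` folder, which may need to be added to PATH first.

```python
import shutil
import subprocess
from typing import Optional


def libreoffice_version(executable: str = "soffice") -> Optional[str]:
    """Return the version line printed by LibreOffice, or None if it is not found.

    Assumes the LibreOffice executable (default name "soffice") is on PATH.
    """
    path = shutil.which(executable)
    if path is None:
        # Executable not on PATH: treat LibreOffice as not installed
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, timeout=60, check=False
    )
    # An empty stdout is reported as None rather than an empty string
    return result.stdout.strip() or None
```

Compare the reported version with the latest release published by LibreOffice to decide whether the VM needs an update.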
Version Compatibility
This software was tested with LibreOffice and is intended to run in a Windows or Linux environment.
Function Coverage and Outputs
The Open Spreadsheet integration can produce a number of artifacts extracted from the spreadsheet model. The table below lists each output artifact and its type.
Route | Coverage | Artifact Content Example |
---|---|---|
Extract Sheets - CSV | Yes | |
Named Cells - JSON | Yes | |
Worksheet Data - JSON | Yes | |
Extract workbook - PDF | Yes | |
Extract workbook - xlsx | Yes | |
Extract zipped_html_workbook - ZIP | Yes | |
Extract html_workbook - HTML | Yes | |
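When checking a finished extraction against the table above, it can help to encode the expected output types. The sketch below maps each route label from the table to its file extension and reports which types are absent from a list of artifact file names. The helper name and the artifact names in the usage example are hypothetical; actual artifact names depend on the tool.

```python
from pathlib import PurePath
from typing import Dict, Iterable, Set

# Route label (as listed in the table) -> file extension of its output artifact
EXPECTED_OUTPUTS: Dict[str, str] = {
    "Extract Sheets - CSV": "csv",
    "Named Cells - JSON": "json",
    "Worksheet Data - JSON": "json",
    "Extract workbook - PDF": "pdf",
    "Extract workbook - xlsx": "xlsx",
    "Extract zipped_html_workbook - ZIP": "zip",
    "Extract html_workbook - HTML": "html",
}


def missing_output_types(artifact_names: Iterable[str]) -> Set[str]:
    """Return the expected file extensions that none of the artifact names carry."""
    present = {PurePath(name).suffix.lstrip(".").lower() for name in artifact_names}
    return set(EXPECTED_OUTPUTS.values()) - present
```

For example, `missing_output_types(["Sheet1.csv", "named_cells.json", "workbook.pdf"])` returns the set `{"xlsx", "zip", "html"}`. Because two routes produce JSON, this check works per file type only and cannot tell whether both JSON artifacts are present.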
Detailed SDK Reference
Prerequisite: Install the Istari Digital SDK and initialize the Istari Digital client per the instructions here.
Step 1: Upload and Extract the File(s)
Upload the file as a model:

```python
# `client` is the Istari Digital client initialized in the prerequisite step
model = client.add_model(
    path="example.xlsx",
    description="Excel example Model",
    display_name="Excel Model Name",
)
print(f"Uploaded base model with ID {model.id}")
```
Once you have the model ID, start the extraction:

```python
extraction_job = client.add_job(
    model_id=model.id,
    function="@istari:extract",
    tool_name="open_spreadsheet",
    tool_version="1.0.0",
    operating_system="Ubuntu 22.04",
)
print(f"Extraction started for model ID {model.id}, job ID: {extraction_job.id}")
```
The tool_name, tool_version, and operating_system values above are examples; set them to match your installation of this software.
Step 2: Check the Job Status
```python
extraction_job.poll_job()
```
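If your own script needs an overall time limit while waiting on a job, a common pattern is to wrap a status check in a deadline loop. The sketch below is SDK-agnostic and hypothetical: `is_finished` is a callable you supply yourself, for example one built around whatever job-status query your SDK version exposes. It is not a replacement for `poll_job()`.

```python
import time
from typing import Callable


def wait_until(
    is_finished: Callable[[], bool], timeout_s: float = 600.0, interval_s: float = 5.0
) -> bool:
    """Call is_finished() every interval_s seconds until it returns True.

    Returns True if the check succeeded, or False if timeout_s elapsed first.
    """
    # time.monotonic() is unaffected by system clock changes
    deadline = time.monotonic() + timeout_s
    while True:
        if is_finished():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Polling at a fixed interval keeps the load on the platform predictable; increase `interval_s` for large workbooks that take longer to process.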
Step 3: Retrieve Results
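Example:

```python
from pathlib import Path

# Directory for the extracted artifacts (relative to the working directory); adjust as needed
output_dir = Path("extracts")
output_dir.mkdir(parents=True, exist_ok=True)

for artifact in model.artifacts:
    output_file_path = output_dir / artifact.name
    if artifact.extension in ["txt", "csv", "md", "json", "html"]:
        # Text artifacts are written as UTF-8 text
        output_file_path.write_text(artifact.read_text(), encoding="utf-8")
    else:
        # Everything else (e.g. PDF, XLSX, ZIP) is written as raw bytes
        output_file_path.write_bytes(artifact.read_bytes())
```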
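Once the artifacts are saved, the CSV output of the Extract Sheets route can be loaded with Python's standard `csv` module. This sketch assumes the CSV artifacts were saved as `.csv` files in a single directory and are UTF-8 encoded; `load_csv_sheets` is a hypothetical helper, and keying the result by file name stem is an illustrative choice.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_csv_sheets(output_dir: str) -> Dict[str, List[List[str]]]:
    """Read every .csv file in output_dir into a list of rows, keyed by file stem."""
    sheets: Dict[str, List[List[str]]] = {}
    for csv_path in sorted(Path(output_dir).glob("*.csv")):
        # newline="" lets the csv module handle line endings inside quoted fields
        with csv_path.open(newline="", encoding="utf-8") as f:
            sheets[csv_path.stem] = list(csv.reader(f))
    return sheets
```

All values come back as strings; convert numeric columns explicitly if you need numbers.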
Troubleshooting
- For general Agent and software troubleshooting, click here.
- Missing Artifacts:
  - 2.1 named_cells.json: Check the source file. Does it contain named cells? If not, refer to the spreadsheet software's manual for how to define them.
  - 2.2 Embedded images: The tool does not extract embedded images.
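To check a source file for the two cases above before uploading, you can look inside the .xlsx package directly. This sketch (hypothetical helper `inspect_workbook`) reads the defined names from `xl/workbook.xml` and lists embedded media parts under `xl/media/`. It assumes a transitional Office Open XML workbook, the variant Excel and LibreOffice write when saving as .xlsx. It also skips Excel's built-in `_xlnm.` names such as print areas; whether the extraction treats those names the same way is an assumption.

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import List, Tuple

# SpreadsheetML main namespace (transitional OOXML) used in xl/workbook.xml
NS = {"s": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def inspect_workbook(path: str) -> Tuple[List[str], List[str]]:
    """Return (user-defined names, embedded media part names) for an .xlsx file."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("xl/workbook.xml"))
        names = [
            el.get("name", "")
            for el in root.findall("s:definedNames/s:definedName", NS)
            # Skip built-in names such as _xlnm.Print_Area
            if not el.get("name", "").startswith("_xlnm.")
        ]
        # Images and other embedded media are stored under xl/media/
        media = [n for n in zf.namelist() if n.startswith("xl/media/")]
    return names, media
```

An empty name list suggests named_cells.json will have nothing to report, and a non-empty media list shows content the tool will not extract.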