Microsoft Office 365
Summary
The Microsoft Office 365 integration enables extraction and analysis of Office documents stored in SharePoint using Microsoft Graph API. Extract data from Excel workbooks, Word documents, and PowerPoint presentations with support for multiple export formats, intelligent chunking for RAG applications, and comprehensive content extraction.
Supported File Types: Excel (.xlsx, .xls, .csv), Word (.docx), PowerPoint (.pptx, .ppt)
Connection Methods: Link (Microsoft 365 cloud documents only)
Note: This integration requires OAuth2 authentication with Microsoft Entra (Azure AD). An Azure AD app registration configured for PKCE and an App Integration registered in the Istari Digital Platform are required before use. See the Installation section for setup instructions.
How and Where to Use
You can use the Microsoft Office 365 integration through the Istari Digital Platform UI. This integration requires authentication and works exclusively with cloud-hosted Office documents stored in SharePoint.
What You Can Do
- Extract Excel Data: Export workbooks to PDF/HTML, extract worksheet data as JSON, export charts as images, generate CSV exports, and retrieve named cells
- Extract Word Content: Convert documents to PDF/HTML, extract paragraphs, tables, images, and sections, generate markdown representations, and create smart/semantic chunks for RAG
- Extract PowerPoint Content: Export presentations to PDF, extract slides with text/shapes/layouts, retrieve speaker notes, extract images and tables, and generate markdown representations with semantic chunking
- Access Cloud Documents: Connect directly to SharePoint documents via metadata links without downloading files locally
Prerequisites
Before using this integration, ensure:
- Microsoft 365 subscription with access to SharePoint
- Azure AD app registration configured for PKCE with required permissions (see Installation)
- App Integration and Auth Integration registered in Istari Digital Platform (see Installation)
- Access to the Istari Digital Platform UI
- Office documents stored in SharePoint (not local files)
API
Functions
| Function | Description | Inputs | Outputs |
|---|---|---|---|
| @istari:extract | Extracts data from Office 365 documents (Excel, Word, PowerPoint) | Office document metadata file, OAuth2 auth token | Files, directories, JSON exports (varies by document type) |
Output Examples
Excel Outputs
| Artifact Name | Type | Description |
|---|---|---|
| workbook | File | Original Excel workbook (.xlsx) |
| pdf_workbook | File | PDF export of the workbook |
| zipped_html_workbook | File | HTML export (zipped) |
| chart_images | Directory | Individual chart images (.png) |
| worksheets | Directory | Worksheet data as JSON files |
| csv_exports | Directory | CSV exports of worksheets |
| csv_export_summary | File | JSON summary of CSV exports |
| named_cells | File | Named cells data (JSON) |
| chart_data | File | Chart metadata and data (JSON) |
| worksheet_data | File | Consolidated worksheet data (JSON) |
Word Outputs
| Artifact Name | Type | Description |
|---|---|---|
| document | File | Original Word document (.docx) |
| pdf_document | File | PDF export |
| zipped_html_document | File | HTML export (zipped) |
| paragraphs | File | All paragraph text (JSON) |
| tables | File | All table data (JSON) |
| images | Directory | Extracted images |
| sections | File | Document structure (JSON) |
| markdown | File | Structured markdown representation |
| smart_chunks | File | Overlapping text chunks for RAG (JSON) |
| semantic_chunks | File | Page-based semantic segments (JSON) |
PowerPoint Outputs
| Artifact Name | Type | Description |
|---|---|---|
| presentation | File | Original presentation (.pptx) |
| pdf_presentation | File | PDF export |
| slides | File | All slide content including text, shapes, layout (JSON) |
| notes | File | Speaker notes from all slides (JSON) |
| images | Directory | Embedded images extracted from slides |
| tables | File | All table data from slides (JSON) |
| charts | File | Chart metadata and data references (JSON) |
| presentation_md | File | Structured markdown representation |
| smart_chunks | File | Overlapping text chunks for RAG (JSON) |
| semantic_chunks | File | Content-based semantic segments (JSON) |
Note: For legacy .ppt files, only the original file and PDF export are available due to format limitations. Full extraction requires the .pptx format.
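After downloading a job's artifacts, it can be useful to confirm that everything listed in the tables above actually arrived. The sketch below is one way to do that. It assumes the artifacts were saved into a single local folder and that each file or directory name (without its extension) matches the artifact name; neither the folder layout nor the helper name `missing_artifacts` comes from the platform itself.

```python
from pathlib import Path
from typing import Dict, List

# Artifact names copied from the output tables in this section.
# The "ppt" entry reflects the note on legacy files: original + PDF only.
EXPECTED_ARTIFACTS: Dict[str, List[str]] = {
    "xlsx": [
        "workbook", "pdf_workbook", "zipped_html_workbook", "chart_images",
        "worksheets", "csv_exports", "csv_export_summary", "named_cells",
        "chart_data", "worksheet_data",
    ],
    "docx": [
        "document", "pdf_document", "zipped_html_document", "paragraphs",
        "tables", "images", "sections", "markdown", "smart_chunks",
        "semantic_chunks",
    ],
    "pptx": [
        "presentation", "pdf_presentation", "slides", "notes", "images",
        "tables", "charts", "presentation_md", "smart_chunks",
        "semantic_chunks",
    ],
    "ppt": ["presentation", "pdf_presentation"],
}


def missing_artifacts(resource_type: str, download_dir: Path) -> List[str]:
    """Return expected artifact names with no matching entry in download_dir.

    Assumption: a downloaded artifact is stored as a file or directory whose
    name, minus any extension, equals the artifact name.
    """
    present = {entry.stem for entry in download_dir.iterdir()}
    return [name for name in EXPECTED_ARTIFACTS[resource_type] if name not in present]
```

For example, `missing_artifacts("docx", Path("downloads/my_report"))` would list any Word artifacts not found in that (hypothetical) folder.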
Usage
Link
Connect to Office 365 documents stored in SharePoint directly through the Platform UI. The Istari Digital Platform handles authentication and document access automatically.
Sample SharePoint Link Formats:
Excel workbook links typically look like:
https://yourcompany.sharepoint.com/:x:/s/SiteName/AbCdEfGhIjKlMnOpQrStUvWxYz1234567890?e=ABCD1234
Word document links typically look like:
https://yourcompany.sharepoint.com/:w:/s/SiteName/ZyXwVuTsRqPoNmLkJiHgFeDcBa9876543210?e=WXYZ5678
PowerPoint presentation links typically look like:
https://yourcompany.sharepoint.com/:p:/s/SiteName/MnOpQrStUvWxYzAbCdEfGhIjKl1357924680?e=EFGH9012
Note: The sample links above are examples only and cannot be used. Each SharePoint document has a unique link. See How do I get the SharePoint link for my document? in the FAQ section for instructions on obtaining your document's link.
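The sample links differ in a short marker after the host name (`:x:`, `:w:`, `:p:`). As a rough illustration, the sketch below uses that marker to suggest which Resource type to select when connecting a file. The mapping is inferred only from the sample formats shown here, and the function name `guess_resource_type` is hypothetical; real SharePoint links can take other shapes, so a `None` result does not mean the link is invalid.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Marker-to-type mapping inferred from the sample links in this section.
_MARKER_TO_RESOURCE_TYPE = {"x": "xlsx", "w": "docx", "p": "pptx"}


def guess_resource_type(link: str) -> Optional[str]:
    """Suggest a resource type from a SharePoint sharing link, or None."""
    parsed = urlparse(link)
    host = parsed.hostname or ""
    if parsed.scheme != "https" or not host.endswith(".sharepoint.com"):
        return None
    # Sharing links in the samples start their path with "/:<letter>:/".
    match = re.match(r"^/:([a-z]):/", parsed.path)
    if match is None:
        return None
    return _MARKER_TO_RESOURCE_TYPE.get(match.group(1))
```

A link whose marker is not one of the three above also yields `None`, in which case the Resource type has to be chosen manually.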
Using the Istari Digital Platform UI
Follow these steps to extract data from Office 365 documents:
- Navigate to the Files page. Click the Files option in the left-hand sidebar.
- Click Connect. Click the Connect button in the top right corner of the Files page.
- Select Microsoft 365 integration. When prompted to select an integration, select Microsoft 365 integration.
- Fill in the connection details. Fill in the mandatory fields:
  - Link: Enter the SharePoint link to your Office document. See How do I get the SharePoint link for my document? in the FAQ section for detailed instructions on obtaining the link.
  - Name: Enter a descriptive name for this connection
  - Resource type: Select the document type (docx, xlsx, or pptx)
- Connect the file. Click Connect. The new link connection will appear in your Files list.
- Open the connected file. Click on the connected file in the Files list to open it in the model viewer.
- Go to the Artifacts tab. Click the Artifacts tab in the model viewer.
- Fill out the function execution form.
  - Tool Name: microsoft_office_365
  - Tool Version: 1.0.0
  - Operating System: Select your agent's operating system (Windows 10/11, Ubuntu 22.04, or RHEL 8)
  - Function: @istari:extract
  - Agent: Select the agent where the module is installed
  - Auth: Select your Microsoft Entra OAuth2 auth integration
- Run the function. Click the Run button to start the extraction job.
- Authenticate with Microsoft. A login popup will appear prompting you to authenticate with Microsoft Entra. Enter your Microsoft 365 username and password to authorize the integration to access your Office documents. Once authenticated, the extraction job will begin.
- Monitor job progress. The job status will appear in the Jobs list. Click on the job to view detailed progress and logs.
- View results. Once the job completes, return to the model file's Artifacts tab to view all extracted outputs.
- Download artifacts. Click on any artifact to download or view it. Excel outputs include workbooks, PDF exports, JSON data, chart images, and CSV exports. Word and PowerPoint outputs include documents, PDFs, structured JSON data, and RAG-ready chunks.
Installation
Prerequisites
- Istari Digital Agent version >9.0.0
- Supported operating system:
  - Windows 10, Windows 11, Windows Server 2019, Windows Server 2022
  - Ubuntu 22.04
  - RHEL 8
- Microsoft 365 subscription with SharePoint access
- Azure AD administrator access (for app registration)
Configuration
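Module version 1.2.0 and later: no manual module configuration required.
Starting with module version 1.2.0, the Microsoft Office 365 integration needs no manual configuration on the agent after installation. The module uses OAuth2 authentication passed via the Platform UI, and all connection settings are managed automatically. The one-time Azure AD app registration and the App Integration registration described in this section are still required.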
Azure AD App Registration
Before using this integration, an Azure AD app registration must be created and configured for PKCE (Proof Key for Code Exchange). This is a one-time setup performed by your organization's Azure AD administrator.
Step 1: Register the Application
- Navigate to Azure Portal → Azure Active Directory → App registrations
- Click New registration
- Enter an application name (e.g., "Istari Digital Office 365 Integration")
- Select Accounts in this organizational directory only
- Click Register
- Note the Application (client) ID and Directory (tenant) ID - you'll need these for app integration setup
Step 2: Configure Redirect URI (for PKCE)
- In your app registration, go to Authentication
- Click Add a platform → Web
- Add redirect URI: http://localhost:8080
- Ensure Allow public client flows is enabled (required for PKCE)
- Click Configure
Note: For PKCE-enabled apps, the redirect URI is used during the authorization flow but the actual redirect is handled by the Istari Digital Platform.
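For readers unfamiliar with PKCE, the sketch below shows the core of the mechanism as defined in RFC 7636: the client generates a random code verifier, sends its SHA-256 hash (the code challenge) with the authorization request, and later proves possession of the verifier when exchanging the code for tokens. The Istari Digital Platform performs this exchange for you; the code is illustrative only, and the function names are not part of any Istari or Microsoft API.

```python
import base64
import hashlib
import secrets
from typing import Tuple


def challenge_for(verifier: str) -> str:
    """Derive the S256 code challenge for a given code verifier (RFC 7636)."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # base64url without padding, as required by the specification
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> Tuple[str, str]:
    """Return a fresh (code_verifier, code_challenge) pair."""
    # 64 random bytes encode to 86 URL-safe characters, inside the
    # 43-128 character range the specification allows for a verifier.
    verifier = secrets.token_urlsafe(64)
    return verifier, challenge_for(verifier)
```

Because only the hash travels with the initial request, an intercepted authorization code cannot be redeemed without the original verifier, which is why PKCE suits public clients that cannot keep a client secret.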
Step 3: Configure API Permissions
- In your app registration, go to API permissions
- Click Add a permission → Microsoft Graph → Delegated permissions
- Add the following permissions:
  - Files.ReadWrite.All (or Files.Read.All for read-only access)
  - Sites.ReadWrite.All (or Sites.Read.All for read-only access)
  - User.Read
  - offline_access
  - openid
- Click Add permissions
Step 4: Grant Admin Consent
- Click Grant admin consent for [your organization]
- Confirm that admin consent has been granted (status shows green checkmarks)
Important: Admin consent is required for SharePoint site access. Without admin consent, users will not be able to access SharePoint documents.
Register App Integration in Istari Digital Platform
Before users can authenticate, you must register both the App Integration and Auth Integration in the Istari Digital Platform. This is a prerequisite for the authentication workflow.
Step 1: Create the App Integration
- Navigate to the Istari Digital Platform Admin page
- Go to App Integrations
- Click Add App in the upper right corner
- Fill in the app integration form:
- App: Select Microsoft 365 from the dropdown
- Description: (Optional) Enter a description for your integration (e.g., "ACME Inc. Microsoft 365 Integration")
- Click Add. Your Microsoft 365 app integration will appear in the list
Step 2: Add the Auth Integration
- In the App Integrations page, find your Microsoft 365 integration
- In the Auth Providers section, click the + (plus) button to add auth information
- Select Microsoft Entra as the auth provider
- Fill in the auth registration information:
  - Client ID: Enter your Azure AD Application (client) ID
  - Authorization Issuer: https://login.microsoftonline.com/{tenant_id}/v2.0 (replace {tenant_id} with your Directory (tenant) ID from Azure AD)
  - Scope: Enter a space-separated list of scopes. Use the permission names without the full URL prefix:
    - For read-only access: Files.Read.All Sites.Read.All User.Read offline_access openid
    - For read-write access (recommended): Files.ReadWrite.All Sites.ReadWrite.All User.Read offline_access openid
  - PKCE Enabled: Toggle ON (required for Microsoft Entra integration)
- Click Add or Save
Your Microsoft Entra auth integration is now configured and ready to use with the Office 365 integration. Users can now authenticate when connecting Office 365 documents through the Platform UI.
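The Authorization Issuer and Scope values are easy to mistype, so it may help to generate them rather than assemble them by hand. The following sketch builds both strings from the values described in Step 2. The helper names and the placeholder check are illustrative assumptions; the resulting strings are simply pasted into the form fields.

```python
from typing import List

# Scope lists copied from the read-only and read-write options in Step 2.
READ_ONLY_SCOPES: List[str] = [
    "Files.Read.All", "Sites.Read.All", "User.Read", "offline_access", "openid",
]
READ_WRITE_SCOPES: List[str] = [
    "Files.ReadWrite.All", "Sites.ReadWrite.All", "User.Read", "offline_access", "openid",
]


def authorization_issuer(tenant_id: str) -> str:
    """Build the Authorization Issuer URL for a Directory (tenant) ID."""
    tenant_id = tenant_id.strip()
    # Guard against an empty value or a placeholder left in by mistake.
    if not tenant_id or "{" in tenant_id or "}" in tenant_id:
        raise ValueError("tenant_id must be your actual Directory (tenant) ID")
    return f"https://login.microsoftonline.com/{tenant_id}/v2.0"


def scope_string(read_write: bool = True) -> str:
    """Return the space-separated scope list for the Scope field."""
    scopes = READ_WRITE_SCOPES if read_write else READ_ONLY_SCOPES
    return " ".join(scopes)
```

Calling `scope_string(read_write=False)` yields the read-only list; the default matches the recommended read-write configuration.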
Note: If you need to update the auth integration later, you can edit it from the App Integrations page. See the Third Party App Integrations documentation for more details.
Configuration Parameters
No configuration parameters are required. Authentication is handled through the Platform UI using the registered Microsoft Entra auth integration.
Versions
Current Module Version: 1.2.0
Compatibility Notes
- Requires Istari Digital Agent version >9.0.0
- Supports Windows 10/11, Windows Server 2019/2022, Ubuntu 22.04, and RHEL 8
- Compatible with Microsoft Graph API v1.0
- Requires Microsoft 365 subscription with SharePoint access
Changelog
Module Version 1.2.0
- Initial public release
- Support for Excel, Word, and PowerPoint extraction
- OAuth2 authentication with Microsoft Entra
- Multiple export formats (PDF, HTML)
- RAG-ready chunking for Word and PowerPoint documents
- Comprehensive content extraction (charts, tables, images, named cells)
Release Notes
Key Changes Between Versions
This is the initial public release of the Microsoft Office 365 integration module.
Troubleshooting
Common Issues
Issue: Authentication fails with "admin consent needed"
- Symptom: Job fails with authentication error or admin consent prompt
- Cause: Azure AD app registration doesn't have admin consent granted for required permissions
- Solution:
- Have your Azure AD administrator navigate to the app registration → API permissions
- Click Grant admin consent for [your organization]
- Verify all permissions show green checkmarks
- Retry the extraction job
Issue: "Insufficient permissions" error
- Symptom: Job fails with "Insufficient permissions" or "Forbidden" error
- Cause: The authenticated user doesn't have access to the SharePoint site or file
- Solution:
- Verify the user has access to the SharePoint site
- Check that the file link you entered when connecting the file is correct and accessible
- Ensure the Azure AD app has the correct permissions (Files.ReadWrite.All and Sites.ReadWrite.All)
- Verify admin consent has been granted for the app registration
Issue: File not found error
- Symptom: Job fails with "workbook not found" or "document not found" error
- Cause: The SharePoint link you entered when connecting the file is incorrect or the file has been moved/deleted
- Solution:
- Verify the link you entered when connecting the file is correct and accessible in a browser
- Ensure the file hasn't been moved or deleted
- Check that the link format matches the expected pattern for the file type
- For SharePoint links, ensure you're using the correct sharing link format
- If the file was moved, disconnect and reconnect the file with the updated link
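When a link opens in the browser but the job still reports "not found", it can help to test the link against Microsoft Graph directly. Microsoft Graph v1.0 accepts sharing links through its shares endpoint once the URL is encoded as an unpadded base64url string prefixed with `u!`. The sketch below builds that request URL so it can be tried in a tool such as Graph Explorer. This document does not state how the module itself resolves links, so treat this as an independent diagnostic rather than a description of the module's internals; the function names are hypothetical.

```python
import base64


def encode_sharing_url(sharing_url: str) -> str:
    """Encode a sharing link as a Microsoft Graph share token ("u!...")."""
    raw = base64.urlsafe_b64encode(sharing_url.encode("utf-8")).decode("ascii")
    # Graph expects base64url without trailing "=" padding.
    return "u!" + raw.rstrip("=")


def graph_drive_item_url(sharing_url: str) -> str:
    """Build the Graph v1.0 URL that resolves a sharing link to its driveItem."""
    token = encode_sharing_url(sharing_url)
    return f"https://graph.microsoft.com/v1.0/shares/{token}/driveItem"
```

A successful response to the resulting URL suggests the link and the signed-in user's access are fine, which points the investigation toward the connection settings instead.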
Issue: Legacy PowerPoint format (.ppt) limitations
- Symptom: Only original file and PDF export available for .ppt files
- Cause: Legacy .ppt format doesn't support full content extraction via Microsoft Graph API
- Solution:
- Convert the presentation to .pptx format in Microsoft PowerPoint
- Upload the converted .pptx file to SharePoint
- Disconnect the old connection and reconnect the file with the new .pptx link
Getting Help
For additional support:
- Check the Istari Digital Platform documentation
- Contact your organization's Azure AD administrator for authentication setup
- Reach out to Istari Digital support at support@istaridigital.com
Tips and Best Practices
- Use .pptx format: Convert legacy .ppt files to .pptx for full extraction capabilities
- Verify permissions: Ensure users have appropriate SharePoint permissions before running extractions
- Check file links: Always verify SharePoint links are accessible in a browser before connecting files
- Admin consent: Work with your Azure AD administrator to grant admin consent during initial setup
- RAG optimization: Use the smart_chunks and semantic_chunks outputs for RAG workflows with Word and PowerPoint documents
- Excel exports: Use CSV exports for data analysis workflows, and chart images for documentation purposes
FAQ
- Why does this integration require admin consent?
  - The Files.ReadWrite.All and Sites.ReadWrite.All permissions are admin-consent-required permissions needed to access SharePoint sites and files across your organization. This is a Microsoft security requirement.
- Can I use this integration with local files?
  - No, this integration is designed specifically for cloud-hosted Office documents in SharePoint. For local files, use the standard Excel, Word, or PowerPoint integrations.
- What's the difference between read-only and read-write permissions?
  - Read-only permissions (Files.Read.All, Sites.Read.All) allow extraction but prevent any modifications. Read-write permissions are required if you plan to use update functions in future releases.
- How do I get the SharePoint link for my document?
  You can get the SharePoint link using either method:
  Method 1: From SharePoint
  - Navigate to your SharePoint site
  - Right-click on the file
  - Click Copy link
  - Select Anyone with the link or People in [your organization]
  - Copy the link and paste it into the Link field when connecting the file through the Platform UI
  Method 2: From Office App (Word/Excel/PowerPoint)
  - Open the document in the Office app (Word, Excel, or PowerPoint)
  - Click the Share button in the top right corner
  - Click Copy link or Get a link
  - Select the appropriate sharing permissions
  - Copy the link and paste it into the Link field when connecting the file through the Platform UI
  Sample Link Formats:
  Excel workbook links typically look like:
  https://yourcompany.sharepoint.com/:x:/s/SiteName/AbCdEfGhIjKlMnOpQrStUvWxYz1234567890?e=ABCD1234
  Word document links typically look like:
  https://yourcompany.sharepoint.com/:w:/s/SiteName/ZyXwVuTsRqPoNmLkJiHgFeDcBa9876543210?e=WXYZ5678
  PowerPoint presentation links typically look like:
  https://yourcompany.sharepoint.com/:p:/s/SiteName/MnOpQrStUvWxYzAbCdEfGhIjKl1357924680?e=EFGH9012
  Note: The sample links above are examples only and cannot be used. Each SharePoint document has a unique link that you must obtain from your SharePoint site or Office app.
- Can I extract data from password-protected files?
  - Password-protected Office files are not supported. Files must be accessible without password protection in SharePoint.
- What happens if I use a read-only permission but the app needs write access?
  - The extraction will fail with an "Insufficient permissions" error. Ensure your Azure AD app has Files.ReadWrite.All and Sites.ReadWrite.All permissions if you encounter this issue.