Version: 2025.11

Microsoft Office 365

Summary

The Microsoft Office 365 integration uses the Microsoft Graph API to extract and analyze Office documents stored in SharePoint. It extracts data from Excel workbooks, Word documents, and PowerPoint presentations, exports them to multiple formats, splits text into chunks for retrieval-augmented generation (RAG) applications, and pulls out structured content such as tables, images, charts, and speaker notes.

Supported File Types: Excel (.xlsx, .xls, .csv), Word (.docx), PowerPoint (.pptx, .ppt)

Connection Methods: Link (Microsoft 365 cloud documents only)

Note: This integration requires OAuth2 authentication with Microsoft Entra (Azure AD). An Azure AD app registration configured for PKCE and an App Integration registered in the Istari Digital Platform are required before use. See the Installation section for setup instructions.

How and Where to Use

You can use the Microsoft Office 365 integration through the Istari Digital Platform UI. This integration requires authentication and works exclusively with cloud-hosted Office documents stored in SharePoint.

What You Can Do

  • Extract Excel Data: Export workbooks to PDF/HTML, extract worksheet data as JSON, export charts as images, generate CSV exports, and retrieve named cells
  • Extract Word Content: Convert documents to PDF/HTML, extract paragraphs, tables, images, and sections, generate markdown representations, and create smart/semantic chunks for RAG
  • Extract PowerPoint Content: Export presentations to PDF, extract slides with text/shapes/layouts, retrieve speaker notes, extract images and tables, and generate markdown representations with semantic chunking
  • Access Cloud Documents: Connect directly to SharePoint documents via metadata links without downloading files locally

Prerequisites

Before using this integration, ensure:

  • Microsoft 365 subscription with access to SharePoint
  • Azure AD app registration configured for PKCE with required permissions (see Installation)
  • App Integration and Auth Integration registered in Istari Digital Platform (see Installation)
  • Access to the Istari Digital Platform UI
  • Office documents stored in SharePoint (not local files)

API

Functions

| Function | Description | Inputs | Outputs |
| --- | --- | --- | --- |
| @istari:extract | Extracts data from Office 365 documents (Excel, Word, PowerPoint) | Office document metadata file, OAuth2 auth token | Files, directories, JSON exports (varies by document type) |

Output Examples

Excel Outputs

| Artifact Name | Type | Description |
| --- | --- | --- |
| workbook | File | Original Excel workbook (.xlsx) |
| pdf_workbook | File | PDF export of the workbook |
| zipped_html_workbook | File | HTML export (zipped) |
| chart_images | Directory | Individual chart images (.png) |
| worksheets | Directory | Worksheet data as JSON files |
| csv_exports | Directory | CSV exports of worksheets |
| csv_export_summary | File | JSON summary of CSV exports |
| named_cells | File | Named cells data (JSON) |
| chart_data | File | Chart metadata and data (JSON) |
| worksheet_data | File | Consolidated worksheet data (JSON) |

Word Outputs

| Artifact Name | Type | Description |
| --- | --- | --- |
| document | File | Original Word document (.docx) |
| pdf_document | File | PDF export |
| zipped_html_document | File | HTML export (zipped) |
| paragraphs | File | All paragraph text (JSON) |
| tables | File | All table data (JSON) |
| images | Directory | Extracted images |
| sections | File | Document structure (JSON) |
| markdown | File | Structured markdown representation |
| smart_chunks | File | Overlapping text chunks for RAG (JSON) |
| semantic_chunks | File | Page-based semantic segments (JSON) |
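The smart_chunks output is described only as overlapping text chunks; the module's actual chunk size, overlap, and boundary rules are not documented here. As a rough illustration of what "overlapping" means, the sketch below splits text into word windows in which each chunk repeats the last few words of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk. The function name and default sizes are hypothetical, and semantic_chunks (page-based or content-based segments) are not modeled.

```python
def overlapping_chunks(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words.

    Illustrative sketch only: the module's real chunking parameters are not documented.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once a window reaches the end so the tail is not repeated in a shorter chunk.
        if start + size >= len(words):
            break
    return chunks


print(overlapping_chunks("w0 w1 w2 w3 w4 w5", size=4, overlap=2))
# ['w0 w1 w2 w3', 'w2 w3 w4 w5']
```

Overlap trades storage for recall: larger overlaps duplicate more text but make it less likely that a retrieval query matches only half of a split passage.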

PowerPoint Outputs

| Artifact Name | Type | Description |
| --- | --- | --- |
| presentation | File | Original presentation (.pptx) |
| pdf_presentation | File | PDF export |
| slides | File | All slide content including text, shapes, layout (JSON) |
| notes | File | Speaker notes from all slides (JSON) |
| images | Directory | Embedded images extracted from slides |
| tables | File | All table data from slides (JSON) |
| charts | File | Chart metadata and data references (JSON) |
| presentation_md | File | Structured markdown representation |
| smart_chunks | File | Overlapping text chunks for RAG (JSON) |
| semantic_chunks | File | Content-based semantic segments (JSON) |

Note: For legacy .ppt format files, only the original file and PDF export are available due to format limitations. Full extraction requires .pptx format.
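The output tables and the .ppt limitation can be summarized as a lookup from file extension to expected artifact names, which is handy for checking that a finished job produced everything you expect. This is a hypothetical helper, not part of the module. It assumes the two .ppt outputs keep the presentation and pdf_presentation names, and it covers only the extensions whose outputs are itemized in the tables.

```python
from pathlib import PurePosixPath

# Artifact names copied from the Excel, Word, and PowerPoint output tables.
EXCEL_ARTIFACTS = (
    "workbook", "pdf_workbook", "zipped_html_workbook", "chart_images", "worksheets",
    "csv_exports", "csv_export_summary", "named_cells", "chart_data", "worksheet_data",
)
WORD_ARTIFACTS = (
    "document", "pdf_document", "zipped_html_document", "paragraphs", "tables",
    "images", "sections", "markdown", "smart_chunks", "semantic_chunks",
)
PPTX_ARTIFACTS = (
    "presentation", "pdf_presentation", "slides", "notes", "images",
    "tables", "charts", "presentation_md", "smart_chunks", "semantic_chunks",
)
# Legacy .ppt yields only the original file and the PDF export.
# Assumption: these keep the same artifact names as for .pptx.
PPT_ARTIFACTS = ("presentation", "pdf_presentation")

ARTIFACTS_BY_EXTENSION = {
    ".xlsx": EXCEL_ARTIFACTS,
    ".docx": WORD_ARTIFACTS,
    ".pptx": PPTX_ARTIFACTS,
    ".ppt": PPT_ARTIFACTS,
}


def expected_artifacts(filename: str) -> tuple[str, ...]:
    """Return the artifact names a successful extraction should produce for this file."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix not in ARTIFACTS_BY_EXTENSION:
        # .xls and .csv are supported inputs, but their outputs are not itemized separately.
        raise ValueError(f"no documented artifact list for {filename!r}")
    return ARTIFACTS_BY_EXTENSION[suffix]


def missing_artifacts(filename: str, produced: set[str]) -> list[str]:
    """List expected artifacts that are absent from a finished job's outputs."""
    return [name for name in expected_artifacts(filename) if name not in produced]


print(missing_artifacts("Legacy Deck.ppt", {"presentation"}))  # ['pdf_presentation']
```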

Usage

Connect to Office 365 documents stored in SharePoint directly through the Platform UI. The Istari Digital Platform handles authentication and document access automatically.

Sample SharePoint Link Formats:

Excel workbook links typically look like:

https://yourcompany.sharepoint.com/:x:/s/SiteName/AbCdEfGhIjKlMnOpQrStUvWxYz1234567890?e=ABCD1234

Word document links typically look like:

https://yourcompany.sharepoint.com/:w:/s/SiteName/ZyXwVuTsRqPoNmLkJiHgFeDcBa9876543210?e=WXYZ5678

PowerPoint presentation links typically look like:

https://yourcompany.sharepoint.com/:p:/s/SiteName/MnOpQrStUvWxYzAbCdEfGhIjKl1357924680?e=EFGH9012

Note: The sample links above are examples only and cannot be used. Each SharePoint document has a unique link. For instructions on obtaining your document's link, see "How do I get the SharePoint link for my document?" in the FAQ section.
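Before pasting a link into the Link field, you can sanity-check it and work out which Resource type to select. The sketch below assumes, based on the sample formats, that the first path segment of a sharing link encodes the document type (:x: for Excel, :w: for Word, :p: for PowerPoint) and maps it to the xlsx, docx, or pptx resource types. The helper resource_type_from_link is hypothetical; the Platform does not expose it.

```python
from urllib.parse import urlparse

# Type codes inferred from the sample link formats, mapped to the Resource type options.
LINK_TYPE_TO_RESOURCE = {"x": "xlsx", "w": "docx", "p": "pptx"}


def resource_type_from_link(link: str) -> str:
    """Infer the Resource type (xlsx, docx, or pptx) from a SharePoint sharing link."""
    parsed = urlparse(link.strip())
    if parsed.scheme != "https" or not parsed.netloc.endswith(".sharepoint.com"):
        raise ValueError("expected an https://<tenant>.sharepoint.com/... link")
    segments = [part for part in parsed.path.split("/") if part]
    # Sharing links start with a segment such as ":x:", ":w:", or ":p:".
    first = segments[0] if segments else ""
    is_type_segment = len(first) >= 3 and first.startswith(":") and first.endswith(":")
    code = first[1:-1] if is_type_segment else ""
    if code not in LINK_TYPE_TO_RESOURCE:
        raise ValueError(f"unrecognized link type segment {first!r}")
    return LINK_TYPE_TO_RESOURCE[code]


print(resource_type_from_link(
    "https://yourcompany.sharepoint.com/:p:/s/SiteName/MnOpQrStUvWxYzAbCdEfGhIjKl1357924680?e=EFGH9012"
))  # pptx
```

A link that fails this check (for example, a plain browser address bar URL rather than a sharing link) is a likely cause of the "File not found" issue described under Troubleshooting.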

Using the Istari Digital Platform UI

Follow these steps to extract data from Office 365 documents:

  1. Navigate to the Files page.
    Click the Files option in the left-hand sidebar.

  2. Click Connect.
    Click the Connect button in the top right corner of the Files page.

  3. Select Microsoft 365 integration.
    You will be prompted to select an integration. Select Microsoft 365 integration.

  4. Fill in the connection details.
    Fill in the mandatory fields:

    • Link: Enter the SharePoint link to your Office document
    • Name: Enter a descriptive name for this connection
    • Resource type: Select the document type (docx, xlsx, or pptx)
  5. Connect the file.
    Click Connect. The new link connection will appear in your Files list.

  6. Open the connected file.
    Click on the connected file in the Files list to open it in the model viewer.

  7. Go to the Artifacts tab.
    Click the Artifacts tab in the model viewer.

  8. Fill out the function execution form.

    • Tool Name: microsoft_office_365
    • Tool Version: 1.0.0
    • Operating System: Select your agent's operating system (Windows 10/11, Ubuntu 22.04, or RHEL 8)
    • Function: @istari:extract
    • Agent: Select the agent where the module is installed
    • Auth: Select your Microsoft Entra OAuth2 auth integration
  9. Run the function.
    Click the Run button to start the extraction job.

  10. Authenticate with Microsoft.
    A login popup will appear prompting you to authenticate with Microsoft Entra. Enter your Microsoft 365 username and password to authorize the integration to access your Office documents. Once authenticated, the extraction job will begin.

  11. Monitor job progress.
    The job status will appear in the Jobs list. Click on the job to view detailed progress and logs.

  12. View results.
    Once the job completes, return to the model file's Artifacts tab to view all extracted outputs.

  13. Download artifacts.
    Click on any artifact to download or view it. Excel outputs include workbooks, PDF exports, JSON data, chart images, and CSV exports. Word and PowerPoint outputs include documents, PDFs, structured JSON data, and RAG-ready chunks.

Installation

Prerequisites

  • Istari Digital Agent version >9.0.0
  • Supported operating system:
    • Windows 10, Windows 11, Windows Server 2019, Windows Server 2022
    • Ubuntu 22.04
    • RHEL 8
  • Microsoft 365 subscription with SharePoint access
  • Azure AD administrator access (for app registration)

Configuration

Module Version 1.2.0 and later: No Configuration Required

Starting with module version 1.2.0, the Microsoft Office 365 integration requires no manual configuration after installation. The module uses OAuth2 authentication passed via the Platform UI, and all connection settings are managed automatically.

Azure AD App Registration

Before using this integration, an Azure AD app registration must be created and configured for PKCE (Proof Key for Code Exchange). This is a one-time setup performed by your organization's Azure AD administrator.

Step 1: Register the Application

  1. Navigate to Azure Portal → Azure Active Directory (shown as Microsoft Entra ID in current versions of the portal) → App registrations
  2. Click New registration
  3. Enter an application name (e.g., "Istari Digital Office 365 Integration")
  4. Select Accounts in this organizational directory only
  5. Click Register
  6. Note the Application (client) ID and Directory (tenant) ID - you'll need these for app integration setup

Step 2: Configure Redirect URI (for PKCE)

  1. In your app registration, go to Authentication
  2. Click Add a platform → Web
  3. Add redirect URI: http://localhost:8080
  4. Ensure Allow public client flows is ENABLED (required for PKCE)
  5. Click Configure

Note: For PKCE-enabled apps, the redirect URI is used during the authorization flow but the actual redirect is handled by the Istari Digital Platform.
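The Istari Digital Platform performs the PKCE exchange for you, so nothing below needs to be run during setup. For readers unfamiliar with PKCE, this sketch shows the S256 method defined in RFC 7636: the client keeps a random code_verifier secret, sends only its SHA-256 hash (the code_challenge) with the authorization request, and later proves possession by presenting the verifier when redeeming the authorization code. The function names are illustrative, not the Platform's code.

```python
import base64
import hashlib
import secrets


def code_challenge(verifier: str) -> str:
    """S256 method from RFC 7636: BASE64URL(SHA256(ASCII(code_verifier))) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> tuple[str, str]:
    """Create a random code_verifier and its matching code_challenge."""
    # 64 random bytes encode to 86 URL-safe characters; RFC 7636 allows 43 to 128.
    verifier = secrets.token_urlsafe(64)
    return verifier, code_challenge(verifier)


verifier, challenge = make_pkce_pair()
print(len(verifier), len(challenge))  # 86 43
```

The challenge travels with code_challenge_method=S256 in the authorization request, and the verifier is sent only in the token request, so an intercepted authorization code is useless without it.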

Step 3: Configure API Permissions

  1. In your app registration, go to API permissions

  2. Click Add a permission → Microsoft Graph → Delegated permissions

  3. Add the following permissions:

    • Files.ReadWrite.All (or Files.Read.All for read-only access)
    • Sites.ReadWrite.All (or Sites.Read.All for read-only access)
    • User.Read
    • offline_access
    • openid
  4. Click Add permissions

  5. Click Grant admin consent for [your organization]
  6. Confirm that admin consent has been granted (status shows green checkmarks)

Important: Admin consent is required for SharePoint site access. Without admin consent, users will not be able to access SharePoint documents.

Register App Integration in Istari Digital Platform

Before users can authenticate, you must register both the App Integration and Auth Integration in the Istari Digital Platform. This is a prerequisite for the authentication workflow.

Step 1: Create the App Integration

  1. Navigate to the Istari Digital Platform Admin page
  2. Go to App Integrations
  3. Click Add App in the upper right corner
  4. Fill in the app integration form:
    • App: Select Microsoft 365 from the dropdown
    • Description: (Optional) Enter a description for your integration (e.g., "ACME Inc. Microsoft 365 Integration")
  5. Click Add. Your Microsoft 365 app integration will appear in the list

Step 2: Add the Auth Integration

  1. In the App Integrations page, find your Microsoft 365 integration

  2. In the Auth Providers section, click the + (plus) button to add auth information

  3. Select Microsoft Entra as the auth provider

  4. Fill in the auth registration information:

    • Client ID: Enter your Azure AD Application (client) ID

    • Authorization Issuer: https://login.microsoftonline.com/{tenant_id}/v2.0

      • Replace {tenant_id} with your Directory (tenant) ID from Azure AD
    • Scope: Enter a space-separated list of scopes. Use the permission names without the full URL prefix:

      • For read-only access: Files.Read.All Sites.Read.All User.Read offline_access openid
      • For read-write access (recommended): Files.ReadWrite.All Sites.ReadWrite.All User.Read offline_access openid

      Example (read-write):

      Files.ReadWrite.All Sites.ReadWrite.All User.Read offline_access openid
    • PKCE Enabled: Toggle ON (required for Microsoft Entra integration)

  5. Click Add or Save

Your Microsoft Entra auth integration is now configured and ready to use with the Office 365 integration. Users can now authenticate when connecting Office 365 documents through the Platform UI.

Note: If you need to update the auth integration later, you can edit it from the App Integrations page. See the Third Party App Integrations documentation for more details.
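The Authorization Issuer and Scope values follow a fixed pattern, so they can be assembled from the two IDs noted in Step 1 of the Azure AD app registration. A minimal sketch; the function name is hypothetical and the GUIDs in the example call are placeholders, not real IDs.

```python
import uuid

# Scope strings exactly as listed in the Scope field instructions.
READ_ONLY_SCOPES = ("Files.Read.All", "Sites.Read.All", "User.Read", "offline_access", "openid")
READ_WRITE_SCOPES = ("Files.ReadWrite.All", "Sites.ReadWrite.All", "User.Read", "offline_access", "openid")


def auth_integration_values(client_id: str, tenant_id: str, read_write: bool = True) -> dict[str, str]:
    """Assemble the Client ID, Authorization Issuer, and Scope values for the auth integration form."""
    # Both IDs are GUIDs; uuid.UUID raises ValueError on a malformed or truncated ID
    # and str() returns the canonical lowercase, hyphenated form.
    client = str(uuid.UUID(client_id))
    tenant = str(uuid.UUID(tenant_id))
    scopes = READ_WRITE_SCOPES if read_write else READ_ONLY_SCOPES
    return {
        "Client ID": client,
        "Authorization Issuer": f"https://login.microsoftonline.com/{tenant}/v2.0",
        "Scope": " ".join(scopes),  # space-separated, no URL prefix
    }


# Placeholder GUIDs; substitute the IDs from your app registration.
for field, value in auth_integration_values(
    "00000000-0000-0000-0000-000000000001",
    "00000000-0000-0000-0000-000000000002",
).items():
    print(f"{field}: {value}")
```

Validating the GUIDs up front catches the most common copy-paste mistake (a truncated or padded ID) before it surfaces as a sign-in failure.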

Configuration Parameters

No configuration parameters are required. Authentication is handled through the Platform UI using the registered Microsoft Entra auth integration.

Versions

Current Module Version: 1.2.0

Compatibility Notes

  • Requires Istari Digital Agent version >9.0.0
  • Supports Windows 10/11, Windows Server 2019/2022, Ubuntu 22.04, and RHEL 8
  • Compatible with Microsoft Graph API v1.0
  • Requires Microsoft 365 subscription with SharePoint access
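How the module resolves a SharePoint link internally is not documented here. One documented Microsoft Graph v1.0 way to turn a sharing link into a file reference is the /shares/{shareIdOrEncodedSharingUrl}/driveItem endpoint, which expects the link encoded as "u!" followed by unpadded base64url. The sketch below shows only that encoding; whether this module uses it is an assumption, and the function name is hypothetical.

```python
import base64


def graph_share_request_url(sharing_link: str) -> str:
    """Return the Graph v1.0 URL that resolves a sharing link to its driveItem."""
    # urlsafe_b64encode already swaps "+" for "-" and "/" for "_"; Graph also wants "=" padding removed.
    encoded = base64.urlsafe_b64encode(sharing_link.encode("utf-8")).decode("ascii").rstrip("=")
    return f"https://graph.microsoft.com/v1.0/shares/u!{encoded}/driveItem"


print(graph_share_request_url("https://yourcompany.sharepoint.com/:x:/s/SiteName/AbCd?e=ABCD1234"))
```

A client would send a GET request to this URL with an Authorization: Bearer header carrying the delegated access token; the sketch stops before the network call because the Platform manages the token.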

Changelog

Module Version 1.2.0

  • Initial public release
  • Support for Excel, Word, and PowerPoint extraction
  • OAuth2 authentication with Microsoft Entra
  • Multiple export formats (PDF, HTML)
  • RAG-ready chunking for Word and PowerPoint documents
  • Comprehensive content extraction (charts, tables, images, named cells)

Release Notes

Key Changes Between Versions

This is the initial public release of the Microsoft Office 365 integration module.

Troubleshooting

Common Issues

Issue: Authentication fails with "admin consent needed"

  • Symptom: Job fails with authentication error or admin consent prompt
  • Cause: Azure AD app registration doesn't have admin consent granted for required permissions
  • Solution:
    1. Have your Azure AD administrator navigate to the app registration → API permissions
    2. Click Grant admin consent for [your organization]
    3. Verify all permissions show green checkmarks
    4. Retry the extraction job

Issue: "Insufficient permissions" error

  • Symptom: Job fails with "Insufficient permissions" or "Forbidden" error
  • Cause: The authenticated user doesn't have access to the SharePoint site or file
  • Solution:
    1. Verify the user has access to the SharePoint site
    2. Check that the file link you entered when connecting the file is correct and accessible
    3. Ensure the Azure AD app has the correct permissions (Files.ReadWrite.All and Sites.ReadWrite.All)
    4. Verify admin consent has been granted for the app registration

Issue: File not found error

  • Symptom: Job fails with "workbook not found" or "document not found" error
  • Cause: The SharePoint link you entered when connecting the file is incorrect or the file has been moved/deleted
  • Solution:
    1. Verify the link you entered when connecting the file is correct and accessible in a browser
    2. Ensure the file hasn't been moved or deleted
    3. Check that the link format matches the expected pattern for the file type
    4. For SharePoint links, ensure you're using the correct sharing link format
    5. If the file was moved, disconnect and reconnect the file with the updated link

Issue: Legacy PowerPoint format (.ppt) limitations

  • Symptom: Only original file and PDF export available for .ppt files
  • Cause: Legacy .ppt format doesn't support full content extraction via Microsoft Graph API
  • Solution:
    1. Convert the presentation to .pptx format in Microsoft PowerPoint
    2. Upload the converted .pptx file to SharePoint
    3. Disconnect the old connection and reconnect the file with the new .pptx link
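If a failed job's logs include the HTTP status or Microsoft Entra error code returned by Microsoft (the module's log format is not documented here), a simple lookup can point you to the matching issue above. This hypothetical triage sketch relies on standard meanings: AADSTS65001 is the Entra sign-in error for missing consent, HTTP 403 means access was denied, and HTTP 404 means the item was not found. The matching rules are assumptions, not the module's behavior.

```python
def suggest_issue(status_code: int, error_text: str = "") -> str:
    """Map a Microsoft response to the matching Common Issues heading (illustrative only)."""
    # AADSTS65001: Entra reports that the user or administrator has not consented to the app.
    if "AADSTS65001" in error_text or "consent" in error_text.lower():
        return 'Authentication fails with "admin consent needed"'
    if status_code == 403:  # Forbidden: no access to the site or file, or missing permissions
        return '"Insufficient permissions" error'
    if status_code == 404:  # Not Found: wrong link, or the file was moved or deleted
        return "File not found error"
    return "Not covered above; review the job logs"


print(suggest_issue(403))  # "Insufficient permissions" error
```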

Getting Help

For additional support:

Tips and Best Practices

  • Use .pptx format: Convert legacy .ppt files to .pptx for full extraction capabilities
  • Verify permissions: Ensure users have appropriate SharePoint permissions before running extractions
  • Check file links: Always verify SharePoint links are accessible in a browser before connecting files
  • Admin consent: Work with your Azure AD administrator to grant admin consent during initial setup
  • RAG optimization: Use smart_chunks and semantic_chunks outputs for optimal RAG performance with Word and PowerPoint documents
  • Excel exports: Use CSV exports for data analysis workflows, and chart images for documentation purposes

FAQ

  • Why does this integration require admin consent?

    • The Files.ReadWrite.All and Sites.ReadWrite.All permissions are admin-consent-required permissions needed to access SharePoint sites and files across your organization. This is a Microsoft security requirement.
  • Can I use this integration with local files?

    • No, this integration is designed specifically for cloud-hosted Office documents in SharePoint. For local files, use the standard Excel, Word, or PowerPoint integrations.
  • What's the difference between read-only and read-write permissions?

    • Read-only permissions (Files.Read.All, Sites.Read.All) allow extraction but prevent any modifications. Read-write permissions are required if you plan to use update functions in future releases.
  • How do I get the SharePoint link for my document?

    You can get the SharePoint link using either method:

    Method 1: From SharePoint

    • Navigate to your SharePoint site
    • Right-click on the file
    • Click Copy link
    • Select Anyone with the link or People in [your organization]
    • Copy the link and paste it into the Link field when connecting the file through the Platform UI

    Method 2: From Office App (Word/Excel/PowerPoint)

    • Open the document in the Office app (Word, Excel, or PowerPoint)
    • Click the Share button in the top right corner
    • Click Copy link or Get a link
    • Select the appropriate sharing permissions
    • Copy the link and paste it into the Link field when connecting the file through the Platform UI

    Sample Link Formats:

    Excel workbook links typically look like:

    https://yourcompany.sharepoint.com/:x:/s/SiteName/AbCdEfGhIjKlMnOpQrStUvWxYz1234567890?e=ABCD1234

    Word document links typically look like:

    https://yourcompany.sharepoint.com/:w:/s/SiteName/ZyXwVuTsRqPoNmLkJiHgFeDcBa9876543210?e=WXYZ5678

    PowerPoint presentation links typically look like:

    https://yourcompany.sharepoint.com/:p:/s/SiteName/MnOpQrStUvWxYzAbCdEfGhIjKl1357924680?e=EFGH9012

    Note: The sample links above are examples only and cannot be used. Each SharePoint document has a unique link that you must obtain from your SharePoint site or Office app.

  • Can I extract data from password-protected files?

    • Password-protected Office files are not supported. Files must be accessible without password protection in SharePoint.
  • What happens if I use a read-only permission but the app needs write access?

    • The extraction will fail with an "Insufficient permissions" error. Ensure your Azure AD app has Files.ReadWrite.All and Sites.ReadWrite.All permissions if you encounter this issue.