Google Workspace
Summary
The Google Workspace integration enables extraction and analysis of Google Workspace documents stored in Google Drive using Google APIs. Extract data from Google Docs, Google Sheets, and Google Slides with support for multiple export formats, intelligent chunking for RAG applications, and comprehensive content extraction.
Supported File Types: Google Docs, Google Sheets, Google Slides
Connection Methods: Link (Google Drive cloud documents only)
Note: This integration requires OAuth2 authentication with Google Accounts. A Google Cloud project with required APIs enabled and an App Integration registered in the Istari Digital Platform are required before use. See the Installation section for setup instructions.
How and Where to Use
You can use the Google Workspace integration through the Istari Digital Platform UI. This integration requires authentication and works exclusively with cloud-hosted Google Workspace documents stored in Google Drive.
What You Can Do
- Extract Google Docs Data: Export documents to DOCX/PDF/HTML, extract paragraphs, tables, images, and sections, generate markdown representations, and create smart/semantic chunks for RAG
- Extract Google Sheets Content: Export spreadsheets to XLSX/PDF/HTML, extract cell data with smart header detection, generate CSV exports, retrieve named ranges and charts, and create RAG-ready chunks
- Extract Google Slides Content: Export presentations to PPTX/PDF/HTML, extract slides with text/shapes/layouts, retrieve speaker notes, extract images and tables, generate markdown representations, and create semantic chunks for RAG
- Access Cloud Documents: Connect directly to Google Drive documents via links without downloading files locally
Prerequisites
Before using this integration, ensure you have:
- Google Workspace or Google Account with access to Google Drive
- Google Cloud project configured with required APIs and OAuth credentials (see Installation)
- App Integration and Auth Integration registered in Istari Digital Platform (see Installation)
- Access to the Istari Digital Platform UI
- Google Workspace documents stored in Google Drive (not local files)
API
Functions
| Function | Description | Inputs | Outputs |
|---|---|---|---|
| @istari:extract | Extracts data from Google Workspace documents (Docs, Sheets, Slides) | Google Drive document link, OAuth2 auth token | Files, directories, JSON exports (varies by document type) |
Output Examples
Google Docs Outputs
| Artifact Name | Type | Description |
|---|---|---|
| original_document | File | DOCX export of the document |
| pdf_document | File | PDF export of the document |
| document_html | File | HTML export (zipped) |
| paragraphs | File | All paragraph text and styles (JSON) |
| tables | File | All table data (JSON) |
| images | Directory | Extracted images |
| sections | File | Document structure (JSON) |
| document_md | File | Markdown conversion with formatting |
| smart_chunks | File | Overlapping text chunks for RAG (JSON) |
| semantic_chunks | File | Section-based semantic segments (JSON) |
| metadata_report | File | Extraction statistics and status (JSON) |
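The smart_chunks artifact is described above as overlapping text chunks. The module's actual chunking algorithm and JSON schema are not documented here, so the following is only a minimal sketch of the general sliding-window idea. The function name `overlapping_chunks` and the `chunk_size` and `overlap` parameters are hypothetical.

```python
def overlapping_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size windows that share `overlap` characters.

    Hypothetical illustration only; the module's real chunker may split on
    tokens, sentences, or paragraphs instead of raw characters.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in overlapping_chunks("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```

The overlap means a sentence cut at one chunk boundary still appears whole in a neighboring chunk, which is the usual motivation for this approach in retrieval pipelines.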
Google Sheets Outputs
| Artifact Name | Type | Description |
|---|---|---|
| original_spreadsheet | File | XLSX export of the spreadsheet |
| pdf_document | File | PDF export of the spreadsheet |
| zipped_html | File | HTML representation (zipped) |
| csv_files | Directory | CSV files for each sheet |
| sheets_data | File | Cell data with smart header detection (JSON) |
| named_ranges | File | Named ranges data (JSON) |
| charts | Directory | Chart metadata |
| smart_chunks | File | Row-based chunks for RAG (JSON) |
| metadata_report | File | Extraction statistics and status (JSON) |
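For Sheets, smart_chunks is described as row-based. One plausible reading, shown here purely as an assumption, is that rows are grouped into batches and each batch carries the header row so it can be understood on its own. The function `row_chunks`, the `rows_per_chunk` parameter, and the output shape are hypothetical and do not reflect the module's real JSON schema. The sketch also assumes the first row is the header, which is simpler than the module's header detection.

```python
def row_chunks(rows: list[list[object]], rows_per_chunk: int = 2) -> list[dict]:
    """Group data rows into batches, repeating the header in every batch.

    Assumes rows[0] is the header row (a simplification made for this sketch).
    """
    if rows_per_chunk <= 0:
        raise ValueError("rows_per_chunk must be positive")
    if not rows:
        return []

    header, *body = rows
    return [
        {"header": header, "rows": body[i:i + rows_per_chunk]}
        for i in range(0, len(body), rows_per_chunk)
    ]


if __name__ == "__main__":
    sheet = [["name", "qty"], ["bolt", 10], ["nut", 25], ["washer", 40]]
    for chunk in row_chunks(sheet):
        print(chunk)
```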
Google Slides Outputs
| Artifact Name | Type | Description |
|---|---|---|
| original_presentation | File | PPTX export of the presentation |
| pdf_document | File | PDF export |
| presentation_html | File | Rich HTML export with formatting (zipped) |
| slides_content | File | Slide text, shapes, and layout data (JSON) |
| speaker_notes | File | Speaker notes for each slide (JSON) |
| images | Directory | Embedded images extracted from slides |
| tables | File | Table data from slides (JSON) |
| charts | File | Chart metadata from slides (JSON) |
| presentation_md | File | Markdown conversion with formatting |
| slide_images | Directory | Thumbnails for each slide (rendered as images) |
| smart_chunks | File | Overlapping text chunks for RAG (JSON) |
| semantic_chunks | File | Slide-based semantic segments (JSON) |
| metadata_report | File | Extraction statistics and status (JSON) |
Usage
Link
Connect to Google Workspace documents stored in Google Drive directly through the Platform UI. The Istari Digital Platform handles authentication and document access automatically.
Sample Google Drive Link Formats:
Google Docs links typically look like:
https://docs.google.com/document/d/DOCUMENT_ID/edit
Google Sheets links typically look like:
https://docs.google.com/spreadsheets/d/SPREADSHEET_ID/edit
Google Slides links typically look like:
https://docs.google.com/presentation/d/PRESENTATION_ID/edit
Note: The sample links above are examples only and cannot be used. Each Google Drive document has a unique link. See How do I get the Google Drive link for my document? in the FAQ section for instructions on obtaining your document's link.
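If you want to sanity-check a link before pasting it into the Platform UI, the link formats above can be matched with a short script. This is a minimal sketch, not part of the integration: the function name `parse_workspace_link` is hypothetical, and the assumption that document IDs consist of letters, digits, hyphens, and underscores is based on commonly observed links rather than a documented guarantee.

```python
import re

# Matches the three link shapes shown above and captures the resource type
# and the document ID. The ID character set is an assumption.
_LINK_RE = re.compile(
    r"^https://docs\.google\.com/(document|spreadsheets|presentation)/d/([A-Za-z0-9_-]+)"
)


def parse_workspace_link(link: str) -> tuple[str, str]:
    """Return (resource_type, document_id) for a Docs, Sheets, or Slides link."""
    match = _LINK_RE.match(link.strip())
    if match is None:
        raise ValueError(f"Not a recognized Google Workspace link: {link!r}")
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(parse_workspace_link("https://docs.google.com/spreadsheets/d/SPREADSHEET_ID/edit"))
```

The first element of the returned pair corresponds to the Resource type value (document, spreadsheets, or presentation) requested when connecting the file.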
Using the Istari Digital Platform UI
Follow these steps to extract data from Google Workspace documents:
- Navigate to the Files page.
  Click the Files option in the left-hand sidebar.
- Click Connect.
  Click the Connect button in the top right corner of the Files page.
- Select Google Drive integration.
  You will be prompted to select an integration. Select Google Drive integration.
- Fill in the connection details.
  Fill in the mandatory fields:
  - Link: Enter the Google Drive link to your Google Workspace document
    - See How do I get the Google Drive link for my document? in the FAQ section for detailed instructions on obtaining the link.
  - Name: Enter a descriptive name for this connection
  - Resource type: Select the document type (`document`, `spreadsheets`, or `presentation`)
- Connect the file.
  Click Connect. The new link connection will appear in your Files list.
- Open the connected file.
  Click on the connected file in the Files list to open it in the model viewer.
- Go to the Artifacts tab.
  Click the Artifacts tab in the model viewer.
- Fill out the function execution form.
  - Tool Name: `google_workspace`
  - Tool Version: Select the appropriate version based on your document type:
    - For `document` (Google Docs): `v1` or `v3`
    - For `spreadsheets` (Google Sheets): `v4`
    - For `presentation` (Google Slides): `v1`
  - Operating System: Select your agent's operating system (Windows 10/11, Ubuntu 22.04, or RHEL 8)
  - Function: `@istari:extract`
  - Agent: Select the agent where the module is installed
  - Auth: Select your Google Accounts OAuth2 auth integration
- Run the function.
  Click the Run button to start the extraction job.
- Authenticate with Google.
  A login popup will appear prompting you to authenticate with Google Accounts. Enter your Google account email and password to authorize the integration to access your Google Workspace documents. Once authenticated, the extraction job will begin.
- Monitor job progress.
  The job status will appear in the Jobs list. Click on the job to view detailed progress and logs.
- View results.
  Once the job completes, return to the model file's Artifacts tab to view all extracted outputs.
- Download artifacts.
  Click on any artifact to download or view it. Google Docs outputs include documents, PDF exports, JSON data, images, and RAG-ready chunks. Google Sheets outputs include spreadsheets, PDF exports, CSV files, and structured JSON data. Google Slides outputs include presentations, PDFs, slide images, and RAG-ready chunks.
Installation
Prerequisites
- Istari Digital Agent version `>9.0.0`
- Supported operating system:
  - Windows 10, Windows 11, Windows Server 2019, Windows Server 2022
  - Ubuntu 22.04
  - RHEL 8
- Google Workspace or Google Account with Google Drive access
- Google Cloud Console access (for project setup)
Configuration
Module Version 1.2.0+: Zero Configuration Required! ✓
Starting with module version 1.2.0, the Google Workspace integration requires no manual configuration after installation. The module uses OAuth2 authentication passed via the Platform UI, and all connection settings are managed automatically.
Google Cloud Project Setup
Before using this integration, a Google Cloud project must be created and configured with OAuth2 credentials. This is a one-time setup performed by your organization's Google Cloud administrator.
Step 1: Create Google Cloud Project
- Navigate to Google Cloud Console
- Click on the project dropdown at the top of the page
- Click New Project
- Enter project details:
- Project Name: Enter a name (e.g., "Istari Google Workspace Integration")
- Organization: Select your organization (if applicable)
- Click Create
- Wait for the project to be created
Step 2: Enable Required APIs
- Ensure your new project is selected (check the project dropdown at the top)
- Navigate to APIs & Services > Library
- Enable the following APIs (search for each and click Enable):
- Google Docs API (v1)
- Google Sheets API (v4)
- Google Slides API (v1)
- Google Drive API (v3)
Note: Google Drive API is required because Google Workspace documents are stored in Drive, and Drive access is needed to read and export documents.
Step 3: Configure OAuth Consent Screen
- Navigate to APIs & Services > OAuth consent screen
- Select External as the User Type (unless you have a Google Workspace organization)
- Click Create
- Fill in the required information:
- App name: Enter a name (e.g., "Istari Google Workspace Integration")
- User support email: Your email address
- Developer contact information: Your email address
- Click Save and Continue
- On the Scopes page, click Add or Remove Scopes
- Add the following scopes:
  - `https://www.googleapis.com/auth/documents.readonly`
  - `https://www.googleapis.com/auth/spreadsheets.readonly`
  - `https://www.googleapis.com/auth/presentations.readonly`
  - `https://www.googleapis.com/auth/drive.readonly`
- Click Update and then Save and Continue
- On the Test users page (if External), add test users:
- Click Add Users
- Add your Google email address
- Click Add
- Click Save and Continue
- Review the summary and click Back to Dashboard
Step 4: Create OAuth Client ID Credentials
- Navigate to APIs & Services > Credentials
- Click Create Credentials at the top
- Select OAuth client ID
- For Application type, select Desktop app
- Enter a name (e.g., "Istari Google Workspace Desktop Client")
- Click Create
- A dialog will appear with your Client ID and Client Secret
- Note the Client ID - you'll need this for app integration setup
- Click OK
Register App Integration in Istari Digital Platform
Before users can authenticate, you must register both the App Integration and Auth Integration in the Istari Digital Platform. This is a prerequisite for the authentication workflow.
Step 1: Create the App Integration
- Navigate to the Istari Digital Platform Admin page
- Go to App Integrations
- Click Add App in the upper right corner
- Fill in the app integration form:
- App: Select Google Drive from the dropdown
- Description: (Optional) Enter a description for your integration (e.g., "ACME Inc. Google Drive Integration")
- Click Add. Your Google Drive app integration will appear in the list
Step 2: Add the Auth Integration
- In the App Integrations page, find your Google Drive integration
- In the Auth Providers section, click the + (plus) button to add auth information
- Select Google Accounts as the auth provider
- Fill in the auth registration information:
  - Client ID: Enter your Google Cloud OAuth Client ID
  - Authorization Issuer: `https://accounts.google.com`
  - Scope: Enter a space-separated list of scopes: `https://www.googleapis.com/auth/documents.readonly https://www.googleapis.com/auth/spreadsheets.readonly https://www.googleapis.com/auth/presentations.readonly https://www.googleapis.com/auth/drive.readonly`
  - PKCE Enabled: Toggle ON (recommended for Google Accounts integration)
- Click Add or Save
Your Google Accounts auth integration is now configured and ready to use with the Google Workspace integration. Users can now authenticate when connecting Google Workspace documents through the Platform UI.
Note: If you need to update the auth integration later, you can edit it from the App Integrations page. See the Third Party App Integrations documentation for more details.
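The Platform performs the OAuth2 flow for you, so nothing below needs to be run during setup. As background on the Scope and PKCE Enabled fields, this sketch shows how the space-separated scope value can be assembled and how a PKCE verifier and S256 challenge are derived per RFC 7636. The helper names `pkce_challenge` and `make_pkce_pair` are hypothetical, and how the Platform implements PKCE internally is not documented here.

```python
import base64
import hashlib
import secrets

# The four read-only scopes required by this integration.
SCOPES = [
    "https://www.googleapis.com/auth/documents.readonly",
    "https://www.googleapis.com/auth/spreadsheets.readonly",
    "https://www.googleapis.com/auth/presentations.readonly",
    "https://www.googleapis.com/auth/drive.readonly",
]

# Value for the Scope field: scopes joined by single spaces.
SCOPE_FIELD = " ".join(SCOPES)


def pkce_challenge(verifier: str) -> str:
    """S256 challenge: base64url(SHA-256(verifier)) without '=' padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> tuple[str, str]:
    """Generate a random verifier (86 URL-safe characters) and its challenge."""
    verifier = secrets.token_urlsafe(64)
    return verifier, pkce_challenge(verifier)


if __name__ == "__main__":
    verifier, challenge = make_pkce_pair()
    print("scope:", SCOPE_FIELD)
    print("code_challenge:", challenge)
```

With PKCE, the client sends only the challenge when starting authorization and reveals the verifier when exchanging the code, so an intercepted authorization code cannot be redeemed by a third party.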
Configuration Parameters
No configuration parameters are required. Authentication is handled through the Platform UI using the registered Google Accounts auth integration.
Versions
Current Module Version: 1.2.0
Compatibility Notes
- Requires Istari Digital Agent version `>9.0.0`
- Supports Windows 10/11, Windows Server 2019/2022, Ubuntu 22.04, and RHEL 8
- Compatible with Google Docs API v1, Google Sheets API v4, Google Slides API v1, and Google Drive API v3
- Requires Google Workspace or Google Account with Google Drive access
Changelog
Module Version 1.2.0
- Initial public release
- Support for Google Docs, Google Sheets, and Google Slides extraction
- OAuth2 authentication with Google Accounts
- Multiple export formats (DOCX, PDF, HTML, PPTX, XLSX, CSV)
- RAG-ready chunking for Docs, Sheets, and Slides documents
- Comprehensive content extraction (text, tables, images, charts, named ranges)
Release Notes
Key Changes Between Versions
This is the initial public release of the Google Workspace integration module.
Troubleshooting
Common Issues
Issue: Authentication fails with "access blocked"
- Symptom: Job fails with authentication error or access blocked message
- Cause: OAuth consent screen not properly configured or user not added as test user
- Solution:
- Have your Google Cloud administrator navigate to APIs & Services > OAuth consent screen
- Verify all required scopes are added
- If using External user type, ensure the user's email is added to the Test users list
- Verify the app is in testing mode (or published if needed)
- Retry the extraction job
Issue: "Insufficient permissions" error
- Symptom: Job fails with "Insufficient permissions" or "Forbidden" error
- Cause: The authenticated user doesn't have access to the Google Drive file or the OAuth scopes are incorrect
- Solution:
- Verify the user has access to the Google Drive file
- Check that the file link you entered when connecting the file is correct and accessible
- Ensure the Google Cloud OAuth client has all required scopes configured
- Verify the App Integration has the correct scopes in the Auth Integration settings
- Check that the user has granted the necessary permissions during authentication
Issue: File not found error
- Symptom: Job fails with "document not found" or "file not found" error
- Cause: The Google Drive link you entered when connecting the file is incorrect or the file has been moved/deleted
- Solution:
- Verify the link you entered when connecting the file is correct and accessible in a browser
- Ensure the file hasn't been moved or deleted
- Check that the link format matches the expected pattern for the file type
- Verify the file is shared with the authenticated user (if applicable)
- If the file was moved, disconnect and reconnect the file with the updated link
Issue: API rate limit exceeded
- Symptom: Job fails with rate limit error
- Cause: Too many requests to Google APIs in a short time period
- Solution:
- Wait a few minutes before retrying
- Reduce the number of concurrent extraction jobs
- Contact your Google Cloud administrator if rate limits persist
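If you script job retries yourself, the "wait before retrying" advice above is commonly implemented as exponential backoff with jitter. The following is a generic sketch, not part of the integration: `RateLimitError`, `retry_with_backoff`, and the `call` argument are hypothetical stand-ins for whatever submits your extraction job and signals a rate limit.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error raised by `call` when a rate limit is hit."""


def retry_with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run `call`, waiting longer after each rate-limit failure.

    The delay doubles per attempt (base_delay * 2**attempt) plus random
    jitter, so concurrent jobs do not all retry at the same moment.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            sleep(delay)
```

The `sleep` parameter exists so the waiting behavior can be replaced in tests; in normal use the default `time.sleep` applies.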
Getting Help
For additional support:
- Check the Istari Digital Platform documentation
- Contact your organization's Google Cloud administrator for authentication setup
- Reach out to Istari Digital support at support@istaridigital.com
Tips and Best Practices
- Verify permissions: Ensure users have appropriate Google Drive permissions before running extractions
- Check file links: Always verify Google Drive links are accessible in a browser before connecting files
- OAuth consent: Work with your Google Cloud administrator to configure OAuth consent screen properly
- RAG optimization: Use `smart_chunks` and `semantic_chunks` outputs for optimal RAG performance with Docs and Slides documents
- Sheets data: Use CSV exports for data analysis workflows, and `sheets_data` JSON for structured data access
- Slide thumbnails: Use `slide_images` output for visual previews of presentation content
FAQ
- Why does this integration require OAuth consent screen configuration?
  - Google APIs require OAuth2 authentication to access user data. The OAuth consent screen ensures users understand what permissions the integration is requesting. This is a Google security requirement.
- Can I use this integration with local files?
  - No, this integration is designed specifically for cloud-hosted Google Workspace documents in Google Drive. For local files, use the standard Word, Excel, PowerPoint, or PDF integrations.
- What's the difference between read-only scopes and read-write scopes?
  - This integration uses read-only scopes (`*.readonly`), which allow extraction but prevent any modifications to documents. This ensures data security and prevents accidental changes.
- How do I get the Google Drive link for my document?
  You can get the Google Drive link using either method:
  Method 1: From Google Drive
  - Navigate to your Google Drive
  - Right-click on the file
  - Click Get link or Share
  - Select the appropriate sharing permissions
  - Copy the link and paste it into the Link field when connecting the file through the Platform UI
  Method 2: From Google Workspace App (Docs/Sheets/Slides)
  - Open the document in the Google Workspace app (Docs, Sheets, or Slides)
  - Click the Share button in the top right corner
  - Click Copy link or Get shareable link
  - Select the appropriate sharing permissions
  - Copy the link and paste it into the Link field when connecting the file through the Platform UI
  Sample Link Formats:
  Google Docs links typically look like:
  https://docs.google.com/document/d/DOCUMENT_ID/edit
  Google Sheets links typically look like:
  https://docs.google.com/spreadsheets/d/SPREADSHEET_ID/edit
  Google Slides links typically look like:
  https://docs.google.com/presentation/d/PRESENTATION_ID/edit
  Note: The sample links above are examples only and cannot be used. Each Google Drive document has a unique link that you must obtain from your Google Drive or Google Workspace app.
- Can I extract data from password-protected files?
  - Password-protected Google Workspace files are not supported. Files must be accessible without password protection in Google Drive.
- What happens if I use incorrect scopes in the OAuth configuration?
  - The extraction will fail with an "Insufficient permissions" error. Ensure your Google Cloud OAuth client and App Integration have all four required scopes: `documents.readonly`, `spreadsheets.readonly`, `presentations.readonly`, and `drive.readonly`.
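When reviewing a Scope value copied from the Auth Integration settings, a quick check for missing scopes can save a failed job. This is a minimal sketch under the assumption that the value is a space-separated string, as described in the Installation section. The function name `missing_scopes` is hypothetical.

```python
# The four read-only scopes this integration requires.
REQUIRED_SCOPES = frozenset({
    "https://www.googleapis.com/auth/documents.readonly",
    "https://www.googleapis.com/auth/spreadsheets.readonly",
    "https://www.googleapis.com/auth/presentations.readonly",
    "https://www.googleapis.com/auth/drive.readonly",
})


def missing_scopes(configured: str) -> list[str]:
    """Return the required scopes absent from a space-separated scope string."""
    return sorted(REQUIRED_SCOPES - set(configured.split()))


if __name__ == "__main__":
    configured = (
        "https://www.googleapis.com/auth/documents.readonly "
        "https://www.googleapis.com/auth/drive.readonly"
    )
    for scope in missing_scopes(configured):
        print("missing:", scope)
```

An empty result means all four scopes are present; it does not confirm that the user actually granted them during authentication.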