Lido API: quickstart and authentication
The Lido API lets you submit documents and get back structured JSON data. Use it to embed Lido extraction inside your own app, backend, or third-party tool. This article covers authentication, the core endpoints, and the recommended workflow.
When to use the API
- Your end users never log into Lido.
- You're embedding extraction in another product or backend service.
- You want the same Lido extraction quality without users learning a new tool.
When NOT to use the API:
- You're processing documents that arrive via Drive, OneDrive, or email — use a workflow instead, no code needed.
- You're testing or designing extraction — start in the spreadsheet UI, then promote to API.
The recommended workflow
This is important enough to put first:
1. Build the extractor in the spreadsheet UI. Test it on real documents until it works.
2. Click the API button in the bottom-left of the Data Extractor. Lido generates the exact JSON configuration that matches your tested setup.
3. Copy the generated configuration. This is what you'll send to the API.
4. Get an API key at sheets.lido.app/settings/api-keys.
5. Make API calls from your backend using the configuration and the API key.
Skipping step 1 (testing in the UI first) is the most common API integration mistake. Don't write configuration from scratch in code — the UI tunes it for you in minutes.
Authentication
All API requests require a Bearer token in the Authorization header:
Authorization: Bearer YOUR_API_KEY
Get your API key:
- Open sheets.lido.app/settings/api-keys.
- Click Create API Key.
- Copy the key immediately. It's shown once. Store it in your secrets manager.
API keys are workspace-scoped. They have access to everything in the workspace they were created in.
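As a minimal sketch of keeping the key out of source code, the helper below reads it from an environment variable and builds the Authorization header. The variable name `LIDO_API_KEY` and the function name `build_auth_headers` are assumptions made for this example, not names defined by Lido.

```python
import os


def build_auth_headers(env_var: str = "LIDO_API_KEY") -> dict:
    """Build request headers from an API key stored in an environment variable.

    LIDO_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early with a clear message instead of sending an empty token.
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"Authorization": f"Bearer {api_key}"}
```

Because keys are workspace-scoped, a leaked key exposes the whole workspace, which is why the sketch refuses to fall back to a hardcoded default.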
Core endpoints
POST /extract-file-data
Submit a document for extraction. Returns a jobId you'll use to retrieve results.
URL: https://sheets.lido.app/api/v1/extract-file-data
Two upload methods:
| Method | Max file size | Use when |
|---|---|---|
| JSON + base64 | 50 MB | Web apps, smaller files, easier client-side coding |
| Multipart form data | 500 MB | Larger files, server-side integrations |
Rate limit: 5 requests per 30 seconds.
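One way to stay under this limit is to throttle on the client side. The sketch below is a sliding-window limiter, assuming the limit applies to any 30-second window. The article does not say how Lido measures the window, so this is a conservative guess, and the class name `SlidingWindowLimiter` is hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any period_s-second window."""

    def __init__(self, max_calls=5, period_s=30.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period_s = period_s
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._sent = deque()     # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the window.
            while self._sent and now - self._sent[0] >= self.period_s:
                self._sent.popleft()
            if len(self._sent) < self.max_calls:
                self._sent.append(now)
                return
            # Sleep until the oldest call falls out of the window.
            self._sleep(self.period_s - (now - self._sent[0]))
```

Calling `limiter.wait()` immediately before each submission keeps a single process within five requests per 30 seconds. The limiter is not thread-safe and does not coordinate across processes.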
JSON + base64 example (Python)
import base64
import requests

url = "https://sheets.lido.app/api/v1/extract-file-data"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
}

# Read the document and base64-encode it for the JSON body
with open("invoice.pdf", "rb") as f:
    file_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "file": {"type": "base64", "data": file_b64, "name": "invoice.pdf"},
    "columns": ["Vendor Name", "Invoice Number", "Total Amount", "Due Date"],
    "instructions": "Total Amount is the grand total including tax. Use ISO format YYYY-MM-DD for dates.",
    "multiRow": False,
    "pageRange": "1",
}

r = requests.post(url, headers=headers, json=payload)
print(r.json())  # {"status": "running", "jobId": "..."}
Multipart example (cURL)
curl -X POST 'https://sheets.lido.app/api/v1/extract-file-data' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'file=@invoice.pdf' \
  -F 'config={"columns":["Vendor Name","Invoice Number","Total Amount","Due Date"],"instructions":"...","multiRow":false,"pageRange":"1"}'
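The size limits from the upload-methods table can be turned into a small helper that picks a method per file. This is a sketch under two assumptions the article does not confirm: that "MB" means decimal megabytes (the more conservative reading), and that the 50 MB limit applies to the raw file rather than the base64-encoded payload. The function name `choose_upload_method` is hypothetical.

```python
# Limits from the upload-methods table. Decimal megabytes are an assumption;
# they are the stricter interpretation, so they err on the safe side.
BASE64_MAX_BYTES = 50 * 1000 * 1000
MULTIPART_MAX_BYTES = 500 * 1000 * 1000


def choose_upload_method(size_bytes: int) -> str:
    """Return "base64" or "multipart" for a file of the given size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must not be negative")
    if size_bytes <= BASE64_MAX_BYTES:
        return "base64"
    if size_bytes <= MULTIPART_MAX_BYTES:
        return "multipart"
    raise ValueError("File exceeds the 500 MB multipart limit")
```

In practice the size would come from something like `os.path.getsize("invoice.pdf")`. Multipart also works for small files, so a backend that prefers a single code path could always use it.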
GET /job-result/{jobId}
Retrieve extraction results. Poll this endpoint after submitting.
URL: https://sheets.lido.app/api/v1/job-result/{jobId}
Response:
{
  "status": "complete",
  "data": [
    {
      "Vendor Name": "Acme Corp",
      "Invoice Number": "INV-2026-0042",
      "Total Amount": "1234.56",
      "Due Date": "2026-05-15"
    }
  ]
}
While processing, status will be "running". Poll until "complete" or "error".
Important: Job results expire after 24 hours. Persist them in your own system before then.
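Persisting a result can be as simple as writing it to disk the moment it arrives. The sketch below stores each completed result as a JSON file named after its job ID. The function name `save_result`, the `lido_results` folder, and the `savedAt` field are illustrative choices, not part of the Lido API, and a real system would more likely write to a database or object store.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_result(job_id: str, result: dict, out_dir: str = "lido_results") -> Path:
    """Write a completed job result to <out_dir>/<job_id>.json and return the path."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    record = {
        "jobId": job_id,
        # Timestamp added locally so you can tell when the copy was made.
        "savedAt": datetime.now(timezone.utc).isoformat(),
        "result": result,
    }
    path = folder / f"{job_id}.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path
```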
Configuration parameters
The columns, instructions, multiRow, and pageRange parameters in your API call match the settings you configured in the spreadsheet UI.
| Parameter | Type | Required | Description |
|---|---|---|---|
| columns | array | Yes | List of field names to extract |
| instructions | string | No | Free-form guidance for the AI |
| multiRow | boolean | No | true for tabular extraction (one row per item); false for summary (default) |
| pageRange | string | No | Which pages to process (e.g., "1") |
Build the configuration in the UI, click the API button, and copy. Don't write this from scratch.
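Even with a configuration copied from the UI, a quick local check before sending can catch accidental edits. The sketch below validates only the types listed in the parameter table. Treating an empty `columns` list as invalid and requiring each column name to be a string are assumptions, and `validate_config` is a hypothetical helper, not something Lido provides.

```python
def validate_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means it looks fine."""
    problems = []

    columns = config.get("columns")
    if not isinstance(columns, list) or not columns:
        problems.append("columns must be a non-empty list of field names")
    elif not all(isinstance(c, str) for c in columns):
        problems.append("every entry in columns must be a string")

    # Optional parameters and the types given in the parameter table.
    expected = {"instructions": str, "multiRow": bool, "pageRange": str}
    for name, typ in expected.items():
        if name in config and not isinstance(config[name], typ):
            problems.append(f"{name} must be of type {typ.__name__}")

    return problems
```

A common slip this catches is `"multiRow": "false"` (a string) where the boolean `False` was intended.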
Recommended polling pattern
import time
import requests

# headers and payload are the same objects built in the JSON + base64 example
submit_url = "https://sheets.lido.app/api/v1/extract-file-data"

# Submit
r = requests.post(submit_url, headers=headers, json=payload)
job_id = r.json()["jobId"]

# Wait at least 10 seconds before first poll
time.sleep(10)

# Poll with backoff
result_url = f"https://sheets.lido.app/api/v1/job-result/{job_id}"
backoff = 5
while True:
    result = requests.get(result_url, headers=headers).json()
    if result["status"] == "complete":
        break
    if result["status"] == "error":
        raise RuntimeError(f"Extraction failed: {result.get('error')}")
    time.sleep(backoff)
    backoff = min(backoff * 1.5, 30)  # cap at 30s

print(result["data"])
For high-throughput pipelines, kick off many jobs in parallel and poll asynchronously rather than blocking per document.
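One possible way to overlap the waiting is a small thread pool, where each worker runs the submit-then-poll loop for one document. This is a sketch, not a prescribed design: `submit` and `fetch` are hypothetical callables you would write around the two endpoints (`submit(payload)` returns a job ID, `fetch(job_id)` returns the parsed job-result JSON), and the submissions still need to respect the rate limit.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_job(payload, submit, fetch, first_wait_s=10.0, max_wait_s=30.0,
            sleep=time.sleep):
    """Submit one document and poll until it finishes. Returns the final result dict."""
    job_id = submit(payload)
    sleep(first_wait_s)          # extraction rarely finishes sooner
    delay = 5.0
    while True:
        result = fetch(job_id)
        if result["status"] in ("complete", "error"):
            return result
        sleep(delay)
        delay = min(delay * 1.5, max_wait_s)


def run_many(payloads, submit, fetch, workers=4, **kwargs):
    """Run several jobs concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_job, p, submit, fetch, **kwargs)
                   for p in payloads]
        return [f.result() for f in futures]
```

Each thread still blocks on its own document, but the documents wait in parallel rather than one after another. An asyncio version would follow the same shape with `await asyncio.sleep(...)` in place of `sleep`.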
Error handling
| Status code | Meaning | What to do |
|---|---|---|
| 200 | Success | Use the response |
| 400 | Bad request | Check your payload — usually missing required fields or malformed JSON |
| 401 | Auth failed | Check your API key |
| 413 | File too large | Switch to multipart for files >50 MB; max 500 MB |
| 429 | Rate limited | Back off and retry; you've exceeded 5 requests / 30 seconds |
| 5xx | Lido error | Retry with exponential backoff |
Wrap every API call in retry logic for 429 and 5xx. Don't blindly retry the other 4xx codes: 400 and 401 need a fix on your side first, and 413 is recoverable only by switching to multipart upload.
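A retry wrapper along these lines might look like the sketch below. It retries only 429 and 5xx, using exponential backoff with a little jitter. The `send` callable is hypothetical: it stands for any zero-argument function that performs one request and returns `(status_code, body)`. The attempt count and delays are illustrative defaults, not values recommended by Lido.

```python
import random
import time


def call_with_retry(send, max_attempts=5, base_delay_s=2.0, max_delay_s=60.0,
                    sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    Returns the last (status_code, body) pair either way.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        retryable = status == 429 or 500 <= status < 600
        if not retryable or attempt == max_attempts - 1:
            return status, body
        # Exponential backoff (2s, 4s, 8s, ...) plus up to 10% jitter
        # so that parallel workers do not retry in lockstep.
        delay = min(base_delay_s * 2 ** attempt, max_delay_s)
        sleep(delay + random.uniform(0, delay * 0.1))
```

With `requests`, `send` could be a lambda that posts the payload and returns `(r.status_code, r.json())`. A 400 or 401 comes back on the first attempt so the caller can fix the payload or key instead of retrying.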
Tips
- Always test in the UI first. Always.
- Use multipart for files >50 MB (or anytime you'd rather not base64-encode).
- Persist results immediately. They expire in 24 hours.
- Store your API key in a secrets manager, never in client-side code or git.
- Rotate keys periodically. The Lido API key page lets you create new keys and revoke old ones.
- Use webhooks (Webhook Trigger in workflows) for the inverse pattern — instead of polling Lido, have Lido push to your system when work completes.
Common mistakes
- Writing extraction config from scratch instead of copying from the UI. Hours of trial and error you don't need to do.
- Polling every second. Wastes your rate limit and yields nothing — extraction takes 10–30 seconds. Wait 10 seconds before the first poll.
- Forgetting result expiration. Results disappear after 24 hours. Save them when they arrive.
- Hardcoding the API key. Use environment variables or a secrets manager.
- Treating 429 as a fatal error. It's a "back off" signal, not a failure. Retry with backoff.
- Calling the API for one-off documents. For a handful of files, the UI is faster. The API is for systems, not humans.
Related articles
- Extract data from PDFs and documents
- Quickstart: extract data from your first document
- Concepts: spreadsheet vs. workflow vs. API
- Improve extraction accuracy
- Triggers: how workflows start (Webhook Trigger)
- Pricing, plans, and page allowance (API access requires a paid plan)
Updated on: 16/04/2026