Lido API: quickstart and authentication

The Lido API lets you submit documents and get back structured JSON data. Use it to embed Lido extraction inside your own app, backend, or third-party tool. This article covers authentication, the core endpoints, and the recommended workflow.

When to use the API

Your end users never log into Lido.
You're embedding extraction in another product or backend service.
You want the same Lido extraction quality without users learning a new tool.

When NOT to use the API:

You're processing documents that arrive via Drive, OneDrive, or email — use a workflow instead, no code needed.
You're testing or designing extraction — start in the spreadsheet UI, then promote to API.

The recommended workflow

This is important enough to put first:

Build the extractor in the spreadsheet UI. Test it on real documents until it works.
Click the API button in the bottom-left of the Data Extractor. Lido generates the exact JSON configuration that matches your tested setup.
Copy the generated configuration. This is what you'll send to the API.
Get an API key at sheets.lido.app/settings/api-keys.
Make API calls from your backend using the configuration and the API key.

Skipping step 1 (testing in the UI first) is the most common API integration mistake. Don't write configuration from scratch in code — the UI tunes it for you in minutes.
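
As a rough sketch of steps 3 and 5, the snippet below loads a configuration you copied from the UI and combines it with a document. It assumes you pasted the generated JSON unchanged into a local file and that the file holds only extraction settings such as columns and instructions. The function name `build_payload` and both file paths are hypothetical, not part of the Lido API.

```python
import base64
import json
from pathlib import Path


def build_payload(config_path, document_path):
    """Combine a UI-generated config with a base64-encoded document."""
    # Hypothetical file: the configuration copied from the API button,
    # pasted unchanged into a local JSON file.
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))

    document = Path(document_path)
    encoded = base64.b64encode(document.read_bytes()).decode("utf-8")

    # Same top-level shape as the JSON + base64 request in this article.
    payload = dict(config)
    payload["file"] = {"type": "base64", "data": encoded, "name": document.name}
    return payload
```

Keeping the copied configuration in its own file means a retuned extractor only requires replacing that file, not editing code.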

Authentication

All API requests require a Bearer token in the Authorization header:

Authorization: Bearer YOUR_API_KEY

Get your API key:

Open sheets.lido.app/settings/api-keys.
Click Create API Key.
Copy the key immediately. It's shown once. Store it in your secrets manager.

API keys are workspace-scoped. They have access to everything in the workspace they were created in.
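
One way to keep the key out of source code is to read it from the environment when building the header. This is a minimal sketch: `LIDO_API_KEY` is a variable name chosen for illustration, not one Lido defines, and `auth_headers` is a hypothetical helper.

```python
import os


def auth_headers(env_var="LIDO_API_KEY"):
    """Build the Authorization header from an environment variable."""
    # LIDO_API_KEY is a hypothetical variable name, not one Lido defines.
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set {env_var} before calling the Lido API")
    return {"Authorization": f"Bearer {api_key}"}
```

Failing early on a missing key gives a clearer error than a 401 from the first request.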

Core endpoints

`POST /extract-file-data`

Submit a document for extraction. Returns a jobId you'll use to retrieve results.

URL: https://sheets.lido.app/api/v1/extract-file-data

Two upload methods:

Method	Max file size	Use when
JSON + base64	50 MB	Web apps, smaller files, easier client-side coding
Multipart form data	500 MB	Larger files, server-side integrations
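
If your integration handles files of varying size, you can pick the upload method from the table programmatically. The sketch below makes two assumptions this article does not confirm: the limits apply to the file size before base64 encoding, and 1 MB means 1,048,576 bytes. `choose_upload_method` is a hypothetical helper.

```python
import os

MB = 1024 * 1024            # assumption: 1 MB = 1,048,576 bytes
BASE64_LIMIT = 50 * MB      # JSON + base64 limit from the table
MULTIPART_LIMIT = 500 * MB  # multipart limit from the table


def choose_upload_method(size_bytes):
    """Return "base64" or "multipart" for a file of the given size."""
    if size_bytes > MULTIPART_LIMIT:
        raise ValueError("File exceeds the 500 MB multipart limit")
    if size_bytes > BASE64_LIMIT:
        return "multipart"
    return "base64"
```

For a file on disk, pass `os.path.getsize(path)` as `size_bytes`.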

Rate limit: 5 requests per 30 seconds.
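
To stay under this limit without waiting for a 429, you can throttle on the client side. The class below is a sketch of a sliding-window limiter. It assumes the limit is counted over a rolling 30-second window, which this article does not specify, and `SlidingWindowLimiter` is a hypothetical name.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any period-second window."""

    def __init__(self, max_calls=5, period=30.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls.popleft()
        self.calls.append(now)
```

Create one limiter per API key and call `wait()` immediately before each submit request.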

JSON + base64 example (Python)

import requests, base64

url = "https://sheets.lido.app/api/v1/extract-file-data"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

with open("invoice.pdf", "rb") as f:
    file_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "file": {"type": "base64", "data": file_b64, "name": "invoice.pdf"},
    "columns": ["Vendor Name", "Invoice Number", "Total Amount", "Due Date"],
    "instructions": "Total Amount is the grand total including tax. Use ISO format YYYY-MM-DD for dates.",
    "multiRow": False,
    "pageRange": "1"
}

r = requests.post(url, headers=headers, json=payload)
print(r.json())  # {"status": "running", "jobId": "..."}

Multipart example (cURL)

curl -X POST 'https://sheets.lido.app/api/v1/extract-file-data' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'file=@invoice.pdf' \
  -F 'config={"columns":["Vendor Name","Invoice Number","Total Amount","Due Date"],"instructions":"...","multiRow":false,"pageRange":"1"}'
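
If your backend is in Python, the same multipart request can be assembled for the `requests` library. The helper below is a sketch that only builds the form fields, reusing the `file` and `config` field names from the cURL example. `build_multipart_fields` is a hypothetical name.

```python
import json
import mimetypes


def build_multipart_fields(filename, file_bytes, config):
    """Return (files, data) arguments for requests.post, mirroring the cURL call."""
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    # "file" and "config" are the form field names used in the cURL example.
    files = {"file": (filename, file_bytes, content_type)}
    data = {"config": json.dumps(config)}
    return files, data
```

Pass the result as `requests.post(url, headers={"Authorization": "Bearer YOUR_API_KEY"}, files=files, data=data)`. Leave Content-Type unset so that `requests` can generate the multipart boundary itself.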

`GET /job-result/{jobId}`

Retrieve extraction results. Poll this endpoint after submitting.

URL: https://sheets.lido.app/api/v1/job-result/{jobId}

Response:

{
  "status": "complete",
  "data": [
    {
      "Vendor Name": "Acme Corp",
      "Invoice Number": "INV-2026-0042",
      "Total Amount": "1234.56",
      "Due Date": "2026-05-15"
    }
  ]
}

While processing, status will be "running". Poll until "complete" or "error".

Important: Job results expire after 24 hours. Persist them in your own system before then.
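
A minimal way to persist a result is to write the response JSON to disk as soon as the status is "complete". This sketch assumes local files are acceptable storage. The directory name and the `persist_result` helper are hypothetical, and a database or object store would serve equally well.

```python
import json
from pathlib import Path


def persist_result(job_id, result, directory="lido_results"):
    """Write a completed job result to <directory>/<job_id>.json."""
    if result.get("status") != "complete":
        raise ValueError(f"Job {job_id} is not complete: {result.get('status')}")
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{job_id}.json"
    out_path.write_text(json.dumps(result, indent=2), encoding="utf-8")
    return out_path
```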

Configuration parameters

The columns, instructions, multiRow, and pageRange parameters in your API call match the settings you configured in the spreadsheet UI.

Parameter	Type	Required	Description
`columns`	array	Yes	List of field names to extract
`instructions`	string	No	Free-form guidance for the AI
`multiRow`	boolean	No	`true` for tabular extraction (one row per item); `false` (the default) for summary extraction
`pageRange`	string	No	Which pages to process (e.g., `"1-3"`, `"2,5,7"`)

Build the configuration in the UI, click the API button, and copy. Don't write this from scratch.
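
Even when you copy the configuration, a quick local check can catch accidental edits before they turn into a 400 response. The validator below is a sketch based only on the table above. It additionally assumes that `columns` must be non-empty and contain strings, which this article does not state. `validate_config` is a hypothetical helper.

```python
def validate_config(config):
    """Return a list of problems found in an extraction config dict."""
    problems = []

    # columns is the only required parameter.
    columns = config.get("columns")
    if not isinstance(columns, list) or not columns:
        problems.append("columns must be a non-empty array")
    elif not all(isinstance(c, str) for c in columns):
        problems.append("columns must contain only strings")

    # Optional parameters are checked only when present.
    optional_types = {"instructions": str, "multiRow": bool, "pageRange": str}
    for name, expected in optional_types.items():
        if name in config and not isinstance(config[name], expected):
            problems.append(f"{name} must be of type {expected.__name__}")
    return problems
```

An empty list means the configuration matches the documented types. It does not guarantee the API will accept it.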

Recommended polling pattern

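import time

import requests

# headers and payload are the ones built in the JSON + base64 example above.
submit_url = "https://sheets.lido.app/api/v1/extract-file-data"

# Submit
r = requests.post(submit_url, headers=headers, json=payload)
r.raise_for_status()  # surface 4xx/5xx responses before reading the body
job_id = r.json()["jobId"]

# Wait at least 10 seconds before first poll
time.sleep(10)

# Poll with backoff, giving up after 5 minutes (an arbitrary example limit)
result_url = f"https://sheets.lido.app/api/v1/job-result/{job_id}"
backoff = 5
deadline = time.monotonic() + 300
while True:
    result = requests.get(result_url, headers=headers).json()
    if result["status"] == "complete":
        break
    if result["status"] == "error":
        raise RuntimeError(f"Extraction failed: {result.get('error')}")
    if time.monotonic() > deadline:
        raise TimeoutError(f"Job {job_id} did not finish within 5 minutes")
    time.sleep(backoff)
    backoff = min(backoff * 1.5, 30)  # cap at 30s

print(result["data"])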

For high-throughput pipelines, kick off many jobs in parallel and poll asynchronously rather than blocking per document.
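
One possible shape for this, sketched with threads: submit every document first, then wait on all jobs concurrently. The `submit` and `wait_for_result` arguments are hypothetical callables that wrap the submit request and the polling loop. They are passed in so the sketch stays independent of any HTTP library.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_documents(documents, submit, wait_for_result, max_workers=4):
    """Submit every document first, then poll all jobs concurrently."""
    # submit(document) -> job_id and wait_for_result(job_id) -> result are
    # hypothetical callables wrapping the HTTP calls shown in this article.
    # Submission stays sequential so it is easy to keep within the rate limit.
    job_ids = {doc: submit(doc) for doc in documents}

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(wait_for_result, job_id): doc
                   for doc, job_id in job_ids.items()}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Documents must be hashable here (for example, file paths), because they are used as dictionary keys.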

Error handling

Status code	Meaning	What to do
200	Success	Use the response
400	Bad request	Check your payload — usually missing required fields or malformed JSON
401	Auth failed	Check your API key
413	File too large	Switch to multipart for files >50 MB; max 500 MB
429	Rate limited	Back off and retry; you've exceeded 5 requests / 30 seconds
5xx	Lido error	Retry with exponential backoff

Wrap every API call in retry logic for 429 and 5xx. Don't retry other 4xx responses, because the same request will fail again. The one exception is 413, which you can recover from by switching to multipart upload.
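
The retry rule can be sketched as a small wrapper. It assumes a hypothetical `send` callable that performs one request and returns a `(status_code, body)` pair. The attempt count and delays are illustrative defaults, not values recommended by Lido.

```python
import time


def is_retryable(status_code):
    """429 and 5xx are worth retrying; other statuses are returned as-is."""
    return status_code == 429 or 500 <= status_code <= 599


def call_with_retry(send, max_attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    # send is a hypothetical zero-argument callable returning (status_code, body).
    for attempt in range(max_attempts):
        status, body = send()
        if not is_retryable(status) or attempt == max_attempts - 1:
            return status, body
        sleep(base_delay * (2 ** attempt))  # exponential backoff: 2s, 4s, 8s, ...
```

The caller still inspects the returned status, so a 413 can be handled by resubmitting through the multipart path.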

Tips

Always test in the UI first. Always.
Use multipart for files >50 MB (or anytime you'd rather not base64-encode).
Persist results immediately. They expire in 24 hours.
Store your API key in a secrets manager, never in client-side code or git.
Rotate keys periodically. The Lido API key page lets you create new keys and revoke old ones.
Use webhooks (Webhook Trigger in workflows) for the inverse pattern — instead of polling Lido, have Lido push to your system when work completes.

Common mistakes

Writing extraction config from scratch instead of copying from the UI. That costs hours of trial and error you can avoid.
Polling every second. Wastes your rate limit and yields nothing — extraction takes 10–30 seconds. Wait 10 seconds before the first poll.
Forgetting result expiration. Results disappear after 24 hours. Save them when they arrive.
Hardcoding the API key. Use environment variables or a secrets manager.
Treating 429 as a fatal error. It's a "back off" signal, not a failure. Retry with backoff.
Calling the API for one-off documents. For a handful of files, the UI is faster. The API is for systems, not humans.

Related articles

Extract data from PDFs and documents
Quickstart: extract data from your first document
Concepts: spreadsheet vs. workflow vs. API
Improve extraction accuracy
Triggers: how workflows start (Webhook Trigger)
Pricing, plans, and page allowance (API access requires a paid plan)

Updated on: 16/04/2026
