
Lido API: quickstart and authentication

The Lido API lets you submit documents and get back structured JSON data. Use it to embed Lido extraction inside your own app, backend, or third-party tool. This article covers authentication, the core endpoints, and the recommended workflow.



When to use the API


  • Your end users never log into Lido.
  • You're embedding extraction in another product or backend service.
  • You want the same Lido extraction quality without users learning a new tool.


When NOT to use the API:


  • You're processing documents that arrive via Drive, OneDrive, or email — use a workflow instead, no code needed.
  • You're testing or designing extraction — start in the spreadsheet UI, then promote to API.




Recommended workflow


This is important enough to put first:


  1. Build the extractor in the spreadsheet UI. Test it on real documents until it works.
  2. Click the API button in the bottom-left of the Data Extractor. Lido generates the exact JSON configuration that matches your tested setup.
  3. Copy the generated configuration. This is what you'll send to the API.
  4. Get an API key at sheets.lido.app/settings/api-keys.
  5. Make API calls from your backend using the configuration and the API key.


Skipping step 1 (testing in the UI first) is the most common API integration mistake. Don't write configuration from scratch in code — the UI tunes it for you in minutes.



Authentication


All API requests require a Bearer token in the Authorization header:


Authorization: Bearer YOUR_API_KEY


Get your API key:


  1. Open sheets.lido.app/settings/api-keys.
  2. Click Create API Key.
  3. Copy the key immediately. It's shown once. Store it in your secrets manager.


API keys are workspace-scoped. They have access to everything in the workspace they were created in.
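
As a minimal sketch of keeping the key out of source code, the helper below reads it from an environment variable and builds the Authorization header. The variable name LIDO_API_KEY and the function name are assumptions made for this example, not names defined by Lido.

```python
import os


def build_auth_headers(env_var: str = "LIDO_API_KEY") -> dict:
    """Build request headers from an API key stored in the environment.

    The default variable name LIDO_API_KEY is an assumption for this sketch.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"Authorization": f"Bearer {api_key}"}
```

Failing fast on a missing key tends to be easier to debug than a 401 that only shows up on the first request.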



Core endpoints


POST /extract-file-data


Submit a document for extraction. Returns a jobId you'll use to retrieve results.


URL: https://sheets.lido.app/api/v1/extract-file-data


Two upload methods:


| Method | Max file size | Use when |
| --- | --- | --- |
| JSON + base64 | 50 MB | Web apps, smaller files, easier client-side coding |
| Multipart form data | 500 MB | Larger files, server-side integrations |
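
The size limits above can be turned into a small pre-flight check. This is a sketch under two assumptions that the article does not confirm: that 1 MB means 1024 × 1024 bytes, and that the limits apply to the raw file size rather than the base64-encoded size (encoding grows a file by roughly a third). The function name is hypothetical.

```python
BASE64_LIMIT_BYTES = 50 * 1024 * 1024      # 50 MB limit for JSON + base64 (assumed binary MB)
MULTIPART_LIMIT_BYTES = 500 * 1024 * 1024  # 500 MB limit for multipart (assumed binary MB)


def choose_upload_method(size_bytes: int) -> str:
    """Return "base64" or "multipart" for a file of the given size."""
    if size_bytes > MULTIPART_LIMIT_BYTES:
        raise ValueError("File exceeds the 500 MB multipart limit")
    if size_bytes > BASE64_LIMIT_BYTES:
        return "multipart"
    return "base64"
```

In practice you might call it with `os.path.getsize("invoice.pdf")` before deciding how to build the request.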


Rate limit: 5 requests per 30 seconds.
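
One way to stay under this limit on the client side is a sliding-window limiter, sketched below. How Lido actually counts the window on the server is not documented here, so treat the sliding-window model as an assumption; the class name is hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_seconds span (client-side only)."""

    def __init__(self, max_calls=5, window_seconds=30.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def acquire(self):
        """Block until another request fits in the window, then record it."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Wait until the oldest recorded call leaves the window.
            self.sleep(self.window - (now - self.calls[0]))
            now = self.clock()
            self.calls.popleft()
        self.calls.append(now)
```

Call `limiter.acquire()` immediately before each submit request. The limiter is a courtesy to the server, so you should still handle a 429 if one comes back.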


JSON + base64 example (Python)


import base64

import requests

url = "https://sheets.lido.app/api/v1/extract-file-data"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
}

# Read the document and base64-encode it for the JSON body
with open("invoice.pdf", "rb") as f:
    file_b64 = base64.b64encode(f.read()).decode("utf-8")

# Extraction configuration: copy this from the API button in the UI
payload = {
    "file": {"type": "base64", "data": file_b64, "name": "invoice.pdf"},
    "columns": ["Vendor Name", "Invoice Number", "Total Amount", "Due Date"],
    "instructions": "Total Amount is the grand total including tax. Use ISO format YYYY-MM-DD for dates.",
    "multiRow": False,
    "pageRange": "1",
}

r = requests.post(url, headers=headers, json=payload)
print(r.json())  # {"status": "running", "jobId": "..."}


Multipart example (cURL)


curl -X POST 'https://sheets.lido.app/api/v1/extract-file-data' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-F 'file=@invoice.pdf' \
-F 'config={"columns":["Vendor Name","Invoice Number","Total Amount","Due Date"],"instructions":"...","multiRow":false,"pageRange":"1"}'
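
For Python backends, the same two form fields can be prepared for the requests library. The helper below is a sketch: its name is hypothetical, and it only mirrors the `file` and `config` fields from the cURL command above.

```python
import json


def build_multipart_fields(file_handle, filename: str, config: dict) -> dict:
    """Return keyword arguments for requests.post that mirror the cURL -F fields."""
    return {
        # Sent as the "file" form part, like -F 'file=@invoice.pdf'
        "files": {"file": (filename, file_handle)},
        # Sent as the "config" form field, serialized to a JSON string
        "data": {"config": json.dumps(config)},
    }


# Usage (requires the third-party requests package):
# with open("invoice.pdf", "rb") as f:
#     fields = build_multipart_fields(f, "invoice.pdf", {"columns": ["Total Amount"]})
#     r = requests.post(url, headers={"Authorization": "Bearer YOUR_API_KEY"}, **fields)
```

Leave out the Content-Type header here. requests sets the multipart boundary itself when `files` is passed.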


GET /job-result/{jobId}


Retrieve extraction results. Poll this endpoint after submitting.


URL: https://sheets.lido.app/api/v1/job-result/{jobId}


Response:


{
  "status": "complete",
  "data": [
    {
      "Vendor Name": "Acme Corp",
      "Invoice Number": "INV-2026-0042",
      "Total Amount": "1234.56",
      "Due Date": "2026-05-15"
    }
  ]
}


While processing, status will be "running". Poll until "complete" or "error".


Important: Job results expire after 24 hours. Persist them in your own system before then.
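
A minimal way to persist a result is to write it to disk as soon as the job completes. The sketch below stores one JSON file per job. The directory name and function name are assumptions, and a database or object store would work equally well.

```python
import json
from pathlib import Path


def persist_result(job_id: str, result: dict, out_dir: str = "lido_results") -> Path:
    """Write a job result to <out_dir>/<job_id>.json and return the path."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{job_id}.json"
    # Write to a temporary file first so a crash never leaves a half-written result.
    tmp = directory / f"{job_id}.json.tmp"
    tmp.write_text(json.dumps(result, indent=2), encoding="utf-8")
    tmp.replace(path)
    return path
```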



Configuration parameters


The columns, instructions, multiRow, and pageRange parameters in your API call match the settings you configured in the spreadsheet UI.


| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| columns | array | Yes | List of field names to extract |
| instructions | string | No | Free-form guidance for the AI |
| multiRow | boolean | No | true for tabular extraction (one row per item); false for summary (default) |
| pageRange | string | No | Which pages to process (e.g., "1-3", "2,5,7") |
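
A quick local check against the types in this table can catch a malformed configuration before it turns into a 400. The validator below is a sketch. Requiring `columns` to be non-empty is a stricter reading than the table states, and the function name is hypothetical.

```python
def validate_config(config: dict) -> list:
    """Return a list of problems; an empty list means the config looks well-formed."""
    problems = []
    columns = config.get("columns")
    if not isinstance(columns, list) or not columns:
        problems.append("columns is required and must be a non-empty list")
    elif not all(isinstance(c, str) and c.strip() for c in columns):
        problems.append("every entry in columns must be a non-empty string")
    # Optional parameters only need a type check when present.
    expected_types = {"instructions": str, "multiRow": bool, "pageRange": str}
    for name, expected in expected_types.items():
        if name in config and not isinstance(config[name], expected):
            problems.append(f"{name} must be of type {expected.__name__}")
    return problems
```

This only checks shape, not extraction quality, so it does not replace testing the configuration in the UI.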


Build the configuration in the UI, click the API button, and copy. Don't write this from scratch.



Polling for results




import time

import requests

# Reuses headers and payload from the JSON + base64 example
submit_url = "https://sheets.lido.app/api/v1/extract-file-data"

# Submit
r = requests.post(submit_url, headers=headers, json=payload)
r.raise_for_status()  # surface 4xx/5xx instead of failing on a missing jobId
job_id = r.json()["jobId"]

# Wait at least 10 seconds before first poll
time.sleep(10)

# Poll with backoff
result_url = f"https://sheets.lido.app/api/v1/job-result/{job_id}"
backoff = 5
while True:
    result = requests.get(result_url, headers=headers).json()
    if result["status"] == "complete":
        break
    if result["status"] == "error":
        raise RuntimeError(f"Extraction failed: {result.get('error')}")
    time.sleep(backoff)
    backoff = min(backoff * 1.5, 30)  # cap at 30s

print(result["data"])


For high-throughput pipelines, kick off many jobs in parallel and poll asynchronously rather than blocking per document.
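
One simple way to approximate this is a thread pool in which each worker runs the submit-and-poll sequence for one document. The sketch below takes that per-document function as a parameter (`process_one`, a hypothetical callable you supply), so it makes no assumptions about the API itself. An asyncio-based design is an equally valid alternative.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_many(documents, process_one, max_workers=4):
    """Run process_one(doc) for each document concurrently.

    Returns (results, failures): two dicts keyed by document, holding
    return values and raised exceptions respectively.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_one, doc): doc for doc in documents}
        for future in as_completed(futures):
            doc = futures[future]
            try:
                results[doc] = future.result()
            except Exception as exc:  # keep one bad document from stopping the batch
                failures[doc] = exc
    return results, failures
```

The submit rate limit still applies across all workers, so keep `max_workers` modest or share a rate limiter between them.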



Error handling


| Status code | Meaning | What to do |
| --- | --- | --- |
| 200 | Success | Use the response |
| 400 | Bad request | Check your payload; usually missing required fields or malformed JSON |
| 401 | Auth failed | Check your API key |
| 413 | File too large | Switch to multipart for files >50 MB; max 500 MB |
| 429 | Rate limited | Back off and retry; you've exceeded 5 requests / 30 seconds |
| 5xx | Lido error | Retry with exponential backoff |


Wrap every API call in retry logic for 429 and 5xx. Don't retry other 4xx responses unchanged: 400 and 401 will keep failing until you fix the payload or the key, and 413 is recoverable only by switching to multipart upload.
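
The retry rule can be sketched as a small wrapper around any request function. The attempt count, the delays, and the jitter are illustrative choices rather than values recommended by Lido, and the function names are hypothetical.

```python
import random
import time


def is_retryable(status_code: int) -> bool:
    """429 and 5xx are worth retrying; other statuses are returned as-is."""
    return status_code == 429 or 500 <= status_code <= 599


def call_with_retry(send, max_attempts=5, base_delay=1.0, max_delay=30.0,
                    sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable returning an object with .status_code,
    for example: lambda: requests.post(url, headers=headers, json=payload)
    """
    attempt = 0
    while True:
        attempt += 1
        response = send()
        if not is_retryable(response.status_code) or attempt >= max_attempts:
            return response
        # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay, plus jitter.
        delay = min(base_delay * 2 ** (attempt - 1), max_delay)
        sleep(delay + random.uniform(0, delay / 2))
```

The caller still has to inspect the returned status, because the wrapper hands back the last response even when every attempt failed.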



Tips


  • Always test in the UI first. Always.
  • Use multipart for files >50 MB (or anytime you'd rather not base64-encode).
  • Persist results immediately. They expire in 24 hours.
  • Store your API key in a secrets manager, never in client-side code or git.
  • Rotate keys periodically. The Lido API key page lets you create new keys and revoke old ones.
  • Use webhooks (Webhook Trigger in workflows) for the inverse pattern — instead of polling Lido, have Lido push to your system when work completes.



Common mistakes


  • Writing extraction config from scratch instead of copying from the UI. Hours of trial and error you don't need to do.
  • Polling every second. Wastes your rate limit and yields nothing — extraction takes 10–30 seconds. Wait 10 seconds before the first poll.
  • Forgetting result expiration. Results disappear after 24 hours. Save them when they arrive.
  • Hardcoding the API key. Use environment variables or a secrets manager.
  • Treating 429 as a fatal error. It's a "back off" signal, not a failure. Retry with backoff.
  • Calling the API for one-off documents. For a handful of files, the UI is faster. The API is for systems, not humans.




Related articles


  • Extract data from PDFs and documents
  • Quickstart: extract data from your first document
  • Concepts: spreadsheet vs. workflow vs. API
  • Improve extraction accuracy
  • Triggers: how workflows start (Webhook Trigger)
  • Pricing, plans, and page allowance (API access requires a paid plan)


Updated on: 16/04/2026
