Developer Documentation
Everything you need to integrate DocuStack into your application — authentication, quickstart, API reference, and webhook events.
Overview
DocuStack is a developer-first document processing platform. You define a schema, upload documents, and query structured extraction results from a single API. Documents are processed by a multi-stage pipeline: OCR via AWS Textract, field extraction via AI, and full-text indexing in OpenSearch.
How it works
1. Create a schema: define the fields you want extracted (invoice number, vendor name, total amount, etc.) in the dashboard.
2. Upload documents: POST a PDF, JPEG, or TIFF to the API, or submit URLs in bulk via the batch endpoint.
3. Poll for completion: extraction takes anywhere from a few seconds to a few minutes, depending on document complexity.
4. Query results: fetch extracted field values with confidence scores and search across document text.
Base URL
https://api.docustack.com
API keys
DocuStack uses two key types. Both are created in the dashboard under Settings → API Keys.
| Prefix | Environment | Use |
|---|---|---|
| ds_live_... | Production | Real data, billed usage |
| ds_test_... | Test | Isolated sandbox, no billing |
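The key prefix tells you which environment a key belongs to, so a script can check it before doing anything billable. The sketch below assumes only the two documented prefixes exist. `key_environment` and `require_test_key` are hypothetical helpers and are not part of any DocuStack SDK.

```python
def key_environment(api_key: str) -> str:
    """Return "production" or "test" based on the documented DocuStack key prefixes."""
    if api_key.startswith("ds_live_"):
        return "production"
    if api_key.startswith("ds_test_"):
        return "test"
    raise ValueError("Unrecognised DocuStack API key prefix")


def require_test_key(api_key: str) -> None:
    """Guard for scripts and CI jobs that must never touch billed production data."""
    if key_environment(api_key) != "test":
        raise RuntimeError("Refusing to run with a live key; use a ds_test_ key instead")
```

Calling `require_test_key` at the top of a CI job means an accidentally configured live key fails fast instead of creating billed usage.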
Authentication
All API requests must include your API key in the X-API-Key header. There is no OAuth flow — just the key.
curl https://api.docustack.com/api/v1/documents \
-H "X-API-Key: ds_live_your_key_here"
Getting your API key
- Log in at app.docustack.com
- Navigate to Settings → API Keys
- Click Create key, choose Live or Test, and copy the value
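Once you have a key, you can send the same authenticated request from Python using only the standard library. This is a minimal sketch. `build_request` and `get_json` are hypothetical helper names. This page does not document the shape of the list response, so the helper returns the decoded JSON without further typing.

```python
import json
import urllib.request
from typing import Any

API_BASE = "https://api.docustack.com"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Create a GET request that carries the API key in the X-API-Key header."""
    return urllib.request.Request(
        f"{API_BASE}{path}",
        headers={"X-API-Key": api_key},
        method="GET",
    )


def get_json(path: str, api_key: str) -> Any:
    """Send the request and decode the JSON body.

    A missing or invalid key surfaces as urllib.error.HTTPError with code 401.
    """
    with urllib.request.urlopen(build_request(path, api_key)) as resp:
        return json.load(resp)


# Example (performs a network call):
# documents = get_json("/api/v1/documents", "ds_test_your_key_here")
```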
Error responses
Missing or invalid keys return a 401 Unauthorized with a JSON body:
{
"detail": "Invalid or missing API key"
}
Rate limiting
Requests are rate-limited per API key using a sliding window. When exceeded, the API returns 429 Too Many Requests. The response includes a Retry-After header indicating when you may retry.
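A client can honour `Retry-After` by sleeping for the advertised interval before it retries. This page does not say whether the header carries a number of seconds or an HTTP date, so the sketch below accepts both forms that HTTP defines. The helper names, the 1-second fallback, and the 5-attempt cap are assumptions, not documented DocuStack behaviour.

```python
import email.utils
import json
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Any, Optional

DEFAULT_WAIT = 1.0  # hypothetical fallback when Retry-After is missing or unparseable


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None) -> float:
    """Convert a Retry-After value (delta-seconds or HTTP-date) into seconds to wait."""
    if not value:
        return DEFAULT_WAIT
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        retry_at = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return DEFAULT_WAIT
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (retry_at - now).total_seconds())


def request_with_retry(req: urllib.request.Request, max_attempts: int = 5) -> Any:
    """Send req, waiting out 429 responses; any other HTTP error is raised immediately."""
    for attempt in range(1, max_attempts + 1):
        try:
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            # A 401 (missing or invalid key) is not retryable, and neither is the final 429.
            if err.code != 429 or attempt == max_attempts:
                raise
            time.sleep(retry_after_seconds(err.headers.get("Retry-After")))
    raise ValueError("max_attempts must be at least 1")
```

The helper retries only 429 responses. A 401 means the key itself is wrong, so retrying would not help.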
Quickstart
This walkthrough takes you from zero to extracting field values from a document in under 30 minutes. It uses curl, but the same steps apply to any HTTP client.
Use a test key (ds_test_...) while following this guide. Switch to a live key when you're ready for production.
Step 1 — Create a schema
Schemas define which fields to extract from your documents. For now, create one in the dashboard under Schemas → New schema. Add fields like invoice_number, vendor_name, and total_amount.
After saving, copy the schema ID — you'll need it in every upload request.
Step 2 — Upload a document
Upload a PDF using multipart form data. Replace YOUR_SCHEMA_ID, YOUR_ORG_ID, and YOUR_API_KEY with real values.
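If you call the API from Python without the SDK, you can assemble the multipart request with the standard library. The form field names (`file`, `schema_id`, `organization_id`) come from the curl command below. The helper functions are hypothetical and are shown only as a sketch.

```python
import json
import mimetypes
import os
import urllib.request
import uuid
from typing import Any

API_BASE = "https://api.docustack.com"


def encode_multipart(
    fields: dict[str, str], file_field: str, filename: str, file_bytes: bytes
) -> tuple[bytes, str]:
    """Build a multipart/form-data body; returns (body, Content-Type header value)."""
    boundary = uuid.uuid4().hex  # random boundary string, very unlikely to appear in the file
    parts: list[bytes] = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode(),
        ]
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
        f"Content-Type: {mime}".encode(),
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def upload_document(path: str, schema_id: str, org_id: str, api_key: str) -> Any:
    """Upload one file; the response should contain document_id and job_id."""
    with open(path, "rb") as f:
        file_bytes = f.read()
    body, content_type = encode_multipart(
        {"schema_id": schema_id, "organization_id": org_id},
        "file",
        os.path.basename(path),
        file_bytes,
    )
    req = urllib.request.Request(
        f"{API_BASE}/api/v1/documents/upload",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```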
curl -X POST https://api.docustack.com/api/v1/documents/upload \
-H "X-API-Key: YOUR_API_KEY" \
-F "file=@invoice.pdf" \
-F "schema_id=YOUR_SCHEMA_ID" \
-F "organization_id=YOUR_ORG_ID"
Response:
{
"document_id": "c1a2b3c4d5e6f7g8h9i0",
"job_id": "c0j1k2l3m4n5o6p7q8r9"
}
Step 3 — Poll for status
Extraction typically completes in under 30 seconds for a simple one-page PDF. Poll the status endpoint until status is completed or failed.
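In a script, bound the polling loop with a deadline so it cannot run forever. The sketch below assumes that any status other than `completed` or `failed` means processing is still under way, since this page documents only those two values. `wait_for_document`, its 2-second interval, and its 300-second default timeout are illustrative choices. `fetch_status` is whatever function you use to call the status endpoint.

```python
import time
from typing import Any, Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_document(
    fetch_status: Callable[[], dict[str, Any]],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until status is "completed" or "failed"; raise TimeoutError after `timeout` seconds.

    `fetch_status` is any callable returning the parsed JSON of
    GET /api/v1/documents/DOCUMENT_ID/status.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status["status"] in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"document {status.get('document_id')} still {status['status']!r} "
                f"at {status.get('extraction_progress', 0)}% after {timeout}s"
            )
        sleep(interval)
```

Injecting `sleep` keeps the loop easy to test. For high-volume workloads, the webhooks described later in this guide avoid polling altogether.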
curl https://api.docustack.com/api/v1/documents/DOCUMENT_ID/status \
-H "X-API-Key: YOUR_API_KEY"
Response:
{
"document_id": "c1a2b3c4d5e6f7g8h9i0",
"status": "completed",
"extraction_progress": 100,
"fields_extracted": 3
}
Step 4 — Retrieve extracted fields
Once status is completed, fetch the full document to get extracted field values.
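In the sample response below, field values are returned as strings (for example `"4,200.00"` for `total_amount`), and each value carries a confidence score. The following is a small post-processing sketch that assumes you have already decoded the response into a dict. The helper names, the 0.9 threshold, and the rule that strips commas from amounts are illustrative assumptions, not DocuStack behaviour.

```python
from decimal import Decimal
from typing import Any


def fields_by_key(document: dict[str, Any]) -> dict[str, str]:
    """Map each field_key to its extracted value."""
    return {f["field_key"]: f["value"] for f in document["field_values"]}


def low_confidence(document: dict[str, Any], threshold: float = 0.9) -> list[str]:
    """Return the field keys whose confidence falls below threshold, e.g. for manual review."""
    return [f["field_key"] for f in document["field_values"] if f["confidence"] < threshold]


def parse_amount(value: str) -> Decimal:
    """Turn a formatted amount such as "4,200.00" into a Decimal (assumes "," separates thousands)."""
    return Decimal(value.replace(",", ""))
```

For the sample response below, `low_confidence(doc)` returns an empty list, because every field scores 0.95 or higher.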
curl https://api.docustack.com/api/v1/documents/DOCUMENT_ID \
-H "X-API-Key: YOUR_API_KEY"
Response:
{
"id": "c1a2b3c4d5e6f7g8h9i0",
"name": "invoice.pdf",
"status": "completed",
"field_values": [
{
"field_key": "invoice_number",
"field_name": "Invoice Number",
"value": "INV-2024-0042",
"confidence": 0.98,
"was_inferred": false,
"manually_edited": false
},
{
"field_key": "vendor_name",
"field_name": "Vendor Name",
"value": "Acme Corp",
"confidence": 0.95,
"was_inferred": false,
"manually_edited": false
},
{
"field_key": "total_amount",
"field_name": "Total Amount",
"value": "4,200.00",
"confidence": 0.99,
"was_inferred": false,
"manually_edited": false
}
]
}
Python SDK example
You can also use the DocuStack Python SDK (pip install docustack).
import time

import docustack

client = docustack.Client(api_key="ds_live_your_key_here")

# Upload the document; the with-block closes the file handle afterwards
with open("invoice.pdf", "rb") as f:
    result = client.documents.upload(
        file=f,
        schema_id="YOUR_SCHEMA_ID",
        organization_id="YOUR_ORG_ID",
    )
doc_id = result.document_id

# Poll every 2 seconds until extraction reaches a terminal state
while True:
    status = client.documents.get_status(doc_id)
    if status.status in ("completed", "failed"):
        break
    time.sleep(2)

if status.status == "failed":
    raise RuntimeError(f"Extraction failed for document {doc_id}")

# Fetch extracted fields and print each value with its confidence score
doc = client.documents.get(doc_id)
for field in doc.field_values:
    print(f"{field.field_name}: {field.value} ({field.confidence:.0%})")
Batch ingestion
For bulk processing, use the batch endpoint to submit multiple document URLs or S3 keys in a single request:
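Here is the same batch request from Python, together with a helper that splits a large list across several requests. This page does not state a per-request document limit or the shape of the batch response. The chunk size in the usage line is therefore a hypothetical value, and the code returns the response as raw JSON. All helper names are illustrative.

```python
import json
import urllib.request
from typing import Any

API_BASE = "https://api.docustack.com"


def build_batch_payload(
    schema_id: str, organization_id: str, documents: list[tuple[str, str]]
) -> dict[str, Any]:
    """Build the batch body from (name, ref) pairs; ref is an https:// URL or an s3:// key."""
    return {
        "schema_id": schema_id,
        "organization_id": organization_id,
        "documents": [{"ref": ref, "name": name} for name, ref in documents],
    }


def batched(items: list, size: int) -> list[list]:
    """Split items into consecutive chunks of at most `size` elements."""
    return [items[i : i + size] for i in range(0, len(items), size)]


def submit_batch(payload: dict[str, Any], api_key: str) -> Any:
    """POST one batch to /api/v1/documents/batch and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{API_BASE}/api/v1/documents/batch",
        data=json.dumps(payload).encode(),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example: `for chunk in batched(all_documents, 100): submit_batch(build_batch_payload("YOUR_SCHEMA_ID", "YOUR_ORG_ID", chunk), api_key)`.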
curl -X POST https://api.docustack.com/api/v1/documents/batch \
-H "X-API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"schema_id": "YOUR_SCHEMA_ID",
"organization_id": "YOUR_ORG_ID",
"documents": [
{ "ref": "https://example.com/invoice-001.pdf", "name": "Invoice 001" },
{ "ref": "https://example.com/invoice-002.pdf", "name": "Invoice 002" },
{ "ref": "s3://my-bucket/invoices/003.pdf", "name": "Invoice 003" }
]
}'
Webhooks
DocuStack sends HMAC-signed HTTP POST requests to your endpoint when document or job events occur. This lets you react to extraction results without polling.
Configure a webhook
In the dashboard, navigate to Settings → Webhooks and add your endpoint URL. DocuStack will sign every delivery with your webhook secret.
Verify the signature
Every request includes an X-DocuStack-Signature header containing an HMAC-SHA256 signature of the raw request body, keyed with your webhook secret.
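To exercise your endpoint before real deliveries arrive, you can produce a test delivery signed the same way. This is a sketch for local testing. The `application/json` content type and the helper names are assumptions. The target URL is whichever endpoint you are developing; a localhost URL works during development.

```python
import hashlib
import hmac
import json
import urllib.error
import urllib.request


def sign_payload(payload: dict, secret: str) -> tuple[bytes, str]:
    """Serialize a payload and compute the hex HMAC-SHA256 signature over the exact bytes sent."""
    body = json.dumps(payload).encode()
    signature = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return body, signature


def send_test_delivery(url: str, payload: dict, secret: str) -> int:
    """POST a signed test event to your endpoint and return the HTTP status it answered with."""
    body, signature = sign_payload(payload, secret)
    req = urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",  # assumed content type
            "X-DocuStack-Signature": signature,
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 401 when the endpoint rejects the signature
```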
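import hashlib
import hmac

def verify_webhook(body: bytes, secret: str, signature_header: str) -> bool:
    # Recompute the HMAC-SHA256 of the raw request body, keyed with your webhook secret
    expected = hmac.new(
        secret.encode(),
        body,
        hashlib.sha256,
    ).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(expected, signature_header)
If the signature does not match, reject the request with a 401 status.
Events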
document_completed
Fired when a document has been fully extracted and all field values are available.
{
"event": "document_completed",
"document_id": "c1a2b3c4d5e6f7g8h9i0",
"job_id": "c0j1k2l3m4n5o6p7q8r9",
"organization_id": "org_abc123",
"schema_id": "sch_xyz789",
"processed_at": "2024-11-15T14:23:01Z"
}
job_completed
Fired when all documents in an ingestion job have finished processing (some may have failed).
{
"event": "job_completed",
"job_id": "c0j1k2l3m4n5o6p7q8r9",
"organization_id": "org_abc123",
"document_count": 10,
"processed_count": 9,
"failed_count": 1,
"completed_at": "2024-11-15T14:25:44Z"
}
job_failed
Fired when all documents in a job have failed or the job itself encountered a fatal error.
{
"event": "job_failed",
"job_id": "c0j1k2l3m4n5o6p7q8r9",
"organization_id": "org_abc123",
"document_count": 5,
"failed_count": 5,
"error_message": "Extraction workflow timed out",
"failed_at": "2024-11-15T14:30:00Z"
}
Retry behaviour
If your endpoint returns a non-2xx status, DocuStack retries the delivery up to 5 times with exponential backoff, waiting 1s, 5s, 30s, 5min, and 30min before successive retries. If the fifth retry also fails, the delivery is marked as failed and no further retries occur.
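Because failed deliveries are retried, your endpoint may receive the same event more than once. It should verify each request, acknowledge quickly, and deduplicate. The receiver below is a minimal standard-library sketch of that pattern. The event payloads shown above carry no delivery ID, so the sketch deduplicates on the assumed key `(event, job_id, document_id)`. The in-memory set and queue, the port, and all names are illustrative. A production service would use durable storage and HTTPS.

```python
import hashlib
import hmac
import json
import queue
from http.server import BaseHTTPRequestHandler, HTTPServer

WEBHOOK_SECRET = "your_webhook_secret"  # placeholder: load from your secret store
events: queue.Queue = queue.Queue()  # drained by a separate worker thread or process
_seen: set = set()  # in-memory only; lost on restart


def verify_webhook(body: bytes, secret: str, signature_header: str) -> bool:
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # Compare as bytes so a non-ASCII header cannot raise TypeError
    return hmac.compare_digest(expected.encode(), signature_header.encode())


def enqueue_once(event: dict) -> bool:
    """Queue an event unless the same delivery was already accepted (retries may repeat it)."""
    # Assumed dedup key: the documented payloads have no delivery ID
    key = (event.get("event"), event.get("job_id"), event.get("document_id"))
    if key in _seen:
        return False
    _seen.add(key)
    events.put(event)
    return True


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read the raw bytes: the signature covers the body exactly as sent
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        signature = self.headers.get("X-DocuStack-Signature", "")
        if not verify_webhook(body, WEBHOOK_SECRET, signature):
            self.send_response(401)
        else:
            enqueue_once(json.loads(body))  # heavy work happens off the request path
            self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```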
Respond with 200 OK as quickly as possible (within 30 seconds). Offload any heavy processing to a background job so your webhook endpoint stays responsive.
Full API Reference
Browse all endpoints with interactive request/response examples, schema definitions, and the ability to try requests directly in the browser.
Open API Reference