Developer Documentation

Everything you need to integrate DocuStack into your application — authentication, quickstart, API reference, and webhook events.

Overview

DocuStack is a developer-first document processing platform. You define a schema, upload documents, and query structured extraction results from a single API. Documents are processed by a multi-stage pipeline: OCR via AWS Textract, field extraction via AI, and full-text indexing in OpenSearch.

How it works

  1. Create a schema. Define the fields you want extracted (invoice number, vendor name, total amount, etc.) in the dashboard.
  2. Upload documents. POST a PDF, JPEG, or TIFF to the API, or submit URLs in bulk via the batch endpoint.
  3. Poll for completion. Extraction takes a few seconds to a few minutes depending on document complexity.
  4. Query results. Fetch extracted field values with confidence scores and search across document text.

Base URL

text
https://api.docustack.com

API keys

DocuStack uses two key types. Both are created in the dashboard under Settings → API Keys.

Prefix       Environment  Use
ds_live_...  Production   Real data, billed usage
ds_test_...  Test         Isolated sandbox, no billing
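
Because the environment is encoded in the key prefix, a client can check which kind of key it was given before sending anything. The sketch below is only an illustration: the helper name `key_environment` is hypothetical and not part of any DocuStack SDK.

```python
def key_environment(api_key: str) -> str:
    """Map a key to its environment using the documented prefixes."""
    if api_key.startswith("ds_live_"):
        return "production"
    if api_key.startswith("ds_test_"):
        return "test"
    # Anything else does not match either documented prefix.
    raise ValueError("unrecognized API key prefix")
```

A test suite could, for example, assert that `key_environment(key) == "test"` before running, so that a live key is never used by accident.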

Authentication

All API requests must include your API key in the X-API-Key header. There is no OAuth flow — just the key.

shell
curl https://api.docustack.com/api/v1/documents \
  -H "X-API-Key: ds_live_your_key_here"

Getting your API key

  1. Log in at app.docustack.com
  2. Navigate to Settings → API Keys
  3. Click Create key, choose Live or Test, and copy the value

Keep your key secret. API keys carry full access to your organization's data. Never commit them to source control or expose them in client-side code. Use environment variables.
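
As an illustration of the environment-variable advice, here is a minimal Python sketch that reads the key at runtime and attaches it as the X-API-Key header. The variable name `DOCUSTACK_API_KEY` and the helper `build_request` are assumptions made for this example, not names defined by DocuStack.

```python
import os
import urllib.request
from typing import Optional

BASE_URL = "https://api.docustack.com"


def build_request(path: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a request carrying the API key in the X-API-Key header."""
    # DOCUSTACK_API_KEY is an assumed variable name; pick whatever suits your setup.
    key = api_key or os.environ.get("DOCUSTACK_API_KEY")
    if not key:
        raise RuntimeError("No API key: set DOCUSTACK_API_KEY or pass api_key")
    return urllib.request.Request(BASE_URL + path, headers={"X-API-Key": key})
```

Passing the result to `urllib.request.urlopen(build_request("/api/v1/documents"))` would then send the same request as the curl example, without the key ever appearing in source code.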

Error responses

Missing or invalid keys return a 401 Unauthorized with a JSON body:

json
{
  "detail": "Invalid or missing API key"
}

Rate limiting

Requests are rate-limited per API key using a sliding window. When exceeded, the API returns 429 Too Many Requests. The response includes a Retry-After header indicating when you may retry.
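
One way a client might honor Retry-After is sketched below. The format DocuStack uses for the header is not stated here, so the parser accepts both forms HTTP allows (a number of seconds or an HTTP date). The `send` callable, the attempt limit, and the default delay are assumptions for illustration.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, default=1.0):
    """Convert a Retry-After header value into a delay in seconds."""
    if value is None:
        return default
    try:
        return max(0.0, float(value))  # delta-seconds form, e.g. "2"
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it stops returning 429 or attempts run out.

    send is a hypothetical callable returning (status, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            break
        if attempt < max_attempts - 1:
            sleep(retry_after_seconds(headers.get("Retry-After")))
    return status, body
```

Header names are case-insensitive in HTTP, so a real client should look the header up through its HTTP library rather than a plain dict.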

Quickstart

This walkthrough takes you from zero to extracting field values from a document in under 30 minutes. It uses curl, but the same steps apply to any HTTP client.

Use a test key (ds_test_...) while following this guide. Switch to a live key when you're ready for production.

Step 1 — Create a schema

Schemas define which fields to extract from your documents. For now, create one in the dashboard under Schemas → New schema. Add fields like invoice_number, vendor_name, and total_amount.

After saving, copy the schema ID — you'll need it in every upload request.

Step 2 — Upload a document

Upload a PDF using multipart form data. Replace YOUR_SCHEMA_ID, YOUR_ORG_ID, and YOUR_API_KEY with real values.

shell
curl -X POST https://api.docustack.com/api/v1/documents/upload \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "file=@invoice.pdf" \
  -F "schema_id=YOUR_SCHEMA_ID" \
  -F "organization_id=YOUR_ORG_ID"

Response:

json
{
  "document_id": "c1a2b3c4d5e6f7g8h9i0",
  "job_id": "c0j1k2l3m4n5o6p7q8r9"
}

Step 3 — Poll for status

Extraction typically completes in under 30 seconds for a simple one-page PDF. Poll the status endpoint until status is completed or failed.

shell
curl https://api.docustack.com/api/v1/documents/DOCUMENT_ID/status \
  -H "X-API-Key: YOUR_API_KEY"

Response:

json
{
  "document_id": "c1a2b3c4d5e6f7g8h9i0",
  "status": "completed",
  "extraction_progress": 100,
  "fields_extracted": 3
}
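
A polling loop is easier to operate with a deadline and a growing interval, so a stuck document does not keep a client waiting forever. The following is a sketch under assumptions: `get_status` stands for any callable that returns the parsed status JSON shown above, and the timeout and interval values are arbitrary defaults, not documented limits.

```python
import time


def wait_for_document(get_status, timeout=300.0, interval=2.0, max_interval=15.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until status is completed or failed, or raise after timeout seconds."""
    deadline = clock() + timeout
    while True:
        payload = get_status()
        # completed and failed are the two terminal states named in this guide.
        if payload["status"] in ("completed", "failed"):
            return payload
        if clock() >= deadline:
            raise TimeoutError("document did not finish before the deadline")
        sleep(interval)
        # Back off gradually so long-running documents are polled less often.
        interval = min(interval * 1.5, max_interval)
```

The caller still has to check whether the returned status is completed or failed before fetching field values.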

Step 4 — Retrieve extracted fields

Once status is completed, fetch the full document to get extracted field values.

shell
curl https://api.docustack.com/api/v1/documents/DOCUMENT_ID \
  -H "X-API-Key: YOUR_API_KEY"

Response:

json
{
  "id": "c1a2b3c4d5e6f7g8h9i0",
  "name": "invoice.pdf",
  "status": "completed",
  "field_values": [
    {
      "field_key": "invoice_number",
      "field_name": "Invoice Number",
      "value": "INV-2024-0042",
      "confidence": 0.98,
      "was_inferred": false,
      "manually_edited": false
    },
    {
      "field_key": "vendor_name",
      "field_name": "Vendor Name",
      "value": "Acme Corp",
      "confidence": 0.95,
      "was_inferred": false,
      "manually_edited": false
    },
    {
      "field_key": "total_amount",
      "field_name": "Total Amount",
      "value": "4,200.00",
      "confidence": 0.99,
      "was_inferred": false,
      "manually_edited": false
    }
  ]
}
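
The confidence scores make it possible to accept high-confidence values automatically and route the rest to a human. This sketch works on the parsed JSON above; the 0.97 threshold is an arbitrary example, and `parse_amount` assumes amounts are formatted like the sample value (comma thousands separators, period decimal point).

```python
from decimal import Decimal


def split_by_confidence(document: dict, threshold: float = 0.97):
    """Return (accepted values by field_key, field_keys needing review)."""
    accepted, needs_review = {}, []
    for field in document["field_values"]:
        if field["confidence"] >= threshold:
            accepted[field["field_key"]] = field["value"]
        else:
            needs_review.append(field["field_key"])
    return accepted, needs_review


def parse_amount(value: str) -> Decimal:
    """Parse a string such as "4,200.00" into an exact Decimal."""
    return Decimal(value.replace(",", ""))
```

Values are returned as strings in the response, so numeric fields such as total_amount need explicit parsing before any arithmetic.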

Python SDK example

You can also use the DocuStack Python SDK (pip install docustack).

python
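import os
import time

import docustack

# Read the key from the environment instead of hard-coding it.
# DOCUSTACK_API_KEY is an assumed variable name; use a ds_test_ key while testing.
client = docustack.Client(api_key=os.environ["DOCUSTACK_API_KEY"])

# Upload the document; the context manager closes the file afterwards.
with open("invoice.pdf", "rb") as f:
    result = client.documents.upload(
        file=f,
        schema_id="YOUR_SCHEMA_ID",
        organization_id="YOUR_ORG_ID",
    )
doc_id = result.document_id

# Poll until the document reaches a terminal state.
while True:
    status = client.documents.get_status(doc_id)
    if status.status in ("completed", "failed"):
        break
    time.sleep(2)

if status.status == "failed":
    raise RuntimeError(f"Extraction failed for document {doc_id}")

# Fetch and print the extracted fields with their confidence scores.
doc = client.documents.get(doc_id)
for field in doc.field_values:
    print(f"{field.field_name}: {field.value} ({field.confidence:.0%})")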

Batch ingestion

For bulk processing, use the batch endpoint to submit multiple document references (HTTPS URLs or s3:// URIs) in a single request:

shell
curl -X POST https://api.docustack.com/api/v1/documents/batch \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "schema_id": "YOUR_SCHEMA_ID",
    "organization_id": "YOUR_ORG_ID",
    "documents": [
      { "ref": "https://example.com/invoice-001.pdf", "name": "Invoice 001" },
      { "ref": "https://example.com/invoice-002.pdf", "name": "Invoice 002" },
      { "ref": "s3://my-bucket/invoices/003.pdf",     "name": "Invoice 003" }
    ]
  }'
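
The same batch request can be assembled programmatically when the document list comes from a database or a directory listing. The sketch below mirrors the curl payload with the standard library; the helper name `build_batch_request` is hypothetical.

```python
import json
import urllib.request

BATCH_URL = "https://api.docustack.com/api/v1/documents/batch"


def build_batch_request(api_key, schema_id, organization_id, documents):
    """Build the POST request for a batch of (ref, name) pairs."""
    payload = {
        "schema_id": schema_id,
        "organization_id": organization_id,
        "documents": [{"ref": ref, "name": name} for ref, name in documents],
    }
    return urllib.request.Request(
        BATCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(request)`; the shape of the batch response is not shown in this guide, so the sketch stops at building the request.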

Webhooks

DocuStack sends HMAC-signed HTTP POST requests to your endpoint when document or job events occur. This lets you react to extraction results without polling.

Configure a webhook

In the dashboard, navigate to Settings → Webhooks and add your endpoint URL. DocuStack will sign every delivery with your webhook secret.

Verify the signature

Every webhook request includes an X-DocuStack-Signature header containing an HMAC-SHA256 signature of the raw request body, keyed with your webhook secret.

python
import hashlib
import hmac

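def verify_webhook(body: bytes, secret: str, signature_header: str) -> bool:
    """Return True if signature_header matches the HMAC-SHA256 of body."""
    if not signature_header:
        # A missing or empty header can never be valid.
        return False
    expected = hmac.new(
        secret.encode(),
        body,
        hashlib.sha256,
    ).hexdigest()
    # compare_digest runs in constant time, which guards against timing attacks.
    return hmac.compare_digest(expected, signature_header)

Always verify the signature before processing a webhook. Reject requests with invalid or missing signatures with a 401 status.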
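
To show how the 401 rule might look inside a handler, here is a framework-independent sketch that takes the raw body and header value and returns a status code with the parsed payload. It assumes the header carries a bare hex digest; the function name `handle_delivery` is hypothetical.

```python
import hashlib
import hmac
import json


def handle_delivery(body: bytes, signature_header, secret: str):
    """Return (status_code, payload); payload is None when rejected."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # Compare as bytes so an unexpected non-ASCII header cannot raise TypeError.
    if not signature_header or not hmac.compare_digest(
        expected.encode(), signature_header.encode()
    ):
        return 401, None
    # Parse the JSON only after the signature has been verified.
    return 200, json.loads(body)
```

The signature must be computed over the raw bytes as received. Re-serializing parsed JSON can change whitespace or key order and make a valid signature fail.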

Events

document_completed

Fired when a document has been fully extracted and all field values are available.

json
{
  "event": "document_completed",
  "document_id": "c1a2b3c4d5e6f7g8h9i0",
  "job_id": "c0j1k2l3m4n5o6p7q8r9",
  "organization_id": "org_abc123",
  "schema_id": "sch_xyz789",
  "processed_at": "2024-11-15T14:23:01Z"
}

job_completed

Fired when all documents in an ingestion job have finished processing (some may have failed).

json
{
  "event": "job_completed",
  "job_id": "c0j1k2l3m4n5o6p7q8r9",
  "organization_id": "org_abc123",
  "document_count": 10,
  "processed_count": 9,
  "failed_count": 1,
  "completed_at": "2024-11-15T14:25:44Z"
}

job_failed

Fired when all documents in a job have failed or the job itself encountered a fatal error.

json
{
  "event": "job_failed",
  "job_id": "c0j1k2l3m4n5o6p7q8r9",
  "organization_id": "org_abc123",
  "document_count": 5,
  "failed_count": 5,
  "error_message": "Extraction workflow timed out",
  "failed_at": "2024-11-15T14:30:00Z"
}
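
A receiver typically branches on the event field. The sketch below turns each of the three payload shapes above into a log line and ignores anything it does not recognize; the function name and message formats are illustrative only.

```python
def summarize_event(payload: dict) -> str:
    """Produce a one-line summary for a verified webhook payload."""
    event = payload.get("event")
    if event == "document_completed":
        return f"document {payload['document_id']} is ready"
    if event == "job_completed":
        return (
            f"job {payload['job_id']}: {payload['processed_count']}/"
            f"{payload['document_count']} processed, {payload['failed_count']} failed"
        )
    if event == "job_failed":
        return f"job {payload['job_id']} failed: {payload['error_message']}"
    # Unknown events are ignored rather than treated as errors.
    return f"ignored event {event!r}"
```

Note that job_completed can still report failed documents, so failed_count is worth checking even on the success path.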

Retry behaviour

If your endpoint returns a non-2xx status, DocuStack will retry the delivery up to 5 times with exponential backoff (1s, 5s, 30s, 5min, 30min). After 5 failed attempts the delivery is marked as failed and no further retries occur.

Respond with 200 OK as quickly as possible (within 30 seconds). Offload any heavy processing to a background job so your webhook endpoint is always responsive.
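
One way to stay responsive is to acknowledge the delivery immediately and hand the payload to a background worker. The sketch below uses an in-process queue and skips payloads it has already seen, since a retry can arrive after a slow response. The deduplication key is an assumption: the payloads shown above carry no delivery ID, so the example keys on event, job_id, and document_id.

```python
import queue
import threading

work_queue: "queue.Queue[dict]" = queue.Queue()
_seen = set()
_seen_lock = threading.Lock()


def accept_delivery(payload: dict) -> int:
    """Queue a verified payload for background processing and return 200."""
    # Assumed dedupe key; adjust if deliveries carry a unique identifier.
    key = (payload.get("event"), payload.get("job_id"), payload.get("document_id"))
    with _seen_lock:
        if key in _seen:
            return 200  # duplicate of an earlier delivery: acknowledge, do not requeue
        _seen.add(key)
    work_queue.put(payload)
    return 200
```

A worker thread can then call `work_queue.get()` in a loop and do the slow work there. In production a durable queue would be a safer choice than process memory, which is lost on restart.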

Full API Reference

Browse all endpoints with interactive request/response examples, schema definitions, and the ability to try requests directly in the browser.

Open API Reference