What it does
Turns supplier SDS files into structured records instead of plain OCR text blocks.
Integration Guide
SafetyDataSheetAPI converts SDS and MSDS PDFs into structured JSON, XML, and CSV with confidence scores, warnings, and schema versioning for downstream compliance systems.
Who it is for
Integration teams connecting SDS workflows to EHS, ERP, PLM, product stewardship, or shared services systems.
When bulk matters
Use asynchronous bulk queues for supplier backfills, nightly syncs, or multi-file onboarding projects.
What to review first
Authentication, schema version, exception routing, residency requirements, and downstream field mapping.
These answer blocks are written for procurement, engineering, and implementation teams that need the short version before reading the full reference.
Each successful request returns a request ID, confidence score, warning metadata, and synchronized JSON, XML, and CSV outputs built from the same extraction run.
Production requests use a bearer token in the Authorization header. Keys are scoped by environment, throughput tier, and deployment model.
Yes. OCR-assisted extraction supports scanned SDS files, and multilingual SDS inputs can be supplied with an optional language_hint.
Low-certainty extraction surfaces through the confidence score plus warnings so teams can route exceptions to QA or regulatory review instead of silently accepting weak data.
Use bulk plus webhooks for supplier migrations, nightly refreshes, and queued ingestion where synchronous single-file calls would create unnecessary wait states.
Schema versioning lets teams pin output contracts, plan mapping changes deliberately, and roll forward without breaking downstream ERP, EHS, or PLM workflows.
A minimal production-oriented integration has four parts: authenticate, upload the SDS, store the structured outputs, and route exceptions deliberately.
Send the bearer token in the Authorization header and upload the file to POST /extract-sds:
curl -X POST "https://api.safetydatasheetapi.com/v1/extract-sds" \
-H "Authorization: Bearer <api_key>" \
-F "file=@acetone-sds.pdf" \
-F "language_hint=en" \
-F "schema_version=2026-01"
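For teams calling the API from application code rather than a shell, the same request can be sketched in Python with only the standard library. The URL, form field names, and header come from the curl example above; the helper names (`encode_multipart`, `build_extract_request`, `extract_sds`) are illustrative and not part of any official client, and the `application/pdf` content type for the file part is an assumption.

```python
import json
import os
import uuid
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.safetydatasheetapi.com/v1/extract-sds"


def encode_multipart(fields, filename, file_bytes):
    """Encode text fields plus one file part as multipart/form-data.

    Returns a (body, content_type) tuple.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # File part; application/pdf is an assumption about what the API expects.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/pdf\r\n\r\n"
        ).encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_extract_request(api_key, filename, file_bytes,
                          language_hint="en", schema_version="2026-01"):
    """Build (but do not send) the POST /extract-sds request."""
    body, content_type = encode_multipart(
        {"language_hint": language_hint, "schema_version": schema_version},
        filename,
        file_bytes,
    )
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
    )


def extract_sds(api_key, path):
    """Upload one SDS file and return the decoded JSON response envelope."""
    with open(path, "rb") as fh:
        request = build_extract_request(api_key, os.path.basename(path), fh.read())
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Separating request construction from sending keeps the network call out of unit tests: `build_extract_request` can be inspected without contacting the API.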
The production API, public trial endpoints, and analytics collector serve different purposes. Keep evaluation traffic and production ingestion logic separate.
| Method | Path | Purpose | Use when |
|---|---|---|---|
| POST | /extract-sds | Extract structured SDS data from a single document. | Primary synchronous production ingestion. |
| POST | /extract-sds/bulk | Submit multiple SDS files and receive asynchronous results. | Queued ingestion, backfills, and large supplier batches. |
| POST | /api/sample-upload | Website sample upload endpoint with Turnstile and rate limiting. | Controlled public evaluation, not production integration. |
| POST | /api/sample-output-access | Captures lead details before copy or download from the public sample output viewer. | Website evaluation flow only. |
| POST | /api/contact | Enterprise implementation inquiry endpoint. | Requesting onboarding, architecture, or procurement review. |
| POST | /api/analytics-event | Public CTA analytics collector. | Tracking commercial page interactions, not extraction workflows. |
Authentication header
Authorization: Bearer <api_key>
Every successful extraction response is designed to be both machine-usable and reviewable. Teams typically store the raw response envelope alongside their preferred format.
| Artifact | What it contains | Why teams keep it |
|---|---|---|
| JSON output | Normalized SDS fields and nested sections for application logic. | Primary contract for EHS, ERP, PLM, and stewardship integrations. |
| XML output | Structured representation for systems that prefer XML interchange. | Supports legacy enterprise interfaces and governed data exchange patterns. |
| CSV output | Flat field-value export generated from the same extraction run. | Useful for review operations, audits, or spreadsheet-based reconciliation. |
| confidence_score | Overall extraction confidence for the request. | Supports automated acceptance thresholds and exception routing. |
| warnings | Signals about ambiguous text, low OCR quality, or incomplete fields. | Keeps uncertain output visible instead of silently blending into production data. |
| schema_version | Versioned contract marker for the structured response model. | Lets downstream teams pin mappings and manage upgrades deliberately. |
| request_id | Trace identifier for the extraction request. | Helps support, retries, and audit-oriented reconciliation. |
{
  "request_id": "req_8hy2n3",
  "output_formats": ["JSON", "XML", "CSV"],
  "outputs": {
    "json": {
      "schema_version": "2026-01",
      "product_identification": {
        "product_name": "Acetone",
        "supplier_name": "Example Chemicals Ltd."
      },
      "hazards_identification": {
        "ghs_classification": ["Flammable Liquid - Category 2"],
        "h_statements": ["H225 Highly flammable liquid and vapour"]
      },
      "transport_information": {
        "un_number": "UN1090"
      },
      "exposure_controls_ppe": {
        "ppe": ["Protective gloves", "Eye protection"]
      },
      "revision_metadata": {
        "revision_date": "2024-01-15"
      }
    },
    "xml": "<sds_extraction>...</sds_extraction>",
    "csv": "field,value"
  },
  "confidence_score": 0.97,
  "warnings": [],
  "processing_ms": 2410
}
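One way to act on `confidence_score` and `warnings` is a small routing function that decides whether a response can be accepted automatically or should go to a review queue. The sketch below assumes the envelope shape shown in the sample response; the 0.90 threshold and the route names are hypothetical placeholders that each team would set for itself.

```python
# Hypothetical acceptance threshold; tune it to your own risk tolerance.
REVIEW_THRESHOLD = 0.90


def route_extraction(envelope, threshold=REVIEW_THRESHOLD):
    """Return a routing decision for one extraction response envelope.

    Route names are illustrative:
      "accept"                - score meets the threshold and no warnings
      "review_low_confidence" - score missing or below the threshold
      "review_warnings"       - score is fine but warnings are present
    """
    score = envelope.get("confidence_score")
    warnings = envelope.get("warnings") or []

    # A missing score is treated as low confidence rather than accepted.
    if score is None or score < threshold:
        return "review_low_confidence"
    if warnings:
        return "review_warnings"
    return "accept"


sample = {"request_id": "req_8hy2n3", "confidence_score": 0.97, "warnings": []}
decision = route_extraction(sample)  # "accept" for the sample response
```

Checking the score before the warnings means a weak extraction is never auto-accepted just because the warning list happens to be empty.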
Bulk ingestion is the safer operating model once SDS extraction becomes a queueing problem instead of a user-click problem.
| Flow | Best for | Operational pattern |
|---|---|---|
| Synchronous single-file extraction | Interactive validation, low-volume upload flows, and immediate review. | Client uploads one file and receives structured output directly in the response. |
| Bulk extraction | Supplier migrations, nightly refreshes, and large backlog cleanup. | Client submits multiple files, tracks the batch, and consumes results asynchronously. |
| Webhook completion | Event-driven systems that do not want polling loops. | Receive completion and warning summaries as extraction jobs finish. |
{
  "event": "extraction.completed",
  "request_id": "req_8hy2n3",
  "status": "success",
  "confidence_score": 0.97,
  "warnings": []
}
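A webhook receiver can stay small if it only parses the event and hands back a decision for the rest of the pipeline. The following sketch relies solely on the fields visible in the sample payload; other event names and status values are not documented here, so anything unrecognized is ignored or sent to review. The function name and action labels are illustrative, and transport concerns such as signature verification are left out because this page does not describe them.

```python
import json


def handle_webhook(raw_body):
    """Parse a webhook payload and return {"action": ..., "request_id": ...}.

    Actions are illustrative labels:
      "ignore" - event type is not extraction.completed
      "review" - job did not succeed or carries warnings
      "accept" - job succeeded with no warnings
    """
    event = json.loads(raw_body)
    request_id = event.get("request_id")

    if event.get("event") != "extraction.completed":
        return {"action": "ignore", "request_id": request_id}

    failed = event.get("status") != "success"
    has_warnings = bool(event.get("warnings"))
    action = "review" if (failed or has_warnings) else "accept"
    return {"action": action, "request_id": request_id}
```

Returning the `request_id` with every decision makes it straightforward to deduplicate repeated deliveries and to reconcile webhook events against stored responses.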
Production throughput, queue sizing, and SLA-backed lanes are aligned during implementation rather than hard-coded into the public docs.
Error handling is only one part of operational safety. The other part is deciding when apparently successful output still deserves human review.
| Status | Code | Meaning | Recommended response |
|---|---|---|---|
| 400 | invalid_document | File is unsupported or not parseable as an SDS input. | Reject the file and request a new source document. |
| 401 | unauthorized | Missing or invalid API token. | Rotate or correct credentials before retrying. |
| 422 | low_text_quality | Extraction is incomplete because text quality is too weak. | Route the file to manual review or request a cleaner source copy. |
| 429 | rate_limited | Plan throughput or endpoint policy was exceeded. | Retry with backoff or move the workload into the agreed bulk lane. |
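The recommended responses in the table can be encoded as a lookup so that client code treats each error consistently. This is a sketch under assumptions: it takes the HTTP status and the error code string as inputs without assuming how the code is carried in the error body, the action labels are illustrative, and the backoff parameters (1 second base, 60 second cap) are placeholders rather than documented limits.

```python
import random

# Status/code pairs from the error table, mapped to illustrative action labels.
ERROR_ACTIONS = {
    (400, "invalid_document"): "reject_and_request_new_source",
    (401, "unauthorized"): "fix_credentials",
    (422, "low_text_quality"): "manual_review",
    (429, "rate_limited"): "retry_with_backoff",
}


def action_for(status, code):
    """Return the handling action for an error, or "escalate" if unknown."""
    return ERROR_ACTIONS.get((status, code), "escalate")


def backoff_ceiling(attempt, base=1.0, cap=60.0):
    """Upper bound in seconds for retry number `attempt` (0-based), doubling each time."""
    return min(cap, base * (2 ** attempt))


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Pick a random delay between 0 and the ceiling ("full jitter")."""
    return random.uniform(0.0, backoff_ceiling(attempt, base, cap))
```

Only `rate_limited` maps to a retry here; retrying a 400 or 401 unchanged would repeat the same failure, so those are routed to a fix instead.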
Record the request_id even when the extraction looks clean.
These questions mirror the schema markup and are phrased for assistant-style extraction.
All production API requests use a bearer token in the Authorization header. Keys are scoped by environment, throughput tier, and deployment model.
Use bulk extraction and webhook completion for supplier backlogs, nightly refresh jobs, and queued ingestion where synchronous single-file requests would create unnecessary wait states.
Low-certainty extraction surfaces through the confidence score plus warnings so teams can route documents or specific fields into QA, regulatory review, or exception handling queues.
Yes. The API supports scanned SDS files with OCR-assisted extraction and multilingual SDS inputs, with warning metadata available when text quality or layout affects confidence.
Schema versioning lets integration teams pin output contracts, plan downstream mappings deliberately, and roll forward to new field models without breaking existing ERP, EHS, or PLM workflows.
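Pinning can be enforced on the client side by comparing the `schema_version` in each response against the version the downstream mapping was built for. The sketch below assumes the version sits at `outputs.json.schema_version`, as in the sample response; the constant, exception, and function names are hypothetical.

```python
# The contract version the downstream field mapping was written against.
PINNED_SCHEMA_VERSION = "2026-01"


class SchemaVersionMismatch(Exception):
    """Raised when a response does not match the pinned schema version."""


def check_schema_version(envelope, pinned=PINNED_SCHEMA_VERSION):
    """Return the JSON output if its schema_version matches the pin.

    Raises SchemaVersionMismatch otherwise, so that records with an
    unexpected contract never reach the downstream mapping silently.
    """
    json_output = envelope.get("outputs", {}).get("json", {})
    actual = json_output.get("schema_version")
    if actual != pinned:
        raise SchemaVersionMismatch(
            f"expected schema_version {pinned!r}, got {actual!r}"
        )
    return json_output
```

Failing loudly on a mismatch turns a schema upgrade into an explicit, planned change to the pin rather than a silent change in field shapes.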
Recent changes that affect implementation, extraction review, or evaluation paths.
This docs hub is strongest when it is read together with the trust and evidence pages below.