Use Case
GHS Classification Extraction Built for Compliance Teams
Field-level extraction guide for Section 2 (Hazard Identification) of a Safety Data Sheet (SDS), covering classification mapping, H/P statement normalization, and exception controls for regulatory workflows.
Last updated: 2026-03-10
What this workflow solves
- Converts free-text hazard language into normalized GHS classes and statement objects.
- Separates signal word, hazard classes, and precautionary guidance into contract-stable fields.
- Flags ambiguous classifications for reviewer adjudication before downstream release.
Section 2 output contract
| Field | Type | Rule |
|---|---|---|
| signal_word | string | One of `Danger` or `Warning` when present in source. |
| ghs_classification[] | array[object] | Normalized class code, human-readable label, and confidence score. |
| hazard_statements[] | array[object] | H-code with localized text and confidence score. |
| precautionary_statements[] | array[object] | P-code with text and a `group` tag: `prevention`, `response`, `storage`, or `disposal`. |
| pictograms[] | array[string] | Standardized pictogram IDs when declared. |
| classification_warnings[] | array[string] | Raised when class ambiguity or missing qualifiers are detected. |
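As a rough illustration, the contract above could be mirrored in typed Python objects. This is a minimal sketch, not the product's actual schema: the class names (`GhsClass`, `HazardStatement`, `PrecautionaryStatement`, `HazardsIdentification`) and the `contract_errors` helper are hypothetical, and only the field names and allowed values come from the table.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Allowed values taken from the contract table.
SIGNAL_WORDS = {"Danger", "Warning"}
P_GROUPS = {"prevention", "response", "storage", "disposal"}


@dataclass
class GhsClass:
    code: str          # e.g. "Flam. Liq. 2"
    label: str         # human-readable label
    confidence: float


@dataclass
class HazardStatement:
    code: str          # H-code, e.g. "H225"
    text: str
    confidence: float


@dataclass
class PrecautionaryStatement:
    code: str          # P-code, e.g. "P210"
    text: str
    group: str         # one of P_GROUPS


@dataclass
class HazardsIdentification:
    signal_word: Optional[str] = None  # None when absent from the source
    ghs_classification: List[GhsClass] = field(default_factory=list)
    hazard_statements: List[HazardStatement] = field(default_factory=list)
    precautionary_statements: List[PrecautionaryStatement] = field(default_factory=list)
    pictograms: List[str] = field(default_factory=list)
    classification_warnings: List[str] = field(default_factory=list)

    def contract_errors(self) -> List[str]:
        """Return human-readable violations of the rules in the table (hypothetical helper)."""
        errors = []
        if self.signal_word is not None and self.signal_word not in SIGNAL_WORDS:
            errors.append(f"signal_word must be Danger or Warning, got {self.signal_word!r}")
        for p in self.precautionary_statements:
            if p.group not in P_GROUPS:
                errors.append(f"{p.code}: unknown group {p.group!r}")
        return errors
```

Keeping `signal_word` optional reflects the "when present in source" rule: an absent signal word is representable, while an unexpected value is reported as a contract error.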
Classification resolution workflow
- Detect and segment the hazard-identification block from Section 2.
- Map hazard phrases to canonical GHS class codes with locale-aware synonym sets.
- Resolve H/P codes to structured objects and preserve source phrasing for audit trails.
- Compute confidence per extracted item and emit warnings for borderline cases.
- Apply policy checks (required H/P pairs, mandatory signal word, pictogram consistency).
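The code-resolution step in the workflow above can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `resolve_statements` and the `source_text` key are hypothetical, the regular expressions cover only plain UN GHS H/P codes (not EU-specific `EUH` codes), and confidence scoring is left out. The grouping relies on the GHS convention that the first digit of a P-code identifies its group.

```python
import re

# Plain GHS hazard codes (H2xx physical, H3xx health, H4xx environmental),
# optionally followed by up to two letters (e.g. H360FD).
H_CODE = re.compile(r"\bH[2-4]\d{2}[A-Za-z]{0,2}\b")
# Precautionary codes P1xx to P5xx; combined codes such as P305+P351+P338
# are matched as three separate codes.
P_CODE = re.compile(r"\bP[1-5]\d{2}\b")

# First digit of a P-code identifies its group. "general" (P1xx) is not one
# of the four tags in the output contract and would need its own handling.
P_GROUP_BY_DIGIT = {
    "1": "general",
    "2": "prevention",
    "3": "response",
    "4": "storage",
    "5": "disposal",
}


def resolve_statements(block: str):
    """Extract H/P codes line by line, keeping the source line for audit trails."""
    hazards, precautions = [], []
    for raw in block.splitlines():
        line = raw.strip()
        if not line:
            continue
        for code in H_CODE.findall(line):
            hazards.append({"code": code, "source_text": line})
        for code in P_CODE.findall(line):
            precautions.append(
                {"code": code, "group": P_GROUP_BY_DIGIT[code[1]], "source_text": line}
            )
    return hazards, precautions
```

For example, a line reading `P305+P351+P338 IF IN EYES: ...` would yield three precautionary objects, all tagged `response` and all pointing back to the same source line.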
Output example (Section 2)
{
  "schema_version": "2026-01",
  "hazards_identification": {
    "signal_word": "Danger",
    "ghs_classification": [
      { "code": "Flam. Liq. 2", "label": "Flammable Liquid - Category 2", "confidence": 0.99 },
      { "code": "Eye Irrit. 2A", "label": "Eye Irritation - Category 2A", "confidence": 0.97 }
    ],
    "hazard_statements": [
      { "code": "H225", "text": "Highly flammable liquid and vapour", "confidence": 0.99 },
      { "code": "H319", "text": "Causes serious eye irritation", "confidence": 0.96 }
    ],
    "precautionary_statements": [
      { "code": "P210", "text": "Keep away from heat/sparks/open flames", "group": "prevention" }
    ],
    "pictograms": ["GHS02", "GHS07"],
    "classification_warnings": []
  }
}
Validation controls usually enforced
| Control | Purpose | Typical threshold |
|---|---|---|
| Class confidence gate | Route uncertain classes for review. | <0.85 flagged |
| H/P pairing check | Detect missing statement groups. | 100% required pair coverage |
| Signal-word consistency | Prevent contradictory hazard messaging. | Must match the signal word of the most severe class (`Danger` takes precedence over `Warning`) |
| Pictogram alignment | Ensure icon set is compatible with class list. | No unresolved mismatch |
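Three of the controls in the table above (confidence gate, signal-word consistency, pictogram alignment) might be expressed roughly as below. This is a sketch only: `CLASS_RULES` is a tiny illustrative subset of class-to-label-element mappings, the `validate` function and warning strings are hypothetical, and real pictogram alignment also has to respect GHS precedence rules that this subset check ignores. The H/P pairing check is omitted because the required pairs depend on policy.

```python
# Illustrative subset: signal word and pictogram associated with each class.
CLASS_RULES = {
    "Flam. Liq. 2": {"signal_word": "Danger", "pictogram": "GHS02"},
    "Flam. Liq. 3": {"signal_word": "Warning", "pictogram": "GHS02"},
    "Eye Irrit. 2A": {"signal_word": "Warning", "pictogram": "GHS07"},
}
SEVERITY = {"Warning": 1, "Danger": 2}
CONFIDENCE_GATE = 0.85  # threshold from the controls table


def validate(section: dict) -> list:
    """Return warning strings for one hazards_identification object."""
    warnings = []
    classes = section.get("ghs_classification", [])

    # Class confidence gate: anything below the threshold goes to review.
    for c in classes:
        if c["confidence"] < CONFIDENCE_GATE:
            warnings.append(f"low_confidence:{c['code']}")

    known = [CLASS_RULES[c["code"]] for c in classes if c["code"] in CLASS_RULES]
    if known:
        # Signal-word consistency: the most severe class decides.
        expected = max((r["signal_word"] for r in known), key=SEVERITY.__getitem__)
        if section.get("signal_word") != expected:
            warnings.append(f"signal_word_mismatch:expected={expected}")

        # Pictogram alignment: every known class should have its pictogram declared.
        missing = {r["pictogram"] for r in known} - set(section.get("pictograms", []))
        for pic in sorted(missing):
            warnings.append(f"pictogram_missing:{pic}")
    return warnings
```

Classes missing from the lookup table are skipped rather than failed here; a production check would more likely raise its own warning for unmapped classes.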
Failure patterns specific to GHS extraction
- Combined class lines where two hazards share one sentence and one code is missed.
- Locale-specific wording that maps to the wrong hazard code family.
- Legacy supplier phrasing that lists H/P text without explicit codes.
- Scanned table artifacts where pictograms are present but statement text is truncated.
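Two of these patterns, combined lines and statement text without explicit codes, can be illustrated with a reverse lookup from normalized statement text to H-code. This is a minimal sketch, assuming a hypothetical `recover_codes` helper and a three-entry lookup table; a real mapping would cover the full statement list and use fuzzier matching.

```python
import re

# Illustrative subset of standard English hazard statement texts.
H_TEXT_TO_CODE = {
    "highly flammable liquid and vapour": "H225",
    "causes serious eye irritation": "H319",
    "causes skin irritation": "H315",
}


def normalize(text: str) -> str:
    """Lowercase, unify US/UK spelling, and drop punctuation and extra spaces."""
    text = text.lower().replace("vapor", "vapour")
    text = re.sub(r"[^a-z ]+", " ", text)
    return " ".join(text.split())


def recover_codes(line: str):
    """Split a combined line on ';' and '.', then look up each part.

    Returns (codes, unmatched_parts); unmatched parts are candidates
    for a classification warning rather than a silent drop.
    """
    codes, unmatched = [], []
    for part in re.split(r"[;.]", line):
        key = normalize(part)
        if not key:
            continue
        code = H_TEXT_TO_CODE.get(key)
        if code:
            codes.append(code)
        else:
            unmatched.append(part.strip())
    return codes, unmatched
```

Returning the unmatched fragments alongside the recovered codes keeps the second hazard in a combined sentence visible even when it cannot be resolved.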
FAQ
Can this output both code and human-readable hazard text?
Yes. Each hazard and precautionary statement object can carry both the standardized code and its normalized text in the same payload.
How do we handle multilingual hazard statements?
Classification mapping is language-aware: codes are normalized to a single canonical form, and the source-language text is preserved alongside them for traceability.
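As a small illustration of that idea, a per-locale lookup can resolve different language versions of the same statement to one code while returning the original wording untouched. The table below is a hypothetical three-entry sample, and the `normalize_statement` function and `source_locale` key are assumptions made for this sketch.

```python
# Hypothetical per-locale lookup; each entry is the H225 statement text
# in that language, lowercased and without the trailing period.
H_TEXT_BY_LOCALE = {
    "en": {"highly flammable liquid and vapour": "H225"},
    "de": {"flüssigkeit und dampf leicht entzündbar": "H225"},
    "fr": {"liquide et vapeurs très inflammables": "H225"},
}


def normalize_statement(text: str, locale: str) -> dict:
    """Resolve a statement to its code; keep the source text unchanged."""
    key = text.strip().rstrip(".").casefold()
    code = H_TEXT_BY_LOCALE.get(locale, {}).get(key)  # None when unmapped
    return {"code": code, "source_text": text, "source_locale": locale}
```

An unmapped statement comes back with `code` set to `None`, which gives downstream checks a clear signal to raise a warning instead of guessing.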
Can low-confidence classifications be blocked from auto-publish?
Yes. Confidence thresholds and warning-based routing can stop downstream publishing until reviewer approval.
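A routing rule of this kind might look like the sketch below. The `route` function and its return shape are hypothetical; the 0.85 default is the class confidence threshold quoted earlier on this page, and applying the same value to hazard statements is an assumption.

```python
def route(section: dict, threshold: float = 0.85) -> dict:
    """Decide whether a hazards_identification object may auto-publish."""
    # Any existing warning is already a reason to hold the record.
    reasons = list(section.get("classification_warnings", []))

    # Items with a confidence score below the threshold also block publishing.
    for key in ("ghs_classification", "hazard_statements"):
        for item in section.get(key, []):
            if item.get("confidence", 0.0) < threshold:
                reasons.append(f"{key}:{item['code']} below {threshold}")

    return {"decision": "review" if reasons else "publish", "reasons": reasons}
```

Treating a missing confidence value as 0.0 means an unscored item is held for review rather than published by default.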
Related pages
- SDS extraction API
- Accuracy methodology and scoring rules
- Benchmark by language and scan quality
- EHS ingestion architecture
- API docs
Need a Section 2 quality audit on your SDS corpus?
Request a GHS extraction review to map class-level risks before go-live.