Use Case

GHS Classification Extraction Built for Compliance Teams

Field-level extraction guide for SDS Section 2, including classification mapping, H/P statement normalization, and exception controls for regulatory workflows.

Last updated: 2026-03-10

What this workflow solves

  • Converts free-text hazard language into normalized GHS classes and statement objects.
  • Separates signal word, hazard classes, and precautionary guidance into contract-stable fields.
  • Flags ambiguous classifications for reviewer adjudication before downstream release.

Section 2 output contract

Field | Type | Rule
signal_word | string | One of `Danger` or `Warning` when present in source.
ghs_classification[] | array[object] | Normalized class code and human-readable label.
hazard_statements[] | array[object] | H-code with localized text and confidence score.
precautionary_statements[] | array[object] | P-code with grouped prevention/response/storage/disposal tags.
pictograms[] | array[string] | Standardized pictogram IDs when declared.
classification_warnings[] | array[string] | Raised when class ambiguity or missing qualifiers are detected.
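
As a rough illustration, the contract above could be expressed as typed structures with a small shape check. This is a minimal sketch, not the product's schema: the type names and the `contract_errors` helper are hypothetical, and the nested keys (`code`, `text`, `group`) are assumed from the Section 2 output example on this page.

```python
from typing import Literal, TypedDict

# Hypothetical type names; field names follow the contract table.
class HazardStatement(TypedDict):
    code: str          # e.g. "H225"
    text: str
    confidence: float

class PrecautionaryStatement(TypedDict):
    code: str          # e.g. "P210"
    text: str
    group: Literal["prevention", "response", "storage", "disposal"]

P_GROUPS = ("prevention", "response", "storage", "disposal")
ARRAY_FIELDS = (
    "ghs_classification",
    "hazard_statements",
    "precautionary_statements",
    "pictograms",
    "classification_warnings",
)

def contract_errors(section: dict) -> list[str]:
    """Return shape problems found in a hazards_identification object."""
    errors = []
    signal = section.get("signal_word")
    # signal_word is optional, but when present it must be one of two values.
    if signal is not None and signal not in ("Danger", "Warning"):
        errors.append(f"signal_word: unexpected value {signal!r}")
    for field in ARRAY_FIELDS:
        if not isinstance(section.get(field, []), list):
            errors.append(f"{field}: expected an array")
    p_statements = section.get("precautionary_statements", [])
    if isinstance(p_statements, list):
        for stmt in p_statements:
            if stmt.get("group") not in P_GROUPS:
                errors.append(f"{stmt.get('code')}: unknown group {stmt.get('group')!r}")
    return errors
```

A check like this only confirms the payload's shape; it says nothing about whether the classification itself is correct.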

Classification resolution workflow

  1. Detect and segment the hazard-identification block from Section 2.
  2. Map hazard phrases to canonical GHS class codes with locale-aware synonym sets.
  3. Resolve H/P codes to structured objects and preserve source phrasing for audit trails.
  4. Compute confidence per extracted item and emit warnings for borderline cases.
  5. Apply policy checks (required H/P pairs, mandatory signal word, pictogram consistency).
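
Step 3 can be sketched as a small line parser. This is a simplified illustration under stated assumptions, not the production resolver: it assumes one statement per line with an explicit code, it does not handle combined codes such as `P305+P351+P338`, and the `resolve_statements` function and `source_text` field are hypothetical names. The P-code grouping relies on the GHS numbering convention (P1xx general, P2xx prevention, P3xx response, P4xx storage, P5xx disposal).

```python
import re

# H or P followed by three digits, then optional separators and the statement text.
STATEMENT_RE = re.compile(r"^\s*([HP])(\d{3})\b[\s:\-]*(.*)$")

# First digit of a P-code indicates its group under the GHS numbering convention.
P_GROUPS = {
    "1": "general",
    "2": "prevention",
    "3": "response",
    "4": "storage",
    "5": "disposal",
}

def resolve_statements(block: str) -> dict:
    """Turn coded lines from a hazard block into structured statement objects."""
    hazards, precautions = [], []
    for line in block.splitlines():
        match = STATEMENT_RE.match(line)
        if not match:
            continue  # not a coded statement line (e.g. the signal word line)
        kind, digits, text = match.groups()
        item = {
            "code": kind + digits,
            "text": text.strip(),
            "source_text": line.strip(),  # original phrasing kept for audit trails
        }
        if kind == "H":
            hazards.append(item)
        else:
            item["group"] = P_GROUPS.get(digits[0], "unknown")
            precautions.append(item)
    return {"hazard_statements": hazards, "precautionary_statements": precautions}
```

For example, a block containing `H225 Highly flammable liquid and vapour` and `P210: Keep away from heat/sparks/open flames` yields one hazard statement and one precautionary statement tagged `prevention`. Confidence scoring (step 4) is left out because it depends on the extraction model.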

Output example (Section 2)

{
  "schema_version": "2026-01",
  "hazards_identification": {
    "signal_word": "Danger",
    "ghs_classification": [
      { "code": "Flam. Liq. 2", "label": "Flammable Liquid - Category 2", "confidence": 0.99 },
      { "code": "Eye Irrit. 2A", "label": "Eye Irritation - Category 2A", "confidence": 0.97 }
    ],
    "hazard_statements": [
      { "code": "H225", "text": "Highly flammable liquid and vapour", "confidence": 0.99 },
      { "code": "H319", "text": "Causes serious eye irritation", "confidence": 0.96 }
    ],
    "precautionary_statements": [
      { "code": "P210", "text": "Keep away from heat/sparks/open flames", "group": "prevention" }
    ],
    "pictograms": ["GHS02", "GHS07"],
    "classification_warnings": []
  }
}

Validation controls usually enforced

Control | Purpose | Typical threshold
Class confidence gate | Route uncertain classes for review. | <0.85 flagged
H/P pairing check | Detect missing statement groups. | 100% required pair coverage
Signal-word consistency | Prevent contradictory hazard messaging. | Must match highest-risk class
Pictogram alignment | Ensure icon set is compatible with class list. | No unresolved mismatch
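
Three of these controls can be sketched in a few lines. This is an illustrative sketch under stated assumptions: `CLASS_RULES` is a hypothetical two-entry lookup covering only the classes in the output example, `validate_section` and the warning strings are invented for illustration, and GHS pictogram precedence rules are not modelled. The H/P pairing check is omitted because the required pairs are a policy decision.

```python
REVIEW_THRESHOLD = 0.85  # classes below this confidence are routed to review

# Hypothetical lookup: expected signal word and pictogram per class code.
CLASS_RULES = {
    "Flam. Liq. 2": {"signal_word": "Danger", "pictogram": "GHS02"},
    "Eye Irrit. 2A": {"signal_word": "Warning", "pictogram": "GHS07"},
}

def validate_section(section: dict) -> list[str]:
    """Return warning strings for a hazards_identification object."""
    warnings = []
    classes = section.get("ghs_classification", [])

    # Class confidence gate.
    for cls in classes:
        if cls["confidence"] < REVIEW_THRESHOLD:
            warnings.append(f"low_confidence:{cls['code']}")

    known = [CLASS_RULES[c["code"]] for c in classes if c["code"] in CLASS_RULES]

    # Signal-word consistency: "Danger" takes precedence over "Warning".
    if known:
        expected = "Danger" if any(r["signal_word"] == "Danger" for r in known) else "Warning"
        if section.get("signal_word") != expected:
            warnings.append(f"signal_word_mismatch:expected {expected}")

    # Pictogram alignment: flag pictograms implied by the classes but not declared.
    declared = set(section.get("pictograms", []))
    for pictogram in sorted({r["pictogram"] for r in known} - declared):
        warnings.append(f"missing_pictogram:{pictogram}")

    return warnings
```

Run against the Section 2 output example on this page, this sketch returns an empty list; dropping `GHS07` from `pictograms` would yield `missing_pictogram:GHS07`.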

Failure patterns specific to GHS extraction

  • Combined class lines where two hazards share one sentence and one code is missed.
  • Locale-specific wording that maps to the wrong hazard code family.
  • Legacy supplier phrasing that lists H/P text without explicit codes.
  • Scanned table artifacts where pictograms are present but statement text is truncated.
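
For the combined-line and missing-code patterns, one possible mitigation is a reverse lookup from normalized statement text to H-code. The sketch below is a simplified assumption, not the product's matcher: `H_TEXT` is a hypothetical two-entry table, `infer_codes` is an invented helper, and matching is plain substring search after light normalization.

```python
import re

# Hypothetical reverse lookup: normalized statement text -> H-code.
H_TEXT = {
    "highly flammable liquid and vapour": "H225",
    "causes serious eye irritation": "H319",
}

def normalize(text: str) -> str:
    """Lowercase, unify vapor/vapour spelling, drop punctuation, collapse spaces."""
    text = text.lower().replace("vapor", "vapour")
    text = re.sub(r"[^a-z ]+", " ", text)
    return " ".join(text.split())

def infer_codes(line: str) -> list[str]:
    """Return every H-code whose standard text appears in the line."""
    norm = normalize(line)
    return [code for phrase, code in H_TEXT.items() if phrase in norm]
```

A combined legacy line such as `Highly flammable liquid and vapor; causes serious eye irritation.` would resolve to both `H225` and `H319` here. Codes inferred this way carry more risk than explicit ones, so routing them to review is a reasonable default.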

FAQ

Can this output both code and human-readable hazard text?

Yes. Each hazard and precautionary statement object can carry both the standardized code and the normalized text in the same payload.

How do we handle multilingual hazard statements?

Classification mapping is language-aware. Codes are normalized while preserving source-language text for traceability.

Can low-confidence classifications be blocked from auto-publish?

Yes. Confidence thresholds and warning-based routing can stop downstream publishing until reviewer approval.


Need a Section 2 quality audit on your SDS corpus? Request a GHS extraction review to map class-level risks before go-live.