Document Type Standards
This file defines how document types are categorized for processing.
Major Types
- Policy → AU or state-level strategies, compacts, policies.
- Law / Legislation → Statutes, reforms, by-laws.
- Treaty Body Reports → OHCHR, CRC, CESCR, CCPR, etc.
- UPR → Universal Periodic Review documents.
- Observations → Concluding observations by treaty bodies.
- Recommendations → From committees or commissions.
- Research / Reports → UNICEF, ACERWC, ACHPR publications.
Processing Implications
- All documents are normalized to plain text.
- Metadata field "doc_type" should capture the type (if identified).
- Future processors may add automatic classification rules.
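As a rough illustration of the "doc_type" metadata field, the major types listed above could be modelled as an enumeration. This is only a sketch: the `DocType` class, the `build_metadata` helper, the `source` key, and the choice to store `None` when no type has been identified are all assumptions, not part of the existing processors.

```python
from enum import Enum
from typing import Optional


class DocType(str, Enum):
    """Hypothetical enumeration mirroring the major types listed in this file."""
    POLICY = "Policy"
    LAW = "Law / Legislation"
    TREATY_BODY_REPORTS = "Treaty Body Reports"
    UPR = "UPR"
    OBSERVATIONS = "Observations"
    RECOMMENDATIONS = "Recommendations"
    RESEARCH = "Research / Reports"


def build_metadata(source: str, doc_type: Optional[DocType] = None) -> dict:
    """Build a metadata record; "doc_type" stays None if the type is unknown."""
    return {
        "source": source,  # assumed field name, for illustration only
        "doc_type": doc_type.value if doc_type is not None else None,
    }


if __name__ == "__main__":
    print(build_metadata("example_review.pdf", DocType.UPR))
    print(build_metadata("unclassified.txt"))
```

Keeping the allowed values in one place would make it easier for a future automatic classifier to emit only recognised types.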
File Formats
- .pdf → primary format, processed via pdf_to_text.py.
- .docx → supported, via docx_to_text.py.
- .html → supported, via html_to_text.py.
- .txt → may be ingested directly.
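The format list above amounts to a lookup from file extension to converter script. A minimal sketch of that lookup follows; the `CONVERTERS` table and `converter_for` function are hypothetical names, and the case-insensitive matching and the error on unknown extensions are assumptions rather than documented behaviour.

```python
from pathlib import Path
from typing import Optional

# Extension -> converter script, as listed in this file.
# None marks formats that may be ingested directly.
CONVERTERS = {
    ".pdf": "pdf_to_text.py",
    ".docx": "docx_to_text.py",
    ".html": "html_to_text.py",
    ".txt": None,
}


def converter_for(path: str) -> Optional[str]:
    """Return the converter script name for a file, or None for direct ingestion."""
    suffix = Path(path).suffix.lower()  # assumption: extensions match case-insensitively
    if suffix not in CONVERTERS:
        # assumption: unlisted formats are rejected rather than silently skipped
        raise ValueError(f"Unsupported file format: {suffix or '(none)'}")
    return CONVERTERS[suffix]


if __name__ == "__main__":
    for name in ("report.pdf", "policy.docx", "page.html", "notes.txt"):
        print(name, "->", converter_for(name) or "ingested directly")
```

A table-driven lookup like this keeps the supported formats in one place, so adding a format would mean adding one entry rather than another branch.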