DigitalChild Flask API¶
REST API backend for serving DigitalChild data to the Phase 4 research dashboard.
Quick Start¶
Installation¶
# Activate virtual environment
source .LittleRainbow/bin/activate
# Install dependencies
pip install -r requirements.txt -r api_requirements.txt
# Create .env file from template
cp .env.example .env
# Edit .env and configure as needed
Running the Development Server¶
The API will be available at http://127.0.0.1:5000
Testing¶
# Test health check
curl http://127.0.0.1:5000/api/health
# Test system info
curl http://127.0.0.1:5000/api/info
API Endpoints¶
Health & System Info¶
GET /api/health - Returns API health status - Use for monitoring and load balancer health checks
GET /api/info - Returns system information and data statistics - Includes document counts, scorecard coverage, and data freshness
Documents¶
GET /api/documents
- List documents with filtering and pagination
- Query parameters:
- country: Filter by country name
- region: Filter by region
- source: Filter by source (e.g., "au_policy", "upr")
- doc_type: Filter by document type
- tags: Comma-separated list of tags
- year: Filter by specific year
- year_min, year_max: Filter by year range
- page: Page number (default: 1)
- per_page: Items per page (default: 20, max: 100)
- sort_by: Field to sort by (default: "last_processed")
- sort_order: "asc" or "desc" (default: "desc")
Example:
GET /api/documents/:id - Get detailed information for a single document - Returns full document metadata with tags_history - Cached for 15 minutes
Scorecard¶
GET /api/scorecard
- List all countries in scorecard with summary
- Query parameters:
- region: Filter by region (optional)
- page: Page number (default: 1)
- per_page: Items per page (default: 20, max: 100)
Example:
GET /api/scorecard/:country - Get full scorecard details for a specific country - Returns all 10 indicators with sources - Cached for 1 hour
Example:
GET /api/scorecard/indicators/statistics - Get statistics about indicator values across all countries - Returns value distribution for each indicator - Cached for 1 hour
Tags¶
GET /api/tags
- Get tag frequency analysis across documents
- Query parameters:
- version: Tag version (e.g., "tags_v3", "digital", "queerai")
- country: Filter by country name
- region: Filter by region
- year: Filter by specific year
- year_min, year_max: Filter by year range
Example:
GET /api/tags/versions - Get list of available tag versions - Returns array of version identifiers
Example:
Timeline¶
GET /api/timeline/tags
- Get temporal analysis of tags over time (year × tag matrix)
- Query parameters:
- version: Tag version (optional)
- year_min, year_max: Filter by year range (optional)
- country: Filter by country (optional)
- region: Filter by region (optional)
Example:
Export¶
GET /api/export - List available export formats - Returns format ID, filename, and description for each format
Example:
GET /api/export/:format
- Download dataset in CSV format
- Available formats:
- scorecard_summary: Scorecard data for all countries
- tags_summary: Tag frequency across all documents
- documents_list: Complete document list with metadata
- Query parameters (for tags_summary):
- version: Tag version (optional)
Example:
curl "http://localhost:5000/api/export/scorecard_summary" -o scorecard.csv
curl "http://localhost:5000/api/export/tags_summary?version=tags_v3" -o tags.csv
All CSV exports include SPDX license headers (CC-BY-4.0) for data attribution.
Implementation Status¶
Week 1: Foundation ✅ COMPLETE¶
- ✅ API directory structure created
- ✅ Configuration management (development, production, testing)
- ✅ Flask extensions (CORS, Caching, Rate Limiting)
- ✅ Flask app factory pattern
- ✅ Metadata service layer with caching
- ✅ Scorecard service layer (works with pandas DataFrames)
- ✅ Health check routes
- ✅ Standard response formatting and error handling
- ✅ Request validators
- ✅ API requirements file
- ✅ Environment configuration template
- ✅ Development and production entry points
Week 2: Core APIs ✅ COMPLETE¶
- ✅ Documents API (list with filters, detail)
- ✅ Scorecard API (summary, country detail, statistics)
- ✅ Caching decorators (15min documents, 1hr scorecard)
- ✅ Request validation for all parameters
- ✅ Pagination support (configurable page size)
- ✅ Sorting support (any field, asc/desc)
- ✅ 104 test cases written (100% pass rate)
- ✅ All 14 endpoints working and tested
Week 3: Extended APIs ✅ COMPLETE¶
- ✅ Tags API (frequency analysis, version management)
- GET /api/tags (with filters)
- GET /api/tags/versions
- ✅ Timeline API (temporal analysis)
- GET /api/timeline/tags (year × tag matrix)
- ✅ Export API (CSV downloads)
- GET /api/export (list formats)
- GET /api/export/:format (download CSV)
- ✅ SPDX license headers in CSV exports
- ✅ 31 test cases written for Week 3 endpoints
- ✅ All 14 endpoints now working (76 total tests passing)
Week 4: Authentication & Rate Limiting ✅ COMPLETE¶
- ✅ API key authentication middleware
@require_api_keydecorator for protected endpoints@optional_api_keyfor flexible authentication- X-API-Key header validation
- Development mode auto-allow for testing
- ✅ Rate limiting implementation
- Dynamic limits based on authentication status
- Public: 100 requests/hour default
- Authenticated: 1000 requests/hour default
- Custom limits for expensive operations (exports: 20/200 per hour)
- Search operations: 200/2000 per hour
- ✅ Flask-Limiter integration
- Custom rate limit key function (API key or IP)
- Redis storage for production
- Memory storage for development
- ✅ Applied to key endpoints
- Documents list with search rate limits
- Export downloads with strict limits
- Optional authentication throughout
- ✅ 28 test cases for authentication and rate limiting
- ✅ All 104 tests passing (100% success rate)
Week 5: Production Ready ✅ COMPLETE¶
- ✅ Docker deployment
- Multi-stage Dockerfile with security best practices
- docker-compose.yml with Redis and Nginx
- Health checks and non-root user
- ✅ Nginx configuration
- Reverse proxy setup
- SSL/TLS configuration
- Security headers
- Gzip compression
- ✅ Production deployment guide
- Complete setup instructions
- Docker and manual deployment options
- SSL certificate setup (Let's Encrypt)
- Monitoring and logging configuration
- Security checklist
- Troubleshooting guide
- ✅ Configuration management
- Environment-based settings
- Production validation
- API key management
- ✅ Ready for production deployment
API Features¶
- ✅ Standard JSON response format
- ✅ Error handling with custom exceptions
- ✅ File modification time caching for metadata
- ✅ Pandas DataFrame support for scorecard data
- ✅ Environment-based configuration
- ✅ CORS support for frontend integration
- ✅ Rate limiting ready (in-memory for dev, Redis for prod)
- ✅ Logging with configurable levels
Architecture¶
Directory Structure¶
api/
├── __init__.py # Package initialization
├── app.py # Flask app factory
├── config.py # Configuration classes
├── extensions.py # Flask extensions init
├── routes/ # API endpoints
│ ├── health.py # Health & info endpoints
│ └── ... # (More routes in Week 2+)
├── services/ # Business logic layer
│ ├── metadata_service.py # Document metadata
│ ├── scorecard_service.py # Scorecard data
│ └── ... # (More services in Week 2+)
├── middleware/ # Request/response processing
│ └── error_handlers.py # Exception handling
└── utils/ # Helper functions
├── response.py # Response formatting
└── validators.py # Input validation
Service Layer Pattern¶
Services wrap existing processors/ modules with API-friendly formatting:
# Example: metadata_service.py
from processors.logger import get_logger
def get_documents(filters, page, per_page):
"""Load metadata.json, apply filters, paginate"""
metadata = load_metadata() # With file mtime caching
docs = metadata.get("documents", [])
# Apply filters...
# Paginate...
return {"documents": [...], "pagination": {...}}
Response Format¶
All endpoints return standardized JSON:
Success:
Error:
{
"status": "error",
"error": {
"code": "NOT_FOUND",
"message": "Resource not found",
"details": {}
},
"timestamp": "2026-01-25T09:13:43Z"
}
Configuration¶
Environment variables (see .env.example):
FLASK_ENV: development | production | testingSECRET_KEY: Flask secret key (required in production)API_KEYS: Comma-separated API keys (required in production)CORS_ORIGINS: Allowed CORS originsCACHE_TYPE: SimpleCache (dev) | RedisCache (prod)METADATA_FILE: Path to metadata.jsonSCORECARD_FILE: Path to scorecard_main.xlsx
Phase 4 API: COMPLETE ✅¶
All 5 weeks of the Phase 4 API implementation are complete:
- ✅ Week 1: Foundation (app factory, config, extensions, middleware)
- ✅ Week 2: Core APIs (documents, scorecard endpoints)
- ✅ Week 3: Extended APIs (tags, timeline, exports)
- ✅ Week 4: Authentication & rate limiting
- ✅ Week 5: Production deployment ready
Final Statistics: - 14 REST endpoints operational - 104 integration tests passing (100% success rate) - Authentication: API key based with flexible decorators - Rate limiting: Dynamic limits (100-2000 req/hr based on auth) - Deployment: Docker + docker-compose + Nginx ready - Documentation: Complete API docs + production guide
Future Enhancements¶
Optional improvements for future iterations:
API Documentation¶
- Swagger/OpenAPI specification
- Interactive API explorer at /api/docs
- Auto-generated client libraries
Advanced Features¶
- GraphQL endpoint for flexible queries
- Webhook support for data updates
- Batch operations API
- API versioning (v2)
Performance¶
- Database integration (PostgreSQL)
- Full-text search (Elasticsearch)
- CDN integration for exports
- Query result streaming
Analytics¶
- API usage analytics dashboard
- Per-endpoint performance metrics
- User behavior tracking
- Cost per API call analysis
Security¶
- OAuth 2.0 / JWT authentication
- IP whitelisting
- Request signature validation
- DDoS protection (Cloudflare integration)
Production Deployment¶
Using Gunicorn¶
# Install production dependencies
pip install -r api_requirements.txt
# Set environment
export FLASK_ENV=production
export SECRET_KEY=your-secret-key
export API_KEYS=key1,key2,key3
# Run with gunicorn
gunicorn -w 4 -b 0.0.0.0:5000 wsgi:app
Using Docker¶
# Build image
docker build -t digitalchild-api .
# Run container
docker run -p 5000:5000 --env-file .env digitalchild-api
Development Notes¶
- Requires Python 3.12+
- All data files must exist before starting API
- Run
python init_project.pyif metadata.json doesn't exist - Services use file modification time caching for efficiency
- Scorecard service works with pandas DataFrames from
processors/scorecard.py - Always run from project root for imports to work correctly
Troubleshooting¶
ImportError: No module named 'api' - Make sure you're running from the project root directory
FileNotFoundError: metadata.json
- Run python init_project.py to create required files
KeyError: 'Region' - Scorecard columns use "Region - Broad" not "Region" - Service layer handles this mapping
TypeError: '<' not supported between instances of 'NoneType' and 'str' - Fixed in metadata_service.py by converting None to "unknown" - All dictionary keys must be non-None for JSON serialization