Installation¶
This guide walks you through installing DigitalChild on your system.
Prerequisites¶
Required¶
- Python 3.12 - Modern Python features required
- pip - Python package installer
- Git - Version control (for cloning the repository)
- 1GB+ disk space - For code and a small dataset
- Internet connection - For scraping documents
Optional¶
- 10GB+ disk space - For large document collections
- Virtual environment tool - venv, virtualenv, or conda
Installation Steps¶
1. Clone the Repository¶
2. Set Up Virtual Environment¶
Why a virtual environment?
Virtual environments isolate project dependencies, preventing conflicts with other Python projects on your system.
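A minimal sketch of this step using the standard library `venv` module. The directory name `.venv` is an assumption, not something this guide prescribes, and the activation line applies to Linux/macOS shells.

```shell
# Create a virtual environment in .venv (directory name is an assumption)
python3 -m venv .venv

# Activate it in the current shell (Linux/macOS: bash, zsh, sh)
. .venv/bin/activate

# The interpreter on PATH should now be the one inside .venv
python --version
```

On Windows PowerShell, the activation script is typically `.venv\Scripts\Activate.ps1` instead.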
3. Install Dependencies¶
This installs:
- beautifulsoup4 - HTML parsing
- selenium - Browser automation (optional)
- pandas - Data manipulation
- PyPDF2 - PDF processing
- python-docx - Word document processing
- openpyxl - Excel file handling
- requests - HTTP requests
4. Initialize Project Structure¶
This creates:
- data/raw/ - Downloaded documents
- data/processed/ - Extracted text
- data/metadata/ - Metadata JSON files
- data/exports/ - CSV export outputs
- logs/ - Run logs
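If you need to recreate this layout by hand, the directories listed above can be made directly. This is a sketch that assumes the paths are relative to the project root and need no initial contents. It is not the project's own initialization command.

```shell
# Create the data and log directories listed above.
# -p creates parent directories and is a no-op for ones that already exist.
mkdir -p data/raw data/processed data/metadata data/exports logs

# Show what was created
ls -d data/* logs
```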
Ready to Go!
Your installation is complete. Proceed to Quick Start to run your first pipeline.
Development Installation¶
For contributors and developers:
```shell
# Install development tools
pip install pre-commit pytest pytest-cov

# Set up pre-commit hooks
pre-commit install

# Verify installation
pytest tests/ -v
pre-commit run --all-files
```
Verifying Installation¶
Test your setup:
```shell
# Check Python version
python --version  # Should be 3.12.x

# Test imports
python -c "import pandas; import bs4; print('Success!')"

# Run demo (no internet needed)
python utils/pipeline_runner_DEMO.py
```
Optional: Selenium Setup¶
Only needed for _sel variant scrapers (browser automation):
1. Install ChromeDriver¶
2. Verify Selenium¶
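One way to check both halves of the Selenium setup is sketched below. `selenium_check` is a hypothetical helper, not part of DigitalChild, and it assumes the interpreter is invoked as `python3` and that ChromeDriver, if installed, is on your PATH.

```shell
# Hypothetical helper: report on the selenium package and the chromedriver binary
selenium_check() {
  # Is the selenium package importable by python3?
  if python3 -c "import selenium" 2>/dev/null; then
    python3 -c "import selenium; print('selenium', selenium.__version__)"
  else
    echo "selenium is not installed for this interpreter"
  fi

  # Is a chromedriver binary on PATH?
  if command -v chromedriver >/dev/null 2>&1; then
    chromedriver --version
  else
    echo "chromedriver not found on PATH"
  fi
}

selenium_check
```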
Troubleshooting¶
Python Version Issues¶
Error: Python 3.12 required
The project uses modern Python features from 3.12. Install Python 3.12 from python.org.
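Before reinstalling, it can help to confirm which interpreter is actually on your PATH. The check below is a sketch: `check_python_version` is a hypothetical helper, not part of DigitalChild, and it assumes the interpreter is invoked as `python3`.

```shell
# Hypothetical helper: succeeds only if python3 on PATH matches the given major.minor
check_python_version() {
  required="$1"
  actual="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
  [ "$actual" = "$required" ]
}

if check_python_version 3.12; then
  echo "Python 3.12 detected"
else
  echo "Expected Python 3.12, found: $(python3 --version 2>&1)"
fi
```

If several Python versions are installed, running the same check inside your activated virtual environment shows which one the project will use.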
Virtual Environment Not Activating¶
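On Windows, PowerShell may refuse to run the activation script under its default execution policy. Enable script execution for the current user: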
```powershell
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
```
Dependency Installation Failures¶
Try upgrading pip first:
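A minimal sketch of the upgrade, assuming the interpreter is invoked as `python3`. Running pip through `python3 -m pip` ensures the pip being upgraded belongs to that interpreter.

```shell
# Upgrade pip itself; report rather than abort if the upgrade cannot run
python3 -m pip install --upgrade pip || echo "pip upgrade failed; check network access and permissions"

# Confirm which pip version is now active
python3 -m pip --version
```

Once pip reports the expected version, rerun the dependency installation step.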
Import Errors¶
Ensure you're running from project root:
```shell
# Wrong
cd processors
python pipeline_runner.py  # Error!

# Right
cd /path/to/DigitalChild
python pipeline_runner.py  # Success
```
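To catch this mistake before running anything, a small guard can test for a file that should exist only in the project root. This is a sketch: `is_project_root` is a hypothetical helper, and using `pipeline_runner.py` as the marker file is an assumption based on the example above.

```shell
# Hypothetical helper: a directory counts as the project root here
# if it contains pipeline_runner.py (the script run in the example above)
is_project_root() {
  [ -f "${1:-.}/pipeline_runner.py" ]
}

if is_project_root; then
  echo "In project root: $(pwd)"
else
  echo "Not in project root; cd to the DigitalChild directory first"
fi
```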
More Help¶
See First Run Errors for comprehensive troubleshooting.
Next Steps¶
- Quick Start Guide - Run your first pipeline
- Runbook - Complete command reference
- FAQ - Common questions answered
System Requirements¶
Minimum¶
- Python 3.12
- 1GB RAM
- 1GB disk space
- Broadband internet
Recommended¶
- Python 3.12
- 4GB+ RAM
- 10GB+ disk space
- Fast internet connection
- SSD for faster processing
Platform Support¶
DigitalChild runs on:
- ✅ Linux (Ubuntu, Debian, Fedora, etc.)
- ✅ macOS (10.15+)
- ✅ Windows 10/11
- ✅ WSL2 (Windows Subsystem for Linux)
- ✅ Cloud VMs (AWS EC2, Google Cloud, Azure, DigitalOcean)
Need Help?¶
- Check FAQ
- Review First Run Errors
- Open GitHub Issue
- Start Discussion