
First Run Errors Guide

This file documents common issues when running the pipeline for the first time.


🟢 Missing Dependencies

Error: ModuleNotFoundError: No module named 'requests'
Fix: Install the project requirements.

pip install -r requirements.txt
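
To see which packages are still missing after installing, a short check along these lines may help. It is only a sketch: the helper name `missing_modules` is hypothetical, and the list contains just `requests` because that is the module named in the error above. It handles top-level module names only.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "requests" comes from the error above; extend the list to match requirements.txt.
    missing = missing_modules(["requests"])
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

An empty result means the interpreter that ran the check can import every listed module.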

🟢 Pandoc / PDF Generation

Error: pdflatex not found
Cause: Pandoc uses LaTeX (pdflatex) to generate PDFs, and pdflatex is not available on the system.
Fix: The pipeline does not require Pandoc. Use ReportLab to generate test PDFs instead.

pip install reportlab

🟢 Metadata Errors

Error: FileNotFoundError: data/metadata/metadata.json
Fix: Run init_project.py to create the project scaffolding.

python init_project.py

🟢 Logging Issues

Symptom: No logs appear in logs/
Fix: Invoke pipeline_runner.py from the project root.

python pipeline_runner.py --source au_policy

🟢 Import Errors

Error: ModuleNotFoundError: No module named 'processors'
Fix: Run commands from the project root.

pytest tests/ -v
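
Both this error and the missing logs described earlier come down to the working directory. A quick way to confirm you are in the project root is sketched below. It assumes that pipeline_runner.py sits directly in the project root, as the commands in this guide suggest; the helper name `looks_like_project_root` is hypothetical.

```python
from pathlib import Path


def looks_like_project_root(path="."):
    """Return True if `path` contains pipeline_runner.py (assumed to live in the project root)."""
    return (Path(path) / "pipeline_runner.py").is_file()


if __name__ == "__main__":
    if looks_like_project_root():
        print("OK: running from the project root")
    else:
        print(f"Not the project root: {Path.cwd()}")
```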

🟢 Virtual Environment Not Activated

Symptom: Dependencies are not found even after running pip install -r requirements.txt
Cause: The virtual environment is not activated.
Fix: Activate the virtual environment.

source .LittleRainbow/bin/activate  # Linux/Mac
# OR
.LittleRainbow\Scripts\activate  # Windows
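
If it is unclear whether activation worked, the interpreter itself can report it. The sketch below relies on standard `sys` attributes; the function name `in_virtualenv` is just an illustrative choice.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    print("Interpreter:", sys.executable)
```

When the environment is active, the interpreter path printed should sit inside the .LittleRainbow directory.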

🟢 Python Version Mismatch

Error: SyntaxError, or language features reported as unavailable
Cause: The project requires Python 3.12, and a different interpreter version is being used.
Fix: Check the Python version and upgrade if needed.

python --version  # Should show Python 3.12.x
# If not, install Python 3.12 and recreate venv
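
The same check can be made from inside Python, which is useful when several interpreters are installed and `python` may not be the one you expect. This is a minimal sketch; it treats the requirement as "exactly 3.12.x", following the comment above, and the names `REQUIRED` and `version_ok` are hypothetical.

```python
import sys

REQUIRED = (3, 12)  # major and minor version stated in this guide


def version_ok(version_info=sys.version_info, required=REQUIRED):
    """Return True if the major.minor version matches the required one."""
    return tuple(version_info[:2]) == required


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if version_ok() else "mismatch, Python 3.12.x expected"
    print(f"Python {found}: {status}")
```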

🟢 Pre-commit Hook Failures

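Error: black or flake8 failures during commit
Cause: The code does not meet the formatting or lint standards.
Fix: Run pre-commit on all files. black reformats files automatically; flake8 only reports problems, which must be fixed by hand.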

pre-commit run --all-files
# Review the changes, stage them with git add, and commit again

🟢 Test Failures

Error: Tests fail with AssertionError or other exceptions
Fix: Run the tests with verbose output and full tracebacks to see the details.

pytest tests/ -v --tb=long
# Run a specific failing test
pytest tests/test_validators.py::TestURLValidation::test_validate_url_valid_https -vv

🟢 Scorecard File Not Found

Error: FileNotFoundError: data/scorecard/scorecard_main_presentation.xlsx
Cause: The scorecard Excel file is missing or in the wrong location.
Fix: Ensure the canonical scorecard file exists at data/scorecard/scorecard_main_presentation.xlsx.

ls -la data/scorecard/scorecard_main_presentation.xlsx  # Should exist
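
`ls` is not available in every Windows shell, so a portable check may be handy. The sketch below looks for the two data files this guide mentions (the metadata file and the scorecard); the names `REQUIRED_FILES` and `missing_files` are hypothetical, and the list can be adjusted to whatever the pipeline actually needs.

```python
from pathlib import Path

# Paths taken from the FileNotFoundError messages in this guide.
REQUIRED_FILES = [
    "data/metadata/metadata.json",
    "data/scorecard/scorecard_main_presentation.xlsx",
]


def missing_files(root=".", required=REQUIRED_FILES):
    """Return the required paths (relative to `root`) that are not existing files."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    for rel in missing:
        print("Missing:", rel)
    if not missing:
        print("All required files are present")
```

Run it from the project root so the relative paths resolve correctly.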

🟢 Selenium/Browser Driver Issues

Error: WebDriverException: chromedriver not found
Cause: The Selenium scrapers (_sel variants) need a browser driver.
Fix: Use the non-Selenium scrapers, or install ChromeDriver.

# Use regular scraper (no Selenium)
python pipeline_runner.py --source au_policy

# Or install ChromeDriver for Selenium scrapers
# See: https://chromedriver.chromium.org/downloads
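
After installing ChromeDriver, it may be worth confirming that the executable is discoverable. The sketch below assumes the Selenium scrapers locate the driver through the PATH environment variable, which this guide does not state explicitly; the helper name `find_driver` is hypothetical.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    if path:
        print("chromedriver found at", path)
    else:
        print("chromedriver is not on PATH")
```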

🟢 Permission Denied Errors

Error: PermissionError: [Errno 13] Permission denied
Cause: Insufficient permissions to write files or create directories.
Fix: Check the directory permissions.

# Ensure the data and log directories are writable (Linux/Mac)
chmod -R u+w data/ logs/

# Or run init_project.py again
python init_project.py
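
To find out which directory is causing the error before changing permissions, a check like the following can help. It is a sketch that assumes data/ and logs/ are the only directories the pipeline writes to, as the chmod command above implies; the helper name `unwritable_dirs` is hypothetical.

```python
import os


def unwritable_dirs(dirs=("data", "logs")):
    """Return the directories that are missing or not writable by the current user."""
    return [d for d in dirs if not (os.path.isdir(d) and os.access(d, os.W_OK))]


if __name__ == "__main__":
    problems = unwritable_dirs()
    for d in problems:
        print("Missing or not writable:", d)
    if not problems:
        print("data/ and logs/ are writable")
```

A directory reported here either does not exist yet (run init_project.py) or needs its permissions adjusted.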