Public

DIPRES Evaluation Bridge

Auditable system connecting evaluated Chilean public programs with budget lines, execution data, and documentary evidence, treating each bridge as a reviewable hypothesis.

706: BIPS Monitoreo 2024 programs
420: defensible bridges
59.5%: defensible MVP coverage (420 of 706)

Stack

Python · Typer · DuckDB · Pydantic · pandas · RapidFuzz · httpx · pypdf · OpenPyXL · pytest

Artifacts

Local MVP export + manifest; public repo, demo, and package pending confirmation

Constraints

Base case written from local project documentation; public repo, demo, and data package are pending confirmation.
Metrics come from a local MVP 2024 run and should not be read as official program-level accounting.
The program-budget bridge is an auditable hypothesis: exact_match does not imply budget exclusivity.

TL;DR

Builds a bridge table across BIPS/DIPRES programs, the Budget Law, execution data, budget notes, and documentary evidence.
Each match keeps rule, score, status, source, URL, hash, and text fragment so it can be reviewed or challenged.
Separates programmatic linkage from financial scope so an aggregate line is not mistaken for exclusive program spending.
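A minimal sketch of what one bridge row could look like, assuming plain dataclasses in place of the project's Pydantic models. The status values come from this write-up; the class name, the field names, the financial_scope values ("exclusive", "shared", "unknown"), and the example URL are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class MatchStatus(str, Enum):
    """Match statuses named in this case study."""
    EXACT_MATCH = "exact_match"
    HIGH_CONFIDENCE = "high_confidence"
    AMBIGUOUS = "ambiguous"
    UNMATCHED = "unmatched"


@dataclass(frozen=True)
class BridgeRecord:
    program_id: str        # hypothetical program identifier
    budget_line_id: str    # hypothetical budget line identifier
    rule: str              # which matching rule produced the bridge
    score: float           # similarity score behind the match
    status: MatchStatus    # programmatic linkage
    financial_scope: str   # hypothetical values: "exclusive", "shared", "unknown"
    source: str            # which source family the evidence came from
    url: str               # where the evidence was fetched
    sha256: str            # hash of the raw artifact holding the evidence
    text_fragment: str     # the text that supports the match

    def needs_review(self) -> bool:
        # Assumption: only ambiguous matches go to the manual review queue.
        return self.status is MatchStatus.AMBIGUOUS


# An exact name match on a line that also funds other programs.
example = BridgeRecord(
    program_id="P-001",
    budget_line_id="B-001",
    rule="name_exact_within_service",
    score=1.0,
    status=MatchStatus.EXACT_MATCH,
    financial_scope="shared",
    source="budget_law",
    url="https://example.org/placeholder",
    sha256="0" * 64,
    text_fragment="(placeholder fragment)",
)
```

Keeping status and financial_scope as separate fields is what lets an exact_match row still say that the line is not exclusive to the program.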

Reusable patterns

Conservative ingestion: store raw artifacts, SHA-256 hashes, and metadata before interpretation.
Institution-scoped matching by ministry, service, and year instead of global fuzzy matching over sensitive public data.
Explicit statuses (exact_match, high_confidence, ambiguous, unmatched) plus a manual review queue.
Keep match status separate from financial_scope to communicate uncertainty without hiding useful evidence.
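The conservative ingestion pattern can be sketched with the standard library alone. This is an illustration under assumptions, not the project's code: the function name store_raw_artifact, the content-addressed file layout, and the metadata keys are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def store_raw_artifact(content: bytes, source_url: str, root: Path) -> dict:
    """Write raw bytes and a metadata sidecar before any parsing happens."""
    digest = hashlib.sha256(content).hexdigest()
    root.mkdir(parents=True, exist_ok=True)

    # Content-addressed path: the same bytes always land in the same file.
    raw_path = root / f"{digest}.bin"
    if not raw_path.exists():
        raw_path.write_bytes(content)

    metadata = {
        "sha256": digest,
        "source_url": source_url,
        "size_bytes": len(content),
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "path": str(raw_path),
    }
    sidecar = root / f"{digest}.json"
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return metadata
```

Because the hash is computed over the untouched bytes, a later parser change can be re-run against exactly the same input, and a reviewer can check that a quoted fragment really comes from the stored file.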

Context

Chile evaluates and monitors public programs, but its budget classification does not provide a formal key connecting each evaluated program to budget lines.

The problem is not only technical: a public program and a budget program are different entities and can have many-to-many relationships.

The system documents which bridges are defensible, which are ambiguous, and where traceability breaks down given the available public sources.

Decisions

Model each bridge as an auditable hypothesis, not as accounting certification.
Store HTML, PDF, XLSX, XML, CSV, or API responses as raw hashed artifacts before parsing.
Build program entities from BIPS/DIPRES and budget lines from the Budget Law, execution data, and budget notes.
Scope automatic matching to the correct institutional universe, block known false positives, and pin each blocked pair with a regression test.
Keep manual review and a change log for ambiguous or high-impact decisions.
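One way to picture institution-scoped matching with a false-positive blocklist is sketched below. It swaps in the standard library's difflib.SequenceMatcher for RapidFuzz so the example has no dependencies. The function names, dictionary keys, thresholds (high, low, margin), and the rules mapping scores to statuses are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(name.lower().split())


def match_program(program, budget_lines, blocked_pairs,
                  high=0.90, low=0.75, margin=0.05):
    """Match one program against budget lines in the same ministry, service, year."""
    scope_key = (program["ministry"], program["service"], program["year"])
    scored = []
    for line in budget_lines:
        # Never compare across institutions or years.
        if (line["ministry"], line["service"], line["year"]) != scope_key:
            continue
        # Known false positives are excluded before scoring.
        if (program["id"], line["id"]) in blocked_pairs:
            continue
        a, b = normalize(program["name"]), normalize(line["name"])
        scored.append((SequenceMatcher(None, a, b).ratio(), line["id"], a == b))
    scored.sort(key=lambda item: item[0], reverse=True)

    if not scored or scored[0][0] < low:
        return {"status": "unmatched", "line_id": None,
                "score": scored[0][0] if scored else 0.0}

    best_score, best_id, is_exact = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    if best_score - runner_up < margin:
        status = "ambiguous"          # two candidates are too close to call
    elif is_exact:
        status = "exact_match"
    elif best_score >= high:
        status = "high_confidence"
    else:
        status = "ambiguous"          # plausible but weak: send to manual review
    return {"status": status, "line_id": best_id, "score": best_score}
```

A regression test for a blocked pair then reduces to asserting that the pair never comes back as a match, whatever the similarity score says.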

Architecture

DuckDB stores raw_artifact, program dimensions, bridge_programa_presupuesto, and review queues.
The system consumes existing local upstream pipelines for DIPRES budget execution and Financial Reports rather than duplicating them.
Outputs separate traced, ambiguous, and non-comparable amounts to avoid false accounting totals.
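The separation of output amounts can be illustrated with a small aggregation. This is a sketch under assumptions: the bucketing rule (only confident matches with an "exclusive" scope count as traced), the financial_scope values, and the row keys are hypothetical, and the amounts in the usage example are invented.

```python
def summarize_amounts(rows):
    """Sum amounts into three buckets that are never added together."""
    totals = {"traced": 0, "ambiguous": 0, "non_comparable": 0}
    for row in rows:
        confident = row["status"] in ("exact_match", "high_confidence")
        if confident and row["financial_scope"] == "exclusive":
            bucket = "traced"
        elif row["status"] == "ambiguous":
            bucket = "ambiguous"
        else:
            # Includes confident matches on shared or aggregate lines.
            bucket = "non_comparable"
        totals[bucket] += row["amount"]
    return totals


# Invented amounts, for illustration only.
rows = [
    {"status": "exact_match", "financial_scope": "exclusive", "amount": 100},
    {"status": "exact_match", "financial_scope": "shared", "amount": 500},
    {"status": "ambiguous", "financial_scope": "unknown", "amount": 40},
    {"status": "high_confidence", "financial_scope": "exclusive", "amount": 60},
]
summary = summarize_amounts(rows)
```

Here the exact match on a shared line lands in non_comparable rather than traced, which is the case the design is meant to protect against.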

Outcome

The local 2024 run documents 706 monitored programs, 420 with a defensible bridge, and 286 without a defensible bridge under conservative rules.
The roughly 40.5% of programs without a bridge (286 of 706) is communicated as a traceability gap, not as a scraping failure or proof of missing budget.
The case demonstrates reusable public-data infrastructure: contracts, sources, hashes, review, and methodological policy.
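The coverage figures follow directly from the two reported counts, as this short check shows. The variable names are illustrative; the counts are the ones from the local 2024 run.

```python
monitored = 706    # BIPS Monitoreo 2024 programs
defensible = 420   # programs with a defensible bridge

without_bridge = monitored - defensible
coverage_pct = round(100 * defensible / monitored, 1)
gap_pct = round(100 * without_bridge / monitored, 1)

print(without_bridge, coverage_pct, gap_pct)
```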