Figure: Evaluacion DIPRES overview, showing programs, bridge, evidence, and review.

MVP 2024: public programs → budget lines → evidence → review queue.

Public

DIPRES Evaluation Bridge

Auditable system connecting evaluated Chilean public programs with budget lines, execution data, and documentary evidence, treating each bridge as a reviewable hypothesis.

  • 706 BIPS Monitoreo 2024 programs
  • 420 defensible bridges
  • 60% defensible MVP coverage

Stack

Python · Typer · DuckDB · Pydantic · pandas · RapidFuzz · httpx · pypdf · OpenPyXL · pytest

Artifacts

Local MVP export + manifest; public repo, demo, and package pending confirmation

Constraints

  • This case study is written from local project documentation; the public repo, demo, and data package are pending confirmation.
  • Metrics come from a local MVP 2024 run and should not be read as official program-level accounting.
  • The program-budget bridge is an auditable hypothesis: an exact_match status does not imply that the matched budget line funds only that program.

TL;DR

  • Builds a bridge table across BIPS/DIPRES programs, the Budget Law, execution data, budget notes, and documentary evidence.
  • Each match keeps rule, score, status, source, URL, hash, and text fragment so it can be reviewed or challenged.
  • Separates programmatic linkage from financial scope so an aggregate line is not mistaken for exclusive program spending.
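
A minimal sketch of what one reviewable match record could look like. The source lists the kinds of fields kept (rule, score, status, source, URL, hash, text fragment) but not a schema, so the class name, field names, and validation rules below are assumptions; a plain dataclass is used here for self-containment rather than the project's own models.

```python
from dataclasses import dataclass, asdict

# Status values as listed in the document.
ALLOWED_STATUSES = {"exact_match", "high_confidence", "ambiguous", "unmatched"}


@dataclass(frozen=True)
class BridgeEvidence:
    """One program-to-budget-line hypothesis (hypothetical field names)."""
    program_id: str
    budget_line_id: str
    rule: str           # which matching rule produced the bridge
    score: float        # similarity score, assumed to lie in [0, 1]
    status: str         # one of ALLOWED_STATUSES
    source: str         # name of the source document or dataset
    url: str            # where the raw artifact was fetched from
    sha256: str         # hex digest of the raw artifact
    text_fragment: str  # the excerpt a reviewer can check

    def __post_init__(self):
        # Reject records a reviewer could not interpret or verify.
        if self.status not in ALLOWED_STATUSES:
            raise ValueError(f"unknown status: {self.status}")
        if not 0.0 <= self.score <= 1.0:
            raise ValueError("score must be between 0 and 1")
        if len(self.sha256) != 64:
            raise ValueError("sha256 must be a 64-character hex digest")


# Placeholder values only; none of these identifiers come from real data.
example = BridgeEvidence(
    program_id="P-001", budget_line_id="L-001", rule="normalized_name",
    score=1.0, status="exact_match", source="example source",
    url="https://example.org/artifact", sha256="0" * 64,
    text_fragment="Programa Ejemplo",
)
record = asdict(example)  # plain dict, ready to export or log
```

Keeping the record immutable and validated at construction means a challenged match can be traced to the exact rule and artifact that produced it.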

Reusable patterns

  • Conservative ingestion: store raw artifacts, SHA-256 hashes, and metadata before interpretation.
  • Institution-scoped matching: candidates are compared only within the same ministry, service, and year, instead of fuzzy matching globally over sensitive public data.
  • Explicit statuses (exact_match, high_confidence, ambiguous, unmatched) plus a manual review queue.
  • Keep match status separate from financial_scope, so uncertainty is communicated without hiding useful evidence.
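
The scoped matching and explicit statuses above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: it uses the standard library's difflib in place of RapidFuzz, the thresholds are hypothetical, and the rule that only ambiguous results enter the review queue is an assumption.

```python
from difflib import SequenceMatcher

HIGH, LOW = 0.90, 0.70  # hypothetical thresholds, not taken from the source


def normalize(name: str) -> str:
    """Case-fold and collapse whitespace before comparing names."""
    return " ".join(name.casefold().split())


def scope_key(row: dict) -> tuple:
    """Institutional universe: ministry, service, and year."""
    return (row["ministry"], row["service"], row["year"])


def classify(program: dict, budget_lines: list[dict]) -> dict:
    # Only budget lines in the same institutional scope are candidates.
    candidates = [b for b in budget_lines if scope_key(b) == scope_key(program)]
    target = normalize(program["name"])
    scores = [
        (SequenceMatcher(None, target, normalize(b["name"])).ratio(), b["line_id"])
        for b in candidates
    ]
    strong = [s for s in scores if s[0] >= HIGH]
    if len(strong) == 1:
        score, line_id = strong[0]
        status = "exact_match" if score == 1.0 else "high_confidence"
    elif strong or any(s[0] >= LOW for s in scores):
        # Several strong candidates, or only a middling one: do not decide.
        score, line_id = max(scores)
        status = "ambiguous"
    else:
        score, line_id, status = 0.0, None, "unmatched"
    return {"status": status, "line_id": line_id, "score": score,
            "needs_review": status == "ambiguous"}


# Placeholder rows; names and institutions are invented for illustration.
program = {"name": "Programa  Ejemplo de Becas", "ministry": "Ministerio X",
           "service": "Servicio Y", "year": 2024}
in_scope = {"line_id": "A", "name": "programa ejemplo de becas",
            "ministry": "Ministerio X", "service": "Servicio Y", "year": 2024}
out_of_scope = {"line_id": "B", "name": "Programa Ejemplo de Becas",
                "ministry": "Ministerio Z", "service": "Servicio Y", "year": 2024}

result = classify(program, [in_scope, out_of_scope])
```

Here the identically named line under a different ministry is never scored, which is the point of scoping: a global fuzzy match would have treated it as an equally good candidate.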

Context

Chile evaluates and monitors public programs, but its budget classification does not provide a formal key connecting each evaluated program to budget lines.

The problem is not only technical: a public program and a budget program are different entities and can have many-to-many relationships.

The system documents which bridges are defensible, which are ambiguous, and where traceability breaks under available public sources.

Decisions

  • Model each bridge as an auditable hypothesis, not as accounting certification.
  • Store HTML, PDF, XLSX, XML, CSV, or API responses as raw hashed artifacts before parsing.
  • Build program entities from BIPS/DIPRES and budget lines from the Budget Law, execution data, and budget notes.
  • Scope automatic matching to the correct institutional universe and pin known false positives with regression tests so they cannot reappear.
  • Keep manual review and a change log for ambiguous or high-impact decisions.
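
The decision to store raw, hashed artifacts before parsing could be sketched like this. The function name, metadata fields, and content-addressed file layout are assumptions made for illustration; the source states only that raw responses are stored with SHA-256 hashes and metadata before interpretation.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def store_raw_artifact(content: bytes, source_url: str,
                       media_type: str, root: Path) -> dict:
    """Write the untouched bytes plus a metadata sidecar; parse nothing."""
    digest = hashlib.sha256(content).hexdigest()
    root.mkdir(parents=True, exist_ok=True)

    # Content-addressed: the file name is the hash, so identical
    # downloads are stored once and later tampering is detectable.
    blob_path = root / digest
    if not blob_path.exists():
        blob_path.write_bytes(content)

    meta = {
        "sha256": digest,
        "source_url": source_url,
        "media_type": media_type,
        "size_bytes": len(content),
        "fetched_at": datetime.now(timezone.utc).isoformat(),
    }
    (root / f"{digest}.json").write_text(json.dumps(meta, indent=2),
                                         encoding="utf-8")
    return meta
```

Any later parser then reads from the stored file and records the digest alongside its output, so each extracted value can be traced back to the exact bytes it came from.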

Architecture

Figure: Evaluacion DIPRES architecture, covering raw ingestion, normalization, matching, evidence, and manual queue.
Raw artifacts → normalized staging → program-budget matching → evidence → review/MVP export.
  • DuckDB stores raw_artifact, program dimensions, bridge_programa_presupuesto, and review queues.
  • The system consumes the output of existing local upstream pipelines for DIPRES budget execution and Financial Reports instead of duplicating them.
  • Outputs separate traced, ambiguous, and non-comparable amounts to avoid false accounting totals.
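
One way to keep traced, ambiguous, and non-comparable amounts apart is sketched below. The financial_scope values ("exclusive", "shared", "unknown"), the classification rule, and the sample amounts are all assumptions for illustration; the source names the three output categories but does not define how rows are assigned to them.

```python
from collections import defaultdict

MATCHED = {"exact_match", "high_confidence"}


def amount_class(status: str, financial_scope: str) -> str:
    """Assumed rule: an amount counts as traced only when the bridge is
    matched AND the budget line is exclusive to the program."""
    if status in MATCHED and financial_scope == "exclusive":
        return "traced"
    if status == "ambiguous":
        return "ambiguous"
    # e.g. a confident match to a line shared by several programs
    return "non_comparable"


def summarize(rows: list[dict]) -> dict:
    """Sum amounts per category; the categories are never added together."""
    totals = defaultdict(int)
    for row in rows:
        totals[amount_class(row["status"], row["financial_scope"])] += row["amount"]
    return dict(totals)


# Invented amounts, not real budget figures.
rows = [
    {"status": "exact_match", "financial_scope": "exclusive", "amount": 1_000},
    {"status": "exact_match", "financial_scope": "shared", "amount": 5_000},
    {"status": "ambiguous", "financial_scope": "unknown", "amount": 700},
    {"status": "high_confidence", "financial_scope": "exclusive", "amount": 300},
]
summary = summarize(rows)
```

In this sketch the second row is an exact name match, yet its amount stays out of the traced total because the line is shared, which mirrors the distinction between match status and financial scope.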

Outcome

  • The local 2024 run documents 706 monitored programs, 420 with a defensible bridge, and 286 without a defensible bridge under conservative rules.
  • The roughly 40% of programs without a bridge (286 of 706) is communicated as a traceability gap, not as a scraping failure or proof of missing budget.
  • The case demonstrates reusable public-data infrastructure: contracts, sources, hashes, review, and methodological policy.
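
The headline figures follow from the counts reported above; a short check, using only the numbers stated in the document, shows that the 60% and 40% values are approximate.

```python
monitored = 706      # BIPS Monitoreo 2024 programs
with_bridge = 420    # programs with a defensible bridge

without_bridge = monitored - with_bridge
coverage = with_bridge / monitored
gap = without_bridge / monitored

print(f"without bridge: {without_bridge}")  # 286
print(f"coverage: {coverage:.1%}")          # 59.5%
print(f"gap: {gap:.1%}")                    # 40.5%
```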