Hospitales_ESP: from scattered public data to usable health intelligence

By Alonso Valdés, 2025-09-25
#health data  #etl  #public data  #spain  #data model

Introduction

Hospitales_ESP addresses a common issue: public health data in Spain is available, but fragmented across portals and formats. The aim was to build a reproducible pipeline to integrate, standardize, and validate these sources, producing a usable data model for comparable analysis across regions and over time.

Problem

With dispersed sources, heterogeneous naming, and sparse metadata, it is hard to answer basic questions: how is hospital activity evolving, where do bottlenecks emerge, which territorial differences are structural and which are cyclical?

Technical approach

  • Reproducible ETL with ingestion, cleaning, and normalization of key fields (dates, units, codes).
  • Data dictionary and conventions to make joins and comparisons straightforward.
  • Lightweight quality checks (ranges, temporal consistency, unique keys, duplicates).
  • Modular structure to add new sources without breaking compatibility.
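As a rough illustration of the normalization and quality-check steps listed above, here is a minimal sketch using only the Python standard library. The field names (facility_code, period, service, discharges), the sample rows, and the two accepted date formats are assumptions made for the example, not the project's actual schema or rules.

```python
from collections import Counter
from datetime import date, datetime

# Hypothetical raw rows, as they might arrive from two different portals.
RAW_ROWS = [
    {"facility_code": " h-001 ", "period": "2023-01", "service": "Cardiology", "discharges": "1.250"},
    {"facility_code": "H-001", "period": "01/02/2023", "service": "cardiology", "discharges": "1180"},
    {"facility_code": "H-002", "period": "2023-01", "service": "Cardiology", "discharges": "-5"},
    {"facility_code": "H-001", "period": "2023-01", "service": "CARDIOLOGY", "discharges": "1250"},
]


def parse_period(text: str) -> date:
    """Return the first day of the month for the (assumed) supported formats."""
    for fmt in ("%Y-%m", "%d/%m/%Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date().replace(day=1)
        except ValueError:
            continue
    raise ValueError(f"unrecognized period: {text!r}")


def normalize(row: dict) -> dict:
    """Standardize codes, dates, and units for one raw row."""
    return {
        "facility_code": row["facility_code"].strip().upper(),
        "period": parse_period(row["period"]),
        "service": row["service"].strip().lower(),
        # Assumption: "." is a thousands separator in the source files.
        "discharges": int(row["discharges"].replace(".", "")),
    }


def quality_issues(rows: list[dict]) -> list[str]:
    """Lightweight checks: duplicate keys and out-of-range values."""
    issues = []
    keys = Counter((r["facility_code"], r["period"], r["service"]) for r in rows)
    for key, count in keys.items():
        if count > 1:
            issues.append(f"duplicate key {key}: {count} rows")
    for r in rows:
        if r["discharges"] < 0:
            issues.append(f"negative discharges for {r['facility_code']} {r['period']}")
    return issues


CLEAN_ROWS = [normalize(r) for r in RAW_ROWS]
ISSUES = quality_issues(CLEAN_ROWS)
for issue in ISSUES:
    print(issue)
```

Normalizing before checking matters here: the first and last sample rows only show up as duplicates once the code, service name, and number format have been standardized.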

Data model

Canonical schema with simple fact tables and minimal dimensions (facility, period, service), designed for common queries (time series, interregional comparisons, small multiples by service).
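One way such a schema could look is sketched below with an in-memory SQLite database. The table and column names (dim_facility, dim_period, dim_service, fact_activity, discharges) and the sample rows are hypothetical; the real model may use different measures and keys.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed by three small dimensions.
SCHEMA = """
CREATE TABLE dim_facility (
    facility_id   INTEGER PRIMARY KEY,
    facility_code TEXT NOT NULL UNIQUE,
    region        TEXT NOT NULL
);
CREATE TABLE dim_period (
    period_id INTEGER PRIMARY KEY,
    year      INTEGER NOT NULL,
    month     INTEGER NOT NULL CHECK (month BETWEEN 1 AND 12),
    UNIQUE (year, month)
);
CREATE TABLE dim_service (
    service_id   INTEGER PRIMARY KEY,
    service_name TEXT NOT NULL UNIQUE
);
CREATE TABLE fact_activity (
    facility_id INTEGER NOT NULL REFERENCES dim_facility (facility_id),
    period_id   INTEGER NOT NULL REFERENCES dim_period (period_id),
    service_id  INTEGER NOT NULL REFERENCES dim_service (service_id),
    discharges  INTEGER NOT NULL CHECK (discharges >= 0),
    PRIMARY KEY (facility_id, period_id, service_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Illustrative sample data only.
conn.executemany(
    "INSERT INTO dim_facility VALUES (?, ?, ?)",
    [(1, "H-001", "Region A"), (2, "H-002", "Region B")],
)
conn.executemany("INSERT INTO dim_period VALUES (?, ?, ?)", [(1, 2023, 1), (2, 2023, 2)])
conn.execute("INSERT INTO dim_service VALUES (1, 'cardiology')")
conn.executemany(
    "INSERT INTO fact_activity VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 120), (1, 2, 1, 110), (2, 1, 1, 80), (2, 2, 1, 95)],
)

# Interregional time series: total discharges per region and month.
SERIES = conn.execute(
    """
    SELECT f.region, p.year, p.month, SUM(a.discharges) AS discharges
    FROM fact_activity AS a
    JOIN dim_facility AS f USING (facility_id)
    JOIN dim_period   AS p USING (period_id)
    GROUP BY f.region, p.year, p.month
    ORDER BY f.region, p.year, p.month
    """
).fetchall()

for row in SERIES:
    print(row)
```

The composite primary key on the fact table doubles as the unique-key quality check: a second row for the same facility, period, and service is rejected at load time.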

Example findings

  • Seasonality in activity and occupancy indicators.
  • Persistent outliers that suggest recording errors or local particularities.
  • Structural differences across regions that stabilize after population normalization.
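To make the population normalization in the last point concrete, here is a small sketch that converts raw counts into rates per 1,000 inhabitants. The region names and figures are invented for illustration and are not results from the project.

```python
def rate_per_1000(events: int, population: int) -> float:
    """Events per 1,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * events / population


# Hypothetical figures: raw counts differ by a factor of almost four.
REGIONS = {
    "Region A": {"discharges": 52_000, "population": 800_000},
    "Region B": {"discharges": 13_500, "population": 200_000},
}

RATES = {
    name: rate_per_1000(values["discharges"], values["population"])
    for name, values in REGIONS.items()
}

for name, rate in RATES.items():
    print(f"{name}: {rate:.1f} discharges per 1,000 inhabitants")
```

In this made-up case the two regions look very different in absolute terms but end up at 65.0 and 67.5 per 1,000 once population is taken into account.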

Visualizations

  • Choropleth map by indicator with period filters.
  • Small multiples by service to compare trajectories.
  • Anomaly detection with historical bands.
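A historical band can be built in several ways. The sketch below assumes one simple variant: for a given calendar month, take the mean of the same month in previous years plus or minus k sample standard deviations, and flag values that fall outside. The function names, the choice of k = 2, and the sample values are assumptions for the example, not necessarily what the project uses.

```python
from statistics import mean, stdev


def historical_band(history: list[float], k: float = 2.0) -> tuple[float, float]:
    """Return (low, high) as mean -/+ k sample standard deviations."""
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    center = mean(history)
    spread = stdev(history)
    return center - k * spread, center + k * spread


def is_anomalous(value: float, history: list[float], k: float = 2.0) -> bool:
    """True when the value falls outside the historical band."""
    low, high = historical_band(history, k)
    return not (low <= value <= high)


# Hypothetical occupancy index for January in five previous years.
JANUARY_HISTORY = [100.0, 104.0, 98.0, 102.0, 96.0]

LOW, HIGH = historical_band(JANUARY_HISTORY)
print(f"band: {LOW:.1f} to {HIGH:.1f}")
print(is_anomalous(103.0, JANUARY_HISTORY))
print(is_anomalous(120.0, JANUARY_HISTORY))
```

Comparing each month against the same month in earlier years keeps ordinary seasonality from being flagged as an anomaly.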

Roadmap

  • Broaden sources and improve metadata documentation.
  • Automated QA and data regression tests.
  • Public dashboard with comparable indicators and dataset downloads.

Reproducibility

The code and a usage guide are available in the repository. Contributions and new sources are welcome.

View the GitHub repository