Modernizing CMS Medicare Data: Building a Unified Azure-Based ETL Framework for ACO REACH & Provider Enrollment

Success Metrics

100% automation of multi-year CMS data ingestion

60% reduction in manual data cleansing effort

75% improvement in query performance after indexing

0% post-production schema mismatch failures

Introduction

The project focused on ingesting and standardizing CMS public datasets, including Medicare Fee-for-Service Provider Enrollment and ACO REACH Model data, into Azure SQL. The objective was to build a scalable, production-ready ETL framework that handles multi-year schema variations and simplifies analytics and reporting for healthcare program monitoring.

Key Highlights

Azure Data Engineering

Cloud ETL Development

Data Modelling & Standardization

Azure SQL Optimization

Data Quality & Schema Harmonization

Business Challenges

Solution

We developed a dynamic Python-based ETL framework that ensured reliable ingestion of multi-year CMS datasets into Azure SQL with minimal manual intervention.
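To make the idea of handling multi-year schema variations concrete, here is a minimal sketch of header harmonization. It is not the project's actual code: the canonical column names, the alias table, and the `harmonize` function are hypothetical, and it uses only the Python standard library, whereas the framework described here relies on Pandas, where a rename followed by a reindex would play the same role.

```python
import csv
import io

# Hypothetical canonical schema (assumption for illustration only).
CANONICAL_COLUMNS = ["npi", "provider_name", "state", "report_year"]

# Hypothetical mapping from year-specific source headers to canonical names.
COLUMN_ALIASES = {
    "NPI": "npi",
    "National Provider Identifier": "npi",
    "PROVIDER_NAME": "provider_name",
    "Provider Name": "provider_name",
    "STATE_CD": "state",
    "State": "state",
}


def harmonize(csv_text, report_year):
    """Map one year's CSV text onto the canonical columns.

    Unknown source columns are dropped, and canonical columns missing
    from the file are left as None so every year loads into one table shape.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for raw in reader:
        row = dict.fromkeys(CANONICAL_COLUMNS)  # every canonical column starts as None
        for header, value in raw.items():
            if header is None:  # surplus fields beyond the header row
                continue
            target = COLUMN_ALIASES.get(header.strip())
            if target is not None:
                row[target] = (value or "").strip() or None
        row["report_year"] = report_year
        rows.append(row)
    return rows
```

Calling `harmonize` once per yearly file and concatenating the results yields rows with an identical set of keys, which is the property a loader needs in order to avoid schema mismatch failures at insert time.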

Business Impact

The organization now has a centralized and analytics-ready Medicare dataset spanning multiple years and program types. The solution eliminated manual pre-processing, reduced data load failures, and enabled consistent reporting across changing CMS program structures. The simplified 3-table architecture significantly improved query performance and made the system easier for business users to understand.
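The query performance gain is attributed to indexing, and the case study does not list the actual indexes. As a hedged illustration only, the helper below builds a T-SQL `CREATE NONCLUSTERED INDEX` statement for an unqualified table name. The function `build_index_sql` and the table and column names in the usage note are hypothetical.

```python
def build_index_sql(table, key_columns, include_columns=()):
    """Return a T-SQL CREATE NONCLUSTERED INDEX statement.

    `table` is an unqualified table name; identifiers are wrapped in
    square brackets, with any closing bracket doubled to escape it.
    """
    if not key_columns:
        raise ValueError("at least one key column is required")

    def quote(name):
        return "[" + name.replace("]", "]]") + "]"

    # Hypothetical naming convention: IX_<table>_<key columns>.
    index_name = "IX_" + table + "_" + "_".join(key_columns)
    sql = "CREATE NONCLUSTERED INDEX {} ON {} ({})".format(
        quote(index_name),
        quote(table),
        ", ".join(quote(c) for c in key_columns),
    )
    if include_columns:
        # INCLUDE adds non-key columns so matching queries can be answered from the index alone.
        sql += " INCLUDE ({})".format(", ".join(quote(c) for c in include_columns))
    return sql + ";"
```

For example, `build_index_sql("provider_enrollment", ["state", "report_year"], ["npi"])` produces a statement that could be executed through a PyODBC cursor. Which columns to index depends on the filters and joins the reports actually use.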

Technology Stack

Azure Blob Storage

Azure SQL Database

Python (Pandas, PyODBC)

Azure SDK

Git & GitHub (Feature Branch Deployment)

SQL Indexing & Performance Tuning

Figure: Files in Blob Storage

Figure: ETL Processing Flow