This repository implements a production-grade ELT pipeline that automates the daily identification of high-value customers. Built as the capstone project for the DE101 course, it combines Apache Airflow for orchestration, dbt-spark for transformation and data quality checks, and Apache Iceberg as the open table format, all running locally via Docker Compose.
The repository also serves as a practical guide to building and orchestrating robust data pipelines with Apache Airflow, covering concepts from basic workflow management to advanced deployments on Google Cloud Platform (GCP) and Kubernetes.