
2 results for Dbt
  • Project Overview

    This repository implements a production-grade ELT pipeline that automates the daily identification of high-value customers. Built as the capstone project for the DE101 course, it brings together Apache Airflow for orchestration, dbt-spark for transformation and data quality, and Apache Iceberg as the open table format — all running locally via Docker Compose.


    Key Concepts

    • Medallion Architecture: Data flows through Bronze (raw), Silver (cleaned), and Gold (business-ready) layers, each serving a distinct purpose in the transformation chain.
    • Airflow Orchestration: A single DAG wires together data generation, dbt runs, quality tests, and dashboard generation into a reliable daily schedule.
    • dbt Data Quality: 38 automated tests gate pipeline output — if any test fails, downstream tasks are blocked and the sales mart is never written with bad data.
    • Apache Iceberg Table Format: Iceberg provides schema evolution, time-travel queries, and efficient partition pruning on top of the local Spark engine.
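    The gating behaviour described above (a failed test blocks everything downstream) can be illustrated with a minimal pure-Python sketch. This is not the project's Airflow DAG: the task names in TASKS and the run_pipeline helper are hypothetical, and a real DAG would express the same ordering through Airflow task dependencies.

```python
from typing import Callable, Dict, List

# Hypothetical task names, loosely following the stages named in the overview.
TASKS = [
    "generate_data",
    "dbt_run",
    "dbt_test",
    "build_sales_mart",
    "generate_dashboard",
]


def run_pipeline(
    tasks: List[str], actions: Dict[str, Callable[[], bool]]
) -> Dict[str, str]:
    """Run tasks in order; after the first failure, skip every later task."""
    status: Dict[str, str] = {}
    blocked = False
    for name in tasks:
        if blocked:
            # An upstream task failed, so this task never executes.
            status[name] = "skipped"
            continue
        ok = actions[name]()
        status[name] = "success" if ok else "failed"
        blocked = not ok
    return status


if __name__ == "__main__":
    # Simulate a failing quality test: the mart and dashboard are not built.
    actions = {name: (lambda: True) for name in TASKS}
    actions["dbt_test"] = lambda: False
    for task, state in run_pipeline(TASKS, actions).items():
        print(f"{task}: {state}")
```

    With the simulated failure in dbt_test, the two later tasks are reported as skipped, which mirrors the claim that the sales mart is never written with bad data.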
    Tags: data engineering, airflow, dbt, spark, docker
    Created: Thu, 26 Mar 2026 10:00:00 +0100
  • Project Overview

    This project demonstrates the implementation of a comprehensive analytics engineering pipeline using dbt (data build tool) as the primary transformation layer. The pipeline showcases modern data engineering practices including ELT methodology, dimensional modeling, automated testing, and business intelligence visualization.

    Repository: Analytics Engineering with dbt

    The project focuses on transforming raw NYC taxi trip data into business-ready analytics tables using dbt’s modular approach, implementing both dbt Cloud and dbt Core workflows, and creating interactive dashboards with Looker Studio.

    Key Concepts

    • Analytics Engineering: Bridging the gap between data engineering and data analysis with software engineering best practices
    • ELT vs ETL: Leveraging cloud data warehouses for in-database transformations
    • Dimensional Modeling: Implementing Kimball’s star schema methodology for analytical workloads
    • dbt Fundamentals: Models, macros, packages, variables, and testing frameworks
    • Data Governance: Testing, documentation, and deployment strategies
    • Business Intelligence: Creating interactive dashboards and visualizations
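    To make the star schema idea concrete, here is a small sketch using Python's built-in sqlite3 module rather than dbt or a cloud warehouse. The table names (raw_trips, dim_zone, dim_payment, fact_trips), the columns, and the sample rows are all assumptions made for illustration; they are not taken from the project's actual models.

```python
import sqlite3


def build_star_schema(raw_trips):
    """Load raw trip rows, then derive two dimensions and one fact table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE raw_trips (pickup_zone TEXT, payment_type TEXT, fare REAL);
        CREATE TABLE dim_zone (
            zone_id INTEGER PRIMARY KEY, zone_name TEXT UNIQUE);
        CREATE TABLE dim_payment (
            payment_id INTEGER PRIMARY KEY, payment_type TEXT UNIQUE);
        CREATE TABLE fact_trips (
            zone_id INTEGER REFERENCES dim_zone (zone_id),
            payment_id INTEGER REFERENCES dim_payment (payment_id),
            fare REAL);
        """
    )
    conn.executemany("INSERT INTO raw_trips VALUES (?, ?, ?)", raw_trips)
    conn.executescript(
        """
        -- Dimensions: one row per distinct descriptive value.
        INSERT INTO dim_zone (zone_name)
            SELECT DISTINCT pickup_zone FROM raw_trips;
        INSERT INTO dim_payment (payment_type)
            SELECT DISTINCT payment_type FROM raw_trips;
        -- Fact: measures plus surrogate keys pointing at the dimensions.
        INSERT INTO fact_trips
            SELECT z.zone_id, p.payment_id, r.fare
            FROM raw_trips r
            JOIN dim_zone z ON z.zone_name = r.pickup_zone
            JOIN dim_payment p ON p.payment_type = r.payment_type;
        """
    )
    return conn


def revenue_by_zone(conn):
    """Aggregate the fact table by a dimension attribute."""
    rows = conn.execute(
        """
        SELECT z.zone_name, SUM(f.fare)
        FROM fact_trips f
        JOIN dim_zone z USING (zone_id)
        GROUP BY z.zone_name
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    # Made-up sample rows, not real NYC taxi data.
    sample = [
        ("Midtown", "card", 12.5),
        ("Midtown", "cash", 7.5),
        ("Harlem", "card", 20.0),
    ]
    print(revenue_by_zone(build_star_schema(sample)))
```

    In a dbt project each of these INSERT ... SELECT statements would more likely live in its own model file, with the joins expressed through model references instead of hard-coded table names.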