Avatar

Organizations

2 results for Gcp
  • Project Overview

    This repository serves as a practical guide to building and orchestrating robust data pipelines using Apache Airflow. It covers essential concepts from basic workflow management to advanced deployments with Google Cloud Platform (GCP) and Kubernetes.


    Key Concepts

    • Workflow Orchestration: Automating and managing complex data workflows with dependencies, scheduling, retries, and monitoring using Apache Airflow.
    • DAGs (Directed Acyclic Graphs): The core abstraction in Airflow for defining task dependencies, execution order, and workflow logic.
    • Extensible Operators & Integrations: Leveraging Airflow’s wide range of built-in operators and custom plugins to interact with databases, cloud services (GCP, Kubernetes), and external systems.
    • Scalable Deployments: Running Airflow locally for prototyping, or deploying on cloud and Kubernetes for production-scale, resilient, and distributed data pipeline execution.
  • Project Overview

    This repository provides a comprehensive, step-by-step guide to building a simple data engineering pipeline using containerization (Docker), orchestration (Docker Compose), and Infrastructure as Code (Terraform), with a focus on ingesting and processing NYC taxi data. The project is hands-on and includes conceptual explanations, infrastructure setup, and several example pipeline flows.

    This project is a practical template for data engineers to learn and implement containerized data pipelines, local and cloud database management, and automated cloud infrastructure provisioning using modern tools like Docker, Docker Compose, and Terraform. It is especially useful for those looking to understand the end-to-end workflow from local prototyping to cloud deployment in a reproducible, automated way.