Avatar

Organizations

4 results for Bigquery
  • Project Overview

    This project demonstrates the implementation of a comprehensive analytics engineering pipeline using dbt (data build tool) as the primary transformation layer. The pipeline showcases modern data engineering practices including ELT methodology, dimensional modeling, automated testing, and business intelligence visualization.

    Repository: Analytics Engineering with dbt

    The project focuses on transforming raw NYC taxi trip data into business-ready analytics tables using dbt’s modular approach, implementing both dbt Cloud and dbt Core workflows, and creating interactive dashboards with Looker Studio.

    Key Concepts

    Analytics Engineering: Bridging the gap between data engineering and data analysis with software engineering best practices • ELT vs ETL: Leveraging cloud data warehouses for in-database transformations • Dimensional Modeling: Implementing Kimball’s star schema methodology for analytical workloads • dbt Fundamentals: Models, macros, packages, variables, and testing frameworks • Data Governance: Testing, documentation, and deployment strategies • Business Intelligence: Creating interactive dashboards and visualizations

  • Project Overview

    This project demonstrates the implementation of a comprehensive data pipeline using Google BigQuery as the primary data warehouse solution. The pipeline showcases modern data engineering practices including external data integration, table optimization strategies, and performance tuning techniques.

    Repository: Data Pipeline with BigQuery

    The project focuses on building a scalable, cost-effective data warehouse solution that can handle large volumes of NYC taxi trip data while maintaining optimal query performance and cost efficiency.

    Key Concepts

    OLAP vs OLTP: Understanding the fundamental differences between Online Analytical Processing and Online Transaction Processing systems • Data Warehousing: Implementing centralized storage for analytical workloads with optimized query performance • Table Partitioning: Dividing large tables into manageable chunks based on time or range values • Clustering: Organizing data within partitions to improve query performance and reduce costs • External Tables: Querying data stored outside BigQuery without incurring storage costs • Performance Optimization: Implementing best practices for cost reduction and query efficiency

  • Project Overview

    This repository demonstrates workflow orchestration for data engineering pipelines using Kestra. It guides users through building, running, and scheduling data pipelines that extract, transform, and load (ETL) data both locally (with PostgreSQL) and in the cloud (with Google Cloud Platform). The project is hands-on and includes conceptual explanations, infrastructure setup, and several example pipeline flows.


    Key Concepts

    • Workflow Orchestration: Automating and managing complex workflows with dependencies, retries, logging, and monitoring.
    • Kestra: An orchestration platform with a user-friendly UI and YAML-based workflow definitions (called “flows”).
    • Data Lake & Data Warehouse: Demonstrates moving data from raw storage (GCS) to structured analytics (BigQuery).
  • Project Overview

    This repository serves as a practical guide to building and orchestrating robust data pipelines using Apache Airflow. It covers essential concepts from basic workflow management to advanced deployments with Google Cloud Platform (GCP) and Kubernetes.


    Key Concepts

    • Workflow Orchestration: Automating and managing complex data workflows with dependencies, scheduling, retries, and monitoring using Apache Airflow.
    • DAGs (Directed Acyclic Graphs): The core abstraction in Airflow for defining task dependencies, execution order, and workflow logic.
    • Extensible Operators & Integrations: Leveraging Airflow’s wide range of built-in operators and custom plugins to interact with databases, cloud services (GCP, Kubernetes), and external systems.
    • Scalable Deployments: Running Airflow locally for prototyping, or deploying on cloud and Kubernetes for production-scale, resilient, and distributed data pipeline execution.