This project implements a comprehensive data pipeline with Google BigQuery as the primary data warehouse. The pipeline showcases modern data engineering practices, including external data integration, table optimization strategies, and performance tuning techniques.
Repository: Data Pipeline with BigQuery
The project focuses on building a scalable data warehouse that can handle large volumes of NYC taxi trip data while keeping queries fast and costs low.
• OLAP vs OLTP: Understanding the fundamental differences between Online Analytical Processing and Online Transaction Processing systems
• Data Warehousing: Implementing centralized storage for analytical workloads with optimized query performance
• Table Partitioning: Dividing large tables into manageable chunks based on time or range values
• Clustering: Organizing data within partitions to improve query performance and reduce costs (partitioning and clustering are sketched in code after this list)
• External Tables: Querying data stored outside BigQuery without incurring storage costs
• Performance Optimization: Implementing best practices for cost reduction and query efficiency
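As a rough illustration of the partitioning and clustering concepts above, here is a minimal sketch using the google-cloud-bigquery Python client. The project, dataset, and table names and the trip schema are hypothetical placeholders, not the project's actual configuration.

```python
# Hypothetical example: creating a partitioned, clustered BigQuery table
# for NYC taxi trips with the google-cloud-bigquery client library.
from google.cloud import bigquery

client = bigquery.Client()

schema = [
    bigquery.SchemaField("vendor_id", "STRING"),
    bigquery.SchemaField("pickup_datetime", "TIMESTAMP"),
    bigquery.SchemaField("dropoff_datetime", "TIMESTAMP"),
    bigquery.SchemaField("trip_distance", "FLOAT"),
    bigquery.SchemaField("total_amount", "FLOAT"),
]

# "my-project.taxi_dataset.trips" is a placeholder table path.
table = bigquery.Table("my-project.taxi_dataset.trips", schema=schema)

# Partition by day on the pickup timestamp so queries filtering on a
# date range scan only the matching partitions.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="pickup_datetime",
)

# Cluster within each partition by vendor_id so filters on that column
# read fewer blocks, which reduces bytes scanned and cost.
table.clustering_fields = ["vendor_id"]

table = client.create_table(table)
print(f"Created {table.full_table_id}")
```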
This repository demonstrates workflow orchestration for data engineering pipelines using Kestra. It guides users through building, running, and scheduling data pipelines that extract, transform, and load (ETL) data both locally (with PostgreSQL) and in the cloud (with Google Cloud Platform). The project is hands-on and includes conceptual explanations, infrastructure setup, and several example pipeline flows.
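Kestra flows themselves are declared in YAML, but the ETL steps they orchestrate reduce to logic like the following. This is a minimal Python sketch, assuming a local PostgreSQL instance; the CSV URL, column names, and connection string are illustrative placeholders, and the actual flows live in the repository.

```python
# Minimal, hypothetical sketch of the extract-transform-load steps a
# Kestra flow might orchestrate: pull a CSV, clean it, load to Postgres.
import pandas as pd
from sqlalchemy import create_engine

CSV_URL = "https://example.com/yellow_tripdata_2021-01.csv"  # placeholder

# Extract: read the raw trip data.
df = pd.read_csv(CSV_URL, parse_dates=["tpep_pickup_datetime"])

# Transform: drop rows with no passengers (a common cleaning step).
df = df[df["passenger_count"] > 0]

# Load: append into a local PostgreSQL table.
engine = create_engine("postgresql://user:password@localhost:5432/taxi")
df.to_sql("yellow_trips", engine, if_exists="append", index=False)
```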
This repository serves as a practical guide to building and orchestrating robust data pipelines using Apache Airflow. It covers essential concepts from basic workflow management to advanced deployments with Google Cloud Platform (GCP) and Kubernetes.
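For orientation, a minimal Airflow DAG looks roughly like the sketch below: two dependent tasks on a daily schedule. The DAG id, task bodies, and schedule are placeholders rather than the repository's actual pipelines.

```python
# Minimal Airflow DAG sketch: two dependent tasks run daily.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: a real pipeline would pull data from a source here.
    print("extracting data...")


def load():
    # Placeholder: a real pipeline would write to a warehouse here.
    print("loading data...")


with DAG(
    dag_id="example_taxi_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Run extract before load.
    extract_task >> load_task
```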
This repository provides a comprehensive, step-by-step guide to building a simple data engineering pipeline using containerization (Docker), orchestration (Docker Compose), and Infrastructure as Code (Terraform), with a focus on ingesting and processing NYC taxi data. The guide pairs conceptual explanations with infrastructure setup and example ingestion flows.
This project is a practical template for data engineers to learn and implement containerized data pipelines, local and cloud database management, and automated cloud infrastructure provisioning using modern tools like Docker, Docker Compose, and Terraform. It is especially useful for those looking to understand the end-to-end workflow from local prototyping to cloud deployment in a reproducible, automated way.
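To make the ingestion step concrete, here is a sketch of loading a large taxi CSV into a Dockerized PostgreSQL instance in chunks so it never needs to fit in memory at once. The connection string, filename, and table name are assumptions standing in for the repository's actual configuration.

```python
# Hypothetical sketch: stream a large NYC taxi CSV into a PostgreSQL
# container in 100k-row chunks to keep memory usage bounded.
import pandas as pd
from sqlalchemy import create_engine

# Connection details assumed to match a docker-compose Postgres service.
engine = create_engine("postgresql://root:root@localhost:5432/ny_taxi")

chunks = pd.read_csv(
    "yellow_tripdata_2021-01.csv",  # placeholder filename
    iterator=True,
    chunksize=100_000,
)

for i, chunk in enumerate(chunks):
    # The first chunk replaces any existing table; later chunks append.
    chunk.to_sql(
        "yellow_taxi_data",
        engine,
        if_exists="replace" if i == 0 else "append",
        index=False,
    )
    print(f"inserted chunk {i}")
```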
Data engineering is the backbone of any data-driven organization. In this post, we will explore the fundamental concepts that every aspiring data engineer should understand.
Data engineering focuses on designing, building, and maintaining the infrastructure and architecture for data generation, storage, and analysis. Data engineers develop the systems that collect, manage, and convert raw data into usable information for data scientists and business analysts.