João Blasques (Jonas) / Projects & Portfolio

Created Fri, 20 Jun 2025 16:01:06 +0100

Professional Projects

Here are some of the key projects I have led or contributed to significantly:

Real-Time Customer Analytics Platform

Client: Major E-commerce Retailer
Timeline: 2023 - Present

Technologies Used:

  • Apache Kafka & Kafka Streams
  • AWS (Lambda, Kinesis, S3, DynamoDB)
  • Python, Spark Structured Streaming
  • Kubernetes for orchestration

Project Overview:
Designed and implemented a real-time customer analytics platform that processes over 5 TB of event data per day. The system captures user interactions, processes them through a streaming pipeline, and feeds AI models that generate personalized recommendations in under 200 ms.

Key Achievements:

  • Reduced latency of customer insights from hours to seconds
  • Improved recommendation relevance by 35%
  • Built a scalable architecture that automatically adjusts to traffic spikes
  • Implemented comprehensive monitoring and alerting systems

Enterprise Data Warehouse Modernization

Client: Financial Services Company
Timeline: 2022 - 2023

Technologies Used:

  • Google BigQuery
  • Airflow for orchestration
  • dbt for transformation
  • Python, SQL
  • Terraform for infrastructure as code

Project Overview:
Led the migration from a legacy on-premises data warehouse to a cloud-based solution. Redesigned the data model, implemented automated ETL processes, and created a self-service analytics platform for business users.

Key Achievements:

  • Reduced monthly infrastructure costs by 40%
  • Cut data processing time from 8 hours to 30 minutes
  • Implemented data quality monitoring with automated alerts
  • Created comprehensive documentation and trained internal teams

Predictive Maintenance ML System

Client: Manufacturing Industry Leader
Timeline: 2021 - 2022

Technologies Used:

  • TensorFlow for model development
  • MLflow for experiment tracking
  • Docker and Kubernetes for deployment
  • Time-series data processing
  • Edge computing for real-time analysis

Project Overview:
Developed a machine learning system that predicts equipment failures before they occur. The solution processes sensor data from manufacturing equipment, identifies patterns that precede failures, and alerts maintenance teams to intervene.

Key Achievements:

  • Reduced unplanned downtime by 32%
  • Saved an estimated €2.1M annually in maintenance costs
  • Created a scalable ML pipeline for continuous model improvement
  • Implemented an intuitive dashboard for maintenance teams
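
To illustrate the idea of flagging sensor readings that deviate from recent behavior, here is a minimal sketch. It is not the production system: it swaps the TensorFlow models for a simple rolling z-score baseline, and the function name, window size, and threshold are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, stdev
from typing import Iterable, Iterator, Tuple


def rolling_zscore_alerts(
    readings: Iterable[float],
    window: int = 20,       # assumed window size, not a value from the project
    threshold: float = 3.0  # assumed alert threshold in standard deviations
) -> Iterator[Tuple[int, float, float]]:
    """Yield (index, value, z_score) for readings far from the recent mean."""
    history = deque(maxlen=window)
    for i, value in enumerate(readings):
        # Only score a reading once a full window of history is available.
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            if sigma > 0:
                z = (value - mu) / sigma
                if abs(z) > threshold:
                    yield i, value, z
        # The reading joins the history after it has been scored.
        history.append(value)


if __name__ == "__main__":
    # Synthetic vibration-like signal with one spike at the end.
    signal = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95] * 5 + [5.0]
    for index, value, z in rolling_zscore_alerts(signal):
        print(f"reading {index}: value={value} z={z:.1f}")
```

A baseline like this is mainly useful as a sanity check next to a learned model, since it only reacts to sudden deviations and cannot recognize slower patterns that precede a failure.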

Personal & Open Source Projects

DataStreamPy

GitHub: github.com/joaoblasques/datastreampy
Technologies: Python, Apache Kafka, Docker

An open-source Python library that simplifies working with streaming data. Provides high-level abstractions for common stream processing patterns and makes it easier to build robust data pipelines.

Key Features:

  • Declarative stream processing DSL
  • Fault tolerance and exactly-once processing guarantees
  • Extensive testing utilities
  • Comprehensive documentation and examples

ML Model Monitoring Dashboard

GitHub: github.com/joaoblasques/ml-monitor
Technologies: Python, FastAPI, React, PostgreSQL, Docker

A full-stack application for monitoring machine learning models in production. Tracks drift, performance metrics, and resource utilization to ensure models continue to perform as expected.

Key Features:

  • Automated detection of data and concept drift
  • Visualizations of model performance over time
  • Alerting system for performance degradation
  • A/B testing framework for model comparison
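
As a rough illustration of what data drift detection can look like, the sketch below computes the Population Stability Index (PSI) between a reference sample and a current sample of one feature. This is not code from the ml-monitor repository; the function name, bin count, and smoothing constant are assumptions made for the example.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    bins: int = 10,     # assumed number of equal-width bins
    eps: float = 1e-6,  # assumed floor so empty bins do not produce log(0)
) -> float:
    """Return the PSI between a reference sample and a current sample."""
    if len(reference) == 0 or len(current) == 0:
        raise ValueError("both samples must be non-empty")

    # Bin edges come from the reference sample only.
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant data

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp outliers into edge bins
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


if __name__ == "__main__":
    training = [i / 100 for i in range(100)]
    shifted = [0.5 + i / 200 for i in range(100)]
    print(population_stability_index(training, training))
    print(population_stability_index(training, shifted))
```

A commonly cited rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift, though suitable thresholds depend on the feature and the model.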

Automated Data Quality Framework

GitHub: github.com/joaoblasques/data-quality-framework
Technologies: Python, Great Expectations, Airflow, PostgreSQL

An extensible framework for automated data quality checking and monitoring. Integrates with data pipelines to validate data at each stage of processing and alert on anomalies.

Key Features:

  • Customizable quality rules engine
  • Integration with popular data engineering tools
  • Historical quality metrics tracking
  • Self-service interface for data stakeholders
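
The idea of a customizable quality rules engine can be sketched in a few lines of plain Python. This is a hypothetical illustration, not the API of the data-quality-framework repository or of Great Expectations; `Rule`, `RuleResult`, `run_rules`, `not_null`, and `in_range` are names invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Mapping

Row = Mapping[str, Any]


@dataclass(frozen=True)
class Rule:
    """A named check applied to each row (hypothetical structure)."""
    name: str
    check: Callable[[Row], bool]


@dataclass
class RuleResult:
    rule: str
    total: int
    failed: int

    @property
    def pass_rate(self) -> float:
        # An empty dataset is treated as passing.
        return 1.0 if self.total == 0 else (self.total - self.failed) / self.total


def not_null(column: str) -> Rule:
    return Rule(f"{column}_not_null", lambda row: row.get(column) is not None)


def in_range(column: str, low: float, high: float) -> Rule:
    def check(row: Row) -> bool:
        value = row.get(column)
        return value is not None and low <= value <= high
    return Rule(f"{column}_in_range", check)


def run_rules(rows: Iterable[Row], rules: Iterable[Rule]) -> List[RuleResult]:
    """Evaluate every rule against every row and count failures per rule."""
    materialized = list(rows)
    results = []
    for rule in rules:
        failed = sum(1 for row in materialized if not rule.check(row))
        results.append(RuleResult(rule.name, len(materialized), failed))
    return results


if __name__ == "__main__":
    sample = [{"amount": 10.0}, {"amount": None}, {"amount": -5.0}]
    for result in run_rules(sample, [not_null("amount"), in_range("amount", 0, 1000)]):
        print(result.rule, result.failed, f"{result.pass_rate:.2f}")
```

Keeping rules as small named callables makes it straightforward to store per-rule pass rates over time, which is one way the historical metrics tracking mentioned above could be fed.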

NLP for Customer Support Analysis

Demo: customer-support-nlp.jonasblasques.com
Technologies: Python, spaCy, BERT, Flask, D3.js

A natural language processing application that analyzes customer support conversations to identify common issues, sentiment trends, and opportunities for automation.

Key Features:

  • Topic modeling to categorize support tickets
  • Sentiment analysis to track customer satisfaction
  • Named entity recognition for product and feature mentions
  • Interactive visualizations of support trends

Research & Contributions

Beyond these projects, I regularly contribute to open-source data engineering and machine learning libraries, participate in research collaborations, and share my work through technical blog posts and conference presentations.

I’m particularly interested in the ethical implications of AI systems and work to ensure that the solutions I build are fair, transparent, and beneficial to users.


Note: Some project details have been anonymized due to confidentiality agreements. For more information about any of these projects or to discuss potential collaborations, please contact me.