Professional Projects
Key projects I have led or made significant contributions to during my professional career:
Real-Time Customer Analytics Platform
Client: Major E-commerce Retailer
Timeline: 2023 - Present
Technologies Used:
- Apache Kafka & Kafka Streams
- AWS (Lambda, Kinesis, S3, DynamoDB)
- Python, Spark Structured Streaming
- Kubernetes for orchestration
Project Overview:
Designed and implemented a real-time customer analytics platform that processes over 5 TB of event data per day. The system captures user interactions, processes them through a streaming pipeline, and feeds AI models that generate personalized recommendations in under 200 ms.
Key Achievements:
- Reduced latency of customer insights from hours to seconds
- Improved recommendation relevance by 35%
- Built a scalable architecture that automatically adjusts to traffic spikes
- Implemented comprehensive monitoring and alerting systems
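The core idea of turning a stream of user interactions into per-window aggregates can be sketched as below. This is an illustrative, in-memory example only, not code from the project: the function name `tumbling_window_counts`, the `(timestamp, user_id)` event shape, and the 60-second window are all assumptions, and a production pipeline would rely on the windowing provided by Kafka Streams or Spark Structured Streaming instead.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds=60):
    """Count events per (window_start, user_id) over fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_in_seconds, user_id) pairs.
    Hypothetical helper for illustration; not part of any real library.
    """
    counts = defaultdict(int)
    for timestamp, user_id in events:
        # Align each event to the start of the window it falls into.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, user_id)] += 1
    return dict(counts)


# Made-up sample events: (seconds since start, user id).
events = [(0, "u1"), (30, "u1"), (59, "u2"), (60, "u1"), (125, "u2")]
window_counts = tumbling_window_counts(events, window_seconds=60)
```

Aggregates of this kind are a typical input for downstream recommendation features, since they summarize recent activity without retaining every raw event.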
Enterprise Data Warehouse Modernization
Client: Financial Services Company
Timeline: 2022 - 2023
Technologies Used:
- Google BigQuery
- Airflow for orchestration
- dbt for transformation
- Python, SQL
- Terraform for infrastructure as code
Project Overview:
Led the migration from a legacy on-premises data warehouse to a cloud-based solution. Redesigned the data model, implemented automated ETL processes, and created a self-service analytics platform for business users.
Key Achievements:
- Reduced monthly infrastructure costs by 40%
- Cut data processing time from 8 hours to 30 minutes
- Implemented data quality monitoring with automated alerts
- Created comprehensive documentation and trained internal teams
Predictive Maintenance ML System
Client: Manufacturing Industry Leader
Timeline: 2021 - 2022
Technologies Used:
- TensorFlow for model development
- MLflow for experiment tracking
- Docker and Kubernetes for deployment
- Time-series data processing
- Edge computing for real-time analysis
Project Overview:
Developed a machine learning system that predicts equipment failures before they occur. The solution processes sensor data from manufacturing equipment, identifies patterns that precede failures, and alerts maintenance teams so they can intervene in time.
Key Achievements:
- Reduced unplanned downtime by 32%
- Saved an estimated €2.1M annually in maintenance costs
- Created a scalable ML pipeline for continuous model improvement
- Implemented an intuitive dashboard for maintenance teams
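As a rough illustration of flagging unusual sensor readings, the sketch below uses a rolling z-score over a trailing window. It is a deliberately simple statistical stand-in, not the TensorFlow models used in the project; the function name `flag_anomalies`, the window size, the threshold, and the sample vibration values are all assumptions.

```python
from collections import deque
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that deviate strongly from the trailing window.

    Hypothetical helper for illustration; a reading is flagged when its
    z-score against the previous `window` readings exceeds `threshold`.
    """
    history = deque(maxlen=window)
    flagged = []
    for index, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # Skip the z-score when the window is flat to avoid dividing by zero.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                flagged.append(index)
        history.append(value)
    return flagged


# Made-up vibration readings with one obvious spike.
vibration = [1.0, 1.1, 0.9, 1.0, 1.05, 1.0, 5.0, 1.0, 0.95]
anomalies = flag_anomalies(vibration, window=5, threshold=3.0)
```

A check like this is cheap enough to run close to the equipment, which is one reason simple baselines are often kept alongside learned models.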
Personal & Open Source Projects
DataStreamPy
GitHub: github.com/joaoblasques/datastreampy
Technologies: Python, Apache Kafka, Docker
An open-source Python library that simplifies working with streaming data. Provides high-level abstractions for common stream processing patterns and makes it easier to build robust data pipelines.
Key Features:
- Declarative stream processing DSL
- Fault tolerance and exactly-once processing guarantees
- Extensive testing utilities
- Comprehensive documentation and examples
ML Model Monitoring Dashboard
GitHub: github.com/joaoblasques/ml-monitor
Technologies: Python, FastAPI, React, PostgreSQL, Docker
A full-stack application for monitoring machine learning models in production. Tracks drift, performance metrics, and resource utilization to ensure models continue to perform as expected.
Key Features:
- Automated detection of data and concept drift
- Visualizations of model performance over time
- Alerting system for performance degradation
- A/B testing framework for model comparison
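One common way to quantify data drift is the Population Stability Index (PSI), which compares the binned distribution of a feature in a reference sample against a current sample. The sketch below is a minimal, standard-library illustration under stated assumptions; it is not taken from the ml-monitor repository, and the function name, bin count, and sample data are hypothetical.

```python
import math


def population_stability_index(reference, current, bins=10):
    """Compare two numeric samples; larger values indicate a bigger shift.

    Hypothetical helper for illustration. Bins are equal-width and derived
    from the range of the reference sample.
    """
    low, high = min(reference), max(reference)
    width = (high - low) / bins or 1.0  # Fall back to 1.0 for a constant sample.

    def proportions(sample):
        counts = [0] * bins
        for value in sample:
            # Clamp values outside the reference range into the edge bins.
            index = int((value - low) / width)
            counts[min(max(index, 0), bins - 1)] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(count / len(sample), 1e-6) for count in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Made-up data: a uniform reference sample and a copy shifted by 0.5.
reference = [i / 100 for i in range(100)]
shifted = [value + 0.5 for value in reference]

psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, although suitable thresholds depend on the feature and the use case.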
Automated Data Quality Framework
GitHub: github.com/joaoblasques/data-quality-framework
Technologies: Python, Great Expectations, Airflow, PostgreSQL
An extensible framework for automated data quality checking and monitoring. Integrates with data pipelines to validate data at each stage of processing and alert on anomalies.
Key Features:
- Customizable quality rules engine
- Integration with popular data engineering tools
- Historical quality metrics tracking
- Self-service interface for data stakeholders
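The idea of a customizable rules engine can be sketched as a list of named checks applied to every row. This is an assumption-laden illustration in plain Python, not the framework's actual API and not Great Expectations code; the `Rule` class, the `validate` function, and the sample rows are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    """A named check applied to one column of each row (hypothetical type)."""
    name: str
    column: str
    check: Callable[[Any], bool]


def validate(rows, rules):
    """Return (row_index, rule_name) pairs for every failed check."""
    failures = []
    for index, row in enumerate(rows):
        for rule in rules:
            # A missing column is passed to the check as None.
            if not rule.check(row.get(rule.column)):
                failures.append((index, rule.name))
    return failures


rules = [
    Rule("amount_not_null", "amount", lambda v: v is not None),
    Rule("amount_non_negative", "amount", lambda v: v is None or v >= 0),
    Rule("currency_is_known", "currency", lambda v: v in {"EUR", "USD"}),
]

# Made-up rows: one valid, one with a negative amount, one with two problems.
rows = [
    {"amount": 10.0, "currency": "EUR"},
    {"amount": -5.0, "currency": "USD"},
    {"amount": None, "currency": "GBP"},
]
failures = validate(rows, rules)
```

Keeping rules as data rather than hard-coded logic is what makes it straightforward to add new checks, or to record failure counts over time for historical tracking.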
NLP for Customer Support Analysis
Demo: customer-support-nlp.jonasblasques.com
Technologies: Python, spaCy, BERT, Flask, D3.js
A natural language processing application that analyzes customer support conversations to identify common issues, sentiment trends, and opportunities for automation.
Key Features:
- Topic modeling to categorize support tickets
- Sentiment analysis to track customer satisfaction
- Named entity recognition for product and feature mentions
- Interactive visualizations of support trends
Research & Contributions
Beyond these projects, I regularly contribute to open-source data engineering and machine learning libraries, participate in research collaborations, and share my work through technical blog posts and conference presentations.
I’m particularly interested in the ethical implications of AI systems and work to ensure that the solutions I build are fair, transparent, and beneficial to users.
Note: Some project details have been anonymized due to confidentiality agreements. For more information about any of these projects or to discuss potential collaborations, please contact me.