Thong Truong
Data Engineering & AI Systems Engineer
I build scalable data infrastructure: ETL pipelines, distributed processing, and real-time analytics. Expert in Python, SQL, and modern data technologies.
Data Pipelines & Processing
- Apache Spark (PySpark) for distributed processing
- Apache Airflow for workflow orchestration
- ETL pipelines for data extraction & transformation
- Real-time data ingestion & batch processing
Data Storage & APIs
- PostgreSQL with advanced SQL & query optimization
- Vector databases (pgvector) for embeddings
- FastAPI for data service APIs & microservices
- Redis for caching & message queuing
- Apache Kafka for event streaming & data pipelines
Analytics & Infrastructure
- Business intelligence with Metabase dashboards
- Data quality monitoring & validation
- Containerized pipelines with Docker
- CI/CD for data pipeline deployments
My Projects
SiteBotic - AI-Powered Chatbot Platform
Founder & Technical Lead | Production SaaS Platform
SaaS platform for AI-powered chatbots, running in production with automated data ingestion pipelines. I built its data infrastructure end to end.
Tech Stack
- RAG pipeline with vector search (pgvector + PostgreSQL)
- Async task processing with Dramatiq + Redis
- FastAPI backend with SQLAlchemy ORM
- React dashboard with TypeScript & TanStack Query
- Docker multi-environment deployment
Platform Architecture
Customer 360 Risk Scoring System
Data Engineering Project | University Assignment
End-to-end data engineering solution building Customer 360 views and risk analytics with ETL pipelines and distributed processing.
View on GitHub →
Tech Stack
- ETL pipelines for data extraction & transformation
- Apache Spark (PySpark) for distributed analytics
- Apache Airflow for workflow orchestration
- PostgreSQL data warehouse with Metabase BI
Data Engineering Architecture
Book Recommendation System
Machine Learning Project | University Assignment
Full-stack ML platform implementing collaborative filtering, content-based filtering, and hybrid recommendation algorithms.
View on GitHub →
Tech Stack
- Scikit-learn (SVD, TF-IDF) with FastAPI REST API
- React + TypeScript frontend with Vite
- Collaborative filtering and content-based algorithms
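As a rough illustration of the hybrid idea, the sketch below blends a collaborative-filtering score and a content-based score per item with a single weight. It is not the project's actual code: the function names `hybrid_scores` and `top_n`, the `alpha` weight, and the plain-dictionary inputs are all assumptions chosen to keep the example dependency-free.

```python
def hybrid_scores(cf_scores, content_scores, alpha=0.5):
    """Blend two per-item score dicts into one (hypothetical helper).

    alpha weights the collaborative score; (1 - alpha) weights the
    content-based score. An item missing from one source scores 0 there.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    items = set(cf_scores) | set(content_scores)
    return {
        item: alpha * cf_scores.get(item, 0.0)
        + (1 - alpha) * content_scores.get(item, 0.0)
        for item in items
    }


def top_n(scores, n=3):
    """Return the n highest-scoring items, ties broken by item name."""
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]
```

In a setup like this, the collaborative scores might come from an SVD model and the content scores from TF-IDF similarity, with `alpha` tuned on held-out ratings.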
System Architecture
Pet Clinic Management System
Desktop Application | University Assignment
Java desktop application with complete CRUD operations for veterinary clinic management using DAO pattern and MySQL.
View on GitHub →
Tech Stack
- Java with DAO pattern and JDBC
- Java Swing GUI with event-driven programming
- MySQL database for data persistence
Application Architecture
Real-Time Price Tracker (BGU118)
Data Pipeline Project | University Assignment
Real-time data ingestion platform tracking Bitcoin, gold, and USD-VND rates with automated ETL pipeline and interactive visualizations.
View on GitHub →
Tech Stack
- Async httpx for concurrent API calls
- SQLite with SQLModel ORM for time-series data
- React + Chart.js with date range filtering
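The concurrent-fetch pattern behind the ingestion step can be sketched with `asyncio.gather`. This is a minimal, offline illustration rather than the project's code: `fetch_price` and `fetch_all` are hypothetical names, the symbols are placeholders, and the HTTP request is replaced by a short `asyncio.sleep` so the example runs without network access.

```python
import asyncio


async def fetch_price(symbol, delay=0.01):
    """Stand-in for one API request (hypothetical; no real HTTP call is made)."""
    await asyncio.sleep(delay)  # simulates network latency
    return {"symbol": symbol, "price": None}  # price left empty: no real data


async def fetch_all(symbols):
    """Run all fetches concurrently and key the results by symbol."""
    # gather() schedules every coroutine at once and returns results
    # in the same order as the inputs.
    results = await asyncio.gather(*(fetch_price(s) for s in symbols))
    return {r["symbol"]: r for r in results}


if __name__ == "__main__":
    print(asyncio.run(fetch_all(["BTC", "XAU", "USD-VND"])))
```

With httpx, the `asyncio.sleep` line would typically become an awaited `client.get(url)` on a shared `httpx.AsyncClient`, so the three requests overlap instead of running one after another.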