Thong Truong

Data Engineering & AI Systems Engineer

I build scalable data infrastructure: ETL pipelines, distributed processing, and real-time analytics. Expert in Python, SQL, and modern data technologies.

Python · Java · Apache Spark · Apache Airflow · PostgreSQL · SQLAlchemy · FastAPI · Redis · Apache Kafka · Dramatiq/Celery · Docker · Metabase

Data Pipelines & Processing

  • Apache Spark (PySpark) for distributed processing
  • Apache Airflow for workflow orchestration
  • ETL pipelines for data extraction & transformation
  • Real-time data ingestion & batch processing
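The ETL bullets above follow an extract, transform, load sequence. Below is a minimal standard-library sketch of that sequence. The CSV content, the `orders` table, and the cleaning rules are hypothetical, and `sqlite3` stands in for a warehouse such as PostgreSQL. In an Airflow deployment, each function would typically become its own task.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real pipeline would read a file, an API, or a source database.
RAW_CSV = """order_id,customer,amount
1,alice,19.99
2,bob,
3,alice,5.01
"""

def extract(text: str) -> list[dict[str, str]]:
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows: list[dict[str, str]]) -> list[tuple[int, str, float]]:
    """Transform: drop rows with a missing amount and cast fields to proper types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append((int(row["order_id"]), row["customer"].strip().title(), float(row["amount"])))
    return cleaned

def load(records: list[tuple[int, str, float]], conn: sqlite3.Connection) -> None:
    """Load: write cleaned records; INSERT OR REPLACE keyed on order_id makes re-runs idempotent."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
totals = dict(conn.execute("SELECT customer, ROUND(SUM(amount), 2) FROM orders GROUP BY customer"))
print(totals)  # {'Alice': 25.0}
```

Keying the load on `order_id` means a failed run can be retried without duplicating rows.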

Data Storage & APIs

  • PostgreSQL with advanced SQL & query optimization
  • Vector databases (pgvector) for embeddings
  • FastAPI for data service APIs & microservices
  • Redis for caching & message queuing
  • Apache Kafka for event streaming & data pipelines
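pgvector ranks rows with distance operators such as `<=>` (cosine distance), for example `SELECT ... ORDER BY embedding <=> '[1,0,0]' LIMIT 2`. The sketch below computes the same cosine distance with a brute-force scan in plain Python to show what such a query returns. The document titles and 3-dimensional vectors are made up; real embeddings have far more dimensions.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator orders by."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest(query: list[float], rows: list[tuple[str, list[float]]], k: int = 2):
    """Exact k-nearest neighbours by linear scan; HNSW/IVFFlat indexes approximate this."""
    return sorted(rows, key=lambda row: cosine_distance(query, row[1]))[:k]

# Hypothetical embeddings for three help-center documents.
documents = [
    ("refund policy", [0.9, 0.1, 0.0]),
    ("shipping times", [0.1, 0.9, 0.1]),
    ("return an item", [0.8, 0.2, 0.1]),
]
top = nearest([1.0, 0.0, 0.0], documents)
print([name for name, _ in top])  # ['refund policy', 'return an item']
```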

Analytics & Infrastructure

  • Business intelligence with Metabase dashboards
  • Data quality monitoring & validation
  • Containerized pipelines with Docker
  • CI/CD for data pipeline deployments
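Data quality monitoring usually comes down to named rules evaluated over each batch, with failure counts fed to a dashboard or an alert. Below is a minimal sketch, assuming rows arrive as dictionaries. The `Check` class and the three rules are hypothetical examples, not a specific library.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Check:
    name: str
    predicate: Callable[[dict], bool]  # returns True when a row passes

def run_checks(rows: list[dict], checks: list[Check]) -> dict[str, int]:
    """Count failing rows per check."""
    return {c.name: sum(1 for row in rows if not c.predicate(row)) for c in checks}

# Hypothetical rules for an orders table (a missing amount is treated as 0).
checks = [
    Check("customer_id_not_null", lambda r: r.get("customer_id") is not None),
    Check("amount_non_negative", lambda r: r.get("amount", 0) >= 0),
    Check("currency_allowed", lambda r: r.get("currency") in {"USD", "VND"}),
]
rows = [
    {"customer_id": 1, "amount": 10.0, "currency": "USD"},
    {"customer_id": None, "amount": -5.0, "currency": "VND"},
    {"customer_id": 3, "amount": 7.5, "currency": "EUR"},
]
failures = run_checks(rows, checks)
failed_checks = [name for name, count in failures.items() if count > 0]
print(failures)  # {'customer_id_not_null': 1, 'amount_non_negative': 1, 'currency_allowed': 1}
```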

My Projects

SiteBotic - AI-Powered Chatbot Platform

Founder & Technical Lead | Production SaaS Platform

Production SaaS platform with automated data ingestion pipelines. Built end-to-end data infrastructure for AI-powered chatbots.

Tech Stack

  • RAG pipeline with vector search (pgvector + PostgreSQL)
  • Async task processing with Dramatiq + Redis
  • FastAPI backend with SQLAlchemy ORM
  • React dashboard with TypeScript & TanStack Query
  • Docker multi-environment deployment
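A RAG pipeline works in four steps. It splits source content into chunks, embeds them, retrieves the chunks closest to a question, and places those chunks in the prompt sent to the language model. The sketch below runs those steps end to end under heavy simplifications. `embed` is a toy bag-of-words counter standing in for a real embedding model. Retrieval is a linear scan rather than a pgvector query. The page text and chunk sizes are hypothetical.

```python
import math
import re
from collections import Counter

def chunk(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_prompt(question: str, page_text: str, k: int = 1) -> str:
    """Retrieve the k most similar chunks and wrap them in a grounded prompt."""
    q = embed(question)
    best = sorted(chunk(page_text), key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
    context = "\n".join(best)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Hypothetical content scraped from a customer's website.
PAGE = ("Our store ships orders within two business days. Refunds are issued within "
        "fourteen days of receiving a returned item. Support is available by chat around the clock.")
prompt = build_prompt("How long do refunds take?", PAGE)
print(prompt)
```

Overlapping chunks reduce the chance that the sentence answering a question is cut in half at a chunk boundary.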

Platform Architecture

  • User Websites (Content Sources)
  • SiteBotic SaaS Platform
  • AI Content Processing & Training
  • Chatbot Engine (AI Responses)
  • Analytics Dashboard (User Insights)
  • One-Click Embed (Live Websites)
  • End Users (24/7 Chat Support)

Customer 360 Risk Scoring System

Data Engineering Project | University Assignment

End-to-end data engineering solution that builds Customer 360 views and risk analytics using ETL pipelines and distributed processing.

Tech Stack

  • ETL pipelines for data extraction & transformation
  • Apache Spark (PySpark) for distributed analytics
  • Apache Airflow for workflow orchestration
  • PostgreSQL data warehouse with Metabase BI

Data Engineering Architecture

  • Multiple Data Sources (CRM, Transactions, Behavior)
  • ETL Pipeline (Extract & Transform)
  • Apache Spark (Distributed Processing)
  • Apache Airflow (Workflow Orchestration)
  • PostgreSQL Data Warehouse (Customer 360 & Risk Data)
  • Metabase BI Dashboard (Risk Scoring & Analytics)
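A Customer 360 view comes from merging every source into one record per customer and deriving features from that record. The plain-Python sketch below uses hypothetical CRM, transaction, and behavior extracts. The risk score weights are illustrative and uncalibrated. In a PySpark job, the same logic would map to joins and `groupBy` aggregations.

```python
# Hypothetical source extracts, keyed by customer_id.
crm = {1: {"name": "An", "tenure_months": 36}, 2: {"name": "Binh", "tenure_months": 3}}
transactions = [  # (customer_id, amount, late_payment)
    (1, 120.0, False), (2, 900.0, True), (2, 450.0, True), (1, 80.0, False),
]
behavior = {1: {"failed_logins": 0}, 2: {"failed_logins": 4}}

def build_customer_360(crm, transactions, behavior) -> dict[int, dict]:
    """Merge the three sources into one profile per customer."""
    profiles = {
        cid: {**info, "total_spend": 0.0, "late_payments": 0, "failed_logins": 0}
        for cid, info in crm.items()
    }
    for cid, amount, late in transactions:  # aggregate transactional facts
        profiles[cid]["total_spend"] += amount
        profiles[cid]["late_payments"] += int(late)
    for cid, events in behavior.items():  # attach behavioral signals
        profiles[cid]["failed_logins"] = events["failed_logins"]
    return profiles

def risk_score(profile: dict) -> float:
    """Weighted score in [0, 100]; weights and caps are illustrative only."""
    late = min(profile["late_payments"] / 3, 1.0)
    logins = min(profile["failed_logins"] / 5, 1.0)
    new_customer = 1.0 if profile["tenure_months"] < 6 else 0.0
    return round(100 * (0.5 * late + 0.3 * logins + 0.2 * new_customer), 1)

profiles = build_customer_360(crm, transactions, behavior)
scores = {cid: risk_score(p) for cid, p in profiles.items()}
print(scores)  # {1: 0.0, 2: 77.3}
```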

Book Recommendation System

Machine Learning Project | University Assignment

Full-stack ML platform implementing collaborative filtering, content-based filtering, and hybrid recommendation algorithms.

Tech Stack

  • Scikit-learn (SVD, TF-IDF) with FastAPI REST API
  • React + TypeScript frontend with Vite
  • Collaborative filtering and content-based algorithms

System Architecture

  • React Frontend (TypeScript + Vite)
  • FastAPI Backend (RESTful API)
  • Collaborative Filtering (SVD Algorithm)
  • Content-Based Filtering (TF-IDF Vectorization)
  • Hybrid Recommendation Engine (Combined Algorithms)
  • Book Dataset & User Ratings (CSV/Structured Data)
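One way to build a hybrid recommender is to blend a content-based similarity with a collaborative-filtering score. This sketch computes TF-IDF cosine similarity by hand, using the same smoothed IDF formula as scikit-learn's `TfidfVectorizer` default (`smooth_idf=True`). It takes the collaborative scores as given. The catalogue, the scores, and the linear blend with weight `alpha` are assumptions for illustration.

```python
import math
from collections import Counter

# Hypothetical catalogue: title -> short description.
books = {
    "Dune": "desert planet politics spice empire",
    "Foundation": "galactic empire collapse mathematics future",
    "Emma": "matchmaking village romance manners",
}

def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Raw term counts times smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    tokenized = {k: v.split() for k, v in docs.items()}
    n = len(docs)
    df = Counter(term for words in tokenized.values() for term in set(words))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return {k: {t: c * idf[t] for t, c in Counter(words).items()} for k, words in tokenized.items()}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_scores(liked: str, collab: dict[str, float], alpha: float = 0.5) -> dict[str, float]:
    """alpha * content similarity to a liked book + (1 - alpha) * collaborative score."""
    vectors = tfidf_vectors(books)
    return {
        b: alpha * cosine(vectors[liked], vectors[b]) + (1 - alpha) * collab.get(b, 0.0)
        for b in books if b != liked
    }

# Hypothetical SVD-predicted ratings, rescaled to [0, 1].
collab = {"Foundation": 0.6, "Emma": 0.65}
scores = hybrid_scores("Dune", collab)
print(max(scores, key=scores.get))  # Foundation
```

Collaborative scores alone would rank Emma first. Blending in content similarity moves Foundation ahead, because Dune and Foundation share the term "empire".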

Pet Clinic Management System

Desktop Application | University Assignment

Java desktop application with complete CRUD operations for veterinary clinic management, built with the DAO pattern and MySQL.

Tech Stack

  • Java with the DAO pattern and JDBC
  • Java Swing GUI with event-driven programming
  • MySQL database for data persistence

Application Architecture

  • Java Swing GUI (Desktop Interface)
  • Business Logic Layer (Java Classes)
  • Pet DAO (CRUD Operations)
  • Owner DAO (CRUD Operations)
  • Appointment DAO (Scheduling)
  • Medical Record DAO (Treatment History)
  • MySQL Database (JDBC Connection)
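The DAO pattern keeps all SQL for one entity behind a small class, so the business logic layer never touches the database directly. The original project is Java with JDBC and MySQL. The sketch below is a Python analogue using `sqlite3` with a hypothetical `pets` schema, and it covers only the Pet DAO's CRUD methods.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

@dataclass
class Pet:
    id: Optional[int]
    name: str
    species: str
    owner_id: int

class PetDAO:
    """Data Access Object: the only class that knows how pets are stored."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS pets (id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "name TEXT NOT NULL, species TEXT NOT NULL, owner_id INTEGER NOT NULL)"
        )

    def create(self, pet: Pet) -> Pet:
        cur = self.conn.execute(
            "INSERT INTO pets (name, species, owner_id) VALUES (?, ?, ?)",
            (pet.name, pet.species, pet.owner_id),
        )
        self.conn.commit()
        return Pet(cur.lastrowid, pet.name, pet.species, pet.owner_id)

    def find_by_id(self, pet_id: int) -> Optional[Pet]:
        row = self.conn.execute(
            "SELECT id, name, species, owner_id FROM pets WHERE id = ?", (pet_id,)
        ).fetchone()
        return Pet(*row) if row else None

    def update(self, pet: Pet) -> None:
        self.conn.execute(
            "UPDATE pets SET name = ?, species = ?, owner_id = ? WHERE id = ?",
            (pet.name, pet.species, pet.owner_id, pet.id),
        )
        self.conn.commit()

    def delete(self, pet_id: int) -> None:
        self.conn.execute("DELETE FROM pets WHERE id = ?", (pet_id,))
        self.conn.commit()

dao = PetDAO(sqlite3.connect(":memory:"))
milo = dao.create(Pet(None, "Milo", "cat", owner_id=1))
dao.update(Pet(milo.id, "Milo", "dog", owner_id=1))
print(dao.find_by_id(milo.id))  # Pet(id=1, name='Milo', species='dog', owner_id=1)
dao.delete(milo.id)
print(dao.find_by_id(milo.id))  # None
```

In Java, the same shape is usually an interface such as `PetDAO` plus a JDBC implementation. This lets the Swing layer depend on the interface rather than on MySQL.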

Real-Time Price Tracker (BGU118)

Data Pipeline Project | University Assignment

Real-time data ingestion platform tracking Bitcoin, gold, and USD-VND rates with an automated ETL pipeline and interactive visualizations.

Tech Stack

  • Async httpx for concurrent API calls
  • SQLite with SQLModel ORM for time-series data
  • React + Chart.js with date range filtering

Data Flow Architecture

  • React Frontend (Charts & UI)
  • FastAPI Backend (Data API)
  • GoldAPI.io (Gold Prices)
  • CoinMarketCap (Bitcoin Prices)
  • CurrencyAPI
  • Data Processing & Storage (SQLite + VND Conversion)
  • SQLite Database (Historical Price Data)
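Tracking three prices takes three independent HTTP calls, and async I/O lets those requests wait on the network at the same time. The sketch below uses `asyncio.gather` with stub coroutines in place of the real API calls. With httpx, each stub would be an `await client.get(...)` call on a shared `httpx.AsyncClient`. The prices, the `prices` table, and the use of plain `sqlite3` instead of SQLModel are assumptions.

```python
import asyncio
import sqlite3
from datetime import datetime, timezone

# Stubs standing in for GoldAPI.io, CoinMarketCap, and CurrencyAPI; values are made up.
async def fetch_gold_usd() -> float:
    await asyncio.sleep(0.1)  # simulate network latency
    return 2300.0  # USD per troy ounce (hypothetical)

async def fetch_btc_usd() -> float:
    await asyncio.sleep(0.1)
    return 60000.0  # USD per BTC (hypothetical)

async def fetch_usd_vnd() -> float:
    await asyncio.sleep(0.1)
    return 25000.0  # VND per USD (hypothetical)

async def collect_snapshot() -> dict[str, float]:
    """Run the three requests concurrently, then convert USD prices to VND."""
    gold, btc, usd_vnd = await asyncio.gather(fetch_gold_usd(), fetch_btc_usd(), fetch_usd_vnd())
    return {"gold_vnd": gold * usd_vnd, "btc_vnd": btc * usd_vnd, "usd_vnd": usd_vnd}

def store(conn: sqlite3.Connection, snapshot: dict[str, float]) -> None:
    """Append one timestamped row per asset, building up the price history."""
    conn.execute("CREATE TABLE IF NOT EXISTS prices (ts TEXT, asset TEXT, price_vnd REAL)")
    ts = datetime.now(timezone.utc).isoformat()
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", [(ts, k, v) for k, v in snapshot.items()])
    conn.commit()

conn = sqlite3.connect(":memory:")
snapshot = asyncio.run(collect_snapshot())
store(conn, snapshot)
print(snapshot["gold_vnd"])  # 57500000.0
```

Because the stubs sleep concurrently, collecting a snapshot takes about 0.1 s instead of 0.3 s.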