Heart Stroke Risk Prediction
Machine learning model to predict stroke risk from healthcare data, with preprocessing, training, and an interactive Streamlit app.
Python, scikit-learn, Streamlit

UC Davis Graduate | B.S. Statistical Data Science, Minor in Computer Science
Currently: Seeking full-time Data Analyst / Data Engineer roles while building an SF Restaurant Safety Map.
Email: royho.career@gmail.com
Location: Davis, CA | San Francisco, CA
Hi, my name is Roy Ho, and I am a recent UC Davis graduate with a Bachelor's degree in Statistical Data Science and a minor in Computer Science.
I'm interested in building data-driven solutions using Python, R, SQL, and Excel, with a focus on machine learning. I'm currently pursuing roles in data analytics and data engineering.
Built an interactive map of 5,500+ San Francisco restaurant health inspections using public DataSF data, an ETL pipeline into a SQLite schema, a Flask REST API, and a React + Mapbox frontend for search, filters, and inspection details.
Python, SQL, Flask, React, Mapbox, ETL
Built a dashboard to analyze job market trends, including salaries, skills, and geographic differences across roles.
Python, Data Visualization
Built a real-time drowsiness detection system using computer vision techniques and a CNN model trained on eye-state data.
Python, OpenCV, CNN
Classified red vs. white wines and predicted quality ratings using logistic regression, LDA, and PCA on chemical properties.
Python, scikit-learn, PCA
Built a multi-factor stock screening model with NLP sentiment analysis (FinBERT), supervised classification, and an automated daily ETL pipeline delivering real-time investment signals.
Python, scikit-learn, NLP, ETL
Predicted 5th-season NBA player performance using regression and classification models on historical stats and draft data.
Python, Random Forest, Gradient Boosting
Scraped and compared IMDb audience reviews with professional critic reviews using sentiment analysis and NLP models.
Python, Selenium, VADER, RoBERTa
Forecasted Drake's popularity trends using 14 years of Google Trends data with ARMA and ARIMA models.
R, forecast, ggplot2
Analyzed the relationship between player performance metrics and salary structures using regression and clustering.
R, tidyverse, ggplot2
Modeled the relationship between poverty, unemployment, and crime rates using multiple linear regression and model selection.
R, ANOVA, AIC/BIC
Built a graph traversal algorithm to compute degrees of separation between actors through shared movie appearances.
Python, BFS, Graph Algorithms
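As an illustration of the degrees-of-separation idea in the last project above, here is a minimal breadth-first search sketch. It is not the project's actual code: the function name `degrees_of_separation`, the movie-to-cast dictionary layout, and the sample data are all assumptions made for this example.

```python
from collections import deque


def degrees_of_separation(cast_by_movie, source, target):
    """Return the fewest shared-movie links between two actors, or None.

    cast_by_movie is a hypothetical layout: {movie title: [actor names]}.
    """
    # Build an actor -> co-stars adjacency map from the movie casts.
    neighbors = {}
    for cast in cast_by_movie.values():
        for actor in cast:
            neighbors.setdefault(actor, set()).update(
                other for other in cast if other != actor
            )

    if source not in neighbors or target not in neighbors:
        return None
    if source == target:
        return 0

    # Breadth-first search explores actors in order of distance from the
    # source, so the first time the target is seen is via a shortest path.
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        actor, distance = queue.popleft()
        for co_star in neighbors[actor]:
            if co_star == target:
                return distance + 1
            if co_star not in visited:
                visited.add(co_star)
                queue.append((co_star, distance + 1))
    return None


# Hypothetical sample data, for illustration only.
movies = {
    "Movie A": ["Actor 1", "Actor 2"],
    "Movie B": ["Actor 2", "Actor 3"],
    "Movie C": ["Actor 4"],
}
```

With this sample data, "Actor 1" and "Actor 3" are linked only through "Actor 2", so the search needs two steps, while "Actor 4" shares no movie with anyone and is unreachable.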

Journal of Artificial Intelligence and Knowledge Engineering
January 2025 – Present

March 2024 – September 2025
View or download a PDF of my experience, education, and skills.
Outside of work, I enjoy thrifting, bass fishing, spending time outdoors, and playing poker. I also love keeping up with fashion and music.





