Analytics product · Mobility
NYC Ride-Hailing Analytics
Operational dashboards and ML models for Uber/Lyft trip patterns, fare dynamics, and airport demand in New York City.
Summary
The Challenge
Stakeholders needed explainable insights across trip patterns, airport pricing, and demand peaks — not black-box scores buried in notebooks.
The Solution
A Streamlit analytics app that combines geospatial dashboards with a fare regression model and an airport trip classifier, deployed live.
The Impact
Fare prediction reached R² > 0.85, and airport classification reached 92% accuracy on holdout data. Both are packaged for non-technical stakeholders with interactive exploration.
“A model is only useful when a manager can explore the why behind the prediction — not just the number.”
Design principle
Constraints
What shaped the product.
Data volume, explainability, and deployment limits defined scope.
Massive trip tables
NYC TLC datasets are large and messy; aggregations and caching were required before any chart could feel responsive.
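As a rough illustration of aggregating and caching before charting, the sketch below collapses raw trips into one row per zone and hour and memoizes the result. The records, field layout, and function names are hypothetical and not taken from the project; the standard-library `functools.lru_cache` stands in for whatever cache the app actually used.

```python
from collections import defaultdict
from functools import lru_cache

# Hypothetical trip records: (pickup_zone, pickup_hour, fare_usd).
TRIPS = [
    ("JFK Airport", 8, 52.0),
    ("JFK Airport", 8, 61.0),
    ("Midtown", 8, 14.5),
    ("Midtown", 18, 19.5),
    ("Midtown", 18, 22.5),
]


def aggregate_by_zone_hour(trips):
    """Collapse raw trips into one row per (zone, hour): trip count and mean fare."""
    totals = defaultdict(lambda: [0, 0.0])  # key -> [count, fare_sum]
    for zone, hour, fare in trips:
        bucket = totals[(zone, hour)]
        bucket[0] += 1
        bucket[1] += fare
    return {
        key: {"trips": count, "mean_fare": fare_sum / count}
        for key, (count, fare_sum) in totals.items()
    }


@lru_cache(maxsize=1)
def zone_hour_summary():
    """Compute the summary once; later calls reuse the cached result."""
    return aggregate_by_zone_hour(TRIPS)


summary = zone_hour_summary()
```

Charts then read from `summary` rather than the raw table. In a Streamlit app, the `st.cache_data` decorator is the usual counterpart to `lru_cache` here.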
Explainability first
Stakeholders rejected opaque scores. Every model needed interpretable features and views tied to operational questions (airport, hour, borough).
Geospatial complexity
Maps and zone joins added engineering overhead compared to tabular-only dashboards — but were essential for adoption.
Streamlit deployment
Chose Streamlit Cloud for fast iteration over a custom React frontend — trading polish for time-to-insight.
Public data only
No proprietary ride feeds — all insights from open TLC releases, with documented lag and reporting limitations.
Product
Dashboard stakeholders actually open.
Interactive views for fare trends, airport classification, and geospatial demand — backed by scikit-learn models and Plotly maps. Deployed on Streamlit with a public URL for demos and portfolio evidence.

Key decisions
Models behind the charts.
Gradient boosting for fares
Fare prediction used gradient-boosted ensemble regression with engineered time and location features, prioritizing R² and residual analysis over model complexity.
Trade-off
More features improved accuracy but required careful leakage checks on temporal splits.
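A minimal sketch of two leakage guards for time-ordered data, assuming trips are sorted by pickup time: a rolling feature that only sees earlier rows, and a split at a strict cutoff date. The data, window size, and function names are illustrative assumptions, not the project's code.

```python
from datetime import datetime

# Hypothetical trips, already sorted by pickup time: (pickup_time, fare_usd).
trips = [
    (datetime(2023, 1, 1, 8), 12.0),
    (datetime(2023, 1, 2, 8), 15.0),
    (datetime(2023, 1, 3, 8), 18.0),
    (datetime(2023, 1, 4, 8), 30.0),
    (datetime(2023, 1, 5, 8), 24.0),
]


def past_only_rolling_mean(values, window):
    """Mean of up to `window` values strictly before each position (None if no history)."""
    out = []
    for i in range(len(values)):
        history = values[max(0, i - window):i]  # excludes values[i] itself
        out.append(sum(history) / len(history) if history else None)
    return out


def temporal_split(rows, cutoff):
    """Train on rows strictly before `cutoff`; test on rows at or after it."""
    train = [row for row in rows if row[0] < cutoff]
    test = [row for row in rows if row[0] >= cutoff]
    return train, test


fares = [fare for _, fare in trips]
rolling = past_only_rolling_mean(fares, window=2)
train, test = temporal_split(trips, cutoff=datetime(2023, 1, 4))

# Leakage check: every training timestamp precedes every test timestamp.
assert max(t for t, _ in train) < min(t for t, _ in test)
```

Excluding the current row from the window means the feature for a trip never contains that trip's own fare, and the cutoff keeps later trips out of training entirely.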
Airport trip detector
A separate classifier labels airport-bound trips to power pricing and demand widgets — validated at 92% accuracy on holdout data.
Trade-off
Class imbalance required weighted metrics and manual error review on edge zones.
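To illustrate why plain accuracy can hide errors on the rarer class, the sketch below compares accuracy with per-class recall and balanced accuracy on a toy holdout set. The labels are made up, and treating airport trips as the minority class is an assumption for the example. scikit-learn offers `balanced_accuracy_score` for the same calculation; it is written out here to stay dependency-free.

```python
def per_class_recall(y_true, y_pred):
    """Recall per label: correct predictions of that label / actual occurrences."""
    recalls = {}
    for label in set(y_true):
        preds_for_label = [p for t, p in zip(y_true, y_pred) if t == label]
        recalls[label] = sum(p == label for p in preds_for_label) / len(preds_for_label)
    return recalls


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall, so each class counts equally."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical holdout labels: 1 = airport trip, 0 = other trip.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]  # one airport trip is missed

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recalls = per_class_recall(y_true, y_pred)
balanced = balanced_accuracy(y_true, y_pred)
```

On this toy data, accuracy is 0.9 while recall on the airport class is only 0.5, which is the kind of gap that weighted metrics and manual error review are meant to surface.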
Plotly over static maps
Interactive maps let users drill from city view to zones without exporting shapefiles — critical for stakeholder demos.
Trade-off
Heavier payloads on first load; mitigated with aggregated layers.
Streamlit as product shell
Streamlit wrapped models and charts into one deployable artifact — ideal for analytics MVPs with minimal frontend work.
Trade-off
Limited branding and layout control vs. a custom Next.js app.
Lessons learned
What I'd do differently.
Shipping analytics products surfaced process gaps early.
Feature leakage in time splits
Early models leaked future information through poorly scoped rolling windows. Rebuilt splits with strict temporal cutoffs.
Validation
Dashboard performance
Full-table loads stalled Streamlit on cold start. Moved to pre-aggregated Parquet layers for common views.
Performance
Story before charts
First version was chart-heavy with no narrative path. Reordered pages to match stakeholder questions (where → when → how much).
UX