Analytics product · Mobility

NYC Ride-Hailing Analytics

Operational dashboards and ML models for Uber/Lyft trip patterns, fare dynamics, and airport demand in New York City.

R² > 0.85 · Fare prediction

92% · Airport classification

2 · Platforms (Uber/Lyft)

Live · Streamlit dashboard

The Challenge

Stakeholders needed explainable insights across trip patterns, airport pricing, and demand peaks — not black-box scores buried in notebooks.

The Solution

A Streamlit analytics platform with geospatial dashboards, fare regression, and airport trip classification models deployed as a live dashboard.

The Impact

Fare prediction R² > 0.85 and 92% airport classification accuracy — packaged for non-technical stakeholders with interactive exploration.

Role: Data science · ML · Viz

Timeline: 2025

Stack: Streamlit, Python, scikit-learn

Deliverables: Live dashboard, repo

“A model is only useful when a manager can explore the why behind the prediction — not just the number.”

Design principle

Constraints

What shaped the product.

Data volume, explainability, and deployment limits defined scope.

Massive trip tables

NYC Taxi and Limousine Commission (TLC) trip datasets are large and messy; aggregation and caching were required before any chart could feel responsive.

Explainability first

Stakeholders rejected opaque scores. Every model needed interpretable features and views tied to operational questions (airport, hour, borough).

Geospatial complexity

Maps and zone joins added engineering overhead compared to tabular-only dashboards — but were essential for adoption.

Streamlit deployment

Streamlit Cloud was chosen over a custom React frontend for fast iteration, trading polish for time-to-insight.

Public data only

No proprietary ride feeds — all insights from open TLC releases, with documented lag and reporting limitations.

Product

Dashboard stakeholders actually open.

Interactive views for fare trends, airport classification, and geospatial demand — backed by scikit-learn models and Plotly maps. Deployed on Streamlit with a public URL for demos and portfolio evidence.

NYC ride-hailing analytics dashboard walkthrough

Key decisions

Models behind the charts.

Fare regression

Gradient boosting for fares

Fare prediction used ensemble regression with engineered time and location features — prioritizing R² and residual analysis over model complexity.

Trade-off

More features improved accuracy but required careful leakage checks on temporal splits.
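To make the R² and residual checks mentioned for the fare model concrete, here is a minimal sketch in plain Python. The function names and fare values are hypothetical and chosen only for illustration; they are not the project's data or code, which would more likely rely on scikit-learn's own metrics.

```python
from statistics import mean


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


def residuals(actual, predicted):
    """Signed errors (actual minus predicted), one per trip."""
    return [a - p for a, p in zip(actual, predicted)]


# Hypothetical fares in dollars (illustrative values only).
actual_fares = [10.0, 20.0, 30.0, 40.0]
predicted_fares = [12.0, 18.0, 33.0, 37.0]

score = r_squared(actual_fares, predicted_fares)
errors = residuals(actual_fares, predicted_fares)
```

Looking at the residuals alongside the single R² figure shows whether errors cluster at particular hours or zones, which a summary score alone would hide.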

Classification

Airport trip detector

A separate classifier labels airport-bound trips to power pricing and demand widgets — validated at 92% accuracy on holdout data.

Trade-off

Class imbalance required weighted metrics and manual error review on edge zones.
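The class-imbalance point can be illustrated with a small sketch. It assumes balanced accuracy (the mean of per-class recall) as one possible weighted metric; the source does not say which metric was actually used, and the labels below are hypothetical.

```python
def per_class_recall(y_true, y_pred):
    """Recall for each label: correct predictions / true occurrences."""
    recalls = {}
    for label in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls[label] = hits / total
    return recalls


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so the rare class counts as much as the common one."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical labels: 1 = airport trip, 0 = other. Airport trips are the minority.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]  # one of the two airport trips is missed

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
```

In this toy case plain accuracy looks strong while half of the airport trips are missed, which is the gap a weighted metric is meant to expose.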

Geospatial

Plotly over static maps

Interactive maps let users drill from city view to zones without exporting shapefiles — critical for stakeholder demos.

Trade-off

Heavier payloads on first load; mitigated with aggregated layers.

Delivery

Streamlit as product shell

Streamlit wrapped models and charts into one deployable artifact — ideal for analytics MVPs with minimal frontend work.

Trade-off

Limited branding and layout control vs. a custom Next.js app.

Lessons learned

What I'd do differently.

Shipping analytics products surfaced process gaps early.

Feature leakage in time splits

Early models leaked future information through poorly scoped rolling windows. Rebuilt splits with strict temporal cutoffs.

Validation
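A minimal sketch of the two ideas in the leakage fix, under assumed field names (`pickup`, `fare`) and made-up trips: a strict temporal cutoff for the split, and a rolling feature that only looks at earlier rows. This is an illustration of the technique, not the project's pipeline.

```python
from datetime import datetime


def temporal_split(rows, cutoff):
    """Rows strictly before the cutoff train the model; the rest are held out."""
    train = [r for r in rows if r["pickup"] < cutoff]
    test = [r for r in rows if r["pickup"] >= cutoff]
    return train, test


def trailing_mean_fare(rows, window):
    """Mean fare of up to `window` earlier trips; the current trip is excluded."""
    ordered = sorted(rows, key=lambda r: r["pickup"])
    features = []
    for i in range(len(ordered)):
        past = ordered[max(0, i - window):i]  # slice stops before row i
        features.append(sum(r["fare"] for r in past) / len(past) if past else None)
    return ordered, features


# Hypothetical trips (illustrative values only).
trips = [
    {"pickup": datetime(2025, 1, 1, 8), "fare": 10.0},
    {"pickup": datetime(2025, 1, 1, 9), "fare": 20.0},
    {"pickup": datetime(2025, 1, 2, 8), "fare": 30.0},
    {"pickup": datetime(2025, 1, 2, 9), "fare": 40.0},
]

train, test = temporal_split(trips, cutoff=datetime(2025, 1, 2))
ordered, feats = trailing_mean_fare(trips, window=2)
```

A centered or forward-looking window would let each row see fares that occurred after it, which is the kind of leakage described above.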

Dashboard performance

Full-table loads stalled Streamlit on cold start. Moved to pre-aggregated Parquet layers for common views.

Performance
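The pre-aggregation step can be sketched as follows, assuming a hypothetical trip record with `zone`, `hour`, and `fare` fields. Writing the result to Parquet is left out here, since that depends on the storage library in use.

```python
from collections import defaultdict


def aggregate_by_zone_hour(trips):
    """Collapse raw trips into one summary row per (zone, hour)."""
    totals = defaultdict(lambda: {"trips": 0, "fare_sum": 0.0})
    for t in trips:
        bucket = totals[(t["zone"], t["hour"])]
        bucket["trips"] += 1
        bucket["fare_sum"] += t["fare"]
    return {
        key: {"trips": v["trips"], "mean_fare": v["fare_sum"] / v["trips"]}
        for key, v in totals.items()
    }


# Hypothetical raw trips (zone names and fares are illustrative only).
raw_trips = [
    {"zone": "JFK Airport", "hour": 8, "fare": 50.0},
    {"zone": "JFK Airport", "hour": 8, "fare": 70.0},
    {"zone": "Midtown", "hour": 8, "fare": 12.0},
]

summary = aggregate_by_zone_hour(raw_trips)
```

The dashboard then reads the small summary table for common views instead of scanning every trip on each page load.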

Story before charts

First version was chart-heavy with no narrative path. Reordered pages to match stakeholder questions (where → when → how much).
