Leveraging big data to build a transportation web app
20M+ rows of Divvy, CTA, and weather data powering a functioning Lambda architecture — batch, serving, and speed layers, plus a pair of ML models.
This page covers my final project for the course MPCS 53015: Big Data Application Architecture. I built the application from the ground up using data from the City of Chicago's Data Portal and NOAA, along with the following big data tools:
- Hadoop Distributed File System
- Apache Hive, Apache HBase
- Apache Spark with Scala
- Apache Kafka
- Node.js for the website backend
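As a rough illustration of how the three Lambda layers fit together at query time, here is a minimal sketch in plain Node.js-style JavaScript. It is not the project's actual code: the station IDs, trip counts, and the function names `queryTrips` and `applyEvent` are all hypothetical, and in-memory `Map`s stand in for what would be batch and speed views stored elsewhere (for example in HBase). The idea it shows is that the serving layer answers a query by combining a precomputed batch total with the recent increments held by the speed layer.

```javascript
// Hypothetical sketch: merge a batch view with a speed-layer view at query time.
// All names and numbers are illustrative assumptions, not taken from the project.

// Batch view: totals precomputed from the full historical dataset.
const batchView = new Map([
  ["station-A", { trips: 1200 }],
  ["station-B", { trips: 300 }],
]);

// Speed view: increments from events that arrived after the last batch run.
const speedView = new Map([
  ["station-A", { trips: 5 }],
  ["station-C", { trips: 2 }],
]);

// Serving-layer query: batch total plus any recent increments for the key.
// A key missing from either view contributes zero.
function queryTrips(stationId, batch, speed) {
  const batchTrips = batch.get(stationId)?.trips ?? 0;
  const speedTrips = speed.get(stationId)?.trips ?? 0;
  return batchTrips + speedTrips;
}

// Speed-layer update: fold one incoming event (one trip) into the speed view.
function applyEvent(speed, event) {
  const current = speed.get(event.stationId)?.trips ?? 0;
  speed.set(event.stationId, { trips: current + 1 });
  return speed;
}
```

In this sketch, calling `applyEvent(speedView, { stationId: "station-B" })` makes the next `queryTrips("station-B", batchView, speedView)` reflect the new trip immediately, without waiting for the batch view to be recomputed.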
Links
- The application was hosted on the class AWS cluster at this link; the cluster will soon be taken down.
- GitHub repository with an in-depth README describing the project — here.
Walkthrough
In the video below, I walk through the web application and its features as they relate to the Lambda architecture.