Research focus
I want to understand how my Citi Bike riding habits changed between September 2023 and October 2024. How many Citi Bike rides do I take each month, per borough? In which months and at what times of day do I ride the most? How has my monthly mileage changed over time, per borough? How do ride durations break down by starting borough? Are there specific Citi Bike stations that I start or end at more than others? Which boroughs do I spend the most time biking in? How often do my rides stay within the borough they started in, versus ending in a different borough? Which day of the week do I ride the most, and how many of those rides are within versus between boroughs?
The longer I have used Citi Bike to navigate the city, the more curious I have grown about my habits. This blog post is for bike enthusiasts like me and my friends, who often discuss how often, how long, and where we've been going on Citi Bike rides.
Dataset and Variables
I collected my ride data from https://account.citibikenyc.com/, using JavaScript code to scrape the information from my Citi Bike ride history page. The site does not show the mileage of each ride, so I manually added it to my dataset from the Lyft app. I then used the Citi Bike System Data to get the latitude and longitude of each starting and ending station. Finally, I used a GeoJSON file to add starting and ending borough information to the dataset.
Each row in my dataset represents one Citi Bike ride from my personal account, with the following variables: date, start time, end time, duration, starting station, ending station, mileage (from the Lyft app), starting latitude (from Citi Bike System Data), starting longitude (from Citi Bike System Data), ending latitude (from Citi Bike System Data), ending longitude (from Citi Bike System Data), starting borough (calculated from latitude and longitude), and ending borough (calculated from latitude and longitude). I created the following calculated fields and groups to further my analysis:
Field | Description |
---|---|
day of week | a number representing the day of the week on which the Citi Bike ride started, parsed from starting datetime |
day of week (string) | the name of the day, derived from day of week |
start hour | a number representing the hour in which the Citi Bike ride started, parsed from starting datetime |
start hour (group) | a grouping of the start hour, one of: Morning (5am to 12pm), Afternoon (12pm to 5pm), Evening (5pm to 10pm), Night (10pm to 5am) |
inter borough | a string identifying whether the Citi Bike ride was “between” boroughs or “within” a borough, derived from starting borough and ending borough |
duration category | a grouping of the ride duration, one of: under 15 minutes, 15 to 29 minutes, 30 to 44 minutes, 45 minutes or more |
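For anyone who wants to build similar fields for their own rides, here is a minimal Python sketch of the calculated fields in the table. It is an illustration, not the code behind this post. The record keys (`start`, `start_borough`, `end_borough`, `duration_min`) and the function names are hypothetical. I also assume that each time-of-day group includes its starting hour, that rides of exactly 45 minutes fall in the longest bucket, and that days are numbered Monday = 1 through Sunday = 7.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def start_hour_group(hour: int) -> str:
    """Bucket a 0-23 start hour into the four groups from the table."""
    if 5 <= hour < 12:
        return "Morning"
    if 12 <= hour < 17:
        return "Afternoon"
    if 17 <= hour < 22:
        return "Evening"
    return "Night"  # 10pm up to 5am, wrapping past midnight


def duration_category(minutes: float) -> str:
    """Bucket a ride duration in minutes into the four duration groups."""
    if minutes < 15:
        return "under 15 minutes"
    if minutes < 30:
        return "15 to 29 minutes"
    if minutes < 45:
        return "30 to 44 minutes"
    return "45 minutes or more"  # assumption: exactly 45 lands here


def derive_fields(ride: dict) -> dict:
    """Compute the calculated fields for one ride record (hypothetical keys)."""
    start = datetime.fromisoformat(ride["start"])
    same_borough = ride["start_borough"] == ride["end_borough"]
    return {
        "day_of_week": start.isoweekday(),  # assumption: Monday=1 ... Sunday=7
        "day_of_week_str": DAY_NAMES[start.weekday()],
        "start_hour": start.hour,
        "start_hour_group": start_hour_group(start.hour),
        "inter_borough": "within" if same_borough else "between",
        "duration_category": duration_category(ride["duration_min"]),
    }


# A made-up ride for illustration only, not a row from my dataset.
sample_ride = {
    "start": "2024-06-15T22:30:00",
    "start_borough": "Brooklyn",
    "end_borough": "Manhattan",
    "duration_min": 32,
}
print(derive_fields(sample_ride))
```

Keeping each grouping in its own small function makes it easy to change a boundary, for example if you would rather count 5am as Night.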
Visualizations
Mileage and duration
Within or between boroughs?
Concluding thoughts
Creating my self-quantified dataset of Citi Bike rides was harder than I expected, and it raised a lot of questions about my right to access my own data. Citi Bike does not offer an easy way to export this data, and the website and the app show different information. My attempts to export the data from the app were unsuccessful because of undisclosed limits that Lyft imposes on the number of records exported at once. As a result, I scraped almost all of the data from the Citi Bike ride history page and added mileage manually from the app. Even then, I had to merge in more fields, namely station coordinates and boroughs, before the dataset could support the visualizations I wanted.
Overall, it’s been interesting to take a deeper dive into my Citi Bike riding habits. It is no surprise that the majority of my rides are concentrated within Brooklyn: that’s the borough that I and most of my friends live in! It was pretty cool to see that my rides were densest on June nights. I know that I tend to bike more in June because the weather is better for it, but I hadn’t realized that so much of my biking happens at night. After seeing these visualizations, I’m setting out to go on more Citi Bike rides that extend beyond Brooklyn, Manhattan, and Queens. I hope that as I continue to go on more rides, I can come back to these visualizations with more data points and discover seasonal trends in my riding habits.