
Visualizing my Citi Bike rides

Research focus

I want to understand how my Citi Bike riding habits changed between September 2023 and October 2024. How many Citi Bike rides am I going on each month, per borough? In which months and at what times of day am I riding the most? How has my monthly mileage changed over time, per borough? How do ride durations break down by starting borough? Are there specific Citi Bike stations that I start or end at more than others? Which boroughs do I spend the most time biking in? How often do my rides stay within the borough they started in, versus ending in a different borough? What day of the week do I Citi Bike the most, and how many of those rides are within versus between boroughs?

Over the course of my time using Citi Bikes to navigate the city, I have grown more curious about my habits. The audience for this blog post is bike enthusiasts like my friends and me, since we often discuss how often, how long, and where we’ve been riding Citi Bikes.

Dataset and Variables

I collected my Citi Bike riding data from https://account.citibikenyc.com/ by using JavaScript code to scrape my Citi Bike ride history page. Since the mileage of each ride is not shown on the site, I manually added it to my dataset from the Lyft app, where it is available. Additionally, I used the Citi Bike System Data to get the latitude and longitude of each starting and ending station. Finally, I used a GeoJSON file to add starting and ending borough information to the dataset. Each row in my dataset represents a Citi Bike ride from my personal account, with the following variables: date, start time, end time, duration, starting station, ending station, mileage (from the Lyft app), starting latitude, starting longitude, ending latitude, and ending longitude (all four from the Citi Bike System Data), and starting borough and ending borough (both calculated from latitude and longitude). I created the following calculated fields and groups to further my analysis:

Field: Description
day of week: a number representing the day of the week in which the Citi Bike ride started, parsed from starting datetime
day of week (string): a string representation derived from day of week
start hour: a number representing the hour in which the Citi Bike ride started, parsed from starting datetime
start hour (group): a grouping for the start hour of a Citi Bike ride, which can be one of: Morning (5am – 12pm), Afternoon (12pm – 5pm), Evening (5pm – 10pm), Night (10pm – 5am)
inter borough: a string identifying whether the Citi Bike ride was “between” boroughs or “within” a borough, derived from starting borough and ending borough
duration category: a grouping based on the duration of a Citi Bike ride, which can be one of: under 15 minutes, 15 to 29 minutes, 30 to 44 minutes, over 45 minutes
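For readers who want to build similar fields, here is a minimal Python sketch of how the calculated fields above could be derived from a ride's start and end datetimes and its boroughs. It is an illustration, not necessarily how the fields in this post were computed. The function names are made up for this example, the day-of-week numbering (Monday = 0) is an assumption that follows Python's convention, and a ride of exactly 45 minutes is assumed to fall into the top duration bucket.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def start_hour_group(hour: int) -> str:
    """Map an hour (0-23) to the time-of-day groups from the table."""
    if 5 <= hour < 12:
        return "Morning"
    if 12 <= hour < 17:
        return "Afternoon"
    if 17 <= hour < 22:
        return "Evening"
    return "Night"  # 10pm through 4:59am


def duration_category(minutes: float) -> str:
    """Bucket a ride duration in minutes."""
    if minutes < 15:
        return "under 15 minutes"
    if minutes < 30:
        return "15 to 29 minutes"
    if minutes < 45:
        return "30 to 44 minutes"
    return "over 45 minutes"  # assumption: exactly 45 minutes lands here


def inter_borough(start_borough: str, end_borough: str) -> str:
    """Label a ride as staying within one borough or crossing between two."""
    return "within" if start_borough == end_borough else "between"


def calculated_fields(start: datetime, end: datetime,
                      start_borough: str, end_borough: str) -> dict:
    """Derive all calculated fields for a single ride."""
    minutes = (end - start).total_seconds() / 60
    day_number = start.weekday()  # assumption: Monday = 0 ... Sunday = 6
    return {
        "day of week": day_number,
        "day of week (string)": DAY_NAMES[day_number],
        "start hour": start.hour,
        "start hour (group)": start_hour_group(start.hour),
        "inter borough": inter_borough(start_borough, end_borough),
        "duration category": duration_category(minutes),
    }


# A made-up 22-minute ride on a Sunday night, crossing boroughs.
example = calculated_fields(
    datetime(2024, 6, 16, 22, 30), datetime(2024, 6, 16, 22, 52),
    "Brooklyn", "Manhattan",
)
print(example)
```

Keeping each field in its own small function makes it easy to adjust a boundary (for example, where Evening ends) without touching the rest.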

Visualizations

Over the time period covered by my Citi Bike ride dataset, the time series graph above shows that the number of rides has generally increased, peaking at 45 rides in Brooklyn during July 2024. The visualization represents each borough by a different color, though there are only data points for rides starting in three boroughs, meaning that I haven’t gone on rides in the Bronx or Staten Island!
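The tally behind a monthly time series like this is a count of rides grouped by month and starting borough. Below is a small Python sketch of that grouping. The sample rides are invented for the example and are not rows from my dataset, and the function names are hypothetical.

```python
from collections import Counter
from datetime import date


def rides_per_month_by_borough(rides):
    """Count rides per (YYYY-MM, starting borough).

    `rides` is an iterable of (ride_date, starting_borough) pairs.
    """
    counts = Counter()
    for ride_date, borough in rides:
        counts[(ride_date.strftime("%Y-%m"), borough)] += 1
    return counts


def peak(counts):
    """Return the ((month, borough), count) entry with the most rides."""
    return max(counts.items(), key=lambda item: item[1])


# Invented sample rows, only to show the shape of the input.
sample_rides = [
    (date(2024, 7, 3), "Brooklyn"),
    (date(2024, 7, 9), "Brooklyn"),
    (date(2024, 7, 9), "Manhattan"),
    (date(2024, 8, 1), "Queens"),
]

monthly = rides_per_month_by_borough(sample_rides)
for (month, borough), n in sorted(monthly.items()):
    print(month, borough, n)
print("peak:", peak(monthly))
```

Swapping the `+= 1` for `+= miles` (with a mileage value added to each row) would give monthly mileage per borough instead of ride counts.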
Using the month and start hour, the highlight table above uses a purple gradient to represent the number of rides I’ve gone on. The table only covers the first complete year of data, starting in September 2023. It’s important to note that there are no rows for January or November, since there are no rides in those months. I’ve gone on the greatest number of rides during June nights, which makes sense since the weather is ideal for biking!
The density map above reveals where the starting stations for my Citi Bike rides are concentrated. The densest spots on the map correspond to the stations closest to where I live now or used to live. In general, my starting stations are concentrated in Northern Brooklyn and Southern Manhattan.

Mileage and duration

From September 2023 to October 2024, the time series graph above shows that the monthly mileage of my rides has generally increased, with a peak of 158.8 miles in Brooklyn during September 2024. As in the previous visualizations, each borough for which there is data is represented by a different color.
With each borough again represented by a different color, the stacked bar chart above shows that a large portion of my Citi Bike rides lasted from 15 to 29 minutes. Of these rides, an overwhelming majority started in Brooklyn.

Within or between boroughs?

According to the pie chart above, over 66% of all my Citi Bike rides were within Brooklyn. The next largest slice shows that about 12% of my Citi Bike rides were between boroughs, starting in Brooklyn. Together, that means roughly 78% of all my Citi Bike rides start in Brooklyn, which makes sense since that is the borough I live in.
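The slices in a chart like this come from tallying rides by starting borough and by whether they stayed within or went between boroughs. The share of rides starting in a given borough is then the sum of that borough's "within" and "between" slices. Here is a short Python sketch of that arithmetic. The ten sample rides are invented to keep the numbers easy to follow and are not my actual data, and the function names are hypothetical.

```python
from collections import Counter


def shares_by_start_and_type(rides):
    """Percentage of all rides per (starting borough, 'within'/'between').

    `rides` is an iterable of (start_borough, end_borough) pairs.
    """
    counts = Counter()
    for start, end in rides:
        kind = "within" if start == end else "between"
        counts[(start, kind)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: 100 * n / total for key, n in counts.items()}


def share_starting_in(shares, borough):
    """Add up the within and between slices for one starting borough."""
    return sum(pct for (start, _), pct in shares.items() if start == borough)


# Ten invented rides: 6 within Brooklyn, 2 leaving Brooklyn,
# 1 within Manhattan, 1 from Manhattan to Brooklyn.
sample = (
    [("Brooklyn", "Brooklyn")] * 6
    + [("Brooklyn", "Manhattan")] * 2
    + [("Manhattan", "Manhattan")]
    + [("Manhattan", "Brooklyn")]
)

shares = shares_by_start_and_type(sample)
print(shares)
print("starting in Brooklyn:", share_starting_in(shares, "Brooklyn"))
```

In this toy sample, the 60% "within Brooklyn" slice and the 20% "between, starting in Brooklyn" slice add up to 80% of rides starting in Brooklyn.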
For Citi Bike rides that start at “Home”, which is in Brooklyn, 80% also ended in Brooklyn. The pie chart matches my expectations, since I tend to travel within Brooklyn by Citi Bike given the difficulty and longer commute time of getting around by subway.
Aggregated by day of the week, the highest count is for Sunday rides that stay within the same borough. I also typically have more time on Sundays to go on longer bike rides between boroughs, which is reflected in the bar chart above.

Concluding thoughts

Creating my self-quantified dataset for Citi Bike rides was harder than I expected. It raised a lot of questions about my right to access my own data. Citi Bike did not offer an easily accessible way to export this data, and it showed different information on the website than in the app. My attempts to export the data through the app were unsuccessful, due to undisclosed limits imposed by Lyft on the number of records that can be exported at once. As such, almost all of the data was scraped from the Citi Bike ride history page, and I had to manually add mileage data from the app. Even then, I had to merge in more fields (station coordinates and boroughs) before the dataset was useful for visualizing what I wanted.
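To give a sense of what that last merge involves, here is a rough Python sketch of assigning a borough to a station from its latitude and longitude using a GeoJSON file. It uses a standard ray-casting point-in-polygon test and is meant as an illustration under stated assumptions, not as the exact process behind this post. The property name `boro_name`, the function names, and the two toy boxes are all hypothetical. The boxes are not real borough boundaries. GeoJSON stores coordinates as [longitude, latitude].

```python
def point_in_ring(lon, lat, ring):
    """Ray-casting test: is the point inside one closed ring of [lon, lat] pairs?"""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i][0], ring[i][1]
        xj, yj = ring[j][0], ring[j][1]
        # Does this edge straddle the horizontal line through the point?
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def point_in_polygon(lon, lat, polygon):
    """A GeoJSON polygon is [outer_ring, hole1, hole2, ...]."""
    outer, holes = polygon[0], polygon[1:]
    if not point_in_ring(lon, lat, outer):
        return False
    return not any(point_in_ring(lon, lat, hole) for hole in holes)


def borough_for(lon, lat, feature_collection, name_key="boro_name"):
    """Return the name of the first feature containing the point, else None.

    `name_key` is an assumed property name; real files may use another one.
    """
    for feature in feature_collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] == "Polygon":
            polygons = [geometry["coordinates"]]
        elif geometry["type"] == "MultiPolygon":
            polygons = geometry["coordinates"]
        else:
            continue
        if any(point_in_polygon(lon, lat, p) for p in polygons):
            return feature["properties"].get(name_key)
    return None


# Two toy rectangles standing in for borough shapes (NOT real boundaries).
toy_boroughs = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"boro_name": "West Box"},
         "geometry": {"type": "Polygon", "coordinates": [[
             [-74.05, 40.60], [-74.00, 40.60], [-74.00, 40.70],
             [-74.05, 40.70], [-74.05, 40.60]]]}},
        {"type": "Feature", "properties": {"boro_name": "East Box"},
         "geometry": {"type": "MultiPolygon", "coordinates": [[[
             [-74.00, 40.60], [-73.90, 40.60], [-73.90, 40.70],
             [-74.00, 40.70], [-74.00, 40.60]]]]}},
    ],
}

print(borough_for(-74.02, 40.65, toy_boroughs))
print(borough_for(-73.95, 40.65, toy_boroughs))
print(borough_for(-73.00, 40.00, toy_boroughs))
```

Running the lookup once for the starting station and once for the ending station gives the two borough columns, which in turn make the within-versus-between comparison possible.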

Overall, it’s been interesting to take a deeper dive into my Citi Bike riding habits. It is no surprise that the majority of my rides are concentrated within Brooklyn, since that’s the borough that I and most of my friends live in! Seeing that my rides are most concentrated on June nights was pretty cool. While I knew that I tend to bike more in June because of the better biking weather, I hadn’t realized how much of my biking happens at night. After seeing these visualizations, I’m setting out to go on more Citi Bike rides that expand beyond Brooklyn, Manhattan, and Queens. I hope that as I continue to go on more rides, I can come back to these visualizations with more data points and discover seasonal trends in my riding habits.