Finally got to work on my little DC Capital Bikeshare project.
So, the question I'm trying to answer:
How was biking activity affected during the 2013 government shutdown, which occurred from October 1-16, 2013?
My approach:
- Predict biking activity during the shutdown, given only information available prior to the government shutdown.
- Compare predicted activity vs. actual biking activity across many locations.
- Check whether the difference (prediction error) depends on how close a station is to government offices.
“Prior” Model Overview:
First, we construct a model of biking activity using data prior to the government shutdown (Oct 2010-Sep 30, 2013).
Regress: Biking Activity
on: Date (weekend, month, year), Opening Status (status2), Station Location (longitude and latitude, each with a squared term)
Regression, Parameter estimates:
Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
---|---|---|---|---|
(Intercept) | -3.3689 | 0.8985 | -3.75 | 0.0002 |
weekendTRUE | -0.4908 | 0.3170 | -1.55 | 0.1215 |
monthFeb | 1.6798 | 0.7481 | 2.25 | 0.0247 |
monthMar | 16.7982 | 0.7313 | 22.97 | 0.0000 |
monthApr | 32.5930 | 0.7257 | 44.91 | 0.0000 |
monthMay | 36.6432 | 0.7081 | 51.75 | 0.0000 |
monthJun | 40.3435 | 0.7098 | 56.84 | 0.0000 |
monthJul | 37.3828 | 0.7011 | 53.32 | 0.0000 |
monthAug | 39.4870 | 0.6973 | 56.63 | 0.0000 |
monthSep | 39.1413 | 0.6952 | 56.30 | 0.0000 |
monthOct | 33.3289 | 0.7899 | 42.19 | 0.0000 |
monthNov | 21.7503 | 0.7854 | 27.69 | 0.0000 |
monthDec | 8.6777 | 0.7779 | 11.15 | 0.0000 |
year2011 | 32.7188 | 0.7418 | 44.10 | 0.0000 |
year2012 | 44.4579 | 0.7227 | 61.51 | 0.0000 |
year2013 | 46.0732 | 0.7747 | 59.47 | 0.0000 |
status2Delayed | -16.1892 | 2.7431 | -5.90 | 0.0000 |
status2Closed | -59.5150 | 3.2812 | -18.14 | 0.0000 |
lon.scaled | -57019.7238 | 357.4078 | -159.54 | 0.0000 |
lonsq.scaled | -570405.7667 | 3575.1208 | -159.55 | 0.0000 |
lat.scaled | 36112.6511 | 320.0441 | 112.84 | 0.0000 |
latsq.scaled | -361679.1697 | 3206.3753 | -112.80 | 0.0000 |
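To make the table easier to read: the weekend, month, year, and status terms are indicator (dummy) variables, so the non-location part of a prediction is the intercept plus whichever indicator coefficients apply. The omitted reference levels are January, 2010, a weekday, and presumably a normal "open" status. Here is a minimal sketch of that arithmetic in Python (my analysis was done in R, so this is only an illustration). The helper name `non_location_effect` is made up for this example, and the location terms are left out because the scaling behind `lon.scaled` and `lat.scaled` isn't described here, so the result is not a full station-level prediction.

```python
# Coefficients copied from the regression table above (location terms omitted).
COEF = {
    "(Intercept)": -3.3689,
    "weekendTRUE": -0.4908,
    "monthOct": 33.3289,
    "year2013": 46.0732,
    "status2Delayed": -16.1892,
    "status2Closed": -59.5150,
}


def non_location_effect(active_terms):
    """Sum the intercept and the coefficients of the active indicator terms.

    Hypothetical helper: it ignores longitude/latitude entirely.
    """
    return COEF["(Intercept)"] + sum(COEF[term] for term in active_terms)


# A weekday, a weekend day, and a "closed" day, all in October 2013.
weekday_oct_2013 = non_location_effect(["monthOct", "year2013"])
weekend_oct_2013 = non_location_effect(["monthOct", "year2013", "weekendTRUE"])
closed_oct_2013 = non_location_effect(["monthOct", "year2013", "status2Closed"])

print(round(weekday_oct_2013, 4))
print(round(weekend_oct_2013, 4))
print(round(closed_oct_2013, 4))
```

The gap between the first and third numbers is exactly the `status2Closed` coefficient, which is why closures stand out so strongly in the observations.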
Observations
- Biking activity is much higher in the warmer months: the April through October coefficients all sit roughly 33 to 40 above the January baseline.
- The difference between bike station visits on weekends vs. weekdays is not statistically significant (p ≈ 0.12).
- Bike activity tends to decrease dramatically during delays, and even more so during closures (probably due to the adverse weather conditions that cause them).
- Biking activity has gradually increased over the years, presumably as the bikeshare system has become more popular/prevalent. Note that data from 2010 is only for Oct-Dec. As such, average biking activity in that year is much lower than for 2011-2013.
Predicting Out-of-sample (During Shutdown) Using “Prior” Model
- I used the “prior” model (above) to make predictions for each station and day during the government shutdown.
- I calculated the prediction error (relative to the actual number of visits) for each station and day.
- For each station, I computed the average prediction error over all shutdown days.
- For each station, I computed a scaled average error = average prediction error / average actual daily visits.
- I mapped the average prediction errors (scaled and unscaled) by station location.
You can play around with the map by visualizing the averageErrorScaled (default), averageError, and averageVisits.
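The per-station aggregation above can be sketched as follows. This is a Python illustration rather than my actual R code; the function name `station_errors`, the station labels, and the numbers are all made up. It assumes the error is defined as actual minus predicted, which matches how the sign is read in the analysis (a negative error means activity fell short of the prediction).

```python
from collections import defaultdict


def station_errors(rows):
    """Aggregate station-day rows into per-station error summaries.

    rows: iterable of (station, actual_visits, predicted_visits).
    Returns {station: (average_error, average_visits, average_error_scaled)}.
    Assumption: error = actual - predicted.
    """
    # Per station: [sum of errors, sum of actual visits, number of days]
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for station, actual, predicted in rows:
        entry = totals[station]
        entry[0] += actual - predicted
        entry[1] += actual
        entry[2] += 1

    summary = {}
    for station, (error_sum, visit_sum, n_days) in totals.items():
        average_error = error_sum / n_days
        average_visits = visit_sum / n_days
        # Guard against stations with no recorded visits during the window.
        scaled = average_error / average_visits if average_visits else float("nan")
        summary[station] = (average_error, average_visits, scaled)
    return summary


# Invented numbers for two hypothetical stations over two shutdown days.
rows = [
    ("central", 80, 100), ("central", 120, 150),
    ("outer", 30, 20), ("outer", 30, 30),
]
summary = station_errors(rows)
for station, values in summary.items():
    print(station, values)
```

Scaling by average visits keeps busy downtown stations from dominating the map purely because their raw counts are larger.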
Analysis:
Interestingly, the average scaled error is negative in central DC. This means that during the government shutdown, biking activity in central DC underperformed relative to ex-ante expectations. The opposite holds for stations farther from the city center; they appear to have experienced above-trend activity. There are two possible explanations for this pattern:
MOST Likely:
- The prediction model is quadratic in longitude and latitude, so its fitted surface is a single smooth dome. The circular pattern in the prediction errors may simply be an artifact of that functional form rather than an effect of the shutdown.
OR (Less Likely):
- With the government office closures, fewer biker-commuters took to the streets. Hence the lower activity at bikestations in the heart of DC.
- In the suburbs, people were able to spend more leisure time biking during the shutdown.
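The "most likely" explanation can be made a bit more concrete using the fitted coefficients themselves. A term of the form b1·x + b2·x² with b2 < 0 peaks at x = −b1 / (2·b2) and falls off with the squared distance from that peak. Adding one such term for longitude and one for latitude gives a dome whose contour lines are ellipses around a single point, so any spatial structure the dome cannot capture could plausibly show up as a ring-like pattern in the errors. The Python sketch below is an illustration, not my R code. It assumes `lonsq.scaled` and `latsq.scaled` are simply the squares of `lon.scaled` and `lat.scaled`, which I haven't spelled out here, and all values are in those scaled units.

```python
# (linear, squared) coefficients copied from the regression table.
LON = (-57019.7238, -570405.7667)  # lon.scaled, lonsq.scaled
LAT = (36112.6511, -361679.1697)   # lat.scaled, latsq.scaled


def quad(x, coefs):
    """Value of b1*x + b2*x**2 (assumes the squared regressor is x**2)."""
    b1, b2 = coefs
    return b1 * x + b2 * x * x


def vertex(coefs):
    """Location of the extremum of b1*x + b2*x**2."""
    b1, b2 = coefs
    return -b1 / (2.0 * b2)


lon_peak, lat_peak = vertex(LON), vertex(LAT)


def drop_from_peak(d_lon, d_lat):
    """How far the fitted location effect falls at an offset from the peak."""
    peak = quad(lon_peak, LON) + quad(lat_peak, LAT)
    here = quad(lon_peak + d_lon, LON) + quad(lat_peak + d_lat, LAT)
    return peak - here


print(lon_peak, lat_peak)
# The drop is the same in opposite directions: the surface is symmetric
# around the peak, which is what produces elliptical contours.
print(drop_from_peak(0.001, 0.0), drop_from_peak(-0.001, 0.0))
```

Because the drop depends only on the squared offsets, this specification can only describe one "center" of activity, which is one reason to try alternatives.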
Next steps...
- Improve the model by experimenting with alternatives to this specification (quadratic with respect to longitude/latitude)
- Include Federal Holidays as indicators (oops, forgot)
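As a starting point for the holiday indicator, here is a small Python sketch (again an illustration, not my R code) that computes a rule-based federal holiday and checks whether it falls inside the shutdown window. The helper names `nth_weekday` and `columbus_day` are made up for this example, and only Columbus Day (second Monday of October) is covered; a real indicator would need the full federal holiday calendar.

```python
from datetime import date, timedelta


def nth_weekday(year, month, weekday, n):
    """Date of the n-th given weekday (Mon=0 ... Sun=6) in a month."""
    first = date(year, month, 1)
    # Days from the 1st of the month to the first matching weekday.
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


def columbus_day(year):
    """Columbus Day is observed on the second Monday of October."""
    return nth_weekday(year, 10, 0, 2)


# Shutdown window as stated at the top of this post.
SHUTDOWN_START, SHUTDOWN_END = date(2013, 10, 1), date(2013, 10, 16)

holiday = columbus_day(2013)
in_shutdown = SHUTDOWN_START <= holiday <= SHUTDOWN_END

print(holiday.isoformat(), in_shutdown)
```

A holiday landing inside the window matters because the "prior" model would treat it as an ordinary weekday, which could bias the predicted commuter traffic upward for that day.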
NOTES
For more details, see my GitHub.
Tools used for this analysis:
- R
- RMarkdown/Knitr
- Google Maps Engine Lite
- BatchGeoCode Tool
Data used for this analysis:
- Bikeshare data: http://www.capitalbikeshare.com/trip-history-data
- Government Shutdown data: http://www.opm.gov/policy-data-oversight/snow-dismissal-procedures/status-archives/ (scraped using XML package in R)