I ran a frequency distribution to generate the percentages for the dichotomous variables, and descriptive statistics to generate the average, minimum, and maximum values for the continuous variables. The descriptive statistics that I found interesting are:
- Participants from North America: 76% from North America, 24% not from North America.
- Male versus female participants: 85% male, 15% female.
- Age of the participants: average 44, minimum 12, maximum 73.
- Distance travelled: average 5052 km, minimum 300 km, maximum 35938 km, standard deviation 6034 km.
- Trip duration: average 88 days, minimum 7 days, maximum 503 days, standard deviation 121 days.
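As a rough illustration of the two procedures, here is a minimal Python sketch using only the standard library. The records, the field names, and the helper functions `percentages` and `describe` are all hypothetical; they are toy values chosen for the example, not the study's data or the tooling used for the analysis.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical toy records (assumption): NOT the actual survey data.
participants = [
    {"region": "North America", "sex": "male", "age": 30},
    {"region": "North America", "sex": "male", "age": 40},
    {"region": "North America", "sex": "female", "age": 50},
    {"region": "Elsewhere", "sex": "male", "age": 60},
]


def percentages(records, field):
    """Frequency distribution of a categorical field, expressed as percentages."""
    counts = Counter(record[field] for record in records)
    total = sum(counts.values())
    return {value: 100 * n / total for value, n in counts.items()}


def describe(records, field):
    """Mean, minimum, maximum, and sample standard deviation of a numeric field."""
    values = [record[field] for record in records]
    return {
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "sd": stdev(values),  # sample SD (divides by n - 1)
    }


region_pct = percentages(participants, "region")
age_stats = describe(participants, "age")
print(region_pct)
print(age_stats)
```

The dichotomous variables go through `percentages`, and the continuous ones go through `describe`, which mirrors the split between the frequency distribution and the descriptive statistics.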
I wasn’t sure whether I should report the distances and durations before or after correcting for outliers. In some ways it feels inauthentic to report them after adjusting for outliers, but the descriptive statistics reported here are for the actual dataset used for the analysis.
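One possible way around this choice is to report both versions side by side. The sketch below is only an illustration under stated assumptions: the durations are made-up toy values, the helper names are hypothetical, and the outlier rule (dropping values outside 1.5 times the interquartile range) is a common convention swapped in for the example, not necessarily the correction applied to this dataset.

```python
from statistics import mean, quantiles

# Hypothetical toy trip durations in days (assumption): NOT the actual data.
durations = [7, 10, 14, 21, 30, 45, 60, 500]


def drop_iqr_outliers(values, k=1.5):
    """Keep values within k * IQR of the quartiles (assumed rule, for illustration)."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


def summary(values):
    """Count, mean, minimum, and maximum of a list of numbers."""
    return {"n": len(values), "mean": mean(values), "min": min(values), "max": max(values)}


before = summary(durations)
after = summary(drop_iqr_outliers(durations))
print("before outlier handling:", before)
print("after outlier handling: ", after)
```

Reporting both rows makes it clear how much a few extreme trips move the average, while leaving the analysis dataset unchanged.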