#NOTES median income in chicago 2019 is $61,811 Source: City Data retrieval date: 08/JAN/2022

Formula used to convert distance travelled from LatLong: formula retrieved from: stackoverflow answered by: JMX on: Aug 13 ’15 at 0:20

which has been modified to the following for the first row:

About the company

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime. Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members. Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs. Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.

Information Sources

Characters and teams

The different types of parties involved with the decision making:

  • Cyclistic: A bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day.
  • Lily Moreno: The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.
  • Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. You joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals — as well as how you, as a junior data analyst, can help Cyclistic achieve them.
  • Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

Statement of the business task

Cyclistic is interested in identifying the number of riders there are with each customer type.

Ask Phase

The client Cyclistic is interested in understanding the difference in use by the different types of members. What are the benefits for the non-annual members to become annual members. Lily Moreno already has an opinion:

“Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers”

It is true that these members already have a knowledge about the product, but there may be different factors to consider, the cost of an annual membership, the distances traveled and the ease of use (pickup location and drop-off).

Three main questions that are intended to be answered are as follows:

  1. How do annual members and casual riders use Cyclistic bikes differently?

  2. Why would casual riders buy Cyclistic annual memberships?

  3. How can Cyclistic use digital media to influence casual riders to become members?

  4. Why would (or would not) someone buy the annual membership. this can be compared to the average income of a Chicago citizen. the annual membership is also billed upfront annually and may cause hesitance to purchase. A comparison of the following: (by cost, distance traveled, number of commuters per transportation type)

  • Cars
  • Public transportation
  • Bicycles

Prepare

Customer types

  • Single-ride passes, (aka. casual riders)

    • with a cost of $3.30/trip - One trip up to 30 minutes.
  • Full-day passes, (aka. casual riders)

    • with a cost of $15/day - Unlimited 3-hour rides for 24-hours.
  • Annual memberships, (aka. Cyclistic members)

    • with a cost of $9*/month - Unlimited 45-min rides. paid upfront annually

The client has provided the sources for reliable information. (see Heading: Information Sources). The information of the year 2021 has been downloaded and put in a folder. The csv files have been extracted from the zip files and opened.

Process

For ease of calculations and use, the “Duration” between the start time and end time has been calculated as follows:

= ended_at - started_at

Calculating the “Distance_traveled” by each trip has been done with the following formula and the result is in Km: (the formula has been verified by: latlongdata.com)

= ACOS(COS(RADIANS(90- [start_lat] )) * COS(RADIANS(90 - [end_lat] )) +SIN(RADIANS(90 - [start_lat] )) * SIN(RADIANS(90 - [end_lat] )) * COS(RADIANS( [start_lng] - [end_lng] ))) * 6371

In addition, the day of the week has been added with the following formula: (this makes Sunday = 1, Monday = 2,…)

= MATCH(TEXT( [started_at] ,“ddd”),{“Sun”,“Mon”,“Tue”,“Wed”,“Thu”,“Fri”,“Sat”},0)

Shortcomings of the data

The ride_id are unique variables and do not reflect the user, therefore it is not possible to identify the individual rider’s behavior and to clarify the user’s use of their membership.

Credibility of data

The information has been directly retrieved from their servers, therefore as it is first hand data, it is credible.

Tools used

Both Excel and R will be used for performing the analysis.

Code and analysis

Below, the different codes are displayed with their corresponding explanations for reason of use, process and the results are shown below.

Installing required libraries

Below are the packages used for the analysis

# install.packages("tidyverse")
# install.packages("rmarkdown")
# install.packages("bookdown")
# install.packages("plyr")  #Used for merging multiple CSV's into one dataframe
# install.packages("modeest")
# install.packages("calendR")
# install.packages("readxl")

Activating the Libraries

Activating the different libraries

Analyse Phase

Importing the all “*-divvy-tripdata.csv”, the information is to be stored in “tripdata”, and the column names are illustrated

ctripdata <- list.files(path = "D:/Desktop/CPortfolio/", #location of the csv's on my device   
                       pattern = "*.csv", full.names = TRUE) %>% 
  lapply(read_csv) %>%
  bind_rows

Rows that have missing information to be removed

ctripdata <- drop_na(ctripdata)

Volume removed: tripdata_original = 5,643,268 rows ctripdata (after dropping NA) = 4,588,186 rows which represents a reduction of 1,055,082 rows (18.7%)

—Converting the date and time column from character data type to dateTime, and adding a separate column for just the date This will help in the filtering and sorting

ctripdata$started_on <- as.Date(ctripdata$started_at, "%d/%m/%Y %H:%M")
ctripdata$ended_on <- as.Date(ctripdata$ended_at, "%d/%m/%Y %H:%M")
ctripdata$started_at <- parse_date_time(ctripdata$started_at, "%d/%m/%Y %H:%M")
ctripdata$ended_at <- parse_date_time(ctripdata$ended_at, "%d/%m/%Y %H:%M")

#adding extra columns for day of the week, day, month, year
ctripdata <- ctripdata %>% 
  mutate(d_hour=hour(ymd_hms(ctripdata$started_at))) #adding a column for the hour of the day
ctripdata$d_day_of_week <- format(ctripdata$started_at, "%a") #three letter weekday is used for better presentation in charts
ctripdata$d_day <- as.numeric(format(ctripdata$started_at, "%d")) #adding a column for the day of the month
ctripdata$d_month <- as.numeric(format(ctripdata$started_at, "%m")) #adding a column for the month of the year
ctripdata$d_year <- as.numeric(format(ctripdata$started_at, "%Y")) #adding  a column for the year of the trip

#adding another column for Ride_Duration to seconds
ctripdata$ride_duration_sec <- as.numeric(ctripdata$ride_duration) 

#sorting the data by the Starter At in Ascending order
ctripdata <- ctripdata %>% 
  arrange(started_at) 

#making sure that the day of the week will be displayed correctly in the plots
ctripdata$d_day_of_week <- ordered(ctripdata$d_day_of_week, levels=c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))

for backup purposed and not needing to redo the above constantly, the file is saved: (the file is also used in MS Power BI)

Minimum, Average & Maximum

Demonstrating the Min, Q1, Median, Mean, Q3 and Max in minutes & fraction.

ctripdata %>%
  group_by(d_month) %>%
  summarise(min_mins = as.numeric(min(ride_duration))/60,
            q1_mins = as.numeric(quantile(ride_duration, 0.25))/60,
            median_mins = as.numeric(median(ride_duration))/60,
            mean_mins = as.numeric(mean(ride_duration))/60,
            q3_mins = as.numeric(quantile(ride_duration, 0.75))/60,
            max_mins = as.numeric(max(ride_duration))/60)
## # A tibble: 12 x 7
##    d_month min_mins q1_mins median_mins mean_mins q3_mins max_mins
##      <dbl>    <dbl>   <dbl>       <dbl>     <dbl>   <dbl>    <dbl>
##  1       1   0         5.65        9.33      13.9    16.2    1403.
##  2       2   0         6.62       11.1       18.3    19.6    1428.
##  3       3   0         6.95       12.5       20.8    23.3    1435.
##  4       4   0         7.23       12.9       21.7    24.0    1440.
##  5       5   0         7.77       14         23.6    26.0    1438.
##  6       6   0         7.8        13.7       22.7    24.6    1438.
##  7       7   0         7.72       13.4       21.8    24.0    1426.
##  8       8   0         7.37       12.8       20.5    22.8    1434.
##  9       9   0         6.98       12.1       19.4    21.6    1439.
## 10      10   0         6.18       10.6       17.0    18.8    1437.
## 11      11   0.0167    5.28        8.87      13.5    15.2    1434.
## 12      12   0         5.15        8.6       13.1    14.7    1432.

Demonstrating the Min, Q1, Median, Mean, Q3 and Max in time format (only issue is with Mean & Max not showing a time format but a seconds format).

ctripdata %>%
  group_by(d_month) %>%
  summarise(min_mins = min(ride_duration),
            q1_mins = quantile(ride_duration,0.25),
            median_mins = median(ride_duration),
            mean_mins = mean(ride_duration),
            q3_mins = quantile(ride_duration,0.75),
            max_mins = max(ride_duration))
## # A tibble: 12 x 7
##    d_month min_mins q1_mins median_mins mean_mins      q3_mins   max_mins  
##      <dbl> <drtn>   <time>  <time>      <drtn>         <time>    <drtn>    
##  1       1 0 secs   05'39"  09'20"       835.9534 secs 16'13.00" 84194 secs
##  2       2 0 secs   06'37"  11'04"      1098.8898 secs 19'35.25" 85658 secs
##  3       3 0 secs   06'57"  12'28"      1245.2138 secs 23'19.00" 86086 secs
##  4       4 0 secs   07'14"  12'54"      1302.2292 secs 24'01.00" 86397 secs
##  5       5 0 secs   07'46"  14'00"      1418.9013 secs 25'57.00" 86274 secs
##  6       6 0 secs   07'48"  13'42"      1361.7368 secs 24'34.00" 86279 secs
##  7       7 0 secs   07'43"  13'27"      1307.1804 secs 23'59.00" 85581 secs
##  8       8 0 secs   07'22"  12'48"      1230.1429 secs 22'49.00" 86043 secs
##  9       9 0 secs   06'59"  12'08"      1163.2396 secs 21'35.00" 86362 secs
## 10      10 0 secs   06'11"  10'37"      1019.3046 secs 18'50.00" 86244 secs
## 11      11 1 secs   05'17"  08'52"       812.1569 secs 15'12.00" 86048 secs
## 12      12 0 secs   05'09"  08'36"       788.9096 secs 14'40.00" 85913 secs

Demonstrating the Min, Q1, Median, Mean, Q3 and Max in time format for members

ctripdata %>%
  group_by(d_month) %>%
  filter(member_casual == "member") %>% 
  summarise(min_mins = as.numeric(min(ride_duration))/60,
            q1_mins = as.numeric(quantile(ride_duration, 0.25))/60,
            median_mins = as.numeric(median(ride_duration))/60,
            mean_mins = as.numeric(mean(ride_duration))/60,
            q3_mins = as.numeric(quantile(ride_duration, 0.75))/60,
            max_mins = as.numeric(max(ride_duration))/60)
## # A tibble: 12 x 7
##    d_month min_mins q1_mins median_mins mean_mins q3_mins max_mins
##      <dbl>    <dbl>   <dbl>       <dbl>     <dbl>   <dbl>    <dbl>
##  1       1   0         5.35        8.75      12.0    15.0    1227.
##  2       2   0         6.2        10.2       14.7    17.2    1338.
##  3       3   0         5.87       10.0       13.7    17.3    1354.
##  4       4   0         6.08       10.4       14.2    17.9    1406.
##  5       5   0         6.17       10.6       14.3    18.1    1365.
##  6       6   0         6.27       10.6       14.1    17.9    1322.
##  7       7   0         6.12       10.4       13.8    17.4    1263.
##  8       8   0         5.97       10.1       13.5    17.1    1348.
##  9       9   0         5.73        9.73      13.1    16.5    1318.
## 10      10   0         5.27        8.83      12.0    14.9    1415.
## 11      11   0.0167    4.82        7.97      10.9    13.3    1427.
## 12      12   0         4.72        7.82      10.6    13      1231.

Demonstrating the Min, Q1, Median, Mean, Q3 and Max in time format for casual

ctripdata %>%
  group_by(d_month) %>%
  filter(member_casual == "casual") %>% 
  summarise(min_mins = as.numeric(min(ride_duration))/60,
            q1_mins = as.numeric(quantile(ride_duration, 0.25))/60,
            median_mins = as.numeric(median(ride_duration))/60,
            mean_mins = as.numeric(mean(ride_duration))/60,
            q3_mins = as.numeric(quantile(ride_duration, 0.75))/60,
            max_mins = as.numeric(max(ride_duration))/60)
## # A tibble: 12 x 7
##    d_month min_mins q1_mins median_mins mean_mins q3_mins max_mins
##      <dbl>    <dbl>   <dbl>       <dbl>     <dbl>   <dbl>    <dbl>
##  1       1   0.0167    7.62        12.6      22.8    23.2    1403.
##  2       2   0.0333    9.42        16.7      32.8    30.9    1428.
##  3       3   0.0167   10.4         19.4      33.0    36.7    1435.
##  4       4   0        10.2         18.8      32.7    36.2    1440.
##  5       5   0        10.8         19.7      33.7    36.9    1438.
##  6       6   0        10.4         18.1      31.3    33.0    1438.
##  7       7   0         9.92        17.2      28.8    31.0    1426.
##  8       8   0         9.52        16.4      27.3    29.5    1434.
##  9       9   0         9.17        15.7      26.4    28.6    1439.
## 10      10   0         8.37        14.4      24.6    26.4    1437.
## 11      11   0.0167    7.12        11.9      20.5    21.7    1434.
## 12      12   0.0167    6.95        11.6      20.6    21.1    1432.

Comparisons

Weekday mostly used by type of user.

aggregate(ctripdata$d_day_of_week ~ ctripdata$member_casual, FUN = mfv)
##   ctripdata$member_casual ctripdata$d_day_of_week
## 1                  casual                     Sat
## 2                  member                     Wed

Average ride duration overall

Average ride duration by member/casual

agg_ctripdata <- ctripdata %>% 
  select(c(member_casual,ride_duration,ride_duration_sec)) #selecting only the relevant columns

aggregate(agg_ctripdata, by = list(agg_ctripdata$member_casual),FUN = mean)
##   Group.1 member_casual ride_duration ride_duration_sec
## 1  casual            NA 1715.505 secs          1715.505
## 2  member            NA  790.481 secs           790.481
remove(agg_ctripdata) 

Weekday Comparison

Average ride duration per day for all users

agg_ctripdata <- ctripdata %>% 
  select(c(ride_duration,ride_duration_sec,d_day_of_week))

aggregate(agg_ctripdata, by = list(agg_ctripdata$d_day_of_week),FUN = mean)
##   Group.1 ride_duration ride_duration_sec d_day_of_week
## 1     Sun 1509.118 secs          1509.118            NA
## 2     Mon 1155.191 secs          1155.191            NA
## 3     Tue 1037.411 secs          1037.411            NA
## 4     Wed 1007.889 secs          1007.889            NA
## 5     Thu 1011.946 secs          1011.946            NA
## 6     Fri 1132.455 secs          1132.455            NA
## 7     Sat 1434.589 secs          1434.589            NA
remove(agg_ctripdata)

Average ride duration per day for member users

agg_ctripdata <- ctripdata %>% 
  filter(member_casual == "member") %>% 
  select(c(ride_duration,ride_duration_sec,d_day_of_week)) #selecting only the relevant columns
aggregate(agg_ctripdata, by = list(agg_ctripdata$d_day_of_week),FUN = mean)
##   Group.1 ride_duration ride_duration_sec d_day_of_week
## 1     Sun 911.1003 secs          911.1003            NA
## 2     Mon 763.0025 secs          763.0025            NA
## 3     Tue 742.8148 secs          742.8148            NA
## 4     Wed 747.0413 secs          747.0413            NA
## 5     Thu 740.5237 secs          740.5237            NA
## 6     Fri 767.3038 secs          767.3038            NA
## 7     Sat 888.1996 secs          888.1996            NA
remove(agg_ctripdata)

Average ride duration per day for casual users

agg_ctripdata <- ctripdata %>% 
  filter(member_casual == "casual") %>% 
  select(c(ride_duration,ride_duration_sec,d_day_of_week)) #selecting only the relevant columns
aggregate(agg_ctripdata, by = list(agg_ctripdata$d_day_of_week),FUN = mean)
##   Group.1 ride_duration ride_duration_sec d_day_of_week
## 1     Sun 1970.070 secs          1970.070            NA
## 2     Mon 1748.751 secs          1748.751            NA
## 3     Tue 1569.394 secs          1569.394            NA
## 4     Wed 1483.454 secs          1483.454            NA
## 5     Thu 1464.066 secs          1464.066            NA
## 6     Fri 1592.963 secs          1592.963            NA
## 7     Sat 1851.188 secs          1851.188            NA
remove(agg_ctripdata)

For an easy comparison of the ride duration per user type on each of the different weekday.

aggregate(ctripdata$ride_duration ~ ctripdata$member_casual + ctripdata$d_day_of_week, FUN = mean)
##    ctripdata$member_casual ctripdata$d_day_of_week ctripdata$ride_duration
## 1                   casual                     Sun          1970.0704 secs
## 2                   member                     Sun           911.1003 secs
## 3                   casual                     Mon          1748.7511 secs
## 4                   member                     Mon           763.0025 secs
## 5                   casual                     Tue          1569.3943 secs
## 6                   member                     Tue           742.8148 secs
## 7                   casual                     Wed          1483.4542 secs
## 8                   member                     Wed           747.0413 secs
## 9                   casual                     Thu          1464.0664 secs
## 10                  member                     Thu           740.5237 secs
## 11                  casual                     Fri          1592.9626 secs
## 12                  member                     Fri           767.3038 secs
## 13                  casual                     Sat          1851.1882 secs
## 14                  member                     Sat           888.1996 secs

Finding the correlation between temperatures and use of the Divvy Bikes

ctemps <- read_xlsx("D:/Desktop/CPortfolio/Chicago Temperatures.xlsx")

Correlation between temperatures and the number of trips:

trips <- ctripdata %>% 
count(started_on) %>% 
  select(-c(started_on))

temps <- ctemps %>% 
  select(c(temp_max_c))
  
cor(trips,temps)
##   temp_max_c
## n  0.8822117
remove(trips,temps)
aggregate(ctripdata$ride_duration ~ ctripdata$member_casual + ctripdata$d_day_of_week, FUN = mean)
##    ctripdata$member_casual ctripdata$d_day_of_week ctripdata$ride_duration
## 1                   casual                     Sun          1970.0704 secs
## 2                   member                     Sun           911.1003 secs
## 3                   casual                     Mon          1748.7511 secs
## 4                   member                     Mon           763.0025 secs
## 5                   casual                     Tue          1569.3943 secs
## 6                   member                     Tue           742.8148 secs
## 7                   casual                     Wed          1483.4542 secs
## 8                   member                     Wed           747.0413 secs
## 9                   casual                     Thu          1464.0664 secs
## 10                  member                     Thu           740.5237 secs
## 11                  casual                     Fri          1592.9626 secs
## 12                  member                     Fri           767.3038 secs
## 13                  casual                     Sat          1851.1882 secs
## 14                  member                     Sat           888.1996 secs

Share Phase

Number of rides during the whole month (total).

ggplot(data = ctripdata) +
  geom_bar(mapping = aes(x=d_day_of_week))+
  facet_wrap(~d_month)+
  xlab("Day of Week")+
  labs(title = "Number of rides/month") +
  ylab("Number of Rides (1e+01 = 100,000)")

Calendar illustration of the busiest days during the year (total)

daily_count <- ctripdata %>% 
  count(started_on) %>% 
  as.data.frame(daily_count) %>% 
  select(-c(started_on)) %>% 
  unlist(use.names = FALSE) 

scale_fn <- function(x) { x / sqrt(sum(x^2)) } # placing values within a 0-1 by ratio and to be used in the calendR

calendR(year = 2021,
        special.days = scale_fn(daily_count),
        gradient = TRUE,
        special.col = rgb(0, 0.2, .7, alpha = 1), 
        low.col = "white")

Calendar illustration of the busiest days during the year (members)

daily_count <- ctripdata %>% 
  filter(member_casual == "member") %>% 
  count(started_on) %>% 
  as.data.frame(daily_count) %>% 
  select(-c(started_on)) %>% 
  unlist(use.names = FALSE) 

scale_fn <- function(x) { x / sqrt(sum(x^2)) }
# normalized your data

calendR(year = 2021,
        special.days = scale_fn(daily_count),
        gradient = TRUE,
        special.col = rgb(0, 0.2, .7, alpha = 1), 
        low.col = "white")

remove(daily_count)

Calendar illustration of the busiest days during the year (casual users)

daily_count <- ctripdata %>% 
  filter(member_casual == "casual") %>% 
  count(started_on) %>% 
  as.data.frame(daily_count) %>% 
  select(-c(started_on)) %>% 
  unlist(use.names = FALSE) 

scale_fn <- function(x) { x / sqrt(sum(x^2)) }
# normalized your data

calendR(year = 2021,
        special.days = scale_fn(daily_count),
        gradient = TRUE,
        special.col = rgb(0, 0.2, .7, alpha = 1), 
        low.col = "white")

remove(daily_count)

Bar chart of the number of rides per day (total)

ctripdata %>% 
  count(started_on) %>% 
  ggplot(aes(x= started_on, y= n,))+
  geom_col(position = "dodge")

Bar chart of the number of rides per month (total)

ggplot(ctripdata, aes(format(started_on, "%Y-%m",label=scales::percent(member_casual)))) +
  geom_bar(stat = "count") +
  labs(x = "Month")+
  theme(axis.text.x = element_text(angle = 90))

Bar chart of the number of rides per month (separated by user type)

ggplot(ctripdata, aes(format(started_on, "%m"),fill=member_casual)) +
  geom_bar(stat = "count") +
  labs(x = "Month")+
  theme(axis.text.x = element_text(angle = 90))

Bar chart of the number of rides per week (separated by user type)

ggplot(ctripdata, aes(format(started_on, "%U"),fill=member_casual)) +
  geom_bar(stat = "count") +
  labs(x = "Week")+
  theme(axis.text.x = element_text(angle = 90))

Ride types

Bar chart of the number of rides per hour of the day (seperated by user ride type)

ctripdata %>%
    ggplot(aes(d_hour, fill=rideable_type)) +
    labs(x="Hour of the day") +
    geom_bar()+
    scale_x_discrete(limits = c(0:23))

Number of rides with the different bike types (by user type)

ctripdata %>% 
  group_by(member_casual, rideable_type) %>% 
  summarise(number_of_rides = n()) %>% 
  ggplot(aes(x = rideable_type, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.

Use by ride duration with the different bike types through the week (by rideable type)

ctripdata %>% 
  group_by(rideable_type,d_day_of_week) %>% 
  summarise(number_of_rides = n(),average_duration = mean(ride_duration/60)) %>% 
  ggplot(aes(x = d_day_of_week, y = average_duration, fill = rideable_type)) +
  geom_col(position = "dodge")
## `summarise()` has grouped output by 'rideable_type'. You can override using the
## `.groups` argument.
## Don't know how to automatically pick scale for object of type difftime.
## Defaulting to continuous.

Bar chart of the number of rides per hour of the day (seperated by user type)

ctripdata %>%
    ggplot(aes(d_hour, fill=member_casual)) +
    labs(x="Hour of the day") +
    geom_bar()+
    scale_x_discrete(limits = c(0:23))

Bar chart of the number of rides per hour of the day (Casual users)

ctripdata %>% dplyr::filter(member_casual == "casual") %>%
    ggplot(aes(d_hour)) +
    labs(x="Hour of the day") +
    geom_bar()+
    scale_x_discrete(limits = c(0:23))+
    coord_cartesian(ylim = c(0, 300000))

Bar chart of the number of rides per hour of the day (members)

both figures, above and below have been scaled equally for perspective

ctripdata %>% dplyr::filter(member_casual == "member") %>%
    ggplot(aes(d_hour)) +
    labs(x="Hour of the day") +
    geom_bar()+
    scale_x_discrete(limits = c(0:23))+
    coord_cartesian(ylim = c(0, 300000))

Most active hour of the day for each month (seperated by User type)

ctripdata %>%
    ggplot(aes(d_hour, fill=member_casual)) +
    labs(x="Hour of the day") +
    geom_bar()+
    scale_x_discrete(limits = c(0,2,4,6,8,10,12,14,16,18,20,22,24))+
    facet_wrap(~d_month)

Most active hour of the day for each day (seperated by User type)

ctripdata %>%
    ggplot(aes(d_hour, fill=member_casual)) +
    labs(x="Hour of the day") +
    geom_bar()+
    scale_x_discrete(limits = c(0,2,4,6,8,10,12,14,16,18,20,22,24))+
    facet_wrap(~d_day_of_week)

Scatterplot for each user type throughout the year

ctripdata %>% 
  count(started_on, member_casual) %>% 
  ggplot(temp, mapping = aes(x=started_on,y=n, fill=member_casual,color=member_casual))+     geom_point() + 
    geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Number of rides per user type per day of the week

ctripdata %>% 
  group_by(member_casual,d_day_of_week) %>% 
  summarise(number_of_rides = n(), average_duration = mean(ride_duration_sec/60), .groups = 'drop_last') %>% 
  arrange(member_casual, d_day_of_week)  %>% 
  ggplot(aes(x= d_day_of_week, y= number_of_rides/100000, fill=member_casual))+
    geom_col(position = "dodge")+
    ylab("Number of Rides (1 = 100,000)") + 
    xlab("Day of Week")

Number of rides per day of the week for each month. (seperated by user type)

ctripdata %>% 
  group_by(member_casual,d_day_of_week,d_month) %>% 
  summarise(number_of_rides = n(), average_duration = mean(ride_duration_sec/60), .groups = 'drop_last') %>% 
  arrange(member_casual, d_day_of_week)  %>%

  ggplot(aes(x= d_day_of_week, y= number_of_rides/100000, fill=member_casual))+
  geom_col(position = "dodge")+
  facet_wrap(~d_month)+
  ylab("Number of Rides (1 = 100,000)") +
  xlab("Day of Week")

Findings Summary

  • The average ride duration on from the retrieved data is 1203.446 seconds (20 minutes)
  • for the Casual group, that is 29 minutes (rounded)
  • for the Members group, that is 13 minutes (rounded)
  • The most number of times that the service was used by the different user type:
  • Casual users is a Saturday
  • Member users is a Wednesday
  • Notably, the different users tend to use the service at opposite days, Members tend to use it throughout the week, yet the casuals are highly active during the weekend mainly and less active throughout the workweek (Mon-Fri).
  • The difference between the users throughout the week is also clear in that the members will use it more frequently over the course of the whole week (from 300,000 to 400,000) , yet the casual users fluctuate much more significantly (from around 200,000 to 450,000).
  • The casual users tend to use the service more frequently than the members in during the school summer holidays, this may indicate that there is a large portion of casual users that are students.
  • During the winter months, neither user types are using the service. However, the members do tend to use the service more throughout the whole year including winter months. It may be that the member users tend to use the service as their main form of transportation, or at least part of their daily commute.
  • The correlation between weather and the number of rides taken is fairly strong with a value of 0.88
  • The most popular bike is the classing bike, with second the electric bike, and thirdly the docked bike.
  • However, the Docked bikes are used for a longer duration and the other two types (classing & electric) are used for much smaller trips.

Act Phase

Answering the 4 main questions provided in the Ask phase:

  1. How do annual members and casual riders use Cyclistic bikes differently?
  • the members tend to use the bikes more frequently and as a mode of transportation for their daily commute and their use decreases during the weekend.
  • There is a significant spike in use of casual users during weekends and during the holidays.
  1. Why would casual riders buy Cyclistic annual memberships?
  • A possible reduction in their expenses, however, this depends on the number of times and the duration that each user uses the service. More information is used to complete that part of the analysis (such as the a unique identification number for each user)
  1. How can Cyclistic use digital media to influence casual riders to become members?
  • Additional services can be provided to the users that add value to their experience, this could be done by providing a tour around the city, maybe as a group, or individuals. +Special events could be sponsored and the venue could be placed in the middle of the city to encourage the use of the service
  1. Why would (or would not) someone buy the annual membership.
  • The high initial cost may be off putting for the casual users, or, they believe they do not use it enough. However, this could be countered by providing them a simple analysis, that “based on your last weeks usage, we estimate that you can cut your cost by $#.## if you purchase the annual membership.”, and adding on top the first year discount, would be a very attractive offer. +The best time for these campaigns would be at the beginning of the summer holidays where the number of riders are the highest.