#NOTES median income in chicago 2019 is $61,811 Source: City Data retrieval date: 08/JAN/2022

Formula used to convert distance travelled from LatLong: formula retrieved from: stackoverflow answered by: JMX on: Aug 13 ’15 at 0:20

which has been modified to the following for the first row:

About the company

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime. Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members. Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs. Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.

Information Sources

Divvy Bikes website: divvy
Divvy Bikes pricing: pricing (date: 08/JAN/2022)
Sources: files sources
File used: 202107-divvy-tripdata.zip
License: License

Characters and teams

The different types of parties involved with the decision making:

Cyclistic: A bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day.
Lily Moreno: The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.
Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. You joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals — as well as how you, as a junior data analyst, can help Cyclistic achieve them.
Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

Statement of the business task

Cyclistic is interested in identifying the number of riders there are with each customer type.

Ask Phase

The client Cyclistic is interested in understanding the difference in use by the different types of members. What are the benefits for the non-annual members to become annual members. Lily Moreno already has an opinion:

“Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers”

It is true that these members already have a knowledge about the product, but there may be different factors to consider, the cost of an annual membership, the distances traveled and the ease of use (pickup location and drop-off).

Three main questions that are intended to be answered are as follows:

How do annual members and casual riders use Cyclistic bikes differently?
Why would casual riders buy Cyclistic annual memberships?
How can Cyclistic use digital media to influence casual riders to become members?
Why would (or would not) someone buy the annual membership. this can be compared to the average income of a Chicago citizen. the annual membership is also billed upfront annually and may cause hesitance to purchase. A comparison of the following: (by cost, distance traveled, number of commuters per transportation type)

Cars
Public transportation
Bicycles

Prepare

Customer types

Single-ride passes, (aka. casual riders)
- with a cost of $3.30/trip - One trip up to 30 minutes.
Full-day passes, (aka. casual riders)
- with a cost of $15/day - Unlimited 3-hour rides for 24-hours.
Annual memberships, (aka. Cyclistic members)
- with a cost of $9*/month - Unlimited 45-min rides. paid upfront annually

The client has provided the sources for reliable information. (see Heading: Information Sources). The information of the year 2021 has been downloaded and put in a folder. The csv files have been extracted from the zip files and opened.

Process

For ease of calculations and use, the “Duration” between the start time and end time has been calculated as follows:

= ended_at - started_at

Calculating the “Distance_traveled” by each trip has been done with the following formula and the result is in Km: (the formula has been verified by: latlongdata.com)

= ACOS(COS(RADIANS(90- [start_lat] )) * COS(RADIANS(90 - [end_lat] )) +SIN(RADIANS(90 - [start_lat] )) * SIN(RADIANS(90 - [end_lat] )) * COS(RADIANS( [start_lng] - [end_lng] ))) * 6371

In addition, the day of the week has been added with the following formula: (this makes Sunday = 1, Monday = 2,…)

= MATCH(TEXT( [started_at] ,“ddd”),{“Sun”,“Mon”,“Tue”,“Wed”,“Thu”,“Fri”,“Sat”},0)

Shortcomings of the data

The ride_id are unique variables and do not reflect the user, therefore it is not possible to identify the individual rider’s behavior and to clarify the user’s use of their membership.

Credibility of data

The information has been directly retrieved from their servers, therefore as it is first hand data, it is credible.

Tools used

Both Excel and R will be used for performing the analysis.

Code and analysis

Below, the different codes are displayed with their corresponding explanations for reason of use, process and the results are shown below.

Installing required libraries

Below are the packages used for the analysis

install.packages("tidyverse")
install.packages("rmarkdown")
install.packages("bookdown")
install.packages("plyr") # Used for merging multiple CSV's into one dataframe
install.packages("modeest")
install.packages("calendR")
install.packages('readxl')

Activating the Libraries

Activating the different libraries

Analyse Phase

Importing the all “*-divvy-tripdata.csv”, the information is to be stored in “tripdata”, and the column names are illustrated

Rows that have missing information to be removed

ctripdata <- drop_na(tripdata_original)

Volume removed: tripdata_original = 5,643,268 rows ctripdata (after dropping NA) = 4,588,186 rows which represents a reduction of 1,055,082 rows (18.7%)

ctripdata <- ctripdata %>% 
  select(-c(start_station_id,end_station_id))

—Converting the date and time column from character data type to dateTime, and adding a separate column for just the date This will help in the filtering and sorting

ctripdata$started_on <- as.Date(ctripdata$started_at, "%d/%m/%Y %H:%M")
ctripdata$ended_on <- as.Date(ctripdata$ended_at, "%d/%m/%Y %H:%M")
ctripdata$started_at <- parse_date_time(ctripdata$started_at, "%d/%m/%Y %H:%M")
ctripdata$ended_at <- parse_date_time(ctripdata$ended_at, "%d/%m/%Y %H:%M")

#adding extra columns for day of the week, day, month, year
ctripdata$d_day_of_week <- format(ctripdata$started_at, "%a") #three letter weekday is used for better presentation in charts
ctripdata$d_day <- as.numeric(format(ctripdata$started_at, "%d"))
ctripdata$d_month <- as.numeric(format(ctripdata$started_at, "%m"))
ctripdata$d_year <- as.numeric(format(ctripdata$started_at, "%Y"))

#adding another column for Ride_Duration to seconds
ctripdata$ride_duration_sec <- as.numeric(ctripdata$ride_duration)

Verifying that all the data compiled have been added to the same column names and verifying that the duration is still a time format.

ctripdata <- ctripdata %>% 
  arrange(started_at) #arranging table in ascending order by "started_at"
# print(ctripdata)
# summary(ctripdata)
# str(ctripdata)

Demonstrating the Min, Q1, Median, Mean, Q3 and Max in minutes & fraction.

ctripdata %>%
  group_by(d_month) %>%
  summarise(min_mins = as.numeric(min(ride_duration))/60,
            q1_mins = as.numeric(quantile(ride_duration, 0.25))/60,
            median_mins = as.numeric(median(ride_duration))/60,
            mean_mins = as.numeric(mean(ride_duration))/60,
            q3_mins = as.numeric(quantile(ride_duration, 0.75))/60,
            max_mins = as.numeric(max(ride_duration))/60)

## # A tibble: 12 x 7
##    d_month min_mins q1_mins median_mins mean_mins q3_mins max_mins
##      <dbl>    <dbl>   <dbl>       <dbl>     <dbl>   <dbl>    <dbl>
##  1       1   0         5.65        9.33      13.9    16.2    1403.
##  2       2   0         6.62       11.1       18.3    19.6    1428.
##  3       3   0         6.95       12.5       20.8    23.3    1435.
##  4       4   0         7.23       12.9       21.7    24.0    1440.
##  5       5   0         7.77       14         23.6    26.0    1438.
##  6       6   0         7.8        13.7       22.7    24.6    1438.
##  7       7   0         7.72       13.4       21.8    24.0    1426.
##  8       8   0         7.37       12.8       20.5    22.8    1434.
##  9       9   0         6.98       12.1       19.4    21.6    1439.
## 10      10   0         6.18       10.6       17.0    18.8    1437.
## 11      11   0.0167    5.28        8.87      13.5    15.2    1434.
## 12      12   0         5.15        8.6       13.1    14.7    1432.

Demonstrating the Min, Q1, Median, Mean, Q3 and Max in time format (only issue is with Mean & Max not showing a time format but a seconds format).

ctripdata %>%
  group_by(d_month) %>%
  summarise(min_mins = min(ride_duration),
            q1_mins = quantile(ride_duration,0.25),
            median_mins = median(ride_duration),
            mean_mins = mean(ride_duration),
            q3_mins = quantile(ride_duration,0.75),
            max_mins = max(ride_duration))

## # A tibble: 12 x 7
##    d_month min_mins q1_mins median_mins mean_mins      q3_mins   max_mins  
##      <dbl> <drtn>   <time>  <time>      <drtn>         <time>    <drtn>    
##  1       1 0 secs   05'39"  09'20"       835.9534 secs 16'13.00" 84194 secs
##  2       2 0 secs   06'37"  11'04"      1098.8898 secs 19'35.25" 85658 secs
##  3       3 0 secs   06'57"  12'28"      1245.2138 secs 23'19.00" 86086 secs
##  4       4 0 secs   07'14"  12'54"      1302.2292 secs 24'01.00" 86397 secs
##  5       5 0 secs   07'46"  14'00"      1418.9013 secs 25'57.00" 86274 secs
##  6       6 0 secs   07'48"  13'42"      1361.7368 secs 24'34.00" 86279 secs
##  7       7 0 secs   07'43"  13'27"      1307.1804 secs 23'59.00" 85581 secs
##  8       8 0 secs   07'22"  12'48"      1230.1429 secs 22'49.00" 86043 secs
##  9       9 0 secs   06'59"  12'08"      1163.2396 secs 21'35.00" 86362 secs
## 10      10 0 secs   06'11"  10'37"      1019.3046 secs 18'50.00" 86244 secs
## 11      11 1 secs   05'17"  08'52"       812.1569 secs 15'12.00" 86048 secs
## 12      12 0 secs   05'09"  08'36"       788.9096 secs 14'40.00" 85913 secs

Demonstrating the Min, Q1, Median, Mean, Q3 and Max in time format for members

ctripdata %>%
  group_by(d_month) %>%
  filter(member_casual == "member") %>% 
  summarise(min_mins = as.numeric(min(ride_duration))/60,
            q1_mins = as.numeric(quantile(ride_duration, 0.25))/60,
            median_mins = as.numeric(median(ride_duration))/60,
            mean_mins = as.numeric(mean(ride_duration))/60,
            q3_mins = as.numeric(quantile(ride_duration, 0.75))/60,
            max_mins = as.numeric(max(ride_duration))/60)

## # A tibble: 12 x 7
##    d_month min_mins q1_mins median_mins mean_mins q3_mins max_mins
##      <dbl>    <dbl>   <dbl>       <dbl>     <dbl>   <dbl>    <dbl>
##  1       1   0         5.35        8.75      12.0    15.0    1227.
##  2       2   0         6.2        10.2       14.7    17.2    1338.
##  3       3   0         5.87       10.0       13.7    17.3    1354.
##  4       4   0         6.08       10.4       14.2    17.9    1406.
##  5       5   0         6.17       10.6       14.3    18.1    1365.
##  6       6   0         6.27       10.6       14.1    17.9    1322.
##  7       7   0         6.12       10.4       13.8    17.4    1263.
##  8       8   0         5.97       10.1       13.5    17.1    1348.
##  9       9   0         5.73        9.73      13.1    16.5    1318.
## 10      10   0         5.27        8.83      12.0    14.9    1415.
## 11      11   0.0167    4.82        7.97      10.9    13.3    1427.
## 12      12   0         4.72        7.82      10.6    13      1231.

Demonstrating the Min, Q1, Median, Mean, Q3 and Max in time format for casual

ctripdata %>%
  group_by(d_month) %>%
  filter(member_casual == "casual") %>% 
  summarise(min_mins = as.numeric(min(ride_duration))/60,
            q1_mins = as.numeric(quantile(ride_duration, 0.25))/60,
            median_mins = as.numeric(median(ride_duration))/60,
            mean_mins = as.numeric(mean(ride_duration))/60,
            q3_mins = as.numeric(quantile(ride_duration, 0.75))/60,
            max_mins = as.numeric(max(ride_duration))/60)

## # A tibble: 12 x 7
##    d_month min_mins q1_mins median_mins mean_mins q3_mins max_mins
##      <dbl>    <dbl>   <dbl>       <dbl>     <dbl>   <dbl>    <dbl>
##  1       1   0.0167    7.62        12.6      22.8    23.2    1403.
##  2       2   0.0333    9.42        16.7      32.8    30.9    1428.
##  3       3   0.0167   10.4         19.4      33.0    36.7    1435.
##  4       4   0        10.2         18.8      32.7    36.2    1440.
##  5       5   0        10.8         19.7      33.7    36.9    1438.
##  6       6   0        10.4         18.1      31.3    33.0    1438.
##  7       7   0         9.92        17.2      28.8    31.0    1426.
##  8       8   0         9.52        16.4      27.3    29.5    1434.
##  9       9   0         9.17        15.7      26.4    28.6    1439.
## 10      10   0         8.37        14.4      24.6    26.4    1437.
## 11      11   0.0167    7.12        11.9      20.5    21.7    1434.
## 12      12   0.0167    6.95        11.6      20.6    21.1    1432.

Day mostly used by type of user.

aggregate(ctripdata$d_day_of_week ~ ctripdata$member_casual, FUN = mfv)

##   ctripdata$member_casual ctripdata$d_day_of_week
## 1                  casual                     Sat
## 2                  member                     Wed

Average ride duration by member/casual

agg_ctripdata <- ctripdata %>% 
  select(c(member_casual,ride_duration,ride_duration_sec)) #selecting only the relevant columns

aggregate(agg_ctripdata, by = list(agg_ctripdata$member_casual),FUN = mean)

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

##   Group.1 member_casual ride_duration ride_duration_sec
## 1  casual            NA 1715.505 secs          1715.505
## 2  member            NA  790.481 secs           790.481

remove(agg_ctripdata)

for the sorting of the day of the week in the correct order and making Sunday the first day of the week

ctripdata$d_day_of_week <- ordered(ctripdata$d_day_of_week, levels=c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))

Average ride duration per day for all users

agg_ctripdata <- ctripdata %>% 
  select(c(ride_duration,ride_duration_sec,d_day_of_week))

aggregate(agg_ctripdata, by = list(agg_ctripdata$d_day_of_week),FUN = mean)

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

##   Group.1 ride_duration ride_duration_sec d_day_of_week
## 1     Sun 1509.118 secs          1509.118            NA
## 2     Mon 1155.191 secs          1155.191            NA
## 3     Tue 1037.411 secs          1037.411            NA
## 4     Wed 1007.889 secs          1007.889            NA
## 5     Thu 1011.946 secs          1011.946            NA
## 6     Fri 1132.455 secs          1132.455            NA
## 7     Sat 1434.589 secs          1434.589            NA

Average ride duration per day for member users

agg_ctripdata <- ctripdata %>% 
  filter(member_casual == "member") %>% 
  select(c(ride_duration,ride_duration_sec,d_day_of_week)) #selecting only the relevant columns
aggregate(agg_ctripdata, by = list(agg_ctripdata$d_day_of_week),FUN = mean)

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

##   Group.1 ride_duration ride_duration_sec d_day_of_week
## 1     Sun 911.1003 secs          911.1003            NA
## 2     Mon 763.0025 secs          763.0025            NA
## 3     Tue 742.8148 secs          742.8148            NA
## 4     Wed 747.0413 secs          747.0413            NA
## 5     Thu 740.5237 secs          740.5237            NA
## 6     Fri 767.3038 secs          767.3038            NA
## 7     Sat 888.1996 secs          888.1996            NA

Average ride duration per day for casual users

agg_ctripdata <- ctripdata %>% 
  filter(member_casual == "casual") %>% 
  select(c(ride_duration,ride_duration_sec,d_day_of_week)) #selecting only the relevant columns
aggregate(agg_ctripdata, by = list(agg_ctripdata$d_day_of_week),FUN = mean)

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

##   Group.1 ride_duration ride_duration_sec d_day_of_week
## 1     Sun 1970.070 secs          1970.070            NA
## 2     Mon 1748.751 secs          1748.751            NA
## 3     Tue 1569.394 secs          1569.394            NA
## 4     Wed 1483.454 secs          1483.454            NA
## 5     Thu 1464.066 secs          1464.066            NA
## 6     Fri 1592.963 secs          1592.963            NA
## 7     Sat 1851.188 secs          1851.188            NA

For an easy comparison of the ride duration per user type on each of the different weekday.

aggregate(ctripdata$ride_duration/60 ~ ctripdata$member_casual + ctripdata$d_day_of_week, FUN = mean)

##    ctripdata$member_casual ctripdata$d_day_of_week ctripdata$ride_duration/60
## 1                   casual                     Sun              32.83451 secs
## 2                   member                     Sun              15.18500 secs
## 3                   casual                     Mon              29.14585 secs
## 4                   member                     Mon              12.71671 secs
## 5                   casual                     Tue              26.15657 secs
## 6                   member                     Tue              12.38025 secs
## 7                   casual                     Wed              24.72424 secs
## 8                   member                     Wed              12.45069 secs
## 9                   casual                     Thu              24.40111 secs
## 10                  member                     Thu              12.34206 secs
## 11                  casual                     Fri              26.54938 secs
## 12                  member                     Fri              12.78840 secs
## 13                  casual                     Sat              30.85314 secs
## 14                  member                     Sat              14.80333 secs

Share Phase

Number of rides during the whole month.

ggplot(data = ctripdata) +
  geom_bar(mapping = aes(x=d_day_of_week))+
  facet_wrap(~d_month)+
  xlab("Day of Week")+
  labs(title = "Number of rides/month") +
  ylab("Number of Rides (1e+01 = 100,000)")

Illustration of the busiest days during the year (all member types)

daily_count <- ctripdata %>% 
  count(started_on) %>% 
  as.data.frame(daily_count) %>% 
  select(-c(started_on)) %>% 
  unlist(use.names = FALSE) 

scale_fn <- function(x) { x / sqrt(sum(x^2)) } # placing values within a 0-1 by ratio and to be used in the calendR

calendR(year = 2021,
        special.days = scale_fn(daily_count),
        gradient = TRUE,
        special.col = rgb(0, 0.2, .7, alpha = 1), 
        low.col = "white")

Illustration of the busiest days during the year (members)

daily_count <- ctripdata %>% 
  filter(member_casual == "member") %>% 
  count(started_on) %>% 
  as.data.frame(daily_count) %>% 
  select(-c(started_on)) %>% 
  unlist(use.names = FALSE) 

scale_fn <- function(x) { x / sqrt(sum(x^2)) }
# normalized your data

calendR(year = 2021,
        special.days = scale_fn(daily_count),
        gradient = TRUE,
        special.col = rgb(0, 0.2, .7, alpha = 1), 
        low.col = "white")

remove(daily_count)

Illustration of the busiest days during the year (casual members)

daily_count <- ctripdata %>% 
  filter(member_casual == "casual") %>% 
  count(started_on) %>% 
  as.data.frame(daily_count) %>% 
  select(-c(started_on)) %>% 
  unlist(use.names = FALSE) 

scale_fn <- function(x) { x / sqrt(sum(x^2)) }
# normalized your data

calendR(year = 2021,
        special.days = scale_fn(daily_count),
        gradient = TRUE,
        special.col = rgb(0, 0.2, .7, alpha = 1), 
        low.col = "white")

remove(daily_count)

Bar chart of the number of rides per day

ctripdata %>% 
  count(started_on) %>% 
  ggplot(aes(x= started_on, y= n,))+
  geom_col(position = "dodge")+ theme(aspect.ratio=1/3)

ggplot(ctripdata, aes(format(started_on, "%Y-%U"),fill=member_casual)) +
  geom_bar(stat = "count") +
  labs(x = "Month")+
  theme(axis.text.x = element_text(angle = 90))

ggplot(ctripdata, aes(format(started_on, "%Y-%m"))) +
  geom_bar(stat = "count") +
  labs(x = "Month")+
  theme(axis.text.x = element_text(angle = 90))

ctripdata %>% 
  count(started_on, member_casual) %>% 
  ggplot(temp, mapping = aes(x=started_on,y=n, fill=member_casual,color=member_casual))+  
           geom_point() + 
           geom_smooth()

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ctripdata %>% 
  group_by(member_casual,d_day_of_week) %>% 
  summarise(number_of_rides = n(), average_duration = mean(ride_duration_sec/60), .groups = 'drop_last') %>% 
  arrange(member_casual, d_day_of_week)  %>% 
  ggplot(aes(x= d_day_of_week, y= number_of_rides/100000, fill=member_casual))+
    geom_col(position = "dodge")+
    ylab("Number of Rides (1 = 100,000)") + 
    xlab("Day of Week")

ctripdata %>% 
  group_by(member_casual,d_day_of_week,d_month) %>% 
  summarise(number_of_rides = n(), average_duration = mean(ride_duration_sec/60), .groups = 'drop_last') %>% 
  arrange(member_casual, d_day_of_week)  %>%

  ggplot(aes(x= d_day_of_week, y= number_of_rides/100000, fill=member_casual))+
  geom_col(position = "dodge")+
  facet_wrap(~d_month)+
  ylab("Number of Rides (1 = 100,000)") +
  xlab("Day of Week")

ctripdata %>%
  #mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, d_day_of_week) %>%
  summarise(number_of_rides = n(),average_duration = mean(ride_duration/60)) %>%
  arrange(member_casual, d_day_of_week)  %>% 
  
  ggplot(aes(x = d_day_of_week, y = number_of_rides/100000, fill = member_casual)) +
   theme(axis.text.y = element_text(angle = 90)) +
  geom_col(position = "dodge") + 
  labs(title = "Table 1: Number of Rides by Day and Rider Type") + 
  ylab("Number of Rides (1 = 100,000)") + 
  xlab("Day of Week")

## `summarise()` has grouped output by 'member_casual'. You can override using the `.groups` argument.

Divvy Case Study

Yousef Menesy

08/01/2022

About the company

Information Sources

Characters and teams

Statement of the business task

Ask Phase

Prepare

Customer types

Process

Shortcomings of the data

Credibility of data

Tools used

Code and analysis

Installing required libraries

Activating the Libraries

Analyse Phase

Act Phase

Divvy Case Study

Yousef Menesy

08/01/2022

About the company

Information Sources

Characters and teams

Statement of the business task

Ask Phase

Prepare

Customer types

Process

Shortcomings of the data

Credibility of data

Tools used

Code and analysis

Installing required libraries

Activating the Libraries

Analyse Phase

Share Phase

Findings

Act Phase