I checked whether TrailLink had a way to access the reviews on the site through an API. It didn’t, so I checked the site’s robots.txt file at http://traillink.com/robots.txt. It didn’t disallow access to the reviews for each state, so I was able to download all of the reviews for the 259 trails with reviews in Colorado.
I downloaded them using the code in this repository: https://github.com/jrosen48/railtrail
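As a rough illustration of the robots.txt check described above, here is a minimal sketch in base R. The file contents are hypothetical (they are not TrailLink’s actual rules), `is_allowed()` is a helper made up for this example, and the sketch ignores User-agent groups, Allow lines, and wildcards, so treat it as a first pass rather than a full parser.

```r
# Hypothetical robots.txt content, for illustration only
robots_txt <- c(
  "User-agent: *",
  "Disallow: /admin/",
  "Disallow: /account/"
)

# Extract the path prefixes listed on Disallow lines
disallow_lines <- grep("^Disallow:", robots_txt, value = TRUE)
disallowed <- sub("^Disallow:[[:space:]]*", "", disallow_lines)

# Hypothetical helper: a path counts as allowed if it does not start
# with any non-empty disallowed prefix
is_allowed <- function(path, disallowed) {
  prefixes <- disallowed[nzchar(disallowed)]
  !any(startsWith(path, prefixes))
}

is_allowed("/trail/example-trail/", disallowed)  # TRUE
is_allowed("/admin/settings", disallowed)        # FALSE
```

Even when a path is allowed, it is still worth spacing out requests so the download does not put much load on the site.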
library(tidyverse)
library(hrbrthemes)
library(viridis)
library(forcats)
library(stringr)
library(lme4)
library(broom)
df <- read_rds("/Users/joshuarosenberg/Dropbox/1_Research/railtrail/data/co.rds")
# One row per review; drop rows where the review is missing
df <- df %>%
  unnest(raw_reviews) %>%
  filter(!is.na(raw_reviews)) %>%
  rename(raw_review = raw_reviews,
         trail_name = name)
On the site, there are “surfaces” (e.g., asphalt and gravel) and “categories” (e.g., rail-trail and paved pathway), so I tried to group each of them into a few broader categories.
df <- df %>%
  mutate(category = as.factor(category),
         # Fold the "Canal" category into the non-rail-trail group
         category = forcats::fct_recode(category, "Greenway/Non-RT" = "Canal"),
         # Treat a mean review of 0 as missing
         mean_review = ifelse(mean_review == 0, NA, mean_review))
## Warning: Unknown levels in `f`: Canal
# Conditions are checked in order; the first one that matches wins
df <- mutate(df,
  surface_rc = case_when(
    surface == "Asphalt" ~ "Paved",
    surface == "Asphalt, Concrete" ~ "Paved",
    surface == "Concrete" ~ "Paved",
    surface == "Asphalt, Boardwalk" ~ "Paved",
    str_detect(surface, "Stone") ~ "Crushed Stone",
    str_detect(surface, "Ballast") ~ "Crushed Stone",
    str_detect(surface, "Gravel") ~ "Crushed Stone",
    TRUE ~ "Other"
  )
)
To figure out which trails had many good reviews, I did not simply average the reviews for each trail. Instead, I used a rating that takes into account the individual reviews for a trail, how much they differ from each other, and how much they differ from the “average” review across every trail.
These ratings (model_based_rating below) are from the mixed effects model specified here:
m1 <- lmer(raw_review ~ 1 + (1|trail_name), data = df)
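To give a sense of how such a rating behaves, here is a minimal sketch, written by hand in base R, of the partial pooling that a random-intercept model of this form performs. The grand mean and the two variances are made-up numbers, not estimates from m1, and `shrunken_rating()` is a hypothetical helper for illustration.

```r
# All values below are assumptions chosen for illustration
grand_mean <- 4.0  # "average" review across every trail
tau2       <- 0.5  # variance between trails
sigma2     <- 1.0  # variance between reviews within a trail

# Pull a trail's raw mean toward the grand mean; the fewer reviews a
# trail has, the smaller the weight on its own raw mean
shrunken_rating <- function(raw_mean, n, grand_mean, tau2, sigma2) {
  weight <- tau2 / (tau2 + sigma2 / n)
  grand_mean + weight * (raw_mean - grand_mean)
}

# A single 5-star review moves the rating only part of the way to 5
one_review <- shrunken_rating(5, n = 1, grand_mean, tau2, sigma2)      # about 4.33

# Twelve 5-star reviews move it much closer
twelve_reviews <- shrunken_rating(5, n = 12, grand_mean, tau2, sigma2)  # about 4.86
```

This is why a trail with one perfect review does not automatically outrank a trail with a dozen very good ones.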
The model estimates have to be merged back into the data frame with the other characteristics of the trail:
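# Fixed effect: the overall intercept, i.e., the average review across trails
m1_tidied <- tidy(m1)
m1_fe <- filter(m1_tidied, group == "fixed")
# Random effects: each trail's estimated deviation from the overall intercept
estimated_trail_means <- ranef(m1)$trail_name %>%
  rownames_to_column() %>%
  as_tibble() %>%
  rename(trail_name = rowname, estimated_mean = `(Intercept)`) %>%
  # Trail rating = overall intercept + trail-specific deviation
  mutate(model_based_rating = estimated_mean + m1_fe$estimate)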
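# Raw (unadjusted) mean review per trail, for comparison
df_ss <- df %>%
  group_by(trail_name) %>%
  summarize(raw_mean = mean(raw_review))
# With no `by` argument, left_join() matches on all shared columns
df_out <- left_join(df_ss, estimated_trail_means)
df_out <- left_join(df_out, df)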
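As a cross-check on the step that adds the fixed intercept to each trail’s random effect, lme4’s `coef()` returns that sum directly. Here is a minimal sketch using the sleepstudy data that ships with lme4 rather than the trail data; the model `m` and the variable names are for illustration only.

```r
library(lme4)

# Intercept-only model with a random intercept per subject,
# the same structure as the trail model
m <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy)

# By hand: random effect for each group plus the fixed intercept
by_hand <- ranef(m)$Subject[["(Intercept)"]] + fixef(m)[["(Intercept)"]]

# Built in: coef() combines the fixed and random effects per group
built_in <- coef(m)$Subject[["(Intercept)"]]

all.equal(by_hand, built_in)
```

Applied to the trail model, `coef(m1)$trail_name` should give the same values as model_based_rating, assuming the model is fit as shown above.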
Here are the top-20 trails of any length in Colorado:
df_out %>%
  select(trail_name, surface_rc, distance, category, model_based_rating, raw_mean, n_reviews) %>%
  distinct() %>%
  arrange(desc(model_based_rating)) %>%
  mutate_if(is.numeric, function(x) round(x, 3)) %>%
  head(20) %>%
  knitr::kable()
trail_name | surface_rc | distance | category | model_based_rating | raw_mean | n_reviews |
---|---|---|---|---|---|---|
Poudre River Trail | Paved | 21.8 | Greenway/Non-RT | 4.952 | 5.000 | 12 |
C-470 Bikeway | Paved | 36.0 | Greenway/Non-RT | 4.855 | 5.000 | 3 |
Colorado Riverfront Trail | Paved | 22.1 | Greenway/Non-RT | 4.855 | 5.000 | 4 |
Dillon Dam Recpath | Paved | 9.6 | Greenway/Non-RT | 4.855 | 5.000 | 2 |
Longmont-to-Boulder Regional Trail | Crushed Stone | 10.8 | Greenway/Non-RT | 4.855 | 5.000 | 2 |
Manitou Incline | Other | 1.0 | Rail-Trail | 4.855 | 5.000 | 2 |
Midland Trail | Paved | 1.6 | Rail-Trail | 4.855 | 5.000 | 3 |
Upper Gold Camp Road | Crushed Stone | 15.0 | Rail-Trail | 4.855 | 5.000 | 4 |
Glenwood Canyon Recreation Trail | Paved | 14.4 | Greenway/Non-RT | 4.794 | 4.818 | 12 |
Big Dry Creek Trail (Littleton) | Paved | 5.4 | Greenway/Non-RT | 4.755 | 5.000 | 1 |
Boulder Creek Path | Other | 7.0 | Greenway/Non-RT | 4.755 | 5.000 | 2 |
Cherry Creek Spillway Trail | Paved | 3.1 | Greenway/Non-RT | 4.755 | 5.000 | 1 |
Elmer’s Two Mile Creek Greenway | Paved | 0.8 | Greenway/Non-RT | 4.755 | 5.000 | 1 |
Galloping Goose Trail (CO) | Crushed Stone | 20.0 | Rail-Trail | 4.755 | 5.000 | 1 |
LaForet Trail | Other | 2.0 | Greenway/Non-RT | 4.755 | 5.000 | 1 |
Mason Trail | Paved | 4.5 | Rail-Trail | 4.755 | 5.000 | 1 |
Narrow Gauge Trail (CO) | Crushed Stone | 2.0 | Rail-Trail | 4.755 | 5.000 | 1 |
Power Trail | Paved | 3.9 | Rail-Trail | 4.755 | 5.000 | 1 |
Toll Gate Creek Trail | Paved | 6.3 | Greenway/Non-RT | 4.755 | 5.000 | 1 |
Vail Pass Recpath | Paved | 14.4 | Greenway/Non-RT | 4.755 | 5.000 | 1 |
What if we wanted to take a shorter trip, one of less than 10 miles? Here are the top-10 shorter trails in Colorado:
df_out %>%
  select(trail_name, surface_rc, distance, category, model_based_rating, raw_mean, n_reviews) %>%
  distinct() %>%
  filter(distance < 10) %>%
  arrange(desc(model_based_rating), desc(n_reviews)) %>%
  head(10) %>%
  knitr::kable()
trail_name | surface_rc | distance | category | model_based_rating | raw_mean | n_reviews |
---|---|---|---|---|---|---|
Midland Trail | Paved | 1.6 | Rail-Trail | 4.854577 | 5 | 3 |
Dillon Dam Recpath | Paved | 9.6 | Greenway/Non-RT | 4.854577 | 5 | 2 |
Manitou Incline | Other | 1.0 | Rail-Trail | 4.854577 | 5 | 2 |
Boulder Creek Path | Other | 7.0 | Greenway/Non-RT | 4.755275 | 5 | 2 |
Big Dry Creek Trail (Littleton) | Paved | 5.4 | Greenway/Non-RT | 4.755275 | 5 | 1 |
Cherry Creek Spillway Trail | Paved | 3.1 | Greenway/Non-RT | 4.755275 | 5 | 1 |
Elmer’s Two Mile Creek Greenway | Paved | 0.8 | Greenway/Non-RT | 4.755275 | 5 | 1 |
LaForet Trail | Other | 2.0 | Greenway/Non-RT | 4.755275 | 5 | 1 |
Mason Trail | Paved | 4.5 | Rail-Trail | 4.755275 | 5 | 1 |
Narrow Gauge Trail (CO) | Crushed Stone | 2.0 | Rail-Trail | 4.755275 | 5 | 1 |