Access the trail data

I checked whether TrailLink had a way to access the reviews on the site through an API. They didn’t, so I checked their robots.txt file at http://traillink.com/robots.txt. It didn’t disallow access to the review pages for each state, so I was able to download all of the reviews for the 259 trails with reviews in Colorado.

The code I used for this is here: https://github.com/jrosen48/railtrail

library(tidyverse)
library(hrbrthemes)
library(viridis)
library(forcats)
library(stringr)
library(lme4)
library(broom)

df <- read_rds("/Users/joshuarosenberg/Dropbox/1_Research/railtrail/data/co.rds")

df <- df %>% 
    unnest(raw_reviews) %>% 
    filter(!is.na(raw_reviews)) %>% 
    rename(raw_review = raw_reviews,
           trail_name = name)

What are the characteristics of the best trails?

On the site, there are “surfaces” (e.g., asphalt and gravel) and “categories” (e.g., rail-trail and paved pathway), so I tried to group each of them into a few broader categories.

df <- df %>% 
    mutate(category = as.factor(category),
           category = forcats::fct_recode(category, "Greenway/Non-RT" = "Canal"),
           mean_review = ifelse(mean_review == 0, NA, mean_review))
## Warning: Unknown levels in `f`: Canal
df <- mutate(df,
             surface_rc = case_when(
                 surface == "Asphalt" ~ "Paved",
                 surface == "Asphalt, Concrete" ~ "Paved",
                 surface == "Concrete" ~ "Paved",
                 surface == "Asphalt, Boardwalk" ~ "Paved",
                 str_detect(surface, "Stone") ~ "Crushed Stone",
                 str_detect(surface, "Ballast") ~ "Crushed Stone",
                 str_detect(surface, "Gravel") ~ "Crushed Stone",
                 TRUE ~ "Other"
             )
)

Building a model

To figure out which trails were rated best, I didn’t simply average all of the reviews for each trail. Instead, I used a rating that takes into account the individual reviews for a trail, how much they differ from each other, and how much they differ from the “average” review across every trail.
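To make that idea concrete, here is a minimal sketch of the pulling-toward-the-average behavior, written by hand in base R rather than with lme4. It assumes the standard result for a model with only a per-trail random intercept: a trail’s rating is the overall average plus a weighted share of how far the trail’s own average sits from it, where the weight grows with the number of reviews. The function name `shrunken_rating` and every number below are made up for illustration; they are not estimates from the Colorado data.

```r
# Partial pooling for a random-intercept model, written out by hand.
# between_var: how much trails differ from each other (hypothetical value)
# within_var:  how much reviews of the same trail differ (hypothetical value)
shrunken_rating <- function(trail_mean, n, overall_mean, between_var, within_var) {
    # Weight on the trail's own average; it approaches 1 as n grows.
    weight <- n * between_var / (n * between_var + within_var)
    overall_mean + weight * (trail_mean - overall_mean)
}

# Made-up inputs, not values from the fitted model.
overall_mean <- 4.0
between_var <- 0.25
within_var <- 1.0

# Two trails that both average 5 stars, with different numbers of reviews.
one_review <- shrunken_rating(5, n = 1, overall_mean, between_var, within_var)
twelve_reviews <- shrunken_rating(5, n = 12, overall_mean, between_var, within_var)

print(c(one_review = one_review, twelve_reviews = twelve_reviews))
```

With these made-up values, a single five-star review gives a rating of 4.2, while twelve five-star reviews give 4.75: the trail with more reviews is trusted more and stays closer to its own average.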

These ratings (model_based_rating below) come from the mixed-effects model specified here:

m1 <- lmer(raw_review ~ 1 + (1|trail_name), data = df)
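In equation form, and assuming the usual random-intercept setup that this formula describes, review $i$ of trail $j$ is modeled as follows. The symbols are my own notation rather than anything printed by lme4.

$$
y_{ij} = \mu + u_j + \varepsilon_{ij}, \qquad u_j \sim \mathcal{N}(0, \tau^2), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
$$

Here $\mu$ is the overall average review (the fixed intercept), $u_j$ is how far trail $j$ sits from that average (the random intercept for trail_name), and $\varepsilon_{ij}$ is the leftover variation between reviews of the same trail. The model-based rating for trail $j$ is then $\hat{\mu} + \hat{u}_j$.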

The model-based ratings have to be merged back into the data frame with the other characteristics of each trail:

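# Overall average review: the model's fixed intercept.
# (fixef() is used instead of broom::tidy(), whose lmer methods moved to broom.mixed.)
overall_mean <- fixef(m1)[["(Intercept)"]]

# ranef() gives each trail's estimated deviation from the overall average;
# adding the overall average back gives the model-based rating.
estimated_trail_means <- ranef(m1)$trail_name %>% 
    rownames_to_column("trail_name") %>% 
    as_tibble() %>% 
    rename(estimated_mean = `(Intercept)`) %>% 
    mutate(model_based_rating = estimated_mean + overall_mean)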

df_ss <- df %>% 
    group_by(trail_name) %>% 
    summarize(raw_mean = mean(raw_review))

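# Join on the trail name explicitly rather than relying on the default column matching
df_out <- left_join(df_ss, estimated_trail_means, by = "trail_name")
df_out <- left_join(df_out, df, by = "trail_name")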

So, where are we riding next?

Here are the top 20 trails of any length in Colorado:

df_out %>% 
    select(trail_name, surface_rc, distance, category, model_based_rating, raw_mean, n_reviews) %>% 
    distinct() %>% 
    arrange(desc(model_based_rating)) %>% 
    mutate_if(is.numeric, function(x) round(x, 3)) %>% 
    head(20) %>% 
    knitr::kable()

| trail_name | surface_rc | distance | category | model_based_rating | raw_mean | n_reviews |
|:---|:---|---:|:---|---:|---:|---:|
| Poudre River Trail | Paved | 21.8 | Greenway/Non-RT | 4.952 | 5.000 | 12 |
| C-470 Bikeway | Paved | 36.0 | Greenway/Non-RT | 4.855 | 5.000 | 3 |
| Colorado Riverfront Trail | Paved | 22.1 | Greenway/Non-RT | 4.855 | 5.000 | 4 |
| Dillon Dam Recpath | Paved | 9.6 | Greenway/Non-RT | 4.855 | 5.000 | 2 |
| Longmont-to-Boulder Regional Trail | Crushed Stone | 10.8 | Greenway/Non-RT | 4.855 | 5.000 | 2 |
| Manitou Incline | Other | 1.0 | Rail-Trail | 4.855 | 5.000 | 2 |
| Midland Trail | Paved | 1.6 | Rail-Trail | 4.855 | 5.000 | 3 |
| Upper Gold Camp Road | Crushed Stone | 15.0 | Rail-Trail | 4.855 | 5.000 | 4 |
| Glenwood Canyon Recreation Trail | Paved | 14.4 | Greenway/Non-RT | 4.794 | 4.818 | 12 |
| Big Dry Creek Trail (Littleton) | Paved | 5.4 | Greenway/Non-RT | 4.755 | 5.000 | 1 |
| Boulder Creek Path | Other | 7.0 | Greenway/Non-RT | 4.755 | 5.000 | 2 |
| Cherry Creek Spillway Trail | Paved | 3.1 | Greenway/Non-RT | 4.755 | 5.000 | 1 |
| Elmer’s Two Mile Creek Greenway | Paved | 0.8 | Greenway/Non-RT | 4.755 | 5.000 | 1 |
| Galloping Goose Trail (CO) | Crushed Stone | 20.0 | Rail-Trail | 4.755 | 5.000 | 1 |
| LaForet Trail | Other | 2.0 | Greenway/Non-RT | 4.755 | 5.000 | 1 |
| Mason Trail | Paved | 4.5 | Rail-Trail | 4.755 | 5.000 | 1 |
| Narrow Gauge Trail (CO) | Crushed Stone | 2.0 | Rail-Trail | 4.755 | 5.000 | 1 |
| Power Trail | Paved | 3.9 | Rail-Trail | 4.755 | 5.000 | 1 |
| Toll Gate Creek Trail | Paved | 6.3 | Greenway/Non-RT | 4.755 | 5.000 | 1 |
| Vail Pass Recpath | Paved | 14.4 | Greenway/Non-RT | 4.755 | 5.000 | 1 |

What if we wanted to take a shorter trip, one of less than 10 miles? Here are the top 10 shorter trails in Colorado:

df_out %>% 
    select(trail_name, surface_rc, distance, category, model_based_rating, raw_mean, n_reviews) %>% 
    distinct() %>% 
    filter(distance < 10) %>% 
    arrange(desc(model_based_rating), desc(n_reviews)) %>% 
    head(10) %>% 
    knitr::kable()

| trail_name | surface_rc | distance | category | model_based_rating | raw_mean | n_reviews |
|:---|:---|---:|:---|---:|---:|---:|
| Midland Trail | Paved | 1.6 | Rail-Trail | 4.854577 | 5 | 3 |
| Dillon Dam Recpath | Paved | 9.6 | Greenway/Non-RT | 4.854577 | 5 | 2 |
| Manitou Incline | Other | 1.0 | Rail-Trail | 4.854577 | 5 | 2 |
| Boulder Creek Path | Other | 7.0 | Greenway/Non-RT | 4.755275 | 5 | 2 |
| Big Dry Creek Trail (Littleton) | Paved | 5.4 | Greenway/Non-RT | 4.755275 | 5 | 1 |
| Cherry Creek Spillway Trail | Paved | 3.1 | Greenway/Non-RT | 4.755275 | 5 | 1 |
| Elmer’s Two Mile Creek Greenway | Paved | 0.8 | Greenway/Non-RT | 4.755275 | 5 | 1 |
| LaForet Trail | Other | 2.0 | Greenway/Non-RT | 4.755275 | 5 | 1 |
| Mason Trail | Paved | 4.5 | Rail-Trail | 4.755275 | 5 | 1 |
| Narrow Gauge Trail (CO) | Crushed Stone | 2.0 | Rail-Trail | 4.755275 | 5 | 1 |