The goal of this document is to explore the database of allometric equations for temperate forests, in preparation for a meeting with Erika Gonzalez (GonzalezEB@si.edu). Here I focus mainly on data-wrangling issues.
library(tidyverse)
library(allodb)
glimpse(allodb::allo_temperate)
Observations: 415
Variables: 25
$ Site <chr> "Lilly Dic...
$ Family <chr> "Anacardia...
$ Species <chr> "Rhus typh...
$ `Species code1` <chr> "899", "36...
$ `Growth form` <chr> "Shrub", "...
$ `Wood specific gravity` <chr> "0.5", "0....
$ a <chr> "-2.48", "...
$ b <dbl> 2.48, 2.48...
$ c <dbl> NA, NA, NA...
$ d <dbl> NA, NA, NA...
$ MinDBH <chr> "2.5", "2....
$ MaxDHB <chr> "56", "56"...
$ `DBH Units` <chr> "cm", "cm"...
$ `Biomass Units` <chr> "kg", "kg"...
$ `AGB equation` <chr> "biomass=e...
$ `Corrected for bias` <chr> "yes", "ye...
$ `Bias correction (CF)` <chr> "0.36", "0...
$ `Biomass componet` <chr> "Total abo...
$ Taxa <chr> "Mixed har...
$ `Development species` <chr> NA, NA, NA...
$ Region2 <chr> "North Ame...
$ `Biomass equation source` <chr> "Jenkins e...
$ `wsg source` <chr> "Jenkins e...
$ `Notes on diameter, others` <chr> NA, NA, NA...
$ `Reference SERC (original metionned in Jenkins et al.2004)` <chr> NA, NA, NA...
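The output above shows several numeric-looking columns (`a`, `MinDBH`, `MaxDHB`, `Wood specific gravity`, `Bias correction (CF)`) stored as `<chr>`. One possible reason is that some cells hold non-numeric text, though I have not confirmed this. Below is a minimal sketch, on a made-up toy table rather than the real data, of how to list the values that fail to parse as numbers. It uses base R only; the column names and values are assumptions for illustration.

```r
# Toy stand-in for a few columns of the table (values are made up).
toy <- data.frame(
  a = c("-2.48", "0.12", "see notes"),
  mindbh = c("2.5", "NA", "10"),
  stringsAsFactors = FALSE
)

# For each column, keep the values that are not missing
# yet cannot be converted to a number.
unparseable <- lapply(toy, function(x) {
  parsed <- suppressWarnings(as.numeric(x))
  unique(x[is.na(parsed) & !is.na(x) & x != "NA"])
})

unparseable
```

Applying the same idea to the real table would show whether the character columns can be converted safely with `as.numeric()` or need cleaning first.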
I’ll tweak the variable names to make them easier to remember and the data set easier to handle.
tmp <- allodb::allo_temperate %>%
  set_names(tolower) %>%
  # replace white space, commas, and periods with underscores
  set_names(~str_replace_all(., " |,|\\.", "_")) %>%
  # collapse double underscores left by the previous step
  set_names(~str_replace_all(., "__", "_")) %>%
  # remove brackets to make names valid (avoids backticks)
  set_names(~str_replace_all(., "\\(|\\)", ""))
head(names(tmp))
[1] "site" "family" "species"
[4] "species_code1" "growth_form" "wood_specific_gravity"
Right now, the code I drafted works with a data set formatted this way:
bmss::toy_default_eqn
It would be great if we could express the allometric equations as a function of dbh only, which means that every other parameter would need to be replaced by an appropriate constant. (If this is not possible, I need to see exactly why, so I can adjust the code to take a different input.)
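To make the idea concrete, here is a minimal sketch of turning an equation string plus a set of constants into a function of dbh alone. The helper `eqn_as_function()` is hypothetical (it is not part of allodb or bmss), the equation form is assumed, and the constants are borrowed from the first row of the `glimpse()` output purely for illustration.

```r
# Hypothetical helper: combine an equation string with named constants
# and return a function whose only argument is dbh.
eqn_as_function <- function(eqn, constants) {
  expr <- parse(text = eqn)[[1]]
  # Constants live in their own environment; base functions (exp, log, ...)
  # are found through the parent environment.
  env <- list2env(as.list(constants), parent = baseenv())
  function(dbh) eval(expr, list(dbh = dbh), env)
}

# Assumed equation form; a and b taken from the first row shown by glimpse().
agb <- eqn_as_function("exp(a + b * log(dbh))", c(a = -2.48, b = 2.48))

# Biomass for three diameters (the function is vectorized over dbh).
agb(c(5, 10, 20))
```

An equation that still contains a symbol other than `dbh` after substitution fails with an "object not found" error, which is a quick way to spot equations that cannot yet be expressed as a function of dbh only.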
This is a (likely incomplete) list of parameters that appear in the allometric equations. I would like to replace these parameters with appropriate constants.
eq <- unique(tmp$agb_equation)
to_split <- "\\+|\\(|\\)|\\=|\\*|\\[|\\]|-|\\^|x|\\/| |ln|log10|,"
to_discard <- "^biomass$|^DBH$|^dbh$|^[0-9]+[\\.]*[0-9]*$|^pi$|^$"
eq %>%
  str_split(to_split) %>%
  map(str_trim) %>%
  map(unique) %>%
  map(~discard(., ~is.na(.))) %>%
  map(~discard(., ~grepl(to_discard, .))) %>%
  reduce(c) %>%
  unique()
[1] "e" "p" "a"
[4] "b" "dia" "c"
[7] "d" "aDBH" "aD1"
[10] "BA_at_5cm_above_ground" "cm" "b×"
[13] "WD" "Bk" "dba"
[16] "BAE" "DBA" "BAT"
[19] "BFT" "BBL" "BST"
[22] "BSW" "BSB" "diameter"
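Some entries in this list are probably artifacts of the split pattern rather than real parameters: splitting on `x` cuts `exp` into `e` and `p`, and `cm` looks like a unit. The sketch below shows an alternative that extracts whole identifiers instead of splitting on operators. The equations are toy examples written in an assumed style, not rows from the table, and the list of names to drop is my own guess.

```r
# Toy equations in the assumed style of the `agb_equation` column.
toy_eqn <- c(
  "biomass = exp(a + b * ln(DBH))",
  "biomass = a x dba^b",
  "biomass = a + b * log10(WD * dia^2)"
)

# Why "e" and "p" show up: splitting on "x" breaks "exp" apart.
strsplit("exp", "x")[[1]]

# Alternative: extract whole identifiers
# (a letter followed by letters, digits, or underscores).
tokens <- unique(unlist(
  regmatches(toy_eqn, gregexpr("[A-Za-z][A-Za-z0-9_]*", toy_eqn))
))

# Drop names that are not parameters: the response, dbh, function names,
# and "x" used as a multiplication sign.
not_params <- c("biomass", "DBH", "dbh", "exp", "ln", "log", "log10", "pi", "x")
params <- setdiff(tokens, not_params)
params
```

This approach trades one problem for another: a parameter that is literally named `x` would be dropped, so the exclusion list needs checking against the real equations.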
Tidy datasets are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table. This framework makes it easy to tidy messy datasets because only a small set of tools are needed to deal with a wide range of un-tidy datasets.
– Tidy Data, by Hadley Wickham (https://www.jstatsoft.org/article/view/v059i10)
As a side note, one idea that usually helps me decide how to structure my data is that of “tidy data”. The idea is put into practice in this free book: http://r4ds.had.co.nz/, which explains the most common tools for wrangling data (and for data science in general).