Geographic Datascience with R

Author

Peter Baumgartner

Published

2025-01-03 14:10

Preface

This is a work in progress: about 6% finished.

The book has twelve chapters with 267 pages (appendix and bibliography not included). I skipped Chapters 1 to 4 (83 pages) because I am already familiar with their topics. The percentage is therefore calculated from the notes referring to Chapters 5 to 12 (184 pages).

WATCH OUT: This is my personal learning material and is therefore neither an accurate replication nor an authoritative textbook.

I am writing this book as a text for others to read because that forces me to become explicit and explain all my learning outcomes more carefully. Please keep in mind that this text is not written by an expert but by a learner.

I have skipped text passages whose content I am already familiar with. In this case I have started with Chapter 5, because Chapters 1 to 4 are an introduction to using R, which I am already comfortable with.

Where I needed more in-depth knowledge, I have elaborated on sections of the original text and added my own comments resulting from my personal research.

All mistakes are my own responsibility

In spite of replicating most of the content this Quarto book may contain many mistakes. All the misapprehensions and errors are of course my own responsibility.

Content and Goals of this Book

This Quarto book collects my personal notes, trials and exercises for Geographic Data Science with R: Visualizing and Analyzing Environmental Change by M. C. Wimberly (Wimberly 2023).

XXX: DESCRIPTION OF THE BOOK AND WHY IT IS WORTH FOR A QUARTO BOOK

Text passages

Quotes and personal comments

My text consists mostly of quotes from Geographic Data Science with R.

Code Collection 1 : Quote

XXX: EXAMPLE QUOTE

Often I made minor edits (e.g., shortening the text) or put the content in my own words. In these cases I could not quote the text, as it no longer represents a specific passage, so I end the paraphrase with (AUTHOR NAME ibid.).

In any case, most of the text in this Quarto book is not mine but comes from other sources.

Code Collection 2 : Personal note

Assessment 1 : This is a personal note

In this kind of box I will write my personal thoughts and reflections. Usually this box will appear stand-alone (without the wrapping example box).

Glossary

I am using the {glossary} package to create links to glossary entries.

R Code 1 : Load glossary

Listing / Output 1: Install and load the glossary package with the appropriate glossary.yml file
## 1. Install the glossary package:
## https://debruine.github.io/glossary/

library(glossary)

## If you want to use my glossary.yml file:

## 1. fork my repo
##    https://github.com/petzi53/glossary-pb

## 2. Download the `glossary.yml` file from
##    https://github.com/petzi53/glossary-pb/blob/master/glossary.yml

## 3. Store the file on your hard disk
##    and change the following path accordingly

glossary::glossary_path("../glossary-pb/glossary.yml")

If you hover with your mouse over the double-underlined links, a window opens with the appropriate glossary text. Try this example: Z-Score.

WATCH OUT! Glossary is my private learning vehicle

I added many of the glossary entries while working through other books, either taking the text passage from the book I was reading or researching other sources on the internet. I have added the source of each glossary entry. Sometimes I have used abbreviations, but I still need to provide a key to what these short references mean.

If you fork the [XXX: REPO OF THIS BOOK](URL OF THIS BOOK), the glossary will not work out of the box. Download the glossary.yml file from my glossary-pb GitHub repo, store it on your hard disk, and change the path in the code chunk Listing / Output 1.

In any case, I alone am responsible for this text, especially where I have used code from the resources wrongly or misunderstood a quoted text passage.

R Code and Datasets

Download datasets

The data files used in each chapter have to be downloaded from https://doi.org/10.6084/m9.figshare.21301212.

Run the following code chunk only once (manually).

R Code 2 : Download Data Files

base::source(file = "R/helper.R")

## create data folder (only once, e.g., only in this chapter)
data_dir <- here::here("data")
pb_create_folder(data_dir)

## download the zipped data files (binary mode keeps the archive intact)
url <- "https://figshare.com/ndownloader/files/39733921"
zip_file <- base::file.path(data_dir, "gdswr_data.zip")
utils::download.file(url, zip_file, mode = "wb")

## unpack the archive into the data folder
utils::unzip(zip_file, exdir = data_dir)

## delete .zip file
if (base::file.exists(zip_file)) {
  base::file.remove(zip_file)
}
(No output is available for this R code chunk)
Watch out!

After I tried to commit my changes, I learned that there is a file mesodata_large.csv in “data/Chapter3” with 151.28 MB. This exceeds GitHub’s file size limit of 100.00 MB, so the file can’t be stored on GitHub the standard way.

There is the possibility to use Git Large File Storage - https://git-lfs.github.com. However, I decided against this complexity. Actually, I do not need the files of Chapter1 to Chapter4 as I skipped the starting chapters. Therefore, I could delete this file. However, I have chosen to change .gitignore accordingly so that this large file will not be transferred to GitHub.
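To spot such files before committing, a check along the following lines could help. This is only a minimal sketch: pb_find_large_files() is a hypothetical helper (it is not part of R/helper.R), and the limit of 100 MB is taken from the message quoted above. The demonstration uses a temporary folder and a tiny limit so that it runs without the real data.

```r
## Hypothetical helper: list all files below `path` that are larger
## than `limit_mb` (sizes are computed as bytes / 1024^2).
pb_find_large_files <- function(path, limit_mb = 100) {
  files <- base::list.files(path, recursive = TRUE, full.names = TRUE)
  size_mb <- base::file.size(files) / 1024^2
  too_large <- !base::is.na(size_mb) & size_mb > limit_mb
  return(base::data.frame(
    file = files[too_large],
    size_mb = size_mb[too_large]
  ))
}

## Demonstration with a temporary folder and a very small limit
demo_dir <- base::file.path(base::tempdir(), "pb_large_file_demo")
base::dir.create(demo_dir, showWarnings = FALSE)
base::writeLines(base::rep("x", 5000), base::file.path(demo_dir, "big.csv"))
base::writeLines("x", base::file.path(demo_dir, "small.csv"))

large_files <- pb_find_large_files(demo_dir, limit_mb = 0.005)
large_files
```

For the real project the call would be something like pb_find_large_files(here::here("data")). The files it reports can then be added to .gitignore by hand.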

Style guides

Generally I will use the Tidyverse Style Guide for code chunks. I am going to use underscore (_) or snake case to replace spaces, as studies have shown that it is easier to read (Sharif and Maletic 2010).

Additionally I will use some of the modifications that the Google R style guide makes to the tidyverse style guide:

  • Start the names of private functions with pb_ (not with a dot, as recommended in the Google style guide).
  • Don’t use base::attach().
  • No right-hand assignments.
  • Use explicit returns.
  • Qualify namespace.
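A small function may illustrate how these rules look together. It is only a sketch: pb_range_width() and the elevation values are made up for this illustration and do not appear in the book.

```r
## Hypothetical private function: the `pb_` prefix marks it as my own
pb_range_width <- function(x) {
  x_range <- base::range(x)           # qualified namespace
  width <- x_range[2] - x_range[1]    # left-hand assignment only
  return(width)                       # explicit return
}

elevation_m <- base::c(312, 458, 1205, 790)
pb_range_width(elevation_m)
```

The data are passed as an argument instead of being made available with base::attach().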

Qualifying namespace

Especially the last point (qualifying namespace) is important for my learning. Besides preventing conflicts between functions with identical names from different packages, it helps me to learn (or remember) which function belongs to which package. I think this justifies the small overhead and helps to make R code chunks self-sufficient. (No previous package loading or library calls in the setup chunk.) To foster learning the relation between function and package, I enclose the package name in curly braces and format it in bold.
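As a minimal sketch of what this looks like in practice (the height values are invented for the illustration), every call names its package, so the chunk runs without any library() call:

```r
heights_cm <- base::c(162, 175, 180, 168)

## Each function is prefixed with the package it comes from
mean_height <- base::mean(heights_cm)
sd_height <- stats::sd(heights_cm)
first_two <- utils::head(heights_cm, n = 2)

## The namespace a function lives in can also be looked up
sd_home <- base::environmentName(base::environment(stats::sd))
sd_home
```

A well-known example of a name conflict is filter(), which exists in {stats} and in {dplyr}. Writing stats::filter() or dplyr::filter() makes it unambiguous which one is meant.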

I also use the package name for functions from the default installation of base R. This wouldn’t be necessary, but it helps me to understand where the base R functions come from. What follows is a list of the base R packages of the system library that are included in every installation and attached (opened) by default:

  • {base}: The R Base Package
  • {datasets}: The R Datasets Package
  • {graphics}: The R Graphics Package
  • {grDevices}: The R Graphics Devices and Support for Colours and Fonts
  • {methods}: Formal Methods and Classes
  • {stats}: The R Stats Package
  • {utils}: The R Utils Package
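Whether these packages are attached in the current session can be checked against the search path. The following is a small sketch, assuming a standard R session whose default packages have not been changed:

```r
default_pkgs <- base::c("base", "datasets", "graphics", "grDevices",
                        "methods", "stats", "utils")

## base::search() lists attached packages as "package:<name>"
is_attached <- base::paste0("package:", default_pkgs) %in% base::search()
base::names(is_attached) <- default_pkgs
is_attached
```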

Code linking

Code linking does not work together with code annotation. I therefore use standard comments to number lines and explain them in ordinary numbered lists after the code chunk. This is not optimal, but for learning purposes it is important to have links to the original documentation of each package’s functions.

Code snippets

I do not always use the exact code snippets for my replications, because I replicate the code not only to see how it works but also to change parameter values and observe their influence.

Where the meaning is clear, I will follow the advice from Hadley Wickham:

When you call a function, you typically omit the names of data arguments, because they are used so commonly. If you override the default value of an argument, use the full name (tidyverse style guide).
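A short sketch of this advice, with invented rainfall values: the data argument is passed without its name, while the overridden defaults na.rm and digits are written out in full.

```r
rainfall_mm <- base::c(10, NA, 30, 20)

## data argument unnamed, overridden default named in full
mean_rainfall <- base::mean(rainfall_mm, na.rm = TRUE)

## same pattern: the value first, then the named option
rounded_mean <- base::round(mean_rainfall, digits = 1)
rounded_mean
```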

XXX: EXPLAIN AND REWRITE DATA USAGE

Resources

Resource 1 : Resources used for Geographic Datascience with R (GDSWR)

  • Wimberly, M. C. (2023). Geographic Data Science With R: Visualizing and Analyzing Environmental Change (1st ed.). Chapman & Hall/CRC.
  • Wimberly, M. C. (2023). Geographic Data Science with R: Visualizing and Analyzing Environmental Change. https://bookdown.org/mcwimberly/gdswr-book/
  • Wimberly, M. C. (2022). Geographic Data Science with R [Dataset]. figshare. https://doi.org/10.6084/m9.figshare.21301212.v3
  • Dyba, K. (2024, June 26). How to load and save vector data in R. R-Spatial. https://r-spatial.org/r/2024/06/26/sf-load-save.html (additionally used for chapter 5)

Packages introduced in the preface

Resource 2 glossary: Glossaries for Markdown and Quarto Documents


Package Profile 1: {glossary}: A Package to Create Glossaries for Markdown and Quarto Documents

{glossary}: Glossaries for Markdown and Quarto Documents

Add glossaries to markdown and quarto documents by tagging individual words. Definitions can be provided inline or in a separate file.

There is a lot of necessary jargon to learn for coding. The goal of glossary is to provide a lightweight solution for making glossaries in educational materials written in quarto or R Markdown. This package provides functions to link terms in text to their definitions in an external glossary file, as well as create a glossary table of all linked terms at the end of a section.


Private Functions

XXX: DESCRIBE PRIVATE FUNCTION USED IN THIS BOOK

Glossary

term definition
Z-score A z-score (also called a standard score) gives you an idea of how far from the mean a data point is. But more technically it’s a measure of how many standard deviations below or above the population mean a raw score is. (StatisticsHowTo, https://www.statisticshowto.com/probability-and-statistics/z-score/#Whatisazscore)
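
To make the definition concrete, here is a minimal sketch with invented scores. It assumes that the five values are the whole population, so the standard deviation divides by n; stats::sd() would divide by n - 1 instead and give slightly smaller z-scores in absolute value.

```r
scores <- base::c(4, 8, 6, 5, 7)

## population standard deviation (assumption: the data are the population)
pop_mean <- base::mean(scores)
pop_sd <- base::sqrt(base::mean((scores - pop_mean)^2))

## z-score: distance from the mean in units of standard deviations
z_scores <- (scores - pop_mean) / pop_sd
z_scores
```

A score equal to the mean (here 6) gets a z-score of 0, scores above the mean get positive values, and scores below it get negative ones.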

Session Info

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.2 (2024-10-31)
#>  os       macOS Sequoia 15.2
#>  system   x86_64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Vienna
#>  date     2024-12-30
#>  pandoc   3.6.1 @ /usr/local/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  cli           3.6.3      2024-06-21 [1] CRAN (R 4.4.0)
#>  colorspace    2.1-1      2024-07-26 [2] CRAN (R 4.4.0)
#>  commonmark    1.9.2      2024-10-04 [2] CRAN (R 4.4.1)
#>  curl          6.0.1      2024-11-14 [2] CRAN (R 4.4.1)
#>  digest        0.6.37     2024-08-19 [2] CRAN (R 4.4.1)
#>  evaluate      1.0.1      2024-10-10 [2] CRAN (R 4.4.1)
#>  fastmap       1.2.0      2024-05-15 [2] CRAN (R 4.4.0)
#>  glossary    * 1.0.0.9003 2024-08-05 [2] Github (debruine/glossary@05e4a61)
#>  glue          1.8.0      2024-09-30 [1] CRAN (R 4.4.1)
#>  htmltools     0.5.8.1    2024-04-04 [2] CRAN (R 4.4.0)
#>  htmlwidgets   1.6.4      2023-12-06 [2] CRAN (R 4.4.0)
#>  jsonlite      1.8.9      2024-09-20 [2] CRAN (R 4.4.1)
#>  kableExtra    1.4.0      2024-01-24 [2] CRAN (R 4.4.0)
#>  knitr         1.49       2024-11-08 [2] CRAN (R 4.4.1)
#>  lifecycle     1.0.4      2023-11-07 [1] CRAN (R 4.4.0)
#>  magrittr      2.0.3      2022-03-30 [2] CRAN (R 4.4.0)
#>  markdown      1.13       2024-06-04 [2] CRAN (R 4.4.0)
#>  munsell       0.5.1      2024-04-01 [2] CRAN (R 4.4.0)
#>  R6            2.5.1      2021-08-19 [2] CRAN (R 4.4.0)
#>  rlang         1.1.4      2024-06-04 [1] CRAN (R 4.4.0)
#>  rmarkdown     2.29       2024-11-04 [2] CRAN (R 4.4.1)
#>  rstudioapi    0.17.1     2024-10-22 [2] CRAN (R 4.4.1)
#>  rversions     2.1.2      2022-08-31 [2] CRAN (R 4.4.0)
#>  scales        1.3.0      2023-11-28 [2] CRAN (R 4.4.0)
#>  sessioninfo   1.2.2      2021-12-06 [2] CRAN (R 4.4.0)
#>  stringi       1.8.4      2024-05-06 [2] CRAN (R 4.4.0)
#>  stringr       1.5.1      2023-11-14 [2] CRAN (R 4.4.0)
#>  svglite       2.1.3      2023-12-08 [2] CRAN (R 4.4.0)
#>  systemfonts   1.1.0      2024-05-15 [2] CRAN (R 4.4.0)
#>  vctrs         0.6.5      2023-12-01 [1] CRAN (R 4.4.0)
#>  viridisLite   0.4.2      2023-05-02 [2] CRAN (R 4.4.0)
#>  xfun          0.49       2024-10-31 [2] CRAN (R 4.4.1)
#>  xml2          1.3.6      2023-12-04 [2] CRAN (R 4.4.0)
#>  yaml          2.3.10     2024-07-26 [2] CRAN (R 4.4.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.4-x86_64/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

References

Sharif, Bonita, and Jonathan I. Maletic. 2010. “An Eye Tracking Study on camelCase and under_score Identifier Styles.” In 2010 IEEE 18th International Conference on Program Comprehension, 196–205. https://doi.org/10.1109/ICPC.2010.41.