5 Vector Geospatial Data

Chapter section list

Import geospatial data: Section 5.1
Creating simple maps: Section 5.2
Overlaying vector datasets: Section 5.3
Save spatial geodata files: Section 5.4
Choropleth maps: Section 5.5
Modifying map appearance
Exporting graphics output
R-spatial tutorial: Section 5.6
Practice

5.1 Import Geospatial Data

5.1.1 ESRI shapefile format

The data for import in chapter 5 are provided in ESRI shapefile format. This format was developed several decades ago but remains one of the widely used file formats for vector geospatial data. It is a multiple file format, where separate files contain the feature geometries, attribute table, spatial indices, and coordinate reference system.

R Code 5.1 : Import Geospatial Data

Code

glue::glue("############### import esri data #############")
okcounty <- sf::st_read("data/Chapter5/ok_counties.shp", quiet = TRUE)
tpoint <- sf::st_read("data/Chapter5/ok_tornado_point.shp", quiet = TRUE)
tpath <- sf::st_read("data/Chapter5/ok_tornado_path.shp", quiet = TRUE)

glue::glue("")
glue::glue("############### show data class #############")
class(okcounty)

glue::glue("")
glue::glue("############### show data with dplyr #############")
dplyr::glimpse(okcounty)

#> ############### import esri data #############
#> 
#> ############### show data class #############
#> [1] "sf"         "data.frame"
#> 
#> ############### show data with dplyr #############
#> Rows: 77
#> Columns: 8
#> $ STATEFP  <chr> "40", "40", "40", "40", "40", "40", "40", "40", "40", "40", "…
#> $ COUNTYFP <chr> "077", "025", "011", "107", "105", "153", "001", "053", "059"…
#> $ COUNTYNS <chr> "01101826", "01101800", "01101793", "01101841", "01101840", "…
#> $ AFFGEOID <chr> "0500000US40077", "0500000US40025", "0500000US40011", "050000…
#> $ GEOID    <chr> "40077", "40025", "40011", "40107", "40105", "40153", "40001"…
#> $ NAME     <chr> "Latimer", "Cimarron", "Blaine", "Okfuskee", "Nowata", "Woodw…
#> $ LSAD     <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "…
#> $ geometry <POLYGON [°]> POLYGON ((-95.50766 35.0292..., POLYGON ((-103.0025 3…

The {sf} objects contain a column called geometry. This is a special column that contains the geospatial information about the location of each feature. This column should not be modified directly. It is used by the functions in the {sf} package for geospatial data processing.

Note 5.1: Using {skimr} with {sf}

Normally I am using the skimr::skim() function for data summary. But for the {sf} data classes in the geometry column are no skimmers available. (Possible data types are: sfc_POINT, sfc_LINESTRING, sfc_POLYGON, sfc_MULTIPOINT, sfc_MULTILINESTRING, sfc_MULTIPOLYGON, and sfc_GEOMETRY.) In the above case the class(okcounty$geometry) = “sfc_POLYGON, sfc” and not user-defined for {skimr} The fall back to the “character” class is not useful. (sfc stands for “simple feature list column”.)

It is possible to adapt {skimr} for working with user defined data types using skimr::skim_with(). Resources that explain how to do this are:

Defining sfl’s for a package: General article that explains how to generate and use with user defined data types. sflstands for “skimr function list”. It is a list-like data structure used to define custom summary statistics for specific data types.
skim of {sf} objects: Discussion specific to the {sf} package.

At the moment I do not understand enough about the {sf} package to get into more details for writing an appropriate function. I wonder if there is not already a solution available as spatial data processing with R and the {sf} package is not an extremely rare use case.

In the R package {sf} (Simple Features), many functions are prefixed with st_. The st_ prefix is inspired by PostGIS, which refers with the abbreviation to “spatial type”. This prefix is used consistently throughout {sf} to indicate that a function operates on spatial data. In the context of {sf}, st_ serves as a namespace for spatial functions, allowing developers and users to easily identify and find functions related to spatial operations. This prefixing convention makes it simple to discover and use spatial functions.

Looking at the file names I noticed: All files have the same filename with different extensions. There are always four files with the extensions .dbf, .prj, .shp, .shx.

The shapefiles are imported to {sf} objects using the sf::st_read() function. The quiet = TRUE argument suppresses output to the console when importing spatial datasets. It

An example for the output when using quit = FALSE (the default option) is:

Reading layer ok_counties' from data source/Users/petzi/Documents/Meine-Repos/GDSWR/data/Chapter5/ok_counties.shp’ using driver `ESRI Shapefile’
Simple feature collection with 77 features and 7 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -103.0025 ymin: 33.62184 xmax: -94.43151 ymax: 37.00163
Geodetic CRS: NAD83

To read in a shapefile, it is only necessary to specify the filename with a .shp extension. However, all the files, including the .shp file as well as the .dbf, .shx, and .prj files, need to be present in the directory from which the data are read.

The ok_counties.shp dataset contains county boundaries for the state of Oklahoma.
The ok_tornado_point.shp dataset and the ok_tornado_path.shp dataset contain historical information about tornadoes in Oklahoma.
- The points are the initial locations of tornado touchdown.
- The paths are lines that identify the path of each tornado after touchdown.
These data were derived from larger, national-level datasets generated by the National Oceanographic and Atmospheric Administration (NOAA) National Weather Service Storm Prediction Center.

5.1.2 Conversion data sf <-> sp

R Code 5.2 : {sf} data to {sp} data and reverse

Code

glue::glue("############### convert from sf to sp data #############")
okcounty_sp <- sf::as_Spatial(okcounty) # sf::as(okcounty, 'Spatial') does not work!
class(okcounty_sp)

glue::glue("")
glue::glue("############### convert from sp to sf data #############")
okcounty_sf <- sf::st_as_sf(okcounty_sp)
class(okcounty_sf)

#> ############### convert from sf to sp data #############
#> [1] "SpatialPolygonsDataFrame"
#> attr(,"package")
#> [1] "sp"
#> 
#> ############### convert from sp to sf data #############
#> [1] "sf"         "data.frame"

5.2 Creating simple maps

A good strategy to get an overview about the data is to plot the data as map. There are two options: Using ggplot2::geom_sf() or base::plot().

5.2.1 Draw Oklahoma county boundaries

To generate a map of counties using ggplot2::ggplot() with a {sf} object, the ggplot2::geom_sf() function is used.

From the view of the {ggplot2} package the ggplot2::geom_sf() is an unusual geom because it will draw different geometric objects depending on what simple features are present in the data: you can get points, lines, or polygons. For text and labels, you can use ggplot2::geom_sf_text() and ggplot2::geom_sf_label().

Code Collection 5.1 : Plotting Oklahoma county boundaries

ggplot2
base::plot()

R Code 5.3 : Oklahoma county boundaries with {ggplot2}

Code

ggplot2::ggplot(data = okcounty) +
  ggplot2::geom_sf(fill = NA) +
  ggplot2::theme_void()

Graph 5.1: Oklahoma county boundaries plotted with {ggplot2}

fill = NA makes the counties transparent.

(To get the same result as in the base R approach I used ggplot2::theme_void() to hide the coordinates which is shown in the original book example.)

R Code 5.4 : Oklahoma county boundaries with base::plot()

Code

graphics::par(mar = c(0, 0, 0, 0))
base::plot(okcounty$geometry)

Graph 5.2: Oklahoma county boundaries plotted with base::plot()

From R Graph Gallery I learend that I could also use bese R to plot spatial geodata. But everybody agrees that using {ggplot2} is the preferred approach.

Note 5.2: Too much space around cholorpleth map

As you can see from both graphics there is ample space aorund the map. I do not know how to remove it. Therefore I wrote a question on StackOverflow. I used a simple example provide by the {sf} package.

5.2.2 Inspect `tpoint` and `tpath`

Because {sf} objects are a type of data frame, they can be modified using the normal {tidyverse} functions. Let’s look at the two other R objects we’ve generated in R Code 5.1.

Code Collection 5.2 : Display structure of the tornado files

tpoint
tpath

R Code 5.5 : Glimpse at tpoint

Code

dplyr::glimpse(tpoint)

#> Rows: 4,092
#> Columns: 23
#> $ om       <dbl> 192, 27, 38, 57, 60, 61, 50, 52, 96, 108, 113, 117, 119, 76, …
#> $ yr       <dbl> 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1…
#> $ mo       <dbl> 10, 2, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, …
#> $ dy       <dbl> 1, 27, 27, 28, 28, 28, 2, 3, 11, 16, 22, 24, 29, 4, 4, 4, 7, …
#> $ date     <chr> "1950-10-01", "1950-02-27", "1950-03-27", "1950-04-28", "1950…
#> $ time     <chr> "21:00:00", "10:20:00", "03:00:00", "14:17:00", "19:05:00", "…
#> $ tz       <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
#> $ st       <chr> "OK", "OK", "OK", "OK", "OK", "OK", "OK", "OK", "OK", "OK", "…
#> $ stf      <dbl> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 4…
#> $ stn      <dbl> 23, 1, 2, 5, 6, 7, 3, 4, 15, 16, 17, 18, 19, 8, 9, 10, 11, 12…
#> $ mag      <dbl> 1, 2, 2, 3, 4, 2, 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 1…
#> $ inj      <dbl> 0, 0, 0, 1, 32, 0, 0, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 3, 0, 0, …
#> $ fat      <dbl> 0, 0, 0, 1, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ loss     <dbl> 4, 4, 3, 5, 5, 4, 4, 3, 2, 3, 0, 4, 2, 4, 3, 5, 0, 4, 3, 4, 3…
#> $ closs    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ slat     <dbl> 36.73, 35.55, 34.85, 34.88, 35.08, 34.55, 35.82, 36.13, 36.82…
#> $ slon     <dbl> -102.52, -97.60, -95.75, -99.28, -96.40, -96.20, -97.02, -95.…
#> $ elat     <dbl> 36.8800, 35.5501, 34.8501, 35.1700, 35.1300, 34.5501, 35.8201…
#> $ elon     <dbl> -102.3000, -97.5999, -95.7499, -99.2000, -96.3500, -96.1999, …
#> $ len      <dbl> 15.8, 2.0, 0.1, 20.8, 4.5, 0.8, 1.0, 1.0, 0.5, 7.3, 1.5, 1.0,…
#> $ wid      <dbl> 10, 50, 77, 400, 200, 100, 100, 33, 77, 100, 100, 33, 33, 293…
#> $ fc       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ geometry <POINT [°]> POINT (-102.52 36.73), POINT (-97.6 35.55), POINT (-95.…

The points are the initial locations of tornado touchdowns.

R Code 5.6 : Glimpse at tpath

Code

dplyr::glimpse(tpath)

#> Rows: 4,092
#> Columns: 23
#> $ om       <dbl> 192, 27, 38, 57, 60, 61, 50, 52, 96, 108, 113, 117, 119, 76, …
#> $ yr       <dbl> 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1…
#> $ mo       <dbl> 10, 2, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, …
#> $ dy       <dbl> 1, 27, 27, 28, 28, 28, 2, 3, 11, 16, 22, 24, 29, 4, 4, 4, 7, …
#> $ date     <chr> "1950-10-01", "1950-02-27", "1950-03-27", "1950-04-28", "1950…
#> $ time     <chr> "21:00:00", "10:20:00", "03:00:00", "14:17:00", "19:05:00", "…
#> $ tz       <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
#> $ st       <chr> "OK", "OK", "OK", "OK", "OK", "OK", "OK", "OK", "OK", "OK", "…
#> $ stf      <dbl> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 4…
#> $ stn      <dbl> 23, 1, 2, 5, 6, 7, 3, 4, 15, 16, 17, 18, 19, 8, 9, 10, 11, 12…
#> $ mag      <dbl> 1, 2, 2, 3, 4, 2, 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 1…
#> $ inj      <dbl> 0, 0, 0, 1, 32, 0, 0, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 3, 0, 0, …
#> $ fat      <dbl> 0, 0, 0, 1, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ loss     <dbl> 4, 4, 3, 5, 5, 4, 4, 3, 2, 3, 0, 4, 2, 4, 3, 5, 0, 4, 3, 4, 3…
#> $ closs    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ slat     <dbl> 36.73, 35.55, 34.85, 34.88, 35.08, 34.55, 35.82, 36.13, 36.82…
#> $ slon     <dbl> -102.52, -97.60, -95.75, -99.28, -96.40, -96.20, -97.02, -95.…
#> $ elat     <dbl> 36.8800, 35.5501, 34.8501, 35.1700, 35.1300, 34.5501, 35.8201…
#> $ elon     <dbl> -102.3000, -97.5999, -95.7499, -99.2000, -96.3500, -96.1999, …
#> $ len      <dbl> 15.8, 2.0, 0.1, 20.8, 4.5, 0.8, 1.0, 1.0, 0.5, 7.3, 1.5, 1.0,…
#> $ wid      <dbl> 10, 50, 77, 400, 200, 100, 100, 33, 77, 100, 100, 33, 33, 293…
#> $ fc       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ geometry <LINESTRING [°]> LINESTRING (-102.52 36.73, ..., LINESTRING (-97.6 …

The paths are lines that identify the path of each tornado after touchdown.

From dplyr::glimpse() we get an idea about the data structure. But we do not know the numeric span covered by the variable. This is especially important for our next task to focus on data from the last five years. We know from Code Collection 5.2 that the dataset starts with the year 1950 but we have no clue about the middle or end of the dataset.

For this reason I have designed a special functions that returns the first and last dataset and several random data. The default number of data shown is eight but this can be changed using a second parameter.

Code Collection 5.3 : Show some random tornado data, including first and last record

tpoint
tpath

R Code 5.7 : Show random tpoint data

Code

pb_glance_data(tpoint)

#> Simple feature collection with 10 features and 23 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -102.52 ymin: 34.23 xmax: -95.774 ymax: 36.8768
#> Geodetic CRS:  NAD83
#>     obs     om   yr mo dy       date     time tz st stf stn mag inj fat  loss
#> 1     1    192 1950 10  1 1950-10-01 21:00:00  3 OK  40  23   1   0   0 4e+00
#> 2   634    519 1960  8  9 1960-08-09 12:00:00  3 OK  40  96   0   0   0 1e+00
#> 3  1098    261 1969  6 11 1969-06-11 15:00:00  3 OK  40  18   0   0   0 0e+00
#> 4  1177    188 1971  4 26 1971-04-26 19:56:00  3 OK  40  11   0   0   0 0e+00
#> 5  1252    189 1973  4 19 1973-04-19 23:45:00  3 OK  40  11   2   2   0 6e+00
#> 6  2369    371 1996  3 24 1996-03-24 12:55:00  3 OK  40   1   0   0   0 0e+00
#> 7  2609   1209 1999  5 31 1999-05-31 19:24:00  3 OK  40 110   1   0   0 1e-02
#> 8  3218    300 2010  5 10 2010-05-10 18:28:00  3 OK  40  47   1   0   0 3e-03
#> 9  4069 619637 2021  1 30 2021-01-30 13:46:00  3 OK  40   0  -9   0   0 0e+00
#> 10 4092 620265 2021  7 10 2021-07-10 18:25:00  3 OK  40   0   1   0   0 5e+04
#>    closs    slat      slon    elat      elon  len wid fc
#> 1      0 36.7300 -102.5200 36.8800 -102.3000 15.8  10  0
#> 2      0 34.2500  -98.2300 34.2501  -98.2299  0.1  10  1
#> 3      0 36.4500  -98.0200 36.4501  -98.0199  0.1  10  0
#> 4      0 36.6000  -96.1000 36.6001  -96.0999  0.1  10  0
#> 5      0 34.7000  -97.3000 34.8700  -97.0800 16.8 250  0
#> 6      0 34.2300  -97.2700 34.2300  -97.2700  0.1  25  0
#> 7      0 34.7700  -99.4500 34.7000  -99.4000  2.5 100  0
#> 8      0 35.4239  -95.7757 35.4354  -95.7349  2.6 600  0
#> 9      0 36.8768  -95.7740 36.8794  -95.7679  0.4  75  0
#> 10     0 36.0880  -96.7200 36.0730  -96.7040  1.4 100  0
#>                    geometry
#> 1     POINT (-102.52 36.73)
#> 2      POINT (-98.23 34.25)
#> 3      POINT (-98.02 36.45)
#> 4        POINT (-96.1 36.6)
#> 5        POINT (-97.3 34.7)
#> 6      POINT (-97.27 34.23)
#> 7      POINT (-99.45 34.77)
#> 8  POINT (-95.7757 35.4239)
#> 9   POINT (-95.774 36.8768)
#> 10    POINT (-96.72 36.088)

R Code 5.8 : Show random tpath data

Code

pb_glance_data(tpath)

#> Simple feature collection with 10 features and 23 fields
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: -102.52 ymin: 34.23 xmax: -95.7349 ymax: 36.88
#> Geodetic CRS:  NAD83
#>     obs     om   yr mo dy       date     time tz st stf stn mag inj fat  loss
#> 1     1    192 1950 10  1 1950-10-01 21:00:00  3 OK  40  23   1   0   0 4e+00
#> 2   634    519 1960  8  9 1960-08-09 12:00:00  3 OK  40  96   0   0   0 1e+00
#> 3  1098    261 1969  6 11 1969-06-11 15:00:00  3 OK  40  18   0   0   0 0e+00
#> 4  1177    188 1971  4 26 1971-04-26 19:56:00  3 OK  40  11   0   0   0 0e+00
#> 5  1252    189 1973  4 19 1973-04-19 23:45:00  3 OK  40  11   2   2   0 6e+00
#> 6  2369    371 1996  3 24 1996-03-24 12:55:00  3 OK  40   1   0   0   0 0e+00
#> 7  2609   1209 1999  5 31 1999-05-31 19:24:00  3 OK  40 110   1   0   0 1e-02
#> 8  3218    300 2010  5 10 2010-05-10 18:28:00  3 OK  40  47   1   0   0 3e-03
#> 9  4069 619637 2021  1 30 2021-01-30 13:46:00  3 OK  40   0  -9   0   0 0e+00
#> 10 4092 620265 2021  7 10 2021-07-10 18:25:00  3 OK  40   0   1   0   0 5e+04
#>    closs    slat      slon    elat      elon  len wid fc
#> 1      0 36.7300 -102.5200 36.8800 -102.3000 15.8  10  0
#> 2      0 34.2500  -98.2300 34.2501  -98.2299  0.1  10  1
#> 3      0 36.4500  -98.0200 36.4501  -98.0199  0.1  10  0
#> 4      0 36.6000  -96.1000 36.6001  -96.0999  0.1  10  0
#> 5      0 34.7000  -97.3000 34.8700  -97.0800 16.8 250  0
#> 6      0 34.2300  -97.2700 34.2300  -97.2700  0.1  25  0
#> 7      0 34.7700  -99.4500 34.7000  -99.4000  2.5 100  0
#> 8      0 35.4239  -95.7757 35.4354  -95.7349  2.6 600  0
#> 9      0 36.8768  -95.7740 36.8794  -95.7679  0.4  75  0
#> 10     0 36.0880  -96.7200 36.0730  -96.7040  1.4 100  0
#>                          geometry
#> 1  LINESTRING (-102.52 36.73, ...
#> 2  LINESTRING (-98.23 34.25, -...
#> 3  LINESTRING (-98.02 36.45, -...
#> 4  LINESTRING (-96.1 36.6, -96...
#> 5  LINESTRING (-97.3 34.7, -97...
#> 6  LINESTRING (-97.27 34.23, -...
#> 7  LINESTRING (-99.45 34.77, -...
#> 8  LINESTRING (-95.7757 35.423...
#> 9  LINESTRING (-95.774 36.8768...
#> 10 LINESTRING (-96.72 36.088, ...

5.2.3 Visualization of the Oklahoma tornado data (2016-2021)

Because {sf} objects are a type of data frame, they can be modified using the normal {tidyverse} functions.

A reduced dataset for the years 2016-2021 and only with the columns ID (om), the year (yr), and the date (date) and is prepared in the first tab reduce data.
Initiation points of tornadoes in Oklahoma from 2016–2021 is shown in tab initiation points.
Tab tornados path shows the paths of tornadoes in Oklahoma from 2016–2021.
Initiation points of tornadoes in Oklahoma from 2016–2021 with years represented by the color aesthetic is in tab color aesthetic.
In the final tab facets you will see the initiation points of tornadoes in Oklahoma from 2016–2021 with years mapped as separate facets.

Code Collection 5.4 : Show different visualization of the Oklahoma tornado data (2016-2021)

R Code 5.9 : Filter data from 2016 to 2021 and select only three columns (ID, year and date)

Code

tpoint_16_21 <- tpoint |> 
  dplyr::filter(yr >= 2016 & yr <= 2021) |> 
  dplyr::select(om, yr, date)

tpath_16_21 <- tpath |> 
  dplyr::filter(yr >= 2016 & yr <= 2021)  |> 
  dplyr::select(om, yr, date)

(For this R code chunk is no output available)

R Code 5.10 : Show initiation points of tornadoes in Oklahoma from 2016–2021

Code

ggplot2::ggplot() +
  ggplot2::geom_sf(data = okcounty, fill = NA) +
  ggplot2::geom_sf(data = tpoint_16_21)

Graph 5.3: Initiation points of tornadoes in Oklahoma from 2016–2021.

Because each function maps a different dataset, the data argument must be provided in each ggplot2::geom_sf() function instead of in the ggplot2::ggplot() function.
I am using as default theme the ggplot2::theme_bw() function (see setup chunk of this chapter) to display the map over a white background while retaining the graticules.

R Code 5.11 : Show tornadoes paths in Oklahoma from 2016–2021

Code

ggplot2::ggplot() +                              
  ggplot2::geom_sf(data = okcounty, fill = NA) + 
  ggplot2::geom_sf(data = tpath_16_21,           
          color = "red",                         
          size = 1)

Graph 5.4: Paths of tornadoes in Oklahoma from 2016-2021.

To make the tornado path lines easier to see in relation to the county boundaries, they are displayed in red and their sizes are increased to be larger (size = 1) than the default line width of 0.5.

R Code 5.12 : Initiation points of tornadoes in Oklahoma from 2016-2021 with years represented by the color aesthetic

Code

ggplot2::ggplot() +
  ggplot2::geom_sf(data = tpoint_16_21, 
          ggplot2::aes(color = forcats::as_factor(yr))) + # [1]
  ggplot2::geom_sf(data = okcounty, fill = NA) +
# ggplot2::scale_color_discrete(name = "Year") +          # [2]
  ggokabeito::scale_color_okabe_ito(name = "Year") +      # [2]
  ggplot2::coord_sf(datum = NA) +                         # [3]
  ggplot2::theme_void()                                   # [3]

Graph 5.5: Initiation points of tornadoes in Oklahoma from 2016-2021 with years represented by the color aesthetic.

To view the years of the tornadoes on the map, an aesthetic can be specified.

Line Comment 1: In the book the color argument is specified as base::as.factor(yr) so that the year is displayed as a discrete variable instead of a continuous variable. Instead of the base function I have used forcats::as_factor(yr).

Compared to base R, when x is a character, this function creates levels in the order in which they appear, which will be the same on every platform. (Base R sorts in the current locale which can vary from place to place.) (from the {forcats)} help file).

Line Comment 2: The ggplot2::scale_color_discrete() function is used to set the legend name. But the used (standard) color scale is not appropriate for people with color-vision deficiency (CVD). I have therefore used ggokabeito::scale_color_okabe_ito().

Line Comment 3: The book says that the ggplot2::theme_void() function removes the plot axes and labels and shows only the map. I suppose that this is not correct. ggplot2::coord_sf(datum = NA) removes the plot axes and labels; ggplot2::theme_void() removes the border frame around the graphic.

Code

ggplot2::ggplot() +
  ggplot2::geom_sf(data = okcounty, 
          fill = NA, 
          color = "gray") +
  ggplot2::geom_sf(data = tpoint_16_21, size = 0.75) +
  ggplot2::facet_wrap(ggplot2::vars(yr), ncol = 2) +
  ggplot2::coord_sf(datum = NA) +
  ggplot2::theme_void()

Alternately, ggplot2::facet_wrap() can be used to display the tornadoes for each year on a separate map. In comparison to the previous tab the sizes of the points are reduced slightly from the standard size = 1 to size = 0.75, so that they are better suited for the smaller maps.

Note 5.3

With the exception of the facet graphics there is too much horizontal space above and below the {sf} graphic. Is this a known problem? How to reduce the horizontal space for {sf} graphics plotted with {ggplot2}?

5.3 Overlaying Vector Datasets

5.3.1 A first spatial join

The number of tornado points in each county can be calculated using the sf::st_join() function to carry out a spatial join. A spatial join with {sf} is different from the joinwith {dplyr}: sf::st_join() links rows from the two tables based on the spatial locations instead of their attributes.

In this case the functions compares the point coordinates of the tpoint_16_21 dataset in its geometry column with the polygon coordinates from the geometry column of the okcounty dataset. It joins tpoint_16_21 with the geometry row that includes the appropriate polygon from okcounty containing the point coordinates.

R Code 5.14 : Overlaying vector datasets with a spatial join

Code

countypnt <- sf::st_join(tpoint_16_21, okcounty)  

dplyr::glimpse(countypnt)

#> Rows: 434
#> Columns: 11
#> $ om       <dbl> 613662, 613675, 613676, 613677, 613678, 613727, 613728, 61372…
#> $ yr       <dbl> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2…
#> $ date     <chr> "2016-03-23", "2016-03-30", "2016-03-30", "2016-03-30", "2016…
#> $ STATEFP  <chr> "40", "40", "40", "40", "40", "40", "40", "40", "40", "40", "…
#> $ COUNTYFP <chr> "001", "113", "105", "131", "035", "139", "139", "139", "139"…
#> $ COUNTYNS <chr> "01101788", "01101844", "01101840", "01101853", "01101805", "…
#> $ AFFGEOID <chr> "0500000US40001", "0500000US40113", "0500000US40105", "050000…
#> $ GEOID    <chr> "40001", "40113", "40105", "40131", "40035", "40139", "40139"…
#> $ NAME     <chr> "Adair", "Osage", "Nowata", "Rogers", "Craig", "Texas", "Texa…
#> $ LSAD     <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "…
#> $ geometry <POINT [°]> POINT (-94.5042 35.6916), POINT (-96.0151 36.2151), POI…

5.3.2 Count tornados per county

Afterward, each row in countypnt data contains additional columns from the okcounty dataset that correspond to the county that the tornado with it point coordinates is within. The dataset contains one record for each tornado.

To compute the total number of tornadoes per county, countypnt must be grouped by the GEOID county code or by the county NAME (here by GEOID county code).

But before grouping and summarizing, countypnt must be converted from an {sf} object to a normal data frame using sf::st_drop_geometry().

R Code 5.15 : Count tornados per county

Code

glue::glue("#### show class before `sf::st_drop_geometry()` #####")
base::class(countypnt)
countypnt <- sf::st_drop_geometry(countypnt)
glue::glue("")
glue::glue("##### show class after `sf::st_drop_geometry()` ######")
base::class(countypnt)


countysum <- countypnt |> 
  dplyr::group_by(GEOID) |> 
  dplyr::summarize(tcnt = dplyr::n())  

glue::glue("")
glue::glue("##### glimpse at the new summarized data frame` ######")
dplyr::glimpse(countysum)

#> #### show class before `sf::st_drop_geometry()` #####
#> [1] "sf"         "data.frame"
#> 
#> ##### show class after `sf::st_drop_geometry()` ######
#> [1] "data.frame"
#> 
#> ##### glimpse at the new summarized data frame` ######
#> Rows: 75
#> Columns: 2
#> $ GEOID <chr> "40001", "40005", "40007", "40009", "40011", "40013", "40015", "…
#> $ tcnt  <int> 6, 3, 4, 8, 1, 4, 10, 5, 7, 5, 3, 12, 10, 5, 5, 1, 7, 9, 7, 8, 2…

5.3.3 Associate polygons with tornado counts

In the next step we join okcounty to countysum so that each polygon is associated with the appropriate tornado summary.

If there are between 2016-2021 several tornados per county than we get several rows.
But the reverse is also true: If a county has had no tornado in the years 2016-2021 this county gets NA as the number of tornados.

As a matter of fact a few counties had no tornadoes during 2016–2021 and are therefore missing from countysum, resulting in NA values in the joined table. In this case, we know that NA means zero tornadoes, so the we must replace NA values by zeroes. The mutate() function computes the area of each county using st_area() and then calculates the density of tornadoes per county. Density is initially in tornadoes per square meter but is converted to tornadoes per 1000 km2. The st_area() function returns a column with explicit measurement units, but these are removed using the drop_units() function for simplicity

R Code 5.16 : Associate each polygon with the appropriate tornado summary

Code

countymap <- okcounty   |>
  dplyr::left_join(countysum, by = "GEOID")  |>        # [1]
  dplyr::mutate(tcnt = 
        base::ifelse(base::is.na(tcnt), 0, tcnt)) |>   # [2]
  dplyr::mutate(area = sf::st_area(okcounty),
         tdens = 10^3 * 10^3 * tcnt / area)   |>       # [3]
  units::drop_units()                                  # [4]


dplyr::glimpse(countymap)

#> Rows: 77
#> Columns: 11
#> $ STATEFP  <chr> "40", "40", "40", "40", "40", "40", "40", "40", "40", "40", "…
#> $ COUNTYFP <chr> "077", "025", "011", "107", "105", "153", "001", "053", "059"…
#> $ COUNTYNS <chr> "01101826", "01101800", "01101793", "01101841", "01101840", "…
#> $ AFFGEOID <chr> "0500000US40077", "0500000US40025", "0500000US40011", "050000…
#> $ GEOID    <chr> "40077", "40025", "40011", "40107", "40105", "40153", "40001"…
#> $ NAME     <chr> "Latimer", "Cimarron", "Blaine", "Okfuskee", "Nowata", "Woodw…
#> $ LSAD     <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "…
#> $ tcnt     <dbl> 1, 12, 1, 10, 6, 2, 6, 0, 4, 9, 3, 10, 12, 1, 2, 8, 13, 4, 5,…
#> $ geometry <POLYGON [°]> POLYGON ((-95.50766 35.0292..., POLYGON ((-103.0025 3…
#> $ area     <dbl> 1890663261, 4766283042, 2427121029, 1657249513, 1503893122, 3…
#> $ tdens    <dbl> 0.0005289149, 0.0025176851, 0.0004120108, 0.0060340944, 0.003…

Line comment 1: Using dplyr::left_join() instead of dplyr::inner_join() ensures that all of the county polygons are retained in the output of the join. (dplyr::inner_join() only keeps observations from x that have a matching key in y, whereas dplyr::left_join() keeps all observations in x.)

Line comment 2: If there are between 2016-2021 several tornados per county than we get several rows. But the reverse is also true: If a county has had no tornado in the years 2016-2021 this county gets NA values as the number of tornados.

As a matter of fact a few counties had no tornadoes during 2016–2021 and are therefore missing from countysum, resulting in NA values in the joined table. In this case, we know that NA means zero tornadoes, so the we must replace NA values by zeroes. I did this with the dplyr::mutate() function instead of base::replace(). Besides this approach does not need the . symbol of the {magrittr} packages (exporting into dplyr) for refering to the database (respectively its equivalent _ for the R pipe). See for details Note 3.1.

Line comment 3: The second dplyr::mutate() function computes the area of each county using sf::st_area() and then calculates the density of tornadoes per county. Density is initially in tornadoes per square meter but is converted to tornadoes per 1000 km2.

Line comment 4: The sf::st_area() function returns a column with explicit measurement units, but these are removed using the units::drop_units() function for simplicity. For more information see the vignettes and help pages of the {units} package.

5.4 Save spatial geodata files

5.4.1 ESRI format

The sf::st_write() function can be used to save sf objects to a variety of file formats. In many cases, the function can determine the output format from the output filename extension. The following code saves the county-level tornado summaries in ESRI shapefile format. The append = FALSE option overwrites the shapefile if it already exists.

R Code 5.17 : Save spatial data files into ESRI format

Code

sf::st_write(countymap, 
         dsn = "data/Chapter5/oktornadosum.shp", 
         append = FALSE)

After a message what the script does

Writing layer oktornadosum' to data sourcedata/Chapter5/oktornadosum.shp’ using driver `ESRI Shapefile’ Writing 77 features with 10 fields and geometry type Polygon.

I got for every feature (= 77 rows) a warning message emitted by the GDAL library:

Warning: GDAL Message 1: Value 1890663260.74707699 of field area of feature 0 not successfully written. Possibly due to too larger number with respect to field width

It turned out that this is a misleading warning and that one should not use the old ESRI format but the newer and better Open Geospatial Consortium (OGC) GeoPackage format. See also StackOverflow and the answer from the {sf} developer:

The general recommendation is to not use shapefiles: the format is not an open standard, it has many limitations and modern formats are available. A good alternative is GeoPackage.

5.4.2 GeoPackage format

GeoPackage is also mentioned as an alternative in the book. The data are stored in an SQLite database that may contain one or more layers. In this example, the delete_dsn = TRUE argument overwrites the entire GeoPackage. Leaving this argument at its default value of FALSE would add a new layer to an existing database.

R Code 5.18 : Save spatial geodata in GeoPackage format

Code

sf::st_write(countymap, 
         dsn = "data/Chapter5/oktornado.gpkg", 
         layer = "countysum",
         delete_dsn = TRUE)

#> Deleting source `data/Chapter5/oktornado.gpkg' using driver `GPKG'
#> Writing layer `countysum' to data source 
#>   `data/Chapter5/oktornado.gpkg' using driver `GPKG'
#> Writing 77 features with 10 fields and geometry type Polygon.

5.4.3 GeoJSON format

Another commonly-used open geospatial data format is GeoJSON. It is based on Javascript Object Notation (JSON), a human-readable text format that stores data in ASCII files. The layer_options argument must be set to “RFC7946 = YES” to save the data in the newest GeoJSON specification.

R Code 5.19 : Save spatial geodata in GeoJSON format

Code

sf::st_write(obj = countymap, 
             dsn = "data/Chapter5/oktornado.geojson", 
             layer_options = "RFC7946 = YES",
             delete_dsn = TRUE)

#> Deleting source `data/Chapter5/oktornado.geojson' using driver `GeoJSON'
#> Writing layer `oktornado' to data source 
#>   `data/Chapter5/oktornado.geojson' using driver `GeoJSON'
#> options:        RFC7946 = YES 
#> Writing 77 features with 10 fields and geometry type Polygon.

Here again I had to add delete_dsn = TRUE (append = FALSE did not work for this format!). Otherwise I would get an error message that the dataset already exists.

5.5 Choropleth Maps

5.5.1 Filling with colors (standard)

Another way to display the tornadoes is as a choropleth map, where summary statistics for each county are displayed as different colors. The county-level tornado density can be as a choropleth using the fill aesthetic with ggplot2::geom_sf(). By default, the fill colors are based on a dark-to-light blue color ramp. The ggplot2::theme_void() function eliminates the axes and graticules and displays only the map on a white background.

R Code 5.20 : Densities of tornadoes mapped as a choropleth.

Code

ggplot2::ggplot(data = countymap) +
  ggplot2::geom_sf(ggplot2::aes(fill = tdens)) +
  ggplot2::theme_void() +
  ggplot2::coord_sf()

Graph 5.7: Densities of tornadoes in Oklahoma counties from 2016-2021 mapped as a choropleth.

5.5.2 Mapping symbols

To map symbols, the county polygons must first be converted to points. The sf::st_centroid() generates a point feature located at the centroid of each county. The sf::st_geometry_type() function returns the feature geometry type. Setting by_geometry = FALSE returns one geometry type for the entire dataset instead of for every feature.

R Code 5.21 : Convert county polygons to points

Code

glue::glue("##### Return geometry type before converted to points #####")
sf::st_geometry_type(okcounty, by_geometry = FALSE)


############# Return the centroid of a geometry
okcntrd = sf::st_centroid(countymap)

#> Warning: st_centroid assumes attributes are constant over geometries

Code

glue::glue("")
glue::glue("##### Return geometry type after converted to points #####")
sf::st_geometry_type(okcntrd, by_geometry = FALSE)

#> ##### Return geometry type before converted to points #####
#> [1] POLYGON
#> 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE
#> 
#> ##### Return geometry type after converted to points #####
#> [1] POINT
#> 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE

Note 5.4: How to get rid of the warning?

At the moment I do not know how to suppress the warning. Possible pointers to solve this problem are:

sf GitHub: suppress specific warning message
Spatial Data Science: Chapter 5

When, while manipulating geometries, attribute values are retained unmodified, support problems may arise. If we look into a simple case of replacing a county polygon with the centroid of that polygon on a dataset that has attributes, we see that R package sf issues a warning:

Warning: st_centroid assumes attributes are constant over geometries

The tornado counts can be mapped using the okcentrd dataset with the size aesthetic. One point is plotted for each county centroid, and the size of the point is proportional to the number of tornadoes in the county.

R Code 5.22 : Choropleth map using graduated symbols

Code

ggplot2::ggplot() +
  ggplot2::geom_sf(data = okcntrd, ggplot2::aes(size = tcnt)) +
  ggplot2::geom_sf(data = okcounty, fill = NA) +
  ggplot2::theme_void()

Graph 5.8: Numbers of tornadoes in Oklahoma counties from 2016-2021 mapped as graduated symbols.

5.6 R-spatial tutorial

A much more complete overview about loading and saving datasets of different formats is explained in How to load and save vector data in R, a weblog article in r-spatial. I will follow the tutorial in this article and comment it in difference to the GDSWR book.

5.6.1 Getting data

First of all I have to download the data to follow the tutorial.

R Code 5.23 : Download data for the r-spatial tutorial

Code

##  run this code chunk only once (manually)
baseURL = here::here()

## download data into Chapter5 folder

url = "https://github.com/kadyb/sf_load_save/archive/refs/heads/main.zip"
utils::download.file(url, base::paste0(baseURL, "/data/Chapter5/sf_load_save.zip"))
utils::unzip(base::paste0(baseURL, "/data/Chapter5/sf_load_save.zip"),
             exdir = base::paste0(baseURL, "/data/Chapter5"))

5.6.2 Loading Shapefile (.shp)

It consists of several files (e.g., .shp, .shx, .dbf, .prj) that must be present in the same folder. Wikipedia has more details about this format. But this file format has many limitations and shouldn’t be used anymore.

Instead of using the base::getwd() and base::setwd() commands I am using here::here(), because it will always locate the files relative to the project root.

R Code 5.24 : Load Shapefile

Code

## get path of r-spatial data
r_spatial_path = base::paste0(here::here(), "/data/Chapter5/sf_load_save-main/data/")

## load and display the shapefile

(
  countries = sf::read_sf(paste0(r_spatial_path, "countries/countries.shp"))
)

#> Simple feature collection with 52 features and 168 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -17.53604 ymin: -34.82195 xmax: 51.41704 ymax: 37.3452
#> Geodetic CRS:  WGS 84
#> # A tibble: 52 × 169
#>    featurecla   scalerank LABELRANK SOVEREIGNT SOV_A3 ADM0_DIF LEVEL TYPE  TLC  
#>    <chr>            <int>     <int> <chr>      <chr>     <int> <int> <chr> <chr>
#>  1 Admin-0 cou…         0         2 Ethiopia   ETH           0     2 Sove… 1    
#>  2 Admin-0 cou…         0         3 South Sud… SDS           0     2 Sove… 1    
#>  3 Admin-0 cou…         0         6 Somalia    SOM           0     2 Sove… 1    
#>  4 Admin-0 cou…         0         2 Kenya      KEN           0     2 Sove… 1    
#>  5 Admin-0 cou…         0         6 Malawi     MWI           0     2 Sove… 1    
#>  6 Admin-0 cou…         0         3 United Re… TZA           0     2 Sove… 1    
#>  7 Admin-0 cou…         0         5 Somaliland SOL           0     2 Sove… 1    
#>  8 Admin-0 cou…         0         3 Morocco    MAR           0     2 Sove… 1    
#>  9 Admin-0 cou…         0         7 Western S… SAH           0     2 Inde… 1    
#> 10 Admin-0 cou…         0         4 Republic … COG           0     2 Sove… 1    
#> # ℹ 42 more rows
#> # ℹ 160 more variables: ADMIN <chr>, ADM0_A3 <chr>, GEOU_DIF <int>,
#> #   GEOUNIT <chr>, GU_A3 <chr>, SU_DIF <int>, SUBUNIT <chr>, SU_A3 <chr>,
#> #   BRK_DIFF <int>, NAME <chr>, NAME_LONG <chr>, BRK_A3 <chr>, BRK_NAME <chr>,
#> #   BRK_GROUP <chr>, ABBREV <chr>, POSTAL <chr>, FORMAL_EN <chr>,
#> #   FORMAL_FR <chr>, NAME_CIAWF <chr>, NOTE_ADM0 <chr>, NOTE_BRK <chr>,
#> #   NAME_SORT <chr>, NAME_ALT <chr>, MAPCOLOR7 <int>, MAPCOLOR8 <int>, …

Again I will draw a map to get an idea about the content of the data. Again there are the two methods using ggplot2::geom_sf() or base::plot().

Code Collection 5.5 : Plotting Tutorial Shape File

ggplot2::geom_sf
base::plot()

R Code 5.25 : Plotting Tutorial Map with {ggplot2}

Code

ggplot2::ggplot(data = countries) +
  ggplot2::geom_sf(fill = NA) +
  ggplot2::theme_void()

Graph 5.9: Tutorial map plotted with {ggplot2}

fill = NA makes the countries transparent.

(To get the same result as in the base R approach I used ggplot2::theme_void() to hide the coordinates which is shown in the original book example.)

R Code 5.26 : Plotting Tutorial Map with base::plot()

Code

graphics::par(mar = c(0, 0, 0, 0))
base::plot(sf::st_geometry(countries))

Graph 5.10: Tutorial map plotted with base::plot

The countries object has many (168) fields (attributes), but to start with we only need the geometry column. Instead of using base::plot(countries$geometry) we can obtain geometry by using the sf::st_geometry() function.

5.6.3 Loading GeoPackage (.gpkg)

The next dataset is rivers (linear geometry) saved in GeoPackage format. It is loaded in exactly the same way as the shapefile before.

However, this format can consist of multiple layers of different types. In this case, we must define which layer exactly we want to load. To check what layers are in the geopackage, use the sf::st_layers() function, and then specify it using the layer argument in sf::read_sf(). If the file only contains one layer, we don’t need to do this.

R Code 5.27 : Check layers of GeoPackage file

Code

## get path of r-spatial data
r_spatial_path = base::paste0(here::here(), "/data/Chapter5/sf_load_save-main/data/")

sf::st_layers(base::paste0(r_spatial_path, "rivers.gpkg"))

#> Driver: GPKG 
#> Available layers:
#>   layer_name     geometry_type features fields crs_name
#> 1     rivers Multi Line String      228     38   WGS 84

We can now load the rivers layers and display the metadata like in the shapefile example.

R Code 5.28 : Load rivers layers and show metadata

Code

(
  rivers = sf::read_sf(paste0(r_spatial_path, "rivers.gpkg"), layer = "rivers")
)

#> Simple feature collection with 228 features and 38 fields
#> Geometry type: MULTILINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: -16.54233 ymin: -34.34378 xmax: 49.46094 ymax: 35.12311
#> Geodetic CRS:  WGS 84
#> # A tibble: 228 × 39
#>    dissolve  scalerank featurecla name  name_alt rivernum note  min_zoom name_en
#>    <chr>         <int> <chr>      <chr> <chr>       <int> <chr>    <dbl> <chr>  
#>  1 975River          9 River      <NA>  <NA>          975 <NA>       7.1 <NA>   
#>  2 976River          9 River      Rung… <NA>          976 <NA>       7.1 Rungwa 
#>  3 977River          9 River      Ligo… <NA>          977 <NA>       7.1 Ligonha
#>  4 978River          9 River      Dong… <NA>          978 <NA>       7.1 Dongwe 
#>  5 979River          9 River      Cuito <NA>          979 <NA>       7.1 Cuito  
#>  6 980Lake …         9 Lake Cent… <NA>  <NA>          980 <NA>       7.1 <NA>   
#>  7 980River          9 River      <NA>  <NA>          980 <NA>       7.1 <NA>   
#>  8 981River          9 River      Bagoé <NA>          981 <NA>       7.1 Bagoé  
#>  9 982River          9 River      Hade… <NA>          982 <NA>       7.1 Hadejia
#> 10 983River          9 River      Sous  <NA>          983 <NA>       7.1 Sous   
#> # ℹ 218 more rows
#> # ℹ 30 more variables: min_label <dbl>, ne_id <dbl>, label <chr>,
#> #   wikidataid <chr>, name_ar <chr>, name_bn <chr>, name_de <chr>,
#> #   name_es <chr>, name_fr <chr>, name_el <chr>, name_hi <chr>, name_hu <chr>,
#> #   name_id <chr>, name_it <chr>, name_ja <chr>, name_ko <chr>, name_nl <chr>,
#> #   name_pl <chr>, name_pt <chr>, name_ru <chr>, name_sv <chr>, name_tr <chr>,
#> #   name_vi <chr>, name_zh <chr>, name_fa <chr>, name_he <chr>, …

Visualizing the rivers data we we will plot this time the rivers against the background of country borders. Adding more layers to the visualization is done with the add = TRUE argument in the base::plot() function. Note that the order in which objects are added is important – the objects added last are displayed at the top. The col argument is used to set the color of the object.

Code Collection 5.6 : Plotting rivers inside country borders

base::plot()
ggplot2::geom_sf()

R Code 5.29 : Visualizing rivers inside country borders using base::plot()

Code

base::plot(sf::st_geometry(countries))
base::plot(sf::st_geometry(rivers), add = TRUE, col = "blue")

R Code 5.30 : Visualizing rivers inside country borders using ggplot2::geom_sf()

Code

ggplot2::ggplot(data = countries) +
  ggplot2::geom_sf(fill = NA) +
  ggplot2::geom_sf(data = rivers, color = "blue") +
  ggplot2::theme_void()

Plotting rivers inside countriy borders with {ggplot2}

Adding more layers to the visualization is done in {ggplot2} as always with the + sign.

Note that the map is different (better) sized than the base::plot() example.

5.6.4 Loading GeoJSON (.geojson)

R Code 5.31 : Loading GeoJSON cities file

Code

## get path of r-spatial data
r_spatial_path = base::paste0(here::here(), "/data/Chapter5/sf_load_save-main/data/")

## load and display the GeoJSON file
(
  cities = sf::read_sf(paste0(r_spatial_path, "cities.geojson"))
)

#> Simple feature collection with 1287 features and 31 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -17.47508 ymin: -34.52953 xmax: 51.12333 ymax: 37.29042
#> Geodetic CRS:  WGS 84
#> # A tibble: 1,287 × 32
#>    scalerank natscale labelrank featurecla      name   namepar namealt nameascii
#>        <int>    <int>     <int> <chr>           <chr>  <chr>   <chr>   <chr>    
#>  1        10        1         8 Admin-1 capital Bassar <NA>    <NA>    Bassar   
#>  2        10        1         8 Admin-1 capital Sotou… <NA>    <NA>    Sotouboua
#>  3        10        1         7 Admin-1 capital Meden… <NA>    <NA>    Medenine 
#>  4        10        1         7 Admin-1 capital Kebili <NA>    <NA>    Kebili   
#>  5        10        1         7 Admin-1 capital Tatao… <NA>    <NA>    Tataouine
#>  6        10        1         7 Admin-1 capital L'Ari… <NA>    <NA>    L'Ariana 
#>  7        10        1         7 Admin-1 capital Jendo… <NA>    <NA>    Jendouba 
#>  8        10        1         7 Admin-1 capital Kasse… <NA>    <NA>    Kasserine
#>  9        10        1         7 Admin-1 capital Sdid … <NA>    <NA>    Sdid Bou…
#> 10        10        1         7 Admin-1 capital Silia… <NA>    <NA>    Siliana  
#> # ℹ 1,277 more rows
#> # ℹ 24 more variables: adm0cap <int>, capalt <int>, capin <chr>,
#> #   worldcity <int>, megacity <int>, sov0name <chr>, sov_a3 <chr>,
#> #   adm0name <chr>, adm0_a3 <chr>, adm1name <chr>, iso_a2 <chr>, note <chr>,
#> #   latitude <dbl>, longitude <dbl>, pop_max <int>, pop_min <int>,
#> #   pop_other <int>, rank_max <int>, rank_min <int>, meganame <chr>,
#> #   ls_name <chr>, min_zoom <dbl>, ne_id <int>, geometry <POINT [°]>

The featurecla column that indicates the type of city. We want to print them selecting only state capitals.

We can print a column (attribute) in two ways, i.e. by specifying the column name in:

Single square brackets – a spatial object will be printed
Double square brackets (alternatively a dollar sign) – only the text will be printed

Code Collection 5.7 : Title for code collection

R Code 5.32 : Print cities as spatial objects

Code

cities["featurecla"]

#> Simple feature collection with 1287 features and 1 field
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -17.47508 ymin: -34.52953 xmax: 51.12333 ymax: 37.29042
#> Geodetic CRS:  WGS 84
#> # A tibble: 1,287 × 2
#>    featurecla                  geometry
#>    <chr>                    <POINT [°]>
#>  1 Admin-1 capital    (0.7890036 9.261)
#>  2 Admin-1 capital (0.9849965 8.557002)
#>  3 Admin-1 capital       (10.4167 33.4)
#>  4 Admin-1 capital     (8.971003 33.69)
#>  5 Admin-1 capital         (10.4667 33)
#>  6 Admin-1 capital      (10.2 36.86667)
#>  7 Admin-1 capital      (8.749999 36.5)
#>  8 Admin-1 capital   (8.716698 35.2167)
#>  9 Admin-1 capital   (9.500004 35.0167)
#> 10 Admin-1 capital   (9.383302 36.0833)
#> # ℹ 1,277 more rows

R Code 5.33 : Print cities as text

Code

glue::glue("######## Using double square brackets  #############")
utils::head(cities[["featurecla"]])

glue::glue("")
glue::glue("############### Using dollar sign ##################")
utils::head(cities$featurecla)

#> ######## Using double square brackets  #############
#> [1] "Admin-1 capital" "Admin-1 capital" "Admin-1 capital" "Admin-1 capital"
#> [5] "Admin-1 capital" "Admin-1 capital"
#> 
#> ############### Using dollar sign ##################
#> [1] "Admin-1 capital" "Admin-1 capital" "Admin-1 capital" "Admin-1 capital"
#> [5] "Admin-1 capital" "Admin-1 capital"

R Code 5.34 : Detect category for capital cities

Tab “cities object” shows us that this layer contains 1287 different cities. To find out what types of cities these are, we can use the table() function, which will summarize them.

Code

base::table(cities$featurecla)

#> 
#>        Admin-0 capital    Admin-0 capital alt        Admin-1 capital 
#>                     54                      6                    609 
#> Admin-1 region capital        Populated place 
#>                     19                    599

We are interested in Admin-0 capital and Admin-0 capital alt types because some countries have two capitals.

R Code 5.35 : Select and show first ten capital cities

Code

(capitals <- cities |> 
  dplyr::filter(featurecla == "Admin-0 capital" | featurecla == "Admin-0 capital alt") |> 
  dplyr::select(name))

#> Simple feature collection with 60 features and 1 field
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -17.47508 ymin: -33.91807 xmax: 47.51468 ymax: 36.80278
#> Geodetic CRS:  WGS 84
#> # A tibble: 60 × 2
#>    name                   geometry
#>    <chr>               <POINT [°]>
#>  1 Lobamba        (31.2 -26.46667)
#>  2 Bir Lehlou (-9.652522 26.11917)
#>  3 Kigali     (30.05859 -1.951644)
#>  4 Mbabane    (31.13333 -26.31665)
#>  5 Juba        (31.58003 4.829975)
#>  6 Dodoma        (35.75 -6.183306)
#>  7 Laayoune   (-13.20001 27.14998)
#>  8 Djibouti      (43.148 11.59501)
#>  9 Banjul      (-16.5917 13.45388)
#> 10 Porto-Novo  (2.616626 6.483311)
#> # ℹ 50 more rows

We are interested in Admin-0 capital and Admin-0 capital alt types because some countries have two capitals. I filtered the data using the | (OR) operator.

I am using the tidyverse approach with the native pipe instead of using base R as in the blog article.

Code Collection 5.8 : Plot Africa with rivers and capitals

base::plot()
ggplot2::geom_sf()

R Code 5.36 : Visualize African borders with rivers and capitals using base::plot()

Code

base::plot(sf::st_geometry(countries), main = "Africa", axes = TRUE, bgc = "deepskyblue",
     col = "burlywood")
base::plot(sf::st_geometry(rivers), add = TRUE, col = "blue")
base::plot(sf::st_geometry(capitals), add = TRUE, pch = 24, bg = "red", cex = 0.8)

In the last step, we prepare the final visualization. We can add a title (main argument), axes (axes argument) and change the background color (bgc argument) of the figure. We can also change the point symbol (pch argument), set its size (cex argument) and fill color (bg argument).

R Code 5.37 : Visualize African borders with rivers and capitals

Code

ggplot2::ggplot(data = countries) +
  ggplot2::geom_sf(fill = "burlywood") +
  ggplot2::geom_sf(data = rivers, color = "blue") +
  ggplot2::geom_sf(data = capitals, fill = "red", shape = 24) +
  ggplot2::theme(
    panel.background = ggplot2::element_rect(fill = "deepskyblue",
                                color = "deepskyblue")
  )

5.1 Import Geospatial Data

5.1.1 ESRI shapefile format

5.1.2 Conversion data sf <-> sp

5.2 Creating simple maps

5.2.1 Draw Oklahoma county boundaries

5.2.2 Inspect tpoint and tpath

5.2.3 Visualization of the Oklahoma tornado data (2016-2021)

5.3 Overlaying Vector Datasets

5.3.1 A first spatial join

5.3.2 Count tornados per county

5.3.3 Associate polygons with tornado counts

5.4 Save spatial geodata files

5.4.1 ESRI format

5.4.2 GeoPackage format

5.4.3 GeoJSON format

5.5 Choropleth Maps

5.5.1 Filling with colors (standard)

5.5.2 Mapping symbols

5.6 R-spatial tutorial

5.6.1 Getting data

5.6.2 Loading Shapefile (.shp)

5.6.3 Loading GeoPackage (.gpkg)

5.6.4 Loading GeoJSON (.geojson)

5.2.2 Inspect `tpoint` and `tpath`