Appendix A: Load and save vector data in R

Table of content for chapter 04

Chapter section list

A much more complete overview about loading and saving datasets of different formats is explained in How to load and save vector data in R, a weblog article in r-spatial. I will follow the tutorial in this article and comment it in difference to the GDSWR book.

A.1 Getting data

First of all I have to download the data to follow the tutorial.

R Code A.1 : Download data for the r-spatial tutorial

Code
##  run this code chunk only once (manually)
baseURL = here::here()
pb_create_folder(base::paste0(baseURL, "/data/annex-a"))


## download data into annex-a folder

url = "https://github.com/kadyb/sf_load_save/archive/refs/heads/main.zip"
utils::download.file(url, base::paste0(baseURL, "/data/annex-a/sf_load_save.zip"))
utils::unzip(base::paste0(baseURL, "/data/annex-a/sf_load_save.zip"),
             exdir = base::paste0(baseURL, "/data/annex-a"))

(For this R code chunk is no output available)

A.2 Loading Shapefile (.shp)

It consists of several files (e.g., .shp, .shx, .dbf, .prj) that must be present in the same folder. Wikipedia has more details about this format. But this file format has many limitations and shouldn’t be used anymore.

Instead of using the base::getwd() and base::setwd() commands I am using here::here(), because it will always locate the files relative to the project root.

R Code A.2 : Load Shapefile

Code
## get path of r-spatial data
r_spatial_path = base::paste0(here::here(), 
                 "/data/annex-a/sf_load_save-main/data/")

## load and display the shapefile
(
  countries = sf::read_sf(paste0(r_spatial_path, "countries/countries.shp"))
)
#> Simple feature collection with 52 features and 168 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -17.53604 ymin: -34.82195 xmax: 51.41704 ymax: 37.3452
#> Geodetic CRS:  WGS 84
#> # A tibble: 52 × 169
#>    featurecla   scalerank LABELRANK SOVEREIGNT SOV_A3 ADM0_DIF LEVEL TYPE  TLC  
#>    <chr>            <int>     <int> <chr>      <chr>     <int> <int> <chr> <chr>
#>  1 Admin-0 cou…         0         2 Ethiopia   ETH           0     2 Sove… 1    
#>  2 Admin-0 cou…         0         3 South Sud… SDS           0     2 Sove… 1    
#>  3 Admin-0 cou…         0         6 Somalia    SOM           0     2 Sove… 1    
#>  4 Admin-0 cou…         0         2 Kenya      KEN           0     2 Sove… 1    
#>  5 Admin-0 cou…         0         6 Malawi     MWI           0     2 Sove… 1    
#>  6 Admin-0 cou…         0         3 United Re… TZA           0     2 Sove… 1    
#>  7 Admin-0 cou…         0         5 Somaliland SOL           0     2 Sove… 1    
#>  8 Admin-0 cou…         0         3 Morocco    MAR           0     2 Sove… 1    
#>  9 Admin-0 cou…         0         7 Western S… SAH           0     2 Inde… 1    
#> 10 Admin-0 cou…         0         4 Republic … COG           0     2 Sove… 1    
#> # ℹ 42 more rows
#> # ℹ 160 more variables: ADMIN <chr>, ADM0_A3 <chr>, GEOU_DIF <int>,
#> #   GEOUNIT <chr>, GU_A3 <chr>, SU_DIF <int>, SUBUNIT <chr>, SU_A3 <chr>,
#> #   BRK_DIFF <int>, NAME <chr>, NAME_LONG <chr>, BRK_A3 <chr>, BRK_NAME <chr>,
#> #   BRK_GROUP <chr>, ABBREV <chr>, POSTAL <chr>, FORMAL_EN <chr>,
#> #   FORMAL_FR <chr>, NAME_CIAWF <chr>, NOTE_ADM0 <chr>, NOTE_BRK <chr>,
#> #   NAME_SORT <chr>, NAME_ALT <chr>, MAPCOLOR7 <int>, MAPCOLOR8 <int>, …

Again I will draw a map to get an idea about the content of the data. Again there are the two methods using ggplot2::geom_sf() or base::plot().

Code Collection A.1 : Plotting Tutorial Shape File

R Code A.3 : Plotting Tutorial Map with {ggplot2}

Code
ggplot2::ggplot(data = countries) +
  ggplot2::geom_sf(fill = NA) +
  ggplot2::theme_void()
Graph A.1: Tutorial map plotted with {ggplot2}

fill = NA makes the countries transparent.

(To get the same result as in the base R approach I used ggplot2::theme_void() to hide the coordinates which is shown in the original book example.)

R Code A.4 : Plotting Tutorial Map with base::plot()

Code
graphics::par(mar = c(0, 0, 0, 0))
base::plot(sf::st_geometry(countries))
Graph A.2: Tutorial map plotted with base::plot

The countries object has many (168) fields (attributes), but to start with we only need the geometry column. Instead of using base::plot(countries$geometry) we can obtain geometry by using the sf::st_geometry() function.

A.3 Loading GeoPackage (.gpkg)

The next dataset is rivers (linear geometry) saved in GeoPackage format. It is loaded in exactly the same way as the shapefile before.

However, this format can consist of multiple layers of different types. In this case, we must define which layer exactly we want to load. To check what layers are in the geopackage, use the sf::st_layers() function, and then specify it using the layer argument in sf::read_sf(). If the file only contains one layer, we don’t need to do this.

R Code A.5 : Check layers of GeoPackage file

Code
## get path of r-spatial data
r_spatial_path = base::paste0(here::here(), 
                 "/data/annex-a/sf_load_save-main/data/")

sf::st_layers(base::paste0(r_spatial_path, "rivers.gpkg"))
#> Driver: GPKG 
#> Available layers:
#>   layer_name     geometry_type features fields crs_name
#> 1     rivers Multi Line String      228     38   WGS 84

We can now load the rivers layers and display the metadata like in the shapefile example.

R Code A.6 : Load rivers layers and show metadata

Code
(
  rivers = sf::read_sf(paste0(r_spatial_path, "rivers.gpkg"), layer = "rivers")
)
#> Simple feature collection with 228 features and 38 fields
#> Geometry type: MULTILINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: -16.54233 ymin: -34.34378 xmax: 49.46094 ymax: 35.12311
#> Geodetic CRS:  WGS 84
#> # A tibble: 228 × 39
#>    dissolve  scalerank featurecla name  name_alt rivernum note  min_zoom name_en
#>    <chr>         <int> <chr>      <chr> <chr>       <int> <chr>    <dbl> <chr>  
#>  1 975River          9 River      <NA>  <NA>          975 <NA>       7.1 <NA>   
#>  2 976River          9 River      Rung… <NA>          976 <NA>       7.1 Rungwa 
#>  3 977River          9 River      Ligo… <NA>          977 <NA>       7.1 Ligonha
#>  4 978River          9 River      Dong… <NA>          978 <NA>       7.1 Dongwe 
#>  5 979River          9 River      Cuito <NA>          979 <NA>       7.1 Cuito  
#>  6 980Lake …         9 Lake Cent… <NA>  <NA>          980 <NA>       7.1 <NA>   
#>  7 980River          9 River      <NA>  <NA>          980 <NA>       7.1 <NA>   
#>  8 981River          9 River      Bagoé <NA>          981 <NA>       7.1 Bagoé  
#>  9 982River          9 River      Hade… <NA>          982 <NA>       7.1 Hadejia
#> 10 983River          9 River      Sous  <NA>          983 <NA>       7.1 Sous   
#> # ℹ 218 more rows
#> # ℹ 30 more variables: min_label <dbl>, ne_id <dbl>, label <chr>,
#> #   wikidataid <chr>, name_ar <chr>, name_bn <chr>, name_de <chr>,
#> #   name_es <chr>, name_fr <chr>, name_el <chr>, name_hi <chr>, name_hu <chr>,
#> #   name_id <chr>, name_it <chr>, name_ja <chr>, name_ko <chr>, name_nl <chr>,
#> #   name_pl <chr>, name_pt <chr>, name_ru <chr>, name_sv <chr>, name_tr <chr>,
#> #   name_vi <chr>, name_zh <chr>, name_fa <chr>, name_he <chr>, …

Visualizing the rivers data we we will plot this time the rivers against the background of country borders. Adding more layers to the visualization is done with the add = TRUE argument in the base::plot() function. Note that the order in which objects are added is important – the objects added last are displayed at the top. The col argument is used to set the color of the object.

Code Collection A.2 : Plotting rivers inside country borders

R Code A.7 : Visualizing rivers inside country borders using base::plot()

Code
base::plot(sf::st_geometry(countries))
base::plot(sf::st_geometry(rivers), add = TRUE, col = "blue")

R Code A.8 : Visualizing rivers inside country borders using ggplot2::geom_sf()

Code
ggplot2::ggplot(data = countries) +
  ggplot2::geom_sf(fill = NA) +
  ggplot2::geom_sf(data = rivers, color = "blue") +
  ggplot2::theme_void()

Plotting rivers inside countriy borders with {ggplot2}

Plotting rivers inside countriy borders with {ggplot2}

Adding more layers to the visualization is done in {ggplot2} as always with the + sign.

Note that the map is different (better) sized than the base::plot() example.

A.4 Loading GeoJSON (.geojson)

R Code A.9 : Loading GeoJSON cities file

Code
## get path of r-spatial data
r_spatial_path = base::paste0(here::here(), 
                 "/data/annex-a/sf_load_save-main/data/")

## load and display the GeoJSON file
(
  cities = sf::read_sf(paste0(r_spatial_path, "cities.geojson"))
)
#> Simple feature collection with 1287 features and 31 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -17.47508 ymin: -34.52953 xmax: 51.12333 ymax: 37.29042
#> Geodetic CRS:  WGS 84
#> # A tibble: 1,287 × 32
#>    scalerank natscale labelrank featurecla      name   namepar namealt nameascii
#>        <int>    <int>     <int> <chr>           <chr>  <chr>   <chr>   <chr>    
#>  1        10        1         8 Admin-1 capital Bassar <NA>    <NA>    Bassar   
#>  2        10        1         8 Admin-1 capital Sotou… <NA>    <NA>    Sotouboua
#>  3        10        1         7 Admin-1 capital Meden… <NA>    <NA>    Medenine 
#>  4        10        1         7 Admin-1 capital Kebili <NA>    <NA>    Kebili   
#>  5        10        1         7 Admin-1 capital Tatao… <NA>    <NA>    Tataouine
#>  6        10        1         7 Admin-1 capital L'Ari… <NA>    <NA>    L'Ariana 
#>  7        10        1         7 Admin-1 capital Jendo… <NA>    <NA>    Jendouba 
#>  8        10        1         7 Admin-1 capital Kasse… <NA>    <NA>    Kasserine
#>  9        10        1         7 Admin-1 capital Sdid … <NA>    <NA>    Sdid Bou…
#> 10        10        1         7 Admin-1 capital Silia… <NA>    <NA>    Siliana  
#> # ℹ 1,277 more rows
#> # ℹ 24 more variables: adm0cap <int>, capalt <int>, capin <chr>,
#> #   worldcity <int>, megacity <int>, sov0name <chr>, sov_a3 <chr>,
#> #   adm0name <chr>, adm0_a3 <chr>, adm1name <chr>, iso_a2 <chr>, note <chr>,
#> #   latitude <dbl>, longitude <dbl>, pop_max <int>, pop_min <int>,
#> #   pop_other <int>, rank_max <int>, rank_min <int>, meganame <chr>,
#> #   ls_name <chr>, min_zoom <dbl>, ne_id <int>, geometry <POINT [°]>

The featurecla column that indicates the type of city. We want to print them selecting only state capitals.

We can print a column (attribute) in two ways, i.e. by specifying the column name in:

  • Single square brackets – a spatial object will be printed
  • Double square brackets (alternatively a dollar sign) – only the text will be printed

Code Collection A.3 : Print cities or capitals

R Code A.10 : Print cities as spatial objects

Code
cities["featurecla"]
#> Simple feature collection with 1287 features and 1 field
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -17.47508 ymin: -34.52953 xmax: 51.12333 ymax: 37.29042
#> Geodetic CRS:  WGS 84
#> # A tibble: 1,287 × 2
#>    featurecla                  geometry
#>    <chr>                    <POINT [°]>
#>  1 Admin-1 capital    (0.7890036 9.261)
#>  2 Admin-1 capital (0.9849965 8.557002)
#>  3 Admin-1 capital       (10.4167 33.4)
#>  4 Admin-1 capital     (8.971003 33.69)
#>  5 Admin-1 capital         (10.4667 33)
#>  6 Admin-1 capital      (10.2 36.86667)
#>  7 Admin-1 capital      (8.749999 36.5)
#>  8 Admin-1 capital   (8.716698 35.2167)
#>  9 Admin-1 capital   (9.500004 35.0167)
#> 10 Admin-1 capital   (9.383302 36.0833)
#> # ℹ 1,277 more rows

R Code A.11 : Print cities as text

Code
glue::glue("######## Using double square brackets  #############")
utils::head(cities[["featurecla"]])

glue::glue("")
glue::glue("############### Using dollar sign ##################")
utils::head(cities$featurecla)
#> ######## Using double square brackets  #############
#> [1] "Admin-1 capital" "Admin-1 capital" "Admin-1 capital" "Admin-1 capital"
#> [5] "Admin-1 capital" "Admin-1 capital"
#> 
#> ############### Using dollar sign ##################
#> [1] "Admin-1 capital" "Admin-1 capital" "Admin-1 capital" "Admin-1 capital"
#> [5] "Admin-1 capital" "Admin-1 capital"

R Code A.12 : Detect category for capital cities

Tab “cities object” shows us that this layer contains 1287 different cities. To find out what types of cities these are, we can use the base::table() function, which will summarize them.

Code
base::table(cities$featurecla)
#> 
#>        Admin-0 capital    Admin-0 capital alt        Admin-1 capital 
#>                     54                      6                    609 
#> Admin-1 region capital        Populated place 
#>                     19                    599

We are not only interested in Admin-0 capital but also in Admin-0 capital alt because some countries have two capitals, e.g., an alternative capital.

R Code A.13 : Select and show first ten capital cities

Code
(
  capitals <- cities |> 
      dplyr::filter(featurecla == "Admin-0 capital" | 
                    featurecla == "Admin-0 capital alt") |> 
      dplyr::select(name)
)
#> Simple feature collection with 60 features and 1 field
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -17.47508 ymin: -33.91807 xmax: 47.51468 ymax: 36.80278
#> Geodetic CRS:  WGS 84
#> # A tibble: 60 × 2
#>    name                   geometry
#>    <chr>               <POINT [°]>
#>  1 Lobamba        (31.2 -26.46667)
#>  2 Bir Lehlou (-9.652522 26.11917)
#>  3 Kigali     (30.05859 -1.951644)
#>  4 Mbabane    (31.13333 -26.31665)
#>  5 Juba        (31.58003 4.829975)
#>  6 Dodoma        (35.75 -6.183306)
#>  7 Laayoune   (-13.20001 27.14998)
#>  8 Djibouti      (43.148 11.59501)
#>  9 Banjul      (-16.5917 13.45388)
#> 10 Porto-Novo  (2.616626 6.483311)
#> # ℹ 50 more rows

We are interested in Admin-0 capital and Admin-0 capital alt types because some countries have two capitals. I filtered the data using the | (OR) operator.

I am using the tidyverse approach with the native pipe instead of using base R as in the blog article.

Code Collection A.4 : Plot Africa with rivers and capitals

In the last step, we prepare the final visualization.

R Code A.14 : Visualize African borders with rivers and capitals using base::plot()

Code
base::plot(sf::st_geometry(countries), main = "Africa", axes = TRUE, bgc = "deepskyblue",
     col = "burlywood")
base::plot(sf::st_geometry(rivers), add = TRUE, col = "blue")
base::plot(sf::st_geometry(capitals), add = TRUE, pch = 24, bg = "red", cex = 0.8)


We can add a title (main argument), axes (axes argument) and change the background color (bgc argument) of the figure. We can also change the point symbol (pch argument), set its size (cex argument) and fill color (bg argument).

R Code A.15 : Visualize African borders with rivers and capitals

Code
ggplot2::ggplot(data = countries) +
  ggplot2::geom_sf(fill = "burlywood") +
  ggplot2::geom_sf(data = rivers, color = "blue") +
  ggplot2::geom_sf(data = capitals, fill = "red", shape = 24) +
  ggplot2::theme(
    panel.background = ggplot2::element_rect(fill = "deepskyblue",
                                color = "deepskyblue")
  )

Africa

Africa

This is essential the same map, but this time generated with {gplot2}.