Appendix B: Creating Maps

B.1 Introduction

I am following here the article (FelixAnalytix 2023) and video (Felix Analytix 2023).

Article and video explain how to

create a choropleth world map using World Bank data,
zoom on any specific region of the world using Openstreetmap, and
improve the scale and color palette of your maps.

As an example Europe will be visualized. But you can reuse the R code shared in this article on any region of your choice.

Packages used in this article

Besides packages from the {tidyverse} the article uses

the {rnaturalearth} R package to download raw data from the Natural Earth Project,
the library {sf} to work with simple features in R,
and the library {wbstats} for the World Bank API.

B.2 Download world data

For doanloading world data for the map, we will call the rnaturalearth::ne_countries() function with scale as “medium” (we don’t need detailed geographic data) and the returnclass will be “sf” (we want an sf object to work with the {sf} package).

We will immediately Antartica remove from the dataset.

R Code B.1 : Download world data and remove Antartica

Code

world <- rnaturalearth::ne_countries(
        scale = "medium",
        returnclass = "sf"
    ) |> 
    dplyr::filter(admin != "Antarctica")

(For this R code chunk is no output available)

B.3 Changing world map projection

Changing the world map projection can easily be done using the sf::st_transform() function. Here we decide to use the mollweide projection.

The Mollweide is a pseudocylindrical projection in which the equator is represented as a straight horizontal line perpendicular to a central meridian that is one-half the equator’s length. The other parallels compress near the poles, while the other meridians are equally spaced at the equator. The meridians at 90 degrees east and west form a perfect circle, and the whole earth is depicted in a proportional 2:1 ellipse. The proportion of the area of the ellipse between any given parallel and the equator is the same as the proportion of the area on the globe between that parallel and the equator, but at the expense of shape distortion, which is significant at the perimeter of the ellipse… (Wikipedia)

There are numerous different projection explained in PROJ (Evenden et al. 2024) a generic coordinate transformation software that transforms geospatial coordinates from one coordinate reference system (CRS) to another. This includes cartographic projections as well as geodetic transformations.

Consulting the PROJ manual (PROJ contributors 2024) is vital to get the code that sf::st_transform() needs as parameter for the transformation. This code for the Mollweide projection is +proj=moll.

R Code B.2 : Change world map projection to Mollweide projection

Code

target_crs = "+proj=moll"
world_moll <- world |> 
    sf::st_transform(crs = target_crs)

B.4 Getting World Bank data

How to use the {wbstats} package is explained well in the package vignette. To get data for our map example we will download the indicator “employment (% of the labor force)”. It is an arbitrary indicator chosen from a list of all the indicators available in the World Bank API generated with the wbstats::wb_cachelist() function. We will filter this indicator using again the dplyr::filter() function.

For the most recent information on available data from the World Bank API wbstats::wb_cache() downloads an updated version of the information stored in wb_cachelist. wb_cachelist is simply a saved return of wb_cache(lang = “en”). To use this updated information in wb_search() or wb_data(), set the cache parameter to the saved list returned from wb_cache(). It is always a good idea to use this updated information to insure that you have access to the latest available information, such as newly added indicators or data sources.

Download “Unemployment” indicator

To download data you need to follow several steps:

Instead of using the information stored in wbstats::wb_cachelist() update data with wbstats::wb_cache()
Use wbstats::wb_search() to find the parameter(s) for the indicator(s) your are looking for.
Filter the search results to keep only the indicator(s) you are interested in.
Download World Bank the appropriate indicator data with wbstats::wb_data().

Code Collection B.1 : Download world bank data

R Code B.3 : Update world bank data

The following chunk takes several seconds. Run this code chunk manually if you need an updated version.

Code

pb_create_folder("data/annex-b")

new_cache <- wbstats::wb_cache(lang = "en")
pb_save_data_file("annex-b", new_cache, "new_cache.RDS")

(For this R code chunk is no output available)

The expression in the parentheses lang = "en" wouldn’t be necessary because the default language is English.

R Code B.4 : Compare cachelist and new_cache World Bank datasets

Code

file_to_check <- "data/annex-b/new_cache.RDS"

if (file.exists(file_to_check)) {
    new_cache <- base::readRDS(file_to_check)
    base::cat("********** Updated World Bank Data Set *********\n")
    utils::str(new_cache, max.level = 1)
} else {
    base::cat("********** No new dataset available *********\n")
    base::cat("********** Run the previous 'Update data' tab manually *********\n")
}

base::cat("\n********** World Bank Data Set in wb_cachelist *********\n")
utils::str(wbstats::wb_cachelist, max.level = 1)

#> ********** Updated World Bank Data Set *********
#> List of 8
#>  $ countries    : tibble [296 × 18] (S3: tbl_df/tbl/data.frame)
#>  $ indicators   : tibble [23,678 × 8] (S3: tbl_df/tbl/data.frame)
#>  $ sources      : tibble [69 × 9] (S3: tbl_df/tbl/data.frame)
#>  $ topics       : tibble [21 × 3] (S3: tbl_df/tbl/data.frame)
#>  $ regions      : tibble [44 × 4] (S3: tbl_df/tbl/data.frame)
#>  $ income_levels: tibble [7 × 3] (S3: tbl_df/tbl/data.frame)
#>  $ lending_types: tibble [4 × 3] (S3: tbl_df/tbl/data.frame)
#>  $ languages    : tibble [23 × 3] (S3: tbl_df/tbl/data.frame)
#> 
#> ********** World Bank Data Set in wb_cachelist *********
#> List of 8
#>  $ countries    : tibble [304 × 18] (S3: tbl_df/tbl/data.frame)
#>  $ indicators   : tibble [16,649 × 8] (S3: tbl_df/tbl/data.frame)
#>  $ sources      : tibble [63 × 9] (S3: tbl_df/tbl/data.frame)
#>  $ topics       : tibble [21 × 3] (S3: tbl_df/tbl/data.frame)
#>  $ regions      : tibble [48 × 4] (S3: tbl_df/tbl/data.frame)
#>  $ income_levels: tibble [7 × 3] (S3: tbl_df/tbl/data.frame)
#>  $ lending_types: tibble [4 × 3] (S3: tbl_df/tbl/data.frame)
#>  $ languages    : tibble [23 × 3] (S3: tbl_df/tbl/data.frame)

There are some notable differences in my datasets between the release of the {wbstats} package (2020-12-04) and my update four year later (2024-12-31):

The number of countries and regions is in the updated version smaller than in the wbstats::wb_cachelist version.
The numbers of indicators has in the updated list risen from 16,649 to 23,678.
The number of sources has increased slightly from 63 to 69.
The other four data frames didn’t change.

Note

The numbers are the result of the differences. For instance the difference of four regions is the result of two more regions in the updates version (“Africa Eastern and Southern” and “Africa Western and Central”) contra six less regions (“Andean Region”, “Latin America and the Caribbean”, “Central America”, “Non-resource rich Sub-Saharan Africa countries, of which landlocked”, “Resource rich Sub-Saharan Africa countries, of which oil exporters”, “Southern Cone”)

The situation is even more complicated in the country case: There are in sum 64 differences in various columns, some of them just different names (Czech Republik versus Czechia). These 64 distinctions consist of more and less countries in the two datasets compared and boil down to the number of eight rows. But behind these eight rows lurk 64 mismatches.

The four data frames with the same number of rows (“topics”, “income_levels”, “lending_types”, “languages) are not the result of different mismachtes. These four tibbles are actually identical.

R Code B.5 : Show World Bank datasets

Code

file_to_check <- "data/annex-b/new_cache.RDS"


if (file.exists(file_to_check)) {
    my_cache <- base::readRDS(file_to_check)
} else {
    my_cache <- wbstats::wb_cachelist
}

base::cat("********** List details of all eight tibbles *********\n")

#> ********** List details of all eight tibbles *********

Code

my_cache

#> $countries
#> # A tibble: 296 × 18
#>    iso3c iso2c country capital_city longitude latitude region_iso3c region_iso2c
#>    <chr> <chr> <chr>   <chr>            <dbl>    <dbl> <chr>        <chr>       
#>  1 ABW   AW    Aruba   Oranjestad      -70.0     12.5  LCN          ZJ          
#>  2 AFE   ZH    Africa… <NA>             NA       NA    <NA>         <NA>        
#>  3 AFG   AF    Afghan… Kabul            69.2     34.5  SAS          8S          
#>  4 AFR   A9    Africa  <NA>             NA       NA    <NA>         <NA>        
#>  5 AFW   ZI    Africa… <NA>             NA       NA    <NA>         <NA>        
#>  6 AGO   AO    Angola  Luanda           13.2     -8.81 SSF          ZG          
#>  7 ALB   AL    Albania Tirane           19.8     41.3  ECS          Z7          
#>  8 AND   AD    Andorra Andorra la …      1.52    42.5  ECS          Z7          
#>  9 ARB   1A    Arab W… <NA>             NA       NA    <NA>         <NA>        
#> 10 ARE   AE    United… Abu Dhabi        54.4     24.5  MEA          ZQ          
#> # ℹ 286 more rows
#> # ℹ 10 more variables: region <chr>, admin_region_iso3c <chr>,
#> #   admin_region_iso2c <chr>, admin_region <chr>, income_level_iso3c <chr>,
#> #   income_level_iso2c <chr>, income_level <chr>, lending_type_iso3c <chr>,
#> #   lending_type_iso2c <chr>, lending_type <chr>
#> 
#> $indicators
#> # A tibble: 23,678 × 8
#>    indicator_id       indicator unit  indicator_desc source_org topics source_id
#>    <chr>              <chr>     <lgl> <chr>          <chr>      <list>     <dbl>
#>  1 1.0.HCount.1.90usd Poverty … NA    The poverty h… LAC Equit… <df>          37
#>  2 1.0.HCount.2.5usd  Poverty … NA    The poverty h… LAC Equit… <df>          37
#>  3 1.0.HCount.Mid10t… Middle C… NA    The poverty h… LAC Equit… <df>          37
#>  4 1.0.HCount.Ofcl    Official… NA    The poverty h… LAC Equit… <df>          37
#>  5 1.0.HCount.Poor4u… Poverty … NA    The poverty h… LAC Equit… <df>          37
#>  6 1.0.HCount.Vul4to… Vulnerab… NA    The poverty h… LAC Equit… <df>          37
#>  7 1.0.PGap.1.90usd   Poverty … NA    The poverty g… LAC Equit… <df>          37
#>  8 1.0.PGap.2.5usd    Poverty … NA    The poverty g… LAC Equit… <df>          37
#>  9 1.0.PGap.Poor4uds  Poverty … NA    The poverty g… LAC Equit… <df>          37
#> 10 1.0.PSev.1.90usd   Poverty … NA    The poverty s… LAC Equit… <df>          37
#> # ℹ 23,668 more rows
#> # ℹ 1 more variable: source <chr>
#> 
#> $sources
#> # A tibble: 69 × 9
#>    source_id last_updated source              source_code source_desc source_url
#>        <dbl> <date>       <chr>               <chr>       <lgl>       <lgl>     
#>  1         1 2021-08-18   Doing Business      DBS         NA          NA        
#>  2         2 2024-12-16   World Development … WDI         NA          NA        
#>  3         3 2024-11-05   Worldwide Governan… WGI         NA          NA        
#>  4         5 2016-03-21   Subnational Malnut… SNM         NA          NA        
#>  5         6 2024-12-03   International Debt… IDS         NA          NA        
#>  6        11 2013-02-22   Africa Development… ADI         NA          NA        
#>  7        12 2024-06-25   Education Statisti… EDS         NA          NA        
#>  8        13 2022-03-25   Enterprise Surveys  ESY         NA          NA        
#>  9        14 2024-12-17   Gender Statistics   GDS         NA          NA        
#> 10        15 2024-12-19   Global Economic Mo… GEM         NA          NA        
#> # ℹ 59 more rows
#> # ℹ 3 more variables: data_available <lgl>, metadata_available <lgl>,
#> #   concepts <dbl>
#> 
#> $topics
#> # A tibble: 21 × 3
#>    topic_id topic                           topic_desc                          
#>       <dbl> <chr>                           <chr>                               
#>  1        1 Agriculture & Rural Development "For the 70 percent of the world's …
#>  2        2 Aid Effectiveness               "Aid effectiveness is the impact th…
#>  3        3 Economy & Growth                "Economic growth is central to econ…
#>  4        4 Education                       "Education is one of the most power…
#>  5        5 Energy & Mining                 "The world economy needs ever-incre…
#>  6        6 Environment                     "Natural and man-made environmental…
#>  7        7 Financial Sector                "An economy's financial markets are…
#>  8        8 Health                          "Improving health is central to the…
#>  9        9 Infrastructure                  "Infrastructure helps determine the…
#> 10       10 Social Protection & Labor       "The supply of labor available in a…
#> # ℹ 11 more rows
#> 
#> $regions
#> # A tibble: 44 × 4
#>    region_id iso3c iso2c region                                              
#>        <dbl> <chr> <chr> <chr>                                               
#>  1        NA AFE   ZH    Africa Eastern and Southern                         
#>  2        NA AFR   A9    Africa                                              
#>  3        NA AFW   ZI    Africa Western and Central                          
#>  4        NA ARB   1A    Arab World                                          
#>  5        NA CAA   C9    Sub-Saharan Africa (IFC classification)             
#>  6        NA CEA   C4    East Asia and the Pacific (IFC classification)      
#>  7        NA CEB   B8    Central Europe and the Baltics                      
#>  8        NA CEU   C5    Europe and Central Asia (IFC classification)        
#>  9        NA CLA   C6    Latin America and the Caribbean (IFC classification)
#> 10        NA CME   C7    Middle East and North Africa (IFC classification)   
#> # ℹ 34 more rows
#> 
#> $income_levels
#> # A tibble: 7 × 3
#>   iso3c iso2c income_level       
#>   <chr> <chr> <chr>              
#> 1 HIC   XD    High income        
#> 2 INX   XY    Not classified     
#> 3 LIC   XM    Low income         
#> 4 LMC   XN    Lower middle income
#> 5 LMY   XO    Low & middle income
#> 6 MIC   XP    Middle income      
#> 7 UMC   XT    Upper middle income
#> 
#> $lending_types
#> # A tibble: 4 × 3
#>   iso3c iso2c lending_type  
#>   <chr> <chr> <chr>         
#> 1 IBD   XF    IBRD          
#> 2 IDB   XH    Blend         
#> 3 IDX   XI    IDA           
#> 4 LNX   XX    Not classified
#> 
#> $languages
#> # A tibble: 23 × 3
#>    iso2  lang       lang_native     
#>    <chr> <chr>      <chr>           
#>  1 en    English    English         
#>  2 es    Spanish    Español         
#>  3 fr    French     Français        
#>  4 ar    Arabic     عربي            
#>  5 zh    Chinese    中文            
#>  6 bg    Bulgarian  Български       
#>  7 de    German     Deutsch         
#>  8 hi    Hindi      हिंदी            
#>  9 id    Indonesian Bahasa indonesia
#> 10 ja    Japanese   日本語          
#> # ℹ 13 more rows

R Code B.6 : Search World Bank data

Code

file_to_check <- "data/annex-b/new_cache.RDS"

if (file.exists(file_to_check)) {
    my_cache <- base::readRDS(file_to_check)
} else {
    my_cache <- wbstats::wb_cachelist
}

employment_inds<- wbstats::wb_search(
    pattern = "Unemployment, total",
    extra = TRUE,
    cache = my_cache)

dplyr::glimpse(employment_inds)

#> Rows: 2
#> Columns: 8
#> $ indicator_id   <chr> "SL.UEM.TOTL.NE.ZS", "SL.UEM.TOTL.ZS"
#> $ indicator      <chr> "Unemployment, total (% of total labor force) (national…
#> $ unit           <lgl> NA, NA
#> $ indicator_desc <chr> "Unemployment refers to the share of the labor force th…
#> $ source_org     <chr> "International Labour Organization. “Labour Force Stati…
#> $ topics         <list> [<data.frame[1 x 2]>], [<data.frame[2 x 2]>]
#> $ source_id      <dbl> 2, 2
#> $ source         <chr> "World Development Indicators", "World Development Indi…

Searching for “Unemployment (% of the labor force)” returned an empty data frame. The new (updated) search string is “Unemployment, total”. This search returns two results. The first row uses for the percentage of unemployment national statistics and can therefore vary between different countries. The second row uses the modeled estimate of the ILO.

I am going to use the ILO unemployment indicator SL.UEM.TOTL.ZS as used in the blog example by Felix Analytix.

To provide the source for creating graphics I need the content of the “source_orgcolumn. By the default the parameterextra = FALSEprovides only indicator ID, the short name and the description of the indicator. To get all columns of the cache parameters's indicator we need to setextra = TRUE`.

Important

The larger string “Unemployment, total (% of total labor force)” did not work for me and returned an empty data frame.

One has to have already specific knowledge to get not too many rows returned. If you are using just “Unemployment” then you have to skim through 93 observations to find your indicator.

R Code B.7 : Filter World Bank data

Code

ind <- "SL.UEM.TOTL.ZS"

indicator_info <- employment_inds |> 
    dplyr::filter(indicator_id == ind)

indicator_info$indicator
indicator_info$indicator_desc
indicator_info$source_org

#> [1] "Unemployment, total (% of total labor force) (modeled ILO estimate)"
#> [1] "Unemployment refers to the share of the labor force that is without work but available for and seeking employment."
#> [1] "International Labour Organization. “ILO Modelled Estimates and Projections database (ILOEST)” ILOSTAT. Accessed June 18, 2024. https://ilostat.ilo.org/data/."

R Code B.8 : Numbered R Code Title

Code

file_to_check <- "data/annex-b/new_cache.RDS"

if (file.exists(file_to_check)) {
    my_cache <- base::readRDS(file_to_check)
} else {
    my_cache <- wbstats::wb_cachelist
}

df <- wbstats::wb_data(
    indicator = ind,
    start_date = 2023,
    end_date = 2023,
    cache = my_cache
    )

dplyr::glimpse(df)

#> Rows: 217
#> Columns: 9
#> $ iso2c          <chr> "AW", "AF", "AO", "AL", "AD", "AE", "AR", "AM", "AS", "…
#> $ iso3c          <chr> "ABW", "AFG", "AGO", "ALB", "AND", "ARE", "ARG", "ARM",…
#> $ country        <chr> "Aruba", "Afghanistan", "Angola", "Albania", "Andorra",…
#> $ date           <dbl> 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2…
#> $ SL.UEM.TOTL.ZS <dbl> NA, 14.386, 14.620, 11.584, NA, 2.706, 6.178, 8.586, NA…
#> $ unit           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ obs_status     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ footnote       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ last_updated   <date> 2024-12-16, 2024-12-16, 2024-12-16, 2024-12-16, 2024-1…

Looking at https://data.worldbank.org/indicator/SL.UEM.TOTL.ZS, I learned the latest data are from 2023.

B.5 Visualizing data distribution

With a very quick data exploratory analysis we will see the distribution of our indicator.

Code Collection B.2 : Visualize data distribution

Data as-is
Data transformed

R Code B.9 : Show the data distribution without transformation

Code

df |> 
    dplyr::filter(!is.na(SL.UEM.TOTL.ZS)) |> 
    ggplot2::ggplot(ggplot2::aes(SL.UEM.TOTL.ZS)) +
    ggplot2::geom_histogram(bins = 100)

Graph B.2: Data distribution of unemployment rate

Most of the data show 3-6% unemployment rate. This can be a bit annoying because we will hardly see the difference in the color palette for these specific percentages.

I will therefore use a square root transformation on the percentage values so that the color palette will be better distributed (showed in the next tab).

R Code B.10 : Show the data distribution with a square root transformation

Code

df |> 
    dplyr::filter(!is.na(SL.UEM.TOTL.ZS)) |> 
    ggplot2::ggplot(ggplot2::aes(SL.UEM.TOTL.ZS)) +
    ggplot2::geom_histogram(bins = 100) +
    ggplot2::scale_x_sqrt()

Graph B.3: Data distribution of unemployment rate with square root transformation

With the square root transformation the data are somewhat better distributed.

B.6 Plotting a world map

We are now prepared to join our World Bank data to the previously generated geographic world_moll sf object (see: R Code B.2).

Watch out! B.1: “iso_a3” does not contain all ISO 3166 Country Codes

The blog entry recommends to join the World Bank data object df by “iso3c” code which represents the alpha-3 ISO 3166 Country Codes). But it does it with “iso_a3” code from the naturalearthdata object “wordl_moll”. But it turned out that in my case “iso_a3” does not contain correct data for all countries. For instance is has -99 for France and Norway instead of the valide codes of “FRA” and “NOR”. The result is that these tw3o countries are greyed out in the colored map reprsentation of the unemployment rate.

Instead I am going to use “adm0_a3”. I could also use “iso_a3_eh” as a Brave-KI research result says:

The iso_a3_eh field in Natural Earth data is used to provide a more flexible interpretation of ISO 3-character country codes. The “eh” suffix stands for “okay, but not great” in American English slang, indicating that while the code is not strictly accurate according to ISO standards, it is close enough for most practical purposes. This field was added to help users who are looking for a simpler, albeit less precise, match for their data visualization or analysis needs. For instance, France’s iso_a2 field is set to -99 to indicate a strict ISO match is not available, while iso_a2_eh is set to FR. Similarly, Norway’s iso_a3 might be -99, but iso_a3_eh would be NOR.

See also the discussion in the GitHub thread especially the following comment/answer of the developer.

Using the {ggplot2} R package, we will create a world map with the function ggplot2::geom_sf(). We will also make the square root transformation to see more clearly the percentage differences.

R Code B.11 : Unemployment rate in 2023

Code

world_moll |>  
    dplyr::left_join(df, by = c("adm0_a3" = "iso3c")) |> 
    ggplot2::ggplot() + 
    ggplot2::geom_sf(ggplot2::aes(fill = SL.UEM.TOTL.ZS)) + 
    ggplot2::scale_fill_viridis_c(
        trans = "sqrt", 
        labels = scales::percent_format(scale = 1), 
        breaks = c(1:5)^2
        ) + 
    
    ## fix labels if needed: https://stackoverflow.com/a/60733863 
    ## not necessary, only for Linux/Fedora systems
    # ggplot2::scale_x_continuous(labels = function(x) paste(x, '\udoBo' , "W")) +
    # ggplot2::scale_y_continuous(labels = function(x) paste(x, '\uboBo', "N" )) +
    
    ggplot2::theme_bw() + 
    ggplot2::theme(
        panel.background = ggplot2::element_rect(fill = "aliceblue")
        ) + 
    ggplot2::labs (
        title = paste(unique(df$date), indicator_info$indicator), 
        fill = NULL, 
        caption = paste0("Source: ", 
            stringr::str_split_i(indicator_info$source_org, "\\. ", 1),
            ".\n",
            stringr::str_split_i(indicator_info$source_org, "\\. ", 2),
             ".\n",
            stringr::str_split_i(indicator_info$source_org, "\\. ", 3),
            ". ",
             stringr::str_split_i(indicator_info$source_org, "\\. ", -1)
        )
    )

Graph B.4: World map of 2023 unemployment, data source: World Bank

The two data frames are joined by the ISO-alpha3 codes that have different names in the two datasets.

The scale_fill_viridis() function is designed to be perceived by viewers with common forms of colour blindness. We have used the ggplot2::scale_fill_viridis_c() function, and applied the “trans” argument to specify the “sqrt” transformation. We have also adjusted the “breaks” argument accordingly.

Without splitting the caption text it would be too long to fit into one line.

B.7 Zooming into a specific area

To zoom on a specific area, you need to know its coordinates, i.e., its bounding box. You can’t determine the exact coordinates from the world map because the graph is too coarse. OpenStreetMap has a nice tool to get the coordinates of a specific bounding box.

OpenStreetMap export view: On the left panel with 25% of the page width you see the coordinates of the bounding box from the right side, overlaying a world map (75% of the page width). The coordinates are written in a gray box on the left panes with four figures in little white fields situated top, left, bottom and right. The figures are (starting from the top clockwise): 71.5, 41.5, 34.31, -25.5 — Graph B.5: Screenshot: Creating bounding box with OpenStreetMap.

The coordinates for my bounding box which I have drawn manually with the cursor on the world map are written in the gray box on the left pane: Starting from the top clockwise: 71.5, 41.5, 34.31, -25.5 they represent latitude and longitude and can be used in the R code. (Before I could draw the bounding box I had to click the link under the gray bounding box in the left pane saying “Choose another area manually”¹.)

Note

I didn’t know how to get maps from continents, such as Europe, for a long time. It was relatively easy for Africa to sort by country names. However, this strategy did not work for Europe, where, for instance, France and Great Britain still have several overseas territories (see: What is a country?).

Now, I have learned that the trick is to zoom in and clip the relevant part of the world map downloaded via Natural Earth (see also Making Maps in R, chapter 23 of the open source book “Working in R” (Soulé, Halbritter, and Telford 2024).

R Code B.12 : Zooming into a area manually specified

Code

window_coord <- sf::st_sfc( 
    sf::st_point(c(-24.5, 34.31)),  # left, bottom 
    sf::st_point(c(41.5, 71.5)),    # right, top 
    crs = 4326 # the EPSG identifier of WGS84 (used in GPS) 
    ) 
    
window_coord_sf <- window_coord |>  
    sf::st_transform(crs = target_crs) |>  
    sf::st_coordinates() # retrieve coordinates

R Code B.13 : Show map of Europe (first version)

Code

world_moll |>  
    dplyr::left_join(df, by = c("adm0_a3" = "iso3c")) |> 
    ggplot2::ggplot() + 
    ggplot2::geom_sf(ggplot2::aes(fill = SL.UEM.TOTL.ZS)) + 

    ## window of the map 
    ggplot2::coord_sf(
        xlim = window_coord_sf[, "X"], 
        ylim = window_coord_sf[, "Y"], 
        expand = FALSE 
        ) +
    
    ggplot2::scale_fill_viridis_c(
    trans = "sqrt", 
    labels = scales::percent_format(scale = 1), 
    breaks = c(1:5)^2
    ) + 

    ## fix labels not needed: https://stackoverflow.com/a/60733863 

    ggplot2::theme_bw() + 
    ggplot2::theme(
        panel.background = ggplot2::element_rect(fill = "aliceblue"),
        aspect.ratio = 3/4
        ) + 
    ggplot2::labs (
        title = paste(unique(df$date), indicator_info$indicator), 
        fill = NULL, 
        caption = paste0("Source: ", 
            stringr::str_split_i(indicator_info$source_org, "\\. ", 1),
            ".\n",
            stringr::str_split_i(indicator_info$source_org, "\\. ", 2),
             ".\n",
            stringr::str_split_i(indicator_info$source_org, "\\. ", 3),
            ". ",
             stringr::str_split_i(indicator_info$source_org, "\\. ", -1)
        )
    )

Graph B.6: Rate of unemployment in Europe 2023, data source: World Bank

To get a graphic with more width I have changed the aspect ration slightly form 1 to aspect.ratio = 3/4.

There are no data (NA) for the unemployment ratio available for Ukraine and Kosovo. These two countries are therefore colored grey.

B.8 Changing scale and color distribution

In the 2020 dataset used by the blog article I am referring here, there was the unemployment rate quite different. Especially the very low in Greenland and the very high in the African countries distorted the visual outcome. The extreme yellow and dark blue color with the highest and lowest percentage was not within the European countries.

We will therefore remove the African countries and Greenland. We will also reduce our data to European countries only. So now the extreme yellow and dark blue color will be the highest and lowest percentage within the European countries. We will also remove the square root transformation (the square root transformation can be misleading for some audience).

R Code B.14 : Show map of Europe (second version)

Code

world_moll |>  
    dplyr::left_join(df, by = c("adm0_a3" = "iso3c")) |> 
    dplyr::filter(continent == "Europe") |> 
    ggplot2::ggplot() + 
    ggplot2::geom_sf(ggplot2::aes(fill = SL.UEM.TOTL.ZS)) + 

    ## window of the map 
    ggplot2::coord_sf(
        xlim = window_coord_sf[, "X"], 
        ylim = window_coord_sf[, "Y"], 
        expand = FALSE 
        ) +
    
    ggplot2::scale_fill_viridis_c(
    # trans = "sqrt", 
    labels = scales::percent_format(scale = 1) #, 
    # breaks = c(1:5)^2
    ) + 

    ## fix labels not needed: https://stackoverflow.com/a/60733863 

    ggplot2::theme_bw() + 
    ggplot2::theme(
        panel.background = ggplot2::element_rect(fill = "aliceblue"),
        aspect.ratio = 3/4
        ) + 
    ggplot2::labs (
        title = paste(unique(df$date), indicator_info$indicator), 
        fill = NULL, 
        caption = paste0("Source: ", 
            stringr::str_split_i(indicator_info$source_org, "\\. ", 1),
            ".\n",
            stringr::str_split_i(indicator_info$source_org, "\\. ", 2),
             ".\n",
            stringr::str_split_i(indicator_info$source_org, "\\. ", 3),
            ". ",
             stringr::str_split_i(indicator_info$source_org, "\\. ", -1)
        )
    )

Graph B.7: Rate of unemployment with 2023 European data only, data source: World Bank

The result surprised me. The graphic of the unemployment rate for Europe is now quite different. As the article by (FelixAnalytix 2023) correctly wrote:

… we see here a comparison of unemployment between European countries only, while in the previous plot (Graph B.6 in my article) we were using global data on unemployment. Spain and Baltic countries are popping up much more, while it was less the case when using a scale based on global world unemployment.

References

Evenden, Gerald I., Even Rouault, Frank Warmerdam, Kristian Evers, Thomas Knudsen, Howard Butler, Mike W. Taves, et al. 2024. PROJ. Zenodo. https://doi.org/10.5281/ZENODO.5884394.

Felix Analytix. 2023. “Using r to Map Europe, Asia, Africa, America, Etc.” January. https://www.youtube.com/watch?v=KZcKv3HgzII.

FelixAnalytix. 2023. “How to Map ANY Region of the World Using r Programming.” https://felixanalytix.medium.com/how-to-map-any-region-of-the-world-using-r-programming-bb3c4146f97f.

PROJ contributors. 2024. “PROJ Coordinate Transformation Software Library.” https://doi.org/10.5281/zenodo.5884394.

Soulé, Jonathan, Aud Halbritter, and Richard J. Telford. 2024. “Working in R.” https://biostats-r.github.io/biostats/workingInR/.

or similar: The text in my page was in German: “Einen anderen Bereich manuell auswählen”.↩︎