5 Lab 1: Intro to RStudio and APIs
For this exercise, you will query species occurrence records from the GBIF API and create your own map that addresses overplotting, a common mapping issue in landscape ecology. You may use any technique of your choice, such as those from the tutorial.
Learning Objectives:
Learn how to use RMarkdown to publish a reproducible report to PDF
Understand the concept of an API in practice for data using GBIF
Apply a basic method for overcoming the issue of overplotting when mapping
5.1 Install Packages
For this first exercise, I have included more components than usual to get you started. First, let’s load the packages from the tutorial that you may need. If any of these aren’t installed already, please install them. You may also use other packages or techniques of your own, as long as they satisfy the exercise questions.
library(tinytex) #Cross-platform package for PDF publishing
#tinytex::install_tinytex() #Install tinytex
library(rgbif) #Package for accessing GBIF API
library(ggplot2) #Plotting library
library(maps) #Basemap library
library(hexbin) #Hexbin for plotting
library(ggthemes) #Themes for plotting
library(gganimate) #Animation library for ggplot
library(gifski) #GIF-making library
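If you are not sure which of these packages are already installed, a minimal sketch like the following can list the missing ones before you call library(). The helper name missing_packages() is hypothetical and not part of the tutorial.

```r
# Packages used in this lab (same list as the library() calls above)
lab_packages <- c("tinytex", "rgbif", "ggplot2", "maps", "hexbin",
                  "ggthemes", "gganimate", "gifski")

# Hypothetical helper: return the packages in `pkgs` that are not installed
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(lab_packages)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # Run once from the console rather than from the knitted report:
  # install.packages(to_install)
}
```

Keeping install.packages() commented out means knitting the report never triggers a download.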
5.2 GBIF Web Interface Exploration
First, go to GBIF and select a species and a region that you will map. Every filter you apply in the web interface goes through the same API that you will access from R, so you can build your query here first and know what to expect when you bring the data in using the rgbif package. For example, if you select a species and then an individual country or set of years, the web address at the top of the page will reflect those filters. These filters work the same way in the R package!
5.2.1 Species selection (10 pts)
Once you select a species and region, add filters to narrow down your query until you have fewer than 20,000 occurrence records in total. A larger query will still work but may be very slow to access through the API (feel free to try if you are willing to wait). If you have fewer than 300 occurrences, please find another species or expand your spatial/temporal extent.
Please answer the following two questions to practice adding plain text to an RMarkdown document. You may want to knit occasionally to check that these changes appear in your PDF before moving on to coding.
1) What species did you select? (5 pts)
2) What other filters are you applying to the species occurrence data? (5 pts)
5.2.2 GBIF API Call (40 pts)
In this RMarkdown file, I have provided three code chunks for you to work in, but feel free to add and annotate as many as you need to complete the exercise.
Bring in the occurrence data for your species from the GBIF API using the occ_data function. Restrict the query to records that have coordinates and that fall within your region of interest. This may take a few minutes to bring into your session, which is another reason why we want to only do this once!
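As a rough sketch of what such a call can look like, the query below uses a placeholder species, country, and year range; these values are assumptions for illustration, so substitute the filters you chose in the Species Selection section. It needs an internet connection and may take a while to finish.

```r
library(rgbif)

# Hypothetical filters: replace with the ones from your Species Selection answers
gbif_raw <- occ_data(
  scientificName = "Danaus plexippus",  # assumed example species
  country        = "US",                # two-letter ISO country code
  year           = "2015,2020",         # a range of years, written as "start,end"
  hasCoordinate  = TRUE,                # keep only records with coordinates
  limit          = 20000                # upper bound on records returned
)

# occ_data() returns a list: `meta` describes the query, `data` holds the records
gbif_raw$meta$count   # total records matching the filters on GBIF
nrow(gbif_raw$data)   # records actually downloaded (at most `limit`)
```

Comparing the two numbers at the end against the count shown in the web interface is a quick way to confirm that the R query matches your filters.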
Clean the data as in the tutorial by converting it to a data frame, removing records with NA for year, and selecting the same columns.
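The cleaning steps can be sketched on a small made-up table as follows. The column names in keep_cols are an assumption (common GBIF fields, which may not all be present in every result); use the columns from the tutorial in your own report.

```r
# Toy stand-in for the `data` element returned by occ_data();
# real GBIF results have many more columns
raw_records <- data.frame(
  species          = c("Danaus plexippus", "Danaus plexippus", "Danaus plexippus"),
  decimalLatitude  = c(40.1, 35.6, 44.9),
  decimalLongitude = c(-105.2, -97.4, -93.1),
  year             = c(2016, NA, 2019),
  datasetName      = c("Dataset A", "Dataset B", "Dataset A"),
  basisOfRecord    = "HUMAN_OBSERVATION"
)

occ_df <- as.data.frame(raw_records)     # convert to a plain data frame
occ_df <- occ_df[!is.na(occ_df$year), ]  # drop records with no year

# Assumed column set; substitute the columns used in the tutorial
keep_cols <- c("species", "decimalLatitude", "decimalLongitude",
               "year", "datasetName")
occ_df <- occ_df[, keep_cols]
```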
Save your data locally using write.csv(). Once you have your local data, comment out the query and saving code with # so that it doesn’t run again when you knit the report.
Next, read the data file you created back in so that it can be used by the rest of your .rmd document.
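A minimal sketch of this save-then-reload pattern, using a toy data frame and a hypothetical file name:

```r
# Toy cleaned data standing in for your own data frame
occ_df <- data.frame(
  decimalLatitude  = c(40.1, 44.9),
  decimalLongitude = c(-105.2, -93.1),
  year             = c(2016, 2019)
)

# Hypothetical file name; tempdir() keeps this sketch self-contained, but in
# your report a path relative to the .rmd file is more convenient
csv_path <- file.path(tempdir(), "species_occurrences.csv")

# Run once, then comment out so knitting does not rewrite the file
write.csv(occ_df, csv_path, row.names = FALSE)

# Everything downstream in the report starts from the local copy
occ_local <- read.csv(csv_path)
```

Setting row.names = FALSE avoids an extra unnamed index column appearing when the file is read back in.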
Grading: 20 pts for API query (-5 pts for each API filter that doesn’t match your filters from the Species Selection section), 10 pts for data cleaning, 5 pts for saving file, 5 pts for reading file back in
5.2.3 Species Occurrence Mapping (40 pts)
Make a map of your species occurrences without addressing overplotting issues by just plotting the points (20 pts):
The plot should:
Plot your occurrence points (10 pts)
Have a basemap under all points (5 pts)
Have a title (5 pts)
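One possible layout for such a plot is sketched below with simulated points, so the coordinates, the "state" basemap, and the map extent are all assumptions rather than the tutorial's exact approach. The basemap layer is added first so that it draws underneath the points.

```r
library(ggplot2)  # map_data() also needs the maps package to be installed

# Simulated occurrences standing in for your cleaned GBIF data
set.seed(1)
occ_df <- data.frame(
  decimalLongitude = rnorm(500, mean = -98, sd = 6),
  decimalLatitude  = rnorm(500, mean = 38, sd = 4)
)

basemap <- map_data("state")  # assumed region: contiguous United States

points_map <- ggplot() +
  # Basemap first, so every later layer is drawn on top of it
  geom_polygon(data = basemap, aes(x = long, y = lat, group = group),
               fill = "grey90", colour = "grey60") +
  geom_point(data = occ_df,
             aes(x = decimalLongitude, y = decimalLatitude),
             colour = "darkred", size = 0.8) +
  coord_quickmap(xlim = c(-125, -66), ylim = c(24, 50)) +
  labs(title = "Simulated occurrences (raw points)",
       x = "Longitude", y = "Latitude")

points_map
```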
Make a map of your species occurrences with one method that addresses overplotting (20 pts):
The plot should:
Address overplotting by clearly showing occurrence density (10 pts)
Have a basemap under all points (5 pts)
Have a title (5 pts)
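Hexagonal binning is one technique that fits these requirements; the sketch below applies it to simulated points. The bin count, colour scale, and map extent are arbitrary choices, and any other density-based method would be equally valid.

```r
library(ggplot2)  # geom_hex() needs hexbin installed; map_data() needs maps

# Simulated occurrences standing in for your cleaned GBIF data
set.seed(1)
occ_df <- data.frame(
  decimalLongitude = rnorm(500, mean = -98, sd = 6),
  decimalLatitude  = rnorm(500, mean = 38, sd = 4)
)

basemap <- map_data("state")  # assumed region: contiguous United States

density_map <- ggplot() +
  geom_polygon(data = basemap, aes(x = long, y = lat, group = group),
               fill = "grey90", colour = "grey60") +
  # Each hexagon is coloured by the number of records that fall inside it
  geom_hex(data = occ_df,
           aes(x = decimalLongitude, y = decimalLatitude),
           bins = 30, alpha = 0.85) +
  scale_fill_viridis_c(name = "Records per hexagon") +
  coord_quickmap(xlim = c(-125, -66), ylim = c(24, 50)) +
  labs(title = "Simulated occurrences (hexagonal bins)",
       x = "Longitude", y = "Latitude")

density_map
```

Because colour encodes the count per hexagon, dense clusters stay readable even where thousands of raw points would overlap.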
5.2.4 Data Exploration (10 pts)
What was the most common database source for your chosen species occurrence data? Please find this by using R code (5 pts for correct answer):
Hint: You could use the table() function.
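As a small illustration of the hint, table() counts how often each value occurs, and sort() puts the largest count first. The dataset names below are made up; in your report you would pass the relevant column of your own data frame.

```r
# Made-up dataset labels standing in for a column of your occurrence data
dataset_names <- c("Dataset A", "Dataset B", "Dataset A",
                   "Dataset C", "Dataset B", "Dataset A")

# Count records per dataset, largest first
source_counts <- sort(table(dataset_names), decreasing = TRUE)

# The name of the most common source is the first element
most_common <- names(source_counts)[1]
most_common
```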
Create a bar chart of the number of observations from each of the top 10 contributing datasets for your species occurrences, in descending order of observations:
Hint: You can do this with geom_bar() if using ggplot, or with base R functions.
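One way to approach this with ggplot is sketched below on simulated data. The datasetName column and the twelve "Dataset" labels are assumptions for illustration; geom_col() is used here, which is equivalent to geom_bar(stat = "identity") for pre-counted data.

```r
library(ggplot2)

# Simulated labels: 12 hypothetical datasets with uneven contributions
set.seed(42)
occ_df <- data.frame(
  datasetName = sample(paste("Dataset", LETTERS[1:12]), size = 1000,
                       replace = TRUE, prob = 12:1)
)

# Count records per dataset and keep the 10 largest
counts <- as.data.frame(table(occ_df$datasetName), stringsAsFactors = FALSE)
names(counts) <- c("datasetName", "n")
top10 <- head(counts[order(counts$n, decreasing = TRUE), ], 10)

top10_plot <- ggplot(top10, aes(x = reorder(datasetName, -n), y = n)) +
  geom_col() +  # bar height is the pre-computed count n
  labs(title = "Top 10 contributing datasets (simulated)",
       x = "Dataset", y = "Number of observations") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

top10_plot
```

The reorder(datasetName, -n) call is what puts the bars in descending order; without it, ggplot would sort the datasets alphabetically.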