Chapter 1 The first steps in R
1.1 RStudio for the first time
When you open RStudio, it is typically split into four panels, as you can see in the figure below:
Figure 1: Illustration of RStudio
These panels show:
- Your script file, which is an ordinary text file with suffix “.R”. For instance, yourFavoritFileName.R. This file contains your code.
- All objects we have defined in the environment.
- Help files and plots.
- The console, which allows you to type commands and display results.
Some other terms that will be useful:
- Working directory: The directory you are currently working in. Useful commands: getwd() returns the location of your current working directory and setwd() sets a new location for it. This is very useful when you would like to load or save several files: you set your working directory once, and all files are then loaded from or saved into this directory by default.
- Workspace: The collection of all objects you have created in the current session (e.g., data, matrices, vectors, variables, functions, etc.). R can save it to a hidden file (.RData) in the working directory. Useful commands: ls() shows all objects in the current workspace and rm(list=ls()) deletes all objects in the current workspace. It is also possible to remove only some objects with rm(object1, object2).
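The commands from the two items above can be tried out as follows. This is only a minimal sketch: the temporary directory and the object names first_obj and second_obj are illustrative assumptions, not part of the course material.

```r
# Remember the current working directory so it can be restored later.
old_wd <- getwd()

# Switch to another directory; tempdir() is used only because it always exists.
setwd(tempdir())
getwd()

# Go back to the original working directory.
setwd(old_wd)

# Create two (hypothetical) objects and list the workspace.
first_obj <- 1
second_obj <- "some text"
ls()

# Remove a single object, then clear everything that is left.
rm(first_obj)
ls()
rm(list = ls())
ls()
```

Note that rm(list = ls()) removes every object without asking, so it is best used at the start of a script rather than in the middle of your work.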
1.2 Simple Calculations
We start with a very basic calculation. Let’s type 1+1 into the console window and press enter. We get
## [1] 2
i.e. the result of the calculation is returned.
Note that everything that is written after the #-sign is ignored by R, which is very useful to comment your code. The second window above, starting with ##, shows the output.
Let’s consider a second calculation
## Error: <text>:2:0: unexpected end of input
## 1: 1+2+3+ # second calculation
## ^
This calculation ended with an error. The reason is that a command can only be processed if it is entered completely; here R still expects a number after the last +. Entered completely, the calculation returns
## [1] 10
A similar thing happens when parentheses are not closed.
## Error: <text>:2:0: unexpected end of input
## 1: 2*(2+3
## ^
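The corrected commands are not shown above. As a sketch, completed versions of the two expressions could look like this; the final term 4 in the first line is an assumption chosen to be consistent with the printed result 10.

```r
# An expression must be complete before R can evaluate it.
# Here the trailing "+" gets its missing operand:
1 + 2 + 3 + 4   # second calculation, now complete

# Every opening parenthesis needs a matching closing one:
2 * (2 + 3)
```

If you press Enter on an incomplete command in the console, R shows a + prompt and waits for the rest of the input; pressing Esc cancels it.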
It will be helpful to use a script file such as yourFavoritFileName.R to store your R commands. Otherwise, you would have to type your code again whenever an error occurs. You can send single lines or selected regions of your R code to the console by pressing Ctrl+Enter.
1.3 Assignments
The assignment operator will be your most frequently used tool. Here is an example in which a scalar variable is created:
## [1] 9
## [1] 9
## [1] 10
Note: The R community loves the <- assignment operator, which is a very unusual syntax. Alternatively, you can use the = operator:
## [1] 9
## [1] 9
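The statements that produced the outputs in this section are not shown. A minimal sketch of both assignment forms, with x and y as assumed variable names and 9 as the assumed value, could look like this:

```r
# Assign with the arrow operator and print the value.
x <- 9
x
print(x)   # explicit printing gives the same output
x + 1      # x can be used in further calculations

# The equals sign works as an assignment operator too.
y = 9
y
```

Both forms create the same kind of object; whichever you choose, using it consistently makes scripts easier to read.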
We now consider a vector, an object you will use frequently:
## [1] 1 3 5 6
As discussed, ls() lists the content of the workspace, whereas rm(list = ls()) clears the workspace.
## [1] "x" "y" "z"
## character(0)
In the next step we will consider some vector multiplication. There are three different ways to multiply a vector with itself, namely element by element, using the inner product, or using the outer product. Element-by-element multiplication gives you a vector of the same length.
## [1] 1 9 25 36
The function t() gives the transpose of a vector (or matrix). Therefore, the inner product of the vector is given by
## [,1]
## [1,] 71
## [1] "matrix"
## [1] "numeric"
Note that R stores the inner product zᵀz as a matrix (with one row and one column), whereas the vector z itself is numeric.
Finally, we have the outer product that gives us a matrix
## [,1] [,2] [,3] [,4]
## [1,] 1 3 5 6
## [2,] 3 9 15 18
## [3,] 5 15 25 30
## [4,] 6 18 30 36
Be very careful with %*% versus *. They do not lead to the same result! Using the transpose explicitly might help to avoid mistakes.
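The three products can be sketched as follows, assuming the vector z was created with c(1, 3, 5, 6), which matches the printed values. The names inner and outer_prod are introduced here only for illustration.

```r
z <- c(1, 3, 5, 6)

# Element by element: a vector of the same length.
z * z

# Inner product: t(z) is 1 x 4 and z is treated as 4 x 1,
# so the result is a 1 x 1 matrix.
inner <- t(z) %*% z
inner
dim(inner)

# Outer product: 4 x 1 times 1 x 4 gives a 4 x 4 matrix.
outer_prod <- z %*% t(z)
outer_prod
```

Writing z %*% z without any transpose also returns the inner product, because R chooses an orientation for plain vectors on its own; spelling out t() makes the intended product visible.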
You can also multiply a vector with a scalar:
## [1] 4 12 20 24
A strength of R is that operations can be conducted for each element of the vector in one statement
## [1] 1 9 25 36
## [1] 0.000000 1.098612 1.609438 1.791759
## [1] -1.2402159 -0.3382407 0.5637345 1.0147221
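The statements behind these outputs are not shown. The following sketch reproduces them under two assumptions: z is c(1, 3, 5, 6), and the last output standardizes z with its mean and standard deviation.

```r
z <- c(1, 3, 5, 6)

4 * z                    # scalar times vector
z^2                      # square each element
log(z)                   # natural logarithm of each element
(z - mean(z)) / sd(z)    # subtract the mean, divide by the standard deviation
```

In each line the operation is applied to all four elements at once, so no loop is needed.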
So far, we have seen c() that can be used to combine objects. All function calls use the same general notation: a function name is always followed by round parentheses. Sometimes, the parentheses include arguments, such as
## [1] 1 2 3 4 5
## [1] 1 2 3
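As an illustration of the calling convention, here is a small sketch using seq(). The choice of seq() and of the argument values is an assumption made to match the outputs above; the original calls are not shown.

```r
# Arguments can be passed by position ...
seq(1, 5)

# ... or by name, which makes the call easier to read.
seq(from = 1, to = 3, by = 1)

# Some functions are called without any arguments.
getwd()
```

Named arguments can be given in any order, whereas positional arguments must follow the order listed in the help file of the function.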
We consider now a matrix
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
## [1] 3 3
## NULL
## [1] 5
Element by element multiplication for M is given by
## [,1] [,2] [,3]
## [1,] 1 16 49
## [2,] 4 25 64
## [3,] 9 36 81
In addition
## [,1] [,2] [,3]
## [1,] 30 66 102
## [2,] 36 81 126
## [3,] 42 96 150
## [,1] [,2] [,3]
## [1,] 14 32 50
## [2,] 32 77 122
## [3,] 50 122 194
## [,1]
## [1,] 30
## [2,] 36
## [3,] 42
is the standard matrix multiplication. We can access the elements of a matrix with
## [1] 7 8 9
## [1] 9
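The matrix code itself is not shown in this section. One set of statements consistent with most of the outputs above is sketched here, assuming M was filled column by column with the numbers 1 to 9 and that the last product multiplies M with the vector c(1, 2, 3).

```r
# Fill a 3 x 3 matrix column by column with the numbers 1 to 9.
M <- matrix(1:9, nrow = 3, ncol = 3)
M
dim(M)             # number of rows and columns

M * M              # element-by-element product
M %*% M            # standard matrix product
t(M) %*% M         # transpose of M times M
M %*% c(1, 2, 3)   # matrix times vector

M[, 3]             # third column
M[3, 3]            # element in row 3, column 3
```

Leaving an index empty, as in M[, 3], selects everything along that dimension; M[3, ] would return the third row in the same way.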
1.4 Further Data Objects
Besides classical data objects such as scalars, vectors, and matrices there are three further data objects in R:
- The array: A matrix but with more dimensions. Here is an example of a 2×2×2-dimensional array
- The list: In lists you can organize different kinds of data. E.g., consider the following example:
myFirst.List <- list("Some_Numbers" = c(66, 76, 55, 12, 4, 66, 8, 99),
"Animals" = c("Rabbit", "Cat", "Elefant"),
"My_Series" = c(30:1))
- The data frame: A data.frame is a list-object but with some more formal restrictions (e.g., equal number of rows for all columns). As indicated by its name, a data.frame-object is designed to store data:
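The three object types can be sketched as follows. The names my_array and my_df and all values stored in them are hypothetical and chosen only for illustration; the list repeats myFirst.List from above.

```r
# A 2 x 2 x 2 array filled with the numbers 1 to 8.
my_array <- array(1:8, dim = c(2, 2, 2))
dim(my_array)
my_array[1, 2, 2]    # row 1, column 2, second slice

# A list may hold elements of different types and lengths.
myFirst.List <- list("Some_Numbers" = c(66, 76, 55, 12, 4, 66, 8, 99),
                     "Animals"      = c("Rabbit", "Cat", "Elefant"),
                     "My_Series"    = c(30:1))
myFirst.List$Animals                 # access an element by name
myFirst.List[["My_Series"]][1:3]     # first three values of one element

# A data frame: every column must have the same number of rows.
# The column names and numbers are made up for this example.
my_df <- data.frame(animal = c("Rabbit", "Cat", "Elefant"),
                    count  = c(3, 1, 2))
str(my_df)
my_df$count          # columns are accessed like list elements
```

Because a data frame is a list of equally long columns, the $ and [[ ]] operators work on it exactly as they do on a list.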
1.5 R Help
There exist help files for all commands in R. For example, consider the help file of the function sum(). You can access the help file with ?sum or help(sum).
The file usually contains the categories: Description, Usage, Arguments, Details, Value and References.
After the general description follows the syntax of the command (Usage). This usually indicates which (optional) arguments the command allows. Then follows the description of the arguments, among them na.rm, which indicates whether missing values (NA) will be removed (rm) before the sum is computed. The default is FALSE.
## [1] 1 2 NA 4 5
## [1] NA
## [1] 12
Details describes the exact calculation (trivial in this case), and some command specific characteristics. Value describes what is returned, in the case of sum() it is obviously a sum of numbers, given that numeric data was used as an input.
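The effect of na.rm can be reproduced with a short sketch. The vector x is assumed to be c(1, 2, NA, 4, 5), which matches the output shown above.

```r
# Open the help page; this only makes sense in an interactive session.
if (interactive()) help("sum")   # same as typing ?sum in the console

x <- c(1, 2, NA, 4, 5)
x
sum(x)                  # NA: one missing value makes the sum unknown
sum(x, na.rm = TRUE)    # 12: missing values are dropped first
```

Many other summary functions, such as mean() and max(), accept the same na.rm argument.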
1.6 R-packages
There are many R-packages that are useful for different aspects of data analysis. Some packages that can help you make beautiful plots and that you might want to check out are the package ggplot2 and the collection of packages tidyverse. You can install a package with install.packages("packagename"). In order to use the package, type library(packagename). A package has to be installed only once, but you have to load it with library(packagename) in every new R session in which you want to use it.