R - Basics

Marcio Diniz | Michael Luu

Cedars Sinai Medical Center

06 September, 2022

Introduction

First steps in R

Coding in R is like playing with Lego

  • R is an object-oriented language;
  • Objects are boxes that we can use to store different classes of content;
  • Objects need names. Names should start with a letter and can also contain numbers, but only two special characters: underscore and period;

Types of objects

There are different types of objects

  • Objects can be classified into three classes of content: numeric, character, logical;
    • Numeric objects contain numbers, which can be doubles (dbl) or integers (int);
    • Character (chr) objects contain strings, i.e., text between quotes;
      • If a character object contains only a pre-defined set of values, then it can be stored as a factor (fct).
    • Logical objects (lgl) contain logical statements, i.e., TRUE or FALSE.
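  • The classes above can be inspected with the base R functions typeof and class. A minimal sketch; the object names are hypothetical and chosen only for this illustration:

```r
# Hypothetical objects, one per class of content
a_double   <- 11         # numbers are stored as doubles by default
an_integer <- 11L        # the L suffix requests an integer
a_string   <- "risotto"  # text between quotes is a character
a_logical  <- TRUE       # TRUE or FALSE

typeof(a_double)    # "double"
typeof(an_integer)  # "integer"
class(a_string)     # "character"
class(a_logical)    # "logical"
```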

Let’s code our first objects:

# Who is it?
dwo_icecream <- 11 # days without icecream
favorite_food <- "risotto"
dog.lover = TRUE
enjoy_cold <- FALSE

dwo_icecream
[1] 11
favorite_food
[1] "risotto"
enjoy_cold
[1] FALSE
  • To store a value in an object, we use the assignment operator <- or =;
  • Comments are preceded with #;
  • Objects can be accessed based on their names;
  • Objects are listed in the Environment tab.

Good practices for coding in R

  • It is recommended to avoid . in object names and = as the assignment operator:
dog_lover <- dog.lover

rm(dog.lover)
dog_lover
[1] TRUE
  • The function rm removes objects from the Environment tab.
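  • One way to confirm that an object is gone is the base R function exists. A small sketch, using a hypothetical object cat_lover that is not part of the examples above:

```r
# Hypothetical object used only for this illustration
cat_lover <- FALSE

before <- exists("cat_lover")  # TRUE: the object is in the environment
rm(cat_lover)                  # remove it
after <- exists("cat_lover")   # FALSE: the object can no longer be found

before
after
```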

Describe and comment your code for your future self!

Describe and comment your code for your colleagues!

Vectors

  • Vectors are sequences of objects of the same type; different vectors can have different lengths;
  • We can combine objects using the function c;
  • We calculate the length of a vector using the function length.
  • Let’s do it in R:
dwo_icecream <- c(5, 10, 8, 9, 3) # days without ice-cream
dwo_icecream
[1]  5 10  8  9  3
length(dwo_icecream)
[1] 5
  • All previous objects are also vectors, but with length 1:
length(favorite_food)
[1] 1

Combining vectors

  • If we combine different types of vectors, they will be converted following the hierarchy: character > numeric > logical:
# all objects are coerced to character
c(favorite_food, dog_lover, enjoy_cold, dwo_icecream)
[1] "risotto" "TRUE"    "FALSE"   "5"       "10"      "8"       "9"      
[8] "3"      
c(dog_lover, dwo_icecream)  # TRUE is converted into 1
[1]  1  5 10  8  9  3
c(enjoy_cold, dwo_icecream)  # FALSE is converted into 0
[1]  0  5 10  8  9  3
  • The class of an object can be checked using the function class:
class(dwo_icecream)
[1] "numeric"
class(dog_lover)
[1] "logical"
class(favorite_food)
[1] "character"
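  • Besides the automatic conversion shown above, a class can also be changed explicitly with the base R functions as.character, as.numeric and as.logical. A brief sketch; the object names are hypothetical:

```r
num_to_chr <- as.character(10)  # "10"
chr_to_num <- as.numeric("10")  # 10
lgl_to_num <- as.numeric(TRUE)  # 1
num_to_lgl <- as.logical(0)     # FALSE

# Text that is not a number cannot be converted: the result is NA.
# R also raises a warning here, which is silenced for this illustration.
not_a_number <- suppressWarnings(as.numeric("risotto"))
not_a_number
```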

Other functions to create vectors

  • Some numeric vectors can be created more easily than with the function c:

  • Using the function rep:

x <- c(1, 1, 1, 1, 1)
x
[1] 1 1 1 1 1
y <- rep(1, 5)
y
[1] 1 1 1 1 1
  • Using the operator : or the function seq:
x <- c(1, 2, 3, 4, 5)
x
[1] 1 2 3 4 5
y <- 1:5
y
[1] 1 2 3 4 5
z <- seq(1, 5, by = 1)
z
[1] 1 2 3 4 5
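  • Both functions accept further arguments. The sketch below shows a few common ones (times, each, by, length.out); the object names are hypothetical:

```r
r_times <- rep(c(1, 2), times = 3)     # 1 2 1 2 1 2: repeat the whole vector
r_each  <- rep(c(1, 2), each = 2)      # 1 1 2 2: repeat each element in turn
s_by    <- seq(0, 1, by = 0.25)        # 0.00 0.25 0.50 0.75 1.00
s_len   <- seq(1, 10, length.out = 4)  # 1 4 7 10: four equally spaced values
s_down  <- 5:1                         # 5 4 3 2 1: the operator : also counts down
```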

Factors

  • Factors are special vectors that take only pre-defined values (levels) and are created with the function factor:

  • A factor can be created from a character vector:

dog_lover <- c("yes", "no", "yes", "yes", "yes")
dog_lover <- factor(dog_lover, 
                    levels = c("no", "yes"))
dog_lover
[1] yes no  yes yes yes
Levels: no yes
  • Or a numeric vector:
dog_lover <- c(1, 0, 1, 1, 1)
dog_lover <- factor(dog_lover,
                    levels = c(0, 1),
                    labels = c("no", "yes"))
dog_lover
[1] yes no  yes yes yes
Levels: no yes

  • Shortcut to transform a character vector into a factor:
dog_lover <- c("yes", "no", "yes", "yes", "yes")
dog_lover <- as.factor(dog_lover)
dog_lover
[1] yes no  yes yes yes
Levels: no yes
  • The levels of a factor are stored in a fixed order (alphabetical by default with as.factor):
dog_lover
[1] yes no  yes yes yes
Levels: no yes
levels(dog_lover)
[1] "no"  "yes"
nlevels(dog_lover)
[1] 2
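  • Internally, each value of a factor is stored as the position of its level. A minimal sketch with a hypothetical factor answers, using the base R functions as.integer and table:

```r
# Hypothetical factor with the same values as the example above
answers <- factor(c("yes", "no", "yes", "yes", "yes"),
                  levels = c("no", "yes"))

codes  <- as.integer(answers)  # 2 1 2 2 2: "no" is level 1, "yes" is level 2
counts <- table(answers)       # number of observations per level

codes
counts
```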

Matrices

a matrix

  • A matrix is a collection of vectors of the same type and length organized in a two-dimensional array;
  • A matrix has n rows and p columns.
  • Vectors can be bound into two-dimensional arrays using the function cbind:
pushups <- c(1, 10, 0, 30, 25)

m <- cbind(dwo_icecream, pushups)
m
     dwo_icecream pushups
[1,]            5       1
[2,]           10      10
[3,]            8       0
[4,]            9      30
[5,]            3      25
  • The class of two-dimensional arrays generated by cbind is matrix:
class(m)
[1] "matrix" "array" 

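  • When we bind vectors of different classes, every value is coerced to a single class (here, character), and factors are reduced to their integer codes: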
names <- c("m", "n", "p", "b", "t")
m <- cbind(names, dog_lover, 
           dwo_icecream, pushups)
m
     names dog_lover dwo_icecream pushups
[1,] "m"   "2"       "5"          "1"    
[2,] "n"   "1"       "10"         "10"   
[3,] "p"   "2"       "8"          "0"    
[4,] "b"   "2"       "9"          "30"   
[5,] "t"   "2"       "3"          "25"   
  • Matrices are not appropriate to store a dataset.
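  • Matrices remain useful when all values share one class. The sketch below rebuilds a numeric matrix from copies of the two numeric vectors and inspects it with dim and is.numeric; it also uses rbind, the row-wise counterpart of cbind, which is not covered above. The object names are hypothetical:

```r
# Copies of the numeric vectors used in the examples above
icecream <- c(5, 10, 8, 9, 3)
pushups  <- c(1, 10, 0, 30, 25)

by_col <- cbind(icecream, pushups)  # vectors become columns
by_row <- rbind(icecream, pushups)  # vectors become rows

dim(by_col)         # 5 2: five rows, two columns
dim(by_row)         # 2 5: two rows, five columns
is.numeric(by_col)  # TRUE: no coercion was needed
```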

Data frames

a data frame

  • A data frame is a collection of vectors of possibly different types, but of the same length, organized in a two-dimensional table;
  • A data frame has n rows and p columns.
  • Data frames are created with the function data.frame:
df <- data.frame(names, dog_lover,
                 dwo_icecream, pushups)
df
  names dog_lover dwo_icecream pushups
1     m       yes            5       1
2     n        no           10      10
3     p       yes            8       0
4     b       yes            9      30
5     t       yes            3      25
class(df)
[1] "data.frame"
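  • Unlike a matrix, a data frame keeps the class of each column. A self-contained sketch, using a hypothetical copy df_example of the data above; stringsAsFactors = FALSE is set explicitly as an assumption so that the result does not depend on the R version:

```r
# Hypothetical copy of the data frame built above
df_example <- data.frame(
  names        = c("m", "n", "p", "b", "t"),
  dog_lover    = factor(c("yes", "no", "yes", "yes", "yes"),
                        levels = c("no", "yes")),
  dwo_icecream = c(5, 10, 8, 9, 3),
  pushups      = c(1, 10, 0, 30, 25),
  stringsAsFactors = FALSE  # keep text columns as character
)

col_classes <- sapply(df_example, class)  # class of every column
col_classes
nrow(df_example)  # 5 rows
ncol(df_example)  # 4 columns
```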

Subsetting

Vectors

positions of a vector

  • Each vector component is located at a given position;
  • Components of a vector can be accessed using the operator [position].
dwo_icecream
[1]  5 10  8  9  3
dwo_icecream[1]
[1] 5
dwo_icecream[c(1, 5)]
[1] 5 3
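  • The operator [ ] also accepts ranges, negative positions and logical vectors. A short sketch on a hypothetical copy dwo of the vector above:

```r
# Hypothetical copy of the vector used above
dwo <- c(5, 10, 8, 9, 3)

middle     <- dwo[2:4]      # 10 8 9: positions 2 to 4
no_first   <- dwo[-1]       # 10 8 9 3: everything except position 1
above_seven <- dwo[dwo > 7] # 10 8 9: components where the condition is TRUE
```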

Matrices

positions of a matrix

  • Each matrix component is located at a given position;
  • Components of a matrix can be accessed using the operator [position row, position column].
  • We can access a specific component, a row, or a column in a matrix:
# a specific component
m[1, 1]
names 
  "m" 
# a row
m[5, ] 
       names    dog_lover dwo_icecream      pushups 
         "t"          "2"          "3"         "25" 
# a column
m[, 4]
[1] "1"  "10" "0"  "30" "25"

Data frames

positions of a data frame

  • Each data frame component is located at a given row and column;
  • Components of a data frame can be accessed similar to a matrix;
  • In addition, columns can be accessed by their names using the operator $.
  • We can access a specific component, a row, or a column in a data frame:
df$names[5]
[1] "t"
df[5, 1]
[1] "t"
df$names
[1] "m" "n" "p" "b" "t"
df[, 1]
[1] "m" "n" "p" "b" "t"
df[5, ]
  names dog_lover dwo_icecream pushups
5     t       yes            3      25
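  • Rows can also be selected with a logical condition, and columns by name. A minimal sketch on a hypothetical, reduced data frame df_example:

```r
# Hypothetical data frame with two of the columns used above
df_example <- data.frame(
  names   = c("m", "n", "p", "b", "t"),
  pushups = c(1, 10, 0, 30, 25),
  stringsAsFactors = FALSE
)

# Rows where the condition is TRUE, all columns
active <- df_example[df_example$pushups > 5, ]
active

# A single column selected by name, returned as a vector
pushups_only <- df_example[, "pushups"]
pushups_only
```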

Logical statements

Object classes

  • Logical statements about an object’s class can be made as follows:
is.numeric(dwo_icecream)
[1] TRUE
is.logical(dwo_icecream)
[1] FALSE
is.character(dwo_icecream)
[1] FALSE
is.factor(dog_lover)
[1] TRUE

Conditions

  • Conditions can be checked with logical statements:
  • Are these objects equal?
dog_lover == "yes"
[1]  TRUE FALSE  TRUE  TRUE  TRUE
  • Are these objects different?
dog_lover != "yes"
[1] FALSE  TRUE FALSE FALSE FALSE
  • Is this object greater (less) than 7?
dwo_icecream > 7 # for less, use <
[1] FALSE  TRUE  TRUE  TRUE FALSE
  • Is this object greater (less) than or equal to 7?
dwo_icecream >= 7 # for less or equal, use <=
[1] FALSE  TRUE  TRUE  TRUE FALSE

Combining Conditions

  • Conditions can be combined:

  • AND
dwo_icecream > 7 & dog_lover == "yes"
[1] FALSE FALSE  TRUE  TRUE FALSE
  • OR
dwo_icecream > 7 | dog_lover == "no"
[1] FALSE  TRUE  TRUE  TRUE FALSE
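  • The result of a condition is itself a logical vector, so it can be negated, counted or located. A sketch with hypothetical copies of the vectors above, using the operator ! (NOT) and the base R functions sum, which, any and all:

```r
# Hypothetical copies of the vectors used above
dwo <- c(5, 10, 8, 9, 3)
dog <- c("yes", "no", "yes", "yes", "yes")

both <- dwo > 7 & dog == "yes"  # FALSE FALSE TRUE TRUE FALSE

!both        # NOT: TRUE TRUE FALSE FALSE TRUE
sum(both)    # 2: TRUE counts as 1, FALSE as 0
which(both)  # 3 4: positions where the condition holds
any(both)    # TRUE: at least one component is TRUE
all(both)    # FALSE: not every component is TRUE
```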

Functions

What is a function?

The function f takes objects in X and transforms them into objects in Y

  • A function takes inputs, performs operations, and returns outputs;

  • Functions in R are organized in packages, which are loaded with the function library;

  • Every function has documentation that can be accessed with the operator ?.

  • First, let’s access the documentation of some of these functions:
?length
?c
  • Look at the Help tab for more details on the function, including examples.

  • Let’s create our own function to convert Fahrenheit to Celsius.

  • How would we do this conversion manually?

# minimum temperature
temp_f <- 38
temp_c <- (temp_f - 32)*(5/9)
temp_c
[1] 3.333333
# range of temperatures
temp_f <- c(38, 105)
temp_c <- (temp_f - 32)*(5/9)
temp_c
[1]  3.333333 40.555556
  • How about with a function?
temp_f_c <- function(temp_f){

  temp_c <- (temp_f - 32)*(5/9)

  return(temp_c)
}

temp_f_c(38)
[1] 3.333333
temp_f_c(105)
[1] 40.55556
temp_f_c(c(38, 105))
[1]  3.333333 40.555556
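  • The same pattern works for the inverse conversion. The function temp_c_f below is a hypothetical companion to temp_f_c, written only to illustrate the structure of a function (name, argument, body, returned value):

```r
# Hypothetical inverse of temp_f_c: convert Celsius to Fahrenheit
temp_c_f <- function(temp_c){

  temp_f <- temp_c*(9/5) + 32

  return(temp_f)
}

temp_c_f(0)           # 32, the freezing point of water
temp_c_f(c(0, 100))   # 32 212, the function also works on vectors
```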

For loops

How can we repeat tasks or perform them sequentially?

i <- 1
i
[1] 1
i <- i + 1
i
[1] 2
i <- i + 1
i
[1] 3

Control flow for iteration

for (i in 1:3){
  print(i)
}
[1] 1
[1] 2
[1] 3

x <- rep(NA, 3)

for (i in 1:3){
  x[i] <- i
}
x
[1] 1 2 3
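  • The same structure can fill a vector with computed values. The sketch below converts a hypothetical vector of temperatures one position at a time; seq_along is a base R function that returns the positions 1, 2, ..., length of the vector:

```r
# Hypothetical temperatures in Fahrenheit
temps_f <- c(38, 70, 105)

# Create an empty numeric vector of the right length before the loop
temps_c <- numeric(length(temps_f))

for (i in seq_along(temps_f)){
  # convert the i-th temperature and store it at position i
  temps_c[i] <- (temps_f[i] - 32)*(5/9)
}
temps_c
```

  • For this particular task, the loop gives the same result as the vectorized expression (temps_f - 32)*(5/9); the loop is shown only to illustrate iteration.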