4  Data Structures

4.1 Objectives

  • Learn how to construct vectors, data frames, matrices, arrays, and lists
  • Learn how each data structure relates to the others
  • Subset and modify objects
  • Know when to use each structure

4.2 Vectors

Vectors are the basic building block of R.
An atomic vector holds elements of a single type. The four most common types are:

# Character
x <- c("a", "b", "c")

# Numeric (default number type)
y <- c(1, 2, 3)

# Integer
z <- c(1L, 2L, 3L)

# Logical
w <- c(TRUE, FALSE, TRUE)

class(y)
[1] "numeric"
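A vector holds only one type, so combining different types with c() silently converts everything to the most flexible type involved (logical, then integer, then double, then character). This is a minimal sketch; the names mixed and flags are illustrative only.

```r
# c() coerces mixed inputs to one common type
mixed <- c(1, "a", TRUE)
mixed            # "1" "a" "TRUE": everything became character
typeof(mixed)    # "character"

# Logical values become numbers when mixed with numbers (TRUE = 1, FALSE = 0)
flags <- c(TRUE, FALSE, 5)
flags            # 1 0 5

# class() reports "numeric" for doubles; typeof() separates integer and double
typeof(c(1, 2, 3))      # "double"
typeof(c(1L, 2L, 3L))   # "integer"
```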

4.2.1 Subsetting

Use square brackets, [ ], with the position(s) you want. R counts positions from 1.

x <- c("a", "b", "c", "d")

x[1]        # first element
[1] "a"
x[2:4]      # range
[1] "b" "c" "d"
x[c(1,4)]   # multiple positions
[1] "a" "d"
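Positions can also be negative, which drops elements instead of selecting them. A short sketch using the same x:

```r
x <- c("a", "b", "c", "d")

x[-1]          # "b" "c" "d": a negative index drops that position
x[-c(1, 4)]    # "b" "c": drop several positions at once
x[5]           # NA: a position past the end gives a missing value
length(x[0])   # 0: index 0 selects nothing
```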

Logical subsetting:

y <- 1:5
y[y > 3]
[1] 4 5
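The condition y > 3 is itself a logical vector, and the brackets keep the elements where it is TRUE. A sketch with hypothetical values, chosen so that the selected values differ from their positions:

```r
y <- c(10, 20, 30, 40, 50)   # hypothetical values
mask <- y > 30
mask                 # FALSE FALSE FALSE TRUE TRUE
y[mask]              # 40 50: elements where mask is TRUE
which(mask)          # 4 5: positions where the condition holds
sum(mask)            # 2: TRUE counts as 1, so this counts the matches
y[y > 15 & y < 45]   # 20 30 40: combine conditions with & (and) or | (or)
```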

Replace values:

y[y > 3] <- 0
y
[1] 1 2 3 0 0
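Replacement also follows the one-type rule. In the example above, 1:5 is an integer vector and 0 is a double, so y is quietly converted to double even though the printed result looks the same. A sketch that makes the type changes visible:

```r
y <- 1:5
typeof(y)          # "integer"

y[y > 3] <- 0L     # 0L is an integer, so y stays integer
typeof(y)          # "integer"
y                  # 1 2 3 0 0

y[2] <- 2.5        # a double value coerces the whole vector to double
typeof(y)          # "double"
y                  # 1.0 2.5 3.0 0.0 0.0
```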

4.2.2 Factors

Factors represent categorical data: values drawn from a fixed set of categories, called levels.

people <- c("John", "Mary", "John", "Mary")
f <- factor(people)

f
[1] John Mary John Mary
Levels: John Mary
levels(f)
[1] "John" "Mary"
table(f)
f
John Mary 
   2    2 
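By default the levels are the sorted unique values, and each element is stored as an integer code pointing into those levels. A sketch with a hypothetical sizes vector, where alphabetical order is not the natural order:

```r
sizes <- c("small", "large", "medium", "small")   # hypothetical data

# Default levels: sorted unique values
f_default <- factor(sizes)
levels(f_default)   # "large" "medium" "small"

# Supplying levels fixes their order
f <- factor(sizes, levels = c("small", "medium", "large"))
levels(f)           # "small" "medium" "large"
as.integer(f)       # 1 3 2 1: the integer codes stored underneath
table(f)            # counts in level order: small 2, medium 1, large 1

# Values that are not among the levels become NA
g <- factor(c("small", "huge"), levels = c("small", "medium", "large"))
is.na(g)            # FALSE TRUE
```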

4.3 Data Frames

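A data frame is a 2D structure of rows and columns.
Internally it is a list of equal-length vectors, one vector per column.
Each column therefore has a single type, but different columns may have different types.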

df <- data.frame(
  name = c("Ana", "Ben", "Chris"),
  age  = c(20, 21, 19),
  passed = c(TRUE, TRUE, FALSE)
)

str(df)
'data.frame':   3 obs. of  3 variables:
 $ name  : chr  "Ana" "Ben" "Chris"
 $ age   : num  20 21 19
 $ passed: logi  TRUE TRUE FALSE
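The list-of-columns view can be checked directly. This sketch assumes R 4.0 or later, where data.frame() keeps text columns as character by default (earlier versions converted them to factors, so str() would show Factor for name).

```r
df <- data.frame(
  name = c("Ana", "Ben", "Chris"),
  age = c(20, 21, 19),
  passed = c(TRUE, TRUE, FALSE)
)

is.list(df)          # TRUE: a data frame is a list of columns
length(df)           # 3: the list length is the number of columns
dim(df)              # 3 3: rows, columns
sapply(df, class)    # one type per column: character, numeric, logical

# Columns must line up, so lengths 3 and 2 cannot form a data frame
res <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) conditionMessage(e)
)
res                  # an error message about differing numbers of rows
```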

4.3.1 Loading and Saving

Read CSV:

df <- read.csv("data.csv")

Write CSV:

write.csv(df, "output.csv", row.names = FALSE)
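A round trip shows what row.names = FALSE is for and how types come back. This sketch writes to a temporary file from tempfile() in place of "data.csv" and "output.csv"; the names back and cols are illustrative.

```r
df <- data.frame(
  name = c("Ana", "Ben", "Chris"),
  age = c(20, 21, 19),
  passed = c(TRUE, TRUE, FALSE)
)
path <- tempfile(fileext = ".csv")

write.csv(df, path, row.names = FALSE)
back <- read.csv(path)
str(back)            # same columns as df
typeof(back$age)     # "integer": whole numbers are re-read as integers

# Without row.names = FALSE, row names are written as an extra unnamed
# first column, which read.csv() then calls "X"
write.csv(df, path)
cols <- names(read.csv(path))
cols                 # "X" "name" "age" "passed"

unlink(path)         # remove the temporary file
```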

4.3.2 Subsetting and Combining

Extract a column (both forms return a vector):

df$age
[1] 20 21 19
df[, "age"]
[1] 20 21 19
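Whether you get a vector or a data frame back depends on the operator. A sketch on the same df:

```r
df <- data.frame(
  name = c("Ana", "Ben", "Chris"),
  age = c(20, 21, 19),
  passed = c(TRUE, TRUE, FALSE)
)

df[["age"]]                 # 20 21 19: a vector, same as df$age
df["age"]                   # a one-column data frame
df[, "age", drop = FALSE]   # also a one-column data frame
df[, c("name", "age")]      # several columns: always a data frame
```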

Extract rows with a logical condition before the comma (leaving the column slot empty keeps all columns):

df[df$age > 19, ]
  name age passed
1  Ana  20   TRUE
2  Ben  21   TRUE
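Rows and columns can be selected together, and base R's subset() offers the same filtering with bare column names. The two differ when the condition contains NA; the df_na copy below is a hypothetical variant with one missing age.

```r
df <- data.frame(
  name = c("Ana", "Ben", "Chris"),
  age = c(20, 21, 19),
  passed = c(TRUE, TRUE, FALSE)
)

# Condition before the comma, columns after it
df[df$age > 19, c("name", "passed")]

# subset() evaluates column names directly inside the data frame
subset(df, age > 19 & passed, select = c(name, age))

# With a missing value, [ ] returns an all-NA row; subset() drops it
df_na <- df
df_na$age[2] <- NA
nrow(df_na[df_na$age > 19, ])   # 2: Ana plus a row of NAs
nrow(subset(df_na, age > 19))   # 1: Ana only
```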

Add a column by assigning to a new name:

df$status <- df$age > 20
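The new column can be any vector of matching length, and assigning NULL removes a column. A sketch; the group column and its labels are hypothetical.

```r
df <- data.frame(
  name = c("Ana", "Ben", "Chris"),
  age = c(20, 21, 19),
  passed = c(TRUE, TRUE, FALSE)
)
df$status <- df$age > 20     # FALSE TRUE FALSE

# A derived text column (hypothetical labels)
df$group <- ifelse(df$age >= 20, "20 or older", "under 20")
df$group                     # "20 or older" "20 or older" "under 20"

# Assigning NULL removes the column again
df$group <- NULL
names(df)                    # "name" "age" "passed" "status"
```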

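Combine rows with rbind() (the data frames must have the same columns) or add columns with cbind() (the row counts must match):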

rbind(df, df)
   name age passed status
1   Ana  20   TRUE  FALSE
2   Ben  21   TRUE   TRUE
3 Chris  19  FALSE  FALSE
4   Ana  20   TRUE  FALSE
5   Ben  21   TRUE   TRUE
6 Chris  19  FALSE  FALSE
cbind(df, new_col = 1:3)
   name age passed status new_col
1   Ana  20   TRUE  FALSE       1
2   Ben  21   TRUE   TRUE       2
3 Chris  19  FALSE  FALSE       3
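In practice rbind() is often used to append new rows. For data frames it matches columns by name, and it stops with an error when the number of columns differs. A sketch; the rows for "Dana" and "Eve" are hypothetical.

```r
df <- data.frame(
  name = c("Ana", "Ben", "Chris"),
  age = c(20, 21, 19),
  passed = c(TRUE, TRUE, FALSE),
  status = c(FALSE, TRUE, FALSE)
)

# A hypothetical new row with the same column names
new_row <- data.frame(name = "Dana", age = 22, passed = TRUE, status = TRUE)
df2 <- rbind(df, new_row)
nrow(df2)                    # 4

# Columns are matched by name, so their order may differ
df3 <- rbind(df, new_row[c("status", "passed", "age", "name")])
df3$name                     # "Ana" "Ben" "Chris" "Dana"

# A different number of columns is an error
res <- tryCatch(
  rbind(df, data.frame(name = "Eve")),
  error = function(e) conditionMessage(e)
)
res
```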

4.4 Matrices and Arrays

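A matrix is an atomic vector with a dim attribute giving its number of rows and columns.
As with vectors, all elements must be the same type. matrix() fills values column by column, which is why m[2, 3] is 8 below.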

m <- matrix(1:12, nrow = 3, ncol = 4)

m[2,3]   # row 2, column 3
[1] 8
m[2, ]   # row 2
[1]  2  5  8 11
m[, 4]   # column 4
[1] 10 11 12
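Because a matrix is a vector with dimensions, adding a dim attribute to a plain vector turns it into a matrix, and a single index reaches the underlying vector directly. A sketch; v and m2 are illustrative names.

```r
# A plain vector becomes a matrix once it has a dim attribute
v <- 1:6
dim(v) <- c(2, 3)
is.matrix(v)        # TRUE
v[2, 3]             # 6: values were laid out column by column

# byrow = TRUE fills row by row instead
m2 <- matrix(1:12, nrow = 3, byrow = TRUE)
m2[2, ]             # 5 6 7 8

# A single index uses the underlying vector:
# element [row, col] sits at position (col - 1) * nrow + row
m <- matrix(1:12, nrow = 3, ncol = 4)
m[(3 - 1) * 3 + 2]  # 8, the same element as m[2, 3]

# Mixed inputs are coerced to one type, as with vectors
typeof(cbind(1:2, c("a", "b")))   # "character"
```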

Matrix operations:

t(m)          # transpose
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
m * m         # element-wise
     [,1] [,2] [,3] [,4]
[1,]    1   16   49  100
[2,]    4   25   64  121
[3,]    9   36   81  144
m %*% t(m)    # matrix multiplication
     [,1] [,2] [,3]
[1,]  166  188  210
[2,]  188  214  240
[3,]  210  240  270
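Each entry of m %*% t(m) is a dot product: entry [i, j] is row i of m multiplied element-wise by row j and summed. %*% also requires the inner dimensions to agree. A sketch that checks one entry and shows the error for a non-conforming product:

```r
m <- matrix(1:12, nrow = 3, ncol = 4)
p <- m %*% t(m)
dim(p)                            # 3 3: (3 x 4) times (4 x 3)
p[1, 2] == sum(m[1, ] * m[2, ])   # TRUE: row 1 dotted with row 2 is 188

# ncol of the left operand must equal nrow of the right one (4 != 3 here)
res <- tryCatch(m %*% m, error = function(e) conditionMessage(e))
res                               # "non-conformable arguments"
```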

An array extends a matrix to any number of dimensions (a matrix is simply a 2D array):

a <- array(1:8, dim = c(2,2,2))
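Arrays are indexed with one position per dimension, and leaving dimensions empty returns a slice. For the array a above:

```r
a <- array(1:8, dim = c(2, 2, 2))
dim(a)          # 2 2 2
a[1, 2, 2]      # 7: row 1, column 2, layer 2
a[, , 1]        # the first layer, returned as a 2 x 2 matrix
a[, , 2]        # second layer: 5 7 in row 1, 6 8 in row 2
```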

4.5 Lists

A list can hold elements of any type and length, including vectors, matrices, data frames, and other lists.

my_list <- list(
  numbers = 1:3,
  name = "Ana",
  matrix = matrix(1:4, 2),
  df = df
)

my_list
$numbers
[1] 1 2 3

$name
[1] "Ana"

$matrix
     [,1] [,2]
[1,]    1    3
[2,]    2    4

$df
   name age passed status
1   Ana  20   TRUE  FALSE
2   Ben  21   TRUE   TRUE
3 Chris  19  FALSE  FALSE
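Because list elements can themselves be lists, structures nest. A sketch with a hypothetical config list; str() is often easier to read than printing a nested list in full.

```r
# A hypothetical list mixing lengths, types, and a nested list
config <- list(
  id = 7L,
  tags = c("x", "y", "z"),
  nested = list(flag = TRUE, values = 1:4)
)
length(config)          # 3 top-level elements
lengths(config)         # id 1, tags 3, nested 2: each element's own length
config$nested$values    # 1 2 3 4: chain $ to reach into nested lists
str(config)             # compact overview of the whole structure
```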

Subsetting lists:

my_list[1]      # returns sub-list
$numbers
[1] 1 2 3
my_list[[1]]    # returns element
[1] 1 2 3
my_list$name
[1] "Ana"
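The difference between [ ] and [[ ]] matters when you index further into the result, and $ assignment adds or removes elements. A sketch on a smaller my_list; the note element is hypothetical.

```r
my_list <- list(numbers = 1:3, name = "Ana")

is.list(my_list[1])        # TRUE: [ ] keeps the list wrapper
is.list(my_list[[1]])      # FALSE: [[ ]] returns the element itself
my_list[["numbers"]][2]    # 2: index into the extracted vector

my_list$note <- "added"    # assigning to a new name adds an element
my_list$name <- NULL       # assigning NULL removes an element
names(my_list)             # "numbers" "note"
```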

4.6 Summary: When to Use What

Structure    Same type?          Dimensions          Use case
Vector       Yes                 1D                  A single variable
Factor       Yes                 1D                  Categorical data
Data frame   Within each column  2D                  Tabular data
Matrix       Yes                 2D                  Math operations
Array        Yes                 Any (usually 3D+)   Multidimensional data
List         No                  Any (via nesting)   Complex objects

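Core idea: Everything in R builds from vectors. A factor is an integer vector with levels, a matrix or array is a vector with dimensions, a data frame is a list of equal-length column vectors, and a list is itself a generic vector.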
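typeof() and attributes() show the vector underneath each structure. A short sketch:

```r
typeof(factor(c("a", "b")))        # "integer": codes plus a levels attribute
typeof(matrix(1:4, nrow = 2))      # "integer": a vector plus a dim attribute
attributes(matrix(1:4, nrow = 2))  # $dim: 2 2
typeof(data.frame(x = 1:2))        # "list": one vector per column
is.vector(list(1, "a"))            # TRUE: a list is itself a (generic) vector
```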