1 Setup

In this section, we will set up our working environment and organize our project. A well-structured project is the foundation of a reproducible workflow.

1.1 Create a Project

First, create an RStudio Project by selecting File > New Project... from the menu.

Then download the code file and place it in your project directory: R Code

1.2 Load Required Packages

We will use a small number of packages throughout this workshop:

tidyverse for data manipulation and visualization
palmerpenguins for our example dataset

If you do not already have these installed, run:

install.packages(c("tidyverse", "palmerpenguins"))
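If you rerun the script often, you may prefer to install only the packages that are actually missing. The following is a minimal sketch of that idea; the helper name `install_if_missing` is hypothetical and not part of the workshop code.

```r
# Hypothetical helper: install only the packages that are not yet available.
install_if_missing <- function(pkgs) {
  # requireNamespace() returns FALSE (instead of raising an error)
  # when a package cannot be found
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!is_installed]

  if (length(to_install) > 0) {
    install.packages(to_install)
  }

  # Return the names of the packages that had to be installed, invisibly
  invisible(to_install)
}
```

Calling `install_if_missing(c("tidyverse", "palmerpenguins"))` once at the top of a script should then be a no-op on machines where both packages are already present.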

Now load the packages:

library(tidyverse)
library(palmerpenguins)

1.3 Set a Seed for Reproducibility

Setting a seed fixes the starting state of R's random number generator, so any random operations (such as splitting data into training and testing sets) produce the same results each time the code is run from the top.

set.seed(123)
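To see the effect, here is a small sketch using base R's `sample()`. The variable names `first_draw` and `second_draw` are illustrative only. Resetting the seed before each draw makes the two draws match.

```r
# Draw three numbers after setting the seed
set.seed(123)
first_draw <- sample(1:10, 3)

# Reset the seed to the same value and draw again
set.seed(123)
second_draw <- sample(1:10, 3)

# The two draws are the same because the generator started from the same state
identical(first_draw, second_draw)
```

Without the second `set.seed(123)` call, the second draw would continue from wherever the generator left off and would generally differ from the first.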

1.4 Project Organization

Before writing any analysis code, take a moment to organize your project.

In your RStudio Project, manually create the following folders:

data/ → for raw and cleaned datasets
outputs/ → for results, plots, and exported files

This structure helps keep your work organized and makes it easier to rerun or share your analysis.
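If you would rather create the folders from code so the setup itself can be rerun, a minimal sketch with base R could look like this. It assumes your working directory is the project root, which is the default when the RStudio Project is open.

```r
# Folders used in this workshop, relative to the project root
project_folders <- c("data", "outputs")

for (folder in project_folders) {
  # Create the folder only if it is missing, so reruns are harmless
  if (!dir.exists(folder)) {
    dir.create(folder)
  }
}
```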

Create and Save the Dataset

We will use the penguins dataset for this workshop.

To simulate working with an external dataset, we will write it to a CSV file inside the data/ folder. This allows us to practice reading data from a file, which is how most real-world workflows begin.

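# Path is relative to the project root; the data/ folder must already exist
example_path <- "data/penguins.csv"

# Write the file only if it is not already there, so reruns do not overwrite it
if (!file.exists(example_path)) {
  write_csv(penguins, example_path)
}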
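Once the file exists, it can be read back the same way an external dataset would be. The sketch below is self-contained, so it repeats the write step. The name `penguins_from_file` is an assumption for illustration, and `show_col_types = FALSE` is only there to silence the column-type message printed by recent versions of readr.

```r
library(readr)           # read_csv() and write_csv(); also loaded by tidyverse
library(palmerpenguins)  # provides the penguins data frame

example_path <- "data/penguins.csv"

# Make sure the folder and the file exist before reading
dir.create(dirname(example_path), showWarnings = FALSE)
if (!file.exists(example_path)) {
  write_csv(penguins, example_path)
}

# Read the CSV back in, as you would with any external dataset
penguins_from_file <- read_csv(example_path, show_col_types = FALSE)

# Same number of rows and columns as the original data frame
dim(penguins_from_file)
```

One thing to keep in mind: a CSV file does not store factor information, so columns such as `species`, `island`, and `sex` come back as character vectors rather than factors.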

1.5 Key Takeaway

A reproducible workflow starts with:

clearly organized folders
consistent file paths
explicit package usage
and code that can be rerun from start to finish

These small habits make a big difference as your projects grow in complexity.
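As one way to keep file paths consistent, you could define them once near the top of a script and reuse the variables everywhere else. This is only a sketch: the variable names and the `penguin_summary.csv` file name are hypothetical.

```r
# Define folder names once, relative to the project root
data_dir <- "data"
output_dir <- "outputs"

# Build full paths with file.path() instead of pasting strings by hand
penguins_path <- file.path(data_dir, "penguins.csv")
summary_path <- file.path(output_dir, "penguin_summary.csv")  # hypothetical output file

penguins_path
summary_path
```

If a folder is later renamed, only the first two lines need to change.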