1 Setup
In this section, we will set up our working environment and organize our project. A well-structured project is the foundation of a reproducible workflow.
1.1 Create a Project
First, create an RStudio Project by selecting File > New Project....
Then, download the code file and place it in your project directory: R Code
1.2 Load Required Packages
We will use a small number of packages throughout this workshop:
- tidyverse for data manipulation and visualization
- palmerpenguins for our example dataset
If you do not already have these installed, run:
install.packages(c("tidyverse", "palmerpenguins"))
Now load the packages:
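A minimal sketch of the loading step, assuming both packages installed successfully. The comments describe what each package is used for in this workshop:

```r
# Attach the core tidyverse packages (readr, dplyr, ggplot2, and others)
library(tidyverse)

# Attach palmerpenguins, which provides the `penguins` data frame
library(palmerpenguins)
```

Loading packages once, at the top of the script, makes the script's dependencies visible to anyone who reruns it.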
1.3 Set a Seed for Reproducibility
Setting a seed ensures that any random operations (such as splitting data into training and testing sets) produce the same results each time the code is run.
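To see what this means in practice, the short sketch below draws the same random sample twice after resetting the seed. The seed value 42 and the `sample()` call are illustrative assumptions only and are not part of the workshop code:

```r
# Draw five numbers after setting a seed
set.seed(42)
first_draw <- sample(1:100, 5)

# Reset the same seed and draw again
set.seed(42)
second_draw <- sample(1:100, 5)

# Both draws are identical because the generator started from the same state
identical(first_draw, second_draw)
```

For this workshop, the seed is set once near the top of the script: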
set.seed(123)
1.4 Project Organization
Before writing any code, it is important to organize your project.
In your RStudio Project, manually create the following folders:
- data/ → for raw and cleaned datasets
- outputs/ → for results, plots, and exported files
This structure helps keep your work organized and makes it easier to rerun or share your analysis.
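If you would rather script this step than create the folders by hand, the sketch below is one way to do it with base R. It assumes the working directory is the project root, which is the default when the RStudio Project is open:

```r
# Folder names used in this workshop
folders <- c("data", "outputs")

# Create each folder only if it is not already present
for (folder in folders) {
  if (!dir.exists(folder)) {
    dir.create(folder)
  }
}
```

Because each folder is checked before it is created, the snippet can be rerun safely and leaves existing folders and their contents untouched.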
Create and Save the Dataset
We will use the penguins dataset for this workshop.
To simulate working with an external dataset, we will write it to a CSV file inside the data/ folder. This allows us to practice reading data from a file, which is how most real-world workflows begin.
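# Path to the CSV copy of the dataset, relative to the project root
example_path <- "data/penguins.csv"
# Write the file only if it does not already exist (requires the data/ folder)
if (!file.exists(example_path)) {
  write_csv(penguins, example_path)
}
1.5 Key Takeaway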
A reproducible workflow starts with:
- clearly organized folders
- consistent file paths
- explicit package usage
- and code that can be rerun from start to finish
These small habits make a big difference as your projects grow in complexity.