Data Science Workflows in R
Welcome

This workshop introduces participants to building reproducible data science workflows in R. Rather than focusing on programming syntax alone, we will walk through the full lifecycle of a data analysis project—from data cleaning and exploration to modeling and evaluation—using a single, consistent workflow.
Participants will gain hands-on experience with real data, with an emphasis on careful data preparation, clear project organization, and sound model evaluation. We will also introduce foundational modeling techniques, including linear and logistic regression, and discuss how these workflows scale to larger datasets and high-performance computing environments.
The content for this workshop is developed and taught by the OIT Research Technology Services team at SMU in collaboration with SMU Libraries.
Objectives
- Understand the structure of a reproducible data science workflow in R.
- Learn how to organize projects and work with data in a consistent, repeatable way.
- Gain experience cleaning and preparing real-world datasets for analysis.
- Perform exploratory data analysis using visualization and summary statistics.
- Build and interpret linear and logistic regression models.
- Learn how to properly evaluate models using train/test splits.
- Recognize common pitfalls such as overfitting and extrapolation.
- Understand how these workflows can scale to larger datasets and computing environments.
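
As a rough illustration of the modeling and evaluation objectives above, the following sketch splits a data set into training and test portions, fits a linear and a logistic regression on the training rows, and scores both on the held-out rows. It is only a sketch under assumptions: the built-in `mtcars` data set, the 80/20 split, the seed, and the choice of `mpg`, `wt`, and `am` as variables are illustrative and are not taken from the workshop materials.

```r
# Minimal sketch in base R. The data set (mtcars), the 80/20 split, and the
# variables used here are assumptions for illustration only.
set.seed(42)  # fix the random seed so the split is repeatable

n         <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train     <- mtcars[train_idx, ]
test      <- mtcars[-train_idx, ]

# Linear regression: fuel economy (mpg) as a function of weight (wt)
fit_lm <- lm(mpg ~ wt, data = train)

# Evaluate on rows the model did not see during fitting
pred_lm <- predict(fit_lm, newdata = test)
rmse    <- sqrt(mean((test$mpg - pred_lm)^2))

# Logistic regression: transmission type (am: 0 = automatic, 1 = manual)
fit_glm  <- glm(am ~ wt, data = train, family = binomial)
prob_glm <- predict(fit_glm, newdata = test, type = "response")

# Classify with a 0.5 cutoff and compute held-out accuracy
accuracy <- mean((prob_glm > 0.5) == (test$am == 1))

print(rmse)
print(accuracy)
```

Scoring on the test rows rather than the training rows is what makes overfitting visible: a model that only memorizes the training data will look good in-sample and worse here. With a data set this small the held-out estimates are noisy, so the numbers should be read as a demonstration of the procedure rather than as a reliable measure of model quality.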
License
The content in this workshop is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
