Welcome

This is the free website for Design and Analysis of Experiments and Observational Studies using R. A hardcopy of the book can be purchased from Routledge.

This book grew out of course notes for a twelve-week (one-term) course on the Design of Experiments and Observational Studies in the Department of Statistical Sciences at the University of Toronto. Students are senior undergraduates and applied Masters students who have completed courses in probability, mathematical statistics, and regression analysis.

The purposes of the book are to expose students to the foundations of classical experimental design and the design of observational studies through the framework of causality, and to use real data and computational tools, such as simulation, to explore these topics. The book uses R to implement designs and analyse data.

It’s assumed that the reader has taken basic courses in probability, mathematical statistics, and linear models, although the essentials are reviewed briefly in the first chapter. Some experience using R is helpful, although not essential. I assume that readers are familiar with standard base R and tidyverse syntax. In the course at the University of Toronto, students are given learning resources at the beginning of the term to review these R basics, although most students have already had some exposure to computing with R.

This website is free to use, and is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License.

About the Author

Nathan Taback is an Associate Professor, Teaching Stream, in the Department of Statistical Sciences, University of Toronto.

Organization of the book

Each chapter presents concepts or methods, followed by a section that shows readers how to implement them in R. These sections are labeled “Computational Lab: Topic”, where “Topic” is the topic that is implemented in R.

Software information and conventions

One of the unique features of this book is the emphasis on simulation and computation using R. R is wonderful because of the many open source packages available, but this can also lead to confusion about which packages to use for a task. I have tried to minimize the number of packages used in the book. The set of packages loaded on startup by default is

getOption("defaultPackages")
#> [1] "datasets"  "utils"     "grDevices" "graphics" 
#> [5] "stats"     "methods"

plus base. If a function from a non-default package is used, this is indicated by pkg::name instead of

library(pkg)
name

This should make it clear which package provides each function, and therefore which packages a reader needs to install before running the code.
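As a minimal sketch of this convention, the snippet below calls a function from tools, a package that ships with R but is not in the default set listed above. The file name is made up for illustration.

```r
# tools ships with R but is not attached at startup,
# so its functions are called with the pkg::name form.
ext <- tools::file_ext("trial_data.csv")  # hypothetical file name
ext

# The equivalent style that the pkg::name form replaces:
# library(tools)
# file_ext("trial_data.csv")

# pkg::name only requires that the package is installed;
# requireNamespace() checks this without attaching the package.
tools_available <- requireNamespace("tools", quietly = TRUE)
```

With pkg::name the package is never attached to the search path, so the call still shows where the function comes from when it is read out of context.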

Information on the R version used to write this book is below.

version
#>                _                           
#> platform       x86_64-apple-darwin17.0     
#> arch           x86_64                      
#> os             darwin17.0                  
#> system         x86_64, darwin17.0          
#> status                                     
#> major          4                           
#> minor          2.2                         
#> year           2022                        
#> month          10                          
#> day            31                          
#> svn rev        83211                       
#> language       R                           
#> version.string R version 4.2.2 (2022-10-31)
#> nickname       Innocent and Trusting

The packages used in writing this book are:

R code

Whenever possible, the R code developed in this book is written as a function instead of a series of statements. “Functions allow you to automate common tasks in a more powerful and general way than copy-and-pasting.”1 In fact, I have taken the approach that once I’ve copied and pasted a block of code more than twice, it’s time to write a function.
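To illustrate the idea, here is a sketch of the kind of small function that can replace a repeated block of code. The name randomize_two_arms and its arguments are hypothetical and do not come from the book.

```r
# Hypothetical helper: randomly assign n units to two arms,
# with arm sizes as equal as possible.
# Writing it once avoids copying the sample() call for every new experiment.
randomize_two_arms <- function(n, arms = c("A", "B")) {
  # Build a balanced vector of arm labels, then shuffle it.
  labels <- rep(arms, length.out = n)
  sample(labels)
}

set.seed(2023)  # arbitrary seed so the assignment is reproducible
assignment <- randomize_two_arms(10)
table(assignment)
```

Changing the number of units or the arm labels now means changing an argument, not editing several pasted copies of the same code.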

The value an R function returns is the value of the last expression evaluated. return() can be used to exit a function and return a value before the last expression is reached. Many of the functions in this book call return() explicitly to make the code easier to read, even when the value returned is the last one evaluated.
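The following sketch shows both uses of return() in one function. The function mean_difference is a hypothetical example written for this note, not code from the book.

```r
# Hypothetical helper: difference in mean outcome between two groups.
mean_difference <- function(y, group) {
  levels_found <- sort(unique(group))
  if (length(levels_found) != 2) {
    # Early exit: return() hands back a value before the last expression.
    return(NA_real_)
  }
  d <- mean(y[group == levels_found[2]]) - mean(y[group == levels_found[1]])
  # Explicit return() of the last value, purely for readability.
  return(d)
}

mean_difference(c(1, 2, 3, 4), c(0, 0, 1, 1))
#> [1] 2
mean_difference(c(1, 2, 3), c(0, 0, 0))
#> [1] NA
```

Dropping the final return() and ending the function with d would give the same result.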

R 4.1.0 introduced a simple native forward pipe syntax, |>. In its simple form, the forward pipe inserts the left-hand side as the first argument of the right-hand side call. The pipe used in this book is %>% from the magrittr package. Most of the code in this book should work with the native pipe |>, although this has not been thoroughly tested.
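For a simple chain of calls the two pipes are interchangeable, as the sketch below suggests. It assumes R 4.1.0 or later for |>, and the magrittr part runs only if that package is installed. The vector x is made up for illustration.

```r
x <- c(4, 9, 16)

# Nested call, with no pipe.
nested <- sum(sqrt(x))

# Native pipe (requires R >= 4.1.0): x becomes the first argument of sqrt(),
# and that result becomes the first argument of sum().
native <- x |> sqrt() |> sum()

# magrittr pipe, guarded in case the package is not installed.
if (requireNamespace("magrittr", quietly = TRUE)) {
  `%>%` <- magrittr::`%>%`
  print(x %>% sqrt() %>% sum())
}

identical(nested, native)
```

The two pipes differ in some more advanced uses, such as magrittr's . placeholder, so code that relies on those features may need changes before it works with |>.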

Data sets

The data sets used in this book are available in the R package scidesignR, and can be installed by running install.packages("scidesignR").
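Once the package is installed, its data sets can be listed from within R. The helper below is a hypothetical convenience function, not part of scidesignR. It relies only on requireNamespace() and utils::data(), and returns an empty vector if the package is missing.

```r
# Hypothetical helper: list the data sets an installed package provides.
list_datasets <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed; ",
            "run install.packages(\"", pkg, "\") first.")
    return(character(0))
  }
  # data(package = ) returns an object whose $results matrix
  # has one row per data set and an "Item" column with the names.
  utils::data(package = pkg)$results[, "Item"]
}

list_datasets("scidesignR")
```

A listed data set can then be referred to as scidesignR::name, in keeping with the pkg::name convention used throughout the book.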

Acknowledgments

I would like to thank all the students and instructors who used my notes and provided valuable feedback. Michael Moon provided excellent assistance in developing many of the exercises. Monika, Adam, and Oliver, as usual, provided sustained support throughout this project.