This book contains documented R examples to accompany several chapters of the popular data mining textbook Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach, Anuj Karpatne and Vipin Kumar. This companion book can be used with either edition: the 1st edition (Tan, Steinbach, and Kumar 2005) or the 2nd edition (Tan et al. 2017).

The code examples collected in this book were developed for the course CS 7331 - Data Mining, taught at SMU since Spring 2013, and will be regularly updated and improved. The book follows a learning-by-doing approach: the code examples are self-contained, so you can copy and paste a portion of the code, try it out on the provided dataset, and then apply it directly to your own data.

The latest update uses popular packages from the tidyverse meta-package (Wickham 2023b), including ggplot2 (Wickham, Chang, et al. 2023), for data wrangling and visualization, along with caret (M. Kuhn 2023) for model building and evaluation. Presentation slides and other instructor resources are available on the book’s GitHub page. To submit corrections or suggest improvements, please use the edit function within this book or visit the book’s GitHub project page. To cite this book, use:

Michael Hahsler (2021). An R Companion for Introduction to Data Mining. Online Book.

I hope this book helps you to learn to use R more efficiently for your data mining projects.

Michael Hahsler


The online version of this book is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

The cover art is based on “rocks” by stebulus licensed with CC BY 2.0.