This book contains documented R examples to accompany several chapters of the popular data mining textbook Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach, Anuj Karpatne and Vipin Kumar. The companion book can be used with either edition: 1st edition (Tan, Steinbach, and Kumar 2005) or 2nd edition (Tan et al. 2017).

The code examples collected in this book were developed for the course CS 7331 - Data Mining, taught at SMU since Spring 2013, and will be regularly updated and improved. The latest update uses popular packages from the tidyverse meta-package (Wickham 2021c), including ggplot2 (Wickham, Chang, et al. 2021), for data wrangling and visualization, along with caret (Kuhn 2021) for model building.

Please use the edit function within this book or visit the book’s GitHub project page to submit corrections or suggest improvements. To cite this book, use:

Michael Hahsler (2021). An R Companion for Introduction to Data Mining. Online Book.

I hope this book helps you to learn to use R more efficiently for your data mining projects.

Michael Hahsler


The online version of this book is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

The cover art is based on “rocks” by stebulus, licensed under CC BY 2.0.