Preface

This companion book contains documented R examples to accompany several chapters of the popular data mining textbook Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach, Anuj Karpatne and Vipin Kumar. It does not cover the theory and is therefore not intended to replace the textbook; rather, it is a guide to be used alongside it. The companion book can be used with either edition: 1st edition (Tan, Steinbach, and Kumar 2005) or 2nd edition (Tan et al. 2017). The sections are numbered to match the 2nd edition. Sections marked with an asterisk contain additional content that is not covered in the textbook.

The code examples collected in this book were developed for the course CS 5/7331 Data Mining, taught at the advanced undergraduate and graduate level in the Computer Science Department at SMU since Spring 2013, and will be regularly updated and improved. The book follows a learning-by-doing approach. The code examples are self-contained, so you can copy and paste a portion of the code, try it out on the provided dataset, and then apply it directly to your own data. Instructors can use this companion as a component of an introductory data mining course for advanced undergraduates and graduate students who are proficient in programming and have basic knowledge of statistics. A complete set of slides (PDF and PowerPoint) is provided on the book’s GitHub page.

The latest update uses the popular packages in the tidyverse meta-package (Wickham 2023c), including ggplot2 (Wickham, Chang, et al. 2024), for data wrangling and visualization, along with caret (M. Kuhn 2023) for model building and evaluation. Please use the edit function within this book or visit the book’s GitHub project page to submit corrections or suggest improvements. To cite this book, use:

Michael Hahsler (2021). An R Companion for Introduction to Data Mining. Online Book. https://mhahsler.github.io/Introduction_to_Data_Mining_R_Examples/book/

I hope this book helps you to learn to use R more efficiently for your data mining projects.

Michael Hahsler

License

The online version of this book is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

The cover art is based on “rocks” by stebulus, licensed under CC BY 2.0.