---
title: >
  DGP Classification using Stochastic Imputation
output: rmarkdown::html_vignette
bibliography: references.bib
description: >
  DGP classification of the iris data set.
vignette: >
  %\VignetteIndexEntry{DGP Classification using Stochastic Imputation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  eval = FALSE
)
```

This vignette demonstrates how to use the package to classify the popular iris data set [@anderson1935irises].

## Load packages and data

We start by loading the required packages,

```{r}
library(dgpsi)
library(dplyr)
```

We now load the iris data set,

```{r}
data(iris)
```

and apply a min-max normalization to its four input variables:

```{r}
# Rescale each of the four input columns to the range [0, 1]
iris <- iris %>%
  mutate(across(1:4, ~ (. - min(.)) / (max(.) - min(.))))
```

Before building a classifier, we set a seed with `set_seed()` from the package for reproducibility

```{r}
set_seed(9999)
```

and split the data into a training set and a testing set:

```{r}
# Hold out 30 randomly chosen observations for testing
test_idx <- sample(seq_len(nrow(iris)), size = 30)
train_data <- iris[-test_idx, ]
test_data <- iris[test_idx, ]

# Columns 1 to 4 are the inputs; column 5 (Species) is the class label
X_train <- as.matrix(train_data[, 1:4])
Y_train <- as.matrix(train_data[, 5])
X_test <- as.matrix(test_data[, 1:4])
Y_test <- as.matrix(test_data[, 5])
```

## Construct and train a DGP classifier

We consider a three-layer DGP classifier, using a Matérn-2.5 kernel in the first layer and a squared exponential kernel in the second layer:

```{r}
m_dgp <- dgp(X_train, Y_train, depth = 3, name = c('matern2.5', 'sexp'), likelihood = "Categorical")
```

```
## Auto-generating a 3-layered DGP structure ... done
## Initializing the DGP emulator ... done
## Training the DGP emulator:
## Iteration 500: Layer 3: 100%|██████████| 500/500 [00:31<00:00, 15.63it/s]
## Imputing ... done
```

We set `likelihood = "Categorical"` since the DGP classifier is essentially a DGP emulator with a categorical likelihood.

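
To make the role of a categorical likelihood more concrete, the following minimal sketch (base R only, independent of `dgpsi`) shows how one latent output per class can be turned into class probabilities and sampled labels. It assumes a softmax link; the helper `softmax()` and the latent values are hypothetical illustrations, not the package's internal implementation.

```{r}
# Hypothetical helper: map one latent value per class to class probabilities.
# The softmax link is an assumption made for illustration only.
softmax <- function(f) {
  z <- exp(f - max(f))  # subtract the maximum for numerical stability
  z / sum(z)
}

# Illustrative (made-up) latent outputs for the three iris species
# at a single input position
latent <- c(setosa = 2.0, versicolor = 0.5, virginica = -1.0)
probs <- softmax(latent)

# Draw labels from the resulting categorical distribution
set.seed(1)
labels <- sample(names(probs), size = 100, replace = TRUE, prob = probs)

# Empirical label proportions at this input position
table(labels) / length(labels)
```

Repeated draws of this kind yield label proportions for a single input position, with the class that has the largest latent value receiving the highest probability.
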
## Validation

We are now ready to validate the classifier via `validate()` at 30 out-of-sample testing positions:

```{r}
m_dgp <- validate(m_dgp, X_test, Y_test)
```

```
## Initializing the OOS ... done
## Calculating the OOS ... done
## Saving results to the slot 'oos' in the dgp object ... done
```

Finally, we visualize the OOS validation for the classifier:

```{r}
plot(m_dgp, X_test, Y_test)
```

```
## Validating and computing ... done
## Post-processing OOS results ... done
## Plotting ... done
```

![](https://raw.githubusercontent.com/mingdeyu/dgpsi-R/master/vignettes/images/classifier_oos_1.png){width=100%}

By default, `plot()` displays true labels against predicted label proportions at each input position. Alternatively, setting `style = 2` in `plot()` generates a confusion matrix:

```{r}
plot(m_dgp, X_test, Y_test, style = 2)
```

```
## Validating and computing ... done
## Post-processing OOS results ... done
## Plotting ... done
```

![](https://raw.githubusercontent.com/mingdeyu/dgpsi-R/master/vignettes/images/classifier_oos_2.png){width=100%}

### References