Clover plot options: new observations

This section describes how new observations with unknown class labels are handled. Both their visualisation and their classification are considered.

Visualisation

The basic clover plot is constructed from observations with known class labels (samples X and Y). If new observations with unknown class labels are available, they can be visualised in the clover plot once their depths, bagdistances, illuminations and Mahalanobis distances with respect to samples X and Y have been computed.
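Of the four quantities, the Mahalanobis distance is the simplest to reproduce outside the package. The sketch below computes it for new observations with base R's “mahalanobis” function, which returns squared distances, using the sample mean and sample covariance matrix of each training sample. That the clover package uses exactly these estimates is an assumption. Base R “rnorm” stands in for “rmvnorm” here, which gives the same distributions for an identity covariance matrix.

```r
set.seed(2020)
# Training samples (identity covariance, so independent rnorm() draws suffice)
X <- cbind(rnorm(100), rnorm(100))         # class 1, mean (0, 0)
Y <- cbind(rnorm(75, 2), rnorm(75, 2))     # class 2, mean (2, 2)
Z <- cbind(rnorm(10, 1), rnorm(10, 1))     # new observations, labels unknown

# mahalanobis() returns squared distances, so take the square root
dX <- sqrt(mahalanobis(Z, center = colMeans(X), cov = cov(X)))
dY <- sqrt(mahalanobis(Z, center = colMeans(Y), cov = cov(Y)))

# One row per new observation: its distances w.r.t. samples X and Y
coords <- cbind(dX = dX, dY = dY)
print(round(coords, 3))
```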

First we generate the training data (samples X and Y) from two bivariate normal distributions with different means and the same covariance matrix, together with new observations Z to be classified. Then the quantities needed to construct the clover plot are computed:

library(mvtnorm)

n1 <- 100
n2 <- 75
nZ <- 10

set.seed(2020)

X <- rmvnorm(n1, mean=c(0,0), sigma=diag(2)) # Observations of class 1
Y <- rmvnorm(n2, mean=c(2,2), sigma=diag(2)) # Observations of class 2

# New observations with unknown class labels: nZ/2 drawn from each class distribution
Z <- rbind(rmvnorm(nZ/2, mean=c(0,0), sigma=diag(2)),
           rmvnorm(nZ/2, mean=c(2,2), sigma=diag(2)))

res <- clover_calc(X, Y, Z=Z)

The new observations Z can be visualised in the clover plot (red points) by setting “drawZ=TRUE” in the “clover_plot” function. Note that the coordinates of the new observations Z in the clover plot must first be computed by the “clover_calc” function (as shown above); they are returned in its output:

clover_plot(res, drawZ=TRUE)

## Misclassification rate of the QDA classifier:   0.0857
## Non-classification rate of the QDA classifier:  0
## 
## Misclassification rate of the DD1 classifier:   0.0914
## Non-classification rate of the DD1 classifier:  0.0686
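The two rates can be read as follows, assuming both are computed over all training observations and that an observation left unclassified (label 0) counts towards the non-classification rate only. Under this assumption, with 100 + 75 = 175 training observations, 0.0857 corresponds to 15 misclassified observations (15/175), and the DD1 figures to 16 misclassified (16/175) and 12 unclassified (12/175) observations. The helper “class_rates” below is hypothetical and not part of the clover package.

```r
# Hypothetical helper (not part of the clover package).
# Labels: 1 or 2 = assigned class, 0 = observation left unclassified.
class_rates <- function(truth, pred) {
  n <- length(truth)
  c(misclassification  = sum(pred != 0 & pred != truth) / n,  # wrong class assigned
    non_classification = sum(pred == 0) / n)                  # no class assigned
}

truth <- c(1, 1, 1, 2, 2, 2)
pred  <- c(1, 2, 0, 2, 2, 0)
class_rates(truth, pred)   # misclassification 1/6, non-classification 2/6
```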

In two dimensions the data and the new observations (red points) can, of course, also be visualised directly:

clover_plot_data(res, Z=Z)

Classification of new observations

Class labels for the new observations can be obtained for a specified set of classifiers with the “clover_classify” function. Its argument “classifiers” determines which types of classifiers are used. For example, with the option “classifiers=c("QDA","DD")” the QDA classifier is used together with the DD0, DD1 and DD2 classifiers.
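The QDA classifier assigns an observation to the class with the larger estimated Gaussian log-density, which reduces to a squared Mahalanobis distance plus a log-determinant term. A minimal sketch, assuming equal prior probabilities and the sample mean and covariance of each training sample (the clover package's implementation may differ in these details); the functions “qda_score” and “qda_classify” are hypothetical:

```r
# Gaussian log-density of each row of Z under the fitted class S,
# up to a constant shared by both classes:
#   -0.5 * (squared Mahalanobis distance) - 0.5 * log(det(Sigma))
qda_score <- function(Z, S) {
  mu <- colMeans(S)
  Sigma <- cov(S)
  -0.5 * mahalanobis(Z, mu, Sigma) - 0.5 * log(det(Sigma))
}

# Equal priors assumed: pick the class with the larger score
qda_classify <- function(Z, X, Y) {
  ifelse(qda_score(Z, X) >= qda_score(Z, Y), 1L, 2L)
}

set.seed(1)
X <- cbind(rnorm(100), rnorm(100))          # class 1, centred at (0, 0)
Y <- cbind(rnorm(75, 2), rnorm(75, 2))      # class 2, centred at (2, 2)
Z <- rbind(c(0, 0), c(2, 2), c(-1, -1), c(3, 3))
qda_classify(Z, X, Y)
```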

Only the type of classifier (“DD”) is specified because, if any of the DDn classifiers is requested, the empirical depths of the new observations must be computed, and the other DDn classifiers can then be obtained at essentially no extra computational cost. Similarly, if the “BB” type is selected, both the BB0 and BB1 classifiers are used, and if the “II” type is selected, both the II0 and II1 classifiers are used.
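The cost argument can be sketched as follows: once the depths of the new observations with respect to X and Y are available, any rule working in the DD-plane (depth w.r.t. X against depth w.r.t. Y) is a cheap function of the same two numbers. The sketch approximates the halfspace depth in 2D with a finite set of random directions, which gives an upper bound on the exact depth. The two rules shown, a maximum-depth rule and a linear separator with a hypothetical slope “k”, are illustrations only and are not the package's definitions of DD0, DD1 or DD2.

```r
# Approximate halfspace (Tukey) depth of each row of Z w.r.t. sample S,
# minimising over a finite set of random directions.
halfspace_depth <- function(Z, S, n_dir = 500) {
  theta <- runif(n_dir, 0, 2 * pi)
  U <- cbind(cos(theta), sin(theta))   # unit directions in 2D
  pS <- S %*% t(U)                     # projections of the sample, n x n_dir
  pZ <- Z %*% t(U)                     # projections of new points, m x n_dir
  apply(pZ, 1, function(z) {
    # per direction: fraction of sample points at or beyond z; take the minimum
    min(colMeans(sweep(pS, 2, z, ">=")))
  })
}

set.seed(2020)
X <- cbind(rnorm(100), rnorm(100))
Y <- cbind(rnorm(75, 2), rnorm(75, 2))
Z <- rbind(c(0, 0), c(2, 2), c(10, 10))

# The expensive step is done once ...
D <- cbind(dX = halfspace_depth(Z, X), dY = halfspace_depth(Z, Y))

# ... and rules in the DD-plane are then essentially free.
max_depth_rule <- function(D) {
  # 0 = no decision when the depths tie, e.g. both are zero
  ifelse(D[, 1] > D[, 2], 1L, ifelse(D[, 2] > D[, 1], 2L, 0L))
}
linear_dd_rule <- function(D, k = 1.2) {
  # hypothetical separating line dX = k * dY through the origin
  ifelse(D[, 1] == 0 & D[, 2] == 0, 0L, ifelse(D[, 1] > k * D[, 2], 1L, 2L))
}
max_depth_rule(D)
linear_dd_rule(D)
```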

a <- clover_classify(Z, res, classifiers=c("QDA","DD"))
a  # print the predicted class labels

## Computing classification based on Mahalanobis distances (QDA).
## Computing classification based on halfspace depth (DD0, DD1, DD2).
## Done.

## $QDA.class
##  [1] 1 1 1 1 1 2 2 2 2 2
## 
## $DD0.class
##  [1] 1 1 1 1 1 0 2 2 0 2
## 
## $DD1.class
##  [1] 1 1 1 1 1 0 2 2 0 2
## 
## $DD2.class
##  [1] 1 1 1 1 1 0 2 2 0 2
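Because Z was generated as five points from the class-1 distribution followed by five from the class-2 distribution, the printed labels can be compared with these generating labels (keeping in mind that, with overlapping distributions, the generating label is not guaranteed to be the only reasonable class). A short sketch with the label vectors copied from the output above; DD1 and DD2 gave the same labels as DD0:

```r
# Generating labels of Z: 5 draws from class 1, then 5 from class 2
truth <- rep(c(1, 2), each = 5)
qda   <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)   # $QDA.class above
dd0   <- c(1, 1, 1, 1, 1, 0, 2, 2, 0, 2)   # $DD0.class above (0 = unclassified)

mean(qda != truth)               # share misclassified by QDA: 0
mean(dd0 == 0)                   # share left unclassified by DD0: 0.2
mean(dd0 != 0 & dd0 != truth)    # share misclassified by DD0: 0
table(QDA = qda, DD0 = dd0)      # cross-tabulation of the two label vectors
```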

Note that the “clover_classify” function does not require the object “res” to contain information about the new observations, i.e. “res” could have been computed simply as “res <- clover_calc(X, Y)”. In other words, when further new observations, say W, are acquired, they can be classified without calling the “clover_calc” function again.
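This is a fit-once, classify-later design: everything the classifiers need is derived from the training samples X and Y alone. A minimal sketch of the pattern, using hypothetical functions “fit_mahal” and “predict_mahal” with a simple minimum-Mahalanobis-distance rule (not the package's own code):

```r
# Summarise the training samples once ...
fit_mahal <- function(X, Y) {
  list(mu = list(colMeans(X), colMeans(Y)), Sigma = list(cov(X), cov(Y)))
}

# ... and classify any later batch of points from the stored summaries only.
predict_mahal <- function(fit, W) {
  d1 <- mahalanobis(W, fit$mu[[1]], fit$Sigma[[1]])
  d2 <- mahalanobis(W, fit$mu[[2]], fit$Sigma[[2]])
  ifelse(d1 <= d2, 1L, 2L)   # class with the smaller (squared) distance
}

set.seed(2020)
X <- cbind(rnorm(100), rnorm(100))          # class 1
Y <- cbind(rnorm(75, 2), rnorm(75, 2))      # class 2
fit <- fit_mahal(X, Y)

predict_mahal(fit, rbind(c(0, 0), c(2, 2)))  # a first batch
predict_mahal(fit, rbind(c(-1, -1)))         # a later batch, no refitting
```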

W <- rmvnorm(nZ, mean=c(0,0), sigma=diag(2)) # Further new observations to be classified
clover_classify(W,res,classifiers=c("QDA","DD"))

## Computing classification based on Mahalanobis distances (QDA).
## Computing classification based on halfspace depth (DD0, DD1, DD2).
## Done.

## $QDA.class
##  [1] 1 1 1 1 1 1 2 1 1 1
## 
## $DD0.class
##  [1] 1 1 1 1 1 1 2 1 1 1
## 
## $DD1.class
##  [1] 1 1 1 1 1 1 2 1 1 1
## 
## $DD2.class
##  [1] 1 1 1 1 1 1 1 1 1 1
