library("seriation")
data("iris")
<- as.matrix(iris[sample(nrow(iris)), -5])
x <- dist(x) d
A Comparison of Seriation Methods
Introduction
This document compares the seriation methods available in package using the popular Iris data set and randomize the order of the objects.
Preparing the data
Distance seriation
Methods
We first register more seriation methods. Some of these methods require the installation of more packages.
register_DendSer()
register_optics()
register_smacof()
The following methods will be used (a few slow methods are skipped).
<- sort(list_seriation_methods("dist"))
methods
# skip slow methods
<- setdiff(methods, c("BBURCG", "BBWRCG", "Enumerate", "GSA"))
methods methods
[1] "ARSA" "DendSer" "DendSer_ARc" "DendSer_BAR"
[5] "DendSer_LPL" "DendSer_PL" "GW" "GW_average"
[9] "GW_complete" "GW_single" "GW_ward" "HC"
[13] "HC_average" "HC_complete" "HC_single" "HC_ward"
[17] "Identity" "isomap" "isoMDS" "MDS"
[21] "MDS_angle" "MDS_smacof" "metaMDS" "monoMDS"
[25] "OLO" "OLO_average" "OLO_complete" "OLO_single"
[29] "OLO_ward" "optics" "QAP_2SUM" "QAP_BAR"
[33] "QAP_Inertia" "QAP_LS" "R2E" "Random"
[37] "Reverse" "Sammon_mapping" "SGD" "Spectral"
[41] "Spectral_norm" "SPIN_NH" "SPIN_STS" "TSP"
[45] "VAT"
Details about the method can be found in the manual page for seriate()
.
Performing seriation
We use a loop to run the function seriate()
with each method and calculate criterion measures which indicate how good the order is.
<- list()
orders <- list()
criterion for (m in methods) {
cat(m)
<- system.time(orders[[m]] <- seriate(d, method = m))
tm <- data.frame(time = tm[1]+tm[2], rbind(criterion(d, orders[[m]])))
criterion[[m]] cat(" took", tm[1]+tm[2], "sec.\n")
}
ARSA took 0.776 sec.
DendSer took 0.728 sec.
DendSer_ARc took 1.014 sec.
DendSer_BAR took 0.816 sec.
DendSer_LPL took 1.057 sec.
DendSer_PL took 0.914 sec.
GW took 0.107 sec.
GW_average took 0.113 sec.
GW_complete took 0.102 sec.
GW_single took 0.108 sec.
GW_ward took 0.113 sec.
HC took 0.113 sec.
HC_average took 0.114 sec.
HC_complete took 0.113 sec.
HC_single took 0.114 sec.
HC_ward took 0.117 sec.
Identity took 0.108 sec.
isomap
Registered S3 method overwritten by 'vegan':
method from
reorder.hclust gclus
took 0.144 sec.
isoMDS took 0.14 sec.
MDS took 0.108 sec.
MDS_angle took 0.117 sec.
MDS_smacof took 0.12 sec.
metaMDS took 0.428 sec.
monoMDS took 0.116 sec.
OLO took 0.103 sec.
OLO_average took 0.104 sec.
OLO_complete took 0.101 sec.
OLO_single took 0.104 sec.
OLO_ward took 0.105 sec.
optics took 0.11 sec.
QAP_2SUM took 0.139 sec.
QAP_BAR took 0.119 sec.
QAP_Inertia took 0.118 sec.
QAP_LS took 0.118 sec.
R2E took 0.118 sec.
Random took 0.103 sec.
Reverse took 0.104 sec.
Sammon_mapping took 0.111 sec.
SGD took 48.63 sec.
Spectral took 0.11 sec.
Spectral_norm took 0.119 sec.
SPIN_NH took 0.424 sec.
SPIN_STS took 0.534 sec.
TSP took 0.114 sec.
VAT took 0.109 sec.
<- do.call(rbind, criterion) criterion
We align the seriation orders. The reason is that an order 1, 2, 3 and 3, 2, 1 are equivalent and just an artifact of the algorithm. Aligning will reverse some orders so they are in roughly the same order. Then we sort the orders from best to worst according to a popular seriation criterion measure called Gradient_weighted
.
<- ser_align(orders)
orders <- order(criterion[["Gradient_weighted"]], decreasing = TRUE)
best_to_worse
<- orders[best_to_worse]
orders <- criterion[best_to_worse, ] criterion
Visualize the results
Plot the reordered dissimilarity matrices. Dark blocks along the main diagonal mean that the order reveals a “cluster” of similar objects. The Iris dataset contains three species, but two of them are very similar, so we expect to see one smaller block and one larger block.
for (n in names(orders))
pimage(d, orders[[n]], main = n , key = FALSE)
Comparison between methods
We can compare the seriation methods by how similar the orders are that they produce. The following code calculates distances between orders and then performs hierarchical clustering.
<- ser_dist(orders)
dst hmap(dst)
Here are the criterion measures. Some are maximized and some should be minimized. Details about the measures can be found in the manual page for criterion()
.
::datatable(round(criterion, 2)) DT
Matrix seriation
Matrix seriation reorders rows and columns of a data matrix. We perform the same steps as for distances in the previous section.
Methods
<- sort(list_seriation_methods("matrix"))
methods
methods
[1] "BEA" "BEA_TSP" "CA" "Heatmap" "Identity" "LLE"
[7] "Mean" "PCA" "PCA_angle" "Random" "Reverse"
Performing seriation
<- list()
orders <- list()
criterion for (m in methods) {
cat(m)
<- system.time(orders[[m]] <- seriate(x, method = m))
tm <- data.frame(time = tm[1]+tm[2], rbind(criterion(x, orders[[m]])))
criterion[[m]] cat(" took", tm[1]+tm[2], "sec.\n")
}
BEA took 0.122 sec.
BEA_TSP took 0.343 sec.
CA took 0.117 sec.
Heatmap took 0.339 sec.
Identity took 0.113 sec.
LLE took 0.142 sec.
Mean took 0.113 sec.
PCA took 0.113 sec.
PCA_angle took 0.114 sec.
Random took 0.114 sec.
Reverse took 0.111 sec.
<- do.call(rbind, criterion) criterion
Visualize the results
<- order(criterion[["Moore_stress"]], decreasing = FALSE)
best_to_worse
<- orders[best_to_worse]
orders <- criterion[best_to_worse, ] criterion
for (n in names(orders))
pimage(x, orders[[n]], main = n , key = FALSE)
Comparison beween methods
::datatable(round(criterion, 2)) DT