library("seriation")
packageVersion('seriation')
[1] '1.5.6'
This document contains the seriation methods currently implemented in the R package seriation
. The methods are organized by methods for distance matrices and data matrices.
This list was created for the following version of seriation:
library("seriation")
packageVersion('seriation')
[1] '1.5.6'
Some additional methods need to be registered.
register_DendSer()
register_optics()
register_smacof()
register_GA()
register_tsne()
register_umap()
Seriation methods on the data x
can be used in
seriate(x, method = "Spectral", control = NULL, rep = 1L)
The list below shows what seriation criterion (if any) is optimized by the method.
If the method uses randomization (see randomized below), then rep
can be used to run the algorithms several times and return the best permutation.
Available control parameters can be passed on as a list using the control
argument.
Minimize the linear seriation criterion using simulated annealing (Brusco et al, 2008).
default | help | |
---|---|---|
cool | 0.5 | cooling factor (smaller means faster cooling) |
tmin | 1e-04 | stopping temperature |
swap_to_inversion | 0.5 | probability for swap vs inversion local move |
try_multiplier | 100 | number of local move tries per object |
verbose | FALSE | N/A |
Minimize the unweighted row/column gradient by branch-and-bound (Brusco and Stahl 2005). This is only feasible for a relatively small number of objects.
default | help | |
---|---|---|
eps | 0 | Distances need to be at least eps to count as violations |
verbose | FALSE | N/A |
Minimize the weighted row/column gradient by branch-and-bound (Brusco and Stahl 2005). This is only feasible for a relatively small number of objects.
default | help | |
---|---|---|
eps | 0 | Distances need to be at least eps to count as violations |
verbose | FALSE | N/A |
Dendrogram seriation (Earle and Hurley, 2015).
default | help | |
---|---|---|
h | NULL | an hclust object (optional) |
method | “complete” | hclust linkage method |
criterion | NULL | criterion to optimize the dendrogram for |
DendSer_args | NULL | more arguments are passed on to DendSer (? DendSer) |
verbose | FALSE | N/A |
Dendrogram seriation for Anti-Robinson form cost (Earle and Hurley, 2015).
default | help | |
---|---|---|
h | NULL | an hclust object (optional) |
method | “complete” | hclust linkage method |
criterion | NULL | criterion to optimize the dendrogram for |
DendSer_args | NULL | more arguments are passed on to DendSer (? DendSer) |
verbose | FALSE | N/A |
Dendrogram seriation with BAR (Earle and Hurley, 2015).
default | help | |
---|---|---|
h | NULL | an hclust object (optional) |
method | “complete” | hclust linkage method |
criterion | NULL | criterion to optimize the dendrogram for |
DendSer_args | NULL | more arguments are passed on to DendSer (? DendSer) |
verbose | FALSE | N/A |
Dendrogram seriation for Lazy path length (Earle and Hurley, 2015).
default | help | |
---|---|---|
h | NULL | an hclust object (optional) |
method | “complete” | hclust linkage method |
criterion | NULL | criterion to optimize the dendrogram for |
DendSer_args | NULL | more arguments are passed on to DendSer (? DendSer) |
verbose | FALSE | N/A |
Dendrogram seriation for Path length (Earle and Hurley, 2015).
default | help | |
---|---|---|
h | NULL | an hclust object (optional) |
method | “complete” | hclust linkage method |
criterion | NULL | criterion to optimize the dendrogram for |
DendSer_args | NULL | more arguments are passed on to DendSer (? DendSer) |
verbose | FALSE | N/A |
Enumerate all permutations
default | help | |
---|---|---|
criterion | “Gradient_weighted” | Criterion measure to optimize |
verbose | FALSE | N/A |
Use a genetic algorithm to optimize for various criteria.
default | help | |
---|---|---|
criterion | “BAR” | criterion to be optimized |
suggestions | c(“TSP”, “QAP_LS”, “Spectral”) | seed the population with these seriation methods |
selection | function (object, r = 2/(object@popSize * (object@popSize - 1)), q = 2/object@popSize, …) { | selection operator function |
crossover | function (object, parents, …) { if (gaControl(“useRcpp”)) gaperm_oxCrossover_Rcpp(objec | crossover operator function |
mutation | function (object, parent, …) { if (runif(1) > ismProb) GA::gaperm_simMutation(object, p | mutation operator function |
pcrossover | 0.8 | probability for crossover |
pmutation | 0.1 | ptobability of mutations |
popSize | 100 | population size |
maxiter | 1000 | maximum number of generations |
run | 50 | stop after run generations without improvement |
parallel | FALSE | use multiple cores? |
verbose | FALSE | N/A |
Minimize a specified seriation measure (criterion) using simulated annealing.
default | help | |
---|---|---|
criterion | “Gradient_raw” | Criterion measure to optimize |
cool | 0.5 | cooling factor (smaller means faster cooling) |
t_min | 1e-07 | stopping temperature |
localsearch | “LS_insert” | used local search move function |
try_multiplier | 5 | number of local move tries per object |
t0 | NA | initial temperature (if NA then it is estimated) |
p_initial_accept | 0.01 | Probability to accept a bad move at time 0 (used for t0 estimation) |
warmstart | “Random” | permutation or seriation method for warmstart |
verbose | FALSE | N/A |
Using the order of the leaf nodes in a dendrogram obtained by hierarchical clustering and reordered by the Gruvaeus and Wainer (1972) heuristic
default | help | |
---|---|---|
hclust | NULL | a precomputed hclust object (optional) |
linkage | “complete” | hclust method |
Using the order of the leaf nodes in a dendrogram obtained by hierarchical clustering and reordered by the Gruvaeus and Wainer (1972) heuristic (avg.link)
Using the order of the leaf nodes in a dendrogram obtained by hierarchical clustering and reordered by the Gruvaeus and Wainer (1972) heuristic (complete link)
Using the order of the leaf nodes in a dendrogram obtained by hierarchical clustering and reordered by the Gruvaeus and Wainer (1972) heuristic (single link)
Using the order of the leaf nodes in a dendrogram obtained by hierarchical clustering and reordered by the Gruvaeus and Wainer (1972) heuristic (Ward’s method)
Using the order of the leaf nodes in a dendrogram obtained by hierarchical clustering
default | help | |
---|---|---|
hclust | NULL | a precomputed hclust object (optional) |
linkage | “complete” | hclust method |
Using the order of the leaf nodes in a dendrogram obtained by hierarchical clustering (avg. link).
Using the order of the leaf nodes in a dendrogram obtained by hierarchical clustering (complete link).
Using the order of the leaf nodes in a dendrogram obtained by hierarchical clustering (single link)
Using the order of the leaf nodes in a dendrogram obtained by hierarchical clustering (Ward’s method).
Identity permutation
Isometric feature mapping ordination
default | help | |
---|---|---|
k | 30 | number of shortest dissimilarities retained for a point |
path | “shortest” | method used in to estimate the shortest path (“shortest”/“extended”) |
Order along the 1D Kruskal’s non-metric multidimensional scaling
default | help | |
---|---|---|
add | 1e-09 | small constant to avoid 0 distances |
maxit | 50 | maximum number of iterations |
trace | FALSE | trace optimization |
tol | 0.001 | convergence tolerance |
p | 2 | power for Minkowski distance in the configuration space |
Order along the 1D classical metric multidimensional scaling
default | help | |
---|---|---|
add | FALSE | make the distances Euclidean using an additive constant (see ? cmdscale) |
Order by the angular order in the 2D MDS projection space split by the larges gap
default | help | |
---|---|---|
add | FALSE | make the distances Euclidean using an additive constant (see ? cmdscale) |
Seriation based on multidemensional scaling using stress majorization (de Leeuw & Mair, 2009).
default | help | |
---|---|---|
type | “ratio” | MDS type: “interval”, “ratio”, “ordinal” (nonmetric MDS) |
init | “torgerson” | start configuration method (“torgerson”/“random”) |
relax | FALSE | use block relaxation for majorization? |
modulus | 1 | number of smacof iterations per monotone regression call |
itmax | 1000 | maximum number of iterations |
eps | 1e-06 | convergence criterion |
verbose | FALSE | N/A |
Nonmetric Multidimensional Scaling with Stable Solution from Random Starts.
default | help | |
---|---|---|
distance | “bray” | see ? metaMDS for help |
try | 20 | N/A |
trymax | 20 | N/A |
engine | “monoMDS” | N/A |
autotransform | TRUE | N/A |
noshare | FALSE | N/A |
wascores | TRUE | N/A |
expand | TRUE | N/A |
trace | 0 | N/A |
plot | FALSE | N/A |
previous.best | N/A | |
verbose | FALSE | N/A |
Kruskal’s (1964a,b) non-metric multidimensional scaling (NMDS) using monotone regression.
default | help | |
---|---|---|
y | See ? monoMDS for help | |
model | “global” | N/A |
threshold | 0.8 | N/A |
maxit | 200 | N/A |
weakties | TRUE | N/A |
stress | 1 | N/A |
scaling | TRUE | N/A |
pc | TRUE | N/A |
smin | 1e-04 | N/A |
sfgrmin | 1e-07 | N/A |
sratmax | 0.999999 | N/A |
Using the order of the leaf nodes in a dendrogram obtained by hierarchical clustering and reordered by with optimal leaf ordering (Bar-Joseph et al., 2001)
default | help | |
---|---|---|
hclust | NULL | a precomputed hclust object (optional) |
linkage | “complete” | hclust method |
Using the order of the leaf nodes in a dendrogram obtained by hierarchical clustering and reordered by with optimal leaf ordering (Bar-Joseph et al., 2001) (avg. link)
Using the order of the leaf nodes in a dendrogram obtained by hierarchical clustering and reordered by with optimal leaf ordering (Bar-Joseph et al., 2001) (complete link)
Using the order of the leaf nodes in a dendrogram obtained by hierarchical clustering and reordered by with optimal leaf ordering (Bar-Joseph et al., 2001) (single link)
Using the order of the leaf nodes in a dendrogram obtained by hierarchical clustering and reordered by with optimal leaf ordering (Bar-Joseph et al., 2001) (Ward’s method)
Use ordering points to identify the clustering structure (OPTICS) to create an order
default | help | |
---|---|---|
eps | NULL | upper limit of the size of the epsilon neighborhood (see ? optics) |
minPts | 5 | minimum density for dense neighborhoods |
Quadratic assignment problem formulation for seriation solved using a simulated annealing solver to minimize the 2-Sum Problem criterion (Barnard, Pothen, and Simon 1993).
Quadratic assignment problem formulation for seriation solved using a simulated annealing solver to minimize the banded anti-Robinson form (Hahsler, 2017).
default | help | |
---|---|---|
b | function (n) max(1, floor(n * 0.2)) | bandwidth (default is 20%) |
Quadratic assignment problem formulation for seriation solved using a simulated annealing solver to minimize the Inertia criterion (Hahsler, 2017).
Quadratic assignment problem formulation for seriation solved using a simulated annealing solver to minimize the Linear Seriation Problem (LS) criterion (Hubert and Schultz 1976).
Rank-two ellipse seriation (Chen 2002)
Random permutation
Reversed identity permutation
Order along the 1D Sammon’s non-linear mapping
default | help | |
---|---|---|
add | 1e-09 | small constant to avoid 0 distances |
niter | 100 | maximum number of iterations |
trace | FALSE | trace optimization |
magic | 0.2 | initial value of the step size constant in diagonal Newton method |
tol | 1e-04 | tolerance for stopping in units of stress |
Improve an existing solution using stochastic gradient descent.
default | help | |
---|---|---|
criterion | “Gradient_raw” | Criterion measure to optimize |
init | “Spectral” | Start permutation or name of a seriation method |
max_iter | NULL | number of iterations |
localsearch | “LS_insert” | used local search move function |
verbose | FALSE | N/A |
Spectral seriation (Ding and He 2004) uses a relaxation to minimize the 2-Sum Problem (Barnard, Pothen, and Simon 1993). It uses the order of the Fiedler vector of the similarity matrix’s Laplacian.
Spectral seriation (Ding and He 2004) uses a relaxation to minimize the 2-Sum Problem (Barnard, Pothen, and Simon 1993). It uses the order of the Fiedler vector of the similarity matrix’s normalized Laplacian.
Sorting Points Into Neighborhoods (SPIN) (Tsafrir 2005). Neighborhood algorithm to concentrate low distance values around the diagonal.
default | help | |
---|---|---|
sigma | c(20, 17, 15, 13, 11, 9, 7, 5, 3, 1) | emphasize local (small alpha) or global (large alpha) structure. |
step | 5 | iterations to run for each sigma value. |
W_function | NULL | custom function to create the weight matrix W |
verbose | FALSE | N/A |
Sorting Points Into Neighborhoods (SPIN) (Tsafrir 2005). Side-to-Side algorithm which tries to push out large distance values.
default | help | |
---|---|---|
step | 25L | iterations to run |
nstart | 10L | number of random restarts |
X | function (n) seq(n) - (n + 1)/2 | matrix to calculate the W matrix |
verbose | FALSE | N/A |
Use 1D t-distributed stochastic neighbor embedding (t-SNE) a distance matrix to create an order (van der Maaten and Hinton, 2008).
default | help | |
---|---|---|
max_iter | 1000 | number of iterations |
theta | 0.5 | speed/accuracy trade-off (increase for less accuracy) |
perplexity | NULL | perplexity parameter (calculated as n - 1 / 3) |
eta | 100 | learning rate |
mds | TRUE | start from a classical MDS solution |
verbose | FALSE | N/A |
Minimize Hamiltonian path length with a TSP solver.
default | help | |
---|---|---|
method | “arbitrary insertion” | used TSP method (see ? solve_TSP) |
rep | 10 | number of random restarts |
two_opt | TRUE | use the 2-opt improvement heuristic? |
Use 1D Uniform manifold approximation and projection (UMAP) embedding of the distances to create an order (McInnes and Healy, 2018)
default | help | |
---|---|---|
n_neighbors | NA | see ? umap::umap for help |
n_components | 1 | N/A |
metric | “euclidean” | N/A |
n_epochs | 1000 | N/A |
input | NA | N/A |
init | “spectral” | N/A |
min_dist | 0.1 | N/A |
set_op_mix_ratio | 1 | N/A |
local_connectivity | 1 | N/A |
bandwidth | 1 | N/A |
alpha | 0.001 | N/A |
gamma | 1 | N/A |
negative_sample_rate | 5 | N/A |
a | NA | N/A |
b | NA | N/A |
spread | 1 | N/A |
random_state | NA | N/A |
transform_state | NA | N/A |
knn | NA | N/A |
knn_repeats | 1 | N/A |
verbose | FALSE | N/A |
umap_learn_args | NA | N/A |
Visual assesment of clustering tendency (Bezdek and Hathaway (2002). Creates an order based on Prim’s algorithm for finding a minimum spanning tree (MST) in a weighted connected graph representing the distance matrix. The order is given by the order in which the nodes (objects) are added to the MST.
Order by the angle of the first two eigenvectors for correlation matrices (Friendly, 2002)
Bond Energy Algorithm (BEA; McCormick 1972) to maximize the Measure of Effectiveness of a non-negative matrix.
Use a TSP to optimize the Measure of Effectiveness (Lenstra 1974).
default | help | |
---|---|---|
method | “arbitrary insertion” | used TSP method (see ? solve_TSP) |
rep | 10 | number of random restarts |
two_opt | TRUE | use the 2-opt improvement heuristic? |
Implements the method for binary matrices described in Brower and Kile (1988). Reorders using the mean row and column position of presences (1s).
This method calculates a correspondence analysis of the matrix and computes an order according to the scores on a correspondence analysis dimension (Friendly 2023).
default | help | |
---|---|---|
dim | 1L | CA dimension used for reordering |
ca_param | NULL | List with parameters for the call to ca::ca() |
Calculates distances for rows and columns and then independently applies the specified seriation method for distances.
default | help | |
---|---|---|
dist_fun | list(row = function (x, method = “euclidean”, diag = FALSE, upper = FALSE, p = 2) { if (!is.n | A named list with functions to calulate row and column distances |
seriation_method | list(row = “OLO_complete”, col = “OLO_complete”) | A named list with row and column seriation methods |
seriation_control | list(row = NULL, col = NULL) | named list with control parameters for the seriation methods |
scale | “none” | Scale “rows”, “cols”, or “none” |
verbose | FALSE | N/A |
Identity permutation
Find an order using 1D locally linear embedding.
default | help | |
---|---|---|
k | 30 | used number of neighbors |
reg | 2 | regularization method (see ? lle) |
Reorders rows and columns by row and column means.
default | help | |
---|---|---|
transformation | NULL | transformation function applied before calculating means (e.g., scale) |
Uses the projection of the data on its first principal component to determine the order.
default | help | |
---|---|---|
center | TRUE | center the data (mean = 0)? |
scale | FALSE | scale to unit variance? |
verbose | FALSE | FALSE |
Uses the angular order in the 2D PCA projection space split by the larges gap.
default | help | |
---|---|---|
center | TRUE | center the data (mean = 0)? |
scale | FALSE | scale to unit variance? |
verbose | FALSE | FALSE |
Random permutation
Reversed identity permutation
Use 1D t-distributed stochastic neighbor embedding (t-SNE) of the rows of a matrix to create an order (van der Maaten and Hinton, 2008).
default | help | |
---|---|---|
max_iter | 1000 | number of iterations |
theta | 0.5 | speed/accuracy trade-off (increase for less accuracy) |
perplexity | NULL | perplexity parameter (calculated as n - 1 / 3) |
eta | 100 | learning rate |
pca | TRUE | start the PCA solution |
Use 1D Uniform manifold approximation and projection (UMAP) embedding of the data to create an order (McInnes and Healy, 2018)
default | help | |
---|---|---|
n_neighbors | NA | see ? umap::umap for help |
n_components | 1 | N/A |
metric | “euclidean” | N/A |
n_epochs | 1000 | N/A |
input | NA | N/A |
init | “spectral” | N/A |
min_dist | 0.1 | N/A |
set_op_mix_ratio | 1 | N/A |
local_connectivity | 1 | N/A |
bandwidth | 1 | N/A |
alpha | 0.001 | N/A |
gamma | 1 | N/A |
negative_sample_rate | 5 | N/A |
a | NA | N/A |
b | NA | N/A |
spread | 1 | N/A |
random_state | NA | N/A |
transform_state | NA | N/A |
knn | NA | N/A |
knn_repeats | 1 | N/A |
verbose | FALSE | N/A |
umap_learn_args | NA | N/A |
Arabie, P. and L.J. Hubert (1990): The bond energy algorithm revisited, IEEE Transactions on Systems, Man, and Cybernetics, 20(1), 268–274. https://doi.org/10.1109/21.47829
Bar-Joseph, Z., E. D. Demaine, D. K. Gifford, and T. Jaakkola. (2001): Fast Optimal Leaf Ordering for Hierarchical Clustering. Bioinformatics, 17(1), 22–29. https://doi.org/10.1093/bioinformatics/17.suppl_1.S22
Barnard, S. T., A. Pothen, and H. D. Simon (1993): A Spectral Algorithm for Envelope Reduction of Sparse Matrices. In Proceedings of the 1993 ACM/IEEE Conference on Supercomputing, 493–502. Supercomputing ’93. New York, NY, USA: ACM.
Bezdek, J.C. and Hathaway, R.J. (2002): VAT: a tool for visual assessment of (cluster) tendency. Proceedings of the 2002 International Joint Conference on Neural Networks (IJCNN ’02), Volume: 3, 2225–2230. https://doi.org/10.1109/IJCNN.2002.1007487
Brower, J.C. and Kile, K.M. (1988): Sedation of an original data matrix as applied to paleoecology. Lethaia, 21, 79–93. https://doi.org/10.1111/j.1502-3931.1988.tb01756.x
Brusco, M., Koehn, H.F., and Stahl, S. (2008): Heuristic Implementation of Dynamic Programming for Matrix Permutation Problems in Combinatorial Data Analysis. Psychometrika, 73(3), 503–522. https://doi.org/10.1007/s11336-007-9049-5
Brusco, M., and Stahl, S. (2005): Branch-and-Bound Applications in Combinatorial Data Analysis. New York: Springer. https://doi.org/10.1007/0-387-28810-4
Chen, C. H. (2002): Generalized Association Plots: Information Visualization via Iteratively Generated Correlation Matrices. Statistica Sinica, 12(1), 7–29.
Ding, C. and Xiaofeng He (2004): Linearized cluster assignment via spectral ordering. Proceedings of the Twenty-first International Conference on Machine learning (ICML ’04). https://doi.org/10.1145/1015330.1015407
Climer, S. and Xiongnu Zhang (2006): Rearrangement Clustering: Pitfalls, Remedies, and Applications, Journal of Machine Learning Research, 7(Jun), 919–943.
D. Earle, C. B. Hurley (2015): Advances in dendrogram seriation for application to visualization. Journal of Computational and Graphical Statistics, 24(1), 1–25.
Friendly, M. (2002): Corrgrams: Exploratory Displays for Correlation Matrices. The American Statistician, 56(4), 316–324. https://doi.org/10.1198/000313002533
Friendly, M. (2023). vcdExtra: ‘vcd’ Extensions and Additions. R package version 0.8-5, https://CRAN.R-project.org/package=vcdExtra.
Gruvaeus, G. and Wainer, H. (1972): Two Additions to Hierarchical Cluster Analysis, British Journal of Mathematical and Statistical Psychology, 25, 200–206. https://doi.org/10.1111/j.2044-8317.1972.tb00491.x
Hahsler, M. (2017): An experimental comparison of seriation methods for one-mode two-way data. European Journal of Operational Research, 257, 133–143. https://doi.org/10.1016/j.ejor.2016.08.066
Hubert, Lawrence, and James Schultz (1976): Quadratic Assignment as a General Data Analysis Strategy. British Journal of Mathematical and Statistical Psychology, 29(2). Blackwell Publishing Ltd. 190–241. https://doi.org/10.1111/j.2044-8317.1976.tb00714.x
Hurley, Catherine B. (2004): Clustering Visualizations of Multidimensional Data. Journal of Computational and Graphical Statistics, 13(4), 788–806. https://doi.org/10.1198/106186004X12425
Kruskal, J.B. (1964). Nonmetric multidimensional scaling: a numerical method. Psychometrika, 29, 115–129.
Lenstra, J.K (1974): Clustering a Data Array and the Traveling-Salesman Problem, Operations Research, 22(2) 413–414. https://doi.org/10.1287/opre.22.2.413
Mair P., De Leeuw J. (2015). Unidimensional scaling. In Wiley StatsRef: Statistics Reference Online, Wiley, New York. https://doi.org/10.1002/9781118445112.stat06462.pub2
McCormick, W.T., P.J. Schweitzer and T.W. White (1972): Problem decomposition and data reorganization by a clustering technique, Operations Research, 20(5), 993–1009. https://doi.org/10.1287/opre.20.5.993
McInnes, L and Healy, J, UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, ArXiv e-prints 1802.03426, 2018.
Sammon, J. W. (1969) A non-linear mapping for data structure analysis. IEEE Trans. Comput., C-18 401–409.
Tenenbaum, J.B., de Silva, V. & Langford, J.C. (2000) A global network framework for nonlinear dimensionality reduction. Science 290, 2319-2323.
Tsafrir, D., Tsafrir, I., Ein-Dor, L., Zuk, O., Notterman, D.A. and Domany, E. (2005): Sorting points into neighborhoods (SPIN): data analysis and visualization by ordering distance matrices, Bioinformatics, 21(10) 2301–8. https://doi.org/10.1093/bioinformatics/bti329
van der Maaten, L.J.P.; Hinton, G.E. (Nov 2008). Visualizing Data Using t-SNE. Journal of Machine Learning Research. 9: 2579–2605.