Department of Ecology and Evolution

State University of New York at Stony Brook, Stony Brook, NY 11794-5245

e-mail: Dennis.Slice@sunysb.edu

Fred L. Bookstein

Institute of Gerontology

University of Michigan
Ann Arbor, MI 48109-2007

e-mail: fred@brainmap.med.umich.edu

Leslie F. Marcus

Department of Biology

Queens College of CUNY

Flushing, NY 11367 *and*

Department of Invertebrates

American Museum of Natural History, CPW at 79th,

New York, NY 10024

e-mail: marcus@amnh.org

F. James Rohlf

Department of Ecology and Evolution

State University of New York at Stony Brook, Stony Brook, NY 11794-5245

e-mail: Rohlf@Life.Bio.SUNYSB.edu

This glossary provides definitions for terms, concepts, and methods
frequently encountered in morphometric literature and discussions. It includes
entries for technical terms with more-or-less special meaning in shape analysis
and biological morphometrics (e.g., preshape, warps, anisotropy) and some of the
casual jargon that may be completely foreign to newcomers to the field (e.g.,
books of various color - Red, Blue, Orange, and Black). Many definitions provide
the general idea behind each entry instead of a technically or mathematically
rigorous treatment. As such, they are intended to give readers an intuitive
understanding of a particular entry that will allow them to follow the main
ideas in the literature without becoming unduly distracted, at first, with
technical details. Unless otherwise indicated, the following general notation
has been used: *n* - number of specimens, *p* - number of
points/landmarks, *k* - number of dimensions. A superscript *t*
refers to the transpose of a matrix (e.g., $\mathbf{A}^{t}$, but that may not be
displayed properly by all WWW browsers). Members of the morphometrics community,
especially the subscribers to the MORPHMET electronic mailing list, have helped
greatly in the selection of terms to be included in the glossary.

Note: many of the mathematical symbols and equations are patched into this file as images since HTML (the language used to prepare these WWW pages) does not support mathematical symbols. For this reason, many symbols will not appear on text-only WWW browsers and may not line up well with the rest of the text. Superscripts and subscripts will not display properly on all WWW browsers.

**α** - In a relative warps analysis, this is the exponent used to rescale partial warps before computing their principal components, the relative warps (see Rohlf's chapter in the Black Book). Scale-invariant multivariate analyses using rescaled principal warp scores, such as canonical variates analysis, are not affected by the choice of α (see Rohlf, 1996, NATO volume "white book").

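**⊗** - The
Kronecker tensor product or direct product. The Kronecker tensor product of
matrices **X** and **Y**, written as **X** ⊗ **Y**, results in a large
matrix formed by taking all possible products of the elements of **X** and
those of **Y**. For example, if **X** and **Y** are 2x2 then
**X** ⊗ **Y**
results in a 4x4 matrix:

$$
\mathbf{X} \otimes \mathbf{Y} =
\begin{bmatrix}
x_{11}\mathbf{Y} & x_{12}\mathbf{Y} \\
x_{21}\mathbf{Y} & x_{22}\mathbf{Y}
\end{bmatrix}
=
\begin{bmatrix}
x_{11}y_{11} & x_{11}y_{12} & x_{12}y_{11} & x_{12}y_{12} \\
x_{11}y_{21} & x_{11}y_{22} & x_{12}y_{21} & x_{12}y_{22} \\
x_{21}y_{11} & x_{21}y_{12} & x_{22}y_{11} & x_{22}y_{12} \\
x_{21}y_{21} & x_{21}y_{22} & x_{22}y_{21} & x_{22}y_{22}
\end{bmatrix}
$$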
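
As a small illustration of the definition, the following Python sketch forms the Kronecker product of two matrices stored as lists of rows. The function name `kron` and the example matrices are illustrative assumptions, not part of the glossary.

```python
def kron(x, y):
    """Kronecker product of two matrices given as lists of rows."""
    # Each row of the result pairs one row of x with one row of y;
    # within it, every element of the x row multiplies the whole y row.
    return [
        [xv * yv for xv in x_row for yv in y_row]
        for x_row in x
        for y_row in y
    ]


X = [[1, 2],
     [3, 4]]
Y = [[0, 1],
     [1, 0]]

K = kron(X, Y)
for row in K:
    print(row)
```

Each 2x2 block of the result is one element of `X` multiplied by the whole of `Y`.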

**accuracy** - The closeness of a measurement or estimate to its true
value. See precision.

**affine superimposition** - A superimposition for which the associated
transformations are all affine. See affine
transformation.

**affine transformation** - A transformation for
which parallel lines remain parallel. Affine transformations of the plane take
squares into parallelograms and take circles into ellipses of the same shape.
Affine transformations of a 3-dimensional space take cubes into parallelepipeds
(sheared bricks) and spheres into ellipsoids all of the same shape. Similar
results are produced in higher dimensional spaces. Equivalent to "uniform
transformation".

As far as form is concerned (that is, ignoring translation and rotation), any
affine transformation can be diagrammed as a *pure strain* taking a square
to a rectangle on the same axes. In studies of shape, where scale is ignored as
well, the picture is the same but now the sum of the squares of the axes is
unchanging. Still ignoring scale (that is, as far as shape is concerned), any
affine transformation can be also diagrammed as a *pure shear* taking a
square into a parallelogram of unchanged base segment and height. This diagram
of shear came into morphometrics via an application to principal components
analysis somewhat before it was applied to landmark-based shape (see shear, Kendall's
shape space, and tangent
space).

**allometry** - Any change of shape with size. It describes any deviation
of the bivariate relation from the simple functional form *y*/*x* =
*c*, where *c* is a constant and *x* and *y* are size
measures in units of the same dimension. See Klingenberg, 1996, NATO volume
"white book".

**anisotropy** - Anisotropy is a descriptor of one aspect of an affine
transformation. In two dimensions, this is the ratio of the axes of the ellipse
into which a circle is transformed by an affine transformation. In general, it
is the maximum ratio of extension of length in one direction to extension in a
perpendicular direction.
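
For the two-dimensional case, the ratio of ellipse axes can be computed from the linear part of the affine transformation. The sketch below assumes that part is given as a 2x2 matrix with entries `a, b, c, d` (a hypothetical parameterization chosen for illustration) and uses the fact that the ellipse axes are the singular values of that matrix.

```python
import math


def anisotropy(a, b, c, d):
    """Ratio of the longer to the shorter axis of the ellipse into which
    the linear map [[a, b], [c, d]] takes the unit circle."""
    # Entries of the symmetric matrix A^t A.
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d
    # The eigenvalues of A^t A are the squared singular values of A.
    mean = (p + r) / 2.0
    spread = math.sqrt(((p - r) / 2.0) ** 2 + q * q)
    largest, smallest = mean + spread, mean - spread
    if smallest <= 0.0:
        raise ValueError("matrix is singular; anisotropy is undefined")
    return math.sqrt(largest / smallest)


print(anisotropy(2.0, 0.0, 0.0, 1.0))   # stretch along x only
print(anisotropy(0.0, -1.0, 1.0, 0.0))  # pure rotation
print(anisotropy(1.0, 1.0, 0.0, 1.0))   # pure shear
```

A pure rotation leaves the circle a circle, so its anisotropy is 1; any stretch or shear gives a value greater than 1.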

**asymptotically unbiased estimator** - An
estimator, $\hat{\theta}$, with an
expected value that converges to the parametric value it is
estimating, $\theta$, as sample size goes to infinity: $E(\hat{\theta}) \rightarrow \theta$ as
$n \rightarrow \infty$. See unbiased
estimator and consistent
estimator.

**baseline** - For a system of two-point
shape coordinates for landmarks in a plane, the baseline is the line
connecting the pair of landmarks that are assigned to fixed locations (0,0) and
(1,0) in the construction. In general, baselines work better if they are closely
aligned with the long axis of the mean landmark shape and pass near the centroid
of that mean shape (see the Orange
Book).
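
The construction can be sketched with complex arithmetic: subtracting the first baseline landmark moves it to (0,0), and dividing by the baseline vector rotates and rescales so the second lands on (1,0). The function name, argument names, and example triangle below are illustrative assumptions.

```python
def two_point_shape_coordinates(landmarks, base_start=0, base_end=1):
    """Map 2D landmarks so the baseline endpoints land on (0,0) and (1,0)."""
    z = [complex(x, y) for x, y in landmarks]
    origin = z[base_start]
    baseline = z[base_end] - origin
    if baseline == 0:
        raise ValueError("baseline landmarks coincide")
    # Subtraction translates; division by the baseline rotates and rescales.
    w = [(point - origin) / baseline for point in z]
    return [(v.real, v.imag) for v in w]


triangle = [(2.0, 1.0), (2.0, 5.0), (0.0, 3.0)]
coords = two_point_shape_coordinates(triangle)
for xy in coords:
    print(xy)
```

Only the coordinates of the landmarks off the baseline carry shape information; the two baseline landmarks are fixed by construction.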

**bending energy** - Bending energy is a
metaphor borrowed for use in morphometrics from the mechanics of thin metal
plates. Imagine a configuration of landmarks that has been printed on an
infinite, infinitely thin, flat metal plate, and suppose that the differences in
coordinates of these same landmarks in another picture are taken as vertical
displacements of this plate perpendicular to itself, one Cartesian coordinate at
a time. The bending energy of one of these out-of-plane "shape changes" is the
(idealized) energy that would be required to bend the metal plate so that the
landmarks were lifted or lowered appropriately.

While in physics bending energy is a real quantity, measured in appropriate units (g cm$^{2}$ sec$^{-2}$), there is an alternate formula that remains meaningful in morphometrics: bending energy is proportional to the integral of the summed squared second derivatives of the "vertical" displacement - the extent to which it varies from a uniform tilt. The bending energy of a shape change is the sum of the bending energies that apply to any two perpendicular coordinates in which the metaphor is evaluated. The bending energy of an affine transformation is zero since it corresponds to a tilting of the plate without any bending. The value obtained for the bending energy corresponding to a given displacement is inversely proportional to scale. Such quantities should not be interpreted as measures of dissimilarity (e.g., taxonomic or evolutionary distance) between two forms.

**bending energy matrix** - The formula for bending energy (see above) -
the formula whose value is proportional to that integral of those summed squared
second derivatives - is a quadratic form (usually written
$\mathbf{L}_{k}^{-1}$) determined by the coordinates of
the landmarks of the reference form. That is, if **h** is a vector describing
the heights of a plate above a set of landmarks, then bending energy is
$\mathbf{h}^{t}\mathbf{L}_{k}^{-1}\mathbf{h}$. In
morphometrics, the bending energy of a general transformation is the sum
$\mathbf{x}^{t}\mathbf{L}_{k}^{-1}\mathbf{x} + \mathbf{y}^{t}\mathbf{L}_{k}^{-1}\mathbf{y}$ of the
bending energy of its horizontal *x*-component, modeled as a "vertical"
plate, plus the bending energy of its vertical *y*-component, modeled
similarly as a "vertical" plate.

**biplot** - A single diagram that represents two separate scatterplots on
the same pair of axes. One scatter is of some pair of columns of the matrix
**U** of the singular value decomposition of a matrix **S**, and the other
scatter is of the matching pair of columns of **V**. When **S** is a
centered data matrix, the effect is to plot principal component loadings and
scores on the same diagram. See Marcus (Black Book) for
an in-depth discussion.

**Black Book** - Marcus, L. F., E. Bello, and A.
García-Valdecasas (eds.). 1993. *Contributions to Morphometrics*. Museo
Nacional de Ciencias Naturales Monografias: Madrid.

See also Blue Book, Orange Book, Red Book, and Reyment's Black Book.

**Blue Book** - Rohlf, F. J. and F. L. Bookstein
(eds.). 1990. *Proceedings of the Michigan Morphometrics Workshop*. Special
Publication No. 2, University of Michigan Museum of Zoology: Ann Arbor.

See also Black Book, Orange Book, Red Book, and Reyment's Black Book.

**Bookstein coordinates** - See two-point
shape coordinates.

**canonical** - A canonical description of any statistical situation is a
description in terms of extracted vectors that have especially simple ordered
relationships. For instance, a canonical correlations analysis describes the
relation between two lists of variables in terms of two lists of linear
combinations that show a remarkable pattern of zero correlations. Each score
(linear combination) from either list is correlated with no other combination
from its list and with only one score from the other list.

**canonical correlation analysis** - A multivariate method for assessing
the associations between two sets of variables within a data set. The analysis
focuses on pairs of linear combinations of variables (one for each set) ordered
by the magnitude of their correlations with each other. The first such pair is
determined so as to have the maximal correlation of any such linear
combinations. Subsequent pairs have maximal correlation subject to the
constraint of being orthogonal to those previously determined.

**canonical variates analysis** - A method of multivariate
analysis in which the variation among groups is expressed relative to the pooled
within-group covariance matrix. Canonical variates analysis finds linear
transformations of the data which maximize the among-group variation relative to
the pooled within-group variation. The canonical variates then may be displayed
as an ordination to show the group centroids and scatter within groups. This may
be thought of as a "data reduction" method in the sense that one wants to
describe among-group differences in few dimensions. The canonical variates are
uncorrelated; however, the vectors of coefficients are not orthogonal as in
Principal Component Analysis. The method is closely related to multivariate
analysis of variance (MANOVA), multiple discriminant analysis, and canonical
correlation analysis. A critical assumption is that the within-group
variance-covariance structure is similar, otherwise the pooling of the data over
groups is not very sensible.

**Centroid Size** - Centroid Size is the square root of the sum of squared
distances of a set of landmarks from their centroid, or, equivalently, the
square root of the sum of the variances of the landmarks about that centroid in
*x*- and *y*-directions. Centroid Size is used in geometric
morphometrics because it is approximately uncorrelated with every shape variable
when landmarks are distributed around mean positions by independent noise of the
same small variance at every landmark and in every direction. Centroid Size is
the size measure used to scale a configuration of landmarks so they can be
plotted as a point in Kendall's shape space. The denominator of the formula for
the Procrustes distance between two sets of landmark configurations is the
product of their Centroid Sizes.
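
The first definition translates directly into code. The sketch below assumes landmarks are stored as a list of coordinate tuples; the function names and the example square are illustrative only.

```python
import math


def centroid(landmarks):
    """Mean position of the landmarks, one value per dimension."""
    p = len(landmarks)
    k = len(landmarks[0])
    return tuple(sum(pt[d] for pt in landmarks) / p for d in range(k))


def centroid_size(landmarks):
    """Square root of the summed squared distances from the centroid."""
    c = centroid(landmarks)
    return math.sqrt(
        sum((pt[d] - c[d]) ** 2 for pt in landmarks for d in range(len(c)))
    )


def to_unit_centroid_size(landmarks):
    """Center the configuration and rescale it to Centroid Size 1."""
    c = centroid(landmarks)
    cs = centroid_size(landmarks)
    return [tuple((pt[d] - c[d]) / cs for d in range(len(c))) for pt in landmarks]


square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
print(centroid_size(square))
print(centroid_size(to_unit_centroid_size(square)))
```

Centering and dividing by Centroid Size, as in `to_unit_centroid_size`, is the scaling step referred to above for placing a configuration in Kendall's shape space.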

**cluster analysis** - A method of analysis that represents multivariate
variation in data as a series of sets. In biology, the sets are often
constructed in a hierarchical manner and shown in the form of a tree-like
diagram called a dendrogram.

**coefficient** - A coefficient, in general, is a number multiplying a
function. In multivariate data analysis, usually the "function" is a variable
measured over the cases of the analysis, and the coefficients multiply these
variable values before we add them up to form a score. A coefficient is not the
same as a loading.

**complex numbers** - Complex numbers are an algebraic way of coding points
in the ordinary Euclidean plane so that translation (shift of position)
corresponds to the addition of complex numbers and both rescaling (enlargement
or shrinking) and rotation correspond to multiplication of complex numbers. In
this system of notation, invented by Gauss, the *x*-axis is identified with
the "real numbers" (ordinary decimal numbers) and the *y*-axis is
identified with "imaginary numbers" (the square roots of negative numbers). When
you multiply points on the imaginary axis by themselves according to the rules, you get
negative points on the "real" axis just defined. Many operations on data in two
dimensions can be proved valid more directly if they are written out as
operations on complex numbers.
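
These correspondences are easy to try with Python's built-in complex type. The variable names and numerical values in this sketch are arbitrary choices for illustration.

```python
import cmath
import math

point = complex(1.0, 0.0)              # the point (1, 0)

# Translation by (2, 3) is addition.
shifted = point + complex(2.0, 3.0)

# Doubling in size and rotating by 90 degrees is one multiplication
# by the complex number with modulus 2 and argument pi/2.
turned = point * cmath.rect(2.0, math.pi / 2.0)

# A point on the imaginary axis multiplied by itself lands on the
# negative real axis.
i_squared = 1j * 1j

print(shifted, turned, i_squared)
```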

**consensus configuration** - A single set of landmarks intended to
represent the central tendency of an observed sample for the production of
superimpositions, of a weight matrix, or some other morphometric purpose. Often
a consensus configuration is computed to optimize some measure of fit to the
full sample: in particular, the Procrustes mean shape is computed to minimize
the sum of squared Procrustes distances from the consensus landmarks to
those of the sample.

**consistent estimator** - An estimator,
$\hat{\theta}$, that converges in
probability on the parametric value it is estimating, $\theta$, as sample size goes to
infinity: $P(|\hat{\theta} - \theta| < \varepsilon) \rightarrow 1$ as $n \rightarrow \infty$ for any positive $\varepsilon$. Asymptotically unbiased
estimators are consistent estimators if their variance goes to zero as sample
size goes to infinity. See unbiased
estimator.

**coordinates** - A set of parameters that locate a point in some
geometrical space. Cartesian coordinates, for instance, locate a point on a
plane or in physical space by projection onto perpendicular lines through one
single point, the origin. The elements of any vector may be thought of as
coordinates in a geometric sense.

**correlation** - Relation between two or more
variables. Frequently the word is used for Pearson's product-moment correlation
which is the covariance divided by the product of the standard deviations, $r_{XY} = s_{XY} / (s_{X} s_{Y})$. This correlation coefficient is +1 or -1 when all values fall on
a straight line, not parallel to either axis. However, there are also Kendall,
Spearman, tetrachoric, etc. correlations which measure other aspects of the
relation between two variables.

**covariance** - A measure of the degree to which two variables vary
together. Computed as $s_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(X_{i} - \bar{X})(Y_{i} - \bar{Y})$ for
two variables *X* and *Y* in a sample of size *n*. See correlation.
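
A minimal sketch of the sample covariance and of Pearson's product-moment correlation follows. It assumes the usual divisor of *n*-1 for the sample covariance; the function names and data are illustrative.

```python
import math


def covariance(x, y):
    """Sample covariance of two equal-length sequences (divisor n - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]   # lies exactly on the line y = 2x

print(covariance(x, y))
print(pearson_r(x, y))
```

Because the example values fall on a straight line with positive slope, the correlation comes out at +1 up to rounding error, while the covariance depends on the units of both variables.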

**covariant** - A covariant of a particular shape
change is a shape variable whose gradient vector as a function of changes in any
complete set of shape coordinates lies precisely along the change in question.

For transformations of triangles, the relation between invariants and covariants is a rotation by 90 degrees in the shape-coordinate plane. For more than three landmarks, a given transformation has only one direction of covariants, but a full plane (four landmarks) or hyperplane (five or more landmarks) of invariants (see the Orange Book). See invariant.

**curved space** - A space with coordinates and a distance function such
that the area of circles, volume of spheres, etc. are not proportional to the
appropriate power of the radius, e. g., Kendall's shape space. In curved spaces,
the usual intuitions about what "straight lines" can be expected to do will be
faulty. For instance, corresponding to every triangular shape in Kendall's shape
space, there is another that is "as far from it as possible," just like there is
a point on the surface of the earth as far as possible from where you now sit.

***D*** - See 1) generalized distance or 2) fractal dimension.

$D^{2}$ - Squared Mahalanobis, or
generalized, distance.

**deficient coordinate** - In addition to landmark locations, a digitizer
can be used to supply information of other sorts. For example, a point can be
used to encode part of the information about a curving arc by identifying the
spot at which the arc lies farthest from some other image structure (perhaps
another such curving arc). The null model of independent Gaussian noise does not
apply to position along the tangent direction of the curve that is digitized in
this way, and so that Cartesian coordinate is "deficient." The usual model of
independent Gaussian noise is inapplicable in principle for such points. See Type
III landmark.

**degrees of freedom** - Given a set of parameters estimated from the
data, the "degrees of freedom" of some statistic is the number of independent
observations *required* to compute the statistic. For example, the variance
has *n*-1 degrees of freedom because only *n*-1 of the observations
are needed for its computation given the sample mean. The missing observation
can be computed as $x_{n} = n\bar{x} - \sum_{i=1}^{n-1} x_{i}$.
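
The variance example can be checked numerically: once the sample mean is known, the last observation is determined by the other *n*-1. The data values in this sketch are made up for illustration.

```python
sample = [4.0, 7.0, 1.0, 8.0]
n = len(sample)
mean = sum(sample) / n

# Pretend the last observation is unknown and rebuild it from the
# mean and the remaining n - 1 observations.
known = sample[:-1]
recovered = n * mean - sum(known)

print(recovered, sample[-1])
```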

**dilation** - Increase of length in a particular direction, or along a
particular interlandmark segment.

**discriminant analysis** - A broad
class of methods concerned with the development of rules for assigning
unclassified objects/specimens to previously defined groups. See discriminant
function.

**discriminant function** - A discriminant function is used to assign an
observation to one of a set of groups. Linear discriminant functions take a
vector of observations from a specimen and multiply it by a vector of
coefficients to produce a score which can be used to classify the specimen as
belonging to one or another predefined group. See discriminant analysis.

**distance** - This term has several meanings in morphometrics; it should
never be used without a prefixed adjective to qualify it, e.g., Euclidean
distance, Mahalanobis distance, Procrustes distance, taxonomic distance.

**edgel** - An extension of the notion of landmark to include partial
information about a curve through the landmark. An edgel specifies rotation of a
direction through a landmark, extension along a direction through a landmark, or
both. The formula for thin-plate splines on landmarks can be extended to
encompass data about edgels as well. They are intended eventually to circumvent
any need for deficient coordinates in multivariate morphometric analysis. See
Little (1996, NATO volume "white book") and Bookstein and Green, 1993, A feature
space for edgels in images with landmarks, *Journal of Mathematical Imaging
and Vision* 3: 231-261.

**EDMA** - See Euclidean
distance matrix analysis.

**eigenshapes** - Principal components for outline data. An eigenshape
analysis begins with the selection of a distance function between pairs of
outlines. At the end one gets "eigenshapes," which have the properties of
principal component vectors (uncorrelated, describing the sample in decreasing
order of variance) and also are outline shapes themselves, so that the scores
for each specimen of the sample can be combined to produce a new outline shape
that approximates it in some possibly useful way. Eigenshapes apply to curves as
relative warps apply to landmark shape. See the chapter by Lohmann and
Schweitzer in the Blue
Book and that by Sampson, 1996, NATO volume "white book".

**eigenvalues** - Eigenvalues, $\lambda_{i}$, are the diagonal
elements of the diagonal matrix $\mathbf{\Lambda}$ in the equation: $\mathbf{S}\mathbf{E} = \mathbf{E}\mathbf{\Lambda}$. In the common
data analysis case, **S** is a symmetrical variance-covariance matrix and
**E** is a matrix of eigenvectors. The order of the columns
of **E** and $\mathbf{\Lambda}$ is arbitrary, but by convention they are usually sorted from
largest to smallest eigenvalue. See eigenvectors
and singular value decomposition.
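
For a symmetric 2x2 matrix the eigenvalues and eigenvectors have a closed form, which makes a compact illustration. The sketch below is limited to that special case; the function name and the example matrix are assumptions for illustration, and larger matrices would need a numerical linear algebra routine.

```python
import math


def eigen_symmetric_2x2(s11, s12, s22):
    """Eigenvalues (largest first) and unit eigenvectors of the
    symmetric matrix [[s11, s12], [s12, s22]]."""
    mean = (s11 + s22) / 2.0
    spread = math.hypot((s11 - s22) / 2.0, s12)
    values = [mean + spread, mean - spread]
    # Angle between the x-axis and the eigenvector of the largest eigenvalue.
    theta = 0.5 * math.atan2(2.0 * s12, s11 - s22)
    first = (math.cos(theta), math.sin(theta))
    second = (-math.sin(theta), math.cos(theta))
    return values, [first, second]


# Example variance-covariance matrix: equal variances, positive covariance.
values, vectors = eigen_symmetric_2x2(2.0, 1.0, 2.0)
print(values)
print(vectors)
```

Multiplying the matrix by either returned vector gives the same vector scaled by the matching eigenvalue, and the two vectors are perpendicular unit vectors.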

**eigenvectors** - In the equation given to
define eigenvalues, **E** contains the eigenvectors. In the common data
analysis case, **E** is an orthonormal matrix (i.e.,
$\mathbf{E}^{t}\mathbf{E} = \mathbf{I}$
and $\mathbf{E}\mathbf{E}^{t} = \mathbf{I}$).

**elliptic Fourier analysis** - A type of outline analysis in which
differences in *x* and *y* (and possibly *z*) coordinates of an
outline are fit separately as a function of arc length by Fourier analysis. The
chapter by Rohlf in the Blue
Book provides an overview of various methods of fitting curves to outline
data.

**Euclidean distance** - Defined as:
$d_{ij} = \sqrt{\sum_{l=1}^{k}(x_{il} - x_{jl})^{2}}$
for coordinates $x_{il}$ and $x_{jl}$ of points *i* and *j* in *k* dimensions.

**Euclidean distance matrix analysis** - EDMA. A method
for the statistical analysis of full matrices of all interlandmark distances,
averaging elementwise within samples, and then comparing those averages between
samples by computing the ratios of corresponding mean distances. See Lele, S.
and J. T. Richtsmeier, 1991, Euclidean distance matrix analysis: a coordinate
free approach for comparing biological shapes using landmark data, *American
Journal of Physical Anthropology*, 86:415-428.
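
The steps named in this entry (interlandmark distances, elementwise averaging within samples, ratios between samples) can be sketched as follows. This follows the wording of the entry only; published EDMA procedures estimate the mean form more carefully than by simple averaging and add statistical testing, and all names and data here are hypothetical.

```python
import math
from itertools import combinations


def distance_matrix(landmarks):
    """All interlandmark distances, keyed by landmark index pair."""
    pairs = combinations(range(len(landmarks)), 2)
    return {(i, j): math.dist(landmarks[i], landmarks[j]) for i, j in pairs}


def mean_distance_matrix(sample):
    """Elementwise average of the distance matrices of one sample."""
    matrices = [distance_matrix(specimen) for specimen in sample]
    return {
        pair: sum(m[pair] for m in matrices) / len(matrices)
        for pair in matrices[0]
    }


def distance_ratios(sample_a, sample_b):
    """Ratios of corresponding mean interlandmark distances."""
    mean_a = mean_distance_matrix(sample_a)
    mean_b = mean_distance_matrix(sample_b)
    return {pair: mean_a[pair] / mean_b[pair] for pair in mean_a}


sample_a = [
    [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)],
    [(1.0, 1.0), (4.0, 1.0), (1.0, 5.0)],   # same triangle, translated
]
sample_b = [[(0.0, 0.0), (6.0, 0.0), (0.0, 8.0)]]  # twice as large

ratios = distance_ratios(sample_a, sample_b)
print(ratios)
```

Because distances ignore position and orientation, the translated specimen contributes the same distances as the first; ratios that are all equal indicate a difference in size only.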

**Euclidean space** - A space where distances between two points are
defined as Euclidean distances in some system of coordinates.

**factor analysis** - Factor analysis is a multivariate technique for
describing a set of measured variables in terms of a set of causal or underlying
variables. A factor model can be characterized in terms of path diagrams to show
relations between measured variables and factors. See the chapter by Marcus in
the Blue Book and Reyment and Jöreskog, 1993, *Applied Factor Analysis in the
Natural Sciences*, Cambridge University Press: Cambridge, United Kingdom.

**FESA** - See finite
element scaling analysis.

**fiber** - The set of preshapes (configurations that have been centered
at the origin and scaled to unit centroid size) that differ only by a rotation.
It is the path, through preshape space, followed by a centered and scaled
configuration under all possible rotations.

**figure** - A representation of an object by the coordinates of a
specified set of points, the landmarks.

**figure space** - The 2*p*- or 3*p*-space of figures, i. e.,
the original coordinate data vectors.

**finite element scaling analysis** - Without
the word "scaling," finite element analysis is a computational system for
continuum mechanics that estimates the deformation (fully detailed changes of
position of all component particles) that is expected to result from a
specified pattern of stresses (forces) upon a mechanical system. As applied in
morphometrics, FESA solves the inverse problem of estimating the strains
representing the hypothetical forces that deformed one specimen into another.
These results are a function of the "finite elements" into which the space
between the landmarks is subdivided. FESA can be compared with the thin-plate
spline, which interpolates a set of landmark coordinates under an entirely
different set of assumptions.

**form** - In morphometrics, we represent the form of an object by a point
in a space of form variables, which are measurements of a geometric object that
are unchanged by translations and rotations. If you allow for reflections, forms
stand for all the figures that have all the same interlandmark distances. A form
is usually represented by one of its figures at some specified location and in
some specified orientation. When represented in this way, location and
orientation are said to have been "removed."

**form space** - The space of figures with differences due to location and
orientation removed. It is of 2*p*-3 dimensions for two-dimensional
coordinate data and 3*p*-6 dimensions for three-dimensional coordinate
data.

**Fourier analysis** - In morphometrics, the
decomposition of an outline into a weighted sum of sine and cosine functions.
The chapter by Rohlf in the Blue Book provides an overview of this and other
methods of analyzing outline data.

**fractal dimension** - *D*. A measure of the complexity of a
structure assuming a consistent pattern of self-similarity (structural
complexity at smaller scales is mathematically indistinguishable from that at
larger scales) over all scales considered. See the chapter by Slice in the Black
Book.

**generalized distance** - *D*. A
synonym for Mahalanobis distance. Defined by the equation for two row vectors
$\mathbf{x}_{i}$ and
$\mathbf{x}_{j}$ for two individuals, and *p*
variables as: $D_{ij} = \sqrt{(\mathbf{x}_{i} - \mathbf{x}_{j})\mathbf{S}^{-1}(\mathbf{x}_{i} - \mathbf{x}_{j})^{t}}$,
where **S** is the *p*x*p* variance-covariance matrix. It takes
into consideration the variance and correlation of the variables in measuring
distances between points, i. e., differences in directions in which there is
less variation within groups are given greater weight than are differences in
directions in which there is more variation.
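
The weighting effect described above can be seen in a two-variable sketch. The function below handles only the 2x2 case so that the matrix inverse can be written out by hand; its name and the example covariance matrix are assumptions for illustration.

```python
import math


def mahalanobis_2d(xi, xj, s):
    """Generalized distance between two 2-variable observations,
    given a 2x2 variance-covariance matrix s as nested tuples."""
    (s11, s12), (s21, s22) = s
    det = s11 * s22 - s12 * s21
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    # Inverse of a 2x2 matrix written out explicitly.
    inv = ((s22 / det, -s12 / det), (-s21 / det, s11 / det))
    d0 = xi[0] - xj[0]
    d1 = xi[1] - xj[1]
    squared = (d0 * (inv[0][0] * d0 + inv[0][1] * d1)
               + d1 * (inv[1][0] * d0 + inv[1][1] * d1))
    return math.sqrt(squared)


# Variance 4 along the first variable, 1 along the second, no covariance.
S = ((4.0, 0.0), (0.0, 1.0))
origin = (0.0, 0.0)

print(mahalanobis_2d((2.0, 0.0), origin, S))  # difference along the noisy axis
print(mahalanobis_2d((0.0, 2.0), origin, S))  # same difference, quiet axis
```

Both observations are the same Euclidean distance from the origin, but the one displaced along the low-variance direction is twice as far in generalized distance.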

**generalized superimposition** - The superimposition of a set of
configurations onto their consensus configuration. The fitting may involve
least-squares, resistant-fit, or other algorithms and may be strictly orthogonal
or allow affine transformations.

**geodesic distance** - The length of the shortest
path between two points in a suitable geometric space (one for which curving
paths have lengths). On a sphere, it is the distance between two points as
measured along a great circle.

**geometric morphometrics** - Geometric
morphometrics is a collection of approaches for the multivariate statistical
analysis of Cartesian coordinate data, usually (but not always) limited to
landmark point locations. The "geometry" referred to by the word "geometric" is
the geometry of Kendall's shape space: the estimation of mean shapes and the
description of sample variation of shape using the geometry of Procrustes
distance. The multivariate part of geometric morphometrics is usually carried
out in a linear tangent space to the non-Euclidean shape space in the vicinity
of the mean shape.

More generally, it is the class of morphometric methods that preserve complete information about the relative spatial arrangements of the data throughout an analysis. As such, these methods allow for the visualization of group and individual differences, sample variation, and other results in the space of the original specimens.

**great circle** - A circle on a sphere with a diameter equal to that of
the sphere. The shortest path connecting two points on the surface of a sphere
lies along the great circle passing through the points. See geodesic
distance.
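
On a sphere, the great-circle (geodesic) distance between two points is the radius times the angle between them as seen from the center. The sketch below assumes the points are given as 3D vectors from the center of the sphere; the function name is illustrative.

```python
import math


def great_circle_distance(u, v, radius=1.0):
    """Length of the great-circle arc between the directions u and v."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_angle = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    # Guard against rounding slightly outside the valid range of acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return radius * math.acos(cos_angle)


pole = (0.0, 0.0, 1.0)
equator_point = (1.0, 0.0, 0.0)

arc = great_circle_distance(pole, equator_point)
chord = math.dist(pole, equator_point)
print(arc, chord)
```

The arc along the surface is longer than the straight chord through the sphere, which is the difference between geodesic and Euclidean distance in this setting.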

**homology** - The notion of homology bridges the language of geometric
morphometrics and the language of its biological or biomathematical
applications. In theoretical biology, only the explicit entities of evolution or
development, such as molecules, organs or tissues, can be "homologous."
Following D'Arcy Thompson, morphometricians often apply the concept instead to
discrete geometric structures, such as points or curves, and, by a further
extension, to the multivariate descriptors (e.g., partial warp scores) that
arise as part of most multivariate analyses. In this context, the term
"homologous" has no meaning other than that the same name is used for
corresponding parts in different species or developmental stages. To declare
something "homologous" is simply to assert that we want to talk about processes
affecting such structures as if they had a consistent biological or
biomechanical meaning. Similarly, to declare an interpolation (such as a
thin-plate spline) a "homology map" means that one intends to refer to its
features as if they had something to do with valid biological explanations
pertaining to the regions between the landmarks, about which we have no data.

**Hotelling's $T^{2}$** - See
$T^{2}$ statistic.

**hyperplane** - A *k*-1 dimensional subspace of a
*k*-dimensional space. A hyperplane is typically characterized by the
vector to which it is orthogonal.

**hyperspace** - A space of more than three dimensions.

**hypersphere** - A generalization of the idea of a sphere to a space of
greater than three dimensions.

**hypervolume** - A generalization of the idea of volume to a space of
more than three dimensions.

**invariant** - An invariant, generally speaking, is
a quantity that is unchanged (even though its formula may have changed) when one
changes some inessential aspect of a measurement. For instance, Euclidean
distance is an invariant under translation or rotation of one's coordinate
system, and ratio of distances in the same direction is an invariant under
affine transformations. In the morphometrics of triangles, the invariants of a
particular transformation are the shape variables that do not change under that
transformation (see the Orange
Book). See covariant.

**isometry** - An isometry is a transformation of a geometric space that
leaves distances between points unchanged. If the space is the Euclidean space
of a picture or an organism, and the distances are distances between landmarks,
the isometries are the Euclidean translations, rotations, and reflections. If
the distances are Procrustes distances between shapes, the isometries (for the
simplest case, landmarks in two dimensions) are the rotations of Kendall's
shape space. For triangles, these can be visualized as ordinary rotations of
Kendall's "spherical blackboard."

**isotropic** - Invariant with respect to direction. Isotropic errors have
the same statistical distribution in all directions implying equal variance and
zero correlation between the original variables (e.g., axis coordinates).

**Kendall's shape space** - The
fundamental geometric construction, due to David Kendall, underlying geometric
morphometrics. Kendall's shape space provides a complete geometric setting for
analyses of Procrustes
distances among arbitrary sets of landmarks. Each point in this shape space
represents the shape of a configuration of points in some Euclidean space,
irrespective of size, position, and orientation. In shape space, scatters of
points correspond to scatters of entire landmark configurations, not merely
scatters of single landmarks. Most multivariate methods of geometric
morphometrics are linearizations of statistical analyses of distances and
directions in this underlying space.
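
For landmarks in two dimensions, distance in this space can be sketched with complex arithmetic: center each configuration, scale it to unit Centroid Size, and take the angle whose cosine is the modulus of the complex inner product. This is a sketch under the assumption that the angle computed this way is the geodesic form of Procrustes distance for planar landmarks (reflections not allowed); the function names and example triangles are illustrative.

```python
import math


def preshape(landmarks):
    """Center a 2D configuration and scale it to unit Centroid Size."""
    z = [complex(x, y) for x, y in landmarks]
    center = sum(z) / len(z)
    z = [v - center for v in z]
    size = math.sqrt(sum(abs(v) ** 2 for v in z))
    return [v / size for v in z]


def shape_distance(a, b):
    """Angle (radians) between two planar shapes after removing
    position, size, and orientation."""
    za, zb = preshape(a), preshape(b)
    inner = sum(u.conjugate() * v for u, v in zip(za, zb))
    # The modulus discards the rotation; clamp for rounding error.
    return math.acos(min(1.0, abs(inner)))


triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# The same triangle rotated 90 degrees, enlarged 3 times, and shifted.
moved = [(5.0, -2.0), (5.0, 1.0), (2.0, -2.0)]
taller = [(0.0, 0.0), (1.0, 0.0), (0.5, 2.0)]

print(shape_distance(triangle, moved))
print(shape_distance(triangle, taller))
```

The first distance is zero up to rounding because the two configurations differ only in size, position, and orientation; the second is positive because the shapes themselves differ.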