In mathematics, the Euclidean distance or Euclidean metric is the "ordinary" straight-line distance between two points in Euclidean space. With this distance, Euclidean space becomes a metric space. The associated norm is called the Euclidean norm. Older literature refers to the metric as the Pythagorean metric. A generalized term for the Euclidean norm is the L2 norm or L2 distance.
The Euclidean distance between points p and q is the length of the line segment connecting them ($\overline{\mathbf{p}\mathbf{q}}$).
The position of a point in a Euclidean n-space is a Euclidean vector. So, p and q may be represented as Euclidean vectors, starting from the origin of the space (initial point) with their tips (terminal points) ending at the two points. The Euclidean norm, or Euclidean length, or magnitude of a vector measures the length of the vector:
$$\|\mathbf{p}\| = \sqrt{p_1^2 + p_2^2 + \cdots + p_n^2} = \sqrt{\mathbf{p} \cdot \mathbf{p}},$$
where the last expression involves the dot product.
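As a minimal sketch, the norm can be computed either from the sum of squared components or from the dot product of the vector with itself. The function names `euclidean_norm` and `dot` are illustrative choices, not part of the text above.

```python
import math

def dot(u, v):
    """Dot product of two equal-length sequences of numbers."""
    return sum(a * b for a, b in zip(u, v))

def euclidean_norm(v):
    """Length of a vector given by its Cartesian components."""
    return math.sqrt(sum(x * x for x in v))

p = (3.0, 4.0)
# Both expressions give the same length.
print(euclidean_norm(p))     # 5.0
print(math.sqrt(dot(p, p)))  # 5.0
```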
If a vector is described as a directed line segment from the origin of the Euclidean space (the vector's tail) to a point in that space (the vector's tip), then its length is the distance from its tail to its tip. The Euclidean norm of a vector is thus the Euclidean distance between its tail and its tip.
The relationship between points p and q may involve a direction (for example, from p to q), so when it does, this relationship can itself be represented by a vector, given by
$$\mathbf{q} - \mathbf{p} = (q_1 - p_1, q_2 - p_2, \ldots, q_n - p_n).$$
In a two- or three-dimensional space (n = 2, 3), this can be visually represented as an arrow from p to q. In any space it can be regarded as the position of q relative to p. It may also be called a displacement vector if p and q represent two positions of some moving point.
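The Euclidean distance between p and q is just the Euclidean length of this displacement vector:
$$\|\mathbf{q} - \mathbf{p}\| = \sqrt{(\mathbf{q} - \mathbf{p}) \cdot (\mathbf{q} - \mathbf{p})}, \tag{2}$$
which is equivalent to the coordinate formula $d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$ (equation 1), and also to:
$$\|\mathbf{q} - \mathbf{p}\| = \sqrt{\|\mathbf{p}\|^2 + \|\mathbf{q}\|^2 - 2\,\mathbf{p} \cdot \mathbf{q}}.$$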
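The two vector forms of the distance can be compared numerically. The sketch below computes the length of the displacement vector q − p directly and via the expansion in terms of the norms of p and q and their dot product. The function names and sample points are illustrative assumptions.

```python
import math

def dot(u, v):
    """Dot product of two equal-length sequences of numbers."""
    return sum(a * b for a, b in zip(u, v))

def displacement(p, q):
    """Vector from p to q, that is q - p, component by component."""
    return tuple(b - a for a, b in zip(p, q))

def distance_via_displacement(p, q):
    d = displacement(p, q)
    return math.sqrt(dot(d, d))

def distance_via_expansion(p, q):
    # ||q - p||^2 = ||p||^2 + ||q||^2 - 2 p.q
    # Rounding can make the sum slightly negative, so clamp it at zero.
    squared = dot(p, p) + dot(q, q) - 2 * dot(p, q)
    return math.sqrt(max(0.0, squared))

p = (1.0, 2.0, 3.0)
q = (4.0, 6.0, 3.0)
print(distance_via_displacement(p, q))  # 5.0
print(distance_via_expansion(p, q))     # 5.0
```

The expanded form subtracts quantities of similar size when the points are close together, so in floating-point arithmetic the displacement form is usually the more accurate of the two.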
In the context of Euclidean geometry, a metric is established in one dimension by fixing two points on a line, and choosing one to be the origin. The length of the line segment between these points defines the unit of distance and the direction from the origin to the second point is defined as the positive direction. This line segment may be translated along the line to build longer segments whose lengths correspond to multiples of the unit distance. In this manner real numbers can be associated to points on the line (as the distance from the origin to the point) and these are the Cartesian coordinates of the points on what may now be called the real line. As an alternate way to establish the metric, instead of choosing two points on the line, choose one point to be the origin, a unit of length and a direction along the line to call positive. The second point is then uniquely determined as the point on the line that is at a distance of one positive unit from the origin.
The distance between any two points on the real line is the absolute value of the numerical difference of their coordinates. It is common to identify the name of a point with its Cartesian coordinate. Thus if p and q are two points on the real line, then the distance between them is given by:
$$|q - p| = \sqrt{(q - p)^2}.$$
In one dimension, there is a single homogeneous, translation-invariant metric (in other words, a distance that is induced by a norm), up to a scale factor of length, which is the Euclidean distance. In higher dimensions there are other possible norms.
In the Euclidean plane, if p = (p1, p2) and q = (q1, q2) then the distance is given by
$$d(\mathbf{p}, \mathbf{q}) = \sqrt{(q_1 - p_1)^2 + (q_2 - p_2)^2}.$$
This is equivalent to the Pythagorean theorem.
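Alternatively, it follows from (2), by expanding the dot product and using $\mathbf{p} \cdot \mathbf{q} = r_1 r_2 \cos(\theta_1 - \theta_2)$, that if the polar coordinates of the point p are (r1, θ1) and those of q are (r2, θ2), then the distance between the points is
$$\sqrt{r_1^2 + r_2^2 - 2 r_1 r_2 \cos(\theta_1 - \theta_2)}.$$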
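The polar form can be checked against the Cartesian one. The following sketch assumes angles in radians and uses the standard formula sqrt(r1² + r2² − 2·r1·r2·cos(θ1 − θ2)); the helper names `polar_distance` and `to_cartesian` are illustrative.

```python
import math

def polar_distance(r1, theta1, r2, theta2):
    """Distance between two points of the plane given in polar coordinates."""
    squared = r1 * r1 + r2 * r2 - 2 * r1 * r2 * math.cos(theta1 - theta2)
    # Clamp at zero in case rounding makes the expression slightly negative.
    return math.sqrt(max(0.0, squared))

def to_cartesian(r, theta):
    """Convert polar coordinates (r, theta) to Cartesian (x, y)."""
    return (r * math.cos(theta), r * math.sin(theta))

p = to_cartesian(2.0, math.pi / 6)
q = to_cartesian(3.0, math.pi / 2)

# The two values agree up to floating-point rounding.
print(polar_distance(2.0, math.pi / 6, 3.0, math.pi / 2))
print(math.dist(p, q))  # math.dist requires Python 3.8 or later
```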
In three-dimensional Euclidean space, the distance is
$$d(\mathbf{p}, \mathbf{q}) = \sqrt{(q_1 - p_1)^2 + (q_2 - p_2)^2 + (q_3 - p_3)^2}.$$
In general, for an n-dimensional space, the distance is
$$d(\mathbf{p}, \mathbf{q}) = \sqrt{(q_1 - p_1)^2 + (q_2 - p_2)^2 + \cdots + (q_n - p_n)^2}.$$
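A single function covers the one-, two-, three- and n-dimensional cases: it takes the square root of the sum of squared coordinate differences. This is a minimal sketch; the name `euclidean_distance` and the sample points are illustrative.

```python
import math

def euclidean_distance(p, q):
    """Distance between two points with the same number of Cartesian coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

print(euclidean_distance((2.0,), (5.0,)))                    # 3.0 (one dimension: |q - p|)
print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))            # 5.0
print(euclidean_distance((1.0, 2.0, 3.0), (3.0, 5.0, 9.0)))  # 7.0
```

In Python 3.8 and later, the standard library function `math.dist(p, q)` computes the same quantity.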
Squared Euclidean distance
The square of the standard Euclidean distance, which is known as the squared Euclidean distance (SED), is also of interest; as an equation:
$$d^2(\mathbf{p}, \mathbf{q}) = (q_1 - p_1)^2 + (q_2 - p_2)^2 + \cdots + (q_n - p_n)^2.$$
Squared Euclidean distance is of central importance in estimating parameters of statistical models, where it is used in the method of least squares, a standard approach to regression analysis. The corresponding loss function is the squared error loss (SEL), and places progressively greater weight on larger errors. The corresponding risk function (expected loss) is mean squared error (MSE).
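To illustrate the link to squared error, the sketch below treats a list of observed values and a list of predicted values as two points, computes their SED, and divides by the number of observations to obtain a sample mean squared error (an empirical estimate of the risk). The data are made up for illustration, and the function names are assumptions.

```python
def squared_euclidean_distance(p, q):
    """Sum of squared coordinate differences (no square root)."""
    return sum((qi - pi) ** 2 for pi, qi in zip(p, q))

def mean_squared_error(observed, predicted):
    """Average squared error over the sample."""
    return squared_euclidean_distance(observed, predicted) / len(observed)

observed = [1.0, 2.0, 4.0]   # hypothetical measurements
predicted = [1.0, 3.0, 1.0]  # hypothetical model outputs

# Errors are 0, 1 and -3: the error of size 3 contributes 9, nine times
# as much as the error of size 1.
print(squared_euclidean_distance(observed, predicted))  # 10.0
print(mean_squared_error(observed, predicted))
```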
Squared Euclidean distance is not a metric, as it does not satisfy the triangle inequality. However, it is a more general notion of distance, namely a divergence (specifically a Bregman divergence), and can be used as a statistical distance. The Pythagorean theorem is simpler in terms of squared distance (since there is no square root); if $\overline{\mathbf{p}\mathbf{q}} \perp \overline{\mathbf{q}\mathbf{r}}$, then:
$$d^2(\mathbf{p}, \mathbf{r}) = d^2(\mathbf{p}, \mathbf{q}) + d^2(\mathbf{q}, \mathbf{r}).$$
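Both claims can be checked on small examples. In the sketch below, three collinear points show the triangle inequality failing for the SED, and three points forming a right angle at the middle point show the Pythagorean identity in squared-distance form. The function name `sed` and the points are illustrative.

```python
def sed(p, q):
    """Squared Euclidean distance between two points."""
    return sum((qi - pi) ** 2 for pi, qi in zip(p, q))

# Triangle inequality fails: going from a to c directly "costs" more
# than going through b.
a, b, c = (0.0,), (1.0,), (2.0,)
print(sed(a, c), sed(a, b) + sed(b, c))  # 4.0 2.0

# Pythagorean identity: the segments pq and qr are perpendicular at q.
p, q, r = (3.0, 0.0), (0.0, 0.0), (0.0, 4.0)
print(sed(p, r) == sed(p, q) + sed(q, r))  # True
```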
In information geometry, the Pythagorean identity can be generalized from SED to other Bregman divergences, including relative entropy (Kullback–Leibler divergence), allowing generalized forms of least squares to be used to solve non-linear problems.
The SED is a smooth, strictly convex function of the two points, unlike the distance, which is not smooth where the two points coincide and is not strictly convex (because it is linear along any ray from a fixed point). The SED is thus preferred in optimization theory, since it allows convex analysis to be used. Since squaring is monotonically increasing on non-negative values, minimizing the SED is equivalent to minimizing the Euclidean distance, so the optimization problem is equivalent in terms of either, but easier to solve using the SED.
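The equivalence of the two minimization problems can be seen in a nearest-point search: ranking candidates by SED or by distance selects the same point, and the SED version skips the square root. This is a small sketch with made-up points; `sed` is an illustrative name.

```python
import math

def sed(p, q):
    """Squared Euclidean distance between two points."""
    return sum((qi - pi) ** 2 for pi, qi in zip(p, q))

target = (0.0, 0.0)
candidates = [(5.0, 1.0), (1.0, 1.0), (2.0, 4.0)]

nearest_by_sed = min(candidates, key=lambda c: sed(target, c))
nearest_by_distance = min(candidates, key=lambda c: math.sqrt(sed(target, c)))

print(nearest_by_sed, nearest_by_distance)  # (1.0, 1.0) (1.0, 1.0)
```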
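If one of the points is fixed, the SED can be interpreted as a potential function, in which case a normalization factor of one half is used, and the sign may be switched, depending on convention. In detail, given two points $\mathbf{p}, \mathbf{q}$, the vector $\mathbf{p} - \mathbf{q}$ points from $\mathbf{q}$ to $\mathbf{p}$ and has magnitude equal to their Euclidean distance. If one fixes $\mathbf{q}$, one can thus define a smooth vector field "pointing at $\mathbf{q}$" by
$$V_{\mathbf{q}}(\mathbf{p}) = -(\mathbf{p} - \mathbf{q}) = \mathbf{q} - \mathbf{p}.$$
This is, up to sign, the gradient of the scalar-valued function "half SED from $\mathbf{q}$", where the half cancels the two in the power rule. Writing half the squared distance from $\mathbf{q}$ as $D_{\mathbf{q}}(\mathbf{p}) = \tfrac{1}{2} d^2(\mathbf{p}, \mathbf{q})$, one has
$$V_{\mathbf{q}}(\mathbf{p}) = -\nabla D_{\mathbf{q}}(\mathbf{p}).$$
Alternatively, one can consider the vector field pointing from $\mathbf{q}$, and omit the minus sign.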
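The gradient relationship can be checked numerically. The sketch below approximates the gradient of half the SED from a fixed point q by central differences and compares it with the vector q − p that points at q; the two differ only in sign. The function names, step size and points are illustrative assumptions.

```python
def half_sed(p, q):
    """Half the squared Euclidean distance between p and q."""
    return 0.5 * sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def numerical_gradient(f, p, h=1e-6):
    """Central-difference approximation of the gradient of f at p."""
    grad = []
    for i in range(len(p)):
        forward = list(p)
        backward = list(p)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad

q = (1.0, -2.0)  # fixed point
p = (4.0, 2.0)   # point at which the field is evaluated

grad = numerical_gradient(lambda x: half_sed(x, q), p)
pointing_at_q = [qi - pi for pi, qi in zip(p, q)]

print(grad)           # approximately [3.0, 4.0], that is p - q
print(pointing_at_q)  # [-3.0, -4.0]
```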
In information geometry, the notion of a vector field "pointing from one point to another" can be generalized to statistical manifolds: one can use an affine connection to connect tangent vectors at different points and the exponential map to flow from one point to another, and on a statistical manifold this is invertible, defining a unique "difference vector" from any given point to another. In this context, the SED (whose gradient generates the standard difference vector) generalizes to a divergence that generates the information geometry of the manifold; a uniform construction of such a divergence (given the geometric structure) is called a canonical divergence.
See also
- Chebyshev distance measures distance assuming only the most significant dimension is relevant.
- Euclidean distance matrix
- Manhattan distance measures distance following only axis-aligned directions.
- Minkowski distance is a generalization that unifies Euclidean distance, Manhattan distance, and Chebyshev distance.
- Pythagorean addition
- Haversine distance gives great-circle distances between two points on a sphere from their longitudes and latitudes.
- Vincenty's formulae, also known as "Vincenty distance"