Aggregate function
In database management an aggregate function or aggregation function is a function rows are grouped together to form a single summary value.
Common aggregate functions include :
Nothing:
 Nanmean (mean ignoring NaN values, also known as "nil" or "null")
 Stddev
Formally, an aggregate function takes input a set, a multiset (bag), or a list from some input domain I and outputs an element of an output domain O.[1] The input and output domains may be the same, such as for SUM
, or may be different, such as for COUNT
.
Aggregate functions occur commonly in numerous programming languages, in spreadsheets, and in relational algebra.
noting }} </ref> aggregates data from multiple rows into a single concatenated string.
Decomposable aggregate functions
Aggregate functions present a bottleneck, because they potentially require having all input values at once. In distributed computing, it is desirable to divide such computations into smaller pieces, and distribute the work, usually computing in parallel, via a divide and conquer algorithm.
Some aggregate functions can be computed by computing the aggregate for subsets, and then aggregating these aggregates; examples include COUNT, MAX, MIN,
and SUM
. In other cases the aggregate can be computed by computing auxiliary numbers for subsets, aggregating these auxiliary numbers, and finally computing the overall number at the end; examples include AVERAGE
(tracking sum and count, dividing at the end) and RANGE
(tracking max and min, subtracting at the end). In other cases the aggregate cannot be computed without analyzing the entire set at once, though in some cases approximations can be distributed; examples include DISTINCT COUNT, MEDIAN,
and MODE
.
Such functions are called decomposable aggregation functions[2] or decomposable aggregate functions. The simplest may be referred to as selfdecomposable aggregation functions, which are defined as those functions f such that there is a merge operator such that
where is the union of multisets.
For example, SUM
:
 , for a singleton;
 , meaning that merge is simply addition.
COUNT
:
 ,
 .
MAX
:
 ,
 .
MIN
:
 [3],
 .
Note that selfdecomposable aggregation functions can be combined (formally, taking the product) by applying them separately, so for instance one can compute both the SUM
and COUNT
at the same time, by tracking two numbers.
More generally, one can define a decomposable aggregation function f as one that can be expressed as the composition of a final function g and a selfdecomposable aggregation function h, . For example, AVERAGE
=SUM
/COUNT
and RANGE
=MAX
−MIN
.
In the MapReduce framework, these steps are known as InitialReduce (value on individual record/singleton set), Combine (binary merge on two aggregations), and FinalReduce (final function on auxiliary values),[4] and moving decomposable aggregation before the Shuffle phase is known as an InitialReduce step,[5]
Decomposable aggregation functions are important in online analytical processing (OLAP), as they allow aggregation queries to be computed on the precomputed results in the OLAP cube, rather than on the base data.[6] For example, it is easy to support COUNT, MAX, MIN,
and SUM
in OLAP, since these can be computed for each cell of the OLAP cube and then summarized ("rolled up"), but it is difficult to support MEDIAN
, as that must be computed for every view separately.
Other Decomposable aggregate functions
In order to calculate the Average and Standard Deviation from aggregate data, it is necessary to have available for each group: the total of values (Σx_{i} = SUM(x)), the number of values (N=COUNT(x)) and the total of squares of the values (Σx_{i}^{2}=SUM(x^{2})) of each groups.[7]
AVG
:
 .
or
 .
or, only if COUNT(X)=COUNT(Y)
 .
SUM(x^{2})
:
The sum of squares of the values is important in order to calculate the Standard Deviation of groups
STDDEV
:
For a finite population with equal probabilities at all points, we have[8]
This means that the standard deviation is equal to the square root of the difference between the average of the squares of the values and the square of the average value.
 .
 .
See also
 Crosstabulation a.k.a. Contingency table
 Data drilling
 Data mining
 Data processing
 Drill down
 Extract, transform, load
 Fold (higherorder function)
 Group by (SQL), SQL clause
 OLAP cube
 Online analytical processing
 Pivot table
 Relational algebra
 Utility functions on indivisible goods#Aggregates of utility functions
 XML for Analysis
References
 Jesus, Baquero & Almeida 2011, 2 Problem Definition, pp. 3.
 Jesus, Baquero & Almeida 2011, 2.1 Decomposable functions, pp. 3–4.

Winand, Markus (20170515). "Big News in Databases: New SQL Standard, Cloud Wars, and ACIDRain (Spring 2017)". DZone. Retrieved 20170610.
In December 2016, ISO released a new version of the SQL standard. It introduces new features such as row pattern matching, listagg, date and time formatting, and JSON support.
 Yu, Gunda & Isard 2009, 2. Distributed Aggregation, pp. 2–4.
 Yu, Gunda & Isard 2009, 2. Distributed Aggregation, p. 1.
 Zhang 2017, p. 1.
 Ing. Óscar Bonilla, MBA
 Standard deviation#Identities and mathematical properties
 Yu, Yuan; Gunda, Pradeep Kumar; Isard, Michael (2009). Distributed aggregation for dataparallel computing: interfaces and implementations. ACM SIGOPS 22nd symposium on Operating systems principles. ACM. pp. 247–260. doi:10.1145/1629575.1629600.
 Nothing.
 Zhang, Chao (2017). Symmetric and Asymmetric Aggregate Function in Massively Parallel Computing (Technical report).
Further reading
 Grabisch, Michel; Marichal, JeanLuc; Mesiar, Radko; Pap, Endre (2009). Aggregation functions. Encyclopedia of Mathematics and its Applications. 127. Cambridge: Cambridge University Press. ISBN 9780521519267. Zbl 1196.00002.
 Oracle Aggregate Functions: MAX, MIN, COUNT, SUM, AVG Examples