# Selection (relational algebra)

In relational algebra, a selection (sometimes called a restriction, following E.F. Codd's 1970 paper[1]) is a unary operation that denotes a subset of a relation. Contrary to popular belief, the term restriction was not adopted to avoid confusion with SQL's use of SELECT, since Codd's paper predates the existence of SQL.

A selection is written as ${\displaystyle \sigma _{a\theta b}(R)}$ or ${\displaystyle \sigma _{a\theta v}(R)}$ where:

• a and b are attribute names
• θ is a binary comparison operator in the set ${\displaystyle \{<,\leq ,=,\neq ,\geq ,>\}}$
• v is a value constant
• R is a relation

The selection ${\displaystyle \sigma _{a\theta b}(R)}$ denotes all tuples in R for which θ holds between the a and the b attributes.

The selection ${\displaystyle \sigma _{a\theta v}(R)}$ denotes all tuples in R for which θ holds between the a attribute and the value v.

For example, consider the following tables, where the first table gives the relation Person, the second table gives the result of ${\displaystyle \sigma _{{\text{Age}}\geq 34}({\text{Person}})}$ and the third table gives the result of ${\displaystyle \sigma _{{\text{Age}}={\text{Weight}}}({\text{Person}})}$.

${\displaystyle {\text{Person}}}$

| Name | Age | Weight |
| --- | --- | --- |
| Harry | 34 | 80 |
| Sally | 28 | 64 |
| George | 29 | 70 |
| Helena | 54 | 54 |
| Peter | 34 | 80 |

${\displaystyle \sigma _{{\text{Age}}\geq 34}({\text{Person}})}$

| Name | Age | Weight |
| --- | --- | --- |
| Harry | 34 | 80 |
| Helena | 54 | 54 |
| Peter | 34 | 80 |

${\displaystyle \sigma _{{\text{Age}}={\text{Weight}}}({\text{Person}})}$

| Name | Age | Weight |
| --- | --- | --- |
| Helena | 54 | 54 |

More formally, the semantics of the selection is defined as follows:

${\displaystyle \sigma _{a\theta b}(R)=\{\ t:t\in R,\ t(a)\ \theta \ t(b)\ \}}$
${\displaystyle \sigma _{a\theta v}(R)=\{\ t:t\in R,\ t(a)\ \theta \ v\ \}}$

The result of the selection is only defined if the attribute names that it mentions are in the heading of the relation that it operates upon.
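The two definitions above can be sketched in Python. This is only an illustration under stated assumptions: a relation is modelled as a list of dictionaries keyed by attribute name, and the names `THETA`, `select_attr_attr` and `select_attr_value` are hypothetical helpers, not part of any library.

```python
import operator

# Assumed mapping from a textual symbol for theta to a Python comparison.
THETA = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    ">": operator.gt,
}


def select_attr_attr(relation, a, theta, b):
    """Sketch of sigma_{a theta b}(R): keep tuples t where t(a) theta t(b)."""
    compare = THETA[theta]
    # A tuple lacking attribute a or b raises KeyError, loosely mirroring
    # the rule that the result is only defined for attributes in the heading.
    return [t for t in relation if compare(t[a], t[b])]


def select_attr_value(relation, a, theta, v):
    """Sketch of sigma_{a theta v}(R): keep tuples t where t(a) theta v."""
    compare = THETA[theta]
    return [t for t in relation if compare(t[a], v)]


# The Person relation from the example tables.
person = [
    {"Name": "Harry", "Age": 34, "Weight": 80},
    {"Name": "Sally", "Age": 28, "Weight": 64},
    {"Name": "George", "Age": 29, "Weight": 70},
    {"Name": "Helena", "Age": 54, "Weight": 54},
    {"Name": "Peter", "Age": 34, "Weight": 80},
]

older = select_attr_value(person, "Age", ">=", 34)
same = select_attr_attr(person, "Age", "=", "Weight")
print([t["Name"] for t in older])  # ['Harry', 'Helena', 'Peter']
print([t["Name"] for t in same])   # ['Helena']
```

A list is used here only for readable output; a relation is a set of tuples, so a set-based representation would be closer to the formal definition.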

## Generalized selection

A generalized selection is a unary operation written as ${\displaystyle \sigma _{\varphi }(R)}$ where ${\displaystyle \varphi }$ is a propositional formula that consists of atoms as allowed in the normal selection and, in addition, the logical operators ${\displaystyle \land }$ (and), ${\displaystyle \lor }$ (or) and ${\displaystyle \lnot }$ (negation). This selection denotes all those tuples in R for which ${\displaystyle \varphi }$ holds.

For example, consider the following tables, where the first table gives the relation Person and the second gives the result of ${\displaystyle \sigma _{{\text{Age}}\geq 30\ \land \ {\text{Weight}}\leq 60}({\text{Person}})}$.

${\displaystyle {\text{Person}}}$

| Name | Age | Weight |
| --- | --- | --- |
| Harry | 34 | 80 |
| Sally | 28 | 64 |
| George | 29 | 70 |
| Helena | 54 | 54 |
| Peter | 34 | 80 |

${\displaystyle \sigma _{{\text{Age}}\geq 30\ \land \ {\text{Weight}}\leq 60}({\text{Person}})}$

| Name | Age | Weight |
| --- | --- | --- |
| Helena | 54 | 54 |

Formally the semantics of the generalized selection is defined as follows:

${\displaystyle \sigma _{\varphi }(R)=\{\ t:t\in R,\ \varphi (t)\ \}}$

The result of the selection is only defined if the attribute names that it mentions are in the heading of the relation that it operates upon.
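As a rough sketch, the formula ${\displaystyle \varphi }$ can be modelled as a Python function from a tuple to a truth value, which makes the generalized selection a one-line filter. The list-of-dictionaries representation and the name `select` are assumptions made for this illustration.

```python
def select(relation, phi):
    """Sketch of sigma_phi(R): keep every tuple t in R for which phi(t) holds."""
    return [t for t in relation if phi(t)]


# The Person relation from the example tables.
person = [
    {"Name": "Harry", "Age": 34, "Weight": 80},
    {"Name": "Sally", "Age": 28, "Weight": 64},
    {"Name": "George", "Age": 29, "Weight": 70},
    {"Name": "Helena", "Age": 54, "Weight": 54},
    {"Name": "Peter", "Age": 34, "Weight": 80},
]

# phi is the formula Age >= 30 AND Weight <= 60.
result = select(person, lambda t: t["Age"] >= 30 and t["Weight"] <= 60)
print(result)  # [{'Name': 'Helena', 'Age': 54, 'Weight': 54}]
```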

The generalized selection is expressible with other basic algebraic operations. It can be simulated using the fundamental operators by applying the following rules:

${\displaystyle \sigma _{\varphi \land \psi }(R)=\sigma _{\varphi }(R)\cap \sigma _{\psi }(R)}$
${\displaystyle \sigma _{\varphi \lor \psi }(R)=\sigma _{\varphi }(R)\cup \sigma _{\psi }(R)}$
${\displaystyle \sigma _{\lnot \varphi }(R)=R-\sigma _{\varphi }(R)}$
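The three rules can be spot-checked on the Person relation with Python's built-in set operations. This is a check on one example, not a proof; the representation of tuples as `(Name, Age, Weight)` triples and the names `select`, `phi` and `psi` are assumptions chosen for the sketch.

```python
# Person as a set of (Name, Age, Weight) triples, so that set algebra applies.
AGE, WEIGHT = 1, 2
person = {
    ("Harry", 34, 80),
    ("Sally", 28, 64),
    ("George", 29, 70),
    ("Helena", 54, 54),
    ("Peter", 34, 80),
}


def select(relation, formula):
    """Sketch of sigma_formula(R) over a set of tuples."""
    return {t for t in relation if formula(t)}


def phi(t):
    return t[AGE] >= 30      # atom: Age >= 30


def psi(t):
    return t[WEIGHT] <= 60   # atom: Weight <= 60


# Conjunction corresponds to intersection.
and_ok = select(person, lambda t: phi(t) and psi(t)) == (
    select(person, phi) & select(person, psi)
)
# Disjunction corresponds to union.
or_ok = select(person, lambda t: phi(t) or psi(t)) == (
    select(person, phi) | select(person, psi)
)
# Negation corresponds to set difference from R.
not_ok = select(person, lambda t: not phi(t)) == (
    person - select(person, phi)
)

print(and_ok, or_ok, not_ok)  # True True True
```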

## Computer languages

In computer languages it is expected that any truth-valued expression be permitted as the selection condition rather than restricting it to be a simple comparison.

In SQL, selections are performed by using WHERE clauses in SELECT, UPDATE, and DELETE statements. Note, however, that the selection condition can evaluate to any of three truth values (true, false and unknown) instead of the usual two.

In SQL, generalized selections are performed by using WHERE clauses with the AND, OR, and NOT operators in SELECT, UPDATE, and DELETE statements.
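The effect of the third truth value can be seen with Python's standard `sqlite3` module. The sketch below loads the Person relation into an in-memory SQLite database and adds one hypothetical row, Nora, whose Age is NULL; that row is not part of the example relation and is included only to produce an unknown condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (Name TEXT, Age INTEGER, Weight INTEGER)")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?)",
    [
        ("Harry", 34, 80),
        ("Sally", 28, 64),
        ("George", 29, 70),
        ("Helena", 54, 54),
        ("Peter", 34, 80),
        ("Nora", None, 58),  # hypothetical row: Age is NULL
    ],
)

# Generalized selection with AND in a WHERE clause.
matching = [
    row[0]
    for row in conn.execute(
        "SELECT Name FROM Person "
        "WHERE Age >= 30 AND Weight <= 60 ORDER BY Name"
    )
]

# The negated condition, using NOT.
complement = [
    row[0]
    for row in conn.execute(
        "SELECT Name FROM Person "
        "WHERE NOT (Age >= 30 AND Weight <= 60) ORDER BY Name"
    )
]

print(matching)    # ['Helena']
print(complement)  # ['George', 'Harry', 'Peter', 'Sally']
conn.close()
```

Nora appears in neither result: her condition evaluates to unknown, its negation is also unknown, and WHERE keeps only rows for which the condition is true.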

## References

1. Codd, E.F. (June 1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM. 13 (6): 377–387. doi:10.1145/362384.362685.