# Optimal binary search tree

In computer science, an optimal binary search tree (Optimal BST), sometimes called a weight-balanced binary tree,[1] is a binary search tree which provides the smallest possible search time (or expected search time) for a given sequence of accesses (or access probabilities). Optimal BSTs are generally divided into two types: static and dynamic.

In the static optimality problem, the tree cannot be modified after it has been constructed. In this case, there exists some particular layout of the nodes of the tree which provides the smallest expected search time for the given access probabilities. Various algorithms exist to construct or approximate the statically optimal tree given the information on the access probabilities of the elements.

In the dynamic optimality problem, the tree can be modified at any time, typically by permitting tree rotations. The tree is considered to have a cursor starting at the root which it can move or use to perform modifications. In this case, there exists some minimal-cost sequence of these operations which causes the cursor to visit every node in the target access sequence in order. The splay tree is conjectured to have a constant competitive ratio compared to the dynamically optimal tree in all cases, though this has not yet been proven.

## Static optimality

### Definition

In the static optimality problem as defined by Knuth,[2] we are given a set of n ordered elements and a set of ${\displaystyle 2n+1}$ probabilities. We will denote the elements ${\displaystyle a_{1}}$ through ${\displaystyle a_{n}}$ and the probabilities ${\displaystyle A_{1}}$ through ${\displaystyle A_{n}}$ and ${\displaystyle B_{0}}$ through ${\displaystyle B_{n}}$. ${\displaystyle A_{i}}$ is the probability of a search being done for element ${\displaystyle a_{i}}$. For ${\displaystyle 1\leq i<n}$, ${\displaystyle B_{i}}$ is the probability of a search being done for an element between ${\displaystyle a_{i}}$ and ${\displaystyle a_{i+1}}$, ${\displaystyle B_{0}}$ is the probability of a search being done for an element strictly less than ${\displaystyle a_{1}}$, and ${\displaystyle B_{n}}$ is the probability of a search being done for an element strictly greater than ${\displaystyle a_{n}}$. These ${\displaystyle 2n+1}$ probabilities cover all possible searches, and therefore add up to one.

The static optimality problem is the optimization problem of finding the binary search tree that minimizes the expected search time, given the ${\displaystyle 2n+1}$ probabilities. As the number of possible trees on a set of n elements is ${\displaystyle {2n \choose n}{\frac {1}{n+1}}}$,[2] which is exponential in n, brute-force search is not usually a feasible solution.
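To get a feel for how quickly the number of candidate trees grows, the count above can be evaluated directly. The following is a minimal sketch; the function name `count_bsts` is an illustrative choice and does not come from the cited sources.

```python
from math import comb


def count_bsts(n: int) -> int:
    """Number of binary search tree shapes on n ordered keys: C(2n, n) / (n + 1)."""
    # The division is always exact, so integer division loses nothing.
    return comb(2 * n, n) // (n + 1)


for n in (1, 2, 3, 10, 20):
    print(n, count_bsts(n))
```

Already at n = 20 there are more than six billion trees, which is why enumerating them all is impractical.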

### Knuth's dynamic programming algorithm

In 1971, Knuth published a relatively straightforward dynamic programming algorithm capable of constructing the statically optimal tree in only ${\displaystyle O(n^{2})}$ time.[2] Knuth's primary insight was that the static optimality problem exhibits optimal substructure; that is, if a certain tree is statically optimal for a given probability distribution, then its left and right subtrees must also be statically optimal for their appropriate subsets of the distribution.

To see this, consider what Knuth calls the "weighted path length" of a tree. The weighted path length of a tree on n elements is the sum of the lengths of all ${\displaystyle 2n+1}$ possible search paths, weighted by their respective probabilities. The tree with the minimal weighted path length is, by definition, statically optimal.

But weighted path lengths have an interesting property. Let ${\displaystyle E}$ be the weighted path length of a binary tree, ${\displaystyle E_{L}}$ be the weighted path length of its left subtree, and ${\displaystyle E_{R}}$ be the weighted path length of its right subtree. Also let ${\displaystyle W}$ be the sum of all the probabilities in the tree. Observe that when either subtree is attached to the root, the depth of each of its elements (and thus each of its search paths) is increased by one. Also observe that the root itself has a depth of one. This means that the difference in weighted path length between a tree and its two subtrees is exactly the sum of every single probability in the tree, leading to the following recurrence:

${\displaystyle E=E_{L}+E_{R}+W}$
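The recurrence can be checked on a small tree. The sketch below is illustrative only: it assumes a tree is written as a nested tuple `(left, r, right)` where `r` is the index of the root key and `None` marks an empty subtree, and it assumes an empty subtree contributes just its gap probability, consistent with the base case used in the formulas for Knuth's algorithm. The function name and the sample probabilities are made up for the example.

```python
def path_length(tree, A, B, lo, hi):
    """Return (E, W) for a tree over keys a_lo..a_hi using E = E_L + E_R + W.

    A[1..n] are key probabilities (A[0] is an unused placeholder),
    B[0..n] are gap probabilities.
    """
    if tree is None:
        # Empty range a_lo..a_(lo-1): only the gap B_(lo-1) remains.
        assert hi == lo - 1
        return B[lo - 1], B[lo - 1]
    left, r, right = tree
    e_left, w_left = path_length(left, A, B, lo, r - 1)
    e_right, w_right = path_length(right, A, B, r + 1, hi)
    w = w_left + A[r] + w_right
    return e_left + e_right + w, w


# Two keys: a_1 at the root, a_2 as its right child.
A = [0, 0.3, 0.2]
B = [0.1, 0.25, 0.15]
tree = (None, 1, (None, 2, None))
e, w = path_length(tree, A, B, 1, 2)
print(e, w)
```

Summing depth times probability by hand gives the same value: 0.3·1 + 0.2·2 for the keys plus 0.1·2 + 0.25·3 + 0.15·3 for the gaps, which is 2.1.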

This recurrence leads to a natural dynamic programming solution. Let ${\displaystyle E_{ij}}$ be the weighted path length of the statically optimal search tree for all values between ${\displaystyle a_{i}}$ and ${\displaystyle a_{j}}$, let ${\displaystyle W_{ij}}$ be the total weight of that tree, and let ${\displaystyle R_{ij}}$ be the index of its root. The algorithm can be built using the following formulas:

${\displaystyle {\begin{aligned}E_{i,i-1}=W_{i,i-1}&=B_{i-1}&&{\text{for }}1\leq i\leq n+1\\W_{i,j}&=W_{i,j-1}+A_{j}+B_{j}\\E_{i,j}&=\min _{i\leq r\leq j}(E_{i,r-1}+E_{r+1,j}+W_{i,j})&&{\text{for }}1\leq i\leq j\leq n\end{aligned}}}$

The naive implementation of this algorithm actually takes ${\displaystyle O(n^{3})}$ time, but Knuth's paper includes some additional observations which can be used to produce a modified algorithm taking only ${\displaystyle O(n^{2})}$ time.
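The formulas translate almost directly into table-filling code. The sketch below is one possible rendering, not a transcription of Knuth's paper. The names `optimal_bst`, `build_tree` and the flag `knuth_bounds` are illustrative. The flag assumes that the speed-up in question is the commonly cited root monotonicity property, ${\displaystyle R_{i,j-1}\leq R_{i,j}\leq R_{i+1,j}}$, which lets each cell search a narrower range of roots.

```python
def optimal_bst(A, B, knuth_bounds=False):
    """Fill the E, W, R tables for keys 1..n.

    A[1..n] are key probabilities (A[0] is an unused placeholder),
    B[0..n] are gap probabilities. Returns (E, R).
    """
    n = len(A) - 1
    # Rows 1..n+1 and columns 0..n are used; row 0 is padding.
    E = [[0] * (n + 1) for _ in range(n + 2)]
    W = [[0] * (n + 1) for _ in range(n + 2)]
    R = [[0] * (n + 1) for _ in range(n + 2)]
    for i in range(1, n + 2):
        E[i][i - 1] = W[i][i - 1] = B[i - 1]
    for length in range(1, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            W[i][j] = W[i][j - 1] + A[j] + B[j]
            lo, hi = i, j
            if knuth_bounds and length > 1:
                # Assumed speed-up: only roots between the roots of the
                # two shorter neighbouring ranges need to be tried.
                lo, hi = R[i][j - 1], R[i + 1][j]
            best, root = min(
                (E[i][r - 1] + E[r + 1][j], r) for r in range(lo, hi + 1)
            )
            E[i][j] = best + W[i][j]
            R[i][j] = root
    return E, R


def build_tree(R, i, j):
    """Rebuild the optimal tree as nested (left, root_index, right) tuples."""
    if i > j:
        return None
    r = R[i][j]
    return (build_tree(R, i, r - 1), r, build_tree(R, r + 1, j))


A = [0, 0.3, 0.2]
B = [0.1, 0.25, 0.15]
E, R = optimal_bst(A, B)
print(E[1][2], build_tree(R, 1, 2))
```

With `knuth_bounds=False` every cell scans all candidate roots, which is the cubic-time version. The restricted scan relies on exact tie-breaking, so with floating-point inputs that contain near-ties it is safer to use integer or rational weights.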

### Mehlhorn's approximation algorithm

While the ${\displaystyle O(n^{2})}$ time taken by Knuth's algorithm is substantially better than the exponential time required for a brute-force search, it is still too slow to be practical when the number of elements in the tree is very large.

In 1975, Kurt Mehlhorn published a paper proving that a much simpler algorithm could be used to closely approximate the statically optimal tree in only ${\displaystyle O(n)}$ time.[3] In this algorithm, the root of the tree is chosen so as to most closely balance the total weight (by probability) of the left and right subtrees. This strategy is then applied recursively on each subtree.

That this strategy produces a good approximation can be seen intuitively by noting that the weights of the subtrees along any path form something very close to a geometrically decreasing sequence. In fact, this strategy generates a tree whose weighted path length is at most

${\displaystyle 2+(1-\log({\sqrt {5}}-1))^{-1}H=2+{\frac {H}{1-\log({\sqrt {5}}-1)}}}$

where H is the entropy of the probability distribution. Since no optimal binary search tree can ever do better than a weighted path length of

${\displaystyle (1/\log 3)H={\frac {H}{\log 3}}}$

this approximation is very close.[3]
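The balancing rule described above can be sketched in a few lines. This is an illustration of the root-selection idea only: it scans every candidate root in a range, so it does not reach the linear running time of the published algorithm. The function name `balanced_root_tree` and the tuple representation `(left, root_index, right)` are assumptions made for the example.

```python
from itertools import accumulate


def balanced_root_tree(A, B):
    """Pick each root so the left and right subtree weights are as equal as possible.

    A[1..n] are key weights (A[0] is an unused placeholder),
    B[0..n] are gap weights.
    """
    n = len(A) - 1
    # P[k] = sum of A_t + B_t for t = 1..k, so any range weight is O(1).
    P = [0] + list(accumulate(A[k] + B[k] for k in range(1, n + 1)))

    def weight(i, j):
        # Total weight of keys a_i..a_j together with the gaps around them.
        return B[i - 1] + P[j] - P[i - 1]

    def build(i, j):
        if i > j:
            return None
        r = min(
            range(i, j + 1),
            key=lambda r: abs(weight(i, r - 1) - weight(r + 1, j)),
        )
        return (build(i, r - 1), r, build(r + 1, j))

    return build(1, n)


# Unnormalized integer frequencies; scaling all weights by the same
# factor does not change which root balances best.
print(balanced_root_tree([0, 1, 4, 1], [1, 1, 1, 1]))
```

In this example the heavy middle key ends up at the root, with the two light keys as its children.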

### Hu–Tucker and Garsia–Wachs algorithms

In the special case that all of the ${\displaystyle A_{i}}$ values are zero, the optimal tree can be found in time ${\displaystyle O(n\log n)}$. This was first proved by T. C. Hu and Alan Tucker in a paper that they published in 1971. A later simplification by Garsia and Wachs, the Garsia–Wachs algorithm, performs the same comparisons in the same order. The algorithm works by using a greedy algorithm to build a tree that has the optimal height for each leaf, but is out of order, and then constructing another binary search tree with the same heights.[4]

## Dynamic optimality

Unsolved problem in computer science: Do splay trees perform as well as any other binary search tree algorithm?

### Definition

There are several different definitions of dynamic optimality, all of which are effectively equivalent to within a constant factor in terms of running-time.[5] The problem was first introduced implicitly by Sleator and Tarjan in their paper on splay trees,[6] but Demaine et al. give a very good formal statement of it.[5]

In the dynamic optimality problem, we are given a sequence of accesses ${\displaystyle x_{1},\ldots ,x_{m}}$ on the keys ${\displaystyle 1,\ldots ,n}$. For each access, we are given a pointer to the root of our BST and can use the pointer to perform any of the following operations:

1. Move the pointer to the left child of the current node.
2. Move the pointer to the right child of the current node.
3. Move the pointer to the parent of the current node.
4. Perform a single rotation on the current node and its parent.

Our BST algorithm can perform any sequence of the above operations as long as the pointer eventually ends up on the node containing the target value ${\displaystyle x_{i}}$. The time it takes a given dynamic BST algorithm to perform a sequence of accesses is equivalent to the total number of such operations performed during that sequence. Given any sequence of accesses on any set of elements, there is some BST algorithm which performs all accesses using the fewest total operations.
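The four operations and the cost measure can be made concrete with a small sketch. The class and method names (`Node`, `Cursor`, `start_access`, `rotate`) are hypothetical, and the sketch assumes, as in the description above, that each of the four operations costs one unit and that resetting the pointer to the root at the start of an access is free.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.parent = None
        for child in (left, right):
            if child is not None:
                child.parent = self


class Cursor:
    """A pointer into a BST that counts unit-cost operations."""

    def __init__(self, root):
        self.root = root
        self.node = root
        self.cost = 0

    def start_access(self):
        self.node = self.root  # every access begins at the root

    def move_left(self):
        self.node = self.node.left
        self.cost += 1

    def move_right(self):
        self.node = self.node.right
        self.cost += 1

    def move_parent(self):
        self.node = self.node.parent
        self.cost += 1

    def rotate(self):
        """Rotate the current node above its parent, keeping key order."""
        x = self.node
        p = x.parent
        g = p.parent
        if p.left is x:  # right rotation
            moved = x.right
            p.left = moved
            x.right = p
        else:  # left rotation
            moved = x.left
            p.right = moved
            x.left = p
        if moved is not None:
            moved.parent = p
        p.parent = x
        x.parent = g
        if g is None:
            self.root = x
        elif g.left is p:
            g.left = x
        else:
            g.right = x
        self.cost += 1


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)


cursor = Cursor(Node(2, Node(1), Node(3)))
cursor.start_access()
cursor.move_left()  # pointer is now on key 1
cursor.rotate()     # key 1 becomes the root
print(cursor.root.key, inorder(cursor.root), cursor.cost)  # prints: 1 [1, 2, 3] 2
```

Any BST algorithm in this model is a policy for choosing which of these methods to call. Its running time on an access sequence is the final value of `cost`.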

This model defines the fastest possible tree for a given sequence of accesses, but calculating the optimal tree in this sense requires foreknowledge of exactly what the access sequence will be. If we let OPT(X) be the number of operations performed by the strictly optimal tree for an access sequence X, we can say that a tree is dynamically optimal as long as, for any X, it performs X in time O(OPT(X)) (that is, it has a constant competitive ratio).[5]

There are several data structures conjectured to have this property, but none has been proven to have it. It is an open problem whether there exists a dynamically optimal data structure in this model.

### Splay trees

The splay tree is a form of binary search tree invented in 1985 by Daniel Sleator and Robert Tarjan on which the standard search tree operations run in ${\displaystyle O(\log(n))}$ amortized time.[7] It is conjectured to be dynamically optimal in the required sense. That is, a splay tree is believed to perform any sufficiently long access sequence X in time O(OPT(X)).[6]

### Tango trees

The tango tree is a data structure proposed in 2004 by Erik Demaine and others which has been proven to perform any sufficiently long access sequence X in time ${\displaystyle O(\log \log n\operatorname {OPT} (X))}$. While this is not dynamically optimal, the competitive ratio of ${\displaystyle \log \log n}$ is still very small for reasonable values of n.[5]

### Other results

In 2013, John Iacono published a paper which uses the geometry of binary search trees to provide an algorithm which is dynamically optimal if any binary search tree algorithm is dynamically optimal.[8] Nodes are interpreted as points in two dimensions, and the optimal access sequence is the smallest arborally satisfied superset of those points.
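The term "arborally satisfied" can be illustrated with a brute-force check. The sketch assumes the usual definition from the geometric view of binary search trees: a point set is arborally satisfied if, for every two points that do not share a row or a column, the axis-aligned rectangle they span contains at least one other point of the set. Points are written here as `(key, time)` pairs, and the function name is an illustrative choice.

```python
from itertools import combinations


def is_arborally_satisfied(points):
    """Check the rectangle condition by brute force over all pairs of points."""
    pts = set(points)
    for (x1, y1), (x2, y2) in combinations(pts, 2):
        if x1 == x2 or y1 == y2:
            continue  # points sharing a row or column impose no condition
        lo_x, hi_x = min(x1, x2), max(x1, x2)
        lo_y, hi_y = min(y1, y2), max(y1, y2)
        has_witness = any(
            lo_x <= x <= hi_x and lo_y <= y <= hi_y
            for (x, y) in pts
            if (x, y) != (x1, y1) and (x, y) != (x2, y2)
        )
        if not has_witness:
            return False
    return True


# Access key 1 at time 1, then key 2 at time 2.
accesses = {(1, 1), (2, 2)}
print(is_arborally_satisfied(accesses))             # prints: False
print(is_arborally_satisfied(accesses | {(1, 2)}))  # prints: True
```

Adding the point `(1, 2)` corresponds to touching key 1 again at time 2, which is the kind of extra work a BST algorithm must do to serve the second access.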

The interleave lower bound is an asymptotic lower bound on dynamic optimality.