# Chinese restaurant process

In probability theory, the **Chinese restaurant process** is a discrete-time stochastic process, analogous to seating customers at tables in a Chinese restaurant.
Imagine a Chinese restaurant with an infinite number of circular tables, each with infinite capacity. Customer 1 sits at the first table. The next customer either sits at the same table as customer 1, or the next table. This continues, with each customer choosing to either sit at an occupied table with a probability proportional to the number of customers already there (i.e., they are more likely to sit at a table with many customers than few), or an unoccupied table. At time *n*, the *n* customers have been partitioned among *m* ≤ *n* tables (or blocks of the partition). The results of this process are exchangeable, meaning the order in which the customers sit does not affect the probability of the final distribution. This property greatly simplifies a number of problems in population genetics, linguistic analysis, and image recognition.

David J. Aldous attributes the restaurant analogy to Jim Pitman and Lester Dubins in his 1983 book.[1]

## Formal definition

At any positive-integer time *n*, the value of the process is a partition *B*_{n} of the set {1, 2, 3, ..., *n*}, whose probability distribution is determined as follows. At time *n* = 1, the trivial partition { {1} } is obtained with probability 1. At time *n* + 1 the element *n* + 1 is either:

- added to one of the blocks of the partition
*B*_{n}, where each block is chosen with probability |*b*|/(*n*+ 1) where |*b*| is the size of the block (i.e. number of elements), or - added to the partition
*B*_{n}as a new singleton block, with probability 1/(*n*+ 1).

The random partition so generated has some special properties. It is *exchangeable* in the sense that relabeling {1, ..., *n*} does not change the distribution of the partition, and it is *consistent* in the sense that the law of the partition of *n* − 1 obtained by removing the element *n* from the random partition at time *n* is the same as the law of the random partition at time *n* − 1.

The probability assigned to any particular partition (ignoring the order in which customers sit around any particular table) is

where *b* is a block in the partition *B* and |*b*| is the size of *b*.

## Generalization

This construction can be generalized to a model with two parameters, *α* and *θ*,[2][3] commonly called the *discount* and *strength* (or *concentration*) parameters. At time *n* + 1, the next customer to arrive finds |*B*| occupied tables and decides to sit at an empty table with probability

or at an occupied table *b* of size |*b*| with probability

In order for the construction to define a valid probability measure it is necessary to suppose that either *α* < 0 and *θ* = *- Lα* for some *L* ∈ {1, 2, ...}; or that 0 ≤ *α* < 1 and *θ* > −*α*.

Under this model the probability assigned to any particular partition *B* of *n*, in terms of the Pochhammer k-symbol, is

where, by convention, , and for

Thus, for the case when the partition probability can be expressed in terms of the Gamma function as

In the one-parameter case, where is zero, this simplifies to

Or, when is zero,

As before, the probability assigned to any particular partition depends only on the block sizes, so as before the random partition is exchangeable in the sense described above. The consistency property still holds, as before, by construction.

If *α* = 0, the probability distribution of the random partition of the integer *n* thus generated is the Ewens distribution with parameter θ, used in population genetics and the unified neutral theory of biodiversity.

### Derivation

Here is one way to derive this partition probability. Let *C*_{i} be the random block into which the number *i* is added, for *i* = 1, 2, 3, ... . Then

The probability that *B*_{n} is any particular partition of the set { 1, ..., *n* } is the product of these probabilities as *i* runs from 1 to *n*. Now consider the size of block *b*: it increases by 1 each time we add one element into it. When the last element in block *b* is to be added in, the block size is (|*b*| − 1). For example, consider this sequence of choices: (generate a new block *b*)(join *b*)(join *b*)(join *b*). In the end, block *b* has 4 elements and the product of the numerators in the above equation gets *θ* · 1 · 2 · 3. Following this logic, we obtain Pr(*B*_{n} = *B*) as above.

### Expected number of tables

For the one parameter case, with *α* = 0 and 0 < *θ* < ∞, the number of tables is distributed according to the chinese restaurant table distribution. The expected value of this random variable, given that there are seated customers, is[5]

where is the digamma function. In the general case (*α* > 0) the expected number of occupied tables is[3]

### The Indian buffet process

It is possible to adapt the model such that each data point is no longer uniquely associated with a class (i.e., we are no longer constructing a partition), but may be associated with any combination of the classes. This strains the restaurant-tables analogy and so is instead likened to a process in which a series of diners samples from some subset of an infinite selection of dishes on offer at a buffet. The probability that a particular diner samples a particular dish is proportional to the popularity of the dish among diners so far, and in addition the diner may sample from the untested dishes. This has been named the Indian buffet process and can be used to infer latent features in data.[6]

## Applications

The Chinese restaurant process is closely connected to Dirichlet processes and Pólya's urn scheme, and therefore useful in applications of Bayesian statistics including nonparametric Bayesian methods. The Generalized Chinese Restaurant Process is closely related to Pitman–Yor process. These processes have been used in many applications, including modeling text, clustering biological microarray data,[7] biodiversity modelling, and image reconstruction [8][9]

## References

- Aldous, D. J. (1985). "Exchangeability and related topics".
*École d'Été de Probabilités de Saint-Flour XIII — 1983*. Lecture Notes in Mathematics.**1117**. pp. 1–198. doi:10.1007/BFb0099421. ISBN 978-3-540-15203-3. - Pitman, Jim (1995). "Exchangeable and Partially Exchangeable Random Partitions".
*Probability Theory and Related Fields*.**102**(2): 145–158. doi:10.1007/BF01213386. MR 1337249. - Pitman, Jim (2006).
*Combinatorial Stochastic Processes*.**1875**. Berlin: Springer-Verlag. ISBN 9783540309901. - "Dirichlet Process and Dirichlet Distribution -- Polya Restaurant Scheme and Chinese Restaurant Process".
- Xinhua Zhang, "A Very Gentle Note on the Construction of Dirichlet Process", September 2008, The Australian National University, Canberra. Online: http://users.cecs.anu.edu.au/~xzhang/pubDoc/notes/dirichlet_process.pdf Archived April 11, 2011, at the Wayback Machine
- Griffiths, T.L. and Ghahramani, Z. (2005) Infinite Latent Feature Models and the Indian Buffet Process. Gatsby Unit Technical Report GCNU-TR-2005-001.
- Qin, Zhaohui S (2006). "Clustering microarray gene expression data using weighted Chinese restaurant process".
*Bioinformatics*.**22**(16): 1988–1997. doi:10.1093/bioinformatics/btl284. PMID 16766561. - White, J. T.; Ghosal, S. (2011). "Bayesian smoothing of photon‐limited images with applications in astronomy" (PDF).
*Journal of the Royal Statistical Society, Series B (Statistical Methodology)*.**73**(4): 579–599. CiteSeerX 10.1.1.308.7922. doi:10.1111/j.1467-9868.2011.00776.x. - Li, M.; Ghosal, S. (2014). "Bayesian multiscale smoothing of Gaussian noised images".
*Bayesian Analysis*.**9**(3): 733–758. doi:10.1214/14-ba871.