# Substring

A **substring** is a contiguous sequence of characters within a string. For instance, "*the best of*" is a substring of "*It was the best of times*". This is not to be confused with subsequence, which is a generalization of substring. For example, "*Itwastimes*" is a subsequence of "*It was the best of times*", but not a substring.

**Prefix** and **suffix** are special cases of substring. A prefix of a string is a substring of that occurs at the *beginning* of . A suffix of a string is a substring that occurs at the *end* of .

The list of all substrings of the string "*apple*" would be "*apple*", "*appl*", "*pple*", "*app*", "*ppl*", "*ple*", "*ap*", "*pp*", "*pl*", "*le*", "*a*", "*p*", "*l*", "*e*", "".

## Substring

A string is a substring (or factor)[1] of a string if there exists two strings and such that . In particular, the empty string is a substring of every string.

Example: The string `ana`

is equal to substrings (and subsequences) of `banana`

at two different offsets:

banana ||||| ana|| ||| ana

The first occurrence is obtained with `b`

and `na`

, while the second occurrence is obtained with `ban`

and being the empty string.

A substring of a string is a prefix of a suffix of the string, and equivalently a suffix of a prefix. If is a substring of , it is also a subsequence, which is a more general concept. The occurrences of a given pattern in a given string can be found with a string searching algorithm. Finding the longest string which is equal to a substring of two or more strings is known as the longest common substring problem.
In the mathematical literature, substrings are also called **subwords** (in America) or **factors** (in Europe).

## Prefix

A string is a prefix[1] of a string if there exists a string such that . A *proper prefix* of a string is not equal to the string itself;[2] some sources[3] in addition restrict a proper prefix to be non-empty. A prefix can be seen as a special case of a substring.

Example: The string `ban`

is equal to a prefix (and substring and subsequence) of the string `banana`

:

banana ||| ban

The square subset symbol is sometimes used to indicate a prefix, so that denotes that is a prefix of . This defines a binary relation on strings, called the prefix relation, which is a particular kind of prefix order.

## Suffix

A string is a suffix[1] of a string if there exists a string such that . A *proper suffix* of a string is not equal to the string itself. A more restricted interpretation is that it is also not empty^{}. A suffix can be seen as a special case of a substring.

Example: The string `nana`

is equal to a suffix (and substring and subsequence) of the string `banana`

:

banana |||| nana

A suffix tree for a string is a trie data structure that represents all of its suffixes. Suffix trees have large numbers of applications in string algorithms. The suffix array is a simplified version of this data structure that lists the start positions of the suffixes in alphabetically sorted order; it has many of the same applications.

## Border

A border is suffix and prefix of the same string, e.g. "bab" is a border of "babab" (and also of "babooneatingakebab").

## Superstring

A **superstring** of a finite set of strings is a single string that contains every string in as a substring. For example, is a superstring of , and is a shorter one. Generally, one is interested in finding superstrings whose length is as small as possible; a concatenation of all strings of in any order gives a trivial superstring of .

## References

- Lothaire, M. (1997).
*Combinatorics on words*. Cambridge: Cambridge University Press. ISBN 0-521-59924-5. - Kelley, Dean (1995).
*Automata and Formal Languages: An Introduction*. London: Prentice-Hall International. ISBN 0-13-497777-7. - Gusfield, Dan (1999) [1997].
*Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology*. USA: Cambridge University Press. ISBN 0-521-58519-8.

## External links

Media related to Substring at Wikimedia Commons