# Nearest centroid classifier

In machine learning, a nearest centroid classifier or nearest prototype classifier is a classification model that assigns to each observation the label of the class whose training samples have the mean (centroid) closest to that observation.

When applied to text classification using tf-idf vectors to represent documents, the nearest centroid classifier is known as the Rocchio classifier because of its similarity to the Rocchio algorithm for relevance feedback.[1]
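The Rocchio setting can be sketched in pure Python: documents become sparse tf-idf vectors, and each class is represented by the mean of its training vectors. This is a minimal sketch under several assumptions that are not taken from the text above: whitespace tokenisation, raw term counts as tf, `log(N / df)` as idf, L2-normalised document vectors, and squared Euclidean distance, which has the same argmin as Euclidean distance. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def build_idf(docs):
    # Assumed idf: log(N / df), computed on the training documents only.
    n = len(docs)
    df = Counter(t for d in docs for t in set(d.lower().split()))
    return {t: math.log(n / c) for t, c in df.items()}

def vectorize(doc, idf):
    # Assumed tf: raw term count; terms unseen in training are ignored.
    tf = Counter(t for t in doc.lower().split() if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    # L2-normalise so that document length does not dominate the distance.
    return {t: w / norm for t, w in vec.items()} if norm else vec

def class_centroids(vectors, labels):
    # Per-class mean of the sparse tf-idf vectors.
    counts = Counter(labels)
    sums = {label: defaultdict(float) for label in counts}
    for vec, label in zip(vectors, labels):
        for t, w in vec.items():
            sums[label][t] += w
    return {l: {t: w / counts[l] for t, w in s.items()} for l, s in sums.items()}

def sq_dist(a, b):
    # Squared Euclidean distance between two sparse vectors.
    keys = set(a) | set(b)
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys)

def rocchio_predict(cents, vec):
    # Assign the label of the closest centroid.
    return min(cents, key=lambda label: sq_dist(cents[label], vec))

train = ["goal match striker", "match referee goal",
         "stock market shares", "market bank shares"]
labels = ["sport", "sport", "finance", "finance"]
idf = build_idf(train)
cents = class_centroids([vectorize(d, idf) for d in train], labels)
print(rocchio_predict(cents, vectorize("striker goal", idf)))  # sport
print(rocchio_predict(cents, vectorize("bank shares", idf)))   # finance
```

Normalising the document vectors is one common choice here. Without it, long documents would sit far from every centroid simply because they contain more words.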

An extended version of the nearest centroid classifier has found applications in the medical domain, specifically the classification of tumors.[2]

## Algorithm

• Training procedure: given labeled training samples $\{(\vec{x}_1, y_1), \dots, (\vec{x}_n, y_n)\}$ with class labels $y_i \in \mathbf{Y}$, compute the per-class centroids $\vec{\mu}_l = \frac{1}{|C_l|} \sum_{i \in C_l} \vec{x}_i$, where $C_l$ is the set of indices of samples belonging to class $l \in \mathbf{Y}$.
• Prediction function: the class assigned to an observation $\vec{x}$ is $\hat{y} = \arg\min_{l \in \mathbf{Y}} \|\vec{\mu}_l - \vec{x}\|$.
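The two steps above can be sketched in a few lines of standard-library Python. This is a minimal, hypothetical implementation: `fit_centroids` computes each $\vec{\mu}_l$ as a per-class mean, and `predict` returns the $\arg\min$ over classes. It assumes dense feature vectors given as lists of floats, and it takes the norm $\|\cdot\|$ to be the Euclidean norm (via `math.dist`, Python 3.8+).

```python
import math
from collections import defaultdict

def fit_centroids(X, y):
    """Training: mu_l = (1/|C_l|) * sum of x_i over samples i with label l."""
    sums = {}
    counts = defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        for j, v in enumerate(x):
            sums[label][j] += v
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}

def predict(centroids, x):
    """Prediction: argmin over classes l of the Euclidean distance ||mu_l - x||."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

X = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
y = ["a", "a", "b", "b"]
centroids = fit_centroids(X, y)   # {"a": [0.0, 1.0], "b": [5.0, 4.0]}
print(predict(centroids, [1.0, 1.0]))  # a  (distance 1 vs. 5)
print(predict(centroids, [4.0, 3.0]))  # b  (distance ~4.47 vs. ~1.41)
```

Training reduces to storing one vector per class, so prediction costs one distance computation per class regardless of the number of training samples. Ties are broken here by the order in which classes first appear in the training data.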