An evaluation function, also known as a heuristic evaluation function or static evaluation function, is a function used by game-playing computer programs to estimate the value or goodness of a position (usually at a leaf or terminal node) in a game tree. A tree of such evaluations is usually part of a minimax or related search paradigm which returns a particular node and its evaluation as a result of alternately selecting the most favorable move for the side on move at each ply of the game tree. The value is a quantized scalar, often expressed in fractions of the value of a playing piece such as a stone in go or a pawn in chess; the unit may be tenths, hundredths or another convenient fraction.
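The way leaf evaluations are backed up through a minimax search can be illustrated with a minimal sketch. The nested-list tree, the `minimax` function name and the leaf values (in hundredths of a pawn) are all assumptions made for illustration; a real program generates the tree from the rules of the game.

```python
from typing import Union

# A game tree as nested lists. Leaves are static evaluations in centipawns
# (hundredths of a pawn) from the maximizing player's point of view.
# The tree shape and values are invented for illustration.
Tree = Union[int, list]


def minimax(node: Tree, maximizing: bool = True) -> int:
    """Back up leaf evaluations by alternately taking the max and the min."""
    if isinstance(node, int):  # leaf: return its static evaluation
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# Two plies: the maximizer chooses a branch, then the minimizer replies.
tree = [[30, -120], [10, 25], [-50, 400]]
print(minimax(tree))  # 10
```

The branch containing the 400 leaf is not chosen, because the opponent would answer with the -50 reply; the search returns the best evaluation that can be forced.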
The value is presumed to represent the relative probability of winning if the game tree were expanded from that node to the end of the game. The function looks only at the current position (i.e. what spaces the pieces are on and their relationship to each other) and does not take into account the history of the position or explore possible moves forward of the node (hence the term static). This implies that for dynamic positions where tactical threats exist, the evaluation function will not be an accurate assessment of the position. These positions are termed non-quiescent; they require at least a limited kind of search extension called quiescence search to resolve threats before evaluation. Some values returned by evaluation functions are absolute rather than heuristic, if a win, loss or draw occurs at the node.
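A quiescence search can be sketched as follows, in the common negamax form with a "stand pat" score. The `Position` class is a hypothetical stand-in that stores a precomputed static evaluation and the positions reached by each capture; a real engine would generate captures from the board.

```python
from dataclasses import dataclass, field

INFINITY = 10**9


@dataclass
class Position:
    """Hypothetical position: a static score plus the positions after each capture."""
    static_eval: int  # centipawns, from the point of view of the side to move
    captures: list = field(default_factory=list)


def quiescence(pos: Position, alpha: int = -INFINITY, beta: int = INFINITY) -> int:
    """Search captures only until the position is quiet, then trust the static score."""
    stand_pat = pos.static_eval  # the side to move may decline every capture
    if stand_pat >= beta:
        return beta
    alpha = max(alpha, stand_pat)
    for child in pos.captures:
        score = -quiescence(child, -beta, -alpha)  # negamax: flip the viewpoint
        if score >= beta:
            return beta
        alpha = max(alpha, score)
    return alpha


# A knight can be captured for free: the static score 0 understates the position.
free_knight = Position(0, [Position(-300)])
# The knight is defended and the capturing queen is lost: -600 from our viewpoint.
poisoned = Position(0, [Position(-300, [Position(-600)])])
print(quiescence(free_knight), quiescence(poisoned))  # 300 0
```

In the second position the search declines the capture and returns the stand-pat score, which is the behavior the static evaluation alone cannot provide.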
No analytical or theoretical models exist for evaluation functions for unsolved games, but such functions are not entirely ad hoc either. The composition of evaluation functions is determined empirically by inserting a candidate function into an automaton and evaluating its subsequent performance. A significant body of evidence now exists for several games like chess, shogi and go as to the general composition of evaluation functions for them.
The general approach for constructing evaluation functions is as a linear combination of various weighted terms determined to influence the value of a position. Each term may be considered to be composed of first order factors (those that depend only on the space and any piece on it), second order factors (the space in relation to other spaces), and nth-order factors (dependencies on history of the position).
There is an intricate relationship between search and knowledge in the evaluation function. Deeper search favors less near-term tactical factors and more subtle long-horizon positional motifs in the evaluation. There is also a trade-off between efficacy of encoded knowledge and computational complexity: computing detailed knowledge may take so much time that performance decreases, so approximations to exact knowledge are often better. Because the evaluation function depends on the nominal depth of search as well as the extensions and reductions employed in the search, there is no generic or stand-alone formulation for an evaluation function. An evaluation function which works well in one application will usually need to be substantially re-tuned to work effectively in another application.
Computerized games that employ evaluation functions include chess, go, shogi (Japanese chess), othello, hex, and checkers. Some games like tic-tac-toe are strongly solved, and do not require search or evaluation because a discrete solution tree is available.
Evaluation functions in chess consist of a material balance term that dominates the evaluation, plus a set of positional terms usually totaling no more than the value of a pawn, though in some positions the positional terms can get much larger, such as when checkmate is imminent. An evaluation function also implicitly encodes the value of the right to move, which can vary from a small fraction of a pawn to win or loss. In the endgame, it is possible to construct positions where whoever moves, wins, though the position is otherwise in balance; it is also possible to construct positions where whoever must move, loses (Zugzwang).
An evaluation function for chess might take the form
- c1 * material + c2 * mobility + c3 * king safety + c4 * center control + c5 * pawn structure + c6 * king tropism + ...
Each of the terms is a weight multiplied by a difference factor: the value of white's material or positional score minus black's. The material score is obtained by assigning a value in pawn-units to each of the pieces. The conventional values are: Queen=9; Rook=5; Knight or Bishop=3; Pawn=1; the king is assigned an arbitrarily large value, usually larger than the total value of all the other pieces. Not just the absolute value of the material, but also the ratio between white and black material matters: sacrificing a pawn in the opening may confer a positional advantage (the material ratio is scarcely affected), but the plus of a pawn in a king and pawn end game is usually sufficient to win (ratio of material is large). This ratio is usually implemented as an exchange-down bonus according to the rule of thumb: 'trade pieces but not pawns when ahead, and vice versa when behind.' The mobility score is the number of legal moves available to a player, or alternately the sum of the number of spaces attacked or defended by each piece, including spaces occupied by friendly or opposing pieces. Effective mobility, or the number of "safe" spaces a piece may move to, may also be taken into account. Effective mobility for queens is often very low, though the number of her legal moves may be quite high. The king safety score is a set of bonuses and penalties assessed for the location of the king and the configuration of pawns and pieces adjacent to or in front of the king, and opposing pieces bearing on spaces around the king. Center control is derived from how many pawns and pieces occupy or bear on the four center spaces and sometimes the 12 spaces of the extended center. Pawn structure is a set of penalties and bonuses for various strengths and weaknesses in pawn structure, such as penalties for doubled and isolated pawns. King tropism is a bonus for closeness (or penalty for distance) of certain pieces, especially queens and knights, to the opposing king.
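The material term and the weighted sum can be sketched as below. The piece values are the conventional ones given above; the function names, the use of a FEN piece-placement string as input, and the weights (in hundredths of a pawn) are assumptions for illustration only.

```python
# Conventional piece values in pawn units. The king is omitted here because
# both sides always have exactly one, so it cancels out of the difference.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}


def material_balance(fen_placement: str) -> int:
    """White material minus black material, from a FEN piece-placement field."""
    balance = 0
    for ch in fen_placement:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:  # digits (empty squares), slashes, kings
            continue
        balance += value if ch.isupper() else -value  # uppercase = white
    return balance


def evaluate(differences: dict, weights: dict) -> int:
    """Linear combination: sum of weight * (white score - black score)."""
    return sum(weights[name] * diff for name, diff in differences.items())


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
no_black_queen = "rnb1kbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
print(material_balance(start), material_balance(no_black_queen))  # 0 9

# Illustrative weights in centipawns per unit of each difference factor.
weights = {"material": 100, "mobility": 10}
print(evaluate({"material": 1, "mobility": 4}, weights))  # 140
```

Integer centipawn weights are used so that the sum is exact; an engine with more terms would extend the two dictionaries rather than change the formula.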
The weights c1, etc., aren't necessarily constant - they are application coefficients that can vary with stage of game (opening, middle game, endgame), pieces on the board (e.g. presence or absence of queens), other characteristics of the position, or high level strategy or plans (e.g. assign higher weight to pieces that bear on squares around the opposing king if the plan is a kingside attack).
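One common way to let a weight vary with the stage of the game is to interpolate between an opening value and an endgame value. The sketch below assumes that the game phase is measured by the non-pawn material remaining on the board; the function names and the king-safety weights are illustrative, not taken from any particular program.

```python
# Non-pawn material of both sides at the start, in pawn units:
# 2 * (queen 9 + two rooks 10 + two bishops 6 + two knights 6) = 62.
OPENING_MATERIAL = 62


def game_phase(non_pawn_material: int) -> float:
    """1.0 with all pieces on the board, 0.0 when only kings and pawns remain."""
    return min(1.0, max(0.0, non_pawn_material / OPENING_MATERIAL))


def tapered_weight(opening_weight: float, endgame_weight: float, phase: float) -> float:
    """Blend the two weights linearly according to the game phase."""
    return phase * opening_weight + (1.0 - phase) * endgame_weight


# Assumed king-safety weight: 40 centipawns in the opening, 0 in the endgame.
for material in (62, 31, 0):
    print(material, tapered_weight(40, 0, game_phase(material)))
# 62 40.0
# 31 20.0
# 0 0.0
```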
The focus, and therefore the relevant terms and weights of the evaluation function, differ depending on the stage of the game. In the opening, the dominant considerations are development of the minor pieces, castling and king safety, and control of the center. Penalties are usually assessed for undeveloped pieces and delayed castling. In endgames, either pawn promotion or mating with the pieces is the dominant consideration. In mating situations, the relevant factors are distance of the target king from the edge or corner of the board, and proximity of the king and mating pieces to the opposing king. For king and pawn endgames, the relevant factors are proximity of the kings to pawns, advancement of pawns, and controlling the queening square(s).
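The two factors named for mating situations can be turned into a small bonus term, sketched below. The square numbering (a1 = 0, h8 = 63), the function names and the weights 10 and 4 are assumptions chosen for illustration.

```python
def center_distance(square: int) -> int:
    """0 on the four center squares, up to 6 in the corners (a1 = 0, h8 = 63)."""
    rank, file = divmod(square, 8)
    return max(3 - file, file - 4) + max(3 - rank, rank - 4)


def king_distance(a: int, b: int) -> int:
    """Number of king moves between two squares (Chebyshev distance)."""
    return max(abs(a // 8 - b // 8), abs(a % 8 - b % 8))


def mop_up_bonus(strong_king: int, weak_king: int) -> int:
    """Reward driving the weak king to the edge and bringing the strong king close."""
    return 10 * center_distance(weak_king) + 4 * (7 - king_distance(strong_king, weak_king))


# Weak king cornered on a1 with the strong king on c3, versus a weak king
# on d4 in the center with the strong king far away on a1.
print(mop_up_bonus(18, 0), mop_up_bonus(0, 27))  # 80 16
```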
The equation is a conceptual model. In a particular implementation, each composite pseudo-term may be represented by a handful to possibly hundreds of individual terms, each with its own weight or computed value. For example, pawn structure can have terms for isolated, doubled, backward, advanced, passed, protected passed, connected passed, holes, semi-open and open files, pawn majorities, phalanxes, and many other formations. Other special factors that are often considered are: development of the minor pieces, rooks on open files or the seventh rank, doubled rooks, outpost knights (knights in central locations protected by a pawn and not subject to attack by an opposing pawn), possession of the bishop pair, bishops on the long diagonals, pieces occupying or bearing on spaces around the opposing king, and mobility of the kings (kings shouldn't be 'cramped', hence subject to mate-on-the-move). Some terms, such as king safety in an endgame with few pieces, can and should be ignored depending on context.
The terms composing some factors, like king safety, combine non-linearly - one weakness in king safety, like an open file adjacent to the king, may be penalized for example, by 1/4 pawn, but two weaknesses may need to be penalized one or even two full pawns, and three weaknesses by a piece, a rook or even more because checkmate is becoming a likely possibility. Factors involved with pawn advance and promotion also combine non-linearly.
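A simple way to express such a non-linear combination is a lookup table indexed by the number of weaknesses. The values below (in hundredths of a pawn) follow the rough magnitudes mentioned above, a quarter pawn, one pawn and a piece; the cap of 500 for four or more weaknesses is an assumption.

```python
# Penalty in centipawns for 0, 1, 2, 3 and 4+ king-safety weaknesses.
KING_SAFETY_PENALTY = (0, 25, 100, 300, 500)


def king_safety_penalty(weaknesses: int) -> int:
    """Look up the penalty; counts beyond the table use the last entry."""
    if weaknesses < 0:
        raise ValueError("weakness count cannot be negative")
    return KING_SAFETY_PENALTY[min(weaknesses, len(KING_SAFETY_PENALTY) - 1)]


for count in range(5):
    print(count, king_safety_penalty(count))
# Two weaknesses cost 100, four times the 25 charged for one: the penalty
# grows faster than the count.
```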
The typical pawn-multiple values assigned to the pieces aren't constant either, but depend on context: undeveloped pieces are worth far less as are pieces with reduced mobility for any reason: bishops confined by their own pawns ("the bad bishop"); knights lose value as the position is cleared of pieces, and bishops and rooks gain value; queens are worth substantially more if the opposing king isn't sheltered against checks.
Evaluation functions typically contain dozens to hundreds of individual terms, and the evaluation of a position typically falls within plus or minus a small fraction of a pawn. Larger evaluations indicate a material imbalance or that a win of material is usually imminent. Very large evaluations may indicate that checkmate is imminent.
In practice, effective evaluation functions are created not by ever expanding the list of evaluated parameters, but by careful tuning of the weights relative to each other, of a modest set of parameters such as those described above. Toward this end, exemplary positions from master games are employed, and the efficacy of the evaluation function measured by the percentage of moves selected that agree with the choices of the masters.
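The agreement measure described above can be sketched as follows. The test positions are invented for illustration: each lists a few candidate moves with made-up (material, mobility) differences after the move, together with the move a master is assumed to have chosen. Real tuning uses thousands of positions and a full evaluation function.

```python
def choose_move(candidates: dict, weights: tuple) -> str:
    """Pick the candidate whose resulting position scores highest."""
    w_material, w_mobility = weights
    return max(
        candidates,
        key=lambda move: w_material * candidates[move][0] + w_mobility * candidates[move][1],
    )


def agreement_rate(test_set: list, weights: tuple) -> float:
    """Percentage of positions where the chosen move matches the master's move."""
    matches = sum(
        1 for candidates, master_move in test_set
        if choose_move(candidates, weights) == master_move
    )
    return 100.0 * matches / len(test_set)


# Invented data: move -> (material difference, mobility difference), master move.
TEST_SET = [
    ({"Nf3": (0, 4), "a4": (0, 1)}, "Nf3"),
    ({"Qxb7": (1, -6), "O-O": (0, 3)}, "O-O"),
    ({"Bxf7": (1, 2), "d3": (0, 1)}, "Bxf7"),
    ({"Rxa7": (1, -2), "Re1": (0, 2)}, "Rxa7"),
]

print(agreement_rate(TEST_SET, (100, 1)))   # 75.0
print(agreement_rate(TEST_SET, (100, 20)))  # 100.0
```

With the mobility weight raised from 1 to 20, the function stops grabbing the pawn on b7 at the cost of mobility, and the agreement with the assumed master choices rises accordingly.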
An important technique in evaluation since at least the early 1990s is the use of piece-square tables (also called piece-value tables). Each table is a set of 64 values corresponding to the squares of the chessboard. There is a separate table for each kind of piece: king, queen, knight, bishop, rook, pawn. There is a separate (flipped) set of tables for the opposing pieces. The values in the tables are bonuses/penalties for the location of each piece on each space. The values encode a composite of many subtle factors difficult to quantify analytically. Basic tables can be constructed from principles of development, center control, king safety, etc. In master level programs and beyond, the tables are constructed from a composite of positions occupied by the pieces in master games, adjusted for the application. For example, knights are seldom found on left and right edges of the board in master games, so one may assign a penalty value to those spaces of the knight piece-square table proportionate to how seldom a knight is found there in master games. There are often two sets of tables: one for the opening, and one for the endgame; positions of the middle game are interpolated between the two. Authors of chess programs tend to keep the composition of their piece-square tables, as well as the methods used to create them, secret, because a great deal of time, effort, testing and playing experience go into constructing them, and careful tuning here offers a competitive advantage.
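A basic piece-square table built from first principles, together with the flipped lookup for the opposing side, might look like the sketch below. The square numbering (a1 = 0, h8 = 63), the table values and the function names are assumptions; tables in strong programs are derived from game data and tuning rather than from a formula.

```python
def build_knight_table() -> list:
    """Knight bonus in centipawns: highest in the center, negative toward the edges."""
    table = []
    for square in range(64):
        rank, file = divmod(square, 8)
        file_dist = max(3 - file, file - 4)  # 0 on the d/e files, 3 on the a/h files
        rank_dist = max(3 - rank, rank - 4)  # 0 on ranks 4/5, 3 on ranks 1/8
        table.append(20 - 10 * (file_dist + rank_dist))
    return table


KNIGHT_TABLE = build_knight_table()
# Pawn bonus for advancement, written from white's point of view.
PAWN_TABLE = [10 * max(0, square // 8 - 1) for square in range(64)]


def pst_score(table: list, square: int, is_white: bool) -> int:
    """Look up a square; black uses the same table mirrored top to bottom."""
    index = square if is_white else square ^ 56  # XOR with 56 flips the rank
    return table[index]


print(pst_score(KNIGHT_TABLE, 27, True))  # knight on d4: 20
print(pst_score(KNIGHT_TABLE, 24, True))  # knight on a4: -10
print(pst_score(PAWN_TABLE, 52, True))    # white pawn on e7: 50
print(pst_score(PAWN_TABLE, 12, False))   # black pawn on e2: 50
```

Storing one table and mirroring the index is equivalent to keeping a separate flipped table for the opposing pieces; which is used is an implementation choice.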
Evaluation in Monte Carlo tree search
Go programs like AlphaGo have a substantively different search and evaluation paradigm than the conventional alpha-beta/minimax scheme with leaf node evaluation. In Monte Carlo tree search, the search space of all variations from a node is sampled by rolling out, or playing the game to the end by alternately choosing a random move for each side. The result, win, loss or draw, is backed up to the starting node. The move selected is the one which leads to a position with the greatest number of wins, or highest average score, though no specific line of play is associated with the move. An analogous situation is the percentage of wins/draws/losses accumulated for various openings employed in master games. If one is choosing an opening, one will tend to choose from the ones with the greatest percentage of wins or greatest percentage of wins+draws. The same applies to each variation within the opening, if statistics are available. The weakness of such a scheme is that the strongest line(s) of play for each side may not be part of that opening - they may be narrow opportunities in an opening which is otherwise weak.
Thus 'evaluation' in Monte Carlo implementations is an estimated probability of winning rather than a static numerical valuation of a position.
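The rollout idea can be sketched on a toy game. The example below uses a simple subtraction game (take one to three stones, whoever takes the last stone wins) chosen only because it fits in a few lines; it performs flat random rollouts per candidate move and omits the tree-building and selection policy of a full Monte Carlo tree search. All names and the rollout count are assumptions.

```python
import random


def rollout(stones: int, to_move: int, rng: random.Random) -> int:
    """Play uniformly random moves to the end; return the winner (0 or 1)."""
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return to_move  # the player who takes the last stone wins
        to_move = 1 - to_move


def win_rate(stones: int, move: int, player: int, rng: random.Random, n: int) -> float:
    """Estimate the winning probability of a move from n random rollouts."""
    remaining = stones - move
    if remaining == 0:
        return 1.0  # the move itself takes the last stone
    wins = sum(rollout(remaining, 1 - player, rng) == player for _ in range(n))
    return wins / n


def best_move(stones: int, player: int = 0, seed: int = 0, n: int = 200):
    """Return the move with the highest estimated win rate, and all the rates."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    rates = {m: win_rate(stones, m, player, rng, n) for m in range(1, min(3, stones) + 1)}
    return max(rates, key=rates.get), rates


move, rates = best_move(3)
print(move, rates[3], rates[2])  # 3 1.0 0.0
```

The numbers returned are win frequencies between 0 and 1, which illustrates why the 'evaluation' in such a scheme is read as a probability rather than as a score in pawn units.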
Evaluation functions in Go take into account territory controlled, influence of stones, number of prisoners, and life and death of groups on the board.