# Forensic statistics

Forensic statistics is the application of probability models and statistical techniques to scientific evidence, such as DNA evidence, and to the law. In contrast to "everyday" statistics, and to avoid engendering bias or drawing undue conclusions, forensic statisticians report the strength of evidence as a likelihood ratio (LR). Judges and jurors then rely on this ratio of probabilities, for example the strength of a DNA match, to draw inferences and decide legal matters, including guilt or innocence.

In forensic science, the DNA evidence received for DNA profiling often contains a mixture of more than one person’s DNA. DNA profiles are generated using a set procedure; however, interpreting a profile becomes more complicated when the sample contains a mixture of DNA. Regardless of the number of contributors to the forensic sample, statistics and probabilities must be used to give weight to the evidence and to describe what the results mean. For a single-source DNA profile, the statistic used is the random match probability (RMP). RMPs can also be used in certain situations to describe the results of the interpretation of a DNA mixture. Other statistical tools used to describe DNA mixture profiles include likelihood ratios (LR) and the combined probability of inclusion (CPI), also known as random man not excluded (RMNE).

Computer programs implementing forensic DNA statistics are used to assess the biological relationships between two or more people. The approaches to DNA statistics implemented in such programs include match probability, exclusion probability, likelihood ratios, Bayesian approaches, and paternity and kinship testing.

Although the precise origin of the term "forensic statistics" remains unclear, it was in use during the 1980s and 1990s. Among the first forensic statistics conferences were two held in 1991 and 1993.

## Random Match Probability

Random match probabilities (RMP) are used to estimate and express the rarity of a DNA profile. The RMP can be defined as the probability that someone else in the population, chosen at random, would have the same genotype as the contributor of the forensic evidence. It is calculated from the genotype frequencies at all the loci, that is, from how common or rare the alleles of each genotype are. The genotype frequencies are multiplied across all loci, using the product rule, to give the RMP. This statistic gives weight to the evidence either for or against a particular suspect being a contributor to the DNA sample.
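
The product rule described above can be sketched in a few lines of Python. This is only an illustration: the allele frequencies are made up, the function names are hypothetical, and the genotype frequencies are derived under the assumption of Hardy-Weinberg proportions (p² for a homozygote, 2pq for a heterozygote), which the text above does not itself specify.

```python
from math import prod

def genotype_frequency(p, q=None):
    """Genotype frequency at one locus, assuming Hardy-Weinberg proportions.

    One allele frequency means a homozygote (p^2);
    two allele frequencies mean a heterozygote (2pq).
    """
    if q is None:
        return p * p
    return 2 * p * q

def random_match_probability(genotype_freqs):
    """Product rule: multiply the genotype frequencies across all loci."""
    return prod(genotype_freqs)

# Hypothetical three-locus profile (allele frequencies are illustrative only).
profile = [
    genotype_frequency(0.1, 0.2),  # heterozygous locus
    genotype_frequency(0.05),      # homozygous locus
    genotype_frequency(0.3, 0.1),  # heterozygous locus
]

rmp = random_match_probability(profile)
print(f"RMP = {rmp:.3g} (about 1 in {1 / rmp:,.0f})")
```

Real casework uses many more loci and population-specific allele frequency databases, so actual RMPs are typically far smaller than in this toy example.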

RMP can only be used to describe a DNA profile if the profile is from a single source or if the analyst is able to differentiate between the peaks on the electropherogram from the major and minor contributors of a mixture. Because mixtures with more than two contributors are very difficult for analysts to interpret without computer software, RMP becomes difficult to calculate for a mixture of more than two people. If the major and minor contributor peaks cannot be differentiated, other statistical methods may be used.

If the DNA mixture contains major and minor contributors in a ratio of 4:1, a modified random match probability (mRMP) may be usable as a statistical tool. To calculate the mRMP, the analyst must first deduce a major and a minor contributor and their genotypes from the peak heights in the electropherogram. Laboratories conducting DNA analysis often use computer software to calculate the mRMP more accurately, since working through the most probable genotypes at each locus by hand is tedious and inefficient.

## Likelihood Ratio

Sometimes it can be very difficult to determine the number of contributors in a DNA mixture. If the peaks are easily distinguished and the number of contributors can be determined, a likelihood ratio (LR) is used. An LR assesses the evidence against a pair of alternative hypotheses, which in forensic cases are the prosecutor’s hypothesis and the defense hypothesis. In forensic biology cases, the hypotheses often state either that the DNA came from a particular person or that it came from an unknown person. For example, the prosecution may hypothesize that the DNA sample contains DNA from the victim and the suspect, while the defense may hypothesize that the sample contains DNA from the victim and an unknown person.

The probabilities of the evidence under the two hypotheses are expressed as a ratio, with the prosecutor’s hypothesis in the numerator. The ratio then expresses how likely the evidence is under one hypothesis relative to the other. For the hypothesis in which the mixture contains the suspect, the probability is 1, because one can distinguish the peaks and easily tell whether the suspect can be excluded as a contributor at each locus based on their genotype. The probability of 1 assumes that the suspect cannot be excluded as a contributor. To determine the probabilities for the unknown contributors, all possible genotypes must be determined for each locus.

Once the likelihood ratio has been calculated, the number is turned into a statement that gives meaning to the statistic. For the previous example, if the calculated LR is x, then the evidence is x times more probable if the sample contains DNA from the victim and the suspect than if it contains DNA from the victim and an unknown person. For a single-source profile, the likelihood ratio can also be defined as 1/RMP.
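
As a rough illustration of the calculation, the Python sketch below works through one hypothetical locus. The scenario is assumed for the example and is not taken from a real case: the mixture shows four alleles (A, B, C, D), the victim is A,B and the suspect is C,D. Under the prosecutor’s hypothesis the evidence is fully explained, so its probability is 1. Under the defense hypothesis the unknown contributor must be a C,D heterozygote, whose frequency is taken as 2·p_C·p_D under Hardy-Weinberg proportions. The allele frequencies and function names are made up.

```python
def likelihood_ratio(p_evidence_given_hp, p_evidence_given_hd):
    """LR = P(evidence | prosecutor's hypothesis) / P(evidence | defense hypothesis)."""
    if p_evidence_given_hd <= 0:
        raise ValueError("P(evidence | defense hypothesis) must be positive")
    return p_evidence_given_hp / p_evidence_given_hd

# Hypothetical allele frequencies for the two alleles not explained by the victim.
p_c, p_d = 0.1, 0.2

# Prosecutor's hypothesis: victim + suspect explain all four alleles.
p_hp = 1.0

# Defense hypothesis: victim + unknown person, who must carry C and D.
# Assumes Hardy-Weinberg proportions for the heterozygote frequency.
p_hd = 2 * p_c * p_d

lr = likelihood_ratio(p_hp, p_hd)
print(
    f"The evidence is about {lr:.1f} times more probable if the sample contains "
    "the victim and the suspect than if it contains the victim and an unknown person."
)
```

With these made-up frequencies the ratio works out to roughly 25 for this locus. For a full profile, per-locus ratios are commonly multiplied together, assuming the loci are independent.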

## Combined Probability of Inclusion

Combined probability of inclusion (CPI) is a common statistic used when the analyst cannot differentiate between the peaks from a major and minor contributor to a sample and the number of contributors cannot be determined. CPI is also commonly known as random man not excluded (RMNE). At each locus, the frequencies of all observed alleles are added together and the sum is squared, which yields the probability of inclusion (PI) for that locus. The PI values are then multiplied across all loci to give the CPI. The sum is squared so that all possible combinations of genotypes are included in the calculation.

Once the calculation is done, a statement is made about what the result means. For example, a CPI of 0.5 means that the probability that someone chosen at random from the population would not be excluded as a contributor to the DNA mixture is 0.5.
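
The CPI calculation can be sketched in Python as follows. The allele frequencies and function names are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from any real population database.

```python
from math import prod

def probability_of_inclusion(observed_allele_freqs):
    """PI at one locus: square of the summed frequencies of all observed alleles."""
    return sum(observed_allele_freqs) ** 2

def combined_probability_of_inclusion(loci):
    """CPI: product of the per-locus PI values."""
    return prod(probability_of_inclusion(freqs) for freqs in loci)

# Hypothetical mixture: frequencies of the alleles observed at each of two loci.
loci = [
    [0.1, 0.2, 0.3],  # three alleles observed, frequencies sum to 0.6
    [0.25, 0.25],     # two alleles observed, frequencies sum to 0.5
]

cpi = combined_probability_of_inclusion(loci)
print(f"CPI = {cpi:.3g}")
```

Note that the calculation uses only the alleles observed in the mixture; no suspect genotype enters into it.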

CPI relates to the evidence (the DNA mixture) and does not depend on the profile of any suspect. It can therefore be used to give weight or strength to evidence when no other information about the crime is known. This is advantageous in situations where the genotypes in the DNA mixture cannot be distinguished from one another. However, the statistic is not very discriminating: it is less powerful than likelihood ratios and random match probabilities when some information about the DNA mixture, such as the number of contributors or the genotype of each contributor, can be determined. Another limitation is that CPI cannot be used as a tool for interpreting a DNA mixture.

## Blood Stains

Blood stains are an important part of forensic statistics, as the analysis of blood drop impacts may help to reconstruct the event that took place. Blood stains are commonly elliptical in shape, which makes it straightforward to estimate the angle at which a droplet struck the surface using the formula α = arcsin(d/a). In this formula, d and a are the measured lengths of the minor axis (width) and major axis (length) of the ellipse, respectively. From these calculations, a visualization of the event that caused the stains can be drawn, together with further information such as the velocity of the object that caused them.
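
The impact-angle formula can be sketched in Python as below, taking d as the width (minor axis) and a as the length (major axis) of the elliptical stain, which is the usual reading of the formula. The measurements and the function name are hypothetical.

```python
import math

def impact_angle_degrees(width, length):
    """Impact angle alpha = arcsin(width / length), returned in degrees.

    width  -- minor axis of the elliptical stain (d in the formula)
    length -- major axis of the elliptical stain (a in the formula)
    """
    if not 0 < width <= length:
        raise ValueError("expected 0 < width <= length")
    return math.degrees(math.asin(width / length))

# Hypothetical stain measuring 5 mm wide and 10 mm long.
angle = impact_angle_degrees(5.0, 10.0)
print(f"Estimated impact angle: {angle:.1f} degrees")
```

A circular stain (width equal to length) gives 90 degrees, meaning the droplet fell perpendicular to the surface; the more elongated the stain, the shallower the estimated angle.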

## Bibliography

- Lucy, D. (2005). Introduction to Statistics for Forensic Scientists. John Wiley and Sons.