# Data envelopment analysis

Data envelopment analysis (DEA) is a nonparametric method in operations research and economics for the estimation of production frontiers.[1] It is used to empirically measure productive efficiency of decision making units (DMUs). Although DEA has a strong link to production theory in economics, the tool is also used for benchmarking in operations management, where a set of measures is selected to benchmark the performance of manufacturing and service operations. In benchmarking, the efficient DMUs, as defined by DEA, may not necessarily form a “production frontier”, but rather lead to a “best-practice frontier” (Cook, Tone and Zhu, 2014). DEA is referred to as "balanced benchmarking" by Sherman and Zhu (2013).

Non-parametric approaches have the benefit of not assuming a particular functional form or shape for the frontier; however, they do not provide a general relationship (equation) relating output and input. There are also parametric approaches that are used for the estimation of production frontiers (see Lovell & Schmidt 1988 for an early survey). These require that the shape of the frontier be guessed beforehand by specifying a particular function relating output to input. The relative strengths of these approaches can be combined in a hybrid method (Tofallis, 2001), where the frontier units are first identified by DEA and a smooth surface is then fitted to them. This allows a best-practice relationship between multiple outputs and multiple inputs to be estimated.

"The framework has been adapted from multi-input, multi-output production functions and applied in many industries. DEA develops a function whose form is determined by the most efficient producers. This method differs from the Ordinary Least Squares (OLS) statistical technique that bases comparisons relative to an average producer. Like Stochastic Frontier Analysis (SFA), DEA identifies a "frontier" which are characterized as an extreme point method that assumes that if a firm can produce a certain level of output utilizing specific input levels, another firm of equal scale should be capable of doing the same. The most efficient producers can form a 'composite producer', allowing the computation of an efficient solution for every level of input or output. Where there is no actual corresponding firm, 'virtual producers' are identified to make comparisons" (Berg 2010).

Attempts have also been made in the literature to synthesize DEA and SFA and to improve upon their drawbacks, by proposing various versions of non-parametric SFA,[2] Stochastic DEA,[3] and Stochastic Nonparametric Envelopment of Data (StoNED).

## History

In microeconomic production theory, a firm's input and output combinations are depicted using a production function. Using such a function, one can show the maximum output which can be achieved with any possible combination of inputs, that is, one can construct a production technology frontier (Seiford & Thrall 1990).[4]

Building on the ideas of Farrell (1957), the seminal work "Measuring the efficiency of decision making units" by Charnes, Cooper & Rhodes (1978) applied linear programming to estimate an empirical production technology frontier for the first time. In Germany, the procedure had been used earlier to estimate the marginal productivity of R&D and other factors of production (Brockhoff 1970). Since then, a large number of books and journal articles have been written on DEA or on applying DEA to various sets of problems.

Other than comparing efficiency across DMUs within an organization, DEA has also been used to compare efficiency across firms. There are several types of DEA, the most basic being CCR, based on Charnes, Cooper & Rhodes. There are also DEA models that address different returns-to-scale assumptions: CRS (constant returns to scale), VRS (variable returns to scale), non-increasing returns to scale, or non-decreasing returns to scale, as in Ylvinger (2000). The main developments of DEA in the 1970s and 1980s are documented by Seiford & Thrall (1990).

## Techniques

Data envelopment analysis (DEA) is a linear programming methodology to measure the efficiency of multiple decision-making units (DMUs) when the production process presents a structure of multiple inputs and outputs.[5]

"DEA has been used for both production and cost data. Utilizing the selected variables, such as unit cost and output, DEA software searches for the points with the lowest unit cost for any given output, connecting those points to form the efficiency frontier. Any company not on the frontier is considered inefficient. A numerical coefficient is given to each firm, defining its relative efficiency. Different variables that could be used to establish the efficiency frontier are: number of employees, service quality, environmental safety, and fuel consumption. An early survey of studies of electricity distribution companies identified more than thirty DEA analyses—indicating widespread application of this technique to that network industry. (Jamasb, T. J., Pollitt, M. G. 2001). A number of studies using this technique have been published for water utilities. The main advantage to this method is its ability to accommodate a multiplicity of inputs and outputs. It is also useful because it takes into consideration returns to scale in calculating efficiency, allowing for the concept of increasing or decreasing efficiency based on size and output levels. A drawback of this technique is that model specification and inclusion/exclusion of variables can affect the results." (Berg 2010)

Under general DEA benchmarking, for example, "if one benchmarks the performance of computers, it is natural to consider different features (screen size and resolution, memory size, process speed, hard disk size, and others). One would then have to classify these features into “inputs” and “outputs” in order to apply a proper DEA analysis. However, these features may not actually represent inputs and outputs at all, in the standard notion of production. In fact, if one examines the benchmarking literature, other terms, such as “indicators”, “outcomes”, and “metrics”, are used. The issue now becomes one of how to classify these performance measures into inputs and outputs, for use in DEA." (Cook, Tone, and Zhu, 2014)

Some of the advantages of DEA are:

• no need to explicitly specify a mathematical form for the production function
• proven to be useful in uncovering relationships that remain hidden from other methodologies
• capable of handling multiple inputs and outputs
• capable of being used with any input-output measurement
• the sources of inefficiency can be analysed and quantified for every evaluated unit

Some of the disadvantages of DEA are:

• results are sensitive to the selection of inputs and outputs (Berg 2010)
• the best specification cannot be tested for (Berg 2010)
• the number of efficient firms on the frontier tends to increase with the number of input and output variables (Berg 2010)

A desire to improve upon DEA, by reducing its disadvantages or strengthening its advantages, has been a major driver of developments in the recent literature. Currently, the most frequently used DEA-based method for obtaining unique efficiency rankings is cross-efficiency. Originally developed by Sexton et al. in 1986,[6] it has found widespread application since Doyle and Green's 1994 publication.[7] Cross-efficiency is based on the original DEA results, but implements a secondary objective where each DMU peer-appraises all other DMUs with its own factor weights. The average of these peer-appraisal scores is then used to calculate a DMU's cross-efficiency score. This approach avoids DEA's disadvantages of having multiple efficient DMUs and potentially non-unique weights.[8] Another approach to remedy some of DEA's drawbacks is Stochastic DEA, which synthesizes DEA and SFA.[3]
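The peer-appraisal step can be illustrated with a minimal Python sketch. It assumes that each DMU's weights have already been obtained from a DEA run and does not implement the secondary objective that selects among alternative optimal weights. The function name `cross_efficiency`, the toy data and the hand-picked weights are all hypothetical, and excluding each DMU's self-appraisal from the average is an assumption (some variants include it).

```python
def cross_efficiency(outputs, inputs, weights):
    """Peer-appraisal matrix and average peer score for each DMU.

    outputs[k], inputs[k]: output and input quantities of DMU k.
    weights[d]: (u, v) output and input weights chosen by DMU d.
    """
    n = len(outputs)

    def score(d, k):
        # Efficiency of DMU k when rated with the weights of DMU d
        u, v = weights[d]
        weighted_out = sum(ur * y for ur, y in zip(u, outputs[k]))
        weighted_in = sum(vi * x for vi, x in zip(v, inputs[k]))
        return weighted_out / weighted_in

    # Row d holds the appraisals made by DMU d; column k those received by k
    matrix = [[score(d, k) for k in range(n)] for d in range(n)]
    # Assumption: the average runs over peers only (self-appraisal excluded)
    averages = [
        sum(matrix[d][k] for d in range(n) if d != k) / (n - 1)
        for k in range(n)
    ]
    return matrix, averages


# Hypothetical toy data: three DMUs, one output, two inputs
outputs = [[4], [4], [4]]
inputs = [[1, 4], [2, 2], [4, 1]]
# Hand-picked weights under which each DMU rates itself at 1 (illustrative)
weights = [([1], [2, 0.5]), ([1], [1, 1]), ([1], [0.5, 2])]

matrix, averages = cross_efficiency(outputs, inputs, weights)
for k, avg in enumerate(averages):
    print(f"DMU {k}: self score {matrix[k][k]:.2f}, cross-efficiency {avg:.3f}")
```

In this toy data every DMU rates itself as fully efficient, yet the averaged peer scores differ, which is how cross-efficiency can separate units that plain DEA leaves tied.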

## Sample applications

DEA is commonly applied in the electric utilities sector. For instance, a government authority can choose data envelopment analysis as their measuring tool to design an individualized regulatory rate for each firm based on their comparative efficiency. The input components would include man-hours, losses, capital (lines and transformers only), and goods and services. The output variables would include number of customers, energy delivered, length of lines, and degree of coastal exposure. (Berg 2010)

DEA is also regularly used to assess the efficiency of public and not-for-profit organizations, e.g. hospitals (Kuntz, Scholtes & Vera 2007; Kuntz & Vera 2007; Vera & Kuntz 2007), police forces (Thanassoulis 1995; Sun 2002; Aristovnik et al. 2013, 2014), or liberal arts colleges (Eckles, 2010).

### Examples

In the DEA methodology, formally developed by Charnes, Cooper and Rhodes (1978), efficiency is defined as the ratio of a weighted sum of outputs to a weighted sum of inputs, where the weights are calculated by means of mathematical programming and constant returns to scale (CRS) are assumed. In 1984, Banker, Charnes and Cooper developed a model with variable returns to scale (VRS).
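In general notation, the ratio definition can be written as the following fractional program. The symbols are assumptions introduced here for illustration: there are $n$ DMUs, $x_{ij}$ is the amount of input $i$ used by DMU $j$, $y_{rj}$ is the amount of output $r$ it produces, $u_r$ and $v_i$ are the output and input weights, and the subscript $o$ marks the DMU being evaluated.

$$
\begin{aligned}
\max_{u,v}\quad & \theta_o = \frac{\sum_{r=1}^{s} u_r\, y_{ro}}{\sum_{i=1}^{m} v_i\, x_{io}} \\
\text{subject to}\quad & \frac{\sum_{r=1}^{s} u_r\, y_{rj}}{\sum_{i=1}^{m} v_i\, x_{ij}} \le 1, \qquad j = 1, \dots, n, \\
& u_r \ge 0, \quad v_i \ge 0 .
\end{aligned}
$$

The worked example that follows is the case with one output ($s = 1$), two inputs ($m = 2$) and three units ($n = 3$).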

Assume that we have the following data:

• Unit 1 produces 100 items per day, and the inputs per item are 10 dollars for materials and 2 labour-hours
• Unit 2 produces 80 items per day, and the inputs are 8 dollars for materials and 4 labour-hours
• Unit 3 produces 120 items per day, and the inputs are 12 dollars for materials and 1.5 labour-hours

To calculate the efficiency of unit 1, we define the objective function (OF) as

• ${\displaystyle \max _{u_{1},v_{1},v_{2}}\ (100u_{1})/(10v_{1}+2v_{2})}$

subject to (ST) the requirement that the efficiency of every unit, evaluated with the same weights, cannot be larger than 1:

• Efficiency of unit 1: ${\displaystyle (100u_{1})/(10v_{1}+2v_{2})\leq 1}$
• Efficiency of unit 2: ${\displaystyle (80u_{1})/(8v_{1}+4v_{2})\leq 1}$
• Efficiency of unit 3: ${\displaystyle (120u_{1})/(12v_{1}+1.5v_{2})\leq 1}$

and non-negativity:

• ${\displaystyle u_{1},v_{1},v_{2}\geq 0}$
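To make the constraints concrete, the following Python sketch evaluates the ratio for each unit under one trial set of weights and checks whether those weights are admissible. The function names and the trial weights are hypothetical and chosen only for illustration; the trial weights are not the optimal ones.

```python
# (output, materials, labour) for units 1-3, copied from the example data
units = [(100, 10, 2), (80, 8, 4), (120, 12, 1.5)]


def ratio_efficiency(unit, u1, v1, v2):
    """Weighted output divided by weighted input for one unit."""
    output, materials, labour = unit
    return (u1 * output) / (v1 * materials + v2 * labour)


def is_feasible(u1, v1, v2):
    """True when the weights are non-negative and no unit scores above 1."""
    if min(u1, v1, v2) < 0:
        return False
    return all(ratio_efficiency(unit, u1, v1, v2) <= 1 for unit in units)


# Hypothetical trial weights (u1, v1, v2), not an optimal solution
trial = (0.008, 0.1, 0.05)
scores = [ratio_efficiency(unit, *trial) for unit in units]
print(is_feasible(*trial), [round(s, 3) for s in scores])
```

The optimisation searches over all admissible weights for the set that makes the score of the unit under evaluation as large as possible.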

A fraction with decision variables in the numerator and denominator is nonlinear. Since we are using a linear programming technique, we need to linearize the formulation, such that the denominator of the objective function is constant (in this case 1), then maximize the numerator.

The new formulation would be:

• OF
• ${\displaystyle \max \ 100u_{1}}$
• ST
• Efficiency of unit 1: ${\displaystyle 100u_{1}-(10v_{1}+2v_{2})\leq 0}$
• Efficiency of unit 2: ${\displaystyle 80u_{1}-(8v_{1}+4v_{2})\leq 0}$
• Efficiency of unit 3: ${\displaystyle 120u_{1}-(12v_{1}+1.5v_{2})\leq 0}$
• Denominator of nonlinear OF: ${\displaystyle 10v_{1}+2v_{2}=1}$
• Non-negativity: ${\displaystyle u_{1},v_{1},v_{2}\geq 0}$
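The linearised problem is small enough to solve without an LP library, and the sketch below does so in plain Python. It relies on the special structure of this example (one output, two inputs) and is not a general DEA solver; in practice a linear programming package would normally be used. For fixed input weights the best $u_1$ is the smallest value of (weighted input)/(output) over all units, so the search reduces to the line segment of input weights that satisfy the normalisation constraint. The name `ccr_efficiency` and the parameter `t` are assumptions introduced for this illustration.

```python
from itertools import combinations

# (output, materials, labour) per unit, taken from the example data
UNITS = {
    1: (100.0, 10.0, 2.0),
    2: (80.0, 8.0, 4.0),
    3: (120.0, 12.0, 1.5),
}


def ccr_efficiency(k, units=UNITS):
    """Optimal value of the linearised problem for unit k."""
    y_k, x1_k, x2_k = units[k]

    def weights(t):
        # Points on the line x1_k*v1 + x2_k*v2 = 1 with v1, v2 >= 0
        return (1.0 - t) / x1_k, t / x2_k

    def ratio(j, t):
        # Largest u1 allowed by unit j's constraint: u1 <= (v . x_j) / y_j
        y, x1, x2 = units[j]
        v1, v2 = weights(t)
        return (v1 * x1 + v2 * x2) / y

    def objective(t):
        # Best u1 for these input weights, scaled by unit k's output
        return y_k * min(ratio(j, t) for j in units)

    # Each ratio is linear in t, so the concave objective peaks at an
    # endpoint of [0, 1] or where two constraint lines cross.
    candidates = [0.0, 1.0]
    for i, j in combinations(units, 2):
        a_i, b_i = ratio(i, 0.0), ratio(i, 1.0) - ratio(i, 0.0)
        a_j, b_j = ratio(j, 0.0), ratio(j, 1.0) - ratio(j, 0.0)
        if abs(b_i - b_j) > 1e-12:
            t = (a_j - a_i) / (b_i - b_j)
            if 0.0 <= t <= 1.0:
                candidates.append(t)
    return max(objective(t) for t in candidates)


efficiencies = {k: ccr_efficiency(k) for k in UNITS}
for k, value in efficiencies.items():
    print(f"unit {k}: efficiency {value:.3f}")
```

With the example data, all three units obtain a score of 1 (up to floating-point rounding). Each unit uses materials in the same proportion to its output, so a solution that puts zero weight on labour-hours lets every unit reach the upper bound.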

## Inefficiency measuring

Data Envelopment Analysis (DEA) has been recognized as a valuable analytical research instrument and a practical decision support tool. DEA has been credited with requiring neither a complete specification of the functional form of the production frontier nor the distribution of inefficient deviations from the frontier. Rather, DEA requires only general production and distribution assumptions. However, if those assumptions are too weak, inefficiency levels may be systematically underestimated in small samples. In addition, erroneous assumptions may cause inconsistency with a bias over the frontier. Therefore, the ability to alter, test and select production assumptions is essential in conducting DEA-based research. However, the DEA models currently available offer only a limited variety of alternative production assumptions.

## Notes

1. Sickles, R., & Zelenyuk, V. (2019). Measurement of Productivity and Efficiency: Theory and Practice. Cambridge: Cambridge University Press. doi:10.1017/9781139565981
2. Park, B., L. Simar and V. Zelenyuk (2015). "Categorical data in local maximum likelihood: theory and applications to productivity analysis". Journal of Productivity Analysis. 43 (2): 199–214. doi:10.1007/s11123-014-0394-y. Also see many references therein.
3. Simar, L.; V. Zelenyuk (August 2011). "Stochastic FDH/DEA estimators for frontier analysis". Journal of Productivity Analysis. 36 (1): 1–20. CiteSeerX 10.1.1.1020.5971. doi:10.1007/s11123-010-0170-6. Also see many references therein.
4. L.M. Seiford; R.M. Thrall (1990). "Recent Developments in DEA: The Mathematical Programming Approach to Frontier Analysis". Journal of Econometrics. 46 (1–2): 7–38. doi:10.1016/0304-4076(90)90045-u.
5. Yishi Zhang; Anrong Yang; Chan Xiong; Teng Wang; Zigang Zhang (2014). "Feature selection using data envelopment analysis". Knowledge-Based Systems. 64: 70–80. doi:10.1016/j.knosys.2014.03.022.
6. Sexton, Thomas R. (1986). "Data envelopment analysis: Critique and extension". New Directions for Program Evaluation. 32: 73–105.
7. Doyle, John; Green, Rodney (1994-05-01). "Efficiency and Cross-efficiency in DEA: Derivations, Meanings and Uses". Journal of the Operational Research Society. 45 (5): 567–578. doi:10.1057/jors.1994.84. ISSN 0160-5682.
8. Dyson, R. G.; Allen, R.; Camanho, A. S.; Podinovski, V. V.; Sarrico, C. S.; Shale, E. A. (2001-07-16). "Pitfalls and protocols in DEA". European Journal of Operational Research. Data Envelopment Analysis. 132 (2): 245–259. doi:10.1016/S0377-2217(00)00149-1.