Software construction is a software engineering discipline. It is the detailed creation of working meaningful software through a combination of coding, verification, unit testing, integration testing, and debugging. It is linked to all the other software engineering disciplines, most strongly to software design and software testing.
|Paradigms and models|
|Methodologies and frameworks|
|Standards and Bodies of Knowledge|
Software construction fundamentals
The need to reduce complexity is mainly driven by limited ability of most people to hold complex structures and information in their working memories. Reduced complexity is achieved through emphasizing the creation of code that is simple and readable rather than clever. Minimizing complexity is accomplished through making use of standards, and through numerous specific techniques in coding. It is also supported by the construction-focused quality techniques.
Anticipating change helps software engineers build extensible software, which means they can enhance a software product without disrupting the underlying structure. Research over 25 years showed that the cost of rework can be 10 to 100 times (5 to 10 times for smaller projects) more expensive than getting the requirements right the first time. Given that 25% of the requirements change during development on average project, the need to reduce the cost of rework elucidates the need for anticipating change.
Constructing for verification
Constructing for verification means building software in such a way that faults can be ferreted out readily by the software engineers writing the software, as well as during independent testing and operational activities. Specific techniques that support constructing for verification include following coding standards to support code reviews, unit testing, organizing code to support automated testing, and restricted use of complex or hard-to-understand language structures, among others.
- Construction for reuse: Create reusable software assets.
- Construction with reuse: Reuse software assets in the construction of a new solution.
Standards in construction
Numerous models have been created to develop software, some of which emphasize construction more than others. Some models are more linear from the construction point of view, such as the Waterfall and staged-delivery life cycle models. These models treat construction as an activity which occurs only after significant prerequisite work has been completed—including detailed requirements work, extensive design work, and detailed planning. Other models are more iterative, such as evolutionary prototyping, Extreme Programming, and Scrum. These approaches tend to treat construction as an activity that occurs concurrently with other software development activities, including requirements, design, and planning, or overlaps them.
The choice of construction method is a key aspect of the construction planning activity. The choice of construction method affects the extent to which construction prerequisites (e.g. Requirements analysis, Software design, .. etc.) are performed, the order in which they are performed, and the degree to which they are expected to be completed before construction work begins. Construction planning also defines the order in which components are created and integrated, the software quality management processes, the allocation of task assignments to specific software engineers, and the other tasks, according to the chosen method.
Numerous construction activities and artifacts can be measured, including code developed, code modified, code reused, code destroyed, code complexity, code inspection statistics, fault-fix and fault-find rates, effort, and scheduling. These measurements can be useful for purposes of managing construction, ensuring quality during construction, improving the construction process, as well as for other reasons.
Software construction is driven by many practical considerations:
In order to account for the unanticipated gaps in the software design, during software construction some design modifications must be made on a smaller or larger scale to flesh out details of the software design.
Low Fan-out is one of the design characteristics found to be beneficial by researchers. Information hiding proved to be a useful design technique in large programs that made them easier to modify by a factor of 4.
Construction languages include all forms of communication by which a human can specify an executable problem solution to a computer. They include configuration languages, toolkit languages, and programming languages:
- Configuration languages are languages in which software engineers choose from a limited set of predefined options to create new or custom software installations.
- Toolkit languages are used to build applications out of toolkits and are more complex than configuration languages.
- Scripting languages are kinds of application programming languages that supports scripts which are often interpreted rather than compiled.
- Programming languages are the most flexible type of construction languages which use three general kinds of notation:
- Linguistic notations which are distinguished in particular by the use of word-like strings of text to represent complex software constructions, and the combination of such word-like strings into patterns that have a sentence-like syntax.
- Formal notations which rely less on intuitive, everyday meanings of words and text strings and more on definitions backed up by precise, unambiguous, and formal (or mathematical) definitions.
- Visual notations which rely much less on the text-oriented notations of both linguistic and formal construction, and instead rely on direct visual interpretation and placement of visual entities that represent the underlying software.
Programmers working in a language they have used for three years or more are about 30 percent more productive than programmers with equivalent experience who are new to a language. High-level languages such as C++, Java, Smalltalk, and Visual Basic yield 5 to 15 times better productivity, reliability, simplicity, and comprehensibility than low-level languages such as assembly and C. Equivalent code has been shown to need fewer lines to be implemented in high level languages than in lower level languages.
- Techniques for creating understandable source code, including naming and source code layout. One study showed that the effort required to debug a program is minimized when the variables' names are between 10 and 16 characters.
- Use of classes, enumerated types, variables, named constants, and other similar entities:
- A study done by NASA showed that the putting the code into well-factored classes can double the code reusability compared to the code developed using functional design.
- One experiment showed that designs which access arrays sequentially, rather than randomly, result in fewer variables and fewer variable references.
- Use of control structures:
- One experiment found that loops-with-exit are more comprehensible than other kinds of loops.
- Regarding the level of nesting in loops and conditionals, studies have shown that programmers have difficulty comprehending more than three levels of nesting.
- Control flow complexity has been shown to correlate with low reliability and frequent errors.
- Handling of error conditions—both planned errors and exceptions (input of bad data, for example)
- Prevention of code-level security breaches (buffer overruns or array index overflows, for example)
- Resource usage via use of exclusion mechanisms and discipline in accessing serially reusable resources (including threads or database locks)
- Source code organization (into statements and routines):
- Highly cohesive routines proved to be less error prone than routines with lower cohesion. A study of 450 routines found that 50 percent of the highly cohesive routines were fault free compared to only 18 percent of routines with low cohesion. Another study of a different 450 routines found that routines with the highest coupling-to-cohesion ratios had 7 times as many errors as those with the lowest coupling-to-cohesion ratios and were 20 times as costly to fix.
- Although studies showed inconclusive results regarding the correlation between routine sizes and the rate of errors in them, but one study found that routines with fewer than 143 lines of code were 2.4 times less expensive to fix than larger routines. Another study showed that the code needed to be changed least when routines averaged 100 to 150 lines of code. Another study found that structural complexity and amount of data in a routine were correlated with errors regardless of its size.
- Interfaces between routines are some of the most error-prone areas of a program. One study showed that 39 percent of all errors were errors in communication between routines.
- Unused parameters are correlated with an increased error rate. In one study, only 17 to 29 percent of routines with more than one unreferenced variable had no errors, compared to 46 percent in routines with no unused variables.
- The number of parameters of a routine should be 7 at maximum as research has found that people generally cannot keep track of more than about seven chunks of information at once.
- Source code organization (into classes, packages, or other structures). When considering containment, the maximum number of data members in a class shouldn't exceed 7±2. Research has shown that this number is the number of discrete items a person can remember while performing other tasks. When considering inheritance, the number of levels in the inheritance tree should be limited. Deep inheritance trees have been found to be significantly associated with increased fault rates. When considering the number of routines in a class, it should be kept as small as possible. A study on C++ programs has found an association between the number of routines and the number of faults.
- Code documentation
- Code tuning
The purpose of construction testing is to reduce the gap between the time at which faults are inserted into the code and the time those faults are detected. In some cases, construction testing is performed after code has been written. In test-first programming, test cases are created before code is written. Construction involves two forms of testing, which are often performed by the software engineer who wrote the code:
Implementing software reuse entails more than creating and using libraries of assets. It requires formalizing the practice of reuse by integrating reuse processes and activities into the software life cycle. The tasks related to reuse in software construction during coding and testing are:
The primary techniques used to ensure the quality of code as it is constructed include:
- Unit testing and integration testing. One study found that the average defect detection rates of unit testing and integration testing are 30% and 35% respectively.
- Test-first development
- Use of assertions and defensive programming
- Inspections. One study found that the average defect detection rate of formal code inspections is 60%. Regarding the cost of finding defects, a study found that code reading detected 80% more faults per hour than testing. Another study shown that it costs six times more to detect design defects by using testing than by using inspections. A study by IBM showed that only 3.5 hours where needed to find a defect through code inspections versus 15–25 hours through testing. Microsoft has found that it takes 3 hours to find and fix a defect by using code inspections and 12 hours to find and fix a defect by using testing. In a 700 thousand lines program, it was reported that code reviews were several times as cost-effective as testing. Studies found that inspections result in 20% - 30% fewer defects per 1000 lines of code than less formal review practices and that they increase productivity by about 20%. Formal inspections will usually take 10% - 15% of the project budget and will reduce overall project cost. Researchers found that having more than 2 - 3 reviewers on a formal inspection doesn't increase the number of defects found, although the results seem to vary depending on the kind of material being inspected.
- Technical reviews. One study found that the average defect detection rates of informal code reviews and desk checking are 25% and 40% respectively. Walkthroughs were found to have defect detection rate of 20% - 40%, but were found also to be expensive specially when project pressures increase. Code reading was found by NASA to detect 3.3 defects per hour of effort versus 1.8 defects per hour for testing. It also finds 20% - 60% more errors over the life of the project than different kinds of testing. A study of 13 reviews about review meetings, found that 90% of the defects were found in preparation for the review meeting while only around 10% were found during the meeting.
- Static analysis (IEEE1028)
Studies have shown that a combination of these techniques need to be used to achieve high defect detection rate. Other studies showed that different people tend to find different defects. One study found that the Extreme Programming practices of pair programming, desk checking, unit testing, integration testing, and regression testing can achieve a 90% defect detection rate. An experiment involving experienced programmers found that on average they were able to find 5 errors (9 at best) out of 15 errors by testing.
80% of the errors tend to be concentrated in 20% of the project's classes and routines. 50% of the errors are found in 5% of the project's classes. IBM was able to reduce the customer reported defects by a factor of ten to one and to reduce their maintenance budget by 45% in its IMS system by repairing or rewriting only 31 out of 425 classes. Around 20% of a project's routines contribute to 80% of the development costs. A classic study by IBM found that few error-prone routines of OS/360 were the most expensive entities. They had around 50 defects per 1000 lines of code and fixing them costs 10 times what it took to develop the whole system.
A key activity during construction is the integration of separately constructed routines, classes, components, and subsystems. In addition, a particular software system may need to be integrated with other software or hardware systems. Concerns related to construction integration include planning the sequence in which components will be integrated, creating scaffolding to support interim versions of the software, determining the degree of testing and quality work performed on components before they are integrated, and determining points in the project at which interim versions of the software are tested.
Object-oriented runtime issues
Object-oriented languages support a series of runtime mechanisms that increase the flexibility and adaptability of the programs like data abstraction, encapsulation, modularity, inheritance, polymorphism, and reflection.
Data abstraction is the process by which data and programs are defined with a representation similar in form to its meaning, while hiding away the implementation details. Academic research showed that data abstraction makes programs about 30% easier to understand than functional programs.
Assertions, design by contract, and defensive programming
Assertions are executable predicates which are placed in a program that allow runtime checks of the program. Design by contract is a development approach in which preconditions and postconditions are included for each routine. Defensive programming is the protection a routine from being broken by invalid inputs.
Error-handling, exception-handling, and fault tolerance
Error-handling refers to the programming practice of anticipating and coding for error conditions that may arise when the program runs. Exception-handling is a programming-language construct or hardware mechanism designed to handle the occurrence of exceptions, special conditions that change the normal flow of program execution. Fault tolerance is a collection of techniques that increase software reliability by detecting errors and then recovering from them if possible or containing their effects if recovery is not possible.
State-based and table-driven construction techniques
State-based programming is a programming technology using finite state machines to describe program behaviors. A table-driven method is a schema that uses tables to look up information rather than using logic statements (such as if and case).
Runtime configuration and internationalization
Runtime configuration is a technique that binds variable values and program settings when the program is running, usually by updating and reading configuration files in a just-in-time mode. Internationalization is the technical activity of preparing a program, usually interactive software, to support multiple locales. The corresponding activity, localization, is the activity of modifying a program to support a specific local language.
- SWEBOK Pierre Bourque, Robert Dupuis; executive editors, Alain Abran, James W. Moore, eds. (2004). "Chapter 4: Software Construction". Guide to the Software Engineering Body of Knowledge. IEEE Computer Society. pp. 4–1 – 4–5. ISBN 0-7695-2330-7.CS1 maint: uses editors parameter (link)
- SWEBOK 2014, p. 3-3.
- McConnell 2004, Chapter 3.
- SWEBOK 2014, p. 3-5.
- McConnell 2004, Chapter 5.
- SWEBOK 2014, p. 3-5 - 3-6.
- McConnell 2004, Chapter 4.
- SWEBOK 2014, p. 3-6.
- McConnell 2004, Chapter 11.
- McConnell 2004, Chapter 6.
- McConnell 2004, Chapter 7.
- McConnell 2004, Chapter 12.
- McConnell 2004, Chapter 16.
- McConnell 2004, Chapter 19.
- SWEBOK 2014, p. 3-7.
- McConnell 2004, Chapter 20.
- McConnell 2004, Chapter 21.
- McConnell 2004, Chapter 22.
- SWEBOK 2014, p. 3-8.
- Thayer 2013, pp. 140 - 141.
- Thayer 2013, p. 140.
- SWEBOK 2014, p. 3-9.
- Thayer 2013, p. 142.
- SWEBOK 2014, p. 3-10.
- Pierre Bourque, Richard E. (Dick) Fairley, eds. (2014). "Chapter 3: Software Construction". Guide to the Software Engineering Body of Knowledge Version 3.0. IEEE Computer Society. ISBN 978-0-7695-5166-1.CS1 maint: uses editors parameter (link)
- McConnell, Steven (2004). Code Complete (2nd ed.). Microsoft Press. ISBN 978-0-7356-1967-8.
- Thayer, Richard; Dorfman, Merlin (2013). Software Engineering Essentials. Volume I: The Development Process (Fourth ed.). Software Management Training Press, Carmichael, California. ISBN 978-0-9852707-0-4.