Quality Management and Software Development Success/Failure
Abstract
This report aims to analyse the relationship between project success and quality management. Quality management can be a major contributory factor in the smooth running of a software development project and the quality of the end result. This report will look at the definitions of quality, from both a customer and a professional standpoint. It will also look at the key areas of quality management and how they apply to the end result. Finally, this report will look at real-life case studies where quality management has been particularly important, including the Ariane 5 space rocket explosion and the failure of the Los Angeles air traffic control system that nearly caused several mid-air collisions.
1. What is Quality?
The dictionary definition of quality is “The essential character of something, an inherent or distinguishing character, degree or grade of excellence”. In the computer industry, however, this definition does not generally apply. Quality can be looked at from both a business and a customer perspective. A business defines quality as “meeting requirements”, i.e. having a product whose requirements are measurable and well met. Requirements are the most important part of a project, and the quality system revolves around them (William E. Lewis, 2000). From a customer perspective, quality is defined as “Fitness for Use”; this essentially asks whether the product the customer has received actually does what it was intended to do. This idea of “Fitness for Use” is the more generally accepted definition.
As you can see, there are several different definitions of quality. As a result, there are several commonly held misconceptions:
- It is often believed that quality control can be implemented at the end of a project. In reality, quality requires an ongoing commitment, particularly from top management.
- Many individuals believe that defect-free products and services are impossible, and accept that certain levels of defects are normal.
- Quality is frequently associated with cost, meaning that high quality is assumed to require a high development cost.
- Quality demands requirement specifications in enough detail that the products produced can be measured against these specifications.
- Technical personnel often believe that standards stifle their creativity.
(William E. Lewis, 2000).
In 1991 the International Organization for Standardization introduced ISO 9126 to provide a formal definition of software quality. It can be argued that without a formal definition it is very difficult, if not impossible, for a project to succeed. ISO 9126 breaks software quality down into several characteristics, written from two perspectives: quality in use (how well the product serves its users when put to work) and external quality characteristics (quality issues viewed from a post-development and user perspective). The characteristics are as follows:
- Quality in use:
  - Effectiveness: ability to achieve user goals accurately
  - Productivity: avoiding excessive use of resources
  - Safety: keeping within reasonable levels of risk of harm to people, business, software, property and the environment
  - Satisfaction: the user's satisfaction with the product
- External software quality characteristics:
  - Functionality: the functions that the product provides to satisfy user needs
  - Reliability: the capability of the software to maintain its level of performance
  - Usability: the effort needed to use the software
  - Efficiency: the physical resources needed to run the software
  - Maintainability: the effort needed to make changes
  - Portability: the ability of the software to be transferred to a different environment
(Bob Hughes & Mike Cotterell, 2006)
The now-accepted definitions in ISO 9126 have gone a long way towards ensuring that software products are delivered at a much higher success rate.
2. The Place of Quality
2.1. In the Business
Quality needs to have a place at the core of an organisation; if it does not, then any attempt to install a quality assurance system will fail. If a business tries to “tack on” quality assurance as a side task late in development, it will be looked at as a distraction from “real work”. While quality assurance must be at the centre of a company, it must also start at the top levels of management. Just as a business needs money to function, a software project needs money to be successful, and the high-level management staff of a company generally control financial allocation. If they do not fully understand quality assurance, they will be unlikely to allocate funds to it.
In most companies the board of directors will issue a mission statement defining the place of quality. Once this is done, the processes needed to fulfil the mission can be defined. Each process must be documented in a procedure, and the collection of procedures is known as the quality management system, which is essentially a formal expression of the importance of quality at the business level (Howard T. Garston Smith, 1997). When a business has a defined quality plan, it gives the software development team a set of standard guidelines to adhere to. This is very important because staff may change between projects, all with different programming styles and development methods. Without a standard set of quality procedures, each and every project will produce different types of defects and problems.
2.2. The Role of the Customer
The role of the customer cannot be overstated. To a customer, quality is the perceived value of the product he or she purchased. This perceived value is generally regarded as “Fitness for Use”, or whether the product actually does what the customer requested. From an objective point of view this can be regarded as meeting the customer's initial requirements.
The customer's satisfaction with a product is the ultimate validation that it conforms to requirements and is fit for use. The customer is often the most apparent indicator of a project's success or failure: if the customer is not happy with the end product he/she receives, the project will be regarded as a failure. However, conformance to requirements is also an issue from the developer's perspective. If the product is to achieve “quality” status, it must be developed in accordance with the specification.
3. Aspects of Quality
3.1. Key Areas of Quality Management
The first key area is prevention versus detection. Quality cannot be achieved by assessing an already completed product, as a vast number of defects will already have ingrained themselves. There are several ways in which defects can be prevented. Coding standards are a simple yet effective way of doing so; these standards can involve a defined programming style (camel case, comment formats) and a set of “best practice” processes for development. A business should document these coding standards to ensure that they are maintained between projects. Other methods include change management and defect documentation. Change management is a process designed to control changes to the system; if a change is not controlled, its impact will not be analysed before it is made, and it will more than likely introduce further defects. Defect documentation involves recording any common defects found in previous projects so that they can be learnt from in the future.
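As a minimal sketch of what such a documented standard might prescribe (the comment template, the camel-case naming rule and the function itself are hypothetical, not drawn from any particular company standard):

```c
#include <stdio.h>

/*
 * Function : convertReading
 * Purpose  : Convert a raw sensor value to engineering units.
 * Params   : rawValue - unscaled reading from the sensor bus
 * Returns  : The scaled reading.
 */
static double convertReading(int rawValue)
{
    const double SCALE_FACTOR = 0.01; /* scaling factor documented in the standard */
    return rawValue * SCALE_FACTOR;
}

int main(void)
{
    printf("%.2f\n", convertReading(250)); /* prints 2.50 */
    return 0;
}
```

A standard like this costs little to follow, but it means a maintainer joining from another project can read the code without first having to learn a new house style.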
Verification and validation is concerned with proving that a product meets the requirements specified at the start of development. It is tightly integrated with the testing of a software development project, and this integration spreads testing throughout the life cycle, including the introduction of systematic review, analysis and testing. By using testing and verification techniques throughout the entire development life cycle, many more defects will be detected and hence the final product will be of higher quality. It also provides a level of assurance to management that developers are following formal software development processes (William E. Lewis, 2000).
Testing will always give a good indication of a product's quality level, and while it may not be perfect it will almost inevitably improve the end quality. This is done through the use of the three main testing techniques. The functional elements of an application, i.e. the functions the program has been given to meet the requirements, are tested using the “Black-Box” method. The logical pathways through the code, which may not be picked up by “Black-Box” testing, are analysed using the “White-Box” method; this closely examines the internal structure of the system and tests all logical pathways. These two methods are sometimes combined to form “Gray-Box” testing, whereby both the logical pathways and conformance to the specification are tested at the same time.
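A minimal C sketch of how the two perspectives differ in practice (the function under test and its bounds are hypothetical): the black-box cases are derived purely from the specification, while the white-box cases are chosen by reading the code so that every branch and boundary is exercised.

```c
#include <assert.h>

/* Hypothetical function under test; its specification says:
   "accept readings from 0 to 100 inclusive". */
static int inRange(int reading)
{
    if (reading < 0)   return 0;  /* below lower bound */
    if (reading > 100) return 0;  /* above upper bound */
    return 1;                     /* acceptable        */
}

int main(void)
{
    /* Black-box: derived from the specification alone,
       without looking at the code. */
    assert(inRange(50)  == 1);
    assert(inRange(-5)  == 0);
    assert(inRange(150) == 0);

    /* White-box: chosen by inspecting the code, forcing each
       branch and its boundary values to execute. */
    assert(inRange(0)   == 1);
    assert(inRange(100) == 1);
    assert(inRange(-1)  == 0);
    assert(inRange(101) == 0);
    return 0;
}
```

A gray-box suite would combine both sets, checking conformance to the specification while steering the inputs along known internal pathways.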
Another ongoing process is quality control, defined as the processes and methods used to monitor work and observe whether requirements are met. It can be looked at as an abstraction of prevention, verification and testing, as it focuses on the removal and review of defects before the shipment of the product; however, it is more of an internal, low-level process than the others. Quality control is the sole responsibility of the team that is developing the product. The process consists of carrying out a set of pre-defined checks specified in the product quality assurance plan created at the start of the project. The traditional function of quality control is inspection: independent examinations to assess compliance with defined criteria. This may take the form of a detailed checklist or a fully fledged set of required checks, processes and tests (William E. Lewis, 2000).
3.2. Hallmarks of Quality Assured Code
Quality assurance is by no means a new idea; the level of quality in software products has been a concern of governments and major industries since at least the 1980s. In 1988 the UK government's Department of Trade and Industry commissioned the Price Waterhouse report, which set out to identify the hallmarks of quality assured code.
The first hallmark is cost: a quality assurance process delivers the required product within the allocated budget. The report takes into account that financial conditions may change over the duration of the project, and as such the product should remain financially viable, consistent with available market forecasts. Another consideration is that development is a labour-intensive process; the profitability of the company is directly dependent on the productivity of the development team (Howard T. Garston Smith, 1997). The continuing relevance of this report is underlined by its discussion of the need for standardisation, life-cycle methodology, configuration management and prevention.
Secondly, the report talks about timeliness. This is a widely recognised problem in software development, and many vendors complain of the difficulty of delivering software on time. The financial effects of late delivery can be vast, especially if a competitor beats the vendor to shipment. Project planning, through the use of time management processes, is the main way of addressing late delivery. Price Waterhouse correctly identifies the poor gathering of requirements as the most serious cause of overruns: the most serious (and expensive) software flaws can be traced back to incorrect or vague requirements. Requirements are often gathered poorly because it can be difficult to get users involved in documenting them. This reflects the different ways of thinking between programmers and users: programmers are used to analysing a project as a set of requirements, but this can seem a foreign concept from a user's point of view. Updating the requirements during the life cycle is very important, as a user may change his/her mind on a certain issue, or a new business problem may arise that requires a change. The use of software prototypes, demonstrated to the user at set intervals, can aid in this process (Howard T. Garston Smith, 1997).
Lastly, the report talks about the reliability of software products. Reliability is something perceived by the end user of the product: if a system has frequent problems, then its fitness for use, in the customer's eyes, will be greatly diminished. Removing defects in software is, of course, the best way to improve long-term reliability. Configuration management is an indirect way of reducing defects, as it ensures that the user receives the same version of the software that the vendor intends to deliver and support; that is, the user will not receive a buggy version of the product that the vendor did not intend for shipment.
4. Case Studies
4.1. Ariane 5
Ariane 5 was a space rocket developed by the European Space Agency in the 1990s. The first test flight of Ariane 5, on the 4th of June 1996, was cut short after 37 seconds when the vehicle self-destructed due to a failure in the guidance system. The explosion of Ariane 5 has become known as the most expensive computer bug in history.
The cause of the self-destruction was a section of the guidance system code designed to convert a 64-bit floating-point value to a 16-bit signed integer. This 64-bit value took data from the rocket's many sensors and gyroscopes to define its sideways inertia. The conversion should have been a relatively routine task, but in this case the inertia value was too large to be expressed as a 16-bit integer, causing the guidance system to fail. This failure pitched the rocket off course and triggered the self-destruct mechanism.
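The flight software was written in Ada; the following C sketch (the variable name and value are illustrative, not taken from the flight data) shows the same class of bug. In the Ada original the unprotected conversion raised an unhandled exception; the guard shown here is the kind of check that was omitted:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Ariane 5's greater speed produced values far larger than
       anything Ariane 4 could generate (value illustrative). */
    double sidewaysInertia = 500000.0;

    /* A 16-bit signed integer holds only -32768..32767. In Ada the
       unprotected conversion raised an Operand Error; in C the
       equivalent unchecked cast is undefined behaviour. A guarded
       conversion fails safely instead: */
    if (sidewaysInertia > INT16_MAX || sidewaysInertia < INT16_MIN) {
        printf("value out of range for int16_t - handle safely\n");
    } else {
        int16_t converted = (int16_t)sidewaysInertia;
        printf("converted: %d\n", converted);
    }
    return 0;
}
```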
After the accident, an official enquiry was launched to discover the root cause of the bug behind the explosion. The enquiry uncovered the fact that the software that powered the guidance system was actually designed for Ariane 4, a much slower rocket. During the design process for the software it was decided that it was not necessary to protect the inertial system computer from a huge value for sideways inertia (Prof. J. L. Lions, 1996). The second major issue that contributed to the accident was that the calculation containing the bug actually served no purpose once the rocket was in the air; its only function was to align the system before launch. Again, it had been decided for an earlier version of Ariane to leave this function running for the first 40 seconds of flight as a fail-safe (James Gleick, 1996).
During the design process of Ariane 5, a review of the software was carried out to assess its suitability for the new vehicle. Testing should have been carried out to discover what effect leaving this alignment function switched on would have on the guidance of the rocket, but the limitations of the software were not fully analysed and the possible implications of allowing it to continue functioning during flight were not realised. Another failure of the testing was that the tests performed did not include the Ariane 5 trajectory data. It would have been feasible to include this data during testing, but it was decided to test the system with previously simulated data; consequently the design error was not discovered. The enquiry board later carried out tests on a computer running the inertial reference system software with the actual trajectory data from the Ariane 5 flight, and this simulation recreated the exact chain of events from the disaster (Prof. J. L. Lions, 1996).
The official conclusion was that the Ariane 5 Development Programme did not include adequate analysis and testing of the inertial reference system or of the complete flight control system, which could have detected the potential failure (Prof. J. L. Lions, 1996). This was simply a failure to apply standard quality management methods; if correct requirements verification and testing had been carried out, the disaster could have been averted.
4.2. Los Angeles Air Traffic Control
On the 14th of September 2004, air traffic controllers at the ATC centre in Los Angeles lost contact with 400 airliners. This loss of contact was caused by the unexpected shutdown of the main voice communication system. A backup system that was supposed to take over in such an event crashed within a minute of being turned on. In at least five cases, airplanes came within the minimum separation distances required in the US. Fortunately, there were no collisions (Linda Geppert, 2004).
The system that LA air traffic control used was the Voice Switching and Control System (VSCS). The initial report on the incident, from the FAA, ruled that the problem was human error. It stated that the event was not a result of system instability and could have been avoided if strict operating procedures had been followed; these procedures included rebooting the VSCS every 30 days.
But this need for a reboot could have been avoided. A subsystem of the VSCS, the control subsystem upgrade (VCSU), contained a software bug. The VCSU is the control system for the VSCS and checks its health by continually running tests on the system. Inside the control system is a timer that counts down in milliseconds; this timer is used as a pulse to send out periodic queries to the VSCS. The timer starts off at the highest number possible (around 4 billion milliseconds), and counting down from this number to zero takes around 50 days. Hence the FAA required that the system be rebooted every 30 days to reset this timer (Linda Geppert, 2004).
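Geppert's figures imply a 32-bit millisecond counter, and the arithmetic matches the observed failure exactly; a short sketch (the variable names are ours):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* The VCSU timer counted down from the largest value a 32-bit
       unsigned counter can hold, decrementing once per millisecond. */
    uint32_t startValue = UINT32_MAX;         /* 4,294,967,295 ms */
    double msPerDay = 1000.0 * 60 * 60 * 24;  /* 86,400,000 ms    */

    printf("time to reach zero: %.1f days\n", startValue / msPerDay);
    /* Prints "time to reach zero: 49.7 days" - hence the 30-day
       reboot rule, and hence the crash when the system was left
       running past that point without a reset. */
    return 0;
}
```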
It is obvious from the inquiry, and the design documentation, that this limitation was known about: it was a key assumption of the system implementation that a regular reboot was required. The failure of the development team, and the FAA, was that at no point during testing was it analysed what would happen if the reboot did not take place. Hence the FAA did not learn of the problem until it ran the new system in the field, where it ran for 49.7 days and then crashed. As with the Ariane disaster, this shows a key lack of understanding of quality management. Every single avenue of logic and data should have been inspected, tested and re-tested before this system was implemented. These are standard procedures for all software products, no matter how mundane, let alone a system that holds power of life and death over 400 passenger aircraft.
5. Conclusions
- A clear and concise definition of quality is needed.
- A business must have quality at its heart; it needs to be supported from the top or it will fall by the wayside.
- A defined set of quality control processes is needed to ensure standard quality levels across products.
- Customer satisfaction is a good barometer of success or failure, so quality assurance is needed to ensure that customers are happy with the product.
- To reduce costs, in both the short and long term, quality management is essential.
- To reduce defects early on, a set of “best practice” standards is needed.
- Verifying the product against the specification, to ensure its “fitness for use”, is essential before delivery.
- The management processes need to be supported by strict quality control checks.
- Incorrect and vague requirements are the leading cause of defects and late project delivery.
6. References
- Howard T. Garston Smith (1997) Software Quality Assurance: A Guide for Developers and Auditors, USA: Interpharm Press Inc.
- William E. Lewis (2000) Software Testing and Continuous Quality Improvement, USA: CRC Press LLC.
- Stephen H. Kan (2003) Metrics and Models in Software Quality Engineering, USA: Pearson Education Inc.
- Bob Hughes & Mike Cotterell (2006) Software Project Management, UK: McGraw-Hill Higher Education.
- James Gleick (1996) A Bug and a Crash, http://www.around.com/ariane.html, date accessed 11/11/09.
- Prof. J. L. Lions (1996) ARIANE 5 Flight 501 Failure, http://sunnyday.mit.edu/accidents/Ariane5accidentreport.html, date accessed 11/11/09.
- Linda Geppert (2004) Lost Radio Contact Leaves Pilots On Their Own, http://spectrum.ieee.org/aerospace/aviation/lost-radio-contact-leaves-pilots-on-their-own, date accessed 11/11/09.