Towards Understanding the Importance of Variables in ...


EDCC 2010
Towards Understanding the Importance of Variables in Dependable Software
Matthew Leeke and Arshad Jhumka
Department of Computer Science, University of Warwick
{matt, arshad}

Outline
- Software Dependability
- Mechanisms for Error Detection and Correction
- Related Work
- The Importance Metric
- A Case Study: FlightGear
- Discussion and Limitations
- Future Work

Software Dependability
Computer systems have become pervasive
Functionality increasingly defined by software
- Software dependability is critical across all facets of modern society
- A strong motivator for much current work in software dependability
How do we actually design for dependability?
- A firm conceptual basis

Designing For Dependability
A dependable software system contains two important types of artefact [Arora et al., 1998]
- Error Detection Mechanisms (e.g. executable assertions)
- Error Recovery Mechanisms (e.g. exception handlers)

In order to contain errors and avoid system failures, both artefacts must be effective
- Coverage
- Latency

Error Detection Mechanisms (EDMs)
An EDM seeks to detect whether the system state at a given point can threaten the proper functioning of a software system
Effectiveness depends on location and predicate
- The implemented predicate is essentially a boolean expression defined over a set of program variables
- This predicate is non-trivial
- Properties of non-triviality are accuracy and completeness
- Accuracy and completeness are sensitive to the set of variables [Jhumka et al., 2006]

Error Recovery Mechanisms (ERMs)
An ERM seeks to restore a suitable safe state for a software system in order for it to continue execution
- Values of some corrupted variables must be overwritten
If correct and timely operation is to be maintained, some set of variables needs to hold suitable values in order to make the system state safe for further execution
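As a minimal sketch of the two artefacts described above, an EDM can be written as a boolean predicate over a few program variables, and an ERM as a routine that overwrites corrupted values with safe ones. The `EngineState` type, its variable names and the bounds are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class EngineState:
    """Hypothetical component state; variable names are illustrative only."""
    thrust: float
    fuel_quantity: float


def edm_predicate(state: EngineState) -> bool:
    """Error detection: a boolean expression over a set of program variables.

    Returns True when the state looks safe and False when an error is
    detected. The bounds are made-up values for this sketch.
    """
    return 0.0 <= state.thrust <= 50_000.0 and state.fuel_quantity >= 0.0


def erm_recover(last_safe: EngineState) -> EngineState:
    """Error recovery: overwrite corrupted variables with suitable safe values."""
    return EngineState(thrust=last_safe.thrust,
                       fuel_quantity=last_safe.fuel_quantity)


def guarded_step(state: EngineState, last_safe: EngineState) -> EngineState:
    """Evaluate the predicate at one program location; recover if it fails."""
    if edm_predicate(state):
        return state
    return erm_recover(last_safe)
```

Which variables appear in the predicate, and where `guarded_step` is called, are exactly the design decisions that knowledge of critical variables is meant to inform.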

If this set of critical variables is known, it is easier to develop predicates and determine appropriate locations for EDMs and ERMs
Knowledge of critical variables eases the design and placement of EDMs and ERMs

Our Approach
We develop a variable-centric approach to understand and capture the importance of variables in an effort to contain error propagation
We present:
- Two metrics which contribute to a definition of Importance
- The Importance Metric
- An experimental approach for estimating Importance
- A case study to demonstrate the application of these metrics

Related Work
Experimentally evaluating the coverage and latency of EDMs and ERMs [Arlat et al., 1989] [Vinter et al., 2001]
- Established that EDMs exhibiting high coverage and low latency serve to reduce error propagation
ERM/EDM Location
- Subsequent research on locating EDMs
- Guidelines and heuristics for the location and some aspects of design [Hiller et al., 2000] [Rabejac et al., 1996]
- Metrics for identifying modules which do not propagate errors [Khoshgoftaar et al., 1999] [Jhumka et al., 2005]
- Quantifying error propagation / influence between interacting modules [Jhumka et al., 2001] [Suri et al., 1998]
- Framework for identifying vulnerabilities in software based upon error permeability [Hiller et al., 2004]
  - Captures how likely errors are to propagate from a module input to its output, allowing the identification of permeable modules

  - Requires accurate data flow information
ERM/EDM Design
- Specification used to derive programmatic tests which capture some aspects of functional correctness [Richardson et al., 1992]
- Static analysis to detect vulnerabilities
  - Completeness
  - False positives
- Finite state programs considered in the automated design of EDMs [Arora et al., 1998] [Jhumka et al., 2006] [Wilken et al., 1990]
  - Applicability of analysis on finite state systems
Little work relating specifically to the design of predicates
- In some sense the described approach is asking: what variables should be captured by the implemented predicate?

Models
System Model
- We consider a software system to be a set of interconnected components, though we do not know anything about the interconnections
- We adopt a grey-box view: access to the source code is allowed, but knowledge of specific functionality or structure is not available

Fault Model
- Transient value fault model
- Model hardware faults as bit flips which cause instantaneous changes to values held in memory

The Importance Metric
Essentially related to measuring the impact that corruption in a particular variable can have in two domains
- Spatial
- Temporal
Intuitively, to minimise the likelihood of software failure, each aspect needs to be appropriately handled
- The number of corrupted variables, hence modules, and the duration of the corruption need to be minimised

Spatial Impact
The spatial impact of variable v of component C in a run r is the number of components that get corrupted in r
The spatial impact of variable v of component C is the maximum over all runs

Temporal Impact
The temporal impact of variable v of component C in a run r is the number of time units over which at least one component remains corrupted in r
The temporal impact of variable v of component C is the maximum over all runs

The Importance Metric
Considerations:
1. A variable may have a high spatial impact, which means that by the time the error disappears, several other components have been corrupted, thus limiting the effectiveness of a recovery
2. A variable may have a high temporal impact but a low spatial impact, which means that, even though very few components are affected, a recovery may not be effective

Calculating Importance
A general form for a metric which accounts for the described factors in expressing the importance of a variable v in a component C can be given in terms of arbitrary functions G, K and L
Any instantiation should be as rich as possible with respect to the goals of error containment and failure avoidance
Account for factors which influence importance, but are not directly captured by the defined impact metrics
- Incorporating failure rate helps to realise these goals
Normalisation of the impact metrics is performed to ensure that their addition does not mask or enhance either impact metric
Applicable where emphasis is to be placed upon the need to detect errors or to recover from them

- The values of n and m dictate whether emphasis is placed upon the need to avoid failures or the need to prevent widespread system corruption

A Case Study - FlightGear
Target System
- Open-source flight simulator
- Over 220,000 lines of C/C++
Test Cases
- Takeoff procedure of 2700 simulation loop iterations
- 500 iteration initialisation and 2200 iteration pre- and post-injection periods
- 9 test cases; 3 aircraft masses and 3 sets of environmental conditions

A Case Study - Fault Injection
Instrumentation
- Instrumented modules randomly selected
- Exhaustive analysis with respect to bit representation and code location
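The transient bit-flip fault model, applied exhaustively over the bit representation of a variable, can be sketched as follows. This assumes the target variable is a 64-bit IEEE 754 double; the function names are hypothetical and are not part of PROPANE or FlightGear.

```python
import struct
from typing import List


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its 64-bit double representation flipped.

    Bit 0 is the least significant mantissa bit and bit 63 is the sign bit.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in the range 0..63")
    # Reinterpret the double as an unsigned 64-bit integer, flip, convert back.
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (corrupted,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return corrupted


def exhaustive_flips(value: float) -> List[float]:
    """All 64 single-bit corruptions of a double, one per bit position."""
    return [flip_bit(value, bit) for bit in range(64)]
```

Each corrupted value would be written into the instrumented variable at a chosen code location and iteration, with one experiment per bit position.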

Fault Injection and Logging
- Modified PROPANE used to perform fault injection
- Modifications allowed almost full automation and analysis of 3,773,736 experiments [Hiller et al., 2002]
Characterising Erroneous State and Failure
- Deviation from a golden run characterised error states
- Failure specification based upon expert knowledge

A Case Study - Results

Identifier          Module  Fail Rate  Spatial Impact  Temporal Impact
fgFlightTime        G       0.003472   4               2000
delta_time_sec      G       0.006944   3               2000
dump                C       0.013889   3               1
EmptyWeight         D       0.011905   2               2000
InitialisedEngines  C       0.001389   5               77
IsBound             C       0.004167   1               1
Mass                D       0.011905   12              1432

Table 1 - Components of the Importance metric
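The spatial and temporal impact columns follow from the definitions given earlier: per run, count the components that deviate from the golden run and the time units during which any component deviates, then take the maximum over all runs. The sketch below assumes a hypothetical log format (one numeric trace per component, one sample per time unit); the real PROPANE logs are not described in the slides.

```python
from typing import Dict, List, Set, Tuple

# One injected run: time step -> components whose state deviates from the
# golden run at that step (hypothetical format).
RunLog = Dict[int, Set[str]]


def corrupted_components(golden: Dict[str, List[float]],
                         observed: Dict[str, List[float]],
                         tolerance: float = 0.0) -> RunLog:
    """Compare an injected run with the golden run, step by step."""
    log: RunLog = {}
    for component, golden_trace in golden.items():
        pairs = zip(golden_trace, observed[component])
        for step, (expected, actual) in enumerate(pairs):
            # `not (... <= tolerance)` also flags NaN values as deviations.
            if expected != actual and not (abs(expected - actual) <= tolerance):
                log.setdefault(step, set()).add(component)
    return log


def spatial_impact_of_run(log: RunLog) -> int:
    """Number of distinct components corrupted at any point in the run."""
    return len(set().union(*log.values())) if log else 0


def temporal_impact_of_run(log: RunLog) -> int:
    """Number of time units in which at least one component is corrupted."""
    return sum(1 for corrupted in log.values() if corrupted)


def impacts(logs: List[RunLog]) -> Tuple[int, int]:
    """Spatial and temporal impact of a variable: the maximum over all runs."""
    spatial = max((spatial_impact_of_run(log) for log in logs), default=0)
    temporal = max((temporal_impact_of_run(log) for log in logs), default=0)
    return spatial, temporal
```

Here `logs` would hold one `RunLog` per injection experiment on the same variable, so `impacts(logs)` yields one row's spatial and temporal entries.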

Identifier             Importance
currentThrust          1.047348
HasInitialisedEngines  1.016663
numTanks               1.012560
TotalQuantityFuel      1.011618
firsttime              1.009914
dt                     0.506376
Tw                     0.090196
numEngines             0.068625
i_run_3                0.058308

Table 2 - Importance metric values for Module C

Identifier   Importance
Weight       1.048938
EmptyWeight  1.030808
bixx         1.008410
bixy         1.008410
bixz         1.008410
bizz         1.008160
biyz         1.008160
biyy         1.007966
Mass         0.772751

Table 3 - Importance metric values for Module D
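The slides do not reproduce the instantiated formula, but the tabulated values are consistent with one simple instantiation: the two impacts are each normalised, summed, and divided by (1 - fail rate) squared. The sketch below rests on that assumption. The normalisers 312 and 2000 and the exponent 2 were inferred by fitting the Table 1 rows to the reported Importance values; they are not constants stated in the talk, and how they relate to n and m is not known from the slides.

```python
from typing import NamedTuple

# Assumed constants, inferred by fitting Table 1 against the reported
# Importance values; the slides do not state them.
SPATIAL_NORMALISER = 312
TEMPORAL_NORMALISER = 2000
FAILURE_EXPONENT = 2


class Components(NamedTuple):
    fail_rate: float
    spatial_impact: int
    temporal_impact: int


def importance(c: Components) -> float:
    """Sum of the normalised impacts, scaled up by the failure rate."""
    normalised = (c.spatial_impact / SPATIAL_NORMALISER
                  + c.temporal_impact / TEMPORAL_NORMALISER)
    return normalised / (1.0 - c.fail_rate) ** FAILURE_EXPONENT


# Rows of Table 1 for which an Importance value is also reported.
TABLE_1 = {
    "fgFlightTime": Components(0.003472, 4, 2000),
    "delta_time_sec": Components(0.006944, 3, 2000),
    "EmptyWeight": Components(0.011905, 2, 2000),
    "Mass": Components(0.011905, 12, 1432),
}

# Relative ranking, most important first.
ranking = sorted(TABLE_1, key=lambda name: importance(TABLE_1[name]),
                 reverse=True)
```

Under these assumptions the four rows evaluate to within rounding of the reported 1.019890, 1.023784, 1.030808 and 0.772751, which is the only evidence for the inferred form.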

Identifier             Module  Importance
Weight                 D       1.048938
currentThrust          C       1.047348
EmptyWeight            D       1.030808
delta_time_sec         G       1.023784
fgFlightTime           G       1.019890
HasInitialisedEngines  C       1.016663
numTanks               C       1.012560
TotalQuantityFuel      C       1.011618
firsttime              C       1.009914

Table 4 - Overall Importance Ranking For All Modules

Implications
We might envision the relative Importance ranking being used to inform the development and maintenance of software
- A software release cannot address all known issues
- Address vulnerabilities in order of severity
Importance does not require knowledge of software structure or communication paths
- Analysis can be performed post-implementation by an engineer with no prior system knowledge
- Consistent with modern software development methods (unfortunately)

Lessons Learnt
A variable-centric approach can be employed in guiding the design and placement of EDMs and ERMs
The proposed metric can identify critical variables which are not immediately evident, even to those with prior system knowledge
- More on this to come
The Importance metric and its constituent components can be evaluated in an automated fashion, thus facilitating a low-cost dependability analysis
- Automation is a key facilitator for this form of analysis

Limitations
The relative ranking generated is sensitive to the set of variables under consideration
- Ideally all variables in a software system would be analysed
- Value in analysing on a module-by-module basis
The Importance metric does not attempt to estimate / quantify the real-world importance of variables
Inherent limitations of the fault injection process
- Intrusiveness, test case dependence and potential for unknown sources of variability

Future Work
Validating identified critical variables
- Blind test... interesting!
- Correlating with predicates derived by mining fault injection (FI) data
Approaches to EDM and ERM design based upon identified critical variables
- Critical variables only?
Relating access restrictions to variable importance

Summary
We develop a variable-centric approach to understand and capture the importance of variables in an effort to contain error propagation
We present:
- Two metrics which contribute to a definition of Importance
- The Importance metric
- An experimental approach for estimating Importance
- A case study to showcase these metrics

References
J Arlat, Y Crouzet and J-C Laprie. Fault injection for dependability evaluation of fault tolerant computing systems. In Proceedings of the 19th International Symposium on Fault-Tolerant Computing, pp.348-355, June 1989
A Arora and S S Kulkarni. Component based design of multi-tolerant systems. IEEE Transactions on Software Engineering, 24(1):63-78, January 1998

A Arora and S S Kulkarni. Detectors and correctors: A theory of fault-tolerance components. In Proceedings of the 18th IEEE International Conference on Distributed Computing Systems, pp.436-443, May 1998
M Hiller. Executable assertions for detecting data errors in embedded control systems. In Proceedings of the International Conference on Dependable Systems and Networks, pp.24-33, June 2000
M Hiller, A Jhumka and N Suri. PROPANE: An environment for examining the propagation of errors in software. In Proceedings of the ACM SIGSOFT International Symposium on Software Testing and Analysis, pp.81-85, July 2002
M Hiller, A Jhumka and N Suri. EPIC: Profiling the propagation and effect of data errors in software. IEEE Transactions on Computers, 53(3):512-530, May 2004
A Jhumka, M Hiller and N Suri. Assessing inter-modular error propagation in distributed software. In Proceedings of the 20th IEEE Symposium on Reliable Distributed Systems, pp.152-161, January 2001
A Jhumka and M Hiller. Putting detectors in their place. In Proceedings of the 3rd IEEE International Conference on Software Engineering and Formal Methods, pp.33-42, September 2005
A Jhumka, F Freiling, C Fetzer and N Suri. An approach to synthesise safe systems. International Journal of Security and Networks, 1(1):62-74, September 2006
T M Khoshgoftaar, E B Allen, W H Tang, C C Michael and J M Voas. Identifying modules which do not propagate errors. In Proceedings of the IEEE Symposium on Application-Specific Systems and Software Engineering and Technology, pp.185-193, March 1999
C Rabejac, J-P Blanquart and J-P Queille. Executable assertions and timed traces for online software error detection. In Proceedings of the Annual Symposium on Fault Tolerant Computing, pp.138-147, June 1996

D J Richardson, S L Aha and T O O'Malley. Specification-based test oracles for reactive systems. In Proceedings of the 14th International Conference on Software Engineering, pp.105-118, July 1992
N Suri, G Ghosh and T Marlowe. A framework for dependability driven software integration. In Proceedings of the 18th International Conference on Distributed Computing Systems, pp.405-415, May 1998
J Vinter, J Aidemark, P Folkesson and J Karlsson. Reducing critical failures for control algorithms using executable assertions and best efforts recovery. In Proceedings of the International Conference on Dependable Systems and Networks, pp.347-356, July 2001
K Wilken and J P Shen. Continuous signature monitoring: Low-cost concurrent detection of processor control error. IEEE Transactions on Computer-Aided Design, 9(6):629-641, June 1990

Questions?
