ECE 441 lecture 15 - Purdue University

Dependability where the Mobile World Meets the Enterprise World
Amiya Kumar Maji
Advisor: Prof. Saurabh Bagchi
Feb 27, 2015
School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana
Slide 1
Introduction
Large Scale Internet Mobility

End-to-end services need dependability of both components
Slide 2
Summary of Contributions
Dependability of Smartphones
  Study of failures in Android and Symbian. Analyzed location of failure manifestation, bug fixes, and customizability-related failures. (ISSRE 2010)
  Evaluation of the robustness of Android ICC. Designed and implemented our testing tool JarJarBinks, evaluated and analyzed crashes, and made suggestions for improving robustness. (DSN 2012)
Dependability of Cloud Applications
  Evaluated the impact of performance interference in public (Amazon EC2) and private clouds. Mitigated performance interference by intelligent application reconfiguration. (MW 2014)
  Mitigated interference by two-level reconfiguration of web server clusters. Improves the previous work by making the controller more agile and effective. (ICAC 2015, submitted)
Slide 3
Publications
A. K. Maji, K. Hao, S. Sultana, S. Bagchi. Characterizing Failures in Mobile OSes: A Case Study with Android and Symbian, in 21st International Symposium on Software Reliability Engineering, ISSRE 2010, November 1-4, 2010, San Jose, California.
A. K. Maji, F. A. Arshad, S. Bagchi, J. S. Rellermeyer. An Empirical Study of the Robustness of Inter-component Communication in Android, in 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012, June 25-28, 2012, Boston, MA.

A. K. Maji, S. Mitra, B. Zhou, S. Bagchi, A. Verma. Mitigating Interference in Cloud Services by Middleware Reconfiguration, in 15th International Middleware Conference, MIDDLEWARE 2014, December 8-12, 2014, Bordeaux, France. Provisional application for patent A. K. Maji, S. Mitra, S. Bagchi. ICE: An Integrated Configuration Engine for Interference Mitigation in Cloud Services, in 12th International Conference on Autonomic Computing, ICAC 2015, July 7-10, 2015, Grenoble, France. (Under review) Slide 4 Agenda Introduction Contributions Prelim Review: Dependability of Smartphones

Study of failures in Android and Symbian Robustness testing of Android ICC Dependability of Cloud Applications IC2: Mitigating interference by middleware reconfiguration ICE: Two-level configuration engine for WS clusters Directions for Future Research Summary Slide 5 Part I (Prelim Review) Dependability of Smartphones Slide 6

Study of Failures in Android and Symbian
Analyzed 628 bugs in Android and 153 bugs in Symbian (Oct 2008-Nov 2009)
  Most bugs (> 90%) are permanent in nature.
  The majority of Android bugs are in the middleware; fewer are in the kernel layer.
  Both platforms had a significant number of bugs in the Dev Tools, Web, Multimedia, and Build segments.
Analyzed 233 bug fixes in Android
  Presented a categorization of bug fixes.
  Only 22% of fixes required major code changes (> 10 lines).
Customizability in Android has its cost (more bugs).
Question from Preliminary Exam
  Q. How does Android compare against Linux in terms of bug density?
  A. Bug density of Android (< 2.5*10^-4) is lower than that reported for Windows (2.66*10^-4) [Alhazmi 2007]. Linux has a bug density of 1*10^-3 /LOC in kernel version 2.6.30 [Palix et al., Faults in Linux: Ten Years Later, ASPLOS 2011].

Collaborators: Kangli Hao, Salmin Sultana
Slide 7
Robustness Evaluation of Android ICC
Presented JarJarBinks (JJB), a tool for evaluating ICC robustness in Android.
JJB tests the Intent-handling capabilities of Android components by sending a large number of semi-valid Intents (explicit or implicit).
  More than 6 million Intents were sent to 800+ Android components over a week.
  We found that ~10% of Activities crashed with semi-valid Intents.
  All crashes manifest as exceptions in the runtime system.
  NullPointerException (NPE) was the most prevalent in both Android 2.2 and 4.0.
  Exception handling improved from 2.2 to 4.0 but is still a major concern.
  Similar results with implicit Intents.
  Components often crashed even with valid Intents, since extra data is not captured in the Intent-filter definition.
Collaborators: Fahad Arshad (Purdue), Jan S. Rellermeyer (IBM Research, Austin)
Slide 8
System Crash from User-level Application
3 Activities crashed the Android runtime.
Slide 9
Recommendation for Improving ICC Robustness

A. Intent sub-typing
    class CallIntent extends Intent {
        String action = "ACTION_DIAL";
        telUri data;
        ComponentName cmp;
        getAction() { };
        setData() { };
        getData() { };
        ..
    }
B. Checking input constraints
    Static (Java annotations)
    Dynamic (runtime)
C. Full input validation
    Use domain-specific languages (e.g. WSDL)
    Make Intent/Intent-filter descriptions more expressive
Slide 10
Agenda
Introduction
Contributions
Dependability of Smartphones
  Study of failures in Android and Symbian
  Robustness testing of Android ICC
Dependability of Cloud Applications
  IC2: Mitigating interference by middleware reconfiguration
  ICE: Two-level configuration engine for WS clusters
Directions for Future Research
Summary
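The Intent sub-typing and input-constraint recommendations (A and B in the ICC robustness recommendations) can be illustrated with a small language-neutral sketch. Python is used here only for brevity; an Android implementation would be written in Java. The `Intent` and `CallIntent` classes, the `try_build` helper, and the `tel:` pattern below are hypothetical stand-ins for illustration, not Android APIs and not the authors' implementation.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Intent:
    """Hypothetical stand-in for a generic, loosely typed Intent."""
    action: Optional[str] = None
    data: Optional[str] = None
    extras: Dict[str, object] = field(default_factory=dict)


# Assumed shape of a dial URI, for illustration only.
_TEL_URI = re.compile(r"^tel:\+?[0-9][0-9\-]*$")


@dataclass
class CallIntent(Intent):
    """Sub-typed Intent: the action is fixed and the data field is checked."""
    action: str = "ACTION_DIAL"

    def __post_init__(self) -> None:
        # Dynamic (runtime) constraint checks run at construction time,
        # so a malformed Intent is rejected before any component sees it.
        if self.action != "ACTION_DIAL":
            raise ValueError("CallIntent requires action ACTION_DIAL")
        if self.data is None or not _TEL_URI.match(self.data):
            raise ValueError(f"CallIntent requires a tel: URI, got {self.data!r}")


def try_build(data: Optional[str]) -> str:
    """Return 'ok' if the Intent passes validation, else the rejection reason."""
    try:
        CallIntent(data=data)
        return "ok"
    except ValueError as exc:
        return f"rejected: {exc}"
```

With this shape, a semi-valid Intent such as one with a missing data field fails with a descriptive error at the sender, instead of surfacing later as a NullPointerException inside the receiving component.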

Slide 11
Part II: Dependability of Cloud Applications
Collaborators: Subrata Mitra, Bowen Zhou (Purdue); Akshat Verma (IBM Research, Delhi)
Slide 12
Running Web Applications in the Cloud
[Figure labels: WS1, WS2, DB1, DB2, App1 ... Appm; VM1, VM2 ... VMn, VMm; Hypervisor; Host1, Host2; Network; Storage]
Slide 13
Imperfect Performance Isolation due to Shared Hardware Resources
[Figure: Multi-core Cache Sharing. Labels: processors P1 and P2, L1 caches, L2 cache (last level)]
Other shared resources
  Memory bandwidth
  Network/IO
  Translation Lookaside Buffer (TLB)
Slide 14
Mitigating Performance Interference in Clouds
Performance interference
  The performance of one VM suffers due to the activity of another co-located VM
Why does it happen?
  Low-level hardware resources are not partitioned well
  Contention for cache, memory bandwidth, and network can degrade performance
Our experiments with Amazon EC2
  Performance of web servers can suffer drastically during interference
  Cloudsuite application benchmark
  m1.large VM instances (2 cores, 7.5GB)
  Run for 100 hours
[Figure panels: EC2, Private Cloud. Annotations: "Tail ~ 4 X median", "Tail ~ 55 X median"]
Slide 15
Remediation Techniques
Traditional techniques for remediation
  Better VM placement [Paragon ASPLOS2013]
  Hypervisor scheduling [QCloud Eurosys 2010]
  Dynamic live migration [Deepdive ATC2013]
  Require changes in hypervisor. Not feasible in public cloud.
Our approach
Requirements
  Need user-level control
  Fast response during interference
Key idea: reconfigure the application to handle a change in operating environment (interference)
IC2: Interference-aware Cloud application Configuration
Slide 16

Solution Overview
Slide 17
IC2: Agenda
Performance Interference in Cloud
Our approach
Solution Overview
Interference vs. Middleware Parameters
Interference Detection
Configuration Controller
IC2 in Operation
Key Results
Slide 18
Interference vs. Middleware Parameters
Setup
[Figure labels: Server 1, Server 2, Server 3; Web Server, Database, Clients; Interference; KVM, KVM]
Servers are PowerEdge T320 servers with a Xeon E5-2440 processor, 6 (12) cores, and 16GB memory
Application: Cloudsuite (Olio, social media calendar)
Middlewares: Apache + Php-fpm
Slide 19
Interference vs. Middleware Parameters
Setup
Middleware parameters
  Thread-pool parameters
    Apache: MaxClients (MXC)
    Php-fpm: pm.max_children (PhpMaxChildren)
  Timeout parameter
    Apache: KeepAliveTimeout (KAT)
Interference
  Dcopy from BLAS (cache read + write)
  LLCProbe from Ristenpart CCS12 (cache read)
  Varying sizes of Dcopy create different levels of contention
Slide 20
Choice of Optimal Apache Parameters
Optimal MXC changes with interference
Optimal KAT changes with interference
Depends on degree of interference
Need dynamic reconfiguration
Slide 21
Parameter Dependency
Parameter dependency changes with interference
  KAT = MXC / (#new_connections/sec) is no longer valid during interference
  With interference, need smaller MXC and larger KAT
Slide 22
Observations
Optimal configuration values with interference
  Optimal MXC decreases; optimal KAT and PHP (PhpMaxChildren) increase
Server capacity with interference
  CPU saturates sooner with interference
  Lots of cache misses. CPI increases.
Idle CPU with different interferences (MXC=1100)
  No-Intf        17%
  Dcopy-15MB      7%
  Dcopy-1.5GB     1%
Slide 23
Agenda: IC2

Performance Interference in Cloud Our approach Solution Overview Interference vs. Middleware Parameters Interference Detection Configuration Controller IC2 in Operation Key Results
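The no-interference rule of thumb from the Parameter Dependency slide, KAT = MXC / (#new_connections/sec), can be worked through numerically. The sketch below is only an illustration: the function name and the connection rate of 220 per second are hypothetical, and MXC = 1100 is borrowed from the idle-CPU measurement above. One way to read the rule is that each new connection holds a worker for roughly KAT seconds, so the rule picks the largest timeout for which the workers held by keep-alive connections do not exceed MXC.

```python
def baseline_keepalive_timeout(max_clients: int, new_conns_per_sec: float) -> float:
    """KAT (seconds) from the no-interference rule KAT = MXC / new connections per second."""
    if new_conns_per_sec <= 0:
        raise ValueError("new_conns_per_sec must be positive")
    return max_clients / new_conns_per_sec


# Hypothetical workload: 220 new connections per second with MXC = 1100.
kat = baseline_keepalive_timeout(1100, 220)
print(f"baseline KAT = {kat:.1f} s")
```

The slides report that this relation stops holding under interference, where a smaller MXC and a larger KAT work better, so a value computed this way is only a starting point for the no-interference case.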

Slide 24
Solution Overview
Questions that we answer
  How to detect interference?
  Which parameters to reconfigure during interference?
  How to determine new parameter values?
Slide 25
IC2 workflow
Interference Detection
  Use a Decision Tree classifier
  In EC2, use system and application metrics to detect interference
  Load per operation (LPO) is a key indicator
  Challenge: capture metric variations with configuration changes
  More details on the Decision Tree in the paper
Slide 26
State Manager
  In EC2, use buffer states to deal with transient interference and noisy data
  Reconfigure only after two successive periods under interference
  Also masks classifier errors
Slide 27
Configuration Controller
Choice of parameter driven by knowledge base
  Created from empirical results shown earlier
  Can be created by expert administrators
Our heuristic
  Decrease MXC based on the proportional increase in LPO
  Increase KAT based on the proportional increase in response time
  For PHP, use two constant values (no-interference, interference)
Implementation
  Modified Apache to handle graceful parameter updates
  Called httpd-online: https://github.com/amaji/httpd-online-2.4.3
Slide 28
Agenda
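The State Manager rule (reconfigure only after two successive periods under interference) and the Configuration Controller heuristic can be sketched together. The slides do not give the exact formulas, so the proportional scaling used below (dividing MXC by the LPO ratio, multiplying KAT by the response-time ratio) is an assumption, and every name in the code is hypothetical rather than taken from IC2 or httpd-online.

```python
from dataclasses import dataclass


@dataclass
class Config:
    max_clients: int          # Apache MaxClients (MXC)
    keepalive_timeout: float  # Apache KeepAliveTimeout (KAT), in seconds
    php_max_children: int     # Php-fpm pm.max_children (PHP)


class StateManager:
    """Signals reconfiguration only after `required` successive interference periods."""

    def __init__(self, required: int = 2) -> None:
        self.required = required
        self.streak = 0

    def observe(self, interference_detected: bool) -> bool:
        # A single noisy detection (or classifier error) resets nothing
        # permanently; it just fails to build a long enough streak.
        self.streak = self.streak + 1 if interference_detected else 0
        return self.streak >= self.required


def reconfigure(base: Config, lpo_base: float, lpo_now: float,
                rt_base: float, rt_now: float, php_interference: int) -> Config:
    """Assumed proportional heuristic: shrink MXC with LPO, grow KAT with response time."""
    lpo_ratio = max(lpo_now / lpo_base, 1.0)  # never grow MXC under interference
    rt_ratio = max(rt_now / rt_base, 1.0)     # never shrink KAT under interference
    return Config(
        max_clients=max(1, round(base.max_clients / lpo_ratio)),
        keepalive_timeout=base.keepalive_timeout * rt_ratio,
        php_max_children=php_interference,    # one of two constant PHP values
    )
```

For example, with a baseline of `Config(1100, 5.0, 50)`, a doubling of LPO and a 1.5x increase in response time would yield `Config(550, 7.5, 80)` when the interference-time PHP constant is 80.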

Performance Interference in Cloud
Our approach
Interference vs. Middleware Parameters
Solution Overview
Interference Detection
Configuration Controller
IC2 in Operation
Key Results
Conclusion
Slide 29
IC2 in Operation
Setup
EC2 m1.large VMs
Web server co-located with interference VM
Periodic interference of varying intensity and type (LLCProbe, Dcopy)
Private testbed VMs configured to match EC2 specifications
Metrics to consider
  Improvement in response time during interference
  Detection latency
  Detection accuracy
Slide 30
IC2 Improves Response Time
[Figure annotations: "Httpd-online reduces overhead", "New values"]
Effects of interference last longer in EC2
The default Apache distribution has a high reconfiguration overhead; httpd-online solves this
Slide 31
Results
IC2 improved response time by up to 40% in the private testbed and up to 29% in EC2 during interference
Median interference detection latency
  15 sec in private testbed
  20 sec in EC2 testbed
Classifier accuracy
  Interference detection showed 89% recall and 73% precision
  The majority of misclassifications were Interference or No-interference detected as Transient
  Our labeling does not account for ambient interference
Slide 32
Summary: IC2
Interference causes severe performance degradation in the cloud
Optimal application configurations change during interference
Web services can mitigate the effects of interference by reconfiguration
We presented the design and implementation of IC2, which reconfigures web servers during interference
Our evaluations showed a 40% reduction in response time in the private testbed and a 29% reduction in EC2.
Slide 33
Agenda
Introduction
Contributions
Review: Dependability of Smartphones
  Study of failures in Android and Symbian
  Robustness testing of Android ICC
Dependability of Cloud Applications
  IC2: Mitigating interference by middleware reconfiguration
  ICE: Two-level configuration engine for WS clusters

Directions for Future Research Summary Slide 34 ICE: An Integrated Configuration Engine for Interference Mitigation Motivation IC2 improves response time by configuring WS parameters WS reconfiguration is costly and limited Use residual capacity in a WS cluster efficiently Objectives Make reconfiguration (interference mitigation) faster Make existing load-balancers interference-aware Get better response time during interference (than IC2)

We use HAProxy as our baseline load-balancer Slide 35 ICE Overview Two-level reconfiguration 1. Update load balancer weight Less overhead. More agile. 2. Update Middleware parameters Only for long interferences. Reduces overhead of idle threads. Slide 36

ICE Design
We use hardware counters for interference detection
  Faster detection
  Hypervisor access not required if counters are virtualized
Slide 37
ICE: Load Balancer Reconfiguration
Objective: keep the CPU utilization of web server (WS) VMs below a threshold U_thres
  If the predicted CPU utilization is above the threshold, find a new request rate that brings it below the threshold
  Request rate (RPS) determines the server weight value in the load balancer configuration
  Use the following empirical function for load estimation
[Equation not recoverable from the extracted text. Surviving labels: Predicted Util, Past Util, CPI, RPS, and the annotation "Indicator of Interference"]
Slide 38
Evaluation
Experimental Setup
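The load-balancer reconfiguration loop described for ICE can be sketched as follows. The slide's empirical load-estimation function did not survive extraction, so `predict_util` below is a purely hypothetical stand-in that scales past utilization by the CPI ratio and the request-rate ratio; it should not be read as the authors' model. The function names, server names, and numbers are also illustrative assumptions. The weight scaling uses HAProxy's documented server weight range of 0 to 256.

```python
from typing import Dict


def predict_util(past_util: float, cpi: float, cpi_baseline: float,
                 rps: float, past_rps: float) -> float:
    """Hypothetical stand-in model: utilization grows with CPI and request rate."""
    return past_util * (cpi / cpi_baseline) * (rps / past_rps)


def target_rps(past_util: float, cpi: float, cpi_baseline: float,
               rps: float, past_rps: float, u_thres: float) -> float:
    """Request rate that keeps the predicted CPU utilization at or below u_thres."""
    predicted = predict_util(past_util, cpi, cpi_baseline, rps, past_rps)
    if predicted <= u_thres:
        return rps  # no change needed
    # Under the assumed model utilization is linear in RPS, so scale RPS down.
    return rps * u_thres / predicted


def weights_from_rps(targets: Dict[str, float], max_weight: int = 256) -> Dict[str, int]:
    """Map per-server target request rates to integer load-balancer weights."""
    top = max(targets.values())
    if top <= 0:
        raise ValueError("at least one server needs a positive target rate")
    return {name: max(1, round(max_weight * r / top)) for name, r in targets.items()}


# Hypothetical cluster: ws1 sees CPI double under interference, ws2 is unaffected.
ws1 = target_rps(past_util=50.0, cpi=2.0, cpi_baseline=1.0,
                 rps=100.0, past_rps=100.0, u_thres=80.0)
weights = weights_from_rps({"ws1": ws1, "ws2": 100.0})
```

Shifting weight away from the affected server is the cheap first level of ICE's two-level scheme; middleware parameters would only be touched if the interference persists.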

Cloudsuite benchmark with different interferences
We look at ICE with two different load balancer scheduling policies
  Weighted Round Robin (WRR or simply RR)
    WRR shows the performance of a static configuration
  Weighted Least Connection (WLC or simply LC)
    WLC shows the performance of an out-of-box dynamic load balancer
Slide 39
Response Time
[Figure panels: Least Connection (LC), Round Robin (RR). Axis marks: 200ms, 400ms]
ICE improves response time in both RR and LC
LC (out-of-box) reduces the effect of interference significantly, but occasional spikes remain
ICE reduces the frequency of these spikes
Slide 40
Results
ICE improves median response time by up to 94% compared to a static configuration (RR)
ICE improves median response time by up to 39% compared to a dynamic load balancer (LC)
Median interference detection latency
  3 sec using ICE (15-20 sec for IC2)
Slide 41
ICE: Summary
The effect of interference can be mitigated by reducing load on the affected VM
We presented ICE for two-level configuration in WS clusters
ICE improves median RT by 94% compared to a static configuration and 39% compared to a dynamic out-of-box load balancer
Median interference detection latency 3s
Slide 42
Agenda

Introduction Contributions Review: Dependability of Smartphones Study of failures in Android and Symbian Robustness testing of Android ICC Dependability of Cloud Applications IC2: Mitigating interference by middleware reconfiguration ICE: Two-level configuration engine for WS clusters Directions for Future Research Summary Slide 43 Directions for Future Research

Reliability with software evolution in Android Enhance JJB by instrumenting ActivityManager IC2: Automated generation of KB How to find which parameters to reconfigure in unknown applications? ICE: Handling long-lasting sessions. Move some sessions to other servers during interference. Slide 44 Summary of Contributions Presented failure characterization of Android and Symbian Robustness testing of Android ICC Designed and implemented JarJarBinks

Analysis of crashes Suggestions for robust ICC Mitigating interference in clouds Presented two solutions for handling interference without hypervisor modification IC2: mitigates interference by middleware reconfiguration ICE: mitigates interference by load-balancer and WS reconfiguration Slide 45 Publications A. K. Maji, K. Hao, S. Sultana, S. Bagchi. Characterizing Failures in Mobile OSes: A Case Study with Android and Symbian, in 21st International Symposium on Software Reliability Engineering, ISSRE 2010, November 1-4, 2010, San Jose, California. [*49]

A. K. Maji, F. A. Arshad, S. Bagchi, J. S. Rellermeyer. An Empirical Study of the Robustness of Inter-component Communication in Android, in 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012, June 25-28, 2012, Boston, MA. [*23] A. K. Maji, S. Mitra, B. Zhou, S. Bagchi, A. Verma. Mitigating Interference in Cloud Services by Middleware Reconfiguration, in 15th International Middleware Conference, MIDDLEWARE 2014, December 8-12, 2014, Bordeaux, France. A. K. Maji, S. Mitra, S. Bagchi. ICE: An Integrated Configuration Engine for Interference Mitigation in Cloud Services, in 12th International Conference on Autonomic Computing, ICAC 2015, July 7-10, 2015, Grenoble, France. (Under review) [*] is Google Scholar Citations Slide 46 Questions

Slide 47 Acknowledgements Prof. Saurabh Bagchi Committee members Collaborators: Akshat Verma (IBM Research, MakeMyTrip)

Jan S. Rellermeyer (IBM Research) Subrata Mitra (Purdue University) Fahad Arshad (Purdue University) Bowen Zhou (Purdue University) Kangli Hao (Purdue University, Samsung) Salmin Sultana (Purdue University, Intel Research) Slide 48 Thank You! Slide 49 Backup Slides Slide 50
