Hypervisor-Assisted Application Checkpointing for High Availability Min Lee

Hypervisor-Assisted Application Checkpointing for High Availability Min Lee

Hypervisor-Assisted Application Checkpointing for High Availability Min Lee Joint work with A. S. Krishnakumar, P. Krishnan, Navjot Singh, Shalini Yajnik Introduction Virtualization technology Gets adopted widely Proves its usefulness Most applications run well Natively run Some important applications dont run well Certain operations cannot run natively Instead they use hypercalls Our target: Application-checkpointing 2009 Avaya Inc. All rights reserved. 2 Xen Virtual Machine Monitor Applications Applications Modified Guest OS Modified Guest OS Applications Modified

Guest OS Virtual machines Virtual hardware (vCpu, vDisk, vNic, vMemory etc.) Xen Hypervisor Physical hardware (Cpu, Disk, Nic, Memory etc.) (Taken/adapted from Xen and co. slides) 2009 Avaya Inc. All rights reserved. 3 High Availability Approaches Categories Application-transparent No changes to application or guest Xen-specific: Remus, Kemari Application-assisted Application implements the checkpointing logic Flexible and light-weight We are targeting Application-assisted under virtualization Xen-specific Applicable to general hypervisors 2009 Avaya Inc. All rights reserved. 4 Hypervisor-Assisted Application Checkpointing Application checkpointing

Provides transactional properties to the traditional heap Make high available heap Processes survive failures Has performance issues in Xen Our technique improves application-checkpointing performance in Xen 2009 Avaya Inc. All rights reserved. 5 High Availability Magical mirror List_add() changes List_del() changes List_add() Crash Takeover List_add() 2009 Avaya Inc. All rights reserved. 6 Transaction APIs APIs:

List of dirty-pages Written pages Mprotect() system call Write-protect int declare(addr, size); void undeclare(Tid); void Tstart(Tid); void Tend(Tid, dirty_pages); Tstart(); Examples: List_add(); Tend(); SIGSEGV signal Tstart(); List_add(); List_del(); List_add(); List_del(); Tend(); 2009 Avaya Inc. All rights reserved. 7 PT Existing Approach Get dirty pages Declare() {} Tstart(); handler() { mprotect(unprotect); add_to_dirty_pages(); } Tend();

Undeclare() {} Process view (virtual pages) 1 2 3 4 5 6 7 8 9 10 11 12 List_add(); List_add(); 5 7 2009 Avaya Inc. All rights reserved. 8 PT Call-Flow Pure User-level For every dirty page User Mprotect() Mprotect() Signal

OS Page fault Hypervisor TLB flush TLB flush 2009 Avaya Inc. All rights reserved. 9 Approaches PT-based Pure user space Emulation-based Scan-based PT (Exisiting) Hypervisorassisted 2009 Avaya Inc. All rights reserved. 10 Approaches PT-based

Emulation-based Pure user space PT (Exisiting) Emulation Hypervisorassisted PTxen Emulxen Scan-based Scanxen Our approaches 2009 Avaya Inc. All rights reserved. 11 Our Approaches 2009 Avaya Inc. All rights reserved. 12 Emulation Under the condition Process view (virtual pages)

Most transactions are small Declare(); Tstart() {} handler() { emulate(); log_to_write_buffer(); } Tend(); Undeclare(); 1 2 3 4 5 6 7 8 9 10 11 12 List_add(); List_add(); (Addr1,100) (Addr2,200) 2009 Avaya Inc. All rights reserved. 13

Hypervisor-Assisted:User-to-hypervisor call Overhead through OS unnecessary Directly talk to Xen Move checkpointing to Xen level Add new interrupt vector 0x80: system call 0x82: hypercall from guest OS 0x84: hypercall from user (Newly added) Xen-based approaches without any changes to guest OS. 2009 Avaya Inc. All rights reserved. 14 Hypervisor-Assisted:User-to-hypervisor call User-to-Hypervisor Call 2009 Avaya Inc. All rights reserved. 15 PTxen Implement PT in Xen Declare(); Tstart() {} Tend(); Undeclare() {} Process view (virtual pages) 1

2 3 4 5 6 7 8 9 10 11 12 List_add(); List_add(); ----- Xen ----Process1, (1-12) page_fault() { mprotect(unprotect); add_to_dirty_pages(); } 5 7 2009 Avaya Inc. All rights reserved. 16 Emulxen Emulation in Xen Declare(); Tstart() {} Tend(); Undeclare();

Process view (virtual pages) 1 2 3 4 5 6 7 8 9 10 11 12 List_add(); List_add(); ----- Xen ----Process1, (1-12) page_fault() { emulate(); log_to_write_buffer(); } (Addr1,100) (Addr2,200) 2009 Avaya Inc. All rights reserved. 17 Scanxen Idea Scan page table rather than trapping writes Hardware marks dirty bit Declare();

Tstart() {} Tend(); Undeclare(); Process view (virtual pages) = Dirty-bit in page table 1 2 3 4 5 6 7 8 9 10 11 12 List_add(); List_add(); ----- Xen ----Process1, (1-12) scan_page_table() { collect_dirty_bit(); add_to_dirty_pages(); } 5 7 2009 Avaya Inc. All rights reserved.

18 Microbenchmark 10000 transactions 10MB heap size 2009 Avaya Inc. All rights reserved. 19 Microbenchmark Transactional heap size For simplicity, whole heap is protected Transaction Write per pages (wpp) # of writes per pages Page per transaction (ppt) # of unique pages written # of writes = wpp * ppt Scanxen Impacted by only heap size Not wpp, ppt, or transaction size 2009 Avaya Inc. All rights reserved. 20 PT vs PTxen 14 12 PT

(wpp = 4, 8, 16 overlapped) Time in sec 10 8 6 4 PTxen (wpp = 4, 8, 16 overlapped) 2 0 PTxen shows 10x speedup PT, PTxen get impacted by ppt 2009 Avaya Inc. All rights reserved. 21 Emulation vs emulxen 45 40 emul 35 Time in sec 30 emul wpp 4 emul wpp 8 emul wpp 16 emulxen wpp 4 emulxen wpp 8

emulxen wpp 16 25 20 emulxen 15 10 5 0 ppt (wpp=16) : 1 ppt (wpp=8) : 2 ppt (wpp=4) : 4 2 4 8 3 6 12 4 8 16 5 10 20 6 12 24 7 14

28 8 16 32 Emul-based gets impacted by transaction size Emulxen shows 4x speedup 2009 Avaya Inc. All rights reserved. 22 PT Call-Flow Pure User-level Xen-assisted For every dirty page User User Mprotect() declare() Mprotect() Signal OS OS Page fault Hypervisor

TLB flush Hypervisor TLB flush Page fault TLB flush 2009 Avaya Inc. All rights reserved. 23 Evaluation Source from the book Data Structures and Algorithm Analysis in C (Second Edition), by Mark Allen Weiss 2009 Avaya Inc. All rights reserved. 24 Data Structures OPS_PER_T=1 aa (AA-trees) writes avg min pages avg max

min max insert 21.9836 5 63 4.9481 1 7 delete 20.4053 2 63 6.0642 1 9 avl (AVL trees) insert

30.5609 6 39 5.1021 1 9 bin (Binomial queues) insert 27.9985 25 64 2.0735 1 10 dsl (Deterministic skip list) insert 10.4176 7

23 3.1421 1 5 hashquad (Quadratic probing hash) insert 11.3983 2 47023 1.0146 1 68 hashsepchain (Separate chaining hash) insert 4 4 4 1.9696

1 3 leftheap (Leftist heap) insert 23.5673 5 31 3.0665 1 6 heap (binary heaps) delete insert 34.0132 2.8693 0 2 59 14 9.2518 2.4009

0 1 15 5 delete insert delete insert delete 12.5523 4 1 3 2 2 4 1 3 2 15 4 1 3 2 2.7349 1.0029 1 1.8984 1

1 1 1 1 1 5 2 1 2 1 rb (Red black tree) insert 13.7011 10 28 4.6102 1 9 splay (Splay trees) insert delete 20.0851 7.7604

4 3 5262 15001 4.7745 3.0258 1 1 34 40 tree (Binary search tree) insert delete 720.7852 1.7139 4 0 1436 3 5.4576 1.7139 1 0

10 3 list (Linked list) queue (Queues) 2009 Avaya Inc. All rights reserved. 25 Evaluation Results 1 queue insert queue delete list insert list delete 0.35 Time in sec 0.3 0.25 0.2 0.15 0.1 Time in sec 0.4 0.05 0 PT mprotect

emul PTXen emulxen mprotxen 0.7 0.6 dsl insert Time in sec Time in sec 0.5 0.4 0.3 0.2 0.1 0 PT mprotect emul emulxen PTXen mprotxen 0.5 0.45 0.4 0.35 0.3 0.25 0.2 0.15 0.1

0.05 0 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 hashquad insert hashsepchain insert PT mprotect emul emulxen PTXen mprotxen bin insert PT mprotect emul PTXen emulxen mprotxen 2009 Avaya Inc. All rights reserved.

26 Evaluation Results 2 1 1.4 0.9 splay insert splay delete 0.8 0.5 0.4 0.3 0.2 PT mprotect emul 0.6 0.4 0 PTXen emulxen mprotxen 25 tree insert

tree delete 20 15 Time in sec Time in sec 0.8 0.2 0.1 0 aa delete 1 0.6 Time in sec Time in sec 0.7 aa insert 1.2 10 5 0 PT

mprotect emul emulxen PTXen mprotxen PT mprotect emul 2 1.8 1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 0 PTXen emulxen mprotxen leftheap insert leftheap delete PT mprotect emul emulxen PTXen

mprotxen 2009 Avaya Inc. All rights reserved. 27 Evaluation Results 3 0.6 1.2 heap insert 0.4 0.3 0.2 0.8 0.6 0.4 0.1 0.2 0 0 mprotect PT emul emulxen PTXen mprotxen rb insert

avl insert 1 Time in sec Time in sec 0.5 mprotect PT emul emulxen PTXen mprotxen Scanxen shows almost constant 2.5sec across all 2009 Avaya Inc. All rights reserved. 28 Evaluation Summary 16 14 12 speedup emulxen speedup mprotxen PTXen Speedup (1=100%) 10 8

6 4 2 0 t t t t t t t t t t t t er lete ser lete ser lete ser ser ser lete ser lete ser lete ser lete ser lete ser ser s e in e in e in e in in e in e in e in in in e in in e- e-d list- st-d ad- in-d sl- in- ay- y-d aa- a-d ee- e-d ap- p-d ap- p-d rb- vlu d b pl a a tr tre e ea e ea li qu ha la e eu h h

s sp th fth f c qu qu h s p le le ha hse s ha Emulxen has up to 4x speedup compared to emulation PTxen has up to 13x speedup compared to PT 2009 Avaya Inc. All rights reserved. 29 Transaction Aggregation OPT=1 A single operation (e.g. an insert or a delete) OPT=5 Multiple operations merged into one transaction # of writes increases linearly # of unique pages touched remains same in most cases It should benefit PT-based approaches Because of their heavy dependence on PPT Details in the paper 2009 Avaya Inc. All rights reserved. 30

Conclusion Family of application checkpointing techniques introduced Emulation-based techniques Useful for small transactions [fewer # of writes] Hypervisor-Assisted Application Checkpointing 4x~13x than userspace implementation 2009 Avaya Inc. All rights reserved. 31 Thank you! 2009 Avaya Inc. All rights reserved. 32 Extra Slides 2009 Avaya Inc. All rights reserved. 33 Emulation vs PT Note scale difference 3.5 35 ppt 4 emul 30 ppt 4 mprotect 3

ppt 8 emul 20 15 ppt 8 mprotect 2.5 Time in sec Time in sec 25 ppt 4 emulxen ppt 4 mprotxen ppt 8 emulxen ppt 8 mprotxen 2 1.5 10 1 5 0.5 0 0 Emul-based is good for small transaction

Roughly wpp=5 and wpp=1.3 is breakeven point 2009 Avaya Inc. All rights reserved. 34 Note scale difference Scanxen vs PT Scanxen heapsize 5MB 14 4MB 1 3MB PT 2MB 1MB Time in sec Time in sec 10 6 Scanxen heapsize 120KB 80KB 40KB

1.2 12 8 1.4 0.8 0.6 PTxen 4 0.4 2 0.2 0 0 For small buffer and large ppt, scanxen might be better Not the case in our experiments 2009 Avaya Inc. All rights reserved. 35 Scanxen vs emulation 45 40

emul Time in sec 35 30 25 Scanxen scanxen wpp 4 emul wpp 4 emulxen wpp 4 20 15 10 emulxen 5 0 Scanxen might be better than emulation For big transactions 2009 Avaya Inc. All rights reserved. 36 0.25 0.2 No-HA mprotx en

Time in sec 0.15 0.1 0.05 0 2009 Avaya Inc. All rights reserved. 37 PTxen 0.06 0.05 0.04 0.03 0.02 0.01 0 scanxen 3 2.5 2 1.5 1 0.5 0 2009 Avaya Inc. All rights reserved. 38 Operations per transaction OPT=5 , Merging transaction

No impact to emulation-based ones Some slowdown for scanxen Merging transactions Total # of pages written goes down effectively PT and PTxen becomes much better than emul/emulxen Still 13x improvement between PT and PTxen 2009 Avaya Inc. All rights reserved. 39 Evaluation OPT=1 , 10000 Transactions 1.2 1.2 1 1 0.8 0.8 Time in sec Time in sec OPT=5, 2000 Transactions rb insert avl insert

0.6 0.4 0.2 rb insert avl insert 0.6 0.4 0.2 0 mprotect emul emulxen mprotxen 0 mprotect emul emulxen mprotxen 2009 Avaya Inc. All rights reserved. 40 Amount of sent in KB Bandwidth : Amount 400000 350000 300000 250000

200000 150000 100000 50000 0 mprotect-based 6000 Amount of sent in KB 5000 4000 Note that tree-insert is 56311.34375 which is out of scale. emul-based 3000 2000 1000 0 tree-insert; 56311.34 Emul-based mostly less than 2MB No diff process for emul-based 2009 Avaya Inc. All rights reserved. 41 Time in sec

Bandwidth : Time 0.09 0.08 0.07 0.06 0.05 0.04 0.03 0.02 0.01 0 mprotxen mprotect scanxen 0.02 Time in sec 0.02 emulxen emul 0.01 0.01 0 -0.01 Emul-based mostly less than 5ms 2009 Avaya Inc. All rights reserved. 42

Bandwidth : Percentage 70 Percentage 60 mprotxen 50 40 30 20 10 0 12 10 Percentage 8 emulxen emul mprotect scanxen 6 4 2 0 -2 Relatively small fraction Except PTxen --- due to its minimum runtime 2009 Avaya Inc. All rights reserved. 43

Microbenchmark PT scanxen PTxen 2009 Avaya Inc. All rights reserved. 44 emul scanxen emulxen 2009 Avaya Inc. All rights reserved. 45 60 Time in sec 50 40 30 mprotect wpp 4 mprotxen wpp 4 scanxen wpp 4 emul wpp 4 emulxen wpp 4 PT emul

scanxen 20 10 emulxen PTxen 0 2009 Avaya Inc. All rights reserved. 46 Microbenchmark Tstart() of PT Tend() of PT writes Three separate mprotect() calls Tstart() of PTxen writes Tend() of PTxen Single PTxen() call Transactional heap Dirty pages in Transactional heap 2009 Avaya Inc. All rights reserved.

47 Main process Diff process dirty page Backup process diff Network 2009 Avaya Inc. All rights reserved. 48

Recently Viewed Presentations

  • (Breather) Principles of Secure Design by Matt Bishop

    (Breather) Principles of Secure Design by Matt Bishop

    Examples: ssh, login mechanism Exercises Which of these principles apply to operating systems, and which of them are followed by Linux/Unix? Which are followed by Windows? What would be the effect of checking EACH I/O file access for permission? Assume...
  • Types of Speeches

    Types of Speeches

    Expository. This speech serves to provide often interesting and useful information. In short, the purpose is to inform your audience. Some examples of expository speeches: A teacher telling students about the Canadian National Anthem. A student talking about her research....
  • AS - Design

    AS - Design

    AS - Design 1. Development of Technologies in Design ... The aim of Futurism was to object to traditional conventionalism and to wage war against the art of the 19th Century. The 20th Centaury was a new time when people...
  • Important Dates  Charlemagne Dies: 814  Feudalism spreads throughout

    Important Dates Charlemagne Dies: 814 Feudalism spreads throughout

    The typical western European manor in the 13th century consisted partly of the cottages, huts, and barns and gardens of its peasants, which were usually clustered together to form a small village. There might also be a church, a mill,...
  • OpenGL 2004-5-31 BIT Computer OpenGL  ? High performance

    OpenGL 2004-5-31 BIT Computer OpenGL ? High performance

    OpenGL 2004-5-31 BIT Computer OpenGL 이란? High performance API for 2D/3D Computer Graphics OpenGL 은 대화형 2D 와 3D 컴퓨터 그래픽스 응용프로그램을 위한 API.
  • DAYS OF ELIJAH - The Christian Mission

    DAYS OF ELIJAH - The Christian Mission

    DAYS OF ELIJAH From the Songbook by EVSU Life Group These are the days of Elijah, Declaring the word of the Lord; And these are the days of Your servant Moses, Righteousness being restored. And though these are the days...
  • Creating Rich Business Applications Using Microsoft Office SharePoint

    Creating Rich Business Applications Using Microsoft Office SharePoint

    Format Table. Conditional formatting (high light and databars) Publish. SharePoint App Overview. Workstation App overview. Data entry into SharePoint lists. Excel Analysis. ... // Causes the current UDF to impersonate the user that loaded it through the EWA.
  • Ons 2017 - Onap

    Ons 2017 - Onap

    Every Resource can be exposed as a Service. The ONAP model supports this today through the "Allotted Resource" construct. This concept of "Allotted Resource" does not seem to appear in the ETSI model. Perhaps this is due to ETSI seemingly...