The Deconstruction of Dyninst ProcControlAPI, ParsingAPI, and Binary

The Deconstruction of Dyninst ProcControlAPI, ParsingAPI, and Binary

The Deconstruction of Dyninst ProcControlAPI, ParsingAPI, and Binary Analysis Matthew LeGendre and Nathan Rosenblum Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 12-14, 2010 This Talk o Componentization o Overview o Principles o Challenges o Techniques o ProcControlAPI o ParsingAPI o Binary Analysis Components The Deconstruction of Dyninst 2 Dyninst and the Components

= New Component = Existing Component = Proposed AST Code Gen SymtabAPI SymtabAPI Binary Parsing Parsing API API Process Binary Patching DyninstAPI

Instruction Instruction API API SymEval SymEval DepGraph DepGraph API API Stackwalker API ProcControl ProcControl API API Binary

Guiding Principles o Clean Abstractions o Hide complexity o Necessary for portability o Portability o Same interface across multiple systems o System differences not visible o Sharing o Quickly build new tools The Deconstruction of Dyninst 4 Challenges o Interface and Abstractions o Finding the right level of detail o Building general components o Allowing inter-operability between components o Implementation

o Integrating back into DyninstAPI The Deconstruction of Dyninst 5 Challenges The Right Abstractions o Ask What will a user do with our tool? o e.g., Why does someone need dynamic linking info from SymtabAPI? o Get the right level of detail in interface o Dont want to provide thin wrapper libraries. o e.g., SymtabAPI vs. libbfd or libelf o High level interface allows platform independence The Deconstruction of Dyninst 6 Challenges General Components

o Did we ask the right questions? o DyninstAPI: o Operates third-party o Focuses on instrumentation o Used in HPC and security o Embedded systems, software testing, ? o Hard to anticipate users needs o E.g., StackwalkerAPI written in C++ The Deconstruction of Dyninst 7 Challenges Interoperable components o Easy to combine components we built: o StackwalkerAPI depends on SymtabAPI o ParsingAPI depends on InstructionAPI o Harder to combine components between groups: o Abstraction mismatch between MRNet and

LaunchMON o How strong should dependencies between components be? The Deconstruction of Dyninst 8 Challenges Put it back together o Re-integration can temporarily leave two implementations. Actual: Ideal: DyninstAPI IA-64/ InstructionA SparcSymtabAPI PI Instruction Parsing ProcControl API

Control Flow API o Extra maintenance costs The Deconstruction of Dyninst 9 Examples of What We Did Right o Function abstraction in SymtabAPI o Level of detail in InstructionAPI o Flexibility in StackwalkerAPI o Hiding complexity in ProcControlAPI The Deconstruction of Dyninst 10 SymtabAPI Function Abstraction Why do users look up code symbols? Because they care about functions! Function Symbol

Type Signatur e Local Variables Code Range Symbol Symbol The Deconstruction of Dyninst 11 InstructionAPI Level of Detail o Instructions are very platform dependent o Dont want to hide all platform details o Dont want to show all platform details o Targeting binary analysis

o Common operations platform independent o Read/Write sets o Bind/Eval o The Deconstruction of Dyninst 12 StackwalkerAPIs Flexibility o StackwalkerAPI easily customized o Plug-in interface o 1st vs. 3rd party Stackwalking o Custom symbol access layer o Custom frame stepping routines o Defaults provided for common case The Deconstruction of Dyninst 13 Achievements o More flexibility within our tools

o Static binary rewriter benefited from componentization o DyninstAPI more maintainable o Fine-grained testing o Easier internal learning curve o Opportunity to spring-clean code base. The Deconstruction of Dyninst 14 Achievements o Easier to adopt o More users o HPCToolkit, STAT, Libra, CBI, Cray APT, Open|SpeedShop, TAU, o Quickly develop new end user tools o STAT uses MRNet, StackwalkerAPI, SymtabAPI and LaunchMON The Deconstruction of Dyninst

15 ProcControlAPI o A component library for controlling and monitoring processes. Controll er Process Control/Query Process Target Process Target Process ProcControlAP I Library Receive Events The Deconstruction of Dyninst

Target Process 16 Example Operations o Control Processes o Read/Write memory and registers o Pause/Resume threads o Insert Breakpoints o Receive Events o Thread creation/destruction o System calls (fork/exec/) o Signals The Deconstruction of Dyninst 17 ProcControlAPI Use Cases o StackwalkerAPI o Read stack memory from target processes o Get list of loaded libraries

o DyninstAPI o Write instrumentation to processes o Handle process events (fork/exec) o Monitor threads o Debugger o Everything The Deconstruction of Dyninst 18 ProcControlAPI Goals o Platform Independent Interface o Implemented On: Linux, Windows, BlueGene, AIX, Solaris, VXWorks, FreeBSD o Simple Interface (also powerful) o High-level abstractions in interface. E.g., o Breakpoints o Inferior RPCs The Deconstruction of Dyninst

19 Complexities o Handle multiple inputs Process Control User API Calls OS Debug Interface Event Pipe The Deconstruction of Dyninst 20 Complexities o Handle multiple inputs Event Generator Threads

Process Control Event Generator Threads User API Calls OS Debug Interface Event Pipe o Add threads for each input! The Deconstruction of Dyninst 21 Complexities o Event handling may block Event Generator Threads

Process Control User API Calls Event Generator Threads OS Debug Interface Event Pipe Event Handler Thread o Add thread for event handling! The Deconstruction of Dyninst 22 Complexities o May be handling multiple events Event

Generator Threads Process Control User API Calls Event Generator Threads OS Debug Interface Event Pipe Event Handler Threads o Add multiple event handler threads! The Deconstruction of Dyninst 23

Complexities o Linux allows only one thread to call ptrace Event Event Generator Threads Process Control Generator Threads OS Debug Interface User API Calls Event Pipe Event Handler Threads Ptrace Thread

o Add dedicated ptrace thread! The Deconstruction of Dyninst 24 Complexities o Dont want handlers delivering callbacks Event Event Generator Threads User API Calls Process Control Generator Threads OS Debug Interface

Event Pipe Event Handler Threads Ptrace Thread o Send callbacks to user thread for delivery! The Deconstruction of Dyninst 25 Complexities o Linux 2.4 wants waitpid and ptrace on same thread Event Event Generator Threads User API Calls Process Control

Generator Threads OS Debug Interface Merged Ptrace & Generator Thread Event Pipe Event Handler Threads Ptrace Thread o Merge ptrace and OS generator threads! The Deconstruction of Dyninst 26 Complexities o Windows wants continue calls on

generator thread Event Event Generator Threads Process Control User API Calls Generator Threads OS Debug Interface Merged Ptrace & Generator Thread Event Pipe Event Handler Threads

Ptrace Thread o Send continue events to generator! The Deconstruction of Dyninst 27 More Complexities o Some Linux versions kernel panic if children are forked by a non-ptracer thread. o Need to handle multiple target processes. o Need to handle multi-threaded target processes. o BlueGene hasThehigh-latency Deconstruction of Dyninst debug 28 Interface o Hide complexity o User sees only one thread o High level abstractions

o Consistent across platforms o Two primary interfaces: o Query/Control target process. o Receive and handle events The Deconstruction of Dyninst 29 Process Class o Handle to target process o Create and attach to processes o Access process memory o Insert breakpoints o Track library loads/unloads o Track thread creation and destruction o Stop/Continue all threads o Detach from/terminate target process The Deconstruction of Dyninst 30 Thread Class

o Handle to thread in target process o Stop/Continue individual thread o Get/Set registers o Single step o Thread represents OS level handle for threads o May also provide thread library information The Deconstruction of Dyninst 31 Inferior RPCs o Insert and invoke code in target process (gdb) call getpid() $1 = 14218 (gdb) o User provides machine code to run. o ProcControlAPI allocates memory, saves registers, runs code, restores registers and memory. o Returns result of iRPC

The Deconstruction of Dyninst 32 Events o Can register callbacks for process o Fork o Breakpoint events: o o o o o Exec Thread Create Thread Destroy Library Load Library Unload o o

o o o RPC Completion Exit Crash Single Step Signal o Some events can have pre/post times o E.g., pre-exit and post-exit The Deconstruction of Dyninst 33 Callbacks o Events are delivered via callbacks o User registers callbacks for interesting events o Only certain ProcControlAPI functions will trigger callbacks

o Restrictions on callback functions o Can not call anything that would recursively trigger more callbacks. o This prevents races The Deconstruction of Dyninst 34 Notification o Want to deliver callback on user thread o User thread may be busy or blocked User Code on_fork_callback() { } ProcControlAPI Fork Event handleEvents() { read(...)

} o Notify user that event is pending via Callback or FD The Deconstruction of Dyninst 35 Implementation o Internally threaded o One thread to receive events (Generator) o One thread to handle events (Handler) o One thread to perform ptrace calls on Linux o User thread to handle events, receive callbacks, or trigger operations. o Event handling is a state machine to avoid recursive events. The Deconstruction of Dyninst 36 Implementation User Thread

ProcControlAPI User API Calls Event Generator Threads OS Debug Interface Ptrace Thread Event Handler Thread o Use state machine to keep one handler thread. o No support for problematic OSs (Linux 2.4) o May expand thread model for parallelism The Deconstruction of Dyninst 37

Current Status o Currently on Linux/x86, Linux/x86_64 o Next platforms are Linux/ppc, BlueGene, and FreeBSD. o Windows, AIX, and Solaris support to follow. o Beta release available upon request. The Deconstruction of Dyninst 38 Intermission Binary parsing ain m ck _lo foo

o _fo dynamic instrumentation, debugger, static binary analysis tools, malware analysis, binary editor/rewriter, The Deconstruction of Dyninst 40 Familiar territory Richard L. Sites, Anton Chernoff, Matthew B. Kirk, Maurice P. Marks, and Scott G. Robinson. Binary translation. 1993. Amitabh Srivastava and Alan Eustace. ATOM: a system for building customized program analysis tools. 1994. Barton Miller, Jeffrey Hollingsworth, and Mark Callaghan. Dynamic Program Instrumentation for Scalable Performance Tools. 1994. Cristina Cifuentes and K. John Gough. Decompilation of binary programs. 1995Extracting safe and precise control flow from binaries. HenrikTheiling. 2000. Benjamin Schwarz, Saumya Debray, and Gregory R. Andrews. Disassembly of

executable revisited. 2002of virtual method invocation for binary J. Troger and C. code Cifuentes. Analysis translation. Christopher 2002. Kruegel, William Robertson, Fredrik Valeur, and Giovanni Vigna. Static disassembly of obfuscated binaries. 2004. Laune C. Harris and Barton P. Miller. Practical analysis of stripped binary code. 2005.Chinchani and Eric van den Berg. A fast static analysis Ramkumar approach to detect exploit code inside network flows. 2005. Nathan Rosenblum, Xiaojin Zhu, Barton P. Miller, and Karen Hunt. Learning to analyze binary computer code. 2008. 41 Weve been down this road the DYNINST binary parser

recursive traversal parsing gap parsing heuristicsprobabilistic code models non-contiguous functions code sharing non-returning functions preamble scanning handles stripped binaries The Deconstruction of Dyninst learn to recognize function entry points very accurate gap parsing

42 What makes a parsing component? 2 1 abstracti on 10 101 0 0 1 011 10101 0 1 101 10100 0 0 1 1 0 11 011 01 1 0 01 01

010 1 010 11001 1 0 1 0 1 0 1 10 0 1 1 010 00100 1 010 0 111 Binary code source

1 10 01 0 1 10 functions Parsing API InstructionAPI 010 1 simple, intuitive representati on edges blocks SymtabAPI platform independence

3 supported by previous Dyninst components The Deconstruction of Dyninst 43 Flexible code sources Parser code source requirements: code location access to code bytes a few (optional) facts unsigned char * buf pointer width function hints & names 41 56 49 89 fe 41 55 PLT

code main external linkage data foo bar ba z The Deconstruction of Dyninst a binary code object 44 Code source contract bool isValidAddress

bool isExecutableAddress void * getPtrToInstruction void * getPtrToData unsigne d getAddressWidth bool isCode bool isData

Address codeOffset Address codeLength Nine mandatory methods SymtabAPI implementation in 232 lines (including optional hints, function names) Any binary code object that can be memory mapped can be parsed The Deconstruction of Dyninst 45 Simple control flow interface

Functions contain Blocks joined by Edges in edges start addr. start addr. src extents type tar g end addr.

out edges The Deconstruction of Dyninst 46 Views of control flow Walking a control flow graph while(!work.empty()) { Block *b = work.pop(); /* do something with b */ here g n i t r sta edgeiter eit = b->out().begin(); while(eit != b->out().end()) {

work.push(*eit++); } } What if we only want intraprocedural edges? The Deconstruction of Dyninst 47 Edge predicates Walking a control flow graph while(!work.empty()) { Block *b = work.pop(); /* do something with b */ IntraProc pred; edgeiter eit = b->out().begin(&pred); while(eit != b->out().end()) { work.push(*eit++); } Edge Predicates

Tell iterator whether Edge argument should be returned Composable (and, or) Examples: } The Deconstruction of Dyninst Intraprocedural Single function context Direct branches only 48 Extensible CFG objects ParseAPI Function Simple, only need to represent control flow graph Dyninst image_func

Complex, handles instrumentation, liveness, relocation, etc. Functio n Special callback points during parsing Factory interface for CFG objects custo m factor y parse parse parse unresBranchNotify(insn) image_func [derived class does stuff] parse parse parse

The Deconstruction of Dyninst mkfunc() (Function*) image_func pars er 49 * box to be released soon Whats in the box? Control Flow Graph SymtabAPI-based Binary Parser Representation Code Source recursive descent parsing speculative gap parsing cross platform:

x86, x86-64, PPC, IA64, SPARC graph interface extensible objects for easy tool integration exports Dyninst InstructionAPI interface The Deconstruction of Dyninst cross-platform supports ELF, PE, XCOFF formats 50 Status

conception code refactoring interface design Dyninst re-integration (major test case) other major test case: compiler provenance (come tomorrow!) The Deconstruction of Dyninst 51 Intermission Binary Analysis SymEval Components Instruction Semantics

DepGraphA PI Alias Analysis AST Simplificatio n The Deconstruction of Dyninst 53 Instruction Semantics = ea x add 4,%eax + ea

x 4 Instruction Semantics InstructionAPI ROSE o Instructions into semantic ASTs o Built with ROSE The Deconstruction of Dyninst 54 Alias Analysis push %ebp mov %esp,%ebp sub $12,%esp lea 4(%ebp),%eax mov $12,(%eax) mov 16(%esp),(%ebx) mov 4(%ebp),0x8084100 leave

ret push %ebp mov %esp,%ebp sub $12,%esp Local lea 4(ebp), %eax 1 Local mov $12,(%eax) 1 Param 1 Unknown mov -8(%esp), %ebp Local Global mov 4(ebp),0x8084100 1 leave ret o Identify stack and global variables

o Plug-in more sophisticated analysis The Deconstruction of Dyninst 55 AST Simplification = eb x mov shl add add mov $10,%ebx $2,%ebx $2,%ebx %ebx,%eax $5,(%eax) eb x

eb x = 10 = << eb x = 2 5 + ea x + eb

x * 42 2 The Deconstruction of Dyninst 56 DepGraphAPI o Build Control Dependence Graphs (CDG) o Why am I executed o Build Data Dependence Graphs (DDG) o Where do my inputs come from? o Where do my outputs go? o Already beta released in Dyninst 6.0 The Deconstruction of Dyninst 57

DepGraphAPI - Slicing sub $4,%esp inc %edx pop %ebx mov 0x1000,%ecx lea 10(%ebx),%eax add $2,%ecx call %eax sub $4,%esp pop %ebx %esp lea 10(%ebx),%eax call %eax o Derive slices from CDG and DDG o New features to build slices on-the-fly. o Cheaper for small numbers of slices The Deconstruction of Dyninst 58

Example: Jump Tables ... cmp $0xa,%eax ja 0x804c900 inc %ecx lea 0x1920(%rip),%rdx mov (%rdx,%rax,4),%rax add %rdx,%rax jmpq *%rax Where can the jump go? What is %raxs final value? The Deconstruction of Dyninst 59 Example: Jump Tables ... cmp $0xa,%eax ja 0x804c900 inc %ecx

lea 0x1920(%rip),%rdx mov (%rdx,%rax,4),%rax add %rdx,%rax jmpq *%rax cmp $0xa,%eax ja 0x804c900 lea 0x1920(%rip),%rdx mov (%rdx,%rax,4),%rax add %rdx,%rax jmpq *%rax 1. Slice from jump for relevant instructions The Deconstruction of Dyninst 60 Example: Jump Tables cmp $0xa,%eax ja 0x804c900 lea 1920(%rip),%rdx mov (%rdx,%rax,4),%rax

add %rdx,%rax jmpq *%rax cmp $0xa,%eax Code ja 0x804c900 lea 0x1920(%rip),%rdx [email protected] mov (%rdx,%rax,4),%rax add %rdx,%rax jmpq *%rax 2. Run alias analysis to simplify The Deconstruction of Dyninst 61 Example: Jump Tables = rax cmp $0xa,%eax Code ja 0x804c900 lea 0x1920(%rip),%rdx

[email protected] mov (%rdx,%rax,4),%rax add %rdx,%rax jmpq *%rax rdx = rax + x rax + rax rdx = rdx [email protected]

3. Use Instruction semantics to convert to ASTs The Deconstruction of Dyninst 4 62 Example: Jump Tables = rax x rax + rip + rdx =

rax = 4 + [email protected] = rdx * [email protected] rax rdx + rax [email protected]

4. Use AST Simplification to get single AST The Deconstruction of Dyninst 63 x 4 Example: Jump Tables = rip + * [email protected] Reference to RO data + [email protected]

x rax 4 Single Unknown Input The Deconstruction of Dyninst 64 Status of Analysis Components o Work in progress o Build public interfaces o Finish implementations o Currently being used for o Concolic execution o Safe code relocation o Jump tables o Fingerprinting o DepGraphAPI in beta The Deconstruction of Dyninst

65 Conclusions o ProcControlAPI & ParsingAPI o New components o Available for friendly beta o Binary Analysis o Set of components for binary analysis o Still under development. The Deconstruction of Dyninst 66 Questions? The Deconstruction of Dyninst 67

Recently Viewed Presentations

  • WOW Satire

    WOW Satire

    Origin 1885-90; orig. Scots, variant of pernickety Related forms persnicketiness, noun synonym Careful Choosy Picky Nice Finicky antonyms Open Unconcerned Unfussy Dainty Eclectic Exacting Fastidious Finicky Nice Overparticuar Picky Prissy Particular Selective Related words I tried feeding my cat some...
  • 5 min Lesson Plan - THE LITERACY SHED

    5 min Lesson Plan - THE LITERACY SHED

    The 5 minute Literacy Plan: 9th Dec 2013 part 2 The BIG picture? Stickability! Differentiation SC and A f L Teacher Led or Student Led? Watch film stills. Chn suggest dialoge that could be spoken by either character (write on...
  • New President SEAS Advisory Council  BS, Manufacturing Engineering,

    New President SEAS Advisory Council BS, Manufacturing Engineering,

    American Red Cross Nurse Badge project. In cooperation with the American Red Cross (ARC) Chief Nurse, a group of CSE majors created a badge database and web portal application (Doug Troy - adviser) From Dr. Sharon Stanley, Chief Nurse, ARC:...
  • Proiectarea si Dezvoltarea Serviciilor Distribuite Mugurel Ionu Andreica

    Proiectarea si Dezvoltarea Serviciilor Distribuite Mugurel Ionu Andreica

    Proiectarea si Dezvoltarea Serviciilor Distribuite Mugurel IonuČ› Andreica 2012 Continutul cursului Concepte de baza privind proiectarea si implementarea de servicii (client-server) Arhitectura software Paralelism (thread-uri multiple, core-uri multiple pe aceeasi masina) Distributia serviciilor pe mai multe masini (replicarea functionalitatii, echilibrarea...
  • NAAQS Review and Other Implementation (Regional Perspective)

    NAAQS Review and Other Implementation (Regional Perspective)

    NAAQS Review and Other Implementation (Regional Perspective) Metro4-SESARM MeetingAtlanta, Georgia. May 21, 2019. Air Planning and Implementation Branch Update. Lynorae Benjamin
  • Chapter 1

    Chapter 1

    Pavlov and Skinner. Humanistic- behavior reflects innate 'actualization', developed by Maslow and Rogers. Focus on conscious forces and self perception. Perspectives (continued) ... Chomsky-studied language. Cybernetics-science of information processing.
  • COS311: Migrating Applications to SQL Azure

    COS311: Migrating Applications to SQL Azure

    Migrating to SQL Azure form MySql. Moving Schema and Data into SQL Azure. Transferring Large Data into SQL Azure. ... Full-text Search, Remote Data Access and Linked Servers, Distributed Transactions, Change Tracking, Service Broker etc. ... ADO.Net, OleDB. Fully programmable...
  • Jim and Sandi Curtin

    Jim and Sandi Curtin

    Curtin Family Training winners at 229 Dawsons Rd for 56 years. Reg Curtin. 1547 Starters. 334 Wins. 187 2nds. 144 3rds. Jim Curtin. 2130 Starters. 289 Wins. 255 2nds. 259 3rds. 3677 times horses have left this property to race....