ECE 424 Embedded Systems Design
Lectures 8, 9 & 10: Embedded Processor Architecture (Chapter 5)
Ning Weng

Introduction
Concepts of the processor within an SoC.
Focus on 32-bit processors for large data sets, high performance, and complex applications.

Outline
Basic execution environment
Application binary interface
Instruction classes
Interrupts
Memory mapping and protection
Memory hierarchy

Basic Execution Environment
ARM and PowerPC: general-purpose registers.
Intel: segment registers (in addition to general-purpose registers).

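Basic Execution Environment
Instruction pointer: the program counter maintained by the CPU.
ARM vs. Intel: ARM allows direct modification of the program counter (it is register R15); Intel uses the Extended Instruction Pointer (EIP), which software changes only through control-transfer instructions such as JMP, CALL, and RET.
Flags: status flags are set by arithmetic instructions such as ADD, SUB, and MUL. Additional bit flags provide information for the programmer and the OS.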
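The status-flag behaviour can be modelled in software. The sketch below assumes x86-style semantics for a 32-bit ADD (CF = unsigned carry out, ZF = zero result, SF = copy of bit 31, OF = signed overflow); struct add_flags and add32_flags are hypothetical names, and real hardware sets these bits in EFLAGS as a side effect of the instruction rather than computing them in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of x86-style status flags after a 32-bit ADD. */
struct add_flags {
    bool cf;  /* carry flag: unsigned result did not fit in 32 bits */
    bool zf;  /* zero flag: result is zero */
    bool sf;  /* sign flag: copy of result bit 31 */
    bool of;  /* overflow flag: signed result did not fit in 32 bits */
};

struct add_flags add32_flags(uint32_t a, uint32_t b, uint32_t *result)
{
    uint32_t r = a + b;  /* unsigned arithmetic wraps modulo 2^32, like the ALU */
    struct add_flags f;
    f.cf = r < a;                              /* wrap-around means a carry out */
    f.zf = r == 0;
    f.sf = (r >> 31) & 1u;
    f.of = ((~(a ^ b) & (a ^ r)) >> 31) & 1u;  /* operands share a sign, result differs */
    *result = r;
    return f;
}
```

A conditional branch such as JC (jump if CF = 1) or JO (jump if OF = 1) tests exactly these results after the ADD.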

Basic Execution Environment
Stack pointer: program stacks are created in system memory and are used to store local variables, allocate storage, and pass function arguments.
Base pointer (also known as the frame pointer): allows a program to manage the function-calling hierarchy on the stack.

Memory Dump (figure)

Execution Environment (figure)
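The stack pointer and base (frame) pointer cooperate as in the small C model below of a descending stack. It assumes the common x86 calling pattern in which CALL pushes the return address and the callee then pushes the old base pointer and copies the stack pointer into it. The struct cpu_stack, its 32-bit word size, and all function names are hypothetical; on real hardware the stack lives in system memory and these steps are performed by instructions, not C code.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 64u

/* Hypothetical model of a descending stack of 32-bit words.
 * sp == STACK_WORDS means "empty"; bp == STACK_WORDS marks the outermost frame. */
struct cpu_stack {
    uint32_t mem[STACK_WORDS];
    size_t sp;  /* stack pointer: index of the most recently pushed word */
    size_t bp;  /* base (frame) pointer: index of the saved caller BP */
};

void stack_init(struct cpu_stack *s) { s->sp = STACK_WORDS; s->bp = STACK_WORDS; }

/* No overflow/underflow checks, to keep the model short. */
void stack_push(struct cpu_stack *s, uint32_t v) { s->mem[--s->sp] = v; }
uint32_t stack_pop(struct cpu_stack *s) { return s->mem[s->sp++]; }

/* CALL, then the callee prologue: push ebp; mov ebp, esp */
void enter_frame(struct cpu_stack *s, uint32_t return_addr)
{
    stack_push(s, return_addr);      /* pushed by CALL */
    stack_push(s, (uint32_t)s->bp);  /* save the caller's base pointer */
    s->bp = s->sp;                   /* the new frame starts here */
}

/* Epilogue: mov esp, ebp; pop ebp; then RET pops the return address. */
uint32_t leave_frame(struct cpu_stack *s)
{
    s->sp = s->bp;                   /* discard locals */
    s->bp = stack_pop(s);            /* restore the caller's base pointer */
    return stack_pop(s);             /* return address */
}

/* Follow the chain of saved base pointers, innermost frame first. */
size_t walk_frames(const struct cpu_stack *s, uint32_t *ret_addrs, size_t max)
{
    size_t n = 0;
    size_t bp = s->bp;
    while (bp < STACK_WORDS && n < max) {
        ret_addrs[n++] = s->mem[bp + 1];  /* return address sits just above saved BP */
        bp = s->mem[bp];                  /* link to the caller's frame */
    }
    return n;
}
```

Because every frame stores the caller's base pointer, walk_frames can rebuild the call hierarchy, which is essentially what a debugger does when it prints a backtrace from a memory dump of code compiled with frame pointers.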

Privilege Levels
Different levels are provided for security and resource access control, which helps the OS manage system resources.
The highest privilege (level 0) is for the OS; the lowest privilege (level 3) is for user programs.
Level 0 corresponds to Linux kernel mode and Windows ring 0.
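The privilege rule can be expressed as a small check. This is a simplified sketch of the IA-32 rule for data-segment access, assuming only the three numbers involved: CPL (the current privilege level of running code), RPL (the requested privilege level carried in a segment selector), and DPL (the descriptor privilege level of the target segment). Conforming code segments, gates, and paging-level checks are ignored, and data_access_allowed is a hypothetical name.

```c
#include <stdbool.h>

/* Privilege levels as on Intel: 0 is most privileged (OS), 3 least (user). */
enum { RING_KERNEL = 0, RING_USER = 3 };

/* Hypothetical helper: simplified IA-32 data-segment check. Access is allowed
 * when the effective privilege max(CPL, RPL) is numerically <= the segment's DPL. */
bool data_access_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    unsigned effective = cpl > rpl ? cpl : rpl;  /* the less privileged of the two */
    return effective <= dpl;
}
```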

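Processor Specifics (Intel)
CPUID instruction: reports the processor's identification and feature set, which helps in performance tuning. Example: the Linux command cat /proc/cpuinfo displays processor information that is gathered using the CPUID instruction.
RDTSC instruction: returns the time-stamp counter, a 64-bit count of clock ticks since reset; on recent processors it increments at a constant rate corresponding to the maximum (nominal) processor frequency, so it provides very high-resolution timing.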
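On x86 the CPUID and RDTSC instructions can be reached from C through compiler intrinsics. This sketch assumes GCC or Clang, whose <cpuid.h> provides __get_cpuid and whose <x86intrin.h> provides __rdtsc; on other architectures the functions simply report failure. The wrapper names cpu_vendor and read_tsc are hypothetical.

```c
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>      /* __get_cpuid (GCC/Clang) */
#include <x86intrin.h>  /* __rdtsc (GCC/Clang) */
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

/* Hypothetical wrapper: writes the 12-character vendor string from CPUID leaf 0
 * into vendor[13]. Returns 1 on success, 0 if CPUID is unavailable. */
int cpu_vendor(char vendor[13])
{
    vendor[0] = '\0';
#if HAVE_X86
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 0;
    /* Leaf 0 returns the vendor string in EBX, EDX, ECX order. */
    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical wrapper: reads the time-stamp counter; returns 0 where RDTSC does not exist. */
uint64_t read_tsc(void)
{
#if HAVE_X86
    return __rdtsc();  /* not serializing: nearby instructions may be reordered around it */
#else
    return 0;
#endif
}
```

On an Intel processor cpu_vendor fills the buffer with "GenuineIntel"; timing code typically calls read_tsc before and after a region of interest and subtracts the two values.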

Application Binary Interface
Describes the low-level interface between an application and the operating system (for example, calling conventions, register usage, and data layout). Knowing the ABI is useful for debugging.

Processor Instruction Classes
Immediate operands: MOV EAX, 00
Register operands: the source/destination can be almost any register (as in RISC designs).
Memory operands: MOV [EBX], EAX
Data transfer instructions: e.g., MOVNTI (a non-temporal store that minimizes cache pollution).
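Compilers expose MOVNTI through the SSE2 intrinsic _mm_stream_si32, declared in <emmintrin.h>. The sketch below, assuming an SSE2-capable x86 target and falling back to ordinary stores elsewhere, fills a buffer with non-temporal stores; fill_streaming is a hypothetical name. Non-temporal stores suit large buffers that will not be read again soon, since they avoid evicting useful data from the cache.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si32 (MOVNTI); also pulls in _mm_sfence */
#endif

/* Hypothetical helper: fill dst[0..n) with value, using non-temporal stores when available. */
void fill_streaming(int *dst, size_t n, int value)
{
#if defined(__SSE2__)
    for (size_t i = 0; i < n; i++)
        _mm_stream_si32(&dst[i], value);  /* MOVNTI: store that bypasses the caches */
    _mm_sfence();  /* order the weakly-ordered streaming stores before later stores */
#else
    for (size_t i = 0; i < n; i++)
        dst[i] = value;  /* ordinary stores on targets without SSE2 */
#endif
}
```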

Processor Instruction Classes
Arithmetic instructions (ALU): binary operations and decimal operations.
Logical operations: AND, OR, XOR, NOT.
Shift and rotate operations.
Bit/byte operations.
See the tables on pages 116-117.
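The shift, rotate, and bit operations map directly onto C operators. The helpers below are hypothetical names, assuming 32-bit operands; GCC and Clang typically compile the rotl32 idiom to a single rotate instruction (ROL on x86), and the bit helpers correspond in spirit to x86 BTS, BTR, and BT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: rotate left, bits shifted out at the top re-enter at the bottom. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;  /* masking avoids undefined behavior for shift counts >= 32 */
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Hypothetical bit helpers on a 32-bit word (set, clear, test). */
uint32_t bit_set(uint32_t x, unsigned bit)   { return x |  (UINT32_C(1) << (bit & 31u)); }
uint32_t bit_clear(uint32_t x, unsigned bit) { return x & ~(UINT32_C(1) << (bit & 31u)); }
bool     bit_test(uint32_t x, unsigned bit)  { return (x >> (bit & 31u)) & 1u; }
```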

Branch and Control Flow Instructions
Needed to control the flow of a program's execution.
Examples: JMP, LOOP, CALL, RET.

SIMD Instructions
Single Instruction, Multiple Data (e.g., Intel SSE).
Used to optimize speech recognition algorithms, video display and capture routines, 3D graphics, and encryption algorithms.
In theory, an instruction that operates on four data elements at once (e.g., four 32-bit values in a 128-bit SSE register) cuts execution time by up to 75% (a 4x speedup) compared with SISD instructions.

Exception/Interrupt Sources (figure)
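Returning to the SIMD slide, a minimal C sketch with SSE intrinsics from <xmmintrin.h> shows one instruction adding four floats at a time. It assumes an x86 compiler with SSE enabled (which defines __SSE__) and falls back to the scalar loop otherwise; add_arrays is a hypothetical name. Measured speedups are usually below the theoretical 4x because of memory bandwidth and loop overhead.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i]; with SSE, one ADDPS adds four floats at once. */
void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    for (; i < n; i++)  /* scalar (SISD) loop: the remainder, or everything without SSE */
        out[i] = a[i] + b[i];
}
```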

Vector Table Structure
PowerPC and ARM: the vector table is looked up using software.
Intel: the processor itself identifies the fault and locates its handler, without software.
Each Intel interrupt descriptor contains:
Segment selector: selects a segment from the descriptor table.
Segment offset: combined with that segment, produces the linear address of the handler.
Privilege level: set to zero.

Interrupt Descriptor Dereferencing (figure)
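The descriptor fields listed above can be decoded from the raw 8-byte entry. This sketch assumes the 32-bit protected-mode interrupt gate layout (offset bits 15..0 in bytes 0-1, selector in bytes 2-3, type/DPL/present in byte 5, offset bits 31..16 in bytes 6-7); struct idt_gate, decode_gate, selector_index, and handler_linear are hypothetical names. In a flat memory model the code-segment base is 0, so the offset alone is the handler's linear address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of a 32-bit protected-mode IDT gate descriptor. */
struct idt_gate {
    uint32_t offset;    /* handler offset within its code segment */
    uint16_t selector;  /* segment selector -> descriptor in GDT/LDT */
    uint8_t  type;      /* gate type, e.g. 0xE = 32-bit interrupt gate */
    uint8_t  dpl;       /* descriptor privilege level (0..3) */
    bool     present;
};

struct idt_gate decode_gate(uint64_t d)
{
    struct idt_gate g;
    g.offset   = (uint32_t)(d & 0xFFFFu) | ((uint32_t)((d >> 48) & 0xFFFFu) << 16);
    g.selector = (uint16_t)(d >> 16);
    g.type     = (uint8_t)((d >> 40) & 0xFu);
    g.dpl      = (uint8_t)((d >> 45) & 0x3u);
    g.present  = (d >> 47) & 1u;
    return g;
}

/* Selector layout: bits 15..3 index, bit 2 table indicator (0 = GDT, 1 = LDT), bits 1..0 RPL. */
unsigned selector_index(uint16_t sel) { return sel >> 3; }

/* Linear address of the handler: base of the selected segment + offset. */
uint32_t handler_linear(uint32_t segment_base, struct idt_gate g)
{
    return segment_base + g.offset;
}
```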

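Masking Interrupts
Exception frame: the format in which interrupt/exception state is saved.
Pros of masking interrupts: interrupts can be disabled while pointers shared with an interrupt handler are being updated, so the handler never sees a half-finished update.
Cons: interrupts can still be processed by other hardware threads, so masking alone does not protect shared data on multithreaded or multicore processors; masking also delays interrupt handling and degrades system performance.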
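The pro and con of masking can both be seen in a short C sketch. It assumes a hypothetical irq_save/irq_restore pair standing in for the real interrupt-enable control (for example CLI/STI on x86 or CPSID i/CPSIE i on ARM), simulated here with a plain flag so the code runs anywhere; list_push and the list itself are also hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the CPU's interrupt-enable flag; real code would use
 * the platform's masking instructions, usually wrapped by the OS. */
static volatile bool irq_enabled = true;

static bool irq_save(void)        { bool was = irq_enabled; irq_enabled = false; return was; }
static void irq_restore(bool was) { irq_enabled = was; }

struct node { int value; struct node *next; };
static struct node *head;  /* hypothetical list that an interrupt handler also modifies */

/* Push with interrupts masked, so an ISR on this CPU cannot run between reading
 * head and writing it back (which would lose the ISR's own update). */
void list_push(struct node *n)
{
    bool was = irq_save();
    n->next = head;
    head = n;
    irq_restore(was);
    /* Masking does not stop other cores or hardware threads: on multicore parts
     * a spinlock is also needed. */
}
```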

Stack Frames (figure)

Components of Interrupt Latency (figure)

Memory Mapping and Protection
Memory Protection Unit (MPU): defines the valid parts of the system memory map and their access control, and regulates which memory regions are cacheable.
Memory Management Unit (MMU): provides protection and fine-grained address translation between linear/virtual and physical addresses. It is a building block of a virtual memory system and allows the OS to overcommit memory to applications by moving data to and from disk using a scheme called paging.
Paging increases memory efficiency by moving infrequently used parts of a program's working memory from RAM to disk. The unit of transfer is a fixed-size block called a page.
Page fault: servicing one costs at least thousands of cycles, so paging is generally not used in embedded systems.

Address Translation (figure)
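The translation the MMU performs can be sketched as a single-level page-table lookup. This assumes 4 KB pages and a hypothetical 16-entry table (struct pte and translate are hypothetical names); real MMUs use multi-level tables and a TLB, but the split into a page number (translated) and a page offset (passed through) is the same.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                  /* assumed 4 KB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NUM_PAGES  16u                  /* hypothetical tiny virtual address space */

/* Hypothetical page-table entry. */
struct pte {
    uint32_t frame;    /* physical page (frame) number */
    bool     present;  /* false: page not in RAM, so access causes a page fault */
};

/* Single-level translation: the virtual page number indexes the table,
 * and the low PAGE_SHIFT bits pass through unchanged as the page offset. */
bool translate(const struct pte table[NUM_PAGES], uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn    = vaddr >> PAGE_SHIFT;
    uint32_t offset = vaddr & (PAGE_SIZE - 1u);
    if (vpn >= NUM_PAGES || !table[vpn].present)
        return false;  /* the MMU would raise a page fault here */
    *paddr = (table[vpn].frame << PAGE_SHIFT) | offset;
    return true;
}
```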

Memory Hierarchy
CPU stalls are most often caused by read operations; write buffering is used to avoid write stalls.
Logic-gate memory: used for buffering.
Static RAM (SRAM): access time of 1-2 CPU core clocks.
Dynamic RAM (DRAM): about 100 ns.
Mass storage: very slow, thousands of cycles or more.

Cache Hierarchy
Temporal locality: if a memory location was accessed, it is likely to be accessed again soon.
Spatial locality: a program is likely to access memory addresses close to the current access.
Hierarchy: CPU -> L1 -> L2. L2 is slower because it is larger and farther away.
Cache allocation: a memory read miss allocates a line in the cache, evicting existing data.
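The locality and allocation behaviour above can be explored with a small set-associative cache simulator. Its geometry matches the six-way set-associative 24K example in this lecture, with two assumptions not taken from the lecture: 64-byte lines (giving 64 sets) and least-recently-used replacement. struct cache and cache_access are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_SIZE   64u                                 /* assumed line size */
#define WAYS        6u
#define CACHE_BYTES (24u * 1024u)
#define SETS        (CACHE_BYTES / (LINE_SIZE * WAYS))  /* 24576 / 384 = 64 sets */

/* Hypothetical simulator state. */
struct way   { bool valid; uint32_t tag; uint32_t last_used; };
struct cache { struct way sets[SETS][WAYS]; uint32_t clock; };

/* Simulate one read. Returns true on a hit. On a miss the line is allocated,
 * filling an invalid way if there is one, otherwise evicting the LRU way. */
bool cache_access(struct cache *c, uint32_t addr)
{
    uint32_t set = (addr / LINE_SIZE) % SETS;  /* index bits select the set */
    uint32_t tag = addr / (LINE_SIZE * SETS);  /* remaining high bits form the tag */
    struct way *s = c->sets[set];
    c->clock++;

    for (unsigned w = 0; w < WAYS; w++) {
        if (s[w].valid && s[w].tag == tag) {
            s[w].last_used = c->clock;         /* temporal locality: refresh LRU age */
            return true;
        }
    }

    unsigned victim = 0;
    for (unsigned w = 0; w < WAYS; w++) {
        if (!s[w].valid) { victim = w; break; }
        if (s[w].last_used < s[victim].last_used) victim = w;
    }
    s[victim] = (struct way){ .valid = true, .tag = tag, .last_used = c->clock };
    return false;
}
```

A second access within the same 64-byte line hits (spatial locality); a seventh distinct line mapping to a full set evicts whichever of the six was used least recently.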

Six-Way Set Associative 24K (figure)

Cache Coherency
Ensures that content exchanged with peripherals is correct; where the hardware does not guarantee this, it is the responsibility of device drivers.
Snoop cycles: the processor snoops memory transactions so that its caches stay coherent, which simplifies software development and debugging.
Cache flushing: the software alternative, in which drivers explicitly flush cache contents.
Multicore processors use MESI to share cache data.

MESI (Modified, Exclusive, Shared, Invalid)
The letters stand for the four states a cache line can be in.
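The MESI states can be summarised as a per-line state machine. This sketch encodes the classic textbook transitions, assuming a snooping bus on which each cache observes the other cores' reads and writes; real processors (and extensions such as MOESI) differ in details, and the enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical encoding of the four MESI states. */
enum mesi { MESI_I, MESI_S, MESI_E, MESI_M };

enum mesi_event {
    LOCAL_READ,   /* this core reads the line */
    LOCAL_WRITE,  /* this core writes the line */
    SNOOP_READ,   /* another core's read is observed on the bus */
    SNOOP_WRITE   /* another core's write / read-for-ownership is observed */
};

/* Hypothetical transition function for one line in one cache. other_copies says
 * whether another cache holds the line; it only matters on a read miss. */
enum mesi mesi_next(enum mesi s, enum mesi_event e, bool other_copies)
{
    switch (e) {
    case LOCAL_READ:
        if (s == MESI_I)
            return other_copies ? MESI_S : MESI_E;
        return s;                             /* M, E, S: read hit, no change */
    case LOCAL_WRITE:
        /* From S or I, other copies are invalidated first; E -> M needs no bus traffic. */
        return MESI_M;
    case SNOOP_READ:
        /* A Modified line is written back first; every valid copy becomes Shared. */
        return s == MESI_I ? MESI_I : MESI_S;
    case SNOOP_WRITE:
        return MESI_I;                        /* another core takes ownership */
    }
    return s;
}
```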

MESI is widely used in multiprocessor designs such as Intel architectures and the ARM11 MPCore and ARM Cortex-A9 MPCore.

System Bus
Bus addresses: PCIe devices have direct memory access to system memory, so they must be given bus addresses that relate to physical system addresses. Device drivers convert virtual addresses to physical addresses. On Intel platforms, PCIe bus addresses map 1:1 to physical addresses; SoCs, however, require a translation.
System bus interface: the Front Side Bus (FSB) connects the CPU and the Memory Controller Hub.
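The bus-address translation described above can be sketched as a window lookup. The dma_window structure and phys_to_bus function are hypothetical: an identity window models the 1:1 Intel case, and an offset window models an SoC whose devices see memory at a different bus address. Real drivers obtain bus addresses through the operating system's DMA mapping interface rather than doing this arithmetic themselves.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one inbound DMA window. */
struct dma_window {
    uint64_t cpu_phys_base;  /* start of the window in CPU physical address space */
    uint64_t bus_base;       /* address a device must use for that same memory */
    uint64_t size;
};

/* Hypothetical helper: convert a CPU physical address to the bus address a device uses. */
bool phys_to_bus(const struct dma_window *w, uint64_t phys, uint64_t *bus)
{
    if (phys < w->cpu_phys_base || phys - w->cpu_phys_base >= w->size)
        return false;  /* memory not reachable by the device through this window */
    *bus = w->bus_base + (phys - w->cpu_phys_base);
    return true;
}
```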

Summary
Concepts of the processor within an SoC:
Basic execution environment
Application binary interface
Instruction classes
Interrupts
Memory mapping and protection
Memory hierarchy

