
ARM System-on-Chip

INTRODUCTION
ARM is a RISC processor used in small-size, high-performance applications. Its simple architecture gives low power consumption.

TIMELINE

1985: Acorn Computer Group manufactures the first commercial RISC microprocessor.
1990: Acorn and Apple participation leads to the founding of Advanced RISC Machines (A.R.M.).
1991: ARM6, the first embeddable RISC microprocessor.
1992-1994: Various companies (Sharp, Samsung) use ARM; in 1993 the ARM7, the first multimedia microprocessor, is introduced.
1995: Introduction of Thumb and the ARM8.
1996-2000: Alcatel, Hyundai, Philips and Sony use ARM; in 1999 ARM cooperates with Ericsson on the development of Bluetooth.
2000-2002: ARM's share of the 32-bit embedded RISC microprocessor market is 80%. The ARM Developer Suite is introduced.

THE ARM ARCHITECTURE

General Info
Aim: simple design.
- Load/store architecture
- 32-bit data bus
- 3 addressing modes
A simple architecture, a simple instruction set and good code density give ARM its small size and low power consumption.

Registers

ARM has 37 registers in total: 31 general-purpose 32-bit registers (including the PC) and 6 status registers. There are 7 modes of operation; each mode has a different set of visible registers and a different level of control over the CPSR.

ARM Programming Model
Registers usable in user mode: r0-r12, r13, r14, r15 (PC), CPSR.
Banked registers, usable in privileged (system) modes only:
  fiq mode:        r8_fiq-r14_fiq, SPSR_fiq
  svc mode:        r13_svc, r14_svc, SPSR_svc
  abort mode:      r13_abt, r14_abt, SPSR_abt
  irq mode:        r13_irq, r14_irq, SPSR_irq
  undefined mode:  r13_und, r14_und, SPSR_und
In each privileged mode the banked registers replace the corresponding user-mode registers; all other registers are shared.

CPSR format
  bits 31-28   N Z C V condition flags
  bits 27-8    unused
  bit 7        I (IRQ disable)
  bit 6        F (FIQ disable)
  bit 5        T (Thumb state)
  bits 4-0     mode
N: Negative, Z: Zero, C: Carry, V: Overflow. Q: Saturation (for the enhanced DSP instructions; held in bit 27 on cores that implement them).

Memory Organization
Address bus: 32 bits. Memory is byte-addressed; 1 word = 32 bits and 1 half-word = 16 bits.
[Figure: memory drawn as rows of 32-bit words (bit 31 to bit 0). Words occupy 4 bytes starting at multiples of 4 (e.g. word8, word16), half-words occupy 2 bytes starting at even addresses (e.g. half-word4, half-word12, half-word14), and single bytes can be at any byte address (e.g. byte0-byte3, byte6).]

Instruction Set
Three instruction types:
- data processing
- data transfer
- control flow

Supervisor mode
Operations outside user privileges are handled by the operating system. Using a supervisor call, a user program moves to system level, where system functions can be performed.

I/O System
ARM handles peripherals as memory-mapped devices with interrupt support. Interrupts:
- IRQ: normal interrupt
- FIQ: fast interrupt

Exceptions
Exception sources:

- interrupts
- supervisor calls
- traps
When an exception takes place:
1. The return address (derived from the PC) is copied to r14_exc.
2. The CPSR is saved in SPSR_exc.
3. The operating mode changes to the respective exception mode, and IRQs are disabled (FIQs too, for a fast interrupt or a reset).
4. The PC is loaded with the exception handler's vector address.
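The entry sequence above can be sketched in C. This is a simplified model, assuming an ARMv4T-style core with the classic vector table at address 0x00000000 and the standard CPSR mode encodings; `core_state`, `take_exception` and `cpsr_mode` are hypothetical names for illustration, and only the state touched on entry is modelled. The caller supplies the return address, because its offset from the PC differs between exception types.

```c
#include <stdint.h>

/* CPSR fields (assumed ARMv4T layout) */
#define CPSR_MODE_MASK 0x1Fu
#define CPSR_T (1u << 5)   /* Thumb state */
#define CPSR_F (1u << 6)   /* FIQ disable */
#define CPSR_I (1u << 7)   /* IRQ disable */

/* Mode encodings held in CPSR[4:0] */
enum arm_mode {
    MODE_USR = 0x10, MODE_FIQ = 0x11, MODE_IRQ = 0x12, MODE_SVC = 0x13,
    MODE_ABT = 0x17, MODE_UND = 0x1B, MODE_SYS = 0x1F
};

enum arm_exception {
    EXC_RESET, EXC_UNDEF, EXC_SWI, EXC_PABORT, EXC_DABORT, EXC_IRQ, EXC_FIQ
};

/* Hypothetical, simplified core state: only what exception entry touches. */
typedef struct {
    uint32_t pc;
    uint32_t cpsr;
    uint32_t r14_exc;   /* r14 of the mode being entered */
    uint32_t spsr_exc;  /* SPSR of the mode being entered */
} core_state;

/* Indexed by enum arm_exception (assumes vectors at 0x00000000). */
static const uint32_t vector_addr[] = { 0x00, 0x04, 0x08, 0x0C, 0x10, 0x18, 0x1C };
static const uint32_t target_mode[] = {
    MODE_SVC, MODE_UND, MODE_SVC, MODE_ABT, MODE_ABT, MODE_IRQ, MODE_FIQ
};

void take_exception(core_state *s, enum arm_exception e, uint32_t return_addr)
{
    s->r14_exc  = return_addr;   /* 1. save the return address */
    s->spsr_exc = s->cpsr;       /* 2. save the old CPSR */
    /* 3. switch mode and return to ARM state (T cleared) */
    s->cpsr = (s->cpsr & ~(CPSR_MODE_MASK | CPSR_T)) | target_mode[e];
    s->cpsr |= CPSR_I;           /* IRQs disabled on every exception */
    if (e == EXC_FIQ || e == EXC_RESET)
        s->cpsr |= CPSR_F;       /* FIQs also disabled */
    s->pc = vector_addr[e];      /* 4. jump to the vector */
}

uint32_t cpsr_mode(uint32_t cpsr) { return cpsr & CPSR_MODE_MASK; }
```

For example, an SWI taken in user mode ends in supervisor mode (mode bits 0x13) with the PC at 0x08 and the old CPSR held in SPSR_svc.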


THE ARM INSTRUCTION SET

Data Processing Instructions
Arithmetic operations:
    ADD   r0, r1, r2            ; r0 := r1 + r2, flags not updated
    ADDS  r0, r1, r2            ; r0 := r1 + r2, flags updated
Logical operations:
    AND   r0, r1, r2            ; r0 := r1 AND r2
Register movement:
    MOV   r0, r2                ; r0 := r2
Comparison:
    CMP   r1, r2                ; set flags on r1 - r2, result discarded

Operands
Immediate operands:
    ADD   r3, r3, #1            ; r3 := r3 + 1
Shifted register operands:
    ADD   r3, r2, r1, LSL #3    ; r3 := r2 + (r1 << 3), i.e. r2 + 8 × r1
Miscellaneous data processing instructions, e.g. multiplication:
    MUL   r4, r3, r2            ; r4 := r3 × r2 (low 32 bits)

Data transfer instructions
Load and store instructions:

    LDR   r0, [r1]              ; r0 := mem32[r1]
    STR   r0, [r1]              ; mem32[r1] := r0
Offset:
    LDR   r0, [r1, #4]          ; r0 := mem32[r1 + 4]
Post-indexed:
    LDR   r0, [r1], #16         ; r0 := mem32[r1], then r1 := r1 + 16
Auto-indexed (pre-indexed with write-back):
    LDR   r0, [r1, #16]!        ; r0 := mem32[r1 + 16], then r1 := r1 + 16
Multiple data transfers:
    LDMIA r1, {r0, r2, r5}      ; r0 := mem32[r1], r2 := mem32[r1 + 4], r5 := mem32[r1 + 8]

Examples
All three examples start from the same state:
PRE: r0 = 0x00000000, r1 = 0x00009000, mem32[0x00009000] = 0x01010101, mem32[0x00009004] = 0x02020202

    LDR r0, [r1, #4]!    POST: r0 = 0x02020202, r1 = 0x00009004
    LDR r0, [r1, #4]     POST: r0 = 0x02020202, r1 = 0x00009000
    LDR r0, [r1], #4     POST: r0 = 0x01010101, r1 = 0x00009004
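The three PRE/POST examples can be reproduced with a small C model of the addressing modes. It is a sketch only: memory is a hypothetical 16-word array starting at 0x9000, only word-aligned loads are handled, and `ldr_pre`/`ldr_post` are illustrative names, not an existing API.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical tiny memory: 16 words starting at MEM_BASE. */
#define MEM_BASE  0x00009000u
#define MEM_WORDS 16

uint32_t mem[MEM_WORDS];

static uint32_t mem32(uint32_t addr)
{
    return mem[(addr - MEM_BASE) / 4];   /* word-aligned addresses only */
}

/* LDR rd, [rn, #offset]{!}: pre-indexed, optional write-back */
uint32_t ldr_pre(uint32_t *rn, int32_t offset, bool writeback)
{
    uint32_t addr = *rn + (uint32_t)offset;  /* offset applied before the load */
    if (writeback)
        *rn = addr;                          /* the '!' form updates the base */
    return mem32(addr);
}

/* LDR rd, [rn], #offset: post-indexed, always writes back */
uint32_t ldr_post(uint32_t *rn, int32_t offset)
{
    uint32_t addr = *rn;                     /* load from the unmodified base... */
    *rn = addr + (uint32_t)offset;           /* ...then update the base */
    return mem32(addr);
}
```

Setting mem[0] = 0x01010101 and mem[1] = 0x02020202 and calling each function with r1 = 0x9000 and an offset of 4 gives the POST values listed above.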

Examples (load multiple)
PRE (both examples): r0 = 0x00080010, mem32[0x80010] = 0x01, mem32[0x80014] = 0x02, mem32[0x80018] = 0x03, mem32[0x8001c] = 0x04

    LDMIA r0!, {r1-r3}   POST: r0 = 0x0008001c, r1 = 0x00000001, r2 = 0x00000002, r3 = 0x00000003
    LDMIB r0!, {r1-r3}   POST: r0 = 0x0008001c, r1 = 0x00000002, r2 = 0x00000003, r3 = 0x00000004

IA (increment after) loads from r0, r0 + 4 and r0 + 8; IB (increment before) loads from r0 + 4, r0 + 8 and r0 + 12. In both cases the ! writes r0 + 12 back to r0.

Conditional execution
Instructions can be executed conditionally, without branches:
    CMP   r2, r3          ; subtract (r2 - r3) and set flags
    ADDGE r4, r5, r6      ; executed if r2 >= r3 (signed)
    SUBLT r4, r5, r6      ; executed otherwise (r2 < r3)
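How a condition such as GE or LT is decided can be sketched in C: CMP computes the flags as for a subtraction, and each condition code is a test on N, Z, C and V. The helper names (`cmp_flags`, `cond_passed`) are hypothetical; the flag rules follow the standard ARM condition codes.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { bool n, z, c, v; } flags;

/* Flags set by CMP a, b (a subtraction whose result is discarded). */
flags cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    flags f;
    f.n = (r >> 31) & 1;                       /* result negative */
    f.z = (r == 0);                            /* result zero */
    f.c = (a >= b);                            /* carry = NOT borrow */
    f.v = (((a ^ b) & (a ^ r)) >> 31) & 1;     /* signed overflow */
    return f;
}

typedef enum { EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE, AL } cond;

/* True if an instruction with condition c would execute. */
bool cond_passed(cond c, flags f)
{
    switch (c) {
    case EQ: return f.z;
    case NE: return !f.z;
    case CS: return f.c;               /* also written HS */
    case CC: return !f.c;              /* also written LO */
    case MI: return f.n;
    case PL: return !f.n;
    case VS: return f.v;
    case VC: return !f.v;
    case HI: return f.c && !f.z;       /* unsigned higher */
    case LS: return !f.c || f.z;       /* unsigned lower or same */
    case GE: return f.n == f.v;        /* signed >= */
    case LT: return f.n != f.v;        /* signed < */
    case GT: return !f.z && f.n == f.v;
    case LE: return f.z || f.n != f.v;
    default: return true;              /* AL: always */
    }
}
```

With r2 = -1 (0xFFFFFFFF) and r3 = 1, CMP gives N = 1 and V = 0, so GE fails and SUBLT is the instruction that executes; the unsigned condition HI would pass for the same bit patterns.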

Conditional execution mnemonics
[Table: conditional execution mnemonics.]

Control flow instructions
Branch:               B    label
Conditional branch:   BNE  label
Branch and link:      BL   label
A subroutine is called with BL, which saves the return address in r14; it returns by copying r14 back into the PC:
        BL   loop          ; call
        ...
    loop
        ...
        MOV  pc, r14       ; return to the caller

Example 1
        AREA ARMex, CODE, READONLY   ; Name this block of code ARMex
        ENTRY                        ; Mark first instruction to execute
    start
        MOV  r0, #10                 ; Set up parameters
        MOV  r1, #3
        ADD  r0, r0, r1              ; r0 = r0 + r1
    stop
        MOV  r0, #0x18               ; angel_SWIreason_ReportException
        LDR  r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SWI  0x123456                ; ARM semihosting SWI
        END                          ; Mark end of file

Example 2
        AREA subrout, CODE, READONLY ; Name this block of code
        ENTRY                        ; Mark first instruction to execute
    start
        MOV  r0, #10                 ; Set up parameters
        MOV  r1, #3
        BL   doadd                   ; Call subroutine
    stop
        MOV  r0, #0x18               ; angel_SWIreason_ReportException
        LDR  r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SWI  0x123456                ; ARM semihosting SWI
    doadd
        ADD  r0, r0, r1              ; Subroutine code
        MOV  pc, lr                  ; Return from subroutine
        END                          ; Mark end of file

ARM ORGANIZATION AND IMPLEMENTATION

3-Stage Pipeline (ARM7, 80 MHz)

[Figure: ARM7 3-stage pipeline (Fetch, Decode, Execute) datapath: address register and PC incrementer driving A[31:0]; register bank including the PC; instruction decode and control; multiplier; barrel shifter and ALU on the A and B buses; data-out and data-in registers on D[31:0].]
Throughput: 1 instruction per cycle.

5-Stage Pipeline
Program execution time:
$T_{prog} = \frac{N_{inst} \times CPI}{f_{clk}}$
Ways to reduce $T_{prog}$:

- Increase $f_{clk}$: requires logic simplification.
- Reduce CPI: reduce the number of multicycle instructions.

5-Stage Pipeline (ARM9, 150 MHz)
[Figure: ARM9 5-stage pipeline. Fetch: I-cache and next-PC logic (PC + 4, PC + 8). Decode: instruction decode, register read, immediate fields, LDM/STM address increment. Execute: barrel shifter, ALU, multiplier, pre-/post-index address calculation, with forwarding paths. Buffer/Data: D-cache access, byte replication, load data rotation/sign extension. Write-back: register write.]

ARM coprocessor interface

ARM supports up to 16 coprocessors, which can also be emulated in software. Each coprocessor has up to 16 general-purpose registers. Like the ARM core, the coprocessors follow a load/store architecture. Coprocessors usually handle on-chip functions such as cache and memory management.

ARCHITECTURAL SUPPORT FOR HIGH LEVEL LANGUAGES

Floating-point accelerator
For floating-point operations, ARM provides the FPE software emulator and the FPA10 hardware floating-point accelerator. The FPA10 includes:
- a coprocessor interface
- a load/store unit
- a register bank (8 registers, 80 bits each)
- an arithmetic unit (adder, multiplier, divider)
[Figure: FPA10 block diagram: coprocessor interface (instruction issuer, coprocessor handshake, pipeline control) on the data bus; load/store unit; register bank; arithmetic unit with add, mult and div.]

APCS
APCS (ARM Procedure Call Standard) is a set of rules governing how C procedures receive their inputs and return their outputs. It assigns specific uses to the general-purpose registers: r0-r3 for arguments, r4-r8 for register variables, r10 as the stack limit, and so on.
Procedure call and return:
        BL   loop          ; call
        ...
    loop
        ...
        MOV  pc, lr        ; return

C code:
    void f1(int a) { f2(a); }
Assembly code:
    f1
        STMFD r13!, {r14}  ; push the return address onto the full-descending stack
        BL    f2           ; a is already in r0, so it is passed to f2 unchanged
        LDMFD r13!, {pc}   ; pop the return address into the PC to return

THUMB PROGRAMMER'S MODEL

General information
Thumb objective: code density. Thumb has a 16-bit instruction set: a subset of the ARM instruction set is encoded in a 16-bit space.

With appropriate use, Thumb brings significant benefits in power efficiency and performance.

Going in and out of Thumb mode
From ARM state, use the BX instruction, e.g. BX r0. If bit 0 of r0 is 1, the T bit in the CPSR becomes 1 and the PC is set to the address given by the remaining bits of r0. The code at the target must be assembled as 16-bit Thumb instructions using the appropriate directive (CODE16). Executing BX from Thumb state with bit 0 of the target clear returns to ARM state.

The Thumb programmer's model
Thumb registers:
  Lo registers: r0-r7
  Hi registers: r8-r12, SP (r13), LR (r14), PC (r15)
  CPSR
The Hi registers have restricted access in Thumb state.

ARM vs. Thumb
Thumb:
- code size about 70% of that of the equivalent ARM code
- 40% more instructions
- 45% faster code with a 16-bit memory
- requires about 30% less external memory
ARM:
- 40% faster code when coupled with a 32-bit memory
If performance is critical, use ARM; if cost and power consumption are critical, use Thumb.
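The BX rule (bit 0 of the target selects the state, the remaining bits give the address) can be modelled in a few lines of C. This is a sketch with hypothetical names that tracks only the PC and the T bit; it is the behaviour that the ADR r0, start + 1 / BX r0 header in Example 3 relies on. When switching to ARM state, the target is also expected to be word-aligned.

```c
#include <stdint.h>
#include <stdbool.h>

#define CPSR_T (1u << 5)   /* Thumb state bit */

/* Hypothetical minimal core state: PC and CPSR only. */
typedef struct { uint32_t pc; uint32_t cpsr; } core;

/* BX rm: bit 0 of rm selects the state, the other bits give the address. */
void bx(core *s, uint32_t rm)
{
    if (rm & 1u)
        s->cpsr |= CPSR_T;    /* enter Thumb state */
    else
        s->cpsr &= ~CPSR_T;   /* enter ARM state */
    s->pc = rm & ~1u;         /* bit 0 is not part of the address */
}

bool in_thumb(const core *s) { return (s->cpsr & CPSR_T) != 0; }
```

For instance, if start were at 0x8020, ADR r0, start + 1 would give 0x8021, and BX r0 would enter Thumb state with the PC at 0x8020.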

ARM and Thumb interaction
A 32-bit ARM system can switch to Thumb state for specific routines in order to meet power and memory constraints. A 16-bit system can use a 32-bit on-chip memory for ARM-state routines and a 16-bit off-chip memory with Thumb code for the rest of the application.

Example 3
        AREA ThumbSub, CODE, READONLY ; Name this block of code
        ENTRY                         ; Mark first instruction to execute
        CODE32                        ; Subsequent instructions are ARM
    header
        ADR  r0, start + 1            ; Processor starts in ARM state,
        BX   r0                       ; so a small ARM code header is used
                                      ; to call the Thumb main program
        CODE16                        ; Subsequent instructions are Thumb
    start
        MOV  r0, #10                  ; Set up parameters
        MOV  r1, #3
        BL   doadd                    ; Call subroutine
    stop
        MOV  r0, #0x18                ; angel_SWIreason_ReportException
        LDR  r1, =0x20026             ; ADP_Stopped_ApplicationExit
        SWI  0xAB                     ; Thumb semihosting SWI
    doadd
        ADD  r0, r0, r1               ; Subroutine code
        MOV  pc, lr                   ; Return from subroutine
        END                           ; Mark end of file

Example 4
Implement the following pseudocode in ARM and Thumb assembly. Which version is more efficient in terms of execution time, and which in terms of code size?
    if r1 > r2 then
        r3 = r4 + r5
        r6 = r4 - r5
    else
        r3 = r4 - r5
        r6 = r4 + r5
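A C reference model of the Example 4 pseudocode can be used to check an ARM or Thumb solution on test values. It is a sketch that assumes the comparison r1 > r2 is signed (the GT condition after CMP r1, r2); for an unsigned comparison HI would be used instead. The function name is hypothetical, and registers are modelled as uint32_t so that additions and subtractions wrap like the ARM ALU.

```c
#include <stdint.h>

/* Reference model for Example 4 (hypothetical helper; signed compare assumed). */
void example4(uint32_t r1, uint32_t r2, uint32_t r4, uint32_t r5,
              uint32_t *r3, uint32_t *r6)
{
    if ((int32_t)r1 > (int32_t)r2) {   /* GT after CMP r1, r2 */
        *r3 = r4 + r5;
        *r6 = r4 - r5;
    } else {                           /* LE */
        *r3 = r4 - r5;
        *r6 = r4 + r5;
    }
}
```

One possible direction for the comparison: in ARM state both branches can be written with conditional execution (CMP followed by GT- and LE-conditioned ADD and SUB), whereas Thumb instructions other than branches cannot be conditionally executed, so a Thumb version needs a conditional branch.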

Example 5
Write an ARM assembly program that loads data from memory location 0x40, sets bits 3 to 5, clears bits 0 to 2 and leaves the remaining bits unchanged. Test it using 0xAD as input data.

ARCHITECTURAL SUPPORT FOR SYSTEM DEVELOPMENT

The ARM memory interface
[Figure: a basic ARM memory system. The ARM's 32-bit data bus D[31:0] is split into four byte lanes (D[31:24], D[23:16], D[15:8], D[7:0]), each connected to the D[7:0] pins of one 8-bit SRAM and one 8-bit ROM. The SRAMs are addressed by A[n+2:2] and the ROMs by A[m+2:2]; control logic driven by A[31:0] generates a separate write enable for each SRAM (RAMwe0-RAMwe3) and the shared output enables RAMoe and ROMoe.]
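The per-byte write enables in this memory system can be sketched in C. The lane mapping is an assumption (little-endian, with RAMwe<k> driving the SRAM on byte lane k, so lane 0 is D[7:0]); the function names are hypothetical, and half-word and word accesses are assumed to be naturally aligned.

```c
#include <stdint.h>

/* Returns a 4-bit mask: bit k set means RAMwe<k> is asserted.
   Assumed mapping: byte lane k = D[8k+7:8k] holds the byte whose
   address has A[1:0] == k (little-endian). size is 1, 2 or 4 bytes. */
unsigned ram_write_enables(uint32_t addr, unsigned size)
{
    unsigned lane = addr & 3u;               /* A[1:0] selects the starting lane */
    switch (size) {
    case 1:  return 1u << lane;              /* byte: one lane */
    case 2:  return 3u << (lane & 2u);       /* half-word: lanes 0-1 or 2-3 */
    case 4:  return 0xFu;                    /* word: all four lanes */
    default: return 0u;                      /* unsupported size */
    }
}

/* Address seen by each 8-bit SRAM: bits A[n+2:2] (assumes n < 31). */
uint32_t sram_address(uint32_t addr, unsigned n)
{
    return (addr >> 2) & ((1u << (n + 1)) - 1u);
}
```

A byte store to an address ending in binary 01 asserts only RAMwe1, while a word store asserts all four enables; reads simply enable all lanes with RAMoe or ROMoe.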

AMBA
AMBA (Advanced Microcontroller Bus Architecture) defines three buses:
- Advanced High-performance Bus (AHB)
- Advanced System Bus (ASB)
- Advanced Peripheral Bus (APB)
AMBA objectives:
- technology independence
- to encourage modular system design

[Figure: a typical AMBA-based system.]

AHB features:
- bus arbiter
- burst transactions
- split transactions
- 64- to 128-bit data bus
[Figure: AHB interconnect with three masters and three slaves. The arbiter selects which master drives the address and write-data buses; the decoder uses the address to select a slave and route its read data back to the master.]

AMBA Design Kit (ADK)
An environment that assists designers in developing AMBA-based components and SoC designs.
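The AHB decoder's job, selecting exactly one slave from the transfer address, can be sketched as a table lookup. The memory map, slave numbers and the `decode` function are hypothetical; in a real design the map is fixed in hardware.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical memory map: each slave owns one address range. */
typedef struct {
    uint32_t base;
    uint32_t size;   /* in bytes */
    int      slave;  /* slave selected for this range */
} region;

static const region memory_map[] = {
    { 0x00000000u, 0x00010000u, 1 },   /* e.g. ROM */
    { 0x20000000u, 0x00040000u, 2 },   /* e.g. on-chip RAM */
    { 0x40000000u, 0x10000000u, 3 },   /* e.g. AHB/APB bridge */
};

/* Returns the selected slave, or 0 if no slave claims the address
   (a real system would route such transfers to a default slave). */
int decode(uint32_t addr)
{
    for (size_t i = 0; i < sizeof memory_map / sizeof memory_map[0]; i++) {
        /* unsigned wrap-around makes addr < base fail this test too */
        if (addr - memory_map[i].base < memory_map[i].size)
            return memory_map[i].slave;
    }
    return 0;
}
```

Because the ranges do not overlap, at most one slave is ever selected, which is what allows a single read-data multiplexer to be controlled by the decoder output.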

Signal Processing Support
- Piccolo DSP coprocessor
- Various data memories for maximizing throughput
[Figure: the Piccolo coprocessor alongside an ARM7TDMI: decode and control, register bank, ALU, multiplier, I-cache, input and output buffers, and AMBA interfaces.]

MEMORY HIERARCHY
Further down the memory hierarchy, memories are larger in size and lower in speed.

Memory type          Size              Speed
Registers            32 bits           a few ns
On-chip cache        8-32 kbytes       10 ns
Off-chip cache       100-200 kbytes    10-30 ns
Main memory (RAM)    Mbytes            100 ns

On-chip memory
On-chip memory is necessary for performance. Some systems prefer on-chip RAM to an on-chip cache, since it is simpler, cheaper and less power-hungry.

Cache types
- Unified cache
- Separate instruction and data caches
Performance: with hit rate $h$ (miss rate $1-h$), the average access time is
$t_{av} = h\,t_{cache} + (1-h)\,t_{main}$
Miss types:
- Compulsory miss: the first time an address is accessed
- Capacity miss: when the cache is full
- Conflict miss: two addresses compete for the same place in the cache

Replacement policy implementation
- Least Recently Used (LRU)
- Least Frequently Used (LFU)
- Data prediction
Cache organizations: fully associative, direct-mapped, set-associative.

Direct-mapped cache
[Figure: direct-mapped cache. The address indexes a tag RAM and a data RAM in parallel; each entry holds a line of data together with the tag of the memory address it came from. A comparator checks the stored tag against the address tag to signal a hit, and a mux selects the data.]
Each memory location has a specific place in the cache. The tag and data can be accessed at the same time. The tag RAM is smaller than the data RAM and has a shorter access time, so the tag comparison can complete before the data RAM access finishes.
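A small C model of a direct-mapped cache shows the index/tag split and the compulsory and conflict misses defined above, and applies the average access time formula. The geometry (64 lines of 16 bytes), the names and the hit rate of 0.95 are hypothetical; the 10 ns and 100 ns access times are taken from the on-chip cache and main memory rows of the memory hierarchy table.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical geometry: 64 lines of 16 bytes (1 KB), 32-bit addresses. */
#define LINE_BYTES 16u
#define NUM_LINES  64u

typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
    unsigned hits, accesses;
} dm_cache;

/* Address split: | tag | index (6 bits) | offset (4 bits) | */
bool dm_access(dm_cache *c, uint32_t addr)
{
    uint32_t index = (addr / LINE_BYTES) % NUM_LINES;  /* the only possible line */
    uint32_t tag   = addr / (LINE_BYTES * NUM_LINES);  /* remaining high bits */
    bool hit = c->valid[index] && c->tag[index] == tag;
    c->accesses++;
    if (hit) {
        c->hits++;
    } else {             /* miss: refill the line, evicting whatever was there */
        c->valid[index] = true;
        c->tag[index] = tag;
    }
    return hit;
}

/* t_av = h * t_cache + (1 - h) * t_main */
double t_av(double h, double t_cache, double t_main)
{
    return h * t_cache + (1.0 - h) * t_main;
}
```

Addresses 0x1000 and 0x1400 map to the same index with different tags, so alternating between them misses every time in this cache. With $h = 0.95$, $t_{cache} = 10$ ns and $t_{main} = 100$ ns, t_av gives $0.95 \times 10 + 0.05 \times 100 = 14.5$ ns.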

2-way set-associative cache
[Figure: 2-way set-associative cache: two tag RAM/data RAM pairs, each with its own comparator and data mux, looked up in parallel; the hit signals select the output data.]
A set-associative cache is built from n direct-mapped banks (ways), giving an n-way set-associative cache. Two addresses that would compete for the same location in a direct-mapped cache can be stored in different ways and accessed independently.
Choosing which way to allocate on a miss:
- random allocation
- least recently used (LRU)
- round robin (cyclic)

Fully associative cache
[Figure: fully associative cache: the address tag is compared against all stored tags in a CAM, and a hit selects the data from the data RAM.]

Write strategies
- Write-through: all write operations are passed to main memory.
- Write-through with buffered write: write operations are passed to main memory through a write buffer.
- Copy-back (write-back): write operations update only the cache; main memory is updated when a modified line is evicted.

Cache feature summary
Organizational feature    Options
Cache-MMU relationship    Physical cache; virtual cache
Cache contents            Unified instruction and data cache; separate instruction and data caches
Associativity             Direct-mapped (RAM-RAM); set-associative (RAM-RAM); fully associative (CAM-RAM)
Replacement strategy      Cyclic; random; LRU
Write strategy            Write-through; write-through with write buffer; copy-back

Perfect cache performance
Cache form                        Relative performance
No cache                          1
Instruction-only cache            1.95
Instruction and data cache        2.5
Data-only cache                   1.13

MMU

Two memory management approaches:
- segmentation
- paging

Segmented memory management:
[Figure: segmented address translation. The segment selector indexes a segment descriptor table to obtain the segment base and limit; the logical address is added to the base to form the physical address and compared with the limit, raising an access fault if it is exceeded.]

Paging memory management:
[Figure: two-level paging. The 32-bit logical address is split into bits 31-22 (page directory index), bits 21-12 (page table index) and bits 11-0 (offset); the page directory entry locates a page table, whose entry locates the page frame holding the data.]

ARCHITECTURAL SUPPORT FOR OPERATING SYSTEMS
[Figure: an example ARM1136JF-based SoC. An 8-port AHB bus matrix connects the ARM instruction, data read, data write, peripheral and DMA AHB ports, the DMA controller DMAC (PL080, 8 external DMA requests) and the CLCD display controller (PL110) to the MPMC (PL176) SDRAM/DDR controller and the static memory controller SMC (PL093). Also shown: vectored interrupt controller VIC (PL192, 14 external interrupts), timers and RTC (PL031), watchdog, system control (external clock, external reset and battery fail), ETM with trace port analyser, and, behind AHB/APB bridges, GPIO (PL061, 32 GPIO lines), SSP (PL022), 2 UARTs (PL011) and a UICC-compliant smart card interface SCI (PL131).]

CP15
An on-chip coprocessor that controls the MMU, the caches and the protection unit. Control takes place through its registers, using instructions executed in supervisor mode.

Protection Unit
A simpler alternative to the MMU, requiring simpler software and hardware. Instead of translation tables, it uses 8 protection regions.

ARM DEVELOPER SUITE

ARMULATOR
The ARMulator is an emulator of various ARM processors. It allows project development in C, C++ or assembly. Together with a debugger, compilers and an assembler, it forms the ARM Developer Suite (ADS).
Possible project options:
- ARM and Thumb interworking
- mixing C, C++ and assembly
- code for ROM
- exception handlers
- MM

ARMULATOR TUTORIAL: CODEWARRIOR ENVIRONMENT
[Figures: CodeWarrior environment screenshots.]
