Process for systematic conversion of 'C' to assembly ...

Process for systematic conversion of a design in C pseudo-code to SHARC 21061 assembly code
M. Smith, Electrical and Computer Engineering, University of Calgary, Canada
smithmr @ ucalgary.ca

Lots To Be Tackled Today (2 days)
  Setting up special processor constants and registers to gain speed during assembly language constructs
  Review of use of index and modify registers
  Prologue, Body and Epilogue of a C program translated to assembly code (NO DIFFERENCE whether by hand or by compiler)
  Example conversion of a C program into ADSP21061 assembly using a standard procedure
    Take into account register architecture
    Take into account LOAD/STORE architecture
    Take into account standard assembly code problems
    Handle Program Flow Constructs
    Then do conversion of code on a line-by-line basis
  Learning why to avoid calling C from assembly

01/16/20 ENCM515 -- Process for pseudo-C design to 21k assembly Copyright [email protected] 2 / 56 -- 2 DAYS

ADSP-2106x Core Architecture
  [Block diagram: cache memory (32 x 48); DAG 1 (8 x 4 x 32) and DAG 2 (8 x 4 x 24); program sequencer; timer; flags; JTAG test & emulation; PMA bus (24), DMA bus (32), PMD bus (48), DMD bus (40); bus connect; register file (16 x 40); floating- and fixed-point multiplier with fixed-point accumulator; 32-bit barrel shifter; floating-point and fixed-point ALU]

Typical 68K operations to Memory using Indirect Addressing
  Manipulate a value using address register as a pointer
    MOVE.L (0, A0), D0        variable_D0 = *pt_A0
  Read Adjacent Elements in an Array by incrementing the pointer
    MOVE.L (0, A0), D0        variable_D0 = *pt_A0
    ADD.L #4, A0              pt_A0++ (increment by 1)

    MOVE.L (0, A0), D1        variable_D1 = *pt_A0

Typical 21k operations to Memory using Indirect Addressing
  Manipulate a value using address register as a pointer
    R0 = dm(0, I4)            c.f. MOVE.L (0, A0), D0
  Read Adjacent Elements in an Array by incrementing the pointer
  Note increment by 1 may change I4 by 2 or 4 -- WAIL
    R0 = dm(0, I4)            c.f. MOVE.L (0, A0), D0
    I4 = I4 + 1;              c.f. ADD.L #4, A0        **** ILLEGAL SHARC OPERATION ****
    R1 = dm(0, I4)            c.f. MOVE.L (0, A0), D1

Register and Register Ops in DAG1
  [Figure: DAG1 registers and register operations, with callouts SPECIAL CIRCBUFFER STUFF and SPECIAL FFT BIT]

Typical 21k operations to Memory using Indirect Addressing (Manual)
  Read Adjacent Elements in an Array by incrementing the pointer manually
  Note increment by 1 may change I4 by 2, 4 bytes (WAIL)
    R0 = dm(0, I4)            c.f. MOVE.L (0, A0), D0
    M6 = 1;
    Modify(I4, M6)            c.f. ADD.L #4, A0        WHY M6 and not M4?
    R1 = dm(0, I4)            c.f. MOVE.L (0, A0), D1
  NOTE -- 68k D0, D1 are equivalent to 21k R0, R1, but 68k A0 is similar to 21k I4

Typical 21k operations to Memory using Indirect Addressing
  Read Adjacent Elements in an Array by incrementing the pointer -- TWO APPROACHES
  Note increment by 1 may change I4 by 2, 4 bytes
    M6 = 1;
    R0 = dm(M6, I4)           c.f. MOVE.L (4, A0), D0
        WATCH OUT -- OFFSET NOT INCREMENT
    R0 = dm(I4, M6)           c.f. MOVE.L (0, A0), D0
                                   ADD.L #4, A0
        WATCH OUT -- INCREMENT NOT OFFSET, BUT WITH THE POTENTIAL OF BEING A FASTER INSTRUCTION

PSP -- Code review to avoid DEFECT
Post-incrementing and OFFSET
    M6 = 1;
    R0 = dm(I4, M6);          POST-INCREMENT
        means R0 = dm(I4) and then I4 = I4 + M6
  BUT
    R0 = dm(M6, I4);          OFFSET INDEX ONLY
        means R0 = dm(M6 + I4) and still keeps I4 = I4
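The 68k column above pairs pt_A0++ (increment by 1) with ADD.L #4, A0, because C pointer arithmetic counts elements while a byte-addressed machine counts bytes. The following is a minimal host-side sketch of that relationship; it assumes a byte-addressed host with 8-bit bytes, and the function names are made up for illustration. It says nothing about how the 21k itself numbers its memory locations.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte distance covered by "pointer + 1" for 32-bit elements
   (the case the 68k code handles with ADD.L #4, A0). */
size_t step_in_bytes_int32(void) {
    int32_t array[2] = {0, 0};
    int32_t *pt = &array[0];
    return (size_t)((char *)(pt + 1) - (char *)pt);
}

/* Same question for 16-bit elements: the step shrinks to 2 bytes. */
size_t step_in_bytes_int16(void) {
    int16_t array[2] = {0, 0};
    int16_t *pt = &array[0];
    return (size_t)((char *)(pt + 1) - (char *)pt);
}
```

On such a host the two functions return 4 and 2, which is why a hand translation has to know the element size before choosing the increment.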

Worked Example
    B4 = 4000;
    L4 = 0;                   ***** NORMAL APPROACH -- set to 0
    I4 = 4002;
    M6 = 1;                   PRESET TO 1 in C startup

    R0 = dm(M6, I4);          OFFSET INDEX ONLY
    R1 = dm(M6, I4);
        means R0 = dm(4002 + 1) and R1 = dm(4002 + 1), with I4 = 4002 still unchanged at the end of the code

    R0 = dm(I4, M6);          POST-INCREMENT
    R1 = dm(I4, M6);
        means R0 = dm(4002) and R1 = dm(4003), with I4 = 4004 at the end of the code

Effect of length register and Post-incrementing and OFFSET -- Lab. 2
    B4 = 4000;
    L4 = 3;                   *********** Normally set to zero, NOT HERE
    I4 = 4002;                ** Allows 1 21k instruction = 63 68k instructions
    M6 = 1;                   ** Key DSP architecture characteristic

    R0 = dm(M6, I4);          OFFSET INDEX ONLY
    R1 = dm(M6, I4);          NO CIRCULAR BUFFER!
        means R0 = dm(4002 + 1) and R1 = dm(4002 + 1), with I4 = 4002 still

    R0 = dm(I4, M6);          POST-INCREMENT
    R1 = dm(I4, M6);
        means R0 = dm(4002) BUT R1 = dm(4000) *(4003 - 3)* with I4 = 4001
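The two worked examples can be checked with a small C model of one index register set. This is only a sketch of the behaviour described on the slides (offset access leaves I4 alone and never wraps; post-increment access updates I4 and wraps when L4 is non-zero). The struct and function names are hypothetical, and the model ignores everything else a real DAG does.

```c
/* Hypothetical model of one DAG index register set (illustrative names). */
typedef struct {
    int base;    /* B register: start of the circular buffer */
    int length;  /* L register: 0 disables circular addressing */
    int index;   /* I register: current pointer */
} Dag;

/* dm(M, I): offset access. Returns the address used; I is unchanged
   and no circular wrap is applied. */
int offset_address(const Dag *d, int m) {
    return d->index + m;
}

/* dm(I, M): post-increment access. Returns the address used, then updates I,
   wrapping inside [base, base + length) when length is non-zero. */
int post_modify_address(Dag *d, int m) {
    int used = d->index;
    int next = d->index + m;
    if (d->length != 0) {
        if (next >= d->base + d->length) {
            next -= d->length;
        } else if (next < d->base) {
            next += d->length;
        }
    }
    d->index = next;
    return used;
}
```

With B4 = 4000, L4 = 0, I4 = 4002, two post-increment accesses use 4002 and 4003 and leave I4 at 4004. With L4 = 3 the same two accesses use 4002 and 4000 and leave I4 at 4001, matching the slide.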

The wrap from 4003 back to 4000 is the HARDWARE CIRCULAR BUFFER

ADSP-2106x Core Architecture
  [Block diagram, repeated: cache memory, DAG 1, DAG 2, program sequencer, timer, flags, JTAG test & emulation, PMA/DMA/PMD/DMD buses, bus connect, register file, multiplier, barrel shifter, ALU]

Register and Register Ops in DAG1
  [Figure, repeated: DAG1 registers and register operations, with callouts SPECIAL CIRCBUFFER STUFF and SPECIAL FFT BIT]

21k Code example
    .global _ConvertUsingPointers;
    // PROCEDURE: ConvertUsingPointers
_ConvertUsingPointers:
    modify(i7,-2);
    r2=i1;
    dm(-3,i6)=r2;
    i4=_centigrade;                              // line 37
    r4=1072064102;                               // line 38
    r2=dm(i4,m6);
    i1=_fahrenheit;
    r0=1107296256;                               // line 40
    F12=F2*F4;
    lcntr=128, do(pc,_L$566002-1)until lce;
        F1=F0+F12, r2=dm(i4,m6);                 // line 41
        F12=F2*F4, dm(i1,m6)=r1;
_L$566002:
    i12=dm(m7,i6);                               // line 42
    jump(m14,i12) (DB);
        i1=dm(-3,i6);
        rframe;

C on a Super-scalar RISC DSP (e.g. SHARC)
  68K C involves many stack operations
    subroutine parameters passed on stack
    local variables stored on stack
    local arrays stored on stack
    return address on stack
    subroutines deeply nested

    int SomeFunction(int inpar1, float inpar2) {
        int count;
        float array[200];

        etc.
    }
  C is not natural to SHARC processor -- SHARC has small hardware stack
  C must be made to happen using stack operations working on a LIFO stack in data memory

21k Code example reformatted
    .global _ConvertUsingPointers;
    // PROCEDURE: ConvertUsingPointers
_ConvertUsingPointers:
    modify(i7,-2);                               // PROLOGUE
    r2=i1;
    dm(-3,i6)=r2;
    i4=_centigrade;                              // line 37
    r4=1072064102;                               // line 38
    r2=dm(i4,m6);
    i1=_fahrenheit;
    r0=1107296256;                               // line 40
    F12=F2*F4;
    lcntr=128, do(pc,_L$566002-1)until lce;
        F1=F0+F12, r2=dm(i4,m6);                 // line 41
        F12=F2*F4, dm(i1,m6)=r1;
_L$566002:
    i12=dm(m7,i6);                               // line 42
    // Hidden automatic processor NOP -- WAIL
    jump(m14,i12) (DB);                          // RETURN TO C
        i1=dm(-3,i6);                            // EPILOGUE
        rframe;

Making C work on the 21k
  C is not natural to SHARC processor
  Index registers I6 (FP) and I7 (CTOPofSTACK) NOT SP
    Set aside certain index and length registers for STACK operations
    Index registers I6, I7 -- Corresponding length registers L6 and L7
    SP is a specialized SHARC hardware register for LIMITED non-C subroutine return addresses and not for parameter passing
  Corresponding length registers L6, L7 must be kept as 0
    LENGTH registers can provide some very useful properties to arrays and array handling -- circular buffers etc. -- Labs 2, 3 and 4
    These useful properties are EXACTLY what we DON'T want to happen with the array used as the C stack

21k Code example reformatted
    .global _ConvertUsingPointers;
    // PROCEDURE: ConvertUsingPointers
_ConvertUsingPointers:
    modify(CTOPofSTACK,-2);                      // PROLOGUE
    r2=i1;
    dm(-3, FP)=r2;
    i4=_centigrade;                              // line 37
    r4=1072064102;                               // line 38
    r2=dm(i4,m6);
    i1=_fahrenheit;
    r0=1107296256;                               // line 40
    F12=F2*F4;
    lcntr=128, do(pc,_L$566002-1)until lce;
        F1=F0+F12, r2=dm(i4,m6);                 // line 41
        F12=F2*F4, dm(i1,m6)=r1;
_L$566002:
    i12=dm(m7, FP);                              // line 42
    // Hidden automatic processor NOP -- WAIL
    jump(m14, i12) (DB);                         // RETURN TO C
        i1=dm(-3, FP);                           // EPILOGUE
        rframe;

Optimize SHARC assembly code for speed with special registerized constants in Modify registers
  Store certain commonly used fixed constant offsets in non-volatile Modify registers -- these registers are set automatically during C start-up code IF USED BY PROGRAM
    DAG1 -- M5 (0), M6 (+1), M7 (-1) -- accessing DM memory data
    DAG2 -- M13 (0), M14 (+1), M15 (-1) -- accessing PM memory data
  Highly confusing to remember which register contains what when hand coding (and writing exams)
  We will make use of a SHARC process involving a cdefines.i file to define standard register names for use with the 21k architecture when coding assembly language programs that link to C code during labs and exams.
  Call it cdefines.i in lectures and labs. Actual file name is clanguage_register_defines.i -- see Lab. 0/1 web

Process -- use standard clanguage_register_defines.i file
    #define zeroDM    M5     (0 offset for DM ops)
    #define zeroPM    M13    (0 offset for PM ops)
    #define plus1DM   M6     (+1 offset for DM ops)
    #define plus1PM   M14    (+1 offset for PM ops)
    #define minus1DM  M7     (-1 offset for DM ops)
    #define minus1PM  M15    (-1 offset for PM ops)

  Note how we must take the Harvard Architecture into account: both DM (data memory) and PM (program memory) index registers can be adjusted using Modify registers, one set in each DAG -- but the Modify registers can't be cross used between the DAGs

21k Code example reformatted
    .global _ConvertUsingPointers;
    // PROCEDURE: ConvertUsingPointers
_ConvertUsingPointers:
    modify(CTOPofSTACK,-2);                      // PROLOGUE
    r2=i1;
    dm(-3, FP)=r2;
    i4=_centigrade;                              // line 37
    r4=1072064102;                               // line 38
    r2=dm(i4, plus1DM);
    i1=_fahrenheit;
    r0=1107296256;                               // line 40
    F12=F2*F4;
    lcntr=128, do(pc,_L$566002-1)until lce;
        F1=F0+F12, r2=dm(i4, plus1DM);           // line 41
        F12=F2*F4, dm(i1, plus1DM)=r1;
_L$566002:
    i12=dm(minus1DM, FP);                        // line 42
    // Hidden automatic processor NOP -- WAIL
    jump(plus1PM, i12) (DB);                     // RETURN TO C
        i1=dm(-3, FP);                           // EPILOGUE
        rframe;

SHARC process -- Respect the registers that the C compiler uses
  Volatile Registers (not preserved by the C compiler -- may be destroyed by C)
    R0, R1, R2 (also F0, F1, F2)
    R4, I4, M4 (also F4) (S.O.T.T.)
    R8 (also F8) (S.O.T.T.)
    R12, I12, M12 (also F12) (S.O.T.T.)
    S.O.T.T. means Some Of The Time -- special issues
  Non-volatile Registers (used by C compiler)
    EVERYTHING ELSE
  SHARC PROCESS -- Save and recover NON-VOLATILE registers

21k Code example reformatted
    .global _ConvertUsingPointers;
    // PROCEDURE: ConvertUsingPointers
_ConvertUsingPointers:
    modify(CTOPofSTACK,-2);                      // PROLOGUE
    r2=i1;
    dm(-3, FP)=r2;
    i4=_centigrade;                              // line 37
    r4=1072064102;                               // line 38
    r2=dm(i4, plus1DM);
    i1=_fahrenheit;
    r0=1107296256;                               // line 40
    F12=F2*F4;
    lcntr=128, do(pc,_L$566002-1)until lce;
        F1=F0+F12, r2=dm(i4, plus1DM);           // line 41
        F12=F2*F4, dm(i1, plus1DM)=r1;
_L$566002:
    i12=dm(minus1DM, FP);                        // line 42
    // Hidden automatic processor NOP -- WAIL
    jump(plus1PM, i12) (DB);                     // RETURN TO C
        i1=dm(-3, FP);                           // EPILOGUE
        rframe;                                  // Hidden changes to FP and CTOPofSTACK

Conversion of C code to assembly -- an example
  There is a one-to-one equivalence in concept to what happens on
    MIPS processor (2nd year)
    68K processor (3rd year)
    ADSP21061 processor (4th year)
    Most other processors
  Remember that the concept is exactly the same on all processors
  EXCEPT THE IMPLEMENTATION IS DIFFERENT

3 parts of SHARC process to obtain C compatible assembly language code
    PROLOGUE      <- Always the same
    CODE BODY
    EPILOGUE      <- Always the same
  "Always the same" means that you learn to write the code once and then use it with only minor modification each time you write code in the future
  Just the same as with the 68K C/assembly compatibility taught in ENCM415. Look up those web-pages
    http://www.enel.ucalgary.ca/People/Smith/2001webs/01encml415

TYPICAL PROLOGUE FOR A SHARC PROCESS FUNCTION
  21K
    .segment/pm seg_pmco;     <- to go into PM memory
    .global _Example;         WARNING -- SEMIColon
_Example:                     WARNING -- Colon
    (Set stack frame and save non-volatile registers)
  These semicolons are needed because of the parallel capability of the processor instructions -- 4 operations in one instruction
  68K
    .section code
    .export _Example
_Example:
    (Set stack frame and save non-volatile registers)

TYPICAL BODY OF FUNCTION
  Use and reuse scratch registers

  Use a standard process to avoid errors -- REQUIRED
  21K
    scratchR1, scratchR2, scratchDMpt (I4), scratchPMpt (I12), scratchDMmod (M4), scratchPMmod (M12)
  68K
    scratchD0, scratchD1, scratchA0pt, scratchA1pt
  If you use non-volatile registers as well, then you must save them to the stack (during PROLOGUE) and recover them from the stack (during EPILOGUE) -- could slow the code, e.g. during interrupts

21k Code example reformatted
    .global _ConvertUsingPointers;
    // PROCEDURE: ConvertUsingPointers
_ConvertUsingPointers:
    modify(CTOPofSTACK,-2);                      // PROLOGUE
    scratchR2=i1;
    dm(-3, FP)=scratchR2;
    scratchDMpt=_centigrade;                     // line 37
    r4=1072064102;                               // line 38
    scratchR2=dm(scratchDMpt, plus1DM);
    i1=_fahrenheit;
    r0=1107296256;                               // line 40
    scratchF12=scratchF2*scratchF4;
    lcntr=128, do(pc,_L$566002-1)until lce;
        scratchF1=scratchF0+scratchF12, scratchR2=dm(scratchDMpt, plus1DM);
        scratchF12=scratchF2*scratchF4, dm(i1, plus1DM)=scratchR1;
_L$566002:
    scratchPMpt=dm(minus1DM, FP);                // line 42
    // Hidden automatic processor NOP -- WAIL
    jump(plus1PM, scratchPMpt) (DB);             // RETURN TO C
        i1=dm(-3, FP);                           // EPILOGUE
        rframe;                                  // Hidden changes to FP and CTOPofSTACK

Hidden issues
    scratchR2=dm(scratchDMpt, plus1DM);
    i1=_fahrenheit;
    r0=1107296256;                               // line 40
    scratchF12=scratchF2*scratchF4;              Where did F2 come from?
    lcntr=128, do(pc,_L$566002-1)until lce;
        scratchF1=scratchF0+scratchF12, scratchR2=dm(scratchDMpt, plus1DM);
        scratchF12=scratchF2*scratchF4, dm(i1, plus1DM)=scratchR1;
                                                 Where did F1 go to? Where did F2 come from?
  scratchR2=dm(scratchDMpt, plus1DM) is a bit-pattern fetch from memory instruction (char, int, float), NOT an integer fetch from memory instruction
  The registers STORE bit patterns, not integer or floating point values. The floating-point-ness or integer-value-ness is a property of the ALU (operations) and NOT the registers themselves!
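The bit-pattern point can be tried on a host in C. The sketch below reinterprets the two integer constants loaded in the listing (r0=1107296256 and r4=1072064102) as 32-bit IEEE-754 singles, which come out as 32.0 and 1.8. It assumes the host `float` is IEEE-754 single precision; the function names are illustrative, and reading the loop as fahrenheit = 32 + 1.8 * centigrade is an inference from the labels _centigrade and _fahrenheit, not something the listing states.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float, the way a floating-point
   ALU operation treats the contents of a register. */
float bits_as_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* One pass of the loop arithmetic, F1 = F0 + F2 * F4, with the two
   constants supplied as raw register bit patterns. */
float convert_one(uint32_t r0_bits, uint32_t r4_bits, float centigrade) {
    return bits_as_float(r0_bits) + centigrade * bits_as_float(r4_bits);
}
```

Calling `convert_one(1107296256u, 1072064102u, 100.0f)` gives a value within rounding of 212. Treating the same two patterns as integers and adding them would give a number with no meaning for the conversion.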

SHARC PROCESS TYPICAL EPILOGUE
  68K ---- C Stack is part of normal processor
    (Recover any saved non-volatile registers)
    ADD.L #FRAME_SIZE, SP     <-- Recover stack space (destroy stack frame)
    RTS                       <--- Uses SP (A7) by design
  21K
    There is a 21061 Hardware stack -- 6 or 8 deep. C Stack is NOT part of this hardware stack.
    (Recover any saved non-volatile registers)
    Activate 21061 code to perform destroy stack frame equivalent -- 68k UNLINK FP
    Activate 21061 code to perform RTS equivalent
  Designers have added instructions to the architecture to support the software stack associated with C coding -- CJUMP and RFRAME

21K return from C -- 5 STANDARD MAGIC LINES
    scratchPMpt = dm(minus1DM, FP);
    nop;                              // might be carefully filled -- TIMING ISSUE
    jump(plus1PM, scratchPMpt) (DB);
        nop;                          // might be carefully filled
        RFRAME;                       // C specific assembler instruction
  Note use of SHARC PROCESS of the INDENTING OF INSTRUCTIONS for denoting delayed branch instructions
  Note all the key nops in code (timing issues)
  Always the same code. Cut and paste for now. Will become obvious later
  Timing issue -- fetching of a DAG register and then using it

21k Code example reformatted
    .global _ConvertUsingPointers;
    // PROCEDURE: ConvertUsingPointers
_ConvertUsingPointers:
    modify(CTOPofSTACK,-2);                      // PROLOGUE
    scratchR2=i1;
    dm(-3, FP)=scratchR2;
    scratchDMpt=_centigrade;                     // line 37
    r4=1072064102;                               // line 38
    scratchR2=dm(scratchDMpt, plus1DM);
    i1=_fahrenheit;
    r0=1107296256;                               // line 40
    scratchF12=scratchF2*scratchF4;
    lcntr=128, do(pc,_L$566002-1)until lce;
        scratchF1=scratchF0+scratchF12, scratchR2=dm(scratchDMpt, plus1DM);
        scratchF12=scratchF2*scratchF4, dm(i1, plus1DM)=scratchR1;
_L$566002:
    scratchPMpt=dm(minus1DM, FP);                // line 42
    // Hidden automatic processor NOP -- WAIL
    jump(plus1PM, scratchPMpt) (DB);             // RETURN TO C
        i1=dm(-3, FP);                           // EPILOGUE
        rframe;                                  // Hidden changes to FP and CTOPofSTACK

STANDARD SHARC PROCESS
  We want to have a PROCESS to convert the basic parts of a design in C pseudo-code to SHARC 21k assembly code
  Minimize ERRORS -- jumping backwards and forwards between editor, assembler and linker while developing a prototype
    ERRORS become the big time waster when jumping to and from the simulator while testing this prototype.
  Minimize DEFECTS -- Defects are the carry-over of the mistakes from one apparently working prototype into another prototype -- HUGE TIME WASTER

Remember the 5-OR-60 rule
  Spend enough time in design and code review. An EXTREME PROGRAMMING APPROACH with 5 minutes for design and code review will save you 60 minutes during testing.
  What's enough time? -- SEI INDUSTRY VALIDATION METRICS FOR QUALITY CODE INDICATION
    Design Review / Design Time           > 25%
    Code Review / Coding Time             > 50%
    DEFECTS found in assemble/compile     < 10 / kLOC
    DEFECTS found in TEST                 < 5 / kLOC
    Code Review rate                      < 150 LOC / hr
  If, after code review, you find many SYNTAX errors in your code, it is an indication that there are a large number of LOGICAL defects left in your code and undiscovered by the assembler/compiler
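The five metrics above are simple ratios, so they can be checked mechanically from a time log. The following is a rough sketch only: the `ReviewLog` structure, its field names and the function name are invented for illustration, all times are assumed to be non-zero minutes, and code size is given in kLOC.

```c
/* Hypothetical time-and-defect log for one piece of work. */
typedef struct {
    double design_minutes;
    double design_review_minutes;
    double coding_minutes;
    double code_review_minutes;
    double kloc;              /* size of the code in thousands of lines */
    int compile_defects;      /* defects found by assembler/compiler */
    int test_defects;         /* defects found in test */
} ReviewLog;

/* Returns 1 when every threshold listed on the slide is met, else 0. */
int meets_review_targets(const ReviewLog *log) {
    double loc = log->kloc * 1000.0;
    double review_rate = loc / (log->code_review_minutes / 60.0); /* LOC per hour */
    return log->design_review_minutes / log->design_minutes > 0.25
        && log->code_review_minutes / log->coding_minutes > 0.50
        && log->compile_defects / log->kloc < 10.0
        && log->test_defects / log->kloc < 5.0
        && review_rate < 150.0;
}
```

For example, 100 lines designed in 60 minutes with a 20 minute design review, coded in 120 minutes with a 70 minute code review and no defects found, passes every check; cutting the design review to 5 minutes fails the first one.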

SHARC code -- FM-STEREO Example
  AM - amplitude modulation -- typically MONO
    Carrier with varying amplitude
    Mix to bring to base frequency then rectify
  FM - frequency modulation
    Carrier with varying frequency/phase
    Use FM demodulator to convert frequency changes into amplitude changes
    Get DC components (0 -- 10 kHz) plus an AM modulated carrier (10 - 30 kHz)
    Channel 1 -- Left sound + Right Sound from DC
    Channel 2 -- Left Sound - Right Sound from carrier

void DecodeFMSTEREO(int, int *, int *)
    void DecodeFMSTEREO(int channel_two_strength, int *channel_one, int *channel_two) {
        int temp_one = *channel_one;
        int temp_two = *channel_two;
        static int comment = 0;

        if (!comment) {       // Jump to C -- printf( ) -- why code the slow and obvious
            printf("Smith DecodeFMStereo() -- FM_STEREO demodulation example");
            comment = 1;
        }

        // If Channel Strength is too weak then just use channel_one on both channels
        if (channel_two_strength < 25)
            *channel_two = *channel_one;                    // L + R
        else {
            *channel_one = (temp_one + temp_two) >> 1;      // L + R + (L - R)
            *channel_two = (temp_one - temp_two) >> 1;      // L + R - (L - R)
        }
    }
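A host-side version of the decode step, with the one-time printf( ) banner left out, makes the arithmetic easy to check before any assembly is written. This is a sketch for desk-checking only; the function name is changed to mark it as a test stand-in, and the inputs are assumed non-negative so that `>> 1` is a plain divide by two.

```c
/* Decode step from the slide, minus the banner.
   On entry *channel_one holds L + R and *channel_two holds L - R. */
void decode_fm_stereo(int channel_two_strength, int *channel_one, int *channel_two) {
    int temp_one = *channel_one;
    int temp_two = *channel_two;

    if (channel_two_strength < 25) {
        /* Carrier too weak: play the mono sum on both channels. */
        *channel_two = *channel_one;
    } else {
        *channel_one = (temp_one + temp_two) >> 1;  /* ((L+R) + (L-R)) / 2 = L */
        *channel_two = (temp_one - temp_two) >> 1;  /* ((L+R) - (L-R)) / 2 = R */
    }
}
```

With L = 10 and R = 4 the inputs are 14 and 6; a strong carrier returns 10 and 4, and a weak carrier returns 14 on both channels.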

Convert C-design to account for RISC architecture
    void DecodeFMSTEREO(int channel_two_strength, int *channel_one, int *channel_two) {
  ON SHARC -- First three subroutine parameters are PLACED in DATA registers, even if the parameters are copies of values of pointer registers (index registers)
    void DecodeFMSTEREO(register int channel_two_strength, register int *channel_one, register int *channel_two) {

Convert C-design to account for RISC architecture
    int temp_one = *channel_one;
    int temp_two = *channel_two;
    static int comment = 0;
    if (!comment) { . }
  BECOMES
    register int temp_one = *channel_one;
    register int temp_two = *channel_two;
    static int comment = 0;       <- must be stored in memory and not register
    if (comment == 0) {           <- Got to be specific when writing assembly
        ...
    }

Convert C-design to account for RISC architecture
    static int comment = 0;       <- Must be stored in memory and not register
    if (comment == 0) {           <- Tests can't be done on memory values in a RISC processor architecture
        printf( );
        comment = 1;
    }
  BECOMES
    static int comment = 0;       <- Must be stored in memory -- not register
    register int temp_comment;
    temp_comment = comment;       <- Grab the value from memory
    if (temp_comment == 0) {      <- Test using a register
        printf( );
        comment = 1;              <- Still okay in THIS RISC architecture
    }
    ENDIF:                        <- Must add this to handle assembly code GOTO structure

Convert C-design to account for RISC architecture
    if (channel_two_strength < 25)
        *channel_two = *channel_one;
    else {
        *channel_one = (temp_one + temp_two) >> 1;
        *channel_two = (temp_one - temp_two) >> 1;
    }
  BECOMES
    register int temp_constant;
    temp_constant = 25;           ***************!!!!!*****!!!!!*********
    if (channel_two_strength < temp_constant)
        *channel_two = *channel_one;
    else {
        *channel_one = (temp_one + temp_two);
        *channel_one = *channel_one >> 1;
        *channel_two = (temp_one - temp_two);
        *channel_two = *channel_two >> 1;

    }

    void DecodeFMSTEREO(register int channel_two_strength, register int *channel_one, register int *channel_two) {
        register int temp_one = *channel_one;
        register int temp_two = *channel_two;
        register int temp_value;
        static int comment = 0;

        temp_value = comment;
        if (temp_value == 0) {                  WARNING -- SPECIAL CASE
            printf("Smith DecodeFMStereo() -- FM_STEREO demodulation examp     [string cut off on the slide]
            comment = 1;
        }
        else /* DO NOTHING */ ;                 WARNING -- MUST ADD THIS

        temp_value = 25;
        if (channel_two_strength < temp_value)
            *channel_two = *channel_one;
        else {
            *channel_one = (temp_one + temp_two);
            *channel_one = *channel_one >> 1;
            *channel_two = (temp_one - temp_two);
            *channel_two = *channel_two >> 1;
        }

SHARC PROCESS -- STEP 2
Develop the subroutine PROLOGUE
    void DecodeFMSTEREO(register int channel_two_strength, register int *channel_one, register int *channel_two) {
        register int temp_one = *channel_one;
        register int temp_two = *channel_two;
        register int temp_value;
  Incoming register int channel_two_strength -- INPAR1 -- in R4 -- leave it there
  Incoming register int *channel_one -- INPAR2 -- in R8 -- CAN'T leave it there
    Must move into volatile DM pointer -- I4
  Incoming register int *channel_two -- INPAR3 -- in R12 -- CAN'T leave it there
    Must move into volatile DM pointer -- BUT I4 already in use
    register int temp_one = *channel_one;       Allowed in R1?
    register int temp_two = *channel_two;       Allowed in R2?
    register int temp_value;                    Allowed in R3?

Make use of a standard format for register names -- cdefines.i
    #define scratchR0     R0     (WARNING -- also retvalueR0)
    #define scratchR1     R1
    #define scratchR2     R2
    #define scratchF1     F1     (WARNING -- identical to R1 for storage)
    #define scratchF2     F2     (WARNING -- identical to R2 for storage)

    #define scratchDMpt   I4
    #define scratchDMmod  M4
    #define scratchPMpt   I12    (WARNING -- Program Memory DAG)
    #define scratchPMmod  M12    (WARNING -- Program Memory DAG)

    #define INPAR1        R4     (WARNING -- DATA register NOT POINTER)
    #define INPAR2        R8         even when used to pass copy of pointer
    #define INPAR3        R12
    #define scratchR4     R4
    etc.

SHARC PROCESS -- STEP 2A
Develop the subroutine PROLOGUE
    // Show the parameters being passed as part of documentation
    #define channel_two_strengthR4 scratchR4      // Same as INPAR1
    void DecodeFMSTEREO(register int channel_two_strength, register int *channel_one, register int *channel_two) {
    #define temp_oneR1 scratchR1                  // register int temp_one = GARBAGE
                                                  // register int temp_one = *channel_one;
    #define temp_twoR2 scratchR2                  // register int temp_two = GARBAGE
                                                  // register int temp_two = *channel_two;
    #define temp_valueR0 scratchR0                // register int temp_value = GARBAGE
  Incoming register int *channel_one -- INPAR2 -- in R8 -- CAN'T leave it there
    Must move into volatile DM pointer -- I4
  Incoming register int *channel_two -- INPAR3 -- in R12 -- CAN'T leave it there
    Must move into volatile DM pointer -- BUT I4 already in use
  CHOICES -- Place I3 onto stack -- or Reuse I4 -- worry about speed later

SHARC PROCESS -- STEP 2B
Develop the subroutine PROLOGUE
    // Show the parameters being passed as part of documentation
    #define channel_two_strengthR4 scratchR4      // Same as INPAR1
    void DecodeFMSTEREO(register int channel_two_strength, register int *channel_one, register int *channel_two) {
    #define temp_oneR1 scratchR1                  // register int temp_one = GARBAGE
    scratchDMpt = INPAR2;                         // register int temp_one = *channel_one;
    temp_oneR1 = dm(scratchDMpt);
    #define temp_twoR2 scratchR2                  // register int temp_two = GARBAGE
    YOU ADD THE CODE                              // register int temp_two = *channel_two;
    #define temp_valueR0 scratchR0                // register int temp_value = GARBAGE
  Placing I3 onto stack     // Two extra lines -- if you get it right (Save/Recover)
  Reuse I4                  // Four EXTRA lines of which only two shown here
                            // Actually do-able in 3 (a little dicey)

SHARC PROCESS -- STEP 2C
Correct the subroutine PROLOGUE
    .segment/pm seg_pmco;
    // void DecodeFMSTEREO(register int channel_two_strength, register int *channel_one, register int *channel_two) {
    .global _DecodeFMSTEREO, DecodeFM_STEREO
_DecodeFMSTEREO:
DecodeFMSTEREO:
    // Show the parameters being passed as part of documentation
    #define channel_two_strengthR4 scratchR4      // Same as INPAR1
    #define temp_oneR1 scratchR1                  // register int temp_one = GARBAGE
    scratchDMpt = INPAR2;                         // register int temp_one = *channel_one;
    temp_oneR1 = dm(scratchDMpt);

    #define temp_twoR2 scratchR2                  // register int temp_two = GARBAGE
    YOU ADD THE CODE                              // register int temp_two = *channel_two;
    #define temp_valueR0 scratchR0                // register temp_value = GARBAGE

Correct the subroutine PROLOGUE CORRECTLY
    void DecodeFMSTEREO(int channel_two_strength, int *channel_one, int *channel_two) {
        int temp_one = *channel_one;
        int temp_two = *channel_two;
        static int comment = 0;                   <- NOT A REGISTER NOR A STACK VALUE

        if (!comment) {
            printf("Smith DecodeFMStereo() -- FM_STEREO demodulation example");
            comment = 1;
        }

        // If Channel Strength is too weak then just use channel_one on both channels
        if (channel_two_strength < 25)
            *channel_two = *channel_one;
        else {
            *channel_one = (temp_one + temp_two) >> 1;
            *channel_two = (temp_one - temp_two) >> 1;
        }
    }

Correct the subroutine PROLOGUE CORRECTLY
    .segment/dm seg_dmda
    var int comment = 0;
    .endseg;                                      // NASTY HIDDEN ERROR
    .segment/pm seg_pmco;
    // void DecodeFMSTEREO(register int channel_two_strength, register int *channel_one, register int *channel_two) {
    .global _DecodeFMSTEREO, DecodeFM_STEREO      What's missing?
_DecodeFMSTEREO:
DecodeFMSTEREO:
    // Show the parameters being passed as part of documentation
    #define channel_two_strength scratchR4        // Same as INPAR1

SHARC PROCESS -- STEP 3
Modify the standard EPILOGUE
    // Place the return value in retvalueR0 -- N/A
    // Recover non-volatile registers from stack -- N/A
    scratchPMpt = dm(minus1DM, FP);
    nop;                                          // might be carefully filled
    jump(plus1PM, scratchPMpt) (DB);
        nop;                                      // might be carefully filled
        RFRAME;
    .endseg;
  Just a CUT-AND-PASTE job

SHARC PROCESS -- STEP 4A
Convert C-design body -- standard IF-ELSE
    scratchR0 = 25;                               // temp_constant = 25;
                                                  // if (channel_two_strength < temp_constant)
    COMP(channel_two_strength, scratchR0);        // dead <- scratchR0
    if GE jump(PC, DO_ELSE) (DB);                 // take the ELSE path when the C test is false
        nop;                                      // Are these delayed branches fillable
        nop;
    scratchDMpt = INPAR2;                         // *channel_two = *channel_one;
    scratchR0 = dm(scratchDMpt);
    scratchDMpt = INPAR3;                         // Note the indenting as part of the documentation
    dm(scratchDMpt) = scratchR0;
    jump (PC, ENDIF) (DB);
        nop;
        nop;
DO_ELSE:                                          // else {
    // *channel_one = (temp_one + temp_two);
    // *channel_one = *channel_one >> 1;
    // *channel_two = (temp_one - temp_two);
    // *channel_two = *channel_two >> 1;
    // }
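Before committing the IF-ELSE to assembly, the jump structure can be rehearsed in C with goto, using the same labels. The sketch below is an assumption-laden desk check, not the lecture's code: both function names are invented, and the goto version jumps to DO_ELSE on the inverse of the C test so that the fall-through path is the "if" body. Comparing it against the structured version over a range of strengths catches a wrongly chosen branch condition.

```c
/* Structured reference version of the decision. */
void decode_structured(int strength, int *channel_one, int *channel_two) {
    int temp_one = *channel_one;
    int temp_two = *channel_two;
    if (strength < 25) {
        *channel_two = *channel_one;
    } else {
        *channel_one = (temp_one + temp_two) >> 1;
        *channel_two = (temp_one - temp_two) >> 1;
    }
}

/* Same logic laid out like the assembly: compare, conditional jump
   to DO_ELSE, unconditional jump to ENDIF. */
void decode_goto(int strength, int *channel_one, int *channel_two) {
    int temp_one = *channel_one;
    int temp_two = *channel_two;
    int temp_constant = 25;

    if (strength >= temp_constant) goto DO_ELSE;  /* inverse of the C test */
    *channel_two = *channel_one;
    goto ENDIF;
DO_ELSE:
    *channel_one = temp_one + temp_two;
    *channel_one = *channel_one >> 1;
    *channel_two = temp_one - temp_two;
    *channel_two = *channel_two >> 1;
ENDIF:
    return;
}
```

The boundary value 25 is the case worth checking by hand: it must take the ELSE path in both versions.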

SHARC PROCESS -- STEP 4A -- in this particular subroutine
Convert C-design body -- standard IF-ELSE
    scratchR0 = 25;                               // temp_constant = 25;
                                                  // if (channel_two_strength < temp_constant)
    COMP(channel_two_strength, scratchR0);        // dead <- scratchR0
    if GE jump(PC, DO_ELSE) (DB);                 // take the ELSE path when the C test is false
        nop;                                      // Are these delayed branches fillable
        nop;
    dm(scratchDMpt) = temp_oneR1;                 // *channel_two = *channel_one (temp_one);
                                                  // INPAR3 just HAPPENS to be in scratchDMpt already
    jump (PC, ENDIF) (DB);                        // because of the code you added earlier
        nop;
        nop;
DO_ELSE:                                          // else {
    // *channel_one = (temp_one + temp_two);
    // *channel_one = *channel_one >> 1;
    // *channel_two = (temp_one - temp_two);
    // *channel_two = *channel_two >> 1;
    // }

ENORMOUS DEFECT INTRODUCED
  You can't do any of this -- ALL WRONG
  You have forgotten the whole of what you are coding while micromanaging the details
  Key issues -- volatile/non-volatile register use. 21k C subroutines -- like 68k C subroutines -- destroy volatile registers (R0)

void DecodeFMSTEREO(int, int *, int *) void DecodeFMSTEREO(int channel_two_strength, int *channel_one, int *channel_two) { int temp_one = *channel_one; int temp_two = *channel_two; Using R1, R0, R4, I4 etc Probably destroys R1, R0, R4, I4 etc static int comment = 0; if (!comment) { printf("Smith DecodeFMStereo() -- FM_STEREO demodulation example"); comment = 1; } // If Channel Strength is too weak then just use channel_one on both channels if (channel_two_strength < 25) *channel_two = *channel_one; else { *channel_one = (temp_one + temp_two) >> 1; *channel_two = (temp_one - temp_two) >> 1; } } 01/16/20 ENCM515 -- Process for pseudo-C design to 21k assembly Copyright [email protected] 53 / 56 -- 2 DAYS

Program smart -- and cut the DEFECTS

    void DecodeFMSTEREO(int channel_two_strength, int *channel_one, int *channel_two) {
        int temp_one = *channel_one;
        int temp_two = *channel_two;
        static int comment = 0;

        // printf( ) CODE WAS HERE

        // If Channel Strength is too weak then just use channel_one on both channels
        if (channel_two_strength < 25)
            *channel_two = *channel_one;
        else {
            *channel_one = (temp_one + temp_two) >> 1;
            *channel_two = (temp_one - temp_two) >> 1;
        }

        if (!comment) {                    // C CAN DESTROY VOLATILES TO HEART'S CONTENT
            printf("Smith DecodeFMStereo() -- FM_STEREO demodulation example");
            comment = 1;
        }
    }

SHARC PROCESS -- STEP 4A -- in this particular subroutine
Convert C-design body -- standard IF-ELSE

    scratchR0 = 25;                         // temp_constant = 25;

                                            // if (channel_two_strength < temp_constant)
    COMP(channel_two_strength, scratchR0);  // dead <- scratchR0
    if GE jump (PC, DO_ELSE) (DB);          // go to the else part when the test FAILS
    nop;                                    // Are these delayed branch slots fillable?
    nop;
    dm(scratchDMpt) = temp_oneR1;           // *channel_two = *channel_one (temp_one);
    jump (PC, ENDIF) (DB);
    nop;
    nop;
DO_ELSE:                                    // else {
    scratchR0 = temp_oneR1 + temp_twoR2;    //   *channel_one = (temp_one + temp_two);
    scratchR0 = ASHIFT scratchR0 BY -1;     //   *channel_one = *channel_one >> 1;
    scratchDMpt = INPAR2;
    dm(scratchDMpt) = scratchR0;            // dead <- scratchR0
                                            //   *channel_two = (temp_one - temp_two);
                                            //   *channel_two = *channel_two >> 1;
                                            // YOU COMPLETE
ENDIF:                                      // }

void DecodeFMSTEREO(int, int *, int *)

    static int comment = 0;                 // Got placed in seg_dmda in PROLOGUE and given label comment
    if (comment == 0) {                     // Label means address-location, not value
        printf("Smith DecodeFMStereo() -- FM_STEREO demodulation example");
        comment = 1;
    }

    scratchR0 = dm(comment);                // NOT scratchR0 = comment
                                            // This operation would set the 68k N and Z flags,
                                            // which could then be used to control a conditional branch
                                            // Not true on the 21k
    scratchR0 = PASS scratchR0;             // Test for Zero and Negative
                                            // NOT pass(scratchR0), which is MFE
    if NE jump (PC, NOCOMMENT) (DB);
    NOP;
    NOP;

void DecodeFMSTEREO(int, int *, int *)

    static int comment = 0;
    if (comment == 0) {
        printf("Smith DecodeFMStereo() -- FM_STEREO demodulation example");
        comment = 1;
    }

    #define commentR0 scratchR0             // Better code maintainability
    commentR0 = dm(comment);
    commentR0 = PASS commentR0;             // Test for Zero and Negative
    if NE jump (PC, NOCOMMENT) (DB);        // dead <- R0
    NOP;
    NOP;
    // Code to call printf( ) here
NOCOMMENT:

Why we don't Call C from assembly
Coding the printf( ) call

    printf("Print out the value of %d", comment);

    .segment/dm seg_dmda
    var int comment = 0;
    // Ascii code for "Print out the value of %d"

    FORMAT1_LABEL:
    .var FORMAT1_STRING[ ] = 83,109,105,116,104,32,68,101, etc, 0;   // Don't forget me! -- C EOS
    .endseg;

    .segment/pm seg_pmco;
    OUTPAR2 = FORMAT1_LABEL;                // Pointer to string
    OUTPAR1 = dm(comment);                  // Value NOT pointer
    CALL _printf (DB);
    nop;
    nop;

Why we don't Call C from assembly
Coding the printf( ) call

GOT ONE LINE RIGHT
USING CJUMP not CALL

    // Get starting address of printf format
    scratchR0 = FORMAT1_LABEL;
    // Note that this is not the stack controlled by SP
    dm(CTOPstack,minus1DM) = scratchR0;

    // CJUMP causes R2 <- FP (I6); R2 is destroyed internally
    .extern _printf;
    cjump _printf (DB);
    // Save FP (as R2)

    dm(CTOPstack,minus1DM) = r2;
    dm(CTOPstack,minus1DM) = pc;            // Save Return Address (one off)
    modify(CTOPstack,plus1DM);              // 3 values placed on stack, only 1 taken off here

The importance of C
The use of the C language is so important that specialized instructions have been added to the processor instruction set to support an efficient C language interface
    CJUMP
    RFRAME
What rules does the compiler use to determine whether to use CJUMP or CALL?

No idea -- but I have never had a problem where the compiler generated the wrong code to access my C or assembly routines.
Suspect -- CJUMP for library calls (flag in header file?)

How should I code?
Assume that your routines will always get their parameters passed to them in INPAR1, INPAR2 and INPAR3
THE CONCEPT MUST WORK
This is the first year I have worried about CJUMP, and I have not had problems before.
WORRIED is the wrong word -- I had never noticed the distinction before
Don't call C routines from your assembly unless you know what you are doing!

Call C from C instead

SHARC PROCESS -- STEP 5
OPTIMIZE THE CODE
Remember -- not normally worth the effort
Going to require
    Knowing the parallel instructions
    Knowing which ones are valid in combination
    Taking into account the limitations associated with the finite number of bits in the op-code to describe the parallel operations wanted
    Understanding Hardware loops

    Understanding memory and ALU pipelining
Optimization is NEXT WEEK COUNTRY

Other examples of code conversion
Many examples in previous years' web pages
Take a look at the assembly output generated by the C-compiler for Lab. 0
    Use the -S option and look for the .asm file
Get to know the required stuff so you can quickly break through the barrier and get to the stuff you really want to do -- DSP customization
KEY -- Develop a PSP code review process

Tackled over the past 2 lectures
Setting up special processor constants and registers to gain speed during assembly language constructs
Review of use of index and modify registers
Prologue, Body and Epilogue of C program translated to assembly code (NO DIFFERENCE by hand or by compiler)
Example conversion of C program into ADSP21061 using a standard procedure
    Take into account register architecture
    Take into account LOAD/STORE architecture
    Take into account standard assembly code problems
    Handle Program Flow Constructs
    Then do conversion of code on line by line basis

Learning why to avoid calling C from assembly
