FFT in Hardware and Software - University of Calgary

FFT in Hardware and Software - University of Calgary

FFT in Hardware and Software Background Core Algorithm Original Algorithm, the DFT, O(n2) complexity New Algorithm, the FFT (Fast Fourier Transform), O(nlog2(n)) depending on implementation. DFT Computation A summation over the whole input array for every single element in the output array. A VERY computationally inefficient algorithm to implement. X () DFT ( x[n]) x[n]e n jn [1] FFT Computation A much more computationally efficient algorithm Works using the divide and conquer principle. First developed by Cooley and Tukey in 1965! DFT vs. FFT (Number of Operations) Problem Size (N) Standard DFT FFT % of DFT (smaller is better) (smaller is better) (smaller is better)

2 4 1 25 4 16 4 25 8 64 12 19 16 256 32 13 32 1024 80 8 64 4096 192 5

128 16384 448 3 256 65536 1024 2 512 262144 2304 1 1024 1048576 5120 <1 DFT vs. FFT Thousands 1200 1000 Percent of DFT Computation Time (Smaller is Better) 800 600 30% 400

200 0 0 200 400 600 800 1000 1200 Problem Size Thousands Computations Required Nearly Linear Growth of FFT (Smaller is Better) 6 5 4 Percent of DFT Computation Time Computations Required Exponential Growth of DFT (Smaller is Better) 25% 20% 15% 10% 5% 0% 0 3 200 400

600 Problem Size 2 1 0 0 200 400 600 Problem Size 800 1000 800 1000 1200 FFT Butterfly Operations Butterfly arrangement of computations Repeated on successive pairs of input data Then half as many times on alternating pairs Then half again as many times on every fourth element The Butterfly Simple operations repeated many times xe[n] X[n] WnN xo[n] X[n+N/2] -WnN

W nk N e j 2nk N 8-point FFT Demonstration The Entire Calculation Output Input Array x[0] + + X[0] x[4] + + X[1] x[2] + + X[2] x[6] + + X[3]

x[1] + + X[4] x[5] + + X[5] x[3] + + X[6] x[7] + + X[7] Multiplication by W factor + Addition 8-point FFT Demonstration Input Array Output x[0] + + X[0]

x[4] + + X[1] x[2] + + X[2] x[6] + + X[3] x[1] + + X[4] x[5] + + X[5] x[3] + + X[6]

x[7] + + X[7] Multiplication by W factor + Addition 8-point FFT Demonstration Input Array Output x[0] + + X[0] x[4] + + X[1] x[2] + + X[2] x[6] + +

X[3] x[1] + + X[4] x[5] + + X[5] x[3] + + X[6] x[7] + + X[7] Multiplication by W factor + Addition 8-point FFT Demonstration Input Array Output x[0] + +

X[0] x[4] + + X[1] x[2] + + X[2] x[6] + + X[3] x[1] + + X[4] x[5] + + X[5] x[3] + +

X[6] x[7] + + X[7] Multiplication by W factor + Addition 8-point FFT Demonstration Input Array Output x[0] + + X[0] x[4] + + X[1] x[2] + + X[2] x[6] +

+ X[3] x[1] + + X[4] x[5] + + X[5] x[3] + + X[6] x[7] + + X[7] Multiplication by W factor + Addition 8-point FFT Demonstration Input Array Output x[0] +

+ X[0] x[4] + + X[1] x[2] + + X[2] x[6] + + X[3] x[1] + + X[4] x[5] + + X[5] x[3] +

+ X[6] x[7] + + X[7] Multiplication by W factor + Addition 8-point FFT Demonstration Input Array Output x[0] + + X[0] x[4] + + X[1] x[2] + + X[2] x[6]

+ + X[3] x[1] + + X[4] x[5] + + X[5] x[3] + + X[6] x[7] + + X[7] Multiplication by W factor + Addition 8-point FFT Demonstration Input Array Output x[0]

+ + X[0] x[4] + + X[1] x[2] + + X[2] x[6] + + X[3] x[1] + + X[4] x[5] + + X[5] x[3]

+ + X[6] x[7] + + X[7] Multiplication by W factor + Addition 8-point FFT Demonstration Input Array Output x[0] + + X[0] x[4] + + X[1] x[2] + + X[2]

x[6] + + X[3] x[1] + + X[4] x[5] + + X[5] x[3] + + X[6] x[7] + + X[7] Multiplication by W factor + Addition 8-point FFT Demonstration Input Array Output

x[0] + + X[0] x[4] + + X[1] x[2] + + X[2] x[6] + + X[3] x[1] + + X[4] x[5] + + X[5]

x[3] + + X[6] x[7] + + X[7] Multiplication by W factor + Addition 8-point FFT Demonstration Input Array Output x[0] + + X[0] x[4] + + X[1] x[2] + +

X[2] x[6] + + X[3] x[1] + + X[4] x[5] + + X[5] x[3] + + X[6] x[7] + + X[7] Multiplication by W factor + Addition 8-point FFT Demonstration Input Array

Output x[0] + + X[0] x[4] + + X[1] x[2] + + X[2] x[6] + + X[3] x[1] + + X[4] x[5] + +

X[5] x[3] + + X[6] x[7] + + X[7] Multiplication by W factor + Addition 8-point FFT Demonstration Input Array Output x[0] + + X[0] x[4] + + X[1] x[2] +

+ X[2] x[6] + + X[3] x[1] + + X[4] x[5] + + X[5] x[3] + + X[6] x[7] + + X[7] Multiplication by W factor + Addition

Why Hardware? Even more speed for FFT Extremely parallelizable A whole layer can be done in two FPGA clock cycles 1 multiply cycle 1 add cycle (Assuming sufficient multipliers) Hardware Problems Complexity Input speed Output speed If the FPGA takes 24.4ns but takes 20s to transfer the input data, what gain is there? i.e. 24.4ns + 20s + 20s = ~40s! Mitigation of Hardware Problems Use a faster bus AMD Opterons Hypertransport 20.8 GB/s (166.4 Gb/s) per Link (V. 3) Modules that fit into an AMD 64-bit Opteron Socket http:// www.drccomputer.com/pages/modules.html xilinx based module http://www.xtremedatainc.com/xd1000_brief.html - altera based module Mitigation of Hardware Problems Put the FPGA on the die with the DSP Need silicon vendor support FPGA can access memory on a very wide bus (i.e. 128 bits per cycle) Implement the entire project in FPGA Time consuming to program Possibly insufficient room on the FPGA 8-point FFT Demonstration In Hardware Input Array

Output x[0] + + X[0] x[4] + + X[1] x[2] + + X[2] x[6] + + X[3] x[1] + + X[4] x[5] + + X[5]

x[3] + + X[6] x[7] + + X[7] Multiplication by W factor + Addition 8-point FFT Demonstration In Hardware Input Array Output x[0] + + X[0] x[4] + + X[1] x[2] + + X[2]

x[6] + + X[3] x[1] + + X[4] x[5] + + X[5] x[3] + + X[6] x[7] + + X[7] Multiplication by W factor + Addition 8-point FFT Demonstration In Hardware Input Array

Output x[0] + + X[0] x[4] + + X[1] x[2] + + X[2] x[6] + + X[3] x[1] + + X[4] x[5] + + X[5]

x[3] + + X[6] x[7] + + X[7] Multiplication by W factor + Addition 8-point FFT Demonstration In Hardware Input Array Output x[0] + + X[0] x[4] + + X[1] x[2] + + X[2]

x[6] + + X[3] x[1] + + X[4] x[5] + + X[5] x[3] + + X[6] x[7] + + X[7] Multiplication by W factor + Addition Why Not Software? Each butterfly must be done sequentially Only slight parallelism enabled by a DSP

like the TigerSHARC Each Butterfly can be done in 2 cycles (after optimization). Results of Testing Linear Profiling of FFT Algorithm in C++ Stage Cycle count Time 8-point 32-point 256-point 8-point 32-point 256-point Initialization 21 25 25 35.07ns 41.75ns 41.75ns Computation 6922 1135 1.895 s 11.559 s

290.950 s Butterfly 91 174222 151.97ns Results of Testing Profiling of VHDL on FPGA Butterfly takes 24.377ns to execute 62% is computational, 38% is routing on FPGA Product Offerings Most DSP Vendors Many FPGA Vendors (IP Intellectual Property) Microcontroller Vendors (i.e. Blackfin) FFTW The Fastest Fourier Transform in the West AMD Math Core Library Intel Library Highly Optimized for the expected hardware Published Results The Radix 4 version delivers a 1 K points complex processing time of 25 microseconds at 200-MHz system speeds and uses only about 10 percent of the resources in a mid-range Stratix device. The Radix 2 is half the size of the Radix 4 and offers a 1 K points complex processing time of 50 microseconds at 200MHz system speeds. Additional versions of the new cores are under development. [6] FFT IP Core Published Results [7] FFT/IFFT length Texas Instruments C6713 Single 4DSP FFT core

(Smaller is Better) Quad 4DSP FFT core (Smaller is Better) 256 12.3s 3.68s 920ns 512 27.3s 6.24s 1.56s 1024 60.2s 11.4s 2.85s References [1] Signals Systems and Transforms [2] James W. Cooley and John W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput. 19, 297301 (1965). [3] http://www.drccomputer.com/pages/modules.html - xilinx based module [4] http://www.xtremedatainc.com/xd1000_brief.html altera based module [5] http://www.amd.com/us-en/Processors/DevelopWithAMD /0,,30_2252_2353,00.html [6] http://www.us.design-reuse.com/news/news5650.html [7] http://www.4dsp.com/fft.htm

Recently Viewed Presentations

  • La rgionalisation de limmigration et la  fabrication  de

    La rgionalisation de limmigration et la fabrication de

    La régionalisation de l'immigration et la « fabrication » de l'élève immigrant Diane Farmer Université de Toronto 78e Congrès de l'Acfas
  • John F. Roatch Global Lecture Series 22 February

    John F. Roatch Global Lecture Series 22 February

    This is a fluid categorisation - given, for example, the recent dilution of medical professional power related to the rise of corporatism and consumerism. Nonetheless, these differences have significant implications in areas such as: The way knowledge is used in...
  • Perimeter Pig Powerpoint 3. MD.8 4. MD.3 Amy's

    Perimeter Pig Powerpoint 3. MD.8 4. MD.3 Amy's

    Perimeter is the distance aroundthe outside of a figure or shape. Look at my pig pen. Farmer . Torger. needed to know the distance around so he could buy the right amount of fencing to keep me in. 10 +...
  • 9.5 The Laws Governing How Compounds Form >

    9.5 The Laws Governing How Compounds Form >

    Chapter 9 Chemical Names and Formulas 9.1 Naming Ions 9.2 Naming and Writing Formulas for Ionic Compounds 9.3 Naming and Writing Formulas for Molecular Compounds
  • Manifest Destiny - White Plains Middle School

    Manifest Destiny - White Plains Middle School

    The map so. constructed, shows at a glance the whole extent of the United States territory from sea to sea; and in tracing the probable expansion of the human race from east to west, the mind finds an agreeable resting...
  • File Systems CSIT 301 (Blum) 1 FAT CSIT

    File Systems CSIT 301 (Blum) 1 FAT CSIT

    CSIT 301 (Blum) * CSIT 301 (Blum) * MFT Zone There will be a record in the MFT for every file on the partition. Thus the MFT needs room to grow. Some space in the partition, called the MFT Zone,...
  • Words, words, words You say them all the

    Words, words, words You say them all the

    This is unlikely due to what we know of its origin. The sunburn or pellagra explanation seems more likely than the anger one. Interestingly, the Afrikaans Rooinek, which literally means redneck, is a disparaging term the Boers used to apply...
  • WaSH and Voucher systems: Hygiene and Water trucking

    WaSH and Voucher systems: Hygiene and Water trucking

    - Oxfam staff & refugees, no partner - Oxfam staff & refugees. Process. Water trucker fill any water storage container once per week. No discussion with community. No analysis of water truckers. No assessment of water containers . Poor monitoring....