Quantum Computer Architectures for Physical Simulations Dr. Mike Frank University of Florida CISE Department [email protected] Presented at: Quantum Computation for Physical Modeling Workshop Wed., May 8, 2002 Summary of Current Research One of my two major research projects: Reversible & Quantum Computing Project MIT
1995-1999, UF 1999- Studying & developing physical computing theory: Technology-independent physical limits of computing Ultimate models of computing for complexity theory Nanocomputer systems engineering Optimizing cost-efficiency scalability of future computing Reversible computing & quantum computing Realistic models, CPU architectures, optimal scaling advantages RevComp project heritage Grew out of work started by the MIT Information Mechanics group in the 1970s.
Key members: Fredkin, Toffoli, Margolus Occasional collaboration with Feynman, Bennett, This group laid much theoretical groundwork for reversible (& eventually quantum) computing. MIT Reversible Computing project (1990s) Group leaders: Knight, Margolus Students: Younis, DSouza, Becker, Vieri, Frank, Ammer Focus: Reducing reversible computing to practice CMOS circuit styles, test chips, architectures, complexity theory, algorithms, high-level languages. RevComp group at UF
Reversible & Quantum Computing group Organized by myself (CISE/ECE depts.) UF collabs. in CISE, ECE, Math, Phys./Chem. Notable graduate: DoRon Motter (highest honors) Now doing PhD work in quantum circuits at U. Mich. 2 current Ph.D students (all ECE) Current focus: Removing remaining barriers to near-term practicality of reversible computing Improved circuit styles, efficient power supplies
Other applications Long-term study of physical computing theory and scaling advantages of reversible/quantum models. Who We Are Dr. Michael Frank MIT Ph.D. stud. & postdoc, 1996-97 & 1999. Area exam studies on quantum computing. DARPA-funded reversible computing research. Feynman Some lineage: Minsky Sussman
Fredkin Toffoli Knight Margolus 1999-now: Head of Reversible & Quantum Computing group at UFs CISE dept. http://www.cise.ufl.edu/research/revcomp Frank Who We Are, cont. DoRon Motter Undergrad in UF CISE dept., 1997-2000.
Coursework in CS + quantum mechanics. Sr. highest honors thesis w. Dr. Frank, 2000. Now a Masters student at U. Mich. Advisor: Igor Markov, U. Mich. DARPA-funded project on quantum logic systhesis Overview of Talk 1. Optimally scalable QC models/architectures Universally maximally scalable (UMS) models Physical limits shape the ultimate model Appropriate programming models
2. Physics sim. algorithms on our QC model Many-particle Schrdinger equation Numerically stable classical reversible sims Quantum versions Quantum field theory 3. QC simulation on classical computers Visualization techniques, various optimizations Polynomial-space techniques 1. Optimally scalable quantum computer architectures Universally maximally scalable (UMS) models Physical limits shape the ultimate model
Appropriate programming models Nanocomputer systems engineering A key goal of my long-term research program: Develop key foundations for a new discipline of nanocomputer systems engineering suited for meeting the challenges of computing at the nanoscale. And convey it to peers, teach it to students. The new field will integrate concerns & methods from a variety of disciplines: Physics of computing
Systems engineering Computer architecture Complexity theory Algorithm design Electrical eng., etc. Quantum computing! ITRS Feature Size Projections 1000 Bacterium uP chan L DRAM 1/2 p min Tox max Tox Feature Size (nanometers)
100 Virus Protein molecule 10 DNA molecule thickness 1 Atom 0.1 1995
2000 2005 We are here 2010 2015 2020 2025 2030 Year of First Product Shipment
2035 2040 2045 2050 Source: ITRS 99 Min transistor switching energy, kTs Trend of minimum transistor switching energy 1000000 High
100000 Low 10000 trend 1000 100 10 1 1995 2005 2015
2025 2035 Year of First Product Shipment CV2 based on ITRS 99 figures for Vdd and minimum transistor gate capacitance. T=300 K Physical Computing Theory The study of theoretical models of computation that are based on (or closely tied to) physics Make no nonphysical assumptions! Includes the study of: Fundamental physical limits of computing Physically-based models of computing Includes reversible and/or quantum models
Ultimate (asymptotically optimal) models An asymptotically tight Churchs thesis Model-independent basis for complexity theory Basis for design of future nanocomputer architectures Asymptotic scaling of architectures & algorithms Physically optimal algorithms Ultimate Models of Computing We would like models of computing that match the real computational power of physics. Not too weak, not too strong. Most traditional models of computing only match physics to within polynomial factors. Misleading asymptotic performance of algorithms.
Not good enough to form the basis for a real systems engineering optimization of architectures. Develop models of computing that are: As powerful as physically possible on all problems Realistic within asymptotic constant factors Unstructured Search Problem Given a set S of N elements and a black-box function f:S{0,1}, find an element xS such that f(x)=1, if one exists (or if not, say so). Any NP problem can be cast as an unstructured search problem. Not necessarily the optimal approach, however. Bounds on classical run-time: (N) expected queries in worst case (0 or 1 solns):
Have to try N/2 elements on average before finding soln. Have to try all N if there is no solution. If elements are length- bit strings, Expected #trials is (2) - exponential in . Bad! Quantum Unstructured Search Minimum time to solve unstructured search problem on a (serial) quantum computer is: (N1/2) queries = (2/2) = (21/2) Still exponential, but with a smaller base. The minimum # of queries can be achieved using Grovers algorithm.
Classical Unstructured Search The classical serial algorithm takes (N) time. But: Suppose we search in parallel! Have M
Communication time: (M1/3) (at lightspeed) Total: T N/M + M1/3 is minimized when M N3/4 N1/4 Faster than Grovers algorithm! M Classical+Quantum Parallelism Similar setup to classical parallelism: M processors, searching N/M items each. Except, each processor uses Grovers algorithm. Time accounting:
Computation: T (N/M)1/2 Communication: T M1/3 Total: T (N/M)1/2 + M1/3 (as before) Total is minimized when MN 3/5 Minimized total is T N1/5. I.e., quantum unstructured search is really only N1/4/N1/5 = N1/20 faster than classical! Scalability & Maximal Scalability A multiprocessor architecture & accompanying performance model is scalable if: it can be scaled up to arbitrarily large problem sizes, and/or arbitrarily large numbers of processors, without the predictions
of the performance model breaking down. An architecture (& model) is maximally scalable for a given problem if it is scalable and if no other scalable architecture can claim asymptotically superior performance on that problem It is universally maximally scalable (UMS) if it is maximally scalable on all problems! I will briefly mention some characteristics of architectures that are universally maximally scalable Universal Maximum Scalability Existence proof for universally maximally scalable (UMS) architectures:
Physics itself is a universal maximally scalable architecture because any real computer is merely a special case of a physical system. Obviously, no real computer can beat the performance of physical systems in general. Unfortunately, physics doesnt give us a very simple or convenient programming model. Comprehensive expertise at programming physics means mastery of all physical engineering disciplines: chemical, electrical, mechanical, optical, etc. Wed like an easier programming model than this! Physics Constrains the Ultimate Model Limits of Quantum Computers
Quantum computers remain subject to all the fundamental limits previously mentioned! Entropy density limit - only 2n distingable states! Contrary to press manglings, a quantum computer cannot store exponentially large amounts of arbitrary data! Information propagation limit - at most c Bells theorem, teleportation, superluminal wave velocities do not give >c information propagation Quantum field theory is explicitly local Computation rate limit - at most 4E/h rate of orthogonal transitions, given available energy E. Non-orthogonal ones are faster, but accomplish less work Speedups are due to fewer ops needed, not faster ops
Simple UMS Architectures (I propose) any practical UMS architecture will have the following features: Processing elements characterized by constant parameters (independent of # of processors) Mesh-type message-passing interconnection network, arbitrarily scalable in 2 dimensions w. limited scalability in 3rd dimension. Processing elements that can be operated in a highly reversible mode, at least up to some limit. Enables improved 3-d scalability, in a limited regime (In long term) Have capability for quantum-coherent operation, for extra perf. on some probs.
Ideally Scalable Architectures Conjecture: A 2- or 3-D mesh multiprocessor with a fixed-size memory hierarchy per node is an optimal scalable computer systems design (for any application). Processing Node Processing Node Processing Node CPU CPU CPU Local memory hierarchy (optimal fixed size) Local memory hierarchy (optimal fixed size)
Local memory hierarchy (optimal fixed size) Processing Node Processing Node Processing Node CPU CPU CPU Local memory hierarchy (optimal fixed size)
Local memory hierarchy (optimal fixed size) Local memory hierarchy (optimal fixed size) Mesh interconnection network Some device parameters The following parameters are considered fixed (for a given device/node technology):
(Maximum) number of bits of state per node (Minimum) node volume (Minimum) transition time (per transition) (Minimum) entropy generated per bit erased (k ln 2) (Minimum) static entropy generation rates For devices even just quiescently maintaining state info Related to energy leakage rates, decoherence times (Minimum) adiabatic frictional coefficient For devices undergoing reversible transitions (Maximum) quality factor Q of active transitions (Minimum) device cost System-level parameters The following parameters may be adjusted as the problem size increases:
Number of nodes utilized Arrangement of utilized nodes in x, y, z Spreading nodes out optimizes perf. on some problems Rate of change of an externally-applied clocking signal (time-dependent potential) Allows trading off adiabaticity vs. speed of computation, as a function of the number of nodes Entropy coefficients of some reversible logic gate operations From Frank, Ultimate theoretical models of nanocomputers (Nanotechnology, 1998): SCRL, circa 1997: ~1 b/Hz Optimistic reversible CMOS: ~10 b/kHz
Merkles quantum FET: ~1.2 b/GHz Nanomechanical rod logic: ~.07 b/GHz Superconducting PQ gate: ~25 b/THz Helical logic: ~.01 b/THz How low can you go? We dont really know! Thermodynamics & Scalability The fastest parallel algorithms for many problems ideally require a 3-D mesh topology. Minimizes communication latencies between points But, entropy flux bounds imply entropy generation rates can scale only proportionally to systems 2-D outer (convex hull) surface area.
Assuming upper bounds on temperature & pressure So, can harness 3rd dimension only to the extent that useful operations can be made reversible. Optimizing efficiency requires a careful tradeoff between performance, power, cost... Reversible/Adiabatic CMOS Chips designed at MIT, 1996-1999: Minimum Losses w. Leakage topt Pleak S leak cE
cS Etot = Eadia + Eleak Eleak = Pleaktr 2 Pleak cE 2T S leak cS Eadia = cE / tr Reversible Emulation - Ben89 k=2 n=3 k=3 n=2
Bennett 89 alg. is not optimal k=2 n=3 k=3 n=2 Just look at all the spacetime it wastes!!! Parallel Frank02 algorithm We can move the triangles closer together, to eliminate the wasted spacetime. Resulting algorithm is linear time for all n and k and dominates Ben89 for time, spacetime, & energy! Emulated time k=2
n=3 k=3 n=2 k=4 n=1 Real time Cost-Efficiency Gains, Modified Ben89 Advantage in Arbitrary Computation 100000000 0.6198 y = 1.741x
10000000 70 60 1000000 50 100000 10000 ac p S 1000 100
etim En e 10 e up w blo y = 0.3905x0.3896 30 ed
v a s rgy k 1 20 10 n 0.1 1 100
10000 1000000 40 10000000 0 On/Off Ratio of Individual Devices 1E+10 0 1E+12 out
hw n k Perf. scaling w. # of devices If alg. is not limited by communications needs, Use irreversible processors spread in a 2-D layer. Remove entropy along perpendicular dimension. No entropy rate limits, so no speed advantage from reversibility. If alg. requires only local communication, latency cyc. time, in an NDNDND mesh,
Leak-free reversible machine perf. scales better! Irreversible tcyc = (ND1/3) Reversible tcyc = (ND1/4) (ND1/12) faster! To boost reversibility speedup by 10, one must consider ~1036-CPU machines (1.7 trillion moles of CPUs!) 1.7 trillion moles of H atoms weighs 1.7 million metric tons! A ~100-m tall hill of H-atom sized CPUs! Lower bound on irrev. time Simulate Nproc = ND3 cells for Nsteps ND steps. Consider a sequence of ND update steps. Final cell value depends on ND4 ops in time T. All ops must occur within radius r = cT of cell. Surface area A T2, rate Rop T2 sustainable.
Nops Rop T T3 needs to be at least ND4. T must be (ND4/3) to do all ND steps. Average time per step must be (ND1/3). Any irreversible machine (of any technology or architecture) must obey this bound! Irreversible 3-D Mesh Reversible 3-D Mesh Non-local Communication Best computational task for reversibility: Each processor must exchange messages with another that is ND1/2 nodes away on each cycle Unsure what real-world problem demands this pattern! In this case, reversible speedup scales with number of
CPUs to only the 1/18th power. To boost reversibility speedup by 10, only need 1018 (or 1.7 micromoles) of CPUs If each was a 1-nm cluster of 100 C atoms, this is only 2 mg mass, volume 1 mm3. Current VLSI: Need cost level of ~$25B before you see a speedup. Open issues for reversible comp. Integrate realistic fundamental models of the clocking system into the engineering analysis. There is an open issue about the scalability of clock distribution systems. Exist quantum bounds on reusability of timing signals. Not yet clear if reversible clocking is scalable. Fortunately, self-timed reversible computing also appears to
be a possibility. Not yet clear if this approach works above 1-D models. Simulation experiments planned to investigate this. Develop efficient physical realizations of nano-scale bit-devices & timing systems. Timing in Adiabatic Systems When multiple adiabatic devices interact, the relative timing must be precise, in order to ensure that adiabatic rules are met. There are two basic approaches to timing: Global (a.k.a. clocked, a.k.a. synchronous) timing Approach in nearly all conventional irreversible CPUs Basis for all practical adiabatic/quantum computing mechanisms proposed to date
Local (a.k.a. self-timed, a.k.a. asynchronous) timing Implemented in a few commercial irreversible chips. Feynman 86 showed a self-timed serial reversible computation was implementable in QM, in principle Margolus 90 extended this to a 2-D model with 1-D of parallelism. - Will it work in 3-D? Global Timing Examples of adiabatic systems designed on the basis of global, synchronous timing:
Adiabatic CMOS with external power/clock rails Superconducting parametric quantron (Likharev) Adiabatic Quantum-Dot Cellular Automaton (Lent) Adiabatic mechanical logics (Merkle, Drexler) All proposed quantum computers But, a problem: Synchronous timing may not scale! Work by Janzig & others raises issues of possible limits due to quantum uncertainty. Unresolved. Clock/Power Supply Desiderata Requirements for an adiabatic timing signal / power supply: Generate trapezoidal waveform with very flat high/low regions Flatness limits Q of logic. Waveform during transitions is ideally linear,
But this does not affect maximum Q, only energy coefficient. Operate resonantly with logic, with high Q. Power supply Q will limit overall system Q Reasonable cost, compared to logic it powers. If possible, scale Q t (cycle time) Required to be considered an adiabatic mechanism. May conflict w. inductor scaling laws! At the least, Q should be high at leakage-limited speed (Ideally, independent
of t.) Supply concepts in my research Superpose several sinusoidal signals from phasesynchronized oscillators at harmonics of fundamental frequency Weight these frequency components as per Fourier transform of desired waveform Create relatively high-L integrated inductors via vertical, helical metal coils Only thin oxide layers between turns Use mechanically oscillating, capacitive MEMS structures in vacuo as high-Q (~10k) oscillator Use geometry to get desired wave shape directly Newer Supply Concepts Transmission-line-based adiabatic resonators.
See transparency. MEMS-based resonant power supply See transparency, & next slide Ideal adiabatic supplies - Can they exist? Idealized mechanical model: See transparency. But, may be quantum limits to reusability/scalability of global timing signals. This is a very fundamental issue! A MEMS Supply Concept Energy stored mechanically. Variable coupling strength -> custom wave shape.
Can reduce losses through balancing, filtering. Issue: How to adjust frequency? Programming Model Desiderata Should permit optimally efficient quantum algorithms (constant-factor slowdowns only). Should have reasonable constant factor overheads. Unit cell complexity should be kept low for ease of design & assembly. Should provide a clear separation between program and data, where appropriate. Should be straightforward to program (and to write compilers for). Candidate Programming Model
Unit-cell capabilities: A small number of n-qubit integer registers. Perform programmable 2- and 3- qubit ops on selected data bits (or n-qubit words): Classical digital ops: CNOT, CCNOT, swaps, etc. 1-bit analog unitary ops w. precision up to the limit of the qubit device technology Treat imprecision like decoherence noise, correct it away? Flow of control: For reversibility, could have 2 instruction registers, which take turns executing & loading each other. Data movement:
Streaming between neighboring unit cells. Node Architecture Sketch Instruction registers Data path to/from neighbor node (in 2d or 3d) I/O registers Data registers Execution unit 2. Simulating physical systems on our QC model Many-particle Schrdinger equation
Numerically stable classical reversible sims Quantum equivalents Quantum field theory Simulating Wave Mechanics The basic problem situation: Given: A (possibly complex) initial wavefunction 0 ( x , t0 ) in an N-dimensional position basis, and a (possibly complex and time-varying) potential energy V ( x
, t ), function a time t after (or before) t0, Compute: ( x, t ) Many practical physics applications... The Problem with the Problem An efficient technique (when possible): Convert V to the corresponding Hamiltonian H.
Find the energy eigenstates of H. Project onto eigenstate basis. iH ( t t 0 ) Multiply each component by e . Project back onto position basis. Problem: It may be intractable to find the eigenstates! We resort to numerical methods... History of Reversible Schrdinger Sim. See http://www.cise.ufl.edu/~mpf/sch Technique discovered by Ed Fredkin and student William Barton at MIT in 1975.
Subsequently proved by Feynman to exactly conserve a certain probability measure: Pt = Rt2 + It1It+1 (R=real, I=imag., t=time step index) 1-D simulations in C/Xlib written by Frank at MIT in 1996. Good behavior observed. 1 & 2-D simulations in Java, and proof of stability by Motter at UF in 2000. User-friendly Java GUI by Holz at UF, 2002. Difference Equations Consider any system with state x that evolves according to a diff. eq. that is 1st-order in time: x = f(x ) Discretize time to finite scale t, and use a difference equation instead:
x(t + t) = x(t) + t f(x(t)) Problem: Behavior not always numerically stable. Errors can accumulate and grow exponentially. Centered Difference Equations Discretize derivatives in a symmetric fashion: d x x(t t ) x(t t ) dt 2t Leads to update rules like: x(t + t) = x(t t) + 2t f(x(t)) Problem: States at odd- vs. evennumbered time steps not constrained to stay close to each other!
x1 + x3 + 2tf g g g x2 + x4 Centered Schrdinger Equation Schrdingers equation for 1 particle in 1-D:
2 d2 d i ( x, t ) V ( x, t ) ( x, t ) 2 dt time (& also space) 2m dderivatives x Replace with centered differences. (has t t ) (t t ) i g ( (t )) Centered difference equation real part at odd times that depends only on
R1 imaginary part at even times, & g vice-versa. I2 Drift not an issue - real & imaginary parts represent different state components! R3 g g + I4
Proof of Stability Technique is proved perfectly numerically stable & convergent assuming V is 0 and x2/t > /m (an angular velocity) Elements of proof: Lax-Richmyer equivalence: convergencestability. Analyze amplitudes of Fourier-transformed basis Sufficient due to Parsevals relation Use theorem (cf. Strikwerda) equating stability to certain conditions on the roots of an amplification polynomial (g,), which are satisfied by our rule.
Empirically, technique looks perfectly stable even for more complex potential energy funcs. Phenomena Observed in Model Perfect reversibility Wave packet momentum Conservation of probability mass Harmonic oscillator Tunnelling/reflection at potential energy barriers Interference fringes Diffraction
Gaussian wave packet moving to the right; Array of small sharp potential-energy barriers Initial reflection/refraction of wave packet A little later Aimed a little higher A faster-moving particle Interesting Features of this Model Can be implemented perfectly reversibly, with zero asymptotic spacetime overhead Every last bit is accounted for! As a result, algorithm can run adiabatically, with power
dissipation approaching zero Modulo leakage & frictional losses Can map it to a unitary quantum algorithm Direct mapping: Classical reversible ops only, no quantum speedup Indirect (implicit) mapping: Simulate p particles on kd lattice sites using pd lg k qubits Time per update step is order pd lg k instead of kpd Implicit Mapping Use pd integer registers xj, each lg k qubits long Amplitude of joint state of all registers represents amplitude of wavefunction point x The difference equation term for dimension j amounts to
multiplication of state by matrix xi i (1 2i ) i i (1 2i ) i D j to be) nearly unitary for small (can be normalized i i i
( 1 2 i ) Idea: Can approximate Dj using 1-qubit ops on low-order bit of xi, plus CCNOTs to do carries. Field Theory Systems Goal: Simulate field theory for p particle types in ddimensional space over kd lattice sites
General approach: At each lattice site, have p integer qubit registers nj, denoting the occupancy number of particle type j. nj = 0 or 1 for each type of fermion nj from 0 to nmax for bosons nmax determined by available total energy Use quantum LGCA model (type I) Interact particles at site using collision operator Includes particle creation/annihilation operators Stream particles between sites after collision step 3. Quantum computer simulation on classical computers Visualization techniques, various optimizations
Polynomial-space techniques Simulation of QC Algorithms Visualization: Project states onto 2-D/3-D spaces Corresponding to register pairs/triplets. Use HSV color space to represent amplitudes. Visualize gate ops with continuous color change. Simulation Efficiency: Optimizations: Track only states having non-zero amplitude. Linear-space simulations of n-qubit systems.
Visualization Technique Illustration: 3 stages of Shors algorithm Register value spatial position of pixel Phase angle pixel color hue. Magnitude pixel color saturation. Initial State After doing Hadamard transform on all bits of a After modular exponentiation b=xa (mod N) State After Fourier Transform
Efficient QC Simulations Task: Simulate an n-qubit quantum computer. Maximally stupid approach: Store a 2n-element vector Multiply it by a full 2n2n matrix for each gate op Some obvious optimizations: Never store whole matrix (compute dynamically) Store only nonzero elements of state vector Especially helpful when qubits are highly correlated Do only constant work per nonzero vector element Scatter amplitude from each state to 1 or 2 successors Drop small-probability-mass sets of states Linearity of QM implies no chaotic growth of errors
Linear-space quantum simulation A popular myth: Simulating an n-qubit (or n-particle) quantum system takes e(n) space (as well as time). The usual justification: It takes e(n) numbers even to represent a single (n)dimensional state vector, in general. The hole in that argument: Can simulate the statistical behavior of a quantum system w/o ever storing a state vector! Result BQP PSPACE known since BV93... But practical poly-space sims are rarely described
The Basic Idea Inspiration: Feynmans path integral formulation of QED. Gives the amplitude of a given final configuration by accumulating amplitude over all paths from initial to final configurations. Each path consists of only a single (n)-coordinate configuration at each time, not a full wavefunction over the configuration space. Can enumerate all paths, while only ever representing one path at a time. Simulating Quantum Computations Given: Any n-qubit quantum computation, expressed as a
sequence of 1-qubit gates and CNOT gates. U2 U1 U3 U4 An initial state s0 which is just a basis state in the classical bitwise basis, e.g. 00000. Goal: Generate a final basis state stochasically with the same probability distribution as the quantum computer would do. Matrix Representation
Consider each gate as rank-2n unitary matrix: Each CNOT application is a 0-1 (permutation) matrix - a classical reversible bit-operation. With appropriate row ordering, each Ui gate application is block-diagonal, w. each 22 block equal to Ui. We need never represent these full matrices! The 1 or 2 nonzero entries in a given row can be located & computed immediately given the row id (bit string) and Ui. The Linear-Space Algorithm Generate a random coin c[0,1]. Initialize probability accumulator: p0.
For each final n-bit string y at time t, Compute its amplitude (y) as follows: Generate its possible 1 or 2 predecessor strings x1 (and maybe x2) given the gate-op preceding t. For each predecessor, compute its amplitude at time t1 recursively using this same algorithm, unless t=0, in which case =1 if x=s0, 0 otherwise. Add predecessor amplitudes, weighted by entries. Maybe output y, using roulette wheel algorithm: Accumlate Pr[y] into total: p p +||(y)||2 Output y and halt if p>c. A Further Optimization Dont even have to enumerate all final states! Instead: Stochasically follow a trajectory.
Basic idea: Keep track of 1 current state & its amplitude 0. For CNOTs: Deterministically transform state. For Us: Calculate amplitude 1 of neighbor state w. path-integral Calculate amplitudes 0 and 1 after qubit op Choose 1 successor as new current state, using ||2 distrib. u00 0
0 Current state u10 u Possible 01 successors 1 u11 1 Neighbor state Complexity Comparison To simulate t gate ops (c CNOTs + u 1-bit unitary ops) of an n-qubit quantum computer: Space Time Traditional method: 2n t2n Path-integral method: tn n2t
(Actually, only the u unitary ops, not all t ops or all n qubits, contribute to any of the exponents here.) Upshot: Lower space usage can allow larger systems to be simulated, for short periods. Run time is competitive for case when t < n Conclusion A grab-bag of tricks and techniques Outline of a research program is taking shape Quantum computing is really interesting...
Now if only I can get someone to pay me to devote my full time to studying it! Slides left over from USC talk To import as needed Reversibility of Physics The universe is (apparently) a closed system Closed systems evolve via unitary transforms Apparent wavefunction collapse doesnt contradict this (confirmed by work of Everett, Zurek, etc.) Time-evolution of concrete state of universe (or closed subsystems) is reversible:
Invertible (bijective) Deterministic looking backwards in time Total info. (log # of poss. states) doesnt decrease Can increase, though, if volume is increasing Information cannot be destroyed! Illustrating Landauers principle Before bit erasure s0 0 1 1
sN-1 0 sN 0 2N states sN-1
Unitary (1-1) evolution 0 s0 s0 sN-1 N
states 0 N states After bit erasure s2N-1 0 Benefits of Reversible Computing Reduces energy/cooling costs of computing
Improves performance per unit power consumed Given heat flux limits in the cooling system, Improves performance per unit convex hull area A faster machine in a given size box. For communication-intensive parallel algorithms, Improves performance, period! All these benefits are by small polynomial factors in the integration scale & the device properties. Quantum Computing pros/cons Pros: Removes an unnecessary restriction on the types of quantum
states & ops usable for computation. Opens up exponentially shorter paths to solving some types of problems (e.g., factoring, simulation) Cons: Sensitive, requires overhead for error correction. Also, still remains subject to fundamental physical bounds on info. density, & rate of state change! Myth: A quantum memory can store an exponentially large amount of data. Myth: A quantum computer can perform operations at an exponentially faster rate than a classical one. Some goals of my QC work Develop a UMS model of computation that incorporates quantum computing.
Design & simulate quantum computer architectures, programming languages, etc. Describe how to do the systems-engineering optimization of quantum computers for various problems of interest. Conclusion As we near the physical limits of computing, Further improvements will require an increasingly sophisticated interdisciplinary integration of concerns across many levels of engineering. I am developing a principled nanocomputer systems engineering methodology And applying it to the problem of determining the real costefficiency of new models of computing:
Reversible computing Quantum computing Building the foundations of a new discipline that will be critical in coming decades.