# Slides for Introduction to Stochastic Search and Optimization

Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall CHAPTER 15 SIMULATION-BASED OPTIMIZATION II: STOCHASTIC GRADIENT AND SAMPLE PATH METHODS Organization of chapter in ISSO Introduction to gradient estimation Interchange of derivative and integral Gradient estimation techniques Likelihood ratio/score function (LR/SF) Infinitesimal perturbation analysis (IPA) Optimization with gradient estimates Sample path method Issues in Gradient Estimation Estimate the gradient of the loss function with respect to parameters for optimization from simulation outputs L L() gg( ) where L() is a scalar-valued loss function to minimize and

is a p-dimensional vector of parameters Essential properties of gradient estimates Unbiased: E g () g() Small variance 15-2 Two Types of Parameters L() E Q ,V Q S , pV | D d where V is the random effect in the system, pV | is the probability density function of V Distributional parameters D: Elements of that enter via their effect on probability distribution of V. For example, if scalar V has distribution N(,2), then and 2 are distributional parameters Structural parameters S: Elements of that have effects directly on the loss function (via Q) Distinction not always obvious 15-3

Interchange of Derivative and Integral Unbiased gradient estimations using only one simulation require the interchange of derivative and integral: ? Q , pV | L d Q , pV | d Above generally not true. Technical conditions needed for validity: Q pV and Q pV / are continuous Q , p | q , q d V 0 0

Q , pV | q1 , q d 1 Above has implications in practical applications 15-4 A General Form of Gradient Estimate Assume that all the conditions required for the exchange of derivative and integral are satisfied, pV | Q , g () Q , pV | d Q , 1 pV | Q , pV |

pV | d log pV V | Q ,V E Q ,V Hence, an unbiased gradient estimate can be obtained as g () Q ,V log pV V | Output from one Q ,V

15-5 Two Gradient Estimates: LR/SF and IPA g () Q ,V log pV V | Q ,V pure LR/SF pure IPA Likelihood Ratio/ Score Function (LR/SF): only distributional parameters g LR / SF () Q ,V log pV V | Infinitestimal Perturbation Analysis (IPA): only structural parameters Q ,V

gIPA () 15-6 Comparison of Pure LR/SF and IPA In practice, neither extreme (LR/SF or IPA) may provide a framework for reasonable implementation: LR/SF may require deriving a complex distribution function starting from U(0,1) IPA may lead to intractable Q/with a complex Q(,V) Pure LR/SF gradient estimate tend to suffer from large variance (variance can grow with the number of components in V) Pure IPA may result in a Q(,V) that fails to meet the conditions for valid interchange of derivative and integral. Hence can lead to biased gradient estimate. In many cases where IPA is feasible, it leads to low variance gradient estimate 15-7 A Simple Example: Exponential Distribution Let Z be exponential random variable with mean . That is pZ z | e z / . Define L E(Z) . Then L/1. LR/SF estimate: V Z; Q(,V) V. g LR / SF () V

log pV V | V V 1 IPA estimate: V U(0,1); Q(,V) logV (Z logV). g IPA () Q ,V logV Both of LR/SF and IPA estimators are unbiased 15-8 Stochastic Optimization with Gradient Estimate Use the gradient estimates in the root-finding stochastic approximation (SA) algorithm to minimize the loss function

L() E[Q(,V)]: Find such that g() 0 based on simulation outputs A general root-finding SA algorithm: k 1 k akYk ( k ) an estimate of g( k ) where ak is the step size with ak 0,ak 0, ak If Yk is unbiased and has bounded variance (and other appropriate assumptions hold), then k (a.s.) 15-9 Simulation-Based Optimization Use gradient estimate derived from one simulation run in the iteration of SA: Q ,Vk log pV (Vk | ) Yk (k ) Q(k ,Vk )

k , k where Vk is the realization of V from a simulation run with parameter set at k k run one simulation with k to obtain Vk derive gradient estimate from Vk iterate SA with the gradient estimate k 1 15-10 Example: Experimental Response (Examples 15.4 and 15.5 in ISSO) Let {Vk} be i.i.d. randomly generated binary (on-off) stimuli with on probability . Assume Q(,,Vk) represents negative of specimen response, where is design parameter. Objective is to design experiment to maximize the response

(i.e., minimize Q) by selecting values for and . 1 T Gradient estimate: [, ] ; pV | 1 , 0 or1 Vk ,V ) Q ( , V ) Q ( k k k k (

1) Yk (k ) Q( k ,Vk ) where k k , k T and Qx denotes derivative w.r.t. x 15-11 Experimental Response (continued) Specific response function: Q ,V 2 (1 ) V pV | 1 1 , 0 or 1.

where is a structural parameter, but is both a distributional and structural parameter. Then: L() E Q ,V 2 2 1/ 3 ; L ( ) 1/ 3 1/ 3 15-12 Search Path in Experimental Response Problem 15-13 Sample Path Method Sample path method based on reusing a fixed set of simulation runs Method based on minimizing L ( rather than L()

N ) L ( ) represents sample mean of N simulation runs N If N is large, then minimum of L ( is close to minimum of L() N ) (under conditions) Optimization problem with L ( )is effectively deterministic N Can use standard nonlinear programming IPA and/or LR/SF methods of gradient estimation still relevant Generally need to choose a fixed value of (reference value) to produce the N simulation runs Choice of reference value has impact on for finite N LN ( ) 15-14 Accuracy of Sample Path Method Interested in accuracy of sample path method in seeking true optimal (minimum of L()) Let *N represent minimum of surrogate loss LN ( ) Let denote final solution from nonlinear programming method

Hence, error in estimate is due to two sources: Error in nonlinear programming solution to finding Difference in and * *N N Triangle inequality can be used to provide bound to overall error: N N Sometimes numerical values can be assigned to two right-hand terms in triangle inequality 15-15

## Recently Viewed Presentations

• Tee it up with great savings on select GE Refrigeration and RAC parts September 6 - October 1, 2011! On-Line: www.gecustomernet.com Phone: 800-851-8777 Fax: 800-821-7021 *Promo prices are net and can be included with your stock order to build order...
• They needed a savior sibling to save their daughter Molly's life, who was suffering from Fanconi anemia. 21-© McGraw-Hill Education. Chapter 3. Savior Siblings and More (2 of 2) An umbilical cord stem cell transplant could save the life of...
• On cooling 2.666 g of pure Cu remains. What is the copper oxide's empiric formula ? (3b. Combustion) PowerPoint Presentation e) count wt: How many grams of O2 are needed to form 9*1022 molecules of H2O ? count wt (4b....
• Don't jump to these. Used to synchronize parallel query workers. Just means you have a parallel query. High wait times mean long running parallel queries. Look at the Tasks. You may not need to do anything. Find queries and tune...
• 1. www.westgard.com Setting QC Limits - illustration Hypothesis Setting the SD limits in the Levy-Jennings QC chart different to the actual instrument SD limits will change the performance of the QC protocol. Terminology ASD - The actual SD of the...
• Provision of incorrect altimeter setting. Misidentification of aircraft by an ATCO or radar operator. Incorrect transmission, receipt or interpretation of significant messages. Airprox and any occurrence in which separation between aircraft is less than that prescribed for the situation. Non...
• produced like plosives, in that they involve a closing stage, a closure stage and a release stage. produced when the active articulator is close to, but not actually in contact with, the passive articulator.
• Entre la aracnoides y la piamadre se halla el espacio subaracnoideo, que contiene líquido cefalorraquídeo. Estas tres capas cubren los nervios espinales hasta el punto donde salen del raquis a través de los agujeros intervertebrales. La inflamación de las meninges...