Experimental Results for Gradient Estimation and Optimization of a Markov Chain in Steady-State

P. L'Ecuyer, N. Giroux, and P. W. Glynn

Proceedings of the 1990 Vienna Conference on Computationally Intensive Methods in Simulation and Optimization, Lecture Notes in Economics and Mathematical Systems, No. 374, 14-23 (1992)

Infinitesimal perturbation analysis (IPA) and the likelihood ratio (LR) method have drawn lots of attention recently, as ways of estimating the gradient of a performance measure with respect to continuous parameters in dynamic stochastic systems. In this paper, we experiment with the use of these estimators in stochastic approximation algorithms, to perform so-called "single-run optimizations" of steady-state systems, as suggested in [23] . We also compare them to finite-difference estimators, with and without common random nunbers. In most cases, the simulation length must be increased from iteration to iteration, otherwise the algorithm converges to the wrong value. We have performed extensive numerical experiments with a simple M/M/1 queue. We state convergence results, but do not give the proofs. The proofs are given in [14].

[14] L'Ecuyer, P. and Glynn, P. W., "A Control Variate Scheme for Likelihood Ratio Gradient Estimation", In preparation (1990)

[23] Suri, R., "Perturbation Analysis: The State of the Art and Research Issues Explained via the GI/G/1 Queue", Proceedings of the IEEE, 77, 1 (1989), 114-137