- Situation of resampling in contemporary statistics

- The underlying principle
- The questions addressed
- The bootstrap: Some Examples

- Some notation

- Complete Enumeration

- Balanced Bootstraps
- Monte Carlo

- Theoretical Underpinnings

- The jackknife

- Cross Validation

- Confidence Intervals

- Bootstrap World

**Question 1**What estimator should be used ?**Question 2**Having chosen an estimator, how accurate is it ?

The second question has to be answered through information on the distribution or at least the variance of the estimator.

Of course there are answers in very simple contexts:
for instance when the parameter of interest is
the mean then the estimator
has a known standard
deviation : the estimated standard error
noted sometimes

However no such estimator is available for the sample median for instance.

In maximum liklihood theory the question 1 is answered through using the mle and then the question 2 can be answered with an approximate standard error of

The bootstrap is a more general way to answer question 2 , with the following aspects:

- Less or no parametric modelling.
- More computation. (a factor 100 to 1000)
- Automatic, whatever the situation (can be complex).

If we had several samples from the unknown
(true) distribution
then we could consider the variations
of the estimator :

Such a situation is never the case, so we replace these new samples by a resampling procedure based on the only information we have about , and that is an empirical :this side is what is called

Heart attacks | Subjects | ||

Aspirin | 104 | 10933 | 11037 |

Placebo | 189 | 10845 | 11034 |

Or is the difference between rates in both populations strictly different from zero?

We could run a simulation experiment to find out.

>> [zeros(1,10) ones(1,2)] ans = 0 0 0 0 0 0 0 0 0 0 1 1 >> [zeros(1,10) ones(1,2)]' ans = 0 0 0 0 0 0 0 0 0 0 1 1 >> [zeros(1,10) ; ones(1,2)] ??? All rows in the bracketed expression must have the same number of columns. >> sample1=[zeros(1,109) ones(1,1)]'; >> sample2=[zeros(1,108) ones(1,2)]'; >> orig=sample1 >>[n,p]=size(orig) n = 110 p = 1 >> thetab=zeros(1,1000);

File myrand.m for the bootstrap index creation:

function out=myrand(m,n,range) %Returns a vector of draws %of integers from 1to n, %with replacement out=floor(rand(m,n)*range)+1;

File bsample.m for the bootstrap resample creation:

function out=bsample(orig) %Function to create one resample from %the original sample orig, where %orig is the original data, and is a %matrix with nrow observations and ncol variables [n,p]=size(orig); indices=myrand(1,n,n); out=orig(indices,:);

File bsample.m:

function out=bsample(orig) %Function to create one resample from %the original sample orig, where %orig is the original data, and is a %matrix with nrow observations and ncol variables [n,p]=size(orig); indices=myrand(1,n,n); out=orig(indices,:);

>> for (b =(1:1000)) res.bsample1=bsample(sample1); thetab(b)=sum(res.bsample1==1); end >>hist(thetab)

This is what the histogram looks like:

Here is the complete data set computation

>> sample1=[zeros(1,10933),ones(1,104)]'; >> sample2=[zeros(1,10845),ones(1,189)]'; >> thetab=zeros(1,1000); >> for (b =(1:1000)) res.bsample1=bsample(sample1); thetab(b)=sum(res.bsample1==1); end

>> for (b =(1:1000)) res.bsample1=bsample(sample1); thetab(b)=sum(res.resample1==1)-2; end >>hist(thetab) >> mean(thetab) ans = -0.9530 >> var(thetab) ans = 1.0398 >> sum(thetab>0) ans = 96

This is what the histogram looks like:

>>thetab=zeros(1,1000); >> for (b =(1:1000)) res.bsample1=bsample(sample1); res.bsample2=bsample(sample2); thetab(b)=sum(res.bsample2==1)-sum(res.bsample1==1); end >>hist(thetab)

This is what the histogram looks like:

The second sample would be considered as coming from another Binomial, in the most general case

Theoretically, what can we say abou the distribution of ?

From an original sample

draw a new sample of observations among the original sample with replacement, each observation having the same probabilty of being drawn (). A bootstrap sample is often denoted

If we are interested in the behaviour of a random variable , then we can consider the sequence of new values obtained through computation of new bootstrap samples.

Practically speaking this will need generatation of an integer between 1 and n, each of these integers having the same probability.

Here is an example of a line of `matlab` that does just
that:

indices=randint(1,n,n)+1;

If we use S we won't need to generate the new observations one by one, the following command generates a n-vector with replacement in the vector of indices (1...n).

sample(n,n,replace=T)

An approximation of the distribution of the
estimate
is provided by the distribution
of

denotes the bootstrap distribution of , often approximated by

*
*

- Compute the original estimate from the original data.
- For b=1 to B do : %B is the number of bootstrap samples
- Create a resample
- Compute

- Compare to .

where is the usual estimate of the variance obtained from the sample.

If we were given true samples, and their associated
estimates
,
we could compute the usual variance estimate
for this sample of values, namely:

where

Here are some computations for the mouse data(page 11 of text)

treat=[94 38 23 197 99 16 141]' treat = 94 38 23 197 99 16 141 >> median(treat) ans = 94 >> mean(treat) ans = 86.8571 >> var(treat) ans = 4.4578e+03 >> var(treat)/7 ans = 636.8299 >> sqrt(637) ans = 25.2389 thetab=zeros(1,1000); for (b =(1:1000)) thetab(b)=median(bsample(treat)); end hist(thetab) >> sqrt(var(thetab)) ans = 37.7768 >> mean(thetab) ans = 80.5110
This is what the histogram looks like:
control=[52 104 146 10 51 30 40 27 46]'; >> median(control) ans = 46 >> mean(control) ans = 56.2222 >> var(control) ans = 1.8042e+03 >> var(control)/length(control) ans = 200.4660 >> sqrt(200.4660) ans = 14.1586 thetab=zeros(1,1000); for (b =(1:1000)) thetab(b)=median(bsample(control)); end hist(thetab) >> sqrt(var(thetab)) ans = 11.9218 >> mean(thetab) ans = 45.4370This is what the histogram looks like: |

Comparing the two medians, we could use the estimates of the standard errors to find out if the difference between the two medians is significant?

Definition:

The sequence of random variables is said to be exchangeable if the distribution of the vector is the same as that of , for any permutation of elements.

Suppose we condition on the sample of distinct observations , there are as many different samples as there are ways of choosing objects out of a set of possible contenders, repetitions being allowed.

At this point it is interesting to introduce a new
notation for a bootstrap resample,
up to now we have noted a possible
reasample, say
,
because of the exchangeability/symmetry property
we can recode this as the vector counting
the number of occurrences of each of the observations.
in this recoding we have
and the set of all bootstrap resamples
is the dimensional simplex

Here is the argument I used in class to explain how big is. Each component in the vector is considered to be a box, there are boxes to contain balls in all, we want to contain to count the number of ways of separating the n balls into the boxes. Put down separators of to make boxes, and balls, there will be positions from which to choose the bars' positions, for instance our vector above corresponds to:

Stirling's formula ( ) gives an approximation ,

here is the function file `approxcom.m`

function out=approxcom(n) out=round((pi*n)^(-.5)*2^(2*n-1));that produces the following table of the number of resamples:

Are all these samples equally likely, thinking about the probability of drawing the sample of all 's by choosing the index times in the integer uniform generation should persuade you that this sample appears only once in times. Whereas the sample with once and all the other observations can appear in out of the ways.

The multinomial distribution

This will be largest when all the 's are , thus the most likely sample in the boostrap resampling is the original sample, here is the table of the most likely values:

As long as the statistic is somewhat a smooth function of the observations, we can see that discreteness of the boostrap distribution is not a problem.

Theoretically, we could give a complete enumeration of the bootstrap sampling distribution, all we need to know is how to compute the statistic for all the bootstrap resamples. There is a way of going through all resamles as defined by the simplex , it is called a Gray code. As an example, consider the law school data used by Efron (1982). One can only hope to enumerate completely for moderate sample sizes ( today). For larger sample sizes partial enumeration through carefully spaced points is discussed in Diaconis and Holmes (1994a). Another idea is to use a small dose of randomness, not as much as Monte Carlo, by doing a random walk between close points so as to be able to use updating procedures all the same, this is detailed in the case of exploring the tails of a bootstrap distribution in Diaconis and Holmes (1994b).

It is easy to give a recursive description of such a list, starting from the list for (namely 0,1). Given a list of length , form by putting a zero before each entry in , and a one before each entry in . Concatenate these two lists by writing down the first followed by the second in reverse order. Thus from we get and then the list displayed above for . For the list becomes :

Gray codes were invented by F. Gray (1939) for sending sequences of bits using a frequency transmitting device. If the ordinary integer indexing of the bit sequence is used then a small change in reception, between 15 and 16, for instance, has a large impact on the bit string understood. Gray codes enable a coding that minimizes the effect of such an error. A careful description and literature review can be found in Wilf (1989). One crucial feature: there are non-recursive algorithms for providing the successor to a vector in the sequence in a simple way. This is implemented through keeping track of the divisibility by 2 and of the step number.

One way to express this is as follows : let be the binary representation of the integer , and let be the string of rank in the Gray code list. Then and . For example, when , the list above shows the string of rank is ; now . So . Thus from a given string in the Gray code and the rank one can compute the successor. There is a parsimonious implementation of this in the algorithm given in the appendix. Proofs of these results can be found in Wilf (1989).

Let be the original data supposed to be independent and identically distributed from an unknown distribution on a space . The bootstrap proceeds by supposing that replacement of by , the empirical distribution, can provide insight on sampling variability problems.

Practically one proceeds by repeatedly choosing from the points with replacement. This leads to bootstrap replications . There are such possible replications, however these are not all different and grouping together replications that generate the same subset we can characterize each resample by its weight vector where is the number of times appears in the replication. Thus .

Let the space of compositions of into at most parts be

Thus . We proceed by running through all compositions in a systematic way. Note that the uniform distribution on induces a multinomial distribution on

To form the exhaustive bootstrap distribution of a statistic one need only compute each of the statistics and associate a weight with it. The shift from to gives substantial savings. For the law school data, while .

Efficient updating avoids multiplying such large numbers by factors of . This is what makes the computation feasible. Gray codes generate compositions by changing two coordinates of the vector by one up and one down. This means that can be easily updated by multiplying and dividing by the new coordinates. Similar procedures, discussed in Section C below allow efficient changes in the statistics of interest.

Following earlier work by Nijenhuis, Wilf and Knuth, Klingsberg (1982) gave methods of generating Gray codes for compositions. We will discuss this construction briefly here, details can be found in Klingsberg (1982) and Wilf (1989).

For , the algorithm produces the 10 compositions of in the
following order:

The easiest way to understand the generation of such a list is recursive, construction of the -compositions of can actually be done through that of the compositions of .

For any , the 2-composition is just provided by the list , which is of length . So the 2-compositions out of 3 are

the 2-compositons out of 2 are

and 2-compositions out of 1 are

Finally there is only one 2-composition of :

The 3-out of 3 list is obtained by appending a to the list, a
1 to the list, a 2 to the list and a 3 to the
list. These four lists are then concatenated by writing the first, the
second in reverse order, the third in its original order followed by the
fourth in reverse order. This is actually written:

and more generally

The same procedure leads to the 35 compositions of in the following
order:

The lists generated in this way have the property that two successive compositions differ only by in two coordinates.

Klingsberg (1982) provides a simple nonrecursive algorithm that generates the successor of any compositon in this Gray code. This is crucial for the implementationin the the present paper. It requires that one keep track of the whereabouts of the first two non-zero elements and an updating counter. Both a and a version of the algorithm are provided in the appendix.

We conclude this subsection by discussing a different algorithm due to Nijenhuis and Wilf (1978, pp. 40-46) which runs through the compositions in lexicographic order (reading from right to left). This algorithm was suggested for bootstrapping by Fisher and Hall (1991).

__ algorithm to run through compositions __.

(1) Set .

(2) Let first with . Set .

(3) Stop when .

For example, the following list gives all 35 compoisitions in in the order produced by the N.W. algroithm

Intermediary between the complete enumeration of all samples and the drawing of a few samples at random lies the Balanced Bootstrap, (see Gleason, 1988, American Statistician,263-266) for a simple algorithm.

The idea is that as when we count all samples each observation has the same chance of appearing, we would like to have each observation appearing overall the same number of times after all bootstrap resamples have been drawn.

We can be ensured of this by taking a list of repetitions of each of the observations and repeating them times, take a permutation of these values and then take the first as the first sample, and so on.

This is the method used for drawing a sample at random from the empirical distribution. I will start by giving a history and general remarks about Monte Carlo methods for those who have never studied them before.

What is a Monte Carlo Method?

In physics and statistics many of the problems Monte Carlo is used on is under the form of the estimate of an integral unkown in closed form:

- The crude, or mean-value Monte Carlo method thus
proposes to generate numbers
uniformly from and take their average:
to estimate ,

- The hit-or-miss Monte Carlo method
generates random points in a bounded
rectangle and
counts the number of 'hits' or points that are
in the region whose area we want to evaluate.

This is similar to comparing statistical estimators in general.

There are certain desirable properties that we want estimators to have, consistency which ensures that as the sample size increases the estimates converges to the right answer is ensured here by the properties of Riemann sums. Other properties of interest are:

- Unbiasedness.
- Minimal Variance.

So the choice between them lies in finding the one which has the less variance. The heuristic I developed in class to see that the hit-and-miss has a higher variance is based on the idea that the variance comes from the added randomness of generating both coordinates at random, instead of just the absissae in the crude Monte Carlo.

More precisely, the variance of crude Monte Carlo is

and that of hit and miss Monte Carlo, which is just a Binomial(n,) is:

The difference between these two variances is always positive:

Most improvements to Monte Carlo methods are variance-reduction techniques.

This the idea in antithetic resampling (see Hall, 1989). Suppose we are interested with a real-valued parameter, and that we have ordered our original sample , for each resample and statistic we associate by taking a special permutation of the 's that will make , and as small as possible. If the statistic is a smooth function of mean for instance, then the 'reversal permutation' that maps into , into , etc... is the best, the small sample values are transformed into the larger observations, and the average of these two estimates will give an estimate with smaller variance.

We often speak of the asymptotic properties of
the sample mean .These refer to the sequence .
These functions are the *same* in some sense, for all
sample size. The notion of statistical functional makes this
clearer.

Suppose we are interested in real-valued parameters.
We often have a situation where the parameter of
interest is a function of the
distribution function , these are called statistical functionals.
Examples:

Goodness of fit statistics:

is estimated by:

Ratio of two means.

We use the sample cdf

as the nonparametric estimate of the unknown distribution .

The usual estimates for these functionals are obtained by simply plugging in the empirical distribution function for the unknown theoretical one.

Thus taking into account that
for any function we have:

the plug-in estiamte for the mean is the usual sample estimate, for the variance, it is the biased estimate:

A sequence of cumulative distribution functions is said to converge in distribution to iff on all continuity points of .

We say that if the random variable has
cdf and the rv has cdf ,
*converges in law* to , and we write

This does not mean that and are arbitrarily close, think of the random variables and .

Convergence in Probability

Note:

If
where is a limit distribution and
then
.

This number has a binomial distribution with probability so that the theoretical expectation is and . From this and de Moivre's central limit theorem for binomial, we can show for fixed real

This shows consistency pointwise.

Because of the result noted above, this also ensures that
, this is actually true
uniformly in because Kolmogorov's statistic is pivotal

has a distribution that does not depend on . In fact , as . It is easy for continuous , but also true for larger classes of cdfs (Serfling, Billingsley).

**Definition:**
A statistic is said to be pivotal if its distribution does not
depend on any unknown parameters.

**Example:**
Student's statistic.

- The sampling distribution of the error:
- The bias:
- The standard error:

The bootstrap is said to work if

Typically, we will be in the situation where

If the above is true the bootstrap works if and only if:

We will not give any proofs in this class. (If you want to do a theoretical project, I can guide you to cases where this can be done.)

Examples:
Estimating the bias of the median:

Bootstrapping a probability:

as in the case of finding an error distribution: , we can see that this is finding the volume of a set under a known distribution, and Monte Carlo is the natural route. So in this new notation we see that bootstrapping is a two step process:

- Estimate by .
- Approximate
by its
Monte Carlo approximation
, with B
as big as possible.
When
,
.
Note: is not an unknown parameter.

Even if the numerators are unkowns depending on the real distribution .

This method was the `jackknife`.

This is the first time that the sample was manipulated, each observation is dropped once from the sample, the new distribution function is called when the observation has been dropped, and the functional estimate gives the estimator and their average is denoted

We can use the jackknife for estimating the bias by:

We can show that if the bias is of the order
as in then the Jackknife estimate is
in

So the order of the bias has been decreased from to

Example:

Suppose
for which the plug in estimate is:

%------------------------------------------ function out=jackbias(theta,orig) %Estimate the bias using the jackknife %Theta has to be a character string containg % a valid function name [n,p]=size(orig); lot=feval(theta,orig(2:n,:)); k=length(lot); lo=zeros(n,k); lo(1,:)=lot; lo(n,:)=feval(theta,orig(1:(n-1),:)); for i=(2:(n-1)) lo(i,:)=feval(theta,orig([1:(i-1),(i+1):n],:)); end thetadot=mean(lo); out=(n-1)*thetadot -(n-1)*feval(theta,orig); %------------------------------- function out=ratio(yszs) %Computes the ratio of mean of first %column of mean of second column out=mean(yszs(:,1))/mean(yszs(:,2)); %------------------------------------- >>z=oldpatch-placebo ; y=newpatch-oldpatch >> yz=[y,z] -1200 8406 2601 2342 -2705 8187 1982 8459 -1290 4795 351 3516 -638 4796 -2719 10238 >>ratio(yz) -0.0713 >> jackbias('ratio',yz) ans = 0.0080

**Bootstrap Simulations**

function out=bootbias(theta,orig,B) thetabs=zeros(1,B); for b=(1:B) bs=bsample(orig); thetabs(b)=feval(theta,bs); end theta0=feval(theta,orig); out=mean(thetabs)-theta0; %----------------------------------- >> bootbias('ratio',yz,1000) ans = 0.0085 %----------------------------------- function out=bootmultin(orig,B) [n,p]=size(orig); out=zeros(B,n); for b=1:B inds=randint(1,n,n)+1; for i=1:n out(b,inds(i))=out(b,inds(i))+1; end; end %-----------------------------------

The best thing to do is at the beginning of the study to take a random sub-sample, without any particular stratification and to put it aside for the confirmatory stage. Many scientists are mean with their data, and only have just enough to model, but nowadays the expense of an extra 25 % or so, should be made - especially when the consequences of the study are medical, this is what tukey and mallows call a careful serarate diagnostic.

For instance in Discriminant Analysis

For each observation, do the analysis without that one, and
look whether or not it is well classified, this will give an unbiased
estimate of the percentage of badly classified.
Cross Validation can thus be used when one variable
has the particular status of being explained.

And in regresssion

We want to estimate the prediction error:

This can be done by cross validation, writing:

However it has also been used at the diagnostic stage in principal components, and in classification and regression trees where it helps choose the size of an `optimal tree'.

We call the unknown parameter and our estimate . Suppose that we had an ideal (unrealistic) situation in which we knew the distribution of , we will be interested especially in its quantiles : denote the quantile by and the quantile by .

By definition we have:
and
giving a confidence interval:

- Would we get the same answer without
centering with regards to ?
What we called and were the ideal quantiles from which we build the confidence interval : .Estimated by : using the th quantile of , the 's do NOT cancel out.

- These intervals would not be the same if we took the distribution of to mimick simply that of the 's, (those are seen later as the percentile method estimates).

- By choosing better quantities to bootstrap than the original estimate, through various transforms some of which require a double level bootstrap.
- By using techniques inspired from Monte Carlo methods : importance sampling, antithetic sampling. balanced sampling.
- By transforming the results obtained after an ordinary bootstrap process : bias correction EFRON 1982, accelerated bias correction EFRON 1987 for instance.

**Preliminaries : Pivotal quantities**

Construction of confidence intervals
is based (ideally) on a pivotal quantity
, that is a function of the sample
and the parameter whose distribution
is independant of the parameter
, the sample, or any other unknown
parameter.
Thus for instance no knowledge
is needed about the parameters and
of a normal variable
to construct
confidence intervals for
both of these parameters,
because one has two pivotal quantities available :

Let
denote the largest
quantile of the distribution of the
pivot , and
the parameter space, set of all possible values for
then

is a confidence set whose level is . Thus if , and denote the relevant quantiles for the distributions cited above the corresponding confidence intervals will be

and

Such an ideal situation is of course rare and one has to do with approximate pivots, (sometimes called roots). Therefore errors will be made on the level () of the confidence sets. Level error is sensitive to how far is from being a pivot.

We will present several approaches for finding roots One approach is to studentize the root that is divide it by an estimate of its standard deviation.

We will suppose that the asymptotic variance of is with a corresponding estimate .

The studentized bootstrap or bootstrap-t method (EFRON 1982, HALL 1988) is based on approximating the distribution function of by , distribution of .

Denote by
the th quantile
of the distribution of
,

Thus the equitailed bootstrap-t confidence interval of coverage is :

One way of finding an estimate is to use a second level bootstrap, this is rather expensive computation-wise : one could prefer as an intermediate solution a jackknife estimate.

- Compute the original estimate of : THETA(0)
- For b=1 to B do :
- Generate a sample
- Compute the estimate of based on : THETA(b)
- Use as original data and
compute an estimate of the standard deviation
of

(requires another set of iterations): SIGMA(b). - Compute =(THETA(b)-THETA(0))/ SIGMA(b)

- Compute CNA and CNCA the and 'th quantiles for the sample formed by all the R's.
- LO=THETA(0)-SQRT(N)*CNCA*SIGMA(0)
- HI=THETA(0)-SQRT(N)*CNA*SIGMA(0)

function out=corr(orig) %Correlation coefficient c1=corrcoef(orig); out=c1(1,2); %------------------------------------------ function interv=boott(orig,theta,B,sdB,alpha) %Studentized bootstrap confidence intervals % theta0=feval(theta,orig); [n,p]=size(orig); thetab=zeros(B,1); sdstar=zeros(B,1); thetas=zeros(B,1); for b=1:B indices=myrand(1,n,n); samp=orig(indices,:); thetab(b)=feval(theta,samp); %Compute the bootstrap se,se* sdstar(b)=feval('btsd',samp,theta,n,sdB); %Studentized statistic thetas(b)=(thetab(b)-theta0)/sdstar(b); end se=sqrt(var(thetab)); Pct=100*(alpha/2); lower=prctile(thetas,Pct); upper=prctile(thetas,100-Pct); interv=[(theta0-upper*se) (theta0 - lower*se)]; %---------------------------------------------- function out=btsd(orig,theta,n,NB) %Compute the bootstrap estimate %of the std error of the estimator %defined by the function theta %NB number of bootstrap simulations thetab=zeros(NB,1); for b=(1:NB) indices=myrand(1,n,n); samp=orig(indices,:); thetab(b)=feval(theta,samp); end out=sqrt(var(thetab)); %---------------------------------------------- >> boott(law15,'corr',1000,30,.05) -0.4737 1.0137 %---------------------------------------------- >> boott(law15,'corr',2000,30,.05) -0.2899 0.9801 %----------------------

function out=transcorr(orig) %transformed correlation coefficient c1=corrcoef(orig); rho=c1(1,2); out=.5*log((1+rho)/(1-rho)); >> transcorr(law15) 1.0362 >> tanh(1.03) 0.7739 >> boott(law15,'transcorr',100,30,.05) -0.7841 1.7940 >> tanh(ans) -0.6550 0.9462 >> boott(law15,'transcorr',1000,30,.05) 0.0473 1.7618 >> tanh(ans) 0.0473 0.9427 >> transcorr(law15) 1.0362 >> 2/sqrt(12) 0.5774 >> 1.0362 - 0.5774 0.4588 >> 1.0362 + 0.5774 1.6136 >> tanh([.4588 1.6136]) 0.4291 0.9237 %%%%%%%%%%%%%%%True confidence Intervals% >> prctile(res100k,[5 95]) ans = 0.5307 0.9041

The percentile method consists
in taking the
confidence interval
for as being

Theoretically speaking this is equivalent to replacement of the unknown distribution by the estimate .

- Input the level of the confidence interval
- For b=1 to B do /* B is the number of bootstrap samples*/
- For i=1 to n do /* Generation of Sample */
- Compute
- Combine the into a new data set THETA
- Compute the and 'th quantiles for THETA.

If is linear as we know finding it's expectation is trivial, and we can also find it's variance (recalling that and )

So when is not linear we have to bring ourselves back to that
case by using a Taylor expansion, we hope to find a region where
dwells with high probability and where is linear.

This gives as a first order approximation to the expectation of : and as a second order approximation

This is called the

1) How nonlinear is in a neighborhood of 2) How big is.

We are just interested in the neighborhood of because that's where X lies most of the time. (remember Chebychev).

This method proves a generalisation of a central limit theorem:

as long as exists and is non zero.

By Taylor's theorem:

**Binomial:**
We're interested in estimating , should we

- Do binomial trials with probability of success .
- Do binomial trials with probability of success and then estimate the parameter by

Otherwise it is that is the most efficient.

The delta method provides a route towards variance stabilization, ie making the variance less dependent on (and thus making the distribution closer to being pivotal).

We would like to find transformations such that , where does not depend on .

By taking the transformation to be such that , we have a constant variance.

**Poisson case:**

,

For , we have and is asymptotically pivotal.

**Chisquare:**

Let , where
.
,
,
,
the variance stabilizing
is

a function of two random variables we know about, we need two-dimensional Taylor series, note Then

as a first approximation, this gives us straight away for the expectation and with a little more work for the variance we have :

An improved estimate of the expectation can be obtained by taking the second order Taylor expansion:

From this and from the properties of the expectation:

For -variables it is similar.

Note :
If you don't remember about Taylor series you can
get the computer to remind you by using `Maple`.

Type `maple`, then:

>help(taylor); >taylor(f(x),x=mu,3); for multivariate: >readlib(mtaylor); >mtaylor(f(x,y),[x=a,y=b],3);

To simplify the formulas, suppose
the variances of both and are 1 and the means are 0,
then a simplified version of the argument in Lehmann(page 317)
says that any bivariate distribution with
correlation coefficient

The variance stabilizing transformation is given by the condition, with :

so that

This is called Fishers -transformation.