Quiz 5
Suppose one observed m + n independent random variables, m from one distribution, and n
from another distribution, and obtained the samples
x_1, x_2, ..., x_m
and
y_1, y_2, ..., y_n.
Further suppose that it is known that these (real world) distributions have exactly the same shape.
But the distributions may have different means.
(So the distributions have the same variance, same skewness, and same kurtosis, but possibly differ in location;
a shift model can be assumed.)
Explain how one can generate bootstrap samples from the observed data so that the distributions have the same
shape in the bootstrap world.
Note: One could get bootstrap samples of both the x values and the y values by resampling from the
combined sample of the observed x values and the observed y values. The bootstrap samples of the x values and
the bootstrap samples of the y values would then come from exactly the same distribution, which satisfies the
stipulation that the underlying distribution of the x values and the underlying distribution of the y values have
the same shape. But this scheme would not be
good if the means of the two real world distributions seem clearly different and the purpose of the bootstrapping is
to estimate the standard error of the estimator of the difference in the means (which could be the difference of two
M-estimators). (If the two means were relatively
far apart, sampling from the combined sample would have the bootstrap observations coming from a bimodal distribution
with a relatively large variance compared to the common variance of the distributions underlying the x and
y observations, and this could make the estimated standard error of interest way off.)
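The variance inflation described in the note can be checked numerically. The following is only an illustrative sketch: the normal distributions, the means 0 and 10, the sample sizes, and the seed are all assumptions chosen for the demonstration and are not part of the quiz.

```python
import random
import statistics

rng = random.Random(0)  # arbitrary fixed seed, for reproducibility only

# Hypothetical data: both samples are normal with standard deviation 1,
# but the means are 0 and 10 (a pure location shift).
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [rng.gauss(10.0, 1.0) for _ in range(300)]

# Combined sample, as in the scheme the note warns against.
pooled = x + y

var_x = statistics.variance(x)
var_y = statistics.variance(y)
var_pooled = statistics.variance(pooled)

print(f"var(x) = {var_x:.2f}, var(y) = {var_y:.2f}, var(pooled) = {var_pooled:.2f}")
```

With means this far apart, the variance of the combined sample is dominated by the gap between the two groups rather than by the common within-group variance, so a standard error computed from resamples of the combined sample would be too large.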
Solution
If the shift model is true for the real world, we have
X_i = μ_X + E_i (i = 1, 2, ..., m)
and
Y_j = μ_Y + E_{m+j} (j = 1, 2, ..., n),
where E_1, E_2, E_3, ..., E_{m+n} are iid random variables.
We can use the empirical distribution of the pooled residuals to approximate the real world error term distribution
in the bootstrap world. (That is, from each x_i subtract the sample mean of the x sample,
and from each y_j subtract the sample mean of the y sample, and use the residuals obtained
from both samples to approximate the error term distribution in the bootstrap world.)
To create a bootstrap sample of x and y values, resample from the pooled residuals m + n
times, with replacement. Add m of the resampled residuals to the estimate of
μ_X (the sample mean of the original x sample) to obtain the bootstrap sample x values,
and add the other n resampled residuals to the estimate of
μ_Y (the sample mean of the original y sample) to obtain the bootstrap sample y values.
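The procedure can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function names, the number of bootstrap replications, the seeds, and the example data are all hypothetical, and the difference in sample means is used as the estimator only for simplicity (the quiz allows other estimators, such as M-estimators).

```python
import random
import statistics

def shift_model_bootstrap(x, y, rng):
    """Draw one bootstrap (x*, y*) pair under the shift model."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    # Pooled residuals: each observation minus its own sample mean.
    residuals = [xi - mean_x for xi in x] + [yj - mean_y for yj in y]
    # Resample m + n residuals with replacement from the pooled residuals.
    draws = rng.choices(residuals, k=len(x) + len(y))
    # The first m draws are added to the x mean, the other n to the y mean.
    x_star = [mean_x + e for e in draws[:len(x)]]
    y_star = [mean_y + e for e in draws[len(x):]]
    return x_star, y_star

def bootstrap_se_mean_difference(x, y, n_boot=2000, seed=0):
    """Bootstrap standard error of (mean of y) - (mean of x)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        x_star, y_star = shift_model_bootstrap(x, y, rng)
        diffs.append(statistics.fmean(y_star) - statistics.fmean(x_star))
    return statistics.stdev(diffs)

# Hypothetical example data: same shape, means 0 and 10.
data_rng = random.Random(1)
x = [data_rng.gauss(0.0, 1.0) for _ in range(30)]
y = [data_rng.gauss(10.0, 1.0) for _ in range(40)]

se = bootstrap_se_mean_difference(x, y)
print(f"bootstrap SE of the difference in means: {se:.3f}")
```

Because every bootstrap observation is a group mean plus a draw from the same pooled residual distribution, the bootstrap x and y distributions have the same shape and differ only in location, matching the shift model.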