Quiz 5

Suppose one observed m + n independent random variables, m from one distribution and n from another distribution, and obtained the samples

x₁, x₂, ..., x_m,

and

y₁, y₂, ..., y_n.

Further suppose that it is known that these (real world) distributions have exactly the same shape, although they may have different means. (So the distributions have the same variance, the same skewness, and the same kurtosis, but possibly differ in location; a shift model can be assumed.) Explain how one can generate bootstrap samples from the observed data so that the two distributions also have the same shape in the bootstrap world.

Note: One could get bootstrap samples of the x values by resampling from the combined sample of the observed x values and the observed y values, and get bootstrap samples of the y values by resampling from that same combined sample. The bootstrap x values and the bootstrap y values would then come from exactly the same distribution, which satisfies the stipulation that the underlying distribution of the x values and the underlying distribution of the y values have the same shape. However, this scheme would not be good if the means of the two real world distributions seem clearly different and the purpose of the bootstrapping is to estimate the standard error of the estimator of the difference in the means (which could be the difference of two M-estimators). (If the two means were relatively far apart, sampling from the combined sample would have the bootstrap observations coming from a bimodal distribution whose variance is large relative to the common variance of the distributions underlying the x and y observations, and this could make the estimated standard error of interest way off.)
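To see the variance inflation described in the note, here is a minimal Python sketch comparing the variance of the combined sample with the pooled within-sample variance (the spread left after each sample is centered at its own mean). The function name and the toy data are hypothetical, chosen only to illustrate two well-separated samples.

```python
import statistics

def combined_vs_within_variance(x, y):
    """Return (variance of the combined sample, pooled within-sample variance).

    Both are population-style variances (divide by m + n), so they are
    directly comparable. When the two means are far apart, the first is
    inflated by the gap between the means; the second is not.
    """
    combined = [float(v) for v in x] + [float(v) for v in y]
    var_combined = statistics.pvariance(combined)
    # Center each sample at its own mean before pooling the squared deviations.
    mx, my = statistics.fmean(x), statistics.fmean(y)
    ss_within = sum((xi - mx) ** 2 for xi in x) + sum((yj - my) ** 2 for yj in y)
    var_within = ss_within / len(combined)
    return var_combined, var_within

# Hypothetical toy samples whose means differ by 10.
vc, vw = combined_vs_within_variance([0, 1, 2], [10, 11, 12])
print(vc, vw)
```

For x = [0, 1, 2] and y = [10, 11, 12], the combined variance is 154/6 ≈ 25.7, while the pooled within-sample variance is only 4/6 ≈ 0.67, so bootstrap samples drawn from the combined sample would be far more spread out than either real world distribution.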

Solution

If the shift model is true for the real world, we have

X_i = μ_X + E_i (i = 1, 2, ..., m)

and

Y_j = μ_Y + E_{m+j} (j = 1, 2, ..., n),

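where E₁, E₂, E₃, ..., E_{m+n} are iid random variables with mean zero (so that μ_X and μ_Y are the means of the two distributions, and the two distributions differ only in location).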
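A minimal Python sketch of drawing data from this shift model, assuming the caller supplies a hypothetical `draw_error` function that samples the common error distribution (a standard normal in the usage line, purely for illustration). The same error sampler feeds both samples; only the added constant μ_X or μ_Y differs, which is exactly what "same shape, different location" means.

```python
import random

def simulate_shift_model(mu_x, mu_y, m, n, draw_error, rng):
    """Draw X_i = mu_x + E_i (i = 1..m) and Y_j = mu_y + E_{m+j} (j = 1..n).

    draw_error(rng) is a hypothetical caller-supplied sampler for the common
    error distribution; all m + n errors are drawn iid from it.
    """
    errors = [draw_error(rng) for _ in range(m + n)]
    xs = [mu_x + e for e in errors[:m]]   # first m errors go to the x sample
    ys = [mu_y + e for e in errors[m:]]   # remaining n errors go to the y sample
    return xs, ys

# Illustrative use: standard normal errors, means 5 and 8.
rng = random.Random(0)
xs, ys = simulate_shift_model(5.0, 8.0, 4, 3, lambda r: r.gauss(0.0, 1.0), rng)
```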

We can use the empirical distribution of the pooled residuals to approximate the real world error term distribution in the bootstrap world. (That is, subtract the sample mean of the x sample from each x_i, subtract the sample mean of the y sample from each y_j, and use the m + n residuals obtained from both samples to approximate the error term distribution in the bootstrap world.)
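Computing the pooled residuals takes only a couple of lines. This is a minimal Python sketch; the function name `pooled_residuals` is hypothetical. Centering each sample at its own mean removes the location difference, so what remains estimates the common error distribution.

```python
from statistics import fmean

def pooled_residuals(x, y):
    """Center each sample at its own sample mean and pool the m + n residuals."""
    x_bar, y_bar = fmean(x), fmean(y)
    return [xi - x_bar for xi in x] + [yj - y_bar for yj in y]

r = pooled_residuals([1, 2, 3], [10, 12])
```

Here the x sample contributes residuals -1, 0, 1 and the y sample contributes -1, 1; within each sample the residuals sum to zero, so the pooled residuals do too.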

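To create a bootstrap sample of x and y values, resample m + n times with replacement from the pooled residuals. Add m of the resampled residuals to the estimate of μ_X (the sample mean of the original x sample) to obtain the bootstrap x values, and add the other n resampled residuals to the estimate of μ_Y (the sample mean of the original y sample) to obtain the bootstrap y values. Because both bootstrap samples use errors drawn from the same empirical residual distribution, the bootstrap x and y distributions have exactly the same shape and differ only in location, just as the shift model requires.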
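The whole procedure, and its use for the standard error of the difference in means, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the location estimator is the sample mean (one could substitute an M-estimator, as the note suggests), and the default of B = 2000 bootstrap replications is an arbitrary illustrative choice.

```python
import random
from statistics import fmean, stdev

def shift_model_bootstrap(x, y, rng):
    """Generate one bootstrap (x*, y*) pair under the shift model."""
    m, n = len(x), len(y)
    x_bar, y_bar = fmean(x), fmean(y)
    # Pooled residuals: each sample centered at its own mean.
    residuals = [xi - x_bar for xi in x] + [yj - y_bar for yj in y]
    # Resample m + n residuals with replacement.
    draws = rng.choices(residuals, k=m + n)
    x_star = [x_bar + e for e in draws[:m]]   # m residuals added to the x mean
    y_star = [y_bar + e for e in draws[m:]]   # n residuals added to the y mean
    return x_star, y_star

def bootstrap_se_diff_means(x, y, B=2000, seed=0):
    """Bootstrap estimate of the standard error of mean(x) - mean(y)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(B):
        x_star, y_star = shift_model_bootstrap(x, y, rng)
        diffs.append(fmean(x_star) - fmean(y_star))
    return stdev(diffs)

se = bootstrap_se_diff_means([2.0, 3.5, 4.0, 5.5], [9.0, 10.0, 12.5])
```

Unlike resampling from the combined sample, the spread of the bootstrap differences here reflects only the residual (within-sample) variability, not the gap between the two sample means.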