© Michael J. Rosenfeld, 2009, 2012

Notes on terminology for evaluation of research.

1) What we mean and don’t mean by bias.

What we don’t mean: Statistical bias has nothing to do with personal bias or prejudice.

There are two kinds of errors relevant to statistical measurement. One is noise, the other is bias.

Given a set of observed measurements X,

X = Xo + bias + random error

where Xo is the perfect theoretical measure, which we are never able to quite reach.

The random error averages out to zero, so the expected value of X, or let’s say the average of our X measurements, is

E(X) = Xo + bias

Bias is a constant that skews our measurements. We usually can’t tell exactly what the bias is because the true theoretical values, Xo, are not measurable. The only thing we have to examine is X, which comes with bias already in it.

Variance of X comes entirely from the random error.

Key to remember: bias skews the results, whereas random errors increase the variance but do not skew the results.

In a random sample, larger sample size can help reduce the influence of random noise (I will explain this later in the class). But larger sample size usually does nothing to minimize the effect of bias.
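
To see the difference between noise and bias, here is a minimal simulation sketch in Python. The true value, the bias, and the noise level are invented numbers chosen only for illustration (in real research we would not know Xo or the bias), and the function name mean_error is hypothetical. The random error is assumed to be normally distributed with mean zero.

```python
import random
import statistics

def mean_error(n, true_value=10.0, bias=2.0, noise_sd=5.0, seed=0):
    """Average of n simulated measurements minus the true value Xo.

    Each measurement follows X = Xo + bias + random error, where the
    random error is drawn from a normal distribution with mean zero.
    All numeric defaults are made-up illustration values.
    """
    rng = random.Random(seed)
    xs = [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]
    return statistics.fmean(xs) - true_value

# Compare how far the sample average lands from Xo at different sample sizes.
for n in (10, 1000, 100000):
    print(n, round(mean_error(n), 3))
```

With small n the printed error bounces around because of the noise. As n grows, the error should settle near the bias (2.0 here) rather than near zero: the larger sample averages away the random error but leaves the bias untouched.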

The world of research is full of biases and potential biases, only some of which we will actually discuss and exemplify in this class.

2) Sampling frame. This is the universe of individuals about whom we want to know something (potential voters, all persons, adults). Think of this as the list of people we might want to randomly sample from. Of course, in real life, we usually don’t have a full and complete list of everyone in the desired sampling frame. Another term for the sampling frame is the sampling universe; this is the population your data pertain to. You *must* always know what your sampling frame is. We make hypotheses about the wider universe, or the sampling frame, but we measure the sample. Keep the two separate in your mind.

3) Random sampling versus Convenience sampling

* Random sampling occurs when subjects are chosen at random and every subject in your sampling frame has an equal chance of being sampled. A random sample is generalizable to the whole sampling frame.

The purpose of statistical analysis is usually to use the data in our sample to test hypotheses about the whole sampling frame.

* Convenience sampling occurs when you find the subjects who are easiest or most convenient to find, and interview them. Convenience samples are not generalizable, meaning that even if we know a lot about our sample, we may not be able to make inferences about the wider sampling frame.
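
The contrast between the two kinds of sampling can be sketched with a small simulation. Everything here is hypothetical: the sampling frame is 10,000 invented people, each with an "ease of contact" score, and the outcome we care about is assumed to be higher for people who are easier to find. That assumed link is what makes the convenience sample misleading.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical sampling frame: (ease_of_contact, outcome) for each person.
# The outcome is built to rise with ease of contact; this is an assumption
# made purely for illustration.
population = []
for _ in range(10000):
    ease = rng.random()  # 0 = hard to find, 1 = easy to find
    outcome = 50 + 20 * ease + rng.gauss(0, 5)
    population.append((ease, outcome))

frame_mean = statistics.fmean([o for _, o in population])

# Random sample: every person has the same chance of selection.
random_sample = rng.sample(population, 500)
random_mean = statistics.fmean([o for _, o in random_sample])

# Convenience sample: the 500 people who are easiest to find.
easiest_first = sorted(population, key=lambda p: p[0], reverse=True)
convenience_mean = statistics.fmean([o for _, o in easiest_first[:500]])

print("sampling frame mean:    ", round(frame_mean, 2))
print("random sample mean:     ", round(random_mean, 2))
print("convenience sample mean:", round(convenience_mean, 2))
```

Under these assumptions the random sample mean should land close to the sampling frame mean, while the convenience sample mean should come out well above it. Both samples have the same size; only the way the subjects were chosen differs.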

There are other kinds of sampling as well, which fall in between these extremes:

* Snowball Sampling

* Respondent Driven Sampling (a variety of Snowball Sampling)

* Stratified Random Sampling

And what if our data contain not a sample but the entire sampling frame? All 50 states, for instance? This is what Rice refers to as sampling fraction=1. Certainly, if the sampling fraction=1, we are not going to make probabilistic arguments about the sampling frame because we have measured the whole sampling frame. There may still be some use in fitting models to the data, but standard errors of coefficients may not be meaningful.

Kinds of bias or potential bias

<page under construction>