

Transforming continuous data into ranks

Continuous data are in general numbers with decimals, sometimes just (large) whole numbers. Continuous really means that once you have observed two values you could also observe a value between them: there is not a fixed number of categories.

Ordinal data are numbers that can be ordered. Years, ages, number of siblings, weight and height are all ordinal.

Hair color, eye color, and success/failure are not ordinal.

Brad considered two variables simultaneously in his ballot problem, but they were categorical.

We are going to do the same but with ordinal data.

We are looking for association between variables. We will see later that this is not the same as finding a causal relationship between them: an association may just mean that both variables are influenced by a third characteristic.

Looking at 2 ordinal variables at the same time

Here we study two ordinal variables measured on the same subjects, and we want to see whether they are correlated positively, negatively, or not at all.

If you want to see some interactive calculations of correlations for various scatterplots, here are some good surfing spots:

  1. Different plots
  2. Guess the correlation game
  3. Scatterplot builder
  4. Phil Stark's Correlation Demo

Albino Male Mice


Intell. Domin.
   45.0   63.7
   26.0    0.1
   20.0   15.6
   40.0  101.2
   36.0   25.4
   23.0    1.8
We rank the data:

Intell. Domin.
   6        5
   3        1
   1        3
   5        6
   4        4
   2        2
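The ranking step can be sketched in a few lines of Python (our choice of language; the notes do not show their own code). This minimal sketch assumes there are no ties, which holds for both columns here; tied values would usually receive averaged ranks, which this hypothetical helper `ranks` does not handle.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

intelligence = [45.0, 26.0, 20.0, 40.0, 36.0, 23.0]
dominance = [63.7, 0.1, 15.6, 101.2, 25.4, 1.8]

print(ranks(intelligence))  # [6, 3, 1, 5, 4, 2]
print(ranks(dominance))     # [5, 1, 3, 6, 4, 2]
```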
We reorder the rows so that the x values (the intelligence ranks) are increasing, keeping each observation's row whole.

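Then we look down the Y column. A pair of rows is concordant if the lower row (larger x) also has the larger y, and discordant otherwise. Call the number of concordant pairs P and the number of discordant pairs Q; here P = 11 and Q = 4. With 6 observations there are $\frac{6\times 5}{2}=15$ possible pairs in all.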

Kendall's tau is given by:

\begin{displaymath}\tau=\frac{P-Q}{15}=\frac{7}{15}\approx \frac{1}{2}\end{displaymath}
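Counting P and Q can also be done without physically reordering the rows: a pair of subjects is concordant when their x difference and their y difference have the same sign. A minimal Python sketch, assuming no ties (the function name `concordance_counts` is our own, hypothetical):

```python
from itertools import combinations

def concordance_counts(x, y):
    """Return (P, Q): numbers of concordant and discordant pairs (no ties assumed)."""
    P = Q = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        if prod > 0:      # x and y move in the same direction
            P += 1
        elif prod < 0:    # x and y move in opposite directions
            Q += 1
    return P, Q

# Ranks of the six mice, in the original row order
x = [6, 3, 1, 5, 4, 2]   # intelligence
y = [5, 1, 3, 6, 4, 2]   # dominance

P, Q = concordance_counts(x, y)
n_pairs = len(x) * (len(x) - 1) // 2
tau = (P - Q) / n_pairs
print(P, Q, n_pairs)   # 11 4 15
print(round(tau, 3))   # 0.467
```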

The question we can ask is: does a value of about .5 indicate a strong association when there are only 6 mice?

We answered it by doing a simulation study, just like Brad's. Here our null hypothesis is that there is no association, so the ranks in the Y column could have appeared in any order. We can therefore take 1000 or 2000 random permutations of the numbers 1 to 6, put each one in the Y column, and each time recompute the value of tau (equivalently, of P - Q) that we would have obtained.

The question we ask is: how many of these simulated values of P - Q were larger than the observed value 7?

For 1000 simulations, we get 57 values larger than 7, giving a p-value of $57/1000=0.057$. This is borderline significant, so I redid the simulation with 5,000 random permutations and got $343$ values larger than 7 out of 5,000. This gives a p-value of

\begin{displaymath}\frac{343}{5000}=0.0686
\end{displaymath}

Not significant.

There is not a case for a strong association between social dominance and IQ in this data.
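The permutation test above can be sketched in Python as follows; this is our own code, not the one used for the notes. The helper `kendall_s` (a hypothetical name) returns P - Q, and the seed is arbitrary, so the simulated count will not reproduce the 57 and 343 reported above. With only 6 mice we can also enumerate all 6! = 720 orderings of the Y column exactly: 49 of them give P - Q larger than 7, so the exact version of the p-value above is 49/720, about 0.068, in line with the simulations. Permutation tests often also count orderings that tie the observed value (P - Q of at least 7); that count is 98, a p-value of about 0.136. Either way the result is not significant.

```python
import random
from itertools import combinations, permutations

def kendall_s(x, y):
    """P - Q: concordant minus discordant pairs (no ties assumed)."""
    s = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)
    return s

x = [1, 2, 3, 4, 5, 6]   # intelligence ranks, rows sorted by x
y = [3, 2, 1, 4, 6, 5]   # dominance ranks in the same rows
observed = kendall_s(x, y)   # 7

# Monte Carlo: under the null hypothesis every order of the Y column is equally likely
rng = random.Random(2001)    # arbitrary seed, for reproducibility
n_sim = 5000
perm = list(y)
larger = 0
for _ in range(n_sim):
    rng.shuffle(perm)
    if kendall_s(x, perm) > observed:
        larger += 1
print("simulated p-value:", larger / n_sim)

# Exact: enumerate all 6! = 720 orderings of the Y column
all_s = [kendall_s(x, list(p)) for p in permutations(y)]
n_larger = sum(s > observed for s in all_s)
n_at_least = sum(s >= observed for s in all_s)
print(n_larger, n_at_least, len(all_s))   # 49 98 720
```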

Crying Babies and Moving Eyes

The eye movement data:



980.8 926.4 892.9 870.2 854.6 777.2  772.6 702.4 561.7
4.85  4.41  3.80  4.53  4.33   3.81   3.97  3.68  3.43

Making graphics: we will talk a lot about scatterplots when we look at two variables measured on the same subjects.

We create them by taking a coordinate system whose limits are fixed by the ranges of the two variables, always plotting the first variable as the horizontal coordinate and the second as the vertical one.

The transformed data:


 mateye
     9     8     7     6     5     4     3     2     1
     9     7     3     8     6     4     5     2     1
The ranks can also be plotted. The plot is still meaningful, but we have lost the precision of the original values and some extra information about certain clusters.
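The two rows of `mateye` can be reproduced from the raw eye-movement data with a short Python sketch, which also recomputes P - Q on the ranked data. It assumes no ties; `first` and `second` are hypothetical names, since the notes do not label the two measurements.

```python
from itertools import combinations

def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

# Hypothetical names: the notes do not say what the two rows measure
first = [980.8, 926.4, 892.9, 870.2, 854.6, 777.2, 772.6, 702.4, 561.7]
second = [4.85, 4.41, 3.80, 4.53, 4.33, 3.81, 3.97, 3.68, 3.43]

rx, ry = ranks(first), ranks(second)
print(rx)   # [9, 8, 7, 6, 5, 4, 3, 2, 1]
print(ry)   # [9, 7, 3, 8, 6, 4, 5, 2, 1]

# P - Q on the ranks (the same as on the raw values, since only order matters)
s = 0
for i, j in combinations(range(len(rx)), 2):
    prod = (rx[i] - rx[j]) * (ry[i] - ry[j])
    s += (prod > 0) - (prod < 0)
print(s)    # 24
```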

P-value?


 tau(mateye)=24
 sum(outeye>24)= 12
 12/2000=    0.0060
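The commands above are S-style transcripts whose definitions are not shown in the notes. A minimal Python sketch of the same Monte Carlo permutation test follows; `kendall_s` and `perm_pvalue` are our own hypothetical helpers, and with an arbitrary seed the count will not exactly match the 12 out of 2000 above.

```python
import random
from itertools import combinations

def kendall_s(x, y):
    """P - Q: concordant minus discordant pairs (no ties assumed)."""
    s = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)
    return s

def perm_pvalue(x, y, n_sim=2000, seed=None):
    """Shuffle y n_sim times; return (observed P - Q, share of larger values)."""
    rng = random.Random(seed)
    observed = kendall_s(x, y)
    perm = list(y)
    larger = 0
    for _ in range(n_sim):
        rng.shuffle(perm)
        if kendall_s(x, perm) > observed:
            larger += 1
    return observed, larger / n_sim

# The two rows of mateye
rx = [9, 8, 7, 6, 5, 4, 3, 2, 1]
ry = [9, 7, 3, 8, 6, 4, 5, 2, 1]
observed, p = perm_pvalue(rx, ry, n_sim=2000, seed=2001)
print(observed, p)
```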

Description:
Do babies who cry more tend to have a higher IQ later?
Cry count and Stanford-Binet IQ for 14 out of a total of 22 babies; here are the data:
Cries   20  17  14  23  13  27  18  15  22  16  12  19  26  21
IQ      90  94 100 103 106 108 109 112 114 118 119 132 155 157

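Maximum possible value of P - Q (every pair concordant): $\frac{14\times 13}{2}=91$ pairs.

Observed P - Q = 7 (P = 49 concordant and Q = 42 discordant pairs), so $\tau = 7/91 \approx 0.08$.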
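Since only the order of the values matters, P and Q can be counted directly on the raw cry counts and IQ scores, with no ranking step. A minimal Python sketch (our own code, not the notes'; neither variable has ties here):

```python
from itertools import combinations

cries = [20, 17, 14, 23, 13, 27, 18, 15, 22, 16, 12, 19, 26, 21]
iq = [90, 94, 100, 103, 106, 108, 109, 112, 114, 118, 119, 132, 155, 157]

P = Q = 0
for i, j in combinations(range(len(cries)), 2):
    prod = (cries[i] - cries[j]) * (iq[i] - iq[j])
    if prod > 0:      # more crying goes with higher IQ for this pair
        P += 1
    elif prod < 0:    # more crying goes with lower IQ for this pair
        Q += 1

n_pairs = len(cries) * (len(cries) - 1) // 2
print(P, Q, P - Q, n_pairs)   # 49 42 7 91
```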

Is it significant? How many of the 2000 simulated values of P - Q are larger than 7?
sum(outcry>7)

670 out of 2000, giving a p-value of 0.3350.
Not significant.

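Visualization: in the scatterplot, join each pair of points by a line; lines sloping upward correspond to concordant pairs and lines sloping downward to discordant pairs.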

Other measurements of correlation:

  1. Pearson's correlation coefficient.
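  2. Spearman's rank correlation coefficient, based on $\sum_i \left(r(x_i)-r(y_i)\right)^2$, where $r(x_i)$ and $r(y_i)$ are the ranks of subject $i$ on the two variables; with no ties, $r_s = 1-\frac{6\sum_i \left(r(x_i)-r(y_i)\right)^2}{n(n^2-1)}$.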
  3. Tukey's corner test.
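For comparison, Spearman's coefficient for the mice data can be sketched in Python, assuming no ties (the helpers `spearman_no_ties` and `pearson` are our own, hypothetical names). The sketch also illustrates that Spearman's coefficient equals Pearson's correlation computed on the ranks.

```python
def spearman_no_ties(rx, ry):
    """Return (sum of squared rank differences, Spearman's r_s); no ties assumed."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return d2, 1 - 6 * d2 / (n * (n ** 2 - 1))

def pearson(x, y):
    """Pearson's correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Ranks of the six mice (intelligence, dominance)
rx = [6, 3, 1, 5, 4, 2]
ry = [5, 1, 3, 6, 4, 2]

d2, rs = spearman_no_ties(rx, ry)
print(d2, round(rs, 3))            # 10 0.714
print(round(pearson(rx, ry), 3))   # 0.714

# Pearson's coefficient on the raw measurements, for comparison
intelligence = [45.0, 26.0, 20.0, 40.0, 36.0, 23.0]
dominance = [63.7, 0.1, 15.6, 101.2, 25.4, 1.8]
print(round(pearson(intelligence, dominance), 3))
```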



Susan Holmes
2001-02-08