Lesson 22 Comparing correlations

# Episode 2

Reasoning Resources: Worksheet 1 (A2/A3 size) or on OHT and pupil copies
Whole class preparation
There is a whole family of quantitative measures of correlation which begin with either the full correlation matrix or the same matrix reduced to a four-cell table. Talk through the science/maths graph at the board, showing how to split the data into high and low grades in maths and in science. There are 22 pieces of data. To find the median for science, count up vertically from grade G. The median (11½th) value is halfway between D and E. Draw a horizontal line through this point. Repeat, counting horizontally for maths: the median value is halfway between D and C. Draw a vertical line through this point. Note that the split would be different if we were to ‘count down’ from the highest grades.
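
The median split described above can be sketched in a few lines of Python. This is only an illustration: the grade scale `GRADES` and the helper names `median_position` and `is_high` are assumptions made for this sketch, not part of the worksheet.

```python
# Assumed grade scale, lowest to highest (hypothetical; adjust to the worksheet).
GRADES = ["G", "F", "E", "D", "C", "B", "A", "A*"]


def median_position(n):
    """Position of the median when n values are counted in order."""
    return (n + 1) / 2


def is_high(grade, grade_below_line):
    """Return True if `grade` lies above the median line.

    `grade_below_line` is the grade immediately under the drawn line.
    """
    return GRADES.index(grade) > GRADES.index(grade_below_line)


# With 22 pieces of data the median falls between the 11th and 12th values.
print(median_position(22))  # 11.5

# Science: the line lies halfway between E and D, so D counts as high.
print(is_high("D", "E"))  # True

# Maths: the line lies halfway between D and C, so D counts as low.
print(is_high("D", "D"))  # False
```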

The four cells can be called ‘high/high’, ‘low/low’, ‘high/low’ and ‘low/high’.
In the four cells the ‘1’s and the ‘0’s refer either to success and failure, or to high and low values of the two variables concerned. So cells b and c refer to cases tending to confirm a correlation relation, and cells a and d to the cases tending to disconfirm it. If cells a and d are empty or nearly empty, the ratio of confirming to disconfirming cases is a very large number, and the correlation is nearly perfect.
Then talk through the numbers in terms of high/high (7) and low/low (10) confirming the relation that ‘the better one is in maths, the better in science’, and the others (low/high (3) and high/low (2)) as disconfirming it. Ask different pupils to explain the meaning of this pairing of cells in their own terms. Ask: ‘How can we quantify the relationship between all confirming cases and all disconfirming cases?’ Answers may include ‘There are 17 confirming cases, which is 12 more than the 5 disconfirming cases’, or more sophisticated answers of the type ‘there are about 3 confirming cases to each disconfirming case’.
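
For checking the arithmetic before the lesson, the two kinds of pupil answer can be reproduced from the four cell counts given above. This is a minimal sketch; the dictionary `cells` and the variable names are chosen here for illustration only.

```python
# Cell counts for the science/maths graph, as given in the notes above.
cells = {"high/high": 7, "low/low": 10, "low/high": 3, "high/low": 2}

confirming = cells["high/high"] + cells["low/low"]
disconfirming = cells["low/high"] + cells["high/low"]

# Simpler answer: how many more confirming than disconfirming cases.
difference = confirming - disconfirming

# More sophisticated answer: confirming cases per disconfirming case.
ratio = confirming / disconfirming

print(confirming, disconfirming, difference)  # 17 5 12
print(round(ratio, 1))  # 3.4
```
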
Pair and small group work
Here we use the ratio (b + c)/(a + d), the sum of confirming cases to the sum of disconfirming cases, which agrees with intuitive judgements about relative degrees of correlation: when the sums of confirming and disconfirming cases are the same, the ratio is 1 and the correlation is 0. Allow time for the class to split the other two graphs in a similar fashion. Then conduct another exchange of ideas about good, bad and fair prediction and about the notion of correlation. It may help to draw in top and bottom lines, or narrow and wide ellipses around the scatters, for the runners and science/maths grades, and a rough circle for English/PE.
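
The ratio and its two limiting cases can be tried out with a short function. This is a sketch under stated assumptions: the name `confirming_ratio` is hypothetical, and returning infinity when cells a and d are both empty is a choice made here to stand for the ‘very large number’ of a nearly perfect correlation.

```python
import math


def confirming_ratio(a, b, c, d):
    """Ratio (b + c)/(a + d) of confirming to disconfirming cases.

    Cells b and c hold the confirming cases; cells a and d hold the
    disconfirming cases.
    """
    confirming = b + c
    disconfirming = a + d
    if confirming + disconfirming == 0:
        raise ValueError("the four-cell table is empty")
    if disconfirming == 0:
        # Assumption for this sketch: no disconfirming cases at all.
        return math.inf
    return confirming / disconfirming


# Equal sums of confirming and disconfirming cases: ratio 1, no correlation.
print(confirming_ratio(5, 6, 5, 6))  # 1.0

# No disconfirming cases: the ratio is unbounded.
print(confirming_ratio(0, 8, 9, 0))  # inf
```

Pupils splitting the other two graphs can feed their own four counts into the function and compare the results with their visual judgements of the scatters.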

Higher attaining pupils may be able to link the ratio of confirming to disconfirming cases to the visual ratio of width to length of ellipses, and to the ratio of the range of prediction to the total range. They could see the linear relation as the limit of the correlation relationship. But there is no need to proceed to formal values for correlation, which relate all values to the line of best fit and measure how good that is.
End of Lesson Reflection
This is the lowest level algorithm we can think of that is still intellectually honest. The intention is to provide something which will help the more able pupils to think further about what correlation means, while giving the less able pupils something they can handle. Regardless of how far the pupils have progressed in this activity, they should spend a few minutes discussing the value of the scatter graphs:

• When is a scatter graph a good way to display data?
• How can we tell whether there is a predictive relationship between two sets of data?