Recently there's been a bit of discussion in our Faculty on how to get a reliable evaluation of people's teaching. The traditional approach is the appraisal. At the end of each paper the students get to rate the teacher's performance on various questions, using a five-point Likert scale (i.e. 'Always', 'Usually', 'Sometimes', 'Seldom', 'Never'). For example: "The teacher made it clear what they expected of me." The response 'Always' is given a score of 1, 'Usually' is given 2, and so on down to 'Never', which is given a score of 5. Averaging the responses to these questions across students gives some measure of teaching success, ranging in theory from 1.0 (perfect) through to 5.0 (which we really, really don't want to see happening).
We've also got a general question – "Overall, this teacher was effective". This is also given a score on the same scale.
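To make the arithmetic concrete, here is a minimal Python sketch of the scoring scheme described above. The function name `appraisal_score` and the example responses are made up for illustration, and it assumes every student answers every question, so that a plain average over all answers is the same as averaging question by question.

```python
from statistics import mean

# Score for each Likert response, as described above: lower is better.
LIKERT_SCORES = {
    "Always": 1,
    "Usually": 2,
    "Sometimes": 3,
    "Seldom": 4,
    "Never": 5,
}


def appraisal_score(responses):
    """Average score over all students and all questions.

    `responses` is a list with one entry per student; each entry is the
    list of that student's answers (hypothetical layout, for illustration).
    """
    scores = [LIKERT_SCORES[answer] for student in responses for answer in student]
    return mean(scores)


# Made-up answers from two students to three questions.
example_responses = [
    ["Always", "Usually", "Always"],
    ["Usually", "Usually", "Sometimes"],
]
example_score = appraisal_score(example_responses)
```

The same function applied to a one-question list per student would give the score for the "Overall, this teacher was effective" question on its own.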
A question that's been raised is: Does the "Overall, this teacher was effective" score correlate well with the average of the others?
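One common way to put a number on "correlate well" is the Pearson correlation coefficient between the two scores, taken across papers. The sketch below is only an illustration of that calculation: the `pearson` helper is written out by hand, the score lists are invented rather than real appraisal data, and it assumes the averaged score leaves out the "overall" question.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented per-paper scores, one entry per paper (NOT real data).
averaged_scores = [1.0, 1.2, 1.5, 1.8, 2.1]  # average of the specific questions
overall_scores = [1.0, 1.3, 1.4, 1.9, 2.2]   # the single "overall" question

r = pearson(averaged_scores, overall_scores)
print(f"Pearson r = {r:.2f}")
```

A value of r close to 1 would mean that papers scoring well on the specific questions also score well on the "overall" question. Note that the helper divides by zero if either list has no variation at all, for example if every paper had a perfect 1.0.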
I've been teaching for several years now, and have a whole heap of data to draw from. So I've been analyzing it (from 2008 onwards) and, in the interests of transparency, I'm happy for people to see it. For myself, the answer to the question "does a single 'overall' question get a similar mark to the averaged response of the other questions?" is a clear yes. The graph below shows the two scores plotted against each other, for the different papers that I have taught. For some papers I've had a perfect score: 1.0 from every student for every question. For a couple, the scores have been dismal (above 2 on average):