From Shawn Cornally.
The consistently fascinating Mind Hacks has a story today about a pretty sketchy study from 1938, in which researchers tried to measure some aspects of student conversation without inducing any reactivity.
In order not to introduce artifacts into the conversations, the investigators took special precautions to keep the subjects ignorant of the fact that their remarks were being recorded. To this end they concealed themselves under beds in students’ rooms where tea parties were being held, eavesdropped in dormitory smoking-rooms and dormitory wash-rooms, and listened to telephone conversations.
The rest of his write-up is here.
Dave Kleinschmidt has some commentary on the Nate Silver fangirl/boy-ing that many of us quantitative types have been engaging in for the last week.
My tribe, the data nerds, is feeling pretty smug right now, after Nate Silver’s smart poll aggregation totally nailed the election results. But we’re also a little puzzled by the cavalier way in which what Nate Silver does is described as just “math”, or “simple statistics”. There is a huge amount of judgement, and hence subjectivity, required in designing the kind of statistical models that 538 uses. I hesitate to bring this up because it’s one of the clubs idiots use to beat up on Nate Silver, but 538 does not weight all polls equally, and (correct me if I’m wrong) the weights are actually set by hand using a complex series of formulae.
The point is that the kind of model-building that Nate Silver et al. do is not just “math”, but science. This is why I don’t really like that XKCD comic that everyone has seen by now. Well I like the smug tone, because that is how I, a data scientist, feel about 538’s success. That is right on. But we’ve known that numbers work for a long time. Nate Silver and 538 is not just about numbers, about quantifying things. Pollsters have been doing that for a long time. It is about understanding the structured uncertainty in those numbers, the underlying statistical structure, the interesting relationships between the obvious data (polling numbers) and the less obvious data (economic activity, barometric pressure, etc.) and using that understanding to combine lots of little pieces of data into one, honkin’, solid piece of data.
When I teach stats, or talk about stats in my other classes, I try to hammer on this point about uncertainty. As scientists, we're dealing with noise in our data from all kinds of places. Is the sample under study "weird" in some way? Is our measure noisy? How noisy? How variable are people? Why? Does the time of day/day of week/week of year when people are tested matter? We can estimate how much uncertainty (what statisticians call "error") comes from each of these sources, and try to figure out if there's a structure/pattern underneath the noise, but in order to do that successfully you have to really think about the sources of the error. I think every time I've been really screwed over by an experiment, it's been because there was a source of variability or a kind of variability that I just didn't expect.
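For anyone who wants to poke at this idea, here is a minimal sketch of separating two sources of error in simulated data: stable differences between people and trial-to-trial measurement noise. Everything in it is an assumption made up for illustration (the function names, the sample sizes, the standard deviations of 2.0 and 1.0), and it uses a simple method-of-moments estimate rather than the mixed models a real analysis would probably call for.

```python
import random
import statistics

def simulate_scores(n_people, n_trials, sd_people, sd_measure, seed=0):
    """Hypothetical data: each person has a stable true score, and every
    trial adds independent measurement noise on top of it."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_people):
        true_score = rng.gauss(0.0, sd_people)
        data.append([true_score + rng.gauss(0.0, sd_measure)
                     for _ in range(n_trials)])
    return data

def variance_components(data):
    """Estimate between-person variance and measurement variance."""
    n_trials = len(data[0])
    # Measurement noise: average spread of repeated trials within a person.
    var_measure = statistics.mean(statistics.variance(row) for row in data)
    # Person means still carry var_measure / n_trials of leftover noise,
    # so subtract that before calling the rest "between-person" variance.
    var_of_means = statistics.variance([statistics.mean(row) for row in data])
    var_people = max(var_of_means - var_measure / n_trials, 0.0)
    return var_people, var_measure

# Assumed values: people differ with SD 2.0, the measure is noisy with SD 1.0.
data = simulate_scores(n_people=2000, n_trials=5, sd_people=2.0, sd_measure=1.0)
var_people, var_measure = variance_components(data)
print(f"between-person variance: {var_people:.2f} (simulated with 4.00)")
print(f"measurement variance:    {var_measure:.2f} (simulated with 1.00)")
```

The catch, of course, is that this only recovers the sources of variability you thought to build into the design. A source you didn't expect (time of day, say) just gets folded into one of the two estimates.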
My Research Methods class has been wrestling lately with the relationships between statistical significance, power, and effect sizes, and with the balance between what's best, or most true, and what's actually practiced by researchers.
This post from a few months ago has some nice discussion of the difference between small-but-significant effects, and big whopping useful/meaningful effects:
The problem with small effect sizes is that they mean all you've done is nudge the system. The embodied nervous system is exquisitely sensitive to variations in the flow of information it is interacting with, and it's not clear to me that merely nudging such a system is all that great an achievement. What's really impressive is when you properly break it – If you can alter the information in a task and simply make it so that the task becomes impossible for an organism, then you have found something that the system considers really important. The reverse is also true, of course – if you find the right way to present the information the system needs, then performance should become trivially easy.
Their example of breaking the right thing is a bit hard to understand without reading the linked materials, but their example of fixing the right thing is beautiful:
A real problem in visually guided action is the accurate, metric perception of size (to pick an object up, you need to scale your hand to the right size ahead of time). Study after study after study has showed that vision simply can't provide this without haptic feedback from touching the object; but we do scale our hands correctly! The question is how do we do it? Geoff has been plugging away at this for years, trying to provide people with what he thought were sensible opportunities to explore objects visually, with no luck, until he rotated the objects through 45° (a huge amount in vision). BAM! Suddenly people could visually perceive metric shape, and this persisted over time without being constantly topped up (Lee & Bingham, 2010). Suddenly we knew how we did this task; metric visual perception of shape is enabled by all the large scale locomotion we get up to – moving into a room, for example. Without this calibration, the task was impossible, but as soon as the right manipulation was made, the impossible became straight-forward, and the effect size is huge.
If you're still trying to make sense of why psychologists care about effect sizes, take a look.
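To make the nudge-versus-break distinction concrete, here is a small simulated sketch. It compares a tiny effect measured on a huge sample with a large effect measured on a small one, reporting Cohen's d alongside an approximate p-value. The sample sizes, the true effects (0.05 and 2.0 standard deviations), and the function names are all assumptions chosen for illustration, and the p-value uses a normal approximation instead of a proper t-test, so treat it as rough for the small sample.

```python
import math
import random
import statistics

def simulate_groups(n, true_d, seed):
    """Hypothetical control and treated groups with unit SD; the treated
    mean is shifted by true_d standard deviations."""
    rng = random.Random(seed)
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treated = [rng.gauss(true_d, 1.0) for _ in range(n)]
    return control, treated

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((len(a) - 1) * statistics.variance(a)
                  + (len(b) - 1) * statistics.variance(b)) / (len(a) + len(b) - 2)
    return (statistics.mean(b) - statistics.mean(a)) / math.sqrt(pooled_var)

def approx_p_value(a, b):
    """Two-sided p-value from a normal approximation to the test statistic."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    z = (statistics.mean(b) - statistics.mean(a)) / se
    return math.erfc(abs(z) / math.sqrt(2))

# A "nudge": tiny true effect, enormous sample.
control, treated = simulate_groups(n=50_000, true_d=0.05, seed=1)
nudge_d, nudge_p = cohens_d(control, treated), approx_p_value(control, treated)

# A "break": large true effect, modest sample.
control, treated = simulate_groups(n=40, true_d=2.0, seed=2)
break_d, break_p = cohens_d(control, treated), approx_p_value(control, treated)

print(f"nudge: d = {nudge_d:.3f}, p = {nudge_p:.2g}")
print(f"break: d = {break_d:.3f}, p = {break_p:.2g}")
```

Both comparisons should come out "significant", but only the second describes a difference you could see by eyeballing the two groups. The p-value mostly tells you how much data you collected; the effect size tells you how hard you hit the system.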