Recently, on March 28, 2012, I spent the afternoon at the plenary session of an “International Conference” on “Educational Assessment, Accountability, and Equity: Conversations on Validity around the World.” The plenary speaker was Michael T. Kane, “The Samuel J. Messick Chair in Test Validity” at the Educational Testing Service. He talked about validity as measurement scientists deliberate about it, and about some of their soul-searching when they consider the impact of their measurements. Or, as I would put it, wearing my “anthropologist of Nacirema” hat, he talked about the misgivings of an obscure priesthood specializing in an abstruse numerology few understand outside their rarefied convents. Kane, as a master in this polity of conjurers of numbers, gave us, the uninitiated or very peripheral, a glimpse of his doubts and those of other masters as they discover that they are now at the very center of political storms where their more abstruse spells are thrown at opponents for all sorts of reasons having little to do with numerology.
To the extent that I understand it (and I am very far out on the periphery of numerology, or rather, I am at the periphery of the gravity well that might have made me, at some point in my career, a legitimate peripheral participant), it all has to do with the “interpretation” of the test that leads to its being used in a particular case. But Kane and his peers are not quite where Geertz and his peers have been. For one, Kane is deeply concerned with specifying and justifying the interpretive steps. For another, he and his peers have, precisely, been thrust into the center, while symbolic anthropologists are pushed even further away from it.
This occasion was the second in recent weeks when I heard thoughtful (psycho-)metricians wonder about the public face of their craft. I had not suspected how much debate happens among the scientists of the measurement of individual behavior about what happens to the measurements when these measurements are used outside the world of measurement. Kane taught me something about the relationship between the “datum” (an answer to a test question), the inferred “claim” (that Johnny failed the test), and the “warrant” that allows one to make the claim based on the datum. The warrants themselves are “backed” by empirical studies. Thus, everything depends on the quality of the studies that back the warrant that allows for the inference. Things are even more difficult since the various inferences that can be made about the individual as this_test taker can be transformed into inferences about this_field (that is, that Johnny, who failed this reading test, does not know how to read), then transformed into even more general properties of the individual as performed in any_field (that Johnny is “with” this or that syndrome), and then transformed into properties of a population (White vs. Black, poor vs. prosperous, American vs. Chinese).
As I listened, I was particularly struck by his discussion of “warrants” in the making of inferences and the place of various logical and mathematical ways of explaining how one gets to the inference. Listening to this, I understood better why ethnography is looked at askance by measurement scientists: we anthropologists could be said to be “warrant-challenged” when we watch a cock-fight and then make inferences about humanity…
And then things became truly interesting. Kane started to talk about a particular type of inference that shifts from identification (Johnny is with X or Y) to the meting out of high-stakes consequences (that Johnny should be moved to a special education classroom, that he should not receive a degree, that he should not be hired or promoted). He illustrated the difficulties by reminding us of the Supreme Court ruling in Griggs v. Duke Power Co., where the issue was the use of a test (or, more precisely, of inferences about the people who had taken the test) for employment, that is, as a step in the making of a high-stakes decision that could have heavy negative consequences. In effect, the Court extended the notion of validity to include the impact of the test on the life of the taker.
I am fairly sure that no inference from anthropology has ever been debated in the Supreme Court of the United States.
Thus, the Court also, and by implication of course, definitely placed (in the Garfinkel sense) testing as the proper instrument of high-stakes decision making, and the testing scientists as perhaps the most powerful engineers of social structural production (along with the professionals in charge of diagnosing disease and its legitimate political implications). That is, by requiring that tests be “reasonably related” to the job for which the test is required, the Court fully legitimated a process of assembling people and practices that had fully flowered with Thorndike and other measurement specialists when they convinced school people that psychological testing might produce what Dewey and others had appeared to call for: a democratic educational system where the real properties of the child were the sole criteria for the advancement of the child through the rewards of that part of social life (for example, being hired for a job) that the state, through its courts, can regulate.
Thus, the Supreme Court, and by implication of course, placed ETS at the core of the political process and thus made a particular class of scientists the arbiter of this process, all the more so since only they fully understand the means they use (regression formulas and the like) to produce something that later allows human resources personnel, or college admissions officers, to make decisions without appearing to have made them. When I talked about terminating Skynets in my last entry, I did not yet know that I was echoing what some measurement scientists have actually said:
Quantification is a way of making decisions without seeming to decide (Porter 1995: 8).
Porter, Theodore M. 1995. Trust in Numbers: The Pursuit of Objectivity in Science and Public Life. Princeton, NJ: Princeton University Press.