
Recently, Dan Honig of Johns Hopkins forwarded Ranil and me some thoughts he had in reaction to an Albert Hirschman book on development projects, thoughts he felt were pertinent to the discussion on the pros and cons of RCTs. What followed is a discussion (rant) between Dan, Ranil and me. I've edited out the e-maily bits for clarity:
Dan:
Matt, just after I hit send on this I realized I should have included you on this - I generally think you're right on RCTs and the staleness of the conversation (and discussed this with Ranil a few months back, some 2 days after you had dinner with him, hence the cc to Ranil) but feel like I've never seen this Hirschman frame and am wondering if it struck you as interesting. And yes, basically I'm trying to catalyze you writing something cool on this so I can quote/reference it down the road.
Reading Hirschman's Development Projects Observed for the first time, and as I read it he's with [Lant Pritchett and Michael Woolcock] on RCTs and causal density in international development projects. The quote below is from page 186 of the 1967 edition; italics are his, brackets mine. Just before this he suggests we may not be able to identify good indicators of effects ex-ante, which presumably means they couldn't be pre-specified in a trial, and thus that we would be ill served by an RCT on a particular intervention even if we ignored external validity concerns.
"The indirect effects [of development projects] are so varied as to escape detection by one or even several criteria uniformly applied to all projects. Upon inspection, each project turns out to represent a unique constellation of experiences and consequences, of direct and indirect effects."
Matt:
Hey Dan, that's a really interesting quote by Hirschman. If my interpretation is correct, it seems to be more damning for empirical evaluation in general than for RCTs in particular.
I'm not sure how I feel about this. Even if you move away from a simple, reduced-form causal framework, Hirschman's critique seems like it would still apply. Even if development is a messy, complex thing that can't really be boiled down into an impact evaluation framework, we still rely on measurement when we talk about development, and any given set of measurements is going to leave out unmeasured things that might matter. We can point at improving test scores but leave out student stress, etc., and the set of potentially important things we leave out will change depending on the context. I guess I see this as a problem of measurement rather than as a problem for RCTs.
I also wonder what this means for how an empirical researcher operates. Over the last few years, I have become incredibly suspicious of surprising, counter-intuitive results, where a researcher measures something outside of the standard set of outcomes and finds a result. In a world of multiple hypothesis tests, expanding the set of outcomes to include as much of Hirschman's unique constellation as possible will open up the door to a lot of false positives which will end up getting written up and published.
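Matt's multiple-testing worry is easy to make concrete with a toy simulation. The sketch below is illustrative only (the setup and the 5% threshold are my assumptions, not anything from the conversation): every outcome is pure noise, each test "finds" an effect with probability 0.05, and the chance of at least one false positive climbs quickly as the constellation of outcomes expands.

```python
import random

def familywise_fp_rate(n_outcomes, alpha=0.05, n_sims=10_000, seed=0):
    """Simulated chance of at least one 'significant' result when every
    outcome is pure noise (each test rejects with probability alpha)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        if any(rng.random() < alpha for _ in range(n_outcomes)):
            hits += 1
    return hits / n_sims

for k in (1, 5, 20):
    # analytic answer is 1 - (1 - alpha)**k: roughly 0.05, 0.23, 0.64
    print(f"{k:2d} outcomes -> P(>=1 false positive) ~ {familywise_fp_rate(k):.2f}")
```

Pre-specifying one primary outcome keeps the false-positive rate at alpha; scanning twenty unanticipated outcomes makes a "finding" more likely than not, which is why surprising results on non-pre-specified outcomes invite extra suspicion.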
So that was a rant. Um, what do you think Ranil?
Dan:
Really interesting insight - I think the Hirschman thrust goes beyond RCTs, but is limited (maybe?) to empirics (quantitative and qualitative) set ex-ante. In contract-theory language I'd say Hirschman is claiming that effects are non-contractible, as they can't be specified ex-ante, and sometimes observable but non-verifiable, and thus not tractable to quantitative inquiry ex-post. So this is a big problem for both RCTs and e.g. Cash on Delivery aid, but it doesn't necessarily foreclose empirical evaluation (quant and qual) of project impacts, as long as impacts not foreseen ex-ante are picked up in the eval. Of course this runs against what I normally think of as progress in more empirical econ/poli sci... I'm also not sure that trying to look at everything you can (and thus getting lots of false positives) would be his answer, versus a more inductive/mixed-methods-y kind of thing. So Hirschman might go for Biju Rao/poverty-observatory-type stuff as a response, though of course I'm totally speculating...
And yes, Ranil, drop some knowledge on us?
Also Matt, why should I be more suspicious of the counter-intuitive than the intuitive result? E.g. why should I doubt Jesse Driscoll's voter suppression finding any more or less than a result of democracy promotion assistance on citizen activism? This is a genuine question; I suspect you have a good answer that could turn me around on this...
Matt:
Maybe counter-intuitive wasn't the right way to phrase it. Shouldn't we be wary of an effect on an outcome we didn't expect/hypothesize/theorize up front? Isn't that part of the intuition behind pre-analysis plans?
I guess I'm also ambivalent about this. It seems reasonable that the probability that a result is 'true' is higher if we had a well-defined explanation for why it should be true in place before we checked to see if it was true, rather than formulating an ex-post explanation for why it was true when we find a result on an outcome we hadn't pre-specified. How many times have we seen papers trying to explain an unusual finding in this way?
So maybe I'm just more wary of counter-intuitive results which are presented to me by other people, knowing the incentives that currently exist for researchers...
Ranil:
Hi both, just catching up with this fascinating conversation now.
My issue with the Hirschman quote is similar to but not the same as Matt's. Observation of any kind involves the imposition of "frames of legibility" - ways in which we choose to interpret phenomena. Whether ex-ante or ex-post, we need to establish parameters on which we can understand a thing: ex ante, this takes the form of choosing which simplifications to measure (and all measures are simplifications of some corresponding reality, and thus involve omission as well as codification); ex post, it involves imposing analytical frames on observed experience so that we can talk about it, since any way of talking about a thing that happened is a simplification or codification of it - otherwise it would just be a complete restatement of the thing that happened.
So, I guess what I'm saying is that regardless of what technique you use to understand a project, you can never capture all of its uniqueness. You are always making choices about which aspects of it to represent. Either ex ante or ex post, there are things you are omitting in order to render your "description" or "analysis" of the project something simpler and less time-consuming than reliving the project itself. Hirschman's argument is therefore less about uniqueness and complexity and more about the ability to choose what things to report afterwards, and it hinges on the implicit assumption that you can observe all of this uniqueness ex post - which I would really doubt. The frames of reference we observe a thing with inevitably blind us to some angles of it, which is one reason why history keeps getting updated, even when the events (and often the data) haven't changed.
[By the way, none of this is really original thought - it's just applying James Scott's Seeing Like a State, which I'm reading at the moment, to research. This of course supports the point I'm making: the book has influenced the framing I'm using in my head, and I'm interpreting the problem accordingly.]
Ultimately, as Matt gets at, the thing I think matters most of all is having a credible theory to support the empirical observations. The big advantage of ex-ante indicators is that they force you to set out a theory ahead of time, which is a marker of credibility of thought. Ex-post observation is harder because humans have a behavioural bias towards over-specified explanations. We try to impose an ordered reason on things that might just be random. This is basically why I, like Matt, am a bit sceptical of "we thought we'd find X did Y, but you won't guess the effect it had on Z" papers, unless there's a really strong theory of why the thing happened - and even then, it should probably be tried a few more times to see if it happens again. Being a one-off is not the same as being random, but for policy-making purposes it might as well be.
Anyway - this is also a bit ranty. As a part-historian, I'm fine with ex-post analysis, but the academic culture around it needs to be much more like that of academic history: very combative, and full of active debates and counter-arguments. In economics we tend to see that someone has done a study in a context, consider it "done", and try something else. In history, if someone does a study of Qing China's export tax structure, it spawns 25 immediate rebuttals and becomes a sub-genre. That's what you need to keep you honest in the world of ex-post observation, and economics lacks it, I think.
Dan:
On Hirschman and what we can see ex-post even if we didn't anticipate it ex-ante: I love - and often quote - James Scott's Seeing Like a State. Indeed my book manuscript opens at the moment by pointing out that my "Navigation by Judgment" idea has echoes in both Scott's Seeing Like a State and Hayek's distrust of central planning. But I think you take the point a little too far, Ranil. That is, I absolutely grant that the act of observation always exists within a frame, and that legibility and omission are intertwined. In an analogue to the difference between an autistic savant and the general public, interpretation requires selective attention, and that selection is inextricably bound up with priors. I fully concur that not only are we imperfect observers, a hypothetical perfect observer would be unable to generalize from the case, to interpret what they've seen. I don't think this forecloses what we're taking as Hirschman's thrust, though; yes, it's about being able to report afterwards. So is much of qualitative research, or the discipline of history you reference with the Qing dynasty example; that we are fallible and biased doesn't mean valuable things can't be learned from reflecting back on what was not anticipated. I thus want to modify your articulation of the assumption, Ranil, to "hinges on the implicit assumption that you can observe some of this uniqueness ex post". If we can observe none of it we're in trouble, definitely. But I think all Hirschman needs is that there are indeed observable non-verifiable features of the project that can be seen ex-post even when unanticipated ex-ante; and I think this is correct, and can be done. We don't criticize our flashlight (torch) for failing to illuminate the entire landscape, though we ought always to be conscious of what we may be missing, both in the area left in darkness AND in the area we have illuminated, it seems to me.
I think on reflection you're both entirely right on how we should treat unexpected findings and on the logic both of pre-analysis plans and the Qing example - as I suspected, having been prompted to think harder about this by y'all, I think you're right. Clearly unexpected findings need to be treated as exploratory, not confirmatory (as by definition they have no prior they're confirming - hence the importance of models, the 4th-grade-science-class explanation of the scientific method, etc.). There's a separate conversation to be had on the structure of science and what I might call the economics cousin of what Espeland & Sauder call the "patina of objectivity" of quantitative data (very much focused on what Ranil eloquently describes as the simplification, omission, and codification of data) - basically I think there's a complicated (or maybe not) set of incentives in the industrial organization of academia/research that undergirds this.
I don't know that I agree with what I take to be your view, Ranil, that we shouldn't update our policy priors at all based on these results. That depends, it seems to me, on the structure of the policy prior; an unexpected finding is still an instantiation claim for the finding, and thus ought to prompt Bayesian updating even before being confirmed, particularly if it's in a causally complex area we don't know much about. To get a little closer to the ground: if you were to run a study looking at the effect of unconditional cash transfers on, e.g., labor market effort and found a null labor market effect but better nutritional and educational outcomes for children in the household, I in no way think we've demonstrated that the (or even an) important channel for UCT impact is intergenerational health and education outcomes. And I definitely agree that future studies should study precisely this. But in the interim I think we've learned something that should update my prior on this channel from "imaginable" to "plausible", and I do think that distinction may be policy relevant... particularly in cases such as this one where confirmation or disconfirmation may take a LONG time.
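Dan's shift from "imaginable" to "plausible" is just Bayes' rule in action. A minimal sketch with entirely made-up probabilities (the 0.10/0.60/0.10 numbers are my illustrative assumptions, not anything claimed in the conversation):

```python
def posterior(prior, p_find_if_real, p_find_if_noise):
    """P(channel is real | we observed the unexpected finding), by Bayes' rule."""
    num = prior * p_find_if_real
    return num / (num + (1 - prior) * p_find_if_noise)

# Hypothetical numbers for the UCT example: the intergenerational channel
# starts as merely 'imaginable' (prior 0.10); a real channel would produce
# the observed finding 60% of the time, noise/multiple testing 10% of the time.
print(posterior(0.10, 0.60, 0.10))  # ~0.40: 'imaginable' -> 'plausible'
```

With these numbers, one unexpected finding moves the channel from a 10% to a 40% chance of being real - worth acting on in the interim, but far from confirmed, which is consistent with still treating the result as exploratory.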
Not to belabor the point (which I say because I'm clearly about to belabor the point), but if we're willing to extend this beyond international development I think there's a great recent case to illustrate some of this and ask what both researchers and policy makers ought to do. I've been thinking a lot about the US' Moving to Opportunity (MTO) intervention the past few weeks following an offhanded comment by someone (Michael Woolcock maybe?). This is a massive "kitchen sink" bundled intervention allowing randomly selected poor families to move from high-poverty to low-poverty communities, the largest well-identified attempt to look at neighborhood effects in US social history (though perhaps the end of slavery, and thus the exogenous allowance of mobility to ex-slaves, was bigger - waiting for Nathan Nunn to do a paper on this). This is an intervention so central to US social policy research that as a Harvard Inequality & Social Policy fellow we had basically a whole class on MTO; I took that class circa 2013, with the main take-away being MTO's disappointing null results on earnings and educational attainment for kids.
Then in 2015 we learned that in fact there were positive effects for a sub-sample of the youngest kids at the time of moving... we just had to wait 20 years to find them (nice summary from Justin Wolfers - also a graduate of the same Harvard fellowship, and thus probably also a participant in a seminar which focused on MTO's mixed record - here). While the 2015 studies that found results are framed around strong priors, those priors were clearly somewhat informed by actually looking at the data (or seeing the results of others doing so), and thus I think the findings could fairly be thought of as ex-ante unanticipated results. So should we retain the 2013 "MTO as disappointing and ineffective" prior? Do we need to treat this as exploratory and thus wait another 20 years to confirm it in out-of-sample testing? Or does the fact that the study could have had a pre-analysis plan even though it looks at an ex-ante unanticipated result of the intervention - a pre-analysis plan steeped in a frame of legibility heavily influenced by previous results of the intervention and deep knowledge of its data - obviate this concern? That is, precisely how far does this logic of "unanticipated results" extend, and when can it inform policy priors? (My tentative answer: to both MTO and my hypothetical UCT case, to similar degrees, as I'm inclined to see these cases as more the same than different.)
PS: Ranil, I just read that Draymond Green's college coach (Michigan State's Tom Izzo) called (Detroiter) Dan Gilbert, owner of the Cavs, to urge Gilbert to take Green at the beginning of the 2nd round of the draft. Imagine what this finals would have looked like if he'd listened... at the very least this Cavs team would have either Wiggins or Draymond right now (the Love trade having been altered as a result).
Ranil:
Just to clarify - I definitely support updating priors after ex-post observations; I just think this needs to be measured. We have a tendency to put narratives on everything, even randomness and one-offs. It's just about how strongly we update priors after the new observation.
Also, yes, point taken - some of the uniqueness needs to be observable ex post. But if ex-post vision is curtailed at all, it has similar problems to the ex-ante establishment of what you want to measure. They're both incomplete, in different ways, and nice complements, but both need careful scrutiny - and I think we're better at scrutiny of the ex-ante stuff than the ex-post. I would expect ex-ante choice to help us identify those things we'd expect to happen that don't a bit better than ex-post observation, where we probably have a bias towards noticing things that do happen rather than those that don't, though that's speculative.
PS: Yeah, Wiggins on this team would mean two players capable of guarding either Green or Curry, and that would in turn mean the Curry/Green pnr would be much less viable. Cavs might still push it to 6, and if they do, Lebron is perfectly capable of turning in two monster performances in a row. The Green counterfactual is a funny one - would he have developed into the special passer he is on a different team to the Warriors? I don't know...
Comment:
Great read! One thing our evidence-gathering can gain from appreciating how likely it is that the important factors are ones we can't know in advance is the ability to distinguish between two claims that are a bit collapsed in your comments back to Dan above: that because we tend to put narratives onto ex-post findings, we should be suspicious of them; and that testing ex-ante theories is central to advancing knowledge. I agree with the former but don't think the latter follows as directly.
First, I'd note that there are techniques to gather information on what is happening, in real time, without waiting long enough for ex-post narratives to emerge. Training the lens of data on a wider volume than the expected arena of change seems both cheap and likely to generate information of relevance, even if we don't know in advance what it is. Appreciative approaches and broad sampling/polling (where it is being done anyway) can generate lots of rich data unshaped by interpretation, at least initially.
Second, I don't think the development community does enough to require programs to put together broad ex-ante theories to which programming can be attached. The idea of ex-ante questions all too often, in practice, narrows into tiny theories of change about specific mechanisms, rather than theories about the domains that matter. For example, to say that information may influence voting, and here are four different pathways, is interesting; I find it much less interesting to say that information about mayors' past records in combating corruption should, ceteris paribus, affect how people vote for mayor - although it may be easier to experiment on, because it is much narrower. With four or five overlapping theories about how information may influence voting, one might not have a "conclusive" result a la RCT (not enough power, or an inability to separate out countervailing effects) but could gather data that would be invaluable for subsequent theory-building. I think most development work is still in the discovery phase, more than the testing phase, but our theorizing doesn't reflect that very well. It should be possible to use RCTs to explore the weights of different factors on certain outcomes, in concert with a rich theory that expects several different issues to matter all at once, but that sort of probing never seems to be in the minds of those designing the evaluations.
Alternate title: This is what three geeks talk about when we have too much time on our hands...