
Recently, Dan Honig of Johns Hopkins forwarded Ranil and me some thoughts he had in reaction to an Albert Hirschman book on development projects, thoughts he felt were pertinent to the discussion on the pros and cons of RCTs. What followed is a discussion (rant) between Dan, Ranil and me. I've edited out the e-maily bits for clarity:
Dan:
Matt, just after I hit send on this I realized I should have included you on this - I generally think you're right on RCTs and the staleness of the conversation (and discussed this with Ranil a few months back, some 2 days after you had dinner with him, hence the cc to Ranil) but feel like I've never seen this Hirschman frame and am wondering if it struck you as interesting. And yes, basically I'm trying to catalyze you writing something cool on this so I can quote/reference it down the road.
Reading Hirschman's Development Projects Observed for the first time, and as I read it he's with [Lant Pritchett and Michael Woolcock] on RCTs and causal density in international development projects. The quote below is from page 186 of the 1967 edition; italics are his, brackets mine. Just before this he suggests we may not be able to identify good indicators of effects ex-ante, which presumably means they couldn't be pre-specified in a trial, and thus that we would be ill served by an RCT on a particular intervention even if we ignored external validity concerns.
"The indirect effects [of development projects] are so varied as to escape detection by one or even several criteria uniformly applied to all projects. Upon inspection, each project turns out to represent a unique constellation of experiences and consequences, of direct and indirect effects."
Matt:
Hey Dan, that's a really interesting quote by Hirschman. If my interpretation is correct, it seems to be more damning for empirical evaluation in general than for RCTs in particular.
I'm not sure how I feel about this. Even if you move away from a simple, reduced-form causal framework, Hirschman's critique seems like it would still apply. Even if development is a messy, complex thing that can't really be boiled down into an impact evaluation framework, we still rely on measurement when we talk about development, and any given set of measurements is going to leave out unmeasured things that might matter. We can point at improving test scores but leave out student stress, etc., and the set of potentially important things we leave out will change depending on the context. I guess I see this as a problem of measurement rather than as a problem for RCTs.
I also wonder what this means for how an empirical researcher operates. Over the last few years, I have become incredibly suspicious of surprising, counter-intuitive results, where a researcher measures something outside of the standard set of outcomes and finds a result. In a world of multiple hypothesis tests, expanding the set of outcomes to include as much of Hirschman's unique constellation as possible will open up the door to a lot of false positives which will end up getting written up and published.
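Matt's multiple-testing worry is easy to make concrete with a toy simulation. The sketch below is illustrative only (the setup and the 5% threshold are my assumptions, not anything from the conversation): every outcome is pure noise, each test "finds" an effect with probability 0.05, and the chance of at least one false positive climbs quickly as the constellation of outcomes expands.

```python
import random

def familywise_fp_rate(n_outcomes, alpha=0.05, n_sims=10_000, seed=0):
    """Simulated chance of at least one 'significant' result when every
    outcome is pure noise (each test rejects with probability alpha)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        if any(rng.random() < alpha for _ in range(n_outcomes)):
            hits += 1
    return hits / n_sims

for k in (1, 5, 20):
    # analytic answer is 1 - (1 - alpha)**k: roughly 0.05, 0.23, 0.64
    print(f"{k:2d} outcomes -> P(>=1 false positive) ~ {familywise_fp_rate(k):.2f}")
```

Pre-specifying one primary outcome keeps the false-positive rate at alpha; scanning twenty unanticipated outcomes makes a "finding" more likely than not, which is why surprising results on non-pre-specified outcomes invite extra suspicion.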
So that was a rant. Um, what do you think Ranil?
Dan:
Really interesting insight - I think the Hirschman thrust goes beyond RCTs, but is limited (maybe?) to empirics (quantitative and qualitative) set ex-ante. In contract-theory language I'd say Hirschman is claiming that effects are non-contractible, as they can't be specified ex-ante, and sometimes observable but non-verifiable, and thus not tractable to quantitative inquiry ex-post. So this is a big problem for both RCTs and e.g. Cash on Delivery aid, but it doesn't necessarily foreclose empirical evaluation (quant and qual) of project impacts, as long as impacts not foreseen ex-ante are picked up in the eval. Of course this runs against what I normally think of as progress in more empirical econ/poli sci... I'm also not sure that trying to look at everything you can (and thus getting lots of false positives) would be his answer, versus a more inductive/mixed-methods-y kind of thing. So Hirschman might go for Biju Rao/poverty-observatory-type stuff as a response, though of course I'm totally speculating...
And yes, Ranil, drop some knowledge on us?
Also Matt, why should I be more suspicious of the counter-intuitive than the intuitive result? E.g. why should I doubt Jesse Driscoll's voter suppression finding any more or less than a result of democracy promotion assistance on citizen activism? This is a genuine question; I suspect you have a good answer that could turn me around on this...
Matt:
Maybe counter-intuitive wasn't the right way to phrase it. Shouldn't we be wary of an effect on an outcome we didn't expect/hypothesize/theorize up front? Isn't that part of the intuition behind pre-analysis plans?
I guess I'm also ambivalent about this. It seems reasonable that the probability that a result is 'true' is higher if we had a well-defined explanation for why it should be true in place before we checked to see if it was true, rather than formulating an ex-post explanation for why it was true when we find a result on an outcome we hadn't pre-specified. How many times have we seen papers trying to explain an unusual finding in this way?
So maybe I'm just more wary of counter-intuitive results which are presented to me by other people, knowing the incentives that currently exist for researchers...
Ranil:
Hi both, just catching up with this fascinating conversation now.
My issue with the Hirschman quote is similar to but not the same as Matt's. Observation of any kind involves the imposition of "frames of legibility" - ways in which we choose to interpret phenomena. Whether ex-ante or ex-post, we need to establish parameters on which we can understand a thing: ex ante, this takes the form of choosing which simplifications to measure (and all measures are simplifications of some corresponding reality, and thus involve omission as well as codification); ex post, it involves imposing analytical frames on observed experience so that we can talk about it, since any way of talking about a thing that happened is a simplification or codification of it - otherwise it would just be a complete restatement of the thing that happened.
So, I guess what I'm saying is that regardless of what technique you use to understand a project, you can never capture all of its uniqueness. You are always making choices about which aspects of it to represent. Either ex ante or ex post, there are things you are omitting in order to render your "description" or "analysis" of the project something simpler and less time-consuming than reliving the project itself. Hirschman's argument is therefore less about uniqueness and complexity and more about the ability to choose what things to report afterwards, and it hinges on the implicit assumption that you can observe all of this uniqueness ex post - which I would really doubt. The frames of reference we observe a thing with inevitably blind us to some angles of it, which is one reason why history keeps getting updated, even when the events (and often the data) haven't changed.
[By the way, none of this is really original thought - it's just applying James Scott's Seeing Like a State, which I'm reading at the moment, to research. This of course supports the point I'm making: the book has influenced the framing I'm using in my head, and I'm interpreting the problem accordingly.]
Ultimately, as Matt gets at, the thing I think matters most of all is having a credible theory to support the empirical observations. The big advantage of ex-ante indicators is that they force you to set out a theory ahead of time, which is a marker of credibility of thought. Ex-post observation is harder because humans have a behavioural bias towards over-specified explanations. We try to impose an ordered reason on things that might just be random. This is basically why I, like Matt, am a bit sceptical of "we thought we'd find X did Y, but you won't guess the effect it had on Z" papers, unless there's a really strong theory of why the thing happened - and even then, it should probably be tried a few more times to see if it happens again. Being a one-off is not the same as being random, but for policy-making purposes it might as well be.
Anyway - this is also a bit ranty. As a part-historian, I'm fine with ex-post analysis, but the academic culture around it needs to be much more like that of academic history: very combative, and full of active debates and counter-arguments. In economics we tend to see that someone has done a study in a context, consider it "done", and try something else. In history, if someone does a study of Qing China's export tax structure, it spawns 25 immediate rebuttals and becomes a sub-genre. That's what you need to keep you honest in the world of ex-post observation, and economics lacks it, I think.
Dan:
On Hirschman and what we can see ex-post even if we didn't anticipate it ex-ante: I love - and often quote - James Scott's Seeing Like a State. Indeed my book manuscript opens at the moment by pointing out that my "Navigation by Judgment" idea has echoes in both Scott's Seeing Like a State and Hayek's distrust of central planning. But I think you take the point a little too far, Ranil. That is, I absolutely grant that the act of observation always exists within a frame, and that legibility and omission are intertwined. In an analogue to the difference between an autistic savant and the general public, interpretation requires selective attention, and that selection is inextricably bound up with priors. I fully concur that not only are we imperfect observers, a hypothetical perfect observer would be unable to generalize from the case, to interpret what they've seen. I don't think this forecloses what we're taking as Hirschman's thrust, though; yes, it's about being able to report afterwards. So is much of qualitative research, or the discipline of history you reference with the Qing dynasty example; that we are fallible and biased doesn't mean valuable things can't be learned from reflecting back on what was not anticipated. I thus want to modify your articulation of the assumption, Ranil, to "hinges on the implicit assumption that you can observe some of this uniqueness ex post". If we can observe none of it we're in trouble, definitely. But I think all Hirschman needs is that there are indeed observable non-verifiable features of the project that can be seen ex-post even when unanticipated ex-ante; and I think this is correct, and can be done. We don't criticize our flashlight (torch) for failing to illuminate the entire landscape, though we ought always to be conscious of what we may be missing, both in the area left in darkness AND in the area we have illuminated, it seems to me.
I think on reflection you're both entirely right on how we should treat unexpected findings and on the logic both of pre-analysis plans and the Qing example - as I suspected, having been prompted to think harder about this by y'all, I think you're right. Clearly unexpected findings need to be treated as exploratory, not confirmatory (as by definition they have no prior they're confirming - hence the importance of models, the 4th-grade-science-class explanation of the scientific method, etc.). There's a separate conversation to be had on the structure of science and what I might call the economics cousin of what Espeland & Sauder call the "patina of objectivity" of quantitative data (very much focused on what Ranil eloquently describes as the simplification, omission, and codification of data) - basically I think there's a complicated (or maybe not) set of incentives in the industrial organization of academia/research that undergirds this.
I don't know that I agree with what I take to be your view, Ranil, that we shouldn't update our policy priors at all based on these results. That depends, it seems to me, on the structure of the policy prior; an unexpected finding is still an instantiation claim for the finding, and thus ought to prompt Bayesian updating even before being confirmed, particularly if it's in a causally complex area we don't know much about. To get a little closer to the ground: if you were to run a study looking at the effect of unconditional cash transfers on, e.g., labor market effort and found a null labor market effect but better nutritional and educational outcomes for children in the household, I in no way think we've demonstrated that the (or even an) important channel for UCT impact is intergenerational health and education outcomes. And I definitely agree that future studies should study precisely this. But in the interim I think we've learned something that should update my prior on this channel from "imaginable" to "plausible", and I do think that distinction may be policy relevant... particularly in cases such as this one where confirmation or disconfirmation may take a LONG time.
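Dan's shift from "imaginable" to "plausible" is just Bayes' rule in action. A minimal sketch with entirely made-up probabilities (the 0.10/0.60/0.10 numbers are my illustrative assumptions, not anything claimed in the conversation):

```python
def posterior(prior, p_find_if_real, p_find_if_noise):
    """P(channel is real | we observed the unexpected finding), by Bayes' rule."""
    num = prior * p_find_if_real
    return num / (num + (1 - prior) * p_find_if_noise)

# Hypothetical numbers for the UCT example: the intergenerational channel
# starts as merely 'imaginable' (prior 0.10); a real channel would produce
# the observed finding 60% of the time, noise/multiple testing 10% of the time.
print(posterior(0.10, 0.60, 0.10))  # ~0.40: 'imaginable' -> 'plausible'
```

With these numbers, one unexpected finding moves the channel from a 10% to a 40% chance of being real - worth acting on in the interim, but far from confirmed, which is consistent with still treating the result as exploratory.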
Not to belabor the point (which I say because I'm clearly about to belabor the point), but if we're willing to extend this beyond international development I think there's a great recent case to illustrate some of this and ask what both researchers and policy makers ought to do. I've been thinking a lot about the US' Moving to Opportunity (MTO) intervention the past few weeks following an offhanded comment by someone (Michael Woolcock maybe?). This is a massive "kitchen sink" bundled intervention allowing randomly selected poor families to move from high-poverty to low-poverty communities, the largest well-identified attempt to look at neighborhood effects in US social history (though perhaps the end of slavery, and thus the exogenous allowance of mobility to ex-slaves, was bigger - waiting for Nathan Nunn to do a paper on this). This is an intervention so central to US social policy research that as a Harvard Inequality & Social Policy fellow we had basically a whole class on MTO; I took that class circa 2013, with the main take-away being MTO's disappointing null results on earnings and educational attainment for kids.
Then in 2015 we learned that in fact there were positive effects for a sub-sample of the youngest kids at the time of moving... we just had to wait 20 years to find them (nice summary from Justin Wolfers - also a graduate of the same Harvard fellowship, and thus probably also a participant in a seminar which focused on MTO's mixed record - here). While the 2015 studies that found results are framed around strong priors, those priors were clearly somewhat informed by actually looking at the data (or seeing the results of others doing so), and thus I think the findings could fairly be thought of as ex-ante unanticipated results. So should we retain the 2013 "MTO as disappointing and ineffective" prior? Do we need to treat this as exploratory and thus wait another 20 years to confirm it in out-of-sample testing? Or does the fact that the study could have had a pre-analysis plan even though it looks at an ex-ante unanticipated result of the intervention - a pre-analysis plan steeped in a frame of legibility heavily influenced by previous results of the intervention and deep knowledge of its data - obviate this concern? That is, precisely how far does this logic of "unanticipated results" extend, and when can it inform policy priors? (My tentative answer: to both MTO and my hypothetical UCT case, to similar degrees, as I'm inclined to see these cases as more the same than different.)
PS: Ranil, I just read that Draymond Green's college coach (Michigan State's Tom Izzo) called (Detroiter) Dan Gilbert, owner of the Cavs, to urge Gilbert to take Green at the beginning of the 2nd round of the draft. Imagine what this finals would have looked like if he'd listened... at the very least this Cavs team would have either Wiggins or Draymond right now (the Love trade having been altered as a result).
Ranil:
Just to clarify - I definitely support updating priors after ex-post observations; I just think this needs to be measured. We have a tendency to put narratives on everything, even randomness and one-offs. It's just about how strongly we update priors after the new observation.
Also, yes, point taken - some of the uniqueness needs to be observable ex post. But if ex-post vision is curtailed at all, it has similar problems to the ex-ante establishment of what you want to measure. They're both incomplete, in different ways, and nice complements, but both need careful scrutiny - and I think we're better at scrutiny of the ex-ante stuff than the ex-post. I would expect ex-ante choice to help us identify those things we'd expect to happen that don't a bit better than ex-post observation, where we probably have a bias towards noticing things that do happen rather than those that don't, though that's speculative.
PS: Yeah, Wiggins on this team would mean two players capable of guarding either Green or Curry, and that would in turn mean the Curry/Green pnr would be much less viable. Cavs might still push it to 6, and if they do, Lebron is perfectly capable of turning in two monster performances in a row. The Green counterfactual is a funny one - would he have developed into the special passer he is on a different team to the Warriors? I don't know...
Comment:
Great read! One thing our evidence-gathering can gain from appreciating how likely it is that the important factors are ones we can't know in advance is the ability to distinguish between two claims that are a bit collapsed in your comments back to Dan above: that because we tend to put narratives onto ex-post findings, we should be suspicious of them; and that testing ex-ante theories is central to advancing knowledge. I agree with the former but don't think the latter follows as directly.
First, I'd note that there are techniques to gather information on what is happening, in real time, without waiting long enough for ex-post narratives to emerge. Training the lens of data on a wider volume than the expected arena of change seems both cheap and likely to generate information of relevance, even if we don't know in advance what it is. Appreciative approaches and broad sampling/polling (where it is being done anyway) can generate lots of rich data unshaped by interpretation, at least initially.
Second, I don't think the development community does enough to require programs to put together broad ex-ante theories to which programming can be attached. The idea of ex-ante questions all too often, in practice, narrows into tiny theories of change about specific mechanisms, rather than theories about the domains that matter. For example, to say that information may influence voting, and here are four different pathways, is interesting; I find it much less interesting to say that information about mayors' past records in combating corruption should, ceteris paribus, affect how people vote for mayor - although it may be easier to experiment on, because it is much narrower. With four or five overlapping theories about how information may influence voting, one might not have a "conclusive" result a la RCT (not enough power, or an inability to separate out countervailing effects) but could gather data that would be invaluable for subsequent theory-building. I think most development work is still in the discovery phase, more than the testing phase, but our theorizing doesn't reflect that very well. It should be possible to use RCTs to explore the weights of different factors on certain outcomes, in concert with a rich theory that expects several different issues to matter all at once, but that sort of probing never seems to be in the minds of those designing the evaluations.
Alternate title: This is what three geeks talk about when we have too much time on our hands...