Bad performance metrics revisited
In January, just as I was returning to teaching after a year's absence due to my illness, I wrote a blog post about a conversation I'd had in an executive education class at the end of three sessions on using performance measurement. I had asked the participants to think of a performance measure currently being used in their organization that they thought created problems. Then I asked them whether they would favor simply getting rid of it, or if not, what they would do to improve it. When I called on a few people during our class that day, those who spoke cited metrics they would like to nix -- but couldn't because they had been imposed from the outside.
This year, I asked the same questions, but improved my data collection. I gave participants three alternatives and had everyone vote in class using clickers, so I could get results for the whole cohort. The three alternatives were:
- scrap metric if I can
- keep measure as is (can't come up with better idea, and better than nothing), and
- suggest a change in the metric
The percentages revealed in the clicker poll were fascinating. Only 11 percent said they would simply scrap the metric, while 19 percent said they would continue to accept it without change despite its imperfections.
This latter reaction reflected an argument I call "compared to what?" which is often ignored in discussions of performance measurement. It is tempting to say that if our measures are imperfect, and performance using such imperfect measures is worse than it would be using perfect ones, we shouldn't manage using such measures. But that is mistaken, because the right comparison is performance using no measures at all. If performance using imperfect measures is better than using no measures at all, those imperfect metrics may well be worth keeping.
The most interesting number was the 70 percent stating they would work to improve the measure; that is good news about prospects for the success of performance measurement in government.
Anyone who reflects on performance measures, whether used in government or elsewhere, knows they are typically imperfect and create some level of dysfunctional effects, with people seeking to meet even a problematic measure. The question is whether, given that fact, we should just wring our hands and bemoan our fate -- or whether we should try to take control of our destiny and, as intelligent human beings, try to improve the imperfect measures over time.
I often cite the example of Army physical fitness metrics that came to be criticized for not reflecting the physical challenges of actual battlefields: the Army got a committee together to develop more realistic metrics. Some version of this seemed to be what my students were doing in their organizations.
Here were the specific examples cited in class:
Managers in an intelligence organization were using measures of how much schooling applicants had, and their background in relevant area studies (e.g. Middle East). They were unhappy with the quality and the performance of applicants they got using these screens. So hiring managers put their heads together to see what they thought made for a successful analyst. It was creativity and general analytic ability, they concluded. They then, with the help of a psychologist, changed their screening to select for those abilities. The quality of new hires under the revised metrics went up.
The Office of Management and Budget had given one program within an agency a goal for fielding certain discoveries from an ambitious new research effort. The program felt the goal went beyond what it could do by itself. After conversations with OMB, the goal was changed to one shared by several programs, as part of an ecosystem, and its attainment divided into several chunks.
A metric for timeliness in closing out contracts did not reflect what was under the closeout agency's control because it included contracts that could not be closed out pending a final audit report from the Defense Contract Audit Agency. The agency changed the metric to measure closeout timeliness for contracts available to be closed out.
A metric for speed in issuing an agency's regulations did not distinguish between simple and complex regulations. The agency introduced a classification scheme for regulation complexity, and began to measure performance separately for the two categories.
One agency's metric stipulated a 45-day turnaround time for investigations. In this case, after discussions the agency decided not to change the metric itself, but to recognize that managers would have some wiggle room if they could explain to superiors situations where a suddenly emerging priority made reaching that goal impossible for a while.
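To make the contract closeout example above concrete, here is a minimal sketch of how the same set of contracts can score differently once contracts still awaiting an audit report are left out of the calculation. The record fields (`days_to_close`, `awaiting_audit`), the 90-day target, and the sample figures are all hypothetical, chosen only for illustration; the agency's actual definitions were not described in class.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Contract:
    days_to_close: int      # hypothetical: days elapsed before closeout (or so far)
    awaiting_audit: bool    # hypothetical: True if the final audit report is still pending


def on_time_rate(contracts: List[Contract], target_days: int) -> Optional[float]:
    """Share of contracts closed out within the target; None if there are no contracts."""
    if not contracts:
        return None
    on_time = sum(1 for c in contracts if c.days_to_close <= target_days)
    return on_time / len(contracts)


def available_for_closeout(contracts: List[Contract]) -> List[Contract]:
    """Keep only contracts the agency could actually close (no audit report pending)."""
    return [c for c in contracts if not c.awaiting_audit]


# Illustrative data only, not taken from any agency.
contracts = [
    Contract(days_to_close=30, awaiting_audit=False),
    Contract(days_to_close=200, awaiting_audit=True),
    Contract(days_to_close=50, awaiting_audit=False),
    Contract(days_to_close=120, awaiting_audit=False),
]

original_metric = on_time_rate(contracts, target_days=90)
revised_metric = on_time_rate(available_for_closeout(contracts), target_days=90)
```

In this made-up data, the contract stuck waiting for its audit report drags down the original figure, while the revised figure reflects only delays the agency itself could have prevented.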
These examples were interesting, and all reflected a refusal to simply be a victim of one's fate. However, the discussion also suggested a problem with agencies' use of performance measures. None of those who discussed problems with a turnaround-time metric -- and, notably, three of the five examples brought up in class involved processing speed -- suggested using a metric of the quality of reviews to complement one for speed. And if the only thing you measure is speed, it creates an incentive for reviews that are quick but superficial.
Indeed, earlier in the very same class, when students were talking about an effort in the D.C. city government to improve throughput time for face-to-face transactions at the Department of Motor Vehicles, two students had actually noted that this metric created an incentive for sloppy handling of customers. I had suggested earlier in class that when one performance measure leaves out an important dimension of performance, the classic solution is to develop an additional measure to represent that dimension.
I am guessing that for many agencies that use some measure of processing time, this processing is an important part of what they do, and that therefore it would be worth putting resources into measuring quality -- even if only on a sample basis. (Indeed, one could imagine a quality metric used instead of a speed metric, though this would create the opposite incentive to process at molasses speed.) Yet the apparent failure to use quality and speed metrics together, so that each complements the other, suggests an area where agencies still need to make progress in thinking about how best to do performance measurement to improve government.
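For readers who want to picture what pairing the two measures might look like, here is a rough sketch: speed is reported for every case, while quality is checked only on a random sample. The function name, the case fields, and the `review` callback (a stand-in for whatever quality check a human reviewer would apply) are assumptions made for illustration, not a description of any agency's practice.

```python
import random
from statistics import median


def speed_and_quality(cases, review, sample_size, seed=0):
    """Report median processing days for all cases plus a pass rate from a sampled quality review.

    cases: non-empty list of dicts, each with a 'days' entry (hypothetical field name).
    review: callable that returns True when a sampled case meets the quality standard.
    """
    rng = random.Random(seed)  # fixed seed so the same sample can be drawn again
    sample = rng.sample(cases, min(sample_size, len(cases)))
    passed = sum(1 for case in sample if review(case))
    return {
        "median_days": median(case["days"] for case in cases),
        "sampled": len(sample),
        "quality_pass_rate": passed / len(sample) if sample else None,
    }


# Illustrative data only: here the fastest cases are the ones that fail review.
cases = [{"days": d, "ok": d >= 10} for d in (5, 8, 12, 20, 30)]
report = speed_and_quality(cases, review=lambda case: case["ok"], sample_size=5)
```

Reporting the two numbers side by side is the point: a falling median with a falling pass rate would signal quick but superficial reviews, which a speed metric alone cannot reveal.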
Posted by Steve Kelman on Apr 19, 2016 at 9:58 AM