
By Steve Kelman


Bad performance metrics revisited

Shutterstock image (by Ismagilov): charting data on the wall.

In January, just as I was returning to teaching after a year's absence due to my illness, I wrote a blog post about a conversation I'd had in an executive education class at the end of three sessions on using performance measurement. I had asked the participants to think of a performance measure currently being used in their organization that they thought created problems. Then I asked them whether they would favor simply getting rid of it, or if not, what they would do to improve it. When I called on a few people during our class that day, those who spoke cited metrics they would like to nix -- but couldn't because they had been imposed from the outside.

This year, I asked the same questions, but improved my data collection. I gave participants three alternatives and had everyone vote in class using clickers, so I could get results for the whole cohort. The three alternatives were:

  1. scrap metric if I can
  2. keep measure as is (can't come up with better idea, and better than nothing), and
  3. suggest a change in the metric

The percentages revealed in the clicker poll were fascinating. Only 11 percent said they would simply scrap the metric, while 19 percent said they would continue to accept it without change despite its imperfections.

This latter reaction reflects an argument I call "compared to what?", which is often ignored in discussions of performance measurement. It is tempting to say that because our measures are imperfect, and performance managed with imperfect measures is worse than it would be with perfect ones, we shouldn't manage using such measures. But that is mistaken, because the right comparison is with performance using no measures at all. If performance using imperfect measures is better than performance using no measures at all, those imperfect metrics may well be worth keeping.

The most interesting number was the 70 percent stating they would work to improve the measure; that is good news about prospects for the success of performance measurement in government.

Anyone who reflects on performance measures, whether used in government or elsewhere, knows they are typically imperfect and create some level of dysfunctional effects, with people seeking to meet even a problematic measure. The question is whether, given that fact, we should just wring our hands and bemoan our fate -- or whether we should try to take control of our destiny and, as intelligent human beings, try to improve the imperfect measures over time.

I often cite the example of Army physical fitness metrics that came to be criticized for not reflecting the physical challenges of actual battlefields. The Army got a committee together to develop more realistic metrics. Some version of this seemed to be what my students were doing in their organizations.

Here were the specific examples cited in class:

  • Managers in an intelligence organization were using measures of how much schooling applicants had, and their background in relevant area studies (e.g. Middle East). They were unhappy with the quality and the performance of applicants they got using these screens. So hiring managers put their heads together to see what they thought made for a successful analyst. It was creativity and general analytic ability, they concluded. They then, with the help of a psychologist, changed their screening to select for those abilities. The quality of new hires under the revised metrics went up.
  • The Office of Management and Budget had given one program within an agency a goal for fielding certain discoveries from an ambitious new research effort that the program felt went beyond what it could do by itself. After conversations with OMB, the goal was changed to one shared by several programs, as part of an ecosystem, and its attainment divided into several chunks.
  • A metric for timeliness in closing out contracts did not reflect what was under the closeout agency's control because it included contracts that could not be closed out pending a final audit report from the Defense Contract Audit Agency. The agency changed the metric to measure closeout timeliness for contracts available to be closed out.
  • A metric for speed in issuing an agency's regulations did not distinguish between simple and complex regulations. The agency introduced a classification scheme for regulation complexity, and began to measure performance separately for the two categories.
  • One agency's metric stipulated a 45-day turnaround time for investigations. In this case, after discussions the agency decided not to change the metric itself, but to recognize that managers would have some wiggle room if they could explain to superiors situations where a suddenly emerging priority made reaching that goal impossible for a while.
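To make the contract-closeout example concrete, here is a minimal sketch of how excluding contracts that are blocked by an outside audit changes a timeliness rate. Everything in it is an assumption for illustration: the `Contract` fields, the 90-day target, and the four sample contracts are hypothetical and do not come from the agency discussed in class.

```python
from dataclasses import dataclass


@dataclass
class Contract:
    days_to_close: int    # days spent in closeout so far (hypothetical field)
    awaiting_audit: bool  # True if closeout is blocked pending an outside audit report


def on_time_rate(contracts, target_days):
    """Share of the given contracts closed within target_days (None if the list is empty)."""
    if not contracts:
        return None
    on_time = sum(1 for c in contracts if c.days_to_close <= target_days)
    return on_time / len(contracts)


def eligible_on_time_rate(contracts, target_days):
    """Same rate, but counting only contracts the agency was actually able to close."""
    eligible = [c for c in contracts if not c.awaiting_audit]
    return on_time_rate(eligible, target_days)


# Hypothetical data: two contracts under the agency's control, two stuck awaiting audit.
contracts = [
    Contract(days_to_close=30, awaiting_audit=False),
    Contract(days_to_close=50, awaiting_audit=False),
    Contract(days_to_close=200, awaiting_audit=True),
    Contract(days_to_close=300, awaiting_audit=True),
]
TARGET_DAYS = 90  # assumed target, not taken from the original post

original_rate = on_time_rate(contracts, TARGET_DAYS)
revised_rate = eligible_on_time_rate(contracts, TARGET_DAYS)
print(f"all contracts: {original_rate:.0%}, eligible only: {revised_rate:.0%}")
```

With these made-up numbers the original metric reports half the contracts as late even though every contract the agency could act on was closed on time, which is the kind of distortion the revised metric was meant to remove.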

These examples were interesting, and all reflected a refusal to simply be a victim of one's fate. However, the discussion also suggested a problem with agencies' use of performance measures. None of those who discussed problems with a turnaround-time metric -- and, notably, three of the five examples brought up in class involved processing speed -- suggested using a metric of the quality of reviews to complement one for speed. And if the only thing you measure is speed, it creates an incentive for reviews that are quick but superficial.

Indeed, earlier in the very same class, when students were talking about an effort in the D.C. city government to improve throughput time for face-to-face transactions at the Department of Motor Vehicles, two students actually had noted that this metric created an incentive for sloppy handling of customers. I had suggested earlier in class that when one performance measure leaves out an important dimension of performance, the classic solution is to develop an additional measure to represent that dimension.

I am guessing that for many agencies that use some measure of processing time, this processing is an important part of what they do, and that it would therefore be worth putting resources into measuring quality -- even if only on a sample basis. (Indeed, one could imagine a quality metric used instead of a speed metric, though this would create the opposite incentive to process at molasses speed.) Yet the apparent failure to use quality and speed metrics to complement each other suggests an area where agencies still need to make progress in thinking about how best to do performance measurement to improve government.

Posted by Steve Kelman on Apr 19, 2016 at 9:58 AM



Reader comments

Mon, Apr 25, 2016 Al

I don't understand the metric of how much it costs to spend $1, as applied to contracting activities. Since an individual worker cannot cut staff, or budget, how are they *not* incentivized to spend more money?

Tue, Apr 19, 2016 Bruce Waltuck NJ

Thanks Steve. This reminded me of a presentation on metrics that I gave to government managers some years ago. Will have to check if I still have it and pop it onto my Slideshare page.
My favorite (sad but true) story on performance measures in government came one day when I was on a six-month detail to DOL's national office. At lunch I ran into a manager I knew from Texas. We got talking and he asked me about how my colleagues in the Northeast counted the results in our agricultural investigations. He said specifically, "how do you report an investigation of a farm that has 9 farm labor contractors (one farmer using nine folks to provide migrant harvest workers) where you spent 100 staff hours on the case?" I said that was easy. "We report ten cases... One for each party whose records and status we investigated. So ten parties (farmer and the nine crew leaders) at ten hours each." My friend looked troubled. "Really? We were told that we had to report it all as just one case under the name of the farm, and put all hundred hours on that one file."
I asked him why this was happening in the Southwest region. "The Regional Administrator told us we have to do it like that," he said with a sigh. I was incredulous. "Are you aware that the rest of the country does what we do?" He sighed again. We both understood that agency performance measures -- which in part were used for annual evaluations -- counted both the number of cases done per investigator, and the average hours per case.
I told my friend I should be able to help him out. I knew the woman who was the national head of the farm enforcement program. I left the cafeteria and went right to her office. I told her what was happening in the Southwest region. She thanked me for the info and said she'd check into it. But apparently nothing was changed.
Months later, when the Deputy Assistant Secretary was ending their term, their farewell email to the staff nationwide mentioned among other things, their "disappointment that we weren't more productive in the farm program." Of course, I knew we were a hell of a lot more productive than the reports they saw indicated. Our large Southwest region, which included Texas and California among other states, was suppressing case productivity by as much as 90% and inflating hours per case by the same factor. Moral of the story: make sure your metric is clearly defined and being measured and reported consistently by all.
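
The arithmetic in the comment above can be checked with a small sketch. It uses only the figures the commenter gives (ten investigated parties, 100 staff hours); the function name and the boolean flag are hypothetical, introduced only to contrast the two reporting conventions.

```python
def reported_metrics(parties, total_hours, count_each_party):
    """Return (cases reported, average hours per case) under one reporting convention."""
    cases = parties if count_each_party else 1
    return cases, total_hours / cases


# Same investigation, two conventions: one case per party vs. one case per farm.
northeast = reported_metrics(parties=10, total_hours=100, count_each_party=True)
southwest = reported_metrics(parties=10, total_hours=100, count_each_party=False)

# Fraction of cases that disappear from the count under the one-case-per-farm rule.
undercount = 1 - southwest[0] / northeast[0]
# How many times larger the reported hours per case become.
hours_inflation = southwest[1] / northeast[1]

print(f"per-party reporting: {northeast[0]} cases, {northeast[1]:.0f} hours per case")
print(f"per-farm reporting:  {southwest[0]} case, {southwest[1]:.0f} hours per case")
print(f"case count understated by {undercount:.0%}; hours per case inflated {hours_inflation:.0f}x")
```

On these figures the per-farm convention reports one case instead of ten, a 90 percent undercount, and shows 100 hours per case instead of 10, a tenfold inflation, for identical work.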
