Quantifying Department Performance
An objective metric for ranking departments and choosing which to close
Given current internal and federal budget constraints, closing underperforming departments and other units at the University of Chicago is on the table. We suspect many other universities are in a similar situation. In order to avoid favoritism and maximize performance, cutting should be done on a rational basis, using a well-motivated objective measure. But what should that measure be?
A department score function should quantify performance on advancing the mission of the University: original research and teaching. This is stated clearly in the Kalven report
The mission of the university is the discovery, improvement, and dissemination of knowledge.
and the Shils report
The existence of The University of Chicago is justified if it achieves and maintains superior quality in its performance of the three major functions of universities in the modern world. These functions are: (1) the discovery of important new knowledge; (2) the communication of that knowledge to students and the cultivation in them of the understanding and skills which enable them to engage in the further pursuit of knowledge; and (3) the training of students for entry into professions which require for their practice a systematic body of specialized knowledge.
We need a score function that combines research and teaching impact using the same unit. The most straightforward unit is dollars, which is appropriate given that budget cuts are motivating the consideration of closing departments. Moreover, using dollars is easy to explain to the public and politicians. Teaching impact can be quantified by the tuition generated from enrollments, and research impact can be quantified by the grant money obtained [1], including all sources (federal, private foundations, industry, and internal). The score function for department j with faculty members indexed by i would therefore be:

score_j = (1/N_j) Σ_i f_ij (T_i + G_i)
Here T is the tuition generated by a faculty member, G is the grant money generated by a faculty member, N is the number of faculty members in the unit, and f is the fractional commitment of the faculty member to the unit (for example, if a faculty member is in two departments, her f would be 0.5 in each department).
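As a minimal sketch, the score function can be written out in code using the definitions above. The names (`Faculty`, `department_score`) are illustrative, not part of the proposal:

```python
# A sketch of the proposed per-faculty department score.
# T = tuition generated, G = grant money obtained, f = fractional
# commitment to the unit, N = number of faculty in the unit.
from dataclasses import dataclass

@dataclass
class Faculty:
    tuition: float   # T: tuition generated per year (dollars)
    grants: float    # G: grant money obtained per year (dollars)
    fraction: float  # f: fractional commitment to this unit (0 to 1)

def department_score(members: list[Faculty]) -> float:
    """Average of f * (T + G) over the unit's N faculty members."""
    n = len(members)
    return sum(m.fraction * (m.tuition + m.grants) for m in members) / n
```

A faculty member split evenly between two departments contributes half of her tuition-plus-grant total to each, as intended.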
Now let’s develop intuition for the balance between research and teaching in this score function to confirm that it is reasonable. A typical federal grant is for about $150,000 per year. Annual tuition at the University of Chicago is $66,939, but the average tuition paid is only $36,991 due to financial aid. Students typically take 10 courses per year, so per-course tuition received is $3,699.10 per enrollment. This means that teaching a class with about 40.5 students generates as much score as getting a typical federal grant. Having taught many classes and obtained many grants ourselves, we find this a reasonable balance in terms of time commitment and advancement of the University’s mission.
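The break-even enrollment in the paragraph above follows directly from the figures given (all numbers are from the text; this is just a check of the arithmetic):

```python
# Break-even enrollment: how many course enrollments generate the same
# score as a typical federal grant? Figures are from the text.
average_tuition_paid = 36_991        # average annual tuition after financial aid
courses_per_year = 10
per_course = average_tuition_paid / courses_per_year  # $3,699.10 per enrollment

typical_grant = 150_000              # typical federal grant, per year
break_even = typical_grant / per_course  # comes out to roughly 40.5 students
```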
To make things more concrete, let’s consider the score Dorian contributes. In the 2024-2025 year he taught two classes with a total enrollment of 349, so his tuition contribution was $1,290,985.90. He also had a three-year grant from the Army Research Office for $554,747, a three-year grant from the National Science Foundation for $568,887, and a three-year grant from NASA for $91,590. So his per-year grant contribution was $405,074.67. His total contribution to the University in the 2024-2025 year would therefore be $1,696,060.57. It’s important to note that he relied on teaching assistants and research assistants to generate these funds and do this work, so this should not be viewed as his individual contribution. But for the department score function, it’s easiest (and equivalent) to index and normalize by faculty members. This gives us a sense of the sort of scores faculty might generate for their departments.
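The worked example above can be reproduced in a few lines (all figures are taken from the text):

```python
# Reproducing Dorian's 2024-2025 contribution from the figures in the text.
per_course_tuition = 36_991 / 10                 # $3,699.10 per enrollment
tuition_contribution = 349 * per_course_tuition  # 349 total enrollments

# Three three-year grants (ARO, NSF, NASA), converted to a per-year figure
grants_per_year = (554_747 + 568_887 + 91_590) / 3

total = tuition_contribution + grants_per_year
print(f"${total:,.2f}")  # $1,696,060.57
```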
The main advantages of this score function are that it measures well the stated mission of the University and it can be applied equally and fairly across all units within the University. As shown above, Dorian’s contribution is more heavily weighted toward teaching than research. Another professor might bring in huge amounts of research funding, but only teach a couple of small graduate seminars. The score function naturally accounts for this variation in valued contribution in a fair way. More broadly, entire divisions and schools might be more heavily weighted toward either teaching or research, and the score function allows us to fairly compare their performance. It also avoids subjective judgments and difficulties assessing contribution across fields due to different expectations for publication rate and citation statistics. If you are doing good research by your field’s standards, then presumably you will be able to generate research funding. Similarly, if you are working hard and teaching good courses, then students will take them and you will generate tuition. Finally, a major advantage of the score function is that it only counts work that contributes to the mission of the University, and explicitly does not count other activities, such as political activism and protest, that do not contribute (and probably even impose a negative externality on the productive units of the University).
The proposed score function is not perfect, and it would be possible to make a variety of criticisms of it and try to complicate it in various ways. This would be a mistake that would open it to corruption and exploitation. As an analogy that shows how imperfect, but simple, quantitative measures can be extremely useful, let’s consider some measures in health that we are all aware of: Body Mass Index (BMI), total cholesterol, and blood pressure. Everyone acknowledges the limitations of these measures (for example a muscular athlete might be classified as overweight using BMI), but if at your annual physical examination you have a BMI of 34, total cholesterol of 280, and blood pressure 160/100, we would all agree that you are in bad shape and need to make some major changes to your lifestyle if you want to survive long. Similarly, if a department has a score of $60,000, we can all agree that it simply isn’t performing at an acceptable level and would be a good unit to close if we have to close one. Stated another way, I don’t think the score function would be very useful for comparing the performance of departments with scores of $1,000,000 and $1,500,000, but I do think it would very clearly tell us that a department with a score of $1,000,000 is contributing much more to the University than a department with a score of $100,000.
A second criticism is that the score function does not count high-level, low-enrollment courses (and the faculty who teach them) with a weight commensurate with their intellectual importance. I strongly agree that these courses are essential and departments should continue to offer them for high-level undergraduates and graduate students. Since the score function is for an entire department, it just requires that the department also teach some large-enrollment courses that appeal to a broad range of students. But within the department, faculty teaching small, essential courses should be valued as contributing to the department’s mission.
A third criticism is that the score function does not include costs. If two departments have the same score but the University spends twice as much of its own funds on one (in terms of salaries, lab equipment, secretarial support, etc.), it seems you should prefer the cheaper department. This is a valid point, and definitely something the administration should take into account. But figuring out the cost of a department is pretty straightforward and we already know how to do it. The University could certainly combine the score function described here with its cost estimates to determine which departments are the most “cost-effective” to retain. The point of the score function we are introducing here is that it allows you to quantify the contribution of a department to the University’s mission, which we have not seen done before and which would be very useful at the moment. Additionally, the per-faculty cost to the University likely varies much less than the per-faculty score function outlined here, so it is better to start with this score function to identify under-performing departments.
The last potential criticism we want to address is the claim that the value of some entire fields is not well-represented by course enrollments and research funding. This boils down to arguing that the experts in a field know that their own field is extremely important, but they can’t convince others of this (neither students nor funders). It is not surprising that experts in a field would argue that their field is important, whether it actually is or not, so their opinion can’t be taken at face value without external evidence. To put it bluntly, we’re being asked to justify ourselves to society (and bankers) right now. We don’t think either group is interested in accepting this type of self-serving argument anymore.
A critical aspect of this proposal is that the process must be open and public. All the data should be made publicly available and the scores for each department should be listed on a website. This would make sure everyone understands why tough decisions are being made, ensure that there’s no funny business, and help departments understand what they would need to do to improve their score and put themselves in better standing with the University.
Although budget cuts can be uncomfortable, they do force a focus on the mission that can be very healthy in the long term. If the right departments are closed, a University can even emerge from a fiscally challenging period in a stronger position, leaner and more mission-focused. Using a score function to quantify contribution would increase the chance that this is done rationally, show the federal government that we are taking their concerns seriously, and help to convince the public that we are providing value to society and deserve support.
[1] While citations can be part of a comparison of faculty within a unit (e.g., for purposes of granting tenure), they are not useful for comparing units. Citations are not comparable across fields because some fields have slower publication cycles, do not cite prior work in the same way, or have a different total size (affecting the number of researchers who can cite a paper).
Comments

I find a scoring function intriguing but worry that the initial form is too much a gross output, unadjusted for costs and value-added, and as a result will incentivize a combination of low-level introductory courses and super-expensive research, with insufficient attention to the long sherpa-guided trek in between that universities are meant to excel at. At a minimum, I would (1) subtract overhead/TA/RA costs of supplying teaching or research and (2) reward high-level courses more than lower-level courses--say, students in a 500-level grad course get multiplied by at least 5 compared to students in an introductory 100-level course.
I think a much more straightforward, less controversial, and equivalent way to implement this proposal would be to make academic units more financially independent of each other, with a budget model that explicitly tracks cross-unit subsidies. Several universities have already moved or are moving to this type of budget model. It doesn't automatically slate units in deficit for closure, but it does make subsidization choices conscious and discussed openly. For example, I was on faculty senate when my previous university switched to this budget model and it became apparent that the College of Music (which is excellent) needed massive subsidies due to its small enrollment and high capital costs. But people were pretty ok with that--we want to have a thriving college of music. In other cases, it revealed things that were less ok with many people (like English at the time having similar sized TT faculty as Econ and Poli Sci combined). I think the calculations are close to equivalent to what you're proposing, but they lead to less formulaic decisions and the calculations are also done in a way that doesn't implicitly pit individual faculty against each other.