Friday, September 17, 2021

What Motivates Egalitarians?

The real world provides us with a nice test case to examine whether egalitarians are motivated by envy or a desire to help the poor.


If they were motivated by envy, then if they get rich, we'd expect them to live large and find excuses for their own behavior.

 

If they were motivated by a desire to help the poor, then if they get rich, we'd expect them to donate lots of money to help the poor. 

 

What we overwhelmingly see among egalitarians is the former, not the latter. That's the opposite of what we see among utilitarians, who generally seem to mean what they say. 


Egalitarians of course have plenty of rationalization strategies to excuse their behavior, but none of them work, as Chris and I show here.

Thursday, August 19, 2021

I Am Now Immune to Criticism

I've decided to copy-cat a style of argumentation which is prominent among democrats and socialists in the philosophy literature. This move will now render me and my work immune from criticism.
By epistocracy, I henceforth mean not only a system that gives greater weight to the wise during voting, but which actually makes substantively wise decisions! Thus, any time a seemingly epistocratic decision-system makes a bad choice--such as a choice that runs afoul of the demographic objection--it wasn't *true* or *real* epistocracy! Epistocracy by definition always makes the wisest choices. Therefore, to oppose epistocracy is to oppose good choices and favor bad ones.

In the same way, socialists will often say that socialism is not merely a system with a certain kind of ownership and control rights over productive property, but a system that in fact lives by certain norms, including substantive norms about outcomes. For instance, a socialist might say that it's not real socialism unless things are done properly and people are treated as equals, with equal incomes. Democrats say similar things about democracy meaning not merely equal inputs, but substantively liberal and egalitarian results.

If they can do it, so can I.

Monday, August 16, 2021

The Afghanistan Disaster: A Case of Unjust War Exit

As we watch wretched Afghans clinging to the wings of departing US airplanes, the Taliban, one of the cruelest regimes on earth, has taken control of Afghanistan following the withdrawal of United States troops. Most observers are dismayed, but for various reasons and with different motivations. Some think the US should have never invaded; others say that, while the initial invasion was defensible, the US subsequently botched its mission; yet others obsess over who was at fault: G.W. Bush, Obama, Trump, Biden, or all of the above.

Here I set aside these questions. At the time I defended the invasion, but I will not assume here that I was right then. I wish to make a different point: the withdrawal of the United States from Afghanistan constitutes an unjust war exit. 

Just war theory specifies the conditions for the justice of a war:

1. The war has a just cause. A just cause consists in stopping or preventing the violation, backed by lethal force, of persons’ rights to life and physical integrity (in brief, repelling or preventing attacks against persons).

2. The commander intends the just cause either as an end, or as a means to some other end, or, perhaps, as a foreseen side effect.

3. The war stands a reasonable chance of succeeding by military means that do not breach jus in bello requirements.

4. The war is a necessary means to pursue the just cause while minimizing casualties.

5. The war is a proportionate response to the wrong it seeks to remedy.

6. Neither the war’s occurrence nor the way it is fought should threaten the establishment of a just peace. 

Those who question the justice of the Afghanistan war may claim that it violated condition 1, the failure to have a just cause; or condition 3, the success requirement; or perhaps condition 6, the failure to create the conditions for a just peace.

I don’t have to decide whether or not the critics are right, because even if they are, the withdrawal of troops violates the requirements for the just termination of war. The idea is that “the question of whether belligerents may, or indeed must, end their war at time t2 cannot be settled solely by a verdict on the justness or unjustness of their war at t1.” (Cécile Fabre, here, at 632).

We must consider two cases: either the Afghanistan war was initially just or it was not.

Let’s assume that the Afghanistan war was initially just at T1, the time of the invasion. It follows that the United States had an obligation to realize the just cause (say, free Afghanistan from the Taliban and eliminate the chances of anti-Western terrorists operating there). But suppose that at a later time, T2, the war had become impossible to win or could only be won at a prohibitive cost, defined as a cost in human lives that significantly exceeds the value of the realization of the just cause. Then, perhaps the United States would have had to stop fighting at the time the impossibility of victory became apparent. By hypothesis, continuing the war would have led to worse consequences than withdrawing. But if success cannot be achieved (thus turning an initially just war into an unjust one), and yet the United States presence is necessary to avert a catastrophe, then the United States must stay. The United States may withdraw only if doing so complies with the rules of proportionality, that is, if predictably the consequences of withdrawal are acceptable.

Similar reasoning holds if we assume that the war was unjust at T1, which means the invader should not have invaded in the first place. Suppose, however, that at T2 it became apparent that stopping the (initially ill-conceived) fight would lead to a major humanitarian catastrophe. Then the principles of war exit recommend staying the course, on grounds of proportionality as before: the consequences of leaving are worse than the consequences of continuing the fight, even if starting the fight was wrong.

It is apparent that the United States withdrawal is unjust regardless of any view about the justness of the original invasion, or any view about supervening justice or injustice of the war. The Taliban’s reconquest of power will predictably lead to disastrous results. At the very least, the Taliban will resume its reign of terror against its own citizens, especially women. And in the worst case, the country will become once again a military base for terrorists, especially given the likely involvement of Iran in Afghanistan. All of this makes the withdrawal unjust. The United States should have continued to fend off the assault of the Taliban with the contingent it had. This would have been morally preferable, I contend, even if the United States troops would have had to stay forever.

A final word about the rationale offered by the present and previous US administrations: “We cannot fight for them, they must fight themselves.” This noxious doctrine, going back to J.S. Mill, is that freedom is only valuable if people fight for it. The doctrine is wrong because it blames the victim. The victims of the Taliban, especially women and children, are helpless against the Taliban and are not responsible for their corrupt and ineffectual government’s failure to defend them. The United States military presence in Afghanistan, flawed and indecisive as it was, was indispensable to save human lives. It should not have ended. 



Friday, August 6, 2021

Why Don't Socialists Take Exploitation Seriously?

Socialist: "Capitalism is full of exploitation. Capitalist employers exploit their employees by underpaying them."

Randian Capitalist: "Well, maybe the mixed economies we currently have see that kind of thing happening. The problem is that the presence of socialism in our mixed economies socializes individuals to believe that individuals have no inherent worth and so it's okay to sacrifice or exploit them for any bigger end. What we need is real capitalism. Once we have real capitalism, people will grow up with a new ethos. Under real capitalism, a Randian ethos will emerge in which everyone insists on paying others exactly what they deserve, not a penny less or more. Therefore, real capitalism eliminates all exploitation."

This sounds pretty dumb, right? But socialists often say the same thing in reverse. They claim that socialist societies will create a new ethos in which people will refuse to mistreat one another, will work for the public good, and will abide by various demanding moral rules. They claim that existing selfishness--including selfishness and malice in socialist communes or countries--is the result of past socialization from capitalism. 

Of course, it's possible they are right! While most people just assert this stuff because it flatters their ideology, we can at least study how different systems affect things like interpersonal trust, altruism, cooperativeness, trustworthiness, and so on. Unfortunately for socialists, though, the empirics are rather clear that capitalism increases these good things while socialism and traditional society tend to demote them. You can read Markets without Limits for a review of that literature. 

Socialists oddly don't seem to take exploitation very seriously despite talking about it all the time. Consider another dialogue:

Socialist: "Exploitation occurs when someone uses their market power--which results from power differentials--to give someone a bad deal that takes advantage of the other's misfortune or differential power."

Capitalist: "That sounds bad. We should stop that if we can. So, I'm guessing then that you think it's really important to foster a competitive market--competitive in the technical economic sense--so that everyone everywhere is a price taker and no one has market power. You'd probably hate it if there were monopolies or monopsonies."

Socialist: "Well, no, what I propose we do is create a monopsony for labor and a monopoly seller of goods."

Capitalist: "Huh? What? I think I must have misheard you, because you just said literally the wrongest thing anyone could propose as a solution to this problem."

Socialist: "No, we'll create a monopoly and monopsony but make sure everyone is super duper nice."

Capitalist: "What the fuck?"

Socialist: "For real."


Wednesday, August 4, 2021

Why Was Rawls's Work So Bad?: Some Speculation

Rawls's work is quite bad. His style is boring and awkward. He writes in half-retractions. He offers a method but doesn't stick to it. He advances principles but doesn't really argue for them. (For instance, his argument for the moral powers test of basic liberty is radically incomplete--he offers hardly any reason to believe it, and he offers even less reason to think it picks out the liberties he favors.) He largely ignores critics. He straw mans the views he criticizes. He offers criticisms of others but ignores whether his criticisms apply to his own work. His argument for his most famous idea--the difference principle--is clearly unsound, as Wolff demonstrates here.

Why?

I suspect it's because he gave himself an absurd task.

Rawls often said that he was trying to articulate and defend the implicit, substantive theory of justice inside the culture of the modern democratic nation state.

Here, then, are somewhat snotty but nevertheless accurate summaries of his two biggest books. A Theory of Justice: Idealized agents under a veil of ignorance but who possess special knowledge of sociology and economics would unanimously choose...wait for it...something like the 1972 Democratic Party Platform. Political Liberalism: All reasonable people committed to a free and equal society, but who recognize and respect diversity of value would...wait for it...end up agreeing with Rawls that we should implement either the 1972 Democratic Party Platform or the 1972 Social Democratic Party of Germany's platform. 

Here's the problem, then: Consider the big political parties--the ones that actually get into power and rule in modern democratic nation-states. (You can ignore fringe parties like the communists, libertarians, etc., here.) The platforms that these political parties have--not merely their particular platform in any given year, but their overall tendencies towards various policies--are not derived from philosophical principles combined with economics. Rather, most political parties are composed of a wide range of interest groups with very different ideologies, and often no ideologies at all. These different groups have different goals with differing strengths. The platforms that emerge in any party are half compromise and half accident. Parties push incompatible ideas at the same time and in the same breath. 

The platform of any big party is a hodgepodge of largely incompatible ideas or ideas in deep tension with each other. These tensions arise in part because the party is trying to please different people with conflicting goals and interests. They arise because their voters are usually uninformed, irrational, and inconsistent themselves. They arise because parties mostly want to do what sounds good but also sometimes want to do what is good. They arise because most people are confused and unprincipled themselves. Any given member of parliament probably has inconsistent ideas and goals. The body as a whole does too.

This holds not merely for any one party, but for the overall politics that emerge in any particular country. 

You cannot produce a good, simple, principled, but very substantive account of the implicit theory of justice underlying a modern democratic nation-state because the principled part is rather minimal. My point is not to say that because people disagree, there is no truth in politics, and no substantive theory of justice is correct. Rather, my point is that Rawls was trying to provide a principled defense of what he regarded as the rather substantive implicit principles of justice in modern democratic nation-states. But modern democratic nation-states have no such implicit robust theory; all they have is something far more minimal.

Thursday, July 29, 2021

Against MSB's Curve

I am trying to eliminate the undergraduate curve at Georgetown's business school. Here is a draft of a document I'm preparing to make that case. This should be of interest to faculty or anyone interested in issues of perverse incentives.



 

A Proposal for the 

Elimination of the Grading Curve in the Undergraduate Program

or

Exemption from the Curve for Teach-to-Mastery Methods

 

Submitted by:

 

Jason Brennan

Flanagan Family Professor, SEEPP

 

 

 

 

 


 

A.    Executive Summary

 

This document forwards two proposals. The first proposal is to eliminate the undergraduate curve entirely. The second, conditional on the failure to pass the first, is to allow undergraduate faculty to opt out of the curve at will provided they teach to mastery.

 

The curve disadvantages our students in graduate admissions and job searches because it artificially makes them appear worse than students from comparable universities, it decreases collaboration and increases competition among students, it increases the role of luck in determining grades, it often makes grades dependent on unreliable and statistically insignificant small differences in absolute scores, it compounds equity issues especially in the first year that arise from differences in the strength of high school curriculums, and it incentivizes professors to either make classes too hard or to not teach everyone to full mastery of the material in order to get a distribution of skill levels. 

 

There is no strong evidence of grade inflation, and even if there were, solving the problem cannot be done unilaterally. Further, most of the purported reasons to adopt a grading curve can be satisfied more effectively without a curve. 

 

B.    The Two Proposals

 

This document forwards two proposals, the second of which is conditional upon the failure of the first to be approved.

 

Proposal 1: Effective immediately, grading curves are eliminated in the undergraduate business curriculum. Undergraduate business faculty will receive the same academic freedom to assign grades as their peers throughout Georgetown University and their peers at most other colleges and universities within the United States.

 

Proposal 2: If proposal 1 fails and a grading curve remains as the default for undergraduate courses, faculty may, at will, and without requiring prior approval from their peers, opt out of the curve provided they follow a teach-to-mastery method. 

 

C.    Benchmarking: Who Curves?

 

Mandatory grading curves are unusual in most fields of study in the US. 

 

Nearly all law schools impose grading curves, such that letter grades serve as an approximation of or shorthand for class rank, though there is a conceptual problem (discussed below) about the coherence of averaging grades across courses. 

 

Beyond that, the overwhelming majority of universities and individual academic programs in nearly all undergraduate majors do not have a mandatory or externally imposed curve. Individual faculty retain academic freedom to distribute grades largely as they see fit, following their judgment about appropriate norms.

 

Table 1 below lists the undergraduate business schools ranked among the top 20 by US News and World Report and indicates whether they have a mandatory or suggested curve. 

 

TABLE 1: Mandatory Curves at the Top 20 Undergraduate Business Schools

 

2021 US News Rank | School                | Curve?
1                 | Pennsylvania          | No
2                 | MIT                   | No
3                 | Berkeley              | Fixed 3.45 for core; 3.5 for electives, 3.65 for low enrollment electives[1]
3                 | Michigan              | No
5                 | New York University   | In classes of 25 or greater, no more than 35% of students receive A or A-.[2]
5                 | Texas                 | No
7                 | Carnegie Mellon       | No
7                 | Cornell               | No
7                 | North Carolina        | No
7                 | Virginia              | No
11                | Indiana               | No
12                | Emory                 | Recommended distribution[3]
12                | Notre Dame            | Variable mean dependent on year and department[4]
12                | Southern California   | No (recently eliminated)
12                | Washington, St. Louis | No
16                | Georgetown            | Fixed mean cannot exceed 3.5 except in FYS courses
16                | Ohio State            | No
16                | Wisconsin             | 3.0 in select courses, 3.3 and no more than 30% As in others.[5]
19                | Georgia Tech          | No
19                | Illinois              | No
19                | Maryland              | No
19                | Minnesota             | No
19                | Washington            | No

 

The upshot: Among the 23 universities ranked within the “top 20,” 5 have mandatory curves. 1 has a suggested curve, which appears to be lightly enforced. 

 

D.    What Do Grades Signify? (Skip If Necessary)

 

While nearly all colleges and universities within the United States use the A, B, C, D, F, +/- grading system, there is no universal definition of what these grades signify. Indeed, Guy Montrose Whipple noted this point in the early 1900s:

When we consider the practically universal use in all educational institutions of a system of marks, whether numbers or letters, to indicate scholastic attainment of the pupils or students in these institutions, and when we remember how very great stress is laid by teachers and pupils alike upon these marks as real measures or indicators of attainment, we can but be astonished at the blind faith that has been felt in the reliability of the marking system. School administrators have been using with confidence an absolutely uncalibrated instrument…

What we need to know is: What are the traits, qualities or capacities we are actually trying to measure in our marking systems? How are these capacities distributed in the body of pupils or students? What method ought we to follow in measuring these capacities? What faults appear in the marking systems that we are now using, and how can these be avoided or minimized?[6]

American universities and colleges have in effect landed on a common set of symbols for grades without agreeing on what these symbols mean or signify, except that in some way A > B > C, etc.

 

To illustrate, even within one university, grades in different classes might in various professors’ own minds signify any of the following:

 

1.     Grades as rankings: 

a.     A letter grade ranks a student against other students in the same section of the same class that semester.

b.     A letter grade ranks a student against all other students in any section of a given class in a semester.

c.     Further expansion: We could in principle expand the ranking set outward to go across professors, years, or even universities. At the limit, a letter grade in introductory microeconomics could rank a student against all other students who have ever taken or ever will take that class at any university anywhere.

 

2.     Grades as qualitative evaluations:

a.     A letter grade reports a qualitative description of how well a student mastered material according to the professor’s absolute standards, though different professors might have different standards. 

b.     A letter grade reports a qualitative description of how well a student mastered material according to the university’s absolute standards, consistent among all professors, but the standards might vary from university to university. 

c.     A letter grade reports a qualitative description of how well a student mastered a given set of material according to what is meant to be a universal absolute standard, e.g., such that a B in ECON 101 at Boise State = a B in ECON 101 at Cornell.

 

3.     Grades as quantitative scores/percentages:

a.     A letter grade reports what percent of questions and problems a student got correct, according to the professor’s standards, but the standards might vary from professor to professor.

b.     A letter grade reports what percent of questions and problems a student mastered according to the university’s internal standards, consistent among all professors, though the standards might vary from university to university. 

c.     A letter grade reports what percent of questions and problems a student mastered, according to what is meant to be a universal standard, e.g., such that a B in ECON 101 at Boise State = a B in ECON 101 at Princeton.

 

In fact, a typical undergraduate business student at Georgetown has grades that were assigned on many of these different conceptions of grades. Her FINC 101 grade might signify her ranking against all other students that semester. Her FREN 101 grade might signify the professor’s qualitative evaluation based on her professor’s judgment of what constitutes universal standards for introductory French at the college level. Her CHEM 101 grade might be a shorthand for the total percentage of problems she got right on a set of multiple-choice exams with somewhat arbitrarily assigned weights to questions.


Further, grades distributed at different universities might have different values. ECON 101 at Dartmouth might have higher standards than ECON 101 at Keene State University.

 

The upshot: While everyone recognizes that all things equal, an A is better than a B, in fact, grades do not have universal significance or meaning.

 

E.     Incommensurability of Grades (Skip If Necessary)

 

It is common for universities to average grades among classes and for instructors to average grades within classes. However, whether it is coherent to do so depends upon the meaning of the grades in question. 

 

For instance, rankings cannot generally be averaged because they represent ordinal numbers on incommensurable scales. Compare: 

 

1.     If MSB ranks #16 among undergraduate programs and #21 among MBA programs, it is not accurate to say that we rank #18.5 overall. This presumes, incorrectly, both that the two rankings are equally weighty and that distances between spots on the rankings fall along a constant scale. 

2.     If Christine ranks 1/50 in STRT 230 and 45/50 in ACCT 101, it is similarly not accurate to say that she ranks 23/50 overall in these two courses. Again, such averaging presumes both that the two rankings are equally weighty and that the distances between spots on the rankings fall along a constant scale. Accordingly, at MSB, when we average her grades in these two courses for GPA purposes, we pretend that we can average incommensurable ordinal rankings. We insert information that was not present in the original grade and ignore information that was present. It is mathematically incoherent.

 

This problem is compounded when we start averaging grades across classes in which the grades have very different meanings. To illustrate, suppose Christine has five courses. In some courses, grades represent a qualitative judgment, in some a ranking, and in others a percentage of somewhat arbitrarily weighted test problems solved correctly. Her GPA calculation for the semester in effect looks like this:

 

GPA = [Good + Excellent + 100(249/320) + ranked 1/45 + ranked 19/38] ÷ 5
    = [3.00 + 4.00 + 2.66 + 4.00 + 3.33] ÷ 5
    = 3.398

The second step involves inserting information that was not present in the first step, ignoring information that was present, and pretending that incommensurate meanings are in fact commensurate cardinal numbers. 
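To make the arithmetic concrete, here is a minimal Python sketch of that calculation. The grade-point values are the ones given above; the dictionary labels and the function name `gpa` are hypothetical, chosen only for illustration. Note that the code never sees the words "Good" or "ranked 1/45"; it only sees the numbers someone decided to substitute for them, which is exactly the problem.

```python
# Grade points taken from the worked example above. The labels are
# illustrative only: they record what each number originally "meant".
course_points = {
    "qualitative: Good": 3.00,
    "qualitative: Excellent": 4.00,
    "percentage: 249/320": 2.66,
    "rank: 1/45": 4.00,
    "rank: 19/38": 3.33,
}


def gpa(points):
    """Unweighted mean of grade points (assumes equal credit hours)."""
    return sum(points) / len(points)


semester_gpa = round(gpa(list(course_points.values())), 3)
print(semester_gpa)
```

An "Excellent" and a "ranked 1/45" both enter the average as 4.00, so whatever distinguished them is gone by the time the mean is taken.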

 

Despite these conceptual problems, GPA might have external validity. Perhaps, all things equal, those with higher GPAs might perform better in the future, earn higher incomes, and so on. However, there is little and limited evidence of such external validity. One major paper claims the relationship between GPA and future job success is weak.[7] In contrast, Frank Schmidt and John Hunter, in a comprehensive meta-analysis of previous papers, claim that the correlation between college GPA and future job success is 0.34.[8] There is strong evidence that higher high school GPAs predict higher incomes, though this results almost entirely from those with higher high school GPAs being more likely to graduate from college and receive the college wage premium.[9]  

 

The upshot: In fact, grades between classes (and even individual grades within classes) cannot meaningfully be averaged. GPA might nevertheless have some external validity, though the evidence is surprisingly weak.

 

F.     The Social Meaning of Grades Is a Collective Action Problem

 

We use grades for both internal and external purposes. 

 

Internally, we use them to rank students against each other. Grades represent a short-hand description of a student's approximate demonstrated skill compared to others in their year in the program. Even this description is too generous, as the problem of incommensurability means that two students with a 3.5 GPA in MSB courses might be radically different in terms of ability and absolute performance. Nevertheless, at least internally, everyone understands the grades are meant to represent an approximate relative ranking against other students at MSB.

 

Externally, though, what grades mean is not up to us. We do not get to decide how employers and graduate admissions committees interpret our grades.  Instead, potential employers and graduate admissions programs will tend to presume our grades signify more common qualitative assessments. Many potential employers or graduate admissions programs are and will remain unaware that we use grades to approximate our internal rankings, even if we attempt to inform them of such through notes on transcripts or accompanying letters. 

 

Further, we can expect that employers and graduate admissions programs make subjective judgments about how to compare equivalent letter grades across universities. For instance, a GPA of 3.6 from MIT might be viewed as more impressive than a GPA of 3.8 from UMass Boston. We can expect that employers and others will “weigh” our students’ GPAs based on their stereotypes of the quality of our students and the rigor of programs. Accordingly, employers might look more favorably on a 3.5 undergraduate GPA from Georgetown’s MSB than a 3.5 GPA from ASU’s Carey. However, unless it is widely known that MSB students’ grades are curved (and it isn’t), we nevertheless harm our students when they are compared to students from other universities which employers and admissions officers stereotype as being equivalent in rigor and quality.

 

Perhaps, thanks to our grading curve, a 3.5 GPA at MSB should be seen as better than a 3.5 GPA at USC’s Marshall, but that doesn’t mean it is. If we have a curve and they do not, we disadvantage our students. We create the appearance that our students are worse though they are not. 

 

Some employers and graduate admissions programs require a minimal GPA for consideration; they do not allow students with lower GPAs to apply, period. 

 

It would perhaps be best if all undergraduate business schools coordinated on a common set of standards in grading, just as law schools have coordinated in using grades to represent rankings (ignoring the problem of incommensurability mentioned above). But they have not. 

 

Upshot: When students apply for jobs or graduate admissions, it doesn’t matter what our internal philosophy of grades is or what we take grades to signify. What matters is how others interpret those grades. They are generally unaware of our curve. We thus artificially create the appearance that our students are worse than comparable students at other universities. We do not get to decide how others interpret our grades. Our intentions may be good, but that does not mean they produce good results. We might believe internally that our curve increases the rigor of our program; externally, it might nevertheless signify our students are worse than others. 

 

G.    Grades as a Zero-Sum Game

 

A curve makes grades competitive. What grade a student receives depends not on how well she does, but on how well she does compared to others. The exact details will vary from class to class, but in general for one student to improve her grade over the semester requires corresponding losses to others’ grades. Instead of “People for others,” our grading motto is in effect, “My gain is your loss”.   

Accordingly, a curve creates perverse incentives to avoid collaborating with or helping others not in one’s own group or even to sabotage others in different groups.

As we teach in many of our courses, competition is often a good thing. We want companies to exist in a competitive market. In some cases, the zero-sum grading curve allows us to better simulate competitive markets in class. Nevertheless, it is unclear whether the benefits outweigh the costs for classroom learning. 

 

Our university advertises itself as promoting a collaborative environment in which students take an interest in each other’s welfare. A zero-sum grading system is in tension with these professed values.

 

Upshot: The curve creates a zero-sum grade distribution and corresponding competitive mentality among our students. This has advantages and disadvantages.

 

H.    Luck

 

One might think that as much as possible, grades should be based on a student’s skill, talent, and performance, not luck. However, if grades are determined by relative ranking, then by necessity a student’s grades depend heavily upon luck. If a student is in an unusually talented or driven class, her grades will be lower than if she were in an unusually untalented or lazy class, though her absolute performance remains the same.

Upshot: A curve increases the significance of luck in determining grades. 
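A small Python sketch can illustrate the point. Everything in it is an assumption made for illustration: the three-band rule (top third A, middle third B, bottom third C) is a hypothetical stand-in, not MSB's actual curve, and the two cohorts are invented numbers. The same absolute score of 88 is graded in each.

```python
def curved_grade(score, cohort_scores):
    """Grade by rank within the cohort: top third A, middle third B, rest C.

    This three-band curve is a hypothetical rule used only for illustration.
    """
    ranked = sorted(cohort_scores, reverse=True)
    position = ranked.index(score)  # 0 = best score in the cohort
    fraction = position / len(ranked)
    if fraction < 1 / 3:
        return "A"
    if fraction < 2 / 3:
        return "B"
    return "C"


# Two invented cohorts; the student with an 88 appears in both.
strong_cohort = [95, 93, 92, 91, 90, 88]
weak_cohort = [88, 80, 78, 75, 72, 70]

print(curved_grade(88, strong_cohort))
print(curved_grade(88, weak_cohort))
```

Under this rule the 88 earns a C in the strong cohort and an A in the weak one, although nothing about the student's own performance differs.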

 

I.      Score Compression, Grade Uncertainty, Arbitrariness, and Faculty Sneakiness

 

Some faculty distribute scores during the semester with the goal of ensuring students know roughly where they stand. Students’ final grades are not a surprise.

 

However, in many classes, scores are compressed within a narrow range. Final averages might cluster between, say, 90-94. A professor will then need to impose a curve, such that a 90 becomes a B and a 94 an A.

 

This is problematic for a number of reasons. For one, students face anxiety from uncertainty. They cannot easily anticipate what their in-class scores mean. Second, and perhaps more damningly, this means that final grades are effectively arbitrary. Small point differences inside a course are rarely significant or robust indicators of genuine differences in effort, ability, or performance. Empirical work on grading indeed finds that professors routinely score the same essay or assignment very differently depending on how the professor feels, whether the professor has recently eaten, or other arbitrary factors. 

 

Jeffrey Schinske and Kimberly Tanner summarize the extant empirical literature on this last point:

 

[Educational psychologist W.C.] Eells investigated the consistency of individual teachers’ grading by asking 61 teachers to grade the same history and geography papers twice—the second time 11 wks [sic] after the first. He concluded that “variability of grading is about as great in the same individual as in groups of different individuals” and that, after analysis of reliability coefficients, assignment of scores amounted to “little better than sheer guesses”. Similar problems in marking reliability have been observed in higher education environments, although the degree of reliability varies dramatically, likely due to differences in instructor training, assessment type, grading system, and specific topic assessed. Factors that occasionally influence an instructor's scoring of written work include the penmanship of the author, sex of the author, ethnicity of the author, level of experience of the instructor, order in which the papers are reviewed, and even the attractiveness of the author.[10]

 

Professors tend to hate student “grade grubbing”. The curve creates a perverse incentive for faculty to score students very closely together, creating for many of them the illusion that they will receive high marks. Faculty can then assign lower grades to many of these students after student evaluations have been written and when the faculty no longer expect to see the students face-to-face.

 

Consider: If you were reading a paper submitted to a journal which relied upon this kind of data, you would probably laugh as you recommend rejection. 

Upshot: Score compression is a problem in many courses. Combined with our curve, this leads to effectively arbitrary grade distributions based on small, unreliable differences in assigned scores. 
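
A rough simulation shows how compression and unreliable scoring interact. This is a sketch under stated assumptions, not an estimate from the literature cited above: the "true" scores, the A/B cutoff of 92, and the size of the grading noise (a standard deviation of 2 points) are all hypothetical. The function estimates how often noise alone puts a student on the wrong side of the cutoff.

```python
import random


def letter_flip_rate(true_scores, noise_sd, cutoff, trials=10_000, seed=0):
    """Estimate how often grading noise alone moves a student across a cutoff.

    true_scores: hypothetical noise-free scores for each student.
    noise_sd:    assumed standard deviation of grader noise, in points.
    cutoff:      hypothetical score separating an A from a B.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    flips = 0
    total = 0
    for _ in range(trials):
        for true in true_scores:
            observed = true + rng.gauss(0, noise_sd)
            # A flip means the observed letter differs from the deserved one.
            if (observed >= cutoff) != (true >= cutoff):
                flips += 1
            total += 1
    return flips / total


compressed_class = [90, 91, 92, 93, 94]  # hypothetical compressed averages

no_noise_rate = letter_flip_rate(compressed_class, noise_sd=0.0, cutoff=92)
noisy_rate = letter_flip_rate(compressed_class, noise_sd=2.0, cutoff=92)
print(f"flip rate without noise: {no_noise_rate:.2f}")
print(f"flip rate with noise:    {noisy_rate:.2f}")
```

With no noise, no student is misclassified. Once the assumed noise is comparable to the whole four-point spread of scores, a substantial share of letter grades reflects the noise rather than the work.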

 

J.      Teaching to Mastery vs Teaching to Mediocrity

 

Traditional teaching methods typically ask students to complete a project, test, essay, or activity. An instructor then grades the deliverable, offering advice about what the student could have done better. But then the student does not get the chance to do better; they move on to the next item.


It’s instructive that basically no one uses this method when the instruction actually matters and the trainee is expected to perform. GEICO gives insurance adjustors tests about insurance law and policy, but it doesn’t let them progress with training and start doing the job until they actually master the material. 

 

At base, “teach-to-mastery” or “mastery methods” apply the same philosophy to classroom instruction. Teach-to-mastery is what an instructor does when they take FINC 101 at least as seriously as a child’s piano teacher takes the C major scale. 


For instance, since the curve was suspended, Jason Brennan has implemented a simple teach-to-mastery method in all of his courses. Students must pass in their assignments on time, but they are permitted to revise every assignment as many times as they want, until the semester ends or they get the grade they want. In the past, a student might get a C on the first essay and then slowly improve over the semester. Now, students instead fix that first essay. Surprisingly, Brennan’s classes often have a culture of revision, in which students revise and improve work even when they have already gotten an A and such revisions cannot improve their final grade. 


Such teaching does indeed take more work, though not necessarily as much as one might think, in part because revisions often simply add missing material rather than start over from scratch. In Brennan’s case, he estimates it adds maybe 15-20 extra hours of teaching work per semester. It partly reduces work because it eliminates grade-grubbing; instead of students arguing they deserved a better grade, they rework their projects to get the better grade. He has not found his research output to have suffered in any way because of this. 

 

At MSB, tenure-track faculty have a monetary incentive to minimize teaching work and instead maximize research output. Prestige, promotion, speaking engagements, and raises are more strongly tied to research than teaching.  

 

Accordingly, given these incentives and given that we are a Research I university, it is probably too much to demand that all faculty teach to mastery. However, it seems perverse to insist that faculty do not teach to mastery or to discourage them from doing so.

 

As it stands, our curve creates perverse incentives. Faculty want to avoid a situation in which nearly all students master the material, but nevertheless some students must receive low B grades because they were slightly less exceptional than others. Accordingly, we are incentivized to make tests too long, essays too difficult, projects too rigorous, and to prevent too many students from learning too much, all to ensure that a sufficient number of students do poorly enough to fall below the fixed mean. The curve creates perverse incentives against holding extra office hours, allowing revisions, holding review sessions, explaining the material well, and pre-briefing projects.

 

When the curve is absent, a professor is free to think, “I have certain absolute standards for what constitutes A-level work and I want to work to ensure all of my students get there.” When the curve is present, we are instead incentivized to think, “I need to ensure that a sufficiently large number of students fall below my standards.” Thus, we are incentivized to teach to mediocrity rather than mastery.  

 

Upshot: The curve creates perverse incentives to underteach or apply inappropriately high standards. It disincentivizes teaching to mastery, the system which is probably best for students. 

 

K.    The Widespread but Unsupported Belief in Grade Inflation

 

There is a widespread belief among academics and others that grades have become “inflated”. A C in 1960 is the same as a B in 2021, or so the claim goes.

 

For the sake of argument, assume grade inflation exists and that it’s bad. One might think that curves are an effective tool to combat such grade inflation. However, we again fall back into a collective action problem. If a small minority of schools impose strict curves, they reduce inflation at their school. But, to extend the metaphor, when their students apply for jobs or graduate admissions, the “buyers” won’t know the students’ “prices” are in a different “currency”. They won’t know that a Georgetown 2021 B = a Georgetown 1960 B = a Wharton 2021 A-. They will simply assume our grades are also inflated and presume the Georgetown B-student is less capable than she really is. Curves might be a good solution when almost everyone uses them, but that doesn’t mean we should adopt them when others have not. We do not get to decide what others interpret our grades as signifying.

 

However, we need to ask whether this assumption is warranted. As business professors, we all understand that a mere increase in monetary prices does not signify inflation. There could be demand or supply shifts or shocks, or any number of other things going on. We need to know whether prices changed because of changes in the quantity of money. 

 

The analog for grades is that it is not enough to know whether average GPAs are higher now than in the past. We need also to know whether work has gotten better, worse, or stayed the same. Georgetown’s admissions are more rigorous now than in the past, our students are better, our faculty are better, students have more support services, students are more informed and better at self-sorting into classes they excel at, and so on. If we held constant standards over 50 years, then our students today should be getting better grades than students 50 years ago. 


In fact, no one has published a study proving that students today are getting higher grades for work equivalent to that of the past, that a C paper yesterday really is the same as a B today. No one has shown that GPAs have increased over time at a faster rate than the quality of student work. So, we don’t know whether grade inflation has occurred. (At elite universities, student credentials are stronger now than in the past, so we should expect their work is better, all things equal.)


But it gets worse. It turns out we don’t even know how average college GPAs have changed. Many reports which appear to show a raw GPA increase use student-reported GPAs or other poor sources of data. These are unreliable: students might lie, misreport, or forget their GPAs, there might be selection problems in who answers the surveys, and so on. 

We need good data: actual student transcripts collected in a way that ensures proper sampling. The only major study that does so was conducted by US Dept of Education researcher Clifford Adelman. A short summary of his findings is that the cohort of 1972 had an average GPA of 2.70, the cohort of 1982 had an average GPA of 2.66, and the cohort of 1992 had an average GPA of 2.74.[11] His study is now out of date and no one has published an equivalently rigorous study since then, in part because only someone in the Dept of Ed could force universities to provide the needed data. 

 

GradeInflation.com appears to demonstrate otherwise. But it’s bogus. It’s bogus in part because the author at best finds changes in raw GPA over time but doesn’t attempt to measure underlying performance. For all we know, student work has improved faster than average GPA, and so what looks like inflation is actually deflation. It’s bogus also because most of his data sources are bad. He relies on student newspapers, student self-reports, reports in rival university newspapers, and so on. Only in a minority of cases does he use unassailable data from registrar reports or properly sampled student transcripts. Even then, there usually isn’t sufficient data to determine whether many of the purported changes are statistically significant rather than random noise. 

 

L.     How Do Grades Affect Learning?

 

Experimental data suggests grades do little to help students improve their work. For instance, R. Butler and M. Nisan ran an experiment in which students completed a task, received either no feedback or one of two different types, and then had to do the task again.[12] They could then measure the value-added, if any, of the feedback. They gave the experimental groups either what they called evaluative or descriptive feedback. Evaluative feedback—such as a letter grade—tells students how good or bad their work is. Descriptive feedback gives students advice about how to do better. They generally found that—to produce better performance in the future—giving grades and evaluative feedback was better than giving no feedback, but giving descriptive feedback by itself was better than giving evaluative feedback and grades.

 

Further, grades do not appear to have a positive effect on students’ motivation. As Jeffrey Schinske and Kimberly Tanner summarize the large body of extant research: 

 

It would not be surprising to most faculty members that, rather than stimulating an interest in learning, grades primarily enhance students’ motivation to avoid receiving bad grades. Grades appear to play on students’ fears of punishment or shame, or their desires to outcompete peers, as opposed to stimulating interest and enjoyment in learning tasks. Grades can dampen existing intrinsic motivation, give rise to extrinsic motivation, enhance fear of failure, reduce interest, decrease enjoyment in class work, increase anxiety, hamper performance on follow-up tasks, stimulate avoidance of challenging tasks, and heighten competitiveness. Even providing encouraging, written notes on graded work does not appear to reduce the negative impacts grading exerts on motivation. Rather than seeing low grades as an opportunity to improve themselves, students receiving low scores generally withdraw from class work. While students often express a desire to be graded, surveys indicate they would prefer descriptive comments to grades as a form of feedback. [13]

 

M.  Responses to Arguments for the Curve

 

Argument: “We need to reduce grade variance.”

Response: The curve reduces variance in the final grades. It does not reduce variance in course content, course standards, or student performance. By itself, it creates the illusion of uniformity without creating underlying uniformity. It is a method of putting the same shell on different snails. Further, it’s not even clear that uniformity is good. We want faculty to experiment with different teaching methods, evaluation techniques, and so on. 

 

Argument: “The curve creates rigor.”

Response: First, anyone asserting this should provide evidence it indeed does so. Whether it does is far from clear. Some classes are easy, some are hard, and yet all are curved. In some classes, a 92/100 generates a B average. Second, anyone asserting this should demonstrate that we cannot provide appropriate rigor through other means. For instance, teaching-to-mastery is a way of being rigorous, indeed more rigorous, even though every student can receive an A. Third, if the problem is rigor, we should solve the problem by demanding rigor. We could have faculty evaluate each other’s syllabi or grading to ensure no one is going “too easy” on students. If we genuinely cared about rigor, we’d enforce rigor, not enforce the appearance of rigor. 

 

Argument: “Students will go searching for easy As and faculty will be incentivized to provide them to get higher SET scores.”

Response: The research on SET scores over the past 20 years is almost univocal: SET scores are not a valid measure of faculty teaching effectiveness. The fact that Georgetown uses them indicates either that our administration is culpably misinformed or that we are not actually concerned about teaching effectiveness. Regardless, we should not do something wrong (impose the curve) as a response to a previously morally wrong and unscientific decision we made (use SET scores instead of measuring teaching effectiveness). 

 

Argument: “It serves my department’s or my own self-interest to use the curve.”

Response: We have a fiduciary duty to students which renders all such arguments inadmissible.

 

Argument: “The curve helps students get jobs and earns them more money.”

Response: I cannot find evidence that this is true. 

 

 



[1] https://haas.berkeley.edu/ewmba/academics/grades/

[2] https://www.stern.nyu.edu/portal-partners/current-students/undergraduate/academics/academic-policies#Grade%20Point%20Average

[3] https://goizueta.emory.edu/undergraduate-business-degree/curriculum/standards

[4] https://mendozaugrad.nd.edu/academics/departmental-grading-guidelines/

[5] https://guide.wisc.edu/undergraduate/business/#policiesandregulationstext

[6] Guy Montrose Whipple, “Editor’s Preface”, in Finklestein 1913, 1.

[7] Bretz 1989.

[8] Schmidt and Hunter 1998.

[9] French et al 2015.

[10] Schinske and Tanner 2014. See also Branthwaite, Trueman, and Berrisford, 1981 for further evidence that grading is unreliable.

[11] Adelman 2009.

[12] Butler and Nisan 1986.

[13] Schinske and Tanner 2014,