First posted: 12/12/95.
Revised: 3/11/96 (added interim report #1)
5/7/96 (added interim report #2, and some background links)
12/6/96: added Report on the design and testing of an applicant proposing matching algorithm, and comparison with the existing NRMP algorithm
6/4/97 added the press release from the NRMP announcing the adoption of the new algorithm for all matches starting in 1998 NRMP Will Employ New Residency Match Algorithm In 1998.
9/24/97 added the JAMA article "The effects of the change in the NRMP matching algorithm".
most recent revision 12/9/97 here are the press releases concerning the adoption of the new algorithm from the American Medical Student Association ( "It will go a long way towards restoring faith in the system of residency allotment..." ) and the AMA Medical Students' Section ( "We feel this change will not only enhance applicants confidence in the match program, but will also ensure a successful and effective resident matching program."
The design and comparison phase of the study was concluded in late 1996, and the final report is now available here: Report on the design and testing of an applicant proposing matching algorithm, and comparison with the existing NRMP algorithm . The NRMP board of directors voted on May 5, 1997 to adopt the new algorithm, and their announcement is here: NRMP Will Employ New Residency Match Algorithm In 1998. A brief summary of some of the important elements of the final report, which appeared in the September 3, 1997 issue of JAMA, is here: "The effects of the change in the NRMP matching algorithm".
For those unfamiliar with the market and the questions which were to be answered, a little background may be helpful. The progress of the study can be followed by looking at the initial proposal, and at the Interim Report #1: Evaluation of the current NRMP algorithm, and preliminary design of an applicant-processing algorithm, and Interim Report #2: Some Proposed Comparisons of Matching Algorithms
This discussion took specific form in four articles published in the June 1995 issue of the journal Academic Medicine. On the web, the American Medical Students Association (AMSA) has posted both a "call to arms" urging changes in the NRMP, and (together with Public Citizen, Ralph Nader's organization) has distributed a more detailed Report on the NRMP explaining their position. See also the subsequent Statement of the American Medical Association Council on Medical Education, Section on Medical Schools, Resident Physicians Section and Medical Student Section Concerning the National Resident Matching Program
In the early 1980's I conducted a study of the NRMP as it was then organized, starting with an account of the history which led to its organization in the early 1950's. The introductory parts of that paper are up on the web, here:
Roth, A.E. "The Evolution of the Labor Market for Medical Interns and Residents: A Case Study in Game Theory", Journal of Political Economy, 92, 1984, 991-1016.
That paper showed that the matching algorithm used by the NRMP produced a matching that is stable in a certain sense. In a subsequent paper, I studied markets for new physicians in the U.K., which helped make a very strong empirical case that this kind of stability is important for the success of computerized markets of this kind. An electronic 'reprint' of that paper is available in its entirety on the web, here:
Roth, A.E. "New Physicians: A Natural Experiment in Market Organization," Science, 250, 1990, 1524-1528.
It now seems uncontroversial that, to continue to organize the market successfully, the NRMP must continue to avoid producing the kinds of instabilities which have caused some other attempts at organizing computerized markets to fail. However there can be more than one stable matching, and so the controversy surrounding the NRMP has focused on which of perhaps many stable matchings should be chosen, and whether the algorithm should be organized in a way which, when there is a choice, tends to favor hospital programs, or students.
Some further background material, which puts the NRMP in the context of many other entry level labor markets (e.g. lawyers, graduates of elite Japanese universities, etc.) can be found in a short paper
Roth, A.E. "The NRMP as a Labor Market," Journal of the American Medical Association, 275, April 3, 1996, 1054-1056. (this paper is reproduced here in its entirety).
Much more detail (at the cost of much greater length) can be found in
Roth, A.E. and X. Xing "Jumping the Gun: Imperfections and Institutions Related to the Timing of Market Transactions," American Economic Review, 84, September, 1994, 992-1044.
In the Fall of 1995, the NRMP Board invited me to propose a plan for analyzing the effects of changes which might be considered in the matching algorithm. The plan I proposed is reproduced below. It consists of three phases, and the NRMP Board has decided to proceed with the first of these, which will involve an analysis of the current market algorithm and organization (which has changed considerably since my 1984 paper), design of alternative algorithms, and computational experiments based on the preference orders submitted in recent matches.
This document outlines a preliminary research program for the evaluation of the current NRMP algorithm and of changes to be considered in its operation and description. A principal objective will be to design a student-proposing stable (or approximately stable) match algorithm and to assess what would be the effect on the match of substituting such an algorithm for the present NRMP algorithm. Attention will need to be paid both to the mathematical/computational properties of the algorithm, and to the behavior of market participants and how this might be influenced by a change in algorithms. In addition, attention will have to be paid to how any changes would be monitored and evaluated.
The following table of contents gives an overview of the issues to be addressed and tasks to be undertaken.
Task List and Tentative Timeline
This document outlines a preliminary research program for the evaluation of the current NRMP algorithm and of changes to be considered in its operation and description. The matters considered are primarily those raised in the June 1995 issue of Academic Medicine. A principal objective will be to design a student-proposing stable (or approximately stable) match algorithm and to assess what would be the effect on the match of substituting such an algorithm for the present NRMP algorithm. Attention will need to be paid both to the mathematical/computational properties of the algorithm, and to the behavior of market participants and how this might be influenced by a change in algorithms. A related issue is how the match should be described to participants, and what advice should be given.
This research program can be usefully thought of as consisting of three overlapping but largely distinct parts, namely:
To design an alternative (student-proposing) algorithm, and compare it with the existing algorithm, in terms of both output matchings and opportunities for strategic behavior;
To assess and analyse the behavior and perceptions of participants in the market (including behavior prior to filling out rank order lists--e.g. in the interviewing stage), and consider how these might be affected by a change in algorithms;
To consider the effect of changes in market demographics-- i.e. the numbers and kinds of positions and applicants--on the above.
Studies by this author of other entry level professional labor markets and matching processes, both medical and non-medical, have now made it noncontroversial that centralized matching mechanisms like the NRMP arise in response to certain kinds of market failures, which deprive all participants of much of the benefit that an orderly market offers 1. In the case of the NRMP, these market failures involved both very early appointments, and a chaotic and unreliable process of making and confirming these appointments, which existed in virulent form in the decade prior to the establishment of the NRMP. These same kinds of market failures persist today in other markets, such as the American market for Federal court law clerks. One concern therefore is to evaluate the danger that these market failures could re-emerge, as market conditions change, both under the current operation of the market and under changes in its operation which might be considered. In this respect, economists as well as physicians should seek to work to Hippocrates' standard:
"The physician must be able to tell the antecedents, know the present, and foretell the future- must mediate these things, and have two special objects in view with regard to disease, namely, to do good or to do no harm." [Hippocrates 2]
In particular, decisions about the future operation of the market should be informed by some assessment of the likelihood of a recurrence of the behavior which led in the past to market failure. Thus if, following the analysis of the results of this proposed study, a policy decision is made to change the algorithm, then it would be prudent to design the change, and subsequently monitor and evaluate it, with the aid of a further program of monitoring and experiments, akin to clinical trials for a new treatment of a disease.
The sections which follow outline particular questions to be pursued, and methods of pursuing them.
I. Design of a student-proposing algorithm, and comparison with the current NRMP algorithm
Much of the argument on both sides of the present controversy has been couched in terms of results obtained, in many cases by the present author, from the game-theoretic analysis of matching models. This analysis focuses on matchings which are stable in the sense that no student and residency program who are not matched to one another would both prefer to be. There is now compelling empirical evidence that, to successfully organize the market, centralized matching mechanisms need to produce stable matchings 3. This point appears to be noncontroversial; the debate about the NRMP has focused on which stable matching, among possibly many, is being produced.
If students and positions were treated symmetrically, there would exist a hospital optimal stable matching and a student optimal stable matching. Furthermore, the hospital optimal stable matching would be achieved by a matching algorithm in which hospitals offer positions to students, who reject all but the best offer they receive, and the student optimal stable matching would be achieved by an algorithm in which the roles are reversed. (There are good reasons, having to do with the incentive properties of these two stable matchings, to concentrate on them rather than consider some stable match intermediate between the two; see Roth and Sotomayor 1990, Chapters 4 and 5.)
In the actual medical market served by the NRMP, asymmetries between students and positions may make the situation somewhat more complicated. It is already well known that the presence of married couples who go through the match together may make stable matchings more difficult (and in some conceivable cases impossible) to achieve 4. In general similar difficulties are to be anticipated whenever there is a linkage between positions, as when students may match to PGY2 and PGY1 positions simultaneously, which they can now do in some specialties.
The current NRMP algorithm deals with these problems, apparently successfully. My understanding is that the current NRMP algorithm was originally based on the hospital-proposing algorithm for symmetric markets, but that modifications have been made as match variations, chiefly having to do with the growing number of PGY2 positions, have been added over the last decade or so 5. It can therefore be anticipated with reasonable confidence that a modification of the student proposing algorithm can also be designed, with stability properties similar to those of the current algorithm. This will likely take some work: because of the asymmetries between students and positions, the existing algorithm cannot simply be "reversed" to produce a new algorithm.
There is no guarantee that definitive mathematical analysis of the algorithms is possible, or possible in a timely way. However it will be possible at the very least to identify potential sources of trouble, (which previous theoretical results show to be associated with variations in the market which induce "complementarities" in preferences 6). These will be places in which one or both of the two algorithms may have to make more than one pass through a given preference list. At such places in the algorithm, tests will have to be installed, to detect cycling of the algorithm or potential instabilities in the resulting matching.
The design of the student-proposing algorithm may therefore be a partly experimental, iterative process, in which alternative implementations will be compared on the basis of how well they succeed in producing stable matchings on the dataset of rank order lists from previous matches. For comparison purposes, it will be desirable to conduct the same tests on the current NRMP algorithm.
Once a student proposing algorithm has been designed, its outcomes under existing datasets of rank order lists can be compared with the outcome of the current NRMP algorithm, as can its sensitivity to strategic manipulation by both students and hospitals. These comparisons must depend heavily on computational experiments.
This is because the matchings which result depend not only on the properties of the algorithms and the formal requirements of the match, but also on what kinds of preferences market participants have, as reflected in the rank order lists they submit. That is, similarly organized matches in markets with different patterns of preference might yield very different results. For this reason, some questions can only be answered by conducting computational experiments using the data set of submitted rank order lists from previous matches. (Of course to the extent that participants may seek to take advantage of real or perceived strategic opportunities, the submitted rank order lists may not correspond exactly to true preferences, a point which will be returned to later, when we consider participant behavior).
1. Comparison of matchings
A first estimate of the impact of a change in algorithms can be obtained by comparing the matchings resulting from the two algorithms on previous years' matches, from their submitted rank order lists. For both students and hospitals, this would allow us first to count how many participants would have been affected by a change in algorithms, and to categorize these changes. For example, having the two outcomes for each of several year's matches will enable us to answer the following sorts of questions.
How many students, and hospitals, are likely to be affected each year by a change in the match algorithm?
How may residency programs will be affected in only one position, in two, etc?
Are some kinds of residency programs likely to fill multiple positions differently under the two algorithms?
Are students and residency programs in some specialties more likely to be influenced by a change in algorithms than those in other specialties (and how does this relate to the market demographics in those specialties)?
What is the probability that a student's outcome will change by one place in her rank order list, two places, etc.?
Is a student who received (e.g.) her 3rd choice under the NRMP algorithm more or less likely to receive a different outcome under the student proposing algorithm than a student who received his, say, 7th choice?
Is the year to year variance in outcomes sensitive to the algorithm used?
2. Comparison of strategic opportunities connected to the existing rank order lists
In a market in which applicants and positions are treated symmetrically, the existing game theoretic analysis offers considerable guidance for investigating how participants might profit from filling out their rank order lists strategically 7. The analysis involved in designing the student-proposing algorithm should also give a good indication of which strategic properties are shared by the hospital and student proposing algorithms given the asymmetries in the medical market 8. But for reliable estimates of the frequency with which possible situations actually occur given the pattern of preferences in the market, it will again be necessary to rely on computational experiments using the submitted rank order lists.
The chief difficulty in designing such experiments is that a change in a single rank order list has two kinds of effects: it may potentially change the match of the student or residency program whose rank order list is changed, but it may also potentially change the match of other students and residency programs. To see why, suppose that the rank order list of some student, which lists ten positions, is truncated to only its first five positions, and the match under one or the other of the algorithms is rerun after making (only) this change. Then the student whose rank order list was changed may do better (by being matched to a higher ranked choice) or worse (by being unmatched instead of matched), but at the same time other students may do better because of the lessened competition for the five positions eliminated from the rank order list, and other residency programs may do worse.
If it were practical to conduct computational experiments one student at a time, these effects could be disentangled fairly easily. However given the size of the market and the likelihood that most marginal changes in a single rank order list will not have any effect, it will likely be necessary to conduct computational experiments which sample some subset of participants simultaneously.
Consider then, an experiment in which a subpopulation of 1% of the students is picked at random to have their submitted rank order lists shortened by one position. (Right now all numbers are chosen arbitrarily--a full experimental design would specify sample sizes, etc. based on an estimate of magnitudes from pilot experiments.) The proportion of students in the 1% sample whose match outcome is improved would be an overestimate of the probability that an individual student could profit by shortening his list by one position, since it would be confounded with the reduced competition due to the shorter lists of the other students in the sample. Similarly, the proportion of students in the 1% population who become unmatched would be an underestimate, again because of the reduced competition due to the other shortened lists.
Depending on the magnitudes of these proportions, it may be both feasible and desirable to conduct followup computational experiments on this 1% sample, in which each affected student is now examined individually. A combination of population samples (of both students and residency programs), together with a followup of the participants whose outcomes were affected in each sample, would give a reliable estimate of the opportunities for strategic behavior for both students and residency programs.
The form of strategic behavior to be considered should be influenced both by considerations of the information requirements for participants to identify strategic opportunities, and by the behavioral assessment of participant behavior, to be described next.
II. Assessment and analysis of the behavior and perceptions of market participants
The analyses discussed up until this point make use of the rank order lists submitted in previous matches, which were run under the existing NRMP algorithm. The results of these analyses, about outcomes and strategic opportunities, will be most informative about the effect of a change of algorithms if the submitted preference lists are representative of participants' true preferences, and would not change radically or systematically if a different algorithm were substituted for the NRMP algorithm. The purpose of behavioral analyses is to help determine the likelihood that something close to this is the case, and if not, to help anticipate the implications for the effect of a change in algorithms.
A related set of concerns, especially given the market failures which preceded the organization of the NRMP, and which persist today in other markets, is with potential participants who either withdraw from the match or fail to participate in other ways, and also with those who make or contemplate early agreements which are subsequently formalized through the match. It will be especially valuable to understand what motivates such participants and what happens to them, in order to anticipate what effect if any a change in the match would have on this kind of behavior.
A. Surveys of match participants
Surveys of match participants can be helpful in assessing how the submitted rank order lists are formed. This is at least a two part process, consisting first of the interviewing process, in which (true) preferences are formed, and then of the process by which participants formulate their rank order lists. What is necessary is to formulate an assessment of whether either of these would change if a different algorithm were used. Surveys can also help assess how important are some of the effects observed in the analyses discussed above.
For example, some questions for students might look as follows:
How many interviews did you go on? (and how many programs did you list on your rank order list?)
Is there any place you interviewed but did not list, at which you would be willing to work if unmatched?
Were there any programs you liked a lot but didn't rank highly only because you felt they were unlikely to rank you highly?
Which would you prefer: having your 2nd choice for sure, or a 50-50 chance of your 1st or 3rd choice? (The parallel questions for program directors would also have to explore preferences over class makeup: e.g. Which would you prefer, having your first and fourth choice student, or your second and third choice?)
Did any program directors behave in ways that you felt brought undue pressure on you... (e.g. by suggesting that they would rank you highly if you ranked them highly...)?
Did you engage in any discussions having to do with the possibility of reaching an early agreement?
While there are limits on what can be reliably learned from survey questions of these kinds, they will be most informative if they are answered very shortly after students have filled out their rank order lists. Two possible venues for such questions therefore suggest themselves; they could be incorporated either into the submission process for preference lists, or into the AAMC graduation survey. Without knowing the feasibility of either of these options I have an initial preference for the graduation survey, rather than the rank order list submission process, in order to make clear the independence of the two activities, and to avoid changing either the answers to the questions or the submitted rank order lists by having the two activities confounded.
B. Assessing behavior of program directors
As far as I am aware, little is known about the behavior of program directors, and how they select interview candidates and compile rank order lists. However their behavior, and how it may change if the NRMP algorithm is changed, will obviously be important in determining the effect of a change.
To pick just one example, if a student-proposing algorithm gives program directors incentives to submit shorter lists than they now do, and if directors respond to those incentives after a change in algorithms, then the benefit to students from having a student proposing algorithm might be dissipated. (Of course the computational experiments will give some indication of the potential magnitudes of these countervailing pressures.)
Assessing director behavior will be complicated by the fact that directors have many directions in which to act, starting for example with the choice of which candidates to interview. Would a change which increased the variance of matches received by residency programs cause them to concentrate their interviewing more on some kinds of students? Would it cause directors to take into account, when formulating their rank order lists, their assessment of how highly students would rank them?
In the past when I have begun to study behavior in similarly complex environments (such as the market for clinical psychologists 9) I have found it useful to begin with site visits at which I could observe relevant decisions being made (particularly when made by committees), or interview the decision makers. This doesn't give a scientific sample, but it allows me to become oriented to the relevant issues, and thus guides the direction of inquiry. If feasible, this option should be considered.
In general, in anticipating how participants may adjust their behavior to compensate for changes in the match, it is good to keep in mind the economic maxim that it is hard to constrain the actions of actors who have many options. (This is why it is easier to raise tax revenue from wage income than from business or investment income...) And it is for this reason that it may be important to more fully understand the range of options available to the directors of residency programs.
C. Assessing withdrawals and early agreements, and the potential of renewed market failure
Early agreements are a special form of student and director behavior, but I separate them out here to emphasize that not only do they require mutual decisions by both students and directors, but also that they are potential leading indicators of the kind of market failure which the match was initially organized to avoid. That is not to say that early agreements between willing partners are necessarily a bad thing, but rather that if too many agreements are made early, the desirablility of waiting for the match is diminished, and there is a danger of reigniting an unravelling of agreement dates which could lead to market failure with high costs to all involved.
The first priority should be to assess the extent to which early agreements form a part of the market in recent years. Two likely traces that early agreements will leave in the match data are withdrawals from the match, and singleton rank order lists (by students) which result in matches. In particular, we can ask the following questions:
What percentage of participants who withdraw from the match (or who do not participate at all) obtain positions prior to the match?
What percentage of student rank order lists consist only of a single residency program, and what percentage of the students submitting such lists obtain the indicated match?
The answer to the first question may be contained in the followup surveys conducted by the AAMC, while the second question can be recovered from the previous years' matches. These answers will, among other things, provide a benchmark against which the market can be monitored in the future, both to assess the effect of any change made in the algorithm, and also to assess the effect of changing market conditions.
III. The effects of changes in market demographics
The answers to each of the questions discussed in this proposal may potentially be sensitive to changes in the supply of and demand for different kinds of positions. Furthermore, these supplies and demands have changed over time, and can be expected to continue to change, particularly in view of the current political discussion of how medical care should be financed.
Consequently, to anticipate which answers are sensitive to the anticipated changes, it will be useful to examine all of the above questions with respect to the changing supplies and demands which can be identified for specific specialties, and for the market as a whole, in the historical record.
IV. Monitoring and evaluating changes--"clinical trials"
It will come as no surprise to an audience of physicians that a research program for a proposed change of treatments must include a component of clinical trials, since not all implications of a change--side effects, if you will--can be anticipated or will become apparent even in the short term. One difficulty of designing such trials, as with any experimental design, is to control out the effects of extraneous factors, so that the treatment effect can be discerned in isolation.
Since we can anticipate continuing changes in how medical care is financed, with corresponding changes in numbers of positions, distributions across specialties, etc., it will be important when monitoring a change in algorithms to factor out these effects. For example, one way to approach this would be to consider the possibility that, for several years, one algorithm would be used in even numbered years and the other in odd numbered years. The results, pooled over years, would allow the general trend in the medical marketplace to be factored out from the treatment effect.
In case this is not feasible, or even if it is, it has also proved possible in recent years to conduct economic pilot experiments, directed at well focused behavioral questions, in the laboratory 10. Such experiments can be thought of as analogues to the engineering practice of building a scale model (e.g. to test in a wind tunnel) before investing in a full scale trial.
It would unnecessarily extend the scope of the present proposal to speculate now what kinds of pilot experiments or clinical trials may turn out to be appropriate. However it should be emphasized and kept in mind that there is a natural limit to what can be learned, before a change in algorithms, about partipant behavior after such a change.
In closing, I recommend that, regardless of what decisions are eventually taken about a change in algorithms, a systematic attempt to monitor early agreements and related behavior be undertaken on a regular basis, in order to keep track of the health of the market, which is influenced by many factors (such as participant confidence, and market demographics) beyond direct control.
Preamble: Following the October 11 conference call with NRMP, I have prepared this proposal from the point of view of an outside research director, with the understanding that much of the work to be done would be accomplished by NMS and NRMP. Thus in the tasks outlined below, I anticipate that my role would be limited to design, supervision, and evaluation.
I. NMS--provides documentation on current NRMP algorithm; particularly with regard to the handling of match variations
II. Roth & NMS--analyse current algorithm (incentives and stability properties; to provide bases of comparison)
III. Roth & NMS--design student proposing versions
IV. NMS codes current and new algorithms in a form suitable to running computational experiments (this overlaps with previous task)
V. Roth--designs computational experiments, and measures of comparison.
VI. NMS--runs computational experiments .
VII. Roth & NMS--evaluate results and run new computational experiments, as needed.
VIII. Roth & NRMP (or other relevant organizations)--design and administer, and analyse behavioral instruments; surveys, interviews, site visits, etc.).
IX. Roth--report on outcomes of study, and evaluate possible impact of alternative policy options (including implementation recommendations involving continued monitoring of market and possible "clinical trials").
Tentative Timeline:
It is always very difficult to predict how long tasks of this kind will take, as their difficulty only becomes apparent when they are attempted. But some plausible time frames to think about are as follows.
Investigating the implications of the way the current NRMP algorithm handles match variations (Tasks I and II) can be begun and hopefully completed in the next two or three months. Starting in February I anticipate that NMS will be fully occupied with the 1996 match, but presumably a good deal of progress can be made on the design of an alternative algorith (task III) in the Spring, at least to the point of revealing any unexpected difficulties. If no such difficulties are encountered, computational experiments, tasks IV-VII, could be run over the summer, with their evaluation perhaps extending into the Fall.
The behavioral parts of the investigation (task VIII) need to be conducted contemporaneously with a match. Given the lateness of the hour before the 1996 match, I anticipate that surveys, etc., probably cannot be administered before the 1997 match, by which time results of the computational experiments should be available to help in the design of the survey instruments. If site visits to residency directors are contemplated, it might be wise to schedule them during the 1996 match, in order that the results of such visits could inform the development of other measures.
[Click on note numbers to return to the text at the note]
1.See Roth, A.E. and X. Xing "Jumping the Gun: Imperfections and Institutions Related to the Timing of Market Transactions," American Economic Review, 84, September, 1994, 992-1044.
2.Hippocrates [approx. 400 B.C.] Of The Epidemics, Book I, Section II, Translated by Francis Adams.
3.See e.g. Roth, A.E. "New Physicians: A Natural Experiment in Market Organization," Science, 250, 1990, 1524-1528.
4.See Roth, A.E. "The Evolution of the Labor Market for Medical Interns and Residents: A Case Study in Game Theory", Journal of Political Economy, Vol. 92, 1984, 991-1016.
5.Subsequent to the study reported in Roth (1984).
6.See Chapter 6 of Roth, A.E. and M. Sotomayor Two-Sided Matching: A Study in Game-Theoretic Modeling and Analysis, Econometric Society Monograph Series, Cambridge University Press, 1990. (Winner of the 1990 Lanchester Prize.) Paperback edition, 1992.
7.See particularly chapters 4 and 5 of Roth and Sotomayor (1990).
8.In particular, it will be important to note if any of the match variations make a participant's match depend in any way on the part of the rank order list below the match received.
9.See Roth and Xing (1995).
10.See Kagel, J.H. and A.E. Roth (editors) Handbook of Experimental Economics, Princeton University Press, 1995.
thanks,
A.R.