Design of applicant proposing matching algorithm; comparison with existing NRMP algorithm

Report on the design and testing of an applicant proposing matching algorithm, and comparison with the existing NRMP algorithm

October 25, 1996
Revised December 2, 1996

Acknowledgements: The coding and implementation of the applicant proposing algorithm described here, as well as of all the computational experiments on NRMP match data, were performed by National Matching Services Inc. The work discussed in Section VII is not part of the study commissioned by the NRMP, but rather is part of the academic investigation of theoretical issues motivated by the study; computational support for that investigation was provided by Aljosa Feldin at the University of Pittsburgh.

For background information for this study, see Design Review of the National Resident Matching Program

Table of Contents:

Introduction and Executive Summary
Background to the present study
1. Stable matchings in simple and complex matching markets
2. The Current NRMP algorithm:
Design of the applicant-proposing algorithm
1. The conceptual design
2. Sequencing questions and implementation decisions
Descriptive statistics and original NRMP Match Results
1. The NRMP in the years 1987, and 1993-6
2. Specialty matches: Thoracic Surgery in the years 1991-94 and 1996
Differences in the matches produced by the two algorithms
1. The NRMP
2. Thoracic Surgery
Differences in sensitivity to participant behavior
1. Strategic behavior in simple and complex matching markets
2. Experiments to determine upper bounds for profitable strategic behavior
1. Preliminary experiments: Truncation of ROL's at the match point
2. Experiments to determine upper bounds
1. Results for the NRMP
2. Results for Thoracic surgery
Why the differences are small: Insights from the theory of simple markets
Directions for further investigation: participant behavior
Concluding remarks
Selected Bibliography

Appendices:

Results of the computational experiments concerned with sequencing
Results of the computational experiments concerned with truncation of ROL's and capacity reductions

I. Introduction and Executive Summary

In the Fall of 1995 the Board of Directors of the NRMP commissioned the design of an alternative (applicant-proposing) algorithm for conducting the annual match, and a study to compare it with the current NRMP algorithm. The comparisons were to focus both on how many applicants and residency programs could be expected to receive more or less preferred matches under the two algorithms, and on how the different algorithms might influence the opportunity or need for strategic behavior by applicants and programs. Closely related issues are what advice can be given to participants in the match when it is conducted with one or the other of the algorithms, and what kinds of changes in the behavior of match participants might be anticipated if the matching algorithm is changed.

These questions are at the heart of a controversy in the medical community which took its clearest form in an exchange in the June 1995 issue of Academic Medicine, followed by position papers put out by the American Medical Students Association and others. Much of that discussion was oriented around results in the game theory literature concerning "two- sided matching markets," of which the NRMP is one. However most of the theoretical results concern simpler markets, in which there are no linkages between applicants, or between positions in different programs. The NRMP involves a more complex match, primarily because of the presence in the applicant pool of married couples (who seek two positions close to one another), of applicants who seek PGY2 positions in the match and have supplemental lists which must be consulted to match them to PGY1 positions, and because some residency programs have positions which revert to other programs if they remain unfilled. These linkages can be shown to allow situations in which some of the conclusions reached about simpler markets no longer apply. That is, the "match variations" which add complexity to the NRMP imply that the existing theory for simpler markets will provide only an approximate guide to the NRMP itself.

It was therefore necessary, in designing both the applicant- proposing algorithm and in making comparisons to the existing algorithm, to first conduct computational experiments to determine the extent to which the predictions of the theory of simple matching markets applied to the NRMP. These computational experiments, as well as those employed to compare the two algorithms, were conducted on the Rank Order Lists submitted by all applicants and residency programs in previous matches. Specifically, the computational experiments and comparisons were conducted on the data from the four most recent matches (1993, 1994, 1995, and 1996) and from the 1987 match. The recent matches were selected in order to have data which reflected contemporary patterns of preferences among applicants and residency programs, and 1987 was selected to provide a baseline comparison over a longer period, and specifically because it had the lowest rate of unmatched US seniors in the available dataset (6.0%, as opposed to the historically high rate of 7.5% for 1996).

A number of specialty matches are also run under the auspices of the NRMP, and these are largely free of the match variations which add complexity to the general resident match. The existing theory therefore provides accurate predictions about the nature and direction of changes to be anticipated in these matches if the current NRMP algorithm is replaced by the applicant proposing algorithm. However the theory offers little guide to the magnitude of the changes to be expected, and for this purpose computational experiments on the data of past matches were also needed. To date these have been conducted for the Thoracic Surgery match, for the five years 1991-1994 and 1996.

The design of the applicant-proposing algorithm and the comparisons of the two algorithms will be discussed in detail in the body of the report. But the general conclusions can be summarized by noting that, both for the NRMP and the specialty matches, the effects of changing from the current algorithm to the newly designed applicant-proposing algorithm are in the directions predicted by the theory for simple markets, but the size of these changes is very small. Specifically, we can say the following.

1. Differences in the matchings produced by the two algorithms are very small.

In the five previous matches we examined, the percentage of applicants who submitted rank order lists and would have received a different match from the two algorithms varied from a low of 0.061% in 1995 to a high of 0.099% in 1987--i.e. only about one in a thousand applicants would have received a different match. (Of the 111,026 applicants who submitted rank order lists in those five years, 91 would have been affected by a change in the algorithm.)

It appears that the changes will be similarly small in the specialty matches: of the 883 applicants who participated in the Thoracic surgery match from 1991-1994 and 1996, only 4 (0.45%) would have been matched differently by the applicant proposing algorithm.

2. Most (but not all) of the few applicants who are matched to different positions by the two algorithms do better when the applicant proposing algorithm is used; and vice versa for programs.

Of the 91 applicants who would have received different positions from the two algorithms in the five NRMP matches studied, 65 (71%) received preferred positions from the applicant-proposing algorithm. (This is a case where the match variations matter--in a simple match, like the Thoracic surgery specialty match, all of the affected applicants do better under the applicant proposing algorithm. But the linkages in the NRMP mean that sometimes when an applicant gets a better position, another applicant is displaced and matched to a lower ranked position.)

3. Opportunities for profitable strategic behavior are very rare for both applicants and programs under either algorithm, and the differences between the two algorithms are small.

For example, for the five years studied, fewer than one in a thousand applicants could have profited from shortening their ROL's in the matches made under the current NRMP algorithm, and even fewer applicants, if any, could have profited from shortening their ROL's if the applicant-proposing algorithm had been in use. (That is, most shortenings of ROL's would have had no effect, fewer than 1 in 1,000 applicants could potentially have improved their match by optimally shortening their ROL s, but more than 999 out of 1,000 would potentially have hurt themselves.) Similarly, while more programs could potentially improve their matches under the applicant-proposing algorithm than under the current NRMP algorithm by shortening their ROL's, far more would potentially hurt themselves (more than 99 out of 100) than help themselves by doing so. Similarly, only a comparably small fraction of programs could potentially improve some of their matches by withholding some positions from the match. So it seems safe, under either algorithm, to advise both applicants and programs that trying to get a preferred match by behaving strategically is far more likely to harm them than to help them.

Overall, the current NRMP algorithm and the newly designed applicant-proposing algorithm perform very similarly. Both algorithms can accommodate the "match variations" which distinguish this market from others, and both algorithms make it sensible for applicants and residency programs to choose their Rank Order Lists based simply on their preferences for possible matches, and not in anticipation of favorably influencing the mechanics of the match. The choice of algorithms will systematically effect the matches of a very small group of applicants (approximately one tenth of one percent). In the absence of any difference in the confidence with which applicants and programs participate in the match, and of any effect that this has on the rank order lists they submit, the choice of algorithms by itself is unlikely to have any other systematic effects.

II. Background to the present study

A. Stable matchings in simple and complex matching markets

Perhaps the most important and least controversial finding about centralized matching algorithms is that they are mostly unsuccessful unless the matchings they produce are stable. Briefly, a matching between residents and residency programs is stable if there are no applicant-program pairs such that the applicant prefers the program to his/her current match, and the program also prefers the applicant to one of its current matches. The current NRMP algorithm produces stable matches, as do the algorithms used in some regions of the British National Health Service. However, for example, a number of the regions of the British National Health Service attempted to use algorithms which produced unstable matchings, and these have mostly failed and been abandoned. So the present study, and the controversy which preceded it, focuses on choices among algorithms which produce stable matchings. The reason for the controversy is that there can be systematic differences among stable matchings.

Stable matching in entry level professional labor markets is a subject which has received a good deal of attention by game theorists (both in the economics and mathematics literature). Most of the literature concentrates on markets which are simpler than the NRMP, in the sense that there are no linkages between applicants for positions, or between positions offered by different programs. For such markets, a great deal is known about stable matchings, and the properties of algorithms which produce them. Some of the salient facts are the following.

1. In simple matching markets, at least one stable matching can always be found, no matter what Rank Order Lists are submitted by residents and programs.

2. For given ROL's, there can be more than one stable matching. In a simple matching market the set of stable matchings always contains a "program optimal" stable matching, and an "applicant optimal" stable matching. The program-optimal matching assigns to each residency program its highest ranked applicants among those it can be assigned at any stable matching, and assigns to each applicant the lowest ranked program he or she can be matched to at any stable matching. The applicant-optimal stable matching matches each applicant to the highest ranked program he or she can be assigned at any stable matching, and matches each program to its lowest ranked applicants among those it can be assigned at any stable matching. When only one stable matching exists, the program-optimal and applicant-optimal matchings are the same.

3. In simple markets the hospital optimal stable matching is produced by a "hospital proposing" algorithm, which operates by having residency programs propose (make offers) to applicants, starting from the top of the program's rank order list, and allowing applicants to hold at any point in the algorithm the most preferred offer (on their ROL) among those so far received. The applicant proposing stable matching is produced by an "applicant proposing" algorithm, in which applicants propose to programs, starting from the top of the applicant's ROL, and allowing programs to hold their most preferred proposals (up to the number of positions available).

4. In simple markets, the same applicants are matched and the same positions are filled at every stable matching. (That is, any applicant who is unmatched at one stable matching is unmatched at every stable matching, and the positions which are unfilled are the same at every stable matching.) Only the specific assignment of which matched applicants are in which filled positions differs between different stable matchings.

5. In simple markets, when the applicant proposing algorithm is used, but not when the hospital proposing algorithm is used, no applicant can possibly improve his match by submitting an ROL that is different from his true preferences. (No parallel assertion can be made about residency programs which have more than one position.)

What makes the NRMP different from a simple market is that it has match variations of two kinds: variations which cause two positions to be linked to one another, and variations which cause the number of positions in a given program to change. In the first category of variations are couples, who submit rank orders of pairs of programs and must be matched to two positions; and applicants who match to 2nd year positions and have supplemental lists which must then be consulted to match them to 1st year positions. In the second category are requests by residency programs to have an even or an odd number of matches, and reversions of unfilled positions from one program to another.

As mentioned in the introduction, when these match variations are present, none of the five numbered assertions above will any longer be true for all possible ROL's. Theoretical examples of markets with match variations can be constructed for which: 1. no stable matchings exist; 2. No optimal stable matchings for either side exist (even when stable matchings exist) and therefore; 3. No algorithm always produces an "optimal" stable matching for either side of the market. Furthermore; 4. Different stable matchings may have different applicants matched and different positions filled; and 5. No algorithm can always give applicants the guarantee that stating their true preferences is an optimal strategy.

So a major focus of this study was to assess the extent to which these theoretical possibilities play a role in the actual NRMP matches. In the course of this report it will become clear that, while it has always proved possible to find multiple stable matchings in the previous years' NRMP matches (a stable matching has been found in every match at least since the mid 1970's), it appears that no stable matching is precisely hospital-optimal or applicant-optimal in any of the years we have examined. However we will see that applicant-proposing and program-proposing algorithms continue to perform approximately as in the case of simple markets.

B. The Current NRMP algorithm:

As discussed in the first Interim Report of this study (Roth 1996a), the current NRMP algorithm is primarily, but not entirely, a hospital-proposing algorithm. It deals with match variations through a three-phase process. The first phase produces an initial match by ignoring most match variations, using the program-proposing deferred acceptance algorithm, modified to let couples hold on to many offers until a late stage in the algorithm. The match produced in this way will in general not be stable (because of the way it handles couples, and because the other match variations are ignored), so the second phase of the program identifies potential instabilities. The third phase of the program uses an instability-chain algorithm to fix these instabilities and produce a stable match. The subroutines which implement this third phase do not always have hospitals proposing. Instead, couples propose in part of the algorithm designed to fix instabilities due to couples, and applicants also propose in part of the algorithm which fixes instabilities due to supplementary (PGY-1) positions. Thus the current NRMP algorithm is a hybrid: it is program-proposing in its first phase (which performs the bulk of the matching), and applicant-proposing in some parts of its third phase.

The NRMP specialty matches like Thoracic surgery are run using a microcomputer based system, which uses an algorithm that is technically a little different than the original NRMP mainframe algorithm (it does not handle some of the NRMP match variations such as supplemental lists, and it is organized in a single phase). In fact, without supplemental lists, it is essentially the same as one of the intermediary algorithms we developed for this study in the transition from the current NRMP algorithm to the applicant proposing algorithm. When no match variations are present, the specialty match algorithm, current NRMP mainframe algorithm and the intermediary algorithm developed for this study are all functionally equivalent in that they all produce the program-optimal result. (In fact, as a check, we ran the data for each year through both the current NRMP mainframe algorithm and the intermediary algorithm, and the results were in fact identical, and identical to the results obtained in the live match with the original microcomputer algorithm--so in a match without match variations, all these algorithms perform as expected.)

III. Design of the applicant-proposing algorithm

The process by which the applicant-proposing algorithm was designed is roughly as follows. First, a conceptual design was formulated and circulated for comment (in Interim Report #1, Roth 1996a). In order for this to be coded into a working algorithm, a number of choices had to be made, chiefly concerning the sequence in which operations would be conducted. Most of these decisions can be shown to have no effect on the outcome of simple matches, but could potentially effect the outcome when the NRMP match variations are present. Consequently, we performed computational experiments before making sequencing choices. In what follows, we first present the conceptual design (from Roth 1996a), and then discuss the sequencing experiments and implementation decisions.

A. The conceptual design

The algorithm described here is based on the general design of phase 3 of the current NRMP algorithm, and on the discussion of instability-chaining algorithms (which find stable matchings by fixing applicant-program instabilities one at a time) in Roth and Vande Vate (1990).

The object of the algorithm is to produce a stable matching, by assembling a set A(k) of residency programs and applicants and a tentative matching M(k) with the property that there are no instabilities within the set A(k), and no applicant or program in A(k) is matched to anyone outside of A(k). When the set A(k) has grown to include all applicants and programs, the resulting match is stable, and the algorithm stops.

In the applicant-proposing algorithm, the initial set, A(0), consists of all positions offered in the match, and the initial tentative matching has all positions vacant. The algorithm begins by selecting an applicant S(1) from the set of applicants in the match and adding S(1) to A(0) to make the new set A(1).

At any step k of the algorithm, at which a new applicant S(k) has just been added to form the set A(k), the new tentative matching M(k) is formed as follows. First, applicant S(k) (= S(k,1)) proposes down his/her Rank Order List (of programs which also rank S(k)), from the top, until the first program is reached which either has a vacant position or which prefers S(k) to its least preferred current tentative match. In the latter case, this least preferred applicant, S(k,2) is now rejected by the program in question, and this applicant now proposes down her ROL in a similar way, etc. Each applicant S(k,n) displaced in this way similarly proposes down her ROL.

At some point in this process, an applicant S(k,n) may be displaced who is a member of a couple, or who is displaced from a primary position for which she also holds a supplementary position. In either case, a second position now potentially becomes vacant, as the spouse of S(k,n) is withdrawn from his tentative match, or as S(k,n) is withdrawn from her supplementary match. In either case, the program whose position is left vacant, P(k,n), is added to a "program stack" to be held for later processing. If S(k,n) is a couple, then both couple members (S(k,n,a) and S(k,n,b)) now propose down their joint ROL of pairs of programs, and they each may displace another applicant. Also, if any S(k,n) has a supplementary ROL associated with her new tentative match, she proposes down it as well, which may also result in the displacement of another applicant. So both couples and supplementary matches may simultaneously displace more than one applicant. One displaced applicant is processed immediately and any others are added to an "applicant stack" for later processing.

Applicants propose down their ROL's in this way until the applicant stack is empty. (Applicants continue throughout to be able to propose to programs which may be on the program stack.) A residency program is then selected from the program stack, and all of the applicants in A(k) with whom it might form instabilities--i.e. all of the applicants in A(k) who are preferred by the program to its least preferred current tentative match and who prefer this program to their current match--are added to the applicant stack, which is processed as before, with applicants proposing down their ROL, from the top.

When both the applicant and program stacks are empty, the tentative matching thus produced is M(k): no instabilities for M(k) are contained in the set A(k), and no applicant or program in A(k) is matched by M(k) to anyone outside of A(k). The algorithm is now ready to pick a new applicant S(k+1), and start the process again, for the set A(k+1).

When all applicants have been included in the set A(k), even- odd requests and program reversions are adjusted, which causes additions to the applicant and program stacks, which are handled as above. When these stacks are empty, the algorithm stops, and the last tentative match becomes final. In the match with no match variations, the applicant and position stacks would always become empty, and the final match would be the applicant optimal stable matching.

When the match variations are present, there is a possibility that at some stages of the algorithm the position stacks would never become empty--i.e. a cycle would occur, in which the same positions reappeared on the stack. So a "loop detector" needs to be added to each stage k. Every loop must involve a position becoming unmatched and made vacant either because a couple or a supplementary assignment has been withdrawn from the position. So a loop detector can work by keeping a log of when positions become unmatched in these ways--i.e. recording which applicant is unmatched from which position, during the processing of some step A(k). If the same pairs appear multiple times, a loop is in progress. How to proceed at this point may depend on the nature of the loop. (It is observed in Roth and Vande Vate (1990) that certain kinds of inessential loops can be rendered harmless by randomizing the order in which applicants and positions are processed from the stacks. Loops due to the non-existence of a stable matching would be more serious, but the prior experience of the NRMP suggests that these may be extremely rare.)

B. Sequencing questions and implementation decisions

In a simple match, without the NRMP match variations, the applicant-proposing algorithm always produces the applicant- optimal stable match and the program-proposing algorithm always produces the program-optimal stable match, regardless of the order in which proposals are processed within the algorithm. One consequence of the fact that these optimal stable matches do not exist in general when the match variations are present is that the order in which applicants and programs are processed may have an effect on the match produced. Thus the sequence in which applicants and programs are processed at various points in the algorithm needs to be considered as part of the design of the applicant-proposing algorithm (and as part of the evaluation of the current NRMP algorithm).

Two issues were considered in conducting and evaluating experiments related to the sequencing of operations in the algorithms.

Do sequencing differences cause substantial or predictable changes in the match result (e.g. do applicants or programs selected first do better or worse than their counterparts selected later)?
Does the sequence of processing affect the likelihood that an algorithm will produce a stable matching? (In connection with this latter point, recall that instability-chain algorithms can cycle --even when stable matchings exist and certainly when they do not. So one objective was to consider how sequencing decisions might influence the frequency of "loops" occurring in the algorithm.)

Experiments to test the effect of sequencing were conducted using data from three NRMP matches: 1993, 1994, and 1995.

1. Sequencing experiments on the current NRMP algorithm

In the current NRMP algorithm programs are processed in ascending sequence by 6 digit program code number. To test the sensitivity of the results to this sequencing, computational experiments were run on the ROL data in which this sequencing was reversed, i.e. programs were processed in descending order by program code number. As expected, the results showed differences, but the differences were small: the largest difference was in 1994 when only 4 out of 3,662 programs which submitted ROL's received a different match under the alternative ordering, as did 4 out of 22,353 applicants. Not only are these differences very small, they do not appear to be systematic. (A fuller account of the results of these experiments appears in Appendix A.)

The current NRMP algorithm was also investigated for its sensitivity to the sequence in which reversions are processed. Rather than simply changing the order in which reversions occurred, the experiments involved setting the program quotas input to the match processing to be the final, post match quotas obtained from the original results produced by the current NRMP algorithm. All further reversion processing was then eliminated. These experiments then provided an indication of the differences caused, not only by changing the order of reversions, but also by altering when reversions enter into the match processing (i.e. all required reversions were assumed to take place simultaneously, at the beginning of match processing). No more than 2 programs or applicants were observed to be affected by such changes in the three years 1993-5 (see Appendix A).

Finally, it should be noted that no loops were detected in any of these experiments on the current NRMP algorithm. Consequently, despite the presence of match variations, sequencing does not appear to play a significant role in the operation of the current NRMP algorithm.

2. Sequencing experiments on the applicant-proposing algorithm

Experiments were conducted to measure the impact of

1. The sequence in which applicants are admitted to the algorithm for processing (experiments were run to compare processing applicants in ascending applicant code sequence and in descending applicant code sequence).

2. The sequence in which couples are processed relative to other applicants (experiments were run to compare the processing of all couples before all single applicants, and after all single applicants).

3. The sequence in which applicants ranked by a program are processed when attempting to fill a program that has been selected from the program stack (experiments were run to compare processing applicants in ascending program rank number sequence, and in descending program rank number sequence--i.e. when processing match variations, the first applicant to propose when a residency program is pulled from the "stack" could be the program's first choice, or its last choice).

To understand the results of the computational experiments (which are tabulated in detail in Appendix A) it is useful to compare the outcomes from each experiment to those from a fixed baseline. We chose as a baseline an applicant proposing algorithm in which applicants were processed in ascending order by their applicant codes, regardless of whether they were single or members of couples. (In all cases, when a member of a couple was processed, so was the other member. When applicants were processed in ascending code order a couple was selected for processing based on the code number of the spouse with the lower applicant code.) When a program was selected from the program stack, applicants were processed in ascending sequence by program rank number. All of these experiments were carried out on the ROL data from the NRMP matches in 1993, 1994, and 1995.

The experiments were conducted in a partial factorial design. The handling of couples had 3 treatments (couples intermixed with singles, couples first, couples last), the order of introducing applicants into the match had 2 treatments (ascending order by applicant code, or descending order), and the order of processing applicants after a match variation causes a program to be pulled from the stack had 2 treatments (ascending order by program rank, or descending order). The results are that none of these sequencing decisions had a large or a systematic effect on the matching which resulted. In the majority of cases the match was the same as the baseline case (in the majority of the remaining cases only 2 applicants received different matches, and the maximum number of applicants affected was 12 out of 22,937, which occurred when a couple received a worse match and initiated a chain of displacements).

However, there was an effect of sequencing on the internal processing of the algorithm. The number of loops encountered was fewest when couples were introduced to the match after single applicants. This is not too surprising in view of the fact that no loops would occur in the absence of match variations. The results indicate that loops are least likely to occur when the couples are introduced into the larger market with some tentative matches already assembled, as opposed e.g. to when couples enter first, so that the initial tentative matches involve only couples. Introducing couples last reduces the numbers of loops (and hence the potential that in some future match it would be difficult to find a stable matching) without changing the prospects of couples or single applicants in the match.

Finally, experiments related to the sequence in which reversions are processed were performed on an applicant proposing algorithm, similar to those performed on the current NRMP algorithm as discussed previously. Again, no substantial changes were induced by changing the order in which reversions were handled; no changes at all resulted in the 1993 match, and only 2 applicants and programs were affected in the 1994 and 1995 matches (see Appendix A). Thus for both the current NRMP algorithm and the applicant proposing algorithm, there is no significant difference between the results obtained with reversion processing and the results obtained by setting the quotas to the final quotas after reversions and eliminating further reversion processing. (This point simplifies the design of some of the experiments to compare the two algorithms, in connection with strategic behavior by residency programs, to be discussed later in this report.)

3. Implementation decisions

Based on the sequencing experiments described above, the following decisions were made pertaining to the design of the applicant proposing algorithm for the NRMP:

1. All single applicants are admitted to the algorithm for processing before any couples are admitted.

2. Single applicants are admitted for processing in ascending sequence by applicant code.

3. Couples are admitted for processing in ascending sequence by the lower of the two applicant codes of the couple.

4. When a program is selected from the program stack for processing, the applicants ranked by the program are processed in ascending order by program rank number.

Note that we did not at any point choose to randomize the processing order (randomization was shown in Roth and Vande Vate 1990 to allow the algorithm to escape from certain kinds of loops). The reason is that loops do not appear to be a problem with the processing sequences selected, and it was felt that a desirable feature of the match is that it should be precisely reproducible from the ROL data.

IV. Descriptive statistics and Original NRMP Match Results

A. The NRMP in the years 1987, and 1993-6

Table 1 gives the descriptive statistics of the NRMP match in the five years we consider. Notice that in each year, a substantial number of the more than twenty thousand applicants who participate do so in ways which utilize the match variations which the NRMP allows--about 4% participate as couples, and 8-12% submit supplemental Rank Order Lists. And in the 1990's about 7% of the three to four thousand programs which participate in each year have positions which revert to other programs if they remain unfilled (accounting for almost 6% of the total quota of positions). So the match variations are not an insignificant part of the match.

**Table 1a: Descriptive Statistics and Original NRMP Match Results**
	1987	1993	1994	1995	1996
APPLICANTS
Participating (Active) ROL Returned
Primary ROL's	20071	20916	22353	22937	24749
Applicants with Supplemental ROL's	1572	2515	2312	2098	2436
Results
Primary Matches	16117	17209	17725	18170	18316
Supplemental Matches	577	1294	1152	990	725
Couples
Applicants Who Are Coupled	694	854	892	998	1008
Coupled Applicants Who Matched	646	794	817	899	912
PROGRAMS
Active Programs	3225	3677	3715	3800	3830
Active Programs with ROL Returned	3170	3622	3622	3745	3758
Potential Reversions of Unfilled Positions
Programs Specifying Reversion	69	247	276	285	282
Positions to be Reverted if Unfilled	225	1329	1467	1291	1272
Programs Requesting Even/Odd Matching	4	2	6	7	8
Quotas¹
Total Quota Before Match	19973	22737	22801	22806	22578
Changes During Match Processing
Quota Increases: Programs	22	120	143	124	130
Quota Increases: Positions	45	357	357	327	336
Quota Decreases: Programs	23	127	142	128	138
Quota Decreases: Positions	46	338	338	303	326
Total Quota After Match (Final Quota)	19972	22756	22820	22830	22588
Results
Positions Filled	16694	18503	18877	19160	19041
Positions Unfilled	3278	4253	3943	3670	3547
Programs Filled	2100	2309	2440	2599	2589

¹Quotas include positions in active programs with no ROL returned. Changes during the match are caused primarily by reversions. In some cases, 1 position is reverted simultaneously to 2 programs, causing a net increase in the number of positions offered. In addition, a few positions may be dropped from the match during processing to accommodate requests for even/odd matching.

B. Specialty matches: Thoracic Surgery in the years 1991-94 and 1996

In contrast, the Thoracic surgery match is a simple match, with no match variations. Its basic descriptive statistics and original match results are given below.

**Table 1b: Descriptive Statistics and Original Thoracic Surgery Match Results**
	1991	1992	1993	1994	1995
Applicant ROL's	127	183	200	197	176
Active Programs	67	89	91	93	92
Program ROL's	62	86	90	93	92
Total Quota	93	132	141	146	143
Positions Filled	79	123	136	140	132

V. Differences in the matches produced by the two algorithms

A. The NRMP

As discussed in the first section of this report, the current NRMP algorithm and the newly designed applicant proposing algorithm were compared by comparing the matches that they produce for the ROL's submitted in 1987 and 1993-1996. Table 2 gives the results of these comparisons. The first half of the table concentrates on the comparisons from the point of view of applicants, the second half from the point of view of programs.

As noted in the introduction, only about 0.1% of applicants are affected by the change in algorithms, and of these most prefer the match they receive under the applicant proposing algorithm. Note that in two of the five years the number of applicants matched changed by one (one fewer in 1987, one more in 1996). Recall that in a simple match a change from one stable matching to another would never change the number of applicants matched; so here is another case in which the match variations cause a difference, but a difference which turns out to be very small and unsystematic.

Equally few programs are affected by the change of algorithms- -and these constitute about 0.5% of all programs. Most but not all of the programs prefer the match they receive under the current NRMP algorithm, but in 1994 and 1996 slightly more programs would even have preferred the applicant proposing algorithm to the current NRMP algorithm. Most programs which receive a different match have only one applicant different between the matches produced by the two algorithms. The majority of differences have to do with filling a position with a different applicant; only a small number of positions move from being filled to unfilled or vice versa. Again, this is a consequence of the match variations; as already noted in the case of applicants, it turns out to be both very rare and unsystematic.

It may be helpful to close with an example of precisely how the match variations can cause a deviation from the predictions of the theory for simple markets; for example how it can be that a few applicants do worse with the applicant proposing algorithm than with the program proposing algorithm. For example, if switching to the applicant proposing algorithm causes applicant A to improve his match from his 2nd to 1st choice, it may be that the 1st choice now requires a supplemental match that was not required before. If this new supplemental match displaces a previously matched but less preferred applicant in a program, that displaced applicant is forced to go farther down his/her list (i.e. does worse). Furthermore, matching that applicant may displace another applicant, who may displace another, and so on, causing a chain of applicants who do worse (even though, as expected of the applicant-proposing algorithm, this chain of events began with an applicant who did better than he would have had the program-proposing algorithm been used).

**Table 2: Comparison of Results Between Original NRMP Algorithm and Applicant Proposing Algorithm**
	1987	1993	1994	1995	1996
Applicants
Number of Applicants Affected	20	16	20	14	21
Applicant Proposing Result Preferred	12	16	11	14	12
Current NRMP Result Preferred	8	0	9	0	9
U.S. Applicants Affected	17	9	17	12	18
I.A. Applicants Affected	3	7	3	2	3
Difference in Result by Rank Number

1 rank	12	11	13	8	8
2 ranks	3	1	4	2	6
3 ranks	2	1	1	2	3
More than 3 ranks	2	1	1	2	3
New Matched	0	0	0	0	1
New Unmatched	1	0	0	0	0
Programs
Number of Programs Affected	20	15	23	15	19
Applicant Proposing Result Preferred	8	0	12	1	10
Current NRMP Result Preferred	12	15	11	14	9
Difference in Result by Rank Number
5 or fewer ranks	5	3	9	6	3
6 to 10 ranks	5	3	3	5	3
11 to 15 ranks	0	5	1	3	1
More than 15 ranks	9	4	6	0	11
Programs with New Position(s) Filled	0	0	2	1	1
Programs with New Unfilled Position(s)	1	0	2	0	0

B. Thoracic Surgery

Because there are no match variations in the Thoracic surgery matches for the years we consider, they are simple matches, and are well described by the existing theory. Consequently we know that the applicants will all do as well as possible at the stable match produced by the applicant proposing algorithm, and the programs will all do as poorly as possible at that stable matching. What the theory does not tell us is how large this effect will be; for that we need to look at the data. As discussed in the introduction, the effect turns out to be minimal: in the five years we studied, only four applicants and four programs would have been affected by a change in algorithms; in three of the five years, the applicant proposing algorithm would have produced the same match as the program proposing algorithm, indicating that this was the only stable matching in those years. (Colenbrander, 1996, reports similarly small differences in the specialty matches he maintains.)

Difference in result when algorithm changed from current NRMP to applicant proposing:

1991	1992	1993	1994	1995
none	2 applicants improve	2 applicants improve	none	none
	2 programs do worse	2 programs do worse

VI. Differences in sensitivity to participant behavior

The comparisons of match outcomes discussed in the previous section are all made in terms of Rank Order Lists which were submitted for matches made by the current NRMP algorithm. While the changes observed when the match was instead produced by the applicant proposing algorithm were small, a comparison of the two algorithms also requires us to consider whether participants might have reason to submit different kinds of ROL's if the new algorithm were to be substituted for the current one. For this purpose, we consider whether participants could have favorably influenced the match, under either algorithm, by submitting different ROL's. The idea is both to assess how many participants could do so, and how the number is different for the two algorithms. This will also allow us to determine what kinds of advice can be given to participants about how to participate in the match, under either algorithm.

Once again, this is a subject about which the theory of simple matching markets tells us a great deal, for markets without the match variations found in the NRMP. To see how well the theory for simple markets approximates the NRMP matches, and also to assess the size of the effects to expect, will again require computational experiments on the data. A quick review of the theory will help organize the discussion.

A. Strategic behavior in simple and complex matching markets

In a simple matching market, without match variations, it can be shown that there do not exist any stable matching algorithms which completely remove the possibility that some applicant or program can get a better match by submitting an ROL that differs from his/her/its straightforward preferences. However, we already noted (in section II) that

In simple markets, when the applicant proposing algorithm is used, but not when the program proposing algorithm is used, no applicant can possibly improve his match by submitting an ROL that is different from his true preferences. (No parallel assertion can be made about residency programs which have more than one position.)

So in simple markets, we would find strategic opportunities for applicants only when the program proposing algorithm is used, and the theory tells us what these might be. Specifically, consider the ROL of some applicant, and define a truncation of that ROL to be a shorter ROL which is the same as the original ROL for as many programs as it ranks. We can then say the following.

In simple markets when the program proposing algorithm is used, every applicant who can do better than to submit his true preferences as his ROL can do so by submitting a truncation of his true preferences. That is, if (holding all other ROL's constant) an applicant would be matched to his kth choice if he submitted his true preferences, and his jth choice (with j

It can also be shown that truncations are the kind of manipulation which can potentially be identified with the least information about others' preferences. Consequently, we can investigate the truncation of ROL's to get a measure of the scope for an even wider variety of strategic behavior.

For simple markets, the theory also tells us which applicants can potentially profit from manipulation, and how much.

In simple markets, when the program proposing algorithm is used, the only applicants who can do better than to submit their true preferences are those who would have received a different match from the applicant proposing algorithm. Furthermore, the best such applicants can do is to obtain the applicant optimal match, and they can do this by submitting to the program proposing algorithm the truncation of their true preferences which stops at the match they would have gotten from the applicant proposing match.

It is important to note that, even in the case of a simple match, without match variations, in general an applicant would not have the information needed to submit such a truncation (and if he submitted a truncation that was one program too short he would become unmatched). But this result shows that, in a simple match, we can identify an upper bound on the number of applicants who could possibly profit from strategic behavior of this sort by seeing how many applicants receive different matches at the two algorithms.

The case of programs which have more than one position is not so simple, even in the case of simple matches. Programs may, at least in theory, possibly profit both from truncating their ROL's, and from reducing the number of positions they submit to the match (either by making early arrangements with some applicants or by withholding positions to be filled by unmatched applicants after the match). The temptation for this latter kind of manipulation can be shown (Sonmez 1996a,b) to be larger with the program proposing algorithm than with the applicant proposing algorithm.

However, in view of the very small differences we observed in the outcomes of the two algorithms, it is reasonable to conjecture from these results about simple markets that, even in the more complex NRMP market, the opportunities for profitable strategic behavior will be vanishingly small under either algorithm. The next subsections describe the computational experiments which permitted this conjecture to be verified.

B. Experiments to determine upper bounds for profitable strategic behavior

1. Preliminary experiments: Truncation of ROL's at the match point

As noted above, in a simple market if an applicant is matched to his kth choice program, or if the lowest ranked applicant a program is matched to is its kth choice applicant, truncation of the ROL at the kth entry would have no influence on the match. This is because, in a simple match, the applicant or program proposing algorithms never have to "backtrack" on an ROL. But in the NRMP backtracking can occur, particularly because of the way reversions are handled. So, before an exploration of what truncations, if any, could have a strategic effect on the match, it was first necessary to see whether truncations at the match point (i.e. deleting the k+1 and higher choices for a participant who was matched to his kth choice) could influence the result of the match under either algorithm, and how much. The truncation of applicant ROLs and program ROLs were investigated separately, for each algorithm, for the 1993, 1994, and 1995 matches. In the majority of cases no change was produced when all ROLs were truncated at the match point; and in no case were more than 3 applicants affected by such truncations (see Appendix B for the detailed results). So truncations at the match point, while not entirely without effect, do not play a consequential role.

The experiments which follow, therefore, will concentrate on identifying participants who can potentially profit from strategic behavior involving truncations above the match point, and (for programs) reductions in the number of positions they register for the match below the number of applicants to which they were matched. (It should be emphasized that the manipulations in these experiments have been chosen to identify every participant who could possibly benefit from a manipulation; in general even the applicants and programs who could potentially profit by such manipulations do not have the information necessary to accomplish such an optimal manipulation.)

2. Experiments to determine upper bounds

As discussed above, the kinds of strategic manipulation to be considered involve truncation of ROL's by applicants or programs, and reductions in stated numbers of positions (quotas) by programs. The manipulations involving program quotas raise the question of how to handle reversions of positions when quotas are to be different from those in the data. Similar questions arise when truncating program ROLs, as this may increase unfilled positions. All of the experiments concerning strategic behavior of programs handle reversions by fixing quotas at the final quotas observed after the match with the original match data. (So when the current NRMP algorithm is tested, the quotas are set to those observed at the end of the actual match for each year studied.) As shown by the results of the sequencing experiments discussed in Section III and detailed in Appendix A, none of the results are likely to be sensitive to this simplification.

For each of the strategic manipulations whose potential magnitude is to be assessed, the chief difficulty in designing the experiments is that a change in a single rank order list or quota has two kinds of effects: it may potentially change the match of the applicant or residency program whose rank order list or quota is changed, but it may also potentially change the match of other applicants and residency programs. To see why, suppose that the rank order list of some applicant is truncated above his current match point, and the match under one of the algorithms is rerun after making (only) this change. Then the applicant whose rank order list was changed may do better (by being matched to a more preferred choice) or worse (by being unmatched instead of matched). At the same time other applicants may do better (and other residency programs may do worse) because of the availability of the position previously held by the applicant whose list was truncated.

This means that, if we truncate a group of applicant ROL's, for example, and see how many of the applicants in this group receive a better match as a consequence of this change, we will be looking at an overestimate of the number of applicants in the group who could have benefitted from truncating their own ROL--many of them will have instead profited from someone else's truncation (even if that person himself became unmatched as a result of his own truncation). So the number obtained in this way would be an upper bound on the number that would have benefitted by truncating their own ROL. But we do not have to settle for this upper estimate; we can refine it iteratively, by now (continuing to) truncate the ROL's only of those applicants whose match improved as a result of the previous (collective) truncations. This will allow us to further eliminate from the set of truncations those who profited from the truncations of applicants who were themselves harmed by their own truncation. Proceeding in this way, we can continue until no more reductions in the sample are achieved. This final number will still be an upper bound, of course, since even in a group of truncators who all do better when they all truncate their preferences, some may be profiting from the truncated ROL's of the others, not from their own truncation.

Experiments were conducted separately for applicants and for programs, and separately for each of the two algorithms. A computational experiment for applicants in a given year started by truncating all ROL's just above the (lowest) match point; i.e. every applicant's primary ROL was truncated just above the match he received when no ROL's were truncated using the algorithm in question. (For example, if an applicant originally matched to rank 3 on his primary ROL, the truncated ROL contained only his first 2 choices.) Of course, many applicants were left unmatched by this truncation, while others received preferred matches (these were the only two possibilities at this stage). Then at the next step, the ROL's of all those who had truncated their lists but did not improve were restored to their original length, and the process was repeated with the smaller number of truncations which remained. This process was repeated until it converged. Computational experiments for programs were structured similarly; starting with every program's ROL truncated just above the lowest ranked match it received. The full results are given in Appendix B. (Table B1 reports the results of the truncation of applicant ROL's for each algorithm, Table B2 the results for programs.)

The results can be summarized by looking at the final upper bounds of the number of applicants and the number of programs which could possibly benefit from truncating their Rank Order Lists.

The results are reported and analyzed below, first for the NRMP matches, and then for Thoracic surgery.

a. Results for the NRMP

The truncation experiments with applicants' ROLs yield the following upper bounds for the two algorithms in the years studied.

Upper limit of the number of applicants who could benefit by truncating their lists at one above their original match point:
	1987	1993	1994	1995	1996
Current NRMP Algorithm	12	22	13	16	11
Applicant-Proposing Algorithm	0	0	2	2	9

As expected, more applicants can benefit from list truncation under the current NRMP algorithm than under the applicant proposing algorithm. Note that the number of applicants who could even potentially benefit from truncating their ROLs under the current NRMP algorithm is in each year almost exactly equal to the number of applicants who received a preferred match under the applicant proposing match (line 2 of Table 2). We will return to this point in a moment, but note that it suggests that this upper bound is very close to the precise number that would be predicted in the absence of match variations.

The truncation experiments with programs' ROLs yield the following upper bounds.

Upper limit of the number of programs that could benefit by truncating their lists at one above their original match point:
	1987	1993	1994	1995	1996
Current NRMP Algorithm	15	12	15	23	14
Applicant-Proposing Algorithm	27	28	27	36	18

As expected, some programs can benefit from list truncation under either algorithm. However, consistently more programs benefit from list truncation under the applicant-proposing algorithm than under the current NRMP algorithm. Note that although the numbers of programs in these upper bounds remain small, they are in many cases about twice as large as the number of programs which received a preferred match at the stable matching produced by the algorithm other than the one being manipulated. (That is, referring back to Table 2, we see for example that in 1995 only 14 programs preferred the matching produced by the current NRMP algorithm to the one produced by the applicant proposing algorithm, but we now find 36 programs in our upper bound of programs which could potentially profit from a manipulation of the applicant proposing algorithm.) It therefore seemed worthwhile to further examine these upper bounds, and see if they were overestimates.

For each algorithm, this was first done by taking a 50% sample of the programs contained in the upper bound for 1995, and restarting the truncation experiment with only these programs having truncated ROL's. The idea is that, if each of these programs can in fact benefit from its own truncation, the experiment would stop after the first iteration, with no further reductions in the upper bound. But if in fact the upper bound is an overestimate, and some of the programs in it are benefitting not from their own truncated ROL, but from the truncation of one of the other ROL's in the upper bound, then on average half of such "false positives" in our 50% sample would have been benefitting from the truncation by one of the programs in the other 50%, which are no longer truncated. In this case we would iterate until the number of truncators who improved their outcome again stabilized at a new, lower upper bound. This is in fact what happened, so the new estimates for 1995 (equal to twice the number obtained from the 50% sample) look as follows compared to the old ones.

Refined estimate of the upper limit of the number of programs that could improve their results by truncating their own ROL's in 1995:
	Current NRMP Algorithm	Applicant Proposing Algorithm
Original Results	23	36
Current Estimate (still an upper limit)	12	22

These results confirm that the numbers that can benefit from the ROL truncations stated earlier are indeed overestimates.

A further analysis was undertaken for each of the five years, to compare the specific individual programs and applicants who appear in these upper bounds as potentially benefitting from ROL truncations with the programs and applicants whose results changed when the algorithm changed. This analysis indicated that those who could benefit from ROL truncations were, for the most part, those who did differently (generally worse) when the algorithm is changed from their side proposing to the other side proposing (without ROL truncations). For example, the applicants who can benefit from ROL truncations when the program proposing algorithm is used are very largely the same as those who benefit when the algorithm is changed to an applicant proposing algorithm with no ROL truncations. Thus in this respect also, it appears that the theory for simple markets provides a good approximation of the situation in the NRMP match.

We next turn to the question of capacity manipulation by programs. Recall that in an actual match this could be considered by a program in the context of either an early agreement (for example with an independent applicant) or in anticipation that some positions would be filled post-match.

An initial experiment was run setting all program quotas to the number of positions filled with the algorithm in question with the original data. (This is analogous to the initial experiment involving truncations of the ROL's at the match point, rather than above it.) In a simple match without NRMP match variations, this would be expected to have no impact on the results. However, with NRMP match data some differences were observed, as noted below:

Results With Input Quotas Set to Positions Filled, Compared to Original Results

	Current NRMP Algorithm	Applicant Proposing Algorithm	Current NRMP Algorithm	Applicant Proposing Algorithm	Current NRMP Algorithm	Applicant Proposing Algorithm
Programs: Improve	12	2	9	none	26	none
Do Worse	none	none	4	2	9	2
Applicants: Improve	none	none	3	2	6	2
Do Worse	12	2	9	none	27	none

With the applicant proposing algorithm, the differences are negligible. However, more differences were observed with the current NRMP algorithm, and the results obtained by setting the quotas to the original positions filled tended to produce better results for the programs.

In order to identify programs that could improve their remaining matches by further reducing their quotas, an iterative technique was employed similar to that used to investigate the effects of rank order list truncations. In this experiment, the base result for comparison purposes was the result obtained for each algorithm when reversions were eliminated but quotas input to the match were set to the final quotas resulting from original runs with reversions.

In the first pass, the quota was reduced to one less than the number of positions filled for each program that filled more than 1 position in the base result. (For programs that filled 1 or 0 positions, the quota was left unchanged.) A match was then run using these reduced quotas, and the result for each program was analyzed to identify programs that achieved strictly improved results for the remaining positions. For example, if a program filled 5 positions in the base match, the quota for this experiment was set to 4. The result was then considered an improvement for the program if the program filled all 4 positions, none of the applicants matched were less preferred than the corresponding 4 matched in the base match, and at least one matched applicant was more preferred than the corresponding matched applicant in the base match.

In each subsequent pass, the quota was set to one less than the number of positions filled for only those programs that had reduced their quota and obtained an improved result in the previous pass; quotas for other programs were set to their original values from the base match. The iterative process was run until all programs that reduced their quotas obtained an improved result.

The initial point at which this iterative process stopped for each year and each algorithm is shown in the table below:

Initial Upper Bound of the Number of Programs That Could Improve Their Remaining Matches By Reducing Quotas
	1987	1993	1994	1995	1996
Current NRMP Algorithm	283	285	210	249	289
Applicant Proposing Algorithm	276	244	198	243	263

At this point the number of programs in the upper bound of those which could potentially improve their results by reducing quotas remained quite high, on the order of 5% of programs. However, as with the analysis of rank order list truncations, this upper bound could be a substantial overestimate. While some programs may improve by reducing their own quotas, some programs may improve because of quota reductions by other programs. We therefore investigated these upper bounds further.

First, half of the programs that previously reduced their quota and improved their result had their quotas restored to the original quota from the base match, while the other half continued to have their quotas reduced. A match was then run with only this 50% sample reducing their quotas. Some of the programs that still had reduced quotas no longer obtained an improved result; these programs had benefitted from the quota reductions of other programs (which no longer had reduced quotas) rather than their own quota reductions. The iterative process described above was resumed and run until all those that reduced their quotas continued to have improved results. At that point another 50% sample was taken, and the process resumed again. When this iterative process stopped, the number of programs that continued to obtain an improved result by reducing their quotas was multiplied by 4 (because we had taken two 50% samples). The resulting numbers, shown in the following table, provide a refined estimate of the upper bound of the number of programs that could improve their remaining matches by reducing their quotas.

Revised Estimate of the Upper Bound of the Number of Programs That Could Improve Their Remaining Matches By Reducing Quotas
	1987	1993	1994	1995	1996
Current NRMP Algorithm	283	285	210	249	289
Applicant Proposing Algorithm	276	244	198	243	263

Again, these numbers are still estimates of the upper bound; further refinement would be possible. However, given the size of these numbers, it seems clear that only a very small number of programs (substantially less than 1%) could improve their remaining matches by reducing their quotas. This does not appear to be an advisable strategy for programs to follow with either algorithm.

b. Results for Thoracic surgery

Because the Thoracic surgery match does not have match variations, the theory tells us precisely which applicants and programs could improve their match by an optimal manipulation. As a check on our computational procedures, we confirmed these predictions by running the same computational experiments on ROL truncations as described for the NRMP matches. The results, summarized below, are as expected.

Applicants who could improve by truncating their ROL's:
Algorithm	1991	1992	1993	1994	1996
Current NRMP	none	2 applicants improve (same ones who did better when the algorithm changed)	2 applicants improve (same ones who did better when the alogrithm changed)	none	none
Applicant Proposing	none	none	none	none	none

Programs that could improve by truncating their ROL's:
Algorithm	1991	1992	1993	1994	1996
Current NRMP	none	none	none	none	none
Applicant Proposing	none	2 programs improve (same programs that did worse when the algorithm changed)	2 programs improve (same programs that did worse when the algorithm changed)	none	none

So, in Thoracic surgery as in the larger and more complex NRMP match, the opportunities for strategic manipulation are essentially non-existent under either algorithm. (Note that Colenbrander 1996 reaches essentially the same conclusions about the specialty matches he maintains.)

VII. Why the differences are small: Insights from the theory of simple markets

All of the results discussed up to this point can be characterized by noting that the theory of simple matches, without match variations, gives a good approximation for the direction of each of the comparisons we have made, and, in addition, the size of all the predicted changes has been very small. In this section I want to briefly discuss what insights we can get from simple markets to help explain why these differences are so small. The results in this section are based on computational comparisons similar to those discussed earlier, but now concerning hypothetical markets without any match variations.

The small differences between algorithms which we have been seeing reflects that, in each of the years studied, the set of stable matchings has been small, as measured by the number of participants who receive different matches from the program-proposing and applicant-proposing algorithms. It is therefore of interest to consider how the set of stable matchings looks for comparably large markets when we concentrate on simple matches. For this purpose, we can consider matching markets with n positions and n applicants, as n approaches the size of the markets we are studying, namely the specialty markets like Thoracic surgery, and the general NRMP match.

One factor which strongly influences the size of the set of stable matchings is the correlation of preferences among programs and among applicants. When preferences are highly correlated--i.e. when similar programs tend to agree which are the most desirable applicants, and applicants tend to agree which are the most desirable programs--the set of stable matchings is small. (When preferences are perfectly correlated then there is a unique stable matching, so both algorithms would produce the same matching.) However as the correlation of preferences goes down, the size of the set of stable matchings grows, and more and more participants would be matched differently by the two algorithms. This is true independently of the size of the market.

It turns out, however, that the size of the market also plays a critical role, in an interesting way. Consider the case in which preferences are uncorrelated (so the set of stable matchings is large). If every applicant could somehow interview and be interviewed for all of the positions, then the set of stable matchings would grow larger and larger (even as a percentage of the number of applicants who could get different stable matchings) as the number of applicants and positions grew. But of course in a real market there is a limit to how many interviews an applicant can go on, or a program can conduct. And when we take this into account, we see that the set of stable matchings quickly becomes very small as the market becomes large.

Specifically, let k equal the number of interviews a candidate can go on, and let n equal the number of applicants and positions in the market. (For simplicity, suppose for a moment that each program offers a single position.) Then even when preferences are completely uncorrelated, as k/n becomes small, the set of stable matchings becomes small. So for example, if k=15 (not an unreasonable approximation for the NRMP) and n > 10,000, fewer than 0.1% of applicants would receive a different match from the two algorithms. That is, even with completely uncorrelated preferences, we see in this simple market the same one-in-a-thousand order of magnitude that we see in the NRMP. And for simple markets the size of the specialty matches like Thoracic surgery, with n on the order of 100 positions, if we suppose that applicants interview at no more than k=10 programs we find only about 2% of applicants receiving different matches from the two algorithms.

Especially in view of the fact that preferences are not uncorrelated in the medical matches, this means that the orders of magnitude of the effects studied in the actual matches are very comparable to what we should expect of simple matches with similar k and n. Thus (once we know to look at both k and n) the theory of simple markets turns out to provide a good approximation not only for the direction of the effects we are seeing, but also for their size.

The reason this is important for the present study of the NRMP and specialty matches is that, in the theoretical study of simple markets, we can look at what would happen when we know agents' true preferences, not just the ROLs which they submit to the match; whereas in the study of real matches we have been using as data the submitted ROLs. So one theoretical possibility might have been that the reason we find such small potential for strategic manipulation is that our data has been collected after such manipulation has already taken place. That is, one skeptical interpretation of our results is that, perhaps there are enormous opportunities for strategic manipulation, but these have been exhausted by the time you look at the ROLs submitted to the match, because the participants have already behaved strategically in an optimal way. The results discussed in this section show that this hypothesis is implausible, because in that case when we looked at similarly sized matches theoretically (and examined the hypothetical participants' true preferences), we would find the hypothesized enormous differences when we looked at the size of the set of stable matchings.

Instead, we see that the case of simple markets provides an explanation of not only the direction of the effects we have been examining, but also their small size.

VIII. Directions for further investigation: participant behavior

As discussed above, the comparisons between the two algorithms in this study have all been made in terms of the Rank Order Lists submitted in previous matches. This means that many aspects of the behavior of applicants and programs were not examined. In view of the extremely small potential for strategic behavior which turns out to be present in these markets, some kinds of behavioral observations which initially appeared important no longer appear so. However, a full understanding of the match, particularly if the medical market continuis to undergo significant change, will require observation of other kinds of behavior by applicants and programs as the performance of the match is monitored over the next decade.

One issue which may be worth some attention, despite the absence of opportunities for profitable strategic behavior, is the extent to which some market participants may engage in "intuitive strategizing" in a potentially counterproductive way. (For example, in the course of studying these markets, I have sometimes heard of folklore passed among particular groups of applicants, to the effect that one should list one's second choice first, or list no more than six choices, etc.) To the extent that such beliefs may be prevalent in some parts of the market, the performance of the match might be improved by pointing out the implications of the present study to associate deans or others in a position to advise applicants.

A related set of issues, especially given the market failures which preceded the organization of the NRMP, and which persist today in other markets, is with potential participants who either withdraw (themselves or some of the positions they might offer) from the match or fail to participate in other ways, and also with those who make or contemplate early agreements which are subsequently formalized through the match. It will be especially valuable to understand what motivates such participants and what happens to them, in order to anticipate what effect, if any, a change in the medical marketplace will have on the match. (Theoretical studies like those discussed in the previous section also have the potential to illuminate this question, but they would have to be more closely modeled on the medical match.)

Aside from the question of precisely how participants compose their Rank Order Lists, there are many aspects of the match which a study such as this does not begin to address, such as: 1. how interviews are sought and granted; 2. what kind of information is exchanged through the interview process; and 3. what happens in the "scramble" for unmatched applicants and unfilled positions following the match. All of these contribute to producing the match, and should be examined as the match is monitored in the coming years.

Regarding the first point, as remarked in the previous section, the number of interviews which applicants go on is an important determinant of the size of the set of stable matchings (as is, therefore, the number of interview invitations which programs issue, and the number of applications they receive, and from what kinds of applicants).

Regarding the second point, a frequent problem in many matching markets I have studied is that, at the interview stage, applicants feel pressured to say whether the position in question is their first choice (and program directors feel it is crucial for them to assess not only how much they like each candidate, but how much each candidate likes their program). Sometimes misinformation and confusion can be passed in both directions, in this connection. How large a factor this may be in the NRMP, and what are its consequences for the match, cannot be assessed from the ROL data examined in this study.

Regarding the third point, what happens to initially unmatched candidates and positions is not only an important determinant of the final match (particularly to them), it may also influence how people participate in the initial match, e.g. it may influence the kinds of behavior discussed in the first two points above. So, particularly if changes in the post-match process are to be contemplated (e.g. the proposed `second match' of unmatched applicants and positions), it would help to understand the connections between the match and the post match process (e.g. do some candidates avoid listing "safe" but less desirable choices in the match and instead rely on the scramble? Would more do so in an organized second match? How about programs?)

Behavioral questions such as those above could be approached by surveys and interviews of match participants, but cannot be inferred from ROL data.

IX. Concluding remarks

The purpose of this report has been not only to present the main conclusions of the study, which are summarized in section I, but to describe the process by which these conclusions were reached, so that their robustness and reliability could be understood as well.

One result of this study is that there is now a well tested (more than 500 matches were conducted during the course of this study) applicant proposing algorithm capable of handling the NRMP match variations. As we have seen, its overall performance is very similar to that of the current NRMP algorithm. Except for the fact that a very small group of applicants can be expected to do better under the applicant proposing algorithm, a choice between the two algorithms will not involve consequential differences for the match as a whole.

A related result is that, under either algorithm, both applicants and programs can be safely advised that, at the point in the process when they decide what Rank Order Lists to submit, they are extraordinarily unlikely to be able to influence the outcome of the match in their favor by submitting a list different from their true preferences. This result is supported both by the computational studies conducted on previous matches, and by the new theoretical results presented in section VII.

Finally, it should be noted that the performance of a matching mechanism depends not only on its design and "mechanical" properties, but also on the information which participants have about its performance. It may be helpful to refer to the results of this study whenever the algorithm (whichever one is chosen) is described in NRMP literature, to insure that participants know that they can participate in the match with confidence.

X. Selected Bibliography

Aldershof, Brian and Olivia M. Carducci [1996], ``Stable Matchings with Couples," Discrete Applied Mathematics, forthcoming.

Colenbrander, August (1996), personal communication.

Roth, A.E. "New Physicians: A Natural Experiment in Market Organization," Science, 250, 1990, 1524-1528.

Roth, Alvin E. [1995], ``Proposed research program: Evaluation of changes to be considered in the NRMP algorithm," Consultant's report, http://www.pitt.edu/~alroth/nrmp.html.

Roth, Alvin E. [1984a], "The Evolution of the Labor Market for Medical Interns and Residents: A Case Study in Game Theory," Journal of Political Economy, 92, pp991-1016.

Roth, Alvin E. "The NRMP as a Labor Market," Journal of the American Medical Association, 275, April 3, 1996, 1054-1056.

Roth, Alvin E. and Marilda Sotomayor Two-Sided Matching: A Study in Game-Theoretic Modeling and Analysis, Econometric Society Monograph Series, Cambridge University Press, 1990.

Roth, Alvin E. and John H. Vande Vate [1990], "Random Paths to Stability in Two-Sided Matching," Econometrica, 58, 1990, 1475-1480.

Sonmez, Tayfun [1996], "Manipulation via Capacities in Two-Sided Matching Markets," mimeo, University of Michigan.

Sonmez, Tayfun [1996], "Can Pre-Arranged Matches be Avoided in Two-Sided Matching Markets?" mimeo, University of Michigan.

Appendix A: Results of the computational experiments concerned with sequencing

1. Sequencing experiments on the current NRMP algorithm

1. Sequence in which programs are processed

Results With Programs Processed in Descending Code Order Compared to Original Results With Current NRMP Algorithm

	1993	1994	1995
Programs: Improve	none	2	2
Do Worse	2	2	2
Applicants: Improve	2	2	none
Do Worse	none	2	2

(In 1994, when some programs and applicants did better while others did worse, there was no correlation between the change in result and the code numbers of the applicants and programs.)

2. Sequencing of reversions:

Results With Input Quotas Set to Final Quotas and Reversion Processing Eliminated, Compared to Original Results With Current NRMP Algorithm

	1993	1994	1995*
Programs: Improve	2	none	none
Do Worse	none	2	2
Applicants: Improve	none	2	2
Do Worse	2	none	none

* Subsequently the years 1987 and 1996 were also examined with similar results: no applicants were affected in 1987, 2 were affected in 1996.

B. Summary of Results of Experiments Related to Sequencing in the Applicant Proposing Algorithm

Sequence of Processing

1993

1994

1995

Part A: Baseline Results (when program selected from stack, applicants processed in ascending program rank number sequence)

1. Applicants ascending; singles and couples intermixed.

Match Result

This is the base result to which others are compared.

Loops Detected

Part B: Applicants and Couples Processing Sequence(when program selected from stack, applicants processed in ascending program rank number sequence)

2. Applicants descending; couples last.

Match Result	Same	Same	Same
Loops Detected	0	0	0

3. Applicants ascending; couples last.

Match Result	Same	Same	Same
Loops Detected	2	0	0

4. Applicants ascending; couples first.

Match Result	2 applicants worse	2 applicants worse	Same
Loops Detected	3	6	1

5. Applicants descending; couples fist.

Match Result	2 applicants worse	2 applicants worse	Same
Loops Detected	1	59	3

Part C: Sequence of Processing Applicants Ranked by Program Selected from Program Stack(when program selected from stack, applicants processed in descending program rank number sequence)

6. Applicants ascending; singles and couples intermixed.

Match Result	Same	9 applicants improved, 3 applicants worse^#	Same
Loops Detected	17	148	62

7. Applicants ascending; couples last.

Match Result	Same	9 applicants improved, 3 applicants worse^#	Same
Loops Detected	2	0	0

^#In Part C, the results for the two experiments for 1994 (couples intermixed and couples last) were the same. In both cases, the differences in the results in Part C as compared to the baseline results in Part A were caused by chains resulting from 2 applicants doing worse in Part C when compared to Part A.

Results With Input Quotas Set to Final Quotas and Reversion Processing Eliminated, Compared to Initial Results With Applicant Proposing Algorithm
	1993	1994	1995^*
Programs: Improve	none	none	none
Do Worse	none	2	2
Applicants: Improve	none	2	2
Do Worse	none	none	none

^*Subsequently 1987 and 1996 were also examined, with no applicants affected in 1987 and a single chain of 9 affected in 1996.

Appendix B: Results of the computational experiments concerned with truncation of ROL's and capacity reductions

1. Truncations at the match point

Difference in result for both the current NRMP algorithm and the applicant proposing algorithm when applicant ROLs truncated at the match point:
1993	1994	1995
none	2 applicants improve; same positions filled	2 applicants improve; same positions filled

Difference in result for the current NRMP algorithm when program ROLs truncated at the match point:
1993	1994	1995
none	none	2 applicants do worse, same positions filled

Difference in result for the applicant proposing algorithm when program ROLs truncated at the match point:
1993	1994	1995
none	3 applicants do worse, same number of positions filled, but not same positions [3 programs filled one less position, 1 program filled 1 more position, 1 program filled 2 more positions, 1 additional position was reverted from one program to another]	none

TABLE B1: Results for Iterative Truncations of Applicant ROL's
	1993				1994
	Original NRMP Algorithm		Applicant Proposing		Original NRMP Algorithm		Applicant Proposing
	Truncated	Truncated & Improved	Truncated	Truncated & Improved	Truncated	Truncated & Improved	Truncated	Truncated & Improved
Run 1	17209	4546	17209	4536	17725	4935	17725	4934
Run 2	4546	2093	4536	2082	493 5	23 61	<4934/td>	2359
Run 3	2093	1036	2082	1023	236 1	11 85	2359	1183
Run 4	1036	514	1023	498	1185	602	1183	598
Run 5	514	258	498	241	602	292	598	287
Run 6	258	135	241	116	292	151	287	143
Run 7	135	73	116	52	151	75	143	66
Run 8	73	48	52	25	75	40	66	31
Run 9	48	34	25	12	40	27	31	17
Run 10	34	27	12	5	27	18	17	7
Run 11	27	24	5	2	18	14	7	3
Run 12	24	22	2	0	14	13	3	2
Run 13	22	22			13	13	2	2

TABLE B1: (Continued) Results for Iterative Truncations of Applicant ROL's
	1995				1996
	Original NRMP Algorithm		Applicant Proposing		Original NRMP Algorithm		Applicant Proposing
	Truncated	Truncated & Improved	Truncated	Truncated & Improved	Truncated	Truncated & Improved	Truncated	Truncated & Improved
Run 1	18170	5763	18170	5758	18316	5805	18317	5806
Run 2	5763	2907	5758	2899	580 5	29 15	5806	2917
Run 3	9073	1572	2899	1559	291 5	15 69	2917	1571
Run 4	1572	857	1559	844	1569	861	1571>	864
Run 5	857	473	844	460	861	481	864	482
Run 6	473	251	460	238	481	271	482	271
Run 7	251	136	238116	124	271< /td>	157	271	155
Run 8	136	79	124	67	157	89	155	87
Run 9	79	45	67	31	89	57	87	55
Run 10	45	31	31	17	57	36	55	33
Run 11	31	22	17	8	36	24	33	21
Run 12	22	18	8	4	24	19	31	15
Run 13	18	16	4	2	19	15	15	13
Run 14	16	16	2	2	15	14	13	12
Run 15					14	13	12	11
Run 16					13	12	11	10
Run 17					12	11	10	9
Run 18					11	11	9	9

TABLE B1: (Continued) Results for Iterative Truncations of Applicant ROL's
	1987
	Original NRMP Algorithm		Applicant Proposing
	Truncated	Truncated & Improved	Truncated	Truncated & Improved
Run 1	16117	4324	16116	4317
Run 2	4324	1894	4317	1887
Run 3	1894	898	1887	891
Run 4	898	437	891	429
Run 5	437	203	429	194
Run 6	203	93	194	84
Run 7	93	41	84	31
Run 8	41	24	31	13
Run 9	24	18	13	6
Run 10	18	14	6/td>	2
Run 11	14	12	2	0
Run 12	12	12

TABLE B2: Results for Iterative Truncations of Applicant ROL's
	1993				1994
	Original NRMP Algorithm		Applicant Proposing		Original NRMP Algorithm		Applicant Proposing
	Truncated	Truncated & Improved	Truncated	Truncated & Improved	Truncated	Truncated & Improved	Truncated	Truncated & Improved
Run 1	3342	1457	3342	1462	3369	1514	3369	1517
Run 2	1457	7403	1462	748	1514	809	1517	813
Run 3	740	382	748	394	809	441	813	444
Run 4	382	201	394	216	4415	249	444	255
Run 5	201	107	216	122	249	138	255	145
Run 6	107	64	122	79	138	79	145	86
Run 7	64	37	79	52	79	44	863	52
Run 8	37	22	52	37	44	31	52	39
Run 9	22	15	37	30	31	23	39	32
Run 10	15	13	30	28	23	20	32	30
Run 11	13	12	28	28	20	18	30	29
Run 12	12	12			18	16	29	28
Run 13					16	15	28	27
Run 14					15	15	27	27

TABLE B2: (Continued) Results for Iterative Truncations of Program ROL's
	1995				1996
	Original NRMP Algorithm		Applicant Proposing		Original NRMP Algorithm		Applicant Proposing
	Truncated	Truncated & Improved	Truncated	Truncated & Improved	Truncated	Truncated & Improved	Truncated	Truncated & Improved
Run 1	3444	1538	3444	1541	3410	1445	34107	1444
Run 2	1538	783	1541	790	1445	727	1444	725
Run 3	783	420	790	431	727	384	725	3841
Run 4	420	237	431	248	384	213	3841>	212
Run 5	237	130	248	141	213	114	212	115
Run 6	130	77	141	89	114	71	115	72
Run 7	77	50	89	62	71	50	72	52
Run 8	50	35	62	47	50	35	52	39
Run 9	35	29	47	41	35	26	39	30
Run 10	29	26	41	38	26	21	30	25
Run 11	26	24	38	36	21	19	25	23
Run 12	24	23	36	36	19	18	23	22
Run 13	23	23			18	17	22	21
Run 14					17	16	21	20
Run 15					16	15	20	19
Run 16					15	14	19	18
Run 17					14	14	18	18

TABLE B2: (Continued) Results for Iterative Truncations of Program ROL's
	1987
	Original NRMP Algorithm		Applicant Proposing
	Truncated	Truncated & Improved	Truncated	Truncated & Improved
Run 1	2967	1345	29676	1349
Run 2	1345	670	1349	675
Run 3	670	347	6757	353
Run 4	347	186	353	194
Run 5	186	100	194	110
Run 6	100	55	110	66
Run 7	55	33	66	44
Run 8	33	21	44	33
Run 9	21	17	33	29
Run 10	17	15	29	27
Run 11	15	15	27	27