20210602, 12:44  #1 
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
5,939 Posts 
Matching problem
Suppose you have n letters and n addressed envelopes that you need to match. You can find matches by pairing all letters with envelopes and a friend will tell you how many matches there are(not which).
What is the expected number of times you need to pair the letters with envelopes to get all matching? What is the maximum number of pairings you could need to find all matches? What would your strategy be to minimize the above numbers? Do they have the same solutions? For easier discussion let's use n=10 as an example although a general formula would be great. 
20210602, 14:32  #2 
"Viliam Furík"
Jul 2018
Martin, Slovakia
2·353 Posts 
If we assume the envelopes are coded 1 to n, and letters are also coded 1 to n, but the pairings of these numbers are unknown, the maximum number of tries you need should be n! because you can simply try all the possible arrangements of the letters while fixing the envelopes.
That's for brute force. If you want something clever, I suggest something like the following:
When you are certain that some letter is in its envelope, leave it there and never move it until you're done. That should do it. I am not sure if it's the best way to do it, but it should be a fairly good way to do it. As for the expected number of times you bother your friend, I don't know, and it's probable that someone else here will find a better algorithm or the number before I do either. 
20210602, 16:38  #3 
"Viliam Furík"
Jul 2018
Martin, Slovakia
2·353 Posts 
I will demonstrate what I think is the worstcase scenario for n=5, using a bit of strategy in choosing the switches, such that I will try to not place the same letter twice to where I know it doesn't belong.
([x] denotes that letter is for sure in its correct envelope) Begin Step 1 > 2, 4, 5, 1, 3 Friend > 0 (call 1) Step 2 > 2, 3, 5, 1, 4 (34 switch) Friend > 0 (call 2) Step 2 > 5, 3, 2, 1, 4 (25 switch) Friend > 0 (call 3) Step 2 > 5, 1, 2, 3, 4 (13 switch) Friend > 0 (call 4) Step 2 > 5, 1, 4, 3, 2 (24 switch) Friend > 0 (call 5) Step 2 > 3, 1, 4, 5, 2 (35 switch) Friend > 0 (call 6) Step 2 > 3, 1, 4, 2, 5 (25 switch) Friend > 1 (call 7) Step 2 > 3, 5, 4, 2, 1 (15 switch) Friend > 0 (call 8) Step 2 > 3, 1, 4, 2, [5] (rollback) Step 2 > 3, 4, 1, 2, [5] (14 switch) Friend > 1 (call 9) Step 2 > 1, 4, 3, 2, [5] (13 switch) Friend > 3 (call 10) Step 2 > [1], 2, [3], 4, [5] (24 switch) At this point, you don't even need to call your friend, because you are certain you got it correct  If your correctness number is n2, you know you just have to switch the two remaining letters you didn't yet mark as correct. That is 10 calls for n=5 in the worst case. If someone finds a mistake in my example, please, point it out. 
20210602, 19:50  #4 
"Walter S. Gisler"
Sep 2020
Switzerland
11 Posts 
My approach would be to formulate the problem as a 01 linear program.
We will have variables which are 1 if letter matches with envelope and 0, otherwise. Clearly, every assignment of letters to envelopes would fulfill the following constraints: If we solve this problem we will get a unique assignment and we can receive the number of correct guesses, which we can denote as . With that, we can add one new constraint every time this hint is given. Assuming our previous solution is given as a set of pairs, denoted as , this additional constraint can be expressed as follows: This constraint essentially says that the correct solution includes of the matchings in . I believe this is the strategy that uses the given information the most efficient way possible and minimizes the number of attempts needed to uncover all of the correct pairs. Unlike using only the information from the last (or maybe last two "rounds") it combines all knowledge it retrieves in this process. It is pretty hard to calculate the expected number of "rounds", but a simulation can help to quantify it. I have implemented this quickly in Python to get an idea. I have added a sample run to illustrate how this method choses the pairs in each round. It is clearly not what a human would do or in any way similar to the previous answer, but seems to be quite efficient. I'm attaching a graph showing the frequencies (+cumulative frequencies) of the problem being "solved" after a given number of rounds. It can be seen that, if we have 10 letters and 10 envelopes, the expected number of iterations seems to be somewhere around 10. In my experiment (2000 random experiments), the highest number of tries I needed was 15 (just once), but the most common number seems to be 10 (473 times out of 2000 experiments). I am also attaching the Python code (simulate.txt  seems like I can't upload .py files) in case anyone wants to experiment with it and adjust it to chose a different strategy. Out of curiosity, I have also done a quick experiment and have adjusted the number of letter/ envelope pairs to 20. Obviously, there are a lot more possible assignments. Nevertheless, the expected numbers seems to be somewhere around 27, which is still quite decent. 
20210602, 20:37  #5 
"Viliam Furík"
Jul 2018
Martin, Slovakia
2C2_{16} Posts 
I am looking at the code, but I still can't get how the heck does it decide what's best to continue with... How do the constraints work and how do they affect the next "prediction"?

20210602, 20:50  #6 
"Walter S. Gisler"
Sep 2020
Switzerland
B_{16} Posts 
I like your approach, Viliam. I would be really curious to see how it performs in a Monte Carlo simulation, but it isn't as easy to implement as my method.
You can even improve the performance a bit if you keep track of pairings that are wrong. For example, in step 2c, you learn about two numbers that are wrong and it would never make sense to switch either of them back again. In the example that you provided, you are doing one move that could be avoided: You already know (from the very first step) that 4 can't be in the second position, but you are doing a swap later in the process that brings it back into that position. By the way, after 8 steps, the solution is actually clear if you combine all of the knowledge you have gained in the process. To show this, I have created a 5x5 grid representing all possible pairs of letters/ envelopes. Whenever your "friend" tells you that a solution has 0 correct choices, you cross out a possible solution. After the first 8 steps, you got the 0correct scenario 7 times, so the grid is quite full already (see attached screenshot). Based on this, you can already deduce the correct solution for letter 4 and 5 and don't need to continue trying with those. Furthermore, you have had one solution that gave you 1 correct position: 3, 1, 4, 2, 5 Since from all the 0correct scenarios you know that the solution will be x, x, x, 4, 5, you can now deduce that 3, 1, 4, 2 is also incorrect. That crosses out one more square and tells you that the solution will be x, x, 3, 4, 5. Furthermore, for position 1, the candidates are either 1 or 4, but 4 is already taken. From that, the whole solution is know. Nevertheless, a pretty good strategy, but you can combine it with additional logic reasoning to reduce the number of steps. 
20210602, 21:14  #7 
Jan 2017
171_{8} Posts 
I'm not familiar with docplex which seems to be what the code uses to generate the specific guesses, but it seems like it probably just chooses some arbitrary permutation which has the given number of matches with every previous try  if your first try got 4 matches and second 5, then on third try it'll try some permutation that has 4 choices in common with the first and 5 with second.

20210602, 21:24  #8  
"Walter S. Gisler"
Sep 2020
Switzerland
13_{8} Posts 
Quote:
The whole idea behind my approach is to describe the correct solution in terms of the knowledge we gain in every step. The knowledge you gain in one single step isn't valuable by itself, but the knowledge you gain in multiple steps combined, tells you a lot about the solution. Think of it as ruling out solutions, rather than moving towards a better solution at every step. At the beginning, the possible solution space is . With every iteration and every constraint that we add, we cut off some of the solutions and therefore reduce the solution space, until we are left with only one single unique solution. Let's look at a much easier example with n=3; let's assume the correct solution is [1, 2, 3] If our first guess is [3,1,2], the number of correct guesses is 0. Based on this, we can not only rule out [3,1,2], but also [3,2,1], [1,3,2] and [2,3,1] (because each of them has one position in common with a solution we know is completely wrong), and that's exactly what adding that constraint would do. About choosing the next solution: as I said, all that the constraints do is to reduce the solution space from to a smaller number of solutions. So after adding a constraint, we just let the MIP solver chose one of the remaining solutions in the solution space at random. The whole idea isn't necessarily to chose a better solution in every iteration, so "predict the next solution" isn't quite the right wording. A common problem that is solved with a binary integer program is Sudoku. It is certainly much easier to understand than this. If you are curious about the method I am using here, this would probably be a good starting point: https://www.mathworks.com/help/optim...lembased.html 

20210602, 21:26  #9  
"Walter S. Gisler"
Sep 2020
Switzerland
1011_{2} Posts 
Quote:
Think of docplex as a black box. There are plenty of other libraries I could have used: CBC, Gurobi, ORTools, one of many SAT solvers etc. 

20210602, 21:32  #10 
Jan 2017
11^{2} Posts 
Trying something that could be the correct answer given all previous ones (which I believe is what Walter's code does) is not always the optimal answer, either for average or worstcase behavior (at least worst case for remaining tries in a particular situation, I don't show this to necessarily affect the global worst case):
Suppose that n = 1000000. Your first random guess gets 999998 matches right (quite lucky). Now, if you keep trying guesses that could be the correct answer, they can only swap one pair from your initial guess. Since you can only change two places at once, you'd expect to need around 146000 tries to even try changing one of the incorrect positions (you'd try changing twice that number of places, and 50% chance that both the incorrect positions are outside that set). Some kind of bisection strategy should be able to do much better than that. Last fiddled with by uau on 20210602 at 21:35 
20210602, 22:38  #11  
"Walter S. Gisler"
Sep 2020
Switzerland
13_{8} Posts 
Quote:
I did a few additional tests. Unfortunately it isn't practical to try my method with such a high n because the matrix gets too big, but if I use n = 100 and force exactly this kind of situation, I am on average only using about 30 tries. My feeling is telling me that that's pretty decent compared to what you would expect as an average runtime. So, with n = 1M, it is true that my approach would probably use over 100k tries in this worst case scenario (it needs to "touch" each wrong position at least once). However, what would the average case scenario look like? My gut feeling is telling me it would be somewhere in that range too. But might completely off on that estimate. It is pretty hard to imagine this game with such a lot of letters and envelopes... 

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Matching BAD LL results?  PhilF  Data  35  20190520 02:36 
Largest number of LL tests before matching residues achieved?  sdbardwick  Lounge  1  20150203 15:03 
Can 1227133513 be the only composite number matching the conditions?  miket  Math  5  20140812 00:41 
Three matching tests not closing exponent?  Dubslow  PrimeNet  8  20120427 18:19 
Residue not matching due to masked bits  patrik  Data  1  20110924 23:44 