»  Solutions to puzzles in my National Review Online Diary

  March 2009

—————————

March's puzzle was a wee exercise in what my schoolmasters called "perms and coms" — that is, permutations and combinations. A key piece of arithmetic here is the binomial coefficient or "choice number": the number of ways of picking k objects from n, when you don't care about the order and don't allow replacement. The number of ways of picking two objects from four, for example (call the four A, B, C, and D), is given by 4×3 divided by 2×1, which is of course 6. (Here they are:  AB, AC, AD, BC, BD, CD.) It always goes like that. The number of ways of picking three objects from 20 would be 20×19×18 divided by 3×2×1, which is 1,140. Yes, the division always comes out to a whole number — math is neat like that!

The number of ways of picking k objects from n on these conditions is usually written nCk.   So 20C3 = 1,140. If you have Microsoft Excel, you can do combin(20,3) to get the same result.

OK, here is the March problem. An emailer asked:

There is a set of 8 items and I am making 10 random selections with replacement. I have not been able to determine a formula to assess the probability of selecting a given number of items (say 5) without selecting the rest.

As so often happens, I outsmarted myself here, sending my correspondent a glib answer which was completely wrong. (Sorry, guy.) I won't feel too bad about this, since the handful of readers who emailed in with same-day solutions all got it wrong, too, and there were two credentialed mathematicians among their number! Tim Chow finally produced the right formula. Let's take a look.

—————————

Solution

Let's name the eight objects:  A, B, C, D, E, F, G, and H. I'm going to call this my "alphabet." Now I define a "word" to be a string of letters made up by randomly selecting ten of these objects, allowing replacement after each selection. So an example of a word would be  CGAAFDACDD. That particular word "hit" only five of the letters in my 8-letter alphabet:  A. C, D, F, and G.

How many possible 10-letter words can I make from this 8-letter alphabet? Of that total, how many will, like the word I just showed, "hit" precisely five letters?

The first question is easily answered. Since each of the ten positions in a pick can be filled in eight ways, the total number of possiblities is 8×8×8×8×8×8×8×8×8×8, known to us mathematical sophisticates as 810. This works out to 1,073,741,824.

To answer the second question we have to delve into the mathematics of partitions. Any 10-letter word that uses only five letters will divide the set {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} into exactly five disjoint subsets. For the word CGAAFDACDD that I quoted above, for example, the five subsets are {1, 8}, {2}, {3, 4, 7}, {5}, and {6, 9, 10}, corresponding to, respectively, the Cs, the G, the As, the F, and the Ds.

Now, the number of ways of dividing {1, 2, 3, …, n} into k disjoint subsets is given by the Stirling number of the second kind  S(nk). For our particular problem we want S(10,5), which is 42,525.

Since I can choose the 5 letters from my alphabet of 8 in 8C5 ways, I need to multiply the possibilities by that, which is 56. Also, I can assign the 5 letters to partitions in 5! (that is, factorial 5) ways: associating the first partition with any of C, G, A, F, or D, then the second with one of the four remaining, and so on.

Bottom line: The number of 10-letter words that use only 5 of the letters in my 8-letter alphabet is

8C5 × 5! × S(10,5)

Where S(nk) is the Stirling number of the second kind. This is 56 × 120 × 42525, which is 285,768,000

You can make this work for any number. If you want the probability that any random 10-letter word from your 8-letter alphabet will use only five letters, you need to divide by 810, the total number of 10-letter words in this alphabet. The answer's about 0.26614219.

Just to cross-check, I did a simulation, running a full 1,073,741,824 picks with a random number generator to see what I got. Here's what I got.

Objects hit Picks Proportion
of all picks
    1         7         0.00000000652    
    2         27,150         0.0000253    
    3         3,133,270         0.00292    
    4         57,321,154         0.0534    
    5         285,855,613         0.266    
    6         459,831,099         0.428    
    7         237,313,892         0.221    
    8         30,259,639         0.0281    

If you have Microsoft Excel and want to see the thing done without using Stirling numbers (though they are present implicitly via the recursion algorithm), Joe Shipman contributed this neat spreadsheet. If you click on a cell you'll see the corresponding recursion formula.