At the start an ominous warning is given, one which I have experienced all to many times. Probability is often very counter intuitive. It’s easy to make mistakes, you can’t trust your intuition and you need to stay principled to be sure. These notes are my attempts at staying principled.

Why study probability

As a data scientist I use probability and statistics every day. But most of it is relatively simple. Yet it would be nice to have a very firm grounding of all statistics because of it usefulness in so many situations whenever data is concerned. Therefore my reasoning is mainly a better foundation for statistics and a Computer science. The fact that it is used a lot in Finance and other sciences is an added bonus.

Strategies used in this book: - Simulation (Also see hacker-statistics) - Biohazards: get a feel for the common pitfalls - Sanity checks: using multiple methods for verification

Pebble World

We have a sample space \(S\) containing all the outcomes of an experiment. We call a subset of the set an event (can be one or more outcomes). If the actual outcome is in the subset, we say that that event has occured.

We can use set theory methods. like intersections, unions, complements to create new subsets of the experiment. This means combining events. See also de morgan laws (2 examples of combining outcomes are given in the book)

Naive definition of probability

Naive definition (relies on many assumptions like each outcome being equally likely). Say we want to know the probability of A. Let a be the number of outcomes favorable to A. Let s be the total number of outcomes in sample space \(S\).

p_naive <- function(a, s) a / s

Note that our set rules, like the complement, also holds. So to calculate the complement.

p_naive_complement <- function(a, s) 1 - p_naive(a, s)

Because of the restrictions (making it naive) it is often not applicable. Some cases where it is applicable.


Check out books and sites on Combinatorics for a more thorough mathematical view on counting.

The multiplication rule: Combine multiple experiments by multiplying the counts of the different experiment outcomes.

Note that although it is easy to think about the experiments in a certain order, this is not needed for the multiplication rule to work.

This can even be used for problems in which you need to find the number of possibilities for a number 1, 2, 3.

Example: We have 10 train wagons. How many different ways to select numbers 1, 2, 3? \(10 * (10 - 1) * (10 - 2) = 720\)

The multiplication rule leads to the counting of subsets.

Example: We have a combined experiment of

  1. ball or no ball
  2. card or no card
  3. coin or no coin

Each of these has 2 outcomes so the combined number of probabilities would be \(2 * 2 * 2 = 2^{3}\). Now if we think of each of the individual as having anything or not (maybe it can be represented by a having a certain number of not). It can be used for any set of elements and the question of how many subsets are there.

If the experiments are actually sampling from a the same set again and a again we have the calculation for sampling with replacement. This only works if order matters. Having a result for experiment 2 and the same result for experiment 3 mean different things.

outcomes_with_replacement <- function(n, k) {
  x <- n ** k
# Number of ways to obtain a sample size of k (different order would be different way)

As seen before for counting outcomes with replacement we need to subtract the previously chosen elements from the set. \(n(n-1)(n-2)\) in a previous train example and in general \(n(n-1) \dots (n-k+1)\). Then we need to add a guard that selecting more elements than are in a set is impossible.

outcomes_without_replacement <- function(n, k) {
  # Only works for k bigger than 0 (it needs to select something)
  if (k > n) {
  res <- 1
  for (i in 0:(k - 1)) {
    res <- res * (n-i)

outcomes_without_replacement(10, 10)
## [1] 3628800
## [1] 3628800

And we can also see that if \(n\) is the same size as \(k\) it is a factorial calculation of \(n! = n (n-1) \dots 1\).

Now to use our counting to calculate probabilities by using our naïve definition of probability. To calculate the birthday paradox: The number of ways to select \(k\) birthdays (which can take any day of the year so with replacement) will be used as the sample space \(S\). Now it is much easier to calculate no birthday overlap than one or more overlaps, so we will use the set rules and in the end do \(1 -\) probability of no birthday overlap.

p_no_birthday <- function(k) outcomes_without_replacement(365, k) / outcomes_with_replacement(365, k)
p_birthday <- function(k) 1 - p_no_birthday(k)
## [1] 0.5072972
pbirthday(23) # R buildin
## [1] 0.5072972

When using counting it is important to label the different objects (or sometimes it is easier to think of labeling the different experiments). If we look back at the train example, having a train at position 1 means something different as having that train at position 2. In this example it is clear, but a mistake can be made when we talk about an experiment with 2 dice (combination of 2 experiments) and we would not keep the dice results apart (See Leibniz’s mistake).

Adjusting for overcounting

Sometimes we have to adjust for counting too many correct outcomes. One place where this happens is if order does not matter. For instance for selecting two pots of soup to pour into the soup from 8 different pots (where order of soup does not matter): \(8 * 7 = 56\) different possibilities if order mattered. But because any outcome of pot a and pot b is the same as pot b and pot a (ab is ba) we can divide by 2. Note that we also can’t select the same object twice for any of these cases.

The book also shows an example of layering the overcount-correction. First for sampling a team of 2 players out of 4. Then again for sampling 2 teams out of the 4 players (divide by 2 again since knowing one team means knowing the other team).

binomial_coeff <- function(n, k) {
  overcounted <- outcomes_without_replacement(n, k)
  # we overcount by the total ways of picking k since we do not care about the order
  total_ways_picking <- factorial(k)
  out <- overcounted / total_ways_picking

binomial_coeff(100, 2)
## [1] 4950
choose(100, 2)  # R buildin
## [1] 4950

The book has nice examples of how this applies to word permutations and calculating full house in Poker. The book also mentions Newton-Pepys problem which Norvig showed nicely in his Probability notebook.

Story proofs

\({n \choose k} = {n \choose n - k }\) : Think about a committee, choosing who is not going to be on.

\(n{n - 1 \choose k - 1} = k{n \choose k}\) : Picking a team captain. Left is picking the team captain (\(n\)) then picking the rest of the team. Right is first picking the k team members (\({n \choose k}\)) and then selecting the captain afterwards from the team (\(k\)).

\({m + n \choose k} = \sum_{j=0}^{k}{m \choose j}{n \choose k - j}\) : Vandermonde’s. Think of a selection from 2 groups. \(m + n\) Is the size of the combined groups. On the right hand side we sum over each of the posibilities of selecting from one group and the other.

\(\frac{(2n)!}{2^{n} \cdot n!} = (2n-1)(2n-3) \cdots 3 \cdot 1\) : \(n\) Partnerships for \(2n\) people. Left hand side: we look for a random order of the people and then form pairs of every 2 people. Since the order of the pairs does not matter we overcount by \(n!\). And because the order within a pair also does not matter we overcount by \(2^{n}\). Right hand side: How many people person 1 can select from, times how many people then person 2 can select from, etc.

General definition of probability

Few simple axioms

  1. \(P(\emptyset) = 0\) (Empty set)
  2. \(P(S) = 1\) (Complete experiment’s event space)
  3. If \(A_{1}\) and \(A_{2}\) are mutually exclusive events (meaning no intersection at all): \(P(\bigcup_{j=1}^{\infty}A_{j}) = \sum_{j=1}^{\infty}P(A_{j})\)

Some properties for any events \(A\) and \(B\)

  1. \(P(A^{c}) = 1 - P(A)\) - (Classic set complement)
  2. If \(A \subseteq B\) then \(P(A) \leq P(B)\)
  3. \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)

This last one is actually taking account of overcounting the part where A and B intersect. If there were 3 event-outcome spaces which overlap. then you would subtract all 3 pairs and add the intersection of all 3 again since you both added and subtracted it 3 times leaving it at 0. This is call inclusion - exclusion. Useful for when events are not disjoint. Full rule:

\(P(\bigcup_{i=1}^{n}A_{i}) = \sum_{i}P(A_{i}) - \sum_{i<j}P(A_{i} \cap A_{j}) + \sum_{i<j<k} P(A_{i} \cap A_{j} \cap A_{k}) - \cdots + (-1)^{n+1}P(A_{1} \cap \cdots \cap A_{n})\)

Matching problem simulation

De Montmort’s Matching problem is a difficult problem described in Example 1.6.4 about calculating the change of winning in a game. In this game you win if the nth card you pull (sample without replacement) is number n (given that you have each card a unique number between 1 and \(n\))

# 52 cards
n <- 100
perform_game <- function(n) {
  ordered_sample <- sample(n)
  correct_vector <- ordered_sample == (1:n)
  return(sum(correct_vector) >= 1)
## [1] TRUE

Now we can repeat this experiment to get some probability estimates.

test_repeats <- 10^4
repeat_vector <- replicate(test_repeats, perform_game(n))
# total number of success divided by total number of tries
sum(repeat_vector) / test_repeats
## [1] 0.6277

Birthday problem simulation

Just like previous problem we can also take a simulation based approach to finding the birthday results

n_people <- 23
perform_test <- function(p) {
  birthday_dates <- sample(365, p, replace=TRUE)
  # tabulates takes a vector and interprets them as indices which it will increment of a (bigger) vector
  birthday_vector <- tabulate(birthday_dates)
  double_or_more_birthdays <- sum(birthday_vector >= 2)
  return(double_or_more_birthdays >= 1)
## [1] FALSE

Again we can repeat the experiment

test_repeats <- 10^4
repeat_vector <- replicate(test_repeats, perform_test(n_people))
sum(repeat_vector) / test_repeats
## [1] 0.5073

And we can compare it to what R’s buildin function says.

## [1] 0.5072972

Question 8

The number of ways to create teams.

Note that the teams do not have an ordering or labeling within the team and also the teams themselves aren’t ordered or labeled.

A: Split 12 people in 3 teams of 5, 5, 2

The solution is to split it into smaller problems and account for overcounting. On an individual problem, using binominal coefficient already accounts for theovercounting that happens when taking without replacement when order does not matter. Then every sub-problem can also be done in another order so we divide by the ways in which we can do those. Lastly, the 3rd (non-)team is not a sub-problem because that team is already set once the other 2 are decided.

total_people <- 12
team_a <- 5
team_b <- 5
(choose(total_people, team_a) * choose((total_people - team_a), team_b)) / factorial(2)
## [1] 8316

A second way of looking at it could be to say we pick each spot individually and then divide by the overcounting of each of the teams and then account for combination of the 2 teams of 5

team_c <- 2
indivual_positions_team <- factorial(total_people)
team_a_overcount <- factorial(team_a)
team_b_overcount <- factorial(team_b)
team_c_overcount <- factorial(team_c)
indivual_positions_team / (team_a_overcount * team_b_overcount * team_c_overcount * factorial(2))
## [1] 8316

We will do the same for B: Split 12 people up in 3 teams of 4

In this case because we can swap all 3 of the teams out for each other we can divide by factorial(3)

total_people <- 12
team_a <- 4
team_b <- 4
(choose(total_people, team_a) * choose((total_people - team_a), team_b)) / factorial(3)
## [1] 5775

Or with method 2: all the team overcounts are the same

team_overcount <- factorial(4)
indivual_positions_team / ((team_overcount ** 3) * factorial(3))
## [1] 5775

Question 9

We have a plane with discrete locations called points. How many paths are there between point \(0,0\) to point \(110, 111\). The only permitted movement is one unit up or one unit right.

A: We need to go (in total) 110 times right and 111 up. In how many ways is this possible (going over a grid)

A way to look at this is to order 110 right arrow characters, in a long list of 110 right arrow + 111 up arrow characters.

choose((110 + 111), 110)
## [1] 1.802617e+65

B: Now we have to go through the point (110, 111) and end up in (210, 211).

Now the approach is to use the multiplication rule. The old amount of ways to get to (110, 111) times the amount of ways to go from (110, 111) to (210, 211)

leg2_up <- 211 - 111
leg2_right <- 210 - 110
leg1_ways = choose((110 + 111), 110)
leg2_ways = choose((leg2_up + leg2_right), leg2_right)
leg1_ways * leg2_ways
## [1] 1.632243e+124

Question 15

A Story proof is needed for \(\sum_{k=0}^{n}{n \choose k} = 2^{n}\)

The left hand side can be interpreted as the number of subsets in n (See Example 1.4.4). The right hand side can be seen as a sum, where for each k it finds the amount of different subsets possible (how many ways to choose k from n, see Definition 1.4.12). Then because we sum over all different possible k values (k < 0 would lead to 0 and k > n would also lead to 0) the can be seen as equal.

Question 16

First the algebraic solution:

\({n \choose k} + {n \choose k-1} =\) \(\frac{n!}{(n-k)!k!} + \frac{n!}{(n-k+1)!(k-1)!} =\) \(\frac{n!}{(n-k)!k!} + \frac{n!k!}{k!(n-k+1)!(k-1)!} =\)

Given that \(\frac{k!}{(k-1)!} = k\)

\(\frac{n!}{(n-k)!k!} + \frac{n!k}{k!(n-k+1)!} =\)

Given that \(\frac{n!}{(n-k)!} = \frac{n!(n-k+1)}{(n-k+1)!}\)

\(\frac{n!(n-k+1)}{k!(n-k+1)!} + \frac{n!k}{k!(n-k+1)!} =\) \(\frac{n!(n+1)}{k!(n-k+1)!} = {n+1 \choose k}\)

Story solution. We introduce a president to the whole set of people. Now we either select this president for our selection, or not (2 distinct possibilities). If we do, there is a single way of picking the president and \({n \choose k-1}\) ways of picking the rest. If we don’t then we are sure that it won’t be the president and we pick the group as if nobody was introduced: \({n \choose k}\)

Question 20

A: A story proof. Our newly sized total group could be ordered by age. We could simply say we pick the oldest for the extra selection needed, and the rest would be picked using the old fashioned method (ie. \({n \choose k}\)). But we could have also picked the n - 1 oldest. Since we have already counted picking the oldest, we know this person is not going in the group, but the person before is going in the group. That means we still need \(k\) but we have 1 less to select from. We can continuously count the ways we can pick the person younger than the previously oldest person who is selected, creating a sum of all possibilities (as long as the group-to-select-from size stays bigger than the selection size, in the other cases no selection is possible)

\({n+1 \choose k+1} = \sum_{j=k}^{n}{j \choose k}\)

B: We can write down taking \(k\) from \(n\) with replacement as \({n + k - 1 \choose k}\) So picking 40 bears from 5 delicious flavours means \({5 + 30 - 1 \choose 30}\).

Now for some \({5 + i - 1 \choose i} = {4 + i \choose i} = {4 + i \choose 4}\) (Because of complement).

\(\sum^{50}_{i=30}{4 + i \choose 4} = \sum^{54}_{i=34}{i \choose 4} = \sum^{54}_{i=4}{i \choose 4} - \sum^{33}_{i=4}{i \choose 4}\)

Now we can fill in the formula from A to get:

choose(55, 5) - choose(34, 5)
## [1] 3200505

Question 24

This question is about a family with 6 children, 3 boys and 3 girls. Since we already know in advance how many boys and girls there are we can reformulate the question into creating permutations of the letters “BBBGGG” where the location in the word means in what order the children are born.

Our total would be positive (only this ordering is valid, any change in ordering would make the answer invalid). Divided by the total number of ways to order the boys and girls.

1 / choose(6, 3)
## [1] 0.05

Another way of calculating is by giving each boy and girl a number and identifying each individually (one of the tips given in the book). Then suddenly having a different order of girls or boys in the good solution means a different result. In this case there are \(3!\) ways to select the first 3 boys (eg. b3b2b1), \(3!\) of selecting the last 3 girls (eg. g2g1g3). But in total there are \(6!\) ways to order 6 individuals.

(factorial(3) * factorial(3)) / factorial(6)
## [1] 0.05

Question 25

Here we have 6 districts and 6 robberies. This feels a lot like rolling a dice 6 times. What is the probability that one number is rolled more than 1 time. This more than 1 time feels very similar to the birthday problem so it is probably a good idea to use the complement here as well. So 1 minus the chance that all the numbers are different. Total ways to order the 6 sides of a die is \(6!\) which are the positive events. Total number of ways to roll 6 dice is \(6^{6}\)

p_all_different <- factorial(6) / 6**6
p_1ormore_same <- 1 - p_all_different
## [1] 0.9845679

Question 28

College has 10 time slots, 3 statistics courses. What is the probability that 2 or more statistics courses overlap. Again we see a 2 or more. We could calculate it for each case but we might as well use the complement again. 1 minus the chance of no courses overlapping.

Like the birthday calculation

p_no_overlap <- outcomes_without_replacement(10, 3) / outcomes_with_replacement(10, 3)
p_atleast1_overlap <- 1 - p_no_overlap
## [1] 0.28
# Or more direct
1 - ((10 * 9 * 8) / (10 ** 3))
## [1] 0.28

Question 29

a.) We need to compare getting 22 versus getting 21 after rolling 4 dice. Ways of getting 21:

  • 6663 \({4 \choose 1}\)
  • 6654 \({4 \choose 2} \cdot {2 \choose 1}\)
  • 6555 \({4 \choose 1}\)

Ways of getting 22:

  • 6664 \({4 \choose 1}\)
  • 6655 \({4 \choose 2}\)

Now the order can be different but the amount of ways to roll for dice A,B,C,D and get 6,6,6,X will be the same. And for getting 6,6,5,5 there are less ways than 6,6,5,4. Therefore there are more ways for 21. Since we need to do it with exactly 4 throws the denominator stays the same. So we know the probability of getting 21 is greater.

b.) We need to compare the change that a 2 letter word is a palindrome versus a 3-letter word. Notice that for an uneven-number-of-letters palindrome, the center letter never matters since it stays the same no matter in which direction you read it. Therefore comparing 2-letter and 3-letter words both come down to making sure the first and last letter are the same. And since they come down to the same probability experiment we already know that the probability will be equal (without doing any calculations).

The calculation would be \(\frac{24}{24 \cdot 24}\) if we just limited ourselves to the alphabet but could be different if other signs and/or capitalization was included.

Question 31

There are \(N\) Elk, and a random sample is taken of \(n\). A second sample of \(m\) is taken. What is the probability that \(k\) of the second sample \(m\) were already tagged.

Since we are looking from the viewpoint of the second sample. The total sample space will be simply the number of ways we can get the second sample: \({N \choose m}\). Then the number of positive cases is the number of ways of getting \(k\) from \(n\) and \(m - k\) of \(N - n\) so \({n \choose k} \cdot {N - n \choose m - k}\)

Question 33

A jar contains \(r\) red balls and \(g\) green balls. A ball is taken randomly and a second ball is taken randomly.

a.) We do not know what color the first ball was. Therefore we did not gain any new information about the chance of getting a green ball. The chance of getting a green ball is the same in our eyes.

b.) We can split the sample space into green balls \(g_{1 \dots n}\) and red balls \(r_{1 \dots m}\). An experiment is taking two balls from either \(g\) or \(r\) but it can’t be the same ball. The total amount of draws is \((n + m) \cdot (n + m - 1)\). Then the amount of possible draws to have the first ball as green and the second anything is \((n) \cdot (n + m - 1)\) and for the second green it’s either the first is green and the second too or first is red and second green \(n(n - 1) + mn\) which is the same as \(n(n + m - 1)\).

Another way of looking at it is seeing there are \(n\) ways of picking the right second ball, and given that the right second ball is chosen, there are \((n + m - 1)\) to select the right first ball. (Remember that the experiments do not have to occur chronologically)

c.) To calculate the amount of red and green balls to have an equal outcome with 16 balls, we need to create a function to calculate the probability.

If we take the case of both are the same color \(n(n-1) + m(m-1)\) and the denominator stays the same as in previous parts of the question. We equate the whole thing to \(1/2\) because it can only be equally likely if both cases have 50% change.

…. Some magic happens ….


\((n-m)^{2} = n + m\)

\((n-m)^{2} = 16\)

\(n-m\) is -4 or 4. So one is 6 and the other is 10 or the other way around.

Question 34

a.) First we need to find the chance of getting a flush (but not a royal flush!). Which means we need to subtract the change of a royal flush from the chance of a normal flush.

All flushes: \(\frac{{13 \choose 5}4}{{52 \choose 5}}\)

Now we subtract the single royal flush which can only be done in one way only per color.

\(\frac{1 \cdot 4}{{52 \choose 5}}\)

so we can write it out as \(\frac{({13 \choose 5} - 1)4}{{52 \choose 5}}\)

((choose(13, 5) - 1) * 4) / (choose(52, 5))
## [1] 0.001979253

(Note that usually you also wouldn’t count a straight flush as a normal flush)

b.) Getting a two pair.

Choose the two ranks \({13 \choose 2}\), choose the suits \({4 \choose 2}{4 \choose 2}\), and choose the other card \((52 - 8)\).

enumer <- choose(13, 2) * choose(4, 2) * choose(4, 2) * (52 - 8)
enumer / (choose(52, 5))
## [1] 0.04753902

Or we have 13 cards to choose first rank, then 12 ways to choose the second rank. Same calculations for the suit and the last card. Only in this case we overcount since having rank A and rank B is the same as rank B and rank A. So we devide by 2 and get the same result.

enumer <- (13 * choose(4, 2) * 12 * choose(4, 2) * (52 - 8))
(enumer / 2) / choose(52, 5)
## [1] 0.04753902

Question 42

Show that the probability of a nonrepeatword which uses all 26 letters is very close to \(1/e\).

There are \(26!\) number of ways to order all the letters, therefore that many full nonrepeatwords. In general there are \({26 \choose k} k!\) ways of getting all the \(k\) words so the complete probability calculation would look like \(\frac{26!}{\sum^{26}_{k=1}{26 \choose k} k!}\).

The denominator is equal to the first terms of the Taylor series therefore \(1/e\) is a very close approximation.

Question 50

For calculating the probability of having one suit miss from a 13-card hand, we need to calculate the probability for each suit missing, and then applying the inclusion-exclusion rule.

First for the high level vier. We have 4 suits, therefore 4 times the probabiliy of a single suit missing. Then \({4 \choose 2} = 6\) ways of combining those 4 suits, so 6 times minus the probability of two suits missing at the same time. Then \({4 \choose 3} = 4\) ways of combining 3 suits which will be added to the probability. And finally you could think of all 4 suits but there is no way to get the 13 cards then so this chance is 0.

cards_left <- function(suits) {
  return (52 - (suits * 13))
denom <- choose(52, 13)
single_enum <- choose(cards_left(1), 13)
p_single <- single_enum / denom
double_enum <- choose(cards_left(2), 13)
p_double <- double_enum / denom
triple_enum <- choose(cards_left(3), 13)  # Really just 1 since we have to take the last suit
p_triple <- triple_enum / denom
# if we wanted we could add the p_quadruple but the enum would be 0

(4*p_single) - (6*p_double) + (4*p_triple)
## [1] 0.05106552

Question 54

This question is about class selection. We have in total 30 different lecture slots in a week, every day 6, 5 days a week. If we randomly select 7 slots from the 30 in total, what are the chances we have one for every day.

First the direct counting method:

nr_ways_select_lessons <- choose(6 * 5, 7)

# Select a single day to have 3 courses
ways_select_triple_day <- choose(5, 1)
ways_select_triple <- choose(6, 3)
ways_select_single <- choose(6, 1)
triple_day_method <- ways_select_triple_day * ways_select_triple * (ways_select_single^4)

# Select 2 days to have 2 courses
ways_select_double_day <- choose(5, 2)
ways_to_select_double <- choose(6, 2)
double_day_method <- ways_select_double_day * (ways_to_select_double^2) *  (ways_select_single^3)

(triple_day_method + double_day_method) / nr_ways_select_lessons
## [1] 0.3023873

Now for the inclusion / exclusion method:

We can calculate 1 minus the mount of ways we can have 1 day without any lessons.

So, if \(P(D_{i})\) is the probablity of having no class on day \(i\). Then we need to calculate \(P(\bigcup^{5}_{k=1}D_{k})\). To which the inclusion-exclusion rule can be applied.

prob_days_free <- function(days) {
  nr_ways_select_lessons <- choose(6 * 5, 7)
  nr_ways_with_days_free <- choose(6 * (5 - days), 7)
  return (nr_ways_with_days_free / nr_ways_select_lessons)
ways_select_single_day <- choose(5, 1)
ie_first_term <- ways_select_single_day * prob_days_free(1) # for every day, inclusion term

ways_select_two_days <- choose(5, 2)
ie_second_term <- ways_select_two_days * prob_days_free(2)

ways_select_three_days <- choose(5, 3)
ie_third_term <- ways_select_three_days * prob_days_free(3)

# Applying inclusion exclusion
prob_some_gap_day <- ie_first_term - ie_second_term + ie_third_term
1 - prob_some_gap_day
## [1] 0.3023873

Question 59

A story in which there are 100 passangers with a 100 seats. The first person might choose to sit on their seat, or pick a random one. All other people will sit on their seat, except when someone is already sitting there, then they will pick a random seat. Give the probability that the last passenger can sit in their seat.

What can happen

  • The first person can sit in their own seat and everyone goes to their respective seats
  • The first person can switch to seat 2 - 99 (\(x\)) causing something else to happen
    • person \(x\) goes to seat 1 and after person \(x\) everything continues as if nothing happened
    • person \(x\) goes to a random other seat in \(x\) - 99, causing this loop to continue
    • person \(x\) goes to 100, causing the last person having to select seat 1

As can be seen, the only two solutions are really 100 and 1, all others are selected already if they are available. For these 98 other people both seats are equally likely (100 and 1 both are not their seat) and for the first person who takes a seat at random they are also both equally likely. As seen in the informal step through description, it always ends with a person either picking the first or the last seat. Therefore by symmetry the chance of getting 100 or 1 for the last passenger is \(\frac{1}{2}\).

This answer can be checked by making a simulation.

create_seating <- function() {
  all_seats <- replicate(100, 0)
  first_seat <- sample(1:100, 1)
  all_seats[first_seat] <- 1
  for (p in 2:99) {
    if (all_seats[p] == 0) {
      all_seats[p] <- p
    } else {
      random_seat <- sample(which(all_seats == 0), 1)
      all_seats[random_seat] <- p
test_sample <- function() {
  seating <- create_seating()
  return(seating[100] == 0)

test_repeats <- 10^4
repeat_vector <- replicate(test_repeats, test_sample())
sum(repeat_vector) / test_repeats
## [1] 0.4878