I decided to make it my goal this summer to link my computer science skills to my math knowledge. I stumbled upon a website where someone had tried to calculate the probability of landing on each space of the Monopoly board with pure mathematics. I thought to myself, “I wonder how correct this data is; it doesn’t seem they tested it out.” So I set out to write a computer program to simulate the game itself and output the data to a comma-separated values (CSV) file. Here is some sample output of my computer program (Figure 1):
This data can be opened by Microsoft Excel with ease to look like this (Figure 2):
This then allows for all sorts of analysis of the data, and allows me to simulate a billion rolls of the dice!
The first and undoubtedly most important part of this project is to create the code that simulates a game of Monopoly. Luckily, one simple fact about Monopoly makes simulating it much, much easier: players cannot affect the position of other players. That means that only one player has to be simulated, which removes big variables that complicate the simulation of many other games.
The program is designed to track the following data throughout the simulation: the number of rolls of the dice, the number of moves (rolling doubles and going again counts as one move), the position on the board (from 0 to 39, as there are 40 spaces on the board), the number of doubles rolled, the number of times Go has been passed, the total distance moved, the values of both dice, whether or not the player is in jail, and the number of times each of the forty spaces has been landed on.
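A minimal sketch of how this state might be tracked for a single player is shown below. The class and field names are hypothetical and not taken from the actual source. The sketch moves by dice only and handles the Go To Jail space, but it leaves out cards, the three-doubles rule, and turns spent in jail.

```java
import java.util.Random;

// Hypothetical sketch: names and structure are assumptions, not the author's code.
class SinglePlayerState {
    static final int SPACES = 40;
    static final int JAIL = 10;        // index of Jail on the standard board
    static final int GO_TO_JAIL = 30;  // index of the Go To Jail space

    long rolls;
    long doublesRolled;
    long timesPassedGo;
    long totalDistance;
    int position;                      // 0..39, where 0 is Go
    int die1;
    int die2;
    final long[] hits = new long[SPACES];
    private final Random rng;

    SinglePlayerState(long seed) {
        rng = new Random(seed);
    }

    // Rolls both dice once, moves the player, and records the landing space.
    void roll() {
        die1 = rng.nextInt(6) + 1;
        die2 = rng.nextInt(6) + 1;
        rolls++;
        if (die1 == die2) {
            doublesRolled++;
        }
        int distance = die1 + die2;
        totalDistance += distance;
        int target = position + distance;
        if (target >= SPACES) {
            timesPassedGo++;           // wrapped around the board
        }
        position = target % SPACES;
        if (position == GO_TO_JAIL) {
            position = JAIL;           // the hit is counted on Jail instead
        }
        hits[position]++;
    }

    public static void main(String[] args) {
        SinglePlayerState state = new SinglePlayerState(42L);
        for (int i = 0; i < 1_000; i++) {
            state.roll();
        }
        System.out.println("Rolls: " + state.rolls
                + ", doubles: " + state.doublesRolled
                + ", passed Go: " + state.timesPassedGo);
    }
}
```

Because no other player can change this state, a single object like this is enough to drive the whole simulation.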
It was coded in Java using Eclipse on Mac OS X, as seen below (Figure 3):
The source code in its entirety can be found online on GitHub at
The final result ended up being 1,741 lines: 1,189 of code, 307 of comments, and 237 blank. The skeleton was laid out for several features that were never completed; I designed the program to be extensible so they could be added later. The main feature whose groundwork is in place but which is not coded is a visual representation of the board. It is set up in the same way as the data-collecting code. It would output an image of a Monopoly board with information on the current statistics (whereas the data-collecting code outputs a line of comma-separated values with information on the current statistics).
The program takes three important inputs from the user:
- Number of rolls to simulate.
- Save interval (in rolls).
- Location to save the CSV (comma-separated values) file.
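As a rough illustration of how the number of rolls and the save interval interact, the sketch below records one CSV line every time a full save interval has been completed. The class name, the method name, and the two-column line format are assumptions made for the example, and the lines are kept in memory rather than written to a file to keep it short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; names and the "rolls,doubles" format are illustrative.
class IntervalSaver {
    // Simulates totalRolls rolls and returns one CSV line
    // for every saveInterval rolls completed.
    static List<String> simulate(long totalRolls, long saveInterval, long seed) {
        Random rng = new Random(seed);
        List<String> lines = new ArrayList<>();
        long doubles = 0;
        for (long roll = 1; roll <= totalRolls; roll++) {
            int die1 = rng.nextInt(6) + 1;
            int die2 = rng.nextInt(6) + 1;
            if (die1 == die2) {
                doubles++;
            }
            if (roll % saveInterval == 0) {
                lines.add(roll + "," + doubles);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // 500 rolls saved every 10 rolls gives 500 / 10 = 50 lines.
        System.out.println(simulate(500, 10, 1L).size()); // prints 50
    }
}
```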
The first two shape the data of the CSV file. If the number of rolls to simulate was 500 and the save interval was 10, there would be a total of 50 lines of data saved, which leads to 50 rows of data to work with in Microsoft Excel. I used this to simulate one billion rolls of the dice, and set the save interval to one billion as well. This only outputs one line of data, but this line of data can be used to make interesting graphs. I ran another simulation of one hundred thousand rolls with a save interval of one hundred, which output one thousand lines of data that I graphed with the number of rolls as the independent variable (logically similar to graphing with time on the x-axis). For the last and most complex type of data I wanted, I wrote another short program that builds on the working one. It runs the code of the original program a certain number of times, saving only the last line of data from each run. For example, I ran one million rolls one thousand times. The collected data looked like so (Figure 4):
Generating this required another short program (see Figure 5) which made use of my previous program, allowing it to be done in only about one hundred lines, making use of a loop within a loop.
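The loop-within-a-loop idea can be sketched roughly as follows. This is not the auxiliary program itself: the names are hypothetical, and each trial moves by dice only, with no cards or jail rules. The outer loop repeats the trial, and only the final counts of each trial are kept as one row.

```java
import java.util.Random;

// Hypothetical sketch; not the author's auxiliary program.
class RepeatedTrials {
    static final int SPACES = 40;

    // Inner loop: one trial of `rolls` dice rolls, returning only the final hit counts.
    static long[] runTrial(long rolls, Random rng) {
        long[] hits = new long[SPACES];
        int position = 0;
        for (long r = 0; r < rolls; r++) {
            int distance = (rng.nextInt(6) + 1) + (rng.nextInt(6) + 1);
            position = (position + distance) % SPACES;
            hits[position]++;
        }
        return hits;
    }

    // Outer loop: repeat the trial, keeping one row (the last line) per trial.
    static long[][] runTrials(int trials, long rollsPerTrial, long seed) {
        Random rng = new Random(seed);
        long[][] rows = new long[trials][];
        for (int t = 0; t < trials; t++) {
            rows[t] = runTrial(rollsPerTrial, rng);
        }
        return rows;
    }

    public static void main(String[] args) {
        long[][] rows = runTrials(5, 1_000, 7L);
        System.out.println(rows.length + " rows of " + rows[0].length + " counts");
    }
}
```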
Now that all the data has been generated, it’s time to analyze it with Excel.
The analysis is where the main goals of the program are reached. The program can only do so much without reinventing the wheel. Instead of trying to analyze data and/or create graphs from within the Java program, it makes much more sense to output data that can be read by a program that specializes in just that. Microsoft Excel is the industry standard for the vast majority of data-analysis tasks.
The first and most obvious thing to analyze is how certain spaces were landed on more or less often than other spaces. This can be represented accurately using a single line of data after a billion rolls. Dividing the number of times each space has been landed on by one billion yields the share of rolls that ended on that space, expressed as a percentage. A good way to tell if a space is more or less popular is to subtract the average from each percentage. Since there are forty spaces on the board, the average is 1/40 or 2.5%. Some interesting trends are revealed by the graph (Figure 6).
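The calculation described above can be sketched as a small helper. The names are hypothetical, and the counts in the example are made up purely to show the arithmetic; they are not results from the simulation.

```java
import java.util.Arrays;

// Hypothetical helper; names are illustrative.
class HitDeviation {
    // Returns, for each space, its hit percentage minus the board average
    // (2.5% for forty spaces), in percentage points.
    static double[] deviationFromAverage(long[] hits, long totalRolls) {
        double averagePercent = 100.0 / hits.length;
        double[] deviation = new double[hits.length];
        for (int i = 0; i < hits.length; i++) {
            double percent = 100.0 * hits[i] / totalRolls;
            deviation[i] = percent - averagePercent;
        }
        return deviation;
    }

    public static void main(String[] args) {
        long[] hits = new long[40];
        Arrays.fill(hits, 25);  // made-up counts: 1,000 rolls spread evenly
        hits[10] = 45;          // pretend Jail received 20 extra hits
        hits[30] = 5;           // taken from Go To Jail
        double[] deviation = deviationFromAverage(hits, 1_000);
        System.out.printf("Jail: %+.1f, Go To Jail: %+.1f%n", deviation[10], deviation[30]);
    }
}
```

A positive value marks a space that is landed on more often than average, and a negative value marks one that is landed on less often.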
The most obvious outliers are the Jail, Chance, and Community Chest spaces. There are several reasons that Jail has by far the most hits:
- One of the forty spaces on the board sends the player directly to jail.
- One of the sixteen Chance cards and one of the sixteen Community Chest cards send the player to jail.
- Rolling three doubles in a row sends the player to jail.
- If the player does not roll doubles to get out of jail, it counts as another turn spent in jail.
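Two of these rules, the Go To Jail space and three doubles in a row, can be sketched as a single turn. This is a hypothetical illustration rather than the author's implementation: it only tracks where the turn ends, it ignores cards and turns spent in jail, and the dice are passed in as a supplier so that a scripted sequence can be used to check the logic.

```java
import java.util.Random;
import java.util.function.IntSupplier;

// Hypothetical sketch of the doubles rule; not the author's code.
class DoublesRule {
    static final int SPACES = 40;
    static final int JAIL = 10;
    static final int GO_TO_JAIL = 30;

    // Plays one turn starting at `position` and returns where the player ends up.
    // `die` supplies one die value (1..6) per call.
    static int takeTurn(int position, IntSupplier die) {
        int doublesInARow = 0;
        while (true) {
            int die1 = die.getAsInt();
            int die2 = die.getAsInt();
            boolean doubles = die1 == die2;
            if (doubles && ++doublesInARow == 3) {
                return JAIL;       // third double in a row: straight to Jail, no move
            }
            position = (position + die1 + die2) % SPACES;
            if (position == GO_TO_JAIL) {
                return JAIL;       // landing on Go To Jail ends the turn in Jail
            }
            if (!doubles) {
                return position;   // no doubles: the turn is over
            }
        }
    }

    // Helper for checking the logic: a die that returns the given values in order.
    static IntSupplier scripted(int... values) {
        int[] next = {0};
        return () -> values[next[0]++];
    }

    public static void main(String[] args) {
        Random rng = new Random(3L);
        System.out.println(takeTurn(0, () -> rng.nextInt(6) + 1));
        System.out.println(takeTurn(0, scripted(1, 1, 2, 2, 3, 3))); // prints 10
    }
}
```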
Chance and Community Chest are less obvious outliers. Chance spaces are seldom counted as landed on because eight of the sixteen Chance cards move the player to another space, and that destination is counted as the space landed on. The same goes for Community Chest, except that only two of the sixteen cards change the location of the player.
“Go To Jail” is landed on 0% of the time because Jail is the space that is actually “landed on” by the end of the roll.
The trends when it comes to other spaces are clearly shown in the graph: Mediterranean and Baltic Avenue are very unpopular while Illinois is very popular. The averages for the different types of space can be seen in Figure 7 below:
While Illinois is the most popular space, the orange properties, on average, are landed on considerably more than the red ones. The brown properties are far below those popular properties, almost 0.5% below the 2.5% average. It’s easy to tell by this ranking that certain properties are much more valuable than others.
The blue property category, which includes the infamous Boardwalk, is actually in the bottom three. This probably comes as a surprise to a lot of players, because Boardwalk is a very contested property as a result of its high payouts once the game has been going on for a while. The category would rank even lower were it not for one of the Chance cards, “Take a walk on the boardwalk”, which advances the player to Boardwalk and makes that space worth quite a bit more than its neighbor, Park Place.
This was only one test, however. I used the auxiliary program to collect a thousand trials of a million rolls each. With this data, a box plot can be made to show the minimum, the maximum, and the first, second, and third quartiles for each property (Figure 8).
This shows that Illinois is nearly always the most popular space to land on; it can only be beaten by Go, as the maximum of Go is higher than the minimum of Illinois. It also shows that Park Place could sometimes be the single least popular property!
A lot of other small, interesting graphs can be analyzed with the miscellaneous data that the program keeps track of. A graph can be made to show the randomness of the dice by comparing it with expected, mathematically-calculated data (Figure 9).
Small tests like these confirm that the simulation is using good numbers that don’t stray from mathematical probabilities.
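One way such a dice check could be written is sketched below. The names and the seed are hypothetical. The expected probability of each sum of two fair dice is (6 - |sum - 7|) / 36, and the observed frequencies should approach those values as the number of rolls grows.

```java
import java.util.Random;

// Hypothetical sketch; names and seed are illustrative.
class DiceCheck {
    // Exact probability that two fair dice sum to `sum` (2..12).
    static double expectedProbability(int sum) {
        return (6 - Math.abs(sum - 7)) / 36.0;
    }

    // Observed frequency of each sum over `rolls` simulated rolls; index = sum.
    static double[] observedFrequencies(int rolls, long seed) {
        Random rng = new Random(seed);
        long[] counts = new long[13];
        for (int i = 0; i < rolls; i++) {
            counts[(rng.nextInt(6) + 1) + (rng.nextInt(6) + 1)]++;
        }
        double[] frequencies = new double[13];
        for (int sum = 2; sum <= 12; sum++) {
            frequencies[sum] = (double) counts[sum] / rolls;
        }
        return frequencies;
    }

    public static void main(String[] args) {
        double[] observed = observedFrequencies(1_000_000, 11L);
        for (int sum = 2; sum <= 12; sum++) {
            System.out.printf("%2d expected %.4f observed %.4f%n",
                    sum, expectedProbability(sum), observed[sum]);
        }
    }
}
```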
Another test, which failed several times during the creation of the computer program, is making sure that the average of the hit percentages from each space is equal to 2.5%. If it is lower, some hits are not being accounted for, and if it is higher, some are being counted twice. In the beginning, the percentage was lower, which showed that there were certain times where the computer program was not counting the hits. It turned out that the bug had to do with rolling doubles, and was eventually confirmed to be fixed when the average was exactly 2.5%.
The average of all these numbers (Figure 10): 0.025, 1/40, or 2.5%.
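The check described above can be sketched in a few lines. The names are hypothetical and the counts are made up. The average of the forty hit fractions equals 1/40 only when the total number of hits matches the number of rolls, so comparing those two totals is an equivalent test that avoids floating-point rounding.

```java
import java.util.Arrays;

// Hypothetical check; names are illustrative.
class HitAccounting {
    // Average over all spaces of (hits on that space / total rolls).
    static double averageHitFraction(long[] hits, long totalRolls) {
        double sum = 0;
        for (long h : hits) {
            sum += (double) h / totalRolls;
        }
        return sum / hits.length;
    }

    // True when the hits add up to the number of rolls,
    // which is the case in which the average is 1/40 on a 40-space board.
    static boolean allRollsAccounted(long[] hits, long totalRolls) {
        long total = 0;
        for (long h : hits) {
            total += h;
        }
        return total == totalRolls;
    }

    public static void main(String[] args) {
        long[] hits = new long[40];
        Arrays.fill(hits, 25);   // made-up counts for 1,000 rolls
        System.out.println(allRollsAccounted(hits, 1_000)); // prints true
        hits[5] = 20;            // simulate a bug that drops five hits
        System.out.println(allRollsAccounted(hits, 1_000)); // prints false
    }
}
```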
So why did the data come out the way it did? There is one extreme, obvious spike in the graphs, a spike so great that every graph either cuts it off or leaves it out.
Jail is by far the most popular space on the board. As discussed before, there are many ways that a player can end up in jail.
When rolling two dice, the most common sum is seven. The space seven spaces after Jail is Community Chest (between the orange spaces), and seven spaces after that Community Chest is none other than Illinois!
Illinois ranks higher than the orange spaces, though, because there is also a Chance card that advances the player to Illinois, which raises the chances of landing there. However, the orange properties average a higher hit percentage than the red properties. This is because they are six, eight, and nine spaces away from Jail, all of which are close to seven (a roll becomes more likely the closer its sum is to seven: six and eight each come up 5 times in 36, nine 4 times in 36, and seven 6 times in 36).
However, none of the Chance or Community Chest cards advance the player to an orange space, which is why none of them holds the title of most popular space. On the other hand, there are both a Chance card and a Community Chest card that advance the player to Go, which is why, excluding Jail, it is the second most popular space to land on, behind Illinois.
Bottom line? The saying applies in Monopoly as much as it does in real life:
Location, location, location.
(although in real life the value of a property usually goes up because of beaches, not jails…)