Solving the Rainbow

The rainbow is undoubtedly one of the most spectacular light shows on earth. It’s hard to miss, cutting across the sky as an arc of seemingly unnatural light. Although a bit more complex than other optical phenomena such as sunrises and sunsets, the common rainbow is still a simple little thing.

What makes a rainbow?

A rainbow is the result of two types of events: refraction and reflection. Refraction is the bending of a propagating wave (light, in this case) as it passes from one medium into another in which it travels at a different speed. Most people have probably experienced refraction in one form or another: if you place a straw in a glass of water, it appears bent because light from the submerged part refracts at the water’s surface before it reaches your eyes. In the case of the rainbow, refraction occurs when sunlight passes into and out of drops of water in the air.

Rainbow rays refracting in a raindrop

This of course does not happen before our very eyes, and raindrops in the air aren’t melon-sized. Instead, it occurs high up in the sky with very small droplets of water.

The sun relative to the raindrop

Don’t be fooled by the diagram, though. I assure you that looking up at the sky when raindrops are present will not redirect the full strength of the sun into your eyes. Some light energy is absorbed by the atmosphere, and the first diagram pictured only the rays responsible for the rainbow we humans see down here. In actuality, at every surface some light is reflected and the rest continues through. Most of these rays produce nothing visible from where we stand; only the path shown in the first diagram reaches our eyes as the typical rainbow.

Rainbow reflection events

What about the different colors?

The sun produces light of many different wavelengths, and dispersion splits a single beam of sunlight into its constituent wavelengths. In other words, when light passes between air and water, the refraction angle is slightly different for each color.

Optical dispersion

Breaking it down

To understand how refraction actually works, we need Snell’s law. When a wave passes between media in which its speed differs, Snell’s law relates the angle of incidence (the angle between the incoming light and the normal to the surface of the raindrop) to the angle of refraction (the angle between the refracted light and that same normal). In the case of a raindrop, the light moves from air into water, and the normal is simply an extended radius of the drop, since the normal is defined as being perpendicular to the tangent.

First refraction event

The law tells us that the ratio of the sines of the two angles (i, the angle of incidence, and r, the angle of refraction) is equal to the ratio of the phase velocities, v_{1} and v_{2}, of light in the respective media:

\frac{\sin i}{\sin r} = \frac{v_1}{v_2}

Instead of using the velocities of light as our variables, however, we’d like to describe this relationship using the refractive indices of the two media. The refractive index n of a medium is the ratio of the speed of light in a vacuum, c, to the speed of light in that medium, v:

n = \frac{c}{v}

With a little math magic we see that the ratio of the two sines is equal to the inverse of the ratio of the two refractive indices.

\frac{\sin i}{\sin r} = \frac{v_1}{v_2} = \frac{\frac{c}{v_2}}{\frac{c}{v_1}} = \frac{n_2}{n_1}

We’d like to solve for r, so we rewrite it as:

\sin r = \frac{n_{1}}{n_{2}}\sin i

Luckily, for our purposes, we can make some simplifications before going further. The refractive index of air at standard temperature and pressure is about 1.00029, but, as we’ll see later, we only have data for four significant figures, so we can just use 1.000 instead. We can also call the refractive index of water n_{w}:

\sin r = \frac{1}{n_{w}}\sin i

Solving for r, we find that

r = \sin^{-1} \left(\frac{1}{n_{w}}\sin i\right)
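
Before bringing color into it, we can check this formula numerically. Below is a minimal Java sketch (Java being the language the post's simulations were written in) that evaluates r = \sin^{-1}(\sin i / n_{w}) at a 45° angle of incidence for n_{w} = 1 and n_{w} = 2, with the index of air taken as 1.000. The class and method names are hypothetical and do not come from the author's code.

```java
// Minimal sketch of Snell's law for light entering a raindrop, with the refractive
// index of air taken as 1.000: r = asin(sin(i) / n_w). Angles are in degrees.
class SnellSketch {

    /** Refraction angle r (degrees) for incidence angle i (degrees) into water of index nW. */
    static double refractionAngleDeg(double nW, double incidenceDeg) {
        double i = Math.toRadians(incidenceDeg);
        return Math.toDegrees(Math.asin(Math.sin(i) / nW));
    }

    public static void main(String[] args) {
        double rAtN1 = refractionAngleDeg(1.0, 45.0); // n_w = 1: the ray is not bent at all
        double rAtN2 = refractionAngleDeg(2.0, 45.0); // n_w = 2: strongest bending in this range
        double changeDeg = rAtN1 - rAtN2;
        System.out.printf("r(n_w = 1) = %.2f deg, r(n_w = 2) = %.2f deg%n", rAtN1, rAtN2);
        System.out.printf("change = %.2f deg = %.3f rad%n", changeDeg, Math.toRadians(changeDeg));
    }
}
```

Between those two indices it should report a change of roughly 24.3°, or about 0.42 radians.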

Where the colors separate

The key to the separation of colors is optical dispersion: the phase velocity of a wave depends on its frequency. In light-specific terms, the phase velocity v of a light wave, and therefore the refractive index of the medium it travels through, depends on its wavelength \lambda (its color). This means our refraction angle r is ultimately a function of color!

We can test out this relationship with an example graph. Plotting the refraction angle r as a function of n_{w} from n_{w} = 1 to n_{w} = 2 for a 45° angle of incidence i, we see the refraction angle change:


In this particular case, the refraction angle r changes by a full 0.42 radians, or about 24°, over that range of indices. However, we haven’t truly related color to refraction angle yet; we’ve only related refractive index to refraction angle. We know that a relationship between the refraction angle r and the color \lambda exists, but what are the details?

In other words, we know that

r\left(n_{w}, i\right) = \sin^{-1}\left( \frac{1}{n_{w}}\sin i \right)

but we need to understand n_{w}\left(\lambda\right), because what we want is r\left(\lambda, i\right):

r\left(\lambda, i\right) = \sin^{-1}\left( \frac{1}{n_{w}\left(\lambda\right)}\sin i \right)

For now, we are only worried about wavelength; we won’t worry about the angle of incidence i until the end.

Modeling n as a function of λ

Finding a relationship between the refractive index and the wavelength of light for many optical materials is a topic that has been of interest to scientists for hundreds of years. Augustin-Louis Cauchy defined an empirical equation in 1836 relating the two:

n\left(\lambda\right) = B + \frac{C}{\lambda^{2}} + \frac{D}{\lambda^{4}} + \ldots

where B, C, D, and so on are empirically derived coefficients for different media. However, this model is only good for the visible spectrum and becomes increasingly inaccurate for infrared and ultraviolet light. Wolfgang Sellmeier addressed these shortcomings in 1871, introducing a new equation with more terms and coefficients:

n^{2}\left(\lambda\right) = 1 + \sum_{j} \frac{B_{j}\lambda^{2}}{\lambda^{2} - C_{j}}

For our purposes, however, we don’t need a complex model to approximate indices of refraction; we can fit experimental data instead. The wavelengths that matter for the rainbow are part of the visible spectrum, a range of only about 300 nanometers. RefractiveIndex.INFO offers a great public-domain database of refractive indices of all sorts. Using it, I easily found experimental data from “Optical Constants of Water in the 200-nm to 200-μm Wavelength Region” by George M. Hale and Marvin R. Querry. The curve seems rather daunting at first look:


But this experiment covers wavelengths of 0.2–200 micrometers. We only plan on modeling 0.390 to 0.700 µm, so we don’t have to fit an equation to this funky graph.

Instead, the relevant data points can be downloaded as a raw comma-separated value file:

Wavelength (nm),Refractive Index
375,1.341
400,1.339
425,1.338
450,1.337
475,1.336
500,1.335
525,1.334
550,1.333
575,1.333
600,1.332
625,1.332
650,1.331
675,1.331
700,1.331

These numbers are all very close. In fact, because there are only four significant figures, the refractive index from 650 to 700 nanometers does not “change” at all. However, we only need a reasonable estimate of how the index changes for this simulation, so we don’t need to worry much about overstating the precision we actually have. Using a second-degree polynomial fit, a good function to approximately model n_{w}\left(\lambda\right) (with \lambda in nanometers) in the range of 390–700 nanometers is:

n_{w}\left(\lambda\right) = \left(8.1318681 \times 10^{-8}\right)\lambda^{2} - \left(1.1757142 \times 10^{-4}\right)\lambda + 1.3733752
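
An ordinary least-squares fit to all fourteen tabulated points (375–700 nm) reproduces these coefficients. The Java sketch below shows one way to do that fit. The post doesn't say which tool produced the original numbers, and the names here are hypothetical. Centering and scaling \lambda before solving the 3×3 normal equations is a choice made here for numerical stability, because raw powers of \lambda up to 700^4 make the system poorly conditioned.

```java
// Sketch: ordinary least-squares fit of n_w(lambda) ~ a*lambda^2 + b*lambda + c (lambda in nm)
// to the Hale & Querry values listed above. lambda is centered and scaled before solving
// the 3x3 normal equations, and the result is expanded back into powers of lambda.
class QuadraticFitSketch {

    static final double[] WAVELENGTH_NM = {375, 400, 425, 450, 475, 500, 525, 550, 575, 600, 625, 650, 675, 700};
    static final double[] INDEX = {1.341, 1.339, 1.338, 1.337, 1.336, 1.335, 1.334,
                                   1.333, 1.333, 1.332, 1.332, 1.331, 1.331, 1.331};

    /** Returns {a, b, c} minimizing the squared error of a*x^2 + b*x + c against y. */
    static double[] fit(double[] x, double[] y) {
        double mean = 0;
        for (double v : x) mean += v;
        mean /= x.length;
        double scale = 100.0; // any value near the spread of the data works

        // Augmented normal equations for y ~ p0 + p1*t + p2*t^2, where t = (x - mean) / scale
        double[][] m = new double[3][4];
        for (int k = 0; k < x.length; k++) {
            double t = (x[k] - mean) / scale;
            double[] basis = {1, t, t * t};
            for (int r = 0; r < 3; r++) {
                for (int c = 0; c < 3; c++) m[r][c] += basis[r] * basis[c];
                m[r][3] += basis[r] * y[k];
            }
        }
        double[] p = solve3(m);

        // Substitute t = (x - mean) / scale and collect powers of x
        double s2 = scale * scale;
        double a = p[2] / s2;
        double b = p[1] / scale - 2 * p[2] * mean / s2;
        double c = p[0] - p[1] * mean / scale + p[2] * mean * mean / s2;
        return new double[] {a, b, c};
    }

    /** Gauss-Jordan elimination with partial pivoting on a 3x4 augmented matrix. */
    static double[] solve3(double[][] m) {
        for (int col = 0; col < 3; col++) {
            int pivot = col;
            for (int r = col + 1; r < 3; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) pivot = r;
            }
            double[] tmp = m[col]; m[col] = m[pivot]; m[pivot] = tmp;
            for (int r = 0; r < 3; r++) {
                if (r == col) continue;
                double factor = m[r][col] / m[col][col];
                for (int c = col; c < 4; c++) m[r][c] -= factor * m[col][c];
            }
        }
        return new double[] {m[0][3] / m[0][0], m[1][3] / m[1][1], m[2][3] / m[2][2]};
    }

    public static void main(String[] args) {
        double[] coef = fit(WAVELENGTH_NM, INDEX);
        System.out.printf("a = %.7e, b = %.7e, c = %.7f%n", coef[0], coef[1], coef[2]);
        for (double lambda : new double[] {390, 500, 600, 700}) {
            double n = coef[0] * lambda * lambda + coef[1] * lambda + coef[2];
            System.out.printf("n_w(%.0f nm) = %.6f%n", lambda, n);
        }
    }
}
```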

Finally we can properly make use of our function for r:

r\left(\lambda, i\right) = \sin^{-1}\left( \frac{1}{n_{w}\left(\lambda\right)}\sin i \right)

Relating r and i to the geometry of the event

Ultimately, the rainbow is made up of countless light waves of different wavelengths \lambda hitting the raindrop at different angles of incidence i. Each outgoing ray can be described by its wavelength (color) \lambda and the total angle D by which it has been deflected.

Drawing auxiliary lines onto our original diagram will help solve this geometry problem:


A small amount of information already tells us a lot about the situation. We know r as a function of i and \lambda, and geometry lets us express every angle on the diagram in terms of these, so we can fill in the important ones:


  1. Because vertical (opposite) angles are equal, r + \left(i-r\right) has to be equal to i.
  2. Because all radii in a circle are congruent, we know that the triangles created by two of them are isosceles.
  3. The first half of the reflection angle is r because the two base angles of an isosceles triangle are equal.
  4. The second half of the reflection angle is r because the definition of a specular reflection is that the incident ray and the reflected ray make the same angle with respect to the normal (the radius).
  5. Again, the refraction angle of the second refraction event is r because it must be equal to the other base angle of the isosceles triangle.
  6. Finally, because this is a ray of the same wavelength crossing the same boundary at the same angle r, it undergoes the same refraction event in reverse, so the remaining angle must be i - r, just like the first one.

After solving for these angles, the total deflection angle can be found by simply adding up the three net deflections, as animated below:


Simplified, the total deflection angle D as a function of i and r would be

D\left(i, r\right) = 180^{\circ} + 2i - 4r
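
Written out term by term, the three net deflections are i - r when the ray enters the drop, 180° - 2r at the internal reflection (a complete reversal, less the 2r between the incident and reflected rays), and another i - r when it leaves:

D = \left(i - r\right) + \left(180^{\circ} - 2r\right) + \left(i - r\right) = 180^{\circ} + 2i - 4r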

Since we already know the refraction angle r as a function of i and \lambda, we can substitute to find D as a function of i and \lambda.

D\left(i, \lambda\right) = 180^{\circ} + 2i - 4\sin^{-1}\left(\frac{1}{n_{w}\left(\lambda\right)}\sin i\right)

Analyzing this angle as a function of two variables

So far, we’ve ignored the variability of the angle of incidence, i. In reality, rays strike the raindrop at all angles between –90° and 90°. However, rays with angles of incidence less than zero (on the bottom half of the drop) do the same thing mirrored over the x axis, so for now we’ll only worry about angles from 0° to 90°. Let’s see how i affects the graph of D when n_{w} is held constant at 1.333:

As i sweeps over all the angles of incidence on the raindrop, D reaches a certain minimum value. Because the graph of D flattens out near this minimum, many rays leave at nearly the same angle and cluster up to form a caustic. So in this case, when n_{w}\left(\lambda\right) = 1.333, there will be a caustic of rays of wavelength \lambda at a deflection of about 137.9°. Different wavelengths have different n_{w}\left(\lambda\right) values, and therefore different minima and different deflection angles D at which that wavelength is most concentrated.
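
We can also find the location of this minimum without a graph. Setting dD/di = 0 and differentiating Snell’s law (with the index of air taken as 1.000) gives:

\frac{dD}{di} = 2 - 4\frac{dr}{di} = 0 \quad\Rightarrow\quad \frac{dr}{di} = \frac{1}{2}

\sin i = n_{w}\sin r \quad\Rightarrow\quad \cos i = n_{w}\cos r\,\frac{dr}{di} = \frac{n_{w}}{2}\cos r

\cos^{2} i = \frac{n_{w}^{2}\left(1 - \sin^{2} r\right)}{4} = \frac{n_{w}^{2} - 1 + \cos^{2} i}{4} \quad\Rightarrow\quad \cos^{2} i = \frac{n_{w}^{2} - 1}{3}

With n_{w} = 1.333 this gives i ≈ 59.4° and r ≈ 40.2°, so D ≈ 180° + 118.8° - 160.9° ≈ 137.9°, matching the graph.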

Here are some examples using n_{w}\left(\lambda\right) values from our empirically derived function:

Wavelength (nm) | Refractive Index | Deflection Angle
390 | 1.339891 | 138.91°
500 | 1.334919 | 138.20°
600 | 1.332107 | 137.79°
700 | 1.330921 | 137.62°
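
These values can be reproduced numerically. The Java sketch below (hypothetical names) evaluates the fitted n_{w}(\lambda) and searches 0° ≤ i ≤ 90° for the minimum of D using a ternary search. Ternary search assumes D falls and then rises on that interval, as in the graph for n_{w} = 1.333. For n_{w} < 2 this holds at the ends: dD/di = 2 - 4/n_{w} is negative at i = 0° and dD/di = 2 is positive at i = 90°. The sketch also prints 180° - D, the supplement of D that is used to locate the bow in the sky.

```java
// Sketch: minimum total deflection D for a given wavelength, using the fitted
// n_w(lambda) and D(i) = 180 + 2i - 4*asin(sin(i) / n_w) in degrees.
// The minimum over 0 <= i <= 90 degrees is located by ternary search, which assumes
// D falls and then rises on that interval (a single minimum, as in the n_w = 1.333 graph).
class MinimumDeflectionSketch {

    /** Quadratic fit for the refractive index of water; lambda in nanometers. */
    static double refractiveIndex(double lambdaNm) {
        return 8.1318681e-8 * lambdaNm * lambdaNm - 1.1757142e-4 * lambdaNm + 1.3733752;
    }

    /** Total deflection D in degrees for an incidence angle i given in radians. */
    static double deflectionDeg(double i, double nW) {
        double r = Math.asin(Math.sin(i) / nW);
        return 180.0 + Math.toDegrees(2 * i - 4 * r);
    }

    /** Minimum of D (degrees) over incidence angles from 0 to 90 degrees. */
    static double minDeflectionDeg(double nW) {
        double lo = 0.0, hi = Math.PI / 2;
        for (int step = 0; step < 200; step++) {
            double m1 = lo + (hi - lo) / 3;
            double m2 = hi - (hi - lo) / 3;
            if (deflectionDeg(m1, nW) < deflectionDeg(m2, nW)) {
                hi = m2; // the minimum cannot lie to the right of m2
            } else {
                lo = m1; // the minimum cannot lie to the left of m1
            }
        }
        return deflectionDeg((lo + hi) / 2, nW);
    }

    public static void main(String[] args) {
        System.out.println("Wavelength (nm) | Refractive Index | Deflection Angle | 180 - D");
        for (double lambda : new double[] {390, 500, 600, 700}) {
            double n = refractiveIndex(lambda);
            double d = minDeflectionDeg(n);
            System.out.printf("%.0f | %.6f | %.2f deg | %.2f deg%n", lambda, n, d, 180.0 - d);
        }
    }
}
```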

Simulating the rainbow

After doing a few more calculations for things such as coordinate points on the raindrop, the slope of the rays, and the color perceived by the human eye, we can run a simulation to test out this multivariate situation. This simulation will:

  1. Have rays of sunlight strike the raindrop in the northern hemisphere.
  2. Have each ray be made up of many different wavelengths of light equally spread throughout the visible spectrum.

I created the simulation using my own JavaSim framework. For more info on the simulation itself or to run it on your own computer, see the Raindrop Rainbow main page.

The simulation yields a beautiful cluster of rays:

Most importantly, at the bottom you can see where the small differences in deflection angle start to form different caustics for different wavelengths. The rainbow truly comes to life. A stretched image of some of the bottommost pixels gives us a sample of the rainbow we’ve created:


Locating the newly-defined rainbow

These deflection angles for the minimum and maximum visible wavelengths tell us where the rainbow begins and where it ends! To locate the rainbow now, we need to make a small modification to our old diagram:


First of all, the sun is not right above you. In this diagram, the sun would actually be setting on the horizon, because the rays in question are all parallel to the ground. We need to consider the position of the sun in the sky and how it changes where the rainbow ends up. If we redraw the diagram with a variable angle of depression and include our shadow, we get what seems to be a more complex situation:


So the angle of the rainbow seems to depend on the height of the observer as well (specifically, the location of the eyes). Fortunately, since the rays from the sun that strike the raindrop are parallel to the rays that strike your back and make your shadow, we can view the problem from a different angle, literally:


This diagram is the same as before, except that instead of being drawn relative to the ground, it is drawn relative to the line from your eyes to the head of your shadow (or, more specifically, to where your shadow’s “eyes” are). This point on the shadow, directly opposite the sun, is called the antisolar point. A little geometry reveals that the angle between this line and the ray deflected by D is the supplement of D, 180° - D. For the two ends of the visible spectrum, we wish to calculate these supplements:

Wavelength (nm) | Deflection Angle | Angle From Antisolar Point
390 | 138.91° | 41.09°
700 | 137.62° | 42.38°

Before we walk outside, look up 41.5°, and wonder where the rainbow is, we need to remember the preconditions. First and foremost, water droplets must be present, whether from rain, mist, or a garden hose. Also, the sun cannot be high in the sky: the antisolar point lies as far below the horizon as the sun is above it, so a sun higher than about 42° puts the whole bow below the horizon. That means early or late in the day, with few exceptions. In an area of completely flat ground during the middle of the day, let’s just say you would need some… underground raindrops in order to see a rainbow.

So the full rainbow-hunter’s checklist would go something like:

  1. Choose a rainy, misty day or environment.
  2. Choose a time early or late in the day (late tends to have better weather).
  3. Locate the head of your shadow on the ground.
  4. Look up about 41°.

With any meteorological luck, you should see the world-renowned bow!

Extras: variations and unusual cases

Of course, when it comes to something as funky as a rainbow, there will be some more funky things that come along with it. Behold!

The “underground” rainbow

When the sun is too high in the sky, the diagrams tell us that raindrops would theoretically have to be underground for us to see the phenomenon, but from the edge of a cliff we can effectively make this happen.

A great example of the “underground” rainbow, taken midday at the Grand Canyon:


The secondary rainbow

Commonly called a double rainbow, a secondary rainbow is not all that different from a typical rainbow. It results from the same processes, with two key differences: the rays in question strike the southern (lower) hemisphere of the raindrops, and there are two reflection events instead of one. Striking the southern hemisphere reverses the color order, and the extra reflection causes more light to be lost.

Because of that extra reflection, this sister phenomenon is fainter and more difficult to see. The double rainbow is often seen alongside already-spectacular rainbows, because it needs especially good conditions to be clearly visible.

An example of a secondary rainbow that I photographed in Stone Harbor, New Jersey, on an August afternoon after a rainstorm:



Special thanks to Professor Halpin-Healy for a great lecture on rainbow physics that covered most of the content in this post and inspired me to learn and experiment more.

Thanks to Marvelous Marv for providing the great Grand Canyon rainbow photo and his wife for taking it.


“Cauchy’s Equation.” Wikipedia. Wikimedia Foundation, 25 Aug. 2014. Web. 29 Aug. 2015.


Halpin-Healy, Timothy. “Rainbow Physics.” Barnard College, New York City. 6 Aug. 2015. Lecture.


Nave, Carl R. “Rainbow Concepts.” HyperPhysics. Georgia State University, n.d. Web. 29 Aug. 2015.


“Plot.” Wolfram Alpha. Wolfram, n.d. Web. 29 Aug. 2015.


“Plot.” Wolfram Alpha. Wolfram, n.d. Web. 29 Aug. 2015. <*sin%28i%29%29+


Polyanskiy, Mikhail. “Optical Constants of H2O, D2O (Water, Heavy Water, Ice).” RefractiveIndex.INFO. Mikhail Polyanskiy, n.d. Web. 29 Aug. 2015.


“Sellmeier Equation.” Wikipedia. Wikimedia Foundation, 16 Jan. 2015. Web. 29 Aug. 2015.

Simulating Monopoly For Statistics


I decided to make it my goal this summer to link my computer science skills to my math knowledge. I stumbled upon a website on which someone had tried to calculate the percentages of landing on each space of the Monopoly board with pure mathematics. I thought to myself, “I wonder how correct this data is; it doesn’t seem they tested it out.” So I set out to write a computer program that simulates the game itself and outputs the data to a comma-separated values (CSV) file. Here is some sample output of my program (Figure 1):

Figure 1. Sample CSV file opened as raw text in Notepad.

This data can easily be opened in Microsoft Excel, where it looks like this (Figure 2):

Figure 2. Sample CSV file opened in Microsoft Excel.

This makes all sorts of analysis possible, and because the computer does the rolling, I can simulate a billion rolls of the dice!


The first and undoubtedly most important part of this project is writing the code that simulates a game of Monopoly. Luckily, one simple fact about Monopoly makes it much, much easier to simulate: players cannot affect the positions of other players. That means only one player has to be simulated, which removes big variables that exist when simulating many other games.

The program is designed to track the following data throughout the simulation: the number of rolls of the dice; the number of moves (rolling doubles and going again counts as one move); the position on the board (0 to 39, as there are 40 spaces on the board); the number of doubles rolled; the number of times Go has been passed; the total distance moved; the values of both dice; whether or not the player is in jail; and the number of times each of the forty spaces has been landed on.
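
To make this bookkeeping concrete, here is a heavily simplified Java sketch of one player’s movement. It is not the author’s program. The class, field, and method names are hypothetical, and Chance and Community Chest cards are left out entirely. The jail rules are assumptions standing in for whatever the original implements: the player leaves on doubles or after the third failed attempt, with no extra roll after leaving. Space numbers assume the standard board, with Go at 0, Jail at 10, and Go To Jail at 30.

```java
import java.util.Random;

// Heavily simplified single-player Monopoly movement (hypothetical names throughout).
// Modeled: two dice, doubles, three doubles in a row -> Jail, the Go To Jail space,
// and turns spent in Jail. NOT modeled: Chance and Community Chest cards.
// Spaces are numbered 0 (Go) to 39, with Jail at 10 and Go To Jail at 30.
class MonopolySketch {
    static final int SPACES = 40, JAIL = 10, GO_TO_JAIL = 30;

    final Random rng;
    final long[] landings = new long[SPACES]; // times each space has been landed on
    long rolls, moves, doubles, passedGo, distance;
    int position, die1, die2, doublesInRow, jailAttempts;
    boolean inJail;

    MonopolySketch(long seed) {
        rng = new Random(seed);
    }

    /** Rolls the dice once and records exactly one landing. */
    void roll() {
        die1 = rng.nextInt(6) + 1;
        die2 = rng.nextInt(6) + 1;
        rolls++;
        boolean isDouble = die1 == die2;
        if (isDouble) doubles++;

        if (inJail) {
            jailAttempts++;
            // Assumption: leave on doubles or after the third attempt, with no extra roll.
            if (isDouble || jailAttempts == 3) {
                inJail = false;
                advance(die1 + die2);
            } else {
                landings[JAIL]++; // another turn spent in Jail
            }
            moves++;
            return;
        }

        if (isDouble && ++doublesInRow == 3) { // third double in a row: straight to Jail
            sendToJail();
            moves++;
            return;
        }
        advance(die1 + die2);
        if (!isDouble || inJail) { // the move ends unless a double lets the player go again
            doublesInRow = 0;
            moves++;
        }
    }

    private void advance(int steps) {
        distance += steps;
        int next = position + steps;
        if (next >= SPACES) passedGo++; // also counts landing exactly on Go
        position = next % SPACES;
        if (position == GO_TO_JAIL) {
            sendToJail(); // the landing is recorded on Jail, never on Go To Jail
        } else {
            landings[position]++;
        }
    }

    private void sendToJail() {
        position = JAIL;
        inJail = true;
        jailAttempts = 0;
        doublesInRow = 0;
        landings[JAIL]++;
    }

    public static void main(String[] args) {
        MonopolySketch game = new MonopolySketch(2015L);
        for (int k = 0; k < 1_000_000; k++) game.roll();
        for (int s = 0; s < SPACES; s++) {
            System.out.printf("%2d: %.3f%%%n", s, 100.0 * game.landings[s] / game.rolls);
        }
        System.out.printf("rolls=%d moves=%d doubles=%d passedGo=%d%n",
                game.rolls, game.moves, game.doubles, game.passedGo);
    }
}
```

Because each roll records exactly one landing, the forty counts always add up to the number of rolls.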

The program was coded in Java using Eclipse on Mac OS X, as seen below (Figure 3):

Figure 3. Monopoly Simulator project opened in Eclipse.

The source code in its entirety can be found online on GitHub at

The final result ended up being 1,741 lines: 1,189 of code, 307 of comments, and 237 blank. The skeleton was laid out for several features that were never completed; I designed the program to be extensible so they could be added later. The main feature whose groundwork is laid out but not coded is a visual representation of the board. It is set up the same way as the data-collecting code: it would output an image of a Monopoly board with information on the current statistics, whereas the data-collecting code outputs a line of comma-separated values with that information.

The program takes three important inputs from the user:

  1. Number of rolls to simulate.
  2. Save interval (in rolls).
  3. Location to save the CSV (comma-separated value) file.

The first two shape the data in the CSV file. If the number of rolls to simulate were 500 and the save interval were 10, a total of 50 lines of data would be saved, giving 50 rows to work with in Microsoft Excel. I used this to simulate one billion rolls of the dice, with the save interval also set to one billion. This outputs only one line of data, but that line can be used to make interesting graphs. Another run simulated one hundred thousand rolls with a save interval of one hundred, which output one thousand lines of data that I graphed with the number of rolls as the independent variable (logically similar to graphing with time on the x-axis). For the last and most complex type of data I wanted, I wrote another short program that builds on the working one. It runs the original program’s code a certain number of times, saving only the last line of data from each run. For example, I ran one million rolls one thousand times. The collected data looked like this (Figure 4):

Figure 4. Sample data opened in Microsoft Excel.
(Note: some have 1,000,001 or 1,000,002 rolls because they rolled one or two doubles in their final turn)
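
The relationship between the first two inputs and the size of the CSV file can be sketched in a few lines of Java. One row is written every saveInterval rolls, so a run produces totalRolls / saveInterval data rows. This is a hypothetical, cut-down illustration that tracks only a doubles count. It is not the author’s program.

```java
import java.util.Random;

// Cut-down illustration of the save interval: one CSV row is written every
// saveInterval rolls, so a run yields totalRolls / saveInterval data rows.
// Only a doubles count is tracked here; the real output has many more columns.
class SaveIntervalSketch {

    static String simulate(long totalRolls, long saveInterval, long seed) {
        Random rng = new Random(seed);
        StringBuilder csv = new StringBuilder("rolls,doubles\n");
        long doubles = 0;
        for (long roll = 1; roll <= totalRolls; roll++) {
            if (rng.nextInt(6) == rng.nextInt(6)) doubles++; // the two dice match
            if (roll % saveInterval == 0) {
                csv.append(roll).append(',').append(doubles).append('\n');
            }
        }
        return csv.toString();
    }

    /** Number of data rows, not counting the header line. */
    static long dataRows(String csv) {
        return csv.chars().filter(ch -> ch == '\n').count() - 1;
    }

    public static void main(String[] args) {
        System.out.println(dataRows(simulate(500, 10, 1L)));      // prints 50
        System.out.println(dataRows(simulate(100_000, 100, 1L))); // prints 1000
    }
}
```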

Generating the data in Figure 4 required another short program (Figure 5). Because it reuses my previous program, it took only about one hundred lines, built around a loop within a loop.

Figure 5. Code snippet from the auxiliary program.
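
The “loop within a loop” can be sketched in the same spirit. The outer loop plays one independent game per trial, the inner loop rolls the dice, and only the final state of each game becomes a CSV row. This is again a hypothetical stand-in: a bare dice loop takes the place of the call to the original simulator that the real auxiliary program makes.

```java
import java.util.Random;

// Sketch of the "loop within a loop": the outer loop plays independent games, the
// inner loop rolls the dice for one game, and only each game's final state is kept
// as a CSV row. A doubles count stands in for the full simulator state.
class RepeatedTrialsSketch {

    static String runTrials(int trials, long rollsPerTrial, long seed) {
        Random rng = new Random(seed);
        StringBuilder csv = new StringBuilder("trial,rolls,doubles\n");
        for (int trial = 1; trial <= trials; trial++) {            // outer loop: one game each
            long doubles = 0;
            for (long roll = 1; roll <= rollsPerTrial; roll++) {    // inner loop: the game itself
                if (rng.nextInt(6) == rng.nextInt(6)) doubles++;
            }
            // Only the last line of each game is saved
            csv.append(trial).append(',').append(rollsPerTrial).append(',')
               .append(doubles).append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        System.out.print(runTrials(5, 1_000_000, 2015L));
    }
}
```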

Now that all the data has been generated, it’s time to analyze it with Excel.


The analysis is where the main goals of the project are reached. The program can only do so much without reinventing the wheel: instead of trying to analyze data and create graphs from within the Java program, it makes much more sense to output data that can be read by a program that specializes in just that. Microsoft Excel is the industry standard for the vast majority of data-analysis tasks.

The first and most obvious thing to analyze is how some spaces are landed on more or less often than others. This can be represented accurately using the single line of data from a billion rolls. Dividing the number of times each space was landed on by one billion gives the fraction of rolls that ended on that space, which can be expressed as a percentage. A good way to tell whether a space is more or less popular is to subtract the average from these percentages. Since there are forty spaces on the board, the average is 1/40, or 2.5%. The graph reveals some interesting trends (Figure 6).

Figure 6. Deviation from average (2.5%) for all forty spaces.
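
Here is a small Java sketch (hypothetical names) of turning raw landing counts into the deviations plotted above. It divides by the total number of recorded landings rather than by a hard-coded one billion. The result is the same when every roll records exactly one landing, and this version also works for runs of any length. The counts in main() are made-up placeholders, not the post’s results.

```java
import java.util.Arrays;

// Sketch: convert landing counts into hit fractions and deviations from the
// uniform average of 1/40 = 2.5%. The counts in main() are made-up placeholders.
class DeviationSketch {

    /** Fraction of all recorded landings that fell on each space. */
    static double[] hitFractions(long[] landings) {
        long total = 0;
        for (long count : landings) total += count;
        double[] fractions = new double[landings.length];
        for (int k = 0; k < landings.length; k++) {
            fractions[k] = (double) landings[k] / total;
        }
        return fractions;
    }

    /** Each fraction minus the average 1 / (number of spaces). */
    static double[] deviations(double[] fractions) {
        double average = 1.0 / fractions.length;
        double[] out = new double[fractions.length];
        for (int k = 0; k < fractions.length; k++) out[k] = fractions[k] - average;
        return out;
    }

    public static void main(String[] args) {
        long[] landings = new long[40];
        Arrays.fill(landings, 1000); // placeholder counts, not real results
        landings[10] = 3000;         // a Jail-like spike
        landings[30] = 0;            // Go To Jail is never "landed" on
        double[] dev = deviations(hitFractions(landings));
        for (int k = 0; k < dev.length; k++) {
            System.out.printf("space %2d: %+.3f percentage points%n", k, 100 * dev[k]);
        }
    }
}
```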

The most obvious outliers are the jail, chance, and community chest spaces. There are many reasons why jail has by far the most hits:

  1. One of the forty spaces on the board sends the players directly to jail.
  2. One of the sixteen chance and one of the sixteen community chest cards send the player to jail.
  3. Rolling three doubles sends the player to jail.
  4. If the player does not roll doubles to get out of jail, it counts as another turn spent in jail.

Some less obvious outliers are the chance and community chest spaces. Chance spaces are seldom counted as landed on because eight of the sixteen chance cards move the player to another space, and the new space is the one counted as landed on. The same goes for community chest, except that only two of its sixteen cards change the player’s location.

“Go To Jail” is landed on 0% of the time because, in reality, Jail is the space the player ends up on by the end of the roll.

The trends for the other spaces are clearly shown in the graph: Mediterranean and Baltic Avenues are very unpopular, while Illinois is very popular. The averages for the different types of space can be seen in Figure 7 below:

Figure 7. Average deviation from 2.5% for several categories.

While Illinois is the single most popular space, the orange properties, on average, are landed on considerably more than the red ones. The brown properties are far below those popular properties, almost 0.5 percentage points below the 2.5% average. It’s easy to tell from this ranking that certain properties are much more valuable than others.

The blue category, which includes the infamous Boardwalk, is actually in the bottom three. This probably comes as a surprise to a lot of players, because Boardwalk is heavily contested for its high payouts once the game has been going on for a while. It would actually be even lower, except that one of the chance cards, “Take a walk on the Boardwalk”, advances the player to Boardwalk, so it is landed on quite a bit more often than its neighbor, Park Place.

This was only one test, however. I used the auxiliary program to collect a thousand trials of games with a million rolls each. With this data, a box plot can be made to represent the minimum, maximum, and first, second, and third quartiles for each property (Figure 8).

Figure 8. Box plot using 1,000 games of 1,000,000 rolls.
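
Excel computed these statistics for the box plot, but the idea is simple enough to sketch in Java. This version interpolates linearly at position p(n - 1) in the sorted data, which as far as I know is the convention behind Excel’s QUARTILE.INC. Other conventions, such as QUARTILE.EXC, give slightly different values. The names and the sample numbers in main() are hypothetical placeholders.

```java
import java.util.Arrays;

// Sketch: five-number summary (min, Q1, median, Q3, max) of one space's hit
// percentages across many trials. Quartiles interpolate linearly at position
// p * (n - 1) in the sorted data; other quartile conventions differ slightly.
class BoxPlotSketch {

    /** Linear-interpolation quantile of already-sorted data, 0 <= p <= 1. */
    static double quantile(double[] sorted, double p) {
        double pos = p * (sorted.length - 1);
        int lower = (int) Math.floor(pos);
        int upper = Math.min(lower + 1, sorted.length - 1);
        return sorted[lower] + (pos - lower) * (sorted[upper] - sorted[lower]);
    }

    static double[] fiveNumberSummary(double[] values) {
        double[] s = values.clone();
        Arrays.sort(s);
        return new double[] {s[0], quantile(s, 0.25), quantile(s, 0.5), quantile(s, 0.75), s[s.length - 1]};
    }

    public static void main(String[] args) {
        // Placeholder hit percentages for one space over a handful of trials
        double[] trials = {3.02, 2.97, 3.05, 2.99, 3.01, 3.04, 2.96};
        System.out.println(Arrays.toString(fiveNumberSummary(trials)));
    }
}
```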

The box plot shows beyond a reasonable doubt that Illinois is nearly always the most popular space to land on apart from jail, only possibly beaten by Go, since the maximum for Go is higher than the minimum for Illinois. It also shows that Park Place could sometimes be the single least popular property!

A lot of other small, interesting graphs can be made from the miscellaneous data the program keeps track of. For example, comparing the dice results with expected, mathematically calculated values shows how random the dice are (Figure 9).

Figure 9. Percentage of rolls that are doubles after X rolls compared to the mathematically expected 16.67% (1/6).
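
The expected value comes from 6 matching pairs among the 36 equally likely outcomes, so doubles should settle near 1/6. Here is a minimal Java sketch of that check. It uses java.util.Random as a stand-in for whatever generator the original program used, and the names are hypothetical.

```java
import java.util.Random;

// Sketch of the dice sanity check: the fraction of rolls that are doubles should
// settle near 6/36 = 1/6 (about 16.67%) as the number of rolls grows.
// java.util.Random stands in for whatever generator the original program used.
class DoublesCheckSketch {

    static double doublesFraction(long rolls, long seed) {
        Random rng = new Random(seed);
        long doubles = 0;
        for (long k = 0; k < rolls; k++) {
            int die1 = rng.nextInt(6) + 1;
            int die2 = rng.nextInt(6) + 1;
            if (die1 == die2) doubles++;
        }
        return (double) doubles / rolls;
    }

    public static void main(String[] args) {
        for (long rolls = 10; rolls <= 1_000_000; rolls *= 10) {
            System.out.printf("%,d rolls: %.3f%% doubles (expected %.3f%%)%n",
                    rolls, 100.0 * doublesFraction(rolls, 42L), 100.0 / 6);
        }
    }
}
```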

Small tests like these confirm that the simulation is using good numbers that don’t stray from mathematical probabilities.

Another test, which failed several times during the creation of the computer program, is making sure that the average of the hit percentages from each space is equal to 2.5%. If it is lower, some hits are not being accounted for, and if it is higher, some are being counted twice. In the beginning, the percentage was lower, which showed that there were certain times where the computer program was not counting the hits. It turned out that the bug had to do with rolling doubles, and was eventually confirmed to be fixed when the average was exactly 2.5%.

The average of all forty hit percentages (Figure 10) is 0.025: 1/40, or 2.5%.

Figure 10. Hit percentages for all 40 spaces, which average out to 2.5%.
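
Why exactly 2.5%? Write h_{k} for the number of hits on space k and N for the number of rolls, and assume every roll records exactly one landing, so the hits add up to N. Then the average of the forty hit fractions is fixed:

\bar{p} = \frac{1}{40}\sum_{k=0}^{39}\frac{h_{k}}{N} = \frac{1}{40}\cdot\frac{N}{N} = \frac{1}{40} = 2.5\%

Hits that are never recorded make the sum smaller than N and pull the average below 2.5%. Hits counted twice push it above. Either kind of bug therefore shows up as a departure from 2.5%.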


So why did the data come out the way it did? There is one extreme, obvious spike in the graphs, a spike so great that it is either cut off or left out of every graph.


Jail is by far the most popular space on the board. As discussed before, there are many ways that a player can end up in jail.

When rolling two dice, the most common sum is seven. Seven spaces after jail is community chest (between the orange properties), and seven spaces after that community chest is none other than Illinois!


Illinois is higher than the orange spaces, though, because there is also a chance card that advances the player to Illinois, instantly boosting the chances of landing there. Even so, the orange properties average a higher hit percentage than the red properties. This is because they are six, eight, and nine spaces away from jail, all of which are close to seven (totals become more and more common the closer they get to seven).
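
The dice arithmetic behind this can be checked with a short Java sketch (hypothetical names). It counts the ways to roll each two-dice total and applies them to the distance from Jail. Space numbers assume the standard US board, numbered 0 (Go) to 39 (Boardwalk). The sketch looks at a single roll and ignores how the player gets out of jail.

```java
// Sketch: ways (out of 36) to roll each two-dice total, applied to the distance from
// Jail (space 10) to the orange and red properties. Space numbers assume the
// standard US board, numbered 0 (Go) to 39 (Boardwalk); this looks at one roll only
// and ignores how the player gets out of Jail.
class DiceDistanceSketch {

    /** ways[t] = number of the 36 equally likely outcomes with total t (2 to 12). */
    static int[] waysPerTotal() {
        int[] ways = new int[13];
        for (int a = 1; a <= 6; a++) {
            for (int b = 1; b <= 6; b++) ways[a + b]++;
        }
        return ways;
    }

    public static void main(String[] args) {
        int[] ways = waysPerTotal();
        int jail = 10;
        String[] names = {"St. James Place", "Tennessee Avenue", "New York Avenue",
                          "Kentucky Avenue", "Indiana Avenue", "Illinois Avenue"};
        int[] spaces = {16, 18, 19, 21, 23, 24};
        for (int k = 0; k < names.length; k++) {
            int distance = spaces[k] - jail;
            int w = distance <= 12 ? ways[distance] : 0; // totals above 12 need a second roll
            System.out.printf("%-17s %2d spaces from Jail: %d/36 in one roll%n", names[k], distance, w);
        }
    }
}
```

In a single roll from jail, the orange properties come out at 5/36, 5/36, and 4/36. Among the reds, only Kentucky (2/36) is reachable in one roll. Illinois needs a second roll, such as the seven-then-seven path described above, or the chance card.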

However, none of the chance or community chest cards advance the player to an orange space, which is why none of them holds the title of most popular space. On the other hand, both a chance card and a community chest card advance the player to Go, which is why, excluding jail, it is the second most popular space to land on.

Bottom line? The saying applies in Monopoly as much as it does in real life:

Location, location, location.
(although in real life the value of a property usually goes up because of beaches, not jails…)