Parrondo's Paradox, or how to win by randomly losing

Gambling enjoys a negative reputation, which is perhaps deserved: the games gamblers play are designed so that the chances of winning are slightly smaller than the chances of losing, which makes gambling a losing activity in the long run. In what follows, I am going to describe two betting games which are both losing; however, a strategy of randomly playing these losing games turns out to be very interesting indeed.

Losing Game A

The first example of a losing game has probability of winning p = 1/2 - e, where e is a very small number. For illustration, let's assume that we make 1000 successive bets, getting $1 if we win and losing $1 otherwise. The expected value of each bet is -2e, so for e = 0.005 we expect to lose about $10 over 1000 bets. The simulation below runs our game 100 times and shows an average loss of about $15, within sampling noise of that expectation.

from __future__ import division  # must come first; makes 1/2 evaluate to 0.5 in Python 2

import random
import numpy as np
import matplotlib.pyplot as plt

e = 0.005

def game_A(m):
    # Win $1 with probability 1/2 - e, lose $1 otherwise.
    return m + 1 if random.random() < (1/2 - e) else m - 1

def simulate(n_runs, n_bets, game_fn):
    # Play n_bets successive bets of game_fn, n_runs times over,
    # and return the average amount of money held after each bet.
    runs = np.zeros((n_runs, n_bets))
    for i in range(n_runs):
        money = []
        m = 0
        for _ in range(n_bets):
            m = game_fn(m)
            money.append(m)
        runs[i, :] = money
    return np.mean(runs, 0)

avg_money = simulate(100, 1000, game_A)

plt.plot(avg_money);
/img/parrondo/output_1_0.png

Losing Game B

The rules of the second game are a bit more complicated. If the current amount of money is a multiple of 3, the probability of winning is 1/10 - e; otherwise the probability of winning is 3/4 - e. The simulation below shows that game B is also a losing game: we expect to lose around $12 after 1000 successive bets.

def game_B(m):
    # Capital-dependent game: a bad coin when the money is a multiple of 3,
    # a good coin otherwise.
    if m % 3 == 0:
        return m + 1 if random.random() < 1/10 - e else m - 1
    else:
        return m + 1 if random.random() < 3/4 - e else m - 1

avg_money = simulate(100, 1000, game_B)

plt.plot(avg_money);
/img/parrondo/output_3_0.png

Randomly switching between games A and B

Now we have two losing games, A and B. What if we made 1000 bets, randomly switching back and forth between these two games? Surely we would also end up losing, right? Surprisingly, no! If we just switch between the two losing games at random, we actually win about $15 after 1000 bets!

def game_AB(m):
    # Flip a fair coin to decide which of the two losing games to play.
    if random.random() < 0.5:
        return game_A(m)
    else:
        return game_B(m)

avg_money = simulate(100, 1000, game_AB)

plt.plot(avg_money);
/img/parrondo/output_5_0.png
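
Why does switching help? Game B's chance of winning depends on the current capital modulo 3: it is terrible (1/10) when the capital is a multiple of 3 and good (3/4) otherwise, and on its own it spends more than a third of the time in the bad state. Mixing in game A reshuffles the capital so that the bad state is visited less often. For e = 0 this can be verified exactly by treating the capital modulo 3 as a Markov chain; the sketch below is my own addition and computes the stationary distribution and the resulting effective win probability.

def effective_win_prob(p0, p_other):
    # Overall win probability of a capital-dependent game that wins with
    # probability p0 when the capital is a multiple of 3, and with
    # probability p_other otherwise (e = 0 throughout).
    # Transition matrix of the capital modulo 3 (rows: current state).
    P = np.array([[0.0,         p0,          1 - p0 ],
                  [1 - p_other, 0.0,         p_other],
                  [p_other,     1 - p_other, 0.0    ]])
    # Stationary distribution = left eigenvector of P for eigenvalue 1.
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    pi /= pi.sum()
    return pi[0] * p0 + (1 - pi[0]) * p_other

print(effective_win_prob(1/10, 3/4))  # game B alone: exactly 0.5, a fair game
# Random switching plays A or B with probability 1/2 each, so the per-state
# win probabilities are the averages of the two games':
print(effective_win_prob((1/2 + 1/10) / 2, (1/2 + 3/4) / 2))  # ~0.513, winning

So at e = 0 the random mixture wins about 51.3% of its bets; with e = 0.005 the edge is smaller but still positive, which is exactly what the simulation above picks up.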

The surprising behavior described above was first discovered by Juan Parrondo in 1996. I came across it this year while reading the book Digital Dice, which I would totally recommend as an interesting read.

An interesting observation is that Parrondo's paradox only works if e is small enough. The figure below plots how much can be won after 1000 bets for different values of e. We notice that at e = 0 we make the most money (about +$30), while for values greater than roughly 0.01 the random switching game stops being profitable. In the case of betting on a color in American roulette, e = 1/38 ≈ 0.0263, which is not small enough.

e_vals = np.linspace(0, 0.05, 100)
avg_won = []
for e in e_vals:  # note: this rebinds the global e used inside game_A and game_B
    avg_money = simulate(100, 1000, game_AB)
    avg_won.append(avg_money[-1])
fig, ax = plt.subplots(1, 1)
ax.plot(e_vals, avg_won);
ax.set_xlim(0, 0.03);
ax.set_ylim(-40, 40);
ax.set_xlabel('values for epsilon');
ax.set_ylabel('amount won after 1000 bets');
ax.plot([0, 0.03], [0, 0], 'r-');  # break-even line
/img/parrondo/output_8_0.png

How We Became National Champions

This year, in the Slovenian Open Teams Championships, I had the pleasure of being part of the winning team, together with Bogdan Rasula, Joze Sadar, Janko Mijoc and Gregor Rus.

/img/bridgeteams/winningteam.jpg

The winning team: Joze, Janko, Bogdan, Lorand and Gregor

After struggling in the qualifiers, we made it into the playoffs by a hair's breadth of 0.14 victory points. The semifinals were highly dramatic: we managed to turn around a huge deficit and win the match by 1 IMP after scoring 46 IMPs in the last four deals.

The finals were extremely tight, but with high morale, solid play, and a bit of luck, we managed to run away with the title, defeating the Slovenian dream team, which has won so often in the past that nobody keeps count anymore.

If I had to pick the single most interesting board that I played, it would be the following. With three deals to go, I picked up this hand as North:

/img/bridgeteams/hand1.png

East opened a Precision 2♣, my partner doubled, West passed, and I bid 4♠.

On the lead of the ♣K, a lot of things were going through my head as I was trying to make a game contract which could well prove decisive. It looked like I had to lose one spade, one heart, and a club, and somehow had to avoid a diamond loser. From the bidding, East probably held the ♦A and ♦K, but it wouldn't help me to play a diamond towards the queen; the trumps should also be 3-2, and I would rather West had three... but still the pieces of the puzzle were not quite fitting together.

Anyway, I decided to take the lead with the ♣A, play two top spades and a heart, hoping for the best. Now my opponent could have made a mistake by taking his ♥A immediately. Indeed, he fell into a long pause, but eventually made the correct play of a small heart. Winning in dummy, I saw that things weren't looking good, so I decided to change plans and play a club to my jack. I figured I would be able to ruff the losing club and still survive somehow if the opponents led into my diamonds from the wrong side. West took the ♣J with the ♣Q and continued clubs; I ruffed, East overruffed, jack of diamonds, one down!

Some of the spectators pointed out a slightly better line: continuing hearts after West ducks the first one. On taking the ♥A, West would have to play clubs, and East would have to be very careful not to ruff with his queen.

In retrospect, my only chance to make the game against perfect defence was to play East for the QJx of trumps and run a spade to my ten at trick two. Fortunately, although I missed the correct play, this board did not cost us the victory.

Motion Detection in 50 Lines of Python

While experimenting with motion detection, I implemented a very simple algorithm in Python. The input is a stream of images captured from a fixed webcam, and the motion detection approach consists of fitting a normal distribution to the past values of each pixel of the image. This way, the probability of a new pixel value can be computed, and if it is very small, the pixel has likely changed. The animated GIF images below demonstrate how the algorithm works.

In the upper left picture the frames are shown as they are captured from the webcam. The upper right image shows the frame as the computer sees it after some preprocessing (converting to greyscale and applying a Gaussian blur for noise removal). Lower right we see the probability that each pixel has changed; black is 0 (i.e. no change) and white is 1 (i.e. certain change). The lighter the shade of grey, the more confident the algorithm is that a change in the intensity values of the pixel has occurred. We can observe that in the area where the dog's head is, the probability of change is higher because the dog moves his head and ears slightly; however, this movement is not enough to trigger the motion detector. Applying a threshold to the change probability (in this example 0.98), we obtain the change mask shown in the lower left picture, which shows which area of the image has changed. If the changed area is bigger than some value (let's say 5% of the entire image), the motion detector is triggered.

The model assumes a stream of pixels:

/img/mdetect/eq_stream.png

The mean and standard deviation are computed as follows:

/img/mdetect/eq_ema.png
/img/mdetect/eq_emstd.png

This method is called the exponential moving average; instead of taking all past values into account or working with a moving window, it only needs the previous estimate and the current frame. The parameter alpha determines how fast the past values are "forgotten". If you do not wish to take more than roughly N previous frames into account, you can set alpha to:

/img/mdetect/eq_alpha.png
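
To make this concrete, here is a minimal sketch of the per-pixel model. The class name, the initial variance, and the erf-based change probability are my own choices; the full implementation linked below may differ in details.

import numpy as np
from scipy.special import erf

ALPHA = 0.05  # forgetting factor; set roughly to 1/N to remember ~N frames

class PixelModel:
    # Per-pixel normal model maintained with exponential moving averages.
    def __init__(self, first_frame):
        self.mean = first_frame.astype(float)
        self.var = np.full_like(self.mean, 25.0)  # arbitrary initial variance

    def change_probability(self, frame):
        frame = frame.astype(float)
        diff = frame - self.mean
        sigma = np.sqrt(self.var) + 1e-8
        # Probability that a value drawn from N(mean, sigma^2) would lie
        # closer to the mean than the observed one; near 1 = very unusual.
        prob = erf(np.abs(diff) / (sigma * np.sqrt(2)))
        # Exponential moving average updates of the mean and the variance.
        self.mean += ALPHA * diff
        self.var = (1 - ALPHA) * (self.var + ALPHA * diff ** 2)
        return prob

# Usage on each preprocessed (greyscale, blurred) frame:
# mask = model.change_probability(frame) > 0.98  # change mask
# motion = mask.mean() > 0.05                    # over 5% of pixels changed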

The code implementing the above algorithm is available on github. The slides I presented at the Ljubljana Python Meetup can be viewed here.

Best Time to Tweet

The halflife of a tweet is the amount of time it takes for the tweet to receive half of the clicks it is ever going to receive. In other words, it measures how long people pay attention to something you share. Recently, bitly published a blogpost showing that the typical halflife of a link is about 3 hours.

https://lh3.googleusercontent.com/z2g6vb18_hZvqswW0vQb-ncNpt3HseDXLIeD0Rb8PZfVRwpaaBgk4xGKavP7Pzb3X87OarapvM1rpJdOKgPzJLHXVMSu_T8zfsISBxvwxEgpgrVR3oE

According to bitly, the halflife of a link published on twitter or facebook is about 3 hours, and on youtube about 7 hours (the figure is from the bitly blogpost mentioned above).

Clearly, the attention span of your followers is very short. What can you do to make the most of it?

You could certainly increase your impact on twitter by making your tweets more interesting or by increasing the number of your followers. However, you could also do something much simpler. Since the attention span for a tweet is so short, it is essential that most of your followers be online and watching when you post that tweet.

I have no idea when my followers are online reading tweets. What I can easily find out, however, is when my followers tweet themselves. Assuming that most people also read a few tweets when they post, I can increase the chances of my tweet being read by posting it when my followers are most likely to be actively tweeting.

In order to find out when your followers are most likely to tweet, just open the twitter page of each of your followers in a separate tab (if you think that opening more than five tabs will cause your computer to explode, then take a look at this) and paste this piece of code into the JavaScript console:

setInterval(function(){window.scrollBy(0, 500)}, 200)

This will automatically scroll the page, causing all tweets to be loaded. When the scrolling is done and all tweets are loaded, you just have to save the page and extract the timestamps of the tweets with some simple regular expressions, as sketched below. This way you will know, for each follower, how many tweets they made, and at what time of day and day of week they tweet most.
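
Here is a hypothetical extraction sketch: the file name and the data-time="..." attribute are assumptions about the saved markup, so inspect your saved HTML and adjust the pattern to whatever you actually find there.

import re
from collections import Counter
from datetime import datetime

# Assumption: each tweet in the saved page carries a Unix timestamp in a
# data-time="..." attribute; adjust the regex if your saved markup differs.
with open('follower_page.html') as f:
    html = f.read()

stamps = [datetime.fromtimestamp(int(s))
          for s in re.findall(r'data-time="(\d+)"', html)]

tweets_per_hour = Counter(t.hour for t in stamps)          # 0..23
tweets_per_weekday = Counter(t.weekday() for t in stamps)  # 0 = Monday

The data for my top followers, aggregated this way, looks like this: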

/img/tweettime/hourly_tweets.png

The tweeting activity for every follower (rows) in each hour (columns). The greener a cell, the more the user tweets in the corresponding hour. The red bar shows the total number of tweets

/img/tweettime/daily_tweets.png

Tweeting activity for every follower (rows) on each day of the week, starting Monday (columns). The greener a cell, the more the user tweets on the corresponding day of the week

In conclusion, most tweeting by my followers happens in the morning around 10am and in the evening between 8 and 10pm. Very little happens on weekends. The best days seem to be Wednesday and Thursday.

My optimal time to tweet is Wednesdays and Thursdays at 10am. When is yours?

What was your name again?

One of the thoughts that sometimes crosses my mind when I join a group of people is: "What if there is already someone with my name? How likely is that?" In Slovenia, the name Lorand is very rare. Nevertheless, it is interesting to think about how likely name collisions are for different names.

This article aims to answer two questions:

  1. How likely is it that there is a name collision in a group of people in Slovenia?
  2. Given a specific name, how likely is it that it will not be unique in a group of given size?

First of all, let me list a few simplifying assumptions that I make:

  • the group of people I am looking at is a random sample of the population. Groups of people working together at a job or studying together at the university are far from random, I hope :)
  • there cannot be a name collision with a person of the opposite gender, i.e. the set of girl and boy names are disjoint (with a few exceptions this is true; only Saša occurs as both a girl's and a boy's name in the 200 most frequent names in Slovenia)

The problem of computing the probability of a name collision is related to the birthday problem, where birthday collisions are computed. What makes it different is the distribution: birthdays are uniformly distributed, while the frequency distribution of names is exponential.

The figure below shows the frequency of each of the 200 most popular Slovenian names (on the left) and the percentage of people accounted for by the names up to a certain popularity (on the right). Indeed, an exponential decrease in popularity can be observed; also, half of the population is covered by the 50 most popular names.

/img/namecollision/freqdist_cumulative_pair.png

Frequency of the 200 most popular names (left) and cumulative distribution (right)

The top 200 names only cover about 80% of the population (819,966 boys and 824,442 girls). Using this data alone would leave around 120,000 boys and 115,000 girls nameless.

Knowing that the popularity of names decreases exponentially, we can estimate the frequencies of the boy names which didn't make it into the top 200 by tuning the parameters n and b in the equation below (730 being roughly the frequency of the 200th name, where the tail begins):

730 \int_{0}^{n} e^{-bx} \, dx = 120000

Doing some quick trial and error in WolframAlpha, I came up with the values n = 1800 and b = 0.006, which look reasonable. The tail of the boy name frequency distribution therefore looks like this:

/img/namecollision/boy_tail.png

Frequency distribution of boy names. The blue line corresponds to real data (up to rank 200), the green line is my best guess of what happens from 200 onwards
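
As a quick sanity check of these values (my own addition, not part of the original analysis), the integral has a closed form, and plugging in n = 1800 and b = 0.006 lands close to the 120,000 target:

from math import exp

# 730 * integral_0^n exp(-b x) dx  =  (730 / b) * (1 - exp(-b * n))
b, n = 0.006, 1800
print(730 / b * (1 - exp(-b * n)))  # ~121664, close to 120000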

For girls, the popularity of rare names was found in a similar way.


Knowing the frequency distributions of names, it is time to determine the probability of collisions. I am sure there is an analytical solution to this problem, which I would very much like to hear about; however, I decided to estimate the probabilities by simulation.

The simulation algorithm is very simple. For example, to determine the probability of at least two girls having the same name in a group of 20 girls, we sample 20 names from the distribution described above and check whether there is a collision. Repeating this one million times, we get a pretty good estimate of the collision probability by dividing the number of groups with a collision by one million. For our example, the probability of at least two of 20 girls having the same name turned out to be 77.53% (compare this with the birthday problem, where the probability of a collision in a group of 20 is under 50%).
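
A minimal sketch of this estimator (the function name and the smaller default trial count are my own choices; girl_name_counts stands for whatever array holds the girl-name frequencies built above):

import numpy as np

def collision_probability(freqs, group_size, at_least=2, n_trials=100000):
    # Estimate the probability that at least `at_least` people in a random
    # group of `group_size` share a name; `freqs` are name counts (any scale).
    p = np.asarray(freqs, dtype=float)
    p /= p.sum()
    hits = 0
    for _ in range(n_trials):
        names = np.random.choice(len(p), size=group_size, p=p)
        if np.bincount(names).max() >= at_least:
            hits += 1
    return hits / float(n_trials)

# e.g. collision_probability(girl_name_counts, 20) should come out near 0.775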

/img/namecollision/p_same_name.png

The probability (y-axis) of at least [two, three, ..., six] girls (dotted) and boys (dashed) having the same name in a group of given size (x-axis)

The above figure shows the probability of name collisions in groups of sizes 2 up to 200. Groups always consist of people of the same gender; dotted lines show probabilities for girls and dashed lines for boys. It looks like in a group of 14, the chances of at least two people having the same name are about 50%, both for girls and for boys. If a group is larger than 50, it is practically certain that two people will share a name; moreover, it is very likely that there will be three girls (80%) or three boys (70%) with the same name. As many as six people having the same name is almost certain if the group is large enough (200 people or more). In general, groups of girls are more likely to have collisions (especially more than two with the same name), because the most frequent girl's name, Marija, is more than twice as popular as the most popular boy's name.

Finally, let's look at the probability of collision for a specific name: how likely is a collision of Matjaz compared to a collision of Luka, for instance?

/img/namecollision/pcoll_2_10_50_200.png

Collision probability of a specific name. The x-axis represents the top 100 names ordered by popularity. The y-axis shows the probability of collision for each of these names. Each of the 4 images represents a different group size: 2 (top left), 10 (top right), 50 (bottom left), 200 (bottom right). Blue line for boys, red line for girls.

Two random people having the same name is very unlikely, even for the most popular names (well under 1%). For groups of 50, each of the top 10 names has roughly a 10% or greater chance of not being unique, and for groups of 200 each of the top 100 names has at least a 10% chance of not being unique, while the top names are almost certainly not unique in the group.

In conclusion, name collisions are surprisingly frequent. Perhaps one could make some money by standing in front of a supermarket and taking bets about the names of people inside.