Bad Scientist: linux


Sunday, 27 April 2014

Not-so-random post about randomness

One of the most common misconceptions about randomness is to confuse it with uniformity.
Can you tell which of the two distributions of dots below is the more random?

[Figure: Which is random?]

The most common answer is the one on the right, which is wrong. In fact, the pattern on the right was generated by applying a small wobble (or uncertainty) to a uniform distribution. The pattern on the left, instead, was generated using a pseudorandom [1] number generator in BASH (details in this blog post). Randomness allows "clumps" to form, and it is from those clumps that the unpredictable behaviour of randomness comes (unless, of course, you know the "shape" of the random distribution a priori). This appears more clearly when we look at the distributions shown above in 1D rather than 2D:

[Figure: A random and a uniform distribution]

It is clear that the uniform distribution is far more predictable. Imagine a series of events that follows the uniform distribution, and fill the histogram with them one by one until you reach the 5000 events that recreate the distribution above. As the number of events grows, if one region of the histogram is lacking events, the next event is more likely to fall in that region.
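To make the 1D comparison concrete, here is a minimal Python sketch. It is not the BASH script used for the figures: the function names, the seed, the 50 bins and the size of the wobble are all my own illustrative assumptions. It fills a histogram with 5000 independent pseudorandom positions and with 5000 evenly spaced positions that each get a small wobble, then measures how much the bin counts vary.

```python
import random
import statistics


def random_points(n, rng):
    """n independent pseudorandom positions in [0, 1)."""
    return [rng.random() for _ in range(n)]


def wobbled_grid_points(n, rng, wobble=0.5):
    """n evenly spaced positions in [0, 1], each shifted by a small wobble.

    wobble is a fraction of the grid spacing (hypothetical choice, keep it <= 0.5
    so that every point stays inside its own grid cell).
    """
    spacing = 1.0 / n
    return [(i + 0.5 + rng.uniform(-wobble, wobble)) * spacing for i in range(n)]


def bin_counts(points, bins):
    """Histogram of positions in [0, 1] using equal-width bins."""
    counts = [0] * bins
    for x in points:
        index = min(int(x * bins), bins - 1)  # keep x == 1.0 in the last bin
        counts[index] += 1
    return counts


def bin_spread(points, bins=50):
    """Standard deviation of the bin counts: a rough measure of clumpiness."""
    return statistics.pstdev(bin_counts(points, bins))


if __name__ == "__main__":
    rng = random.Random(2014)  # arbitrary seed, only for repeatability
    n = 5000
    print("random      :", round(bin_spread(random_points(n, rng)), 2))
    print("wobbled grid:", round(bin_spread(wobbled_grid_points(n, rng)), 2))
```

The random sample should show a clearly larger spread between bins than the wobbled grid, which is the "clumping" described above.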

You can picture a 2D uniform distribution in practice by imagining pouring a layer of marbles into a box, one by one. At the beginning their positions look random, because they can move around and settle anywhere, but the fuller the box gets, the fewer free spaces remain, and soon it becomes easy to predict the only spots where the next marble can go.

The key point is that once a position is occupied, it cannot be occupied again. This allows for a level of predictability, which is not true of random distributions. Randomness is not predictable: its profile is unknown by definition, and it allows the same position to be occupied again. This follows from the assumption that each random event is independent of the previous ones (Poisson process). Maybe this is also why we believe uniform distributions to be "random": we are more familiar with them in practice, whereas random events are mostly abstract or harder to visualize.

This explains people's tendency to misjudge randomness. It happens all the time with lotteries: if a number has not come up for a long time, we feel that it is due soon. This is because we imagine random events as uniform. In fact, the probability of picking any given number is the same at every extraction, because every extraction is independent of the previous ones. All numbers are equally likely to be picked, and none is more "due" than any other.

An equivalent example is found in coin flipping. We feel that a long run of heads or tails in a sequence of coin flips is a rare event. In fact, the chance of obtaining heads or tails is still 50%, no matter how many heads have come up already. Any specific sequence is as likely as any other of the same length, whether it looks jumbled, like THHTHTTHTTH, or orderly, like TTTTTHHHHH.
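If you want to check the "still 50%" claim yourself, the sketch below is one way to do it in Python. It is only an illustration under my own assumptions (the function name, the seed and the number of flips are made up for the example): it simulates a long series of fair coin flips and looks only at the flips that come right after a streak of heads.

```python
import random


def heads_rate_after_streak(flips, streak):
    """Fraction of heads among flips that immediately follow `streak` heads in a row."""
    followers = 0
    heads = 0
    run = 0  # length of the run of heads ending just before the current flip
    for flip in flips:
        if run >= streak:
            followers += 1
            heads += flip  # True counts as 1
        run = run + 1 if flip else 0
    return heads / followers if followers else float("nan")


if __name__ == "__main__":
    rng = random.Random(7)  # arbitrary seed, only for repeatability
    flips = [rng.random() < 0.5 for _ in range(400_000)]  # True means heads
    for streak in (1, 3, 5):
        rate = heads_rate_after_streak(flips, streak)
        print(f"after {streak} heads in a row: {rate:.3f} heads")
```

Whatever the length of the streak, the rate should stay close to 0.5, with only the usual statistical fluctuation: the coin does not remember what came before.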

I created a computer-generated set of tosses to work out how often repeated heads or tails occur (details on how this was done are here). These are the results for 100000 sequences of 15 tosses each, counting the sequences that contain a run of at least the given number of repetitions:

3 repetitions: 93894/100000
4 repetitions: 64634/100000
5 repetitions: 34667/100000
6 repetitions: 16723/100000
7 repetitions: 7789/100000
8 repetitions: 3487/100000
9 repetitions: 1584/100000
10 repetitions: 719/100000
11 repetitions: 332/100000
12 repetitions: 125/100000
13 repetitions: 53/100000
14 repetitions: 18/100000
15 repetitions: 7/100000

15 repetitions means a full set of heads or tails, which seems impossible, yet it is expected about 0.006% of the time (2 sequences out of 2^15 = 32768). Over 100000 sequences that is about 6 occurrences, and it happened 7 times. Close enough.

The data show that three repetitions happen almost every time, and four repetitions more than half of the time. Even 6 repetitions is not that uncommon an event, and that is almost half of the sequence! I am sure most people would hesitate to call "random", in the everyday sense, anything with more than 4 repetitions, yet this simple test shows that it is almost the norm.
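For anyone who wants to reproduce numbers of this kind, here is a small Python sketch. It is not the original script (which is described elsewhere), and the function names and seed are my own assumptions. It tallies, for each length, how many 15-toss sequences contain a run of at least that length, once by simulation and once exactly, by going through all 2^15 possible sequences.

```python
import itertools
import random


def longest_run(tosses):
    """Length of the longest run of identical consecutive outcomes."""
    best = 0
    for _, group in itertools.groupby(tosses):
        best = max(best, sum(1 for _ in group))
    return best


def simulated_counts(trials, tosses_per_trial, rng):
    """counts[k] = number of simulated sequences whose longest run is at least k."""
    counts = [0] * (tosses_per_trial + 1)
    for _ in range(trials):
        tosses = [rng.random() < 0.5 for _ in range(tosses_per_trial)]
        for k in range(1, longest_run(tosses) + 1):
            counts[k] += 1
    return counts


def exact_counts(tosses_per_trial):
    """Same tally over every possible sequence (2**n of them), so no sampling noise."""
    counts = [0] * (tosses_per_trial + 1)
    for tosses in itertools.product((False, True), repeat=tosses_per_trial):
        for k in range(1, longest_run(tosses) + 1):
            counts[k] += 1
    return counts


if __name__ == "__main__":
    n, trials = 15, 100_000
    simulated = simulated_counts(trials, n, random.Random(2014))  # arbitrary seed
    exact = exact_counts(n)
    for k in range(3, n + 1):
        print(f"{k:2d} repetitions: {simulated[k]:6d}/{trials}"
              f"  (exact: {exact[k] / 2**n:.4%})")
```

The simulated counts will differ a little from run to run and from the table above, while the exact percentages give the values they fluctuate around.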

Another great example of how bad we are at understanding randomness is this small application at Nick Berry's DataGenetics. It tries to distinguish a randomly generated sequence of tosses (from a real coin) from one you made up yourself. It is not infallible, as it is based on the Pearson chi-squared test (so it cannot always tell which is which), and after a few trials it is easy to trick it into believing your sequence is random. Yet I am sure it will give you a better understanding of randomness!
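I do not know exactly how the DataGenetics application applies the test, so take the following only as a rough Python sketch of the general idea, with names and choices of my own. It splits a sequence of H and T into non-overlapping pairs, compares the observed counts of HH, HT, TH and TT with the equal counts expected from a fair coin, and computes the Pearson chi-squared statistic. The threshold 7.815 is the standard 5% critical value for 3 degrees of freedom.

```python
from collections import Counter

# 5% critical value of the chi-squared distribution with 3 degrees of freedom.
CRITICAL_5_PERCENT_DF3 = 7.815


def pair_chi_squared(sequence):
    """Pearson chi-squared statistic for non-overlapping pairs of tosses.

    A fair coin makes HH, HT, TH and TT equally likely, so each pair is
    expected in a quarter of the cases. A trailing unpaired toss is ignored.
    """
    pairs = [sequence[i:i + 2] for i in range(0, len(sequence) - 1, 2)]
    if not pairs:
        raise ValueError("need at least two tosses")
    observed = Counter(pairs)
    expected = len(pairs) / 4
    return sum((observed[p] - expected) ** 2 / expected
               for p in ("HH", "HT", "TH", "TT"))


def looks_made_up(sequence):
    """True if the pair frequencies are unlikely (at the 5% level) for a fair coin."""
    return pair_chi_squared(sequence) > CRITICAL_5_PERCENT_DF3


if __name__ == "__main__":
    too_regular = "HT" * 50  # strict alternation, a typical human "random" pattern
    print(pair_chi_squared(too_regular), looks_made_up(too_regular))
```

People who make up sequences tend to alternate too often, which inflates the HT and TH counts. A test like this cannot prove anything about a single sequence: a real coin will fail it about 5% of the time, and a carefully balanced fake will pass.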

[1] I have used the word pseudorandom because every computer program only simulates randomness. True randomness can only be found in some natural phenomena that are not completely understood. Since it is a simulation, generated random numbers come in different "qualities". I am not sure about the one used in BASH, but it is usually good to stick with a higher-quality random number generator for more serious business (e.g. GSL).
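A quick way to see what "pseudo" means is shown in this Python sketch (an illustration with hypothetical names, using Python's own generator and not the one in BASH): a pseudorandom generator started from the same seed always produces the same "random" numbers. The range 0 to 32767 is chosen to mimic BASH's $RANDOM variable.

```python
import random
import secrets


def first_draws(seed, count=5):
    """First `count` integers in 0..32767 from a generator started with `seed`."""
    rng = random.Random(seed)
    return [rng.randrange(32768) for _ in range(count)]


if __name__ == "__main__":
    print(first_draws(42))
    print(first_draws(42))  # same seed, same sequence: fully deterministic
    # For comparison, a draw backed by the operating system's entropy source,
    # which cannot be replayed from a seed.
    print(secrets.randbelow(32768))
```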

Sunday, 5 August 2012

The Ultimate guide to install a broadband wireless dongle on Linux

Yet another mobile broadband dongle is not working out of the box on your beloved Linux distribution.

If you are new to Linux, this might be one of the most annoying problems you will face, as there is a sea of different kinds of these internet dongles and they usually all require different drivers to be detected.

Fortunately, as almost every wireless broadband dongle user seeks help on Linux forums for their particular hardware, there is a lot of help around from which you can work out what your problem is.
But this makes the search messy: beginners often get lost, and reading disjointed posts on what to do is sometimes harder than trying to work out a solution on your own.

Check out the post on my other blog about Linux:

http://www.greplinux.net/2012/07/everything-you-need-to-know-about.html

Wednesday, 13 June 2012

10 Nautilus tips and tricks

This page has been moved to my new blog dedicated to Linux, here.

Tuesday, 24 November 2009

Problem with the avahi daemon during the boot

This page has been moved to my new blog on Linux, here.

Friday, 2 October 2009

Problem with Grsync and (partial) remedy

This post has moved here, to my other blog about Linux, greplinux.net.

Wednesday, 29 July 2009

Stellarium: night sky simulation software

Exploring the educational section of the software available for GNOME, I stumbled upon this incredible program: Stellarium.

You just enter your location and it simulates the sky above you at that moment. Very useful for amateur astronomers or night-sky enthusiasts.

There are also a lot of cool features to make the simulated sky look like the real one: you can adjust the magnitude and the light pollution, speed up or choose the time, label constellations, stars or nebulae (so you can learn the names of stars and constellations), zoom in, and a lot more.

Practical, easy to use and interesting.
It is also available for Windows.

Saturday, 25 July 2009

Firefox and Thunderbird Backup folders (Windows and Linux)

Making backups periodically is a very good habit.

Personally, I've lost a lot of data thanks to thunderstorms, overheated HDDs or just randomly broken hardware, enough to make me back up every single bit of data on my notebook.
Sometimes it's like a curse: if you don't make backups, some lightning strike will fry your hard disk... and it happens... regularly.

I find it very useful to save settings, especially those of my web browser and mail client, which are full of themes, add-ons, feeds, bookmarks and other preferences that I really wouldn't like to lose.
I use Firefox and Thunderbird as web browser and mail client, and I find them the most customizable and smooth software in that field.

So I'd just like to remind you to make a backup of your settings (yes, don't be lazy, make it now!). The profile folders (that contain almost everything) are located in:

Firefox

Windows XP: %APPDATA%\Mozilla\Firefox\Profiles\xxxxxxxx.default\ (note: %APPDATA% is equivalent to C:\Documents and Settings\[User Name]\Application Data)
Linux: ~/.mozilla/firefox/xxxxxxxx.default/

Thunderbird

Windows XP: %APPDATA%\Thunderbird\Profiles\xxxxxxxx.default\
Linux: ~/.thunderbird/xxxxxxxx.default/

A very good tool to make automatic and scheduled backups in Firefox is the FEBE add-on.
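If you prefer to do it by hand, the folders above can simply be packed into an archive. The Python sketch below is just one possible way, under my own assumptions: `backup_folder` and the `~/profile-backups` destination are hypothetical names chosen for the example, and it archives the whole parent folder (for example `~/.mozilla/firefox`) so you do not need to know the `xxxxxxxx` part of the profile name. It is probably safest to close Firefox or Thunderbird before running it.

```python
import shutil
import time
from pathlib import Path


def backup_folder(source, destination):
    """Pack `source` into a timestamped .tar.gz inside `destination`; return its path."""
    source = Path(source).expanduser()
    destination = Path(destination).expanduser()
    if not source.is_dir():
        raise FileNotFoundError(f"no such folder: {source}")
    destination.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # Strip the leading dot so the archive itself is not a hidden file.
    base_name = destination / f"{source.name.lstrip('.')}-{stamp}"
    archive = shutil.make_archive(str(base_name), "gztar",
                                  root_dir=str(source.parent),
                                  base_dir=source.name)
    return Path(archive)


if __name__ == "__main__":
    # Linux locations from the list above; missing folders are skipped.
    for folder in ("~/.mozilla/firefox", "~/.thunderbird"):
        if Path(folder).expanduser().is_dir():
            print("saved", backup_folder(folder, "~/profile-backups"))
        else:
            print("skipped", folder)
```

Keep the destination outside the folder being archived, and copy the resulting file to another disk, otherwise the next thunderstorm takes the backup too.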

Thursday, 23 July 2009

From Windows to Linux?

I haven't written here for a while, since I've been busy installing, using (and enjoying) Linux.

I decided to install the Fedora distribution (a free Red Hat product) in a dual boot, with the intention of using Linux sometimes and keeping Windows as the main OS. I also decided to try Windows 7 RC, so I created three partitions and installed one operating system on each.

My reaction to Windows 7 was not as enthusiastic as I expected. The user interface is very similar (if not identical) to Vista's. Yes, it was much better than Vista in many respects, so I was almost convinced to switch to Windows 7, even if it did not really bring many improvements over XP.

What really shocked me was the compatibility with hardware and software. A lot of programs did not work under Windows 7, which is quite acceptable since it is still an RC. But what upset me were the ATI drivers for Windows 7, made only for the newest graphics cards (and not for my ancient but still working Radeon X600).
Without the drivers, the "powerful" Windows 7 didn't even recognize the resolution of my display (1440x900), so I was forced to use a crappy 800x600.
I immediately removed Windows 7.

Then I installed Fedora 11! In less than 20 minutes the OS was on the hard drive, ready to be used, with all the components installed automatically (and the right display resolution).
Words can hardly describe it: flexible, light, fast, user-friendly.
Since that day I have had no plans to use Windows again.
~~Maybe one of the few drawbacks is the lack of programs~~ (untrue! I could not have been more wrong: the repositories are full of programs, and much cooler ones than on Windows) so I decided to keep Windows for that, but often there are good, if not better, alternatives, and there are many more interesting applications.
I can't list the benefits of using Linux here; there are seriously too many to list them all!

Anyway, I'm not using Windows any more, so I'll quickly write the few posts about Windows programs that I had planned, and then I think I'll write IT posts exclusively about the Linux world.
