Archive for November, 2010

Off To Egypt

November 21st 2010

I am off to Egypt for 2 weeks: a one-week Nile cruise with Voyages Jules Verne, starting at Luxor and working down to the Aswan High Dam via The Valley Of The Kings, and ending in the Moevenpick Hotel in El Gouna. No Cairo, Pyramids or Cairo Museum this trip; another time, though.

I’m particularly excited about Abu Simbel, the Red Chapel of Hatshepsut and the Temple of Amenhotep III, among tons of others.

Also, I am very much looking forward to No Phone and No Computers for the whole trip, the longest I will have been without either for a good few years. I am taking the chance to catch up on some reading and take a break. I will have my Kindle with loads of books on it, but I am also taking The Dots-and-boxes Game: Sophisticated Child’s Play (one of my 101 goals is to make a dots and boxes program), Gödel, Escher, Bach (another 101; started 3 times and never finished, and it has intrigued me since my first year in university) and Winning Ways for your Mathematical Plays. Pen’n’paper geekery for the win!

Posted by tom under 101 & archaeology & jolly | No Comments »

Using R on Hadoop with Rhipe

November 20th 2010

I spent a while this week getting Rhipe, a Java package that integrates the R environment with Hadoop, to work. Forward are pretty heavy users of Hadoop and its supporting ecosystem, so R will be another way for the devs to interrogate the huge (and rapidly growing!) datasets we have.

Installing R
Adding the repository
Create a new file at /etc/apt/sources.list.d/R.list

# R repository
deb http://rh-mirror.linux.iastate.edu/CRAN/bin/linux/ubuntu hardy/

(we are still using hardy, with the Cloudera packages)
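
The step above can be sketched as a small shell helper. This is only a sketch: write_r_repo is a hypothetical name, and the default path assumes the standard apt location /etc/apt/sources.list.d/. The target is a parameter so the helper can be tried on a scratch file before touching the real one.

```shell
# Hypothetical helper (not from the original post): write the CRAN
# repository entry to a sources file. The default path assumes the
# standard apt directory; pass another path to test on a scratch file.
write_r_repo() {
  target="${1:-/etc/apt/sources.list.d/R.list}"
  printf '%s\n' \
    '# R repository' \
    'deb http://rh-mirror.linux.iastate.edu/CRAN/bin/linux/ubuntu hardy/' \
    > "$target"
}
```

Writing the real file needs root, and apt only sees the new repository after sudo apt-get update, which is best run once the signing key has been added.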

Add the gpg keys for the repository

gpg --keyserver pgp.mit.edu --recv-key E2A11821
gpg -a --export E2A11821 | sudo apt-key add -

Install and update R
Easy:

$ sudo apt-get install r-base r-base-dev pkg-config littler
$ sudo R
> update.packages()

Set environment variables for Rhipe
Add to the bottom of /etc/environment:

HADOOP=/usr

Create it for the current session:

$ export HADOOP=/usr
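
A quick sanity check on the variable can save some head-scratching later. The sketch below assumes (this is my reading of HADOOP=/usr, not something the Rhipe docs are quoted on here) that Rhipe expects the launcher at $HADOOP/bin/hadoop; check_hadoop_home is a hypothetical helper name.

```shell
# Hypothetical helper: check that a Hadoop launcher exists under the
# directory named by HADOOP (or by the first argument, if given).
# Assumption: the launcher is expected at <dir>/bin/hadoop.
check_hadoop_home() {
  home="${1:-${HADOOP:-}}"
  if [ -z "$home" ]; then
    echo "HADOOP is not set"
    return 1
  fi
  if [ -x "$home/bin/hadoop" ]; then
    echo "found $home/bin/hadoop"
  else
    echo "no executable at $home/bin/hadoop"
    return 1
  fi
}
```

Calling check_hadoop_home with no argument checks the current session's value; passing a directory checks that directory instead.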

Install protobuf

# wget http://protobuf.googlecode.com/files/protobuf-2.3.0.tar.bz2
# tar jxf protobuf-2.3.0.tar.bz2
# cd protobuf-2.3.0
# ./configure
# make
# make install
# ldconfig
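
Before moving on, it is worth confirming that the build actually landed on the PATH. A minimal sketch, where report_protoc is a hypothetical helper name; protoc --version is the compiler's own version flag.

```shell
# Hypothetical helper: report the protoc version if the compiler is on
# the PATH, or say so plainly if it is not.
report_protoc() {
  if command -v protoc >/dev/null 2>&1; then
    protoc --version
  else
    echo "protoc not found on PATH"
  fi
}

report_protoc
```

If the compiler is missing here, the Rhipe install that follows is unlikely to build, so this is a cheap place to stop and look.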

Install Rhipe

# wget http://www.stat.purdue.edu/~sguha/rhipe/dn/Rhipe_0.64.tar.gz
# R CMD INSTALL Rhipe_0.64.tar.gz

So all is well except that the test code here is a bit off.

For me today,

> library(Rhipe)

only works as root.

It seems that

> rhwrite(list(1,2,3),"/tmp/x")

should be:

> rhwrite(list(1,2,3),"/tmp/x",1)

then

> rhread("/tmp/x")

works properly.

Also in the longer example

map <- expression({
  lapply(seq_along(map.values),function(r){
    x <- runif(map.values[[r]])
    rhcollect(map.keys[[r]],c(n=map.values[[r]],mean=mean(x),sd=sd(x)))
  })
})

## Create a job object
z <- rhmr(map, ofolder="/tmp/test", inout=c('lapply','sequence'),
          N=10,mapred=list(mapred.reduce.tasks=0),jobname='test')

## Submit the job
rhex(z)

## Read the results
res <- rhread('/tmp/test/p*')
colres  <- do.call('rbind', lapply(res,"[[",2))

colres
       n      mean        sd
 [1,]  1 0.4983786        NA
 [2,]  2 0.7683017 0.2937688
 [3,]  3 0.5936899 0.3425441
 [4,]  4 0.3699087 0.2666379
 [5,]  5 0.5179839 0.4060244
 [6,]  6 0.6278925 0.2952608
 [7,]  7 0.4920088 0.2785893
 [8,]  8 0.4592598 0.2674592
 [9,]  9 0.5734197 0.1928496
[10,] 10 0.4942676 0.2989538

Here line 16, the rhread call, has been changed from the original:

res <- rhread('/tmp/test')
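
My guess at why the pattern matters, sketched with a scratch directory on the local disk rather than HDFS: if the job's output folder holds more than the data files, a p* pattern picks out only the entries whose names start with "p". The file names below are an assumption about what a Hadoop job leaves behind, not something taken from the job above.

```shell
# Build a scratch directory that mimics a job output folder.
# Assumption: data files are named part-*, alongside a _logs directory.
out=$(mktemp -d)
touch "$out/part-00000" "$out/part-00001"
mkdir "$out/_logs"

# The glob p* keeps only the entries whose names start with "p",
# so _logs is skipped.
for f in "$out"/p*; do
  basename "$f"
done
```

This prints part-00000 and part-00001 only, which is the same filtering idea as passing '/tmp/test/p*' to rhread.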

Thanks to Saptarshi Guha, the author of Rhipe, for so quickly responding to my query in the group, and also to the authors of this discussion on setting up R in Ubuntu.

Posted by tom under hadoop & r | 1 Comment »

Finding Primes In SICP

November 2nd 2010

I was reading SICP over lunch and found this lovely footnote (#47) on probabilistic methods for deciding if a number is prime.

Numbers that fool the Fermat test are called Carmichael numbers, and little is known about them other than that they are extremely rare. There are 255 Carmichael numbers below 100,000,000. The smallest few are 561, 1105, 1729, 2465, 2821, and 6601. In testing primality of very large numbers chosen at random, the chance of stumbling upon a value that fools the Fermat test is less than the chance that cosmic radiation will cause the computer to make an error in carrying out a “correct” algorithm. Considering an algorithm to be inadequate for the first reason but not for the second illustrates the difference between mathematics and engineering.
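
To see the footnote in action, here is a minimal shell sketch of the Fermat check in the form a^n mod n = a. SICP's version picks the base a at random; this sketch instead tries every base below n, and the names modpow and fermat_ok are my own. It only suits small n, since shell arithmetic is fixed-width.

```shell
# modpow BASE EXP MOD: print BASE^EXP mod MOD by square-and-multiply.
# Shell arithmetic is fixed-width, so keep MOD small (well under 2^31).
modpow() {
  b=$(( $1 % $3 )); e=$2; m=$3; r=1
  while [ "$e" -gt 0 ]; do
    if [ $(( e % 2 )) -eq 1 ]; then r=$(( r * b % m )); fi
    b=$(( b * b % m ))
    e=$(( e / 2 ))
  done
  echo "$r"
}

# fermat_ok N A: succeed when A^N mod N equals A mod N.
fermat_ok() {
  [ "$(modpow "$2" "$1" "$1")" -eq $(( $2 % $1 )) ]
}

# Count how many bases 1..560 the Carmichael number 561 = 3 * 11 * 17 passes.
n=561; a=1; passed=0
while [ "$a" -lt "$n" ]; do
  if fermat_ok "$n" "$a"; then passed=$(( passed + 1 )); fi
  a=$(( a + 1 ))
done
echo "$n passes for $passed of $(( n - 1 )) bases"
```

This prints "561 passes for 560 of 560 bases": the composite 561 gets past every base, whereas an ordinary composite such as 15 is caught straight away (fermat_ok 15 2 fails).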

Posted by tom under lisp & SICP | No Comments »