A Random Walk Through Idea Space: hadoop
Posts tagged with hadoop
http://www.thattommyhall.com
Updated 2011-06-01T11:29:36+01:00, by thattommyhall

Compressing Text Tables In Hive
http://www.thattommyhall.com/2011/06/01/compressing-text-tables-in-hive/
Published 2011-06-01T11:29:36+01:00, updated 2016-06-05T02:27:23+01:00, by thattommyhall

At Forward we have been using Hive for a while. We started out with the default table type (uncompressed text) and wanted to see if we could save some space without losing too much performance.
The wiki page HiveCompressedStorage lists the possibilities.

Using R on Hadoop with Rhipe
http://www.thattommyhall.com/2010/11/20/using-r-on-hadoop-with-rhipe/
Published 2010-11-20T12:42:34+00:00, updated 2016-06-05T02:27:23+01:00, by thattommyhall

I spent a while this week getting Rhipe, a Java package that integrates the R environment with Hadoop, to work. Forward are pretty heavy users of Hadoop and its supporting ecosystem, so R will be another way for the devs to interrogate the huge (and rapidly…

Finding information on Hive tables from HDFS
http://www.thattommyhall.com/2011/05/16/hive-size-hdfs/
Published 2011-05-16T17:42:07+01:00, updated 2016-06-05T02:27:23+01:00, by thattommyhall

I was curious about our Hive tables' total usage on HDFS and what the average file size was with the current partitioning scheme, so I wrote this Ruby script to calculate it.
current = ''
file_count = 0
total_size = 0
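```ruby
# The excerpt above cuts off after the script's setup lines; the rest of the
# post's script is not shown. What follows is only a hedged sketch of how such
# a calculation could proceed -- the warehouse path, the `hadoop fs -ls -R`
# line layout, and the `table_sizes` helper are all illustrative assumptions,
# not taken from the post.
#
# A line of recursive-listing output looks roughly like:
# -rw-r--r--  3 hive supergroup  123456 2011-05-16 17:42 /user/hive/warehouse/logs/part-00000
def table_sizes(ls_lines, warehouse = '/user/hive/warehouse')
  # Per-table tally of file count and total bytes.
  stats = Hash.new { |h, k| h[k] = { files: 0, bytes: 0 } }
  ls_lines.each do |line|
    fields = line.split
    next if line.start_with?('d') || fields.size < 8 # skip directories and headers
    size = fields[4].to_i
    path = fields[7]
    next unless path.start_with?("#{warehouse}/")
    # The first path component under the warehouse is taken as the table name.
    table = path.sub("#{warehouse}/", '').split('/').first
    stats[table][:files] += 1
    stats[table][:bytes] += size
  end
  stats
end
# The average file size per table is then stats[table][:bytes] / stats[table][:files],
# which could be written out as CSV rows via the `output` handle opened below.
```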
output = File.open('output.csv','w')

Berlin Buzzwords
http://www.thattommyhall.com/2011/06/09/berlin-buzzwords/
Published 2011-06-09T13:07:50+01:00, updated 2016-06-05T02:27:23+01:00, by thattommyhall

I have just returned from Berlin Buzzwords. It was a great conference and well organised, so thanks to the organisers.
As all the talks will be online soon, I will just mention a few things that I enjoyed.
The two keynotes were excellent, Doug Cutting…