Finding information on Hive tables from HDFS
I was curious how much HDFS space our Hive tables use in total, and what the average file size is under the current partitioning scheme, so I wrote this Ruby script to calculate it.

current = ''
file_count = 0   # running count of files
total_size = 0   # running total of file sizes
output = File.open('output.csv', 'w')
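The same kind of per-table tally can be sketched as follows. This is not the original script. It assumes the input is the text printed by `hdfs dfs -ls -R` over Hive's default warehouse directory (`/user/hive/warehouse`), and the helper names `table_stats` and `write_csv` are hypothetical. Each file line is attributed to a table (`<table>` for the default database, `<db>.db/<table>` otherwise), and the CSV gets the file count, total bytes, and average file size per table.

```ruby
require 'csv'

# Assumed: Hive's default warehouse directory (hive.metastore.warehouse.dir).
# Adjust this if your cluster overrides it.
WAREHOUSE = '/user/hive/warehouse'

# Tally file count and total bytes per table from lines in the format
# printed by `hdfs dfs -ls -R`:
#   permissions replication owner group size date time path
def table_stats(lines, root = WAREHOUSE)
  prefix = root.chomp('/') + '/'
  stats = {}
  lines.each do |line|
    # Split into at most 8 fields so a path containing spaces stays whole.
    fields = line.strip.split(' ', 8)
    # Skip short lines (e.g. "Found N items") and directory entries.
    next if fields.size < 8 || fields[0].start_with?('d')
    path = fields[7]
    next unless path.start_with?(prefix)
    parts = path.delete_prefix(prefix).split('/')
    next if parts.size < 2 # a file sitting directly in the warehouse root
    # Tables in non-default databases live under <db>.db/<table>.
    key = parts[0].end_with?('.db') ? parts[0, 2].join('/') : parts[0]
    entry = (stats[key] ||= { files: 0, bytes: 0 })
    entry[:files] += 1
    entry[:bytes] += fields[4].to_i
  end
  stats
end

# Write one CSV row per table, sorted by name.
# The average uses integer division, so it is rounded down to whole bytes.
def write_csv(stats, io)
  csv = CSV.new(io)
  csv << %w[table file_count total_bytes avg_file_bytes]
  stats.sort_by { |table, _| table }.each do |table, s|
    csv << [table, s[:files], s[:bytes], s[:bytes] / s[:files]]
  end
end
```

Against a live cluster, something like `lines = IO.popen(['hdfs', 'dfs', '-ls', '-R', WAREHOUSE], &:readlines)` followed by `File.open('output.csv', 'w') { |f| write_csv(table_stats(lines), f) }` should produce the report. The size column from `-ls` is the file length before replication, so the raw disk footprint is roughly the total multiplied by each file's replication factor (the second field of the listing).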