I was curious about our Hive tables total usage on HDFS and what the average filesize was with the current partitioning scheme so wrote this ruby script to calculate it.
Lots of our files were small so I am going to experiment with different partitioning and compression schemes.
One of our servers running MongoDB crashed today and we encountered the typical
old lock file: /var/lib/mongodb/mongod.lock. probably means unclean shutdown
recommend removing file and running –repair
see: http://dochub.mongodb.org/core/repair for more information
As the docs do not seem to have much of an alternative to running –repair I looked for a way to automate it from upstart. Mongo creates a mongod.lock file in the data directory with the processes PID in and on a safe shutdown removes the PID, leaving the file there.
This upstart scripts includes a pre-start script that checks if the lock file exists, reads it, makes sure there is a PID there, makes sure no mongod processes exist with that PID then performs the repair as the mongodb user.