Signals In Ruby / “rescue Exception” considered harmful

February 24th 2011

Yesterday we had an issue with the different behaviour of “kill” and “kill -9”, and in the process I had to refresh my knowledge of Unix signals, learn how to handle them in Ruby and properly learn Ruby's exception hierarchy.

To -9 or not to -9?
The Unix kill command is perhaps strangely named, as it actually sends signals to processes (see “man signal” for a full list). It defaults to sending SIGTERM, and the application writer can decide how to treat it by “trapping” it, allowing for a safe shutdown, debug dumps etc. “kill -9” sends SIGKILL; SIGKILL and SIGSTOP cannot be caught, blocked, or ignored by your programs.
I think in the first instance you should just use “kill”, give the app the chance to do the right thing then get -9 on its ass if you need to.

Catching signals in Ruby

puts "I have PID #{Process.pid}"

Signal.trap("USR1") {puts "prodded me"}

loop do
  sleep 5
  puts "doing stuff"
end

This is about the simplest code that will trap the “USR1” signal (which you can send with “kill -USR1” followed by the PID the script prints). The USR1 and USR2 signals are left free for you to use however you wish in your applications.

If you look at the image below you can see that it responds to the USR1 signal I send it, and kill (i.e. sending SIGTERM) also works.
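
The same trap mechanism covers the “safe shutdown” case mentioned earlier. What follows is only a minimal sketch, not code from the original app: the global flag and the run_until_stopped helper are names I made up, and the process sends SIGTERM to itself purely so the example finishes on its own.

```ruby
# Flag set by the signal handler; a global keeps the handler trivial.
$shutdown_requested = false

# Keep the handler tiny: only record that SIGTERM arrived.
Signal.trap("TERM") { $shutdown_requested = true }

# Hypothetical worker loop: checks the flag between units of work, so the
# current unit always finishes before the process stops.
def run_until_stopped(max_iterations)
  done = 0
  max_iterations.times do
    break if $shutdown_requested
    done += 1
    yield done
    sleep 0.05
  end
  done
end

iterations = run_until_stopped(100) do |n|
  # Demonstration only: pretend someone ran "kill" against this process.
  Process.kill("TERM", Process.pid) if n == 3
end

puts "stopped cleanly after #{iterations} iterations"
```

The point of the flag is that the loop decides when it is safe to stop, rather than being interrupted halfway through a piece of work.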

The following two code snippets are the same except that one uses a bare rescue, which catches StandardError by default, and the other rescues Exception (i.e. any exception).

#sig-rescue.rb
puts "I have PID #{Process.pid}"

Signal.trap("USR1") {puts "prodded me"}

loop do
  begin
    puts "doing stuff"
    sleep 10
  rescue => e
    puts e.inspect
  end
end


So that still works as before and errors in our “do stuff” loop would get caught.

#sig-rescue-E.rb
puts "I have PID #{Process.pid}"

Signal.trap("USR1") {puts "prodded me"}

loop do
  begin
    puts "doing stuff"
    sleep 10
  rescue Exception => e
    puts e.inspect
  end
end


This fails though. You can see that SIGTERM no longer works, and CTRL-C from the terminal does not work either. This is because “rescue Exception” also catches the SignalException (or its subclass Interrupt, for CTRL-C) that Ruby raises when those signals arrive. “kill -9” still worked though, as SIGKILL cannot be caught and so will kill any application.

Ruby's Exception Hierarchy
The full exception hierarchy (from the excellent cheat gem) is

Tom-Halls-MacBook-Pro:signal tomh$ cheat exceptions
exceptions:
  Exception
   NoMemoryError
   ScriptError
     LoadError
     NotImplementedError
     SyntaxError
   SignalException
     Interrupt
       Timeout::Error    # require 'timeout' for Timeout::Error
   StandardError         # caught by rescue if no type is specified
     ArgumentError
     IOError
       EOFError
     IndexError
     LocalJumpError
     NameError
       NoMethodError
     RangeError
       FloatDomainError
     RegexpError
     RuntimeError
     SecurityError
     SocketError
     SystemCallError
     SystemStackError
     ThreadError
     TypeError
     ZeroDivisionError
   SystemExit
   fatal

I think you should only catch StandardError or its children, and possibly some of its siblings. Avoid catching Exception, as you probably don't want to change how the process deals with signals (you could trap them if you need to).
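
To see the difference without sending real signals, here is a small sketch that raises a few exceptions by hand and reports which kind of rescue picks them up. The rescued_by helper is a name invented for this example.

```ruby
# Reports which rescue clause ends up handling the given exception.
def rescued_by(exception)
  begin
    raise exception
  rescue               # bare rescue: StandardError and its children only
    :bare_rescue
  end
rescue Exception       # anything that slipped past the bare rescue
  :rescue_exception_only
end

samples = [
  ZeroDivisionError.new("divided by 0"),
  SignalException.new("TERM"),   # what an untrapped SIGTERM raises
  Interrupt.new("ctrl-c"),       # what CTRL-C raises
  SystemExit.new                 # what Kernel#exit raises
]

samples.each do |exception|
  puts "#{exception.class}: #{rescued_by(exception)}"
end
```

Only the ZeroDivisionError is handled by the bare rescue; the signal and exit exceptions pass straight through it, which is exactly why the bare form leaves “kill” and CTRL-C working.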

Posted by tom under Ruby | 1 Comment »

Ruby On Windows – Forking other processes

February 20th 2011

While moving our VM deployment site, written in Sinatra, to a Windows machine with the VMware PowerCLI toolkit installed, the only snag was where we forked a process to do the preparation of the machines. Both Kernel.fork and Process.detach seemed to have issues.

Original MRI on Linux

  def build
    pid = fork { run_command }
    Process.detach(pid)
  end

  def run_command
    `sudo /opt/script/deployserver/setupnewserver.sh -p #{poolserver} -i #{ip} -s #{@size} -v #{@vlan} -a "#{@owner}" -n #{@name} -e "#{@email}"`
  end

IronRuby
We tried IronRuby and the same bit of the script broke as on win32 MRI (though I was pleased and surprised that Sinatra worked).

  def build
    WindowsProcess.start "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe",
"-PSConsoleFile \"C:\\Program Files (x86)\\VMware\\Infrastructure\\vSphere PowerCLI\\vim.psc1\" \"& C:\\script\\DataStoreUsage.ps1\""
  end

This uses the following .NET wrapper class:

class WindowsProcess
  def self.start(file, arguments)
    process = System::Diagnostics::Process.new
    process.StartInfo.FileName = file
    process.StartInfo.CreateNoWindow = true
    process.StartInfo.Arguments = arguments
    process.Start
  end
end

Workaround using Windows “start” command
I had hoped the module at win32utils would let me just use the original script, but fork still did not work properly.

def build
  # The PowerShell "& script args" expression is wrapped in escaped quotes; the closing \" was missing.
  commandstr = "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe -PSConsoleFile \"C:\\Program Files (x86)\\VMware\\Infrastructure\\vSphere PowerCLI\\vim.psc1\" \"& C:\\Sites\\vmdeploy\\PrepNewMachine.ps1 -type #{@type} -machinename #{@name} -size #{@size} -vlan #{@vlan} -creator #{@owner} -creatoremail #{@email} -ipaddress #{ip}\""

  # "start" launches the command in a new process and returns immediately
  system("start #{commandstr} > ./log/#{@name}.log 2>&1")
end

This uses the Windows “start” command and works pretty well.
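
One more option on Ruby 1.9 and later is Process.spawn, which, unlike Kernel.fork, is implemented on Windows. This is only a sketch of the idea and not something I ran against the PowerCLI setup above: run_in_background is a made-up helper, and the command it launches here is just Ruby printing a word.

```ruby
require "rbconfig"
require "tmpdir"

# Start a command without blocking the caller, sending its stdout and stderr
# to log_path. Returns the watcher thread created by Process.detach.
def run_in_background(command, log_path)
  pid = Process.spawn(*command, out: log_path, err: [:child, :out])
  Process.detach(pid) # reaps the child when it exits
end

log_path = File.join(Dir.tmpdir, "spawn-demo-#{Process.pid}.log")
watcher = run_in_background([RbConfig.ruby, "-e", "puts 'prepared'"], log_path)

# A web request would normally return here; we wait only to show the result.
status = watcher.value
puts "exit status #{status.exitstatus}, log says: #{File.read(log_path).strip}"
```

Passing the command as an array avoids going through the shell, so the nested quote escaping needed for “start” goes away.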

Posted by tom under Ruby & VMware | No Comments »

Measuring Disk Usage In Linux (%iowait vs IOPS)

February 18th 2011

This occurred to me when looking at our Hadoop servers today: lots of our devs use IOWait as an indicator of IO performance, but there are better measures. IOWait is a CPU metric, measuring the percentage of time the CPU is idle while waiting for an I/O to complete. Strangely, it is possible to have a healthy system with nearly 100% iowait, or a disk bottleneck with 0% iowait. A much better approach is to look at disk IO directly, and the metric you want is IOPS (IO Operations Per Second).

Measuring IOPS
In Linux I like the iostat command, though there are a few ways to get at the info. In Debian/Ubuntu it is in the sysstat package (i.e. sudo apt-get install sysstat).

root@MACHINENAME:/home/deploy# iostat 1
Linux 2.6.24-28-server (MACHINENAME.forward.co.uk) 	18/02/11

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          45.51    0.00    1.85    0.62    0.00   52.03

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
cciss/c0d0        4.00         0.00        40.00          0         40
cciss/c0d1        4.00         0.00        64.00          0         64
cciss/c0d2       12.00         0.00       248.00          0        248
cciss/c0d3        0.00         0.00         0.00          0          0
cciss/c0d4       25.00         0.00       320.00          0        320
cciss/c0d5        0.00         0.00         0.00          0          0
cciss/c0d6       30.00         0.00       344.00          0        344
cciss/c0d7       42.00      3144.00         0.00       3144          0

iostat 1 refreshes every second; if you run it with a longer interval it will average the results over that interval. Note that the first report iostat prints covers the time since boot. tps is what you are interested in: Transactions Per Second (i.e. IOPS). -x gives more detailed output, separating out reads and writes and showing how much data is going in and out per second.
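
If you want to record these numbers rather than eyeball them, pulling tps out of a report is straightforward. The sketch below assumes the plain iostat layout shown above, where tps is the second column of the device table; tps_by_device is a name invented for this example, and the sample text is two rows from the output above.

```ruby
# Returns { device name => tps } from the text of one plain iostat report.
def tps_by_device(iostat_output)
  in_device_table = false
  iostat_output.each_line.with_object({}) do |line, tps|
    fields = line.split
    if fields.empty?
      in_device_table = false          # a blank line ends the device table
    elsif fields.first.start_with?("Device")
      in_device_table = true           # header row: data rows follow
    elsif in_device_table
      tps[fields[0]] = Float(fields[1])
    end
  end
end

sample = <<~REPORT
  Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
  cciss/c0d0        4.00         0.00        40.00          0         40
  cciss/c0d7       42.00      3144.00         0.00       3144          0
REPORT

tps = tps_by_device(sample)
puts tps.inspect
puts "total IOPS: #{tps.values.sum}"
```

Logging the per-device figures regularly is what gives you the baseline discussed in the next section.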

What is a good or bad number though?
As with most metrics, if the first time you look at it is when you are in trouble then it’s less helpful. You should have an idea of how much IO you typically do, then if you experience issues and are doing 10x that or only getting 1/10 from the disks then you have a good candidate explanation for them.

How much can I expect from my storage?
It depends on how fast the disks are spinning, and how many there are.
As a rule of thumb I assume for a single disk:
7.2k RPM -> ~100 IOPS
10k RPM -> ~150 IOPS
15k RPM -> ~200 IOPS
Our Hadoop servers were pushing about 70 IOPS to each disk at peak and they are 7.2k ones, so that is in line with this estimate.

See here for a breakdown of why these are good estimates for random IOs from a single disk. Interestingly a large amount of it comes from the latency of the platter spinning, which is why SSDs do so well for random IO (Compared to a 15k disk, ~50x for writes, ~200x reads)
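
As a rough back-of-envelope version of that breakdown: on average the platter has to turn half a revolution before the data comes under the head, and the head also has to seek. The rotational part below is plain arithmetic; the average seek times are assumed ballpark figures of mine, not measurements or numbers from the linked article.

```ruby
# Average rotational latency in ms: half a revolution at the given speed.
def rotational_latency_ms(rpm)
  60_000.0 / rpm / 2
end

# Rough random IOPS for one disk: one IO per (seek + rotational latency).
def estimated_iops(rpm, avg_seek_ms)
  1000.0 / (avg_seek_ms + rotational_latency_ms(rpm))
end

# ASSUMED average seek times in ms per spindle speed (illustrative only).
ASSUMED_SEEK_MS = { 7_200 => 8.5, 10_000 => 4.5, 15_000 => 3.5 }

ASSUMED_SEEK_MS.each do |rpm, seek_ms|
  puts format("%5d RPM: %.2f ms rotational latency, ~%d IOPS",
              rpm, rotational_latency_ms(rpm), estimated_iops(rpm, seek_ms).round)
end
```

With these assumed seek times the estimates come out a little below the rule-of-thumb figures above but in the same ballpark, and they make it clear why removing the mechanical delays helps random IO so much.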
See also:
A concrete example of faster CPU causing higher %iowait while actually doing more transactions here

Extreme Linux Performance Monitoring and Tuning: Part 1 (pdf) and Part 2 (pdf) from ufsdump.org/

Posted by tom under linux | No Comments »

Running Any Executable As A Windows Service (Ruby / Sinatra)

February 14th 2011

While migrating an automated VM deployment page from a combination of Sinatra on Linux and Bash scripts using the Perl toolkit to a simpler script using the VMware PowerCLI that I love so much, I needed to create a Windows service from the Sinatra app. I had to do some googling, so I thought I would share how I did it.

You only need two things – the built-in “sc” command and an executable from Windows Server 2003 Resource Kit Tools called srvany (works with 2008 too). Get just that exe here (if you trust me of course ;-) )

Creating the service

Check it exists

Set Parameters In The Registry
Configure it at HKLM\SYSTEM\CurrentControlSet\Services\APPNAME\Parameters

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\VMdeploy\Parameters]
"Application"="C:\\Ruby192\\bin\\ruby"
"AppParameters"="C:\\Sites\\vmdeploy\\server.rb -p 80"
"AppDirectory"="C:\\Sites\\vmdeploy"
"AppEnvironment"=hex(7):65,00,78,00,61,00,6d,00,70,00,6c,00,65,00,3d,00,32,00,\
  37,00,00,00,62,00,6c,00,61,00,68,00,3d,00,63,00,3a,00,5c,00,74,00,65,00,6d,\
  00,70,00,66,00,69,00,6c,00,65,00,73,00,00,00,00,00

Note that AppEnvironment is a multi-string value (REG_MULTI_SZ, which is what hex(7) means in a .reg file); the rest are plain strings.
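
The hex(7) data is just the list of strings encoded as UTF-16LE, each followed by a NUL, with an extra NUL to end the list. If you would rather generate or check it than type it by hand, something like this sketch works; encode_multi_sz and decode_multi_sz are names made up for the example.

```ruby
# Turn a list of strings into the comma-separated hex used by hex(7) values.
def encode_multi_sz(strings)
  raw = (strings.join("\0") + "\0\0").encode("UTF-16LE")
  raw.bytes.map { |byte| format("%02x", byte) }.join(",")
end

# Turn hex(7) data (line continuations and whitespace allowed) back into strings.
def decode_multi_sz(hex)
  bytes = hex.scan(/\h{2}/).map { |pair| pair.to_i(16) }
  text = bytes.pack("C*").force_encoding("UTF-16LE").encode("UTF-8")
  text.split("\0")
end

# The AppEnvironment data from the .reg file above.
app_environment = <<~HEX
  65,00,78,00,61,00,6d,00,70,00,6c,00,65,00,3d,00,32,00,\\
    37,00,00,00,62,00,6c,00,61,00,68,00,3d,00,63,00,3a,00,5c,00,74,00,65,00,6d,\\
    00,70,00,66,00,69,00,6c,00,65,00,73,00,00,00,00,00
HEX

puts decode_multi_sz(app_environment).inspect
puts encode_multi_sz(["example=27"])
```

Decoding the value from the post shows it sets two variables, example=27 and blah=c:\tempfiles.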

This lets you run any executable file, change the directory you run it from and pass any arguments or environment variables so should cover most use cases.

I will be sharing the code for both the Sinatra app and the PowerShell deploy script in later posts.

Posted by tom under Ruby & sinatra & VMware & windows | 2 Comments »

2010 Retrospective

January 8th 2011

Well, 2010 was a great year – thanks to all the great people who made it so.

Just a brief outline:

Work: Started the year down in Kent working on a XenApp deployment on VMware, then over in Dubai building a VMware View solution, went to Libya and built the corporate infrastructure for their biggest telco, spent 2 months working on ING's next gen datacentre in the Hague and ended in London for Forward working on merging USwitch’s infrastructure after the acquisition.
While this was fun, I got fatigued with traveling and living out of a suitcase and was very happy to settle down a bit in London (which I think is the greatest city in the world) and work for a great company which I am very pleased and proud to say I have joined permanently. It is exciting to work somewhere using so much great tech and with so many sound people (with huge brains).

Jollys: I managed to go to Russia, visiting Moscow and St Petersburg (OK, that was Dec 2009 but what the hell), the Czech Republic, Italy, Oklahoma for a friend's wedding followed by a visit to two Ancestral Puebloan sites, Berlin, Paris, Vegas and Egypt. I feel privileged to have had the opportunity to do all this and am still a bit amazed at just how much happened.

Life: I have always thought there was an embarrassment of riches when I look at the great people I get to call my friends. I don’t get to see any of them enough and no amount is too much. Thanks to you all, I always say you are what makes the universe great for me, you conspire to make it so even though you don’t all know one another. The biggest change this year was finding someone patient enough to pair up with me full time, thanks Petra – I love you. We are moving in together soon and I look forward to having a permanent home in the greatest city in the world, with a spare room – please come visit! Thanks to my family too, we had some bad news in 2010 but you all dealt with it with the usual aplomb, you are all ace.

Resolutions: There are no new resolutions for 2010; I will be reporting on the 101 goals in 1001 days soon.

2011: Looks set to be the best year ever, thanks in advance for helping make it so

Posted by tom under 101 & Life | No Comments »

Donating To Wikipedia

December 31st 2010

I realised when auditing my delicious bookmarks recently how much I rely on Wikipedia to look things up and today donated for the first time.

I had previously moaned about seeing Jimmy Wales’s face every time I logged in, like this

and laughed my head off at this piss take from The Daily What:
JimmyFace

I found today at Information Is Beautiful the following demonstration of just how effective the campaign has been though.

Wikimedia have done some nice analysis of the campaign on the Meta Wiki if you are interested.

Give to Wikipedia here

Posted by tom under Uncategorized | No Comments »

Load Based NIC Teaming vs Link Aggregation

December 21st 2010

I remembered seeing Simon Long’s comment on twitter a few weeks ago and it was rattling around in the back of my mind.

Will #VMware Load-Based Teaming remove the need for #Cisco EtherChannel? Discuss….

I long ago investigated NIC Teaming algorithms and settled on IP Hash with Cisco EtherChannels for most environments, only really using something else if the client happened not to have stacked switches. Thanks to Scott Lowe for this superb article on the matter.

When vSphere 4.1 came out with Load Based Teaming, I was pleased that at last we had an algorithm that would have a go at proper load balancing and not just load distribution but had not got round to investigating much more.

At Forward we have just updated to 4.1, Enterprise Plus and have bought some shiny new Extreme Summit X650 Series 10G switches; so Simon’s comment was particularly apropos.

I had decided I wanted to try and use LBT but was unsure if I should port-channel the uplink ports. It turns out you can’t. To be honest, I thought maybe you should. The dvSwitch guide does not mention it as far as I can see, but the ESX host requirements for link aggregation KB (updated today) is very clear:

  • The switch must be set to perform 802.3ad link aggregation in static mode ON and the virtual switch must have its load balancing method set to Route based on IP hash.
  • Enabling either Route based on IP hash without 802.3ad aggregation or vice-versa disrupts networking

i.e. you need both IP Hash and EtherChannel, and neither will work without the other.

In answer to Simon’s question, my feeling is you may still get better performance from EtherChannel and IP based hash for some workloads but would guess “usually” LBT wins. I think the case where you may get better utilisation is when certain VMs have very high bandwidth requirements to different IPs. As described here IP Hash is the only way to allow traffic from one vNIC to leave over different pNICs at the same time.

It is interesting that even with LBT, bandwidth for an individual VM or vmkernel port is still limited to the maximum a single pNIC can provide. IP hash will also not get higher than a single pNIC for a vMotion or any other point to point connection. So 10G is going to perform better for these operations than 10x1G, however you team them.
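
To make the point to point limit concrete, here is a toy model of hash-based uplink selection. It is deliberately simplified and is an assumption of mine, not VMware's exact algorithm: it XORs the two addresses and takes the result modulo the number of uplinks. uplink_for and the addresses are made up for the example.

```ruby
require "ipaddr"

# Toy IP-hash: pick an uplink index from the source/destination address pair.
# ASSUMPTION: a simplified model for illustration, not the real implementation.
def uplink_for(source_ip, destination_ip, uplink_count)
  (IPAddr.new(source_ip).to_i ^ IPAddr.new(destination_ip).to_i) % uplink_count
end

vm = "10.0.0.5"
%w[10.0.0.1 10.0.0.2 10.0.0.1].each do |destination|
  puts "#{vm} -> #{destination}: pNIC #{uplink_for(vm, destination, 2)}"
end
```

One VM talking to two different destinations can be spread over both pNICs, but any single source and destination pair always hashes to the same one, so a vMotion between two hosts never gets more than one link's worth of bandwidth.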

Posted by tom under VMware | No Comments »

Hyper9 Saves The Day

December 20th 2010

We recently bought the Hyper9 capacity planning, reporting and monitoring solution for our VMware infrastructure and I quite soon made use of it to troubleshoot some problems reported to us like backups taking longer and databases being slower than normal.

In the 3PAR storage I could see that IO was unusually high of late.

Then I looked at the top-n datastores by IOPS and graphed them

A huge jump for sharedstorage8, so I looked at its VMs

and found the culprit VM.

Here it is against our big “Superhero” database and the vCenter server with the DB.

A lot of IO from a machine the owner thought was doing nothing!

Hyper9 is a pretty good tool for reporting, alerting and troubleshooting your VMware infrastructure. The query language is Lucene-based, which gives you lots of options for creating custom views and alerts.

Posted by tom under VMware | No Comments »

Visualising Tommy

December 19th 2010

Here are two Wordle visualisations, first the words used in my blog. I like reading obviously but it seems to be quite heavy on stuff from the single article I wrote on Rhipe – unusual words I suppose.

The second is my del.icio.us tags, when is later going to happen?

With delicious closing I am looking at alternatives, send recommendations if you have any.

Posted by tom under random | 1 Comment »

Forward To Vegas

December 17th 2010

Well, I guess I need to say something about my company's Christmas present to all ~150 of its staff: a three day trip to Vegas.

We flew out last Thursday and stayed for 3 nights at the Wynn, which is a great hotel.
Vegas2010-46.JPG

Thursday: Arrive and sleep (forgive me I only just got back from Egypt after flying to Manchester instead of London and having to sit all night on a freezing cold coach!).

Friday: Flew in a helicopter into the Grand Canyon which was rather awesome.
Vegas2010-12.JPG
Vegas2010-29.JPG

Sat: Went to see Zumanity by Cirque du Soleil, it was amazing, I must see them again.

Sunday: A few of us went shooting,
An M16
Vegas2010-61.JPG

An H&K MP5
Vegas2010-59.JPG

A Mac-10
Vegas2010-66.JPG
not firing it gangster style unfortunately.

A Tommygun
Vegas2010-62.JPG

A Desert Eagle

And I did a Shotgun

And I’ve got the T-shirt to prove it
Vegas2010-69.JPG
Peace Through Superior Firepower indeed

Pics On Flickr

Vids On Youtube

Posted by tom under travel | 1 Comment »
