Load-Based NIC Teaming vs Link Aggregation

I remembered seeing Simon Long's comment on Twitter a few weeks ago, and it had been rattling around in the back of my mind:

Will #VMware Load-Based Teaming remove the need for #Cisco EtherChannel? Discuss….

I investigated NIC teaming algorithms long ago and settled on IP Hash with Cisco EtherChannel for most environments, only really using something else if the client happened not to have stacked switches. Thanks to Scott Lowe for this superb article on the matter.

When vSphere 4.1 came out with Load-Based Teaming (LBT), I was pleased that at last we had an algorithm that would have a go at proper load balancing rather than just load distribution, but I had not got round to investigating it much further.
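To make that distinction concrete, here is a rough Python sketch of the difference as I understand it: the static policies place a virtual port on a pNIC once and never revisit the decision, whereas LBT periodically looks at actual pNIC utilisation (as I understand it, the trigger is around 75% mean utilisation checked over a 30-second window) and moves ports off a congested uplink. The function names and the rebalancing details below are mine, not VMware's.

```python
# Illustrative sketch only -- not VMware's implementation, just the idea.

def distribute(port_id, num_uplinks):
    """Static 'load distribution': a virtual port is pinned to an uplink by a
    simple hash of its ID and stays there regardless of traffic."""
    return port_id % num_uplinks

def rebalance(port_to_uplink, uplink_utilisation, threshold=0.75):
    """LBT-style 'load balancing': if an uplink's mean utilisation over the
    sampling window exceeds the threshold, move one of its ports to the
    least-loaded uplink. Which port gets moved here is arbitrary; the real
    selection logic is VMware's own."""
    least_loaded = min(uplink_utilisation, key=uplink_utilisation.get)
    for uplink, load in uplink_utilisation.items():
        if load > threshold and uplink != least_loaded:
            for port, assigned in port_to_uplink.items():
                if assigned == uplink:
                    port_to_uplink[port] = least_loaded
                    return port_to_uplink  # move one port per pass
    return port_to_uplink
```

The point is simply that LBT reacts to observed load, while the static policies never do.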

At Forward we have just upgraded to vSphere 4.1 Enterprise Plus and bought some shiny new Extreme Summit X650 Series 10G switches, so Simon's comment was particularly apropos.

I had decided I wanted to try LBT but was unsure whether I should also port-channel the uplink ports. It turns out you can't. To be honest, I thought maybe you should; the dvSwitch guide does not mention it as far as I can see, but the ESX host requirements for link aggregation KB (updated today) is very clear:

  • The switch must be set to perform 802.3ad link aggregation in static mode ON and the virtual switch must have its load balancing method set to Route based on IP hash.

  • Enabling either Route based on IP hash without 802.3ad aggregation or vice-versa disrupts networking

i.e. you need both IP Hash and EtherChannel together, and neither will work without the other.
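The KB's rule boils down to a simple truth table: static EtherChannel on the physical switch and Route based on IP hash on the vSwitch must be enabled together or not at all. A trivial sketch of the valid combinations (my own naming, not any VMware API):

```python
def teaming_config_ok(switch_static_lag: bool, vswitch_ip_hash: bool) -> bool:
    """Per the link aggregation KB: the two settings must match.
    Enabling either one without the other disrupts networking."""
    return switch_static_lag == vswitch_ip_hash

# Valid combinations
assert teaming_config_ok(True, True)    # static EtherChannel + IP Hash
assert teaming_config_ok(False, False)  # neither, e.g. LBT on the vSwitch side
# Broken combinations
assert not teaming_config_ok(True, False)
assert not teaming_config_ok(False, True)
```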

In answer to Simon's question, my feeling is that you may still get better performance from EtherChannel and IP Hash for some workloads, but I would guess LBT "usually" wins. The case where IP Hash may give better utilisation is when certain VMs have very high bandwidth requirements to many different IPs: as described here, IP Hash is the only way to allow traffic from one vNIC to leave over different pNICs at the same time.
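To illustrate why IP Hash can spread a single vNIC's traffic, here is a simplified sketch of an IP-hash style uplink selection: the uplink is derived from the source and destination addresses, so different destinations from the same VM can land on different pNICs. The XOR-of-last-octets formula below is only an approximation for illustration, and the addresses are made up; the exact hash ESX uses may well differ.

```python
import ipaddress

def choose_uplink(src_ip: str, dst_ip: str, num_uplinks: int) -> int:
    """Simplified IP-hash style selection: XOR the last octet of the source
    and destination addresses, modulo the number of uplinks.
    Not necessarily the exact hash ESX uses."""
    src_last = int(ipaddress.ip_address(src_ip)) & 0xFF
    dst_last = int(ipaddress.ip_address(dst_ip)) & 0xFF
    return (src_last ^ dst_last) % num_uplinks

# One busy VM (10.0.0.50) talking to several destinations can leave over
# different pNICs in a four-uplink team:
for dst in ("10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"):
    print(dst, "-> uplink", choose_uplink("10.0.0.50", dst, 4))
```

By contrast, LBT and the other non-IP-hash policies map a given virtual port to one pNIC at a time, which is why a single very busy VM with many destinations is the case where IP Hash can still win.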

It is interesting that even with LBT, an individual VM or vmkernel port is still limited to the bandwidth a single pNIC can provide; likewise, IP Hash will not get above a single pNIC for a vMotion or other point-to-point connection, because the source and destination IP pair is fixed and always hashes to the same uplink. So 10G is going to perform better for these operations than 10 x 1G, however you team them.
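As a quick worked example under the same simplified hash as above (addresses made up): a vMotion between vmkernel ports at 10.0.0.11 and 10.0.0.12 always computes 11 XOR 12 = 7, and 7 mod 4 = 3 on a four-NIC team, so every packet of that stream leaves over the same uplink. On 1G NICs that stream tops out at roughly 1 Gbit/s however many links are in the team, whereas a single 10G pNIC can give it the full 10 Gbit/s.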