Highly Scalable: Issue: Load Balancer Errors on the First Request After an Idle Period

WEB: Webserver IP
API: Web API Server IP

Saturday, August 11, 2012

Today's unexpected load balancer behavior!

It hasn't been resolved yet, but here is what the packet analysis in Wireshark shows:

When the VIP has received no calls for a certain period and a new call is then made, a communication error occurs.

IP addresses are replaced with the placeholders WEB and API:

No	Source	Destination	Source MAC	Destination MAC	Flag	Sequence	Window	MSS	ACK	WS	SACK Perm	Description
1	WEB	API	MS_b7:1c:10	Radware64	SYN	0	8192	1460		256	1	Initiating communication with API server, are you ready?
2	API	WEB	MS_b7:1c:39	MS_b7:1c:10	SYN+ACK	0	8192	1460	1	256	1	Ready! Send it over!
3	WEB	API	MS_b7:1c:10	Radware64	ACK	1	131328		1			Please execute this API request.
4	API	WEB	MS_b7:1c:38	MS_b7:1c:10	RST	1	0					What nonsense is this?
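
The detail that stands out in the capture is that the SYN+ACK and the RST both claim to come from API, yet they carry different source MAC addresses. A minimal Python sketch of that check is below, using only the rows from the table above. The names `PACKETS`, `macs_per_source`, and `ambiguous_sources` are made up for illustration, and the rows are typed in by hand rather than read from a real capture file.

```python
from collections import defaultdict

# Rows copied by hand from the capture: (no, source, source_mac, flag)
PACKETS = [
    (1, "WEB", "MS_b7:1c:10", "SYN"),
    (2, "API", "MS_b7:1c:39", "SYN+ACK"),
    (3, "WEB", "MS_b7:1c:10", "ACK"),
    (4, "API", "MS_b7:1c:38", "RST"),
]


def macs_per_source(packets):
    """Map each source address to the set of source MACs seen for it."""
    seen = defaultdict(set)
    for _no, source, source_mac, _flag in packets:
        seen[source].add(source_mac)
    return dict(seen)


def ambiguous_sources(packets):
    """Return only the sources that were seen with more than one MAC."""
    return {
        source: sorted(macs)
        for source, macs in macs_per_source(packets).items()
        if len(macs) > 1
    }


if __name__ == "__main__":
    for source, macs in ambiguous_sources(PACKETS).items():
        print(f"{source} answered from {len(macs)} MACs: {', '.join(macs)}")
```

With the four rows above, only API is flagged, which suggests that two different machines answered within the same connection.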

After the packet analysis results were shared with the infrastructure team, this is what they found:

The root cause was an incorrect MAC address registration for VIP 150 on the firewall that acts as its gateway.
Only the MAC address of the active L4 equipment should have been registered, but the MAC address of the backup L4 equipment was registered as well.
(This happens occasionally during initial setup when the ARP table isn't cleared thoroughly.)
Requests were therefore sent to both the active and the backup L4 equipment, so the packets reached both servers and the sequence was broken.
As a result, the server dutifully reset the out-of-sequence packets.
After the ARP entries for VIP 150 were deleted on the firewall and the ARP tables were cleared on each server, the issue appeared to be resolved.
The L4 load balancing method was also switched back to round-robin.
In short, the issue was resolved by reconfiguring the ARP table properly.
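
To make the reset in packet 4 easier to follow, here is a small Python sketch of the situation described above. It assumes the final ACK of the handshake landed on a server that never saw the SYN. It is a toy model, not a TCP implementation: `Segment`, `Server`, and `simulate` are hypothetical names, and sequence numbers are relative, as Wireshark displays them. The one rule it relies on is standard TCP behavior: a segment carrying ACK for a connection the receiver does not know is answered with RST, using the incoming ACK number as the sequence number.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    flags: frozenset
    seq: int = 0
    ack: int = 0


@dataclass
class Server:
    mac: str
    # Clients whose SYN this server has seen (toy connection table).
    known_clients: set = field(default_factory=set)

    def receive(self, client, seg):
        """Return the reply segment, or None when nothing is sent back."""
        if seg.flags == frozenset({"SYN"}):
            self.known_clients.add(client)
            return Segment(frozenset({"SYN", "ACK"}), seq=0, ack=seg.seq + 1)
        if "ACK" in seg.flags and client not in self.known_clients:
            # Unknown connection: reset, with seq taken from the incoming ACK.
            return Segment(frozenset({"RST"}), seq=seg.ack)
        return None  # ACK for a known connection completes the handshake


def simulate():
    first = Server("MS_b7:1c:39")   # answered the SYN in the capture
    second = Server("MS_b7:1c:38")  # sent the RST in the capture
    syn = Segment(frozenset({"SYN"}), seq=0)
    syn_ack = first.receive("WEB", syn)
    ack = Segment(frozenset({"ACK"}), seq=syn.seq + 1, ack=syn_ack.seq + 1)
    # The same ACK delivered to the wrong server and to the right one.
    return second.receive("WEB", ack), first.receive("WEB", ack)


if __name__ == "__main__":
    stray, normal = simulate()
    print("wrong server replies:", sorted(stray.flags), "seq", stray.seq)
    print("right server replies:", normal)
```

In this model the server that never saw the SYN replies with an RST whose sequence number is 1, which lines up with row 4 of the capture, while the server that did see the SYN accepts the same ACK silently.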