This might come in handy for anyone using NLB clusters in a test environment.
I run a test lab in my office [1] that lets me run performance and scalability tests on applications or parts of them. I routinely reconfigure these machines, for example into Windows Network Load Balancing (NLB) clusters or combinations of clusters, to reflect different scenarios. This time, however, I decided to repave three of these machines to give me a known, identical configuration for a critical performance test.
Oddly enough, after installing fresh copies of Windows Server 2003 and mapping the servers' drives to deploy an application, I noticed some really strange behavior. Errors like "Invalid Drive Specification" and "The target account name is incorrect" kept appearing in very interesting combinations. For example, I was able to deploy my app to \\TESTSRV01\Deploy, but \\TESTSRV02\Deploy would fail in the same deployment script. A couple of seconds later, I could deploy to TESTSRV02, but the connection to TESTSRV01 would fail with the same error message. The servers seemed to work and refuse to work at random; the only certainty was that at any given moment exactly one of them would accept a connection while the others wouldn't.
MSDN and friends pointed me towards synchronization conflicts in Active Directory, but I was 100% sure that this couldn't be the cause. It did, however, lead me to suspect that my deployment client was intermittently connecting to the servers with the wrong security tokens, or something similar. But how could that be? The machines had been installed independently. I didn't use any shared or ghosted images, which can cause exactly this kind of problem when they haven't been properly SYSPREP'd.
At some point I decided to check whether there was some kind of conflict on the LAN. I use DHCP, so I didn't expect any problems. However, pinging TESTSRV01 and TESTSRV02 revealed the impossible: both had been given the same IP address. Since Windows hadn't shown any warning about an IP address conflict on the network, it started to dawn on me that there could be only one reason. A DHCP server identifies clients by their MAC address, and Windows only reports an address conflict when a different MAC answers for its IP, so the two servers had to be using the same Ethernet MAC address. But then again, that's basically impossible, right? Running IPCONFIG /ALL revealed the truth: both did indeed have the same MAC.
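With more than two or three machines, comparing IPCONFIG /ALL output by eye gets tedious. Here is a minimal sketch that scripts the check. It assumes you have already collected the text of `ipconfig /all` from each server, for example by redirecting it to one file per machine. It pulls out every "Physical Address" line and reports any MAC that appears on more than one host. The function names and the sample MACs are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict

# Matches lines such as "Physical Address. . . . . . . . . : 00-0C-29-3A-7B-11"
# as printed by Windows `ipconfig /all`; captures only the MAC itself.
MAC_LINE = re.compile(
    r"Physical Address[ .]*:\s*([0-9A-Fa-f]{2}(?:-[0-9A-Fa-f]{2}){5})"
)


def macs_in(ipconfig_text: str) -> set[str]:
    """Return every MAC (upper-cased) listed in one host's `ipconfig /all` output."""
    return {mac.upper() for mac in MAC_LINE.findall(ipconfig_text)}


def find_duplicate_macs(outputs: dict[str, str]) -> dict[str, list[str]]:
    """Map each MAC seen on more than one host to the sorted list of those hosts."""
    seen: dict[str, set[str]] = defaultdict(set)
    for host, text in outputs.items():
        for mac in macs_in(text):
            seen[mac].add(host)
    return {mac: sorted(hosts) for mac, hosts in seen.items() if len(hosts) > 1}


if __name__ == "__main__":
    # Hypothetical, heavily abbreviated ipconfig output for three servers.
    outputs = {
        "TESTSRV01": "Physical Address. . . . . . . . . : 00-0C-29-3A-7B-11\n",
        "TESTSRV02": "Physical Address. . . . . . . . . : 00-0c-29-3a-7b-11\n",
        "TESTSRV03": "Physical Address. . . . . . . . . : 00-0C-29-5D-20-4E\n",
    }
    for mac, hosts in find_duplicate_macs(outputs).items():
        print(f"{mac} is used by {', '.join(hosts)}")
```

Normalizing to upper case keeps the comparison independent of how the address happens to be written. The script only reports duplicates; it doesn't tell you where they came from.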
And all of a sudden it hit me right in the face. When you add machines to an NLB cluster in which each node has two NICs (one used for node-to-node traffic and one used as the external-facing interface), all external-facing NICs receive the same virtual IP and, in unicast mode, the same virtual MAC. When I formatted the machines, I didn't remove them from the cluster beforehand (after all, I was destroying the whole cluster anyway), so they continued to use their old virtual MACs. I had mistakenly assumed that this configuration is applied at runtime, but it seems the changed MACs were instead persisted to the NICs' flash memory. When the machines came back to life as non-clustered machines, all of them still used the same MAC and, rightly so, challenged the DHCP server, my Windows client, and a few of my beliefs.
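One way to recognize such a leftover is to look at the MAC itself. As far as I know, NLB derives the cluster MAC from the cluster's virtual IP: in unicast mode it is `02-BF` followed by the four IP octets in hex, and in multicast mode the prefix is `03-BF` instead. The sketch below assumes that convention and does not cover IGMP multicast mode. The function names and the example cluster IP `192.168.1.100` are hypothetical.

```python
import ipaddress


def nlb_cluster_mac(cluster_ip: str, multicast: bool = False) -> str:
    """Derive the assumed NLB cluster MAC: 02-BF (unicast) or 03-BF (multicast) + IP octets."""
    prefix = "03-BF" if multicast else "02-BF"
    octets = ipaddress.IPv4Address(cluster_ip).packed  # the 4 raw bytes of the IP
    return prefix + "".join(f"-{b:02X}" for b in octets)


def looks_like_nlb_leftover(adapter_mac: str) -> bool:
    """Heuristic: True if the MAC starts with the NLB unicast or multicast prefix."""
    normalized = adapter_mac.upper().replace(":", "-")
    return normalized.startswith(("02-BF-", "03-BF-"))


if __name__ == "__main__":
    print(nlb_cluster_mac("192.168.1.100"))               # 02-BF-C0-A8-01-64
    print(looks_like_nlb_leftover("02-bf-c0-a8-01-64"))   # True
    print(looks_like_nlb_leftover("00-0C-29-3A-7B-11"))   # False
```

If a freshly installed, non-clustered server reports a MAC with one of these prefixes in IPCONFIG /ALL, that is a strong hint it still carries an address from a previous NLB configuration.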
Lesson learnt!
[1] Yes, there was a time when people wondered how one person could possibly need more than ten PCs. But I guess most of my visitors are used to it nowadays.