Ok, so this is a weird as it sounds.
We've been working with ISA and TMG since 2004, this is the first time I've seen this kind of behavior. Let me explain the details.
We implemented 3 TMG 2010 Servers in an Array and 2 EMS Servers on Windows Server 2008 R2. Each TMG Server has 4 NICs (Internal, External, DMZ-Intra-array). At first we wanted to enable them with an F5 Hardware Load Balancer but after weeks of trying to make them work together we couldn't (SNAT and routing issues related), so we tried using Windows NLB but had problems with the Multicast configuration using VMWare and after some other battles we decided to first try out just using one TMG Server as the main one to try to make it work. The customer we are implementing this is currently using ISA 2006 and they wanted to upgrade to TMG 2010 using basically the same stuff as their ISA had, so we backed up that configuration and imported it into TMG without problems. We added the TMG Servers on the EMS configuration and everything replicated just fine.
Since they already had IPS, Cisco ASAs and Ironports as Proxy they decided to disable NIS, Malware inspection, Flood Mitigation and all those things TMG has for better securing Internet traffic.
The firewall policy rules are about 100 and they have 3 publishing rules to HTTPS Services.
So after making the necessary configuration changes to the TMG infrastructure, we then decided to unplug the ISA Servers, change the TMG servers IP Address to the ISA Server ones and test to see if everything worked just as ISA Server did. However it didn't.
At first we have issues related to slow internet traffic, after troubleshooting for some time we ended up finding out that the Source IP used by TMG was different that the one ISA was using, even if the same IP was configured in the NIC and the other IPs were configured as alternate. We found out after some searching that Windows Server 2008 R2 uses some RFC and manipulates the IP Address on a NIC in a way that 2003 didn't. We found out that we needed to add the other IPs via Netsh int ipv4 add address<Interface Name> <ip address> skipassource=true
After that configuration we got things working fine... for a while, several hours later, servers started losing connectivity, switches stopped responding and the entire network was collapsed! After unplugging the TMG Servers, everything returned back to normal. We though this was a issue related to drivers or something to do with VMWare plataform, so it was decided to reinstall everything on physical servers.
After some days of reconfiguring again TMG Servers, we made the switch again, unplugged the ISA Servers, configured the TMG with the ISA IP Addresses, did the NETSH thing and then tested out everything and everything worked.
But again hours later the same behavior appeared once more! Servers and switches stopped responding and the entire network went down once more! Again we unplugged the TMG Servers and everything returned back to normal!
So here we are, back to square one with no clue on what is causing this behavior on the network. The current physical servers are running HP 3666i 4 multiport 10Gb NICs, we don't know if that has something to do with this. Or the fact the the switch core to which the TMG servers are directly connected to is a Nexus 7000 and there is some configuration issues with it against the TMG or something. The TMGs are patched with Service Pack 2 Update Rollup 5.
We are probably going to open a support case with Microsoft with this issue, but we first wanted to see if anyone else may have had, seen or heard something related to this and has an explanation or ideas on why is this happening.
I appreciate any replies.
Thank you all.
Eduardo Rojas