We’ve recently run into a recurring issue with the “Agents” in the TotalMail application, which relay messages between the Mobile Communications provider(s) and TMWSuite. We have dubbed it “TotalMail Connection Throttling”.
This has recurred often enough that we want to describe the situation and the fix we applied, in case it helps others with similar issues. If you suspect that you are experiencing this problem, check for the following criteria and behaviors:
COMMON CRITERIA
- Multiple Agent Instances
- High Messaging Volume
- More than ~100 Units
EXHIBITED BEHAVIORS
- Significant Queued Messages/Volumes in the Delivery Agent
- Agents frequently hang or crash when processing messages
- Slow message processing speeds (5+ minutes to leave TotalMail)
- Restart Scripts are continually restarting Agents during processing
- Inability to communicate over the network on the Agent machine
VERIFICATION
To confirm that this issue is the cause, run the following commands in a Command Prompt on the TotalMail Agent machine.
netstat -anop tcp > "OpenSockets.txt"
notepad "OpenSockets.txt"
This exports a list of all currently open network connections to a file called “OpenSockets.txt” and opens it in Notepad.
A typical computer will have a few dozen to a few hundred open sockets. When this issue is occurring you will see THOUSANDS of open sockets, nearly all of them in the “TIME_WAIT” state with no attached process (a Process ID of 0 in the last column).
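Scrolling through thousands of lines in Notepad is tedious, so the check can also be tallied with a short script. The following is a minimal Python sketch, assuming the usual five-column row layout of `netstat -anop tcp` (protocol, local address, remote address, state, PID). The function name and the sample addresses are made up for illustration.

```python
from collections import Counter


def count_tcp_states(netstat_text):
    """Count TCP connection states in text produced by `netstat -anop tcp`."""
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Data rows look like: TCP  local_addr  remote_addr  STATE  PID
        # The column header and blank lines do not match and are skipped.
        if len(fields) == 5 and fields[0] == "TCP":
            states[fields[3]] += 1
    return states


# Made-up sample rows, only to show the expected input shape.
SAMPLE = """
Active Connections

  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       888
  TCP    10.0.0.5:50123         10.0.0.9:1433          TIME_WAIT       0
  TCP    10.0.0.5:50124         10.0.0.9:1433          TIME_WAIT       0
  TCP    10.0.0.5:50125         10.0.0.9:1433          ESTABLISHED     4312
"""

for state, count in count_tcp_states(SAMPLE).most_common():
    print(f"{state:<14}{count}")
```

To use it on a real export, read the contents of “OpenSockets.txt” and pass them to `count_tcp_states`. A TIME_WAIT count in the thousands matches the symptom described above.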
“TotalMail Connection Throttling” is a conflict between the way these agents communicate over the network and the default limits configured in the Windows operating system.
It works like this:
- TotalMail needs to talk to the SQL Server to get/update/process messages.
- It opens a connection to the SQL Server, sends a brief communication, then closes.
- Windows does not release the closed connection right away. It holds the socket in the TIME_WAIT state for a while so that late-arriving packets are not mistaken for part of a new connection.
- After a period of time these are released. However, only a limited number of connections are available.
- If TotalMail opens them faster than they time out, eventually you run out of connections and everything pauses until more are free.
The default limits in Windows are approximately 4,000 connections and a timeout of 4 minutes. TotalMail can open several dozen connections per message and per agent, so if you are processing more than a few hundred messages every 4 minutes, or have several instances running in parallel, you can run out of connections.
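The arithmetic behind these limits can be sketched in a few lines of Python. This is a rough steady-state model, not a measurement. It assumes every closed connection holds one port for the full timeout, it uses the round figures quoted in this article (about 4,000 ports and 240 seconds by default, about 65,000 ports and 30 seconds after tuning), and it picks 24 connections per message purely as an illustrative stand-in for “several dozen”.

```python
def max_connection_rate(available_ports, time_wait_seconds):
    """Highest steady rate (connections/second) that never empties the port pool.

    Each closed connection occupies a port for time_wait_seconds, so at a
    steady rate r the number of ports in TIME_WAIT settles at r * time_wait_seconds.
    """
    return available_ports / time_wait_seconds


def max_messages_per_window(available_ports, connections_per_message):
    """Messages that fit into one timeout window before the pool runs dry."""
    return available_ports // connections_per_message


# Illustrative assumption, not a measured TotalMail figure.
CONNECTIONS_PER_MESSAGE = 24

default_rate = max_connection_rate(4000, 240)
tuned_rate = max_connection_rate(65000, 30)

print(f"default: {default_rate:.1f} connections/s")  # default: 16.7 connections/s
print(f"tuned:   {tuned_rate:.1f} connections/s")    # tuned:   2166.7 connections/s
print(max_messages_per_window(4000, CONNECTIONS_PER_MESSAGE), "messages per 4-minute window")
```

Under these assumptions the default configuration tops out below 20 new connections per second across all agents on the machine, while the tuned values raise that ceiling by roughly 130 times.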
Fortunately, this is easy to remediate. Shortening the timeout and increasing the number of available connections sets the ceiling high enough that TotalMail is unlikely to fill the connections up before they time out.
Adjust the following Registry settings.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"TcpTimedWaitDelay"=dword:0000001e
"MaxUserPort"=dword:0000ffff
"TcpMaxDataRetransmissions"=dword:00000005
"TcpNumConnections"=dword:00fffffe
This sets the timeout to its floor of 30 seconds (so TIME_WAIT connections are released after that much time), raises the allowed ports to roughly 65K and the allowed connections to the maximum possible, and limits TCP to 5 retransmissions of an unacknowledged data segment before it gives up on the connection.
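The registry values are written in hexadecimal, which makes them easy to misread. The following small Python sketch decodes them to decimal so they can be checked against the description above. The helper name `decode_dwords` is hypothetical, and the fragment is copied from the settings listed in this article.

```python
import re

# The four settings from this article, in .reg file syntax.
REG_FRAGMENT = '''
"TcpTimedWaitDelay"=dword:0000001e
"MaxUserPort"=dword:0000ffff
"TcpMaxDataRetransmissions"=dword:00000005
"TcpNumConnections"=dword:00fffffe
'''

DWORD_LINE = re.compile(r'^"(?P<name>[^"]+)"=dword:(?P<hex>[0-9a-fA-F]{8})$')


def decode_dwords(reg_text):
    """Map each value name in a .reg fragment to its decimal DWORD value."""
    values = {}
    for line in reg_text.splitlines():
        match = DWORD_LINE.match(line.strip())
        if match:
            values[match.group("name")] = int(match.group("hex"), 16)
    return values


for name, value in decode_dwords(REG_FRAGMENT).items():
    print(f"{name:<28}{value}")
```

Decoded, the values are 30 (seconds), 65535 (ports), 5 (retransmissions), and 16777214 (connections).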
Once these settings are modified, run the following command in an elevated Command Prompt. It turns off Windows TCP auto-tuning so that the operating system won’t try to tune the settings away from what was set.
netsh int tcp set global autotuninglevel=disabled
Finally, reboot after making the changes and observe the results. Naturally, I encourage taking a backup of the existing settings before changing them (though they’re likely set to Windows default values). Making these modifications has proved to be a very expedient remediation for this issue, so I hope it can help a few more people.
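For the backup mentioned above, one option is to export the whole key with the built-in `reg export` command before editing anything. The following Python sketch wraps that command. The function names and the default file name are hypothetical, and the export only works on Windows, from a session with permission to read the key.

```python
import subprocess

# The key that holds the four settings changed in this article.
TCPIP_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def build_backup_command(backup_path):
    """Return the reg.exe argument list that exports the Tcpip Parameters key."""
    # /y overwrites an existing backup file without prompting.
    return ["reg", "export", TCPIP_KEY, backup_path, "/y"]


def backup_tcpip_parameters(backup_path="TcpipParameters-backup.reg"):
    """Run the export and return the backup file path (Windows only)."""
    subprocess.run(build_backup_command(backup_path), check=True)
    return backup_path


# Show the command that would be run, without executing it.
print(" ".join(build_backup_command("TcpipParameters-backup.reg")))
```

The resulting .reg file can be restored later with `reg import`. Because an import does not delete values that are missing from the file, any of the four values that did not exist before the change would need to be removed by hand to return fully to defaults.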