We recently fixed a problem where a user in London used SQL Developer to connect to an Oracle database server in Atlanta. She ran a query and then let the connection sit idle for about 12 minutes while she answered email. When she went back to run another query, SQL Developer just spun and ultimately failed with a connection timeout.
We broke out our trusty copy of Wireshark and started sniffing on her workstation, hoping to see which side was cutting off the conversation: the client or the server.
As you can see in the screen capture below, we have a normal Oracle TNS protocol conversation of PSH, ACK segments between her client at 172.25.4.29 and the server at 172.27.10.219. At 72 seconds (just over 1 minute) into the trace, everything is fine. However, at 716 seconds (about 12 minutes) into the capture, the client tries to send a new SQL query to the server and gets no response. It retransmits its request several times (highlighted below as TCP Retransmissions), even marking one packet with the URG (urgent) flag in an apparent attempt to get the server to pay attention.
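If the capture is long, the idle window can be easier to spot programmatically than by scrolling. The snippet below is a minimal sketch, assuming the packet times (seconds since the start of the capture) have already been exported into a list. The function name `longest_idle_gap` is hypothetical, and the first two sample values are made up for illustration; only 72 and 716 come from the trace described above.

```python
def longest_idle_gap(timestamps):
    """Return (start, end, gap) for the longest silence between consecutive packets."""
    if len(timestamps) < 2:
        return None  # a gap needs at least two packets
    ordered = sorted(timestamps)
    # Pair each packet time with the next one and keep the widest pair.
    start, end = max(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])
    return start, end, end - start


# 0.0 and 30.5 are illustrative; 72.0 and 716.0 are the times seen in the trace.
print(longest_idle_gap([0.0, 30.5, 72.0, 716.0]))  # → (72.0, 716.0, 644.0)
```

A gap of 644 seconds, a little under 11 minutes, is the window in which something on the path could have quietly dropped the session.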
What is going on here? Clearly the Oracle client still thinks it has a TCP-level connection with the Oracle server, but in reality the TCP session has already been torn down. We ultimately traced the problem back to a BlueCoat network appliance sitting between the user and the server that was disconnecting the session based on its own timeout value.
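One common way to stop a middlebox from ageing out an idle session is to have one endpoint send TCP keepalive probes more often than the appliance's timeout. The following is a minimal sketch of that general technique on a plain Python socket, not the specific change made in this case and not anything SQL Developer exposes directly. The helper name `enable_keepalive` and the timing values are assumptions chosen for illustration, and the tuning constants are platform-specific, so the code sets only the ones that exist.

```python
import socket


def enable_keepalive(sock, idle_s=300, interval_s=60, probes=5):
    """Turn on TCP keepalive so an otherwise idle connection still carries traffic."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Tuning options vary by operating system; apply each only if it is available.
    if hasattr(socket, "TCP_KEEPIDLE"):   # seconds of idle time before the first probe
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):  # seconds between unanswered probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):    # unanswered probes before giving up
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)  # illustrative values: first probe after 5 idle minutes
sock.close()
```

The idle value has to be shorter than the appliance's timeout for this to help. Where the application cannot be changed, the same effect is usually pursued through operating system keepalive settings or by raising the timeout on the appliance itself.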