SSL Timeouts and layer 3 infrastructure

I’ve spent the last 5 days agonizing over a very hard problem on my network. Using curl, LWP::UserAgent, openssl, wget or any other SSL client, I’d see connections either timeout or hang halfway through the transfer. Everything else works fine including secure protocols like SSH and TLS. In fact inbound SSL connections work great too. It’s just when I connect to an external SSL host that it hiccups.

If you remember your OSI model, SSL is well above layer 3 (IP addresses and routers) and layer 2 (LAN traffic routed via MAC addresses). So the last place I planned to look was the network infrastructure.

I eliminated specific clients by trying others and I eliminated the OS by spinning up virtual machines running other versions of Linux. I elminated my physical hardware by reproducing it on a non Dell server and having one of the ops guys repro it on his OS X macbook.

And just to prove it was the network, which is all that was left, I set up a VPN from one of my machines that tunnelled all traffic over the VPN to a machine on an external network that acted as the router, thereby encapsulating the layer 2 and 3 traffic in a layer 4 and 5 VPN. And the problem went away. So I knew it was the network.

Tonight a few minutes ago my colo provider took down my local router and I gracefully failed over to the redundant router, and lo and behold the problem has gone away.

I still don’t know what it is, but what I do know is that a big chunk of layer 3 infrastructure has been changed and it’s fixed a layer 5 problem. What’s weird is that TCP connections (which is what SSL rides on top of) have delivery confirmation. So if the problem was packet loss, TCP would just request the packet again. So it’s something else and something that only affects SSL – and only connections bound from my servers out to the Internet.

The reason I’m posting this is because during the hours I spent Googling this issue this week (and finding nothing) I saw a lot of complaints about SSL timeouts and no solutions. So if you’re getting timeouts like this, check your underlying infrastructure and you might just be surprised. To verify that it’s a network problem, set up a VLAN using PPTP. Set up NAT on the external linux machine that is your VLAN server. Then disable the default gateway on the machine having the issue (the VLAN client) and verify that all traffic is routing via your VLAN. Then try and reproduce the SSL timeout and if it doesn’t occur, it’s probably your layer 2 or 3 infrastructure.

4 thoughts on “SSL Timeouts and layer 3 infrastructure

  1. Pingback: SSL Network problem follow-up

  2. There is no software or hardware that processes or filters above layer 3 between the machines and workstations I’ve tested and the open Internet. You can walk up to my switch, plug in, assign yourself a public address and be connected directly to the unfiltered internet. The only thing between you and the rest of the world is the default gateway that routes at layer 3, and the problem still exists.

    I’ve also used iperf to simulate client server connections in both directions on both port 443 and 80, and there is no throttling of TCP packets of any kind.

    My provider says they had a sniffer on the wire and while repro’ing the problem they would see duplicate packets arrive from upstream. I did the same thing and didn’t see that, but sound like a clue.

    Last night we shut off one of the upstream providers (there are about 4 of them) and the problem went away. But I’m not convinced it’s that upstream provider because I’ve seen the problem exist for routes through other providers.

    it’s the strangest thing I’ve seen. :)

  3. I’m not an expert… I think this problem might be related to some kind of rule (like outbound firewall) on part 443. I don’t think SSL has anything to do with it, but it’s TCP problem. Maybe there is a throttling on that port or a bandwidth limit, packet limits, etc.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Notify me of followup comments via e-mail. You can also subscribe without commenting.