No-latency SSH sessions on a 5Ghz WiFi router with 250mw radio

Disclaimer: You may brick your fancy new Linksys router by following the advice in this blog entry. A large number of folks have installed this software successfully including me. But consider yourself warned in case you’re the unlucky one.

I use SSH a lot. My wife and nephew love streaming video like Hulu instead of regular cable. For the last few years there’s been a cold war simmering. I’m working late, they start streaming, and my SSH session to my server gets higher latency. So every time I hit a keystroke it takes 0.3 seconds to appear instead of 0.01. Try hitting 10,000 keystrokes in an evening and you’ll begin to understand why this sucks.

I’ve tried screwing with the QoS settings on my Linksys routers but it doesn’t help at all. I ran across a bunch of articles explaining how it’s useless to try to use QoS because it only modifies your outgoing bandwidth and can’t change the speed at which routers on the Internet send you traffic.

Well that’s all bullshit. Here’s how you fix it:

Upgrade the firmware on your router to DD-WRT. Here’s the list of supported devices. I have a WRT320N Linksys router. It’s a newer router that has both a 2.4 Ghz and 5Ghz radio. Many routers that look new and claim to support “N” actually just have 2.4Ghz radios in them.

The DD-WRT firmware for the WRT320N router is very very new, but it works perfectly. Here’s how you upgrade:

Read Eko’s (DD-WRT author) announcement about WRT320N support here. The standard DD-WRT installation instructions are here so you may want to reference them too. Here’s how I upgraded without bricking my router:

  1. Download the ‘mini’ DD-WRT here.
  2. Open all the links in this blog entry in other browser windows in case you need to refer to them for troubleshooting. You’re about to lose your Internet access.
  3. Visit your router’s web interface and take not of all settings – not just your wireless SSID and keys but your current MAC address on your Internet interface too. I had to clone this once DD-WRT started up because my ISP hard-codes MAC addresses on their side and filters out any unauthorized MAC’s. I’d suggest printing the settings direct from your web browser.
  4. Use the web interface (visit usually) and reset your router to factory default settings.
  5. You’ll need to log into your router again. For linksys the default login is a blank username and the password ‘admin’.
  6. Use Internet Explorer to upgrade the firmware using your router’s web interface. Apparently Firefox has a bug on some Linksys routers so don’t use that.
  7. Wait for the router to reboot.
  8. Hit with your web browser and change your router’s default username and password.
  9. Go to the Clone MAC address option and set it to your old Internet MAC address
  10. Set up your wireless with the old SSID and key
  11. Confirm you can connect to the router via WiFi and have Internet Access.

Now the fun part:

  1. Go to Wireless, Advanced settings, and scroll down to TX Power. You can boost your transmit signal all the way to 251mw. Boosting it by about 70mw should be safe according to the help. I’ve actually left mine as is to increase my radio’s life, but nice to know I have that.
  2. Go to the NAT/QoS menu and hit the QoS tab on the right. Enable QoS. Add your machine’s MAC address. Set the priority to Premium (not Exempt because that does nothing). Hit Apply Settings. Every other machine now has a default priority of Standard and your traffic will be expedited.
  3. For Linux Geeks: Click the services tab and enable SSHd. Then ssh to your router’s IP, usually Log in as root and whatever password you chose for your router. I actually changed my username to ‘admin’ but the username seems to stay root for ssh.

You can use a lot of standard linux commands in SSH – it’s busybox linux. Type:

cat /proc/net/ip_conntrack | grep <YourIPAddress>

Close to the end of each line you’ll see a mark= field. For your IP address it should have mark=10 for all your connections. Everyone else should be mark=0. The values mean:

  • Exempt: 100
  • Premium: 10
  • Express: 20
  • Standard: 30
  • Bulk: 40
  • (no QoS matched): 0

Remember if no QoS rule is matched the traffic is Standard priority if you have QoS enabled on the router. So you are Premium and everyone else is standard. Much more detail is available on the QoS DD-WRT Wiki here.

The Linux distro is quite amazing. There are over 1000 packages available for DD-WRT including Perl, PHP and MySQL in case you’d like to write a blogging platform for your Linksys router. To use this you’re going to have to upgrade your firmware to the ‘big’ version of the WRT320N binary. Don’t upgrade directly from Linksys firmware to the ‘big’ DD-WRT – Ecko recommends upgrading to mini first and then upgrading to ‘big’. Also note I haven’t tried running ‘big’ on the WRT320N because I’m quite happy with QoS and a more powerful radio.

There are detailed instructions on how to get Optware up and running once you’re running ‘big’ on the Wiki. It includes info on how to install a throttling HTTP server, Samba2 for windows networking and a torrent client.

If you’d like to run your WRT320N at 5Ghz the DD-WRT forums suggest switching wireless network mode to ‘NA-only’ but that didn’t work for my Snow Leopard OS X machine. When I was running Linksys I had to use 802.11A to make 5Ghz work for my macbook. And likewise for this router I run A-only. You can confirm you’re at 5Ghz by holding down the ‘option’ key on your macbook and clicking the wifi icon on top right.

I prefer 5Ghz because the spectrum is quieter, but 5Ghz doesn’t have the distance through air that 2.4 Ghz does. So boosting your TX power will give you the same distance with a clear spectrum while all your neighbors fight over teh 2.4Ghz band.

Routers treat HTTPS and HTTP traffic differently

OSI Network Model

Well the title says it all. Internet routers live at Layer 3 [the Network Layer] of the OSI model which I’ve included to the left. HTTP and HTTPS live at Layer 7 (Application layer) of the OSI model, although some may argue HTTPS lives at Layer 6.

So how is it that Layer 3 devices like routers treat HTTPS traffic differently?

Because HTTPS servers set the DF or Do Not Fragment IP flag on packets and regular HTTP servers do not.

This matters because HTTP and HTTPS usually transfer a lot of data. That means that the packets are usually quite large and are often the maximum allowed size.

So if a server sends out a very big HTTP packet and it goes through a route on the network that does not allow packets that size, then the router in question simply breaks the packet up.

But if a server sends out a big HTTPS packet and it hits a route that doesn’t allow packets that size, the routers on that route can’t break the packet up. So they drop the packet and send back an ICMP message telling the machine that sent the big packet to adjust it’s MTU (maximum transfer unit) size and resend the packet. This is called Path MTU Discovery.

This can create some interesting problems that don’t exist with plain HTTP. For example, if your ops team has gotten a little overzealous with security and decided to filter out all ICMP traffic, your web server won’t receive any of those ICMP messages I’ve described above telling it to break up it’s packets and resend them. So large secure packets that usually are sent halfway through a secure HTTPS connection will just be dropped. So visitors to your website who are across network paths that need to have their packets broken up into smaller pieces will see half-loaded pages from the secure part of your site.

If you have the problem I’ve described above there are two solutions: If you’re a webmaster, make sure your web server can receive ICMP messages [You need to allow ICMP code 4 "Fragmentation needed and DF bit set"]. If you’re a web surfer (client) and are trying to access a secure site that has ICMP disabled, adjust your network card’s MTU to be smaller than the default (usually the default is 1500 for ethernet).

But the bottom line is that if everything else is working fine and you are having a problem sending or receiving HTTPS traffic, know that the big difference with HTTPS traffic over regular web traffic is that the packets can’t be broken up.

SSL Timeouts and layer 3 infrastructure

I’ve spent the last 5 days agonizing over a very hard problem on my network. Using curl, LWP::UserAgent, openssl, wget or any other SSL client, I’d see connections either timeout or hang halfway through the transfer. Everything else works fine including secure protocols like SSH and TLS. In fact inbound SSL connections work great too. It’s just when I connect to an external SSL host that it hiccups.

If you remember your OSI model, SSL is well above layer 3 (IP addresses and routers) and layer 2 (LAN traffic routed via MAC addresses). So the last place I planned to look was the network infrastructure.

I eliminated specific clients by trying others and I eliminated the OS by spinning up virtual machines running other versions of Linux. I elminated my physical hardware by reproducing it on a non Dell server and having one of the ops guys repro it on his OS X macbook.

And just to prove it was the network, which is all that was left, I set up a VPN from one of my machines that tunnelled all traffic over the VPN to a machine on an external network that acted as the router, thereby encapsulating the layer 2 and 3 traffic in a layer 4 and 5 VPN. And the problem went away. So I knew it was the network.

Tonight a few minutes ago my colo provider took down my local router and I gracefully failed over to the redundant router, and lo and behold the problem has gone away.

I still don’t know what it is, but what I do know is that a big chunk of layer 3 infrastructure has been changed and it’s fixed a layer 5 problem. What’s weird is that TCP connections (which is what SSL rides on top of) have delivery confirmation. So if the problem was packet loss, TCP would just request the packet again. So it’s something else and something that only affects SSL – and only connections bound from my servers out to the Internet.

The reason I’m posting this is because during the hours I spent Googling this issue this week (and finding nothing) I saw a lot of complaints about SSL timeouts and no solutions. So if you’re getting timeouts like this, check your underlying infrastructure and you might just be surprised. To verify that it’s a network problem, set up a VLAN using PPTP. Set up NAT on the external linux machine that is your VLAN server. Then disable the default gateway on the machine having the issue (the VLAN client) and verify that all traffic is routing via your VLAN. Then try and reproduce the SSL timeout and if it doesn’t occur, it’s probably your layer 2 or 3 infrastructure.

How to mirror someone elses web server with iptables

It took me a while to find this – I needed it for testing purposes, nothing malicious. If you’d like your web server somewhere on the web to pretend to be any other web server, even a secure one, you can do the following. x.x.x.x is your own server and y.y.y.y is the ip of the server you’re trying to mirror. I’m also assuming you only have one network card in the machine and it’s called eth0. The following will mirror a secure web server. If you’d like to mirror a regular web server, replace 443 with port 80.

iptables -t nat -A PREROUTING -p tcp -i eth0 -d x.x.x.x --dport 443 -j DNAT --to y.y.y.y
iptables -t nat -A POSTROUTING -p tcp -o eth0 -d y.y.y.y --dport 443 -j MASQUERADE

If this doesn’t work you probably have to enable packet forwarding like this:

echo 1 > /proc/sys/net/ipv4/ip_forward