For some reason it seems like when you access the internet wirelessly you’re asking for trouble: invariably you’ll run into flaky WiFi connections and weird cellular data problems. There are two problems I frequently encounter: WiFi networks hijacking your requests so that you log into their site first, and cellular data ‘accelerators’ mangling HTTP headers. Both are caused by what are essentially proxies breaking normal internet connections, and neither should happen. However, we live in the real world, so after first explaining the problem in greater detail, I’ll outline an approach I’ve been mulling over with friends the last few weeks that at least mitigates the poor user experience.
First, WiFi: how many of you have logged onto a WiFi network in a public place, say a café or a hotel, and had to visit some login page before you could access the real internet? This makes some sense when you need to pay or enter a code first, but sometimes it’s just to accept the network’s terms of service. But silly terms of service that no one will ever read are not the real problem. Worse is the fact that these networks make you log in by hijacking your web requests: until you’ve successfully logged in,1 HTTP requests are redirected to some sort of login page. The network will often do this redirection by simply pretending to be the requested site, either with a 302 or 303 status code or, much worse, and yes I have seen this, by masquerading as the requested site while returning a 200 status code. This means that your browser’s address bar says you’re at http://www.google.com but you’re seeing a login page for Stupid WiFi Company instead!
And why is this so bad? First, not all HTTP requests are made in a web browser. I’ve connected to WiFi networks on my iPhone and then refreshed my Twitter client, only to get an error message saying that it could not parse the JSON. Why was that? Well, the app made an HTTP request to a Twitter API method that should return a JSON object, but instead the proxy returned the HTML of the network login page.
Accepting that sometimes wireless networks need users to authenticate themselves first and that the network password is not sufficient,2 we have to accept that these login screens are here to stay. So how would I like it to work? Ideally the hijacking responses would return a 402 Payment Required status code and provide the login page’s address in a Location header. Or, if that’s a little too much, at least a temporary redirect, ideally using the 307 code. And of course the proxy should never pretend to be the originally requested location.
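To make this concrete, here is a small sketch of what a well-behaved captive portal gateway could do. The `LOGIN_URL` address and the response-dict shape are invented for illustration; real gateways would of course work at the network layer.

```python
# Sketch: a well-behaved captive portal answering an unauthenticated
# request. LOGIN_URL is a hypothetical example address.
LOGIN_URL = "https://wifi.example.com/login"

def portal_response(authenticated: bool, upstream_body: bytes) -> dict:
    """Return a minimal response as a dict of status, headers, and body."""
    if authenticated:
        # Pass the real site through untouched.
        return {"status": 200, "headers": {}, "body": upstream_body}
    # A 307 Temporary Redirect tells the client the real resource is
    # elsewhere for now, without masquerading as the requested site.
    return {"status": 307, "headers": {"Location": LOGIN_URL}, "body": b""}
```

The key point is that the redirect is explicit and temporary: the client knows it has not reached the site it asked for, and it knows where the login page is.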
What about the clients, the ones making the requests? Ideally they wouldn’t have to do anything. However, if a client receives a response in a format other than the one it expected, with no explanation from the API as to why, it should assume that the HTML is being provided by a proxy sitting between it and the real location. It should then ask the user to log into the network on the provided page. I believe that the official Twitter iPhone app now does essentially this, showing a popup saying that the user appears to be on a network requiring login.
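The client-side heuristic can be sketched in a few lines. This is purely illustrative, assuming a hypothetical API client that expects JSON; a real client would inspect the full response object from its HTTP library.

```python
import json

def parse_api_response(content_type: str, body: str):
    """Parse an expected-JSON API response, flagging likely captive portals.

    Returns (data, needs_login). Hypothetical helper for illustration.
    """
    if "json" in content_type:
        return json.loads(body), False
    # HTML where JSON was expected is a strong hint that a proxy,
    # e.g. a WiFi login page, intercepted the request.
    looks_like_html = body.lstrip().lower().startswith(("<!doctype", "<html"))
    return None, looks_like_html
```

A client that gets `needs_login == True` could then show the user a login prompt instead of a confusing parse error.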
Moving to cellular internet connections: they often have transparent proxies in order to minimize the amount of data sent to phones. T-Mobile is particularly notorious for its proxies, something called Speedmanager Plus3 or Web-n-Walk Accelerator, which very aggressively compress images before sending them on to users. Clients can disable the proxy by sending headers disabling all caching. However, I’m not wild about this solution, as I wonder whether it will lead to false positives, where caching is disabled even though there is no proxy in the middle.
I learned about T-Mobile’s proxy when building Smakelijk Amsterdam for Het Parool. Our contacts at Parool used the app and said it performed just fine at work but was incredibly slow anywhere else. We could not duplicate this problem, whether on WiFi, my Rabo Mobiel connection, or on Gerard’s T-Mobile connection. Finally we realized that the app was working for the Parool people at work because their massive office building blocked all cellular signals, so there they could only connect via WiFi. Both of them had enabled T-Mobile’s ‘accelerator’, while Gerard had not. When they were outside the office and away from WiFi, their phones switched to their cellular data connections, so it was the proxy causing the slowdown. But how could a proxy make a request slower?
Ironically enough, it all came about because we were trying to make our requests faster! Our API consisted of one method which would return an XML feed of all recent restaurant reviews. Because this XML file was so large with all the restaurant reviews, it made sense to download the whole feed once, cache it, and then only download any new elements. Rather than roll our own set of query parameters to indicate what to return and what wasn’t needed, I decided to use the HTTP headers that exist for exactly that. Logical, no?
Specifically, we used the Last-Modified and If-Modified-Since headers. The server includes the Last-Modified header with its responses; the client stores the value and sends it back in the If-Modified-Since header on subsequent requests. If nothing has changed since that time, the server returns a 304 Not Modified status code and an empty body. If something has changed, it returns a 200 status code and the entire feed. I don’t think T-Mobile messed with this at all.
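The server side of that conditional GET can be sketched like so. The HTTP-date value and function are made up for illustration; the validator check is the usual exact-equality comparison of the echoed Last-Modified value.

```python
# Example HTTP-date; a real server would use the feed's actual timestamp.
LAST_MODIFIED = "Tue, 15 Nov 2011 12:45:26 GMT"

def handle_feed_request(if_modified_since, feed_body: bytes):
    """Sketch of a conditional GET: 304 with an empty body when the
    client's stored validator still matches, else 200 with the full feed."""
    headers = {"Last-Modified": LAST_MODIFIED}
    if if_modified_since == LAST_MODIFIED:
        return 304, headers, b""       # nothing changed: empty body
    return 200, headers, feed_body     # first request or feed changed
```

The first request (no If-Modified-Since) gets the whole feed; every later request costs almost nothing until the feed actually changes.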
However, do you see the problem with the approach above? If anything changes, we get everything back. So we can still do better. I decided to implement the proposed A-IM: feed approach. Basically, the client adds another header indicating that deltas are acceptable, and the server can then respond with a 226 IM Used status and just the delta in the body. Bam, responses are much smaller! However, these headers are not standards yet, only proposed ones (RFC3229+feed). For whatever reason T-Mobile would drop them (I’m guessing because they’re non-standard), meaning that phones using their mobile internet service (i.e. the slowest connection type) would get the all-or-nothing response (the larger format)!
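Here is a sketch of the server-side decision. The A-IM and IM header names and the 226 status follow the RFC3229+feed proposal; everything else (the function, the bodies) is invented for illustration.

```python
def handle_delta_request(headers: dict, full_feed: bytes, delta: bytes):
    """Sketch of RFC3229+feed on the server side.

    If the client advertises 'A-IM: feed' along with a validator, answer
    226 IM Used with only the new entries; otherwise fall back to the
    full feed. This is exactly why a proxy dropping the A-IM header
    silently downgrades clients to the larger response.
    """
    accepts_delta = "feed" in headers.get("A-IM", "")
    if accepts_delta and "If-Modified-Since" in headers:
        return 226, {"IM": "feed"}, delta
    return 200, {}, full_feed
```

Note the failure mode: if a proxy strips A-IM, the request is still perfectly valid, so the server obliviously serves the full feed and nobody gets an error, just a much slower response.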
Talking to friends recently, I think we’ve come up with a decent work-around. Put simply: before making your first real API request, make a special probe request that includes all the headers you plan to use. The server ignores the headers’ meaning and instead echoes them back to the client, which can use the knowledge of which headers actually arrived at the server to determine whether there is a bad proxy in the middle and act accordingly. One fallback, assuming you have control over the server too, would be to include the missing headers in the request’s query parameters or body, neither of which should be touched by the proxy (assuming we don’t have the WiFi network problem!).
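The client-side comparison is trivial once the hypothetical echo endpoint exists. Here the echoed headers are simulated; in practice they would come from the probe response body.

```python
def missing_headers(sent: dict, echoed: dict) -> set:
    """Compare the headers we sent against those the echo endpoint reports
    having received; anything missing was eaten by a proxy en route."""
    return {name for name in sent if name not in echoed}

# Simulated probe: we sent two headers, the server only saw one.
sent = {"A-IM": "feed", "If-Modified-Since": "Tue, 15 Nov 2011 12:45:26 GMT"}
echoed = {"If-Modified-Since": "Tue, 15 Nov 2011 12:45:26 GMT"}
dropped = missing_headers(sent, echoed)
```

If `dropped` is non-empty the client knows which headers it cannot rely on and can switch to the query-parameter fallback for exactly those.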
Of course, if you have to do that sometimes, why not all the time? I’d say you should generally prioritize standards. Another thought, particularly if you have several APIs whose client requests are missing headers, is to implement the query parameter approach but put your own pass-through proxy in front of all of them that simply restores the missing headers, so that none of the individual APIs need to implement the missing-header support themselves.
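The core of such a restoring pass-through proxy might look like this. The `hdr.<Name>` query-parameter naming scheme is entirely made up here; any unambiguous convention shared between clients and the proxy would do.

```python
from urllib.parse import parse_qs, urlparse

def restore_headers(url: str, headers: dict) -> dict:
    """Sketch of the restoring pass-through proxy: lift hypothetical
    'hdr.<Name>' query parameters back into real request headers before
    forwarding to the actual API."""
    restored = dict(headers)
    for key, values in parse_qs(urlparse(url).query).items():
        if key.startswith("hdr."):
            # Don't clobber a header that did survive the trip.
            restored.setdefault(key[len("hdr."):], values[0])
    return restored
```

The individual APIs behind the proxy then see ordinary headers and need no special handling at all.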
This detection idea ties into the one existing detector I know of: Apple’s iOS WiFi connection tool. When you connect to a WiFi network Apple immediately and invisibly makes a request to apple.com. If it does not get the response it expects it will show, on top of whatever app you’re using, a window titled Login and containing the webpage returned by the annoying WiFi network. After login, presumably detected because iOS has been repeatedly trying the Apple URL in the background until it finally works, the window is dismissed and you can begin using the internet normally. I’ve found that this does not always work but it is quite reliable: it correctly prompts me maybe nine times out of ten.
That Apple has implemented this at the OS level is great: WiFi login pages are (normally) a one-time thing and not something each application needs to concern itself with. I look forward to Apple adding this to OS X, and hope every OS maker will follow. Unfortunately, bad proxies like T-Mobile’s are more complicated. Because they’re always there, there’s no single action the OS can take to fix the problem. Instead, each application needs to know what is being dropped and work around it in a way the destination supports. I guess OS and HTTP client library makers could try to add additional proxy detection features, but I think it would be hard, and all it would give is just that: detection.
So where does all this leave us? I guess just that networks make things more difficult than they have to be, creating extra work for developers who want to mitigate the problems. And if after all this I missed something, please let me know!
- Usually only on port 80, though I’d be very interested to hear if they’ve also had it happen on port 443, for HTTPS connections. [↩]
- There might be something in WPA2, specifically in RADIUS or EAP, that does this but I’m not aware of it. [↩]
- Which apparently, at least a few years ago, was based upon the Flash Networks NettGain 1200. [↩]