I’m often surprised how little web developers know about the HTTP protocol. I’m by no means an expert (feel free to correct what I write in the comments!) but I think I know a fair bit and I’ll outline the basics here.
Why does it matter? What if you’re a frontend developer doing AJAX calls? How do you know that the request succeeded? Do you check that the response text isn’t empty? That the status code is 200? Neither is optimal: a successful response can legitimately have an empty body, and 200 is only one of several success codes in the 2xx range. Luckily, most good libraries will take care of figuring it out.
At its most basic, HTTP is a request and a response. Each message consists of a start line, a group of headers and an optional body. Each request has a method, the most common of which are GET and POST. It also names the host in the Host header (e.g. www.google.com, without the http:// part) and the resource being requested, which may simply be ‘/’ (the host root) or, for an OPTIONS request about the server as a whole, ‘*’. There are a lot more headers that an HTTP request may include, but those are the bare minimum. Most requests have an empty body, though POST and PUT are normally used specifically to send content to the server in the request body. The body can be in any format, described by the Content-Type header. An HTML form, for instance, is sent as URL-encoded key-value pairs (application/x-www-form-urlencoded) or, when it includes file uploads, as multipart/form-data, which carries the binary data as-is.
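To make the shape of a request concrete, here is a minimal sketch that assembles the raw bytes of a GET and a form POST by hand. The `build_request` helper, the www.example.com host and the form fields are all made up for illustration; a real client would use a library rather than string concatenation.

```python
from urllib.parse import urlencode


def build_request(method, path, host, form=None):
    """Return the raw bytes of a minimal HTTP/1.1 request (illustrative only)."""
    # Form fields become URL-encoded key=value pairs in the body.
    body = urlencode(form).encode("ascii") if form else b""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if body:
        lines.append("Content-Type: application/x-www-form-urlencoded")
        lines.append(f"Content-Length: {len(body)}")
    # A blank line separates the headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


# Hypothetical host and form data, chosen only for the example.
get_request = build_request("GET", "/", "www.example.com")
post_request = build_request(
    "POST", "/blog/", "www.example.com",
    {"title": "Hello world", "tags": "http&rest"},
)
```

Printing `post_request.decode()` shows the request line, the Host header, the two content headers, a blank line, and then the encoded form fields.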
Response messages start with a status line containing the status code, followed by headers and usually a message body too. A HEAD request specifically indicates that the response should include only the status line and headers (it is otherwise treated as a GET request). Your standard response has a 200 status code, some headers describing when and how the request was served, and a body containing an HTML webpage.
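As a rough illustration of how a response is laid out, the sketch below splits a raw response into its status code, headers and body. `parse_response` and the two sample responses are invented for this example, and the parser is deliberately naive: it keeps only the last value when a header name repeats and does no error handling.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.x response into (status, reason, headers, body)."""
    # The first blank line separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


# A made-up response to a GET request.
GET_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Sun, 06 Nov 1994 08:49:37 GMT\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 31\r\n"
    b"\r\n"
    b"<html><body>Hello</body></html>"
)

# The response to a HEAD request for the same page: same head, no body.
HEAD_RESPONSE = GET_RESPONSE.partition(b"\r\n\r\n")[0] + b"\r\n\r\n"
```

Parsing both samples gives the same status and headers. Only the body differs, which is what makes HEAD handy for checking a resource without downloading it.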
As you may have noticed, HTTP methods are a set of verbs that let the client tell the server ‘give me this information’ or ‘process this data’. Since each verb means a specific thing, you actually get a very defined and powerful little language. As I see it, it is this observation (among other things!) that is REST. While they often aren’t, URLs are at their most powerful when they represent unique things. HTTP methods let us indicate what we want to do with them. So, GET /blog/ would indicate that we wanted to get the blog webpage, while POST /blog/ might indicate that we want to submit a new blog post. Likewise, GET /blog/2009/12/http-for-web-developers/ gets the post while PUT /blog/2009/12/http-for-web-developers/ sends an updated version of the post. As you see, we’re dealing with the same blog post, we’re just doing different things with it.
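The idea that one URL can be the target of several verbs can be sketched with a toy in-memory blog. Everything here is an assumption made for the example: the `handle` function, the `posts` dictionary, the shape of the POST body (a dict with `slug` and `text`) and the choice of status codes.

```python
posts = {}  # maps a post URL to its text


def handle(method, path, body=None):
    """Return (status, payload) for a request against the toy blog."""
    if path == "/blog/":
        if method == "GET":
            return 200, sorted(posts)        # list the known post URLs
        if method == "POST":
            url = f"/blog/{body['slug']}/"   # the server picks the new URL
            posts[url] = body["text"]
            return 201, url                  # 201 Created
        return 405, None                     # method not allowed here
    if method == "PUT":
        # PUT stores the given text at exactly the URL the client named.
        status = 200 if path in posts else 201
        posts[path] = body
        return status, path
    if path not in posts:
        return 404, None
    if method == "GET":
        return 200, posts[path]
    if method == "DELETE":
        del posts[path]
        return 204, None                     # success, nothing to return
    return 405, None
```

Calling `handle("GET", url)` and `handle("PUT", url, "new text")` with the same `url` touches the same post both times. Only the verb changes what happens to it.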
HTTP status codes are powerful things and in my opinion people often overlook the cool things you can do with the right request headers and response status codes. For instance, your average browser will often include headers in its requests indicating that it has requested the resource before (e.g. If-Modified-Since, If-None-Match) and, if based on those means of comparison the resource hasn’t changed, you can return a 304 response with an empty body and the client will use its cached version. Especially with binary data (images!) you can avoid lots of bandwidth usage, something especially useful for clients on slow (mobile!) connections.
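On the server side, the decision between a 200 and a 304 might look something like the following sketch. The `conditional_get` function and the sample values are hypothetical. It assumes header names arrive with exactly this capitalisation and that `last_modified` is a UTC datetime with whole seconds, and it skips details such as weak ETags and malformed dates.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(request_headers, etag, last_modified, body):
    """Return (status, headers, body), answering 304 if the client's copy is current."""
    headers = {
        "ETag": etag,
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_none_match is not None:
        # When both headers are sent, If-None-Match takes precedence.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        unchanged = "*" in candidates or etag in candidates
    elif if_modified_since is not None:
        unchanged = last_modified <= parsedate_to_datetime(if_modified_since)
    else:
        unchanged = False
    if unchanged:
        return 304, headers, b""  # empty body: the client reuses its cache
    return 200, headers, body


# Made-up resource used for illustration.
IMAGE = b"...image bytes..."
ETAG = '"v1"'
LAST_MODIFIED = datetime(2009, 12, 1, 12, 0, 0, tzinfo=timezone.utc)
```

A first visit with no conditional headers gets the full 200 response. A repeat visit that sends `If-None-Match: "v1"` gets a 304 and no image bytes at all.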
One specific thing that I’d like to draw your attention to is requesting and serving feeds. Since a client may request a feed often looking for changes, you definitely want to be efficient about your responses. If-Modified-Since and If-None-Match are great but responses to the requests must be all or nothing: 200 or 304. If you’re returning a 200 code you must return the entire feed. But what if the user requested a feed of 500 items and only the last one was created since the If-Modified-Since date they sent? There is a pair of headers called A-IM and IM, defined by RFC 3229 (delta encoding in HTTP), and an unofficial but somewhat supported ‘feed’ value for them. The request includes the A-IM header with the value ‘feed’, indicating that the client understands the resource is a feed and so can accept a partial response based upon its request criteria. The server can return a 200 or 304 as normal but it can also take a middle ground: it can respond with a 226 (IM Used) status code and the IM header, again with the ‘feed’ value, and only include the changed feed items in its body. Major savings!
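Here is one way the three outcomes (200, 304, 226) could be chosen on the server. This is only a sketch under assumptions: `feed_response` and `FEED` are invented names, feed items are plain `(published, title)` tuples, and a real implementation would also need to serialise the items and set caching headers.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def feed_response(request_headers, items):
    """Return (status, headers, items_to_send) for a feed request."""
    since_header = request_headers.get("If-Modified-Since")
    if since_header is None:
        return 200, {}, items               # unconditional: send everything
    since = parsedate_to_datetime(since_header)
    newer = [item for item in items if item[0] > since]
    if not newer:
        return 304, {}, []                  # nothing changed at all
    accepted = [
        value.strip().lower()
        for value in request_headers.get("A-IM", "").split(",")
    ]
    if "feed" in accepted:
        # The client opted in to partial feeds: send only the new items.
        return 226, {"IM": "feed"}, newer
    return 200, {}, items                   # fall back to the whole feed


# A made-up two-item feed: (published, title).
FEED = [
    (datetime(2009, 12, 1, tzinfo=timezone.utc), "Old post"),
    (datetime(2009, 12, 20, tzinfo=timezone.utc), "New post"),
]
```

A client that sends both `If-Modified-Since: Thu, 10 Dec 2009 00:00:00 GMT` and `A-IM: feed` gets a 226 with just the newer item. Without the A-IM header the same request gets the full feed back with a 200.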
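One thing I should add is that dates are always a major pain in HTTP headers. The correct format is ‘Mon, 01 Jan 1990 00:00:00 GMT’. In strftime format that’s ‘%a, %d %b %Y %H:%M:%S GMT’, with the time expressed in GMT (UTC) and English day and month names. Don’t use ‘%Z’ for the zone, since it prints whatever time zone the program happens to be using rather than the literal ‘GMT’.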
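Rather than building the format string by hand, you can usually lean on a library. As a small sketch, Python's standard `email.utils` module can both produce and parse this date format; the wrapper names `format_http_date` and `parse_http_date` are my own, and the sample date is just an example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def format_http_date(moment):
    """Format a timezone-aware datetime as an HTTP date (always GMT)."""
    # Convert to UTC first; usegmt=True writes 'GMT' instead of '+0000'.
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def parse_http_date(text):
    """Parse an HTTP date string into a timezone-aware datetime."""
    return parsedate_to_datetime(text)


# An example date, used only for illustration.
example = datetime(1994, 11, 6, 8, 49, 37, tzinfo=timezone.utc)
header_value = format_http_date(example)  # 'Sun, 06 Nov 1994 08:49:37 GMT'
```

The day and month names come out in English regardless of the machine's locale, which is one of the things a hand-rolled strftime call can get wrong.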
Both Google and Yahoo have very good tools for inspecting HTTP requests and page performance (Page Speed and YSlow, respectively). The Yahoo Developer Network has great articles on bandwidth and speeding up websites. Significant portions of this work at both companies have been due to Steve Souders. His books High Performance Web Sites and Even Faster Web Sites are great guides.
With all this I think you have more than enough to make strong, efficient HTTP clients and servers. Good luck!