Demystifying HTTP

A primer

Ian Ring | Netsuite Inc

Communication using electricity

Average 14 wpm

Using a modified version of Morse code

Electricity travelling through wires
does not behave nicely

It's erratic and noisy

A lot of hard work was done in the 1920s

Harry Nyquist, an engineer at AT&T and later Bell Labs, helped develop an early facsimile machine and worked out improved methods for sending pulses of electricity through wire, making faster telegraphy and, later, television transmission practical.

Thanks to people like Harry,

we have a reliable way to send electricity as a clean signal

between two machines

Communicating data using electricity

Pulses, waves, frequency modulation, etc.

Suffice it to say that those problems were largely solved in the 1920s and 1930s

but what about networks?

There was already a known way

Connecting electronic devices to each other in a large network

Physically moving wires around isn't efficient

Connecting every machine to every other isn't scalable

This problem took a long time to figure out

About 45 years, actually

And lots of people were working on it

Part of the challenge was waiting for computers to evolve into

machines complex enough to do the operations that a network requires

It was eventually solved

NCP introduced numeric addresses and open "connections"

But it wasn't perfect

NCP was a bit flaky and didn't scale well

Then along came

Vint Cerf

Bob Kahn

"Internet Protocol"

IP puts data into small "packets"

Packets take hops through a network

Eventually being delivered to an IP address

Anatomy of an IP Packet
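To make the anatomy concrete, here is a minimal Python sketch that packs and unpacks the fixed 20-byte part of an IPv4 header with the standard struct module. The addresses and payload length are made-up example values, the helper names are hypothetical, and the checksum is left at zero rather than computed.

```python
import socket
import struct

# Fixed part of an IPv4 header: 20 bytes, network (big-endian) byte order.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")

def build_ipv4_header(src, dst, payload_len, ttl=64, protocol=6):
    """Pack a header for a packet carrying payload_len bytes (sketch only)."""
    version_ihl = (4 << 4) | 5           # version 4, header length 5 x 32-bit words
    total_length = IPV4_HEADER.size + payload_len
    return IPV4_HEADER.pack(
        version_ihl, 0, total_length,
        0, 0,                            # identification, flags/fragment offset
        ttl, protocol, 0,                # checksum left as 0 in this sketch
        socket.inet_aton(src), socket.inet_aton(dst),
    )

def parse_ipv4_header(raw):
    """Unpack the first 20 bytes of a packet into named fields."""
    (version_ihl, _tos, total_length, _ident, _frag,
     ttl, protocol, _checksum, src, dst) = IPV4_HEADER.unpack(raw[:IPV4_HEADER.size])
    return {
        "version": version_ihl >> 4,
        "header_bytes": (version_ihl & 0x0F) * 4,
        "total_length": total_length,
        "ttl": ttl,                      # decremented at every hop
        "protocol": protocol,            # 6 means the payload is TCP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# Example addresses taken from the ranges reserved for documentation.
header = build_ipv4_header("192.0.2.1", "198.51.100.7", payload_len=100)
fields = parse_ipv4_header(header)
print(fields)
```

The source and destination addresses are the part routers care about at each hop; everything else is bookkeeping.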

...but did the message arrive ok?

IP doesn't verify to the sender
that a message arrived

And it doesn't care

Vint and Bob figured that one out too

They named it "TCP"

Transmission Control Protocol

TCP is a conversation between two machines

composed of several one-way messages sent via IP

So it's often called "TCP/IP"

A TCP/IP "connection" is not like a direct conduit

it's indirect and complicated, but remarkably reliable

An open TCP/IP "connection" means

both sides are ready to communicate.

IP Packet

TCP/IP Packet

 

SYN

SYN,ACK

ACK

 

FIN

ACK

FIN

ACK

 

TCP provides acknowledgement

that the packets are all there, and in the correct order.
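The ordering guarantee can be sketched in a few lines of Python. This is a toy model, not how a real TCP stack is written: segments are (sequence number, data) pairs where the sequence number is a byte offset starting at 0, and the function name is hypothetical. Real TCP adds random initial sequence numbers, timers, and retransmission.

```python
def reassemble(segments):
    """Rebuild a byte stream from (sequence_number, data) pairs.

    Returns (stream, next_expected). Only the contiguous prefix is returned;
    anything after a gap stays undelivered until the missing segment arrives.
    """
    # A dict collapses duplicate segments; empty segments are ignored.
    buffered = {seq: data for seq, data in segments if data}
    stream = b""
    next_expected = 0
    while next_expected in buffered:
        data = buffered[next_expected]
        stream += data
        next_expected += len(data)       # this is the number the receiver would ACK
    return stream, next_expected

# Segments arrive out of order, and one arrives twice.
arrived = [(6, b"world"), (0, b"hello "), (6, b"world")]
stream, ack = reassemble(arrived)
print(stream, ack)

# With the first segment missing, nothing can be delivered yet.
print(reassemble([(6, b"world")]))
```

The acknowledgement number tells the sender how much of the stream has arrived intact, which is how the sender learns what to resend.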

There are other protocols, such as UDP,

for situations where speed > reliability

Because it checks that all the packets arrived and are put back together in the correct order,

TCP/IP is the most reliable protocol

for sending data from one machine to another over a network.

Most reliable, but not the fastest

A TCP/IP packet isn't always the same size

On Ethernet, a full-size packet is at most about 1500 bytes

at least 40 of those bytes are IP and TCP bubblewrap
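A quick back-of-the-envelope calculation, assuming the common Ethernet limit of 1500 bytes per IP packet and the minimum header sizes of 20 bytes each for IPv4 and TCP (no options). The function name and the one-megabyte message are illustrative only.

```python
MTU = 1500            # typical Ethernet maximum for one IP packet, in bytes
IP_HEADER = 20        # minimum IPv4 header, no options
TCP_HEADER = 20       # minimum TCP header, no options

def packets_needed(message_bytes, mtu=MTU):
    """How many full-size packets it takes to carry a message."""
    payload_per_packet = mtu - IP_HEADER - TCP_HEADER
    # Ceiling division: a partly filled last packet still counts as a packet.
    return -(-message_bytes // payload_per_packet)

payload = MTU - IP_HEADER - TCP_HEADER
overhead_percent = 100 * (IP_HEADER + TCP_HEADER) / MTU
print(payload)                       # bytes of actual data per full packet
print(round(overhead_percent, 1))    # share of each full packet spent on headers
print(packets_needed(1_000_000))     # packets for a one-megabyte message
```

Headers with options are larger, so real overhead is often a little higher than this minimum.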

TCP/IP involves information going in both directions

but it is merely sending a message one way

We're all used to our computers being able to do TCP/IP without thinking about it

but that wasn't always the case

How many of you remember this?

Lots of different things can be sent over TCP/IP.

After all, the "message" is just a long sequence of 0s and 1s.

What can be done with that?

"ports" are a TCP thing

Thanks, Vint.

Protocol

An agreement on how information will be structured and how each side will send and receive it.

Each port expects content to adhere to a protocol

Telnet

Port 23

Gopher

Port 70

SMTP

Port 25

SSH

Port 22

BitTorrent

Ports 6881-6889

Bitcoin

Port 8333

HTTP

Port 80
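The port list above is really just a lookup table that both sides agree on. A small Python sketch of that idea, using only the ports named on these slides; the function name is hypothetical, and nothing stops a server from speaking a different protocol on any of these ports.

```python
# Conventional port assignments from the slides.
WELL_KNOWN_PORTS = {
    22: "SSH",
    23: "Telnet",
    25: "SMTP",
    70: "Gopher",
    80: "HTTP",
    8333: "Bitcoin",
}
# BitTorrent conventionally uses a range of ports, 6881 through 6889.
WELL_KNOWN_PORTS.update({port: "BitTorrent" for port in range(6881, 6890)})

def expected_protocol(port):
    """Return the protocol a client would normally expect on this port."""
    return WELL_KNOWN_PORTS.get(port, "unknown")

for port in (80, 22, 6885, 12345):
    print(port, expected_protocol(port))
```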

Tim Berners-Lee

Squatting on port 80 since 1990

Sort of invented HTTP and HTML

HTTP is very similar to the Gopher protocol (1991)

HTML is based on SGML, a language already being used where he worked at CERN.


Tim B-L added hyperlinks to SGML, and made some improvements to Gopher.
He called his new protocol "HTTP" and the new language "HTML", and his network of interconnected documents became the "World Wide Web".

HTTP

Uses TCP/IP to transmit two messages.

1. Request

2. Response

TCP/IP makes sure they both get delivered.

Request

A request tells the destination to send a response containing the requested resource.

Response

In TCP/IP, it does not automatically follow that a request is fulfilled by a response. TCP/IP is a one-way communication. The request/response cycle is part of the HTTP protocol.
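Here is a minimal sketch of that request/response cycle on top of plain TCP sockets, running entirely on the local machine. The tiny server, the function names, and the response body are invented for illustration; a real HTTP server does far more validation.

```python
import socket
import threading

def serve_once(listener):
    """Accept one connection, read one request, send one response."""
    conn, _addr = listener.accept()
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:      # request headers end at a blank line
            chunk = conn.recv(1024)
            if not chunk:
                break
            request += chunk
        body = b"hello world!\n"
        conn.sendall(
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Type: text/plain\r\n"
            b"Content-Length: " + str(len(body)).encode() + b"\r\n"
            b"Connection: close\r\n"
            b"\r\n" + body
        )

def fetch(host, port, path):
    """Send one GET request over TCP and read until the server closes."""
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request.encode())
        response = b""
        while True:
            chunk = sock.recv(1024)
            if not chunk:
                break
            response += chunk
    return response

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))                # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
server = threading.Thread(target=serve_once, args=(listener,), daemon=True)
server.start()

response = fetch("127.0.0.1", port, "/hello.html")
server.join(timeout=5)
listener.close()
print(response.decode())
```

TCP only moves the bytes. The idea that the second message answers the first lives entirely in the code on each side, which is what HTTP standardizes.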

Demo time!

Telnet is not magic; it's literally typing in raw data to send over TCP/IP to a port.

A simple request

The HTTP Protocol is so friendly...

you can type in an HTTP request manually with telnet.

$ telnet ianring.com 80
Trying 50.57.138.29...
Connected to ianring.com.
Escape character is '^]'.
GET /hello.html HTTP/1.1
Host: www.ianring.com

The response

HTTP/1.1 200 OK
Date: Tue, 22 Sep 2015 20:39:24 GMT
Server: Apache/2.2.23 (Fedora)
Last-Modified: Thu, 17 Sep 2015 04:55:59 GMT
ETag: "3a914-2b-51fea3b385610"
Accept-Ranges: bytes
Content-Length: 43
Connection: close
Content-Type: text/html; charset=UTF-8

<html>
<body>
hello world!
</body>
</html>
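The response is plain text, so taking it apart takes only a few lines. The Python sketch below parses a shortened copy of the response above and checks the body against Content-Length. It assumes the body uses single newline characters and ends with one, which is what makes it exactly 43 bytes; the function name is hypothetical.

```python
RAW_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 43\r\n"
    b"Connection: close\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"\r\n"
    b"<html>\n<body>\nhello world!\n</body>\n</html>\n"
)

def parse_response(raw):
    """Split a raw HTTP/1.1 response into status, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")     # a blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()   # header names are case-insensitive
    return int(code), reason, headers, body

code, reason, headers, body = parse_response(RAW_RESPONSE)
length_matches = int(headers["content-length"]) == len(body)
print(code, reason, len(body), length_matches)
```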

						

A real GET request

made by Firefox

HTTP VERBS

  • GET

    Request content from the server

  • POST

    Send data to server

  • HEAD
  • PUT
  • DELETE
  • TRACE
  • OPTIONS
  • CONNECT
  • PATCH

A simple GET request

Host: tells the server what host you're trying to contact

Because a single IP address could be hosting more than one website

This one is required by HTTP/1.1

User-Agent: identifies what kind of agent is making the request.

not required

The Accept headers: identify what kind of response the agent expects back.

and also how it may be compressed or encoded

Cookie: carries "statefulness", aka persistence of an agent.

Cookies are really simple, but the way they're used is complicated.

Connection: keep-alive suggests that the client is capable of "keeping alive" a TCP connection, so the same TCP connection may be used to send multiple HTTP requests/responses

The connection timeout is still usually really short, like anywhere from 5 to 15 seconds
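Connection reuse can be observed with Python's standard library alone. In this sketch a throwaway local server answers two requests, and the client checks that both went out from the same local port, meaning the same TCP connection. The handler class and paths are invented for the demo.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"          # lets the server keep connections open

    def do_GET(self):
        body = f"you asked for {self.path}\n".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):          # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)   # port 0: OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
bodies = []
local_ports = []
for path in ("/one", "/two"):
    conn.request("GET", path)
    response = conn.getresponse()
    bodies.append(response.read())         # read fully before reusing the connection
    local_ports.append(conn.sock.getsockname()[1])
conn.close()
server.shutdown()
server.server_close()

print(bodies)
print(local_ports[0] == local_ports[1])    # same local port means same TCP connection
```

Without reuse, each request would pay for a fresh SYN, SYN-ACK, ACK exchange before any HTTP could be sent.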

Two ways of saying "don't give me a cached version, get a fresh one".

RFC 2616 defines 47 different header fields

There are plenty of other ones that are commonly used

And you are allowed to make up your own

HTTP response

HTTP Status Codes

  • 200 OK
  • 404 Not Found
  • 301 Moved Permanently
  • 403 Forbidden
  • 500 Internal Server Error
  • ... and plenty more
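Python's standard library ships the full table as http.HTTPStatus, which makes a handy reference. The describe function and its family labels are a hypothetical convenience written for this sketch; the first digit of a code gives its general class.

```python
from http import HTTPStatus

FAMILIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def describe(code):
    """Return a one-line description of a standard HTTP status code."""
    status = HTTPStatus(code)              # raises ValueError for unknown codes
    return f"{status.value} {status.phrase} ({FAMILIES[code // 100]})"

for code in (200, 301, 403, 404, 500):
    print(describe(code))
```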

Date: this is precisely when the server created the response

Server: identifies the kind of server making the response

Last-Modified and ETag: caching has a huge impact on speed

If the content hasn't changed, the server doesn't need to send the whole response
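A simplified sketch of how a server might make that decision using the ETag value from the response above. The function name is hypothetical, and real servers also handle lists of ETags, weak validators, and If-Modified-Since, none of which is covered here.

```python
def respond(request_headers, current_etag, body):
    """Return (status_code, headers, body) for a GET of one resource."""
    if request_headers.get("If-None-Match") == current_etag:
        # The client's cached copy is still good: send headers only, no body.
        return 304, {"ETag": current_etag}, b""
    headers = {"ETag": current_etag, "Content-Length": str(len(body))}
    return 200, headers, body

page = b"<html>\n<body>\nhello world!\n</body>\n</html>\n"
etag = '"3a914-2b-51fea3b385610"'

first = respond({}, etag, page)                        # nothing cached yet
second = respond({"If-None-Match": etag}, etag, page)  # client presents its cached ETag
print(first[0], len(first[2]))
print(second[0], len(second[2]))
```

The second exchange still costs a round trip, but the body never crosses the network.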

Accept-Ranges: tells the client that this server accepts "byte ranges"

Nice to know, if you're downloading a big file.
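A rough sketch of what serving a byte range involves, for the simplest form of the Range header only (a single "bytes=start-end" or "bytes=start-"). The function name is hypothetical; suffix ranges and multiple ranges are deliberately treated as unsupported, in which case the whole content is sent.

```python
def apply_range(range_header, content):
    """Handle a single 'bytes=start-end' range; return (status, headers, body)."""
    unit, _, spec = range_header.partition("=")
    start_text, _, end_text = spec.partition("-")
    if unit != "bytes" or not start_text.isdecimal():
        return 200, {}, content            # unsupported form: send everything
    start = int(start_text)
    end = int(end_text) if end_text.isdecimal() else len(content) - 1
    end = min(end, len(content) - 1)       # clamp to the last byte available
    if start > end:
        # Nothing to send: 416 Range Not Satisfiable.
        return 416, {"Content-Range": f"bytes */{len(content)}"}, b""
    part = content[start:end + 1]          # the end position is inclusive
    headers = {"Content-Range": f"bytes {start}-{end}/{len(content)}"}
    return 206, headers, part              # 206 Partial Content

content = b"hello world!"
print(apply_range("bytes=0-4", content))
print(apply_range("bytes=6-", content))
print(apply_range("bytes=50-60", content))
```

A download manager can use this to resume an interrupted transfer by asking only for the bytes it is missing.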

Content-Length: the length of the body should be precisely this many bytes

And if it's not... something went wrong

Connection: close tells the client that the TCP/IP connection will be closed after this response

But remember it's not actually two things connected

Content-Type: declares what type of content this is

HTTP provides a foundation

for interactive hyperlinked documents

  • URLs
  • Message syntax and routing
  • Requesting page assets (images, scripts, stylesheets)
  • Authentication
  • Redirection to new URLs
  • Caching
  • Encoding
  • Range Requests
  • Application State (cookies)

RFC 2616

Every developer should read it.

...and its sequels: 7230, 7231, 7232, 7233, 7234, and 7235.

Summary so far

  • HTTP isn't mysterious
  • It is very detailed and quite smart
  • The really hard stuff is in lower layers

Intermission