BuzzFeed Tech

Sharing our experiences & discoveries for the betterment of all!

What I Learned About Computer Networking During BuzzFeed Hack Week

For BuzzFeed Hack Week this year, a week in which all of BuzzFeed Tech puts down their day-to-day responsibilities to work on a project of their choosing, I took a computer networking class through Udacity. BuzzFeed Hack Week is not limited to “hacking” or building a project using code, design software, etc., but can also be an exploration into a topic you might want to learn about. Below is what I learned.

How does the internet work? How does it actually transmit information from one side of the world to another? How does it know where information is coming from, and where it should be sending the information? How does the internet recover from an overload of requests? If you’re like most people, you type a page address into your browser, wait for the page to load, and if it doesn’t, you disconnect from the internet, reconnect, and try again. Have you ever asked yourself how the address “facebook.com” gets sent from your computer to the internet, how facebook.com receives the request, and how it sends the facebook.com homepage to your computer? If so, read on.

A representation of the internet, from http://www.makeuseof.com/service/web-based/

For starters, you should know that the internet runs on the Transmission Control Protocol/Internet Protocol (TCP/IP), a set of rules that tell computers and other devices how to package, address, send, and receive information. After you type “facebook.com” into your browser, a few requirements have to be fulfilled for the request to go through:

  1. Just like when sending an envelope through the United States Postal Service (USPS), when your computer tells the internet where it would like to request information from and send information to, it has to do so using TCP/IP rules. Just like USPS would not deliver your envelope if the delivery address was missing, placed on an unexpected side of the envelope, or written without a destination city, zip code, etc., TCP/IP rules state that in order to deliver “envelopes” or packets of information to and from different locations, it needs to know their IP addresses (you might have seen an IP address before, a number that looks something like this: 208.80.154.224). An IP address of this kind is a set of four numbers separated by three dots that acts like your “driver’s license” number on the internet, an official identification number for your device. We’ll talk about how this number came to be later.
  2. In addition to the IP address, TCP/IP requires that a request or a packet have packet headers to tell it what to do. These packet headers include the source and destination IP addresses, but also instructions on how to deliver the packet of information. For example, USPS handles a packet containing fine china labelled ‘fragile’ very differently than a packet containing sturdy reams of white paper; the same is so for different kinds of data transmitted over a network, like ‘image’ or ‘text’, to name a few. The World Wide Web, email, and file transfers rely on TCP/IP. There is another transport protocol, the User Datagram Protocol (UDP), which sends data without first setting up a connection or guaranteeing delivery, but we will focus here on TCP, the most widely used protocol in the transport layer of the Internet Protocol Suite (the other layers are the link layer, the internet layer, and the application layer).
  3. In order for USPS to deliver a package, there must be a mailbox, a porch, or even a package delivery door in a large building, some sort of entrance, for it to deliver the package through. Interestingly, your computer has 65,536 available “doors” or ports that it can open at any given time. Why 65,536, you ask? It seems like a random number, but it’s one that’s linked to the “size” of the field on the “packet header” that’s carrying that “door” or port information. “Size” or “length” when you’re writing a paper, for example, is measured in letters or characters, but in programming, the length of information is measured in bytes (a byte is equivalent to eight bits, or an octet as computer scientists call it, and each bit is a binary digit, a 1 or a 0…yes, like you’ve seen in The Matrix). To put it in perspective, a TCP header can be anywhere from 20 to 60 bytes, which means 160 to 480 individual bits, 0’s or 1’s. The space for a port number is only 16 bits (and remember, each of these is either a 0 or a 1).

IP addresses explained, from https://www.expressvpn.com/what-is-my-ip
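
To make the 16-bit port field a little more concrete, here is a minimal Python sketch. It packs a source and a destination port the way the first four bytes of a TCP header are laid out, using the standard `struct` module. The function name `pack_ports` and the port numbers are made up for illustration; a real header carries many more fields.

```python
import struct


def pack_ports(source_port: int, destination_port: int) -> bytes:
    """Pack two port numbers into 4 bytes, 16 bits each (hypothetical helper)."""
    # "!" means network byte order; each "H" is one unsigned 16-bit integer.
    return struct.pack("!HH", source_port, destination_port)


header_start = pack_ports(51000, 443)
print(len(header_start) * 8)  # total bits used by the two port fields

try:
    # 65,536 is one more than the largest value a 16-bit field can hold.
    pack_ports(51000, 65536)
except struct.error as exc:
    print("does not fit in 16 bits:", exc)
```

The failed second call is the point: there is simply no room in a 16-bit field for a port number above 65,535.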

Try this short mental exercise: if you only had 0’s and 1’s to tell someone that there were 20 people in the room with you at an event, how would you do it? What sort of system would you invent with these 0’s and 1’s to make sure the other person understood that there were 20 people in the room? You’re not allowed to use sticks, draw the number of people, or speak in any way.

The answer is, you’d probably come up with a secret code that only your friend understood. You might choose to say that the number 1 will be represented by “1”, the number 2 by “10”, the number 3 by “11”, the number 4 by “100”, and so on (by the way, this is actually how these numbers are written in binary!). To fully answer the question above, then, there are 65,536 available “doors” or ports on your computer, numbered 0 through 65,535, because 65,535 is the highest number that can be represented by a 16-bit, or 2-byte, binary number (if you’re into math: every bit can be either a 0 or a 1, so 16 bits give 2¹⁶ = 65,536 possible port numbers, and since in computer science we start counting at zero, the highest port number is 65,535, not 65,536). IP addresses, the numbers that tell a network where to find your computer (described in number 1 above), or where to send packets of information, have a similar length constraint. They are 32 bits long, which means 2³², or about 4.3 billion, possibilities (this system is called IPv4).
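
The counting above can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are made up, and the address is the example one from earlier in this post. It uses the standard `ipaddress` module to show that a dotted IPv4 address is really one 32-bit number.

```python
import ipaddress

people_in_room = 20
as_bits = format(people_in_room, "b")  # write 20 using only 0's and 1's
print(as_bits)                         # 10100
print(int(as_bits, 2))                 # back to 20

port_count = 2 ** 16           # how many values fit in a 16-bit field
highest_port = port_count - 1  # counting starts at zero
ipv4_count = 2 ** 32           # how many values fit in a 32-bit address
print(port_count, highest_port, ipv4_count)

# The four dotted numbers are just the four bytes of one 32-bit integer.
address = ipaddress.IPv4Address("208.80.154.224")
as_integer = int(address)
print(format(as_integer, "032b"))
```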

Cartoon on maximum port numbers, from Udacity’s Networking for Web Developers Class

So, we need an IP address, packet headers, and a port to actually send and receive information. But how the heck does the internet handle so many requests at once?

Imagine that a billion packets were dropped off at the post office on a given day, but there were only enough trucks, planes, and cars to deliver 10 million packets a day. How long would it take the post office to deliver all of the packets? 100 days. That’s a long time. Maybe the post office would enlist twice as many trucks and drivers and charter more planes to handle the increase in packages being sent. That, however, would take a long time to do, since USPS would have to hire a lot of people. The internet works in a similar fashion, though when it’s busy, it does something USPS would probably never do (at least not on purpose): it “drops” packets and does not deliver them. The way TCP slows down in response to those drops is called TCP congestion control.

Cartoon on TCP congestion control, from Udacity’s Networking for Web Developers Class

Imagine, for example, that you have a home internet connection that can handle sending or receiving 1 million packets per hour, but the connection outside of your home can only handle 500,000 per hour, so a bottleneck of requests forms. It turns out that TCP has protections against this: it doesn’t send all the data for a request (like “calling” facebook.com) at once, but starts slowly, increasing speed only when it receives word from the place you’re looking to reach that the packets are getting through. If there are too many packets to send, the router, a device that connects one IP network to another, will drop packets to relieve pressure on the connection. When TCP notices that packets have gone missing, it slows down and sends them again; if nothing comes back from the place or “server” you are trying to reach for long enough, the request will “time out” and give you an error. The connection between your computer and the internet will speed up again once there are fewer packets to take care of. Were it not for the router dropping packets, the whole connection could clog up, which means no packets of information would be sent or received anymore.
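
Here is a toy Python sketch of that “start slowly, speed up, back off” behavior. It is a heavy simplification, not how any real TCP implementation is written: the function name, the idea of counting packets per round trip, and the capacity of 10 are all assumptions made for illustration. The sender doubles its rate until packets are first dropped, halves its rate after every drop, and from then on speeds up by only one packet per round.

```python
def simulate_window(link_capacity: int, rounds: int) -> list[int]:
    """Return how many packets the sender tries to send in each round."""
    window = 1          # start slowly with a single packet
    slow_start = True   # doubling phase, until the first drop
    history = []
    for _ in range(rounds):
        history.append(window)
        if window > link_capacity:
            # The router dropped packets: cut the sending rate in half.
            window = max(1, window // 2)
            slow_start = False
        elif slow_start:
            window *= 2  # everything got through: double the rate
        else:
            window += 1  # after a drop, speed up more cautiously
    return history


print(simulate_window(link_capacity=10, rounds=10))
# [1, 2, 4, 8, 16, 8, 9, 10, 11, 5]
```

The rate climbs quickly, overshoots what the link can carry, backs off, and then creeps back up toward the limit.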

The IP address, router, and internet relationship, from Udacity’s Networking for Web Developers Class

But…we’re running out of IP addresses?!?

Did you notice above, when we talked about IPv4, that there can only be about 4.3 billion IPv4 addresses? The world has over 7 billion people, and if all of them were on the internet with their own IP addresses, not to mention on multiple devices that each have an IP address, 4.3 billion combinations would not be enough.

If you did, pat yourself on the back: the issue worried programmers for a while.

At first, to delay the problem, programmers built something called Network Address Translation (NAT), a workaround for not having enough addresses left.

NAT is like an office telephone system. An office, for example, has many employees, all of whom may have a phone on their desk that rings if their extension is called. Though the person on the other end of the line dials a central number, they are re-routed to a specific employee’s telephone after entering the extension number. Just like there is a single telephone number where anyone can reach all company employees, NAT uses one public IP address to send and receive packets (every computer has its own “extension”). The “extension number” in a NAT is the port number (explained in number 3 above) that a packet is either coming from or going to. The NAT (or a proxy, a similar system that works on specific HTTP requests) knows which port numbers “belong” to which computers. The NAT is in charge of re-writing the addresses on these packets, the “to” or the “from”, so that packets get to where they need to go.
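
The “extension” bookkeeping can be sketched as a small lookup table in Python. This is an illustrative model only, not how a real router is implemented: the class name `SimpleNat`, its methods, the starting port of 40000, and the example addresses are all made up for the sketch.

```python
class SimpleNat:
    """Toy NAT: one public IP address, one public port per private (IP, port)."""

    def __init__(self, public_ip: str, first_public_port: int = 40000):
        self.public_ip = public_ip
        self.next_port = first_public_port
        self.outbound = {}  # (private_ip, private_port) -> public_port
        self.inbound = {}   # public_port -> (private_ip, private_port)

    def translate_outgoing(self, private_ip: str, private_port: int):
        """Rewrite the "from" address of a packet leaving the office."""
        key = (private_ip, private_port)
        if key not in self.outbound:
            # First packet from this computer and port: assign an "extension".
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.outbound[key])

    def translate_incoming(self, public_port: int):
        """Find which computer a reply belongs to (None if unknown)."""
        return self.inbound.get(public_port)


nat = SimpleNat("203.0.113.5")
print(nat.translate_outgoing("192.168.1.10", 51000))  # ('203.0.113.5', 40000)
print(nat.translate_outgoing("192.168.1.11", 51000))  # ('203.0.113.5', 40001)
print(nat.translate_incoming(40001))                  # ('192.168.1.11', 51000)
```

Two computers share one public address, and the public port number is what lets the NAT tell their replies apart.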

NAT diagrams, from Udacity’s Networking for Web Developers Class

Though NAT is still used today in offices, home networks, and other places, it did not solve the scarcity of IP addresses. Knowing that a system based on IPv4 addresses would not be feasible forever, programmers came up with a new system in the late 1990s, IPv6. Instead of IP addresses being constrained to 32 bits, IPv6 uses 128-bit addresses, which allows for 2¹²⁸ addresses, more than can be used in the foreseeable future.
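
To get a feel for the jump from 32 to 128 bits, here is a short Python sketch using the standard `ipaddress` module. The variable names are made up, and `2001:db8::1` is an address from the range reserved for documentation examples, not a real host.

```python
import ipaddress

ipv4_total = 2 ** 32
ipv6_total = 2 ** 128

# For every single IPv4 address, IPv6 has 2**96 addresses.
print(ipv6_total // ipv4_total == 2 ** 96)  # True

address = ipaddress.IPv6Address("2001:db8::1")
print(address.exploded)       # 2001:0db8:0000:0000:0000:0000:0000:0001
print(address.max_prefixlen)  # 128
```

The written-out form has eight groups of four hexadecimal digits, and each group holds 16 bits, which is where the 128 comes from.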

A system that has 3.2 billion users (estimated by the International Telecommunication Union in 2015), however, is not easily transitioned to a new IP address system. Given that the old system and the new system are not compatible, most networks and devices made today must accept both IPv4 and IPv6 addresses. The transition, however, has proven to be a slow one. As of July 13, 2017, only about 20% of Google users access it over IPv6. As is apparent in the map below, that percentage is higher in more developed countries that can afford to purchase new devices and software.

World IPv6 adoption, as measured by Google (see more: https://www.google.com/intl/en/ipv6/statistics.html#tab=per-country-ipv6-adoption&tab=per-country-ipv6-adoption)

You can now imagine how packets of data are sent back and forth through the internet, making sure to follow the rules of TCP/IP, and understand how the internet keeps itself from shutting down when there are too many packets being sent back and forth around the world. You understand how basic binary numbers work, and have been introduced to the ongoing transition from IPv4 to IPv6 addresses. Hopefully you’ve been intrigued enough to take this introduction and keep exploring the wonderful world of computer networking principles.

Check out more projects that we worked on during Hack Week here.

To keep in touch with us here and find out what’s going on at BuzzFeed Tech, be sure to follow us on Twitter @BuzzFeedExp where a member of our Tech team takes over the handle for a week!

Written by Angie Ramirez

Software Engineer @Livepeer. Previously @BuzzFeed @Yale. Traveler. Writer. Thinker.
