The Internet - TCP/IP
How The Internet Came About - Standards
- Internet is about interoperation
- Standards are what make the internet work
- The internet is an incredible advance
- Standards seem dull, so it's hard to believe how much they matter
Buy a fan and plug it in
Suppose you buy a fan and bring it home. How will you plug it in? Whether the fan is from GE, Panasonic, or anyone else, you don't worry about whether you have a GE-branded wall socket at home. Any vendor's fan just works with the socket: the socket is a standard. This doesn't seem like a great victory, but many areas of computer science fall well short of it. Perhaps appliance makers understand interoperability better than phone app startups.
This seems like just common sense, but it's very easy for things to land in a world without a standard.
First Example - Email Standard
- Standards are about "interoperability":
- -Interoperation between parties
- -Not tied to a single vendor
- Email is a great example
- 1. Standard address form: firstname.lastname@example.org
- 2. Standard email protocol: email accounts can send each other email
- You don't need a stanford.edu account to send email@example.com mail
- You don't need a gmail account to send email to someone with a gmail account
- Any email account can contact any other
- Downside: spam
- Phone numbers are another example of a standard: 650-725-4727
Chat Systems - Lack of Standards
Each chat system vendor wants as many users as possible for itself. So to talk to X, you have to remember X's ID on some system you are both on.
- "It just works" standards are not the common state
- Easy to get into the "babel" state
- Each vendor wants a system that works only with itself
- Utility for users and vendors is terrible
- With each vendor pursuing its own interest, it's hard to get out of the babel state
Exercise: How Many Chat Systems
How many different chat systems can you list?
- This is not a good system!
- How do you contact someone?
- Each system has its own address scheme
- Each person has a confusing collection of their own addresses
- One benefit: each system can prioritize certain features
Babel - Natural First State
Babel Analysis - Bad
- What we had pre-internet
- Each vendor has its own system
- Vendors tend to like "lock-in" where it's hard for clients to leave
- -e.g. your address book is stuck in the system
- Format-a to Format-b "adapter" for each pair .. awkward
- Requires on the order of n-squared such adapters
- Babel: easy to get stuck here, not efficient
- e.g. Chat
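To make the n-squared point concrete, here is a small arithmetic sketch in Python. It assumes each adapter converts in both directions, so babel needs one adapter per pair of formats, n(n-1)/2, while an open standard needs just one adapter per format, n. The function names are illustrative only.

```python
def adapters_without_standard(n: int) -> int:
    """Babel: one two-way adapter for every pair of the n formats."""
    return n * (n - 1) // 2


def adapters_with_standard(n: int) -> int:
    """Open standard: each format needs just one adapter, to/from the standard."""
    return n


for n in [5, 10, 100]:
    print(f"{n} formats: {adapters_without_standard(n)} pairwise adapters, "
          f"{adapters_with_standard(n)} with a standard")
```

At 100 formats that is 4950 adapters versus 100, and the gap keeps growing with the square of n.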
Bible Babel quote: The Lord said, "If as one people speaking the same language they have begun to do this, then nothing they plan to do will be impossible for them. Come, let us go down and confuse their language so they will not understand each other."
Universal Business Adapter: a funny IBM ad about n-squared different adapters
Escape From Babel - Standards
- Create an "Open Standard" format
- Specification is freely available, not patented, not restricted
- Not under the control of a single vendor, vendor-neutral
- Can write XX-to-standard adapters for each vendor
- Now: interoperation works very well in the Open-Standard domain
- This is exactly how the internet works and has grown so amazingly
- TCP/IP standard, HTML standard, JPEG standard .. (details below)
- Economics: new competition can easily enter with the standard - great!
- Warning: vendors tend to put "open" "standard" on all sorts of things to make them sound better than they are
Study question: suppose you want to start a gmail competitor. What do your potential customers need to install? A custom network system? No, just use the standard TCP/IP. Custom browser software? No, just use a standard HTTP/HTML browser.
Internet - TCP/IP Standards
- Previously .. LAN, e.g. ethernet, Wi-Fi, one house
- Internet - world-wide network built on open standards
- Internet is like a phone system for computers
- -Every computer has a unique address
- -Every computer can try to "call" any other computer
- TCP/IP Standards, 1974, government sponsored research
- Open standards, vendor neutral - successful pattern for infrastructure
- Capitalism is great at many things, but it's hard for vendors to come up with a good standard for everyone
- Although once the standard is in place, capitalism does great (e.g. the internet)
The previous LAN examples connect computers all on the same LAN. Now we will scale the problem up to send packets between any two computers on earth.
The worldwide Internet is built on the TCP/IP family of standards (Transmission Control Protocol / Internet Protocol), which solves the larger problem of sending packets between computers across the whole internet. These are free, open, vendor-neutral standards, which is probably the reason they have been so incredibly successful.
- Every computer on the internet has an IP address
- Here we look at IPv4 addresses; IPv6 is on the horizon
- e.g. 184.108.40.206
- IP addr is exactly 4 bytes (4 1-byte numbers)
- Left part encodes "neighborhood" on internet
- Just like phone, 650-725-0000
- e.g. 171.64.xxx.xxx generally Stanford campus
- e.g. 171.64.64.xxx my floor of Gates building
Every computer on the internet has an "IP address" that identifies it (like a phone number). The IP address is 4 bytes, written as four numbers separated by dots, like "220.127.116.11". The left part of the address encodes, in part, where that IP address sits in the whole internet: for example, any 171.64.(anything) address is part of Stanford (like the area code of a phone number). More specifically, in my part of the Gates building, all the IP addresses begin with 171.64.64, varying only in the last byte.
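Python's standard `ipaddress` module can express this "neighborhood" idea directly as a network prefix. This sketch assumes, following the text, that 171.64.x.x (written 171.64.0.0/16, first 2 bytes fixed) is Stanford and 171.64.64.x (171.64.64.0/24, first 3 bytes fixed) is one floor of Gates. The specific host addresses are made up for illustration, and 8.8.8.8 is just an arbitrary outside address.

```python
import ipaddress

# Prefixes taken from the text (simplified): the left bytes are fixed,
# the right bytes vary within the neighborhood.
STANFORD = ipaddress.ip_network("171.64.0.0/16")
GATES_FLOOR = ipaddress.ip_network("171.64.64.0/24")


def neighborhood(address: str) -> str:
    """Classify an address by the most specific prefix it falls inside."""
    ip = ipaddress.ip_address(address)
    if ip in GATES_FLOOR:
        return "Gates floor"
    if ip in STANFORD:
        return "Stanford campus"
    return "elsewhere on the internet"


for addr in ["171.64.64.20", "171.64.90.1", "8.8.8.8"]:
    print(addr, "->", neighborhood(addr))
```

The same membership test is how a computer and its router can tell they share a neighborhood: their addresses fall inside the same prefix.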
Sandra Bullock Blooper
There's a blooper in the movie The Net where the IP address "75.748.86.91" is shown. This is not a valid address, since 748 is larger than the largest possible byte value which is 255.
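The blooper check can be written as a small function: split on the dots, require exactly 4 parts, and require each part to be a number from 0 to 255. This is a simplified sketch (for example, it accepts leading zeros such as "01"); the standard `ipaddress` module does this validation more thoroughly.

```python
def is_valid_ipv4(text: str) -> bool:
    """True if text is 4 dot-separated numbers, each fitting in one byte (0-255)."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Only plain ASCII digits count; this rejects "", "-1", and " 7".
        if not (part.isascii() and part.isdigit()):
            return False
        if int(part) > 255:  # the largest value one byte can hold
            return False
    return True


for candidate in ["171.64.64.1", "75.748.86.91", "1.2.3"]:
    print(candidate, is_valid_ipv4(candidate))
```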
- Domain names
- e.g. "www.google.com" "web.stanford.edu"
- The famous x.org x.edu x.com x.gov names
- Basically human-readable names for IP addresses
- codingbat.com (name for 18.104.22.168)
- thneed.stanford.edu (name for 22.214.171.124)
- Domain names are easy for people to remember and type
- The Domain Name System (DNS) can look up an IP addr from a domain name
- So when you use a domain name, it is looked up to get an IP addr for the actual packets
- Registering a domain name costs $30 a year or so
- Whoever grabs it first can keep it, unless someone else owns the trademark
- A computer typically has an "upstream" router
- The router provides the internet connection - your traffic goes through it
- Router has multiple connections, copies-over/routes packets between them
- My office computer is at 126.96.36.199
- That computer connects "upstream" to router 188.8.131.52
- That router handles traffic for a few local computers
- Left side of computer and router IP addresses typically the same - same neighborhood
The most common way for a computer to be "on the internet" is to establish a connection with a "router" which is already on the internet. The computer establishes a connection via, say, ethernet to communicate packets with the router. The router is "upstream" of the computer, connecting the computer to the whole internet. For example, the computer in my Stanford office has IP address 184.108.40.206, and it has a one-hop ethernet connection to its router upstream at 220.127.116.11; this router handles packets for my computer. Often the router's IP address will end in .1, such as my router's 18.104.22.168. Typically the IP addresses of the computer and its router will look the same on the left side, since they are in the same "neighborhood" of the internet.
IP Packet - From: and To: IP Addresses
- TCP/IP defines a standard "IP Packet"
- Defines addresses, data format, checksum scheme
- The TCP/IP packet has both from: and to: fields
- The from: and to: fields are both IP addresses
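As a sketch of what such a packet looks like in bytes, here is a minimal IPv4 header built with Python's `struct` and `socket` modules. The layout (20 bytes, with the From: address at bytes 12-15 and the To: address at bytes 16-19) follows the IPv4 header format. The checksum is the standard one's-complement "Internet checksum" over 16-bit words. The function names, the example addresses, and the default protocol 6 (TCP) are illustrative assumptions. Real packets carry more, such as options, fragmentation flags, and the data itself.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by the IPv4 header."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold any carry bits back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, payload_len: int,
                ttl: int = 64, protocol: int = 6) -> bytes:
    """Minimal 20-byte IPv4 header (no options) with From: src and To: dst."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,           # version 4, header length 5 x 32-bit words
        0,                      # type of service
        20 + payload_len,       # total packet length in bytes
        0, 0,                   # identification, flags/fragment offset
        ttl, protocol,
        0,                      # checksum placeholder while computing
        socket.inet_aton(src),  # From: address, bytes 12-15
        socket.inet_aton(dst),  # To: address, bytes 16-19
    )
    checksum = internet_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:]


def from_to(header: bytes):
    """Read the From: and To: addresses back out of a header."""
    return socket.inet_ntoa(header[12:16]), socket.inet_ntoa(header[16:20])


header = ipv4_header("10.0.0.1", "10.0.0.2", payload_len=100)
print(from_to(header))  # ('10.0.0.1', '10.0.0.2')
```

A receiver can check the header by running the same checksum over all 20 bytes: an undamaged header sums to 0.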
That's A Lot Of Hopping!
How does a packet get around the internet? Answer: Hop Hop Hop Hop Hop Hop Hop Hop Hop. Strange but true.
- Suppose 22.214.171.124 sends a packet over to 126.96.36.199
- IP packet marked with ultimate From:/To: IP addrs
- Router strategy: send the packet 1 hop closer to its destination
- Hop 1: 188.8.131.52 sends packet up to its router
- Hop 2: 184.108.40.206 sends packet up to its bigger router
- Hop hop hop, over to destination, 10-20 hops typically
- Analogy: source capillary up to major artery, over, and down to destination capillary
Suppose my computer at 220.127.116.11 wants to send a packet to a computer at 18.104.22.168 somewhere out on the internet (actually that's the codingbat.com server I administer). The Internet is essentially made of a big web of routers talking to each other.
1. My computer prepares an IP packet which includes in particular From:/To: information as IP addresses, like this: (IP Packet From:22.214.171.124 To:126.96.36.199 data data data data).
2. My computer sends that IP packet to my upstream router, one hop, over ethernet. This is the "first hop" of the packet on its journey.
3. The 188.8.131.52 router looks at the To: address of the packet and forwards it to the next router, one hop closer to its ultimate destination. Essentially, the router has its own upstream router, which is bigger and knows more about the layout of the internet. The packet is forwarded one hop at a time until it reaches its ultimate destination. Each router does not need to know the whole route to the destination; it just needs to know which way to send the packet to get it one hop closer. The routers look at the left part of the IP address to get the packet to the right neighborhood (173.255.x.x), with the right part of the address (x.x.219.70) coming into play only when the packet is near its ultimate destination.
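Hop-by-hop forwarding can be sketched as a toy simulation. Each hypothetical router below holds only its own small table mapping address prefixes to a neighbor. It picks the most specific (longest) prefix that matches the packet's To: address, with 0.0.0.0/0 as the catch-all "send it upstream" entry. The router names and tables are invented for illustration; only the destination 173.255.219.70 comes from the text. Real routers hand packets to neighbor IP addresses and build their tables automatically.

```python
import ipaddress

# Hypothetical routers. Each one knows only its own table:
# destination prefix -> neighbor to hand the packet to ("DELIVER" = local).
ROUTING_TABLES = {
    "office-router": {"171.64.64.0/24": "DELIVER", "0.0.0.0/0": "campus-router"},
    "campus-router": {"171.64.64.0/24": "office-router", "0.0.0.0/0": "core-router"},
    "core-router": {"171.64.0.0/16": "campus-router", "173.255.0.0/16": "host-router"},
    "host-router": {"173.255.219.0/24": "DELIVER", "0.0.0.0/0": "core-router"},
}


def next_hop(router: str, destination: str) -> str:
    """Pick the neighbor for the longest prefix that contains the To: address."""
    dest = ipaddress.ip_address(destination)
    table = ROUTING_TABLES[router]
    matching = [net for net in map(ipaddress.ip_network, table) if dest in net]
    if not matching:
        raise LookupError(f"{router} has no route to {destination}")
    best = max(matching, key=lambda net: net.prefixlen)  # most specific wins
    return table[str(best)]


def forward(first_router: str, destination: str, max_hops: int = 30) -> list:
    """Follow the packet router by router; return the routers it passed through."""
    path = [first_router]
    router = first_router
    for _ in range(max_hops):  # a hop limit stops packets from looping forever
        hop = next_hop(router, destination)
        if hop == "DELIVER":
            return path
        path.append(hop)
        router = hop
    raise RuntimeError("hop limit reached")


print(forward("office-router", "173.255.219.70"))
```

No single table describes the whole route, yet the packet still arrives. The hop limit plays a role similar to the TTL field that real IP packets carry, which each router decrements.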
- Each router knows enough to figure the next hop, not the whole route
- There is no "center" of the internet that knows everything
- The initiating computer typically does not know the route; it delegates that to its router
- "Core" routers, towards the middle, bigger, fancier, more connections
- Routers measure connection functionality/breakage all the time, choose alternative routes in real time
- Routers are another distributed, collaborative system
The routing of a packet from your computer is like a capillary/artery system .. your computer is down at the capillary level, your packet gets forwarded up to larger and larger arteries, makes its way over to the right area, and then down to smaller and smaller capillaries again, finally arriving at its destination. The ultimate destination puts all the packets back together in the right order to recover the original image file or whatever. The routers at the ends have a trivial upstream/downstream configuration, so the next hop for a packet is pretty simple. More central "core" routers tend to have several possible outgoing connections, so they have a more complicated choice about which link to use for the next hop.
The routers, collectively, measure which networks are reachable over which links, and dynamically adjust which links to use for each packet. One simple metric would be to route each packet along the path with the fewest hops. In reality, the metrics used are more complex than this. The routing system is resilient to router hardware failures, overloading of certain links due to normal traffic, and links going down. The path taken by an IP packet can change from minute to minute. The routers are another example of a distributed, collaborative system. The old joke is that the backhoe is the IP packet's natural predator in the wild, since construction will sometimes slice through an important data cable, suddenly breaking a link in use. The routers "route around" such damage automatically.
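The simple "fewest hops" metric can be sketched with breadth-first search over a hypothetical map of router links. Removing one link (the backhoe) and searching again shows routing around damage. This is a centralized simplification for illustration: real routers compute routes in a distributed way and use richer metrics.

```python
from collections import deque

# Hypothetical router map: each pair is a working link between two routers.
LINKS = [("A", "B"), ("B", "C"), ("C", "F"), ("B", "D"), ("D", "E"), ("E", "F")]


def fewest_hops_path(links, start, goal):
    """Breadth-first search: return a path with the fewest hops, or None."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    previous = {start: None}  # how we first reached each router
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


print(fewest_hops_path(LINKS, "A", "F"))               # ['A', 'B', 'C', 'F']
cut = [link for link in LINKS if link != ("C", "F")]   # backhoe cuts C-F
print(fewest_hops_path(cut, "A", "F"))                 # ['A', 'B', 'D', 'E', 'F']
```

Breadth-first search explores routers in order of hop count, so the first time it reaches the goal it has found a fewest-hops path.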
Note that my computer does not need to know the layout of the internet. My computer just needs to have a connection to its upstream router, and the router, and its upstream router etc., will handle the routing from there.
Very broadly speaking, most data you get or send on the Internet goes in packets which take more than 10 but less than 20 hops from origin to destination.
Paying For Internet Service
- Internet service is like a basic utility
- Typically you pay a provider for your "upstream" service
- Say, $30 per month, for a 5 mbps (megabits per second) connection
- They in turn pay some of that money to their upstream
- Sadly, the internet service business in the US is not very competitive = high cost
- "Net Neutrality" - a good idea, avoid market manipulation by (few) internet providers
- If there were 10 providers competing to provide service, you would not need Net Neutrality legislation
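Connection speeds are quoted in bits per second, while file sizes are usually given in bytes, so a factor of 8 comes in. Here is a quick sketch of the arithmetic for the 5 Mbps example. It assumes decimal units (1 megabit = 1,000,000 bits, 1 MB = 1,000,000 bytes) and ignores protocol overhead; the 100 MB file size is just an example.

```python
BITS_PER_BYTE = 8


def bytes_per_second(megabits_per_second: float) -> float:
    """Convert a link speed in megabits/second to bytes/second."""
    return megabits_per_second * 1_000_000 / BITS_PER_BYTE


def download_seconds(file_megabytes: float, megabits_per_second: float) -> float:
    """Ideal time to move a file over the link, ignoring overhead."""
    file_bits = file_megabytes * 1_000_000 * BITS_PER_BYTE
    return file_bits / (megabits_per_second * 1_000_000)


print(bytes_per_second(5))        # 625000.0
print(download_seconds(100, 5))   # 160.0
```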
Picture of Internet/Routers
Special "Local" IP Addresses
- Note that 10.x.x.x and 192.168.x.x addresses are special "local" IP addresses
- These addresses are not valid out on the internet at large
- These are translated to a real IP addr as a packet makes its way out to the internet (network address translation, "NAT")
- Frequently given out by Wi-Fi routers .. why I mention them
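The "local" address ranges can be checked with Python's `ipaddress` module. This sketch lists the two ranges named above plus 172.16.0.0/12, the third private range reserved alongside them in RFC 1918. The sample addresses are made up.

```python
import ipaddress

LOCAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),      # 10.x.x.x
    ipaddress.ip_network("172.16.0.0/12"),   # 172.16.x.x through 172.31.x.x
    ipaddress.ip_network("192.168.0.0/16"),  # 192.168.x.x
]


def is_local(address: str) -> bool:
    """True if the address is in one of the private, local-only ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in LOCAL_NETWORKS)


for text in ["10.0.0.7", "192.168.1.23", "171.64.64.20"]:
    print(text, "local" if is_local(text) else "public")
```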
What Does it Mean to Be On the Internet?
- On the internet - e.g. connect to a Wi-Fi router
- 1. Computer connects to an upstream router to handle traffic. Most Wi-Fi access points combine Wi-Fi radios and a router.
- 2. The router typically gives the computer an IP address to use
- The computer cannot pick an arbitrary IP address, since the left part of the address depends on the location on the internet ... details known by the router
- Also, you don't want to pick an IP address in use by someone else, so the router gives you a known good one
- 3. DHCP "Dynamic Host Configuration Protocol" - automatically configure network settings to work locally. Computers very often use this feature to get needed network configuration from the router automatically.
So what does it mean for a computer to be on the internet? Typically it means the computer has established a connection with a router. The commonly used DHCP standard (Dynamic Host Configuration Protocol) facilitates connecting to a router: the computer establishes a temporary connection, and the router gives it an IP address to use temporarily. DHCP is typically used when you connect to a Wi-Fi access point.
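The address-lending part of DHCP can be sketched as a toy address pool. This is not the DHCP protocol itself, which exchanges DISCOVER, OFFER, REQUEST, and ACK messages and gives each lease an expiration time. The class, the client names, and the 192.168.1.0/24 network with the router at .1 are all hypothetical.

```python
import ipaddress


class AddressPool:
    """Toy model of a router lending out addresses from its local network."""

    def __init__(self, network: str, router_address: str):
        router = ipaddress.ip_address(router_address)
        # Every usable host address except the one the router itself uses.
        self.free = [ip for ip in ipaddress.ip_network(network).hosts() if ip != router]
        self.leases = {}  # client id -> address currently lent to it

    def request(self, client_id: str) -> str:
        if client_id in self.leases:  # a returning client keeps its address
            return str(self.leases[client_id])
        if not self.free:
            raise RuntimeError("no addresses left to hand out")
        ip = self.free.pop(0)
        self.leases[client_id] = ip
        return str(ip)

    def release(self, client_id: str) -> None:
        ip = self.leases.pop(client_id, None)
        if ip is not None:
            self.free.append(ip)  # the address can be lent to someone else


pool = AddressPool("192.168.1.0/24", router_address="192.168.1.1")
print(pool.request("laptop"))  # 192.168.1.2
print(pool.request("phone"))   # 192.168.1.3
```

Because the router picks from its own network, every address it hands out has the right left part for that neighborhood and is not already in use.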
Experiment: bring up the networking control panel of your computer. It should show what IP address you are currently using and the IP address of your router. You will probably see some text mentioning that DHCP is being used.
Here I use the "host" program to look up the IP addr of a domain name. You don't have to do this; I'm just demoing.
$ host codingbat.com   # I type in a command here
codingbat.com has address 184.108.40.206
codingbat.com mail is handled by 10 mx01.1and1.com.
codingbat.com mail is handled by 10 mx00.1and1.com.
$ host www.google.com
www.google.com has address 220.127.116.11
www.google.com has IPv6 address 2607:f8b0:4007:808::2004
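A program can do the same kind of lookup as `host` through the operating system's resolver. This sketch uses Python's standard `socket` module: `gethostbyname()` returns one IPv4 address, and `getaddrinfo()` can return every address (IPv4 and IPv6) for a name. The function names are illustrative, and looking up a real domain such as codingbat.com requires a working network connection.

```python
import socket


def lookup_ipv4(domain_name: str) -> str:
    """One IPv4 address for the name, like the "has address" line of host."""
    return socket.gethostbyname(domain_name)


def all_addresses(domain_name: str) -> list:
    """Every distinct address (IPv4 and IPv6) the resolver reports for the name."""
    results = socket.getaddrinfo(domain_name, None)
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in results})
```

For example, `lookup_ipv4("codingbat.com")` asks DNS for that name's address; if the name cannot be resolved, `socket.gaierror` is raised.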
"Ping" is an old and very simple internet utility. Your computer sends a "ping" packet to any computer on the internet, and the computer responds with a "ping" reply (not all computers respond to ping). In this way, you can check if the other computer is functioning and if the network path between you and it works. As a verb, "ping" has now entered regular English usage, meaning a quick check-in with someone.
Experiment: Most computers have a ping utility; you can try "ping" on the command line. Try pinging www.google.com or thneed.stanford.edu (18.104.22.168, on Nick's desk). Try pinging poland.pl, which is much farther away from Stanford.
Here I run the "ping" program on a few addresses to see what it reports:
$ ping www.google.com   # I type in a command here
PING www.l.google.com (22.214.171.124): 56 data bytes
64 bytes from 126.96.36.199: icmp_seq=0 ttl=53 time=8.219 ms
64 bytes from 188.8.131.52: icmp_seq=1 ttl=53 time=5.657 ms
64 bytes from 184.108.40.206: icmp_seq=2 ttl=53 time=5.825 ms
^C   # Type ctrl-C to exit
--- www.l.google.com ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 5.657/6.567/8.219/1.170 ms
$ ping thneed.stanford.edu
PING thneed.stanford.edu (220.127.116.11): 56 data bytes
64 bytes from 18.104.22.168: icmp_seq=0 ttl=64 time=0.654 ms
64 bytes from 22.214.171.124: icmp_seq=1 ttl=64 time=0.760 ms
64 bytes from 126.96.36.199: icmp_seq=2 ttl=64 time=0.436 ms
64 bytes from 188.8.131.52: icmp_seq=3 ttl=64 time=0.468 ms
64 bytes from 184.108.40.206: icmp_seq=4 ttl=64 time=0.444 ms
^C
--- thneed.stanford.edu ping statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.436/0.552/0.760/0.131 ms
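The summary line that ping prints can be recomputed from the individual time= values in the transcript above. This sketch pulls out the times with a regular expression and uses Python's `statistics` module. With `pstdev` (population standard deviation), it reproduces the stddev figure in the www.google.com transcript. The function name is illustrative.

```python
import re
import statistics

TIME_PATTERN = re.compile(r"time=([\d.]+) ms")


def round_trip_summary(ping_output: str) -> str:
    """Format min/avg/max/stddev of the reply times, in milliseconds."""
    times = [float(t) for t in TIME_PATTERN.findall(ping_output)]
    low, high = min(times), max(times)
    avg = statistics.mean(times)
    spread = statistics.pstdev(times)  # population standard deviation
    return f"{low:.3f}/{avg:.3f}/{high:.3f}/{spread:.3f}"


SAMPLE = """\
64 bytes from 126.96.36.199: icmp_seq=0 ttl=53 time=8.219 ms
64 bytes from 188.8.131.52: icmp_seq=1 ttl=53 time=5.657 ms
64 bytes from 184.108.40.206: icmp_seq=2 ttl=53 time=5.825 ms
"""

print(round_trip_summary(SAMPLE))  # 5.657/6.567/8.219/1.170
```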
- See series of hops
- First few hops in your IP neighborhood
- Farther away hops .. more milliseconds
- Note New York, London .. big jump in milliseconds
- The number of hops does not go up linearly with distance
- East bay - 30 miles from Stanford - 11 hops
- Russia - 10000 miles from Stanford - 19 hops
Traceroute is a program that attempts to identify all the routers between you and some other computer out on the internet, demonstrating the hop-hop-hop quality of the internet. Most computers have some sort of "traceroute" utility available if you want to try it yourself (not required). Some routers are visible to traceroute and some are not, so its output is not completely reliable, but it is still a neat picture of how packets travel. Here are example traceroutes from my office, first to codingbat.com and then to a randomly chosen computer with a Serbian (.rs) domain name.
$ traceroute -q 1 codingbat.com   # typing a command to the computer
traceroute to codingbat.com (220.127.116.11), 64 hops max, 52 byte packets
1 yoza-vlan70 (18.104.22.168) 2.039 ms
2 bbra-rtr-a (22.214.171.124) 0.932 ms
3 boundarya-rtr (172.20.4.2) 3.174 ms
4 dca-rtr (126.96.36.199) 27.085 ms
5 dc-svl-agg1--stanford-10ge.cenic.net (188.8.131.52) 2.485 ms
6 dc-oak-core1--svl-agg1-10ge.cenic.net (184.108.40.206) 3.262 ms
7 dc-paix-px1--oak-core1-ge.cenic.net (220.127.116.11) 4.046 ms
8 hurricane--paix-px1-ge.cenic.net (18.104.22.168) 14.252 ms
9 10gigabitethernet1-2.core1.fmt1.he.net (22.214.171.124) 9.117 ms
10 linode-llc.10gigabitethernet2-3.core1.fmt1.he.net (126.96.36.199) 4.975 ms
11 li229-70.members.linode.com (188.8.131.52) 4.761 ms
$ traceroute -q 1 yujor.fon.bg.ac.rs
traceroute to hostweb.fon.bg.ac.rs (184.108.40.206), 64 hops max, 52 byte packets
1 csmx-west-rtr.sunet (220.127.116.11) 32.802 ms
2 18.104.22.168 (22.214.171.124) 0.478 ms
3 dc-svl-agg1--stanford-10ge.cenic.net (126.96.36.199) 0.972 ms
4 dc-svl-core1--svl-agg1-10ge.cenic.net (188.8.131.52) 2.784 ms
5 hpr-svl-hpr2--svl-core1.cenic.net (184.108.40.206) 1.107 ms
6 lax-hpr2--svl-hpr2-10g-2.cenic.net (220.127.116.11) 13.880 ms
7 hpr-i2-newnet--lax-hpr.cenic.net (18.104.22.168) 9.213 ms   # See the ms go way up here
8 et-1-0-0.111.rtr.hous.net.internet2.edu (22.214.171.124) 41.892 ms   # houston
9 et-10-0-0.105.rtr.atla.net.internet2.edu (126.96.36.199) 65.663 ms   # atlanta
10 et-9-0-0.104.rtr.wash.net.internet2.edu (188.8.131.52) 78.620 ms   # DC
11 abilene-wash.mx1.fra.de.geant.net (184.108.40.206) 179.285 ms   # jumped the Atlantic
12 ae0.mx1.pra.cz.geant.net (220.127.116.11) 179.336 ms
13 ae2.mx2.bra.sk.geant.net (18.104.22.168) 183.670 ms
14 ae0.mx1.bud.hu.geant.net (22.214.171.124) 199.815 ms
15 amres-gw.mx1.bud.hu.geant.net (126.96.36.199) 207.006 ms
16 amres-l-j-agg.rcub.bg.ac.rs (188.8.131.52) 193.146 ms
17 cisco3550-fon.rcub.bg.ac.rs (184.108.40.206) 193.536 ms
18 rcub-fon-gw4.rcub.bg.ac.rs (220.127.116.11) 193.758 ms
19 hostweb.fon.bg.ac.rs (18.104.22.168) 208.213 ms
The numbers down the left side are the number of "hops" to that machine. The "ms" figures are the number of milliseconds (1 ms = 1 thousandth of a second) the send/reply took. Notice that hops farther away generally take more milliseconds. In the codingbat.com trace, the first few hops are Stanford addresses, then the route goes over some provider until it arrives at Linode, the company that provides the hardware where codingbat.com currently lives. Small mystery: it seems like the first hop should be 22.214.171.124, which is the first router from my office; apparently that router is invisible to traceroute.
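To find where the big jumps happen in a traceroute, a short script can compare the milliseconds of consecutive hops. This sketch assumes the one-probe-per-hop line format produced by `traceroute -q 1` above and parses a few lines of the Serbia trace; the function name is illustrative. Since each hop's time is measured separately and varies from probe to probe, a single large jump is a rough hint, not a precise measurement.

```python
import re

# One probe per hop: "<hop> <name> (<ip>) <ms> ms"
HOP_PATTERN = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)\s+([\d.]+) ms", re.MULTILINE)


def biggest_jump(traceroute_output: str):
    """Return (ms increase, router before, router after) for the largest jump."""
    hops = [(name, float(ms))
            for _hop, name, _ip, ms in HOP_PATTERN.findall(traceroute_output)]
    best = None
    for (name1, ms1), (name2, ms2) in zip(hops, hops[1:]):
        jump = ms2 - ms1
        if best is None or jump > best[0]:
            best = (jump, name1, name2)
    return best


SAMPLE = """\
8 et-1-0-0.111.rtr.hous.net.internet2.edu (22.214.171.124) 41.892 ms
9 et-10-0-0.105.rtr.atla.net.internet2.edu (126.96.36.199) 65.663 ms
10 et-9-0-0.104.rtr.wash.net.internet2.edu (188.8.131.52) 78.620 ms
11 abilene-wash.mx1.fra.de.geant.net (184.108.40.206) 179.285 ms
12 ae0.mx1.pra.cz.geant.net (220.127.116.11) 179.336 ms
"""

jump, before, after = biggest_jump(SAMPLE)
print(f"{jump:.1f} ms jump from {before} to {after}")
```

On these lines the largest jump is the Washington DC to Frankfurt hop, the one the transcript marks as "jumped the Atlantic".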
TCP/IP Summary Picture
- TCP/IP is a free and open standard for packet communication
- In the picture: laptop 126.96.36.199 sends a packet to 188.8.131.52.
- 1. The laptop sends the packet 1-hop, to the laptop's router
- 2. That router sends the packet to the next router (1-hop closer to its destination)
- 3. Each router sends the packet 1-hop, until it arrives at 184.108.40.206
- The traceroute output above is exactly this hop-path
- Essential features of TCP/IP: IP addresses, packets, each router sending the packet 1-hop closer
- Each router knows its local area, no router has the whole picture
- Data sent on the internet typically travels 10-20 hops
TCP/IP Standard is the Foundation
- TCP/IP, Free and open standards
- Provide foundation, any computer can send packets to any computer
- Other services are built on top of this:
- The Web
- Video Calling (e.g. Skype)