Intro to Internet


Introduction to Internet Communications

As information systems grew rapidly in the 1990s, reliance on networked communication systems, notably the Internet, grew with them. The growth of consumer-centric online services such as America Online, as well as local Internet service providers (commonly referred to as "ISPs"), can be interpreted as proof that information and communication systems have finally spread into the home as well.

With all of this expansive growth, it is easy to forget that only a very small percentage of the individuals who rely upon electronic mail and other information services for their communication actually understand how Internet communications systems work. The purpose of these Web pages is to educate users who wish to learn more about how the Internet works, and specifically how electronic mail and mail transport systems work.

Before getting further into how the Internet works, we should define what the Internet is, and define some essential terminology. As this document progresses further into how the Internet functions, more technical terminology will be used, and these terms will be defined upon their introduction. The terms highlighted as links are also listed in an online glossary-style reference.

The Internet began as an experimental communications network connecting computers together. Funded in 1969 by the division of the Department of Defense known as the Advanced Research Projects Agency, the first version of the Internet, called ARPANET, was made up of research machines, the first of which were located at UCLA, Stanford Research Institute, UC Santa Barbara, and the University of Utah. These machines were set up to speak to one another using a very basic set of communications rules. Collectively, these rules made up the first computer communications network protocol, called NCP (Network Control Protocol), which was the first stage in developing the network protocol we still use to this day when communicating over the Internet.

Requests for Comments
By first developing ground rules for network address registration, and voluntary standards for machines and entities connecting to this new network (which was originally known as ARPANET), an important precedent was set: the precedent of voluntary compliance. Nothing is ever mandatory on the Internet; that is, there are no overarching universal rules which determine how your mail program must send its mail. There are, however, generally accepted standards that have come into being, often described in documents called Requests For Comments (or "RFCs", for short). If you expect your mail system to interface seamlessly with the Internet, it is in your best interest to use a commonly used mail standard, which more than likely means one which uses the Simple Mail Transfer Protocol (SMTP), which is described in an RFC (the document known as RFC-821, to be specific). The more acquainted one becomes with the way the Internet works, the more familiar one becomes with the idea of RFCs as a collection of guiding documents, in which the most frequently desired information regarding the Internet and its workings is discussed.

RFC documents are created by individuals who see a need to have something related to the Internet formalized on paper. Such individuals write a relevant document, and send it to a person designated as a referee for such proposals. The document is then commented on by all individuals wishing to take part in the discussion, which is handled through electronic communications such as electronic mail and postings on newsgroups on the Internet. The document, usually after going through several revisions to reflect the suggestions of commenting parties, is either abandoned, or accepted as generally a good idea and assigned an RFC number.

A Tale of Two NICs
It is important to note that a large majority of documents which have been classified as RFCs do not delineate networking standards; instead, most of these documents are written to describe networking concepts and proposals which have already been created, as well as to provide a base for discussion purposes. All of these documents are kept in publicly accessible areas on machines run by the organization responsible for archiving information for the operations of the Internet, known as the Network Information Center, or NIC.

The NIC is available to all users of the Internet, and provides most of its information by way of the Internet, but also through postal mail, as well as by way of telephone and facsimile communication. The current network address for the NIC is internic.net, and the NIC is reachable by way of the World Wide Web, at http://www.internic.net.

ARPANET, as created, allowed machines to communicate with one another by using special pieces of hardware that recognize information addressed to the machine in which they are installed, and pass this information along to their host machine. These pieces of hardware have evolved into the modern Network Interface Card (another networking term with NIC for an acronym, indeed). On the Internet, the host machine that such information is addressed to is identified by an Internet Protocol (IP) address.

Locational Addressing and Address Classes
IP communications provide for locational addressing, which allows information to be passed through the Internet from a source machine to a destination machine without the source machine needing to provide explicit instructions on how to get to the destination. The researchers of ARPANET developed the Internet Protocol, or IP, which allowed the distributed smaller, localized networks wishing to communicate with one another to speak the same networking language.

Locational addressing means that by following a top-down hierarchy, a message can be sent to any location connected to the Internet, with very little necessary knowledge as to the location of the destination machine. These Internet protocol addresses ("IP addresses", in common use) are composed of four eight-bit values, each of which can be written as a hexadecimal couplet (a pair of hexadecimal digits) but is usually converted to decimal form, giving a number from 0 to 255. These numbers are separated by periods, forming an address such as "128.192.255.1". There are three methods of utilizing these addresses, which are identified as Class A, Class B, and Class C addressing schemes.
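To make the address format concrete, here is a minimal Python sketch that splits a dotted address into its four numbers and shows the same address as hexadecimal couplets. The function names parse_ipv4 and to_hex_couplets are our own, chosen for illustration; they are not part of any standard.

```python
def parse_ipv4(address: str) -> list:
    """Split a dotted-decimal address into its four numbers (0-255 each)."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four numbers, got {len(parts)}")
    octets = [int(part) for part in parts]
    for value in octets:
        if not 0 <= value <= 255:
            raise ValueError(f"{value} does not fit in eight bits")
    return octets


def to_hex_couplets(address: str) -> str:
    """Show the same address as four two-digit hexadecimal couplets."""
    return ".".join(f"{value:02X}" for value in parse_ipv4(address))


print(parse_ipv4("128.192.255.1"))       # [128, 192, 255, 1]
print(to_hex_couplets("128.192.255.1"))  # 80.C0.FF.01
```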

The rarest form of these addresses, Class A addresses, uses the first number in the address to specify which institution is being addressed, the following two numbers to subdivide that institution's networks, and the final number to specify an individual machine on a specific subdivision within that institution. This means that two machines with Class A addresses of 100.1.1.1 and 100.2.1.1 belong to the same institution, but are located in different logical parts of the network within that institution. It is also important to note that these addresses, while frequently mirroring the physical layout of a network, are strictly logical; that is, machines with network addresses of 100.200.1.2 and 128.128.5.75 could sit on the same desktop, but would be very distant from one another, logic-wise.

The first two numbers in a Class B address designate the institution's network on the Internet, while the third number can be used by the institution in question to further subdivide its network. Class B networks are similar to Class A networks, in that they provide a method of subdividing the network (each subdivision is usually called a "subnet"), this time with one number, instead of two, dedicated to subdividing a network. The final number in Class B network addresses, as in Class A addresses, is used to designate a specific machine within a specific subnet of that network. Finally, Class C addresses use the first three numbers to determine which institution an address belongs to, and the final number to determine which particular machine is being addressed. Class C addresses have no subnets.
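The class of an address can be read from its first number. The sketch below assumes the first-number ranges of the classful scheme (below 128 for Class A, 128 to 191 for Class B, 192 to 223 for Class C), which are not spelled out in the text above; the function names are hypothetical.

```python
def address_class(address: str) -> str:
    """Return 'A', 'B', or 'C' based on the first number of the address."""
    first = int(address.split(".")[0])
    if first < 128:
        return "A"
    if first < 192:
        return "B"
    if first < 224:
        return "C"
    raise ValueError(f"{address} is outside Classes A, B, and C")


def split_address(address: str):
    """Return (institution part, remaining part) as lists of numbers."""
    octets = [int(part) for part in address.split(".")]
    # Class A uses one number for the institution, B two, C three.
    network_length = {"A": 1, "B": 2, "C": 3}[address_class(address)]
    return octets[:network_length], octets[network_length:]


print(address_class("100.200.1.2"))    # A
print(address_class("128.192.255.1"))  # B
print(split_address("128.192.255.1"))  # ([128, 192], [255, 1])
```

In the Class B result, 255 would be the subnet and 1 the individual machine, following the description above.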

These three IP address formats provide an interesting framework, in which there can be a few institutions with a very large number of subnetted machine addresses (Class A institutions can have over 16.7 million addresses per institution), a larger number of institutions which have a medium number of addresses (over 65,000 addresses within Class B institutions) and a very large number of institutions which have a low number of addresses (Class C institutions, with 256 possible addresses per institution, of which 254 are usable by machines).
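The address counts follow directly from how many of the four numbers are left over for the institution's own use, since each number can take 256 values. A small sketch of the arithmetic (the names are ours):

```python
# How many of the four numbers each class leaves to the institution.
NUMBERS_LEFT_FOR_INSTITUTION = {"A": 3, "B": 2, "C": 1}


def addresses_per_institution(address_class: str) -> int:
    """Each number holds 256 values, so the count is 256 to that power."""
    return 256 ** NUMBERS_LEFT_FOR_INSTITUTION[address_class]


for name in ("A", "B", "C"):
    print(name, f"{addresses_per_institution(name):,}")
# A 16,777,216
# B 65,536
# C 256
```

In practice slightly fewer addresses are usable, because the all-zeros and all-ones machine values are reserved; a Class C network, for example, has 254 usable machine addresses.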

It is a current concern that Class B addresses, which are the most in demand, are being depleted at an incredible rate; there may come a day when all of the Class B network addresses have been assigned. Newer addressing methods which are backwards-compatible with the four-couplet format (notably: Extended IP Addressing) are currently being proposed and discussed; it is assumed that an advanced addressing scheme will take the place of standard IP addressing within the next few years, before the exhaustion of available IP addresses occurs.

Naming The Network
It was soon discovered that while numeric addressing is easy for machines to handle, it becomes unwieldy for humans to work with. That is, if you were trying to communicate on the Internet, you would be hard pressed to remember the number assigned to every machine to which you wished to send mail, from which you wished to retrieve files, or to which you wished to log in. Due to this rift between people and machines, the Domain Name System (DNS) was developed. Every institution, or domain, connected to the Internet is required to have two domain name service machines. These machines and the domain they serve are registered with the Network Information Center. For example, the University of Georgia has two machines, named dns1.uga.edu and dns2.uga.edu, which provide name services for the uga.edu domain.

Machines within an Internet domain can have a unique name which is stored by the domain name servers, called a Fully-Qualified Domain Name (FQDN). This means that any outside machine which needs to contact a machine at the University of Georgia via the Internet sends network messages to the University of Georgia's Domain Name Servers. The content of these messages is a request for the numeric address, which the requesting machine needs, that matches a Fully-Qualified Domain Name, which the requesting machine knows (by addressing an electronic mail message to someuser@somemachine.somedomain, the sender gives his mail-handling machine the Fully-Qualified Domain Name of the machine at which the receiver's account resides). One of the Domain Name Servers, upon receiving such a request, responds with the stored IP address for the desired machine.

This method of naming machines provides an added benefit: if a machine which provides crucial services (such as mail services) becomes disabled, another machine can be substituted in its stead, simply by changing the records which map a particular machine's name to a certain IP address, and the name on the machine itself. Just imagine the chaos if a machine's name/address combination needed to be changed on every machine connected to the Internet which wished to communicate with any other machine!
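The lookup and the substitution trick can be sketched with a toy, in-memory name table. This is only an illustration of the idea, not how real name server software is written; the class ToyNameServer, the name mail.example.org, and the 192.0.2.x addresses are all made up for the example.

```python
class ToyNameServer:
    """A stand-in for a domain name server: names in, addresses out."""

    def __init__(self):
        self.records = {}

    def register(self, name: str, address: str) -> None:
        # Names are not case-sensitive, so store them in lower case.
        self.records[name.lower()] = address

    def resolve(self, name: str) -> str:
        try:
            return self.records[name.lower()]
        except KeyError:
            raise LookupError(f"no record for {name}") from None


server = ToyNameServer()
server.register("mail.example.org", "192.0.2.10")
print(server.resolve("mail.example.org"))  # 192.0.2.10

# The mail machine fails; point the same name at a replacement machine.
server.register("mail.example.org", "192.0.2.20")
print(server.resolve("mail.example.org"))  # 192.0.2.20
```

Senders keep using the same name throughout; only the one record held by the name server changes.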

To this point, we have discussed many of the basics of the Internet's infrastructure; that is, how addresses on the Internet allow information to move between machines connected to the Internet. What we have not discussed, however, is how the information which is being passed through the Internet is structured. For this, we need to delve a bit deeper into the definition of the Internet Protocol.

IP, TCP, and UDP in Comparison
The Internet Protocol (IP) is the basic protocol of the Internet. There are two other protocols which are widely used on the Internet, namely the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP), and both of these are carried inside IP rather than alongside it. That is, information being sent across the Internet which conforms to TCP is sent along the Internet wrapped inside an IP transmission. The practice of sending data of one nature or protocol within a transmission of a different protocol is called encapsulation. An encapsulated protocol is sent within another protocol, just as a cardboard box can be shipped across the country while enclosed within another box.
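Encapsulation can be sketched as wrapping bytes inside more bytes. The headers below are deliberately simplified toys (two port numbers for the inner layer, two addresses for the outer layer); real IP, TCP, and UDP headers carry many more fields. All function names are our own.

```python
import socket
import struct


def wrap_transport(payload: bytes, source_port: int, dest_port: int) -> bytes:
    """Inner box: a toy transport header of two 16-bit port numbers."""
    return struct.pack("!HH", source_port, dest_port) + payload


def wrap_ip(segment: bytes, source: str, dest: str) -> bytes:
    """Outer box: a toy IP header of two 4-byte addresses."""
    return socket.inet_aton(source) + socket.inet_aton(dest) + segment


def unwrap_ip(packet: bytes):
    """Open the outer box: return (source, dest, enclosed segment)."""
    source = socket.inet_ntoa(packet[:4])
    dest = socket.inet_ntoa(packet[4:8])
    return source, dest, packet[8:]


def unwrap_transport(segment: bytes):
    """Open the inner box: return (source port, dest port, payload)."""
    source_port, dest_port = struct.unpack("!HH", segment[:4])
    return source_port, dest_port, segment[4:]


packet = wrap_ip(wrap_transport(b"hello", 1025, 25), "128.192.1.5", "128.192.255.1")
source, dest, segment = unwrap_ip(packet)
print(source, dest)               # 128.192.1.5 128.192.255.1
print(unwrap_transport(segment))  # (1025, 25, b'hello')
```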

The question poses itself, then: Who would possibly wish to send a cardboard box across country within another cardboard box? If you think about it, this is the way breakfast cereals make it to your grocery store. Packaging boxes, with their bright colors and marketing design, are sent to the store inside of larger boxes, whose design is strictly utilitarian, to provide safe transport. Sending the cereal boxes across country by themselves would prove unwise. Not only would the shipping require accounting for a much larger number of smaller packages, but the thin cereal boxes, designed for marketing appeal, are unsuitable shipping containers by themselves, and much of the cereal shipped in such a manner would arrive at your store in an unsaleable condition.

The Internet Protocol uses its scheme of IP addresses, as described earlier, to send information, broken down into manageable chunks called packets, across the Internet. Sometimes the packets being transmitted across the Internet run into dead-ends, cannot find a path to get to the machine for which they were addressed, or get stuck in a circle, being bounced among several machines. A number stored in the first part of an IP packet (the first part of an IP packet is known as the "IP Header") called the "Time To Live" value (TTL) gets decremented each time a packet gets sent from machine to machine, trying to find a path to its desired destination machine. When the TTL value hits zero, the packet is discarded, and is considered to be "lost". As each packet is transmitted individually, there is also no guarantee that two consecutive packets will take the same physical route through the Internet to reach their destination, or that these packets, upon arriving, will be in the same order they were transmitted in. This is the main failing of the IP protocol; there is no numbering of the data being sent to ensure that it can be reassembled easily. Once a piece of information leaves the transmitting computer, the sending machine washes its hands of the transmitted information. There is a benefit to this, however; as IP packets don't have to be concerned with the order of the data sent, or whether the data gets there or not, the speed of the IP protocol for transmitting data across the Internet is, all in all, fairly unfettered. In a perfect world, all data would be transmitted via the IP protocol. In this imperfect reality, however, more reliable transfers are necessary.
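The Time To Live mechanism can be sketched as a simple countdown. This is a simplified model under our own assumptions: the packet passes through a given number of intermediate machines, each of which subtracts one from the TTL and discards the packet if the result is zero. The function name send_packet is hypothetical.

```python
def send_packet(ttl: int, machines_on_path: int) -> str:
    """Pass a packet through each machine on the path, decrementing TTL."""
    for machine in range(1, machines_on_path + 1):
        ttl -= 1
        if ttl <= 0:
            # The packet is discarded and considered "lost".
            return f"lost at machine {machine}"
    return f"delivered with TTL {ttl}"


print(send_packet(ttl=64, machines_on_path=12))  # delivered with TTL 52
print(send_packet(ttl=3, machines_on_path=12))   # lost at machine 3
```

A packet caught in a circle behaves like the second call: however long the loop runs, the countdown guarantees that the packet is eventually discarded.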

The need for data accountability and information ordering is where the Transmission Control Protocol (TCP) comes in. TCP uses sequence numbers to indicate which part of a particular transmission is contained within the current packet, and what order the packets should be assembled in. If a particular transmission, using TCP, arrives at the destination machine with four out of six packets, and in the order 1, 3, 5, 2, the destination machine properly arranges, and then holds onto these packets, while sending a message back to the origin of the transmission that packets four and six need to be resent. When all of the parts of a transmission are properly received by the destination machine, the parts are assembled, and the communication of that particular transmission is complete. By using TCP, computers can simulate, over an indirect and non-contiguous connection, a direct machine-to-machine connection. This comes at a cost: a TCP transmission uses a good deal of startup communication between machines, and requires a good deal of extra overhead information to be transmitted just to keep the connection between the two machines going.
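The six-packet example above can be sketched as follows. This follows the simplified per-packet numbering used in the text; real TCP numbers the bytes of a transmission rather than whole packets, and its acknowledgment messages work differently. The function name reassemble and the sample data are our own.

```python
def reassemble(received, total_packets):
    """received: list of (sequence number, data) pairs in arrival order.

    Returns (assembled data or None, list of missing sequence numbers).
    """
    held = dict(received)  # hold packets, keyed by sequence number
    missing = [n for n in range(1, total_packets + 1) if n not in held]
    if missing:
        return None, missing
    return b"".join(held[n] for n in range(1, total_packets + 1)), []


arrived = [(1, b"In"), (3, b"rn"), (5, b" ma"), (2, b"te")]
print(reassemble(arrived, 6))  # (None, [4, 6])

# Packets four and six are resent and arrive in either order.
arrived += [(6, b"il"), (4, b"et")]
print(reassemble(arrived, 6))  # (b'Internet mail', [])
```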

TCP also adds to IP the concept of port numbers. This is a crucial concept in this world of multiply-formatted information. By using port numbers, TCP identifies which service a particular transmission is intended for on the destination machine. Without port numbers, the destination machine wouldn't know whether the incoming information needed to be handled by the mail system on that machine, or whether it constitutes a request for a page on the web site run on that machine, and needs to be handled by the web server software. Ports are numbers ranging from 0 to 65,535, which allow transmissions to be sent directly to a particular piece of software which is 'listening' to the specified port. A port on a machine is usually specified by the IP address of the machine which the port is active on, followed by a colon, and the number of the port, such as 128.192.1.5:80. Port numbers under 1024 are considered to be "privileged" ports; only the person responsible for maintaining a machine has the ability to let programs use these ports. There is a security reason for this; if any user were allowed to set up a program to listen, for example, to the port used for incoming mail, they would be able to read the mail of every other user on that system.
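The address-and-port notation can be taken apart with a few lines of Python. Port numbers are 16-bit values, so this sketch accepts 0 through 65535; the function names parse_endpoint and is_privileged are hypothetical.

```python
def parse_endpoint(endpoint: str):
    """Split 'address:port' into (address, port number)."""
    host, _, port_text = endpoint.rpartition(":")
    if not host:
        raise ValueError(f"{endpoint!r} is not in address:port form")
    port = int(port_text)
    if not 0 <= port <= 65535:
        raise ValueError(f"port {port} does not fit in sixteen bits")
    return host, port


def is_privileged(port: int) -> bool:
    """Ports under 1024 are reserved for the machine's administrator."""
    return port < 1024


host, port = parse_endpoint("128.192.1.5:80")
print(host, port)           # 128.192.1.5 80
print(is_privileged(port))  # True
print(is_privileged(8080))  # False
```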

The User Datagram Protocol (UDP) is a simpler protocol than TCP; in addition to the standard IP features, it provides port numbers and an optional data-integrity feature called "checksumming". Checksumming allows a receiving machine to tell if the data it has received in a UDP packet is correct, or whether some errors occurred in the transmission of the information. Since UDP does not provide any means for telling the numerical order of multi-packet transmissions, it is best suited to small information transmissions which can be handled within the bounds of a single IP packet. Since there's only one packet for such small communications, any numerical ordering information would be wasted data space. Since the maximum size of an IP packet is quite small, UDP is less commonly used than TCP; relatively few services are capable of fitting all of their transmitted information, every time, within the bounds of one IP packet.
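Checksumming can be illustrated with the 16-bit ones' complement sum that the Internet protocols use. This sketch computes the checksum over the data alone; a real UDP checksum also covers the UDP header and some fields borrowed from the IP header, which we leave out here. The function name internet_checksum is our own.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        # Fold any carry out of the top back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


message = b"hello world!"
checksum = internet_checksum(message)

# The receiver sums the data together with the checksum; zero means intact.
print(internet_checksum(message + checksum.to_bytes(2, "big")))          # 0
print(internet_checksum(b"jello world!" + checksum.to_bytes(2, "big")))  # nonzero
```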

Any Port In A Storm: Communications Service Routing
We briefly mentioned port numbers in our discussion of the TCP and UDP protocols. Among the most common port numbers are port 25, which is used for mail communications, port 80, which is used for World Wide Web communications, and port 23, which is the "telnet" port. Telnet is a communication tool used to allow multiple users of a machine, called a "host", to connect from other machines, called "remote terminals", and perform work as if they were sitting right at the keyboard of the host machine. Many operating systems have this facility; the most widely-used is the UNIX class of operating systems (including SunOS/Solaris, Linux, AIX, and BSD), but other operating systems offer this functionality to some degree or another, including IBM's desktop PC operating system called OS/2, and Microsoft's Windows NT operating system.

A program run on machines which allow TCP communication, called an "inetd superserver", is responsible for watching all incoming TCP data traffic. This software does very little, indeed; it simply checks the port number that the incoming information is being addressed to, and passes the body of the transmission to a program elsewhere on that particular machine which was designed to handle that particular type of information transmission. This means that every program running on a machine which handles TCP data need not be concerned with watching the incoming flow of data, checking to see if the data received is intended for that program, and handling the assembly of transmitted packets. Allowing one such "superserver" to route the information to the proper program not only saves programming, but can also simplify the process of setting up multiple information systems on a single machine. The prime drawback of the inetd server is that if a machine becomes inordinately busy, as can often happen with popular servers, the response time of the inetd program can slow to a crawl as it becomes overwhelmed. Only a lull in incoming network traffic long enough to allow inetd to catch up with incoming requests will bring the response time back down.
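The routing job described above amounts to a lookup table from port numbers to handlers. The sketch below is a toy model of that idea only, not a description of how the real inetd program is written; the handler functions and the SERVICES table are hypothetical.

```python
def handle_mail(body: str) -> str:
    return "mail system received: " + body


def handle_telnet(body: str) -> str:
    return "telnet session received: " + body


def handle_web(body: str) -> str:
    return "web server received: " + body


# Which program handles which port (25 mail, 23 telnet, 80 web).
SERVICES = {25: handle_mail, 23: handle_telnet, 80: handle_web}


def dispatch(port: int, body: str) -> str:
    """Check the destination port and pass the body to the right program."""
    handler = SERVICES.get(port)
    if handler is None:
        return f"no service is configured for port {port}"
    return handler(body)


print(dispatch(25, "a message for someuser"))  # mail system received: a message for someuser
print(dispatch(80, "a request for a page"))    # web server received: a request for a page
print(dispatch(9999, "anything"))              # no service is configured for port 9999
```

Every request passes through the single dispatch function, which is also why one overwhelmed superserver slows down every service behind it.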

A class of programs also exists, called "daemons", which help to solve some of this problem. Daemons sit quietly, watching a specific port, for TCP or UDP transmissions to come in addressed to that port. When a transmission arrives, the daemon starts, or "spawns", another copy of itself, to handle that incoming communication. This means that inetd doesn't even need to watch that particular port, and needs to act as a "traffic cop" for one less program. A daemon also responds more quickly than a program which must wait for the inetd superserver to forward a transmission to it, as data is handled directly as it comes in off of the Internet, into a specific port. Thus, the port-addressing ability of TCP can be used to provide not only more robust communications (higher data integrity communications are referred to as being "robust"), but faster responses to incoming requests, and faster handling of incoming information.
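A daemon's listen-and-spawn behavior can be sketched with Python's standard socketserver module. As an assumption of this sketch, a new thread per connection stands in for the spawned copy of the daemon, and the server listens only on the local loopback address at a port chosen by the operating system. The names GreetingHandler and demo are our own.

```python
import socket
import socketserver
import threading


class GreetingHandler(socketserver.BaseRequestHandler):
    """Runs once per incoming connection, in its own thread."""

    def handle(self):
        data = self.request.recv(1024)
        self.request.sendall(b"handled: " + data)


def demo() -> bytes:
    # Port 0 asks the operating system for any free, unprivileged port.
    with socketserver.ThreadingTCPServer(("127.0.0.1", 0), GreetingHandler) as server:
        port = server.server_address[1]
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            with socket.create_connection(("127.0.0.1", port), timeout=5) as conn:
                conn.sendall(b"hello")
                chunks = []
                while True:
                    chunk = conn.recv(1024)
                    if not chunk:  # the handler has finished and closed
                        break
                    chunks.append(chunk)
        finally:
            server.shutdown()
    return b"".join(chunks)


print(demo())  # b'handled: hello'
```

The listening program stays free to accept the next connection while each handler works on its own, which is the source of the faster response described above.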

Conclusion
Since its inception at the end of the 1960s, the Internet has continued through times of unparalleled expansion, and unforeseen troubles. As networking equipment increases in its speed to send out data onto the Internet, and the bandwidth demand by an increasing number of users continues unabated, perhaps our largest challenge in the future will be handling the growing pains we are certain to face. As the network evolves, however, the standards for networking also evolve in kind. We cannot forecast where we will be ten, or even five, years from now, and if the past is any guide by which to chart the future, perhaps we should not even dare to project what the changing standards might involve, until we are more certain of what our network needs are.

Certainly, more network addresses are needed, as are more-universal file transfer methods. It remains to be seen whether electronic mail will undergo changes in the future to allow users to specify items such as what text fonts their message should be displayed in, and whether the plain-text mail messaging of today will be replaced by a full-featured document management system, which takes us farther away from our old text-based network roots than we can imagine. Certainly, the World Wide Web has changed the outlook of information seekers on the Internet from the despair of having to work with dry, text-based interfaces, to working with information-rich graphic environments. It remains to be seen if MIME (Multipurpose Internet Mail Extensions) will have the same effect on mail services, or if some unforeseen mail protocol lurks on the horizon which will replace all that we know in a whirlwind of network redesign. This is perhaps our only legacy for the future of Internet networking: uncertainty of our destination, but certainty that the journey has already begun, well before we had even become aware that we were traveling.
