Let's get back to basics
My original article, "The Internet - What Is It?" (July
1994 issue of Digital Rag) dealt with things like the hardware you needed,
what a BBS (bulletin board system) was, free-nets, and costs. We'll deal
with some of that in future articles, but I'm going to assume for now that
the reader is already connected - otherwise how are you reading this, eh?
The problem many people have today is that they really don't understand
just exactly what they are connected to and why it works the way it does
(or doesn't work the way they think it should). Herein is a little bit of
light shed upon the topic. Of necessity it might appear a bit technical -
but then so is figuring out where to put the gas in a car, and which pedal is
the gas and which is the brake. The point is that understanding a very few items
can make many things more obvious and easy to deal with later.
The Internet is a network of networks.
All of the various smaller networks use a technical standard agreed
upon by a technical committee called the Internet
Engineering Task Force or IETF for short. This standard is called
TCP/IP - Transmission Control Protocol/Internet Protocol. The 3 main
concepts of TCP/IP that the neophyte should understand are Addresses,
Names and Services. The one thing most are familiar with is the URL which
ties all of these together but actually confuses people about what is
really going on. There are certainly lots of other things to learn, but
only these three things are necessary to grasp what is really going on.
Addresses
Each computer directly connected to the Internet must have a unique
numeric address. The address is usually expressed as a series of 4 numbers
separated by dots such as: 24.113.126.213
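As a rough illustration, here is a small sketch using Python's standard ipaddress module (my choice for the example, not something you need in order to use the Internet). It shows that the dotted form is just a readable way of writing four numbers between 0 and 255, which together make up one 32-bit number:

```python
import ipaddress

# The example address from the text.
addr = ipaddress.IPv4Address("24.113.126.213")

# Each of the four dotted parts is a number from 0 to 255.
parts = [int(p) for p in str(addr).split(".")]

# Packed together, the four parts form a single 32-bit number.
as_number = int(addr)

print(parts)
print(as_number)
```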
These numbers are not randomly assigned. There is a registration
body that hands out blocks of numbers to ISPs all over the world, who
in turn hand out smaller blocks or even individual addresses to their
customers, one of whom is you. If you are running on a Windows based
system you can look at the number you are assigned by running the c:\windows\winipcfg.exe
program on your machine. If you are running a Linux or other Unix type box
you will see it if you type ifconfig -a at the command line prompt.
If your system is connected to a network in your home or business it
actually may not have a truly unique address - but instead will have one
that is unique in the local network only. Chances are that it will be
hidden behind a firewall which will hide its non-unique address from the
rest of the Internet behind one or more assigned addresses that are
unique. See Firewalls for more discussion on this.
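A small sketch of the difference, using Python's standard ipaddress module. The 192.168.1.10 address is a made-up example of the kind of local-only address a home network typically uses, and the function name is_local_only is mine; neither comes from any standard.

```python
import ipaddress

def is_local_only(address: str) -> bool:
    """Return True if the address is from a range reserved for private networks."""
    return ipaddress.ip_address(address).is_private

print(is_local_only("192.168.1.10"))    # hypothetical home-network address
print(is_local_only("24.113.126.213"))  # the public address used in this article
```

An address for which this returns True can be reused in any number of local networks, which is why it has to be hidden behind a unique address before it can talk to the rest of the Internet.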
Your ISP assigns an address to your system each time you connect to
them if you dial in periodically. If you are "permanently"
connected via a leased line, ADSL connection, or cable modem, you may have
been assigned a permanent or "static" address when you were
first connected. If your business has many computers, it may in fact have
been given a block of addresses which your system administrator doles out
one at a time to each workstation and server in your facility.
Regardless of how you end up with the number though, understand that it
(or at least the one that is on the gateway to the rest of the world) is
unique. Any other computer on the Internet can talk to yours simply by
using its numeric address - kind of like a phone number.
Names
Because humans don't deal all that well with long strings of numbers, a
facility to allow text names to refer to the numbers was created. The
Domain Name System (DNS) maps names such as www.pacdat.net
to specific machines so that we humans don't have to keep track of the
numbers. It also allows some flexibility in that it allows more than one
name for a particular computer, and in some instances allows more than one
computer to answer to a given name.
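For the curious, here is a minimal sketch of asking for the numbers behind a name, using Python's standard socket module. The function name addresses_for is made up for this example, and the lookup of a real name is left commented out because it needs a working connection and the name must still be registered.

```python
import socket

def addresses_for(name: str) -> list[str]:
    """Ask the system resolver (and through it, DNS) for the IPv4 addresses of a name."""
    _canonical, _aliases, addresses = socket.gethostbyname_ex(name)
    return addresses

# A numeric address needs no lookup and simply comes back as given.
print(addresses_for("127.0.0.1"))

# A real name needs a working network connection, for example:
# print(addresses_for("www.pacdat.net"))
```

The result is a list rather than a single value because, as noted above, more than one computer may answer to a given name.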
Note that domain names are case insensitive. This means that Pacdat.net
is the same as PACDAT.NET.
While there is currently a lot of discussion in various places as to
the evolution of the Internet naming scheme, you will probably recognize
the fact that most computer names end in only a small number of suffixes:
"Dot COM", "Dot NET", "Dot ORG", "Dot
EDU", "Dot MIL", and "Dot GOV". Outside of the
U.S. we can add a 2 letter suffix that is unique to our country. Here in
Canada, ours is "Dot CA".
The "Dot" is actually written as a period which separates the
"second level name" from the "top level name" to
produce for example: pacdat.net
Each "Dot" in a name adds a level to the name, and each level
may be assigned to be administered by a different individual or company.
The top level domains are administered by what once was a U.S. government
agency and has recently been changed to a commercial contract currently
held by Network Solutions Inc.
If you go to the Network
Solutions site you can use their "whois" service to look up
and see who owns a particular domain. If you look up pacdat.net you will
see information about me and my company.
Here in Canada the "Dot CA" domain is administered currently
by www.cdnnet.ca
There is a lot of controversy over the "ownership" of domain
names, including trademark and such. Each registry of Top Level Domains
(TLDs) has its own policies regarding this. The TLD registry for the .COM,
.ORG, .NET and other original TLDs is in the U.S. and goes by U.S. laws,
even though many of the owners of second level names in these TLDs are in
fact outside of the U.S.
The CADomain people have their own set of policies on who can have what
names. The prime example currently is the denial of a second level name
(such as YOURNAME.CA) to any but federally registered companies or
companies with offices in more than one province, regardless of whether
such a name is otherwise not used. If you have a company incorporated in
BC with only a BC address, you may register only a third level name such
as "YOURNAME.BC.CA". Another policy example is the limit of only
one name to each business with the exception of a second one that is for
the French name of the same business. This is supposed to change under a new
registration body soon.
A second level name (xyz.com) may be the only entity for a domain. The
DNS (name to number) translation system may only have one address to
translate it to, and that may be the end of it. This is the simplest, but
also the least likely scenario. Usually there are one or more third level
domain names, and sometimes fourth and fifth and deeper. These are used
for example to describe other computers the company might own (desktop.xyz.com),
or to describe a computer dedicated to a particular service (mail.xyz.com
or ftp.xyz.com). The most recognizable of these is WWW.xyz.com since the
www is the most typical third level name for the computer that hosts a
company's web pages.
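The idea of levels can be sketched in a few lines of Python. The function name name_levels is made up for this example; it simply splits a name at each "Dot" and lists the pieces starting from the top level.

```python
def name_levels(name: str) -> list[str]:
    """Split a domain name into its levels, top level first.

    Domain names are case insensitive, so the name is lowercased first.
    A trailing dot, if present, is dropped before splitting.
    """
    return list(reversed(name.lower().rstrip(".").split(".")))

levels = name_levels("WWW.xyz.com")
for depth, label in enumerate(levels, start=1):
    print(f"level {depth}: {label}")
```

Reading the list from the front follows the chain of authority: whoever administers one level decides who gets to hand out names at the next.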
The DNS name for my computer as given to me by Rogers Wave is cr554487-a.poco1.bc.wave.home.com.
Note that I also have it named as pacdat.pacdat.net - both names are
valid, and both point to the same IP address (currently 24.113.126.213 as
I write this). The difference is only in
who has authority in a particular name space. In fact, if you have control
of a domain and DNS server, you could give my computer a completely
different name (e.g. richard.yourdomain.ca) and refer to it that way if
you want.
There is another aspect to the DNS system that maps numbers back to
names. Note that this is normally a one to one map, and is referred to as
the "Reverse DNS" name. This is governed by the organization
that has control of the IP number block. In my case, Rogers maps my IP
number (24.113.126.213) back to cr554487-a.poco1.bc.wave.home.com - my
"real" DNS name. It is worth noting here that many ISPs will not
allow a computer that doesn't have a reverse DNS entry set up for it to
talk to their systems or send E-mail. This has mainly to do with security
and their ability to determine who controls a computer that might attempt
to do something they don't like.
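A short sketch of how the reverse lookup is phrased, using Python's standard ipaddress and socket modules. The detail that the address is turned around and placed under the special in-addr.arpa domain is not covered above, but it is how DNS does it. The function name reverse_lookup is made up, and it is defined but not called here because it needs a working network connection.

```python
import ipaddress
import socket

# The reverse lookup uses a special name built from the address with its
# four numbers in reverse order, under the in-addr.arpa domain.
reverse_name = ipaddress.ip_address("24.113.126.213").reverse_pointer
print(reverse_name)

def reverse_lookup(address: str) -> str:
    """Ask the resolver which name an address maps back to (needs a network)."""
    name, _aliases, _addresses = socket.gethostbyaddr(address)
    return name
```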
One thing that some people get confused over: a domain name is not
exactly the same as a URL. See the section below on Tying it all
together.
Services
The TCP/IP standard adds an extra number to the IP address described
above. These extra address numbers, usually called port numbers, are used to
select what type of service
the computer addressed by the IP address is to perform. There are a number
of standard service numbers but the most recognizable one for most people
these days is service 80 - the one for Hypertext Transfer Protocol
(HTTP); in other words, the Web Server.
Other well known services include Simple Mail Transfer Protocol (SMTP)
for E-mail which is 25, and File Transfer Protocol (FTP) which is complex
enough to use two ports, 20 and 21.
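Here is a minimal sketch of the pairing of an address with a service number, in Python. The table holds only the numbers mentioned above; the split of FTP into 20 for data and 21 for commands is an added detail, and the names WELL_KNOWN_PORTS and endpoint are made up for the example.

```python
# Service numbers (ports) named in the text.
WELL_KNOWN_PORTS = {
    "ftp-data": 20,  # FTP data transfer
    "ftp": 21,       # FTP commands
    "smtp": 25,      # E-mail
    "http": 80,      # the Web
}

def endpoint(address: str, service: str) -> tuple[str, int]:
    """Pair a numeric address with the port number of a named service."""
    return (address, WELL_KNOWN_PORTS[service.lower()])

print(endpoint("24.113.126.213", "HTTP"))
```

The address picks the computer and the number picks the service on it, much as a phone number picks the building and an extension picks the desk.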
Any computer may handle any number of services for the address it has.
My local server handles most of the main services for my home LAN but
there are a couple that it hands out to another computer to work on. One
is service 8080 which points to my camera server.
Tying it all together
The Uniform Resource Locator (URL) ties it all together.
HTTP://WWW.PACDAT.NET/progressions.htm
A URL describes not only a computer, but a service on that computer
(see previous section) and possibly even a very specific piece of
information on that computer.
The URL is made up of a number of pieces and is worthy of a bit of
discussion here. The 3 most common pieces that have been codified into the URL
include:
- the service name followed by a colon
- a double slash followed by the name of a computer to indicate that
the service is somewhere out on the network
- a single slash followed by the file name on the computer or a string
of commands which will bring up the desired information or page.
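The three pieces listed above can be pulled apart with Python's standard urllib.parse module. This is only a sketch to show where each piece sits in the example URL:

```python
from urllib.parse import urlsplit

parts = urlsplit("HTTP://WWW.PACDAT.NET/progressions.htm")

print(parts.scheme)    # the service name before the colon
print(parts.hostname)  # the computer name after the double slash
print(parts.path)      # the file name, starting at the single slash
```

Note that the library reports the service and the computer name in lower case, while the file name is left exactly as typed.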
There are defaults for each of these if they are left out, and the
creators of some Internet browsers have taken it upon themselves to put in
place some extra assistance that might or might not make it easier to find
things.
The most typical service is HTTP, or "The World Wide Web". If you don't put the
service name into most browsers, they will default to HTTP.
The next piece is the name of the computer. Usually this is out
on the network somewhere, so it is preceded by what has become the
convention for "out on the network somewhere" - the double
slash.
After the double slash is either a domain name or a computer numeric
address. If you put in a domain name that doesn't begin with the
conventional WWW, some browsers will first try for a computer without the
WWW, and then try the name with WWW at the beginning. The WWW itself does
not mean anything - it is simply the conventional name for the computer
that hosts a company's web site.
Some browser creators are also adding a second lookup facility to take
typical key words and find a domain name from a search engine. This has
nothing to do with an Internet standard and is not yet widespread.
If there is no computer name at all, the URL points at the local
computer.
Sometimes you will see a second colon and a number after a computer
name. This is a case where the service number is different from the
standard one, e.g. http://www.pacdat.net:8080/test.htm
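A small sketch of that rule in Python, using the standard urllib.parse module. The names DEFAULT_PORTS and port_of are made up for the example, and the table only covers the two services used in this article's URLs.

```python
from urllib.parse import urlsplit

# Standard service numbers for the services used in this article's URLs.
DEFAULT_PORTS = {"http": 80, "ftp": 21}

def port_of(url: str) -> int:
    """Return the port a URL asks for, falling back to the service's standard one."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]

print(port_of("http://www.pacdat.net:8080/test.htm"))
print(port_of("http://www.pacdat.net/test.htm"))
```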
The final item is the explicit piece of data on the computer that is
wanted, the file name or other command. This is the only part of the URL
that might be case sensitive. While MS Windows computer file names are not
generally case sensitive, other operating systems such as Unix and Linux have
files whose names are case sensitive. You should assume that
anything after the computer domain name is case sensitive.
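To make the rule concrete, here is a sketch in Python of comparing two URLs the cautious way: ignore case in the service and the computer name, but treat everything after it as case sensitive. The function name same_resource is made up for this example.

```python
from urllib.parse import urlsplit

def same_resource(url_a: str, url_b: str) -> bool:
    """Compare two URLs, ignoring case in the service and computer name only."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    # scheme and hostname are reported in lower case; path and query are not.
    return (a.scheme, a.hostname, a.port, a.path, a.query) == (
        b.scheme, b.hostname, b.port, b.path, b.query)

print(same_resource("HTTP://WWW.PACDAT.NET/progressions.htm",
                    "http://www.pacdat.net/progressions.htm"))
print(same_resource("http://www.pacdat.net/progressions.htm",
                    "http://www.pacdat.net/Progressions.htm"))
```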
If no file name is given on the end of the URL, then the server may
provide a default file, typically called index.html or index.htm. If there
is no such file the server will either simply show a list of files, or
show an error page. The same is true if a directory name but no actual
file name is given.
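The default file behaviour might be sketched like this in Python. This is not how any particular web server is written; the names DEFAULT_NAMES and resolve, the little set of files, and the text of the results are all made up to illustrate the decision described above.

```python
DEFAULT_NAMES = ("index.html", "index.htm")

def resolve(path: str, files: set[str]) -> str:
    """Decide what a server might send back for a requested path.

    `files` is a hypothetical set of file paths the server holds.
    """
    if path == "":
        path = "/"  # no file name at all means the top directory
    if not path.endswith("/"):
        return path if path in files else "error page"
    for default in DEFAULT_NAMES:
        if path + default in files:
            return path + default
    # No default file: a server may list the directory or refuse instead.
    return "file listing or error page"

site = {"/index.html", "/progressions.htm", "/pub/readme.txt"}
print(resolve("/", site))
print(resolve("/pub/", site))
```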
So, to summarize:
- http://www.pacdat.net/progressions.htm
- Goes to the computer called www.pacdat.net
Uses the HTTP service when it gets there
To ask for the file progressions.htm
- ftp://www.pacdat.net/pub/
- Will ask the www.pacdat.net computer to give a default file using
the FTP service. Typically this will be a list of the available
files. Note that even though the domain name starts with www, the
computer uses the service (FTP in this case) specified before the
colon in the URL.
- www.pacdat.net
- Your browser will put the http:// in front
The server at www.pacdat.net will give you the default file named
index.html
- pacdat.net/directory/
- Your browser will first try http://pacdat.net/directory/ to see if
there is a computer at pacdat.net that will answer the HTTP
service and deliver a default file from the directory. If not, then
some recent browsers will try http://www.pacdat.net/directory/
Only if that fails will they tell you that nothing can be found.
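The guessing described in the last two examples can be sketched in Python as follows. This is only my reading of the behaviour, not the actual logic of any browser, and the function name candidate_urls is made up.

```python
def candidate_urls(typed: str) -> list[str]:
    """List the URLs a forgiving browser might try, in order, for what was typed."""
    # Add the default service if none was given.
    url = typed if "://" in typed else "http://" + typed
    candidates = [url]

    # Retry with the conventional "www" name if it was left off.
    service, rest = url.split("://", 1)
    if service.lower() == "http" and not rest.lower().startswith("www."):
        candidates.append(f"{service}://www.{rest}")
    return candidates

for typed in ("www.pacdat.net", "pacdat.net/directory/", "ftp://www.pacdat.net/pub/"):
    print(typed, "->", candidate_urls(typed))
```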
Next month we deal with the most widely used service, and in fact the
oldest; and no, it isn't "the Web"; it is E-mail.
Let me know if this or other articles help you.
richard