The logs can even track what site you visited before you came to the one you're viewing. This can include the search criteria you used at your favourite search engine. All of this can be analyzed and served up as statistics in aggregate or even individual by individual (although that's not typical on a busy site - just too much detail). We do this for David's own site www.centa.com so that we can judge what the "hot" topics are as time goes by. Of course we don't know who you are unless you've actually subscribed to the mailing list.
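To make the "site you visited before" idea concrete, here is a minimal sketch of pulling the referring site and search terms out of one log line. It assumes an Apache-style "combined" log format and a search engine that passes its terms in a `q` parameter; both are assumptions for illustration, not a description of the software actually used on this server, and the sample line is made up.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumed layout: Apache-style "combined" log line (an illustration only).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def search_terms(log_line, query_param="q"):
    """Return (referring host, search terms) for one log line, or None."""
    match = LOG_PATTERN.match(log_line)
    if match is None:
        return None
    referrer = urlsplit(match.group("referrer"))
    # parse_qs decodes "+" and %-escapes in the referring URL's query string.
    terms = parse_qs(referrer.query).get(query_param, [])
    return referrer.netloc, " ".join(terms)

# Made-up sample line; the addresses and host names are placeholders.
sample = (
    '192.0.2.10 - - [01/Dec/2003:10:15:32 -0700] '
    '"GET /articles/privacy.html HTTP/1.1" 200 5120 '
    '"http://search.example.com/search?q=web+bugs+privacy" '
    '"Mozilla/4.0 (compatible)"'
)
print(search_terms(sample))
```

Counting the results of a function like this over a day's worth of lines is roughly how a "hot topics" tally could be built.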
As you can see in the report box above, lots of interesting things can be read from the logs. As you can also see, even the old AMD 850 that hosts this site (along with several hundred more, some of which are MUCH larger) took less than a second to produce the report; a report that runs to about 129k of text plus graphs for this one day - you're only seeing one piece of one section. The same report is done as a monthly and yearly aggregate too. We don't track individual users' paths through the site and we use "Open Source" log analysis software so the report is pretty basic. You can bet that the major sites collect far more data and do a far better job of analyzing it.

Note that even after this analysis is done the original log lines are still available for further analysis if needed. The lines for this year for the CEN-TA site total about 44 Megabytes of compressed files. Even our largest site, which gets over a million file views a day, runs to only about 12 Gigabytes for the year. With disk space at about $1/Gig these days, storing them online is trivial.

The point is that the technology to track literally everything you do when sitting in front of your computer and interacting with it and the Internet's Web is available, and not all that expensive. Even at the best, you leave tracks in various computers as you browse; mostly "anonymous" but valuable none the less.

Taking Away the Mask of Anonymity

What David first asked me about - whether or not I'd seen a picture from the web page he'd sent me - is all about unmasking your anonymity. Much of what I've detailed in the previous section can only tell what computer address you were at when you looked at the pages. For most people this changes each day or so, so there is no real correlation to a person. (I have a fixed IP address, which adds spice to the problem as I'll tell you about below.) In some cases this unmasking is subtle. In others it is blatant.
In Canada after January 1, 2004 it had better be "by the book" or somebody could be in trouble; at least somebody other than you, the page viewer. Of course my opinion is that you're potentially in trouble no matter what you do.

I don't mean to sound completely paranoid; I'm not. On the other hand, maybe I (and you) should be. The number of incidents of identity theft and fraud is growing. So too is the number of online scams, spam e-mails, bogus web sites and what have you. They're not yet at the point where I'd call them a real epidemic - at least not for people who know there is no Easter Bunny, Santa Claus, 80% return on investment in a year or $200,000 bonus for getting "my" millions out of Uganda or wherever; in other words, for people who have even a modicum of skepticism and common sense. All that is needed is a bit of education on what to watch out for - the subject of this article.

Web bugs

The original reason David asked me to write this article is an example of a "web bug" - a unique URL embedded in a message sent to you that, when you view the message, confirms that you have done so. The page David sent me (or caused the web site to send me as if it were from David) was done up in HTML and included a couple of unique image URLs, one of which ended with "__tn_pers2790347040.jpg?BCmegAABvemnfj9H". If my browser had been set as most of yours are set, the first time this message appeared in my preview pane or was opened by me, the image would have been loaded from the sending website - leaving behind a log record including the full URL. Note that after the image's name (__tn_pers2790347040.jpg) there is a trailing "?" and something (BCmegAABvemnfj9H) that appears to be garbage characters.
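Splitting the tail of that URL at the "?" shows the two pieces a server sees when the image is requested. The sketch below is only an illustration: `sightings` and `record_view` are hypothetical names standing in for whatever the sending site really keeps, and only the tail of the URL quoted above is used since the rest is not reproduced here.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit

# Tail of the image URL quoted above; the host part is not reproduced here.
url_tail = "__tn_pers2790347040.jpg?BCmegAABvemnfj9H"

def split_bug(url):
    """Split an image URL into (image name, trailing key after the '?')."""
    parts = urlsplit(url)
    return parts.path, parts.query

# Hypothetical server-side table: trailing key -> list of recorded requests.
sightings = {}

def record_view(url, client_ip):
    """Note that the image carrying this key was fetched, and from where."""
    image, key = split_bug(url)
    seen_at = datetime.now(timezone.utc)
    sightings.setdefault(key, []).append((image, client_ip, seen_at))
    return key

print(split_bug(url_tail))
```

The image name can be shared by every recipient; it is the trailing string that differs from one mailed copy to the next.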
In fact, the garbage is a unique key to a record in a database that includes the fact that the page was mailed to both me and David, including the time it was sent, and probably linking to all the things that David had done in the session leading up to his sending it. In this case the bug was attached to a "real" picture. In some cases it is as little as a single pixel (picture element - a dot on the screen) so it loads "instantly" and doesn't show you anything - but its log record exists on the server none the less. Freaky, eh?

And you thought you'd turned off "acknowledge reply request"! (That is the option that causes an automatic reply e-mail to be sent telling the original sender that you've read their message. Some mail agents don't support it well, and most people outside of specific companies refuse to have it turned on for privacy reasons, if for no other reason than to deter the spammers.) We know you've seen our mail! And in some cases (Windows specifically), because it is actually the main browser engine that interprets the HTML and retrieves the graphic, the sending site has the opportunity to send your computer a "cookie" that continues to identify you if you should again visit the site with your normal browser, even months in the future.

Cookies

When you're just web browsing, one of the ways a web site tracks you as distinct from some other viewer, for a few minutes or forever, is by sending your web browser a unique series of characters (somewhat like the web bug above) that your browser stores for some time, possibly permanently. This "cookie" concept is valuable to you the viewer in some cases - such as when you're working with a web site you've had to log onto with a user ID and password. If it were not for cookies, the otherwise simplistic design of the Hypertext Transfer Protocol would mean you would have to re-logon for each page you wanted to view on the site - not something most would put up with.
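The log-on case can be sketched in a few lines. This is a rough illustration using Python's standard `http.cookies` module, not any particular site's implementation; the `sessions` table, the cookie name `session` and the function names are all made up for the example.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical server-side table: cookie value -> user ID.
sessions = {}

def login(user_id):
    """Create a session and return the Set-Cookie header line for it."""
    token = secrets.token_hex(16)      # the "unique series of characters"
    sessions[token] = user_id
    cookie = SimpleCookie()
    cookie["session"] = token
    cookie["session"]["path"] = "/"    # send it back for every page on the site
    return cookie.output()

def who_is_asking(cookie_header):
    """Look up the user from the Cookie header the browser sends back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session")
    return sessions.get(morsel.value) if morsel else None
```

Once `login` has run, every later page request that carries the same cookie value can be matched to the same visitor without asking for the password again.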
The problem is that this viewer-helping web extension can also help the web site keep track of you and your travels through the site (or even across sites). Unless you have told your web browser not to store cookies (see Pushing Back) a web site can deposit a cookie on your computer and later check to see if it is there. The cookie can contain either direct data or a key (like the one above on the image tag) that can be used to pull a record from a database and add more detail to it. At minimum, the cookie can be used to track which pages you've visited, in what order and for how long during the current viewing session with the site. In extreme cases, the cookie can allow the system to track your use of any web site that uses a common information database (and there are many such agglomerated site systems) and tie the information into answers you might give to seemingly innocuous "surveys" and questionnaires (see Verifications below) as well as purchases - eventually building up a wealth of data on your personal and financial life. In some cases enough is learned that the web site can tie its information to your credit record (even if you don't give them a credit card number or your SIN/SSN).

One thing to note with this and many of the other methods used by legitimate companies to collect information on you: it is not looked at personally by anyone except in very extreme cases. The data is massaged and manipulated by programs which today bear a striking resemblance to Artificial Intelligence - with the goal of presenting you with advertising and offers, as well as information that the system thinks is most likely to keep you coming back and, hopefully, to get you to part with some of your hard-earned cash by selling you things and services.

Verifications

I subscribe to a number of "free" magazines.
Even though I've been around computers and the Internet for longer than most people my age, I still like to read from paper - a habit I'm working on breaking by adding screen real estate to my system, but which seems to be a losing battle as my eyesight deteriorates with age. For the techies out there, I run my main system with two 19" monitors, each running at 1600x1400 - problem is I have the font sizes cranked up to the point where I might just as well be running them at 800x600 when I'm actually reading.

Anyway, back to the free magazines. Every year or so, each of the magazines sends me a special issue wrapped in a verification questionnaire. Prior to the Internet I'd fill these in and either snail-mail them back or fax them back. Today however, all of them have fill-in web forms for this purpose; should be easier, right?

Well, yes it is easier. The problem is that the magazines get their advertising dollars based upon audited subscription statistics so they can't just print up thousands of copies and send them out to random people; they have to know that you "qualify" and are a real person. With the forms they send, there is a spot for a signature. Unfortunately, there is no way of signing a web fill-in form (at least not one they will accept) so the auditors (or the magazines' programmers maybe) came up with the concept of a "verification question" - something that is of a relatively personal nature that a random person probably would not know about you - kind of like asking your mother's maiden name when talking to the government about your passport or driver's license. (I have issues with this too but that's for another time.)

The problem is that it seems that many/most of the magazines I get either have the same software for their questionnaires or use the same service provider to manage their subscriptions. Some of them even send me to the same web site but different sub-directory, although most have something under their own web name.
The curious thing is that all of these magazines have a similar set of questions they ask for "verification purposes". The questions seem to change every time I renew for a particular magazine but over all of them the questions in total remain fairly static:
Notice anything? Each of the questions in itself doesn't give any particularly private information, but all of them in total do - and these are just a sampling of the ones I get. I know for a fact that at least 5 of the magazines I get are from the same publisher - they cross advertise and the web site is the same for the renewals; yet each asks a different question each year so the total of the information they can gather is large. Of course I caught onto this years ago and have instituted my own "Privacy Policy" which I'll tell you about in another section. In general I have a set of answers that I use consistently but which are not even close to the "truth".

Surveys, Questionnaires and Stealth Questions

Several of the web sites I visit regularly have "informal polls", questionnaires, and other information gathering means. The magazine sites in the previous section all ask for information about the kinds of business I do, including dollar volumes, projections, etc. In their case, this is to allow them to decide if I "qualify" as someone they want to send their "free" magazine to. At least the magazine publishers are fairly up front about it; other sites are not.

If you do any major browsing on the Web I'm sure you've come across sites that ask you questions in order to gain access to some of their areas. The questions can include personal information, even if cloaked as a range of values (Age: 18-25, 26-35, ...) but over time the data can become alarmingly precise. If you are asked the same question but with slightly different ranges, the computer can narrow down the answer by detecting when you move from one range to another (18-25, 19-30, 24-36, 26-35 - if you are 25 you'll end up in the first, second and third but not the fourth). Even the fact that you choose a particular button to go to the next page can be informative; [English] [French] being one of the most common in Canada.
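The range trick described above is easy to work through by hand or in a few lines of code. The sketch below is a made-up illustration (the function name and the 0-120 starting span are assumptions); it uses the four ranges from the example and treats each as a yes/no observation.

```python
def narrow_age(observations, lowest=0, highest=120):
    """Return the ages consistent with every (low, high, chosen) observation."""
    candidates = set(range(lowest, highest + 1))
    for low, high, chosen in observations:
        in_range = set(range(low, high + 1))
        if chosen:
            candidates &= in_range   # the visitor picked this range
        else:
            candidates -= in_range   # the visitor did not pick this range
    return sorted(candidates)

# The four ranges from the example, as answered by someone who is 25.
observations = [
    (18, 25, True),
    (19, 30, True),
    (24, 36, True),
    (26, 35, False),
]
print(narrow_age(observations))  # [24, 25]
```

Four innocuous-looking questions cut the possibilities down to two ages; one more range with a boundary between 24 and 25 would settle it.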
In fact, your choice of click-through advertising is probably kept along with the rest of your profile. Did you click on the ad for music videos or tools? The next time you're presented with a couple of ads they may be specifically placed to determine your preference in tool or music artist, depending upon which you chose first.

You should also know that the same things apply to the information you fill into the software registration forms on your computer when you add something new. You're asked similar things each time you get an upgrade in some cases and of course when the inevitable happens and you have to re-install everything again.

Against all of these techniques, what can you do? You want to use the services, and in many cases don't mind that they are going to try to sell you things. You just don't want to give away enough that "they" can be more than minorly annoying if you can possibly help it. On the other hand, you also don't want to get caught by the criminal side of the computer revolution either. Information you might actually be comfortable with giving to a company you know and trust might be just the thing an identity thief needs to get a new credit card issued with your name on it. Somewhere you and the businesses and sites you deal with have to strike a balance that both can be comfortable with. The problem is that the guys at the other end of your Internet connection have all the tools and databases.