Digging for Web data

FCW's DotGov Thursday column looks at why agencies need to mine their Web sites for data and the tools available for doing so

My hometown was built on the wealth of what was taken out of the ground. In the early days, the town, the bank and the stores owed their existence to coal. With the help of ever-improving machinery, miners dug great quantities of coal from beneath the hills of West Virginia. It was a tough, dangerous, dirty job, but the miners were among the best-paid workers. The lesson is that only through the hard work of extracting the coal was it possible to fuel the steel mills in Pittsburgh and keep running the trains that delivered the manufactured goods.

Data and information are the new energy sources. In the world of Web sites, it is important to locate and use data to make informed decisions about design, marketing, long-term strategies and information technology purchases. Fortunately, without even getting your fingernails dirty, you can dig for a treasure trove of information from the comfort of your ergonomically correct chair.

One way to dig for data about a Web site is data mining, the process of finding patterns hidden in extremely large tables of data. Depending on your needs, finding those patterns could involve artificial intelligence or statistical analysis software.

To a lesser extent, data mining also makes use of Web site analysis tools. Such tools are limited to providing some facts about the visitors to the site and some data about their activities. Full-scale data mining, however, will provide insight into trends among your site's visitors.

New to data mining? It can be a daunting task. For a start, you may want to answer such questions as:

    * What browsers do users have? What versions do they use?

    * How many unique visitors did our site have this month? How long did they stay?

    * Who is referring to us? What search words did they use to find the Web site?

    * What are the top 10 downloaded documents? What are the least requested files?

    * Did users encounter any out-of-the-ordinary problems? Do we have a lot of 404 (file not found) errors?
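As a rough illustration of how several of these questions can be answered straight from a server's log files, here is a minimal Python sketch. It assumes the server writes the widely used Apache/NCSA "combined" log format; the sample lines, the example.gov host and the helper names (`parse`, `search_terms`, `summarize`) are hypothetical. It counts distinct client addresses as a stand-in for unique visitors (proxies and shared addresses make that only an approximation), tallies raw browser identification strings, and pulls search words from referring URLs that carry a `q` parameter, a convention many search engines use but not all. Time spent on the site would require grouping requests into sessions, which this sketch does not attempt.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlparse

# Apache/NCSA "combined" log format (an assumption about how the server logs):
# host ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse(lines):
    """Yield one dict per line that matches the combined format; skip the rest."""
    for line in lines:
        m = COMBINED.match(line)
        if m:
            yield m.groupdict()


def search_terms(referer):
    """Return the 'q' parameter of a referring URL, if present (assumed convention)."""
    values = parse_qs(urlparse(referer).query).get("q", [])
    return values[0] if values else None


def summarize(lines, top=10):
    records = list(parse(lines))
    # Only successful (2xx) responses count as documents actually delivered.
    delivered = Counter(r["path"] for r in records if r["status"].startswith("2"))
    terms = (search_terms(r["referer"]) for r in records if r["referer"] != "-")
    return {
        # Distinct client addresses: only an approximation of unique visitors.
        "unique_hosts": len({r["host"] for r in records}),
        "browsers": Counter(r["agent"] for r in records).most_common(top),
        "top_referers": Counter(
            r["referer"] for r in records if r["referer"] != "-"
        ).most_common(top),
        "search_terms": Counter(t for t in terms if t).most_common(top),
        "top_documents": delivered.most_common(top),
        "least_requested": sorted(delivered.items(), key=lambda kv: kv[1])[:top],
        "not_found": Counter(
            r["path"] for r in records if r["status"] == "404"
        ).most_common(top),
    }


# Hypothetical sample lines for illustration only.
SAMPLE = [
    '10.0.0.1 - - [01/Mar/2001:10:00:00 -0500] "GET /index.html HTTP/1.0" 200 5120 '
    '"http://www.google.com/search?q=grant+forms" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"',
    '10.0.0.2 - - [01/Mar/2001:10:05:00 -0500] "GET /forms/sf424.pdf HTTP/1.0" 200 80211 '
    '"http://www.example.gov/index.html" "Mozilla/4.76 [en] (X11; U; Linux 2.2.16 i686)"',
    '10.0.0.1 - - [01/Mar/2001:10:06:00 -0500] "GET /forms/sf424.pdf HTTP/1.0" 200 80211 '
    '"http://www.example.gov/index.html" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"',
    '10.0.0.3 - - [01/Mar/2001:10:07:00 -0500] "GET /old/page.html HTTP/1.0" 404 210 '
    '"-" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 95)"',
]

if __name__ == "__main__":
    for question, answer in summarize(SAMPLE).items():
        print(question, answer)
```

A packaged analysis tool does the same kind of counting at scale and adds reporting; the point of the sketch is only that each question maps to a simple tally over fields already present in the log.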

Simple tools may be best when just starting out. Before jumping into data mining, try analyzing the data in your Web server's log files. This assumes that you have graduated beyond using a hit counter and that your site employs some sort of Web analysis tool, such as WebTrends Corp. software.

Web site analysis tools can help with marketing and design strategies as well as guide your Web design team. The team needs to know whether links are dead ends and to be able to answer the questions above. By knowing whether users are encountering an unusual number of problems (as revealed by error messages in the logs), the Web team will be able to distinguish between activity caused by miscoded links and activity that may be a sign of hacking. The informed designer also will know whether users' paths through the site are more convoluted than they need to be.
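The distinction between miscoded links and possible hacking can be approximated with a simple heuristic, sketched below in Python under the same assumption that the server writes the combined log format. A 404 whose referring page is on your own site points to a broken link the team can fix; a single client that requests many different missing paths looks more like someone probing for weaknesses. The host name `www.example.gov`, the threshold of five distinct missing paths, the sample lines and the function name `classify_404s` are all hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Apache/NCSA "combined" log format (assumed).
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

OUR_HOST = "www.example.gov"  # hypothetical: replace with your site's host name
PROBE_THRESHOLD = 5           # hypothetical: distinct missing paths before a client is flagged


def classify_404s(lines, our_host=OUR_HOST, threshold=PROBE_THRESHOLD):
    """Split 404s into broken internal links and clients that look like probes."""
    broken_links = defaultdict(set)    # missing path -> our pages that link to it
    misses_by_host = defaultdict(set)  # client address -> distinct missing paths
    for line in lines:
        m = COMBINED.match(line)
        if not m or m["status"] != "404":
            continue
        referer = urlparse(m["referer"])
        if referer.hostname == our_host:
            # The visitor followed a link on our own site: likely a miscoded link.
            broken_links[m["path"]].add(referer.path)
        misses_by_host[m["host"]].add(m["path"])
    suspicious = {
        host: sorted(paths)
        for host, paths in misses_by_host.items()
        if len(paths) >= threshold
    }
    return {path: sorted(pages) for path, pages in broken_links.items()}, suspicious


# Hypothetical sample: one broken internal link, one client probing for scripts.
SAMPLE = [
    '10.0.0.5 - - [02/Mar/2001:09:00:00 -0500] "GET /forms/sf425.pdf HTTP/1.0" 404 210 '
    '"http://www.example.gov/forms/index.html" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"',
] + [
    f'192.0.2.9 - - [02/Mar/2001:03:00:0{i} -0500] "GET {p} HTTP/1.0" 404 210 "-" "-"'
    for i, p in enumerate(["/cgi-bin/phf", "/scripts/root.exe", "/_vti_bin/shtml.exe",
                           "/msadc/msadcs.dll", "/admin/"])
]

if __name__ == "__main__":
    broken, suspicious = classify_404s(SAMPLE)
    print("broken links:", broken)
    print("possible probes:", suspicious)
```

Flagged addresses are only leads for a person to review: a site that has just moved a batch of pages can produce a similar pattern from legitimate visitors with old bookmarks.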

Knowledge can arise from careful study of your log files' data. That information can then be used to map out a strategic plan for more sophisticated data mining activities.

Ask all the individuals (in a small organization) or departmental representatives (in larger agencies) to comment on the strategic plan and to ensure that it meshes with the overall mission and with any existing business or marketing plans. Doing so will ensure that all stakeholders are accounted for in the plan and that limited resources are put to the best possible use. It can also help you decide whether log analysis suits your needs or whether you need to invest in data mining software.

The more customer-driven your site (that is, if it performs e-commerce or delivers e-services), the more likely you will want to do a comprehensive analysis of your Web server logs and then transition into studying customer transactions and trends.

— Tang, a member of the Federal Web Business Council, is an associate in the Information Technology Group at Caliber Associates, Fairfax, Va.
