Web site sleuths
Monitoring software helps agencies keep tabs on Web sites and more
- By Maggie Biggs
- Mar 14, 2005
As federal agencies’ Web sites become a more integral part of government, information technology managers responsible for maintaining them might want to consider some of the latest monitoring software.
Aside from helping ensure proper adherence to agency-defined service-level agreements and adequate capacity planning, Web site-monitoring tools help manage the various technologies that support the site.
Today's Web sites are often complex and span several platforms. They may have graphical user interface (GUI) components on one platform using one Web technology, such as an Apache server running on Linux. The business logic that supports the Web site may be running on another platform, such as IBM’s WebSphere middleware and the IBM AIX operating system. The databases on the back end are likely to be just as varied and may include Oracle, IBM DB2 and Microsoft SQL Server.
Rather than purchase tools from multiple vendors to monitor the varied technologies and platforms that constitute a Web site, users can buy solutions that help reduce downtime by quickly pinpointing any component that is having a problem.
Gathering statistics about Web site use and performance is also important for ensuring adequate capacity planning over time. For example, you might review the statistics gathered for three months and find that the business logic layer experiences regular morning performance slowdowns. This might lead you to add more application servers to your middle tier.
Or you might notice how quickly disk storage utilization grew during the three-month period. With this information, you can more easily project your storage requirements for a given budgetary cycle.
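The storage projection described above amounts to fitting a trend line to utilization samples. The following Python sketch shows one way to do that, assuming evenly spaced samples (weekly, for instance) and roughly linear growth. The function names and sample figures are invented for illustration and are not drawn from any of the products reviewed.

```python
from statistics import mean

def linear_trend(samples):
    """Return the least-squares (slope, intercept) for evenly spaced samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(samples)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    return slope, y_bar - slope * x_bar

def periods_until_full(samples, capacity):
    """Estimate how many more sampling periods remain before capacity is reached.

    Returns None when utilization is flat or shrinking.
    """
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    full_index = (capacity - intercept) / slope
    return max(0.0, full_index - (len(samples) - 1))

# Hypothetical weekly disk usage in gigabytes on a 200 GB volume.
weekly_gb = [100, 110, 120, 130]
print(periods_until_full(weekly_gb, capacity=200))
```

A straight-line fit is the simplest reasonable choice here. If growth is seasonal or accelerating, a different model would give a better estimate.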
With these factors in mind, we recently examined three monitoring solutions: BMC Software’s Patrol Express, Empirix’s OneSight and Nimsoft’s Nimbus. We found all three to be well matched in terms of what they can monitor, the depth of their reporting, ease of administration and integration with service-level agreements.
The three solutions differ in price and in how they monitor: with an agent-based approach, an agentless approach or both.
You might consider an agent-based approach when you need to collect detailed information about a particular platform or technology. However, agentless approaches require less setup.
Although all three solutions can monitor across a wide array of infrastructure components, the products themselves are based on the Windows platform.
Therefore, if you want to deploy monitoring tools on another platform, you’ll need to explore the nearly 60 monitoring solutions available on the market.
BMC’s Patrol Express is a Web portal-based solution that can be installed inside the agency or subscribed to as an external service. For our evaluation, we installed the product locally as opposed to test-driving the service.
The installation of the BMC solution took a bit longer than that of the other solutions we tested, but it was not problematic.
Within BMC’s portal, called the Service Integration Portal, administrators configure one or more Remote Service Monitors (RSM). The function of the RSM is to pass data from enterprise assets, such as the Web server being monitored, back to the BMC portal. In addition, the RSMs forward alert and warning information back to the portal.
There are two types of RSMs: dedicated and shared. We used the browser-based interface to click through and quickly create dedicated and shared RSMs. You might use a dedicated RSM to monitor a single Web site, for example. By contrast, you might use a shared RSM to watch several internal Web applications.
Depending on their role, users can be authorized to see one or more of the RSMs. This can be particularly useful if the owner of a group of applications wants to monitor only those applications.
Once we installed the RSMs in the test environment, we used the Add Element option to select the items we wanted to monitor. We added monitors to track individual URLs and multiple Web pages, and specified our service-level expectations, such as response time. We also added some infrastructure monitors to keep tabs on our server resources.
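A URL monitor with a response-time expectation can be reduced to a timed request plus a comparison. The Python sketch below illustrates the idea in general terms. It is not how Patrol Express implements its monitors, and the `check_url` function, its `fetch` parameter and the result fields are all assumptions made for the example.

```python
import time

def check_url(url, fetch, max_seconds):
    """Time one request and compare it with a response-time expectation.

    `fetch` is any callable that takes a URL and returns an HTTP status code.
    Passing it in keeps the check independent of a particular HTTP library.
    """
    started = time.perf_counter()
    try:
        status = fetch(url)
    except Exception as exc:  # treat any failure to fetch as "down"
        return {"url": url, "ok": False, "reason": f"error: {exc}"}
    elapsed = time.perf_counter() - started
    if status != 200:
        return {"url": url, "ok": False, "reason": f"status {status}"}
    if elapsed > max_seconds:
        return {"url": url, "ok": False, "reason": f"slow: {elapsed:.3f}s"}
    return {"url": url, "ok": True, "reason": f"{elapsed:.3f}s"}

# Usage with a stand-in fetcher; example.gov is a placeholder address.
result = check_url("http://www.example.gov/", lambda url: 200, max_seconds=1.0)
print(result["ok"], result["reason"])
```

For a real check, `fetch` could wrap the standard library, for example `lambda url: urllib.request.urlopen(url, timeout=10).status`.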
After adding the items to monitor, we generated traffic against the Web sites and began to monitor for alerts. We were able to drill down into the events and see when they were resolved and whether alerts were sent.
Next, we switched to the Service Measures tab to view overall trends and reporting. We were able to examine trend data, such as the percentage of time with no critical alarms and how often the service-level goals had been reached.
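Trend figures of this kind are simple ratios over the polling intervals. The Python sketch below shows one way they could be computed, assuming each interval records a response time and whether a critical alarm was active. The data layout and the `service_measures` name are hypothetical and do not reflect BMC's internal format.

```python
def service_measures(intervals, response_goal_seconds):
    """Summarize polling intervals into two trend percentages.

    `intervals` is a list of (response_seconds, critical_alarm_active) pairs.
    """
    total = len(intervals)
    if total == 0:
        raise ValueError("no intervals to summarize")
    alarm_free = sum(1 for _, critical in intervals if not critical)
    goal_met = sum(1 for seconds, _ in intervals
                   if seconds <= response_goal_seconds)
    return {
        "pct_no_critical_alarms": 100.0 * alarm_free / total,
        "pct_goal_met": 100.0 * goal_met / total,
    }

# Four invented polling intervals measured against a one-second goal.
polled = [(0.4, False), (1.2, False), (1.6, True), (0.7, False)]
print(service_measures(polled, response_goal_seconds=1.0))
```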
BMC offers several options for reporting. Report data can be e-mailed or exported in comma-delimited or text formats. After exporting some reports, we were able to easily import the data into a set of spreadsheets.
As with the other two solutions, BMC supports role-based access to the portal. An administrator with full access can interact with any portion of the product while other, more limited roles, such as read-only, are also available. Users can request an account via a form that can be acted on by the administrator. Larger agencies will most likely want to use the subscription approach to save time.
Like its competitors, Empirix’s OneSight is a browser-based solution. It installed easily and without incident during our tests.
Like Patrol Express, OneSight supports monitoring across a wide array of agency assets. OneSight makes use of three types of monitors: dedicated, profile and remote.
The dedicated monitors watch specific front-end components. For example, you might have a monitor ping a front-end component and report the results. You can also check file sizes, event logs and CPU processes.
Profile monitors gather statistics on various back-end resources including databases and servers.
Finally, remote agents are optional and you can install them on a variety of server platforms. They are useful if you want to capture remote logging on a specific machine or if you want to execute a script in response to an alert.
A data-collector module gathers the metrics from the various types of monitors and forwards the information to the OneSight database.
As with the BMC product, the Empirix solution uses monitor groups to organize various items you may wish to monitor at your agency. For example, we created two monitoring groups: one to watch intranet applications and a second to monitor some public-facing Web sites.
We created them easily by clicking on Add Group under the Configure portion of the interface. After defining the groups, we began to add different monitors within each group.
We deployed monitors to check on the Web GUI layer, the business logic layer — BEA Systems’ WebLogic — and the database layer — Oracle — of our intranet applications. We also added similar monitoring for our Internet-based Web sites. And we used ping tests and a remote agent on one of our Sun Microsystems Solaris machines to capture remote logging.
For each of the monitors we set up, we clicked on the alerts tab and specified what action to take when a specific event occurred. For example, we added an action to send an e-mail message to a pager should a ping test not receive a response. The OneSight documentation was detailed, which made it easy to know what parameters needed to be entered for each type of monitor.
Clicking the SLA status link, we were able to add and modify several service-level objectives. For example, we set thresholds for the Web GUI, business logic and database layers of our intranet sites to adhere to a subsecond response time for each one.
As with Patrol Express, we generated network traffic across several Web sites to stress the test infrastructure and generate alerts. As the load was running, we switched to the status view to see what was going on.
In the status view, we could see red, yellow and green icons that represented the current status of each monitor during a given interval. For one of our monitors, we specified that an alert should be generated only if a ping test failed three times in a 15-minute interval. We took down a Web interface for more than 15 minutes, and OneSight caught the error and alerted us correctly.
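A rule such as "alert only after three failures in 15 minutes" is a sliding-window count. The Python sketch below is one minimal way to express it. It assumes the window is a rolling 15 minutes measured back from the latest check, which is an interpretation and not a documented OneSight behavior, and the class name is invented.

```python
from collections import deque

class FailureWindowAlert:
    """Fire an alert when enough failures fall inside a rolling time window."""

    def __init__(self, max_failures=3, window_seconds=15 * 60):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = deque()  # timestamps of recent failed checks

    def record(self, timestamp, succeeded):
        """Record one check result; return True if an alert should fire now."""
        if not succeeded:
            self._failures.append(timestamp)
        # Drop failures that have aged out of the window.
        cutoff = timestamp - self.window_seconds
        while self._failures and self._failures[0] <= cutoff:
            self._failures.popleft()
        return (not succeeded) and len(self._failures) >= self.max_failures

# Ping results every five minutes (timestamps in seconds); all three fail.
rule = FailureWindowAlert()
for t in (0, 300, 600):
    fired = rule.record(t, succeeded=False)
print(fired)
```

Requiring several failures before alerting trades a slower first notification for fewer false alarms from a single dropped ping.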
OneSight’s reporting capabilities are first-rate. A large number of reports, such as availability and adherence to service-level agreements, were available via the Reports link. In addition, authorized users can create customized reports by filtering the report data.
Reports can be executed ad hoc or on a scheduled basis. We were able to create reports for our various applications and e-mail them as an attachment to a group of test users.
Administrators will find it easy to manage users and groups of users. The Configure utility makes it easy to create monitoring groups beyond the default group that comes with OneSight, and user accounts can be set up with varying degrees of authority.
The Nimbus solution, which is also Web-based, uses a messaging bus as a conduit to capture runtime metrics for all assets being monitored. Captured metrics are stored in the database and surfaced through dashboards and reports.
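The general idea of a messaging bus is that metric producers publish to named subjects and any number of consumers subscribe to them. The Python sketch below shows that publish/subscribe pattern in its simplest in-process form. It is not Nimbus's actual protocol or API, and the subject names and message fields are made up for the example.

```python
class MessageBus:
    """A tiny in-process publish/subscribe bus."""

    def __init__(self):
        self._subscribers = {}  # subject -> list of handler callables

    def subscribe(self, subject, handler):
        self._subscribers.setdefault(subject, []).append(handler)

    def publish(self, subject, message):
        # Deliver the message to every handler registered for the subject.
        for handler in self._subscribers.get(subject, ()):
            handler(message)

bus = MessageBus()
database = []    # stands in for the metrics database
dashboard = {}   # stands in for a live dashboard view

bus.subscribe("qos.cpu", database.append)
bus.subscribe("qos.cpu", lambda m: dashboard.update({m["host"]: m["value"]}))

bus.publish("qos.cpu", {"host": "web01", "value": 42.0})
print(len(database), dashboard)
```

The point of the pattern is that the probe publishing a metric does not need to know whether a database, a dashboard or both are listening.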
Nimbus uses robots and probes to gather data. A robot must be installed on a given server platform before any probes or monitors can be deployed. An administrator can install the robot directly on a given server or set it up on various servers using an enterprise deployment tool.
Once the robots are in place, the probes can be defined through the administrative console. Nimbus supports a variety of monitors, including those for Web servers, application servers, databases and server health, which looks at the CPU, memory and disk.
We had no trouble installing the Nimsoft solution. Afterward, we clicked on the Nimbus server icon, which launched a Web browser. We then clicked on the client installation icon, which opened a second browser window displaying links to install the three client components.
The first client component is Nimbus Manager, which is the solution’s administrative client. Nimbus Manager is used to create user accounts, configure monitoring and alert thresholds, and manage Nimbus itself.
The second client interface is the Enterprise Console, which is used to view the status of the monitored environment and any alarms that might be generated. You can also build custom dashboards using the Enterprise Console.
The third client component is the Service Level Manager. It is used to define service-level agreement parameters as needed to support agency requirements.
Once we installed the components, we deployed robots to each of the server platforms within the test environment. Then, we launched Nimbus Manager and began defining users and monitors. Within Nimbus Manager, we defined an overarching domain for our monitoring and then two hubs — one to monitor intranet sites and a second to monitor public-facing Web sites. We then dragged and dropped probes from the archive into our new monitoring hubs.
For each probe we added to a hub, we defined thresholds and set parameters, as required by the type of probe. For example, we set up probes to monitor our server’s CPU, memory and disk usage, and we specified that CPU usage should not go higher than 70 percent.
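Threshold checks like the 70 percent CPU limit come down to comparing each reading with its configured ceiling. A minimal Python sketch follows. Only the CPU limit comes from our test setup, the memory and disk limits are invented, and none of the names reflect Nimbus's own configuration format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Threshold:
    metric: str
    max_percent: float

def breaches(readings, thresholds):
    """Return (metric, value, limit) for every reading above its limit."""
    return [
        (t.metric, readings[t.metric], t.max_percent)
        for t in thresholds
        if t.metric in readings and readings[t.metric] > t.max_percent
    ]

limits = [
    Threshold("cpu", 70.0),     # limit used in our test
    Threshold("memory", 80.0),  # hypothetical
    Threshold("disk", 90.0),    # hypothetical
]
sample = {"cpu": 82.5, "memory": 40.0, "disk": 91.0}
print(breaches(sample, limits))
```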
We continued dragging and dropping monitors into our hubs until we accounted for all our test assets. We then moved to the Service Level Manager to begin defining our SLA parameters.
To define our SLAs, we supplied the hours of operation, named the agreements and provided the compliance percentage. We then used the tabs within the defined agreements to set the quality-of-service parameters, alert notifications and hours when the service level would be excluded from monitoring.
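The SLA parameters described above (hours of operation, a compliance percentage and excluded periods) combine into a single calculation: count only the samples that fall inside operating hours and outside any exclusion, then compare the share that met the goal with the target. The Python sketch below works under those assumptions. The function name, the sample data and the 99 percent target are illustrative and not taken from the Service Level Manager.

```python
from datetime import datetime, time

def sla_compliance(samples, open_at, close_at, exclusions=()):
    """Percentage of in-scope samples that met the service-level goal.

    `samples` is an iterable of (datetime, met_goal) pairs. Samples outside
    the daily operating hours or inside an exclusion period are ignored.
    Returns None if no sample is in scope.
    """
    counted = met = 0
    for moment, ok in samples:
        if not (open_at <= moment.time() < close_at):
            continue
        if any(start <= moment < end for start, end in exclusions):
            continue
        counted += 1
        met += bool(ok)
    return None if counted == 0 else 100.0 * met / counted

day = lambda h, m=0: datetime(2005, 3, 14, h, m)
samples = [
    (day(7), False),   # before operating hours, ignored
    (day(9), True),
    (day(10), False),
    (day(12), False),  # inside the exclusion period, ignored
    (day(15), True),
    (day(16), True),
]
maintenance = [(day(11, 30), day(12, 30))]
achieved = sla_compliance(samples, time(8), time(18), maintenance)
print(achieved, achieved >= 99.0)
```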
Within the Service Level Manager, authorized users can also view service-level reporting to see how monitored assets are performing. This type of reporting is especially useful for identifying problematic areas within the enterprise and for executing capacity planning.
Moving to the Enterprise Console, we clicked on the design icon within the interface and started designing two dashboards — one for internal monitoring and a second for external monitoring. We quickly dragged and dropped monitors onto the dashboard and set properties, alert notification settings, a background and more.
After exiting the design mode, we again generated network traffic against our Web sites. We then viewed the results in the two dashboards we created. We again took a server off-line on purpose during one of our simulations, and Nimbus correctly identified the problem and issued an alert.
As with the other two solutions, we had no difficulties in executing any of the administrative tasks with Nimbus, such as setting up users and privileges.
Each of the solutions we tested is up to the task of monitoring agency assets — including Web sites — in a mixed-platform, multitechnology setting. They all offer strong features, solid execution of monitoring, useful reporting and ease of use.
The only major differences among them are the price and the methodology used to gather data. Your agency should conduct a proof-of-concept project of any or all of these solutions if administrators decide to take a proactive stance on monitoring.
Biggs is a senior engineer and freelance technical writer based in Northern California.