my recent reads..

Atomic Accidents: A History of Nuclear Meltdowns and Disasters; From the Ozark Mountains to Fukushima
Power Sources and Supplies: World Class Designs
Red Storm Rising
Locked On
Analog Circuits Cookbook
The Teeth Of The Tiger
Sharpe's Gold
Without Remorse
Practical Oscillator Handbook
Red Rabbit

Tuesday, May 22, 2007

Monitoring log files on Windows with Grid Control

The Oracle Grid Control agent for Windows (10.2.0.2) is missing the ability to monitor arbitrary log files. This was brought up recently in the OTN Forums. The problem seems to have been identified by Oracle earlier this year (Bug 6011228) with a fix coming in a future release.

So what to do in the meantime? Creating a user defined metric is one approach, but has its limitations.

I couldn't help thinking that the support already provided for log file monitoring in Linux must already be 80% of what's required to run under Windows. A little digging around confirmed that. What I'm going to share today is a little hack to enable log file monitoring for a Windows agent. First the disclaimers: the info here is purely from my own investigation; changes you make are probably unsupported; try it at your own risk; backup any files before you modify them etc etc!!

Now the correct way to get your log file monitoring working would be to request a backport of the fix from Oracle. But if you are brave enough to hack this yourself, read on...

First, let me describe the setup I'm testing with. I have a Windows 10.2.0.2 agent talking to a Linux 10.2.0.2 Management Server. Before you begin any customisation, make sure the standard agent is installed and operating correctly. Go to the host home page and click on the "Metric and Policy Settings" link - you should not see a "Log File Pattern Matched Line Count" metric listed (if you do, then you are using an installation that has already been fixed).

To get the log file monitoring working, there are basically 5 steps:

  1. In the Windows agent deployment, add a <Metric NAME="LogFileMonitoring" TYPE="TABLE"> element to $AGENT_HOME\sysman\admin\metadata\host.xml

  2. In the Windows agent deployment, add a <CollectionItem NAME="LogFileMonitoring"> element to $AGENT_HOME\sysman\admin\default_collection\host.xml

  3. Fix a bug in $AGENT_HOME\sysman\admin\scripts\parse-log1.pl

  4. Reload/restart the agent

  5. In the OEM console, configure a rule and test it


Once you have done that, you'll be able monitor log files like you can with agents running on other host operating systems, and see errors reported in Grid Control like this:


So let's quickly cover the configuration steps.

Configuring metadata\host.xml
Insert the following in $AGENT_HOME\sysman\admin\metadata\host.xml on the Windows host. NB: this is actually copied this from the corresponding host.xml file used in a Linux agent deployment.
<Metric NAME="LogFileMonitoring" TYPE="TABLE">
<ValidMidTierVersions START_VER="10.2.0.0.0" />
<ValidIf>
<CategoryProp NAME="OS" CHOICES="Windows"/>
</ValidIf>
<Display>
<Label NLSID="log_file_monitoring">Log File Monitoring</Label>
</Display>
<TableDescriptor>
<ColumnDescriptor NAME="log_file_name" TYPE="STRING" IS_KEY="TRUE">
<Display>
<Label NLSID="host_log_file_name">Log File Name</Label>
</Display>
</ColumnDescriptor>
<ColumnDescriptor NAME="log_file_match_pattern" TYPE="STRING" IS_KEY="TRUE">
<Display>
<Label NLSID="host_log_file_match_pattern">Match Pattern in Perl</Label>
</Display>
</ColumnDescriptor>
<ColumnDescriptor NAME="log_file_ignore_pattern" TYPE="STRING" IS_KEY="TRUE">
<Display>
<Label NLSID="host_log_file_ignore_pattern">Ignore Pattern in Perl</Label>
</Display>
</ColumnDescriptor>
<ColumnDescriptor NAME="timestamp" TYPE="STRING" RENDERABLE="FALSE" IS_KEY="TRUE">
<Display>
<Label NLSID="host_time_stamp">Time Stamp</Label>
</Display>
</ColumnDescriptor>
<ColumnDescriptor NAME="log_file_match_count" TYPE="NUMBER" IS_KEY="FALSE" STATELESS_ALERTS="TRUE">
<Display>
<Label NLSID="host_log_file_match_count">Log File Pattern Matched Line Count</Label>
</Display>
</ColumnDescriptor>
<ColumnDescriptor NAME="log_file_message" TYPE="STRING" IS_KEY="FALSE" IS_LONG_TEXT="TRUE">
<Display>
<Label NLSID="host_log_file_message">Log File Pattern Matched Content</Label>
</Display>
</ColumnDescriptor>
</TableDescriptor>
<QueryDescriptor FETCHLET_ID="OSLineToken">
<Property NAME="scriptsDir" SCOPE="SYSTEMGLOBAL">scriptsDir</Property>
<Property NAME="perlBin" SCOPE="SYSTEMGLOBAL">perlBin</Property>
<Property NAME="command" SCOPE="GLOBAL">%perlBin%/perl</Property>
<Property NAME="script" SCOPE="GLOBAL">%scriptsDir%/parse-log1.pl</Property>
<Property NAME="startsWith" SCOPE="GLOBAL">em_result=</Property>
<Property NAME="delimiter" SCOPE="GLOBAL">|</Property>
<Property NAME="ENVEM_TARGET_GUID" SCOPE="INSTANCE">GUID</Property>
<Property NAME="NEED_CONDITION_CONTEXT" SCOPE="GLOBAL">TRUE</Property>
<Property NAME="warningStartsWith" SCOPE="GLOBAL">em_warning=</Property>
</QueryDescriptor>
</Metric>

In the top TargetMetadata, also increment the META_VER attribute (in my case, changed from "3.0" to "3.1").

Configuring default_collection\host.xml
Insert the following in $AGENT_HOME\sysman\admin\default_collection\host.xml on the Windows host. NB: this is actually copied this from the corresponding host.xml file used in a Linux agent deployment.
<CollectionItem NAME="LogFileMonitoring">
<Schedule>
<IntervalSchedule INTERVAL="15" TIME_UNIT = "Min"/>
</Schedule>
<MetricColl NAME="LogFileMonitoring">
<Condition COLUMN_NAME="log_file_match_count"
WARNING="0" CRITICAL="NotDefined" OPERATOR="GT"
NO_CLEAR_ON_NULL="TRUE"
MESSAGE="%log_file_message%. %log_file_match_count% crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold."
MESSAGE_NLSID="host_log_file_match_count_cond" />
</MetricColl>
</CollectionItem>

A bug in parse-log1.pl?
This may not be an issue in your deployment, but in mine I discovered that the script had a minor issue due to an unguarded use of the Perl symlink function (a feature not supported on Windows of course).

The original code around line 796 in $AGENT_HOME\sysman\admin\scripts\parse-log1.pl was:
...
my $file2 = "$file1".".ln";
symlink $file1, $file2 if (! -e $file2);
return 0 if (! -e $file2);
my $signature2 = getSignature($file2);
...

This I changed to:
...
my $file2 = "$file1".".ln";
return 0 if (! eval { symlink("",""); 1 } );
symlink $file1, $file2 if (! -e $file2);
return 0 if (! -e $file2);
my $signature2 = getSignature($file2);
...

Reload/restart the agent
After you've made the changes, restart your agent using the windows "services" control panel or "emctl reload agent" from the command line. Check the management console to make sure agent uploads have resumed properly, and then you should be ready to configure and test log file monitoring.

1 comment:

Unknown said...

Hi,

I strongly appreciate your post........... High quality Hyper-V Servers with 100% dedicated resources.............

Thanks,
VPS Hosting