In this post I’m discussing the options SCOM provides for event detection using monitors.
I’ve written a similar post about service monitoring, which you can see here:
SCOM BASIC SERVICE MONITOR VS. WINDOWS SERVICE TEMPLATE
Alright, so just go to Authoring -> expand Management Pack Objects -> Monitors -> Create a Monitor -> Unit Monitor. This is the screen you should see:
The options enclosed in the box are what we’re concerned with here. So let’s go through them, one by one. The three “Reset” options, “Manual Reset”, “Timer Reset” and “Windows Event Reset”, exist for every monitor type (even though I’ve expanded only the first 2 in the pic above).
- Manual Reset: Choose this option when you want the alert to stay in the console until you close/resolve it manually.
- Timer Reset: Choose this option when you want the alert to close itself automatically after a given period of time.
- Windows Event Reset: With this option you can choose to automatically close the alert only when a second healthy event is detected in a given time period. So, one bad event raises the alert, and the second good event resolves it. If the healthy event is not detected in the given time, the alert stays in the console until you close it manually.
Simple Event Detection:
This is the option that you may know best. It’s the simplest and does exactly what the name suggests – it simply detects the occurrence of an event in the specified Event Log and raises an alert.
Examples:
Manual Reset –
Now that we have the monitor set up, let’s test it.
We’ll create a custom event with PowerShell and try to detect that. Here’s a simple snippet:
```powershell
# Create a custom source
New-EventLog -LogName Application -Source "Custom"

# Write the event
Write-EventLog -LogName Application -Source "Custom" -EventId 100 -Message "This is a test event"
```
Just making sure the event was created:
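If you’d rather confirm this from the same PowerShell session, a minimal sketch could look like the one below. It assumes Windows PowerShell 5.1 (where `Get-EventLog` is available) and relies on the instance ID matching the event ID, which should hold for a simple event written with `Write-EventLog`. The variable name `$testEvent` is just for illustration.

```powershell
# Look up the newest event 100 written by the "Custom" source.
# -ErrorAction SilentlyContinue keeps the lookup quiet if nothing matches yet.
$testEvent = Get-EventLog -LogName Application -Source 'Custom' -InstanceId 100 -Newest 1 -ErrorAction SilentlyContinue

if ($testEvent) {
    # Show the fields that matter for the monitor's event expression.
    $testEvent | Format-List TimeGenerated, Source, EventID, Message
} else {
    Write-Warning 'Event 100 from source "Custom" was not found in the Application log.'
}
```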
Right, looks good. Now onto the Ops Console:
As we can see, the alert has been raised. The alert will be resolved when the monitor producing it returns to healthy. Since this is a manual reset monitor, it’ll only turn healthy again when you manually reset it.
There’s a good side to this and a bad one.
Good side:
You will always notice when the alert has been raised, and you can take whatever action is appropriate. Once you’re done, resetting the monitor confirms that someone has acted on it.
Bad side:
Until you manually reset the monitor, there won’t be a new alert. As the monitor is already critical, it can’t turn critical again and so won’t generate a new alert. It’ll only increase the repeat count, which may or may not be what you want. The workaround is to run a scheduled script that periodically resets the monitors back to healthy, making way for a new alert.
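As a rough idea of what such a reset script could look like, here is a sketch rather than a tested production script. It assumes the `OperationsManager` module is installed (it comes with the SCOM console) and that the session can connect to your management group. The function name `Reset-ManualMonitor` and its parameter are hypothetical names I made up for this example.

```powershell
function Reset-ManualMonitor {
    param(
        [Parameter(Mandatory = $true)]
        [string]$MonitorDisplayName
    )

    # Assumes the OperationsManager module (installed with the SCOM console) is present.
    Import-Module OperationsManager -ErrorAction Stop

    $monitor = Get-SCOMMonitor -DisplayName $MonitorDisplayName
    if (-not $monitor) {
        throw "No monitor found with display name '$MonitorDisplayName'."
    }

    # Find the class the monitor targets, then every instance of that class.
    $targetClass = Get-SCOMClass -Id $monitor.Target.Id
    $instances = Get-SCOMClassInstance -Class $targetClass

    foreach ($instance in $instances) {
        # HealthState is the object's overall health; skip objects that are fully healthy.
        if ($instance.HealthState -ne 'Success') {
            $null = $instance.ResetMonitoringState($monitor)
        }
    }
}
```

You could then call `Reset-ManualMonitor -MonitorDisplayName '<your monitor name>'` from a scheduled task. Note that the filter looks at the overall health of each object, so it may also reset the monitor on objects that are unhealthy for other reasons.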
Timer Reset –
The only extra option you have here is to specify the wait time before the reset. I’ve created this monitor to detect event 101 in the Application log.
With tests similar to the previous one, I get an alert for this.
You will have to take my word for it, the alert disappeared after 15 minutes 😉
Windows Event Reset –
Pay attention to the Wizard options here. You have to configure 2 event expressions, one for the unhealthy state and the other for the healthy state. I set up the unhealthy event as event 102 with source “custom” in the Application log, while the healthy event is event 102 with source “custom1”.
Unhealthy event:
Healthy event:
As soon as I created the unhealthy event, I received an alert which was automatically resolved when I triggered the healthy event.
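The two test events for this monitor could be written along these lines. This is a sketch that assumes an elevated Windows PowerShell 5.1 session; the 30-second pause and the message texts are arbitrary choices of mine, only there to make the state change easy to spot in the console.

```powershell
# Make sure both sources are registered against the Application log (needs elevation).
foreach ($source in 'custom', 'custom1') {
    if (-not [System.Diagnostics.EventLog]::SourceExists($source)) {
        New-EventLog -LogName Application -Source $source
    }
}

# Unhealthy event: should turn the monitor critical and raise the alert.
Write-EventLog -LogName Application -Source 'custom' -EventId 102 -Message 'Test: unhealthy event'

# Arbitrary pause so the alert has time to show up in the console.
Start-Sleep -Seconds 30

# Healthy event: should reset the monitor and resolve the alert.
Write-EventLog -LogName Application -Source 'custom1' -EventId 102 -Message 'Test: healthy event'
```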
Repeated Event Detection:
Choose this monitor when you want to raise an alert if a specific event is raised repeatedly, according to settings you define. Here’s where things get a little tricky.
You have a bunch of different (and confusing) options to set up here. Luckily, it’s all very well documented on TechNet: Repeating Events
What I’m doing here is configuring the monitor to raise an alert when event 103 is raised 3 times within 15 seconds. And sure enough, I do get an alert.
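To hit the “3 times within 15 seconds” condition reliably, a small loop helps. The sketch below assumes an elevated Windows PowerShell 5.1 session; the 2-second gap is my own choice and simply keeps all three events well inside the 15-second window.

```powershell
# Register the source if it is not there yet (needs elevation).
if (-not [System.Diagnostics.EventLog]::SourceExists('custom')) {
    New-EventLog -LogName Application -Source 'custom'
}

# Write event 103 three times, 2 seconds apart.
1..3 | ForEach-Object {
    Write-EventLog -LogName Application -Source 'custom' -EventId 103 -Message "Test: repeated event, occurrence $_ of 3"
    Start-Sleep -Seconds 2
}
```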
Missing Event Detection:
Choose this monitor when you’re expecting some event to be written to the Event Log – maybe due to some kind of scheduled activity like backup, maintenance, scripted events, etc – within a given time window. If the monitor doesn’t detect it, it generates an alert.
So what I’m basically telling SCOM is, “I’m expecting the event ID 104 from source “custom” in the Application event log every 15 minutes, let me know if it doesn’t show up, will ya? Thanks!”
To test this, I did NOT create an event with ID 104, and sure enough, I got the alert.
(Don’t worry about the mismatch between the alert name and the monitor name; I made a typo in the alert name. It should say “anaops – missing event detection – manual reset” rather than “repeated”, as the monitor name at the bottom shows.)
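If you want to sanity-check what the monitor should be seeing, you can ask the Event Log the same question yourself. This is only an approximation: it looks at the last 15 minutes from “now”, while the monitor keeps its own window, so the two won’t line up exactly. It assumes Windows PowerShell 5.1, and the variable names are just for illustration.

```powershell
# Look for event 104 from source "custom" in roughly the last 15 minutes.
$windowStart = (Get-Date).AddMinutes(-15)
$found = @(Get-EventLog -LogName Application -Source 'custom' -InstanceId 104 -After $windowStart -ErrorAction SilentlyContinue)

if ($found.Count -eq 0) {
    'No event 104 in the last 15 minutes: the monitor should go unhealthy.'
} else {
    "Found $($found.Count) event(s) with ID 104: the monitor should stay healthy."
}
```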
Correlated Event Detection:
Choose this option if you want an alert based on some correlation between two events. “Some correlation” can vary, as you can see in the wizard.
This can be a bit confusing. In this demo, what I’m telling SCOM is, “Hey, let me know if event 105 from source “custom” is raised AND within 5 minutes of its occurrence, event ID 105 from source “custom1” is also raised (in that order). Cool?”
SCOM said “Cool!”, so I tested it by writing the two events above within 5 minutes of each other. And yup, I got an alert.
Correlated Missing Event Detection:
Choose this one when two events are correlated and you want an alert if the first one occurs but the second one, which we’re expecting within 5 minutes, isn’t raised.
For testing this, I created the event 106 from source “custom” in the Application log but did NOT create the other event 106 from source “custom1” within the next 5 minutes. Sure enough, here’s the alert I got:
As you can imagine, the other two monitor reset strategies, “timer reset” and “windows event reset”, will have slightly different wizards, but I’m sure you guys can figure it out 😉
Also, as you may have noticed, unlike many other monitors, there’s no “interval” at which the event detection monitors run. Meaning, they are watching the log for events “all the time”. So the event monitoring you get is almost real-time.
This concludes this fairly long post, but I hope it gives you some clarity about what options you have for event detection monitoring and helps you choose the right one. 🙂
We’ll talk about the event monitoring options with rules in the next post.
Cheers!