OMS/Azure Automation Example – Part 2 – Alerts

The Scenario:

We are monitoring multiple servers using Operations Management Suite (OMS). We are very concerned about the Print Spooler service on our servers (because who isn’t worried about the Print Spooler service?). We want to configure OMS to throw an alert whenever the Print Spooler service stops on a server, and then create an Azure Automation runbook to attempt to restart the service automatically for us.

We will break the solution into multiple steps:

  • Create custom fields in Log Analytics to hold Service Name and Service State – Part 1
  • Create an alert that triggers whenever the Print Spooler enters a “stopped” state
  • Create an Azure Automation runbook that triggers a PowerShell script running on a Hybrid Worker role on premises, to try and restart the service

In this blog post we will look at how to create an alert that triggers when the Print Spooler service enters a “stopped” state.

In Part 1 of this series, we created a query that used our custom fields to return all the instances of the Print Spooler service stopping:

Type=Event WindowsServiceState_CF=stopped WindowsServiceName_CF="Print Spooler"

image

To turn this query into an alert, click the Alert button above the query. This opens the Edit Alert Rule window. The following image shows the alert rule form filled out.

image

First, you need to give the alert a name, in this case Print Spooler Service Has Stopped. As a best practice you should also give the alert a description: This alert is generated when the print spooler service is stopped on an OMS monitored machine.

You can set the severity of the alert: Critical, Warning, or Informational.

The search query for the alert is automatically populated for you, using the query you just created. You can make changes to the query here if you so desire.

Next, you need to specify the Time Window for the alert, and the Alert Frequency. It is important to understand the difference between these two values. The Time Window specifies the time range for the query. The query will only return records that were created within this range of the current time. This value can be anywhere from 5 minutes to 24 hours. For example: if the time window is set to 60 minutes, and the query runs at 1:15 PM, only records created between 12:15 PM and 1:15 PM will be returned.
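The time window arithmetic above can be sketched in a few lines of Python. This is only an illustration of the idea, not how OMS computes it internally; the function name query_window and the calendar date are made up for the example.

```python
from datetime import datetime, timedelta

def query_window(run_time, window_minutes):
    """Return the (start, end) range of record timestamps a run would look at.

    Hypothetical helper for illustration; OMS does this for you.
    """
    return run_time - timedelta(minutes=window_minutes), run_time

# The date is arbitrary; only the time of day matters for the example.
run = datetime(2016, 1, 1, 13, 15)
start, end = query_window(run, 60)
print(start.strftime("%I:%M %p"), "to", end.strftime("%I:%M %p"))
```

With a 60 minute window and a run at 1:15 PM, the range starts at 12:15 PM, which matches the example in the text.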

The Frequency specifies how often the query should run. This can also be any value from 5 minutes to 24 hours. This value should always be less than or equal to the time window. If the frequency is greater than the time window, you run the risk of missing records, because anything logged in the gap between one window ending and the next one starting is never evaluated. For this example, we are setting the Time Window and Frequency to the same value, 5 minutes. This means the query will run every 5 minutes, and will only evaluate the last 5 minutes of records, so essentially it will only evaluate new records.
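To see why a frequency larger than the time window can miss records, here is a rough Python sketch. It assumes runs happen at exact multiples of the frequency and that each run covers the half-open range (run - window, run]; both are simplifications for illustration, and the function names are hypothetical.

```python
def uncovered_minutes(window_minutes, frequency_minutes):
    """Minutes between two consecutive runs that no time window covers."""
    return max(0, frequency_minutes - window_minutes)

def is_evaluated(record_minute, window_minutes, frequency_minutes, horizon=1440):
    """True if some run within `horizon` minutes has the record inside its window.

    Runs are assumed to happen at minute frequency, 2 * frequency, and so on.
    """
    for run in range(frequency_minutes, horizon + 1, frequency_minutes):
        if run - window_minutes < record_minute <= run:
            return True
    return False

print(uncovered_minutes(5, 5))    # window and frequency equal: no gap
print(uncovered_minutes(5, 15))   # 10 minutes of every 15 are never queried
print(is_evaluated(3, 5, 15))     # a record logged at minute 3 falls in the gap
```

With a 5 minute window and a 15 minute frequency, a service that stops at minute 3 is never picked up, because the first run at minute 15 only looks back to minute 10.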

An alert rule is triggered based either on the number of results returned or on a metric measurement. You can consult the documentation for details on this, but for our example, we want to trigger if the number of results returned is greater than zero.
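The "number of results greater than zero" rule boils down to something like the following Python sketch. The records are made-up dictionaries that use the field names from our query; the exact string comparison and the function names are assumptions for illustration, not OMS behavior.

```python
def matches_alert_query(record):
    """Mimic the saved search: Type=Event, state stopped, name Print Spooler."""
    return (
        record.get("Type") == "Event"
        and record.get("WindowsServiceState_CF") == "stopped"
        and record.get("WindowsServiceName_CF") == "Print Spooler"
    )

def should_trigger(records, threshold=0):
    """Trigger when the number of matching records is greater than the threshold."""
    hits = [r for r in records if matches_alert_query(r)]
    return len(hits) > threshold

# Hypothetical records returned within one time window.
sample = [
    {"Type": "Event", "WindowsServiceState_CF": "running", "WindowsServiceName_CF": "Print Spooler"},
    {"Type": "Event", "WindowsServiceState_CF": "stopped", "WindowsServiceName_CF": "Print Spooler"},
]
print(should_trigger(sample))
```

A single matching record in the window is enough to fire the alert, which is what we want for a stopped service.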

For now, we are going to just send an email notification when an alert is triggered.  So we select the Yes button under Email notifications, provide a subject line, and specify who should receive the email.

Finally, we click the Save button to save our changes.  The alert is now active.

I’m going to go to one of my monitored machines and stop the Print Spooler service.  If I go back to the OMS workspace, and go to Log Analytics, and click the Favorites button, I can see that my alert query was saved in a Saved Searches section called Alerts:

image

If I click the query to run it, I can see that a new record has been added for the Print Spooler service stopping:

image

If I wait a few more minutes, I’ll receive an email from OMS:

image

 

Back in my OMS dashboard, I’ve implemented the Alert Management solution.  I see this:

image

It shows me that I’ve thrown one critical alert in the past 24 hours.  I can click the solution to drill into it.

image

Here I can see the details for the Critical and Warning alerts. If I were flowing alerts from SCOM to OMS, I would see the active SCOM alerts here as well.

Under the Critical section, I can click on the Print Spooler alert to pull it up in Log Analytics.

image

 

At this point, we have used our custom fields to create an alert that is thrown whenever a Print Spooler service is stopped. When the alert is thrown, an email is generated notifying the appropriate person.

Our next step will be to create automation that will try to start the service back up once it is stopped. To do that, we will make use of Azure Automation and Hybrid Worker Roles. See you next post!
