SCOM: Monitor the monitor part 1: PowerShell

Recently, during a community event, an engineer asked me why SCOM didn’t notify him when SCOM itself was down.

My first response was very similar to my favorite captain’s response below:

[Image: my favorite captain’s response]

But this actually got me thinking, because the engineer made a good point: to have complete monitoring, you need a separate mechanism in place that monitors the monitoring system itself. Most companies still have a legacy monitoring system that could be used to monitor the SCOM servers, but let’s face it: keeping another monitoring system alive just to monitor the SCOM servers only adds complexity to your environment for little benefit.

That’s why I started building a small independent check with PowerShell. In part 1 of this series I’ll go over how to monitor whether your management servers are still up and running.

To do this, we need a watcher node that is able to ping the management servers. The watcher node can be any machine capable of running PowerShell; it does not need the OperationsManager PowerShell module. This ensures the check runs completely independently of SCOM.

Process used

The graph below shows the process used:

[Diagram: process used to monitor the management servers]

In my environment, I have 2 management servers that are reachable from the watcher node. The first step is to dynamically determine how many management servers are in the environment. To do this, a PowerShell script on a management server generates the input file and refreshes it once a day. I automated this because, let’s face it: if we have to remember to update infile.txt every time we add or remove a management server, we will forget.

The file is stored on the watcher node, so the ping checks can still run when the management servers are down.

Configuration on the Management server

(this is action 1 in the graph above)

To generate the infile containing all the management servers currently in our environment, we need to run the following PowerShell script on a management server:

[xml]
#=====================================================================================================
# AUTHOR:    Dieter Wijckmans
# DATE:        03/12/2014
# Name:        Readms.PS1
# Version:    1.0
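# COMMENT:    This script reads all the management servers in a management group and saves them
#           into a txt file which is used to ping the servers from an external watcher node.
#           This script is scheduled on a management server via scheduled tasks.
#           Make sure to fill in your destination (a path on your watcher node) in the $destination variable.
#
# Usage:    readms.PS1
# Example:
#=====================================================================================================
# Load the SCOM cmdlets (available on the management server)
Import-Module OperationsManager

# Path of the infile on the watcher node, for example a file share
$destination = "fill in the destination on the watchernode here"

# Retrieve all management servers in the management group
$ms = Get-SCOMManagementServer

# Write one display name per line. The file is overwritten on every run,
# so removed management servers drop out of the list automatically.
$ms | Select-Object -ExpandProperty DisplayName | Out-File $destination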

[/xml]
Schedule this script on the management server via scheduled tasks and run it once a day.

The program to run is: powershell.exe c:\scripts\readms.ps1

This will generate the infile for the ping command to check the management servers and will place it on the watcher node.
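If you prefer to create the scheduled task from PowerShell instead of the Task Scheduler GUI, the sketch below shows one way to do it. It assumes the ScheduledTasks module (Windows Server 2012 or later); the helper names, task name, script path and account are my own examples and are not part of the downloadable scripts.

```powershell
# Sketch: register a monitoring script as a scheduled task from PowerShell.
# Assumptions: ScheduledTasks module (Windows Server 2012 or later); helper names,
# task name, script path and account are examples only.

function Get-MonitorTaskArgument {
    param([Parameter(Mandatory)][string]$ScriptPath)
    # -File runs the script directly; -NoProfile skips user profiles;
    # -ExecutionPolicy Bypass keeps the local policy from blocking an unsigned script
    "-NoProfile -ExecutionPolicy Bypass -File `"$ScriptPath`""
}

function Register-MonitorTask {
    param(
        [Parameter(Mandatory)][string]$TaskName,
        [Parameter(Mandatory)][string]$ScriptPath,
        [Parameter(Mandatory)]$Trigger,
        [Parameter(Mandatory)][string]$User,
        [Parameter(Mandatory)][string]$Password
    )
    $action = New-ScheduledTaskAction -Execute 'powershell.exe' -Argument (Get-MonitorTaskArgument -ScriptPath $ScriptPath)
    # The account needs rights in SCOM and write access to the share on the watcher node
    Register-ScheduledTask -TaskName $TaskName -Action $action -Trigger $Trigger -User $User -Password $Password
}

# Example on the management server (placeholder account and password):
# $daily = New-ScheduledTaskTrigger -Daily -At '06:00'
# Register-MonitorTask -TaskName 'SCOM - read management servers' -ScriptPath 'C:\scripts\readms.ps1' `
#     -Trigger $daily -User 'DOMAIN\account' -Password '<password>'
```

On the watcher node, the same helper could register pingtest.ps1 with a repeating trigger, for example New-ScheduledTaskTrigger -Once -At (Get-Date) -RepetitionInterval (New-TimeSpan -Minutes 5). Depending on your Windows version, you may also need to specify -RepetitionDuration for the task to keep repeating.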

Configuration on the Watcher node

(this is action 2 in the graph above)

Next, we configure the watcher node to monitor our management servers and alert when they are unreachable. This is done by running the following PowerShell script on a regular basis through scheduled tasks. I schedule this task every 5 minutes, which means you get a mail every 5 minutes until the issue is resolved. Better to annoy a little more than to send a single mail that just drowns in the mail volume.

[xml]
#=====================================================================================================
# AUTHOR:    Dieter Wijckmans
# DATE:        03/12/2014
# Name:        Pingtest.PS1
# Version:    1.0
# COMMENT:    This script pings all the management servers listed in the input file
#           and escalates when a server is not reachable.
#           Make sure to fill in all the parameters in the parameter section.
#           This script is scheduled on the watcher node via scheduled tasks.
#           The input file is generated by readms.ps1 on a management server.
#
# Usage:    pingtest.PS1
# Example:
#=====================================================================================================

#parameter section: Fill in all the parameters below
$infile = "Location of file with management servers listed"
$outfile = "Location of file which will keep historical data on the pings"
$smtp = "fill in your smtp config to send mail"
$to = "The destination email address"
$from = "The from email address"

#reading the date when the test is executed for logging in the historical file
$testexecuted = Get-Date
#reading in all the objects listed in the infile
$objects = get-content $infile

#running through all the objects and taking action accordingly
foreach ($object in $objects)
{
$pingresult = Test-Connection $object -quiet
if ($pingresult -eq $True)
{
$pingresult = "Online"
}
else
{
$pingresult = "Offline"
$subject = "SCOM: Management Server " + $object + " is down!"
$body = "<b><font color=red>ATTENTION SCOM support staff:</font></b> <br>"
$body += "Management Server: " + $object + " is down! Please check the server!"
send-MailMessage -SmtpServer $smtp -To $to -From $from -Subject $subject -Body $body -BodyAsHtml -Priority high
}
#log the result with a timestamp in the historical outfile
($object + " :ping result: " + $pingresult + " :" + $testexecuted) | Out-File $outfile -Append

}

#read the number of lines in the infile and check the same number of most recent lines in the outfile
#to determine whether all management servers are down.
$filelength= Get-content $infile | measure-object -Line
$numberoflines = $filelength.Lines
$file = Get-Content $outfile -Tail $numberoflines
$wordToFind = "Online"
$containsWord = $file | %{$_ -match $wordToFind}
If($containsWord -notcontains $True)
{
$subject = "SCOM: ALL Management Servers are down!"
$body = "<b><font color=red>ATTENTION SCOM support staff:</font></b> <br>"
$body += "All Management servers are down. Please take immediate action"
send-MailMessage -SmtpServer $smtp -To $to -From $from -Subject $subject -Body $body -BodyAsHtml -Priority high
}
[/xml]
Note: Make sure that you change all the parameters in the parameter section.

This script pings all the machines listed in the infile we created earlier and writes the results to the outfile. As soon as a management server does not respond, a mail is sent for that server. Afterwards, the most recent entries in the outfile are evaluated: if ALL management servers are down, a separate mail is sent to notify you that SCOM is completely down.
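The “all down” check in the script above re-reads the last lines of the outfile. An alternative, sketched below under my own assumptions (the function names are hypothetical and not part of the downloadable script), is to keep the results of the current run in memory and decide directly from them. The ping is passed in as a scriptblock, defaulting to Test-Connection -Quiet, so the logic can be tried out without a network.

```powershell
# Sketch: decide "all management servers down" from in-memory results instead of
# re-reading the tail of the outfile. The ping step is a scriptblock (default:
# Test-Connection -Quiet, as in pingtest.ps1) so the logic can be tested offline.

function Get-ManagementServerStatus {
    param(
        [Parameter(Mandatory)][AllowEmptyString()][string[]]$Servers,
        [scriptblock]$Ping = { param($name) Test-Connection -ComputerName $name -Count 2 -Quiet }
    )
    foreach ($entry in $Servers) {
        $name = $entry.Trim()
        # Skip blank lines in the infile
        if ($name -eq '') { continue }
        $online = [bool](& $Ping $name)
        [pscustomobject]@{
            Server = $name
            Status = if ($online) { 'Online' } else { 'Offline' }
        }
    }
}

function Test-AllServersOffline {
    param([Parameter(Mandatory)][AllowEmptyCollection()][object[]]$Results)
    # $true only if there is at least one result and none of them is Online
    $onlineCount = @($Results | Where-Object { $_.Status -eq 'Online' }).Count
    ($Results.Count -gt 0) -and ($onlineCount -eq 0)
}
```

In pingtest.ps1 this could look like $results = Get-ManagementServerStatus -Servers (Get-Content $infile), followed by the “ALL Management Servers are down” mail when Test-AllServersOffline -Results $results returns $true. The check then no longer depends on the layout or size of the outfile.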

You can change the appearance of the mails through the $body variables in the script.

The outfile will have the following entries:

[Screenshot: outfile entries]

My servers were offline last night at 21:13:38, so the mail was triggered. When SCOMMS2 is down, the mail looks like this:

[Screenshot: mail sent when SCOMMS2 is down]

When all management servers are down it will look like this:

[Screenshot: mail sent when all management servers are down]

So now we get mails, completely independent of SCOM, telling us there’s an issue with the SCOM management servers.

  • So what if our watcher node is down? Well, I’ve installed a SCOM agent on this machine with a special subscription to notify me when it’s down.
  • So what if our management servers are down AND the watcher node is down? Well, then you probably have a far greater problem, and your phone is probably already red hot by now…

You can find the PowerShell scripts and files on the TechNet Gallery.

In part 2, I’ll go over how to monitor the SQL connection of the management servers.

ExpertsLive Free ticket giveaway

After last year’s successful edition, ExpertsLive is back on 18/11/2014!

Known as one of the most Microsoft-centric events organized by the community in Holland, this event will be packed with sessions on the Microsoft stack.

Sessions will span the entire product range, including Azure, System Center, Hyper-V, SQL Server, Windows Server, PowerShell and Office 365.

All these sessions will be delivered by top-notch speakers in their respective fields. Numerous international speakers will bring you their best to get you up to speed as quickly as possible. It will be a not-to-miss event!

I’ll be hosting a session about monitoring everything with SCOM, making it the one tool to monitor it all. Another not-to-miss session…

For more info, check out: http://www.expertslive.nl

Now the fun part!

Because Scugbe is supporting this event, we are entitled to give away 15 free tickets for the event!

All you have to do is follow @scugbe on Twitter and tweet: @scugbe I would love to go to @expertslive! I want to win a ticket! If you are already following @scugbe, just send the tweet.

The winners will be announced on the 31st of October.

Hopefully see you there in Ede!

SCOM: Connect management groups between on-prem and Azure

During a recent project, I explored the benefits of hosting a two-legged SCOM environment for both on-prem and cloud services. Although this is possible with just one management group and a site-to-site VPN to the cloud, the customer opted for a two-management-group approach to keep a clear divide between on-prem and the cloud.

In this blog post (who knows, it could become a series), I’ll show you how to connect the management groups to each other so they can exchange alerts and you can use 1 console, while still benefiting from having a management group on both platforms.

In this scenario, I’m going to use connected management groups, as explained here: http://technet.microsoft.com/en-us/library/hh230698.aspx

Connecting management groups in SCOM 2012 gives you a couple of benefits. The biggest one, in my opinion, is that you can have multiple management groups with different settings but use 1 console to see all the alerts. The customer wanted to monitor their clients’ systems with different thresholds than their own systems. Their own systems were mainly on site, while the other systems were at the clients’ sites or in the cloud.

The management group that has the consolidated view is called the local management group; in my example, this is VLAB, which is on-prem. The other management groups are called “connected management groups”, in this case VCLOUD.

They relate to each other in a hierarchical fashion, with connected groups in the bottom tier and the local group in the top tier. The connected groups are in a peer-to-peer relationship with each other. Each connected group has no visibility or interaction with the other connected groups; the visibility is strictly from the local group into the connected group.

So in this scenario, it’s a good idea to connect these management groups to see all data, both on-prem and client-based, in 1 console. From VCLOUD it’s not possible to see the alerts of VLAB, but the other way around it is.

So what do we need to do to achieve this, even with different AD domains and firewalls in between?

First of all, prepare VCLOUD in Azure:

Create endpoints on Azure machine

To be able to reach the Azure management group from on-prem, we need to make sure that connections to the VCLOUD management server are possible. This is done through ports 5723 and 5724.

Open the Azure management portal:

My server is called vcloud-ms1

[Screenshot: vcloud-ms1 in the Azure management portal]

Open the Endpoints tab and add endpoints for ports 5723 and 5724. This effectively opens the Azure firewall to your machine on these ports. All communication will happen over these 2 ports.

[Screenshot: Endpoints tab of vcloud-ms1]

Click Add and fill in the endpoints as shown below:

[Screenshot: endpoints for ports 5723 and 5724]

Next, find and note down the following:

  • The public virtual IP address (VIP). In my case it’s 23.101.73.xxx
  • The DNS name: in my case vcloud-ms1.cloudapp.net

[Screenshot: public virtual IP address and DNS name of vcloud-ms1]

Prepare the onsite management server

Now that the management server of our VCLOUD management group is configured, we need to configure the management server in our VLAB environment so it becomes the local management group, which will receive the alerts.

First we need to make sure that the onsite server can resolve AND reach the server in VCLOUD management group.

This can be done by changing the hosts file on the VLAB management server.

Go to c:\windows\system32\drivers\etc\ and open the hosts file:

[Screenshot: hosts file on the VLAB management server]

Note: I’ve removed the last 3 digits of all the IP addresses above; you need to fill in the full IP address as shown in the Windows Azure console.
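Editing the hosts file by hand works fine, but if you want to script it, here is a minimal sketch. It assumes an elevated PowerShell session on the VLAB management server; Add-HostsEntry is a hypothetical helper of my own, and the IP address in the example is a placeholder for the full VIP from the Azure portal. It only appends the entry when no existing line maps the first host name yet.

```powershell
# Sketch: add the VCLOUD management server to the hosts file of the VLAB management server.
# Assumptions: elevated session; IP address and host names are placeholders.

function Add-HostsEntry {
    param(
        [Parameter(Mandatory)][string]$IpAddress,
        [Parameter(Mandatory)][string[]]$HostName,
        [string]$Path = (Join-Path $env:SystemRoot 'System32\drivers\etc\hosts')
    )
    $entry = "$IpAddress`t$($HostName -join ' ')"
    $raw = if (Test-Path -Path $Path) { Get-Content -Path $Path -Raw } else { '' }
    $lines = if ($raw) { $raw -split '\r?\n' } else { @() }

    # Skip the update if a non-comment line already maps the first host name
    $pattern = '\s' + [regex]::Escape($HostName[0]) + '(\s|$)'
    $existing = $lines | Where-Object { $_ -notmatch '^\s*#' -and $_ -match $pattern }
    if (-not $existing) {
        # Start on a new line if the file does not end with a line break
        $prefix = if ($raw -and -not $raw.EndsWith("`n")) { [Environment]::NewLine } else { '' }
        Add-Content -Path $Path -Value ($prefix + $entry)
    }
    $entry
}

# Example with placeholder values:
# Add-HostsEntry -IpAddress '<full public VIP>' -HostName 'vcloud-ms1', 'vcloud-ms1.cloudapp.net'
```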

Let’s check from the VLAB management server whether this works, using THE standard check: ping the hostname:

[Screenshot: ping to the VCLOUD management server timing out]

Hmmm, not working. Did we configure something incorrectly? Check, double check. No.

Well, this makes perfect sense: PING IS DISABLED towards Azure machines. You will get “Request timed out” every time you test, no matter what you configure!
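Because ICMP is blocked, a more useful connectivity test is to connect to the ports we opened as endpoints. The sketch below does this with the .NET TcpClient class, so it does not depend on Test-NetConnection being available; Test-TcpPort is a hypothetical helper name, and the host name in the example is the one from this post.

```powershell
# Sketch: ICMP is blocked towards Azure, so test the SCOM ports over TCP instead of pinging.

function Test-TcpPort {
    param(
        [Parameter(Mandatory)][string]$ComputerName,
        [Parameter(Mandatory)][int]$Port,
        [int]$TimeoutMs = 3000
    )
    $client = New-Object System.Net.Sockets.TcpClient
    try {
        # Start the connection attempt and wait at most $TimeoutMs milliseconds for it
        $task = $client.ConnectAsync($ComputerName, $Port)
        if (-not $task.Wait($TimeoutMs)) { return $false }
        return $client.Connected
    }
    catch {
        # Refused connections and name resolution failures end up here
        return $false
    }
    finally {
        $client.Close()
    }
}

# Example: check both SCOM ports on the VCLOUD management server from the VLAB management server
# foreach ($port in 5723, 5724) {
#     '{0}: {1}' -f $port, (Test-TcpPort -ComputerName 'vcloud-ms1.cloudapp.net' -Port $port)
# }
```

A True for both 5723 and 5724 indicates that the endpoints and the hosts file entry are working, even though ping keeps timing out.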

Connecting the management groups

Now that we have both ends configured, it’s time to see whether we can connect the management groups. Remember: initiate the connection from the local management group (the one that needs to see all alerts and sits at the top of the hierarchy).

So let’s connect to the management server in VLAB:

Open the Administration pane and select Connected Management Groups:

[Screenshot: Connected Management Groups in the Administration pane]

Right click and choose Add Management Group

[Screenshot: Add Management Group]

Fill in all the data requested:

  • Management Group Name: The name of the VCLOUD management group
  • Management Server: The name of the management server in VCLOUD (make sure to use exactly the name you entered in the hosts file)
  • Account: Because the account we use for the SDK service resides in the VLAB AD and is not known in VCLOUD, we need to use VCLOUD credentials

[Screenshot: management group name, management server and account]

Note: You need to initiate this from the management server where you changed the hosts file, so make sure a console is installed there.

You will get the message below because it’s not possible to validate the account in the local AD:

[Screenshot: account validation message]

Just click Next, and you should normally be connected at this point:

[Screenshot: VCLOUD connected to VLAB]

Success!

So now all we have to do is configure what we want to show in the local management group.

I’ll explain this further in the next blog post in this series.

Social Update: A manner of speaking…

A lot of exciting things are happening in the System Center community these days. New releases and features have already been delivered or are on the brink of being delivered. TechEd NA is right around the corner, and other events are being planned as well. I always enjoy being part of these events and meeting old and new friends who share the same interest: System Center products.

This blog post will be my (and your) single place to keep track of all the sessions I’m presenting and the events I’ll be attending, both national and international.

I hope you will attend one of my sessions, and if you do, make sure to take the time to meet up!

If you have any suggestions or questions about this list, be sure to drop me a line on Twitter or send me a mail.

Event | Date | Location | Session | URL
SCU Network | 22/05/2014, 1PM CET | Online webinar | Exploring monitoring beyond the borders of Microsoft: Part 1 Linux monitoring | http://www.systemcenteruniverse.com/scunetwork.htm
SCU Network | 27/05/2014, 1PM CET | Online webinar | Exploring monitoring beyond the borders of Microsoft: Part 2 PowerShell | http://www.systemcenteruniverse.com/scunetwork.htm
SCU Network | 05/06/2014, 1PM CET | Online webinar | Exploring monitoring beyond the borders of Microsoft: Part 3 Monitoring API based devices | http://www.systemcenteruniverse.com/scunetwork.htm
ITPROceed | 12/06/2014 | Antwerp (Belgium) | Can SCOM monitor other stuff than Windows Thingies? Euhm yes it can! | http://www.itproceed.be

Home automation: Putting a child lock on my Nest thermostat using SCOM

This post is part of a series in which I demonstrate how to use SCOM to monitor basically everything. The other parts can be found here:

Now that I have successfully gotten data into SCOM from my Nest thermostat and my Flukso energy meter, it’s time to do some cool stuff with it. More devices are in the pipeline to get data into SCOM and create the ultimate domotics controller, or should I say “SCOMotics”…

The world: Keeping an eye on Teen Trouble

One real-life problem I have is that it’s very hard to explain to my wife and kids how radiant floor heating works. It takes some time to heat up but stays warm for a long time, so there’s no point in turning up the thermostat to get instant heat: it takes approx. 1 hour to heat up 2 degrees Celsius (something I also learned from getting my Nest thermostat data into SCOM).

But you can explain all you want: if they find it chilly, they’ll turn up the thermostat assuming it will get warm instantly. In fact, they are just using more energy than necessary to heat the house 2 hours later, when they have already left.

So the mission was very simple: stop them from doing this. Yes… I could put a lock code on the Nest thermostat and keep it to myself, but then, if I’m not home and they really need to turn up the heating, they can’t.

So I came up with another solution: Setting a hard limit on the degrees and enforcing it.

So, in short, this is what I need to achieve with SCOM:

  • Detection of the currently set temperature: the target temperature
  • Alerting when the target temperature breaches the set limit
  • Corrective action to make sure the target temperature is set back below the max temperature

So let’s start with detecting the current target temperature. I can reuse the work I already did to read this value and compare it to the limit. As this is a more general approach, I’ve documented the process of creating a PowerShell script monitor using Silect MPAuthor in a separate post: http://scug.be/dieter/2014/04/24/scom-creating-a-powershell-script-monitor-with-silect-mpauthor/

So now that we have the monitor in place, let’s check whether it’s working!

First of all, I’m setting my Nest thermostat to 20 degrees Celsius while my limit is set to 19 degrees Celsius:

[Screenshot: Nest thermostat set to 20 degrees Celsius]

After the first run, the monitor picks up that the target temperature is indeed higher than the limit. This is detected by the PowerShell script monitor we configured earlier:

[Screenshot: monitor in Error state and the recovery task]

Here you can see that the recovery task I configured kicked in as well. This recovery runs a PowerShell script that loads a PHP file on my web server using the Invoke-WebRequest cmdlet.

Note: I’m running this recovery against my watcher node class, which consists of 1 server, so I’ve copied “settempnest.ps1” to a local folder on that particular server.

How did I configure the recovery task

First, open the monitor and click Add in the “Configure recovery tasks” section:

[Screenshot: Configure recovery tasks section]

Fill in the name of the recovery and select the health state it should react to:

[Screenshot: recovery name and health state]

Enter the command:

  • Full path: C:\Windows\System32\WindowsPowerShell\V1.0\powershell.exe
  • Parameters: -NoExit -Command "& 'C:\scripts\settempnest.ps1'"

[Screenshot: recovery command and parameters]

The PowerShell script runs Invoke-WebRequest against my web server. The PHP script it calls is copied below:

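[xml]

<?php

// Load the configuration (Nest credentials, time zone) and the nest-api class
require 'inc/config.php';
require 'nest-api-master/nest.class.php';

define('USERNAME', $config['nest_user']);
define('PASSWORD', $config['nest_pass']);
date_default_timezone_set($config['local_tz']);

// Switch the thermostat to heat mode with a target of 18 degrees Celsius
$nest = new Nest();
$nest->setTargetTemperatureMode(TARGET_TEMP_MODE_HEAT, 18.0);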

[/xml]
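I haven’t included settempnest.ps1 itself in this post. A minimal sketch of what such a script could look like is below; the URL is hypothetical (point it to wherever the PHP page above is published), and Invoke-NestRecovery is a helper name of my own. The web request is passed in as a scriptblock, defaulting to Invoke-WebRequest, so the success and failure handling can be tested without a web server.

```powershell
# Sketch of a possible settempnest.ps1; the URL below is hypothetical.

function Invoke-NestRecovery {
    param(
        [Parameter(Mandatory)][string]$Uri,
        [scriptblock]$Request = { param($u) Invoke-WebRequest -Uri $u -UseBasicParsing -TimeoutSec 30 -ErrorAction Stop }
    )
    try {
        $response = & $Request $Uri
        # Any 2xx status code counts as success
        return ($response.StatusCode -ge 200 -and $response.StatusCode -lt 300)
    }
    catch {
        Write-Warning "Recovery request to $Uri failed: $($_.Exception.Message)"
        return $false
    }
}

# In settempnest.ps1 itself, return a non-zero exit code when the call fails:
# if (-not (Invoke-NestRecovery -Uri 'http://webserver/settemp.php')) { exit 1 }
```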

So after running the recovery, we see the monitor change back from Error to Healthy:

[Screenshot: monitor back to Healthy]

There we go… All good again, saving some energy.

[Screenshot]

And a final check on the thermostat itself… back humming at 18 degrees.

[Screenshot: Nest thermostat back at 18 degrees]

SCOM: Creating a PowerShell script monitor with Silect MPAuthor

Sometimes you need a monitor for something that is not included in the standard management packs. Unfortunately, SCOM does not let you create a PowerShell script monitor in the console. Although it’s not a good idea to start authoring in the Operations console, it can sometimes be a quick and easy way to create a monitor.

Recently, Silect Software released a free version of MPAuthor for creating management packs. I’m using it to create the script monitors that collect and monitor the data for my monitoring my home series: http://scug.be/dieter/2014/02/19/monitor-your-home-with-scom/

Download the tool here: http://www.silect.com/mp-author

Below is an example of how I monitor the target temperature set on my Nest Thermostat.

So open the tool and create a new management pack => Create New Script Monitor…

[Screenshot: Create New Script Monitor in MPAuthor]

Name the script (if you already have the script as a PS1 file, MPAuthor loads the script body automatically):

[Screenshot: script name and script body]

This is the script I’m using:

[xml]

param([int]$maxtarget)
[void][System.Reflection.Assembly]::LoadFrom("C:\Program Files (x86)\MySQL\MySQL Connector Net 6.8.3\Assemblies\v2.0\MySQL.Data.dll")

#Create a variable to hold the connection:

$myconnection = New-Object MySql.Data.MySqlClient.MySqlConnection

#Set the connection string:

$myconnection.ConnectionString = "Fill in the connection string here"

#Call the Connection object’s Open() method:

$myconnection.Open()

$API = New-Object -ComObject "MOM.ScriptAPI"
$PropertyBag = $API.CreatePropertyBag()

#uncomment this to print connection properties to the console
#echo $myconnection

$command = $myconnection.CreateCommand()
$command.CommandText = "SELECT target FROM data ORDER BY timestamp DESC LIMIT 1";
$reader = $command.ExecuteReader()
#echo $reader
#The data reader will now contain the results from the database query.

#Processing the Contents of a Data Reader
#The contents of a data reader is processes row by row:

while ($reader.Read()) {
#And then field by field:
for ($i= 0; $i -lt $reader.FieldCount; $i++) {
$value = $reader.GetValue($i) -as [int]
}
}
#echo $value
$myconnection.Close()
#$value = $value -replace ",", "."

if($value -gt $maxtarget)
{
$PropertyBag.AddValue("State","ERROR")
$PropertyBag.AddValue("Description","Target temperature currently set to " + $value + ": is higher than the maximum target temp " + $maxtarget)
}
else
{
$PropertyBag.AddValue("State","OK")
$PropertyBag.AddValue("Description","Target temperature currently set to " + $value + ": does not exceed the maximum target temp " + $maxtarget)
}

$PropertyBag

[/xml]

Note that you need to pass the values back to SCOM via the property bag. I’m also a fan of doing the logic in the script itself, as shown above, to avoid any logic in SCOM afterwards; it’s far easier to do the comparison in PowerShell. In this case, I’m setting State to either ERROR or OK, which also avoids any format conflict over whether the output is a string or an integer.
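To make that logic easy to test outside SCOM, the comparison can be pulled into a small function that returns the values destined for the property bag. This is a sketch; Get-TargetState is a hypothetical name. Using [double] instead of [int] also lets fractional values compare correctly, whereas -as [int] rounds first (19.4 becomes 19 and would not be flagged against a limit of 19).

```powershell
# Sketch: the monitor's comparison as a testable function.
# Returns the State and Description values that go into the property bag.

function Get-TargetState {
    param(
        [Parameter(Mandatory)][double]$Value,
        [Parameter(Mandatory)][double]$MaxTarget
    )
    if ($Value -gt $MaxTarget) {
        @{ State = 'ERROR'; Description = "Target temperature currently set to $Value is higher than the maximum target temp $MaxTarget" }
    }
    else {
        @{ State = 'OK'; Description = "Target temperature currently set to $Value does not exceed the maximum target temp $MaxTarget" }
    }
}

# In the monitor script, the result would be copied into the property bag:
# $result = Get-TargetState -Value $value -MaxTarget $maxtarget
# $PropertyBag.AddValue('State', $result.State)
# $PropertyBag.AddValue('Description', $result.Description)
```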

I’m setting the maxtarget parameter to 19:

[Screenshot: maxtarget parameter set to 19]

Next you need to create the conditions for the monitor states:

[Screenshot: monitor state conditions]

As I’m only using a two-state monitor, I’m deleting the OverWarning state and only using UnderWarning (= Healthy state) and OverError (= Error state).

[Screenshot: UnderWarning and OverError states]

For the Healthy state I’m detecting the “State” property value as OK (note that I’m defining the Type as a String as the state is just plain text)

[Screenshot: Healthy state condition]

For the Error state I’m detecting the “State” property value as ERROR

[Screenshot: Error state condition]

Now we need to target the monitor. In my case, it’s the watcher node class I created earlier:

[Screenshot: monitor target]

Name and enable the monitor:

[Screenshot: naming and enabling the monitor]

Set the schedule that determines how often the max temperature is checked:

[Screenshot: monitor schedule]

Specify the alert that needs to be raised, if any:

[Screenshot: alert settings]

And create.

[Screenshot]

Now save the management pack and test it in your environment.

System Center 2012 R2 Update Rollup 2 Released

Just a quick note that System Center 2012 R2 Update Rollup 2 was released last night. For a full view of the updates included, head over to the official KB here: http://support.microsoft.com/kb/2932881

A lot of features and fixes.

Below you can find the links to the different fixes.

Data Protection Manager (KB2958100) (6 fixes in total)

Operations Manager (KB2929891) (9 fixes in total)

Operations Manager – UNIX and Linux Monitoring (Management Pack Update KB2929891) (1 fix in total)

2929891 System Center 2012 Operations Manager R2 Update Rollup 2

Orchestrator (KB2904689) (3 fixes in total)

Service Manager (KB2904710) (15 (!) fixes in total)

Service Provider Foundation (KB2932939) (6 fixes in total)

Virtual Machine Manager (KB2932926) (30 (!) fixes in total)

As always, these packages are cumulative and include all the fixes of Update Rollup 1 as well. I’ll be taking the different packages for a test spin in my lab environment and will keep you informed about anything I come across.

Last but not least, the Windows Azure Pack also got an extensive update.

More info can be found here: http://support.microsoft.com/kb/2932946

System Center Universe 2014 Houston session video online

Wow, I can’t believe it has already been 3 weeks since I spoke at System Center Universe 2014 in Houston. My session was all about connecting your on-premises SCOM environment with your Azure cloud.

The event was a cool experience and a great chance to catch up with a lot of people and meet new ones while we were at it. I had some great talks about our beloved technology and how we see it developing over the next year.

My Session video is available here:

System Center Universe 2014 Dieter Wijckmans

In addition to the video above, here are some links to articles with more info on how to connect your on-prem environment with your Azure cloud:

Introduction and install of a Scom Azure Gateway server (Cameron Fuller): http://blogs.catapultsystems.com/cfuller/archive/2013/12/04/operations-manager-and-azure-better-together-introducing-the-azure-monitoring-gateway-[sysctr-scom-azure].aspx

Configure Site to site vpn: http://www.windowsazure.com/en-us/documentation/articles/virtual-networks-create-site-to-site-cross-premises-connectivity/

Configure point to site vpn: http://msdn.microsoft.com/en-us/library/windowsazure/dn133792.aspx

Configure the Scom Azure management pack: http://blogs.technet.com/b/cbernier/archive/2013/10/23/monitoring-windows-azure-with-system-center-operations-manager-2012-get-me-started.aspx

SCOM: System Center Data Access Service stops (event 26380, 33333)

When I recently started reviewing a SCOM 2012 R2 environment, I came across an interesting issue I hadn’t seen before… Time to blog the solution!

Problem

The System Center Data Access Service started successfully but stopped within a minute. After investigating, I found at least 2 events, logged at the time the service crashed, that could give us a clue about what was going on.

Event 26380: The System Center Data Access Service failed due to an unhandled exception… Cannot be added to the container…

[Screenshot: event 26380]

Event 33333: Data access layer rejected: An entity of type service cannot be owned by a role, a group, or by principals mapped to certificates or asymmetric keys.

[Screenshot: event 33333]

Strange… This worked the day before. What was going on?

After searching the web, I found this article by Travis Wright, who had a similar problem with SCSM (which shares the same code base, so a nice entry point for my troubleshooting):

http://blogs.technet.com/b/servicemanager/archive/2011/10/04/system-center-data-access-service-start-up-failure-due-to-sql-configuration-change.aspx

By now, I could pinpoint the issue to the SQL side.

After heading over to the SQL admin with the article, we continued troubleshooting together. It turned out that the issue was not exactly what Travis had experienced. The SQL admin had reviewed the accounts with the SA (sysadmin) role and removed that role from the SCOM SDK user. No problem so far… But the SDK user was not defined in SQL as a SQL user in its own right, only as a member of a group.

Solution

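It turned out that, without the SA role, the SDK user could no longer create the Service Broker service when executing the stored procedure [p_TypeSpaceSetupBrokerService]. Members of the sysadmin role are mapped to dbo in every database, so the service used to be owned by dbo. Without an AUTHORIZATION clause, CREATE SERVICE makes the current user the owner, and because the SDK user only had access through a group, SQL Server refused that ownership (event 33333).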

Original

SET @Query = N'CREATE SERVICE [' + @ServiceName + N'] ON QUEUE [' + @QueueName + N'] ([http://schemas.microsoft.com/SQL/Notifications/PostQueryNotification]);';

We changed this line in the stored procedure as follows, so the service is created with dbo as its owner, and after that the issue was resolved:

SET @Query = N'CREATE SERVICE [' + @ServiceName + N'] AUTHORIZATION [dbo] ON QUEUE [' + @QueueName + N'] ([http://schemas.microsoft.com/SQL/Notifications/PostQueryNotification]);';
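To verify that the change is in place in a given management group, you can look at the definition of the procedure. The sketch below rests on several assumptions: Invoke-Sqlcmd is available (SqlServer or SQLPS module), the procedure is in the dbo schema of the OperationsManager database, and the helper names and instance name are my own placeholders.

```powershell
# Sketch: check whether p_TypeSpaceSetupBrokerService creates the service with AUTHORIZATION [dbo].

function Test-BrokerServiceAuthorization {
    param([Parameter(Mandatory)][AllowEmptyString()][string]$Definition)
    # Looks for: CREATE SERVICE [...] AUTHORIZATION [dbo]
    $Definition -match 'CREATE\s+SERVICE\s+\[[^\]]+\]\s+AUTHORIZATION\s+\[dbo\]'
}

function Get-BrokerSetupDefinition {
    param(
        [Parameter(Mandatory)][string]$ServerInstance,
        [string]$Database = 'OperationsManager'
    )
    $query = "SELECT OBJECT_DEFINITION(OBJECT_ID(N'dbo.p_TypeSpaceSetupBrokerService')) AS Definition"
    # Raise MaxCharLength so the procedure text is not truncated at the 4000-character default
    (Invoke-Sqlcmd -ServerInstance $ServerInstance -Database $Database -Query $query -MaxCharLength 1000000).Definition
}

# Example (placeholder instance name):
# Test-BrokerServiceAuthorization -Definition (Get-BrokerSetupDefinition -ServerInstance 'SQL01\SCOM')
```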

Hopefully, if you stumbled on this page, it has saved you some extra troubleshooting…

SCOM: Batch reset monitors through PowerShell

Monitors have been a very useful part of SCOM since SCOM 2007 came out back in the day. However, for a lot of new SCOM administrators, the alerts generated by monitors can sometimes cause headaches.

A monitor raises an alert when its state changes and closes it when the state changes back to healthy. This is the really short version…

If you speak to advanced SCOM admins, they will all agree that managing monitor-generated alerts can be tricky from time to time if you work with operators.

If they close a monitor-generated alert in the console while the monitored condition has not changed, the monitor remains in an unhealthy state until it is reset manually.

We all know how many monitors are floating around in our environments, so this is a disaster waiting to happen. Therefore, it is wise to regularly reset the unhealthy monitors for your core business services until everybody is aware that they cannot close alerts raised by a monitor…

I also use this setup for another annoying issue that can have a great impact on your environment. Again, this is a scenario to rule out human error:

  • IF an alert is raised by a monitor going into an unhealthy state, a notification is successfully triggered and a ticket is created… So far so good.
  • BUT if someone closes the ticket or the alert without looking at it, the condition remains and no warning will be raised again.
  • As a lot of my customers use SCOM as a back-end monitoring tool and follow up on the tickets it generates, they will not be alerted again.

Therefore, I created a small PowerShell script, combined with a bat file, that resets the health of the unhealthy instances of a monitor you specify. The only thing left to do is create a scheduled task for the bat file, and you are good to go.
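The Gallery script itself isn’t reproduced here, but the core idea can be sketched as follows. This is my own approximation, not the downloadable script: it assumes a management server with the OperationsManager module, uses Get-SCOMMonitor, Get-SCOMClass and Get-SCOMClassInstance, and calls ResetMonitoringState on every instance of the monitor’s target class whose overall health is Warning or Error. Because that overall health can also come from other monitors, this is broader than a strict per-monitor check. The selection and reset loop are separate functions so they can be tested with stand-in objects.

```powershell
# Sketch of the reset logic (an approximation, not the Gallery script).

function Select-UnhealthyInstance {
    param([Parameter(Mandatory)][AllowEmptyCollection()][object[]]$Instance)
    $Instance | Where-Object { "$($_.HealthState)" -in 'Warning', 'Error' }
}

function Reset-MonitorOnInstance {
    param(
        [Parameter(Mandatory)][AllowEmptyCollection()][object[]]$Instance,
        [Parameter(Mandatory)]$Monitor
    )
    foreach ($item in @(Select-UnhealthyInstance -Instance $Instance)) {
        # Resets the monitor's health state on this instance; it re-evaluates on its next run
        [void]$item.ResetMonitoringState($Monitor)
        $item.DisplayName
    }
}

function Reset-UnhealthyMonitor {
    param([Parameter(Mandatory)][string]$MonitorDisplayName)
    Import-Module OperationsManager
    # If several monitors share the display name, only the first one is used here
    $monitor = Get-SCOMMonitor -DisplayName $MonitorDisplayName | Select-Object -First 1
    $class = Get-SCOMClass -Id $monitor.Target.Id
    Reset-MonitorOnInstance -Instance @(Get-SCOMClassInstance -Class $class) -Monitor $monitor
}
```

For the fragmentation example below, the call would be Reset-UnhealthyMonitor -MonitorDisplayName 'Logical Disk Fragmentation Level'; a bat file can run that through powershell.exe on a schedule.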

The script can be downloaded from the Gallery, together with the bat file.

Example: the fragmentation level is high, and we want to be alerted again every day as long as the condition remains:

[Screenshot: fragmentation level alert]

Check the monitor properties to retrieve the monitor display name:

[Screenshot: monitor properties]

In this case, it’s “Logical Disk Fragmentation Level”. Copy and paste the name:

[Screenshot: monitor display name]

Fill in the name in the batch file and run it.

[Screenshot: batch file with the monitor name]

The unhealthy monitors will be reset and their alerts are automatically closed in the console.

[Screenshot: alerts closed in the console]

If we check the monitor again, it has been reset and will fire again the next time it runs if the unhealthy condition is still true:

[Screenshot: monitor after the reset]

This way, you will receive a new alert every time this script runs while the condition persists. You could also schedule it at the helpdesk shift change, so each shift starts with a clean sheet and a clear view of the current situation in your environment.
