Recently a client needed to rethink their notification setup because of various issues, so I developed a PowerShell script to gain more control over the notification process.
My client is using an in-house developed and maintained problem management system installed on a mainframe platform.
The alerts which need escalation are detected in SCOM and then sent by mail to a Lotus Notes system. The data is then read through a connector between the mainframe system and the Lotus Notes dbase. The mail is scrubbed, and through a series of scripts on the mainframe the key fields of the mail are detected and filled in on the ticket…
So far so good… BUT because of the use of different systems there was an issue with encoding. The mails were sent in UTF-8 encoding and correctly decoded when viewed in the Lotus Notes client, but stayed encoded in the Lotus Notes dbase, so the scrubbed text was all scrambled and unusable for the problem management system.
After various attempts to send the mails in different encoding formats, I decided to rethink the notification and detach it from the SCOM system to get more freedom in testing.
The following PowerShell script together with a custom notification channel did the trick.
It’s constructed in 3 sections: preparation and composing the file, mailing, and error handling for reporting purposes.
You can Download the script here.
First of all we prepare everything needed to execute the script.
The areas in yellow need to be customized for your environment.
Variables which need customization:
$rootMS: used to read the RMS name. If the RMS is a single server you can use the first method; mine is on a cluster, so I filled in the name manually to avoid issues with the RPC server when reading the name through WMI.
$NotifiedResState: just pick any number which is not already in use. We’ll have to create the resolution state in SCOM afterwards.
$CultureInfo: make sure you fill in the correct locale info to get the date/time format right. For a list of all culture info check here: Table
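As a rough sketch of the preparation section (the server name, state number, and locale below are placeholders you need to replace with your own values):

```powershell
# Load the SCOM 2007 snap-in and connect to the RMS
Add-PSSnapin "Microsoft.EnterpriseManagement.OperationsManager.Client" -ErrorAction SilentlyContinue

# $rootMS: hard-coded here because my RMS is clustered; on a single server
# you could read the name through WMI instead
$rootMS = "RMS01.domain.local"

Set-Location "OperationsManagerMonitoring::"
New-ManagementGroupConnection -ConnectionString:$rootMS | Out-Null
Set-Location $rootMS

# $NotifiedResState: pick a number that is not already in use in SCOM
$NotifiedResState = 15

# $CultureInfo: locale used to format the date/time fields in the ticket
$CultureInfo = New-Object System.Globalization.CultureInfo("en-US")
```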
In this part of the script we read all the desired elements of the alert and write them to a TXT file. You could leave the TXT file out and just write everything to a string, but I prefer to keep the TXT files as a backup to check whether a ticket was raised at any given time.
Variables which need customization:
$strResolutionState: because the resolution state is stored as a number in the dbase and not as the word itself, we need to translate the number to the correct word. This way we get the resolution state name in our mail instead of the number. Fill in the resolution state number you’ve chosen earlier plus the text you’ve associated with it in SCOM. Check below on how to implement this setting in SCOM.
$strobjectname: because not all the desired info was in the alert, I had to use 3 custom fields to make the mails contain compliant info for the custom-made problem management system. CustomField2 holds the NetBIOS name. Because I don’t need the full name (servername.domain.locale) but just the server name, I split the name and use only the first part in the variable $Objectname.
$FilePath: the file path is constructed from two parameters of the alert to create a unique name and avoid overwriting an existing TXT file. Use the time raised of the alert rather than Get-Date: if the current time changes while the script is running, Get-Date would generate 2 files.
Of course you can adapt the different fields and structure to your liking, but for our problem management system this format had to be strictly followed to be able to scrub the mail.
Note: CustomField1 and CustomField3 contain static text passed by the alert-generating rule.
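A hedged sketch of this composing section (the resolution state mapping, field layout, and output folder are illustrative; $AlertID is assumed to be passed in by the notification command):

```powershell
# Fetch the alert that triggered the notification
$alert = Get-Alert -Id $AlertID

# $strResolutionState: translate the numeric state into the word used in SCOM
$strResolutionState = switch ([int]$alert.ResolutionState) {
    0       { "New" }
    15      { "Notified" }   # the custom state created earlier
    255     { "Closed" }
    default { "Unknown" }
}

# $Objectname: CustomField2 holds the FQDN; keep only the NetBIOS part
$Objectname = ($alert.CustomField2 -split "\.")[0]

# $FilePath: built from alert properties so the name is unique per alert
$FilePath = "C:\Tickets\{0}_{1}.txt" -f $Objectname, $alert.Id

"Server  : $Objectname"                                        | Out-File -FilePath $FilePath
"Raised  : $($alert.TimeRaised.ToString($CultureInfo))"        | Out-File -FilePath $FilePath -Append
"State   : $strResolutionState"                                | Out-File -FilePath $FilePath -Append
"Message : $($alert.Description)"                              | Out-File -FilePath $FilePath -Append
```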
In the last part of the script you need to send out the mail to your destination.
I’m using static parameters here because the destination will not change that often. However, if you have multiple destinations it’s best to use a variable and pass it when you run the notification command from SCOM.
Variable which needs customization:
$Sender: the From email address.
$OKRecipient: the email address to send the mail to when everything went fine.
$strOKSubject: the subject for the mail when everything went fine.
$ErrRecipient: the email address to send the error mail to.
$strErrSubject: the subject for the error mail.
$strErrBody: a small body to notify that something went wrong along the way.
Note: because of the encoding issues in my customer’s environment, I’ve used Blat, a command-line mail utility which I’ve used quite often and which is platform independent. It’s lightweight and can be downloaded here: Blat Download
More info on Blat can be found here: Blat Info
The install and config info for Blat on the RMS is at the end of the blog post.
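As a hedged example of the mailing section, calling Blat from PowerShell could look like this (addresses, the SMTP server, and the Blat path are placeholders for your environment):

```powershell
# Static mail parameters - adapt for your environment
$Sender        = "scom-tickets@yourdomain.com"
$OKRecipient   = "ticketsystem@yourdomain.com"
$strOKSubject  = "SCOM Ticket: $Objectname"
$ErrRecipient  = "scom-admins@yourdomain.com"
$strErrSubject = "Ticket script FAILED"
$strErrBody    = "The ticket script could not deliver the alert mail."

# Blat sends the TXT file as the mail body; this sidesteps the UTF-8 issue
& "C:\Blat\blat.exe" $FilePath -to $OKRecipient -f $Sender -s $strOKSubject -server "smtp.yourdomain.com"

if ($LASTEXITCODE -ne 0) {
    # Report the failure to the admins
    & "C:\Blat\blat.exe" -body $strErrBody -to $ErrRecipient -f $Sender -s $strErrSubject -server "smtp.yourdomain.com"
}
```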
Last but not least, I write an event to the event log for successful and unsuccessful script runs. This can be used to set up alerting in SCOM to warn you quickly when the ticketing is no longer working.
At the end we unload the snap-in to leave a clean system and avoid error messages when the script runs the next time:
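A minimal sketch of this error-handling section, assuming PowerShell 2.0 (the event source and IDs are my own choices; the source must be registered once before Write-EventLog will accept it):

```powershell
# Register the event source once (the error is ignored if it already exists)
New-EventLog -LogName "Operations Manager" -Source "Create_Ticket" -ErrorAction SilentlyContinue

if ($LASTEXITCODE -eq 0) {
    Write-EventLog -LogName "Operations Manager" -Source "Create_Ticket" `
        -EntryType Information -EventId 9000 `
        -Message "Ticket mail sent successfully for alert $AlertID"
} else {
    Write-EventLog -LogName "Operations Manager" -Source "Create_Ticket" `
        -EntryType Error -EventId 9001 `
        -Message "Ticket mail FAILED for alert $AlertID"
}

# Unload the snap-in for a clean exit
Remove-PSSnapin "Microsoft.EnterpriseManagement.OperationsManager.Client"
```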
In order to use this script, some things need to be configured in your SCOM environment and on your RMS:
Install Blat:
Any tips or hints on improving this script are always welcome…
Sometimes it’s necessary to launch a custom script or other action after an alert is detected. This can be any executable script or program.
In my particular case I’m using this to launch scripts when an alert is detected to properly escalate the alert and perform additional tasks on the alert.
So how do you make sure that the script you intend to run will actually run when a predefined alert is raised?
By creating a Command notification channel and subscription…
Let’s start with setting up the command notification channel.
Note: I’m using my script Create_Ticket.Ps1 as documented here. The parameters I’m passing are useful for this script but you can pass many more parameters according to your needs.
First of all, open the Notification Channels: in the SCOM console go to Administration > Notifications > Channels.
Right-click in the right pane and choose New > Command…
In the settings tab you need to fill in what you prefer to run:
Click Finish.
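For reference, with my Create_Ticket.Ps1 script the three fields on the settings tab could be filled in along these lines (the paths and the parameter list are examples; adapt them to your own script):

```
Full path of the command file:
  C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe

Command line parameters:
  -Command "& 'C:\Scripts\Create_Ticket.ps1' '$Data/Context/DataItem/AlertId$'"

Startup folder for the command line:
  C:\Scripts
```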
At this point your Command Notification Channel is set up. The next thing you need to configure is the trigger which will run this Command Notification Channel. This is done by creating a Subscriber:
Open the SCOM console and navigate to Administration > Notifications > Subscribers
Right click in the right pane and choose New…
Fill in a name for the Subscriber
Leave the “always send notifications” setting or specify a time window (e.g. during business hours only) and click Next.
Click Add to add a subscriber address to the list. The following window appears:
Fill in the address name and click Next.
Leave the always send notifications setting or change according to your needs.
Click Finish and you have configured your Command to run whenever you subscribe to an alert with this channel.
Sometimes it’s useful to make your own Custom Alert Resolution States to further classify your alerts in the console and use these states to trigger different actions using various scripts.
I’ll be posting some scripts which use this custom alert resolution state, so I’m documenting here how to configure it.
Open your SCOM console, select the Administration tab, then Settings > Alerts.
Click new…
Type in the Resolution State display name and choose a unique ID. Click OK.
And we are done.
Not much to it, but it makes life a little easier when you want to classify different alerts.
In the next series of blogs I’ll frequently use these Custom Alert Resolution States to classify and report on different types of alerts.
Most of the time, when the prerequisite checker for installing SCOM 2007 says a prerequisite is not met for a specific role or item, it is right.
However, if you are 100% sure everything is there, you can bypass the prerequisite checker by running the install with the following command:
MSIEXEC /i <path>\MOM.msi /qn /l*v D:\logs\MOMUpgrade.log PREREQ_COMPLETED=1
This is however NOT supported by Microsoft.
Note: in Windows Server 2008 always run commands in an elevated prompt.
This should be your last resort to get things going. Most of the time there’s indeed a prerequisite not met and therefore the checker is right.
If you want to double check your prerequisites you can find them here:
http://technet.microsoft.com/en-us/library/bb309428.aspx
A known issue with the prerequisites is that ASP.Net is not correctly detected. More info here: http://support.microsoft.com/kb/934759
Recently I got a question from a customer to move the Opsdb Data Warehouse (DW) to another drive because the disk on which it was originally installed was not big enough. In fact they wanted to move the DW to an iSCSI disk to boost performance.
To verify whether there would be an issue or whether it would be a straightforward move, I did some browsing in the biggest manual out there… the Internet!
However, all that came up were moves from one server to another, not from one drive to another on the same server…
I did some testing in my lab and thought I’d share the outcome with you.
First of all: this is your DW you are tampering with. Make sure you have proper backups of your db and read the entire blog post before proceeding, just to be on the safe side. It would be a shame to lose all your data older than 8 days (if that is your grooming setting) because of a bad manipulation.
Ok enough said. Let’s get things started.
These are the steps I followed and in my case everything went smoothly without any problems.
First of all (again): take backups of your dbase, and secondly plan SCOM downtime. To be absolutely sure there’s no interference with or blocking of the DW dbase, you need to shut down your RMS and any MS and GW servers in your environment (or at least in the management group the DW is part of). Some sources just drop the connections to the dbase, which is an option as well, but I prefer the first approach. In my opinion it’s safer.
Connect to the SQL server where your DW resides and open Microsoft SQL Server Management Studio:
Open up the connection to your DW. In my case it is residing on my VSERVER05.
Again better safe than sorry. Backing up!
The DW can be very big, so the backup may need some time to finish.
When it’s finished, shut down your environment. This means RMS, MS and GWs. This sounds like a draconian measure, but it ensures that your environment is completely down and no queries are made to the dbase.
When this is done we can proceed to move the dbase.
Take the DW offline by right clicking it and choosing “Take Offline”
A small dialog will pop up and, if all goes well, it will tell you the dbase was taken offline successfully. Notice the red arrow on the DW dbase.
Now take the ReportServer$OpSDBDW and ReportServer$OPSDBDWTempDB offline as well. Note that these dbases can have a different name in your environment or may not be present at all.
Note: My OpsdbDW is installed in a separate SQL instance. Be cautious with restarting your SQL service as this impacts all dbases under this instance.
When all the dbases are down they can be detached. This is done by right clicking the dbase > tasks > “detach”.
Choose the option to drop the connections to the dbase and hit OK.
Now we can copy (yes copy) the data. Again better safe than sorry and make a copy of the data rather than moving it.
After the copy is done, we are going to attach the copied DW to SQL.
Right click Databases and click Attach:
Select your dbase and attach:
In this case I’m moving my DW from E: to F: drive.
NOTE: the attach dialog does not automatically select the correct log file. Make sure you select it manually by clicking the icon behind the path in the lower section.
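The same detach / copy / attach flow can also be scripted. A hedged sketch using sqlcmd (the instance, database, and file names are examples matching my lab; check yours before running anything):

```powershell
$Instance = "VSERVER05\OPSDBDW"

# Detach the DW, dropping any remaining connections first
& sqlcmd -S $Instance -Q "ALTER DATABASE OperationsManagerDW SET SINGLE_USER WITH ROLLBACK IMMEDIATE"
& sqlcmd -S $Instance -Q "EXEC sp_detach_db 'OperationsManagerDW'"

# Copy (don't move!) the data and log files to the new drive
Copy-Item "E:\SQLData\OperationsManagerDW.mdf" "F:\SQLData\"
Copy-Item "E:\SQLData\OperationsManagerDW.ldf" "F:\SQLData\"

# Attach from the new location - point explicitly at BOTH files,
# just like selecting the log file manually in the GUI
& sqlcmd -S $Instance -Q "CREATE DATABASE OperationsManagerDW ON (FILENAME = 'F:\SQLData\OperationsManagerDW.mdf'), (FILENAME = 'F:\SQLData\OperationsManagerDW.ldf') FOR ATTACH"
```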
When the attach completes successfully, you will see that the dbases are moved to your new drive.
Start your SCOM environment again by starting your RMS first and then any MS and/or GW servers you might have.
Just to be on the safe side, verify whether you’re able to generate a report in the reporting view of your console with data older than 7 days (if your grooming settings are different, adjust this so you check a report with data older than your grooming setting).
If all went well, you have now successfully moved your dbase to another drive and are free to delete the initial copy in the old location.
Today at a customer I came across a problem with cross platform monitoring.
They had several Linux servers running the RedHat distro and installed the Linux management pack for cross-platform monitoring of their Linux environment.
They installed all the agents on the Linux servers but did not configure the proper action accounts to perform the discovery and monitoring.
While putting together some documentation for my client on how to perform these actions, I came across this article on the Microsoft website:
http://technet.microsoft.com/en-us/library/dd788981.aspx
The instructions, however, are outdated for SCOM 2007 R2, so I’ll document them below.
First things first.
If you notice these events in the Operations Manager Eventlog:
Event Type: Error
Event Source: HealthService
Event Category: Health Service
Event ID: 1107
Date: 11/24/2008
Time: 2:18:03 PM
User: N/A
Computer: RMS_SERVER
Description:
Account for RunAs profile in workflow “Microsoft.Linux.RedHat.Computer.Discovery”, running for instance “Linux_server_name” with id “{384D2415-A49D-4002-768B-51D8D2EDBDD*}” is not defined. Workflow will not be loaded. Please associate an account with the profile. Management group “group_name”
This most likely will indicate an issue with the run as accounts to connect to your Linux environment.
The article above is outdated at some points, so here’s the proper way, with clearer instructions and some extra info I’ve learned in the field while configuring this for my customer.
Outlined steps:
Open the Operations console with an account that is a member of the Operations Manager 2007 R2 Administrators profile.
Select the Administration view.
In the navigation pane under Run As Configuration, select Profiles.
In the results pane, double-click the UNIX Action Account or the UNIX Privileged Account profile. You need to configure both.
Click next on the first page. This is the overview page. Nothing can be changed here.
In the next screen you need to select which user you are going to use as an action account on the Unix/Linux system. This screen consists of 2 portions: the upper portion, used to define the user, and the bottom portion, which defines the target.
Select the Run As account from the drop-down list or create a new one. In this case we’ll create a new one. Click New…
Click next on the welcome screen to proceed in creating the account:
In the next screen, fill in the account type and the desired display name in SCOM. In this case we’re using the basic authentication type and we’ll name the user “UNIX Action Account” as shown below:
Click next and in the next screen fill in the credentials which have access to the Unix / Linux machine. In this example I’ve used the Root account. This can be any account with the proper access rights on your Unix / Linux server.
Click Next. Next you need to choose whether to manually select the targets this action account will be used against, or to target it to all computers (which is less secure, because all the admins on those machines can see the username and password). In this example we’ll choose the more secure way.
Click Create, and on the following screen click Close. It actually tells you that this first step is not enough: you still have to associate the account with a profile, which is done in the following step.
Now we’re back at our 2 portioned screen. The top portion is filled in with the newly created user. So the next step will be to target it.
Select the “A selected class, group or object” field and click the Select button. A little selection list will pop up. In this example we chose to target the action account to a class…
The class selected for this example is Unix Computer. Choose what’s manageable for your environment; another approach is to target the Run As account to the Linux Computer group or to specific Linux objects.
Click OK. Now you’re back at the 2 portioned screen with the 2 sections filled in. Hit OK at this point.
Because we’ve chosen to manually select the computers we want to target with the newly created action account, the following screen appears to do so.
Click on the User Action Account hyperlink to go to the settings page of the User Action Account.
In this example I’ve added the VSERVER07 to the list and clicked ok.
Normally all your Linux servers should now be discovered and the 1107 events should disappear. In my environment I had to manually close the events in the RMS queue, after which it also returned to a healthy state.
It’s probably a good idea to create a notification for these 1107 events to make sure you don’t miss them: they are easy to overlook but have a great impact, as the Linux servers are not monitored while these events occur.
You need to repeat all the steps to also create a UNIX Privileged user for tasks which need more elevated rights.
After this the Linux servers status went from unmonitored to monitored and all the components were detected successfully.
Today the updated management pack for Exchange 2010 with support for SP1 was published. It can be downloaded from the MS Download site:
http://www.microsoft.com/downloads/en/details.aspx?FamilyID=7150bfed-64a4-42a4-97a2-07048cca5d23
The new version is: 14.02.0071.0
Be sure to also download the explanatory doc, which holds all the changes to this management pack. Some great info in there!
Download the correct file from the site:
This is not a standard straightforward management pack; it requires installing an Exchange Correlation Engine.
This correlation engine is basically a Windows service which uses the Operations Manager SDK to first retrieve the health model and then process the state change events. The correlation engine is capable of checking the health status before raising an alert. This significantly reduces the alerts generated, as the engine logically looks at the relationship between the alerts and closes those caused by other alerts stemming from the same underlying issue.
The correlation engine is enabled by default. Be cautious when you are using helpdesk tools which don’t like alerts being closed automatically.
Changes in This Update
The Exchange 2010 SP1 version of the Exchange 2010 Management Pack includes significant improvements beyond those included in the RTM version of the Exchange 2010 Management Pack. The following list includes some of the new features and updates:
I’ll be playing with this MP shortly and post my findings.
Source: Exchange Server 2010 Management Pack Guide.doc
There were actually quite a few sessions which gave a good preview of the SCOM 2012 version, which is pre-beta now and will become RTM by the end of 2012.
Until then more and more features will be communicated.
One of the most interesting features is that SCOM 2012 will tackle one of the biggest nightmares of all SCOM admins: the SPOF which is called the RMS. All SCOM admins will have to admit that at one point or another they faced problems with their RMS acting up. In SCOM 2007 you are only allowed to run one RMS, which is actually an MS holding the Root MS role. The SDK service runs only and exclusively on this machine, making it the heart of your SCOM environment.
Your environment is highly impacted when your RMS is down.
The consequences:
Fortunately this is tackled in SCOM 2012 by a new organization of the management servers. The RMS, which was introduced in SCOM 2007, will be history. In fact all management servers (MS) will automatically join a management pool, and all of them will run the SDK service. Because every MS runs the SDK service, they can all perform the task of the old RMS.
This has some nice advantages:
The management pool is automatically created when you install the first MS and it will automatically add all MS’s which are installed afterwards.
Pretty cool feature if you ask me.
Seems like Microsoft really beefed up the network monitoring features. There’s a completely new way of discovering the devices in your environment. A nice feature is the map that is drawn of your network: you can also check which components are in the vicinity of a troublesome device, which can be very helpful in case of a faulty device.