
SCOM 2007: Dump alerts to text file and mail

Recently a client of mine needed to rethink their notification options because of various issues, so I developed a PowerShell script to gain more control over the notification process.

Case:

My client uses an in-house developed and maintained problem management system running on a mainframe platform.

The alerts that need escalation are detected in SCOM and then sent by mail to a Lotus Notes system. The data is then read through a connector between the mainframe system and the Lotus Notes dbase. The mail is scrubbed, and a series of scripts on the mainframe detects the key fields of the mail and fills them in on the ticket.

Problem:

So far so good… BUT because different systems were involved, there was an encoding issue. The mails were sent in UTF-8 encoding and were decoded correctly when viewed in the Lotus Notes client, but they stayed encoded in the Lotus Notes dbase. As a result, the scrubbed text was scrambled and unusable for the problem management system.
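The exact behaviour of the Lotus Notes connector isn't documented here, but one plausible mechanism is easy to demonstrate: a UTF-8 mail body is normally sent with a base64 (or quoted-printable) transfer encoding, so anything that reads the raw body without decoding it sees scrambled text. The following is a minimal illustration using Python's standard `email` package, not a diagnosis of the customer's system; the ticket text is a made-up example.

```python
from email.mime.text import MIMEText

# Made-up ticket body; the real field layout is dictated by the
# problem management system.
body = "Server: VSERVER05\nSeverity: Critical\nDescription: Disk full\n"

# With a UTF-8 charset, Python's email package base64-encodes the body,
# so the raw payload is unreadable until it is decoded.
utf8_msg = MIMEText(body, "plain", "utf-8")

# With a US-ASCII charset, the body is sent as 7bit plain text and the
# raw payload is identical to the readable text.
ascii_msg = MIMEText(body, "plain", "us-ascii")

print(utf8_msg["Content-Transfer-Encoding"])   # base64
print(utf8_msg.get_payload())                  # what a non-decoding parser sees
print(ascii_msg["Content-Transfer-Encoding"])  # 7bit
print(ascii_msg.get_payload() == body)         # True
```

A parser that just scrapes the raw body works fine on the second message and fails on the first, which matches the symptom described above.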

Solution:

After several attempts at sending the mail in different encoding formats, I decided to rethink the notification and detach it from the SCOM system to get more freedom in testing.

The following PowerShell script, together with a custom command notification channel, did the trick.

It's constructed in three sections: preparation and composing the file, mailing, and error handling for reporting purposes.

You can download the script here.

Preparation


First of all, we prepare everything needed to execute the script.

The areas in yellow need to be customized for your environment.

Variables which need customization:

$rootMS: Holds the RMS name. If the RMS is a single server, you can use the first method; mine is on a cluster, so I filled in the name to avoid issues with the RPC server when reading the name through WMI.

$NotifiedResState: Pick any resolution state number that is not already in use. We'll create the matching resolution state in SCOM afterwards.

$CultureInfo: Make sure you fill in the correct locale (culture name) so the date and time format is correct. For a list of all culture names, check here: Table

Compose file


In this part of the script we read all the desired elements of the alert and write them to a TXT file. You could leave out the TXT file and just write everything to a string, but I prefer to keep the TXT files as a backup to check whether a ticket was raised at any given time.

Variables which need customization:

$strResolutionState: Because the resolution state is stored in the dbase as a number rather than as its name, we need to translate the number to the correct name. This way the mail contains the resolution state name instead of the number. Fill in the resolution state number you chose earlier and the name you associated with it in SCOM. See below for how to set up this resolution state in SCOM.
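The translation boils down to a lookup from number to name. A minimal Python sketch of the same idea (the actual script is PowerShell): 0 (New) and 255 (Closed) are SCOM's built-in states, while 42/"Notified" is a hypothetical custom state standing in for whatever number and name you chose.

```python
# Built-in SCOM states plus one hypothetical custom state (42 / "Notified").
# Replace 42 and "Notified" with the values you created in the console.
RESOLUTION_STATES = {
    0: "New",
    42: "Notified",
    255: "Closed",
}


def resolution_state_name(state: int) -> str:
    """Translate the numeric resolution state stored in the dbase to its display name."""
    # Unknown numbers are reported explicitly instead of raising, so a
    # missing mapping shows up in the mail rather than killing the run.
    return RESOLUTION_STATES.get(state, f"Unknown ({state})")


print(resolution_state_name(42))  # Notified
print(resolution_state_name(7))   # Unknown (7)
```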

$strobjectname: Because not all the desired info was in the alert, I had to use three custom fields to make the mails contain compliant info for the custom-made problem management system. CustomField2 reads out the computer name. Because I don't need the full name (servername.domain.local) but just the server name, I split the name and use only the first part in the variable $Objectname.
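Keeping only the first part of a dotted name is a one-liner. A minimal Python equivalent of that split (the example name `VSERVER05.contoso.local` is made up):

```python
def short_server_name(name: str) -> str:
    """Return the host part of a fully qualified name such as 'servername.domain.local'."""
    # split(".", 1) stops after the first dot; a name without dots is returned unchanged.
    return name.split(".", 1)[0]


print(short_server_name("VSERVER05.contoso.local"))  # VSERVER05
print(short_server_name("VSERVER05"))                # VSERVER05
```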

$FilePath: The file path is built from two properties of the alert to create a unique name and avoid overwriting an existing TXT file. Use the time the alert was raised: if you use the Get-Date cmdlet to get the current date and time instead, the script can generate two files when the time changes while it runs.
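A minimal Python sketch of composing the file. The post only names one of the two alert properties (the time raised), so using the alert ID as the second one is an assumption here, as are the timestamp format and the ASCII-only output. The field names and values are made up.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def ticket_file_path(folder: Path, alert_id: str, time_raised: datetime) -> Path:
    """Build a unique, repeatable file name from two alert properties."""
    # Using the alert's own time raised (not the current time) means the
    # name cannot change halfway through a run.
    stamp = time_raised.strftime("%Y%m%d_%H%M%S")
    return folder / f"{stamp}_{alert_id}.txt"


def write_ticket(folder: Path, alert_id: str, time_raised: datetime, fields: dict) -> Path:
    """Write one 'Name: value' line per field and return the file path."""
    path = ticket_file_path(folder, alert_id, time_raised)
    text = "".join(f"{name}: {value}\n" for name, value in fields.items())
    # Plain ASCII output; any non-ASCII character is replaced with '?'.
    path.write_text(text, encoding="ascii", errors="replace")
    return path


# Example with made-up values.
out_dir = Path(tempfile.mkdtemp())
ticket = write_ticket(
    out_dir,
    "1b4e28ba-2fa1-11d2-883f-0016d3cca427",
    datetime(2010, 9, 1, 14, 18, 3),
    {"Server": "VSERVER05", "ResolutionState": "Notified"},
)
print(ticket.name)  # 20100901_141803_1b4e28ba-2fa1-11d2-883f-0016d3cca427.txt
```

Because the name depends only on alert data, running the script twice for the same alert overwrites one file instead of creating two.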

Of course you can adapt the fields and structure to your liking, but for our problem management system this format had to be followed strictly to be able to scrub the mail.

Note: CustomField1 and CustomField3 contain static text passed by the rule that generates the alert.

Mailing and error handling


In the last part of the script, the mail is sent to its destination.

I'm using static parameters here because the destination will not change that often. However, if you have multiple destinations, it's best to use a variable and pass it as an argument when you run the notification command from SCOM.

Variables which need customization:

$Sender: Fill in the From email address

$OKRecipient: This will be the email address where you want to send the mail to when everything went fine

$strOKSubject: Define the subject for the mail when everything was fine.

$ErrRecipient: This will be the email address where you want to send the mail with the error.

$strErrSubject: Define the subject for the error mail

$strErrBody: Small body to notify something went wrong along the way.

Note: because of the encoding issues in my customer's environment, I used a command-line mail utility I've used quite often: Blat. It's a lightweight Windows command-line mailer which can be downloaded here: Blat Download

More info on Blat can be found here: Blat Info
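The choice between the success mail and the error mail can be sketched as building one of two Blat command lines. This is a minimal Python illustration, not the author's PowerShell; the parameters mirror $Sender, $OKRecipient, $strOKSubject, $ErrRecipient, $strErrSubject and $strErrBody, and all addresses are placeholders. It uses Blat's `-to`, `-f`, `-subject` and `-body` switches; check `blat -h` for the exact syntax of your Blat version.

```python
from pathlib import Path


def blat_command(ok: bool, ticket_file: Path, sender: str,
                 ok_recipient: str, ok_subject: str,
                 err_recipient: str, err_subject: str, err_body: str) -> list:
    """Return the Blat argument list for either the success or the error mail."""
    if ok:
        # Success: the ticket text file itself becomes the message body.
        return ["blat", str(ticket_file), "-to", ok_recipient,
                "-f", sender, "-subject", ok_subject]
    # Failure: no usable ticket file, so send a short error body instead.
    # "-" as the file name tells Blat the body comes from -body.
    return ["blat", "-", "-body", err_body, "-to", err_recipient,
            "-f", sender, "-subject", err_subject]


cmd = blat_command(True, Path("ticket.txt"), "scom@example.com",
                   "tickets@example.com", "SCOM ticket",
                   "admins@example.com", "Ticket creation failed",
                   "Create_Ticket failed; check the event log.")
print(cmd)
# To actually send: subprocess.run(cmd, check=True)
```

Passing the arguments as a list (rather than one string) avoids quoting problems with subjects that contain spaces.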

The install and config info for Blat on the RMS is at the end of this blog post.

Last but not least I’m writing an event in the event log for successful and unsuccessful script runs. This can be used to set up alerting in SCOM to give you a quick warning when the ticketing is not working anymore.

At the end, we unload the snap-in to leave a clean system and avoid error messages the next time the script runs.


Things which need to be in place to use this script

In order to use this script, some things need to be configured in your SCOM environment and on your RMS:

  • The script needs to run on your RMS.
  • The PowerShell execution policy on your RMS needs to be set to RemoteSigned or Unrestricted. More info here: Execution policy Powershell
  • Blat needs to be installed on your RMS.
  • The custom resolution state needs to be added to your SCOM environment: Check here to create
  • The command notification channel and subscribers need to be configured: Check here to create

Install Blat:

  • Download Blat here: Blat Download
  • Extract the archive to your %SystemRoot%\System32 folder so that it is in the path
  • Open a command prompt (on Windows Server 2008, open an elevated one just to be safe)
  • Install Blat with the command: blat -install <your SMTP server> <the sender address you would like to use> <number of retries if sending the mail fails>

Any tips or hints on improving this script are always welcome…

SCOM 2007: Setup Command Notification Channel + Subscriber

Sometimes it's necessary to launch a custom script or other action after an alert is detected. This can be any executable script or program.

In my particular case I’m using this to launch scripts when an alert is detected to properly escalate the alert and perform additional tasks on the alert.

So how do you make sure that the script you intend to run will actually run when a predefined alert is raised?

By creating a Command notification channel and subscription…

Let’s start with setting up the command notification channel.

Note: I’m using my script Create_Ticket.Ps1 as documented here. The parameters I’m passing are useful for this script but you can pass many more parameters according to your needs.

First of all, open the Notification Channels by opening the SCOM console and going to Administration > Notifications > Channels.


Right-click in the right pane and choose New > Command…


On the Settings tab, fill in what you want to run:

  • Full path of the command file: In my case this is PowerShell (powershell.exe), as I want to run a PowerShell script.
  • Command line parameters: I'm running a PowerShell script and passing the AlertID of the specific alert as an argument, which the script then uses. You can pass any arguments you like here.
  • Startup folder for the command line: This is basically the folder of the program you want to run.
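For the command line parameters, SCOM 2007 R2 substitutes notification variables such as `$Data/Context/DataItem/AlertId$` into the string, so a typical value looks like `-Command "& 'C:\Scripts\Create_Ticket.ps1' '$Data/Context/DataItem/AlertId$'"` (the script path is a placeholder). On the receiving side it's worth checking that the argument really is an alert GUID before querying SCOM. A minimal sketch of that check in Python; `parse_alert_id` is a hypothetical helper, not part of the original script.

```python
import uuid


def parse_alert_id(argv: list) -> uuid.UUID:
    """Return the alert GUID passed as the first command-line argument."""
    if len(argv) < 2:
        raise ValueError("usage: create_ticket <AlertId>")
    # uuid.UUID accepts both bare and {braced} GUID strings and raises
    # ValueError for anything else, e.g. an unexpanded variable name.
    return uuid.UUID(argv[1].strip())


print(parse_alert_id(["create_ticket", "{1b4e28ba-2fa1-11d2-883f-0016d3cca427}"]))
```

An unexpanded `$Data/Context/DataItem/AlertId$` string (for example when the channel is tested by hand) is rejected with a `ValueError`.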

Click Finish.


At this point your Command Notification Channel is set up. The next thing you need to configure is the trigger which will run this Command Notification Channel. This is done by creating a Subscriber:

Open the SCOM console and navigate to Administration > Notifications > Subscribers.


Right click in the right pane and choose New…


Fill in a name for the Subscriber


Leave “Always send notifications” selected or specify a time window (e.g. during business hours only) and click Next.


Click Add to add a subscriber address to the list. The following window appears:


Fill in the address name and click next


  • Channel Type: Select Command in the drop-down list.
  • Command Channel: Select the previously created channel from the drop-down list; in this case it's “Ticket”.
  • Click Next.


Leave the always send notifications setting or change according to your needs.


Click Finish and you have configured your Command to run whenever you subscribe to an alert with this channel.

SCOM 2007: Create custom Alert Resolution States

Sometimes it’s useful to make your own Custom Alert Resolution States to further classify your alerts in the console and use these states to trigger different actions using various scripts.

I'll be posting some scripts that use these custom alert resolution states, so I'm documenting here how to configure them.

Open your SCOM console and go to Administration > Settings > Alerts.


Click New…


Type in the resolution state display name and choose a unique ID. Click OK.
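The unique ID is a number from 0 to 255, where 0 (New) and 255 (Closed) are SCOM's built-in states, so a custom state has to use a free number between 1 and 254. A minimal Python sketch for picking and checking such an ID; the helper names and the example set of IDs in use are hypothetical.

```python
# SCOM resolution states are numbered 0-255; 0 (New) and 255 (Closed)
# are built in, so custom states must use a free ID between 1 and 254.


def next_free_state_id(in_use: set) -> int:
    """Return the lowest ID between 1 and 254 that is not already in use."""
    for candidate in range(1, 255):
        if candidate not in in_use:
            return candidate
    raise ValueError("no free resolution state IDs left")


def check_state_id(candidate: int, in_use: set) -> int:
    """Raise ValueError if candidate is built in, out of range or already taken."""
    if not 1 <= candidate <= 254:
        raise ValueError(f"{candidate} is outside 1-254")
    if candidate in in_use:
        raise ValueError(f"{candidate} is already in use")
    return candidate


existing = {1, 5}  # hypothetical custom IDs already defined
print(next_free_state_id(existing))  # 2
print(check_state_id(42, existing))  # 42
```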


And we are done.

Not much to it, but it makes life a little easier when you want to classify different alerts.

In the next series of blog posts I'll frequently be using these custom alert resolution states to classify and report on different types of alerts.

SCOM 2007: installation bypassing the prerequisite checker

When you install SCOM 2007, the prerequisite checker is usually right when it reports that a prerequisite for a specific role or component is not met.

However, if you are 100% sure everything is in place, you can bypass the prerequisite checker by running the installation with the following command:

MSIEXEC /i <path>\MOM.msi /qn /l*v D:\logs\MOMUpgrade.log PREREQ_COMPLETED=1

This is however NOT supported by Microsoft.

Note: in Windows Server 2008 always run commands in an elevated prompt.

This should be your last resort to get things going. Most of the time there’s indeed a prerequisite not met and therefore the checker is right.

If you want to double check your prerequisites you can find them here:

http://technet.microsoft.com/en-us/library/bb309428.aspx

A known issue with the prerequisites is that ASP.NET is not detected correctly. More info here: http://support.microsoft.com/kb/934759

SCOM: Moving the Opsdb Datawarehouse to another drive

Recently a customer asked me to move the Opsdb Datawarehouse (DW) to another drive because the disk on which it was originally installed was not big enough. In fact, they wanted to move the DW to an iSCSI disk to boost performance.

To verify whether there would be an issue or whether it would be a straightforward move, I did some browsing in the biggest manual out there… the Internet!

However, everything that came up described moves from one server to another, not from one drive to another on the same server…

I did some testing in my lab and thought I'd share the outcome with you.

First of all, this is your DW you are tampering with. Make sure you have proper backups of your dbase and read the entire post before proceeding, just to be on the safe side. It would be a shame to lose all your data older than 8 days (if that is your grooming setting) because of a bad manipulation.

Ok enough said. Let’s get things started.

These are the steps I followed and in my case everything went smoothly without any problems.

First of all (again), take backups of your dbase, and secondly plan SCOM downtime. To be absolutely sure that there's no interference with or blocking of the DW dbase, shut down your RMS and any MS and GW servers in your environment (or at least in the management group the DW belongs to). Some sources just drop the connections to the dbase, which is an option as well, but I prefer the first option because in my opinion it's safer.

Connect to the SQL server where your DW is located and open Microsoft SQL Server Management Studio.


Open up the connection to your DW. In my case it is residing on my VSERVER05.

Again better safe than sorry. Backing up!


The DW can be very big, so the backup may take some time. Wait until it has finished.

At this point, shut down your environment: RMS, MS and GW servers. This sounds like a draconian measure, but it ensures that your environment is completely shut down and no queries are made to the dbase.

When this is done we can proceed to move the dbase

Take the DW offline by right clicking it and choosing “Take Offline”


A small dialog will pop up and, if all goes well, it will eventually tell you the dbase was taken offline successfully. Notice the red arrow on the DW dbase.

Now take the ReportServer$OpSDBDW and ReportServer$OPSDBDWTempDB dbases offline as well. Note that these dbases can have a different name in your environment or may not be present at all.

Note: My OpsdbDW is installed in a separate SQL instance. Be cautious with restarting your SQL service as this impacts all dbases under this instance.

When all the dbases are down they can be detached. This is done by right clicking the dbase > tasks > “detach”.


Choose the option to drop the connections to the dbase and hit OK.

Now we can copy (yes copy) the data. Again better safe than sorry and make a copy of the data rather than moving it.

After the copy has finished, we attach the copied DW to SQL Server.

Right click Databases and click Attach:


Select your dbase and attach:


In this case I’m moving my DW from E: to F: drive.


NOTE: The correct log file is not selected automatically. Make sure you select it manually by clicking the button next to the path in the lower section.

When the attach completes successfully, your dbases have been moved to the new drive.
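If you prefer a script over clicking through Management Studio, the same offline, detach and attach steps can be written in T-SQL. The Python helper below only generates the statements for review; it is a sketch, and the dbase name and file paths in the example are placeholders, not the names used in this environment. Copy the .mdf and .ldf files between the detach and the attach, and list the log file explicitly in the attach, just as in the GUI.

```python
def move_database_tsql(db_name: str, data_file: str, log_file: str) -> list:
    """Return T-SQL statements equivalent to the GUI steps above."""
    return [
        # 1. Take the dbase offline, rolling back any open transactions.
        f"ALTER DATABASE [{db_name}] SET OFFLINE WITH ROLLBACK IMMEDIATE;",
        # 2. Detach it so SQL Server releases the files.
        f"EXEC sp_detach_db @dbname = N'{db_name}';",
        # 3. (Copy the .mdf/.ldf files to the new drive outside SQL Server.)
        # 4. Attach the copies, naming the data AND the log file explicitly.
        f"CREATE DATABASE [{db_name}] ON (FILENAME = N'{data_file}'), "
        f"(FILENAME = N'{log_file}') FOR ATTACH;",
    ]


for statement in move_database_tsql(
    "OperationsManagerDW",
    r"F:\SQLData\OperationsManagerDW.mdf",
    r"F:\SQLData\OperationsManagerDW_log.ldf",
):
    print(statement)
```

Only pass trusted, hand-typed names into this helper: the values are pasted into the SQL text as-is.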

Start your SCOM environment again: start your RMS first and then any MS and/or GW servers you might have.

Just to be on the safe side, verify that you're able to generate a report in the Reporting view of your console with data older than 7 days (if your grooming settings are different, adjust this so the report covers data older than your grooming setting).
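Picking the report's end date is simple date arithmetic: end the report before the grooming window starts, so every row it shows can only have come from the DW. A minimal Python sketch; the extra day of margin is my own choice, not something the post prescribes.

```python
from datetime import date, timedelta


def latest_report_end(grooming_days: int, today: date) -> date:
    """Last day a test report may end on so that it only covers data
    older than the operational dbase grooming window."""
    # One extra day of margin keeps partially groomed days out of the report.
    return today - timedelta(days=grooming_days + 1)


print(latest_report_end(7, date(2010, 9, 10)))  # 2010-09-02
```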

If all goes well you now have successfully moved your dbase to another drive and you are free to delete the initial copy on your old location.

Preparing SCOM for cross platform monitoring

Today at a customer I came across a problem with cross platform monitoring.

They had several Linux servers running the Red Hat distribution and had imported the Linux management pack for cross-platform monitoring of their Linux environment.

They had installed the agents on all the Linux servers but had not configured the proper action accounts to perform discovery and monitoring.

While looking for documentation for my client on how to perform these actions, I came across this article on the Microsoft website.

http://technet.microsoft.com/en-us/library/dd788981.aspx

However, the instructions are outdated for SCOM 2007 R2, so I'll document the steps below.

First things first.

If you notice these events in the Operations Manager Eventlog:

Event Type: Error
Event Source: HealthService
Event Category: Health Service
Event ID: 1107
Date: 11/24/2008
Time: 2:18:03 PM
User: N/A
Computer: RMS_SERVER
Description:
Account for RunAs profile in workflow “Microsoft.Linux.RedHat.Computer.Discovery”, running for instance “Linux_server_name” with id”{384D2415-A49D-4002-768B-51D8D2EDBDD*}’ is not defined. Workflow will not be loaded. Please associate an account with the profile. Management group “group_name”

This most likely indicates a problem with the Run As accounts used to connect to your Linux environment.

Because parts of the article above are outdated, here's the proper way, with clearer instructions and some extra info I've learned in the field while configuring this for my customer.

Outlined steps:

  1. Open the Operations console with an account that is a member of the Operations Manager 2007 R2 Administrators profile.

  2. Select the Administration view.

  3. In the navigation pane, under Run As Configuration, select Profiles.

  4. In the results pane, double-click UNIX Action Account or UNIX Privileged Account. You need to configure both.

  5. Click Next on the first page. This is the overview page; nothing can be changed here.

  6. Click Add to create the action account that we are going to link to the UNIX Action Account profile.

  7. In the next screen, select which user you are going to use as an action account on the UNIX/Linux system. This screen consists of two portions: the upper portion defines the user and the lower portion defines the target.

  8. Select the Run As account from the drop-down list or create a new one. In this case we'll create a new one: click New…

  9. Click Next on the welcome screen to proceed with creating the account.

  10. On the next screen, fill in the type of the account and the desired display name in SCOM. In this case we use the Basic Authentication type and name the account “UNIX Action Account”.

  11. Click Next and, on the next screen, fill in the credentials that have access to the UNIX/Linux machine. In this example I've used the root account. This can be any account with the proper access rights on your UNIX/Linux server.

  12. Click Next. Now select whether you want to manually select the computers this action account is distributed to, or distribute it to all computers (which is less secure, because all the admins on those machines can see the username and password). In this example we choose the more secure option.

  13. Click Create, and on the following screen click Close. That screen tells you that creating the account is not enough: you still have to associate it with a profile, which is done in the following steps.

  14. Now we're back at the two-portion screen. The upper portion is filled in with the newly created account, so the next step is to target it.

  15. Select “A selected class, group, or object” and click the Select button. A small selection list pops up. In this example we chose to target the action account to a class…

  16. The class selected for this example is UNIX Computer. Check what's manageable for your environment; another approach is to target the Run As account to the Linux computer group or to specific Linux objects.

  17. Click OK. You're now back at the two-portion screen with both sections filled in. Click OK.

  18. Click Save on the next screen.

  19. Because we chose to manually select the computers for the newly created action account, the following screen appears so you can do so.

  20. Click the UNIX Action Account hyperlink to go to the settings page of the account.

  21. In this example I've added VSERVER07 to the list and clicked OK.

At this point all your Linux servers should be discovered and the 1107 events should disappear. In my customer's environment I had to manually close the events in the RMS queue, after which the RMS also returned to a healthy state.

It's a good idea to create a notification for these 1107 events: they are easy to miss but have a big impact, because the Linux servers are not monitored while these events occur.
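If you build such a notification around your own script, the useful details are the two quoted names in the event description. A minimal Python sketch that pulls them out of an event 1107 description, based on the sample event shown above; the regular expressions and the helper name are hypothetical and accept both straight and curly quotes, since the event text mixes them.

```python
import re

# Patterns for the two quoted names in an event 1107 description.
# The event text mixes straight and curly quotes, so both are accepted.
WORKFLOW_RE = re.compile(r'workflow\s+[“"]([^”"]+)[”"]')
INSTANCE_RE = re.compile(r'instance\s+[“"]([^”"]+)[”"]')


def summarize_1107(event_id: int, description: str):
    """Return the affected workflow and instance for an event 1107, else None."""
    if event_id != 1107:
        return None
    workflow = WORKFLOW_RE.search(description)
    instance = INSTANCE_RE.search(description)
    return {
        "workflow": workflow.group(1) if workflow else None,
        "instance": instance.group(1) if instance else None,
    }


sample = ("Account for RunAs profile in workflow “Microsoft.Linux.RedHat.Computer.Discovery”, "
          "running for instance “Linux_server_name” is not defined.")
print(summarize_1107(1107, sample))
```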

Repeat all the steps for the UNIX Privileged Account profile as well, which is used for tasks that need more elevated rights.

After this the Linux servers status went from unmonitored to monitored and all the components were detected successfully.

SCOM: #Exchange 2010 SP1 MP is here

Today the updated management pack for Exchange 2010 with support for SP1 was published. It can be downloaded from the MS Download site:

http://www.microsoft.com/downloads/en/details.aspx?FamilyID=7150bfed-64a4-42a4-97a2-07048cca5d23

The new version is: 14.02.0071.0

Be sure to also download the accompanying guide, which documents all the changes to this management pack. Some great info in there!

Download the correct file from the site:

  • Exchange2010ManagementPackForOpsMgr2007-EN-i386.msi
  • Exchange2010ManagementPackForOpsMgr2007-EN-x64.msi

This is not a standard, straightforward management pack: it requires installing the Exchange Correlation Engine.

The Correlation Engine is basically a Windows service that uses the Operations Manager SDK to first retrieve the health model and then process state change events. It checks the health status before raising an alert. This significantly reduces the number of alerts generated, because the engine looks at the relationships between alerts and closes the ones that are merely symptoms of the same underlying issue.
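The post doesn't describe the engine's internals, so the following is only a simplified illustration of the idea (keep the alert for the underlying issue, suppress alerts explained by an alerting dependency), not the Correlation Engine's actual algorithm. The dependency model and component names are made up.

```python
def root_cause_alerts(alerting: set, depends_on: dict) -> set:
    """Return the alerts that are not explained by an alerting dependency.

    alerting   -- components that currently raise an alert
    depends_on -- hypothetical model: component -> component it relies on
    """
    roots = set()
    for component in alerting:
        parent = depends_on.get(component)
        visited = set()
        explained = False
        # Walk up the chain; an alerting ancestor means this alert is a symptom.
        while parent is not None and parent not in visited:
            if parent in alerting:
                explained = True
                break
            visited.add(parent)
            parent = depends_on.get(parent)
        if not explained:
            roots.add(component)
    return roots


deps = {"OWA": "Mailbox database", "Mailbox database": "Disk"}
print(root_cause_alerts({"OWA", "Mailbox database", "Disk"}, deps))  # {'Disk'}
```

This also shows why helpdesk integrations can get confused: the OWA and mailbox database alerts disappear without anyone closing them by hand.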

The Correlation Engine is enabled by default. Be cautious if you use helpdesk tools that don't like alerts being closed automatically.

 

Changes in This Update

The Exchange 2010 SP1 version of the Exchange 2010 Management Pack includes significant improvements beyond those included in the RTM version of the Exchange 2010 Management Pack. The following list includes some of the new features and updates:

  • Capacity planning and performance reports: New reports dig deep into the performance of individual servers and provide detailed information about how much capacity is used in each site.
  • SMTP and remote PowerShell availability report: The management pack now includes two new availability reports for SMTP client connections and management end points.
  • New Test-SMTPConnectivity synthetic transaction: In addition to the inbound mail connectivity tasks for protocols such as Outlook Web App, Outlook, IMAP, POP, and Exchange ActiveSync, the Management Pack now includes SMTP-connectivity monitoring for outbound mail from IMAP and POP clients. For information about how to enable this feature, see Optional Configurations.
  • New Test-ECPConnectivity view: Views for the Exchange Control Panel test task are now included in the monitoring tree.
  • Cross-premises mail flow monitoring and reporting: The Management Pack includes new mail flow monitoring and reporting capabilities for customers who use our hosted service.
  • Improved Content Indexing and Mailbox Disk Space monitoring: New scripts have been created to better monitor content indexing and mailbox disk space. These new scripts enable automatic repair of indexes and more accurate reporting of disk space issues.
  • The ability to disable Automatic Alert Resolution in environments that include OpsMgr connectors: When you disable Automatic Alert Resolution, the Correlation Engine won't automatically resolve alerts. This lets you use your support ticketing system to manage your environment. For information about how to disable this feature, see Optional Configurations.
  • Several other updates and improvements were also added to this version of the Management Pack, including the following:
      · Suppression of alerts when the alerts only occur occasionally was added to many monitors.
      · Most of the event monitors in the Exchange 2010 Management Pack are automatically reset by the Correlation Engine. Automatic reset was added to those event monitors so that issues aren't missed the next time they occur. For a list of the event monitors that are not reset automatically, see Understanding Alert Correlation.
      · Monitoring was added for processes that crash repeatedly.
      · Additional performance monitoring was added for Outlook Web App.
      · Monitoring of Active Directory access was improved.
      · Monitoring of anonymous calendar sharing was added.
      · Reliability of database offline alerts was improved.
      · Monitoring for the database engine (ESE) was added.

I’ll be playing with this MP shortly and post my findings.

Source: Exchange Server 2010 Management Pack Guide.doc

First insight into SCOM 2012: What’s up next…

There were quite a few sessions that gave a good preview of SCOM 2012, which is pre-beta now and is expected to RTM by the end of 2012.

Until then more and more features will be communicated.

One of the most interesting features is that SCOM 2012 will tackle one of the biggest nightmares of every SCOM admin: the single point of failure (SPOF) called the RMS. Every SCOM admin will have to admit that at one point or another they faced problems with an RMS that was acting up. In SCOM 2007 you can run only one RMS, which is actually a management server (MS) holding the root management server role. The SDK service runs exclusively on this machine, making it the heart of your SCOM environment.

Your environment is highly impacted when your RMS is down.

The consequences:

  • You cannot perform any admin tasks.
  • All consoles (including the web console) connect to the RMS and will not open.
  • Product connectors depend on the RMS and therefore cannot get info out of SCOM.
  • Subscriptions depend on the RMS, so you will not receive notifications.
RMS becomes “Management Pool”


Fortunately, SCOM 2012 tackles this with a new organization of the management servers. The RMS, introduced in SCOM 2007, will be history. Instead, all management servers (MS) are automatically joined to a management pool, and all of them run the SDK service. Because every MS runs the SDK service, any of them can perform the tasks of the old RMS.

This has some nice advantages:

  • Management servers can easily be added and removed because there's automatic failover between all the MS in the management pool.
  • There's no need anymore to cluster management servers to make the RMS role highly available.
  • High availability is now available out of the box!
  • Management servers share the workload across the entire management pool.

The management pool is automatically created when you install the first MS and it will automatically add all MS’s which are installed afterwards.

Pretty cool feature if you ask me!

New network monitoring features

It seems Microsoft really beefed up the network monitoring features. There's a completely new way of discovering the devices in your environment. A nice feature is the map that is drawn of your network. You can also check which components are in the vicinity of a troublesome device, which can be very helpful in case of a faulty device.

  • The network will be drawn in topology maps, and monitoring gets gauges and dashboards to make network monitoring much cleaner and sleeker.
  • Microsoft plans to support roughly 90 vendors out of the box (if I recall correctly), so not much customization will be needed, leaving more time to tackle the real day-to-day issues!
  • Monitoring includes all the nuts and bolts of your network, such as network port monitoring, memory counters, VLAN health, HSRP health and connection health at end points.
