Server Monitor - Peter Hahndorf

Updated: 20 August 2020

A PowerShell script to monitor various aspects of a Windows Server OS

This version works on the concept of plugins. There are three types of plugins:

Providers that provide or collect information about issues from various sources.
Aggregators that try to summarize the information.
Loggers that report the information in some form.

All three plugin types are implemented as 'support files' and have to be in the same directory as the main script.

The script first executes all providers. Any errors or warnings are stored in an internal data structure.

Then the aggregators try to eliminate redundant data.

Finally every logger is executed. Each logger has access to the data and can use it however it wants.

Usage
Loggers
Providers
Aggregators
Write your own provider
Tips and Tricks
Software Requirements
Required User Permissions
Download

Usage

Copy all the files into an empty directory. Copy 'example.xml' to 'ServerMonitor.xml', open it in a text editor and adjust the settings.
Open a PowerShell console, navigate into your ServerMonitor directory and run:

.\ServerMonitor.ps1

This executes all the providers and their checks as specified in the xml file.
It then logs the found issues to all the loggers specified in the config file.

If scripts are not enabled on your server, enable them:

set-executionpolicy remotesigned

You would usually schedule the execution of ServerMonitor with Windows Task Scheduler. I run it every 2 hours.

For more help use:

help .\ServerMonitor.ps1 -full

To show a bit more of what is going on during the execution use:

.\ServerMonitor.ps1 -verbose

Loggers

After checking for issues, ServerMonitor can log them using various loggers. Each logger has to be implemented in a file sml*.ps1 in the same directory as ServerMonitor.ps1. It has to have one function named LogTo*, where the variable part must be the same as the variable part in the file name. So a file smlEmail.ps1 must have a function LogToEmail.

If the logger file is present, that function gets called. So if you don't want to use the logger just remove the file.
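
The naming rule can be sketched as follows. Get-LoggerFunctionName is a hypothetical helper written for illustration, not a function from ServerMonitor, and the real script may derive the name differently.

```powershell
# Sketch only: maps a logger file name to the function name described above.
# Get-LoggerFunctionName is a hypothetical helper, not part of ServerMonitor.
function Get-LoggerFunctionName {
    param([Parameter(Mandatory)][string]$FileName)

    # 'sml' + variable part + '.ps1'; the variable part is reused in the function name.
    if ($FileName -match '^sml(?<name>.+)\.ps1$') {
        return "LogTo$($Matches['name'])"
    }
    return $null   # not a logger file
}

Get-LoggerFunctionName -FileName 'smlEmail.ps1'
```

A file that does not follow the sml*.ps1 pattern, such as a provider file, yields $null in this sketch.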

Your config file should always have a 'loggers' node. Each logger has a sub-node and on that an 'enabled' attribute. If the sub-node is missing or enabled is not 'true' the logger is not used.
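
As a rough illustration of that rule, the sketch below reads a small config and decides whether a logger is used. The root element 'servermonitor' and the function Test-LoggerEnabled are assumptions made for this example. Only the 'loggers' node and the 'enabled' attribute come from the description above.

```powershell
# Sketch only. The <servermonitor> root element is a hypothetical name.
[xml]$config = @'
<servermonitor>
  <loggers>
    <console enabled="true" />
    <file enabled="false" base="C:\mylogs\servermonitor_" />
  </loggers>
</servermonitor>
'@

# Hypothetical helper: a logger is used only if its sub-node exists
# and its 'enabled' attribute is 'true'.
function Test-LoggerEnabled {
    param([xml]$Config, [string]$Name)

    $node = $Config.SelectSingleNode("//loggers/$Name")
    return (($null -ne $node) -and ($node.GetAttribute('enabled') -eq 'true'))
}

Test-LoggerEnabled -Config $config -Name 'console'
```

Asking for 'file' (disabled) or 'email' (sub-node missing) returns $false in this sketch.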

Console

Writes the issues to the console window.

<loggers>
  <console enabled="true" />
</loggers>

You can also specify a -LogToConsole switch as a parameter to the script to enable this ad hoc, even if it is disabled in the config file.

File

Writes the issues to a file. Set 'base' to an existing path and file name prefix such as C:\logs\servermonitor\sm. The logger appends the current date to the file name and creates a new file every day.

<loggers>
  <file enabled="true" base="C:\mylogs\servermonitor_" />
</loggers>

Database

Writes the issues to a database table. Enable by setting 'connectionstring' to a full ADO.NET connection string. You may encrypt the whole string in LocalMachine scope using DPAPI.

<loggers>
  <database enabled="true" connectionstring="Server=.;Database=master;Integrated Security=True;" />
</loggers>

Email

Sends the issues to an email address. Enable by setting 'recipient'. You also need to set 'host' and you should set 'sender'. The other attributes are optional.

<loggers>
  <email enabled="true" 
    recipient="me@gmail.com" 
    recipientcc="webmaster@foo.net" 
    host="mail.myserver.com" 
    sender="servermonitor@mydomain.com" 
    subject="Server Monitor Issues on %computername%" 
    user="ausername" 
    password="apassword" 
    html="true" />
</loggers>

Providers

Out of the box we have the following providers:

DiskSpace
WinEvents
HyperV
Services
WinUpdate
SQLCount
FileCount
IIS
FileVersion
UserCount
EventLog
Process Working Set Size
Windows Feature Count
Folder Size
File Age
Certificates

The usage of providers is configured in the XML configuration file.
Each provider has its own node with its name. Inside the node multiple 'check' nodes describe what the provider should do.
If you never want to use a certain provider, just delete the file for it from the directory.

DiskSpace

Checks hard disk space.
If a disk has less free space in percent than the specified minimum, a warning is created.
If the free space is less than half of the specified value, an error is created instead.

  <diskspace>
    <check drive="*" min="30" />
  </diskspace>

Checks for a minimum disk space of 30% on all fixed drives.

  <diskspace>
    <check drive="C" min="50" />
    <check drive="F" min="20" />
  </diskspace>

Checks for a minimum disk space of 50% on drive C: and 20% on drive F:
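
The warning and error rule can be sketched like this. Get-DiskSpaceSeverity is a hypothetical function, and whether the real provider uses strict comparisons at the boundaries is an assumption.

```powershell
# Sketch only: classifies a free-space percentage against the configured 'min'.
# Get-DiskSpaceSeverity is a hypothetical name; strict 'less than' is an assumption.
function Get-DiskSpaceSeverity {
    param([double]$FreePercent, [double]$MinPercent)

    if ($FreePercent -lt ($MinPercent / 2)) { return 'Error' }    # below half of min
    if ($FreePercent -lt $MinPercent)       { return 'Warning' }  # below min
    return 'OK'
}

Get-DiskSpaceSeverity -FreePercent 25 -MinPercent 30
```

With min="30", a drive with 25% free would produce a warning and a drive with 10% free an error.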

FolderSize

Checks if a specified folder on a local disk exceeded a total size limit.

  <foldersize>
    <check path="C:\temp" max="100MB" />
    <check path="C:\MyApps" max="2GB" />
  </foldersize>

You specify the path and the maximum total size. You can use KB, MB, GB or just bytes.
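
One way to turn such a 'max' value into a number of bytes is sketched below. ConvertTo-ByteCount is a hypothetical helper. The provider itself may parse the value differently.

```powershell
# Sketch only: converts values like '100MB', '2GB' or '512' into bytes.
# ConvertTo-ByteCount is a hypothetical helper, not part of ServerMonitor.
function ConvertTo-ByteCount {
    param([Parameter(Mandatory)][string]$Size)

    $units = @{ 'KB' = 1KB; 'MB' = 1MB; 'GB' = 1GB }

    if ($Size -match '^\s*(?<number>\d+)\s*(?<unit>KB|MB|GB)?\s*$') {
        $multiplier = 1
        # The unit group is optional; without it the value is plain bytes.
        if ($Matches['unit']) { $multiplier = $units[$Matches['unit']] }
        return [int64]$Matches['number'] * $multiplier
    }
    throw "Unrecognized size value: '$Size'"
}

ConvertTo-ByteCount -Size '100MB'
```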

WinEvents

Checks the Windows event logs. Both the traditional logs 'System', 'Application' and 'Security' and the newer 'Applications and Services' logs can be checked.
Every time the script runs, it stores the 'LastCheck' time in a file, and the next time it only looks for events after that time. You can override the LastCheck with the '-LastCheck' parameter.
You can filter the events, because usually some of them are not that interesting and others you have decided to ignore.

<winevents>
  <check log="Application" types="error" sources="!Microsoft-Windows-LoadPerf" ids="" />
  <check log="Microsoft-Windows-TaskScheduler/Operational" types="error,warning" sources="" ids="!322,1024" />
</winevents>

We define two checks. The first one looks in the application log for 'error' events but excludes all events with the source 'Microsoft-Windows-LoadPerf'. The bang character at the beginning tells the provider to exclude the source. Without it, the provider would only look for events from 'Microsoft-Windows-LoadPerf'.
The second check looks for errors and warnings of the task scheduler but ignores all events with an ID of 322 or 1024.

<winevents>
  <check log="System" types="error" sources="Schannel,NetBT" ids="436" />
</winevents>

Finds only errors in the system log for the sources 'Schannel' or 'NetBT', and only if the ID is 436.
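
The include and exclude behavior of the 'ids' attribute can be sketched as below. Test-EventIdSelected is a hypothetical function. Treating an empty value as 'no filtering' and matching IDs exactly are assumptions based on the examples above.

```powershell
# Sketch only: decides whether an event ID passes an 'ids' attribute value.
# Test-EventIdSelected is a hypothetical helper, not part of ServerMonitor.
function Test-EventIdSelected {
    param([int]$EventId, [string]$Ids)

    if ([string]::IsNullOrWhiteSpace($Ids)) { return $true }   # empty: no filtering

    $exclude = $Ids.StartsWith('!')                            # leading bang: exclude list
    $list = $Ids.TrimStart('!').Split(',') |
        ForEach-Object { $_.Trim() } |
        Where-Object { $_ }
    $listed = $list -contains "$EventId"

    if ($exclude) { return (-not $listed) }
    return $listed
}

Test-EventIdSelected -EventId 322 -Ids '!322,1024'
```

With ids="!322,1024" event 322 is dropped and event 500 is kept. With ids="436" only event 436 is kept.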

<winevents>
   <check log="Security" types="AuditFailure" sources="" ids="" />
</winevents>

The security log is a bit special: it doesn't have Errors or Warnings, but 'AuditSuccess' and 'AuditFailure' events. Here you can specify only one type. We support a third type, 'AuditAdmin', which covers events like 'Log cleared'.

<winevents>
   <check log="*" types="error"   sources="" ids="" />
   <check log="!WMI" types="warning" sources="!NTFS,disk," ids="" />
</winevents>

Here are two special cases. Providing a wildcard star '*' for the log value means 'search all available logs' (when running the script as an administrator you can access a few extra logs). In this case we want all errors in all logs.
What if we want all logs except that one stupid log that is full of errors? Start the log value with a bang to say 'exclude the following, but do all other logs'. The string after the bang is a regular expression, so in this case we exclude all logs that have 'WMI' in their name. We also exclude all sources that match NTFS or disk.

WinEvents Ignore Filters

Even though you can exclude certain events in the check nodes above, it quickly becomes tricky to exclude specific events you know are safe to ignore. Ignoring an EventID in one log should not always ignore the same EventID in all other logs.
In Version 3.5 we introduced ignore-filters. These are applied after events are found, but before they are logged. This allows you to exclude very specific events from logging.
Here is an example:

<winevents>
  <check log="Security" types="AuditFailure" sources=""/>
  <filters>
    <ignore enabled="true" id="5555" source="Security-Auditing" text="johndow">
      this happens because old Joe always gets his password wrong!
    </ignore>
    <ignore enabled="true" id="5551|5558" source="." text=".">
      ignore all these events, regardless
    </ignore>
  </filters>
</winevents>

Let's look at this in detail. Under winevents you can define a new node <filters> on the same level as <check>.
Inside filters you can define an unlimited number of <ignore> nodes.

enabled = has to be 'true' to use this filter
id = a regular expression to match the EventId to exclude
source = a regular expression to match the source to exclude
text = a regular expression to match the content of the event to exclude

All three regular expressions have to match to exclude the event, so if you are not certain, use wildcards or even a '.' to match anything.
The text of the ignore node can be used to describe why this event should be excluded. It is not used by the script itself.
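
The 'all three must match' rule can be sketched as follows. Test-IgnoreFilter and the shape of the item object (Id, Source, Text properties) are assumptions for illustration only.

```powershell
# Sketch only: an event is ignored when id, source and text patterns all match.
# Test-IgnoreFilter and the Id/Source/Text property names are hypothetical.
function Test-IgnoreFilter {
    param(
        [Parameter(Mandatory)]$Item,
        [string]$Id,
        [string]$Source,
        [string]$Text
    )

    return (("$($Item.Id)" -match $Id) -and
            ($Item.Source -match $Source) -and
            ($Item.Text -match $Text))
}

$item = [pscustomobject]@{
    Id     = 5555
    Source = 'Microsoft-Windows-Security-Auditing'
    Text   = 'Logon failure for johndow'
}
Test-IgnoreFilter -Item $item -Id '5555' -Source 'Security-Auditing' -Text 'johndow'
```

Because the patterns are unanchored regular expressions, an id pattern of '555' would also match 5555. Use '^5555$' if you need an exact match.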

HyperV

A simple provider that checks whether certain VMs are running on a Hyper-V host:

<hyperv>
  <check name="web04"></check>
  <check name="sql01"></check>
</hyperv>

This checks the machines 'web04' and 'sql01'. These are the names in Hyper-V; the actual host names may be different.

Services

Checks for running services on the machine:

<services>
  <check name="w3svc" />
  <check name="workstation" />
  <check name="mssqlserver" />
</services>

Use the internal service name, not the display name.

WinUpdate

Checks whether any critical or important Windows OS updates are available for the server.

<winupdate frequency="0" />

The value for 'frequency' can be 0 = don't actually check, 1 = check once a day (assuming ServerMonitor runs at least once a day) or 2 = check every time ServerMonitor runs.

SQLCount

Checks for the correct number of records in SQL Server tables:

<sqlcount connectionstring="Server=.;Database=master;Integrated Security=True;">
  <check name="SimpleRecoveryModel" count="3" comparer="eq" >
    SELECT COUNT(*) FROM master.sys.databases WHERE recovery_model = 3
  </check>
  <check name="Logins" count="10" comparer="lt" >
    SELECT COUNT(*) FROM master.sys.server_principals
  </check>
</sqlcount>

The first check makes sure that there are exactly three databases with a simple recovery model.
The second one checks that there are fewer than 10 logins on the SQL Server.
The name attribute is just for identifying the issue in the logs. The count attribute specifies the expected number of records. The comparer can be 'eq', 'lt' or 'gt', so we can check for an exact number or for less or more than the number.
The SQL statement itself can be complex but should always return a single integer. You don't have to use a COUNT(*) all the time.
The 'connectionstring' attribute points to the server to check. Make sure that the user who executes ServerMonitor has at least read permissions for the tables you are using. You should always specify the 'master' database in the connection string and then use the fully qualified table name in your statements.
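
How 'count' and 'comparer' work together can be sketched like this. Test-ExpectedCount is a hypothetical function, and the strict comparisons for 'lt' and 'gt' are an assumption about the provider's behavior.

```powershell
# Sketch only: returns $true when the actual count meets the expectation.
# Test-ExpectedCount is a hypothetical helper, not part of ServerMonitor.
function Test-ExpectedCount {
    param(
        [int]$Actual,
        [int]$Expected,
        [ValidateSet('eq', 'lt', 'gt')][string]$Comparer
    )

    switch ($Comparer) {
        'eq' { return $Actual -eq $Expected }
        'lt' { return $Actual -lt $Expected }
        'gt' { return $Actual -gt $Expected }
    }
}

# 12 logins found, fewer than 10 expected: the check fails and an issue would be reported.
Test-ExpectedCount -Actual 12 -Expected 10 -Comparer 'lt'
```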

FileCount

Counts files in the specified directory.

<filecount>
  <check name="AppHost" count="1" comparer="eq" filter="applicationHost\.config" folder="%windir%\System32\inetsrv\config\" />
  <check name="Several log files" count="3" comparer="gt" filter="log$" folder="E:\logs\" />
  <check name="Not too many temp files" count="200" comparer="lt" filter=".+" folder="Q:\temp\" />
</filecount>

Again the name is just for logging. The count and comparer values work like those of the SQLCount provider. The filter is a regular expression the file names have to match. The folder is the full path to look in.
Check 1 looks for the applicationHost.config in its native home. It uses a global environment variable.
Check 2 makes sure there are at least 3 *.log files in the logs directory.
Check 3 creates a warning if there are more than 200 files in the temp directory.

IIS

Checks application pools and sites as well as content on a hosted page.

<iis>
  <check type="apppool" uri="Live"></check>
  
  <check type="site" uri="mySite"></check>
  <check type="site" uri="yourSite"></check>
  
  <check type="http" uri="http://www.mySite.com"   pattern="My cool site"></check>
  <check type="http" uri="http://www.yourSite.net/welcome.php" pattern="Welcome to this site"></check>
</iis>

The type='apppool' checks make sure the specified application pool is running.
The type='site' checks make sure that the specified IIS web site is running.
The type='http' checks download the page from the specified uri and look for the specified string. If it is not found, a warning is created.

FileVersion

Compares the versions of Windows executables and DLLs in two directories. This is helpful if you have your files in various locations and want to make sure they are up to date.

<fileversion>
  <check reference="C:\bin\myfiles\" target="C:\bin\otherfiles\" pattern=".+(dll|exe)$" reportmissing="0" compare="version" />
</fileversion>

You specify a reference location; this is where you apply your updates and have your latest versions.
The 'target' is a directory with possibly older versions of the files.
The pattern allows you to specify which files to check. If 'reportmissing' is '1', a warning is created for every file that is missing. If both files exist, a warning is created if the version is different.
The 'compare' attribute defines how to compare the two files. Possible values are version, size, date and content. However, currently only 'version' is implemented.

UserCount

Counts the active user accounts on the local machine.

<usercount>
  <check name="all Users"  count="7" comparer="lt" group="*" />
  <check name="Admins"     count="1" comparer="eq" group="S-1-5-32-544" />
  <check name="Users"      count="6" comparer="lt" group="users" />
</usercount>

As usual the name is just for the logs. The count and comparer attributes work the same way as for other providers.
The group defines the Windows group to count the members for. '*' is a special value for all active users on the machine.
The name of the group has to be in the language of your system. For well-known groups you can use the SID instead: 'S-1-5-32-544' for administrators or 'S-1-5-32-545' for users.

EventLog

The WinEvents provider above only works on Windows Vista or newer. To run on XP and Server 2003 you can use the older EventLog provider. It only searches the older Windows NT event logs.

<eventlog 
  EventLogNameExpression="application|system" 
  EventSourceExpression="." 
  EventTypesToIncludeExpression="warning|error" 
  EventIdsToExcludeExpression="xxx" 
/>

There are no 'check' nodes, just a few attributes which are all regular expressions.
Here we look into the Application and System logs for any source and for events of type warning or error. The 'xxx' is just a dummy for not excluding any IDs; you could specify numeric IDs like '256|296|65932'.

Process Working Set Size

You may want to know when certain processes use too much memory, here you go:

<processworkingset>
  <check name="w3wp"     threshold="200MB" />
  <check name="sqlservr" threshold="2GB" />
</processworkingset>

You specify the name of the process, which is actually a regular expression, so it can match processes with similar but not equal names. The threshold is the maximum allowed working set size; use MB or GB to avoid big numbers.
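
A minimal sketch of such a check, using the standard Get-Process cmdlet, could look like this. Find-ProcessOverThreshold is a hypothetical name and not necessarily how the provider is implemented.

```powershell
# Sketch only: lists processes whose name matches a regular expression and
# whose working set exceeds a threshold. The function name is hypothetical.
function Find-ProcessOverThreshold {
    param([string]$NamePattern, [long]$ThresholdBytes)

    Get-Process |
        Where-Object { $_.ProcessName -match $NamePattern -and $_.WorkingSet64 -gt $ThresholdBytes } |
        Select-Object ProcessName, Id, WorkingSet64
}

# PowerShell understands the MB and GB suffixes natively.
Find-ProcessOverThreshold -NamePattern 'w3wp' -ThresholdBytes 200MB
```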

Windows Feature Count

Count the number of installed optional Windows Features:

  <winfeatures>
    <check name="countInstalled" type="count" count="62" comparer="eq" />
  </winfeatures>

The 'type' should always be 'count' (for now), the 'comparer' can be 'eq', 'lt' or 'gt' and 'count' is the number of features to compare against.

Folder Size

Checks the combined size of all files under one directory

  <foldersize>
    <check path="C:\temp" max="100MB" />
  </foldersize>

Specify the path to the directory and the maximum of allowed bytes, you can use KB,MB or GB to make it easier. If the folder has a total size greater than what you specified, a warning will be logged.

File Age

Checks the last modified date/time for files

  <fileage>
    <check maxage="1440" folder="%USERPROFILE%\Documents\logs" recurse="false" filespec="mylog\.txt" />
  </fileage>

This example checks for files in the specified folder, but only those named 'mylog.txt' (regex). It doesn't look into sub-directories (recurse=false) and will create an alert if the file is older than 1440 minutes (1 day). This can be used to make sure certain files were updated in the last x minutes.
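
The same idea expressed in PowerShell, as a sketch: Find-StaleFile is a hypothetical function whose parameters mirror the attributes above. Comparing against LastWriteTime is an assumption about what 'last modified' means here.

```powershell
# Sketch only: returns files matching a regex that were last written more than
# MaxAgeMinutes ago. Find-StaleFile is a hypothetical helper.
function Find-StaleFile {
    param(
        [string]$Folder,
        [string]$FileSpec,
        [int]$MaxAgeMinutes,
        [switch]$Recurse
    )

    $cutoff = (Get-Date).AddMinutes(-$MaxAgeMinutes)
    Get-ChildItem -Path $Folder -File -Recurse:$Recurse |
        Where-Object { $_.Name -match $FileSpec -and $_.LastWriteTime -lt $cutoff }
}
```

For example, Find-StaleFile -Folder 'C:\logs' -FileSpec 'mylog\.txt' -MaxAgeMinutes 1440 would list the file only if it has not been written to for more than a day.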

Certificates

<certificates helper="C:\tools\sigcheck64.exe" store="machine" >
  <allow thumbprint="C6C2...ABCDE55381" remark="MyCA" /> 
</certificates>

This provider checks for any certificate that is not rooted in the trusted Microsoft root certificate list. It will show root certificates added to your certificate store by third-party applications or yourself. You should review these root certificates. If you decide they are okay, add their thumbprint as an allow node in the XML. If not, you should consider deleting them.
ServerMonitor uses the SysInternals sigcheck utility to perform this task. You need to download it (Version 2.52 or newer) and place it anywhere on your machine. Put the full file path to it in the 'helper' attribute.
The 'store' attribute can either be 'machine' or 'user' depending on which store you want to check.

Aggregators

Aggregators are stored in sma*.ps1 files, one aggregator per file. They run between the providers and the loggers to summarize the information collected by the providers.
Often providers collect information about the same underlying problem. Say the same problem is listed in the Windows event log every 30 seconds. If you run ServerMonitor once an hour, you would get 120 warnings for the same problem.
An aggregator allows you to specify the properties of the problem to aggregate all 120 warnings into a single one.

There is currently only one aggregator, called Default. It groups all items with the same LogName, EventId, Source and EventType together and removes any duplicates.
It uses the info and DateTime from the first item, so any different information in the removed items is not reported.

Multiple aggregators would be executed in order of their name, alphabetically sorted. Each later aggregator can only work with the already aggregated data from its predecessors.

The following XML must be present in your config file to enable aggregators. You also need to set enabled to 'true'. The threshold for the default aggregator defines when to summarize items. In this case, if we have 4 warnings with the same LogName, EventId, Source and EventType, they are treated as individual items in the report, but if we had 5, they would be summarized into a single item.
The infoprefix text is added in front of the text information, preceded by the number of occurrences.

<aggregators>
  <default enabled="true" threshold="5" infoprefix="similar alerts to this:" />
</aggregators>
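
The grouping described above can be sketched with Group-Object. Merge-SimilarItem and the item property names (LogName, EventId, Source, EventType, Info) are assumptions for this example. The real Default aggregator may differ in its details.

```powershell
# Sketch only: collapses groups of similar items once they reach the threshold.
# Merge-SimilarItem and the property names are hypothetical.
function Merge-SimilarItem {
    param(
        [object[]]$Items,
        [int]$Threshold = 5,
        [string]$InfoPrefix = 'similar alerts to this:'
    )

    $Items | Group-Object LogName, EventId, Source, EventType | ForEach-Object {
        if ($_.Count -ge $Threshold) {
            # Keep a copy of the first item and prefix its info with the count.
            $first = $_.Group[0].PSObject.Copy()
            $first.Info = "$($_.Count) $InfoPrefix $($first.Info)"
            $first
        }
        else {
            $_.Group   # below the threshold: report every item individually
        }
    }
}
```

Five identical disk warnings would come out as one item whose info starts with '5 similar alerts to this:', while four of them would pass through unchanged.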

Write your own provider

Create a new file. The name has to start with 'smp' and end in '.ps1'. Put it into the ServerMonitor folder.
Say you've created 'smpTapeDrive.ps1', then you need to add a function 'CheckTapeDrive' in your file.
This function is called by the ServerMonitor script:

function CheckTapeDrive()
{
  ...
}

In there you can do your checks and call other functions.
When you find an issue, report it using the ServerMonitor's 'AddItem' function:

  AddItem -info 'my text' `
  -Source 'my source' `
  -EventId 500 `
  -EventType 'Error' `
  -TheTime (Get-Date) `
  -LogName "TapeDrive Checker" `
  -MachineName $env:ComputerName

You can also rely on default values and just use:

  AddItem -info 'my text' -Source 'my source'

This creates a 'warning' with Id 1 for the local computer and the current time.
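
Put together, a complete provider file might look like the sketch below. The AddItem stub exists only so the sample runs on its own; inside ServerMonitor the real AddItem is already defined. The marker file check is a made-up example, not a real tape drive test.

```powershell
# smpTapeDrive.ps1 (sketch). The stub below is a stand-in for ServerMonitor's
# real AddItem and is only defined when the real one is not available.
if (-not (Get-Command AddItem -ErrorAction SilentlyContinue)) {
    $script:items = @()
    function AddItem {
        param($info, $Source, $EventId = 1, $EventType = 'Warning')
        $script:items += [pscustomobject]@{
            Info = $info; Source = $Source; EventId = $EventId; EventType = $EventType
        }
    }
}

function CheckTapeDrive()
{
    # Hypothetical check: report an issue when an expected marker file is missing.
    $marker = Join-Path ([IO.Path]::GetTempPath()) 'tapedrive-demo.ok'
    if (-not (Test-Path $marker)) {
        AddItem -info "Marker file '$marker' not found" -Source 'TapeDrive'
    }
}

CheckTapeDrive
```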

If an error occurred in your provider, you should log it using the same AddItem function.
As the Id use one of the following:

$smIdUserNotAdmin
$smIdError
$smIdUnexpectedResult
$smIdItemNotRunning
$smIdConfigFileNotFound
$smIdConfigInvalid
$smIdAttentionRequired

Helper Functions

You can use the following functions in your code:

ShowInfo([string]$info)

Displays text during runtime if the script was started with the -verbose switch.

DPApiDecrypt([string]$data)

Decrypts a string that was encrypted in the LocalMachine scope with some well known extra entropy.

CheckForElevatedAdmin([string]$providerName)

Checks whether the script is running as an elevated admin. Returns $false if it isn't and also displays a warning, for which it uses the provided $providerName.

ExpandEnvironmentVariables([string]$data)

Expands any environment variable like %USERNAME% in the data with their real value.

Tips and Tricks

Using a different config file

By default the script is looking for a config file in the same directory as itself with the same name as itself but with an '.xml' extension.
If not found, it looks for a file computername.xml in its own directory.
You can also specify a different config file as the first parameter or using the -configfile option.
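
The lookup order can be sketched as follows. Resolve-ConfigFile is a hypothetical function. That an explicitly given config file takes precedence over the two defaults is an assumption.

```powershell
# Sketch only: resolves the config file in the order described above.
# Resolve-ConfigFile is a hypothetical helper, not part of ServerMonitor.
function Resolve-ConfigFile {
    param(
        [Parameter(Mandatory)][string]$ScriptPath,
        [string]$ConfigFile
    )

    if ($ConfigFile) { return $ConfigFile }   # assumed: explicit parameter wins

    $dir  = Split-Path -Parent $ScriptPath
    $name = [IO.Path]::GetFileNameWithoutExtension($ScriptPath)
    $candidates = @(
        (Join-Path $dir "$name.xml"),                              # ServerMonitor.xml
        (Join-Path $dir "$([Environment]::MachineName).xml")       # computername.xml
    )
    foreach ($candidate in $candidates) {
        if (Test-Path -LiteralPath $candidate) { return $candidate }
    }
    return $null   # nothing found
}
```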

Override LastCheck

The LastCheck time for the event log checks is written to a temp file (usually in %userprofile%\AppData\Local\Temp).
If you are testing your filters it makes sense to delete that file. Then you get the default 24 hours period to check.
But it is much easier to override the LastCheck on the command line:

ServerMonitor.ps1 ... -LastCheck "2012-10-15 06:00"

or if you always want to go a few days back:

ServerMonitor.ps1 ... -LastCheck ((Get-Date).AddDays(-3))

How long did it take?

If you want to know how long it took the various checks to perform their duties, enable the file logger.
It is the only logger that logs some additional information about the execution time of the providers.

Multiple Server Monitors

Even though the config file is very flexible, sometimes you want to schedule different ServerMonitors with different provider settings and loggers.
In this case you should set the 'StateName' parameter to be different for each instance of ServerMonitor.
Otherwise both instances use the same temp file to store the LastCheck value.

Protect your secrets (a little)

It is possible to encrypt two configuration values, the connectionstring attribute for the database logger and the password attribute for the email logger.
You first need to encrypt your secret:

ServerMonitor.ps1 -EncryptText "mySuperPa55w0rd"

The script will display an encrypted version of your text. Copy that into the configuration file attribute.
The encrypted secret will be automatically decrypted on the same machine.
Any user with access to the machine and this script can decrypt the data. But your secrets are not in clear-text and backups of the configuration files only contain encrypted information.

Open Issues

The text for some events comes from DLLs, it would be nice to figure out how to get to it.
Enable execution on a remote server
More error handling and testing

Software Requirements

Windows with PowerShell 4.0 or newer

Currently I have not tested this with PowerShell 6 or 7

Required User Permissions

It is very likely that you will schedule servermonitor.ps1 with Windows scheduled tasks. What user should it run under?
If you are using a member of the administrators group and check 'Run with highest privileges' you should be fine.
If you want to run it as a normal user, you may not be able to perform all checks.

In the WinEvents provider, access to the security log won't work. If you don't need that, you're fine.
Enumerating the status of services does not work for normal users when running in Task Scheduler. You can fix this by changing permissions on the service. See this blog post for how to do this.
The IIS provider will not work.

Download

Get the files from GitHub

Server Monitor Vs.3.6
