Updated: 20 August 2020
A PowerShell script to monitor various aspects of a Windows Server OS.
This version works on the concept of plugins. There are three types of plugins:
- Providers that provide or collect information about issues from various sources.
- Aggregators that try to summarize the information.
- Loggers that report the information in some form.
All three plugin types are implemented as 'support files' and have to be in the same directory as the main script.
The script first executes all providers. Any errors or warnings are stored in an internal data structure.
Then the aggregators try to eliminate redundant data.
Finally, every logger is executed; each logger has access to the data and can use it however it wants.
- Usage
- Loggers
- Providers
- Aggregators
- Write your own provider
- Tips and Tricks
- Software Requirements
- Required User Permissions
- Download
Copy all the files into an empty directory. Copy 'example.xml' to 'ServerMonitor.xml' and open it in a text editor
and adjust the settings.
Open a PowerShell console and navigate into your ServerMonitor directory:
.\ServerMonitor.ps1
This executes all the providers and their checks as specified in the XML file.
It then logs the found issues to all the loggers specified in the config file.
If scripts are not enabled on your server, enable them:
set-executionpolicy remotesigned
You would usually schedule the execution of ServerMonitor with Windows Task Scheduler. I run it every 2 hours.
For more help use:
help .\ServerMonitor.ps1 -full
To show more of what is going on during the execution, use:
.\ServerMonitor.ps1 -verbose
After checking issues, ServerMonitor can log these issues using various loggers. Each logger has to be implemented in a file sml*.ps1 in the same directory as ServerMonitor.ps1. It has to have one function named LogTo*, where the variable part must be the same as the variable part in the file name. So a file smlEmail.ps1 must have a function LogToEmail.
If the logger file is present, that function gets called. So if you don't want to use a logger, just remove its file.
Your config file should always have a 'loggers' node. Each logger has a sub-node and on that an 'enabled' attribute. If the sub-node is missing or enabled is not 'true' the logger is not used.
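A minimal sketch of the discovery rule described above, assuming the loggers are plain script files next to ServerMonitor.ps1. The function name Invoke-Loggers is hypothetical, and the sketch leaves out the 'enabled' check against the config file; it only shows how a file name maps to its LogTo* function.

```powershell
# Sketch only: find every sml*.ps1 file in $Folder and call its LogTo<Name>
# function with the collected items. Invoke-Loggers is a hypothetical name;
# the 'enabled' check against the config file is left out.
function Invoke-Loggers {
    param(
        [string]$Folder,     # directory that holds ServerMonitor.ps1
        [object[]]$Items     # issues collected by the providers
    )
    foreach ($file in (Get-ChildItem -Path $Folder -Filter 'sml*.ps1' -File)) {
        $name = $file.BaseName.Substring(3)   # 'smlEmail' -> 'Email'
        . $file.FullName                      # load the logger's functions into this scope
        $fn = "LogTo$name"
        if (Get-Command -Name $fn -CommandType Function -ErrorAction SilentlyContinue) {
            & $fn $Items                      # the whole item list is passed as one argument
        }
        else {
            Write-Warning "$($file.Name) does not define a function $fn"
        }
    }
}
```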
<loggers> <console enabled="true" /> </loggers>
You can also specify a -LogToConsole switch as a parameter to the script to enable the console logger ad hoc, even if it is disabled in the config file.
<loggers> <file enabled="true" base="C:\mylogs\servermonitor_" /> </loggers>
<loggers> <database enabled="true" connectionstring="Server=.;Database=master;Integrated Security=True;" /> </loggers>
<loggers> <email enabled="true" recipient="me@gmail.com" recipientcc="webmaster@foo.net" host="mail.myserver.com" sender="servermonitor@mydomain.com" subject="Server Monitor Issues on %computername%" user="ausername" password="apassword" html="true" /> </loggers>
Out of the box we have the following providers:
- DiskSpace
- WinEvents
- HyperV
- Services
- WinUpdate
- SQLCount
- FileCount
- IIS
- FileVersion
- UserCount
- EventLog
- Process Working Set Size
- Windows Feature Count
- Folder Size
- File Age
- Certificates
The usage of providers is configured in the XML configuration file.
Each provider has its own node with its name. Inside the node
multiple 'check' nodes describe what the provider should do.
If you never want to use a certain provider, just delete the file for it from the directory.
If a disk has less free space (in percent) than the specified minimum, a warning is created.
If the free space is less than half of the specified value, an error is created instead.
<diskspace> <check drive="*" min="30" /> </diskspace>
Checks for a minimum free disk space of 30% on all fixed drives.
<diskspace> <check drive="C" min="50" /> <check drive="F" min="20" /> </diskspace>
Checks for a minimum free disk space of 50% on drive C: and 20% on drive F:.
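The warning/error rule can be sketched as a small function. Get-DiskSpaceSeverity is a hypothetical name and this is not the provider's actual code; it only encodes the two thresholds described above.

```powershell
# Sketch of the DiskSpace rule: below 'min' percent free -> Warning,
# below half of 'min' -> Error. Get-DiskSpaceSeverity is a hypothetical name.
function Get-DiskSpaceSeverity {
    param(
        [double]$FreePercent,  # free space of the drive in percent
        [double]$MinPercent    # the 'min' attribute of the <check> node
    )
    if ($FreePercent -lt ($MinPercent / 2)) { return 'Error' }
    if ($FreePercent -lt $MinPercent) { return 'Warning' }
    return 'OK'
}
```

With min="30", 20% free yields 'Warning' and 10% free yields 'Error'. On Windows, free and total bytes per fixed drive are available from Get-CimInstance Win32_LogicalDisk -Filter 'DriveType=3' (properties FreeSpace and Size); whether the real provider uses that class is an assumption.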
<foldersize> <check path="C:\temp" max="100MB" /> <check path="C:\MyApps" max="2GB" /> </foldersize>
You specify the path and the maximum total size; you can use KB, MB, GB or just bytes.
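Size values like '100MB' or '2GB' arrive from the XML as strings. Here is a sketch of converting them to bytes, assuming 1 KB = 1024 bytes as in PowerShell's own KB/MB/GB literals; ConvertTo-Bytes is a hypothetical helper, not part of ServerMonitor.

```powershell
# Sketch: turn a size attribute such as '100MB', '2GB', '512KB' or '1048576'
# into a number of bytes. ConvertTo-Bytes is a hypothetical helper name.
function ConvertTo-Bytes {
    param([Parameter(Mandatory)][string]$Size)
    # @{} hashtables are case-insensitive, so 'mb' finds the 'MB' entry
    $units = @{ '' = 1; 'KB' = 1KB; 'MB' = 1MB; 'GB' = 1GB }
    if ($Size.Trim() -match '^(?<num>\d+(\.\d+)?)\s*(?<unit>KB|MB|GB)?$') {
        $unit = [string]$Matches['unit']   # '' when no unit was given (plain bytes)
        return [int64]([double]$Matches['num'] * $units[$unit])
    }
    throw "Invalid size value: '$Size'"
}
```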
Check the Windows event logs. Both the traditional logs 'System', 'Application' and 'Security' and the newer 'Applications and Services' logs can be checked.
Every time the script runs, it stores the 'LastCheck' time in a file, and the next time it only looks for
events after that time. You can override the LastCheck with the '-LastCheck' parameter.
You can filter the events because usually some of them are not that interesting, and some other ones you have decided to ignore.
<winevents>
  <check log="Application" types="error" sources="!Microsoft-Windows-LoadPerf" ids="" />
  <check log="Microsoft-Windows-TaskScheduler/Operational" types="error,warning" sources="" ids="!322,1024" />
</winevents>
We define two checks. The first one looks in the Application log for 'error' events but excludes all events with the source 'Microsoft-Windows-LoadPerf'. The bang character at the beginning tells the provider to exclude the source; without it, the provider would only look for events from 'Microsoft-Windows-LoadPerf'.
The second check looks for errors and warnings of the task scheduler but ignores all events with an ID of 322 or 1024.
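One way to read the 'sources' and 'ids' attributes, sketched under the assumption that values are compared exactly (case-insensitively) and that an empty attribute means 'no filter'. Test-ListFilter is a hypothetical name; the real provider may, for example, match sources as regular expressions instead.

```powershell
# Sketch of the 'sources' / 'ids' attribute semantics described above:
#   ''                 -> no filtering, everything passes
#   'Schannel,NetBT'   -> only the listed values pass
#   '!322,1024'        -> everything except the listed values passes
# Test-ListFilter is a hypothetical name; exact comparison is an assumption.
function Test-ListFilter {
    param(
        [string]$Value,   # e.g. the event source or the event id
        [string]$Spec     # the attribute value from the <check> node
    )
    if ([string]::IsNullOrWhiteSpace($Spec)) { return $true }
    $exclude = $Spec.StartsWith('!')
    if ($exclude) { $Spec = $Spec.Substring(1) }
    # Split on commas and drop empty entries (e.g. from a trailing comma)
    $list = $Spec.Split(',') | ForEach-Object { $_.Trim() } | Where-Object { $_ }
    $listed = $list -contains $Value   # -contains is case-insensitive
    if ($exclude) { return -not $listed }
    return $listed
}
```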
<winevents> <check log="System" types="error" sources="Schannel,NetBT" ids="436" /> </winevents>
Only finds errors in the System log from the sources 'Schannel' or 'NetBT', and only if the ID is 436.
<winevents> <check log="Security" types="AuditFailure" sources="" ids="" /> </winevents>
The Security log is a bit special: it doesn't have errors or warnings, but 'AuditSuccess' and 'AuditFailure' events. Here you can specify only one type. We support a third type, 'AuditAdmin', which covers events like 'Log cleared'.
<winevents> <check log="*" types="error" sources="" ids="" /> <check log="!WMI" types="warning" sources="!NTFS,disk," ids="" /> </winevents>
Here are two special cases. Providing a wildcard star '*' as the log value means 'search all available logs' (when running the script as an administrator, you can access a few extra logs). In this case we want all errors in all logs.
What if we want all logs except that one stupid log that is full of errors? Start the log value with a bang to say 'exclude the following, but do all other logs'. The string after the bang is a regular expression, so in this case we exclude all logs that have 'WMI' in their name. We also exclude all sources that match NTFS or disk.
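The log selection rules ('*', '!regex' or an exact name) could look roughly like this. Select-LogName is a hypothetical name; on Windows the list of available logs can be obtained with (Get-WinEvent -ListLog * -ErrorAction SilentlyContinue).LogName.

```powershell
# Sketch of how the 'log' attribute could select logs from the list of
# available log names. Select-LogName is a hypothetical name.
function Select-LogName {
    param(
        [string]$Spec,            # 'Application', '*', or '!WMI'
        [string[]]$AvailableLogs  # names of all logs on the machine
    )
    if ($Spec -eq '*') { return $AvailableLogs }          # all logs
    if ($Spec.StartsWith('!')) {
        $pattern = $Spec.Substring(1)                     # the rest is a regular expression
        return $AvailableLogs | Where-Object { $_ -notmatch $pattern }
    }
    return $AvailableLogs | Where-Object { $_ -eq $Spec } # a single named log
}
```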
In Version 3.5 we introduced ignore-filters. These are applied after events are found, but before they are logged. This allows you to exclude very specific events from logging.
Here is an example:
<winevents>
  <check log="Security" types="AuditFailure" sources=""/>
  <filters>
    <ignore enabled="true" id="5555" source="Security-Auditing" text="johndow">
      this happens because old Joe always gets his password wrong!
    </ignore>
    <ignore enabled="true" id="5551|5558" source="." text=".">
      ignore all these events, regardless
    </ignore>
  </filters>
</winevents>
Let's look at this in detail. Under winevents you can define a new node <filters> on the same level as <check>.
Inside filters you can define an unlimited number of <ignore> nodes.
- enabled = has to be 'true' to use this filter
- id = a regular expression to match the EventId to exclude
- source = a regular expression to match the source to exclude
- text = a regular expression to match the content of the event to exclude
The text of the ignore node can be used to describe why this event should be excluded, it is not used by the script itself.
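A sketch of how the ignore filters could be applied to a found event. It assumes an event is ignored only when its ID, source and text all match the regular expressions of one enabled filter; Test-Ignored and the property names EventId, Source and Message are assumptions. Because the PowerShell [xml] adapter exposes attributes as properties, the <ignore> nodes can be passed in directly.

```powershell
# Sketch of applying <ignore> filters after events were found. Assumption:
# an event is dropped only if id, source AND text all match the regular
# expressions of one enabled filter. Test-Ignored is a hypothetical name.
function Test-Ignored {
    param(
        [object]$Item,      # needs EventId, Source and Message properties
        [object[]]$Filters  # <ignore> nodes with enabled, id, source, text
    )
    foreach ($f in $Filters) {
        if ($f.enabled -ne 'true') { continue }   # only enabled filters count
        if ("$($Item.EventId)" -match $f.id -and
            $Item.Source -match $f.source -and
            $Item.Message -match $f.text) {
            return $true
        }
    }
    return $false
}
```

In this sketch the patterns are not anchored, so id="5555" would also match an event ID such as 15555; writing id="^5555$" avoids that.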
<hyperv> <check name="web04"></check> <check name="sql01"></check> </hyperv>
Checks the virtual machines 'web04' and 'sql01'. These are the names in Hyper-V; the actual host names may be different.
<services> <check name="w3svc" /> <check name="lanmanworkstation" /> <check name="mssqlserver" /> </services>
Use the internal service name, not the display name.
<winupdate frequency="0" />
The value for 'frequency' can be 0 = don't actually check, 1 = check once a day (assuming ServerMonitor runs at least once a day) or 2 = check every time ServerMonitor runs.
<sqlcount connectionstring="Server=.;Database=master;Integrated Security=True;">
  <check name="SimpleRecoveryModel" count="3" comparer="eq">
    SELECT COUNT(*) FROM master.sys.databases WHERE recovery_model = 3
  </check>
  <check name="Logins" count="10" comparer="lt">
    SELECT COUNT(*) FROM master.sys.server_principals
  </check>
</sqlcount>
The first check makes sure that there are exactly three databases with the simple recovery model.
The second one checks that there are fewer than 10 logins on the SQL Server.
The name attribute is just for identifying the issue in the logs. The count attribute specifies the expected count. The comparer can be 'eq', 'lt' or 'gt', so we can check for an exact number or for fewer or more than that number.
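The comparer logic can be sketched as follows. The assumption here is that a check passes when '<actual> <comparer> <count>' holds and an issue is reported otherwise; Test-Count is a hypothetical name, and the real script may treat the boundary (for example 'at least' vs. 'more than') differently.

```powershell
# Sketch of the 'count' / 'comparer' logic shared by several providers.
# Assumption: the check passes when "<actual> <comparer> <count>" is true,
# and an issue is reported otherwise. Test-Count is a hypothetical name.
function Test-Count {
    param(
        [int64]$Actual,                                   # e.g. the result of the SQL statement
        [int64]$Expected,                                 # the 'count' attribute
        [ValidateSet('eq', 'lt', 'gt')][string]$Comparer  # the 'comparer' attribute
    )
    switch ($Comparer) {
        'eq' { return $Actual -eq $Expected }
        'lt' { return $Actual -lt $Expected }
        'gt' { return $Actual -gt $Expected }
    }
}
```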
The SQL statement itself can be complex but should always return a single integer. You don't have to use a COUNT(*) all the time.
The 'connectionstring' attribute points to the server to check. Make sure that the user who executes ServerMonitor has at least read permissions for the tables you are using. You should always specify the 'master' database in the connection string and then use the fully qualified table name in your statements.
<filecount>
  <check name="AppHost" count="1" comparer="eq" filter="applicationHost\.config" folder="%windir%\System32\inetsrv\config\" />
  <check name="Several log files" count="3" comparer="gt" filter="log$" folder="E:\logs\" />
  <check name="Not too many temp files" count="200" comparer="lt" filter=".+" folder="Q:\temp\" />
</filecount>
Again, the name is just for logging. The count and comparer values work the same way as for the SQL count provider. The filter is a regular expression the file names have to match. The folder is the full path to look in.
Check 1 looks for the applicationHost.config in its native home. It uses a global environment variable.
Check 2 makes sure there are at least 3 *.log files in the logs directory
Check 3 creates a warning if there are more than 200 files in the temp directory
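A sketch of the counting part of the FileCount check, assuming only the folder itself is searched (no sub-folders) and environment variables like %windir% are expanded with .NET's Environment.ExpandEnvironmentVariables. Get-MatchingFileCount is a hypothetical name; its result would then be compared with 'count' using 'comparer'.

```powershell
# Sketch of the FileCount check: count the files in 'folder' whose names
# match the 'filter' regex. Get-MatchingFileCount is a hypothetical name.
function Get-MatchingFileCount {
    param(
        [string]$Folder,   # the 'folder' attribute, may contain %variables%
        [string]$Filter    # the 'filter' attribute (a regular expression)
    )
    $path = [Environment]::ExpandEnvironmentVariables($Folder)
    $files = Get-ChildItem -LiteralPath $path -File -ErrorAction Stop |
        Where-Object { $_.Name -match $Filter }
    return @($files).Count
}
```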
<iis>
  <check type="apppool" uri="Live"></check>
  <check type="site" uri="mySite"></check>
  <check type="site" uri="yourSite"></check>
  <check type="http" uri="http://www.mySite.com" pattern="My cool site"></check>
  <check type="http" uri="http://www.yourSite.net/welcome.php" pattern="Welcome to this site"></check>
</iis>
The type='apppool' checks make sure the specified application pool is running.
The type='site' checks make sure that the specified IIS web site is running.
The type='http' checks download the page from the specified URI and look for the specified string; if it is not found, a warning is created.
Compares the versions of Windows executables and DLLs in two directories. This is helpful if you have your files in various locations and want to make sure they are up to date.
<fileversion> <check reference="C:\bin\myfiles\" target="C:\bin\otherfiles\" pattern=".+(dll|exe)$" reportmissing="0" compare="version" /> </fileversion>
You specify a reference location; this is where you apply your updates and have your latest versions.
The 'target' is a directory with possibly older versions of the files.
The pattern allows you to specify which files to check. If 'reportmissing' is '1', a warning is created for every file that is missing. If both files exist, a warning is created if their versions are different.
The 'compare' attribute defines how to compare the two files. Possible values are version, size, date and content; however, currently only 'version' is implemented.
Counts the active user accounts on the local machine.
<usercount> <check name="all Users" count="7" comparer="lt" group="*" /> <check name="Admins" count="1" comparer="eq" group="S-1-5-32-544" /> <check name="Users" count="6" comparer="lt" group="users" /> </usercount>
As usual, the name is just for the logs; the count and comparer attributes work the same way as for the other providers.
The group defines the Windows group whose members are counted. '*' is a special value for all active users on the machine.
The name of the group has to be in the language of your system. For well-known groups you can use the SID instead:
'S-1-5-32-544' for administrators or 'S-1-5-32-545' for users.
The WinEvents provider above only works on Windows Vista or newer. To run on XP and Server 2003, you can use the older EventLog provider. It only searches the older Windows NT event logs.
<eventlog EventLogNameExpression="application|system" EventSourceExpression="." EventTypesToIncludeExpression="warning|error" EventIdsToExcludeExpression="xxx" />
There are no 'check' nodes, just a few attributes which are all regular expressions.
Here we look into the Application and System logs for events from any source with the type warning or error.
The 'xxx' is just a dummy for not excluding any IDs, you could specify the numeric IDs like '256|296|65932'.
<processworkingset> <check name="w3wp" threshold="200MB" /> <check name="sqlservr" threshold="2GB" /> </processworkingset>
You specify the name of the process, which is actually a regular expression, so it can match processes with similar but not identical names. The threshold is the maximum allowed working set size; use MB or GB to avoid big numbers.
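A rough sketch of the working set check with Get-Process. Find-LargeProcess is a hypothetical name; the -Processes parameter only exists so the logic can be tried with made-up objects, and the threshold is passed as bytes (PowerShell literals such as 200MB work).

```powershell
# Sketch of the working-set check. The process name is treated as a regular
# expression and every matching process above the threshold is returned.
# Find-LargeProcess is a hypothetical name.
function Find-LargeProcess {
    param(
        [string]$NamePattern,                 # the 'name' attribute, e.g. 'w3wp'
        [int64]$ThresholdBytes,               # e.g. 200MB
        [object[]]$Processes = (Get-Process)  # objects with ProcessName and WorkingSet64
    )
    $Processes |
        Where-Object { $_.ProcessName -match $NamePattern -and $_.WorkingSet64 -gt $ThresholdBytes } |
        ForEach-Object {
            [pscustomobject]@{
                Name       = $_.ProcessName
                WorkingSet = $_.WorkingSet64
                Info       = '{0} uses {1:N0} MB' -f $_.ProcessName, ($_.WorkingSet64 / 1MB)
            }
        }
}
```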
<winfeatures> <check name="countInstalled" type="count" count="62" comparer="eq" /> </winfeatures>
The 'type' should always be 'count' (for now), the 'comparer' can be 'eq', 'lt' or 'gt', and 'count' is the number of features to compare against.
<foldersize> <check path="C:\temp" max="100MB" /> </foldersize>
Specify the path to the directory and the maximum number of allowed bytes; you can use KB, MB or GB to make it easier. If the folder has a total size greater than what you specified, a warning will be logged.
<fileage> <check maxage="1440" folder="%USERPROFILE%\Documents\logs" recurse="false" filespec="mylog\.txt" /> </fileage>
This example checks for files in the specified folder, but only those named 'mylog.txt' (the filespec is a regular expression). It doesn't look into sub-directories (recurse=false) and will create an alert if the file is older than 1440 minutes (1 day). This can be used to make sure certain files were updated within the last x minutes.
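The FileAge rule can be sketched with Get-ChildItem and LastWriteTime. Find-StaleFile is a hypothetical name; the sketch reports nothing when no file matches at all, and whether the real provider treats a missing file as an issue is not stated here.

```powershell
# Sketch of the FileAge check: return files matching 'filespec' (a regex)
# whose last write time is older than 'maxage' minutes.
# Find-StaleFile is a hypothetical name.
function Find-StaleFile {
    param(
        [string]$Folder,              # may contain %variables%
        [string]$FileSpec,            # regular expression for the file name
        [int]$MaxAgeMinutes,          # the 'maxage' attribute
        [switch]$Recurse,             # the 'recurse' attribute
        [datetime]$Now = (Get-Date)
    )
    $limit = $Now.AddMinutes(-$MaxAgeMinutes)
    $path = [Environment]::ExpandEnvironmentVariables($Folder)
    Get-ChildItem -LiteralPath $path -File -Recurse:$Recurse |
        Where-Object { $_.Name -match $FileSpec -and $_.LastWriteTime -lt $limit }
}
```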
<certificates helper="C:\tools\sigcheck64.exe" store="machine" > <allow thumbprint="C6C2...ABCDE55381" remark="MyCA" /> </certificates>
This provider checks for any certificate that is not rooted in the trusted Microsoft root certificate list. It will show root certificates added to your certificate store by third-party applications or by yourself. You should review these root certificates. If you decide they are okay, add their thumbprint as an allow node in the XML. If not, you should consider deleting them.
ServerMonitor uses the Sysinternals sigcheck utility to perform this task. You need to download it (version 2.52 or newer) and place it anywhere on your machine. Put its full file path in the 'helper' attribute.
The 'store' attribute can either be 'machine' or 'user' depending on which store you want to check.
Read more about this provider on my blog.
Aggregators are stored in sma*.ps1 files, one aggregator per file.
They run between the providers and the loggers to summarize the information collected by the providers.
Often providers collect information about the same underlying problem. For example, the same problem may be listed in the Windows event log every 30 seconds.
If you run ServerMonitor once an hour, you would get 120 warnings for the same problem.
An aggregator allows you to specify the properties of the problem, so that all 120 warnings are aggregated into a single one.
There is currently only one aggregator, called Default; it groups all items with the same LogName, EventId, Source and EventType together and removes any duplicates.
It uses the info and DateTime from the first item, so any different information in the removed items is not reported.
Multiple aggregators would be executed in alphabetical order of their names; each later aggregator can only work with the already aggregated data from its predecessors.
The following XML must be present in your config file to enable aggregators. You also need to set enabled="true". The threshold for the default aggregator defines when to summarize items. In this case, if we have 4 warnings from the same LogName, EventId, Source and EventType, they are treated as individual items in the report, but if we have 5, they are summarized into a single item.
The infoprefix text, preceded by the number of occurrences, is added in front of the item's info text.
<aggregators> <default enabled="true" threshold="5" infoprefix="similar alerts to this:" /> </aggregators>
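A sketch of the Default aggregator's grouping, using Group-Object on the four properties named above. Invoke-DefaultAggregation is a hypothetical name, and both the item property names (taken from the AddItem parameters) and the exact '<count> <infoprefix> <info>' format are assumptions.

```powershell
# Sketch of the Default aggregator: group items by LogName, EventId, Source
# and EventType; groups with at least 'threshold' items collapse into one
# item built from the first, with "<count> <infoprefix>" in front of Info.
# Invoke-DefaultAggregation is a hypothetical name.
function Invoke-DefaultAggregation {
    param(
        [object[]]$Items,
        [int]$Threshold = 5,
        [string]$InfoPrefix = 'similar alerts to this:'
    )
    $groups = $Items | Group-Object -Property LogName, EventId, Source, EventType
    foreach ($group in $groups) {
        if ($group.Count -lt $Threshold) {
            $group.Group   # below the threshold: keep every item
            continue
        }
        $first = $group.Group[0]   # info and time come from the first item
        [pscustomobject]@{
            LogName   = $first.LogName
            EventId   = $first.EventId
            Source    = $first.Source
            EventType = $first.EventType
            TheTime   = $first.TheTime
            Info      = '{0} {1} {2}' -f $group.Count, $InfoPrefix, $first.Info
        }
    }
}
```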
Create a new file. The name has to start with 'smp' and end in '.ps1'. Put it into the ServerMonitor folder.
Say you've created 'smpTapeDrive.ps1', then you need to add a function 'CheckTapeDrive' in your file.
This function is called by the ServerMonitor script:
function CheckTapeDrive()
{
    # do your checks here and call AddItem for every issue found
}
In there you can do your checks and call other functions. When you find an issue, report it using ServerMonitor's 'AddItem' function:
AddItem -info 'my text' `
        -Source 'my source' `
        -EventId 500 `
        -EventType 'Error' `
        -TheTime (Get-Date) `
        -LogName "TapeDrive Checker" `
        -MachineName $env:ComputerName
You can also rely on default values and just use:
AddItem -info 'my text' -Source 'my source'
This creates a 'warning' with ID 1 for the local computer and the current time.
If an error occurred in your provider, you should log it using the same AddItem function.
As the Id use one of the following:
- $smIdUserNotAdmin
- $smIdError
- $smIdUnexpectedResult
- $smIdItemNotRunning
- $smIdConfigFileNotFound
- $smIdConfigInvalid
- $smIdAttentionRequired
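Putting the pieces together, a hypothetical smpTapeDrive.ps1 might look like the sketch below. The marker file path and the 24-hour rule are made up for illustration; AddItem and $smIdError are supplied by ServerMonitor.ps1, so to try the file on its own you would first have to define a stub AddItem.

```powershell
# smpTapeDrive.ps1 - a hypothetical provider sketch. AddItem and $smIdError
# come from ServerMonitor.ps1; the marker file path is an assumption.
function CheckTapeDrive {
    param([string]$MarkerFile = 'D:\Backup\lastrun.txt')

    try {
        if (-not (Test-Path -LiteralPath $MarkerFile)) {
            # The issue we are monitoring for: report it as an error
            AddItem -info "Backup marker '$MarkerFile' not found" -Source 'TapeDrive' -EventType 'Error'
            return
        }
        $age = (Get-Date) - (Get-Item -LiteralPath $MarkerFile -ErrorAction Stop).LastWriteTime
        if ($age.TotalHours -gt 24) {
            # Relies on the defaults: a warning for the local computer and the current time
            AddItem -info ('Last backup ran {0:N0} hours ago' -f $age.TotalHours) -Source 'TapeDrive'
        }
    }
    catch {
        # A failure of the provider itself, logged with one of the $smId* values
        AddItem -info $_.Exception.Message -Source 'TapeDrive' -EventId $smIdError -EventType 'Error'
    }
}
```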
ShowInfo([string]$info)
Displays text during runtime if the script was started with the -verbose switch.
DPApiDecrypt([string]$data)
Decrypts a string that was encrypted in the LocalMachine scope with some well-known extra entropy.
CheckForElevatedAdmin([string]$providerName)
Checks whether the script is running as an elevated admin. Returns $false if it isn't, and also displays a warning in that case, using the provided $providerName.
ExpandEnvironmentVariables([string]$data)
Expands any environment variables like %USERNAME% in the data with their real values.
By default the script looks for a config file in the same directory as itself, with the same name as itself but with an '.xml' extension.
If not found, it looks for a file computername.xml in its own directory.
You can also specify a different config file as the first parameter or using the -configfile option.
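The lookup order can be sketched like this. Resolve-ConfigFile is a hypothetical name and the real script may differ in details, such as how the computer name is obtained.

```powershell
# Sketch of the config file lookup order described above.
# Resolve-ConfigFile is a hypothetical name.
function Resolve-ConfigFile {
    param(
        [string]$ScriptPath,           # full path of ServerMonitor.ps1
        [string]$ConfigFile,           # -configfile / first parameter, if given
        [string]$ComputerName = [Environment]::MachineName
    )
    if ($ConfigFile) { return $ConfigFile }            # an explicit file always wins
    $dir = Split-Path -Parent $ScriptPath
    $candidates = @(
        [IO.Path]::ChangeExtension($ScriptPath, '.xml'), # e.g. ServerMonitor.xml
        (Join-Path $dir "$ComputerName.xml")             # e.g. WEB04.xml
    )
    foreach ($c in $candidates) {
        if (Test-Path -LiteralPath $c) { return $c }
    }
    return $null                                       # no config file found
}
```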
If you are testing your filters, it makes sense to delete the file that stores the LastCheck time. Then you get the default 24-hour period to check.
But it is much easier to override the LastCheck on the command line:
ServerMonitor.ps1 ... -LastCheck "2012-10-15 06:00"
or if you always want to go a few days back:
ServerMonitor.ps1 ... -LastCheck ((Get-Date).AddDays(-3))
It is the only logger that logs some additional information about the execution time of the providers.
If you run more than one instance of ServerMonitor, you should set the 'StateName' parameter to be different for each instance.
Otherwise both instances use the same temp file to store the LastCheck value.
You first need to encrypt your secret:
ServerMonitor.ps1 -EncryptText "mySuperPa55w0rd"
The script will display an encrypted version of your text. Copy that into the configuration file attribute.
The encrypted secret will be automatically decrypted on the same machine.
Any user with access to the machine and this script can decrypt the data. But your secrets are not in clear-text and backups of the configuration files only contain encrypted information.
- The text for some events comes from DLLs; it would be nice to figure out how to get to it.
- Enable execution on a remote server
- More error handling and testing
- Windows with PowerShell 4.0 or newer
If you run the script as a member of the Administrators group and check 'Run with highest privileges', you should be fine.
If you want to run it as a normal user, you may not be able to perform all checks.
- In the WinEvents provider, access to the Security log won't work; if you don't need that, you're fine.
- Enumerating the status of services does not work for normal users when running in task scheduler. You can fix this by changing permissions on the service. See this blog post for how to do this.
- The IIS Provider will not work