SCOM: finding stuck agents

Sometimes agents stop collecting data without stopping completely. This often happens when clients are overloaded or the management servers are unavailable for a while. In my case, the disk space counters were usually among the first rules to stop working. So check this counter regularly, for example every week or after maintenance jobs.

This is easily done with a query against the ops DB.

First, create a view:

CREATE VIEW [dbo].[50beansDiskFreeSpace]
AS
-- Latest free-space sample per performance source, ordered oldest first.
SELECT TOP (10) PERCENT
    bme.Path,
    ps.PerfmonInstanceName,
    pdav.SampleValue,
    pdav.TimeSampled,
    mm.IsInMaintenanceMode
FROM dbo.PerformanceDataAllView AS pdav WITH (NOLOCK)
INNER JOIN dbo.PerformanceSource AS ps WITH (NOLOCK)
    ON pdav.PerformanceSourceInternalId = ps.PerformanceSourceInternalId
INNER JOIN dbo.Rules AS r WITH (NOLOCK)
    ON ps.RuleId = r.RuleId
INNER JOIN dbo.BaseManagedEntity AS bme WITH (NOLOCK)
    ON ps.BaseManagedEntityId = bme.BaseManagedEntityId
INNER JOIN dbo.MaintenanceMode AS mm
    ON bme.BaseManagedEntityId = mm.BaseManagedEntityId
-- Only the logical disk and cluster disk free-space collection rules.
WHERE (r.RuleName LIKE N'%LogicalDisk.FreeMB%'
       OR r.RuleName = N'Microsoft.Windows.Server.ClusterDisksMonitoring.ClusterDisk.Monitoring.CollectPerfDataSource.FreeSpaceMB')
  -- Keep only the most recent sample of each performance source.
  AND pdav.TimeSampled =
      (SELECT MAX(latest.TimeSampled)
       FROM dbo.PerformanceDataAllView AS latest
       WHERE latest.PerformanceSourceInternalId = pdav.PerformanceSourceInternalId)
ORDER BY pdav.TimeSampled
GO

Now run the view. At the top you will find all disks in maintenance mode; naturally, they have not collected any data within the last few minutes. After them come all servers with a partly disabled agent. At the bottom you will find clients whose TimeSampled lies within the last few minutes. They are working well.
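Running the view could look roughly like the sketch below. It selects from the view with an explicit ORDER BY, because SQL Server does not guarantee row order from a view unless the outer query asks for it, and it adds the age of the last sample in minutes. Two things here are my assumptions and not part of the original setup: that TimeSampled is stored in UTC (hence GETUTCDATE(); swap in GETDATE() if your values are local time), and the 60 minute threshold in the second query, which is only an illustrative value.

```sql
-- Oldest samples first: maintenance-mode disks and stuck agents show up at the top.
-- Assumption: TimeSampled is stored in UTC.
SELECT
    v.Path,
    v.PerfmonInstanceName,
    v.SampleValue,
    v.TimeSampled,
    v.IsInMaintenanceMode,
    DATEDIFF(MINUTE, v.TimeSampled, GETUTCDATE()) AS MinutesSinceLastSample
FROM dbo.[50beansDiskFreeSpace] AS v
ORDER BY v.TimeSampled ASC;

-- Only the suspicious rows: not in maintenance mode, but no sample recently.
-- The 60 minute threshold is a hypothetical value; tune it to your collection interval.
SELECT
    v.Path,
    v.PerfmonInstanceName,
    v.TimeSampled
FROM dbo.[50beansDiskFreeSpace] AS v
WHERE v.IsInMaintenanceMode = 0
  AND v.TimeSampled < DATEADD(MINUTE, -60, GETUTCDATE())
ORDER BY v.TimeSampled ASC;
```

The second query is handy for a weekly check, since anything it returns is a candidate for a stuck agent.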
