Ever receive alerts from the SQL Agent?
They kinda suck. I mean it is nice that we have the option for the SQL Agent to tell us when a job fails or succeeds, but if you have ever looked at the message, there isn’t much detail in it.
Take a look at this (slightly edited) email I used to get from our systems:
To SQL ALert:
Date Dec 11th, 2016 at 2:46 AM
JOB RUN: ‘Job 001 – Load Sales’ was run on 12/11/2016 at 2:40:47 AM
DURATION: 0 hours, 5 minutes, 25 seconds
MESSAGES: The job failed. The Job was invoked by User Domain/User. The last step to run was step 3 (Load Fact – Prod Sales). The job was requested to start at step 1 (Extract Fact – Prod Sales).
Not very helpful. Sure, I know the job failed, and what step it failed on, but now I have to connect to the agent and look up the history to determine if this is something I have to worry about.
It would be nice to receive the details seen in the history of the job showing up in the email alert received.
Recently I have been working with systems that had all the alert on failure configured so we knew when things failed and could jump on re-running them if needed. We even had them showing up into a data team slack channel, so we had a history as well as notification to everyone on the team at the same time. The problem is that there were not any details in the alerts we received so we had to be able to connect and figure out what to do next or hope that our paid monitoring service would act on something after reading the details of the failure.
It started 10 years ago
Way back in 2007, Jameel Ahmed wrote an article for SQLServerPro on how to get more detailed error messages from failed SQL Agent jobs. I downloaded his code and adapted it some to what I have below. I cleaned out some parts I didn’t need, and it worked like a charm!
/****** Object: StoredProcedure [dbo].[spDBA_job_notification] Script Date: 2/3/2017 7:50:51 AM ******/
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
CREATE PROCEDURE [dbo].[spDBA_job_notification]
Purpose: SQL Job Agent does not send the error messages on failure,
so this procedure queries msdb to for the error message and sends an email.
usage: EXEC spDBA_job_notification $(ESCAPE_NONE(JOBID)) SQL2005 SP1 +
SET NOCOUNT ON;
DECLARE @Today DATETIME
DECLARE @Email_To NVARCHAR(100) = 'You@youremailhost.ext'
DECLARE @jobname sysname
,@retry BIT = 0;
SET @Body = '';
SET @CrLf = CHAR(10);
--constants for Job Status execution:
SET @stat_Failed = 0;
SET @stat_Succeeded = 1;
SET @stat_Retry = 2;
SET @stat_Canceled = 3;
SET @stat_In_progress = 4;
SET @Today = GETDATE();
DECLARE curFailedJobs CURSOR READ_ONLY FAST_FORWARD
,run_datetime = CONVERT(DATETIME, LEFT(run_date, 4) + '/'
+ SUBSTRING(run_date, 5, 2) + '/' + RIGHT(run_date, 2) + ' '
+ LEFT(run_time, 2) + ':' + SUBSTRING(run_time, 3, 2) + ':'
+ RIGHT(run_time, 2))
FROM msdb..sysjobs sj
JOIN ( SELECT instance_id
,run_date = CONVERT(VARCHAR(8), run_date)
,run_time = CASE WHEN LEN(CONVERT(VARCHAR(8), run_time)) = 5
+ CONVERT(VARCHAR(8), run_time)
ELSE CONVERT(VARCHAR(8), run_time)
) sjh ON sj.job_id = sjh.job_id
JOIN msdb..sysjobsteps sjs ON sjs.job_id = sjh.job_id
AND sjs.step_id = sjh.step_id
-- sjh_Min contains the most recent instance_id (an identity column) from where we should start checking for any failed status records.
-- to account for when there is are multiple log history
,instance_id = MAX(instance_id)
WHERE job_id = @jobid
AND step_id = 0
GROUP BY job_id
-- to account for when you run the job for the first time, there is no history, there will not be any records where the step_id=0.
,instance_id = MIN(instance_id)
WHERE job_id = @jobid
AND NOT EXISTS ( SELECT *
WHERE job_id = @jobid
AND step_id = 0 )
GROUP BY job_id
) sjh_Min ON sjh_Min.job_id = sj.job_id
AND sjh.instance_id > sjh_Min.instance_id -- we only want the most recent error message(s).
WHERE sj.job_id = @jobid
AND sjh.step_id <> 0 --exclude the job outcome step
AND sjh.run_status IN ( @stat_Failed ) --filter for only failed status
ORDER BY sjh.instance_id;
FETCH NEXT FROM curFailedJobs INTO @jobname, @stepname, @Err_severity,
@DBname, @run_datetime, @command, @ErrMessage;
WHILE @@fetch_status = 0
-- Build the Email Body
SET @Body = @Body + 'Step name= ' + @stepname + @CrLf
+ 'DB Name = ' + CONVERT(VARCHAR(50), ISNULL(@DBname, ''))
+ @CrLf + 'Run Date = ' + CONVERT(VARCHAR(50), @run_datetime)
IF ( @Err_severity <> 0 )
SET @Body = @Body + 'Severity = '
+ CONVERT(VARCHAR(10), @Err_severity) + @CrLf;
SET @Body = @Body + 'Error = ' + ISNULL(@ErrMessage, '')
+ @CrLf + @CrLf + 'Command = ' + ISNULL(@command, '') + @CrLf;
FETCH NEXT FROM curFailedJobs INTO @jobname, @stepname,
@Err_severity, @DBname, @run_datetime, @command, @ErrMessage;
-- Send the Email
IF ( RTRIM(@Body) <> '' )
SET @subject = 'ALERT: ' + @jobname + ' FAILED on \\'
SET @Body = -- 'Server= ' + @@servername + @CrLf +
'Job_name = ' + @jobname + @CrLf
+ '--------------------------------------' + @CrLf + @Body;
EXEC msdb.dbo.sp_send_dbmail @profile_name = 'DBMAILPROFILE',
@recipients = @Email_To, @subject = @subject, @body = @Body;
Once the code is in place, a slight change to the scheduled jobs needs to happen.
A new step has to be added to call this procedure, and all previous steps configured to call the final step on job failure.
What you have to do
This script uses the running job’s ID to pull info from the system databases to relay the detailed information.
When I was testing this by passing in values and running the procedure, it didn’t give any results, unless that job was actively running.
To make sure this worked nicely the following is how it is setup in the job as the final step of the job.
It is just a simple call to execute the stored procedure which I placed in the master database on this server. There are two parameters passed into the procedure. The first, $(ESCAPE_NONE(JOBID)), captures the current job ID that is called the procedure so the procedure can look up the error messages. The second parameter is something I have in my system for an upgraded proc. I’ll cover that in more depth in a later post. For now, you can just leave the ‘,2’ off completely.
The other thing I did for this new alert step was to configure it to fail on both success and failure. I did this so that when I look at the agent job activity monitor, I will still get a visual that the job failed. There isn’t much worse than hiding a failure behind a fake success message (sometimes when dealing with AAGs and jobs it works, but that’s a different post).
Then I went back to the job’s various steps and set their failure action to call the final step rather than just fail.
That’s it. Even though a little work is required to implement this method, it is a relatively straightforward and helpful solution.
Now I get a far more detailed alert that I can evaluate more efficiently and take actions on without having to dig through the Agent history for the job.
To SQL ALert:
Date Dec 11th at 2:49 AM
Job_name = Job 001 – Load Sales
Step name = Load Fact – Prod Sales
DB Name = master
Run Date = Dec 11 2016 2:49AM
Error = Executed as user: Domain\User. Microsoft (R) SQL Server Execute Package Utility Version 12.0.5000.0 for 64-bit Copyright (C) Microsoft Corporation. All rights reserved. Started: 2:48:53 AM Error: 2016-02-15 02:49:54.07 Code: 0xC001000E Source: Load Fact – Prod Sales
Description: The connection “Connection 1” is not found. This error is thrown by Connections collection when the specific connection element is not found. End Error Error: 2016-02-15 02:49:54.07 Code: 0xC001000E Source: Load Fact – Prod Sales Description: The connection “Connection 2” is not found. This error is thrown by Connections collection when the specific connection element is not found. End Error Error: 2016-02-15 02:49:54.95 Code: 0x00000000 Source: Get Staging Dates Description: Invalid column name ‘modified’. End Error Error: 2016-02-15 02:49:54.95 Code: 0xC002F210 Source: Get Staging Dates Execute SQL Task Description: Executing the query “select convert(smalldatetime, convert(char(10), mi…” failed with the following error: “Invalid column name ‘modified’.”. Possible failure reasons: Problems with the query, “ResultSet” property not set correctly, parameters not set correctly, or connection not established correctly. End Error DTExec: The package execution returned DTSER_FAILURE (1). Started: 2:49:53 AM Finished: 2:49:54 PA Elapsed: 1.625 seconds. The package execution failed. The step failed.
Command = /FILE “\”\\placewherewestoreSSISpackages\SSIS\DW\ProdSales\Load Fact – PROD SALES.dtsx\”” /CONFIGFILE “\”C:\SSIS\DW Extract Configuration.dtsConfig\”” /MAXCONCURRENT ” -1 ” /CHECKPOINTING OFF /SET “\”\package.variables[RunbyJobName].Value\””;”\”Load Prod Sales\”” /REPORTING E
An automated retry has been scheduled for this job.
I even get nifty other alerts, like the actual command that executed and that an automate retry has been scheduled.
Wait! An automatic retry?
Yeah, you read that right. I have elaborated on this alert message so that, based on certain criteria configured within the job, an automated retry can fire off. That second parameter above that I said to ignore. Yeah, that’s my delay in minuted between retries.
For now, I want you to have the ability to get a better error message from your failed jobs. As far as retrying jobs, well that’s for another post (soon, I promise).
Also, it wouldn’t be terribly hard to create a script to add this to all your scheduled jobs by adding the new alert step and then adjust all the other steps to call it on failure. I will probably get around to that eventually, but if you do it before me, let me know!