One fine morning as I’m sitting at my desk doing my work I review some emails about cleaning out some old data from a table that stored webpage session information. I’m given hints about first checking out a job that supposedly purges the data automatically. With my first clue, the wheels were turning on this investigation. To my astonishment, I would fine something terrifying… almost unspeakable…
It was a SQL Server job with no owner!
You may think to yourself “what is the big deal, a SQL Server job with no owner no one got hurt.” Well it’s a bit more complicated than that. Sure, the job had failed probably 1000 times since there was no owner to have permission to run the script in the job. In all honestly, there weren’t enough job history records for me to determine when the job had started failing.
Again, it may not seem like a big deal but let me just say this. The table that stores these webpage session was supposed to have expired sessions expunged regularly. Instead the table had accumulated tons of rows. How many exactly? Oh around 80 million rows on a table that only has one index (PK clustered index) on column that is not used in the delete statement that is in the job, and no other indexes on the table. Basically, the query had to scan the entire table to find the correct records to remove. Also, since the job would execute a delete statement it was logging every single delete in the transaction log.
I gave the poor job an owner and after a few minutes realized that the job was set to run every minute. So you may have already guessed what happened next. If you haven’t let me just say this. The job ran successfully after I gave it an owner. And it kept running, and running, and running, and running… you get the idea. I had to kill the job. It was freezing our website. With some pain and frustration, I was able to remove the unnecessary data from the table.
So as you can see, this job failed consistently because it had no owner. But why did it have no owner, did someone delete the owner? Probably not. My guess is that the person that created the job left the company and thus his/her account was removed from the server logins. In general, when a job is created its owner is its creator, unless someone consciously changes it at creation time.
Let me paint an ugly picture to drive my point across. Imagine if someone created a job or maintenance plan to back up all your databases. That person left, and his/her server login was deleted. The job fails everyday and no error messages are sent out and then one of your databases crashes. No backups = No Restores = Time to find a new job. Also, this particular database and its log just kept growing unnecessarily so our drive was filling with rubbish data which maybe another scenario if you had a similar setup for cleaning out old data.
Here is the error message in the job history log:
Date 2/1/2013 8:29:00 PM
Log Job History (ASPState_Job_DeleteExpiredSessions)
Step ID 1
Job Name VeryImportantJob
Step Name If the Job fails you will be in trouble!!
Sql Severity 14
Sql Message ID 916
Operator Net sent
Retries Attempted 0
‘EXECUTE AS LOGIN’ failed for the requested login ‘LoginName‘. The step failed.
By the way, my job history lied no operators were received any emails 🙂
And now the code to find jobs without owners
FROM dbo.sysjobs j
LEFT OUTER JOIN sys.server_principals p
ON j.owner_sid = p.sid
ORDER BY p.name
At this point you have one of two choices. You can find each job without an owner and assign it a good Server Login (SQL or Windows). Or you can do a bulk update (warning you are doing this at YOUR own risk) of the dbo.sysjobs table in MSDB. When you pick a good server login make sure it is one that you know will never get removed from the server and has enough permission to execute all the job steps. Once you have a good server login, update the owner_sid column (dbo.sysjobs table) to a good sid value from your sys.server_principals table where the value for owner_sid is null. Alright fine, I’ll provide the code for you!
SET owner_sid = [TheSidValue]
WHERE owner_sid is NULL
T-SQL Code for killing a job
EXEC dbo.sp_stop_job 'YOUR JOB NAME'