Querying the Management Data Warehouse for Disk Usage over time (Data Collector)

13 Aug
Querying the Management Data Warehouse for Disk Usage over time (Data Collector)

A part of every DBA’s job is to justify all those disk space requests. The phrases “I just need it” or “Trust me this database is going to grow one terabyte this year” might not convince your boss.

Many people are under the impression that disk space is cheap since they base that notion off of the shell shocker they saw on NewEgg or the weekly circular from Best Buy. If you are using consumer SATA drives from retail outlets for your production databases please do the following:
1. Update your resume
2. Apply for jobs at least 1000 miles from your current location
3. Hope the next DBA that replaces you has more common sense

We’re in the times of SANs, iSCSI, SAS drives, SSD drives and RAID architecture! You should be forking out almost the cost of your server in disk space. Unless of course losing data is an option where you work.

Now that I’ve bored you or intrigued you (or something in between) I can get to my point. There is a wonderful tool that comes with SQL Server 2008 and later called the Management Data Warehouse (MDW) also known as the Data Collector.

Now if you don’t have Data collection setup already you can follow the instructions written by SQL Server MVP Kalen Delaney at the link below. Note: Disk Usage is one of the default collection sets that will be initialized, do not turn it off.

After you have completed the steps to setup data collection open up the Disk Usage report (see screen shot below).

Disk Usage Report

You should get a report that looks similar to the one in the background of the screen shot. Notice anything particularly annoying about the report? Here is a hint, you can’t change the start and end date! By the way there are tons of customized reports created to go against the MDW that you can download.  I downloaded a few and was very frustrated trying to get them to work; not sure why since I’m very comfortable with SSRS but there were some issues with the data source.

Anyhow, I created my own report and I wanted to share the query here with you all. This idea initially came up in a meeting where we asked for most space and was asked to prove it through numbers. So I turned on the MDW and started recording disk space and almost a year later I put together a quick dashboard report in both excel and Reporting Services to query the MDW directly.

**Note: I called my Data collection database DataCollector. If yours is different, then you will have to change the code below and replace DataCollector with your Database name.

/*The column DBSize stores the number of pages in the Database.Each page is equal to 8 KiloBytes. Therefore 128 Pages are equal to one MegaByte.Use these conversions to get the DBSize in the measurement you want.

1 Page = 8KB
DBSize*8 = # of KB
128 pages = 128 * 8(Pages per KB) = 1024KB = 1MB

1024 KB is 1MB
DBSize*8/1024 =MB
OR DBSize/128=MB

1024 KB is 1MB and 1024MB is 1GB
DBSize*8/1024/1024 =GB
OR DBSize/131072=GB

1024 KB is 1MB and 1024MB is 1GB and 1024GB is 1TB
OR DBSize/134217728=TB

/*This first query just returns all the databases and their sizes by number of pages
I wanted to show you how sloppy the numbers look so you would appreciate
the math I wrote up above 🙂 */

SELECT database_name,sum(dbsize)
FROM [DataCollector].[snapshots].[disk_usage] du
inner join sys.databases sd on
---where collection_time between '' and '''+@endtime+'''
GROUP BY database_name
ORDER BY database_name

/*This query returns each database's minimum size, and maximum size, and difference between them over the history of its existence. You can filter out for a certain time frame by adjusting the where clause. Additionally, the same data is displayed for log file sizes. The size displayed is in GB, feel free to change it if you want.The join to the sys.databases table forces the query to only return data for currently live databases.Since data collector could potentially still have data for databases that have been deleted, I decided to put that join in
so I could get information about my live DBs*/

SELECT database_name,(cast(min(dbsize) AS REAL)/131072) as MinSizeGB,(CAST(max(dbsize) AS REAL)/131072) as MaxSizeGB,(CAST(max(dbsize) AS REAL)-CAST(min(dbsize) AS REAL))/131072 as DiffGB,(CAST(min(logsize)AS REAL)/131072) as MinLogSizeGB,(CAST(max(logsize)AS REAL)/131072) as MaxLogSizeGB,(max(logsize)-min(logsize))/131072 as LogDiffGB

FROM [DataCollector].[snapshots].[disk_usage] du
inner join sys.databases sd on
---where collection_time between '' and '''+@endtime+'''
GROUP BY database_name
ORDER BY database_name
--order by max(dbsize)-min(dbsize) desc

/*This query shows the most current size for each database and its logs including the last collection time.
Many people may find this information useful since the MAX size might include a recording for example
where a T-Log grew and later on the T-Log was shrunk.*/

SELECT database_name,(dbsize*8/131072) as LatestDBSizeGB,((logsize)*8/131072) as LatestLogSizeGB, collection_time
FROM [DataCollector].[snapshots].[disk_usage] du
inner join sys.databases sd on
WHERE collection_time = (SELECT MAX(collection_time) FROM [DataCollector].[snapshots].[disk_usage] du1 WHERE du.database_name=du1.database_name)
ORDER BY database_name

/*This query simply brings back the first and last collection times for disk usage for all
of the currently live databases.*/

SELECT database_name,min(collection_time) as first_collection, MAX(collection_time) as last_collection
FROM [DataCollector].[snapshots].[disk_usage] du
inner join sys.databases sd on
GROUP BY database_name
ORDER BY database_name


Tags: , , , , ,

8 responses to “Querying the Management Data Warehouse for Disk Usage over time (Data Collector)

  1. Karl Fasick

    December 17, 2012 at 7:39 PM

    Thanks! I’m trying to figure how to make the query work for a consolidated MDW (where databases are not in sys.databases on the MDW server).


  2. vatmed

    April 16, 2013 at 9:31 AM

    Muchas gracias por este blog estaba enredada con este tema! jajaja


  3. liamchew

    March 2, 2015 at 7:43 AM

    This query gets info by instance and DB:

    select convert(date, collection_time) as [Date]
    , max(usedpages) as [Used space]
    –You might waat other results here too
    from snapshots.disk_usage d
    join core.snapshots_internal i
    on d.snapshot_id = i.snapshot_id
    join core.source_info_internal s
    on s.source_id = i.source_id
    where database_name = ‘YourDB’
    and instance_name = ‘YourInstance’
    group by convert(date, collection_time)
    order by convert(date, collection_time)


  4. Beldi fakhreddine (@beldi20)

    February 5, 2016 at 9:08 AM

    this is too helpful , thank you 😀 ,is there a tutorial to help me , how i can create a custom report how i can monitor different Sql server instances installed on different computer using a MDW centralised on one server



Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: