I received an email from a connection on LinkedIn seeking advice on the architecture for a new reporting system. Although I don’t know the gentleman personally, he is a SQL Server professional and I felt obliged to help out a fellow colleague in the field. The email is included below, with my suggested solutions following it.
Text of the email:
Sorry for the interruption, but I would like to get your professional opinion on a SQL Server 2012 planning/architecture/deployment project I’m working on for my boss. He asked me to research and make recommendations for planning and architecture for the OLTP and reporting databases. He also mentioned that hardware is not a factor; it won’t be a problem to get the hardware we need to make this work.
I’ve done a lot of research on this and I plan on presenting to him a scale-out server deployment that uses a scalable shared database for the reporting database. My recommendations are as follows:
• Separate our reporting database from the production (OLTP) database.
• Use a scale-out report server deployment; the report server database will be hosted on a SQL Server instance that is part of a failover cluster to enhance the fault tolerance of our reporting environment.
• Use a reporting volume accessible over a SAN. The volume will be used solely for the reporting database.
• Build the new reporting database by mounting the reporting volume on the production OLTP database server. We will use one of the data-copy methods available in SQL Server 2012.
• After building the database, set the reporting volume to read-only and dismount it.
• Mount the read-only reporting volume onto multiple reporting servers over the SAN.
• Attach the reporting database (read-only) to an instance of SQL Server on each reporting server.
• The reporting database is now configured as a scalable shared database shared by multiple report server instances.
• Implement a refresh cycle to bring the reporting data up to date. We plan on using SSIS or a script to manage the update cycle.
• Lastly, we plan on alternating versions of the database between two sets of reporting volumes to maximize availability of the shared database.
Please kindly provide your professional input/suggestions/objections.
———————–End of Email
These are some of my recommendations based on the email. They are in no particular order; each one is appropriate for different environments and requires different resources.
1. AlwaysOn Availability Groups. This solution gives you the ability to keep a group of databases, with up to five replicas in total (the primary plus four secondaries), on other servers in near real time. Up to two of the secondary replicas can be set up for synchronous commit, and synchronous replicas can be configured for automatic failover, making this a Disaster Recovery (DR)/High Availability (HA) solution. Additionally, the secondary replicas can be made read-only, allowing you to offload reporting, database backups, database integrity checks, and other maintenance. This solution requires failover clustering (which has already been considered in the email) and can scale out with fault tolerance, but it does not need the shared storage that a clustered instance would require.
Overall, this new feature is a great fit for the type of problem presented, and no custom coding is needed. Additionally, there is a feature called the Availability Group Listener: a virtual network name assigned to the availability group. Applications connect using the listener name rather than a specific server name, which improves scalability and high availability.
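To make the idea concrete, here is a minimal T-SQL sketch of creating an availability group with a read-only secondary and a listener. All names (AG1, SQLNODE1/SQLNODE2, ReportingDB, the IP address) are placeholders for illustration, and the Windows failover cluster and database mirroring endpoints are assumed to exist already:

```sql
-- Hypothetical sketch: two-replica availability group with a
-- synchronous, automatic-failover secondary that allows read-only
-- connections for reporting offload.
CREATE AVAILABILITY GROUP AG1
FOR DATABASE ReportingDB
REPLICA ON
    N'SQLNODE1' WITH (
        ENDPOINT_URL = N'TCP://SQLNODE1.corp.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC),
    N'SQLNODE2' WITH (
        ENDPOINT_URL = N'TCP://SQLNODE2.corp.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));

-- The listener gives applications one stable name to connect to,
-- regardless of which replica currently holds the primary role.
ALTER AVAILABILITY GROUP AG1
ADD LISTENER N'AG1-Listener' (
    WITH IP ((N'10.0.0.50', N'255.255.255.0')),
    PORT = 1433);
```

Reporting tools would then point at AG1-Listener instead of a physical server name.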
2. Database Mirroring with Snapshots. Database mirroring alone will not satisfy this request: unlike Availability Groups, a mirrored database is inaccessible. The advantage of this solution, however, is that it can be implemented on older versions of SQL Server (2005, 2008, and 2008 R2). Also, mirroring itself does not need the Enterprise-level license that Availability Groups require (note, though, that database snapshots are themselves an Enterprise Edition feature, so verify your licensing before counting on this as a cost advantage). Using mirroring requires some custom development because, as mentioned, the mirrored database cannot be read from. Creating a database snapshot on the mirror (not on the principal) allows applications/reports to run against the snapshot. The snapshot must be dropped and recreated for new data to show up, which means custom code must be developed to do this at regular intervals. This solution also provides DR and HA for the environment, and clustering is not required, which further reduces complexity.
Two major drawbacks are that mirroring must be done on a per-database basis (not for a group of databases) and that the principal can only be mirrored to one other server, so it does not scale as well as other solutions. Mirroring also offers different security options than Availability Groups, which require that the servers be in the same domain (all nodes in a cluster must be in the same domain).
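The snapshot refresh cycle mentioned above can be sketched in a few lines of T-SQL, scheduled as a SQL Agent job on the mirror server. The database and file names (SalesDB, SalesDB_Data, the snapshot path) are placeholders:

```sql
-- Hypothetical sketch, run on the mirror server on a schedule:
-- drop the stale snapshot and create a fresh one for reporting.
-- Reports connected to the old snapshot lose it when it is dropped.
IF DB_ID(N'SalesDB_Report_Snap') IS NOT NULL
    DROP DATABASE SalesDB_Report_Snap;

CREATE DATABASE SalesDB_Report_Snap
ON (NAME = SalesDB_Data,   -- logical file name of the mirrored database
    FILENAME = N'E:\Snapshots\SalesDB_Report_Snap.ss')
AS SNAPSHOT OF SalesDB;
```

Reports then query SalesDB_Report_Snap, which presents the mirror's data as of the moment the snapshot was created.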
3. Log Shipping. This is another viable option. With log shipping you essentially automate the backup, transfer, and restore of a database's transaction log to another server. There is a time delay that you set depending on your needs: you could ship logs every 15 minutes (a bit of overkill) if you need frequent refreshes, or back it off to something like 24 hours if you just want a daily refresh for reporting purposes. The solution can be customized to your needs. One drawback (other than latency) is that while a log is being restored, users are kicked out and the database is inaccessible until the restore completes. Like mirroring, this solution works on a per-database basis and is considered a "warm" standby with no automatic failover. Log shipping can be done with multiple secondary servers, giving it better scalability than database mirroring.
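For the reporting scenario, the key detail is the restore step: restoring WITH STANDBY leaves the secondary database in read-only mode between restores so reports can run against it. A minimal sketch, with placeholder file paths and database name:

```sql
-- Hypothetical sketch: apply a shipped log backup on the reporting
-- server. WITH STANDBY keeps the database readable between restores;
-- any users connected at restore time are disconnected first.
RESTORE LOG SalesDB
    FROM DISK = N'\\FileShare\LogShip\SalesDB_0800.trn'
    WITH STANDBY = N'D:\SQLStandby\SalesDB_undo.dat';
```

The built-in log shipping wizard generates Agent jobs that do exactly this backup/copy/restore loop, so custom scripting is usually only needed for special refresh schedules.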
4. Transactional Replication. With this solution, tables are replicated in near real time to another server (or multiple servers). A publication is created with tables (articles) assigned to it; you can also select specific columns and apply filter conditions if needed. Other servers "subscribe" to this publication and receive the data continuously. There are many complexities and challenges in setting up transactional replication, and they should be fully investigated before implementing it, since there are many other options that will work for this problem.
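The publication/subscription setup can be scripted with the replication system stored procedures. This is only a rough sketch with placeholder names (SalesPub, Orders, REPORTSRV1, SalesReports), and it assumes the distributor and publisher have already been configured:

```sql
-- Hypothetical sketch, run at the publisher.
-- Create a publication for continuous (transactional) delivery.
EXEC sp_addpublication
    @publication = N'SalesPub',
    @status = N'active';

-- Add a table as an article; columns and row filters can be
-- restricted here if only part of the table should be published.
EXEC sp_addarticle
    @publication = N'SalesPub',
    @article = N'Orders',
    @source_owner = N'dbo',
    @source_object = N'Orders';

-- Point a reporting server at the publication.
EXEC sp_addsubscription
    @publication = N'SalesPub',
    @subscriber = N'REPORTSRV1',
    @destination_db = N'SalesReports';
```

In practice most people build this through the New Publication wizard in Management Studio, which generates equivalent scripts.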
5. Peer-to-Peer Replication. This is an interesting concept and one that is difficult to wrap one's head around, which is why I provided a link to the TechNet article. The basic concept is that you "load-balance" your database so that reads/writes scale out better, which increases performance and reduces bottlenecks. I cannot add more value to this option, so I will defer to this link: http://technet.microsoft.com/en-us/library/ms151196.aspx
6. Data Warehousing. This is probably the solution that would take the longest to implement, since it is more of a long-term project. Basically, you create a DataMart or DataWarehouse, using an ETL process to push your transactional data to another server for reporting. Depending on which approach you use (Kimball vs. Inmon), you start with either the DataMart or the DataWarehouse. I personally prefer the Kimball method, since it fits better with Agile development and, in my humble opinion, is easier to implement. You basically cleanse your data and denormalize it into a DataMart. From there you can build reports or OLAP cubes for more of a self-service BI model. Finally, as you create multiple DataMarts (each one subject/product specific), you build out a DataWarehouse to store all the data in one central location. Inmon, by contrast, starts with the DW and ends with the DMs, and I think that approach is more complex (I use it at my current job).
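"Denormalize it into a DataMart" in the Kimball style means a star schema: a central fact table surrounded by descriptive dimension tables. A minimal sketch with placeholder names:

```sql
-- Hypothetical star-schema sketch for a sales DataMart.
-- Dimensions carry the descriptive attributes users slice by.
CREATE TABLE dbo.DimDate (
    DateKey      int          NOT NULL PRIMARY KEY,  -- e.g. 20120315
    CalendarDate date         NOT NULL,
    MonthName    varchar(20)  NOT NULL,
    CalendarYear smallint     NOT NULL
);

CREATE TABLE dbo.DimProduct (
    ProductKey   int          NOT NULL IDENTITY PRIMARY KEY,
    ProductName  varchar(100) NOT NULL,
    Category     varchar(50)  NOT NULL
);

-- The fact table holds the measures, keyed to the dimensions.
CREATE TABLE dbo.FactSales (
    DateKey      int          NOT NULL REFERENCES dbo.DimDate (DateKey),
    ProductKey   int          NOT NULL REFERENCES dbo.DimProduct (ProductKey),
    Quantity     int          NOT NULL,
    SalesAmount  money        NOT NULL
);
```

Reports and cubes then aggregate FactSales grouped by any combination of dimension attributes (month, category, and so on), which is what makes the dimensional model so friendly for analysis.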
With this approach you can definitely scale out much further and add additional business logic to your data. For example, you could apply aggregations, business rules, and other transformations to make the data more valuable to your business users. Also, by denormalizing, you allow your users to look at data dimensionally, which can be fantastic for analysis. Reporting and BI should be a journey, not a one-off solution, so even if you use one of the other methodologies first, you should consider data warehousing as a long-term approach if your business is data-driven. Just consider the fact that the new MCSA SQL Server 2012 certification has a data warehousing exam requirement (70-463).