And finally, the TOP and FILTER operators reduce the number of rows returned to one. Now what happens if I change the query again? The optimizer frequently substitutes TOP for the MAX operator. Easy.

Figure 1. I used Red Gate's SQL Data Generator to load the sample data. I'm going to run a series of queries, trying out different configurations and different situations. If you are interested in this approach, I recommend looking into the following advanced topics. This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).

The initial design had a clustered index on each of the primary keys, and you'll note that many of the primary keys are compound, so that their ordering reflects the ordering of the versions of the data. All of these different approaches will return the data appropriately. Versioning a database means sharing all of the changes to a database that are necessary for other team members in order to get the project running properly. Now, when we run the queries at the same time, the estimated cost of the TOP is only 49%, while the estimated cost of the MAX is 50%.

Sample results: when a record in a table or index is updated, the new record is stamped with the transaction sequence number of the transaction that is doing the update. No longer can you simply delete a record. Even the execution plan, although slightly more complex, shows the increase in performance this approach could deliver. He has also developed in VB, VB.NET, C#, and Java. Reporting is also a challenge.

Going further: if you use ORM tools or handle your audit trail with business objects, you are forced to copy each field explicitly from the old business object to the new 'audit' business object. The test was re-run several times to validate that number and to ensure it wasn't caused by some other process interfering.
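To make the "no longer can you simply delete a record" idea concrete, here is a minimal sketch of a soft delete, using Python's built-in sqlite3 in place of SQL Server. The `Document` table and its `IsActive` flag are illustrative names, not the article's actual schema.

```python
import sqlite3

# Minimal soft-delete sketch: rows are never physically removed,
# only flagged inactive. Table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Document (
        Id       INTEGER PRIMARY KEY,
        Title    TEXT NOT NULL,
        IsActive INTEGER NOT NULL DEFAULT 1   -- 1 = live, 0 = soft-deleted
    )""")
conn.execute("INSERT INTO Document (Title) VALUES ('Spec v1')")

# A "delete" only flips the flag; the row (and its history) survives.
conn.execute("UPDATE Document SET IsActive = 0 WHERE Id = 1")

# Readers filter on IsActive, so deleted rows vanish from normal queries.
live = conn.execute(
    "SELECT COUNT(*) FROM Document WHERE IsActive = 1").fetchone()[0]
print(live)  # 0
```

The row still exists for auditing, which is exactly why reporting queries must always filter on the active flag.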
But what happens if we change the query just slightly? I don't want to write reports against this schema. In fact, any of these processes will work well, although, at 46ms, the ROW_NUMBER query was a bit slower. The ability to lock and unlock a record uses record versioning that isn't supported for Exchange items. I tried to go somewhat heavy on the data, so I created 100,000 Documents, each with 10 versions. That is correct, because this is a different set of versioned data.

If we simply add an index on PublicationId, the scans are reduced, but not eliminated, because we're then forced into an ID lookup operation. Instead, we can try including the columns necessary for output, Publication Number and Publication Date; the other columns are included since they're part of the primary key. Since it's not part of the leading edge of the only index on the table, the PK, we're forced to do a scan. This is severe enough that it justifies adding another index to the table. That makes it even harder to comprehend the schema.

When it comes to MAX or TOP, a well-structured query running against good indexes should work well with either solution. Update is identical, except for different values in the insertion. It has a few bad smells to me. Instead, you must flag the record as deleted (a "soft delete").

Now we'll perform the join operation from the Document table to the Version table. After clearing the procedure and system cache, the MAX query produced a different set of scans and reads: the scans against Document and the number of reads against Version were lower, and the execution plan, a subset of which is shown here, changed considerably. Instead of a scan against the Document table, this execution plan was able to take advantage of the filtering provided through the Version and Publication tables prior to joining to the Document table. This is one nice feature: to perform a soft delete, you don't even need to know the record type.
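The MAX-versus-TOP comparison above can be sketched as follows, using SQLite (where SQL Server's `TOP` is spelled `LIMIT`). The `Version` table here is illustrative, not the article's exact schema.

```python
import sqlite3

# Two equivalent ways to fetch the latest version of a document.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Version (
        DocumentId INTEGER,
        VersionId  INTEGER,
        Body       TEXT,
        PRIMARY KEY (DocumentId, VersionId)
    )""")
conn.executemany("INSERT INTO Version VALUES (?,?,?)",
                 [(1, 1, 'draft'), (1, 2, 'review'), (1, 3, 'final')])

# MAX-style: an aggregate subquery picks the highest version number.
max_row = conn.execute("""
    SELECT Body FROM Version
    WHERE DocumentId = 1
      AND VersionId = (SELECT MAX(VersionId) FROM Version
                       WHERE DocumentId = 1)
""").fetchone()

# TOP-style: order descending and take the first row (TOP 1 in T-SQL).
top_row = conn.execute("""
    SELECT Body FROM Version
    WHERE DocumentId = 1
    ORDER BY VersionId DESC
    LIMIT 1
""").fetchone()

print(max_row[0], top_row[0])  # final final
```

With the composite primary key on (DocumentId, VersionId), both forms resolve to an index seek, which is why the article finds the optimizer treats them almost identically.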
Records that are deleted from the leaf level of an index page aren't physically removed from the page; instead, the record is marked 'to be deleted', or ghosted. Depending on the query and the data, each one results in different performance. Comment.PermanentBlogId will store the PermanentId for the blog entry. This is passed to the Sequence Project operator, which adds a value; in this case, the ROW_NUMBER, or RowNum, column itself. If you don't have a record of the current version, then you have to sniff out each database with a SQL comparison tool and generate the upgrade scripts and data migration scripts by hand. Comment.Id is an FK to Audit.Id, just like Blog.Id. The database generates its value when we insert the record into the database. As you can see, the Audit table kicked right in and did its job. Outstanding.

Further, the total cost of the query is estimated at 277.188, far exceeding the cost threshold for parallelism that I have set on my machine, which is 50. Some tables will basically have a new row for each new version of the data. (Download demo project and source - 1.73 KB; see also http://nuget.org/packages/SmartSql.Versioning/.)

First, the TOP query: the query ran in 37ms. The execution plan is just a little more complex than the previous ones. This query ran in 32ms. But it was small, never more than 100 rows. While it's true that you must perform an INNER JOIN with a WHERE clause to select anything using this strategy, the operation is not costly because it is performed on indexed fields. The code below creates a new instance of the Department object. Finally, we call the SaveChanges method to insert the new Department record into the database. If I want to add a BlogComment table, I have to add another audit table.
It resulted in a slightly more interesting execution plan. Clearly, from these examples, the faster query is not really an issue. Each query run will include an actual execution plan, disk I/O, and execution time. I don't like the schema duplication. When the data sets are larger, the processing time goes up quite a ways. Maintaining a version history of SQL Server data has many benefits, but the top three are: … (Copyright 1999 - 2020 Red Gate Software Ltd.)

We used sp_whoisactive to identify a frequent query that was taking a large amount of CPU. I usually tend to create a separate table named Settings and keep the version there. The number of reads against the Version table makes this almost unworkable. Does that look right to you? Database servers which support this (e.g. …). Find the first version following the active version, and activate it. Deletes retire all versions of a record. The query returns just one row. If you are among them, you may want to consider using an alternate indexed column to maintain the chronological order. The execution plans didn't change, and the differences were measured in very small amounts. But there is a lot more to Data-Tier Applications than just version numbers.

Don't use complex notations like "x.y.z" for the version number; just use a single integer. In this case, insertion now involves two operations. This is all from the change to using the PublisherId. When you look at the Blog table, you immediately understand its purpose. Now, not only do we have schema duplication, but we have duplicate abstractions of auditing that can grow apart over time. This has some interesting ideas that seem to fulfil most of my needs. As edits are made to datasets in the geodatabase, the state ID will increase incrementally. There were 5,000 Publishers. Don't use foreign keys.
Most reporting frameworks do not understand the concept of versioned data. Now, comments can have versions just like blog entries, but nothing is ever lost. For example, the following insertion sample could be converted into a stored procedure that takes the Blog table values and the value for Audit.Updated_By. A company I worked for had a well-defined need for versioned data.

A first approach to providing a simple form of version control for the database table table1 is to create a shadow table via the command CREATE TABLE shadow_table1 (id INT, data1 INT, data2 INT, version INT); old versions of the entries are then automatically stored by a PL/SQL function which is triggered by an update on a table entry. It was hard not to notice. Okay, so you're convinced now that the versioning works. Find the version directly preceding the active version, if there is one. This works best on small sets of data.

This then arrives at the following set of scans and reads, which presents other problems, because the Document table isn't being filtered, resulting in more rows being processed. This version number is then stored on the SQL Server and is accessible through the msdb database via the following query, which gives us the version number of our data-tier application as well as a host of other information. Databases don't have version … Adding the ROW_NUMBER query to run side by side with the others was also interesting. And then the query itself changes for the ROW_NUMBER version (thanks to Matt Miller for helping with this one): this query ran in 44ms and had an interesting set of scans and reads. It returned the exact same data with fewer scans and reads. Each comment will get its own PermanentId. When something should be deleted, it should instead be marked as not current, or deleted.
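The shadow-table idea described above can be sketched as follows. The text proposes a PL/SQL trigger; here an SQLite `AFTER UPDATE` trigger stands in for it, copying the old row plus a version counter into `shadow_table1` whenever the live row changes.

```python
import sqlite3

# Shadow-table versioning: a trigger snapshots the old row on every update.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1        (id INTEGER PRIMARY KEY, data1 INT, data2 INT);
    CREATE TABLE shadow_table1 (id INT, data1 INT, data2 INT, version INT);

    CREATE TRIGGER table1_history
    AFTER UPDATE ON table1
    BEGIN
        -- Copy the pre-update row, numbering versions per id.
        INSERT INTO shadow_table1 (id, data1, data2, version)
        SELECT OLD.id, OLD.data1, OLD.data2,
               COALESCE((SELECT MAX(version) FROM shadow_table1
                         WHERE id = OLD.id), 0) + 1;
    END;

    INSERT INTO table1 VALUES (1, 10, 20);
    UPDATE table1 SET data1 = 11 WHERE id = 1;
    UPDATE table1 SET data1 = 12 WHERE id = 1;
""")
history = conn.execute(
    "SELECT data1, version FROM shadow_table1 ORDER BY version").fetchall()
print(history)  # [(10, 1), (11, 2)]
```

The live table always holds the current row, while the shadow table accumulates every superseded version, which matches the "this works best on small sets of data" caveat: the shadow table grows with every update.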
There is some extra work involved in moving the data into the partitions in order to get the row number out of the function, but then the data is put together with Nested Loop joins; again, fewer than in the other plans. Also, like the others, the results of these seeks are joined through a Nested Loop operation. Having made this bold statement, please allow me to shade the answer with the following: test your code to be sure. The best place to store past versions of your data is in a separate table. Your audit requirements may include other fields here. This resulted in 2 scans of 6 reads each, because the top query in the join returned all 10 rows for the Document ID provided.

If you want to go further, look into:
- Using record versioning with your favorite ORM tool
- Using record versioning with code-generated DALs
- Hierarchical versions (for example, if you wanted a Blog rollback to also roll Comments back)
- Encapsulating and abstracting insert/update operations

Most importantly, the (new) standard gives fairly simple SELECT syntax to a … The Audit table contains all the version information. Comment.ParentId points to Comment.Id to allow for nested comments. Query 1, the raw query: select @@version as version. The most common error of versioning in database design is to keep past prices in the same table as current prices. Its scans and reads break down as follows; it resulted in a very similar execution plan, consisting of nothing except Clustered Index Seek and Nested Loop operators, with a single TOP against the Version table. When you set SYSTEM_VERSIONING = OFF, all users that have sufficient permissions will be able to modify the schema and content of the history table, or even permanently delete it. It had 1 scan against the Version table and a combined 5 reads against both tables.
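The ROW_NUMBER approach discussed throughout can be sketched as follows: partition by document, number versions newest-first, and keep only row number 1. This uses SQLite's window functions (available from SQLite 3.25, which ships with current Python builds); the schema is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Version (DocumentId INTEGER, VersionId INTEGER, Body TEXT)")
conn.executemany("INSERT INTO Version VALUES (?,?,?)",
                 [(1, 1, 'a1'), (1, 2, 'a2'), (2, 1, 'b1')])

# ROW_NUMBER numbers each document's versions newest-first,
# so RowNum = 1 is the latest version of every document in one pass.
latest = conn.execute("""
    SELECT DocumentId, Body FROM (
        SELECT DocumentId, Body,
               ROW_NUMBER() OVER (PARTITION BY DocumentId
                                  ORDER BY VersionId DESC) AS RowNum
        FROM Version)
    WHERE RowNum = 1
    ORDER BY DocumentId
""").fetchall()

print(latest)  # [(1, 'a2'), (2, 'b1')]
```

Unlike the MAX and TOP forms, this returns the latest version of every document in a single statement, which is where the article sees fewer operations, scans, and reads.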
There really isn't a measurable difference. Some may only have one or two new rows out of a series of new versions. We solved this problem by using a version table that maintains the order in which data was edited across the entire database, by object. That query had never been a problem before. When the data set is larger, this operation suddenly costs more. This is largely because, more often than not, this type of query is interpreted in the same way by the optimizer whether you supplied a TOP or a MAX operator. As it turns out, we indeed can do much, much better! So, for all Exchange items that are marked as a record, the behavior maps to the "Record - locked" column, and the "Record - unlocked" column is not relevant. Let's try out a few CRUD operations to see how this new approach feels.

If you keep your data according to its version number, but need to work only with a particular version, what is the best SQL for the job? This query resulted in the standard single scan with five reads and ran for 48ms, but had a radically different execution plan: it only accesses each table once, performing a clustered index seek operation. The fundamental principle of moving data involves deleting from the old destination. Here's a relatively simple way to implement data versioning in a database, in a way that should be scalable as well.

Limiting based on PublicationId resulted in a pretty large increase in scans and reads, as well as the creation of a work table. I've blown up a section of the plan for discussion here: it shows that the Sort, which previously acted so quickly on smaller sets of data, is now consuming 56% of the estimated cost, since the query can't filter down on this data in the same fashion as before.
When the snapshot transaction reads a row that has a version chain, the SQL Server Database Engine follows the chain and retrieves the row whose transaction sequence number is closest to, but lower than, the sequence number of the snapshot transaction. The queries below return the server version and edition. Date stamp, active state, who updated it: those are the core audit fields. There is no generally accepted place to store a version number in a database schema. No versioning solution will circumvent the fundamental challenges of versioned records, but we can greatly improve on the traditional approach to auditing data. This means you could make a single stored procedure, spSoftDelete(id), that accepts the ID of the record to soft delete. There is a simple use case of this: new versions of a record can only be added at the current time, superseding one row each.

When a secondary index record is delete-marked, or the secondary index page is updated by a newer transaction, InnoDB looks up the database record in the clustered index. But notice that the old record is no longer active. No longer can you simply update a record; instead, you must perform a soft delete followed by an insert. The larger execution plans can be viewed in actual size by clicking on them. If you had the Blog.Id, you could use that to get the PermanentId of the Blog entry. What if we change the results, though? Soft deletes are performed directly against the audit table. But there is a snag when you want to have a unique index on a field, such as a "username" in a users table. In versioned recording, an update is really a soft delete followed by an insert. This way, you give up a little referential integrity (which you could add back with constraints if you wanted to), but you gain simplicity through decoupled revision changes. Multiple insertions per operation is one drawback to the entity-inheritance strategy, but it can be encapsulated. Finally, let's join all the data together.
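The "work only with a particular version" question above has a straightforward answer: a correlated subquery that, for each publication, picks its highest version not exceeding a caller-supplied maximum. A sketch, with an illustrative schema:

```python
import sqlite3

# For each publication, return its highest version <= a given maximum.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Publication (PublicationId INT, VersionId INT, Title TEXT)")
conn.executemany("INSERT INTO Publication VALUES (?,?,?)",
                 [(1, 1, 'p1v1'), (1, 5, 'p1v5'),
                  (2, 2, 'p2v2'), (2, 9, 'p2v9')])

max_version = 5  # the "as of" version supplied by the caller
rows = conn.execute("""
    SELECT p.PublicationId, p.VersionId, p.Title
    FROM Publication AS p
    WHERE p.VersionId = (SELECT MAX(p2.VersionId)
                         FROM Publication AS p2
                         WHERE p2.PublicationId = p.PublicationId
                           AND p2.VersionId <= ?)
    ORDER BY p.PublicationId
""", (max_version,)).fetchall()
print(rows)  # [(1, 5, 'p1v5'), (2, 2, 'p2v2')]
```

With an index on (PublicationId, VersionId), the inner MAX resolves to a seek per publication, which is the pattern the article's plans keep showing.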
We can set the version number of our database through the properties dialog in Visual Studio. The only difference here is that we need to reference the PermanentBlogId. But, interestingly enough, the execution times for the data I'm retrieving, and the number of scans and reads, are the same. That is a requirement only because Comments are owned by Blogs. Let's take the same query written above and simply return more data from one part. Grant presents at conferences and user groups, large and small, all over the world. That means they are different versions of the same logical record.

This query provides a 5ms execution with one scan and three reads, and the following, identical, execution plan. Finally, the ROW_NUMBER version of the query resulted in a 46ms run with one scan and three reads, like the other two queries. At first, supporting multiple records from multiple tables sounds impossibly difficult, but it works with almost no added effort. After the data loads, I defragmented all the indexes. What we want is a list of publications, each demonstrating the max version that is less than a given maximum version. ROW_NUMBER clearly shows some strong advantages, reducing the number of operations, scans, and reads. Grant Fritchey is a Data Platform MVP with over 30 years' experience in IT, including time spent in support and development. At the time marked 'A' on the graph, we noticed that CPU increased dramatically. In terms of execution plan cost, it was rated as the most costly plan. In a lot of databases and applications, we didn't do updates or deletes; we did inserts. Then, you must get the PK ID of that inserted record for use with the second insertion into the Blog table. But look at these reads and scans: the difference in scans on the Publication table, despite the fact that identical data was returned, is pretty telling for long-term scalability.
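The single-integer schema version advice (a separate Settings table, no "x.y.z" notation) can be sketched like this. The table name, key name, and `migrate_to` helper are illustrative, not part of the article's code.

```python
import sqlite3

# A one-row Settings table records the schema version as a single integer;
# each migration script bumps it by exactly one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Settings (Name TEXT PRIMARY KEY, Value INTEGER)")
conn.execute("INSERT INTO Settings VALUES ('SchemaVersion', 1)")

def migrate_to(conn, target, script):
    """Run a migration only if the database is exactly one version behind."""
    (current,) = conn.execute(
        "SELECT Value FROM Settings WHERE Name = 'SchemaVersion'").fetchone()
    if current == target - 1:
        conn.executescript(script)
        conn.execute("UPDATE Settings SET Value = ? WHERE Name = 'SchemaVersion'",
                     (target,))

migrate_to(conn, 2, "CREATE TABLE Blog (Id INTEGER PRIMARY KEY, Title TEXT);")
(version,) = conn.execute(
    "SELECT Value FROM Settings WHERE Name = 'SchemaVersion'").fetchone()
print(version)  # 2
```

Because the version is a plain integer, deciding whether a script applies is a single comparison, which avoids the "sniff out each database with a comparison tool" problem described earlier.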
To see how this new approach feels, let's version some data. The core audit fields remain the same from table to table: a date stamp, an active state, and who updated the row. We'll create a table to support versionable, nested Blog comments, to demonstrate how similar the CRUD is for a second record type.
A well-defined need for versioned data is normally met by creating separate audit tables that mirror the schema of the record tables. When the two queries are run side-by-side, each takes exactly 50% of the cost of the batch. Versioning starts with a settled database schema (the skeleton) and, optionally, with some data. We have a field, AuditActionTypeName, that records what was done to the row. The approach only needs a couple of support tables, and as long as all your update operations are done correctly, there should be only one record where IsActive = 1. The ability to lock and unlock records is enabled when a new table is created using the create audit method.
The test was re-run several times to validate that number and to ensure it wasn't caused by some other process interfering. The audit table stands alone, which makes the pattern easy to encapsulate. The entity-inheritance strategy does add two more insertions per operation, because you must insert into multiple tables to store one complete 'record'; soft deletes, by contrast, touch only the audit table, though every read must then filter out deleted records. If the row exceeds 8192 bytes, … Ghost records are necessary for row-level locking, and are also necessary for snapshot isolation. Versioning can be applied across multiple data sets concurrently. Where the other plans used many Clustered Index Seeks, this plan has only four.
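The entity-inheritance strategy described above (a shared Audit base table, with Blog.Id keyed to Audit.Id) can be sketched as follows. Column names beyond those mentioned in the text, and the `insert_blog` helper, are assumptions for illustration.

```python
import sqlite3

# Entity-inheritance auditing: Audit holds the versioning fields,
# each record table (here Blog) keys off Audit.Id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Audit (
        Id          INTEGER PRIMARY KEY,
        PermanentId INTEGER NOT NULL,   -- stable across versions of a record
        IsActive    INTEGER NOT NULL,
        UpdatedBy   TEXT
    );
    CREATE TABLE Blog (
        Id    INTEGER PRIMARY KEY REFERENCES Audit(Id),
        Title TEXT
    );
""")

def insert_blog(conn, permanent_id, title, updated_by):
    # Two insertions per operation: one into Audit, one into the record table.
    conn.execute("UPDATE Audit SET IsActive = 0 WHERE PermanentId = ?",
                 (permanent_id,))
    cur = conn.execute(
        "INSERT INTO Audit (PermanentId, IsActive, UpdatedBy) VALUES (?, 1, ?)",
        (permanent_id, updated_by))
    conn.execute("INSERT INTO Blog (Id, Title) VALUES (?, ?)",
                 (cur.lastrowid, title))

insert_blog(conn, 100, "First draft", "alice")
insert_blog(conn, 100, "Edited draft", "bob")   # new version retires the old

active = conn.execute("""
    SELECT b.Title FROM Blog b JOIN Audit a ON a.Id = b.Id
    WHERE a.PermanentId = 100 AND a.IsActive = 1""").fetchone()
print(active[0])  # Edited draft
```

Note how the helper keeps the invariant from earlier: at most one row per PermanentId has IsActive = 1, and adding a second record type (comments) would reuse the same Audit table unchanged.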
Blog entries and comments now both have versions. You cannot set SYSTEM_VERSIONING = OFF if you have other objects created with SCHEMABINDING. Once versioning is enabled, history can be queried using temporal query extensions. We'll add two more versions, for a total of four Blog entries, and then check our CRUD again. The script below builds a DDL trigger that fires when DDL changes are made and increments the version number. Best practice #6: the database version should be recorded with every schema change.
A record will be split across two different tables, but only at first sight: the BlogComment_Archive table behaves just like the Blog_Archive table. A version references a specific database state, a unit of change that occurs in the database. Columns are auto-mapped to the version, and there are no problems with different collations from different databases. The cost shows up in the CRUD operations, which become more complex, especially on update and delete, since all changes made in the database must be recorded. On the largest data sets, the ROW_NUMBER query ran up to 13 seconds.
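The rollback step mentioned earlier ("find the version directly preceding the active version, if there is one") can be sketched as an undo operation. The schema is illustrative, and using the Audit row id as the version order is an assumption of this sketch.

```python
import sqlite3

# Undo: deactivate the current version and activate its predecessor.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Audit (
    Id INTEGER PRIMARY KEY, PermanentId INT, IsActive INT)""")
conn.executemany("INSERT INTO Audit VALUES (?,?,?)",
                 [(1, 100, 0), (2, 100, 0), (3, 100, 1)])  # v3 is active

def undo(conn, permanent_id):
    row = conn.execute(
        "SELECT Id FROM Audit WHERE PermanentId = ? AND IsActive = 1",
        (permanent_id,)).fetchone()
    if row is None:
        return
    # Find the version directly preceding the active one, if any.
    prev = conn.execute(
        "SELECT MAX(Id) FROM Audit WHERE PermanentId = ? AND Id < ?",
        (permanent_id, row[0])).fetchone()
    if prev[0] is not None:
        conn.execute("UPDATE Audit SET IsActive = 0 WHERE Id = ?", (row[0],))
        conn.execute("UPDATE Audit SET IsActive = 1 WHERE Id = ?", (prev[0],))

undo(conn, 100)
(active_id,) = conn.execute(
    "SELECT Id FROM Audit WHERE PermanentId = 100 AND IsActive = 1").fetchone()
print(active_id)  # 2
```

A matching redo would do the reverse: find the first version following the active one and activate it, so nothing is ever lost in either direction.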