Saturday, February 13, 2016

4.2 Step 2: moving a second non-SAP application to the same HANA instance

We now bring a second source application (Application 2) into the picture. Just like the first one (Application 1), it is implemented in Java on a relational database system. Application 2 provides data to Application 1, which already runs on HANA, via an interface with nightly, long-running batch loads. Like Application 1, Application 2 also comes with 2 TB of source data. Technically, we create the tables of Application 2 within the HANA database, import all data, and benefit from the built-in compression mechanisms of HANA (60-70% smaller). We then connect the application to the HANA database, so that we have two different applications talking to the same HANA instance. From a performance perspective this is no issue at all, as we have seen before.
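To make this step a bit more concrete, here is a minimal sketch of what creating Application 2's tables in HANA and loading them over JDBC could look like. It assumes the SAP HANA JDBC driver (ngdbc.jar) is on the classpath; the hostname, credentials, schema, table and column names are purely illustrative and not taken from any real system.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class MoveApp2ToHana {
    public static void main(String[] args) throws Exception {
        // Both applications now talk to the same HANA instance
        // (hostname, port and credentials are placeholders).
        try (Connection hana = DriverManager.getConnection(
                "jdbc:sap://hana-host:30015", "APP2_USER", "***")) {

            try (Statement stmt = hana.createStatement()) {
                // HANA column-store tables are compressed automatically;
                // no extra configuration is needed for the 60-70% savings.
                stmt.execute("CREATE COLUMN TABLE APP2.SALES_ORDER ("
                        + " ORDER_ID BIGINT PRIMARY KEY,"
                        + " CUSTOMER_ID BIGINT,"
                        + " AMOUNT DECIMAL(15,2))");
            }

            // Import: in the simplest case, read from the old source database
            // and insert into HANA (batch inserts keep the load fast).
            try (PreparedStatement insert = hana.prepareStatement(
                    "INSERT INTO APP2.SALES_ORDER VALUES (?, ?, ?)")) {
                insert.setLong(1, 4711L);
                insert.setLong(2, 42L);
                insert.setBigDecimal(3, new java.math.BigDecimal("199.90"));
                insert.addBatch();
                insert.executeBatch();
            }
        }
    }
}
```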


Let’s give some thought to the data in the HANA database. We know from chapter 4.1 that HANA compresses our 2 TB of source data by roughly two thirds to 660 GB. With our second application, which also comes with 2 TB of source data, we would double the amount in HANA to roughly 1.3 TB. Really?!

As our two applications are connected via an interface, we can assume that a large portion of the data is duplicated across the two databases. Let’s say that 75% of the data is duplicated. I have personally seen transactional systems where the share of duplicated data was rather in the 90% range, and if you look at analytical systems like business warehouses, the amount of duplicated data is 100% by design.

So let’s get rid of all the duplicates now and store each piece of information only once in the HANA database. If we discard 75% of the source data of Application 2, we are left with only 500 GB of unique data, which HANA compresses to 165 GB. What have we achieved by now (the arithmetic is sketched right after this list):
  • 4 TB of source data have been compressed to 825 GB
  • We shut down 2 productive database servers (probably with one integration and one development system each, which adds up to 4 more servers)
  • We decommissioned an interface between two applications along with its nightly batch runs (which also improves the timeliness of data in Application 1, as it no longer has to wait a night for its data)
  • We decreased our demand for backup storage by 3.175 TB
  • We got everything out-of-the-box without doing anything in particular
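To see where the 825 GB and the 3.175 TB come from, here is the sizing arithmetic of this and the previous step in one place. The compression factor of roughly one third and the 75% overlap are the assumptions made above, not measured values:

```java
public class HanaSizing {
    public static void main(String[] args) {
        double sourceApp1Gb = 2000;   // 2 TB source data, Application 1
        double sourceApp2Gb = 2000;   // 2 TB source data, Application 2
        double compression = 0.33;    // HANA keeps roughly one third
        double overlap = 0.75;        // 75% of App 2's data already exists in App 1

        double app1InHanaGb = sourceApp1Gb * compression;     // ~660 GB
        double app2UniqueGb = sourceApp2Gb * (1 - overlap);   // 500 GB
        double app2InHanaGb = app2UniqueGb * compression;     // ~165 GB
        double totalInHanaGb = app1InHanaGb + app2InHanaGb;   // ~825 GB

        double backupSavedTb = (sourceApp1Gb + sourceApp2Gb - totalInHanaGb) / 1000;

        System.out.printf("App 1 in HANA: %.0f GB%n", app1InHanaGb);
        System.out.printf("App 2 in HANA: %.0f GB%n", app2InHanaGb);
        System.out.printf("Total in HANA: %.0f GB%n", totalInHanaGb);
        System.out.printf("Backup saved:  %.3f TB%n", backupSavedTb);
    }
}
```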
Yet there remains one problem: 75% of the data that Application 2 uses now resides in the data model of Application 1 (which we moved to HANA first), which means it is most probably no longer accessible to Application 2 without reprogramming. This is where we make use of the HANA view concept.

“HANA views” are essentially virtual views on the data stored in the database. They are not persisted as in other database concepts; they exist only in memory and are created on the fly whenever they are needed, according to the design rules the administrator has modeled. One of the main concepts of HANA is to store data in tables in a denormalized form and then use “HANA views” to create any data model that is needed.
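As a small illustration of the concept (again assuming the HANA JDBC driver; the schema, table and view names are made up for this example), a shared base table can be exposed through a plain SQL view that is evaluated in memory whenever it is queried:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HanaViewSketch {
    public static void main(String[] args) throws Exception {
        try (Connection hana = DriverManager.getConnection(
                "jdbc:sap://hana-host:30015", "MODELER", "***");
             Statement stmt = hana.createStatement()) {

            // The data itself lives exactly once in a column-store table.
            stmt.execute("CREATE COLUMN TABLE SHARED.CUSTOMER ("
                    + " CUSTOMER_ID BIGINT PRIMARY KEY,"
                    + " NAME NVARCHAR(100),"
                    + " COUNTRY NVARCHAR(2))");

            // The view is not persisted; HANA evaluates it on the fly
            // whenever a consumer queries it.
            stmt.execute("CREATE VIEW SHARED.GERMAN_CUSTOMERS AS"
                    + " SELECT CUSTOMER_ID, NAME"
                    + " FROM SHARED.CUSTOMER WHERE COUNTRY = 'DE'");
        }
    }
}
```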

So now we make use of this concept for our example, which means a little extra effort for us, but much less effort than reengineering our applications. What we do is denormalize the data models of both applications and combine the data in elementary tables. We then reconstruct the individual data models of Applications 1 and 2 as views on top of those tables, so that when the applications talk to our HANA database they do not realize that the data model underneath has changed. They just continue to work as before, with no code modifications required.
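In the simplest case, that reconstruction just means recreating the table names and column names each application expects as views over the shared tables. The sketch below reuses the illustrative SHARED.CUSTOMER table from the example above; all schema, table and column names are invented for this example:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RebuildAppModels {
    public static void main(String[] args) throws Exception {
        try (Connection hana = DriverManager.getConnection(
                "jdbc:sap://hana-host:30015", "MODELER", "***");
             Statement stmt = hana.createStatement()) {

            // Application 1 keeps seeing "its" customer table, including
            // its old column names, although the data now lives in SHARED.
            stmt.execute("CREATE VIEW APP1.CUSTOMERS AS"
                    + " SELECT CUSTOMER_ID AS CUST_NO, NAME AS CUST_NAME"
                    + " FROM SHARED.CUSTOMER");

            // Application 2 gets the same data under the name and shape
            // it expects; the 75% overlap is stored only once.
            stmt.execute("CREATE VIEW APP2.BUSINESS_PARTNER AS"
                    + " SELECT CUSTOMER_ID AS PARTNER_ID, NAME, COUNTRY"
                    + " FROM SHARED.CUSTOMER");
        }
    }
}
```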

You can easily guess the next steps: we repeat this approach for applications 3, 4, 5, and so on, until there are no more applications left.
