-
9 April 2021
- Data Engineering
Companies collect and store large amount of data that holds information about the business profile e.g. sales, cooperation with contractors, resources in warehouses, supply and product prices. The obvious fact is that data cannot be located arbitrarily in database systems. In this situation, it is necessary to properly design the relational database. A well-designed relational database is characterized by the fact that all the necessary information in the company’s business segments is clearly and legibly documented and easy to use on a regular basis.
One way to intuitively construct a relational database is normalization. It is a proces of organizing data in a database that involves creating tables that represemt a unique segment of the company’s operations (such as vendors or customers) and setting relationship between these tables by eliminating redundancy and inconsistent dependency. At the same time, normalization can be treated as a target in the relational database design phase.
Sometimes, however, there are situations when it is necessary to depart from this rule and there may be a need to carry out the so-called Database Denormalization.
What is database denormalization?
In short, database denormalization is the combination of normalized tables into one. This is the implementation of controlled redundancy into the database to speed up operation on it. As a result of having a large amount of data in a relational tables, joining these tables to obtain the information you need for your business can become too expensive. Therefore, one solution is to cross-check keys or columns between tables that are joined quite frequently.
Consequently, the target table contains not only the data relevant to it, but also information from other tables. Of course, this solution involves the possibility of data redundancy within the tables, which in turn leads to a rapid increase in their size. A common symptom is the possibility of data duplications.
From this point of view, the database denormalization seems to be a kind of compromise. While in the case of database normalization, the aim is to simplify it to the maximum form of tables that are independently responsible for each data subject area, the denormalization is assigned to the specificity of browsing these data and this cannot be universally defined for each case. Its efficient functioning must be defined based on the business needs. If you want to know how to implement it properly visit our Data Engineering Consultancy page
Database denormalization – advantages and disadvantages.
In case of a rapid increase in the amount of data in a database, denormalization brings tangible benefits, but it has several disadvantages. Let’s introduce a few of them:
Advantages of Database denormalization:
- Increased query execution speed.
As there is no need to use joins between tables, it is possible to extract the necessary information from one table, which automatically increases the speed of query execution. Additionally, this solution saves memory.
- Writing queries is much easier.
If the table is properly reorganized for the most common needs, you can extract data from only one table and not waste time looking for join keys. However, one should remember about data redundancy and update the query accordingly.
- No need to obtain data from dictionary tables where the values are constant over time.
Tables with country dictionaries are good examples. If a company operates in a fixed number of world markets, it seems unnecessary to make continuous joins with the dictionary table with countries. In this case, it is worth adding a column with the name of the country to, for example, a sales table.
- Ability to add aggregate data, which can be used for more efficient reporting.
Certain statistics, such as the number of sales actions, average sales, etc., are very necessary to analyze various areas of the company’s operation. Therefore, it may be easier to define key statistics and include them in one table than to retrieve them by joining multiple tables.
- Reduction of the number of tables in a relational database.
In case of a complex relational database architecture, obtaining data from the multiple tables can be tricky. If the database is properly denormalized, the number of these tables can be effectively reduced and, consequently, the database architecture can be simplified.
Disadvantages of Database denormalization:
- Increased pricessing size.
Due to data redundancy and possible data duplication, the size of query processing increases.
- Increased table sizes.
As a result of the denormalization of the database, the table may significantly increase its size, which may be associated with the load on the storage space.
- Increased costs of updating tables and inserts.
In a table where data has undergone redundancy due to the database denormalization, data update may be a problem. For example, let’s assume that an additional column that contains data about customer’s address has been added. Updating this data can be burdensome and costly if the customer changes the address. If the database is normalized, updating can only be done in the dictionary table at a much lower cost. It is similar with inserts. Due to the redundancy of data as a result of joining multiple tables, obtaining many data for one table may be burdensome.
- Data may be inconsistent.
Before executing the query, it is necessary to get to know the table thoroughly and to take into account data duplication. The query that will extract the necessary data without a risk of data inconsistency should be comprehensively prepared.
Database denormalization – examples.
- Columns with aggregated data.
Let’s assume the database has advertisers, sales and campaigns tables. For the purposes of reporting advertiser results, there is a need to count campaigns and the sales columne for each advertiser. It is possible to add to additional columns in the advertisers table, which count the number of campaigns and the sales volume. As a result of this transformation, there is no need to retrieve this data from the sales and campaigns tables each time by using count functions.
- Dictionary tables.
In this example there are two tables in the database: countries and customers. One of the company’s needs is to research customers and countries from the point of view of sales effectiveness therefore joins are refularly performer between the tables: customers and countries. To limit the frequent joining of these two tables, an additional column could be added to the customer table – the country name.
- Create a new table that meets business needs.
Let’s assume that there is a frequent need to extract data from multiple tables. With proper definitione of business needs, it is possible to create a table that will reduce the processing size and time of regular joins.
Let’s go back to the first example. Suppose a business wants to regularly extract sales details data such as campaigns or advertisers with full names. For this purpose it is possible to create a table that will contain all the necessary data in the sales table. In such table business can retrieve the necessary data without the need to join multiple tables on a regular basis.
If you want to know how these solution can help your business, contact us.
Check out our blog for more details on Data Engineering: