When talk turns to the topic of storing and reporting on retail data, it’s common to hear debate about which data system is best. Of course, that’s the wrong approach. It’s not about which is best. Instead, it’s about which best system suits your needs. Take data warehouses and databases as examples. Both are relational data systems. However, they are used in a fundamentally different way.
So the question remains: which should you choose? As already mentioned, that depends on the needs of your retail business. To decide, you first need to consider the characteristics of each. That’s what you’ll find below. Phrased as questions, we unpack five key differences between data warehouses and databases.
It's worth pointing out that you might want to use both. You can. In such cases, it's critical to understand what you want to achieve so that you can take the best approach.
1. Data Warehouse vs Database: What is the purpose of each?
The first key difference between a data warehouse and a database is the purpose.
Let’s consider the data warehouse first.
In simple terms, a data warehouse is a central information storage hub or repository, holding all of your business information and data collected from all of your different systems or sources. From a retail perspective, that can include everything from your sales and inventory data to your customer and POS data.
Of course, your data warehouse is not just for storing data. Using various business intelligence tools and software, you can also analyse this consolidated data and get answers to critical questions posed by your business. That said, querying data from a complex warehouse isn’t always a quick or easy process.
Compare that to a database.
While it also stores information, a database is set up to hold and organise structured data from only a single or hand full of sources. It's also optimised for speed and efficiency. In that sense, it should be relatively easy and straight-forward to query and retrieve any data - usually transactional - when requested.
2. How does each process data?
A second significant difference between data warehouses and databases is in the way in which each processes data.
While data warehouses use OLAP or Online Analytical Processing as a method to process data, databases use OLTP or Online Transactional Processing.
As for how each method works, here's a brief explanation:
OLAP is an approach to answering multi-dimensional analytical queries about your data. It allows you to analyse and evaluate your data from multiple perspectives, thereby providing you with a holistic view of your business. That's why it works so well for a repository such as a data warehouse.
Let's say, for example, you want to perform a query across all of your stores to see where a certain product sold the most during a certain period.
OLAP also allows you to select, extract, view and analyse your retail data to spot trends and identify any potential future issues.
In contrast, OLTP focuses primarily on transaction-orientated tasks. Databases use this processing method to insert, update, delete and replace large numbers of transactions.
For this processing method to be efficient, the response time from databases does need to be fast. Since it's only querying few data sources, that should be all but guaranteed.
3. Database vs Data Warehouse: How is data structured?
When it comes to the topic of data structure, there are generally two different processes to consider.
The first is to use a normalisation data structure; the second is to use a denormalisation data structure.
Most, if not all databases use the first while data warehouses generally use the latter.
As for what each is; normalising data is the process of structuring your data (organising it into multiple columns and tables) to reduce or eliminate any redundant data. It is also meant to improve the integrity of your data.
That said, there is the point that because it involves multiple tables, it can cause queries to slow down. The more normalised your data is, the more complex the queries need to be to read the data as a single query combines data from many tables.
On the other hand, your data warehouses uses a denormalised data structure. Denormalisation is the process of improving the read performance of a database by grouping data. It uses fewer tables, which means when you query anything, everything is in one table. That said, because it include data redundancies, it can come at the expense of some write performance.
In the case of a data warehouses, that's a good thing because it allows for better analysis opportunities. You’re able to get a holistic view of your business.
4. How simple or complex are the tables?
As mentioned above, the more normalised your data, the more complex your queries. That directly influences the complexity of your tables and joins.
Thus, for relational databases such as Microsoft SQL Server or PostgreSQL, tables and joins are generally more complex.
Let’s say, for example, you run a few pharmacy retail stores and you need to track everything. Within each store, you have multiple departments, categories, sub-categories, segments and products. You also have different prices and sales figures and even customer data. That’ll involve multiple joins to keep everything in check and functioning correctly.
Compare that to a data warehouse where it’s a much more straight forward process. That’s because you have a denormalised data structure. Your data warehouse doesn’t require several joins across multiple tables because it’s not meant for quick and efficient queries.
Instead, by grouping data together - increasing the amount of data that you consider - you can take a broader view for analytical purposes.
5. Data Warehouse vs Database: What is the storage limit?
When it comes to storage limit, it’s important to consider the software used. Why? Because you can use the same software for a database and a data warehouse.
Of course, while both can use the same software, the way in which each uses it differs. For a database, you can store data as structured fields, tables and columns. That includes multiple tables and joins. From there you can retrieve them, either directly or through programmatic access.
You’d thus need to use software that has storage limit that suits your businesses needs.
Here are a few examples of databases and data warehouses along with their limits:
Microsoft SQL Server 2016 Express has a size limit of 10GB.
PostgreSQL doesn’t have a limit to its database size. There is, however, a limit to the table size - up to 64 terabytes (TB). There is also no limit to the number of rows in a table or the number of indexes you can create on a table
Microsoft Azure SQL Database can hold 240 terabytes compressed on disk. Since this space is independent of tempdb (system database) or log space, it’s dedicated entirely to permanent tables. Because of compression, you can expect up to 1 petabyte of storage.
There are other data warehouses such as Snowflake, which was built for the cloud and scales dynamically. It supports an ANSI-compliant form of SQL.
DotActiv’s Pro and Enterprise solutions come standard with an integrated database so that you can have a seamless flow of data to and from your software. DotActiv software is also compatible with PostgreSQL. Visit our online store here where you can also try either software edition for 14 days prior to purchasing.