em360tech image

An enterprise data warehouse (EDW) is exactly what the name suggests: a centralized repository for business data. More importantly however, an EDW has the potential to be a valuable strategic asset; providing a foundation for your most ambitious tech projects, boosting your decision-making capabilities, and improving efficiency across your organisation.

Learn more about what an EDW comprises, why it’s important, and how you can put it to work…

EDW Definition: What is an Enterprise Data Warehouse

An EDW is a centralised repository for consolidating an organisation's historical business data, deriving from all of that organisation’s data sources and applications. It comprises a series of databases storing structured data, from which business users can perform queries and generate insights.

Read: Top 10 Data Warehouse Tools for 2024

Why Is EDW Important?

Estimates suggest that the volume of data across the globe doubles every two years. However, the fact that your organisation is generating lots of data does not automatically translate into a business benefit. You need a way to unlock its value: something that a large proportion (70%) of companies struggle to achieve.

This is where an enterprise data warehouse comes in. Get it right, and an EDW will help you achieve the following:

EDW should give you a single, integrated repository of data from all areas of your organisation; i.e. a single source of truth.

You should be able to scale up your EDW as needed, to handle greater volumes of data, and a more diverse range of data sources.

It gives you a good bedrock for most of the data-driven digital transformation projects you have in the pipeline, including Business Intelligence (BI), AI-driven advanced analytics, and projects that draw on machine learning.

What Does an EDW Consist of?

edw

EDW deployment options

EDW can be hosted on an organisation’s own data centers (On-premise EDW). It can also be hosted through cloud services (Cloud EDW). There is no right or wrong option here: IT decision makers tend to choose on-premise if they need tight control over data and infrastructure, and cloud options if they prioritise flexibility and cost-effectiveness.

EDW components

An enterprise data warehouse broadly comprises the following:

Data sources

Some of the most common types of data sources that typically feed into an EDW include the following:

Transactional databases used for day-to-day activities (e.g. orders, inventory, marketing, point of sale, and accounting systems).

Departmental data. This includes exports from smaller, more specific repositories and files focused on a particular area of the business. Examples include performance metrics and supporting data from sales, finance, or marketing.

Third-party app data. Examples include data from customer relationship management (CRM) tools, enterprise resource planning (ERP) solutions, and HR management systems.

External data. Examples include industry benchmarks, general market information, and data on customer trends.

Data ingestion

There are two main methods for taking data from its source and delivering it an EDW. These are referred to as Extract, Transform, Load (ETL), and Extract, Load, Transform (ELT).

The difference between these two approaches relates to the point in time at which you actually ‘transform’ the data (i.e. clean it, fix any errors, remove duplicates, and format it so it’s optimised for its intended purpose). As the names imply, an ETL tool transforms the data in a staging area before you load it into your EDW. With ELT, transformation takes place after you’ve loaded your data into the EDW.

Tip: for a primer on ETL and ELT, and the circumstances under which each method is most useful, it’s definitely worth taking a look at our guide, ETL vs ELT: What’s the Difference?

Staging area

If you are using ETL, the staging area is that component of your EDW where the data gets cleansed, de-duplicated and generally prepared to fit your EDW’s data model.

Storage area

Post extraction (and transformation in the case of ETL), your data is loaded into your EDW’s storage layer. This is where it is organised according to whatever schemas enable you to use and query your data as efficiently as possible; for instance, using fact tables for basic transactional data) and dimension tables - complete with descriptive attributes - for more complex data types.

This storage area is often partitioned into smaller data marts (for instance, by geographical origin, subject area, business function, or data age). This helps ensure that when a user submits a query, the results come up quicker, as less data has to be scanned.

Presentation layer

This EDW component is what users interact with day-to-day. This dashboard provides a hub for users to submit queries, and pull out pieces of information. Using an approach known as Change Data Capture (CDC), you can also track any changes made to your original source data in the EDW, before propagating them with real-time or near-time to destination applications (advanced analytics solutions, for instance).

What Is the Difference between Enterprise Data Warehouse and Data Lake?

edw

As we’ve seen, your enterprise data warehouse is where you store structured data, where it is cleaned, organized and formatted, so it’s rendered usable for whatever applications you intend for it.

Like EDW, a data lake is another form of repository. But unlike EDWs, a data lake has no fixed limitations on data characteristics (e.g. format, or file type), on data volume, or on data purpose. This makes a data lake more of an all-purpose holding pen for all categories of raw (and sometimes ‘messy’) data which you may or may not want to utilise later.

The types of data commonly stored in data lakes include the following:

Web data

Granular transactional data (e.g. detailed purchase logs)

IoT (Internet of Things) data; i.e. potentially vast volumes of data from sensors, monitors, and location data

Machine Learning data; potentially masses of data - much of which might not be formally categorised - but which may be of value when training any machine learning models your organisation may have in the pipeline (customer-facing chatbots, for instance).

Archived historical data. This data is no longer needed or accessed frequently. However, you still need to keep it (for regulatory reasons, for instance).

EDW in Context: Your Takeaway…

Your enterprise data warehouse provides a repository for storing structured data, and rendering it usable for specific activities. This EDW will very likely be organised into sub-categories known as data marts.

Alongside this, it may be useful or necessary to have a data lake in play to store potentially masses of unstructured data. This data can be transformed (using ETL or ELT) and fed into your EDW as and when required.