Tuesday, September 8, 2009


Better Information through Master Data Management - MDM as a Foundation for BI


INTRODUCTION

Business Intelligence systems are designed to help organizations understand their operations, customers, financial situation, product performance, trends and a host of key business measurements. This information is used to make decisions about organizational direction. Poor intelligence results in poor decision making, and the costs can be enormous. Over the past several years, a serious effort has been made to understand the root cause of poor quality business analytics. Most organizations and analysts now agree that the basic reason the reporting is wrong is that the operational data feeding the analytical engines is filled with errors, duplications and inconsistencies. If the poor quality reporting is to be fixed, it has to be fixed at its source: the poor quality data under the applications that run the business. This is the Master Data. The solution to this overarching problem is Master Data Management. MDM is the glue that ties analytical systems to what is actually happening on the operational side of the business.

This paper will examine the nature of master data, how errors are introduced, and how those errors impact analytics. We will discuss the key capabilities in Oracle’s MDM solutions that enable it to 1) clean up poor quality data, 2) keep the data clean in the face of massive ongoing data changes, and 3) provide the necessary information about the data to the analytical side of the business.

In order to understand how MDM capabilities are used to solve the BI problem, we first need to understand the nature of master data.

ENTERPRISE DATA

An enterprise has three kinds of actual business data: Transactional, Analytical, and Master. Transactional data supports the applications. Analytical data supports decision-making. Master data represents the business objects upon which transactions are done and the dimensions around which analysis is accomplished.

Transactional Data

An organization’s operations are supported by applications that automate key business processes. These include areas such as sales, service, order management, manufacturing, purchasing, billing, accounts receivable and accounts payable. These applications require significant amounts of data to function correctly. This includes data about the objects that are involved in transactions, as well as the transaction data itself. For example, when a customer buys a product, the transaction is managed by a sales application. The objects of the transaction are the Customer and the Product. The transactional data is the time, place, price, discount, payment methods, etc. used at the point of sale. The transactional data is stored in Online Transaction Processing (OLTP) tables that are designed to support high volume, low latency access and update.



Solutions that focus on managing the data objects under operational applications are called Operational MDM. They bring real value to the enterprise, but lack the ability to influence reporting and analytics.

Analytical Data

Analytical data is used to support the company’s decision making. Customer buying patterns are analyzed to identify churn, profitability and marketing segmentation. Suppliers are categorized, based on performance characteristics over time, for better supply chain decisions. Product behavior is scrutinized over long periods to identify failure patterns. This data is stored in large Data Warehouses and possibly smaller data marts with table structures designed to support heavy aggregation, ad hoc queries, and data mining. Typically the data is stored in large fact tables surrounded by key dimensions such as customer, product, account, location, and time.

Solutions that focus on managing dimension data are called Analytical MDM. They master shared entities such as financial data hierarchies and general ledgers (GLs) between multiple DW/BI systems across domains. Oracle’s Hyperion DRM is a market leading solution in this area.

Analytical MDM products bring real value to the enterprise, but lack the ability to influence operational systems.

Master Data

Master Data represents the business objects that are shared across more than one transactional application. This data represents the business objects around which the transactions are executed. This data also represents the key dimensions around which analytics are done.

Maximum business value comes from managing both transactional and analytical master data. These solutions are called Enterprise MDM. Operational data cleansing improves the operational efficiencies of the applications themselves and the business processes that use these applications. The resultant dimensions for analysis are true representations of how the business is actually running. Oracle, with its recent acquisition of Hyperion, provides the most comprehensive Enterprise MDM solution on the market today. The following sections will illustrate how this combination of operations and analytics solves key business problems.

The Data Quality Problem

On the operational side of the business, data is entered manually by thousands of employees across a large number of departments. This is error prone, and many data quality problems begin at this point. In addition, each department has its own rules. For example, the Sales department rules for entering customer data into its sales automation application are quite different from the Accounting department rules for customer data entry into its Accounts Receivable application.

Another key characteristic of Master Data is that it is not static. It is in a state of constant change. Based on a variety of sources [2], we see an average of 2% change per month. Given the amount of master data in the world, this represents a significant number of updates to Master Data. For example, across North America, in any given day:

• 21984 individuals and 1920 businesses will change address

• 3112 individuals and 32 companies will change their name

• 1488 individuals will declare a personal bankruptcy, and 160 corporations will fail

• 46152 individuals in the US will change jobs

• 1200 business telephone numbers will change or be disconnected

• 896 directorship (CEO, CFO, etc.) changes will occur

• 96 new businesses will open their doors

Product data has a similar change profile. 20% of all parts data created in a year are duplicates. This leads to a 60% error rate for invoicing. Financial data adds yet another dynamic dimension in the many hierarchies that exist for accounts and charts of accounts, to name a few.

These represent changes to master data on customers, suppliers, contacts, locations, employees, citizens, competitors, distributors, partners, accounts, households, etc. Items like credit worthiness, vendor viability, and bill to address are always in a state of flux.

The operational side of the business must keep up with this constant change or business processes break down. If one application sees the change and another one doesn’t, the process step across these two applications will break down.
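To put the 2% monthly change rate in perspective, the short Python sketch below compounds it over time. It assumes, purely for illustration, that every record has an independent 2% chance of changing in any given month; the function name and the independence assumption are ours and do not come from the cited sources.

```python
def fraction_changed(monthly_rate: float, months: int) -> float:
    """Share of records expected to change at least once within `months`.

    Illustrative assumption: changes are independent from month to month.
    """
    unchanged = (1.0 - monthly_rate) ** months
    return 1.0 - unchanged


# Print the expected share of changed records after 1, 6 and 12 months.
for months in (1, 6, 12):
    share = fraction_changed(0.02, months)
    print(f"{months:>2} months: {share:.1%} of master records changed")
```

Under these assumptions, roughly one master record in five no longer matches reality after a year without maintenance.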

To help illustrate the depth of this problem, a recent survey [3] by The Data Warehouse Institute (TDWI) polled over 800 organizations to measure the impact of poor quality master data. The question was simple: “Has your organization suffered problems due to poor quality master data?” 83% said yes. What’s more, when assessing the impact of this poor data quality, the number one business problem was inaccurate reporting.



The MDM Solution

Fixing poor data quality at its source and managing constant change is what Master Data Management is all about. MDM is a modern architecture designed to eliminate poor data quality under heterogeneous IT application landscapes. Oracle’s MDM employs powerful prebuilt data models that support operational workloads and service oriented architectures (SOA). It provides tools such as fast and secure parameterized search engines; duplicate identification, elimination and prevention; data attribute survivorship; data quality rules engines; hierarchy management; data standardization; real time change management; and data synchronization. It employs interfaces to third party data augmentation and address standardization providers. And it builds cross-references for federated data and golden records for centralized data. Quality customer data is made available to the Customer Relationship Management (CRM) and Enterprise Resource Planning (ERP) applications. Quality product data is made available to the Product Lifecycle Management (PLM) and ERP applications. And quality dimensions, cross-references, and hierarchies are made available to the BI applications.

[3] Master Data Management: Consensus-Driven Data Definitions for Cross-Application Consistency, Philip Russom, Sr. Manager of TDWI Research, online at www.tdwi.org/research/reportseries

A number of other attempts to deal with this fundamental BI problem have come to market over the past several years. Real time materialized views that automatically populate OLAP cubes; new anomaly detecting data mining techniques; real time decisions via dashboards; and modern Enterprise Performance Management (EPM) tools all continue to operate on poor quality data and continue to give the wrong answers. Near real time feeds to the data warehouse help with the currency of the data. SOA enabled data in extract, transform and load (ETL) tools and the data warehouse helps make key information available to a wider audience. But the quality of the information continues to reflect the poor quality of the sourcing data. Operational Data Stores (ODS) attempt to do some cleansing, but don’t provide the OLTP table structures, access methods and tools required to support real time operational environments. These represent attempts to deal with an operational data quality problem on the analytical side of the business.

The following sections cover the interfaces between MDM and the Data Warehouse (DW). An example is provided that will help illustrate why MDM is the only technology that can successfully deal with this fundamental root cause data quality problem impacting BI across the board.



MDM - DATA WAREHOUSE INTERFACES

MDM holds accurate authoritative governed dimension data, the actual operational data cross-reference, and the hierarchy information on all key master data objects. These represent the foundational interfaces between MDM and the DW.

Dimensions

MDM holds quality “governed” dimensions for Customer, Supplier, Product, Location, Distributor, Asset, Account, Employee, Citizen, Parts, etc. Utilizing data standardization, duplicate identification and merge capabilities, a single version of the truth about each dimension is created. When fed to the DW, these represent “Dimensions on Steroids”. They can be moved into the DW, or used to facilitate joins across the MDM and DW data stores. EPM, Dashboards, Reports and ad-hoc queries produce better information when BI utilizes the “trusted” MDM dimensions.

Cross-Reference

MDM holds the corporate cross-reference for key dimensions such as Customer and Product. MDM maintains the ID of every connected system with its Source System Management capabilities, and it maintains the ID of the object in each connected system. The cross-reference capabilities include understanding multiple duplicates in each system and across systems. It maintains this cross-reference even as it eliminates duplicate records via merge processes. When the DW uses this master cross-reference data, it correctly combines the trickle fed entries for accurate fact table reconciliation. This is key for accurate reporting and analysis. Fragmented data that the BI applications do not recognize as the same entity can lead to misleading results and poor decision-making.



Hierarchies

Hierarchy information is critical for proper rollup of aggregate information in the BI tools. Operational MDM holds the official hierarchy information used by the operational applications. This hierarchy information is needed for the proper functioning of key business processes such as sales, catalog management, and accounts payable. In addition, Analytical MDM takes these clean governed operational hierarchies and manages multiple alternate hierarchies across multiple dimensions with appropriate cross-domain mappings (product to cost centers, customer to product bundle, supplier to purchasing department, etc.). This is critical for accurate reporting out of the downstream analytical applications. When the data warehouse and the data marts utilize the hierarchy information provided by Enterprise MDM, profitability analysis, risk assessments, dashboard information, enterprise performance management budgeting and forecasting are all improved.

Analytics Example

To illustrate how Oracle’s MDM works to create better information, we will use a simplified real world example. The following events represent activity on the operational side of the business.
1. Mary Smith buys a blue VN-Sweater for $50 from Old Navy on June 3rd.

2. The next day, Mary Evans sees the identical sweater (labeled RF-Sweater) at Banana Republic and buys it for $45 for a friend.

3. Acme, Inc. supplies Old Navy with their VN line of sweaters.

4. AI Corp supplies Banana Republic with their RF line of sweaters.

We have:

Customer      Product       Retailer           Supplier
Mary Smith    VN-Sweater    Old Navy           Acme, Inc.
Mary Evans    RF-Sweater    Banana Republic    AI Corp



Star Schema

A trickle feed into a FACT table in the data warehouse would look like this:








Adding the Dimensions, we would have the following Star Schema:



Query Results

A few ad-hoc queries on this schema would produce the following answers:

What is the average revenue per customer? $47.50
Who is the most valuable customer? Mary Smith
How much did the most valuable customer spend? $50
Who is the number one retailer? Old Navy
What is the maximum revenue for any supplier? $50.00
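The answers above can be reproduced with a few lines of Python. This is a minimal sketch that treats the two sales as fact rows exactly as the operational systems reported them; the dictionary layout and helper names are illustrative assumptions, not the actual warehouse schema.

```python
from collections import defaultdict

# Fact rows as reported by the operational systems (no MDM applied).
FACTS = [
    {"customer": "Mary Smith", "product": "VN-Sweater",
     "retailer": "Old Navy", "supplier": "Acme, Inc.", "revenue": 50.0},
    {"customer": "Mary Evans", "product": "RF-Sweater",
     "retailer": "Banana Republic", "supplier": "AI Corp", "revenue": 45.0},
]


def revenue_by(facts, dimension):
    """Sum revenue for each distinct value of one dimension."""
    totals = defaultdict(float)
    for row in facts:
        totals[row[dimension]] += row["revenue"]
    return dict(totals)


def top(totals):
    """Return the (name, revenue) pair with the highest revenue."""
    return max(totals.items(), key=lambda item: item[1])


by_customer = revenue_by(FACTS, "customer")
average_per_customer = sum(by_customer.values()) / len(by_customer)
best_customer = top(by_customer)
best_retailer = top(revenue_by(FACTS, "retailer"))
best_supplier = top(revenue_by(FACTS, "supplier"))

print(average_per_customer, best_customer, best_retailer, best_supplier)
```

Every query groups on the raw dimension values, so two spellings of the same customer, supplier or product are counted as two different entities.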

The Data Quality Problem

We have seen that the operational environment is very dynamic and duplicates are often hard to identify. Reorganizations can change corporate hierarchies overnight. Consider the following facts:

• Mary Smith married Mr. Evans and changed her name to Mary Evans after she bought the sweater from Old Navy. She is, in fact, the same person who bought the similar sweater from Banana Republic the next day.

o Understanding these dynamics requires fact based knowledge management, duplicate identification, survivorship rules, and cross-referencing.

• Old Navy and Banana Republic are both subsidiaries of The Gap.

o Dealing with this kind of information requires hierarchy management.

• AI Corp is an alias for Acme, Inc. They are in fact the same supplier.

o This requires supplier data quality management, duplicate identification, and cross-referencing.

• VN-Sweater and RF-Sweater are two IDs for the same actual item.

o This requires product data standardization and cross-referencing.

Oracle’s MDM solution is designed to understand these facts and accurately reflect this reality.

MDM CAPABILITIES

The following sections highlight the key MDM capabilities supporting BI.






Data Model

The MDM data model is unique in that it represents a superset of all ways master data has been defined by all attached applications.

It has the flexibility to accommodate organization and industry specific extensions. The model is tailored to map to the way organizations do business. It holds all necessary hierarchical information, all attributes needed for duplicate identification, removal and prevention, as well as cross-reference information for all attached operational systems.


In our example, the single master schema holds customer data in both business-to-business (Old Navy, Banana Republic) and business-to-consumer (Mary Smith, Mary Evans) formats. In addition, it holds the master supplier data (Acme, Inc., AI Corp) and retail product data (VN-Sweater, RF-Sweater). The names and all needed attributes are maintained.

Change Management

In order to deal with real time changes to master data, such as the marriage of Mary Smith to Mr. Evans, Oracle’s MDM solution includes a real time Business Event System (BES). Any change to master data attributes triggers a business event that in turn invokes a workflow process. The workflow process builds appropriate XML payload packages and executes the configured steps for the particular data change.

In our example, the introduction of Mary Evans triggered a ‘New Customer’ event. This kicked off a workflow to populate Mary’s record with all available information. For example, it may have requested address validation from Trillium (or another postal address verification vendor) to ensure that all addresses are mailable. Standardized addresses also aid in duplicate identification. The workflow may have requested data augmentation for credit ratings, or obtained an AbiliTec ID from Acxiom to assist with duplicate identification. This is done in real time.
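The event-to-workflow pattern described above can be sketched in a few lines of Python. This is only an illustration of the pattern and not Oracle’s Business Event System API: the class, the event name, the workflow steps and the sample address are all hypothetical, and the XML payload step is omitted.

```python
from collections import defaultdict


class BusinessEventBus:
    """Toy event system: a raised event runs every workflow step subscribed to it."""

    def __init__(self):
        self._workflows = defaultdict(list)

    def subscribe(self, event_type, step):
        self._workflows[event_type].append(step)

    def raise_event(self, event_type, record):
        # Run the configured steps in subscription order.
        for step in self._workflows[event_type]:
            step(record)
        return record


def validate_address(record):
    # Stand-in for a call to an external address verification service.
    record["address"] = " ".join(record["address"].upper().split())
    record["address_validated"] = True


def request_augmentation(record):
    # Stand-in for a third-party data augmentation request.
    record.setdefault("pending_augmentation", []).append("credit_rating")


bus = BusinessEventBus()
bus.subscribe("NEW_CUSTOMER", validate_address)
bus.subscribe("NEW_CUSTOMER", request_augmentation)

# Hypothetical new customer record entering the hub.
mary = bus.raise_event("NEW_CUSTOMER",
                       {"name": "Mary Evans", "address": " 12 main  st "})
print(mary)
```

Keeping the steps as separate subscribers mirrors the idea that the workflow for each kind of data change is configured rather than hard coded.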

Person Duplicate Identification

Oracle’s MDM solution for customer data is Customer Hub. It comes with a variety of mechanisms for finding duplicate customer records. A primary technique is to configure a rules engine to find potential matches using a large number of customer attributes. In our example, Old Navy has entered Mary Smith as a customer. Her master ID is 551. The Customer Hub manages Old Navy as a source system (ID = ON) and records Mary Smith’s ID in that system as 1234. Mary Evans is similarly managed. This is the base for the MDM cross-reference.





MDM utilizes all available attributes to determine if these are duplicates. Typical match rules will examine addresses, phone numbers, e-mail addresses, etc. Additionally, 3rd party data such as an AbiliTec ID from Acxiom may be used. In our example, the system finds that Mary Smith and Mary Evans are indeed duplicates in spite of the different names.
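A match rule of this kind can be sketched as a simple attribute comparison. The Python below is a deliberately small illustration, not the Customer Hub rules engine: the attribute list, the normalization, the threshold of two matching attributes and the sample contact details are all assumptions made for the example.

```python
MATCH_ATTRIBUTES = ("address", "phone", "email")


def normalize(value: str) -> str:
    """Keep only lower-cased letters and digits so formatting differences are ignored."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def match_score(a: dict, b: dict) -> int:
    """Count attributes on which both records carry the same normalized value."""
    score = 0
    for attr in MATCH_ATTRIBUTES:
        left, right = a.get(attr), b.get(attr)
        if left and right and normalize(left) == normalize(right):
            score += 1
    return score


def is_potential_duplicate(a: dict, b: dict, threshold: int = 2) -> bool:
    """Flag a pair as a potential duplicate when enough attributes agree."""
    return match_score(a, b) >= threshold


# Hypothetical contact details; only the names come from the example.
mary_smith = {"name": "Mary Smith", "address": "12 Main St.",
              "phone": "555-0100", "email": "mary@example.com"}
mary_evans = {"name": "Mary Evans", "address": "12 main st",
              "phone": "(555) 0100", "email": "mary@example.com"}
stranger = {"name": "Mary Smith", "address": "99 Oak Ave", "phone": "555-0199"}

print(is_potential_duplicate(mary_smith, mary_evans))
print(is_potential_duplicate(mary_smith, stranger))
```

Note that the name is not part of the rule at all, which is why the name change does not hide the duplicate and why a shared name alone does not create one.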

Company Duplicate Identification

Company duplicate identification uses the same general rules engines as the Person duplicate identification. The key difference is that the number and type of attributes available for a company are different. For example, companies can have a DUNS number provided by D&B. In our example, a search on AI Corp produces a match with Acme, Inc.





Alias information was used by out-of-the-box duplicate identification rules.

Duplicate Elimination & Cross-reference

Once the Customer Hub identifies Mary Smith and Mary Evans as duplicates, it eliminates the duplicates by merging the multiple records into one. The cross-reference is maintained. Where before the merge there were two customer records, each pointing back to one source system, there is now one customer record pointing back to two source systems.
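The effect of a merge on the cross-reference can be sketched as follows. Mary Smith’s master ID (551), source system (ON) and source ID (1234) come from the example; the master ID, source system code and source ID used for Mary Evans, the choice of surviving record, and the data structures themselves are hypothetical.

```python
# Master ID -> customer record.
masters = {
    551: {"name": "Mary Smith"},
    552: {"name": "Mary Evans"},   # hypothetical master ID
}

# (source system, ID in that system) -> master ID.
xref = {
    ("ON", "1234"): 551,
    ("BR", "5678"): 552,           # hypothetical source system code and ID
}


def merge(survivor_id, duplicate_id, masters, xref):
    """Fold the duplicate into the survivor and repoint its cross-references."""
    for key, master_id in list(xref.items()):
        if master_id == duplicate_id:
            xref[key] = survivor_id
    del masters[duplicate_id]


# Assumption for illustration: the newer record survives the merge.
merge(survivor_id=552, duplicate_id=551, masters=masters, xref=xref)
print(masters)
print(xref)
```

After the merge no source system entry is lost: both source IDs still resolve, but to the same master record.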





Attribute Survivorship

Another key capability of the Customer Hub is its ability to manage the survival of customer attributes in the face of multiple sourcing systems and customer record merges. The MDM Customer Hub maintains the source system priority rankings for each attribute. While all records remain in the MDM data store, only the ‘blended’ single version of the truth record is seen by applications and viewers.
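Survivorship by source priority can be sketched as a per-attribute lookup. In the Python below the priority rankings, the source system code BR and the attribute values are hypothetical; the sketch only shows how a blended record can be assembled while the source records stay untouched.

```python
# Hypothetical per-attribute source rankings: the first source with a value wins.
PRIORITY = {
    "name": ["BR", "ON"],
    "address": ["ON", "BR"],
    "phone": ["ON", "BR"],
}


def blend(records_by_source, priority):
    """Build the blended record by taking each attribute from its best-ranked source."""
    golden = {}
    for attribute, ranked_sources in priority.items():
        for source in ranked_sources:
            value = records_by_source.get(source, {}).get(attribute)
            if value:  # skip missing or empty values and try the next source
                golden[attribute] = value
                break
    return golden


# Source records are kept as they are; only the blended view is exposed.
records = {
    "ON": {"name": "Mary Smith", "address": "12 Main St", "phone": None},
    "BR": {"name": "Mary Evans", "address": "12 Main Street", "phone": "555-0100"},
}

golden = blend(records, PRIORITY)
print(golden)
```

Ranking sources per attribute rather than per record lets the name come from one system and the address from another.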

Product Standardization

Oracle’s MDM solution for product data is the Product Hub. It uses Silver Creek for product data standardization. This standardization enables rapid and parameterized searching and accurate duplicate identification. In our example, Old Navy uses the string: VN PO 50 Blue W 24W 36B 22A. Banana Republic’s sweater is identified by: B Wool V Neck Pllver S:36. These records are loaded into the Product Hub schema through Silver Creek’s Data Lens [4]. Attributes such as style, color, and size are populated as well as catalog codes. An English description is generated as well as other appropriate languages as needed.





In our example, we see that both products are V-Neck Pullover blue wool sweaters and that they actually have the same ID code. They are in fact the same product and now the MDM system recognizes them as such.

Hierarchy Management

Hierarchy information is critical for proper aggregation and roll-ups. Oracle’s Customer Hub maintains any number of simultaneous hierarchies used by the operational applications. These include Dun & Bradstreet hierarchies with out-of-the-box connectivity to D&B for both batch and real time information access.





In our example, D&B provides the hierarchy information for Old Navy and Banana Republic. It turns out that they are both subsidiaries of The Gap.
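A roll-up along such a hierarchy can be sketched as a walk from each company to its ultimate parent. The parent links below reflect the example; the dictionary representation and function names are illustrative assumptions, not the Customer Hub data model.

```python
from collections import defaultdict

# Child company -> immediate parent, as supplied by the hierarchy source.
PARENT = {"Old Navy": "The Gap", "Banana Republic": "The Gap"}


def ultimate_parent(company, parent):
    """Walk up the hierarchy until a company with no recorded parent is reached."""
    seen = set()
    while company in parent and company not in seen:  # guard against cycles
        seen.add(company)
        company = parent[company]
    return company


def roll_up(revenue_by_company, parent):
    """Aggregate revenue at the top of each corporate hierarchy."""
    totals = defaultdict(float)
    for company, revenue in revenue_by_company.items():
        totals[ultimate_parent(company, parent)] += revenue
    return dict(totals)


rolled = roll_up({"Old Navy": 50.0, "Banana Republic": 45.0}, PARENT)
print(rolled)
```

A company that has no entry in the hierarchy simply rolls up to itself, so unknown retailers are still reported.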




Updated Star Schema

MDM has identified the customer duplicates; maintained the cross-reference back to the sourcing systems across a merge; developed the single golden customer record utilizing survivorship rules; found the two products to be identical; learned that the two retailers belong to one corporate hierarchy; and found through good duplicate identification techniques that Acme, Inc. and AI Corp are in fact two names for the same vendor. If we deliver this updated cross-reference and dimension data to the data warehouse, we get the following star schema.







Re-Run the Query

Re-running the same queries now gives the correct answers:

What is the average revenue per customer? $95
Who is the most valuable customer? Mary Evans
How much did the most valuable customer spend? $95
Who is the number one retailer? The Gap
What is the maximum revenue for any supplier? $95
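The corrected answers can be reproduced by conforming each fact row to the master dimensions before aggregating. In this Python sketch the mapping tables stand in for the cross-reference and hierarchy data that MDM would deliver; their layout, and the choice of Acme, Inc. as the surviving supplier name, are assumptions made for illustration.

```python
from collections import defaultdict

# The same two fact rows as reported by the operational systems.
FACTS = [
    {"customer": "Mary Smith", "retailer": "Old Navy",
     "supplier": "Acme, Inc.", "revenue": 50.0},
    {"customer": "Mary Evans", "retailer": "Banana Republic",
     "supplier": "AI Corp", "revenue": 45.0},
]

# Stand-ins for MDM output: source value -> master value.
MASTER = {
    "customer": {"Mary Smith": "Mary Evans"},
    "supplier": {"AI Corp": "Acme, Inc."},
    "retailer": {"Old Navy": "The Gap", "Banana Republic": "The Gap"},
}


def conform(row):
    """Replace each dimension value with its master value when one exists."""
    fixed = dict(row)
    for dimension, mapping in MASTER.items():
        fixed[dimension] = mapping.get(row[dimension], row[dimension])
    return fixed


def revenue_by(facts, dimension):
    """Sum revenue for each distinct value of one dimension."""
    totals = defaultdict(float)
    for row in facts:
        totals[row[dimension]] += row["revenue"]
    return dict(totals)


def top(totals):
    """Return the (name, revenue) pair with the highest revenue."""
    return max(totals.items(), key=lambda item: item[1])


clean = [conform(row) for row in FACTS]
by_customer = revenue_by(clean, "customer")
average_per_customer = sum(by_customer.values()) / len(by_customer)
best_customer = top(by_customer)
best_retailer = top(revenue_by(clean, "retailer"))
best_supplier = top(revenue_by(clean, "supplier"))

print(average_per_customer, best_customer, best_retailer, best_supplier)
```

The aggregation logic is unchanged from the first run; only the dimension values fed into it are different.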

We see that better information has been provided through Master Data Management. In fact, every single answer was wrong without MDM. MDM fixed the data quality problem at its source and delivered quality dimensions to the analytics. No other technology on the market is designed to accomplish this essential task.




Top Ten Example

A more realistic example would be the common ‘Top ten’ query. In this example, we are looking for the top ten customers as measured by revenue. Before MDM was used to clean up the data, understand the hierarchies and provide the needed cross-reference, the query produced the list on the right. After applying the MDM dimensions, hierarchy information, duplicate removal and cross-reference information, the query was run again. This time correct results were retrieved. Business decisions based on the first query would have treated Baker as one of the top three customers and Caterpillar would not have been treated as one of the top ten customers at all. But Baker is not even in the top ten, and Caterpillar is the number one customer.




Pre-defined Mappings

Oracle’s MDM not only cleans up and supplies authoritative governed master data to the data warehouse, it supplies this quality master data directly to Oracle BI applications such as OBI EE Dashboards. OBI EE Dashboards are unique in the industry in that they take full advantage of Oracle Applications and their data models by pre-mapping data models into the schema under the dashboards. Since Oracle’s MDM solutions rest on Siebel and E-Business Suite data models, the MDM mappings are inherited and available out-of-the-box.

ANALYTICAL & OPERATIONAL MDM
A more sophisticated example helps illustrate the value of Oracle’s combined operational and analytical MDM capabilities.





The star schema in the data warehouse would look like the picture above. Oracle 11g Data Warehousing tools can automatically materialize OLAP cubes that pivot on each of these dimensions. But in order to pivot correctly, the hierarchies associated with each of these dimensions need to be understood.


Consider a far-flung advertising agency with a need to understand the performance of its operations for large international customers. Key dimensions include client, company, job, location, employee, organization, and vendor. The agency wants to know how much a particular employee earned from a top soft drink bottler in Australia on a particular advertising project in Perth.



Client has divisions, products and industry hierarchies. Company has office and department. Job has type and sub-type. Location has country, region and city. Employee has position and user. Organization has chart of accounts, profit centers, cost centers and business areas.

Operational MDM is required to provide clean dimension information. Analytical MDM is required to manage the various multiple hierarchies. In combination, they feed the DW and OLAP cube the authoritative master data that it needs to produce the correct answers.

Employee John Doe earned $50,000 off of the Perth project.






Answers to questions like these are difficult to obtain in a heterogeneous IT landscape where the vast majority of business objects needed to support these kinds of queries are scattered and inconsistent across the various applications. Operational MDM must consolidate and cleanse the key dimensions. Analytical MDM must manage the multiple hierarchies for each dimension.










CONCLUSION

There are three legs to a complete Business Intelligence solution: 1) the Data Warehouse for holding the operational history; 2) the Enterprise Master Data Management solution for ensuring that quality data and hierarchies from the operational applications are supplied to the Data Warehouse; and 3) the BI applications themselves that utilize the DW and MDM data to get clean authoritative information to everyone in the organization who needs it. Without MDM, the solution falls over. Poor decisions based on inaccurate data drive less than optimal performance. Compliance becomes difficult and risks increase.

Oracle MDM provides clean, consolidated, accurate master data seamlessly propagated throughout the enterprise. This data reflects the actual operations of the organization. It ensures that this is the data the BI tools use. It is the glue between the operational and analytical sides of the business. Oracle MDM enables organizations to get a single view of the enterprise for the first time since the application landscape fragmented back in the 1970s. This can save companies millions of dollars a year, dramatically increase operating efficiencies, improve customer loyalty and support sound corporate governance.

In this MDM space, Oracle is the market leader. Oracle has the largest installed base with the most live references. Oracle has the implementation know-how to develop and utilize best data management practices with proven industry knowledge. Oracle’s heritage in database, data warehousing, and business intelligence applications development ensures a leadership position for integrating master data with operational and analytical applications. These are the reasons why Oracle MDM is a foundation for BI and provides more business value than any other solution available on the market.