Data is the new oil. If you google "what is data governance?", you will find plenty of definitions that explain it well enough on their own. In this article, I would like to add my perspective on Data Governance. Data Governance is not a new concept. If you worked in a regulated industry around 2007, you will have seen it practiced in silos inside the organization (i.e., in individual departments). After the 2007/2008 financial crisis, many organizations changed their perspective on Data Governance and took it to the next level, shifting its focus from individual departments to a centralized function.

Data Governance – Definition?

Data governance is a collection of processes, roles, policies, standards, and metrics that ensure the effective and efficient use of information in enabling an organization to achieve its goals. 

Source: Talend.com

Data Management vs Data Governance?


First, it is necessary to understand the difference between Data Management and Data Governance.

Data Governance is a framework that you can use to proactively manage an organization's data and define how that data can be accessed, through a set of processes, roles, policies, and standards.

Data Management, on the other hand, refers to the architecture, collection, curation, and aggregation of data so that it is stored and organized in relevant ways. Data governance is an essential component for any organization, but to implement it successfully you need data that is collected, structured, and stored appropriately.

Bottom line: Data Management is not Data Governance.

Why is data governance so important for organizations? 


Organizations today store huge volumes of sensitive, regulated, and mission-critical data, and that volume keeps growing every year. They are serious about managing it, which includes improving data quality, getting meaningful insights from the data, and using the data to drive the business forward. So what problems does governance solve?

  1. Data Quality: The organization has large amounts of data piled up in inconsistent, unformatted form.
  2. Data Duplication: The organization's data is scattered across various internal departments and contains duplicates.
  3. Data Security and Privacy: Governance determines who controls the data. For example, not everyone inside an organization should have access to sensitive data (PII/PCI related).
  4. Metadata Management: Without a proper process, audit, and control over the creation and implementation of data standards and procedures, it is unclear who has the authority to enable and execute policies.
  5. Documentation: Metadata management should be documented, including who owns what. For example, the data architect, data steward, data council, and IT governance council will each have a different level of access to data, and new joiners or existing employees in each group need to know the exact process and boundaries for accessing it.
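
As a rough illustration of the first two problems, the sketch below flags duplicate customer records held by different departments and lists required fields that are empty. The records, the field names, and the choice of email as the matching key are all assumptions made for this example, not part of any particular governance tool.

```python
from collections import defaultdict

# Hypothetical customer records pulled from two departments.
records = [
    {"source": "sales", "email": "Ann@Example.com ", "name": "Ann Lee", "phone": "555-0100"},
    {"source": "support", "email": "ann@example.com", "name": "Ann Lee", "phone": ""},
    {"source": "sales", "email": "bob@example.com", "name": "", "phone": "555-0101"},
]

# Assumed data standard: every record must carry these fields.
REQUIRED_FIELDS = ("email", "name", "phone")


def normalize_email(value):
    """Trim whitespace and lowercase so the same address compares equal."""
    return value.strip().lower()


def find_duplicates(rows):
    """Group rows by normalized email and keep groups with more than one row."""
    groups = defaultdict(list)
    for row in rows:
        groups[normalize_email(row["email"])].append(row["source"])
    return {email: sources for email, sources in groups.items() if len(sources) > 1}


def find_missing_fields(rows):
    """Return (row index, field) pairs where a required field is empty."""
    return [
        (index, field)
        for index, row in enumerate(rows)
        for field in REQUIRED_FIELDS
        if not row.get(field, "").strip()
    ]


duplicates = find_duplicates(records)
missing = find_missing_fields(records)
print(duplicates)
print(missing)
```

In practice the matching key and the list of required fields would come from the data standards that the governance team agrees on, rather than being hard-coded.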

The answer to the problems above is to implement Data Governance. It is how an organization takes control of its data, realizes the full value of that data, and improves profitability.

Under the umbrella of Data Governance, an important role is played by data stewards, who are involved in defining policies and advising on how to implement them.

Data Governance real-world example


Let me explain data governance with a real-world example:

Think of the books in a library as data assets. If the books are not properly indexed, finding the one you are looking for becomes chaotic. So we need a librarian to manage all of this.

Think of librarians as Data Governance experts who specify how certain types of data will be stored, archived, backed up, and protected. Proper data governance can minimize the risk of security breaches and help achieve data compliance.

Data Governance experts, along with other experts inside the organization, then develop a data governance policy that defines the methods, technologies, and behaviors for properly managing data.
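
To make the library analogy a little more concrete, here is a minimal sketch of a catalog that plays the role of the library index. The class names, fields, and sample entries are hypothetical and chosen only for illustration. A real catalog tool would track far more metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One 'index card' describing a data asset (all fields are illustrative)."""
    name: str
    owner: str
    classification: str  # e.g. "public", "internal", "pii"
    location: str


class DataCatalog:
    """A toy index of data assets, like a librarian's card catalog."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Refuse to index the same asset twice so the catalog stays unambiguous.
        if entry.name in self._entries:
            raise ValueError(f"{entry.name!r} is already cataloged")
        self._entries[entry.name] = entry

    def search(self, keyword):
        # Case-insensitive substring match on the asset name.
        keyword = keyword.lower()
        return sorted(name for name in self._entries if keyword in name.lower())

    def owner_of(self, name):
        return self._entries[name].owner


catalog = DataCatalog()
catalog.register(CatalogEntry("customer_orders", "sales-team", "internal", "warehouse.orders"))
catalog.register(CatalogEntry("customer_profiles", "crm-team", "pii", "warehouse.profiles"))

print(catalog.search("customer"))
print(catalog.owner_of("customer_profiles"))
```

Without an index like this, every lookup means walking the shelves. With it, anyone can find an asset and see who is responsible for it.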

What is a Data Governance Framework?


A Data Governance framework is a set of rules that aims to bring everyone under one umbrella. If it is implemented appropriately, it controls data quality across the entire organization. It is also iterative: the framework is revisited and refined over time.

Below are some of the key areas to focus on when building a successful data governance framework.

  1. Mission & Vision

    Focus on what data you have to manage.

  2. Policy

    Data governance policies outline how data is managed inside your organization (process: plan, design, deploy, govern, and monitor) and which measurement standards should be followed to get accurate and readable data.

  3. Roles and responsibilities

    People and organizational bodies such as:
    Data Architects: Define the data architecture and administration.
    Data Owners: Define and authorize access to data.
    Data Stewards: Implement policies.
    Users: Understand the policies and processes and access data accordingly.

  4. Process

    Detailing what needs to be done to manage data in terms of data security and privacy, metadata management, and data quality.
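
The roles and policies above can be sketched as a simple access check. The role names follow the list in this article, but the mapping of roles to data classifications is an assumption made purely for illustration. Each organization would define its own.

```python
from enum import Enum


class Role(Enum):
    DATA_ARCHITECT = "data_architect"
    DATA_OWNER = "data_owner"
    DATA_STEWARD = "data_steward"
    USER = "user"


# Hypothetical policy: which roles may read each data classification.
READ_POLICY = {
    "public": {Role.DATA_ARCHITECT, Role.DATA_OWNER, Role.DATA_STEWARD, Role.USER},
    "internal": {Role.DATA_ARCHITECT, Role.DATA_OWNER, Role.DATA_STEWARD},
    "pii": {Role.DATA_OWNER, Role.DATA_STEWARD},
}


def can_read(role, classification):
    """Return True if the role may read data with this classification.

    Unknown classifications are denied by default.
    """
    return role in READ_POLICY.get(classification, set())


print(can_read(Role.USER, "public"))
print(can_read(Role.USER, "pii"))
print(can_read(Role.DATA_STEWARD, "pii"))
```

Denying unknown classifications by default is a deliberate choice in this sketch: data that nobody has classified yet stays locked until someone with authority decides who may see it.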

Most Popular Data Governance Tools


Data is an asset, and data governance takes care of managing it. Below are some of the most popular Data Governance tools. It is also worth going through the Gartner Magic Quadrant for metadata management solutions to find the niche players and leaders in the market.

Collibra Data Governance


The Collibra data governance and catalog solutions help data customers find, understand, and trust their data, ensuring quality and accessibility. Collibra supports data centers and can orchestrate across multi-cloud platforms such as AWS, Google Cloud, and Azure, with fine-grained control over data management activities on each of them. It is a commercial off-the-shelf (COTS) solution.

Some of the key features offered by Collibra:

  1. Business glossary
  2. Metadata Management and processing.
  3. Self-service Catalog
  4. Data dictionary
  5. Helpdesk
  6. Policy manager
  7. Stewardship
  8. Workflows
  9. Compliance support including GDPR.
  10. Data Lineage maps, and more.

Informatica Enterprise Data Catalog


Informatica Enterprise Data Catalog (EDC) is a standalone, AI-powered data catalog that provides a machine-learning-based discovery engine to scan and catalog data assets across enterprise, cloud, and on-premises applications. It offers data security and proactive data quality monitoring, and it is a commercial off-the-shelf (COTS) solution.

Some of the key features offered by Informatica EDC:

  1. Connect and catalog any data asset
  2. AI-powered automation
  3. Data provisioning
  4. Collaboration
  5. End-to-end data lineage
  6. Integrated data quality
  7. Data relationships and recommendations, and more

Apache Atlas


Apache Atlas is a metadata management and governance framework that lets organizations build a catalog of their data assets, then classify and govern those assets for data scientists, business analysts, and the data governance team. It helps the organization meet compliance requirements efficiently and integrates with the wider data ecosystem. It is an open-source solution.

Some of the key features offered by Atlas:

  1. Data lifecycle management
  2. Metadata Types & entities
  3. Classification
  4. Lineage
  5. Search & discovery
  6. Security & Data Masking
  7. Audit Store

Compare and Contrast – Data Governance tools


Please refer to the link below for a detailed comparison of EDM tools across various product capabilities:

Collibra vs Informatica vs Talend

Conclusion


Data is an asset. Data Governance is a competency like any other in an organization's practice and business, and it drives the strategies for getting value from data.

Bottom line: Data Governance is a strategic competency. Its sole purpose is to drive the business forward. Maintaining data without a proper process will not add real value to the organization, and that is how you should think about data governance.