Informatica’s Latest Data Lake Management Platform Automates the Fight Against Harmful ‘Data Swamps’

Informatica says the mad rush to create ‘data lakes’ to promote more business use of analytics can sometimes cause more harm than good. The company is stepping up to help firms guard against murky, under-managed and unreliable data.

Tags: analytics, Azure, big data, data lakes, governance, Hadoop, Informatica, integration, management, YARN

Businesses are increasingly looking to accelerate data-driven insights they can use to differentiate products, improve customer experience, and grow the business. But this appetite for insight can set off a rush to build more data lakes, faster, and that rush can do more harm than good.


That’s the view of executives at Informatica, who say one major downside is the ‘data swamp’: a repository of murky, under-managed, and unreliable data. Questionable data from a data swamp, if ingested or relied upon too heavily, can harm the business, the executives contend.


To address this risk, Informatica is releasing technology to better manage distributed, Hadoop-based data lakes. Informatica Data Lake Management “integrates, governs, and secures the data an organization needs to power its business with a data lake,” according to the company’s website. Informatica’s chief product officer, Amit Walia, explained that the approach responds to the many customers who are struggling to keep pristine, reliable data lakes from turning into ‘data swamps’ that contain “inconsistent, incomplete and stale data.”


According to Informatica, Data Lake Management is designed to deliver several key benefits, including:

  • Fast and easy integration of more data from more data sources thanks to an expanded library of hundreds of pre-built connectors.
  • User-enabled, self-service data quality and governance to ensure compliance with internal controls and external regulations. This lets business users help ensure data quality and governance across silos.
  • A risk-centric approach to big data security that protects sensitive data through proactive analysis of data use and protection against unauthorized access or data sharing.


According to Informatica, Data Lake Management helps organizations quickly and repeatedly turn big data into trusted information assets that deliver sustainable business value. The solution is also architected to let data analysts more easily find, prepare, govern, and protect data of any size, and to help organizations derive business value from Hadoop-based data lakes.


Part of the problem, Walia said, is that hand coding and the code-generation tools sometimes used to manage data lakes struggle to discover relationships between datasets. “The result is an unmanageable ‘data swamp’ and the typical response is to pile on multiple point products and labor-intensive, manual processes to integrate, govern and secure the data in the data lake,” he added in a statement.


The latest version of Informatica Data Lake Management brings together the following products on a single metadata-driven platform:

Informatica Enterprise Information Catalog enables business users to discover and understand all enterprise data using intelligent self-service tools powered by machine learning and AI.


Informatica Intelligent Streaming helps organizations capture and process big data (machine data, social media feeds, website clickstreams, etc.) and real-time events to gain timely insight for business initiatives such as IoT, marketing, and fraud detection.


Informatica Blaze increases Hadoop processing performance with intelligent data pipelining, job partitioning, job recovery, and scaling, powered by a cluster-aware data processing engine integrated with YARN.


Informatica Secure@Source reduces the risk of sensitive and private data proliferating, with enhanced monitoring and alerting based on both risk conditions and user access and activity.


Informatica Big Data Management can be deployed in the cloud with a single click through the Microsoft Azure Marketplace to integrate, govern, and secure big data at scale in Hadoop.


Informatica Cloud Microsoft Azure Data Lake Store Connector helps customers achieve faster business insights by providing self-service connectivity to integrate and synchronize diverse data sets into Microsoft Azure Data Lake Store.


“This approach tackles head-on several popular business use cases for data lakes, including marketing and fraud detection,” Walia added.


One early adopter of Informatica’s data lake capabilities for fraud detection is US Bank. Rakesh Kant, the bank’s Head of Enterprise Data Management and Analytics, said US Bank was able to rapidly move mainframe-based processes to Hadoop while continuing to comply with audit and security requirements.