Informatica’s Intelligent Data Platform To Offer ‘Virtual Data Highway’ for Structured, Unstructured, Machine Data
Informatica Corp. is working on an innovative Intelligent Data Platform, designed to spur the delivery of all types of data to apps, people and even devices. The first three next-gen solutions to use this smart “virtual data highway,” which can provide intelligence, visibility and management for data, are due to ship by the end of 2014 and in early 2015. IDN speaks with Informatica execs.
by Vance McCarthy
"This is the era of data. So there needs to be a new, more intelligent, data-focused way of doing things."
Informatica Corp. is working on an innovative Intelligent Data Platform (IDP), designed to spur the delivery of all types of data to apps, people and even devices.
Informatica’s IDP creates what the company calls a “virtual data highway,” and will do much more than transport data across an enterprise. It will provide intelligence, visibility and management services that can work with many of today’s popular data types and formats – traditional RDBMS, web, social, big data (Hadoop, etc.), machine data or web logs.
IDP’s ability to let companies more easily leverage multiple data types across multiple platforms will unlock untapped value from big data, data security, hybrid IT (on-premises to cloud integration), predictive analytics, real-time operational insights and universal MDM, Ronen Schwartz, vice president and general manager of Informatica Cloud, told IDN.
“The potential of that data is limited by yesterday’s approaches . . . that can’t keep up with the growth of applications, information consumers, and smart devices,” Schwartz said. “This is the era of data, and so we believe strongly that there needs to be a new, more intelligent, data-focused way of doing things. With the Intelligent Data Platform, we have put a lot of effort there.”
IDP’s rich set of capabilities will infuse any data type with MDM, data quality, transforms, integration and even security. Further, because Informatica’s IDP highway is built on a virtualized foundation layer, it can span any infrastructure – traditional on-premises, data grids, Hadoop clusters, cloud – and even a future platform that hasn’t been invented yet.
One especially exciting capability is Informatica IDP’s approach to putting structured, semi-structured and unstructured data on the same semantic wavelength. IDP lets these vastly different data types understand one another using a recipe of mapping, metadata, heuristics and pattern matching.
IDP’s approach to machine logs is a good example. “We are actually going to go through hundreds and thousands of lines of log and understand the patterns. We also use some of our own matching technology [to suggest] what are these fields,” Schwartz said. Just in case that might not be enough to help customers get a strong view of their machine data, IDP “will also let users add their own vocabulary,” he added.
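To make the idea concrete, here is a minimal sketch of suggesting field types for log columns by pattern matching, with a user-supplied vocabulary taking precedence. It is not Informatica's implementation. The function names, the built-in patterns and the sample log lines are all hypothetical, chosen only to illustrate the approach Schwartz describes.

```python
import re
from collections import Counter

# Hypothetical built-in patterns; a real product would ship many more.
BUILTIN_PATTERNS = {
    "ipv4": re.compile(r"^\d{1,3}(\.\d{1,3}){3}$"),
    "timestamp": re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}$"),
    "integer": re.compile(r"^\d+$"),
}


def classify(token, vocabulary):
    """Label one token, checking the user's vocabulary before the built-ins."""
    for label, pattern in vocabulary.items():
        if pattern.match(token):
            return label
    for label, pattern in BUILTIN_PATTERNS.items():
        if pattern.match(token):
            return label
    return "text"


def suggest_fields(lines, vocabulary=None):
    """Suggest one label per whitespace-separated column by majority vote."""
    vocabulary = vocabulary or {}
    votes = {}
    for line in lines:
        for index, token in enumerate(line.split()):
            votes.setdefault(index, Counter())[classify(token, vocabulary)] += 1
    return [votes[index].most_common(1)[0][0] for index in sorted(votes)]


# Made-up sample log lines, for illustration only.
SAMPLE_LINES = [
    "2014-10-01T12:00:00 10.0.0.1 ATM-0042 200",
    "2014-10-01T12:00:05 10.0.0.7 ATM-0107 404",
]

# A user-added vocabulary entry (hypothetical) for a site-specific field.
USER_VOCABULARY = {"atm_id": re.compile(r"^ATM-\d{4}$")}

if __name__ == "__main__":
    print(suggest_fields(SAMPLE_LINES))
    print(suggest_fields(SAMPLE_LINES, USER_VOCABULARY))
```

Without the user vocabulary, the third column falls back to a generic "text" label. Adding one vocabulary entry is enough to have it recognized as a named field, which is the kind of refinement the quote points to.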
Specifically, Informatica engineers are building the IDP data highway with capabilities to:
- Use automation and recommendations to guide data to the right app, person, or device
- Streamline how companies harness data they own, generate, process or store
- Offer a high degree of self-service to non-technical business users, allowing them to find, map, integrate, consume, and analyze all types of data
- Consistently and continuously ensure data is clean, secured and connected
- Master relationships between data across multiple systems
- Leverage existing infrastructure, avoiding the need to rip and replace
- Support hybrid IT, with availability for both on-premise and cloud
- Scale easily from individual users to departments and entire enterprises
Informatica IDP To First Focus on Three Solutions
At the outset, Informatica is optimizing IDP for three popular scenarios or use cases, which are slated to ship by the end of the year, Schwartz said. They are:
Self-service data. Code-named “Springbok,” this looks to help non-technical users be more self-sufficient when it comes to prepping data for business use. It will bring automation, intelligence and guidance to many common data tasks, including locating, enriching, cleansing and even standardizing. The result will be a more self-service approach to making sure data is high-quality, self-cleaning, enriched, relevant, complete and trusted, Schwartz said.
“Why would we ask you to parse data? You should only choose between multiple options that some automatic machine has already parsed and recognized,” Schwartz told IDN. “Why would we ask you to map two data structures, if we can already give you recommendations? We think the human brain should focus on the complex tasks, rather than the drag and drop between right and left.”
Data-centric security. As its name implies, Secure@Source looks to secure data at the source or origin, before it is copied or distributed. This project adds a layer of security and governance to protect sensitive data across the lifecycle of how and where data will be used. The Secure@Source solution will discover, locate, and tag sensitive data where it resides – and then map it where it proliferates. “It will discover all instances of sensitive data, visualize the risk, and map the common source of proliferation so data can be secured at its source and throughout its lifecycle,” Schwartz said.
In Informatica’s view, a new, data-centered approach to security is becoming crucial, especially in an era of cloud, mobile, analytics and even Internet of Things, where data is constantly moving between on-premise and off-premise boundaries.
Informatica CEO Sohaib Abbasi said the time is right, putting it this way during the company’s Informatica World user conference this spring. “The traditional defense to secure the perimeters against intruders is no longer sufficient. In the new world of pervasive computing there is no perimeter. Secure@Source addresses this board-level priority – securing data by identifying sensitive data, assessing risks and protecting information assets,” he said.
“Secure@Source will also let users visualize risks in real-time, like they’ve never been able to before, by analyzing data derived from activities and patterns,” said Sachin Chawla, Informatica’s senior vice president of engineering for cloud integration. A dashboard provides “a data risk heat map” by correlating machine logs and data flows, information about stages and hops that data moves through, how much data is being used, by whom and how frequently.
Secure@Source’s added visibility and control let companies develop access policies and ensure they are enforced and always tied to compliance, regulation and governance requirements. Users can also more easily align security-related rules to datasets, such as masking, alerting, blocking, encryption and tokenization.
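As a rough illustration of aligning rules to datasets, the sketch below applies a per-field policy of mask, tokenize, block or allow to a single record. It does not reflect how Secure@Source works internally. The policy format, field names and function names are hypothetical, and the keyed hash (HMAC-SHA256) is only a simple stand-in for a real tokenization service.

```python
import hashlib
import hmac


def mask(value, visible=4):
    """Replace all but the last `visible` characters with '*'."""
    keep = min(visible, len(value))
    return "*" * (len(value) - keep) + value[len(value) - keep:]


def tokenize(value, key):
    """Deterministic keyed token (truncated HMAC-SHA256); a stand-in only."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def apply_policy(record, policy, key):
    """Return a copy of `record` with each field handled per `policy`.

    Fields not named in the policy pass through unchanged ("allow").
    """
    protected = {}
    for field, value in record.items():
        action = policy.get(field, "allow")
        if action == "mask":
            protected[field] = mask(value)
        elif action == "tokenize":
            protected[field] = tokenize(value, key)
        elif action == "block":
            continue  # drop the field entirely
        else:
            protected[field] = value
    return protected


# Hypothetical policy and record, for illustration only.
POLICY = {"card_number": "mask", "email": "tokenize", "pin": "block"}
RECORD = {
    "card_number": "4111111111111111",
    "email": "user@example.com",
    "pin": "1234",
    "city": "Austin",
}

if __name__ == "__main__":
    print(apply_policy(RECORD, POLICY, key=b"demo-key"))
```

Keeping the policy as data, separate from the code that enforces it, is what would allow the same rules to be attached to a dataset once and applied wherever that data travels.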
Perhaps most intriguing, it will also offer powerful solutions to let customers take their existing fraud detection to the next level – real-time fraud prevention, Chawla said. “With [IDP] you can tie and process all types of data together to get a picture you haven’t easily been able to see with older architectures.”
As an example, Chawla said that by combining IDP with Hadoop, users can retrieve, process and correlate all types of data to protect against fraud. “You can watch how users interact with their ATM, for example, and watch for different navigation or paths. If you see a user change their path for how they withdraw money, that might be a sign that the card is stolen,” he noted.
This is not theoretical. “This type of [interaction] has already happened to me. I navigated to a web site in a different way than I usually do. They challenged me with a second challenge-response, after I was already on the site,” Chawla said. “With all this data in real-time, I can see you are doing something differently. In the old world, you could never capture all that data and have time to react.”
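The path-change idea Chawla describes can be sketched in a few lines. The example below records each user's past navigation paths and flags a new path that user has never taken, once enough history exists to judge. It is an illustrative toy, not Informatica's fraud logic. The class name, the `min_history` threshold and the ATM screen names are all assumptions.

```python
from collections import Counter, defaultdict


class PathMonitor:
    """Track per-user navigation paths and flag ones never seen before."""

    def __init__(self, min_history=5):
        # user -> Counter of paths (each path stored as a tuple of steps)
        self.history = defaultdict(Counter)
        # Hypothetical threshold: sessions required before flagging anything.
        self.min_history = min_history

    def observe(self, user, path):
        """Record one completed session's path for this user."""
        self.history[user][tuple(path)] += 1

    def is_unusual(self, user, path):
        """True if the user has enough history and never took this path."""
        seen = self.history[user]
        if sum(seen.values()) < self.min_history:
            return False  # too little history to judge
        return seen[tuple(path)] == 0


if __name__ == "__main__":
    monitor = PathMonitor()
    usual = ["insert_card", "pin", "fast_cash", "take_cash"]
    for _ in range(5):
        monitor.observe("alice", usual)

    odd = ["insert_card", "pin", "balance", "other_amount", "take_cash"]
    print(monitor.is_unusual("alice", usual))
    print(monitor.is_unusual("alice", odd))
```

A flagged session would not prove fraud. As in Chawla's own anecdote, the natural response is a second challenge to the user rather than an outright block.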
Managed Data Lake. This aims to balance the needs of both business users (who need fast and simple access to clean data) and IT (which needs data management, governance and policy compliance). IDP takes a lifecycle approach to delivering core technologies that can support these managed data lakes, Schwartz said.
Notably, the Informatica solution uses a “data refinery” to cleanse and refine data. It also includes a “sandbox” where users can test the results, and do further cleansing or shaping, as needed. The idea is the managed data lake speeds up data provisioning so it can be used by many people, apps and devices, he said. For IT managers, it offers automated cataloging, low-latency and scalable storage and processing.
Inside the Architecture of Informatica’s Intelligent Data Platform
“IDP intends to deliver self-service to business users -- without compromising the quality, management and security features IT needs,” Schwartz added.
Informatica’s Abbasi described how IDP balances these needs to deliver ‘ease of use’ for the business and ‘control’ for IT. He characterized IDP as a “data infrastructure layer [that] provides comprehensive . . . services to manage your data.” It will also enable “self-service discovery and provisioning to/with the data consumer’s tool or application of choice,” he said.
Architecturally, IDP strikes this balance using three distinct layers that work together.
First, at the foundation, IDP is built atop the company’s Vibe virtual data machine, a version of Informatica’s long-proven data processing engine. Because it is now embedded, it enables near universal access to data, regardless of location, format or origin. Because it is a virtual machine, it can be run on any on-premises server platform, grid, Hadoop clusters or cloud.
Schwartz said that, with all its features, Vibe is a key differentiator. “Vibe has allowed us to move our on-premises capabilities to the cloud or other platforms, such as Hadoop [clusters], very, very fast. The only way a company could do that without a massive spree of acquisition, is to have the right architecture. Once Vibe engine became multi-tenant and able to run in the cloud, this is why we’ll be able to build innovations, like [IDP] on top very, very fast,” he added.
Second, IDP has a data infrastructure layer that runs right atop the Vibe virtual machine layer. This comprises all the data services designed to automate the continuous delivery of clean, safe and connected data at any scale to any platform, grid, Hadoop clusters or cloud.
Third is a data intelligence layer. It collects metadata, semantic data, usage information and other attributes from across the platform. Once the data is collected, it also automatically organizes the data to make it easy for business users to extract value. Key uses include analytics, visibility, BI and real-time operational intelligence. For ongoing insights, this layer also infuses machine learning into IDP’s capabilities.
Informatica plans to release pre-packaged offerings and reference architectures to general availability by the end of this year, Schwartz said. Meanwhile, Informatica partners are already lining up to work with the company on IDP-based solutions. They include Cognizant, Capgemini UK, Datawatch, MicroStrategy, Qlik, Tableau and Ultimate Software.