Talend Winter '20 Adds AI, Unified Features To Better Reveal Intelligence in Data

Talend Winter '20, the latest update to the company's Talend Data Fabric platform, is adding AI/ML and other intelligence features to improve outcomes and impacts from data analytics pipeline projects. IDN talks with Talend’s Jean-Michel Franco.


Jean-Michel Franco
senior director, product

"Talend’s unified architecture delivers a single platform that brings and manages all kinds of data together under one roof."


Talend has released the latest update to its Talend Data Fabric platform, adding several new features, including AI/ML, to more quickly reveal latent intelligence held inside dispersed enterprise data.


The Talend Winter ’20 release delivers trusted data quickly, reliably and at first sight for faster business outcomes, according to Talend execs. 


“The innovations introduced in Talend Data Fabric will provide our customers with dramatically improved efficiency, optimized productivity and scale, and [an] accelerated path to revealing value from data,” said Ciaran Dynes, Talend’s senior vice president of products, in a statement.


Here’s a list of notable features in Talend’s Winter ’20 release, and how they deliver value.


Data Inventory: This new cloud-based app automatically inventories and quality checks data to reveal trusted data quickly and easily. This lets users more easily unlock data silos with efficient reuse and deeper trust. Data Inventory also fosters more collaboration and reuse so that data professionals don’t need to build the same datasets repeatedly. 


Pipeline Designer: This cloud-based pipeline technology adds intelligent data preparation capabilities. This means data engineers, as well as lesser-skilled citizen data integrators, can integrate, standardize, cleanse and enrich any data, all within a single unified application. Data quality is assured “in-flight” thanks to enhanced capabilities that eliminate quality problems before the data is consumed or replicated. The pipeline does not require coding or complex transformations.


AI (and APIs): Talend also added intelligent data quality via a combination of explainable AI and APIs. Working together, these features enable several benefits, including (1) automation of more integration tasks, (2) acceleration of cloud project delivery, and (3) delivery of data quality at scale. To further the trustworthiness of data, Talend Data Fabric also adds an “automatic trust score,” which provides an immediate assessment of data health for every data set.


Exploring Talend Winter ’20 Updates, AI In-Depth 

To get more context on the latest innovations in Talend Winter ’20, IDN spoke with Jean-Michel Franco, Talend’s senior director, product.


Notably, all Talend Winter ’20 capabilities are delivered in a unified architecture as part of Talend Data Fabric. Talend’s unified architecture means all these capabilities are not siloed but are delivered through “a single platform that brings and manages all kinds of data together under one roof,” Franco told IDN.

The result is a “unified approach to data and application integration, quality, governance, and data sharing among stakeholders.”  


[Talend Data Fabric is a hybrid platform that intelligently connects, integrates, and shares trusted data at any scale with seamlessly built-in quality and governance. This unified approach aims to convert a mass of siloed data into a consolidated set of trustworthy data accessible to both technical and lesser-skilled users.] 


Let’s get into some details, starting with Talend’s new Data Inventory. This feature can “automatically inventory and quality check data to establish data intelligence quickly and easily,” according to the company.  


Franco explained to IDN how Talend makes sure these “automatic” results are indeed rapid and actionable. “Data Inventory visualizes available datasets with their Data Intelligence Scores, user ratings, data quality ratings and endorsements,” he said. 

Franco also shared under-the-cover details for how Talend makes it all work:

Datasets are augmented with an automatically calculated Data Intelligence Score that delivers an instant assessment of your data health and accuracy based on data quality, data popularity and user-defined ratings. The beauty is that this is all automated or crowdsourced, so this doesn’t require that enterprises have a formal data quality and data governance initiative in place as a prerequisite.

However, customers can add their own metadata. [These include items such as] their own semantic types on top of the ones Talend delivers out-of-the-box for the semantic intelligence [and other] custom attributes to further categorize data sets and improve their searchability.

Talend’s “unified” approach also delivers further benefits from using Data Inventory, Franco noted. The new service is embedded in Talend Data Preparation (for business users and data analysts). This, in turn, makes it possible for Talend to deliver to users “an Excel-like experience,” he added.


Talend’s new Pipeline Designer also benefits from the unified architecture, Franco added. Specifically, pipeline builders “can easily do more sophisticated data engineering with complex data types or streaming data,” he said.


He mentioned a third case where Talend’s unified approach pays off. Because Talend Winter ’20 can leverage the Talend platform’s existing MDM capabilities, it can assure data quality for all insights. “We use our Data Quality backbone with semantic intelligence [which provides the] ability to capture data footprints,” Franco said.


He shared an example that data specialists may recognize. 

If column A in the data set is more than a string, semantic intelligence automatically detects that it refers to e-mail addresses. It can detect that 10% of the rows in this dataset actually don’t hold valid e-mails, so the data quality is questionable, and this will be reflected in the overall score. We also bring a more detailed automated profiling assessment (see data quality ratings below).

We do this in a systematic way for all incoming data, and we add some crowdsourced information like ratings and comments. In addition, data owners are identified, and they can endorse the datasets that they are responsible for.
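To make the e-mail example above concrete, here is a minimal sketch in Python of how a column could be checked against a semantic type and how the share of invalid rows could be measured. It is an illustration only and not Talend's implementation: the `invalid_email_ratio` function, the sample `column_a` values and the deliberately simple pattern are all assumptions made for this example.

```python
import re

# Deliberately simple pattern for illustration (assumption);
# it is not a full e-mail address validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def invalid_email_ratio(values):
    """Return the fraction of values that do not look like e-mail addresses."""
    if not values:
        return 0.0
    invalid = sum(1 for v in values if not EMAIL_PATTERN.match(str(v).strip()))
    return invalid / len(values)


# Hypothetical sample column: nine plausible addresses and one bad row.
column_a = [
    "ana@example.com", "bob@example.org", "carol@example.net",
    "dave@example.com", "erin@example.org", "frank@example.net",
    "grace@example.com", "heidi@example.org", "ivan@example.net",
    "not-an-email",
]

ratio = invalid_email_ratio(column_a)
print(f"{ratio:.0%} of rows are not valid e-mails")
```

A ratio like this is the kind of signal that could then feed into an overall quality rating for the dataset.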

Talend’s AI-Powered Data Intelligence ‘Trust Score’ Explained

Drill deep into new levels of data intelligence and you’ll find Talend sports a way to ‘score’ or validate the more in-depth intelligence for accuracy and value.  


“The trust score, or Data Intelligence Score, includes quality, popularity and ratings for all data. It is systematic and automatic, so it would work for master data use cases as well but is not locked into a specific product. It is a calculated metric that doesn’t require AI or ML to be calculated,” Franco told IDN. 
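As Franco notes, a score of this kind is plain arithmetic rather than machine learning. The sketch below shows one way such a metric could be computed as a weighted average of quality, popularity and user ratings. Talend has not published its formula in this article, so the `DatasetSignals` class, the `WEIGHTS` values and the 0 to 100 scale are hypothetical choices made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class DatasetSignals:
    quality: float     # share of valid values, 0.0 to 1.0
    popularity: float  # normalized usage, 0.0 to 1.0
    rating: float      # average user rating on a 0 to 5 scale


# Hypothetical weights, chosen for illustration only; they sum to 1.0.
WEIGHTS = {"quality": 0.5, "popularity": 0.2, "rating": 0.3}


def trust_score(s: DatasetSignals) -> float:
    """Combine the three signals into a 0-100 score with a weighted average."""
    normalized_rating = s.rating / 5.0
    combined = (
        WEIGHTS["quality"] * s.quality
        + WEIGHTS["popularity"] * s.popularity
        + WEIGHTS["rating"] * normalized_rating
    )
    return round(100 * combined, 1)


# Example: a dataset with 90% valid values, moderate usage and a 4-star rating.
example = DatasetSignals(quality=0.9, popularity=0.5, rating=4.0)
print(trust_score(example))
```

Because every input is either profiled automatically or crowdsourced, a metric like this can be recomputed for each dataset without a formal governance program in place, which matches the point Franco makes.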


That said, Franco did detail some of Talend’s well-designed updates with AI and machine learning.

Machine learning for data matching allows users to create golden records that are very relevant for Master Data, but it doesn’t require a specific MDM installation. We deliver this as a component in our data quality stack, and it can be used in any batch or real-time pipeline. Customers can decide if they want to store the golden record centrally or across data targets.
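The idea of matching records and merging them into a golden record can be sketched in a few lines of Python. Note the substitution: Talend describes machine learning for the matching step, whereas this example uses plain string similarity from the standard library's `difflib` as a simple stand-in. The `similar` and `build_golden_records` functions, the 0.85 threshold and the sample records are assumptions for illustration.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two strings as a match if their similarity ratio is high enough."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def build_golden_records(records, key="name"):
    """Cluster records whose key fields match, then merge each cluster."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            # Compare against the first record of each existing cluster.
            if similar(rec[key], cluster[0][key]):
                cluster.append(rec)
                break
        else:
            clusters.append([rec])

    golden = []
    for cluster in clusters:
        merged = {}
        for rec in cluster:
            for field, value in rec.items():
                # Keep the first non-empty value seen for each field.
                if value and not merged.get(field):
                    merged[field] = value
        golden.append(merged)
    return golden


# Hypothetical source records with a near-duplicate company name.
records = [
    {"name": "Acme Corporation", "city": "Paris", "phone": ""},
    {"name": "ACME Corporation.", "city": "", "phone": "555-0100"},
    {"name": "Globex Ltd", "city": "Lyon", "phone": ""},
]

golden = build_golden_records(records)
for row in golden:
    print(row)
```

Since the function only takes and returns lists of dictionaries, the merged records could be written to a central store or pushed back to each data target, the two options Franco mentions.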

Talend’s two “most compelling features” of AI in Talend Winter ’20 are the Magic Fill and Matching with explainable AI, Franco said. 


Both are embedded in Talend’s out-of-the-box components: Magic Fill in Talend Data Prep, and Matching with explainable AI in Talend’s Data Quality platform.


Franco explained how these Talend AI features work: 

We also can embed customer-specific AI/ML components in our data pipeline. Through our native Spark support, we even provide components for invoking MLlib, the machine learning library within Spark. A great example is how we use this at Talend for segmenting our customer base with supervised machine learning using Data Fabric and our Data Stewardship for capturing knowledge from data experts and then running it at scale with ML.

The launch of Talend Winter ’20 comes as a report from IDC finds data professionals are spending up to two-thirds of their time at work simply searching and preparing data, according to the company.