Talend: Ongoing Success with Data Insights Needs 3 Ingredients – Data Integration, Portable Development & User Self-Service

If companies want to become data-driven analytics machines, they need to make data delivery to business users faster and easier. They also need future-proof technologies that can adapt to rapid change. IDN talks with Talend CMO Ashley Stirrup about the 3 ingredients for success.

Tags: analytics, Azure, Cloudera, big data, data integration, data prep, machine learning, self-service, Talend

Ashley Stirrup, Talend

"To be data-driven at scale, companies really need to focus on how do we get the business easier access to data as fast as possible."


In a series of product launches and partnerships this summer, Talend is rolling out its vision for how companies can become data-driven firms – and achieve long-lasting business benefits from any investments in big data or analytics.


Talend CMO Ashley Stirrup said sustained success requires more than simply putting business requirements first. It also requires technologies that help IT and business ‘cross the lines,’ so they not only collaborate better but can also take on some of each other’s traditional tasks.


“A lot of companies make the mistake of looking at the business requirements and solving for those -- rather than realizing how quickly their entire environment will change,” Stirrup told IDN. “So instead, we recommend that companies first look at finding 'future-proof' solutions. That to me is a fundamental requirement.”


“Companies need to recognize that a successful data-driven strategy has to go well beyond the IT developer or data scientist,” he added. “To be data-driven at scale, companies really need to focus on how do we get the business easier access to data as fast as possible.”


For Talend, helping companies become data-driven with a future-proof solution has meant focusing on 3 key elements:

  • A developer solution that lets companies create agile, fast-performing data pipelines and supports development that is portable and will scale;
  • Capabilities that enable more self-service by business users; and
  • A growing ecosystem of partnerships that empowers companies to use multiple data types, put data where they want it, and mask much of that complexity.

Talend quickly discovered that delivering results from this three-legged stool of capabilities required a unified approach. “We needed to avoid the trap of following the market, which discussed individual products for individual problems. So, we talk about [data] integration holistically, especially as it relates to going from raw data to self-service business insights,” Stirrup said.


The first step to a future-proof environment for analytics, he added, comes from understanding the value of data integration in delivering analytics, along with support for hybrid and multi-cloud options.

“Ten years ago, companies would pick a data warehouse tool and they would have confidence they would be using that for a decade or two. Now it’s totally different,” Stirrup said. “Change is unprecedented. Today, Hadoop, Spark, and machine learning are offering many advantages, so you need to think about having an integration tool that lets you plug into the data you need wherever it is, as well as support [a] variety of cloud platforms.”


Talend is delivering on this capability, Stirrup noted, with its latest edition of Talend Data Fabric, a unified data-driven platform that combines capabilities for big data, data integration, cloud, master data management and application integration.


Talend Data Fabric offers a pre-integrated set of capabilities that work together to deliver a high-performance data pipeline that can plug and play into a variety of data sources and targets. This unified approach aims to deliver benefits across the end-to-end analytics lifecycle, Stirrup said, including the ability to:

  • Simplify cloud data pipeline creation for developers
  • Easily integrate streaming and historical data for contextual insight
  • Quickly migrate on-premises data to multiple clouds
  • Scale data quality for big data using Spark-powered data matching and machine learning
  • Improve DevOps productivity for managing security and configuration on big data clusters
  • Empower non-technical (business) self-service for data prep, data stewardship and insightful outcomes


“Bringing all these technologies together into Talend Data Fabric’s unified platform means that users can work with, and more easily combine, historical and real-time data into a single master view – all without needing to work with a bunch of discrete tools from different providers,” Stirrup added.


The Summer ’17 edition of Talend Data Fabric, released last month, adds support to seamlessly manage information across Amazon Web Services (AWS), Cloudera Altus, Google Cloud Platform, Microsoft Azure, and Snowflake platforms. This lets customers more easily adopt cloud and multi-cloud environments for their big data projects. “This version [of Talend Data Fabric] will benefit both IT and business, by enabling organizations to more simply and rapidly integrate [and] cleanse data – and get to the valuable job of analyzing data much more quickly,” Stirrup said.

One customer, BeachBody, used Talend Data Fabric to accelerate its ability to load massive amounts of data into data lakes, and thereby get to insights more quickly. In one example, BeachBody used Talend Data Fabric to create five data pipelines for handling five different datasets. The data could be read dynamically, eliminating the need to build data integration for each job. “They could just point the pipelines at different targets and it worked,” Stirrup added.
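The BeachBody example describes a metadata-driven pattern: one generic pipeline discovers each dataset’s structure at run time, so pointing it at a new source or target requires no new integration code. The Python sketch below illustrates only that general idea and is not Talend’s implementation; the `run_pipeline` function, the in-memory CSV inputs and the `data_lake` dictionary are hypothetical stand-ins for real sources and data lake targets.

```python
import csv
import io

def run_pipeline(source_text, target, transform=None):
    """Hypothetical generic load step: the schema is read from the CSV header
    at run time, so the same function handles datasets with different columns."""
    reader = csv.DictReader(io.StringIO(source_text))
    rows = [transform(row) if transform else row for row in reader]
    target.extend(rows)  # append to whatever target the pipeline is pointed at
    return reader.fieldnames, len(rows)

# Two illustrative datasets with different schemas (made-up sample data).
datasets = {
    "orders": "order_id,amount\n1,19.99\n2,5.00\n",
    "customers": "customer_id,name,country\nc1,Ana,US\n",
}
# Stand-in for data lake targets: one list per dataset.
data_lake = {name: [] for name in datasets}

for name, text in datasets.items():
    columns, count = run_pipeline(text, data_lake[name])
    print(f"{name}: loaded {count} rows with columns {columns}")
```

Handling another dataset only means adding an entry to `datasets`; the pipeline code itself does not change.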


Stirrup also highlighted one more virtue of Talend’s unified approach.


“Not only are we giving people the flexibility to move from one cloud platform to another - we take the native code along, so you can get the full benefits when you migrate, whether from on-prem to the cloud or to different clouds. Just change a couple settings and you’re running [on the new environment].”


Even as APIs gather strong momentum as a way to promote certain portability benefits, Stirrup said there are trade-offs to consider. “If you rely on standard APIs, you don’t get the full richness of the solution you built,” he said.


Talend has shown expertise in this level of portability, having been the first to support data integration running on Apache Spark (and offering tools to move native code from MapReduce to Spark), Stirrup said.


Talend Says Successful Self-Service Needs ‘Right Tools’ for the ‘Right People’ 

Talend’s approach also includes a focus on self-service, from a persona-style perspective.


“The key to successful self-service for business users is to give the right tools to the right people,” Stirrup said. “We don’t expect a business user to become a data scientist. But we’ve made it easy for a business person to be able to access data themselves and make rules for how to do data prep so it is ready for analytics.”


In a nod to other BI / analytics firms, he noted, “Tableau and others have proven how important self-service is, and no question you need those same capabilities in the data integration space.”


On that point, Talend’s approach is to provide a more natural hand-off between IT and business, with business users taking on a bit more of the work. This sets the stage for a ‘cross-over’ from business to the IT side, Stirrup said. “We let IT use APIs to connect to new data sources, grab that data and possibly do some light data processes – but that’s it. At that point, our approach is to hand it over to business users,” he added.


To pick up from IT’s work, Talend provides a data prep tool with a web-based UI so a business person can design the rules for working with the data, and even decide what data is good and trustworthy (or questionable or bad). Beyond tools that are easy to learn and use, self-service also requires attention to what Stirrup called “the end-to-end problem.” By that he meant: “How much time is spent on cleaning the data, getting it trusted, this all went into our effort to get self-service right.”


Talend also provides data stewardship capabilities, which can determine whether a new entry should be mapped to an existing account record or whether it is truly new. “The key here is you’re looking for a high degree of confidence about your data. When it exists, you’re clear to begin working [your data]. Where confidence is lower, we’ll route it to a human data curator. So, we put that control in the hands of the business user.”
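The stewardship workflow Stirrup describes amounts to confidence-based routing: candidate matches at or above a confidence threshold are accepted automatically, and the rest go to a human curator. Below is a minimal Python sketch, assuming each candidate already carries a match score between 0 and 1; the `MatchCandidate` type, the `route_matches` function and the 0.9 threshold are hypothetical and not part of Talend’s product.

```python
from dataclasses import dataclass

# Hypothetical threshold: matches scoring at or above it are auto-accepted.
AUTO_ACCEPT_THRESHOLD = 0.9

@dataclass
class MatchCandidate:
    new_record_id: str
    existing_account_id: str
    confidence: float  # assumed 0.0-1.0 score from an upstream matching step

def route_matches(candidates, threshold=AUTO_ACCEPT_THRESHOLD):
    """Split candidates into auto-accepted matches and a curator review queue."""
    accepted, review_queue = [], []
    for candidate in candidates:
        if candidate.confidence >= threshold:
            accepted.append(candidate)
        else:
            review_queue.append(candidate)  # low confidence: a human decides
    return accepted, review_queue

candidates = [
    MatchCandidate("n1", "acct-1", 0.97),  # strong match: auto-accepted
    MatchCandidate("n2", "acct-7", 0.55),  # weak match: sent to a curator
]
accepted, review_queue = route_matches(candidates)
```

Raising the threshold sends more records to curators in exchange for fewer automatic mistakes; where to set it is a business decision, which fits the point about putting control in the business user’s hands.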


A final touch worth mentioning here: Stirrup also said that Talend is working to include machine learning capabilities. “Machine learning will be key to helping businesses cope with massive amounts of data. In many cases, you can find high-confidence in 80% [of your data matches]. But when you’re dealing with millions of matches, that remaining 20% can be several hundred thousand, so you’ll need machine learning to get through those in anything close to a timely fashion.”


The machine learning algorithms are also being designed to learn from human decisions. In a low-confidence situation, where the data match would be sent to a curator, “the machine learning algorithm will get data on what decision the curator made, adding that information into its matching algorithms,” Stirrup said.


Talend Summer ’17 utilizes Apache Spark-powered machine learning algorithms to automate and accelerate data matching and cleansing, improving scale, performance and accuracy. Over time, these algorithms monitor decisions made by data curators to become more intelligent and accurate.
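One common way to let a matcher learn from curator decisions is to treat each decision as a labeled example and update a scoring model online. The Python sketch below uses a toy logistic-regression model trained by stochastic gradient descent; it illustrates that general technique under stated assumptions and does not describe Talend’s Spark-based algorithms. The `FeedbackMatcher` class and its similarity features are hypothetical.

```python
import math

class FeedbackMatcher:
    """Hypothetical online logistic-regression matcher. Each feature is a
    similarity score in [0, 1] (for example name or address similarity)."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def confidence(self, features):
        # Estimated probability that the record pair is a true match.
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def learn_from_curator(self, features, is_match):
        # One stochastic gradient descent step on log loss, using the
        # curator's decision as the label.
        error = (1.0 if is_match else 0.0) - self.confidence(features)
        self.weights = [w + self.learning_rate * error * x
                        for w, x in zip(self.weights, features)]
        self.bias += self.learning_rate * error


matcher = FeedbackMatcher(n_features=2)
pair = [0.95, 0.90]  # made-up similarity scores for one record pair
before = matcher.confidence(pair)  # 0.5: the untrained model is undecided
for _ in range(20):
    matcher.learn_from_curator(pair, is_match=True)  # curator confirms the match
after = matcher.confidence(pair)
print(f"confidence before: {before:.2f}, after curator feedback: {after:.2f}")
```

Each confirmed or rejected match shifts the model’s weights, so similar pairs seen later score with more (or less) confidence without a curator having to review them again.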


Talend Expands, Builds Partnerships for Unified Platform

To focus machine learning on these data and analytics outcomes, Talend is working with Lattice Engines, a provider of a predictive scoring tool, Stirrup said. In fact, third-party partnerships are key to Talend’s unified approach; just this summer, Talend announced ongoing work with Cloudera, MapR, Microsoft and Snowflake, among others.


MapR Technologies and Talend are partnering on a solution to help customers address the requirements of the European Union’s (EU) General Data Protection Regulation (GDPR). The combined offering from MapR and Talend enables companies to create a governed data lake capable of meeting even the most stringent data storage, inventory, protection, retention, portability, and security requirements mandated by GDPR.


With Talend’s newest Microsoft Azure connectors, customers can build more intelligent and scalable cloud data pipelines for real-time analytics. The new connectors support Microsoft Azure SQL Data Warehouse, SQL Database, Azure Cosmos DB, Data Lake Store, Queue Storage, and Table Storage. They join Talend’s pre-existing connectivity to Microsoft Azure HDInsight and Blob Storage.


Talend is working with Cloudera to support its Altus Platform-as-a-Service (PaaS) public cloud offering. Companies can use a combination of Altus and Talend technology to reduce overall data management costs and to accelerate and simplify hybrid, on-premises, and cloud big data projects. For Cloudera, the partnership will make it easier to bring thousands of data sources onto the Altus platform, letting users simply execute data pipelines against those ingested data sets.


With Talend’s newest Snowflake connector, enterprises will see even more efficiencies when moving data into Snowflake. Talend’s architecture capitalizes on Snowflake’s parallel loading capabilities, which enables joint Talend and Snowflake customers to easily load a diverse set of data types so they can jumpstart their cloud data warehousing projects much faster.


While Talend’s vision is appreciated by these cutting-edge vendors in analytics, big data, cloud and machine learning, not everyone sees the value right away.


“This kind of thinking is not yet mainstream,” Stirrup conceded, but he pointed to clear benefits of this approach for becoming a data-driven company. “As customers adopt new cutting-edge technologies, we often see users start straight away with coding. But their challenge is as soon as they want to move from one tech to another, they have to redo that work,” he said.


For companies intrigued by the approach but not ready to take the full plunge, Talend has flexible deployment options. “If customers want to start and deploy the full Data Fabric platform, that’s great. But we also allow customers to build in bite-size pieces and grow it over time. In this ‘grow over time’ option, there’s no need to purchase and integrate discrete pieces of the Talend platform. Just add a license key to what you have already installed. No need to learn new UIs or rebuild metadata,” Stirrup said.