Talend Open Studio for Big Data Tackles Complexity

Talend is taking the complexity out of big data projects with a product based on Talend Open Studio that adds native support for Apache Hadoop, a powerful Eclipse plug-in, graphical tools to speed data loading and components to auto-generate code for Hadoop.  IDN speaks with Jim Walker, Talend’s director of global product marketing, about Talend Open Studio for Big Data.

Tags: big data, MapReduce, Talend, Hadoop, data quality, governance

Talend is taking the complexity out of big data with a new product based on Talend Open Studio that adds native support for Apache Hadoop, a powerful Eclipse plug-in, graphical tools to speed data loading and components to auto-generate code for Hadoop.
 
One of the biggest roadblocks to faster, broader big data adoption is simply how complicated and time-consuming big data projects can be, Jim Walker, Talend’s director of global product marketing, told IDN. “Complexity is one of the biggest barriers today to opening up big data to the masses,” he said.

To cut down on this complexity, Talend Open Studio for Big Data combines the company’s data quality, management and integration capabilities with new native support for Hadoop Distributed File System (HDFS), Pig, HBase, Sqoop and Hive. This approach lets Talend Open Studio for Big Data leverage Hadoop's MapReduce architecture for highly distributed data processing while masking that complexity from developers and data architects, Walker said.

Talend’s approach also avoids the need to write code for uploading or extracting data.
 
“One of the biggest challenges companies face when looking at big data is how do [they] get their data loaded,” Walker said. “We tackle that here by removing the complexity of MapReduce. Because we live on top of Pig, we can insert a big code generator that takes away that complexity.” This makes data loading with Hadoop much simpler. Further, Talend’s solution generates Pig scripts, so big data manipulation is expressed in a higher-level language, he added.

Another plus: Talend Open Studio for Big Data can generate native Hadoop code and run data transformations directly inside Hadoop, which provides simplicity and maximum scalability.

 

Talend’s Lifecycle View of Big Data Helps Architects Connect the Dots

Talend’s approach to big data also offers an added benefit to enterprise architects, Walker added. “A lot of enterprise architects still see big data as the Wild West. With our approach, we feel enterprise architects can get a better grasp on big data projects,” he said.

Talend Open Studio for Big Data delivers a lifecycle perspective for big data projects, from design to operations and management. “Big data is never an end in itself. It needs, or will need, to fit into an overall enterprise architecture,” Walker told IDN. “Our lifecycle view connects the dots between the design time tools, data loading and operations, and that includes data integrations, data quality and data management.”

Talend Open Studio for Big Data delivers these features for integration, quality and management:

 

  • Big Data Integration: Graphical components and a workspace that support easy loading of big data into Hadoop via HDFS, HBase, Sqoop or Hive, avoiding the need to learn and write complicated code.
  • Big Data Quality: Data quality functions that take advantage of Hadoop’s massively parallel environment. Developers can use Hadoop’s high-performance processing capabilities to identify duplicate records across huge data stores in moments. It also accelerates the profiling of big data and supports other data quality-related features.
  • Project Optimization: Ability to schedule, monitor and deploy any big data job via a shared repository. This approach allows data analysts to quickly and easily collaborate and share project metadata and artifacts. 
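
To make the duplicate-detection idea concrete, here is a minimal single-process sketch of the map, group and reduce pattern that such a job follows. It is an illustration only, not code generated by Talend or taken from the product: the record layout (name, email) and the function names `normalize` and `find_duplicates` are assumptions made for this example. On a Hadoop cluster, the grouping step would be spread across many nodes rather than held in one dictionary.

```python
from collections import defaultdict


def normalize(record):
    """Map step: build a key so records differing only in case or spacing collide."""
    name, email = record
    return (" ".join(name.lower().split()), email.strip().lower())


def find_duplicates(records):
    """Group records by normalized key, then keep only keys seen more than once."""
    groups = defaultdict(list)
    for rec in records:  # map + shuffle: route each record to its key's group
        groups[normalize(rec)].append(rec)
    # reduce: a group with more than one record is a set of likely duplicates
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


# Hypothetical sample data for illustration
records = [
    ("Ada Lovelace", "ada@example.com"),
    ("ada  lovelace", "ADA@example.com "),
    ("Grace Hopper", "grace@example.com"),
]
duplicates = find_duplicates(records)
```

Because each record's key depends only on that record, the map step can run on many machines at once, which is why this kind of check suits a massively parallel environment.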


Notably, one major Hadoop distribution firm has already found benefits in Talend’s approach. Hortonworks, creator of the Hortonworks Data Platform Hadoop distribution, will bundle Talend Open Studio for Big Data in its offering, citing Talend’s features and open source licensing.

“By making Talend Open Studio for Big Data a key integration component of the Hortonworks Data Platform, we are providing Hadoop users with the ability to move data in and out of Hadoop without having to write complex code,” said Eric Baldeschwieler, CTO and co-founder of Hortonworks, in a statement. “Talend provides the most powerful open source integration solution for enterprise data, and we are thrilled to be working with Talend to provide to the Apache Hadoop community such advanced integration capabilities.”
 
“At Hortonworks, there was a bake-off among many vendors, including some of our competitors,” Walker said. “And we’re very happy Hortonworks chose us.” He credited Talend’s combination of open source licensing and tools that can help make Hadoop more widely adopted. “Hortonworks have been very successful with their Hadoop distribution, getting to hundreds of users. The question they had was: ‘Which partner can help us most quickly get to 2,000?’”

Talend’s latest commitment to blend big data with data integration, quality and management comes as a Gartner report issued last month, Who’s Who in Open Source Data Quality, notes that big data projects will require more attention to data governance.

In part, Gartner noted, “The strong desire to apply analytics to these new and different data types (often in support of critical decision making), means suitable levels of data quality are essential . . . . Big data will bring huge challenges in information governance.”



