SnapLogic's iPaaS Adds Big Enhancements for Big Data with Hadoop 2.0

SnapLogic is stretching the capabilities of its Elastic Integration Platform with big enhancements for big data. The SnapLogic Fall 2014 cloud-based iPaaS brings its "snap-based" app integration technology to Hadoop 2.0, making it easier to acquire and prep data – and deliver analytics to users. IDN speaks with SnapLogic's Chief Scientist Greg Benson.


Greg Benson
Chief Scientist


"Often, you need Ph.D.s to get data into Hadoop. We remove that complexity."



 

SnapLogic’s latest iPaaS update optimizes visual tooling and runtime to simplify many of the complex tasks associated with using different data types from multiple sources with Hadoop, said SnapLogic’s Chief Scientist Greg Benson. “SnapLogic is not going to write the next Hadoop or Apache Spark, but we can leverage those platforms and with our [iPaaS] technologies bring a lot more ease-of-use to them. We’ve brought our user-friendly ‘pipeline’ technology that made [app] integration easier to Hadoop and MapReduce,” Benson said.

 

Benson said SnapLogic’s support for Hadoop 2.0 tackles three main big data challenges:

 

Acquire: To simplify and speed steps to acquire data for big data, SnapLogic’s HTML5 visual designer tool provides users a drag-and-drop way to bring data to Hadoop clusters, eliminating the need for complex coding. “We can let users ‘snap’ together data flow pipelines that can do integration tasks,” Benson said. These pipelines into Hadoop are powered by SnapLogic’s pre-built library of 160 data connectors (called snaps), which are the key to allowing users to easily assemble data flows between their data sources and Hadoop.

 

“Often, you need Ph.D.s to write code or work with XML interfaces to get data into Hadoop from SAP, Salesforce, Workday or other data sources. We remove that complexity and dramatically speed that up,” Benson said.
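To make the idea of "snapping together" a data flow concrete, here is a minimal Python sketch of composable pipeline steps. It is not SnapLogic's API: the names `read_source`, `rename_field`, `keep_if` and `snap_together`, along with the sample records, are hypothetical and only illustrate how small, reusable steps can be chained so that each one's output feeds the next.

```python
from typing import Callable, Iterable, Iterator, List

Record = dict
Step = Callable[[Iterable[Record]], Iterator[Record]]


def read_source(rows: List[Record]) -> Step:
    """Hypothetical reader step: ignores upstream input and emits the given rows."""
    def step(_: Iterable[Record]) -> Iterator[Record]:
        yield from rows
    return step


def rename_field(old: str, new: str) -> Step:
    """Hypothetical mapper step: renames one field in every record."""
    def step(records: Iterable[Record]) -> Iterator[Record]:
        for rec in records:
            out = dict(rec)  # copy so the source record is not mutated
            if old in out:
                out[new] = out.pop(old)
            yield out
    return step


def keep_if(predicate: Callable[[Record], bool]) -> Step:
    """Hypothetical filter step: passes through records matching the predicate."""
    def step(records: Iterable[Record]) -> Iterator[Record]:
        return (rec for rec in records if predicate(rec))
    return step


def snap_together(*steps: Step) -> Step:
    """Chain steps so that each step consumes the previous step's output."""
    def pipeline(records: Iterable[Record]) -> Iterator[Record]:
        stream = iter(records)
        for step in steps:
            stream = step(stream)
        return stream
    return pipeline


# Made-up sample data standing in for rows pulled from a source system.
accounts = [
    {"AccountName": "Acme", "Country": "US"},
    {"AccountName": "Globex", "Country": "DE"},
]

pipeline = snap_together(
    read_source(accounts),
    rename_field("AccountName", "name"),
    keep_if(lambda rec: rec["Country"] == "US"),
)
result = list(pipeline([]))
print(result)
```

Because every step has the same shape (records in, records out), steps can be reordered or swapped without touching the others, which is the property a drag-and-drop designer relies on.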

 

Prepare: Beyond getting data to Hadoop, SnapLogic also tackles many time-consuming data prep tasks, including transform, aggregate and sort. To further speed analytics operations, SnapLogic can support some of the more complex correlation tasks that tie datasets together, including join, merge, map and others, Benson said. “We offer many ways to prepare your data, including data wrangling and data shaping. Users can even graphically transform and enrich data without any coding,” he told IDN.
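For readers who want to see what the prep operations named above amount to, the following is a small plain-Python sketch of a join, an aggregate and a sort over in-memory records. The data, field names and helper functions (`join`, `aggregate_sum`) are assumptions made for illustration; they do not reflect how SnapLogic implements these tasks.

```python
from collections import defaultdict

# Made-up sample datasets.
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 75.5},
    {"customer_id": 1, "amount": 30.0},
]
customers = [
    {"customer_id": 1, "region": "West"},
    {"customer_id": 2, "region": "East"},
]


def join(left, right, key):
    """Inner hash join: index the right side by key, then probe it with the left."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    for row in left:
        for match in index.get(row[key], []):
            yield {**row, **match}


def aggregate_sum(records, group_by, field):
    """Sum `field` for each distinct value of `group_by`."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[group_by]] += rec[field]
    return [{group_by: group, "total": total} for group, total in totals.items()]


joined = join(orders, customers, "customer_id")
totals = aggregate_sum(joined, "region", "amount")
ranked = sorted(totals, key=lambda rec: rec["total"], reverse=True)
print(ranked)
```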

 

To make sure all these data prep tasks are successful, SnapLogic lets users “preview” a subset of data and its structure before and after it is transformed, cleansed or otherwise integrated. SnapLogic’s latest data mapper is optimized for high performance even with complex schemas, thanks to new support for progressive schema loading, direct schema-to-schema mapping and map-path highlighting.
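The preview idea can be approximated in a few lines of Python: take a small sample, apply the transformation, and compare the records and their inferred structure side by side. This is only a sketch under assumptions; `preview`, `infer_schema`, the sample rows and the `cleanse` transform are all hypothetical and are not part of SnapLogic's product.

```python
from itertools import islice


def infer_schema(records):
    """Map each field name to the type name of the first value seen for it."""
    fields = {}
    for rec in records:
        for name, value in rec.items():
            fields.setdefault(name, type(value).__name__)
    return fields


def preview(records, transform, limit=2):
    """Show a small sample of records before and after a transformation."""
    before = list(islice(records, limit))
    after = [transform(dict(rec)) for rec in before]  # copy so the sample is untouched
    return {
        "before": before,
        "after": after,
        "schema_before": infer_schema(before),
        "schema_after": infer_schema(after),
    }


def cleanse(rec):
    """Hypothetical transform: trim the name and parse the amount as a number."""
    rec["name"] = rec["name"].strip()
    rec["amount"] = float(rec["amount"])
    return rec


# Made-up raw rows with untrimmed names and string-typed amounts.
rows = [
    {"name": " Acme ", "amount": "120.50"},
    {"name": "Globex", "amount": "75"},
    {"name": "Initech", "amount": "12"},
]
report = preview(iter(rows), cleanse)
print(report["schema_before"], report["schema_after"])
```

Sampling with `islice` means only the first few records are read, so the check stays cheap even when the full dataset is large.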

Deliver: SnapLogic also added features to output results from analytics operations to a wide range of BI, data visualization and even data warehousing tools, such as SAS, MicroStrategy, Tableau and more. SnapLogic’s code-free output is powered by the same core technologies – including snaps, pipeline and transform capabilities – used to get the data to Hadoop clusters, Benson said. “We can deliver output in the formats they need and where they need it,” he told IDN, including CSV files, Tableau-ready data format files, relational formats or others.
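As a simple illustration of the CSV case mentioned above, this Python sketch serializes a list of result records with the standard library's `csv` module. The `to_csv` helper and the sample results are assumptions for the example, not SnapLogic's output mechanism.

```python
import csv
import io


def to_csv(records, fieldnames):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up analytics results to deliver downstream.
results = [
    {"region": "West", "total": 150.0},
    {"region": "East", "total": 75.5},
]
csv_text = to_csv(results, ["region", "total"])
print(csv_text)
```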

 

Thanks to SnapLogic’s “big data lifecycle” perspective, Benson expects SnapLogic Fall 2014 to help users tap into Hadoop benefits more quickly and at lower cost. He listed these benefits:

  • With SnapReduce, users can acquire, prepare, and deliver data using a simple graphical browser interface; there’s no need for XML or command line interaction with Hadoop. The benefits include developer productivity, broad connectivity and more time for downstream business analysts and data scientists to focus on data analysis and insights versus data preparation.
  • The introduction of the Hadooplex allows customers to multiplex their Hadoop cluster investments for integration tasks. The benefits are Hadoop-scale integration processing and native Hadoop administration.
  • Allows an organization to manage and execute on multiple Hadoop clusters through a single interface.
  • Provides automatic, elastic scaling for analytics operations, enabling users to run SnapReduce natively on Hadoop as YARN-managed resources.


Under the Covers of SnapLogic’s Ease-of-Use for Hadoop

To gain an architect’s perspective on how SnapLogic Fall 2014 cloud-based iPaaS can benefit Hadoop projects, Benson took us on a quick tour of the major innovations in this release.

 

SnapLogic pipelines also support parsing and formatting for the SequenceFile and RCFile formats, as well as document (JSON) processing for MapReduce jobs.
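To show what JSON document processing in a MapReduce style looks like in general terms, here is a single-process Python sketch of the map, shuffle and reduce phases over newline-delimited JSON. It does not run on Hadoop and is not SnapReduce; the function names, the `source` field and the sample documents are all assumptions for illustration.

```python
import json
from collections import defaultdict


def map_phase(lines):
    """Parse each JSON document and emit a (key, 1) pair keyed by its source."""
    for line in lines:
        doc = json.loads(line)
        yield doc["source"], 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Made-up newline-delimited JSON documents.
lines = [
    '{"source": "salesforce", "id": 1}',
    '{"source": "sap", "id": 2}',
    '{"source": "salesforce", "id": 3}',
]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)
```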

 

Hadooplex: This lets users set, schedule and trigger a multi-source pipeline to run natively as a YARN application. SnapLogic’s elastic execution grid can be easily configured by simply selecting a Hadooplex option when it runs in a Hadoop cluster. “With the Hadooplex, we can turn a Hadoop cluster into a SnapPlex for executing SnapLogic pipelines. The key to that is we take advantage of Hadoop 2.0 with YARN, so we are a first-class YARN application. By doing this, we can also interact with the YARN scheduler and utilize those Hadoop resources,” Benson said.

 

This runtime connection between SnapLogic and MapReduce is delivered via its upgrade to SnapReduce. With SnapReduce 2.0, SnapLogic becomes a YARN application, allowing Hadoop users to take advantage of an HTML5-based drag-and-drop user interface, breadth of connectivity, and modern architecture. SnapLogic also made this easy to configure, as its Hadooplex can be set up with a one-line install to connect to the SnapLogic cloud. Once connected, users can run pipelines on data living in the Hadoop cluster, he added.

 

“By running natively on Hadoop, SnapLogic delivers powerful application and data integration and extends the reach, performance and utilization of big data platforms,” Gaurav Dhillon, founder and CEO, SnapLogic said in a statement.

 

For the security-minded, the SnapLogic Hadooplex also adds native support for Kerberos authentication. It is available for both reading and writing HDFS data, and it can also be applied when launching Hadooplex nodes via YARN and when submitting SnapReduce pipelines to MapReduce.

 

SnapLogic’s thorough approach to support for simplifying Hadoop 2.0 projects has captured the attention of Hadoop distro firm Cloudera, which has already certified SnapReduce for use with Cloudera Enterprise 5. “The certification of SnapReduce 2.0 on Cloudera Enterprise 5 will enable orgs to leverage Cloudera Enterprise’s massively parallel processing capabilities for all big data integration needs,” Tim Stevens, Cloudera’s vice president for corporate and business development, said in a statement.

 

In fact, such a quick embrace from Cloudera may portend that SnapLogic’s Hadoop-focused upgrades for its iPaaS initiative will broaden the iPaaS playing field beyond SaaS and app integration. “We believe the [iPaaS] category should be looked at more broadly, where data is always going to be the superset. With the Fall 2014 release, SnapLogic is well-positioned to deliver a big data integration platform that can do app integration, not the other way around,” SnapLogic’s vice president of marketing, Darren Cunningham, told IDN.

 



