SnapLogic Spring ’13 ‘Big Data as a Service’ Supports Data Integration, Hadoop, Cloudera

Cloud integration innovator SnapLogic’s latest update aims to simplify big data projects. SnapLogic’s “Big Data-as-a-Service” offers data access, data integration, and data logging technologies tuned for Hadoop, Hive and HBase, along with Cloudera certification and other new features. IDN speaks with SnapLogic product manager Zeb Mahmood.

Tags: big data, BLOB, CLOB, Cloudera, cloud integration, Hadoop, REST, SnapLogic, Ventana, expert voice

Zeb Mahmood
product manager

“Big data projects succeed when you can get large amounts of data easily in and out of Hadoop and can leverage data from other multiple data sources.”

The latest update from cloud integration firm SnapLogic brims with new technology that aims to simplify big data projects. SnapLogic’s “Big Data-as-a-Service” (BDaaS) offers data access, data integration, and data logging technologies tuned for Hadoop, Hive and HBase, along with Cloudera certification and other new features.

SnapLogic is applying its expertise in uploading and integrating large-scale data volumes between on-premise and cloud-based SaaS systems to a set of new BDaaS solutions, which can quickly load data into Hadoop as well as connect Hadoop-based datasets to many types of existing enterprise data, according to SnapLogic product manager Zeb Mahmood.

“From our perspective, big data projects can really succeed when you can get large amounts of data easily in and out of Hadoop and can leverage high volumes of data from other multiple data sources,” Mahmood told IDN. “SnapLogic has a lot of experience dealing with connecting extremely large data volumes, so we have expertise and technologies that ETL tools can’t match to connect Hadoop with existing enterprise data.”

SnapLogic's Spring ’13 release sports these key technologies to deliver a BDaaS solution:

  • Pre-built Snaps for HBase, Hive, and Hadoop Distributed File System (HDFS). These let any knowledge worker easily move data into or out of Hadoop, and let users leverage a range of outside analytics and BI tools, such as Birst.
  • Pre-built Snaps for connectivity to more than 150 different data sources. These allow companies to quickly load a wide range of data from traditional on-premise applications, social media, mobile and even machine data into Hadoop.
  • Certified connectivity to Hadoop from big data analytics platform Cloudera.
  • New Snaps tuned for high-volume data operations, including BLOB and CLOB datasets. These pre-built and configurable Snaps support monitoring and management of data moving between systems for big data, analytics and BI operations.
  • Support for ‘private’ SnapStores, which integrate with SnapLogic Designer tooling and allow companies to easily customize pre-built Snaps or build their own.
  • New time-saving capabilities in SnapLogic Designer that let integration devs view server-side logs, preview large unstructured data records, and browse the server’s file system.
  • SnapLogic’s management console lets admins visually monitor and manage integrations using summaries or detailed history views of individual integrations, and the ability to monitor real-time health metrics of all integration points.

Much of the ease of use for big data projects is driven by SnapLogic’s Snap technology. Snaps are intelligent, application-specific integration connectors for applications, data and unstructured data. With new Snaps for faster data movement, support for Hadoop (and Hadoop-related projects) and management of data movements, SnapLogic’s BDaaS approach allows IT to succeed with big data without needing a staff of data science experts, Mahmood said.

“Business leaders . . . need help filtering the signal from the noise amidst all the social and machine data out there, but don’t want to waste time on standing up the hardware, configuring the software, and manually coding point-to-point solutions. Integration is a critical component of getting just the right . . . and SnapLogic [BDaaS] quickly powers any big data app a customer wants to experiment with,” said Gaurav Dhillon, founder and CEO of SnapLogic.

“We’re seeing a lot of our customers expanding the use of SnapLogic within their enterprise. [T]hey are adding more data sources to existing pipelines. These new data sources are introducing new varieties/structures of data from very large text fields, to streaming binary data,” Mahmood noted in a recent blog post.

A top Cloudera exec noted how valuable SnapLogic’s BDaaS work is to streamlining big data projects.

“Building a cluster from the ground up to run an Apache Hadoop cluster can be challenging. There are numerous choices to be made at all levels of the stack, and making those choices can be very complicated,” said Tim Stevens, Vice President of Corporate and Business Development at Cloudera, in a statement. “The Cloudera Certified Technology program is designed to make those choices easy and reliable. We're pleased that SnapLogic has completed the certification of their BDaaS solution on CDH4.”

SnapLogic’s New Snap ‘Connectors’
Address Growing Big Data Use Cases

To help users address these emerging big data use cases, SnapLogic Spring ’13 also sports new Snaps specifically tuned for BLOBs (binary large objects) and CLOBs (character large objects) to help IT work with larger and more varied data, Mahmood added. “With our BLOB and CLOB Snaps, it doesn’t matter what source the data exists in,” he said. The following definitions help explain the import of BLOBs and CLOBs.

“A BLOB is a data type that can store binary data. This is different than most other data types used in databases, such as integers, floating point numbers, characters, and strings, which store letters and numbers. Since blobs can store binary data, they can be used to store images or other multimedia files. For example, a photo album could be stored in a database using a blob data type for the images, and a string data type for the captions.

“Because blobs are used to store objects such as images, audio files, and video clips, they often require significantly more space than other data types. The amount of data a blob can store varies depending on the database type, but some databases allow blob sizes of several gigabytes.

“A CLOB,” the definition adds, “is a data type used by various database management systems, including Oracle and DB2. It stores large amounts of character data, up to 4 GB in size. The CLOB data type is similar to a BLOB, but includes character encoding, which defines a character set and the way each character is represented. BLOB data, on the other hand, consists of unformatted binary data.”
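The distinction quoted above can be illustrated with a short Python sketch. It is only an analogy, not SnapLogic code: Python's `bytes` type stands in for a BLOB value and `str` for a CLOB value, and the function name `describe_payload` is hypothetical. It shows that character data has a byte size only once an encoding (UTF-8 is assumed here) is chosen, while binary data is stored as-is.

```python
# Illustrative only: bytes and str stand in for BLOB and CLOB values.

def describe_payload(value):
    """Classify a value as BLOB-like (raw bytes) or CLOB-like (text)."""
    if isinstance(value, (bytes, bytearray)):
        # Binary data has no character set; its size is its byte count.
        return "BLOB-like", len(value)
    if isinstance(value, str):
        # Text must be encoded before it has a size in bytes (UTF-8 assumed).
        return "CLOB-like", len(value.encode("utf-8"))
    raise TypeError("expected bytes or str")

photo = bytes([0x89, 0x50, 0x4E, 0x47])  # first four bytes of a PNG file
caption = "Café at sunset"               # 14 characters, one of them non-ASCII

print(describe_payload(photo))    # ('BLOB-like', 4)
print(describe_payload(caption))  # ('CLOB-like', 15)
```

The caption is 14 characters long but occupies 15 bytes in UTF-8, which is the kind of detail a character encoding defines and a BLOB ignores.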

More than a dozen Snaps are either new or enhanced in this release, Mahmood said. Among them:

A new “Head and Tail” Snap lets IT quickly confirm that data is being cleanly and correctly moved into big data projects. “This lets users quickly view their data stream is being moved cleanly and correctly by sampling the top or bottom ten to twenty records,” Mahmood said. “This is really useful, especially as you connect big data to high-volume data sources or new and different ones.”
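The idea of sampling the top and bottom records of a stream can be sketched in a few lines of Python. This is a generic illustration of the technique, not the Snap's actual implementation; the function name `head_and_tail` and the sample record names are hypothetical. It reads the stream once and keeps only the first and last `n` records in memory.

```python
from collections import deque
from itertools import islice

def head_and_tail(records, n=10):
    """Return the first n and last n records of an iterable in one pass."""
    iterator = iter(records)
    head = list(islice(iterator, n))
    # A deque with maxlen=n discards older items, so it ends up holding
    # only the last n records seen (seeded with the head for short streams).
    tail = deque(head, maxlen=n)
    tail.extend(iterator)
    return head, list(tail)

# A generator stands in for a large data stream.
stream = (f"row-{i}" for i in range(1, 1001))
head, tail = head_and_tail(stream, n=3)
print(head)  # ['row-1', 'row-2', 'row-3']
print(tail)  # ['row-998', 'row-999', 'row-1000']
```

For streams shorter than `n`, the head and tail overlap rather than coming back empty, which keeps the check useful on small test loads.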

SnapLogic Spring ’13 added several improvements, including support for binary data Snaps (for FTP, HTTP, etc.). It also improved its RESTful platform, which has always supported streaming data, to expose REST “verbs,” giving devs more visibility into how they use REST-based Get, Post, Put and Delete.

“Most users simply use a REST API without too much thought, but with bigger data volumes we expose the REST API so developers can preview which REST ‘action’ they are performing,” Mahmood said. “Now we let IT see they are doing the proper REST operations, and expose more about the REST API so you can see exactly what you are doing, such as if you are using ‘Post’ to add or ‘Post’ to delete.”
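To make the notion of previewing a REST “action” concrete, here is a minimal Python sketch using only the standard library. It does not reflect SnapLogic's API: the `preview_request` helper, the action labels and the `example.com` URLs are all assumptions for illustration. The request is built but never sent, so the verb can be inspected before anything changes on the server.

```python
from urllib.request import Request

# Conventional meaning of each HTTP method in a REST-style API.
REST_ACTIONS = {
    "GET": "read",
    "POST": "create",
    "PUT": "replace or update",
    "DELETE": "delete",
}

def preview_request(method, url, body=None):
    """Build (but do not send) a request and describe the action it implies."""
    request = Request(url, data=body, method=method.upper())
    verb = request.get_method()
    action = REST_ACTIONS.get(verb, "unknown")
    return f"{verb} {request.full_url} -> {action}"

# Hypothetical endpoints; nothing is sent over the network.
print(preview_request("post", "https://example.com/records", b'{"id": 1}'))
print(preview_request("delete", "https://example.com/records/1"))
```

A preview like this makes it obvious when a call intended to add a record is about to be issued with a verb that conventionally deletes one.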

Also, a new Aggregate Snap makes in-memory data processing for large datasets super efficient, Mahmood added.

SnapLogic Spring ’13 also includes updates to the SnapLogic Designer to make it easier and more intuitive to build or troubleshoot SnapLogic integration pipelines. Just one example: all the server logs are separated into several functional buckets and exposed in a spreadsheet-like tabular format that the pipeline dev can search, filter, and sort through.

One analyst also noted the huge connection between integration, cloud and big data.

“Big data involves interplay between different data management approaches and business intelligence and operational systems, which makes it imperative that all sources of business data be integrated efficiently and that organizations be able to easily adapt to new data types and sources,” said Mark Smith, CEO and chief research officer at Ventana Research, in a statement. “Big data is broken without integration.”