Strata Showcases Easier, Faster, Secure Solutions for Enterprise-Class Big Data

At this month’s O’Reilly Strata Conference, attendees were treated to a wide range of technology previews and product announcements aimed at data-driven solutions. IDN reviews some of the top highlights from well-known names and new players that aim to make it easier, faster and more economical to design and launch Big Data in Motion solutions.

Tags: Actian, Big Data, Alpine, Cloudera, CSC, Hadoop, Infochimps, MapR, Pentaho, SiSense, Strata, Zettaset, Zoomdata

O’Reilly Strata Conference revealed technologies, tools and best practices for making it easier, faster and less expensive to deliver Big Data in Motion solutions. IDN highlights some offerings from well-known names and new players.


Actian Corp. demonstrated its Actian Analytics Platform, designed to encourage more mainstream adoption by simplifying deployment of big data projects and boosting analytics performance. Simplifying Hadoop and removing the reliance on dated MapReduce-driven approaches allows “Big data for the rest of us,” said Actian CTO Mike Hoskins in a statement.

Architecturally, Actian is taking an embrace-and-extend approach to Hadoop support. Hoskins paints the picture this way: “Hadoop has become the data lake and the Actian Analytics Platform surrounds it in a way that enables organizations of all sizes to move quickly into production and start realizing value.”

As to features, the Actian Analytics Platform provides a visual drag-and-drop framework to load, transform, and analyze data in Hadoop with up to 30x better design-time and run-time performance and no coding.

The Actian Analytics Platform is also YARN-certified to accelerate native Hadoop processing and enable high-performance SQL queries on Hadoop via Actian’s massively parallel engine optimized for multi-node Hadoop implementations. It also enables a massively parallel columnar database to natively process Hadoop data in a way that boosts the performance of analytics applications.

It also provides what Hoskins called “invisible integration” across millions of data sources, on-premises or in the cloud, via Actian’s library of more than 200 connectors for enterprise data, applications, SaaS apps, streaming sensor data, social media, clickstream, email archives and external data sources. A full suite of ETL and data preparation capabilities is included as well.

Users can also conduct analyses across entire ecosystems of data, users and applications via more than 700 analytic and data transformation functions. The Actian Analytics Platform also makes it quick and easy to incorporate advanced mathematical, statistical and data-mining functions. The result is rich access to up-to-date analytic information that can be used with open source and third-party tools for visualization, BI and advanced analytics.


Alpine Data Labs introduced a “team-based” predictive analytics solution for big data and Hadoop. Alpine Chorus delivers a collaborative and “code-free” approach that lets technical and non-technical users jointly participate in the process of data science.

“Analytics applications have historically forced data scientists to work in solitary mode, in their code, on their desktop. With this new product, we are giving all ‘data people’ the tools and processes they need to build a ‘data nation,’” which lets them engage many types of stakeholders, he said.

The browser-based Alpine Chorus offers team members an end-to-end approach that lets stakeholders work across the entire data pipeline – from data transformation to modeling and analysis. Thanks to Alpine’s “In-Cluster” analytics technology for structured and unstructured data, all users can run sophisticated math directly on Hadoop clusters without having to move data around.


Cloudera announced a partnership with H2O aimed at delivering predictive analytics at in-memory speeds – even combining batch, interactive, and streaming jobs in the same application. 

Through this collaboration, companies that use Cloudera’s Enterprise editions can deploy H2O’s open source in-memory machine learning and analytics software on existing Hadoop clusters without the need for data transfers, officials said.

As a result, customers will be able to leverage big data to discover insights up to 100 times faster for a wide range of apps, such as customer behavior analysis, pricing optimization and more.

“H2O on Cloudera’s enterprise data hub brings the predictive power of big data to data scientists using familiar interfaces of R, Scala, JavaScript, Java and Python,” H2O co-founder and CEO SriSatish Ambati said in a statement. For his part, Cloudera vice president Tim Stevens added, “The joint solution expands the use cases for our customers by enabling customers to run predictive models across massive datasets at in-memory speeds.”


Infochimps, which became a unit of CSC in 2013, revealed its open-source-based Big Data PaaS (platform-as-a-service). The Infochimps Big Data Platform-as-a-Service offering comprises three “analytic environments as-a-service”: (1) real-time data processing and streaming (streaming-as-a-service), (2) ad hoc queries for actionable analytics (NoSQL-as-a-service), and (3) batch analytics with Hadoop-as-a-service.

The integrated offering combines web-scale technologies with a cloud-delivery model. The Big Data PaaS package also looks to provide a “proven methodology for navigating enterprises through the big data lifecycle,” according to Jim Kaskade, head of CSC’s open big data solutions.

Key adopter benefits of Infochimps/CSC Big Data PaaS include:

  1. Ability to identify where enterprises are in their adoption lifecycles
  2. Guides to how enterprises can operationalize their existing resources to drive new data-driven insights
  3. Ability to leverage a fully-integrated design pattern from Hadoop to Storm, as well as integrate with existing infrastructure
  4. Quick PoCs and delivery through application templates, accelerators and analytics building blocks


MapR Technologies showed two key technologies to make it easier and faster for enterprises to adopt Hadoop.

First, MapR previewed its latest MapR distribution, which now includes Hadoop 2.2 with YARN’s resource management and scheduling capabilities within a MapR cluster that also offers reliability and real-time features, according to MapR’s chief marketing officer Jack Norris.

MapR’s Hadoop distribution extends YARN by adding a full, open standard NFS interface in addition to HDFS. This enables non-MapReduce applications to optimally take advantage of a cluster's storage, according to MapR technical documents.

“Our implementation of YARN means organizations will be able to develop and deploy a much broader set of Hadoop applications, and importantly, deliver big data projects with much higher value to the business,” Norris said.

Notably, MapR’s latest update allows Hadoop applications to share a cluster’s compute resources, increasing cluster efficiency and resource utilization. Combining YARN with MapR’s read-write POSIX data platform enables YARN-based apps to run more efficiently on a Hadoop cluster, as well as share valuable compute resources. Users can also read, write and update data in the distributed file system and database tables.

MapR can run Hadoop MapReduce 1.x and YARN schedulers on the same nodes in the cluster simultaneously. MapR also can run third-party services that are not YARN-compatible on the same cluster. The MapR Distribution including Apache Hadoop YARN will be available in March.

MapR also unveiled a free ‘virtual sandbox’ to fill the skills gap for big data projects. MapR Sandbox for Hadoop offers training modules; technology to let devs build applications in a simple, lightweight virtual machine that runs on a standard laptop or desktop; a rapid testing suite; and the ability for devs to drag-and-drop files from any source into the MapR Sandbox.


Pentaho Corp., a long-time provider of open source BI and analytics solutions, demonstrated native integration of Pentaho Data Integration (PDI) with Storm and YARN, which aims to simplify the task of supplying real-time analytics and big data insights – especially for large-scale applications.  

The combination of Pentaho Data Integration, Storm and YARN will allow devs to immediately leverage real-time processing, without the delay of batch processing or the overhead of designing additional transformations, according to Pentaho Founder Richard Daley.


SiSense demonstrated its latest mobile-optimized analytics solution. Without scripting or coding, SiSense 5’s new mobile UX and drill downs enable mashups of multiple data sources in a proprietary centralized database, according to SiSense CTO Eldad Farkash.

SiSense 5’s backend can handle thousands of users, queries and joins for multiple data sources. Thanks to its “In-chip” technology, SiSense 5 can deliver 100x more data, 10x faster than many in-memory solutions, he added. In-chip leverages a CPU's onboard cache and special instruction sets built into the CPU.


Zettaset announced it received a U.S. patent for technology that automates failover of Hadoop clusters. The patent for “split brain resistant fail-over in high availability Hadoop clusters” covers technology that ensures high availability for all Hadoop services, according to Zettaset president and CEO Jim Vogt.

“Our mission [is to] help enterprise customers safely and confidently move from pilot to production,” Vogt said in a statement. The newly-patented technology is a core component of Zettaset’s Orchestrator suite for Hadoop management and security.

Beyond making Hadoop more enterprise-grade, the technology works with both Hadoop and NoSQL databases, and supports Hive metastore, NameNode, JobTracker and Oozie. Further, it also supports key security features, including encryption key management, role-based access control, and Kerberos.
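For readers unfamiliar with the term in the patent title, “split brain” describes a network partition in which two nodes each believe they should be the active one. The sketch below illustrates one common, generic safeguard: a majority-quorum check. It is not Zettaset's patented method, and the names `Node`, `has_quorum` and `may_become_active` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A cluster member and the peers it can currently reach (hypothetical model)."""
    name: str
    reachable_peers: set = field(default_factory=set)


def has_quorum(node: Node, cluster_size: int) -> bool:
    """Return True only if the node sees a strict majority of the cluster.

    The node counts itself plus every peer it can reach. Requiring a strict
    majority means two disjoint partitions can never both pass this check.
    """
    votes = 1 + len(node.reachable_peers)
    return votes > cluster_size // 2


def may_become_active(node: Node, cluster_size: int) -> bool:
    """A standby may promote itself only when it holds quorum (assumed policy)."""
    return has_quorum(node, cluster_size)


if __name__ == "__main__":
    # Three-node cluster split into {a, b} and {c}.
    a = Node("a", {"b"})
    c = Node("c", set())
    print(may_become_active(a, 3))  # majority side may take over
    print(may_become_active(c, 3))  # isolated node must stand down
```

With an even split, such as two nodes on each side of a four-node cluster, neither side holds a strict majority, so neither promotes itself. That trade of availability for safety is the usual design choice in quorum-based schemes.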


Zoomdata demonstrated its latest big data release. Zoomdata 1.2.1 provides pre-built connectors to ease access to popular data sources including Cloudera (Search and Impala), Facebook Presto, Amazon (Redshift, RDS and Kinesis), Apache Solr, Elasticsearch and MongoDB.

Zoomdata also expanded its visualization studio to include zoomable maps, dual axis and multi-bar graphs and “finger painting” visualization. Zoomdata 1.2.1 also supports the latest web-widgets, MS-SharePoint Web Part, and Hadoop security (via Kerberos).

“With our latest release, businesses now have the power to access and analyze billions of records, including real-time streaming data, and perform functions such as multi-value analysis, all from their web browser or iPad and in a matter of seconds,” said Justin Langseth, CEO of Zoomdata, in a statement.

The company also said its Zoomdata Server was awarded a patent for the process it uses to generate fleets of micro queries that drive visualizations.
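The article does not describe how Zoomdata's micro queries work, so the following is only a rough sketch of the general idea as it is commonly understood: splitting one large query into many small ones and merging partial results so a chart can refine as answers arrive. The functions `split_range`, `micro_query` and `progressive_counts`, and the in-memory `records` list standing in for a data source, are all assumptions for illustration and not Zoomdata's API.

```python
from collections import Counter
from typing import Iterator, List, Tuple


def split_range(start: int, end: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open [lo, hi) windows that together cover [start, end)."""
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield lo, hi
        lo = hi


def micro_query(records: List[Tuple[int, str]], lo: int, hi: int) -> Counter:
    """Count records per category within one small time window.

    In a real system this would be a small query sent to the data source;
    here it scans an in-memory list of (timestamp, category) pairs.
    """
    counts = Counter()
    for ts, category in records:
        if lo <= ts < hi:
            counts[category] += 1
    return counts


def progressive_counts(records, start, end, step):
    """Run the micro queries in order, yielding a running total after each."""
    total = Counter()
    for lo, hi in split_range(start, end, step):
        total.update(micro_query(records, lo, hi))
        yield dict(total)  # snapshot a visualization could redraw from


if __name__ == "__main__":
    records = [(0, "a"), (1, "b"), (5, "a"), (9, "a")]
    for snapshot in progressive_counts(records, 0, 10, 5):
        print(snapshot)
```

Each yielded snapshot is a complete partial answer, so a front end could draw an approximate chart immediately and sharpen it as later windows complete.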