MapR, Fusion-io Boost Big Data Performance 25x To Bring Real-Time Analytics a Big Step Closer

MapR Technologies’ latest partnership is bringing real-time operational Big Data a big step closer. MapR’s M7 Big Data platform for NoSQL and Hadoop, working with Fusion-io’s ioMemory, was benchmarked with 25 times faster performance for read-intensive Apache HBase applications. To discuss how this new supersonic speed could change big data, IDN speaks with MapR’s Jack Norris.

Tags: analytics, big data, Fusion-io, HBase, Hadoop, MapR, NoSQL, expert voice

Jack Norris
vice president, marketing


“This speed is a major proof point on our M7 architecture, and our [Fusion-io] partnership shows the value of optimizing flash memory.”

MapR Technologies’ latest partnership is bringing real-time operational big data a big step closer. MapR’s M7 big data platform, running with Fusion-io’s ioMemory, was benchmarked at 25 times faster performance for read-intensive Apache HBase applications using Hadoop or NoSQL.

Under the partnership, MapR’s M7 platform will run using Fusion-io’s solid-state drives (SSD). The result is breakneck speeds and lower price points, which are supplemented by MapR’s M7 features for data availability, recoverability and consistency. “This speed is a major proof point on our M7 architecture, and our [Fusion-io] partnership really shows the value of optimizing flash memory,” MapR’s vice president of marketing, Jack Norris, told IDN.

The 25-fold performance boost (not 25 percent) also validates what Norris and other big data experts see as a budding opportunity. “The trend is now that data and compute are now coming closer together, and that means more real-time and predictive analysis from your data,” Norris said.
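To make the gap between “25 times” and “25%” concrete, here is a minimal arithmetic sketch. It assumes “25 times faster” means throughput is 25 times the baseline, and the 100-second baseline job is a hypothetical figure chosen for illustration, not a number from the benchmark.

```python
def speedup_to_percent_increase(factor: float) -> float:
    """Convert a speedup factor (e.g. 25x) into a percent increase over baseline."""
    return (factor - 1.0) * 100.0


def percent_increase_to_speedup(percent: float) -> float:
    """Convert a percent increase (e.g. 25%) into a speedup factor."""
    return 1.0 + percent / 100.0


def time_after_speedup(baseline_seconds: float, factor: float) -> float:
    """Elapsed time for the same work once throughput improves by `factor`."""
    return baseline_seconds / factor


# Hypothetical baseline: a read-heavy job that takes 100 seconds.
BASELINE_SECONDS = 100.0

# A 25x speedup is a 2400% increase, and the job drops to 4 seconds.
print(speedup_to_percent_increase(25.0))            # 2400.0
print(time_after_speedup(BASELINE_SECONDS, 25.0))   # 4.0

# A 25% improvement is only a 1.25x speedup; the job still takes 80 seconds.
print(percent_increase_to_speedup(25.0))            # 1.25
print(time_after_speedup(BASELINE_SECONDS, 1.25))   # 80.0
```

Under these assumptions, the same job finishes in 4 seconds with a 25x speedup versus 80 seconds with a 25% improvement.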

Benefits to end users from this convergence of data and compute can be very real. “We think delivering these kinds of speeds at this cost efficiency could really open up a whole new range of possibilities for how businesses think about on-demand, real-time big data,” Norris said.

Instead of keeping data in network storage, where users need to pull it from all over to perform operations, users can now handle large volumes and a wide variety of data quickly on the same cluster, Norris said.

The better price/performance math will open up more big data use cases, he added. These include the ability of big data to directly support mission-critical applications; the ability to combine file-based analytics and NoSQL; integrated search among various data types; and support for streaming and real-time analytics.

Inside MapR, Fusion-io Big Data Platforms:
Diagnosing & Avoiding Performance Hits

“There are a lot of things that can drag down big data performance,” Norris said, including garbage collection, poor cluster or workload design, and inefficient data compaction.

“One solution is architectural, so rather than having large data moving across different silos, users can get better performance by processing data in one place,” he said. MapR’s M7 employs an architecture that aims to make this easy.

Another solution is efficient use of memory resources. Enter Fusion-io’s ioMemory platform. The Fusion-io technology differs from conventional solid-state disks because it is architected to manage flash-like memory, Norris noted. When coupled with MapR’s architecture, the result is ultra-low-latency database performance because MapR’s Hadoop distribution can write directly to disk and eliminate dependencies on Java or the Linux file system, which can slow performance, he added.

Even with the new performance marks, Norris readily admits it’s not only more speed that big data adopters are looking for. Users also need data availability and consistency, he added.

Big data disruptions can come in many flavors, Norris noted. MapR’s M7 also delivers several aspects of recoverability. For instance, when HBase applications go down, it can take up to 30 minutes to recover. And even when the data becomes available again, it can’t be used unless it can be recovered using snapshots that are consistent across the [Hadoop] cluster, he added.

These MapR snapshots for HDFS and HBase tables are compared to the copy table. Further, MapR sports a full Hadoop-compatible file system (compliant with HBase, HDFS and MapReduce) to eliminate weak points.

MapR also provides disaster recovery with remote data mirroring, and offers improved tools for Hadoop cluster management, especially to help IT professionals with little or no Hadoop expertise get their arms around big data issues.

Alongside these improvements, MapR maintains support for standard Linux commands and provides POSIX-compliant NFS access. It’s been a busy couple of months for MapR. Notably, the new speed mark comes right after MapR’s M7 went GA (general availability) in May.
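As an illustration of what POSIX-compliant NFS access can mean in practice, the sketch below uses only ordinary file operations, the kind that would work unchanged against any mounted file system. The mount location is an assumption: a temporary directory stands in for a cluster mount point, and the helper names and the file `events.log` are hypothetical, not part of any MapR API.

```python
import tempfile
from pathlib import Path
from typing import List


def list_files(root: Path) -> List[str]:
    """Return the sorted names of regular files directly under `root`."""
    return sorted(p.name for p in root.iterdir() if p.is_file())


def count_lines(path: Path) -> int:
    """Count lines in a text file using plain, sequential file I/O."""
    with path.open("r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


# Stand-in for an NFS mount point (assumption: a real deployment would
# point this at wherever the cluster is mounted on the client machine).
demo_root = Path(tempfile.mkdtemp())
(demo_root / "events.log").write_text("a\nb\nc\n", encoding="utf-8")

print(list_files(demo_root))                   # ['events.log']
print(count_lines(demo_root / "events.log"))   # 3
```

The point of the sketch is that nothing in it is Hadoop-specific: if the storage is exposed through a standard mount, existing scripts and tools can read it without a special client library.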