MapR 5.0 Brings Real-Time to Hadoop Big Data Apps, Searches and More

MapR Technologies’ latest update to its Hadoop distro delivers a new level of real-time capabilities to big data apps. Notably, MapR 5.0 supports access to transactional data by auto-synchronizing storage, database and search indices. IDN speaks with MapR’s Jack Norris.

Tags: analytics, Apache, big data, Drill, Hadoop, JSON, IoT, MapR, real-time, YARN


MapR Technologies’ latest update to its Hadoop distro delivers a new level of real-time capabilities to big data apps.


Among the enhancements, the MapR 5.0 distribution adds support for real-time auto-synchronizing storage, database and search indices. These and other features improve big data’s value for business operations, MapR’s chief marketing officer Jack Norris told IDN.


“We’re seeing our customers are looking for big data to do more than simple query or reporting on data. They want to do something with their ‘as-it-happens’ data,” Norris said. “So, we’re finding that customers want features to help them use big data in real-time. Those become important to the question they’re asking about ‘How can I operationalize my data and insights?’” 


Norris said MapR heard from more and more users who were asking for ways to soup up their Hadoop platform to gain insights on real-time business – or, as Norris called it, ‘business-as-it-happens.’ MapR 5.0 is architected for processing big and fast data on a single data platform that enables a new class of real-time applications.


He shared an example. “Suppose a company pushes a mobile offer to a customer, and that customer did a purchase or some other web activity. They wanted that activity reflected in their call center right away with the goal of improving that customer experience, and when necessary, make a call resolution.”


In a recent blog post, MapR’s Jim Scott, director of enterprise strategy and architecture, points to another customer example – and how that customer is making the pivot from using big data for historical insights to real-time operations.

Quantium, an Australian data analytics company, uses Spark and Hadoop to offer fast analytics to companies such as Woolworths, National Australia Bank, and Foxtel. Quantium is using Spark to generate insights in near real time. Its database queries have a latency of under 50 milliseconds, which supports live interactive use such as in call centers.


The company’s whole approach is to favor interactive use rather than batch processing. The ability to move fast allows businesses to create new products and respond to market changes that much faster.

For this capability, MapR 5.0 has extended the company’s real-time, reliable data transport framework (used in the MapR-DB Table Replication) to deliver and synchronize data in real time out to external compute engines.

This MapR 5.0 feature attracted attention even before the commercial release from a noteworthy and cutting-edge partner. Elasticsearch adopted MapR 5.0 to obtain real-time input to its compute engine. The combined MapR/Elasticsearch architecture is powering synchronized full-text search indexes automatically – all without writing custom code, noted Jobi George, global partner director of Elasticsearch’s parent company, Elastic. (Elastic is also the driver of the Apache Lucene open source project.)


George also described how MapR 5.0 is helping provide real-time insights on massive amounts of structured and unstructured data. “Customers want search indexes automatically synchronized with the latest data updates.  The MapR architecture makes this easier for application developers who need to let their end users search for data almost immediately after it is updated,” he said in a statement.
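
The idea of a search index that follows table updates automatically can be illustrated with a toy sketch. This is not MapR or Elasticsearch code: the `Table` and `SearchIndex` classes, their methods and the sample rows are all hypothetical, and the in-memory callback only stands in for a replication stream.

```python
from collections import defaultdict


class Table:
    """Toy key-value table that publishes every change to subscribers (hypothetical)."""

    def __init__(self):
        self.rows = {}
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def put(self, key, text):
        old = self.rows.get(key)
        self.rows[key] = text
        # Push the change out as soon as it is written.
        for notify in self.subscribers:
            notify(key, old, text)


class SearchIndex:
    """Toy inverted index mapping each word to the set of row keys containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def apply_change(self, key, old_text, new_text):
        # Remove postings for the previous version of the row, then add the new ones.
        if old_text is not None:
            for word in old_text.lower().split():
                self.postings[word].discard(key)
        for word in new_text.lower().split():
            self.postings[word].add(key)

    def search(self, word):
        return sorted(self.postings.get(word.lower(), set()))


table = Table()
index = SearchIndex()
table.subscribe(index.apply_change)

table.put("cust-1", "redeemed mobile offer")
table.put("cust-2", "opened mobile app")
table.put("cust-1", "called support")  # update replaces the earlier text

print(index.search("mobile"))   # ['cust-2']
print(index.search("support"))  # ['cust-1']
```

The application only writes to the table; the index is updated as a side effect of each write, which is the property George describes.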


Overcoming Complexities of Real-Time Big Data; Other MapR 5.0 Features

While bringing real-time features to big data may not seem a unique idea, delivering on these capabilities is far from a no-brainer, according to one top Hadoop analyst.


“Designed as a large-scale batch data analysis system, Hadoop is not often associated with operational analytics or transaction processing,” according to Carl W. Olofson, research vice president for data management software research at IDC. Norris put the challenge this way: “While tremendous capacity for batch is great for insights, it’s not the way to provide that type of real-time visibility.”


Thanks to real-time innovations such as those in MapR 5.0, “Hadoop is emerging . . . as a single platform for handling both live operational data and real-time analytics,” IDC’s Olofson added.


Beyond MapR 5.0’s expanded support for real-time big data, it also adds:

  • Apache Hadoop 2.7, including YARN 2.7 support. This enables YARN application ‘rolling upgrades’ to complement the platform-level ‘rolling upgrades’ MapR previously supported. It also offers integrated Docker container support.
  • Schema-free SQL engine for big data exploration on IoT and beyond.
  • Enhanced data governance and security. This includes integration with LDAP, Active Directory and other third party directory services, along with Kerberos or username/password authentication.
  • Comprehensive auditing for all data accesses (via log files in JSON format). Under the covers, all events, including data access and administrative actions, are recorded immediately in JSON log files. For reporting and validation, users can run ad-hoc queries and build custom reports on the audit logs via SQL and standard BI tools.
  • Wire-level authentication for all services in the cluster, using NSA-level cryptographic algorithms.
  • Apache Drill 1.0 (including Drill Views) support. This is especially valuable for securing data access. Drill 1.0 can ensure secure and governable access by authorized users to field-level data, at a fine-grained row and column level, without requiring any centralized security repository. Further, analysts can be given data governance privileges that let them share their data sets with other analysts.
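
To make the audit-log item above concrete, here is a minimal sketch of running an ad-hoc SQL report over JSON audit records. It is an illustration only: the field names (`ts`, `actor`, `op`, `path`) and sample records are invented, not MapR's actual audit schema, and SQLite stands in for whatever SQL engine or BI tool would be used in practice.

```python
import json
import sqlite3

# Hypothetical audit records, one JSON object per line (field names are invented).
AUDIT_LOG = """\
{"ts": "2015-07-01T10:00:00Z", "actor": "alice", "op": "read", "path": "/sales/q2.csv"}
{"ts": "2015-07-01T10:00:05Z", "actor": "bob", "op": "write", "path": "/sales/q2.csv"}
{"ts": "2015-07-01T10:01:00Z", "actor": "alice", "op": "read", "path": "/hr/payroll.csv"}
"""


def load_audit(conn, text):
    """Parse JSON lines and load them into an in-memory SQL table."""
    conn.execute("CREATE TABLE audit (ts TEXT, actor TEXT, op TEXT, path TEXT)")
    rows = [json.loads(line) for line in text.splitlines() if line.strip()]
    conn.executemany("INSERT INTO audit VALUES (:ts, :actor, :op, :path)", rows)


conn = sqlite3.connect(":memory:")
load_audit(conn, AUDIT_LOG)

# Ad-hoc report: how many read accesses did each user perform?
report = conn.execute(
    "SELECT actor, COUNT(*) FROM audit WHERE op = 'read' GROUP BY actor ORDER BY actor"
).fetchall()
print(report)  # [('alice', 2)]
```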

For self-service, Drill 1.0 also sports new technologies that will empower non-technical users to explore data at massive scale. Drill 1.0 is designed to scale to 10,000 servers (or more) and to process petabytes of data (translation: trillions of records) in seconds. “With Drill, you don’t have to define the data in a certain way or in a certain format. You don’t even know the queries you’re going to ask. You can just go right to your data directly – it’s a huge change,” Norris told IDN.
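
What querying data without defining it up front means in practice can be sketched in a few lines of plain Python. This is not Drill and makes no claim about how Drill works internally; the device records and the `get_path` helper are hypothetical, and the point is only that fields are discovered when the data is read rather than declared beforehand.

```python
import json

# Hypothetical IoT records with differing shapes; no schema is declared anywhere.
RAW = """\
{"device": "thermostat-1", "temp_c": 21.5}
{"device": "door-7", "open": true, "battery": {"pct": 80}}
{"device": "thermostat-2", "temp_c": 19.0, "battery": {"pct": 55}}
"""


def get_path(record, dotted):
    """Follow a dotted path such as 'battery.pct'; return None if any part is missing."""
    current = record
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


records = [json.loads(line) for line in RAW.splitlines() if line.strip()]

# Ask a question about a nested field that only some records have.
low_battery = []
for record in records:
    pct = get_path(record, "battery.pct")
    if pct is not None and pct < 60:
        low_battery.append(record["device"])

print(low_battery)  # ['thermostat-2']
```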


MapR Brings Real Time Big Data to the Cloud via Amazon Web Services

MapR is also making real-time big data solutions available from the cloud via the Amazon Web Services marketplace. MapR uses AWS CloudFormation templates to create reliable Hadoop clusters on AWS that also support long-lived workloads.  


The MapR solution in AWS Marketplace provides a seamless path to a cloud platform for continuous, real-time operational Hadoop with fast launch and integrated billing from AWS. This approach also enables customers to pay for only what they use, according to Steve Wooledge, MapR’s vice president, product marketing.


MapR’s support for mirroring and its MapR-DB table replication allow companies to build cloud and on-premises hybrid deployment models which support “burst” analytic or operational applications.


“We’ve had an established relationship with AWS through the availability of MapR on EMR [Elastic MapReduce] and now the addition of our products for purchase on AWS Marketplace expands options for customers who want to run real-time Hadoop applications in the AWS Cloud and impact their business as it happens,” Wooledge said in a statement.


All three MapR Distribution editions (Community, Enterprise and Enterprise Database) are available immediately.