Actian Looks To Help Firms Break Through Hadoop Constraints; Adds Real-Time, Security, ML Support

Actian is helping firms gain more business value and control over their Hadoop data infrastructures. IDN talks with Actian’s Emma McGrattan about the Vector for Hadoop add-ons for real-time updates, security and more.

Tags: Actian, analytics, EDW, Hadoop, machine learning, real-time, SQL

Emma McGrattan
senior vp - engineering

"Making real-time updates for Hadoop is hard, but more and more needed. Actian has 'cracked the code' to provide customers this support."


Actian's latest release looks to help companies gain more business value and control over their Hadoop data infrastructures.  


Actian Vector for Hadoop 6.0 adds features that let enterprises support a wide range of capabilities, including real-time updates, granular security and performance boosts, Actian's senior vice president for engineering Emma McGrattan told IDN.


One of Vector for Hadoop 6.0's most popular features is support for real-time updates to EDWs (enterprise data warehouses) running with Hadoop, she added. 


"Making real-time updates for Hadoop is hard, but more and more needed. With an EDW workload, companies need to allow for making corrections to the data, so we've 'cracked the code' to provide customers this support," McGrattan said.


In effect, she added, Actian is "souping up" Hadoop for companies. "We're delivering high-performance analytics on top of Hadoop, and also giving SQL access to that. The result is to extend the lifespan of [the] Hadoop cluster and extend its use by non [Hadoop or MapReduce] experts."


While Hadoop was designed for scale, the technology was not optimized for speed or performance, McGrattan noted. "With that in mind, Actian Vector for Hadoop is making it possible for our customers' existing Hadoop data lakes to take on new operational analytics challenges, which traditional Hadoop SQL applications have historically struggled to address."


Further, Actian Vector for Hadoop provides a feature that McGrattan described as "Combined Native Hadoop/Spark Tables with Vector Tables." This gives data users no-hassle, universal access to data for more complete analysis, she noted.


Specifically, customers can register Hadoop data files (such as Parquet, ORC, and CSV files) as tables in Vector for Hadoop. Further, users can join such external tables with native Vector tables. The result: super-fast analytics execution against these data formats, even faster than with their native query engines, she said.
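
To make the idea of joining file-sourced data with native tables concrete, here is a minimal sketch in Python. It is not Actian's API: it uses the standard library's sqlite3 module as a stand-in database, and the table names, column names and CSV contents are all hypothetical. One caveat: sqlite3 copies the CSV rows into a table, whereas the article describes registering the files in place as external tables. Only the single-query join pattern is the same.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for a data file in a data lake.
ORDERS_CSV = """order_id,customer_id,amount
1,10,25.0
2,11,40.5
3,10,12.5
"""


def build_demo_db(csv_text: str) -> sqlite3.Connection:
    """Create an in-memory database with a 'native' table and a file-sourced table."""
    conn = sqlite3.connect(":memory:")

    # 'Native' table, created and populated directly in the database.
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(10, "Acme"), (11, "Globex")],
    )

    # File-sourced table: rows are parsed from the CSV text and copied in.
    conn.execute(
        "CREATE TABLE ext_orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
    )
    rows = [
        (int(r["order_id"]), int(r["customer_id"]), float(r["amount"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO ext_orders VALUES (?, ?, ?)", rows)
    return conn


def totals_by_customer(conn: sqlite3.Connection) -> list:
    """Join the file-sourced table with the native table in one SQL query."""
    query = """
        SELECT c.name, SUM(o.amount)
        FROM customers AS c
        JOIN ext_orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(totals_by_customer(build_demo_db(ORDERS_CSV)))
```

The point of the sketch is that an existing SQL application only sees two tables and an ordinary join. Where the rows originally came from is invisible to the query.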

This provides especially significant benefits to many Tableau users, McGrattan added. 


"If you use Tableau [with] Hadoop, it could be likely that big parts of the dashboard wouldn't work when using Parquet or Impala," she said. "At Actian, we want to say, 'You should be able to use that massive estate of SQL apps you have built and be able to just point those at Vector for Hadoop to run them.'"


In addition, Netezza users, for the most part, probably use DataStage (data loading) and Cognos (reporting). Users can take their 20,000 DataStage jobs and point them at Vector for Hadoop. "We will load that into the data lake and let you run Cognos against that. You don't need to find source code to make any change. It all just works," she said.


Actian Vector for Hadoop also has an enhancement for securely working with data, thanks to deep support for column encryption.


"Typically, Hadoop needs to encrypt an entire file or entire zone of HDFS [Hadoop Distributed File System]," McGrattan said. "Now, we provide dynamic data masking, which lets companies more granularly protect data." She shared an example of a credit card number, where Actian Vector for Hadoop lets the company expose only the last four digits. "This is unique to Actian in Hadoop."


Actian Vector for Hadoop also has several ways to provide "fine-grained access control," which includes ways to control user rights to read, write, update and delete – down to the record level, she added. This comes via dynamic data masking, column-level data at rest encryption, data in motion encryption, discretionary access control, security auditing with SQL addressable audit logs, and security alarms, she said.
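
The credit card example above can be illustrated with a short Python sketch. This is not how Vector for Hadoop implements dynamic data masking, which the article does not detail. The function name, its parameters and the sample number are hypothetical, and the sketch shows only the visible effect: every digit except the last four is hidden.

```python
def mask_card_number(card_number: str, visible: int = 4, mask_char: str = "*") -> str:
    """Return the card number with all but the last `visible` digits masked.

    Non-digit characters such as spaces and dashes are dropped first.
    Hypothetical helper for illustration only.
    """
    digits = "".join(ch for ch in card_number if ch.isdigit())
    if len(digits) <= visible:
        # Too short to expose safely: mask everything.
        return mask_char * len(digits)
    hidden = len(digits) - visible
    return mask_char * hidden + digits[hidden:]


if __name__ == "__main__":
    # A made-up test number, not a real card.
    print(mask_card_number("4111 1111 1111 1234"))
```

In a database with dynamic masking, a transformation of this kind would typically be applied at query time based on the user's privileges, leaving the stored value unchanged.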


Other Vector for Hadoop updates include:  

Faster machine learning (ML) execution capabilities. This lets businesses deploy ML models that run alongside the database leveraging Vector's new UDF capabilities. By deploying ML models alongside the Vector database, data movement is reduced, allowing for faster scoring of data.


User-defined function (UDF) support. This extends the database to perform operations not available through built-in, system-defined functions, giving users the capability to create scalar UDFs that run JavaScript or Python code alongside SQL statements in a single query.


Comprehensive workload management. This enables control of the database access mode, limits on the row count returned by a given query, and the ability to abort queries that reach pre-defined limits.


JavaScript Object Notation (JSON) support. This allows customers to combine NoSQL and relational concepts in the same database.
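
The scalar UDF and ML scoring items above share one idea: custom code runs inside the SQL query, so data does not have to be moved out for scoring. The sketch below illustrates that pattern with Python's built-in sqlite3 module and its `create_function` call. It does not use Vector's UDF syntax, which the article does not show. The `risk_score` function, its made-up weights and the `claims` table are hypothetical stand-ins for a real model and real data.

```python
import sqlite3


def risk_score(amount: float, prior_claims: int) -> float:
    """Toy scoring function standing in for an ML model; the weights are invented."""
    return round(0.01 * amount + 0.5 * prior_claims, 2)


def score_claims() -> list:
    """Register the Python function as a scalar UDF and call it from SQL."""
    conn = sqlite3.connect(":memory:")

    # Expose the Python function to SQL under the name 'risk_score' (2 arguments).
    conn.create_function("risk_score", 2, risk_score)

    conn.execute("CREATE TABLE claims (id INTEGER, amount REAL, prior_claims INTEGER)")
    conn.executemany(
        "INSERT INTO claims VALUES (?, ?, ?)",
        [(1, 100.0, 0), (2, 250.0, 3)],
    )

    # The UDF is evaluated row by row inside the query itself.
    return conn.execute(
        "SELECT id, risk_score(amount, prior_claims) FROM claims ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    for claim_id, score in score_claims():
        print(claim_id, score)
```

Because the scoring happens inside the query, only the scored results are returned to the application.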

To power many of Actian's Vector for Hadoop updates, engineers used a combination of advanced performance capabilities to eliminate bottlenecks commonly encountered by other SQL acceleration products. Technologies here include Actian's patented vector processing and in-CPU cache optimization technology, McGrattan noted. 


Readers can learn more about Actian’s Vector for Hadoop here.