Paxata, Slalom Team Up To Enable Self-Service Data Lake in the Cloud Options for Amazon

To help IT and lines of business keep pace with the surge in new data, Paxata is partnering with a services firm with deep Amazon expertise to make “data lakes in the cloud” easier to deliver.

Tags: Amazon, analytics, AWS, BI, cloud, data lake, data prep, EC2, Paxata, Redshift, S3, self-service

One of the biggest challenges in any analytical exercise is simply getting data ready for BI or analytics tools. Traditional methods of data prep are often under stress from the variety, volume and velocity of new data coming in for analysis.


In the era of cloud, the efforts to keep up are even more strained. Enter the focus of the latest partnership from Paxata, an enterprise-grade self-service data preparation platform.  


To help IT and lines of business keep pace with data’s new realities and opportunities, Paxata is partnering with Slalom Consulting to make it easier for companies to build “data lakes in the cloud” using new connectors that work with Amazon Web Services, according to company execs. Slalom is a Seattle-based consulting firm with deep expertise in working with Amazon services.



The capabilities arise from Paxata’s work to extend its self-service data preparation platform with powerful, high-performance native push/pull connectivity options to and from AWS offerings, including the Amazon Redshift data warehouse solution and Amazon S3 (Simple Storage Service).

Making it faster and easier for customers to build a “data lake in the cloud” will let them obtain insights more rapidly and cost-effectively, according to Michele Goetz, Paxata’s chief marketing officer. 


In a recent Paxata blog post, Goetz described the importance of bringing the power of the cloud to data preparation.

[Paxata sees] data preparation as a bigger opportunity than a desktop tool. [Our] data preparation application was the stepping stone to changing the way organizations manage their data. If the business defines information, then why would you allow the business to abdicate context and subject matter expertise to technologists that decompose information into a language that works better for a machine than business decision makers? That is what we have done since the invention of the computer.

Markus Sprenger, Practice Area Lead from Slalom Consulting, noted that customers are already clamouring for such a cloud-based data lake option, and that the partnership aims to make it easier to achieve. “This ‘data lake in the cloud’ solution has gained major traction with our clients, and we’ve partnered with Paxata to bring their self-service data preparation capabilities to the solution stack for the Amazon customer base,” Sprenger said in a statement.

Putting the ‘data lake in the cloud’ value into the big picture is Paxata’s Rik Tamm-Daniels, Vice President of Technology and Partnerships. He noted there are three key capabilities that will be at the center of next-gen information management strategies and architectures -- speed, agility, and scale.


By optimizing its Paxata self-service data platform for AWS offerings, including Redshift and S3, the company can prompt traditional data architects to rethink how data management and data governance should be done to support real-time BI and analytics, Tamm-Daniels added.


For added value and speed, Paxata’s data preparation platform provides capabilities for bringing together structured and unstructured data sets from internal and external sources. It also performs many key tasks that help ensure data integrity: looking for duplicate data or blank fields, finding and fixing misspellings, splitting or reshaping columns, and even adding data in order to provide more context.
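To make those integrity tasks concrete, here is a minimal sketch in plain Python of what finding blanks, removing duplicates and splitting a column mean in practice. It is illustrative only and is not Paxata's implementation (the platform performs these steps without coding); the sample records and the function names `find_blanks`, `deduplicate` and `split_column` are hypothetical.

```python
# Hypothetical sample records; none of this data comes from Paxata.
rows = [
    {"name": "Ada Lovelace", "city": "London "},
    {"name": "ada lovelace", "city": "london"},
    {"name": "Alan Turing", "city": ""},
]

def normalize(value):
    """Trim surrounding whitespace and lowercase a value for comparison."""
    return value.strip().lower()

def find_blanks(rows):
    """Return (row index, column) pairs whose value is empty after trimming."""
    return [(i, col) for i, row in enumerate(rows)
            for col, val in row.items() if not val.strip()]

def deduplicate(rows):
    """Keep the first row seen for each normalized combination of values."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(normalize(row[col]) for col in sorted(row))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def split_column(rows, column, new_columns, sep=" "):
    """Split one column into several, e.g. a full name into first and last."""
    result = []
    for row in rows:
        parts = row[column].split(sep, len(new_columns) - 1)
        parts += [""] * (len(new_columns) - len(parts))  # pad short values
        new_row = {k: v for k, v in row.items() if k != column}
        new_row.update(zip(new_columns, parts))
        result.append(new_row)
    return result

blanks = find_blanks(rows)
unique_rows = deduplicate(rows)
split_rows = split_column(unique_rows, "name", ["first_name", "last_name"])
```

In this sketch the second record is treated as a duplicate of the first because the two differ only in letter case and trailing whitespace, and the empty `city` value in the third record is reported as a blank.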


To accelerate data prep even further, Paxata leverages machine learning, Tamm-Daniels said. This enables analysts to speed up and improve data aggregation and shaping. Paxata’s parallel distributed processing architecture leverages the elasticity and other cloud capabilities of AWS for variable, large-scale data workloads, he added.


The Paxata platform is a virtualized, highly reliable infrastructure running in a multi-tenant cloud service -- built on Amazon EC2 (Elastic Compute Cloud) technology running in AWS SSAE 16 certified data centers. On-premises customers can also deploy Paxata’s Adaptive Data Preparation platform on VMware vCloud environments. End-to-end, the Paxata platform sports capabilities for UI (a visually dynamic multi-user interface), data preparation application web services, parallel in-memory pipelined data prep, and file management and storage.


Paxata’s core data preparation features include:

Add data: Paxata works with a wide range of data sources from flat files (Excel, CSV, JSON, XML, Avro) to semi-structured data in Hadoop data lakes or NoSQL databases and structured data in relational databases or business applications, such as


Explore: Explore data in real-time with visual interactive data preparation to quickly understand data and identify data prep needs.


Clean + Change: Automatically normalize similar values using natural language processing (NLP), split columns, concatenate columns, de-duplicate, and detect and remediate blanks, nulls, and whitespace on the fly, without any coding, SQL or scripting.


Data Shaping: In a single click, data can be pivoted or de-pivoted, columns can be split, and aggregations can be created to quickly make the data sets more suitable for the required analytic exercise.


Share + Govern: Operationalize work with built-in one-click automation. Share, reuse and collaborate across teams with the centralized data library.


Combine: With one click, Paxata assembles multiple data sets into a single AnswerSet, then merges multiple overlapping entity references into de-duplicated trusted entities without any scripting, SQL or complex Excel functionality like VLOOKUPS, pivot tables and macros.


Paxata Compatibility with BI tools: Visualizing Paxata AnswerSets (clean, contextual ready-to-use datasets) in BI tools is easy, either through a direct connection via Hive or Impala or by exporting data in a supported file format.
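For readers unfamiliar with the shaping operations named in the feature list above, the following plain-Python sketch shows what de-pivoting, pivoting and aggregating a data set involve. It is a conceptual illustration under stated assumptions, not Paxata's code or API: the sample sales figures and the function names `depivot`, `pivot` and `aggregate_sum` are hypothetical.

```python
# Hypothetical wide-format data: one column per quarter.
wide = [
    {"region": "East", "Q1": 10, "Q2": 15},
    {"region": "West", "Q1": 7, "Q2": 9},
]

def depivot(rows, id_column, value_columns, var_name, value_name):
    """Wide to long: emit one row per (id, value column) pair."""
    return [
        {id_column: row[id_column], var_name: col, value_name: row[col]}
        for row in rows
        for col in value_columns
    ]

def pivot(rows, id_column, var_name, value_name):
    """Long to wide: gather (id, variable, value) rows into one row per id."""
    by_id = {}
    for row in rows:
        target = by_id.setdefault(row[id_column], {id_column: row[id_column]})
        target[row[var_name]] = row[value_name]
    return list(by_id.values())

def aggregate_sum(rows, group_column, value_column):
    """Sum value_column for each distinct value of group_column."""
    totals = {}
    for row in rows:
        key = row[group_column]
        totals[key] = totals.get(key, 0) + row[value_column]
    return totals

long_rows = depivot(wide, "region", ["Q1", "Q2"], "quarter", "sales")
totals = aggregate_sum(long_rows, "region", "sales")
round_trip = pivot(long_rows, "region", "quarter", "sales")
```

Here `long_rows` holds one record per region and quarter, `totals` sums sales by region, and `round_trip` reproduces the original wide layout, which shows that the pivot and de-pivot steps are inverses of each other for this data.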

A Paxata evaluation with a 60-day Amazon Redshift trial is available.