Integration Is the Next Step to ROI on Big Data Analytics

Organizations looking for more value from their big data investments are finding that integration developers, rather than new big data specialists, may hold the key. Progress’ Sumit Sarkar explains why integration skills can quickly extend big data ROI to more business apps and users.


Sumit Sarkar, Chief Data Evangelist, Progress Software




It’s no surprise that organizations have made significant investments in big data analytics to date and are now looking to democratize their big data insights more broadly by bringing them to business applications. What may be surprising, however, are the growing signals that the next leap forward for big data ROI may lie in the hands of integration developers rather than big data specialists.


According to a recent Gartner survey, big data investment may have peaked in 2016. The survey found tepid year-over-year growth of just three percent over 2015. Even more sobering, the share of respondents planning to invest in big data within the next two years fell over the past year, from 31 to 25 percent. The reason: Gartner found that the focus of investment has shifted from big data itself to the specific business problems big data can solve.


Enter the ‘integration developer,’ who is at the heart of projects to boost big data ROI by operationalizing data and democratizing big data insights across a wide variety of business applications.


Challenges to Developers
These modern ‘integration developers’ are inheriting big data architectures that are often distributed across hybrid cloud and on-premises platforms. Their mission will be to help companies capitalize on big data insights more fully (and more broadly) and to streamline operations.


Integration will prove crucial to that mission because hybrid cloud environments have given rise to two primary technical challenges:

  1. They can limit access for developers integrating between the cloud and the ground (on-premises).
  2. They can also impede ‘right-time’ access from backend services to big data platforms for cloud application development.


Hybrid Big Data Access
To address the first challenge, vendors have introduced new hybrid data pipelines and security gateways. Depending on a project’s regulatory climate, developers in hybrid environments may have to handle a number of authentication methods, such as Basic Auth, OAuth, SAML and OpenID in the cloud, versus Kerberos, LDAP and Active Directory (AD) integration on the ground.
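To make the contrast concrete, the Python sketch below shows how two of the cloud-side methods differ in what actually travels with each HTTP request: Basic Auth sends an encoded username and password, while OAuth 2.0 sends an access token as a bearer token. The function names are hypothetical, and the sketch deliberately leaves out SAML, OpenID and the on-premises Kerberos, LDAP and AD flows, which need their own libraries or identity infrastructure.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """HTTP Basic Auth header (RFC 7617): base64 of 'username:password'."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def bearer_auth_header(access_token: str) -> dict:
    """OAuth 2.0 bearer-token header (RFC 6750).

    Obtaining the token from the identity provider is a separate step
    and is not shown here.
    """
    return {"Authorization": "Bearer " + access_token}


def auth_header(method: str, **creds: str) -> dict:
    """Hypothetical dispatcher a gateway client might use per data source."""
    if method == "basic":
        return basic_auth_header(creds["username"], creds["password"])
    if method == "oauth2":
        return bearer_auth_header(creds["access_token"])
    # Kerberos (SPNEGO), LDAP and AD binds need GSSAPI or directory libraries.
    raise NotImplementedError(f"auth method not sketched here: {method}")


if __name__ == "__main__":
    # Uses the sample credentials from RFC 7617.
    print(auth_header("basic", username="Aladdin", password="open sesame"))
```

Basic Auth credentials are only encoded, not encrypted, which is one reason gateways pair them with TLS or prefer token-based methods.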


Cloud platforms lack a common approach to accessing on-premises data behind the firewall, and many rely on VPNs, SSH tunnels or reverse proxies. This is where third-party integration vendors are delivering secure solutions that are independent of the cloud provider, whether it’s AWS, DigitalOcean or any of the others listed in Figure 1 below. For companies that need a security gateway to build a best-of-breed hybrid environment, a vendor-agnostic data gateway is the best way to enable data integration (and ingestion) for big data platforms.

Figure 1 – Cloud Adoption: Responses on Cloud Infrastructure Adoption from the Progress 2017 Data Connectivity Outlook Survey


Hybrid Meets Big Data Integration
For the second challenge, we’re seeing a new industry standard emerge: OData, a REST API protocol expressly designed to make right-time access to data insights simpler to achieve. Right-time access is coming into play more and more often, as companies want to let users query high-latency interfaces (such as Apache Hive) that were designed for batch operations.


For their part, big data engineers are well versed in today’s big data technology stack, which can include SQL and REST interfaces and varies by distribution (Apache, Cloudera, Hortonworks, MapR, Oracle, IBM, etc.). These big data professionals can also meet real-time needs by using HBase or Spark to serve data to real-time applications. Further, they can support analytics and data management tasks by tuning Hive with an upgraded execution engine, different table storage formats and so forth.


But integration developers may not be familiar with the intricate details of today’s ever-changing big data landscape. So how can they stay focused on integration?

Part of the answer lies in OData, an open protocol for creating and consuming queryable, interoperable RESTful APIs that are well suited to hybrid environments. As an OASIS Standard REST API, OData also provides a uniform way to surface metadata and to query data using a declarative query language that most REST APIs lack. The OData REST interface will augment existing big data connectivity through standard SQL interfaces such as Apache Hive, IBM Big SQL, Apache Phoenix or HAWQ.
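OData’s system query options (`$select`, `$filter`, `$orderby`, `$top` and others) are what give it the declarative query language most REST APIs lack. As a rough sketch, assuming an OData v4 service exposed through a gateway at the hypothetical root `https://gateway.example.com/odata` with a hypothetical `WebSessions` entity set backed by a Hive table, a client might compose a filtered, sorted, limited query like this:

```python
from urllib.parse import quote, urlencode


def build_odata_query(service_root, entity_set, select=None, filter_expr=None,
                      orderby=None, top=None):
    """Compose an OData v4 query URL from a few system query options."""
    params = {}
    if select:
        params["$select"] = ",".join(select)   # columns to return
    if filter_expr:
        params["$filter"] = filter_expr        # row predicate, e.g. "x eq 'y'"
    if orderby:
        params["$orderby"] = orderby           # sort order
    if top is not None:
        params["$top"] = str(top)              # maximum number of rows
    url = f"{service_root.rstrip('/')}/{entity_set}"
    if params:
        # Encode spaces as %20 but keep '$', ',' and single quotes readable.
        url += "?" + urlencode(params, quote_via=quote, safe="$,'")
    return url


if __name__ == "__main__":
    print(build_odata_query(
        "https://gateway.example.com/odata", "WebSessions",
        select=["region", "sessions"],
        filter_expr="region eq 'EMEA'",
        orderby="sessions desc",
        top=10,
    ))
```

The client only states what it wants; translating `$filter` and `$orderby` into the backend’s own query (for a Hive-backed entity set, roughly a `WHERE` and `ORDER BY`) is the service’s job, and how a given gateway does that is product-specific.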


OData is also emerging as the standard for external data strategies in SaaS apps (e.g., Salesforce, Oracle Service Cloud) that need on-demand access to big data. Salesforce expects external data sources to expose OData, and this is where open source libraries or commercial options can connect the otherwise disconnected stacks of cloud applications and big data platforms.
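On the consuming side, an OData v4 JSON response carries its rows in a `value` array and, when the server pages a large result, an `@odata.nextLink` URL pointing to the next page. The sketch below follows those links until they run out. `fetch_json` is a hypothetical stand-in for whatever HTTP client and authentication the application uses, the gateway URL is hypothetical, and the code assumes OData v4 JSON (OData 2.0 services use a different payload shape).

```python
from urllib.parse import urljoin


def iter_odata_rows(first_url, fetch_json):
    """Yield every row of an OData v4 collection, following server-driven paging.

    fetch_json: callable taking a URL and returning the decoded JSON page.
    """
    url = first_url
    while url:
        page = fetch_json(url)
        # Rows for this page live in the "value" array.
        yield from page.get("value", [])
        next_link = page.get("@odata.nextLink")
        # nextLink may be relative; resolve it against the current page URL.
        url = urljoin(url, next_link) if next_link else None


if __name__ == "__main__":
    base = "https://gateway.example.com/odata/WebSessions"
    # Two canned pages standing in for real HTTP responses.
    pages = {
        base: {"value": [{"id": 1}, {"id": 2}],
               "@odata.nextLink": "WebSessions?$skiptoken=2"},
        base + "?$skiptoken=2": {"value": [{"id": 3}]},
    }
    print([row["id"] for row in iter_odata_rows(base, pages.__getitem__)])
```

Because rows are yielded page by page, a caller can start working with the first page while later pages are still being produced by a slower backend.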


Hybrid Meets Big Data Movement
Implicit in the power of the ‘integration developer’ to expand ROI for big data is the transition from application integration to data integration. Consider an enterprise where most core analytics and reporting platforms continue to run on-premises, on private cloud or grid infrastructure. At the same time, its big data volumes will continue to grow in the cloud, because big data sets are not cost-effective to scale in on-premises data centers. This is where cloud big data platforms come into play.


Platforms such as SAP Altiscale, Microsoft Azure HDInsight and Amazon EMR use the cloud to turn big data sets into business insights that companies can then apply to their operations.


This approach took off only recently, in 2016, and throughout 2017 the insights it uncovers will be integrated into on-premises analytics platforms, such as Microsoft BI or Oracle analytics, so that companies can build them into core business operations. In this integration pattern, developers will be tasked with moving only the aggregated data insights (rather than all the details) from cloud big data platforms to the ground.
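The “aggregates, not details” idea can be illustrated with a small in-memory sketch. On a real cloud cluster this step would normally be a `GROUP BY` in Hive or Spark rather than Python; the helper name, field names (`region`, `day`, `revenue`) and sample rows are hypothetical.

```python
from collections import defaultdict


def summarize(rows, key_fields, metric):
    """Collapse detail rows into one summary row per key (sum and count).

    Only these summary rows would be shipped from the cloud cluster to the
    on-premises analytics platform, not the detail rows themselves.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        totals[key] += row[metric]
        counts[key] += 1
    return [
        {**dict(zip(key_fields, key)),
         f"total_{metric}": totals[key],
         "row_count": counts[key]}
        for key in sorted(totals)
    ]


if __name__ == "__main__":
    detail = [
        {"region": "EMEA", "day": "2017-03-01", "revenue": 100},
        {"region": "EMEA", "day": "2017-03-01", "revenue": 50},
        {"region": "APAC", "day": "2017-03-01", "revenue": 70},
    ]
    for summary_row in summarize(detail, ["region", "day"], "revenue"):
        print(summary_row)
```

The size of what crosses from cloud to ground then scales with the number of keys rather than the number of detail records.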


This data movement pattern requires fast exports from cloud to ground to support intraday analytics. Developers are moving towards open source solutions that are part of the big data platform, such as Apache Sqoop, to move data between Hadoop and traditional databases. There are even commercial hybrid data pipelines engineered specifically to work with open source tools in this scenario.
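For the Sqoop route, the export itself is a single command run on the Hadoop cluster. The sketch below only assembles the argument list for `sqoop export` using standard Sqoop 1 options (`--connect`, `--username`, `--password-file`, `--table`, `--export-dir`, `--input-fields-terminated-by`, `--num-mappers`). The JDBC URL, user, table and HDFS paths are hypothetical, and it assumes the cluster can reach the on-premises database over JDBC (for example through a gateway or tunnel) and that the target table already exists.

```python
import shlex


def sqoop_export_command(jdbc_url, username, password_file, table, export_dir,
                         field_delimiter=",", num_mappers=4):
    """Assemble a `sqoop export` invocation that pushes an HDFS directory of
    delimited files into an existing relational table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        # Read the password from a protected file instead of the command line.
        "--password-file", password_file,
        "--table", table,
        "--export-dir", export_dir,
        "--input-fields-terminated-by", field_delimiter,
        "--num-mappers", str(num_mappers),
    ]


if __name__ == "__main__":
    cmd = sqoop_export_command(
        jdbc_url="jdbc:postgresql://onprem-db.example.com:5432/analytics",
        username="etl_user",
        password_file="/user/etl/.db_password",
        table="daily_region_summary",
        export_dir="/user/etl/daily_region_summary",
    )
    # Printed for review; on a cluster node, subprocess.run(cmd, check=True)
    # would execute it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems with delimiters and paths.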


In the age of big data analytics, it’s not enough just to survive; companies need to compete with, and differentiate themselves from, the rest. Those that can harness their data will be the most successful and will improve their ROI from big data analytics. It’s crucial to look beyond the platforms and focus instead on right-time access to data. When that access is in place, teams from lines of business to the C-suite can thrive by using the resulting insights to improve their core operations.

Down the line, big data insights derived from machine learning may require new data integration patterns. And as organizations continue to define the scope of data that can reasonably be consumed for operations, a hybrid cloud approach will remain critical for businesses that want to strengthen their infrastructure and operationalize their data resources for the future.


As Chief Data Evangelist at Progress, Sumit Sarkar works with leading analytics and data management vendors on data connectivity and ingestion for analytics across a wide variety of data, with a focus on standards such as ODBC, JDBC, ADO.NET, GraphQL and OData. Sumit is a frequent speaker at industry events, including Strata + Hadoop World, MongoDB World, Oracle OpenWorld and Dreamforce.