GigaSpaces Latest Upgrade Looks To Fill in Missing Gaps for MLOps Success

The updates in GigaSpaces 15.0 aim to make MLOps easier to achieve with features to help companies run their ML models in production. IDN looks at the latest version with GigaSpaces vice president Yoav Einav.

Tags: AI, analytics, cloud, GigaSpaces, intelligent data, Kubernetes, machine learning, MLOps, operations

Yoav Einav
vice president of product
GigaSpaces




GigaSpaces is shipping an update that aims to make MLOps easier to achieve. It looks to let organizations better manage their ML models, as well as add speed, scale and accuracy to their ML operationalizing efforts.

 

GigaSpaces 15.0 aims to help enterprises "run their ML models in production," according to GigaSpaces vice president of product Yoav Einav. 

 

The update focuses on improving the deployment, monitoring, and management of machine learning projects on modern production infrastructures (including Kubernetes and Spark) across on-premises, cloud, hybrid, and multi-cloud environments, Einav noted.

 

GigaSpaces' update marries the capabilities of AI/ML with the world of DevOps practices. As Einav described in a detailed blog post:

By applying DevOps principles for machine learning, organizations can help bring business interest back to the forefront of ML operations. Data scientists can now work through the lens of organizational interest with clear direction and measurable benchmarks.

The update aims to simplify the task of integrating AI workloads with the organization's core infrastructure. This accelerates machine learning deployment and can help enterprises realize the business benefits of machine learning models more readily, he added.

 

The release also provides visibility into the critical components of systems running ML models, he said. These include logs, inputs, outputs, and exceptions, using different performance visualization techniques. 

 

Under the covers, GigaSpaces 15.0 includes its InsightEdge in-memory real-time analytics processing platform and its XAP platform for extreme transactional applications. 

 

The latest update comes as a top analyst report suggests that deploying machine learning models in production remains a "major challenge" for many enterprises.

 

According to the 2019 Gartner CIO Survey, AI and ML continue to rank as the No. 1 game-changing technology by CIOs. However, most organizations underestimate how long it will take to move AI and ML projects into production.

 

"Machine learning is becoming an essential component of mission-critical applications to optimize operations and deliver superior real-time customer experiences. GigaSpaces Version 15.0 provides enterprises with the machine learning model management capabilities, speed, and scale that they need to accelerate their machine learning and artificial intelligence journey," Einav added in a statement.

At the component level, GigaSpaces Ops Manager enables continuous monitoring of machine learning pipelines. Monitoring starts at the cluster level and drills down to individual services, so users can maintain accurate data models and ensure problems are resolved before they affect overall performance.

 

The new AnalyticsXtreme Batch Indexing (included in InsightEdge) optimizes and automates data access and storage, adding the ability to move data between the frequently accessed (cold data) and infrequently accessed (archive data) tiers on data lakes and data warehouses. ML model performance improves because frequently accessed cold data can be retrieved 80X faster directly from data lakes, and processing costs are reduced as data access patterns change.
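
The distinction between cold and archive data can be sketched as a simple classification rule. The Python below is an illustration only: the `Tier` names, the `Record` fields, the `classify` function, and both thresholds are assumptions made for this example and are not part of any GigaSpaces API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Tier(Enum):
    HOT = "hot"          # recent data, assumed to stay in memory
    COLD = "cold"        # aged, but still queried often
    ARCHIVE = "archive"  # aged and rarely queried


@dataclass
class Record:
    key: str
    written_at: datetime
    reads_last_30_days: int


# Hypothetical thresholds; a real deployment would tune these.
COLD_AFTER = timedelta(days=7)
FREQUENT_READS = 10


def classify(record: Record, now: datetime) -> Tier:
    """Place a record in a tier by age first, then by access frequency."""
    if now - record.written_at < COLD_AFTER:
        return Tier.HOT
    if record.reads_last_30_days >= FREQUENT_READS:
        return Tier.COLD
    return Tier.ARCHIVE
```

Under these assumed thresholds, a record written two months ago and read 50 times in the last 30 days would land in the cold tier, while one read only once would be archived.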

 

Notable features in GigaSpaces 15.0 include:

  • Easy MLOps Monitoring: Organizations can monitor applications for performance issues with ML-centric capabilities using the GigaSpaces Ops Manager enterprise-grade monitoring and administration tool. It provides visibility into the components of an organization's system -- starting at the cluster level and drilling through to individual services and service instances.

    This level of visibility supports continuous development and monitoring of microservices and machine learning pipelines, so businesses can maintain accurate data models and ensure that small problems are resolved before they affect general performance. Users can view performance metrics and system alerts at the cluster level to immediately identify problem areas.
  • Enhanced Intelligent Data Tiering: Users can retrieve frequently accessed cold data up to 100X faster with GigaSpaces AnalyticsXtreme. With a new AnalyticsXtreme Batch Indexing feature, GigaSpaces' smart indexing of aged data "has gotten smarter," according to the company's website.

    The goal is to provide organizations more control over the life cycle of their data by differentiating between cold data (which has aged but is still frequently accessed), and archived data (which is needed for historical purposes but is infrequently accessed), and therefore doesn't impact the performance of queries. With the help of GigaSpaces' MemoryXtend, organizations can control the cost and memory footprint for hot and warm data. In addition, thanks to GigaSpaces' AnalyticsXtreme, businesses can define the exact time when warm data becomes cold and move it to data object stores like Hadoop or Cloud Object Stores.

    Batch indexing is supported on any cloud-based object store/data lake that is compatible with Apache Hive and is based on the ability to store data according to partition. Further bucketing the data in "time slices" enables enhancing queries that match the index period, so InsightEdge only has to retrieve specific partitions instead of scanning the entire batch layer.

  • Kubernetes as a Core Platform for MLOps: Users can easily deploy machine learning projects from any platform on modern production infrastructures such as Kubernetes and Spark, on any cloud or on-premises. For example, thanks to GigaSpaces' space-based remoting support for Kubernetes, apps can remotely invoke microservices within the Space (which resides in a data pod).

    It also provides a native smart space client in Kubernetes that supports remote CRUD operations, task execution, and event-driven analytics. This ensures high throughput and fast serialization, as well as automatic load balancing. Writing and updating of data without a predefined schema allows easy changes to the data model while ensuring compatibility with JDBC and BI tools so that code can be integrated more reliably and faster with lower administrative overhead.
  • GigaSpaces Connector for Apache Kafka: The official GigaSpaces Connector for Apache Kafka has been released and verified by Confluent, following the guidelines set forth by Confluent's Verified Integrations Program. 
  • Tableau Server Support: Tableau Server provides direct, authorized access to live Space objects and documents, allowing users to build and run complex filtered queries against InsightEdge. 
  • Dynamic Schema Versioning: InsightEdge now supports dynamic schema definition by enabling the writing and updating of data without a predefined schema. With dynamic schema versioning, organizations can integrate code more reliably, deploy faster, and lower administration overhead.
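
The batch indexing idea in the feature list above, reading only the partitions whose time slice overlaps a query instead of scanning the whole batch layer, can be sketched in a few lines of Python. This is a generic illustration of partition pruning, not GigaSpaces' implementation: the daily slice size, the `dt=YYYY-MM-DD` partition naming (a common Hive-style convention), and all function names are assumptions.

```python
from datetime import date, timedelta
from typing import Dict, List

# Hypothetical batch layer: one partition per daily "time slice",
# named in the common Hive style dt=YYYY-MM-DD.
Partitions = Dict[str, List[dict]]


def partition_name(day: date) -> str:
    """Build the assumed partition name for a single day."""
    return f"dt={day.isoformat()}"


def slices_for_range(start: date, end: date) -> List[str]:
    """List the partition names covering [start, end], inclusive."""
    days = (end - start).days
    return [partition_name(start + timedelta(days=i)) for i in range(days + 1)]


def query(partitions: Partitions, start: date, end: date) -> List[dict]:
    """Read only the partitions that overlap the query window."""
    rows: List[dict] = []
    for name in slices_for_range(start, end):
        # Partitions outside the window are never looked up or scanned.
        rows.extend(partitions.get(name, []))
    return rows
```

A query for a two-day window touches two partitions regardless of how many days of history the batch layer holds, which is the effect the "time slices" are meant to achieve.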

GigaSpaces, Google Partner on Managed Service To Operationalize ML Pipelines

Building on the latest release, GigaSpaces is also offering its InsightEdge in-memory computing platform as a managed service on Google Cloud Platform (GCP). The offering gives customers one-click deployment of InsightEdge for in-memory, real-time analytics at scale.

 

The offering supports the easy introduction of new apps that need to work with data at extreme speed and scale.

 

“We are excited to join forces with Google Cloud to provide easy access and a fully automated InsightEdge service,” Einav said in a statement. “Simplifying deployment and management, while ensuring a no-downtime service, will help our customers leverage their data -- running analytics and ML models at the speed and scale required for real-time decision making.”

 

Specifically, with GigaSpaces Cloud Managed Service on GCP, organizations can ingest, process and analyze streaming data as well as historical data without management overhead, Einav said.

 

Services and apps can query data from any source in real time and quickly connect structured, semi-structured and unstructured data, which provides a holistic, 360-degree view of the business. Users get one-click connectivity to any database, interactive SQL queries, quick access to data lakes, connectors to BI apps, data visualization, multi-region replication and many more advanced features, he said.

 

“With GigaSpaces Cloud managed service you can operationalize the entire machine learning pipeline on Google Cloud Platform (GCP),” Yoav Einav, VP Product at GigaSpaces, told IDN. This covers many crucial ML operations tasks, he said: packaging the model as a MOJO or Docker image, building the feature vector, serving the model in production and inferencing at scale with low latency, monitoring model performance, and even closing the feedback loop with incremental learning.

 

Einav detailed how GigaSpaces Cloud works with GCP capabilities.

 

The Google Cloud native Spark job running on GigaSpaces InsightEdge contextualizes the streaming data with historical data stored on Google Cloud Storage, enriching it with external sources based on event triggers.

 

GigaSpaces Ops Manager powers the deployment of the model, running A/B testing and monitoring the ML model accuracy and performance in production and feeding it back for retraining and incremental learning.
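
The A/B testing and accuracy monitoring loop described here can be illustrated with a small Python sketch. It shows the general technique only and does not use or represent the Ops Manager API: the function `assign_variant`, the class `AccuracyMonitor`, and the hash-based traffic split are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from typing import Dict


def assign_variant(request_id: str, share_b: float = 0.5) -> str:
    """Deterministically route a request to model 'A' or 'B'."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so bucket is always in [0, 1).
    bucket = int.from_bytes(digest[:6], "big") / 2**48
    return "B" if bucket < share_b else "A"


class AccuracyMonitor:
    """Track per-variant accuracy once ground-truth labels arrive."""

    def __init__(self) -> None:
        self.correct: Dict[str, int] = defaultdict(int)
        self.total: Dict[str, int] = defaultdict(int)

    def record(self, variant: str, predicted, actual) -> None:
        self.total[variant] += 1
        if predicted == actual:
            self.correct[variant] += 1

    def accuracy(self, variant: str) -> float:
        seen = self.total[variant]
        return self.correct[variant] / seen if seen else 0.0
```

Hashing the request ID keeps routing stable, so the same request always reaches the same model, and the per-variant accuracy figures are the kind of signal that could be fed back to decide when a model needs retraining.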

A Google exec described how GCP’s partnership with GigaSpaces is driving new digital transformation solutions. 

 

“Businesses around the world are looking to the cloud to transform, scale and improve their operations and we are committed to helping them reach those goals,” said Boaz Maoz, Country Director, Google Cloud Israel. “GigaSpaces Cloud on Google Cloud will help our mutual customers to drive their own digital transformation initiatives, improve business performance through better data analytics, while also allowing them to maintain a high level of customer experience and adapt to changing regulatory environments.”

 

Readers can download GigaSpaces 15.0 from the company's Download Center.



