ArangoDB Update Makes It Easier for App Developers To Work with Multiple Data Models

ArangoDB, an open source native multi-model database, is adding a new search feature to let developers efficiently interact with multiple data models by using just one technology and one query language. IDN speaks with ArangoDB CTO Dr. Frank Celler.





The new feature in ArangoDB 3.4, dubbed ArangoSearch, aims to take ArangoDB from a “data retrieval” technology to one where “information retrieval” is the goal when it is combined with traversals or joins in AQL, the ArangoDB Query Language, ArangoDB CTO Dr. Frank Celler told IDN.


ArangoDB’s core approach is to deliver updates that let developers work more effectively with multiple data models, Celler said. He noted that the use cases for such approaches are expanding.


“Native multi-model has a broad spectrum of use cases,” Celler said. “Every application today that uses multiple data stores to leverage the performance and data modelling advantages of graph, key/value or document stores pay a high price for development and maintenance of these applications every day. With a native multi-model database, developers save time in development and their companies save maintenance costs.”


These multi-model use cases all bring three main ingredients into play, Celler added: (1) graph, (2) key/value and (3) document stores. Here are a few examples:

  1. In ecommerce, recommendations are delivered through graph databases, the product catalogue is stored in a document database and the cart is best suited to a fast key/value store.
  2. In manufacturing, companies are entering the “age of smart factories,” Celler said. The supply chain is efficiently managed by a graph, the product data is, again, best suited to a document store and sensor data from machines fits best in a key/value store.
  3. Many machine learning and AI apps use highly unstructured data that has to be processed (document store). These apps also need to analyze the context of these connected data points (graph).
  4. In enterprise knowledge graphs, the context of data (graph) is crucial for connecting unstructured information (document), and the system has to allow complex queries (native multi-model) that combine, aggregate and filter information for employees.
  5. IoT-driven apps across the board. In healthcare, Celler noted, a smartwatch already sends biometric data to a doctor (key/value); that data gets profiled and matched against other patients’ profiles (document store), and an app may then recommend the best treatment (graph).
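
To make the ecommerce example concrete, here is a minimal Python sketch of how one request can touch all three models. It does not use ArangoDB or AQL; the dictionaries, keys and the `recommend_for_cart` helper are hypothetical stand-ins chosen only to illustrate the idea of combining a key/value lookup, a graph step and a document fetch in a single pass.

```python
# Toy in-memory stand-ins for the three data models (all names are hypothetical).
products = {  # document model: the product catalogue
    "p1": {"name": "Phone", "price": 699},
    "p2": {"name": "Phone case", "price": 19},
    "p3": {"name": "Charger", "price": 29},
}
carts = {"alice": ["p1"]}  # key/value model: user id -> product keys in the cart
bought_together = {  # graph model: product -> related products (edges)
    "p1": ["p2", "p3"],
    "p2": ["p1"],
}


def recommend_for_cart(user):
    """Combine all three models: read the cart, follow edges, fetch documents."""
    in_cart = carts.get(user, [])  # key/value lookup
    seen = set(in_cart)
    recommendations = []
    for product_key in in_cart:
        for neighbour in bought_together.get(product_key, []):  # graph step
            if neighbour not in seen:
                seen.add(neighbour)
                recommendations.append(products[neighbour])  # document fetch
    return recommendations


names = [doc["name"] for doc in recommend_for_cart("alice")]
print(names)
```

With three separate data stores, each of the commented steps would be a call to a different system; the multi-model argument in the article is that a single database and query language can cover all of them.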



“[A]t times the overall query performance is not a major priority. . . Based on community feedback, ArangoDB 3.4 includes integrated streaming cursors which provides ‘first results’ as they become available on the server,” Celler added.


While discussions of search often focus on raw query performance, Celler explained how streaming cursors in ArangoDB 3.4 get the first results to users sooner.


“[While] performance is a key feature of all databases -- and it should be good of course. In some use cases, a large number of results have to be calculated. Or in cases of very complex queries, queries take longer to be executed in full in any database,” he told IDN. “For both scenarios, streaming cursors can be used to display the first results to a user while the rest is being calculated.”


Celler shared a real-world use case:


“Imagine a search on Amazon. If the page would only load when all products to the search term ‘iPhone’ would have been loaded, it might take 3 seconds and the user experience would be harmed. With streaming cursors the first 20 results relevant to a user’s query would be displayed in a few milliseconds. When the user scrolls through the results, the next batch of relevant results is already calculated and can be displayed quickly,” he said.
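
The behaviour Celler describes can be sketched with a plain Python generator. This is an illustration of the streaming idea only, not ArangoDB's cursor API: `stream_results`, `catalogue` and the `stats` counter are hypothetical names, and the batch size of 20 simply mirrors the number in the quote.

```python
def stream_results(items, term, batch_size=20):
    """Yield matching items batch by batch instead of building the full result first."""
    batch = []
    for item in items:
        if term in item.lower():
            batch.append(item)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # final, possibly smaller, batch
        yield batch


stats = {"scanned": 0}  # counts how many catalogue entries were actually read


def catalogue():
    """Hypothetical product catalogue, produced lazily one entry at a time."""
    for i in range(100_000):
        stats["scanned"] += 1
        yield f"iPhone accessory {i}"


cursor = stream_results(catalogue(), "iphone")
first_page = next(cursor)  # only the work needed for the first batch is done here
print(len(first_page), stats["scanned"])
```

The first page is available after reading only as many entries as it takes to fill one batch; calling `next(cursor)` again when the user scrolls computes the following batch on demand.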


ArangoDB’s latest update, with what Celler called its “highly flexible backbone,” serves developers working on a wide range of new projects, including cloud native apps, serverless, FaaS and, of course, DevOps.


ArangoDB can also run anywhere, he added, from bare metal to hybrid clouds and orchestration systems like Kubernetes.


“With arangodb-kube, ArangoDB provides one of the simplest yet highly flexible solutions to run a database on Kubernetes as it can be deployed and run with only 7 lines of YAML code,” he said. ArangoDB also runs on Google Kubernetes Engine (GKE) and Pivotal Container Service (PKS). For more flexibility, custom routes via the ArangoDB Foxx framework, and even GraphQL, enable much simpler API management in a microservice architecture and deliver performance gains by moving query processing closer to the data, Celler added.


In addition to the latest native multi-model capabilities, ArangoDB earlier released two key features: SmartGraphs and SatelliteCollections. Both let ArangoDB scale with all supported data models while providing high performance for queries against distributed data. Specifically:

  • SmartGraphs: When connected data (graph data) resides on different machines, a query has to jump between these machines to process its results, which often leads to network overhead and very slow query performance. This is common in cybersecurity, fraud detection, genomics and similar use cases, where the datasets are huge but very fast query execution is required. SmartGraphs handles these large datasets and queries efficiently.
  • SatelliteCollections: This feature is designed for use cases where large streams of data exceed the storage capabilities of a single machine and data is sharded to a cluster. IoT is a perfect example of such a case. For analytics, one needs to access the different data shards and bring them together (JOIN operation) to answer questions like “Did the data quality improve since the last firmware update?” or “What are the areas of high demand for my self-driving cars and which areas should I support next?” With SatelliteCollections, developers can shard large datasets to a cluster, replicate smaller datasets (including the firmware version) to each machine and process queries locally without the network overhead. Joining data points this way can cut storage needs by up to 60% compared with, for example, columnar stores, because the replicated data does not need to be stored with every incoming event.
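
The SatelliteCollections idea can be sketched in a few lines of Python. This is a simulation under stated assumptions, not ArangoDB code: the three in-memory "shards", the `devices` table and both helper functions are hypothetical. Note that the events carry only a device id; the firmware version lives once in the small table that is copied to every shard, which is where the storage saving described above would come from.

```python
# Hypothetical data: sensor events are sharded across three "machines";
# the small device table is replicated to every machine (the satellite).
devices = {"d1": "fw-1.0", "d2": "fw-2.0", "d3": "fw-2.0"}  # device -> firmware

shards = [
    [{"device": "d1", "quality": 0.70}, {"device": "d2", "quality": 0.90}],
    [{"device": "d3", "quality": 0.95}, {"device": "d1", "quality": 0.72}],
    [{"device": "d2", "quality": 0.91}],
]
satellites = [dict(devices) for _ in shards]  # one full local copy per shard


def local_join(events, local_devices):
    """Join on one machine only: per-firmware (sum, count) of quality readings."""
    partial = {}
    for event in events:
        firmware = local_devices[event["device"]]  # local lookup, no network hop
        total, count = partial.get(firmware, (0.0, 0))
        partial[firmware] = (total + event["quality"], count + 1)
    return partial


def average_quality_by_firmware():
    """Coordinator step: merge the small partial results from every shard."""
    merged = {}
    for events, local_devices in zip(shards, satellites):
        for firmware, (total, count) in local_join(events, local_devices).items():
            old_total, old_count = merged.get(firmware, (0.0, 0))
            merged[firmware] = (old_total + total, old_count + count)
    return {fw: round(total / count, 2) for fw, (total, count) in merged.items()}


result = average_quality_by_firmware()
print(result)
```

Only the per-firmware partial sums cross the simulated network, which is how a question such as “Did the data quality improve since the last firmware update?” can be answered without shipping raw events between machines.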


ArangoDB offers a community license and a commercial subscription.