Building an Industrial AI Companion using Arundo Foundation
While large language models (LLMs) may struggle to generate insights directly from time-series data, their strong pattern-recognition capabilities make them well suited for identifying and synthesizing existing insights. We therefore argue that an architecture featuring a separate graph-based domain model and time-series data store is ideal for a generalized AI companion serving the heavy asset industry: an agentic AI first identifies the nodes relevant to a user prompt (assets, sensors, and a large variety of pre-computed models) using a mixture of APIs, query-language generation, semantic search, and graph traversal. Only after this filtering step is time-series data retrieved to synthesize an answer.
Domain and Compute Architecture – The Key to an Industrial AI Companion
In asset-heavy industries like energy, shipping, and manufacturing, the operational complexities of managing vast asset networks with intricate relationships demand timely, insightful analyses. Yet, the large volume of time-series data, often produced by thousands of sensors and connected assets, presents a significant challenge in extracting actionable intelligence.
At first glance, one might expect that large language models (LLMs) could be the perfect solution to the overwhelming challenge of extracting intelligence from large amounts of time-series data. Given their remarkable ability to parse and synthesize large amounts of unstructured information, LLMs seem well-suited to handle data volume and complexity. However, they fall short in this context because they’re not trained on time-series data – they struggle to interpret trends, cyclic behaviors, or the timestamped, sequential nature of sensor data.
To address these limitations, the Arundo Foundation combines a knowledge graph, time-series database, and compute layer to complement LLM capabilities, allowing them to focus on what they do best: interpreting, contextualizing, and synthesizing insights rather than analyzing raw time-series data.
The Foundation is fully configurable and agnostic to the shape of the underlying domain data, allowing it to adapt to a wide range of data types and structures. This flexibility makes it possible to start by onboarding a small part of operations and scale gradually, building insights as you go.
In addition to time-series data, other sources, such as user manuals, maintenance logs, and other unstructured text, can be included as nodes in the knowledge graph, further enriching insights drawn from sensor readings.
- Knowledge graph: The knowledge graph serves as a rich, structured repository of relationships and metadata, representing logical and physical assets, sensors, and various computational model outputs as well-described nodes connected by edges, with pointers to the related data in the time-series database.
- Time-series database: The time-series database efficiently manages and stores large volumes of historical sensor readings and model outputs over time, enabling rapid access to time-stamped data.
- Compute layer: The compute layer is designed to enable seamless and scalable model deployment and execution. Basic models, such as efficiency metrics, data quality checks, and outlier detection, are embedded directly in the layer. More advanced models, including pre-trained machine learning models, are deployed and executed by the layer. The meaning of each model output is described in the knowledge graph.
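To make the contract between these layers concrete, here is a minimal sketch of how graph nodes might carry pointers into the time-series database. All class and field names are hypothetical simplifications; the actual Foundation schema is richer and, as noted above, fully configurable.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Node:
    node_id: str
    kind: str                        # "asset", "sensor", or "compute"
    description: str                 # well-described text, usable for semantic search
    timeseries_id: str | None = None # pointer to related data in the time-series DB

@dataclass
class Edge:
    parent: str
    child: str
    relation: str                    # e.g. "has_child", "has_sensor", "has_input"

class KnowledgeGraph:
    """Toy in-memory graph; a production system would use a graph database."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.edges: list[Edge] = []

    def add(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def connect(self, parent: str, child: str, relation: str) -> None:
        self.edges.append(Edge(parent, child, relation))

    def children(self, parent: str, relation: str | None = None) -> list[Node]:
        """Traverse one hop from a parent node, optionally filtered by relation."""
        return [self.nodes[e.child] for e in self.edges
                if e.parent == parent and (relation is None or e.relation == relation)]
```

The key design point is that nodes store only metadata and a `timeseries_id` pointer: the bulky historical readings stay in the time-series database, so agents can reason over graph structure cheaply before any data is fetched.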
In particular, the capability to deploy and manage a large volume of models is transformative. The architecture lets LLMs interpret the outputs of thousands of models, all tuned to specific asset behaviors. With each model contributing targeted insights into different aspects of performance, maintenance, or efficiency, the LLM can synthesize these outputs into cohesive, actionable narratives. In essence, the architecture magnifies the value of LLMs, turning them into interpreters of an ecosystem of insights.
Figure: Illustration of the Arundo Foundation Knowledge graph, showing Assets, Computes, and Sensors. For example, the “Reactor” asset has multiple child assets, two of which are “Heat Exchanger” and “Valve”. The “Heat Exchanger” asset has two sensors “Outlet Temperature” and “Inlet Temperature” and a compute “Differential Temperature” using these two sensors as inputs.
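The "Differential Temperature" compute in the figure can be sketched as a basic model in the compute layer: it reads its two input sensor series from the time-series store and writes a derived series back. The series identifiers and values below are invented for illustration.

```python
# Toy stand-in for the time-series database: series keyed by timeseries id,
# stored as (timestamp, value) pairs. Ids and readings are hypothetical.
timeseries_db = {
    "hx.outlet_temp": [(0, 85.0), (1, 86.5), (2, 84.0)],  # degrees C
    "hx.inlet_temp":  [(0, 60.0), (1, 60.5), (2, 61.0)],
}

def differential_temperature(outlet_id: str, inlet_id: str) -> list[tuple[int, float]]:
    """Compute outlet minus inlet temperature at matching timestamps."""
    inlet = dict(timeseries_db[inlet_id])
    return [(t, v - inlet[t]) for t, v in timeseries_db[outlet_id] if t in inlet]

# The compute layer stores the output under its own timeseries id; the
# knowledge graph then describes its meaning for the LLM agents to find.
timeseries_db["hx.differential_temp"] = differential_temperature(
    "hx.outlet_temp", "hx.inlet_temp")
```

Because the output is written back as just another time series with a well-described node in the graph, downstream agents treat sensor readings and model outputs uniformly.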
How an AI Companion Supports Decision Making: A Natural Gas Shipping Example
Consider a natural gas shipping fleet as an example. Each vessel has a variety of physical assets, each with multiple sensors monitoring physical quantities such as temperature, pressure, and flow rates. In addition, we have a mix of both domain-specific models predicting known failure modes and efficiency metrics, as well as generalized data drift/quality and anomaly detection models.
When an operator asks the companion a question, such as “What is the likelihood of a cooling system failure on the vessel Helios in the next month?”, an agentic LLM workflow is activated. It performs the following steps – not in a hard-coded sequence, but through a general interface agent calling on more specialized agents as needed.
- Identifying Relevant Nodes: First, the AI Companion must locate relevant nodes in the graph database to answer the question. It can do this in multiple ways: by calling APIs, generating and running database queries (such as Cypher for Neo4j), using semantic search with vector embeddings of node descriptions, or traversing the graph starting from high-level asset nodes. For example, when the query references "cooling system" and "vessel Helios," the Companion locates nodes connected to Helios' cooling system, associated sensors (such as those monitoring temperature and pressure), and predictive maintenance models related to cooling system failure.
- Retrieving Time-Series Data: After relevant nodes are identified, another agent uses APIs to request the necessary data (for sensor and model nodes only) from the time-series database. This agent also has to decide which resolution is needed to answer the question – or ask the user, if it is not certain.
- Synthesizing the Answer: With data from the graph and time-series databases, yet another agent synthesizes a clear, actionable response – including the lineage for the data used, and generating supporting plots when relevant. For instance: “Based on recent temperature and pressure data, there is a 15% likelihood of cooling system failure on Vessel Helios within the next month. Maintenance is recommended within the next two weeks.”
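The three steps above can be sketched end to end. Node ids, descriptions, and readings are invented, keyword overlap stands in for semantic search over vector embeddings, and a template stands in for the LLM's synthesis step.

```python
# Step 0: hypothetical graph nodes (id -> description) and their time series.
nodes = {
    "helios.cooling.pump_temp":     "Helios cooling system pump temperature sensor",
    "helios.cooling.failure_model": "Helios cooling system failure prediction model",
    "helios.engine.rpm":            "Helios main engine shaft RPM sensor",
}
timeseries_db = {
    "helios.cooling.pump_temp":     [78.0, 81.5, 83.0],
    "helios.cooling.failure_model": [0.08, 0.11, 0.15],  # failure probability
    "helios.engine.rpm":            [92.0, 95.0, 93.0],
}

def identify_relevant_nodes(question: str, top_k: int = 2) -> list[str]:
    """Step 1: rank nodes by description overlap with the question (a toy
    stand-in for semantic search over embedded node descriptions)."""
    words = set(question.lower().replace("?", "").split())
    ranked = sorted(nodes, key=lambda n: -len(words & set(nodes[n].lower().split())))
    return ranked[:top_k]

def retrieve(node_ids: list[str]) -> dict[str, list[float]]:
    """Step 2: fetch only the pointed-to series from the time-series database."""
    return {n: timeseries_db[n] for n in node_ids}

def synthesize(data: dict[str, list[float]]) -> str:
    """Step 3: summarize; a real system hands the data to an LLM instead."""
    p = data["helios.cooling.failure_model"][-1]
    return f"Estimated cooling system failure likelihood: {p:.0%}."

question = "What is the likelihood of a cooling system failure on the vessel Helios?"
relevant = identify_relevant_nodes(question)
answer = synthesize(retrieve(relevant))
```

Note that the engine RPM sensor is filtered out in step 1, so its data is never fetched: the graph acts as a relevance filter, and only then does the (potentially expensive) time-series retrieval happen.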
Future Potential: Enhanced Decision-Making Across Industries
Looking ahead, the integration of LLMs with a dedicated graph and time-series data architecture signals a transformative step forward for the heavy asset industry. As data environments grow increasingly complex, AI companions like the one described here will become essential for distilling meaningful insights from vast data networks. By combining LLMs’ pattern-recognition capabilities with a graph database for structural relationships, a time-series database for historical data, and a compute layer for model deployment, companies can unlock a new level of operational intelligence. This system will enable rapid and accurate responses to complex queries, support predictive maintenance, optimize asset performance, and ultimately drive smarter, more effective decision-making. As this architecture evolves, it promises to enhance safety, reliability, and efficiency in asset-heavy industries, setting a new standard for digital intelligence in industrial operations.
If you would like to learn more or discuss any of the topics in this article, please contact us at contact@arundo.com to set up a meeting.
Authors
Written by Lars Bjålie, Sr. Data Scientist at Arundo.