Why Your Data Might Need a Time Machine
Vector databases with time series capabilities enable a new class of generative AI applications.
In H.G. Wells’ The Time Machine, our hero leaps forward to the year 802,701. Imagine the predictive power you would wield. On the other hand, in George Orwell’s 1984, most of the time travel is back into the past, where employees of the Ministry of Truth twist and obliterate facts into the version of recent history that the government of Oceania wants everyone to believe.
Both books reflect on our current situation: a hunger to access enormous amounts of data, both historical and real-time, and to use that data to find risk and opportunity, uncover actionable insight, and optimize capacity and coverage.
AI systems today often make predictions that can’t be fully explained without time travel. While we can’t yet go forward, time-series databases certainly make it easier to go back in time to make sure history wasn’t revised.
Here’s how data time machines work and why they matter.
Five Data Time Machine Requirements
If you’re using AI to run your business, you need a database and analytics engine that allows you to predict the future and plumb the past. In other words, a data time machine.
A data management platform that supports time has five characteristics:
All data is online, with no need for offline archives.
All data is instantly discoverable and queryable.
Data can be replayed and rewound as many times as required.
Every piece of data has a real-time view of its current location and state.
Query performance does not degrade as data volume grows.
Data time machines are designed to support temporal queries ranging from “How has the volatility in US stock markets evolved over the last three decades?” to “What is the volatility in the market right now, and how does that compare to the volatility of 5 seconds, or 5 milliseconds, ago?”
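To make the two volatility questions concrete, here is a minimal sketch in Python. It assumes ticks are held in memory as (timestamp, price) pairs sorted by time, and it defines volatility as the sample standard deviation of log returns; the function names are illustrative and do not come from any particular database.

```python
import math
from statistics import stdev


def log_returns(prices):
    """Log return between each pair of consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def volatility_between(ticks, start, end):
    """Sample standard deviation of log returns for start <= timestamp < end.

    `ticks` is a list of (timestamp, price) pairs sorted by timestamp.
    Returns None when the window holds fewer than three prices, because
    a sample standard deviation needs at least two returns.
    Hypothetical helper written for illustration only.
    """
    window = [price for ts, price in ticks if start <= ts < end]
    if len(window) < 3:
        return None
    return stdev(log_returns(window))


# The same function serves both questions; only the bounds change.
ticks = [(0, 100.0), (1, 110.0), (2, 99.0), (3, 108.9), (4, 108.9), (5, 109.0)]
whole_history = volatility_between(ticks, 0, 6)
latest_window = volatility_between(ticks, 3, 6)
```

The point of the sketch is that "three decades" and "the last 5 milliseconds" are the same query with different time bounds. A real time-series engine would answer it from an index rather than by scanning a list.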
The Applications of a Data Time Machine
You need a time machine to show your work, like in grade school when your math teacher required you to lay out the steps that led to your answer for “8 x 4.” Audit trails help make AI predictions more explainable in a world full of data.
Temporal explainability is especially important in regulated industries such as financial services or pharmaceutical manufacturing, which are subject to audits. If you’re basing a decision about a power grid, a supply chain, or a trading strategy on several petabytes of data, it’s crucial to pinpoint which data was used to support your strategy. Early generative AI applications have already shown disastrous results when their similarity searches were unmoored from a precise audit trail.
Some common applications of a data time machine include:
Perform trading strategy forensics: pick a point in time and retrieve the state of the market, then replay events to understand what market micro-movements led to your automated trading strategy firing, including which rules fired, in what order, and why.
Gather regulatory evidence for enforcement.
Recreate the “crime scene” of nefarious financial market activity and use it as evidence to prosecute.
Collect technology telemetry (data from remote sensors) at a central location, then analyze it to monitor and control remote devices on measures such as latency.
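The forensics pattern in the list above (pick a point in time, retrieve the state, then replay what followed) can be sketched with an append-only event log. This is a simplified illustration, assuming events arrive in time order and fit in memory; the EventLog class and its method names are hypothetical, not a specific product's API.

```python
from bisect import bisect_right


class EventLog:
    """Append-only, time-ordered log of (timestamp, key, value) events."""

    def __init__(self):
        self._timestamps = []  # kept sorted so we can binary-search it
        self._events = []

    def append(self, ts, key, value):
        """Record an event; events must arrive in non-decreasing time order."""
        if self._timestamps and ts < self._timestamps[-1]:
            raise ValueError("events must be appended in time order")
        self._timestamps.append(ts)
        self._events.append((ts, key, value))

    def state_as_of(self, ts):
        """Latest value per key, using only events with timestamp <= ts."""
        end = bisect_right(self._timestamps, ts)
        state = {}
        for _, key, value in self._events[:end]:
            state[key] = value
        return state

    def replay(self, start, end):
        """Yield events with start < timestamp <= end, in original order."""
        lo = bisect_right(self._timestamps, start)
        hi = bisect_right(self._timestamps, end)
        yield from self._events[lo:hi]


log = EventLog()
log.append(1, "bid", 100)
log.append(2, "ask", 101)
log.append(3, "bid", 99)
log.append(5, "ask", 102)

market_at_2 = log.state_as_of(2)       # the state when the investigation starts
what_followed = list(log.replay(2, 5))  # every event after that point, in order
```

Because nothing is ever overwritten, the same log can be rewound and replayed as many times as an investigation requires.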
The list of industries that require this kind of data time travel is long: any regulated industry, and any application that processes sensor data or tracks complex interactions, can use time machines to understand what happened and why.
Your Time Machine Résumé
So, what do you need to build the best data management platform that works as an effective time machine in your organization? There are two core requirements.
First, use a vector database. Vector databases store each piece of data as a numerical value. In generative AI, vector processing means indexing and storing data, especially unstructured data such as images, video, audio, or social media, as numerical values based on similarities.
This dramatically simplifies retrieving data and finding similarities or meaningful anomalies in it. In addition, vector databases are designed to handle all vector types, not just those geared to vector embeddings. Vector databases are also AI-fast: indexing vector-embedded numbers and processing them as stored vectors rather than rows can increase query speeds by up to 100 times over traditional databases.
Second, use it to organize data by time. The raw material of any data time machine is data, and if you want to randomly access time rather than relying on archives and backups, as Apple’s Time Machine backup software for the Mac does, your data must be timestamped. Since time is both a vector and a characteristic of any data object, it makes sense to store and query time-series data in a vector database.
If your technology doesn’t timestamp everything naturally, you’ll be disadvantaged when constructing audit trails of data in your AI applications.
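As a rough illustration of why timestamps matter for audit trails, the sketch below stores each embedding with a timestamp and runs a similarity search restricted to what existed at a chosen moment. It is a brute-force example under stated assumptions: records are (timestamp, vector, label) tuples in memory, similarity is cosine similarity, and search_as_of is a hypothetical helper. Production vector databases typically use approximate indexes instead of scanning every record.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_as_of(records, query, as_of, k=3):
    """Top-k most similar records among those with timestamp <= as_of.

    `records` is a list of (timestamp, vector, label) tuples.
    Hypothetical helper for illustration; it scans every record.
    """
    visible = [r for r in records if r[0] <= as_of]
    ranked = sorted(visible, key=lambda r: cosine(r[1], query), reverse=True)
    return ranked[:k]


records = [
    (1, [1.0, 0.0], "report-a"),
    (2, [0.0, 1.0], "report-b"),
    (3, [1.0, 0.1], "report-c"),
]
query = [1.0, 0.0]

# What would the search have returned at time 2, before report-c existed?
then = [label for _, _, label in search_as_of(records, query, as_of=2, k=2)]
# And what does it return now?
now = [label for _, _, label in search_as_of(records, query, as_of=3, k=2)]
```

Comparing the two result lists shows which data a model could have seen when it made a past prediction, which is the core of an audit trail.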
Is your database enough of a time machine? Typical requirements include being able to create dynamic temporal snapshots that expose all your data based on parameters, snapshot data just before (or after) a given date or time, query data between a start and end date/time, or query data between a start and end date/time aggregated into user-defined time intervals.
You may need a data time machine if you can’t ask and answer these questions quickly.
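The requirements above map onto a handful of query shapes. The following sketch shows them over a small in-memory list of (timestamp, value) rows sorted by time. All function names are hypothetical, and timestamps are plain integers to keep the example short; a real system would run these against indexed storage.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def last_before(rows, ts):
    """Snapshot just before ts: the last row with timestamp < ts, or None."""
    times = [t for t, _ in rows]
    i = bisect_left(times, ts)
    return rows[i - 1] if i > 0 else None


def first_after(rows, ts):
    """Snapshot just after ts: the first row with timestamp > ts, or None."""
    times = [t for t, _ in rows]
    i = bisect_right(times, ts)
    return rows[i] if i < len(rows) else None


def between(rows, start, end):
    """Range query: all rows with start <= timestamp < end."""
    times = [t for t, _ in rows]
    return rows[bisect_left(times, start):bisect_left(times, end)]


def bucketed_mean(rows, start, end, width):
    """Range query aggregated into fixed-width intervals.

    Returns {bucket_start: mean value} for buckets that contain data.
    """
    groups = defaultdict(list)
    for t, value in between(rows, start, end):
        bucket_start = start + ((t - start) // width) * width
        groups[bucket_start].append(value)
    return {b: sum(vs) / len(vs) for b, vs in sorted(groups.items())}


rows = [(0, 1.0), (5, 2.0), (10, 3.0), (15, 5.0)]
before_10 = last_before(rows, 10)
after_10 = first_after(rows, 10)
five_to_fifteen = between(rows, 5, 15)
ten_unit_means = bucketed_mean(rows, 0, 20, 10)
```

If each of these takes minutes or requires restoring an archive in your current stack, that is a sign the time dimension is not a first-class part of the platform.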
A Data Timehouse is a Microscope and Telescope for Temporal Data
In physics, you need two separate tools to cover the ends of the known universe, from the edge of the galaxies to sub-atomic particles. In a data time machine, you also need technology that gives you the full range of abilities:
the microscopic power to pinpoint your data down to the most granular, nanosecond level
the telescopic power to scan your entire database, potentially decades of information, with the same level of granularity
the ability to randomly pick the times and the dimensions in which you’d like to view your data
All of this is important for the traceability of AI models. Time, of course, is continuous, and you need the right kind of vector technology to traverse it.
So, if you’re building generative AI applications within your company and looking for vendors who can help, ask them whether their vector processing can timestamp, store, and manage time-based information. In other words, ask them if they can help you build a time machine. To succeed long-term in the AI era, you’re going to need one.