What is a Data Timehouse?
Conventional databases are not designed for time. Here's why you should consider using a Data Timehouse for digital business applications and applied generative AI.
In Einstein's physics, time is the fourth dimension. Some of our biggest questions involve time and space. For humans, time is the underlying fabric that ties every facet of life together. It adds context, allowing us to comprehend what occurred, when, and perhaps, why.
Machines need to understand time, too. Time is the ultimate "natural sequence," and ordering information by it helps uncover patterns and derive meaning. That's where the Data Timehouse comes in: a novel approach to data management that belongs in every enterprise toolkit.
Data Timehouse technology helps humans and machines explore time together. Whether you're a data scientist, programmer, or analyst, temporal data can help you reach insights faster and more efficiently than ever.
Why We Need a New Way to Manage Time
Recently, two developments in the technology sector have created a compelling need for more efficient ways to manage and organize time-oriented data.
First, sensor data is now more accessible and inexpensive than ever before. Almost every product we buy today has embedded sensors emitting temporal sequence data. Answering time-related questions such as "What happened?", "What's about to happen?", and "What could happen if we make some tweaks?" has become a universal challenge.
Second, the rise of data science has spurred widespread demand for storing data efficiently as time series and vectors. Organizing data this way is crucial for AI algorithms such as approximate nearest neighbor (ANN) search, which is used for similarity search and anomaly detection.
A Data Timehouse is vital for applications that manage IoT sensor data or employ algorithms like similarity search.
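To make the similarity-search idea concrete, here is a minimal Python sketch over a single numeric series. It swaps in exact, brute-force nearest-neighbor search for the approximate (ANN) indexes mentioned above; an ANN index would return roughly the same neighbors without scanning every window. The function names `nearest_windows` and `discord` are illustrative, not part of any product's API, and the anomaly score (distance from a window to its closest non-overlapping window) is one simple choice among many.

```python
import math


def sliding_windows(series, width):
    """Yield (start_index, window) for every contiguous window of `width` points."""
    for start in range(len(series) - width + 1):
        yield start, series[start:start + width]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_windows(series, query, k=3):
    """Exact k-nearest-neighbor search: the k (distance, start) pairs whose
    windows are closest to `query`. An ANN index approximates this result
    without comparing the query against every window."""
    scored = [(euclidean(window, query), start)
              for start, window in sliding_windows(series, len(query))]
    scored.sort()
    return scored[:k]


def discord(series, width):
    """Brute-force anomaly detection: return (start, score) for the window
    whose nearest non-overlapping window is farthest away."""
    windows = list(sliding_windows(series, width))
    best_start, best_score = None, -1.0
    for start_i, w_i in windows:
        # Ignore overlapping windows, which trivially resemble w_i.
        nearest = min(
            (euclidean(w_i, w_j) for start_j, w_j in windows
             if abs(start_i - start_j) >= width),
            default=math.inf,
        )
        if nearest != math.inf and nearest > best_score:
            best_start, best_score = start_i, nearest
    return best_start, best_score
```

On the series `[0, 1, 0, 1, 0, 1, 5, 1, 0, 1, 0, 1]`, the windows matching `[0, 1, 0]` start at indexes 0, 2, and 8, and the most anomalous width-3 window is the one beginning at index 4, where the spike first enters.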
A database designed for time has three essential elements.
Three Elements of a Data Timehouse
A Data Timehouse is a specialized database that stores, compresses, indexes, and retrieves data ordered by time. This makes it a better choice for applications that process time than relational, unstructured, graph, or star schema data stores.
It has three essential elements:
#1: A DATA TIMEHOUSE MUST STORE, ORGANIZE, AND OPTIMIZE DATA ACCORDING TO TIME
To be genuinely game-changing, a technology usually needs to be dramatically faster than the alternatives, by as much as fifty times. A Data Timehouse is designed from the ground up to store data on disk in temporal order, so its storage layout matches the in-memory representation used to process it. Queries over a time range can then read contiguous data instead of gathering scattered rows.
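The payoff of time-ordered storage is easiest to see in miniature. This Python sketch assumes a hypothetical `TimeOrderedColumns` table that keeps one list per column, sorted by timestamp; a real engine would persist these columns on disk and compress them. Because rows are already in time order, a range query is two binary searches followed by a single contiguous slice.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class TimeOrderedColumns:
    """Hypothetical in-memory table: one list per column, kept sorted by time."""
    timestamps: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def append(self, ts, value):
        """Append a reading; this sketch assumes data arrives in time order."""
        if self.timestamps and ts < self.timestamps[-1]:
            raise ValueError("out-of-order timestamp")
        self.timestamps.append(ts)
        self.values.append(value)

    def between(self, start, end):
        """Rows with start <= ts < end. Two binary searches locate the
        boundaries; the matching rows form one contiguous slice."""
        lo = bisect_left(self.timestamps, start)
        hi = bisect_left(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))
```

For example, after appending readings at times 100, 105, 110, and 120, `between(105, 120)` returns the rows at 105 and 110. Rejecting out-of-order appends is a simplifying assumption; production systems typically buffer late data and merge it in.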
For example, RxDataScience/Syneos, a company specializing in clinical trials, uses time to evaluate how similar potential participants are, based on temporal patterns in their trial participation. The company reports that by using a Data Timehouse to manage data about the temporal order of, and relationships between, trial participants, many of its queries run 100 times faster at one-tenth of the computing cost.
This approach helps select participants who accurately represent the target patient population, a critical element of effective trials, which in turn makes the trial process smoother and more efficient.
#2: THEY PROVIDE A QUERY LANGUAGE THAT MAKES IT EASY TO ASK AND ANSWER QUESTIONS ABOUT TIME
A Data Timehouse provides a time-based query interface that makes it easy to ask questions about time. For example, imagine you run applications that monitor billions of IoT sensors and need to answer questions like "What happened at 9:29 AM this morning?" In a Data Timehouse, that question maps directly onto a query over a time window.
Modern Data Timehouse tools expose this query interface through languages and frameworks such as Python, SQL, and TensorFlow, as well as through low-code/no-code visual interfaces.
By representing temporal data directly in the query language, a Data Timehouse helps developers express logic about time more easily. KX Systems, a Data Timehouse vendor, says its engine answers questions “at the speed of thought.”
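The 9:29 AM question can be sketched in plain Python, assuming a small hypothetical event log that is already sorted by time. `happened_during` answers "what happened in that minute" as a time-window query, and `as_of` returns each sensor's latest reading at or before a given instant, the kind of lookup that engines such as KX's kdb+ expose as an as-of join. This is illustrative code, not any vendor's query language, and the sample readings are made up.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta

# Hypothetical event log, sorted by timestamp: (time, sensor_id, reading).
events = [
    (datetime(2024, 1, 15, 9, 28, 50), "s1", 20.1),
    (datetime(2024, 1, 15, 9, 29, 5), "s2", 33.0),
    (datetime(2024, 1, 15, 9, 29, 40), "s1", 20.4),
    (datetime(2024, 1, 15, 9, 30, 2), "s2", 35.5),
]
times = [e[0] for e in events]


def happened_during(start, length=timedelta(minutes=1)):
    """Time-window query: events with start <= time < start + length."""
    lo = bisect_left(times, start)
    hi = bisect_left(times, start + length)
    return events[lo:hi]


def as_of(at):
    """As-of lookup: each sensor's latest (time, reading) at or before `at`."""
    latest = {}
    for ts, sensor, reading in events[:bisect_right(times, at)]:
        latest[sensor] = (ts, reading)  # later events overwrite earlier ones
    return latest
```

Here `happened_during(datetime(2024, 1, 15, 9, 29))` returns the 9:29:05 and 9:29:40 events, while `as_of` at 9:29:00 reports only sensor `s1`, whose last reading arrived at 9:28:50.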
#3: THEY’RE DESIGNED FOR STREAMING DATA
IoT-connected devices transmit data that is best organized by time, so it is crucial to collect, filter, collate, and store it as time-series data. The catch is that connected devices emit a lot of data: thousands, millions, billions, or even trillions of updates a day.
To make sense of this information, it's essential to filter, aggregate, and store it on the fly. Applications need to see streaming data as quickly as possible (within minutes, seconds, or, in real-time scenarios, milliseconds) so they can act while the insights still matter.
A Data Timehouse stores streaming data in temporal order, which makes it well suited to answering typical enterprise questions like:
At what point did the temperature reading reach hazardous levels and continue to exceed them for more than five minutes?
Display the five most recent attempts to access the potentially compromised account.
Retrieve this stock's five most recent instances of sudden volatility changes.
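The first two questions above map onto short routines. In this Python sketch, `sustained_breach` scans time-ordered (timestamp, value) pairs and reports when the first run of above-threshold readings lasting more than five minutes began, and `RecentEvents` keeps only the N most recent events, such as access attempts on one account. Both names, and the choice to measure a run from its first to its latest above-threshold reading, are assumptions for illustration.

```python
from collections import deque
from datetime import timedelta


def sustained_breach(readings, threshold, duration=timedelta(minutes=5)):
    """Return when the first run of readings above `threshold` lasting more
    than `duration` began, or None. `readings` is a time-ordered list of
    (timestamp, value) pairs; a run spans its first to latest breach."""
    run_start = None
    for ts, value in readings:
        if value > threshold:
            if run_start is None:
                run_start = ts  # a new run of hazardous readings begins
            if ts - run_start > duration:
                return run_start
        else:
            run_start = None  # back to a safe level; reset the run
    return None


class RecentEvents:
    """Hypothetical helper that keeps only the N most recent events."""

    def __init__(self, n=5):
        self._events = deque(maxlen=n)  # oldest entries are evicted automatically

    def add(self, event):
        self._events.append(event)

    def most_recent(self):
        return list(reversed(self._events))  # newest first
```

Keeping a bounded `deque` per account means the "five most recent attempts" question never has to scan the full history.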
To achieve high-speed ingest, a Data Timehouse uses micro-batching, data compression, and time windows to store data efficiently. This requires balancing tradeoffs between volume, latency, and algorithmic filtering. Some Data Timehouses provide control over these tradeoffs, enabling developers and administrators to tailor storage to their application's specific requirements.
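The ingest techniques described above can be sketched in a few lines of Python. The helpers are hypothetical and deliberately simple: `micro_batches` groups a stream into fixed-size batches, `tumbling_window_avg` aggregates readings into non-overlapping time windows, and `delta_encode` stores timestamps as differences, a common first step in time-series compression. Timestamps are assumed to be integers in a single unit, such as seconds.

```python
from collections import defaultdict


def micro_batches(stream, batch_size):
    """Group incoming items into fixed-size batches so writes happen in bulk."""
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch


def tumbling_window_avg(readings, window):
    """Average (timestamp, value) readings per tumbling window of `window` units."""
    sums = defaultdict(lambda: [0.0, 0])  # window start -> [total, count]
    for ts, value in readings:
        bucket = ts - ts % window  # start of the window containing ts
        sums[bucket][0] += value
        sums[bucket][1] += 1
    return {start: total / count for start, (total, count) in sorted(sums.items())}


def delta_encode(timestamps):
    """Keep the first timestamp, then differences; regular sampling produces
    small, repetitive deltas that compress well."""
    if not timestamps:
        return []
    return [timestamps[0]] + [b - a for a, b in zip(timestamps, timestamps[1:])]


def delta_decode(encoded):
    """Invert delta_encode by running a cumulative sum."""
    out = []
    for i, d in enumerate(encoded):
        out.append(d if i == 0 else out[-1] + d)
    return out
```

The two parameters mirror the tradeoffs in the text: a larger `batch_size` raises throughput but delays when data becomes visible, and a wider `window` stores less data at the cost of detail.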
Why We Need Something New
Conventional data management tools aren’t designed around time. Data Timehouse technology fills this gap, and industry analysts have called it a “Next Big Thing,” the latest in a long line of data management innovations that began with the relational database more than fifty years ago.
Last month, Gartner Distinguished VP Analyst Daryl Plummer examined the economic shifts associated with the rise of geospatial data and the use of data in large language models for AI. In his keynote, he told chief data and analytics officers to stop force-fitting temporal data into a conventional data store (you can watch the full presentation on YouTube, beginning at 15:24).
Plummer called this technology a “Temporal Data Warehouse,” one that moves us from “data visualization to insight realization,” and presented a logical view of the technology and how it relates to traditional data warehousing.
This Substack explores vector databases and Data Timehouse technology, with a weekly post on how they work, why they matter, emerging use cases, industry trends, and more.
It’s about Time!