Vector Embeddings 101: The New Building Blocks for Generative AI
Fundamental concepts to get you going and get the most from the latest craze in the generative AI community: vector databases.
In Emerging Architectures for LLM Applications, Andreessen Horowitz declared vector databases as “the most important piece of the preprocessing pipeline, from a systems standpoint, responsible for efficiently storing, comparing, and retrieving up to billions of embeddings (i.e., vectors).”
Why such a dramatic and definitive declaration of their importance?
Vectors matter because, despite their impressive capabilities, computers cannot easily understand text, images, audio, or other human-readable data formats. When analyzing these data types, we can instead represent them as numerical “vectors,” which computers can process far more easily. Conventional databases, however, were not built to process vectors, although, thanks to rising interest in generative AI, most are now adding vector support.
But let’s step back and explore why vector embeddings matter. This article explores the basic concepts of vectors, vector embeddings, and vector databases and why they’re helpful. Let’s dive in.
What is a vector?
A vector is a mathematical object: a fixed-length array of numbers that represents both magnitude and direction. To have both magnitude and direction, a vector must have at least two dimensions, such as the two-dimensional vector [1, 1]. The same idea extends to three dimensions, as with the vector [1, 1, 1].
In Machine Learning applications, vectors can have thousands of dimensions – too many to even attempt to visualize. The critical point is that once something has been represented as a mathematical vector, it becomes open to mathematical vector operations such as measuring distances, calculating similarities, and performing transformations. These operations become crucial for various tasks, including similarity search, clustering, classification, and uncovering patterns and trends.
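To make these operations concrete, here is a minimal sketch using NumPy with two arbitrary example vectors; it computes the Euclidean distance and cosine similarity that underpin most similarity searches:

```python
import numpy as np

# Two simple 3-dimensional vectors.
a = np.array([1.0, 1.0, 1.0])
b = np.array([1.0, 2.0, 3.0])

# Euclidean distance: how far apart the two points are.
distance = np.linalg.norm(a - b)

# Cosine similarity: how closely the vectors point in the same direction
# (1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite).
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(f"Euclidean distance: {distance:.3f}")
print(f"Cosine similarity:  {cosine:.3f}")
```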
What is a vector embedding?
A vector embedding, or simply “an embedding,” is a vector created as the numerical representation of typically non-numerical data objects. Embeddings capture the inherent properties and relationships of the original data in a condensed format and are often used in Machine Learning use cases.
For instance, the vector embedding for an image containing millions of pixels, each with a unique color, hue, and contrast, may only have a few hundred or thousand numbers. In this way, embeddings are designed to encode relevant information about the original data in a lower-dimensional space, enabling efficient storage, retrieval, and computation. Simple embedding methods can create sparse embeddings, in which most of the vector’s values are 0, while more complex and “smarter” embedding methods can create dense embeddings, which rarely contain 0’s. However, sparse embeddings are often higher-dimensional than their dense counterparts, requiring more storage space.
Unlike the original data, which may be complex and heterogeneous, embeddings typically strive to capture the essence of the data in a more uniform and structured manner. This transformation process is performed by what’s known as an “embedding model” and often involves complex machine learning techniques.
These models take in data objects, extract meaningful patterns and relationships from this data, and return vector embeddings which algorithms can later use to perform various tasks. Many sophisticated embedding models are openly available online, examples of which we will give in a later section of this article.
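As a minimal sketch of what using such a model looks like, here is one way to generate text embeddings with the open-source sentence-transformers library (the model name "all-MiniLM-L6-v2" is just one example of a freely available embedding model):

```python
from sentence_transformers import SentenceTransformer

# Load a small, openly available text embedding model.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "Vector databases store embeddings efficiently.",
    "A cat sat on the mat.",
]

# encode() returns one dense vector per input sentence.
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 384) for this particular model
```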
Four types of vector embeddings: text, image, audio, and temporal
The precise information contained in an embedding depends on the specific data types and the embedding technique employed.
Embeddings aim to capture semantic, contextual, or structural information relevant to the task. Each embedding model utilizes specific techniques and algorithms tailored to the type of data being dealt with and other characteristics of the data being represented.
Here, we can give examples of what features may be encoded for text, image, audio, and temporal data and provide a list of common techniques used to achieve this:
TEXT EMBEDDINGS
Text embeddings capture the semantic meaning of words and their relationships within a language. For example, they could encode semantic similarities between words, such as "king" being closer to "queen" than to "car".
Common models used for text embeddings include:
TF-IDF (Term Frequency – Inverse Document Frequency) creates sparse embeddings by assigning weights to words based on their occurrence frequency in a document relative to their prevalence across the entire dataset (a short code sketch follows this list).
Word2Vec creates dense vector representations that capture semantic relationships by training a neural network to predict words in context.
BERT (Bidirectional Encoder Representations from Transformers) creates context-rich embeddings that capture bidirectional dependencies by pretraining a transformer model and using this to predict masked words in sentences.
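As referenced above, here is a minimal TF-IDF sketch using scikit-learn on a toy three-document corpus. Notice that each embedding has one dimension per vocabulary term and that most entries are zero – exactly the sparsity discussed earlier:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "the king rules the kingdom",
    "the queen rules the kingdom",
    "the car needs new tires",
]

# Fit TF-IDF on the corpus; each document becomes a sparse vector.
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(documents)

print(vectorizer.get_feature_names_out())  # one dimension per term
print(tfidf_matrix.toarray())              # mostly zeros: sparse embeddings
```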
IMAGE EMBEDDINGS
Image embeddings capture visual features like shapes, colors, and textures. For example, they might encode relationships between colors, such that orange objects are more similar to yellow objects than to black ones.
Common models used for image embeddings include:
Convolutional Neural Networks (CNNs) create dense vector embeddings by passing images through convolutional layers that extract hierarchical visual features.
Transfer Learning with Pretrained CNNs like ResNet and VGG creates embeddings by fine-tuning pre-trained CNNs that have already learned complex visual features from large datasets (see the sketch after this list).
Autoencoders are neural network models trained to encode and then decode images; the compact internal representation they learn serves as an embedding of the raw image.
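To illustrate the transfer-learning approach, here is a sketch that turns an image into a 512-dimensional embedding by removing the classification layer from a ResNet-18 pre-trained on ImageNet (using PyTorch and torchvision; example.jpg is a hypothetical input file):

```python
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image

# Load a pre-trained ResNet-18 and drop its final classification layer,
# keeping the layers that output a 512-dimensional feature vector.
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
extractor = torch.nn.Sequential(*list(resnet.children())[:-1])
extractor.eval()

# Standard ImageNet preprocessing.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("example.jpg").convert("RGB")  # hypothetical input file
batch = preprocess(image).unsqueeze(0)            # shape (1, 3, 224, 224)

with torch.no_grad():
    embedding = extractor(batch).squeeze()        # shape (512,)
print(embedding.shape)
```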
AUDIO EMBEDDINGS
Audio embeddings capture characteristics of audio signals, such as pitch, frequency content, or speaker identity. For example, they can encode the sound of a piano and the sound of a guitar as distinct numerical representations that reflect the acoustic features of each instrument, enabling differentiation.
Common models for audio embeddings include:
Spectrogram-based Representations create embeddings by first converting the audio into visual representations, like spectrograms, and then applying image-based methods to embed these images as vectors.
MFCCs (Mel Frequency Cepstral Coefficients) create vector embeddings by calculating spectral features of the audio and using these to represent the sound content (a minimal sketch follows this list).
Convolutional Recurrent Neural Networks (CRNNs) create vector embeddings by combining convolutional and recurrent layers to model both spectral features and sequential context, producing informative audio representations.
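As referenced above, here is a minimal MFCC sketch using the librosa library (example.wav is a hypothetical input file). Averaging the per-frame coefficients over time is one simple way to pool them into a single fixed-length embedding for the whole clip:

```python
import librosa
import numpy as np

# Load an audio file (librosa resamples to 22,050 Hz by default).
y, sr = librosa.load("example.wav")

# Compute 13 MFCCs per frame: shape (13, num_frames).
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Average over time to get one 13-dimensional embedding for the clip.
embedding = np.mean(mfccs, axis=1)
print(embedding.shape)  # (13,)
```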
TEMPORAL EMBEDDINGS
Temporal embeddings capture the patterns and dependencies in time-series data. For example, they can encode heart-rate time series in medical systems so that readings from a person at rest, asleep, or running a marathon can be compared for similarity.
Examples of temporal embedding models include:
LSTM (Long Short-Term Memory) models create embeddings by capturing long-range dependencies and temporal patterns in sequential data using a recurrent neural network (RNN) architecture.
Transformer-based Models create vector embeddings using a self-attention mechanism to capture complex temporal patterns in the input sequence.
Fast Fourier Transform (FFT) creates vector embeddings that capture periodic patterns and spectral information in the temporal data by converting it into its frequency-domain representation and extracting frequency components.
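To illustrate the FFT approach just described, here is a NumPy sketch over a synthetic signal; the magnitudes of the first few frequency components serve as a fixed-length embedding of the signal's periodic structure:

```python
import numpy as np

# A synthetic time series: a slow rhythm plus random noise.
t = np.linspace(0, 10, 1000)
signal = np.sin(2 * np.pi * 1.2 * t) + 0.1 * np.random.randn(1000)

# The real-valued FFT converts the series to the frequency domain.
spectrum = np.abs(np.fft.rfft(signal))

# Keep the magnitudes of the first k frequency components
# as a fixed-length embedding of the signal's periodicity.
k = 32
embedding = spectrum[:k]
print(embedding.shape)  # (32,)
```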
What can you do with vector embeddings?
So, what can we do with these vector embeddings once we have obtained them?
Similarity search: Use embeddings to measure the similarity between different instances. For example, in Natural Language Processing (NLP), you can find similar documents or identify related words based on their embeddings (see the sketch after this list).
Clustering and classification: Use embeddings as the input features for clustering and classification models to train machine-learning algorithms to group similar instances and classify objects.
Information retrieval: Utilize embeddings to build powerful search engines that can find relevant documents or media based on user queries.
Recommendation systems: Leverage embeddings to recommend related products, articles, or media based on user preferences and historical data.
Visualizations: Visualize embeddings in lower-dimensional spaces to gain insights into the relationships and patterns within the data.
Transfer learning: Use pre-trained embeddings as a starting point for new tasks, allowing you to leverage existing knowledge and reduce the need for extensive training.
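To make the first item above concrete, here is a brute-force cosine-similarity search sketch over a toy set of random embeddings. A vector database performs essentially this comparison at scale, typically with approximate indexes instead of an exhaustive scan:

```python
import numpy as np

# A toy "database" of 5 embeddings (one per row) in a 4-dimensional space.
database = np.random.rand(5, 4)
query = np.random.rand(4)

# Cosine similarity between the query and every stored embedding.
norms = np.linalg.norm(database, axis=1) * np.linalg.norm(query)
similarities = database @ query / norms

# Indices of the top-3 most similar embeddings, best match first.
top_k = np.argsort(similarities)[::-1][:3]
print(top_k, similarities[top_k])
```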
Vector embeddings: the foundation of your new generative AI house
Vector embeddings bridge the gap between human-readable data and computational algorithms. By representing diverse data types as numerical vectors, we unlock the potential for a wide range of Generative AI applications. These embeddings condense complex information, capture relationships, and enable efficient processing, analysis, and computation.
Armed with vector embeddings, you can explore and transform data to facilitate new ways to understand information, make better decisions, and innovate with generative AI applications.