This module explores the basics of vector representations: what vectors are, how they represent data in information retrieval, and how similarities between them can be computed mathematically.

Before diving in, if you need help, guidance, or want to ask questions, join our Community and a member of the Marqo team will be there to help.

**1. Definition of Vectors**

Vectors are fundamental mathematical objects used extensively in computer science and data science. A vector, \( \textbf{v} \), is typically described by a list of numbers [1]:

$$\textbf{v} = \left(6, 3, \dots, 7\right)$$

Note that the entries do not have to be integers; they can be any real numbers, including decimals, fractions, and so on:

$$\textbf{v} = \left(0.1, \frac{1}{5}, \sqrt{2}, \dots, 5\right)$$

More generally, we write a vector, \( \textbf{v} \), in the form:

$$\textbf{v} = (v_1, v_2, \dots, v_n),$$

where \( n \) represents the number of dimensions of the vector (also the same as the number of elements in that vector).

For example, a vector composed of two numbers is known as a two-dimensional vector. Take \( \textbf{v} = (1, 1) \) as an example. This can be represented on a two-dimensional \( (x, y) \) grid as follows:

where \( \textbf{v} = (x = 1, y = 1) \).

We can also extend vectors to three dimensions, as seen below. They can be extended to even higher dimensions, but that gets a little tricky to visualise!

Two key properties of vectors are their **magnitude** and **direction**. Both play a vital role in artificial intelligence systems, as we’ll see in the next section.

**Magnitude:** The magnitude (or length) of a vector, \( \textbf{v} \), is calculated by a mathematical formula known as the Euclidean norm [1],

$$||\textbf{v}|| = \sqrt{v_1^2 + v_2^2 + \dots + v_n^2},$$

where \( n \) represents the dimension of the vector.

**Direction:** The direction of a vector, \( \theta \), is given by the angle it forms with the coordinate axes. In two dimensions, the direction, \( \theta \), of a vector \( \textbf{v} \), measured from the positive \( x \)-axis, can be found using the following formula [1] (valid when \( v_1 > 0 \)),

$$\theta = \tan^{-1} \left(\frac{v_2}{v_1}\right).$$

So for our previous two-dimensional example where \( \textbf{v} = (1, 1) \), the magnitude is given by,

$$||\textbf{v}|| = \sqrt{1^2 + 1^2} = \sqrt{2},$$

and the direction is,

$$\theta = \tan^{-1} \left(\frac{1}{1}\right) = \tan^{-1}(1) = 45^\circ.$$

Both these quantities can be seen below.
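The same two quantities can be checked numerically. The snippet below is a minimal sketch in plain Python; the helper names `magnitude` and `direction_2d` are illustrative, not part of any library. It uses `math.atan2` rather than a plain inverse tangent so that vectors with \( v_1 \le 0 \) are also handled.

```python
import math

def magnitude(v):
    """Euclidean norm: square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in v))

def direction_2d(v):
    """Angle in degrees between a 2D vector and the positive x-axis."""
    # atan2 picks the correct quadrant and copes with v[0] == 0,
    # which a plain arctan of v[1] / v[0] would not.
    return math.degrees(math.atan2(v[1], v[0]))

v = (1, 1)
length = magnitude(v)
angle = direction_2d(v)
print(length)  # approximately 1.4142, i.e. sqrt(2)
print(angle)   # approximately 45 degrees
```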

It’s worth noting that a **vector** is a one-dimensional array of numbers: each element is addressed by a single index. To extend this to arrays with more indices, we use what are known as **tensors**. These are more general mathematical objects that allow for the representation of multidimensional arrays of numbers with multiple indices (a matrix, for example, has two). This comes in handy when we have *a lot* of data!
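One way to picture the difference is to count how many indices are needed to reach a single number. The sketch below uses nested Python lists as a stand-in for tensors; the `shape` helper is a hypothetical convenience written for this example and assumes regularly nested, non-empty lists.

```python
def shape(t):
    """Return the size along each index of a regularly nested list."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]  # descend one level; assumes every level is non-empty
    return tuple(dims)

vector = [6, 3, 7]                               # one index:    vector[i]
matrix = [[1, 2, 3], [4, 5, 6]]                  # two indices:  matrix[i][j]
tensor3 = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # three indices: tensor3[i][j][k]

print(shape(vector))   # (3,)
print(shape(matrix))   # (2, 3)
print(shape(tensor3))  # (2, 2, 2)
```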

So vectors are pretty cool, right? But what do they have to do with computer science and artificial intelligence systems?

**2. Vector Representation**

We’ve seen that vectors store information in the form of numbers. So, the next question is, can these vectors of numbers describe something more complex? A word? A sentence? Even an image? Well, the answer is yes!

Vectors can be used as numerical representations of complex information such as words, sentences, images, videos and more. This makes sense because, for a computer to process our input, we have to convert it into a machine-readable format. In addition, language inherently carries a lot of information, so even small snippets of text can require a considerable volume of data to represent! Let’s start with an example.

The word “King” might be represented as the vector \( \textbf{v}_K = (0.5, 0.1, 0.3, \dots) \) and the word “Queen” might be represented by the vector \( \textbf{v}_Q = (0.6, 0.2, -0.4, \dots) \) [2]. We’ve added subscripts to the vectors here so we know which is “King” and which is “Queen”. If you’re wondering how we generate these vectors, we’ll be covering that in the next module!

If you look at the numbers in each of the vectors \( \textbf{v}_K \) and \( \textbf{v}_Q \), you might notice that several of them are close in value. Here’s what the vectors might look like in two-dimensional space [2]:

Both vectors are close in the vector space which indicates their *semantic* similarity. Semantic similarity refers to the likeness or closeness in meaning between two pieces of text, words, phrases, sentences, or documents.

This concept of representing words as vectors and measuring their semantic similarity isn't limited to just words like "King" and "Queen". We can extend this idea to represent entire sentences, paragraphs, or documents in a similar manner!

For instance, a sentence like "The dog sat on the mat" might be represented as a vector, and another sentence like "A canine lay on the rug" would have its own vector representation. Despite using different words, these sentences convey similar meanings, and thus their vectors would likely be close together in the vector space.

Furthermore, this approach isn't confined to textual data. Images, for example, can also be represented as vectors. Each pixel in an image can be assigned a value, and these values collectively form a vector representation of the image [3]. Similarly, videos, audio clips, and even more complex data structures can be represented using vectors.
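As a rough illustration of the pixel-based view described above, the sketch below flattens a tiny grayscale image into a single vector. The 2x3 image and its brightness values are made up for the example; embeddings used in practice are generally produced by a model rather than read directly off the pixels.

```python
# A hypothetical 2x3 grayscale image: one brightness value (0-255) per pixel.
image = [
    [0, 128, 255],
    [64, 192, 32],
]

# Read the pixels row by row to form one flat vector.
image_vector = [pixel for row in image for pixel in row]

print(image_vector)       # [0, 128, 255, 64, 192, 32]
print(len(image_vector))  # 6, i.e. rows * columns
```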

By leveraging these vector representations, we can perform various operations such as measuring similarity, performing classification tasks, or even generating new content through techniques like vector arithmetic or neural networks.

The vectors mentioned in this section are also known as **vector embeddings**. These embeddings are crucial when building efficient artificial intelligence systems. In the next module, we'll look into how these vector embeddings are generated and how they can be applied to solve real-world problems across different domains. First, we will establish the different types of vectors.

**3. Dense vs Sparse Vectors**

We now know that we can represent language as vectors, but there are two ways to do so: we can use either **dense** or **sparse** vectors.

Let’s consider the make-up of dense and sparse vectors. Dense vectors have very few elements that are zero, meaning they are populated with *dense* numerical information:

$$\textbf{v}_{dense} = (0.3, -0.4, 0.9, 0.9, 0.7, 0.1, \dots, 0.5).$$

Dense vectors are straightforward to work with and are often used when the data has no inherent sparsity or when computational efficiency isn't a major concern.

On the other hand, sparse vectors have many elements that are zero, meaning they are *sparse* in numerical information:

$$\textbf{v}_{sparse} = (0, 0, 0, 0, 0, 1, \dots, 0).$$

These are used when the data being represented is inherently sparse, meaning there are many missing or zero values. Sparse vectors are memory-efficient and often preferred when dealing with high-dimensional data where most values are zero.
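The memory saving comes from storing only the non-zero entries. The following is a minimal sketch of that idea using a dictionary of `{index: value}` pairs; the helper names `to_sparse` and `to_dense` are hypothetical, and real libraries use more compact formats than this.

```python
def to_sparse(dense):
    """Keep only the non-zero entries as {index: value}."""
    return {i: x for i, x in enumerate(dense) if x != 0}

def to_dense(sparse, n):
    """Rebuild the full length-n list, filling the gaps with zeros."""
    return [sparse.get(i, 0) for i in range(n)]

dense = [0, 0, 0, 0, 0, 1, 0, 0, 2, 0]
sparse = to_sparse(dense)

print(sparse)                               # {5: 1, 8: 2}
print(to_dense(sparse, len(dense)) == dense)  # True
```

Here ten numbers are reduced to two stored entries; the saving grows with the proportion of zeros.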

What does this mean in the context of representing data? Let’s consider the following sentences:

and,

The two sentences have very different meanings, but both contain the *exact same words*. For this reason, sparse vectors would generate a perfect (or near-perfect) match for them. That’s because sparse vectors of this kind are based on the presence or absence of words, rather than their order or context. Therefore, they treat sentences with the same words as highly similar, even if the meanings differ significantly.
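To make this concrete, here is a minimal bag-of-words sketch. The two sentences are hypothetical stand-ins chosen for illustration, and `bag_of_words` is an illustrative helper that simply counts how often each vocabulary word occurs.

```python
from collections import Counter

def bag_of_words(sentence, vocabulary):
    """Count how many times each vocabulary word occurs in the sentence."""
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocabulary]

# Hypothetical example sentences: same words, different meaning.
sentence_a = "the dog chased the cat"
sentence_b = "the cat chased the dog"

vocabulary = sorted(set(sentence_a.split()) | set(sentence_b.split()))
vec_a = bag_of_words(sentence_a, vocabulary)
vec_b = bag_of_words(sentence_b, vocabulary)

print(vocabulary)      # ['cat', 'chased', 'dog', 'the']
print(vec_a)           # [1, 1, 1, 2]
print(vec_b)           # [1, 1, 1, 2]
print(vec_a == vec_b)  # True: word order is lost
```

With a realistic vocabulary of many thousands of words, almost every entry of such a vector would be zero, which is what makes it sparse.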

While sparse vectors of this kind capture which words appear in a text, dense vectors are able to capture the semantic meaning behind the information they are representing. The corresponding dense vector representations of the sentences would not be as similar. This highlights the advantage of dense vectors in capturing the nuanced meanings and contextual relationships within text, making them more effective for tasks requiring semantic understanding.

Let’s say we create dense vectors for all the words in a cookbook, reduce their dimensionality, and visualise them in 3D. They may look as follows:

Notice how words with similar meaning (semantically similar) are clustered together. In this case, we have the names of fruit clustered together. This property of dense vector embeddings means that machine learning models can effectively capture the semantic relationships between words and entities.

Another famous example [2] in the field of Natural Language Processing (NLP) is the equation:

$$\text{King} - \text{Man} + \text{Woman} \approx \text{Queen}$$

This equation exemplifies the power of dense vectors in NLP. In this equation, words are represented as dense vectors in a high-dimensional space, capturing relationships through vector arithmetic. Here’s what it may look like in vector space:

By subtracting the vector representation of “Man” from that of “King” and adding that of “Woman”, we arrive at a vector that lies close to “Queen”. This shows that the model has learned to capture the relationship between these words. Pretty cool!
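The arithmetic can be tried on a toy example. The two-dimensional vectors below are invented purely for illustration (one axis loosely standing for "royalty", the other for "gender"); real word embeddings have hundreds of dimensions and are learned from data, and in practice the input words are usually excluded from the nearest-neighbour search.

```python
import math

# Toy 2D embeddings, invented for illustration only.
# Axis 0 loosely means "royalty", axis 1 loosely means "gender".
embeddings = {
    "king":  (0.9, 0.8),
    "queen": (0.9, -0.8),
    "man":   (0.1, 0.8),
    "woman": (0.1, -0.8),
}

def cosine_similarity(u, v):
    """Cosine of the angle between two 2D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# King - Man + Woman, computed component by component.
result = tuple(
    k - m + w
    for k, m, w in zip(embeddings["king"], embeddings["man"], embeddings["woman"])
)

# Find the word whose vector points in the most similar direction.
nearest = max(embeddings, key=lambda word: cosine_similarity(result, embeddings[word]))
print(nearest)  # queen
```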

But, how do we compute whether vectors are similar to one another?

**4. Similarity Measures**

Similarity measures in vector embeddings are important because they allow us to quantify how closely related words or entities are in meaning. These measures allow for efficient retrieval of similar items, relevance ranking, and personalised recommendations in applications such as information retrieval, recommendation systems, and clustering.

One of the most common similarity measures is **cosine similarity**. This measures the cosine of the angle between two vectors, \( \textbf{u}, \textbf{v} \) in a multi-dimensional space. Mathematically, we calculate this by,

$$\text{cosine similarity} = \cos(\theta) = \frac{\textbf{u} \cdot \textbf{v}}{||\textbf{u}|| \cdot ||\textbf{v}||},$$

where \( \textbf{u} \cdot \textbf{v} \) is the dot product of \( \textbf{u} \) and \( \textbf{v} \), and \( ||\textbf{u}|| \) and \( ||\textbf{v}|| \) are the magnitudes of the vectors \( \textbf{u} \) and \( \textbf{v} \).

Computing the cosine similarity will return a value in the range \( \cos(\theta) \in [-1, 1] \), where a value of 1 means the vectors point in exactly the same direction, 0 means they are orthogonal (no similarity) and -1 means they point in opposite directions (maximum dissimilarity).
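As a short worked example (the vector \( \textbf{u} = (1, 0) \) is chosen here purely for illustration), take \( \textbf{u} = (1, 0) \) and the earlier vector \( \textbf{v} = (1, 1) \):

$$\cos(\theta) = \frac{\textbf{u} \cdot \textbf{v}}{||\textbf{u}|| \cdot ||\textbf{v}||} = \frac{1 \cdot 1 + 0 \cdot 1}{\sqrt{1^2 + 0^2} \cdot \sqrt{1^2 + 1^2}} = \frac{1}{\sqrt{2}} \approx 0.707.$$

This corresponds to \( \theta = 45^\circ \), which agrees with the direction found for \( \textbf{v} = (1, 1) \) in the first section, since \( \textbf{u} \) lies along the \( x \)-axis.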

Let’s test this out through code! For this course, we will be using Google Colab (it’s free!). If you are new to Google Colab, you can follow this guide on getting set up - it’s super easy! For this module, you can find the notebook on Google Colab here or on GitHub here. As always, if you face any issues, join our Slack Community and a member of our team will help!
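The snippet below is a minimal sketch of such a computation in plain Python. The two vectors are hypothetical values, chosen so that they are identical except for the last entry, which differs by 0.1; the function name `cosine_similarity` is likewise an illustrative choice.

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their magnitudes."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical vectors: same entries apart from the last one.
u = [1.0, 2.0, 3.0]
v = [1.0, 2.0, 3.1]

similarity = cosine_similarity(u, v)
print(similarity)  # approximately 0.9999, just below 1
```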


The output:

Notice how the two vectors have the same entries except for the last number, which deviates by only 0.1. Because the vectors are so closely related, the cosine similarity score is close to 1.

Let's try the same but for very different vectors.
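Again as a sketch under assumptions, the vectors here are hypothetical and were picked so that they point in roughly opposite directions:

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their magnitudes."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical vectors pointing in roughly opposite directions.
u = [1.0, 2.0, 3.0]
v = [-3.0, -1.0, -2.0]

similarity = cosine_similarity(u, v)
print(similarity)  # approximately -0.79
```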

The output:

Notice how we now have a negative value for the cosine similarity. This means that these two vectors are not similar. Why don't you try out a few different vectors (try varying the number of elements inside) and see what results you get!

**5. Summary**

In this module we’ve covered how and why vectors are used to represent various types of information. But how do we actually generate these vector embeddings? Let’s find out in the next module!

**6. References**

[1] M. Corral, Vector Calculus (2022)

[2] T. Mikolov, et al., Linguistic Regularities in Continuous Space Word Representations (2013)

[3] Y. LeCun, et al., Gradient-Based Learning Applied to Document Recognition (1998)