



New similarity search operators in pgvector

The pgvector extension also introduces operators for similarity search on vectors, which let you find the vectors that are semantically closest to a query vector. Two of these operators are:

<->: Returns the Euclidean (L2) distance between two vectors. Euclidean distance suits applications where vector magnitude matters, such as mapping and navigation applications, or implementing k-means clustering in machine learning.

<=>: Returns the cosine distance between two vectors, which is 1 minus their cosine similarity. Cosine distance suits applications where the direction of the vectors matters more than their magnitude, for example finding the documents most similar to a given document in a recommendation system or a natural language processing task.
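To make the difference concrete, here is a plain-Python sketch that computes the same two quantities that pgvector returns for <-> and <=>. It does not call pgvector itself, and the function names are illustrative. Two vectors that point in the same direction but differ in length are far apart by Euclidean distance, yet their cosine distance is zero.

```python
import math


def euclidean_distance(a, b):
    """L2 distance: the quantity pgvector's <-> operator returns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity: the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


short_vec = [1.0, 1.0]
long_vec = [3.0, 3.0]  # same direction, three times the magnitude

print(euclidean_distance(short_vec, long_vec))  # about 2.83: magnitude counts
print(cosine_distance(short_vec, long_vec))     # about 0.0: only direction counts
```

Orthogonal vectors such as [1, 0] and [0, 1] have a cosine distance of 1, which is why cosine distance is a natural fit for comparing document embeddings regardless of their length.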

The sample application uses the cosine distance operator (<=>).
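A minimal Python sketch of a nearest-neighbor query built around <=>. The table products, its columns product_name and embedding, and the helper names are hypothetical. The sketch assumes a DB-API driver such as psycopg that uses %s placeholders. pgvector accepts vectors in a text form such as '[1,2,3]', so the embedding is formatted that way and cast with ::vector. Subtracting the cosine distance from 1 gives the cosine similarity.

```python
def to_vector_literal(embedding):
    """Format a list of numbers in pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


# Hypothetical schema: products(product_name text, embedding vector(N)).
NEAREST_PRODUCTS_SQL = """
    SELECT product_name,
           1 - (embedding <=> %s::vector) AS cosine_similarity
    FROM products
    ORDER BY embedding <=> %s::vector
    LIMIT %s
"""


def nearest_products_query(query_embedding, k=5):
    """Return (sql, params) for the k products closest by cosine distance."""
    vec = to_vector_literal(query_embedding)
    return NEAREST_PRODUCTS_SQL, (vec, vec, k)
```

With an open psycopg connection conn (not shown), calling sql, params = nearest_products_query(embedding) and then conn.cursor().execute(sql, params) would return the five closest products. Ordering by the raw <=> expression, rather than by the computed similarity, keeps the query in the form that a pgvector index built with the cosine operator class can serve.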

Building a sample application

Let’s start building an application using pgvector and an LLM. The application also uses LangChain, an open-source framework that provides pre-built components that make it easier to build complex LLM applications.

The entire application is available as an interactive Google Colab notebook for Cloud SQL for PostgreSQL. You can run it directly from your web browser without installing anything or writing a single line of code.

Follow the instructions in the Colab notebook to set up your environment. Note that the notebook creates a Cloud SQL for PostgreSQL instance if one with the required name doesn’t already exist. Running the notebook may incur Google Cloud charges; you may be eligible for a free trial that provides credits to cover these costs.

Loading the toy dataset

This sample application uses the example of an e-commerce company that operates an online marketplace for buying and selling children’s toys. The dataset in this notebook was created by sampling a large public retail dataset available on Kaggle. It contains only about 800 toy products, whereas the full public dataset contains over 370,000 products across many categories.

After setting up your environment by following the instructions in the Colab notebook, load the provided sample dataset into a pandas DataFrame. For reference, here are the first five rows of the dataset.
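A minimal sketch of this step, assuming the dataset is available as a CSV file. The function names are hypothetical, and the notebook's own loading code may differ.

```python
import pandas as pd


def load_toy_dataset(csv_path):
    """Read the toy-product dataset (assumed to be CSV; a path or URL) into a DataFrame."""
    return pd.read_csv(csv_path)


def preview(df, n=5):
    """Return the first n rows, such as the five rows shown for reference."""
    return df.head(n)
```

For example, products_df = load_toy_dataset("toy_products.csv") followed by preview(products_df) would display the first five rows; the file name here is a placeholder, not the notebook's actual path. pd.read_csv also accepts a URL, so the same function works if the dataset is hosted remotely.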

