Image Classification in Python with FastEmbed and Qdrant

Introduction

In this article, we'll explore how to use FastEmbed and Qdrant for image classification. FastEmbed is a lightweight, fast Python library for generating high-quality text and image embeddings. Qdrant is a vector database that performs similarity search under several distance metrics, including cosine similarity. This article walks through integrating the two tools, creating embeddings for images, and classifying new images in Python.
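Cosine similarity compares the direction of two vectors and ignores their length: 1 means they point the same way, 0 means they are orthogonal. The sketch below computes it in plain Python purely for illustration; it is not how Qdrant implements the metric internally, and `cosine_similarity` is a hypothetical helper name.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors a and b (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings"; real CLIP vectors have hundreds of dimensions.
print(cosine_similarity([1.0, 0.0, 0.0], [2.0, 0.0, 0.0]))  # → 1.0 (same direction)
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # → 0.0 (orthogonal)
```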

Combined, FastEmbed and Qdrant can perform image classification efficiently even on older hardware. The approach is nearest-neighbor classification: each labeled training image is stored in Qdrant as an embedding, and a new image is assigned the label of the most similar stored embedding. We'll set up the environment, prepare the data, embed the images, and classify a new one.

Setting Up the Environment

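Start by installing the required libraries with pip. The fastembed extra of qdrant-client pulls in FastEmbed as well, so the first line is optional; tqdm is used for a progress bar later on:

pip install fastembed
pip install "qdrant-client[fastembed]"
pip install tqdm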

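Qdrant itself runs as a separate server. You can start one locally with Docker or, for small experiments, skip the server and use the client's in-memory mode, QdrantClient(":memory:").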
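If you go the Docker route, the Qdrant documentation starts a local instance with `docker run -p 6333:6333 qdrant/qdrant`, which exposes the REST API on port 6333. Before running the rest of the code, you may want to confirm the server is up. The function below is a hypothetical helper (not part of qdrant-client) built only on the standard library; it assumes the default REST port and treats any JSON reply from the root URL as a sign that Qdrant is running.

```python
import json
import urllib.error
import urllib.request


def qdrant_is_reachable(url: str = "http://localhost:6333", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET on the Qdrant root URL answers with JSON."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            json.load(response)  # assumption: the root endpoint replies with a small JSON document
            return True
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply: treat as unreachable.
        return False
```

Calling `qdrant_is_reachable()` with no arguments checks `http://localhost:6333`; it returns `False` instead of raising, so it is safe to use as a quick pre-flight check.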

Understanding the Specifications

Before diving into the code, let's review the hardware specifications used:

Processor: Intel Core i5-8350U
Base frequency: 1.7 GHz
Cores: 4 (2 threads per core)

This is a modest laptop CPU, and the whole pipeline runs on it without a GPU, which shows that FastEmbed and Qdrant do not require specialized hardware.
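To see how your own machine compares, one option is to time how fast embeddings are produced. The helper `measure_throughput` below is a hypothetical, library-agnostic sketch: it consumes any iterable and reports items per second. Because FastEmbed's `embed()` returns a lazy generator, passing `model.embed(image_paths)` to it (assuming the `model` and `image_paths` defined later in this article) would time the actual embedding work.

```python
import time
from typing import Iterable, Tuple


def measure_throughput(items: Iterable) -> Tuple[int, float]:
    """Consume an iterable and return (item_count, items_per_second)."""
    start = time.perf_counter()
    count = 0
    for _ in items:  # for a lazy generator, the real work happens here
        count += 1
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate
```

For example, `count, rate = measure_throughput(model.embed(image_paths))` followed by `print(f"{count} images at {rate:.1f} images/s")`. Note that the embeddings are discarded, so this is for benchmarking only.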

Data Preparation

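We'll classify images into two categories: human and non-human. Organize the dataset as two subdirectories of a single folder, images_data_set/human and images_data_set/not_human; the code below uses each folder name as the label. FastEmbed applies the model's own resizing and normalization when it embeds an image, so beyond making sure every file is a readable image, little preprocessing is needed.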
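Before embedding anything, it is worth checking that both classes are reasonably balanced, since a nearest-neighbor classifier tends to favor the class with more examples. The sketch below is a hypothetical helper that counts image files per folder using only the standard library; the accepted extensions are an assumption to adjust for your dataset.

```python
import os
from collections import Counter

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")  # assumption: adjust to your dataset


def count_images_per_label(root: str) -> Counter:
    """Count image files per class, using each file's parent folder name as its label."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(IMAGE_EXTENSIONS):  # case-insensitive match
                counts[os.path.basename(dirpath)] += 1
    return counts
```

Running `count_images_per_label("images_data_set")` on the layout above would return a `Counter` with one entry each for `human` and `not_human`.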

Loading the Model

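FastEmbed supports several image models, including a CLIP vision encoder, which we'll use for the image embeddings. Load the model and create a Qdrant client as follows:

from fastembed import ImageEmbedding
from qdrant_client import QdrantClient, models

# Connect to a Qdrant server started with Docker on the default port.
# For a quick experiment without a server, use QdrantClient(":memory:") instead.
client = QdrantClient(url="http://localhost:6333")

# CLIP ViT-B/32 vision encoder; the weights are downloaded on first use.
model = ImageEmbedding(model_name="Qdrant/clip-ViT-B-32-vision")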

Embedding Images

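Walk through the directories, collecting each image path and using its parent folder name (human or not_human) as the label, then embed the images:

import os
from tqdm import tqdm

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")

image_paths = []
labels = []
for root, dirs, files in os.walk('images_data_set'):
    for file in sorted(files):
        # Accept common image extensions, regardless of case
        if file.lower().endswith(IMAGE_EXTENSIONS):
            image_paths.append(os.path.join(root, file))
            # The parent folder name is the class label
            labels.append(os.path.basename(root))

# model.embed() returns a lazy generator of NumPy arrays; materialize it
# into a list so it can be indexed and reused, with a tqdm progress bar
embeddings = list(tqdm(model.embed(image_paths), total=len(image_paths)))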

Creating the Qdrant Collection

Create a collection in Qdrant to store the embeddings:

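collection_name = "image_classification"

# Start from an empty collection (recreate_collection is deprecated in recent qdrant-client versions)
if client.collection_exists(collection_name):
    client.delete_collection(collection_name)

client.create_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(
        size=len(embeddings[0]),  # 512 for the CLIP ViT-B/32 vision model
        distance=models.Distance.COSINE,
    ),
)

# One point per image: the embedding as the vector, the label (and source path) as payload
points = [
    models.PointStruct(
        id=i,
        vector=embedding.tolist(),  # convert the NumPy array to a plain list of floats
        payload={"label": label, "path": path},
    )
    for i, (embedding, label, path) in enumerate(zip(embeddings, labels, image_paths))
]

client.upsert(
    collection_name=collection_name,
    points=points,
)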
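With Distance.COSINE, the Qdrant documentation notes that vectors are normalized when they are stored, so the search itself can use a plain dot product. The sketch below (plain Python, hypothetical helper names `l2_normalize` and `dot`) illustrates why that works: once both vectors have unit length, their dot product equals their cosine similarity.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


a, b = [3.0, 4.0], [4.0, 3.0]
cosine = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))  # 24 / 25
print(cosine)
print(dot(l2_normalize(a), l2_normalize(b)))  # same value, up to floating-point rounding
```

Normalizing once at insert time means each query only needs a dot product per candidate, instead of recomputing vector norms on every comparison.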

Performing the Search

Now, let's classify a new image by embedding it and searching for its nearest neighbor:

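new_image_path = 'user_image/new_image.jpeg'
# embed() yields one vector per input image; take the single result
new_embedding = next(iter(model.embed([new_image_path])))

# query_points replaces the deprecated search() in recent qdrant-client versions
result = client.query_points(
    collection_name=collection_name,
    query=new_embedding.tolist(),
    limit=1,
)

best_match = result.points[0]
print(f"The new image is classified as: {best_match.payload['label']} (score {best_match.score:.3f})")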
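With limit=1, the prediction rests on a single neighbor, so one mislabeled or unusual training image can flip the result. A common alternative is k-nearest-neighbor voting: request several neighbors (for example limit=5) and take the most frequent label. The function below is a hypothetical, library-free sketch of that vote; it works on `(label, score)` pairs, which in this pipeline you could build from each returned hit's `payload["label"]` and `score`.

```python
from collections import Counter, defaultdict
from typing import Iterable, Tuple


def majority_vote(hits: Iterable[Tuple[str, float]]) -> str:
    """Pick the most frequent label among (label, score) hits.

    Ties are broken by the highest summed similarity score for that label.
    """
    hits = list(hits)
    if not hits:
        raise ValueError("need at least one hit")
    votes = Counter(label for label, _ in hits)
    score_sum = defaultdict(float)
    for label, score in hits:
        score_sum[label] += score
    return max(votes, key=lambda label: (votes[label], score_sum[label]))


# Hypothetical top-5 neighbors as (label, cosine score) pairs.
neighbors = [
    ("human", 0.91),
    ("not_human", 0.89),
    ("human", 0.87),
    ("not_human", 0.86),
    ("human", 0.80),
]
print(majority_vote(neighbors))  # → human
```

An odd k avoids most ties in a two-class problem; the summed-score tie-break only matters when two labels receive the same number of votes.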

Conclusion

FastEmbed and Qdrant make it straightforward to build a nearest-neighbor image classifier that runs even on modest hardware such as the laptop described above. Because classification is just a similarity search over labeled embeddings, adding new examples, or even new classes, only requires upserting more points into the collection; no model retraining is needed.

Keywords

FastEmbed
Qdrant
Image Classification
Python Library
Embeddings
Vector Database

FAQs

What is FastEmbed?

FastEmbed is a lightweight and fast Python library designed for generating high-quality embeddings for both text and images.

What is Qdrant?

Qdrant is a vector database designed for high-performance similarity search. It supports several distance metrics: cosine similarity, dot product, Euclidean distance, and Manhattan distance.
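Qdrant's models.Distance enum offers COSINE, DOT, EUCLID, and MANHATTAN. The plain-Python sketch below computes the corresponding quantities for one pair of vectors so you can see how they differ; `all_metrics` is a hypothetical helper, not a Qdrant API. Note the ordering: for cosine and dot product a larger value means more similar, while for Euclidean and Manhattan distance a smaller value means closer.

```python
import math
from typing import Dict, Sequence


def all_metrics(a: Sequence[float], b: Sequence[float]) -> Dict[str, float]:
    """Compute the quantities behind Qdrant's four Distance options for one pair."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return {
        "cosine": dot / (norm_a * norm_b),                   # higher = more similar
        "dot": dot,                                          # higher = more similar
        "euclid": math.dist(a, b),                           # lower = closer
        "manhattan": sum(abs(x - y) for x, y in zip(a, b)),  # lower = closer
    }


print(all_metrics([1.0, 2.0], [2.0, 4.0]))
```

Here the second vector is exactly twice the first, so the cosine similarity is 1 (up to rounding) even though the Euclidean and Manhattan distances are non-zero. CLIP is trained to compare embeddings by cosine similarity, which is why this article uses Distance.COSINE.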

How do I install FastEmbed and Qdrant?

You can install both libraries with pip:

pip install fastembed
pip install "qdrant-client[fastembed]"

Can FastEmbed and Qdrant run on older hardware?

Yes. FastEmbed runs its models with ONNX Runtime on the CPU, so no GPU is required. The example in this article was run on an Intel Core i5-8350U laptop processor.

What models does FastEmbed support for image embeddings?

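FastEmbed supports several image models, including the CLIP ViT-B/32 vision encoder (Qdrant/clip-ViT-B-32-vision) used in this article. ImageEmbedding.list_supported_models() returns the full list for your installed version.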

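Do I need an internet connection to use Qdrant?

No. You can run Qdrant locally using Docker, which allows offline use and keeps your data on your own machine. Note that FastEmbed downloads the model weights the first time a model is loaded, so you need a connection once (or a pre-populated model cache) before working fully offline.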

Is it possible to use Qdrant with other languages like Rust or TypeScript?

Yes, Qdrant provides client libraries for Rust, TypeScript, Java, and more.

You now have a working starting point for image classification with FastEmbed and Qdrant. Feel free to experiment and adapt the code to your specific needs.