FastEmbed + Qdrant AI Image Classification Python Code
Introduction
In this article, we'll explore how to use FastEmbed and Qdrant for image classification. FastEmbed is a lightweight, fast Python library for generating text and image embeddings. Qdrant is a vector database that supports similarity search with several distance metrics, including cosine similarity. This article walks through integrating the two tools, creating embeddings for images, and classifying a new image in Python.
Because both tools are lightweight, the combination can classify images efficiently even on older hardware. We'll demonstrate how to set up the environment, prepare the data, embed the images, and perform classification.
Setting Up the Environment
Start by installing the required libraries. FastEmbed and the Qdrant client can be installed using pip:
pip install fastembed
pip install "qdrant-client[fastembed]"
Optionally, you can also install Docker to run the Qdrant vector database locally.
Understanding the Specifications
Before diving into the code, let's review the hardware specifications used:
- Processor: Intel Core i5-8350U
- Frequency: 1.7 GHz
- Cores: 4 (2 threads per core)
This setup shows that FastEmbed and Qdrant can run efficiently even on older laptops.
Data Preparation
We'll classify images into two categories: human and non-human. Organize your dataset into two directories: human and not_human. You can preprocess the images using standard Python libraries.
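Before embedding anything, it can help to confirm that the directory layout matches what the rest of the code expects. The sketch below is one way to do that, using only the standard library. The helper name count_images_per_label, the list of accepted extensions, and the assumed layout (a root folder containing one subfolder per label) are illustrative assumptions, not part of FastEmbed or Qdrant.

```python
from collections import Counter
from pathlib import Path


# Hypothetical helper: the name, the default extensions, and the assumed
# layout (<root>/<label>/<image file>) are illustrative only.
def count_images_per_label(root, extensions=(".jpeg", ".jpg", ".png")):
    """Return a Counter mapping each label directory name to its image count."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        # Compare suffixes case-insensitively so "photo.JPEG" is also counted.
        if path.is_file() and path.suffix.lower() in extensions:
            # The immediate parent directory name is treated as the label.
            counts[path.parent.name] += 1
    return counts
```

Calling count_images_per_label("images_data_set") should report one entry for human and one for not_human. A missing label or a heavily unbalanced count is worth fixing before building the collection.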
Loading the Model
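FastEmbed supports various models, including a CLIP vision encoder, which will be used for image embeddings. Load the model as follows:
from fastembed import ImageEmbedding
from qdrant_client import QdrantClient, models
# With no arguments, the client connects to a Qdrant server on localhost:6333
client = QdrantClient()
# CLIP ViT-B/32 vision encoder for image embeddings
model = ImageEmbedding(model_name="Qdrant/clip-ViT-B-32-vision")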
Embedding Images
Walk through the directories to load and embed the images:
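import os
from tqdm import tqdm
image_paths = []
labels = []
for root, dirs, files in os.walk('images_data_set'):
    for file in files:
        if file.endswith('.jpeg'):
            image_paths.append(os.path.join(root, file))
            # The parent directory name (human or not_human) is the label
            labels.append(os.path.basename(root))
# embed() returns a generator, so materialize it into a list
embeddings = list(tqdm(model.embed(image_paths), total=len(image_paths)))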
Creating the Qdrant Collection
Create a collection in Qdrant to store the embeddings:
collection_name = "image_classification"
# Drops any existing collection with this name and creates a fresh one
client.recreate_collection(
    collection_name,
    vectors_config=models.VectorParams(
        size=len(embeddings[0]),
        distance=models.Distance.COSINE
    )
)
points = [
    models.PointStruct(
        id=i,
        # Convert the NumPy vector to a plain list of floats
        vector=embedding.tolist(),
        payload={"label": label}
    )
    for i, (embedding, label) in enumerate(zip(embeddings, labels))
]
client.upsert(
    collection_name=collection_name,
    points=points
)
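The collection above is configured with cosine distance, which compares the direction of two vectors and ignores their length. Qdrant computes this internally. The pure-Python sketch below is only meant to build intuition for the metric, and the function name cosine_similarity is an illustrative choice rather than a Qdrant API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Vectors pointing the same way score 1, orthogonal vectors score 0, and opposite vectors score -1, regardless of magnitude. Two images whose embeddings point in nearly the same direction are therefore treated as similar.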
Performing the Search
Now, let's classify a new image by embedding it and searching for its nearest neighbor:
new_image_path = 'user_image/new_image.jpeg'
# embed() yields one vector per image; take the first (and only) one
new_embedding = next(iter(model.embed([new_image_path])))
search_result = client.search(
    collection_name=collection_name,
    query_vector=new_embedding.tolist(),
    limit=1
)
print(f"The new image is classified as: {search_result[0].payload['label']}")
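A single nearest neighbor can be swayed by one unusual training image. A common variation, not part of the original walkthrough, is to retrieve several neighbors and take a majority vote over their labels. The sketch below assumes the hits have already been reduced to (label, score) pairs ordered best first. The function name majority_label and the tie-breaking rule are illustrative assumptions.

```python
from collections import Counter


def majority_label(hits):
    """Pick the most common label among hits.

    hits: list of (label, score) pairs ordered from best to worst match.
    """
    if not hits:
        raise ValueError("no search hits to vote on")
    votes = Counter(label for label, _ in hits)
    top_count = max(votes.values())
    # Break ties in favour of the label whose best hit ranks highest.
    for label, _ in hits:
        if votes[label] == top_count:
            return label
```

With the search above, one way to build the input would be to raise limit to a small odd number such as 5 and pass [(hit.payload['label'], hit.score) for hit in search_result].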
Conclusion
FastEmbed and Qdrant offer a robust and efficient solution for image classification, even on less powerful hardware. Combining a lightweight embedding library with a versatile vector database makes it easier to handle large datasets and complex queries.
Keywords
- FastEmbed
- Qdrant
- Image Classification
- Python Library
- Embeddings
- Vector Database
FAQs
What is FastEmbed?
FastEmbed is a lightweight and fast Python library designed for generating high-quality embeddings for both text and images.
What is Qdrant?
Qdrant is a vector database designed for high-performance similarity search tasks. It supports various distance metrics, including cosine similarity.
How do I install FastEmbed and Qdrant?
You can install both libraries using pip:
pip install fastembed
pip install "qdrant-client[fastembed]"
Can FastEmbed and Qdrant run on older hardware?
Yes, these tools are designed to be efficient and can run on relatively old hardware. The example provided uses an Intel Core i5-8350U processor.
What models does FastEmbed support for image embeddings?
FastEmbed supports several models, including a CLIP vision encoder, which is suitable for image embeddings.
Do I need an internet connection to use Qdrant?
No, you can run Qdrant locally using Docker, which allows for offline usage and better data privacy.
Is it possible to use Qdrant with other languages like Rust or TypeScript?
Yes, Qdrant provides client libraries for Rust, TypeScript, Java, and more.
With this article, you now have a comprehensive guide to implementing image classification using FastEmbed and Qdrant. Feel free to experiment and adapt the code to your specific needs.