Wagtail CLIP allows you to search your Wagtail images using natural language queries.

Related tags

Searchwagtail-clip
Overview

Wagtail CLIP

Wagtail CLIP allows you to search your Wagtail images using natural language queries.

Intro Video

This project was inspired by, and draws heavily from memery, by deepfates et al. It makes use of OpenAI's CLIP model (it was very nice of them to open source it, cheers).

An example project is available here

Installation

You can install this package as follows (requires Git):

pip install \
    [email protected]+https://github.com/MattSegal/wagtail-clip.git \
    -f https://download.pytorch.org/whl/torch_stable.html \
    torch==1.7.1+cpu \
    torchvision==0.8.2+cpu

You will find that this installs ~200MB of deep learning libraries (PyTorch). You will also need to update your Django project's settings:

INSTALLED_APPS = [
    # ... whatever ...
    "wagtailclip",
]

# A place to store ~330MB of downloaded model parameters
WAGTAIL_CLIP_DOWNLOAD_PATH = "/clip"
# Maximum number of search results
WAGTAIL_CLIP_MAX_IMAGE_SEARCH_RESULTS = 256
# A unique name for the search backend.
WAGTAIL_CLIP_SEARCH_BACKEND_NAME = "clip"
# Recommended model, or your can roll your own (read the source).
WAGTAILIMAGES_IMAGE_MODEL = "wagtailclip.NaturalSearchImage"
# Add the search backend.
WAGTAILSEARCH_BACKENDS = {
    # ... whatever ...
    WAGTAIL_CLIP_SEARCH_BACKEND_NAME: {
        "BACKEND": "wagtailclip.search.CLIPSearchBackend",
    },
}

That's enough to get started, however if you want pre-download the ~330MB of model parameters, you can run this management command:

./manage.py download_clip

How it works

This package wraps the CLIP model. which can be used for:

  • encoding text into 1x512 float vectors
  • encoding images into 1x512 float vectors

These vectors can be thought of as points in a 512 dimensional space, where the closer two points are to each other, the more "related" they are. Importantly, CLIP encodes both text and images into the same space, meaning that we can:

  • encode all Wagtail images into vectors and store them in the database
  • encode a user's search query text into a vector; and then
  • compare the search query vector with all the image vectors

This comparison is done using a dot product to get a similarity score for each image. The operation is performed in Python. Once we have a similarity score we pick the top N (say, 256) most similar images and return those as the results.

Will this scale?

Haha probably not. I've tested my naive implementation on up to 2048 images and it runs OK (~3s / query). There are specialized Postgres extensions and vector similarity databases that you can use if you want to do this for tens of thousands of images.

Contributing

If you want to help out, make a pull request and/or email me at [email protected] or DM me on Twitter. Probably better to talk to me first before writing a bunch of code.

Owner
Matt Segal
Matt Segal
Home for Elasticsearch examples available to everyone. It's a great way to get started.

Introduction This is a collection of examples to help you get familiar with the Elastic Stack. Each example folder includes a README with detailed ins

elastic 2.5k Jan 03, 2023
Eland is a Python Elasticsearch client for exploring and analyzing data in Elasticsearch with a familiar Pandas-compatible API.

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch

elastic 463 Dec 30, 2022
A web search server for ParlAI, including Blenderbot2.

Description A web search server for ParlAI, including Blenderbot2. Querying the server: The server reacting correctly: Uses html2text to strip the mar

Jules Gagnon-Marchand 119 Jan 06, 2023
ForFinder is a search tool for folder and files

ForFinder is a search tool for folder and files. You can use that when you Source Code Analysis at your project's local files or other projects that you are download. Enter a root path and keyword to

Çağrı Aliş 7 Oct 25, 2022
🔍 Messages Searcher is make for search custom message in all channels in guild and dm.

🔍 Messages Searcher is make for search custom message in all channels in guild and dm.

Kaneki 33 Dec 31, 2022
GitScanner is a script to make it easy to search for Exposed Git through an advanced Google search.

GitScanner Legal disclaimer Usage of GitScanner for attacking targets without prior mutual consent is illegal. It is the end user's responsibility to

Kaio Gomes 3 Oct 28, 2022
A library for fast import of Windows NT Registry(REGF) into Elasticsearch.

A library for fast import of Windows NT Registry(REGF) into Elasticsearch.

S.Nakano 3 Apr 01, 2022
Inverted index creation and query search mechanism on Wikipedia pages.

WikiPedia Search Engine Step 1 : Installing Requirements Install "stemming" module for python using pip. Step 2 : Parsing the Data To parse the data,

Piyush Atri 1 Nov 27, 2021
High level Python client for Elasticsearch

Elasticsearch DSL Elasticsearch DSL is a high-level library whose aim is to help with writing and running queries against Elasticsearch. It is built o

elastic 3.6k Dec 30, 2022
Pythonic Lucene - A simplified python impelementaiton of Apache Lucene

A simplified python impelementaiton of Apache Lucene, mabye helps to understand how an enterprise search engine really works.

Mahdi Sadeghzadeh Ghamsary 2 Sep 12, 2022
Searches for MAC addresses in a text file of a Cisco "show IP arp" in any address format

show-ip-arp-mac-lookup Searches for MAC addresses in a text file of a Cisco "show IP arp" in any address format What it does: Takes a text file with t

Stew Alexander 0 Dec 24, 2022
Python Elasticsearch handler for the standard python logging framework

Python Elasticsearch Log handler This library provides an Elasticsearch logging appender compatible with the python standard logging library. This lib

Mohammed Mousa 0 Dec 08, 2021
PwnWiki 数据库搜索命令行工具;该工具有点像 searchsploit 命令,只是搜索的不是 Exploit Database 而是 PwnWiki 条目

PWSearch PwnWiki 数据库搜索命令行工具。该工具有点像 searchsploit 命令,只是搜索的不是 Exploit Database 而是 PwnWiki 条目。

K4YT3X 72 Dec 20, 2022
a Telegram bot writen in Python for searching files in Drive. Based on SearchX-bot

Drive Search Bot This is a Telegram bot writen in Python for searching files in Drive. Based on SearchX-bot How to deploy? Clone this repo: git clone

Hafitz Setya 25 Dec 09, 2022
TG-searcherBot - Search any channel/chat from keyword

TG-searcherBot Search any channel/chat from keyword. Commands /start - Starts th

TechiError 12 Nov 04, 2022
Yuno is context based search engine for anime.

Yuno yuno.mp4 Table of Contents Introduction Power Of Yuno Try Yuno How Yuno was created? References Introduction Yuno is a context based search engin

IAmParadox 354 Dec 19, 2022
This is a Telegram Bot written in Python for searching data on Google Drive.

This is a Telegram Bot written in Python for searching data on Google Drive. Supports multiple Shared Drives (TDs). Manual Guide for deploying the bot

Levi 158 Dec 27, 2022
Free and Open, Distributed, RESTful Search Engine

Elasticsearch Elasticsearch is the distributed, RESTful search and analytics engine at the heart of the Elastic Stack. You can use Elasticsearch to st

elastic 62.4k Jan 08, 2023
Simple algorithm search engine like google in python using function

Mini-Search-Engine-Like-Google I have created the simple algorithm search engine like google in python using function. I am matching every word with w

Sachin Vinayak Dabhade 5 Sep 24, 2021
Image search service based on imgsmlr extension of PostgreSQL. Support image search by image.

imgsmlr-server Image search service based on imgsmlr extension of PostgreSQL. Support image search by image. This is a sample application of imgsmlr.

jie 45 Dec 12, 2022