Convert a collection of features to a fixed-dimensional matrix using the hashing trick.

Overview

FeatureHasher

Convert a collection of features to a fixed-dimensional matrix using the hashing trick.

Note, this requires Jina>=2.2.4.

Example

Here I use FeatureHasher to hash each sentence of Pride and Prejudice into a 128-dim vector, and then use .match to find top-K similar sentences.

from jina import Document, DocumentArray, Flow

# load 
   
d = Document(uri='https://www.gutenberg.org/files/1342/1342-0.txt').convert_uri_to_text()

# cut into non-empty sentences store in a DA
da = DocumentArray(Document(text=s.strip()) for s in d.text.split('\n') if s.strip())

# use FeatureHasher in a Flow
f = Flow().add(uses='jinahub://FeatureHasher')

embed_da = DocumentArray()
with f:
    f.post('/', da, on_done=lambda req: embed_da.extend(req.docs), show_progress=True)

print('self-matching...')
embed_da.match(embed_da, exclude_self=True, limit=5, normalization=(1, 0))
print('total sentences: ', len(embed_da))
for d in embed_da:
    print(d.text)
    for m in d.matches:
        print(m.scores['cosine'], m.text)
    input()
           [email protected][I]:🎉 Flow is ready to use!
	🔗 Protocol: 		GRPC
	🏠 Local access:	0.0.0.0:52628
	🔒 Private network:	192.168.178.31:52628
	🌐 Public address:	217.70.138.123:52628
⠹       DONE ━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:01 100% ETA: 0 seconds 40 steps done in 1 second
total sentences:  12153
The Project Gutenberg eBook of Pride and Prejudice, by Jane Austen

   
     *** END OF THE PROJECT GUTENBERG EBOOK PRIDE AND PREJUDICE ***

    
      *** START OF THE PROJECT GUTENBERG EBOOK PRIDE AND PREJUDICE ***

     
       production, promotion and distribution of Project Gutenberg-tm

      
        Pride and Prejudice

       
         By Jane Austen This eBook is for the use of anyone anywhere in the United States and 
        
          This eBook is for the use of anyone anywhere in the United States and 
         
           by the awkwardness of the application, and at length wholly 
          
            Elizabeth passed the chief of the night in her sister’s room, and 
           
             the happiest memories in the world. Nothing of the past was 
            
              charities and charitable donations in all 50 states of the United 
            
           
          
         
        
       
      
     
    
   

In practice, you can implement matching and storing via an indexer inside Flow. This example is only for demo purpose so any non-feature hashing related ops are implemented outside the Flow to avoid distraction.

Owner
Jina AI
A Neural Search Company. We provide the cloud-native neural search solution powered by state-of-the-art AI technology.
Jina AI
An ARP Spoofer attacker for windows to block away devices from your network.

arp0_attacker An ARP Spoofer-attacker for Windows -OS to block away devices from your network. INFO Built in Python 3.8.2. arp0_attackerx.py is Upgrad

Wh0_ 15 Mar 17, 2022
Unsafe Twig processing of static pages leading to RCE in Grav CMS 1.7.10

CVE-2021-29440 Unsafe Twig processing of static pages leading to RCE in Grav CMS 1.7.10 Grav is a file based Web-platform. Twig processing of static p

Enox 6 Oct 10, 2022
pwncat module that automatically exploits CVE-2021-4034 (pwnkit)

pwncat_pwnkit Introduction The purpose of this module is to attempt to exploit CVE-2021-4034 (pwnkit) on a target when using pwncat. There is no need

Dana Epp 33 Jul 01, 2022
This repository consists of the python scripts for execution and automation of vivid tasks.

Scripting.py is a repository being maintained to keep log of the python scripts that I create for automating and executing some of my boring manual task.

Prakriti Regmi 1 Feb 07, 2022
Course: Information Security with Python

Curso: Segurança da Informação com Python Curso realizado atravès da Plataforma da Digital Innovation One Prof: Bruno Dias Conteúdo: Introdução aos co

Elizeu Barbosa Abreu 1 Nov 28, 2021
⛤Keylogger Generator for Windows written in Python⛤

⛤Keylogger Generator for Windows written in Python⛤

FZGbzuw412 33 Nov 24, 2022
HTTP security headers for Flask

Talisman: HTTP security headers for Flask Talisman is a small Flask extension that handles setting HTTP headers that can help protect against a few co

Google Cloud Platform 854 Dec 30, 2022
Mass Check Vulnerable Log4j CVE-2021-44228

Log4j-CVE-2021-44228 Mass Check Vulnerable Log4j CVE-2021-44228 Introduction Actually I just checked via Vulnerable Application from https://github.co

Justakazh 6 Dec 28, 2022
Lite - Lite cracker tool for python

Wellcome to tools Results Install Tools

Jeeck X Nano 23 Dec 17, 2022
自动化爆破子域名,并遍历所有端口寻找http服务,并使用crawlergo、dirsearch、xray等工具扫描并集成报告;支持动态添加扫描到的域名至任务;

AutoScanner AutoScanner是什么 AutoScanner是一款自动化扫描器,其功能主要是遍历所有子域名、及遍历主机所有端口寻找出所有http服务,并使用集成的工具进行扫描,最后集成扫描报告; 工具目前有:oneforall、masscan、nmap、crawlergo、dirse

633 Dec 30, 2022
Source code for "A Two-Stream AMR-enhanced Model for Document-level Event Argument Extraction" @ NAACL 2022

TSAR Source code for NAACL 2022 paper: A Two-Stream AMR-enhanced Model for Document-level Event Argument Extraction. 🔥 Introduction We focus on extra

21 Sep 24, 2022
Apk Framework Detector

🚀🚀🚀Program helps you to detect the major framework or technology used in writing any android app. Just provide the apk 😇😇

Daniel Agyapong 10 Dec 07, 2022
Microsoft Exchange Server SSRF漏洞(CVE-2021-26855)

Microsoft_Exchange_Server_SSRF_CVE-2021-26855 zoomeye dork:app:"Microsoft Exchange Server" 使用Seebug工具箱及pocsuite3编写的脚本Microsoft_Exchange_Server_SSRF_CV

conjojo 37 Nov 12, 2022
Confluence Server Webwork OGNL injection

CVE-2021-26084 - Confluence Server Webwork OGNL injection An OGNL injection vulnerability exists that would allow an authenticated user and in some in

Fellipe Oliveira 295 Jan 06, 2023
WebLogic T3/IIOP RCE ExternalizableHelper.class of coherence.jar

CVE-2020-14756 WebLogic T3/IIOP RCE ExternalizableHelper.class of coherence.jar README project base on https://github.com/Y4er/CVE-2020-2555 and weblo

Y4er 77 Dec 06, 2022
A honey token manager and alert system for AWS.

SpaceSiren SpaceSiren is a honey token manager and alert system for AWS. With this fully serverless application, you can create and manage honey token

287 Nov 09, 2022
Web-eyes - OSINT tools for website research

WEB-EYES V1.0 web-eyes: OSINT tools for website research, 14 research methods ar

8 Nov 10, 2022
Multi Brute Force Facebook - Crack Facebook With Login - Free For Now

✭ SAKERA CRACK Made With ❤️ By Denventa, Araya, Dapunta Author: - Denventa - Araya Dev - Dapunta Khurayra X ⇨ Fitur Login [✯] Login Cookies ⇨ Ins

Dapunta ID 26 Jan 01, 2023
A web-app helping to create strong passwords that are easy to remember.

This is a simple Web-App that demonstrates a method of creating strong passwords that are still easy to remember. It also provides time estimates how long it would take an attacker to crack a passwor

2 Jun 04, 2021
A python package with tools to read and postprocess the output of the channel DNS-solver (davecats/channel), as well as its associated postprocessing tools.

Python tools for davecats/channel A python package with tools to read and postprocess the output of the channel dns solver, as well as its associated

Andrea Andreolli 1 Dec 13, 2021