Search for documents in a domain through Google in order to extract their metadata.

Last update: Dec 16, 2022

Overview

MetaFinder - Metadata search through Google

   _____               __             ___________ .__               .___                   
  /     \     ____   _/  |_  _____    \_   _____/ |__|   ____     __| _/   ____   _______  
 /  \ /  \  _/ __ \  \   __\ \__  \    |    __)   |  |  /    \   / __ |  _/ __ \  \_  __ \ 
/    Y    \ \  ___/   |  |    / __ \_  |     \    |  | |   |  \ / /_/ |  \  ___/   |  | \/ 
\____|__  /  \___  >  |__|   (____  /  \___  /    |__| |___|  / \____ |   \___  >  |__|    
        \/       \/               \/       \/               \/       \/       \/          
        
|_ Author: @JosueEncinar
|_ Description: Search for documents in a domain through Google. The objective is to extract metadata
|_ Usage: python3 metafinder.py -d domain.com -l 100 -o /tmp

Installation:

> pip3 install metafinder

Upgrades are also available using:

> pip3 install metafinder --upgrade

Usage

CLI

metafinder -d domain.com -l 20 -o folder [-t 10] [-v]

Parameters:

-d: Target domain.
-l: Maximum number of results to search.
-o: Path where the report is saved.
-t: Optional. Number of threads (4 by default).
-v: Optional. Also displays the results on the screen.
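
When MetaFinder is driven from a script, the flags listed above can be assembled into an argument list before the command is run. The following is a minimal sketch, not part of MetaFinder itself: the helper names `build_metafinder_command` and `run_metafinder` are hypothetical, and it assumes the `metafinder` executable installed by pip is on the PATH.

```python
import subprocess


def build_metafinder_command(domain, limit, output_dir, threads=None, verbose=False):
    """Assemble the argument list for the metafinder CLI.

    Hypothetical helper: it only uses the flags documented above
    (-d, -l, -o, -t, -v) and does not run anything by itself.
    """
    cmd = ["metafinder", "-d", domain, "-l", str(limit), "-o", output_dir]
    if threads is not None:
        # -t is optional; when omitted the tool uses its own default (4)
        cmd += ["-t", str(threads)]
    if verbose:
        # -v also displays the results on the screen
        cmd.append("-v")
    return cmd


def run_metafinder(*args, **kwargs):
    """Run the command; assumes the metafinder executable is on the PATH."""
    return subprocess.run(build_metafinder_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this sketch has no side effects
    print(" ".join(build_metafinder_command("domain.com", 20, "folder", threads=10, verbose=True)))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the output path contains spaces.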

In Code

import metafinder.extractor as metadata_extractor

# Search Google for documents in the domain and extract their metadata
documents_limit = 5
domain = "target_domain"
data = metadata_extractor.extract_metadata_from_google_search(domain, documents_limit)
# Each entry holds the document URL and a dictionary with its metadata
for k, v in data.items():
    print(f"{k}:")
    print(f"|_ URL: {v['url']}")
    for metadata, value in v['metadata'].items():
        print(f"|__ {metadata}: {value}")

# Extract the metadata of a single local document
document_name = "test.pdf"
try:
    metadata_file = metadata_extractor.extract_metadata_from_document(document_name)
    for k, v in metadata_file.items():
        print(f"{k}: {v}")
except FileNotFoundError:
    print("File not found")
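
The dictionary printed above can also be post-processed, for example to see which documents share a metadata value. The sketch below assumes only the shape shown in the example (each entry has a 'url' and a 'metadata' dictionary). The helper name `group_by_metadata_field`, the "Author" field name, and the sample data are illustrative assumptions, not real output from MetaFinder.

```python
from collections import defaultdict


def group_by_metadata_field(data, field):
    """Map each value of a metadata field to the documents that carry it.

    Hypothetical helper: `data` is expected to look like the dictionary
    iterated over in the example above.
    """
    grouped = defaultdict(list)
    for name, info in data.items():
        value = info.get("metadata", {}).get(field)
        if value is not None:
            # Values are converted to str so they can be used as keys
            grouped[str(value)].append(name)
    return dict(grouped)


# Made-up sample data with the assumed shape; the field names are illustrative
sample = {
    "report.pdf": {"url": "https://domain.com/report.pdf", "metadata": {"Author": "alice"}},
    "notes.docx": {"url": "https://domain.com/notes.docx", "metadata": {"Author": "alice"}},
    "slides.pptx": {"url": "https://domain.com/slides.pptx", "metadata": {"Author": "bob"}},
    "blank.pdf": {"url": "https://domain.com/blank.pdf", "metadata": {}},
}

for author, documents in group_by_metadata_field(sample, "Author").items():
    print(f"{author}: {', '.join(documents)}")
```

Documents that lack the requested field are skipped rather than grouped under a placeholder.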

Author

This project has been developed by:

Josué Encinar García -- @JosueEncinar

Contributors

Félix Brezo Fernández -- @febrezo

Disclaimer!

This software has been developed for teaching purposes and for use only with the permission of the target. The author is not responsible for any illegitimate use.
