A Python package that can be used to download post and comment data from Reddit.

Overview

Reddit Data Collector

Reddit Data Collector is a Python package that allows a user to collect post and comment data from Reddit. It is built on top of the Python module PRAW, which stands for "The Python Reddit API Wrapper". It aims to make it very simple for a user to collect data from Reddit for further analysis (e.g. Natural Language Processing), without having to learn the inner workings of PRAW or the Reddit API.

The main functionalities provided by the package currently include:

  1. Ability to collect a sample of post data and comment data from Reddit by simply providing the subreddit names that you wish to collect data from.

  2. Ability to convert that data into a pandas DataFrame in order to inspect it and save it for further use.

  3. Ability to seamlessly update an existing .csv file that contains some sample data collected with the package in the past, with some new sample data that is also collected with the package.

It is currently maintained by Nico Van den Hooff.

Installation

Dependencies

Reddit Data Collector requires Python and:

  • pandas (>=1.3.5)
  • praw (>=7.5.0)
  • tqdm (>=4.62.3)

User installation

The recommended way to install Reddit Data Collector is using pip:

pip install reddit-data-collector

How to Use Reddit Data Collector

Please see the examples directory for step by step instructions on how to use Reddit Data Collector.

Development

Important links

Source code

You can check the latest sources with the command:

git clone https://github.com/nicovandenhooff/reddit-data-collector.git

Contributing

To learn more about making a contribution to Reddit Data Collector, please see the contributing file.

Potential Ideas for Contribution

  • Add ability to collect images from Reddit posts that contain them.
  • Add author information to post and comment data, currently the Reddit API is inconsistent with suspended and deleted author data, so this functionality has not been built in yet.

Testing

After installation, you can launch the test suite, which is contained in the tests/tests.py. Note that you will have to have pytest >= 6.2.5 installed. You can launch the test suite by following these steps from the projects root directory:

  1. Open up tests.py with the following command:
open tests/tests.py

Comment out lines 24 to 30. Change the values in DataCollector() in line 32 to your Reddit credentials.

  1. Run the following command:
pytest tests/test.py

Project History

The project was started in January 2022 by Nico Van den Hooff as a side project while he was completing the UBC Master of Data Science Project. Nico wanted to obtain a sample of posts and comments from Reddit, but noticed that while PRAW existed and provided seamless access to Reddit's API, there was no package available that allowed for a simple method to collect this data.

Inspiration

Certain sections of this README file was inspired by the scikit-learn README.

You might also like...
Auto Join: A GitHub action script to automatically invite everyone to the organization who comment at the issue page.

Auto Invite To Org By Issue Comment A GitHub action script to automatically invite everyone to the organization who comment at the issue page. What is

Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker.
Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker.

Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker. Unlimited RajeLiker Credit Hack. Thanks To RajeLiker.

A simple Discord bot that can fetch definitions and post them in chat.
A simple Discord bot that can fetch definitions and post them in chat.

A simple Discord bot that can fetch definitions and post them in chat. If you are connected to a voice channel, the bot will also read out the definition to you.

A simple fun discord bot using discord.py that can post memes

A simple fun discord bot using discord.py * * Commands $commands - to see all commands $meme - for a random meme from the internet $cry - to make the

One version package to rule them all, One version package to find them, One version package to bring them all, and in the darkness bind them.

AwesomeVersion One version package to rule them all, One version package to find them, One version package to bring them all, and in the darkness bind

A simple script & container to pull COVID data from covidlive.com.au and post a summary to a slack channel
A simple script & container to pull COVID data from covidlive.com.au and post a summary to a slack channel

CovidLive AU Summary Slackbot This bot is a very simple slackbot that pulls data, summarises and posts up to date AU COVID stats to a provided slack c

Track live sentiment for stocks from Reddit and Twitter and identify growing stocks
Track live sentiment for stocks from Reddit and Twitter and identify growing stocks

Market Sentiment About This repository can mainly be used for two things. a. Tracking the live sentiment of stocks from Reddit and Twitter b. Tracking

A reddit.com bot that will return reference links from official python documentation site for the standard library.

Python Docs Bot A reddit.com bot that will return documentation links for the library and language reference sections of the python docs website. The

A Python bot that uses the Reddit API to send users inspiring messages.

AnonBot By Edric Antoine A Python bot that uses the Reddit API to send users inspiring messages. When a message includes 'What would Anon do?', the bo

Releases(v1.1.0)
  • v1.1.0(Mar 14, 2022)

    [1.1.0] - 2022-03-13

    Changed

    • Changed get_data in reddit_data_collector.py to return pandas DataFrame by default
    • Updated tests for the above
    Source code(tar.gz)
    Source code(zip)
  • v1.0.2(Jan 15, 2022)

    [1.0.2] - 2022-01-14

    Fixed

    • Updated _check_subreddit_exists in reddit_data_collector.py to check both names as .lower()
    • Updated tests for the above

    Changed

    • Updated README to include instructions on coverage tests
    Source code(tar.gz)
    Source code(zip)
  • v1.0.1(Jan 12, 2022)

    [1.0.1] - 2022-01-12

    Fixed

    • Spelling error of separate argument in to_pandas function of reddit_data_collector.io.py, previously it was spelt like seperate

    Changed

    • Update example use and move to /examples
    • Update PyPi link in docs to working link
    • Add new potential ideas for contribution
    Source code(tar.gz)
    Source code(zip)
  • v1.0.0(Jan 7, 2022)

Owner
Nico Van den Hooff
UBC Master of Data Science
Nico Van den Hooff
The Sue Gray Alert System was a 5 minute project that just beeps every time a new article is updated or published on Gov.UK's news pages.

The Sue Gray Alert System was a 5 minute project that just beeps every time a new article is updated or published on Gov.UK's news pages.

Dafydd 1 Jan 31, 2022
Extend the commitizen tools to create conventional commits and README that link to Jira and GitHub.

cz-github-jira-conventional cz-github-jira-conventional is a plugin for the commitizen tools, a toolset that helps you to create conventional commit m

12 Dec 13, 2022
Send GitHub Issues, PRs or Discussions Updates to Wechat

Send GitHub Issues, PRs or Discussions Updates to Wechat

Hollow Man 2 Jul 12, 2022
Azure Neural Speech Service TTS

Written in Python using the Azure Speech SDK. App.py provides an easy way to create an Text-To-Speech request to Azure Speech and download the wav file. Azure Neural Voices Text-To-Speech enables flu

Rodney 4 Dec 14, 2022
The official source code for Ghost Discord selfbot.

👻 Ghost Selfbot The official code for Ghost which was recently discontinued and released to the public. Feel free to use any of the code found in thi

Ghost 121 Nov 09, 2022
Telegram Voice Chat Music Player UserBot Written with Pyrogram Smart Plugin and tgcalls

Telegram Voice Chat UserBot A Telegram UserBot to Play Audio in Voice Chats. This is also the source code of the userbot which is being used for playi

Dash Eclipse 7 May 21, 2022
Auto-commiter - Auto commiter Github

auto committer Github Follow the steps below to use this repository: 1-install c

Arman Ebtekari 8 Nov 14, 2022
It connects to Telegram's API. It generates JSON files containing channel's data, including channel's information and posts.

It connects to Telegram's API. It generates JSON files containing channel's data, including channel's information and posts. You can search for a specific channel, or a set of channels provided in a

Esteban Ponce de Leon 75 Jan 02, 2023
GitNews: Github webhooks for Telegram

GitNews - Github webhooks for Telegram Setup: server: clone repo git clone https

Druv Jagdish 1 Feb 14, 2022
Solves bombcrypto newest captcha

Solves Bombcrypto newest captcha A very compact implementation using just cv2 and ctypes, ready to be deployed to your own project. How does it work I

19 May 06, 2022
Some 3Commas helper bots, AltRank, GalaxyScore, Watchlist, Auto-Compound

3Commas Cyber Bot Helpers A collection of 3Commas bot helpers I wrote. (collection will grow over time) Disclaimer THE SOFTWARE IS PROVIDED "AS IS", W

Ron Klinkien 176 Jan 02, 2023
An example Music Bot written in Disnake and uses slash commands to operate.

Music Bot An example music bot that is written in Disnake [Maintained discord.py Fork] Disnake Disnake is a maintained and updated fork of discord.py.

6 Jan 08, 2022
Clippin n grafting Backend

Clipping' n Grafting Presenting you, 🎉 Clippin' n Grafting 🎉 , your very own ecommerce website displaying all your artsy-craftsy stuff. Not only the

Google-Developer-Student-Club-ISquareIT (GDSC I²IT) 2 Oct 22, 2021
This is a telegram bot built using the Oxford Dictionary API

Oxford Dictionaries Telegram Bot This is a telegram bot built using the Oxford Dictionary API Source: Oxford Dictionaries API Documentation Install En

Abhijith N T 2 Mar 18, 2022
Aws-cidr-finder - A Python CLI tool for finding unused CIDR blocks in AWS VPCs

aws-cidr-finder Overview An Example Installation Configuration Contributing Over

Cooper Walbrun 18 Jul 31, 2022
File-sharing-Bot: Telegram Bot to store Posts and Documents and it can Access by Special Links.

Bromélia HSS bromelia-hss is the second official implementation of a Diameter-based protocol application by using the Python micro framework Bromélia.

1 Dec 17, 2021
Discord Webhook Spammer (fastest)

Discord Webhook Spammer A simple fast asynchronous webhook spammer. Spammer Features Fast message spamming. Controllable speed. Noob friendly. Usage N

Varient 2 Apr 22, 2022
Using multiple API sources, create an app that allows users to filter through random locations based on their temperature range choices.

World_weather_analysis Overview Using multiple API sources, create an app that allows users to filter through random locations based on their temperat

Jason Boyer 2 Sep 16, 2022
Cedric Owens 16 Sep 27, 2022