Accuracy of BBC Weather forecasts for Honolulu

This repository records the forecasts made by BBC Weather for the city of Honolulu, USA. A GitHub Action runs every 30 minutes and saves the latest forecasts. The data is stored in a separate branch called data, so every snapshot is versioned as a commit. This makes it possible to go back in time and see which forecasts were made, at any past moment, for any given hour that was then in the future.
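To make the "going back in time" idea concrete, here is a minimal sketch of how the history of the data branch could be walked from Python using the standard `git log` and `git show` commands. The function names (`parse_log`, `list_snapshots`, `read_file_at`) are hypothetical and are not taken from build_database.py. The path of the saved file inside the data branch is not stated here, so it is left as a parameter.

```python
import subprocess
from datetime import datetime, timezone


def parse_log(output: str) -> list[tuple[str, datetime]]:
    """Parse lines of the form '<commit hash> <unix commit time>'."""
    commits = []
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, unix_time = line.split()
        commits.append((sha, datetime.fromtimestamp(int(unix_time), tz=timezone.utc)))
    return commits


def list_snapshots(repo_dir: str, branch: str = "data") -> list[tuple[str, datetime]]:
    """List every commit on the branch, oldest first, with its commit time."""
    result = subprocess.run(
        ["git", "log", "--reverse", "--format=%H %ct", branch],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return parse_log(result.stdout)


def read_file_at(repo_dir: str, sha: str, path: str) -> str:
    """Return the contents of `path` as it was in commit `sha`."""
    result = subprocess.run(
        ["git", "show", f"{sha}:{path}"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout
```

Each commit time is a reasonable proxy for when a forecast snapshot was saved, which is what lets the forecasts be compared across time.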

I made this after watching Git scraping, the five-minute lightning talk by Simon Willison. It blew my mind! I agree with Simon that collecting and versioning API data via git is a powerful pattern. You could use this pattern to keep a ledger of any dynamic forecasting system, such as the predicted outcomes of football games. In such systems, the forecasts are updated whenever new information becomes available, so a forecasted value depends on the point in time at which it was made. I think it's super interesting to analyse how these forecasts evolve through time.

The build_database.py script iterates through all the commits in the data branch and consolidates the data into an SQLite database. To run the script yourself, clone this repository, open a terminal in the cloned directory, and install the necessary Python dependencies:

python -m venv .env
source .env/bin/activate
pip install -r requirements.txt

Then, run the consolidation script:

python build_database.py

This will create a bbc_weather.sqlite file. You can open this file in your preferred database tool (I personally like DataGrip) to analyse the data. At present, the database contains two tables:

`forecasts`

Each row is a weather forecast issued at one point in time (issued_at) for a later point in time (at).

| issued_at | at | celsius | feels_like_celsius | wind_speed_kph |
|---|---|---|---|---|
| 2021-03-10 09:00:00 | 2021-03-10 11:00:00 | 24 | 30 | 16 |
| 2021-03-10 09:00:00 | 2021-03-10 12:00:00 | 25 | 31 | 17 |
| 2021-03-10 09:00:00 | 2021-03-10 13:00:00 | 26 | 32 | 17 |
| 2021-03-10 09:00:00 | 2021-03-10 14:00:00 | 27 | 33 | 17 |
| 2021-03-10 09:00:00 | 2021-03-10 15:00:00 | 26 | 33 | 17 |
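Because every forecast carries both an issue time and a target time, one can ask how the forecast for a single target hour changed as that hour approached. The sketch below illustrates this with Python's built-in sqlite3 module on an in-memory database. The column types are an assumption, the helper name `forecast_history` is hypothetical, and only the first toy row comes from the sample above; the other two rows are invented for illustration.

```python
import sqlite3


def forecast_history(conn: sqlite3.Connection, target_hour: str) -> list[tuple]:
    """Return (issued_at, celsius, hours_ahead) for one target hour, oldest first."""
    query = """
        SELECT
            issued_at,
            celsius,
            (CAST(strftime('%s', at) AS INTEGER)
             - CAST(strftime('%s', issued_at) AS INTEGER)) / 3600 AS hours_ahead
        FROM forecasts
        WHERE at = ?
        ORDER BY issued_at
    """
    return conn.execute(query, (target_hour,)).fetchall()


# Toy in-memory database; the column types are assumed, not taken from the real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE forecasts ("
    "issued_at TEXT, at TEXT, celsius INTEGER, "
    "feels_like_celsius INTEGER, wind_speed_kph INTEGER)"
)
conn.executemany(
    "INSERT INTO forecasts VALUES (?, ?, ?, ?, ?)",
    [
        ("2021-03-10 09:00:00", "2021-03-10 14:00:00", 27, 33, 17),  # from the sample
        ("2021-03-10 11:00:00", "2021-03-10 14:00:00", 28, 34, 18),  # invented
        ("2021-03-10 13:00:00", "2021-03-10 14:00:00", 26, 32, 15),  # invented
    ],
)

for row in forecast_history(conn, "2021-03-10 14:00:00"):
    print(row)
```

To try this on real data, replace the in-memory connection with `sqlite3.connect("bbc_weather.sqlite")` and skip the table creation and inserts.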

`observations`

These are the weather values that were actually observed, as opposed to those that were forecast.

| at | celsius | wind_speed_kph |
|---|---|---|
| 2021-03-09 19:00:00 | 23 | 0 |
| 2021-03-09 20:00:00 | 22 | 8 |
| 2021-03-09 21:00:00 | 22 | 0 |
| 2021-03-09 22:00:00 | 21 | 9 |
| 2021-03-09 23:00:00 | 21 | 0 |

Check out measure_accuracy.sql for an example of how to evaluate the correctness of the forecasts.
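Independently of that file, here is a minimal sketch of one possible accuracy measure: the mean absolute temperature error, grouped by how many hours ahead the forecast was issued. It is not the content of measure_accuracy.sql. The helper name `mae_by_lead_time` is hypothetical, the column types are assumed, and the observation values and the third forecast row are invented toy data.

```python
import sqlite3


def mae_by_lead_time(conn: sqlite3.Connection) -> list[tuple]:
    """Return (hours_ahead, mean absolute error in Celsius, sample count)."""
    query = """
        SELECT
            (CAST(strftime('%s', f.at) AS INTEGER)
             - CAST(strftime('%s', f.issued_at) AS INTEGER)) / 3600 AS hours_ahead,
            AVG(ABS(f.celsius - o.celsius)) AS mae_celsius,
            COUNT(*) AS n
        FROM forecasts AS f
        JOIN observations AS o ON o.at = f.at
        GROUP BY hours_ahead
        ORDER BY hours_ahead
    """
    return conn.execute(query).fetchall()


# Toy in-memory database; the column types are assumed, not taken from the real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE forecasts ("
    "issued_at TEXT, at TEXT, celsius INTEGER, "
    "feels_like_celsius INTEGER, wind_speed_kph INTEGER)"
)
conn.execute("CREATE TABLE observations (at TEXT, celsius INTEGER, wind_speed_kph INTEGER)")
conn.executemany(
    "INSERT INTO forecasts VALUES (?, ?, ?, ?, ?)",
    [
        ("2021-03-10 09:00:00", "2021-03-10 11:00:00", 24, 30, 16),  # from the sample
        ("2021-03-10 09:00:00", "2021-03-10 12:00:00", 25, 31, 17),  # from the sample
        ("2021-03-10 10:00:00", "2021-03-10 12:00:00", 27, 31, 17),  # invented
    ],
)
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?)",
    [
        ("2021-03-10 11:00:00", 25, 10),  # invented
        ("2021-03-10 12:00:00", 25, 12),  # invented
    ],
)

for hours_ahead, mae, n in mae_by_lead_time(conn):
    print(f"{hours_ahead}h ahead: MAE = {mae:.2f} °C over {n} forecasts")
```

Grouping by lead time keeps short-range and long-range forecasts apart, since one would expect the error to grow as the forecast horizon gets longer.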

Owner

Max Halford
