whylogs Workshop

The code from the whylogs workshop in DataTalks.Club on 29 March 2022

whylogs - The open source standard for data logging (Don't forget to give it a star!)

Workshop

In this hands-on workshop, we’ll learn how to set up a system for monitoring your data pipelines, ensuring data quality and detecting changes in your data.

Without data monitoring, you can’t assure your stakeholders that the data they use for analytics and machine learning is trustworthy. A data observability system gives you visibility into the health of your data pipelines, which builds your customers’ trust in your work.

We’ll cover the following:

Introduction to data observability and monitoring
whylogs — the open source standard for data logging
How to monitor batch Python or Spark data pipelines with whylogs
How to monitor Kafka streaming pipelines with whylogs

By the end of this workshop, you’ll be able to set up such a system yourself.
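To make "ensuring data quality and detecting changes in your data" concrete, here is a minimal sketch in plain Python. It is not the whylogs API and not code from the workshop: `ColumnProfile`, `profile_batch`, `find_changes`, and the two thresholds are hypothetical names and values chosen only for illustration. The idea is to summarize each batch into a small profile and compare it against a baseline.

```python
import math
from dataclasses import dataclass


@dataclass
class ColumnProfile:
    """Summary statistics for one numeric column of a batch (illustrative only)."""
    count: int
    null_count: int
    minimum: float
    maximum: float
    mean: float


def profile_batch(rows, columns):
    """Compute a ColumnProfile for each named column in a list of dict rows."""
    profiles = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        profiles[col] = ColumnProfile(
            count=len(values),
            null_count=len(values) - len(present),
            minimum=min(present) if present else math.nan,
            maximum=max(present) if present else math.nan,
            mean=sum(present) / len(present) if present else math.nan,
        )
    return profiles


def find_changes(baseline, current, mean_tolerance=0.2, max_null_ratio=0.1):
    """Return readable alerts where the current batch departs from the baseline.

    Both thresholds are arbitrary assumptions for this sketch.
    """
    alerts = []
    for col, base in baseline.items():
        cur = current[col]
        null_ratio = cur.null_count / cur.count if cur.count else 0.0
        if null_ratio > max_null_ratio:
            alerts.append(f"{col}: null ratio {null_ratio:.2f} exceeds {max_null_ratio:.2f}")
        # Relative shift of the mean; skipped when the baseline mean is zero.
        if base.mean and abs(cur.mean - base.mean) / abs(base.mean) > mean_tolerance:
            alerts.append(f"{col}: mean moved from {base.mean:.2f} to {cur.mean:.2f}")
    return alerts


# Made-up data: yesterday's batch is the baseline, today's batch is checked against it.
yesterday = [{"amount": 10.0}, {"amount": 12.0}, {"amount": 11.0}]
today = [{"amount": 30.0}, {"amount": None}, {"amount": 34.0}]

baseline = profile_batch(yesterday, ["amount"])
current = profile_batch(today, ["amount"])
alerts = find_changes(baseline, current)
for alert in alerts:
    print(alert)
```

With this made-up data, both checks fire: one of three values is missing, and the mean has roughly tripled. Comparing compact profiles rather than raw rows is what makes this kind of monitoring cheap enough to run on every batch; the workshop covers how whylogs does the profiling itself.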

Code

This repository contains files that are needed for the workshop:

ccloud_lib.py - helper module for connecting to Confluent Cloud
confluent_credentials.txt - configuration template (put your credentials there, but don't commit them!)
producer.py - code for producing events to Kafka
requirements.txt - all the dependencies for the workshop
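
As a rough illustration of how a credentials file might be turned into client configuration, the sketch below parses a `key=value` text file into a dictionary. It rests on assumptions: the README does not describe the format of confluent_credentials.txt, `read_properties` is a hypothetical helper rather than a function from ccloud_lib.py, and the sample keys and placeholder values are illustrative only.

```python
import tempfile
from pathlib import Path


def read_properties(path):
    """Parse a key=value file into a dict, skipping blank lines and # comments."""
    config = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


# Assumed file layout with placeholder values; never commit real credentials.
SAMPLE = """\
# Hypothetical example, not the repository's actual template
bootstrap.servers=<your-bootstrap-server>
security.protocol=SASL_SSL
sasl.mechanisms=PLAIN
sasl.username=<your-api-key>
sasl.password=<your-api-secret>
"""

with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample_credentials.txt"
    sample_path.write_text(SAMPLE, encoding="utf-8")
    config = read_properties(sample_path)

print(sorted(config))
```

A dictionary of this shape is the kind of configuration that Kafka clients such as the `Producer` in confluent-kafka accept. Whether producer.py loads its settings this way is not stated here, so check the file itself.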

Confluent Cloud

For this workshop, you'll need:

An account in Deepnote
An account in Confluent Cloud (instructions)


