A tutorial presenting several practical examples of how to build DAGs in Apache Airflow

Overview

Apache Airflow - Python Brasil 2021

This tutorial presents several practical examples of how to build DAGs in Apache Airflow.

Background

Apache Airflow is one of the leading workflow orchestration tools. In it, you define workflows as Directed Acyclic Graphs (DAGs) of tasks. Airflow lets you build data pipelines by writing only Python code. When workflows are defined as code, they become maintainable, versionable, testable, and collaborative.
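To make the "directed acyclic graph of tasks" idea concrete, here is a minimal plain-Python sketch. It deliberately does not use Airflow's API; it only uses the standard-library graphlib module (Python 3.9+) to show how task dependencies determine a valid run order. The task names (extract, transform, load, report) are hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a task, each value is the set of
# tasks that must finish before that task may start.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"transform"},
}


def execution_order(deps):
    """Return one valid run order for the tasks.

    graphlib raises CycleError if the dependencies contain a cycle,
    which is exactly the situation the "acyclic" in DAG rules out.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

An orchestrator such as Airflow applies the same principle: a task is scheduled only after all of its upstream tasks have completed, and independent tasks (here load and report) have no fixed order relative to each other.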

Running locally with Pyenv

You will need a virtual environment with Python 3.6+ (we recommend 3.9).

Pyenv

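If you do not have a suitable Python version installed on your machine, you can use pyenv to keep multiple Python versions side by side and create your virtual environment with it. Follow the official documentation to install pyenv on your machine. Note that the pyenv virtualenv command used below requires the pyenv-virtualenv plugin.

Install Python 3.9:

$ pyenv install 3.9.7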
$ pyenv virtualenv 3.9.7 pybr-airflow
$ pyenv local pybr-airflow
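As a quick sanity check after these commands, the following sketch reports whether the active interpreter meets the 3.6+ requirement and whether it is running inside a virtual environment. It assumes the environment is venv-style (so that sys.prefix differs from sys.base_prefix); the helper names version_ok and in_virtualenv are illustrative, not part of any tool.

```python
import sys

MIN_VERSION = (3, 6)  # minimum Python version stated by this tutorial


def version_ok(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the interpreter is at least `minimum` (major, minor)."""
    return tuple(version_info[:2]) >= minimum


def in_virtualenv():
    """Return True when running inside a venv-style virtual environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Python", ".".join(str(part) for part in sys.version_info[:3]))
print("version ok:", version_ok())
print("inside a virtual environment:", in_virtualenv())
```

Run it with the same python executable you will use for Airflow, from the directory where pyenv local was executed.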

If you do not have pip installed, install it on your machine by following the official pip installation guide.

Installing Airflow

Once the virtual environment is set up, you need the apache-airflow and apache-airflow-providers-docker packages. Install them with:

$ pip install apache-airflow apache-airflow-providers-docker

Next, configure Airflow by initializing its metadata database and creating an admin user:

$ airflow db init
$ airflow users create --username=admin --firstname test --lastname test --role Admin --email [email protected]

Now you can run Airflow by starting the webserver:

$ airflow webserver -p 8081

Then open the following URL: http://localhost:8081.
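If you prefer to check from a script that the webserver is up, the sketch below queries the /health endpoint that the Airflow 2 webserver exposes and returns the parsed JSON, or None when the server cannot be reached. The base URL matches the port used above; the function names and the timeout value are assumptions made for this example.

```python
import json
import urllib.request


def health_url(base_url):
    """Build the health-check URL for a webserver base URL."""
    return base_url.rstrip("/") + "/health"


def fetch_health(base_url="http://localhost:8081", timeout=3.0):
    """Return the parsed /health JSON, or None if the server is unreachable.

    Connection errors, timeouts, and HTTP errors are OSError subclasses;
    a malformed JSON body raises ValueError.
    """
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    status = fetch_health()
    print("webserver reachable:", status is not None)
    if status is not None:
        print(json.dumps(status, indent=2))
```

A None result usually means the webserver has not finished starting yet or is listening on a different port.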

Troubleshooting: airflow command not recognized

If the airflow command is not recognized, check that ~/.local/bin is included in your PATH environment variable:

PATH=$PATH:~/.local/bin

You can also invoke the Airflow CLI through the Python interpreter:

$ python -m airflow
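To diagnose this from Python, a small sketch like the one below can tell you whether ~/.local/bin is among the PATH entries and where (if anywhere) the airflow executable is found. The helper name dir_on_path is hypothetical; shutil.which and os.environ are standard library.

```python
import os
import shutil


def dir_on_path(directory, path_value=None):
    """Return True when `directory` is one of the entries of PATH.

    `path_value` defaults to the current PATH; entries are normalized so
    that "~" and trailing slashes do not cause false negatives.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.normpath(os.path.expanduser(directory))
    entries = [
        os.path.normpath(os.path.expanduser(entry))
        for entry in path_value.split(os.pathsep)
        if entry
    ]
    return target in entries


print("~/.local/bin on PATH:", dir_on_path("~/.local/bin"))
print("airflow executable:", shutil.which("airflow"))
```

If the second line prints None, the shell will not find the airflow command either, and python -m airflow remains the fallback.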

Running locally with Docker Compose

Prerequisites

To run locally, you need to meet the following prerequisites:

  • Install Docker Community Edition (CE) on your machine. At least 4 GB of free RAM is recommended.
  • Install Docker Compose v1.29.1 or newer on your machine.
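The version requirement above can be checked programmatically. The sketch below parses a version string such as the one printed by docker-compose --version and compares it with the 1.29.1 minimum. The sample strings are illustrative examples of the output format, and the function names are assumptions made for this example.

```python
import re

MIN_COMPOSE = (1, 29, 1)  # minimum Docker Compose version stated above


def parse_version(text):
    """Extract the first X.Y.Z version in `text` as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def compose_ok(version_output, minimum=MIN_COMPOSE):
    """Return True when the reported version is at least `minimum`."""
    return parse_version(version_output) >= minimum


# Illustrative inputs; pass the real output of `docker-compose --version`.
for sample in (
    "docker-compose version 1.29.2, build 5becea4c",
    "docker-compose version 1.9.0, build 2585387",
    "Docker Compose version v2.17.2",
):
    print(sample, "->", compose_ok(sample))
```

Comparing integer tuples rather than raw strings matters here: as text, "1.9.0" sorts after "1.29.1", but as a version it is older.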

Starting the environment

To start the environment, run the command below:

make start-airflow

Destroying the environment

To clean up the environment, run the following command:

make reset-airflow

Owner
Jusbrasil