A Python DB-API and SQLAlchemy dialect to Google Spreasheets

Overview

Build Status codecov

Note: shillelagh is a drop-in replacement for gsheets-db-api, with many additional features. You should use it instead. If you're using SQLAlchemy all you need to do:

$ pip uninstall gsheetsdb
$ pip install shillelagh

If you're using the DB API:

# from gsheetsdb import connect
from shillelagh.backends.apsw.db import connect

A Python DB API 2.0 for Google Spreadsheets

This module allows you to query Google Spreadsheets using SQL.

Using this spreadsheet as an example:

A B
1 country cnt
2 BR 1
3 BR 3
4 IN 5

Here's a simple query using the Python API:

from gsheetsdb import connect

conn = connect()
result = conn.execute("""
    SELECT
        country
      , SUM(cnt)
    FROM
        "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/"
    GROUP BY
        country
""", headers=1)
for row in result:
    print(row)

This will print:

Row(country='BR', sum_cnt=4.0)
Row(country='IN', sum_cnt=5.0)

How it works

Transpiling

Google spreadsheets can actually be queried with a very limited SQL API. This module will transpile the SQL query into a simpler query that the API understands. Eg, the query above would be translated to:

SELECT A, SUM(B) GROUP BY A

Processors

In addition to transpiling, this module also provides pre- and post-processors. The pre-processors add more columns to the query, and the post-processors build the actual result from those extra columns. Eg, COUNT(*) is not supported, so the following query:

SELECT COUNT(*) FROM "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/"

Gets translated to:

SELECT COUNT(A), COUNT(B)

And then the maximum count is returned. This assumes that at least one column has no NULLs.

SQLite

When a query can't be expressed, the module will issue a SELECT *, load the data into an in-memory SQLite table, and execute the query in SQLite. This is obviously inneficient, since all data has to be downloaded, but ensures that all queries succeed.

Installation

$ pip install gsheetsdb
$ pip install gsheetsdb[cli]         # if you want to use the CLI
$ pip install gsheetsdb[sqlalchemy]  # if you want to use it with SQLAlchemy

CLI

The module will install an executable called gsheetsdb:

$ gsheetsdb --headers=1
> SELECT * FROM "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/"
country      cnt
---------  -----
BR             1
BR             3
IN             5
> SELECT country, SUM(cnt) FROM "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1
pscv8ZXPtg8/" GROUP BY country
country      sum cnt
---------  ---------
BR                 4
IN                 5
>

SQLAlchemy support

This module provides a SQLAlchemy dialect. You don't need to specify a URL, since the spreadsheet is extracted from the FROM clause:

from sqlalchemy import *
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *

engine = create_engine('gsheets://')
inspector = inspect(engine)

table = Table(
    'https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/edit#gid=0',
    MetaData(bind=engine),
    autoload=True)
query = select([func.count(table.columns.country)], from_obj=table)
print(query.scalar())  # prints 3.0

Alternatively, you can initialize the engine with a "catalog". The catalog is a Google spreadsheet where each row points to another Google spreadsheet, with URL, number of headers and schema as the columns. You can see an example here:

A B C
1 https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/edit#gid=0 1 default
2 https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/edit#gid=1077884006 2 default

This will make the two spreadsheets above available as "tables" in the default schema.

Authentication

You can access spreadsheets that are shared only within an organization. In order to do this, first create a service account. Make sure you select "Enable G Suite Domain-wide Delegation". Download the key as a JSON file.

Next, you need to manage API client access at https://admin.google.com/${DOMAIN}/AdminHome?chromeless=1#OGX:ManageOauthClients. Add the "Unique ID" from the previous step as the "Client Name", and add https://spreadsheets.google.com/feeds as the scope.

Now, when creating the connection from the DB API or from SQLAlchemy you can point to the JSON file and the user you want to impersonate:

>>> auth = {'service_account_file': '/path/to/certificate.json', 'subject': '[email protected]'}
>>> conn = connect(auth)
Owner
Beto Dealmeida
Writing open source software since 2003.
Beto Dealmeida
A simple wrapper to make a flat file drop in raplacement for mongodb out of TinyDB

Purpose A simple wrapper to make a drop in replacement for mongodb out of tinydb. This module is an attempt to add an interface familiar to those curr

180 Jan 01, 2023
Anomaly detection on SQL data warehouses and databases

With CueObserve, you can run anomaly detection on data in your SQL data warehouses and databases. Getting Started Install via Docker docker run -p 300

Cuebook 171 Dec 18, 2022
Making it easy to query APIs via SQL

Shillelagh Shillelagh (ʃɪˈleɪlɪ) is an implementation of the Python DB API 2.0 based on SQLite (using the APSW library): from shillelagh.backends.apsw

Beto Dealmeida 207 Dec 30, 2022
MinIO Client SDK for Python

MinIO Python SDK for Amazon S3 Compatible Cloud Storage MinIO Python SDK is Simple Storage Service (aka S3) client to perform bucket and object operat

High Performance, Kubernetes Native Object Storage 582 Dec 28, 2022
This repository is for active development of the Azure SDK for Python.

Azure SDK for Python This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public

Microsoft Azure 3.4k Jan 02, 2023
Py2neo is a client library and toolkit for working with Neo4j from within Python

Py2neo Py2neo is a client library and toolkit for working with Neo4j from within Python applications. The library supports both Bolt and HTTP and prov

py2neo.org 1.2k Jan 02, 2023
PyRemoteSQL is a python SQL client that allows you to connect to your remote server with phpMyAdmin installed.

PyRemoteSQL Python MySQL remote client Basically this is a python SQL client that allows you to connect to your remote server with phpMyAdmin installe

ProbablyX 3 Nov 04, 2022
A SQL linter and auto-formatter for Humans

The SQL Linter for Humans SQLFluff is a dialect-flexible and configurable SQL linter. Designed with ELT applications in mind, SQLFluff also works with

SQLFluff 5.5k Jan 08, 2023
A selection of SQLite3 databases to practice querying from.

Dummy SQL Databases This is a collection of dummy SQLite3 databases, for learning and practicing SQL querying, generated with the VS Code extension Ge

1 Feb 26, 2022
Use SQL query in a jupyter notebook!

SQL-query Use SQL query in a jupyter notebook! The table I used can be found on UN Data. Or you can just click the link and download the file undata_s

Chuqin 2 Oct 05, 2022
A Telegram Bot to manage Redis Database.

A Telegram Bot to manage Redis database. Direct deploy on heroku Manual Deployment python3, git is required Clone repo git clone https://github.com/bu

Amit Sharma 4 Oct 21, 2022
PostgreSQL database access simplified

Queries: PostgreSQL Simplified Queries is a BSD licensed opinionated wrapper of the psycopg2 library for interacting with PostgreSQL. The popular psyc

Gavin M. Roy 251 Oct 25, 2022
A tutorial designed to introduce you to SQlite 3 database using python

SQLite3-python-tutorial A tutorial designed to introduce you to SQlite 3 database using python What is SQLite? SQLite is an in-process library that im

0 Dec 28, 2021
aiomysql is a library for accessing a MySQL database from the asyncio

aiomysql aiomysql is a "driver" for accessing a MySQL database from the asyncio (PEP-3156/tulip) framework. It depends on and reuses most parts of PyM

aio-libs 1.5k Jan 03, 2023
New generation PostgreSQL database adapter for the Python programming language

Psycopg 3 -- PostgreSQL database adapter for Python Psycopg 3 is a modern implementation of a PostgreSQL adapter for Python. Installation Quick versio

The Psycopg Team 880 Jan 08, 2023
Application which allows you to make PostgreSQL databases with Python

Automate PostgreSQL Databases with Python Application which allows you to make PostgreSQL databases with Python I used the psycopg2 library which is u

Marc-Alistair Coffi 0 Dec 31, 2021
AWS SDK for Python

Boto3 - The AWS SDK for Python Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python, which allows Python developers to wri

the boto project 7.8k Jan 04, 2023
dask-sql is a distributed SQL query engine in python using Dask

dask-sql is a distributed SQL query engine in Python. It allows you to query and transform your data using a mixture of common SQL operations and Python code and also scale up the calculation easily

Nils Braun 271 Dec 30, 2022
High level Python client for Elasticsearch

Elasticsearch DSL Elasticsearch DSL is a high-level library whose aim is to help with writing and running queries against Elasticsearch. It is built o

elastic 3.6k Jan 03, 2023
A tool to snapshot sqlite databases you don't own

The core here is my first attempt at a solution of this, combining ideas from browser_history.py and karlicoss/HPI/sqlite.py to create a library/CLI tool to (as safely as possible) copy databases whi

Sean Breckenridge 10 Dec 22, 2022