Frictionless DarwinCore pages

View on GitHub

Date: 30 Nov 2019 Title: Programming

Today we’ll write a simple Python program on top of FrictionlessDarwinCore and Goodtables libraries. The use case is the following: you want to validate a DarwinCore file using Goodtables tool. In a previous post, we’ve seen how to do that in two steps from a terminal:

$ fdwca 0000637-190918142434337.zip download_DP.zip
$ goodtables download_DP.zip

Now we’ll do the same but in a single Python CLI program(dwca_report.py).

1. Imports

Our program requires 3 libraries:

Start a Python file (eg dwca_report.py) and write this:

import click

from FrictionlessDarwinCore import DwCArchive
from goodtables import validate

2. CLI Parameters

Our program will receive two parameters:

The report itself being generated on default output. Thanks to Click library we can easily define these parameters:

@click.argument('dwca', type=str, required=True)
@click.argument('outpath', type=click.Path(writable=True),required=True)
def do_it(dwca, outpath):

if __name__ == '__main__':

Our main program will simply invoke a do_it() function. Note that the dwca parameter is a string (URL or path) while outpath paramter is a writable path.

3. Do_it()

To produce the report, do_it() function will successively:

Each step may fail, so we will check that everything went well before processing the next step. At the core of our program, the do_it() function, will now look like this:

def do_it(dwca, outpath):
  da = DwCArchive(dwca)
  if not da.valid:
    print('invalid dwca after init()')
  if not da.valid:
    print('invalid dwca after infer()')
  if not da.valid:
    print('invalid dwca after save()')
  report = validate(outpath)

Alternatively, you might consider using try except.

4. From your terminal

Now, we can run our CLI program directly from a terminal, analyse the report and if necessary fix the data errors.

> Python dwca_report.py https://ipt.biodiversity.be/archive.do?r=rbins_saproxilyc_beetles DP1.zip
downloading https://ipt.biodiversity.be/archive.do?r=rbins_saproxilyc_beetles as /var/folders/19/mrkkz1tx34l98n1rs3qr3xr00000gn/T/tmpfao8xjbj
Adding additional_id_field
saving zipfile to DP.zip
{'time': 0.712, 'valid': True, 'error-count': 0, 'table-count': 1, 'tables': [{'datapackage': 'DP.zip', 'time': 0.536, 'valid': True, 'error-count': 0, ...}