
Use asynchronous execution with threads on ETL for fast data loading

The current ETL runs on a single thread, which is why it takes so long.

We can use asynchronous execution with threads, the same way we do in S4A to load the spectral data:

https://git.wur.nl/isric/soils4africa/database/-/blob/main/etl/etl.py?ref_type=heads#L88

Some logic must change, but it's the same principle as the S4A example.

We need to use concurrent.futures and ThreadedConnectionPool from psycopg2.pool (check here).

We can start with 3 to 5 threads and tune from there.
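
A rough sketch of how this could look, not the S4A code itself. It assumes the data can be split into independent chunks of rows; the table name, column names, and the helper names (load_chunk, load_all, make_pool) are hypothetical and only for illustration. Each worker thread borrows a connection from the pool, inserts its chunk, commits, and returns the connection.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def load_chunk(pool, chunk):
    """Insert one chunk of rows using a connection borrowed from the pool."""
    conn = pool.getconn()
    try:
        with conn.cursor() as cur:
            # Hypothetical table and columns; replace with the real ETL target.
            cur.executemany(
                "INSERT INTO example_table (id, value) VALUES (%s, %s)",
                chunk,
            )
        conn.commit()
        return len(chunk)
    except Exception:
        # Undo the partial chunk so the connection goes back clean.
        conn.rollback()
        raise
    finally:
        # Always hand the connection back, even on failure.
        pool.putconn(conn)


def load_all(pool, chunks, max_workers=4):
    """Load all chunks concurrently and return the number of rows inserted."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(load_chunk, pool, c) for c in chunks]
        # result() re-raises any exception from a worker thread.
        return sum(f.result() for f in as_completed(futures))


def make_pool(dsn, max_workers=4):
    """Create a thread-safe pool with one connection per worker thread."""
    # Imported here so the sketch can be read and tested without a database.
    from psycopg2.pool import ThreadedConnectionPool

    return ThreadedConnectionPool(1, max_workers, dsn)
```

Usage would be something like pool = make_pool(dsn), then load_all(pool, chunks), then pool.closeall(). The pool's maximum size should be at least the number of worker threads, because getconn() raises an error when the pool is exhausted instead of waiting for a free connection.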

This is not a priority. 😄

Edited by Calisto, Luis