Commit d1b46c2e authored by Batjes, Niels's avatar Batjes, Niels
Browse files

Upload New File

parent 2cbf8d41
---
title: "Accessing WoSIS from R"
author: "D G Rossiter"
date: "`r Sys.Date()`"
output:
html_document:
toc: TRUE
toc_float: TRUE
theme: "spacelab"
number_section: TRUE
fig_height: 4
fig_width: 6
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
This document shows how to access WoSIS Web Feature Service (WFS) data from R.
For an overview of WoSIS, see https://www.isric.org/explore/wosis. This links to https://www.isric.org/explore/wosis/accessing-wosis-derived-datasets which explains the difference between snapshot and dynamic datasets, and how to access them. The Procedures Manual https://www.isric.org/sites/default/files/isric_report_2018_01.pdf explains how to access the database in various ways; here we expand their brief description of using R for this purpose
# Packages
If you do not have these on your system, install with `install.packages(..., dependencies=TRUE)` or via the R Studio package manager.
```{r}
library(sp) # spatial data types
library(rgdal) # GDAL access from R
library(gdalUtils) # wrappers for GDAL utility programs that could be
# called from the command line
library(aqp) # Algorithms for Quantitative Pedology
```
GDAL is used for spatial data import/export, coordinate systems etc. Check for a valid GDAL installation:
```{r}
gdal_setInstallation()
valid_install <- !is.null(getOption("gdalUtils_gdalPath"))
if (valid_install)
print("Valid GDAL found") else
{print("No valid GDAL"); quit()}
```
# Accessing WoSIS
## Specify WoSIS WFS location and list available layers
Here we get the latest (dynamic) version; if we want the static snapshot substitute `wosis_snapshot` for `wosis_latest`
```{r}
#wfs <- "WFS:http://data.isric.org/geoserver/wosis_latest/wfs/"
wfs <- "http://maps.isric.org/mapserv?map=/map/wosis_latest.map"
layers <- ogrinfo(wfs, ro=TRUE, so=TRUE) # `ro` = read-only; `so` = summary only
length(layers)
head(layers)
```
There are `r length(layers)` layers. Their names can be listed here with `print(layers)`; these are also given in the references above.
## Display layer properties
We need to know the layer's format, extent etc. before we read it. Select the first property in the information list above. Again call `ogrinfo` but this time also specifying a layer within the WFS.
```{r}
ogrinfo(wfs, ro=TRUE, so=TRUE, layer="wosis_latest_bdfi33")
```
We see this is "Bulk density of the fine earth fraction < 2 mm, equilibrated at 33 kPa". Refer to the procedures manual for how each property (here, bulk density) was determined. We see the feature count (about 76k), its geometry (points, as is everything in WoSIS), its Coordinate Reference System (CRS, listed here as `GEOCGS` under `SRS` = "Spatial Reference System", `WKT` = "Well-Known Type"), which is EPSG:4326.
This function allows SQL queries to limit the extent of the search, by using the (optional) `where` argument. To use this we need to know the field names, which we saw in the previous output.
For example, all the subsoil bulk densities from India:
```{r}
ogrinfo(wfs, ro=TRUE, so=TRUE, layer="wosis_latest_bdfi33",
where="country_name='India' AND upper_depth > 100")
```
Here there are only 32 records.
The `ogrinfo` function also allows an (optional) limitation to a bounding box with the `spat` argument, as a four-element vector `(xmin, ymin, xmax, ymax)`.
For example, all the profile (site) information from a $2^\circ \times 2^\circ$ tile in central Europe:
```{r}
ogrinfo(wfs, ro=TRUE, so=TRUE, layer="wosis_latest_profiles",
spat=c(6, 48, 8, 50))[11]
```
There are 26 profiles from the area.
# WoSIS layers as ESRI shapefiles.
One format provided by WoSIS is the ESRI shapefile.
## Import to client system
There seems to be no way to directly import from the WFS to an R workspace object, so there must first be an intermediate step: download the WFS layer to a local directory, and then import as usual for GIS layers.
The `ogr2ogr` function reads from one format on the server and writes to another in our client. The default output file format is an ESRI Shapefile; another format can be specified with the (optional) `f` argument.
The possible formats are listed at https://docs.geoserver.org/latest/en/user/services/wfs/outputformats.html. Here we will use the default shapefile and also CSV in the next section. Another popular format is JSON "JavaScript Object Notation".
The `where` and `spat` arguments can also be used here.
The `ogr2ogr` function also allows an (optional) transformation to any EPSG-defined CRS with the `t_srs` argument.
See `?ogr2ogr` for many more options.
We download the first property, i.e., bulk density, and save it a subdirectory of the current path, and with the default format. The directory must be first created if necessary.
```{r}
wosis.dir.name <- "./wosis"
if (!file.exists(wosis.dir.name)) dir.create(wosis.dir.name)
layer.name <- "wosis_latest_bdfi33"
print(wfs)
ogr2ogr(src=wfs,
dst=wosis.dir.name,
layer=layer.name,
f="ESRI Shapefile", # this is default, but specify it anyway
overwrite=TRUE)
```
Because the default destination format is `"ESRI Shapefile"` many of the field names have to be shortened.
## Reading the shapefile into R
Now read the downloaded shapefile into an R `sp` object. For shapefiles, the directory and layer names must be specified separately as two arguments, `dsn` ("data set name") and `layer`.
```{r}
(shapefile.name <- paste0('wosis_latest:',layer.name))
# here strings are just that, not to be interpreted as R factors
bd33 <- readOGR(dsn=wosis.dir.name, layer=shapefile.name,
stringsAsFactors = FALSE)
class(bd33)
proj4string(bd33)
names(bd33@data)
head(bd33@data)
```
The shapefile has been imported as a `SpatialPointsDataFrame` with the correct CRS, as we saw from the `ogrinfo`.
Examine the format of the attribute, this is in field `*_val`:
```{r}
head(bd33@data$bdfi33_val)
```
The format is `{seq:val[,seq:val]}` where the `seq` is an integer on `[1...]` indicating which measurement number -- note that there can be more than one measurement per property, e.g., repeated lab. measurements, and `val` is the numeric value.
But the average value has its own field, so if we only want the average, it is prepared for us. We see an example here, from six rows chosen to show several profiles with their layers:
```{r}
bd33@data[75:80, c("profile_id","upper_dept","lower_dept","bdfi33_val","bdfi33_v_1")]
```
## Working with the WoSIS layer as a SoilProfileCollection
Convert this to a `SoilProfileCollection`, a data type defined in the `aqp` "Algorithms for Quantitive Pedology" package. This data type has separate structures for the site (profile) and horizons.
Initialize with the horizons from the data frame created from the WoSIS layer.
The `aqp::depth` function initializes the SoilProfileCollection object. The formula has the field name of the profile on the left, and the the field names of the horizon boundaries on the right. These fields are in the WoSIS layer.
```{r}
ds.aqp <- bd33@data
depths(ds.aqp) <- profile_id ~ upper_dept + lower_dept
is(ds.aqp)
slotNames(ds.aqp)
str(ds.aqp@site)
str(ds.aqp@horizons)
head(ds.aqp@site)
head(ds.aqp@horizons[c(2,5,6,7,9)],12)
```
Note how the horizons have been grouped into sites, in the `@site` slot, and the per-horizon (by depth) values are in the `@horizons` slot. Here we have `r dim(ds.aqp@horizons)[1]` horizons in `r dim(ds.aqp@site)[1]` profiles.
Now this `SoilProfileCollection` can be used for many `aqp` functions. For example, here is the depth distribution of average bulk density of the components for the first 24 listed profiles, labelled by genetic horizon:
```{r fig.width=12, fig.height=8}
plot(ds.aqp[1:24,], name="layer_name", color='bdfi33_v_1')
```
A few layers in this set of profiles are missing bulk denisity.
# WoSIS layers as CSV files
Another format provided by WoSIS is the CSV 'comma-separated values' file.
## Read a layer into the client system as a CSV file
We can instead read point information into a text file. Here the profile information for the $2^\circ \times 2^\circ$ tile in central Europe:
(Note: `overwrite=TRUE` does not seem to work for `.csv` files, so any previous must be manually deleted).
```{r}
wosis.dir.name <- "./wosis"
layer.name <- "wosis_latest_profiles"
target.name <- paste0(wosis.dir.name,"/",layer.name,".csv")
if (file.exists(target.name)) { file.remove(target.name) }
ogr2ogr(src=wfs,
spat=c(6, 48, 8, 50),
dst=target.name,
layer=layer.name,
f="CSV",
overwrite=TRUE)
```
And the bulk density:
```{r}
layer.name <- "wosis_latest_bdfi33"
target.name <- paste0(wosis.dir.name,"/",layer.name,".csv")
if (file.exists(target.name)) { file.remove(target.name) }
ogr2ogr(src=wfs,
dst=paste0(wosis.dir.name,"/",layer.name,".csv"),
layer=layer.name,
f="CSV",
overwrite=TRUE)
```
## Read from a local text file into R objects
The `read.csv` function reads from a CSV file into an R `data.frame`.
First, the bulk density points:
```{r}
layer.name <- "wosis_latest_bdfi33"
bd33.pts <- read.csv(file=paste0(wosis.dir.name,"/",layer.name,".csv"),
stringsAsFactors = FALSE)
dim(bd33.pts)
names(bd33.pts)
```
Notice that here we have the complete field names, not truncated as they were in the shapefiles.
These can now be processed as above (the shapefiles).
Second, the profiles (sites):
```{r}
layer.name <- "wosis_latest_profiles"
profiles <- read.csv(paste0(wosis.dir.name, "/",layer.name,".csv"),
stringsAsFactors = FALSE)
names(profiles)
coordinates(profiles) <- c("longitude", "latitude")
proj4string(profiles) <- CRS("+init=epsg:4326")
```
Review some site information, e.g., the WRB Reference Soil Groups:
```{r}
table(profiles@data$cwrb_reference_soil_group)
```
And a map:
```{r}
spplot(profiles, zcol="cwrb_reference_soil_group", key.space="right")
```
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment