EOxServer Operators’ Guide

Basic Concepts

EOxServer is all about coverages - see the EOxServer Basics for a short description.

In the language of the OGC Abstract Specification, coverages are mappings from a domain set that is related to some area of the Earth to a range set. So, the data model for coverages contains information about the structure of the domain set and of the range set (the so-called Range Type).

In the Coverages section below you will find more detailed information about the data and metadata stored by EOxServer.

The actual data EOxServer deals with can be stored in different ways. These storage facilities are discussed below in the section on Storage Backends.

Operators have several options for ingesting data into the system. Using the Admin Client, you can edit the contents of the EOxServer database. For batch processing in particular, the Command Line Tools may be preferable.

Storage Backends

EOxServer supports different kinds of data stores for coverage data:

  • as an image file stored on the local file system
  • as an image file stored on a remote FTP server
  • as a raster array in a rasdaman database

These different ways of storing data are called Storage Backends. Internally, EOxServer uses the term Location as an abstraction for the different ways in which access to the data is described. Each storage backend has its own type of Location, described in the following subsections.

Local

A path on the local filesystem is the most straightforward way to define the location of a resource. You can use relative paths as well as absolute paths. Please keep in mind that relative paths are interpreted as being relative to the working directory of the process EOxServer runs in. For Apache processes, for instance, this is usually the root directory /.

FTP Repositories

EOxServer allows you to define locations on a remote FTP server. This is useful if you do not want to transfer a whole large archive to the machine EOxServer runs on. In that case you can define a remote path that consists of information about the FTP server and the path relative to the root directory of the FTP repository.

An FTP Storage record - as it is called in EOxServer - contains the URL of the server and optional port, username and password entries.

Resources stored on an FTP server are transferred only when they are needed. There is, however, a cache for transferred files on the machine EOxServer runs on.

Rasdaman Databases

The third backend currently supported is the rasdaman database. A rasdaman location consists of the rasdaman database connection information and the collection of the corresponding resource.

The rasdaman storage records contain hostname, port, database name, user and password entries.

The data is retrieved from the database using the rasdaman GDAL driver (see Installation for further information).

Coverages

EOxServer coverages fall into three main categories:

  • Rectified Datasets
  • Referenceable Datasets
  • Rectified Stitched Mosaics

In addition there is the Dataset Series type which corresponds to an inhomogeneous collection of coverages.

Range Types

Every coverage has a range type describing the structure of the data. Each range type has a given data type; the following data types are supported:

Data Type Name    Data Type Value
Unknown           0
Byte              1
UInt16            2
Int16             3
UInt32            4
Int32             5
Float32           6
Float64           7
CInt16            8
CInt32            9
CFloat32          10
CFloat64          11

A range type consists of one or more bands. For each band you may specify a name, an identifier and a definition that describes the property measured (e.g. radiation). Furthermore, you can define nil values for each band (i.e. values that indicate that there is no measurement at the given position).

This range type metadata is used in the coverage description metadata that is returned by WCS operations and for configuring WMS layers.

Note that WMS supports only one data type (Byte) and only Grayscale and RGB output. Any other range types will be mapped to these: for single-band coverages, Grayscale output is generated; for all others, RGB output is generated using the first three bands. Automatic scaling is applied when mapping from another data type to Byte: the minimum-maximum interval for the given subset of the coverage is computed and mapped to the 0-255 interval supported by the Byte data type.
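The automatic scaling described above can be illustrated with a short sketch. This is not EOxServer's actual implementation; the function name scale_to_byte and the rounding rule are assumptions made purely for illustration.

```python
def scale_to_byte(values):
    """Linearly map the [min, max] interval of a subset to the 0-255 Byte range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant subset has no dynamic range, so map it to 0.
        return [0 for _ in values]
    return [int(round((v - lo) * 255.0 / (hi - lo))) for v in values]


# Example: a UInt16 subset whose values range from 1000 to 3000.
scaled = scale_to_byte([1000, 1500, 2500, 3000])
print(scaled)  # [0, 64, 191, 255]
```

Because the interval is computed per subset, the same pixel value may be rendered with a different brightness when a different subset is requested.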

If you want to view band combinations other than the default ones, you can use the EO-WMS features implemented by EOxServer. For each coverage, an additional layer called <coverage id>_bands is provided for WMS 1.3. Using this layer and the DIM_BAND KVP parameter, you can select another combination of bands (either 1 or 3 bands).

EO Metadata

Earth Observation (EO) metadata records are stored for each EO coverage and Dataset Series. They contain the acquisition begin and end time as well as the footprint of the coverage. The footprint is a polygon that describes the outlines of the area covered by the coverage.

Rectified Datasets

Rectified Datasets are EO coverages whose domain set is a rectified grid, i.e. they have regular spacing in a projected or geographic CRS. In practice, this applies to ortho-rectified satellite data. The rectified grid is described by the EPSG SRID of the coordinate reference system, the extent and the pixel size of the coverage.

Rectified Datasets can be added to Dataset Series and Rectified Stitched Mosaics.

Referenceable Datasets

Referenceable Datasets are EO coverages whose domain set is a referenceable grid, i.e. they are not rectified, but are associated with one or more coordinate transformations which relate the image to a projected or geographic CRS. That means that there is some general transformation between the grid cell coordinates and coordinates in an Earth-bound spatial reference system. This applies to satellite data in its original geometry.

At the moment, EOxServer supports only referenceable datasets that contain ground control points (GCPs) in the data files. Simple approximate transformations based on these GCPs are used to generate rectified views on the data for WMS and to calculate subset bounds for WCS GetCoverage requests. Note that these transformations can be very inaccurate in comparison to an actual ortho-rectification of the coverage.

Rectified Stitched Mosaics

Rectified Stitched Mosaics are EO coverages that are composed of a set of homogeneous Rectified Datasets. That means, the datasets must have the same range type and their domain sets must be subsets of the same rectified grid.

When creating a Rectified Stitched Mosaic, a homogeneous coverage is generated from the contained Rectified Datasets. Where datasets overlap, the most recent one, as indicated by the acquisition timestamps in the EO metadata, is shown on top, hiding the others.
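The overlap rule can be pictured as a sort by acquisition time, with the most recent dataset drawn last. The sketch below is only an illustration: the function name, the dictionary layout, the dataset IDs and the choice of the acquisition end time as the sort key are all assumptions, not EOxServer internals.

```python
from datetime import datetime


def stacking_order(datasets):
    """Return dataset IDs oldest first; the last one ends up on top."""
    ordered = sorted(datasets, key=lambda d: d["end_time"])
    return [d["id"] for d in ordered]


# Hypothetical datasets with hypothetical acquisition end times.
scenes = [
    {"id": "scene_b", "end_time": datetime(2005, 3, 31, 8, 0, 36)},
    {"id": "scene_a", "end_time": datetime(2004, 7, 12, 9, 15, 0)},
]
order = stacking_order(scenes)
print(order)  # ['scene_a', 'scene_b']
```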

Dataset Series

Any Rectified and Referenceable Datasets can be organized in Dataset Series, including multiple datasets which overlap spatially and/or temporally. Furthermore, Stitched Mosaics can also be organized in Dataset Series.

Data Preparation and Supported Data Formats

EO Coverages consist of raster data and metadata. The way this data is stored can vary considerably. EOxServer supports a wide range of different data and metadata formats which are described below.

Raster Data Formats

EOxServer uses the GDAL library for raster data handling. So does MapServer whose scripting API (MapScript) is used by EOxServer as well. In principle, any format supported by GDAL can be read by EOxServer and registered in the database.

There is, however, one caveat. Most data formats are composed of bands which contain the data (e.g. ENVISAT N1, GeoTIFF, JPEG 2000). But some data formats (notably netCDF and HDF) have a different substructure: subdatasets. At the moment these data formats are only supported for data output, but not for data input.

For more information on configuration of supported raster file formats read “Supported Raster File Formats and Their Configuration”.

Raster Data Preparation

Usually, raster data does not need to be prepared in a special way to be ingested into EOxServer.

If the raster data file is structured in subdatasets, though, as is the case with netCDF and HDF, you will have to convert it to another format. You can use the gdal_translate command for that task:

$ gdal_translate -of <Output Format> <Input File Name> <Output File Name>

You can display the list of possible output formats with:

$ gdalinfo --formats

For automatic registration of datasets, EOxServer relies on the geospatial metadata stored with the dataset, notably the EPSG ID of the coordinate reference system and the geospatial extent. In some cases the CRS information in the dataset does not contain the EPSG code. If you are using the command line interfaces of EOxServer you can specify an SRID with the --default-srid option. As an alternative you can try to add the corresponding information to the dataset, e.g. with:

$ gdal_translate -a_srs "+init=EPSG:<SRID>" <Input File Name> <Output File Name>

For performance reasons, especially if you are using WMS, you might also consider adding overviews to the raster data files using the gdaladdo command (see the gdaladdo documentation). Note, however, that this is supported only by a few formats like GeoTIFF and JPEG2000.

Metadata Formats

There are two possible ways to store metadata: the first one is to store it in the data file itself, the second one is to store it in an accompanying metadata file.

Only a subset of the supported raster data formats is capable of storing metadata in the data file. Furthermore, there are no standards defining the semantics of the metadata for generic formats like GeoTIFF. For mission-specific formats, however, there are thorough specifications in place.

EOxServer supports reading basic metadata from ENVISAT N1 files and files that have a similar metadata structure (e.g. a GeoTIFF file with the same metadata tags).

For other formats metadata files have to be provided. EOxServer supports two XML-based formats:

  • OGC Earth Observation Profile for Observations and Measurements (OGC 10-157r2)
  • an EOxServer native format

Here is an example of EO O&M:

<?xml version="1.0" encoding="ISO-8859-1"?>
<eop:EarthObservation gml:id="eop_ASA_WSM_1PNDPA20050331_075939_000000552036_00035_16121_0775" xmlns:eop="http://www.opengis.net/eop/2.0" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:om="http://www.opengis.net/om/2.0">
  <om:phenomenonTime>
    <gml:TimePeriod gml:id="phen_time_ASA_WSM_1PNDPA20050331_075939_000000552036_00035_16121_0775">
      <gml:beginPosition>2005-03-31T07:59:36Z</gml:beginPosition>
      <gml:endPosition>2005-03-31T08:00:36Z</gml:endPosition>
    </gml:TimePeriod>
  </om:phenomenonTime>
  <om:resultTime>
    <gml:TimeInstant gml:id="res_time_ASA_WSM_1PNDPA20050331_075939_000000552036_00035_16121_0775">
      <gml:timePosition>2005-03-31T08:00:36Z</gml:timePosition>
    </gml:TimeInstant>
  </om:resultTime>
  <om:procedure />
  <om:observedProperty />
  <om:featureOfInterest>
    <eop:Footprint gml:id="footprint_ASA_WSM_1PNDPA20050331_075939_000000552036_00035_16121_0775">
      <eop:multiExtentOf>
        <gml:MultiSurface gml:id="multisurface_ASA_WSM_1PNDPA20050331_075939_000000552036_00035_16121_0775" srsName="http://www.opengis.net/def/crs/EPSG/0/4326">
          <gml:surfaceMember>
            <gml:Polygon gml:id="polygon_ASA_WSM_1PNDPA20050331_075939_000000552036_00035_16121_0775">
              <gml:exterior>
                <gml:LinearRing>
                  <gml:posList>-33.03902600 22.30175400 -32.53056000 20.09945700 -31.98492200 17.92562200 -35.16690300 16.72760500 -35.73368300 18.97694800 -36.25910700 21.26212300 -33.03902600 22.30175400</gml:posList>
                </gml:LinearRing>
              </gml:exterior>
            </gml:Polygon>
          </gml:surfaceMember>
        </gml:MultiSurface>
      </eop:multiExtentOf>
    </eop:Footprint>
  </om:featureOfInterest>
  <om:result />
  <eop:metaDataProperty>
    <eop:EarthObservationMetaData>
      <eop:identifier>ASA_WSM_1PNDPA20050331_075939_000000552036_00035_16121_0775</eop:identifier>
      <eop:acquisitionType>NOMINAL</eop:acquisitionType>
      <eop:status>ARCHIVED</eop:status>
    </eop:EarthObservationMetaData>
  </eop:metaDataProperty>
</eop:EarthObservation>

The native format has the following structure:

<Metadata>
    <EOID>some_unique_eoid</EOID>
    <BeginTime>YYYY-MM-DDTHH:MM:SSZ</BeginTime>
    <EndTime>YYYY-MM-DDTHH:MM:SSZ</EndTime>
    <Footprint>
        <Polygon>
            <Exterior>Mandatory - some_pos_list as all-space-delimited Lat Lon pairs (closed polygon i.e. 5 coordinate pairs for a rectangle) in EPSG:4326</Exterior>
            [
             <Interior>Optional - some_pos_list as all-space-delimited Lat Lon pairs (closed polygon) in EPSG:4326</Interior>
             ...
            ]
        </Polygon>
    </Footprint>
</Metadata>
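To make the structure of the native format concrete, the following sketch reads such a file with Python's standard library and checks that the exterior ring is closed. It is not the parser used by EOxServer; the EOID, the coordinates and the helper name parse_ring are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record following the native structure shown above.
NATIVE_XML = """<Metadata>
    <EOID>example_eoid</EOID>
    <BeginTime>2005-03-31T07:59:36Z</BeginTime>
    <EndTime>2005-03-31T08:00:36Z</EndTime>
    <Footprint>
        <Polygon>
            <Exterior>46.0 16.0 46.0 17.0 47.0 17.0 47.0 16.0 46.0 16.0</Exterior>
        </Polygon>
    </Footprint>
</Metadata>"""


def parse_ring(text):
    """Turn a space-delimited 'lat lon lat lon ...' string into (lat, lon) pairs."""
    numbers = [float(token) for token in text.split()]
    if len(numbers) % 2 != 0:
        raise ValueError("odd number of coordinates")
    ring = list(zip(numbers[0::2], numbers[1::2]))
    # A closed ring repeats its first point, e.g. 5 pairs for a rectangle.
    if len(ring) < 4 or ring[0] != ring[-1]:
        raise ValueError("ring is not a closed polygon")
    return ring


root = ET.fromstring(NATIVE_XML)
eoid = root.findtext("EOID")
exterior = parse_ring(root.findtext("Footprint/Polygon/Exterior"))
print(eoid, len(exterior))  # example_eoid 5
```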

The automatic registration tools for EOxServer (see below under Command Line Tools) expect that the metadata file accompanying the data file has the same name with .xml as extension.

Metadata Preparation

EOxServer provides a tool to extract metadata from ENVISAT N1 files and convert it to EO O&M format. It can be found under tools/gen_envisat_md.py. It accepts an input path to an N1 file and stores the resulting XML file under the same path with the appropriate file name (i.e. replacing the .N1 extension with .xml). Note that EOxServer must be in the Python path and the environment variable DJANGO_SETTINGS_MODULE must be set and point to a properly configured EOxServer instance.

Admin Client

The Admin Client is accessible via any standard web browser at the path /admin under the URL where your instance is deployed, or simply by following the admin link on the start page. See Deployment for more details.

Use the username and password you provided during the syncdb step as described in the Service Instance Creation and Configuration section.

Creating a custom Range Type

Before registering any data in EOxServer some vital information on the datasets has to be provided. Detailed information regarding the kind of data stored can be defined in the Range Type. A Range Type is a collection of bands, each of which is assigned a specific Data Type (see Range Types).

A standard RGBA PNG, for example, holds four bands (RGB + Alpha), each of them able to store 8-bit data. Therefore the Range Type would have to be defined with four bands (red, green, blue, alpha), each of them having ‘Byte’ as Data Type.

In our example we use the reduced MERIS RGB data provided in the autotest instance. gdalinfo provides us with the most important information:

[...]
Band 1 Block=541x5 Type=Byte, ColorInterp=Red
Band 2 Block=541x5 Type=Byte, ColorInterp=Green
Band 3 Block=541x5 Type=Byte, ColorInterp=Blue

First, we have to define the bands by clicking “add” next to “Bands” in the Admin interface. In “Name”, “Identifier” and “Description” you can enter the same content for now. The default “Definition” value can be “http://www.opengis.net/def/property/OGC/0/Radiance” for now. “UOM” stands for “unit of measurement”, which in our case is that of radiance, given by the value “W.m-2.Sr-1”. To display the data correctly, it is recommended to assign the respective value in “GDAL Interpretation”. NoData values can be defined by adding a “Nilvaluerecord” (see screenshot).

../../_images/admin_app_01_add_band.png
../../_images/admin_app_02_create_band1.png
../../_images/admin_app_03_create_band2.png

After also adding the green and blue bands, we can proceed to define the Range Type. After giving the new Range Type a name, you have to assign a Data Type for all of its data; in our case we select “Byte”. Below, we now have to add our three Bands by clicking on the lowermost “+” icon. The important part here is to assign each Band its respective number (‘1’ for red and so on) (see screenshot).

../../_images/admin_app_04_add_rangetype.png

Alternatively we could have started with the Range Type and added each band via the “+” icons next to the bands directly.

To list, export, and load range types using the command-line tools see Range Type Handling.

Linking to a Local Path

Click “Add” on “Local paths” and paste the desired local directory where your data is. Make sure the system user under which the web server process is running, typically apache, has read access.

Creating a Data Package

A Data Package consists of a GDAL-readable image file and a corresponding XML metadata file using the WCS 2.0 Earth Observation Application Profile (EO-WCS).

../../_images/admin_app_05_data_package.png

Adding Data Sources

After adding a Local Path or location (pointing to a single directory, not a specific file) you can combine this with a search pattern and create a Data Source. A viable search pattern would be something like “*.tif” to add all TIFF files stored in that directory. Please note that in this case, every TIFF needs an XML file with the exact same name holding the EO metadata.
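Before creating a Data Source it can be helpful to check that every data file matching the search pattern has its accompanying XML file. The following stand-alone sketch does this with the Python standard library only; the function name missing_metadata and the file names are hypothetical, and the script is not part of EOxServer.

```python
import fnmatch
import os
import tempfile


def missing_metadata(directory, pattern):
    """List data files matching `pattern` that lack a same-named .xml file."""
    missing = []
    for name in sorted(os.listdir(directory)):
        if not fnmatch.fnmatch(name, pattern):
            continue
        base, _ext = os.path.splitext(name)
        if not os.path.exists(os.path.join(directory, base + ".xml")):
            missing.append(name)
    return missing


# Demonstration in a throw-away directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.tif", "a.xml", "b.tif"):
        open(os.path.join(tmp, name), "w").close()
    result = missing_metadata(tmp, "*.tif")
    print(result)  # ['b.tif']
```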

../../_images/admin_app_06_add_data_source.png

Creating a Dataset Series

A Dataset Series can contain any number of EO Coverages, i.e. Datasets or Stitched Mosaics. A Dataset Series therefore has its own metadata entry, which relates to the metadata of the datasets it contains.

../../_images/admin_app_07_add_dataset_series.png

Command Line Tools

eoxserver-admin.py create_instance

The first important command line tool is used for Service Instance Creation and Configuration of EOxServer and is explained in the Installation section of this users’ guide.

eoxs_register_dataset

Besides this tool, EOxServer adds some custom commands to Django’s manage.py script. The eoxs_register_dataset command is detailed in the Data Registration section.

eoxs_deregister_dataset

The eoxs_deregister_dataset command allows the de-registration of existing datasets (simple coverage types only, i.e. Rectified and Referenceable Datasets) from an EOxServer instance, including proper unlinking from relevant container types. The functionality of this command is complementary to that of the eoxs_register_dataset command.

It is worth mentioning that de-registration does not remove the physical data stored in the file system or in other storage backends. Extra effort is therefore needed to purge the physical data and metadata files from their storage.

To de-register a dataset (coverage) identified by its (Coverage/EO) identifier, invoke the following command:

python manage.py eoxs_deregister_dataset <CoverageID>

The de-registration command allows convenient de-registration of an arbitrary number of datasets at the same time:

python manage.py eoxs_deregister_dataset <CoverageID> <CoverageID> ...

The eoxs_deregister_dataset command does not allow the removal of container objects such as Rectified Stitched Mosaics or Dataset Series.

The eoxs_deregister_dataset command, by default, does not allow the de-registration of automatic datasets (i.e. datasets registered by the synchronization process, see What is synchronization?). Although this restriction can be overridden by the --force option, it is not recommended to do so.

Updating Datasets

There is currently no way to update registered EOxServer datasets from the command line. If such an action is needed, it is recommended to de-register the existing dataset first (see the eoxs_deregister_dataset command) and register it again with the updated parameters (see the eoxs_register_dataset command). Special attention should be paid to linking the updated dataset to all its container objects during registration, as this information is removed by the de-registration.

eoxs_add_dataset_series

The eoxs_add_dataset_series command allows the creation of a dataset series with initial data sources or coverages included. In its simplest use case, only the --eo-id parameter is required, which has to be a valid and not yet taken identifier for the Dataset Series.

When supplied with the --data-sources parameter, the given data sources will be added once the Dataset Series is created. When using --data-sources it is highly recommended to also use --patterns, a list of search patterns, each of which is used for the data source at the same index. When only one pattern is given, it is used for all data sources.
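The pairing of data sources and search patterns by index can be sketched as follows. This only illustrates the rule stated above and is not the command's actual code; the function name, the paths and the error raised for mismatched lengths are assumptions.

```python
def pair_sources_with_patterns(data_sources, patterns):
    """Pair each data source with the pattern at the same index."""
    if len(patterns) == 1:
        # A single pattern is reused for every data source.
        patterns = patterns * len(data_sources)
    if len(patterns) != len(data_sources):
        # Assumption: any other length mismatch is treated as an error here.
        raise ValueError("give one pattern, or one pattern per data source")
    return list(zip(data_sources, patterns))


# Hypothetical directories sharing one search pattern.
pairs = pair_sources_with_patterns(["/data/a", "/data/b"], ["*.tif"])
print(pairs)  # [('/data/a', '*.tif'), ('/data/b', '*.tif')]
```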

Range types for datasets can be read from configuration files that are accompanying them. There can be a configuration file for each dataset or one that applies to all datasets contained within a directory corresponding to a data source. Configuration files have the file extension .conf. The file name is the same as the one of the dataset (so the dataset foo.tiff needs to be accompanied by foo.conf) or __default__.conf if you want to use the config file for the whole directory. The syntax for the file is as follows:

[range_type]
range_type_name=<range type name>

Both approaches may be combined: configuration files can be produced only for some of the datasets in a directory, with a default range type defined in __default__.conf. EOxServer will first look up the dataset configuration file and fall back to the default only if there is no individual .conf file.
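The lookup order described above (individual .conf file first, then __default__.conf) can be sketched with Python's configparser. This is an illustration, not EOxServer's own lookup code; the function name lookup_range_type and the range type name Grayscale are hypothetical.

```python
import configparser
import os
import tempfile


def lookup_range_type(dataset_path):
    """Return the range type name for a dataset, or None if no .conf applies."""
    directory = os.path.dirname(dataset_path)
    base, _ext = os.path.splitext(os.path.basename(dataset_path))
    candidates = [
        os.path.join(directory, base + ".conf"),      # per-dataset file first
        os.path.join(directory, "__default__.conf"),  # directory-wide fallback
    ]
    for path in candidates:
        if os.path.exists(path):
            parser = configparser.ConfigParser()
            parser.read(path)
            return parser.get("range_type", "range_type_name")
    return None


# Demonstration: foo.tiff has its own file, bar.tiff falls back to the default.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "foo.conf"), "w") as f:
        f.write("[range_type]\nrange_type_name=RGB\n")
    with open(os.path.join(tmp, "__default__.conf"), "w") as f:
        f.write("[range_type]\nrange_type_name=Grayscale\n")
    results = (
        lookup_range_type(os.path.join(tmp, "foo.tiff")),
        lookup_range_type(os.path.join(tmp, "bar.tiff")),
    )
    print(results)  # ('RGB', 'Grayscale')
```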

Unless the --no-sync parameter is given, this also triggers a synchronization as explained in the chapter What is synchronization?.

Already registered datasets can be automatically added to the Dataset Series by using the --add option, which takes a list of IDs referencing Rectified Datasets, Referenceable Datasets or Rectified Stitched Mosaics.

The optional --default-begin-time, --default-end-time and --default-footprint parameters can be used to supply some default metadata values. Note: once the Dataset Series is synchronized, these values are overridden.

eoxs_synchronize

This command synchronizes an EOxServer instance with the file system.

What is synchronization?

In the context of EOxServer, synchronization is the process of updating the database models for container objects (such as RectifiedStitchedMosaics or DatasetSeries) according to changes in the file system.

Automatic datasets are deleted from the database when their data files cannot be found in the file system. Similarly, new datasets will be created when new files matching the search pattern are found in the directories associated with the container.

When datasets are added to or deleted from a container object, the metadata (e.g. the footprint of the features of interest or the time extent of the image) of the container is also likely to be adjusted.
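Conceptually, synchronization compares the set of registered data files with the set of files currently found on disk. The sketch below shows only this comparison; it is not the EOxServer implementation, and the function name and file names are hypothetical.

```python
def plan_synchronization(registered_files, files_on_disk):
    """Compare registered data files with those currently found on disk."""
    registered = set(registered_files)
    found = set(files_on_disk)
    return {
        "delete": sorted(registered - found),  # data file no longer present
        "create": sorted(found - registered),  # new file matching the pattern
    }


plan = plan_synchronization(["a.tif", "b.tif"], ["b.tif", "c.tif"])
print(plan)  # {'delete': ['a.tif'], 'create': ['c.tif']}
```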

Reasons for Synchronization

There are several occasions where synchronization is necessary:

  • A file has been added to a folder associated with a container
  • A file from a folder associated with a container has been removed
  • EO Metadata has been changed
  • A regular check for database consistency

HowTo

Synchronization can be triggered by a custom Django admin command, called eoxs_synchronize.

To start the synchronization process, navigate to your instance's directory and type:

python manage.py eoxs_synchronize <IDs>

where <IDs> are the coverage/EO IDs of the containers that shall be synchronized.

Alternatively, with the -a or --all option, all container objects in the database will be synchronized. This option is useful for a daily cron job, ensuring the database's consistency with the file system.

python manage.py eoxs_synchronize --all

The synchronization process may take some time, especially when FTP/Rasdaman storages are used; the duration also depends on the number of synchronized objects.

eoxs_insert_into_series

This command inserts (links) existing coverages (datasets) into dataset series.

The same result can be achieved during dataset registration by using the --dataset-series option of the eoxs_register_dataset command.

To insert a coverage into a dataset series use this command:

python manage.py eoxs_insert_into_series <CoverageID> <DatasetSeriesID>

For convenience, multiple coverages can be inserted at once:

python manage.py eoxs_insert_into_series <CoverageID1> <CoverageID2> ... <DatasetSeriesID>

All given IDs but the last are interpreted as coverage IDs and the last as the ID for the dataset series.
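The interpretation of the positional arguments can be written down in a few lines. This sketch only mirrors the rule stated above; the function name split_ids and the example IDs are hypothetical.

```python
def split_ids(args):
    """Split positional IDs into (coverage IDs, dataset series ID)."""
    if len(args) < 2:
        raise ValueError("need at least one coverage ID and one dataset series ID")
    return list(args[:-1]), args[-1]


coverage_ids, series_id = split_ids(["cov_1", "cov_2", "series_1"])
print(coverage_ids, series_id)  # ['cov_1', 'cov_2'] series_1
```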

The IDs can also be set explicitly via the --datasets and --dataset-series options, which also allows the insertion of datasets into multiple dataset series:

python manage.py eoxs_insert_into_series --datasets <CoverageID1> <CoverageID2> \
                             --dataset-series <DatasetSeriesID1> <DatasetSeriesID2>

eoxs_remove_from_series

This command is complementary to eoxs_insert_into_series, as it removes (unlinks) coverages from a dataset series. As these two commands have very similar semantics, the parameters are the same and have the same meaning.

To remove a single coverage from a dataset series type:

python manage.py eoxs_remove_from_series <CoverageID> <DatasetSeriesID>

As with eoxs_insert_into_series, multiple coverages can be removed at once.

It is worth mentioning that the eoxs_remove_from_series command does not de-register the unlinked datasets; they are still held by EOxServer. If de-registration of the datasets is desired, the eoxs_deregister_dataset command does so and also unlinks the datasets from all their containers.

eoxs_check_id

The eoxs_check_id command checks the status of the queried coverage/EO identifier. The command reports the status via its return code (0 - True or 1 - False).

By default the command checks whether an identifier can be used (is available) as a new Coverage/EO ID:

python manage.py eoxs_check_id <ID> && echo True || echo False

The default behaviour is equivalent to the --is-available option:

python manage.py eoxs_check_id --is-available <ID> && echo True || echo False

An available coverage/EO ID is neither used by an existing object nor reserved for use by a future object.

In order to check whether a coverage/EO ID is used by an existing object apply the --is-used option:

python manage.py eoxs_check_id --is-used <ID> && echo True || echo False

In order to check whether a coverage/EO ID is registered for future use apply the --is-reserved option:

python manage.py eoxs_check_id --is-reserved <ID> && echo True || echo False
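When the check is driven from a script rather than a shell, the return code can be translated into a boolean. The helper below is a generic sketch using the Python standard library; its name is hypothetical, and treating every non-zero code as False is an assumption (the text above only documents 0 and 1). Stand-in commands are used so that the example runs without an EOxServer instance.

```python
import subprocess
import sys


def exit_code_to_bool(command):
    """Run a command; exit code 0 means True, non-zero is treated as False."""
    return subprocess.call(command) == 0


# In an instance directory the command list would look like, for example:
#   [sys.executable, "manage.py", "eoxs_check_id", "--is-used", "<ID>"]
# Stand-in commands that simply exit with 0 or 1:
ok = exit_code_to_bool([sys.executable, "-c", "raise SystemExit(0)"])
not_ok = exit_code_to_bool([sys.executable, "-c", "raise SystemExit(1)"])
print(ok, not_ok)  # True False
```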

Range Type Handling

The eoxs_list_rangetypes command, by default, lists the names of all registered range types:

python manage.py eoxs_list_rangetypes

If more detail about the range types is required, a verbose listing can be requested with the --details option. When one or more range type names are specified, the output is limited to the specified range types:

python manage.py eoxs_list_rangetypes --details [<range-type-name> ...]

The same command can also be used to export range types in JSON format (--json option). The following example prints the selected RGB range type in JSON format:

python manage.py eoxs_list_rangetypes --json RGB

The output can be saved directly to a file by using the -o option. The following example saves all registered range types to a file named rangetypes.json:

python manage.py eoxs_list_rangetypes --json -o rangetypes.json

Range types saved in JSON format can be loaded (e.g. by another EOxServer instance) by using the eoxs_load_rangetypes command. By default, this command reads the JSON data from the standard input. To force the command to read the input from a file, use the -i option:

python manage.py eoxs_load_rangetypes -i rangetypes.json

Performance

The performance of different EOxServer tasks and services depends heavily on the hardware infrastructure and the data to be handled. Tests were made for two typical operator use cases:

  • registering a dataset
  • generating a mosaic

The tests for registering datasets were performed on a quad-core machine with 4 GB of RAM and with a SQLite/SpatiaLite database. The test datasets were 58 IKONOS multispectral (4-band 16-bit), 58 IKONOS panchromatic (1-band 16-bit) and 58 IKONOS pansharpened (3-band 8-bit) scenes in GeoTIFF format with file sizes ranging between 60 MB and 1.7 GB. The file size did not have any discernible impact on the time it took to register. The average registration took about 61 ms, meaning that registering nearly 1000 datasets per minute is possible.
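The throughput figure follows directly from the average registration time, as this short calculation shows (the variable names are illustrative only):

```python
avg_registration_ms = 61          # average time per registration from the test
ms_per_minute = 60 * 1000
datasets_per_minute = ms_per_minute / avg_registration_ms
print(round(datasets_per_minute))  # 984
```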

The tests for the generation of mosaics were performed on a virtual machine with one CPU core allocated and 4 GB of RAM. Yet again, the input data were IKONOS scenes in GeoTIFF format.

Datasets              Data Type      Files  Input File Size  Tiles Generated  Time    GB per minute
IKONOS multispectral  4-band 16-bit  68     8.9 GB           8,819            10 min  0.89 GB
IKONOS panchromatic   1-band 16-bit  68     35.1 GB          126,750          1:05 h  0.54 GB
IKONOS pansharpened   3-band 8-bit   68     52.7 GB          126,750          1:46 h  0.49 GB
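The "GB per minute" column is the input file size divided by the processing time in minutes. The sketch below recomputes it from the table values; the variable names are illustrative. Recomputing from the rounded times gives 0.50 for the pansharpened case rather than the 0.49 listed, which presumably reflects rounding of the measured times in the table.

```python
# (dataset, input size in GB, processing time in minutes) taken from the table
runs = [
    ("IKONOS multispectral", 8.9, 10),
    ("IKONOS panchromatic", 35.1, 65),    # 1:05 h
    ("IKONOS pansharpened", 52.7, 106),   # 1:46 h
]

throughput = [round(size_gb / minutes, 2) for _name, size_gb, minutes in runs]
for (name, _size, _minutes), value in zip(runs, throughput):
    print(name, value)
```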

As the results show, the file size of the input files has a certain impact on performance, but the effect seems to level off.

Regarding the performance of the services, there are many influencing factors:

  • the hardware configuration of the machine
  • the network connection bandwidth
  • the database configuration (SQLite or PostGIS)
  • the format and size of the raster data files
  • the processing steps necessary to fulfill the request (e.g. resampling, reprojection)
  • the coverage type (processing referenceable grid coverages is considerably more expensive than processing rectified grid coverages)
  • the setup of IDM components (if any)

For hints on improving performance of the services see Hardware Requirements, Data Preparation and Supported Data Formats and Improving Performance with MapCache.