The most prominent characteristic of MODIS data is the fact that a single satellite observation triggers the creation of a myriad of data products. These products cover four main groups of disciplines: atmosphere, land, ocean, and cryosphere. Certain products are made available at multiple time scales, and may include data from more than a single observation. Data products may be built from other data products, as well. This creates a dependency hierarchy where the production of certain data sets must wait until all of the data they require have been collected and processed.
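The dependency hierarchy described above can be sketched as a topological ordering problem. The following is a minimal illustration only: the product names and the dependency map are hypothetical placeholders, not the actual MODIS production rules.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency map: product -> set of products it is built from.
# The names are illustrative and do not reflect the real production rules.
DEPENDS_ON = {
    "level1b_radiance": {"level0_telemetry"},
    "geolocation": {"level0_telemetry"},
    "cloud_mask": {"level1b_radiance", "geolocation"},
    "surface_reflectance": {"level1b_radiance", "geolocation", "cloud_mask"},
    "vegetation_index_16day": {"surface_reflectance"},
}


def production_order(depends_on):
    """Return products in an order where every input precedes its consumers."""
    return list(TopologicalSorter(depends_on).static_order())


order = production_order(DEPENDS_ON)
```

Any product in `order` appears only after everything it depends on, which is the constraint a production system has to respect before it can start a given data set.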
The "level" characteristic of MODIS data products is a classification which describes the general nature of the data product as well as its relation to the original satellite observation(s). The level is a fundamental component of the MODIS data product taxonomy. Briefly defined, these are the levels:
Level 2: ...derived geophysical variables at the same resolution and location as level 1 source data... This of course means that there is a one-to-one correspondence between a pixel in a level 2 product and a pixel in the level 0 file. (The pixel may have appeared more than once in the raw telemetry.)
Level 2G: level 2 data mapped on a uniform space-time grid scale (Sinusoidal)
Level 3: gridded variables in derived spatial and/or temporal resolutions
Level 4: model output, or the results of analysis of lower level data
The explosion of data products occurs at level 2, where algorithms applied to instrument data yield geophysical variables of interest. Note that at level 2 and below, all data products possess a one-to-one correspondence from a pixel in the product to an observation on the satellite. At level 2G, geophysical variables are spatially resampled to a fixed grid (and multiple observations of the same grid cell are collated). Level 3 data represent products which have been spatially aggregated or resampled (to reduce noise or provide consistently gridded products) and/or temporally aggregated. It is common for the same geophysical variable at level 3 to be produced at multiple spatial and temporal resolutions, yielding "subproducts".
The term "swath product" is generally used for data products which are derived from only a single observation, and which have not been spatially resampled onto a common grid. Swath data products may be calculated from other swath products. Generally, the swath products are the only products generated by Direct Broadcast stations. Swath products are typically the first products generated by NASA's production facility. For all intents and purposes, swath products are level 1b and level 2.
MODIS data products are resampled to a common grid at level 2G. All observations of a given grid cell are recorded within that grid cell, so each grid cell may have zero or more values. The common grid remains invariant from orbit to orbit and day to day, which makes it easier to compare two level 2G products.
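The idea of collecting every observation that falls into a grid cell can be sketched as follows. This is an illustration under simplifying assumptions: it uses a plain latitude/longitude grid with a hypothetical `cell_size_deg` parameter rather than the Sinusoidal grid, and the function name is invented for this example.

```python
from collections import defaultdict


def bin_observations(observations, cell_size_deg=1.0):
    """Group (lat, lon, value) observations by fixed grid cell.

    Every observation is kept, so a cell may end up with zero, one,
    or many values. A simple lat/lon grid stands in for the real grid.
    """
    cells = defaultdict(list)
    for lat, lon, value in observations:
        row = int((90.0 - lat) // cell_size_deg)   # rows count down from the north pole
        col = int((lon + 180.0) // cell_size_deg)  # columns count east from 180W
        cells[(row, col)].append(value)
    return cells


obs = [(10.2, 20.3, 1.0), (10.7, 20.9, 2.0), (-5.5, 100.1, 3.0)]
cells = bin_observations(obs)
```

Because the grid definition does not depend on the orbit, two calls with observations from different days produce keys that can be compared directly.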
Level 3 products may offer the same geophysical variable at many combinations of spatial and temporal resolutions. Typical temporal resolutions are: daily, 8-day, 16-day, monthly, and annual. Level 3 products usually retain one value for each geophysical variable per cell. Algorithms to produce level 3 products describe the method by which the cell value is selected (or computed) from the constituent data.
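As a rough sketch of how one value per cell might be chosen from the constituent data, the function below reduces each cell's list of candidates with a pluggable rule. The selection rules shown (maximum and mean) are examples only; the actual rule is defined by each product's algorithm, and the names here are hypothetical.

```python
from statistics import mean


def composite(cells, select=max, fill=None):
    """Reduce each cell's list of candidate values to a single value.

    `select` is the rule that picks or computes the cell value;
    cells with no candidates receive `fill`.
    """
    return {
        key: (select(values) if values else fill)
        for key, values in cells.items()
    }


candidates = {(0, 0): [0.2, 0.7], (0, 1): [], (1, 0): [1.0, 3.0]}
by_max = composite(candidates)                 # maximum-value selection
by_mean = composite(candidates, select=mean)   # averaging instead of selection
```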
Each instance of a Level 3 product is typically considered to be global in scope, and the entire global dataset instance is considered to have a single valid time. The global dataset possesses a projection, and the projection is divided into a fixed set of tiles, where each tile covers a predetermined spatial region. Different projections, of course, possess different tile definitions. The spatial region associated with a tile is fixed regardless of the resolution of the data set, so Level 3 products of different resolutions (but which share a projection) have the same number of tiles. Each individual tile has more or fewer pixels, as determined by the resolution.
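The relationship between tiles and resolution can be illustrated with a few lines of arithmetic. The numbers below are not taken from the text above; they assume the commonly documented MODIS Sinusoidal tiling of 36 by 18 tiles, with 1200, 2400, and 4800 pixels per tile side at the nominal 1 km, 500 m, and 250 m resolutions.

```python
# Assumed tiling of the Sinusoidal grid (not stated in the surrounding text).
TILES_H, TILES_V = 36, 18

# Assumed pixels along one side of a tile at each nominal resolution.
PIXELS_PER_TILE_SIDE = {"1km": 1200, "500m": 2400, "250m": 4800}


def tile_summary(resolution):
    """Return the tile count and the pixels per tile for a resolution."""
    side = PIXELS_PER_TILE_SIDE[resolution]
    return {"tiles": TILES_H * TILES_V, "pixels_per_tile": side * side}


coarse = tile_summary("1km")
fine = tile_summary("250m")
```

The tile count is identical for both calls, while the pixel count per tile grows with the square of the resolution ratio.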
The independent axes available to classify MODIS products vary by data level. These are described here.
The level 1 data are instrument data (i.e., what the instrument actually measured) while the other levels are derived quantities (i.e., something calculated by an algorithm). There may be more than one geophysical variable bundled into a standard "geophysical package". This classification scheme does not describe any finer granularity than a standard package, typically delivered to the user as an HDF-EOS file.
For swath-based data, the "time of start of acquisition" and satellite parameters are sufficient to uniquely describe the geospatial bounds of the dataset, as they uniquely determine the instrument location. The distribution unit for these data products is either a granule, which is the data collected in a fixed length of time (i.e., 5 minutes), or a pass, which is all the data collected during one contact with a ground station (usually 12 minutes or less). In the case of the gridded datasets at level 2G and level 3, the data from many orbits and/or many days produce a global dataset which has a certain time at which it is considered to be valid. This global dataset is broken into tiles of a more manageable size. Thus the distribution unit of MODIS data for these products is one tile of data.
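Because a granule covers a fixed length of time, the granule containing a given observation time can be found by flooring the timestamp. The sketch below assumes that granule boundaries are aligned to midnight, which the text above does not state; the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

GRANULE_SECONDS = 300  # the 5-minute granule length mentioned above


def granule_start(t, granule_seconds=GRANULE_SECONDS):
    """Floor a timestamp to the start of the fixed-length granule containing it.

    Assumes granule boundaries are counted from midnight of the same day.
    """
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = int((t - midnight).total_seconds())
    return midnight + timedelta(seconds=elapsed - elapsed % granule_seconds)


start = granule_start(datetime(2006, 1, 1, 10, 37, 42))
```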
For data generated by NASA's standard processing system, all data products have an additional parameter: the data "collection". A collection is formed by freezing the version(s) of all algorithms for all products. Thus all data products belonging to the same collection were produced consistently.
Although all data products at level 1 and level 2 are directly related by the same granule id, there is no simple one-to-one correspondence with the higher, gridded levels, particularly level 3. Given one level 3 data product, the filename for a corresponding level 3 data product is not straightforward to calculate. This is because not all data products are offered at the same temporal or spatial resolutions. In addition, for temporal resolutions coarser than daily (such as 8-day or 16-day), it is necessary to know at least one valid time in order to determine where the break between datasets falls.
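The role of a known valid time can be illustrated with a small calculation. This sketch assumes that multi-day periods run back to back from the known valid time with no reset; a real product may restart its periods at some boundary (for example the start of a year), which this example ignores. The function name is hypothetical.

```python
from datetime import date, timedelta


def period_start(day, known_valid_time, period_days):
    """Return the first day of the multi-day period that contains `day`.

    Periods are assumed to tile the calendar continuously, anchored at
    `known_valid_time`. Python's modulo is non-negative for a positive
    divisor, so days before the anchor are handled as well.
    """
    offset = (day - known_valid_time).days
    return day - timedelta(days=offset % period_days)


anchor = date(2006, 1, 1)
start_after = period_start(date(2006, 1, 20), anchor, 8)
start_before = period_start(date(2005, 12, 30), anchor, 8)
```

Without the anchor, the period length alone says nothing about where the boundaries fall, which is why at least one valid time must be known.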
The classification of MODIS data may be divided into two parts. One part is a description of the MODIS data series to which the data belongs. The other part is the specification of the particular instance of the data product.
Generally speaking, the data series describes the type of information contained in the file by a product designation (e.g., is the file a land surface temperature file, a fire and thermal anomalies file, or a vegetation index file?). Information in the data series ID is the same for all files belonging to that data series.
The parameters which describe the data instance are related to specifying the time or geographical location relevant to the data in the file. The data instance may specify the tile ID of the data in the file. Because level 3 data possess many tiles for every global dataset, this definition of a MODIS data instance may refer to a tiny fraction of a particular global dataset in the data series. The data instance is always tied to a particular data series and always specifies which series it is an instance of.
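One possible way to model the two-part classification is a pair of record types, with the instance holding a reference to its series. This is a sketch only: the class and field names are hypothetical and are not the schema used by any actual system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DataSeries:
    """What kind of data a file contains; shared by all files in the series."""
    product: str                      # hypothetical product designation
    level: str                        # e.g. "2", "2G", "3"
    collection: Optional[str] = None  # only for standard-processing data


@dataclass(frozen=True)
class DataInstance:
    """One particular file: always tied to exactly one series."""
    series: DataSeries
    valid_time: date
    tile_id: Optional[str] = None     # set for tiled (gridded) products


lst = DataSeries(product="land_surface_temperature", level="3", collection="5")
tile_a = DataInstance(series=lst, valid_time=date(2006, 1, 1), tile_id="h08v05")
tile_b = DataInstance(series=lst, valid_time=date(2006, 1, 1), tile_id="h09v05")
```

Here `tile_a` and `tile_b` are distinct instances of the same series and the same global dataset, each covering only one tile of it.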
Siphoning data from other stations requires knowledge of (or an ability to describe) the strategy they use to organize their collection of data. No two stations do this in precisely the same way. There are two major components to the storage of data: the file naming convention and the directory convention. Both components encode some characteristic of a unit of data (the Granule) or a group of data units.
The file naming convention typically specifies the data product, the start time of the observation, and the satellite which performed the observation. Although the same attributes are specified by each file naming convention, the precise method of encoding this information varies from site to site.
The directory naming convention is more variable. Some sites separate data by satellite, and some do not. Along the temporal dimension, the files may be ungrouped, grouped by day, or grouped by individual observation (in which case the target directory contains all the data products from a single observation). Many stations store all the data from one satellite contact in a single file, but one station divides the data into subfiles of roughly equal duration.
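Since each station encodes the same attributes differently, one way to capture a station's scheme is a pair of format templates, one for the directory and one for the file name. The sketch below is purely illustrative: the station URLs, templates, and class name are invented and do not describe any real station.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class StationConvention:
    """Hypothetical description of how one station lays out its files."""
    base_url: str
    directory_template: str   # may be empty if files are ungrouped
    filename_template: str

    def url_for(self, product, satellite, start):
        """Build the URL of one data product from its describing attributes."""
        fields = {"product": product, "satellite": satellite, "start": start}
        directory = self.directory_template.format(**fields)
        filename = self.filename_template.format(**fields)
        parts = (self.base_url, directory, filename)
        return "/".join(p.strip("/") for p in parts if p)


# Invented station: separates satellites, groups files by day.
station_a = StationConvention(
    "http://station-a.example/data/",
    "{satellite}/{start:%Y/%m/%d}",
    "{product}.{start:%Y%j.%H%M}.hdf",
)
# Invented station: no grouping, satellite encoded in the file name.
station_b = StationConvention(
    "http://station-b.example",
    "",
    "{satellite}_{start:%Y%m%dT%H%M}_{product}.hdf",
)

when = datetime(2006, 1, 1, 10, 35)
url_a = station_a.url_for("MOD021KM", "terra", when)
url_b = station_b.url_for("MOD021KM", "terra", when)
```

The same three attributes yield two very different paths, which is the variability between sites described above.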
The attributes used to describe a station's strategy for the organization of data are as follows:
The main item which is not included in the above description is the precise format of temporal information. Knowing that a station groups files by day is not enough to describe the actual name of the directory. In addition, although we can generally state that the base URL will come first in the pathname, we cannot say whether satellites or time are more significant in the hierarchical organization.
Thus, although we are able to articulate some basic characteristics of what each directory convention or file naming convention expresses, we fall short of being able to completely describe the encoding with the set of attributes presented here. In both cases, we resort to defining a standard enumeration of known stations and encoding methods. Detailed presentations of each station's encoding scheme occupy their own reference topics, and the standardized enumeration which relates the database code to a particular scheme is presented along with the definition of the
Most Direct Broadcast stations do not produce or offer gridded data (level 2G and above). The primary source of this data is the data pool at the Land Processes DAAC. The LP DAAC organizes data by:
All tiles for a given valid time are stored in the same directory.