Documentation for `DataCollections`

DataCollections is the class `earthaccess` uses to query CMR at the dataset level.

Bases: CollectionQuery

Info

The `DataCollections` class queries https://cmr.earthdata.nasa.gov/search/collections.umm_json. The response must be in umm_json format for the result classes to work.

Builds an instance of DataCollections to query the CMR.

Parameters:

Name	Type	Description	Default
`auth`	`Optional[Auth]`	An authenticated `Auth` instance. This is an optional parameter for queries that need authentication, e.g. restricted datasets.	`None`

`cloud_hosted(cloud_hosted=True)`

Only match collections that are hosted in the cloud. This is valid for public collections.

Tip

Cloud-hosted collections can be public or restricted. Restricted collections will not be matched using this parameter.

Parameters:

Name	Type	Description	Default
`cloud_hosted`	`bool`	If `True`, obtain only cloud-hosted collections.	`True`

Returns:

Type	Description
`Self`	self

Raises:

Type	Description
`TypeError`	`cloud_hosted` is not of type `bool`.

`concept_id(IDs)`

Filter by concept ID.

For example: C1299783579-LPDAAC_ECS, G1327299284-LPDAAC_ECS, or S12345678-LPDAAC_ECS.

Collections, granules, tools, services are uniquely identified with this ID.

If providing a collection's concept ID, it will filter by granules associated with that collection.
If providing a granule's concept ID, it will uniquely identify those granules.
If providing a tool's concept ID, it will uniquely identify those tools.
If providing a service's concept ID, it will uniquely identify those services.

Parameters:

Name	Type	Description	Default
`IDs`	`Sequence[str]`	ID(s) to search by. Can be provided as a string or list of strings.	required

Returns:

Type	Description
`Self`	self

Raises:

Type	Description
`ValueError`	An ID does not start with a valid prefix.
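
The prefix check described above can be sketched in plain Python. This is an illustrative stand-in, not the library's code: the helper name `normalize_concept_ids` and its `allowed_prefixes` parameter are hypothetical, and the default prefixes are taken only from the example IDs listed above.

```python
from typing import List, Sequence, Union


def normalize_concept_ids(
    ids: Union[str, Sequence[str]],
    allowed_prefixes: Sequence[str] = ("C", "G", "S"),  # assumed set of prefixes
) -> List[str]:
    """Return the IDs as a list, rejecting any with an unexpected prefix."""
    # A single string is treated as a one-element list.
    if isinstance(ids, str):
        ids = [ids]

    cleaned = []
    for concept_id in ids:
        candidate = concept_id.strip()
        # str.startswith accepts a tuple of alternatives.
        if not candidate.startswith(tuple(allowed_prefixes)):
            raise ValueError(f"Invalid concept ID prefix: {concept_id!r}")
        cleaned.append(candidate)
    return cleaned


print(normalize_concept_ids("C1299783579-LPDAAC_ECS"))
print(normalize_concept_ids(["G1327299284-LPDAAC_ECS", "S12345678-LPDAAC_ECS"]))
```

Accepting either a string or a list mirrors the `IDs` parameter description; which prefixes a given query class actually accepts may differ from the assumed default here.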

`daac(daac_short_name)`

Only match collections for a given DAAC. By default, this matches the DAAC's on-premises collections.

Parameters:

Name	Type	Description	Default
`daac_short_name`	`str`	a DAAC shortname, e.g. NSIDC, PODAAC, GESDISC	required

Returns:

Type	Description
`Self`	self

`data_center(data_center_name)`

An alias for the daac method.

Parameters:

Name	Type	Description	Default
`data_center_name`	`str`	DAAC shortname, e.g. NSIDC, PODAAC, GESDISC	required

Returns:

Type	Description
`Self`	self

`debug(debug=True)`

If True, prints the actual query to CMR. Note that the pagination happens in the headers.

Parameters:

Name	Type	Description	Default
`debug`	`bool`	If `True`, print the CMR query.	`True`

Returns:

Type	Description
`Self`	self

`doi(doi)`

Search datasets by DOI.

Tip

Not all datasets have an associated DOI. DOI search also works only at the dataset level, not at the granule (data) level, so to retrieve data we need to search by DOI, grab the concept_id, and then get the data.

Parameters:

Name	Type	Description	Default
`doi`	`str`	DOI of a dataset, e.g. 10.5067/AQR50-3Q7CS	required

Returns:

Type	Description
`Self`	self

Raises:

Type	Description
`TypeError`	`doi` is not of type `str`.
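
The workflow in the tip above (search by DOI, grab the concept ID, then get the data) might look like the following sketch. The helper `granules_for_doi` is hypothetical. It takes the two query objects as arguments so that it can run without network access. It also assumes that each collection result exposes a `concept_id()` method and that a granule-level query class such as `earthaccess.DataGranules` is available; neither is documented on this page.

```python
def granules_for_doi(doi, collection_query, granule_query, limit=10):
    """Resolve a DOI to a collection, then fetch granules from that collection."""
    # DOI search only works at the dataset level, so start with collections.
    collections = collection_query.doi(doi).get(1)
    if not collections:
        return []  # not every dataset has an associated DOI

    # Assumption: each result exposes its concept ID via a concept_id() method.
    collection_id = collections[0].concept_id()

    # A collection concept ID restricts a granule query to that collection.
    return granule_query.concept_id(collection_id).get(limit)


# Intended use (needs earthaccess and network access, so left as a comment):
# import earthaccess
# granules = granules_for_doi(
#     "10.5067/AQR50-3Q7CS",
#     earthaccess.DataCollections(),
#     earthaccess.DataGranules(),
# )
```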

`fields(fields=None)`

Masks the response by only showing the fields included in this list.

Parameters:

Name	Type	Description	Default
`fields`	`List`	list of fields to show. These fields come from the UMM model (e.g. Abstract, Title).	`None`

Returns:

Type	Description
`Self`	self
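
As a rough illustration of what masking a response means, the sketch below filters a dictionary down to the requested fields. It is not the library's implementation: `mask_fields` is a hypothetical helper, the sample record is invented, and silently skipping missing fields is an assumption.

```python
def mask_fields(record, fields=None):
    """Keep only the requested top-level fields of a UMM-style record."""
    if not fields:
        return dict(record)  # no mask requested: return everything
    # Assumption: fields that are missing from the record are skipped silently.
    return {name: record[name] for name in fields if name in record}


# Invented sample record, used only to show the effect of the mask.
record = {"Title": "Example dataset", "Abstract": "A short summary.", "Version": "1"}
print(mask_fields(record, ["Title", "Abstract"]))
```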

`get(limit=2000)`

Get all the collections (datasets) that match the current query parameters, up to some limit, even if the results span multiple pages.

Tip

The default page size is 2000. Be careful with the request size, because all the JSON elements are loaded into memory. This is more of an issue with granules than with collections, as there can be millions of granules.

Parameters:

Name	Type	Description	Default
`limit`	`int`	The number of results to return	`2000`

Returns:

Type	Description
`List[DataCollection]`	Query results as a (possibly empty) list of `DataCollection` instances.

Raises:

Type	Description
`RuntimeError`	The CMR query failed.
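
The idea of collecting results across pages up to a limit can be sketched as below. This is a simplified model, not how the library talks to CMR (which, as noted under `debug`, paginates through headers): the `fetch` callable and its offset-based signature are assumptions made so the example can run offline.

```python
def collect_up_to(fetch, limit, page_size=2000):
    """Gather results page by page until `limit` items or no more data."""
    results = []
    while len(results) < limit:
        # Never request more than is still needed.
        count = min(page_size, limit - len(results))
        batch = fetch(len(results), count)
        if not batch:
            break  # the source is exhausted
        results.extend(batch)
    return results


# Stand-in data source: 4,500 fake records served in slices.
records = list(range(4500))


def fake_fetch(offset, count):
    return records[offset:offset + count]


print(len(collect_up_to(fake_fetch, limit=3000)))   # 3000
print(len(collect_up_to(fake_fetch, limit=10000)))  # 4500
```

Because every fetched element stays in the `results` list, memory use grows with `limit`, which is the concern raised in the tip above.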

`hits()`

Returns the number of hits the current query will return. This is done by making a lightweight query to CMR and inspecting the returned headers. Restricted datasets will always return zero results even if there are results.

Returns:

Type	Description
`int`	The number of results reported by the CMR.

Raises:

Type	Description
`RuntimeError`	The CMR query failed.
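
One way to use `hits()` is as a guard before calling `get()`, so that an unexpectedly broad query is not loaded into memory. The helper below is a hypothetical sketch that works with any object exposing the `hits()` and `get(limit)` methods documented on this page; the name `get_if_small_enough` and the `max_results` threshold are assumptions.

```python
def get_if_small_enough(query, max_results=2000):
    """Fetch all matches only when the reported hit count is manageable."""
    total = query.hits()  # lightweight request; no records are downloaded
    if total > max_results:
        raise ValueError(
            f"Query matches {total} records; narrow it or raise max_results."
        )
    # Skip the second request entirely when nothing matched.
    return query.get(total) if total else []


# Intended use (needs earthaccess and network access, so left as a comment):
# import earthaccess
# collections = get_if_small_enough(earthaccess.DataCollections().instrument("GEDI"))
```

Keep in mind that, as stated above, restricted datasets report zero hits, so an empty result from this helper does not prove that no restricted data exists.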

`instrument(instrument)`

Search datasets by instrument.

Tip

Not all datasets have an associated instrument. This works only at the dataset level but not the granule (data) level.

Parameters:

Name	Type	Description	Default
`instrument`	`str`	instrument of a dataset, e.g. instrument=GEDI	required

Returns:

Type	Description
`Self`	self

Raises:

Type	Description
`TypeError`	`instrument` is not of type `str`.

`keyword(text)`

Case-insensitive and wildcard (*) search through over two dozen fields in a CMR collection record. This allows for searching against fields like summary and science keywords.

Parameters:

Name	Type	Description	Default
`text`	`str`	text to search for	required

Returns:

Type	Description
`Self`	self

`parameters(**kwargs)`

Provide query parameters as keyword arguments. The keyword needs to match the name of the method, and the value should either be the value or a tuple of values.

Example

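from earthaccess import DataCollections

# parameters() is an instance method, so build the query object first.
query = DataCollections().parameters(
    short_name="AST_L1T",
    temporal=("2015-01", "2015-02"),
    point=(42.5, -101.25),
)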

Returns:

Type	Description
`Self`	self

Raises:

Type	Description
`ValueError`	The name of a keyword argument is not the name of a method.
`TypeError`	The value of a keyword argument is not an argument or tuple of arguments matching the number and type(s) of the method's parameters.
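
The dispatch rule described above (each keyword names a method, and a tuple value is spread across that method's arguments) can be illustrated with a toy class. `ToyQuery` is a hypothetical stand-in written for this sketch and is not part of `earthaccess`; it only models two filter methods.

```python
class ToyQuery:
    """Minimal stand-in for a query class with chainable filter methods."""

    def __init__(self):
        self.params = {}

    def short_name(self, name):
        self.params["short_name"] = name
        return self

    def temporal(self, date_from, date_to):
        self.params["temporal"] = f"{date_from},{date_to}"
        return self

    def parameters(self, **kwargs):
        for name, value in kwargs.items():
            method = getattr(self, name, None)
            # Reject private names and anything that is not a method.
            if name.startswith("_") or not callable(method):
                raise ValueError(f"Unknown query parameter: {name}")
            # A tuple is unpacked into positional arguments; any other
            # value is passed as a single argument.
            if isinstance(value, tuple):
                method(*value)
            else:
                method(value)
        return self


query = ToyQuery().parameters(short_name="AST_L1T", temporal=("2015-01", "2015-02"))
print(query.params)
```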

`print_help(method='fields')`

Prints the help information for a given method.

`project(project)`

Search datasets by associated project.

Tip

Not all datasets have an associated project. This works only at the dataset level but not the granule (data) level. Will return datasets across DAACs matching the project.

Parameters:

Name	Type	Description	Default
`project`	`str`	associated project of a dataset, e.g. project=EMIT	required

Returns:

Type	Description
`Self`	self

Raises:

Type	Description
`TypeError`	`project` is not of type `str`.

`provider(provider)`

Only match collections from a given provider.

A NASA data center or DAAC can have one or more providers. For example, PODAAC is a data center (DAAC); PODAAC is also its default provider for on-premises data, while POCLOUD is the PODAAC provider for data in the cloud.

Parameters:

Name	Type	Description	Default
`provider`	`str`	a provider code for any DAAC, e.g. POCLOUD, NSIDC_CPRD, etc.	required

Returns:

Type	Description
`Self`	self

`temporal(date_from=None, date_to=None, exclude_boundary=False)`

Filter by an open or closed date range. Dates can be provided as date objects or ISO 8601 strings. Multiple ranges can be provided by successive method calls.

Tip

Giving either datetime.date(YYYY, MM, DD) or "YYYY-MM-DD" as the date_to parameter includes that entire day (i.e. the time is set to 23:59:59). Using datetime.datetime(YYYY, MM, DD) is different, because datetime.datetime objects have 00:00:00 as their built-in default.

Parameters:

Name	Type	Description	Default
`date_from`	`Optional[Union[str, date, datetime]]`	start of temporal range	`None`
`date_to`	`Optional[Union[str, date, datetime]]`	end of temporal range	`None`
`exclude_boundary`	`bool`	whether to exclude `date_from` and `date_to` themselves from the matched range.	`False`

Returns:

Type	Description
`Self`	self

Raises:

Type	Description
`ValueError`	`date_from` or `date_to` is a non-`None` value that is neither a datetime object nor a string that can be parsed as a datetime object; or `date_from` and `date_to` are both datetime objects (or parsable as such) and `date_from` is after `date_to`.
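
The end-of-day rule from the tip above can be modelled with the standard library. This is a sketch of the described behavior only, not the library's parsing code: `resolve_date_to` is a hypothetical helper, and it handles only full ISO 8601 strings such as "2020-01-31" or "2020-01-31T12:00:00".

```python
from datetime import date, datetime, time
from typing import Union


def resolve_date_to(value: Union[str, date, datetime]) -> datetime:
    """Turn a `date_to` argument into the datetime that bounds the range."""
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime):
        return value  # kept as given, e.g. 00:00:00 if no time was supplied
    if isinstance(value, date):
        return datetime.combine(value, time(23, 59, 59))
    if len(value) == 10:  # date-only string such as "2020-01-31"
        return datetime.combine(date.fromisoformat(value), time(23, 59, 59))
    return datetime.fromisoformat(value)  # string that already carries a time


print(resolve_date_to(date(2020, 1, 31)))      # 2020-01-31 23:59:59
print(resolve_date_to("2020-01-31"))           # 2020-01-31 23:59:59
print(resolve_date_to(datetime(2020, 1, 31)))  # 2020-01-31 00:00:00
```

The last call shows the pitfall: a `datetime.datetime` built without a time ends the range at midnight, so the final day is effectively excluded.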
