Documentation for DataCollections

DataCollections is the class earthaccess uses to query CMR at the dataset level.

Bases: CollectionQuery

Placeholder.

Info

The DataCollections class queries against https://cmr.earthdata.nasa.gov/search/collections.umm_json. The response has to be in umm_json format to use the result classes.

Builds an instance of DataCollections to query the CMR.

Parameters:

Name Type Description Default
auth Optional[Auth]

An authenticated Auth instance. This is an optional parameter for queries that need authentication, e.g. restricted datasets.

None

cloud_hosted(cloud_hosted=True)

Only match collections that are hosted in the cloud. This is valid for public collections.

Tip

Cloud-hosted collections can be public or restricted. Restricted collections will not be matched using this parameter.

Parameters:

Name Type Description Default
cloud_hosted bool

If True, obtain only cloud-hosted collections.

True

Returns:

Type Description
Self

self

Raises:

Type Description
TypeError

cloud_hosted is not of type bool.

concept_id(IDs)

Filter by concept ID.

For example: C1299783579-LPDAAC_ECS, G1327299284-LPDAAC_ECS, or S12345678-LPDAAC_ECS.

Collections, granules, tools, services are uniquely identified with this ID.

  • If providing a collection's concept ID, it will filter by granules associated with that collection.
  • If providing a granule's concept ID, it will uniquely identify those granules.
  • If providing a tool's concept ID, it will uniquely identify those tools.
  • If providing a service's concept ID, it will uniquely identify those services.

Parameters:

Name Type Description Default
IDs Sequence[str]

ID(s) to search by. Can be provided as a string or list of strings.

required

Returns:

Type Description
Self

self

Raises:

Type Description
ValueError

An ID does not start with a valid prefix.
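
The prefix check described above can be illustrated with a small standalone sketch. It is not the library's implementation: the helper name `check_concept_ids` and the prefix set (C, G, T, S, inferred from the ID kinds listed above) are assumptions made for illustration only.

```python
from typing import Iterable, List, Sequence, Union

# Assumed prefix letters, inferred from the ID kinds listed above:
# C = collection, G = granule, T = tool, S = service.
KNOWN_PREFIXES = {"C", "G", "T", "S"}


def check_concept_ids(
    ids: Union[str, Sequence[str]],
    valid_prefixes: Iterable[str] = KNOWN_PREFIXES,
) -> List[str]:
    """Return the IDs as a list, raising ValueError on an unexpected prefix."""
    # A bare string is treated as a single ID ("string or list of strings").
    id_list = [ids] if isinstance(ids, str) else list(ids)
    allowed = set(valid_prefixes)
    for concept_id in id_list:
        if not concept_id or concept_id[0] not in allowed:
            raise ValueError(
                f"Invalid concept ID {concept_id!r}: "
                f"expected a prefix in {sorted(allowed)}"
            )
    return id_list
```

A collection-level query would presumably accept a narrower prefix set than the one assumed here; pass `valid_prefixes={"C"}` to model that case.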

daac(daac_short_name)

Only match collections for a given DAAC. By default, the DAAC's on-prem collections are matched.

Parameters:

Name Type Description Default
daac_short_name str

a DAAC shortname, e.g. NSIDC, PODAAC, GESDISC

required

Returns:

Type Description
Self

self

data_center(data_center_name)

An alias for the daac method.

Parameters:

Name Type Description Default
data_center_name str

DAAC shortname, e.g. NSIDC, PODAAC, GESDISC

required

Returns:

Type Description
Self

self

debug(debug=True)

If True, prints the actual query to CMR. Note that the pagination happens in the headers.

Parameters:

Name Type Description Default
debug bool

If True, print the CMR query.

True

Returns:

Type Description
Self

self

doi(doi)

Search datasets by DOI.

Tip

Not all datasets have an associated DOI. DOI search also works only at the dataset level, not the granule (data) level. To reach the data, search by DOI, take the concept_id of the matching dataset, and then query the granules with it.

Parameters:

Name Type Description Default
doi str

DOI of a dataset, e.g. 10.5067/AQR50-3Q7CS

required

Returns:

Type Description
Self

self

Raises:

Type Description
TypeError

doi is not of type str.
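
The DOI-to-data workflow from the tip above might look like the following sketch. It relies on assumptions not documented on this page: that `DataGranules` is importable from `earthaccess` alongside `DataCollections`, and that each returned `DataCollection` exposes a `concept_id()` method. The helper name `granules_for_doi` is hypothetical. Running it needs network access, and restricted datasets would additionally need an authenticated `Auth` instance.

```python
def granules_for_doi(doi: str, limit: int = 10):
    """Resolve a dataset DOI to its concept ID, then fetch some granules."""
    # Imported lazily so this module still loads where earthaccess is absent.
    from earthaccess import DataCollections, DataGranules

    # Step 1: DOI search only works at the dataset (collection) level.
    collections = DataCollections().doi(doi).get(1)
    if not collections:
        return []

    # Step 2: take the concept ID of the matching dataset
    # (concept_id() on the result object is an assumption, see above).
    concept_id = collections[0].concept_id()

    # Step 3: use that concept ID to query the granules (the data).
    return DataGranules().concept_id(concept_id).get(limit)
```

Calling `granules_for_doi("10.5067/AQR50-3Q7CS", limit=5)` would then return up to five granule results for that dataset, or an empty list if the DOI matches nothing.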

fields(fields=None)

Masks the response by only showing the fields included in this list.

Parameters:

Name Type Description Default
fields List

list of fields to show. These fields come from the UMM model (e.g. Abstract, Title).

None

Returns:

Type Description
Self

self

get(limit=2000)

Get all the collections (datasets) that match with our current parameters up to some limit, even if spanning multiple pages.

Tip

The default page size is 2000. Be careful with the request size, because all the JSON elements will be loaded into memory. This is more of an issue with granules than collections, as there can potentially be millions of them.

Parameters:

Name Type Description Default
limit int

The number of results to return

2000

Returns:

Type Description
List[DataCollection]

Query results as a (possibly empty) list of DataCollection instances.

Raises:

Type Description
RuntimeError

The CMR query failed.
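
To make the "up to some limit, even if spanning multiple pages" behaviour concrete, here is a minimal standalone sketch of limit-bounded paging. It is not the earthaccess implementation: `collect_up_to` and the `fetch_page(page_num, page_size)` callback are hypothetical names, and the page size of 2000 is taken from the tip above.

```python
from typing import Callable, List


def collect_up_to(
    fetch_page: Callable[[int, int], List[dict]],
    limit: int,
    page_size: int = 2000,
) -> List[dict]:
    """Gather at most `limit` items by requesting successive pages."""
    results: List[dict] = []
    # Keep the page size constant so page offsets stay consistent.
    size = min(limit, page_size)
    page_num = 1
    while len(results) < limit:
        page = fetch_page(page_num, size)
        results.extend(page)
        if len(page) < size:
            # A short page means the server has no more results.
            break
        page_num += 1
    # The last page may overshoot the limit, so trim the excess.
    return results[:limit]
```

Every item of every fetched page is held in memory at once, which is why a large `limit` deserves care.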

hits()

Returns the number of hits the current query will return.

This is done by making a lightweight query to CMR and inspecting the returned headers. Restricted datasets will always return zero results even if there are results.

Returns:

Type Description
int

The number of results reported by the CMR.

Raises:

Type Description
RuntimeError

The CMR query failed.

instrument(instrument)

Search datasets by instrument.

Tip

Not all datasets have an associated instrument. This works only at the dataset level but not the granule (data) level.

Parameters:

Name Type Description Default
instrument String

instrument of a dataset, e.g. instrument=GEDI

required

Returns:

Type Description
Self

self

Raises:

Type Description
TypeError

instrument is not of type str.

keyword(text)

Case-insensitive and wildcard (*) search through over two dozen fields in a CMR collection record. This allows for searching against fields like summary and science keywords.

Parameters:

Name Type Description Default
text str

text to search for

required

Returns:

Type Description
Self

self

parameters(**kwargs)

Provide query parameters as keyword arguments. The keyword needs to match the name of the method, and the value should either be the value or a tuple of values.

Example
query = DataCollections().parameters(
    short_name="AST_L1T",
    temporal=("2015-01","2015-02"),
    point=(42.5, -101.25)
)

Returns:

Type Description
Self

self

Raises:

Type Description
ValueError

The name of a keyword argument is not the name of a method.

TypeError

The value of a keyword argument is not an argument or tuple of arguments matching the number and type(s) of the method's parameters.
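
The keyword-to-method dispatch described above can be sketched with a toy class. `TinyQuery` is a hypothetical stand-in, not part of earthaccess, and its two filter methods only record their arguments. The sketch shows the rule that each keyword must name a method and that a tuple value is unpacked into that method's positional arguments.

```python
class TinyQuery:
    """Toy stand-in for a query class; not part of earthaccess."""

    def __init__(self):
        self.params = {}

    def short_name(self, name):
        self.params["short_name"] = name
        return self

    def temporal(self, date_from, date_to):
        self.params["temporal"] = f"{date_from},{date_to}"
        return self

    def parameters(self, **kwargs):
        for name, value in kwargs.items():
            method = getattr(self, name, None)
            # Reject private names, this method itself, and non-methods.
            if name.startswith("_") or name == "parameters" or not callable(method):
                raise ValueError(f"Unknown query parameter: {name}")
            # A tuple is unpacked into positional arguments;
            # any other value is passed as a single argument.
            if isinstance(value, tuple):
                method(*value)
            else:
                method(value)
        return self
```

In this sketch, a tuple with the wrong number of elements surfaces as the `TypeError` that Python raises for a mismatched call.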

print_help(method='fields')

Prints the help information for a given method.

project(project)

Search datasets by associated project.

Tip

Not all datasets have an associated project. This works only at the dataset level but not the granule (data) level. Will return datasets across DAACs matching the project.

Parameters:

Name Type Description Default
project str

associated project of a dataset, e.g. project=EMIT

required

Returns:

Type Description
Self

self

Raises:

Type Description
TypeError

project is not of type str.

provider(provider)

Only match collections from a given provider.

A NASA data center or DAAC can have one or more providers. For example, PODAAC is a data center (DAAC): PODAAC is also its default provider for on-premises data, while POCLOUD is the PODAAC provider for its data in the cloud.

Parameters:

Name Type Description Default
provider str

a provider code for any DAAC, e.g. POCLOUD, NSIDC_CPRD, etc.

required

Returns:

Type Description
Self

self

temporal(date_from=None, date_to=None, exclude_boundary=False)

Filter by an open or closed date range. Dates can be provided as date objects or ISO 8601 strings. Multiple ranges can be provided by successive method calls.

Tip

Giving either datetime.date(YYYY, MM, DD) or "YYYY-MM-DD" as the date_to parameter includes that entire day (i.e. the time is set to 23:59:59). Using datetime.datetime(YYYY, MM, DD) is different, because datetime.datetime objects have 00:00:00 as their built-in default.

Parameters:

Name Type Description Default
date_from Optional[Union[str, date, datetime]]

start of temporal range

None
date_to Optional[Union[str, date, datetime]]

end of temporal range

None
exclude_boundary bool

whether to exclude the date_from and date_to boundaries from the matched range.

False

Returns:

Type Description
Self

self

Raises:

Type Description
ValueError

date_from or date_to is a non-None value that is neither a datetime object nor a string that can be parsed as a datetime object; or date_from and date_to are both datetime objects (or parsable as such) and date_from is after date_to.
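
The difference between date and datetime values for date_to, noted in the tip above, can be illustrated with the standard library alone. This is a sketch of the described behaviour, not the library's own parsing code: `resolve_date_to` is a hypothetical helper, and it handles only the plain "YYYY-MM-DD" string form.

```python
from datetime import date, datetime, time
from typing import Union


def resolve_date_to(value: Union[str, date, datetime]) -> datetime:
    """Turn a date_to value into the instant that closes the range."""
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime):
        # Used as given; datetime(YYYY, MM, DD) therefore means 00:00:00.
        return value
    if isinstance(value, date):
        # A plain date covers the whole day.
        return datetime.combine(value, time(23, 59, 59))
    # Only the "YYYY-MM-DD" form is handled in this sketch;
    # strptime raises ValueError for anything else.
    parsed = datetime.strptime(value, "%Y-%m-%d")
    return parsed.replace(hour=23, minute=59, second=59)
```

So `datetime.date(2020, 1, 31)` and `"2020-01-31"` both close the range at the end of January 31, while `datetime.datetime(2020, 1, 31)` closes it at the start of that day.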