Documentation for DataCollections

DataCollections is the class earthaccess uses to query CMR at the dataset level.

Bases: CollectionQuery

Info

The DataCollections class queries https://cmr.earthdata.nasa.gov/search/collections.umm_json. The response must be in umm_json format for the result classes to work.
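
As a rough illustration of what a request to that endpoint looks like, the sketch below assembles a search URL with the standard library. The helper name `build_collections_url` is hypothetical, and `keyword` and `page_size` are used here as example CMR search parameters. This is not how earthaccess builds its requests internally, only a way to see the shape of the URL.

```python
from urllib.parse import urlencode

# Endpoint quoted in the note above.
CMR_COLLECTIONS_URL = "https://cmr.earthdata.nasa.gov/search/collections.umm_json"


def build_collections_url(**params):
    """Return the search URL with the given query parameters appended.

    Hypothetical helper for illustration only; not part of earthaccess.
    """
    if not params:
        return CMR_COLLECTIONS_URL
    # urlencode escapes values and joins them as key=value pairs.
    return f"{CMR_COLLECTIONS_URL}?{urlencode(params)}"


url = build_collections_url(keyword="sea surface temperature", page_size=10)
print(url)
```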

Builds an instance of DataCollections to query CMR

Parameters:

Name Type Description Default
auth Optional[Auth]

An authenticated Auth instance. This is an optional parameter for queries that need authentication, e.g. restricted datasets.

None

cloud_hosted(cloud_hosted=True)

Only match collections that are hosted in the cloud. This is valid for public collections.

Tip

Cloud-hosted collections can be public or restricted. Restricted collections will not be matched using this parameter.

Parameters:

Name Type Description Default
cloud_hosted bool

If True, match only collections that are hosted in the cloud.

True

concept_id(IDs)

Filter by concept ID, for example C1299783579-LPDAAC_ECS, G1327299284-LPDAAC_ECS, or S12345678-LPDAAC_ECS.

Collections, granules, tools, and services are uniquely identified by this ID.

  • If providing a collection's concept ID here, it will filter by granules associated with that collection.
  • If providing a granule's concept ID here, it will uniquely identify those granules.
  • If providing a tool's concept ID here, it will uniquely identify those tools.
  • If providing a service's concept ID here, it will uniquely identify those services.

Parameters:

Name Type Description Default
IDs List[str]

ID(s) to search by. Can be provided as a string or list of strings.

required
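
To make the ID layout concrete, here is a minimal sketch that splits a concept ID into its parts. It assumes the layout suggested by the examples above (a letter prefix, a number, a hyphen, then a provider code) and maps only the prefixes that appear in those examples. The function `describe_concept_id` is hypothetical and not part of earthaccess.

```python
import re

# Prefixes taken from the example IDs above; other prefixes are reported as "unknown".
CONCEPT_TYPES = {"C": "collection", "G": "granule", "S": "service"}

# Assumed layout: <letters><digits>-<provider>, inferred from the examples.
CONCEPT_ID_PATTERN = re.compile(r"^([A-Z]+)(\d+)-(\w+)$")


def describe_concept_id(concept_id):
    """Split a concept ID into record type, number, and provider code."""
    match = CONCEPT_ID_PATTERN.match(concept_id)
    if match is None:
        raise ValueError(f"Not a recognizable concept ID: {concept_id!r}")
    prefix, number, provider = match.groups()
    return {
        "type": CONCEPT_TYPES.get(prefix, "unknown"),
        "number": int(number),
        "provider": provider,
    }


info = describe_concept_id("C1299783579-LPDAAC_ECS")
print(info)
```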

daac(daac_short_name='')

Only match collections for a given DAAC. By default, the DAAC's on-prem collections are matched.

Parameters:

Name Type Description Default
daac_short_name str

a DAAC shortname, e.g. NSIDC, PODAAC, GESDISC

''

data_center(data_center_name='')

An alias for daac().

Parameters:

Name Type Description Default
data_center_name str

DAAC shortname, e.g. NSIDC, PODAAC, GESDISC

''

debug(debug=True)

If True, prints the actual query sent to CMR. Note that pagination happens in the headers.

Parameters:

Name Type Description Default
debug Boolean

Print CMR query.

True

doi(doi)

Search datasets by DOI.

Tip

Not all datasets have an associated DOI. DOI search works only at the dataset level, not at the granule (data) level. To get data by DOI, search by DOI, take the concept_id from the result, and then use it to query for the data.

Parameters:

Name Type Description Default
doi str

DOI of a dataset, e.g. 10.5067/AQR50-3Q7CS

required
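
The "grab the concept_id" step in the tip can be sketched on a parsed response. The layout assumed below (an `items` list whose entries carry a `meta` block with a `concept-id` key) and the sample record are hand-written for illustration and are not a real CMR result. The helper `concept_ids` is hypothetical.

```python
def concept_ids(response):
    """Collect the concept ID of every item in a parsed umm_json response.

    Assumes each item has the shape {"meta": {"concept-id": ...}, "umm": {...}}.
    """
    return [item["meta"]["concept-id"] for item in response.get("items", [])]


# Hand-written sample, not an actual CMR response.
sample_response = {
    "hits": 1,
    "items": [
        {
            "meta": {"concept-id": "C1299783579-LPDAAC_ECS"},
            "umm": {"ShortName": "EXAMPLE"},
        }
    ],
}

ids = concept_ids(sample_response)
print(ids)
```

The resulting IDs are what a follow-up granule search would filter on.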

fields(fields=None)

Masks the response by only showing the fields included in this list.

Parameters:

Name Type Description Default
fields List

List of fields to show. These fields come from the UMM model, e.g. Abstract, Title.

None
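
Conceptually, masking keeps only the requested keys of each record. The sketch below shows that idea on a plain dictionary. The helper `mask_fields` and the sample record are hypothetical, and the actual result classes may apply the mask differently.

```python
def mask_fields(record, fields=None):
    """Return a copy of record limited to the requested field names.

    With fields=None the record is returned unmasked, mirroring the default above.
    """
    if fields is None:
        return dict(record)
    # Requested fields that are missing from the record are skipped silently.
    return {name: record[name] for name in fields if name in record}


# Hand-written record using UMM-style field names, for illustration only.
record = {"ShortName": "EXAMPLE", "Title": "An example dataset", "Abstract": "Placeholder text."}

masked = mask_fields(record, fields=["Abstract", "Title"])
print(masked)
```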

get(limit=2000)

Get all the collections (datasets) that match the current parameters, up to some limit, even if the results span multiple pages.

Tip

The default page size is 2000. Be careful with the request size, because all the JSON elements are loaded into memory. This is more of an issue with granules than with collections, as there can be millions of them.

Parameters:

Name Type Description Default
limit int

The number of results to return

2000

Returns:

Type Description
List[DataCollection]

query results as a list of DataCollection instances.
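
The "up to some limit, even if spanning multiple pages" behavior can be sketched as a simple loop. This is an illustration under assumptions, not the earthaccess implementation: `collect` and `fake_fetch_page` are hypothetical, and the page-number scheme is a stand-in for whatever paging mechanism the service uses.

```python
def collect(fetch_page, limit=2000, page_size=2000):
    """Gather results page by page until limit is reached or pages run out.

    fetch_page(page_num, page_size) is any callable returning a list of results.
    """
    results = []
    page_num = 1
    while len(results) < limit:
        page = fetch_page(page_num, page_size)
        if not page:
            break  # no more results available
        results.extend(page)
        page_num += 1
    # The last page may overshoot the limit, so trim the tail.
    return results[:limit]


def fake_fetch_page(page_num, page_size):
    """Stand-in for a network call: serves five items in page-sized slices."""
    data = list(range(5))
    start = (page_num - 1) * page_size
    return data[start:start + page_size]


first_three = collect(fake_fetch_page, limit=3, page_size=2)
everything = collect(fake_fetch_page, limit=10, page_size=2)
print(first_three, everything)
```

Every fetched page is held in memory at once, which is why the limit matters for large result sets.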

hits()

Returns the number of hits the current query will return. This is done by making a lightweight query to CMR and inspecting the returned headers. Restricted datasets will always return zero results even if there are results.

Returns:

Type Description
int

The number of results reported by CMR.
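
Reading a count from response headers might look like the sketch below. It assumes the total arrives in a header named `CMR-Hits`. The helper `hits_from_headers` is hypothetical and works on a plain dictionary rather than a live response.

```python
def hits_from_headers(headers):
    """Return the hit count from a mapping of response headers.

    Header names are compared case-insensitively; a missing header counts as 0.
    Assumes the count is carried in a header named CMR-Hits.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    return int(lowered.get("cmr-hits", 0))


count = hits_from_headers({"Content-Type": "application/json", "CMR-Hits": "42"})
print(count)
```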

instrument(instrument)

Search datasets by instrument.

Tip

Not all datasets have an associated instrument. This works only at the dataset level, not at the granule (data) level.

Parameters:

Name Type Description Default
instrument String

Instrument of a dataset, e.g. instrument=GEDI

required

keyword(text)

Case-insensitive and wildcard (*) search through over two dozen fields in a CMR collection record. This allows for searching against fields like summary and science keywords.

Parameters:

Name Type Description Default
text str

text to search for

required

parameters(**kwargs)

Provide query parameters as keyword arguments. The keyword needs to match the name of the method, and the value should either be the value or a tuple of values.

Example
query = DataCollections().parameters(short_name="AST_L1T",
                                     temporal=("2015-01", "2015-02"),
                                     point=(42.5, -101.25))

Returns: Query instance
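
The rule that each keyword must match a method name, with tuple values spread across that method's arguments, can be sketched with a toy class. `ToyQuery` below is hypothetical and only mimics the dispatch idea; the real method may validate its input differently.

```python
class ToyQuery:
    """Tiny stand-in for a query class with chainable filter methods."""

    def __init__(self):
        self.params = {}

    def short_name(self, name):
        self.params["short_name"] = name
        return self

    def point(self, lon, lat):
        self.params["point"] = f"{lon},{lat}"
        return self

    def parameters(self, **kwargs):
        """Call the method named by each keyword, unpacking tuple values."""
        for name, value in kwargs.items():
            method = getattr(self, name, None)
            if name.startswith("_") or not callable(method):
                raise ValueError(f"Unknown parameter: {name}")
            if isinstance(value, tuple):
                method(*value)  # a tuple supplies several positional arguments
            else:
                method(value)
        return self


query = ToyQuery().parameters(short_name="AST_L1T", point=(42.5, -101.25))
print(query.params)
```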

print_help(method='fields')

Prints the help information for a given method.

project(project)

Search datasets by associated project.

Tip

Not all datasets have an associated project. This works only at the dataset level, not at the granule (data) level. It returns datasets across DAACs that match the project.

Parameters:

Name Type Description Default
project String

Associated project of a dataset, e.g. project=EMIT

required

provider(provider='')

Only match collections from a given provider.

A NASA data center or DAAC can have one or more providers. For example, PODAAC is a data center (DAAC); PODAAC is also the default provider for its on-premises data, and POCLOUD is the PODAAC provider for its data in the cloud.

Parameters:

Name Type Description Default
provider str

a provider code for any DAAC, e.g. POCLOUD, NSIDC_CPRD, etc.

''

temporal(date_from=None, date_to=None, exclude_boundary=False)

Filter by an open or closed date range. Dates can be provided as datetime objects or ISO 8601 formatted strings. Multiple ranges can be provided by successive calls to this method before calling get().

Parameters:

Name Type Description Default
date_from String or Datetime object

earliest date of temporal range

None
date_to String or Datetime object

latest date of temporal range

None
exclude_boundary Boolean

Whether to exclude the date_from and date_to boundaries from the matched range.

False
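
A minimal sketch of how mixed datetime and string inputs could be normalized into a single "start,end" range string follows. The helpers `to_cmr_date` and `temporal_range` are hypothetical, the timestamp format is an assumption, and naive values are treated as UTC for simplicity. An open range leaves one side empty.

```python
from datetime import datetime

# Assumed timestamp layout for the range string.
CMR_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_cmr_date(value):
    """Format a datetime or ISO 8601 string; None becomes an empty string."""
    if value is None:
        return ""
    if isinstance(value, str):
        value = datetime.fromisoformat(value)
    # Naive datetimes are treated as UTC here (an assumption of this sketch).
    return value.strftime(CMR_DATE_FORMAT)


def temporal_range(date_from=None, date_to=None):
    """Return 'start,end', leaving a side empty for an open-ended range."""
    start, end = to_cmr_date(date_from), to_cmr_date(date_to)
    # Same-format timestamps compare correctly as strings.
    if start and end and start > end:
        raise ValueError("date_from must not be later than date_to")
    return f"{start},{end}"


closed = temporal_range("2015-01-01", datetime(2015, 2, 1))
open_start = temporal_range(date_to="2015-02-01")
print(closed)
print(open_start)
```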