Documentation for DataCollections
DataCollections is the class earthaccess uses to query CMR at the dataset level.
Bases: CollectionQuery
Info
The DataCollections class queries against https://cmr.earthdata.nasa.gov/search/collections.umm_json. The response must be in umm_json format to use the result classes.
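To make the endpoint above concrete, here is a minimal sketch of what a collection search URL can look like. The helper `build_collections_url` is hypothetical and is not how earthaccess builds its requests; the parameter names (`keyword`, `cloud_hosted`, `page_size`, `concept_id`) are CMR Search API names as we understand them.

```python
from urllib.parse import urlencode

# Endpoint named in the Info note above.
CMR_COLLECTIONS_UMM_JSON = "https://cmr.earthdata.nasa.gov/search/collections.umm_json"


def build_collections_url(**params: object) -> str:
    """Hypothetical helper: encode CMR search parameters onto the
    collections.umm_json endpoint. Booleans become "true"/"false", and list
    values are repeated under a "name[]" key (percent-encoded by urlencode)."""
    pairs = []
    for key, value in params.items():
        if isinstance(value, bool):
            pairs.append((key, "true" if value else "false"))
        elif isinstance(value, (list, tuple)):
            pairs.extend((f"{key}[]", str(v)) for v in value)
        else:
            pairs.append((key, str(value)))
    return f"{CMR_COLLECTIONS_UMM_JSON}?{urlencode(pairs)}"


print(build_collections_url(keyword="sea surface temperature", cloud_hosted=True, page_size=10))
```

Each chained query method on DataCollections conceptually contributes one or more of these key/value pairs before the request is sent.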
Builds an instance of DataCollections to query the CMR.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
auth | Optional[Auth] | An authenticated `Auth` instance. | None |
cloud_hosted(cloud_hosted=True)
Only match collections that are hosted in the cloud. This applies only to public collections.
Tip
Cloud-hosted collections can be public or restricted. Restricted collections will not be matched using this parameter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cloud_hosted | bool | If `True`, only match cloud-hosted collections. | True |
Returns:
Type | Description |
---|---|
Self | self |
Raises:
Type | Description |
---|---|
TypeError | If `cloud_hosted` is not of type `bool`. |
concept_id(IDs)
Filter by concept ID.
For example: C1299783579-LPDAAC_ECS, G1327299284-LPDAAC_ECS, or S12345678-LPDAAC_ECS.
Collections, granules, tools, and services are uniquely identified by this ID.
- If providing a collection's concept ID, it will filter by granules associated with that collection.
- If providing a granule's concept ID, it will uniquely identify those granules.
- If providing a tool's concept ID, it will uniquely identify those tools.
- If providing a service's concept ID, it will uniquely identify those services.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
IDs | Sequence[str] | ID(s) to search by. Can be provided as a string or a list of strings. | required |
Returns:
Type | Description |
---|---|
Self | self |
Raises:
Type | Description |
---|---|
ValueError | An ID does not start with a valid prefix. |
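The input contract above (one ID or a list of IDs, with a `ValueError` on a bad prefix) can be sketched as follows. The helper `normalize_concept_ids` is hypothetical, and the prefix set is an assumption drawn from the object types listed above (C for collections, G for granules, S for services, T for tools); it is not the library's actual check.

```python
from typing import List, Sequence, Union

# Assumed prefixes: collections (C), granules (G), services (S), tools (T).
VALID_PREFIXES = ("C", "G", "S", "T")


def normalize_concept_ids(ids: Union[str, Sequence[str]]) -> List[str]:
    """Hypothetical helper: accept a single ID or a sequence of IDs and
    raise ValueError if any ID does not start with a valid prefix."""
    id_list = [ids] if isinstance(ids, str) else list(ids)
    for concept_id in id_list:
        if not concept_id.startswith(VALID_PREFIXES):
            raise ValueError(f"Invalid concept ID prefix: {concept_id!r}")
    return id_list


print(normalize_concept_ids("C1299783579-LPDAAC_ECS"))
```

Wrapping a bare string in a list first matters: iterating over a plain `str` would otherwise validate it one character at a time.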
daac(daac_short_name)
Only match collections for a given DAAC. By default, this matches the DAAC's on-premises collections.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
daac_short_name | str | A DAAC short name, e.g. NSIDC, PODAAC, GESDISC | required |
Returns:
Type | Description |
---|---|
Self | self |
data_center(data_center_name)
An alias for the daac method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_center_name | str | DAAC short name, e.g. NSIDC, PODAAC, GESDISC | required |
Returns:
Type | Description |
---|---|
Self | self |
debug(debug=True)
If True, prints the actual query sent to CMR. Note that pagination is handled in the request headers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
debug | bool | If `True`, print the query sent to CMR. | True |
Returns:
Type | Description |
---|---|
Self | self |
doi(doi)
Search datasets by DOI.
Tip
Not all datasets have an associated DOI. DOI search also works only at the dataset level, not at the granule (data) level. To get the data, search by DOI, take the collection's concept_id, and then search for its granules.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
doi | str | DOI of a dataset, e.g. 10.5067/AQR50-3Q7CS | required |
Returns:
Type | Description |
---|---|
Self | self |
Raises:
Type | Description |
---|---|
TypeError | If `doi` is not of type `str`. |
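The Tip above describes a two-step workflow: search by DOI, then take the concept_id. This sketch shows the middle step on an already-parsed response. It assumes the umm_json layout in which each entry in `items` carries a `meta` object with a `concept-id` key; the helper name and the response data are hand-made for illustration and are not real CMR output.

```python
from typing import Any, Dict, List


def concept_ids_from_umm_json(response: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: collect collection concept IDs from a parsed
    umm_json search response (assumes items[*].meta["concept-id"])."""
    return [item["meta"]["concept-id"] for item in response.get("items", [])]


# A hand-made stand-in for a DOI search response (not real CMR output).
fake_response = {
    "hits": 1,
    "items": [
        {"meta": {"concept-id": "C0000000001-EXAMPLE"}, "umm": {"ShortName": "EXAMPLE"}},
    ],
}

print(concept_ids_from_umm_json(fake_response))
```

The resulting IDs are what you would then pass to a granule-level search to retrieve the data.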
fields(fields=None)
Masks the response so that only the fields included in this list are shown.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fields | List | List of fields to show. These fields come from the UMM model (e.g. Abstract, Title). | None |
Returns:
Type | Description |
---|---|
Self | self |
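As a rough illustration of what "masking" means, this sketch filters a single UMM record down to the requested top-level fields. It assumes the mask keeps only top-level keys and silently skips missing ones; the helper `mask_umm_fields` and the sample record are hypothetical, and earthaccess may apply the mask differently.

```python
from typing import Any, Dict, List, Optional


def mask_umm_fields(umm: Dict[str, Any], fields: Optional[List[str]] = None) -> Dict[str, Any]:
    """Hypothetical helper: keep only the requested top-level UMM fields.
    With fields=None, return an unmasked copy of the record."""
    if fields is None:
        return dict(umm)
    return {name: umm[name] for name in fields if name in umm}


record = {"Abstract": "An abstract.", "Title": "A title.", "ShortName": "EXAMPLE"}
print(mask_umm_fields(record, ["Abstract", "Title"]))
```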
get(limit=2000)
Get all collections (datasets) that match the current parameters, up to limit, even if the results span multiple pages.
Tip
The default page size is 2000. Be careful with the request size, because all the JSON elements are loaded into memory. This is more of an issue for granules than for collections, since there can potentially be millions of granules.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
limit | int | The maximum number of results to return. | 2000 |
Returns:
Type | Description |
---|---|
List[DataCollection] | Query results as a (possibly empty) list of `DataCollection` instances. |
Raises:
Type | Description |
---|---|
RuntimeError | The CMR query failed. |
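The "up to a limit, even if spanning multiple pages" behavior can be sketched as a loop over pages. Here a page fetcher is any callable that takes the previous page's pagination token (None for the first page) and returns `(items, next_token)`. This token-passing style is an assumption modeled loosely on CMR's header-based search-after pagination; `get_up_to` and the fake fetcher are hypothetical stand-ins for real HTTP requests.

```python
from typing import Any, Callable, List, Optional, Tuple

# Takes the previous token (None for the first page) and returns
# (items on this page, token for the next page or None when done).
PageFetcher = Callable[[Optional[str]], Tuple[List[Any], Optional[str]]]


def get_up_to(fetch_page: PageFetcher, limit: int = 2000) -> List[Any]:
    """Accumulate results across pages until `limit` items are collected
    or there are no more pages, then trim to exactly `limit`."""
    results: List[Any] = []
    token: Optional[str] = None
    while len(results) < limit:
        items, token = fetch_page(token)
        results.extend(items)
        if token is None or not items:
            break
    return results[:limit]


# Fake three-page result set standing in for CMR responses.
pages = {None: ([1, 2, 3], "a"), "a": ([4, 5, 6], "b"), "b": ([7], None)}
print(get_up_to(lambda token: pages[token], limit=5))
```

Stopping as soon as `limit` is reached avoids fetching pages whose items would be discarded, which is the memory concern raised in the Tip.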
hits()
Returns the number of hits the current query will return.
This is done by making a lightweight query to CMR and inspecting the returned headers. Restricted datasets always report zero hits, even when matching results exist.
Returns:
Type | Description |
---|---|
int | The number of results reported by the CMR. |
Raises:
Type | Description |
---|---|
RuntimeError | The CMR query failed. |
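"Inspecting the returned headers" can be sketched as reading a hit count from a header mapping. This assumes CMR reports the total in a `CMR-Hits` response header. A request with `page_size=0` is one way to keep such a query lightweight, though that is our assumption rather than a description of earthaccess internals. The helper `hits_from_headers` is hypothetical; it raises `RuntimeError` to mirror the documented failure mode.

```python
from typing import Mapping


def hits_from_headers(headers: Mapping[str, str]) -> int:
    """Hypothetical helper: read the total hit count from CMR response
    headers, matching the header name case-insensitively as HTTP allows."""
    for name, value in headers.items():
        if name.lower() == "cmr-hits":
            return int(value)
    raise RuntimeError("CMR response did not include a CMR-Hits header")


print(hits_from_headers({"CMR-Hits": "42", "Content-Type": "application/vnd.nasa.cmr.umm_results+json"}))
```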
instrument(instrument)
Search datasets by instrument.
Tip
Not all datasets have an associated instrument. This works only at the dataset level, not at the granule (data) level.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
instrument | str | Instrument of a dataset, e.g. GEDI | required |
Returns:
Type | Description |
---|---|
Self | self |
Raises:
Type | Description |
---|---|
TypeError | If `instrument` is not of type `str`. |
keyword(text)
Case-insensitive, wildcard (*) search across more than two dozen fields in a CMR collection record, including fields such as the summary and science keywords.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | Text to search for. | required |
Returns:
Type | Description |
---|---|
Self | self |
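CMR does this matching on the server. The sketch below is only a rough local approximation that illustrates the two properties named above: case-insensitivity and `*` wildcards. It matches the pattern against each whitespace-separated word of the given field values. CMR's real tokenization and field list are not reproduced, and `fnmatch` also treats `?` and `[...]` as wildcards. The helper `keyword_matches` is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def keyword_matches(pattern: str, field_values: Iterable[str]) -> bool:
    """Hypothetical approximation: lower-case both sides, then test the
    pattern against every whitespace-separated word in every field."""
    pattern = pattern.lower()
    return any(
        fnmatchcase(word, pattern)
        for value in field_values
        for word in value.lower().split()
    )


print(keyword_matches("SEA*", ["Global Sea Surface Temperature"]))
```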
parameters(**kwargs)
Provide query parameters as keyword arguments. Each keyword must match the name of a query method, and each value must be either a single argument or a tuple of arguments for that method.
Example
Returns:
Type | Description |
---|---|
Self | self |
Raises:
Type | Description |
---|---|
ValueError | The name of a keyword argument is not the name of a method. |
TypeError | The value of a keyword argument is not an argument or tuple of arguments matching the number and type(s) of the method's parameters. |
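The keyword-to-method dispatch described above can be sketched with a toy class. `ToyQuery` and its two methods are hypothetical stand-ins, not the earthaccess implementation. Only the dispatch rules come from the description: an unknown name raises `ValueError`, a tuple is unpacked into positional arguments, and a wrong argument count surfaces as Python's own `TypeError`. This sketch does not check argument types.

```python
from typing import Any, Dict


class ToyQuery:
    """Hypothetical stand-in for a query class with chainable methods."""

    def __init__(self) -> None:
        self.params: Dict[str, str] = {}

    def keyword(self, text: str) -> "ToyQuery":
        self.params["keyword"] = text
        return self

    def temporal(self, date_from: str, date_to: str) -> "ToyQuery":
        self.params["temporal"] = f"{date_from},{date_to}"
        return self

    def parameters(self, **kwargs: Any) -> "ToyQuery":
        for name, value in kwargs.items():
            method = getattr(self, name, None)
            # Reject private names, recursion into parameters(), and non-methods.
            if name.startswith("_") or name == "parameters" or not callable(method):
                raise ValueError(f"Unknown query method: {name}")
            # A tuple is unpacked into positional arguments; anything else is one argument.
            args = value if isinstance(value, tuple) else (value,)
            method(*args)  # a wrong argument count raises TypeError here
        return self


query = ToyQuery().parameters(keyword="GEDI", temporal=("2020-01-01", "2020-12-31"))
print(query.params)
```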
print_help(method='fields')
Prints the help information for a given method.
project(project)
Search datasets by associated project.
Tip
Not all datasets have an associated project. This works only at the dataset level, not at the granule (data) level. It returns matching datasets across all DAACs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
project | str | Project associated with a dataset, e.g. EMIT | required |
Returns:
Type | Description |
---|---|
Self | self |
Raises:
Type | Description |
---|---|
TypeError | If `project` is not of type `str`. |
provider(provider)
Only match collections from a given provider.
A NASA data center or DAAC can have one or more providers. For example, PODAAC is a data center (DAAC): PODAAC is also its default provider for on-premises data, while POCLOUD is its provider for cloud-hosted data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
provider | str | A provider code for any DAAC, e.g. POCLOUD or NSIDC_CPRD. | required |
Returns:
Type | Description |
---|---|
Self | self |
temporal(date_from=None, date_to=None, exclude_boundary=False)
Filter by an open or closed date range. Dates can be provided as date objects or ISO 8601 strings. Multiple ranges can be provided by successive method calls.
Tip
Giving either datetime.date(YYYY, MM, DD) or "YYYY-MM-DD" as the date_to parameter includes that entire day (i.e. the time is set to 23:59:59). Using datetime.datetime(YYYY, MM, DD) is different, because datetime.datetime objects have 00:00:00 as their built-in default.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
date_from | Optional[Union[str, date, datetime]] | Start of the temporal range. | None |
date_to | Optional[Union[str, date, datetime]] | End of the temporal range. | None |
exclude_boundary | bool | Whether to exclude date_from and date_to from the matched range. | False |
Returns:
Type | Description |
---|---|
Self | self |
Raises:
Type | Description |
---|---|
ValueError | If date_from or date_to cannot be parsed as a date, or if date_from is later than date_to. |
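The date_to rule in the Tip above is easy to get wrong, so here is a minimal sketch of it. The helper `normalize_date_to` is hypothetical and handles only the three input forms the Tip discusses: a bare date or "YYYY-MM-DD" string is extended to 23:59:59, while a datetime is kept exactly as given. earthaccess's actual parsing may accept more formats.

```python
from datetime import date, datetime, time
from typing import Union


def normalize_date_to(value: Union[str, date, datetime]) -> datetime:
    """Hypothetical helper: a date or "YYYY-MM-DD" string covers the whole
    day; a datetime (or full ISO 8601 datetime string) is used as given."""
    if isinstance(value, datetime):  # check datetime first: it subclasses date
        return value
    if isinstance(value, str):
        if len(value) == 10:  # bare "YYYY-MM-DD"
            value = date.fromisoformat(value)
        else:
            return datetime.fromisoformat(value)
    return datetime.combine(value, time(23, 59, 59))


print(normalize_date_to("2020-01-31"))
print(normalize_date_to(datetime(2020, 1, 31)))
```

The second call shows the pitfall from the Tip: `datetime(2020, 1, 31)` means midnight at the start of the day, so an end date given that way excludes almost all of January 31.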