earthaccess API
earthaccess is a Python library that simplifies discovery of, and access to, NASA Earth science data. It provides a higher-level abstraction over NASA’s Search API (CMR), so that searches can be written in a simple notation instead of low-level HTTP queries.
The library handles authentication with NASA’s OAuth2 API (EDL) and provides HTTP and AWS S3 sessions that can be used with xarray and other PyData libraries to access NASA EOSDIS datasets directly. This lets scientists get to their science more simply and quickly, and lowers the barriers to cloud-based data analysis.
collection_query()
Returns a query builder instance for NASA collections (datasets).
Returns:
Type | Description |
---|---|
CollectionQuery | A query builder instance for data collections. |
Source code in earthaccess/api.py
download(granules, local_path=None, provider=None, threads=8, *, pqdm_kwargs=None)
Retrieves data granules from a remote storage system.
- If we run this in the cloud, we will be using S3 to move data to local_path.
- If we run it outside AWS (us-west-2 region) and the dataset is cloud hosted, we'll use HTTP links.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
granules | Union[DataGranule, List[DataGranule], str, List[str]] | A granule, a list of granules, a granule link (HTTP), or a list of granule links (HTTP). | required |
local_path | Optional[Union[Path, str]] | Local directory to store the remote data granules. If not supplied, defaults to a subdirectory of the current working directory of the form | None |
provider | Optional[str] | The provider; needed when downloading a list of URLs. | None |
threads | int | Number of threads to use to download the files in parallel; adjust as necessary. | 8 |
pqdm_kwargs | Optional[Mapping[str, Any]] | Additional keyword arguments to pass to pqdm, a parallel processing library. See pqdm documentation for available options. Default is to use immediate exception behavior and the number of jobs specified by the | None |
Returns:
Type | Description |
---|---|
List[str] | List of downloaded files. |
Raises:
Type | Description |
---|---|
Exception | A file download failed. |
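The signature and the Raises table above can be combined into a small wrapper. This is a minimal sketch, not part of earthaccess: the name `download_all` and its `downloader` parameter are hypothetical, and `downloader` exists only so the wrapper can be exercised without network access. When it is omitted, the wrapper falls back to `earthaccess.download`.

```python
from typing import Any, Callable, List, Optional, Sequence


def download_all(
    granules: Sequence[Any],
    local_path: str,
    threads: int = 8,
    downloader: Optional[Callable[..., List[str]]] = None,
) -> List[str]:
    """Download granules into local_path and return the downloaded files."""
    if downloader is None:
        # Imported lazily so this sketch loads even where earthaccess is absent.
        import earthaccess

        downloader = earthaccess.download
    try:
        return list(downloader(granules, local_path=local_path, threads=threads))
    except Exception as exc:
        # download() is documented to raise Exception when a file download fails.
        raise RuntimeError(f"download into {local_path!r} failed") from exc
```

After `earthaccess.login()` and a call to `earthaccess.search_data(...)`, something like `download_all(granules, "./data", threads=4)` would fetch the files and turn a failed download into a `RuntimeError` that names the target directory.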
get_edl_token()
get_fsspec_https_session()
Returns a fsspec session that can be used to access datafiles across many different DAACs.
Returns:
Type | Description |
---|---|
AbstractFileSystem | An fsspec instance able to access data across DAACs. |
Examples:
import earthaccess
earthaccess.login()
fs = earthaccess.get_fsspec_https_session()
# DAAC_GRANULE is a placeholder for the HTTPS URL of a granule file.
with fs.open(DAAC_GRANULE) as f:
    f.read(10)  # read the first 10 bytes
get_requests_https_session()
Returns a requests Session instance with an authorized bearer token. This is useful for making requests to restricted URLs, such as data granules or services that require authentication with NASA EDL.
Returns:
Type | Description |
---|---|
Session | An authenticated requests Session instance. |
Examples:
import earthaccess
earthaccess.login()
req_session = earthaccess.get_requests_https_session()
# granule_url is a placeholder for the URL of a restricted granule file.
data = req_session.get(granule_url, headers={"Range": "bytes=0-100"})
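The Range request in the example above can be wrapped so that the header is built consistently and HTTP errors are not silently ignored. The following is a sketch only: `byte_range_header` and `fetch_bytes` are hypothetical helper names, and `session` is assumed to be any requests-style object such as the one returned by `get_requests_https_session()`.

```python
from typing import Dict


def byte_range_header(start: int, end: int) -> Dict[str, str]:
    """Build an HTTP Range header for the inclusive byte range start..end."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    return {"Range": f"bytes={start}-{end}"}


def fetch_bytes(session, url: str, start: int, end: int) -> bytes:
    """Request a byte range from url using a requests-style session."""
    response = session.get(url, headers=byte_range_header(start, end))
    # Raise on 4xx/5xx responses instead of returning an error page as data.
    response.raise_for_status()
    return response.content
```

With an authenticated session, `fetch_bytes(req_session, granule_url, 0, 100)` corresponds to the request shown above. A server that does not honor Range requests may return the whole file, so the length of the result is worth checking.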
get_s3_credentials(daac=None, provider=None, results=None)
Returns temporary (1 hour) credentials for direct access to NASA S3 buckets. We can use the daac name, the provider, or a list of results from earthaccess.search_data(). If we use results, earthaccess will use the metadata on the response to get the credentials, which is useful for missions that do not use the same endpoint as their DAACs, e.g. SWOT.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
daac | Optional[str] | A DAAC short_name such as NSIDC or PODAAC. | None |
provider | Optional[str] | The provider for the DAAC, if known, e.g. POCLOUD or LPCLOUD. | None |
results | Optional[List[DataGranule]] | List of results from search_data(). | None |
|
Returns:
Type | Description |
---|---|
Dict[str, Any] | A dictionary with S3 credentials for the DAAC or provider. |
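The returned dictionary can be handed to an S3 client. The sketch below maps it to the `key`, `secret` and `token` keyword arguments of `s3fs.S3FileSystem`. The helper name `s3fs_kwargs` is hypothetical, and the dictionary keys `accessKeyId`, `secretAccessKey` and `sessionToken` are an assumption about the credential format that this page does not state, so inspect an actual response before relying on them.

```python
from typing import Any, Dict

# Assumed key names in the credentials dictionary (not confirmed by this page).
_REQUIRED_KEYS = ("accessKeyId", "secretAccessKey", "sessionToken")


def s3fs_kwargs(creds: Dict[str, Any]) -> Dict[str, Any]:
    """Translate temporary S3 credentials into s3fs.S3FileSystem keyword arguments."""
    missing = [name for name in _REQUIRED_KEYS if name not in creds]
    if missing:
        raise KeyError(f"credentials are missing keys: {missing}")
    return {
        "key": creds["accessKeyId"],
        "secret": creds["secretAccessKey"],
        "token": creds["sessionToken"],
    }
```

Under that assumption, `s3fs.S3FileSystem(**s3fs_kwargs(earthaccess.get_s3_credentials(daac="PODAAC")))` would build a file system by hand. Because the credentials are temporary (1 hour), long-running jobs need to request new ones.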
get_s3_filesystem(daac=None, provider=None, results=None)
Returns an s3fs.S3FileSystem for direct access when running within the AWS us-west-2 region.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
daac | Optional[str] | Any DAAC short name, e.g. NSIDC, GES_DISC. | None |
provider | Optional[str] | Each DAAC can have a cloud provider. If the DAAC is specified, there is no need to use provider. | None |
results | Optional[DataGranule] | A list of results from search_data(). | None |
|
Returns:
Type | Description |
---|---|
S3FileSystem | An authenticated s3fs session valid for 1 hour. |
get_s3fs_session(daac=None, provider=None, results=None)
Returns a fsspec s3fs file session for direct access when we are in us-west-2.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
daac | Optional[str] | Any DAAC short name, e.g. NSIDC, GES_DISC. | None |
provider | Optional[str] | Each DAAC can have a cloud provider. If the DAAC is specified, there is no need to use provider. | None |
results | Optional[DataGranule] | A list of results from search_data(). | None |
|
Returns:
Type | Description |
---|---|
S3FileSystem | An |
granule_query()
Returns a query builder instance for data granules.
Returns:
Type | Description |
---|---|
GranuleQuery | A query builder instance for data granules. |
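A query builder is typically used by chaining filter calls and then fetching results. The sketch below assumes the builder exposes `short_name()`, `temporal()` and `get()` methods, which this page does not list; treat those names as assumptions to check against the GranuleQuery documentation. The function name `first_granules` and the dataset short name in the usage note are illustrative.

```python
from typing import Any, List


def first_granules(
    query: Any, short_name: str, start: str, end: str, limit: int = 5
) -> List[Any]:
    """Filter a granule query by dataset and time range, then fetch up to limit results.

    Assumption: each filter method returns the builder itself, so calls can be chained.
    """
    return query.short_name(short_name).temporal(start, end).get(limit)
```

Under those assumptions, `first_granules(earthaccess.granule_query(), "ATL06", "2020-01-01", "2020-01-31")` would return at most five granules for that dataset and month.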
login(strategy='all', persist=False, system=PROD)
Authenticate with Earthdata login (https://urs.earthdata.nasa.gov/).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
strategy | str | An authentication method. | 'all' |
persist | bool | Whether to persist credentials in a .netrc file. | False |
system | System | The Earthdata system to access. | PROD |
|
Returns:
Type | Description |
---|---|
Auth | An instance of Auth. |
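Scripts that cannot proceed without credentials may want to fail early. The following sketch wraps login() for that purpose. The name `require_login` and its `login` parameter are hypothetical (the parameter only allows the wrapper to be tested offline), and the boolean `authenticated` attribute on the returned Auth object is an assumption not documented on this page.

```python
from typing import Any, Callable, Optional


def require_login(
    login: Optional[Callable[..., Any]] = None, persist: bool = False
) -> Any:
    """Log in with the default strategy and raise if authentication did not succeed."""
    if login is None:
        # Imported lazily so this sketch loads even where earthaccess is absent.
        import earthaccess

        login = earthaccess.login
    auth = login(strategy="all", persist=persist)
    # Assumption: Auth exposes a boolean `authenticated` attribute.
    if not getattr(auth, "authenticated", False):
        raise RuntimeError("Earthdata Login authentication failed")
    return auth
```

Calling `require_login(persist=True)` at the top of a script would, under these assumptions, store credentials in a .netrc file and stop the script immediately if authentication fails.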
open(granules, provider=None, *, pqdm_kwargs=None)
Returns a list of file-like objects that can be used to access files hosted on S3 or HTTPS by third party libraries like xarray.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
granules | Union[List[str], List[DataGranule]] | A list of granule instances or a list of URLs, e.g. | required |
provider | Optional[str] | e.g. POCLOUD, NSIDC_CPRD, etc. | None |
pqdm_kwargs | Optional[Mapping[str, Any]] | Additional keyword arguments to pass to pqdm, a parallel processing library. See pqdm documentation for available options. Default is to use immediate exception behavior and the number of jobs specified by the | None |
|
Returns:
Type | Description |
---|---|
List[AbstractFileSystem] | A list of "file pointers" to remote (i.e. s3 or https) files. |
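Because the returned objects are file-like, they can be read directly before being handed to another library. The sketch below reads the first bytes of each file and checks for the HDF5 signature, which netCDF-4 files also carry. The names `read_prefixes` and `looks_like_hdf5` are hypothetical, and the `opener` parameter exists only so the code can be tested offline; by default it uses `earthaccess.open`.

```python
from typing import Any, Callable, List, Optional, Sequence

# The 8-byte signature at the start of every HDF5 (and netCDF-4) file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def read_prefixes(
    granules: Sequence[Any],
    nbytes: int = 8,
    opener: Optional[Callable[..., List[Any]]] = None,
) -> List[bytes]:
    """Open each granule and return the first nbytes bytes of every file."""
    if opener is None:
        # Imported lazily so this sketch loads even where earthaccess is absent.
        import earthaccess

        opener = earthaccess.open
    return [file_obj.read(nbytes) for file_obj in opener(granules)]


def looks_like_hdf5(prefix: bytes) -> bool:
    """Return True when the bytes start with the HDF5 signature."""
    return prefix.startswith(HDF5_SIGNATURE)
```

The same file objects can be passed to xarray, for example `xarray.open_dataset(files[0])`, provided an engine that accepts file-like objects (such as h5netcdf) is installed.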
search_data(count=-1, **kwargs)
Search dataset granules using NASA's CMR.
https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html
Parameters:
Name | Type | Description | Default |
---|---|---|---|
count | int | Number of records to get; -1 = all. | -1 |
kwargs | Dict | Arguments to CMR: | {} |
|
Returns:
Type | Description |
---|---|
List[DataGranule] | A list of DataGranules that can be used to access the granule files by using |
Raises:
Type | Description |
---|---|
RuntimeError | The CMR query failed. |
Examples:
import earthaccess
# search_data() returns granules, not datasets.
granules = earthaccess.search_data(
    doi="10.5067/SLREF-CDRV2",
    cloud_hosted=True,
    temporal=("2002-01-01", "2002-12-31"),
)
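When search parameters come from user input, it can help to assemble and validate the keyword arguments before calling search_data(). This sketch uses only the three arguments shown in the example above; the helper name `granule_search_params` is hypothetical.

```python
from datetime import date
from typing import Any, Dict


def granule_search_params(
    doi: str, start: date, end: date, cloud_hosted: bool = True
) -> Dict[str, Any]:
    """Build keyword arguments for search_data() from a DOI and a date range."""
    if end < start:
        raise ValueError("end date must not precede start date")
    return {
        "doi": doi,
        "cloud_hosted": cloud_hosted,
        # ISO 8601 dates, matching the string format used in the example above.
        "temporal": (start.isoformat(), end.isoformat()),
    }
```

The result can be unpacked into the call, for instance `earthaccess.search_data(count=10, **granule_search_params("10.5067/SLREF-CDRV2", date(2002, 1, 1), date(2002, 12, 31)))`, which also limits the number of records through `count`.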
search_datasets(count=-1, **kwargs)
Search datasets using NASA's CMR.
https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html
Parameters:
Name | Type | Description | Default |
---|---|---|---|
count | int | Number of records to get; -1 = all. | -1 |
kwargs | Dict | Arguments to CMR: | {} |
|
Returns:
Type | Description |
---|---|
List[DataCollection] | A list of DataCollection results that can be used to get information about a dataset, e.g. concept_id, doi, etc. |
Raises:
Type | Description |
---|---|
RuntimeError | The CMR query failed. |
Examples:
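A minimal sketch of a dataset search that collects the concept IDs mentioned in the Returns table. The `keyword` argument and the `concept_id()` method on each result are assumptions not spelled out on this page, `cloud_hosted` is borrowed from the search_data() example, and the `search` parameter is a hypothetical hook that lets the function be tested without network access.

```python
from typing import Any, Callable, List, Optional


def find_dataset_ids(
    keyword: str,
    count: int = 5,
    search: Optional[Callable[..., List[Any]]] = None,
) -> List[str]:
    """Search cloud-hosted datasets by keyword and return their concept IDs."""
    if search is None:
        # Imported lazily so this sketch loads even where earthaccess is absent.
        import earthaccess

        search = earthaccess.search_datasets
    datasets = search(count=count, keyword=keyword, cloud_hosted=True)
    # Assumption: each DataCollection result offers a concept_id() method.
    return [dataset.concept_id() for dataset in datasets]
```

For instance, `find_dataset_ids("sea surface anomaly")` would return up to five concept IDs, which could then be used to narrow a later granule search.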
search_services(count=-1, **kwargs)
Search the NASA CMR for Services matching criteria.
See https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#service.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
count | int | Maximum number of services to fetch (if less than 1, all services matching the specified criteria are fetched [default]). | -1 |
kwargs | Any | Keyword arguments accepted by the CMR for searching services. | {} |
|
Returns:
Type | Description |
---|---|
List[Any] | A list of services (possibly empty) matching the specified criteria, in UMM JSON format. |
Examples: