earthaccess API

earthaccess is a Python library that simplifies discovery of and access to NASA Earth science data. It provides a higher-level abstraction over NASA’s Search API (CMR), so that searches can be written in a simpler notation instead of low-level HTTP queries.

This library handles authentication with NASA’s OAuth2 API (EDL) and provides HTTP and AWS S3 sessions that can be used with xarray and other PyData libraries to access NASA EOSDIS datasets directly. This lets scientists get to their science in a simpler and faster way, reducing barriers to cloud-based data analysis.


collection_query()

Returns a query builder instance for NASA collections (datasets).

Returns:

Type Description
Type[CollectionQuery]

a query builder instance for data collections.

Source code in earthaccess/api.py
def collection_query() -> Type[CollectionQuery]:
    """Returns a query builder instance for NASA collections (datasets).

    Returns:
        a query builder instance for data collections.
    """
    if earthaccess.__auth__.authenticated:
        query_builder = DataCollections(earthaccess.__auth__)
    else:
        query_builder = DataCollections()
    return query_builder

download(granules, local_path, provider=None, threads=8)

Retrieves data granules from a remote storage system.

  • If we run this in the cloud, we will be using S3 to move data to local_path.
  • If we run it outside AWS (us-west-2 region) and the dataset is cloud hosted, we'll use HTTP links.

Parameters:

Name Type Description Default
granules Union[DataGranule, List[DataGranule], str, List[str]]

a granule, list of granules, a granule link (HTTP), or a list of granule links (HTTP)

required
local_path Optional[str]

local directory to store the remote data granules

required
provider Optional[str]

if we download a list of URLs, we need to specify the provider.

None
threads int

number of parallel threads used to download the files; adjust as necessary (default: 8)

8

Returns:

Type Description
List[str]

List of downloaded files

Source code in earthaccess/api.py
def download(
    granules: Union[DataGranule, List[DataGranule], str, List[str]],
    local_path: Optional[str],
    provider: Optional[str] = None,
    threads: int = 8,
) -> List[str]:
    """Retrieves data granules from a remote storage system.

       * If we run this in the cloud, we will be using S3 to move data to `local_path`.
       * If we run it outside AWS (us-west-2 region) and the dataset is cloud hosted,
            we'll use HTTP links.

    Parameters:
        granules: a granule, list of granules, a granule link (HTTP), or a list of granule links (HTTP)
        local_path: local directory to store the remote data granules
        provider: if we download a list of URLs, we need to specify the provider.
        threads: parallel number of threads to use to download the files, adjust as necessary, default = 8

    Returns:
        List of downloaded files
    """
    provider = _normalize_location(provider)
    if isinstance(granules, DataGranule):
        granules = [granules]
    elif isinstance(granules, str):
        granules = [granules]
    try:
        results = earthaccess.__store__.get(granules, local_path, provider, threads)
    except AttributeError as err:
        print(err)
        print("You must call earthaccess.login() before you can download data")
        return []
    return results
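
Because `download()` prints a message and returns an empty list when no store is available (that is, before `earthaccess.login()` has succeeded), callers may want to turn an empty result into an explicit error. The following is a minimal sketch of that idea, not part of earthaccess: `download_or_fail` and `fake_download` are hypothetical names, and the stand-in function replaces `earthaccess.download` so the sketch runs offline.

```python
from typing import Callable, List, Sequence


def download_or_fail(
    download_fn: Callable[..., List[str]],
    granules: Sequence[object],
    local_path: str,
) -> List[str]:
    """Call a download()-like function and raise if nothing came back."""
    files = download_fn(granules, local_path)
    if not files:
        raise RuntimeError(
            "No files were downloaded; was earthaccess.login() called first?"
        )
    return files


# Hypothetical stand-in for earthaccess.download so the sketch runs offline.
def fake_download(granules, local_path):
    return [f"{local_path}/{name}" for name in granules]


files = download_or_fail(fake_download, ["a.nc", "b.nc"], "data")
print(files)
```

With the real library, the call would presumably look like `download_or_fail(earthaccess.download, granules, "data")`, where `granules` comes from `earthaccess.search_data()`.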

get_edl_token()

Returns the current token used for EDL.

Returns:

Type Description
str

EDL token

Source code in earthaccess/api.py
def get_edl_token() -> str:
    """Returns the current token used for EDL.

    Returns:
        EDL token
    """
    token = earthaccess.__auth__.token
    return token

get_fsspec_https_session()

Returns a fsspec session that can be used to access datafiles across many different DAACs.

Returns:

Type Description
AbstractFileSystem

An fsspec instance able to access data across DAACs.

Examples:

import earthaccess

earthaccess.login()
fs = earthaccess.get_fsspec_https_session()
with fs.open(DAAC_GRANULE) as f:
    f.read(10)

Source code in earthaccess/api.py
def get_fsspec_https_session() -> AbstractFileSystem:
    """Returns a fsspec session that can be used to access datafiles across many different DAACs.

    Returns:
        An fsspec instance able to access data across DAACs.

    Examples:
        ```python
        import earthaccess

        earthaccess.login()
        fs = earthaccess.get_fsspec_https_session()
        with fs.open(DAAC_GRANULE) as f:
            f.read(10)
        ```
    """
    session = earthaccess.__store__.get_fsspec_session()
    return session

get_requests_https_session()

Returns a requests Session instance with an authorized bearer token. This is useful for making requests to restricted URLs, such as data granules or services that require authentication with NASA EDL.

Returns:

Type Description
Session

An authenticated requests Session instance.

Examples:

import earthaccess

earthaccess.login()

req_session = earthaccess.get_requests_https_session()
data = req_session.get(granule_url, headers={"Range": "bytes=0-100"})

Source code in earthaccess/api.py
def get_requests_https_session() -> requests.Session:
    """Returns a requests Session instance with an authorized bearer token.
    This is useful for making requests to restricted URLs, such as data granules or services that
    require authentication with NASA EDL.

    Returns:
        An authenticated requests Session instance.

    Examples:
        ```python
        import earthaccess

        earthaccess.login()

        req_session = earthaccess.get_requests_https_session()
        data = req_session.get(granule_url, headers = {"Range": "bytes=0-100"})

        ```
    """
    session = earthaccess.__store__.get_requests_session()
    return session
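
The example above asks for `bytes=0-100`. HTTP byte ranges are inclusive on both ends, so that request covers 101 bytes, not 100. A small sketch of a helper that builds the header from an offset and a length follows; `byte_range_header` is a hypothetical name, not an earthaccess function.

```python
from typing import Dict


def byte_range_header(offset: int, length: int) -> Dict[str, str]:
    """Build a Range header for `length` bytes starting at `offset`.

    HTTP byte ranges are inclusive on both ends, so the last byte
    position is offset + length - 1.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


headers = byte_range_header(0, 101)
print(headers)
```

The result can be passed as `req_session.get(granule_url, headers=headers)`. Whether a partial response is returned depends on the server supporting range requests.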

get_s3_credentials(daac=None, provider=None, results=None)

Returns temporary (1 hour) credentials for direct access to NASA S3 buckets. We can use the daac name, the provider, or a list of results from earthaccess.search_data(). If we use results, earthaccess will use the metadata on the response to get the credentials, which is useful for missions that do not use the same endpoint as their DAACs, e.g. SWOT.

Parameters:

Name Type Description Default
daac Optional[str]

a DAAC short_name like NSIDC or PODAAC, etc.

None
provider Optional[str]

if we know the provider for the DAAC, e.g. POCLOUD, LPCLOUD etc.

None
results Optional[List[DataGranule]]

List of results from search_data()

None

Returns:

Type Description
Dict[str, Any]

a dictionary with S3 credentials for the DAAC or provider

Source code in earthaccess/api.py
def get_s3_credentials(
    daac: Optional[str] = None,
    provider: Optional[str] = None,
    results: Optional[List[earthaccess.results.DataGranule]] = None,
) -> Dict[str, Any]:
    """Returns temporary (1 hour) credentials for direct access to NASA S3 buckets. We can
    use the daac name, the provider, or a list of results from earthaccess.search_data().
    If we use results, earthaccess will use the metadata on the response to get the credentials,
    which is useful for missions that do not use the same endpoint as their DAACs, e.g. SWOT.

    Parameters:
        daac: a DAAC short_name like NSIDC or PODAAC, etc.
        provider: if we know the provider for the DAAC, e.g. POCLOUD, LPCLOUD etc.
        results: List of results from search_data()

    Returns:
        a dictionary with S3 credentials for the DAAC or provider
    """
    daac = _normalize_location(daac)
    provider = _normalize_location(provider)
    if results is not None:
        endpoint = results[0].get_s3_credentials_endpoint()
        return earthaccess.__auth__.get_s3_credentials(endpoint=endpoint)
    return earthaccess.__auth__.get_s3_credentials(daac=daac, provider=provider)
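
The returned dictionary can be handed to other S3 clients. The sketch below maps it to the `key`, `secret`, and `token` keyword arguments of `s3fs.S3FileSystem`. It assumes the dictionary uses the keys `accessKeyId`, `secretAccessKey`, and `sessionToken`, which this page does not document, so check the keys of the dictionary you actually receive. `s3fs_kwargs_from_credentials` is a hypothetical helper name and the values are placeholders.

```python
from typing import Any, Dict


def s3fs_kwargs_from_credentials(creds: Dict[str, Any]) -> Dict[str, str]:
    """Translate a credentials dictionary into s3fs.S3FileSystem keyword arguments.

    Assumes the dictionary uses the keys accessKeyId, secretAccessKey and
    sessionToken; a missing key raises KeyError.
    """
    return {
        "key": creds["accessKeyId"],
        "secret": creds["secretAccessKey"],
        "token": creds["sessionToken"],
    }


# Placeholder values standing in for a real response.
example = {
    "accessKeyId": "AKIA-EXAMPLE",
    "secretAccessKey": "secret-example",
    "sessionToken": "token-example",
    "expiration": "2030-01-01 00:00:00+00:00",
}
kwargs = s3fs_kwargs_from_credentials(example)
print(sorted(kwargs))
```

With real credentials, `s3fs.S3FileSystem(**kwargs)` would create a session by hand. In most cases `get_s3fs_session()` is the simpler route, since it returns an authenticated session directly.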

get_s3fs_session(daac=None, provider=None, results=None)

Returns a fsspec s3fs file session for direct access when we are in us-west-2.

Parameters:

Name Type Description Default
daac Optional[str]

Any DAAC short name e.g. NSIDC, GES_DISC

None
provider Optional[str]

Each DAAC can have a cloud provider. If the DAAC is specified, there is no need to use provider.

None
results Optional[DataGranule]

A list of results from search_data(). earthaccess will use the metadata from CMR to obtain the S3 Endpoint.

None

Returns:

Type Description
S3FileSystem

An authenticated s3fs session valid for 1 hour.

Source code in earthaccess/api.py
def get_s3fs_session(
    daac: Optional[str] = None,
    provider: Optional[str] = None,
    results: Optional[earthaccess.results.DataGranule] = None,
) -> s3fs.S3FileSystem:
    """Returns a fsspec s3fs file session for direct access when we are in us-west-2.

    Parameters:
        daac: Any DAAC short name e.g. NSIDC, GES_DISC
        provider: Each DAAC can have a cloud provider.
            If the DAAC is specified, there is no need to use provider.
        results: A list of results from search_data().
            `earthaccess` will use the metadata from CMR to obtain the S3 Endpoint.

    Returns:
        An authenticated s3fs session valid for 1 hour.
    """
    daac = _normalize_location(daac)
    provider = _normalize_location(provider)
    if results is not None:
        endpoint = results[0].get_s3_credentials_endpoint()
        if endpoint is not None:
            session = earthaccess.__store__.get_s3fs_session(endpoint=endpoint)
            return session
    session = earthaccess.__store__.get_s3fs_session(daac=daac, provider=provider)
    return session

granule_query()

Returns a query builder instance for data granules.

Returns:

Type Description
Type[GranuleQuery]

a query builder instance for data granules.

Source code in earthaccess/api.py
def granule_query() -> Type[GranuleQuery]:
    """Returns a query builder instance for data granules

    Returns:
        a query builder instance for data granules.
    """
    if earthaccess.__auth__.authenticated:
        query_builder = DataGranules(earthaccess.__auth__)
    else:
        query_builder = DataGranules()
    return query_builder

login(strategy='all', persist=False)

Authenticate with Earthdata Login (https://urs.earthdata.nasa.gov/).

Parameters:

Name Type Description Default
strategy str

An authentication method.

  • "all": (default) try all methods until one works
  • "interactive": enter username and password.
  • "netrc": retrieve username and password from ~/.netrc.
  • "environment": retrieve username and password from $EARTHDATA_USERNAME and $EARTHDATA_PASSWORD.

'all'
persist bool

will persist credentials in a .netrc file

False

Returns:

Type Description
Auth

An instance of Auth.

Source code in earthaccess/api.py
def login(strategy: str = "all", persist: bool = False) -> Auth:
    """Authenticate with Earthdata login (https://urs.earthdata.nasa.gov/).

    Parameters:
        strategy:
            An authentication method.

            * **"all"**: (default) try all methods until one works
            * **"interactive"**: enter username and password.
            * **"netrc"**: retrieve username and password from ~/.netrc.
            * **"environment"**: retrieve username and password from `$EARTHDATA_USERNAME` and `$EARTHDATA_PASSWORD`.
        persist: will persist credentials in a .netrc file

    Returns:
        An instance of Auth.
    """
    if strategy == "all":
        for strategy in ["environment", "netrc", "interactive"]:
            try:
                earthaccess.__auth__.login(strategy=strategy, persist=persist)
            except Exception:
                pass

            if earthaccess.__auth__.authenticated:
                earthaccess.__store__ = Store(earthaccess.__auth__)
                break
    else:
        earthaccess.__auth__.login(strategy=strategy, persist=persist)
        if earthaccess.__auth__.authenticated:
            earthaccess.__store__ = Store(earthaccess.__auth__)

    return earthaccess.__auth__
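
As the source shows, `strategy="all"` tries `"environment"`, then `"netrc"`, then `"interactive"`, and silently moves on when one fails, whereas a specific strategy lets its exception propagate. The sketch below is a rough pre-flight check, not part of earthaccess, that lists which strategies appear usable in that same order. `available_strategies` is a hypothetical helper, and it only checks that `~/.netrc` exists, not that it contains an Earthdata Login entry.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def available_strategies(
    environ: Optional[Mapping[str, str]] = None,
    netrc_path: Optional[Path] = None,
) -> List[str]:
    """List strategies in the order login(strategy="all") tries them,
    keeping only those whose prerequisites appear to be present."""
    environ = os.environ if environ is None else environ
    netrc_path = Path.home() / ".netrc" if netrc_path is None else netrc_path

    found = []
    if environ.get("EARTHDATA_USERNAME") and environ.get("EARTHDATA_PASSWORD"):
        found.append("environment")
    if netrc_path.is_file():
        found.append("netrc")
    # Interactive prompting needs no stored credentials, so it is always listed.
    found.append("interactive")
    return found


print(available_strategies())
```

In a non-interactive job (a batch script or CI run), a result containing only `"interactive"` suggests that `login()` would end up prompting for input.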

open(granules, provider=None)

Returns a list of fsspec file-like objects that third-party libraries like xarray can use to access files hosted on S3 or HTTPS.

Parameters:

Name Type Description Default
granules Union[List[str], List[DataGranule]]

a list of granule instances or list of URLs, e.g. s3://some-granule. If a list of URLs is passed, we need to specify the data provider.

required
provider Optional[str]

e.g. POCLOUD, NSIDC_CPRD, etc.

None

Returns:

Type Description
List[AbstractFileSystem]

a list of fsspec file-like objects ("file pointers") to the remote files, hosted on S3 or HTTPS.

Source code in earthaccess/api.py
def open(
    granules: Union[List[str], List[earthaccess.results.DataGranule]],
    provider: Optional[str] = None,
) -> List[AbstractFileSystem]:
    """Returns a list of fsspec file-like objects that can be used to access files
    hosted on S3 or HTTPS by third party libraries like xarray.

    Parameters:
        granules: a list of granule instances **or** list of URLs, e.g. `s3://some-granule`.
            If a list of URLs is passed, we need to specify the data provider.
        provider: e.g. POCLOUD, NSIDC_CPRD, etc.

    Returns:
        a list of s3fs "file pointers" to s3 files.
    """
    provider = _normalize_location(provider)
    results = earthaccess.__store__.open(granules=granules, provider=provider)
    return results

search_data(count=-1, **kwargs)

Search dataset granules using NASA's CMR.

https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html

Parameters:

Name Type Description Default
count int

Number of records to get, -1 = all

-1
kwargs Dict

arguments to CMR:

  • short_name: dataset short name, e.g. ATL08
  • version: dataset version
  • doi: DOI for a dataset
  • daac: e.g. NSIDC or PODAAC
  • provider: particular to each DAAC, e.g. POCLOUD, LPDAAC etc.
  • temporal: a tuple representing temporal bounds in the form ("yyyy-mm-dd", "yyyy-mm-dd")
  • bounding_box: a tuple representing spatial bounds in the form (lower_left_lon, lower_left_lat, upper_right_lon, upper_right_lat)

{}

Returns:

Type Description
List[DataGranule]

a list of DataGranules that can be used to access the granule files by using download() or open().

Examples:

import earthaccess

granules = earthaccess.search_data(
    doi="10.5067/SLREF-CDRV2",
    cloud_hosted=True,
    temporal=("2002-01-01", "2002-12-31"),
)

Source code in earthaccess/api.py
def search_data(
    count: int = -1, **kwargs: Any
) -> List[earthaccess.results.DataGranule]:
    """Search dataset granules using NASA's CMR.

    [https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html)

    Parameters:
        count: Number of records to get, -1 = all
        kwargs (Dict):
            arguments to CMR:

            * **short_name**: dataset short name, e.g. ATL08
            * **version**: dataset version
            * **doi**: DOI for a dataset
            * **daac**: e.g. NSIDC or PODAAC
            * **provider**: particular to each DAAC, e.g. POCLOUD, LPDAAC etc.
            * **temporal**: a tuple representing temporal bounds in the form
              `("yyyy-mm-dd", "yyyy-mm-dd")`
            * **bounding_box**: a tuple representing spatial bounds in the form
              `(lower_left_lon, lower_left_lat, upper_right_lon, upper_right_lat)`

    Returns:
        a list of DataGranules that can be used to access the granule files by using
            `download()` or `open()`.

    Examples:
        ```python
        granules = earthaccess.search_data(
            doi="10.5067/SLREF-CDRV2",
            cloud_hosted=True,
            temporal=("2002-01-01", "2002-12-31")
        )
        ```
    """
    if earthaccess.__auth__.authenticated:
        query = DataGranules(earthaccess.__auth__).parameters(**kwargs)
    else:
        query = DataGranules().parameters(**kwargs)
    granules_found = query.hits()
    print(f"Granules found: {granules_found}")
    if count > 0:
        return query.get(count)
    return query.get_all()
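
The `temporal` and `bounding_box` tuples are easy to get wrong, for example by swapping latitude and longitude. The sketch below checks them before they are passed on as keyword arguments. `granule_search_kwargs` is a hypothetical helper, not an earthaccess function, and its range checks are assumptions about sensible input rather than rules taken from CMR.

```python
from typing import Any, Dict, Tuple


def granule_search_kwargs(
    short_name: str,
    temporal: Tuple[str, str],
    bounding_box: Tuple[float, float, float, float],
) -> Dict[str, Any]:
    """Validate and assemble keyword arguments for search_data()."""
    # Order follows the documented form:
    # (lower_left_lon, lower_left_lat, upper_right_lon, upper_right_lat)
    west, south, east, north = bounding_box
    # west > east is not rejected, since a box may cross the antimeridian.
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= lower <= upper <= 90")
    start, end = temporal
    # "yyyy-mm-dd" strings sort chronologically, so string comparison works.
    if start > end:
        raise ValueError("temporal start must not be after temporal end")
    return {
        "short_name": short_name,
        "temporal": (start, end),
        "bounding_box": (west, south, east, north),
    }


kwargs = granule_search_kwargs(
    "ATL08", ("2020-01-01", "2020-01-31"), (-50.0, 60.0, -40.0, 70.0)
)
print(kwargs)
```

The dictionary can then be passed as `earthaccess.search_data(count=10, **kwargs)`. Per the source above, a positive `count` limits the number of records, and any other value returns all matches.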

search_datasets(count=-1, **kwargs)

Search datasets using NASA's CMR.

https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html

Parameters:

Name Type Description Default
count int

Number of records to get, -1 = all

-1
kwargs Dict

arguments to CMR:

  • keyword: case-insensitive and supports wildcards ? and *
  • short_name: e.g. ATL08
  • doi: DOI for a dataset
  • daac: e.g. NSIDC or PODAAC
  • provider: particular to each DAAC, e.g. POCLOUD, LPDAAC etc.
  • temporal: a tuple representing temporal bounds in the form ("yyyy-mm-dd", "yyyy-mm-dd")
  • bounding_box: a tuple representing spatial bounds in the form (lower_left_lon, lower_left_lat, upper_right_lon, upper_right_lat)

{}

Returns:

Type Description
List[DataCollection]

A list of DataCollection results that can be used to get information about a dataset, e.g. concept_id, doi, etc.

Examples:

datasets = earthaccess.search_datasets(
    keyword="sea surface anomaly",
    cloud_hosted=True
)

Source code in earthaccess/api.py
def search_datasets(
    count: int = -1, **kwargs: Any
) -> List[earthaccess.results.DataCollection]:
    """Search datasets using NASA's CMR.

    [https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html)

    Parameters:
        count: Number of records to get, -1 = all
        kwargs (Dict):
            arguments to CMR:

            * **keyword**: case-insensitive and supports wildcards ? and *
            * **short_name**: e.g. ATL08
            * **doi**: DOI for a dataset
            * **daac**: e.g. NSIDC or PODAAC
            * **provider**: particular to each DAAC, e.g. POCLOUD, LPDAAC etc.
            * **temporal**: a tuple representing temporal bounds in the form
              `("yyyy-mm-dd", "yyyy-mm-dd")`
            * **bounding_box**: a tuple representing spatial bounds in the form
              `(lower_left_lon, lower_left_lat, upper_right_lon, upper_right_lat)`

    Returns:
        A list of DataCollection results that can be used to get information about a
            dataset, e.g. concept_id, doi, etc.

    Examples:
        ```python
        datasets = earthaccess.search_datasets(
            keyword="sea surface anomaly",
            cloud_hosted=True
        )
        ```
    """
    if not validate.valid_dataset_parameters(**kwargs):
        print(
            "Warning: a valid set of parameters is needed to search for datasets on CMR"
        )
        return []
    if earthaccess.__auth__.authenticated:
        query = DataCollections(auth=earthaccess.__auth__).parameters(**kwargs)
    else:
        query = DataCollections().parameters(**kwargs)
    datasets_found = query.hits()
    print(f"Datasets found: {datasets_found}")
    if count > 0:
        return query.get(count)
    return query.get_all()