earthaccess API

earthaccess is a Python library that simplifies data discovery and access to NASA Earth science data. It provides a higher-level abstraction over NASA’s Search API (CMR), so that searches can be written in a simpler notation instead of low-level HTTP queries.

The library handles authentication with NASA’s OAuth2 API (EDL) and provides HTTP and AWS S3 sessions that can be used with xarray and other PyData libraries to access NASA EOSDIS datasets directly. This lets scientists get to their science more simply and quickly, reducing barriers to cloud-based data analysis.

`collection_query()`

Returns a query builder instance for NASA collections (datasets).

Returns:

Type	Description
`CollectionQuery`	a query builder instance for data collections.

Source code in earthaccess/api.py

def collection_query() -> CollectionQuery:
    """Returns a query builder instance for NASA collections (datasets).

    Returns:
        a query builder instance for data collections.
    """
    if earthaccess.__auth__.authenticated:
        query_builder = DataCollections(earthaccess.__auth__)
    else:
        query_builder = DataCollections()
    return query_builder
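
A minimal sketch of how the returned builder might be used. It assumes the builder exposes chainable `keyword()` and `cloud_hosted()` filters plus `hits()` and `get()`, as earthaccess's `DataCollections` class does; the helper name `find_collections` and the search term are illustrative and not part of the library.

```python
from typing import Any, List, Tuple


def find_collections(builder: Any, keyword: str, limit: int = 5) -> Tuple[int, List[Any]]:
    """Apply two filters to a collection query builder and fetch a few results.

    `builder` is expected to be the object returned by
    earthaccess.collection_query(); this helper itself is hypothetical.
    """
    # Each filter returns the builder, so the calls can be chained.
    query = builder.keyword(keyword).cloud_hosted(True)
    # hits() reports how many collections match; get(limit) fetches up to `limit`.
    return query.hits(), query.get(limit)


def main() -> None:
    # Imported here so the helper above can be exercised without network access.
    import earthaccess

    hits, collections = find_collections(
        earthaccess.collection_query(), "sea surface height"
    )
    print(f"{hits} collections matched; fetched {len(collections)}")
```

Passing the builder in as an argument keeps the filtering logic separate from the network call, which makes it easy to try with a stand-in object.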

`download(granules, local_path, provider=None, threads=8)`

Retrieves data granules from a remote storage system.

When run inside AWS (us-west-2 region), data is moved to local_path over S3.
When run outside us-west-2 and the dataset is cloud hosted, HTTP links are used instead.

Parameters:

Name	Type	Description	Default
`granules`	`Union[DataGranule, List[DataGranule], str, List[str]]`	a granule, list of granules, a granule link (HTTP), or a list of granule links (HTTP)	required
`local_path`	`Optional[str]`	local directory to store the remote data granules	required
`provider`	`Optional[str]`	if we download a list of URLs, we need to specify the provider.	`None`
`threads`	`int`	number of parallel threads used to download the files; adjust as necessary	`8`

Returns:

Type	Description
`List[str]`	List of downloaded files

Raises:

Type	Description
`Exception`	A file download failed.

Source code in earthaccess/api.py

def download(
    granules: Union[DataGranule, List[DataGranule], str, List[str]],
    local_path: Optional[str],
    provider: Optional[str] = None,
    threads: int = 8,
) -> List[str]:
    """Retrieves data granules from a remote storage system.

       * If we run this in the cloud, we will be using S3 to move data to `local_path`.
       * If we run it outside AWS (us-west-2 region) and the dataset is cloud hosted,
            we'll use HTTP links.

    Parameters:
        granules: a granule, list of granules, a granule link (HTTP), or a list of granule links (HTTP)
        local_path: local directory to store the remote data granules
        provider: if we download a list of URLs, we need to specify the provider.
        threads: parallel number of threads to use to download the files, adjust as necessary, default = 8

    Returns:
        List of downloaded files

    Raises:
        Exception: A file download failed.
    """
    provider = _normalize_location(provider)
    if isinstance(granules, DataGranule):
        granules = [granules]
    elif isinstance(granules, str):
        granules = [granules]
    try:
        results = earthaccess.__store__.get(granules, local_path, provider, threads)
    except AttributeError as err:
        print(err)
        print("You must call earthaccess.login() before you can download data")
        return []
    return results
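
The sketch below shows one way to wrap `download()`. Because the function body above prints a hint and returns an empty list when `login()` has not been called, the wrapper treats an empty result as an error. The names `as_list` and `fetch` are hypothetical, and the default directory and thread count are arbitrary.

```python
from typing import Any, List


def as_list(granules: Any) -> List[Any]:
    """Wrap a single granule or URL in a list, similar to what download() does."""
    return granules if isinstance(granules, list) else [granules]


def fetch(granules: Any, local_path: str = "./data", threads: int = 4) -> List[str]:
    """Download granules and fail loudly if nothing came back."""
    import earthaccess  # lazy import keeps as_list() usable offline

    files = earthaccess.download(as_list(granules), local_path, threads=threads)
    if not files:
        # An empty list is also what download() returns when login() was never called.
        raise RuntimeError("No files downloaded; was earthaccess.login() called?")
    return files
```

After `earthaccess.login()`, a call might look like `fetch(earthaccess.search_data(short_name="ATL08", count=2))`.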

`get_edl_token()`

Returns the current token used for EDL.

Returns:

Type	Description
`str`	EDL token

Source code in earthaccess/api.py

def get_edl_token() -> str:
    """Returns the current token used for EDL.

    Returns:
        EDL token
    """
    token = earthaccess.__auth__.token
    return token

`get_fsspec_https_session()`

Returns an fsspec session that can be used to access data files across many different DAACs.

Returns:

Type	Description
`AbstractFileSystem`	An fsspec instance able to access data across DAACs.

Examples:

import earthaccess

earthaccess.login()
fs = earthaccess.get_fsspec_https_session()
# DAAC_GRANULE is a placeholder for the HTTPS URL of a granule file
with fs.open(DAAC_GRANULE) as f:
    f.read(10)  # read the first 10 bytes

Source code in earthaccess/api.py

def get_fsspec_https_session() -> AbstractFileSystem:
    """Returns a fsspec session that can be used to access datafiles across many different DAACs.

    Returns:
        An fsspec instance able to access data across DAACs.

    Examples:
        ```python
        import earthaccess

        earthaccess.login()
        fs = earthaccess.get_fsspec_https_session()
        with fs.open(DAAC_GRANULE) as f:
            f.read(10)
        ```
    """
    session = earthaccess.__store__.get_fsspec_session()
    return session

`get_requests_https_session()`

Returns a requests Session instance with an authorized bearer token. This is useful for making requests to restricted URLs, such as data granules or services that require authentication with NASA EDL.

Returns:

Type	Description
`Session`	An authenticated requests Session instance.

Examples:

import earthaccess

earthaccess.login()

req_session = earthaccess.get_requests_https_session()
# granule_url is a placeholder for the HTTPS URL of a granule file
data = req_session.get(granule_url, headers={"Range": "bytes=0-100"})

Source code in earthaccess/api.py

def get_requests_https_session() -> requests.Session:
    """Returns a requests Session instance with an authorized bearer token.
    This is useful for making requests to restricted URLs, such as data granules or services that
    require authentication with NASA EDL.

    Returns:
        An authenticated requests Session instance.

    Examples:
        ```python
        import earthaccess

        earthaccess.login()

        req_session = earthaccess.get_requests_https_session()
        data = req_session.get(granule_url, headers = {"Range": "bytes=0-100"})

        ```
    """
    session = earthaccess.__store__.get_requests_session()
    return session
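
The documented example requests bytes 0 through 100 with a `Range` header. As a sketch of generalizing that, the helpers below build the header for an arbitrary window and fetch it with whatever session is passed in; `byte_range` and `read_bytes` are hypothetical names, and the code assumes the server honors range requests.

```python
from typing import Any, Dict


def byte_range(start: int, length: int) -> Dict[str, str]:
    """Build an HTTP Range header for `length` bytes starting at offset `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    # The end offset in a Range header is inclusive, hence the -1.
    return {"Range": f"bytes={start}-{start + length - 1}"}


def read_bytes(session: Any, url: str, start: int, length: int) -> bytes:
    """Fetch part of a remote file using an authenticated requests session."""
    response = session.get(url, headers=byte_range(start, length))
    response.raise_for_status()  # surface 401/403/404 instead of returning an error page
    return response.content
```

Here `session` would be the object returned by `earthaccess.get_requests_https_session()`.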

`get_s3_credentials(daac=None, provider=None, results=None)`

Returns temporary (1 hour) credentials for direct access to NASA S3 buckets. We can pass the DAAC name, the provider, or a list of results from earthaccess.search_data(). If results are passed, earthaccess uses the metadata in the response to get the credentials, which is useful for missions that do not use the same endpoint as their DAACs, e.g. SWOT.

Parameters:

Name	Type	Description	Default
`daac`	`Optional[str]`	a DAAC short_name like NSIDC or PODAAC, etc.	`None`
`provider`	`Optional[str]`	if we know the provider for the DAAC, e.g. POCLOUD, LPCLOUD etc.	`None`
`results`	`Optional[List[DataGranule]]`	List of results from search_data()	`None`

Returns:

Type	Description
`Dict[str, Any]`	a dictionary with S3 credentials for the DAAC or provider

Source code in earthaccess/api.py

def get_s3_credentials(
    daac: Optional[str] = None,
    provider: Optional[str] = None,
    results: Optional[List[DataGranule]] = None,
) -> Dict[str, Any]:
    """Returns temporary (1 hour) credentials for direct access to NASA S3 buckets. We can
    use the daac name, the provider, or a list of results from earthaccess.search_data().
    If we use results, earthaccess will use the metadata on the response to get the credentials,
    which is useful for missions that do not use the same endpoint as their DAACs, e.g. SWOT.

    Parameters:
        daac: a DAAC short_name like NSIDC or PODAAC, etc.
        provider: if we know the provider for the DAAC, e.g. POCLOUD, LPCLOUD etc.
        results: List of results from search_data()

    Returns:
        a dictionary with S3 credentials for the DAAC or provider
    """
    daac = _normalize_location(daac)
    provider = _normalize_location(provider)
    if results is not None:
        endpoint = results[0].get_s3_credentials_endpoint()
        return earthaccess.__auth__.get_s3_credentials(endpoint=endpoint)
    return earthaccess.__auth__.get_s3_credentials(daac=daac, provider=provider)
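
A sketch of handing the returned dictionary to s3fs. The key names `accessKeyId`, `secretAccessKey`, and `sessionToken` are an assumption based on the JSON that NASA's S3 credential endpoints typically return; they are not stated on this page, so inspect the dictionary before relying on them. `to_s3fs_kwargs` and `open_filesystem` are hypothetical helpers.

```python
from typing import Any, Dict


def to_s3fs_kwargs(creds: Dict[str, Any]) -> Dict[str, Any]:
    """Map temporary credentials onto s3fs.S3FileSystem keyword arguments.

    Assumes the camelCase key names below; raises KeyError if one is missing.
    """
    return {
        "key": creds["accessKeyId"],
        "secret": creds["secretAccessKey"],
        "token": creds["sessionToken"],
    }


def open_filesystem(daac: str) -> Any:
    """Build an s3fs filesystem from fresh credentials (needs login and network)."""
    import earthaccess
    import s3fs

    creds = earthaccess.get_s3_credentials(daac=daac)
    return s3fs.S3FileSystem(**to_s3fs_kwargs(creds))
```

In most cases `get_s3fs_session()` covers this in a single call; working with the raw credentials is mainly useful when another S3 client needs them.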

`get_s3fs_session(daac=None, provider=None, results=None)`

Returns an fsspec s3fs file session for direct access when we are in us-west-2.

Parameters:

Name	Type	Description	Default
`daac`	`Optional[str]`	Any DAAC short name e.g. NSIDC, GES_DISC	`None`
`provider`	`Optional[str]`	Each DAAC can have a cloud provider. If the DAAC is specified, there is no need to use provider.	`None`
`results`	`Optional[DataGranule]`	A list of results from search_data(). `earthaccess` will use the metadata from CMR to obtain the S3 Endpoint.	`None`

Returns:

Type	Description
`S3FileSystem`	An authenticated s3fs session valid for 1 hour.

Source code in earthaccess/api.py

def get_s3fs_session(
    daac: Optional[str] = None,
    provider: Optional[str] = None,
    results: Optional[DataGranule] = None,
) -> s3fs.S3FileSystem:
    """Returns a fsspec s3fs file session for direct access when we are in us-west-2.

    Parameters:
        daac: Any DAAC short name e.g. NSIDC, GES_DISC
        provider: Each DAAC can have a cloud provider.
            If the DAAC is specified, there is no need to use provider.
        results: A list of results from search_data().
            `earthaccess` will use the metadata from CMR to obtain the S3 Endpoint.

    Returns:
        An authenticated s3fs session valid for 1 hour.
    """
    daac = _normalize_location(daac)
    provider = _normalize_location(provider)
    if results is not None:
        endpoint = results[0].get_s3_credentials_endpoint()
        if endpoint is not None:
            session = earthaccess.__store__.get_s3fs_session(endpoint=endpoint)
            return session
    session = earthaccess.__store__.get_s3fs_session(daac=daac, provider=provider)
    return session
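
Since the session is valid for 1 hour, a long-running job may need a fresh one partway through. The following is a minimal sketch of a wrapper that rebuilds the session once a time-to-live has elapsed. `RefreshingSession`, the 55-minute margin, and the injectable clock are all illustrative choices, and the library may already do some caching of its own.

```python
import time
from typing import Any, Callable, Optional


class RefreshingSession:
    """Hand out a cached session and rebuild it once it is older than the TTL."""

    def __init__(
        self,
        factory: Callable[[], Any],
        ttl_seconds: float = 55 * 60,  # refresh a little before the 1-hour expiry
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._factory = factory
        self._ttl = ttl_seconds
        self._clock = clock
        self._session: Optional[Any] = None
        self._created_at = 0.0

    def get(self) -> Any:
        now = self._clock()
        expired = now - self._created_at >= self._ttl
        if self._session is None or expired:
            # Call the factory again to obtain a session with fresh credentials.
            self._session = self._factory()
            self._created_at = now
        return self._session
```

A factory such as `lambda: earthaccess.get_s3fs_session(daac="PODAAC")` could be passed in, with `get()` called before each batch of reads.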

`granule_query()`

Returns a query builder instance for data granules.

Returns:

Type	Description
`GranuleQuery`	a query builder instance for data granules.

Source code in earthaccess/api.py

def granule_query() -> GranuleQuery:
    """Returns a query builder instance for data granules

    Returns:
        a query builder instance for data granules.
    """
    if earthaccess.__auth__.authenticated:
        query_builder = DataGranules(earthaccess.__auth__)
    else:
        query_builder = DataGranules()
    return query_builder

`login(strategy='all', persist=False)`

Authenticate with Earthdata Login (https://urs.earthdata.nasa.gov/).

Parameters:

Name	Type	Description	Default
`strategy`	`str`	An authentication method: "all" (default) tries all methods until one works; "interactive" prompts for username and password; "netrc" retrieves username and password from ~/.netrc; "environment" retrieves username and password from `$EARTHDATA_USERNAME` and `$EARTHDATA_PASSWORD`.	`'all'`
`persist`	`bool`	if True, persist credentials in a .netrc file	`False`

Returns:

Type	Description
`Auth`	An instance of Auth.

Source code in earthaccess/api.py

def login(strategy: str = "all", persist: bool = False) -> Auth:
    """Authenticate with Earthdata login (https://urs.earthdata.nasa.gov/).

    Parameters:
        strategy:
            An authentication method.

            * **"all"**: (default) try all methods until one works
            * **"interactive"**: enter username and password.
            * **"netrc"**: retrieve username and password from ~/.netrc.
            * **"environment"**: retrieve username and password from `$EARTHDATA_USERNAME` and `$EARTHDATA_PASSWORD`.
        persist: will persist credentials in a .netrc file

    Returns:
        An instance of Auth.
    """
    if strategy == "all":
        for strategy in ["environment", "netrc", "interactive"]:
            try:
                earthaccess.__auth__.login(strategy=strategy, persist=persist)
            except Exception:
                pass

            if earthaccess.__auth__.authenticated:
                earthaccess.__store__ = Store(earthaccess.__auth__)
                break
    else:
        earthaccess.__auth__.login(strategy=strategy, persist=persist)
        if earthaccess.__auth__.authenticated:
            earthaccess.__store__ = Store(earthaccess.__auth__)

    return earthaccess.__auth__
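
With `strategy="all"`, the function body above tries `"environment"`, then `"netrc"`, then `"interactive"`. The sketch below is a local pre-check that guesses which strategies have credentials available, which can help explain why a non-interactive job ends up at a prompt. It only inspects the two environment variables named above and a `.netrc` path; it does not replicate the library's own checks, and both function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Any, List, Mapping, Optional


def likely_strategies(
    env: Optional[Mapping[str, str]] = None,
    netrc: Optional[Path] = None,
) -> List[str]:
    """List strategies that appear usable, in the order login("all") tries them."""
    env = os.environ if env is None else env
    netrc = Path.home() / ".netrc" if netrc is None else netrc
    found = []
    if env.get("EARTHDATA_USERNAME") and env.get("EARTHDATA_PASSWORD"):
        found.append("environment")
    if netrc.is_file():
        found.append("netrc")
    found.append("interactive")  # the last resort; needs someone at a terminal
    return found


def login_for_batch_job() -> Any:
    """Log in without prompting, or raise if that is not possible."""
    import earthaccess

    auth = earthaccess.login(strategy="environment")
    if not auth.authenticated:
        raise RuntimeError("Set EARTHDATA_USERNAME and EARTHDATA_PASSWORD first")
    return auth
```

Choosing an explicit strategy in batch jobs avoids falling through to the interactive prompt.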

`open(granules, provider=None)`

Returns a list of fsspec file-like objects that can be used to access files hosted on S3 or HTTPS by third party libraries like xarray.

Parameters:

Name	Type	Description	Default
`granules`	`Union[List[str], List[DataGranule]]`	a list of granule instances or list of URLs, e.g. `s3://some-granule`. If a list of URLs is passed, we need to specify the data provider.	required
`provider`	`Optional[str]`	e.g. POCLOUD, NSIDC_CPRD, etc.	`None`

Returns:

Type	Description
`List[AbstractFileSystem]`	a list of fsspec file-like objects ("file pointers") for the files on S3 or HTTPS.

Source code in earthaccess/api.py

def open(
    granules: Union[List[str], List[DataGranule]],
    provider: Optional[str] = None,
) -> List[AbstractFileSystem]:
    """Returns a list of fsspec file-like objects that can be used to access files
    hosted on S3 or HTTPS by third party libraries like xarray.

    Parameters:
        granules: a list of granule instances **or** list of URLs, e.g. `s3://some-granule`.
            If a list of URLs is passed, we need to specify the data provider.
        provider: e.g. POCLOUD, NSIDC_CPRD, etc.

    Returns:
        a list of s3fs "file pointers" to s3 files.
    """
    provider = _normalize_location(provider)
    results = earthaccess.__store__.open(granules=granules, provider=provider)
    return results
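
A sketch of using `open()` with xarray, which the description above names as a consumer of these file objects. The pre-check encodes the documented rule that URLs require a provider. `check_open_args` and `open_first` are hypothetical names, and whether xarray's default engine can read a given file depends on its format.

```python
from typing import Any, List, Optional


def check_open_args(granules: List[Any], provider: Optional[str]) -> None:
    """Raise early if URL strings are passed without a provider."""
    if any(isinstance(g, str) for g in granules) and provider is None:
        raise ValueError("a provider (e.g. POCLOUD) is required when passing URLs")


def open_first(granules: List[Any], provider: Optional[str] = None) -> Any:
    """Open the first granule as an xarray Dataset (needs login and network)."""
    import earthaccess
    import xarray as xr

    check_open_args(granules, provider)
    files = earthaccess.open(granules, provider=provider)
    if not files:
        raise RuntimeError("earthaccess.open() returned no file objects")
    # xarray accepts file-like objects; pass engine=... if the default cannot read the file.
    return xr.open_dataset(files[0])
```

Because the objects are read lazily over the network, only the bytes xarray actually requests are transferred.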

`search_data(count=-1, **kwargs)`

Search dataset granules using NASA's CMR.

https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html

Parameters:

Name	Type	Description	Default
`count`	`int`	Number of records to get, -1 = all	`-1`
`kwargs`	`Dict`	arguments to CMR: `short_name`: dataset short name, e.g. ATL08; `version`: dataset version; `doi`: DOI for a dataset; `daac`: e.g. NSIDC or PODAAC; `provider`: particular to each DAAC, e.g. POCLOUD, LPDAAC etc.; `temporal`: a tuple representing temporal bounds in the form `("yyyy-mm-dd", "yyyy-mm-dd")`; `bounding_box`: a tuple representing spatial bounds in the form `(lower_left_lon, lower_left_lat, upper_right_lon, upper_right_lat)`	`{}`

Returns:

Type	Description
`List[DataGranule]`	a list of DataGranules that can be used to access the granule files by using `download()` or `open()`.

Raises:

Type	Description
`RuntimeError`	The CMR query failed.

Examples:

import earthaccess

# search_data() returns granules, not datasets
granules = earthaccess.search_data(
    doi="10.5067/SLREF-CDRV2",
    cloud_hosted=True,
    temporal=("2002-01-01", "2002-12-31")
)

Source code in earthaccess/api.py

def search_data(count: int = -1, **kwargs: Any) -> List[DataGranule]:
    """Search dataset granules using NASA's CMR.

    [https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html)

    Parameters:
        count: Number of records to get, -1 = all
        kwargs (Dict):
            arguments to CMR:

            * **short_name**: dataset short name, e.g. ATL08
            * **version**: dataset version
            * **doi**: DOI for a dataset
            * **daac**: e.g. NSIDC or PODAAC
            * **provider**: particular to each DAAC, e.g. POCLOUD, LPDAAC etc.
            * **temporal**: a tuple representing temporal bounds in the form
              `("yyyy-mm-dd", "yyyy-mm-dd")`
            * **bounding_box**: a tuple representing spatial bounds in the form
              `(lower_left_lon, lower_left_lat, upper_right_lon, upper_right_lat)`

    Returns:
        a list of DataGranules that can be used to access the granule files by using
            `download()` or `open()`.

    Raises:
        RuntimeError: The CMR query failed.

    Examples:
        ```python
        datasets = earthaccess.search_data(
            doi="10.5067/SLREF-CDRV2",
            cloud_hosted=True,
            temporal=("2002-01-01", "2002-12-31")
        )
        ```
    """
    if earthaccess.__auth__.authenticated:
        query = DataGranules(earthaccess.__auth__).parameters(**kwargs)
    else:
        query = DataGranules().parameters(**kwargs)
    granules_found = query.hits()
    print(f"Granules found: {granules_found}")
    if count > 0:
        return query.get(count)
    return query.get_all()
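
A sketch of assembling the keyword arguments before calling `search_data()`. The validation rules are assumptions drawn from the parameter descriptions above (longitude before latitude, lower-left corner before upper-right), and `granule_search_kwargs` and `search` are hypothetical helpers. Note from the function body that only `count > 0` limits the results, so `count=0` behaves like `-1` and fetches everything.

```python
from datetime import date
from typing import Any, Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (west, south, east, north)


def granule_search_kwargs(
    short_name: str, bbox: BBox, start: date, end: date
) -> Dict[str, Any]:
    """Build kwargs for earthaccess.search_data() with basic sanity checks."""
    west, south, east, north = bbox
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    if start > end:
        raise ValueError("start date must not be after end date")
    return {
        "short_name": short_name,
        "bounding_box": bbox,
        # date.isoformat() yields the "yyyy-mm-dd" form described above
        "temporal": (start.isoformat(), end.isoformat()),
    }


def search(short_name: str, bbox: BBox, start: date, end: date, count: int = 10) -> List[Any]:
    """Run the search (needs network access)."""
    import earthaccess

    kwargs = granule_search_kwargs(short_name, bbox, start, end)
    return earthaccess.search_data(count=count, **kwargs)
```

Longitudes are not required to be ordered here, since a box crossing the antimeridian can have a western edge numerically greater than its eastern edge.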

`search_datasets(count=-1, **kwargs)`

Search datasets using NASA's CMR.

https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html

Parameters:

Name	Type	Description	Default
`count`	`int`	Number of records to get, -1 = all	`-1`
`kwargs`	`Dict`	arguments to CMR: `keyword`: case-insensitive and supports wildcards ? and *; `short_name`: e.g. ATL08; `doi`: DOI for a dataset; `daac`: e.g. NSIDC or PODAAC; `provider`: particular to each DAAC, e.g. POCLOUD, LPDAAC etc.; `temporal`: a tuple representing temporal bounds in the form `("yyyy-mm-dd", "yyyy-mm-dd")`; `bounding_box`: a tuple representing spatial bounds in the form `(lower_left_lon, lower_left_lat, upper_right_lon, upper_right_lat)`	`{}`

Returns:

Type	Description
`List[DataCollection]`	A list of DataCollection results that can be used to get information about a dataset, e.g. concept_id, doi, etc.

Raises:

Type	Description
`RuntimeError`	The CMR query failed.

Examples:

import earthaccess

datasets = earthaccess.search_datasets(
    keyword="sea surface anomaly",
    cloud_hosted=True
)

Source code in earthaccess/api.py

def search_datasets(count: int = -1, **kwargs: Any) -> List[DataCollection]:
    """Search datasets using NASA's CMR.

    [https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html)

    Parameters:
        count: Number of records to get, -1 = all
        kwargs (Dict):
            arguments to CMR:

            * **keyword**: case-insensitive and supports wildcards ? and *
            * **short_name**: e.g. ATL08
            * **doi**: DOI for a dataset
            * **daac**: e.g. NSIDC or PODAAC
            * **provider**: particular to each DAAC, e.g. POCLOUD, LPDAAC etc.
            * **temporal**: a tuple representing temporal bounds in the form
              `("yyyy-mm-dd", "yyyy-mm-dd")`
            * **bounding_box**: a tuple representing spatial bounds in the form
              `(lower_left_lon, lower_left_lat, upper_right_lon, upper_right_lat)`

    Returns:
        A list of DataCollection results that can be used to get information about a
            dataset, e.g. concept_id, doi, etc.

    Raises:
        RuntimeError: The CMR query failed.

    Examples:
        ```python
        datasets = earthaccess.search_datasets(
            keyword="sea surface anomaly",
            cloud_hosted=True
        )
        ```
    """
    if not validate.valid_dataset_parameters(**kwargs):
        print(
            "Warning: a valid set of parameters is needed to search for datasets on CMR"
        )
        return []
    if earthaccess.__auth__.authenticated:
        query = DataCollections(auth=earthaccess.__auth__).parameters(**kwargs)
    else:
        query = DataCollections().parameters(**kwargs)
    datasets_found = query.hits()
    print(f"Datasets found: {datasets_found}")
    if count > 0:
        return query.get(count)
    return query.get_all()
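
The function body above returns an empty list when the keyword arguments fail validation, and only a printed warning marks that case, so an empty return value alone does not prove that CMR had no matches. The sketch below pulls an identifier out of each result; it assumes each `DataCollection` offers a `concept_id()` method, and `concept_ids` is a hypothetical helper.

```python
from typing import Any, Iterable, List


def concept_ids(collections: Iterable[Any]) -> List[str]:
    """Collect the CMR concept ID of each collection result."""
    return [c.concept_id() for c in collections]


def main() -> None:
    import earthaccess  # needs network access

    datasets = earthaccess.search_datasets(
        keyword="sea surface anomaly", cloud_hosted=True, count=5
    )
    if not datasets:
        print("No datasets returned; check the printed output for a parameter warning")
    for cid in concept_ids(datasets):
        print(cid)
```

A concept ID found this way could then be used to narrow a later granule search to a single dataset.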