Documentation for `Store`

Bases: object

Store class to access granules on-prem or in the cloud.

Store is the class to access data.

Parameters:

Name	Type	Description	Default
`auth`	`Any`	Auth instance to download and access data.	required

Source code in earthaccess/store.py

def __init__(self, auth: Any, pre_authorize: bool = False) -> None:
    """Store is the class to access data.

    Parameters:
        auth: Auth instance to download and access data.
    """
    if auth.authenticated is True:
        self.auth = auth
        self._s3_credentials: Dict[
            Tuple, Tuple[datetime.datetime, Dict[str, str]]
        ] = {}
        oauth_profile = "https://urs.earthdata.nasa.gov/profile"
        # sets the initial URS cookie
        self._requests_cookies: Dict[str, Any] = {}
        self.set_requests_session(oauth_profile)
        if pre_authorize:
            # collect cookies from other DAACs
            for url in DAAC_TEST_URLS:
                self.set_requests_session(url)

    else:
        print("Warning: the current session is not authenticated with NASA")
        self.auth = None
    self.in_region = self._running_in_us_west_2()

`get(granules, local_path=None, provider=None, threads=8)`

Retrieves data granules from a remote storage system.

If we run this in the cloud, we are moving data from S3 to a cloud compute instance (EC2, AWS Lambda).
If we run it outside the us-west-2 region and the data granules are part of a cloud-based collection, the method will not get any files.
If we request data granules from an on-prem collection, the data will be effectively downloaded to a local directory.

Parameters:

Name	Type	Description	Default
`granules`	`Union[List[DataGranule], List[str]]`	A list of granules(DataGranule) instances or a list of granule links (HTTP).	required
`local_path`	`Union[Path, str, None]`	Local directory to store the remote data granules.	`None`
`threads`	`int`	Parallel number of threads to use to download the files; adjust as necessary, default = 8.	`8`

Returns:

Type	Description
`List[str]`	List of downloaded files

Source code in earthaccess/store.py

def get(
    self,
    granules: Union[List[DataGranule], List[str]],
    local_path: Union[Path, str, None] = None,
    provider: Optional[str] = None,
    threads: int = 8,
) -> List[str]:
    """Retrieves data granules from a remote storage system.

       * If we run this in the cloud,
         we are moving data from S3 to a cloud compute instance (EC2, AWS Lambda).
       * If we run it outside the us-west-2 region and the data granules are part of a cloud-based
         collection, the method will not get any files.
       * If we request data granules from an on-prem collection,
         the data will be effectively downloaded to a local directory.

    Parameters:
        granules: A list of granules(DataGranule) instances or a list of granule links (HTTP).
        local_path: Local directory to store the remote data granules.
        threads: Parallel number of threads to use to download the files;
            adjust as necessary, default = 8.

    Returns:
        List of downloaded files
    """
    if local_path is None:
        today = datetime.datetime.today().strftime("%Y-%m-%d")
        uuid = uuid4().hex[:6]
        local_path = Path.cwd() / "data" / f"{today}-{uuid}"
    elif isinstance(local_path, str):
        local_path = Path(local_path)

    if len(granules):
        files = self._get(granules, local_path, provider, threads)
        return files
    else:
        raise ValueError("List of URLs or DataGranule instances expected")

`get_fsspec_session()` `cached`

Returns a fsspec HTTPS session with bearer tokens that are used by CMR. This HTTPS session can be used to download granules if we want to use a direct, lower level API.

Returns:

Type	Description
`AbstractFileSystem`	fsspec HTTPFileSystem (aiohttp client session)

Source code in earthaccess/store.py

@lru_cache
def get_fsspec_session(self) -> fsspec.AbstractFileSystem:
    """Returns a fsspec HTTPS session with bearer tokens that are used by CMR.
    This HTTPS session can be used to download granules if we want to use a direct,
    lower level API.

    Returns:
        fsspec HTTPFileSystem (aiohttp client session)
    """
    token = self.auth.token["access_token"]
    client_kwargs = {
        "headers": {"Authorization": f"Bearer {token}"},
        # This is important! If we trust the env and send a bearer token,
        # auth will fail!
        "trust_env": False,
    }
    session = fsspec.filesystem("https", client_kwargs=client_kwargs)
    return session

`get_requests_session(bearer_token=True)`

Returns a requests HTTPS session with bearer tokens that are used by CMR. This HTTPS session can be used to download granules if we want to use a direct, lower level API.

Parameters:

Name	Type	Description	Default
`bearer_token`	`bool`	if true, will be used for authenticated queries on CMR	`True`

Returns:

Type	Description
`Session`	requests Session

Source code in earthaccess/store.py

def get_requests_session(self, bearer_token: bool = True) -> requests.Session:
    """Returns a requests HTTPS session with bearer tokens that are used by CMR.
    This HTTPS session can be used to download granules if we want to use a direct,
    lower level API.

    Parameters:
        bearer_token: if true, will be used for authenticated queries on CMR

    Returns:
        requests Session
    """
    return self.auth.get_session()

`get_s3fs_session(daac=None, concept_id=None, provider=None, endpoint=None)`

Returns a s3fs instance for a given cloud provider / DAAC.

Parameters:

Name	Type	Description	Default
`daac`	`Optional[str]`	any of the DAACs, e.g. NSIDC, PODAAC	`None`
`provider`	`Optional[str]`	a data provider if we know them, e.g. PODAAC -> POCLOUD	`None`
`endpoint`	`Optional[str]`	pass the URL for the credentials directly	`None`

Returns:

Type	Description
`S3FileSystem`	a s3fs file instance

Source code in earthaccess/store.py

def get_s3fs_session(
    self,
    daac: Optional[str] = None,
    concept_id: Optional[str] = None,
    provider: Optional[str] = None,
    endpoint: Optional[str] = None,
) -> s3fs.S3FileSystem:
    """Returns a s3fs instance for a given cloud provider / DAAC.

    Parameters:
        daac: any of the DAACs, e.g. NSIDC, PODAAC
        provider: a data provider if we know them, e.g. PODAAC -> POCLOUD
        endpoint: pass the URL for the credentials directly

    Returns:
        a s3fs file instance
    """
    if self.auth is None:
        raise ValueError(
            "A valid Earthdata login instance is required to retrieve S3 credentials"
        )
    if not any([concept_id, daac, provider, endpoint]):
        raise ValueError(
            "At least one of the concept_id, daac, provider or endpoint"
            "parameters must be specified. "
        )

    if concept_id is not None:
        provider = self._derive_concept_provider(concept_id)

    # Get existing S3 credentials if we already have them
    location = (
        daac,
        provider,
        endpoint,
    )  # Identifier for where to get S3 credentials from
    need_new_creds = False
    try:
        dt_init, creds = self._s3_credentials[location]
    except KeyError:
        need_new_creds = True
    else:
        # If cached credentials are expired, invalidate the cache
        delta = datetime.datetime.now() - dt_init
        if round(delta.seconds / 60, 2) > 55:
            need_new_creds = True
            self._s3_credentials.pop(location)

    if need_new_creds:
        # Don't have existing valid S3 credentials, so get new ones
        now = datetime.datetime.now()
        if endpoint is not None:
            creds = self.auth.get_s3_credentials(endpoint=endpoint)
        elif daac is not None:
            creds = self.auth.get_s3_credentials(daac=daac)
        elif provider is not None:
            creds = self.auth.get_s3_credentials(provider=provider)
        # Include new credentials in the cache
        self._s3_credentials[location] = now, creds

    return s3fs.S3FileSystem(
        key=creds["accessKeyId"],
        secret=creds["secretAccessKey"],
        token=creds["sessionToken"],
    )

`open(granules, provider=None)`

Returns a list of fsspec file-like objects that can be used to access files hosted on S3 or HTTPS by third party libraries like xarray.

Parameters:

Name	Type	Description	Default
`granules`	`Union[List[str], List[DataGranule]]`	a list of granules(DataGranule) instances or list of URLs, e.g. s3://some-granule	required
`provider`	`Optional[str]`	an option	`None`

Returns:

Type	Description
`List[Any]`	A list of s3fs "file pointers" to s3 files.

Source code in earthaccess/store.py

def open(
    self,
    granules: Union[List[str], List[DataGranule]],
    provider: Optional[str] = None,
) -> List[Any]:
    """Returns a list of fsspec file-like objects that can be used to access files
    hosted on S3 or HTTPS by third party libraries like xarray.

    Parameters:
        granules: a list of granules(DataGranule) instances or list of URLs,
            e.g. s3://some-granule
        provider: an option

    Returns:
        A list of s3fs "file pointers" to s3 files.
    """
    if len(granules):
        return self._open(granules, provider)
    return []

`set_requests_session(url, method='get', bearer_token=False)`

Sets up a requests session with bearer tokens that are used by CMR. Mainly used to get the authentication cookies from different DAACs and URS. This HTTPS session can be used to download granules if we want to use a direct, lower level API.

Parameters:

Name	Type	Description	Default
`url`	`str`	used to test the credentials and populate the class auth cookies	required
`method`	`str`	HTTP method to test, default: "GET"	`'get'`
`bearer_token`	`bool`	if true, will be used for authenticated queries on CMR	`False`

Returns:

Type	Description
`None`	fsspec HTTPFileSystem (aiohttp client session)

Source code in earthaccess/store.py

def set_requests_session(
    self, url: str, method: str = "get", bearer_token: bool = False
) -> None:
    """Sets up a `requests` session with bearer tokens that are used by CMR.
    Mainly used to get the authentication cookies from different DAACs and URS.
    This HTTPS session can be used to download granules if we want to use a direct,
    lower level API.

    Parameters:
        url: used to test the credentials and populate the class auth cookies
        method: HTTP method to test, default: "GET"
        bearer_token: if true, will be used for authenticated queries on CMR

    Returns:
        fsspec HTTPFileSystem (aiohttp client session)
    """
    if not hasattr(self, "_http_session"):
        self._http_session = self.auth.get_session(bearer_token)

    resp = self._http_session.request(method, url, allow_redirects=True)

    if resp.status_code in [400, 401, 403]:
        new_session = requests.Session()
        resp_req = new_session.request(
            method, url, allow_redirects=True, cookies=self._requests_cookies
        )
        if resp_req.status_code in [400, 401, 403]:
            resp.raise_for_status()
        else:
            self._requests_cookies.update(new_session.cookies.get_dict())
    elif resp.status_code >= 200 and resp.status_code <= 300:
        self._requests_cookies = self._http_session.cookies.get_dict()
    elif resp.status_code >= 500:
        resp.raise_for_status()

Documentation for Store

get(granules, local_path=None, provider=None, threads=8)

get_fsspec_session() cached

get_requests_session(bearer_token=True)

get_s3fs_session(daac=None, concept_id=None, provider=None, endpoint=None)

open(granules, provider=None)

set_requests_session(url, method='get', bearer_token=False)

Documentation for `Store`

`get(granules, local_path=None, provider=None, threads=8)`

`get_fsspec_session()` `cached`

`get_requests_session(bearer_token=True)`

`get_s3fs_session(daac=None, concept_id=None, provider=None, endpoint=None)`

`open(granules, provider=None)`

`set_requests_session(url, method='get', bearer_token=False)`