
Documentation for Store

Bases: object

The Store class provides access to data granules hosted on-prem or in the cloud.

Parameters:

  • auth (Any, required): Auth instance used to download and access data.

get(granules, local_path=None, provider=None, threads=8, *, pqdm_kwargs=None)

Retrieves data granules from a remote storage system.

  • When run in the cloud (us-west-2), data are moved directly from S3 to the cloud compute instance (e.g. EC2, AWS Lambda).
  • When run outside the us-west-2 region, granules that belong to a cloud-hosted collection are not retrieved: the method will not get any files.
  • Granules requested from an on-prem collection are downloaded to a local directory.
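The three cases above can be summarized as a small decision function. This is a hypothetical sketch, not the earthaccess implementation: `plan_retrieval` and `Action` are invented names, and the caller is assumed to already know the current AWS region (or None when not on AWS) and whether the collection is cloud-hosted.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    """Hypothetical outcomes mirroring the three documented cases."""

    S3_DIRECT = "copy from S3 to the in-region compute instance"
    NO_FILES = "cloud-hosted data requested from outside us-west-2"
    LOCAL_DOWNLOAD = "download to a local directory"


def plan_retrieval(cloud_hosted: bool, region: Optional[str]) -> Action:
    """Pick an action from the collection type and the region we run in.

    `region` is None when not running on AWS (assumption: the caller
    detects this; region detection itself is out of scope here).
    """
    if not cloud_hosted:
        # On-prem collections are downloaded to a local directory.
        return Action.LOCAL_DOWNLOAD
    if region == "us-west-2":
        # In-region: data move directly from S3 to the compute instance.
        return Action.S3_DIRECT
    # Cloud-hosted collection requested from outside us-west-2.
    return Action.NO_FILES
```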

Parameters:

  • granules (Union[List[DataGranule], List[str]], required): A list of DataGranule instances or a list of granule links (HTTP).
  • local_path (Optional[Union[Path, str]], default: None): Local directory in which to store the remote data granules. If not supplied, it defaults to a subdirectory of the current working directory of the form data/YYYY-MM-DD-UUID, where YYYY-MM-DD is the current date and UUID is the last 6 hex digits of a UUID4 value.
  • provider (Optional[str], default: None): A valid cloud provider code. Each DAAC has a provider code for its cloud distributions.
  • threads (int, default: 8): Number of parallel threads used to download the files; adjust as necessary.
  • pqdm_kwargs (Optional[Mapping[str, Any]], default: None): Additional keyword arguments to pass to pqdm, a parallel processing library; see the pqdm documentation for available options. By default, exceptions are raised immediately and the number of jobs is set by the threads parameter.
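The default local_path naming scheme can be reproduced with the standard library. This sketch only illustrates the documented format; `default_local_path` is a hypothetical helper, and the 6-character suffix is taken from the end of a UUID4's hexadecimal string.

```python
import uuid
from datetime import date
from pathlib import Path
from typing import Optional


def default_local_path(
    today: Optional[date] = None, base: Optional[Path] = None
) -> Path:
    """Build a path of the form <base>/data/YYYY-MM-DD-<6 hex chars>."""
    today = today or date.today()
    base = base or Path.cwd()
    # Last 6 hex digits of a random UUID4 keep runs on the same day apart.
    suffix = uuid.uuid4().hex[-6:]
    return base / "data" / f"{today:%Y-%m-%d}-{suffix}"
```

Passing `today` and `base` explicitly keeps the function deterministic apart from the random suffix, which makes it easy to check.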

Returns:

List[str]: List of downloaded files.
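The threads parameter controls how many downloads run at once. earthaccess hands this to pqdm; the sketch below swaps in the standard library's ThreadPoolExecutor to show the same idea. `download_all` and the `fetch` callable (a per-file downloader supplied by the caller) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, List


def download_all(
    urls: List[str],
    local_path: Path,
    fetch: Callable[[str, Path], Path],
    threads: int = 8,
) -> List[str]:
    """Download every URL into local_path using up to `threads` workers.

    `fetch` (hypothetical) downloads one URL into a directory and returns
    the path of the written file.
    """
    local_path.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() yields results in input order; if a worker raised, the
        # exception is re-raised here when its result is reached.
        paths = list(pool.map(lambda url: fetch(url, local_path), urls))
    # Match the documented return type: a list of str paths.
    return [str(p) for p in paths]
```

Because results come back in input order, the returned list lines up with the input granules.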

get_fsspec_session() (cached)

Returns an fsspec HTTPS session with the bearer tokens used by CMR.

This HTTPS session can be used to download granules through a direct, lower-level API.

Returns:

AbstractFileSystem: fsspec HTTPFileSystem (aiohttp client session).
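The cached label indicates that the session is memoized, so repeated calls return the same object instead of building a new one. A minimal illustration with functools.lru_cache on hypothetical DemoStore and DemoSession classes; whether earthaccess uses this exact decorator is an assumption.

```python
from functools import lru_cache


class DemoSession:
    """Stand-in for an authenticated HTTP session (hypothetical)."""

    def __init__(self, token: str) -> None:
        self.headers = {"Authorization": f"Bearer {token}"}


class DemoStore:
    def __init__(self, token: str) -> None:
        self.token = token

    @lru_cache(maxsize=None)
    def get_session(self) -> DemoSession:
        # Built once per DemoStore instance; `self` is part of the cache key.
        return DemoSession(self.token)
```

One trade-off of caching on a method this way: the cache holds a reference to each instance, so instances are kept alive for the life of the process.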

get_requests_session()

Returns a requests HTTPS session with the bearer tokens used by CMR.

This HTTPS session can be used to download granules through a direct, lower-level API.

Returns:

SessionWithHeaderRedirection: A requests Session.
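The return type's name, SessionWithHeaderRedirection, refers to a common pattern in NASA Earthdata examples: the Authorization header follows a redirect only when the redirect stays on the same host or involves the Earthdata Login host, urs.earthdata.nasa.gov. The sketch below isolates that decision in plain functions; `bearer_headers` and `keep_auth_on_redirect` are hypothetical names, and assuming earthaccess applies exactly this rule is an assumption.

```python
from urllib.parse import urlparse

# Earthdata Login host that is allowed to see the bearer token.
AUTH_HOST = "urs.earthdata.nasa.gov"


def bearer_headers(token: str) -> dict:
    """Request headers carrying a bearer token."""
    return {"Authorization": f"Bearer {token}"}


def keep_auth_on_redirect(original_url: str, redirect_url: str) -> bool:
    """Return True if the Authorization header may follow this redirect."""
    original = urlparse(original_url).hostname
    redirect = urlparse(redirect_url).hostname
    if original == redirect:
        return True
    # Cross-host redirects keep the header only if Earthdata Login is involved.
    return AUTH_HOST in (original, redirect)
```

Dropping the header on other cross-host redirects keeps the token from leaking to third-party hosts such as pre-signed S3 URLs.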

get_s3_filesystem(daac=None, concept_id=None, provider=None, endpoint=None)

Returns an s3fs.S3FileSystem instance for a given cloud provider / DAAC.

Parameters:

  • daac (Optional[str], default: None): Any of the DAACs, e.g. NSIDC, PODAAC.
  • provider (Optional[str], default: None): A data provider, if known, e.g. POCLOUD for PODAAC.
  • endpoint (Optional[str], default: None): URL of the credentials endpoint, passed directly.

Returns:

S3FileSystem: An s3fs.S3FileSystem instance.
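The daac, provider and endpoint arguments are alternative ways to say where temporary S3 credentials should come from. A sketch of one way to resolve them, under stated assumptions: the precedence (endpoint, then provider, then daac) and the DAAC_TO_PROVIDER table are hypothetical. PODAAC -> POCLOUD comes from the parameter description above, NSIDC_CPRD is NSIDC's cloud provider code, and concept_id is left out.

```python
from typing import Optional, Tuple

# Hypothetical lookup table from DAAC name to cloud provider code.
DAAC_TO_PROVIDER = {"PODAAC": "POCLOUD", "NSIDC": "NSIDC_CPRD"}


def resolve_credentials_source(
    daac: Optional[str] = None,
    provider: Optional[str] = None,
    endpoint: Optional[str] = None,
) -> Tuple[str, str]:
    """Return ("endpoint", url) or ("provider", code) for the credential lookup."""
    if endpoint:
        # An explicit credentials URL wins over everything else (assumption).
        return ("endpoint", endpoint)
    if provider:
        return ("provider", provider)
    if daac:
        code = DAAC_TO_PROVIDER.get(daac.upper())
        if code is None:
            raise ValueError(f"unknown DAAC: {daac!r}")
        return ("provider", code)
    raise ValueError("one of daac, provider or endpoint is required")
```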

get_s3fs_session(daac=None, concept_id=None, provider=None, endpoint=None)

Returns an s3fs.S3FileSystem instance for a given cloud provider / DAAC.

Parameters:

  • daac (Optional[str], default: None): Any of the DAACs, e.g. NSIDC, PODAAC.
  • provider (Optional[str], default: None): A data provider, if known, e.g. POCLOUD for PODAAC.
  • endpoint (Optional[str], default: None): URL of the credentials endpoint, passed directly.

Returns:

S3FileSystem: An s3fs.S3FileSystem authenticated for in-region reads in us-west-2, valid for 1 hour.
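Because the returned filesystem is only authenticated for 1 hour, long-running jobs need fresh credentials. A minimal sketch of a time-based cache; `ExpiringCredentials` is hypothetical, `fetch` stands in for whatever call retrieves temporary S3 credentials, and the clock is injectable so the expiry logic can be checked without waiting an hour.

```python
import time
from typing import Callable, Dict, Optional


class ExpiringCredentials:
    """Cache credentials from `fetch` and refetch them after `ttl_seconds`."""

    def __init__(
        self,
        fetch: Callable[[], Dict[str, str]],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._creds: Optional[Dict[str, str]] = None
        self._fetched_at = 0.0

    def get(self) -> Dict[str, str]:
        now = self._clock()
        if self._creds is None or now - self._fetched_at >= self._ttl:
            # First call, or the 1-hour window has elapsed: fetch new ones.
            self._creds = self._fetch()
            self._fetched_at = now
        return self._creds
```

In practice one might refresh a little before the hour is up so that a read does not fail mid-transfer; that margin is a design choice, not something this documentation specifies.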

open(granules, provider=None, *, pqdm_kwargs=None)

Returns a list of file-like objects that third-party libraries such as xarray can use to access files hosted on S3 or over HTTPS.

Parameters:

  • granules (Union[List[str], List[DataGranule]], required): A list of granule instances or a list of URLs, e.g. s3://some-granule. If a list of URLs is passed, the data provider must be specified.
  • provider (Optional[str], default: None): The data provider, e.g. POCLOUD or NSIDC_CPRD.
  • pqdm_kwargs (Optional[Mapping[str, Any]], default: None): Additional keyword arguments to pass to pqdm, a parallel processing library; see the pqdm documentation for available options. By default, exceptions are raised immediately.

Returns:

List[AbstractBufferedFile]: A list of "file pointers" to remote (i.e. S3 or HTTPS) files.
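The rule that a list of URLs must come with a provider can be expressed as an up-front check. `classify_open_inputs` is hypothetical and only mirrors the documented inputs; it also assumes, for illustration, that mixing URLs and granule objects in one call is rejected and that only s3:// and https:// URLs are accepted.

```python
from typing import Optional, Sequence
from urllib.parse import urlparse


def classify_open_inputs(
    granules: Sequence[object], provider: Optional[str] = None
) -> str:
    """Return "urls" or "granules", raising if the inputs are inconsistent."""
    if not granules:
        raise ValueError("no granules given")
    is_url = [isinstance(g, str) for g in granules]
    if all(is_url):
        schemes = {urlparse(g).scheme for g in granules}
        unsupported = schemes - {"s3", "https"}
        if unsupported:
            raise ValueError(f"unsupported URL scheme(s): {sorted(unsupported)}")
        if provider is None:
            # Documented rule: URL input requires an explicit data provider.
            raise ValueError("provider (e.g. POCLOUD) is required for URL input")
        return "urls"
    if any(is_url):
        raise TypeError("do not mix URLs and granule objects in one call")
    return "granules"
```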

set_requests_session(url, method='get', bearer_token=True)

Sets up a requests session with the bearer tokens used by CMR.

It is mainly used to obtain the authentication cookies from the different DAACs and from URS. This HTTPS session can be used to download granules through a direct, lower-level API.

Parameters:

  • url (str, required): URL used to test the credentials and populate the class's authentication cookies.
  • method (str, default: 'get'): HTTP method used for the test request.
  • bearer_token (bool, default: True): If True, the bearer token is used for authenticated queries on CMR.

Returns:

None: Nothing is returned; the authentication cookies are stored on the Store instance.
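The flow described for set_requests_session (send one test request, optionally with a bearer token, and keep the cookies the servers set) can be sketched with the standard library's urllib and http.cookiejar. This is an illustration, not the requests-based code earthaccess uses; `build_authenticated_opener` and `probe` are hypothetical names, and `probe` makes a real network request when called.

```python
import http.cookiejar
import urllib.request
from typing import Optional


def build_authenticated_opener(
    jar: http.cookiejar.CookieJar, token: Optional[str] = None
) -> urllib.request.OpenerDirector:
    """Opener that stores cookies in `jar` and optionally sends a bearer token."""
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    if token is not None:
        # Analogous to bearer_token=True: every request carries the token.
        opener.addheaders.append(("Authorization", f"Bearer {token}"))
    return opener


def probe(opener: urllib.request.OpenerDirector, url: str, method: str = "get") -> int:
    """Send one test request; cookies set by the server land in the jar."""
    request = urllib.request.Request(url, method=method.upper())
    with opener.open(request) as response:
        return response.status
```

Headers in `addheaders` are sent on every request the opener makes, including redirects to other hosts, so a production version would restrict which hosts receive the token.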