Accessing remote files with earthaccess
When we search for data using earthaccess, we get back a list of results from NASA's Common Metadata Repository (CMR). These results contain all the information we need to access the files described by the metadata. earthaccess offers two access methods that operate on these results. The first is the familiar download(), which copies the files from their remote location to our local disk. If we are running the code in AWS, say on a JupyterHub, the files are copied to the local VM disk.
The other method is open(), which uses fsspec to open remote files as if they were local. open() has advantages and disadvantages that we should understand before using it.
The main advantage of open() is that we don't have to download the file: we can stream it into memory. Depending on how we do it, however, we may run into network performance issues. If we run the code next to the data, access is fast; if we run it locally on our laptops, it will be slow.
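To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes reads are issued one after another, so every request pays a full network round trip. The function name and all numbers are illustrative assumptions, not measurements of any NASA endpoint.

```python
def estimated_read_seconds(n_requests, total_bytes, latency_s, bytes_per_s):
    """Rough time to read a remote file with sequential range requests.

    Each request pays one round trip (latency_s); the payload is then
    limited by throughput (bytes_per_s). Ignores caching and parallelism.
    """
    return n_requests * latency_s + total_bytes / bytes_per_s


# Hypothetical workload: 2,000 small reads totalling 50 MB.
n_requests, total_bytes = 2_000, 50_000_000

# Assumed conditions for code running next to the data.
in_region = estimated_read_seconds(n_requests, total_bytes, 0.005, 100_000_000)

# Assumed conditions for a laptop on a home connection.
laptop = estimated_read_seconds(n_requests, total_bytes, 0.1, 10_000_000)

print(f"in-region: {in_region:.1f} s, laptop: {laptop:.1f} s")
```

With these assumed numbers, most of the laptop estimate comes from per-request latency rather than from the volume of data, which is why the number of reads matters as much as the file size.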
import earthaccess
auth = earthaccess.login()
results = earthaccess.search_data(
    short_name="ATL06",
    cloud_hosted=False,
    temporal=("2019-01", "2019-02"),
    polygon=[(-100, 40), (-110, 40), (-105, 38), (-100, 40)],
)
results[0]
Granules found: 50
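The polygon passed to search_data is a closed ring of (lon, lat) pairs: its last vertex repeats the first. CMR's search API documentation also asks for polygon vertices in counter-clockwise order. The following is a minimal sanity check under those assumptions; the helper names are hypothetical and not part of earthaccess.

```python
def ring_is_closed(ring):
    """True if the ring has at least 4 vertices and ends where it starts."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def ring_is_counter_clockwise(ring):
    """True if the (lon, lat) ring has positive planar signed area.

    Uses the shoelace formula on raw lon/lat values, a flat approximation
    that is only reasonable for small polygons away from the antimeridian
    and the poles.
    """
    twice_area = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )
    return twice_area > 0


# The polygon used in the search above.
polygon = [(-100, 40), (-110, 40), (-105, 38), (-100, 40)]
print(ring_is_closed(polygon), ring_is_counter_clockwise(polygon))
```

If the orientation check fails, reversing the vertex list (polygon[::-1]) flips the winding order while keeping the ring closed.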
nsidc_url = "https://n5eil01u.ecs.nsidc.org/DP7/ATLAS/ATL06.005/2019.02.21/ATL06_20190221121851_08410203_005_01.h5"
lpcloud_url = "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2ARFL.001/EMIT_L2A_RFL_001_20220903T163129_2224611_012/EMIT_L2A_RFL_001_20220903T163129_2224611_012.nc"
session = earthaccess.get_requests_https_session()
headers = {"Range": "bytes=0-100"}
r = session.get(lpcloud_url, headers=headers)
r
<Response [206]>
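Status 206 means Partial Content: the server honored the Range header and returned only the requested bytes, which is what makes streaming parts of a file possible. HTTP byte ranges are inclusive at both ends, so bytes=0-100 asks for 101 bytes. The sketch below builds such a header and parses the Content-Range value a server sends back. The helper names and the sample total size are made up for illustration.

```python
import re


def range_header(start, length):
    """Build a Range header asking for `length` bytes starting at `start`.

    HTTP byte ranges are inclusive at both ends, so the last byte
    requested is start + length - 1.
    """
    return {"Range": f"bytes={start}-{start + length - 1}"}


def parse_content_range(value):
    """Parse 'bytes first-last/total' into (first, last, total).

    total is None when the server reports '*' (unknown size).
    """
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if match is None:
        raise ValueError(f"unexpected Content-Range: {value!r}")
    first, last, total = match.groups()
    return int(first), int(last), None if total == "*" else int(total)


print(range_header(0, 101))
# Sample header value; the total of 2048 is invented for this example.
print(parse_content_range("bytes 0-100/2048"))
```

With a real response, r.headers.get("Content-Range") would be the value to parse, and a status of 200 instead of 206 would suggest the server ignored the range and sent the whole file.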
fs = earthaccess.get_fsspec_https_session()
with fs.open(lpcloud_url) as f:
    data = f.read(100)
data
b'\x89HDF\r\n\x1a\n\x00\x00\x00\x00\x00\x08\x08\x00\x04\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xd7HUn\x00\x00\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00`\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00OHDR'
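The first eight bytes of this output are the HDF5 format signature, which is expected here because netCDF-4 files such as this .nc file are stored as HDF5. A small check like the one below can confirm that a remote read returned real file content rather than, say, an HTML login page. The helper name is hypothetical.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(header):
    """True if the bytes start with the 8-byte HDF5 format signature.

    Only offset 0 is checked; files with a user block place the
    signature at a later offset and would not be detected here.
    """
    return header[:8] == HDF5_SIGNATURE


# First 16 bytes of the output shown above.
data = b"\x89HDF\r\n\x1a\n\x00\x00\x00\x00\x00\x08\x08\x00"
print(looks_like_hdf5(data))
```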
%%time
import xarray as xr
files = earthaccess.open(results[0:2])
ds = xr.open_dataset(files[0], group="/gt1r/land_ice_segments")
ds
Opening 2 granules, approx size: 0.3 GB
CPU times: user 1.72 s, sys: 410 ms, total: 2.13 s
Wall time: 54.1 s
<xarray.Dataset>
Dimensions:                (delta_time: 153543)
Coordinates:
  * delta_time             (delta_time) datetime64[ns] 2019-01-03T06:49:07.97...
    latitude               (delta_time) float64 ...
    longitude              (delta_time) float64 ...
Data variables:
    atl06_quality_summary  (delta_time) int8 ...
    h_li                   (delta_time) float32 ...
    h_li_sigma             (delta_time) float32 ...
    segment_id             (delta_time) float64 ...
    sigma_geo_h            (delta_time) float32 ...
Attributes:
    Description:  The land_ice_height group contains the primary set of deriv...
    data_rate:    Data within this group are sparse. Data values are provide...
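In the timing output, total CPU time is about 2 s while wall time is about 54 s, so most of the time went to waiting rather than computing, consistent with the network cost of reading remotely. %%time is an IPython cell magic and only works in a notebook; outside one, a plain-Python timer along these lines could be used instead. The wall_timer helper is a hypothetical name, and the sleep is only a stand-in workload.

```python
import time
from contextlib import contextmanager


@contextmanager
def wall_timer():
    """Measure wall-clock time spent inside the with-block.

    Yields a dict whose "seconds" key is filled in when the block exits.
    """
    timing = {}
    start = time.perf_counter()
    try:
        yield timing
    finally:
        timing["seconds"] = time.perf_counter() - start


# Stand-in workload; in practice this would wrap the remote open calls.
with wall_timer() as timing:
    time.sleep(0.05)

print(f"wall time: {timing['seconds']:.2f} s")
```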