Tutorial: Loading data
This notebook shows how to load data in the EL-PASO framework. You will load data from a CSV file on disk, then download a CDF file from an online repository and load data from it.
Loading from a CSV file
As a first step, we will load data from example_orbit.csv, which looks like this:
import pandas as pd
pd.read_csv("example_orbit.csv").head()
| | DATETIME | alt(km) | lat(deg) | lon(deg) |
|---|---|---|---|---|
| 0 | 2019-07-30 17:04:00 | 11362.75841 | 11.391923 | 154.268925 |
| 1 | 2019-07-30 17:09:00 | 11516.45781 | 8.645016 | 156.193306 |
| 2 | 2019-07-30 17:14:00 | 11650.92313 | 5.916772 | 158.021787 |
| 3 | 2019-07-30 17:19:00 | 11765.75939 | 3.209510 | 159.778169 |
| 4 | 2019-07-30 17:24:00 | 11860.64721 | 0.524741 | 161.484253 |
The file holds four columns describing the orbit of a satellite: time, altitude, latitude, and longitude. While we can certainly read the data with pandas as above, we want to use the EL-PASO SourceFile class, which extracts the necessary data and puts it into variables. To do this, we tell EL-PASO how to read the file by creating a list of ExtractionInfos. Each ExtractionInfo creates one variable from a column of the file and assigns it the given unit. The result_key parameter is used later to identify the variable after loading.
from datetime import datetime, timezone
from astropy import units as u
import el_paso as ep
ep.logger.setup_logging()
start_time = datetime(2019, 7, 30, 17, tzinfo=timezone.utc)
end_time = datetime(2019, 8, 3, 5, tzinfo=timezone.utc)
extraction_infos = [
ep.ExtractionInfo(
result_key="Epoch",
name_or_column="DATETIME",
unit=u.dimensionless_unscaled,
),
ep.ExtractionInfo(
result_key="alt",
name_or_column="alt(km)",
unit=u.km,
),
ep.ExtractionInfo(
result_key="lon",
name_or_column="lon(deg)",
unit=u.deg,
),
ep.ExtractionInfo(
result_key="lat",
name_or_column="lat(deg)",
unit=u.deg,
),
]
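Before extracting, it can help to confirm that every name_or_column matches a header in the file. The following is a minimal sketch using only the standard library; the in-memory sample is a hypothetical stand-in for the first rows of example_orbit.csv, and the check itself is not part of EL-PASO.

```python
import csv
import io

# Hypothetical in-memory copy of the first rows of example_orbit.csv.
sample = io.StringIO(
    "DATETIME,alt(km),lat(deg),lon(deg)\n"
    "2019-07-30 17:04:00,11362.75841,11.391923,154.268925\n"
    "2019-07-30 17:09:00,11516.45781,8.645016,156.193306\n"
)

# Column names requested through name_or_column in the ExtractionInfos above.
requested = ["DATETIME", "alt(km)", "lon(deg)", "lat(deg)"]

reader = csv.DictReader(sample)
header = reader.fieldnames or []

# Any requested name that is absent from the header would fail to load.
missing = [name for name in requested if name not in header]
print("header:", header)
print("missing:", missing)
```

For the real file, replace the StringIO object with an open file handle.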
Now we are ready to extract the data and put it into variables. The extract_variables_from_files function returns a dictionary holding Variables based on the extraction_infos.
variables = ep.extract_variables_from_files(
start_time,
end_time,
"single_file",
data_path=".",
file_name_stem="example_orbit.csv",
extraction_infos=extraction_infos,
)
print(variables.keys())
print(variables["Epoch"].metadata)
# print a slice of data
print(variables["Epoch"].get_data()[:10])
[INFO ] 2026-06-05 16:17:10 - el_paso.extract_variables_from_files:112 - Extracting variables ...
dict_keys(['Epoch', 'alt', 'lon', 'lat'])
VariableMetadata(unit=Unit(dimensionless), original_cadence_seconds=0, source_files=['example_orbit.csv'], description='', processing_notes='', standard_name='')
['2019-07-30 17:04:00' '2019-07-30 17:09:00' '2019-07-30 17:14:00' '2019-07-30 17:19:00' '2019-07-30 17:24:00' '2019-07-30 17:29:00' '2019-07-30 17:34:00' '2019-07-30 17:39:00' '2019-07-30 17:44:00' '2019-07-30 17:49:00']
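As the output shows, Epoch is loaded as plain strings with a dimensionless unit. The sketch below converts such strings to timezone-aware datetimes with the standard library and checks the sampling cadence. It assumes the time stamps are in UTC (matching start_time and end_time) and is not EL-PASO's own time handling.

```python
from datetime import datetime, timedelta, timezone

# First time stamps as printed above; the format string is inferred from them.
epoch_strings = [
    "2019-07-30 17:04:00",
    "2019-07-30 17:09:00",
    "2019-07-30 17:14:00",
]

# Assumption: the time stamps are in UTC, matching start_time and end_time.
epochs = [
    datetime.strptime(s, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    for s in epoch_strings
]

# Differences between consecutive samples give the cadence.
cadences = [later - earlier for earlier, later in zip(epochs, epochs[1:])]
print(cadences[0].total_seconds())
```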
Download and load a CDF file
This example will show you how to inspect a CDF file and how to extract variables from it using a SourceFile.
The first step is to find out what the CDF file contains. After downloading the file, we use a script provided by EL-PASO that prints a table with all relevant information:
from datetime import datetime, timezone
start_time = datetime(2017, 7, 14, tzinfo=timezone.utc)
end_time = datetime(2017, 7, 14, 23, 59, 59, tzinfo=timezone.utc)
file_name_stem = "rbspa_rel04_ect-hope-pa-l3_YYYYMMDD_.{6}.cdf"
ep.download(
start_time,
end_time,
save_path=".",
download_url="https://spdf.gsfc.nasa.gov/pub/data/rbsp/rbspa/l3/ect/hope/pitchangle/rel04/YYYY/",
file_name_stem=file_name_stem,
file_cadence="daily",
method="request",
skip_existing=True,
)
[INFO ] 2026-06-05 16:17:11 - el_paso.download:257 - Downloaded successfully: rbspa_rel04_ect-hope-pa-l3_20170714_v7.4.0.cdf
[INFO ] 2026-06-05 16:17:11 - el_paso.download:8 - download finished in 0.623 seconds
import sys
sys.path.append("../")
from scripts.inspect_cdf_file import inspect_cdf_file
inspect_cdf_file("rbspa_rel04_ect-hope-pa-l3_20170714_v7.4.0.cdf")
(3255, 11, 72)
[WARNING ] 2026-06-05 16:17:11 - py.warnings:110 - /home/docs/checkouts/readthedocs.org/user_builds/el-paso/checkouts/latest/tutorials/../scripts/inspect_cdf_file.py:59: RuntimeWarning: divide by zero encountered in log10
plt.pcolormesh(range(flux.shape[0]), np.log10(energy), np.log10(flux[1:,5,1:]).T, cmap="jet")
| Variable name | Data Type | Units | Data Shape | Fill value | Description |
|---|---|---|---|---|---|
| B_Calc_Ele | CDF_FLOAT | nT | (3940,) | -9.999999848243207e+30 | Model magnetic field strength (electron timebase) |
| B_Calc_Ion | CDF_FLOAT | nT | (3255,) | -9.999999848243207e+30 | Model magnetic field strength (ion timebase) |
| B_Eq_Ele | CDF_FLOAT | nT | (3940,) | -9.999999848243207e+30 | Model magnetic field strength at magnetic equator (electron timebase) |
| B_Eq_Ion | CDF_FLOAT | nT | (3255,) | -9.999999848243207e+30 | Model magnetic field strength at magnetic equator (ion timebase) |
| ENERGY_Ele_DELTA | CDF_FLOAT | eV | (3940, 72) | -9.999999848243207e+30 | ENERGY_Ele_DELTA |
| ENERGY_Ion_DELTA | CDF_FLOAT | eV | (3255, 72) | -9.999999848243207e+30 | ENERGY_Ion_DELTA |
| Energy_LABL | CDF_CHAR | | (72,) | | Energy_LABL |
| Epoch_Ele | CDF_EPOCH | | (3940,) | -1e+31 | Timebase for electron measurments |
| Epoch_Ele_DELTA | CDF_REAL4 | ms | (3940,) | -9.999999848243207e+30 | Half sample time |
| Epoch_Ion | CDF_EPOCH | | (3255,) | -1e+31 | Timebase for ion measurments |
| Epoch_Ion_DELTA | CDF_REAL4 | ms | (3255,) | -9.999999848243207e+30 | Half sample time |
| FLAGS | CDF_CHAR | | (8,) | | Flag type descriptions |
| Flags_Ele | CDF_UINT1 | | (3940, 8) | 255 | Flags on electron data |
| Flags_Ion | CDF_UINT1 | | (3255, 8) | 255 | Flags on ion data |
| HOPE_ENERGY_Ele | CDF_FLOAT | eV | (3940, 72) | -9.999999848243207e+30 | HOPE_ENERGY_Ele |
| HOPE_ENERGY_Ion | CDF_FLOAT | eV | (3255, 72) | -9.999999848243207e+30 | HOPE_ENERGY_Ion |
| I_Ele | CDF_FLOAT | | (3940,) | -9.999999848243207e+30 | Adiabatic invariant (bounce) (electron timebase) |
| I_Ion | CDF_FLOAT | | (3255,) | -9.999999848243207e+30 | Adiabatic invariant (bounce) (ion timebase) |
| L_Ele | CDF_FLOAT | | (3940,) | -9.999999848243207e+30 | Calculated McIlwains L parameter (electron timebase) |
| L_Ion | CDF_FLOAT | | (3255,) | -9.999999848243207e+30 | Calculated McIlwains L parameter (ion timebase) |
| L_star_Ele | CDF_FLOAT | | (3940,) | -9.999999848243207e+30 | Calculated Roederers L* parameter (electron timebase) |
| L_star_Ion | CDF_FLOAT | | (3255,) | -9.999999848243207e+30 | Calculated Roederers L* parameter (ion timebase) |
| MLT_Ele | CDF_FLOAT | h | (3940,) | -9.999999848243207e+30 | Calculated Magnetic Local Time (electron timebase) |
| MLT_Ion | CDF_FLOAT | h | (3255,) | -9.999999848243207e+30 | Calculated Magnetic Local Time (ion timebase) |
| Position_Ele | CDF_FLOAT | km | (3940, 3) | -9.999999848243207e+30 | Position of the satellite in geographic coordinates (electron timebase) |
| Position_Ion | CDF_FLOAT | km | (3255, 3) | -9.999999848243207e+30 | Position of the satellite in geographic coordinates (ion timebase) |
| Position_LABL_1 | CDF_CHAR | | (3,) | | Position_LABL_1 |
| PITCH_ANGLE | CDF_FLOAT | | (11,) | -9.999999848243207e+30 | Pitch angle of the particle [0, 180] |
| Pitch_LABL | CDF_CHAR | | (11,) | | Pitch_LABL |
| Mode_Ion | CDF_UINT1 | | (3255,) | 255 | Mode of ion data |
| Mode_Ele | CDF_UINT1 | | (3940,) | 255 | Mode of electron data |
| Counts_E_Omni | CDF_FLOAT | | (3940, 72) | -9.999999848243207e+30 | HOPE electron counts |
| Counts_E | CDF_FLOAT | | (3940, 11, 72) | -9.999999848243207e+30 | HOPE electron counts |
| Counts_He_Omni | CDF_FLOAT | | (3255, 72) | -9.999999848243207e+30 | HOPE helium counts |
| Counts_He | CDF_FLOAT | | (3255, 11, 72) | -9.999999848243207e+30 | HOPE helium counts |
| Counts_O_Omni | CDF_FLOAT | | (3255, 72) | -9.999999848243207e+30 | HOPE Oxygen counts |
| Counts_O | CDF_FLOAT | | (3255, 11, 72) | -9.999999848243207e+30 | HOPE Oxygen counts |
| Counts_P_Omni | CDF_FLOAT | | (3255, 72) | -9.999999848243207e+30 | HOPE proton counts |
| Counts_P | CDF_FLOAT | | (3255, 11, 72) | -9.999999848243207e+30 | HOPE proton counts |
| Ele_SAMPLES_Omni | CDF_UINT1 | | (3940, 72) | 255 | HOPE differential electron flux |
| Ele_SAMPLES | CDF_UINT1 | | (3940, 11, 72) | 255 | HOPE differential electron flux |
| Ion_SAMPLES_Omni | CDF_UINT1 | | (3255, 72) | 255 | Number of ion measurements per bin |
| Ion_SAMPLES | CDF_UINT1 | | (3255, 11, 72) | 255 | Number of ion measurements per bin |
| FEDO | CDF_FLOAT | s!E-1!Ncm!E-2!Nster!E-1!NkeV!E-1!N | (3940, 72) | -9.999999848243207e+30 | HOPE omnidirectional differential electron flux |
| FEDU | CDF_FLOAT | s!E-1!Ncm!E-2!Nster!E-1!NkeV!E-1!N | (3940, 11, 72) | -9.999999848243207e+30 | HOPE differential electron flux |
| FPDO | CDF_FLOAT | s!E-1!Ncm!E-2!Nster!E-1!NkeV!E-1!N | (3255, 72) | -9.999999848243207e+30 | HOPE omnidirectional differential proton flux |
| FPDU | CDF_FLOAT | s!E-1!Ncm!E-2!Nster!E-1!NkeV!E-1!N | (3255, 11, 72) | -9.999999848243207e+30 | HOPE differential proton flux |
| FODO | CDF_FLOAT | s!E-1!Ncm!E-2!Nster!E-1!NkeV!E-1!N | (3255, 72) | -9.999999848243207e+30 | HOPE omnidirectional differential oxygen flux |
| FODU | CDF_FLOAT | s!E-1!Ncm!E-2!Nster!E-1!NkeV!E-1!N | (3255, 11, 72) | -9.999999848243207e+30 | HOPE differential oxygen flux |
| FHEDO | CDF_FLOAT | s!E-1!Ncm!E-2!Nster!E-1!NkeV!E-1!N | (3255, 72) | -9.999999848243207e+30 | HOPE omnidirectional differential helium flux |
| FHEDU | CDF_FLOAT | s!E-1!Ncm!E-2!Nster!E-1!NkeV!E-1!N | (3255, 11, 72) | -9.999999848243207e+30 | HOPE differential helium flux |
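The inspection output lists a fill value for most variables. Such entries mark missing data and should not be treated as measurements. The following sketch replaces them with NaN using only the standard library; the mask_fill helper and the sample values are hypothetical, and whether EL-PASO handles fill values automatically is not covered here.

```python
import math

# Fill value reported by the inspection script for the CDF_FLOAT variables.
FILL_VALUE = -9.999999848243207e30


def mask_fill(values, fill=FILL_VALUE):
    """Return a copy of values with fill entries replaced by NaN."""
    return [math.nan if math.isclose(v, fill, rel_tol=1e-6) else v for v in values]


# Hypothetical flux samples: two valid values and one fill value.
flux = [1.2e4, FILL_VALUE, 3.4e5]
cleaned = mask_fill(flux)
print(cleaned)
```

For multi-dimensional arrays such as FEDU, the same comparison would typically be done with an array library instead of a list comprehension.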
The next step is to decide which variables you need for processing and to translate them into EL-PASO ExtractionInfos.
from astropy import units as u
import el_paso as ep
extraction_infos = [
ep.ExtractionInfo(
result_key="Epoch",
name_or_column="Epoch_Ele",
unit=ep.units.tt2000,
),
ep.ExtractionInfo(
result_key="Energy_FEDU",
name_or_column="HOPE_ENERGY_Ele",
unit=u.eV,
),
ep.ExtractionInfo(
result_key="FEDU",
name_or_column="FEDU",
unit=(u.cm**2 * u.s * u.sr * u.keV) ** (-1),
),
]
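The unit given for FEDU above corresponds to the Units entry "s!E-1!Ncm!E-2!Nster!E-1!NkeV!E-1!N" reported for that variable in the CDF file. That string appears to use IDL-style text markup, where "!E" starts a superscript and "!N" returns to the normal level; this reading is an assumption. A small sketch with a hypothetical helper makes the string readable:

```python
import re


def idl_to_plain(unit: str) -> str:
    """Turn IDL-style markup such as 's!E-1!N' into 's^-1'."""
    # Assumption: '!E' starts a superscript and '!N' ends it.
    converted = re.sub(r"!E(.*?)!N", r"^\1 ", unit)
    return converted.strip()


raw = "s!E-1!Ncm!E-2!Nster!E-1!NkeV!E-1!N"
print(idl_to_plain(raw))
```

The result, s^-1 cm^-2 ster^-1 keV^-1, matches the astropy expression (u.cm**2 * u.s * u.sr * u.keV) ** (-1).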
In this example, we download the data from the server, as is done for most data sets. The file_name_stem contains a pattern (YYYYMMDD) that describes the date of the file; while loading the data, this pattern is replaced by the correct date. A similar pattern (YYYY) is used in the download URL. The file_name_stem also contains the regular expression '.{6}', which matches files with different versions. The most up-to-date version will always be downloaded.
from datetime import datetime, timezone
start_time = datetime(2017, 7, 14, tzinfo=timezone.utc)
end_time = datetime(2017, 7, 14, 23, 59, 59, tzinfo=timezone.utc)
file_name_stem = "rbspa_rel04_ect-hope-pa-l3_YYYYMMDD_.{6}.cdf"
ep.download(
start_time,
end_time,
save_path=".",
download_url="https://spdf.gsfc.nasa.gov/pub/data/rbsp/rbspa/l3/ect/hope/pitchangle/rel04/YYYY/",
file_name_stem=file_name_stem,
file_cadence="daily",
method="request",
skip_existing=True,
)
variables = ep.extract_variables_from_files(
start_time, end_time, "daily", data_path=".", file_name_stem=file_name_stem, extraction_infos=extraction_infos
)
variables
[INFO ] 2026-06-05 16:17:12 - el_paso.download:236 - File already exists, skipping download: rbspa_rel04_ect-hope-pa-l3_20170714_v7.4.0.cdf
[INFO ] 2026-06-05 16:17:12 - el_paso.download:6 - download finished in 0.003 seconds
[INFO ] 2026-06-05 16:17:12 - el_paso.extract_variables_from_files:112 - Extracting variables ...
{'Epoch': Variable holding (3940,) data points with metadata: VariableMetadata(unit=Unit("tt2000"), original_cadence_seconds=0, source_files=['rbspa_rel04_ect-hope-pa-l3_20170714_v7.4.0.cdf'], description='', processing_notes='', standard_name=''),
'Energy_FEDU': Variable holding (3940, 72) data points with metadata: VariableMetadata(unit=Unit("eV"), original_cadence_seconds=0, source_files=['rbspa_rel04_ect-hope-pa-l3_20170714_v7.4.0.cdf'], description='', processing_notes='', standard_name=''),
'FEDU': Variable holding (3940, 11, 72) data points with metadata: VariableMetadata(unit=Unit("1 / (keV s sr cm2)"), original_cadence_seconds=0, source_files=['rbspa_rel04_ect-hope-pa-l3_20170714_v7.4.0.cdf'], description='', processing_notes='', standard_name='')}
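The date and version patterns in file_name_stem can be illustrated with a short standard-library sketch. It only mimics the behaviour described above and is not EL-PASO's implementation; the candidate list is a hypothetical directory listing, and picking the newest version by string comparison is an assumption.

```python
import re
from datetime import datetime, timezone

file_name_stem = "rbspa_rel04_ect-hope-pa-l3_YYYYMMDD_.{6}.cdf"
day = datetime(2017, 7, 14, tzinfo=timezone.utc)

# Substitute the date placeholder; the rest is treated as a regular expression.
pattern = re.compile(file_name_stem.replace("YYYYMMDD", day.strftime("%Y%m%d")))

# Hypothetical directory listing with two versions of the same daily file.
candidates = [
    "rbspa_rel04_ect-hope-pa-l3_20170714_v7.3.0.cdf",
    "rbspa_rel04_ect-hope-pa-l3_20170714_v7.4.0.cdf",
    "rbspa_rel04_ect-hope-pa-l3_20170715_v7.4.0.cdf",
]
matches = [name for name in candidates if pattern.fullmatch(name)]

# Assumption: the lexicographically largest name is the newest version.
latest = max(matches)
print(latest)
```

Here '.{6}' matches the six characters of a version tag such as v7.4.0, so both versions for 14 July match while the file for 15 July does not.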