ProbaV

el_paso.recipes.probav.process_ept_electron_fluxes.process_ept_electron_fluxes

process_ept_electron_fluxes

Process PROBA-V EPT electron flux data into pitch-angle-resolved fluxes with magnetic field coordinates.

This downloads the PROBA-V EPT L1d data for the given time range from the ESA SWE service, extracts the per-channel differential electron fluxes, quality flag (chi2), local pitch angle, timestamps, and spacecraft position, and combines the six energy channels into a single FEDU flux variable. Values with a chi2 quality flag at or above a fixed threshold are masked to NaN. The local pitch angle is folded around 90 degrees, the timestamps are converted to POSIX time, and center energies are computed from a fixed set of energy limits. The data is then time-binned to bin_cadence, the spacecraft position is transformed from spherical to GEO coordinates, and magnetic field model quantities (B_Calc, B_Eq, MLT_Eq, R_Eq, Alpha_Eq, L_m) are computed using the T89 model. The resulting variables are saved to disk (appending to existing files) using a GFZ and/or NetCDF daily LEO/RB saving strategy depending on save_strategy.
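The pitch-angle folding and the center-energy step described above can be sketched in isolation with NumPy. This is a minimal illustration, not the recipe itself: the channel edges below are made-up values, since the recipe's actual EPT_ENERGY_LIMITS constant is not shown in this section.

```python
import numpy as np

# Hypothetical channel edges in MeV, for illustration only. The recipe's real
# EPT_ENERGY_LIMITS constant is defined elsewhere and is not reproduced here.
energy_limits = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0])

# Convolving with [1, 1] in "valid" mode sums each pair of neighbouring edges;
# dividing by 2 turns the sums into arithmetic midpoints, one per channel.
center_energies = np.convolve(energy_limits, np.ones(2), "valid") / 2

# Fold local pitch angles around 90 degrees: values above 90 are mapped to
# 180 - alpha, so the result always lies in [0, 90].
pitch_angles = np.array([10.0, 90.0, 120.0, 175.0])
folded = np.where(pitch_angles > 90, 180 - pitch_angles, pitch_angles)

print(center_energies)
print(folded)
```

Seven edges give six center energies, which matches the six electron channels combined into FEDU.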

Parameters:

Name (Type, Default): Description

raw_data_path (str | Path, required): Base directory used for downloading and locating the raw EPT data files.
processed_data_path (str | Path, required): Base directory in which the processed output files are saved.
start_time (datetime, required): Start of the time range to process.
end_time (datetime, required): End of the time range to process.
num_cores (int, default 32): Number of CPU cores used for the magnetic field computations.
bin_cadence (timedelta, default timedelta(seconds=10)): Time binning cadence applied to the extracted variables.
skip_existing (bool, default True): If True, skip downloading files that already exist locally.
client_id (str | None, default None): Client ID for the ESA SWE authentication. If None, it is read from the CLIENT_ID environment variable.
client_secret (str | None, default None): Client secret for the ESA SWE authentication. If None, it is read from the CLIENT_SECRET environment variable.
save_strategy (Literal['gfz', 'netcdf', 'both'], default 'netcdf'): Which saving strategy (or strategies) to use for the processed output.

Raises:

ValueError: If client_id or client_secret is not provided and not available via the CLIENT_ID/CLIENT_SECRET environment variables.
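The credential lookup behind this ValueError can be sketched as a small helper. The function name resolve_credential is hypothetical and not part of el_paso; only the order of precedence (explicit argument, then environment variable, then error) is taken from the source.

```python
from __future__ import annotations

import os


def resolve_credential(value: str | None, env_name: str, label: str) -> str:
    """Return the explicit value if given, else the environment variable, else raise.

    Hypothetical helper for illustration; the recipe inlines this logic.
    """
    if value is None:
        value = os.environ.get(env_name)
    if value is None:
        msg = f"{label} not found! Either load it from environment variables or pass it as an argument."
        raise ValueError(msg)
    return value


# An explicitly passed value always wins over the environment.
client_id = resolve_credential("my-client-id", "CLIENT_ID", "Client ID")
print(client_id)
```

Note that an environment variable set to an empty string is not None, so it would pass this check.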

Source code in el_paso/recipes/probav/process_ept_electron_fluxes.py
@timed_function("process_ept_electron_fluxes")
def process_ept_electron_fluxes(
    raw_data_path: str | Path,
    processed_data_path: str | Path,
    start_time: datetime,
    end_time: datetime,
    num_cores: int = 32,
    bin_cadence: timedelta = timedelta(seconds=10),
    skip_existing: bool = True,  # noqa: FBT001, FBT002,
    client_id: str | None = None,
    client_secret: str | None = None,
    save_strategy: typing.Literal["gfz", "netcdf", "both"] = "netcdf",
) -> None:
    """Process PROBA-V EPT electron flux data into pitch-angle-resolved fluxes with magnetic field coordinates.

    This downloads the PROBA-V EPT L1d data for the given time range from the ESA SWE service,
    extracts the per-channel differential electron fluxes, quality flag (chi2), local pitch angle,
    timestamps, and spacecraft position, and combines the six energy channels into a single FEDU
    flux variable. Values with a chi2 quality flag at or above a fixed threshold are masked to NaN. The
    local pitch angle is folded around 90 degrees, the timestamps are converted to POSIX time, and
    center energies are computed from a fixed set of energy limits. The data is then time-binned to
    `bin_cadence`, the spacecraft position is transformed from spherical to GEO coordinates, and
    magnetic field model quantities (B_Calc, B_Eq, MLT_Eq, R_Eq, Alpha_Eq, L_m) are computed using
    the T89 model. The resulting variables are saved to disk (appending to existing files) using a
    GFZ and/or NetCDF daily LEO/RB saving strategy depending on `save_strategy`.

    Args:
        raw_data_path (str | Path): Base directory used for downloading and locating the raw EPT data files.
        processed_data_path (str | Path): Base directory in which the processed output files are saved.
        start_time (datetime): Start of the time range to process.
        end_time (datetime): End of the time range to process.
        num_cores (int, optional): Number of CPU cores used for the magnetic field computations. Defaults to 32.
        bin_cadence (timedelta, optional): Time binning cadence applied to the extracted variables.
            Defaults to timedelta(seconds=10).
        skip_existing (bool, optional): If True, skip downloading files that already exist locally.
            Defaults to True.
        client_id (str | None, optional): Client ID for the ESA SWE authentication. If None, it is read
            from the `CLIENT_ID` environment variable. Defaults to None.
        client_secret (str | None, optional): Client secret for the ESA SWE authentication. If None, it
            is read from the `CLIENT_SECRET` environment variable. Defaults to None.
        save_strategy (typing.Literal["gfz", "netcdf", "both"], optional): Which saving strategy (or
            strategies) to use for the processed output. Defaults to "netcdf".

    Raises:
        ValueError: If `client_id` or `client_secret` is not provided and not available via the
            `CLIENT_ID`/`CLIENT_SECRET` environment variables.
    """
    if client_id is None:
        client_id = os.environ.get("CLIENT_ID")
    if client_secret is None:
        client_secret = os.environ.get("CLIENT_SECRET")

    if client_id is None:
        msg = "Client ID not found! Either load it from environment variables or pass it as an argument."
        raise ValueError(msg)

    if client_secret is None:
        msg = "Client secret not found! Either load it from environment variables or pass it as an argument."
        raise ValueError(msg)

    data_path_stem = f"{raw_data_path}/PROBAV/YYYY/MM/"

    url = "https://sso-csr-ucl-ac-be.content.swe.s2p.esa.int/r109_111/ascii/YYYYMM/PROBAV_EPT_YYYYMMDD_L1d.dat.gz"
    rename_file_name_stem = "PROBAV_ept_YYYYMMDD_L1d.csv"

    ep.download(
        start_time,
        end_time,
        save_path=data_path_stem,
        method="esa_swe",
        file_cadence="daily",
        download_url=url,
        file_name_stem="",
        rename_file_name_stem=rename_file_name_stem,
        authentication_info=(client_id, client_secret),
        skip_existing=skip_existing,
    )

    flux_unit = typing.cast("u.Unit", (u.cm**2 * u.s * u.sr * u.MeV) ** (-1))

    extraction_infos = [
        ep.ExtractionInfo(result_key="year", name_or_column="Y", unit=u.dimensionless_unscaled, np_dtype=np.int32),
        ep.ExtractionInfo(result_key="month", name_or_column="M", unit=u.dimensionless_unscaled, np_dtype=np.int32),
        ep.ExtractionInfo(result_key="day", name_or_column="D", unit=u.dimensionless_unscaled, np_dtype=np.int32),
        ep.ExtractionInfo(result_key="hour", name_or_column="H", unit=u.dimensionless_unscaled, np_dtype=np.int32),
        ep.ExtractionInfo(result_key="minute", name_or_column="MI", unit=u.dimensionless_unscaled, np_dtype=np.int32),
        ep.ExtractionInfo(result_key="second", name_or_column="S", unit=u.dimensionless_unscaled, np_dtype=np.int32),
        ep.ExtractionInfo(
            result_key="millisecond", name_or_column="mS", unit=u.dimensionless_unscaled, np_dtype=np.int32
        ),
        ep.ExtractionInfo(result_key="flag", name_or_column="FLAG", unit=u.dimensionless_unscaled),
        ep.ExtractionInfo(result_key="chi2", name_or_column="e-Chi2", unit=u.dimensionless_unscaled),
        ep.ExtractionInfo(result_key="ch0", name_or_column="e-fl-00", unit=flux_unit),
        ep.ExtractionInfo(result_key="ch1", name_or_column="e-fl-01", unit=flux_unit),
        ep.ExtractionInfo(result_key="ch2", name_or_column="e-fl-02", unit=flux_unit),
        ep.ExtractionInfo(result_key="ch3", name_or_column="e-fl-03", unit=flux_unit),
        ep.ExtractionInfo(result_key="ch4", name_or_column="e-fl-04", unit=flux_unit),
        ep.ExtractionInfo(result_key="ch5", name_or_column="e-fl-05", unit=flux_unit),
        ep.ExtractionInfo(result_key="PA_local", name_or_column="Pitch", unit=u.deg),
        ep.ExtractionInfo(result_key="rad", name_or_column="Rad", unit=u.km),
        ep.ExtractionInfo(result_key="lon", name_or_column="Long", unit=u.deg),
        ep.ExtractionInfo(result_key="lat", name_or_column="Lat", unit=u.deg),
    ]

    variables = ep.extract_variables_from_files(
        start_time,
        end_time,
        file_cadence="daily",
        data_path=data_path_stem,
        file_name_stem=rename_file_name_stem,
        extraction_infos=extraction_infos,
        pd_read_csv_kwargs={"sep": r"\s+", "header": 24},
    )

    # create flux variable
    flux_data = np.stack(
        [
            variables["ch0"].get_data(),
            variables["ch1"].get_data(),
            variables["ch2"].get_data(),
            variables["ch3"].get_data(),
            variables["ch4"].get_data(),
            variables["ch5"].get_data(),
        ]
    ).T
    flux_data = flux_data[:, :, np.newaxis]
    variables["FEDU"] = ep.Variable(data=flux_data, original_unit=flux_unit)
    del variables["ch0"], variables["ch1"], variables["ch2"], variables["ch3"], variables["ch4"], variables["ch5"]

    variables["FEDU"].apply_thresholds_on_data(lower_threshold=1e-21)

    # apply chi-2 quality check
    variables["FEDU"].apply_mask(variables["chi2"].get_data().astype(np.float64) < CHI2_BAD_QUALITY_THRESHOLD)
    variables["FEDU"].metadata.add_processing_note(
        f"Values with CHI2 >= {CHI2_BAD_QUALITY_THRESHOLD:0.1f} are set to NaN."
    )

    # expand PA variable
    variables["PA_local"].set_data(variables["PA_local"].get_data()[:, np.newaxis], unit="same")
    pa_arr = variables["PA_local"].get_data(u.deg)
    pa_arr = np.where(pa_arr > 90, 180 - pa_arr, pa_arr)
    variables["PA_local"].set_data(pa_arr, unit=u.deg)

    # create Epoch variable; datetime takes microseconds, so milliseconds are scaled by 1e3
    epoch_datetime = [
        datetime(y, m, d, h, mi, s, int(ms), tzinfo=timezone.utc)
        for (y, m, d, h, mi, s, ms) in zip(
            variables["year"].get_data(),
            variables["month"].get_data(),
            variables["day"].get_data(),
            variables["hour"].get_data(),
            variables["minute"].get_data(),
            variables["second"].get_data(),
            variables["millisecond"].get_data().astype(np.int32) * 1e3,
            strict=True,
        )
    ]
    epoch_data = [t.timestamp() for t in epoch_datetime]

    variables["Epoch"] = ep.Variable(data=np.asarray(epoch_data), original_unit=ep.units.posixtime)
    del variables["year"], variables["month"], variables["day"], variables["hour"], variables["minute"]
    del variables["second"], variables["millisecond"]

    # calculate mean of energy limits to get center energies
    energy_data = np.convolve(EPT_ENERGY_LIMITS, np.ones(2), "valid") / 2
    variables["Energy_FEDU"] = ep.Variable(data=energy_data, original_unit=u.MeV)
    variables["Energy_FEDU"].metadata.add_processing_note(
        f"Created by calculating center energies from {', '.join(map(str, EPT_ENERGY_LIMITS))}."
    )

    time_bin_methods = {
        "Energy_FEDU": ep.TimeBinMethod.Repeat,
        "rad": ep.TimeBinMethod.NanMean,
        "lat": ep.TimeBinMethod.NanMean,
        "lon": ep.TimeBinMethod.NanMean,
        "PA_local": ep.TimeBinMethod.NanMean,
        "FEDU": ep.TimeBinMethod.NanMedian,
    }

    binned_time_var = ep.processing.bin_by_time(
        variables["Epoch"], variables, time_bin_methods, bin_cadence, start_time=start_time, end_time=end_time
    )

    xsph_arr = np.stack(
        (
            variables["rad"].get_data(ep.units.RE),
            variables["lat"].get_data(u.degree),
            variables["lon"].get_data(u.degree),
        )
    ).T.astype(np.float64)
    model_coord = ep.processing.magnetic_field_utils.Coords()

    epoch_datetime = [datetime.fromtimestamp(t, tz=timezone.utc) for t in binned_time_var.get_data()]
    xgeo_arr = model_coord.transform(epoch_datetime, xsph_arr, ep.IRBEM_SYSAXIS_SPH, ep.IRBEM_SYSAXIS_GEO)
    variables["xGEO"] = ep.Variable(data=xgeo_arr, original_unit=ep.units.RE)

    del variables["rad"], variables["lon"], variables["lat"]

    variables_to_compute: ep.processing.VariableRequest = [
        ("B_Calc", "T89"),
        ("B_Eq", "T89"),
        ("MLT_Eq", "T89"),
        ("R_Eq", "T89"),
        ("Alpha_Eq", "T89"),
        ("L_m", "T89"),
    ]

    magnetic_field_variables = ep.processing.compute_magnetic_field_variables(
        time_var=binned_time_var,
        xgeo_var=variables["xGEO"],
        energy_var=variables["Energy_FEDU"],
        pa_local_var=variables["PA_local"],
        particle_species="electron",
        variables_to_compute=variables_to_compute,
        irbem_options=ep.processing.magnetic_field_utils.IrbemOptions(
            lstar_quantity=ep.processing.magnetic_field_utils.LstarQuantity.NONE,
        ),
        num_cores=num_cores,
    )

    variables |= magnetic_field_variables

    variables_to_save: dict[ep.typing.InternalName, ep.Variable] = {
        "Epoch": binned_time_var,
        "FEDU": variables["FEDU"],
        "Energy_FEDU": variables["Energy_FEDU"],
        "Alpha": variables["PA_local"],
        "Alpha_Eq": magnetic_field_variables["Alpha_Eq_T89"],
        "R_Eq": magnetic_field_variables["R_Eq_T89"],
        "MLT": magnetic_field_variables["MLT_Eq_T89"],
        "L_m": magnetic_field_variables["L_m_T89"],
        "B_Calc": magnetic_field_variables["B_Calc_T89"],
        "B_Eq": magnetic_field_variables["B_Eq_T89"],
        "Position": variables["xGEO"],
    }

    # Save once per selected strategy so that "both" writes GFZ and NetCDF output.
    if save_strategy in ("gfz", "both"):
        gfz_strategy = ep.saving_strategies.GFZStrategy(
            processed_data_path,
            mission="PROBAV",
            satellite="probav",
            instrument="ept",
            mag_field="T89",
            data_standard=ep.data_standards.GFZStandard(),
        )
        ep.save(variables_to_save, gfz_strategy, start_time, end_time, time_var=binned_time_var, append=True)

    if save_strategy in ("netcdf", "both"):
        netcdf_strategy = ep.saving_strategies.DailyLEORBStrategy(
            base_data_path=Path(processed_data_path),
            mission="PROBAV",
            satellite="probav",
            instrument="ept",
            mag_field="T89",
            file_format=".nc",
            data_standard=ep.data_standards.GFZStandard(),
        )
        ep.save(variables_to_save, netcdf_strategy, start_time, end_time, time_var=binned_time_var, append=True)
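The Epoch construction in the listing above can be tried on its own with the standard library. This is a minimal sketch for a single record; to_posix is a hypothetical helper name, and the sample timestamp is invented.

```python
from datetime import datetime, timezone


def to_posix(year: int, month: int, day: int, hour: int, minute: int, second: int, millisecond: int) -> float:
    """Convert split UTC time fields (as in the Y, M, D, H, MI, S, mS columns) to POSIX seconds."""
    # The seventh positional argument of datetime is microseconds, so the
    # millisecond field has to be scaled by 1000 first.
    dt = datetime(year, month, day, hour, minute, second, millisecond * 1000, tzinfo=timezone.utc)
    return dt.timestamp()


# Invented sample record: 2014-01-01 00:00:10.500 UTC.
posix = to_posix(2014, 1, 1, 0, 0, 10, 500)
print(posix)
```

Setting tzinfo explicitly matters: without it, timestamp() would interpret the fields in the local time zone of the machine.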

el_paso.recipes.probav.process_ept_proton_fluxes.process_ept_proton_fluxes

process_ept_proton_fluxes

Process PROBA-V EPT proton flux data into pitch-angle-resolved fluxes with magnetic field coordinates.

This downloads the PROBA-V EPT L1d data for the given time range from the ESA SWE service, extracts the per-channel differential proton fluxes, quality flag (chi2), local pitch angle, timestamps, and spacecraft position, and combines the ten energy channels into a single FPDU flux variable. Values with a chi2 quality flag at or above a fixed threshold are masked to NaN. The local pitch angle is folded around 90 degrees, the timestamps are converted to POSIX time, and center energies are computed from a fixed set of energy limits. The data is then time-binned to bin_cadence, the spacecraft position is transformed from spherical to GEO coordinates, and magnetic field model quantities (B_Calc, B_Eq, MLT_Eq, R_Eq, Alpha_Eq, L_m) are computed using the T89 model. The resulting variables are saved to disk (appending to existing files) using a GFZ and/or NetCDF daily LEO/RB saving strategy depending on save_strategy.
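How ten per-channel series become one three-dimensional flux array, and how a chi2 check blanks whole time steps, can be sketched with plain NumPy. The data, the threshold value, and the use of np.where are assumptions for illustration; the recipe uses the library's own apply_mask and a module constant CHI2_BAD_QUALITY_THRESHOLD whose value is not shown here. The trailing axis of length one presumably corresponds to the single local pitch angle per time step.

```python
import numpy as np

n_time, n_channels = 4, 10
rng = np.random.default_rng(0)

# One synthetic 1-D time series per energy channel.
channels = [rng.random(n_time) for _ in range(n_channels)]

# np.stack gives (channel, time); transposing gives (time, channel); the new
# trailing axis yields the final (time, channel, 1) layout.
flux = np.stack(channels).T[:, :, np.newaxis]

# Invented chi2 values and an illustrative threshold (not the recipe's value).
chi2 = np.array([0.5, 3.0, 1.0, 7.0])
chi2_threshold = 2.0

# Keep time steps with chi2 below the threshold; set all others to NaN.
good = chi2 < chi2_threshold
flux_masked = np.where(good[:, np.newaxis, np.newaxis], flux, np.nan)

print(flux.shape)
print(np.isnan(flux_masked).all(axis=(1, 2)))
```

Because the quality value is per time step, a single bad chi2 removes all ten channels for that step.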

Parameters:

Name (Type, Default): Description

raw_data_path (str | Path, required): Base directory used for downloading and locating the raw EPT data files.
processed_data_path (str | Path, required): Base directory in which the processed output files are saved.
start_time (datetime, required): Start of the time range to process.
end_time (datetime, required): End of the time range to process.
num_cores (int, default 32): Number of CPU cores used for the magnetic field computations.
bin_cadence (timedelta, default timedelta(seconds=10)): Time binning cadence applied to the extracted variables.
skip_existing (bool, default True): If True, skip downloading files that already exist locally.
client_id (str | None, default None): Client ID for the ESA SWE authentication. If None, it is read from the CLIENT_ID environment variable.
client_secret (str | None, default None): Client secret for the ESA SWE authentication. If None, it is read from the CLIENT_SECRET environment variable.
save_strategy (Literal['gfz', 'netcdf', 'both'], default 'netcdf'): Which saving strategy (or strategies) to use for the processed output.

Raises:

ValueError: If client_id or client_secret is not provided and not available via the CLIENT_ID/CLIENT_SECRET environment variables.

Source code in el_paso/recipes/probav/process_ept_proton_fluxes.py
@timed_function("process_ept_proton_fluxes")
def process_ept_proton_fluxes(
    raw_data_path: str | Path,
    processed_data_path: str | Path,
    start_time: datetime,
    end_time: datetime,
    num_cores: int = 32,
    bin_cadence: timedelta = timedelta(seconds=10),
    skip_existing: bool = True,  # noqa: FBT001, FBT002,
    client_id: str | None = None,
    client_secret: str | None = None,
    save_strategy: typing.Literal["gfz", "netcdf", "both"] = "netcdf",
) -> None:
    """Process PROBA-V EPT proton flux data into pitch-angle-resolved fluxes with magnetic field coordinates.

    This downloads the PROBA-V EPT L1d data for the given time range from the ESA SWE service,
    extracts the per-channel differential proton fluxes, quality flag (chi2), local pitch angle,
    timestamps, and spacecraft position, and combines the ten energy channels into a single FPDU
    flux variable. Values with a chi2 quality flag at or above a fixed threshold are masked to NaN. The
    local pitch angle is folded around 90 degrees, the timestamps are converted to POSIX time, and
    center energies are computed from a fixed set of energy limits. The data is then time-binned to
    `bin_cadence`, the spacecraft position is transformed from spherical to GEO coordinates, and
    magnetic field model quantities (B_Calc, B_Eq, MLT_Eq, R_Eq, Alpha_Eq, L_m) are computed using
    the T89 model. The resulting variables are saved to disk (appending to existing files) using a
    GFZ and/or NetCDF daily LEO/RB saving strategy depending on `save_strategy`.

    Args:
        raw_data_path (str | Path): Base directory used for downloading and locating the raw EPT data files.
        processed_data_path (str | Path): Base directory in which the processed output files are saved.
        start_time (datetime): Start of the time range to process.
        end_time (datetime): End of the time range to process.
        num_cores (int, optional): Number of CPU cores used for the magnetic field computations. Defaults to 32.
        bin_cadence (timedelta, optional): Time binning cadence applied to the extracted variables.
            Defaults to timedelta(seconds=10).
        skip_existing (bool, optional): If True, skip downloading files that already exist locally.
            Defaults to True.
        client_id (str | None, optional): Client ID for the ESA SWE authentication. If None, it is read
            from the `CLIENT_ID` environment variable. Defaults to None.
        client_secret (str | None, optional): Client secret for the ESA SWE authentication. If None, it
            is read from the `CLIENT_SECRET` environment variable. Defaults to None.
        save_strategy (typing.Literal["gfz", "netcdf", "both"], optional): Which saving strategy (or
            strategies) to use for the processed output. Defaults to "netcdf".

    Raises:
        ValueError: If `client_id` or `client_secret` is not provided and not available via the
            `CLIENT_ID`/`CLIENT_SECRET` environment variables.
    """
    if client_id is None:
        client_id = os.environ.get("CLIENT_ID")
    if client_secret is None:
        client_secret = os.environ.get("CLIENT_SECRET")

    if client_id is None:
        msg = "Client ID not found! Either load it from environment variables or pass it as an argument."
        raise ValueError(msg)

    if client_secret is None:
        msg = "Client secret not found! Either load it from environment variables or pass it as an argument."
        raise ValueError(msg)

    data_path_stem = f"{raw_data_path}/PROBAV/YYYY/MM/"

    url = "https://sso-csr-ucl-ac-be.content.swe.s2p.esa.int/r109_111/ascii/YYYYMM/PROBAV_EPT_YYYYMMDD_L1d.dat.gz"
    rename_file_name_stem = "PROBAV_ept_YYYYMMDD_L1d.csv"

    ep.download(
        start_time,
        end_time,
        save_path=data_path_stem,
        method="esa_swe",
        file_cadence="daily",
        download_url=url,
        file_name_stem="",
        rename_file_name_stem=rename_file_name_stem,
        authentication_info=(client_id, client_secret),
        skip_existing=skip_existing,
    )

    flux_unit = typing.cast("u.Unit", (u.cm**2 * u.s * u.sr * u.MeV) ** (-1))

    extraction_infos = [
        ep.ExtractionInfo(result_key="year", name_or_column="Y", unit=u.dimensionless_unscaled, np_dtype=np.int32),
        ep.ExtractionInfo(result_key="month", name_or_column="M", unit=u.dimensionless_unscaled, np_dtype=np.int32),
        ep.ExtractionInfo(result_key="day", name_or_column="D", unit=u.dimensionless_unscaled, np_dtype=np.int32),
        ep.ExtractionInfo(result_key="hour", name_or_column="H", unit=u.dimensionless_unscaled, np_dtype=np.int32),
        ep.ExtractionInfo(result_key="minute", name_or_column="MI", unit=u.dimensionless_unscaled, np_dtype=np.int32),
        ep.ExtractionInfo(result_key="second", name_or_column="S", unit=u.dimensionless_unscaled, np_dtype=np.int32),
        ep.ExtractionInfo(
            result_key="millisecond", name_or_column="mS", unit=u.dimensionless_unscaled, np_dtype=np.int32
        ),
        ep.ExtractionInfo(result_key="flag", name_or_column="FLAG", unit=u.dimensionless_unscaled),
        ep.ExtractionInfo(result_key="chi2", name_or_column="p-Chi2", unit=u.dimensionless_unscaled),
        ep.ExtractionInfo(result_key="ch0", name_or_column="p-fl-00", unit=flux_unit),
        ep.ExtractionInfo(result_key="ch1", name_or_column="p-fl-01", unit=flux_unit),
        ep.ExtractionInfo(result_key="ch2", name_or_column="p-fl-02", unit=flux_unit),
        ep.ExtractionInfo(result_key="ch3", name_or_column="p-fl-03", unit=flux_unit),
        ep.ExtractionInfo(result_key="ch4", name_or_column="p-fl-04", unit=flux_unit),
        ep.ExtractionInfo(result_key="ch5", name_or_column="p-fl-05", unit=flux_unit),
        ep.ExtractionInfo(result_key="ch6", name_or_column="p-fl-06", unit=flux_unit),
        ep.ExtractionInfo(result_key="ch7", name_or_column="p-fl-07", unit=flux_unit),
        ep.ExtractionInfo(result_key="ch8", name_or_column="p-fl-08", unit=flux_unit),
        ep.ExtractionInfo(result_key="ch9", name_or_column="p-fl-09", unit=flux_unit),
        ep.ExtractionInfo(result_key="PA_local", name_or_column="Pitch", unit=u.deg),
        ep.ExtractionInfo(result_key="rad", name_or_column="Rad", unit=u.km),
        ep.ExtractionInfo(result_key="lon", name_or_column="Long", unit=u.deg),
        ep.ExtractionInfo(result_key="lat", name_or_column="Lat", unit=u.deg),
    ]

    variables = ep.extract_variables_from_files(
        start_time,
        end_time,
        file_cadence="daily",
        data_path=data_path_stem,
        file_name_stem=rename_file_name_stem,
        extraction_infos=extraction_infos,
        pd_read_csv_kwargs={"sep": r"\s+", "header": 24},
    )

    # create flux variable
    flux_data = np.stack(
        [
            variables["ch0"].get_data(),
            variables["ch1"].get_data(),
            variables["ch2"].get_data(),
            variables["ch3"].get_data(),
            variables["ch4"].get_data(),
            variables["ch5"].get_data(),
            variables["ch6"].get_data(),
            variables["ch7"].get_data(),
            variables["ch8"].get_data(),
            variables["ch9"].get_data(),
        ]
    ).T
    flux_data = flux_data[:, :, np.newaxis]
    variables["FPDU"] = ep.Variable(data=flux_data, original_unit=flux_unit)
    del variables["ch0"], variables["ch1"], variables["ch2"], variables["ch3"], variables["ch4"]
    del variables["ch5"], variables["ch6"], variables["ch7"], variables["ch8"], variables["ch9"]

    variables["FPDU"].apply_thresholds_on_data(lower_threshold=1e-21)

    # apply chi-2 quality check
    variables["FPDU"].apply_mask(variables["chi2"].get_data().astype(np.float64) < CHI2_BAD_QUALITY_THRESHOLD)
    variables["FPDU"].metadata.add_processing_note(
        f"Values with CHI2 >= {CHI2_BAD_QUALITY_THRESHOLD:0.1f} are set to NaN."
    )

    # expand PA variable
    variables["PA_local"].set_data(variables["PA_local"].get_data()[:, np.newaxis], unit="same")
    pa_arr = variables["PA_local"].get_data(u.deg)
    pa_arr = np.where(pa_arr > 90, 180 - pa_arr, pa_arr)
    variables["PA_local"].set_data(pa_arr, unit=u.deg)

    # create Epoch variable; datetime takes microseconds, so milliseconds are scaled by 1e3
    epoch_datetime = [
        datetime(y, m, d, h, mi, s, int(ms), tzinfo=timezone.utc)
        for (y, m, d, h, mi, s, ms) in zip(
            variables["year"].get_data(),
            variables["month"].get_data(),
            variables["day"].get_data(),
            variables["hour"].get_data(),
            variables["minute"].get_data(),
            variables["second"].get_data(),
            variables["millisecond"].get_data().astype(np.int32) * 1e3,
            strict=True,
        )
    ]
    epoch_data = [t.timestamp() for t in epoch_datetime]

    variables["Epoch"] = ep.Variable(data=np.asarray(epoch_data), original_unit=ep.units.posixtime)
    del variables["year"], variables["month"], variables["day"], variables["hour"], variables["minute"]
    del variables["second"], variables["millisecond"]

    # calculate mean of energy limits to get center energies
    energy_data = np.convolve(EPT_ENERGY_LIMITS, np.ones(2), "valid") / 2
    variables["Energy_FPDU"] = ep.Variable(data=energy_data, original_unit=u.MeV)
    variables["Energy_FPDU"].metadata.add_processing_note(
        f"Created by calculating center energies from {', '.join(map(str, EPT_ENERGY_LIMITS))}."
    )

    time_bin_methods = {
        "Energy_FPDU": ep.TimeBinMethod.Repeat,
        "rad": ep.TimeBinMethod.NanMean,
        "lat": ep.TimeBinMethod.NanMean,
        "lon": ep.TimeBinMethod.NanMean,
        "PA_local": ep.TimeBinMethod.NanMean,
        "FPDU": ep.TimeBinMethod.NanMedian,
    }

    binned_time_var = ep.processing.bin_by_time(
        variables["Epoch"], variables, time_bin_methods, bin_cadence, start_time=start_time, end_time=end_time
    )

    xsph_arr = np.stack(
        (
            variables["rad"].get_data(ep.units.RE),
            variables["lat"].get_data(u.degree),
            variables["lon"].get_data(u.degree),
        )
    ).T.astype(np.float64)
    model_coord = ep.processing.magnetic_field_utils.Coords()

    epoch_datetime = [datetime.fromtimestamp(t, tz=timezone.utc) for t in binned_time_var.get_data()]
    xgeo_arr = model_coord.transform(epoch_datetime, xsph_arr, ep.IRBEM_SYSAXIS_SPH, ep.IRBEM_SYSAXIS_GEO)
    variables["xGEO"] = ep.Variable(data=xgeo_arr, original_unit=ep.units.RE)

    del variables["rad"], variables["lon"], variables["lat"]

    variables_to_compute: ep.processing.VariableRequest = [
        ("B_Calc", "T89"),
        ("B_Eq", "T89"),
        ("MLT_Eq", "T89"),
        ("R_Eq", "T89"),
        ("Alpha_Eq", "T89"),
        ("L_m", "T89"),
    ]

    magnetic_field_variables = ep.processing.compute_magnetic_field_variables(
        time_var=binned_time_var,
        xgeo_var=variables["xGEO"],
        energy_var=variables["Energy_FPDU"],
        pa_local_var=variables["PA_local"],
        particle_species="proton",
        variables_to_compute=variables_to_compute,
        irbem_options=ep.processing.magnetic_field_utils.IrbemOptions(
            lstar_quantity=ep.processing.magnetic_field_utils.LstarQuantity.NONE,
        ),
        num_cores=num_cores,
    )

    variables |= magnetic_field_variables

    variables_to_save: dict[ep.typing.InternalName, ep.Variable] = {
        "Epoch": binned_time_var,
        "FPDU": variables["FPDU"],
        "Energy_FPDU": variables["Energy_FPDU"],
        "Alpha": variables["PA_local"],
        "Alpha_Eq": magnetic_field_variables["Alpha_Eq_T89"],
        "R_Eq": magnetic_field_variables["R_Eq_T89"],
        "MLT": magnetic_field_variables["MLT_Eq_T89"],
        "L_m": magnetic_field_variables["L_m_T89"],
        "B_Calc": magnetic_field_variables["B_Calc_T89"],
        "B_Eq": magnetic_field_variables["B_Eq_T89"],
        "Position": variables["xGEO"],
    }

    # Save once per selected strategy so that "both" writes GFZ and NetCDF output.
    if save_strategy in ("gfz", "both"):
        gfz_strategy = ep.saving_strategies.GFZStrategy(
            processed_data_path,
            mission="PROBAV",
            satellite="probav",
            instrument="EPT-proton",
            mag_field="T89",
            data_standard=ep.data_standards.GFZStandard(),
        )
        ep.save(variables_to_save, gfz_strategy, start_time, end_time, time_var=binned_time_var, append=True)

    if save_strategy in ("netcdf", "both"):
        netcdf_strategy = ep.saving_strategies.DailyLEORBStrategy(
            base_data_path=Path(processed_data_path),
            mission="PROBAV",
            satellite="probav",
            instrument="EPT-proton",
            mag_field="T89",
            file_format=".nc",
            data_standard=ep.data_standards.GFZStandard(),
        )
        ep.save(variables_to_save, netcdf_strategy, start_time, end_time, time_var=binned_time_var, append=True)
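The way save_strategy selects output formats can be summarised in a few lines. This is a minimal sketch using a hypothetical helper, selected_outputs, that is not part of el_paso; it only mirrors the two membership tests used in both recipes.

```python
from __future__ import annotations

from typing import Literal

SaveStrategy = Literal["gfz", "netcdf", "both"]


def selected_outputs(save_strategy: SaveStrategy) -> list[str]:
    """Return the output formats implied by a save_strategy value (hypothetical helper)."""
    outputs: list[str] = []
    # The two checks are independent, so "both" passes each of them.
    if save_strategy in ("gfz", "both"):
        outputs.append("gfz")
    if save_strategy in ("netcdf", "both"):
        outputs.append("netcdf")
    return outputs


for choice in ("gfz", "netcdf", "both"):
    print(choice, selected_outputs(choice))
```

For "both" to produce two sets of files, one save call is needed per returned entry.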