disdrodb.l0 package
Subpackages
Submodules
disdrodb.l0.l0a_processing module
Functions to process raw text files into DISDRODB L0A Apache Parquet.
- disdrodb.l0.l0a_processing.cast_column_dtypes(df: DataFrame, sensor_name: str, verbose: bool = False) DataFrame[source]
Convert ‘object’ dataframe columns into DISDRODB L0A dtype standards.
- Parameters:
df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns:
Dataframe with corrected columns types.
- Return type:
pd.DataFrame
- disdrodb.l0.l0a_processing.coerce_corrupted_values_to_nan(df: DataFrame, sensor_name: str, verbose: bool = False) DataFrame[source]
Coerce corrupted values in dataframe numeric columns to np.nan.
- Parameters:
df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns:
Dataframe with corrupted values in numeric columns set to np.nan.
- Return type:
pd.DataFrame
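As a rough illustration of the coercion described above, the sketch below uses pandas.to_numeric with errors="coerce". It is not the DISDRODB implementation: the helper name coerce_numeric_columns and the column names are hypothetical, and the real function derives the numeric columns from sensor_name.

```python
import pandas as pd


def coerce_numeric_columns(df: pd.DataFrame, numeric_columns: list) -> pd.DataFrame:
    """Set unparsable entries of the given columns to NaN (hypothetical helper)."""
    df = df.copy()  # leave the input dataframe untouched
    for column in numeric_columns:
        # errors="coerce" turns any value that cannot be parsed into NaN
        df[column] = pd.to_numeric(df[column], errors="coerce")
    return df


# Hypothetical raw dataframe where every column was read as text
raw = pd.DataFrame({"rainfall_rate": ["0.5", "1.2", "x#?"], "sensor_status": ["0", "0", "1"]})
clean = coerce_numeric_columns(raw, ["rainfall_rate"])
```

The corrupted entry becomes NaN while valid entries are parsed as floats; columns not listed are left as they were.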
- disdrodb.l0.l0a_processing.concatenate_dataframe(list_df: list, verbose: bool = False) DataFrame[source]
Concatenate a list of dataframes.
- Parameters:
list_df (list) – List of dataframes.
verbose (bool, optional) – If True, print messages. If False, no print.
- Returns:
Concatenated dataframe.
- Return type:
pd.DataFrame
- Raises:
ValueError – Concatenation can not be done.
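A minimal sketch of the behaviour documented above, assuming the concatenation relies on pandas.concat and that the result is sorted by a "time" column (the sorting step and the helper name concatenate are assumptions, not taken from the source).

```python
import pandas as pd


def concatenate(list_df: list) -> pd.DataFrame:
    """Concatenate dataframes row-wise and re-raise failures as ValueError."""
    try:
        df = pd.concat(list_df, axis=0, ignore_index=True)
    except Exception as e:
        # pd.concat raises, for instance, when the list is empty
        raise ValueError(f"Cannot concatenate the dataframes: {e}") from e
    # Assumption: the output is ordered by time
    return df.sort_values("time").reset_index(drop=True)


a = pd.DataFrame({"time": pd.to_datetime(["2022-01-01 00:01:00"]), "value": [2]})
b = pd.DataFrame({"time": pd.to_datetime(["2022-01-01 00:00:00"]), "value": [1]})
merged = concatenate([a, b])
```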
- disdrodb.l0.l0a_processing.drop_time_periods(df, time_periods)[source]
Drop problematic time periods.
- disdrodb.l0.l0a_processing.preprocess_reader_kwargs(reader_kwargs: dict) dict[source]
Preprocess arguments required to read raw text file into Pandas.
- Parameters:
reader_kwargs (dict) – Initial parameter dictionary.
- Returns:
Parameter dictionary that matches either Pandas or Dask.
- Return type:
dict
- disdrodb.l0.l0a_processing.process_raw_file(filepath, column_names, reader_kwargs, df_sanitizer_fun, sensor_name, verbose=True, issue_dict={})[source]
Read and parse a raw text file into an L0A dataframe.
- Parameters:
filepath (str) – File path.
column_names (list) – Column names.
reader_kwargs (dict) – Pandas read_csv arguments.
df_sanitizer_fun (object, optional) – Sanitizer function to format the dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information. The default is True.
issue_dict (dict) – Issue dictionary providing information on timesteps to remove. The default is an empty dictionary {}. Valid issue_dict keys are ‘timesteps’ and ‘time_periods’. Valid issue_dict values are lists of datetime64 values (with second accuracy). To correctly format and check the validity of the issue_dict, use the disdrodb.l0.issue.check_issue_dict function.
- Returns:
Dataframe
- Return type:
pd.DataFrame
- disdrodb.l0.l0a_processing.read_raw_data(filepath: str, column_names: list, reader_kwargs: dict) DataFrame[source]
Read raw data into a dataframe.
- Parameters:
filepath (str) – Raw file path.
column_names (list) – Column names.
reader_kwargs (dict) – Pandas pd.read_csv arguments.
- Returns:
Pandas dataframe.
- Return type:
pandas.DataFrame
- disdrodb.l0.l0a_processing.read_raw_file_list(file_list: list | str, column_names: list, reader_kwargs: dict, sensor_name: str, verbose: bool, df_sanitizer_fun: object | None = None) DataFrame[source]
Read and parse a list of raw files into a dataframe.
- Parameters:
file_list (Union[list,str]) – File path(s).
column_names (list) – Column names.
reader_kwargs (dict) – Pandas read_csv arguments.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
df_sanitizer_fun (object, optional) – Sanitizer function to format the dataframe.
- Returns:
Dataframe
- Return type:
pd.DataFrame
- Raises:
ValueError – Input parameters can not be used or the raw file can not be processed.
- disdrodb.l0.l0a_processing.remove_corrupted_rows(df)[source]
Remove corrupted rows by checking conversion of raw fields to numeric.
Note: The raw array fields must first be stripped of the delimiters at their start and end.
- disdrodb.l0.l0a_processing.remove_duplicated_timesteps(df: DataFrame, verbose: bool = False)[source]
Remove duplicated timesteps.
Only the first occurrence of each timestep is kept.
- Parameters:
df (pd.DataFrame) – Input dataframe.
verbose (bool) – Whether to print detailed processing information.
- Returns:
Dataframe with valid unique timesteps.
- Return type:
pd.DataFrame
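The "keep the first occurrence" rule can be sketched with pandas drop_duplicates. This is only an illustration under the assumption that the timestamp column is called "time"; the values are made up.

```python
import pandas as pd

# Hypothetical dataframe with a repeated timestep
df = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2022-01-01 00:00:00", "2022-01-01 00:00:00", "2022-01-01 00:01:00"]
        ),
        "rainfall_rate": [0.1, 0.9, 0.3],
    }
)

# keep="first" retains the first row of each group of identical timesteps
deduplicated = df.drop_duplicates(subset="time", keep="first")
```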
- disdrodb.l0.l0a_processing.remove_issue_timesteps(df, issue_dict, verbose=False)[source]
Drop dataframe rows with timesteps listed in the issue dictionary.
- Parameters:
df (pd.DataFrame) – Input dataframe.
issue_dict (dict) – Issue dictionary
- Returns:
Dataframe with problematic timesteps removed.
- Return type:
pd.DataFrame
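The following sketch shows one way rows could be dropped for the ‘timesteps’ key of an issue dictionary. It is an illustration only: the helper drop_issue_timesteps and the sample values are hypothetical, and the handling of ‘time_periods’ is not covered.

```python
import numpy as np
import pandas as pd

# Issue dictionary with datetime64 values at second accuracy
issue_dict = {"timesteps": [np.datetime64("2022-01-01T00:01:00", "s")]}

df = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2022-01-01 00:00:00", "2022-01-01 00:01:00", "2022-01-01 00:02:00"]
        ),
        "rainfall_rate": [0.1, 0.2, 0.3],
    }
)


def drop_issue_timesteps(df: pd.DataFrame, issue_dict: dict) -> pd.DataFrame:
    """Drop rows whose "time" is listed under issue_dict["timesteps"]."""
    timesteps = issue_dict.get("timesteps", [])
    if len(timesteps) == 0:
        return df
    # Align the resolution of the flagged timesteps with the "time" column
    flagged = pd.DatetimeIndex(timesteps).astype(df["time"].dtype)
    return df[~df["time"].isin(flagged)]


kept = drop_issue_timesteps(df, issue_dict)
```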
- disdrodb.l0.l0a_processing.remove_rows_with_missing_time(df: DataFrame, verbose: bool = False)[source]
Remove dataframe rows where the “time” is NaT.
- Parameters:
df (pd.DataFrame) – Input dataframe.
verbose (bool) – Whether to print detailed processing information.
- Returns:
Dataframe with valid timesteps.
- Return type:
pd.DataFrame
- disdrodb.l0.l0a_processing.replace_nan_flags(df, sensor_name, verbose)[source]
Set values corresponding to nan_flags to np.nan.
- Parameters:
df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns:
Dataframe without nan_flags values.
- Return type:
pd.DataFrame
- disdrodb.l0.l0a_processing.set_nan_outside_data_range(df, sensor_name, verbose)[source]
Set values outside the data range as np.nan.
- Parameters:
df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns:
Dataframe without values outside the expected data range.
- Return type:
pd.DataFrame
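A minimal sketch of range masking, assuming the valid range of each variable is available as a (lower, upper) pair. The data_range dictionary below is hypothetical; in DISDRODB the ranges depend on sensor_name.

```python
import pandas as pd

# Hypothetical valid ranges, one (lower, upper) pair per variable
data_range = {"rainfall_rate": (0.0, 1200.0)}


def mask_outside_range(df: pd.DataFrame, data_range: dict) -> pd.DataFrame:
    """Replace values outside [lower, upper] with NaN."""
    df = df.copy()
    for column, (lower, upper) in data_range.items():
        if column in df.columns:
            # where() keeps values satisfying the condition and sets the rest to NaN
            df[column] = df[column].where(df[column].between(lower, upper))
    return df


df = pd.DataFrame({"rainfall_rate": [0.5, -3.0, 5000.0]})
masked = mask_outside_range(df, data_range)
```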
- disdrodb.l0.l0a_processing.set_nan_unvalid_values(df, sensor_name, verbose)[source]
Set invalid (class) values to np.nan.
- Parameters:
df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns:
Dataframe without invalid values.
- Return type:
pd.DataFrame
- disdrodb.l0.l0a_processing.strip_delimiter_from_raw_arrays(df)[source]
Remove the first and last delimiter occurrence from the raw array fields.
- disdrodb.l0.l0a_processing.strip_string_spaces(df: DataFrame, sensor_name: str, verbose: bool = False) DataFrame[source]
Strip leading/trailing spaces from dataframe string columns.
- Parameters:
df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns:
Dataframe with string columns without leading/trailing spaces.
- Return type:
pd.DataFrame
- disdrodb.l0.l0a_processing.write_l0a(df: DataFrame, fpath: str, force: bool = False, verbose: bool = False)[source]
Save the dataframe into an Apache Parquet file.
- Parameters:
df (pd.DataFrame) – Input dataframe.
fpath (str) – Output file path.
force (bool, optional) – Whether to overwrite existing data. If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
verbose (bool, optional) – Whether to print detailed processing information. The default is False.
- Raises:
ValueError – The input dataframe can not be written as an Apache Parquet file.
NotImplementedError – The input dataframe can not be processed.
disdrodb.l0.l0b_processing module
Functions to process DISDRODB L0A files into DISDRODB L0B netCDF files.
- disdrodb.l0.l0b_processing.add_dataset_crs_coords(ds)[source]
Add the CRS coordinate to the xr.Dataset.
- disdrodb.l0.l0b_processing.add_dataset_missing_variables(ds, missing_vars, sensor_name)[source]
Add missing Dataset variables as nan DataArrays.
- disdrodb.l0.l0b_processing.convert_object_variables_to_string(ds: Dataset) Dataset[source]
Convert variables with object dtype to string.
- Parameters:
ds (xr.Dataset) – Input dataset.
- Returns:
Output dataset.
- Return type:
xr.Dataset
- disdrodb.l0.l0b_processing.create_l0b_from_l0a(df: DataFrame, attrs: dict, verbose: bool = False) Dataset[source]
Transform the L0A dataframe to the L0B xr.Dataset.
- Parameters:
df (pd.DataFrame) – DISDRODB L0A dataframe.
attrs (dict) – Station metadata.
verbose (bool, optional) – Whether to print detailed processing information. The default is False.
- Returns:
DISDRODB L0B dataset.
- Return type:
xr.Dataset
- Raises:
ValueError – Error if the DISDRODB L0B xarray dataset can not be created.
- disdrodb.l0.l0b_processing.format_string_array(string: str, n_values: int) array[source]
Split a string with multiple numbers separated by a delimiter into a 1D array.
Example: format_string_array(“2,44,22,33”, 4) returns [ 2. 44. 22. 33.]
If the string is empty (“”), an array of zeros is returned. If the number of values differs from n_values, an array of np.nan is returned.
The function strips potential delimiters at the start and end before splitting.
- Parameters:
string (str) – Input string
n_values (int) – Expected length of the output array.
- Returns:
array of float
- Return type:
np.array
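The three documented cases (empty string, wrong number of values, valid string) can be sketched as follows. This is not the library code: the name format_string_array_sketch is hypothetical and the delimiter is passed explicitly here, whereas the documented function takes only the string and n_values.

```python
import numpy as np


def format_string_array_sketch(string: str, n_values: int, delimiter: str = ",") -> np.ndarray:
    """Convert a delimited string of numbers into a 1D float array."""
    # Empty string: return zeros
    if string == "":
        return np.zeros(n_values)
    # Strip delimiters at the start and end before splitting
    values = string.strip(delimiter).split(delimiter)
    # Unexpected number of values: return NaN
    if len(values) != n_values:
        return np.full(n_values, np.nan)
    return np.array(values, dtype=float)


parsed = format_string_array_sketch("2,44,22,33", 4)
empty = format_string_array_sketch("", 3)
wrong_length = format_string_array_sketch("1,2", 4)
```

Note that this sketch raises a ValueError from NumPy if a field is not numeric; how the real function treats such fields is not stated in the documentation above.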
- disdrodb.l0.l0b_processing.get_bin_coords(sensor_name: str) dict[source]
Retrieve diameter (and velocity) bin coordinates.
- Parameters:
sensor_name (str) – Name of the sensor.
- Returns:
Dictionary with coordinate arrays.
- Return type:
dict
- disdrodb.l0.l0b_processing.infer_split_str(string: str) str[source]
Infer the delimiter inside a string.
- Parameters:
string (str) – Input string.
- Returns:
Inferred delimiter.
- Return type:
str
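One plausible way to infer a delimiter is to count candidate characters and pick the most frequent one. The sketch below assumes the candidates are ";" and "," and returns None when neither occurs; both choices are assumptions for illustration, not the documented behaviour.

```python
def infer_delimiter(string: str, candidates=(";", ",")):
    """Return the most frequent candidate delimiter, or None if none occurs."""
    counts = {delimiter: string.count(delimiter) for delimiter in candidates}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


comma = infer_delimiter("2,44,22,33")
semicolon = infer_delimiter("2;44;22;33")
nothing = infer_delimiter("")
```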
- disdrodb.l0.l0b_processing.preprocess_raw_netcdf(ds, dict_names, sensor_name)[source]
Preprocess a raw netCDF to improve compatibility with DISDRODB standards.
The function checks the validity of dict_names, then renames and subsets the data accordingly. If some variables specified in dict_names are missing, they are added as NaN DataArrays.
- Parameters:
ds (xr.Dataset) – Raw netCDF to be converted to DISDRODB standards.
dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.
sensor_name (str) – Sensor name.
- Returns:
ds – xarray Dataset with DISDRODB-compliant variable naming conventions.
- Return type:
xr.Dataset
- disdrodb.l0.l0b_processing.process_raw_nc(filepath, dict_names, ds_sanitizer_fun, sensor_name, verbose, attrs)[source]
Read and convert a raw netCDF into a DISDRODB L0B netCDF.
- Parameters:
filepath (str) – netCDF file path.
dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.
ds_sanitizer_fun (function) – Sanitizer function to do ad-hoc processing of the xr.Dataset.
attrs (dict) – Global metadata to attach as global attributes to the xr.Dataset.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns:
L0B xr.Dataset
- Return type:
xr.Dataset
- disdrodb.l0.l0b_processing.rechunk_dataset(ds: Dataset, encoding_dict: dict) Dataset[source]
Coerce the dataset arrays to have the chunk size specified in the encoding dictionary.
- Parameters:
ds (xr.Dataset) – Input xarray dataset
encoding_dict (dict) – Dictionary containing the encoding to write the xarray dataset as a netCDF.
- Returns:
Output xarray dataset
- Return type:
xr.Dataset
- disdrodb.l0.l0b_processing.rename_dataset(ds, dict_names)[source]
Rename Dataset variables, coordinates and dimensions.
- disdrodb.l0.l0b_processing.replace_custom_nan_flags(ds, dict_nan_flags)[source]
Set values corresponding to nan_flags to np.nan.
- Parameters:
ds (xr.Dataset) – Input xarray dataset
dict_nan_flags (dict) – Dictionary with nan flags value to set as np.nan
- Returns:
Dataset without nan_flags values.
- Return type:
xr.Dataset
- disdrodb.l0.l0b_processing.replace_nan_flags(ds, sensor_name, verbose)[source]
Set values corresponding to nan_flags to np.nan.
- Parameters:
ds (xr.Dataset) – Input xarray dataset
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns:
Dataset without nan_flags values.
- Return type:
xr.Dataset
- disdrodb.l0.l0b_processing.reshape_raw_spectrum(arr: array, dims_order: list, dims_size_dict: dict, n_timesteps: int) array[source]
Reshape the raw spectrum to a 2D+time array.
The array has dimensions [“time”] + dims_order
- Parameters:
arr (np.array) – Input array.
dims_order (list) – The order of dimensions in the raw spectrum. Examples:
- OTT Parsivel spectrum [v1d1 … v1d32, v2d1, …, v2d32] –> dims_order = [“diameter_bin_center”, “velocity_bin_center”]
- Thies LPM spectrum [v1d1 … v20d1, v1d2, …, v20d2] –> dims_order = [“velocity_bin_center”, “diameter_bin_center”]
dims_size_dict (dict) – Dictionary with the number of bins for each dimension.
- For OTT_Parsivel: {“diameter_bin_center”: 32, “velocity_bin_center”: 32}
- For Thies_LPM: {“diameter_bin_center”: 22, “velocity_bin_center”: 20}
n_timesteps (int) – Number of timesteps.
- Returns:
Output array.
- Return type:
np.array
- Raises:
ValueError – Impossible to reshape the raw_spectrum matrix
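The shape contract described above (output dimensions are [“time”] + dims_order, with a ValueError when the sizes do not fit) can be sketched with a plain NumPy reshape. The helper name is hypothetical, and the use of NumPy's default C-order reshape is an assumption: the sketch only illustrates the resulting shape, not how individual bins are mapped.

```python
import numpy as np


def reshape_spectrum_sketch(arr, dims_order: list, dims_size_dict: dict, n_timesteps: int):
    """Reshape a flattened spectrum to (time, <dims_order sizes>)."""
    shape = [n_timesteps] + [dims_size_dict[dim] for dim in dims_order]
    try:
        # Assumption: default C-order reshape
        return np.asarray(arr).reshape(shape)
    except ValueError as e:
        raise ValueError(f"Impossible to reshape the raw spectrum to {shape}.") from e


dims_order = ["diameter_bin_center", "velocity_bin_center"]
dims_size_dict = {"diameter_bin_center": 32, "velocity_bin_center": 32}
flat = np.zeros((5, 32 * 32))  # 5 timesteps, 1024 values each
cube = reshape_spectrum_sketch(flat, dims_order, dims_size_dict, n_timesteps=5)
```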
- disdrodb.l0.l0b_processing.retrieve_l0b_arrays(df: DataFrame, sensor_name: str, verbose: bool = False) dict[source]
Retrieves the L0B data matrix.
- Parameters:
df (pd.DataFrame) – Input dataframe
sensor_name (str) – Name of the sensor
- Returns:
Dictionary with data arrays.
- Return type:
dict
- disdrodb.l0.l0b_processing.sanitize_encodings_dict(encoding_dict: dict, ds: Dataset) dict[source]
Ensure chunk size to be smaller than the array shape.
- Parameters:
encoding_dict (dict) – Dictionary containing the encoding to write DISDRODB L0B netCDFs.
ds (xr.Dataset) – Input dataset.
- Returns:
Encoding dictionary.
- Return type:
dict
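To illustrate what "chunk size smaller than the array shape" means in practice, the sketch below clips each chunk size to the corresponding dimension length. It works on plain dictionaries: shapes stands in for the variable shapes of the dataset, and the "chunksizes" key follows the xarray netCDF encoding convention. The helper name clip_chunksizes is hypothetical.

```python
def clip_chunksizes(encoding_dict: dict, shapes: dict) -> dict:
    """Return a copy of encoding_dict whose chunk sizes never exceed the array shape."""
    sanitized = {}
    for name, encoding in encoding_dict.items():
        encoding = dict(encoding)  # do not modify the input in place
        chunksizes = encoding.get("chunksizes")
        if chunksizes is not None and name in shapes:
            encoding["chunksizes"] = tuple(
                min(chunk, size) for chunk, size in zip(chunksizes, shapes[name])
            )
        sanitized[name] = encoding
    return sanitized


# Hypothetical encoding and variable shape: only 100 timesteps are available
encoding_dict = {"raw_drop_number": {"chunksizes": (5000, 32, 32), "zlib": True}}
shapes = {"raw_drop_number": (100, 32, 32)}
sanitized = clip_chunksizes(encoding_dict, shapes)
```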
- disdrodb.l0.l0b_processing.set_dataset_attrs(ds, sensor_name)[source]
Set variable and coordinates attributes.
- disdrodb.l0.l0b_processing.set_encodings(ds: Dataset, sensor_name: str) Dataset[source]
Apply the encodings to the xarray Dataset.
- Parameters:
ds (xr.Dataset) – Input xarray dataset.
sensor_name (str) – Name of the sensor.
- Returns:
Output xarray dataset.
- Return type:
xr.Dataset
- disdrodb.l0.l0b_processing.set_nan_outside_data_range(ds, sensor_name, verbose)[source]
Set values outside the data range as np.nan.
- Parameters:
ds (xr.Dataset) – Input xarray dataset
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns:
Dataset without values outside the expected data range.
- Return type:
xr.Dataset
- disdrodb.l0.l0b_processing.set_nan_unvalid_values(ds, sensor_name, verbose)[source]
Set invalid (class) values to np.nan.
- Parameters:
ds (xr.Dataset) – Input xarray dataset
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns:
Dataset without invalid values.
- Return type:
xr.Dataset
- disdrodb.l0.l0b_processing.set_variable_attributes(ds: Dataset, sensor_name: str) Dataset[source]
Set attributes to each xr.Dataset variable.
- Parameters:
ds (xr.Dataset) – Input dataset.
sensor_name (str) – Name of the sensor.
- Returns:
Output dataset.
- Return type:
xr.Dataset
- disdrodb.l0.l0b_processing.write_l0b(ds: Dataset, fpath: str, force=False) None[source]
Save the xarray dataset into a NetCDF file.
- Parameters:
ds (xr.Dataset) – Input xarray dataset.
fpath (str) – Output file path.
sensor_name (str) – Name of the sensor.
force (bool, optional) – Whether to overwrite existing data. If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
disdrodb.l0.l0b_concat module
disdrodb.l0.l0_processing module
- disdrodb.l0.l0_processing.click_l0_archive_options(function: object)[source]
Click command line arguments for L0 processing archiving of a station.
- Parameters:
function (object) – Function.
- disdrodb.l0.l0_processing.click_l0_processing_options(function: object)[source]
Click command line default parameters for L0 processing options.
- Parameters:
function (object) – Function.
- disdrodb.l0.l0_processing.click_l0_station_arguments(function: object)[source]
Click command line arguments for L0 processing of a station.
- Parameters:
function (object) – Function.
- disdrodb.l0.l0_processing.click_l0_stations_options(function: object)[source]
Click command line options for DISDRODB archive L0 processing.
- Parameters:
function (object) – Function.
- disdrodb.l0.l0_processing.click_l0b_concat_options(function: object)[source]
Click command line default parameters for L0B concatenation.
- Parameters:
function (object) – Function.
- disdrodb.l0.l0_processing.run_disdrodb_l0(disdrodb_dir, data_sources=None, campaign_names=None, station_names=None, l0a_processing: bool = True, l0b_processing: bool = True, l0b_concat: bool = False, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]
Run the L0 processing of DISDRODB stations.
This function launches the processing of many DISDRODB stations with a single command. From the list of all available DISDRODB stations, it runs the processing of the stations matching the provided data_sources, campaign_names and station_names.
- Parameters:
disdrodb_dir (str) – Base directory of DISDRODB Format: <…>/DISDRODB
data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, process all stations. The default is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default is None
station_names (list) – Station names to process. The default is None
l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default is True.
l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default is True.
l0b_concat (bool) – Whether to concatenate all raw files into a single L0B netCDF file. If l0b_concat=True, all raw files will be saved into a single L0B netCDF file. If l0b_concat=False, each raw file will be converted into the corresponding L0B netCDF file. The default is False.
remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default is False.
remove_l0b (bool) – Whether to remove the L0B files after having concatenated all L0B netCDF files. It takes place only if l0b_concat=True. The default is False.
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process will use a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files. For L0B, it processes just the first 100 rows of 3 L0A files. The default is False.
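A usage sketch for run_disdrodb_l0, built only from the signature documented above. The base directory, data source and campaign name are placeholders, and the helpers build_l0_kwargs and launch are hypothetical. The import of disdrodb is deferred to launch so that the argument preparation can be inspected on its own.

```python
def build_l0_kwargs(disdrodb_dir, data_sources=None, campaign_names=None, station_names=None):
    """Collect keyword arguments for run_disdrodb_l0 and check the UPPER CASE rule."""
    for names in (data_sources, campaign_names):
        for name in names or []:
            if name != name.upper():
                raise ValueError(f"'{name}' must be UPPER CASE.")
    return {
        "disdrodb_dir": disdrodb_dir,
        "data_sources": data_sources,
        "campaign_names": campaign_names,
        "station_names": station_names,
        "l0a_processing": True,
        "l0b_processing": True,
        "l0b_concat": False,
        "force": False,
        "verbose": True,
        "debugging_mode": True,  # process only a few files while testing
        "parallel": False,
    }


def launch(kwargs):
    """Call run_disdrodb_l0 (requires disdrodb to be installed)."""
    from disdrodb.l0.l0_processing import run_disdrodb_l0

    return run_disdrodb_l0(**kwargs)


# Placeholder directory and names, for illustration only
kwargs = build_l0_kwargs(
    "/tmp/DISDRODB", data_sources=["EPFL"], campaign_names=["HYMEX_LTE_SOP2"]
)
```

Calling launch(kwargs) would then start the processing for every station matching the selection.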
- disdrodb.l0.l0_processing.run_disdrodb_l0_station(disdrodb_dir, data_source, campaign_name, station_name, l0a_processing: bool = True, l0b_processing: bool = True, l0b_concat: bool = True, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]
Run the L0 processing of a specific DISDRODB station from the terminal.
- Parameters:
disdrodb_dir (str) – Base directory of DISDRODB Format: <…>/DISDRODB
data_source (str) – Institution name (when campaign data spans more than 1 country), or country (when all campaigns (or sensor networks) are inside a given country). Must be UPPER CASE.
campaign_name (str) – Campaign name. Must be UPPER CASE.
station_name (str) – Station name
l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default is True.
l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default is True.
l0b_concat (bool) – Whether to concatenate all raw files into a single L0B netCDF file. If l0b_concat=True, all raw files will be saved into a single L0B netCDF file. If l0b_concat=False, each raw file will be converted into the corresponding L0B netCDF file. The default is True.
remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default is False.
remove_l0b (bool) – Whether to remove the L0B files after having concatenated all L0B netCDF files. It takes place only if l0b_concat=True. The default is False.
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process will use a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files for each station. For L0B, it processes just the first 100 rows of 3 L0A files for each station. The default is False.
- disdrodb.l0.l0_processing.run_disdrodb_l0a(disdrodb_dir, data_sources=None, campaign_names=None, station_names=None, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]
- disdrodb.l0.l0_processing.run_disdrodb_l0a_station(disdrodb_dir, data_source, campaign_name, station_name, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]
Run the L0A processing of a station by calling run_disdrodb_l0a_station in the terminal.
- disdrodb.l0.l0_processing.run_disdrodb_l0b(disdrodb_dir, data_sources=None, campaign_names=None, station_names=None, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]
- disdrodb.l0.l0_processing.run_disdrodb_l0b_station(disdrodb_dir, data_source, campaign_name, station_name, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]
Run the L0B processing of a station by calling run_disdrodb_l0b_station in the terminal.
- disdrodb.l0.l0_processing.run_l0a(raw_dir, processed_dir, station_name, glob_patterns, column_names, reader_kwargs, df_sanitizer_fun, parallel, verbose, force, debugging_mode)[source]
Run the L0A processing for a specific DISDRODB station.
- Parameters:
raw_dir (str) – The directory path where all the raw content of a specific campaign is stored. The path must have the following structure: <…>/DISDRODB/Raw/<data_source>/<campaign_name>.
Inside the raw_dir directory, it is required to adopt the following structure:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yaml
Important points:
- For each <station_name> there must be a corresponding YAML file in the metadata subfolder.
- The <campaign_name> must semantically match between the raw_dir and processed_dir directory paths and the ‘campaign_name’ key within the metadata YAML files.
- The campaign_name is expected to be UPPER CASE.
processed_dir (str) – The desired directory path for the processed DISDRODB L0A and L0B products. The path should have the following structure: <…>/DISDRODB/Processed/<data_source>/<campaign_name>.
For testing purposes, this function exceptionally also accepts a directory path simply ending with <campaign_name> (e.g. /tmp/<campaign_name>).
station_name (str) – Station name
glob_patterns (str) – Glob pattern to search data files in <raw_dir>/data/<station_name>
column_names (list) – Column names of the raw text file.
reader_kwargs (dict) – Pandas read_csv arguments to open the text file.
df_sanitizer_fun (object, optional) – Sanitizer function to format the dataframe into DISDRODB L0A standard.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is False.
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
debugging_mode (bool) – If True, it reduces the amount of data to process. It processes just the first 100 rows of 3 raw data files. The default is False.
- disdrodb.l0.l0_processing.run_l0b(processed_dir, station_name, parallel, force, verbose, debugging_mode)[source]
Run the L0B processing for a specific DISDRODB station.
- Parameters:
raw_dir (str) – The directory path where all the raw content of a specific campaign is stored. The path must have the following structure: <…>/DISDRODB/Raw/<data_source>/<campaign_name>.
Inside the raw_dir directory, it is required to adopt the following structure:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yaml
Important points:
- For each <station_name> there must be a corresponding YAML file in the metadata subfolder.
- The <campaign_name> must semantically match between the raw_dir and processed_dir directory paths and the ‘campaign_name’ key within the metadata YAML files.
- The campaign_name is expected to be UPPER CASE.
processed_dir (str) – The desired directory path for the processed DISDRODB L0A and L0B products. The path should have the following structure: <…>/DISDRODB/Processed/<data_source>/<campaign_name>.
For testing purposes, this function exceptionally also accepts a directory path simply ending with <campaign_name> (e.g. /tmp/<campaign_name>).
station_name (str) – Station name
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is True.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. Ensure that threads_per_worker (the number of threads per process) is set to 1 to avoid HDF errors. Also ensure that the HDF5_USE_FILE_LOCKING environment variable is set to False. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. It processes just 3 raw data files. The default is False.
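The parallel setup recommended above (one thread per worker, HDF5 file locking disabled) could be prepared as in the sketch below. The helper names are hypothetical, the value "FALSE" is the spelling commonly used for the HDF5_USE_FILE_LOCKING variable, and dask.distributed is imported lazily so that the environment setting can be applied even where Dask is not installed.

```python
import os


def disable_hdf5_file_locking() -> None:
    """Disable HDF5 file locking for this process and its future children."""
    os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"


def start_local_cluster(n_workers=None):
    """Start a Dask LocalCluster with one thread per worker process."""
    from dask.distributed import Client, LocalCluster

    cluster = LocalCluster(
        n_workers=n_workers or os.cpu_count(),
        threads_per_worker=1,  # a single thread per process avoids HDF errors
        processes=True,
    )
    return Client(cluster)


# Set the environment variable before any worker process is created
disable_hdf5_file_locking()
```

Calling start_local_cluster() before run_l0b with parallel=True is one way to control the number of simultaneous processes.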
- disdrodb.l0.l0_processing.run_l0b_from_nc(raw_dir, processed_dir, station_name, glob_patterns, dict_names, ds_sanitizer_fun, parallel, verbose, force, debugging_mode)[source]
Run the L0B processing for a specific DISDRODB station with raw netCDFs.
- Parameters:
raw_dir (str) – The directory path where all the raw content of a specific campaign is stored. The path must have the following structure: <…>/DISDRODB/Raw/<data_source>/<campaign_name>.
Inside the raw_dir directory, it is required to adopt the following structure:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yaml
Important points:
- For each <station_name> there must be a corresponding YAML file in the metadata subfolder.
- The <campaign_name> must semantically match between the raw_dir and processed_dir directory paths and the ‘campaign_name’ key within the metadata YAML files.
- The campaign_name is expected to be UPPER CASE.
processed_dir (str) – The desired directory path for the processed DISDRODB L0B products. The path should have the following structure: <…>/DISDRODB/Processed/<data_source>/<campaign_name>.
For testing purposes, this function exceptionally also accepts a directory path simply ending with <campaign_name> (e.g. /tmp/<campaign_name>).
station_name (str) – Station name
glob_patterns (str) – Glob pattern to search data files in <raw_dir>/data/<station_name>. Example: glob_patterns = “*.nc”
dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.
ds_sanitizer_fun (object, optional) – Sanitizer function to format the raw netCDF into DISDRODB L0B standard.
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. It processes just the first 3 raw netCDF files. The default is False.
disdrodb.l0.l0_reader module
- disdrodb.l0.l0_reader.available_readers(data_sources=None, reader_path=False)[source]
Retrieve available readers information.
- disdrodb.l0.l0_reader.check_available_readers()[source]
Check the arguments of all readers in the package.
- disdrodb.l0.l0_reader.check_reader_arguments(reader)[source]
Check that the reader has the expected input arguments.
- disdrodb.l0.l0_reader.check_reader_exists(reader_data_source: str, reader_name: str) str[source]
Check if the provided data source and reader name exist within the available readers.
Please run get_available_readers_dict() to get the list of all available readers.
- Parameters:
reader_data_source (str) – The directory within which the reader_name is located in the disdrodb.l0.readers directory.
reader_name (str) – Campaign name
- Returns:
The reader name if the reader exists. Otherwise, an error is raised.
- Return type:
str
- Raises:
ValueError – Error if the reader name provided for the campaign has not been found.
- disdrodb.l0.l0_reader.get_available_readers_dict() dict[source]
Return the description of the readers included in the current release of DISDRODB.
- Returns:
The dictionary has the following schema {“data_source”: {“reader_name”: “reader_file_path”}}
- Return type:
dict
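Given the schema {“data_source”: {“reader_name”: “reader_file_path”}}, an existence check like the one documented for check_reader_exists can be sketched as a nested dictionary lookup. The dictionary content and the helper find_reader are hypothetical; the real dictionary is built from the readers shipped with the package.

```python
# Hypothetical readers dictionary following the documented schema
readers = {
    "EPFL": {"EPFL_2009": "/path/to/readers/EPFL/EPFL_2009.py"},
}


def find_reader(readers: dict, reader_data_source: str, reader_name: str) -> str:
    """Return the reader name if it exists, otherwise raise ValueError."""
    if reader_data_source not in readers:
        raise ValueError(f"Data source '{reader_data_source}' has no readers.")
    if reader_name not in readers[reader_data_source]:
        raise ValueError(f"Reader '{reader_name}' not found for '{reader_data_source}'.")
    return reader_name


found = find_reader(readers, "EPFL", "EPFL_2009")
```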
- disdrodb.l0.l0_reader.get_reader(reader_data_source: str, reader_name: str) object[source]
Returns the reader function based on input parameters.
- Parameters:
reader_data_source (str) – The directory within which the reader_name is located in the disdrodb.l0.readers directory.
reader_name (str) – The reader name.
- Returns:
The reader() function
- Return type:
object
- disdrodb.l0.l0_reader.get_reader_from_metadata_reader_key(reader_data_source_name)[source]
Retrieve the reader from the reader metadata value.
The convention for metadata reader key: <data_source/reader_name> in disdrodb.l0.readers
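Under the <data_source/reader_name> convention, splitting the metadata value yields the two arguments expected by get_reader. The sketch below only shows the parsing step; split_reader_key and the sample key are hypothetical.

```python
def split_reader_key(reader_key: str):
    """Split '<data_source>/<reader_name>' into its two components."""
    parts = reader_key.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(
            f"Expected '<data_source>/<reader_name>', got '{reader_key}'."
        )
    reader_data_source, reader_name = parts
    return reader_data_source, reader_name


# Hypothetical metadata value
reader_data_source, reader_name = split_reader_key("EPFL/EPFL_2009")
```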
- disdrodb.l0.l0_reader.get_station_reader(disdrodb_dir, data_source, campaign_name, station_name)[source]
Retrieve the reader from the station metadata information.
- disdrodb.l0.l0_reader.is_documented_by(original)[source]
Wrapper function to apply generic docstring to the decorated function.
- Parameters:
original (function) – Function to take the docstring from.
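A docstring-sharing decorator of this kind is commonly written as below. This is a generic sketch of the pattern, not necessarily the code used in disdrodb; copy_docstring_from, generic_docstring and reader are illustrative names.

```python
def copy_docstring_from(original):
    """Return a decorator that copies the docstring of `original`."""

    def wrapper(target):
        # Reuse the documentation of the original function
        target.__doc__ = original.__doc__
        return target

    return wrapper


def generic_docstring():
    """Shared documentation for all readers."""


@copy_docstring_from(generic_docstring)
def reader(raw_dir, processed_dir, station_name):
    return station_name
```

The decorated function keeps its own behaviour and only inherits the documentation.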
- disdrodb.l0.l0_reader.reader_generic_docstring()[source]
Script to convert the raw data to L0A format.
- Parameters:
raw_dir (str) –
The directory path where all the raw content of a specific campaign is stored. The path must have the following structure:
<…>/DISDRODB/Raw/<data_source>/<campaign_name>
Inside the raw_dir directory, it is required to adopt the following structure:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yaml
Important points:
- For each <station_name> there must be a corresponding YAML file in the metadata subfolder.
- The <campaign_name> must semantically match between the raw_dir and processed_dir directory paths, and with the key ‘campaign_name’ within the metadata YAML files.
- The campaign_name is expected to be UPPER CASE.
processed_dir (str) –
The desired directory path for the processed DISDRODB L0A and L0B products. The path should have the following structure:
<…>/DISDRODB/Processed/<data_source>/<campaign_name>
For testing purposes, this function exceptionally also accepts a directory path simply ending with <campaign_name> (i.e. /tmp/<campaign_name>).
station_name (str) – Station name
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information into terminal. The default is True.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. It processes just the first 3 raw data files. The default is False.
disdrodb.l0.check_configs module
- disdrodb.l0.check_configs.check_bin_consistency(sensor_name: str) None[source]
Check bin consistency from config file.
The first and last bins are not checked.
- Parameters:
sensor_name (str) – Name of the sensor.
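The documentation does not state which consistency rules are applied. The sketch below assumes two plausible ones: each bin center is the midpoint of its bounds, and each bin width equals the difference of its bounds. The function check_bins_sketch and the bin values are hypothetical; as in the description above, the first and last bins are skipped.

```python
import math


def check_bins_sketch(lower, upper, center, width, rel_tol=1e-6):
    """Raise ValueError if an inner bin is inconsistent.

    The first and last bins are deliberately not checked.
    """
    for i in range(1, len(lower) - 1):
        if not math.isclose(center[i], (lower[i] + upper[i]) / 2, rel_tol=rel_tol):
            raise ValueError(f"Bin {i}: center is not the midpoint of the bounds.")
        if not math.isclose(width[i], upper[i] - lower[i], rel_tol=rel_tol):
            raise ValueError(f"Bin {i}: width does not match the bounds.")


# Hypothetical bins; the first center is wrong on purpose and is ignored.
lower = [0.0, 0.125, 0.25, 0.375]
upper = [0.125, 0.25, 0.375, 0.5]
center = [0.0, 0.1875, 0.3125, 0.4375]
width = [0.125, 0.125, 0.125, 0.125]
check_bins_sketch(lower, upper, center, width)
```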
disdrodb.l0.check_metadata module
- disdrodb.l0.check_metadata.check_metadata_geolocation(metadata) None[source]
Identify metadata with missing or wrong geolocation.
- disdrodb.l0.check_metadata.get_archive_metadata_key_value(disdrodb_dir, key, return_tuple=True)[source]
Return the values of a metadata key for all the archive.
- disdrodb.l0.check_metadata.identify_empty_metadata_keys(metadata_fpaths: list, keys: str | list) None[source]
Identify empty metadata keys.
- Parameters:
metadata_fpaths (list) – Input YAML file paths.
keys (Union[str,list]) – Keys whose presence is verified.
disdrodb.l0.check_standards module
- disdrodb.l0.check_standards.check_l0a_column_names(df: DataFrame, sensor_name: str) None[source]
Checks that the dataframe columns respect DISDRODB standards.
- Parameters:
df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
- Raises:
ValueError – Error if some columns do not meet the DISDRODB standards or if the ‘time’ column is missing in the dataframe.
- disdrodb.l0.check_standards.check_l0a_standards(df: DataFrame, sensor_name: str, verbose: bool = True) None[source]
Checks that a file respects the DISDRODB L0A standards.
- Parameters:
df (pd.DataFrame) – L0A dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool, optional) – Whether to print detailed processing information. The default is True.
- Raises:
ValueError – Error if some columns have inconsistent values.
- disdrodb.l0.check_standards.check_sensor_name(sensor_name: str) None[source]
Check sensor name.
- Parameters:
sensor_name (str) – Name of the sensor.
- Raises:
TypeError – Error if sensor_name is not a string.
ValueError – Error if the input sensor name has not been found in the list of available sensors.
disdrodb.l0.io module
- disdrodb.l0.io.check_glob_pattern(pattern: str) None[source]
Check if the input parameter is a string and if it can be used as a pattern.
- Parameters:
pattern (str) – String to be checked.
- Raises:
TypeError – The input parameter is not a string.
ValueError – The input parameter can not be used as pattern.
- disdrodb.l0.io.check_glob_patterns(patterns: str | list) list[source]
Check if glob patterns are valid.
- disdrodb.l0.io.check_raw_dir(raw_dir: str, verbose: bool = False) None[source]
Check validity of raw_dir.
Steps:
1. Check that ‘raw_dir’ is a valid directory path.
2. Check that ‘raw_dir’ follows the expected directory structure.
3. Check that each station_name directory contains data.
4. Check that for each station_name the mandatory metadata.yml is specified.
5. Check that for each station_name the mandatory issue.yml is specified.
- Parameters:
raw_dir (str) – Input raw directory
verbose (bool, optional) – Whether to print detailed processing information. The default is False.
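The first checks of such a validation can be sketched with pathlib. This is an illustration under assumptions, not the DISDRODB code: it assumes the /data/<station_name>/<raw_files> and /metadata/<station_name>.yaml layout, and it omits the issue file check because the location of that file is not stated here. The function check_raw_dir_sketch and the directory names are hypothetical.

```python
import tempfile
from pathlib import Path


def check_raw_dir_sketch(raw_dir: str) -> list:
    """Return the station names found, or raise ValueError on a layout problem."""
    root = Path(raw_dir)
    if not root.is_dir():
        raise ValueError(f"{raw_dir} is not a directory.")
    data_dir = root / "data"
    if not data_dir.is_dir():
        raise ValueError("Missing 'data' subdirectory.")
    stations = sorted(p.name for p in data_dir.iterdir() if p.is_dir())
    if not stations:
        raise ValueError("No station directory found in 'data'.")
    for station in stations:
        # Each station directory must contain at least one entry.
        if not any((data_dir / station).iterdir()):
            raise ValueError(f"Station {station} contains no raw files.")
        # Each station must have its own metadata YAML file.
        if not (root / "metadata" / f"{station}.yaml").is_file():
            raise ValueError(f"Missing metadata file for station {station}.")
    return stations


# Build a throwaway campaign directory to exercise the check.
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / "DISDRODB" / "Raw" / "DATA_SOURCE" / "CAMPAIGN"
    (raw_dir / "data" / "station_1").mkdir(parents=True)
    (raw_dir / "data" / "station_1" / "file.txt").write_text("raw")
    (raw_dir / "metadata").mkdir()
    (raw_dir / "metadata" / "station_1.yaml").write_text("station_name: station_1\n")
    found_stations = check_raw_dir_sketch(str(raw_dir))

print(found_stations)
```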
- disdrodb.l0.io.create_directory_structure(processed_dir, product_level, station_name, force, verbose=False)[source]
Create directory structure for L0B and higher DISDRODB products.
- disdrodb.l0.io.create_initial_directory_structure(raw_dir, processed_dir, station_name, force, verbose=False, product_level='L0A')[source]
Create directory structure for the first L0 DISDRODB product.
If the input data are raw text files: product_level = “L0A” (run_l0a).
If the input data are raw netCDF files: product_level = “L0B” (run_l0b_nc).
- disdrodb.l0.io.get_L0A_dir(processed_dir: str, station_name: str) str[source]
Define L0A directory.
- Parameters:
processed_dir (str) – Path of the processed directory
station_name (str) – Name of the station
- Returns:
L0A directory path.
- Return type:
str
- disdrodb.l0.io.get_L0A_fname(df, processed_dir, station_name: str) str[source]
Define L0A file name.
- Parameters:
df (pd.DataFrame) – L0A DataFrame
processed_dir (str) – Path of the processed directory
station_name (str) – Name of the station
- Returns:
L0A file name.
- Return type:
str
- disdrodb.l0.io.get_L0A_fpath(df: DataFrame, processed_dir: str, station_name: str) str[source]
Define L0A file path.
- Parameters:
df (pd.DataFrame) – L0A DataFrame.
processed_dir (str) – Path of the processed directory.
station_name (str) – Name of the station.
- Returns:
L0A file path.
- Return type:
str
- disdrodb.l0.io.get_L0B_dir(processed_dir: str, station_name: str) str[source]
Define L0B directory.
- Parameters:
processed_dir (str) – Path of the processed directory
station_name (str) – Name of the station
- Returns:
Path of the L0B directory
- Return type:
str
- disdrodb.l0.io.get_L0B_fname(ds, processed_dir, station_name: str) str[source]
Define L0B file name.
- Parameters:
ds (xr.Dataset) – L0B xarray Dataset
processed_dir (str) – Path of the processed directory
station_name (str) – Name of the station
- Returns:
L0B file name.
- Return type:
str
- disdrodb.l0.io.get_L0B_fpath(ds: Dataset, processed_dir: str, station_name: str, l0b_concat=False) str[source]
Define L0B file path.
- Parameters:
ds (xr.Dataset) – L0B xarray Dataset.
processed_dir (str) – Path of the processed directory.
station_name (str) – ID of the station
l0b_concat (bool) – If False, the file is specified inside the station directory. If True, the file is specified outside the station directory.
- Returns:
L0B file path.
- Return type:
str
- disdrodb.l0.io.get_campaign_name(path: str) str[source]
Return the campaign name from a file or directory path.
Current assumption: no data_source, campaign_name, station_name or file name contains the word DISDRODB!
- Parameters:
path (str) – Path of a campaign directory (‘raw_dir’ or ‘processed_dir’) or of a DISDRODB file.
- Returns:
Name of the campaign.
- Return type:
str
- disdrodb.l0.io.get_data_source(path: str) str[source]
Return the data_source from a file or directory path.
Current assumption: no data_source, campaign_name, station_name or file name contains the word DISDRODB!
- Parameters:
path (str) – Path of a campaign directory (‘raw_dir’ or ‘processed_dir’) or of a DISDRODB file.
- Returns:
Name of the data source.
- Return type:
str
- disdrodb.l0.io.get_dataframe_min_max_time(df: DataFrame)[source]
Retrieves dataframe starting and ending time.
- Parameters:
df (pd.DataFrame) – Input dataframe
- Returns:
(starting_time, ending_time)
- Return type:
tuple
- disdrodb.l0.io.get_dataset_min_max_time(ds: Dataset)[source]
Retrieves dataset starting and ending time.
- Parameters:
ds (xr.Dataset) – Input dataset
- Returns:
(starting_time, ending_time)
- Return type:
tuple
- disdrodb.l0.io.get_disdrodb_dir(path: str) str[source]
Return the disdrodb base directory from a file or directory path.
Current assumption: no data_source, campaign_name, station_name or file name contains the word DISDRODB!
- Parameters:
path (str) – path can be a campaign_dir (‘raw_dir’ or ‘processed_dir’), or a DISDRODB file path.
- Returns:
Path of the DISDRODB directory.
- Return type:
str
- disdrodb.l0.io.get_disdrodb_path(path: str) str[source]
Return the path from the disdrodb_dir directory.
Current assumption: no data_source, campaign_name, station_name or file name contains the word DISDRODB!
- Parameters:
path (str) – path can be a campaign_dir (‘raw_dir’ or ‘processed_dir’), or a DISDRODB file path.
- Returns:
Path inside the DISDRODB archive. Format: DISDRODB/<Raw or Processed>/<data_source>/…
- Return type:
str
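The path helpers above (get_campaign_name, get_data_source, get_disdrodb_dir, get_disdrodb_path) all rely on locating the DISDRODB component in a path. A minimal sketch of that idea, assuming the layout <…>/DISDRODB/<Raw or Processed>/<data_source>/<campaign_name>/… and POSIX-style paths; the function split_disdrodb_path and the example path are hypothetical.

```python
from pathlib import PurePosixPath


def split_disdrodb_path(path: str) -> dict:
    """Extract the DISDRODB-related parts of a campaign directory or file path."""
    parts = PurePosixPath(path).parts
    # The parsing is only unambiguous if 'DISDRODB' appears exactly once.
    if parts.count("DISDRODB") != 1:
        raise ValueError("The path must contain 'DISDRODB' exactly once.")
    idx = parts.index("DISDRODB")
    inside = parts[idx:]
    # Expected: DISDRODB / <Raw or Processed> / <data_source> / <campaign_name> / ...
    if len(inside) < 4:
        raise ValueError("The path is too short to contain a campaign name.")
    return {
        "disdrodb_dir": str(PurePosixPath(*parts[: idx + 1])),
        "disdrodb_path": str(PurePosixPath(*inside)),
        "data_source": inside[2],
        "campaign_name": inside[3],
    }


info = split_disdrodb_path("/home/user/DISDRODB/Raw/DATA_SOURCE/CAMPAIGN/data/station_1")
print(info)
```

The exactly-once check mirrors the stated assumption: if a campaign or station were itself named DISDRODB, the position of the base directory could no longer be determined reliably.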
- disdrodb.l0.io.get_l0a_file_list(processed_dir, station_name, debugging_mode)[source]
Retrieve L0A files for a given station.
- Parameters:
processed_dir (str) – Directory of the campaign where to search for the L0A files. Format <..>/DISDRODB/Processed/<data_source>/<campaign_name>
station_name (str) – ID of the station
debugging_mode (bool, optional) – If True, it selects at most 3 files for debugging purposes. The default is False.
- Returns:
list_fpaths – List of L0A file paths.
- Return type:
list
- disdrodb.l0.io.get_raw_file_list(raw_dir, station_name, glob_patterns, verbose=False, debugging_mode=False)[source]
Get the list of files from a directory based on input parameters.
Currently concatenates all files provided by the glob patterns. In future, this might be modified to enable DISDRODB processing when raw data are separated in multiple files.
- Parameters:
raw_dir (str) – Directory of the campaign where to search for files. Format <..>/DISDRODB/Raw/<data_source>/<campaign_name>
station_name (str) – ID of the station
verbose (bool, optional) – Whether to print detailed processing information. The default is False.
debugging_mode (bool, optional) – If True, it selects at most 3 files for debugging purposes. The default is False.
- Returns:
list_fpaths – List of file paths.
- Return type:
list
- disdrodb.l0.io.read_L0A_dataframe(fpaths: str | list, verbose: bool = False, debugging_mode: bool = False) DataFrame[source]
Read DISDRODB L0A Apache Parquet file(s).
- Parameters:
fpaths (str or list) – Either a single file path or a list of file paths.
verbose (bool) – Whether to print detailed processing information into terminal. The default is False.
debugging_mode (bool) – If True, it reduces the amount of data to process. If fpaths is a list, it reads only the first 3 files. For each file, it selects only the first 100 rows. The default is False.
- Returns:
L0A Dataframe.
- Return type:
pd.DataFrame
disdrodb.l0.issue module
- class disdrodb.l0.issue.NoDatesSafeLoader(stream)[source]
Bases: SafeLoader
- classmethod remove_implicit_resolver(tag_to_remove)[source]
Remove implicit resolvers for a particular tag
Takes care not to modify resolvers in super classes.
We want to load datetimes as strings, not dates, because we go on to serialise as json which doesn’t have the advanced types of yaml, and leads to incompatibilities down the track.
- disdrodb.l0.issue.check_issue_file(fpath: str) None[source]
Check issue YAML file validity.
- Parameters:
fpath (str) – Issue YAML file path.
- disdrodb.l0.issue.check_timesteps(timesteps)[source]
Check timesteps validity.
It expects timesteps string in YYYY-mm-dd HH:MM:SS format with second accuracy. If timesteps is None, return None.
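A format check of this kind can be sketched with datetime.strptime. The function check_timesteps_sketch is hypothetical, and returning the validated strings as a list is an assumption; only the expected string format and the None pass-through come from the description above.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def check_timesteps_sketch(timesteps):
    """Validate 'YYYY-mm-dd HH:MM:SS' strings; None is passed through."""
    if timesteps is None:
        return None
    if isinstance(timesteps, str):
        timesteps = [timesteps]
    for value in timesteps:
        # strptime also accepts non-zero-padded fields, so the length is
        # checked explicitly to enforce the full second-accuracy format.
        if not isinstance(value, str) or len(value) != len("YYYY-mm-dd HH:MM:SS"):
            raise ValueError(f"Invalid timestep: {value!r}")
        datetime.strptime(value, TIME_FORMAT)  # raises ValueError if malformed
    return list(timesteps)


print(check_timesteps_sketch(["2020-01-01 10:00:00", "2020-01-01 10:00:30"]))
```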
- disdrodb.l0.issue.load_yaml_without_date_parsing(filepath)[source]
Read a YAML file without automatically converting date strings to datetime.
- disdrodb.l0.issue.read_issue(raw_dir: str, station_name: str) dict[source]
Read YAML issue file.
- Parameters:
raw_dir (str) – Path of the campaign raw directory.
station_name (str) – Station name.
- Returns:
Issue dictionary.
- Return type:
dict
- disdrodb.l0.issue.read_issue_file(fpath: str) dict[source]
Read YAML issue file.
- Parameters:
fpath (str) – Filepath of the issue YAML.
- Returns:
Issue dictionary.
- Return type:
dict
disdrodb.l0.metadata module
- disdrodb.l0.metadata.add_missing_metadata_keys(metadata)[source]
Add missing keys to the metadata dictionary.
- disdrodb.l0.metadata.check_metadata_compliance(disdrodb_dir, data_source, campaign_name, station_name)[source]
Check DISDRODB metadata compliance.
- disdrodb.l0.metadata.create_campaign_default_metadata(disdrodb_dir, campaign_name, data_source)[source]
Create default YAML metadata files for all stations within a campaign.
Use this function with caution to avoid overwriting existing YAML files.
- disdrodb.l0.metadata.get_default_metadata_dict() dict[source]
Get DISDRODB metadata default values.
- Returns:
Dictionary of standard attributes.
- Return type:
dict
- disdrodb.l0.metadata.get_metadata_missing_keys(metadata)[source]
Return the DISDRODB metadata keys which are missing.
- disdrodb.l0.metadata.get_metadata_unvalid_keys(metadata)[source]
Return the DISDRODB metadata keys which are not valid.
- disdrodb.l0.metadata.get_valid_metadata_keys() list[source]
Get DISDRODB valid metadata list.
- Returns:
List of valid metadata keys
- Return type:
list
- disdrodb.l0.metadata.read_metadata(campaign_dir: str, station_name: str) dict[source]
Read YAML metadata file.
- Parameters:
campaign_dir (str) – Path of the campaign raw directory.
station_name (str) – ID of the station.
- Returns:
Dictionary of the metadata.
- Return type:
dict
- disdrodb.l0.metadata.remove_unvalid_metadata_keys(metadata)[source]
Remove invalid keys from the metadata dictionary.
- disdrodb.l0.metadata.sort_metadata_dictionary(metadata)[source]
Sort the keys of the metadata dictionary by valid_metadata_keys list order.
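The key-handling helpers of this module (missing keys, invalid keys, sorting) can be illustrated on plain dictionaries. The key list below is a hypothetical subset, not the actual DISDRODB list, and the three functions are illustrative stand-ins; placing unknown keys at the end when sorting is an assumption.

```python
# Hypothetical subset of valid metadata keys, in their reference order.
VALID_KEYS = ["data_source", "campaign_name", "station_name", "latitude", "longitude"]


def missing_keys(metadata: dict) -> list:
    """Valid keys that are absent from the metadata."""
    return [key for key in VALID_KEYS if key not in metadata]


def invalid_keys(metadata: dict) -> list:
    """Metadata keys that are not in the list of valid keys."""
    return [key for key in metadata if key not in VALID_KEYS]


def sort_metadata(metadata: dict) -> dict:
    """Order keys as in VALID_KEYS; unknown keys are kept at the end."""
    order = {key: i for i, key in enumerate(VALID_KEYS)}
    sorted_keys = sorted(metadata, key=lambda key: order.get(key, len(order)))
    return {key: metadata[key] for key in sorted_keys}


metadata = {"station_name": "station_1", "extra": 1, "data_source": "DATA_SOURCE"}
print(missing_keys(metadata))
print(invalid_keys(metadata))
print(list(sort_metadata(metadata)))
```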
disdrodb.l0.standards module
- disdrodb.l0.standards.available_sensor_name() sorted[source]
Get available names of sensors.
- Returns:
Sorted list of the available sensors
- Return type:
sorted
- disdrodb.l0.standards.get_L0A_encodings_dict(sensor_name: str) dict[source]
Get a dictionary containing the L0A encodings
- Parameters:
sensor_name (str) – Name of the sensor.
- Returns:
L0A encodings
- Return type:
dict
- disdrodb.l0.standards.get_L0B_encodings_dict(sensor_name: str) dict[source]
Get a dictionary containing the encoding to write L0B netCDFs.
- Parameters:
sensor_name (str) – Name of the sensor.
- Returns:
Encoding to write L0B netCDFs
- Return type:
dict
- disdrodb.l0.standards.get_configs_dir(sensor_name: str) str[source]
Retrieve configs directory.
- Parameters:
sensor_name (str) – Name of the sensor.
- Returns:
Config directory.
- Return type:
str
- Raises:
ValueError – Error if the config directory does not exist.
- disdrodb.l0.standards.get_coords_attrs_dict(ds)[source]
Return dictionary with DISDRODB coordinates attributes.
- disdrodb.l0.standards.get_data_format_dict(sensor_name: str) dict[source]
Get a dictionary containing the data format of each sensor variable.
- Parameters:
sensor_name (str) – Name of the sensor.
- Returns:
Data format of each sensor variable
- Return type:
dict
- disdrodb.l0.standards.get_data_range_dict(sensor_name: str) dict[source]
Get the variable data range.
- Parameters:
sensor_name (str) – Name of the sensor.
- Returns:
Dictionary with the expected data value range for each data field. It excludes variables without specified data_range key.
- Return type:
dict
- disdrodb.l0.standards.get_description_dict(sensor_name: str) dict[source]
Get a dictionary containing the description of each sensor variable.
- Parameters:
sensor_name (str) – Name of the sensor.
- Returns:
Description of each sensor variable.
- Return type:
dict
- disdrodb.l0.standards.get_diameter_bin_center(sensor_name: str) list[source]
Get diameter bin center.
- Parameters:
sensor_name (str) – Name of the sensor
- Returns:
Diameter bin center
- Return type:
list
- disdrodb.l0.standards.get_diameter_bin_lower(sensor_name: str) list[source]
Get diameter bin lower bound.
- Parameters:
sensor_name (str) – Name of the sensor
- Returns:
Diameter bin lower bound
- Return type:
list
- disdrodb.l0.standards.get_diameter_bin_upper(sensor_name: str) list[source]
Get diameter bin upper bound.
- Parameters:
sensor_name (str) – Name of the sensor
- Returns:
Diameter bin upper bound
- Return type:
list
- disdrodb.l0.standards.get_diameter_bin_width(sensor_name: str) list[source]
Get diameter bin width.
- Parameters:
sensor_name (str) – Name of the sensor
- Returns:
Diameter bin width
- Return type:
list
- disdrodb.l0.standards.get_diameter_bins_dict(sensor_name: str) dict[source]
Get dictionary with sensor_name diameter bins information.
- Parameters:
sensor_name (str) – Name of the sensor.
- Returns:
sensor_name diameter bins information
- Return type:
dict
- disdrodb.l0.standards.get_dims_size_dict(sensor_name: str) dict[source]
Get the number of bins for each dimension.
- Parameters:
sensor_name (str) – Name of the sensor.
- Returns:
Dictionary with the number of bins for each dimension.
- Return type:
dict
- disdrodb.l0.standards.get_field_nchar_dict(sensor_name: str) dict[source]
Get the total number of characters from the instrument default string standards.
Important note: the count also includes the comma and the minus sign.
- Parameters:
sensor_name (str) – Name of the sensor.
- Returns:
Dictionary with the expected number of characters for each data field.
- Return type:
dict
- disdrodb.l0.standards.get_field_ndigits_decimals_dict(sensor_name: dict) dict[source]
Get number of digits on the right side of the comma from the instrument default string standards.
Example: 123,45 -> 45 -> 2 decimal digits
- Parameters:
sensor_name (dict) – Name of the sensor.
- Returns:
Dictionary with the expected number of decimal digits for each data field.
- Return type:
dict
- disdrodb.l0.standards.get_field_ndigits_dict(sensor_name: str) dict[source]
Get number of digits from the instrument default string standards.
Important note: the count excludes the comma but includes the minus sign.
- Parameters:
sensor_name (str) – Name of the sensor.
- Returns:
Dictionary with the expected number of digits for each data field.
- Return type:
dict
- disdrodb.l0.standards.get_field_ndigits_natural_dict(sensor_name: str) dict[source]
Get number of digits on the left side of the comma from the instrument default string standards.
Example: 123,45 -> 123 -> 3 natural digits
- Parameters:
sensor_name (str) – Name of the sensor.
- Returns:
Dictionary with the expected number of natural digits for each data field.
- Return type:
dict
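The four character counts described above (nchar, ndigits, natural digits, decimal digits) can be illustrated on a single string. This is a sketch of the counting rules as documented, not the library code: the function count_field_chars is hypothetical, the separator is a parameter because the actual decimal separator is not confirmed here, and counting the minus sign among the natural digits is an assumption.

```python
def count_field_chars(string: str, sep: str = ",") -> dict:
    """Count characters and digits of a numeric string such as '123,45'."""
    # Split once on the separator: left part (natural), right part (decimals).
    natural, _, decimals = string.partition(sep)
    return {
        "nchar": len(string),  # includes separator and minus sign
        "ndigits": len(natural) + len(decimals),  # excludes separator, keeps minus sign
        "ndigits_natural": len(natural),
        "ndigits_decimals": len(decimals),
    }


print(count_field_chars("123,45"))
print(count_field_chars("-123,45"))
```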
- disdrodb.l0.standards.get_l0a_dtype(sensor_name: str) dict[source]
Get a dictionary containing the L0A dtype.
- Parameters:
sensor_name (str) – Name of the sensor.
- Returns:
L0A dtype
- Return type:
dict
- disdrodb.l0.standards.get_long_name_dict(sensor_name: str) dict[source]
Get a dictionary containing the long name of each sensor variable.
- Parameters:
sensor_name (str) – Name of the sensor.
- Returns:
Long name of each sensor variable.
- Return type:
dict
- disdrodb.l0.standards.get_nan_flags_dict(sensor_name: str) dict[source]
Get the variable nan_flags.
- Parameters:
sensor_name (str) – Name of the sensor.
- Returns:
Dictionary with the expected nan_flags list for each data field. It excludes variables without specified nan_flags key.
- Return type:
dict
- disdrodb.l0.standards.get_raw_array_dims_order(sensor_name: str) dict[source]
Get the dimension order of the raw fields.
The order of dimension specified for raw_drop_number controls the reshaping of the precipitation raw spectrum.
Examples
OTT Parsivel spectrum [v1d1 … v1d32, v2d1, …, v2d32] -> dimension_order = [“velocity_bin_center”, “diameter_bin_center”]
Thies LPM spectrum [v1d1 … v20d1, v1d2, …, v20d2] -> dimension_order = [“diameter_bin_center”, “velocity_bin_center”]
- Parameters:
sensor_name (str) – Name of the sensor
- Returns:
Dimension order dictionary
- Return type:
dict
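The effect of the dimension order on reshaping can be shown with a tiny flat spectrum. The sketch below uses plain lists and a hypothetical reshape_spectrum helper; it assumes row-major reshaping, in which the last dimension of dimension_order varies fastest in the flat raw string, and it uses 2 velocity bins and 3 diameter bins instead of the real sensor sizes.

```python
def reshape_spectrum(flat: list, n_outer: int, n_inner: int) -> list:
    """Reshape a flat list into n_outer rows of n_inner values (row-major)."""
    if len(flat) != n_outer * n_inner:
        raise ValueError("The number of values does not match the dimensions.")
    return [flat[i * n_inner:(i + 1) * n_inner] for i in range(n_outer)]


n_velocity, n_diameter = 2, 3

# Parsivel-like ordering: diameter varies fastest -> [velocity, diameter].
parsivel_like = ["v1d1", "v1d2", "v1d3", "v2d1", "v2d2", "v2d3"]
by_velocity = reshape_spectrum(parsivel_like, n_velocity, n_diameter)

# Thies-like ordering: velocity varies fastest -> [diameter, velocity].
thies_like = ["v1d1", "v2d1", "v1d2", "v2d2", "v1d3", "v2d3"]
by_diameter = reshape_spectrum(thies_like, n_diameter, n_velocity)

print(by_velocity[1][2])  # indexed as [velocity][diameter]
print(by_diameter[2][1])  # indexed as [diameter][velocity]
```

Both lookups address the same drop class (second velocity bin, third diameter bin), which is why the declared order must match the raw string layout of each sensor.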
- disdrodb.l0.standards.get_raw_array_nvalues(sensor_name: str) dict[source]
Get a dictionary with the number of values expected for each raw array.
- Parameters:
sensor_name (str) – Name of the sensor.
- Returns:
Dictionary with the number of values expected for each raw array.
- Return type:
dict
- disdrodb.l0.standards.get_sensor_variables(sensor_name: str) list[source]
Get sensor variable names list.
- Parameters:
sensor_name (str) – Name of the sensor.
- Returns:
List of the sensor variable names.
- Return type:
list
- disdrodb.l0.standards.get_time_encoding() dict[source]
Create time encoding
- Returns:
Time encoding
- Return type:
dict
- disdrodb.l0.standards.get_units_dict(sensor_name: str) dict[source]
Get a dictionary containing the unit of each sensor variable.
- Parameters:
sensor_name (str) – Name of the sensor.
- Returns:
Unit of each sensor variable
- Return type:
dict
- disdrodb.l0.standards.get_valid_coordinates_names(sensor_name)[source]
Get list of valid coordinates.
- disdrodb.l0.standards.get_valid_dimension_names(sensor_name)[source]
Get list of valid dimension names.
- disdrodb.l0.standards.get_valid_values_dict(sensor_name: str) dict[source]
Get the list of valid values for a variable.
- Parameters:
sensor_name (str) – Name of the sensor.
- Returns:
Dictionary with the expected values for specific variables. It excludes variables without specified valid_values key.
- Return type:
dict
- disdrodb.l0.standards.get_variables_dict(sensor_name: str) dict[source]
Get a dictionary containing the variable name of the sensor field numbers.
- Parameters:
sensor_name (str) – Name of the sensor.
- Returns:
Variables names
- Return type:
dict
- disdrodb.l0.standards.get_variables_dimension(sensor_name: str)[source]
Returns a dictionary with the variable dimensions of a L0B product.
- disdrodb.l0.standards.get_velocity_bin_center(sensor_name: str) list[source]
Get velocity bin center.
- Parameters:
sensor_name (str) – Name of the sensor
- Returns:
Velocity bin center
- Return type:
list
- disdrodb.l0.standards.get_velocity_bin_lower(sensor_name: str) list[source]
Get velocity bin lower bound.
- Parameters:
sensor_name (str) – Name of the sensor
- Returns:
Velocity bin lower bound.
- Return type:
list
- disdrodb.l0.standards.get_velocity_bin_upper(sensor_name: str) list[source]
Get velocity bin upper bound.
- Parameters:
sensor_name (str) – Name of the sensor
- Returns:
Velocity bin upper bound
- Return type:
list
- disdrodb.l0.standards.get_velocity_bin_width(sensor_name: str) list[source]
Get velocity bin width.
- Parameters:
sensor_name (str) – Name of the sensor
- Returns:
Velocity bin width
- Return type:
list
- disdrodb.l0.standards.get_velocity_bins_dict(sensor_name: str) dict[source]
Get dictionary with sensor_name velocity bins information.
- Parameters:
sensor_name (str) – Name of the sensor.
- Returns:
sensor_name velocity bins information
- Return type:
dict
- disdrodb.l0.standards.read_config_yml(sensor_name: str, filename: str) dict[source]
Read a config yaml file and return the dictionary.
- Parameters:
sensor_name (str) – Name of the sensor.
filename (str) – Name of the file.
- Returns:
Content of the config file.
- Return type:
dict
- Raises:
ValueError – Error if file does not exist.
- disdrodb.l0.standards.set_disdrodb_attrs(ds, product_level: str)[source]
Add DISDRODB processing information to the netCDF global attributes.
It assumes that the station metadata have already been added to the dataset.
- Parameters:
ds (xarray dataset) – Dataset
product_level (str) – DISDRODB product_level
- Returns:
Dataset
- Return type:
xarray dataset
disdrodb.l0.summary module
disdrodb.l0.template_tools module
- disdrodb.l0.template_tools.arr_has_constant_nchar(arr: array) bool[source]
Check if the content of an array has a constant number of characters
- Parameters:
arr (numpy.ndarray) – The array to analyse
- Returns:
True if the number of characters is constant
- Return type:
bool
- disdrodb.l0.template_tools.check_column_names(column_names: list, sensor_name: str) None[source]
Checks that the column names respect DISDRODB standards.
- Parameters:
column_names (list) – List of columns names.
sensor_name (str) – Name of the sensor.
- Raises:
TypeError – Error if some columns do not meet the DISDRODB standards.
- disdrodb.l0.template_tools.get_decimal_ndigits(string: str) int[source]
Get the number of decimal digits.
- Parameters:
string (str) – Input string
- Returns:
The number of decimal digits.
- Return type:
int
- disdrodb.l0.template_tools.get_df_columns_unique_values_dict(df: DataFrame, column_indices: int | slice | list | None = None, column_names: bool = True)[source]
Create a dictionary {column: unique values}
- Parameters:
df (pd.DataFrame) – Input dataframe
column_indices (Union[int,slice,list], optional) – column indices
column_names (bool, optional) – If true, print the column name, by default True
- disdrodb.l0.template_tools.get_natural_ndigits(string: str) int[source]
Get the number of natural digits.
- Parameters:
string (str) – Input string
- Returns:
The number of natural digits.
- Return type:
int
- disdrodb.l0.template_tools.get_nchar(string: str) int[source]
Get the number of characters.
- Parameters:
string (str) – Input string
- Returns:
Number of characters
- Return type:
int
- disdrodb.l0.template_tools.get_ndigits(string: str) int[source]
Get the number of digits.
- Parameters:
string (str) – Input string
- Returns:
Number of digits
- Return type:
int
- disdrodb.l0.template_tools.get_possible_keys(dict_options: dict, desired_value: str) set[source]
Get the possible keys from the input value
- Parameters:
dict_options (dict) – Input dictionary
desired_value (str) – Input value
- Returns:
Keys whose value matches the desired input value.
- Return type:
set
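A reverse lookup of this kind is typically a set comprehension. The sketch below assumes exact equality as the matching rule; the helper possible_keys and the field names and values in the example dictionary are hypothetical.

```python
def possible_keys(dict_options: dict, desired_value) -> set:
    """Return the keys whose value equals desired_value."""
    return {key for key, value in dict_options.items() if value == desired_value}


# Hypothetical mapping from field name to expected number of characters.
nchar_per_field = {"field_a": 8, "field_b": 6, "field_c": 6}
print(possible_keys(nchar_per_field, 6))
```

Returning a set reflects that several fields can share the same characteristic, so a single string property is rarely enough to identify a column.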
- disdrodb.l0.template_tools.infer_column_names(df: DataFrame, sensor_name: str, row_idx: int = 1)[source]
Try to guess the dataframe columns names based on string characteristics.
- Parameters:
df (pd.DataFrame) – The dataframe to analyse
sensor_name (str) – name of the sensor
row_idx (int, optional) – The row ID of the array, by default 1
- Returns:
Dictionary with the keys being the column id and the values being the guessed column names
- Return type:
dict
- disdrodb.l0.template_tools.print_df_column_names(df: DataFrame) None[source]
Print dataframe columns names
- Parameters:
df (dataframe) – The dataframe
- Returns:
Nothing
- Return type:
None
- disdrodb.l0.template_tools.print_df_columns_unique_values(df: DataFrame, column_indices: int | slice | list | None = None, column_names: bool = True) None[source]
Print columns’ unique values
- Parameters:
df (pd.DataFrame) – Input dataframe
column_indices (Union[int,slice,list], optional) – column indices
column_names (bool, optional) – If true, print the column name, by default True
- disdrodb.l0.template_tools.print_df_first_n_rows(df: DataFrame, n: int = 5, column_names: bool = True) None[source]
Print the first n rows of the dataframe, column by column.
- Parameters:
df (pd.DataFrame) – Input dataframe
n (int, optional) – Number of rows, by default 5
column_names (bool, optional) – If true, the column names are printed, by default True
- disdrodb.l0.template_tools.print_df_random_n_rows(df: DataFrame, n: int = 5, with_column_names: bool = True) None[source]
Print the content of the dataframe by column, randomly chosen
- Parameters:
df (dataframe) – The dataframe
n (int, optional) – The number of rows to print, by default 5
with_column_names (bool, optional) – If true, print the column name, by default True
- Returns:
Nothing
- Return type:
None
- disdrodb.l0.template_tools.print_df_summary_stats(df: DataFrame, column_indices: int | slice | list | None = None, column_names: bool = True)[source]
Create a columns statistics summary.
- Parameters:
df (pd.DataFrame) – Input dataframe
column_indices (Union[int,slice,list], optional) – column indices
column_names (bool, optional) – If true, print the column name, by default True
- Raises:
ValueError – Error if the column types are not numeric.
- disdrodb.l0.template_tools.print_df_with_any_nan_rows(df: DataFrame) None[source]
Print the rows containing at least one NaN value
- Parameters:
df (pd.DataFrame) – Input dataframe.
- disdrodb.l0.template_tools.print_valid_L0_column_names(sensor_name: str) None[source]
Print valid columns names from the standard.
- Parameters:
sensor_name (str) – Name of the sensor.
- disdrodb.l0.template_tools.search_possible_columns(string: str, sensor_name: str) list[source]
Define the possible columns
- Parameters:
string (str) – Input string
sensor_name (str) – Name of the sensor
- Returns:
list of possible columns
- Return type:
list
- disdrodb.l0.template_tools.str_has_decimal_digits(string: str) bool[source]
Check if a string has decimals
- Parameters:
string (str) – Input string
- Returns:
True if the string has decimal digits.
- Return type:
bool
- disdrodb.l0.template_tools.str_is_integer(string: str) bool[source]
Check if a string is an integer
- Parameters:
string (str) – Input string
- Returns:
True if integer.
- Return type:
bool
Module contents
- disdrodb.l0.available_readers(data_sources=None, reader_path=False)[source]
Retrieve available readers information.
- disdrodb.l0.run_disdrodb_l0(disdrodb_dir, data_sources=None, campaign_names=None, station_names=None, l0a_processing: bool = True, l0b_processing: bool = True, l0b_concat: bool = False, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]
Run the L0 processing of DISDRODB stations.
This function launches the processing of many DISDRODB stations with a single command. From the list of all available DISDRODB stations, it runs the processing of the stations matching the provided data_sources, campaign_names and station_names.
- Parameters:
disdrodb_dir (str) – Base directory of DISDRODB. Format: <…>/DISDRODB
data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, all stations are processed. The default is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default is None
station_names (list) – Station names to process. The default is None
l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default is True.
l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default is True.
l0b_concat (bool) – Whether to concatenate all raw files into a single L0B netCDF file. If l0b_concat=True, all raw files will be saved into a single L0B netCDF file. If l0b_concat=False, each raw file will be converted into the corresponding L0B netCDF file. The default is False.
remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default is False.
remove_l0b (bool) – Whether to remove the L0B files after having concatenated all L0B netCDF files. It takes place only if l0b_concat=True. The default is False.
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information into terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process will use a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files. For L0B, it processes just the first 100 rows of 3 L0A files. The default is False.
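The station selection described above can be pictured as a filter over (data_source, campaign_name, station_name) triples. The following is a minimal sketch, not the disdrodb implementation: the helper select_stations and the station list are hypothetical, and it assumes that a filter left as None does not restrict the selection.

```python
from typing import Iterable, List, Optional, Tuple

# (data_source, campaign_name, station_name)
Station = Tuple[str, str, str]


def select_stations(
    stations: Iterable[Station],
    data_sources: Optional[List[str]] = None,
    campaign_names: Optional[List[str]] = None,
    station_names: Optional[List[str]] = None,
) -> List[Station]:
    """Hypothetical helper: keep the stations matching every provided filter.

    Assumption: a filter left as None matches everything.
    """
    selected = []
    for data_source, campaign_name, station_name in stations:
        if data_sources is not None and data_source not in data_sources:
            continue
        if campaign_names is not None and campaign_name not in campaign_names:
            continue
        if station_names is not None and station_name not in station_names:
            continue
        selected.append((data_source, campaign_name, station_name))
    return selected


# Made-up station list, for illustration only.
available = [
    ("SOURCE_A", "CAMPAIGN_1", "station_1"),
    ("SOURCE_A", "CAMPAIGN_2", "station_2"),
    ("SOURCE_B", "CAMPAIGN_3", "station_3"),
]

only_source_a = select_stations(available, data_sources=["SOURCE_A"])
one_campaign = select_stations(available, campaign_names=["CAMPAIGN_3"])
```

With no filter at all, the sketch returns every available station, which mirrors the "process all stations" behaviour mentioned for data_sources.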
- disdrodb.l0.run_disdrodb_l0_station(disdrodb_dir, data_source, campaign_name, station_name, l0a_processing: bool = True, l0b_processing: bool = True, l0b_concat: bool = True, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]
Run the L0 processing of a specific DISDRODB station from the terminal.
- Parameters:
disdrodb_dir (str) – Base directory of DISDRODB. Format: <…>/DISDRODB
data_source (str) – Institution name (when campaign data spans more than 1 country), or country (when all campaigns (or sensor networks) are inside a given country). Must be UPPER CASE.
campaign_name (str) – Campaign name. Must be UPPER CASE.
station_name (str) – Station name
l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default is True.
l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default is True.
l0b_concat (bool) – Whether to concatenate all raw files into a single L0B netCDF file. If l0b_concat=True, all raw files will be saved into a single L0B netCDF file. If l0b_concat=False, each raw file will be converted into the corresponding L0B netCDF file. The default is True.
remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default is False.
remove_l0b (bool) – Whether to remove the L0B files after having concatenated all L0B netCDF files. It takes place only if l0b_concat=True. The default is False.
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process uses a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files for each station. For L0B, it processes just the first 100 rows of 3 L0A files for each station. The default is False.
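As an illustration of how the parameters above fit together, the sketch below collects them into a dictionary and forwards them to run_disdrodb_l0_station. The directory, data source, campaign and station values are placeholders, not real DISDRODB entries. The wrapper function main is hypothetical and imports disdrodb only when called.

```python
# Placeholder values: the directory and the names below are illustrative only.
station_kwargs = {
    "disdrodb_dir": "/path/to/DISDRODB",
    "data_source": "DATA_SOURCE",      # must be UPPER CASE
    "campaign_name": "CAMPAIGN_NAME",  # must be UPPER CASE
    "station_name": "station_name",
    "l0a_processing": True,   # raw data -> L0A Apache Parquet
    "l0b_processing": True,   # L0A -> L0B netCDF4
    "l0b_concat": True,       # one L0B netCDF file for the whole station
    "remove_l0a": False,
    "remove_l0b": False,      # only relevant when l0b_concat=True
    "force": False,
    "verbose": False,
    "debugging_mode": True,   # process only a small subset of the data
    "parallel": False,
}


def main() -> None:
    """Hypothetical wrapper: run the L0 processing with the settings above."""
    # Imported here so the dictionary can be inspected without disdrodb installed.
    from disdrodb.l0 import run_disdrodb_l0_station

    run_disdrodb_l0_station(**station_kwargs)
```

Enabling debugging_mode for a first run keeps the processed data volume small, which makes it easier to check the configuration before launching a full run.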
- disdrodb.l0.run_l0a(raw_dir, processed_dir, station_name, glob_patterns, column_names, reader_kwargs, df_sanitizer_fun, parallel, verbose, force, debugging_mode)[source]
Run the L0A processing for a specific DISDRODB station.
- Parameters:
raw_dir (str) –
The directory path where all the raw content of a specific campaign is stored. The path must have the following structure:
<…>/DISDRODB/Raw/<data_source>/<campaign_name>.
Inside the raw_dir directory, the following structure is required:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yaml
Important points:
- For each <station_name> there must be a corresponding YAML file in the metadata subfolder.
- The <campaign_name> must semantically match between the raw_dir and processed_dir directory paths and the key ‘campaign_name’ within the metadata YAML files.
- The campaign_name is expected to be UPPER CASE.
processed_dir (str) –
The desired directory path for the processed DISDRODB L0A and L0B products. The path should have the following structure:
<…>/DISDRODB/Processed/<data_source>/<campaign_name>
For testing purposes, this function exceptionally also accepts a directory path simply ending with <campaign_name> (e.g. /tmp/<campaign_name>).
station_name (str) – Station name
glob_patterns (str) – Glob pattern to search data files in <raw_dir>/data/<station_name>.
column_names (list) – Column names of the raw text file.
reader_kwargs (dict) – Pandas read_csv arguments to open the text file.
df_sanitizer_fun (object, optional) – Sanitizer function to format the dataframe into DISDRODB L0A standard.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is False.
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
debugging_mode (bool) – If True, it reduces the amount of data to process. It processes just the first 100 rows of 3 raw data files. The default is False.
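The raw_dir layout and the role of glob_patterns can be illustrated with the standard library alone. The following is a minimal sketch, not the disdrodb implementation: the helper list_station_files is hypothetical, the directory and file names are made up, and the check on the metadata file only mirrors the requirement stated above.

```python
import tempfile
from pathlib import Path
from typing import List


def list_station_files(raw_dir: str, station_name: str, glob_pattern: str) -> List[Path]:
    """Hypothetical helper: list the raw files of a station.

    Raises FileNotFoundError if <raw_dir>/metadata/<station_name>.yaml is missing.
    """
    raw_path = Path(raw_dir)
    metadata_file = raw_path / "metadata" / f"{station_name}.yaml"
    if not metadata_file.is_file():
        raise FileNotFoundError(f"Missing metadata file: {metadata_file}")
    data_dir = raw_path / "data" / station_name
    # Sorted so that the result does not depend on filesystem ordering.
    return sorted(data_dir.glob(glob_pattern))


# Build a throwaway campaign directory following the layout described above.
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / "DISDRODB" / "Raw" / "DATA_SOURCE" / "CAMPAIGN_NAME"
    station_dir = raw_dir / "data" / "station_1"
    station_dir.mkdir(parents=True)
    (raw_dir / "metadata").mkdir()
    (raw_dir / "metadata" / "station_1.yaml").write_text("campaign_name: CAMPAIGN_NAME\n")
    for name in ("a.txt", "b.txt", "notes.log"):
        (station_dir / name).write_text("")

    found = list_station_files(str(raw_dir), "station_1", "*.txt")
    found_names = [path.name for path in found]
```

Here the pattern "*.txt" selects the two text files and skips notes.log, so a narrower glob pattern is a simple way to exclude auxiliary files stored next to the raw data.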
- disdrodb.l0.run_l0b_from_nc(raw_dir, processed_dir, station_name, glob_patterns, dict_names, ds_sanitizer_fun, parallel, verbose, force, debugging_mode)[source]
Run the L0B processing for a specific DISDRODB station with raw netCDFs.
- Parameters:
raw_dir (str) –
The directory path where all the raw content of a specific campaign is stored. The path must have the following structure:
<…>/DISDRODB/Raw/<data_source>/<campaign_name>.
Inside the raw_dir directory, the following structure is required:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yaml
Important points:
- For each <station_name> there must be a corresponding YAML file in the metadata subfolder.
- The <campaign_name> must semantically match between the raw_dir and processed_dir directory paths and the key ‘campaign_name’ within the metadata YAML files.
- The campaign_name is expected to be UPPER CASE.
processed_dir (str) –
The desired directory path for the processed DISDRODB L0B products. The path should have the following structure:
<…>/DISDRODB/Processed/<data_source>/<campaign_name>
For testing purposes, this function exceptionally also accepts a directory path simply ending with <campaign_name> (e.g. /tmp/<campaign_name>).
station_name (str) – Station name
glob_patterns (str) – Glob pattern to search data files in <raw_dir>/data/<station_name>. Example: glob_patterns = “*.nc”
dict_names (dict) – Dictionary mapping raw netCDF variable/coordinate/dimension names to DISDRODB standards.
ds_sanitizer_fun (object, optional) – Sanitizer function to format the raw netCDF into DISDRODB L0B standard.
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. It processes just the first 3 raw netCDF files. The default is False.
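The role of dict_names can be pictured as a plain renaming step. The sketch below is not the disdrodb implementation and uses ordinary dictionaries to stay dependency-free. The helper rename_with_dict_names is hypothetical, the raw and target names are made up, and it assumes that names missing from dict_names are kept unchanged. On an xarray dataset, a mapping of this kind would typically be applied with Dataset.rename.

```python
from typing import Any, Dict


def rename_with_dict_names(raw: Dict[str, Any], dict_names: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical helper: rename the keys of raw according to dict_names.

    Assumption: names missing from dict_names are kept unchanged.
    """
    return {dict_names.get(name, name): values for name, values in raw.items()}


# Made-up raw names and made-up target names, for illustration only.
raw_variables = {"Time": [0, 60], "RainRate": [0.0, 1.2], "extra": [1, 2]}
dict_names = {"Time": "time", "RainRate": "rain_rate"}

renamed = rename_with_dict_names(raw_variables, dict_names)
```

Any further adjustment that a pure renaming cannot express (unit conversion, dropping variables, reshaping) is the kind of task the ds_sanitizer_fun argument is meant for.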