disdrodb.l0 package
Subpackages
Submodules
disdrodb.l0.check_configs module
disdrodb.l0.check_metadata module
disdrodb.l0.check_readers module
disdrodb.l0.check_standards module
disdrodb.l0.io module
- disdrodb.l0.io.check_glob_pattern(pattern: str) → None[source]
Check that the input parameter is a string and that it can be used as a pattern.
- Parameters
pattern (str) – String to be checked.
- Raises
TypeError – The input parameter is not a string.
ValueError – The input parameter cannot be used as a pattern.
- disdrodb.l0.io.check_glob_patterns(patterns: Union[str, list]) → list[source]
Check if glob patterns are valid.
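The two checks above can be sketched as follows. This is a minimal illustration, not the library code: the function names are hypothetical, and because the rule that makes a string unusable as a pattern is not stated here, the sketch assumes that empty patterns and patterns starting with "/" are rejected.

```python
def check_glob_pattern_sketch(pattern):
    """Hypothetical stand-in for check_glob_pattern."""
    if not isinstance(pattern, str):
        raise TypeError("Expected a string as glob pattern.")
    # Assumed rule (illustration only): reject empty or absolute patterns.
    if pattern == "" or pattern.startswith("/"):
        raise ValueError("The string cannot be used as a glob pattern.")


def check_glob_patterns_sketch(patterns):
    """Accept a string or a list of strings and always return a list."""
    if isinstance(patterns, str):
        patterns = [patterns]
    if not isinstance(patterns, list):
        raise TypeError("Expected a string or a list of strings.")
    for pattern in patterns:
        check_glob_pattern_sketch(pattern)
    return patterns
```

Normalising a single string to a one-element list lets downstream code loop over patterns without special cases.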
- disdrodb.l0.io.check_processed_dir(processed_dir)[source]
Check input, format and validity of the directory path
- Parameters
processed_dir (str) – Path of the processed directory
- Returns
Path of the processed directory
- Return type
str
- disdrodb.l0.io.check_raw_dir(raw_dir: str, verbose: bool = False) → None[source]
Check validity of raw_dir.
Steps:
1. Check that ‘raw_dir’ is a valid directory path.
2. Check that ‘raw_dir’ follows the expected directory structure.
3. Check that each station_name directory contains data.
4. Check that for each station_name the mandatory metadata.yml is specified.
5. Check that for each station_name the mandatory issue.yml is specified.
- Parameters
raw_dir (str) – Input raw directory
verbose (bool, optional) – Whether to print detailed processing information. The default is False.
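The validation steps of check_raw_dir can be pictured with the sketch below. It is not the library implementation: the function name is hypothetical, and the layout it checks (‘data’, ‘metadata’ and ‘issue’ subdirectories, with one `<station_name>.yml` file per station) is an assumption made for illustration.

```python
from pathlib import Path


def check_raw_dir_sketch(raw_dir):
    """Hypothetical walk-through of the checks; raises ValueError on failure."""
    raw_dir = Path(raw_dir)
    # Step 1: raw_dir must be an existing directory.
    if not raw_dir.is_dir():
        raise ValueError(f"{raw_dir} is not a directory.")
    # Step 2: assumed layout with 'data', 'metadata' and 'issue' subdirectories.
    for name in ("data", "metadata", "issue"):
        if not (raw_dir / name).is_dir():
            raise ValueError(f"Missing '{name}' directory in {raw_dir}.")
    stations = sorted(p.name for p in (raw_dir / "data").iterdir() if p.is_dir())
    if not stations:
        raise ValueError("No station directory found.")
    for station in stations:
        # Step 3: each station directory must contain at least one entry.
        if not any((raw_dir / "data" / station).iterdir()):
            raise ValueError(f"No data for station {station}.")
        # Last two steps: each station needs a metadata and an issue YAML file.
        for name in ("metadata", "issue"):
            if not (raw_dir / name / f"{station}.yml").is_file():
                raise ValueError(f"Missing {name} file for station {station}.")
```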
- disdrodb.l0.io.create_directory_structure(processed_dir, product_level, station_name, force, verbose=False)[source]
Create directory structure for L0B and higher DISDRODB products.
- disdrodb.l0.io.create_initial_directory_structure(raw_dir, processed_dir, station_name, force, verbose=False, product_level='L0A')[source]
Create directory structure for the first L0 DISDRODB product.
If the input data are raw text files –> product_level = “L0A” (run_l0a)
If the input data are raw netCDF files –> product_level = “L0B” (run_l0b_nc)
- disdrodb.l0.io.get_L0A_dir(processed_dir: str, station_name: str) → str[source]
Define L0A directory.
- Parameters
processed_dir (str) – Path of the processed directory
station_name (str) – Name of the station
- Returns
L0A directory path.
- Return type
str
- disdrodb.l0.io.get_L0A_fname(df, processed_dir, station_name: str) → str[source]
Define L0A file name.
- Parameters
df (pd.DataFrame) – L0A DataFrame
processed_dir (str) – Path of the processed directory
station_name (str) – Name of the station
- Returns
L0A file name.
- Return type
str
- disdrodb.l0.io.get_L0A_fpath(df: DataFrame, processed_dir: str, station_name: str) → str[source]
Define L0A file path.
- Parameters
df (pd.DataFrame) – L0A DataFrame.
processed_dir (str) – Path of the processed directory.
station_name (str) – Name of the station.
- Returns
L0A file path.
- Return type
str
- disdrodb.l0.io.get_L0B_dir(processed_dir: str, station_name: str) → str[source]
Define L0B directory.
- Parameters
processed_dir (str) – Path of the processed directory
station_name (str) – Name of the station
- Returns
Path of the L0B directory
- Return type
str
- disdrodb.l0.io.get_L0B_fname(ds, processed_dir, station_name: str) → str[source]
Define L0B file name.
- Parameters
ds (xr.Dataset) – L0B xarray Dataset
processed_dir (str) – Path of the processed directory
station_name (str) – Name of the station
- Returns
L0B file name.
- Return type
str
- disdrodb.l0.io.get_L0B_fpath(ds: Dataset, processed_dir: str, station_name: str, l0b_concat=False) → str[source]
Define L0B file path.
- Parameters
ds (xr.Dataset) – L0B xarray Dataset.
processed_dir (str) – Path of the processed directory.
station_name (str) – ID of the station
l0b_concat (bool) – If False, the file is specified inside the station directory. If True, the file is specified outside the station directory.
- Returns
L0B file path.
- Return type
str
- disdrodb.l0.io.get_campaign_name(path: str) → str[source]
Return the campaign name from a file or directory path.
Current assumption: no data_source, campaign_name, station_name or file name contains the word DISDRODB!
- Parameters
path (str) – path can be a campaign_dir (‘raw_dir’ or ‘processed_dir’), or a DISDRODB file path.
- Returns
Name of the campaign.
- Return type
str
- disdrodb.l0.io.get_data_source(path: str) → str[source]
Return the data_source from a file or directory path.
Current assumption: no data_source, campaign_name, station_name or file name contains the word DISDRODB!
- Parameters
path (str) – path can be a campaign_dir (‘raw_dir’ or ‘processed_dir’), or a DISDRODB file path.
- Returns
Name of the data source.
- Return type
str
- disdrodb.l0.io.get_dataframe_min_max_time(df: DataFrame)[source]
Retrieves dataframe starting and ending time.
- Parameters
df (pd.DataFrame) – Input dataframe
- Returns
(starting_time, ending_time)
- Return type
tuple
- disdrodb.l0.io.get_dataset_min_max_time(ds: Dataset)[source]
Retrieves dataset starting and ending time.
- Parameters
ds (xr.Dataset) – Input dataset
- Returns
(starting_time, ending_time)
- Return type
tuple
- disdrodb.l0.io.get_disdrodb_dir(path: str) → str[source]
Return the disdrodb base directory from a file or directory path.
Current assumption: no data_source, campaign_name, station_name or file name contains the word DISDRODB!
- Parameters
path (str) – path can be a campaign_dir (‘raw_dir’ or ‘processed_dir’), or a DISDRODB file path.
- Returns
Path of the DISDRODB directory.
- Return type
str
- disdrodb.l0.io.get_disdrodb_path(path: str) → str[source]
Return the path starting from the disdrodb_dir directory.
Current assumption: no data_source, campaign_name, station_name or file name contains the word DISDRODB!
- Parameters
path (str) – path can be a campaign_dir (‘raw_dir’ or ‘processed_dir’), or a DISDRODB file path.
- Returns
Path inside the DISDRODB archive. Format: DISDRODB/<Raw or Processed>/<data_source>/…
- Return type
str
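The path helpers above all rely on locating the DISDRODB component of a path laid out as `<..>/DISDRODB/<Raw or Processed>/<data_source>/<campaign_name>`. The sketch below shows one way such parsing could work. The function names are hypothetical, the example path uses placeholder names, and matching on a whole path component equal to "DISDRODB" is an assumption rather than the documented behaviour.

```python
from pathlib import PurePosixPath


def split_disdrodb_path(path):
    """Hypothetical helper: split a path at its 'DISDRODB' component."""
    parts = PurePosixPath(path).parts
    if "DISDRODB" not in parts:
        raise ValueError("The path does not contain a DISDRODB directory.")
    # index() returns the first match, hence the 'single DISDRODB' assumption.
    idx = parts.index("DISDRODB")
    disdrodb_dir = str(PurePosixPath(*parts[: idx + 1]))
    disdrodb_path = str(PurePosixPath(*parts[idx:]))
    return disdrodb_dir, disdrodb_path


def parse_archive_path(path):
    """Hypothetical helper: extract data_source and campaign_name."""
    _, disdrodb_path = split_disdrodb_path(path)
    parts = PurePosixPath(disdrodb_path).parts
    # Expected: DISDRODB/<Raw or Processed>/<data_source>/<campaign_name>/...
    if len(parts) < 4:
        raise ValueError("The path is too short to contain a campaign name.")
    return {"data_source": parts[2], "campaign_name": parts[3]}


example = "/tmp/DISDRODB/Raw/SOURCE/CAMPAIGN/data/station_1"
disdrodb_dir, disdrodb_path = split_disdrodb_path(example)
info = parse_archive_path(example)
```

If the word DISDRODB appeared a second time in the path, the first occurrence would win, which is why the documented assumption matters.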
- disdrodb.l0.io.get_l0a_file_list(processed_dir, station_name, debugging_mode)[source]
Retrieve L0A files for a given station.
- Parameters
processed_dir (str) – Directory of the campaign where to search for the L0A files. Format <..>/DISDRODB/Processed/<data_source>/<campaign_name>
station_name (str) – ID of the station
debugging_mode (bool, optional) – If True, it selects at most 3 files for debugging purposes. The default is False.
- Returns
list_fpaths – List of L0A file paths.
- Return type
list
- disdrodb.l0.io.get_raw_file_list(raw_dir, station_name, glob_patterns, verbose=False, debugging_mode=False)[source]
Get the list of files from a directory based on input parameters.
Currently concatenates all files provided by the glob patterns. In future, this might be modified to enable DISDRODB processing when raw data are separated in multiple files.
- Parameters
raw_dir (str) – Directory of the campaign where to search for files. Format <..>/DISDRODB/Raw/<data_source>/<campaign_name>
station_name (str) – ID of the station
verbose (bool, optional) – Whether to print detailed processing information. The default is False.
debugging_mode (bool, optional) – If True, it selects at most 3 files for debugging purposes. The default is False.
- Returns
list_fpaths – List of file paths.
- Return type
list
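A minimal sketch of how files matching several glob patterns can be collected and then limited to 3 in debugging mode. The function name is hypothetical, and the assumed location of the station files (`<raw_dir>/data/<station_name>/`) is an illustration, not a statement about the library.

```python
import glob
import os


def list_raw_files_sketch(raw_dir, station_name, glob_patterns, debugging_mode=False):
    """Hypothetical stand-in for get_raw_file_list."""
    if isinstance(glob_patterns, str):
        glob_patterns = [glob_patterns]
    # Assumed layout: <raw_dir>/data/<station_name>/<glob_pattern>
    station_dir = os.path.join(raw_dir, "data", station_name)
    fpaths = []
    for pattern in glob_patterns:
        fpaths.extend(glob.glob(os.path.join(station_dir, pattern)))
    # Overlapping patterns can match the same file twice, so deduplicate.
    fpaths = sorted(set(fpaths))
    if not fpaths:
        raise ValueError(f"No file found for station {station_name}.")
    if debugging_mode:
        fpaths = fpaths[:3]
    return fpaths
```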
- disdrodb.l0.io.read_L0A_dataframe(fpaths: Union[str, list], verbose: bool = False, debugging_mode: bool = False) → DataFrame[source]
Read DISDRODB L0A Apache Parquet file(s).
- Parameters
fpaths (str or list) – Either a single file path or a list of file paths.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is False.
debugging_mode (bool) – If True, it reduces the amount of data to process. If fpaths is a list, it reads only the first 3 files. For each file, it selects only the first 100 rows. The default is False.
- Returns
L0A Dataframe.
- Return type
pd.DataFrame
disdrodb.l0.issue module
- class disdrodb.l0.issue.NoDatesSafeLoader(stream)[source]
Bases: SafeLoader
- classmethod remove_implicit_resolver(tag_to_remove)[source]
Remove implicit resolvers for a particular tag
Takes care not to modify resolvers in super classes.
We want to load datetimes as strings, not dates, because we go on to serialise as JSON, which doesn’t have the advanced types of YAML, and this leads to incompatibilities down the track.
- disdrodb.l0.issue.check_issue_file(fpath: str) → None[source]
Check issue YAML file validity.
- Parameters
fpath (str) – Issue YAML file path.
- disdrodb.l0.issue.check_timesteps(timesteps)[source]
Check timesteps validity.
It expects timestep strings in YYYY-mm-dd HH:MM:SS format with second accuracy. If timesteps is None, return None.
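The expected format maps onto the `%Y-%m-%d %H:%M:%S` format codes of the standard library. The sketch below is an illustration only: the function name is hypothetical, and returning a list of `datetime` objects is an assumption, since the return value of check_timesteps is not documented here.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def check_timesteps_sketch(timesteps):
    """Hypothetical stand-in for check_timesteps."""
    if timesteps is None:
        return None
    if isinstance(timesteps, str):
        timesteps = [timesteps]
    # strptime raises ValueError for any string that does not match the format.
    return [datetime.strptime(t, TIME_FORMAT) for t in timesteps]
```

A string without seconds, such as "2018-12-07 14:15", fails the check because the format requires second accuracy.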
- disdrodb.l0.issue.is_numpy_array_datetime(arr)[source]
Check if the numpy array contains datetime64.
- Parameters
arr (numpy array) – Numpy array to check.
- Returns
True if the array has a datetime64 dtype, False otherwise.
- Return type
bool
- disdrodb.l0.issue.is_numpy_array_string(arr)[source]
Check if the numpy array contains strings
- Parameters
arr (numpy array) – Numpy array to check.
- disdrodb.l0.issue.load_yaml_without_date_parsing(filepath)[source]
Read a YAML file without automatically converting date strings to datetime objects.
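The usual way to get this behaviour with PyYAML is to subclass `yaml.SafeLoader` and drop its implicit timestamp resolver. The sketch below follows that widely used pattern. The class name `NoDatesLoader` is hypothetical and the code is not claimed to be identical to the library's NoDatesSafeLoader.

```python
import yaml


class NoDatesLoader(yaml.SafeLoader):
    """SafeLoader subclass whose timestamp resolver can be removed."""

    @classmethod
    def remove_implicit_resolver(cls, tag_to_remove):
        # Copy the resolver table first so that SafeLoader itself is untouched.
        if "yaml_implicit_resolvers" not in cls.__dict__:
            cls.yaml_implicit_resolvers = cls.yaml_implicit_resolvers.copy()
        for first_char, mappings in cls.yaml_implicit_resolvers.items():
            cls.yaml_implicit_resolvers[first_char] = [
                (tag, regexp) for tag, regexp in mappings if tag != tag_to_remove
            ]


NoDatesLoader.remove_implicit_resolver("tag:yaml.org,2002:timestamp")

text = "timesteps:\n  - 2018-12-07 14:15:00\n"
data = yaml.load(text, Loader=NoDatesLoader)
```

With the resolver removed, the timestamp stays a plain string, whereas `yaml.safe_load` on the same text would return a `datetime` object.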
- disdrodb.l0.issue.read_issue(raw_dir: str, station_name: str) → dict[source]
Read YAML issue file.
- Parameters
raw_dir (str) – Path of the campaign raw directory.
station_name (str) – Station name.
- Returns
Issue dictionary.
- Return type
dict
- disdrodb.l0.issue.read_issue_file(fpath: str) → dict[source]
Read YAML issue file.
- Parameters
fpath (str) – Filepath of the issue YAML.
- Returns
Issue dictionary.
- Return type
dict
disdrodb.l0.l0_processing module
disdrodb.l0.l0_reader module
disdrodb.l0.l0a_processing module
disdrodb.l0.l0b_concat module
disdrodb.l0.l0b_processing module
disdrodb.l0.metadata module
- disdrodb.l0.metadata.add_missing_metadata_keys(metadata)[source]
Add missing keys to the metadata dictionary.
- disdrodb.l0.metadata.check_metadata_compliance(disdrodb_dir, data_source, campaign_name, station_name)[source]
Check DISDRODB metadata compliance.
- disdrodb.l0.metadata.create_campaign_default_metadata(disdrodb_dir, campaign_name, data_source)[source]
Create default YAML metadata files for all stations within a campaign.
Use the function with caution to avoid overwriting existing YAML files.
- disdrodb.l0.metadata.get_default_metadata_dict() → dict[source]
Get DISDRODB metadata default values.
- Returns
Dictionary of standard attributes.
- Return type
dict
- disdrodb.l0.metadata.get_metadata_missing_keys(metadata)[source]
Return the DISDRODB metadata keys which are missing.
- disdrodb.l0.metadata.get_metadata_unvalid_keys(metadata)[source]
Return the DISDRODB metadata keys which are not valid.
- disdrodb.l0.metadata.get_valid_metadata_keys() → list[source]
Get DISDRODB valid metadata list.
- Returns
List of valid metadata keys
- Return type
list
- disdrodb.l0.metadata.read_metadata(campaign_dir: str, station_name: str) → dict[source]
Read YAML metadata file.
- Parameters
campaign_dir (str) – Path of the campaign directory.
station_name (str) – ID of the station.
- Returns
Dictionary of the metadata.
- Return type
dict
- disdrodb.l0.metadata.remove_unvalid_metadata_keys(metadata)[source]
Remove invalid keys from the metadata dictionary.
- disdrodb.l0.metadata.sort_metadata_dictionary(metadata)[source]
Sort the keys of the metadata dictionary by valid_metadata_keys list order.
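The key-handling helpers of this module can be illustrated with plain dictionary operations. In the sketch below the key list and function names are hypothetical (the real list comes from get_valid_metadata_keys()), and placing unknown keys at the end when sorting is an assumption.

```python
# Hypothetical key list, for illustration only.
VALID_KEYS = ["data_source", "campaign_name", "station_name", "sensor_name"]


def missing_keys(metadata, valid_keys=VALID_KEYS):
    """Valid keys that are absent from the metadata dictionary."""
    return [key for key in valid_keys if key not in metadata]


def invalid_keys(metadata, valid_keys=VALID_KEYS):
    """Metadata keys that are not in the list of valid keys."""
    return [key for key in metadata if key not in valid_keys]


def sort_metadata(metadata, valid_keys=VALID_KEYS):
    """Return a new dict whose keys follow the order of valid_keys."""
    ordered = {key: metadata[key] for key in valid_keys if key in metadata}
    # Assumption: keys outside valid_keys are kept and appended at the end.
    ordered.update({k: v for k, v in metadata.items() if k not in ordered})
    return ordered


metadata = {"station_name": "s1", "extra": 1, "data_source": "X"}
```

Because Python dictionaries preserve insertion order, rebuilding the dictionary in the desired key order is enough to sort it.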