Step-by-step guide for DISDRODB reader preparation

This notebook aims to guide you through creating the reader for the raw files logged by a disdrometer device.

First, this notebook provides functions to display and investigate the content of your raw data files.

Then, you will define a series of parameters that control the reader behaviour. These pieces of code will be consolidated in the `reader_template.py <https://github.com/ltelab/disdrodb/blob/main/disdrodb/L0/readers/reader_template.py>`__ file to generate a DISDRODB L0 reader.

In this notebook, we use a lightweight dataset for illustrative purposes. You may reuse and adapt it to explore your own dataset when preparing a new reader.

Following the documentation chapter `Add a new reader <https://disdrodb.readthedocs.io/en/latest/readers.html#adding-a-new-reader>`__, we will follow 3 steps:

  • Step 1: We set up the data within the correct directory structure.

  • Step 2: We start digging into the data to set up the transformation parameters.

  • Step 3: We create the new reader.

Step 1: Set up the data within the correct directory structure

For this example, you will find the sample data in the folder `data <https://github.com/ltelab/disdrodb/tree/main/data/DISDRODB>`__ of the disdrodb repository. It corresponds to some measurements taken at two stations (station_name_1 and station_name_2) during two days of a field campaign led by the EPFL LTE laboratory.

📁 DISDRODB
├── 📁 Raw
    ├── 📁 DATA_SOURCE
        ├── 📁 CAMPAIGN_NAME
            ├── 📁 data
                ├── 📁 station_name_1
                    ├── 📜 file60_20180817.dat.gz
                    ├── 📜 file60_20180818.dat.gz
                ├── 📁 station_name_2
                    ├── 📜 file61_20180817.dat.gz
                    ├── 📜 file61_20180818.dat.gz
            ├── 📁 info
            ├── 📁 issue
                ├── 📜 station_name_1.yml
                ├── 📜 station_name_2.yml
            ├── 📁 metadata
                ├── 📜 station_name_1.yml
                ├── 📜 station_name_2.yml

This structure fulfills the requirements described in the documentation chapter "Add a new reader".
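
Before moving on, it can help to check that the expected folders and files are in place. The snippet below is a minimal sketch that only uses the Python standard library. The helper `missing_items` is hypothetical (it is not part of disdrodb), and the list of items it checks is an assumption derived from the tree shown above.

```python
import tempfile
from pathlib import Path


def missing_items(campaign_dir, station_names):
    """Return the expected paths that are missing below campaign_dir."""
    campaign_dir = Path(campaign_dir)
    expected = []
    for name in station_names:
        expected.append(campaign_dir / "data" / name)  # folder with the raw files
        expected.append(campaign_dir / "metadata" / f"{name}.yml")  # station metadata
        expected.append(campaign_dir / "issue" / f"{name}.yml")  # station issue file
    return [p.relative_to(campaign_dir).as_posix() for p in expected if not p.exists()]


# Build a toy copy of the layout in a temporary directory.
# The YAML files of station_name_2 are left out on purpose.
stations = ["station_name_1", "station_name_2"]
with tempfile.TemporaryDirectory() as tmp:
    campaign = Path(tmp) / "DISDRODB" / "Raw" / "DATA_SOURCE" / "CAMPAIGN_NAME"
    for name in stations:
        (campaign / "data" / name).mkdir(parents=True)
    for folder in ("metadata", "issue"):
        (campaign / folder).mkdir()
        (campaign / folder / "station_name_1.yml").touch()
    missing = missing_items(campaign, stations)

print(missing)
```

An empty list means that every item the sketch looks for was found.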

Step 2: Read and analyse the data

Once the dataset and metadata are set up in the correct directory structure, we can start analysing our data.

The objective of Step 2 is to define the specifications to read the raw data into a dataframe and ensure that the dataframe columns match the DISDRODB standards.

At the end, you should be able to generate Apache Parquet files from your input raw data.


Here we load the required modules and packages. Nothing needs to be changed here.

[2]:
# Define project root directory
import os

root_path = os.path.dirname(os.getcwd())  # something like /home/ghiggi/Projects/disdrodb
print(root_path)
/home/ghiggi/Projects/disdrodb
[3]:
# If you did not install disdrodb, but you are running this tutorial within the cloned repository:
import sys

sys.path.insert(0, root_path)
[4]:
import logging
import pandas as pd

# Directory
from disdrodb.l0.io import (
    get_campaign_name,
    create_initial_directory_structure,
    get_raw_file_list,
)


# Tools to develop the reader
from disdrodb.l0.template_tools import (
    check_column_names,
    infer_df_str_column_names,
    print_df_first_n_rows,
    print_df_random_n_rows,
    print_df_column_names,
    print_valid_L0_column_names,
    get_df_columns_unique_values_dict,
    print_df_columns_unique_values,
    print_df_summary_stats,
)

# L0A processing
from disdrodb.l0.l0a_processing import (
    read_raw_data,
    read_raw_file_list,
    cast_column_dtypes,
    write_l0a,
)

# L0B processing
from disdrodb.l0.l0b_processing import (
    retrieve_l0b_arrays,
    create_l0b_from_l0a,
    set_encodings,
)

# Metadata
from disdrodb.l0.metadata import read_metadata

# Standards
from disdrodb.l0.check_standards import check_sensor_name, check_l0a_column_names

1. Define paths and running parameters

In the following section, define the raw and processed directory paths. Change them if you are using another folder.

NB:

  • In a real use case, DATA_SOURCE and CAMPAIGN_NAME should be replaced by meaningful names!

  • The raw_dir and processed_dir must end with the same CAMPAIGN_NAME (in upper case).

[13]:
disdrodb_dir = os.path.join(root_path, "data", "DISDRODB")
raw_dir = os.path.join(disdrodb_dir, "Raw", "DATA_SOURCE", "CAMPAIGN_NAME")
processed_dir = os.path.join(disdrodb_dir, "Processed", "DATA_SOURCE", "CAMPAIGN_NAME")
assert os.path.exists(raw_dir), "Raw directory does not exist"
print(f"raw_dir: {raw_dir}")
print(f"processed_dir: {processed_dir}")
raw_dir: /home/ghiggi/Projects/disdrodb/data/DISDRODB/Raw/DATA_SOURCE/CAMPAIGN_NAME
processed_dir: /home/ghiggi/Projects/disdrodb/data/DISDRODB/Processed/DATA_SOURCE/CAMPAIGN_NAME
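
The constraint on the two paths (same final CAMPAIGN_NAME component, in upper case) can be verified with a few lines of standard-library code. This is only a sketch: `check_campaign_dirs` is a hypothetical helper, not a disdrodb function, and the paths below are placeholders.

```python
import os


def check_campaign_dirs(raw_dir, processed_dir):
    """Return the campaign name shared by both directories, or raise ValueError."""
    # normpath removes a trailing separator, so basename returns the last folder
    raw_name = os.path.basename(os.path.normpath(raw_dir))
    processed_name = os.path.basename(os.path.normpath(processed_dir))
    if raw_name != processed_name:
        raise ValueError(f"Campaign names differ: {raw_name!r} vs {processed_name!r}")
    if raw_name != raw_name.upper():
        raise ValueError(f"Campaign name {raw_name!r} is not upper case")
    return raw_name


# Placeholder paths used for illustration only
campaign_name = check_campaign_dirs(
    "/tmp/DISDRODB/Raw/DATA_SOURCE/CAMPAIGN_NAME",
    "/tmp/DISDRODB/Processed/DATA_SOURCE/CAMPAIGN_NAME/",
)
print(campaign_name)
```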

Then we define the reader execution parameters. Once the new reader is created, these parameters will become the reader function arguments. Please have a look at the documentation for a full description.

[15]:
force = True
parallel = False
verbose = True
debugging_mode = True
sensor_name = "OTT_Parsivel"

3. Selection of the station

In this example, we choose to implement and run the reader for station station_name_1. However, feel free to change the station name :)

[16]:
station_name = "station_name_1"

2. Initialization

We run some checks and retrieve some variables. Nothing needs to be changed here.

[17]:
# Create directory structure
create_initial_directory_structure(
    raw_dir=raw_dir,
    processed_dir=processed_dir,
    station_name=station_name,
    force=force,
    verbose=False,
)

Please make sure to run the cell above only once. If it is run several times, the log file blocks the folder creation.

4. Get the list of files to process

We now list all the files of the selected station. Here we need to specify the glob pattern that selects all the relevant data files. Since the files in this case study are named like file<XXX>_<TIME>.dat.gz, we define the glob pattern "*.dat*". Note that "*.dat.gz" or "file*.dat.gz" would also have worked.

[18]:
glob_pattern = "*.dat*"

file_list = get_raw_file_list(
    raw_dir=raw_dir,
    station_name=station_name,
    glob_patterns=glob_pattern,
    verbose=verbose,
    debugging_mode=debugging_mode,
)

print(file_list)
 -  - 2 files to process in /home/ghiggi/Projects/disdrodb/data/DISDRODB/Raw/DATA_SOURCE/CAMPAIGN_NAME/data/station_name_1
['/home/ghiggi/Projects/disdrodb/data/DISDRODB/Raw/DATA_SOURCE/CAMPAIGN_NAME/data/station_name_1/file60_20180817.dat.gz', '/home/ghiggi/Projects/disdrodb/data/DISDRODB/Raw/DATA_SOURCE/CAMPAIGN_NAME/data/station_name_1/file60_20180818.dat.gz']

🚨 The glob_pattern variable definition will be transferred into the `reader_template.py <https://github.com/ltelab/disdrodb/blob/main/disdrodb/L0/readers/reader_template.py>`__ file at the end of this notebook.

Remember that the glob_pattern variable depends on the file extensions of your dataset !!!
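
To get a feeling for which names a pattern selects, candidate patterns can be tried on a list of file names before touching the disk. The sketch below relies on `fnmatch` from the standard library, which implements shell-style wildcards. Whether `get_raw_file_list` applies exactly the same matching rules is an assumption, and `notes.txt` is a made-up file name added for contrast.

```python
from fnmatch import fnmatch

filenames = [
    "file60_20180817.dat.gz",
    "file60_20180818.dat.gz",
    "notes.txt",  # hypothetical extra file that should not be selected
]

patterns = ["*.dat*", "*.dat.gz", "file*.dat.gz", "*.txt"]

# Map each pattern to the file names it selects
matches = {p: [name for name in filenames if fnmatch(name, p)] for p in patterns}

for pattern, selected in matches.items():
    print(f"{pattern} -> {selected}")
```

The first three patterns select the two data files, while "*.txt" only selects the extra file.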

5. Retrieve metadata from YAML files

We now load the metadata file of the station.

If the name of the station is not correctly defined, an error message is raised.

[19]:
# Retrieve metadata
attrs = read_metadata(campaign_dir=raw_dir, station_name=station_name)

# Retrieve sensor name
sensor_name = attrs["sensor_name"]
check_sensor_name(sensor_name)
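
If an error is raised at this stage, a quick check is whether a YAML file named after the station exists in the metadata folder. The sketch below mirrors that behaviour with the standard library only. `metadata_filepath` is a hypothetical helper and does not reflect how `read_metadata` is implemented. The file location is taken from the directory tree of Step 1.

```python
import os
import tempfile


def metadata_filepath(campaign_dir, station_name):
    """Return the path of the station metadata file, or raise if it is missing."""
    filepath = os.path.join(campaign_dir, "metadata", f"{station_name}.yml")
    if not os.path.isfile(filepath):
        raise ValueError(f"No metadata file for station {station_name!r}: {filepath}")
    return filepath


# Toy campaign directory with a single (empty) metadata file
with tempfile.TemporaryDirectory() as campaign_dir:
    os.makedirs(os.path.join(campaign_dir, "metadata"))
    open(os.path.join(campaign_dir, "metadata", "station_name_1.yml"), "w").close()

    found = os.path.basename(metadata_filepath(campaign_dir, "station_name_1"))
    try:
        metadata_filepath(campaign_dir, "station_name_3")  # misspelled station name
        error_message = None
    except ValueError as err:
        error_message = str(err)

print(found)
print(error_message)
```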

5. Load one file into a dataframe

In the reader_kwargs dictionary, you may set any arguments that need to be passed to read the raw text file into a pandas.DataFrame.

[20]:
reader_kwargs = {}

# - Define delimiter
reader_kwargs["delimiter"] = ","

# - Prevent the first column from becoming the df index
reader_kwargs["index_col"] = False

# Since column names are expected to be passed explicitly, header is set to None
reader_kwargs["header"] = None

# - Number of rows to be skipped at the beginning of the file
reader_kwargs["skiprows"] = None

# - Define behaviour when encountering bad lines
reader_kwargs["on_bad_lines"] = "skip"

# - Define reader engine
#   - C engine is faster
#   - Python engine is more feature-complete
reader_kwargs["engine"] = "python"

# - Define on-the-fly decompression of on-disk data
#   - Available: gzip, bz2, zip
reader_kwargs["compression"] = "infer"

# - Strings to recognize as NA/NaN and replace with standard NA flags
#   - Already included: ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’,
#                       ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘<NA>’, ‘N/A’,
#                       ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’
reader_kwargs["na_values"] = ["na", "", "error"]


# -----------------------------------------------------------
# Select first file
filepath = file_list[0]

# Try to read the raw file
df_raw = read_raw_data(filepath, column_names=None, reader_kwargs=reader_kwargs)
# Print the dataframe
print(f"Dataframe for the file {os.path.basename(filepath)} :")
display(df_raw)
Dataframe for the file file60_20180817.dat.gz :
0 1 2 3 4 5 6 7 8 9 ... 14 15 16 17 18 19 20 21 22 23
0 362511 4612.0301 00847.4977 01-08-2018 12:44:30 NaN OK 0000.000 0056.49 00 00 ... 035 0.06 24.9 0 005.649 000 -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... 00.000,00.000,00.000,00.000,00.000,00.000,00.0... 000,000,000,000,000,000,000,000,000,000,000,00... 0
1 362512 4612.0301 00847.4978 01-08-2018 12:45:01 NaN OK 0000.000 0056.49 00 00 ... 035 0.06 24.9 0 005.649 000 -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... 00.000,00.000,00.000,00.000,00.000,00.000,00.0... 000,000,000,000,000,000,000,000,000,000,000,00... 0
2 362513 4612.0301 00847.4985 01-08-2018 12:45:30 NaN OK 0000.000 0056.49 00 00 ... 035 0.06 24.9 0 005.649 000 -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... 00.000,00.000,00.000,00.000,00.000,00.000,00.0... 000,000,000,000,000,000,000,000,000,000,000,00... 0
3 362514 4612.0305 00847.4990 01-08-2018 12:46:01 NaN OK 0000.000 0056.49 00 00 ... 035 0.05 24.9 0 005.649 000 -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... 00.000,00.000,00.000,00.000,00.000,00.000,00.0... 000,000,000,000,000,000,000,000,000,000,000,00... 0
4 362515 4612.0303 00847.4992 01-08-2018 12:46:31 NaN OK 0000.000 0056.49 00 00 ... 034 0.06 24.9 0 005.649 000 -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... 00.000,00.000,00.000,00.000,00.000,00.000,00.0... 000,000,000,000,000,000,000,000,000,000,000,00... 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4736 367249 4612.0313 00847.4956 03-08-2018 04:13:25 NaN OK 0000.000 0056.71 00 00 ... 015 0.06 24.9 0 005.671 000 -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... 00.000,00.000,00.000,00.000,00.000,00.000,00.0... 000,000,000,000,000,000,000,000,000,000,000,00... 0
4737 367250 4612.0313 00847.4955 03-08-2018 04:13:56 NaN OK 0000.000 0056.71 00 00 ... 015 0.06 24.9 0 005.671 000 -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... 00.000,00.000,00.000,00.000,00.000,00.000,00.0... 000,000,000,000,000,000,000,000,000,000,000,00... 0
4738 367251 4612.0313 00847.4955 03-08-2018 04:14:26 NaN OK 0000.000 0056.71 00 00 ... 015 0.06 24.9 0 005.671 000 -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... 00.000,00.000,00.000,00.000,00.000,00.000,00.0... 000,000,000,000,000,000,000,000,000,000,000,00... 0
4739 367252 4612.0313 00847.4954 03-08-2018 04:14:55 NaN OK 0000.000 0056.71 00 00 ... 015 0.06 24.9 0 005.671 000 -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... 00.000,00.000,00.000,00.000,00.000,00.000,00.0... 000,000,000,000,000,000,000,000,000,000,000,00... 0
4740 367253 4612.0313 00847.4954 03-08-2018 04:15:25 NaN OK 0000.000 0056.71 00 00 ... 015 0.07 24.9 0 005.671 000 -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... 00.000,00.000,00.000,00.000,00.000,00.000,00.0... 000,000,000,000,000,000,000,000,000,000,000,00... 0

4741 rows × 24 columns

[22]:
print("Column names:", df_raw.columns)
print("Row Index:", df_raw.index)
Column names: Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
            17, 18, 19, 20, 21, 22, 23],
           dtype='int64')
Row Index: RangeIndex(start=0, stop=4741, step=1)

Here we expect df_raw to have:

  • numeric column names (i.e. Int64Index)

  • a numeric row index (i.e. RangeIndex)

If the structure of the dataframe looks fine (no header and no row index), we are on the right track!

Depending on the schema of your data, this reader_kwargs dictionary may be fairly different from the one above.

🚨 The reader_kwargs dictionary will be transferred to the `reader_template.py <https://github.com/ltelab/disdrodb/blob/main/disdrodb/L0/readers/reader_template.py>`__ file at the end of this notebook.
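
The keys used in reader_kwargs are also valid keyword arguments of `pandas.read_csv`, so their effect can be explored on a tiny synthetic file. The sketch below assumes that the keys are forwarded to `pandas.read_csv` more or less unchanged, which may not match what `read_raw_data` does internally. It also assumes a pandas version that supports `on_bad_lines`. The two rows of data are made up and only mimic the raw format.

```python
import gzip
import os
import tempfile

import pandas as pd

# Same keys as in the cell above (skiprows=None is the pandas default and is omitted)
reader_kwargs = {
    "delimiter": ",",
    "index_col": False,
    "header": None,
    "on_bad_lines": "skip",
    "engine": "python",
    "compression": "infer",
    "na_values": ["na", "", "error"],
}

# Two synthetic rows: the values are invented for illustration
raw_text = "362511,4612.0301,OK,na\n362512,4612.0305,error,0.06\n"

with tempfile.TemporaryDirectory() as tmp:
    filepath = os.path.join(tmp, "toy_file.dat.gz")
    with gzip.open(filepath, "wt") as f:
        f.write(raw_text)
    # dtype=str keeps every field as text, so leading zeros are preserved
    df = pd.read_csv(filepath, dtype=str, **reader_kwargs)

# The two properties expected from the raw dataframe
has_numeric_columns = list(df.columns) == list(range(df.shape[1]))
has_range_index = isinstance(df.index, pd.RangeIndex)

print(df)
print(has_numeric_columns, has_range_index)
```

In this toy file, the strings "na" and "error" are read as missing values because they are listed in na_values.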

6. Data exploration

The settings for loading the data are now ready. We can now load one file and analyse its content to see if there are any errors or inconsistencies.

Here are some instructions:

  • Do not assign column names to the dataframe columns yet

  • Do not assign a dtype to the dataframe columns yet

  • Possibly look at multiple files ;)

We print the content of the first 3 rows (feel free to change the value of n to see more or fewer rows):

[23]:
print_df_first_n_rows(df_raw, n=2, column_names=False)
 - Column 0 :
      ['362511' '362512' '362513']
 - Column 1 :
      ['4612.0301' '4612.0301' '4612.0301']
 - Column 2 :
      ['00847.4977' '00847.4978' '00847.4985']
 - Column 3 :
      ['01-08-2018 12:44:30' '01-08-2018 12:45:01' '01-08-2018 12:45:30']
 - Column 4 :
      [nan nan nan]
 - Column 5 :
      ['OK' 'OK' 'OK']
 - Column 6 :
      ['0000.000' '0000.000' '0000.000']
 - Column 7 :
      ['0056.49' '0056.49' '0056.49']
 - Column 8 :
      ['00' '00' '00']
 - Column 9 :
      ['00' '00' '00']
 - Column 10 :
      ['-9.999' '-9.999' '-9.999']
 - Column 11 :
      ['9999' '9999' '9999']
 - Column 12 :
      ['12611' '12617' '12600']
 - Column 13 :
      ['00000' '00000' '00000']
 - Column 14 :
      ['035' '035' '035']
 - Column 15 :
      ['0.06' '0.06' '0.06']
 - Column 16 :
      ['24.9' '24.9' '24.9']
 - Column 17 :
      ['0' '0' '0']
 - Column 18 :
      ['005.649' '005.649' '005.649']
 - Column 19 :
      ['000' '000' '000']
 - Column 20 :
      ['-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,'
 '-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,'
 '-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,']
 - Column 21 :
      ['00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,'
 '00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,'
 '00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,']
 - Column 22 :
      ['000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,...,000,000,000,000,'
 '000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,...,000,000,000,000,'
 '000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,...,000,000,000,000,']
 - Column 23 :
      ['0' '0' '0']
[24]:
df_raw.head(3)
[24]:
0 1 2 3 4 5 6 7 8 9 ... 14 15 16 17 18 19 20 21 22 23
0 362511 4612.0301 00847.4977 01-08-2018 12:44:30 NaN OK 0000.000 0056.49 00 00 ... 035 0.06 24.9 0 005.649 000 -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... 00.000,00.000,00.000,00.000,00.000,00.000,00.0... 000,000,000,000,000,000,000,000,000,000,000,00... 0
1 362512 4612.0301 00847.4978 01-08-2018 12:45:01 NaN OK 0000.000 0056.49 00 00 ... 035 0.06 24.9 0 005.649 000 -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... 00.000,00.000,00.000,00.000,00.000,00.000,00.0... 000,000,000,000,000,000,000,000,000,000,000,00... 0
2 362513 4612.0301 00847.4985 01-08-2018 12:45:30 NaN OK 0000.000 0056.49 00 00 ... 035 0.06 24.9 0 005.649 000 -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... 00.000,00.000,00.000,00.000,00.000,00.000,00.0... 000,000,000,000,000,000,000,000,000,000,000,00... 0

3 rows × 24 columns
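
Columns 20 to 22 each store many comma-separated values inside a single string. Counting the values per field is a quick way to tell such array-like columns apart and to spot truncated records. The helper `count_values` below is a hypothetical sketch, and the field lengths (32 and 1024) are illustrative choices rather than counts measured on the sample file.

```python
def count_values(field, sep=","):
    """Count the values in a separator-delimited string, ignoring a trailing separator."""
    field = field.strip().rstrip(sep)
    return len(field.split(sep)) if field else 0


# Synthetic fields that mimic the shape of the array-like columns above
short_field = "-9.999," * 32
long_field = "000," * 1024

n_short = count_values(short_field)
n_long = count_values(long_field)
print(n_short, n_long)
```

On a real dataframe, the same function could be applied to a column with `df_raw[20].map(count_values)` to check that every row holds the same number of values.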

We print the content of n rows picked randomly :

[25]:
print_df_random_n_rows(df_raw, n=6, with_column_names=False)
- Column 0 : ['365205' '363869' '366700' '366371' '366659' '363330']
- Column 1 : ['4612.0319' '4612.0293' '4612.0293' '4612.0312' '4612.0305' '4612.0328']
- Column 2 : ['00847.4989' '00847.4946' '00847.4936' '00847.4958' '00847.4923'
 '00847.4942']
- Column 3 : ['02-08-2018 11:11:31' '02-08-2018 00:03:31' '02-08-2018 23:39:01'
 '02-08-2018 20:54:30' '02-08-2018 23:18:31' '01-08-2018 19:34:01']
- Column 4 : [nan nan nan nan nan nan]
- Column 5 : ['OK' 'OK' 'OK' 'OK' 'OK' 'OK']
- Column 6 : ['0000.000' '0000.000' '0000.000' '0000.000' '0000.000' '0000.000']
- Column 7 : ['0056.67' '0056.67' '0056.71' '0056.71' '0056.71' '0056.67']
- Column 8 : ['00' '00' '00' '00' '00' '00']
- Column 9 : ['00' '00' '00' '00' '00' '00']
- Column 10 : ['-9.999' '-9.999' '-9.999' '-9.999' '-9.999' '-9.999']
- Column 11 : ['9999' '9999' '9999' '9999' '9999' '9999']
- Column 12 : ['12628' '12562' '11699' '12305' '11694' '12501']
- Column 13 : ['00000' '00000' '00000' '00000' '00000' '00000']
- Column 14 : ['032' '017' '016' '017' '016' '018']
- Column 15 : ['0.05' '0.06' '0.06' '0.06' '0.05' '0.06']
- Column 16 : ['24.9' '24.9' '24.9' '24.9' '24.9' '24.9']
- Column 17 : ['0' '0' '0' '0' '0' '0']
- Column 18 : ['005.667' '005.667' '005.671' '005.671' '005.671' '005.667']
- Column 19 : ['000' '000' '000' '000' '000' '000']
- Column 20 : ['-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,'
 '-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,'
 '-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,'
 '-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,'
 '-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,'
 '-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,']
- Column 21 : ['00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,'
 '00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,'
 '00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,'
 '00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,'
 '00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,'
 '00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,']
- Column 22 : ['000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,
000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,
000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,'
 '000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,'
 '000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,'
 '000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,'
 '000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,'
 '000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,']
- Column 23 : ['0' '0' '0' '0' '0' '0']

Get the number of columns:

[26]:
len(df_raw.columns)
[26]:
24

Look at the unique values of a single column:

[27]:
print_df_columns_unique_values(df_raw, column_indices=11, column_names=False)
 - Column 11 :
      ['0824', '0906', '1363', '1397', '2921', '3203', '3326', '3816', '4465', '9999']

Look at the unique values of a few columns:

Note: Use column_indices=None to get the unique values of all columns.

[28]:
print_df_columns_unique_values(df_raw, column_indices=slice(10, 12), column_names=False)
 - Column 10 :
      ['-9.999', '02.669', '04.241', '04.745', '04.826', '04.879', '05.430', '06.095', '06.220', '07.415', '08.436', '08.489', '08.506', '08.724', '08.956', '09.079', '09.894', '10.057', '10.567', '11.705', '12.097', '12.390', '12.923', '13.114', '13.407', '13.684', '14.324', '15.060', '16.530', '16.636', '16.668', '17.194', '17.382', '17.829', '17.918', '18.334', '18.655', '19.526', '20.329', '21.134', '21.426', '23.098', '23.664', '23.760', '24.472', '25.473', '25.957', '29.270', '31.271', '32.255', '33.844', '36.196']
 - Column 11 :
      ['0824', '0906', '1363', '1397', '2921', '3203', '3326', '3816', '4465', '9999']

Get the unique values as a dictionary:

[29]:
get_df_columns_unique_values_dict(df_raw, column_indices=slice(10, 12), column_names=False)
[29]:
{'Column 10': ['-9.999',
  '02.669',
  '04.241',
  '04.745',
  '04.826',
  '04.879',
  '05.430',
  '06.095',
  '06.220',
  '07.415',
  '08.436',
  '08.489',
  '08.506',
  '08.724',
  '08.956',
  '09.079',
  '09.894',
  '10.057',
  '10.567',
  '11.705',
  '12.097',
  '12.390',
  '12.923',
  '13.114',
  '13.407',
  '13.684',
  '14.324',
  '15.060',
  '16.530',
  '16.636',
  '16.668',
  '17.194',
  '17.382',
  '17.829',
  '17.918',
  '18.334',
  '18.655',
  '19.526',
  '20.329',
  '21.134',
  '21.426',
  '23.098',
  '23.664',
  '23.760',
  '24.472',
  '25.473',
  '25.957',
  '29.270',
  '31.271',
  '32.255',
  '33.844',
  '36.196'],
 'Column 11': ['0824',
  '0906',
  '1363',
  '1397',
  '2921',
  '3203',
  '3326',
  '3816',
  '4465',
  '9999']}
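The idea behind these helpers can be illustrated with plain pandas. The following is a minimal sketch under stated assumptions, not the disdrodb implementation: the function unique_values_dict() and the toy dataframe df_toy are hypothetical names introduced here for illustration only.

```python
import pandas as pd

# Hypothetical toy dataframe: every field is kept as a string, as in df_raw.
df_toy = pd.DataFrame(
    {
        0: ["0824", "9999", "0824", "1363"],
        1: ["-9.999", "02.669", "-9.999", "-9.999"],
    }
)


def unique_values_dict(df, column_indices=None):
    """Return {'Column <i>': sorted unique values} for the selected columns."""
    columns = list(df.columns)
    if column_indices is None:
        selected = columns  # all columns
    elif isinstance(column_indices, slice):
        selected = columns[column_indices]  # a range of columns
    else:
        selected = [columns[column_indices]]  # a single column
    return {
        f"Column {columns.index(col)}": sorted(df[col].unique().tolist())
        for col in selected
    }


result = unique_values_dict(df_toy, column_indices=slice(0, 2))
print(result)
```

Since the values are strings, the sorting is lexicographic, which is why a missing-value flag such as '-9.999' appears before '02.669'.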

7. Column names

Now that we have validated the content of our data, it is time to take care of its structure (the column names).

The function infer_df_str_column_names() tries to guess the possible names of each column by matching string patterns defined in L0A_encodings.yml for the given sensor type.

[30]:
infer_df_str_column_names(df_raw, sensor_name=sensor_name)
[30]:
{0: [],
 1: [],
 2: [],
 3: [],
 4: [],
 5: [],
 6: ['rainfall_rate_32bit'],
 7: ['rainfall_accumulated_32bit', 'rainfall_accumulated_16bit'],
 8: ['weather_code_synop_4680', 'weather_code_synop_4677'],
 9: ['weather_code_synop_4680', 'weather_code_synop_4677'],
 10: ['reflectivity_32bit', 'rainfall_rate_16bit'],
 11: ['mor_visibility'],
 12: ['number_particles', 'sample_interval', 'laser_amplitude'],
 13: ['number_particles', 'sample_interval', 'laser_amplitude'],
 14: ['error_code', 'sensor_temperature'],
 15: ['sensor_heating_current'],
 16: ['sensor_battery_voltage'],
 17: ['sensor_status'],
 18: ['rainfall_amount_absolute_32bit'],
 19: ['error_code', 'sensor_temperature'],
 20: ['raw_drop_average_velocity', 'raw_drop_concentration'],
 21: ['raw_drop_average_velocity', 'raw_drop_concentration'],
 22: ['raw_drop_number'],
 23: ['sensor_status']}

This will help us define the column_names list later.

For reference, here is the list of valid column names (taken from L0A_encodings.yml):
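To see why a column can receive zero, one, or several candidate names, here is a minimal sketch of pattern-based inference. It assumes that each variable is described by a regular expression; the patterns in TOY_PATTERNS and the function infer_names() are hypothetical and do not reproduce the content of L0A_encodings.yml or the disdrodb implementation.

```python
import re

# Hypothetical patterns, for illustration only.
TOY_PATTERNS = {
    "rainfall_rate_32bit": r"\d{4}\.\d{3}",
    "reflectivity_32bit": r"[-\d]\d\.\d{3}",
    "mor_visibility": r"\d{4}",
}


def infer_names(values, patterns=TOY_PATTERNS):
    """Return the names whose pattern fully matches every value."""
    return [
        name
        for name, pattern in patterns.items()
        if all(re.fullmatch(pattern, value) for value in values)
    ]


candidates = {
    0: infer_names(["abc"]),  # nothing matches
    10: infer_names(["-9.999", "02.669", "36.196"]),
    11: infer_names(["0824", "9999"]),
}
print(candidates)
```

When two variables share the same string format, both names are returned, which is why the output above lists several candidates for some columns.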

[31]:
print_valid_L0_column_names(sensor_name)
['rainfall_rate_32bit', 'rainfall_accumulated_32bit', 'weather_code_synop_4680', 'weather_code_synop_4677', 'weather_code_metar_4678', 'weather_code_nws', 'reflectivity_32bit', 'mor_visibility', 'sample_interval', 'laser_amplitude', 'number_particles', 'sensor_temperature', 'sensor_serial_number', 'firmware_iop', 'firmware_dsp', 'sensor_heating_current', 'sensor_battery_voltage', 'sensor_status', 'start_time', 'sensor_time', 'sensor_date', 'station_name', 'station_number', 'rainfall_amount_absolute_32bit', 'error_code', 'rainfall_rate_16bit', 'rainfall_rate_12bit', 'rainfall_accumulated_16bit', 'reflectivity_16bit', 'raw_drop_concentration', 'raw_drop_average_velocity', 'raw_drop_number']

It is now time to define our column names:

Hints for defining the names:

  • Get information from the disdrometer user guide and from the data logger employed.

  • Use infer_df_str_column_names() to help you.

  • Analyse the content column by column with print_df_columns_unique_values().

[32]:
column_names = [
    "unknown1",
    "unknown2",
    "unknown3",
    "timestep",
    "unknown4",
    "unknown5",
    "rainfall_rate_32bit",
    "rainfall_accumulated_32bit",
    "weather_code_synop_4680",
    "weather_code_synop_4677",
    "reflectivity_32bit",
    "mor_visibility",
    "laser_amplitude",
    "number_particles",
    "sensor_temperature",
    "sensor_heating_current",
    "sensor_battery_voltage",
    "sensor_status",
    "rainfall_amount_absolute_32bit",
    "error_code",
    "raw_drop_concentration",
    "raw_drop_average_velocity",
    "raw_drop_number",
    "unknown6",
]

🚨 The column_names list will be transferred to the reader_template.py file at the end of this notebook.

Check the validity of your definition:

[33]:
check_column_names(column_names, sensor_name)
The following columns do no met the DISDRODB standards: ['unknown2', 'timestep', 'unknown4', 'unknown1', 'unknown6', 'unknown3', 'unknown5'].
Please remove such columns within the df_sanitizer_fun
Please be sure to create the 'time' column within the df_sanitizer_fun.
The 'time' column must be datetime with resolution in seconds (dtype='M8[s]').

Fair enough. Some columns need to be removed, and we also need to define a “time” column with a datetime dtype to meet the DISDRODB standards.

These points will be addressed in Section 10 (Final columns formatting) of this notebook.
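As a preview of what such a clean-up involves, here is a minimal sketch using plain pandas. Everything in it is an assumption made for illustration: the function name toy_sanitizer(), the reduced set VALID_NAMES, the toy data, and in particular the timestamp format, which must be adapted to the actual content of the raw files. The cast to second resolution requested by the message above is left out.

```python
import pandas as pd

# Hypothetical subset of valid names; the full list is printed by
# print_valid_L0_column_names(sensor_name).
VALID_NAMES = {"rainfall_rate_32bit", "mor_visibility"}

# Toy dataframe with string fields. The timestamp format is an assumption.
df_toy = pd.DataFrame(
    {
        "unknown1": ["a", "b"],
        "timestep": ["17-08-2018 00:00:30", "17-08-2018 00:01:00"],
        "rainfall_rate_32bit": ["0000.000", "0002.881"],
        "mor_visibility": ["9999", "0824"],
    }
)


def toy_sanitizer(df):
    """Create the 'time' column and keep only 'time' plus the valid columns."""
    df = df.copy()
    # Parse the timestamp strings; entries that cannot be parsed become NaT.
    df["time"] = pd.to_datetime(
        df["timestep"], format="%d-%m-%Y %H:%M:%S", errors="coerce"
    )
    # Keep 'time' first, then every column with a valid name.
    keep = ["time"] + [name for name in df.columns if name in VALID_NAMES]
    return df[keep]


df_clean = toy_sanitizer(df_toy)
print(df_clean.dtypes)
```

Working on a copy leaves the input dataframe untouched, so the raw columns remain available for further inspection.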

8. Read the dataframe with the correct column names

We can now create a new dataframe with the column names:

[34]:
df = read_raw_data(filepath=filepath, column_names=column_names, reader_kwargs=reader_kwargs)

And print the dataframe column names:

[35]:
print_df_column_names(df)
 - Column 0 : unknown1
 - Column 1 : unknown2
 - Column 2 : unknown3
 - Column 3 : timestep
 - Column 4 : unknown4
 - Column 5 : unknown5
 - Column 6 : rainfall_rate_32bit
 - Column 7 : rainfall_accumulated_32bit
 - Column 8 : weather_code_synop_4680
 - Column 9 : weather_code_synop_4677
 - Column 10 : reflectivity_32bit
 - Column 11 : mor_visibility
 - Column 12 : laser_amplitude
 - Column 13 : number_particles
 - Column 14 : sensor_temperature
 - Column 15 : sensor_heating_current
 - Column 16 : sensor_battery_voltage
 - Column 17 : sensor_status
 - Column 18 : rainfall_amount_absolute_32bit
 - Column 19 : error_code
 - Column 20 : raw_drop_concentration
 - Column 21 : raw_drop_average_velocity
 - Column 22 : raw_drop_number
 - Column 23 : unknown6

9. Perform further tests and analysis to check the correctness of ``column_names``

You can for example check some statistics for a specific column.

[36]:
column_name = "rainfall_rate_32bit"
array_of_values = df.loc[:, [column_name]].astype("float")
print_df_summary_stats(array_of_values)
 - Column 0 ( rainfall_rate_32bit ):

mean  0.005426
min   0.000000
25%   0.000000
50%   0.000000
75%   0.000000
max   2.881000
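The raw values are read as zero-padded strings, which is why the cast to float is needed before any statistics are computed. A minimal pandas-only sketch of the same idea, using made-up values rather than the notebook data:

```python
import pandas as pd

# Made-up values mimicking the zero-padded strings found in the raw file.
raw_values = pd.Series(
    ["0000.000", "0000.000", "0000.114", "0002.881"],
    name="rainfall_rate_32bit",
)

# Strings must be cast to float before numeric statistics make sense.
numeric_values = raw_values.astype("float")
summary = numeric_values.describe()

# Keep the same statistics as the summary printed above.
print(summary[["mean", "min", "25%", "50%", "75%", "max"]])
```

If the cast fails with a `ValueError`, the column probably contains non-numeric text, which is often a hint that a name in `column_names` has been assigned to the wrong position.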

10. Final columns formatting

[37]:
check_l0a_column_names(df, sensor_name=sensor_name)
The following columns do no met the DISDRODB standards: ['unknown2', 'timestep', 'unknown4', 'unknown1', 'unknown6', 'unknown3', 'unknown5']
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[37], line 1
----> 1 check_l0a_column_names(df, sensor_name=sensor_name)

File ~/Projects/disdrodb/disdrodb/L0/check_standards.py:181, in check_l0a_column_names(df, sensor_name)
    177     msg = (
    178         f"The following columns do no met the DISDRODB standards: {unvalid_columns}"
    179     )
    180     logger.error(msg)
--> 181     raise ValueError(msg)
    182 # --------------------------------------------
    183 # Check time column is present
    184 if "time" not in df_columns:

ValueError: The following columns do no met the DISDRODB standards: ['unknown2', 'timestep', 'unknown4', 'unknown1', 'unknown6', 'unknown3', 'unknown5']
[38]:
check_column_names(column_names, sensor_name)
The following columns do no met the DISDRODB standards: ['unknown2', 'timestep', 'unknown4', 'unknown1', 'unknown6', 'unknown3', 'unknown5'].
Please remove such columns within the df_sanitizer_fun
Please be sure to create the 'time' column within the df_sanitizer_fun.
The 'time' column must be datetime with resolution in seconds (dtype='M8[s]').

Now it’s time to remove all the columns that do not match the DISDRODB standard.

[39]:
df = df.drop(columns=["unknown1", "unknown2", "unknown3", "unknown4", "unknown5", "unknown6"])

It’s also time to define the time column, which is required by the DISDRODB standard:

[40]:
df["time"] = pd.to_datetime(df["timestep"], format="%m-%d-%Y %H:%M:%S")
df = df.drop(columns=["timestep"])
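The `format` string has to match the layout of the raw timestamps exactly. For strings such as `01-08-2018`, the month-first and day-first readings are both valid dates, so a wrong choice does not raise an error. The sketch below uses the standard-library `datetime.strptime`, which accepts the same strftime-style directives, to show how far apart the two interpretations are. The timestamp is a made-up example.

```python
from datetime import datetime

raw = "01-08-2018 12:44:30"  # made-up timestamp with an ambiguous day/month order

month_first = datetime.strptime(raw, "%m-%d-%Y %H:%M:%S")
day_first = datetime.strptime(raw, "%d-%m-%Y %H:%M:%S")

print(month_first.isoformat())  # 2018-01-08T12:44:30
print(day_first.isoformat())    # 2018-08-01T12:44:30
```

A simple sanity check is to compare the parsed dates with the dates in the file names, and to confirm that consecutive records stay one sampling interval apart. Records that appear to jump by a whole month usually indicate swapped day and month directives.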

Check that the column names meet the DISDRODB standards after custom processing:

[41]:
check_l0a_column_names(df, sensor_name=sensor_name)

Check that the dataframe looks as desired:

[42]:
print_df_column_names(df)
 - Column 0 : rainfall_rate_32bit
 - Column 1 : rainfall_accumulated_32bit
 - Column 2 : weather_code_synop_4680
 - Column 3 : weather_code_synop_4677
 - Column 4 : reflectivity_32bit
 - Column 5 : mor_visibility
 - Column 6 : laser_amplitude
 - Column 7 : number_particles
 - Column 8 : sensor_temperature
 - Column 9 : sensor_heating_current
 - Column 10 : sensor_battery_voltage
 - Column 11 : sensor_status
 - Column 12 : rainfall_amount_absolute_32bit
 - Column 13 : error_code
 - Column 14 : raw_drop_concentration
 - Column 15 : raw_drop_average_velocity
 - Column 16 : raw_drop_number
 - Column 17 : time
[43]:
print_df_random_n_rows(df, n=5)
- Column 0 (rainfall_rate_32bit) : ['0000.000' '0000.000' '0000.000' '0000.000' '0000.114']
- Column 1 (rainfall_accumulated_32bit) : ['0056.67' '0056.52' '0056.67' '0056.71' '0056.67']
- Column 2 (weather_code_synop_4680) : ['00' '00' '00' '00' '57']
- Column 3 (weather_code_synop_4677) : ['00' '00' '00' '00' '58']
- Column 4 (reflectivity_32bit) : ['-9.999' '-9.999' '-9.999' '-9.999' '10.567']
- Column 5 (mor_visibility) : ['9999' '9999' '9999' '9999' '9999']
- Column 6 (laser_amplitude) : ['12631' '12655' '12606' '11551' '12411']
- Column 7 (number_particles) : ['00000' '00000' '00003' '00000' '00022']
- Column 8 (sensor_temperature) : ['035' '027' '036' '015' '017']
- Column 9 (sensor_heating_current) : ['0.06' '0.06' '0.06' '0.06' '0.06']
- Column 10 (sensor_battery_voltage) : ['24.9' '24.9' '24.9' '24.9' '24.9']
- Column 11 (sensor_status) : ['0' '0' '0' '0' '0']
- Column 12 (rainfall_amount_absolute_32bit) : ['005.667' '005.652' '005.667' '005.671' '005.667']
- Column 13 (error_code) : ['000' '000' '000' '000' '000']
- Column 14 (raw_drop_concentration) : ['-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,'
 '-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,'
 '-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,02.130,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,00.405,-9.999,-9.999,-9.999,-9.999,'
 '-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,'
 '-9.999,-9.999,01.371,02.060,01.994,01.540,01.633,01.738,01.377,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,']
- Column 15 (raw_drop_average_velocity) : ['00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,'
 '00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,'
 '00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.799,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,01.700,00.000,00.000,00.000,00.000,'
 '00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,'
 '00.000,00.000,02.200,02.359,02.679,03.000,03.733,03.849,04.400,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,']
- Column 16 (raw_drop_number) : ['000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,'
 '000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,'
 '000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,001,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,001,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,001,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,'
 '000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,'
 '000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,001,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,001,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,001,000,001,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,002,003,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,001,000,002,001,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,001,000,000,001,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,001,002,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,001,001,002,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,']
- Column 17 (time) : ['2018-02-08T10:42:00.000000000' '2018-01-08T14:34:31.000000000'
 '2018-02-08T10:04:01.000000000' '2018-03-08T00:32:31.000000000'
 '2018-01-08T17:49:00.000000000']
[44]:
print_df_columns_unique_values(df, column_indices=2, column_names=True)
 - Column 2 ( weather_code_synop_4680 ):
      ['00', '57', '61', '62', '71', '72', '88']
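The three raw_drop_* columns printed by cell [43] hold comma-separated strings that end with a trailing comma. A minimal sketch of turning such a string into numbers is given below. `parse_spectrum_string` is a hypothetical helper written for this example, and the input strings are shortened, made-up values in the style of the raw data.

```python
def parse_spectrum_string(text, dtype=float):
    """Split a comma-separated spectrum string into a list of numbers.

    The trailing comma seen in the raw data yields an empty last field,
    which is dropped before conversion.
    """
    fields = [field for field in text.split(",") if field.strip() != ""]
    return [dtype(field) for field in fields]


# Shortened, made-up strings in the style of the raw data.
velocities = parse_spectrum_string("00.000,00.799,01.700,")
counts = parse_spectrum_string("000,001,002,003,", dtype=int)

print(velocities)  # [0.0, 0.799, 1.7]
print(counts)      # [0, 1, 2, 3]
```

Counting the parsed values per row is a quick way to confirm that a column name was assigned to the right field, since each spectrum column is expected to hold a fixed number of bins.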

11. Define the dataframe sanitizer function

The df_sanitizer_fun encapsulates the code, specific to each reader/dataset, that is required to obtain a dataframe compliant with the DISDRODB standards.

With the data used in this notebook, we need to drop some columns and define the time column !

From the code defined in Section 10, we define the following function:

[45]:
def df_sanitizer_fun(df):
    # Import pandas
    import pandas as pd

    # - Drop invalid columns
    columns_to_drop = [
        "unknown1",
        "unknown2",
        "unknown3",
        "unknown4",
        "unknown5",
        "unknown6",
    ]

    df = df.drop(columns=columns_to_drop)

    # - Convert timestep column to datetime format
    df["time"] = pd.to_datetime(df["timestep"], format="%m-%d-%Y %H:%M:%S")
    df = df.drop(columns=["timestep"])

    # - Return the dataframe
    return df

🚨 The df_sanitizer_fun() function will be transferred to the reader_template.py file at the end of this notebook.
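Before moving on, the function can be exercised on a tiny hand-made dataframe to confirm that it drops the placeholder columns and builds `time`. The sketch repeats the function in compact form so that it runs on its own. The two rows are made-up values, not notebook data.

```python
import pandas as pd


def df_sanitizer_fun(df):
    # Same logic as the function defined above, repeated so the sketch runs on its own.
    columns_to_drop = [f"unknown{i}" for i in range(1, 7)]
    df = df.drop(columns=columns_to_drop)
    df["time"] = pd.to_datetime(df["timestep"], format="%m-%d-%Y %H:%M:%S")
    return df.drop(columns=["timestep"])


# Made-up two-row dataframe with the placeholder columns and a string timestep.
toy = pd.DataFrame(
    {
        **{f"unknown{i}": ["x", "y"] for i in range(1, 7)},
        "timestep": ["08-17-2018 00:00:00", "08-17-2018 00:00:30"],
        "rainfall_rate_32bit": ["0000.000", "0000.114"],
    }
)

clean = df_sanitizer_fun(toy)
print(list(clean.columns))  # ['rainfall_rate_32bit', 'time']
```

Because `drop` returns a new dataframe, the input `toy` is left untouched, which makes the function safe to call repeatedly while experimenting.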

12. Now let’s try calling the reader function as it will be called in the DISDRODB L0 reader

  • You may try with an increasing number of files by changing the slice of file_list

Here we combine all raw files into a single dataframe.

The function read_raw_file_list takes the following arguments:

  • file_list: the list of files present in the specified station directory

  • column_names: the list of column names (defined previously)

  • reader_kwargs: the dictionary of arguments controlling how the data are loaded into the dataframe (defined previously)

  • sensor_name: taken from the sensor_name key in the metadata YAML file of the station

  • df_sanitizer_fun: the function that sanitizes the dataframe (defined previously)

All these arguments are defined either in the data directory structure, or earlier in the code.
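The general pattern behind this step (read each file, sanitize it, skip files that fail, concatenate the rest) can be sketched in plain pandas. This is not the disdrodb implementation: `combine_raw_files` and `toy_sanitizer` are hypothetical helpers, `reader_kwargs` is assumed here to hold keyword arguments for `pandas.read_csv`, and in-memory CSV text stands in for the raw files.

```python
import io

import pandas as pd


def combine_raw_files(sources, column_names, reader_kwargs, sanitizer):
    """Read each source, sanitize it, and concatenate what could be read."""
    frames, skipped = [], []
    for name, handle in sources.items():
        try:
            # Assumption: reader_kwargs are keyword arguments for pandas.read_csv.
            df = pd.read_csv(handle, names=column_names, **reader_kwargs)
            frames.append(sanitizer(df))
        except Exception as err:  # broad on purpose: one bad file should not stop the loop
            skipped.append((name, str(err)))
    combined = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return combined, skipped


def toy_sanitizer(df):
    # Build the 'time' column and drop the raw string column.
    df["time"] = pd.to_datetime(df["timestep"], format="%m-%d-%Y %H:%M:%S")
    return df.drop(columns=["timestep"])


# In-memory stand-ins for two raw files; the second has an unparsable timestamp.
sources = {
    "good.csv": io.StringIO("08-17-2018 00:00:00;0.0\n08-17-2018 00:00:30;0.1\n"),
    "bad.csv": io.StringIO("not-a-date;0.0\n"),
}
combined, skipped = combine_raw_files(
    sources,
    column_names=["timestep", "rainfall_rate_32bit"],
    reader_kwargs={"delimiter": ";", "header": None, "dtype": str},
    sanitizer=toy_sanitizer,
)
print(len(combined), [name for name, _ in skipped])  # 2 ['bad.csv']
```

Collecting the skipped files together with their error messages, instead of stopping at the first failure, makes it easier to spot which raw files need a closer look as the file list grows.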

[46]:
subset_file_list = file_list[:1]

df = read_raw_file_list(
    file_list=subset_file_list,
    column_names=column_names,
    reader_kwargs=reader_kwargs,
    sensor_name=sensor_name,
    verbose=verbose,
    df_sanitizer_fun=df_sanitizer_fun,
)
display(df)
 - 1 / 1 processed successfully. File name: /home/ghiggi/Projects/disdrodb/data/DISDRODB/Raw/DATA_SOURCE/CAMPAIGN_NAME/data/station_name_1/file60_20180817.dat.gz
 -  - 0 of 1 have been skipped.
rainfall_rate_32bit rainfall_accumulated_32bit weather_code_synop_4680 weather_code_synop_4677 reflectivity_32bit mor_visibility laser_amplitude number_particles sensor_temperature sensor_heating_current sensor_battery_voltage sensor_status rainfall_amount_absolute_32bit error_code raw_drop_concentration raw_drop_average_velocity raw_drop_number time
0 0.0 56.490002 0.0 0.0 -9.999 9999.0 12611.0 0.0 35.0 0.06 24.9 0.0 5.649 0.0 -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... 00.000,00.000,00.000,00.000,00.000,00.000,00.0... 000,000,000,000,000,000,000,000,000,000,000,00... 2018-01-08 12:44:30
1 0.0 56.490002 0.0 0.0 -9.999 9999.0 12617.0 0.0 35.0 0.06 24.9 0.0 5.649 0.0 -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... 00.000,00.000,00.000,00.000,00.000,00.000,00.0... 000,000,000,000,000,000,000,000,000,000,000,00... 2018-01-08 12:45:01
2 0.0 56.490002 0.0 0.0 -9.999 9999.0 12600.0 0.0 35.0 0.06 24.9 0.0 5.649 0.0 -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... 00.000,00.000,00.000,00.000,00.000,00.000,00.0... 000,000,000,000,000,000,000,000,000,000,000,00... 2018-01-08 12:45:30
3 0.0 56.490002 0.0 0.0 -9.999 9999.0 12603.0 0.0 35.0 0.05 24.9 0.0 5.649 0.0 -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... 00.000,00.000,00.000,00.000,00.000,00.000,00.0... 000,000,000,000,000,000,000,000,000,000,000,00... 2018-01-08 12:46:01
4 0.0 56.490002 0.0 0.0 -9.999 9999.0 12606.0 0.0 34.0 0.06 24.9 0.0 5.649 0.0 -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... 00.000,00.000,00.000,00.000,00.000,00.000,00.0... 000,000,000,000,000,000,000,000,000,000,000,00... 2018-01-08 12:46:31
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4736 0.0 56.709999 0.0 0.0 -9.999 9999.0 11059.0 0.0 15.0 0.06 24.9 0.0 5.671 0.0 -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... 00.000,00.000,00.000,00.000,00.000,00.000,00.0... 000,000,000,000,000,000,000,000,000,000,000,00... 2018-03-08 04:13:25
4737 0.0 56.709999 0.0 0.0 -9.999 9999.0 11175.0 0.0 15.0 0.06 24.9 0.0 5.671 0.0 -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... 00.000,00.000,00.000,00.000,00.000,00.000,00.0... 000,000,000,000,000,000,000,000,000,000,000,00... 2018-03-08 04:13:56
4738 0.0 56.709999 0.0 0.0 -9.999 9999.0 11275.0 0.0 15.0 0.06 24.9 0.0 5.671 0.0 -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... 00.000,00.000,00.000,00.000,00.000,00.000,00.0... 000,000,000,000,000,000,000,000,000,000,000,00... 2018-03-08 04:14:26
4739 0.0 56.709999 0.0 0.0 -9.999 9999.0 11361.0 0.0 15.0 0.06 24.9 0.0 5.671 0.0 -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... 00.000,00.000,00.000,00.000,00.000,00.000,00.0... 000,000,000,000,000,000,000,000,000,000,000,00... 2018-03-08 04:14:55
4740 0.0 56.709999 0.0 0.0 -9.999 9999.0 11492.0 0.0 15.0 0.07 24.9 0.0 5.671 0.0 -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... 00.000,00.000,00.000,00.000,00.000,00.000,00.0... 000,000,000,000,000,000,000,000,000,000,000,00... 2018-03-08 04:15:25

4741 rows × 18 columns

Here we derive the corresponding ``xr.Dataset`` object:

[47]:
ds = create_l0b_from_l0a(df, attrs, verbose=False)
print(ds)
<xarray.Dataset>
Dimensions:                         (time: 4741, diameter_bin_center: 32,
                                     velocity_bin_center: 32)
Coordinates: (12/13)
  * diameter_bin_center             (diameter_bin_center) float64 0.062 ... 24.5
    diameter_bin_lower              (diameter_bin_center) float64 0.0 ... 23.0
    diameter_bin_upper              (diameter_bin_center) float64 0.1245 ... ...
    diameter_bin_width              (diameter_bin_center) float64 0.125 ... 3.0
  * velocity_bin_center             (velocity_bin_center) float64 0.05 ... 20.8
    velocity_bin_lower              (velocity_bin_center) float64 0.0 ... 19.2
    ...                              ...
    velocity_bin_width              (velocity_bin_center) float64 0.1 ... 3.2
  * time                            (time) datetime64[ns] 2018-01-08T12:44:30...
    crs                             <U5 'WGS84'
    latitude                        float64 46.2
    longitude                       float64 8.792
    altitude                        int64 1671
Data variables: (12/17)
    raw_drop_concentration          (time, diameter_bin_center) float64 0.0 ....
    raw_drop_average_velocity       (time, velocity_bin_center) float64 0.0 ....
    raw_drop_number                 (time, diameter_bin_center, velocity_bin_center) float64 ...
    rainfall_rate_32bit             (time) float32 0.0 0.0 0.0 ... 0.0 0.0 0.0
    rainfall_accumulated_32bit      (time) float32 56.49 56.49 ... 56.71 56.71
    weather_code_synop_4680         (time) float32 0.0 0.0 0.0 ... 0.0 0.0 0.0
    ...                              ...
    sensor_temperature              (time) float32 35.0 35.0 35.0 ... 15.0 15.0
    sensor_heating_current          (time) float32 0.06 0.06 0.06 ... 0.06 0.07
    sensor_battery_voltage          (time) float32 24.9 24.9 24.9 ... 24.9 24.9
    sensor_status                   (time) float32 0.0 0.0 0.0 ... 0.0 0.0 0.0
    rainfall_amount_absolute_32bit  (time) float32 5.649 5.649 ... 5.671 5.671
    error_code                      (time) float32 0.0 0.0 0.0 ... 0.0 0.0 0.0
Attributes: (12/64)
    data_source:                     DATA_SOURCE
    campaign_name:                   CAMPAIGN_NAME
    station_name:                    station_name_1
    sensor_name:                     OTT_Parsivel
    reader:                          EPFL/LOCARNO_2018
    raw_data_format:                 raw
    ...                              ...
    doi:
    summary:
    disdrodb_processing_date:        2023-02-23 13:44:29
    disdrodb_product_version:        V0
    disdrodb_software_version:       V0
    disdrodb_product_level:          L0B

The dataset can be saved as a DISDRODB L0B netCDF file by running the following code:

[48]:
# ds = set_encodings(ds, sensor_name)
# ds.to_netcdf("/path/where/to/save/the/file.nc")

Step 3 : Create the reader

We now have all the elements needed to create the new reader. All the modifications made in this notebook must now be transcribed into a DISDRODB L0 reader file.

  1. Copy the `disdrodb\L0\readers\reader_template.py <https://github.com/ltelab/disdrodb/tree/main/disdrodb/L0/readers>`__ file into the folder disdrodb\L0\readers\DATA_SOURCE

  2. Rename the copied file to <CAMPAIGN_NAME>.py (or, for example, to <CAMPAIGN_NAME>_<sensor_acronym>.py if multiple types of sensors have been deployed within a single campaign). This is the reader name that you need to add to the metadata YAML file of the stations that require this reader.

  3. Add the reader name to the metadata YAML files of the stations.

Then, within the reader file, update the ``column_names`` list, the ``reader_kwargs`` dictionary and the ``df_sanitizer_fun()`` function as described in the points below.
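The copy-and-rename steps above can also be scripted. The following is a minimal sketch that uses only the Python standard library. The helper name ``copy_reader_template`` and its arguments are hypothetical (they are not part of disdrodb), and ``readers_dir`` is assumed to point to the disdrodb/L0/readers folder of your local clone.

```python
import shutil
from pathlib import Path


def copy_reader_template(readers_dir, data_source, campaign_name):
    """Copy reader_template.py to <readers_dir>/<data_source>/<campaign_name>.py.

    Hypothetical helper: readers_dir is assumed to be the disdrodb/L0/readers
    folder of a local clone of the repository.
    """
    readers_dir = Path(readers_dir)
    template = readers_dir / "reader_template.py"
    target_dir = readers_dir / data_source
    # Create the DATA_SOURCE folder if it does not exist yet
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{campaign_name}.py"
    # Never overwrite a reader that someone already wrote
    if target.exists():
        raise FileExistsError(f"{target} already exists; refusing to overwrite it.")
    shutil.copyfile(template, target)
    return target
```

Refusing to overwrite an existing file is a deliberate choice here, so that an existing reader is not lost by accident.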


  4. Update the ``column_names`` list

    Before :

    column_names = []
    

    After :

    column_names = [
        "unknown1",
        "unknown2",
        "unknown3",
        "timestep",
        "unknown4",
        "unknown5",
        "rainfall_rate_32bit",
        "rainfall_accumulated_32bit",
        "weather_code_synop_4680",
        "weather_code_synop_4677",
        "reflectivity_32bit",
        "mor_visibility",
        "laser_amplitude",
        "number_particles",
        "sensor_temperature",
        "sensor_heating_current",
        "sensor_battery_voltage",
        "sensor_status",
        "rainfall_amount_absolute_32bit",
        "error_code",
        "raw_drop_concentration",
        "raw_drop_average_velocity",
        "raw_drop_number",
        "unknown6",
    ]
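Before moving on, it can help to verify that the list has no duplicated names and that its length matches the number of fields in a raw record. The sketch below is a hypothetical helper (``check_column_names`` is not part of disdrodb). It splits the line naively on the delimiter, so it assumes that the delimiter does not also appear inside quoted fields.

```python
def check_column_names(column_names, sample_line, delimiter=","):
    """Return a list of problems found in column_names (empty if none)."""
    problems = []
    # Names that appear more than once in the list
    duplicates = sorted({n for n in column_names if column_names.count(n) > 1})
    if duplicates:
        problems.append(f"duplicated names: {duplicates}")
    # Naive field count: plain split on the delimiter, no quoting support
    n_fields = len(sample_line.rstrip("\n").split(delimiter))
    if n_fields != len(column_names):
        problems.append(f"{len(column_names)} names for {n_fields} fields")
    return problems


# Toy example (not real data): three fields and three names, so no problems
print(check_column_names(["timestep", "unknown1", "error_code"], "01-08-2018 12:44:30,x,0"))
```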
    
  5. Update the ``reader_kwargs`` dictionary

    Before :

    reader_kwargs = {}
    

    After :

    reader_kwargs = {}

    # - Define delimiter
    reader_kwargs["delimiter"] = ","

    # - Avoid first column to become df index !!!
    reader_kwargs["index_col"] = False

    # - Since column names are expected to be passed explicitly, header is set to None
    reader_kwargs["header"] = None

    # - Number of rows to be skipped at the beginning of the file
    reader_kwargs["skiprows"] = None

    # - Define behaviour when encountering bad lines
    reader_kwargs["on_bad_lines"] = "skip"

    # - Define reader engine
    #   - C engine is faster
    #   - Python engine is more feature-complete
    reader_kwargs["engine"] = "python"

    # - Define on-the-fly decompression of on-disk data
    #   - Available: gzip, bz2, zip
    reader_kwargs["compression"] = "infer"

    # - Strings to recognize as NA/NaN and replace with standard NA flags
    #   - Already included: '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN',
    #                       '-NaN', '-nan', '1.#IND', '1.#QNAN', '', 'N/A',
    #                       'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'
    reader_kwargs["na_values"] = ["na", "", "error"]
    
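The keys of this dictionary match keyword arguments of ``pandas.read_csv``. The following minimal sketch shows how such a dictionary could be applied to a small text snippet. The snippet and the three-name column list are toy assumptions made for illustration; they are not the real raw data, and the sketch does not show how DISDRODB applies the dictionary internally.

```python
import io

import pandas as pd

reader_kwargs = {
    "delimiter": ",",
    "index_col": False,
    "header": None,
    "skiprows": None,
    "on_bad_lines": "skip",
    "engine": "python",
    "compression": "infer",
    "na_values": ["na", "", "error"],
}

# Toy content (not real data): two records with three comma-separated fields
raw_text = "01-08-2018 12:44:30,0.0,na\n01-08-2018 12:45:01,0.5,12611\n"
toy_column_names = ["timestep", "rainfall_rate_32bit", "laser_amplitude"]

# Column names are passed explicitly because header is None
df = pd.read_csv(io.StringIO(raw_text), names=toy_column_names, **reader_kwargs)
print(df)
```

In this toy case, the ``na`` string in the first record is read as a missing value because it is listed in ``na_values``.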

  6. Update the ``df_sanitizer_fun()`` function

    Before:

    def df_sanitizer_fun(df):
        # - Import dask or pandas
        import pandas as pd
    
        # - Add here below the reader required custom code
        pass
    
        # - Return the dataframe
        return df
    

    After :

    def df_sanitizer_fun(df):
        # Import pandas
        import pandas as pd
    
        # - Drop invalid columns
        columns_to_drop = ["unknown1", "unknown2", "unknown3", "unknown4", "unknown5", "unknown6"]
        df = df.drop(columns=columns_to_drop)
    
        # - Convert timestep column to datetime format
        df["time"] = pd.to_datetime(df["timestep"], format="%m-%d-%Y %H:%M:%S")
        df = df.drop(columns=["timestep"])
    
        # - Return the dataframe
        return df
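Because day and month are both two-digit numbers, a format string with the wrong order may parse without raising an error and silently produce wrong dates. The check below is a minimal sketch based on a hypothetical raw timestamp value; it shows how the same string is interpreted under the two possible orders.

```python
from datetime import datetime

raw_timestamp = "01-08-2018 12:44:30"  # hypothetical raw value

# Same string, two different interpretations
month_first = datetime.strptime(raw_timestamp, "%m-%d-%Y %H:%M:%S")
day_first = datetime.strptime(raw_timestamp, "%d-%m-%Y %H:%M:%S")

print(month_first.date())  # 2018-01-08
print(day_first.date())    # 2018-08-01
```

Comparing the parsed dates with the dates in the raw file names is a simple way to check which format applies to your data.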
    

  7. Run the script

To run the script, you need to specify the local directory where all the data and metadata are stored (``<disdrodb_dir>`` in the commands below). On Windows, this path ends with \DISDRODB. On Mac/Linux, it ends with /DISDRODB.

To run the processing of a single station, just run:

run_disdrodb_l0_station <disdrodb_dir> <DATA_SOURCE> <CAMPAIGN_NAME> <STATION_NAME> -l0b True -f True -v True -d False

To run the processing on all stations of a given campaign, just run:

run_disdrodb_l0 <disdrodb_dir> --data_sources <DATA_SOURCE> --campaign_names <CAMPAIGN_NAME> -f True -v True -d False

See the DISDRODB documentation for a full description of how to run specific DISDRODB L0 processing.

ATTENTION: For this command to run, you need to have added the reader name to the station metadata YAML file!
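If you prefer to launch the single-station command from Python, the argument list can be assembled as in the minimal sketch below. The helper ``build_l0_station_command`` is hypothetical; it only reproduces the command and flags shown above, and the directory path is a placeholder.

```python
import shlex


def build_l0_station_command(disdrodb_dir, data_source, campaign_name, station_name):
    """Build the run_disdrodb_l0_station call shown above as an argument list."""
    return [
        "run_disdrodb_l0_station",
        str(disdrodb_dir),
        data_source,
        campaign_name,
        station_name,
        "-l0b", "True",
        "-f", "True",
        "-v", "True",
        "-d", "False",
    ]


cmd = build_l0_station_command(
    "/path/to/DISDRODB", "DATA_SOURCE", "CAMPAIGN_NAME", "station_name_1"
)
# Show the command as it would be typed in a terminal
print(shlex.join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```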


  8. Check if the script has executed correctly

    The output folder should look as follows:

    📁 DISDRODB
    ├── 📁 Processed
       ├── 📁 DATA_SOURCE
          ├── 📁 CAMPAIGN_NAME
              ├── 📁 info
                  ├── 📜 station_name_1.yml
                  ├── 📜 station_name_2.yml
              ├── 📁 L0A
                  ├── 📁 station_name_1
                     ├── 📜 *.parquet
                  ├── 📁 station_name_2
                     ├── 📜 *.parquet
              ├── 📁 L0B
                  ├── 📁 station_name_1
                     ├── 📜 *.nc
                  ├── 📁 station_name_2
                     ├── 📜 *.nc
              ├── 📁 logs
                     ├── 📁 L0A
                          ├── 📁 station_name_1
                              ├── 📜 logs_<raw_file_name>.log
                          ├── 📁 station_name_2
                              ├── 📜 logs_<raw_file_name>.log
                     ├── 📁 L0B
                          ├── 📁 station_name_1
                              ├── 📜 logs_<L0B_file_name>.log
                          ├── 📁 station_name_2
                              ├── 📜 logs_<L0B_file_name>.log
              ├── 📁 metadata
                  ├── 📜 station_name_1.yml
                  ├── 📜 station_name_2.yml
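
A quick way to compare the result with the tree above is to count the product files per station. The following is a minimal sketch using only the standard library. The helper ``count_products`` is hypothetical, and ``processed_campaign_dir`` is assumed to be the Processed/DATA_SOURCE/CAMPAIGN_NAME folder.

```python
from pathlib import Path


def _count_files(directory, pattern):
    """Count files matching pattern, or return 0 if the directory is missing."""
    return len(list(directory.glob(pattern))) if directory.is_dir() else 0


def count_products(processed_campaign_dir, station_names):
    """Count L0A parquet files and L0B netCDF files for each station."""
    campaign_dir = Path(processed_campaign_dir)
    counts = {}
    for station in station_names:
        counts[station] = {
            "L0A": _count_files(campaign_dir / "L0A" / station, "*.parquet"),
            "L0B": _count_files(campaign_dir / "L0B" / station, "*.nc"),
        }
    return counts
```

A station with a count of zero for L0A or L0B is a hint to inspect the corresponding file in the logs folder.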
    

Well done 👋👋👋

You should now be able to create a new reader for your own data. Please consider sharing the reader for your data with the community by uploading it to the DISDRODB repository.

Have a look at the contributor guidelines for more information, and do not hesitate to open a GitHub Issue if you need any clarification.

The DISDRODB team hopes you enjoyed this tutorial.
