Data
Users can make their own data accessible to the community. DISDRODB provides a central storage for code (readers), issues and metadata. However, the raw data itself must be stored by the data provider due to size limitations.
Two types of data must be distinguished:
Station Raw Data:
Stores disdrometer measurements for days, weeks, and years.
This dataset can be very heavy.
No central storage is provided.
Station Metadata and Issues:
Stores a standard set of metadata and measurement issues for each disdrometer.
Central storage is provided in the disdrodb-data Git repository.
The metadata folder contains a YAML metadata file called metadata.yml. It has a data_url key that references the remote/online repository where the station’s raw data are stored. At this URL, a single zip file provides all data available for a given station.
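As a rough illustration of what the data_url key makes possible, the sketch below downloads the zip file found at such a URL and unpacks it into a station directory. It uses only the Python standard library and is not the disdrodb implementation; the function name fetch_station_zip and the station_dir argument are hypothetical names chosen for this example.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def fetch_station_zip(data_url: str, station_dir: Path) -> list[str]:
    """Download the zip at data_url and unpack it into station_dir.

    Hypothetical helper for illustration only (not part of disdrodb).
    Returns the sorted names of the files contained in the archive.
    """
    station_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = Path(tmp) / "station_data.zip"
        # Stream the response to disk instead of holding it all in memory.
        with urllib.request.urlopen(data_url) as response, open(zip_path, "wb") as out:
            shutil.copyfileobj(response, out)
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(station_dir)
            return sorted(
                info.filename for info in archive.infolist() if not info.is_dir()
            )
```

In practice the download command described later in this document takes care of this step; the sketch only shows why a single zip file per station is enough to restore that station’s raw data.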
Data transfer upload and download schema:
Download the DISDRODB metadata archive
You can clone the disdrodb-data repository with:
git clone https://github.com/ltelab/disdrodb-data.git
However, if you plan to add new data or metadata to the archive, first fork the repository on your GitHub account and then clone the forked repository.
Update the DISDRODB metadata archive
Do you want to contribute to the project with your own data? Great! Just follow these steps:
Fork the disdrodb-data Git repository.
Create a new branch:
git checkout -b "reader-<data_source>-<campaign_name>"
Add your data source and campaign name directory to the current disdrodb-data structure.
Add a metadata YAML file for each station (following the naming convention <station_name>.yml) in the metadata directory of the campaign directory. We recommend copying an existing metadata YAML file to get the correct structure.
(Optional) Add an issue YAML file for each station, named <station_name>.yml, in an issues directory located in the campaign directory. We recommend copying an existing issue YAML file to get the correct structure.
Commit your changes and push your branch to GitHub.
Test that your new dataset integrates correctly by deleting your local copy of the data and re-fetching it with the download command described under “Download the DISDRODB raw data archive”.
Create a pull request, and wait for a maintainer to accept it!
If you struggle with this process, don’t hesitate to raise an issue so we can help!
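The naming conventions in the steps above can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and the layout <archive_dir>/<data_source>/<campaign_name> is an assumption made for the example, so check the actual disdrodb-data structure before creating files.

```python
from pathlib import Path


def branch_name(data_source: str, campaign_name: str) -> str:
    """Branch name following the reader-<data_source>-<campaign_name> pattern."""
    return f"reader-{data_source}-{campaign_name}"


def contribution_paths(archive_dir, data_source, campaign_name, station_names):
    """Return the metadata and issue YAML paths expected for each station.

    Assumption: the campaign directory sits at
    <archive_dir>/<data_source>/<campaign_name>.
    """
    campaign_dir = Path(archive_dir) / data_source / campaign_name
    return {
        name: {
            # One metadata file per station: metadata/<station_name>.yml
            "metadata": campaign_dir / "metadata" / f"{name}.yml",
            # Optional issue file per station: issues/<station_name>.yml
            "issue": campaign_dir / "issues" / f"{name}.yml",
        }
        for name in station_names
    }


# Placeholder names, not real campaigns or stations.
print(branch_name("EPFL", "MY_CAMPAIGN"))
for paths in contribution_paths("archive", "EPFL", "MY_CAMPAIGN", ["station_1"]).values():
    print(paths["metadata"], paths["issue"])
```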
Download the DISDRODB raw data archive
Prerequisite: First clone the disdrodb-data repository as described above to get the folder structure, metadata, and issues.
Objective: You would like to download the raw data referenced in some metadata <station_name>.yml file.
To download the data, you need to be in a virtual environment with the disdrodb package installed.
To download the data, run:
download_disdrodb_archive <the_root_folder> --data_sources <data_source> --campaign_names <campaign_name> --station_names <station_name> --force true
The disdrodb_dir parameter (<the_root_folder> in the command above) is compulsory and must be the path of the root folder, ending with DISDRODB. The other parameters are optional and restrict the download to a specific data source, campaign, or station.
Parameters:
data_sources (optional): Station data source.
campaign_names (optional): Station campaign name.
station_names (optional): Name of the stations.
force (optional, default = False): a boolean value indicating whether existing files should be overwritten.
To download data from multiple data sources or campaigns, please provide a space-separated string of the data sources or campaigns you require. For example, “EPFL NASA”.
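For scripted use, the command line described above can be assembled programmatically. The following is a minimal sketch, assuming the flags behave exactly as documented here; build_download_command is a hypothetical helper, not part of the disdrodb package, and it only builds the argument list without running anything.

```python
import shlex
from pathlib import Path


def build_download_command(disdrodb_dir, data_sources=(), campaign_names=(),
                           station_names=(), force=False):
    """Build the argument list for download_disdrodb_archive (hypothetical helper)."""
    # The root folder path must end with DISDRODB.
    if Path(disdrodb_dir).name != "DISDRODB":
        raise ValueError("disdrodb_dir must end with 'DISDRODB'")
    command = ["download_disdrodb_archive", str(disdrodb_dir)]
    for flag, values in (
        ("--data_sources", data_sources),
        ("--campaign_names", campaign_names),
        ("--station_names", station_names),
    ):
        if isinstance(values, str):
            values = [values]
        if values:
            # Several values are passed as ONE space-separated string.
            command += [flag, " ".join(values)]
    if force:
        command += ["--force", "true"]
    return command


cmd = build_download_command("/data/DISDRODB", data_sources=["EPFL", "NASA"])
# shlex.join quotes the space-separated value so the shell keeps it as one argument.
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) once the disdrodb package is installed in the active environment.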
Add new stations raw data to the DISDRODB archive (using Zenodo)
We provide a command to upload your station’s raw data to Zenodo:
upload_disdrodb_archive <the_root_folder> --data_sources <data_source> --campaign_names <campaign_name> --station_names <station_name> --platform <name_of_the_platform> --force true
The disdrodb_dir parameter (<the_root_folder> in the command above) is compulsory and must be the path of the root folder, ending with DISDRODB. The other parameters are optional and restrict the upload to a specific data source, campaign, or station.
Parameters:
data_sources (optional): the source of the data.
campaign_names (optional): the name of the campaign.
station_names (optional): the name of the station.
platform (optional, default is Zenodo).
force (optional, default = False): a boolean value indicating whether files already uploaded somewhere else should still be included.
To upload data from multiple data sources or campaigns, please provide a space-separated string of the data sources or campaigns you require. For example, “EPFL NASA”.
Currently, only Zenodo is supported.
After running this command, the user will be prompted to insert a Zenodo token. Once the data is uploaded, a link will be displayed that the user must use to go to the Zenodo web interface and manually publish the data.
To get a Zenodo token, go to https://zenodo.org/account/settings/applications/tokens/new/
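The effect of the force parameter on an upload can be pictured with a small sketch. It assumes, purely for illustration, that a station counts as already uploaded when its metadata has a non-empty data_url; the real upload command may decide this differently. The function select_stations_to_upload and the example URL are hypothetical.

```python
def select_stations_to_upload(metadata_by_station, force=False):
    """Return the names of stations whose raw data would be uploaded.

    Assumption: a non-empty data_url means the data are already online.
    """
    selected = []
    for station_name, metadata in sorted(metadata_by_station.items()):
        already_uploaded = bool(str(metadata.get("data_url") or "").strip())
        # With force=True, stations that are already online are included anyway.
        if force or not already_uploaded:
            selected.append(station_name)
    return selected


# Placeholder metadata, not taken from the real archive.
example = {
    "station_1": {"data_url": "https://example.org/station_1.zip"},
    "station_2": {"data_url": ""},
}
print(select_stations_to_upload(example))
print(select_stations_to_upload(example, force=True))
```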