pardata.load_dataset
- pardata.load_dataset(name, *, version='latest', download=True, subdatasets=None)
High-level function that wraps the dataset.Dataset class’s load and download functionality. Downloads to and loads from the directory DATADIR/schema_name/name/version, where DATADIR is taken from pardata.get_config().DATADIR. DATADIR can be changed by calling init().
- Parameters
name (str) – Name of the dataset to load from ParData’s available datasets. You can get a list of these datasets by calling list_all_datasets().
version (str) – Version of the dataset to load. The latest version is used by default. You can get a list of all available versions for a dataset by calling list_all_datasets().
download (bool) – Whether to download the dataset before loading. Set this to False to avoid re-downloading a large dataset that is already present locally. If download is False and the dataset has never been downloaded, this function raises a RuntimeError.
subdatasets (Optional[Iterable[str]]) – An iterable containing the names of the subdatasets to load. None means all subdatasets.
- Raises
RuntimeError – The dataset files can’t be found or are corrupted. One possible cause is that the dataset files have never been downloaded but download is False. See Dataset.load() for more details.
- Returns
Dictionary that holds all subdatasets.
- Return type
Dict[str, Any]
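The directory layout described above (DATADIR/schema_name/name/version) can be sketched with pathlib. This is only an illustration of how the path components combine, not ParData's actual implementation: the helper dataset_dir is hypothetical, and every value passed to it below is a placeholder rather than a real ParData default.

```python
from pathlib import Path


def dataset_dir(datadir, schema_name, name, version):
    """Compose DATADIR/schema_name/name/version.

    Hypothetical helper for illustration; it is not part of pardata.
    """
    return Path(datadir) / schema_name / name / version


# Placeholder values (assumptions), chosen only to show the shape of the path.
example = dataset_dir('/tmp/pardata', 'some_schema', 'noaa_jfk', 'some_version')
print(example)
```

In real use, the first component would come from pardata.get_config().DATADIR.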
Example:
>>> from pardata import load_dataset
>>> data = load_dataset('noaa_jfk')
>>> data['jfk_weather_cleaned'][['DATE', 'HOURLYVISIBILITY', 'HOURLYDRYBULBTEMPF']].head(3)
                 DATE  HOURLYVISIBILITY  HOURLYDRYBULBTEMPF
0 2010-01-01 01:00:00               6.0                33.0
1 2010-01-01 02:00:00               6.0                33.0
2 2010-01-01 03:00:00               5.0                33.0
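Because download=False raises a RuntimeError when the files are missing or corrupted, one possible pattern is to try the local copy first and download only on failure. The sketch below assumes the loader passed in behaves like pardata.load_dataset as documented above; load_cached_first is a hypothetical helper, not part of ParData.

```python
def load_cached_first(loader, name, **kwargs):
    """Load a dataset from local files, downloading only if that fails.

    `loader` is assumed to follow the pardata.load_dataset signature:
    loader(name, *, version=..., download=..., subdatasets=...).
    Hypothetical helper for illustration; not part of pardata.
    """
    try:
        # First attempt: use files already on disk, skip the download.
        return loader(name, download=False, **kwargs)
    except RuntimeError:
        # Files are missing or corrupted: download, then load.
        return loader(name, download=True, **kwargs)
```

With ParData installed, this could be called as load_cached_first(pardata.load_dataset, 'noaa_jfk'). Passing the loader as an argument keeps the helper independent of the library, so it can be exercised with a stub.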