pardata.load_dataset

pardata.load_dataset(name, *, version='latest', download=True, subdatasets=None)

High-level function that wraps the load and download functionality of the dataset.Dataset class. It downloads to and loads from the directory DATADIR/schema_name/name/version, where DATADIR is the value of pardata.get_config().DATADIR. DATADIR can be changed by calling init().
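As a rough illustration of the directory layout described above, the following sketch composes the path from its four parts. The helper name expected_dataset_dir and every value passed to it are hypothetical placeholders; the sketch does not call ParData and makes no claim about how ParData resolves 'latest' or which schema names exist.

```python
from pathlib import PurePosixPath


def expected_dataset_dir(datadir, schema_name, name, version):
    """Compose DATADIR/schema_name/name/version (hypothetical helper, not part of ParData)."""
    return PurePosixPath(datadir) / schema_name / name / version


# All four arguments are made-up placeholders for illustration only.
print(expected_dataset_dir('/tmp/pardata', 'some_schema', 'noaa_jfk', '1.0'))
```

In real use, the first argument would be whatever pardata.get_config().DATADIR currently holds.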

Parameters
  • name (str) – Name of the dataset you want to load from ParData’s available datasets. You can get a list of these datasets by calling list_all_datasets().

  • version (str) – Version of the dataset to load. The latest version is used by default. You can get a list of all available versions for a dataset by calling list_all_datasets().

  • download (bool) – Whether the dataset should be downloaded before loading. Setting this to False avoids redownloading a large dataset that has already been downloaded. If download is False and the dataset has never been downloaded, this function raises a RuntimeError.

  • subdatasets (Optional[Iterable[str]]) – An iterable containing the subdatasets to load. None means all subdatasets.
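The parameters above can be combined as in the following minimal sketch, which requests only some subdatasets and checks that each one came back. The wrapper load_selected is a hypothetical helper, not part of ParData. It takes the loader as an argument (in practice pardata.load_dataset) and assumes that the returned dictionary is keyed by subdataset name, as the 'jfk_weather_cleaned' lookup in the example on this page suggests.

```python
from typing import Any, Callable, Dict, Iterable


def load_selected(load_fn: Callable[..., Dict[str, Any]],
                  name: str,
                  wanted: Iterable[str],
                  version: str = 'latest') -> Dict[str, Any]:
    """Load only the `wanted` subdatasets and verify that each one is present.

    Hypothetical helper: `load_fn` is expected to accept the same
    arguments as pardata.load_dataset.
    """
    wanted = list(wanted)  # materialize so the iterable can be reused below
    # version and subdatasets are keyword-only in the documented signature.
    data = load_fn(name, version=version, subdatasets=wanted)
    # Assumption: the result is keyed by subdataset name.
    missing = [key for key in wanted if key not in data]
    if missing:
        raise KeyError(f'subdatasets not returned: {missing}')
    return data


# Intended usage (requires ParData and network or cached files):
# import pardata
# data = load_selected(pardata.load_dataset, 'noaa_jfk', ['jfk_weather_cleaned'])
```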

Raises

RuntimeError – The dataset files can’t be found or are corrupted. One possible cause for this is that the dataset files have never been downloaded but download is False. See Dataset.load() for more details.
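One way to act on this error is to try the local copy first and download only when loading fails. The sketch below is a hypothetical helper (load_cached_first is not part of ParData) that takes the loader as an argument, in practice pardata.load_dataset. Because RuntimeError also covers corrupted files, the sketch assumes that a fresh download is an acceptable response in both cases, which may not hold for every setup.

```python
from typing import Any, Callable, Dict


def load_cached_first(load_fn: Callable[..., Dict[str, Any]],
                      name: str,
                      **kwargs: Any) -> Dict[str, Any]:
    """Load from local files if possible; otherwise download, then load.

    Hypothetical helper: `load_fn` is expected to behave like
    pardata.load_dataset, raising RuntimeError when files are missing
    or corrupted.
    """
    try:
        # First attempt: skip the download and rely on existing files.
        return load_fn(name, download=False, **kwargs)
    except RuntimeError:
        # Files missing or corrupted: retry with downloading enabled.
        return load_fn(name, download=True, **kwargs)


# Intended usage (requires ParData):
# import pardata
# data = load_cached_first(pardata.load_dataset, 'noaa_jfk')
```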

Returns

Dictionary that holds the loaded subdatasets (all of them when subdatasets is None).

Return type

Dict[str, Any]

Example:

>>> from pardata import load_dataset
>>> data = load_dataset('noaa_jfk')
>>> data['jfk_weather_cleaned'][['DATE', 'HOURLYVISIBILITY', 'HOURLYDRYBULBTEMPF']].head(3)
                 DATE  HOURLYVISIBILITY  HOURLYDRYBULBTEMPF
0 2010-01-01 01:00:00               6.0                33.0
1 2010-01-01 02:00:00               6.0                33.0
2 2010-01-01 03:00:00               5.0                33.0