pardata.dataset.Dataset
- class pardata.dataset.Dataset(schema, data_dir, *, mode=InitializationMode.LAZY)
Bases: object
Models a particular dataset version along with download & load functionality.
- Parameters
schema (Dict[str, Any]) – Schema dict of a particular dataset version.
data_dir (Union[str, os.PathLike]) – Directory to which the dataset should be downloaded and from which it should be loaded. The path can be either absolute or relative to the current working directory, but it is converted to an absolute path immediately upon initialization.
mode (pardata._dataset.Dataset.InitializationMode) – Mode with which to treat a dataset. Available options are: Dataset.InitializationMode.LAZY, Dataset.InitializationMode.DOWNLOAD_ONLY, Dataset.InitializationMode.LOAD_ONLY, and Dataset.InitializationMode.DOWNLOAD_AND_LOAD.
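Because `data_dir` is converted to an absolute path at construction time, a relative path is resolved against whatever the working directory is at that moment, and a later `os.chdir()` does not move the dataset. A minimal sketch of that conversion, using a hypothetical helper named `resolve_data_dir` (not part of pardata's documented API):

```python
import os
from pathlib import Path
from typing import Union


def resolve_data_dir(data_dir: Union[str, os.PathLike]) -> Path:
    """Hypothetical helper: return data_dir as a normalized absolute path.

    A relative path is joined to the current working directory at call time,
    so changing directories afterwards does not change where the data lives.
    """
    # os.path.abspath normalizes the path but, unlike Path.resolve(),
    # does not follow symlinks.
    return Path(os.path.abspath(os.fspath(data_dir)))


if __name__ == '__main__':
    print(resolve_data_dir('jfk_data'))  # <current working directory>/jfk_data
```

The sketch uses `os.path.abspath` rather than `Path.resolve()` so that symlinked directories keep their user-facing path; whether pardata makes the same choice is not stated here.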
- Raises
ValueError – An invalid mode was specified for handling the dataset.
- Return type
None
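One way to picture the four modes and the `ValueError` together is as a lookup from mode to the work done at construction time. This is a hypothetical sketch: the `InitializationMode` enum below is a stand-in that reuses only the member names listed above, and the `actions_for` helper and its (download, load) pairs are assumptions inferred from the mode names, not pardata's implementation.

```python
from enum import Enum, auto
from typing import Tuple


class InitializationMode(Enum):
    """Stand-in enum; only the member names come from the documentation."""
    LAZY = auto()
    DOWNLOAD_ONLY = auto()
    LOAD_ONLY = auto()
    DOWNLOAD_AND_LOAD = auto()


# Assumed meaning of each mode: (download at construction?, load at construction?)
_ACTIONS = {
    InitializationMode.LAZY: (False, False),
    InitializationMode.DOWNLOAD_ONLY: (True, False),
    InitializationMode.LOAD_ONLY: (False, True),
    InitializationMode.DOWNLOAD_AND_LOAD: (True, True),
}


def actions_for(mode: object) -> Tuple[bool, bool]:
    """Hypothetical dispatch: return (download_now, load_now) for a mode."""
    if not isinstance(mode, InitializationMode):
        # Mirrors the documented ValueError for an invalid mode.
        raise ValueError(f'{mode!r} is not a valid InitializationMode')
    return _ACTIONS[mode]
```

In this sketch, passing a plain string such as `'LAZY'` instead of the enum member raises `ValueError`, and `LAZY` defers both steps until `download()` or `load()` is called explicitly.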
Example:
>>> from tempfile import TemporaryDirectory
>>> import pprint
>>> import pardata
>>> from pardata.dataset import Dataset
>>> dataset_schemata = pardata.schema.DatasetSchemaCollection('./tests/schemata/datasets.yaml')
>>> jfk_schema_dict = dataset_schemata.export_schema('datasets', 'noaa_jfk', '1.1.4')
>>> pprint.pprint(jfk_schema_dict)
{'description': ...
 'download_url': '...noaa-weather-data-jfk-airport.tar.gz',
 ...
 'subdatasets': {'jfk_weather_cleaned': {...
                                         'format': {'id': 'table/csv',
                                                    ...}},
                                         ...
                                         'path': 'noaa-weather-data-jfk-airport/jfk_weather_cleaned.csv'}}}
>>> jfk_data_dir = TemporaryDirectory()
>>> jfk_dataset = Dataset(schema=jfk_schema_dict, data_dir=jfk_data_dir.name)
>>> jfk_dataset.download()
>>> data = jfk_dataset.load()
>>> data['jfk_weather_cleaned'].shape
(75119, 16)
>>> jfk_dataset.delete()  # The directory jfk_data_dir is deleted here
>>> jfk_dataset.is_downloaded()
False
Methods
delete(*[, force]) – Clear the data directory.
download(*[, check, verify_checksum]) – Download and extract the dataset archive, then remove the archive.
is_downloaded() – Check whether the dataset has been downloaded.
load([subdatasets, format_loader_map, check]) – Load data files into RAM.
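To make the method summaries concrete, the following stdlib-only toy walks through the same lifecycle: download (copy, verify, extract, remove the archive), check, load, and delete. Everything in it is hypothetical: `ToyDataset`, the local file copy standing in for an HTTP download, SHA-256 as the checksum, and CSV-only subdatasets are assumptions for illustration, and the `check`, `force`, and `format_loader_map` parameters are not modeled.

```python
import csv
import hashlib
import shutil
import tarfile
import tempfile
from pathlib import Path
from typing import Dict, List, Optional, Tuple

Rows = List[List[str]]


class ToyDataset:
    """Hypothetical stdlib-only model of the download/load/delete lifecycle."""

    def __init__(self, archive: Path, sha256: str, data_dir: Path,
                 subdatasets: Dict[str, str]) -> None:
        self.archive = Path(archive)          # stands in for a download URL
        self.sha256 = sha256                  # assumed checksum algorithm
        self.data_dir = Path(data_dir).resolve()
        self.subdatasets = subdatasets        # name -> path relative to data_dir
        self.data: Dict[str, Rows] = {}

    def download(self, verify_checksum: bool = True) -> None:
        """Copy the archive in, optionally verify it, extract it, then remove it."""
        self.data_dir.mkdir(parents=True, exist_ok=True)
        local = self.data_dir / self.archive.name
        shutil.copyfile(self.archive, local)
        if verify_checksum:
            digest = hashlib.sha256(local.read_bytes()).hexdigest()
            if digest != self.sha256:
                local.unlink()
                raise ValueError(f'checksum mismatch for {local.name}')
        with tarfile.open(local) as tar:
            if hasattr(tarfile, 'data_filter'):
                # Python versions with extraction filters: reject unsafe members.
                tar.extractall(self.data_dir, filter='data')
            else:
                tar.extractall(self.data_dir)
        local.unlink()  # the archive is not kept after extraction

    def is_downloaded(self) -> bool:
        """Treat the dataset as downloaded only if every subdataset file exists."""
        return all((self.data_dir / p).is_file() for p in self.subdatasets.values())

    def load(self, subdatasets: Optional[List[str]] = None) -> Dict[str, Rows]:
        """Read the requested subdatasets (default: all) into memory as CSV rows."""
        names = list(self.subdatasets) if subdatasets is None else subdatasets
        for name in names:
            with open(self.data_dir / self.subdatasets[name], newline='') as f:
                self.data[name] = list(csv.reader(f))
        return self.data

    def delete(self) -> None:
        """Remove the whole data directory and drop any loaded data."""
        shutil.rmtree(self.data_dir, ignore_errors=True)
        self.data = {}


def demo() -> Tuple[Dict[str, Rows], bool, bool, bool]:
    """Run the lifecycle on a tiny archive built in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / 'src' / 'weather'
        src.mkdir(parents=True)
        (src / 'cleaned.csv').write_text('temp,wind\n21,5\n')
        archive = root / 'weather.tar.gz'
        with tarfile.open(archive, 'w:gz') as tar:
            tar.add(str(src), arcname='weather')
        digest = hashlib.sha256(archive.read_bytes()).hexdigest()

        ds = ToyDataset(archive, digest, root / 'data', {'cleaned': 'weather/cleaned.csv'})
        ds.download()
        downloaded = ds.is_downloaded()
        archive_removed = not (ds.data_dir / archive.name).exists()
        data = ds.load()
        ds.delete()
        return data, downloaded, archive_removed, ds.is_downloaded()


if __name__ == '__main__':
    print(demo())
```

The toy deletes the archive only after a successful extraction, so a checksum failure leaves no partial archive behind; whether pardata handles failures the same way is not documented in this summary.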
Attributes
Access loaded data objects.
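Loaded data objects are often exposed through a read-only property so that callers can read entries without replacing the underlying mapping. A minimal sketch, assuming a hypothetical class `LoadedDataHolder`, a hypothetical `set_loaded` method, and an attribute named `data` (the attribute name is not shown in this summary):

```python
from types import MappingProxyType
from typing import Any, Dict, Mapping


class LoadedDataHolder:
    """Hypothetical sketch of a read-only accessor for loaded data objects."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def set_loaded(self, name: str, obj: Any) -> None:
        # Would be called by a load step; entries are keyed by subdataset name.
        self._data[name] = obj

    @property
    def data(self) -> Mapping[str, Any]:
        # A live, read-only view: later loads show up, but callers cannot
        # add, remove, or replace entries through it.
        return MappingProxyType(self._data)
```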