pardata.get_dataset_metadata

pardata.get_dataset_metadata(name, *, version='latest')

Return a dataset’s metadata either in human-readable form or as a copy of its schema.

Parameters
  • name (str) – Name of the dataset whose metadata you want to get. You can get a list of available datasets by calling list_all_datasets().

  • version (str) – Version of the dataset to load. The latest version is used by default. You can get a list of all available versions for a dataset by calling list_all_datasets().

Returns

A dataset’s metadata.

Return type

Dict[str, Any]

Example:

>>> import pprint
>>> from pardata import get_dataset_metadata
>>> metadata = get_dataset_metadata('gmb')
>>> metadata['name']
'Groningen Meaning Bank Modified'
>>> metadata['description']
'A dataset of multi-sentence texts, together with annotations for parts-of-speech...
>>> pprint.pprint(metadata['subdatasets'])
{'gmb_subset_full': {'description': 'A full version of the raw dataset. Used '
                                    'to train MAX model – Named Entity Tagger.',
                     'format': 'text/plain',
                     'name': 'GMB Subset Full',
                     'path': 'groningen_meaning_bank_modified/gmb_subset_full.txt'}}
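
Because the return value is a plain dictionary, it can be traversed with ordinary dict operations. The following is a minimal sketch, assuming the metadata has the shape shown in the example above. The helper subdataset_paths is hypothetical and not part of pardata, and the sample dictionary is copied from the example output rather than fetched from the library.

```python
from typing import Any, Dict


def subdataset_paths(metadata: Dict[str, Any]) -> Dict[str, str]:
    """Map each subdataset key to its 'path' entry.

    Hypothetical helper for illustration; not part of pardata.
    """
    # Fall back to an empty dict if the metadata has no 'subdatasets' key.
    subdatasets = metadata.get('subdatasets', {})
    return {key: info['path'] for key, info in subdatasets.items()}


# Sample metadata copied from the example output above. In practice this
# dict would come from pardata.get_dataset_metadata('gmb').
sample_metadata = {
    'name': 'Groningen Meaning Bank Modified',
    'subdatasets': {
        'gmb_subset_full': {
            'format': 'text/plain',
            'name': 'GMB Subset Full',
            'path': 'groningen_meaning_bank_modified/gmb_subset_full.txt',
        },
    },
}

paths = subdataset_paths(sample_metadata)
print(paths)
```

Using metadata.get('subdatasets', {}) keeps the helper from raising KeyError on a dictionary without that key; whether every dataset's metadata includes 'subdatasets' is not stated here, so the fallback is a precaution.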