
hdx.data.dataset

module hdx.data.dataset

Dataset class containing all logic for creating, checking, and updating datasets and associated resources.

Classes

class NotRequestableError()

Bases : HDXError

class Dataset(initial_data: dict | None = None, configuration: Configuration | None = None)

Bases : HDXObject

Dataset class enabling operations on datasets and associated resources.

Parameters

  • initial_data : dict | None Initial dataset metadata dictionary. Defaults to None.

  • configuration : Configuration | None HDX configuration. Defaults to global configuration.

Methods

  • actions Dictionary of actions that can be performed on object

  • separate_resources Move contents of resources key in internal dictionary into self.resources

  • unseparate_resources Move self.resources into resources key in internal dictionary

  • get_dataset_dict Get the dataset dictionary, with self.resources included under the resources key

  • save_to_json Save dataset to JSON. If follow_urls is True, resource urls that point to datasets are followed to retrieve final urls.

  • load_from_json Load dataset from JSON

  • init_resources Initialise self.resources list

  • add_update_resource Add new or update existing resource in dataset with new metadata

  • add_update_resources Add new resources to the dataset or update existing resources with new metadata

  • delete_resource Delete a resource from the dataset and also from HDX by default

  • get_resources Get dataset's resources

  • get_resource Get one resource from dataset by index

  • number_of_resources Get number of dataset's resources

  • move_resource Move resource in dataset to be before the resource whose name starts with the value of insert_before.

  • update_from_yaml Update dataset metadata with static metadata from YAML file

  • update_from_json Update dataset metadata with static metadata from JSON file

  • read_from_hdx Reads the dataset given by identifier from HDX and returns Dataset object

  • reorder_resources Reorder resources in dataset according to provided list. Resources are updated in the dataset object to match new order. However, the dataset is not refreshed by rereading from HDX. If only some resource ids are supplied then these are assumed to be first and the other resources will stay in their original order.

  • check_resources_url_filetoupload Check for the error where both a url and a file to upload are provided for a resource

  • check_resources_fields Check that metadata for resources is complete. If required, set ignore_fields to any fields that should be ignored for the particular operation.

  • check_required_fields Check that metadata for dataset is complete. If required, set ignore_fields to any fields that should be ignored for the particular operation.

  • revise Revise a dataset in HDX

  • update_in_hdx Check if dataset exists in HDX and if so, update it. match_resources_by_metadata matches resources by id if available, otherwise by name if names are unique, or by name and format if not.

  • create_in_hdx Check if dataset exists in HDX and if so, update it, otherwise create it. match_resources_by_metadata matches resources by id if available, otherwise by name if names are unique, or by name and format if not.

  • delete_from_hdx Deletes a dataset from HDX.

  • search_in_hdx Searches for datasets in HDX

  • get_all_dataset_names Get all dataset names in HDX

  • get_all_datasets Get all datasets from HDX (just calls search_in_hdx)

  • get_all_resources Get all resources from a list of datasets (such as returned by search)

  • autocomplete Autocomplete a dataset name and return matches

  • get_time_period Get dataset date as datetimes and strings in specified format. If no format is supplied, the ISO 8601 format is used. Returns a dictionary containing keys startdate (start date as datetime), enddate (end date as datetime), startdate_str (start date as string), enddate_str (end date as string) and ongoing (whether the end date rolls forward every day).

  • set_time_period Set time period from either datetime objects or strings. Any time and time zone information will be ignored by default (meaning that the time of the start date is set to 00:00:00, the time of any end date is set to 23:59:59 and the time zone is set to UTC). To have the time and time zone accounted for, set ignore_timeinfo to False. In this case, the time will be converted to UTC.

  • set_time_period_year_range Set time period as a range from year or start and end year.

  • list_valid_update_frequencies List of valid update frequency values

  • transform_update_frequency Get numeric update frequency (as string since that is required field format) from textual representation or vice versa (e.g. 'Every month' = '30', '30' or 30 = 'Every month')

  • get_expected_update_frequency Get expected update frequency (in textual rather than numeric form)

  • set_expected_update_frequency Set expected update frequency. You can pass frequencies like "Every week" or '7' or 7. Valid values for update frequency can be found from Dataset.list_valid_update_frequencies().

  • get_tags Return the dataset's list of tags

  • add_tag Add a tag

  • add_tags Add a list of tags

  • clean_tags Clean tags in an HDX object according to tags cleanup spreadsheet, deleting invalid tags that cannot be mapped

  • remove_tag Remove a tag

  • is_subnational Return whether the dataset is subnational

  • set_subnational Set if dataset is subnational or national

  • get_location_iso3s Return the dataset's locations as ISO 3 codes

  • get_location_names Return the dataset's location names

  • add_country_location Add a country. If an iso 3 code is not provided, value is parsed and if it is a valid country name, converted to an iso 3 code. If the country is already added, it is ignored.

  • add_country_locations Add a list of countries. If iso 3 codes are not provided, values are parsed and where they are valid country names, converted to iso 3 codes. If any country is already added, it is ignored.

  • add_region_location Add all countries in a region. If a 3 digit UNStats M49 region code is not provided, value is parsed as a region name. If any country is already added, it is ignored.

  • add_other_location Add a location which is not a country or region. Value is parsed and compared to existing locations in HDX. If the location is already added, it is ignored.

  • remove_location Remove a location. If the location is not present, it is ignored.

  • get_maintainer Get the dataset's maintainer.

  • set_maintainer Set the dataset's maintainer.

  • get_organization Get the dataset's organization.

  • set_organization Set the dataset's organization.

  • get_showcases Get any showcases the dataset is in

  • add_showcase Add dataset to showcase

  • add_showcases Add dataset to multiple showcases

  • remove_showcase Remove dataset from showcase

  • is_requestable Return whether the dataset is requestable or not

  • set_requestable Set the dataset to be of type requestable or not

  • get_fieldnames Return list of fieldnames in your data. Only applicable to requestable datasets.

  • add_fieldname Add a fieldname to list of fieldnames in your data. Only applicable to requestable datasets.

  • add_fieldnames Add a list of fieldnames to list of fieldnames in your data. Only applicable to requestable datasets.

  • remove_fieldname Remove a fieldname. Only applicable to requestable datasets.

  • get_filetypes Return list of filetypes in your data

  • add_filetype Add a filetype to list of filetypes in your data. Only applicable to requestable datasets.

  • add_filetypes Add a list of filetypes to list of filetypes in your data. Only applicable to requestable datasets.

  • remove_filetype Remove a filetype

  • set_custom_viz Set custom visualization url for dataset

  • get_custom_viz Get custom visualization url for dataset

  • preview_off Set dataset preview off

  • preview_resource Set dataset preview on for an unspecified resource

  • set_preview_resource Set the resource that will be used for displaying previews in dataset preview

  • create_default_views Create default resource views for all resources in dataset

  • get_name_or_id Get dataset name or id, e.g. for use in urls. If prefer_name is True, name is preferred over id if available, otherwise id is preferred over name if available.

  • get_hdx_url Get the url of the dataset on HDX or None if the dataset name and id fields are missing. If prefer_name is True, name is preferred over id if available, otherwise id is preferred over name if available.

  • get_api_url Get the API url of the dataset on HDX

  • generate_resource Write rows to file and create resource, adding it to the dataset. The headers argument is either a row number (rows start counting at 1), or the actual headers defined as a list of strings. If not set, all rows will be treated as containing values. Specific columns to include can be specified (i.e. a subset of the headers).

  • download_generate_resource Download url, write rows to csv and create resource, adding it to the dataset. The returned dictionary will contain the resource in the key resource, headers in the key headers and list of rows in the key rows.

  • add_hapi_error Writes error messages that were uncovered while processing data for the HAPI database to a resource's metadata on HDX. If the resource already has an error message, it is only overwritten if the two messages are different.
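The headers rule described for generate_resource (a 1-based row number, an explicit list of names, or nothing) can be illustrated with a small standalone sketch. The helpers split_headers and select_columns are hypothetical and work on plain lists rather than calling the library; the sketch also assumes that rows above a numbered header row are discarded, which the description does not state.

```python
from typing import Sequence, Union


def split_headers(rows: Sequence[list], headers: Union[int, Sequence[str], None] = None):
    """Hypothetical helper: return (header_names, data_rows).

    headers may be a 1-based row number, an explicit list of names, or None
    (in which case every row is treated as data).
    """
    rows = [list(row) for row in rows]
    if headers is None:
        return None, rows
    if isinstance(headers, int):
        # Rows start counting at 1. Assumption: rows above the header row are dropped.
        return rows[headers - 1], rows[headers:]
    return list(headers), rows


def select_columns(header_names: list, rows: list, columns: Sequence[str]):
    """Hypothetical helper: keep only the named columns, in the given order."""
    indices = [header_names.index(column) for column in columns]
    return list(columns), [[row[i] for i in indices] for row in rows]
```

Passing an explicit list of names is the natural choice when the source data has no header row of its own.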

staticmethod Dataset.actions() → dict[str, str]

Dictionary of actions that can be performed on object

Returns

  • dict[str, str] Dictionary of actions that can be performed on object

method Dataset.separate_resources() → None

Move contents of resources key in internal dictionary into self.resources

Returns

  • None None

method Dataset.unseparate_resources() → None

Move self.resources into resources key in internal dictionary

Returns

  • None None

method Dataset.get_dataset_dict() → dict

Get the dataset dictionary, with self.resources included under the resources key

Returns

  • dict Dataset dictionary

method Dataset.save_to_json(path: Path | str, follow_urls: bool = False, session: Session | None = None) → None

Save dataset to JSON. If follow_urls is True, resource urls that point to datasets are followed to retrieve final urls.

Parameters

  • path : Path | str Path to save dataset

  • follow_urls : bool Whether to follow urls. Defaults to False.

  • session : Session | None Session to use when following urls. Defaults to None.

Returns

  • None None

staticmethod Dataset.load_from_json(path: Path | str) → Optional['Dataset']

Load dataset from JSON

Parameters

  • path : Path | str Path to load dataset

Returns

  • Optional['Dataset'] Dataset created from JSON or None

method Dataset.init_resources() → None

Initialise self.resources list

Returns

  • None None

method Dataset.add_update_resource(resource: Union['Resource', dict, str], ignore_datasetid: bool = False) → Resource

Add new or update existing resource in dataset with new metadata

Parameters

  • resource : Union['Resource', dict, str] Either resource id or resource metadata from a Resource object or a dictionary

  • ignore_datasetid : bool Whether to ignore dataset id in the resource. Defaults to False.

Returns

  • Resource The resource that was added after matching with any existing resource

Raises

  • HDXError

method Dataset.add_update_resources(resources: Sequence[Union['Resource', dict, str]], ignore_datasetid: bool = False) → None

Add new resources to the dataset or update existing resources with new metadata

Parameters

  • resources : Sequence[Union['Resource', dict, str]] A list of either resource ids or resources metadata from either Resource objects or dictionaries

  • ignore_datasetid : bool Whether to ignore dataset id in the resource. Defaults to False.

Returns

  • None None

Raises

  • HDXError

method Dataset.delete_resource(resource: Union['Resource', dict, str], delete: bool = True) → bool

Delete a resource from the dataset and also from HDX by default

Parameters

  • resource : Union['Resource', dict, str] Either resource id or resource metadata from a Resource object or a dictionary

  • delete : bool Whether to delete the resource from HDX (not just the dataset). Defaults to True.

Returns

  • bool True if resource removed or False if not

Raises

  • HDXError

method Dataset.get_resources() → list['Resource']

Get dataset's resources

Returns

  • list['Resource'] List of Resource objects

method Dataset.get_resource(index: int = 0) → Resource

Get one resource from dataset by index

Parameters

  • index : int Index of resource in dataset. Defaults to 0.

Returns

  • Resource Resource object at the given index

method Dataset.number_of_resources() → int

Get number of dataset's resources

Returns

  • int Number of Resource objects

method Dataset.move_resource(resource_name: str, insert_before: str) → Resource

Move resource in dataset to be before the resource whose name starts with the value of insert_before.

Parameters

  • resource_name : str Name of resource to move

  • insert_before : str Name (or start of the name) of the resource to insert before

Returns

  • Resource The resource that was moved
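The prefix rule used by move_resource can be sketched on a plain list of resource names. move_before is a hypothetical helper, not part of the library; what happens when the resource is missing or no name matches insert_before is not documented here, so the choices below (leave the order unchanged, or append at the end) are assumptions.

```python
def move_before(names: list, resource_name: str, insert_before: str) -> list:
    """Hypothetical helper mirroring the documented move_resource rule on a
    plain list of resource names; returns a new list."""
    if resource_name not in names:
        # Assumption: an unknown resource leaves the order unchanged.
        return list(names)
    remaining = [name for name in names if name != resource_name]
    for i, name in enumerate(remaining):
        # Match on the start of the name, as the description says.
        if name.startswith(insert_before):
            return remaining[:i] + [resource_name] + remaining[i:]
    # Assumption: with no prefix match, the resource goes to the end.
    return remaining + [resource_name]
```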

method Dataset.update_from_yaml(path: Path | str = Path('config', 'hdx_dataset_static.yaml')) → None

Update dataset metadata with static metadata from YAML file

Parameters

  • path : Path | str Path to YAML dataset metadata. Defaults to config/hdx_dataset_static.yaml.

Returns

  • None None

method Dataset.update_from_json(path: Path | str = Path('config', 'hdx_dataset_static.json')) → None

Update dataset metadata with static metadata from JSON file

Parameters

  • path : Path | str Path to JSON dataset metadata. Defaults to config/hdx_dataset_static.json.

Returns

  • None None

staticmethod Dataset.read_from_hdx(identifier: str, configuration: Configuration | None = None) → Optional['Dataset']

Reads the dataset given by identifier from HDX and returns Dataset object

Parameters

  • identifier : str Identifier of dataset

  • configuration : Configuration | None HDX configuration. Defaults to global configuration.

Returns

  • Optional['Dataset'] Dataset object if read successfully, None if not

method Dataset.reorder_resources(resource_ids: Sequence[str]) → None

Reorder resources in dataset according to provided list. Resources are updated in the dataset object to match new order. However, the dataset is not refreshed by rereading from HDX. If only some resource ids are supplied then these are assumed to be first and the other resources will stay in their original order.

Parameters

  • resource_ids : Sequence[str] List of resource ids

Returns

  • None None

Raises

  • HDXError
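The ordering rule for a partial list of ids (supplied ids first, the rest in their original order) can be sketched as a pure function over id lists. reorder_ids is hypothetical; the sketch assumes that an id not present in the dataset is the error case behind the documented HDXError, and raises ValueError in its place.

```python
from typing import Sequence


def reorder_ids(current: Sequence[str], requested: Sequence[str]) -> list:
    """Hypothetical helper: requested ids come first, in the given order,
    and every other id keeps its original relative order."""
    unknown = [rid for rid in requested if rid not in current]
    if unknown:
        # Assumption: stands in for the HDXError documented for reorder_resources.
        raise ValueError(f"Unknown resource ids: {unknown}")
    requested_set = set(requested)
    rest = [rid for rid in current if rid not in requested_set]
    return list(requested) + rest
```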

method Dataset.check_resources_url_filetoupload() → None

Check for the error where both a url and a file to upload are provided for a resource

Returns

  • None None

method Dataset.check_resources_fields(ignore_fields: Sequence[str] = ()) → None

Check that metadata for resources is complete. If required, set ignore_fields to any fields that should be ignored for the particular operation.

Parameters

  • ignore_fields : Sequence[str] Fields to ignore. Default is ().

Returns

  • None None

method Dataset.check_required_fields(ignore_fields: Sequence[str] = (), allow_no_resources: bool = False, **kwargs: Any) → None

Check that metadata for dataset is complete. If required, set ignore_fields to any fields that should be ignored for the particular operation.

Parameters

  • ignore_fields : Sequence[str] Fields to ignore. Default is ().

  • allow_no_resources : bool Whether to allow no resources. Defaults to False.

Returns

  • None None

Raises

  • HDXError

staticmethod Dataset.revise(match: dict[str, Any], filter: Sequence[str] = (), update: dict[str, Any] = {}, files_to_upload: dict[str, str] = {}, configuration: Configuration | None = None, **kwargs: Any) → Dataset

Revise a dataset in HDX

Parameters

  • match : dict[str, Any] Metadata on which to match dataset

  • filter : Sequence[str] Filters to apply. Defaults to tuple().

  • update : dict[str, Any] Metadata updates to apply. Defaults to {}.

  • files_to_upload : dict[str, str] Files to upload to HDX. Defaults to {}.

  • configuration : Configuration | None HDX configuration. Defaults to global configuration.

  • **kwargs : Any Additional arguments to pass to package_revise

Returns

  • Dataset Dataset object

method Dataset.update_in_hdx(allow_no_resources: bool = False, update_resources: bool = True, match_resources_by_metadata: bool = True, keys_to_delete: Sequence[str] = (), remove_additional_resources: bool = False, match_resource_order: bool = False, create_default_views: bool = True, **kwargs: Any) → dict

Check if dataset exists in HDX and if so, update it. match_resources_by_metadata matches resources by id if available, otherwise by name if names are unique, or by name and format if not.

Returns a dictionary with key resource name and value status code:

  • 0 = no file to upload and last_modified set to now (resource creation or data_updated flag is True)

  • 1 = no file to upload and data_updated flag is False

  • 2 = file uploaded to filestore (resource creation or either hash or size of file has changed)

  • 3 = file not uploaded to filestore (hash and size of file are the same)

  • 4 = file not uploaded (hash and size unchanged), given last_modified ignored

Parameters

  • allow_no_resources : bool Whether to allow no resources. Defaults to False.

  • update_resources : bool Whether to update resources. Defaults to True.

  • match_resources_by_metadata : bool Compare resource metadata rather than position in list. Defaults to True.

  • keys_to_delete : Sequence[str] List of top level metadata keys to delete. Defaults to tuple().

  • remove_additional_resources : bool Remove additional resources found in dataset. Defaults to False.

  • match_resource_order : bool Match order of given resources by name. Defaults to False.

  • create_default_views : bool Whether to call package_create_default_resource_views. Defaults to True.

  • **kwargs : Any See below

  • keep_crisis_tags : bool Whether to keep existing crisis tags. Defaults to True.

  • updated_by_script : str String to identify your script. Defaults to your user agent.

  • batch : str A string you can specify to show which datasets are part of a single batch update

  • force_update : bool Forces files to be updated even if they haven't changed

Returns

  • dict Status codes of resources

Raises

  • HDXError
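The status-code dictionary returned by update_in_hdx (and create_in_hdx) is easiest to act on once the codes are named. The sketch below encodes the five documented meanings; the helper names uploaded and describe, and the example resource names in the usage, are hypothetical.

```python
# Meanings of the status codes listed for update_in_hdx and create_in_hdx.
STATUS_MEANINGS = {
    0: "no file to upload; last_modified set to now",
    1: "no file to upload; data_updated flag is False",
    2: "file uploaded to filestore",
    3: "file not uploaded; hash and size unchanged",
    4: "file not uploaded; hash and size unchanged, given last_modified ignored",
}


def uploaded(statuses: dict) -> list:
    """Names of resources whose file was uploaded (status code 2)."""
    return sorted(name for name, code in statuses.items() if code == 2)


def describe(statuses: dict) -> dict:
    """Group resource names under the meaning of their status code."""
    grouped: dict = {}
    for name, code in statuses.items():
        meaning = STATUS_MEANINGS.get(code, f"unknown status {code}")
        grouped.setdefault(meaning, []).append(name)
    return grouped
```

For example, given a returned dictionary such as {"a.csv": 2, "b.csv": 3}, uploaded(...) picks out only "a.csv", which is useful for logging which files actually changed.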

method Dataset.create_in_hdx(allow_no_resources: bool = False, update_resources: bool = True, match_resources_by_metadata: bool = True, keys_to_delete: Sequence[str] = (), remove_additional_resources: bool = False, match_resource_order: bool = False, create_default_views: bool = True, **kwargs: Any) → dict

Check if dataset exists in HDX and if so, update it, otherwise create it. match_resources_by_metadata matches resources by id if available, otherwise by name if names are unique, or by name and format if not.

Returns a dictionary with key resource name and value status code:

  • 0 = no file to upload and last_modified set to now (resource creation or data_updated flag is True)

  • 1 = no file to upload and data_updated flag is False

  • 2 = file uploaded to filestore (resource creation or either hash or size of file has changed)

  • 3 = file not uploaded to filestore (hash and size of file are the same)

  • 4 = file not uploaded (hash and size unchanged), given last_modified ignored

Parameters

  • allow_no_resources : bool Whether to allow no resources. Defaults to False.

  • update_resources : bool Whether to update resources (if updating). Defaults to True.

  • match_resources_by_metadata : bool Compare resource metadata rather than position in list. Defaults to True.

  • keys_to_delete : Sequence[str] List of top level metadata keys to delete. Defaults to tuple().

  • remove_additional_resources : bool Remove additional resources found in dataset (if updating). Defaults to False.

  • match_resource_order : bool Match order of given resources by name. Defaults to False.

  • create_default_views : bool Whether to call package_create_default_resource_views (if updating). Defaults to True.

  • **kwargs : Any See below

  • keep_crisis_tags : bool Whether to keep existing crisis tags. Defaults to True.

  • updated_by_script : str String to identify your script. Defaults to your user agent.

  • batch : str A string you can specify to show which datasets are part of a single batch update

  • force_update : bool Forces files to be updated even if they haven't changed

Returns

  • dict Status codes of resources
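The matching order used when match_resources_by_metadata is True (id, then unique name, then name plus format) can be sketched for a single incoming resource. find_match is hypothetical and works on plain dictionaries; it assumes name uniqueness is judged among the existing resources, which the description leaves open.

```python
from typing import Optional, Sequence


def find_match(new: dict, existing: Sequence[dict]) -> Optional[int]:
    """Hypothetical helper: index in existing of the resource that new
    matches, or None if nothing matches."""
    if new.get("id"):
        # An id, when present, is the only thing compared.
        for i, resource in enumerate(existing):
            if resource.get("id") == new["id"]:
                return i
        return None
    same_name = [i for i, resource in enumerate(existing)
                 if resource.get("name") == new.get("name")]
    if len(same_name) == 1:
        return same_name[0]
    # Name is shared by several resources: require the format to match too.
    for i in same_name:
        if existing[i].get("format") == new.get("format"):
            return i
    return None
```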

method Dataset.delete_from_hdx() → None

Deletes a dataset from HDX.

Returns

  • None None

classmethod Dataset.search_in_hdx(query: str | None = '*:*', configuration: Configuration | None = None, page_size: int = 1000, **kwargs: Any) → list['Dataset']

Searches for datasets in HDX

Parameters

  • query : str | None Query (in Solr format). Defaults to '*:*'.

  • configuration : Configuration | None HDX configuration. Defaults to global configuration.

  • page_size : int Size of page to use internally to query HDX. Defaults to 1000.

  • **kwargs : Any See below

  • fq : string Any filter queries to apply

  • rows : int Number of matching rows to return. Defaults to all datasets (sys.maxsize).

  • start : int Offset in the complete result for where the set of returned datasets should begin

  • sort : string Sorting of results. Defaults to 'relevance asc, metadata_modified desc' if rows<=page_size or 'metadata_modified asc' if rows>page_size.

  • facet : string Whether to enable faceted results. Defaults to True.

  • facet.mincount : int Minimum count a facet field value must have to be included in the results

  • facet.limit : int Maximum number of values the facet fields return (a negative value means unlimited). Defaults to 50.

  • facet.field : list[str] Fields to facet upon. Default is empty.

  • use_default_schema : bool Use default package schema instead of custom schema. Defaults to False.

Returns

  • list['Dataset'] list of datasets resulting from query

Raises

  • HDXError
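The defaults documented for search_in_hdx (rows defaulting to sys.maxsize, internal pages of page_size, and a sort that depends on rows versus page_size) can be sketched as two small functions. page_requests is hypothetical and only shows how a result window might be split into page-sized requests; the switch to 'metadata_modified asc' for large result sets is presumably there to keep the ordering stable across pages.

```python
import sys
from typing import Iterator, Tuple


def default_sort(rows: int = sys.maxsize, page_size: int = 1000) -> str:
    """Default sort documented for search_in_hdx."""
    if rows <= page_size:
        return "relevance asc, metadata_modified desc"
    return "metadata_modified asc"


def page_requests(start: int, rows: int, page_size: int = 1000) -> Iterator[Tuple[int, int]]:
    """Hypothetical helper: yield (start, rows) for each internal request.

    With rows=sys.maxsize this never ends on its own; a real client would
    stop once a page comes back with fewer results than requested.
    """
    end = start + rows
    offset = start
    while offset < end:
        yield offset, min(page_size, end - offset)
        offset += page_size
```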

staticmethod Dataset.get_all_dataset_names(configuration: Configuration | None = None, **kwargs: Any) → list[str]

Get all dataset names in HDX

Parameters

  • configuration : Configuration | None HDX configuration. Defaults to global configuration.

  • **kwargs : Any See below

  • rows : int Number of rows to return. Defaults to all datasets (sys.maxsize)

  • start : int Offset in the complete result for where the set of returned dataset names should begin

Returns

  • list[str] list of all dataset names in HDX

classmethod Dataset.get_all_datasets(configuration: Configuration | None = None, page_size: int = 1000, **kwargs: Any) → list['Dataset']

Get all datasets from HDX (just calls search_in_hdx)

Parameters

  • configuration : Configuration | None HDX configuration. Defaults to global configuration.

  • page_size : int Size of page to use internally to query HDX. Defaults to 1000.

  • **kwargs : Any See below

  • fq : string Any filter queries to apply

  • rows : int Number of matching rows to return. Defaults to all datasets (sys.maxsize).

  • start : int Offset in the complete result for where the set of returned datasets should begin

  • sort : string Sorting of results. Defaults to 'metadata_modified asc'.

  • facet : string Whether to enable faceted results. Defaults to True.

  • facet.mincount : int Minimum count a facet field value must have to be included in the results

  • facet.limit : int Maximum number of values the facet fields return (a negative value means unlimited). Defaults to 50.

  • facet.field : list[str] Fields to facet upon. Default is empty.

  • use_default_schema : bool Use default package schema instead of custom schema. Defaults to False.

Returns

  • list['Dataset'] list of datasets resulting from query

staticmethod Dataset.get_all_resources(datasets: Sequence['Dataset']) → list['Resource']

Get all resources from a list of datasets (such as returned by search)

Parameters

  • datasets : Sequence['Dataset'] list of datasets

Returns

  • list['Resource'] list of resources within those datasets

classmethod Dataset.autocomplete(name: str, limit: int = 20, configuration: Configuration | None = None) → list

Autocomplete a dataset name and return matches

Parameters

  • name : str Name to autocomplete

  • limit : int Maximum number of matches to return. Defaults to 20.

  • configuration : Configuration | None HDX configuration. Defaults to global configuration.

Returns

  • list Autocomplete matches

method Dataset.get_time_period(date_format: str | None = None, today: datetime = now_utc()) → dict

Get dataset date as datetimes and strings in specified format. If no format is supplied, the ISO 8601 format is used. Returns a dictionary containing keys startdate (start date as datetime), enddate (end date as datetime), startdate_str (start date as string), enddate_str (end date as string) and ongoing (whether the end date rolls forward every day).

Parameters

  • date_format : str | None Date format. None is taken to be ISO 8601. Defaults to None.

  • today : datetime Date to use for today. Defaults to now_utc().

Returns

  • dict Dictionary of date information
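The shape of the dictionary returned by get_time_period can be sketched from already-parsed dates. time_period_info is hypothetical; it assumes that an ongoing period ends on "today" (the role suggested for the today parameter) and uses datetime.isoformat for the ISO 8601 case, so the library's actual string form may differ.

```python
from datetime import datetime, timezone
from typing import Optional


def time_period_info(startdate: datetime, enddate: datetime, ongoing: bool,
                     date_format: Optional[str] = None,
                     today: Optional[datetime] = None) -> dict:
    """Hypothetical helper building the documented get_time_period keys."""
    if today is None:
        today = datetime.now(timezone.utc)
    if ongoing:
        # Assumption: an ongoing period's end date rolls forward to today.
        enddate = today

    def fmt(value: datetime) -> str:
        # None means ISO 8601 (here via datetime.isoformat).
        return value.isoformat() if date_format is None else value.strftime(date_format)

    return {
        "startdate": startdate,
        "enddate": enddate,
        "startdate_str": fmt(startdate),
        "enddate_str": fmt(enddate),
        "ongoing": ongoing,
    }
```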

method Dataset.set_time_period(startdate: datetime | str, enddate: datetime | str | None = None, ongoing: bool = False, ignore_timeinfo: bool = True) → None

Set time period from either datetime objects or strings. Any time and time zone information will be ignored by default (meaning that the time of the start date is set to 00:00:00, the time of any end date is set to 23:59:59 and the time zone is set to UTC). To have the time and time zone accounted for, set ignore_timeinfo to False. In this case, the time will be converted to UTC.

Parameters

  • startdate : datetime | str Dataset start date

  • enddate : datetime | str | None Dataset end date. Defaults to None.

  • ongoing : bool True if ongoing, False if not. Defaults to False.

  • ignore_timeinfo : bool Ignore time and time zone of date. Defaults to True.

Returns

  • None None
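The time handling described for set_time_period can be sketched for datetime inputs. normalise_period is hypothetical, handles only the case where both dates are given, and does not parse strings; treating naive datetimes as UTC when ignore_timeinfo is False is an assumption.

```python
from datetime import datetime, time, timezone


def normalise_period(startdate: datetime, enddate: datetime,
                     ignore_timeinfo: bool = True) -> tuple:
    """Hypothetical helper applying the documented set_time_period rules."""
    if ignore_timeinfo:
        # Drop time and zone: start at 00:00:00, end at 23:59:59, both UTC.
        start = datetime.combine(startdate.date(), time(0, 0, 0), tzinfo=timezone.utc)
        end = datetime.combine(enddate.date(), time(23, 59, 59), tzinfo=timezone.utc)
        return start, end

    def to_utc(value: datetime) -> datetime:
        # Assumption: naive datetimes are treated as already being UTC.
        if value.tzinfo is None:
            return value.replace(tzinfo=timezone.utc)
        return value.astimezone(timezone.utc)

    return to_utc(startdate), to_utc(enddate)
```

With ignore_timeinfo=False, a start of 01:00 at UTC+2 on 5 March becomes 23:00 UTC on 4 March, which is why the default discards time information.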

method Dataset.set_time_period_year_range(dataset_year: str | int | Iterable, dataset_end_year: str | int | None = None) → list[int]

Set time period as a range from year or start and end year.

Parameters

  • dataset_year : str | int | Iterable Dataset year given as string or int or range in an iterable

  • dataset_end_year : str | int | None Dataset end year given as string or int

Returns

  • list[int] The start and end year if supplied or sorted list of years
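The year handling of set_time_period_year_range can be sketched as follows. year_range is hypothetical; it assumes that a single year yields a one-year range and that the resulting period runs from 1 January of the first year to 31 December of the last, neither of which is stated above.

```python
from collections.abc import Iterable
from datetime import date
from typing import Union


def year_range(dataset_year: Union[str, int, Iterable],
               dataset_end_year: Union[str, int, None] = None):
    """Hypothetical helper: return (years, (period_start, period_end))."""
    # Check str/int first, since a str is itself an Iterable.
    if isinstance(dataset_year, (str, int)):
        start = int(dataset_year)
        # Assumption: a single year is treated as a one-year range.
        end = start if dataset_end_year is None else int(dataset_end_year)
        years = [start, end]
    else:
        years = sorted(int(year) for year in dataset_year)
    # Assumption: the period spans 1 January of the first year to 31 December of the last.
    return years, (date(years[0], 1, 1), date(years[-1], 12, 31))
```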

classmethod Dataset.list_valid_update_frequencies() → list[str]

List of valid update frequency values

Returns

  • list[str] Allowed update frequencies

classmethod Dataset.transform_update_frequency(frequency: str | int) → str | None

Get numeric update frequency (as string since that is required field format) from textual representation or vice versa (e.g. 'Every month' = '30', '30' or 30 = 'Every month')

Parameters

  • frequency : str | int Update frequency in one format

Returns

  • str | None Update frequency in alternative format or None if not valid
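The two-way conversion performed by transform_update_frequency can be sketched with a pair of dictionaries. The table below is a hypothetical two-entry subset built only from the examples on this page ('Every week' = '7', 'Every month' = '30'); the authoritative values come from Dataset.list_valid_update_frequencies().

```python
from typing import Optional, Union

# Hypothetical subset taken from the examples on this page.
TEXT_TO_DAYS = {"Every week": "7", "Every month": "30"}
DAYS_TO_TEXT = {days: text for text, days in TEXT_TO_DAYS.items()}


def transform_frequency(frequency: Union[str, int]) -> Optional[str]:
    """Map a textual frequency to its numeric string, or vice versa."""
    # Numbers and numeric strings are normalised to strings first.
    key = str(frequency)
    if key in DAYS_TO_TEXT:
        return DAYS_TO_TEXT[key]
    # None signals a value that is not a known frequency.
    return TEXT_TO_DAYS.get(key)
```

Returning the numeric form as a string matches the note that the metadata field requires a string.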

method Dataset.get_expected_update_frequency() → str | None

Get expected update frequency (in textual rather than numeric form)

Returns

  • str | None Update frequency in textual form or None if the update frequency doesn't exist or is blank.

method Dataset.set_expected_update_frequency(update_frequency: str | int) → None

Set expected update frequency. You can pass frequencies like "Every week" or '7' or 7. Valid values for update frequency can be found from Dataset.list_valid_update_frequencies().

Parameters

  • update_frequency : str | int Update frequency

Returns

  • None None

Raises

  • HDXError

method Dataset.get_tags() → list[str]

Return the dataset's list of tags

Returns

  • list[str] list of tags or [] if there are none

method Dataset.add_tag(tag: str, log_deleted: bool = True) → tuple[list[str], list[str]]

Add a tag

Parameters

  • tag : str Tag to add

  • log_deleted : bool Whether to log informational messages about deleted tags. Defaults to True.

Returns

  • tuple[list[str], list[str]] Tuple containing list of added tags and list of deleted tags and tags not added

method Dataset.add_tags(tags: Sequence[str], log_deleted: bool = True) → tuple[list[str], list[str]]

Add a list of tags

Parameters

  • tags : Sequence[str] List of tags to add

  • log_deleted : bool Whether to log informational messages about deleted tags. Defaults to True.

Returns

  • tuple[list[str], list[str]] Tuple containing list of added tags and list of deleted tags and tags not added

method Dataset.clean_tags(log_deleted: bool = True) → tuple[list[str], list[str]]

Clean tags in an HDX object according to tags cleanup spreadsheet, deleting invalid tags that cannot be mapped

Parameters

  • log_deleted : bool Whether to log informational messages about deleted tags. Defaults to True.

Returns

  • tuple[list[str], list[str]] Tuple containing list of mapped tags and list of deleted tags and tags not added

method Dataset.remove_tag(tag: str) → bool

Remove a tag

Parameters

  • tag : str Tag to remove

Returns

  • bool True if tag removed or False if not

method Dataset.is_subnational() → bool

Return whether the dataset is subnational

Returns

  • bool True if the dataset is subnational, False if not

method Dataset.set_subnational(subnational: bool) → None

Set if dataset is subnational or national

Parameters

  • subnational : bool True for subnational, False for national

Returns

  • None None

method Dataset.get_location_iso3s(locations: Sequence[str] | None = None) → list[str]

Return the dataset's locations as ISO 3 codes

Parameters

  • locations : Sequence[str] | None Valid locations list. Defaults to list downloaded from HDX.

Returns

  • list[str] list of location iso3s

method Dataset.get_location_names(locations: Sequence[str] | None = None) → list[str]

Return the dataset's location names

Parameters

  • locations : Sequence[str] | None Valid locations list. Defaults to list downloaded from HDX.

Returns

  • list[str] list of location names

method Dataset.add_country_location(country: str, exact: bool = True, locations: Sequence[str] | None = None, use_live: bool = True) → bool

Add a country. If an iso 3 code is not provided, value is parsed and if it is a valid country name, converted to an iso 3 code. If the country is already added, it is ignored.

Parameters

  • country : str Country to add

  • exact : bool True for exact matching or False to allow fuzzy matching. Defaults to True.

  • locations : Sequence[str] | None Valid locations list. Defaults to list downloaded from HDX.

  • use_live : bool Try to use latest country data from web rather than file in package. Defaults to True.

Returns

  • bool True if country added or False if country already present

Raises

  • HDXError

method Dataset.add_country_locations(countries: Sequence[str], locations: Sequence[str] | None = None, use_live: bool = True) → bool

Add a list of countries. If iso 3 codes are not provided, values are parsed and where they are valid country names, converted to iso 3 codes. If any country is already added, it is ignored.

Parameters

  • countries : Sequence[str] List of countries to add

  • locations : Sequence[str] | None Valid locations list. Defaults to list downloaded from HDX.

  • use_live : bool Try to use latest country data from web rather than file in package. Defaults to True.

Returns

  • bool True if all countries added or False if any already present.

method Dataset.add_region_location(region: str, locations: Sequence[str] | None = None, use_live: bool = True) → bool

Add all countries in a region. If a 3 digit UNStats M49 region code is not provided, value is parsed as a region name. If any country is already added, it is ignored.

Parameters

  • region : str M49 region, intermediate region or subregion to add

  • locations : Sequence[str] | None Valid locations list. Defaults to list downloaded from HDX.

  • use_live : bool Try to use latest country data from web rather than file in package. Defaults to True.

Returns

  • bool True if all countries in region added or False if any already present.
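A sketch of the code-or-name lookup for regions. The `REGIONS` table is an abbreviated, hypothetical excerpt: 014 (Eastern Africa) and 145 (Western Asia) are real M49 codes, but their real membership lists are much longer than the three countries shown, and this is not the library's own lookup.

```python
# Abbreviated, illustrative excerpt of M49 regions (real lists are longer).
REGIONS = {
    "014": ("Eastern Africa", ["ETH", "KEN", "SOM"]),
    "145": ("Western Asia", ["IRQ", "SYR", "YEM"]),
}


def countries_in_region(region: str) -> list[str]:
    """Resolve a 3-digit M49 code or a region name to member ISO 3 codes."""
    value = region.strip()
    if len(value) == 3 and value.isdigit():
        if value in REGIONS:
            return REGIONS[value][1]
    else:
        for name, members in REGIONS.values():
            if name.lower() == value.lower():
                return members
    raise ValueError(f"Region {region} not found")


def add_region_location(locations: list[str], region: str) -> bool:
    """Add every country in the region; True only if none was already present."""
    all_added = True
    for iso3 in countries_in_region(region):
        if iso3 in locations:
            all_added = False
        else:
            locations.append(iso3)
    return all_added
```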

method Dataset.add_other_location(location: str, exact: bool = True, alterror: str | None = None, locations: Sequence[str] | None = None) → bool

Add a location that is not a country or region. The value is parsed and compared to existing locations in HDX. If the location is already present, it is ignored.

Parameters

  • location : str Location to add

  • exact : bool True for exact matching or False to allow fuzzy matching. Defaults to True.

  • alterror : str | None Alternative to the built-in error message if the location is not found. Defaults to None.

  • locations : Sequence[str] | None Valid locations list. Defaults to list downloaded from HDX.

Returns

  • bool True if location added or False if location already present

Raises

  • HDXError

method Dataset.remove_location(location: str) → bool

Remove a location. If the location is not present, it is ignored.

Parameters

  • location : str Location to remove

Returns

  • bool True if location removed or False if not

method Dataset.get_maintainer() → User

Get the dataset's maintainer.

Returns

  • User Dataset's maintainer

method Dataset.set_maintainer(maintainer: Union['User', dict, str]) → None

Set the dataset's maintainer.

Parameters

  • maintainer : Union['User', dict, str] Either a user id or User metadata from a User object or dictionary.

Returns

  • None None

Raises

  • HDXError
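`set_maintainer`, `set_organization` and the showcase methods all accept an object, a metadata dictionary or an id string. A minimal sketch of that normalization step, assuming the metadata carries the id under the key `id` and that the dataset stores the maintainer's id under `maintainer` (both assumptions in this sketch). `User` here is a hypothetical dict-based stand-in, not the library's `User` class.

```python
from collections.abc import Mapping
from typing import Any


class User(dict):
    """Hypothetical stand-in for the library's User object (dict-like)."""


def resolve_id(value: Any) -> str:
    """Return the id from an id string, a metadata dict or a dict-like object."""
    if isinstance(value, str):
        return value
    if isinstance(value, Mapping):
        identifier = value.get("id")
        if identifier:
            return identifier
        raise ValueError("Metadata has no 'id' field")
    # The documented methods raise HDXError; TypeError keeps the sketch standalone.
    raise TypeError(f"Unsupported type: {type(value).__name__}")


def set_maintainer(dataset: dict, maintainer: Any) -> None:
    """Store the maintainer's id in the dataset metadata (key name assumed)."""
    dataset["maintainer"] = resolve_id(maintainer)
```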

method Dataset.get_organization() → Organization

Get the dataset's organization.

Returns

  • Organization Dataset's organization

method Dataset.set_organization(organization: Union['Organization', dict, str]) → None

Set the dataset's organization.

Parameters

  • organization : Union['Organization', dict, str] Either an Organization id or Organization metadata from an Organization object or dictionary.

Returns

  • None None

Raises

  • HDXError

method Dataset.get_showcases() → list['Showcase']

Get any showcases that contain the dataset.

Returns

  • list['Showcase'] List of showcases

method Dataset.add_showcase(showcase: Union['Showcase', dict, str], showcases_to_check: Sequence['Showcase'] = None) → bool

Add dataset to showcase

Parameters

  • showcase : Union['Showcase', dict, str] Either a showcase id or showcase metadata from a Showcase object or dictionary

  • showcases_to_check : Sequence['Showcase'] List of showcases against which to check existence of showcase. Defaults to showcases containing dataset.

Returns

  • bool True if the showcase was added, False if already present

method Dataset.add_showcases(showcases: Sequence[Union['Showcase', dict, str]], showcases_to_check: Sequence['Showcase'] = None) → bool

Add dataset to multiple showcases

Parameters

  • showcases : Sequence[Union['Showcase', dict, str]] A list of either showcase ids or showcase metadata from Showcase objects or dictionaries

  • showcases_to_check : Sequence['Showcase'] List of showcases against which to check existence of showcase. Defaults to showcases containing dataset.

Returns

  • bool True if all showcases added or False if any already present
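The `showcases_to_check` argument lets the existing showcases be looked up once and reused across calls. A minimal sketch of the membership check, with the HDX lookup replaced by a plain list of ids (`linked`) and showcases given as id strings or dictionaries with an `id` key; all names here are hypothetical.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from typing import Any


def showcase_id(showcase: Any) -> str:
    """Accept an id string or a mapping with an 'id' key."""
    return showcase["id"] if isinstance(showcase, Mapping) else showcase


def add_showcases(
    linked: list[str],
    showcases: Sequence[Any],
    showcases_to_check: Sequence[Any] | None = None,
) -> bool:
    """Link each showcase; return True only if none was already linked."""
    # Default: check against the showcases currently containing the dataset.
    if showcases_to_check is None:
        showcases_to_check = list(linked)
    existing = {showcase_id(s) for s in showcases_to_check}
    all_added = True
    for showcase in showcases:
        sid = showcase_id(showcase)
        if sid in existing:
            all_added = False
            continue
        linked.append(sid)
        existing.add(sid)
    return all_added
```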

method Dataset.remove_showcase(showcase: Union['Showcase', dict, str]) → None

Remove dataset from showcase

Parameters

  • showcase : Union['Showcase', dict, str] Either a showcase id string or showcase metadata from a Showcase object or dictionary

Returns

  • None None

method Dataset.is_requestable() → bool

Return whether the dataset is requestable or not

Returns

  • bool Whether the dataset is requestable or not

method Dataset.set_requestable(requestable: bool = True) → None

Set the dataset to be of type requestable or not

Parameters

  • requestable : bool Set whether dataset is requestable. Defaults to True.

Returns

  • None None

method Dataset.get_fieldnames() → list[str]

Return list of fieldnames in your data. Only applicable to requestable datasets.

Returns

  • list[str] List of field names

Raises

  • NotRequestableError

method Dataset.add_fieldname(fieldname: str) → bool

Add a fieldname to list of fieldnames in your data. Only applicable to requestable datasets.

Parameters

  • fieldname : str Fieldname to add

Returns

  • bool True if fieldname added or False if fieldname already present

Raises

  • NotRequestableError

method Dataset.add_fieldnames(fieldnames: Sequence[str]) → bool

Add a list of fieldnames to list of fieldnames in your data. Only applicable to requestable datasets.

Parameters

  • fieldnames : Sequence[str] List of fieldnames to add

Returns

  • bool True if all fieldnames added or False if any already present

Raises

  • NotRequestableError

method Dataset.remove_fieldname(fieldname: str) → bool

Remove a fieldname. Only applicable to requestable datasets.

Parameters

  • fieldname : str Fieldname to remove

Returns

  • bool True if fieldname removed or False if not

Raises

  • NotRequestableError
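The fieldname methods only apply to requestable datasets. A minimal sketch of that guard, assuming (as the `NotRequestableError` class in this module's class listing suggests) that the error raised is `NotRequestableError`, a subclass of `HDXError`. Both exceptions and the `RequestableFields` container are local stand-ins, not imports from the library.

```python
class HDXError(Exception):
    """Local stand-in for the library's HDXError."""


class NotRequestableError(HDXError):
    """Local stand-in for the library's NotRequestableError."""


class RequestableFields:
    """Hypothetical container tracking fieldnames for one dataset."""

    def __init__(self, requestable: bool) -> None:
        self.requestable = requestable
        self.fieldnames: list[str] = []

    def _check_requestable(self) -> None:
        if not self.requestable:
            raise NotRequestableError("Only applicable to requestable datasets")

    def add_fieldname(self, fieldname: str) -> bool:
        self._check_requestable()
        if fieldname in self.fieldnames:
            return False
        self.fieldnames.append(fieldname)
        return True

    def add_fieldnames(self, fieldnames: list[str]) -> bool:
        # Add every fieldname first, then report whether all were new.
        results = [self.add_fieldname(f) for f in fieldnames]
        return all(results)

    def remove_fieldname(self, fieldname: str) -> bool:
        self._check_requestable()
        if fieldname not in self.fieldnames:
            return False
        self.fieldnames.remove(fieldname)
        return True
```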

method Dataset.get_filetypes() → list[str]

Return list of filetypes in your data

Returns

  • list[str] List of filetypes

method Dataset.add_filetype(filetype: str) → bool

Add a filetype to list of filetypes in your data. Only applicable to requestable datasets.

Parameters

  • filetype : str Filetype to add

Returns

  • bool True if filetype added or False if filetype already present

Raises

  • NotRequestableError

method Dataset.add_filetypes(filetypes: Sequence[str]) → bool

Add a list of filetypes to list of filetypes in your data. Only applicable to requestable datasets.

Parameters

  • filetypes : Sequence[str] List of filetypes to add

Returns

  • bool True if all filetypes added or False if any already present

Raises

  • NotRequestableError

method Dataset.remove_filetype(filetype: str) → bool

Remove a filetype

Parameters

  • filetype : str Filetype to remove

Returns

  • bool True if filetype removed or False if not

Raises

  • NotRequestableError

method Dataset.set_custom_viz(url: str) → None

Set custom visualization url for dataset

Parameters

  • url : str Custom visualization url

Returns

  • None None

method Dataset.get_custom_viz() → str | None

Get custom visualization url for dataset

Returns

  • str | None Custom visualization url or None

method Dataset.preview_off() → None

Set dataset preview off

Returns

  • None None

method Dataset.preview_resource() → None

Set dataset preview on for an unspecified resource

Returns

  • None None

method Dataset.set_preview_resource(resource: Union['Resource', dict, str, int]) → Resource

Set the resource that will be used for displaying previews in dataset preview

Parameters

  • resource : Union['Resource', dict, str, int] Either a resource id or name, resource metadata from a Resource object or dictionary, or the resource's position in the dataset

Returns

  • Resource Resource that is used for preview or None if no preview set

Raises

  • HDXError
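`set_preview_resource` accepts a resource in several forms. A minimal sketch of resolving it against the dataset's resources, with resources modelled as plain dictionaries and the chosen one marked through a hypothetical `preview_enabled` key (the metadata field HDX actually uses is not given in this documentation).

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def find_resource(resources: list[dict], resource: Any) -> dict:
    """Resolve a position, id, name or metadata mapping to a resource dict."""
    if isinstance(resource, int):
        return resources[resource]  # position in the dataset's resource list
    if isinstance(resource, Mapping):
        wanted = {resource.get("id"), resource.get("name")} - {None}
    else:
        wanted = {resource}  # an id or a name
    for candidate in resources:
        if candidate.get("id") in wanted or candidate.get("name") in wanted:
            return candidate
    # The documented method raises HDXError; ValueError keeps the sketch standalone.
    raise ValueError(f"Resource {resource!r} not found")


def set_preview_resource(resources: list[dict], resource: Any) -> dict:
    """Mark one resource for preview and clear the flag on the others."""
    chosen = find_resource(resources, resource)
    for candidate in resources:
        candidate["preview_enabled"] = candidate is chosen
    return chosen
```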

method Dataset.create_default_views(create_datastore_views: bool = False) → None

Create default resource views for all resources in dataset

Parameters

  • create_datastore_views : bool Whether to try to create resource views that point to the datastore. Defaults to False.

Returns

  • None None

method Dataset.get_name_or_id(prefer_name: bool = True) → str | None

Get the dataset name or id, e.g. for use in urls. If prefer_name is True, the name is preferred over the id if available; otherwise the id is preferred over the name if available.

Parameters

  • prefer_name : bool Whether name is preferred over id. Defaults to True.

Returns

  • str | None HDX dataset id or name or None if not available
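The preference rule reduces to trying two keys in order. A minimal sketch over a plain metadata dictionary, assuming the fields are called `name` and `id` as in the description above:

```python
from __future__ import annotations


def get_name_or_id(data: dict, prefer_name: bool = True) -> str | None:
    """Return the preferred identifier, falling back to the other one."""
    keys = ("name", "id") if prefer_name else ("id", "name")
    for key in keys:
        value = data.get(key)
        if value:  # skip missing or empty values
            return value
    return None
```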

method Dataset.get_hdx_url(prefer_name: bool = True) → str | None

Get the url of the dataset on HDX, or None if the dataset name and id fields are missing. If prefer_name is True, the name is preferred over the id if available; otherwise the id is preferred over the name if available.

Parameters

  • prefer_name : bool Whether name is preferred over id in url. Defaults to True.

Returns

  • str | None Url of the dataset on HDX or None if the dataset is missing fields

method Dataset.get_api_url(prefer_name: bool = True) → str | None

Get the API url of the dataset on HDX, or None if the dataset name and id fields are missing.

Parameters

  • prefer_name : bool Whether name is preferred over id in url. Defaults to True.

Returns

  • str | None API url of the dataset on HDX or None if the dataset is missing fields
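A sketch of how the two urls could be built from the name or id. The base url `https://data.humdata.org` (the public HDX site), the `/dataset/` page path and the CKAN `package_show` action path are assumptions here; the library presumably derives the site from its configuration rather than a constant.

```python
from __future__ import annotations

from urllib.parse import quote

HDX_SITE = "https://data.humdata.org"  # assumed base url


def _name_or_id(data: dict, prefer_name: bool) -> str | None:
    keys = ("name", "id") if prefer_name else ("id", "name")
    return next((data[k] for k in keys if data.get(k)), None)


def get_hdx_url(data: dict, prefer_name: bool = True, site: str = HDX_SITE) -> str | None:
    """Dataset page url, or None if neither name nor id is set."""
    identifier = _name_or_id(data, prefer_name)
    if identifier is None:
        return None
    return f"{site}/dataset/{quote(identifier, safe='')}"


def get_api_url(data: dict, prefer_name: bool = True, site: str = HDX_SITE) -> str | None:
    """Assumed CKAN action API url for reading one dataset."""
    identifier = _name_or_id(data, prefer_name)
    if identifier is None:
        return None
    return f"{site}/api/3/action/package_show?id={quote(identifier, safe='')}"
```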

method Dataset.generate_resource(folder: Path | str, filename: str, rows: Iterable[Sequence | Mapping], resourcedata: dict, headers: int | Sequence[str] | None = None, columns: Sequence[int] | Sequence[str] | None = None, format: str = 'csv', encoding: str | None = None, datecol: int | str | None = None, yearcol: int | str | None = None, date_function: Callable[[dict], dict | None] | None = None, no_empty: bool = True) → tuple[bool, dict]

Write rows to a file and create a resource, adding it to the dataset. The headers argument is either a row number (rows are counted from 1) or the actual headers given as a list of strings. If it is not set, all rows are treated as containing values. Specific columns to include (i.e. a subset of the headers) can be specified.
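A minimal sketch of how the headers and columns arguments could select what is written, using the standard `csv` module on list-form rows only. Two details are assumptions: rows above an integer header row are dropped, and columns are selected by header name (the integer form of columns is not covered, since its indexing convention is not stated here).

```python
from __future__ import annotations

import csv
import io
from collections.abc import Iterable, Sequence


def rows_to_csv(
    rows: Iterable[Sequence[str]],
    headers: int | Sequence[str] | None = None,
    columns: Sequence[str] | None = None,
) -> str:
    """Return CSV text for list-form rows, applying the headers/columns rules."""
    data = [list(row) for row in rows]
    if isinstance(headers, int):
        # headers is a row number counted from 1; rows above it are dropped.
        header_row = data[headers - 1]
        data = data[headers:]
    else:
        header_row = list(headers) if headers is not None else None
    if columns is not None:
        if header_row is None:
            raise ValueError("Selecting columns by name requires headers")
        indexes = [header_row.index(name) for name in columns]
        header_row = [header_row[i] for i in indexes]
        data = [[row[i] for i in indexes] for row in data]
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    if header_row is not None:
        writer.writerow(header_row)
    writer.writerows(data)
    return output.getvalue()
```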

The returned dictionary contains the resource under the key resource, the headers under the key headers and the list of rows under the key rows.

The time period can optionally be set by supplying a column in which to look up the date or year. Note that any timezone information is ignored and UTC is assumed. Alternatively, a function can be supplied to handle any dates in a row. It should accept a row and return either None to ignore the row or a dictionary, which can be empty if there are no dates in the row or can contain the keys startdate and/or enddate with timezone-aware datetime values. The lowest start date and highest end date are used to set the time period and are returned in the results dictionary under the keys startdate and enddate.
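A minimal sketch of the date_function contract and of folding the lowest start date and highest end date into a time period. The row fields `date` and `status` are hypothetical, and leaving ignored rows out of the returned rows is one reading of "ignore the row", not confirmed library behaviour.

```python
from __future__ import annotations

from datetime import datetime, timezone


def date_function(row: dict) -> dict | None:
    """Example user-supplied function following the documented contract."""
    if row.get("status") == "draft":
        return None  # ignore this row entirely
    if not row.get("date"):
        return {}  # keep the row, but it has no dates
    date = datetime.strptime(row["date"], "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return {"startdate": date, "enddate": date}


def time_period(rows: list[dict], date_function) -> dict:
    """Keep non-ignored rows and track the overall start and end dates."""
    startdate = enddate = None
    kept = []
    for row in rows:
        result = date_function(row)
        if result is None:
            continue
        kept.append(row)
        start, end = result.get("startdate"), result.get("enddate")
        if start is not None and (startdate is None or start < startdate):
            startdate = start
        if end is not None and (enddate is None or end > enddate):
            enddate = end
    return {"rows": kept, "startdate": startdate, "enddate": enddate}
```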

Parameters

  • folder : Path | str Folder to which to write file containing rows

  • filename : str Filename of file to write rows

  • rows : Iterable[Sequence | Mapping] List of rows in dict or list form

  • resourcedata : dict Resource data

  • headers : int | Sequence[str] | None Header row number (counted from 1) or list of headers. Defaults to None.

  • columns : Sequence[int] | Sequence[str] | None Columns to write. Defaults to all.

  • format : str Format to write. Defaults to csv.

  • encoding : str | None Encoding to use. Defaults to None (infer encoding).

  • datecol : int | str | None Date column for setting time period. Defaults to None (don't set).

  • yearcol : int | str | None Year column for setting dataset year range. Defaults to None (don't set).

  • date_function : Callable[[dict], dict | None] | None Date function to call for each row. Defaults to None.

  • no_empty : bool Don't generate resource if there are no data rows. Defaults to True.

Returns

  • tuple[bool, dict] (True if resource added, dictionary of results)

Raises

  • HDXError

method Dataset.download_generate_resource(downloader: BaseDownload, url: str, folder: Path | str, filename: str, resourcedata: dict, header_insertions: Sequence[tuple[int, str]] | None = None, row_function: Callable[[list[str], dict], dict] | None = None, columns: Sequence[int] | Sequence[str] | None = None, format: str = 'csv', encoding: str | None = None, datecol: int | str | None = None, yearcol: int | str | None = None, date_function: Callable[[dict], dict | None] | None = None, no_empty: bool = True, **kwargs: Any) → tuple[bool, dict]

Download the url, write the rows to a file (csv by default) and create a resource, adding it to the dataset. The returned dictionary contains the resource under the key resource, the headers under the key headers and the list of rows under the key rows.

Optionally, headers can be inserted at specific positions using the header_insertions argument, a list of tuples of the form (position, header). A row_function can also be supplied: it is called for each row with the headers (prior to any insertions) and the row (in dict or list form, depending on the dict_rows argument) and returns a modified row.
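A sketch of header insertions combined with a row function, on dict-form rows. It assumes insertions are applied in order with `list.insert`, so each position refers to the list as already extended; the `adm0` and `country_iso3` columns and the `process_rows` helper are hypothetical.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence


def process_rows(
    headers: Sequence[str],
    rows: Sequence[dict],
    header_insertions: Sequence[tuple[int, str]] | None = None,
    row_function: Callable[[list[str], dict], dict] | None = None,
) -> tuple[list[str], list[list]]:
    """Return the final headers and list-form rows in that header order."""
    original = list(headers)
    final_headers = list(original)
    for position, header in header_insertions or []:
        final_headers.insert(position, header)
    output = []
    for row in rows:
        if row_function is not None:
            # The row function sees the headers as they were before insertions.
            row = row_function(original, dict(row))
        output.append([row.get(h) for h in final_headers])
    return final_headers, output


def add_iso3(headers: list[str], row: dict) -> dict:
    """Example row function filling the inserted column."""
    row["country_iso3"] = row["adm0"].upper()
    return row
```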

The time period can optionally be set by supplying a column in which to look up the date or year. Note that any timezone information is ignored and UTC is assumed. Alternatively, a function can be supplied to handle any dates in a row. It should accept a row and return either None to ignore the row or a dictionary, which can be empty if there are no dates in the row or can contain the keys startdate and/or enddate with timezone-aware datetime values. The lowest start date and highest end date are used to set the time period and are returned in the results dictionary under the keys startdate and enddate.

Parameters

  • downloader : BaseDownload A Download or Retrieve object

  • url : str URL to download

  • folder : Path | str Folder to which to write file containing rows

  • filename : str Filename of file to write rows

  • resourcedata : dict Resource data

  • header_insertions : Sequence[tuple[int, str]] | None List of (position, header) to insert. Defaults to None.

  • row_function : Callable[[list[str], dict], dict] | None Function to call for each row. Defaults to None.

  • columns : Sequence[int] | Sequence[str] | None Columns to write. Defaults to all.

  • format : str Format to write. Defaults to csv.

  • encoding : str | None Encoding to use. Defaults to None (infer encoding).

  • datecol : int | str | None Date column for setting time period. Defaults to None (don't set).

  • yearcol : int | str | None Year column for setting dataset year range. Defaults to None (don't set).

  • date_function : Callable[[dict], dict | None] | None Date function to call for each row. Defaults to None.

  • no_empty : bool Don't generate resource if there are no data rows. Defaults to True.

  • **kwargs : Any Any additional args to pass to downloader.get_tabular_rows

Returns

  • tuple[bool, dict] (True if resource added, dictionary of results)

method Dataset.add_hapi_error(error_message: str, resource_name: str | None = None, resource_id: str | None = None) → bool

Write error messages uncovered while processing data for the HAPI database to a resource's metadata on HDX. If the resource already has an error message, it is only overwritten if the new message is different.

Parameters

  • error_message : str Error(s) uncovered

  • resource_name : str | None Resource name. Defaults to None.

  • resource_id : str | None Resource id. Defaults to None.

Returns

  • bool True if a message was added, False if not
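A minimal sketch of the overwrite-only-if-different rule, applied to local resource dictionaries rather than to HDX. The `hapi_error` key is hypothetical, and returning False when no resource matches is an assumption consistent with the documented return value.

```python
from __future__ import annotations

ERROR_KEY = "hapi_error"  # hypothetical metadata field name


def add_hapi_error(
    resources: list[dict],
    error_message: str,
    resource_name: str | None = None,
    resource_id: str | None = None,
) -> bool:
    """Store error_message on the matching resource unless it is unchanged."""
    for resource in resources:
        matches_id = resource_id is not None and resource.get("id") == resource_id
        matches_name = resource_name is not None and resource.get("name") == resource_name
        if not (matches_id or matches_name):
            continue
        if resource.get(ERROR_KEY) == error_message:
            return False  # same message already recorded: nothing to write
        resource[ERROR_KEY] = error_message
        return True
    return False  # no resource matched
```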