hdx.data.dataset

module hdx.data.dataset

Dataset class containing all logic for creating, checking, and updating datasets and associated resources.

Classes

NotRequestableError
Dataset — Dataset class enabling operations on datasets and associated resources.

class NotRequestableError()

Bases : HDXError

class Dataset(initial_data: dict | None = None, configuration: Configuration | None = None)

Bases : HDXObject

Dataset class enabling operations on datasets and associated resources.

Parameters

initial_data : dict | None — Initial dataset metadata dictionary. Defaults to None.
configuration : Configuration | None — HDX configuration. Defaults to global configuration.

Methods

actions — Dictionary of actions that can be performed on object
separate_resources — Move contents of resources key in internal dictionary into self.resources
unseparate_resources — Move self.resources into resources key in internal dictionary
get_dataset_dict — Return the dataset dictionary with self.resources included under the resources key
save_to_json — Save dataset to JSON. If follow_urls is True, resource urls that point to datasets are followed to retrieve final urls.
load_from_json — Load dataset from JSON
init_resources — Initialise self.resources list
add_update_resource — Add new or update existing resource in dataset with new metadata
add_update_resources — Add new resources to the dataset or update existing ones with new metadata
delete_resource — Delete a resource from the dataset and also from HDX by default
get_resources — Get dataset's resources
get_resource — Get one resource from dataset by index
number_of_resources — Get number of dataset's resources
move_resource — Move resource in dataset to be before the resource whose name starts with the value of insert_before.
update_from_yaml — Update dataset metadata with static metadata from YAML file
update_from_json — Update dataset metadata with static metadata from JSON file
read_from_hdx — Reads the dataset given by identifier from HDX and returns Dataset object
reorder_resources — Reorder resources in dataset according to provided list. Resources are updated in the dataset object to match new order. However, the dataset is not refreshed by rereading from HDX. If only some resource ids are supplied then these are assumed to be first and the other resources will stay in their original order.
check_resources_url_filetoupload — Check for the error where both a url and a file to upload are provided for a resource
check_resources_fields — Check that metadata for resources is complete. If required, set ignore_fields to any fields that should be ignored for the particular operation.
check_required_fields — Check that metadata for dataset is complete. If required, set ignore_fields to any fields that should be ignored for the particular operation.
revise — Revise a dataset in HDX
update_in_hdx — Check if dataset exists in HDX and if so, update it. match_resources_by_metadata matches resources by id if available, otherwise by name alone if names are unique, or by name and format if not.
create_in_hdx — Check if dataset exists in HDX and if so, update it, otherwise create it. match_resources_by_metadata matches resources by id if available, otherwise by name alone if names are unique, or by name and format if not.
delete_from_hdx — Deletes a dataset from HDX.
search_in_hdx — Searches for datasets in HDX
get_all_dataset_names — Get all dataset names in HDX
get_all_datasets — Get all datasets from HDX (just calls search_in_hdx)
get_all_resources — Get all resources from a list of datasets (such as returned by search)
autocomplete — Autocomplete a dataset name and return matches
get_time_period — Get dataset date as datetimes and strings in specified format. If no format is supplied, the ISO 8601 format is used. Returns a dictionary containing keys startdate (start date as datetime), enddate (end date as datetime), startdate_str (start date as string), enddate_str (end date as string) and ongoing (whether the end date rolls forward every day).
set_time_period — Set time period from either datetime objects or strings. Any time and time zone information will be ignored by default (meaning that the time of the start date is set to 00:00:00, the time of any end date is set to 23:59:59 and the time zone is set to UTC). To have the time and time zone accounted for, set ignore_timeinfo to False. In this case, the time will be converted to UTC.
set_time_period_year_range — Set time period as a range from year or start and end year.
list_valid_update_frequencies — List of valid update frequency values
transform_update_frequency — Get numeric update frequency (as a string, since that is the required field format) from textual representation or vice versa (e.g. 'Every month' = '30', '30' or 30 = 'Every month')
get_expected_update_frequency — Get expected update frequency (in textual rather than numeric form)
set_expected_update_frequency — Set expected update frequency. You can pass frequencies like "Every week" or '7' or 7. Valid values for update frequency can be found from Dataset.list_valid_update_frequencies().
get_tags — Return the dataset's list of tags
add_tag — Add a tag
add_tags — Add a list of tags
clean_tags — Clean tags in an HDX object according to tags cleanup spreadsheet, deleting invalid tags that cannot be mapped
remove_tag — Remove a tag
is_subnational — Return if the dataset is subnational
set_subnational — Set if dataset is subnational or national
get_location_iso3s — Return the ISO 3 codes of the dataset's locations
get_location_names — Return the names of the dataset's locations
add_country_location — Add a country. If an ISO 3 code is not provided, the value is parsed and, if it is a valid country name, converted to an ISO 3 code. If the country is already added, it is ignored.
add_country_locations — Add a list of countries. If ISO 3 codes are not provided, the values are parsed and, where they are valid country names, converted to ISO 3 codes. If any country is already added, it is ignored.
add_region_location — Add all countries in a region. If a 3-digit UNStats M49 region code is not provided, the value is parsed as a region name. If any country is already added, it is ignored.
add_other_location — Add a location which is not a country or region. Value is parsed and compared to existing locations in HDX. If the location is already added, it is ignored.
remove_location — Remove a location. If the location is not present, it is ignored.
get_maintainer — Get the dataset's maintainer.
set_maintainer — Set the dataset's maintainer.
get_organization — Get the dataset's organization.
set_organization — Set the dataset's organization.
get_showcases — Get any showcases the dataset is in
add_showcase — Add dataset to showcase
add_showcases — Add dataset to multiple showcases
remove_showcase — Remove dataset from showcase
is_requestable — Return whether the dataset is requestable or not
set_requestable — Set the dataset to be of type requestable or not
get_fieldnames — Return list of fieldnames in your data. Only applicable to requestable datasets.
add_fieldname — Add a fieldname to list of fieldnames in your data. Only applicable to requestable datasets.
add_fieldnames — Add a list of fieldnames to list of fieldnames in your data. Only applicable to requestable datasets.
remove_fieldname — Remove a fieldname. Only applicable to requestable datasets.
get_filetypes — Return list of filetypes in your data
add_filetype — Add a filetype to list of filetypes in your data. Only applicable to requestable datasets.
add_filetypes — Add a list of filetypes to list of filetypes in your data. Only applicable to requestable datasets.
remove_filetype — Remove a filetype
set_custom_viz — Set custom visualization url for dataset
get_custom_viz — Get custom visualization url for dataset
preview_off — Set dataset preview off
preview_resource — Set dataset preview on for an unspecified resource
set_preview_resource — Set the resource that will be used for displaying previews in dataset preview
create_default_views — Create default resource views for all resources in dataset
get_name_or_id — Get dataset name or id eg. for use in urls. If prefer_name is True, name is preferred over id if available, otherwise id is preferred over name if available.
get_hdx_url — Get the url of the dataset on HDX or None if the dataset name and id fields are missing. If prefer_name is True, name is preferred over id if available, otherwise id is preferred over name if available.
get_api_url — Get the API url of the dataset on HDX
generate_resource — Write rows to file and create resource, adding it to the dataset. The headers argument is either a row number (rows start counting at 1), or the actual headers defined as a list of strings. If not set, all rows will be treated as containing values. Specific columns to include can be specified (ie. a subset of the headers).
download_generate_resource — Download url, write rows to csv and create resource, adding it to the dataset. The returned dictionary will contain the resource in the key resource, headers in the key headers and list of rows in the key rows.
add_hapi_error — Writes error messages that were uncovered while processing data for the HAPI database to a resource's metadata on HDX. If the resource already has an error message, it is only overwritten if the two messages are different.

staticmethod Dataset.actions() → dict[str, str]

Dictionary of actions that can be performed on object

Returns

dict[str, str] — Dictionary of actions that can be performed on object

method Dataset.separate_resources() → None

Move contents of resources key in internal dictionary into self.resources

Returns

None — None

method Dataset.unseparate_resources() → None

Move self.resources into resources key in internal dictionary

Returns

None — None

method Dataset.get_dataset_dict() → dict

Return the dataset dictionary with self.resources included under the resources key

Returns

dict — Dataset dictionary
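The three methods above shuffle resources between the internal dictionary and self.resources. The bookkeeping can be sketched as follows, using a hypothetical DatasetSketch class in which plain dicts stand in for Resource objects; it illustrates the idea and is not the library's implementation (in particular, returning a deep copy from get_dataset_dict is a choice made for this sketch).

```python
from copy import deepcopy


class DatasetSketch:
    """Hypothetical stand-in for Dataset: plain dicts replace Resource objects."""

    def __init__(self, data: dict) -> None:
        self.data = dict(data)
        self.resources: list[dict] = []
        self.separate_resources()

    def separate_resources(self) -> None:
        # Move the "resources" key out of the internal dict into self.resources
        self.resources = self.data.pop("resources", [])

    def unseparate_resources(self) -> None:
        # Put self.resources back under the "resources" key
        if self.resources:
            self.data["resources"] = self.resources

    def get_dataset_dict(self) -> dict:
        # Sketch choice: return a copy with resources included, leaving self.data as is
        package = deepcopy(self.data)
        if self.resources:
            package["resources"] = deepcopy(self.resources)
        return package


ds = DatasetSketch({"name": "example", "resources": [{"name": "data.csv"}]})
print("resources" in ds.data)              # False
print(ds.get_dataset_dict()["resources"])  # [{'name': 'data.csv'}]
```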

method Dataset.save_to_json(path: Path | str, follow_urls: bool = False, session: Session | None = None) → None

Save dataset to JSON. If follow_urls is True, resource urls that point to datasets are followed to retrieve final urls.

Parameters

path : Path | str — Path to save dataset
follow_urls : bool — Whether to follow urls. Defaults to False.
session : Session | None — Session to use when following urls. Defaults to None.

Returns

None — None

staticmethod Dataset.load_from_json(path: Path | str) → Optional['Dataset']

Load dataset from JSON

Parameters

path : Path | str — Path to load dataset

Returns

Optional['Dataset'] — Dataset created from JSON or None

method Dataset.init_resources() → None

Initialise self.resources list

Returns

None — None

method Dataset.add_update_resource(resource: Union['Resource', dict, str], ignore_datasetid: bool = False) → Resource

Add new or update existing resource in dataset with new metadata

Parameters

resource : Union['Resource', dict, str] — Either resource id or resource metadata from a Resource object or a dictionary
ignore_datasetid : bool — Whether to ignore dataset id in the resource. Defaults to False.

Returns

Resource — The resource that was added after matching with any existing resource

Raises

HDXError

method Dataset.add_update_resources(resources: Sequence[Union['Resource', dict, str]], ignore_datasetid: bool = False) → None

Add new resources to the dataset or update existing ones with new metadata

Parameters

resources : Sequence[Union['Resource', dict, str]] — A list of either resource ids or resources metadata from either Resource objects or dictionaries
ignore_datasetid : bool — Whether to ignore dataset id in the resource. Defaults to False.

Returns

None — None

Raises

HDXError

method Dataset.delete_resource(resource: Union['Resource', dict, str], delete: bool = True) → bool

Delete a resource from the dataset and also from HDX by default

Parameters

resource : Union['Resource', dict, str] — Either resource id or resource metadata from a Resource object or a dictionary
delete : bool — Whether to delete the resource from HDX (not just the dataset). Defaults to True.

Returns

bool — True if resource removed or False if not

Raises

HDXError

method Dataset.get_resources() → list['Resource']

Get dataset's resources

Returns

list['Resource'] — List of Resource objects

method Dataset.get_resource(index: int = 0) → Resource

Get one resource from dataset by index

Parameters

index : int — Index of resource in dataset. Defaults to 0.

Returns

Resource — Resource object

method Dataset.number_of_resources() → int

Get number of dataset's resources

Returns

int — Number of Resource objects

method Dataset.move_resource(resource_name: str, insert_before: str) → Resource

Move resource in dataset to be before the resource whose name starts with the value of insert_before.

Parameters

resource_name : str — Name of resource to move
insert_before : str — Start of the name of the resource to insert before

Returns

Resource — The resource that was moved
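The prefix-matching rule of move_resource can be illustrated on a plain list of resource names. The helper move_before below is hypothetical, and what it does when no name matches insert_before (append at the end) is an assumption of this sketch, not documented behaviour.

```python
def move_before(names: list[str], resource_name: str, insert_before: str) -> list[str]:
    """Return a new list with resource_name moved ahead of the first name
    starting with insert_before (hypothetical helper, not the library's code)."""
    if resource_name not in names:
        raise ValueError(f"No resource named {resource_name}")
    # Take the resource out first so it cannot match its own prefix
    remaining = [name for name in names if name != resource_name]
    for index, name in enumerate(remaining):
        if name.startswith(insert_before):
            return remaining[:index] + [resource_name] + remaining[index:]
    # Assumption: if no name starts with insert_before, append at the end
    return remaining + [resource_name]


print(move_before(["a.csv", "b.csv", "c.xlsx"], "c.xlsx", "b"))  # ['a.csv', 'c.xlsx', 'b.csv']
```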

method Dataset.update_from_yaml(path: Path | str = Path('config', 'hdx_dataset_static.yaml')) → None

Update dataset metadata with static metadata from YAML file

Parameters

path : Path | str — Path to YAML dataset metadata. Defaults to config/hdx_dataset_static.yaml.

Returns

None — None

method Dataset.update_from_json(path: Path | str = Path('config', 'hdx_dataset_static.json')) → None

Update dataset metadata with static metadata from JSON file

Parameters

path : Path | str — Path to JSON dataset metadata. Defaults to config/hdx_dataset_static.json.

Returns

None — None

staticmethod Dataset.read_from_hdx(identifier: str, configuration: Configuration | None = None) → Optional['Dataset']

Reads the dataset given by identifier from HDX and returns Dataset object

Parameters

identifier : str — Identifier of dataset
configuration : Configuration | None — HDX configuration. Defaults to global configuration.

Returns

Optional['Dataset'] — Dataset object if successfully read, None if not

method Dataset.reorder_resources(resource_ids: Sequence[str]) → None

Reorder resources in dataset according to provided list. Resources are updated in the dataset object to match new order. However, the dataset is not refreshed by rereading from HDX. If only some resource ids are supplied then these are assumed to be first and the other resources will stay in their original order.

Parameters

resource_ids : Sequence[str] — List of resource ids

Returns

None — None

Raises

HDXError
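The ordering rule of reorder_resources (supplied ids first, remaining resources in their original order) can be sketched on plain dicts. The helper reorder_by_ids is hypothetical; raising ValueError for an unknown id is an assumption standing in for the HDXError the method may raise.

```python
def reorder_by_ids(resources: list[dict], resource_ids: list[str]) -> list[dict]:
    """Put the given ids first, keeping the other resources in original order."""
    by_id = {resource["id"]: resource for resource in resources}
    unknown = [rid for rid in resource_ids if rid not in by_id]
    if unknown:
        # Assumption: the sketch uses ValueError where the library raises HDXError
        raise ValueError(f"Unknown resource ids: {unknown}")
    first = [by_id[rid] for rid in resource_ids]
    chosen = set(resource_ids)
    rest = [resource for resource in resources if resource["id"] not in chosen]
    return first + rest


resources = [{"id": "r1"}, {"id": "r2"}, {"id": "r3"}]
print([r["id"] for r in reorder_by_ids(resources, ["r3"])])  # ['r3', 'r1', 'r2']
```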

method Dataset.check_resources_url_filetoupload() → None

Check for the error where both a url and a file to upload are provided for a resource

Returns

None — None

method Dataset.check_resources_fields(ignore_fields: Sequence[str] = ()) → None

Check that metadata for resources is complete. If required, set ignore_fields to any fields that should be ignored for the particular operation.

Parameters

ignore_fields : Sequence[str] — Fields to ignore. Default is ().

Returns

None — None

method Dataset.check_required_fields(ignore_fields: Sequence[str] = (), allow_no_resources: bool = False, **kwargs: Any) → None

Check that metadata for dataset is complete. If required, set ignore_fields to any fields that should be ignored for the particular operation.

Parameters

ignore_fields : Sequence[str] — Fields to ignore. Default is ().
allow_no_resources : bool — Whether to allow no resources. Defaults to False.

Returns

None — None

Raises

HDXError
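How ignore_fields and allow_no_resources interact can be sketched as a simple presence check. The REQUIRED_FIELDS tuple below is hypothetical (the real list is assumed to come from the HDX configuration), and the sketch raises ValueError where the library raises HDXError.

```python
from collections.abc import Sequence

# Hypothetical required fields; the real list is assumed to come from configuration
REQUIRED_FIELDS = ("name", "title", "owner_org")


def missing_fields(metadata: dict, ignore_fields: Sequence[str] = (),
                   allow_no_resources: bool = False) -> list[str]:
    """List required fields absent from metadata, skipping ignored ones."""
    missing = [field for field in REQUIRED_FIELDS
               if field not in ignore_fields and field not in metadata]
    if not allow_no_resources and not metadata.get("resources"):
        missing.append("resources")
    return missing


def check_required(metadata: dict, ignore_fields: Sequence[str] = (),
                   allow_no_resources: bool = False) -> None:
    missing = missing_fields(metadata, ignore_fields, allow_no_resources)
    if missing:
        # The sketch uses ValueError in place of HDXError
        raise ValueError(f"Missing required fields: {', '.join(missing)}")
```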

staticmethod Dataset.revise(match: dict[str, Any], filter: Sequence[str] = (), update: dict[str, Any] = {}, files_to_upload: dict[str, str] = {}, configuration: Configuration | None = None, **kwargs: Any) → Dataset

Revise a dataset in HDX

Parameters

match : dict[str, Any] — Metadata on which to match dataset
filter : Sequence[str] — Filters to apply. Defaults to tuple().
update : dict[str, Any] — Metadata updates to apply. Defaults to {}.
files_to_upload : dict[str, str] — Files to upload to HDX. Defaults to {}.
configuration : Configuration | None — HDX configuration. Defaults to global configuration.
**kwargs : Any — Additional arguments to pass to package_revise

Returns

Dataset — Dataset object

method Dataset.update_in_hdx(allow_no_resources: bool = False, update_resources: bool = True, match_resources_by_metadata: bool = True, keys_to_delete: Sequence[str] = (), remove_additional_resources: bool = False, match_resource_order: bool = False, create_default_views: bool = True, **kwargs: Any) → dict

Check if dataset exists in HDX and if so, update it. match_resources_by_metadata matches resources by id if available, otherwise by name alone if names are unique, or by name and format if not.

Returns a dictionary mapping each resource name to one of the following status codes:

0 — no file to upload and last_modified set to now (resource creation or data_updated flag is True)
1 — no file to upload and data_updated flag is False
2 — file uploaded to filestore (resource creation or either hash or size of file has changed)
3 — file not uploaded to filestore (hash and size of file are the same)
4 — file not uploaded (hash and size unchanged), given last_modified ignored

Parameters

allow_no_resources : bool — Whether to allow no resources. Defaults to False.
update_resources : bool — Whether to update resources. Defaults to True.
match_resources_by_metadata : bool — Compare resource metadata rather than position in list. Defaults to True.
keys_to_delete : Sequence[str] — List of top level metadata keys to delete. Defaults to tuple().
remove_additional_resources : bool — Remove additional resources found in dataset. Defaults to False.
match_resource_order : bool — Match order of given resources by name. Defaults to False.
create_default_views : bool — Whether to call package_create_default_resource_views. Defaults to True.
**kwargs : Any — See below
keep_crisis_tags : bool — Whether to keep existing crisis tags. Defaults to True.
updated_by_script : str — String to identify your script. Defaults to your user agent.
batch : str — A string you can specify to show which datasets are part of a single batch update
force_update : bool — Forces files to be updated even if they haven't changed

Returns

dict — Status codes of resources

Raises

HDXError
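The dictionary returned by update_in_hdx maps resource names to the status codes listed above. A small sketch of post-processing it; the helper names and the example statuses are hypothetical, and only the code meanings come from this documentation.

```python
# Status code meanings as documented for update_in_hdx and create_in_hdx
STATUS_DESCRIPTIONS = {
    0: "no file to upload, last_modified set to now",
    1: "no file to upload, data_updated flag is False",
    2: "file uploaded to filestore",
    3: "file not uploaded to filestore (hash and size unchanged)",
    4: "file not uploaded (hash and size unchanged), given last_modified ignored",
}


def uploaded_resources(statuses: dict[str, int]) -> list[str]:
    """Names of resources whose file was uploaded (status code 2)."""
    return sorted(name for name, code in statuses.items() if code == 2)


def describe(statuses: dict[str, int]) -> list[str]:
    """One human-readable line per resource."""
    return [f"{name}: {STATUS_DESCRIPTIONS.get(code, 'unknown status')}"
            for name, code in statuses.items()]


# Hypothetical result of dataset.update_in_hdx()
statuses = {"data.csv": 2, "readme.txt": 3}
print(uploaded_resources(statuses))  # ['data.csv']
```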

method Dataset.create_in_hdx(allow_no_resources: bool = False, update_resources: bool = True, match_resources_by_metadata: bool = True, keys_to_delete: Sequence[str] = (), remove_additional_resources: bool = False, match_resource_order: bool = False, create_default_views: bool = True, **kwargs: Any) → dict

Check if dataset exists in HDX and if so, update it, otherwise create it. match_resources_by_metadata matches resources by id if available, otherwise by name alone if names are unique, or by name and format if not.

Returns a dictionary mapping each resource name to one of the following status codes:

0 — no file to upload and last_modified set to now (resource creation or data_updated flag is True)
1 — no file to upload and data_updated flag is False
2 — file uploaded to filestore (resource creation or either hash or size of file has changed)
3 — file not uploaded to filestore (hash and size of file are the same)
4 — file not uploaded (hash and size unchanged), given last_modified ignored

Parameters

allow_no_resources : bool — Whether to allow no resources. Defaults to False.
update_resources : bool — Whether to update resources (if updating). Defaults to True.
match_resources_by_metadata : bool — Compare resource metadata rather than position in list. Defaults to True.
keys_to_delete : Sequence[str] — List of top level metadata keys to delete. Defaults to tuple().
remove_additional_resources : bool — Remove additional resources found in dataset (if updating). Defaults to False.
match_resource_order : bool — Match order of given resources by name. Defaults to False.
create_default_views : bool — Whether to call package_create_default_resource_views (if updating). Defaults to True.
**kwargs : Any — See below
keep_crisis_tags : bool — Whether to keep existing crisis tags. Defaults to True.
updated_by_script : str — String to identify your script. Defaults to your user agent.
batch : str — A string you can specify to show which datasets are part of a single batch update
force_update : bool — Forces files to be updated even if they haven't changed

Returns

dict — Status codes of resources
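The match_resources_by_metadata rule (id first, then name if unique, otherwise name plus format) can be sketched as a lookup over existing resource dicts. The function match_resource is hypothetical; treating name uniqueness as uniqueness among the existing resources, and falling back to names when an id finds no match, are assumptions of this sketch.

```python
from typing import Optional


def match_resource(new: dict, existing: list[dict]) -> Optional[int]:
    """Return the index of the existing resource that new matches, or None."""
    # 1. Match by id when the new resource has one
    if new.get("id"):
        for index, resource in enumerate(existing):
            if resource.get("id") == new["id"]:
                return index
    # 2. Match by name alone if that name is unique among existing resources,
    # 3. otherwise require the format to match as well
    names = [resource.get("name") for resource in existing]
    name_is_unique = names.count(new.get("name")) == 1
    for index, resource in enumerate(existing):
        if resource.get("name") != new.get("name"):
            continue
        if name_is_unique or resource.get("format") == new.get("format"):
            return index
    return None


existing = [
    {"id": "1", "name": "data", "format": "csv"},
    {"id": "2", "name": "data", "format": "xlsx"},
    {"id": "3", "name": "readme", "format": "txt"},
]
print(match_resource({"name": "data", "format": "xlsx"}, existing))  # 1
```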

method Dataset.delete_from_hdx() → None

Deletes a dataset from HDX.

Returns

None — None

classmethod Dataset.search_in_hdx(query: str | None = '*:*', configuration: Configuration | None = None, page_size: int = 1000, **kwargs: Any) → list['Dataset']

Searches for datasets in HDX

Parameters

query : str | None — Query (in Solr format). Defaults to '*:*' (match all).
configuration : Configuration | None — HDX configuration. Defaults to global configuration.
page_size : int — Size of page to use internally to query HDX. Defaults to 1000.
**kwargs : Any — See below
fq : str — Any filter queries to apply
rows : int — Number of matching rows to return. Defaults to all datasets (sys.maxsize).
start : int — Offset in the complete result for where the set of returned datasets should begin
sort : str — Sorting of results. Defaults to 'relevance asc, metadata_modified desc' if rows<=page_size or 'metadata_modified asc' if rows>page_size.
facet : str — Whether to enable faceted results. Defaults to True.
facet.mincount : int — Minimum count a facet value must have to be included in the results
facet.limit : int — Maximum number of values the facet fields return (a negative value means unlimited). Defaults to 50.
facet.field : list[str] — Fields to facet upon. Default is empty.
use_default_schema : bool — Use default package schema instead of custom schema. Defaults to False.

Returns

list['Dataset'] — list of datasets resulting from query

Raises

HDXError
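search_in_hdx queries HDX in pages of page_size rows. The page arithmetic and the documented sort default can be sketched as below; plan_pages and its total argument (the number of matching datasets) are hypothetical, and how the library actually splits its requests is an assumption. A fixed 'metadata_modified asc' ordering for multi-page queries presumably keeps results stable across pages.

```python
import sys


def plan_pages(total: int, rows: int = sys.maxsize, start: int = 0,
               page_size: int = 1000) -> list[tuple[int, int]]:
    """Return (offset, count) pairs covering up to rows results from start."""
    end = min(total, start + rows)
    pages = []
    offset = start
    while offset < end:
        count = min(page_size, end - offset)
        pages.append((offset, count))
        offset += count
    return pages


def default_sort(rows: int, page_size: int) -> str:
    # Defaults as documented for search_in_hdx
    if rows <= page_size:
        return "relevance asc, metadata_modified desc"
    return "metadata_modified asc"


print(plan_pages(2500))  # [(0, 1000), (1000, 1000), (2000, 500)]
```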

staticmethod Dataset.get_all_dataset_names(configuration: Configuration | None = None, **kwargs: Any) → list[str]

Get all dataset names in HDX

Parameters

configuration : Configuration | None — HDX configuration. Defaults to global configuration.
**kwargs : Any — See below
rows : int — Number of rows to return. Defaults to all datasets (sys.maxsize)
start : int — Offset in the complete result for where the set of returned dataset names should begin

Returns

list[str] — list of all dataset names in HDX

classmethod Dataset.get_all_datasets(configuration: Configuration | None = None, page_size: int = 1000, **kwargs: Any) → list['Dataset']

Get all datasets from HDX (just calls search_in_hdx)

Parameters

configuration : Configuration | None — HDX configuration. Defaults to global configuration.
page_size : int — Size of page to use internally to query HDX. Defaults to 1000.
**kwargs : Any — See below
fq : str — Any filter queries to apply
rows : int — Number of matching rows to return. Defaults to all datasets (sys.maxsize).
start : int — Offset in the complete result for where the set of returned datasets should begin
sort : str — Sorting of results. Defaults to 'metadata_modified asc'.
facet : str — Whether to enable faceted results. Defaults to True.
facet.mincount : int — Minimum count a facet value must have to be included in the results
facet.limit : int — Maximum number of values the facet fields return (a negative value means unlimited). Defaults to 50.
facet.field : list[str] — Fields to facet upon. Default is empty.
use_default_schema : bool — Use default package schema instead of custom schema. Defaults to False.

Returns

list['Dataset'] — list of datasets resulting from query

staticmethod Dataset.get_all_resources(datasets: Sequence['Dataset']) → list['Resource']

Get all resources from a list of datasets (such as returned by search)

Parameters

datasets : Sequence['Dataset'] — list of datasets

Returns

list['Resource'] — list of resources within those datasets

classmethod Dataset.autocomplete(name: str, limit: int = 20, configuration: Configuration | None = None) → list

Autocomplete a dataset name and return matches

Parameters

name : str — Name to autocomplete
limit : int — Maximum number of matches to return. Defaults to 20.
configuration : Configuration | None — HDX configuration. Defaults to global configuration.

Returns

list — Autocomplete matches

method Dataset.get_time_period(date_format: str | None = None, today: datetime = now_utc()) → dict

Get dataset date as datetimes and strings in specified format. If no format is supplied, the ISO 8601 format is used. Returns a dictionary containing keys startdate (start date as datetime), enddate (end date as datetime), startdate_str (start date as string), enddate_str (end date as string) and ongoing (whether the end date rolls forward every day).

Parameters

date_format : str | None — Date format. None is taken to be ISO 8601. Defaults to None.
today : datetime — Date to use for today. Defaults to now_utc().

Returns

dict — Dictionary of date information

method Dataset.set_time_period(startdate: datetime | str, enddate: datetime | str | None = None, ongoing: bool = False, ignore_timeinfo: bool = True) → None

Set time period from either datetime objects or strings. Any time and time zone information will be ignored by default (meaning that the time of the start date is set to 00:00:00, the time of any end date is set to 23:59:59 and the time zone is set to UTC). To have the time and time zone accounted for, set ignore_timeinfo to False. In this case, the time will be converted to UTC.

Parameters

startdate : datetime | str — Dataset start date
enddate : datetime | str | None — Dataset end date. Defaults to None.
ongoing : bool — True if ongoing, False if not. Defaults to False.
ignore_timeinfo : bool — Ignore time and time zone of date. Defaults to True.

Returns

None — None
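The default normalisation described for set_time_period (start at 00:00:00, end at 23:59:59, UTC) and the ignore_timeinfo=False conversion can be sketched with the standard library. normalise_period and to_utc are hypothetical helpers that accept datetimes only; treating naive datetimes as UTC when ignore_timeinfo is False is an assumption.

```python
from datetime import datetime, time, timezone
from typing import Optional


def to_utc(value: datetime) -> datetime:
    # Assumption: naive datetimes are treated as already being in UTC
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


def normalise_period(startdate: datetime, enddate: Optional[datetime] = None,
                     ignore_timeinfo: bool = True) -> tuple[datetime, Optional[datetime]]:
    if ignore_timeinfo:
        # Keep only the calendar date; pin times to the start and end of day in UTC
        start = datetime.combine(startdate.date(), time(0, 0, 0), tzinfo=timezone.utc)
        end = (None if enddate is None else
               datetime.combine(enddate.date(), time(23, 59, 59), tzinfo=timezone.utc))
        return start, end
    # Otherwise keep the time and convert it to UTC
    return to_utc(startdate), (None if enddate is None else to_utc(enddate))


start, end = normalise_period(datetime(2024, 3, 1, 15, 30), datetime(2024, 3, 31, 8, 0))
print(start.isoformat(), end.isoformat())  # 2024-03-01T00:00:00+00:00 2024-03-31T23:59:59+00:00
```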

method Dataset.set_time_period_year_range(dataset_year: str | int | Iterable, dataset_end_year: str | int | None = None) → list[int]

Set time period as a range from year or start and end year.

Parameters

dataset_year : str | int | Iterable — Dataset year given as string or int or range in an iterable
dataset_end_year : str | int | None — Dataset end year given as string or int

Returns

list[int] — The start and end year if supplied or sorted list of years
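The year handling of set_time_period_year_range can be sketched as follows. year_range is hypothetical, and mapping the years to 1 January of the first year through 31 December of the last is an assumption about the resulting time period.

```python
from collections.abc import Iterable
from datetime import datetime, timezone
from typing import Optional, Union


def year_range(dataset_year: Union[str, int, Iterable],
               dataset_end_year: Optional[Union[str, int]] = None
               ) -> tuple[list[int], datetime, datetime]:
    if isinstance(dataset_year, (str, int)):
        # Single start year, optionally with an end year
        years = [int(dataset_year)]
        if dataset_end_year is not None:
            years.append(int(dataset_end_year))
    else:
        # Iterable of years: return them sorted
        years = sorted(int(year) for year in dataset_year)
    # Assumption: the period runs from 1 Jan of the first year to 31 Dec of the last
    start = datetime(min(years), 1, 1, tzinfo=timezone.utc)
    end = datetime(max(years), 12, 31, 23, 59, 59, tzinfo=timezone.utc)
    return years, start, end


print(year_range([2023, 2019, 2021])[0])  # [2019, 2021, 2023]
```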

classmethod Dataset.list_valid_update_frequencies() → list[str]

List of valid update frequency values

Returns

list[str] — Allowed update frequencies

classmethod Dataset.transform_update_frequency(frequency: str | int) → str | None

Get numeric update frequency (as a string, since that is the required field format) from textual representation or vice versa (e.g. 'Every month' = '30', '30' or 30 = 'Every month')

Parameters

frequency : str | int — Update frequency in one format

Returns

str | None — Update frequency in alternative format or None if not valid
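The two-way conversion of transform_update_frequency can be sketched with a pair of dictionaries. Only 'Every week' = '7' and 'Every month' = '30' come from this documentation; the other pairs and the transform_frequency helper are illustrative assumptions, and the full list should be taken from Dataset.list_valid_update_frequencies().

```python
from typing import Optional, Union

# Illustrative subset: 'Every day' and 'Every year' values are assumptions
FREQUENCIES = {
    "Every day": "1",
    "Every week": "7",
    "Every month": "30",
    "Every year": "365",
}
NUMERIC_TO_TEXT = {number: text for text, number in FREQUENCIES.items()}


def transform_frequency(frequency: Union[str, int]) -> Optional[str]:
    """Convert between textual and numeric update frequencies, or return None."""
    key = str(frequency)
    if key in FREQUENCIES:
        return FREQUENCIES[key]
    return NUMERIC_TO_TEXT.get(key)


print(transform_frequency("Every month"), transform_frequency(30))  # 30 Every month
```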

method Dataset.get_expected_update_frequency() → str | None

Get expected update frequency (in textual rather than numeric form)

Returns

str | None — Update frequency in textual form or None if the update frequency doesn't exist or is blank.

method Dataset.set_expected_update_frequency(update_frequency: str | int) → None

Set expected update frequency. You can pass frequencies like "Every week" or '7' or 7. Valid values for update frequency can be found from Dataset.list_valid_update_frequencies().

Parameters

update_frequency : str | int — Update frequency

Returns

None — None

Raises

HDXError

method Dataset.get_tags() → list[str]

Return the dataset's list of tags

Returns

list[str] — list of tags or [] if there are none

method Dataset.add_tag(tag: str, log_deleted: bool = True) → tuple[list[str], list[str]]

Add a tag

Parameters

tag : str — Tag to add
log_deleted : bool — Whether to log informational messages about deleted tags. Defaults to True.

Returns

tuple[list[str], list[str]] — Tuple containing list of added tags and list of deleted tags and tags not added

method Dataset.add_tags(tags: Sequence[str], log_deleted: bool = True) → tuple[list[str], list[str]]

Add a list of tags

Parameters

tags : Sequence[str] — List of tags to add
log_deleted : bool — Whether to log informational messages about deleted tags. Defaults to True.

Returns

tuple[list[str], list[str]] — Tuple containing list of added tags and list of deleted tags and tags not added
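The shape of the tuple returned by add_tags (added tags, then deleted or not-added tags) can be illustrated with a toy cleanup table. CLEANUP and add_cleaned_tags are hypothetical: the real method uses the HDX tags cleanup spreadsheet, and lowercasing input and passing unknown tags through unchanged are assumptions of this sketch.

```python
from collections.abc import Sequence
from typing import Optional

# Hypothetical cleanup table: raw tag -> cleaned tag, or None to delete it
CLEANUP: dict[str, Optional[str]] = {
    "refugee": "refugees",
    "misc": None,
}


def add_cleaned_tags(current: list[str], tags: Sequence[str]) -> tuple[list[str], list[str]]:
    """Return (added tags, deleted or not-added tags), mutating current."""
    added: list[str] = []
    rejected: list[str] = []
    for tag in tags:
        # Unknown tags pass through unchanged (sketch assumption)
        cleaned = CLEANUP.get(tag.lower(), tag.lower())
        if cleaned is None or cleaned in current:
            rejected.append(tag)
            continue
        current.append(cleaned)
        added.append(cleaned)
    return added, rejected


print(add_cleaned_tags(["health"], ["Refugee", "misc", "health"]))  # (['refugees'], ['misc', 'health'])
```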

method Dataset.clean_tags(log_deleted: bool = True) → tuple[list[str], list[str]]

Clean tags in an HDX object according to tags cleanup spreadsheet, deleting invalid tags that cannot be mapped

Parameters

log_deleted : bool — Whether to log informational messages about deleted tags. Defaults to True.

Returns

tuple[list[str], list[str]] — Tuple containing list of mapped tags and list of deleted tags and tags not added

method Dataset.remove_tag(tag: str) → bool

Remove a tag

Parameters

tag : str — Tag to remove

Returns

bool — True if tag removed or False if not

method Dataset.is_subnational() → bool

Return if the dataset is subnational

Returns

bool — True if the dataset is subnational, False if not

method Dataset.set_subnational(subnational: bool) → None

Set if dataset is subnational or national

Parameters

subnational : bool — True for subnational, False for national

Returns

None — None

method Dataset.get_location_iso3s(locations: Sequence[str] | None = None) → list[str]

Return the ISO 3 codes of the dataset's locations

Parameters

locations : Sequence[str] | None — Valid locations list. Defaults to list downloaded from HDX.

Returns

list[str] — list of location iso3s

method Dataset.get_location_names(locations: Sequence[str] | None = None) → list[str]

Return the names of the dataset's locations

Parameters

locations : Sequence[str] | None — Valid locations list. Defaults to list downloaded from HDX.

Returns

list[str] — list of location names

method Dataset.add_country_location(country: str, exact: bool = True, locations: Sequence[str] | None = None, use_live: bool = True) → bool

Add a country. If an ISO 3 code is not provided, the value is parsed and, if it is a valid country name, converted to an ISO 3 code. If the country is already added, it is ignored.

Parameters

country : str — Country to add
exact : bool — True for exact matching or False to allow fuzzy matching. Defaults to True.
locations : Sequence[str] | None — Valid locations list. Defaults to list downloaded from HDX.
use_live : bool — Try to use the latest country data from the web rather than the file in the package. Defaults to True.

Returns

bool — True if country added or False if country already present

Raises

HDXError
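The exact-matching path of add_country_location (accept an ISO 3 code or a country name, return False if already present) can be sketched with a toy lookup table. COUNTRY_NAMES_TO_ISO3 and add_country are hypothetical, storing locations as lowercase ISO 3 codes is an assumption, fuzzy matching is not sketched, and ValueError stands in for HDXError.

```python
# Hypothetical lookup table; the real method relies on HDX country data
COUNTRY_NAMES_TO_ISO3 = {"kenya": "KEN", "somalia": "SOM", "yemen": "YEM"}
KNOWN_ISO3 = set(COUNTRY_NAMES_TO_ISO3.values())


def add_country(locations: list[str], country: str) -> bool:
    """Add a country by ISO 3 code or exact name; False if already present."""
    value = country.strip()
    if value.upper() in KNOWN_ISO3:
        iso3 = value.upper()
    else:
        iso3 = COUNTRY_NAMES_TO_ISO3.get(value.lower())
    if iso3 is None:
        # The sketch uses ValueError in place of HDXError
        raise ValueError(f"Country {country} not recognised")
    code = iso3.lower()  # Assumption: locations stored as lowercase ISO 3 codes
    if code in locations:
        return False
    locations.append(code)
    return True


locations: list[str] = []
print(add_country(locations, "KEN"), add_country(locations, "Kenya"))  # True False
```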

method Dataset.add_country_locations(countries: Sequence[str], locations: Sequence[str] | None = None, use_live: bool = True) → bool

Add a list of countries. If ISO 3 codes are not provided, the values are parsed and, where they are valid country names, converted to ISO 3 codes. If any country is already added, it is ignored.

Parameters

countries : Sequence[str] — List of countries to add
locations : Sequence[str] | None — Valid locations list. Defaults to list downloaded from HDX.
use_live : bool — Try to use the latest country data from the web rather than the file in the package. Defaults to True.

Returns

bool — True if all countries added or False if any already present.

method Dataset.add_region_location(region: str, locations: Sequence[str] | None = None, use_live: bool = True) → bool

Add all countries in a region. If a 3-digit UNStats M49 region code is not provided, the value is parsed as a region name. If any country is already added, it is ignored.

Parameters

region : str — M49 region, intermediate region or subregion to add
locations : Sequence[str] | None — Valid locations list. Defaults to list downloaded from HDX.
use_live : bool — Try to use the latest country data from the web rather than the file in the package. Defaults to True.

Returns

bool — True if all countries in region added or False if any already present.

method Dataset.add_other_location(location: str, exact: bool = True, alterror: str | None = None, locations: Sequence[str] | None = None) → bool

Add a location which is not a country or region. Value is parsed and compared to existing locations in HDX. If the location is already added, it is ignored.

Parameters

location : str — Location to add
exact : bool — True for exact matching or False to allow fuzzy matching. Defaults to True.
alterror : str | None — Alternative error message to builtin if location not found. Defaults to None.
locations : Sequence[str] | None — Valid locations list. Defaults to list downloaded from HDX.

Returns

bool — True if location added or False if location already present

Raises

HDXError
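The difference between exact and fuzzy matching can be illustrated with the standard library. This sketch swaps in `difflib.get_close_matches` as the fuzzy matcher, which is an assumption: the library's own fuzzy matching may use a different algorithm and threshold. The case-insensitive exact match and the sample location names are also illustrative assumptions.

```python
import difflib
from typing import Optional, Sequence


def match_location(
    location: str, valid: Sequence[str], exact: bool = True
) -> Optional[str]:
    """Return the valid location matching `location`, or None if there is none."""
    by_lower = {v.lower(): v for v in valid}
    key = location.strip().lower()
    if key in by_lower:
        return by_lower[key]  # case-insensitive exact match
    if exact:
        return None
    # Fuzzy fallback via difflib; the library's own matching may differ.
    close = difflib.get_close_matches(key, list(by_lower), n=1, cutoff=0.8)
    return by_lower[close[0]] if close else None
```

With `exact=True` a misspelling such as "Gaza Strp" finds nothing, which is where the documented method would raise `HDXError` (optionally with the `alterror` message).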

method Dataset.remove_location(location: str) → bool

Remove a location. If the location is not present, it is ignored.

Parameters

location : str — Location to remove

Returns

bool — True if location removed or False if not

method Dataset.get_maintainer() → User

Get the dataset's maintainer.

Returns

User — Dataset's maintainer

method Dataset.set_maintainer(maintainer: Union['User', dict, str]) → None

Set the dataset's maintainer.

Parameters

maintainer : Union['User', dict, str] — Either a user id or User metadata from a User object or dictionary.

Returns

None — None

Raises

HDXError

method Dataset.get_organization() → Organization

Get the dataset's organization.

Returns

Organization — Dataset's organization

method Dataset.set_organization(organization: Union['Organization', dict, str]) → None

Set the dataset's organization.

Parameters

organization : Union['Organization', dict, str] — Either an Organization id or Organization metadata from an Organization object or dictionary.

Returns

None — None

Raises

HDXError

method Dataset.get_showcases() → list['Showcase']

Get any showcases the dataset is in

Returns

list['Showcase'] — List of showcases

method Dataset.add_showcase(showcase: Union['Showcase', dict, str], showcases_to_check: Sequence['Showcase'] = None) → bool

Add dataset to showcase

Parameters

showcase : Union['Showcase', dict, str] — Either a showcase id or showcase metadata from a Showcase object or dictionary
showcases_to_check : Sequence['Showcase'] — List of showcases against which to check existence of showcase. Defaults to showcases containing dataset.

Returns

bool — True if the showcase was added, False if already present

method Dataset.add_showcases(showcases: Sequence[Union['Showcase', dict, str]], showcases_to_check: Sequence['Showcase'] = None) → bool

Add dataset to multiple showcases

Parameters

showcases : Sequence[Union['Showcase', dict, str]] — A list of either showcase ids or showcase metadata from Showcase objects or dictionaries
showcases_to_check : Sequence['Showcase'] — List of showcases against which to check existence of showcase. Defaults to showcases containing dataset.

Returns

bool — True if all showcases added or False if any already present

method Dataset.remove_showcase(showcase: Union['Showcase', dict, str]) → None

Remove dataset from showcase

Parameters

showcase : Union['Showcase', dict, str] — Either a showcase id string or showcase metadata from a Showcase object or dictionary

Returns

None — None

method Dataset.is_requestable() → bool

Return whether the dataset is requestable or not

Returns

bool — Whether the dataset is requestable or not

method Dataset.set_requestable(requestable: bool = True) → None

Set the dataset to be of type requestable or not

Parameters

requestable : bool — Set whether dataset is requestable. Defaults to True.

Returns

None — None

method Dataset.get_fieldnames() → list[str]

Return list of fieldnames in your data. Only applicable to requestable datasets.

Returns

list[str] — List of field names

Raises

NotRequestableError

method Dataset.add_fieldname(fieldname: str) → bool

Add a fieldname to list of fieldnames in your data. Only applicable to requestable datasets.

Parameters

fieldname : str — Fieldname to add

Returns

bool — True if fieldname added or False if fieldname already present

Raises

NotRequestableError

method Dataset.add_fieldnames(fieldnames: Sequence[str]) → bool

Add a list of fieldnames to list of fieldnames in your data. Only applicable to requestable datasets.

Parameters

fieldnames : Sequence[str] — List of fieldnames to add

Returns

bool — True if all fieldnames added or False if any already present

Raises

NotRequestableError

method Dataset.remove_fieldname(fieldname: str) → bool

Remove a fieldname. Only applicable to requestable datasets.

Parameters

fieldname : str — Fieldname to remove

Returns

bool — True if fieldname removed or False if not

Raises

NotRequestableError
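The requestable-only guard shared by the fieldname methods (and, per the Raises sections below, by add_filetype, add_filetypes and remove_filetype) can be modelled with a toy class. `FieldnameList` is hypothetical and keeps fieldnames in a plain list; it only mirrors the documented behaviour and return values, not how the library stores them.

```python
class NotRequestableError(Exception):
    """Local stand-in for hdx.data.dataset.NotRequestableError."""


class FieldnameList:
    """Hypothetical toy model of the requestable-only fieldname methods."""

    def __init__(self, requestable: bool = False) -> None:
        self.requestable = requestable
        self.fieldnames: list[str] = []

    def _check_requestable(self) -> None:
        if not self.requestable:
            raise NotRequestableError("Only applicable to requestable datasets")

    def add_fieldname(self, fieldname: str) -> bool:
        self._check_requestable()
        if fieldname in self.fieldnames:
            return False  # already present
        self.fieldnames.append(fieldname)
        return True

    def add_fieldnames(self, fieldnames: list[str]) -> bool:
        self._check_requestable()
        # Build the full list first so every fieldname is attempted.
        results = [self.add_fieldname(name) for name in fieldnames]
        return all(results)

    def remove_fieldname(self, fieldname: str) -> bool:
        self._check_requestable()
        if fieldname not in self.fieldnames:
            return False
        self.fieldnames.remove(fieldname)
        return True
```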

method Dataset.get_filetypes() → list[str]

Return list of filetypes in your data

Returns

list[str] — List of filetypes

method Dataset.add_filetype(filetype: str) → bool

Add a filetype to list of filetypes in your data. Only applicable to requestable datasets.

Parameters

filetype : str — Filetype to add

Returns

bool — True if filetype added or False if filetype already present

Raises

NotRequestableError

method Dataset.add_filetypes(filetypes: Sequence[str]) → bool

Add a list of filetypes to list of filetypes in your data. Only applicable to requestable datasets.

Parameters

filetypes : Sequence[str] — List of filetypes to add

Returns

bool — True if all filetypes added or False if any already present

Raises

NotRequestableError

method Dataset.remove_filetype(filetype: str) → bool

Remove a filetype. Only applicable to requestable datasets.

Parameters

filetype : str — Filetype to remove

Returns

bool — True if filetype removed or False if not

Raises

NotRequestableError

method Dataset.set_custom_viz(url: str) → None

Set custom visualization url for dataset

Parameters

url : str — Custom visualization url

Returns

None — None

method Dataset.get_custom_viz() → str | None

Get custom visualization url for dataset

Returns

str | None — Custom visualization url or None

method Dataset.preview_off() → None

Set dataset preview off

Returns

None — None

method Dataset.preview_resource() → None

Set dataset preview on for an unspecified resource

Returns

None — None

method Dataset.set_preview_resource(resource: Union['Resource', dict, str, int]) → Resource

Set the resource that will be used for displaying previews in dataset preview

Parameters

resource : Union['Resource', dict, str, int] — Either a resource id or name, resource metadata from a Resource object or dictionary, or the resource's position in the dataset

Returns

Resource — Resource that is used for preview or None if no preview set

Raises

HDXError
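Because the resource argument can take four forms, the lookup can be sketched as a type dispatch. `find_resource` is a hypothetical helper, not part of the library; it assumes positions are 0-based indices into the dataset's resource list, accepts any mapping as resource metadata, and raises `ValueError` where the documented method raises `HDXError`.

```python
from collections.abc import Mapping
from typing import Union


def find_resource(resources: list[dict], resource: Union[Mapping, str, int]) -> dict:
    """Resolve a position, an id or name, or resource metadata to a resource."""
    if isinstance(resource, int):
        return resources[resource]  # assumed 0-based position in the list
    if isinstance(resource, Mapping):
        # Metadata: identify the resource by its id, falling back to its name.
        resource = resource.get("id") or resource.get("name")
        if not resource:
            raise ValueError("Resource metadata needs an id or name")
    for candidate in resources:
        if resource in (candidate.get("id"), candidate.get("name")):
            return candidate
    # The documented method raises HDXError; ValueError stands in here.
    raise ValueError(f"No resource matching {resource!r}")
```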

method Dataset.create_default_views(create_datastore_views: bool = False) → None

Create default resource views for all resources in dataset

Parameters

create_datastore_views : bool — Whether to try to create resource views that point to the datastore. Defaults to False.

Returns

None — None

method Dataset.get_name_or_id(prefer_name: bool = True) → str | None

Get dataset name or id, e.g. for use in urls. If prefer_name is True, name is preferred over id if available, otherwise id is preferred over name if available.

Parameters

prefer_name : bool — Whether name is preferred over id. Defaults to True.

Returns

str | None — HDX dataset id or name or None if not available

method Dataset.get_hdx_url(prefer_name: bool = True) → str | None

Get the url of the dataset on HDX or None if the dataset name and id fields are missing. If prefer_name is True, name is preferred over id if available, otherwise id is preferred over name if available.

Parameters

prefer_name : bool — Whether name is preferred over id in url. Defaults to True.

Returns

str | None — Url of the dataset on HDX or None if the dataset is missing fields

method Dataset.get_api_url(prefer_name: bool = True) → str | None

Get the API url of the dataset on HDX

Parameters

prefer_name : bool — Whether name is preferred over id in url. Defaults to True.

Returns

str | None — API url of the dataset on HDX or None if the dataset is missing fields
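The prefer_name fallback shared by get_name_or_id, get_hdx_url and get_api_url can be sketched with plain dictionaries. The site URL and both URL patterns are assumptions for illustration (the production HDX site, its /dataset/ page path, and a CKAN-style package_show action); the library builds its URLs from its own configuration, so check it for the exact form.

```python
from typing import Optional

# Assumed production site URL; other HDX sites would differ.
HDX_SITE = "https://data.humdata.org"


def get_name_or_id(dataset: dict, prefer_name: bool = True) -> Optional[str]:
    """Return the preferred identifier, falling back to the other one."""
    keys = ("name", "id") if prefer_name else ("id", "name")
    for key in keys:
        if dataset.get(key):
            return dataset[key]
    return None  # neither name nor id is available


def get_hdx_url(dataset: dict, prefer_name: bool = True) -> Optional[str]:
    ident = get_name_or_id(dataset, prefer_name)
    return f"{HDX_SITE}/dataset/{ident}" if ident else None


def get_api_url(dataset: dict, prefer_name: bool = True) -> Optional[str]:
    # Assumed CKAN-style endpoint for illustration only.
    ident = get_name_or_id(dataset, prefer_name)
    return f"{HDX_SITE}/api/3/action/package_show?id={ident}" if ident else None
```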

Write rows to file and create resource, adding it to the dataset. The headers argument is either a row number (rows start counting at 1), or the actual headers defined as a list of strings. If not set, all rows will be treated as containing values. Specific columns to include can be specified (i.e. a subset of the headers).

The returned dictionary will contain the resource in the key resource, headers in the key headers and list of rows in the key rows.
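The headers and columns handling described above can be sketched for list-form rows. `select_rows` is hypothetical; it assumes that when headers is a row number, that row holds the header names and the data rows follow it, and it only supports selecting columns by name. Dict-form rows, which the method also accepts, are not handled here.

```python
from typing import Optional, Sequence, Union


def select_rows(
    rows: list[list],
    headers: Union[int, Sequence[str], None] = None,
    columns: Optional[Sequence[str]] = None,
) -> tuple[Optional[list[str]], list[list]]:
    """Split header and data rows, then optionally keep a subset of columns."""
    if isinstance(headers, int):
        # Assumption: row number `headers` (counting from 1) holds the header
        # names and the data rows follow it.
        header_row: Optional[list[str]] = list(rows[headers - 1])
        data = rows[headers:]
    else:
        header_row = list(headers) if headers is not None else None
        data = rows  # no header row inside `rows`: every row holds values
    if columns is None:
        return header_row, [list(row) for row in data]
    if header_row is None:
        raise ValueError("Selecting columns by name needs headers")
    indices = [header_row.index(name) for name in columns]
    return list(columns), [[row[i] for i in indices] for row in data]
```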

The time period can optionally be set by supplying a column in which the date or year is to be looked up. Note that any timezone information is ignored and UTC is assumed. Alternatively, a date function can be supplied to handle the dates in each row. It should accept a row and return either None to ignore the row, or a dictionary that is empty if the row contains no dates or that contains the keys startdate and/or enddate with timezone-aware datetime values. The lowest start date and highest end date are used to set the time period and are returned in the results dictionary under the keys startdate and enddate.
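The date function contract and the lowest-start/highest-end rule can be sketched as follows. `example_date_function` and `time_period` are hypothetical: the sketch assumes dict-form rows with ISO-format "date" strings and a "value" field, and it drops rows for which the date function returns None.

```python
from datetime import datetime, timezone
from typing import Callable, Iterable, Optional


def example_date_function(row: dict) -> Optional[dict]:
    """Hypothetical date function following the contract described above."""
    if not row.get("value"):
        return None  # ignore this row entirely
    if not row.get("date"):
        return {}  # row is kept, but it contains no dates
    date = datetime.fromisoformat(row["date"]).replace(tzinfo=timezone.utc)
    return {"startdate": date, "enddate": date}


def time_period(
    rows: Iterable[dict], date_function: Callable[[dict], Optional[dict]]
) -> tuple[list[dict], Optional[datetime], Optional[datetime]]:
    """Return the kept rows plus the lowest startdate and highest enddate."""
    kept: list[dict] = []
    start: Optional[datetime] = None
    end: Optional[datetime] = None
    for row in rows:
        dates = date_function(row)
        if dates is None:
            continue
        kept.append(row)
        if "startdate" in dates and (start is None or dates["startdate"] < start):
            start = dates["startdate"]
        if "enddate" in dates and (end is None or dates["enddate"] > end):
            end = dates["enddate"]
    return kept, start, end
```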

Parameters

folder : Path | str — Folder to which to write file containing rows
filename : str — Filename of file to write rows
rows : Iterable[Sequence | Mapping] — List of rows in dict or list form
resourcedata : dict — Resource data
headers : int | Sequence[str] | None — All headers. Defaults to None.
columns : Sequence[int] | Sequence[str] | None — Columns to write. Defaults to all.
format : str — Format to write. Defaults to csv.
encoding : str | None — Encoding to use. Defaults to None (infer encoding).
datecol : int | str | None — Date column for setting time period. Defaults to None (don't set).
yearcol : int | str | None — Year column for setting dataset year range. Defaults to None (don't set).
date_function : Callable[[dict], dict | None] | None — Date function to call for each row. Defaults to None.
no_empty : bool — Don't generate resource if there are no data rows. Defaults to True.

Returns

tuple[bool, dict] — (True if resource added, dictionary of results)

Raises

HDXError

Download url, write rows to csv and create resource, adding it to the dataset. The returned dictionary will contain the resource in the key resource, headers in the key headers and list of rows in the key rows.

Optionally, headers can be inserted at specific positions using the header_insertions argument, a list of tuples of the form (position, header). A row function can also be supplied, which is called for each row. It takes as arguments headers (prior to any insertions) and row (which will be in dict or list form depending upon the dict_rows argument) and returns a modified row.

The time period can optionally be set by supplying a column in which the date or year is to be looked up. Note that any timezone information is ignored and UTC is assumed. Alternatively, a date function can be supplied to handle the dates in each row. It should accept a row and return either None to ignore the row, or a dictionary that is empty if the row contains no dates or that contains the keys startdate and/or enddate with timezone-aware datetime values. The lowest start date and highest end date are used to set the time period and are returned in the results dictionary under the keys startdate and enddate.

Parameters

downloader : BaseDownload — A Download or Retrieve object
url : str — URL to download
folder : Path | str — Folder to which to write file containing rows
filename : str — Filename of file to write rows
resourcedata : dict — Resource data
header_insertions : Sequence[tuple[int, str]] | None — List of (position, header) to insert. Defaults to None.
row_function : Callable[[list[str], dict], dict] | None — Function to call for each row. Defaults to None.
columns : Sequence[int] | Sequence[str] | None — Columns to write. Defaults to all.
format : str — Format to write. Defaults to csv.
encoding : str | None — Encoding to use. Defaults to None (infer encoding).
datecol : int | str | None — Date column for setting time period. Defaults to None (don't set).
yearcol : int | str | None — Year column for setting dataset year range. Defaults to None (don't set).
date_function : Callable[[dict], dict | None] | None — Date function to call for each row. Defaults to None.
no_empty : bool — Don't generate resource if there are no data rows. Defaults to True.
**kwargs : Any — Any additional args to pass to downloader.get_tabular_rows

Returns

tuple[bool, dict] — (True if resource added, dictionary of results)
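How header_insertions and a row function fit together can be sketched for dict-form rows. `transform_rows` and `add_iso3` are hypothetical helpers; the sketch assumes insertions are applied in order, each to the list produced by the previous ones, and that the row function fills in the value for the inserted header.

```python
from typing import Callable, Optional, Sequence


def insert_headers(
    headers: list[str], header_insertions: Sequence[tuple[int, str]]
) -> list[str]:
    """Return a copy of headers with each (position, header) inserted in order."""
    new_headers = list(headers)
    for position, header in header_insertions:
        new_headers.insert(position, header)
    return new_headers


def transform_rows(
    headers: list[str],
    rows: list[dict],
    header_insertions: Sequence[tuple[int, str]] = (),
    row_function: Optional[Callable[[list[str], dict], dict]] = None,
) -> tuple[list[str], list[dict]]:
    """Extend the headers, then pass each row to the row function."""
    out_headers = insert_headers(headers, header_insertions)
    out_rows = []
    for row in rows:
        if row_function is not None:
            row = row_function(headers, row)  # sees headers before insertions
        out_rows.append(row)
    return out_headers, out_rows


def add_iso3(headers: list[str], row: dict) -> dict:
    """Hypothetical row function: fill the inserted iso3 column."""
    lookup = {"Kenya": "KEN", "Uganda": "UGA"}
    row["iso3"] = lookup.get(row["country"], "")
    return row
```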

method Dataset.add_hapi_error(error_message: str, resource_name: str | None = None, resource_id: str | None = None) → bool

Writes error messages that were uncovered while processing data for the HAPI database to a resource's metadata on HDX. If the resource already has an error message, it is only overwritten if the two messages are different.

Parameters

error_message : str — Error(s) uncovered
resource_name : str | None — Resource name. Defaults to None.
resource_id : str | None — Resource id. Defaults to None.

Returns

bool — True if a message was added, False if not
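The "only overwrite if different" rule can be sketched against plain resource dictionaries. This is illustrative only: `ERROR_KEY` is a hypothetical metadata key (the field name HDX uses is not given here), the sketch matches a resource by name or id, and it modifies local dictionaries whereas the documented method writes to the resource's metadata on HDX.

```python
from typing import Optional

# Hypothetical metadata key; the field HDX actually uses is not shown here.
ERROR_KEY = "hapi_error"


def add_hapi_error(
    resources: list[dict],
    error_message: str,
    resource_name: Optional[str] = None,
    resource_id: Optional[str] = None,
) -> bool:
    """Store error_message on the matching resource unless it is unchanged."""
    for resource in resources:
        matches = (
            resource_name is not None and resource.get("name") == resource_name
        ) or (resource_id is not None and resource.get("id") == resource_id)
        if not matches:
            continue
        if resource.get(ERROR_KEY) == error_message:
            return False  # same message already recorded: nothing to do
        resource[ERROR_KEY] = error_message  # the real method updates HDX
        return True
    return False  # no matching resource
```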