hdx.data.dataset

Dataset class containing all logic for creating, checking, and updating datasets and associated resources.

Dataset Objects

class Dataset(HDXObject)

[view_source]

Dataset class enabling operations on datasets and associated resources.

Arguments:

  • initial_data Optional[Dict] - Initial dataset metadata dictionary. Defaults to None.
  • configuration Optional[Configuration] - HDX configuration. Defaults to global configuration.

actions

@staticmethod
def actions() -> Dict[str, str]

[view_source]

Dictionary of actions that can be performed on object

Returns:

  • Dict[str, str] - Dictionary of actions that can be performed on object

__setitem__

def __setitem__(key: Any, value: Any) -> None

[view_source]

Set dictionary items but do not allow setting of resources

Arguments:

  • key Any - Key in dictionary
  • value Any - Value to put in dictionary

Returns:

None
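
The restriction can be pictured with a small standalone sketch. It does not use the library: ResourceGuardedDict is a hypothetical stand-in, and raising ValueError is an assumption, since the text above does not say which error is raised.

```python
class ResourceGuardedDict(dict):
    """Hypothetical stand-in: a dict that refuses direct writes to 'resources'."""

    def __setitem__(self, key, value):
        if key == "resources":
            # Assumption: the error type is not stated in the text; ValueError is illustrative.
            raise ValueError("Set resources through the resource methods instead")
        super().__setitem__(key, value)


metadata = ResourceGuardedDict()
metadata["title"] = "Example dataset"  # ordinary keys are stored as usual

try:
    metadata["resources"] = []  # direct assignment of resources is rejected
    blocked = False
except ValueError:
    blocked = True
```

Resources would instead be managed through methods such as add_update_resource.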

separate_resources

def separate_resources() -> None

[view_source]

Move contents of resources key in internal dictionary into self.resources

Returns:

None

unseparate_resources

def unseparate_resources() -> None

[view_source]

Move self.resources into resources key in internal dictionary

Returns:

None

get_dataset_dict

def get_dataset_dict() -> Dict

[view_source]

Return the dataset dictionary with self.resources included under the resources key

Returns:

  • Dict - Dataset dictionary

save_to_json

def save_to_json(path: str, follow_urls: bool = False)

[view_source]

Save dataset to JSON. If follow_urls is True, resource urls that point to datasets, HXL proxy urls etc. are followed to retrieve final urls.

Arguments:

  • path str - Path to save dataset
  • follow_urls bool - Whether to follow urls. Defaults to False.

Returns:

None

load_from_json

@staticmethod
def load_from_json(path: str) -> Optional["Dataset"]

[view_source]

Load dataset from JSON

Arguments:

  • path str - Path to load dataset

Returns:

  • Optional[Dataset] - Dataset created from JSON or None

init_resources

def init_resources() -> None

[view_source]

Initialise self.resources list

Returns:

None

add_update_resource

def add_update_resource(resource: Union["Resource", Dict, str],
                        ignore_datasetid: bool = False) -> None

[view_source]

Add new or update existing resource in dataset with new metadata

Arguments:

  • resource Union[Resource,Dict,str] - Either resource id or resource metadata from a Resource object or a dictionary
  • ignore_datasetid bool - Whether to ignore dataset id in the resource. Defaults to False.

Returns:

None

add_update_resources

def add_update_resources(resources: ListTuple[Union["Resource", Dict, str]],
                         ignore_datasetid: bool = False) -> None

[view_source]

Add new resources to the dataset or update existing resources with new metadata

Arguments:

  • resources ListTuple[Union[Resource,Dict,str]] - A list of either resource ids or resources metadata from either Resource objects or dictionaries
  • ignore_datasetid bool - Whether to ignore dataset id in the resource. Defaults to False.

Returns:

None

delete_resource

def delete_resource(resource: Union["Resource", Dict, str],
                    delete: bool = True) -> bool

[view_source]

Delete a resource from the dataset and also from HDX by default

Arguments:

  • resource Union[Resource,Dict,str] - Either resource id or resource metadata from a Resource object or a dictionary
  • delete bool - Whether to delete the resource from HDX (not just the dataset). Defaults to True.

Returns:

  • bool - True if resource removed or False if not

get_resources

def get_resources() -> List["Resource"]

[view_source]

Get dataset's resources

Returns:

  • List[Resource] - List of Resource objects

get_resource

def get_resource(index: int = 0) -> "Resource"

[view_source]

Get one resource from dataset by index

Arguments:

  • index int - Index of resource in dataset. Defaults to 0.

Returns:

  • Resource - Resource object

number_of_resources

def number_of_resources() -> int

[view_source]

Get number of dataset's resources

Returns:

  • int - Number of Resource objects

reorder_resources

def reorder_resources(resource_ids: ListTuple[str],
                      hxl_update: bool = True) -> None

[view_source]

Reorder resources in dataset according to provided list. Resources are updated in the dataset object to match new order. However, the dataset is not refreshed by rereading from HDX. If only some resource ids are supplied then these are assumed to be first and the other resources will stay in their original order.

Arguments:

  • resource_ids ListTuple[str] - List of resource ids
  • hxl_update bool - Whether to call package_hxl_update. Defaults to True.

Returns:

None
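
The ordering rule for a partial list of ids can be sketched in plain Python. This is an illustration of the rule as described, not the library's code; reorder_ids and the ids used are hypothetical.

```python
from typing import List, Sequence


def reorder_ids(current_ids: Sequence[str], requested_ids: Sequence[str]) -> List[str]:
    """Put the requested ids first, then the remaining ids in their original order."""
    requested = list(requested_ids)
    requested_set = set(requested)
    remaining = [rid for rid in current_ids if rid not in requested_set]
    return requested + remaining


# Only two of four ids are supplied; the others keep their relative order.
print(reorder_ids(["a", "b", "c", "d"], ["c", "a"]))  # ['c', 'a', 'b', 'd']
```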

update_from_yaml

def update_from_yaml(path: str = join("config",
                                      "hdx_dataset_static.yaml")) -> None

[view_source]

Update dataset metadata with static metadata from YAML file

Arguments:

  • path str - Path to YAML dataset metadata. Defaults to config/hdx_dataset_static.yaml.

Returns:

None

update_from_json

def update_from_json(path: str = join("config",
                                      "hdx_dataset_static.json")) -> None

[view_source]

Update dataset metadata with static metadata from JSON file

Arguments:

  • path str - Path to JSON dataset metadata. Defaults to config/hdx_dataset_static.json.

Returns:

None

read_from_hdx

@staticmethod
def read_from_hdx(
        identifier: str,
        configuration: Optional[Configuration] = None) -> Optional["Dataset"]

[view_source]

Reads the dataset given by identifier from HDX and returns Dataset object

Arguments:

  • identifier str - Identifier of dataset
  • configuration Optional[Configuration] - HDX configuration. Defaults to global configuration.

Returns:

  • Optional[Dataset] - Dataset object if successful read, None if not

check_required_fields

def check_required_fields(ignore_fields: ListTuple[str] = tuple(),
                          allow_no_resources: bool = False,
                          **kwargs: Any) -> None

[view_source]

Check that metadata for dataset and its resources is complete. If required, set ignore_fields to any fields that should be ignored for the particular operation. Prepend "resource:" for resource fields.

Arguments:

  • ignore_fields ListTuple[str] - Fields to ignore. Defaults to tuple().
  • allow_no_resources bool - Whether to allow no resources. Defaults to False.

Returns:

None
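
The "resource:" prefix convention can be illustrated with a short standalone sketch. The helper split_ignore_fields and the field names are hypothetical; the sketch only shows how a prefixed entry would be told apart from a dataset-level one.

```python
from typing import Iterable, List, Tuple


def split_ignore_fields(ignore_fields: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate dataset-level fields from fields prefixed with 'resource:'."""
    prefix = "resource:"
    dataset_fields: List[str] = []
    resource_fields: List[str] = []
    for field in ignore_fields:
        if field.startswith(prefix):
            resource_fields.append(field[len(prefix):])  # strip the prefix
        else:
            dataset_fields.append(field)
    return dataset_fields, resource_fields


# Field names here are illustrative only.
dataset_fields, resource_fields = split_ignore_fields(("maintainer", "resource:description"))
```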

revise

@staticmethod
def revise(match: Dict[str, Any],
           filter: ListTuple[str] = tuple(),
           update: Dict[str, Any] = {},
           files_to_upload: Dict[str, str] = {},
           configuration: Optional[Configuration] = None,
           **kwargs: Any) -> "Dataset"

[view_source]

Revises an HDX dataset in HDX

Arguments:

  • match Dict[str,Any] - Metadata on which to match dataset
  • filter ListTuple[str] - Filters to apply. Defaults to tuple().
  • update Dict[str,Any] - Metadata updates to apply. Defaults to {}.
  • files_to_upload Dict[str,str] - Files to upload to HDX. Defaults to {}.
  • configuration Optional[Configuration] - HDX configuration. Defaults to global configuration.
  • **kwargs - Additional arguments to pass to package_revise

Returns:

  • Dataset - Dataset object

update_in_hdx

def update_in_hdx(update_resources: bool = True,
                  match_resources_by_metadata: bool = True,
                  keys_to_delete: ListTuple[str] = tuple(),
                  remove_additional_resources: bool = False,
                  match_resource_order: bool = False,
                  create_default_views: bool = True,
                  hxl_update: bool = True,
                  **kwargs: Any) -> None

[view_source]

Check if dataset exists in HDX and if so, update it. When match_resources_by_metadata is True, resources are matched by id if ids are available; otherwise they are matched by name alone if names are unique, or by name and format if they are not.

Arguments:

  • update_resources bool - Whether to update resources. Defaults to True.
  • match_resources_by_metadata bool - Compare resource metadata rather than position in list. Defaults to True.
  • keys_to_delete ListTuple[str] - List of top level metadata keys to delete. Defaults to tuple().
  • remove_additional_resources bool - Remove additional resources found in dataset. Defaults to False.
  • match_resource_order bool - Match order of given resources by name. Defaults to False.
  • create_default_views bool - Whether to call package_create_default_resource_views. Defaults to True.
  • hxl_update bool - Whether to call package_hxl_update. Defaults to True.
  • **kwargs - See below
  • updated_by_script str - String to identify your script. Defaults to your user agent.
  • batch str - A string you can specify to show which datasets are part of a single batch update

Returns:

None
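
The matching order described for match_resources_by_metadata (id, then unique name, then name plus format) can be sketched as a key function. This is a reading of the description under stated assumptions, not the library's implementation; match_key and the sample resources are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def match_key(resource: Dict[str, str], name_counts: Counter) -> Tuple[str, ...]:
    """Choose what to match a resource on: id, else unique name, else name and format."""
    if resource.get("id"):
        return ("id", resource["id"])
    name = resource.get("name", "")
    if name_counts[name] == 1:
        return ("name", name)
    # Name is shared by several resources, so format is used in addition.
    return ("name+format", name, resource.get("format", ""))


resources: List[Dict[str, str]] = [
    {"id": "r1", "name": "data", "format": "csv"},
    {"name": "data", "format": "xlsx"},
    {"name": "readme", "format": "txt"},
]
counts = Counter(r["name"] for r in resources)
keys = [match_key(r, counts) for r in resources]
```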

create_in_hdx

def create_in_hdx(allow_no_resources: bool = False,
                  update_resources: bool = True,
                  match_resources_by_metadata: bool = True,
                  keys_to_delete: ListTuple[str] = tuple(),
                  remove_additional_resources: bool = False,
                  match_resource_order: bool = False,
                  create_default_views: bool = True,
                  hxl_update: bool = True,
                  **kwargs: Any) -> None

[view_source]

Check if dataset exists in HDX and if so, update it, otherwise create it. When match_resources_by_metadata is True, resources are matched by id if ids are available; otherwise they are matched by name alone if names are unique, or by name and format if they are not.

Arguments:

  • allow_no_resources bool - Whether to allow no resources. Defaults to False.
  • update_resources bool - Whether to update resources (if updating). Defaults to True.
  • match_resources_by_metadata bool - Compare resource metadata rather than position in list. Defaults to True.
  • keys_to_delete ListTuple[str] - List of top level metadata keys to delete. Defaults to tuple().
  • remove_additional_resources bool - Remove additional resources found in dataset (if updating). Defaults to False.
  • match_resource_order bool - Match order of given resources by name. Defaults to False.
  • create_default_views bool - Whether to call package_create_default_resource_views (if updating). Defaults to True.
  • hxl_update bool - Whether to call package_hxl_update. Defaults to True.
  • **kwargs - See below
  • updated_by_script str - String to identify your script. Defaults to your user agent.
  • batch str - A string you can specify to show which datasets are part of a single batch update

Returns:

None

delete_from_hdx

def delete_from_hdx() -> None

[view_source]

Deletes a dataset from HDX.

Returns:

None

hxl_update

def hxl_update() -> None

[view_source]

Checks dataset for HXL in resources and updates tags and other metadata to trigger HXL preview.

Returns:

None

search_in_hdx

@classmethod
def search_in_hdx(cls,
                  query: Optional[str] = "*:*",
                  configuration: Optional[Configuration] = None,
                  page_size: int = 1000,
                  **kwargs: Any) -> List["Dataset"]

[view_source]

Searches for datasets in HDX

Arguments:

  • query Optional[str] - Query (in Solr format). Defaults to '*:*'.
  • configuration Optional[Configuration] - HDX configuration. Defaults to global configuration.
  • page_size int - Size of page to return. Defaults to 1000.
  • **kwargs - See below
  • fq string - Any filter queries to apply
  • rows int - Number of matching rows to return. Defaults to all datasets (sys.maxsize).
  • start int - Offset in the complete result for where the set of returned datasets should begin
  • sort string - Sorting of results. Defaults to 'relevance asc, metadata_modified desc' if rows<=page_size or 'metadata_modified asc' if rows>page_size.
  • facet string - Whether to enable faceted results. Defaults to True.
  • facet.mincount int - Minimum count a facet field value must have to be included in the results
  • facet.limit int - Maximum number of values the facet fields return (a negative value means unlimited). Defaults to 50.
  • facet.field List[str] - Fields to facet upon. Default is empty.
  • use_default_schema bool - Use default package schema instead of custom schema. Defaults to False.

Returns:

  • List[Dataset] - list of datasets resulting from query
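
The way the default sort depends on rows and page_size can be written out as a small function. This is only a restatement of the rule given for the sort argument; default_sort is a hypothetical helper and is not part of the library.

```python
import sys


def default_sort(rows: int = sys.maxsize, page_size: int = 1000) -> str:
    """Return the default sort string described for the sort keyword argument."""
    if rows <= page_size:
        # Everything fits in one page, so relevance ordering is the default.
        return "relevance asc, metadata_modified desc"
    # More rows than one page holds, so the default is ordering by modification time.
    return "metadata_modified asc"


single_page = default_sort(rows=10)
all_datasets = default_sort()  # rows defaults to sys.maxsize, i.e. all datasets
```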

get_all_dataset_names

@staticmethod
def get_all_dataset_names(configuration: Optional[Configuration] = None,
                          **kwargs: Any) -> List[str]

[view_source]

Get all dataset names in HDX

Arguments:

  • configuration Optional[Configuration] - HDX configuration. Defaults to global configuration.
  • **kwargs - See below
  • rows int - Number of rows to return. Defaults to all datasets (sys.maxsize)
  • start int - Offset in the complete result for where the set of returned dataset names should begin

Returns:

  • List[str] - list of all dataset names in HDX

get_all_datasets

@classmethod
def get_all_datasets(cls,
                     configuration: Optional[Configuration] = None,
                     page_size: int = 1000,
                     **kwargs: Any) -> List["Dataset"]

[view_source]

Get all datasets from HDX (just calls search_in_hdx)

Arguments:

  • configuration Optional[Configuration] - HDX configuration. Defaults to global configuration.
  • page_size int - Size of page to return. Defaults to 1000.
  • **kwargs - See below
  • fq string - Any filter queries to apply
  • rows int - Number of matching rows to return. Defaults to all datasets (sys.maxsize).
  • start int - Offset in the complete result for where the set of returned datasets should begin
  • sort string - Sorting of results. Defaults to 'metadata_modified asc'.
  • facet string - Whether to enable faceted results. Defaults to True.
  • facet.mincount int - Minimum count a facet field value must have to be included in the results
  • facet.limit int - Maximum number of values the facet fields return (a negative value means unlimited). Defaults to 50.
  • facet.field List[str] - Fields to facet upon. Default is empty.
  • use_default_schema bool - Use default package schema instead of custom schema. Defaults to False.

Returns:

  • List[Dataset] - list of datasets resulting from query

get_all_resources

@staticmethod
def get_all_resources(datasets: ListTuple["Dataset"]) -> List["Resource"]

[view_source]

Get all resources from a list of datasets (such as returned by search)

Arguments:

  • datasets ListTuple[Dataset] - list of datasets

Returns:

  • List[Resource] - list of resources within those datasets

autocomplete

@classmethod
def autocomplete(cls,
                 name: str,
                 limit: int = 20,
                 configuration: Optional[Configuration] = None) -> List

[view_source]

Autocomplete a dataset name and return matches

Arguments:

  • name str - Name to autocomplete
  • limit int - Maximum number of matches to return. Defaults to 20.
  • configuration Optional[Configuration] - HDX configuration. Defaults to global configuration.

Returns:

  • List - Autocomplete matches

get_time_period

def get_time_period(date_format: Optional[str] = None,
                    today: datetime = now_utc()) -> Dict

[view_source]

Get dataset date as datetimes and strings in specified format. If no format is supplied, the ISO 8601 format is used. Returns a dictionary containing keys startdate (start date as datetime), enddate (end date as datetime), startdate_str (start date as string), enddate_str (end date as string) and ongoing (whether the end date rolls forward every day).

Arguments:

  • date_format Optional[str] - Date format. None is taken to be ISO 8601. Defaults to None.
  • today datetime - Date to use for today. Defaults to now_utc().

Returns:

  • Dict - Dictionary of date information
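
The shape of the returned dictionary can be sketched as follows. The helper time_period_dict is hypothetical and builds the keys listed above from two datetimes; the exact string formatting used by the library for ISO 8601 output is an assumption here (Python's isoformat).

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def time_period_dict(start: datetime, end: datetime, ongoing: bool = False,
                     date_format: Optional[str] = None) -> Dict:
    """Build a dictionary with the keys described for get_time_period."""

    def fmt(value: datetime) -> str:
        # None is taken to mean ISO 8601; otherwise a strftime pattern is applied.
        return value.isoformat() if date_format is None else value.strftime(date_format)

    return {
        "startdate": start,
        "enddate": end,
        "startdate_str": fmt(start),
        "enddate_str": fmt(end),
        "ongoing": ongoing,
    }


start = datetime(2023, 1, 1, tzinfo=timezone.utc)
end = datetime(2023, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
iso_period = time_period_dict(start, end)
custom_period = time_period_dict(start, end, date_format="%d/%m/%Y")
```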

set_time_period

def set_time_period(startdate: Union[datetime, str],
                    enddate: Union[datetime, str, None] = None,
                    ongoing: bool = False,
                    ignore_timeinfo: bool = True) -> None

[view_source]

Set time period from either datetime objects or strings. Any time and time zone information will be ignored by default (meaning that the time of the start date is set to 00:00:00, the time of any end date is set to 23:59:59 and the time zone is set to UTC). To have the time and time zone accounted for, set ignore_timeinfo to False. In this case, the time will be converted to UTC.

Arguments:

  • startdate Union[datetime, str] - Dataset start date
  • enddate Union[datetime, str, None] - Dataset end date. Defaults to None.
  • ongoing bool - True if ongoing, False if not. Defaults to False.
  • ignore_timeinfo bool - Ignore time and time zone of date. Defaults to True.

Returns:

None
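
The effect of ignore_timeinfo can be sketched with the standard library alone. The helper normalise_period is hypothetical and only mirrors the behaviour described above: with ignore_timeinfo the times are replaced and the zone is set to UTC, otherwise the instants are converted to UTC.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def normalise_period(start: datetime, end: datetime,
                     ignore_timeinfo: bool = True) -> Tuple[datetime, datetime]:
    """Mimic the described handling of time and time zone information."""
    if ignore_timeinfo:
        # Keep only the calendar dates: start at 00:00:00, end at 23:59:59, both UTC.
        start = datetime(start.year, start.month, start.day, 0, 0, 0, tzinfo=timezone.utc)
        end = datetime(end.year, end.month, end.day, 23, 59, 59, tzinfo=timezone.utc)
    else:
        # Convert the actual instants to UTC (naive datetimes would be read as local time).
        start = start.astimezone(timezone.utc)
        end = end.astimezone(timezone.utc)
    return start, end


plus_two = timezone(timedelta(hours=2))
raw_start = datetime(2023, 1, 1, 1, 30, tzinfo=plus_two)
raw_end = datetime(2023, 1, 31, 12, 0, tzinfo=plus_two)

ignored_start, ignored_end = normalise_period(raw_start, raw_end)
kept_start, kept_end = normalise_period(raw_start, raw_end, ignore_timeinfo=False)
```

Note that in this sketch the two modes can yield different calendar dates: 01:30 at UTC+2 on 1 January is still 31 December in UTC.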

set_time_period_year_range

def set_time_period_year_range(
        dataset_year: Union[str, int, Iterable],
        dataset_end_year: Optional[Union[str, int]] = None) -> List[int]

[view_source]

Set time period as a range from year or start and end year.

Arguments:

  • dataset_year Union[str, int, Iterable] - Dataset year given as string or int or range in an iterable
  • dataset_end_year Optional[Union[str, int]] - Dataset end year given as string or int. Defaults to None.

Returns:

  • List[int] - The start and end year if supplied or sorted list of years
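
The accepted input forms and the returned list can be sketched as below. The helper year_range is hypothetical, and the sketch assumes that a year range runs from 1 January of the first year to 31 December of the last, which the text above does not state.

```python
from datetime import date
from typing import Iterable, List, Optional, Tuple, Union


def year_range(dataset_year: Union[str, int, Iterable],
               dataset_end_year: Optional[Union[str, int]] = None
               ) -> Tuple[List[int], date, date]:
    """Return the years given plus an assumed start and end date for the period."""
    if isinstance(dataset_year, (str, int)):
        years = [int(dataset_year)]
        if dataset_end_year is not None:
            years.append(int(dataset_end_year))
    else:
        # An iterable of years is returned as a sorted list.
        years = sorted(int(year) for year in dataset_year)
    # Assumption: the period spans whole calendar years.
    return years, date(min(years), 1, 1), date(max(years), 12, 31)


single = year_range(2020)
start_and_end = year_range("2019", 2021)
from_iterable = year_range({2021, 2019, 2020})
```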

list_valid_update_frequencies

@classmethod
def list_valid_update_frequencies(cls) -> List[str]

[view_source]

List of valid update frequency values

Returns:

  • List[str] - Allowed update frequencies

transform_update_frequency

@classmethod
def transform_update_frequency(cls, frequency: Union[str,
                                                     int]) -> Optional[str]

[view_source]

Get numeric update frequency (as a string, since that is the required field format) from textual representation or vice versa (e.g. 'Every month' = '30', '30' or 30 = 'Every month')

Arguments:

  • frequency Union[str, int] - Update frequency in one format

Returns:

  • Optional[str] - Update frequency in alternative format or None if not valid
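
The two-way lookup can be sketched with a pair of dictionaries. The function transform is a hypothetical stand-in, and the table holds only the two pairs mentioned in this page ('Every week' with 7, 'Every month' with 30); the library's full list comes from Dataset.list_valid_update_frequencies().

```python
from typing import Optional, Union

# Only the pairs mentioned in the text; the real table is larger.
TEXT_TO_DAYS = {"Every week": "7", "Every month": "30"}
DAYS_TO_TEXT = {days: text for text, days in TEXT_TO_DAYS.items()}


def transform(frequency: Union[str, int]) -> Optional[str]:
    """Map text to a numeric string or a number (int or str) to text; None if unknown."""
    if isinstance(frequency, int):
        return DAYS_TO_TEXT.get(str(frequency))
    if frequency in DAYS_TO_TEXT:
        return DAYS_TO_TEXT[frequency]
    return TEXT_TO_DAYS.get(frequency)


print(transform("Every month"))  # 30
print(transform(30))             # Every month
```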

get_expected_update_frequency

def get_expected_update_frequency() -> Optional[str]

[view_source]

Get expected update frequency (in textual rather than numeric form)

Returns:

  • Optional[str] - Update frequency in textual form or None if the update frequency doesn't exist or is blank.

set_expected_update_frequency

def set_expected_update_frequency(update_frequency: Union[str, int]) -> None

[view_source]

Set expected update frequency. You can pass frequencies like "Every week" or '7' or 7. Valid values for update frequency can be found from Dataset.list_valid_update_frequencies().

Arguments:

  • update_frequency Union[str, int] - Update frequency

Returns:

None

get_tags

def get_tags() -> List[str]

[view_source]

Return the dataset's list of tags

Returns:

  • List[str] - list of tags or [] if there are none

add_tag

def add_tag(tag: str, log_deleted: bool = True) -> Tuple[List[str], List[str]]

[view_source]

Add a tag

Arguments:

  • tag str - Tag to add
  • log_deleted bool - Whether to log informational messages about deleted tags. Defaults to True.

Returns:

  • Tuple[List[str], List[str]] - Tuple containing list of added tags and list of deleted tags and tags not added

add_tags

def add_tags(tags: ListTuple[str],
             log_deleted: bool = True) -> Tuple[List[str], List[str]]

[view_source]

Add a list of tags

Arguments:

  • tags ListTuple[str] - List of tags to add
  • log_deleted bool - Whether to log informational messages about deleted tags. Defaults to True.

Returns:

  • Tuple[List[str], List[str]] - Tuple containing list of added tags and list of deleted tags and tags not added

clean_tags

def clean_tags(log_deleted: bool = True) -> Tuple[List[str], List[str]]

[view_source]

Clean tags in an HDX object according to tags cleanup spreadsheet, deleting invalid tags that cannot be mapped

Arguments:

  • log_deleted bool - Whether to log informational messages about deleted tags. Defaults to True.

Returns:

  • Tuple[List[str], List[str]] - Tuple containing list of mapped tags and list of deleted tags and tags not added

remove_tag

def remove_tag(tag: str) -> bool

[view_source]

Remove a tag

Arguments:

  • tag str - Tag to remove

Returns:

  • bool - True if tag removed or False if not

is_subnational

def is_subnational() -> bool

[view_source]

Return if the dataset is subnational

Returns:

  • bool - True if the dataset is subnational, False if not

set_subnational

def set_subnational(subnational: bool) -> None

[view_source]

Set if dataset is subnational or national

Arguments:

  • subnational bool - True for subnational, False for national

Returns:

None

get_location_iso3s

def get_location_iso3s(
        locations: Optional[ListTuple[str]] = None) -> List[str]

[view_source]

Return the dataset's locations as ISO 3 codes

Arguments:

  • locations Optional[ListTuple[str]] - Valid locations list. Defaults to list downloaded from HDX.

Returns:

  • List[str] - list of location iso3s

get_location_names

def get_location_names(
        locations: Optional[ListTuple[str]] = None) -> List[str]

[view_source]

Return the dataset's locations as names

Arguments:

  • locations Optional[ListTuple[str]] - Valid locations list. Defaults to list downloaded from HDX.

Returns:

  • List[str] - list of location names

add_country_location

def add_country_location(country: str,
                         exact: bool = True,
                         locations: Optional[ListTuple[str]] = None,
                         use_live: bool = True) -> bool

[view_source]

Add a country. If an iso 3 code is not provided, value is parsed and if it is a valid country name, converted to an iso 3 code. If the country is already added, it is ignored.

Arguments:

  • country str - Country to add
  • exact bool - True for exact matching or False to allow fuzzy matching. Defaults to True.
  • locations Optional[ListTuple[str]] - Valid locations list. Defaults to list downloaded from HDX.
  • use_live bool - Try to use the latest country data from the web rather than the file in the package. Defaults to True.

Returns:

  • bool - True if country added or False if country already present

add_country_locations

def add_country_locations(countries: ListTuple[str],
                          locations: Optional[ListTuple[str]] = None,
                          use_live: bool = True) -> bool

[view_source]

Add a list of countries. If iso 3 codes are not provided, values are parsed and where they are valid country names, converted to iso 3 codes. If any country is already added, it is ignored.

Arguments:

  • countries ListTuple[str] - List of countries to add
  • locations Optional[ListTuple[str]] - Valid locations list. Defaults to list downloaded from HDX.
  • use_live bool - Try to use the latest country data from the web rather than the file in the package. Defaults to True.

Returns:

  • bool - True if all countries added or False if any already present.

add_region_location

def add_region_location(region: str,
                        locations: Optional[ListTuple[str]] = None,
                        use_live: bool = True) -> bool

[view_source]

Add all countries in a region. If a 3 digit UNStats M49 region code is not provided, value is parsed as a region name. If any country is already added, it is ignored.

Arguments:

  • region str - M49 region, intermediate region or subregion to add
  • locations Optional[ListTuple[str]] - Valid locations list. Defaults to list downloaded from HDX.
  • use_live bool - Try to use the latest country data from the web rather than the file in the package. Defaults to True.

Returns:

  • bool - True if all countries in region added or False if any already present.

add_other_location

def add_other_location(location: str,
                       exact: bool = True,
                       alterror: Optional[str] = None,
                       locations: Optional[ListTuple[str]] = None) -> bool

[view_source]

Add a location which is not a country or region. Value is parsed and compared to existing locations in HDX. If the location is already added, it is ignored.

Arguments:

  • location str - Location to add
  • exact bool - True for exact matching or False to allow fuzzy matching. Defaults to True.
  • alterror Optional[str] - Alternative error message to builtin if location not found. Defaults to None.
  • locations Optional[ListTuple[str]] - Valid locations list. Defaults to list downloaded from HDX.

Returns:

  • bool - True if location added or False if location already present

remove_location

def remove_location(location: str) -> bool

[view_source]

Remove a location. If the location is not present in the dataset, nothing is removed.

Arguments:

  • location str - Location to remove

Returns:

  • bool - True if location removed or False if not

get_maintainer

def get_maintainer() -> "User"

[view_source]

Get the dataset's maintainer.

Returns:

  • User - Dataset's maintainer

set_maintainer

def set_maintainer(maintainer: Union["User", Dict, str]) -> None

[view_source]

Set the dataset's maintainer.

Arguments:

  • maintainer Union[User,Dict,str] - Either a user id or User metadata from a User object or dictionary.

Returns:

None

get_organization

def get_organization() -> "Organization"

[view_source]

Get the dataset's organization.

Returns:

  • Organization - Dataset's organization

set_organization

def set_organization(organization: Union["Organization", Dict, str]) -> None

[view_source]

Set the dataset's organization.

Arguments:

  • organization Union[Organization,Dict,str] - Either an Organization id or Organization metadata from an Organization object or dictionary.

Returns:

None

get_showcases

def get_showcases() -> List["Showcase"]

[view_source]

Get any showcases the dataset is in

Returns:

  • List[Showcase] - List of showcases

add_showcase

def add_showcase(showcase: Union["Showcase", Dict, str],
                 showcases_to_check: ListTuple["Showcase"] = None) -> bool

[view_source]

Add dataset to showcase

Arguments:

  • showcase Union[Showcase,Dict,str] - Either a showcase id or showcase metadata from a Showcase object or dictionary
  • showcases_to_check ListTuple[Showcase] - List of showcases against which to check existence of showcase. Defaults to showcases containing dataset.

Returns:

  • bool - True if the showcase was added, False if already present

add_showcases

def add_showcases(showcases: ListTuple[Union["Showcase", Dict, str]],
                  showcases_to_check: ListTuple["Showcase"] = None) -> bool

[view_source]

Add dataset to multiple showcases

Arguments:

  • showcases ListTuple[Union[Showcase,Dict,str]] - A list of either showcase ids or showcase metadata from Showcase objects or dictionaries
  • showcases_to_check ListTuple[Showcase] - list of showcases against which to check existence of showcase. Defaults to showcases containing dataset.

Returns:

  • bool - True if all showcases added or False if any already present

remove_showcase

def remove_showcase(showcase: Union["Showcase", Dict, str]) -> None

[view_source]

Remove dataset from showcase

Arguments:

  • showcase Union[Showcase,Dict,str] - Either a showcase id string or showcase metadata from a Showcase object or dictionary

Returns:

None

is_requestable

def is_requestable() -> bool

[view_source]

Return whether the dataset is requestable or not

Returns:

  • bool - Whether the dataset is requestable or not

set_requestable

def set_requestable(requestable: bool = True) -> None

[view_source]

Set the dataset to be of type requestable or not

Arguments:

  • requestable bool - Set whether dataset is requestable. Defaults to True.

Returns:

None

get_fieldnames

def get_fieldnames() -> List[str]

[view_source]

Return list of fieldnames in your data. Only applicable to requestable datasets.

Returns:

  • List[str] - List of field names

add_fieldname

def add_fieldname(fieldname: str) -> bool

[view_source]

Add a fieldname to list of fieldnames in your data. Only applicable to requestable datasets.

Arguments:

  • fieldname str - Fieldname to add

Returns:

  • bool - True if fieldname added or False if fieldname already present

add_fieldnames

def add_fieldnames(fieldnames: ListTuple[str]) -> bool

[view_source]

Add a list of fieldnames to list of fieldnames in your data. Only applicable to requestable datasets.

Arguments:

  • fieldnames ListTuple[str] - List of fieldnames to add

Returns:

  • bool - True if all fieldnames added or False if any already present

remove_fieldname

def remove_fieldname(fieldname: str) -> bool

[view_source]

Remove a fieldname. Only applicable to requestable datasets.

Arguments:

  • fieldname str - Fieldname to remove

Returns:

  • bool - True if fieldname removed or False if not

get_filetypes

def get_filetypes() -> List[str]

[view_source]

Return list of filetypes in your data

Returns:

  • List[str] - List of filetypes

add_filetype

def add_filetype(filetype: str) -> bool

[view_source]

Add a filetype to list of filetypes in your data. Only applicable to requestable datasets.

Arguments:

  • filetype str - filetype to add

Returns:

  • bool - True if filetype added or False if filetype already present

add_filetypes

def add_filetypes(filetypes: ListTuple[str]) -> bool

[view_source]

Add a list of filetypes to list of filetypes in your data. Only applicable to requestable datasets.

Arguments:

  • filetypes ListTuple[str] - list of filetypes to add

Returns:

  • bool - True if all filetypes added or False if any already present

remove_filetype

def remove_filetype(filetype: str) -> bool

[view_source]

Remove a filetype

Arguments:

  • filetype str - Filetype to remove

Returns:

  • bool - True if filetype removed or False if not

preview_off

def preview_off() -> None

[view_source]

Set dataset preview off

Returns:

None

preview_resource

def preview_resource() -> None

[view_source]

Set dataset preview on for an unspecified resource

Returns:

None

set_quickchart_resource

def set_quickchart_resource(
        resource: Union["Resource", Dict, str, int]) -> "Resource"

[view_source]

Set the resource that will be used for displaying QuickCharts in dataset preview

Arguments:

  • resource Union[Resource,Dict,str,int] - Either resource id or name, resource metadata from a Resource object or a dictionary or position

Returns:

  • Resource - Resource that is used for preview or None if no preview set

quickcharts_resource_last

def quickcharts_resource_last() -> bool

[view_source]

Move the QuickCharts resource to be last. Assumes that its name begins with 'QuickCharts-'.

Returns:

  • bool - True if QuickCharts resource found, False if not

create_default_views

def create_default_views(create_datastore_views: bool = False) -> None

[view_source]

Create default resource views for all resources in dataset

Arguments:

  • create_datastore_views bool - Whether to try to create resource views that point to the datastore. Defaults to False.

Returns:

None

generate_quickcharts

def generate_quickcharts(
        resource: Union["Resource", Dict, str, int] = 0,
        path: Optional[str] = None,
        bites_disabled: Optional[ListTuple[bool]] = None,
        indicators: Optional[ListTuple[Dict]] = None,
        findreplace: Optional[Dict] = None) -> resource_view.ResourceView

[view_source]

Create QuickCharts for the given resource in a dataset. If you do not supply a path, then the internal indicators resource view template will be used. You can disable specific bites by providing bites_disabled, a list of 3 bools where True indicates a specific bite is disabled and False indicates leave enabled. The parameter indicators is a list with 3 dictionaries of form: {"code": "MY_INDICATOR_CODE", "title": "MY_INDICATOR_TITLE", "unit": "MY_INDICATOR_UNIT"}. Optionally, the following defaults can be overridden in the parameter indicators: {"code_col": "indicator+code", "value_col": "indicator+value+num", "date_col": "date+year", "date_format": "%Y", "aggregate_col": "null"}.

Creation of the resource view will be delayed until after the next dataset create or update if a resource id is not yet available and will be disabled if there are no valid charts to display.

Arguments:

  • resource Union[Resource,Dict,str,int] - Resource id or name, resource metadata from a Resource object or a dictionary, or position. Defaults to 0.
  • path Optional[str] - Path to YAML resource view metadata. Defaults to None (config/hdx_resource_view_static.yaml or internal template).
  • bites_disabled Optional[ListTuple[bool]] - Which QC bites should be disabled. Defaults to None (all bites enabled).
  • indicators Optional[ListTuple[Dict]] - Indicator codes, QC titles and units for resource view template. Defaults to None (don't use template).
  • findreplace Optional[Dict] - Replacements for anything else in resource view. Defaults to None.

Returns:

  • resource_view.ResourceView - The resource view if QuickCharts created, None if not
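
As a minimal sketch of the arguments described above, the following builds an indicators list and a bites_disabled list. The helper make_indicator and the indicator codes, titles and units are hypothetical and used only for illustration. The final call is left as a comment because it assumes an existing, configured Dataset object named dataset.

```python
from typing import Dict, List


def make_indicator(code: str, title: str, unit: str, **overrides: str) -> Dict[str, str]:
    """Build one indicator dictionary (hypothetical helper, not part of the library)."""
    indicator = {"code": code, "title": title, "unit": unit}
    # Optional overrides of the documented defaults, e.g. code_col, value_col,
    # date_col, date_format or aggregate_col
    indicator.update(overrides)
    return indicator


# Exactly 3 dictionaries, one per QuickCharts bite (values are made up)
indicators: List[Dict[str, str]] = [
    make_indicator("POP_TOTAL", "Total population", "people"),
    make_indicator("POP_URBAN", "Urban population", "people", date_format="%Y-%m"),
    make_indicator("POP_RURAL", "Rural population", "people"),
]

# True disables a bite, False leaves it enabled; here only the third is disabled
bites_disabled = [False, False, True]

# Assumed usage with an existing Dataset object named dataset:
# resource_view = dataset.generate_quickcharts(
#     resource=0, bites_disabled=bites_disabled, indicators=indicators)
```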

get_name_or_id

def get_name_or_id(prefer_name: bool = True) -> Optional[str]

[view_source]

Get dataset name or id, e.g. for use in urls. If prefer_name is True, name is preferred over id if available, otherwise id is preferred over name if available.

Arguments:

  • prefer_name bool - Whether name is preferred over id. Defaults to True.

Returns:

  • Optional[str] - HDX dataset id or name or None if not available
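
The preference rule described above can be sketched in plain Python. This is an illustration of the documented behaviour under the assumption that the metadata is a dictionary with optional name and id keys. It is not the library's implementation, and the function name_or_id is hypothetical.

```python
from typing import Dict, Optional


def name_or_id(data: Dict, prefer_name: bool = True) -> Optional[str]:
    """Return name or id following the documented preference (illustrative only)."""
    name = data.get("name")
    dataset_id = data.get("id")
    if prefer_name:
        # Fall back to the id when no name is available
        return name or dataset_id
    # Fall back to the name when no id is available
    return dataset_id or name
```

With both fields present, prefer_name decides which one is returned. With only one present, that one is returned regardless. With neither, the result is None.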

get_hdx_url

def get_hdx_url(prefer_name: bool = True) -> Optional[str]

[view_source]

Get the url of the dataset on HDX or None if the dataset name and id fields are missing. If prefer_name is True, name is preferred over id if available, otherwise id is preferred over name if available.

Arguments:

  • prefer_name bool - Whether name is preferred over id in url. Defaults to True.

Returns:

  • Optional[str] - Url of the dataset on HDX or None if the dataset is missing fields

get_api_url

def get_api_url(prefer_name: bool = True) -> Optional[str]

[view_source]

Get the API url of the dataset on HDX

Arguments:

  • prefer_name bool - Whether name is preferred over id in url. Defaults to True.

Returns:

  • Optional[str] - API url of the dataset on HDX or None if the dataset is missing fields

remove_dates_from_title

def remove_dates_from_title(
        change_title: bool = True,
        set_time_period: bool = False) -> List[Tuple[datetime, datetime]]

[view_source]

Remove dates from the dataset title, returning the date ranges that were found in the title in sorted order. The title in the dataset metadata will be changed by default. The dataset's time period metadata field will not be changed by default, but if set_time_period is True, then the range with the lowest start date will be used to set the time period field.

Arguments:

  • change_title bool - Whether to change the dataset title. Defaults to True.
  • set_time_period bool - Whether to set time period from date or range in title. Defaults to False.

Returns:

  • List[Tuple[datetime,datetime]] - Date ranges found in title
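
To illustrate the shape of the return value, the sketch below picks the range with the lowest start date from a list of (start, end) tuples, which is the range the description says is used when set_time_period is True. The ranges are made-up sample values, the helper earliest_range is hypothetical, and whether the returned datetimes are timezone-aware is an assumption here.

```python
from datetime import datetime, timezone
from typing import List, Tuple

DateRange = Tuple[datetime, datetime]


def earliest_range(ranges: List[DateRange]) -> DateRange:
    """Return the range with the lowest start date (hypothetical helper)."""
    return min(ranges, key=lambda date_range: date_range[0])


# Made-up example of what a title containing two years might yield
ranges: List[DateRange] = [
    (datetime(2020, 1, 1, tzinfo=timezone.utc), datetime(2020, 12, 31, tzinfo=timezone.utc)),
    (datetime(2018, 1, 1, tzinfo=timezone.utc), datetime(2018, 12, 31, tzinfo=timezone.utc)),
]

start, end = earliest_range(ranges)
```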

generate_resource_from_rows

def generate_resource_from_rows(folder: str,
                                filename: str,
                                rows: List[ListTupleDict],
                                resourcedata: Dict,
                                headers: Optional[ListTuple[str]] = None,
                                encoding: Optional[str] = None) -> "Resource"

[view_source]

Write rows to csv and create resource, adding it to the dataset. The headers argument is either a row number (rows start counting at 1), or the actual headers defined as a list of strings. If not set, all rows will be treated as containing values.

Arguments:

  • folder str - Folder to which to write file containing rows
  • filename str - Filename of file to write rows
  • rows List[ListTupleDict] - List of rows in dict or list form
  • resourcedata Dict - Resource data
  • headers Optional[ListTuple[str]] - List of headers. Defaults to None.
  • encoding Optional[str] - Encoding to use. Defaults to None (infer encoding).

Returns:

  • Resource - The created resource
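
A minimal sketch of how the arguments might be put together, assuming rows in list form and headers given as a list of strings. The sample data, the filename and the keys used in resourcedata (name and description) are assumptions for illustration. The wrapper add_rows_resource is hypothetical and expects an existing Dataset object.

```python
from typing import Any, Dict, List

# Made-up sample data in list form, with headers supplied separately
headers = ["Country", "Year", "Value"]
rows: List[List[Any]] = [
    ["Kenya", 2020, 10.5],
    ["Mali", 2021, 7.2],
]

# Assumed resource metadata keys
resourcedata: Dict[str, str] = {
    "name": "Example indicators",
    "description": "Example indicator values by country and year",
}


def add_rows_resource(dataset: Any, folder: str) -> Any:
    """Write the sample rows to csv in folder and add the resource to dataset."""
    return dataset.generate_resource_from_rows(
        folder, "example_indicators.csv", rows, resourcedata, headers=headers
    )
```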

generate_qc_resource_from_rows

def generate_qc_resource_from_rows(
        folder: str,
        filename: str,
        rows: List[Dict],
        resourcedata: Dict,
        hxltags: Dict[str, str],
        columnname: str,
        qc_identifiers: ListTuple[str],
        headers: Optional[ListTuple[str]] = None,
        encoding: Optional[str] = None) -> Optional["Resource"]

[view_source]

Generate QuickCharts rows by cutting down input rows by relevant identifiers and optionally restricting to certain columns. Output to csv and create resource, adding it to the dataset.

Arguments:

  • folder str - Folder to which to write file containing rows
  • filename str - Filename of file to write rows
  • rows List[Dict] - List of rows in dict form
  • resourcedata Dict - Resource data
  • hxltags Dict[str,str] - Header to HXL hashtag mapping
  • columnname str - Name of column containing identifier
  • qc_identifiers ListTuple[str] - List of ids to match
  • headers Optional[ListTuple[str]] - List of headers to output. Defaults to None (all headers).
  • encoding Optional[str] - Encoding to use. Defaults to None (infer encoding).

Returns:

  • Optional[Resource] - The created resource or None
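
The cutting down described above can be sketched as a filter over dictionary rows. This is an illustration only, not the library's implementation: it omits the HXL hashtag handling and the csv output, and the function cut_down_rows is hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def cut_down_rows(
    rows: List[Dict],
    columnname: str,
    qc_identifiers: Sequence[str],
    headers: Optional[Sequence[str]] = None,
) -> List[Dict]:
    """Keep rows whose identifier matches and optionally restrict the columns."""
    wanted = set(qc_identifiers)
    result = []
    for row in rows:
        # Skip rows whose identifier column is not one of the requested ids
        if row.get(columnname) not in wanted:
            continue
        if headers is None:
            result.append(dict(row))  # keep all columns
        else:
            result.append({header: row[header] for header in headers if header in row})
    return result
```

For example, with made-up rows, calling cut_down_rows(rows, "Code", ["A", "C"], headers=["Code", "Value"]) keeps only rows whose Code is A or C and drops every column other than Code and Value.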

generate_resource_from_iterable

def generate_resource_from_iterable(
        headers: ListTuple[str],
        iterable: Iterable[Union[ListTuple, Dict]],
        hxltags: Dict[str, str],
        folder: str,
        filename: str,
        resourcedata: Dict,
        datecol: Optional[Union[int, str]] = None,
        yearcol: Optional[Union[int, str]] = None,
        date_function: Optional[Callable[[Dict], Optional[Dict]]] = None,
        quickcharts: Optional[Dict] = None,
        encoding: Optional[str] = None) -> Tuple[bool, Dict]

[view_source]

Given headers and an iterable, write rows to csv and create resource, adding it to the dataset. The returned dictionary will contain the resource in the key resource, headers in the key headers and list of rows in the key rows.

The time period can optionally be set by supplying a column in which the date or year is to be looked up. Note that any timezone information is ignored and UTC assumed. Alternatively, a function can be supplied to handle any dates in a row. It should accept a row and should return None to ignore the row or a dictionary which can either be empty if there are no dates in the row or can be populated with keys startdate and/or enddate which are of type timezone-aware datetime. The lowest start date and highest end date are used to set the time period and are returned in the results dictionary in keys startdate and enddate.

If the parameter quickcharts is supplied then various QuickCharts related actions will occur depending upon the keys given in the dictionary and the returned dictionary will contain the QuickCharts resource in the key qc_resource. If the keys: hashtag - the HXL hashtag to examine - and values - the 3 values to look for in that column - are supplied, then a list of booleans indicating which QuickCharts bites should be enabled will be returned in the key bites_disabled in the returned dictionary. For the 3 values, if the key: numeric_hashtag is supplied then if that column for a given value contains no numbers, then the corresponding bite will be disabled. If the key: cutdown is given, if it is 1, then a separate cut down list is created containing only columns with HXL hashtags and rows with desired values (if hashtag and values are supplied) for the purpose of driving QuickCharts. It is returned in the key qcrows in the returned dictionary with the matching headers in qcheaders. If cutdown is 2, then a resource is created using the cut down list. If the key cutdownhashtags is supplied, then only the provided hashtags are used for cutting down otherwise the full list of HXL tags is used.

Arguments:

  • headers ListTuple[str] - Headers
  • iterable Iterable[Union[ListTuple,Dict]] - Iterable returning rows
  • hxltags Dict[str,str] - Header to HXL hashtag mapping
  • folder str - Folder to which to write file containing rows
  • filename str - Filename of file to write rows
  • resourcedata Dict - Resource data
  • datecol Optional[Union[int,str]] - Date column for setting time period. Defaults to None (don't set).
  • yearcol Optional[Union[int,str]] - Year column for setting dataset year range. Defaults to None (don't set).
  • date_function Optional[Callable[[Dict],Optional[Dict]]] - Date function to call for each row. Defaults to None.
  • quickcharts Optional[Dict] - Dictionary containing optional keys: hashtag, values, cutdown and/or cutdownhashtags
  • encoding Optional[str] - Encoding to use. Defaults to None (infer encoding).

Returns:

Tuple[bool, Dict]: (True if resource added, dictionary of results)
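
The date_function contract described above can be sketched as follows: return None to ignore a row, an empty dictionary if the row has no dates, or a dictionary with timezone-aware startdate and/or enddate values. The column names Status and Year and the rule for ignoring rows are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def date_function(row: Dict) -> Optional[Dict]:
    """Return None to skip the row, {} if it has no date, else start and end dates."""
    if row.get("Status") == "draft":  # hypothetical rule for ignoring a row
        return None
    year = row.get("Year")
    if not year:
        return {}  # row is kept but contributes no dates
    year = int(year)
    return {
        "startdate": datetime(year, 1, 1, tzinfo=timezone.utc),
        "enddate": datetime(year, 12, 31, 23, 59, 59, tzinfo=timezone.utc),
    }
```

Such a function would be passed as the date_function argument in place of datecol or yearcol.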

download_and_generate_resource

def download_and_generate_resource(
        downloader: BaseDownload,
        url: str,
        hxltags: Dict[str, str],
        folder: str,
        filename: str,
        resourcedata: Dict,
        header_insertions: Optional[ListTuple[Tuple[int, str]]] = None,
        row_function: Optional[Callable[[List[str], Dict], Dict]] = None,
        datecol: Optional[str] = None,
        yearcol: Optional[str] = None,
        date_function: Optional[Callable[[Dict], Optional[Dict]]] = None,
        quickcharts: Optional[Dict] = None,
        **kwargs: Any) -> Tuple[bool, Dict]

[view_source]

Download url, write rows to csv and create resource, adding it to the dataset. The returned dictionary will contain the resource in the key resource, headers in the key headers and list of rows in the key rows.

Optionally, headers can be inserted at specific positions using the header_insertions argument, a list of tuples of the form (position, header). If row_function is supplied, it is called for each row with the headers (prior to any insertions) and the row (which will be in dict or list form depending upon the dict_rows argument) and returns a modified row.

The time period can optionally be set by supplying a column in which the date or year is to be looked up. Note that any timezone information is ignored and UTC assumed. Alternatively, a function can be supplied to handle any dates in a row. It should accept a row and should return None to ignore the row or a dictionary which can either be empty if there are no dates in the row or can be populated with keys startdate and/or enddate which are of type timezone-aware datetime. The lowest start date and highest end date are used to set the time period and are returned in the results dictionary in keys startdate and enddate.

If the parameter quickcharts is supplied then various QuickCharts related actions will occur depending upon the keys given in the dictionary and the returned dictionary will contain the QuickCharts resource in the key qc_resource. If the keys: hashtag - the HXL hashtag to examine - and values - the 3 values to look for in that column - are supplied, then a list of booleans indicating which QuickCharts bites should be enabled will be returned in the key bites_disabled in the returned dictionary. For the 3 values, if the key: numeric_hashtag is supplied then if that column for a given value contains no numbers, then the corresponding bite will be disabled. If the key: cutdown is given, if it is 1, then a separate cut down list is created containing only columns with HXL hashtags and rows with desired values (if hashtag and values are supplied) for the purpose of driving QuickCharts. It is returned in the key qcrows in the returned dictionary with the matching headers in qcheaders. If cutdown is 2, then a resource is created using the cut down list. If the key cutdownhashtags is supplied, then only the provided hashtags are used for cutting down otherwise the full list of HXL tags is used.

Arguments:

  • downloader BaseDownload - A Download or Retrieve object
  • url str - URL to download
  • hxltags Dict[str,str] - Header to HXL hashtag mapping
  • folder str - Folder to which to write file containing rows
  • filename str - Filename of file to write rows
  • resourcedata Dict - Resource data
  • header_insertions Optional[ListTuple[Tuple[int,str]]] - List of (position, header) to insert. Defaults to None.
  • row_function Optional[Callable[[List[str],Dict],Dict]] - Function to call for each row. Defaults to None.
  • datecol Optional[str] - Date column for setting time period. Defaults to None (don't set).
  • yearcol Optional[str] - Year column for setting dataset year range. Defaults to None (don't set).
  • date_function Optional[Callable[[Dict],Optional[Dict]]] - Date function to call for each row. Defaults to None.
  • quickcharts Optional[Dict] - Dictionary containing optional keys: hashtag, values, cutdown and/or cutdownhashtags
  • **kwargs - Any additional args to pass to downloader.get_tabular_rows

Returns:

Tuple[bool, Dict]: (True if resource added, dictionary of results)
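
A minimal sketch of header_insertions, row_function and a quickcharts dictionary, assuming rows arrive in dict form. The column names, the lookup table, the hashtags and the indicator values are made up for illustration, and treating the insertion position as a zero-based index is an assumption.

```python
from typing import Dict, List

# Insert a new header "Region" at position 1 (assumed to be a zero-based index)
header_insertions = [(1, "Region")]

# Made-up lookup used to fill the inserted column
REGIONS = {"Kenya": "Eastern Africa", "Mali": "Western Africa"}


def row_function(headers: List[str], row: Dict) -> Dict:
    """Fill the inserted column; headers are those prior to any insertions."""
    row["Region"] = REGIONS.get(row.get("Country"), "")
    return row


# Optional keys as described above; the hashtags and values are hypothetical
quickcharts = {
    "hashtag": "#indicator+code",
    "values": ["POP_TOTAL", "POP_URBAN", "POP_RURAL"],
    "numeric_hashtag": "#indicator+value+num",
    "cutdown": 2,  # create a resource from the cut down list
}
```

These objects would be passed as the header_insertions, row_function and quickcharts arguments alongside a downloader and url.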