hdx.data.dataset
Dataset class containing all logic for creating, checking, and updating datasets and associated resources.
Dataset Objects
class Dataset(HDXObject)
Dataset class enabling operations on datasets and associated resources.
Arguments:
initial_data
Optional[Dict] - Initial dataset metadata dictionary. Defaults to None.
configuration
Optional[Configuration] - HDX configuration. Defaults to global configuration.
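For example, a minimal sketch of constructing a Dataset from initial metadata (the name and title shown are hypothetical, and a global Configuration is assumed to have been created already):
from hdx.data.dataset import Dataset

dataset = Dataset(
    {
        "name": "hypothetical-dataset",  # hypothetical dataset name
        "title": "Hypothetical Dataset",
    }
)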
actions
@staticmethod
def actions() -> Dict[str, str]
Dictionary of actions that can be performed on object
Returns:
Dict[str, str]: Dictionary of actions that can be performed on object
__setitem__
def __setitem__(key: Any, value: Any) -> None
Set dictionary items but do not allow setting of resources
Arguments:
key
Any - Key in dictionary
value
Any - Value to put in dictionary
Returns:
None
separate_resources
def separate_resources() -> None
Move contents of resources key in internal dictionary into self.resources
Returns:
None
unseparate_resources
def unseparate_resources() -> None
Move self.resources into resources key in internal dictionary
Returns:
None
get_dataset_dict
def get_dataset_dict() -> Dict
Return the dataset dictionary, moving self.resources into the resources key in the internal dictionary
Returns:
Dict
- Dataset dictionary
save_to_json
def save_to_json(path: str, follow_urls: bool = False)
Save dataset to JSON. If follow_urls is True, resource urls that point to datasets, HXL proxy urls etc. are followed to retrieve final urls.
Arguments:
path
str - Path to save dataset
follow_urls
bool - Whether to follow urls. Defaults to False.
Returns:
None
load_from_json
@staticmethod
def load_from_json(path: str) -> Optional["Dataset"]
Load dataset from JSON
Arguments:
path
str - Path to load dataset
Returns:
Optional[Dataset]
- Dataset created from JSON or None
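A round-trip sketch using save_to_json and load_from_json (the path is hypothetical):
# Save the dataset, following resource urls to their final targets
dataset.save_to_json("saved_dataset.json", follow_urls=True)  # hypothetical path

# Load it back; None is returned if the dataset could not be loaded
dataset = Dataset.load_from_json("saved_dataset.json")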
init_resources
def init_resources() -> None
Initialise self.resources list
Returns:
None
add_update_resource
def add_update_resource(resource: Union["Resource", Dict, str],
ignore_datasetid: bool = False) -> None
Add new or update existing resource in dataset with new metadata
Arguments:
resource
Union[Resource,Dict,str] - Either resource id or resource metadata from a Resource object or a dictionary
ignore_datasetid
bool - Whether to ignore dataset id in the resource
Returns:
None
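A sketch of adding a resource from a metadata dictionary (all field values are hypothetical; a Resource object or a resource id string could be passed instead):
resource_data = {
    "name": "mydata.csv",  # hypothetical resource name
    "description": "My hypothetical data",
    "url": "https://example.com/mydata.csv",  # hypothetical url
    "format": "csv",
}
dataset.add_update_resource(resource_data)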
add_update_resources
def add_update_resources(resources: ListTuple[Union["Resource", Dict, str]],
ignore_datasetid: bool = False) -> None
Add new resources to the dataset or update existing resources with new metadata
Arguments:
resources
ListTuple[Union[Resource,Dict,str]] - A list of either resource ids or resources metadata from either Resource objects or dictionaries
ignore_datasetid
bool - Whether to ignore dataset id in the resource. Defaults to False.
Returns:
None
delete_resource
def delete_resource(resource: Union["Resource", Dict, str],
delete: bool = True) -> bool
Delete a resource from the dataset and also from HDX by default
Arguments:
resource
Union[Resource,Dict,str] - Either resource id or resource metadata from a Resource object or a dictionary
delete
bool - Whether to delete the resource from HDX (not just the dataset). Defaults to True.
Returns:
bool
- True if resource removed or False if not
get_resources
def get_resources() -> List["Resource"]
Get dataset's resources
Returns:
List[Resource]
- List of Resource objects
get_resource
def get_resource(index: int = 0) -> "Resource"
Get one resource from dataset by index
Arguments:
index
int - Index of resource in dataset. Defaults to 0.
Returns:
Resource
- Resource object
number_of_resources
def number_of_resources() -> int
Get number of dataset's resources
Returns:
int
- Number of Resource objects
reorder_resources
def reorder_resources(resource_ids: ListTuple[str],
hxl_update: bool = True) -> None
Reorder resources in dataset according to provided list. Resources are updated in the dataset object to match new order. However, the dataset is not refreshed by rereading from HDX. If only some resource ids are supplied then these are assumed to be first and the other resources will stay in their original order.
Arguments:
resource_ids
ListTuple[str] - List of resource ids
hxl_update
bool - Whether to call package_hxl_update. Defaults to True.
Returns:
None
update_from_yaml
def update_from_yaml(path: str = join("config",
"hdx_dataset_static.yaml")) -> None
Update dataset metadata with static metadata from YAML file
Arguments:
path
str - Path to YAML dataset metadata. Defaults to config/hdx_dataset_static.yaml.
Returns:
None
update_from_json
def update_from_json(path: str = join("config",
"hdx_dataset_static.json")) -> None
Update dataset metadata with static metadata from JSON file
Arguments:
path
str - Path to JSON dataset metadata. Defaults to config/hdx_dataset_static.json.
Returns:
None
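A sketch of applying static metadata (the custom path is hypothetical):
# Use the default config/hdx_dataset_static.yaml
dataset.update_from_yaml()

# Or point at a custom file
dataset.update_from_yaml(path="metadata/hypothetical_static.yaml")  # hypothetical path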
read_from_hdx
@staticmethod
def read_from_hdx(
identifier: str,
configuration: Optional[Configuration] = None) -> Optional["Dataset"]
Reads the dataset given by identifier from HDX and returns Dataset object
Arguments:
identifier
str - Identifier of dataset
configuration
Optional[Configuration] - HDX configuration. Defaults to global configuration.
Returns:
Optional[Dataset]
- Dataset object if successful read, None if not
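A sketch of reading a dataset and handling the None return (the identifier is hypothetical):
dataset = Dataset.read_from_hdx("hypothetical-dataset")
if dataset is None:
    print("No such dataset on HDX")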
check_required_fields
def check_required_fields(ignore_fields: ListTuple[str] = tuple(),
allow_no_resources: bool = False,
**kwargs: Any) -> None
Check that metadata for dataset and its resources is complete. The parameter ignore_fields should be set if required to any fields that should be ignored for the particular operation. Prepend "resource:" for resource fields.
Arguments:
ignore_fields
ListTuple[str] - Fields to ignore. Default is tuple().
allow_no_resources
bool - Whether to allow no resources. Defaults to False.
Returns:
None
revise
@staticmethod
def revise(match: Dict[str, Any],
filter: ListTuple[str] = tuple(),
update: Dict[str, Any] = dict(),
files_to_upload: Dict[str, str] = dict(),
configuration: Optional[Configuration] = None,
**kwargs: Any) -> "Dataset"
Revises an HDX dataset in HDX
Arguments:
match
Dict[str,Any] - Metadata on which to match dataset
filter
ListTuple[str] - Filters to apply. Defaults to tuple().
update
Dict[str,Any] - Metadata updates to apply. Defaults to dict().
files_to_upload
Dict[str,str] - Files to upload to HDX. Defaults to dict().
configuration
Optional[Configuration] - HDX configuration. Defaults to global configuration.
**kwargs
- Additional arguments to pass to package_revise
Returns:
Dataset
- Dataset object
update_in_hdx
def update_in_hdx(update_resources: bool = True,
match_resources_by_metadata: bool = True,
keys_to_delete: ListTuple[str] = tuple(),
remove_additional_resources: bool = False,
match_resource_order: bool = False,
create_default_views: bool = True,
hxl_update: bool = True,
**kwargs: Any) -> None
Check if the dataset exists in HDX and, if so, update it. match_resources_by_metadata matches resources by id where available; otherwise it matches by name alone if names are unique, or by name and format if not.
Arguments:
update_resources
bool - Whether to update resources. Defaults to True.
match_resources_by_metadata
bool - Compare resource metadata rather than position in list. Defaults to True.
keys_to_delete
ListTuple[str] - List of top level metadata keys to delete. Defaults to tuple().
remove_additional_resources
bool - Remove additional resources found in dataset. Defaults to False.
match_resource_order
bool - Match order of given resources by name. Defaults to False.
create_default_views
bool - Whether to call package_create_default_resource_views. Defaults to True.
hxl_update
bool - Whether to call package_hxl_update. Defaults to True.
**kwargs
- See below
updated_by_script
str - String to identify your script. Defaults to your user agent.
batch
str - A string you can specify to show which datasets are part of a single batch update
Returns:
None
create_in_hdx
def create_in_hdx(allow_no_resources: bool = False,
update_resources: bool = True,
match_resources_by_metadata: bool = True,
keys_to_delete: ListTuple[str] = tuple(),
remove_additional_resources: bool = False,
match_resource_order: bool = False,
create_default_views: bool = True,
hxl_update: bool = True,
**kwargs: Any) -> None
Check if the dataset exists in HDX and, if so, update it; otherwise create it. match_resources_by_metadata matches resources by id where available; otherwise it matches by name alone if names are unique, or by name and format if not.
Arguments:
allow_no_resources
bool - Whether to allow no resources. Defaults to False.
update_resources
bool - Whether to update resources (if updating). Defaults to True.
match_resources_by_metadata
bool - Compare resource metadata rather than position in list. Defaults to True.
keys_to_delete
ListTuple[str] - List of top level metadata keys to delete. Defaults to tuple().
remove_additional_resources
bool - Remove additional resources found in dataset (if updating). Defaults to False.
match_resource_order
bool - Match order of given resources by name. Defaults to False.
create_default_views
bool - Whether to call package_create_default_resource_views (if updating). Defaults to True.
hxl_update
bool - Whether to call package_hxl_update. Defaults to True.
**kwargs
- See below
updated_by_script
str - String to identify your script. Defaults to your user agent.
batch
str - A string you can specify to show which datasets are part of a single batch update
Returns:
None
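A sketch of a typical create/update call using the kwargs documented above (the script identifier and batch string are hypothetical):
dataset.create_in_hdx(
    updated_by_script="HypotheticalScript",  # hypothetical script identifier
    batch="hypothetical-batch-id",  # hypothetical batch string
)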
delete_from_hdx
def delete_from_hdx() -> None
Deletes a dataset from HDX.
Returns:
None
hxl_update
def hxl_update() -> None
Checks dataset for HXL in resources and updates tags and other metadata to trigger HXL preview.
Returns:
None
search_in_hdx
@classmethod
def search_in_hdx(cls,
query: Optional[str] = "*:*",
configuration: Optional[Configuration] = None,
page_size: int = 1000,
**kwargs: Any) -> List["Dataset"]
Searches for datasets in HDX
Arguments:
query
Optional[str] - Query (in Solr format). Defaults to '*:*'.
configuration
Optional[Configuration] - HDX configuration. Defaults to global configuration.
page_size
int - Size of page to return. Defaults to 1000.
**kwargs
- See below
fq
string - Any filter queries to apply
rows
int - Number of matching rows to return. Defaults to all datasets (sys.maxsize).
start
int - Offset in the complete result for where the set of returned datasets should begin
sort
string - Sorting of results. Defaults to 'relevance asc, metadata_modified desc' if rows<=page_size or 'metadata_modified asc' if rows>page_size.
facet
string - Whether to enable faceted results. Defaults to True.
facet.mincount
int - Minimum counts for facet fields that should be included in the results
facet.limit
int - Maximum number of values the facet fields return (-1 = unlimited). Defaults to 50.
facet.field
List[str] - Fields to facet upon. Default is empty.
use_default_schema
bool - Use default package schema instead of custom schema. Defaults to False.
Returns:
List[Dataset]
- list of datasets resulting from query
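A sketch of searching and then collecting the resources of the matching datasets via get_all_resources (described below); the query and filter query are hypothetical Solr expressions:
datasets = Dataset.search_in_hdx("health", fq="groups:afg", rows=10)  # hypothetical query
resources = Dataset.get_all_resources(datasets)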
get_all_dataset_names
@staticmethod
def get_all_dataset_names(configuration: Optional[Configuration] = None,
**kwargs: Any) -> List[str]
Get all dataset names in HDX
Arguments:
configuration
Optional[Configuration] - HDX configuration. Defaults to global configuration.
**kwargs
- See below
rows
int - Number of rows to return. Defaults to all datasets (sys.maxsize).
start
int - Offset in the complete result for where the set of returned dataset names should begin
Returns:
List[str]
- list of all dataset names in HDX
get_all_datasets
@classmethod
def get_all_datasets(cls,
configuration: Optional[Configuration] = None,
page_size: int = 1000,
**kwargs: Any) -> List["Dataset"]
Get all datasets from HDX (just calls search_in_hdx)
Arguments:
configuration
Optional[Configuration] - HDX configuration. Defaults to global configuration.
page_size
int - Size of page to return. Defaults to 1000.
**kwargs
- See below
fq
string - Any filter queries to apply
rows
int - Number of matching rows to return. Defaults to all datasets (sys.maxsize).
start
int - Offset in the complete result for where the set of returned datasets should begin
sort
string - Sorting of results. Defaults to 'metadata_modified asc'.
facet
string - Whether to enable faceted results. Defaults to True.
facet.mincount
int - Minimum counts for facet fields that should be included in the results
facet.limit
int - Maximum number of values the facet fields return (-1 = unlimited). Defaults to 50.
facet.field
List[str] - Fields to facet upon. Default is empty.
use_default_schema
bool - Use default package schema instead of custom schema. Defaults to False.
Returns:
List[Dataset]
- list of datasets resulting from query
get_all_resources
@staticmethod
def get_all_resources(datasets: ListTuple["Dataset"]) -> List["Resource"]
Get all resources from a list of datasets (such as returned by search)
Arguments:
datasets
ListTuple[Dataset] - list of datasets
Returns:
List[Resource]
- list of resources within those datasets
autocomplete
@classmethod
def autocomplete(cls,
name: str,
limit: int = 20,
configuration: Optional[Configuration] = None) -> List
Autocomplete a dataset name and return matches
Arguments:
name
str - Name to autocomplete
limit
int - Maximum number of matches to return. Defaults to 20.
configuration
Optional[Configuration] - HDX configuration. Defaults to global configuration.
Returns:
List
- Autocomplete matches
get_reference_period
def get_reference_period(date_format: Optional[str] = None,
today: datetime = now_utc()) -> Dict
Get dataset date as datetimes and strings in specified format. If no format is supplied, the ISO 8601 format is used. Returns a dictionary containing keys startdate (start date as datetime), enddate (end date as datetime), startdate_str (start date as string), enddate_str (end date as string) and ongoing (whether the end date rolls forward every day).
Arguments:
date_format
Optional[str] - Date format. None is taken to be ISO 8601. Defaults to None.
today
datetime - Date to use for today. Defaults to now_utc().
Returns:
Dict
- Dictionary of date information
set_reference_period
def set_reference_period(startdate: Union[datetime, str],
enddate: Union[datetime, str, None] = None,
ongoing: bool = False,
ignore_timeinfo: bool = True) -> None
Set reference period from either datetime objects or strings. Any time and time zone information will be ignored by default (meaning that the time of the start date is set to 00:00:00, the time of any end date is set to 23:59:59 and the time zone is set to UTC). To have the time and time zone accounted for, set ignore_timeinfo to False. In this case, the time will be converted to UTC.
Arguments:
startdate
Union[datetime, str] - Dataset start date
enddate
Union[datetime, str, None] - Dataset end date. Defaults to None.
ongoing
bool - True if ongoing, False if not. Defaults to False.
ignore_timeinfo
bool - Ignore time and time zone of date. Defaults to True.
Returns:
None
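A sketch combining set_reference_period and get_reference_period (the dates are hypothetical):
dataset.set_reference_period("2023-01-01", "2023-12-31")  # hypothetical dates
period = dataset.get_reference_period(date_format="%Y-%m-%d")
print(period["startdate_str"], period["enddate_str"], period["ongoing"])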
set_reference_period_year_range
def set_reference_period_year_range(
dataset_year: Union[str, int, Iterable],
dataset_end_year: Optional[Union[str, int]] = None) -> List[int]
Set reference period as a range from year or start and end year.
Arguments:
dataset_year
Union[str, int, Iterable] - Dataset year given as string or int or range in an iterable
dataset_end_year
Optional[Union[str, int]] - Dataset end year given as string or int
Returns:
List[int]
- The start and end year if supplied or sorted list of years
list_valid_update_frequencies
@classmethod
def list_valid_update_frequencies(cls) -> List[str]
List of valid update frequency values
Returns:
List[str]
- Allowed update frequencies
transform_update_frequency
@classmethod
def transform_update_frequency(cls, frequency: Union[str,
int]) -> Optional[str]
Get numeric update frequency (as string since that is the required field format) from textual representation or vice versa (e.g. 'Every month' = '30'; '30' or 30 = 'Every month')
Arguments:
frequency
Union[str, int] - Update frequency in one format
Returns:
Optional[str]
- Update frequency in alternative format or None if not valid
get_expected_update_frequency
def get_expected_update_frequency() -> Optional[str]
Get expected update frequency (in textual rather than numeric form)
Returns:
Optional[str]
- Update frequency in textual form or None if the update frequency doesn't exist or is blank.
set_expected_update_frequency
def set_expected_update_frequency(update_frequency: Union[str, int]) -> None
Set expected update frequency. You can pass frequencies like "Every week" or '7' or 7. Valid values for update frequency can be found from Dataset.list_valid_update_frequencies().
Arguments:
update_frequency
Union[str, int] - Update frequency
Returns:
None
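A sketch showing the frequency helpers together:
print(Dataset.list_valid_update_frequencies())  # allowed values
print(Dataset.transform_update_frequency("30"))  # 'Every month'
dataset.set_expected_update_frequency("Every month")
print(dataset.get_expected_update_frequency())  # 'Every month'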
get_tags
def get_tags() -> List[str]
Return the dataset's list of tags
Returns:
List[str]
- list of tags or [] if there are none
add_tag
def add_tag(tag: str, log_deleted: bool = True) -> Tuple[List[str], List[str]]
Add a tag
Arguments:
tag
str - Tag to add
log_deleted
bool - Whether to log informational messages about deleted tags. Defaults to True.
Returns:
Tuple[List[str], List[str]]: Tuple containing list of added tags and list of deleted tags and tags not added
add_tags
def add_tags(tags: ListTuple[str],
log_deleted: bool = True) -> Tuple[List[str], List[str]]
Add a list of tags
Arguments:
tags
ListTuple[str] - List of tags to add
log_deleted
bool - Whether to log informational messages about deleted tags. Defaults to True.
Returns:
Tuple[List[str], List[str]]: Tuple containing list of added tags and list of deleted tags and tags not added
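A sketch of adding tags and inspecting the result (the tag names are hypothetical; tags outside HDX's approved list may be mapped or dropped):
added, deleted_or_not_added = dataset.add_tags(["health", "education"])  # hypothetical tags
print(dataset.get_tags())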
clean_tags
def clean_tags(log_deleted: bool = True) -> Tuple[List[str], List[str]]
Clean tags in an HDX object according to tags cleanup spreadsheet, deleting invalid tags that cannot be mapped
Arguments:
log_deleted
bool - Whether to log informational messages about deleted tags. Defaults to True.
Returns:
Tuple[List[str], List[str]]: Tuple containing list of mapped tags and list of deleted tags and tags not added
remove_tag
def remove_tag(tag: str) -> bool
Remove a tag
Arguments:
tag
str - Tag to remove
Returns:
bool
- True if tag removed or False if not
is_subnational
def is_subnational() -> bool
Return if the dataset is subnational
Returns:
bool
- True if the dataset is subnational, False if not
set_subnational
def set_subnational(subnational: bool) -> None
Set if dataset is subnational or national
Arguments:
subnational
bool - True for subnational, False for national
Returns:
None
get_location_iso3s
def get_location_iso3s(
locations: Optional[ListTuple[str]] = None) -> List[str]
Return the dataset's locations as ISO 3 country codes
Arguments:
locations
Optional[ListTuple[str]] - Valid locations list. Defaults to list downloaded from HDX.
Returns:
List[str]
- list of location iso3s
get_location_names
def get_location_names(
locations: Optional[ListTuple[str]] = None) -> List[str]
Return the dataset's location names
Arguments:
locations
Optional[ListTuple[str]] - Valid locations list. Defaults to list downloaded from HDX.
Returns:
List[str]
- list of location names
add_country_location
def add_country_location(country: str,
exact: bool = True,
locations: Optional[ListTuple[str]] = None,
use_live: bool = True) -> bool
Add a country. If an iso 3 code is not provided, value is parsed and if it is a valid country name, converted to an iso 3 code. If the country is already added, it is ignored.
Arguments:
country
str - Country to add
exact
bool - True for exact matching or False to allow fuzzy matching. Defaults to True.
locations
Optional[ListTuple[str]] - Valid locations list. Defaults to list downloaded from HDX.
use_live
bool - Try to use the latest country data from the web rather than the file in the package. Defaults to True.
Returns:
bool
- True if country added or False if country already present
add_country_locations
def add_country_locations(countries: ListTuple[str],
locations: Optional[ListTuple[str]] = None,
use_live: bool = True) -> bool
Add a list of countries. If iso 3 codes are not provided, values are parsed and where they are valid country names, converted to iso 3 codes. If any country is already added, it is ignored.
Arguments:
countries
ListTuple[str] - List of countries to add
locations
Optional[ListTuple[str]] - Valid locations list. Defaults to list downloaded from HDX.
use_live
bool - Try to use the latest country data from the web rather than the file in the package. Defaults to True.
Returns:
bool
- True if all countries added or False if any already present.
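A sketch of the country location helpers (the codes and names are illustrative):
dataset.add_country_location("AFG")  # ISO 3 code
dataset.add_country_locations(["State of Palestine", "SYR"])  # names are converted to ISO 3 codes
print(dataset.get_location_names())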
add_region_location
def add_region_location(region: str,
locations: Optional[ListTuple[str]] = None,
use_live: bool = True) -> bool
Add all countries in a region. If a 3 digit UNStats M49 region code is not provided, value is parsed as a region name. If any country is already added, it is ignored.
Arguments:
region
str - M49 region, intermediate region or subregion to add
locations
Optional[ListTuple[str]] - Valid locations list. Defaults to list downloaded from HDX.
use_live
bool - Try to use the latest country data from the web rather than the file in the package. Defaults to True.
Returns:
bool
- True if all countries in region added or False if any already present.
add_other_location
def add_other_location(location: str,
exact: bool = True,
alterror: Optional[str] = None,
locations: Optional[ListTuple[str]] = None) -> bool
Add a location which is not a country or region. Value is parsed and compared to existing locations in HDX. If the location is already added, it is ignored.
Arguments:
location
str - Location to add
exact
bool - True for exact matching or False to allow fuzzy matching. Defaults to True.
alterror
Optional[str] - Alternative error message to builtin if location not found. Defaults to None.
locations
Optional[ListTuple[str]] - Valid locations list. Defaults to list downloaded from HDX.
Returns:
bool
- True if location added or False if location already present
remove_location
def remove_location(location: str) -> bool
Remove a location. If the location is not present, it is ignored.
Arguments:
location
str - Location to remove
Returns:
bool
- True if location removed or False if not
get_maintainer
def get_maintainer() -> "User"
Get the dataset's maintainer.
Returns:
User
- Dataset's maintainer
set_maintainer
def set_maintainer(maintainer: Union["User", Dict, str]) -> None
Set the dataset's maintainer.
Arguments:
maintainer
Union[User,Dict,str] - Either a user id or User metadata from a User object or dictionary.
Returns:
None
get_organization
def get_organization() -> "Organization"
Get the dataset's organization.
Returns:
Organization
- Dataset's organization
set_organization
def set_organization(organization: Union["Organization", Dict, str]) -> None
Set the dataset's organization.
Arguments:
organization
Union[Organization,Dict,str] - Either an Organization id or Organization metadata from an Organization object or dictionary.
Returns:
None
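A sketch of setting the maintainer and organization (both identifiers are hypothetical):
dataset.set_maintainer("196196be-6037-4488-8b71-d786adf4c081")  # hypothetical user id
dataset.set_organization("hypothetical-org-id")  # hypothetical organization id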
get_showcases
def get_showcases() -> List["Showcase"]
Get any showcases the dataset is in
Returns:
List[Showcase]
- List of showcases
add_showcase
def add_showcase(showcase: Union["Showcase", Dict, str],
showcases_to_check: ListTuple["Showcase"] = None) -> bool
Add dataset to showcase
Arguments:
showcase
Union[Showcase,Dict,str] - Either a showcase id or showcase metadata from a Showcase object or dictionary
showcases_to_check
ListTuple[Showcase] - List of showcases against which to check existence of showcase. Defaults to showcases containing dataset.
Returns:
bool
- True if the showcase was added, False if already present
add_showcases
def add_showcases(showcases: ListTuple[Union["Showcase", Dict, str]],
showcases_to_check: ListTuple["Showcase"] = None) -> bool
Add dataset to multiple showcases
Arguments:
showcases
ListTuple[Union[Showcase,Dict,str]] - A list of either showcase ids or showcase metadata from Showcase objects or dictionaries
showcases_to_check
ListTuple[Showcase] - list of showcases against which to check existence of showcase. Defaults to showcases containing dataset.
Returns:
bool
- True if all showcases added or False if any already present
remove_showcase
def remove_showcase(showcase: Union["Showcase", Dict, str]) -> None
Remove dataset from showcase
Arguments:
showcase
Union[Showcase,Dict,str] - Either a showcase id string or showcase metadata from a Showcase object or dictionary
Returns:
None
is_requestable
def is_requestable() -> bool
Return whether the dataset is requestable or not
Returns:
bool
- Whether the dataset is requestable or not
set_requestable
def set_requestable(requestable: bool = True) -> None
Set the dataset to be of type requestable or not
Arguments:
requestable
bool - Set whether dataset is requestable. Defaults to True.
Returns:
None
get_fieldnames
def get_fieldnames() -> List[str]
Return list of fieldnames in your data. Only applicable to requestable datasets.
Returns:
List[str]
- List of field names
add_fieldname
def add_fieldname(fieldname: str) -> bool
Add a fieldname to list of fieldnames in your data. Only applicable to requestable datasets.
Arguments:
fieldname
str - Fieldname to add
Returns:
bool
- True if fieldname added or False if fieldname already present
add_fieldnames
def add_fieldnames(fieldnames: ListTuple[str]) -> bool
Add a list of fieldnames to list of fieldnames in your data. Only applicable to requestable datasets.
Arguments:
fieldnames
ListTuple[str] - List of fieldnames to add
Returns:
bool
- True if all fieldnames added or False if any already present
remove_fieldname
def remove_fieldname(fieldname: str) -> bool
Remove a fieldname. Only applicable to requestable datasets.
Arguments:
fieldname
str - Fieldname to remove
Returns:
bool
- True if fieldname removed or False if not
get_filetypes
def get_filetypes() -> List[str]
Return list of filetypes in your data
Returns:
List[str]
- List of filetypes
add_filetype
def add_filetype(filetype: str) -> bool
Add a filetype to list of filetypes in your data. Only applicable to requestable datasets.
Arguments:
filetype
str - filetype to add
Returns:
bool
- True if filetype added or False if filetype already present
add_filetypes
def add_filetypes(filetypes: ListTuple[str]) -> bool
Add a list of filetypes to list of filetypes in your data. Only applicable to requestable datasets.
Arguments:
filetypes
ListTuple[str] - list of filetypes to add
Returns:
bool
- True if all filetypes added or False if any already present
remove_filetype
def remove_filetype(filetype: str) -> bool
Remove a filetype
Arguments:
filetype
str - Filetype to remove
Returns:
bool
- True if filetype removed or False if not
preview_off
def preview_off() -> None
Set dataset preview off
Returns:
None
preview_resource
def preview_resource() -> None
Set dataset preview on for an unspecified resource
Returns:
None
set_quickchart_resource
def set_quickchart_resource(
resource: Union["Resource", Dict, str, int]) -> "Resource"
Set the resource that will be used for displaying QuickCharts in dataset preview
Arguments:
resource
Union[Resource,Dict,str,int] - Either resource id or name, resource metadata from a Resource object or a dictionary or position
Returns:
Resource
- Resource that is used for preview or None if no preview set
quickcharts_resource_last
def quickcharts_resource_last() -> bool
Move the QuickCharts resource to be last. Assumes that its name begins with 'QuickCharts-'.
Returns:
bool
- True if QuickCharts resource found, False if not
create_default_views
def create_default_views(create_datastore_views: bool = False) -> None
Create default resource views for all resources in dataset
Arguments:
create_datastore_views
bool - Whether to try to create resource views that point to the datastore
Returns:
None
generate_quickcharts
def generate_quickcharts(
resource: Union["Resource", Dict, str, int] = 0,
path: Optional[str] = None,
bites_disabled: Optional[ListTuple[bool]] = None,
indicators: Optional[ListTuple[Dict]] = None,
findreplace: Optional[Dict] = None) -> resource_view.ResourceView
Create QuickCharts for the given resource in a dataset. If you do
not supply a path, then the internal indicators resource view template
will be used. You can disable specific bites by providing
bites_disabled, a list of 3 bools where True indicates a specific bite
is disabled and False indicates leave enabled. The parameter indicators
is a list with 3 dictionaries of form:
{"code": "MY_INDICATOR_CODE", "title": "MY_INDICATOR_TITLE",
"unit": "MY_INDICATOR_UNIT"}. Optionally, the following defaults can be
overridden in the parameter indicators: {"code_col": "indicator
+code",
"value_col": "indicator
+value+num", "date_col": "date
+year",
"date_format": "%Y", "aggregate_col": "null"}.
Creation of the resource view will be delayed until after the next dataset create or update if a resource id is not yet available and will be disabled if there are no valid charts to display.
Arguments:
resource
Union[Resource,Dict,str,int] - Either resource id or name, resource metadata from a Resource object or a dictionary or position. Defaults to 0.
path
Optional[str] - Path to YAML resource view metadata. Defaults to None (config/hdx_resource_view_static.yaml or internal template).
bites_disabled
Optional[ListTuple[bool]] - Which QC bites should be disabled. Defaults to None (all bites enabled).
indicators
Optional[ListTuple[Dict]] - Indicator codes, QC titles and units for resource view template. Defaults to None (don't use template).
findreplace
Optional[Dict] - Replacements for anything else in resource view. Defaults to None.
Returns:
resource_view.ResourceView
- The resource view if QuickCharts created, None if not
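A sketch using the indicators template described above (the codes, titles and units are hypothetical):
resource_view = dataset.generate_quickcharts(
    resource=0,
    indicators=[
        {"code": "IND1", "title": "Indicator one", "unit": "count"},  # hypothetical
        {"code": "IND2", "title": "Indicator two", "unit": "%"},  # hypothetical
        {"code": "IND3", "title": "Indicator three", "unit": "count"},  # hypothetical
    ],
)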
get_name_or_id
def get_name_or_id(prefer_name: bool = True) -> Optional[str]
Get dataset name or id, e.g. for use in urls. If prefer_name is True, name is preferred over id if available; otherwise id is preferred over name if available.
Arguments:
prefer_name
bool - Whether name is preferred over id. Defaults to True.
Returns:
Optional[str]
- HDX dataset id or name or None if not available
get_hdx_url
def get_hdx_url(prefer_name: bool = True) -> Optional[str]
Get the url of the dataset on HDX or None if the dataset name and id fields are missing. If prefer_name is True, name is preferred over id if available, otherwise id is preferred over name if available.
Arguments:
prefer_name
bool - Whether name is preferred over id in url. Defaults to True.
Returns:
Optional[str]
- Url of the dataset on HDX or None if the dataset is missing fields
get_api_url
def get_api_url(prefer_name: bool = True) -> Optional[str]
Get the API url of the dataset on HDX
Arguments:
prefer_name
bool - Whether name is preferred over id in url. Defaults to True.
Returns:
Optional[str]
- API url of the dataset on HDX or None if the dataset is missing fields
remove_dates_from_title
def remove_dates_from_title(
change_title: bool = True,
set_reference_period: bool = False) -> List[Tuple[datetime, datetime]]
Remove dates from the dataset title, returning the sorted date ranges found in the title. The title in the dataset metadata will be changed by default. The dataset's reference period metadata field will not be changed by default, but if set_reference_period is True, then the range with the lowest start date will be used to set the reference period field.
Arguments:
change_title
bool - Whether to change the dataset title. Defaults to True.
set_reference_period
bool - Whether to set reference period from date or range in title. Defaults to False.
Returns:
List[Tuple[datetime,datetime]]
- Date ranges found in title
generate_resource_from_rows
def generate_resource_from_rows(folder: str,
filename: str,
rows: List[ListTupleDict],
resourcedata: Dict,
headers: Optional[ListTuple[str]] = None,
encoding: Optional[str] = None) -> "Resource"
Write rows to csv and create resource, adding it to the dataset. The headers argument is either a row number (rows start counting at 1), or the actual headers defined as a list of strings. If not set, all rows will be treated as containing values.
Arguments:
folder
str - Folder to which to write file containing rows
filename
str - Filename of file to write rows
rows
List[ListTupleDict] - List of rows in dict or list form
resourcedata
Dict - Resource data
headers
Optional[ListTuple[str]] - List of headers. Defaults to None.
encoding
Optional[str] - Encoding to use. Defaults to None (infer encoding).
Returns:
Resource
- The created resource
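A sketch of writing rows to csv and attaching the result as a resource (folder, filename, headers and values are hypothetical):
rows = [["AFG", 38000000], ["PSE", 5000000]]  # hypothetical values
resource = dataset.generate_resource_from_rows(
    folder="output",  # hypothetical folder
    filename="population.csv",  # hypothetical filename
    rows=rows,
    resourcedata={"name": "population.csv", "description": "Hypothetical population data"},
    headers=["iso3", "population"],
)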
generate_qc_resource_from_rows
def generate_qc_resource_from_rows(
folder: str,
filename: str,
rows: List[Dict],
resourcedata: Dict,
hxltags: Dict[str, str],
columnname: str,
qc_identifiers: ListTuple[str],
headers: Optional[ListTuple[str]] = None,
encoding: Optional[str] = None) -> Optional["Resource"]
Generate QuickCharts rows by cutting down input rows by relevant identifiers and optionally restricting to certain columns. Output to csv and create resource, adding it to the dataset.
Arguments:
folder
str - Folder to which to write file containing rows
filename
str - Filename of file to write rows
rows
List[Dict] - List of rows in dict form
resourcedata
Dict - Resource data
hxltags
Dict[str,str] - Header to HXL hashtag mapping
columnname
str - Name of column containing identifier
qc_identifiers
ListTuple[str] - List of ids to match
headers
Optional[ListTuple[str]] - List of headers to output. Defaults to None (all headers).
encoding
Optional[str] - Encoding to use. Defaults to None (infer encoding).
Returns:
Optional[Resource]
- The created resource or None
generate_resource_from_iterator
def generate_resource_from_iterator(
headers: ListTuple[str],
iterator: Iterator[Union[ListTuple, Dict]],
hxltags: Dict[str, str],
folder: str,
filename: str,
resourcedata: Dict,
datecol: Optional[Union[int, str]] = None,
yearcol: Optional[Union[int, str]] = None,
date_function: Optional[Callable[[Dict], Optional[Dict]]] = None,
quickcharts: Optional[Dict] = None,
encoding: Optional[str] = None) -> Tuple[bool, Dict]
Given headers and an iterator, write rows to csv and create resource, adding it to the dataset. The returned dictionary will contain the resource in the key resource, headers in the key headers and list of rows in the key rows.
The reference period can optionally be set by supplying a column in which the date or year is to be looked up. Note that any timezone information is ignored and UTC assumed. Alternatively, a function can be supplied to handle any dates in a row. It should accept a row and should return None to ignore the row or a dictionary which can either be empty if there are no dates in the row or can be populated with keys startdate and/or enddate which are of type timezone-aware datetime. The lowest start date and highest end date are used to set the reference period and are returned in the results dictionary in keys startdate and enddate.
If the parameter quickcharts is supplied then various QuickCharts related actions will occur depending upon the keys given in the dictionary and the returned dictionary will contain the QuickCharts resource in the key qc_resource. If the keys: hashtag - the HXL hashtag to examine - and values - the 3 values to look for in that column - are supplied, then a list of booleans indicating which QuickCharts bites should be enabled will be returned in the key bites_disabled in the returned dictionary. For the 3 values, if the key: numeric_hashtag is supplied then if that column for a given value contains no numbers, then the corresponding bite will be disabled. If the key: cutdown is given, if it is 1, then a separate cut down list is created containing only columns with HXL hashtags and rows with desired values (if hashtag and values are supplied) for the purpose of driving QuickCharts. It is returned in the key qcrows in the returned dictionary with the matching headers in qcheaders. If cutdown is 2, then a resource is created using the cut down list. If the key cutdownhashtags is supplied, then only the provided hashtags are used for cutting down otherwise the full list of HXL tags is used.
Arguments:
headers
ListTuple[str] - Headers
iterator
Iterator[Union[ListTuple,Dict]] - Iterator returning rows
hxltags
Dict[str,str] - Header to HXL hashtag mapping
folder
str - Folder to which to write file containing rows
filename
str - Filename of file to write rows
resourcedata
Dict - Resource data
datecol
Optional[Union[int,str]] - Date column for setting reference period. Defaults to None (don't set).
yearcol
Optional[Union[int,str]] - Year column for setting dataset year range. Defaults to None (don't set).
date_function
Optional[Callable[[Dict],Optional[Dict]]] - Date function to call for each row. Defaults to None.
quickcharts
Optional[Dict] - Dictionary containing optional keys: hashtag, values, cutdown and/or cutdownhashtags
encoding
Optional[str] - Encoding to use. Defaults to None (infer encoding).
Returns:
Tuple[bool, Dict]: (True if resource added, dictionary of results)
download_and_generate_resource
def download_and_generate_resource(
downloader: BaseDownload,
url: str,
hxltags: Dict[str, str],
folder: str,
filename: str,
resourcedata: Dict,
header_insertions: Optional[ListTuple[Tuple[int, str]]] = None,
row_function: Optional[Callable[[List[str], Dict], Dict]] = None,
datecol: Optional[str] = None,
yearcol: Optional[str] = None,
date_function: Optional[Callable[[Dict], Optional[Dict]]] = None,
quickcharts: Optional[Dict] = None,
**kwargs: Any) -> Tuple[bool, Dict]
Download url, write rows to csv and create resource, adding it to the dataset. The returned dictionary will contain the resource in the key resource, headers in the key headers and list of rows in the key rows.
Optionally, headers can be inserted at specific positions. This is achieved using the header_insertions argument. If supplied, it is a list of tuples of the form (position, header) to be inserted. A function is called for each row. If supplied, it takes as arguments: headers (prior to any insertions) and row (which will be in dict or list form depending upon the dict_rows argument) and outputs a modified row.
The reference period can optionally be set by supplying a column in which the date or year is to be looked up. Note that any timezone information is ignored and UTC assumed. Alternatively, a function can be supplied to handle any dates in a row. It should accept a row and should return None to ignore the row or a dictionary which can either be empty if there are no dates in the row or can be populated with keys startdate and/or enddate which are of type timezone-aware datetime. The lowest start date and highest end date are used to set the reference period and are returned in the results dictionary in keys startdate and enddate.
If the parameter quickcharts is supplied then various QuickCharts related actions will occur depending upon the keys given in the dictionary and the returned dictionary will contain the QuickCharts resource in the key qc_resource. If the keys: hashtag - the HXL hashtag to examine - and values - the 3 values to look for in that column - are supplied, then a list of booleans indicating which QuickCharts bites should be enabled will be returned in the key bites_disabled in the returned dictionary. For the 3 values, if the key: numeric_hashtag is supplied then if that column for a given value contains no numbers, then the corresponding bite will be disabled. If the key: cutdown is given, if it is 1, then a separate cut down list is created containing only columns with HXL hashtags and rows with desired values (if hashtag and values are supplied) for the purpose of driving QuickCharts. It is returned in the key qcrows in the returned dictionary with the matching headers in qcheaders. If cutdown is 2, then a resource is created using the cut down list. If the key cutdownhashtags is supplied, then only the provided hashtags are used for cutting down otherwise the full list of HXL tags is used.
Arguments:
downloader
BaseDownload - A Download or Retrieve object
url
str - URL to download
hxltags
Dict[str,str] - Header to HXL hashtag mapping
folder
str - Folder to which to write file containing rows
filename
str - Filename of file to write rows
resourcedata
Dict - Resource data
header_insertions
Optional[ListTuple[Tuple[int,str]]] - List of (position, header) to insert. Defaults to None.
row_function
Optional[Callable[[List[str],Dict],Dict]] - Function to call for each row. Defaults to None.
datecol
Optional[str] - Date column for setting reference period. Defaults to None (don't set).
yearcol
Optional[str] - Year column for setting dataset year range. Defaults to None (don't set).
date_function
Optional[Callable[[Dict],Optional[Dict]]] - Date function to call for each row. Defaults to None.
quickcharts
Optional[Dict] - Dictionary containing optional keys: hashtag, values, cutdown and/or cutdownhashtags
**kwargs
- Any additional args to pass to downloader.get_tabular_rows
Returns:
Tuple[bool, Dict]: (True if resource added, dictionary of results)
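A sketch under the assumption that a Download object (a BaseDownload implementation from hdx.utilities.downloader) is used; the url, columns and HXL hashtags are hypothetical:
from hdx.utilities.downloader import Download

with Download() as downloader:
    success, results = dataset.download_and_generate_resource(
        downloader,
        url="https://example.com/data.csv",  # hypothetical source url
        hxltags={"iso3": "#country+code", "value": "#value"},  # hypothetical mapping
        folder="output",  # hypothetical folder
        filename="data.csv",
        resourcedata={"name": "data.csv", "description": "Hypothetical downloaded data"},
        datecol="date",  # hypothetical date column used to set the reference period
    )
if success:
    print(results["startdate"], results["enddate"])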