DataValidator
Data validation checks if timeseries data values are within specified ranges.
Consider the example below:
- variable: Primary Energy
  year: 2010
  validation:
    - upper_bound: 5
      lower_bound: 1
    - warning_level: low
      upper_bound: 2.5
      lower_bound: 1
- variable: Primary Energy|Gas
  year: 2010
  value: 2
  validation:
    - rtol: 0.5
    - warning_level: low
      rtol: 0.1
- variable: Primary Energy|Coal
  year: 2010
  value: 2
  atol: 1
Each criteria item contains data filter arguments and validation arguments.
Data filter arguments include: model, scenario, region, variable,
unit, and year.
For the first criteria item, the data is filtered for variable Primary Energy
and year 2010.
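To make the split between the two kinds of arguments concrete, the sketch below separates a criteria item (written as the Python dict it becomes after parsing the YAML) into its filter part and its validation part. The helper split_criteria_item and the constant FILTER_KEYS are hypothetical names for this illustration, not part of the nomenclature API; only the six filter keys come from the text above.

```python
# Hypothetical illustration; not part of the nomenclature API.
# The six filter keys are the ones listed in the documentation.
FILTER_KEYS = {"model", "scenario", "region", "variable", "unit", "year"}


def split_criteria_item(item: dict) -> tuple[dict, dict]:
    """Separate data filter arguments from validation arguments."""
    filters = {key: val for key, val in item.items() if key in FILTER_KEYS}
    validation = {key: val for key, val in item.items() if key not in FILTER_KEYS}
    return filters, validation


# First criteria item of the example, as parsed from YAML
item = {
    "variable": "Primary Energy",
    "year": 2010,
    "validation": [
        {"upper_bound": 5, "lower_bound": 1},
        {"warning_level": "low", "upper_bound": 2.5, "lower_bound": 1},
    ],
}
filters, validation = split_criteria_item(item)
print(filters)  # {'variable': 'Primary Energy', 'year': 2010}
```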
The validation arguments are either upper_bound/lower_bound or
value/rtol/atol (a reference value with relative and/or absolute tolerance).
Only one of these two forms can be set for each warning_level.
The possible levels are: error, high, medium, or low.
For the same data filters, multiple warning levels can be set, each with its own
criteria. These must be listed in descending order of severity; otherwise, a
ValidationError is raised.
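The rules for the validation list of a single criteria item (one form of arguments per level, known level names, descending severity) can be sketched as a small check. This is a hypothetical helper, not the nomenclature implementation: the library reports problems as a pydantic ValidationError, while this sketch raises a plain ValueError. It assumes each entry carries its own bound or tolerance keys, as in the example, and it additionally rejects repeated levels, which is an assumption.

```python
# Hypothetical sketch of the per-item checks; not the nomenclature code.
SEVERITY = {"error": 3, "high": 2, "medium": 1, "low": 0}
BOUND_KEYS = {"upper_bound", "lower_bound"}
VALUE_KEYS = {"value", "rtol", "atol"}


def check_levels(validation: list[dict]) -> list[str]:
    """Check each warning level's arguments and the ordering of levels.

    Returns the warning levels in the given order; raises ValueError otherwise.
    """
    levels = []
    for args in validation:
        level = args.get("warning_level", "error")  # "error" when omitted
        if level not in SEVERITY:
            raise ValueError(f"Unknown warning level: {level!r}")
        uses_bounds = bool(args.keys() & BOUND_KEYS)
        uses_value = bool(args.keys() & VALUE_KEYS)
        if uses_bounds == uses_value:
            raise ValueError(
                f"Level {level!r}: set either upper_bound/lower_bound "
                "or value/rtol/atol, not both or neither"
            )
        levels.append(level)
    ranks = [SEVERITY[level] for level in levels]
    # Assumption: strictly descending, so no level may appear twice
    if any(a <= b for a, b in zip(ranks, ranks[1:])):
        raise ValueError(f"Levels must be in descending order of severity: {levels}")
    return levels


print(check_levels([{"rtol": 0.5}, {"warning_level": "low", "rtol": 0.1}]))
```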
In the example, the first criteria item sets validation arguments for the
warning levels error (the default when warning_level is omitted) and low,
using explicit upper and lower bounds.
Datapoints flagged at one warning level are skipped for lower-severity levels in
the same criteria item (e.g., datapoints flagged at the error level are not
checked again for low).
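The skip rule can be illustrated with a minimal sketch over bound-based levels. The function flag_datapoints is hypothetical, datapoints are modelled as a plain dict from a label to a value, and treating the bounds as inclusive is an assumption of this sketch rather than documented behaviour.

```python
# Hypothetical sketch of the skip-after-flag rule; not the nomenclature code.
import math


def flag_datapoints(values: dict[str, float], levels: list[dict]) -> dict[str, str]:
    """Return the most severe warning level each datapoint fails.

    `levels` must already be in descending order of severity. Bounds are
    treated as inclusive, which is an assumption of this sketch.
    """
    flagged: dict[str, str] = {}
    for args in levels:
        level = args.get("warning_level", "error")
        lower = args.get("lower_bound", -math.inf)
        upper = args.get("upper_bound", math.inf)
        for key, val in values.items():
            if key in flagged:
                continue  # already flagged at a more severe level: skip
            if not lower <= val <= upper:
                flagged[key] = level
    return flagged


# Levels of the first criteria item in the example
levels = [
    {"upper_bound": 5, "lower_bound": 1},
    {"warning_level": "low", "upper_bound": 2.5, "lower_bound": 1},
]
values = {"model_a": 0.5, "model_b": 3.0, "model_c": 2.0}
print(flag_datapoints(values, levels))  # {'model_a': 'error', 'model_b': 'low'}
```

Here model_a fails the error bounds and is therefore never tested against low, while model_b passes the error bounds but exceeds the tighter low bound.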
Validation arguments that are identical for all warning levels can be set once,
alongside the filter arguments. This is illustrated in the second criteria item,
where the relevant data values must be within 50% of value (i.e., between 1 and 3)
for warning level error and within 10% of value (i.e., between 1.8 and 2.2) for
warning level low.
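The arithmetic for the second criteria item can be reproduced with a short sketch that first copies the shared value into every warning level and then turns value/rtol/atol into bounds. Both helpers are hypothetical, and the formula value ± (|value| · rtol + atol) is an assumption that matches the ranges stated above; nomenclature's own computation may differ in detail.

```python
# Hypothetical sketch; nomenclature's own bound computation may differ.
SHARED_KEYS = {"value", "rtol", "atol", "upper_bound", "lower_bound"}


def expand_levels(item: dict) -> list[dict]:
    """Copy item-level validation arguments into every warning level."""
    shared = {key: val for key, val in item.items() if key in SHARED_KEYS}
    # Level-specific arguments take precedence over shared ones
    return [{**shared, **level} for level in item["validation"]]


def tolerance_bounds(
    value: float, rtol: float = 0.0, atol: float = 0.0
) -> tuple[float, float]:
    """Convert value/rtol/atol into (lower, upper) bounds."""
    tol = abs(value) * rtol + atol
    return value - tol, value + tol


# Second criteria item of the example, as parsed from YAML
item = {
    "variable": "Primary Energy|Gas",
    "year": 2010,
    "value": 2,
    "validation": [{"rtol": 0.5}, {"warning_level": "low", "rtol": 0.1}],
}
for args in expand_levels(item):
    lower, upper = tolerance_bounds(
        args["value"], args.get("rtol", 0.0), args.get("atol", 0.0)
    )
    print(args.get("warning_level", "error"), lower, upper)
# error 1.0 3.0
# low 1.8 2.2
```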
The third criteria item (for variable Primary Energy|Coal) uses short-hand notation,
where all filter and validation arguments are given as a single flat dictionary
without a validation list. This notation can be used when only one warning level
is needed for the given filters. In this example, the relevant data values must be
within an absolute tolerance of 1 of the value 2 (i.e., between 1 and 3), with
warning level error by default.
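One way to see that the short-hand form is equivalent to the long form is to convert between them. The sketch below rewrites a flat criteria item into a long-form item with a single validation entry; normalize_item and FILTER_KEYS are hypothetical names for this illustration and do not describe how nomenclature parses the file internally.

```python
# Hypothetical sketch; not part of the nomenclature API.
FILTER_KEYS = {"model", "scenario", "region", "variable", "unit", "year"}


def normalize_item(item: dict) -> dict:
    """Rewrite a short-hand criteria item into the long form."""
    if "validation" in item:
        return item  # already in long form
    filters = {key: val for key, val in item.items() if key in FILTER_KEYS}
    args = {key: val for key, val in item.items() if key not in FILTER_KEYS}
    # A single entry without warning_level, i.e. level "error" by default
    return {**filters, "validation": [args]}


coal = {"variable": "Primary Energy|Coal", "year": 2010, "value": 2, "atol": 1}
print(normalize_item(coal))
# {'variable': 'Primary Energy|Coal', 'year': 2010, 'validation': [{'value': 2, 'atol': 1}]}
```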
Standard usage
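from nomenclature import DataValidator
# ...setting directory/file paths and loading dataset (df is a pyam.IamDataFrame)
# read the criteria from YAML, then validate df; apply() raises a ValueError
# if any datapoint fails a criterion with warning level error
DataValidator.from_file(yaml_file_containing_data_validation_criteria).apply(df)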
class nomenclature.DataValidator(*, input_data=None, input_meta=None, output_data=None, output_meta=None, fail_ok=False, criteria_items, file, output_path=None)
Processor for validating IAMC datapoints.
Methods
- apply(df): Validates data in IAMC format according to specified criteria.
- from_codelist(codelist[, output_path]): Create a DataValidator from a VariableCodeList.
- from_file(file[, output_path]): Create a DataValidator from a YAML file.
- validate_with_definition(dsd): Validate the criteria items against a DataStructureDefinition.

apply(df)
Validates data in IAMC format according to specified criteria.
Logs warning/error messages for each criterion that is not met.
- Parameters:
  - df (pyam.IamDataFrame): Data in IAMC format to be validated.
- Returns:
  - pyam.IamDataFrame
- Raises:
  - ValueError: if any datapoint fails a criterion with warning level error.
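The reporting behaviour described for apply() (log every failed criterion, raise only for the error level) can be sketched with the standard logging module. The function report_failures, its input layout and the message text are illustrative assumptions, not the library's internals or its actual log format.

```python
# Hypothetical sketch of the reporting behaviour described for apply();
# the function name, input layout and message format are assumptions.
import logging

logger = logging.getLogger(__name__)


def report_failures(failures: dict[str, list[str]]) -> None:
    """Log each failed criterion; raise ValueError if any failed at level 'error'.

    `failures` maps a warning level to descriptions of the failed criteria.
    """
    for level, messages in failures.items():
        # ERROR log records for the error level, WARNING records otherwise
        log = logger.error if level == "error" else logger.warning
        for message in messages:
            log("[%s] %s", level, message)
    if failures.get("error"):
        raise ValueError("Data validation failed with warning level 'error'")


logging.basicConfig(level=logging.INFO)
# Only a low-level failure: it is logged, but no exception is raised
report_failures({"low": ["Primary Energy, 2010: value above upper bound 2.5"]})
```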
classmethod from_codelist(codelist, output_path=None)
Create a DataValidator from a VariableCodeList.
Extracts validation criteria from variables in the codelist that define bounds or tolerance ranges.
- Parameters:
  - codelist (VariableCodeList): Variable codelist containing validation arguments.
  - output_path (pathlib.Path, optional): Path to write an Excel file with all flagged datapoints.
- Returns:
  - DataValidator
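The extraction step of from_codelist can be pictured as collecting one criteria item per variable that carries bounds or tolerances. In this sketch the variables are modelled as plain dicts of attributes; that layout and the helper criteria_from_variables are hypothetical and do not describe how VariableCode actually stores these attributes.

```python
# Hypothetical sketch: variables are modelled as plain dicts of attributes,
# which is an assumption and not the VariableCode data model.
VALIDATION_KEYS = {"upper_bound", "lower_bound", "value", "rtol", "atol", "validation"}


def criteria_from_variables(variables: dict[str, dict]) -> list[dict]:
    """Build one criteria item per variable that defines validation arguments."""
    criteria = []
    for name, attrs in variables.items():
        args = {key: val for key, val in attrs.items() if key in VALIDATION_KEYS}
        if args:  # skip variables without bounds or tolerances
            criteria.append({"variable": name, **args})
    return criteria


variables = {
    "Primary Energy": {"unit": "EJ/yr", "upper_bound": 5, "lower_bound": 1},
    "Primary Energy|Coal": {"unit": "EJ/yr", "value": 2, "atol": 1},
    "Final Energy": {"unit": "EJ/yr"},
}
print(criteria_from_variables(variables))
```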
classmethod from_file(file, output_path=None)
Create a DataValidator from a YAML file.
- Parameters:
  - file (pathlib.Path or str): Path to the YAML file containing the validation criteria.
  - output_path (pathlib.Path or str, optional): Path to write an Excel file with all flagged datapoints.
- Returns:
  - DataValidator
validate_with_definition(dsd)
Validate the criteria items against a DataStructureDefinition.
Checks that all variables and regions referenced in the criteria exist in the provided definition.
- Parameters:
  - dsd (DataStructureDefinition): Data structure definition to validate against.
- Raises:
  - ExceptionGroup: if any criteria item references unknown variables or regions.