RequiredDataValidator

Required data validation checks whether certain models, variables, regions and/or periods of time are covered in the timeseries data.

A configuration file specifies the model(s) and the dimension(s) expected in the dataset: variable, region and/or year. Instead of variable, the file can declare measurands, which specify variables together with their units.

description: Required variables for running MAGICC
model: model_a
required_data:
  - measurand:
      Emissions|CO2:
        unit: Mt CO2/yr
    region: World
    year: [2020, 2030, 2040, 2050]

In the example above, for model_a, the dataset must include datapoints of the variable Emissions|CO2 (measured in Mt CO2/yr), in the region World, for the years 2020, 2030, 2040 and 2050.
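The coverage rule described above can be illustrated with a minimal, library-independent sketch. It is not the implementation used by nomenclature; the dictionary layout, the tuple representation of datapoints and the helper name find_missing are assumptions made for illustration only.

```python
from itertools import product

# Hypothetical in-memory stand-in for the YAML specification shown above.
# Assumption: region is normalised to a list here to keep the loop simple.
required = {
    "model": "model_a",
    "required_data": [
        {
            "measurand": {"Emissions|CO2": {"unit": "Mt CO2/yr"}},
            "region": ["World"],
            "year": [2020, 2030, 2040, 2050],
        }
    ],
}

# Toy datapoints as (model, variable, unit, region, year) tuples.
# The year 2040 is deliberately absent.
datapoints = {
    ("model_a", "Emissions|CO2", "Mt CO2/yr", "World", 2020),
    ("model_a", "Emissions|CO2", "Mt CO2/yr", "World", 2030),
    ("model_a", "Emissions|CO2", "Mt CO2/yr", "World", 2050),
}


def find_missing(spec, data):
    """Return every required (model, variable, unit, region, year) not in data."""
    missing = []
    for requirement in spec["required_data"]:
        # A measurand pairs a variable with its unit.
        measurands = [
            (variable, attrs["unit"])
            for variable, attrs in requirement["measurand"].items()
        ]
        combinations = product(measurands, requirement["region"], requirement["year"])
        for (variable, unit), region, year in combinations:
            key = (spec["model"], variable, unit, region, year)
            if key not in data:
                missing.append(key)
    return missing


missing = find_missing(required, datapoints)
print(missing)
```

In this toy dataset only the 2040 datapoint is reported as missing; adding it to datapoints yields an empty list.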

Standard usage

from nomenclature import RequiredDataValidator

# ...setting directory/file paths and loading dataset

RequiredDataValidator.from_file(yaml_file_containing_required_data).apply(df)

class nomenclature.RequiredDataValidator(*, input_data=None, input_meta=None, output_data=None, output_meta=None, fail_ok=False, description=None, model=None, required_data, file)

Processor for validating required dimensions in IAMC datapoints

Methods

apply(df)

Validates data in IAMC format according to required models and dimensions.

check_required_data_per_model(df, model)

Check which required data is missing for a single model.

from_file(file)

Create a RequiredDataValidator from a YAML file.

validate_with_definition(dsd)

Validate the required data specification against a DataStructureDefinition.

apply(df)

Validates data in IAMC format according to required models and dimensions.

Parameters:
df : pyam.IamDataFrame

Data in IAMC format to be validated.

Returns:
pyam.IamDataFrame

Raises:
ValueError

If any required dimension is not found in the data.

check_required_data_per_model(df, model)

Check which required data is missing for a single model.

Parameters:
df : pyam.IamDataFrame

Data in IAMC format to check.

model : str

Model name to filter the data for.

Returns:
list of pandas.DataFrame

List of DataFrames describing missing data, one per unfulfilled requirement. Empty if all requirements are satisfied.

classmethod from_file(file)

Create a RequiredDataValidator from a YAML file.

Parameters:
file : pathlib.Path or str

Path to the YAML file containing the required data specification.

Returns:
RequiredDataValidator

validate_with_definition(dsd)

Validate the required data specification against a DataStructureDefinition.

Checks that all variables, regions, and units referenced in the required data exist in the provided definition.

Parameters:
dsd : DataStructureDefinition

Data structure definition to validate against.

Raises:
ExceptionGroup

If any required data item references unknown variables, regions, or units.