.. _model_mapping:

.. currentmodule:: nomenclature

Region processing using model mappings
======================================

The **nomenclature** package supports automated region aggregation as part of a
scenario processing workflow. The instructions for region aggregation are provided
as a *model mapping*.

Region processing supports multiple methods for aggregating data, including
summation and (weighted) average across regions. The method can be specified
for each variable via the codelist, see :ref:`region_aggregation_attributes`.

Model mapping format specification
----------------------------------

This example illustrates a model mapping:

.. code:: yaml

  model: Model A v1.0
  native_regions:
    - region_a: alternative_name_a
    - region_b
  common_regions:
    - common_region_1:
      - region_a
      - region_b
    - common_region_2:
      - ...
  exclude_regions:
    - region_c
    - ... 

The properties *model* and (at least) one of *native_regions* and *common_regions* are
required in a valid model mapping. See :ref:`region` for more information.

*  *model* (str or list of str): the model name(s) for which the mapping applies.
*  *native_regions* (list): a list of model native regions that serves as
   a selection of which regions to keep.

   *  In the example above, *region_a* is renamed to
      *alternative_name_a*. This is done by defining a key-value pair
      of the form *model_native_name: new_name*.
   *  *region_b* is selected but its name is not changed.
   *  If the model also reports results for a third region *region_c*, this region
      is not listed under *native_regions* and is therefore **dropped** from the data.

*  *common_regions* (list): a list of common regions that are computed as aggregates.
   Each entry is the name of a common region together with a list of its constituent
   regions, which must be model native regions.

   The names of the constituent regions **must** refer to the **original** model native
   region names, i.e., *region_a* and *region_b*, **not** *alternative_name_a*
   in the example shown above.

*  *exclude_regions* (list of str, optional): if the input data for region processing
   contains regions that are mentioned neither in *native_regions* nor in
   *common_regions* (as the name of a common region or as a constituent region), an
   error is raised. This is a safeguard against silently dropping regions that are not
   named in *native_regions* or *common_regions*.

   Regions that should be dropped intentionally can be listed explicitly under
   *exclude_regions*; their presence in the data then no longer raises an error.
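
To make the meaning of these properties concrete, the following is a minimal sketch in
plain Python. It assumes the mapping has already been loaded from YAML into a dictionary
(e.g., with a YAML parser such as PyYAML); the helpers ``validate_mapping``,
``rename_map`` and ``check_regions`` are hypothetical and are not part of the
**nomenclature** API, which performs these steps internally.

.. code:: python

  from typing import Dict, List, Set, Union

  # The example mapping from above as a Python dictionary, i.e., after loading
  # the YAML file; the "..." placeholders are left out.
  mapping = {
      "model": "Model A v1.0",
      "native_regions": [{"region_a": "alternative_name_a"}, "region_b"],
      "common_regions": [{"common_region_1": ["region_a", "region_b"]}],
      "exclude_regions": ["region_c"],
  }


  def validate_mapping(mapping: dict) -> None:
      """Check the required properties of a model mapping (hypothetical helper)."""
      if "model" not in mapping:
          raise ValueError("a model mapping must define 'model'")
      if "native_regions" not in mapping and "common_regions" not in mapping:
          raise ValueError("a model mapping needs 'native_regions' or 'common_regions'")


  def rename_map(native_regions: List[Union[str, Dict[str, str]]]) -> Dict[str, str]:
      """Map each original native region name to the name it is kept under."""
      renames: Dict[str, str] = {}
      for entry in native_regions:
          if isinstance(entry, str):
              renames[entry] = entry  # selected, name unchanged
          else:
              renames.update(entry)  # one key-value pair: original name -> new name
      return renames


  def check_regions(mapping: dict, regions_in_data: Set[str]) -> None:
      """Raise an error if the data contains regions the mapping does not mention."""
      mentioned = set(rename_map(mapping.get("native_regions", [])))
      for common_region in mapping.get("common_regions", []):
          for name, constituents in common_region.items():
              mentioned.add(name)
              # constituents use the original native names, not the renamed ones
              mentioned.update(constituents)
      mentioned.update(mapping.get("exclude_regions", []))
      unknown = regions_in_data - mentioned
      if unknown:
          raise ValueError(f"regions not mentioned in the model mapping: {sorted(unknown)}")


  validate_mapping(mapping)
  print(rename_map(mapping["native_regions"]))
  # {'region_a': 'alternative_name_a', 'region_b': 'region_b'}
  check_regions(mapping, {"region_a", "region_b", "region_c"})  # no error, region_c is excluded
  # check_regions(mapping, {"region_a", "region_d"}) would raise a ValueError

In this sketch, removing *region_c* from *exclude_regions* would make the last successful
call raise an error, which mirrors the safeguard described above.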

Region aggregation
------------------

In order to illustrate how region aggregation is performed, consider the following model
mapping:

.. code:: yaml

   model: model_a  
   common_regions:
     - common_region_1:
       - region_a
       - region_b

For each variable, the results for *common_region_1* are determined according to the
following logic:

1. If a variable is **not** reported for *common_region_1*, it is calculated through
   region aggregation of regions *region_a* and *region_b*.
2. If a variable is **only** reported at the *common_region_1* level, it is used directly.
3. If a variable is reported for *common_region_1* **as well as** for *region_a* and
   *region_b*, the **provided results** take **precedence** over the aggregated ones.
   Additionally, the aggregation is computed and compared to the provided results. If
   there are discrepancies, a warning is written to the logs.

   .. note::

      No error is raised in case of differences. This is intentional, since some
      differences may be expected. It is therefore necessary to check the logs to find
      out whether there were any differences.
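
The three cases can be illustrated for a single variable and year with the following
plain-Python sketch. It assumes summation as the aggregation method and uses
``math.isclose`` with a relative tolerance for the comparison; the function ``combine``
and its parameters are hypothetical and do not reflect how **nomenclature** implements
region processing internally.

.. code:: python

  import logging
  import math
  from typing import Dict, List, Optional

  logging.basicConfig(level=logging.WARNING)
  logger = logging.getLogger("region_aggregation_sketch")


  def combine(
      common_region: str,
      constituents: List[str],
      values: Dict[str, float],
      rtol: float = 0.01,
  ) -> Optional[float]:
      """Return the value used for `common_region` for one variable and year.

      `values` maps region names to reported values; regions without a
      reported value are absent. Aggregation is assumed to be a plain sum.
      """
      reported = values.get(common_region)
      parts = [values[r] for r in constituents if r in values]
      aggregated = sum(parts) if parts else None

      if reported is None:
          # case 1: not reported for the common region, use the aggregate
          return aggregated
      if aggregated is None:
          # case 2: only reported for the common region, use it directly
          return reported
      # case 3: reported and aggregated; the reported value takes precedence,
      # a discrepancy is only logged as a warning, no error is raised
      if not math.isclose(reported, aggregated, rel_tol=rtol):
          logger.warning(
              "%s: reported value %s differs from aggregated value %s",
              common_region, reported, aggregated,
          )
      return reported


  regions = ["region_a", "region_b"]
  combine("common_region_1", regions, {"region_a": 1.0, "region_b": 2.0})  # case 1 -> 3.0
  combine("common_region_1", regions, {"common_region_1": 5.0})  # case 2 -> 5.0
  combine(
      "common_region_1", regions, {"common_region_1": 3.5, "region_a": 1.0, "region_b": 2.0}
  )  # case 3 -> 3.5, and a warning is logged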

Computing differences between original and aggregated data
----------------------------------------------------------

In order to get the differences between the original data (e.g., results reported by the model)
and the data aggregated according to the model mapping, perform the following steps:

1. Make sure you have ``pyam-iamc>=1.7.0`` and ``nomenclature-iamc>=0.10.0`` installed.
2. Clone the workflow directory of your project.
3. Navigate to the workflow directory.
4. In a Jupyter notebook or Python script, run the following:

.. code:: python

  from pyam import IamDataFrame
  from nomenclature import DataStructureDefinition, RegionProcessor

  data = IamDataFrame("/path/to/your/input/data.xlsx")

  dsd = DataStructureDefinition("definitions")
  processor = RegionProcessor.from_directory("mappings", dsd)

  # run the region processing and get the differences as a pandas DataFrame;
  # the relative tolerance can be adjusted and defaults to 0.01
  processed_data, differences = processor.check_region_aggregation(data, rtol_difference=0.01)
  # save the result of the region processing
  processed_data.to_excel("results.xlsx")
  # and the differences
  differences.to_excel("differences.xlsx")

Please refer to :py:meth:`RegionProcessor.check_region_aggregation` for details.

Alternatively, you can use the nomenclature CLI:

.. code-block:: bash

  $ nomenclature check-region-aggregation /path/to/your/input/data.xlsx \
    -w workflow_directory --processed_data results.xlsx --differences differences.xlsx

For details on the CLI, please refer to :ref:`cli`.
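
With either approach, the differences are the data points for which a value provided at
the common-region level and the corresponding aggregate disagree by more than the
relative tolerance. The following plain-Python sketch illustrates this filtering; the
helper ``find_differences``, the record layout, the example variable *Primary Energy*
and the use of ``math.isclose`` are hypothetical, and the exact comparison and output
columns of :py:meth:`RegionProcessor.check_region_aggregation` may differ.

.. code:: python

  import math
  from typing import Dict, List, Tuple

  # hypothetical data points keyed by (region, variable, year)
  Key = Tuple[str, str, int]


  def find_differences(
      original: Dict[Key, float], aggregated: Dict[Key, float], rtol: float = 0.01
  ) -> List[dict]:
      """Return the data points where original and aggregated values disagree."""
      rows = []
      # only data points that are both reported and aggregated can be compared
      for key in sorted(original.keys() & aggregated.keys()):
          orig, agg = original[key], aggregated[key]
          if math.isclose(orig, agg, rel_tol=rtol):
              continue
          region, variable, year = key
          rows.append(
              {
                  "region": region,
                  "variable": variable,
                  "year": year,
                  "original": orig,
                  "aggregated": agg,
                  # relative difference in percent, with respect to the original value
                  "difference (%)": 100 * abs(orig - agg) / abs(orig) if orig else math.inf,
              }
          )
      return rows


  original = {
      ("common_region_1", "Primary Energy", 2020): 3.5,
      ("common_region_1", "Primary Energy", 2030): 4.02,
  }
  aggregated = {
      ("common_region_1", "Primary Energy", 2020): 3.0,
      ("common_region_1", "Primary Energy", 2030): 4.0,
  }
  for row in find_differences(original, aggregated, rtol=0.01):
      print(row)  # only the 2020 data point exceeds the 1% tolerance

Note that ``math.isclose`` scales the tolerance by the larger of the two absolute values;
a comparison relative to the original value alone can be slightly stricter.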