.. _tutorial-simple-energy-system:

Simplified energy system model
==============================

This tutorial illustrates, step by step, how to build a simple energy system
optimization model with CVXlab. The tutorial mirrors the workflow described 
in :ref:`model generation from scratch <model_generation_from_scratch>`, so that 
the transition from conceptual design to numerical solution stays visible 
throughout the documentation. 

This tutorial is the right place to start if you are new to CVXlab.


.. rubric:: Problem statement

Let us consider the following energy system planning problem, applied to a generic 
region. The goal is to define the *least-cost energy production plan* over a defined 
time horizon, considering the following assumptions:

- The energy demand is assumed to be known in advance over the whole time horizon
  (i.e., perfect foresight).
- The energy can be supplied by a number of available technologies, each characterized 
  by known values for: 
  
  - installed capacities (MW, variable over time).
  - specific production costs (€/MWh, constant).
  - availabilities (i.e., factors converting installed capacity in MW into energy 
    supplied in MWh, assumed constant).


.. _simple-tutorial-conceptual-model-definition:

Step 1. Conceptual model definition
-----------------------------------

Related user guide step: :ref:`conceptual-model-definition`


.. rubric:: Defining Sets

Sets defined for the model are summarized in the table below. 

.. list-table:: Sets defining model's domain
  :header-rows: 1

  * - Set name
    - Symbol
    - Coordinates
    - Cardinality
    - Set type
  * - Technologies
    - :math:`t`
    - Solar, Gas, Nuclear
    - 3
    - Dimension
  * - Time periods
    - :math:`y`
    - 2025, 2026, 2027, 2028, 2029, 2030
    - 6
    - Dimension
  * - Demand scenarios
    - :math:`d`
    - Low_demand, High_demand
    - 2
    - Inter-problem

Notice that:

- Inter-problem sets (:math:`d`) define multiple problem instances: one 
  optimization problem is generated and solved for each demand scenario 
  (in this case, :math:`2` problem instances).
- Dimension sets (:math:`t`, :math:`y`) are used to define the scope of data tables 
  and the shapes of related variables.
- Coordinates of each set can be associated with filters to define sub-domains. As 
  an example, the *technologies* set may classify technologies as *renewable* and
  *non-renewable*, so that variables can be defined over sub-domains including only 
  specific categories. In this simplified example, all variables are defined over 
  *full domains* (no filtering is applied).
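
As a minimal sketch of how inter-problem sets multiply problem instances, the
plain-Python snippet below enumerates the Cartesian product of the inter-problem
coordinates listed in the table above. It is not CVXlab code: the dictionary name
``inter_problem_sets`` is purely illustrative.

.. code-block:: python

    from itertools import product

    # Inter-problem coordinates copied from the sets table (illustrative container).
    inter_problem_sets = {
        "Demand_scenarios": ["Low_demand", "High_demand"],
    }

    # One problem instance per combination of inter-problem coordinates.
    problem_instances = list(product(*inter_problem_sets.values()))

    for instance in problem_instances:
        print(dict(zip(inter_problem_sets, instance)))

With a single inter-problem set the product is trivial (two instances). Adding a
second inter-problem set to the dictionary would multiply the number of instances
by its cardinality.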


.. rubric:: Defining Data Tables and related Variables

The following tables summarize the Data Tables and associated variables for the 
energy system model. 

.. list-table:: Data tables properties
  :header-rows: 1

  * - Type
    - Name
    - Domain [Cardinality]
    - Description
  * - Exogenous
    - :math:`cost(t)`
    - :math:`t - [3]`
    - Specific costs of generation by technology (in *€/MWh*).
  * - Exogenous
    - :math:`capacity(t,y)`
    - :math:`t \times y - [3 \times 6 = 18]`
    - Installed capacity by technology and time period (in *MW*).
  * - Exogenous
    - :math:`availability(t)`
    - :math:`t - [3]`
    - Availability factors by technology (in *MWh/MW*).
  * - Exogenous
    - :math:`demand(d,y)`
    - :math:`d \times y - [2 \times 6 = 12]`
    - Energy demand by demand scenario and time period (in *MWh*).
  * - Endogenous
    - :math:`supply(d,y,t)`
    - :math:`d \times y \times t - [2 \times 6 \times 3 = 36]`
    - Energy supply by demand scenario, time period and technology (in *MWh*).
  * - Constant
    - :math:`constant(t)`
    - :math:`t - [3]`
    - Model constants defined based on the shape of :math:`t` set.


Regarding data tables above:

- The endogenous data table has a domain defined over all model sets, while the 
  exogenous data tables are defined over specific sets.
- For each data table, the cardinality (i.e. the total number of data entries) is 
  reported, calculated as the product of the cardinalities of all sets in the domain. 
  As an example, the :math:`availability(t)` data table includes 3 entries only, one 
  for each technology, because its domain is defined over the *technologies* set only.
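
The cardinality rule can be checked with a few lines of plain Python. This is
only an illustrative sketch (the dictionaries ``set_cardinality`` and
``table_domains`` are assumptions made for the example, not CVXlab objects):

.. code-block:: python

    from math import prod

    # Cardinality of each set: technologies, time periods, demand scenarios.
    set_cardinality = {"t": 3, "y": 6, "d": 2}

    # Domain of each data table, expressed through the set symbols.
    table_domains = {
        "cost": ("t",),
        "capacity": ("t", "y"),
        "availability": ("t",),
        "demand": ("d", "y"),
        "supply": ("d", "y", "t"),
        "constant": ("t",),
    }

    # Table cardinality = product of the cardinalities of the sets in its domain.
    table_cardinality = {
        name: prod(set_cardinality[symbol] for symbol in domain)
        for name, domain in table_domains.items()
    }
    print(table_cardinality)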


.. list-table:: Variables properties
  :header-rows: 1

  * - Related data table
    - Variable name
    - Shape (rows,columns)
    - Intra-problem sets
    - Inter-problem sets
  * - :math:`cost(t)`
    - :math:`c`
    - :math:`1, t - [1, 3]`
    - :math:`-`
    - :math:`-`
  * - :math:`capacity(t, y)`
    - :math:`cap`
    - :math:`1, t - [1, 3]`
    - :math:`y - [6]`
    - :math:`-`
  * - :math:`availability(t)`
    - :math:`av`
    - :math:`1, t - [1, 3]`
    - :math:`-`
    - :math:`-`
  * - :math:`demand(d, y)`
    - :math:`E_d`
    - :math:`1, 1 - [1, 1]`
    - :math:`y - [6]`
    - :math:`d - [2]`
  * - :math:`supply(d,y,t)`
    - :math:`E_s`
    - :math:`1, t - [1, 3]`
    - :math:`y - [6]`
    - :math:`d - [2]`
  * - :math:`constant(t)`
    - :math:`i_t`
    - :math:`t, 1 - [3, 1]`
    - :math:`-`
    - :math:`-`


Regarding variables above:

- Each variable stems from a related data table, inheriting its properties: the 
  domain (defined by sets) and the data type (exogenous, endogenous, constant).
- Each variable is characterized by a specific allocation of dimension sets 
  to shapes and intra-problem sets. As an example, the energy supply variable 
  :math:`E_s` has 1 row and 3 columns (defined by the *technologies* set), and it is 
  indexed over 6 intra-problem coordinates (defined by the *time periods* set) and 
  over 2 inter-problem coordinates (defined by the *demand scenarios* set).
- Constants can be defined with different built-in or user-defined types (see 
  :ref:`api_constants_types`). In the example above, the :math:`i_t` variable is
  defined as a *summation vector*, i.e. a column vector of 1s, useful for 
  performing summations through matrix multiplication.
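
The role of the summation vector can be illustrated with NumPy. The snippet is a
sketch under stated assumptions: the supply values are hypothetical placeholders,
and NumPy arrays merely stand in for the symbolic variables.

.. code-block:: python

    import numpy as np

    # Hypothetical supply values (MWh) for Solar, Gas, Nuclear in one time period.
    E_s = np.array([[10.0, 20.0, 30.0]])  # shape (1, 3): technologies as columns
    i_t = np.ones((3, 1))                 # shape (3, 1): summation vector of 1s

    # (1, 3) @ (3, 1) -> (1, 1): the product sums the supply over technologies.
    total_supply = E_s @ i_t
    print(total_supply.shape, total_supply[0, 0])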


.. rubric:: Defining Problem and related Expressions

For the current energy system model, a symbolic problem can be defined as a linear 
optimization problem as follows.

.. math::
  \begin{aligned}
  \min_{E_s} \quad & c \cdot E_s' & \forall \, y\\
  \text{s.t.} \quad & E_s \cdot i_t \geq E_d & \forall \, y \\
  & E_s \leq cap \cdot \widehat{av} & \forall \, y \\
  & E_s \geq 0 & \forall \, y
  \end{aligned}


Notice that:

- The problem is defined a number of times equal to the cardinality of the inter-problem 
  set. Specifically, one problem instance is defined and solved for each energy demand 
  scenario :math:`d`. In case of multiple inter-problem sets, the problem is defined
  for each coordinate combination in the Cartesian product of all inter-problem sets.
- For each symbolic expression, a number of numerical expressions is generated, equal 
  to the cardinality of the Cartesian product of all intra-problem sets of the related 
  variables. In this case, all expressions are defined over the intra-problem set 
  *time periods* :math:`y`, generating one numerical expression per time period for 
  each symbolic expression.
- In case of variables defined over different intra-problem sets, automatic broadcasting 
  is applied: variables not defined over specific intra-problem sets are automatically 
  reused across all generated numerical expressions.
- In this problem, the dot operator :math:`\cdot` represents matrix multiplication, 
  the :math:`\widehat{(*)}` is the diagonalization operator, and the :math:`(*)'` represents 
  the transposition operator (see :ref:`api_symbolic_operators` for a comprehensive 
  description of built-in symbolic operators).
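
The expansion of symbolic expressions over intra-problem sets can be sketched in
plain Python. The snippet only counts and labels expression instances; the names
``variable_intra_sets`` and ``numerical_expressions`` are illustrative and do not
correspond to CVXlab internals.

.. code-block:: python

    from itertools import product

    intra_problem_sets = {"Time_periods": [2025, 2026, 2027, 2028, 2029, 2030]}

    # Constraints of the tutorial problem, written as in the symbolic formulation.
    symbolic_expressions = [
        "E_s @ i_t >= E_d",
        "E_s <= cap @ diag(av)",
        "E_s >= 0",
    ]

    # Intra-problem sets each variable is indexed over (empty tuple = not indexed).
    variable_intra_sets = {
        "c": (),
        "cap": ("Time_periods",),
        "av": (),
        "E_d": ("Time_periods",),
        "E_s": ("Time_periods",),
        "i_t": (),
    }

    # One numerical expression per symbolic expression and intra-problem combination.
    numerical_expressions = [
        (expression, dict(zip(intra_problem_sets, coordinates)))
        for expression in symbolic_expressions
        for coordinates in product(*intra_problem_sets.values())
    ]

    # Variables with no intra-problem set are reused unchanged in every instance.
    broadcast_variables = sorted(
        name for name, sets in variable_intra_sets.items() if not sets
    )

    print(len(numerical_expressions))  # 3 expressions x 6 time periods
    print(broadcast_variables)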


A note on dimensional formulations:

The allocation of dimension sets to shapes and intra-problem sets offers significant 
modeling flexibility. The same problem can be formulated in multiple equivalent ways.

**Matrix-based formulation (as in the example above):**

- Expressions must be dimensionally consistent, and variable shapes must be 
  compatible for matrix operations. Multiple variables can stem from the same data 
  table, each characterized by a different allocation of dimension sets: this allows
  for flexible model definitions.
- Expressions work with matrix operations (multiplication, transposition, ...).
- Compact symbolic representation with fewer expression instances.
- Potentially more efficient numerical problem generation and solution.

**Scalar-based formulation (extreme case):**

All dimension sets can be allocated as *intra-problem sets*, reducing all variables 
to scalars (shape :math:`(1,1)`). In this case, for each energy demand scenario :math:`d`,
the problem can be reformulated as:

.. math::
  \begin{aligned}
  \min_{E_s} \quad & \sum_{t} c \cdot E_s \\
  \text{s.t.} \quad & \sum_{t} E_s \geq E_d & \forall \, y \\
  & E_s \leq cap \cdot av & \forall \, t, y \\
  & E_s \geq 0 & \forall \, t, y
  \end{aligned}

where all variables become scalars indexed over :math:`t` and :math:`y`.
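
The equivalence between the two formulations can be verified numerically for a
single time period. The following NumPy sketch uses hypothetical values for all
quantities and compares the matrix-based terms with their scalar counterparts.

.. code-block:: python

    import numpy as np

    # Hypothetical values for Solar, Gas, Nuclear (one time period).
    c = np.array([[0.0, 60.0, 30.0]])   # specific costs
    E_s = np.array([[3.0, 5.0, 2.0]])   # candidate supply
    cap = np.array([[5.0, 4.0, 2.0]])   # installed capacity
    av = np.array([[1.0, 2.0, 3.0]])    # availability

    # Matrix-based objective term: (1, 3) @ (3, 1) -> (1, 1).
    matrix_cost = (c @ E_s.T)[0, 0]
    # Scalar-based objective term: explicit sum over technologies.
    scalar_cost = sum(c[0, t] * E_s[0, t] for t in range(3))

    # Matrix-based bound uses diag(av); scalar-based uses one product per technology.
    matrix_bound = cap @ np.diag(av[0])
    scalar_bound = [cap[0, t] * av[0, t] for t in range(3)]

    print(matrix_cost, scalar_cost)
    print(matrix_bound, scalar_bound)

Both formulations produce the same cost and the same upper bounds. They differ
only in how many expression instances must be generated.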


Step 2. Generation of model directory
-------------------------------------

Related user guide step: :ref:`generation-of-model-directory`

At this stage the conceptual model is already defined. The next step is to
create a model directory that will contain the setup files, the sets workbook,
the input-data files, and the SQLite database of the tutorial model.

For this tutorial, a compact Excel-based workflow is convenient because all
setup information can be stored in a single workbook.

.. code-block:: python

    import cvxlab

    cvxlab.create_model_dir(
        model_dir_name="simple_energy_model",
        main_dir_path="path/to/tutorial_workspace",
        settings_file_type="xlsx",
        include_user_defined_templates=False,
    )

For the simple energy system model, this step typically creates:

- A model directory named ``simple_energy_model``.
- A ``model_settings.xlsx`` workbook with sheets for sets, variables, and
  problems.
- The directory structure expected by the following tutorial steps.

If you prefer YAML files instead of Excel, the same tutorial structure still
applies. Only the format of the setup files changes.


.. _simple-tutorial-fill-model-setup-files:

Step 3. Fill the model setup files
----------------------------------

Related user guide step: :ref:`fill-model-setup-files`

In this step, the conceptual structure of the energy system model is translated
into CVXlab setup files. The same information can be written either in
``model_settings.xlsx`` or in the YAML files generated by
:py:func:`cvxlab.create_model_dir`.


Sets structure
~~~~~~~~~~~~~~

For the tutorial model, the three sets can be represented as follows in YAML:

.. code-block:: yaml

    Demand_scenarios:
        description: demand levels corresponding to different scenarios
        split_problem: true

    Technologies:
        description: technologies available in the system

    Time_periods:
        description: time periods considered in the model

The ``Demand_scenarios`` set is marked with ``split_problem: true`` because the
model must be solved independently for each scenario. The other two sets define
the internal dimensions of variables.


Data tables and variables
~~~~~~~~~~~~~~~~~~~~~~~~~

The structural definition of data tables and variables can be organized as
follows:

.. code-block:: yaml

    cost:
        description: specific generation costs by technology (EUR/MWh)
        type: exogenous
        coordinates: [Technologies]
        variables_info:
            c:
                Technologies:
                    dim: cols

    capacity:
        description: installed capacity by technology and time period (MW)
        type: exogenous
        coordinates: [Technologies, Time_periods]
        variables_info:
            cap:
                Technologies:
                    dim: cols
                Time_periods:
                    dim: intra

    availability:
        description: availability factors by technology (MWh/MW)
        type: exogenous
        coordinates: [Technologies]
        variables_info:
            av:
                Technologies:
                    dim: cols

    demand:
        description: energy demand by scenario and time period (MWh)
        type: exogenous
        coordinates: [Demand_scenarios, Time_periods]
        variables_info:
            E_d:
                Time_periods:
                    dim: intra

    supply:
        description: energy supply by scenario, technology, and time period (MWh)
        type: endogenous
        coordinates: [Demand_scenarios, Technologies, Time_periods]
        variables_info:
            E_s:
                Technologies:
                    dim: cols
                Time_periods:
                    dim: intra

    constant:
        description: model constants
        type: constant
        coordinates: [Technologies]
        variables_info:
            i_t:
                value: sum_vector
                Technologies:
                    dim: rows

The resulting symbolic variables are summarized below.

.. list-table:: Variables used in the tutorial model
  :header-rows: 1

  * - Related data table
    - Variable
    - Shape
    - Intra-problem sets
    - Inter-problem sets
  * - :math:`cost(t)`
    - :math:`c`
    - :math:`1 \times t`
    - :math:`-`
    - :math:`-`
  * - :math:`capacity(t,y)`
    - :math:`cap`
    - :math:`1 \times t`
    - :math:`y`
    - :math:`-`
  * - :math:`availability(t)`
    - :math:`av`
    - :math:`1 \times t`
    - :math:`-`
    - :math:`-`
  * - :math:`demand(d,y)`
    - :math:`E_d`
    - :math:`1 \times 1`
    - :math:`y`
    - :math:`d`
  * - :math:`supply(d,y,t)`
    - :math:`E_s`
    - :math:`1 \times t`
    - :math:`y`
    - :math:`d`
  * - :math:`constant(t)`
    - :math:`i_t`
    - :math:`t \times 1`
    - :math:`-`
    - :math:`-`
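
The shapes reported in the table follow directly from the ``dim`` entries of each
variable. The sketch below reproduces that reasoning in plain Python. The helper
``variable_shape`` is hypothetical and only mirrors the rule described in this
tutorial. How CVXlab derives shapes internally is not shown here.

.. code-block:: python

    set_cardinality = {"Demand_scenarios": 2, "Technologies": 3, "Time_periods": 6}

    # Dimension allocation copied from the ``variables_info`` entries above.
    variables_info = {
        "c": {"Technologies": "cols"},
        "cap": {"Technologies": "cols", "Time_periods": "intra"},
        "av": {"Technologies": "cols"},
        "E_d": {"Time_periods": "intra"},
        "E_s": {"Technologies": "cols", "Time_periods": "intra"},
        "i_t": {"Technologies": "rows"},
    }


    def variable_shape(allocation):
        """Return (rows, cols); a dimension with no set assigned has size 1."""
        rows = cols = 1
        for set_name, dim in allocation.items():
            if dim == "rows":
                rows *= set_cardinality[set_name]
            elif dim == "cols":
                cols *= set_cardinality[set_name]
        return rows, cols


    shapes = {name: variable_shape(alloc) for name, alloc in variables_info.items()}
    print(shapes)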


Problem definition
~~~~~~~~~~~~~~~~~~

The optimization problem can be represented in ``problem.yml`` as:

.. code-block:: yaml

    energy_system:
        objective:
            - Minimize(c @ tran(E_s))
        expressions:
            - E_s @ i_t >= E_d
            - E_s <= cap @ diag(av)
            - E_s >= 0

This structure keeps the tutorial aligned with the conceptual formulation
introduced in :ref:`simple-tutorial-conceptual-model-definition`.


.. _simple-tutorial-generate-model-instance:

Step 4. Generate the Model instance
-----------------------------------

Related user guide step: :ref:`generate-model-class-instance`

Once the setup files are filled, the tutorial model can be loaded into a CVXlab
``Model`` instance. This object will then be used for all remaining steps.


Typical initialization
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    import cvxlab

    model = cvxlab.Model(
        model_dir_name="simple_energy_model",
        main_dir_path="path/to/tutorial_workspace",
        model_settings_from="xlsx",
        use_existing_data=False,
    )


What to expect
~~~~~~~~~~~~~~

At this point CVXlab validates the structure of:

- The three sets of the tutorial model.
- The exogenous, endogenous, and constant data tables.
- The symbolic variables associated with those tables.
- The problem definition stored in the setup files.

If validation succeeds, the model directory is ready for the next operational
step. In a workflow from scratch, the most important generated artifact is the
``sets.xlsx`` file, which is filled in the next step of this tutorial.


.. _simple-tutorial-fill-sets-data:

Step 5. Fill the sets workbook
------------------------------

Related user guide step: :ref:`fill-sets-data`

After the ``Model`` instance is created, CVXlab generates a workbook for the
set coordinates. For the simple energy system model, the coordinates are the
actual items over which scenarios, technologies, and time periods are defined.


Coordinates of the tutorial model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table:: Sets of the simple energy system model
  :header-rows: 1

  * - Set name
    - Symbol
    - Coordinates
    - Cardinality
    - Set type
  * - Technologies
    - :math:`t`
    - Solar, Gas, Nuclear
    - 3
    - Dimension
  * - Time periods
    - :math:`y`
    - 2025, 2026, 2027, 2028, 2029, 2030
    - 6
    - Dimension
  * - Demand scenarios
    - :math:`d`
    - Low_demand, High_demand
    - 2
    - Inter-problem


How these coordinates are used
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- The demand-scenario coordinates create two independent problem instances.
- The technology coordinates define the columns of the main decision variable.
- The time-period coordinates define the intra-problem expansion of the
  expressions.

No filters are required for this first tutorial model, so all variables are
defined on full domains.


.. _simple-tutorial-data-structures-init:

Step 6. Initialize the data structures
--------------------------------------

Related user guide step: :ref:`data-structures-init`

Once the set coordinates are available, CVXlab can generate the underlying data
structures required by the tutorial model.


Typical command
~~~~~~~~~~~~~~~

.. code-block:: python

    model.initialize_model_environment()


What this prepares for the tutorial
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For the simple energy system model, this step:

- Loads the set coordinates into the model index.
- Assigns coordinates and dimensions to the tables ``cost``, ``capacity``,
  ``availability``, ``demand``, ``supply``, and ``constant``.
- Creates a blank SQLite database with normalized tables.
- Generates blank input-data file(s) for the exogenous tables that will be
  filled in the next step.

After this step, the model structure is complete and ready to receive numerical
input data.


.. _simple-tutorial-fill-exogenous-data:

Step 7. Fill the exogenous data
-------------------------------

Related user guide step: :ref:`fill-exogenous-data`

The blank input-data files generated by CVXlab must now be populated with the
exogenous data of the tutorial model. Only exogenous tables are filled by the
user at this stage.


Input tables to populate
~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table:: Exogenous data tables of the tutorial model
  :header-rows: 1

  * - Data table
    - Domain
    - Description
  * - :math:`cost(t)`
    - :math:`t`
    - Specific generation costs by technology in EUR/MWh
  * - :math:`capacity(t,y)`
    - :math:`t \times y`
    - Installed capacity by technology and time period in MW
  * - :math:`availability(t)`
    - :math:`t`
    - Availability factor by technology in MWh/MW
  * - :math:`demand(d,y)`
    - :math:`d \times y`
    - Energy demand by scenario and time period in MWh


Important distinction
~~~~~~~~~~~~~~~~~~~~~

- ``cost``, ``capacity``, ``availability``, and ``demand`` are filled by the
  user because they are exogenous inputs.
- ``supply`` is not filled manually because it is endogenous and will be solved
  by the optimizer.
- ``constant`` is not an external input table in the usual sense: it is defined
  structurally through the setup files and used to build symbolic expressions.


.. _simple-tutorial-numerical-problem-init:

Step 8. Initialize the numerical problem
----------------------------------------

Related user guide step: :ref:`numerical-problem-init`

At this point the symbolic model and the exogenous data are both available, so
CVXlab can generate the numerical optimization problem.


Typical command
~~~~~~~~~~~~~~~

.. code-block:: python

    model.refresh_database_and_initialize_problem()


Expressions generated for the tutorial
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The symbolic problem of the simple energy system model is:

.. math::
  \begin{aligned}
  \min_{E_s} \quad & c \cdot E_s' & \forall \, y\\
  \text{s.t.} \quad & E_s \cdot i_t \geq E_d & \forall \, y \\
  & E_s \leq cap \cdot \widehat{av} & \forall \, y \\
  & E_s \geq 0 & \forall \, y
  \end{aligned}

During initialization:

- One numerical problem is built for each demand scenario.
- One numerical expression instance is generated for each time period.
- Variables not indexed on ``Time_periods`` are broadcast across the generated
  expression instances.

The result of this step is a CVXPY-ready representation of the energy planning
problem for all scenarios of the tutorial model.


.. _simple-tutorial-numerical-problem-run:

Step 9. Solve the numerical problem
-----------------------------------

Related user guide step: :ref:`numerical-problem-run`

The energy system tutorial defines a standard convex optimization problem, so
the numerical problem can now be solved directly once initialization is
complete.


Typical command
~~~~~~~~~~~~~~~

.. code-block:: python

    model.run_model(
        integrated_problems=False,
        solver="ECOS",
    )


What happens in this tutorial
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- The model is solved independently for each demand scenario.
- For each scenario, the optimizer computes the least-cost feasible value of
  the endogenous supply variable :math:`E_s`.
- The solution covers all technologies and all time periods defined in the sets
  workbook.

Since each scenario yields a single, self-contained linear optimization problem,
no iterative decomposition is required in the basic tutorial workflow.
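
Because the constraints of this tutorial problem do not couple different time
periods, the optimum for one period can be cross-checked by hand with a
merit-order dispatch: technologies are loaded from cheapest to most expensive
until demand is met. The sketch below is not how CVXlab or its solver works. It
is an independent check that assumes non-negative costs, and all numbers are
hypothetical.

.. code-block:: python

    def merit_order_dispatch(cost, max_supply, demand):
        """Meet demand at least cost by loading the cheapest technologies first."""
        supply = {tech: 0.0 for tech in cost}
        remaining = demand
        for tech in sorted(cost, key=cost.get):
            supply[tech] = min(max_supply[tech], remaining)
            remaining -= supply[tech]
        if remaining > 1e-9:
            raise ValueError("demand exceeds the total available supply")
        return supply


    # Hypothetical single-period data (not taken from the tutorial input files).
    cost = {"Solar": 0.0, "Gas": 60.0, "Nuclear": 30.0}          # EUR/MWh
    max_supply = {"Solar": 50.0, "Gas": 100.0, "Nuclear": 80.0}  # MWh (cap * av)
    demand = 150.0                                               # MWh

    supply = merit_order_dispatch(cost, max_supply, demand)
    total_cost = sum(cost[tech] * supply[tech] for tech in supply)
    print(supply, total_cost)

With these placeholder values, Solar and Nuclear run at their upper bounds and
Gas covers the remaining 20 MWh, for a total cost of 3600 EUR.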


.. _simple-tutorial-export-model-results:

Step 10. Export the results
---------------------------

Related user guide step: :ref:`export-model-results`

Once the optimization problem has been solved, the endogenous values can be
written back to the SQLite database for inspection and reporting.


Typical command
~~~~~~~~~~~~~~~

.. code-block:: python

    model.load_results_to_database()


Main result of the tutorial
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The key exported table is the endogenous supply table:

.. math::
  supply(d,y,t)

This table stores the optimal energy supplied by each technology, for each time
period, and for each demand scenario. After export, the results can be explored
through the CVXlab utilities, direct SQLite inspection, or downstream reporting
tools.
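
As an illustration of direct SQLite inspection, the sketch below uses Python's
built-in ``sqlite3`` module on an in-memory database. The table layout, the
column names (``scenario``, ``year``, ``technology``, ``value``) and the rows are
all assumptions made for the example. The schema of the database generated by
CVXlab should be checked before adapting the query.

.. code-block:: python

    import sqlite3

    # In-memory stand-in for the model database file.
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE supply (scenario TEXT, year INTEGER, technology TEXT, value REAL)"
    )

    # Placeholder rows only; real values come from the solved model.
    connection.executemany(
        "INSERT INTO supply VALUES (?, ?, ?, ?)",
        [
            ("Low_demand", 2025, "Solar", 50.0),
            ("Low_demand", 2025, "Nuclear", 80.0),
            ("Low_demand", 2025, "Gas", 20.0),
        ],
    )

    # Total energy supplied per scenario and time period.
    totals = connection.execute(
        "SELECT scenario, year, SUM(value) FROM supply GROUP BY scenario, year"
    ).fetchall()
    print(totals)
    connection.close()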


Practical example
-----------------

Let us consider a model with the Set structure defined below. Notice that the 
Set structure can be defined either in the ``structure_sets.yml`` file or in 
the ``structure_sets`` tab of the ``settings.xlsx`` Excel file. The following
tabs show the same Set structure in both formats.

.. tabs::

  .. tab:: YAML

    .. code-block:: yaml

        Scenarios:
            description: Scenarios analyzed in the model
            split_problem: True
        
        Technologies:
            description: Technologies included in the model
            filters:
                Type: [Supply, Demand, Storage]
                Category: [Renewable, Non-renewable]
            aggregations: [Sectors]

  .. tab:: XLSX (tab ``structure_sets``)

    .. list-table::
        :header-rows: 1
        :align: center

        * - set_key
          - description
          - split_problem
          - filters
          - aggregations
        * - Scenarios
          - Scenarios analyzed in the model
          - True
          -
          -
        * - Technologies
          - Technologies included in the model
          -
          - Type: [Supply, Demand, Storage], Category: [Renewable, Non-renewable]
          - Sectors


The tabs of the ``sets.xlsx`` file are reported below. The header is 
automatically generated based on the set definition, while the entries are defined 
by the user.


.. tabs::

    .. tab:: tab ``_set_SCENARIOS``

      .. list-table:: 
        :header-rows: 1
        :align: center

        * - Scenarios_Name
        * - Business As Usual
        * - Net Zero emissions
        * - Stated Policies

    .. tab:: tab ``_set_TECHNOLOGIES``

      .. list-table:: 
        :header-rows: 1
        :align: center

        * - Technologies_Name
          - Technologies_Type
          - Technologies_Category
          - Technologies_Sector
        * - Power by Coal
          - Supply
          - Non-renewable
          - Power sector
        * - Power by Solar
          - Supply
          - Renewable
          - Power sector
        * - Boiler
          - Supply
          - 
          - Heat sector
        * - Batteries
          - Storage
          - 
          - Power sector
        * - Households
          - Demand
          - 
          - Demand

In the example above, the unused fields in the structure file(s) have been omitted 
(e.g., ``copy_from`` for all Sets).
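
The effect of filters on a set can be sketched in plain Python using the
technologies listed above. The ``select`` helper is hypothetical and only
illustrates how filter values identify a sub-domain. It is not part of the
CVXlab API.

.. code-block:: python

    # Coordinates and filter values copied from the ``_set_TECHNOLOGIES`` tab.
    technologies = [
        {"Name": "Power by Coal", "Type": "Supply", "Category": "Non-renewable"},
        {"Name": "Power by Solar", "Type": "Supply", "Category": "Renewable"},
        {"Name": "Boiler", "Type": "Supply", "Category": None},
        {"Name": "Batteries", "Type": "Storage", "Category": None},
        {"Name": "Households", "Type": "Demand", "Category": None},
    ]


    def select(coordinates, **filters):
        """Return the names of coordinates matching every filter key/value pair."""
        return [
            item["Name"]
            for item in coordinates
            if all(item.get(key) == value for key, value in filters.items())
        ]


    supply_technologies = select(technologies, Type="Supply")
    renewable_supply = select(technologies, Type="Supply", Category="Renewable")
    print(supply_technologies)
    print(renewable_supply)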