Conceptual model definition#

The CVXlab modeling process must be grounded on a solid conceptualization and mathematical definition of the problem to be solved. As its name suggests, CVXlab is primarily designed for convex optimization problems.

Definition of convex optimization problems and related mathematical concepts lies outside the scope of this documentation. Fundational knowledge of Opearations Research and can be found in several references. Among others, we suggest the textbook: Introduction to Operations Research (F.Hillier and G.Lieberman, McGraw Hill Education, 2024)

Since numerical problem generation and solution in CVXlab is grounded on the CVXPY package, we also recommend referring to the CVXPY documentation for a comprehensive description of supported problem types.

Before generating a CVXlab Model, the items below must be conceptually defined.

Sets#

Let \(\mathcal{S}_1, \ldots, \mathcal{S}_k\) be finite, generic non-empty index sets. Sets represent the dimensions of the model, defining its scope. Each set \(\mathcal{S}_i\) is characterized by a list of elements called coordinates. A domain (or shape) is defined as a Cartesian product of a subset of sets:

\[\Omega = \mathcal{S}_{i_1} \times \cdots \times \mathcal{S}_{i_p} \subseteq \mathcal{S}_1 \times \cdots \times \mathcal{S}_k.\]

Definition of domains is useful to identify the scope of data tables (and consequently of related variables) in the model.

A sub-domain can be identified by filtering each set \(\Omega' \subseteq \Omega\) based on defined criteria. Defining sub-domains is useful in defining variables pointing to a subset of values in data table.

A specific element in the domain is identified by a generic index tuple \(s\) as:

\[s = (s_1,\ldots,s_k) \in \mathcal{S}_1 \times \cdots \times \mathcal{S}_k\]

Sets can be partitioned into two disjoint categories based on their role in problem structure:

\[\mathcal{S}_1 \times \cdots \times \mathcal{S}_k = \underbrace{(\mathcal{S}_{I_1} \times \cdots \times \mathcal{S}_{I_m})}_{\text{Inter-problem}} \times \underbrace{(\mathcal{S}_{D_1} \times \cdots \times \mathcal{S}_{D_n})}_{\text{Dimension}}\]

where \(m + n = k\)

  • Inter-problem sets \(\mathcal{S}_{I_1}, \ldots, \mathcal{S}_{I_m}\): Define the space over which the numerical problem is solved. All variables and expressions in the model are defined and the numerical problem solved for each coordinate combination in the Cartesian product of inter-problem sets: \(\iota \in \mathcal{S}_{I_1} \times \cdots \times \mathcal{S}_{I_m}\). Each combination is called scenario, defining a distinct instance of the optimization problem. Inter-problem sets are used to define multiple scenarios, e.g., to represent different demand projections, cost assumptions, or to perform sensitivity analysis.

  • Dimensions sets \(\mathcal{S}_{D_1}, \ldots, \mathcal{S}_{D_n}\): Specify the shape and indexing of model variables — i.e., how variables are arranged into rows and columns and indexed across intra-problem coordinates. Depending by each variable, dimension sets can be further classified as:

    • Shape sets: Define rows and columns of variables (matrix structure).

    • Intra-problem sets: Variables with a given shape are indexed over the Cartesian product of the remaining dimensions sets, defined as intra-problem sets.

    A consistent definition of variables dimensions is fundamental to correctly define symbolic expressions in the model, that must be dimensionally consistent.

Data Tables and Variables#

Data tables represent collections of model data identified by a sets domain. Specifically, a data table \(D\) over domain \(\Omega\) is a function:

\[D : \Omega \to \mathbb{R} \quad \text{(or } \mathbb{Z}, \{0,1\}\text{)}\]

where \(\Omega \subseteq \mathcal{S}_1 \times \cdots \times \mathcal{S}_k\).

Data tables can be classified as:

  • Exogenous: known parameters \(d(s)\) for \(s \in \Omega\)

  • Endogenous: unknowns to be determined (can be further classified as decision variables or auxiliary variables). Can be further defined as continuous or integer.

  • Constants: fixed values.

In case of integrated problems solved iteratively, variables can be defined as endogenous or exogenous based on the role they play in each problem, avoiding circular dependencies (see Expressions and Problems).

A variable \(x\) associated with a data table \(D\) is a symbolic reference to values in \(D\), defined over the same domain \(\Omega\), or over a filtered sub-domain \(\Omega'\). Multiple variables can reference the same data table, characterized by:

  • A different allocation of dimensions sets (i.e. defining different sets as shapes and intra-problem sets)

  • Different filterings (i.e. referring to sub-domains \(\Omega' \subseteq \Omega\))

  • In case of constants, different built-in constant types are supported, and user-defined constants can also be defined.

The reason why data tables and variables are defined separately is to allow multiple variables to reference the same data table in different ways, increasing flexibility in problem definition, and optimizing data management in the model SQLite database.

Expressions and Problems#

An expression \(f\) is a symbolic composition of variables and linear operators:

\[f(x_1, \ldots, x_n) = \sum_{j=1}^{n} A_j(x_j)\]

where:

  • \(x_j\) are variables, defined over domains \(\Omega_j\)

  • \(A_j : \mathbb{R}^{\Omega_j} \to \mathbb{R}^{\Theta}\) are linear aggregation operators (summations, weighted sums, matrix multiplications, etc.). Complex operations not expressible through built-in operators can be implemented as user-defined operators (see Symbolic operators).

Symbolic expressions must be dimensionally consistent: variables shapes must be compatible (defined over matching or properly aligned domains/sub-domains) Moreover, when a variable is characterized by intra-problem sets, one numerical expression instance is generated for each coordinate combination of those intra-problem sets. Variables with different intra-problem sets can appear in the same symbolic expression: each variable is automatically broadcast/reused across all generated expression instances, allowing for a flexible problem definition.

This can be formalized as follows. Let \(\mathcal{S}_{D_1}, \ldots, \mathcal{S}_{D_n}\) be dimension sets partitioned as:

\[\mathcal{S}_{D_1} \times \cdots \times \mathcal{S}_{D_n} = \underbrace{\mathcal{S}_{R}}_{\text{Shape (rows)}} \times \underbrace{\mathcal{S}_{C}}_{\text{Shape (columns)}} \times \underbrace{(\mathcal{S}_{P_1} \times \cdots \times \mathcal{S}_{P_p})}_{\text{Intra-problem}}\]

where \(n = 2 + p\) (one row set, one column set, and \(p\) intra-problem sets).

An expression \(f(x_1, \ldots, x_k)\) with variables over domains \(\Omega_1, \ldots, \Omega_k\) is dimensionally consistent if all shape components are compatible (matching row/column dimensions or properly broadcastable).

For each \(\pi \in \mathcal{S}_{P_1} \times \cdots \times \mathcal{S}_{P_p}\), a numerical expression instance \(f_{\pi}\) is generated:

\[f_{\pi}(x_1|_{\pi}, \ldots, x_k|_{\pi})\]

where \(x_j|_{\pi}\) denotes the restriction/broadcast of variable \(x_j\) to the intra-problem coordinate \(\pi\). Variables with different intra-problem sets are automatically broadcast across all instances.

A problem is defined by one or more symbolic expressions, as either:

1. System of linear equations:

\[\begin{split}\begin{aligned} f_1(x) &= b_1 \\ &\vdots \\ f_p(x) &= b_p \end{aligned}\end{split}\]

2. Convex optimization problem:

\[\begin{split}\begin{aligned} \min_{x} \quad & f_0(x) \\ \text{s.t.} \quad & f_i(x) \le b_i, \quad i = 1, \ldots, p \\ & h_j(x) = c_j, \quad j = 1, \ldots, q \\ & \ell \le x \le u \end{aligned}\end{split}\]

where \(f_0\) is convex, \(f_i\) are convex, and \(h_j\) are affine.

Multiple problems can be defined in the same CVXlab model, all sharing common sets, data tables, and variables.

A CVXlab model can formulate and solve multiple problems that share the same sets, data tables, and variables. Problems are all solved over the same inter-problem sets \(\mathcal{S}_{I_1} \times \cdots \times \mathcal{S}_{I_m}\). Two execution schemes are supported:

  • Parallel (independent problems): problems with no coupling (no shared endogenous variables in expressions) can be solved independently and in parallel over the inter-problem sets.

  • Iterative decomposition (coupled/nonlinear): if a problem is nonlinear due to products of endogenous variables, split it into two or more convex subproblems. CVXlab solves them iteratively with a block Gauss-Seidel (alternating optimization) scheme, updating shared endogenous variables between subproblems until convergence. In this case, data tables must be classified as endogenous or exogenous per subproblem to allow proper information exchange and avoid circular dependencies.

Example: energy system model#

Problem statement

Let us consider a conceptual energy system planning model, where the goal is to define the least-cost energy production plan for one region over a defined time horizon, able to satisfy energy demand (assumed as known data defined according to different scenarios). Energy can be supplied by different technologies, characterized by specific production costs and installed capacities and availabilities (i.e. values able to convert installed capacity in MW to energy supplied in MWh). Installed capacity is varying over time, while costs and availabilities are assumed as fixed.

Definition of model sets and related coordinates

Sets defined for the model are summarized in the table below.

Sets defining model’s domain#

Set name

Symbol

Coordinates

Cardinality

Set type

Technologies

\(t\)

Solar, Gas, Nuclear

3

Dimension

Time periods

\(y\)

2025, 2026, 2027, 2028, 2029, 2030

6

Dimension

Demand scenarios

\(d\)

Low_demand, High_demand

2

Inter-problem

Notice that:

  • Inter-problem sets (\(d\)) define multiple problem instances. This implies that one optimization problem is generated and solved for each combination of demand scenario and cost sensitivity (in this case, only \(2\) problem instances).

  • Dimension sets (\(t\), \(y\)) are used to define the scope of data tables and the shapes of related variables.

  • Coordinates of each set can be associated to filters to define sub-domains. As example, the technologies set may classify technologies as renewable and non-renewable, allowing to define variables with sub-domains including only specific categories. In this simplified example, all variables are defined over full domains (no filtering is applied).

Definition of data tables and variables

The following tables summarizes the data tables and associated variables for the energy system model.

Data tables properties#

Type

Name

Domain [Cardinality]

Description

Exogenous

\(cost(t)\)

\(t - [3]\)

Specific costs of generation by cost scenario and technology (in €/MWh).

Exogenous

\(capacity(t,y)\)

\(t \times y - [3 \times 6 = 18]\)

Installed capacity by technology and time period (in MW).

Exogenous

\(availability(t)\)

\(t - [3]\)

Availability factors by technology (in MWh/MW).

Exogenous

\(demand(d,y)\)

\(d \times y - [2 \times 1 \times 6 = 12]\)

Energy demand defined by demand scenarios and time periods.

Endogenous

\(supply(d,y,t)\)

\(d \times y \times t - [2 \times 6 \times 3 = 72]\)

Energy supply defined by demand and cost scenarios, technology and time period.

Constant

\(constant(t)\)

\(t - [3]\)

Model constants defined based on the shape of \(t\) set.

Regarding data tables above:

  • Endogenous data table has a domain defined over all model sets, while exogenous data tables are defined over specific sets.

  • For each data table, the cardinality (i.e. the total number of data entries) is reported, calculated as the product of the cardinalities of all sets in the domain. As example, the availability(t) data table includes 3 entries only, one for each technology, due to its domain defined over the technologies set only.

Variables properties#

Related data table

Variable name

Shape (rows,columns)

Intra-problem sets

Inter-problem sets

\(cost(t)\)

\(c\)

\(1, t - [1, 3]\)

\(-\)

\(-\)

\(capacity(t, y)\)

\(cap\)

\(1, t - [1, 3]\)

\(y - [6]\)

\(-\)

\(availability(d, y)\)

\(av\)

\(1, t - [1, 3]\)

\(-\)

\(-\)

\(demand(d, y)\)

\(E_d\)

\(1, 1 - [1, 1]\)

\(y - [6]\)

\(d - [2]\)

\(supply(d,y,t)\)

\(E_s\)

\(1, t - [1, 3]\)

\(y - [6]\)

\(d - [2]\)

\(consant(t)\)

\(i_t\)

\(t, 1 - [3, 1]\)

\(-\)

\(-\)

Regarding variables above:

  • Each variable stem from a related data table, inheriting its properties: the domain (defined by sets) and the data type (exogenous, endogenous, constant).

  • Each variable is characterized by a specific allocation of dimensions sets into shapes and intra-problem sets. As example, the energy supply E_s variable has 1 row and 3 columns (defined by the technologies set), it is indexed over 6 intra-problem coordinates (defined by the time periods set) and over 2 inter-problem coordinates (defined by the demand scenarios set).

  • Constants can be defined with different built-in or user defined types (see Constant data types). In the example above, the i_t variable is defined as a summation vector, consisting in a column vector of 1s, useful to perform summations by matrix multiplications.

Definition of symbolic problem

For the current energy system model, a symbolic problem can be defined as a linear optimization problem as follows.

\[\begin{split}\begin{aligned} \min_{E_s} \quad & c \cdot E_s' & \forall \, y\\ \text{s.t.} \quad & E_s \cdot i_t \geq E_d & \forall \, y \\ & E_s \leq cap \cdot \widehat{av} & \forall \, y \\ & E_s \geq 0 & \forall \, y \end{aligned}\end{split}\]

Notice that:

  • The problem is defined a number of times equal to the cardinality of the inter-problem set. Specifically, one problem instance is defined and solved for each energy demand scenarios \(d\). In case of multiple inter-problem sets, the problem is defined for each coordinate combination in the Cartesian product of all inter-problem sets.

  • For each simbolic expression, a number of numerical expressions is generated, equal to the Cartesian product of all intra-problem sets of the related variables. In this case, all expressions are defined over the intra-problem set time periods \(y\), generating one numerical expression per time period for all symbolic expressions.

  • In case of variables defined over different intra-problem sets, automatic broadcasting is applied, Variables not defined over specific intra-problem sets are automatically reused across all generated numerical expressions.

  • In this problem, the dot operator \(\cdot\) represents matrix multiplication, the \(\widehat{(*)}\) is the diagonalization operator, and the \((*)'\) represents the transposition operator (see Symbolic operators for a comprehensive description of built-in symbolic operators).

A note on dimensional consistency#

The allocation of dimension sets to shapes and intra-problem sets offers significant modeling flexibility. The same problem can be formulated in multiple equivalent ways.

Matrix-based formulation (as in the example above):

  • Expressions must be dimensionally consistent, and variables shapes must be compatible for matrix operations. Multiple variables can stem from the same data table, each characterized by different allocations of dimension sets: this allows for flexible model definitions.

  • Expressions works with matrix operations (multiplication, transposition, …).

  • Compact symbolic representation with fewer expression instances.

  • Potentially more efficient numerical problem generation and solution.

Scalar-based formulation (extreme case):

All dimension sets can be allocated as intra-problem sets, reducing all variables to scalars (shape \((1,1)\)). In this case, for each energy demand scenario \(d\), the problem can be reformulated as:

\[\begin{split}\begin{aligned} \min_{E_s} \quad & \sum_{t} c \cdot E_s \\ \text{s.t.} \quad & \sum_{t} E_s \geq E_d & \forall \, y \\ & E_s \leq cap \cdot av & \forall \, t \, y \\ & E_s \geq 0 & \forall \, t \, y \end{aligned}\end{split}\]

where all variables become scalars indexed over \(t\) and \(y\).

Trade-offs:

Aspect

Matrix-based

Scalar-based

Symbolic complexity

Lower (fewer expressions)

Higher (many expression instances)

Mathematical notation

Compact and elegant

Explicit summations

Computational overhead

Efficient matrix operations

More expression instances to generate

Model readability

High-level abstraction

Detailed element-wise view

Debugging

Harder (matrix operations)

Easier (scalar operations)

Recommendation:

Use matrix formulations when:

  • The problem has natural matrix structure (e.g., flows over networks, resource allocation).

  • A compact symbolic expression is preferred.

  • Computational efficiency matters (large-scale problems).

Use scalar formulations when:

  • Element-wise constraints are complex and benefit from explicit indexing.

  • Debugging and transparency are priorities.

  • The problem is small-scale.

CVXlab supports both approaches and any intermediate allocation, allowing users to choose the most appropriate abstraction level for their specific modeling needs. All formulations are mathematically equivalent and produce identical numerical solutions.