Conceptual model definition

The CVXlab modeling process must be grounded in a solid conceptualization and mathematical definition of the problem to be solved. As its name suggests, CVXlab is primarily designed for convex optimization problems.

The definition of convex optimization problems and related mathematical concepts lies outside the scope of this documentation. Foundational knowledge of Operations Research can be found in several references; among others, we suggest the textbook Introduction to Operations Research (F. Hillier and G. Lieberman, McGraw Hill Education, 2024).

Since CVXlab relies on the CVXPY package to generate and solve numerical problems, we also recommend referring to the CVXPY documentation for a comprehensive description of supported problem types.

Before generating a CVXlab Model, the items below must be conceptually defined.

Concept	Defines	Used for
Sets	The model domain and indexing space	Defining model Variables based on a coordinate system
Data Tables	The model data over a domain defined by multiple Sets	Storing data in the database, defining Variables
Variables	Symbolic references to Data Table values	Defining model symbolic Expressions
Expressions	Symbolic combinations of Variables and operators	Defining numerical Problems
Problems	Numerical problem(s) of the model	Determining endogenous Data Tables values

Sets

Let \(\mathcal{S}_1, \ldots, \mathcal{S}_k\) be finite, generic non-empty index sets. Sets represent the dimensions of the model, defining its scope. Each set \(\mathcal{S}_i\) is characterized by a list of elements called coordinates. A domain (or shape) is defined as a Cartesian product of a subset of sets:

\[\Omega = \mathcal{S}_{i_1} \times \cdots \times \mathcal{S}_{i_p} \subseteq \mathcal{S}_1 \times \cdots \times \mathcal{S}_k.\]

Defining a domain identifies the scope of a data table (and consequently of the related variables) in the model.

A sub-domain \(\Omega' \subseteq \Omega\) can be identified by filtering the coordinates of each set based on defined criteria. Sub-domains are useful for defining variables that point to a subset of the values in a data table.

A specific element in the domain is identified by a generic index tuple \(s\) as:

\[s = (s_1,\ldots,s_k) \in \mathcal{S}_1 \times \cdots \times \mathcal{S}_k\]
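The notions of domain, sub-domain, and index tuple can be sketched in plain Python. This is only an illustration of the definitions above, not CVXlab syntax: the set names, coordinates, and the `domain` helper are hypothetical.

```python
from itertools import product

# Hypothetical sets and coordinates, chosen only for illustration.
sets = {
    "years": [2025, 2030],
    "regions": ["north", "south"],
    "techs": ["solar", "wind", "gas"],
}


def domain(set_names):
    """Cartesian product of the selected sets, as a list of index tuples."""
    return list(product(*(sets[name] for name in set_names)))


# Domain over a subset of the model sets: regions x techs.
omega = domain(["regions", "techs"])

# Sub-domain: keep only the index tuples that satisfy a filtering criterion.
omega_sub = [s for s in omega if s[1] != "gas"]

print(len(omega))      # 2 * 3 = 6 index tuples
print(omega[0])        # ('north', 'solar')
print(len(omega_sub))  # 4
```

Each element of `omega` plays the role of an index tuple \(s\), and `omega_sub` is a filtered sub-domain \(\Omega' \subseteq \Omega\).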

Sets can be partitioned into two disjoint categories based on their role in the problem structure:

\[\mathcal{S}_1 \times \cdots \times \mathcal{S}_k = \underbrace{(\mathcal{S}_{I_1} \times \cdots \times \mathcal{S}_{I_m})}_{\text{Inter-problem sets}} \times \underbrace{(\mathcal{S}_{D_1} \times \cdots \times \mathcal{S}_{D_n})}_{\text{Dimension sets}}\]

where \(m + n = k\). Inter-problem sets and dimension sets are defined as below. A consistent definition of variable dimensions is fundamental to correctly define symbolic expressions in the model, which must be dimensionally consistent.

Inter-problem sets

\(\mathcal{S}_{I_1}, \ldots, \mathcal{S}_{I_m}\)

Define the space over which the numerical problem is solved. All variables and expressions in the model are defined and the numerical problem is solved for each coordinate combination in the Cartesian product of inter-problem sets: \(\iota \in \mathcal{S}_{I_1} \times \cdots \times \mathcal{S}_{I_m}\). Each combination is defined as a scenario, identifying a distinct instance of the optimization problem. Inter-problem sets are used to define multiple scenarios, for example to represent different demand projections, cost assumptions, or sensitivity cases.
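As a rough sketch of how inter-problem sets generate scenarios, the snippet below enumerates every coordinate combination and handles each one as a separate problem instance. The set names and the `solve_scenario` placeholder are assumptions made for illustration and do not reflect the CVXlab API.

```python
from itertools import product

# Hypothetical inter-problem sets (names and coordinates are illustrative).
inter_problem_sets = {
    "demand": ["low", "high"],
    "cost": ["base", "optimistic", "pessimistic"],
}

# Each scenario is one coordinate combination of the inter-problem sets.
scenarios = [
    dict(zip(inter_problem_sets, combo))
    for combo in product(*inter_problem_sets.values())
]


def solve_scenario(scenario):
    """Placeholder for building and solving one problem instance."""
    return f"solved demand={scenario['demand']}, cost={scenario['cost']}"


results = [solve_scenario(s) for s in scenarios]
print(len(results))  # 2 * 3 = 6 independent problem instances
```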

Dimension sets

\(\mathcal{S}_{D_1}, \ldots, \mathcal{S}_{D_n}\)

Specify the shape and indexing of model variables, that is, how variables are arranged into rows and columns and indexed across intra-problem coordinates. Depending on each variable, dimension sets can be further classified as:

Shape sets: define rows and columns of variables (matrix structure). Multiple sets can be assigned to the same row or column, and the resulting dimension is the Cartesian product of the assigned sets. For rows over \(\mathcal{S}_{R} = \mathcal{S}_{a} \times \mathcal{S}_{b}\), the variable has \(|\mathcal{S}_{R}| = |\mathcal{S}_{a}| \cdot |\mathcal{S}_{b}|\) rows with row coordinates \((s_a, s_b) \in \mathcal{S}_{a} \times \mathcal{S}_{b}\). The same applies to columns.
Intra-problem sets: variables with a given shape are indexed over the Cartesian product of the remaining dimension sets. For intra-problem coordinates over \(\mathcal{S}_{P} = \mathcal{S}_{P_1} \times \cdots \times \mathcal{S}_{P_p}\), each variable \(x\) of shape \(\mathcal{S}_{R} \times \mathcal{S}_{C}\) has \(|\mathcal{S}_{P}| = \prod_{j=1}^{p} |\mathcal{S}_{P_j}|\) instances, one for each \(\pi = (s_{P_1},\ldots,s_{P_p}) \in \mathcal{S}_{P}\).
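The counting rules above can be checked with a small sketch. The dimension sets and their allocation to rows, columns, and intra-problem sets are hypothetical, and the code only mirrors the definitions; it is not how CVXlab represents variables internally.

```python
from itertools import product
from math import prod

# Hypothetical dimension sets for one variable.
dimension_sets = {
    "techs": ["solar", "wind", "gas"],
    "fuels": ["none", "methane"],
    "regions": ["north", "south"],
    "years": [2025, 2030, 2035, 2040],
}

# Hypothetical allocation: two sets on rows, one on columns,
# and the remaining sets used as intra-problem sets.
rows = ["techs", "fuels"]
cols = ["regions"]
intra = [name for name in dimension_sets if name not in rows + cols]


def size(names):
    """Number of coordinate combinations spanned by the given sets."""
    return prod(len(dimension_sets[n]) for n in names)


shape = (size(rows), size(cols))
row_coordinates = list(product(*(dimension_sets[n] for n in rows)))
instances = list(product(*(dimension_sets[n] for n in intra)))

print(shape)               # (6, 2)
print(row_coordinates[1])  # ('solar', 'methane')
print(len(instances))      # 4
```

With no shape sets assigned, `size([])` returns 1, so the variable degenerates to shape `(1, 1)` and every dimension set contributes only to the number of instances.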

Data Tables and Variables

Data tables represent collections of model data identified by a set domain. Specifically, a data table \(D\) over domain \(\Omega\) is a function:

\[D : \Omega \to \mathbb{R} \quad \text{(or } \mathbb{Z}, \{0,1\}\text{)}\]

where \(\Omega \subseteq \mathcal{S}_1 \times \cdots \times \mathcal{S}_k\).

Data tables can be classified as:

Exogenous: known parameters \(d(s)\) for \(s \in \Omega\)
Endogenous: unknowns to be determined. Can be further classified as continuous (the default if nothing is specified), integer (\(\mathbb{Z}\)), or boolean (\(\{0,1\}\)).
Constants: fixed values. Multiple built-in constant types are supported, and user-defined constants can also be defined at the variable level (see Constant data types).

Hybrid Data Tables

In case of integrated problems solved iteratively, variables can be defined as endogenous or exogenous depending on the role they play in each problem, avoiding circular dependencies (see Expressions and Problems). In this case, Data Tables can be classified as hybrid, meaning that the variables stemming from them are defined as endogenous for some problems and exogenous for others.

A variable \(x\) associated with a data table \(D\) is a symbolic reference to values in \(D\), defined over the same domain \(\Omega\) or over a filtered sub-domain \(\Omega'\). Multiple variables can reference the same data table, each characterized by:

A different allocation of dimension sets, defining different sets as shapes and intra-problem sets.
Different filterings, referring to sub-domains \(\Omega' \subseteq \Omega\).
Different constant definitions, in case the table stores constants.

Data tables and variables are defined separately so that multiple variables can reference the same data table in different ways. This increases flexibility in problem definition and optimizes data management in the model SQLite database.

Expressions and Problems

An expression \(f\) is a symbolic composition of variables and linear operators:

\[f(x_1, \ldots, x_n) = \sum_{j=1}^{n} A_j(x_j)\]

where:

\(x_j\) are variables, defined over domains \(\Omega_j\)
\(A_j : \mathbb{R}^{\Omega_j} \to \mathbb{R}^{\Theta}\) are linear aggregation operators (summations, weighted sums, matrix multiplications, etc.). Complex operations not expressible through built-in operators can be implemented as user-defined operators (see Symbolic operators).
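A minimal numerical sketch of such an expression is given below, taking both operators to be matrix multiplications. The matrices, the variable values, and the helper names are hypothetical; CVXlab's own symbolic operators are not used here.

```python
def matvec(matrix, vector):
    """Linear operator: matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def add(u, v):
    """Element-wise sum of two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("operands must have the same length")
    return [a + b for a, b in zip(u, v)]


# Hypothetical operators: both map into the same target space R^2.
A1 = [[1.0, 2.0, 0.0],
      [0.0, 1.0, 1.0]]   # maps R^3 -> R^2
A2 = [[2.0, 0.0],
      [0.0, 3.0]]        # maps R^2 -> R^2


def f(x1, x2):
    """f(x1, x2) = A1(x1) + A2(x2)."""
    return add(matvec(A1, x1), matvec(A2, x2))


x1, x2 = [1.0, 1.0, 1.0], [1.0, 2.0]
print(f(x1, x2))  # [5.0, 8.0]
```

The two terms can be summed only because both operators map into the same space, which is the role of the common codomain \(\mathbb{R}^{\Theta}\) in the definition.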

Symbolic expressions must be dimensionally consistent: variable shapes must be compatible and, when needed, properly aligned or broadcast. Moreover, when a variable is characterized by intra-problem sets, one numerical expression instance is generated for each coordinate combination of those sets. Variables with different intra-problem sets can appear in the same symbolic expression: each variable is automatically broadcast or reused across all generated expression instances.

This can be formalized using the notation introduced in Sets: dimension sets are partitioned into shape sets \(\mathcal{S}_{R}\) (rows) and \(\mathcal{S}_{C}\) (columns), plus \(p\) intra-problem sets \(\mathcal{S}_{P_1}, \ldots, \mathcal{S}_{P_p}\).

An expression \(f(x_1, \ldots, x_k)\) with variables over domains \(\Omega_1, \ldots, \Omega_k\) is dimensionally consistent if all shape components are compatible.

For each \(\pi \in \mathcal{S}_{P_1} \times \cdots \times \mathcal{S}_{P_p}\), a numerical expression instance \(f_{\pi}\) is generated:

\[f_{\pi}(x_1|_{\pi}, \ldots, x_k|_{\pi})\]

where \(x_j|_{\pi}\) denotes the restriction or broadcast of variable \(x_j\) to the intra-problem coordinate \(\pi\).
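The generation of expression instances can be sketched as follows, assuming one intra-problem set and two variables, only one of which is indexed by it. The `restrict` helper is a hypothetical stand-in for the restriction or broadcast \(x_j|_{\pi}\), not a CVXlab function.

```python
# Hypothetical intra-problem set.
years = [2025, 2030, 2035]

# x depends on the intra-problem coordinate: one value per year.
x = {2025: 1.0, 2030: 2.0, 2035: 4.0}
# c has no intra-problem set: a single value reused for every instance.
c = 10.0


def restrict(variable, pi):
    """Return variable|_pi: a lookup if indexed by pi, a broadcast otherwise."""
    return variable[pi] if isinstance(variable, dict) else variable


def f(x_pi, c_pi):
    """Expression f(x, c) = c * x, evaluated for one instance."""
    return c_pi * x_pi


# One numerical expression instance per intra-problem coordinate.
instances = {pi: f(restrict(x, pi), restrict(c, pi)) for pi in years}
print(instances)  # {2025: 10.0, 2030: 20.0, 2035: 40.0}
```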

A problem is defined by one or more symbolic expressions, as either:

1. System of linear equations:

\[\begin{split}\begin{aligned} f_1(x) &= b_1 \\ &\vdots \\ f_p(x) &= b_p \end{aligned}\end{split}\]

2. Convex optimization problem:

\[\begin{split}\begin{aligned} \min_{x} \quad & f_0(x) \\ \text{s.t.} \quad & f_i(x) \le b_i, \quad i = 1, \ldots, p \\ & h_j(x) = c_j, \quad j = 1, \ldots, q \\ & \ell \le x \le u \end{aligned}\end{split}\]

where the objective \(f_0\) and the inequality constraint functions \(f_i\) are convex, and the equality constraint functions \(h_j\) are affine.
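As a minimal illustration of this form (the numbers are hypothetical and not taken from CVXlab), consider a two-variable linear program with one inequality constraint, no equality constraints, and bounds \(\ell = 0\), \(u = 1\):

\[\begin{split}\begin{aligned} \min_{x_1, x_2} \quad & x_1 + 2 x_2 \\ \text{s.t.} \quad & -x_1 - x_2 \le -1 \\ & 0 \le x_1 \le 1, \quad 0 \le x_2 \le 1 \end{aligned}\end{split}\]

The objective and the constraint function are affine, hence convex. The inequality requires \(x_1 + x_2 \ge 1\), so any feasible point satisfies \(x_1 + 2 x_2 \ge x_1 + x_2 \ge 1\). The bound is attained only at \(x^\star = (1, 0)\), which is therefore the optimum, with objective value \(1\).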

Multiple problems can be defined and solved in the same CVXlab model, all sharing common sets, data tables, and variables. All problems are solved over the same inter-problem sets \(\mathcal{S}_{I_1} \times \cdots \times \mathcal{S}_{I_m}\). Two execution schemes are supported:

Parallel (independent problems): problems with no coupling, meaning no shared endogenous variables in expressions, can be solved independently and in parallel over the inter-problem sets.
Iterative decomposition (coupled or nonlinear): if a problem is nonlinear due to products of endogenous variables, it can be split into two or more convex subproblems. CVXlab solves them iteratively with a block Gauss-Seidel (alternating optimization) scheme, updating shared endogenous variables between subproblems until convergence. In this case, data tables must be classified as endogenous or exogenous per subproblem to allow proper information exchange and avoid circular dependencies within the same subproblem.

Non-linear problem decomposition

A simple conceptual example is the following coupled system of equalities:

\[\begin{split}\text{Problem 0: } \left\{ \begin{aligned} a + xy &= 0 \\ b + cx &= 0 \end{aligned} \right. \qquad \text{endogenous: } x, y\end{split}\]

The term \(xy\) makes the system nonlinear because it multiplies two endogenous variables. Problem 0 can be decomposed into two separate linear subproblems 1 and 2:

\[\begin{split}\begin{aligned} &\text{Problem 1: } \left\{ a + xy = 0 \right. \qquad \text{endogenous: } y \\ &\text{Problem 2: } \left\{ b + cx = 0 \right. \qquad \text{endogenous: } x \end{aligned}\end{split}\]

In this way, the same shared variable \(x\) appears in both subproblems with different roles: as endogenous in Problem 2 and exogenous in Problem 1. Each subproblem is linear once the shared variable coming from the other block is fixed.
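The alternating scheme for this example can be sketched in plain Python. This is a hand-written illustration of block Gauss-Seidel on the two subproblems, under the assumptions that \(c \neq 0\) and that \(x\) stays non-zero; the function names, the initial guess, and the stopping rule are hypothetical and do not describe CVXlab's internal implementation.

```python
def solve_problem_1(a, x):
    """Problem 1: a + x*y = 0, linear in y once x is fixed (exogenous)."""
    if x == 0:
        raise ZeroDivisionError("x must be non-zero to solve for y")
    return -a / x


def solve_problem_2(b, c):
    """Problem 2: b + c*x = 0, linear in x (endogenous)."""
    return -b / c


def gauss_seidel(a, b, c, x0=1.0, tol=1e-9, max_iter=50):
    """Alternate between the two subproblems until x and y stop changing."""
    x, y = x0, None
    for iteration in range(1, max_iter + 1):
        y_new = solve_problem_1(a, x)   # uses the latest available x
        x_new = solve_problem_2(b, c)   # updates the shared variable x
        converged = (
            y is not None
            and abs(x_new - x) < tol
            and abs(y_new - y) < tol
        )
        x, y = x_new, y_new
        if converged:
            return x, y, iteration
    raise RuntimeError("no convergence within max_iter iterations")


# Hypothetical exogenous values.
a, b, c = 2.0, -3.0, 1.5
x, y, iterations = gauss_seidel(a, b, c)
print(x, y)                  # 2.0 -1.0
print(a + x * y, b + c * x)  # 0.0 0.0 (both residuals vanish)
```

In this particular system Problem 2 does not depend on \(y\), so \(x\) is settled after the first pass and the loop only needs further passes to confirm that \(y\) has stopped changing. In a genuinely coupled system each subproblem would use the latest values produced by the other.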

Dimensional consistency

The allocation of dimension sets to shapes and intra-problem sets offers significant modeling flexibility. The same problem can be formulated in multiple equivalent ways.

When a variable's dimension sets include shape sets, the variable is a vector or a matrix. This leads to a matrix-based formulation of the mathematical problem, consisting of one or more vectorized expressions. These expressions must be dimensionally consistent, and variable shapes must be compatible for matrix operations such as multiplication or transposition. This representation is compact and usually leads to more efficient computations, but it can be harder to formulate and understand.

When all dimension sets are defined as intra-problem sets, every variable reduces to a scalar with shape \((1,1)\). This leads to a scalar-based formulation of the mathematical problem, where all expressions are written element-wise. This representation is more explicit and easier to understand, but it can lead to a higher number of expression instances and less efficient computations.

Trade-offs between the two formulations are summarized in the table below.

Aspect	Matrix-based	Scalar-based
Symbolic complexity	Lower (fewer expressions)	Higher (many expression instances)
Mathematical notation	Compact and elegant	Explicit summations
Computational overhead	Efficient matrix operations	More expression instances to generate
Model readability	High-level abstraction	Detailed element-wise view
Debugging	Harder (matrix operations)	Easier (scalar operations)

A matrix-based formulation is recommended when the problem has a natural matrix structure, a compact symbolic expression is preferred, or computational efficiency matters. Conversely, a scalar-based formulation is preferable when element-wise constraints are complex and benefit from explicit indexing, when debugging and transparency are priorities, or when the problem is small-scale.

CVXlab supports both approaches and any intermediate allocation, allowing users to choose the most appropriate abstraction level for their specific modeling needs. All formulations are mathematically equivalent and produce identical numerical solutions.
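The equivalence of the two formulations can be illustrated on a matrix-vector product. The snippet below is a plain-Python analogy with hypothetical data, not CVXlab syntax: the same values are computed once as a single vectorized expression and once as a collection of scalar expression instances.

```python
# Hypothetical data: y = A x, with rows over set R and columns over set C.
R = ["r1", "r2"]
C = ["c1", "c2", "c3"]
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [1.0, 0.5, 2.0]

# Matrix-based: R and C act as shape sets, giving one vectorized expression.
y_matrix = [sum(a * v for a, v in zip(row, x)) for row in A]

# Scalar-based: R acts as an intra-problem set, giving one scalar expression
# instance per coordinate, each with an explicit summation over C.
y_scalar = {}
for i, r in enumerate(R):
    y_scalar[r] = sum(A[i][j] * x[j] for j in range(len(C)))

print(y_matrix)  # [8.0, 18.5]
print(y_scalar)  # {'r1': 8.0, 'r2': 18.5}
```

The matrix-based version produces one expression of shape `(2, 1)`, while the scalar-based version produces two instances. The number of instances grows with the size of the intra-problem sets, which is the overhead noted in the trade-off table.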