文章

Notes | Convex Optimization | Chapter2 Convex sets

Notes | Convex Optimization | Chapter2 Convex sets

2.1 Affine and convex sets

2.1.1 Lines and line segments

Suppose x1x2 are two points in Rn. Points of the form

y=θx1+(1θ)x2

where θR, form the line passing through x1 and x2. Expressing y in the form

y=x2+θ(x1x2)

gives another interpretation: y is the sum of the base point x2 (corresponding to θ=0) and the direction x1x2 (which points from x2 to x1) scaled by the parameter θ.

2.1.2 Affine sets

A set CRn is affine if the line through any two distinct points in C lies in C. We refer to a point of the form θ1x1++θkxk, where θ1++θk=1, as an affine combination of the points x1,,xk.

If C is an affine set, x1,,xkC, and θ1++θk=1, then the point θ1x1++θkxk also belongs to C.

If C is an affine set and x0C, then the set

V=Cx0=xx0|xC

is a subspace. Suppose v1,v2V and α,βR. Then we have v1+x0C and v2+x0C, and so

αv1+βv2+x0=α(v1+x0)+β(v2+x0)+(1αβ)x0C

since C is affine, and α+β+(1αβ)=1. we conclude that αv1+βv2V, since αv1+βv2+x0C.

Thus, the affine set C can be expressed as

C=V+x0=v+x0|vV

as a subspace plus an offset.

Every affine set can be expressed as the solution set of a system of linear equation. (See Example2.1)

The set of all affine combinations of points in some set CRn is called the affine hull if C, and denoted affC:

affC=θ1x1++θkxk|x1,,xkC,θ1++θk=1

The affine hull is smallest affine set that contains C, in the following sense: if S is any affine set with CS, then affCS

2.1.3 Affine dimension and relative interior

We define the affine dimension of a set C as the dimension of its affine hull.

If the affine dimension of a set CRn is less than n, then the set lies in the affine set affCRn. We define the relative interior of the set C, denoted relintC , as its interior relative to affC:

relintC=xC|B(x,r)affCC for some r>0
where $B(x,r)=\{y\lVert y-x\rVert\leq r\},theballofradiusrandcenterxinthenorm\lVert\cdot\rVert$. We can then define the relative boundary of a set C as clCrelintC, where clC is the closure of C.

2.1.4 Convex sets

A set C is convex if the line segment between any two point in C lies in C. If for any x1,x2C and any θ with 0θ1, we have

θx1+(1θ)x2C

We call a point of the form θ1x1++θkxk, where θ1++θk=1 and θi0,i=1,,k, a convex combination of the points x1,,xk. A convex combination of points can be thought of as a mixture or weighted average of the points, with θi the fraction of xi in the mixture.

The convex hull of a set C, denoted convC, is the set of all convex combinations of points in C:

convC=θ1x1++θkxk|xiC,θi0,i=1,,k,θ1++θk=1

If B is any convex set that contains C, then convCB

2.1.5 Cones

A set C is called a cone, or nonnegative homogeneous, if for every xC and θ0 we have θxC. A set C is a convex cone if it is convex and a cone, which means that for any x1,x2C and θ1,θ20, we have

θ1x1+θ2x2C

A point of the form θ1x1++θkxk with θ1,,θk0 is called a conic combination (or a nonnegative linear combination) of x1,,xk.

A set C is a convex cone if and only if it contains all conic combinations of its elements.

The conic hull of a set C is the set of all conic combinations of points in C

θ1x1+θkxk|xiC,θi0,i=1,,k

which is also the smallest convex cone that contains C

2.2 Some important examples

  • The empty set , any single point (i.e., singleton) {x0}, and the whole space Rn are affine (hence, convex) subsets of Rn.
  • Any line is affine. If it passes through zero, it is a subspace, hence also a convex cone.
  • A line segment is convex, but not affine (unless it reduces to a point).
  • A ray, which has the form $\{x_0 +\theta_v\theta\geq0\},wherev\neq0,isconvex,butnotaffine.Itisaconvexconeifitsbasex_0$ is 0.
  • Any subspace is affine, and a convex cone (hence convex)

2.2.1 Hyperplanes and halfspaces

A hyperplane is as set of the form

x|aTx=b

where aRn,a0, and bR.

A hyperplane divides Rn into two halfspaces. A (closed) halfspace is a set of the form

x|aTxb

where a0, the solution set of one (nontrivial) linear inequality.

Halfspaces are convex, but not affine.

The halfspace can also be expressed as

x|aT(xx0)0

where x0 is any point on the associated hyperplane, satisfies aTx0=b.

The boundary of the halfspace is the hyperplane $\{xa^Tx=b\}.Theset\{xa^Tx<b\},whichistheinteriorofthehalfspace\{xa^Tx\leq b\}$, is called an open halfspace.

2.2.2 Euclidean balls and ellipsoids

A (Euclidean) ball (or just ball) in Rn has the form

B(xc,r)=x|xxc2r=x|(xxc)T(xxc)r2

where r>0, and 2 denotes the Euclidean norm. The vector xc is the center of the ball and the scalar r is its radius; B(xc,r) consists of all points within a distance r of center xc. Another common representation for the Euclidean ball is

B(xc,r)=xc+ru|u21

A related family of convex sets is the ellipsoids, which have the form

ε=x|(xxc)TP1(xxc)1

where P=PT0, P is symmetric and positive definite. the vector xcRn is the center of the ellipsoid. The matrix P determines how far the ellipsoid extends in every direction from xc; the lengths of the semi-axes of ε are give by λi, where λi are the eigenvalue of P. A ball is an ellipsoid with P=r2I.

Another common representation of an ellipsoid is

ε=xc+Au|u21

where A is square and nonsingular. In this representation we can assume without loss of generality that A is symmetric and positive definite. When the matrix A is symmetric positive semidefinite but singular, the set is called a degenerate ellipsoid; its affine dimension is equa to the rank of A. Degenerate ellipsoids are also convex.

2.2.3 Norm balls and norm cones

From the general properties of norms it can be shown that a norm ball of radius r and center xc, given by $\{x\lVert x-x_c\rVert\leq r\}$ is convex. The norm cone associated with the norm is the set
C=(x,t)|xtRn+1

It is (as the name suggests) a convex cone.

The second-order cone is the norm cone for the Euclidean norm,

Missing \left or extra \right

The second-order cone is also known by several other names. It is called the quadratic cone, since it is defined by a quadratic inequality. It is also called the Lorentz cone or ice-cream cone.

2.2.4 Polyhedra

A polyhedron is defined as the solution set of a finite number of line equalities and inequalities:

P=x|ajTxbj,j=1,,m,cjTx=dj,j=1,,p

A polyhedron is thus the intersection of a finite number of halfspaces and hyperplanes.

Affine sets (e.g., subspaces, hyperplanes, lines), rays, line segments, and halfspaces are all polyhedra.

A bounded polyhedron is sometimes called a polytope, but sume authors use the opposite convention (i.e., polytope for any set of the preceding form, and polyhedron when it is bounded).

It will be convenient to use the compact notation

P=x|Axb,Cx=d

where

A=[a1TamT],C=[c1TcpT]

and the symbol denotes vector inequality or componentwise inequality in Rm: uv means uivi for i=1,,m.

Simplexes:

Simplexes is another important family of polyhedra. Suppose the k+1 points v0,,vkRn are affinely independent, which means v1v0,,vkv0 are linearly independent. The simplex determined by them is given by

C=convv0,,vk=θ0v0++θkvk|θ0,1Tθ=1

The affine dimension of this simplex is k, so it is sometimes referred to as a k-dimensional simplex in Rn.

Some common simplexes. A 1-dimensional simplex is a line segment; a 2-dimensional simplex is a triangle (including its interior); and a 3-dimensional simplex is a tetrahedron.

Convex hull description of polyhedra:

The convex hull of the finite set {v1,,vk} is

convv1,,vk=θ1v1++θkvk|θ0,1Tθ=1

This set is a polyhedron, and bounded, but (except is special cases, e.g., a simplex) it is not simple to express it in the form by a set of linear equalities and inequalities.

2.2.5 The positive semidefinite cone

We use the notation Sn to denote the set of symmetric n×n matrices,

Sn=XRn×n|X=XT

which is a vector space with dimension n(n+1)/2. We use the notation S+n to denote the set of symmetric positive semidefinite matrices:

S+n=XSn|X0

and the notation S++n to denote the set of symmetric positive definite matrices:

S++n=XSn|X0

The set S+n is a convex cone: if θ1,θ20 and A,BS+n, then θ1A+θ2BS+n. This can be seen directly from the definition of positive semidefiniteness: for any xRn, we have

xT(θ1A+θ2B)x=θ1xTAx+θ2xTBx0

if A0,B0 and θ1,θ20.

2.3 Operations that preserve convexity

2.3.1 Intersection

Convexity is preserved under intersection: if S1 and S2 are convex, then S1S2 is convex. This property extends to the intersection of an infinite number of sets: if Sα is convex for every αA, then αASα is convex.

every closed convex set S is a (usually infinite) intersection of halfspaces. In fact, a closed convex set S is the intersection of all halfspaces that contain it:

S=H|H halfspace,SH

2.3.2 Affine functions

Recall that a function f:RnRm is affine if it is a sum of a linear function and a constant, i.e., if it has the form f(x)=Ax+b, where ARm×n and bRm. Suppose SRn is convex and f:RnRm is an affine function. Then the image of S under f,

f(S)=f(x)|xS

is convex.

2.3.3 Linear-fractional and perspective functions

The perspective function:

We define the perspective function P:Rn+1Rn, with domain domP=Rn×R++, as P(z,t)=z/t. The perspective function scales or normalizes vectors so the last component is one, and then drops the last component

Remark. We can interpret the perspective function as the action of a pin-hole camera.

Linear-fractional functions:

A linear-fractional function is formed by composing the perspective function with an affine function. Suppose g:RnRm+1 is affine, i.e.,

g(x)=[AcT]x+[bd]

where ARm×n,bRm,cRn, and dR. The function f:RnRm given by f=Pg, i.e.,

f(x)=(Ax+b)/(cTx+d),domf=x|cTx+d>0

is called a linear-fractional (or projective) function. If c=0 and d>0, the domain of f is Rn, and f is an affine function. So we can think of affine and linear functions as special cases of linear-fractional functions.

2.4 Generalized inequalities

2.4.1 Proper cones and generalized inequalities

A cone KRn is called a proper cone if it satisfies the followings:

  • K is convex.
  • K is closed.
  • K is solid, which means it has nonempty interior.
  • K is pointed, which means that it contains no line (or equivalently, xK,xKx=0)

A proper cone K can be used to define a generalized inequality, which is a partial ordering on Rn that has many of the properties of the standard ordering on R.

Cone of polynomials nonnegative on [0,1]. Let K be defined as

K=cRn|c1+c2t++cntn10 for t[0,1]

i.e., K is the cone of (coefficients of) polynomials of degree n1 that are nonnegative on the interval [0,1].

Properties of generalized inequalities:

A generalized inequality K satisfies many properties, such as

  • $\preceq K$ _is preserved under addition: if xKy and yKv, then x+uKy+v.
  • $\preceq K$ _is transitive: if xKy and yKz, then xKz.
  • $\preceq K$ _is preserved under nonnegative scaling: if xKy and α0, then αxKαy.
  • $\preceq K$ _is reflexive: xKx.
  • $\preceq K$ _is antisymmetric: if xKy and yKx, then x=y.
  • $\preceq K$ _is preserved under limits: if xiKyi for i=1,2,,xix and yiy as i\infin, then xKy.

The correspoinding strict generalized inequality K satisfies, for example,

  • if xKy then xKy.
  • if xKy and uKv then x+uKy+v.
  • if xKy and α>0 then αxKαy.
  • xKx.
  • if xKy, then for u and v small enough, x+uKy+v.

2.4.2 Minimum and minimal elements

The most difference is that on R is a linear ordering: any two points are comparable, meaning either xy or yx. This property does not hold for other generalized inequalities. One implication is that concepts like minimum and maximum are more complicated in the context of generalized inequalities.

We can describe minimum and minimal elements using simple set notation. A point xS is the minimum element of S if and only if

Sx+K

Here x+K denotes all the points that are comparable to x and greater than or equal to x (according to K). A point xS is a minimal element if and only if

(xK)S=x

Here xK denotes all the points that are comparable to x and less than or equal to x (according to K); the only point in common with S is x.

example

  • Left. The set S1 has a minimum element x1 with respect to componentwise inequality in R2. The set x1+K is shaded lightly; x1 is the minimum element of S1 since S1x1+K.
  • Right. The point x2 is a minimal point of S2. The set x2K is shown lightly shaded. The point x2 is minimal because x2K and S2 intersect only at x2.

2.5 Separating and supporting hyperplanes

2.5.1 Separating hyperplane theorem

separating hyperplane theorem: Suppose C and D are nonempty disjoint convex sets, i.e., CD=. Then there exist a0 and b such that aTxb for all xC and aTxb for all xD.In other words, the affine function aTxb is nonpositive on C and nonnegative on D. The hyperplane $\{xa^Tx=b\}$ is call a separating hyperplane for the sets C and D, or is said to separate the sets C and D.

Proof of separating hyperplane theorem:

Here we consider a special case, we assume that the (Euclidean) distance between C and D, defined as

dist(C,D)=infuv2|uC,vD

is positive, and that there exist points cC and dD that achieve the minimum distance, i.e., cd2=dist(C,D). (These conditions are satisfied, for example, when C and D are closed and one set is bounded.)

Define

a=dc,b=d22c222

We will show that the affine function

f(x)=aTxb=(dc)T(x(1/2)(d+c))
is nonpositive on C and nonnegative on D, i.e., that the hyperplane $\{xa^Tx=b\}separatesCandD$.

This hyperplane is perpendicular to the line segment between c and d, and passes through its midpoint.

Fig. 1. separating hyperplane

We first show that f is nonnegative on D. The proof that f is nonpositive on C is similar (or follows by swapping C and D and considering f). Suppose there were a point uD for which

f(u)=(dc)T(u(1/2)(d+c))<0

We can express f(u) as

f(u)=(dc)T(ud+(1/2)(dc))=(dc)T(ud)+(1/2)dc22

We see that (dc)T(ud)<0. Now we observe that

ddtd+t(ud)c22|t=0=2(dc)T(ud)<0

so for some small t>0, with t1, we have

d+t(ud)c2<dc2

i.e., the point d+t(ud) is closer to c than d is. Since D is convex and contains d and u, we have d+t(ud)D. But this is impossible, since d is assumed to be the point in D that is closest to C.

Strict separation:

The separation hyperplane we constructed above satisfies the stronger condition that aTx<b for all xC and aTx>b for all xD. This is called strict separation of the sets C and D.

Converse separating hyperplane theorems:

Any two convex sets C and D, at least one of which is open, are disjoint if and only if there exists a separating hyperplane.

2.5.2 Supporting hyperplanes

Suppose CRn, and x0 is a point in its boundary bdC, i.e.,

x0bdC=clCintC
If a0 satisfies aTxaTx0 for all xC, then the hyperplane $\{xa^Tx=a^Tx_0\}$ is called a supporting hyperplane to C at the point x0. This is equivalent to saying that the point x0 and the set C are separated by the hyperplane $\{xa^Tx=a^Tx_0\}.Thegeometricinterpretationisthatthehyperplane\{xa^Tx=a^Tx_0\}istangenttoCatx_0,andthehalfspace\{xa^Tx\leq a^Tx_0\}containsC$.
![Fig. 2. The hyperplane ${xa^Tx=a^Tx_0}supportsCatx_0$](/assets/images/2024-02-12-convex-optimization-chapter2/2024-02-11-09-30-17.png)

A basic result, called the supporting hyperplane theorem, states that for any nonempty convex set C, and any x0bdC, there exists a supporting hyperplane to C at x0.

There is also a partial converse of the supporting hyperplane theorem: If a set is closed, has nonempty interior, and has a supporting hyperplane at every point in its boundary, then it is convex.

2.6 Dual cones and generalized inequalities

2.6.1 Dual cones

Let K be a cone. The set

K=y|xTy0 for all xK

is called the dual cone of K. As the name suggests, K is a cone, and is always convex, even when the original cone K is not.

Geometrically, yK if and only if y is the normal of a hyperplane that supports K at the origin.

Fig. 3. Left: The halfspace with inward normal $y$ contains the cone $K$, so $y\in K^*$. Right: The halfspace with inward normal $z$ does not contain $K$, so $z\notin K^*$.

Subspace. The dual cone of a subspace VRn (which is a cone) is its orthogonal complement $V^{\perp}=\{yv^Ty=0\text{ for all }v\in V\}$.

Nonnegative orthant. The cone R+n is its own dual:

xTy0 for all x0y0

We call such a cone self-dual.

Duyal cones satisfy several properties, such as:

  • K is closed and convex.
  • K1K2 implies K2K1.
  • If K has nonempty interior, then K is pointed.
  • If the closure of K is pointed then K has nonempty interior.
  • K is the closure of the convex hull of K. (Hence if K is convex and closed, K=K.)

These properties show that if K is a proper cone, then so is its dual K, and moreover, that K=K.

2.6.2 Dual generalized inequalities

Now suppose that the convex cone K is proper, so it induces a generalized inequality K. Then its dual cone K is also proper, and therefore induces a generalized inequality. We refer to the generalized inequality K as the dual of the generalized inequality K.

Some important properties relating a generalized inequality and its dual are:

  • xKy if and only if λTvλTy for all λK0.
  • xKy if and only if λTv<λTy for all λK0,λ0.

Since K=K, the dual generalized inequality associated with K is K, so these properties hold if the generalized inequality and its dual are swapped. As a specific example, we have λKμ if and only if λTxμTx for all xK0.

2.6.3 Minimum and minimal elements via dual inequalities

Dual characterization of minimum element:

We first consider a characterization of the minimum element: x is the minimum element of S, with respect to the generalized inequality K, if and only if for all λK0,x is the unique minimizer of λTx over zS. Geometrically, this means that for any λK0, the hyperplane

z|λT(zx)=0

is a strict support hyperplane to S at x. Note that convexity of the set S is not required.

Fig. 4. Dual characterization of minimum element.

Dual characterization of minimal element:

We now turn to a similar characterization of minimal elements. Here there is a gap between the necessary and sufficient conditions. If λK0 and x minimizes λTz over zS, then x is minimal.

Fig. 5. A set $S\in\mathbf{R}^2$.

The converse is in general false: a point x can be minimal in S, but not a minimizer of λTz over zS, for any λ.

Fig. 6. The point $x$ is a minimal element but not minimum.

Provided the set S is convex, we can say that for any minimal element x there exists a nonzero λK0 such that x minimizes λTz over zS.

This converse theorem cannot be strengthened to λK0.

Fig. 7. Left. Minimal but not minimum. Right. minimum but not minimal.

本文由作者按照 CC BY 4.0 进行授权