Linear-Algebra-Friedberg-Insel-Spence-4th-E.pdf

LINEAR ALGEBRA
STEPHEN H. FRIEDBERG ARNOLD J. INSEL LAWRENCE E. SPENCE

List of Symbols

A_ij            the ij-th entry of the matrix A                           page 9
A^{-1}          the inverse of the matrix A                               page 100
A†              the pseudoinverse of the matrix A                         page 414
A*              the adjoint of the matrix A                               page 331
Ã_ij            the matrix A with row i and column j deleted              page 210
A^t             the transpose of the matrix A                             page 17
(A|B)           the matrix A augmented by the matrix B                    page 161
B_1 ⊕ ··· ⊕ B_k the direct sum of matrices B_1 through B_k                page 320
B(V)            the set of bilinear forms on V                            page 422
β*              the dual basis of β                                       page 120
β_x             the T-cyclic basis generated by x                         page 526
C               the field of complex numbers                              page 7
C_i             the ith Gerschgorin disk                                  page 296
cond(A)         the condition number of the matrix A                      page 469
C^n(R)          the set of functions f on R with f^(n) continuous         page 21
C^∞             the set of functions with derivatives of every order      page 130
C(R)            the vector space of continuous functions on R             page 18
C([0,1])        the vector space of continuous functions on [0,1]         page 331
C_x             the T-cyclic subspace generated by x                      page 525
D               the derivative operator on C^∞                            page 131
det(A)          the determinant of the matrix A                           page 232
δ_ij            the Kronecker delta                                       page 89
dim(V)          the dimension of V                                        page 47
e^A             lim_{m→∞} (I + A + A^2/2! + ··· + A^m/m!)                 page 312
e_i             the ith standard vector of F^n                            page 43
E_λ             the eigenspace of T corresponding to λ                    page 264
F               a field                                                   page 6
f(A)            the polynomial f(x) evaluated at the matrix A             page 565
F^n             the set of n-tuples with entries in a field F             page 8
f(T)            the polynomial f(x) evaluated at the operator T           page 565
F(S, F)         the set of functions from S to a field F                  page 9
H               the space of continuous complex functions on [0, 2π]      page 332
I_n or I        the n x n identity matrix                                 page 89
I_V or I        the identity operator on V                                page 67
K_λ             the generalized eigenspace of T corresponding to λ        page 485
K_φ             {x : (φ(T))^p(x) = 0 for some positive integer p}         page 525
L_A             the left-multiplication transformation by the matrix A    page 92
lim_{m→∞} A_m   the limit of a sequence of matrices                       page 284
L(V)            the space of linear transformations from V to V           page 82
L(V, W)         the space of linear transformations from V to W           page 82
M_{m×n}(F)      the set of m x n matrices with entries in F               page 9
ν(A)            the column sum of the matrix A                            page 295
ν_j(A)          the jth column sum of the matrix A                        page 295
N(T)            the null space of T                                       page 67
nullity(T)      the dimension of the null space of T                      page 69
O               the zero matrix                                           page 8
per(M)          the permanent of the 2x2 matrix M                         page 448
P(F)            the space of polynomials with coefficients in F           page 10
P_n(F)          the polynomials in P(F) of degree at most n               page 18
φ_β             the standard representation with respect to basis β       page 104
R               the field of real numbers                                 page 7
rank(A)         the rank of the matrix A                                  page 152
rank(T)         the rank of the linear transformation T                   page 69
ρ(A)            the row sum of the matrix A                               page 295
ρ_i(A)          the ith row sum of the matrix A                           page 295
R(T)            the range of the linear transformation T                  page 67

CONTINUED ON REAR ENDPAPERS


Contents
Preface IX
1 Vector Spaces 1
1.1 Introduction 1
1.2 Vector Spaces 6
1.3 Subspaces 16
1.4 Linear Combinations and Systems of Linear Equations . ... 24
1.5 Linear Dependence and Linear Independence 35
1.6 Bases and Dimension 42
1.7* Maximal Linearly Independent Subsets 58
Index of Definitions 62
2 Linear Transformations and Matrices 64
2.1 Linear Transformations, Null Spaces, and Ranges 64
2.2 The Matrix Representation of a Linear Transformation ... 79
2.3 Composition of Linear Transformations
and Matrix Multiplication 86
2.4 Invertibility and Isomorphisms 99
2.5 The Change of Coordinate Matrix 110
2.6* Dual Spaces 119
2.7* Homogeneous Linear Differential Equations
with Constant Coefficients 127
Index of Definitions 145
3 Elementary Matrix Operations and Systems of Linear
Equations 147
3.1 Elementary Matrix Operations and Elementary Matrices . . 147
Sections denoted by an asterisk are optional.

3.2 The Rank of a Matrix and Matrix Inverses 152
3.3 Systems of Linear Equations - Theoretical Aspects 168
3.4 Systems of Linear Equations - Computational Aspects .... 182
Index of Definitions 198
4 Determinants 199
4.1 Determinants of Order 2 199
4.2 Determinants of Order n 209
4.3 Properties of Determinants 222
4.4 Summary - Important Facts about Determinants 232
4.5* A Characterization of the Determinant 238
Index of Definitions 244
5 Diagonalization 245
5.1 Eigenvalues and Eigenvectors 245
5.2 Diagonalizability 261
5.3* Matrix Limits and Markov Chains 283
5.4 Invariant Subspaces and the Cayley-Hamilton Theorem . . . 313
Index of Definitions 328
6 Inner Product Spaces 329
6.1 Inner Products and Norms 329
6.2 The Gram-Schmidt Orthogonalization Process
and Orthogonal Complements 341
6.3 The Adjoint of a Linear Operator 357
6.4 Normal and Self-Adjoint Operators 369
6.5 Unitary and Orthogonal Operators and Their Matrices . . . 379
6.6 Orthogonal Projections and the Spectral Theorem 398
6.7* The Singular Value Decomposition and the Pseudoinverse . . 405
6.8* Bilinear and Quadratic Forms 422
6.9* Einstein's Special Theory of Relativity 451
6.10* Conditioning and the Rayleigh Quotient 464
6.11* The Geometry of Orthogonal Operators 472
Index of Definitions 480

7 Canonical Forms 482
7.1 The Jordan Canonical Form I 482
7.2 The Jordan Canonical Form II 497
7.3 The Minimal Polynomial 516
7.4* The Rational Canonical Form 524
Index of Definitions 548
Appendices 549
A Sets 549
B Functions 551
C Fields 552
D Complex Numbers 555
E Polynomials 561
Answers to Selected Exercises 571
Index 589


Preface
The language and concepts of matrix theory and, more generally, of linear
algebra have come into widespread usage in the social and natural sciences,
computer science, and statistics. In addition, linear algebra continues to be
of great importance in modern treatments of geometry and analysis.
The primary purpose of this fourth edition of Linear Algebra is to present
a careful treatment of the principal topics of linear algebra and to illustrate
the power of the subject through a variety of applications. Our major thrust
emphasizes the symbiotic relationship between linear transformations and
matrices. However, where appropriate, theorems are stated in the more gen­
eral infinite-dimensional case. For example, this theory is applied to finding
solutions to a homogeneous linear differential equation and the best approx­
imation by a trigonometric polynomial to a continuous function.
Although the only formal prerequisite for this book is a one-year course
in calculus, it requires the mathematical sophistication of typical junior and
senior mathematics majors. This book is especially suited for a second course
in linear algebra that emphasizes abstract vector spaces, although it can be
used in a first course with a strong theoretical emphasis.
The book is organized to permit a number of different courses (ranging
from three to eight semester hours in length) to be taught from it. The
core material (vector spaces, linear transformations and matrices, systems of
linear equations, determinants, diagonalization, and inner product spaces) is
found in Chapters 1 through 5 and Sections 6.1 through 6.5. Chapters 6 and
7, on inner product spaces and canonical forms, are completely independent
and may be studied in either order. In addition, throughout the book are
applications to such areas as differential equations, economics, geometry, and
physics. These applications are not central to the mathematical development,
however, and may be excluded at the discretion of the instructor.
We have attempted to make it possible for many of the important topics
of linear algebra to be covered in a one-semester course. This goal has led
us to develop the major topics with fewer preliminaries than in a traditional
approach. (Our treatment of the Jordan canonical form, for instance, does
not require any theory of polynomials.) The resulting economy permits us to
cover the core material of the book (omitting many of the optional sections
and a detailed discussion of determinants) in a one-semester four-hour course
for students who have had some prior exposure to linear algebra.
Chapter 1 of the book presents the basic theory of vector spaces: sub-
spaces, linear combinations, linear dependence and independence, bases, and
dimension. The chapter concludes with an optional section in which we prove

that every infinite-dimensional vector space has a basis.
Linear transformations and their relationship to matrices are the subject
of Chapter 2. We discuss the null space and range of a linear transformation,
matrix representations of a linear transformation, isomorphisms, and change
of coordinates. Optional sections on dual spaces and homogeneous linear
differential equations end the chapter.
The application of vector space theory and linear transformations to sys­
tems of linear equations is found in Chapter 3. We have chosen to defer this
important subject so that it can be presented as a consequence of the pre­
ceding material. This approach allows the familiar topic of linear systems to
illuminate the abstract theory and permits us to avoid messy matrix computa­
tions in the presentation of Chapters 1 and 2. There are occasional examples
in these chapters, however, where we solve systems of linear equations. (Of
course, these examples are not a part of the theoretical development.) The
necessary background is contained in Section 1.4.
Determinants, the subject of Chapter 4, are of much less importance than
they once were. In a short course (less than one year), we prefer to treat
determinants lightly so that more time may be devoted to the material in
Chapters 5 through 7. Consequently we have presented two alternatives in
Chapter 4: a complete development of the theory (Sections 4.1 through 4.3)
and a summary of important facts that are needed for the remaining chapters
(Section 4.4). Optional Section 4.5 presents an axiomatic development of the
determinant.
Chapter 5 discusses eigenvalues, eigenvectors, and diagonalization. One of
the most important applications of this material occurs in computing matrix
limits. We have therefore included an optional section on matrix limits and
Markov chains in this chapter even though the most general statement of some
of the results requires a knowledge of the Jordan canonical form. Section 5.4
contains material on invariant subspaces and the Cayley-Hamilton theorem.
Inner product spaces are the subject of Chapter 6. The basic mathe­
matical theory (inner products; the Gram-Schmidt process; orthogonal com-
plements; the adjoint of an operator; normal, self-adjoint, orthogonal and
unitary operators; orthogonal projections; and the spectral theorem) is con­
tained in Sections 6.1 through 6.6. Sections 6.7 through 6.11 contain diverse
applications of the rich inner product space structure.
Canonical forms are treated in Chapter 7. Sections 7.1 and 7.2 develop
the Jordan canonical form, Section 7.3 presents the minimal polynomial, and
Section 7.4 discusses the rational canonical form.
There are five appendices. The first four, which discuss sets, functions,
fields, and complex numbers, respectively, are intended to review basic ideas
used throughout the book. Appendix E on polynomials is used primarily
in Chapters 5 and 7, especially in Section 7.4. We prefer to cite particular
results from the appendices as needed rather than to discuss the appendices

independently.
The following diagram illustrates the dependencies among the various
chapters.
Chapter 1
Chapter 2
Chapter 3
Sections 4.1-4.3 or Section 4.4
Sections 5.1 and 5.2
Section 5.4
Chapter 6
Chapter 7
One final word is required about our notation. Sections and subsections
labeled with an asterisk (*) are optional and may be omitted as the instructor
sees fit. An exercise accompanied by the dagger symbol (†) is not optional;
however, we use this symbol to identify an exercise that is cited in some later
section that is not optional.
DIFFERENCES BETWEEN THE THIRD AND FOURTH EDITIONS
The principal content change of this fourth edition is the inclusion of a
new section (Section 6.7) discussing the singular value decomposition and
the pseudoinverse of a matrix or a linear transformation between finite-
dimensional inner product spaces. Our approach is to treat this material as
a generalization of our characterization of normal and self-adjoint operators.
The organization of the text is essentially the same as in the third edition.
Nevertheless, this edition contains many significant local changes that im-

prove the book. Section 5.1 (Eigenvalues and Eigenvectors) has been stream­
lined, and some material previously in Section 5.1 has been moved to Sec­
tion 2.5 (The Change of Coordinate Matrix). Further improvements include
revised proofs of some theorems, additional examples, new exercises, and
literally hundreds of minor editorial changes.
We are especially indebted to Jane M. Day (San Jose State University)
for her extensive and detailed comments on the fourth edition manuscript.
Additional comments were provided by the following reviewers of the fourth
edition manuscript: Thomas Banchoff (Brown University), Christopher Heil
(Georgia Institute of Technology), and Thomas Shemanske (Dartmouth Col­
lege).
To find the latest information about this book, consult our web site on
the World Wide Web. We encourage comments, which can be sent to us by
e-mail or ordinary post. Our web site and e-mail addresses are listed below.
web site: http://www.math.ilstu.edu/linalg
e-mail: [email protected]
Stephen H. Friedberg
Arnold J. Insel
Lawrence E. Spence

1
Vector Spaces
1.1 Introduction
1.2 Vector Spaces
1.3 Subspaces
1.4 Linear Combinations and Systems of Linear Equations
1.5 Linear Dependence and Linear Independence
1.6 Bases and Dimension
1.7* Maximal Linearly Independent Subsets
1.1 INTRODUCTION
Many familiar physical notions, such as forces, velocities,1 and accelerations,
involve both a magnitude (the amount of the force, velocity, or acceleration)
and a direction. Any such entity involving both magnitude and direction is
called a "vector." A vector is represented by an arrow whose length denotes
the magnitude of the vector and whose direction represents the direction of
the vector. In most physical situations involving vectors, only the magnitude
and direction of the vector are significant; consequently, we regard vectors
with the same magnitude and direction as being equal irrespective of their
positions. In this section the geometry of vectors is discussed. This geometry
is derived from physical experiments that test the manner in which two vectors
interact.
Familiar situations suggest that when two like physical quantities act si­
multaneously at a point, the magnitude of their effect need not equal the sum
of the magnitudes of the original quantities. For example, a swimmer swim­
ming upstream at the rate of 2 miles per hour against a current of 1 mile per
hour does not progress at the rate of 3 miles per hour. For in this instance
the motions of the swimmer and current oppose each other, and the rate of
progress of the swimmer is only 1 mile per hour upstream. If, however, the
¹The word velocity is being used here in its scientific sense as an entity having
both magnitude and direction. The magnitude of a velocity (without regard for the
direction of motion) is called its speed.

swimmer is moving downstream (with the current), then his or her rate of
progress is 3 miles per hour downstream.
Experiments show that if two like quantities act together, their effect is
predictable. In this case, the vectors used to represent these quantities can be
combined to form a resultant vector that represents the combined effects of
the original quantities. This resultant vector is called the sum of the original
vectors, and the rule for their combination is called the parallelogram law.
(See Figure 1.1.)
Figure 1.1
Parallelogram Law for Vector Addition. The sum of two vectors
x and y that act at the same point P is the vector beginning at P that is
represented by the diagonal of the parallelogram having x and y as adjacent sides.
Since opposite sides of a parallelogram are parallel and of equal length, the
endpoint Q of the arrow representing x + y can also be obtained by allowing
x to act at P and then allowing y to act at the endpoint of x. Similarly, the
endpoint of the vector x + y can be obtained by first permitting y to act at
P and then allowing x to act at the endpoint of y. Thus two vectors x and
y that both act at the point P may be added "tail-to-head"; that is, either
x or y may be applied at P and a vector having the same magnitude and
direction as the other may be applied to the endpoint of the first. If this is
done, the endpoint of the second vector is the endpoint of x + y.
The addition of vectors can be described algebraically with the use of
analytic geometry. In the plane containing x and y, introduce a coordinate
system with P at the origin. Let (a1, a2) denote the endpoint of x and (b1, b2)
denote the endpoint of y. Then as Figure 1.2(a) shows, the endpoint Q of x + y
is (a1 + b1, a2 + b2). Henceforth, when a reference is made to the coordinates
of the endpoint of a vector, the vector should be assumed to emanate from
the origin. Moreover, since a vector beginning at the origin is completely
determined by its endpoint, we sometimes refer to the point x rather than
the endpoint of the vector x if x is a vector emanating from the origin.
Besides the operation of vector addition, there is another natural operation
that, can be performed on vectors—the length of a vector may be magnified

Figure 1.2
or contracted. This operation, called scalar multiplication, consists of mul­
tiplying the vector by a real number. If the vector x is represented by an
arrow, then for any nonzero real number t, the vector tx is represented by an
arrow in the same direction if t > 0 and in the opposite direction if t < 0.
The length of the arrow tx is |t| times the length of the arrow x. Two nonzero
vectors x and y are called parallel if y = tx for some nonzero real number t.
(Thus nonzero vectors having the same or opposite directions are parallel.)
To describe scalar multiplication algebraically, again introduce a coordi­
nate system into a plane containing the vector x so that x emanates from the
origin. If the endpoint of x has coordinates (a1, a2), then the coordinates of
the endpoint of tx are easily seen to be (ta1, ta2). (See Figure 1.2(b).)
The algebraic descriptions of vector addition and scalar multiplication for
vectors in a plane yield the following properties:
1. For all vectors x and y, x + y = y + x.
2. For all vectors x, y, and z, (x + y) + z = x + (y + z).
3. There exists a vector denoted 0 such that x + 0 = x for each vector x.
4. For each vector x, there is a vector y such that x + y = 0.
5. For each vector x, 1x = x.
6. For each pair of real numbers a and b and each vector x, (ab)x = a(bx).
7. For each real number a and each pair of vectors x and y, a(x + y) =
ax + ay.
8. For each pair of real numbers a and b and each vector x, (a + b)x =
ax + bx.
Arguments similar to the preceding ones show that these eight properties,
as well as the geometric interpretations of vector addition and scalar multipli­
cation, are true also for vectors acting in space rather than in a plane. These
results can be used to write equations of lines and planes in space.

Consider first the equation of a line in space that passes through two
distinct points A and B. Let O denote the origin of a coordinate system in
space, and let u and v denote the vectors that begin at O and end at A and
B, respectively. If w denotes the vector beginning at A and ending at B, then
"tail-to-head" addition shows that u + w = v, and hence w = v - u, where -u
denotes the vector (-1)u. (See Figure 1.3, in which the quadrilateral OABC
is a parallelogram.) Since a scalar multiple of w is parallel to w but possibly
of a different length than w, any point on the line joining A and B may be
obtained as the endpoint of a vector that begins at A and has the form tw
for some real number t. Conversely, the endpoint of every vector of the form
tw that begins at A lies on the line joining A and B. Thus an equation of the
line through A and B is x = u + tw = u + t(v — u), where t is a real number
and x denotes an arbitrary point on the line. Notice also that the endpoint
C of the vector v — u in Figure 1.3 has coordinates equal to the difference of
the coordinates of B and A.
Figure 1.3
Example 1
Let A and B be points having coordinates (—2,0,1) and (4,5,3), respectively.
The endpoint C of the vector emanating from the origin and having the same
direction as the vector beginning at A and terminating at B has coordinates
(4,5,3) - (-2,0,1) = (6, 5, 2). Hence the equation of the line through A and
B is
x = (-2, 0, 1) + t(6, 5, 2). •
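As a quick check of this parameterization (an illustration added here, not part of the original text), the short Python sketch below forms w = v - u for these two points and verifies that t = 0 and t = 1 recover A and B; the helper names are ours.

```python
# Minimal sketch: the line x = u + t*w through A = (-2, 0, 1) and B = (4, 5, 3),
# where w = v - u is the vector from A to B.

def add(x, y):
    return tuple(a + b for a, b in zip(x, y))

def scale(t, x):
    return tuple(t * a for a in x)

A = (-2, 0, 1)
B = (4, 5, 3)
w = add(B, scale(-1, A))          # w = (6, 5, 2), as computed in Example 1

def point_on_line(t):
    return add(A, scale(t, w))    # x = u + t*w

assert w == (6, 5, 2)
assert point_on_line(0) == A      # t = 0 gives A
assert point_on_line(1) == B      # t = 1 gives B
print(point_on_line(0.5))         # the midpoint (1.0, 2.5, 2.0)
```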
Now let A, B, and C denote any three noncollinear points in space. These
points determine a unique plane, and its equation can be found by use of our
previous observations about vectors. Let u and v denote vectors beginning at
A and ending at B and C, respectively. Observe that any point in the plane
containing A, B, and C is the endpoint S of a vector x beginning at A and
having the form su + tv for some real numbers s and t. The endpoint of su is
the point of intersection of the line through A and B with the line through S

Figure 1.4
parallel to the line through A and C. (See Figure 1.4.) A similar procedure
locates the endpoint of tv. Moreover, for any real numbers s and t, the vector
su + tv lies in the plane containing A, B, and C. It follows that an equation
of the plane containing A, B, and C is
x = A + su + tv.
where s and t are arbitrary real numbers and x denotes an arbitrary point in
the plane.
Example 2
Let A, B, and C be the points having coordinates (1,0,2), (—3,-2,4), and
(1,8, -5), respectively. The endpoint of the vector emanating from the origin
and having the same length and direction as the vector beginning at A and
terminating at B is
(-3,-2,4)-(1,0,2) = (-4,-2,2).
Similarly, the endpoint of a vector emanating from the origin and having the
same length and direction as the vector beginning at A and terminating at C
is (1,8, —5) — (1,0,2) = (0,8,-7). Hence the equation of the plane containing
the three given points is
x = (1, 0, 2) + s(-4, -2, 2) + t(0, 8, -7). •
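The plane equation can be checked the same way. The Python sketch below (illustrative only, not from the text) forms u and v for these three points and confirms that (s, t) = (1, 0) and (s, t) = (0, 1) return B and C.

```python
# Minimal sketch: the plane x = A + s*u + t*v through A = (1, 0, 2),
# B = (-3, -2, 4), and C = (1, 8, -5).

def add(x, y):
    return tuple(a + b for a, b in zip(x, y))

def scale(t, x):
    return tuple(t * a for a in x)

A, B, C = (1, 0, 2), (-3, -2, 4), (1, 8, -5)
u = add(B, scale(-1, A))   # (-4, -2, 2), the vector from A to B
v = add(C, scale(-1, A))   # (0, 8, -7), the vector from A to C

def point_on_plane(s, t):
    return add(A, add(scale(s, u), scale(t, v)))

assert point_on_plane(1, 0) == B   # s = 1, t = 0 gives B
assert point_on_plane(0, 1) == C   # s = 0, t = 1 gives C
```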
Any mathematical structure possessing the eight properties on page 3 is
called a vector space. In the next section we formally define a vector space
and consider many examples of vector spaces other than the ones mentioned
above.
EXERCISES
1. Determine whether the vectors emanating from the origin and terminating at the following pairs of points are parallel.

(a) (3, 1, 2) and (6, 4, 2)
(b) (-3, 1, 7) and (9, -3, -21)
(c) (5, -6, 7) and (-5, 6, -7)
(d) (2, 0, -5) and (5, 0, -2)
2. Find the equations of the lines through the following pairs of points in
space.
(a) (3, -2, 4) and (-5, 7, 1)
(b) (2, 4, 0) and (-3, -6, 0)
(c) (3, 7, 2) and (3, 7, -8)
(d) (-2,-1,5) and (3,9,7)
3. Find the equations of the planes containing the following points in space.
(a) (2, -5, -1), (0, 4, 6), and (-3, 7, 1)
(b) (3, -6, 7), (-2, 0, -4), and (5, -9, -2)
(c) (-8, 2, 0), (1, 3, 0), and (6, -5, 0)
(d) (1, 1, 1), (5, 5, 5), and (-6, 4, 2)
4. What are the coordinates of the vector 0 in the Euclidean plane that
satisfies property 3 on page 3? Justify your answer.
5. Prove that if the vector x emanates from the origin of the Euclidean
plane and terminates at the point with coordinates (a1, a2), then the
vector tx that emanates from the origin terminates at the point with
coordinates (ta1, ta2).
6. Show that the midpoint of the line segment joining the points (a, b) and
(c, d) is ((a + c)/2, (b + d)/2).
7. Prove that the diagonals of a parallelogram bisect each other.
1.2 VECTOR SPACES
In Section 1.1, we saw that, with the natural definitions of vector addition and
scalar multiplication, the vectors in a plane satisfy the eight properties listed
on page 3. Many other familiar algebraic systems also permit definitions of
addition and scalar multiplication that satisfy the same eight properties. In
this section, we introduce some of these systems, but first we formally define
this type of algebraic structure.
Definitions. A vector space (or linear space) V over a field2 F
consists of a set on which two operations (called addition and scalar mul­
tiplication, respectively) are defined so that for each pair of elements x, y
2Fields are discussed in Appendix C.

in V there is a unique element x + y in V, and for each element a in F and
each element x in V there is a unique element ax in V, such that the following
conditions hold.
(VS 1) For all x, y in V, x + y = y + x (commutativity of addition).
(VS 2) For all x, y, z in V, (x + y) + z = x + (y + z) (associativity of
addition).
(VS 3) There exists an element in V denoted by 0 such that x + 0 = x for
each x in V.
(VS 4) For each element x in V there exists an element y in V such that
x + y = 0.
(VS 5) For each element x in V, 1x = x.
(VS 6) For each pair of elements a, b in F and each element x in V,
(ab)x = a(bx).
(VS 7) For each element a in F and each pair of elements x, y in V,
a(x + y) = ax + ay.
(VS 8) For each pair of elements a, b in F and each element x in V,
(a + b)x = ax + bx.
The elements x + y and ax are called the sum of x and y and the product
of a and x, respectively.
The elements of the field F are called scalars and the elements of the
vector space V are called vectors. The reader should not confuse this use of
the word "vector'' with the physical entity discussed in Section 1.1: the word
"vector" is now being used to describe any element of a vector space.
A vector space is frequently discussed in the text without explicitly men­
tioning its field of scalars. The reader is cautioned to remember, however,
that every vector space is regarded as a vector space over a given field, which
is denoted by F. Occasionally we restrict our attention to the fields of real
and complex numbers, which are denoted R and C, respectively.
Observe that (VS 2) permits us to define the addition of any finite number
of vectors unambiguously (without the use of parentheses).
In the remainder of this section we introduce several important examples
of vector spaces that are studied throughout this text. Observe that in de­
scribing a vector space, it is necessary to specify not only the vectors but also
the operations of addition and scalar multiplication.
An object of the form (a1, a2, ..., an), where the entries a1, a2, ..., an are
elements of a field F, is called an n-tuple with entries from F. The elements

a1, a2, ..., an are called the entries or components of the n-tuple. Two
n-tuples (a1, a2, ..., an) and (b1, b2, ..., bn) with entries from a field F are
called equal if ai = bi for i = 1, 2, ..., n.
Example 1
The set of all n-tuples with entries from a field F is denoted by F^n. This set is a
vector space over F with the operations of coordinatewise addition and scalar
multiplication; that is, if u = (a1, a2, ..., an) ∈ F^n, v = (b1, b2, ..., bn) ∈ F^n,
and c ∈ F, then
u + v = (a1 + b1, a2 + b2, ..., an + bn)  and  cu = (ca1, ca2, ..., can).
Thus R^3 is a vector space over R. In this vector space,
(3, -2, 0) + (-1, 1, 4) = (2, -1, 4)  and  -5(1, -2, 0) = (-5, 10, 0).
Similarly, C^2 is a vector space over C. In this vector space,
(1 + i, 2) + (2 - 3i, 4i) = (3 - 2i, 2 + 4i)  and  i(1 + i, 2) = (-1 + i, 2i).
Vectors in F^n may be written as column vectors
(a1)
(a2)
(...)
(an)
rather than as row vectors (a1, a2, ..., an). Since a 1-tuple whose only entry
is from F can be regarded as an element of F, we usually write F rather than
F^1 for the vector space of 1-tuples with entry from F. •
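For readers who like to experiment, here is a small Python sketch (not part of the text; the helper names are ours) that implements coordinatewise addition and scalar multiplication and reproduces the computations of Example 1.

```python
# Minimal sketch: coordinatewise operations on n-tuples, over R and over C.

def vec_add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def vec_scale(c, u):
    return tuple(c * a for a in u)

# The R^3 computations from Example 1:
assert vec_add((3, -2, 0), (-1, 1, 4)) == (2, -1, 4)
assert vec_scale(-5, (1, -2, 0)) == (-5, 10, 0)

# The same definitions work over C (Python's complex numbers form a field):
assert vec_add((1 + 1j, 2), (2 - 3j, 4j)) == (3 - 2j, 2 + 4j)
assert vec_scale(1j, (1 + 1j, 2)) == (-1 + 1j, 2j)
```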
An m x n matrix with entries from a field F is a rectangular array of the
form
( a11  a12  ...  a1n )
( a21  a22  ...  a2n )
(  :    :         :  )
( am1  am2  ...  amn )
where each entry aij (1 ≤ i ≤ m, 1 ≤ j ≤ n) is an element of F. We
call the entries aii with i = j the diagonal entries of the matrix. The
entries ai1, ai2, ..., ain compose the ith row of the matrix, and the entries
a1j, a2j, ..., amj compose the jth column of the matrix. The rows of the
preceding matrix are regarded as vectors in F^n, and the columns are regarded
as vectors in F^m. The m x n matrix in which each entry equals zero is called
the zero matrix and is denoted by O.

In this book, we denote matrices by capital italic letters (e.g., A, B, and
C), and we denote the entry of a matrix A that lies in row i and column j by
Aij. In addition, if the number of rows and columns of a matrix are equal,
the matrix is called square.
Two m x n matrices A and B are called equal if all their corresponding
entries are equal, that is, if Aij = Bij for 1 ≤ i ≤ m and 1 ≤ j ≤ n.
Example 2
The set of all m x n matrices with entries from a field F is a vector space, which
we denote by M_{m×n}(F), with the following operations of matrix addition
and scalar multiplication: For A, B ∈ M_{m×n}(F) and c ∈ F,
(A + B)ij = Aij + Bij  and  (cA)ij = cAij
for 1 ≤ i ≤ m and 1 ≤ j ≤ n. For instance,
( 2  0 -1 )   ( -5 -2  6 )   ( -3 -2  5 )
( 1 -3  4 ) + (  3  4 -1 ) = (  4  1  3 )
and
   (  1  0 -2 )   ( -3  0  6 )
-3 ( -3  2  3 ) = (  9 -6 -9 )
in M_{2×3}(R). •
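The entrywise definitions translate directly into code. The Python sketch below (illustrative only, not from the text) represents matrices as lists of rows and reproduces the two computations displayed above.

```python
# Minimal sketch: entrywise matrix addition and scalar multiplication in M_{2x3}(R).

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(c, A):
    return [[c * a for a in row] for row in A]

A = [[2, 0, -1], [1, -3, 4]]
B = [[-5, -2, 6], [3, 4, -1]]
assert mat_add(A, B) == [[-3, -2, 5], [4, 1, 3]]
assert mat_scale(-3, [[1, 0, -2], [-3, 2, 3]]) == [[-3, 0, 6], [9, -6, -9]]
```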
Example 3
Let S be any nonempty set and F be any field, and let F(S, F) denote the
set of all functions from S to F. Two functions f and g in F(S, F) are called
equal if f(s) = g(s) for each s ∈ S. The set F(S, F) is a vector space with
the operations of addition and scalar multiplication defined for f, g ∈ F(S, F)
and c ∈ F by
(f + g)(s) = f(s) + g(s)  and  (cf)(s) = c[f(s)]
for each s ∈ S. Note that these are the familiar operations of addition and
scalar multiplication for functions used in algebra and calculus. •
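Because the operations on F(S, F) are defined pointwise, they are easy to model when S is finite. The Python sketch below (not from the text) represents functions as dictionaries from S to R and checks the pointwise definitions; the sample values are ours.

```python
# Minimal sketch: pointwise addition and scalar multiplication in F(S, R)
# for the finite set S = {0, 1}, with functions stored as dictionaries.

S = {0, 1}

def func_add(f, g):
    return {s: f[s] + g[s] for s in S}

def func_scale(c, f):
    return {s: c * f[s] for s in S}

f = {0: 1.0, 1: 3.0}
g = {0: 2.0, 1: -1.0}
assert func_add(f, g) == {0: 3.0, 1: 2.0}
assert func_scale(2.0, f) == {0: 2.0, 1: 6.0}
```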
A polynomial with coefficients from a field F is an expression of the form
f(x) = a_n x^n + a_(n-1) x^(n-1) + ... + a_1 x + a_0,
where n is a nonnegative integer and each a_k, called the coefficient of x^k, is
in F. If f(x) = 0, that is, if a_n = a_(n-1) = ... = a_0 = 0, then f(x) is called
the zero polynomial and, for convenience, its degree is defined to be -1;
otherwise, the degree of a polynomial is defined to be the largest exponent
of x that appears in the representation
f(x) = a_n x^n + a_(n-1) x^(n-1) + ... + a_1 x + a_0
with a nonzero coefficient. Note that the polynomials of degree zero may be
written in the form f(x) = c for some nonzero scalar c. Two polynomials,
f(x) = a_n x^n + a_(n-1) x^(n-1) + ... + a_1 x + a_0
and
g(x) = b_m x^m + b_(m-1) x^(m-1) + ... + b_1 x + b_0,
are called equal if m = n and a_i = b_i for i = 0, 1, ..., n.
When F is a field containing infinitely many scalars, we usually regard
a polynomial with coefficients from F as a function from F into F. (See
page 569.) In this case, the value of the function
f(x) = a_n x^n + a_(n-1) x^(n-1) + ... + a_1 x + a_0
at c ∈ F is the scalar
f(c) = a_n c^n + a_(n-1) c^(n-1) + ... + a_1 c + a_0.
Here either of the notations f or f(x) is used for the polynomial function
f(x) = a_n x^n + a_(n-1) x^(n-1) + ... + a_1 x + a_0.
Example 4
Let
f(x) = a_n x^n + a_(n-1) x^(n-1) + ... + a_1 x + a_0
and
g(x) = b_m x^m + b_(m-1) x^(m-1) + ... + b_1 x + b_0
be polynomials with coefficients from a field F. Suppose that m ≤ n, and
define b_(m+1) = b_(m+2) = ... = b_n = 0. Then g(x) can be written as
g(x) = b_n x^n + b_(n-1) x^(n-1) + ... + b_1 x + b_0.
Define
f(x) + g(x) = (a_n + b_n) x^n + (a_(n-1) + b_(n-1)) x^(n-1) + ... + (a_1 + b_1) x + (a_0 + b_0)
and for any c ∈ F, define
cf(x) = c a_n x^n + c a_(n-1) x^(n-1) + ... + c a_1 x + c a_0.
With these operations of addition and scalar multiplication, the set of all
polynomials with coefficients from F is a vector space, which we denote by
P(F). •
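A convenient concrete model of P(F) stores a polynomial as its list of coefficients [a_0, a_1, ..., a_n]. The Python sketch below (an illustration, not part of the text) implements the addition and scalar multiplication of Example 4, padding the shorter list with zeros, and checks the computations asked for in Exercise 4(e) and 4(g).

```python
# Minimal sketch: polynomials as coefficient lists [a0, a1, ..., an].

def poly_add(f, g):
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))        # pad the shorter polynomial with zeros
    g = g + [0] * (n - len(g))
    return [a + b for a, b in zip(f, g)]

def poly_scale(c, f):
    return [c * a for a in f]

# (2x^4 - 7x^3 + 4x + 3) + (8x^3 + 2x^2 - 6x + 7) = 2x^4 + x^3 + 2x^2 - 2x + 10
assert poly_add([3, 4, 0, -7, 2], [7, -6, 2, 8]) == [10, -2, 2, 1, 2]

# 5(2x^7 - 6x^4 + 8x^2 - 3x) = 10x^7 - 30x^4 + 40x^2 - 15x
assert poly_scale(5, [0, -3, 8, 0, -6, 0, 0, 2]) == [0, -15, 40, 0, -30, 0, 0, 10]
```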

We will see in Exercise 23 of Section 2.4 that the vector space defined in
the next example is essentially the same as P(F).
Example 5
Let F be any field. A sequence in F is a function σ from the positive integers
into F. In this book, the sequence σ such that σ(n) = a_n for n = 1, 2, ... is
denoted {a_n}. Let V consist of all sequences {a_n} in F that have only a finite
number of nonzero terms a_n. If {a_n} and {b_n} are in V and t ∈ F, define
{a_n} + {b_n} = {a_n + b_n}  and  t{a_n} = {ta_n}.
With these operations V is a vector space. •
Our next two examples contain sets on which addition and scalar multi­
plication are defined, but which are not vector spaces.
Example 6
Let S = {(a1, a2) : a1, a2 ∈ R}. For (a1, a2), (b1, b2) ∈ S and c ∈ R, define
(a1, a2) + (b1, b2) = (a1 + b1, a2 - b2)  and  c(a1, a2) = (ca1, ca2).
Since (VS 1), (VS 2), and (VS 8) fail to hold, S is not a vector space with
these operations. •
Example 7
Let S be as in Example 6. For (a1, a2), (b1, b2) ∈ S and c ∈ R, define
(a1, a2) + (b1, b2) = (a1 + b1, 0)  and  c(a1, a2) = (ca1, 0).
Then S is not a vector space with these operations because (VS 3) (hence
(VS 4)) and (VS 5) fail. •
We conclude this section with a few of the elementary consequences of the
definition of a vector space.
Theorem 1.1 (Cancellation Law for Vector Addition). If x, y,
and z are vectors in a vector space V such that x + z = y + z, then x = y.
Proof. There exists a vector v in V such that z + v = 0 (VS 4). Thus
x = x + 0 = x + (z + v) = (x + z) + v
= (y + z) + v = y + (z + v) = y + 0 = y
by (VS 2) and (VS 3). I
Corollary 1. The vector 0 described in (VS 3) is unique.

Proof. Exercise. ]|
Corollary 2. The vector y described in (VS 4) is unique.
Proof. Exercise. i
The vector 0 in (VS 3) is called the zero vector of V, and the vector y in
(VS 4) (that is. the unique vector such that x + y = 0) is called the additive
inverse of x and is denoted by —x.
The next result contains some of the elementary properties of scalar mul-
tiplication.
Theorem 1.2. In any vector space V, the following statements are true:
(a) 0x = 0 for each x ∈ V.
(b) (-a)x = -(ax) = a(-x) for each a ∈ F and each x ∈ V.
(c) a0 = 0 for each a ∈ F.
Proof. (a) By (VS 8), (VS 3), and (VS 1), it follows that
0x + 0x = (0 + 0)x = 0x = 0x + 0 = 0 + 0x.
Hence 0x = 0 by Theorem 1.1.
(b) The vector -(ax) is the unique element of V such that ax + [-(ax)] = 0.
Thus if ax + (-a)x = 0, Corollary 2 to Theorem 1.1 implies that (-a)x = -(ax).
But, by (VS 8),
ax + (-a)x = [a + (-a)]x = 0x = 0
by (a). Consequently (-a)x = -(ax). In particular, (-1)x = -x. So, by (VS 6),
a(-x) = a[(-1)x] = [a(-1)]x = (-a)x.
The proof of (c) is similar to the proof of (a). 1
EXERCISES
1. Label the following statements as true or false.
(a) Every vector space contains a zero vector.
(b) A vector space may have more than one zero vector.
(c) In any vector space, ax = bx implies that a = b.
(d) In any vector space, ax = ay implies that x = y.
(e) A vector in F^n may be regarded as a matrix in M_{n×1}(F).
(f) An m x n matrix has m columns and n rows.
(g) In P(F), only polynomials of the same degree may be added.
(h) If f and g are polynomials of degree n, then f + g is a polynomial
of degree n.
(i) If f is a polynomial of degree n and c is a nonzero scalar, then cf
is a polynomial of degree n.

(j) A nonzero scalar of F may be considered to be a polynomial in
P(F) having degree zero,
(k) Two functions in F(S, F) are equal if and only if they have the
same value at each element of S.
2. Write the zero vector of M_{3×4}(F).
3. If
M = ( 1 2 3 )
    ( 4 5 6 )
what are M13, M21, and M22?
4. Perform the indicated operations.
(a)-(d) (matrix displays omitted)
(e) (2x^4 - 7x^3 + 4x + 3) + (8x^3 + 2x^2 - 6x + 7)
(f) (-3x^3 + 7x^2 + 8x - 6) + (2x^3 - 8x + 10)
(g) 5(2x^7 - 6x^4 + 8x^2 - 3x)
(h) 3(x^5 - 2x^3 + 4x + 2)
Exercises 5 and 6 show why the definitions of matrix addition and scalar
multiplication (as defined in Example 2) are the appropriate ones.
5. Richard Gard ("Effects of Beaver on Trout in Sagehen Creek, Cali-
fornia," J. Wildlife Management, 25, 221-242) reports the following
number of trout having crossed beaver dams in Sagehen Creek.
Upstream Crossings
                 Fall  Spring  Summer
Brook trout
Rainbow trout
Brown trout

Downstream Crossings
                 Fall  Spring  Summer
Brook trout
Rainbow trout
Brown trout

(table entries omitted)
Record the upstream and downstream crossings in two 3 x 3 matrices,
and verify that the sum of these matrices gives the total number of
crossings (both upstream and downstream) categorized by trout species
and season.
6. At the end of May, a furniture store had the following inventory.
                      Early American  Spanish  Mediterranean  Danish
Living room suites
Bedroom suites
Dining room suites
(table entries omitted)
Record these data as a 3x4 matrix M. To prepare for its June sale,
the store decided to double its inventory on each of the items listed in
the preceding table. Assuming that none of the present stock is sold
until the additional furniture arrives, verify that the inventory on hand
after the order is filled is described by the matrix 2M. If the inventory
at the end of June is described by the matrix
A = (matrix entries omitted)
interpret 2M - A. How many suites were sold during the June sale?
7. Let S = {0, 1} and F = R. In F(S, R), show that f = g and f + g = h,
where f(t) = 2t + 1, g(t) = 1 + 4t - 2t^2, and h(t) = 5^t + 1.
8. In any vector space V, show that (a + b)(x + y) = ax + ay + bx + by for
any x, y ∈ V and any a, b ∈ F.
9. Prove Corollaries 1 and 2 of Theorem 1.1 and Theorem 1.2(c).
10. Let V denote the set of all differentiable real-valued functions defined
on the real line. Prove that V is a vector space with the operations of
addition and scalar multiplication defined in Example 3.

11. Let V = {0} consist of a single vector 0 and define 0 + 0 = 0 and
c0 = 0 for each scalar c in F. Prove that V is a vector space over F.
(V is called the zero vector space.)
12. A real-valued function f defined on the real line is called an even func-
tion if f(-t) = f(t) for each real number t. Prove that the set of even
functions defined on the real line with the operations of addition and
scalar multiplication defined in Example 3 is a vector space.
13. Let V denote the set of ordered pairs of real numbers. If (a1, a2) and
(b1, b2) are elements of V and c ∈ R, define
(a1, a2) + (b1, b2) = (a1 + b1, a2b2)  and  c(a1, a2) = (ca1, a2).
Is V a vector space over R with these operations? Justify your answer.
14. Let V = {(a1, a2, ..., an) : ai ∈ C for i = 1, 2, ..., n}; so V is a vector
space over C by Example 1. Is V a vector space over the field of real
numbers with the operations of coordinatewise addition and multipli­
cation?
15. Let V = {(a1, a2, ..., an) : ai ∈ R for i = 1, 2, ..., n}; so V is a vec-
tor space over R by Example 1. Is V a vector space over the field of
complex numbers with the operations of coordinatewise addition and
multiplication?
16. Let V denote the set of all m x n matrices with real entries; so V
is a vector space over R by Example 2. Let F be the field of rational
numbers. Is V a vector space over F with the usual definitions of matrix
addition and scalar multiplication?
17. Let V = {(a1, a2) : a1, a2 ∈ F}, where F is a field. Define addition of
elements of V coordinatewise, and for c ∈ F and (a1, a2) ∈ V, define
c(a1, a2) = (ca1, 0).
Is V a vector space over F with these operations? Justify your answer.
18. Let V = {(a1, a2) : a1, a2 ∈ R}. For (a1, a2), (b1, b2) ∈ V and c ∈ R,
define
(a1, a2) + (b1, b2) = (a1 + 2b1, a2 + 3b2)  and  c(a1, a2) = (ca1, ca2).
Is V a vector space over R with these operations? Justify your answer.

19. Let V = {(a1, a2) : a1, a2 ∈ R}. Define addition of elements of V coor-
dinatewise, and for (a1, a2) in V and c ∈ R, define
c(a1, a2) = (0, 0) if c = 0  and  c(a1, a2) = (ca1, a2/c) if c ≠ 0.
Is V a vector space over R with these operations? Justify your answer.
20. Let V be the set of sequences {an} of real numbers. (See Example 5 for
the definition of a sequence.) For {a_n}, {b_n} ∈ V and any real number
t, define
{a_n} + {b_n} = {a_n + b_n}  and  t{a_n} = {ta_n}.
Prove that, with these operations. V is a vector space over R.
21. Let V and W be vector spaces over a field F. Let
Z = {(v, w) : v ∈ V and w ∈ W}.
Prove that Z is a vector space over F with the operations
(v1, w1) + (v2, w2) = (v1 + v2, w1 + w2)  and  c(v1, w1) = (cv1, cw1).
22. How many matrices are there in the vector space M_{m×n}(Z2)? (See
Appendix C.)
1.3 SUBSPACES
In the study of any algebraic structure, it is of interest to examine subsets that
possess the same structure as the set under consideration. The appropriate
notion of substructure for vector spaces is introduced in this section.
Definition. A subset W of a vector space V over a field F is called a
subspace of V if W is a vector space over F with the operations of addition
and scalar multiplication defined on V.
In any vector space V, note that V and {0} are subspaces. The latter is
called the zero subspace of V.
Fortunately it is not necessary to verify all of the vector space properties
to prove that a subset is a subspace. Because properties (VS 1), (VS 2),
(VS 5), (VS 6), (VS 7), and (VS 8) hold for all vectors in the vector space,
these properties automatically hold for the vectors in any subset. Thus a
subset W of a vector space V is a subspace of V if and only if the following
four properties hold.

1. x + y ∈ W whenever x ∈ W and y ∈ W. (W is closed under addition.)
2. cx ∈ W whenever c ∈ F and x ∈ W. (W is closed under scalar
multiplication.)
3. W has a zero vector.
4. Each vector in W has an additive inverse in W.
The next theorem shows that the zero vector of W must be the same as
the zero vector of V and that property 4 is redundant.
Theorem 1.3. Let V be a vector space and W a subset of V. Then W
is a subspace of V if and only if the following three conditions hold for the
operations defined in V.
(a) 0 ∈ W.
(b) x + y ∈ W whenever x ∈ W and y ∈ W.
(c) cx ∈ W whenever c ∈ F and x ∈ W.
Proof. If W is a subspace of V, then W is a vector space with the operations
of addition and scalar multiplication defined on V. Hence conditions (b) and
(c) hold, and there exists a vector 0' ∈ W such that x + 0' = x for each
x ∈ W. But also x + 0 = x, and thus 0' = 0 by Theorem 1.1 (p. 11). So
condition (a) holds.
Conversely, if conditions (a), (b), and (c) hold, the discussion preceding
this theorem shows that W is a subspace of V if the additive inverse of each
vector in W lies in W. But if x ∈ W, then (-1)x ∈ W by condition (c), and
-x = (-1)x by Theorem 1.2 (p. 12). Hence W is a subspace of V.
The preceding theorem provides a simple method for determining whether
or not a given subset of a vector space is a subspace. Normally, it is this result
that is used to prove that a subset is, in fact, a subspace.
The transpose A^t of an m x n matrix A is the n x m matrix obtained
from A by interchanging the rows with the columns; that is, (A^t)ij = Aji.
For example, the transpose of the 2 x 3 matrix with rows (1, -2, 3) and
(0, 5, -1) is the 3 x 2 matrix with rows (1, 0), (-2, 5), and (3, -1), and the
2 x 2 matrix with rows (1, 2) and (2, 3) equals its own transpose.
A symmetric matrix is a matrix A such that A^t = A. For example, the
2 x 2 matrix just described is a symmetric matrix. Clearly, a symmetric
matrix must be square. The set W of all symmetric matrices in M_{n×n}(F) is
a subspace of M_{n×n}(F) since the conditions of Theorem 1.3 hold:
1. The zero matrix is equal to its transpose and hence belongs to W.
It is easily proved that for any matrices A and B and any scalars a and b,
(aA + bB)^t = aA^t + bB^t. (See Exercise 3.) Using this fact, we show that the
set of symmetric matrices is closed under addition and scalar multiplication.

2. If A ∈ W and B ∈ W, then A^t = A and B^t = B. Thus (A + B)^t =
A^t + B^t = A + B, so that A + B ∈ W.
3. If A ∈ W, then A^t = A. So for any a ∈ F, we have (aA)^t = aA^t = aA.
Thus aA ∈ W.
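Theorem 1.3 and the computation just given are easy to test numerically. The Python sketch below (illustrative only; the helper names and sample matrices are ours) checks conditions (a)-(c) for symmetric 2 x 2 real matrices, using the identity (aA + bB)^t = aA^t + bB^t of Exercise 3.

```python
# Minimal sketch: the subspace test of Theorem 1.3 for symmetric 2x2 matrices.

def transpose(A):
    return [list(col) for col in zip(*A)]

def is_symmetric(A):
    return transpose(A) == A

def combine(a, A, b, B):              # the linear combination aA + bB, entrywise
    return [[a * x + b * y for x, y in zip(rA, rB)] for rA, rB in zip(A, B)]

Z = [[0, 0], [0, 0]]
A = [[1, 2], [2, 3]]
B = [[0, -1], [-1, 5]]
assert is_symmetric(Z)                      # (a) the zero matrix is in W
assert is_symmetric(combine(1, A, 1, B))    # (b) W is closed under addition
assert is_symmetric(combine(-4, A, 0, B))   # (c) W is closed under scalar multiplication
```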
The examples that follow provide further illustrations of the concept of a
subspace. The first three are particularly important.
Example 1
Let n be a nonnegative integer, and let P_n(F) consist of all polynomials in
P(F) having degree less than or equal to n. Since the zero polynomial has
degree -1, it is in P_n(F). Moreover, the sum of two polynomials with degrees
less than or equal to n is another polynomial of degree less than or equal to n,
and the product of a scalar and a polynomial of degree less than or equal to
n is a polynomial of degree less than or equal to n. So P_n(F) is closed under
addition and scalar multiplication. It therefore follows from Theorem 1.3 that
P_n(F) is a subspace of P(F). •
Example 2
Let C(R) denote the set of all continuous real-valued functions defined on R.
Clearly C(R) is a subset of the vector space F(R, R) defined in Example 3
of Section 1.2. We claim that C(R) is a subspace of F(R, R). First note
that the zero of F(R, R) is the constant function defined by f(t) = 0 for all
t ∈ R. Since constant functions are continuous, we have f ∈ C(R). Moreover,
the sum of two continuous functions is continuous, and the product of a real
number and a continuous function is continuous. So C(R) is closed under
addition and scalar multiplication and hence is a subspace of F(R, R) by
Theorem 1.3. •
Example 3
An n x n matrix M is called a diagonal matrix if Mij = 0 whenever i ≠ j,
that is, if all its nondiagonal entries are zero. Clearly the zero matrix is a
diagonal matrix because all of its entries are 0. Moreover, if A and B are
diagonal n x n matrices, then whenever i ≠ j,
(A + B)ij = Aij + Bij = 0 + 0 = 0  and  (cA)ij = cAij = c0 = 0
for any scalar c. Hence A + B and cA are diagonal matrices for any scalar
c. Therefore the set of diagonal matrices is a subspace of M_{n×n}(F) by Theo-
rem 1.3. •
Example 4
The trace of an n x n matrix M, denoted tr(M), is the sum of the diagonal
entries of M; that is,
tr(M) = M11 + M22 + ... + Mnn.

It follows from Exercise 6 that the set of n x n matrices having trace equal
to zero is a subspace of Mnxn(F). •
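The linearity of the trace, tr(aA + bB) = a tr(A) + b tr(B) (Exercise 6), is exactly what Theorem 1.3 needs here. The Python sketch below (not part of the text; the sample matrices are ours) verifies this identity on one example and checks that a sum of trace-zero matrices again has trace zero.

```python
# Minimal sketch: the trace and the trace-zero subspace of Example 4.

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def combine(a, A, b, B):              # the linear combination aA + bB, entrywise
    return [[a * x + b * y for x, y in zip(rA, rB)] for rA, rB in zip(A, B)]

A = [[1, 4, 0], [2, -3, 1], [5, 0, 7]]     # trace 5
B = [[0, 1, 1], [7, 2, 0], [3, 3, 2]]      # trace 4
a, b = 3, -2
assert trace(combine(a, A, b, B)) == a * trace(A) + b * trace(B)

# Two trace-zero matrices and their sum:
C = [[1, 9], [0, -1]]
D = [[2, 5], [5, -2]]
assert trace(C) == 0 and trace(D) == 0 and trace(combine(1, C, 1, D)) == 0
```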
Example 5
The set of matrices in M_{m×n}(R) having nonnegative entries is not a subspace
of M_{m×n}(R) because it is not closed under scalar multiplication (by negative
scalars). •
The next theorem shows how to form a new subspace from other sub-
spaces.
Theorem 1.4. Any intersection of subspaces of a vector space V is a
subspace of V.
Proof. Let C be a collection of subspaces of V, and let W denote the
intersection of the subspaces in C. Since every subspace contains the zero
vector, 0 ∈ W. Let a ∈ F and x, y ∈ W. Then x and y are contained in each
subspace in C. Because each subspace in C is closed under addition and scalar
multiplication, it follows that x + y and ax are contained in each subspace in
C. Hence x + y and ax are also contained in W, so that W is a subspace of V
by Theorem 1.3.
Having shown that the intersection of subspaces of a vector space V is a
subspace of V, it is natural to consider whether or not the union of subspaces
of V is a subspace of V. It is easily seen that the union of subspaces must
contain the zero vector and be closed under scalar multiplication, but in
general the union of subspaces of V need not be closed under addition. In fact,
it can be readily shown that the union of two subspaces of V is a subspace of V
if and only if one of the subspaces contains the other. (See Exercise 19.) There
is, however, a natural way to combine two subspaces W1 and W2 to obtain
a subspace that contains both W1 and W2. As we already have suggested,
the key to finding such a subspace is to assure that it must be closed under
addition. This idea is explored in Exercise 23.
EXERCISES
1. Label the following statements as true or false.
(a) If V is a vector space and W is a subset of V that is a vector space,
then W is a subspace of V.
(b) The empty set is a subspace of every vector space.
(c) If V is a vector space other than the zero vector space, then V
contains a subspace W such that W ≠ V.
(d) The intersection of any two subsets of V is a subspace of V.

(e) An n x n diagonal matrix can never have more than n nonzero
entries.
(f) The trace of a square matrix is the product of its diagonal entries.
(g) Let W be the xy-plane in R^3; that is, W = {(a1, a2, 0) : a1, a2 ∈ R}.
Then W = R^2.
2. Determine the transpose of each of the matrices that follow. In addition,
if the matrix is square, compute its trace.
(a)-(h) (matrix displays omitted)
3. Prove that (aA + bB)^t = aA^t + bB^t for any A, B ∈ M_{m×n}(F) and any
a, b ∈ F.
4. Prove that (A^t)^t = A for each A ∈ M_{m×n}(F).
5. Prove that A + A^t is symmetric for any square matrix A.
6. Prove that tr(aA + bB) = a tr(A) + b tr(B) for any A, B ∈ M_{n×n}(F).
7. Prove that diagonal matrices are symmetric matrices.
8. Determine whether the following sets are subspaces of R3 under the
operations of addition and scalar multiplication defined on R3. Justify
your answers.
(a) W1 = {(a1, a2, a3) ∈ R3: a1 = 3a2 and a3 = -a2}
(b) W2 = {(a1, a2, a3) ∈ R3: a1 = a3 + 2}
(c) W3 = {(a1, a2, a3) ∈ R3: 2a1 - 7a2 + a3 = 0}
(d) W4 = {(a1, a2, a3) ∈ R3: a1 - 4a2 - a3 = 0}
(e) W5 = {(a1, a2, a3) ∈ R3: a1 + 2a2 - 3a3 = 1}
(f) W6 = {(a1, a2, a3) ∈ R3: 5a1^2 - 3a2^2 + 6a3^2 = 0}
9. Let W1, W3, and W4 be as in Exercise 8. Describe W1 ∩ W3, W1 ∩ W4, and W3 ∩ W4, and observe that each is a subspace of R3.
10. Prove that W1 = {(a1, a2, ..., an) ∈ Fn : a1 + a2 + ... + an = 0} is a subspace of Fn, but W2 = {(a1, a2, ..., an) ∈ Fn: a1 + a2 + ... + an = 1} is not.
11. Is the set W = {f(x) ∈ P(F): f(x) = 0 or f(x) has degree n} a subspace of P(F) if n ≥ 1? Justify your answer.
12. An m x n matrix A is called upper triangular if all entries lying below the diagonal entries are zero, that is, if Aij = 0 whenever i > j. Prove that the upper triangular matrices form a subspace of Mmxn(F).
13. Let S be a nonempty set and F a field. Prove that for any s0 ∈ S, {f ∈ F(S, F): f(s0) = 0} is a subspace of F(S, F).
14. Let S be a nonempty set and F a field. Let C(S, F) denote the set of all functions f ∈ F(S, F) such that f(s) = 0 for all but a finite number of elements of S. Prove that C(S, F) is a subspace of F(S, F).
15. Is the set of all differentiable real-valued functions defined on R a sub-
space of C(R)? Justify your answer.
16. Let C^n(R) denote the set of all real-valued functions defined on the real line that have a continuous nth derivative. Prove that C^n(R) is a subspace of F(R, R).
17. Prove that a subset W of a vector space V is a subspace of V if and only if W ≠ ∅, and, whenever a ∈ F and x, y ∈ W, then ax ∈ W and x + y ∈ W.
18. Prove that a subset W of a vector space V is a subspace of V if and only if 0 ∈ W and ax + y ∈ W whenever a ∈ F and x, y ∈ W.
19. Let W1 and W2 be subspaces of a vector space V. Prove that W1 ∪ W2 is a subspace of V if and only if W1 ⊆ W2 or W2 ⊆ W1.
20.† Prove that if W is a subspace of a vector space V and w1, w2, ..., wn are in W, then a1w1 + a2w2 + ... + anwn ∈ W for any scalars a1, a2, ..., an.
21. Show that the set of convergent sequences {an} (i.e., those for which lim_{n→∞} an exists) is a subspace of the vector space V in Exercise 20 of Section 1.2.
22. Let F1 and F2 be fields. A function g ∈ F(F1, F2) is called an even function if g(-t) = g(t) for each t ∈ F1 and is called an odd function if g(-t) = -g(t) for each t ∈ F1. Prove that the set of all even functions in F(F1, F2) and the set of all odd functions in F(F1, F2) are subspaces of F(F1, F2).
† A dagger means that this exercise is essential for a later section.
The following definitions are used in Exercises 23-30.
Definition. If S1 and S2 are nonempty subsets of a vector space V, then the sum of S1 and S2, denoted S1 + S2, is the set {x + y: x ∈ S1 and y ∈ S2}.
Definition. A vector space V is called the direct sum of W1 and W2 if W1 and W2 are subspaces of V such that W1 ∩ W2 = {0} and W1 + W2 = V. We denote that V is the direct sum of W1 and W2 by writing V = W1 ⊕ W2.
23. Let W1 and W2 be subspaces of a vector space V.
(a) Prove that W1 + W2 is a subspace of V that contains both W1 and W2.
(b) Prove that any subspace of V that contains both W1 and W2 must also contain W1 + W2.
24. Show that Fn is the direct sum of the subspaces
W1 = {(a1, a2, ..., an) ∈ Fn : an = 0}
and
W2 = {(a1, a2, ..., an) ∈ Fn : a1 = a2 = ... = a_{n-1} = 0}.
25. Let W1 denote the set of all polynomials f(x) in P(F) such that in the representation
f(x) = an x^n + a_{n-1} x^{n-1} + ... + a1 x + a0,
we have ai = 0 whenever i is even. Likewise let W2 denote the set of all polynomials g(x) in P(F) such that in the representation
g(x) = bm x^m + b_{m-1} x^{m-1} + ... + b1 x + b0,
we have bi = 0 whenever i is odd. Prove that P(F) = W1 ⊕ W2.
26. In Mmxn(F) define W1 = {A ∈ Mmxn(F): Aij = 0 whenever i > j} and W2 = {A ∈ Mmxn(F): Aij = 0 whenever i ≤ j}. (W1 is the set of all upper triangular matrices defined in Exercise 12.) Show that Mmxn(F) = W1 ⊕ W2.
27. Let V denote the vector space consisting of all upper triangular n x n matrices (as defined in Exercise 12), and let W1 denote the subspace of V consisting of all diagonal matrices. Show that V = W1 ⊕ W2, where W2 = {A ∈ V: Aij = 0 whenever i ≥ j}.
28. A matrix M is called skew-symmetric if M^t = -M. Clearly, a skew-symmetric matrix is square. Let F be a field. Prove that the set W1 of all skew-symmetric n x n matrices with entries from F is a subspace of Mnxn(F). Now assume that F is not of characteristic 2 (see Appendix C), and let W2 be the subspace of Mnxn(F) consisting of all symmetric n x n matrices. Prove that Mnxn(F) = W1 ⊕ W2.
29. Let F be a field that is not of characteristic 2. Define
W1 = {A ∈ Mnxn(F): Aij = 0 whenever i ≤ j}
and W2 to be the set of all symmetric n x n matrices with entries from F. Both W1 and W2 are subspaces of Mnxn(F). Prove that Mnxn(F) = W1 + W2. Compare this exercise with Exercise 28.
30. Let W1 and W2 be subspaces of a vector space V. Prove that V is the direct sum of W1 and W2 if and only if each vector in V can be uniquely written as x1 + x2, where x1 ∈ W1 and x2 ∈ W2.
31. Let W be a subspace of a vector space V over a field F. For any v ∈ V the set {v} + W = {v + w: w ∈ W} is called the coset of W containing v. It is customary to denote this coset by v + W rather than {v} + W.
(a) Prove that v + W is a subspace of V if and only if v ∈ W.
(b) Prove that v1 + W = v2 + W if and only if v1 - v2 ∈ W.
Addition and scalar multiplication by scalars of F can be defined in the collection S = {v + W: v ∈ V} of all cosets of W as follows:
(v1 + W) + (v2 + W) = (v1 + v2) + W
for all v1, v2 ∈ V, and
a(v + W) = av + W
for all v ∈ V and a ∈ F.
(c) Prove that the preceding operations are well defined; that is, show that if v1 + W = v1' + W and v2 + W = v2' + W, then
(v1 + W) + (v2 + W) = (v1' + W) + (v2' + W)
and
a(v1 + W) = a(v1' + W)
for all a ∈ F.
(d) Prove that the set S is a vector space with the operations defined in (c). This vector space is called the quotient space of V modulo W and is denoted by V/W.
1.4 LINEAR COMBINATIONS AND SYSTEMS OF LINEAR
EQUATIONS
In Section 1.1, it was shown that the equation of the plane through three noncollinear points A, B, and C in space is x = A + su + tv, where u and v denote the vectors beginning at A and ending at B and C, respectively, and s and t denote arbitrary real numbers. An important special case occurs when A is the origin. In this case, the equation of the plane simplifies to x = su + tv, and the set of all points in this plane is a subspace of R3. (This is proved as Theorem 1.5.) Expressions of the form su + tv, where s and t are scalars and u and v are vectors, play a central role in the theory of vector spaces. The appropriate generalization of such expressions is presented in the following definitions.
Definitions. Let V be a vector space and S a nonempty subset of V. A vector v ∈ V is called a linear combination of vectors of S if there exist a finite number of vectors u1, u2, ..., un in S and scalars a1, a2, ..., an in F such that v = a1u1 + a2u2 + ... + anun. In this case we also say that v is a linear combination of u1, u2, ..., un and call a1, a2, ..., an the coefficients of the linear combination.
Observe that in any vector space V, 0v = 0 for each v ∈ V. Thus the zero vector is a linear combination of any nonempty subset of V.
Example 1
TABLE 1.1 Vitamin Content of 100 Grams of Certain Foods

                                                   A       B1      B2    Niacin    C
                                                (units)   (mg)    (mg)    (mg)    (mg)
Apple butter                                        0      0.01    0.02    0.2      2
Raw, unpared apples (freshly harvested)            90      0.03    0.02    0.1      4
Chocolate-coated candy with coconut center          0      0.02    0.07    0.2      0
Clams (meat only)                                 100      0.10    0.18    1.3     10
Cupcake from mix (dry form)                         0      0.05    0.06    0.3      0
Cooked farina (unenriched)                        (0)a     0.01    0.01    0.1     (0)
Jams and preserves                                 10      0.01    0.03    0.2      2
Coconut custard pie (baked from mix)                0      0.02    0.02    0.4      0
Raw brown rice                                    (0)      0.34    0.05    4.7     (0)
Soy sauce                                           0      0.02    0.25    0.4      0
Cooked spaghetti (unenriched)                       0      0.01    0.01    0.3      0
Raw wild rice                                     (0)      0.45    0.63    6.2     (0)

Source: Bernice K. Watt and Annabel L. Merrill, Composition of Foods (Agriculture Handbook Number 8), Consumer and Food Economics Research Division, U.S. Department of Agriculture, Washington, D.C., 1963.
a Zeros in parentheses indicate that the amount of a vitamin present is either none or too small to measure.
Table 1.1 shows the vitamin content of 100 grams of 12 foods with respect to vitamins A, B1 (thiamine), B2 (riboflavin), niacin, and C (ascorbic acid).
The vitamin content of 100 grams of each food can be recorded as a column vector in R5; for example, the vitamin vector for apple butter is
(0.00, 0.01, 0.02, 0.20, 2.00).
Considering the vitamin vectors for cupcake, coconut custard pie, raw brown rice, soy sauce, and wild rice, we see that
(0.00, 0.05, 0.06, 0.30, 0.00) + (0.00, 0.02, 0.02, 0.40, 0.00) + (0.00, 0.34, 0.05, 4.70, 0.00)
   + 2(0.00, 0.02, 0.25, 0.40, 0.00) = (0.00, 0.45, 0.63, 6.20, 0.00).
Thus the vitamin vector for wild rice is a linear combination of the vitamin vectors for cupcake, coconut custard pie, raw brown rice, and soy sauce. So 100 grams of cupcake, 100 grams of coconut custard pie, 100 grams of raw brown rice, and 200 grams of soy sauce provide exactly the same amounts of the five vitamins as 100 grams of raw wild rice. Similarly, since
2(0.00, 0.01, 0.02, 0.20, 2.00) + (90.00, 0.03, 0.02, 0.10, 4.00) + (0.00, 0.02, 0.07, 0.20, 0.00)
   + (0.00, 0.01, 0.01, 0.10, 0.00) + (10.00, 0.01, 0.03, 0.20, 2.00) + (0.00, 0.01, 0.01, 0.30, 0.00)
   = (100.00, 0.10, 0.18, 1.30, 10.00),
200 grams of apple butter, 100 grams of apples, 100 grams of chocolate candy, 100 grams of farina, 100 grams of jam, and 100 grams of spaghetti provide exactly the same amounts of the five vitamins as 100 grams of clams. •
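The arithmetic behind these two identities is easy to verify numerically. The sketch below (Python with NumPy; the variable names are introduced here for illustration, and the entries come from Table 1.1) checks both linear combinations.

    import numpy as np

    # Vitamin vectors (A, B1, B2, niacin, C) taken from Table 1.1.
    cupcake    = np.array([0.00, 0.05, 0.06, 0.30, 0.00])
    custard    = np.array([0.00, 0.02, 0.02, 0.40, 0.00])
    brown_rice = np.array([0.00, 0.34, 0.05, 4.70, 0.00])
    soy_sauce  = np.array([0.00, 0.02, 0.25, 0.40, 0.00])
    wild_rice  = np.array([0.00, 0.45, 0.63, 6.20, 0.00])

    # 100 g cupcake + 100 g pie + 100 g brown rice + 200 g soy sauce = 100 g wild rice
    assert np.allclose(cupcake + custard + brown_rice + 2 * soy_sauce, wild_rice)

    apple_butter = np.array([0.00, 0.01, 0.02, 0.20, 2.00])
    apples       = np.array([90.00, 0.03, 0.02, 0.10, 4.00])
    candy        = np.array([0.00, 0.02, 0.07, 0.20, 0.00])
    farina       = np.array([0.00, 0.01, 0.01, 0.10, 0.00])
    jam          = np.array([10.00, 0.01, 0.03, 0.20, 2.00])
    spaghetti    = np.array([0.00, 0.01, 0.01, 0.30, 0.00])
    clams        = np.array([100.00, 0.10, 0.18, 1.30, 10.00])

    # 200 g apple butter + 100 g each of the other five foods = 100 g clams
    assert np.allclose(2 * apple_butter + apples + candy + farina + jam + spaghetti, clams)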
Throughout Chapters 1 and 2 we encounter many different situations in which it is necessary to determine whether or not a vector can be expressed as a linear combination of other vectors, and if so, how. This question often reduces to the problem of solving a system of linear equations. In Chapter 3, we discuss a general method for using matrices to solve any system of linear equations. For now, we illustrate how to solve a system of linear equations by showing how to determine if the vector (2, 6, 8) can be expressed as a linear combination of
u1 = (1, 2, 1),   u2 = (-2, -4, -2),   u3 = (0, 2, 3),
u4 = (2, 0, -3),   and   u5 = (-3, 8, 16).
Thus we must determine if there are scalars a1, a2, a3, a4, and a5 such that
(2, 6, 8) = a1u1 + a2u2 + a3u3 + a4u4 + a5u5
          = a1(1, 2, 1) + a2(-2, -4, -2) + a3(0, 2, 3) + a4(2, 0, -3) + a5(-3, 8, 16)
          = (a1 - 2a2 + 2a4 - 3a5, 2a1 - 4a2 + 2a3 + 8a5, a1 - 2a2 + 3a3 - 3a4 + 16a5).
Hence (2, 6, 8) can be expressed as a linear combination of u1, u2, u3, u4, and u5 if and only if there is a 5-tuple of scalars (a1, a2, a3, a4, a5) satisfying the system of linear equations
 a1 - 2a2       + 2a4 -  3a5 = 2
2a1 - 4a2 + 2a3       +  8a5 = 6                (1)
 a1 - 2a2 + 3a3 - 3a4 + 16a5 = 8,
which is obtained by equating the corresponding coordinates in the preceding equation.
To solve system (1), we replace it by another system with the same solutions, but which is easier to solve. The procedure to be used expresses some of the unknowns in terms of others by eliminating certain unknowns from all the equations except one. To begin, we eliminate a1 from every equation except the first by adding -2 times the first equation to the second and -1 times the first equation to the third. The result is the following new system:
a1 - 2a2        + 2a4 -  3a5 = 2
           2a3 - 4a4 + 14a5 = 2                (2)
           3a3 - 5a4 + 19a5 = 6.
In this case, it happened that while eliminating a1 from every equation except the first, we also eliminated a2 from every equation except the first. This need not happen in general. We now want to make the coefficient of a3 in the second equation equal to 1, and then eliminate a3 from the third equation. To do this, we first multiply the second equation by 1/2, which produces
a1 - 2a2       + 2a4 -  3a5 = 2
           a3 - 2a4 +  7a5 = 1
          3a3 - 5a4 + 19a5 = 6.
Next we add -3 times the second equation to the third, obtaining
a1 - 2a2       + 2a4 - 3a5 = 2
           a3 - 2a4 + 7a5 = 1                (3)
                 a4 - 2a5 = 3.
We continue by eliminating a4 from every equation of (3) except the third. This yields
a1 - 2a2        + a5 = -4
           a3 + 3a5 = 7                (4)
           a4 - 2a5 = 3.
System (4) is a system of the desired form: It is easy to solve for the first unknown present in each of the equations (a1, a3, and a4) in terms of the other unknowns (a2 and a5). Rewriting system (4) in this form, we find that
a1 = 2a2 - a5 - 4
a3 = -3a5 + 7
a4 = 2a5 + 3.
Thus for any choice of scalars a2 and a5, a vector of the form
(a1, a2, a3, a4, a5) = (2a2 - a5 - 4, a2, -3a5 + 7, 2a5 + 3, a5)
is a solution to system (1). In particular, the vector (-4, 0, 7, 3, 0) obtained by setting a2 = 0 and a5 = 0 is a solution to (1). Therefore
(2, 6, 8) = -4u1 + 0u2 + 7u3 + 3u4 + 0u5,
so that (2, 6, 8) is a linear combination of u1, u2, u3, u4, and u5.
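The same conclusion can be checked numerically. In the sketch below (Python with NumPy; an illustrative verification rather than part of the elimination procedure), u1, ..., u5 are placed as the columns of a matrix, one particular solution of the system is computed, and the hand-computed solution (-4, 0, 7, 3, 0) is confirmed.

    import numpy as np

    # Columns of U are u1, ..., u5; we need a solution of U @ a = b with b = (2, 6, 8).
    U = np.array([[ 1, -2,  0,  2, -3],
                  [ 2, -4,  2,  0,  8],
                  [ 1, -2,  3, -3, 16]], dtype=float)
    b = np.array([2.0, 6.0, 8.0])

    # lstsq returns one particular (minimum-norm) solution; the system is
    # underdetermined, so infinitely many solutions exist, e.g. (-4, 0, 7, 3, 0).
    a, *_ = np.linalg.lstsq(U, b, rcond=None)
    assert np.allclose(U @ a, b)

    hand_solution = np.array([-4.0, 0.0, 7.0, 3.0, 0.0])
    assert np.allclose(U @ hand_solution, b)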
The procedure just illustrated uses three types of operations to simplify the original system:
1. interchanging the order of any two equations in the system;
2. multiplying any equation in the system by a nonzero constant;
3. adding a constant multiple of any equation to another equation in the system.
In Section 3.4, we prove that these operations do not change the set of solutions to the original system. Note that we employed these operations to obtain a system of equations that had the following properties:
1. The first nonzero coefficient in each equation is one.
2. If an unknown is the first unknown with a nonzero coefficient in some equation, then that unknown occurs with a zero coefficient in each of the other equations.
3. The first unknown with a nonzero coefficient in any equation has a larger subscript than the first unknown with a nonzero coefficient in any preceding equation.
To help clarify the meaning of these properties, note that none of the following systems meets these requirements.
x1 + 3x2      + x4      = 7
           2x3 - 5x4     = -1                (5)

x1 - 2x2 + 3x3      + x5 = -5
           x3      - 2x5 = 9                 (6)
                x4 + 3x5 = 6

x1      - 2x3      + x5 = 1
                x4 - 6x5 = 0                 (7)
     x2 + 5x3      - 3x5 = 2.
Specifically, system (5) does not satisfy property 1 because the first nonzero coefficient in the second equation is 2; system (6) does not satisfy property 2 because x3, the first unknown with a nonzero coefficient in the second equation, occurs with a nonzero coefficient in the first equation; and system (7) does not satisfy property 3 because x2, the first unknown with a nonzero coefficient in the third equation, does not have a larger subscript than x4, the first unknown with a nonzero coefficient in the second equation.
Once a system with properties 1, 2, and 3 has been obtained, it is easy to solve for some of the unknowns in terms of the others (as in the preceding example). If, however, in the course of using operations 1, 2, and 3 a system containing an equation of the form 0 = c, where c is nonzero, is obtained, then the original system has no solutions. (See Example 2.)
We return to the study of systems of linear equations in Chapter 3. We discuss there the theoretical basis for this method of solving systems of linear equations and further simplify the procedure by use of matrices.
Example 2
We claim that
2x^3 - 2x^2 + 12x - 6
is a linear combination of
x^3 - 2x^2 - 5x - 3   and   3x^3 - 5x^2 - 4x - 9
in P3(R), but that
3x^3 - 2x^2 + 7x + 8
is not. In the first case we wish to find scalars a and b such that
2x^3 - 2x^2 + 12x - 6 = a(x^3 - 2x^2 - 5x - 3) + b(3x^3 - 5x^2 - 4x - 9)
= (a + 3b)x^3 + (-2a - 5b)x^2 + (-5a - 4b)x + (-3a - 9b).
Thus we are led to the following system of linear equations:
  a + 3b = 2
-2a - 5b = -2
-5a - 4b = 12
-3a - 9b = -6.
Adding appropriate multiples of the first equation to the others in order to eliminate a, we find that
a + 3b = 2
     b = 2
   11b = 22
    0b = 0.
Now adding the appropriate multiples of the second equation to the others yields
a = -4
b = 2
0 = 0
0 = 0.
Hence
2x^3 - 2x^2 + 12x - 6 = -4(x^3 - 2x^2 - 5x - 3) + 2(3x^3 - 5x^2 - 4x - 9).
In the second case, we wish to show that there are no scalars a and b for which
3x^3 - 2x^2 + 7x + 8 = a(x^3 - 2x^2 - 5x - 3) + b(3x^3 - 5x^2 - 4x - 9).
Using the preceding technique, we obtain a system of linear equations
  a + 3b = 3
-2a - 5b = -2
-5a - 4b = 7                (8)
-3a - 9b = 8.
Eliminating a as before yields
a + 3b = 3
     b = 4
   11b = 22
     0 = 17.
But the presence of the inconsistent equation 0 = 17 indicates that system (8) has no solutions. Hence 3x^3 - 2x^2 + 7x + 8 is not a linear combination of x^3 - 2x^2 - 5x - 3 and 3x^3 - 5x^2 - 4x - 9. •
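Both cases can also be settled by working with coefficient vectors. The following sketch (Python with NumPy; the helper is_combination is a name introduced here for illustration) sets up the same systems and tests them for consistency.

    import numpy as np

    # Represent each polynomial by its coefficient vector (x^3, x^2, x, constant).
    p1 = np.array([1, -2, -5, -3], dtype=float)     # x^3 - 2x^2 - 5x - 3
    p2 = np.array([3, -5, -4, -9], dtype=float)     # 3x^3 - 5x^2 - 4x - 9
    A = np.column_stack([p1, p2])                   # 4 x 2 coefficient matrix

    def is_combination(target):
        coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
        return np.allclose(A @ coeffs, target), coeffs

    ok, coeffs = is_combination(np.array([2, -2, 12, -6], dtype=float))
    print(ok, coeffs)    # True, approximately [-4.  2.]

    ok, _ = is_combination(np.array([3, -2, 7, 8], dtype=float))
    print(ok)            # False: the system is inconsistent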
Throughout this book, we form the set of all linear combinations of some set of vectors. We now name such a set of linear combinations.
Definition. Let S be a nonempty subset of a vector space V. The span of S, denoted span(S), is the set consisting of all linear combinations of the vectors in S. For convenience, we define span(∅) = {0}.
In R3, for instance, the span of the set {(1, 0, 0), (0, 1, 0)} consists of all vectors in R3 that have the form a(1, 0, 0) + b(0, 1, 0) = (a, b, 0) for some scalars a and b. Thus the span of {(1, 0, 0), (0, 1, 0)} contains all the points in the xy-plane. In this case, the span of the set is a subspace of R3. This fact is true in general.
Theorem 1.5. The span of any subset S of a vector space V is a subspace
of V. Moreover, any subspace of V that contains S must also contain the
span of S.
Proof. This result is immediate if S = ∅ because span(∅) = {0}, which is a subspace that is contained in any subspace of V.
If S ≠ ∅, then S contains a vector z. So 0z = 0 is in span(S). Let x, y ∈ span(S). Then there exist vectors u1, u2, ..., um, v1, v2, ..., vn in S and scalars a1, a2, ..., am, b1, b2, ..., bn such that
x = a1u1 + a2u2 + ... + amum   and   y = b1v1 + b2v2 + ... + bnvn.
Then
x + y = a1u1 + a2u2 + ... + amum + b1v1 + b2v2 + ... + bnvn
and, for any scalar c,
cx = (ca1)u1 + (ca2)u2 + ... + (cam)um
are clearly linear combinations of the vectors in S; so x + y and cx are in span(S). Thus span(S) is a subspace of V.
Now let W denote any subspace of V that contains S. If w ∈ span(S), then w has the form w = c1w1 + c2w2 + ... + ckwk for some vectors w1, w2, ..., wk in S and some scalars c1, c2, ..., ck. Since S ⊆ W, we have w1, w2, ..., wk ∈ W. Therefore w = c1w1 + c2w2 + ... + ckwk is in W by Exercise 20 of Section 1.3. Because w, an arbitrary vector in span(S), belongs to W, it follows that span(S) ⊆ W.
Definition. A subset S of a vector space V generates (or spans) V
if span(S) = V. In this case, we also say that the vectors of S generate (or
span) V.
Example 3
The vectors (1, 1, 0), (1, 0, 1), and (0, 1, 1) generate R3 since an arbitrary vector (a1, a2, a3) in R3 is a linear combination of the three given vectors; in fact, the scalars r, s, and t for which
r(1, 1, 0) + s(1, 0, 1) + t(0, 1, 1) = (a1, a2, a3)
are
r = (1/2)(a1 + a2 - a3),   s = (1/2)(a1 - a2 + a3),   and   t = (1/2)(-a1 + a2 + a3). •
Example 4
The polynomials x^2 + 3x - 2, 2x^2 + 5x - 3, and -x^2 - 4x + 4 generate P2(R) since each of the three given polynomials belongs to P2(R) and each polynomial ax^2 + bx + c in P2(R) is a linear combination of these three, namely,
(-8a + 5b + 3c)(x^2 + 3x - 2) + (4a - 2b - c)(2x^2 + 5x - 3)
   + (-a + b + c)(-x^2 - 4x + 4) = ax^2 + bx + c. •
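The stated coefficients can be spot-checked numerically. The sketch below (Python with NumPy) verifies the identity for a few randomly chosen values of a, b, and c, treating each polynomial as its coefficient vector (ordered x^2, x, constant).

    import numpy as np

    q1 = np.array([1, 3, -2], dtype=float)     # x^2 + 3x - 2
    q2 = np.array([2, 5, -3], dtype=float)     # 2x^2 + 5x - 3
    q3 = np.array([-1, -4, 4], dtype=float)    # -x^2 - 4x + 4

    rng = np.random.default_rng(0)
    for a, b, c in rng.uniform(-5, 5, size=(10, 3)):
        combo = (-8*a + 5*b + 3*c) * q1 + (4*a - 2*b - c) * q2 + (-a + b + c) * q3
        assert np.allclose(combo, [a, b, c])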
Example 5
The matrices
[1 1; 1 0],   [1 1; 0 1],   [1 0; 1 1],   and   [0 1; 1 1]
generate M2x2(R) since an arbitrary matrix A in M2x2(R) can be expressed as a linear combination of the four given matrices as follows:
[a11 a12; a21 a22] = ((1/3)a11 + (1/3)a12 + (1/3)a21 - (2/3)a22) [1 1; 1 0]
                   + ((1/3)a11 + (1/3)a12 - (2/3)a21 + (1/3)a22) [1 1; 0 1]
                   + ((1/3)a11 - (2/3)a12 + (1/3)a21 + (1/3)a22) [1 0; 1 1]
                   + ((-2/3)a11 + (1/3)a12 + (1/3)a21 + (1/3)a22) [0 1; 1 1].
On the other hand, the matrices
[1 0; 0 1],   [1 1; 0 1],   and   [1 0; 1 1]
do not generate M2x2(R) because each of these matrices has equal diagonal entries. So any linear combination of these matrices has equal diagonal entries. Hence not every 2 x 2 matrix is a linear combination of these three matrices. •
At the beginning of this section we noted that the equation of a plane through three noncollinear points in space, one of which is the origin, is of the form x = su + tv, where u, v ∈ R3 and s and t are scalars. Thus x ∈ R3 is a linear combination of u, v ∈ R3 if and only if x lies in the plane containing u and v. (See Figure 1.5.)
Figure 1.5
Usually there are many different subsets that generate a subspace W. (See
Exercise 13.) It is natural to seek a subset of W that generates W and is as
small as possible. In the next section we explore the circumstances under
which a vector can be removed from a generating set to obtain a smaller
generating set.
EXERCISES
1. Label the following statements as true or false.
(a) The zero vector is a linear combination of any nonempty set of
vectors.
(b) The span of ∅ is ∅.
(c) If S is a subset of a vector space V, then span(S) equals the intersection of all subspaces of V that contain S.
(d) In solving a system of linear equations, it is permissible to multiply
an equation by any constant.
(e) In solving a system of linear equations, it is permissible to add any
multiple of one equation to another.
(f) Every system of linear equations has a solution.
2. Solve the following systems of linear equations by the method introduced in this section.
(a)  2x1 - 2x2 - 3x3             = -2
     3x1 - 3x2 - 2x3 + 5x4       = 7
      x1 -  x2 - 2x3 -  x4       = -3

(b)  3x1 - 7x2 + 4x3 = 10
      x1 - 2x2 +  x3 = 3
     2x1 -  x2 - 2x3 = 6

(c)   x1 + 2x2 -  x3 +  x4 = 5
      x1 + 4x2 - 3x3 - 3x4 = 6
     2x1 + 3x2 -  x3 + 4x4 = 8

(d)   x1 + 2x2 + 2x3       = 2
      x1       + 8x3 + 5x4 = -6
      x1 +  x2 + 5x3 + 5x4 = 3

(e)   x1 +  2x2 -  4x3 -   x4 +  x5 = 7
     -x1        + 10x3 -  3x4 - 4x5 = -16
     2x1 +  5x2 -  5x3 -  4x4 -  x5 = 2
     4x1 + 11x2 -  7x3 - 10x4 - 2x5 = 7

(f)   x1 + 2x2 +  6x3 = -1
     2x1 +  x2 +   x3 = 8
     3x1 +  x2 -   x3 = 15
      x1 + 3x2 + 10x3 = -5
3. For each of the following lists of vectors in R3, determine whether the
first vector can be expressed as a linear combination of the other two.
(a) (-2,0,3),(1,3,0),(2,4,-1)
(b) (1,2, -3), (-3,2,1), (2,-1,-1)
(c) (3,4,1), (1,-2,1), (-2,-1,1)
(d) (2,-1,0), (1,2,-3), (1,-3,2)
(e) (5,1,-5), (1,-2,-3), (-2,3,-4)
(f) (-2,2,2), (1,2,-1), (-3,-3,3)
4. For each list of polynomials in P3(R), determine whether the first polynomial can be expressed as a linear combination of the other two.
(a) x^3 - 3x + 5,  x^3 + 2x^2 - x + 1,  x^3 + 3x^2 - 1
(b) 4x^3 + 2x^2 - 6,  x^3 - 2x^2 + 4x + 1,  3x^3 - 6x^2 + x + 4
(c) -2x^3 - 11x^2 + 3x + 2,  x^3 - 2x^2 + 3x - 1,  2x^3 + x^2 + 3x - 2
(d) x^3 + x^2 + 2x + 13,  2x^3 - 3x^2 + 4x + 1,  x^3 - x^2 + 2x + 3
(e) x^3 - 8x^2 + 4x,  x^3 - 2x^2 + 3x - 1,  x^3 - 2x + 3
(f) 6x^3 - 3x^2 + x + 2,  x^3 - x^2 + 2x + 3,  2x^3 - 3x + 1
5. In each part, determine whether the given vector is in the span of S.
(a) (2,-1,1), S = {(1,0,2), (-1,1,1)}
(b) (-1,2,1), S = {(1,0,2), (-1,1,1)}
(c) (-1,1,1,2), S = {(1,0,1,-1), (0,1,1,1)}
(d) (2,-1,1,-3), S = {(1,0,1,-1), (0,1,1,1)}
(e) -x^3 + 2x^2 + 3x + 3,  S = {x^3 + x^2 + x + 1, x^2 + x + 1, x + 1}
(f) 2x^3 - x^2 + x + 3,  S = {x^3 + x^2 + x + 1, x^2 + x + 1, x + 1}
(g) [1 2; -3 4],  S = {[1 0; -1 0], [0 1; 0 1], [1 1; 0 0]}
(h) [1 0; 0 1],  S = {[1 0; -1 0], [0 1; 0 1], [1 1; 0 0]}
6. Show that the vectors (1,1,0), (1,0,1), and (0,1,1) generate F3.
7. In Fn, let ej denote the vector whose jth coordinate is 1 and whose
other coordinates are 0. Prove that {ei, e2,..., en} generates Fn.
8. Show that Pn(F) is generated by {1, x,..., xn}.
9. Show that the matrices
[1 0; 0 0],   [0 1; 0 0],   [0 0; 1 0],   and   [0 0; 0 1]
generate M2x2(F).
10. Show that if
M1 = [1 0; 0 0],   M2 = [0 0; 0 1],   and   M3 = [0 1; 1 0],
then the span of {M1, M2, M3} is the set of all symmetric 2 x 2 matrices.
11.† Prove that span({x}) = {ax: a ∈ F} for any vector x in a vector space. Interpret this result geometrically in R3.
12. Show that a subset W of a vector space V is a subspace of V if and only
if span(W) = W.
13.† Show that if S1 and S2 are subsets of a vector space V such that S1 ⊆ S2, then span(S1) ⊆ span(S2). In particular, if S1 ⊆ S2 and span(S1) = V, deduce that span(S2) = V.
14. Show that if S1 and S2 are arbitrary subsets of a vector space V, then span(S1 ∪ S2) = span(S1) + span(S2). (The sum of two subsets is defined in the exercises of Section 1.3.)
15. Let S1 and S2 be subsets of a vector space V. Prove that span(S1 ∩ S2) ⊆ span(S1) ∩ span(S2). Give an example in which span(S1 ∩ S2) and span(S1) ∩ span(S2) are equal and one in which they are unequal.
16. Let V be a vector space and S a subset of V with the property that whenever v1, v2, ..., vn ∈ S and a1v1 + a2v2 + ... + anvn = 0, then a1 = a2 = ... = an = 0. Prove that every vector in the span of S can be uniquely written as a linear combination of vectors of S.
17. Let W be a subspace of a vector space V. Under what conditions are
there only a finite number of distinct subsets S of W such that S gen­
erates W?
1.5 LINEAR DEPENDENCE AND LINEAR INDEPENDENCE
Suppose that V is a vector space over an infinite field and that W is a subspace
of V. Unless W is the zero subspace, W is an infinite set. It is desirable to
find a "small" finite subset S that generates W because we can then describe
each vector in W as a linear combination of the finite number of vectors in
S. Indeed, the smaller that S is, the fewer computations that are required
to represent vectors in W. Consider, for example, the subspace W of R3
generated by S = {u1, u2, u3, u4}, where u1 = (2, -1, 4), u2 = (1, -1, 3), u3 = (1, 1, -1), and u4 = (1, -2, -1). Let us attempt to find a proper subset of S that also generates W. The search for this subset is related to the question of whether or not some vector in S is a linear combination of the other vectors in S. Now u4 is a linear combination of the other vectors in S if and only if there are scalars a1, a2, and a3 such that
u4 = a1u1 + a2u2 + a3u3,
that is, if and only if there are scalars a1, a2, and a3 satisfying
(1, -2, -1) = (2a1 + a2 + a3, -a1 - a2 + a3, 4a1 + 3a2 - a3).
Thus u4 is a linear combination of u1, u2, and u3 if and only if the system of linear equations
 2a1 + a2 + a3 = 1
 -a1 - a2 + a3 = -2
 4a1 + 3a2 - a3 = -1
has a solution. The reader should verify that no such solution exists. This does not, however, answer our question of whether some vector in S is a linear combination of the other vectors in S. It can be shown, in fact, that u3 is a linear combination of u1, u2, and u4, namely, u3 = 2u1 - 3u2 + 0u4.
In the preceding example, checking that some vector in S is a linear combination of the other vectors in S could require that we solve several different systems of linear equations before we determine which, if any, of u1, u2, u3, and u4 is a linear combination of the others. By formulating our question differently, we can save ourselves some work. Note that since u3 = 2u1 - 3u2 + 0u4, we have
-2u1 + 3u2 + u3 - 0u4 = 0.
That is, because some vector in S is a linear combination of the others, the zero vector can be expressed as a linear combination of the vectors in S using coefficients that are not all zero. The converse of this statement is also true: If the zero vector can be written as a linear combination of the vectors in S in which not all the coefficients are zero, then some vector in S is a linear combination of the others. For instance, in the example above, the equation -2u1 + 3u2 + u3 - 0u4 = 0 can be solved for any vector having a nonzero coefficient; so u1, u2, or u3 (but not u4) can be written as a linear combination of the other three vectors. Thus, rather than asking whether some vector in S is a linear combination of the other vectors in S, it is more efficient to ask whether the zero vector can be expressed as a linear combination of the vectors in S with coefficients that are not all zero. This observation leads us to the following definition.
Definition. A subset S of a vector space V is called linearly dependent
if there exist a finite number of distinct vectors u1, u2, ..., un in S and scalars a1, a2, ..., an, not all zero, such that
a1u1 + a2u2 + ... + anun = 0.
In this case we also say that the vectors of S are linearly dependent.
For any vectors u1, u2, ..., un, we have a1u1 + a2u2 + ... + anun = 0 if a1 = a2 = ... = an = 0. We call this the trivial representation of 0 as a linear combination of u1, u2, ..., un. Thus, for a set to be linearly dependent, there must exist a nontrivial representation of 0 as a linear combination of vectors in the set. Consequently, any subset of a vector space that contains the zero vector is linearly dependent, because 0 = 1 · 0 is a nontrivial
representation of 0 as a linear combination of vectors in the set.
Example 1
Consider the set
S = {(1,3, -4,2), (2,2, -4,0), (1, -3,2, -4), (-1,0,1,0)}
in R4. We show that S is linearly dependent and then express one of the
vectors in S as a linear combination of the other vectors in S- To show that
S is linearly dependent, we must find scalars a1, a2, a3, and a4, not all zero, such that
a1(1, 3, -4, 2) + a2(2, 2, -4, 0) + a3(1, -3, 2, -4) + a4(-1, 0, 1, 0) = 0.
Finding such scalars amounts to finding a nonzero solution to the system of linear equations
  a1 + 2a2 +  a3 - a4 = 0
 3a1 + 2a2 - 3a3      = 0
-4a1 - 4a2 + 2a3 + a4 = 0
 2a1       - 4a3      = 0.
One such solution is a1 = 4, a2 = -3, a3 = 2, and a4 = 0. Thus S is a linearly dependent subset of R4, and
4(1, 3, -4, 2) - 3(2, 2, -4, 0) + 2(1, -3, 2, -4) + 0(-1, 0, 1, 0) = 0. •
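Such dependences can also be detected numerically: place the vectors of S as the columns of a matrix A and compare the rank of A with the number of columns. The sketch below (Python with NumPy) does this and verifies the relation found above.

    import numpy as np

    # Columns of A are the vectors of S; S is linearly dependent exactly when
    # A x = 0 has a nonzero solution, i.e. rank(A) < number of columns.
    A = np.column_stack([(1, 3, -4, 2), (2, 2, -4, 0),
                         (1, -3, 2, -4), (-1, 0, 1, 0)]).astype(float)

    print(np.linalg.matrix_rank(A))          # 3 < 4, so S is linearly dependent

    # A nonzero solution, e.g. the one found above:
    x = np.array([4.0, -3.0, 2.0, 0.0])
    assert np.allclose(A @ x, 0)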
Example 2
In M2x3(R), the set
{ [1 -3 2; -4 0 5],   [-3 7 4; 6 -2 -7],   [-2 3 11; -1 -3 2] }
is linearly dependent because
5 [1 -3 2; -4 0 5] + 3 [-3 7 4; 6 -2 -7] - 2 [-2 3 11; -1 -3 2] = [0 0 0; 0 0 0]. •
Definition. A subset S of a vector space that is not linearly dependent
is called linearly independent. As before, we also say that the vectors of
S are linearly independent.
The following facts about linearly independent sets are true in any vector
space.
1. The empty set is linearly independent, for linearly dependent sets must
be nonempty.
2. A set consisting of a single nonzero vector is linearly independent. For if {u} is linearly dependent, then au = 0 for some nonzero scalar a. Thus
u = a^{-1}(au) = a^{-1}0 = 0.
3. A set is linearly independent if and only if the only representations of
0 as linear combinations of its vectors are trivial representations.
The condition in item 3 provides a useful method for determining whether
a finite set is linearly independent. This technique is illustrated in the exam­
ples that follow.
Example 3
To prove that the set
S = {(1,0,0,-1), (0,1,0,-1), (0,0,1,-1), (0,0,0,1)}
is linearly independent, we must show that the only linear combination of
vectors in S that equals the zero vector is the one in which all the coefficients
are zero. Suppose that a1, a2, a3, and a4 are scalars such that
a1(1, 0, 0, -1) + a2(0, 1, 0, -1) + a3(0, 0, 1, -1) + a4(0, 0, 0, 1) = (0, 0, 0, 0).
Equating the corresponding coordinates of the vectors on the left and the right sides of this equation, we obtain the following system of linear equations.
 a1                     = 0
      a2                = 0
           a3           = 0
-a1 - a2 - a3 + a4      = 0
Clearly the only solution to this system is a1 = a2 = a3 = a4 = 0, and so S
is linearly independent. •
Example 4
For k = 0, 1, ..., n let pk(x) = x^k + x^{k+1} + ... + x^n. The set
{p0(x), p1(x), ..., pn(x)}
is linearly independent in Pn(F). For if
a0p0(x) + a1p1(x) + ... + anpn(x) = 0
for some scalars a0, a1, ..., an, then
a0 + (a0 + a1)x + (a0 + a1 + a2)x^2 + ... + (a0 + a1 + ... + an)x^n = 0.
By equating the coefficients of x^k on both sides of this equation for k = 0, 1, ..., n, we obtain
a0                      = 0
a0 + a1                 = 0
a0 + a1 + a2            = 0
   ...
a0 + a1 + a2 + ... + an = 0.
Clearly the only solution to this system of linear equations is a0 = a1 = ... = an = 0. •
The following important results are immediate consequences of the defi­
nitions of linear dependence and linear independence.
Theorem 1.6. Let V be a vector space, and let S1 ⊆ S2 ⊆ V. If S1 is linearly dependent, then S2 is linearly dependent.
Proof. Exercise.
Corollary. Let V be a vector space, and let S1 ⊆ S2 ⊆ V. If S2 is linearly independent, then S1 is linearly independent.
Proof. Exercise.
Earlier in this section, we remarked that the issue of whether S is the smallest generating set for its span is related to the question of whether some vector in S is a linear combination of the other vectors in S. Thus the issue of whether S is the smallest generating set for its span is related to the question of whether S is linearly dependent. To see why, consider the subset S = {u1, u2, u3, u4} of R3, where u1 = (2, -1, 4), u2 = (1, -1, 3), u3 = (1, 1, -1), and u4 = (1, -2, -1). We have previously noted that S is linearly dependent; in fact,
-2u1 + 3u2 + u3 - 0u4 = 0.
This equation implies that u3 (or alternatively, u1 or u2) is a linear combination of the other vectors in S. For example, u3 = 2u1 - 3u2 + 0u4. Therefore every linear combination a1u1 + a2u2 + a3u3 + a4u4 of vectors in S can be written as a linear combination of u1, u2, and u4:
a1u1 + a2u2 + a3u3 + a4u4 = a1u1 + a2u2 + a3(2u1 - 3u2 + 0u4) + a4u4
                          = (a1 + 2a3)u1 + (a2 - 3a3)u2 + a4u4.
Thus the subset S' = {u1, u2, u4} of S has the same span as S!
More generally, suppose that S is any linearly dependent set containing two or more vectors. Then some vector v ∈ S can be written as a linear combination of the other vectors in S, and the subset obtained by removing v from S has the same span as S. It follows that if no proper subset of S generates the span of S, then S must be linearly independent. Another way to view the preceding statement is given in Theorem 1.7.
Theorem 1.7. Let S be a linearly independent subset of a vector space V, and let v be a vector in V that is not in S. Then S ∪ {v} is linearly dependent if and only if v ∈ span(S).
Proof. If S ∪ {v} is linearly dependent, then there are vectors u1, u2, ..., un in S ∪ {v} such that a1u1 + a2u2 + ... + anun = 0 for some nonzero scalars a1, a2, ..., an. Because S is linearly independent, one of the ui's, say u1, equals v. Thus a1v + a2u2 + ... + anun = 0, and so
v = a1^{-1}(-a2u2 - ... - anun) = -(a1^{-1}a2)u2 - ... - (a1^{-1}an)un.
Since v is a linear combination of u2, ..., un, which are in S, we have v ∈ span(S).
Conversely, let v ∈ span(S). Then there exist vectors v1, v2, ..., vm in S and scalars b1, b2, ..., bm such that v = b1v1 + b2v2 + ... + bmvm. Hence
0 = b1v1 + b2v2 + ... + bmvm + (-1)v.
Since v ≠ vi for i = 1, 2, ..., m, the coefficient of v in this linear combination is nonzero, and so the set {v1, v2, ..., vm, v} is linearly dependent. Therefore S ∪ {v} is linearly dependent by Theorem 1.6.
Linearly independent generating sets are investigated in detail in Sec­
tion 1.6.
EXERCISES
1. Label the following statements as true or false.
(a) If S is a linearly dependent set, then each vector in S is a linear
combination of other vectors in S.
(b) Any set containing the zero vector is linearly dependent.
(c) The empty set is linearly dependent.
(d) Subsets of linearly dependent sets are linearly dependent.
(e) Subsets of linearly independent sets are linearly independent.
(f) If a1x1 + a2x2 + ... + anxn = 0 and x1, x2, ..., xn are linearly independent, then all the scalars ai are zero.
2. Determine whether the following sets are linearly dependent or linearly
independent.
(a) {[1 -3; -2 4], [-2 6; 4 -8]} in M2x2(R)
^ H-l A)\ 2 _4J)-M2><2^
(c) {x^3 + 2x^2, -x^2 + 3x + 1, x^3 - x^2 + 2x - 1} in P3(R)
(The computations in Exercise 2(g), (h), (i), and (j) are tedious unless technology is used.)
(d) {x^3 - x, 2x^2 + 4, -2x^3 + 3x^2 + 2x + 6} in P3(R)
(e) {(1, -1, 2), (1, -2, 1), (1, 1, 4)} in R3
(f) {(1, -1, 2), (2, 0, 1), (-1, 2, -1)} in R3
(g) {[1 0; -2 1], [0 -1; 1 1], [-1 2; 1 0], [2 1; -4 4]} in M2x2(R)
(h) {[1 0; -2 1], [0 -1; 1 1], [-1 2; 1 0], [2 1; 2 -2]} in M2x2(R)
(i) {x^4 - x^3 + 5x^2 - 8x + 6, -x^4 + x^3 - 5x^2 + 5x - 3, x^4 + 3x^2 - 3x + 5, 2x^4 + 3x^3 + 4x^2 - x + 1, x^3 - x + 2} in P4(R)
(j) {x^4 - x^3 + 5x^2 - 8x + 6, -x^4 + x^3 - 5x^2 + 5x - 3, x^4 + 3x^2 - 3x + 5, 2x^4 + x^3 + 4x^2 + 8x} in P4(R)
3. In M3x2(F), prove that the set
is linearly dependent.
4. In Fn, let ej denote the vector whose jth coordinate is 1 and whose other
coordinates are 0. Prove that {ei,e2,... ,en} is linearly independent.
5. Show that the set {l,x,x2,... ,xn} is linearly independent in Pn(F).
6. In Mmxn(F), let E^{ij} denote the matrix whose only nonzero entry is 1 in the ith row and jth column. Prove that {E^{ij}: 1 ≤ i ≤ m, 1 ≤ j ≤ n} is linearly independent.
7. Recall from Example 3 in Section 1.3 that the set of diagonal matrices in
M2x2(R) is a subspace. Find a linearly independent set that generates
this subspace.
8. Let S = {(1,1,0), (1,0,1), (0,1,1)} be a subset of the vector space F3.
(a) Prove that if F = R, then S is linearly independent.
(b) Prove that if F has characteristic 2, then S is linearly dependent.
9.* Let u and v be distinct vectors in a vector space V. Show that {u, v} is
linearly dependent if and only if u or v is a multiple of the other.
10. Give an example of three linearly dependent vectors in R3 such that
none of the three is a multiple of another.
11. Let S = {u1, u2, ..., un} be a linearly independent subset of a vector
space V over the field Z2. How many vectors are there in span(S)?
Justify your answer.
12. Prove Theorem 1.6 and its corollary.
13. Let V be a vector space over a field of characteristic not equal to two.
(a) Let u and v be distinct vectors in V. Prove that {u, v} is linearly independent if and only if {u + v, u - v} is linearly independent.
(b) Let u, v, and w be distinct vectors in V. Prove that {u, v, w} is linearly independent if and only if {u + v, u + w, v + w} is linearly independent.
14. Prove that a set S is linearly dependent if and only if S = {0} or there exist distinct vectors v, u1, u2, ..., un in S such that v is a linear combination of u1, u2, ..., un.
15. Let S = {u1, u2, ..., un} be a finite set of vectors. Prove that S is linearly dependent if and only if u1 = 0 or u_{k+1} ∈ span({u1, u2, ..., uk}) for some k (1 ≤ k < n).
16. Prove that a set S of vectors is linearly independent if and only if each
finite subset of S is linearly independent.
17. Let M be a square upper triangular matrix (as defined in Exercise 12
of Section 1.3) with nonzero diagonal entries. Prove that the columns
of M are linearly independent.
18. Let S be a set of nonzero polynomials in P(F) such that no two have
the same degree. Prove that S is linearly independent.
19. Prove that if {A1, A2, ..., Ak} is a linearly independent subset of Mnxn(F), then {A1^t, A2^t, ..., Ak^t} is also linearly independent.
20. Let f, g ∈ F(R, R) be the functions defined by f(t) = e^{rt} and g(t) = e^{st}, where r ≠ s. Prove that f and g are linearly independent in F(R, R).
1.6 BASES AND DIMENSION
We saw in Section 1.5 that if S is a generating set for a subspace W and
no proper subset of S is a generating set for W, then S must be linearly
independent. A linearly independent generating set for W possesses a very
useful property—every vector in W can be expressed in one and only one way
as a linear combination of the vectors in the set. (This property is proved
below in Theorem 1.8.) It is this property that makes linearly independent
generating sets the building blocks of vector spaces.
Definition. A basis β for a vector space V is a linearly independent subset of V that generates V. If β is a basis for V, we also say that the vectors of β form a basis for V.
Example 1
Recalling that span(∅) = {0} and ∅ is linearly independent, we see that ∅ is a basis for the zero vector space. •
Example 2
In Fn, let e1 = (1, 0, 0, ..., 0), e2 = (0, 1, 0, ..., 0), ..., en = (0, 0, ..., 0, 1); {e1, e2, ..., en} is readily seen to be a basis for Fn and is called the standard basis for Fn. •
Example 3
In Mmxn(F), let E^{ij} denote the matrix whose only nonzero entry is a 1 in the ith row and jth column. Then {E^{ij}: 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a basis for Mmxn(F). •
Example 4
In Pn(F) the set {1, x, x^2, ..., x^n} is a basis. We call this basis the standard basis for Pn(F). •
Example 5
In P(F) the set {1,x,x2,...} is a basis. •
Observe that Example 5 shows that a basis need not be finite. In fact,
later in this section it is shown that no basis for P(F) can be finite. Hence
not every vector space has a finite basis.
The next theorem, which is used frequently in Chapter 2, establishes the
most significant property of a basis.
Theorem 1.8. Let V be a vector space and β = {u1, u2, ..., un} be a subset of V. Then β is a basis for V if and only if each v ∈ V can be uniquely expressed as a linear combination of vectors of β, that is, can be expressed in the form
v = a1u1 + a2u2 + ... + anun
for unique scalars a1, a2, ..., an.
Proof. Let β be a basis for V. If v ∈ V, then v ∈ span(β) because span(β) = V. Thus v is a linear combination of the vectors of β. Suppose that
v = a1u1 + a2u2 + ... + anun   and   v = b1u1 + b2u2 + ... + bnun
are two such representations of v. Subtracting the second equation from the first gives
0 = (a1 - b1)u1 + (a2 - b2)u2 + ... + (an - bn)un.
Since β is linearly independent, it follows that a1 - b1 = a2 - b2 = ... = an - bn = 0. Hence a1 = b1, a2 = b2, ..., an = bn, and so v is uniquely expressible as a linear combination of the vectors of β.
The proof of the converse is an exercise.
Theorem 1.8 shows that if the vectors u1, u2, ..., un form a basis for a vector space V, then every vector in V can be uniquely expressed in the form
v = a1u1 + a2u2 + ... + anun
for appropriately chosen scalars a1, a2, ..., an. Thus v determines a unique n-tuple of scalars (a1, a2, ..., an) and, conversely, each n-tuple of scalars determines a unique vector v ∈ V by using the entries of the n-tuple as the coefficients of a linear combination of u1, u2, ..., un. This fact suggests that V is like the vector space Fn, where n is the number of vectors in the basis for V. We see in Section 2.4 that this is indeed the case.
In this book, we are primarily interested in vector spaces having finite
bases. Theorem 1.9 identifies a large class of vector spaces of this type.
Theorem 1.9. If a vector space V is generated by a finite set S, then some subset of S is a basis for V. Hence V has a finite basis.
Proof. If S = ∅ or S = {0}, then V = {0} and ∅ is a subset of S that is a basis for V. Otherwise S contains a nonzero vector u1. By item 2 on page 37, {u1} is a linearly independent set. Continue, if possible, choosing vectors u2, ..., uk in S such that {u1, u2, ..., uk} is linearly independent. Since S is a finite set, we must eventually reach a stage at which β = {u1, u2, ..., uk} is a linearly independent subset of S, but adjoining to β any vector in S not in β produces a linearly dependent set. We claim that β is a basis for V. Because β is linearly independent by construction, it suffices to show that β spans V. By Theorem 1.5 (p. 30) we need to show that S ⊆ span(β). Let v ∈ S. If v ∈ β, then clearly v ∈ span(β). Otherwise, if v is not in β, then the preceding construction shows that β ∪ {v} is linearly dependent. So v ∈ span(β) by Theorem 1.7 (p. 39). Thus S ⊆ span(β).
Because of the method by which the basis β was obtained in the proof of Theorem 1.9, this theorem is often remembered as saying that a finite spanning set for V can be reduced to a basis for V. This method is illustrated in the next example.
Example 6
Let
S = {(2, -3, 5), (8, -12, 20), (1, 0, -2), (0, 2, -1), (7, 2, 0)}.
It can be shown that S generates R3. We can select a basis for R3 that is a subset of S by the technique used in proving Theorem 1.9. To start, select any nonzero vector in S, say (2, -3, 5), to be a vector in the basis. Since 4(2, -3, 5) = (8, -12, 20), the set {(2, -3, 5), (8, -12, 20)} is linearly dependent by Exercise 9 of Section 1.5. Hence we do not include (8, -12, 20) in our basis. On the other hand, (1, 0, -2) is not a multiple of (2, -3, 5) and vice versa, so that the set {(2, -3, 5), (1, 0, -2)} is linearly independent. Thus we include (1, 0, -2) as part of our basis.
Now we consider the set {(2, -3, 5), (1, 0, -2), (0, 2, -1)} obtained by adjoining another vector in S to the two vectors that we have already included in our basis. As before, we include (0, 2, -1) in our basis or exclude it from the basis according to whether {(2, -3, 5), (1, 0, -2), (0, 2, -1)} is linearly independent or linearly dependent. An easy calculation shows that this set is linearly independent, and so we include (0, 2, -1) in our basis. In a similar fashion the final vector in S is included or excluded from our basis according to whether the set
{(2, -3, 5), (1, 0, -2), (0, 2, -1), (7, 2, 0)}
is linearly independent or linearly dependent. Because
2(2, -3, 5) + 3(1, 0, -2) + 4(0, 2, -1) - (7, 2, 0) = (0, 0, 0),
we exclude (7, 2, 0) from our basis. We conclude that
{(2, -3, 5), (1, 0, -2), (0, 2, -1)}
is a subset of S that is a basis for R3. •
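The selection process of Example 6 is easy to mechanize. The following sketch (Python with NumPy; one way to carry out the idea, with independence tested via matrix rank) keeps a vector only if it is linearly independent of the vectors already kept.

    import numpy as np

    S = [(2, -3, 5), (8, -12, 20), (1, 0, -2), (0, 2, -1), (7, 2, 0)]

    basis = []
    for v in S:
        candidate = np.array(basis + [v], dtype=float)
        # Keep v only if the rows of `candidate` are linearly independent.
        if np.linalg.matrix_rank(candidate) == len(candidate):
            basis.append(v)

    print(basis)   # [(2, -3, 5), (1, 0, -2), (0, 2, -1)]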
The corollaries of the following theorem are perhaps the most significant
results in Chapter 1.
Theorem 1.10 (Replacement Theorem). Let V be a vector space that is generated by a set G containing exactly n vectors, and let L be a linearly independent subset of V containing exactly m vectors. Then m ≤ n and there exists a subset H of G containing exactly n - m vectors such that L ∪ H generates V.
Proof. The proof is by mathematical induction on m. The induction begins with m = 0; for in this case L = ∅, and so taking H = G gives the desired result.
Now suppose that the theorem is true for some integer m ≥ 0. We prove that the theorem is true for m + 1. Let L = {v1, v2, ..., v_{m+1}} be a linearly independent subset of V consisting of m + 1 vectors. By the corollary to Theorem 1.6 (p. 39), {v1, v2, ..., vm} is linearly independent, and so we may apply the induction hypothesis to conclude that m ≤ n and that there is a subset {u1, u2, ..., u_{n-m}} of G such that {v1, v2, ..., vm} ∪ {u1, u2, ..., u_{n-m}} generates V. Thus there exist scalars a1, a2, ..., am, b1, b2, ..., b_{n-m} such that
a1v1 + a2v2 + ... + amvm + b1u1 + b2u2 + ... + b_{n-m}u_{n-m} = v_{m+1}.        (9)
Note that n - m > 0, lest v_{m+1} be a linear combination of v1, v2, ..., vm, which by Theorem 1.7 (p. 39) contradicts the assumption that L is linearly independent. Hence n > m; that is, n ≥ m + 1. Moreover, some bi, say b1, is nonzero, for otherwise we obtain the same contradiction. Solving (9) for u1 gives
u1 = (-b1^{-1}a1)v1 + (-b1^{-1}a2)v2 + ... + (-b1^{-1}am)vm + (b1^{-1})v_{m+1}
        + (-b1^{-1}b2)u2 + ... + (-b1^{-1}b_{n-m})u_{n-m}.
Let H = {u2, ..., u_{n-m}}. Then u1 ∈ span(L ∪ H), and because v1, v2, ..., vm, u2, ..., u_{n-m} are clearly in span(L ∪ H), it follows that
{v1, v2, ..., vm, u1, u2, ..., u_{n-m}} ⊆ span(L ∪ H).
Because {v1, v2, ..., vm, u1, u2, ..., u_{n-m}} generates V, Theorem 1.5 (p. 30) implies that span(L ∪ H) = V. Since H is a subset of G that contains (n - m) - 1 = n - (m + 1) vectors, the theorem is true for m + 1. This completes the induction.
Corollary 1. Let V be a vector space having a finite basis. Then every basis for V contains the same number of vectors.
Proof. Suppose that β is a finite basis for V that contains exactly n vectors, and let γ be any other basis for V. If γ contains more than n vectors, then we can select a subset S of γ containing exactly n + 1 vectors. Since S is linearly independent and β generates V, the replacement theorem implies that n + 1 ≤ n, a contradiction. Therefore γ is finite, and the number m of vectors in γ satisfies m ≤ n. Reversing the roles of β and γ and arguing as above, we obtain n ≤ m. Hence m = n.
If a vector space has a finite basis, Corollary 1 asserts that the number
of vectors in any basis for V is an intrinsic property of V. This fact makes
possible the following important definitions.
Definitions. A vector space is called finite-dimensional if it has a basis consisting of a finite number of vectors. The unique number of vectors
in each basis for V is called the dimension of V and is denoted by dim(V).
A vector space that is not finite-dimensional is called infinite-dimensional.
The following results are consequences of Examples 1 through 4.
Example 7
The vector space {0} has dimension zero. •
Example 8
The vector space Fn has dimension n. •
Example 9
The vector space Mmxn(F) has dimension mn. •
Example 10
The vector space Pn{F) has dimension n + 1. •
The following examples show that the dimension of a vector space depends
on its field of scalars.
Example 11
Over the field of complex numbers, the vector space of complex numbers has
dimension 1. (A basis is {1}.) •
Example 12
Over the field of real numbers, the vector space of complex numbers has
dimension 2. (A basis is {l,i}.) •
In the terminology of dimension, the first conclusion in the replacement
theorem states that if V is a finite-dimensional vector space, then no linearly
independent subset of V can contain more than dim(V) vectors. From this
fact it follows that the vector space P(F) is infinite-dimensional because it
has an infinite linearly independent set, namely {l,x, x2,...}. This set is,
in fact, a basis for P(F). Yet nothing that we have proved in this section
guarantees an infinite-dimensional vector space must have a basis. In Section
1.7 it is shown, however, that every vector space has a basis.
Just as no linearly independent subset of a finite-dimensional vector space
V can contain more than dim(V) vectors, a corresponding statement can be
made about the size of a generating set.
Corollary 2. Let V be a vector space with dimension n.
(a) Any finite generating set for V contains at least n vectors, and a gener­
ating set for V that contains exactly n vectors is a basis for V.
(b) Any linearly independent subset of V that contains exactly n vectors is a basis for V.
(c) Every linearly independent subset of V can be extended to a basis for V.
Proof. Let β be a basis for V.
(a) Let G be a finite generating set for V. By Theorem 1.9 some subset H of G is a basis for V. Corollary 1 implies that H contains exactly n vectors. Since a subset of G contains n vectors, G must contain at least n vectors. Moreover, if G contains exactly n vectors, then we must have H = G, so that G is a basis for V.
(b) Let L be a linearly independent subset of V containing exactly n vectors. It follows from the replacement theorem that there is a subset H of β containing n - n = 0 vectors such that L ∪ H generates V. Thus H = ∅, and L generates V. Since L is also linearly independent, L is a basis for V.
(c) If L is a linearly independent subset of V containing m vectors, then the replacement theorem asserts that there is a subset H of β containing exactly n - m vectors such that L ∪ H generates V. Now L ∪ H contains at most n vectors; therefore (a) implies that L ∪ H contains exactly n vectors and that L ∪ H is a basis for V.
Example 13
It follows from Example 4 of Section 1.4 and (a) of Corollary 2 that
{x^2 + 3x - 2, 2x^2 + 5x - 3, -x^2 - 4x + 4}
is a basis for P2(R). •
Example 14
It follows from Example 5 of Section 1.4 and (a) of Corollary 2 that
{[1 1; 1 0], [1 1; 0 1], [1 0; 1 1], [0 1; 1 1]}
is a basis for M2x2(R). •
Example 15
It follows from Example 3 of Section 1.5 and (b) of Corollary 2 that
{(1,0,0, -1), (0,1,0, -1), (0,0,1, -1), (0,0,0,1)}
is a basis for R4. •

Example 16
For k = 0, 1, ..., n, let pk(x) = x^k + x^{k+1} + ... + x^n. It follows from Example 4 of Section 1.5 and (b) of Corollary 2 that
{p0(x), p1(x), ..., pn(x)}
is a basis for Pn(F). •
A procedure for reducing a generating set to a basis was illustrated in
Example 6. In Section 3.4, when we have learned more about solving systems
of linear equations, we will discover a simpler method for reducing a gener­
ating set to a basis. This procedure also can be used to extend a linearly
independent set to a basis, as (c) of Corollary 2 asserts is possible.
An Overview of Dimension and Its Consequences
Theorem 1.9 as well as the replacement theorem and its corollaries contain
a wealth of information about the relationships among linearly independent
sets, bases, and generating sets. For this reason, we summarize here the main
results of this section in order to put them into better perspective.
A basis for a vector space V is a linearly independent subset of V that
generates V. If V has a finite basis, then every basis for V contains the same
number of vectors. This number is called the dimension of V, and V is said
to be finite-dimensional. Thus if the dimension of V is n, every basis for V
contains exactly n vectors. Moreover, every linearly independent subset of
V contains no more than n vectors and can be extended to a basis for V
by including appropriately chosen vectors. Also, each generating set for V
contains at least n vectors and can be reduced to a basis for V by excluding
appropriately chosen vectors. The Venn diagram in Figure 1.6 depicts these
relationships.
Figure 1.6
The Dimension of Subspaces
Our next result relates the dimension of a subspace to the dimension of
the vector space that contains it.
Theorem 1.11. Let W be a subspace of a finite-dimensional vector space V. Then W is finite-dimensional and dim(W) ≤ dim(V). Moreover, if dim(W) = dim(V), then V = W.
Proof. Let dim(V) = n. If W = {0}, then W is finite-dimensional and dim(W) = 0 ≤ n. Otherwise, W contains a nonzero vector x1; so {x1} is a linearly independent set. Continue choosing vectors x1, x2, ..., xk in W such that {x1, x2, ..., xk} is linearly independent. Since no linearly independent subset of V can contain more than n vectors, this process must stop at a stage where k ≤ n and {x1, x2, ..., xk} is linearly independent but adjoining any other vector from W produces a linearly dependent set. Theorem 1.7 (p. 39) implies that {x1, x2, ..., xk} generates W, and hence it is a basis for W. Therefore dim(W) = k ≤ n.
If dim(W) = n, then a basis for W is a linearly independent subset of V containing n vectors. But Corollary 2 of the replacement theorem implies that this basis for W is also a basis for V; so W = V.
Example 17
Let
W = {(a1, a2, a3, a4, a5) ∈ F5: a1 + a3 + a5 = 0, a2 = a4}.
It is easily shown that W is a subspace of F5 having
{(-1, 0, 1, 0, 0), (-1, 0, 0, 0, 1), (0, 1, 0, 1, 0)}
as a basis. Thus dim(W) = 3. •
Example 18
The set of diagonal n x n matrices is a subspace W of Mnxn(F) (see Example 3
of Section 1.3). A basis for W is
{E^{11}, E^{22}, ..., E^{nn}},
where E^{ij} is the matrix in which the only nonzero entry is a 1 in the ith row and jth column. Thus dim(W) = n. •
Example 19
We saw in Section 1.3 that the set of symmetric n x n matrices is a subspace W of Mnxn(F). A basis for W is
{A^{ij} : 1 ≤ i ≤ j ≤ n},
where A^{ij} is the n x n matrix having 1 in the ith row and jth column, 1 in the jth row and ith column, and 0 elsewhere. It follows that
dim(W) = n + (n - 1) + ... + 1 = n(n + 1)/2. •
Corollary. If W is a subspace of a finite-dimensional vector space V, then any basis for W can be extended to a basis for V.
Proof. Let S be a basis for W. Because S is a linearly independent subset of V, Corollary 2 of the replacement theorem guarantees that S can be extended to a basis for V.
Example 20
The set of all polynomials of the form
a18 x^18 + a16 x^16 + ... + a2 x^2 + a0,
where a18, a16, ..., a2, a0 ∈ F, is a subspace W of P18(F). A basis for W is {1, x^2, ..., x^16, x^18}, which is a subset of the standard basis for P18(F). •
We can apply Theorem 1.11 to determine the subspaces of R2 and R3. Since R2 has dimension 2, subspaces of R2 can be of dimensions 0, 1, or 2 only. The only subspaces of dimension 0 or 2 are {0} and R2, respectively. Any subspace of R2 having dimension 1 consists of all scalar multiples of some nonzero vector in R2 (Exercise 11 of Section 1.4).
If a point of R2 is identified in the natural way with a point in the Euclidean plane, then it is possible to describe the subspaces of R2 geometrically: A subspace of R2 having dimension 0 consists of the origin of the Euclidean plane, a subspace of R2 with dimension 1 consists of a line through the origin, and a subspace of R2 having dimension 2 is the entire Euclidean plane.
Similarly, the subspaces of R3 must have dimensions 0, 1, 2, or 3. Interpreting these possibilities geometrically, we see that a subspace of dimension zero must be the origin of Euclidean 3-space, a subspace of dimension 1 is a line through the origin, a subspace of dimension 2 is a plane through the origin, and a subspace of dimension 3 is Euclidean 3-space itself.
The Lagrange Interpolation Formula
Corollary 2 of the replacement theorem can be applied to obtain a useful
formula. Let c0, c1, ..., cn be distinct scalars in an infinite field F. The
polynomials f0(x), f1(x), ..., fn(x) defined by
fi(x) = [(x - c0) ··· (x - c_{i-1})(x - c_{i+1}) ··· (x - cn)] / [(ci - c0) ··· (ci - c_{i-1})(ci - c_{i+1}) ··· (ci - cn)]
      = Π_{k=0, k≠i}^{n} (x - ck)/(ci - ck)
are called the Lagrange polynomials (associated with c0, c1, ..., cn). Note
that each fi(x) is a polynomial of degree n and hence is in Pn(F). By
regarding fi(x) as a polynomial function fi: F → F, we see that
fi(cj) = 0 if i ≠ j  and  fi(cj) = 1 if i = j.   (10)
This property of Lagrange polynomials can be used to show that β =
{f0, f1, ..., fn} is a linearly independent subset of Pn(F). Suppose that
Σ_{i=0}^{n} ai fi = 0 for some scalars a0, a1, ..., an,
where 0 denotes the zero function. Then
Σ_{i=0}^{n} ai fi(cj) = 0 for j = 0, 1, ..., n.
But also
Σ_{i=0}^{n} ai fi(cj) = aj
by (10). Hence aj = 0 for j = 0, 1, ..., n; so β is linearly independent. Since
the dimension of Pn(F) is n + 1, it follows from Corollary 2 of the replacement
theorem that β is a basis for Pn(F).
Because β is a basis for Pn(F), every polynomial function g in Pn(F) is a
linear combination of polynomial functions of β, say,
g = Σ_{i=0}^{n} bi fi.
It follows that
g(cj) = Σ_{i=0}^{n} bi fi(cj) = bj;
so
g = Σ_{i=0}^{n} g(ci) fi
is the unique representation of g as a linear combination of elements of β.
This representation is called the Lagrange interpolation formula. Notice
that the preceding argument shows that if b0, b1, ..., bn are any n + 1 scalars
in F (not necessarily distinct), then the polynomial function
g = Σ_{i=0}^{n} bi fi
is the unique polynomial in Pn(F) such that g(cj) = bj. Thus we have found
the unique polynomial of degree not exceeding n that has the specified value
bj at the given point cj in its domain (j = 0, 1, ..., n). For example, let us
construct the real polynomial g of degree at most 2 whose graph contains the
points (1, 8), (2, 5), and (3, -4). (Thus, in the notation above, c0 = 1, c1 = 2,
c2 = 3, b0 = 8, b1 = 5, and b2 = -4.) The Lagrange polynomials associated
with c0, c1, and c2 are
f0(x) = (x - 2)(x - 3) / ((1 - 2)(1 - 3)) = (1/2)(x2 - 5x + 6),
f1(x) = (x - 1)(x - 3) / ((2 - 1)(2 - 3)) = -(x2 - 4x + 3),
and
f2(x) = (x - 1)(x - 2) / ((3 - 1)(3 - 2)) = (1/2)(x2 - 3x + 2).
Hence the desired polynomial is
g(x) = Σ_{i=0}^{2} bi fi(x) = 8f0(x) + 5f1(x) - 4f2(x)
= 4(x2 - 5x + 6) - 5(x2 - 4x + 3) - 2(x2 - 3x + 2)
= -3x2 + 6x + 5.
An important consequence of the Lagrange interpolation formula is the following result: If f ∈ Pn(F) and f(ci) = 0 for n + 1 distinct scalars c0, c1, ..., cn
in F, then f is the zero function.
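The worked example above can be reproduced with a short Python sketch of the Lagrange interpolation formula (plain Python, treating F as R; the function names are illustrative).

    def lagrange(c, i, x):
        # The i-th Lagrange polynomial f_i evaluated at x, for nodes c_0, ..., c_n.
        value = 1.0
        for k in range(len(c)):
            if k != i:
                value *= (x - c[k]) / (c[i] - c[k])
        return value

    c = [1, 2, 3]        # c_0, c_1, c_2
    b = [8, 5, -4]       # prescribed values b_0, b_1, b_2

    def g(x):
        # g = sum over i of b_i * f_i, the Lagrange interpolation formula.
        return sum(b[i] * lagrange(c, i, x) for i in range(len(c)))

    print([g(x) for x in c])   # [8.0, 5.0, -4.0]
    print(g(0))                # 5.0, agreeing with -3x^2 + 6x + 5 at x = 0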
EXERCISES
1. Label the following statements as true or false.
(a) The zero vector space has no basis.
(b) Every vector space that is generated by a finite set has a basis.
(c) Every vector space has a finite basis.
(d) A vector space cannot have more than one basis.

(e) If a vector space has a finite basis, then the number of vectors in
every basis is the same.
(f) The dimension of Pn(F) is n.
(g) The dimension of Mmxn(F) is m + n.
(h) Suppose that V is a finite-dimensional vector space, that S1 is a
linearly independent subset of V, and that S2 is a subset of V that
generates V. Then S1 cannot contain more vectors than S2.
(i) If S generates the vector space V, then every vector in V can be
written as a linear combination of vectors in S in only one way.
(j) Every subspace of a finite-dimensional space is finite-dimensional.
(k) If V is a vector space having dimension n, then V has exactly one
subspace with dimension 0 and exactly one subspace with dimension n.
(l) If V is a vector space having dimension n, and if S is a subset of
V with n vectors, then S is linearly independent if and only if S
spans V.
2. Determine which of the following sets are bases for R3.
(a) {(1,0,-1), (2,5,1), (0,-4,3)}
(b) {(2,-4,1), (0,3,-1), (6,0,-1)}
(c) {(1,2,-1),(1,0,2),(2,1,1)}
(d) {(-1,3,1), (2, -4, -3), (-3,8,2)}
(e) {(1, -3, -2), (-3,1,3), (-2, -10, -2)}
3. Determine which of the following sets are bases for P2(R).
(a) {-1 - x + 2x2, 2 + x - 2x2, 1 - 2x + 4x2}
(b) {1 + 2x + x2, 3 + x2, x + x2}
(c) {1 - 2x - 2x2, -2 + 3x - x2, 1 - x + 6x2}
(d) {-1 + 2x + 4x2, 3 - 4x - 10x2, -2 - 5x - 6x2}
(e) {1 + 2x - x2, 4 - 2x + x2, -1 + 18x - 9x2}
4. Do the polynomials x3 - 2x2 + 1, 4x2 - x + 3, and 3x - 2 generate P3(R)?
Justify your answer.
5. Is {(1,4, -6), (1,5,8), (2,1,1), (0,1,0)} a linearly independent subset of
R3? Justify your answer.
6. Give three different bases for F2 and for M2X2(F).
7. The vectors u1 = (2, -3, 1), u2 = (1, 4, -2), u3 = (-8, 12, -4), u4 =
(1, 37, -17), and u5 = (-3, -5, 8) generate R3. Find a subset of the set
{u1, u2, u3, u4, u5} that is a basis for R3.

8. Let W denote the subspace of R5 consisting of all the vectors having
coordinates that sum to zero. The vectors
u1 = (2, -3, 4, -5, 2),   u2 = (-6, 9, -12, 15, -6),
u3 = (3, -2, 7, -9, 1),   u4 = (2, -8, 2, -2, 6),
u5 = (-1, 1, 2, 1, -3),   u6 = (0, -3, -18, 9, 12),
u7 = (1, 0, -2, 3, -2),   u8 = (2, -1, 1, -9, 7)
generate W. Find a subset of the set {u1, u2, ..., u8} that is a basis for
W.
9. The vectors u1 = (1, 1, 1, 1), u2 = (0, 1, 1, 1), u3 = (0, 0, 1, 1), and
u4 = (0, 0, 0, 1) form a basis for F4. Find the unique representation
of an arbitrary vector (a1, a2, a3, a4) in F4 as a linear combination of
u1, u2, u3, and u4.
10. In each part, use the Lagrange interpolation formula to construct the
polynomial of smallest degree whose graph contains the following points.
(a) (-2,-6), (-1,5), (1,3)
(b) (-4,24), (1,9), (3,3)
(c) (-2,3), (-1,-6), (1,0), (3,-2)
(d) (-3,-30), (-2,7), (0,15), (1,10)
11. Let u and v be distinct vectors of a vector space V. Show that if {u, v}
is a basis for V and a and b are nonzero scalars, then both {u + v, au}
and {au, bv} are also bases for V.
12. Let u, v, and w be distinct vectors of a vector space V. Show that if
{u, v, w} is a basis for V, then {u + v + w, v + w, w} is also a basis for V.
13. The set of solutions to the system of linear equations
x1 - 2x2 + x3 = 0
2x1 - 3x2 + x3 = 0
is a subspace of R3. Find a basis for this subspace.
14. Find bases for the following subspaces of F5:
W1 = {(a1, a2, a3, a4, a5) ∈ F5 : a1 - a3 - a4 = 0}
and
W2 = {(a1, a2, a3, a4, a5) ∈ F5 : a2 = a3 = a4 and a1 + a5 = 0}.
What are the dimensions of W1 and W2?

15. The set of all n x n matrices having trace equal to zero is a subspace W
of Mnxn(F) (see Example 4 of Section 1.3). Find a basis for W. What
is the dimension of W?
16. The set of all upper triangular n x n matrices is a subspace W of
Mnxn(F) (see Exercise 12 of Section 1.3). Find a basis for W. What is
the dimension of W?
17. The set of all skew-symmetric n x n matrices is a subspace W of
Mnxn(F) (see Exercise 28 of Section 1.3). Find a basis for W. What is
the dimension of W?
18. Find a basis for the vector space in Example 5 of Section 1.2. Justify
your answer.
19. Complete the proof of Theorem 1.8.
20.* Let V be a vector space having dimension n, and let S be a subset of V
that generates V.
(a) Prove that there is a subset of S that is a basis for V. (Be careful
not to assume that S is finite.)
(b) Prove that S contains at least n vectors.
21. Prove that a vector space is infinite-dimensional if and only if it contains
an infinite linearly independent subset.
22. Let W1 and W2 be subspaces of a finite-dimensional vector space V.
Determine necessary and sufficient conditions on W1 and W2 so that
dim(W1 ∩ W2) = dim(W1).
23. Let v1, v2, ..., vk, v be vectors in a vector space V, and define W1 =
span({v1, v2, ..., vk}) and W2 = span({v1, v2, ..., vk, v}).
(a) Find necessary and sufficient conditions on v such that dim(W1) =
dim(W2).
(b) State and prove a relationship involving dim(W1) and dim(W2) in
the case that dim(W1) ≠ dim(W2).
24. Let f(x) be a polynomial of degree n in Pn(R). Prove that for any
g(x) ∈ Pn(R) there exist scalars c0, c1, ..., cn such that
g(x) = c0 f(x) + c1 f'(x) + c2 f''(x) + ··· + cn f^(n)(x),
where f^(n)(x) denotes the nth derivative of f(x).
25. Let V, W, and Z be as in Exercise 21 of Section 1.2. If V and W are
vector spaces over F of dimensions m and n, determine the dimension
of Z.

26. For a fixed a ∈ R, determine the dimension of the subspace of Pn(R)
defined by {f ∈ Pn(R) : f(a) = 0}.
27. Let W1 and W2 be the subspaces of P(F) defined in Exercise 25 in
Section 1.3. Determine the dimensions of the subspaces W1 ∩ Pn(F)
and W2 ∩ Pn(F).
28. Let V be a finite-dimensional vector space over C with dimension n.
Prove that if V is now regarded as a vector space over R, then dim V =
2n. (See Examples 11 and 12.)
Exercises 29-34 require knowledge of the sum and direct sum of subspaces,
as defined in the exercises of Section 1.3.
29. (a) Prove that if W1 and W2 are finite-dimensional subspaces of a
vector space V, then the subspace W1 + W2 is finite-dimensional,
and dim(W1 + W2) = dim(W1) + dim(W2) - dim(W1 ∩ W2). Hint:
Start with a basis {u1, u2, ..., uk} for W1 ∩ W2 and extend this
set to a basis {u1, u2, ..., uk, v1, v2, ..., vm} for W1 and to a basis
{u1, u2, ..., uk, w1, w2, ..., wp} for W2.
(b) Let W1 and W2 be finite-dimensional subspaces of a vector space
V, and let V = W1 + W2. Deduce that V is the direct sum of W1
and W2 if and only if dim(V) = dim(W1) + dim(W2).
30. Let
V = M2x2(F),   W1 = { ( a  b
                        c  a ) ∈ V : a, b, c ∈ F },
and
W2 = { ( 0   a
         -a  b ) ∈ V : a, b ∈ F }.
Prove that W1 and W2 are subspaces of V, and find the dimensions of
W1, W2, W1 + W2, and W1 ∩ W2.
31. Let W1 and W2 be subspaces of a vector space V having dimensions m
and n, respectively, where m ≥ n.
(a) Prove that dim(W1 ∩ W2) ≤ n.
(b) Prove that dim(W1 + W2) ≤ m + n.
32. (a) Find an example of subspaces W1 and W2 of R3 with dimensions
m and n, where m > n > 0, such that dim(W1 ∩ W2) = n.
(b) Find an example of subspaces W1 and W2 of R3 with dimensions
m and n, where m > n > 0, such that dim(W1 + W2) = m + n.
(c) Find an example of subspaces W1 and W2 of R3 with dimensions
m and n, where m ≥ n, such that both dim(W1 ∩ W2) < n and
dim(W1 + W2) < m + n.
33. (a) Let W1 and W2 be subspaces of a vector space V such that V =
W1 ⊕ W2. If β1 and β2 are bases for W1 and W2, respectively, show
that β1 ∩ β2 = ∅ and β1 ∪ β2 is a basis for V.
(b) Conversely, let β1 and β2 be disjoint bases for subspaces W1 and
W2, respectively, of a vector space V. Prove that if β1 ∪ β2 is a
basis for V, then V = W1 ⊕ W2.
34. (a) Prove that if W1 is any subspace of a finite-dimensional vector
space V, then there exists a subspace W2 of V such that V =
W1 ⊕ W2.
(b) Let V = R2 and W1 = {(a1, 0) : a1 ∈ R}. Give examples of two
different subspaces W2 and W2' such that V = W1 ⊕ W2 and V =
W1 ⊕ W2'.
The following exercise requires familiarity with Exercise 31 of Section 1.3.
35. Let W be a subspace of a finite-dimensional vector space V, and consider
the basis {u1, u2, ..., uk} for W. Let {u1, u2, ..., uk, uk+1, ..., un} be
an extension of this basis to a basis for V.
(a) Prove that {uk+1 + W, uk+2 + W, ..., un + W} is a basis for V/W.
(b) Derive a formula relating dim(V), dim(W), and dim(V/W).
1.7* MAXIMAL LINEARLY INDEPENDENT SUBSETS
In this section, several significant results from Section 1.6 are extended to
infinite-dimensional vector spaces. Our principal goal here is to prove that
every vector space has a basis. This result is important in the study of
infinite-dimensional vector spaces because it is often difficult to construct an
explicit basis for such a space. Consider, for example, the vector space of
real numbers over the field of rational numbers. There is no obvious way to
construct a basis for this space, and yet it follows from the results of this
section that such a basis does exist.
The difficulty that arises in extending the theorems of the preceding sec­
tion to infinite-dimensional vector spaces is that the principle of mathematical
induction, which played a crucial role in many of the proofs of Section 1.6,
is no longer adequate. Instead, a more general result called the maximal
principle is needed. Before stating this principle, we need to introduce some
terminology.
Definition. Let ℱ be a family of sets. A member M of ℱ is called
maximal (with respect to set inclusion) if M is contained in no member of
ℱ other than M itself.

Example 1
Let ℱ be the family of all subsets of a nonempty set S. (This family ℱ is
called the power set of S.) The set S is easily seen to be a maximal element
of ℱ. •
Example 2
Let S and T be disjoint nonempty sets, and let ℱ be the union of their power
sets. Then S and T are both maximal elements of ℱ. •
Example 3
Let ℱ be the family of all finite subsets of an infinite set S. Then ℱ has no
maximal element. For if M is any member of ℱ and s is any element of S
that is not in M, then M ∪ {s} is a member of ℱ that contains M as a proper
subset. •
Definition. A collection of sets C is called a chain (or nest or tower)
if for each pair of sets A and B in C, either A ⊆ B or B ⊆ A.
Example 4
For each positive integer n let An = {1, 2, ..., n}. Then the collection of
sets C = {An : n = 1, 2, 3, ...} is a chain. In fact, Am ⊆ An if and only if
m ≤ n. •
With this terminology we can now state the maximal principle.
Maximal Principle.4 Let ℱ be a family of sets. If, for each chain C ⊆ ℱ,
there exists a member of ℱ that contains each member of C, then ℱ contains
a maximal member.
Because the maximal principle guarantees the existence of maximal el­
ements in a family of sets satisfying the hypothesis above, it is useful to
reformulate the definition of a basis in terms of a maximal property. In The­
orem 1.12, we show that this is possible; in fact, the concept defined next is
equivalent to a basis.
Definition. Let S be a subset of a vector space V. A maximal linearly
independent subset of S is a subset B of S satisfying both of the following
conditions.
(a) B is linearly independent.
(b) The only linearly independent subset of S that contains B is B itself.
4The Maximal Principle is logically equivalent to the Axiom of Choice, which
is an assumption in most axiomatic developments of set theory. For a treatment
of set theory using the Maximal Principle, see John L. Kelley, General Topology,
Graduate Texts in Mathematics Series, Vol. 27, Springer-Verlag, 1991.

Example 5
Example 2 of Section 1.4 shows that
{x3 - 2x2 - 5x - 3, 3x3 - 5x2 - 4x - 9}
is a maximal linearly independent subset of
S = {2x3 - 2x2 + 12x - 6, x3 - 2x2 - 5x - 3, 3x3 - 5x2 - 4x - 9}
in P3(R). In this case, however, any subset of S consisting of two polynomials
is easily shown to be a maximal linearly independent subset of S. Thus
maximal linearly independent subsets of a set need not be unique. •
A basis β for a vector space V is a maximal linearly independent subset
of V, because
1. β is linearly independent by definition.
2. If v ∈ V and v ∉ β, then β ∪ {v} is linearly dependent by Theorem 1.7
(p. 39) because span(β) = V.
Our next result shows that the converse of this statement is also true.
Theorem 1.12. Let V be a vector space and S a subset that generates
V. If β is a maximal linearly independent subset of S, then β is a basis for V.
Proof. Let β be a maximal linearly independent subset of S. Because β
is linearly independent, it suffices to prove that β generates V. We claim
that S ⊆ span(β), for otherwise there exists a v ∈ S such that v ∉ span(β).
Since Theorem 1.7 (p. 39) implies that β ∪ {v} is linearly independent, we
have contradicted the maximality of β. Therefore S ⊆ span(β). Because
span(S) = V, it follows from Theorem 1.5 (p. 30) that span(β) = V.
Thus a subset of a vector space is a basis if and only if it is a maximal
linearly independent subset of the vector space. Therefore we can accomplish
our goal of proving that every vector space has a basis by showing that every
vector space contains a maximal linearly independent subset. This result
follows immediately from the next theorem.
Theorem 1.13. Let S be a linearly independent subset of a vector space
V. There exists a maximal linearly independent subset of V that contains S.
Proof. Let ℱ denote the family of all linearly independent subsets of V
that contain S. In order to show that ℱ contains a maximal element, we must
show that if C is a chain in ℱ, then there exists a member U of ℱ that contains
each member of C. We claim that U, the union of the members of C, is the
desired set. Clearly U contains each member of C, and so it suffices to prove

that U ∈ ℱ (i.e., that U is a linearly independent subset of V that contains S).
Because each member of C is a subset of V containing S, we have S ⊆ U ⊆ V.
Thus we need only prove that U is linearly independent. Let u1, u2, ..., un
be in U and a1, a2, ..., an be scalars such that a1u1 + a2u2 + ··· + anun = 0.
Because ui ∈ U for i = 1, 2, ..., n, there exists a set Ai in C such that ui ∈ Ai.
But since C is a chain, one of these sets, say Ak, contains all the others. Thus
ui ∈ Ak for i = 1, 2, ..., n. However, Ak is a linearly independent set; so
a1u1 + a2u2 + ··· + anun = 0 implies that a1 = a2 = ··· = an = 0. It follows
that U is linearly independent.
The maximal principle implies that ℱ has a maximal element. This element is easily seen to be a maximal linearly independent subset of V that
contains S.
Corollary. Every vector space has a basis.
It can be shown, analogously to Corollary 1 of the replacement theorem
(p. 46), that every basis for an infinite-dimensional vector space has the same
cardinality. (Sets have the same cardinality if there is a one-to-one and onto
mapping between them.) (See, for example, N. Jacobson, Lectures in Ab­
stract Algebra, vol. 2, Linear Algebra. D. Van Nostrand Company, New
York, 1953. p. 240.)
Exercises 4-7 extend other results from Section 1.6 to infinite-dimensional
vector spaces.
EXERCISES
1. Label the following statements as true or false.
(a) Every family of sets contains a maximal element.
(b) Every chain contains a maximal element.
(c) If a family of sets has a maximal element, then that maximal
element is unique.
(d) If a chain of sets has a maximal element, then that maximal ele-
ment is unique.
(e) A basis for a vector space is a maximal linearly independent subset
of that vector space.
(f) A maximal linearly independent subset of a vector space is a basis
for that vector space.
2. Show that the set of convergent sequences is an infinite-dimensional
subspace of the vector space of all sequences of real numbers. (See
Exercise 21 in Section 1.3.)
3. Let V be the set of real numbers regarded as a vector space over the
field of rational numbers. Prove that V is infinite-dimensional. Hint:

Use the fact that π is transcendental, that is, it is not a zero of any
polynomial with rational coefficients.
4. Let W be a subspace of a (not necessarily finite-dimensional) vector
space V. Prove that any basis for W is a subset of a basis for V.
5. Prove the following infinite-dimensional version of Theorem 1.8 (p. 43):
Let β be a subset of an infinite-dimensional vector space V. Then β is a
basis for V if and only if for each nonzero vector v in V, there exist unique
vectors u1, u2, ..., un in β and unique nonzero scalars c1, c2, ..., cn such
that v = c1u1 + c2u2 + ··· + cnun.
6. Prove the following generalization of Theorem 1.9 (p. 44): Let S1 and
S2 be subsets of a vector space V such that S1 ⊆ S2. If S1 is linearly
independent and S2 generates V, then there exists a basis β for V such
that S1 ⊆ β ⊆ S2. Hint: Apply the maximal principle to the family of
all linearly independent subsets of S2 that contain S1, and proceed as
in the proof of Theorem 1.13.
7. Prove the following generalization of the replacement theorem. Let β
be a basis for a vector space V, and let S be a linearly independent
subset of V. There exists a subset S1 of β such that S ∪ S1 is a basis
for V.
INDEX OF DEFINITIONS FOR CHAPTER 1
Additive inverse 12
Basis 43
Cancellation law 11
Column vector 8
Chain 59
Degree of a polynomial 9
Diagonal entries of a matrix
Diagonal matrix 18
Dimension 47
Finite-dimensional space 46
Generates 30
Infinite-dimensional space 47
Lagrange interpolation formula 52
Lagrange polynomials 52
Linear combination 24
Linearly dependent 36
Linearly independent 37
Matrix 8
Maximal element of a family of sets 58
Maximal linearly independent subset 59
n-tuple 7
Polynomial 9
Row vector 8
Scalar 7
Scalar multiplication 6
Sequence 11
Span of a subset 30
Spans 30
Square matrix 9
Standard basis for Fn 43
Standard basis for Pn(F) 43
Subspace 16
Subspace generated by the elements
of a set 30
Symmetric matrix 17
Trace 18
Transpose 17
Trivial representation of 0 36

Vector 7
Vector addition 6
Vector space 6
Zero matrix 8
Zero polynomial 9
Zero subspace 16
Zero vector 12
Zero vector space 15

2
Linear Transformations and Matrices
2.1 Linear Transformations, Null Spaces, and Ranges
2.2 The Matrix Representation of a Linear Transformation
2.3 Composition of Linear Transformations and Matrix Multiplication
2.4 Invertibility and Isomorphisms
2.5 The Change of Coordinate Matrix
2.6* Dual Spaces
2.7* Homogeneous Linear Differential Equations with Constant Coefficients
In Chapter 1, we developed the theory of abstract vector spaces in consid-
erable detail. It is now natural to consider those functions defined on vector
spaces that in some sense "preserve" the structure. These special functions
are called linear transformations, and they abound in both pure and applied
mathematics. In calculus, the operations of differentiation and integration
provide us with two of the most important examples of linear transforma­
tions (see Examples 6 and 7 of Section 2.1). These two examples allow us
to reformulate many of the problems in differential and integral equations in
terms of linear transformations on particular vector spaces (see Sections 2.7
and 5.2).
In geometry, rotations, reflections, and projections (see Examples 2, 3,
and 4 of Section 2.1) provide us with another class of linear transformations.
Later we use these transformations to study rigid motions in Rn (Section
6.10).
In the remaining chapters, we see further examples of linear transforma­
tions occurring in both the physical and the social sciences. Throughout this
chapter, we assume that all vector spaces are over a common field F.
2.1 LINEAR TRANSFORMATIONS, NULL SPACES, AND RANGES
In this section, we consider a number of examples of linear transformations.
Many of these transformations are studied in more detail in later sections.
Recall that a function T with domain V and codomain W is denoted by
T: V → W. (See Appendix B.)
Definition. Let V and W be vector spaces (over F). We call a function
T: V → W a linear transformation from V to W if, for all x, y ∈ V and
c ∈ F, we have
(a) T(x + y) = T(x) + T(y) and
(b) T(cx) = cT(x).
If the underlying field F is the field of rational numbers, then (a) implies
(b) (see Exercise 37), but, in general (a) and (b) are logically independent.
See Exercises 38 and 39.
We often simply call T linear. The reader should verify the following
properties of a function T: V —» W. (See Exercise 7.)
1. If T is linear, then T(0) = 0.
2. T is linear if and only if T(cx + y) = cT(x) + T(y) for all x, y ∈ V and
c ∈ F.
3. If T is linear, then T(x - y) = T(x) - T(y) for all x, y ∈ V.
4. T is linear if and only if, for x1, x2, ..., xn ∈ V and a1, a2, ..., an ∈ F,
we have
T(a1x1 + a2x2 + ··· + anxn) = a1T(x1) + a2T(x2) + ··· + anT(xn).
We generally use property 2 to prove that a given transformation is linear.
Example 1
Define
T: R2 → R2 by T(a1, a2) = (2a1 + a2, a1).
To show that T is linear, let c ∈ R and x, y ∈ R2, where x = (b1, b2) and
y = (d1, d2). Since
cx + y = (cb1 + d1, cb2 + d2),
we have
T(cx + y) = (2(cb1 + d1) + cb2 + d2, cb1 + d1).
Also
cT(x) + T(y) = c(2b1 + b2, b1) + (2d1 + d2, d1)
= (2cb1 + cb2 + 2d1 + d2, cb1 + d1)
= (2(cb1 + d1) + cb2 + d2, cb1 + d1).
So T is linear.

Figure 2.1   (a) Rotation   (b) Reflection   (c) Projection
As we will see in Chapter 6, the applications of linear algebra to geometry
are wide and varied. The main reason for this is that most of the important
geometrical transformations are linear. Three particular transformations that
we now consider are rotation, reflection, and projection. We leave the proofs
of linearity to the reader.
Example 2
For any angle θ, define Tθ: R2 → R2 by the rule: Tθ(a1, a2) is the vector
obtained by rotating (a1, a2) counterclockwise by θ if (a1, a2) ≠ (0, 0), and
Tθ(0, 0) = (0, 0). Then Tθ: R2 → R2 is a linear transformation that is called
the rotation by θ.
We determine an explicit formula for Tθ. Fix a nonzero vector (a1, a2) ∈
R2. Let α be the angle that (a1, a2) makes with the positive x-axis (see
Figure 2.1(a)), and let r = √(a1² + a2²). Then a1 = r cos α and a2 = r sin α.
Also, Tθ(a1, a2) has length r and makes an angle α + θ with the positive
x-axis. It follows that
Tθ(a1, a2) = (r cos(α + θ), r sin(α + θ))
= (r cos α cos θ - r sin α sin θ, r cos α sin θ + r sin α cos θ)
= (a1 cos θ - a2 sin θ, a1 sin θ + a2 cos θ).
Finally, observe that this same formula is valid for (a1, a2) = (0, 0).
It is now easy to show, as in Example 1, that Tθ is linear. •
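A small numerical sketch of the rotation formula (Python with numpy; the function name is illustrative) applies Tθ and spot-checks the linearity property.

    import numpy as np

    def rotate(theta, v):
        # T_theta(a1, a2) = (a1 cos(theta) - a2 sin(theta), a1 sin(theta) + a2 cos(theta))
        a1, a2 = v
        return np.array([a1 * np.cos(theta) - a2 * np.sin(theta),
                         a1 * np.sin(theta) + a2 * np.cos(theta)])

    # Rotating (1, 0) by 90 degrees gives (0, 1), up to rounding error.
    print(np.round(rotate(np.pi / 2, (1.0, 0.0)), 10))

    # Spot-check linearity: T(cx + y) = cT(x) + T(y).
    c, x, y = 2.0, np.array([1.0, 3.0]), np.array([-2.0, 5.0])
    print(np.allclose(rotate(0.7, c * x + y), c * rotate(0.7, x) + rotate(0.7, y)))   # True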
Example 3
Define T: R2 → R2 by T(a1, a2) = (a1, -a2). T is called the reflection
about the x-axis. (See Figure 2.1(b).) •
Example 4
Define T: R2 → R2 by T(a1, a2) = (a1, 0). T is called the projection on the
x-axis. (See Figure 2.1(c).) •

We now look at some additional examples of linear transformations.
Example 5
Define T: Mmxn(F) → Mnxm(F) by T(A) = A^t, where A^t is the transpose
of A, defined in Section 1.3. Then T is a linear transformation by Exercise 3
of Section 1.3. •
Example 6
Define T: Pn(R) → Pn-1(R) by T(f(x)) = f'(x), where f'(x) denotes the
derivative of f(x). To show that T is linear, let g(x), h(x) ∈ Pn(R) and a ∈ R.
Now
T(ag(x) + h(x)) = (ag(x) + h(x))' = ag'(x) + h'(x) = aT(g(x)) + T(h(x)).
So by property 2 above, T is linear. •
Example 7
Let V = C(R), the vector space of continuous real-valued functions on R. Let
a, b ∈ R, a < b. Define T: V → R by
T(f) = ∫_a^b f(t) dt
for all f ∈ V. Then T is a linear transformation because the definite integral
of a linear combination of functions is the same as the linear combination of
the definite integrals of the functions. •
Two very important examples of linear transformations that appear fre­
quently in the remainder of the book, and therefore deserve their own nota­
tion, are the identity and zero transformations.
For vector spaces V and W (over F), we define the identity transformation I_V: V → V by I_V(x) = x for all x ∈ V and the zero transformation
T0: V → W by T0(x) = 0 for all x ∈ V. It is clear that both of these
transformations are linear. We often write I instead of I_V.
We now turn our attention to two very important sets associated with
linear transformations: the range and null space. The determination of these
sets allows us to examine more closely the intrinsic properties of a linear
transformation.
Definitions. Let V and W be vector spaces, and let T: V → W be linear.
We define the null space (or kernel) N(T) of T to be the set of all vectors
x in V such that T(x) = 0; that is, N(T) = {x ∈ V : T(x) = 0}.
We define the range (or image) R(T) of T to be the subset of W consisting of all images (under T) of vectors in V; that is, R(T) = {T(x) : x ∈ V}.

Example 8
Let V and W be vector spaces, and let I: V → V and T0: V → W be the
identity and zero transformations, respectively. Then N(I) = {0}, R(I) = V,
N(T0) = V, and R(T0) = {0}. •
Example 9
Let T: R3 —> R2 be the linear transformation defined by
T(a1, a2, a3) = (a1 - a2, 2a3).
It is left as an exercise to verify that
N(T) = {(a, a, 0) : a ∈ R}  and  R(T) = R2. •
In Examples 8 and 9, we see that the range and null space of each of the
linear transformations is a subspace. The next result shows that this is true
in general.
Theorem 2.1. Let V and W be vector spaces and T: V → W be linear.
Then N(T) and R(T) are subspaces of V and W, respectively.
Proof. To clarify the notation, we use the symbols 0_V and 0_W to denote
the zero vectors of V and W, respectively.
Since T(0_V) = 0_W, we have that 0_V ∈ N(T). Let x, y ∈ N(T) and c ∈ F.
Then T(x + y) = T(x) + T(y) = 0_W + 0_W = 0_W, and T(cx) = cT(x) = c0_W =
0_W. Hence x + y ∈ N(T) and cx ∈ N(T), so that N(T) is a subspace of V.
Because T(0_V) = 0_W, we have that 0_W ∈ R(T). Now let x, y ∈ R(T) and
c ∈ F. Then there exist v and w in V such that T(v) = x and T(w) = y. So
T(v + w) = T(v) + T(w) = x + y, and T(cv) = cT(v) = cx. Thus x + y ∈ R(T)
and cx ∈ R(T), so R(T) is a subspace of W.
The next theorem provides a method for finding a spanning set for the
range of a linear transformation. With this accomplished, a basis for the
range is easy to discover using the technique of Example 6 of Section 1.6.
Theorem 2.2. Let V and W be vector spaces, and let T: V → W be
linear. If β = {v1, v2, ..., vn} is a basis for V, then
R(T) = span(T(β)) = span({T(v1), T(v2), ..., T(vn)}).
Proof. Clearly T(vi) ∈ R(T) for each i. Because R(T) is a subspace,
R(T) contains span({T(v1), T(v2), ..., T(vn)}) = span(T(β)) by Theorem 1.5
(p. 30).

Now suppose that w ∈ R(T). Then w = T(v) for some v ∈ V. Because β
is a basis for V, we have
v = Σ_{i=1}^{n} ai vi for some a1, a2, ..., an ∈ F.
Since T is linear, it follows that
w = T(v) = Σ_{i=1}^{n} ai T(vi) ∈ span(T(β)).
So R(T) is contained in span(T(β)).
It should be noted that Theorem 2.2 is true if β is infinite, that is, R(T) =
span({T(v) : v ∈ β}). (See Exercise 33.)
The next example illustrates the usefulness of Theorem 2.2.
Example 10
Define the linear transformation T: P2(R) → M2x2(R) by
T(f(x)) = ( f(1) - f(2)   0
            0             f(0) ).
Since β = {1, x, x2} is a basis for P2(R), we have
R(T) = span(T(β)) = span({T(1), T(x), T(x2)})
= span({ (0 0; 0 1), (-1 0; 0 0), (-3 0; 0 0) })
= span({ (0 0; 0 1), (-1 0; 0 0) }).
Thus we have found a basis for R(T), and so dim(R(T)) = 2. •
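The dimension found in Example 10 can be checked by flattening the three images into vectors of R^4 and computing the rank, as in this illustrative Python sketch (numpy).

    import numpy as np

    # T(1), T(x), T(x^2) flattened row-wise: [[a, b], [c, d]] -> (a, b, c, d).
    images = np.array([[ 0, 0, 0, 1],    # T(1)
                       [-1, 0, 0, 0],    # T(x)
                       [-3, 0, 0, 0]])   # T(x^2)

    print(np.linalg.matrix_rank(images))   # 2, so dim(R(T)) = 2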
As in Chapter 1, we measure the "size" of a subspace by its dimension.
The null space and range are so important that we attach special names to
their respective dimensions.
Definitions. Let V and W be vector spaces, and let T: V —• W be
linear. If N(T) and R(T) are finite-dimensional, then we define the nullity
ofT, denoted nullity(T), and the rank of T, denoted rank(T), to be the
dimensions of N(T) and R(T), respectively.
Reflecting on the action of a linear transformation, we see intuitively that
the larger the nullity, the smaller the rank. In other words, the more vectors
that are carried into 0, the smaller the range. The same heuristic reasoning
tells us that the larger the rank, the smaller the nullity. This balance between
rank and nullity is made precise in the next theorem, appropriately called the
dimension theorem.

Theorem 2.3 (Dimension Theorem). Let V and W be vector spaces,
and let T: V —> W be linear. If V is finite-dimensional, then
nullity(T) 4- rank(T) = dim(V).
Proof. Suppose that dim(V) = n, dim(N(T)) = k, and {v1, v2, ..., vk} is
a basis for N(T). By the corollary to Theorem 1.11 (p. 51), we may extend
{v1, v2, ..., vk} to a basis β = {v1, v2, ..., vn} for V. We claim that S =
{T(vk+1), T(vk+2), ..., T(vn)} is a basis for R(T).
First we prove that S generates R(T). Using Theorem 2.2 and the fact
that T(vi) = 0 for 1 ≤ i ≤ k, we have
R(T) = span({T(v1), T(v2), ..., T(vn)})
= span({T(vk+1), T(vk+2), ..., T(vn)}) = span(S).
Now we prove that S is linearly independent. Suppose that
Σ_{i=k+1}^{n} bi T(vi) = 0 for bk+1, bk+2, ..., bn ∈ F.
Using the fact that T is linear, we have
T( Σ_{i=k+1}^{n} bi vi ) = 0.
So
Σ_{i=k+1}^{n} bi vi ∈ N(T).
Hence there exist c1, c2, ..., ck ∈ F such that
Σ_{i=k+1}^{n} bi vi = Σ_{i=1}^{k} ci vi,  or  Σ_{i=1}^{k} (-ci) vi + Σ_{i=k+1}^{n} bi vi = 0.
Since β is a basis for V, we have bi = 0 for all i. Hence S is linearly independent. Notice that this argument also shows that T(vk+1), T(vk+2), ..., T(vn)
are distinct; therefore rank(T) = n - k.
If we apply the dimension theorem to the linear transformation T in Example 9, we have that nullity(T) + 2 = 3, so nullity(T) = 1.
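For the transformation of Example 9, the rank and nullity can also be read off from its matrix with respect to the standard bases; the following Python sketch (numpy; an illustrative check) confirms the dimension theorem numerically.

    import numpy as np

    # Matrix of T(a1, a2, a3) = (a1 - a2, 2a3) with respect to the standard bases.
    A = np.array([[1, -1, 0],
                  [0,  0, 2]])

    rank = np.linalg.matrix_rank(A)
    nullity = A.shape[1] - rank
    print(rank, nullity, rank + nullity)   # 2 1 3, and 3 = dim(R^3)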
The reader should review the concepts of "one-to-one" and "onto" pre­
sented in Appendix B. Interestingly, for a linear transformation, both of these
concepts are intimately connected to the rank and nullity of the transforma­
tion. This is demonstrated in the next two theorems.

Theorem 2.4. Let V and W be vector spaces, and let T: V —> W be
linear. Then T is one-to-one if and only if N(T) = {0}.
Proof. Suppose that T is one-to-one and x ∈ N(T). Then T(x) = 0 =
T(0). Since T is one-to-one, we have x = 0. Hence N(T) = {0}.
Now assume that N(T) = {0}, and suppose that T(x) = T(y). Then
0 = T(x) - T(y) = T(x - y) by property 3 on page 65. Therefore x - y ∈
N(T) = {0}. So x - y = 0, or x = y. This means that T is one-to-one.
The reader should observe that Theorem 2.4 allows us to conclude that
the transformation defined in Example 9 is not one-to-one.
Surprisingly, the conditions of one-to-one and onto are equivalent in an
important special case.
Theorem 2.5. Let V and W be vector spaces of equal (finite) dimension,
and let T: V —• W be linear. Then the following are equivalent.
(a) T is one-to-one.
(b) T is onto.
(c) rank(T)=dim(V).
Proof. From the dimension theorem, we have
nullity (T) 4- rank(T) = dim(V).
Now, with the use of Theorem 2.4, we have that T is one-to-one if and only if
N(T) = {0}, if and only if nullity(T) = 0, if and only if rank(T) = dim(V), if
and only if rank(T) = dim(W), and if and only if dim(R(T)) = dim(W). By
Theorem 1.11 (p. 50), this equality is equivalent to R(T) = W, the definition
of T being onto. I
We note that if V is not finite-dimensional and T: V —» V is linear, then
it does not follow that one-to-one and onto are equivalent. (See Exercises 15,
16, and 21.)
The linearity of T in Theorems 2.4 and 2.5 is essential, for it is easy to
construct examples of functions from R into R that are not one-to-one, but
are onto, and vice versa.
The next two examples make use of the preceding theorems in determining
whether a given linear transformation is one-to-one or onto.
Example 11
Let T: P2(R) —* P3(R) be the linear transformation defined by
T(f(x)) = 2f'(x) + ∫_0^x 3f(t) dt.
Now
R(T) = span({T(1), T(x), T(x2)}) = span({3x, 2 + (3/2)x2, 4x + x3}).
Since {3x, 2 + (3/2)x2, 4x + x3} is linearly independent, rank(T) = 3. Since
dim(P3(R)) = 4, T is not onto. From the dimension theorem, nullity(T) +
3 = 3. So nullity(T) = 0, and therefore, N(T) = {0}. We conclude from
Theorem 2.4 that T is one-to-one. •
Example 12
Let T: F2 → F2 be the linear transformation defined by
T(a1, a2) = (a1 + a2, a1).
It is easy to see that N(T) = {0}; so T is one-to-one. Hence Theorem 2.5
tells us that T must be onto. •
In Exercise 14, it is stated that if T is linear and one-to-one, then a
subset S is linearly independent if and only if T(S) is linearly independent.
Example 13 illustrates the use of this result.
Example 13
Let T: P2(R) → R3 be the linear transformation defined by
T(a0 + a1x + a2x2) = (a0, a1, a2).
Clearly T is linear and one-to-one. Let S = {2 - x + 3x2, x + x2, 1 - 2x2}.
Then S is linearly independent in P2(R) because
T(S) = {(2, -1, 3), (0, 1, 1), (1, 0, -2)}
is linearly independent in R3. •
In Example 13, we transferred a property from the vector space of polyno­
mials to a property in the vector space of 3-tuples. This technique is exploited
more fully later.
One of the most important properties of a linear transformation is that it is
completely determined by its action on a basis. This result, which follows from
the next theorem and corollary, is used frequently throughout the book.
Theorem 2.6. Let V and W be vector spaces over F, and suppose that
{v1, v2, ..., vn} is a basis for V. For w1, w2, ..., wn in W, there exists exactly
one linear transformation T: V → W such that T(vi) = wi for i = 1, 2, ..., n.

Proof. Let x ∈ V. Then
x = Σ_{i=1}^{n} ai vi,
where a1, a2, ..., an are unique scalars. Define
T: V → W by T(x) = Σ_{i=1}^{n} ai wi.
(a) T is linear: Suppose that u, v ∈ V and d ∈ F. Then we may write
u = Σ_{i=1}^{n} bi vi  and  v = Σ_{i=1}^{n} ci vi
for some scalars b1, b2, ..., bn, c1, c2, ..., cn. Thus
du + v = Σ_{i=1}^{n} (dbi + ci) vi.
So
T(du + v) = Σ_{i=1}^{n} (dbi + ci) wi = d Σ_{i=1}^{n} bi wi + Σ_{i=1}^{n} ci wi = dT(u) + T(v).
(b) Clearly
T(vi) = wi for i = 1, 2, ..., n.
(c) T is unique: Suppose that U: V → W is linear and U(vi) = wi for
i = 1, 2, ..., n. Then for x ∈ V with
x = Σ_{i=1}^{n} ai vi,
we have
U(x) = Σ_{i=1}^{n} ai U(vi) = Σ_{i=1}^{n} ai wi = T(x).
Hence U = T.
Corollary. Let V and W be vector spaces, and suppose that V has a
finite basis {v1, v2, ..., vn}. If U, T: V → W are linear and U(vi) = T(vi) for
i = 1, 2, ..., n, then U = T.

Example 14
Let T: R2 —> R2 be the linear transformation defined by
T(a1, a2) = (2a2 - a1, 3a1),
and suppose that U: R2 —> R2 is linear. If we know that U(l,2) = (3,3) and
U(l, 1) = (1,3), then U = T. This follows from the corollary and from the
fact that {(1,2), (1,1)} is a basis for R2. •
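Theorem 2.6 is what makes the conclusion of Example 14 computable: knowing U on the basis {(1, 2), (1, 1)} determines U everywhere. The Python sketch below (numpy; an illustrative computation) recovers the standard matrix of U from those two values and compares it with T.

    import numpy as np

    B = np.array([[1, 1],
                  [2, 1]])      # basis vectors (1,2) and (1,1) as columns
    W = np.array([[3, 1],
                  [3, 3]])      # U(1,2) and U(1,1) as columns

    # If A is the standard matrix of U, then A @ B = W, so A = W @ B^{-1}.
    A = W @ np.linalg.inv(B)
    print(A)   # [[-1.  2.]
               #  [ 3.  0.]], i.e. U(a1, a2) = (2a2 - a1, 3a1) = T(a1, a2)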
EXERCISES
1. Label the following statements as true or false. In each part, V and W
are finite-dimensional vector spaces (over F), and T is a function from
V to W.
(a) If T is linear, then T preserves sums and scalar products.
(b) If T(x + y) = T(x) + T(y), then T is linear.
(c) T is one-to-one if and only if the only vector x such that T(x) = 0
is x = 0.
(d) If T is linear, then T(0_V) = 0_W.
(e) If T is linear, then nullity(T) 4- rank(T) = dim(W).
(f) If T is linear, then T carries linearly independent subsets of V onto
linearly independent subsets of W.
(g) If T, U: V —> W are both linear and agree on a basis for V, then
T = U.
(h) Given x1, x2 ∈ V and y1, y2 ∈ W, there exists a linear transformation T: V → W such that T(x1) = y1 and T(x2) = y2.
For Exercises 2 through 6, prove that T is a linear transformation, and find
bases for both N(T) and R(T). Then compute the nullity and rank of T, and
verify the dimension theorem. Finally, use the appropriate theorems in this
section to determine whether T is one-to-one or onto.
2. T: R3 → R2 defined by T(a1, a2, a3) = (a1 - a2, 2a3).
3. T: R2 → R3 defined by T(a1, a2) = (a1 + a2, 0, 2a1 - a2).
4. T: M2x3(F) → M2x2(F) defined by
T( ( a11  a12  a13
     a21  a22  a23 ) ) = ( 2a11 - a12   a13 + 2a12
                           0            0 ).
5. T: P2(R) → P3(R) defined by T(f(x)) = xf(x) + f'(x).

6. T: Mnxn(F) → F defined by T(A) = tr(A). Recall (Example 4, Section 1.3) that
tr(A) = Σ_{i=1}^{n} Aii.
7. Prove properties 1, 2, 3, and 4 on page 65.
8. Prove that the transformations in Examples 2 and 3 are linear.
9. In this exercise, T: R2 —> R2 is a function. For each of the following
parts, state why T is not linear.
(a) T(a1, a2) = (1, a2)
(b) T(a1, a2) = (a1, a1²)
(c) T(a1, a2) = (sin a1, 0)
(d) T(a1, a2) = (|a1|, a2)
(e) T(a1, a2) = (a1 + 1, a2)
10. Suppose that T: R2 -> R2 is linear, T(1,0) = (1,4), and T(l, 1) = (2,5).
What is T(2,3)? Is T one-to-one?
11. Prove that there exists a linear transformation T: R2 —> R3 such that
T(1, 1) = (1, 0, 2) and T(2, 3) = (1, -1, 4). What is T(8, 11)?
12. Is there a linear transformation T: R3 —» R2 such that T(l, 0,3) = (1,1)
and T(-2,0,-6) = (2,1)?
13. Let V and W be vector spaces, let T: V —> W be linear, and let
{w1, w2, ..., wk} be a linearly independent subset of R(T). Prove that
if S = {v1, v2, ..., vk} is chosen so that T(vi) = wi for i = 1, 2, ..., k,
then S is linearly independent.
14. Let V and W be vector spaces and T: V —> W be linear.
(a) Prove that T is one-to-one if and only if T carries linearly inde­
pendent subsets of V onto linearly independent subsets of W.
(b) Suppose that T is one-to-one and that S is a subset of V. Prove
that S is linearly independent if and only if T(S) is linearly inde­
pendent.
(c) Suppose β = {v1, v2, ..., vn} is a basis for V and T is one-to-one
and onto. Prove that T(β) = {T(v1), T(v2), ..., T(vn)} is a basis
for W.
15. Recall the definition of P(R) on page 10. Define
T: P(R) → P(R) by T(f(x)) = ∫_0^x f(t) dt.
Prove that T is linear and one-to-one, but not onto.

16. Let T: P(R) → P(R) be defined by T(f(x)) = f'(x). Recall that T is
linear. Prove that T is onto, but not one-to-one.
17. Let V and W be finite-dimensional vector spaces and T: V → W be
linear.
(a) Prove that if dim(V) < dim(W), then T cannot be onto.
(b) Prove that if dim(V) > dim(W), then T cannot be one-to-one.
18. Give an example of a linear transformation T: R2 → R2 such that
N(T) = R(T).
19. Give an example of distinct linear transformations T and U such that
N(T) = N(U) and R(T) = R(U).
20. Let V and W be vector spaces with subspaces V1 and W1, respectively.
If T: V → W is linear, prove that T(V1) is a subspace of W and that
{x ∈ V : T(x) ∈ W1} is a subspace of V.
21. Let V be the vector space of sequences described in Example 5 of Sec­
tion 1.2. Define the functions T, U: V → V by
T(a1, a2, ...) = (a2, a3, ...) and U(a1, a2, ...) = (0, a1, a2, ...).
T and U are called the left shift and right shift operators on V,
respectively.
(a) Prove that T and U are linear.
(b) Prove that T is onto, but not one-to-one.
(c) Prove that U is one-to-one, but not onto.
22. Let T: R3 —» R be linear. Show that there exist scalars a, b, and c such
that T(x, y, z) = ax + by + cz for all (x, y, z) £ R3. Can you generalize
this result for T: Fn —• F? State and prove an analogous result for
T . pn . pra
23. Let T: R3 → R be linear. Describe geometrically the possibilities for
the null space of T. Hint: Use Exercise 22.
The following definition is used in Exercises 24-27 and in Exercise 30.
Definition. Let V be a vector space and W1 and W2 be subspaces of
V such that V = W1 ⊕ W2. (Recall the definition of direct sum given in the
exercises of Section 1.3.) A function T: V → V is called the projection on
W1 along W2 if, for x = x1 + x2 with x1 ∈ W1 and x2 ∈ W2, we have
T(x) = x1.
24. Let T: R2 —* R2. Include figures for each of the following parts.

(a) Find a formula for T(a,b), where T represents the projection on
the y-axis along the x-axis.
(b) Find a formula for T(a, b), where T represents the projection on
the y-axis along the line L = {(s, s) : s ∈ R}.
25. Let T: R3 → R3.
(a) If T(a, b, c) = (a, b, 0), show that T is the projection on the xy-plane along the z-axis.
(b) Find a formula for T(a, b, c), where T represents the projection on
the z-axis along the xy-plane.
(c) If T(a, b, c) = (a - c, b, 0), show that T is the projection on the
xy-plane along the line L = {(a, 0, a) : a ∈ R}.
26. Using the notation in the definition above, assume that T: V → V is
the projection on W1 along W2.
(a) Prove that T is linear and W1 = {x ∈ V : T(x) = x}.
(b) Prove that W1 = R(T) and W2 = N(T).
(c) Describe T if W1 = V.
(d) Describe T if W1 is the zero subspace.
27. Suppose that W is a subspace of a finite-dimensional vector space V.
(a) Prove that there exists a subspace W' and a function T: V → V
such that T is a projection on W along W'.
(b) Give an example of a subspace W of a vector space V such that
there are two projections on W along two (distinct) subspaces.
The following definitions are used in Exercises 28-32.
Definitions. Let V be a vector space, and let T: V → V be linear. A
subspace W of V is said to be T-invariant if T(x) ∈ W for every x ∈ W, that
is, T(W) ⊆ W. If W is T-invariant, we define the restriction of T on W to
be the function T_W: W → W defined by T_W(x) = T(x) for all x ∈ W.
Exercises 28-32 assume that W is a subspace of a vector space V and that
T: V —> V is linear. Warning: Do not assume that W is T-invariant or that
T is a projection unless explicitly stated.
28. Prove that the subspaces {0}, V, R(T), and N(T) are all T-invariant.
29. If W is T-invariant, prove that T_W is linear.
30. Suppose that T is the projection on W along some subspace W'. Prove
that W is T-invariant and that T_W = I_W.
31. Suppose that V = R(T) ⊕ W and W is T-invariant. (Recall the definition
of direct sum given in the exercises of Section 1.3.)

(a) Prove that W ⊆ N(T).
(b) Show that if V is finite-dimensional, then W = N(T).
(c) Show by example that the conclusion of (b) is not necessarily true
if V is not finite-dimensional.
32. Suppose that W is T-invariant. Prove that N(T_W) = N(T) ∩ W and
R(T_W) = T(W).
33. Prove Theorem 2.2 for the case that β is infinite, that is, R(T) =
span({T(v) : v ∈ β}).
34. Prove the following generalization of Theorem 2.6: Let V and W be
vector spaces over a common field, and let β be a basis for V. Then for
any function f: β → W there exists exactly one linear transformation
T: V → W such that T(x) = f(x) for all x ∈ β.
Exercises 35 and 36 assume the definition of direct sum given in the exercises
of Section 1.3.
35. Let V be a finite-dimensional vector space and T: V —» V be linear.
(a) Suppose that V = R(T) + N(T). Prove that V = R(T) ⊕ N(T).
(b) Suppose that R(T) ∩ N(T) = {0}. Prove that V = R(T) ⊕ N(T).
Be careful to say in each part where finite-dimensionality is used.
36. Let V and T be as defined in Exercise 21.
(a) Prove that V = R(T) + N(T), but V is not a direct sum of these two
spaces. Thus the result of Exercise 35(a) above cannot be proved
without assuming that V is finite-dimensional.
(b) Find a linear operator T1 on V such that R(T1) ∩ N(T1) = {0} but
V is not a direct sum of R(T1) and N(T1). Conclude that V being
finite-dimensional is also essential in Exercise 35(b).
37. A function T: V —> W between vector spaces V and W is called additive
if T(x + y) = T(x) + T(y) for all x, y ∈ V. Prove that if V and W
are vector spaces over the field of rational numbers, then any additive
function from V into W is a linear transformation.
38. Let T: C —» C be the function defined by T(z) = z. Prove that T is
additive (as defined in Exercise 37) but not linear.
39. Prove that there is an additive function T: R —> R (as defined in Ex­
ercise 37) that is not linear. Hint: Let V be the set of real numbers
regarded as a vector space over the field of rational numbers. By the
corollary to Theorem 1.13 (p. 60), V has a basis β. Let x and y be two
distinct vectors in β, and define f: β → V by f(x) = y, f(y) = x, and
f(z) = z otherwise. By Exercise 34, there exists a linear transformation

T: V → V such that T(u) = f(u) for all u ∈ β. Then T is additive, but
for c = y/x, T(cx) ≠ cT(x).
The following exercise requires familiarity with the definition of quotient space
given in Exercise 31 of Section 1.3.
40. Let V be a vector space and W be a subspace of V. Define the mapping
n: V -» V/W by jj(v) = v + W for v £ V.
(a) Prove that 77 is a linear transformation from V onto V/W and that
N(n) = W.
(b) Suppose that V is finite-dimensional. Use (a) and the dimen­
sion theorem to derive a formula relating dim(V), dim(W), and
dim(V/W).
(c) Read the proof of the dimension theorem. Compare the method of
solving (b) with the method of deriving the same result as outlined
in Exercise 35 of Section 1.6.
2.2 THE MATRIX REPRESENTATION OF A LINEAR
TRANSFORMATION
Until now, we have studied linear transformations by examining their ranges
and null spaces. In this section, we embark on one of the most useful ap­
proaches to the analysis of a linear transformation on a finite-dimensional
vector space: the representation of a linear transformation by a matrix. In
fact, we develop a one-to-one correspondence between matrices and linear
transformations that allows us to utilize properties of one to study properties
of the other.
We first need the concept of an ordered basis for a vector space.
Definition. Let V be a finite-dimensional vector space. An ordered
basis for V is a basis for V endowed with a specific order; that is, an ordered
basis for V is a finite sequence of linearly independent vectors in V that
generates V.
Example 1
In F3, β = {e1, e2, e3} can be considered an ordered basis. Also γ =
{e2, e1, e3} is an ordered basis, but β ≠ γ as ordered bases. •
For the vector space Fn, we call {e1, e2, ..., en} the standard ordered
basis for Fn. Similarly, for the vector space Pn(F), we call {1, x, ..., xn} the
standard ordered basis for Pn(F).
Now that we have the concept of ordered basis, we can identify abstract
vectors in an n-dimensional vector space with n-tuples. This identification is
provided through the use of coordinate vectors, as introduced next.

Definition. Let β = {u1, u2, ..., un} be an ordered basis for a finite-dimensional vector space V. For x ∈ V, let a1, a2, ..., an be the unique scalars
such that
x = Σ_{i=1}^{n} ai ui.
We define the coordinate vector of x relative to β, denoted [x]_β, by
[x]_β = ( a1
          a2
          ...
          an ).
Notice that [ui]_β = ei in the preceding definition. It is left as an exercise
to show that the correspondence x → [x]_β provides us with a linear transformation from V to Fn. We study this transformation in Section 2.4 in more
detail.
Example 2
Let V = P2(R), and let β = {1, x, x2} be the standard ordered basis for V. If
f(x) = 4 + 6x - 7x2, then
[f(x)]_β = ( 4
             6
             -7 ). •
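For polynomials with respect to the standard ordered basis, the coordinate vector is simply the list of coefficients. A minimal Python sketch (the helper name is hypothetical):

    import numpy as np

    def coords(coeffs, n):
        # Coordinate vector relative to the standard ordered basis {1, x, ..., x^n};
        # coeffs[k] is the coefficient of x^k.
        v = np.zeros(n + 1)
        v[:len(coeffs)] = coeffs
        return v

    print(coords([4, 6, -7], 2))   # [ 4.  6. -7.], the coordinate vector of 4 + 6x - 7x^2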
Let us now proceed with the promised matrix representation of a linear
transformation. Suppose that V and W are finite-dimensional vector spaces
with ordered bases β = {v1, v2, ..., vn} and γ = {w1, w2, ..., wm}, respectively. Let T: V → W be linear. Then for each j, 1 ≤ j ≤ n, there exist
unique scalars aij ∈ F, 1 ≤ i ≤ m, such that
T(vj) = Σ_{i=1}^{m} aij wi for 1 ≤ j ≤ n.
Definition. Using the notation above, we call the m x n matrix A defined
by Aij = aij the matrix representation of T in the ordered bases β
and γ and write A = [T]_β^γ. If V = W and β = γ, then we write A = [T]_β.
Notice that the jth column of A is simply [T(vj)]_γ. Also observe that if
U: V → W is a linear transformation such that [U]_β^γ = [T]_β^γ, then U = T by
the corollary to Theorem 2.6 (p. 73).
We illustrate the computation of [T]_β^γ in the next several examples.

Example 3
Let T: R2 → R3 be the linear transformation defined by
T(a1, a2) = (a1 + 3a2, 0, 2a1 - 4a2).
Let β and γ be the standard ordered bases for R2 and R3, respectively. Now
T(1, 0) = (1, 0, 2) = 1e1 + 0e2 + 2e3
and
T(0, 1) = (3, 0, -4) = 3e1 + 0e2 - 4e3.
Hence
[T]_β^γ = ( 1   3
            0   0
            2  -4 ).
If we let γ' = {e3, e2, e1}, then
[T]_β^γ' = ( 2  -4
             0   0
             1   3 ). •
Example 4
Let T: P3(R) → P2(R) be the linear transformation defined by T(f(x)) =
f'(x). Let β and γ be the standard ordered bases for P3(R) and P2(R),
respectively. Then
T(1) = 0·1 + 0·x + 0·x2
T(x) = 1·1 + 0·x + 0·x2
T(x2) = 0·1 + 2·x + 0·x2
T(x3) = 0·1 + 0·x + 3·x2.
So
[T]_β^γ = ( 0  1  0  0
            0  0  2  0
            0  0  0  3 ).
Note that when T(x^j) is written as a linear combination of the vectors of γ,
its coefficients give the entries of column j + 1 of [T]_β^γ. •
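The pattern in Example 4 generalizes: counting rows and columns from 1, the matrix of differentiation from Pn(R) to Pn-1(R) has the entry j in position (j, j + 1) and zeros elsewhere. A short Python sketch (numpy; the function name is illustrative) builds it.

    import numpy as np

    def diff_matrix(n):
        # Matrix of T(f) = f' from P_n(R) to P_{n-1}(R), standard ordered bases.
        A = np.zeros((n, n + 1))
        for j in range(1, n + 1):
            A[j - 1, j] = j        # since d/dx x^j = j x^(j-1)
        return A

    print(diff_matrix(3))
    # [[0. 1. 0. 0.]
    #  [0. 0. 2. 0.]
    #  [0. 0. 0. 3.]]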

Now that we have defined a procedure for associating matrices with linear
transformations, we show in Theorem 2.8 that this association "preserves"
addition and scalar multiplication. To make this more explicit, we need some
preliminary discussion about the addition and scalar multiplication of linear
transformations.
Definition. Let T, U: V → W be arbitrary functions, where V and W
are vector spaces over F, and let a ∈ F. We define T + U: V → W by
(T + U)(x) = T(x) + U(x) for all x ∈ V, and aT: V → W by (aT)(x) = aT(x)
for all x ∈ V.
Of course, these are just the usual definitions of addition and scalar mul­
tiplication of functions. We are fortunate, however, to have the result that
both sums and scalar multiples of linear transformations are also linear.
Theorem 2.7. Let V and W be vector spaces over a field F, and let
T, U: V -• W be linear.
(a) For all a ∈ F, aT + U is linear.
(b) Using the operations of addition and scalar multiplication in the pre­
ceding definition, the collection of all linear transformations from V to
W is a vector space over F.
Proof. (a) Let x, y ∈ V and c ∈ F. Then
(aT + U)(cx + y) = aT(cx + y) + U(cx + y)
= a[T(cx + y)] + cU(x) + U(y)
= a[cT(x) + T(y)] + cU(x) + U(y)
= acT(x) + cU(x) + aT(y) + U(y)
= c(aT + U)(x) + (aT + U)(y).
So aT + U is linear.
(b) Noting that T0, the zero transformation, plays the role of the zero
vector, it is easy to verify that the axioms of a vector space are satisfied,
and hence that the collection of all linear transformations from V into W is a
vector space over F. II
Definitions. Let V and W be vector spaces over F. We denote the
vector space of all linear transformations from V into W by £(V, W). In the
case that V = W, we write £(V) instead of £(V, W).
In Section 2.4, we see a complete identification of £(V, W) with the vector
space Mmxn(F), where n and m are the dimensions of V and W, respectively.
This identification is easily established by the use of the next theorem.
Theorem 2.8. Let V and W be finite-dimensional vector spaces with
ordered bases ft and 7, respectively, and let T, U: V —> W be linear transfor­
mations. Then

(a) [T + U]_β^γ = [T]_β^γ + [U]_β^γ and
(b) [aT]_β^γ = a[T]_β^γ for all scalars a.
Proof. Let β = {v1, v2, ..., vn} and γ = {w1, w2, ..., wm}. There exist
unique scalars aij and bij (1 ≤ i ≤ m, 1 ≤ j ≤ n) such that
T(vj) = Σ_{i=1}^{m} aij wi and U(vj) = Σ_{i=1}^{m} bij wi for 1 ≤ j ≤ n.
Hence
(T + U)(vj) = Σ_{i=1}^{m} (aij + bij) wi.
Thus
([T + U]_β^γ)_ij = aij + bij = ([T]_β^γ + [U]_β^γ)_ij.
So (a) is proved, and the proof of (b) is similar.
Example 5
Let T: R2 → R3 and U: R2 → R3 be the linear transformations respectively
defined by
T(a1, a2) = (a1 + 3a2, 0, 2a1 - 4a2)  and  U(a1, a2) = (a1 - a2, 2a1, 3a1 + 2a2).
Let β and γ be the standard ordered bases of R2 and R3, respectively. Then
[T]_β^γ = ( 1   3
            0   0
            2  -4 )
(as computed in Example 3), and
[U]_β^γ = ( 1  -1
            2   0
            3   2 ).
If we compute T + U using the preceding definitions, we obtain
(T + U)(a1, a2) = (2a1 + 2a2, 2a1, 5a1 - 2a2).
So
[T + U]_β^γ = ( 2   2
                2   0
                5  -2 ),
which is simply [T]_β^γ + [U]_β^γ, illustrating Theorem 2.8. •
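Theorem 2.8(a) can be seen concretely for the transformations of Example 5 with a few lines of Python (numpy; an illustrative check).

    import numpy as np

    T = np.array([[1,  3],
                  [0,  0],
                  [2, -4]])      # [T] with respect to the standard ordered bases
    U = np.array([[1, -1],
                  [2,  0],
                  [3,  2]])      # [U]

    # Matrix of T + U read off from (T + U)(a1, a2) = (2a1 + 2a2, 2a1, 5a1 - 2a2).
    TU = np.array([[2,  2],
                   [2,  0],
                   [5, -2]])

    print(np.array_equal(TU, T + U))   # True: [T + U] = [T] + [U]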

EXERCISES
1. Label the following statements as true or false. Assume that V and
W are finite-dimensional vector spaces with ordered bases β and γ,
respectively, and T, U: V → W are linear transformations.
(a) For any scalar a, aT + U is a linear transformation from V to W.
(b) [T]_β^γ = [U]_β^γ implies that T = U.
(c) If m = dim(V) and n = dim(W), then [T]_β^γ is an m x n matrix.
(d) [T + U]_β^γ = [T]_β^γ + [U]_β^γ.
(e) £(V, W) is a vector space.
(f) £(V, W) = £(W, V).
2. Let β and γ be the standard ordered bases for Rn and Rm, respectively.
For each linear transformation T: Rn → Rm, compute [T]_β^γ.
(a) T: R2 → R3 defined by T(a1, a2) = (2a1 - a2, 3a1 + 4a2, a1).
(b) T: R3 → R2 defined by T(a1, a2, a3) = (2a1 + 3a2 - a3, a1 + a3).
(c) T: R3 → R defined by T(a1, a2, a3) = 2a1 + a2 - 3a3.
(d) T: R3 → R3 defined by
T(a1, a2, a3) = (2a2 + a3, -a1 + 4a2 + 5a3, a1 + a3).
(e) T: Rn → Rn defined by T(a1, a2, ..., an) = (a1, a1, ..., a1).
(f) T: Rn → Rn defined by T(a1, a2, ..., an) = (an, an-1, ..., a1).
(g) T: Rn → R defined by T(a1, a2, ..., an) = a1 + an.
3. Let T: R2 → R3 be defined by T(a1, a2) = (a1 - a2, a1, 2a1 + a2). Let β
be the standard ordered basis for R2 and γ = {(1,1,0), (0,1,1), (2,2,3)}.
Compute [T]_β^γ. If α = {(1,2), (2,3)}, compute [T]_α^γ.
4. Define T: M2x2(R) → P2(R) by
T( ( a  b
     c  d ) ) = (a + b) + (2d)x + bx2.
Let β = {E11, E12, E21, E22} (the standard ordered basis for M2x2(R))
and γ = {1, x, x2}. Compute [T]_β^γ.
5. Let α = {E11, E12, E21, E22} (the standard ordered basis for M2x2(F)),
β = {1, x, x2}, and γ = {1}.

(a) Define T: M2x2(F) → M2x2(F) by T(A) = A^t. Compute [T]_α.
(b) Define T: P2(R) → M2x2(R) by
T(f(x)) = ( f'(0)   2f(1)
            0       f''(3) ),
where ' denotes differentiation. Compute [T]_β^α.
(c) Define T: M2x2(F) → F by T(A) = tr(A). Compute [T]_α^γ.
(d) Define T: P2(R) → R by T(f(x)) = f(2). Compute [T]_β^γ.
(e) If
A = ( 1  -2
      0   4 ),
compute [A]_α.
(f) If f(x) = 3 - 6x + x2, compute [f(x)]_β.
(g) For a ∈ F, compute [a]_γ.
6. Complete the proof of part (b) of Theorem 2.7.
7. Prove part (b) of Theorem 2.8.
8.* Let V be an n-dimensional vector space with an ordered basis β. Define
T: V → Fn by T(x) = [x]_β. Prove that T is linear.
9. Let V be the vector space of complex numbers over the field R. Define
T: V —> V by T(z) = z, where z is the complex conjugate of z. Prove
that T is linear, and compute [T]^, where ft — {l,i}. (Recall by Exer­
cise 38 of Section 2.1 that T is not linear if V is regarded as a vector
space over the field C.)
10. Let V be a vector space with the ordered basis ft = {v\,V2, • • •, vn}.
Define vo = 0. By Theorem 2.6 (p. 72), there exists a linear trans­
formation T: V —• V such that T(VJ) = Vj 4- fj-i for j = 1,2,... ,n.
Compute [T]^.
11. Let V be an n-dimensional vector space, and let T: V → V be a linear
transformation. Suppose that W is a T-invariant subspace of V (see the
exercises of Section 2.1) having dimension k. Show that there is a basis
β for V such that [T]_β has the form

[A  B]
[O  C],

where A is a k x k matrix and O is the (n - k) x k zero matrix.

86 Chap. 2 Linear Transformations and Matrices
12. Let V be a finite-dimensional vector space and T be the projection on
W along W', where W and W' are subspaces of V. (See the definition
in the exercises of Section 2.1 on page 76.) Find an ordered basis β for
V such that [T]_β is a diagonal matrix.
13. Let V and W be vector spaces, and let T and U be nonzero linear
transformations from V into W. If R(T) ∩ R(U) = {0}, prove that
{T, U} is a linearly independent subset of ℒ(V, W).
14. Let V = P(R), and for j ≥ 1 define T_j(f(x)) = f^{(j)}(x), where f^{(j)}(x)
is the jth derivative of f(x). Prove that the set {T_1, T_2, ..., T_n} is a
linearly independent subset of ℒ(V) for any positive integer n.
15. Let V and W be vector spaces, and let S be a subset of V. Define
S^0 = {T ∈ ℒ(V, W): T(x) = 0 for all x ∈ S}. Prove the following
statements.
(a) S^0 is a subspace of ℒ(V, W).
(b) If S_1 and S_2 are subsets of V and S_1 ⊆ S_2, then S_2^0 ⊆ S_1^0.
(c) If V_1 and V_2 are subspaces of V, then (V_1 + V_2)^0 = V_1^0 ∩ V_2^0.
16. Let V and W be vector spaces such that dim(V) = dim(W), and let
T: V → W be linear. Show that there exist ordered bases β and γ for
V and W, respectively, such that [T]_β^γ is a diagonal matrix.
2.3 COMPOSITION OF LINEAR TRANSFORMATIONS
AND MATRIX MULTIPLICATION
In Section 2.2, we learned how to associate a matrix with a linear transforma­
tion in such a way that both sums and scalar multiples of matrices are associ­
ated with the corresponding sums and scalar multiples of the transformations.
The question now arises as to how the matrix representation of a composite
of linear transformations is related to the matrix representation of each of the
associated linear transformations. The attempt to answer this question leads
to a definition of matrix multiplication. We use the more convenient notation
of UT rather than U o T for the composite of linear transformations U and T.
(See Appendix B.)
Our first result shows that the composite of linear transformations is lin­
ear.
Theorem 2.9. Let V, W, and Z be vector spaces over the same field F,
and let T: V → W and U: W → Z be linear. Then UT: V → Z is linear.
Proof. Let x, y ∈ V and a ∈ F. Then

UT(ax + y) = U(T(ax + y)) = U(aT(x) + T(y))
           = aU(T(x)) + U(T(y)) = a(UT)(x) + UT(y).

Sec. 2.3 Composition of Linear Transformations and Matrix Multiplication 87
The following theorem lists some of the properties of the composition of
linear transformations.
Theorem 2.10. Let V be a vector space. Let T, U_1, U_2 ∈ ℒ(V). Then
(a) T(U_1 + U_2) = TU_1 + TU_2 and (U_1 + U_2)T = U_1T + U_2T
(b) T(U_1U_2) = (TU_1)U_2
(c) TI = IT = T
(d) a(U_1U_2) = (aU_1)U_2 = U_1(aU_2) for all scalars a.
Proof. Exercise. I
A more general result holds for linear transformations that have domains
unequal to their codomains. (See Exercise 8.)
Let T: V → W and U: W → Z be linear transformations, and let A = [U]_β^γ
and B = [T]_α^β, where α = {v_1, v_2, ..., v_n}, β = {w_1, w_2, ..., w_m}, and γ =
{z_1, z_2, ..., z_p} are ordered bases for V, W, and Z, respectively. We would
like to define the product AB of two matrices so that AB = [UT]_α^γ. Consider
the matrix [UT]_α^γ. For 1 ≤ j ≤ n, we have

(UT)(v_j) = U(T(v_j)) = U(Σ_{k=1}^m B_kj w_k) = Σ_{k=1}^m B_kj U(w_k)
          = Σ_{k=1}^m B_kj (Σ_{i=1}^p A_ik z_i) = Σ_{i=1}^p (Σ_{k=1}^m A_ik B_kj) z_i
          = Σ_{i=1}^p C_ij z_i,

where

C_ij = Σ_{k=1}^m A_ik B_kj.
This computation motivates the following definition of matrix multiplication.
Definition. Let A be an m x n matrix and B be an n x p matrix. We
define the product of A and B, denoted AB, to be the m x p matrix such
that

(AB)_ij = Σ_{k=1}^n A_ik B_kj   for 1 ≤ i ≤ m, 1 ≤ j ≤ p.
Note that (AB)ij is the sum of products of corresponding entries from the
ith row of A and the jth column of B. Some interesting applications of this
definition are presented at the end of this section.
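The definition can also be checked by a direct computation. The following is a minimal numerical sketch, assuming Python with NumPy is available (NumPy is not part of the text, and the function name below is illustrative only); it forms AB entry by entry exactly as in the definition and agrees with the product in Example 1 below.

```python
import numpy as np

def matmul_from_definition(A, B):
    """Compute AB entrywise from (AB)_ij = sum_k A_ik * B_kj."""
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "inner dimensions must agree: (m x n)(n x p)"
    AB = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            AB[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return AB

A = np.array([[1, 2, 1], [0, 4, -1]])
B = np.array([[4], [2], [5]])
print(matmul_from_definition(A, B))   # [[13.], [3.]], the same as A @ B
```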

88 Chap. 2 Linear Transformations and Matrices
The reader should observe that in order for the product AB to be defined,
there are restrictions regarding the relative sizes of A and B. The following
mnemonic device is helpful: "(m x n)·(n x p) = (m x p)"; that is, in order
for the product AB to be defined, the two "inner" dimensions must be equal,
and the two "outer" dimensions yield the size of the product.
Example 1
We have

[1 2  1] [4]   [1·4 + 2·2 + 1·5   ]   [13]
[0 4 -1] [2] = [0·4 + 4·2 + (-1)·5] = [ 3].
         [5]

Notice again the symbolic relationship (2 x 3)·(3 x 1) = 2 x 1. •
As in the case with composition of functions, we have that matrix multiplication is not commutative. Consider the following two products:

[1 1][0 1]   [1 1]             [0 1][1 1]   [0 0]
[0 0][1 0] = [0 0]    and      [1 0][0 0] = [1 1].

Hence we see that even if both of the matrix products AB and BA are defined,
it need not be true that AB = BA.
Recalling the definition of the transpose of a matrix from Section 1.3, we
show that if A is an m x n matrix and B is an n x p matrix, then (AB)^t = B^t A^t.
Since

((AB)^t)_ij = (AB)_ji = Σ_{k=1}^n A_jk B_ki

and

(B^t A^t)_ij = Σ_{k=1}^n (B^t)_ik (A^t)_kj = Σ_{k=1}^n B_ki A_jk,
we are finished. Therefore the transpose of a product is the product of the
transposes in the opposite order.
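This identity is easy to test numerically as well. Below is a small sketch, assuming Python with NumPy (an assumption, not part of the text); the sizes 2 x 3 and 3 x 4 are arbitrary choices for illustration.

```python
import numpy as np

# Spot-check of (AB)^t = B^t A^t for one randomly chosen pair of integer matrices.
rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(2, 3))
B = rng.integers(-5, 5, size=(3, 4))
assert np.array_equal((A @ B).T, B.T @ A.T)
print("transpose-of-product identity holds for this example")
```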
The next theorem is an immediate consequence of our definition of matrix
multiplication.
Theorem 2.11. Let V, W, and Z be finite-dimensional vector spaces with
ordered bases α, β, and γ, respectively. Let T: V → W and U: W → Z be
linear transformations. Then

[UT]_α^γ = [U]_β^γ [T]_α^β.

Sec. 2.3 Composition of Linear Transformations and Matrix Multiplication 89
Corollary. Let V be a finite-dimensional vector space with an ordered
basis β. Let T, U ∈ ℒ(V). Then [UT]_β = [U]_β [T]_β.
We illustrate Theorem 2.11 in the next example.
Example 2
Let U: P_3(R) → P_2(R) and T: P_2(R) → P_3(R) be the linear transformations
respectively defined by

U(f(x)) = f'(x)   and   T(f(x)) = ∫_0^x f(t) dt.

Let α and β be the standard ordered bases of P_3(R) and P_2(R), respectively.
From calculus, it follows that UT = I, the identity transformation on P_2(R).
To illustrate Theorem 2.11, observe that

[UT]_β = [U]_α^β [T]_β^α = [0 1 0 0] [0   0   0 ]   [1 0 0]
                           [0 0 2 0] [1   0   0 ] = [0 1 0]
                           [0 0 0 3] [0  1/2  0 ]   [0 0 1].
                                     [0   0  1/3]
The preceding 3 x 3 diagonal matrix is called an identity matrix and is
defined next, along with a very useful notation, the Kronecker delta.

Definitions. We define the Kronecker delta δ_ij by δ_ij = 1 if i = j and
δ_ij = 0 if i ≠ j. The n x n identity matrix I_n is defined by (I_n)_ij = δ_ij.

Thus, for example,

I_1 = (1),   I_2 = [1 0]            [1 0 0]
                   [0 1],  and I_3 = [0 1 0]
                                     [0 0 1].
The next theorem provides analogs of (a), (c), and (d) of Theorem 2.10.
Theorem 2.10(b) has its analog in Theorem 2.16. Observe also that part (c) of
the next theorem illustrates that the identity matrix acts as a multiplicative
identity in MnXn(F). When the context is clear, we sometimes omit the
subscript n from In.
Theorem 2.12. Let A be an m x n matrix, B and C be n x p matrices,
and D and E be q x m matrices. Then
(a) A(B + C) = AB + AC and (D + E)A = DA + EA.
(b) a(AB) = (aA)B = A(aB) for any scalar a.
(c) I_m A = A = A I_n.
(d) If V is an n-dimensional vector space with an ordered basis β, then
[I_V]_β = I_n.

90 Chap. 2 Linear Transformations and Matrices
Proof. We prove the first half of (a) and (c) and leave the remaining proofs
as an exercise. (See Exercise 5.)
(a) We have

[A(B + C)]_ij = Σ_{k=1}^n A_ik (B + C)_kj = Σ_{k=1}^n A_ik (B_kj + C_kj)
             = Σ_{k=1}^n (A_ik B_kj + A_ik C_kj) = Σ_{k=1}^n A_ik B_kj + Σ_{k=1}^n A_ik C_kj
             = (AB)_ij + (AC)_ij = [AB + AC]_ij.

So A(B + C) = AB + AC.
(c) We have

(I_m A)_ij = Σ_{k=1}^m (I_m)_ik A_kj = Σ_{k=1}^m δ_ik A_kj = A_ij.
Corollary. Let A be an m x n matrix, B_1, B_2, ..., B_k be n x p matrices,
C_1, C_2, ..., C_k be q x m matrices, and a_1, a_2, ..., a_k be scalars. Then

A(Σ_{i=1}^k a_i B_i) = Σ_{i=1}^k a_i A B_i

and

(Σ_{i=1}^k a_i C_i) A = Σ_{i=1}^k a_i C_i A.
Proof. Exercise.
For an n x n matrix A, we define A^1 = A, A^2 = AA, A^3 = A^2 A, and, in
general, A^k = A^{k-1} A for k = 2, 3, .... We define A^0 = I_n.
With this notation, we see that if

A = [0 0]
    [1 0],

then A^2 = O (the zero matrix) even though A ≠ O. Thus the cancellation
property for multiplication in fields is not valid for matrices. To see why,
assume that the cancellation law is valid. Then, from A·A = A^2 = O = A·O,
we would conclude that A = O, which is false.
Theorem 2.13. Let A be an m x n matrix and B be an n x p matrix.
For each j (1 ≤ j ≤ p) let u_j and v_j denote the jth columns of AB and B,
respectively. Then

Sec. 2.3 Composition of Linear Transformations and Matrix Multiplication 91
(a) u_j = Av_j;
(b) v_j = Be_j, where e_j is the jth standard vector of F^p.
Proof. (a) We have

      [(AB)_1j]   [Σ_{k=1}^n A_1k B_kj]     [B_1j]
u_j = [(AB)_2j] = [Σ_{k=1}^n A_2k B_kj] = A [B_2j] = A v_j.
      [   ...  ]   [        ...        ]     [ ... ]
      [(AB)_mj]   [Σ_{k=1}^n A_mk B_kj]     [B_nj]

Hence (a) is proved. The proof of (b) is left as an exercise. (See Exercise 6.)
It follows (see Exercise 14) from Theorem 2.13 that column j of AB is
a linear combination of the columns of A with the coefficients in the linear
combination being the entries of column j of B. An analogous result holds
for rows; that is, row i of AB is a linear combination of the rows of B with
the coefficients in the linear combination being the entries of row i of A.
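The column statement is easy to verify experimentally. Here is a minimal sketch, assuming Python with NumPy (an assumption of this illustration, not part of the text), with arbitrarily chosen integer matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-3, 4, size=(3, 4))
B = rng.integers(-3, 4, size=(4, 2))
AB = A @ B

for j in range(B.shape[1]):
    # Theorem 2.13(a): column j of AB is A times column j of B, i.e. a linear
    # combination of the columns of A with coefficients taken from B[:, j].
    assert np.array_equal(AB[:, j], A @ B[:, j])
    assert np.array_equal(AB[:, j],
                          sum(B[k, j] * A[:, k] for k in range(A.shape[1])))
print("column description of AB verified for this example")
```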
The next result justifies much of our past work. It utilizes both the matrix
representation of a linear transformation and matrix multiplication in order
to evaluate the transformation at any given vector.
Theorem 2.14. Let V and W be finite-dimensional vector spaces having
ordered bases β and γ, respectively, and let T: V → W be linear. Then, for
each u ∈ V, we have

[T(u)]_γ = [T]_β^γ [u]_β.

Proof. Fix u ∈ V, and define the linear transformations f: F → V by
f(a) = au and g: F → W by g(a) = aT(u) for all a ∈ F. Let α = {1} be
the standard ordered basis for F. Notice that g = Tf. Identifying column
vectors as matrices and using Theorem 2.11, we obtain

[T(u)]_γ = [g(1)]_γ = [g]_α^γ = [Tf]_α^γ = [T]_β^γ [f]_α^β = [T]_β^γ [f(1)]_β = [T]_β^γ [u]_β.  •
Example 3
Let T: P_3(R) → P_2(R) be the linear transformation defined by T(f(x)) =
f'(x), and let β and γ be the standard ordered bases for P_3(R) and P_2(R),
respectively. If A = [T]_β^γ, then, from Example 4 of Section 2.2, we have

A = [0 1 0 0]
    [0 0 2 0]
    [0 0 0 3].

92 Chap. 2 Linear Transformations and Matrices
We illustrate Theorem 2.14 by verifying that [T(p(x))]_γ = [T]_β^γ [p(x)]_β, where
p(x) ∈ P_3(R) is the polynomial p(x) = 2 - 4x + x^2 + 3x^3. Let q(x) = T(p(x));
then q(x) = p'(x) = -4 + 2x + 9x^2. Hence

[T(p(x))]_γ = [q(x)]_γ = [-4]
                          [ 2]
                          [ 9],

but also

[T]_β^γ [p(x)]_β = A[p(x)]_β = [0 1 0 0] [ 2]   [-4]
                               [0 0 2 0] [-4] = [ 2]
                               [0 0 0 3] [ 1]   [ 9].
                                         [ 3]
We complete this section with the introduction of the left-multiplication
transformation LA, where A is an mxn matrix. This transformation is proba­
bly the most important tool for transferring properties about transformations
to analogous properties about matrices and vice versa. For example, we use
it to prove that matrix multiplication is associative.
Definition. Let A be an m x n matrix with entries from a field F.
We denote by L_A the mapping L_A: F^n → F^m defined by L_A(x) = Ax (the
matrix product of A and x) for each column vector x ∈ F^n. We call L_A a
left-multiplication transformation.
Example 4
Let

A = [1 2 1]
    [0 1 2].

Then A ∈ M_{2x3}(R) and L_A: R3 → R2. If x ∈ R3 is the column vector with
entries x_1, x_2, x_3, then

L_A(x) = Ax = [1 2 1] [x_1]   [x_1 + 2x_2 + x_3]
              [0 1 2] [x_2] = [x_2 + 2x_3      ].
                      [x_3]
We see in the next theorem that not only is L_A linear, but, in fact, it has
a great many other useful properties. These properties are all quite natural
and so are easy to remember.
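As a quick numerical illustration (a sketch assuming Python with NumPy, which is an assumption of this example rather than part of the text), linearity of L_A can be checked on sample vectors.

```python
import numpy as np

A = np.array([[1, 2, 1],
              [0, 1, 2]])

def L_A(x):
    """Left-multiplication transformation L_A: R^3 -> R^2, x |-> Ax."""
    return A @ x

# L_A(ax + y) = a L_A(x) + L_A(y) for the sample vectors below.
x, y, a = np.array([1, 0, -1]), np.array([2, 3, 1]), 5
assert np.array_equal(L_A(a * x + y), a * L_A(x) + L_A(y))
print("L_A is linear on these sample vectors")
```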

Sec. 2.3 Composition of Linear Transformations and Matrix Multiplication 93
Theorem 2.15. Let A be an m x n matrix with entries from F. Then
the left-multiplication transformation L_A: F^n → F^m is linear. Furthermore,
if B is any other m x n matrix (with entries from F) and β and γ are the
standard ordered bases for F^n and F^m, respectively, then we have the following
properties.
(a) [L_A]_β^γ = A.
(b) L_A = L_B if and only if A = B.
(c) L_{A+B} = L_A + L_B and L_{aA} = aL_A for all a ∈ F.
(d) If T: F^n → F^m is linear, then there exists a unique m x n matrix C such
that T = L_C. In fact, C = [T]_β^γ.
(e) If E is an n x p matrix, then L_{AE} = L_A L_E.
(f) If m = n, then L_{I_n} = I_{F^n}.
Proof. The fact that L_A is linear follows immediately from Theorem 2.12.
(a) The jth column of [L_A]_β^γ is equal to L_A(e_j). However L_A(e_j) = Ae_j,
which is also the jth column of A by Theorem 2.13(b). So [L_A]_β^γ = A.
(b) If L_A = L_B, then we may use (a) to write A = [L_A]_β^γ = [L_B]_β^γ = B.
Hence A = B. The proof of the converse is trivial.
(c) The proof is left as an exercise. (See Exercise 7.)
(d) Let C = [T]_β^γ. By Theorem 2.14, we have [T(x)]_γ = [T]_β^γ [x]_β, or
T(x) = Cx = L_C(x) for all x ∈ F^n. So T = L_C. The uniqueness of C follows
from (b).
(e) For any j (1 ≤ j ≤ p), we may apply Theorem 2.13 several times to
note that (AE)e_j is the jth column of AE and that the jth column of AE is
also equal to A(Ee_j). So (AE)e_j = A(Ee_j). Thus

L_{AE}(e_j) = (AE)e_j = A(Ee_j) = L_A(Ee_j) = L_A(L_E(e_j)).

Hence L_{AE} = L_A L_E by the corollary to Theorem 2.6 (p. 73).
(f) The proof is left as an exercise. (See Exercise 7.)
We now use left-multiplication transformations to establish the associa­
tivity of matrix multiplication.
Theorem 2.16. Let A, B, and C be matrices such that A(BC) is defined. Then (AB)C is also defined and A(BC) = (AB)C; that is, matrix
multiplication is associative.
Proof. It is left to the reader to show that (AB)C is defined. Using (e)
of Theorem 2.15 and the associativity of functional composition (see Appendix B), we have

L_{A(BC)} = L_A L_{BC} = L_A(L_B L_C) = (L_A L_B)L_C = L_{AB} L_C = L_{(AB)C}.

So from (b) of Theorem 2.15, it follows that A(BC) = (AB)C.

94 Chap. 2 Linear Transformations and Matrices
Needless to say, this theorem could be proved directly from the definition
of matrix multiplication (see Exercise 18). The proof above, however, provides
a prototype of many of the arguments that utilize the relationships between
linear transformations and matrices.
Applications
A large and varied collection of interesting applications arises in connec­
tion with special matrices called incidence matrices. An incidence matrix
is a square matrix in which all the entries are either zero or one and, for
convenience, all the diagonal entries are zero. If we have a relationship on a
set of n objects that we denote by 1,2,... ,n, then we define the associated
incidence matrix A by A_ij = 1 if i is related to j, and A_ij = 0 otherwise.
To make things concrete, suppose that we have four people, each of whom
owns a communication device. If the relationship on this group is "can transmit to," then A_ij = 1 if i can send a message to j, and A_ij = 0 otherwise.
Suppose that

A = [0 1 0 0]
    [1 0 0 1]
    [0 1 0 1]
    [1 1 0 0].
Then since A34 = 1 and A14 = 0, we see that person 3 can send to 4 but 1
cannot send to 4.
We obtain an interesting interpretation of the entries of A^2. Consider, for
instance,

(A^2)_31 = A_31 A_11 + A_32 A_21 + A_33 A_31 + A_34 A_41.

Note that any term A_3k A_k1 equals 1 if and only if both A_3k and A_k1 equal 1,
that is, if and only if 3 can send to k and k can send to 1. Thus (A^2)_31 gives
the number of ways in which 3 can send to 1 in two stages (or in one relay).
Since

A^2 = [1 0 0 1]
      [1 2 0 0]
      [2 1 0 1]
      [1 1 0 1],

we see that there are two ways 3 can send to 1 in two stages. In general,
(A + A^2 + ... + A^m)_ij is the number of ways in which i can send to j in at
most m stages.
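These counts are easy to reproduce by machine. The sketch below assumes Python with NumPy (an assumption, not part of the text) and uses the incidence matrix of the example above.

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 1],
              [0, 1, 0, 1],
              [1, 1, 0, 0]])

A2 = A @ A
print(A2[2, 0])   # 2: the number of ways person 3 can send to person 1 in two stages
print(A + A2)     # (A + A^2)_ij counts the ways i can send to j in at most two stages
```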
A maximal collection of three or more people with the property that any
two can send to each other is called a clique. The problem of determining
cliques is difficult, but there is a simple method for determining if someone

Sec. 2.3 Composition of Linear Transformations and Matrix Multiplication 95
belongs to a clique. If we define a new matrix B by B_ij = 1 if i and j can send
to each other, and B_ij = 0 otherwise, then it can be shown (see Exercise 19)
that person i belongs to a clique if and only if (B^3)_ii > 0. For example,
suppose that the incidence matrix associated with some relationship is

A = [0 1 0 1]
    [1 0 1 0]
    [1 1 0 1]
    [1 1 1 0].

To determine which people belong to cliques, we form the matrix B, described
earlier, and compute B^3. In this case,

B = [0 1 0 1]              [0 4 0 4]
    [1 0 1 0]              [4 0 4 0]
    [0 1 0 1]   and  B^3 = [0 4 0 4]
    [1 0 1 0]              [4 0 4 0].

Since all the diagonal entries of B^3 are zero, we conclude that there are no
cliques in this relationship.
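The clique test can also be carried out mechanically. Below is a minimal sketch, assuming Python with NumPy (an assumption of this illustration); it rebuilds B from A by keeping only the mutual links and inspects the diagonal of B^3.

```python
import numpy as np

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [1, 1, 1, 0]])

# B records mutual links: B_ij = 1 exactly when A_ij = A_ji = 1.
B = A * A.T
B3 = np.linalg.matrix_power(B, 3)
print(np.diag(B3))                               # all zeros here
clique_members = [i + 1 for i in range(4) if B3[i, i] > 0]
print(clique_members)                            # [], so there are no cliques
```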
Our final example of the use of incidence matrices is concerned with the
concept of dominance. A relation among a group of people is called a dom­
inance relation if the associated incidence matrix A has the property that
for all distinct pairs i and j, A_ij = 1 if and only if A_ji = 0; that is, given
any two people, exactly one of them dominates (or, using the terminology of
our first example, can send a message to) the other. Since A is an incidence
matrix, A_ii = 0 for all i. For such a relation, it can be shown (see Exercise 21)
that the matrix A 4- A2 has a row [column] in which each entry is positive
except for the diagonal entry. In other words, there is at least one person
who dominates [is dominated by] all others in one or two stages. In fact, it
can be shown that any person who dominates [is dominated by] the greatest
number of people in the first stage has this property. Consider, for example,
the matrix
A = [0 1 0 1 0]
    [0 0 1 0 0]
    [1 0 0 1 0]
    [0 1 0 0 1]
    [1 1 1 0 0].

The reader should verify that this matrix corresponds to a dominance relation.
Now

A + A^2 = [0 2 1 1 1]
          [1 0 1 1 0]
          [1 2 0 2 1]
          [1 2 2 0 1]
          [2 2 2 2 0].

96 Chap. 2 Linear Transformations and Matrices
Thus persons 1, 3, 4, and 5 dominate (can send messages to) all the others
in at most two stages, while persons 1, 2, 3, and 4 are dominated by (can
receive messages from) all the others in at most two stages.
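The same conclusion can be reached by computer. The sketch below assumes Python with NumPy (an assumption, not part of the text) and simply scans the rows and columns of A + A^2 for positive off-diagonal entries.

```python
import numpy as np

A = np.array([[0, 1, 0, 1, 0],
              [0, 0, 1, 0, 0],
              [1, 0, 0, 1, 0],
              [0, 1, 0, 0, 1],
              [1, 1, 1, 0, 0]])

S = A + A @ A
# Person i dominates everyone within two stages iff row i of S is positive off the diagonal.
dominators = [i + 1 for i in range(5) if all(S[i, j] > 0 for j in range(5) if j != i)]
dominated  = [j + 1 for j in range(5) if all(S[i, j] > 0 for i in range(5) if i != j)]
print(dominators)   # [1, 3, 4, 5]
print(dominated)    # [1, 2, 3, 4]
```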
EXERCISES
1. Label the following statements as true or false. In each part, V, W,
and Z denote vector spaces with ordered (finite) bases α, β, and γ,
respectively; T: V → W and U: W → Z denote linear transformations;
and A and B denote matrices.
(a) [UT]_α^γ = [T]_α^β [U]_β^γ.
(b) [T(v)]_β = [T]_α^β [v]_α for all v ∈ V.
(c) [U(w)]_β = [U]_β [w]_β for all w ∈ W.
(d) [I_V]_α = I.
(e) [T^2]_α = ([T]_α)^2.
(f) A^2 = I implies that A = I or A = -I.
(g) T = L_A for some matrix A.
(h) A^2 = O implies that A = O, where O denotes the zero matrix.
(i) L_{A+B} = L_A + L_B.
(j) If A is square and A_ij = δ_ij for all i and j, then A = I.
2. (a) Let
/
A =
C =
G
(-:
-?)<
1')
-2 Oj
B=0?
, and B =
D-
(1
Compute A(2B + 3C), (AB)D, and A(BD).
(b) Let
B =
and C = (4 0 3). Compute A^t, A^t B, BC^t, CB, and CA.
3. Let g(x) = 3 + x. Let T: P_2(R) → P_2(R) and U: P_2(R) → R3 be the
linear transformations respectively defined by

T(f(x)) = f'(x)g(x) + 2f(x)   and   U(a + bx + cx^2) = (a + b, c, a - b).

Let β and γ be the standard ordered bases of P_2(R) and R3, respectively.

Sec. 2.3 Composition of Linear Transformations and Matrix Multiplication 97
(a) Compute [U]_β^γ, [T]_β, and [UT]_β^γ directly. Then use Theorem 2.11
to verify your result.
(b) Let h(x) = 3 - 2x + x^2. Compute [h(x)]_β and [U(h(x))]_γ. Then
use [U]_β^γ from (a) and Theorem 2.14 to verify your result.
4. For each of the following parts, let T be the linear transformation defined
in the corresponding part of Exercise 5 of Section 2.2. Use Theorem 2.14
to compute the following vectors:
(a) [T(A)]_α, where A = (...).
(b) [T(f(x))]_α, where f(x) = 4 - ...
(c) [T(A)]_γ, where A = (...).
(d) [T(f(x))]_γ, where f(x) = 6 - x + 2x^2.
5. Complete the proof of Theorem 2.12 and its corollary.
6. Prove (b) of Theorem 2.13.
7. Prove (c) and (f) of Theorem 2.15.
8. Prove Theorem 2.10. Now state and prove a more general result involv­
ing linear transformations with domains unequal/to their codomains.
9. Find linear transformations U, T: F^2 → F^2 such that UT = T_0 (the zero
transformation) but TU ≠ T_0. Use your answer to find matrices A and
B such that AB = O but BA ≠ O.
10. Let A be an n x n matrix. Prove that A is a diagonal matrix if and
only if A_ij = δ_ij A_jj for all i and j.
11. Let V be a vector space, and let T: V → V be linear. Prove that T^2 = T_0
if and only if R(T) ⊆ N(T).
12. Let V, W, and Z be vector spaces, and let T: V → W and U: W → Z
be linear.
(a) Prove that if UT is one-to-one, then T is one-to-one. Must U also
be one-to-one?
(b) Prove that if UT is onto, then U is onto. Must T also be onto?
(c) Prove that if U and T are one-to-one and onto, then UT is also.
13. Let A and B be n x n matrices. Recall that the trace of A is defined
by

tr(A) = Σ_{i=1}^n A_ii.

Prove that tr(AB) = tr(BA) and tr(A) = tr(A^t).

98 Chap. 2 Linear Transformations and Matrices
14. Assume the notation in Theorem 2.13.
(a) Suppose that z is a (column) vector in F^p. Use Theorem 2.13(b)
to prove that Bz is a linear combination of the columns of B. In
particular, if z = (a_1, a_2, ..., a_p)^t, then show that

Bz = Σ_{j=1}^p a_j v_j.

(b) Extend (a) to prove that column j of AB is a linear combination
of the columns of A with the coefficients in the linear combination
being the entries of column j of B.
(c) For any row vector w ∈ F^m, prove that wA is a linear combination
of the rows of A with the coefficients in the linear combination
being the coordinates of w. Hint: Use properties of the transpose
operation applied to (a).
(d) Prove the analogous result to (b) about rows: Row i of AB is a
linear combination of the rows of B with the coefficients in the
linear combination being the entries of row i of A.
15.* Let M and A be matrices for which the product matrix MA is defined.
If the jth column of A is a linear combination of a set of columns
of A, prove that the jth column of MA is a linear combination of the
corresponding columns of MA with the same corresponding coefficients.
16. Let V be a finite-dimensional vector space, and let T: V → V be linear.
(a) If rank(T) = rank(T^2), prove that R(T) ∩ N(T) = {0}. Deduce
that V = R(T) ⊕ N(T) (see the exercises of Section 1.3).
(b) Prove that V = R(T^k) ⊕ N(T^k) for some positive integer k.
17. Let V be a vector space. Determine all linear transformations T: V → V
such that T = T^2. Hint: Note that x = T(x) + (x - T(x)) for every
x in V, and show that V = {y: T(y) = y} ⊕ N(T) (see the exercises of
Section 1.3).
18. Using only the definition of matrix multiplication, prove that multipli­
cation of matrices is associative.
19. For an incidence matrix A with related matrix B defined by B_ij = 1 if
i is related to j and j is related to i, and B_ij = 0 otherwise, prove that
i belongs to a clique if and only if (B^3)_ii > 0.
20. Use Exercise 19 to determine the cliques in the relations corresponding
to the following incidence matrices.

Sec. 2.4 Invertibility and Isomorphisms 99
(a)  [0 1 0 1]        (b)  [0 0 1 1]
     [1 0 0 0]             [1 0 0 1]
     [0 1 0 1]             [1 0 0 1]
     [1 0 1 0]             [1 0 1 0]
21. Let A be an incidence matrix that is associated with a dominance rela­
tion. Prove that the matrix A 4- A2 has a row [column] in which each
entry is positive except for the diagonal entry.
22. Prove that the matrix
A =
corresponds to a dominance relation. Use Exercise 21 to determine
which persons dominate [are dominated by] each of the others within
two stages.
23. Let A be an n x n incidence matrix that corresponds to a dominance
relation. Determine the number of nonzero entries of A.
2.4 INVERTIBILITY AND ISOMORPHISMS
The concept of invertibility is introduced quite early in the study of functions.
Fortunately, many of the intrinsic properties of functions are shared by their
inverses. For example, in calculus we learn that the properties of being con­
tinuous or differentiable are generally retained by the inverse functions. We
see in this section (Theorem 2.17) that the inverse of a linear transformation
is also linear. This result greatly aids us in the study of inverses of matrices.
As one might expect from Section 2.3, the inverse of the left-multiplication
transformation L,.\ (when it exists) can be used to determine properties of the
inverse of the matrix A.
In the remainder of this section, we apply many of the results about in­
vertibility to the concept of isomorphism. We will see that finite-dimensional
vector spaces (over F) of equal dimension may be identified. These ideas will
be made precise shortly.
The facts about inverse functions presented in Appendix B are, of course,
true for linear transformations. Nevertheless, we repeat some of the defini­
tions for use in this section.
Definition. Let V and W be vector spaces, and let T: V —> W be linear.
A function U: W → V is said to be an inverse of T if TU = I_W and UT = I_V.
If T has an inverse, then T is said to be invertible. As noted in Appendix B,
if T is invertible, then the inverse of T is unique and is denoted by T^{-1}.

100 Chap. 2 Linear Transformations and Matrices
The following facts hold for invertible functions T and U.
1. (TU)^{-1} = U^{-1}T^{-1}.
2. (T^{-1})^{-1} = T; in particular, T^{-1} is invertible.
We often use the fact that a function is invertible if and only if it is both
one-to-one and onto. We can therefore restate Theorem 2.5 as follows.
3. Let T: V —• W be a linear transformation, where V and W are finite-
dimensional spaces of equal dimension. Then T is invertible if and only
if rank(T) = dim(V).
Example 1
Let T: P_1(R) → R2 be the linear transformation defined by T(a + bx) =
(a, a + b). The reader can verify directly that T^{-1}: R2 → P_1(R) is defined by
T^{-1}(c, d) = c + (d - c)x. Observe that T^{-1} is also linear. As Theorem 2.17
demonstrates, this is true in general. •
Theorem 2.17. Let V and W be vector spaces, and let T: V → W be
linear and invertible. Then T^{-1}: W → V is linear.
Proof. Let y_1, y_2 ∈ W and c ∈ F. Since T is onto and one-to-one, there
exist unique vectors x_1 and x_2 such that T(x_1) = y_1 and T(x_2) = y_2. Thus
x_1 = T^{-1}(y_1) and x_2 = T^{-1}(y_2); so

T^{-1}(cy_1 + y_2) = T^{-1}[cT(x_1) + T(x_2)] = T^{-1}[T(cx_1 + x_2)]
                  = cx_1 + x_2 = cT^{-1}(y_1) + T^{-1}(y_2).
It now follows immediately from Theorem 2.5 (p. 71) that if T is a linear
transformation between vector spaces of equal (finite) dimension, then the
conditions of being invertible, one-to-one, and onto are all equivalent.
We are now ready to define the inverse of a matrix. The reader should
note the analogy with the inverse of a linear transformation.
Definition. Let A be an n x n matrix. Then A is invertible if there
exists an n x n matrix B such that AB = BA = I.
If A is invertible, then the matrix B such that AB = BA = I is unique. (If
C were another such matrix, then C = CI = C(AB) = (CA)B = IB = B.)
The matrix B is called the inverse of A and is denoted by A^{-1}.
Example 2
The reader should verify that the inverse of

[5 7]      [ 3 -7]
[2 3]  is  [-2  5].  •
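The verification itself is a one-line matrix product. Here is a minimal sketch, assuming Python with NumPy (an assumption, not part of the text).

```python
import numpy as np

A = np.array([[5, 7],
              [2, 3]])
B = np.array([[ 3, -7],
              [-2,  5]])
print(A @ B)   # the 2 x 2 identity
print(B @ A)   # the 2 x 2 identity, so B = A^{-1}
```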

Sec. 2.4 Invertibility and Isomorphisms 101
In Section 3.2, we learn a technique for computing the inverse of a matrix.
At this point, we develop a number of results that relate the inverses of
matrices to the inverses of linear transformations.
Lemma. Let T be an invertible linear transformation from V to W. Then
V is finite-dimensional if and only if W is finite-dimensional. In this case,
dim(V) = dim(W).
Proof. Suppose that V is finite-dimensional. Let ft = {x\,X2, • • •, xn} be a
basis for V. By Theorem 2.2 (p. 68), T(/?) spans R(T) = W; hence W is finite-
dimensional by Theorem 1.9 (p. 44). Conversely, if W is finite-dimensional,
then so is V by a similar argument, using T_1.
Now suppose that V and W are finite-dimensional. Because T is one-to-one
and onto, we have
nullity(T) = 0 and rank(T) = dim(R(T)) = dim(W).
So by the dimension theorem (p. 70), it follows that dim(V) = dim(W). 1
Theorem 2.18. Let V and W be finite-dimensional vector spaces with
ordered bases β and γ, respectively. Let T: V → W be linear. Then T is
invertible if and only if [T]_β^γ is invertible. Furthermore, [T^{-1}]_γ^β = ([T]_β^γ)^{-1}.
Proof. Suppose that T is invertible. By the lemma, we have dim(V) =
dim(W). Let n = dim(V). So [T]_β^γ is an n x n matrix. Now T^{-1}: W → V
satisfies TT^{-1} = I_W and T^{-1}T = I_V. Thus

I_n = [I_V]_β = [T^{-1}T]_β = [T^{-1}]_γ^β [T]_β^γ.

Similarly, [T]_β^γ [T^{-1}]_γ^β = I_n. So [T]_β^γ is invertible and ([T]_β^γ)^{-1} = [T^{-1}]_γ^β.
Now suppose that A = [T]_β^γ is invertible. Then there exists an n x n
matrix B such that AB = BA = I_n. By Theorem 2.6 (p. 72), there exists
U ∈ ℒ(W, V) such that

U(w_j) = Σ_{i=1}^n B_ij v_i   for j = 1, 2, ..., n,

where γ = {w_1, w_2, ..., w_n} and β = {v_1, v_2, ..., v_n}. It follows that [U]_γ^β =
B. To show that U = T^{-1}, observe that

[UT]_β = [U]_γ^β [T]_β^γ = BA = I_n = [I_V]_β

by Theorem 2.11 (p. 88). So UT = I_V, and similarly, TU = I_W.

102 Chap. 2 Linear Transformations and Matrices
Example 3
Let β and γ be the standard ordered bases of P_1(R) and R2, respectively. For
T as in Example 1, we have

[T]_β^γ = [1 0]       and   [T^{-1}]_γ^β = [ 1 0]
          [1 1]                            [-1 1].

It can be verified by matrix multiplication that each matrix is the inverse of
the other. •
Corollary 1. Let V be a finite-dimensional vector space with an ordered
basis β, and let T: V → V be linear. Then T is invertible if and only if [T]_β
is invertible. Furthermore, [T^{-1}]_β = ([T]_β)^{-1}.
Proof. Exercise.
Corollary 2. Let A be an n x n matrix. Then A is invertible if and only
if L_A is invertible. Furthermore, (L_A)^{-1} = L_{A^{-1}}.
Proof. Exercise.
The notion of invertibility may be used to formalize what may already
have been observed by the reader, that is, that certain vector spaces strongly
resemble one another except for the form of their vectors. For example, in
the case of M_{2x2}(F) and F^4, if we associate to each matrix

[a b]
[c d]

the 4-tuple (a, b, c, d), we see that sums and scalar products associate in a
similar manner; that is, in terms of the vector space structure, these two
vector spaces may be considered identical or isomorphic.
Definitions. Let V and W be vector spaces. We say that V is isomor­
phic to W if there exists a linear transformation T: V —> W that is invertible.
Such a linear transformation is called an isomorphism from V onto W.
We leave as an exercise (see Exercise 13) the proof that "is isomorphic
to" is an equivalence relation. (See Appendix A.) So we need only say that
V and W are isomorphic.
Example 4
Define T: F^2 → P_1(F) by T(a_1, a_2) = a_1 + a_2 x. It is easily checked that T is
an isomorphism; so F^2 is isomorphic to P_1(F). •

Sec. 2.4 Invertibility and Isomorphisms
Example 5
Define T: P_3(R) → M_{2x2}(R) by

T(f) = [f(1) f(2)]
       [f(3) f(4)].

It is easily verified that T is linear. By use of the Lagrange interpolation
formula in Section 1.6, it can be shown (compare with Exercise 22) that
T(f) = O only when f is the zero polynomial. Thus T is one-to-one (see
Exercise 11). Moreover, because dim(P_3(R)) = dim(M_{2x2}(R)), it follows that
T is invertible by Theorem 2.5 (p. 71). We conclude that P_3(R) is isomorphic
to M_{2x2}(R). •
In each of Examples 4 and 5, the reader may have observed that isomor­
phic vector spaces have equal dimensions. As the next theorem shows, this
is no coincidence.
Theorem 2.19. Let V and W be finite-dimensional vector spaces (over
the same field). Then V is isomorphic to W if and only if dim(V) = dim(W).
Proof. Suppose that V is isomorphic to W and that T: V —> W is an
isomorphism from V to W. By the lemma preceding Theorem 2.18, we have
that dim(V) = dim(W). /
Now suppose that dim(V) = dim(W), and let β = {v_1, v_2, ..., v_n} and
γ = {w_1, w_2, ..., w_n} be bases for V and W, respectively. By Theorem 2.6
(p. 72), there exists T: V → W such that T is linear and T(v_i) = w_i for
i = 1, 2, ..., n. Using Theorem 2.2 (p. 68), we have

R(T) = span(T(β)) = span(γ) = W.

So T is onto. From Theorem 2.5 (p. 71), we have that T is also one-to-one.
Hence T is an isomorphism.
By the lemma to Theorem 2.18, if V and W are isomorphic, then either
both of V and W are finite-dimensional or both are infinite-dimensional.
Corollary. Let V be a vector space over F. Then V is isomorphic to Fn
if and only if dim(V) = n.
Up to this point, we have associated linear transformations with their
matrix representations. We are now in a position to prove that, as a vector
space, the collection of all linear transformations between two given vector
spaces may be identified with the appropriate vector space of m x n matrices.
Theorem 2.20. Let V and W be finite-dimensional vector spaces over F
of dimensions n and m, respectively, and let β and γ be ordered bases for V
and W, respectively. Then the function Φ: ℒ(V, W) → M_{m×n}(F), defined by
Φ(T) = [T]_β^γ for T ∈ ℒ(V, W), is an isomorphism.

104 Chap. 2 Linear Transformations and Matrices
Proof. By Theorem 2.8 (p. 82), Φ is linear. Hence we must show that Φ
is one-to-one and onto. This is accomplished if we show that for every m x n
matrix A, there exists a unique linear transformation T: V → W such that
Φ(T) = A. Let β = {v_1, v_2, ..., v_n}, γ = {w_1, w_2, ..., w_m}, and let A be a
given m x n matrix. By Theorem 2.6 (p. 72), there exists a unique linear
transformation T: V → W such that

T(v_j) = Σ_{i=1}^m A_ij w_i   for 1 ≤ j ≤ n.

But this means that [T]_β^γ = A, or Φ(T) = A. Thus Φ is an isomorphism.
Corollary. Let V and W be finite-dimensional vector spaces of dimensions
n and m, respectively. Then ℒ(V, W) is finite-dimensional of dimension mn.
Proof. The proof follows from Theorems 2.20 and 2.19 and the fact that
dim(M_{m×n}(F)) = mn.
We conclude this section with a result that allows us to see more clearly
the relationship between linear transformations defined on abstract finite-
dimensional vector spaces and linear transformations from Fn to Fm.
We begin by naming the transformation x ↦ [x]_β introduced in Section 2.2.

Definition. Let β be an ordered basis for an n-dimensional vector space
V over the field F. The standard representation of V with respect to
β is the function φ_β: V → F^n defined by φ_β(x) = [x]_β for each x ∈ V.
Example 6
Let β = {(1,0), (0,1)} and γ = {(1,2), (3,4)}. It is easily observed that β
and γ are ordered bases for R2. For x = (1, -2), we have

φ_β(x) = [x]_β = [ 1]       and   φ_γ(x) = [x]_γ = [-5]
                 [-2]                               [ 2].
We observed earlier that φ_β is a linear transformation. The next theorem
tells us much more.
Theorem 2.21. For any finite-dimensional vector space V with ordered
basis β, φ_β is an isomorphism.
Proof Exercise. 1
This theorem provides us with an alternate proof that an n-dimensional
vector space is isomorphic to Fn (sec the corollary to Theorem 2.19).

Sec. 2.4 Invertibility and Isomorphisms 105
Figure 2.2 (diagram: T maps V to W, φ_β maps V to F^n, φ_γ maps W to F^m, and L_A maps F^n to F^m)
Let V and W be vector spaces of dimension n and m, respectively, and let
T: V —> W be a linear transformation. Define A = [T]l, where 0 and 7 are
arbitrary ordered bases of V and W, respectively. We are now able to use 4>@
and 07 to study the relationship between the linear transformations T and
LA: Fn -> Fm.
Let us first consider Figure 2.2. Notice that there are two composites of
linear transformations that map V into F^m:
1. Map V into F^n with φ_β and follow this transformation with L_A; this
yields the composite L_A φ_β.
2. Map V into W with T and follow it by φ_γ to obtain the composite φ_γ T.
These two composites are depicted by the dashed arrows in the diagram.
By a simple reformulation of Theorem 2.14 (p. 91), we may conclude that

L_A φ_β = φ_γ T;

that is, the diagram "commutes." Heuristically, this relationship indicates
that after V and W are identified with F^n and F^m via φ_β and φ_γ, respectively,
we may "identify" T with L_A. This diagram allows us to transfer operations
on abstract vector spaces to ones on F^n and F^m.
Example 7
Recall the linear transformation T: P_3(R) → P_2(R) defined in Example 4 of
Section 2.2 (T(f(x)) = f'(x)). Let β and γ be the standard ordered bases for
P_3(R) and P_2(R), respectively, and let φ_β: P_3(R) → R4 and φ_γ: P_2(R) → R3
be the corresponding standard representations of P_3(R) and P_2(R). If A =
[T]_β^γ, then

A = [0 1 0 0]
    [0 0 2 0]
    [0 0 0 3].

Consider the polynomial p(x) = 2 + x - 3x^2 + 5x^3. We show that L_A φ_β(p(x)) =
φ_γ T(p(x)). Now

L_A φ_β(p(x)) = [0 1 0 0] [ 2]   [ 1]
                [0 0 2 0] [ 1] = [-6]
                [0 0 0 3] [-3]   [15].
                          [ 5]

But since T(p(x)) = p'(x) = 1 - 6x + 15x^2, we have

φ_γ T(p(x)) = [ 1]
              [-6]
              [15].

So L_A φ_β(p(x)) = φ_γ T(p(x)). •
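The commuting of the diagram for this particular vector can also be checked by machine. A minimal sketch follows, assuming Python with NumPy (an assumption of this illustration); the coordinate vectors are those of Example 7.

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [0, 0, 2, 0],
              [0, 0, 0, 3]])

p_beta = np.array([2, 1, -3, 5])     # phi_beta(p) for p(x) = 2 + x - 3x^2 + 5x^3
Tp_gamma = np.array([1, -6, 15])     # phi_gamma(T(p)) for T(p) = p'(x) = 1 - 6x + 15x^2
assert np.array_equal(A @ p_beta, Tp_gamma)   # L_A . phi_beta = phi_gamma . T on this vector
print("diagram commutes for p(x)")
```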
Try repeating Example 7 with different polynomials p(x).
EXERCISES
1. Label the following statements as true or false. In each part, V and
W are vector spaces with ordered (finite) bases a and 0, respectively,
T: V —» W is linear, and A and B are matrices.
(a) ([T]_α^β)^{-1} = [T^{-1}]_β^α.
(b) T is invertible if and only if T is one-to-one and onto.
(c) T = L_A, where A = [T]_α^β.
(d) M_{2x3}(F) is isomorphic to F^5.
(e) P_n(F) is isomorphic to P_m(F) if and only if n = m.
(f) AB = I implies that A and B are invertible.
(g) If A is invertible, then (A^{-1})^{-1} = A.
(h) A is invertible if and only if L_A is invertible.
(i) A must be square in order to possess an inverse.
2. For each of the following linear transformations T, determine whether
T is invertible and justify your answer.
(a) T: R2 → R3 defined by T(a_1, a_2) = (a_1 - 2a_2, a_2, 3a_1 + 4a_2).
(b) T: R2 → R3 defined by T(a_1, a_2) = (3a_1 - a_2, a_2, 4a_1).
(c) T: R3 → R3 defined by T(a_1, a_2, a_3) = (3a_1 - 2a_3, a_2, 3a_1 + 4a_2).
(d) T: P_3(R) → P_2(R) defined by T(p(x)) = p'(x).
(e) T: M_{2x2}(R) → P_2(R) defined by T(a b; c d) = a + 2bx + (c + d)x^2.
(f) T: M_{2x2}(R) → M_{2x2}(R) defined by T(a b; c d) = (a + b   a; c   c + d).

Sec. 2.4 Invertibility and Isomorphisms 107
3. Which of the following pairs of vector spaces are isomorphic? Justify
your answers.
(a) F^3 and P_3(F).
(b) F^4 and P_3(F).
(c) M_{2x2}(R) and P_3(R).
(d) V = {A ∈ M_{2x2}(R): tr(A) = 0} and R4.
4.* Let A and B be n x n invertible matrices. Prove that AB is invertible
and (AB)^{-1} = B^{-1}A^{-1}.
5.* Let A be invertible. Prove that A^t is invertible and (A^t)^{-1} = (A^{-1})^t.
6. Prove that if A is invertible and AB = O. then B = O.
7. Let A be an n x n matrix.
(a) Suppose that A^2 = O. Prove that A is not invertible.
(b) Suppose that AB = O for some nonzero n x n matrix B. Could A
be invertible? Explain.
8. Prove Corollaries 1 and 2 of Theorem 2.18.
9. Let A and B be n x n matrices such that AB is invertible. Prove that A
and B are invertible. Give an example to show that arbitrary matrices
A and B need not be invertible if AB is invertible.
10.† Let A and B be n x n matrices such that AB = I_n.
(a) Use Exercise 9 to conclude that A and B are invertible.
(b) Prove A = B^{-1} (and hence B = A^{-1}). (We are, in effect, saying
that for square matrices, a "one-sided" inverse is a "two-sided"
inverse.)
(c) State and prove analogous results for linear transformations de­
fined on finite-dimensional vector spaces.
11. Verify that the transformation in Example 5 is one-to-one.
12. Prove Theorem 2.21.
13. Let ~ mean "is isomorphic to." Prove that ~ is an equivalence relation
on the class of vector spaces over F.
14. Let
Construct an isomorphism from V to F .

108 Chap. 2 Linear Transformations and Matrices
15. Let V and W be n-dimensional vector spaces, and let T: V —* W be a
linear transformation. Suppose that 0 is a basis for V. Prove that T is
an isomorphism if and only if T(0) is a basis for W.
16. Let B be an n x n invertible matrix. Define Φ: M_{n×n}(F) → M_{n×n}(F)
by Φ(A) = B^{-1}AB. Prove that Φ is an isomorphism.
17.* Let V and W be finite-dimensional vector spaces and T: V → W be an
isomorphism. Let V_0 be a subspace of V.
(a) Prove that T(V_0) is a subspace of W.
(b) Prove that dim(V_0) = dim(T(V_0)).
18. Repeat Example 7 with the polynomial p(x) = 1 + x + 2x2 + x3.
19. In Example 5 of Section 2.1, the mapping T: M_{2x2}(R) → M_{2x2}(R) defined by T(M) = M^t for each M ∈ M_{2x2}(R) is a linear transformation.
Let β = {E_11, E_12, E_21, E_22}, which is a basis for M_{2x2}(R), as noted in
Example 3 of Section 1.6.
(a) Compute [T]_β.
(b) Verify that L_A φ_β(M) = φ_β T(M) for A = [T]_β and

M = [1 2]
    [3 4].
20.* Let T: V → W be a linear transformation from an n-dimensional vector
space V to an m-dimensional vector space W. Let β and γ be ordered
bases for V and W, respectively. Prove that rank(T) = rank(L_A) and
that nullity(T) = nullity(L_A), where A = [T]_β^γ. Hint: Apply Exercise 17
to Figure 2.2.
21. Let V and W be finite-dimensional vector spaces with ordered bases
β = {v_1, v_2, ..., v_n} and γ = {w_1, w_2, ..., w_m}, respectively. By Theorem 2.6 (p. 72), there exist linear transformations T_ij: V → W such
that

T_ij(v_k) = w_i if k = j,   and   T_ij(v_k) = 0 if k ≠ j.

First prove that {T_ij: 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a basis for ℒ(V, W).
Then let M^{ij} be the m x n matrix with 1 in the ith row and jth column
and 0 elsewhere, and prove that [T_ij]_β^γ = M^{ij}. Again by Theorem 2.6,
there exists a linear transformation Φ: ℒ(V, W) → M_{m×n}(F) such that
Φ(T_ij) = M^{ij}. Prove that Φ is an isomorphism.

Sec. 2.4 Invertibility and Isomorphisms 109
22. Let c_0, c_1, ..., c_n be distinct scalars from an infinite field F. Define
T: P_n(F) → F^{n+1} by T(f) = (f(c_0), f(c_1), ..., f(c_n)). Prove that T is
an isomorphism. Hint: Use the Lagrange polynomials associated with
c_0, c_1, ..., c_n.
23. Let V denote the vector space defined in Example 5 of Section 1.2, and
let W = P(F). Define

T: V → W by T(σ) = Σ_{i=0}^n σ(i)x^i,

where n is the largest integer such that σ(n) ≠ 0. Prove that T is an
isomorphism.
The following exercise requires familiarity with the concept of quotient space
defined in Exercise 31 of Section 1.3 and with Exercise 40 of Section 2.1.
24. Let T: V —» Z be a linear transformation of a vector space V onto a
vector space Z. Define the mapping
T̄: V/N(T) → Z by T̄(v + N(T)) = T(v)

for any coset v + N(T) in V/N(T).
(a) Prove that T̄ is well-defined; that is, prove that if v + N(T) =
v' + N(T), then T(v) = T(v').
(b) Prove that T̄ is linear.
(c) Prove that T̄ is an isomorphism.
(d) Prove that the diagram shown in Figure 2.3 commutes; that is,
prove that T = T̄η, where η: V → V/N(T) is the mapping defined in
Exercise 40 of Section 2.1.

Figure 2.3 (diagram: T maps V to Z, η maps V to V/N(T), and T̄ maps V/N(T) to Z)
25. Let V be a nonzero vector space over a field F, and suppose that S is
a basis for V. (By the corollary to Theorem 1.13 (p. 60) in Section 1.7,
every vector space has a basis.) Let C(S, F) denote the vector space of
all functions f ∈ ℱ(S, F) such that f(s) = 0 for all but a finite number
of vectors in S. (See Exercise 14 of Section 1.3.) Let Ψ: C(S, F) → V
be defined by Ψ(f) = 0 if f is the zero function, and

Ψ(f) = Σ_{s ∈ S, f(s) ≠ 0} f(s)s

otherwise. Prove that Ψ is an isomorphism. Thus every nonzero vector
space can be viewed as a space of functions.
2.5 THE CHANGE OF COORDINATE MATRIX
In many areas of mathematics, a change of variable is used to simplify the
appearance of an expression. For example, in calculus an antiderivative of
2xe^{x^2} can be found by making the change of variable u = x^2. The resulting
expression is of such a simple form that an antiderivative is easily recognized:

∫ 2xe^{x^2} dx = ∫ e^u du = e^u + c = e^{x^2} + c.

Similarly, in geometry the change of variable

x = (2/√5)x' - (1/√5)y',   y = (1/√5)x' + (2/√5)y'

can be used to transform the equation 2x^2 - 4xy + 5y^2 = 1 into the simpler
equation (x')^2 + 6(y')^2 = 1, in which form it is easily seen to be the equation of
an ellipse. (See Figure 2.4.) We see how this change of variable is determined
in Section 6.5. Geometrically, the change of variable
is a change in the way that the position of a point P in the plane is described.
This is done by introducing a new frame of reference, an x'y'-coordinate
system with coordinate axes rotated from the original xy-coordinate axes. In
this case, the new coordinate axes are chosen to lie in the direction of the
axes of the ellipse. The unit vectors along the x'-axis and the y'-axis form an
ordered basis

β' = { (1/√5)(2, 1),  (1/√5)(-1, 2) }

for R2, and the change of variable is actually a change from [P]_β = (x; y), the
coordinate vector of P relative to the standard ordered basis β = {e_1, e_2}, to

Sec. 2.5 The Change of Coordinate Matrix 111
[P]_{β'} = (x'; y'), the coordinate vector of P relative to the new rotated basis β'.
Figure 2.4
A natural question arises: How can a coordinate vector relative to one basis be changed into a coordinate vector relative to the other? Notice that the
system of equations relating the new and old coordinates can be represented
by the matrix equation

[x]         [2 -1] [x']
[y] = (1/√5)[1  2] [y'].

Notice also that the matrix

Q = (1/√5) [2 -1]
           [1  2]

equals [I]_{β'}^β, where I denotes the identity transformation on R2. Thus [v]_β =
Q[v]_{β'} for all v ∈ R2. A similar result is true in general.
Theorem 2.22. Let β and β' be two ordered bases for a finite-dimensional
vector space V, and let Q = [I_V]_{β'}^β. Then
(a) Q is invertible.
(b) For any v ∈ V, [v]_β = Q[v]_{β'}.
Proof. (a) Since I_V is invertible, Q is invertible by Theorem 2.18 (p. 101).
(b) For any v ∈ V,

[v]_β = [I_V(v)]_β = [I_V]_{β'}^β [v]_{β'} = Q[v]_{β'}

by Theorem 2.14 (p. 91).

112 Chap. 2 Linear Transformations and Matrices
The matrix Q = [I_V]_{β'}^β defined in Theorem 2.22 is called a change of coordinate matrix. Because of part (b) of the theorem, we say that Q changes
β'-coordinates into β-coordinates. Observe that if β = {x_1, x_2, ..., x_n}
and β' = {x'_1, x'_2, ..., x'_n}, then

x'_j = Σ_{i=1}^n Q_ij x_i

for j = 1, 2, ..., n; that is, the jth column of Q is [x'_j]_β.
Notice that if Q changes β'-coordinates into β-coordinates, then Q^{-1}
changes β-coordinates into β'-coordinates. (See Exercise 11.)
Example 1
In R2, let β = {(1,1), (1,-1)} and β' = {(2,4), (3,1)}. Since

(2,4) = 3(1,1) - 1(1,-1)   and   (3,1) = 2(1,1) + 1(1,-1),

the matrix that changes β'-coordinates into β-coordinates is

Q = [ 3 2]
    [-1 1].

Thus, for instance,

[(2,4)]_β = Q[(2,4)]_{β'} = Q [1]   [ 3]
                              [0] = [-1].
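A short numerical sketch of this computation follows, assuming Python with NumPy (an assumption of this illustration, not part of the text).

```python
import numpy as np

# Columns of Q are the beta-coordinates of the vectors of beta' = {(2,4), (3,1)}.
Q = np.array([[3, 2],
              [-1, 1]])
v_betaprime = np.array([1, 0])   # (2,4) expressed in beta'-coordinates
print(Q @ v_betaprime)           # [ 3 -1], its beta-coordinates, as in Example 1
```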
For the remainder of this section, we consider only linear transformations
that map a vector space V into itself. Such a linear transformation is called a
linear operator on V. Suppose now that T is a linear operator on a finite-
dimensional vector space V and that 0 and 0' are ordered bases for V. Then
T can be represented by the matrices [T]_β and [T]_{β'}. What is the relationship
between these matrices? The next theorem provides a simple answer using a
change of coordinate matrix.
Theorem 2.23. Let T be a linear operator on a finite-dimensional vector
space V, and let β and β' be ordered bases for V. Suppose that Q is the
change of coordinate matrix that changes β'-coordinates into β-coordinates.
Then

[T]_{β'} = Q^{-1}[T]_β Q.

Proof. Let I be the identity transformation on V. Then T = IT = TI;
hence, by Theorem 2.11 (p. 88),

Q[T]_{β'} = [I]_{β'}^β [T]_{β'} = [IT]_{β'}^β = [TI]_{β'}^β = [T]_β [I]_{β'}^β = [T]_β Q.

Therefore [T]_{β'} = Q^{-1}[T]_β Q.

Sec. 2.5 The Change of Coordinate Matrix
Example 2
Let T be the linear operator on R2 defined by

T(a; b) = (3a - b; a + 3b),

and let β and β' be the ordered bases in Example 1. The reader should verify
that

[T]_β = [ 3 1]
        [-1 3].

In Example 1, we saw that the change of coordinate matrix that changes
β'-coordinates into β-coordinates is

Q = [ 3 2]
    [-1 1],

and it is easily verified that

Q^{-1} = (1/5) [1 -2]
               [1  3].

Hence, by Theorem 2.23,

[T]_{β'} = Q^{-1}[T]_β Q = [ 4 1]
                           [-2 2].

To show that this is the correct matrix, we can verify that the image
under T of each vector of β' is the linear combination of the vectors of β'
with the entries of the corresponding column as its coefficients. For example,
the image of the second vector in β' is

T(3; 1) = (8; 6) = 1(2; 4) + 2(3; 1).

Notice that the coefficients of the linear combination are the entries of the
second column of [T]_{β'}. •
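The similarity computation above is easily repeated numerically. Below is a minimal sketch, assuming Python with NumPy (an assumption, not part of the text), using the matrices of Example 2.

```python
import numpy as np

T_beta = np.array([[3, 1],
                   [-1, 3]])
Q = np.array([[3, 2],
              [-1, 1]])
T_betaprime = np.linalg.inv(Q) @ T_beta @ Q
print(np.round(T_betaprime))   # [[ 4.  1.] [-2.  2.]], agreeing with Theorem 2.23
```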
It is often useful to apply Theorem 2.23 to compute [T]^, as the next
example shows.
Example 3
Recall the reflection about the x-axis in Example 3 of Section 2.1. The rule
(x, y) → (x, -y) is easy to obtain. We now derive the less obvious rule for
the reflection T about the line y = 2x. (See Figure 2.5.) We wish to find an
expression for T(a, b) for any (a, b) in R2.

Figure 2.5 (the line y = 2x and the reflection of a point about it)

Since T is linear, it is completely
determined by its values on a basis for R2. Clearly, T(1,2) = (1,2) and
T(-2,1) = -(-2,1) = (2,-1). Therefore if we let

β' = { (1; 2), (-2; 1) },

then β' is an ordered basis for R2 and

[T]_{β'} = [1  0]
           [0 -1].

Let β be the standard ordered basis for R2, and let Q be the matrix that
changes β'-coordinates into β-coordinates. Then

Q = [1 -2]
    [2  1]

and Q^{-1}[T]_β Q = [T]_{β'}. We can solve this equation for [T]_β to obtain that
[T]_β = Q[T]_{β'}Q^{-1}. Because

Q^{-1} = (1/5) [ 1 2]
               [-2 1],

the reader can verify that

[T]_β = (1/5) [-3 4]
              [ 4 3].

Since β is the standard ordered basis, it follows that T is left-multiplication
by [T]_β. Thus for any (a, b) in R2, we have

T(a; b) = (1/5) [-3 4] [a]         [-3a + 4b]
                [ 4 3] [b] = (1/5) [ 4a + 3b].
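The verification the reader is asked to carry out can also be done by machine. Here is a minimal sketch, assuming Python with NumPy (an assumption, not part of the text), using the matrices from Example 3.

```python
import numpy as np

T_betaprime = np.array([[1, 0],
                        [0, -1]])   # the reflection in beta' = {(1,2), (-2,1)} coordinates
Q = np.array([[1, -2],
              [2, 1]])              # changes beta'-coordinates into standard coordinates
T_beta = Q @ T_betaprime @ np.linalg.inv(Q)
print(np.round(5 * T_beta))         # [[-3. 4.] [ 4. 3.]], i.e. [T]_beta = (1/5)[[-3, 4], [4, 3]]
```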

Sec. 2.5 The Change of Coordinate Matrix 115
A useful special case of Theorem 2.23 is contained in the next corollary,
whose proof is left as an exercise.
Corollary. Let A ∈ M_{n×n}(F), and let γ be an ordered basis for F^n. Then
[L_A]_γ = Q^{-1}AQ, where Q is the n x n matrix whose jth column is the jth
vector of γ.
Example 4
Let
A =
and let
7 =
which is an ordered basis for R3. Let Q be the 3 x 3 matrix whose jth column
is the jth vector of γ. Then
and Q^{-1} =
So by the preceding corollary,
[L_A]_γ = Q^{-1}AQ =
I °
M
V 0
2
4
-1
8
6
-1
The relationship between the matrices [T]^' and [T]p in Theorem 2.23 will
be the subject of further study in Chapters 5, 6, and 7. At this time, however,
we introduce the name for this relationship.
Definition. Let A and B be matrices in M_{n×n}(F). We say that B is
similar to A if there exists an invertible matrix Q such that B = Q^{-1}AQ.
Observe that the relation of similarity is an equivalence relation (see Exercise 9). So we need only say that A and B are similar.
Notice also that in this terminology Theorem 2.23 can be stated as follows:
If T is a linear operator on a finite-dimensional vector space V, and if β and
β' are any ordered bases for V, then [T]_{β'} is similar to [T]_β.
Theorem 2.23 can be generalized to allow T: V → W, where V is distinct
from W. In this case, we can change bases in V as well as in W (see Exercise 8).

116 Chap. 2 Linear Transformations and Matrices
EXERCISES
1. Label the following statements as true or false.
(a) Suppose that β = {x_1, x_2, ..., x_n} and β' = {x'_1, x'_2, ..., x'_n} are
ordered bases for a vector space and Q is the change of coordinate
matrix that changes β'-coordinates into β-coordinates. Then the
jth column of Q is [x_j]_{β'}.
(b) Every change of coordinate matrix is invertible.
(c) Let T be a linear operator on a finite-dimensional vector space V,
let β and β' be ordered bases for V, and let Q be the change of
coordinate matrix that changes β'-coordinates into β-coordinates.
Then [T]_{β'} = Q[T]_β Q^{-1}.
(d) The matrices A, B ∈ M_{n×n}(F) are called similar if B = Q^t AQ for
some Q ∈ M_{n×n}(F).
(e) Let T be a linear operator on a finite-dimensional vector space V.
Then for any ordered bases β and γ for V, [T]_β is similar to [T]_γ.
2. For each of the following pairs of ordered bases β and β' for R2, find
the change of coordinate matrix that changes β'-coordinates into β-coordinates.
(a) β = {e_1, e_2} and β' = {(a_1, a_2), (b_1, b_2)}
(b) β = {(-1, 3), (2, -1)} and β' = {(0, 10), (5, 0)}
(c) β = {(2, 5), (-1, -3)} and β' = {e_1, e_2}
(d) β = {(-4, 3), (2, -1)} and β' = {(2, 1), (-4, 1)}
3. For each of the following pairs of ordered bases β and β' for P_2(R),
find the change of coordinate matrix that changes β'-coordinates into
β-coordinates.
(a) β = {x^2, x, 1} and
β' = {a_2 x^2 + a_1 x + a_0, b_2 x^2 + b_1 x + b_0, c_2 x^2 + c_1 x + c_0}
(b) β = {1, x, x^2} and
β' = {a_2 x^2 + a_1 x + a_0, b_2 x^2 + b_1 x + b_0, c_2 x^2 + c_1 x + c_0}
(c) β = {2x^2 - x, 3x^2 + 1, x^2} and β' = {1, x, x^2}
(d) β = {x^2 - x + 1, x + 1, x^2 + 1} and
β' = {x^2 + x + 4, 4x^2 - 3x + 2, 2x^2 + 3}
(e) β = {x^2 - x, x^2 + 1, x - 1} and
β' = {5x^2 - 2x - 3, -2x^2 + 5x + 5, 2x^2 - x - 3}
(f) β = {2x^2 - x + 1, x^2 + 3x - 2, -x^2 + 2x + 1} and
β' = {9x - 9, x^2 + 21x - 2, 3x^2 + 5x + 2}
4. Let T be the linear operator on R2 defined by

T(a; b) = (2a + b; a - 3b),

let β be the standard ordered basis for R2, and let

β' = { (1; 1), (1; 2) }.

Use Theorem 2.23 and the fact that

[1 1]^{-1}   [ 2 -1]
[1 2]      = [-1  1]

to find [T]_{β'}.
5. Let T be the linear operator on P_1(R) defined by T(p(x)) = p'(x),
the derivative of p(x). Let β = {1, x} and β' = {1 + x, 1 - x}. Use
Theorem 2.23 and the fact that

[1  1]^{-1}   [1/2  1/2]
[1 -1]      = [1/2 -1/2]

to find [T]_{β'}.

6. For each matrix A and ordered basis β, find [L_A]_β. Also, find an invertible matrix Q such that [L_A]_β = Q^{-1}AQ.
(d) A =
7. In R2, let L be the line y = mx, where m ≠ 0. Find an expression for
T(x, y), where
(a) T is the reflection of R2 about L.
(b) T is the projection on L along the line perpendicular to L. (See
the definition of projection in the exercises of Section 2.1.)
8. Prove the following generalization of Theorem 2.23. Let T: V → W be
a linear transformation from a finite-dimensional vector space V to a
finite-dimensional vector space W. Let β and β' be ordered bases for

118 Chap. 2 Linear Transformations and Matrices
V, and let γ and γ' be ordered bases for W. Then [T]_{β'}^{γ'} = P^{-1}[T]_β^γ Q,
where Q is the matrix that changes β'-coordinates into β-coordinates
and P is the matrix that changes γ'-coordinates into γ-coordinates.
9. Prove that "is similar to" is an equivalence relation on MnXn(F).
10. Prove that if A and B are similar n x n matrices, then tr(A) = tr(B).
Hint: Use Exercise 13 of Section 2.3.
11. Let V be a finite-dimensional vector space with ordered bases α, β,
and γ.
(a) Prove that if Q and R are the change of coordinate matrices that
change α-coordinates into β-coordinates and β-coordinates into
γ-coordinates, respectively, then RQ is the change of coordinate
matrix that changes α-coordinates into γ-coordinates.
(b) Prove that if Q changes α-coordinates into β-coordinates, then
Q^{-1} changes β-coordinates into α-coordinates.
12. Prove the corollary to Theorem 2.23.
13.* Let V be a finite-dimensional vector space over a field F, and let β =
{x_1, x_2, ..., x_n} be an ordered basis for V. Let Q be an n x n invertible
matrix with entries from F. Define

x'_j = Σ_{i=1}^n Q_ij x_i   for 1 ≤ j ≤ n,

and set β' = {x'_1, x'_2, ..., x'_n}. Prove that β' is a basis for V and hence
that Q is the change of coordinate matrix changing β'-coordinates into
β-coordinates.
14. Prove the converse of Exercise 8: If A and B are each m x n matrices
with entries from a field F, and if there exist invertible m x m and n x n
matrices P and Q, respectively, such that B = P^{-1}AQ, then there exist
an n-dimensional vector space V and an m-dimensional vector space W
(both over F), ordered bases β and β' for V and γ and γ' for W, and a
linear transformation T: V → W such that

A = [T]_β^γ   and   B = [T]_{β'}^{γ'}.

Hints: Let V = F^n, W = F^m, T = L_A, and β and γ be the standard
ordered bases for F^n and F^m, respectively. Now apply the results of
Exercise 13 to obtain ordered bases β' and γ' from β and γ via Q and
P, respectively.

Sec. 2.6 Dual Spaces
2.6* DUAL SPACES
119
In this section, we are concerned exclusively with linear transformations from
a vector space V into its field of scalars F, which is itself a vector space of di­
mension 1 over F. Such a linear transformation is called a linear functional
on V. We generally use the letters f, g, h,... to denote linear functionals. As
we see in Example 1, the definite integral provides us with one of the most
important examples of a linear functional in mathematics.
Example 1
Let V be the vector space of continuous real-valued functions on the interval
[0, 2π]. Fix a function g ∈ V. The function h: V → R defined by

h(x) = (1/2π) ∫_0^{2π} x(t)g(t) dt

is a linear functional on V. In the cases that g(t) equals sin nt or cos nt, h(x)
is often called the nth Fourier coefficient of x. •
Example 2
Let V = MnXn(F), and define f: V -> F by f(A) = tr(A), the trace of A. By
Exercise 6 of Section 1.3, we have that f is a linear functional. •
Example 3
Let V be a finite-dimensional vector space, and let β = {x_1, x_2, ..., x_n} be
an ordered basis for V. For each i = 1, 2, ..., n, define f_i(x) = a_i, where

[x]_β = [a_1]
        [a_2]
        [...]
        [a_n]

is the coordinate vector of x relative to β. Then f_i is a linear functional on V
called the ith coordinate function with respect to the basis β. Note
that f_i(x_j) = δ_ij, where δ_ij is the Kronecker delta. These linear functionals
play an important role in the theory of dual spaces (see Theorem 2.24). •
Definition. For a vector space V over F, we define the dual space of
V to be the vector space £(V, F), denoted by V*.
Thus V* is the vector space consisting of all linear functionals on V with
the operations of addition and scalar multiplication as defined in Section 2.2.
Note that if V is finite-dimensional, then by the corollary to Theorem 2.20
(p. 104)
dim(V*) = dim(£(V,F)) = dim(V) • dim(F) = dim(V).

120 Chap. 2 Linear Transformations and Matrices
Hence by Theorem 2.19 (p. 103), V and V* are isomorphic. We also define
the double dual V** of V to be the dual of V*. In Theorem 2.26, we show,
in fact, that there is a natural identification of V and V** in the case that V
is finite-dimensional.
Theorem 2.24. Suppose that V is a finite-dimensional vector space with
the ordered basis β = {x_1, x_2, ..., x_n}. Let f_i (1 ≤ i ≤ n) be the ith coordinate function with respect to β as just defined, and let β* = {f_1, f_2, ..., f_n}.
Then β* is an ordered basis for V*, and, for any f ∈ V*, we have

f = Σ_{i=1}^n f(x_i) f_i.

Proof. Let f ∈ V*. Since dim(V*) = n, we need only show that

f = Σ_{i=1}^n f(x_i) f_i,

from which it follows that β* generates V*, and hence is a basis by Corollary
2(a) to the replacement theorem (p. 47). Let

g = Σ_{i=1}^n f(x_i) f_i.

For 1 ≤ j ≤ n, we have

g(x_j) = (Σ_{i=1}^n f(x_i) f_i)(x_j) = Σ_{i=1}^n f(x_i) f_i(x_j) = Σ_{i=1}^n f(x_i) δ_ij = f(x_j).

Therefore f = g by the corollary to Theorem 2.6 (p. 72).
Definition. Using the notation of Theorem 2.24, we call the ordered
basis β* = {f_1, f_2, ..., f_n} of V* that satisfies f_i(x_j) = δ_ij (1 ≤ i, j ≤ n) the
dual basis of β.
Example 4
Let β = {(2,1), (3,1)} be an ordered basis for R². Suppose that the dual
basis of β is given by β* = {f_1, f_2}. To determine an explicit formula for f_1,
we need to consider the equations

    1 = f_1(2,1) = f_1(2e_1 + e_2) = 2f_1(e_1) + f_1(e_2)
    0 = f_1(3,1) = f_1(3e_1 + e_2) = 3f_1(e_1) + f_1(e_2).

Solving these equations, we obtain f_1(e_1) = −1 and f_1(e_2) = 3; that is,
f_1(x, y) = −x + 3y. Similarly, it can be shown that f_2(x, y) = x − 2y. •
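The computation in Example 4 can also be organized as a matrix inversion: since f_i(x) is the ith coordinate of x relative to β, the coefficients of f_1 and f_2 are the rows of the inverse of the matrix whose columns are the basis vectors. A short numerical sketch (Python with numpy; it assumes only the basis above):

```python
import numpy as np

# Basis vectors of β as the columns of B
B = np.array([[2.0, 3.0],
              [1.0, 1.0]])

# The rows of B^{-1} give the coefficients of f_1 and f_2.
B_inv = np.linalg.inv(B)
print(B_inv)        # [[-1.  3.]
                    #  [ 1. -2.]]  -> f_1(x, y) = -x + 3y, f_2(x, y) = x - 2y
```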

We now assume that V and W are finite-dimensional vector spaces over F
with ordered bases β and γ, respectively. In Section 2.4, we proved that there
is a one-to-one correspondence between linear transformations T: V → W and
m × n matrices (over F) via the correspondence T ↔ [T]_β^γ. For a matrix of
the form A = [T]_β^γ, the question arises as to whether or not there exists a
linear transformation U associated with T in some natural way such that U
may be represented in some basis as A^t. Of course, if m ≠ n, it would be
impossible for U to be a linear transformation from V into W. We now answer
this question by applying what we have already learned about dual spaces.
Theorem 2.25. Let V and W be finite-dimensional vector spaces over
F with ordered bases β and γ, respectively. For any linear transformation
T: V → W, the mapping T^t: W* → V* defined by T^t(g) = gT for all g ∈ W*
is a linear transformation with the property that [T^t]_{γ*}^{β*} = ([T]_β^γ)^t.
Proof. For g ∈ W*, it is clear that T^t(g) = gT is a linear functional on V
and hence is in V*. Thus T^t maps W* into V*. We leave the proof that T^t is
linear to the reader.
To complete the proof, let β = {x_1, x_2, ..., x_n} and γ = {y_1, y_2, ..., y_m}
with dual bases β* = {f_1, f_2, ..., f_n} and γ* = {g_1, g_2, ..., g_m}, respectively.
For convenience, let A = [T]_β^γ. To find the jth column of [T^t]_{γ*}^{β*}, we begin
by expressing T^t(g_j) as a linear combination of the vectors of β*. By
Theorem 2.24, we have

    T^t(g_j) = g_j T = Σ_{s=1}^{n} (g_j T)(x_s) f_s.

So the row i, column j entry of [T^t]_{γ*}^{β*} is

    (g_j T)(x_i) = g_j(T(x_i)) = g_j(Σ_{k=1}^{m} A_{ki} y_k)
                 = Σ_{k=1}^{m} A_{ki} g_j(y_k) = Σ_{k=1}^{m} A_{ki} δ_jk = A_{ji}.

Hence [T^t]_{γ*}^{β*} = A^t. ∎
The linear transformation T^t defined in Theorem 2.25 is called the transpose
of T. It is clear that T^t is the unique linear transformation U such that
[U]_{γ*}^{β*} = ([T]_β^γ)^t.
We illustrate Theorem 2.25 with the next example.

Example 5
Define T: P_1(R) → R² by T(p(x)) = (p(0), p(2)). Let β and γ be the standard
ordered bases for P_1(R) and R², respectively. Clearly,

    [T]_β^γ = ( 1  0
                1  2 ).

We compute [T^t]_{γ*}^{β*} directly from the definition. Let β* = {f_1, f_2} and γ* =
{g_1, g_2}. Suppose that

    [T^t]_{γ*}^{β*} = ( a  b
                        c  d ).

Then T^t(g_1) = a f_1 + c f_2. So

    T^t(g_1)(1) = (a f_1 + c f_2)(1) = a f_1(1) + c f_2(1) = a(1) + c(0) = a.

But also

    (T^t(g_1))(1) = g_1(T(1)) = g_1(1, 1) = 1.

So a = 1. Using similar computations, we obtain that c = 0, b = 1, and
d = 2. Hence a direct computation yields

    [T^t]_{γ*}^{β*} = ( 1  1
                        0  2 ) = ([T]_β^γ)^t,

as predicted by Theorem 2.25. •
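The conclusion of Example 5 can be checked numerically. The sketch below (Python with numpy; representing p(x) = a + bx by the coordinate vector (a, b) relative to β = {1, x} is our own encoding) builds [T^t]_{γ*}^{β*} entry by entry from the definition and compares it with ([T]_β^γ)^t.

```python
import numpy as np

# T(p(x)) = (p(0), p(2)) on P_1(R); p(x) = a + b x is encoded as (a, b)
def T(p):
    a, b = p
    return np.array([a, a + 2*b])       # (p(0), p(2))

# Standard dual basis of (R²)*: g_1(y) = y_1, g_2(y) = y_2
g = [lambda y: y[0], lambda y: y[1]]

beta = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]   # coordinates of 1 and x

# The (i, j) entry of [T^t]_{γ*}^{β*} is T^t(g_j)(x_i) = g_j(T(x_i))
Tt_matrix = np.array([[g[j](T(beta[i])) for j in range(2)] for i in range(2)])

# [T]_β^γ has columns T(1) and T(x)
A = np.array([[T(b)[k] for b in beta] for k in range(2)])

print(Tt_matrix)                         # [[1. 1.], [0. 2.]]
print(np.array_equal(Tt_matrix, A.T))    # True, as Theorem 2.25 predicts
```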
We now concern ourselves with demonstrating that any finite-dimensional
vector space V can be identified in a natural way with its double dual V**.
There is, in fact, an isomorphism between V and V** that does not depend
on any choice of bases for the two vector spaces.
For a vector x ∈ V, we define x̂: V* → F by x̂(f) = f(x) for every f ∈ V*.
It is easy to verify that x̂ is a linear functional on V*, so x̂ ∈ V**. The
correspondence x ↔ x̂ allows us to define the desired isomorphism between
V and V**.
Lemma. Let V be a finite-dimensional vector space, and let x ∈ V. If
x̂(f) = 0 for all f ∈ V*, then x = 0.
Proof. Let x ≠ 0. We show that there exists f ∈ V* such that x̂(f) ≠ 0.
Choose an ordered basis β = {x_1, x_2, ..., x_n} for V such that x_1 = x. Let
{f_1, f_2, ..., f_n} be the dual basis of β. Then f_1(x_1) = 1 ≠ 0. Let f = f_1. ∎
Theorem 2.26. Let V be a finite-dimensional vector space, and define
ψ: V → V** by ψ(x) = x̂. Then ψ is an isomorphism.

Proof. (a) ψ is linear: Let x, y ∈ V and c ∈ F. For f ∈ V*, we have

    ψ(cx + y)(f) = f(cx + y) = c f(x) + f(y) = c x̂(f) + ŷ(f)
                 = (c x̂ + ŷ)(f).

Therefore

    ψ(cx + y) = c x̂ + ŷ = c ψ(x) + ψ(y).

(b) ψ is one-to-one: Suppose that ψ(x) is the zero functional on V* for
some x ∈ V. Then x̂(f) = 0 for every f ∈ V*. By the previous lemma, we
conclude that x = 0.
(c) ψ is an isomorphism: This follows from (b) and the fact that dim(V) =
dim(V**). ∎
Corollary. Let V be a finite-dimensional vector space with dual space V*.
Then every ordered basis for V* is the dual basis for some basis for V.
Proof. Let {f_1, f_2, ..., f_n} be an ordered basis for V*. We may combine
Theorems 2.24 and 2.26 to conclude that for this basis for V* there exists a
dual basis {x̂_1, x̂_2, ..., x̂_n} in V**, that is, δ_ij = x̂_i(f_j) = f_j(x_i) for all i and
j. Thus {f_1, f_2, ..., f_n} is the dual basis of {x_1, x_2, ..., x_n}. ∎
Although many of the ideas of this section (e.g., the existence of a dual
space) can be extended to the case where V is not finite-dimensional, only a
finite-dimensional vector space is isomorphic to its double dual via the map
x → x̂. In fact, for infinite-dimensional vector spaces, no two of V, V*, and
V** are isomorphic.
EXERCISES
1. Label the following statements as true or false. Assume that all vector
spaces are finite-dimensional.
(a) Every linear transformation is a linear functional.
(b) A linear functional defined on a field may be represented as a 1 x 1
matrix.
(c) Every vector space is isomorphic to its dual space.
(d) Every vector space is the dual of some other vector space.
(e) If T is an isomorphism from V onto V* and 0 is a finite ordered
basis for V, then T(0) = 0*.
(f) If T is a linear transformation from V to W, then the domain of
(T^t)^t is V**.
(g) If V is isomorphic to W, then V* is isomorphic to W*.

(h) The derivative of a function may be considered as a linear func­
tional on the vector space of differentiable functions.
2. For the following functions f on a vector space V, determine which are
linear functionals.
(a) V = P(R); f(p(x)) = 2p'(0) + p''(1), where ' denotes differentiation
(b) V = R²; f(x, y) = (2x, 4y)
(c) V = M_{2×2}(F); f(A) = tr(A)
(d) V = R³; f(x, y, z) = x² + y² + z²
(e) V = P(R); f(p(x)) = ∫_0^1 p(t) dt
(f) V = M_{2×2}(F); f(A) = A_{11}
3. For each of the following vector spaces V and bases β, find explicit
formulas for vectors of the dual basis β* for V*, as in Example 4.
(a) V = R³; β = {(1,0,1), (1,2,1), (0,0,1)}
(b) V = P_2(R); β = {1, x, x²}
4. Let V = R³, and define f_1, f_2, f_3 ∈ V* as follows:

    f_1(x, y, z) = x − 2y,   f_2(x, y, z) = x + y + z,   f_3(x, y, z) = y − 3z.

Prove that {f_1, f_2, f_3} is a basis for V*, and then find a basis for V for
which it is the dual basis.
5. Let V = P_1(R), and, for p(x) ∈ V, define f_1, f_2 ∈ V* by

    f_1(p(x)) = ∫_0^1 p(t) dt   and   f_2(p(x)) = ∫_0^2 p(t) dt.

Prove that {f_1, f_2} is a basis for V*, and find a basis for V for which it
is the dual basis.
6. Define f ∈ (R²)* by f(x, y) = 2x + y and T: R² → R² by T(x, y) =
(3x + 2y, x).
(a) Compute T^t(f).
(b) Compute [T^t]_{β*}, where β is the standard ordered basis for R² and
β* = {f_1, f_2} is the dual basis, by finding scalars a, b, c, and d such
that T^t(f_1) = a f_1 + c f_2 and T^t(f_2) = b f_1 + d f_2.
(c) Compute [T]_β and ([T]_β)^t, and compare your results with (b).
7. Let V = P_1(R) and W = R² with respective standard ordered bases β
and γ. Define T: V → W by

    T(p(x)) = (p(0) − 2p(1), p(0) + p'(0)),
where p'(x) is the derivative of p(x).

(a) For f ∈ W* defined by f(a, b) = a − 2b, compute T^t(f).
(b) Compute [T^t]_{γ*}^{β*} without appealing to Theorem 2.25.
(c) Compute [T]_β^γ and its transpose, and compare your results with
(b).
8. Show that every plane through the origin in R³ may be identified with
the null space of a vector in (R³)*. State an analogous result for R².
9. Prove that a function T: F^n → F^m is linear if and only if there exist
f_1, f_2, ..., f_m ∈ (F^n)* such that T(x) = (f_1(x), f_2(x), ..., f_m(x)) for all
x ∈ F^n. Hint: If T is linear, define f_i(x) = (g_i T)(x) for x ∈ F^n; that is,
f_i = T^t(g_i) for 1 ≤ i ≤ m, where {g_1, g_2, ..., g_m} is the dual basis of
the standard ordered basis for F^m.
10. Let V = P_n(F), and let c_0, c_1, ..., c_n be distinct scalars in F.
(a) For 0 ≤ i ≤ n, define f_i ∈ V* by f_i(p(x)) = p(c_i). Prove that
{f_0, f_1, ..., f_n} is a basis for V*. Hint: Apply any linear combination
of this set that equals the zero transformation to p(x) =
(x − c_1)(x − c_2) ⋯ (x − c_n), and deduce that the first coefficient is
zero.
(b) Use the corollary to Theorem 2.26 and (a) to show that there exist
unique polynomials p_0(x), p_1(x), ..., p_n(x) such that p_i(c_j) = δ_ij
for 0 ≤ i, j ≤ n. These polynomials are the Lagrange polynomials
defined in Section 1.6.
(c) For any scalars a_0, a_1, ..., a_n (not necessarily distinct), deduce that
there exists a unique polynomial q(x) of degree at most n such that
q(c_i) = a_i for 0 ≤ i ≤ n. In fact,

    q(x) = Σ_{i=0}^{n} a_i p_i(x).

(d) Deduce the Lagrange interpolation formula:

    p(x) = Σ_{i=0}^{n} p(c_i) p_i(x)

for any p(x) ∈ V.
(e) Prove that

    ∫_a^b p(t) dt = Σ_{i=0}^{n} p(c_i) d_i,

where

    d_i = ∫_a^b p_i(t) dt.

Suppose now that

    c_i = a + i(b − a)/n   for i = 0, 1, ..., n.

For n = 1, the preceding result yields the trapezoidal rule for
evaluating the definite integral of a polynomial. For n = 2, this
result yields Simpson's rule for evaluating the definite integral of
a polynomial.
11. Let V and W be finite-dimensional vector spaces over F, and let ψ_1 and
ψ_2 be the isomorphisms between V and V** and W and W**, respectively,
as defined in Theorem 2.26. Let T: V → W be linear, and define
T^{tt} = (T^t)^t. Prove that the diagram depicted in Figure 2.6 commutes
(i.e., prove that ψ_2 T = T^{tt} ψ_1).

[Figure 2.6: a square diagram with T: V → W across the top, ψ_1: V → V**
and ψ_2: W → W** down the sides, and T^{tt}: V** → W** across the bottom.]

12. Let V be a finite-dimensional vector space with the ordered basis β.
Prove that ψ(β) = β**, where ψ is defined in Theorem 2.26.
In Exercises 13 through 17, V denotes a finite-dimensional vector space over
F. For every subset S of V, define the annihilator S⁰ of S as

    S⁰ = {f ∈ V*: f(x) = 0 for all x ∈ S}.

13. (a) Prove that S⁰ is a subspace of V*.
(b) If W is a subspace of V and x ∉ W, prove that there exists f ∈ W⁰
such that f(x) ≠ 0.
(c) Prove that (S⁰)⁰ = span(ψ(S)), where ψ is defined as in Theorem 2.26.
(d) For subspaces W_1 and W_2, prove that W_1 = W_2 if and only if
W_1⁰ = W_2⁰.
(e) For subspaces W_1 and W_2, show that (W_1 + W_2)⁰ = W_1⁰ ∩ W_2⁰.
14. Prove that if W is a subspace of V, then dim(W) + dim(W⁰) = dim(V).
Hint: Extend an ordered basis {x_1, x_2, ..., x_k} of W to an ordered basis
β = {x_1, x_2, ..., x_n} of V. Let β* = {f_1, f_2, ..., f_n}. Prove that
{f_{k+1}, f_{k+2}, ..., f_n} is a basis for W⁰.

15. Suppose that W is a finite-dimensional vector space and that T: V → W
is linear. Prove that N(T^t) = (R(T))⁰.
16. Use Exercises 14 and 15 to deduce that rank(L_{A^t}) = rank(L_A) for any
A ∈ M_{m×n}(F).
17. Let T be a linear operator on V, and let W be a subspace of V. Prove
that W is T-invariant (as defined in the exercises of Section 2.1) if and
only if W⁰ is T^t-invariant.
18. Let V be a nonzero vector space over a field F, and let S be a basis
for V. (By the corollary to Theorem 1.13 (p. 60) in Section 1.7, every
vector space has a basis.) Let Φ: V* → F(S, F) be the mapping defined
by Φ(f) = f|_S, the restriction of f to S. Prove that Φ is an isomorphism.
Hint: Apply Exercise 34 of Section 2.1.
19. Let V be a nonzero vector space, and let W be a proper subspace of V
(i.e., W ≠ V). Prove that there exists a nonzero linear functional f ∈ V*
such that f(x) = 0 for all x ∈ W. Hint: For the infinite-dimensional
case, use Exercise 34 of Section 2.1 as well as results about extending
linearly independent sets to bases in Section 1.7.
20. Let V and W be nonzero vector spaces over the same field, and let
T: V → W be a linear transformation.
(a) Prove that T is onto if and only if T^t is one-to-one.
(b) Prove that T^t is onto if and only if T is one-to-one.
Hint: Parts of the proof require the result of Exercise 19 for the infinite-
dimensional case.
2.7* HOMOGENEOUS LINEAR DIFFERENTIAL EQUATIONS
WITH CONSTANT COEFFICIENTS
As an introduction to this section, consider the following physical problem. A
weight of mass m is attached to a vertically suspended spring that is allowed to
stretch until the forces acting on the weight are in equilibrium. Suppose that
the weight is now motionless and impose an xy-coordinate system with the
weight at the origin and the spring lying on the positive y-axis (see Figure 2.7).
Suppose that at a certain time, say t = 0, the weight is lowered a distance
s along the y-axis and released. The spring then begins to oscillate.
We describe the motion of the spring. At any time t > 0, let F(t) denote
the force acting on the weight and y(t) denote the position of the weight along
the y-axis. For example, y(0) = — s. The second derivative of y with respect

Figure 2.7
to time, y"(t), is the acceleration of the weight at time t\ hence, by Newton's
second law of motion,
F(t) = my"(t). (1)
It is reasonable to assume that the force acting on the weight is due totally
to the tension of the spring, and that this force satisfies Hooke's law: The force
acting on the weight is proportional to its displacement from the equilibrium
position, but acts in the opposite direction. If k > 0 is the proportionality
constant, then Hooke's law states that

    F(t) = −ky(t).                                                        (2)

Combining (1) and (2), we obtain my'' = −ky or

    y'' + (k/m) y = 0.                                                    (3)

The expression (3) is an example of a differential equation. A differential
equation in an unknown function y = y(t) is an equation involving y, t, and
derivatives of y. If the differential equation is of the form

    a_n y^{(n)} + a_{n−1} y^{(n−1)} + ⋯ + a_1 y^{(1)} + a_0 y = f,        (4)

where a_0, a_1, ..., a_n and f are functions of t and y^{(k)} denotes the kth
derivative of y, then the equation is said to be linear. The functions a_i are
called the coefficients of the differential equation (4). Thus (3) is an example
of a linear differential equation in which the coefficients are constants and
the function f is identically zero. When f is identically zero, (4) is called
homogeneous.
In this section, we apply the linear algebra we have studied to solve
homogeneous linear differential equations with constant coefficients. If a_n ≠ 0,
we say that differential equation (4) is of order n. In this case, we divide
both sides by a_n to obtain a new, but equivalent, equation

    y^{(n)} + b_{n−1} y^{(n−1)} + ⋯ + b_1 y^{(1)} + b_0 y = 0,

where b_i = a_i / a_n for i = 0, 1, ..., n − 1. Because of this observation, we
always assume that the coefficient a_n in (4) is 1.
A solution to (4) is a function that when substituted for y reduces (4)
to an identity.
Example 1
The function y(t) = sin(√(k/m) t) is a solution to (3) since

    y''(t) + (k/m) y(t) = −(k/m) sin(√(k/m) t) + (k/m) sin(√(k/m) t) = 0

for all t. Notice, however, that substituting y(t) = t into (3) yields

    y''(t) + (k/m) y(t) = (k/m) t,

which is not identically zero. Thus y(t) = t is not a solution to (3). •
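Both computations in Example 1 are easy to reproduce symbolically; here is a brief sketch using sympy (the symbols k and m are taken to be positive).

```python
import sympy as sp

t, k, m = sp.symbols('t k m', positive=True)
y = sp.sin(sp.sqrt(k/m) * t)

# Substitute y into the left-hand side of (3): y'' + (k/m) y
lhs = sp.diff(y, t, 2) + (k/m) * y
print(sp.simplify(lhs))                              # 0, so y is a solution

# The same substitution with y(t) = t leaves a nonzero remainder
print(sp.simplify(sp.diff(t, t, 2) + (k/m) * t))     # k*t/m, not identically zero
```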
In our study of differential equations, it is useful to regard solutions as
complex-valued functions of a real variable even though the solutions that
are meaningful to us in a physical sense are real-valued. The convenience
of this viewpoint will become clear later. Thus we are concerned with the
vector space J-~(R, C) (as defined in Example 3 of Section 1.2). In order to
consider complex-valued functions of a real variable as solutions to differential
equations, we must define what it means to differentiate such functions. Given
a complex-valued function x G T(R, C) of a real variable t, there exist unique
real-valued functions xi and X2 of i, such that
x(t) = xi(t) + ix2(t) for t G R,
where i is the imaginary number such that i2 = —1. We call Xi the real part
and X2 the imaginary part of x.
Definitions. Given a function x G J-(R, C) with real part x\ and imag­
inary part x2, we say that x is differentiable ifx\ and x2 are differentiable.
If x is differentiable, we define the derivative x' of x by
3s — 'dj i ~\~ IXo'
We illustrate some computations with complex-valued functions in the
following example.

Example 2
Suppose that x(t) = cos 2t + i sin 2t. Then

    x'(t) = −2 sin 2t + 2i cos 2t.

We next find the real and imaginary parts of x². Since

    x²(t) = (cos 2t + i sin 2t)² = (cos² 2t − sin² 2t) + i(2 sin 2t cos 2t)
          = cos 4t + i sin 4t,

the real part of x²(t) is cos 4t, and the imaginary part is sin 4t. •
The next theorem indicates that we may limit our investigations to a
vector space considerably smaller than T(R, C). Its proof, which is illustrated
in Example 3, involves a simple induction argument, which we omit.
Theorem 2.27. Any solution to a homogeneous linear differential equation
with constant coefficients has derivatives of all orders; that is, if x is a
solution to such an equation, then x^{(k)} exists for every positive integer k.
Example 3
To illustrate Theorem 2.27, consider the equation

    y^{(2)} + 4y = 0.

Clearly, to qualify as a solution, a function y must have two derivatives. If y
is a solution, however, then

    y^{(2)} = −4y.

Thus since y^{(2)} is a constant multiple of a function y that has two derivatives,
y^{(2)} must have two derivatives. Hence y^{(4)} exists; in fact,

    y^{(4)} = −4y^{(2)}.

Since y^{(4)} is a constant multiple of a function that we have shown has at
least two derivatives, it also has at least two derivatives; hence y^{(6)} exists.
Continuing in this manner, we can show that any solution has derivatives of
all orders. •
Definition. We use C^∞ to denote the set of all functions in F(R, C) that
have derivatives of all orders.
It is a simple exercise to show that C^∞ is a subspace of F(R, C) and hence
a vector space over C. In view of Theorem 2.27, it is this vector space that
is of interest to us. For x ∈ C^∞, the derivative x' of x also lies in C^∞. We
can use the derivative operation to define a mapping D: C^∞ → C^∞ by

    D(x) = x'   for x ∈ C^∞.

It is easy to show that D is a linear operator. More generally, consider any
polynomial over C of the form

    p(t) = a_n t^n + a_{n−1} t^{n−1} + ⋯ + a_1 t + a_0.

If we define

    p(D) = a_n D^n + a_{n−1} D^{n−1} + ⋯ + a_1 D + a_0 I,

then p(D) is a linear operator on C^∞. (See Appendix E.)
Definitions. For any polynomial p(t) over C of positive degree, p(D) is
called a differential operator. The order of the differential operator p(D)
is the degree of the polynomial p(t).
Differential operators are useful since they provide us with a means of
reformulating a differential equation in the context of linear algebra. Any
homogeneous linear differential equation with constant coefficients,

    y^{(n)} + a_{n−1} y^{(n−1)} + ⋯ + a_1 y^{(1)} + a_0 y = 0,

can be rewritten using differential operators as

    (D^n + a_{n−1} D^{n−1} + ⋯ + a_1 D + a_0 I)(y) = 0.

Definition. Given the differential equation above, the complex polynomial

    p(t) = t^n + a_{n−1} t^{n−1} + ⋯ + a_1 t + a_0

is called the auxiliary polynomial associated with the equation.
For example, (3) has the auxiliary polynomial

    p(t) = t² + k/m.

Any homogeneous linear differential equation with constant coefficients
can be rewritten as

    p(D)(y) = 0,
where p(t) is the auxiliary polynomial associated with the equation. Clearly,
this equation implies the following theorem.

Theorem 2.28. The set of all solutions to a homogeneous linear differential
equation with constant coefficients coincides with the null space of p(D),
where p(t) is the auxiliary polynomial associated with the equation.
Proof. Exercise. ∎
Corollary. The set of all solutions to a homogeneous linear differential
equation with constant coefficients is a subspace of C°°.
In view of the preceding corollary, we call the set of solutions to a homo­
geneous linear differential equation with constant coefficients the solution
space of the equation. A practical way of describing such a space is in terms
of a basis. We now examine a certain class of functions that is of use in
finding bases for these solution spaces.
For a real number s, we are familiar with the real number e^s, where e is
the unique number whose natural logarithm is 1 (i.e., ln e = 1). We know,
for instance, certain properties of exponentiation, namely,

    e^{s+t} = e^s e^t   and   e^{−s} = 1/e^s

for any real numbers s and t. We now extend the definition of powers of e to
include complex numbers in such a way that these properties are preserved.
Definition. Let c = a + ib be a complex number with real part a and
imaginary part b. Define

    e^c = e^a(cos b + i sin b).

The special case

    e^{ib} = cos b + i sin b

is called Euler's formula.
For example, for c = 2 + i(π/3),

    e^c = e²(cos(π/3) + i sin(π/3)) = e²(1/2 + i(√3/2)).

Clearly, if c is real (b = 0), then we obtain the usual result: e^c = e^a. Using
the approach of Example 2, we can show by the use of trigonometric identities
that

    e^{c+d} = e^c e^d   and   e^{−c} = 1/e^c

for any complex numbers c and d.
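These definitions are easy to experiment with numerically. The following sketch (Python; the particular values of c and d are arbitrary choices of ours) evaluates e^c from the definition and checks the identity e^{c+d} = e^c e^d.

```python
import cmath, math

# e^c via the definition e^{a+ib} = e^a (cos b + i sin b), for c = 2 + i(π/3)
a, b = 2.0, math.pi / 3
by_definition = math.exp(a) * complex(math.cos(b), math.sin(b))
print(by_definition)                  # approximately 3.694528 + 6.399120j
print(cmath.exp(complex(a, b)))       # the built-in complex exponential agrees

# e^{c+d} = e^c e^d for complex c and d
c, d = complex(2, math.pi/3), complex(-1, 0.5)
print(abs(cmath.exp(c + d) - cmath.exp(c) * cmath.exp(d)) < 1e-12)   # True
```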

Definition. A function f: R → C defined by f(t) = e^{ct} for a fixed
complex number c is called an exponential function.
The derivative of an exponential function, as described in the next theorem,
is consistent with the real version. The proof involves a straightforward
computation, which we leave as an exercise.
Theorem 2.29. For any exponential function f(t) = e^{ct}, f'(t) = c e^{ct}.
Proof. Exercise. ∎
We can use exponential functions to describe all solutions to a homoge­
neous linear differential equation of order 1. Recall that the order of such an
equation is the degree of its auxiliary polynomial. Thus an equation of order
1 is of the form
    y' + a_0 y = 0.                                                       (5)

Theorem 2.30. The solution space for (5) is of dimension 1 and has
{e^{−a_0 t}} as a basis.
Proof. Clearly (5) has e^{−a_0 t} as a solution. Suppose that x(t) is any solution
to (5). Then

    x'(t) = −a_0 x(t)   for all t ∈ R.

Define

    z(t) = e^{a_0 t} x(t).

Differentiating z yields

    z'(t) = (e^{a_0 t})' x(t) + e^{a_0 t} x'(t) = a_0 e^{a_0 t} x(t) − a_0 e^{a_0 t} x(t) = 0.
(Notice that the familiar product rule for differentiation holds for complex-
valued functions of a real variable. A justification of this involves a lengthy,
although direct, computation.)
Since z' is identically zero, z is a constant function. (Again, this fact, well
known for real-valued functions, is also true for complex-valued functions.
The proof, which relies on the real case, involves looking separately at the
real and imaginary parts of z.) Thus there exists a complex number k such
that
    z(t) = e^{a_0 t} x(t) = k   for all t ∈ R.

So

    x(t) = k e^{−a_0 t}.

We conclude that any solution to (5) is a linear combination of e^{−a_0 t}. ∎

Another way of stating Theorem 2.30 is as follows.
Corollary. For any complex number c, the null space of the differential
operator D − cI has {e^{ct}} as a basis.
We next concern ourselves with differential equations of order greater
than one. Given an nth order homogeneous linear differential equation with
constant coefficients,
    y^{(n)} + a_{n−1} y^{(n−1)} + ⋯ + a_1 y^{(1)} + a_0 y = 0,

its auxiliary polynomial

    p(t) = t^n + a_{n−1} t^{n−1} + ⋯ + a_1 t + a_0

factors into a product of polynomials of degree 1, that is,

    p(t) = (t − c_1)(t − c_2) ⋯ (t − c_n),

where c_1, c_2, ..., c_n are (not necessarily distinct) complex numbers. (This
follows from the fundamental theorem of algebra in Appendix D.) Thus

    p(D) = (D − c_1 I)(D − c_2 I) ⋯ (D − c_n I).

The operators D − c_i I commute, and so, by Exercise 9, we have that

    N(D − c_i I) ⊆ N(p(D))   for all i.

Since N(p(D)) coincides with the solution space of the given differential equation,
we can deduce the following result from the preceding corollary.
Theorem 2.31. Let p(t) be the auxiliary polynomial for a homogeneous
linear differential equation with constant coefficients. For any complex number
c, if c is a zero of p(t), then e^{ct} is a solution to the differential equation.
Example 4
Given the differential equation

    y'' − 3y' + 2y = 0,

its auxiliary polynomial is

    p(t) = t² − 3t + 2 = (t − 1)(t − 2).

Hence, by Theorem 2.31, e^t and e^{2t} are solutions to the differential equation
because c = 1 and c = 2 are zeros of p(t). Since the solution space
of the differential equation is a subspace of C^∞, span({e^t, e^{2t}}) lies in the
solution space. It is a simple matter to show that {e^t, e^{2t}} is linearly independent.
Thus if we can show that the solution space is two-dimensional, we
can conclude that {e^t, e^{2t}} is a basis for the solution space. This result is a
consequence of the next theorem. •

Sec. 2.7 Homogeneous Linear Differential Equations with Constant Coefficients 135
Theorem 2.32. For any differential operator p(D) of order n, the null
space ofp(D) is an n-dimensional subspace of C°°.
As a preliminary to the proof of Theorem 2.32, we establish two lemmas.
Lemma 1. The differential operator D — c\: C°° —> C°° is onto for any
complex number c.
Proof. Let v G C°°. We wish to find a u G C°° such that (D — c)u = v.
Let w(t) = v(t)e~ct for t G R. Clearly, w G C°° because both v and e~ct lie in
C°°. Let W\ and w2 be the real and imaginary parts of w. Then w\ and w2 are
continuous because they are differentiable. Hence they have antiderivatives,
say, W\ and W2, respectively. Let W: R, —* C be defined by
W(t) = Wt (t) + iW2(t) for t G R.
Then W G C°°, and the real and imaginary parts of W are W\ and W2,
respectively. Furthermore, W = w. Finally, let u: R —* C be defined by
u(t) = W(t)ect for t G R. Clearly u G C°°, and since
(D - c)u(t) = u'(t) - cu(t)
= W'(t)cct + W(t)cect - cW(f)ect
= w(t)e,:t
= v(t)erctert
= v(t),
we have (D — c)u = v. II
Lemma 2. Let V be a vector space, and suppose that T and U are
linear operators on V such that U is onto and the null spaces of T and U are
finite-dimensional. Then the null space of TU is finite-dimensional, and
dim(N(TU)) = dim(N(T)) + dim(N(U)).
Proof. Let p — dim(N(T)), q = dim(N(U)), and {ui,u2,... ,up} and
{v\,v2,... ,Vq] be bases for N(T) and N(U), respectively. Since U is onto,
we can choose for each i (1 < i < p) a vector Wi G V such that U(wi) — m.
Note that the Wi's are distinct. Furthermore, for any i and j, Wi 7^ Vj, for
otherwise Ui = i)(wi) = U(VJ) = 0—a contradiction. Hence the set
0= {wi,w2,... ,wp,vi,v2,... ,vq}
contains p + q distinct vectors. To complete the proof of the lemma, it suffices
to show that 0 is a basis for N(TU).

We first show that β generates N(TU). Since for any w_i and v_j in β,
TU(w_i) = T(u_i) = 0 and TU(v_j) = T(0) = 0, it follows that β ⊆ N(TU).
Now suppose that v ∈ N(TU). Then 0 = TU(v) = T(U(v)). Thus U(v) ∈
N(T). So there exist scalars a_1, a_2, ..., a_p such that

    U(v) = a_1 u_1 + a_2 u_2 + ⋯ + a_p u_p
         = a_1 U(w_1) + a_2 U(w_2) + ⋯ + a_p U(w_p)
         = U(a_1 w_1 + a_2 w_2 + ⋯ + a_p w_p).

Hence

    U(v − (a_1 w_1 + a_2 w_2 + ⋯ + a_p w_p)) = 0.

Consequently, v − (a_1 w_1 + a_2 w_2 + ⋯ + a_p w_p) lies in N(U). It follows that
there exist scalars b_1, b_2, ..., b_q such that

    v − (a_1 w_1 + a_2 w_2 + ⋯ + a_p w_p) = b_1 v_1 + b_2 v_2 + ⋯ + b_q v_q

or

    v = a_1 w_1 + a_2 w_2 + ⋯ + a_p w_p + b_1 v_1 + b_2 v_2 + ⋯ + b_q v_q.

Therefore β spans N(TU).
To prove that β is linearly independent, let a_1, a_2, ..., a_p, b_1, b_2, ..., b_q be
any scalars such that

    a_1 w_1 + a_2 w_2 + ⋯ + a_p w_p + b_1 v_1 + b_2 v_2 + ⋯ + b_q v_q = 0.   (6)

Applying U to both sides of (6), we obtain

    a_1 u_1 + a_2 u_2 + ⋯ + a_p u_p = 0.

Since {u_1, u_2, ..., u_p} is linearly independent, the a_i's are all zero. Thus (6)
reduces to

    b_1 v_1 + b_2 v_2 + ⋯ + b_q v_q = 0.

Again, the linear independence of {v_1, v_2, ..., v_q} implies that the b_i's are
all zero. We conclude that β is a basis for N(TU). Hence N(TU) is finite-
dimensional, and dim(N(TU)) = p + q = dim(N(T)) + dim(N(U)). ∎
Proof of Theorem 2.32. The proof is by mathematical induction on the
order of the differential operator p(D). The first-order case coincides with
Theorem 2.30. For some integer n > 1, suppose that Theorem 2.32 holds
for any differential operator of order less than n, and consider a differential
operator p(D) of order n. The polynomial p(t) can be factored into a product
of two polynomials as follows:

    p(t) = q(t)(t − c),

where q(t) is a polynomial of degree n − 1 and c is a complex number. Thus
the given differential operator may be rewritten as

    p(D) = q(D)(D − cI).

Now, by Lemma 1, D − cI is onto, and by the corollary to Theorem 2.30,
dim(N(D − cI)) = 1. Also, by the induction hypothesis, dim(N(q(D))) = n − 1.
Thus, by Lemma 2, we conclude that

    dim(N(p(D))) = dim(N(q(D))) + dim(N(D − cI))
                 = (n − 1) + 1 = n. ∎
Corollary. The solution space of any nth-order homogeneous linear dif­
ferential equation with constant coefficients is an n-dimensional subspace of
C°°.
The corollary to Theorem 2.32 reduces the problem of finding all solutions
to an nth-order homogeneous linear differential equation with constant coefficients
to finding a set of n linearly independent solutions to the equation. By
the results of Chapter 1, any such set must be a basis for the solution space.
The next theorem enables us to find a basis quickly for many such equations.
Hints for its proof are provided in the exercises.
Theorem 2.33. Given n distinct complex numbers c_1, c_2, ..., c_n, the set
of exponential functions {e^{c_1 t}, e^{c_2 t}, ..., e^{c_n t}} is linearly independent.
Proof. Exercise. (See Exercise 10.) ∎
Corollary. For any nth-order homogeneous linear differential equation
with constant coefficients, if the auxiliary polynomial has n distinct zeros
c_1, c_2, ..., c_n, then {e^{c_1 t}, e^{c_2 t}, ..., e^{c_n t}} is a basis for the solution space of the
differential equation.
Proof. Exercise. (See Exercise 10.) ∎
Example 5
We find all solutions to the differential equation

    y'' + 5y' + 4y = 0.

Since the auxiliary polynomial factors as (t + 4)(t + 1), it has two distinct
zeros, −1 and −4. Thus {e^{−t}, e^{−4t}} is a basis for the solution space. So any
solution to the given equation is of the form

    y(t) = b_1 e^{−t} + b_2 e^{−4t}

for unique scalars b_1 and b_2. •
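The passage from a differential equation to the zeros of its auxiliary polynomial can be automated. Here is a short sketch with sympy, applied to the equation of Example 5; dsolve is used only as an independent check.

```python
import sympy as sp

t = sp.symbols('t')
y = sp.Function('y')

# Auxiliary polynomial of y'' + 5y' + 4y = 0 and its zeros
p = sp.Poly(t**2 + 5*t + 4, t)
print(p.all_roots())                      # [-4, -1]

# dsolve recovers the same basis {e^{-t}, e^{-4t}} up to the labeling of constants
sol = sp.dsolve(sp.Eq(y(t).diff(t, 2) + 5*y(t).diff(t) + 4*y(t), 0), y(t))
print(sol)                                # something like Eq(y(t), C1*exp(-4*t) + C2*exp(-t))
```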
Example 6
We find all solutions to the differential equation

    y'' + 9y = 0.

The auxiliary polynomial t² + 9 factors as (t − 3i)(t + 3i) and hence has
distinct zeros c_1 = 3i and c_2 = −3i. Thus {e^{3it}, e^{−3it}} is a basis for the
solution space. Since

    cos 3t = (1/2)(e^{3it} + e^{−3it})   and   sin 3t = (1/2i)(e^{3it} − e^{−3it}),

it follows from Exercise 7 that {cos 3t, sin 3t} is also a basis for this solution
space. This basis has an advantage over the original one because it consists of
the familiar sine and cosine functions and makes no reference to the imaginary
number i. Using this latter basis, we see that any solution to the given
equation is of the form

    y(t) = b_1 cos 3t + b_2 sin 3t

for unique scalars b_1 and b_2. •
Next consider the differential equation

    y'' + 2y' + y = 0,

for which the auxiliary polynomial is (t + 1)². By Theorem 2.31, e^{−t} is a
solution to this equation. By the corollary to Theorem 2.32, its solution
space is two-dimensional. In order to obtain a basis for the solution space,
we need a solution that is linearly independent of e^{−t}. The reader can verify
that te^{−t} is such a solution. The following lemma extends this result.
Lemma. For a given complex number c and positive integer n, suppose
that (t − c)^n is the auxiliary polynomial of a homogeneous linear differential
equation with constant coefficients. Then the set

    β = {e^{ct}, t e^{ct}, ..., t^{n−1} e^{ct}}

is a basis for the solution space of the equation.

Sec. 2.7 Homogeneous Linear Differential Equations with Constant Coefficients 139
Proof. Since the solution space is n-dimensional, we need only show that
0 is linearly independent and lies in the solution space. First, observe that
for any positive integer k,
k-\ „ct Lk„ct Lk„ct
Hence for k < n,
D - c)(tKect) = ktK~lect + ctKect - ctKe
= ktk~1ect.
(D-c)n(tkect) = 0.
It follows that 0 is a subset of the solution space.
We next show that 0 is linearly independent. Consider any linear combi­
nation of vectors in 0 such that
,Tl-l„ct
b0ect + hie" + • • • + bn-itn-*ea = 0
for some scalars bo, &i,..., 6n-i- Dividing by ect in (7), we obtain
&n + M
n-l bn^tn~1 = 0.
(7)
(8)
y(4) _ 4y(3) + §yK
Thus the left side of (8) must be the zero polynomial function. We conclude
that the coefficients bo,bi,...,bn—i are all zero. So 0 is linearly independent
and hence is a basis for the solution space. f]
Example 7
We find all solutions to the differential equation
Jy(2) -^(l) +y=0.
Since the auxiliary polynomial is
t4 - 4t6 + 6t2 -4t+l = (t- l)4,
we can immediately conclude by the preceding lemma that {e*, tef, fte1:, i3e*}
is a basis for the solution space. So any solution y to the given differential
equation is of the form
y(t) = bie1 + bite* + M2e* + ht3e*
for unique scalars b\, b2, bj, and 64. •
The most general situation is stated in the following theorem.
Theorem 2.34. Given a homogeneous linear differential equation with
constant coefficients and auxiliary polynomial
(t-Ci)ni(t-c2)n*-'-(t-ck)n
where n\,n2,...,nk arc positive integers and c,\,c2,... ,ck are distinct com­
plex numbers, the following set is a basis for the solution space of the equation:
{eClt,teCl\ ... ,tni-1ec,t,... ,eCfct,ieCfct,... ,tn" eCkt}.

Proof. Exercise. ∎
Example 8
The differential equation

    y^{(3)} − 4y^{(2)} + 5y^{(1)} − 2y = 0

has the auxiliary polynomial

    t³ − 4t² + 5t − 2 = (t − 1)²(t − 2).

By Theorem 2.34, {e^t, te^t, e^{2t}} is a basis for the solution space of the differential
equation. Thus any solution y has the form

    y(t) = b_1 e^t + b_2 t e^t + b_3 e^{2t}

for unique scalars b_1, b_2, and b_3. •
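The same recipe works mechanically when the auxiliary polynomial has repeated zeros. The following sketch (sympy; the basis is assembled exactly as Theorem 2.34 prescribes) factors the auxiliary polynomial of Example 8 and lists the corresponding basis functions.

```python
import sympy as sp

t = sp.symbols('t')

# Auxiliary polynomial of y''' - 4y'' + 5y' - 2y = 0 from Example 8
p = sp.Poly(t**3 - 4*t**2 + 5*t - 2, t)
print(sp.factor(p.as_expr()))        # (t - 2)*(t - 1)**2
print(p.all_roots())                 # [1, 1, 2]

# For each zero c of multiplicity n, include t^j e^{ct} for 0 <= j < n
basis = [t**j * sp.exp(c*t) for c, n in sp.roots(p).items() for j in range(n)]
print(basis)                         # [exp(t), t*exp(t), exp(2*t)] (order may vary)
```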
EXERCISES
1. Label the following statements as true or false.
(a) The set of solutions to an nth-order homogeneous linear differential
equation with constant coefficients is an n-dimensional subspace of
C°°.
(b) The solution space of a homogeneous linear differential equation
with constant coefficients is the null space of a differential operator.
(c) The auxiliary polynomial of a homogeneous linear differential
equation with constant coefficients is a solution to the differential
equation.
(d) Any solution to a homogeneous linear differential equation with
constant coefficients is of the form aect or atkect, where a and c
are complex numbers and k is a positive integer.
(e) Any linear combination of solutions to a given homogeneous linear
differential equation with constant coefficients is also a solution to
the given equation.
(f) For any homogeneous linear differential equation with constant
coefficients having auxiliary polynomial p(t), if c_1, c_2, ..., c_k are
the distinct zeros of p(t), then {e^{c_1 t}, e^{c_2 t}, ..., e^{c_k t}} is a basis for
the solution space of the given differential equation.
(g) Given any polynomial p(t) G P(C), there exists a homogeneous lin­
ear differential equation with constant coefficients whose auxiliary
polynomial is p(t).

2. For each of the following parts, determine whether the statement is true
or false. Justify your claim with either a proof or a counterexample,
whichever is appropriate.
(a) Any finite-dimensional subspace of C°° is the solution space of a
homogeneous linear differential equation with constant coefficients.
(b) There exists a homogeneous linear differential equation with con­
stant coefficients whose solution space has the basis {t, t2}.
(c) For any homogeneous linear differential equation with constant
coefficients, if x is a solution to the equation, so is its derivative
x'.
Given two polynomials p(t) and q(t) in P(C), if x ∈ N(p(D)) and y ∈
N(q(D)), then
(d) x + y ∈ N(p(D)q(D)).
(e) xy ∈ N(p(D)q(D)).
3. Find a basis for the solution space of each of the following differential
equations.
(a) y'' + 2y' + y = 0
(b) y''' = y'
(c) y^{(4)} − 2y^{(2)} + y = 0
(d) y'' + 2y' + y = 0
(e) y^{(3)} − y^{(2)} + 3y^{(1)} + 5y = 0
4. Find a basis for each of the following subspaces of C^∞.
(a) N(D² − D − I)
(b) N(D³ − 3D² + 3D − I)
(c) N(D³ + 6D² + 8D)
5. Show that C^∞ is a subspace of F(R, C).
6. (a) Show that D: C^∞ → C^∞ is a linear operator.
(b) Show that any differential operator is a linear operator on C^∞.
7. Prove that if {x, y} is a basis for a vector space over C, then so is

    {(1/2)(x + y), (1/2i)(x − y)}.

8. Consider a second-order homogeneous linear differential equation with
constant coefficients in which the auxiliary polynomial has distinct conjugate
complex roots a + ib and a − ib, where a, b ∈ R. Show that
{e^{at} cos bt, e^{at} sin bt} is a basis for the solution space.

9. Suppose that {U_1, U_2, ..., U_n} is a collection of pairwise commutative
linear operators on a vector space V (i.e., operators such that U_i U_j =
U_j U_i for all i, j). Prove that, for any i (1 ≤ i ≤ n),

    N(U_i) ⊆ N(U_1 U_2 ⋯ U_n).

10. Prove Theorem 2.33 and its corollary. Hint: Suppose that

    b_1 e^{c_1 t} + b_2 e^{c_2 t} + ⋯ + b_n e^{c_n t} = 0   (where the c_i's are distinct).

To show the b_i's are zero, apply mathematical induction on n as follows.
Verify the theorem for n = 1. Assuming that the theorem is true for
n − 1 functions, apply the operator D − c_n I to both sides of the given
equation to establish the theorem for n distinct exponential functions.
11. Prove Theorem 2.34. Hint: First verify that the alleged basis lies in
the solution space. Then verify that this set is linearly independent by
mathematical induction on k as follows. The case k = 1 is the lemma
to Theorem 2.34. Assuming that the theorem holds for k − 1 distinct
c_i's, apply the operator (D − c_k I)^{n_k} to any linear combination of the
alleged basis that equals 0.
12. Let V be the solution space of an nth-order homogeneous linear differential
equation with constant coefficients having auxiliary polynomial
p(t). Prove that if p(t) = g(t)h(t), where g(t) and h(t) are polynomials
of positive degree, then

    N(h(D_V)) = R(g(D_V)) = g(D)(V),

where D_V: V → V is defined by D_V(x) = x' for x ∈ V. Hint: First prove
g(D)(V) ⊆ N(h(D)). Then prove that the two spaces have the same
finite dimension.
13. A differential equation

    y^{(n)} + a_{n−1} y^{(n−1)} + ⋯ + a_1 y^{(1)} + a_0 y = x

is called a nonhomogeneous linear differential equation with constant
coefficients if the a_i's are constant and x is a function that is not identically
zero.
(a) Prove that for any x ∈ C^∞ there exists y ∈ C^∞ such that y is
a solution to the differential equation. Hint: Use Lemma 1 to
Theorem 2.32 to show that for any polynomial p(t), the linear
operator p(D): C^∞ → C^∞ is onto.
Sec. 2.7 Homogeneous Linear Differential Equations with Constant Coefficients 143
(b) Let V be the solution space for the homogeneous linear equation
•+Oii(/(1) +a0y= 0. y^ + on-i^"-1)
Prove that if z is any solution to the associated nonhomogeneous
linear differential equation, then the set of all solutions to the
nonhomogeneous linear differential equation is
[z + y: ye V}.
14. Given any nth-order homogeneous linear differential equation with con­
stant coefficients, prove that, for any solution x and any to G R, if
x(t0) = x'(to) = ••• = .T(TC_1)(£0) = 0, then x = 0 (the zero function).
Hint: Use mathematical induction on n as follows. First prove the con­
clusion for the case n = 1. Next suppose that it is true for equations of
order n — 1, and consider an nth-order differential equation with aux­
iliary polynomial p(t). Factor p(t) = q(t)(t — c), and let z = q((D))x.
Show that z(to) = 0 and z' — cz — 0 to conclude that z = 0. Now apply
the induction hypothesis.
15. Let V be the solution space of an nth-order homogeneous linear dif­
ferential equation with constant coefficients. Fix ta G R, and define a
mapping $: V —+ C™ by
*(*) =
* x(to)
x'(to)
\xSn-lHto)J
for each x in V.
(a)
(b)
Prove that $ is linear and its null space is the zero subspace of V.
Deduce that «£ is an isomorphism. Hint: Use Exercise 14.
Prove the following: For any nth-order homogeneous linear dif­
ferential equation with constant coefficients, any to G R, and any
complex numbers Co, Ci,... ,cn_i (not necessarily distinct), there
exists exactly one solution, x, to the given differential equation
such that x(to) — co and x^(to) = ck for k = 1, 2,... n — 1.
16. Pendular Motion. It is well known that the motion of a pendulum is
approximated by the differential equation

    θ'' + (g/l) θ = 0,

where θ(t) is the angle in radians that the pendulum makes with a
vertical line at time t (see Figure 2.8), interpreted so that θ is positive
if the pendulum is to the right and negative if the pendulum is to the
left of the vertical line as viewed by the reader. Here l is the length
of the pendulum and g is the magnitude of acceleration due to gravity.
The variable t and constants l and g must be in compatible units (e.g.,
t in seconds, l in meters, and g in meters per second per second).
(a) Express an arbitrary solution to this equation as a linear combination
of two real-valued solutions.
(b) Find the unique solution to the equation that satisfies the conditions

    θ(0) = θ_0 > 0   and   θ'(0) = 0.

(The significance of these conditions is that at time t = 0 the
pendulum is released from a position displaced from the vertical
by θ_0.)
(c) Prove that it takes 2π√(l/g) units of time for the pendulum to make
one circuit back and forth. (This time is called the period of the
pendulum.)
17. Periodic Motion of a Spring without Damping. Find the general solu­
tion to (3), which describes the periodic motion of a spring, ignoring
frictional forces.
18. Periodic Motion of a Spring with Damping. The ideal periodic motion
described by solutions to (3) is due to the ignoring of frictional forces.
In reality, however, there is a frictional force acting on the motion that
is proportional to the speed of motion, but that acts in the opposite
direction. The modification of (3) to account for the frictional force,
called the damping force, is given by

    my'' + ry' + ky = 0,

where r > 0 is the proportionality constant.
(a) Find the general solution to this equation.
(b) Find the unique solution in (a) that satisfies the initial conditions
y(0) = 0 and y'(0) = v_0, the initial velocity.
(c) For y(t) as in (b), show that the amplitude of the oscillation decreases
to zero; that is, prove that lim_{t→∞} y(t) = 0.
19. In our study of differential equations, we have regarded solutions as
complex-valued functions even though functions that are useful in de­
scribing physical motion are real-valued. Justify this approach.
20. The following parts, which do not involve linear algebra, are included
for the sake of completeness.
(a) Prove Theorem 2.27. Hint: Use mathematical induction on the
number of derivatives possessed by a solution.
(b) For any c, d ∈ C, prove that

    e^{c+d} = e^c e^d   and   e^{−c} = 1/e^c.

(c) Prove Theorem 2.28.
(d) Prove Theorem 2.29.
(e) Prove the product rule for differentiating complex-valued functions
of a real variable: For any differentiable functions x and
y in F(R, C), the product xy is differentiable and

    (xy)' = x'y + xy'.

Hint: Apply the rules of differentiation to the real and imaginary
parts of xy.
(f) Prove that if x ∈ F(R, C) and x' = 0, then x is a constant function.
INDEX OF DEFINITIONS FOR CHAPTER 2
Auxiliary polynomial 131
Change of coordinate matrix 112
Clique 94
Coefficients of a differential equation
128
Coordinate function 119
Coordinate vector relative to a basis
80
Differential equation 128
Differential operator 131
Dimension theorem 69
Dominance relation 95
Double dual 120
Dual basis 120
Dual space 119
Euler's formula 132
Exponential function 133
Fourier coefficient 119
Homogeneous linear differential
equation 128
Identity matrix 89
Identity transformation 67

Incidence matrix 94
Inverse of a linear transformation
99
Inverse of a matrix 100
Invertible linear transformation 99
Invertible matrix 100
Isomorphic vector spaces 102
Isomorphism 102
Kronecker delta 89
Left-multiplication transformation
92
Linear functional 119
Linear operator 112
Linear transformation 65
Matrix representing a linear trans­
formation 80
Nonhomogeneous differential equa­
tion 142
Nullity of a linear transformation
69
Null space 67
Ordered basis 79
Order of a differential equation 129
Order of a differential operator 131
Product of matrices 87
Projection on a subspace 76
Projection on the x-axis 66
Range 67
Rank of a linear transformation 69
Reflection about the x-axis 66
Rotation 66
Similar matrices 115
Solution to a differential equation
129
Solution space of a homogeneous dif­
ferential equation 132
Standard ordered basis for F^n 79
Standard ordered basis for Pn(F)
79
Standard representation of a vector
space with respect to a basis 104
Transpose of a linear transformation
121
Zero transformation 67

3
Elementary Matrix
Operations and Systems
of Linear Equations
3.1 Elementary Matrix Operations and Elementary Matrices
3.2 The Rank of a Matrix and Matrix Inverses
3.3 Systems of Linear Equations—Theoretical Aspects
3.4 Systems of Linear Equations—Computational Aspects
This chapter is devoted to two related objectives:
1. the study of certain "rank-preserving" operations on matrices;
2. the application of these operations and the theory of linear transforma­
tions to the solution of systems of linear equations.
As a consequence of objective 1, we obtain a simple method for com­
puting the rank of a linear transformation between finite-dimensional vector
spaces by applying these rank-preserving matrix operations to a matrix that
represents that transformation.
Solving a system of linear equations is probably the most important ap­
plication of linear algebra. The familiar method of elimination for solving
systems of linear equations, which was discussed in Section 1.4, involves the
elimination of variables so that a simpler system can be obtained. The tech­
nique by which the variables are eliminated utilizes three types of operations:
1. interchanging any two equations in the system;
2. multiplying any equation in the system by a nonzero constant;
3. adding a multiple of one equation to another.
In Section 3.3, we express a system of linear equations as a single matrix
equation. In this representation of the system, the three operations above
are the "elementary row operations" for matrices. These operations provide
a convenient computational method for determining all solutions to a system
of linear equations.

3.1 ELEMENTARY MATRIX OPERATIONS AND ELEMENTARY
MATRICES
In this section, we define the elementary operations that are used throughout
the chapter. In subsequent sections, we use these operations to obtain simple
computational methods for determining the rank of a linear transformation
and the solution of a system of linear equations. There are two types of el­
ementary matrix operations—row operations and column operations. As we
will see, the row operations are more useful. They arise from the three opera­
tions that can be used to eliminate variables in a system of linear equations.
Definitions. Let A be an m x n matrix. Any one of the following
three operations on the rows [columns] of A is called an elementary row
[column] operation:
(1) interchanging any two rows [columns] of A;
(2) multiplying any row [column] of A by a nonzero scalar;
(3) adding any scalar multiple of a row [column] of A to another row [col­
umn].
Any of these three operations is called an elementary operation. Elementary
operations are of type 1, type 2, or type 3 depending on whether they
are obtained by (1), (2), or (3).
Example 1
Let

    A = ( 1  2  3  4
          2  1 −1  3
          4  0  1  2 ).

Interchanging the second row of A with the first row is an example of an
elementary row operation of type 1. The resulting matrix is

    B = ( 2  1 −1  3
          1  2  3  4
          4  0  1  2 ).

Multiplying the second column of A by 3 is an example of an elementary
column operation of type 2. The resulting matrix is

    C = ( 1  6  3  4
          2  3 −1  3
          4  0  1  2 ).

Adding 4 times the third row of A to the first row is an example of an
elementary row operation of type 3. In this case, the resulting matrix is

    M = ( 17  2  7 12
           2  1 −1  3
           4  0  1  2 ).
Notice that if a matrix Q can be obtained from a matrix P by means of an
elementary row operation, then P can be obtained from Q by an elementary
row operation of the same type. (See Exercise 8.) So, in Example 1, A can
be obtained from M by adding —4 times the third row of M to the first row
of M.
Definition. An n × n elementary matrix is a matrix obtained by
performing an elementary operation on I_n. The elementary matrix is said
to be of type 1, 2, or 3 according to whether the elementary operation
performed on I_n is a type 1, 2, or 3 operation, respectively.
For example, interchanging the first two rows of I_3 produces the elementary
matrix

    E = ( 0  1  0
          1  0  0
          0  0  1 ).

Note that E can also be obtained by interchanging the first two columns of
I_3. In fact, any elementary matrix can be obtained in at least two ways:
either by performing an elementary row operation on I_n or by performing an
elementary column operation on I_n. (See Exercise 4.) Similarly,

    ( 1  0 −2
      0  1  0
      0  0  1 )

is an elementary matrix since it can be obtained from I_3 by an elementary
column operation of type 3 (adding −2 times the first column of I_3 to the
third column) or by an elementary row operation of type 3 (adding −2 times
the third row to the first row).
Our first theorem shows that performing an elementary row operation on
a matrix is equivalent to multiplying the matrix by an elementary matrix.
Theorem 3.1. Let A ∈ M_{m×n}(F), and suppose that B is obtained from
A by performing an elementary row [column] operation. Then there exists an
m × m [n × n] elementary matrix E such that B = EA [B = AE]. In fact,
E is obtained from I_m [I_n] by performing the same elementary row [column]
operation as that which was performed on A to obtain B. Conversely, if E is
an elementary m × m [n × n] matrix, then EA [AE] is the matrix obtained
from A by performing the same elementary row [column] operation as that
which produces E from I_m [I_n].
The proof, which we omit, requires verifying Theorem 3.1 for each type
of elementary row operation. The proof for column operations can then be
obtained by using the matrix transpose to transform a column operation into
a row operation. The details are left as an exercise. (See Exercise 7.)
The next example illustrates the use of the theorem.
Example 2
Consider the matrices A and B in Example 1. In this case, B is obtained from
A by interchanging the first two rows of A. Performing this same operation
on I_3, we obtain the elementary matrix

    E = ( 0  1  0
          1  0  0
          0  0  1 ).

Note that EA = B.
In the second part of Example 1, C is obtained from A by multiplying the
second column of A by 3. Performing this same operation on I_4, we obtain
the elementary matrix

    E = ( 1  0  0  0
          0  3  0  0
          0  0  1  0
          0  0  0  1 ).

Observe that AE = C.
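Theorem 3.1 is easy to confirm numerically for the matrices of Examples 1 and 2; a brief sketch with numpy follows.

```python
import numpy as np

# The matrix A of Example 1 and the elementary matrices of Example 2
A = np.array([[1, 2, 3, 4],
              [2, 1, -1, 3],
              [4, 0, 1, 2]])

E_row = np.eye(3)[[1, 0, 2]]        # I_3 with its first two rows interchanged
E_col = np.diag([1, 3, 1, 1])       # I_4 with its second diagonal entry replaced by 3

B = E_row @ A                       # left multiplication performs the row operation
C = A @ E_col                       # right multiplication performs the column operation
print(B)                            # rows 1 and 2 of A interchanged
print(C)                            # second column of A multiplied by 3
```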
It is a useful fact that the inverse of an elementary matrix is also an
elementary matrix.
Theorem 3.2. Elementary matrices are invertible, and the inverse of an
elementary matrix is an elementary matrix of the same type.
Proof. Let E be an elementary n × n matrix. Then E can be obtained by
an elementary row operation on I_n. By reversing the steps used to transform
I_n into E, we can transform E back into I_n. The result is that I_n can
be obtained from E by an elementary row operation of the same type. By
Theorem 3.1, there is an elementary matrix Ē such that ĒE = I_n. Therefore,
by Exercise 10 of Section 2.4, E is invertible and E^{−1} = Ē. ∎

EXERCISES
1. Label the following statements as true or false.
(a) An elementary matrix is always square.
(b) The only entries of an elementary matrix are zeros and ones.
(c) The n × n identity matrix is an elementary matrix.
(d) The product of two n × n elementary matrices is an elementary
matrix.
(e) The inverse of an elementary matrix is an elementary matrix.
(f) The sum of two n × n elementary matrices is an elementary matrix.
(g) The transpose of an elementary matrix is an elementary matrix.
(h) If B is a matrix that can be obtained by performing an elementary
row operation on a matrix A, then B can also be obtained by
performing an elementary column operation on A.
(i) If B is a matrix that can be obtained by performing an elementary
row operation on a matrix A, then A can be obtained by
performing an elementary row operation on B.
2. Let

    A = ( 1  2  3          B = ( 1  0  3          C = ( 1  0  3
          1  0  1                1 −2  1                0 −2 −2
          1 −1  1 ),             1 −3  1 ),  and        1 −3  1 ).

Find an elementary operation that transforms A into B and an elementary
operation that transforms B into C. By means of several additional
operations, transform C into I_3.
3. Use the proof of Theorem 3.2 to obtain the inverse of each of the following
elementary matrices.

    (a) ( 0  0  1        (b) ( 1  0  0        (c) (  1  0  0
          0  1  0              0  3  0                0  1  0
          1  0  0 )            0  0  1 )             −2  0  1 )

4. Prove the assertion made on page 149: Any elementary n × n matrix can
be obtained in at least two ways: either by performing an elementary
row operation on I_n or by performing an elementary column operation
on I_n.
5. Prove that E is an elementary matrix if and only if E^t is.
6. Let A be an m × n matrix. Prove that if B can be obtained from A by
an elementary row [column] operation, then B^t can be obtained from
A^t by the corresponding elementary column [row] operation.
7. Prove Theorem 3.1.

8. Prove that if a matrix Q can be obtained from a matrix P by an elemen­
tary row operation, then P can be obtained from Q by an elementary
row operation of the same type. Hint: Treat each type of elementary
row operation separately.
9. Prove that any elementary row [column] operation of type 1 can be
obtained by a succession of three elementary row [column] operations
of type 3 followed by one elementary row [column] operation of type 2.
10. Prove that any elementary row [column] operation of type 2 can be
obtained by dividing some row [column] by a nonzero scalar.
11. Prove that any elementary row [column] operation of type 3 can be
obtained by subtracting a multiple of some row [column] from another
row [column].
12. Let A be an m x n matrix. Prove that there exists a sequence of
elementary row operations of types 1 and 3 that transforms A into an
upper triangular matrix.
3.2 THE RANK OF A MATRIX AND MATRIX INVERSES
In this section, we define the rank of a matrix. We then use elementary
operations to compute the rank of a matrix and a linear transformation. The
section concludes with a procedure for computing the inverse of an invertible
matrix.
Definition. If A ∈ M_{m×n}(F), we define the rank of A, denoted rank(A),
to be the rank of the linear transformation L_A: F^n → F^m.
Many results about the rank of a matrix follow immediately from the
corresponding facts about a linear transformation. An important result of
this type, which follows from Fact 3 (p. 100) and Corollary 2 to Theorem 2.18
(p. 102), is that an n × n matrix is invertible if and only if its rank is n.
Every matrix A is the matrix representation of the linear transformation
L_A with respect to the appropriate standard ordered bases. Thus the rank
of the linear transformation L_A is the same as the rank of one of its matrix
representations, namely, A. The next theorem extends this fact to any matrix
representation of any linear transformation defined on finite-dimensional
vector spaces.
Theorem 3.3. Let T: V → W be a linear transformation between finite-
dimensional vector spaces, and let β and γ be ordered bases for V and W,
respectively. Then rank(T) = rank([T]_β^γ).
Proof. This is a restatement of Exercise 20 of Section 2.4. ∎

Now that the problem of finding the rank of a linear transformation has
been reduced to the problem of finding the rank of a matrix, we need a result
that allows us to perform rank-preserving operations on matrices. The next
theorem and its corollary tell us how to do this.
Theorem 3.4. Let A be an m × n matrix. If P and Q are invertible
m × m and n × n matrices, respectively, then
(a) rank(AQ) = rank(A),
(b) rank(PA) = rank(A),
and therefore,
(c) rank(PAQ) = rank(A).
Proof. First observe that

    R(L_{AQ}) = R(L_A L_Q) = L_A L_Q(F^n) = L_A(L_Q(F^n)) = L_A(F^n) = R(L_A)

since L_Q is onto. Therefore

    rank(AQ) = dim(R(L_{AQ})) = dim(R(L_A)) = rank(A).

This establishes (a). To establish (b), apply Exercise 17 of Section 2.4 to
T = L_P. We omit the details. Finally, applying (a) and (b), we have

    rank(PAQ) = rank(PA) = rank(A). ∎

Corollary. Elementary row and column operations on a matrix are rank-
preserving.
Proof. If B is obtained from a matrix A by an elementary row operation,
then there exists an elementary matrix E such that B = EA. By Theorem 3.2
(p. 150), E is invertible, and hence rank(B) = rank(A) by Theorem 3.4. The
proof that elementary column operations are rank-preserving is left as an
exercise. ∎
Now that we have a class of matrix operations that preserve rank, we
need a way of examining a transformed matrix to ascertain its rank. The
next theorem is the first of several in this direction.
Theorem 3.5. The rank of any matrix equals the maximum number of its
linearly independent columns; that is, the rank of a matrix is the dimension
of the subspace generated by its columns.
Proof. For any A ∈ M_{m×n}(F),
rank(A) = rank(L_A) = dim(R(L_A)).

Let β be the standard ordered basis for F^n. Then β spans F^n and hence, by
Theorem 2.2 (p. 68),
R(L_A) = span(L_A(β)) = span({L_A(e_1), L_A(e_2), ..., L_A(e_n)}).
But, for any j, we have seen in Theorem 2.13(b) (p. 90) that L_A(e_j) = Ae_j = a_j,
where a_j is the jth column of A. Hence
R(L_A) = span({a_1, a_2, ..., a_n}).
Thus
rank(A) = dim(R(L_A)) = dim(span({a_1, a_2, ..., a_n})).
Example 1
Let
Observe that the first and second columns of A are linearly independent and
that the third column is a linear combination of the first two. Thus
rank(A) = dim(span({a_1, a_2, a_3})) = 2.
To compute the rank of a matrix A, it is frequently useful to postpone the
use of Theorem 3.5 until A has been suitably modified by means of appro­
priate elementary row and column operations so that the number of linearly
independent columns is obvious. The corollary to Theorem 3.4 guarantees
that the rank of the modified matrix is the same as the rank of A. One
such modification of A can be obtained by using elementary row and col­
umn operations to introduce zero entries. The next example illustrates this
procedure.
Example 2
Let
A =
If we subtract the first row of A from rows 2 and 3 (type 3 elementary row
operations), the result is

If we now subtract twice the first column from the second and subtract the
first column from the third (type 3 elementary column operations), we obtain
It is now obvious that the maximum number of linearly independent columns
of this matrix is 2. Hence the rank of A is 2. •
The next theorem uses this process to transform a matrix into a particu­
larly simple form. The power of this theorem can be seen in its corollaries.
Theorem 3.6. Let A be an m × n matrix of rank r. Then r ≤ m, r ≤ n,
and, by means of a finite number of elementary row and column operations,
A can be transformed into the matrix
D = ( I_r  O_1
      O_2  O_3 ),
where O_1, O_2, and O_3 are zero matrices. Thus D_ii = 1 for i ≤ r and D_ij = 0
otherwise.
Theorem 3.6 and its corollaries are quite important. Its proof, though
easy to understand, is tedious to read. As an aid in following the proof, we
first consider an example.
Example 3
Consider the matrix
A = ( 0 2 4  2 2
      4 4 4  8 0
      8 2 0 10 2
      6 3 2  9 1 ).
By means of a succession of elementary row and column operations, we can
transform A into a matrix D as in Theorem 3.6. We list many of the inter­
mediate matrices, but on several occasions a matrix is transformed from the
preceding one by means of several elementary operations. The number above
each arrow indicates how many elementary operations are involved. Try to
identify the nature of each elementary operation (row or column and type)
in the following matrix transformations.
( 0 2 4  2 2         ( 1 1 1  2 0         ( 1  0  0  0 0
  4 4 4  8 0   --2->   0 2 4  2 2   --5->   0  2  4  2 2
  8 2 0 10 2           8 2 0 10 2           0 -6 -8 -6 2
  6 3 2  9 1 )         6 3 2  9 1 )         0 -3 -4 -3 1 )

        ( 1 0 0 0 0         ( 1 0 0 0 0         ( 1 0 0 0 0         ( 1 0 0 0 0
  --3->   0 1 2 1 1   --3->   0 1 0 0 0   --2->   0 1 0 0 0   --1->   0 1 0 0 0
          0 0 4 0 8           0 0 4 0 8           0 0 1 0 2           0 0 1 0 0
          0 0 2 0 4 )         0 0 2 0 4 )         0 0 0 0 0 )         0 0 0 0 0 ) = D
By the corollary to Theorem 3.4, rank(A) = rank(D). Clearly, however,
rank(D) = 3; so rank(A) = 3. •
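As a quick numerical check of this computation, the following sketch (illustrative only, not part of the text; NumPy's matrix_rank estimates rank from the singular values) confirms that the matrix A of Example 3 has rank 3.

```python
import numpy as np

# The matrix A of Example 3.
A = np.array([
    [0, 2, 4,  2, 2],
    [4, 4, 4,  8, 0],
    [8, 2, 0, 10, 2],
    [6, 3, 2,  9, 1],
], dtype=float)

# matrix_rank counts the singular values of A that are essentially nonzero.
print(np.linalg.matrix_rank(A))  # 3
```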
Note that the first two elementary operations in Example 3 result in a
1 in the 1,1 position, and the next several operations (type 3) result in 0's
everywhere in the first row and first column except for the 1,1 position. Sub­
sequent elementary operations do not change the first row and first column.
With this example in mind, we proceed with the proof of Theorem 3.6.
Proof of Theorem 3.6. If A is the zero matrix, r = 0 by Exercise 3. In
this case, the conclusion follows with D = A.
Now suppose that A ≠ O and r = rank(A); then r > 0. The proof is by
mathematical induction on m, the number of rows of A.
Suppose that m = 1. By means of at most one type 1 column operation
and at most one type 2 column operation, A can be transformed into a matrix
with a 1 in the 1,1 position. By means of at most n — 1 type 3 column
operations, this matrix can in turn be transformed into the matrix
(1 0 ··· 0).
Note that there is one linearly independent column in D. So rank(D) =
rank(A) = 1 by the corollary to Theorem 3.4 and by Theorem 3.5. Thus the
theorem is established for m = 1.
Next assume that the theorem holds for any matrix with at most m — 1
rows (for some m > 1). We must prove that the theorem holds for any matrix
with m rows.
Suppose that A is any m × n matrix. If n = 1, Theorem 3.6 can be
established in a manner analogous to that for m = 1 (see Exercise 10).
We now suppose that n > 1. Since A ≠ O, A_ij ≠ 0 for some i, j. By
means of at most one elementary row and at most one elementary column

operation (each of type 1), we can move the nonzero entry to the 1,1 position
(just as was done in Example 3). By means of at most one additional type 2
operation, we can assure a 1 in the 1,1 position. (Look at the second operation
in Example 3.) By means of at most m — 1 type 3 row operations and at most
n — 1 type 3 column operations, we can eliminate all nonzero entries in the
first row and the first column with the exception of the 1 in the 1,1 position.
(In Example 3, we used two row and three column operations to do this.)
Thus, with a finite number of elementary operations, A can be transformed
into a matrix
B = ( 1 0 ··· 0
      0
      ⋮     B'
      0          ),
where B' is an (m − 1) × (n − 1) matrix. In Example 3, for instance,
B' = (  2  4  2 2
       -6 -8 -6 2
       -3 -4 -3 1 ).
By Exercise 11, B' has rank one less than B. Since rank(A) = rank(B) = r,
rank(B') = r − 1. Therefore r − 1 ≤ m − 1 and r − 1 ≤ n − 1 by the
induction hypothesis. Hence r ≤ m and r ≤ n.
Also by the induction hypothesis, B' can be transformed by a finite num­
ber of elementary row and column operations into the (m — 1) x (n— 1) matrix
D' such that
D' = ( I_{r−1}  O_4
       O_5      O_6 ),
where O_4, O_5, and O_6 are zero matrices. That is, D' consists of all zeros
except for its first r − 1 diagonal entries, which are ones. Let
D = ( 1 0 ··· 0
      0
      ⋮     D'
      0          ).
We see that the theorem now follows once we show that D can be obtained
from B by means of a finite number of elementary row and column operations.
However this follows by repeated applications of Exercise 12.
Thus, since A can be transformed into B and B can be transformed into
D, each by a finite number of elementary operations, A can be transformed
into D by a finite number of elementary operations.

Finally, since D' contains ones as its first r— 1 diagonal entries, D contains
ones as its first r diagonal entries and zeros elsewhere. This establishes the
theorem. I
Corollary 1. Let A be an m × n matrix of rank r. Then there exist
invertible matrices B and C of sizes m × m and n × n, respectively, such that
D = BAC, where
D = ( I_r  O_1
      O_2  O_3 )
is the m × n matrix in which O_1, O_2, and O_3 are zero matrices.
Proof. By Theorem 3.6, A can be transformed by means of a finite number
of elementary row and column operations into the matrix D. We can appeal
to Theorem 3.1 (p. 149) each time we perform an elementary operation. Thus
there exist elementary m × m matrices E_1, E_2, ..., E_p and elementary n × n
matrices G_1, G_2, ..., G_q such that
D = E_p E_{p−1} ··· E_2 E_1 A G_1 G_2 ··· G_q.
By Theorem 3.2 (p. 150), each E_i and G_j is invertible. Let B = E_p E_{p−1} ··· E_1
and C = G_1 G_2 ··· G_q. Then B and C are invertible by Exercise 4 of Section 2.4,
and D = BAC.
Corollary 2. Let A be an m × n matrix. Then
(a) rank(A^t) = rank(A).
(b) The rank of any matrix equals the maximum number of its linearly
independent rows; that is, the rank of a matrix is the dimension of the
subspace generated by its rows.
(c) The rows and columns of any matrix generate subspaces of the same
dimension, numerically equal to the rank of the matrix.
Proof. (a) By Corollary 1, there exist invertible matrices B and C such
that D = BAC, where D satisfies the stated conditions of the corollary.
Taking transposes, we have
D^t = (BAC)^t = C^t A^t B^t.
Since B and C are invertible, so are B^t and C^t by Exercise 5 of Section 2.4.
Hence by Theorem 3.4,
rank(A^t) = rank(C^t A^t B^t) = rank(D^t).
Suppose that r = rank(A). Then D^t is an n × m matrix with the form of the
matrix D in Corollary 1, and hence rank(D^t) = r by Theorem 3.5. Thus
rank(A^t) = rank(D^t) = r = rank(A).
This establishes (a).
The proofs of (b) and (c) are left as exercises. (See Exercise 13.)

Corollary 3. Every invertible matrix is a product of elementary matrices.
Proof. If A is an invertible n × n matrix, then rank(A) = n. Hence the
matrix D in Corollary 1 equals I_n, and there exist invertible matrices B and
C such that I_n = BAC.
As in the proof of Corollary 1, note that B = E_p E_{p−1} ··· E_1 and C =
G_1 G_2 ··· G_q, where the E_i's and G_j's are elementary matrices. Thus
A = B^{−1} I_n C^{−1} = B^{−1} C^{−1}, so that
A = E_1^{−1} E_2^{−1} ··· E_p^{−1} G_q^{−1} G_{q−1}^{−1} ··· G_1^{−1}.
The inverses of elementary matrices are elementary matrices, however, and
hence A is the product of elementary matrices.
We now use Corollary 2 to relate the rank of a matrix product to the rank
of each factor. Notice how the proof exploits the relationship between the
rank of a matrix and the rank of a linear transformation.
Theorem 3.7. Let T: V → W and U: W → Z be linear transformations
on finite-dimensional vector spaces V, W, and Z, and let A and B be matrices
such that the product AB is defined. Then
(a) rank(UT) ≤ rank(U).
(b) rank(UT) ≤ rank(T).
(c) rank(AB) ≤ rank(A).
(d) rank(AB) ≤ rank(B).
Proof. We prove these items in the order: (a), (c), (d), and (b).
(a) Clearly, R(T) ⊆ W. Hence
R(UT) = UT(V) = U(T(V)) = U(R(T)) ⊆ U(W) = R(U).
Thus
rank(UT) = dim(R(UT)) ≤ dim(R(U)) = rank(U).
(c) By (a),
rank(AB) = rank(L_{AB}) = rank(L_A L_B) ≤ rank(L_A) = rank(A).
(d) By (c) and Corollary 2 to Theorem 3.6,
rank(AB) = rank((AB)^t) = rank(B^t A^t) ≤ rank(B^t) = rank(B).
(b) Let α, β, and γ be ordered bases for V, W, and Z, respectively, and
let A' = [U]_γ^β and B' = [T]_β^α. Then A'B' = [UT]_γ^α by Theorem 2.11 (p. 88).
Hence, by Theorem 3.3 and (d),
rank(UT) = rank(A'B') ≤ rank(B') = rank(T).
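Parts (c) and (d) say that a matrix product can never have larger rank than either factor. A small numerical illustration (a sketch with arbitrarily chosen matrices, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 3)).astype(float)   # rank at most 3
B = rng.integers(-3, 4, size=(3, 5)).astype(float)   # rank at most 3

r_A = np.linalg.matrix_rank(A)
r_B = np.linalg.matrix_rank(B)
r_AB = np.linalg.matrix_rank(A @ B)

# Theorem 3.7(c),(d): rank(AB) <= rank(A) and rank(AB) <= rank(B).
assert r_AB <= min(r_A, r_B)
print(r_A, r_B, r_AB)
```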

It is important to be able to compute the rank of any matrix. We can
use the corollary to Theorem 3.4, Theorems 3.5 and 3.6, and Corollary 2 to
Theorem 3.6 to accomplish this goal.
The object is to perform elementary row and column operations on a
matrix to "simplify" it (so that the transformed matrix has many zero entries)
to the point where a simple observation enables us to determine how many
linearly independent rows or columns the matrix has, and thus to determine
its rank.
Example 4
(a) Let
A = ( 1 2  1 1
      1 1 -1 1 ).
Note that the first and second rows of A are linearly independent since one
is not a multiple of the other. Thus rank(A) = 2.
(b) Let
A = ( 1 3 1 1
      1 0 1 1
      0 3 0 0 ).
In this case, there are several ways to proceed. Suppose that we begin with
an elementary row operation to obtain a zero in the 2,1 position. Subtracting
the first row from the second row, we obtain
( 1  3 1 1
  0 -3 0 0
  0  3 0 0 ).
Now note that the third row is a multiple of the second row, and the first and
second rows are linearly independent. Thus rank(A) = 2.
As an alternative method, note that the first, third, and fourth columns
of A are identical and that the first and second columns of A are linearly
independent. Hence rank(A) = 2.
(c) Let
A =
Using elementary row operations, we can transform A as follows:
( 1  2  3  1
  0 -3 -5 -1
  0  0  3  0 ).

It is clear that the last matrix has three linearly independent rows and hence
has rank 3. •
In summary, perform row and column operations until the matrix is sim­
plified enough so that the maximum number of linearly independent rows or
columns is obvious.
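This reduction process is easy to mechanize. The sketch below (an added illustration, not the authors' code) counts the pivots found while performing the three types of elementary row operations in floating point; it reproduces the ranks found in Example 4.

```python
import numpy as np

def rank_by_row_reduction(A, tol=1e-10):
    """Count pivots found while reducing A with elementary row operations."""
    M = np.array(A, dtype=float)
    m, n = M.shape
    pivot_row = 0
    for col in range(n):
        if pivot_row == m:
            break
        # Choose the largest entry in this column as the pivot (for stability).
        p = pivot_row + np.argmax(np.abs(M[pivot_row:, col]))
        if abs(M[p, col]) < tol:
            continue                                     # no pivot in this column
        M[[pivot_row, p]] = M[[p, pivot_row]]            # type 1 operation
        M[pivot_row] = M[pivot_row] / M[pivot_row, col]  # type 2 operation
        for r in range(m):
            if r != pivot_row:
                M[r] -= M[r, col] * M[pivot_row]         # type 3 operations
        pivot_row += 1
    return pivot_row

# The matrices of Example 4(a) and 4(b).
print(rank_by_row_reduction([[1, 2, 1, 1], [1, 1, -1, 1]]))               # 2
print(rank_by_row_reduction([[1, 3, 1, 1], [1, 0, 1, 1], [0, 3, 0, 0]]))  # 2
```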
The Inverse of a Matrix
We have remarked that an nxn matrix is invertible if and only if its rank
is n. Since we know how to compute the rank of any matrix, we can always
test a matrix to determine whether it is invertible. We now provide a simple
technique for computing the inverse of a matrix that utilizes elementary row
operations.
Definition. Let A and B be m x n and m x p matrices, respectively.
By the augmented matrix (A\B), we mean the m x (n + p) matrix (A B),
that is, the matrix whose first n columns are the columns of A, and whose
last p columns are the columns of B.
Let A be an invertible n × n matrix, and consider the n × 2n augmented
matrix C = (A|I_n). By Exercise 15, we have
A^{−1}C = (A^{−1}A | A^{−1}I_n) = (I_n | A^{−1}).     (1)
By Corollary 3 to Theorem 3.6, A^{−1} is the product of elementary matrices,
say A^{−1} = E_p E_{p−1} ··· E_1. Thus (1) becomes
E_p E_{p−1} ··· E_1 (A|I_n) = A^{−1}C = (I_n | A^{−1}).
Because multiplying a matrix on the left by an elementary matrix transforms
the matrix by an elementary row operation (Theorem 3.1 p. 149), we have
the following result: If A is an invertible n × n matrix, then it is possible to
transform the matrix (A|I_n) into the matrix (I_n|A^{−1}) by means of a finite
number of elementary row operations.
Conversely, suppose that A is invertible and that, for some n × n matrix
B, the matrix (A|I_n) can be transformed into the matrix (I_n|B) by a finite
number of elementary row operations. Let E_1, E_2, ..., E_p be the elementary
matrices associated with these elementary row operations as in Theorem 3.1;
then
E_p E_{p−1} ··· E_1 (A|I_n) = (I_n|B).     (2)
Letting M = E_p E_{p−1} ··· E_1, we have from (2) that
(MA|M) = M(A|I_n) = (I_n|B).

162 Chap. 3 Elementary Matrix Operations and Systems of Linear Equations
Hence MA = I_n and M = B. It follows that M = A^{−1}. So B = A^{−1}. Thus
we have the following result: If A is an invertible n × n matrix, and the matrix
(A|I_n) is transformed into a matrix of the form (I_n|B) by means of a finite
number of elementary row operations, then B = A^{−1}.
If, on the other hand, A is an n × n matrix that is not invertible, then
rank(A) < n. Hence any attempt to transform (A|I_n) into a matrix of the
form (I_n|B) by means of elementary row operations must fail because otherwise
A can be transformed into I_n using the same row operations. This
is impossible, however, because elementary row operations preserve rank. In
fact, A can be transformed into a matrix with a row containing only zero
entries, yielding the following result: If A is an n × n matrix that is not
invertible, then any attempt to transform (A|I_n) into a matrix of the form
(I_n|B) produces a row whose first n entries are zeros.
The next two examples demonstrate these comments.
Example 5
We determine whether the matrix
A = ( 0 2 4
      2 4 2
      3 3 1 )
is invertible, and if it is, we compute its inverse.
We attempt to use elementary row operations to transform
(A|I) = ( 0 2 4 | 1 0 0
          2 4 2 | 0 1 0
          3 3 1 | 0 0 1 )
into a matrix of the form (I\B). One method for accomplishing this transfor­
mation is to change each column of A successively, beginning with the first
column, into the corresponding column of I. Since we need a nonzero entry
in the 1,1 position, we begin by interchanging rows 1 and 2. The result is
( 2 4 2 | 0 1 0
  0 2 4 | 1 0 0
  3 3 1 | 0 0 1 ).
In order to place a 1 in the 1,1 position, we must multiply the first row by 1/2;
this operation yields
( 1 2 1 | 0 1/2 0
  0 2 4 | 1  0  0
  3 3 1 | 0  0  1 ).

We now complete work in the first column by adding —3 times row 1 to row
3 to obtain
( 1  2  1 | 0  1/2  0
  0  2  4 | 1   0   0
  0 -3 -2 | 0 -3/2  1 ).
In order to change the second column of the preceding matrix into the
second column of I, we multiply row 2 by 1/2 to obtain a 1 in the 2,2 position.
This operation produces
( 1  2  1 |  0   1/2  0
  0  1  2 | 1/2   0   0
  0 -3 -2 |  0  -3/2  1 ).
We now complete our work on the second column by adding —2 times row 2
to row 1 and 3 times row 2 to row 3. The result is
( 1 0 -3 |  -1   1/2  0
  0 1  2 | 1/2    0   0
  0 0  4 | 3/2  -3/2  1 ).
Only the third column remains to be changed. In order to place a 1 in the
3,3 position, we multiply row 3 by 1/4; this operation yields
( 1 0 -3 |  -1    1/2    0
  0 1  2 | 1/2     0     0
  0 0  1 | 3/8   -3/8  1/4 ).
Adding appropriate multiples of row 3 to rows 1 and 2 completes the process
and gives
( 1 0 0 |  1/8  -5/8   3/4
  0 1 0 | -1/4   3/4  -1/2
  0 0 1 |  3/8  -3/8   1/4 ).
Thus A is invertible, and
A^{−1} = (  1/8  -5/8   3/4
           -1/4   3/4  -1/2
            3/8  -3/8   1/4 ). •

Example 6
We determine whether the matrix
A = ( 1 2  1
      2 1 -1
      1 5  4 )
is invertible, and if it is, we compute its inverse. Using a strategy similar to
the one used in Example 5, we attempt to use elementary row operations to
transform
(A|I) = ( 1 2  1 | 1 0 0
          2 1 -1 | 0 1 0
          1 5  4 | 0 0 1 )
into a matrix of the form (I\B). We first add —2 times row 1 to row 2 and
—1 times row 1 to row 3. We then add row 2 to row 3. The result,
( 1  2  1 |  1 0 0
  0 -3 -3 | -2 1 0
  0  0  0 | -3 1 1 ),
is a matrix with a row whose first 3 entries are zeros. Therefore A is not
invertible. •
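The procedure of Examples 5 and 6 can be carried out mechanically: adjoin I_n to A, row reduce, and either read off A^{-1} or detect failure. The sketch below (added for illustration; it uses floating-point arithmetic rather than exact fractions) behaves accordingly on both example matrices.

```python
import numpy as np

def inverse_by_row_reduction(A, tol=1e-10):
    """Transform (A | I) into (I | A^{-1}) with elementary row operations.

    Returns the inverse, or None if A is not invertible (no usable pivot
    can be produced in some column of the left half)."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])                 # the augmented matrix (A | I)
    for col in range(n):
        p = col + np.argmax(np.abs(M[col:, col])) # pivot search
        if abs(M[p, col]) < tol:
            return None                           # A is not invertible
        M[[col, p]] = M[[p, col]]                 # type 1 operation
        M[col] /= M[col, col]                     # type 2 operation
        for r in range(n):
            if r != col:
                M[r] -= M[r, col] * M[col]        # type 3 operations
    return M[:, n:]                               # the right half is A^{-1}

print(inverse_by_row_reduction([[0, 2, 4], [2, 4, 2], [3, 3, 1]]))   # Example 5
print(inverse_by_row_reduction([[1, 2, 1], [2, 1, -1], [1, 5, 4]]))  # Example 6: None
```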
Being able to test for invertibility and compute the inverse of a matrix
allows us, with the help of Theorem 2.18 (p. 101) and its corollaries, to test
for invertibility and compute the inverse of a linear transformation. The next
example demonstrates this technique.
Example 7
Let T: P2(R) -> P2(R) be defined by T(f(x)) = f(x) + f'(x) + f"(x), where
f'(x) and f"(x) denote the first and second derivatives of f(x). We use
Corollary 1 of Theorem 2.18 (p. 102) to test T for invertibility and compute
the inverse if T is invertible. Taking 0 to be the standard ordered basis of
P2(R), we have
[T]_β = ( 1 1 2
          0 1 2
          0 0 1 ).

Using the method of Examples 5 and 6, we can show that [T]_β is invertible
with inverse
([T]_β)^{−1} = ( 1 -1  0
                 0  1 -2
                 0  0  1 ).
Thus T is invertible, and ([T]_β)^{−1} = [T^{−1}]_β. Hence by Theorem 2.14 (p. 91),
we have
[T^{−1}(a_0 + a_1 x + a_2 x^2)]_β = ( 1 -1  0   ( a_0       ( a_0 − a_1
                                      0  1 -2     a_1    =    a_1 − 2a_2
                                      0  0  1 )   a_2 )       a_2        ).
Therefore
T^{−1}(a_0 + a_1 x + a_2 x^2) = (a_0 − a_1) + (a_1 − 2a_2)x + a_2 x^2. •
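The same computation can be scripted: build [T]_β column by column from the images of 1, x, x^2, and invert it. A sketch (added illustration; the sample polynomial at the end is chosen only for demonstration):

```python
import numpy as np

# [T]_beta for T(f) = f + f' + f'' on P_2(R), beta = {1, x, x^2}.
# T(1) = 1, T(x) = 1 + x, T(x^2) = 2 + 2x + x^2; each image gives a column.
T_beta = np.array([[1.0, 1.0, 2.0],
                   [0.0, 1.0, 2.0],
                   [0.0, 0.0, 1.0]])

T_inv = np.linalg.inv(T_beta)
print(T_inv)
# [[ 1. -1.  0.]
#  [ 0.  1. -2.]
#  [ 0.  0.  1.]]

# Coordinates of T^{-1}(a0 + a1 x + a2 x^2) for sample coefficients:
a = np.array([5.0, 3.0, 1.0])   # f(x) = 5 + 3x + x^2
print(T_inv @ a)                # [2. 1. 1.], i.e. T^{-1}(f) = 2 + x + x^2
```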
EXERCISES
1. Label the following statements as true or false.
(a) The rank of a matrix is equal to the number of its nonzero columns.
(b) The product of two matrices always has rank equal to the lesser of
the ranks of the two matrices.
(c) The mxn zero matrix is the only mxn matrix having rank 0.
(d) Elementary row operations preserve rank.
(e) Elementary column operations do not necessarily preserve rank.
(f) The rank of a matrix is equal to the maximum number of linearly
independent rows in the matrix.
(g) The inverse of a matrix can be computed exclusively by means of
elementary row operations.
(h) The rank of an n x n matrix is at most n.
(i) An n x n matrix having rank n is invertible.
2. Find the rank of the following matrices.
(a) (c)
1 0 2
1 1 4

(d)
(f)
1 2 1
2 4 2
(e)
/l 2 3 1 1
14 0 12
0 2-301
\1 0 0 0 0/
/ 1
2
3
1-4
2 0
4 1
6 2
-8 1
1 l
3 0
5 1
-3 l)
(g)
(I 1 0 1
2 2 0 2
110 1
V i o V
3. Prove that for any m × n matrix A, rank(A) = 0 if and only if A is the
zero matrix.
4. Use elementary row and column operations to transform each of the
following matrices into a matrix D satisfying the conditions of Theo­
rem 3.6, and then determine the rank of each matrix.
(a) (b)
5. For each of the following matrices, compute the rank and the inverse if
it exists.
/
(a)
1 2
1 1
(g)
/ 1
2
-2
V 3
6. For each of the following linear transformations T, determine whether
T is invertible, and compute T-1 if it exists.
(a) T: P2(R) -* P2(R) defined by T(/(x)) = f"(x) + 2f'(x) - f(x).
(b) T: P2(R) -+ P2(R) defined by T(/(*)) = (x + l)f'(x).
(c) T: R3 -> R3 defined by
T(a\,a2,a2,) = (ai + 2a2 + as, -a\ + a2 + 2a-s,ai + a3).

(d) T: R3 -» P2(R) defined by
T(oi,02,03) = (a] + o2 + o3) + (ai - a2 + 03)3; + oix2.
(e) T: Pa(H) - R3 defined by T(/(x)) = (/(-l),/(0),/(l)).
(f) T: M2X2(-R) -> R4 defined by
167
T(A) = (tY(A),tr(At),tr(EA),tr(AE)),
where
E =
0 1
1 0
7. Express the invertible matrix
(!»•)
as a product of elementary matrices.
8. Let A be an m × n matrix. Prove that if c is any nonzero scalar, then
rank(cA) = rank(A).
9. Complete the proof of the corollary to Theorem 3.4 by showing that
elementary column operations preserve rank.
10. Prove Theorem 3.6 for the case that A is an m x 1 matrix.
11. Let
B =
where B' is an m x n submatrix of B. Prove that if rank(B) = r, then
rank(£') = r - 1.
12. Let B' and D'bemxn matrices, and let B and D be (ra-f- 1) x (n + 1)
matrices respectively defined by
/'
0
V 0
0 ••• 0
B'
7
B =
(1
0
U
0 ••• 0
B'
7
and D —
/ 1
0
U
0 ••• 0
D'
7
Prove that if B' can be transformed into D' by an elementary row
[column] operation, then B can be transformed into D by an elementary
row [column] operation.

13. Prove (b) and (c) of Corollary 2 to Theorem 3.6.
14. Let T, U: V —> W be linear transformations.
(a) Prove that R(T + U) C R(T) + R(U). (See the definition of the sum
of subsets of a vector space on page 22.)
(b) Prove that if W is finite-dimensional, then rank(T+U) < rank(T) +
rank(U).
(c) Deduce from (b) that rank(.A + B) < rank(A) + rank(5) for any
mxn matrices A and B.
15. Suppose that A and B are matrices having n rows. Prove that
M(A|B) = (MA|MB) for any m × n matrix M.
16. Supply the details to the proof of (b) of Theorem 3.4.
17. Prove that if B is a 3 x 1 matrix and C is a 1 x 3 matrix, then the 3x3
matrix BC has rank at most 1. Conversely, show that if A is any 3x3
matrix having rank 1, then there exist a 3 x 1 matrix B and a 1 x 3
matrix C such that A = BC.
18. Let A be an m x n matrix and B be an n x p matrix. Prove that AB
can be written as a sum of n matrices of rank at most one.
19. Let A be an m x n matrix with rank m and B be an n x p matrix with
rank n. Determine the rank of AB. Justify your answer.
20. Let
A = (  1  0 -1  2  1
      -1  1  3 -1  0
      -2  1  4 -1  3
       3 -1 -5  1 -6 ).
(a) Find a 5 x 5 matrix M with rank 2 such that AM = O, where O
is the 4x5 zero matrix.
(b) Suppose that B is a 5 x 5 matrix such that AB = O. Prove that
rank(B) < 2.
21. Let A be an m x n matrix with rank m. Prove that there exists an
nxm matrix B such that AB = Im.
22. Let B be an n x m matrix with rank m. Prove that there exists an
mxn matrix A such that AB = Im.
3.3 SYSTEMS OF LINEAR EQUATIONS—THEORETICAL ASPECTS
This section and the next are devoted to the study of systems of linear equa­
tions, which arise naturally in both the physical and social sciences. In this
section, we apply results from Chapter 2 to describe the solution sets of

systems of linear equations as subsets of a vector space. In Section 3.4, el­
ementary row operations are used to provide a computational method for
finding all solutions to such systems.
The system of equations
(S)    a_11 x_1 + a_12 x_2 + ··· + a_1n x_n = b_1
       a_21 x_1 + a_22 x_2 + ··· + a_2n x_n = b_2
                        ⋮
       a_m1 x_1 + a_m2 x_2 + ··· + a_mn x_n = b_m,
where a_ij and b_i (1 ≤ i ≤ m and 1 ≤ j ≤ n) are scalars in a field F and
x_1, x_2, ..., x_n are n variables taking values in F, is called a system of m
linear equations in n unknowns over the field F.
The m × n matrix
A = ( a_11 a_12 ··· a_1n
      a_21 a_22 ··· a_2n
       ⋮     ⋮         ⋮
      a_m1 a_m2 ··· a_mn )
is called the coefficient matrix of the system (S).
If we let
x = ( x_1              b = ( b_1
      x_2                    b_2
       ⋮       and            ⋮
      x_n )                  b_m ),
then the system (S) may be rewritten as a single matrix equation
Ax = b.
To exploit the results that we have developed, we often consider a system of
linear equations as a single matrix equation.
A solution to the system (S) is an n-tuple
s = ( s_1
      s_2
       ⋮
      s_n ) ∈ F^n
such that As = b. The set of all solutions to the system (S) is called the
solution set of the system. System (S) is called consistent if its solution
set is nonempty; otherwise it is called inconsistent.

Example 1
(a) Consider the system
x_1 + x_2 = 3
x_1 − x_2 = 1.
By use of familiar techniques, we can solve the preceding system and conclude
that there is only one solution: x_1 = 2, x_2 = 1; that is,
s = ( 2
      1 ).
In matrix form, the system can be written
( 1  1   ( x_1      ( 3
  1 -1 )   x_2 )  =   1 );
so
A = ( 1  1       and    b = ( 3
      1 -1 )                  1 ).
(b) Consider
2x_1 + 3x_2 +  x_3 = 1
 x_1 −  x_2 + 2x_3 = 6;
that is,
( 2  3 1   ( x_1       ( 1
  1 -1 2 )   x_2    =    6 ).
             x_3 )
This system has many solutions, such as
s = ( -6        and    s = (  8
       2                     -4
       7 )                   -3 ).
(c) Consider
x_1 + x_2 = 0
x_1 + x_2 = 1;
that is,
( 1 1   ( x_1      ( 0
  1 1 )   x_2 ) =    1 ).
It is evident that this system has no solutions. Thus we see that a system of
linear equations can have one, many, or no solutions. •

We must be able to recognize when a system has a solution and then be
able to describe all its solutions. This section and the next are devoted to
this end.
We begin our study of systems of linear equations by examining the class
of homogeneous systems of linear equations. Our first result (Theorem 3.8)
shows that the set of solutions to a homogeneous system of m linear equations
in n unknowns forms a subspace of Fn. We can then apply the theory of vector
spaces to this set of solutions. For example, a basis for the solution space can
be found, and any solution can be expressed as a linear combination of the
vectors in the basis.
Definitions. A system Ax = b of m linear equations in n unknowns
is said to be homogeneous if b = 0. Otherwise the system is said to be
nonhomogeneous.
Any homogeneous system has at least one solution, namely, the zero vec­
tor. The next result gives further information about the set of solutions to a
homogeneous system.
Theorem 3.8. Let Ax = 0 be a homogeneous system of m linear equa­
tions in n unknowns over a field F. Let K denote the set of all solutions
to Ax = 0. Then K = N(L_A); hence K is a subspace of F^n of dimension
n − rank(L_A) = n − rank(A).
Proof. Clearly, K = {s ∈ F^n : As = 0} = N(L_A). The second part now
follows from the dimension theorem (p. 70).
Corollary. If m < n, the system Ax = 0 has a nonzero solution.
Proof. Suppose that m < n. Then rank(A) = rank(L_A) ≤ m. Hence
dim(K) = n − rank(L_A) ≥ n − m > 0,
where K = N(L_A). Since dim(K) > 0, K ≠ {0}. Thus there exists a nonzero
vector s ∈ K; so s is a nonzero solution to Ax = 0.
Example 2
(a) Consider the system
x_1 + 2x_2 + x_3 = 0
x_1 −  x_2 − x_3 = 0.
Let
A = ( 1  2  1
      1 -1 -1 )

be the coefficient matrix of this system. It is clear that rank(A) = 2. If K is
the solution set of this system, then dim(K) = 3 − 2 = 1. Thus any nonzero
solution constitutes a basis for K. For example, since
(  1
  -2
   3 )
is a solution to the given system,
{ (  1
    -2
     3 ) }
is a basis for K. Thus any vector in K is of the form
t (  1
    -2
     3 ),
where t ∈ R.
(b) Consider the system x_1 − 2x_2 + x_3 = 0 of one equation in three
unknowns. If A = ( 1 -2 1 ) is the coefficient matrix, then rank(A) = 1.
Hence if K is the solution set, then dim(K) = 3 − 1 = 2. Note that
( 2        and   ( -1
  1                 0
  0 )               1 )
are linearly independent vectors in K. Thus they constitute a basis for K, so
that
K = { t_1 ( 2      + t_2 ( -1      : t_1, t_2 ∈ R }.
            1               0
            0 )             1 )
In Section 3.4, explicit computational methods for finding a basis for the
solution set of a homogeneous system are discussed.
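A basis for K = N(L_A) can also be found numerically; the sketch below (added for illustration, not part of the text) uses the singular value decomposition, whose trailing right singular vectors span the null space. It returns an orthonormal basis, so for Example 2(a) it produces a scalar multiple of the basis vector displayed above.

```python
import numpy as np

def null_space_basis(A, tol=1e-10):
    """Return columns spanning N(L_A), computed from the SVD of A."""
    A = np.atleast_2d(np.array(A, dtype=float))
    _, s, vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return vt[rank:].T    # each column is a basis vector of the null space

# Example 2(a): two equations in three unknowns, rank 2, so dim K = 1.
print(null_space_basis([[1, 2, 1], [1, -1, -1]]))

# Example 2(b): one equation, rank 1, so dim K = 2.
print(null_space_basis([[1, -2, 1]]))
```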
We now turn to the study of nonhomogeneous systems. Our next result
shows that the solution set of a nonhomogeneous system Ax = b can be
described in terms of the solution set of the homogeneous system Ax = 0. We
refer to the equation Ax = 0 as the homogeneous system corresponding
to Ax = b.
Theorem 3.9. Let K be the solution set of a system of linear equations
Ax = b, and let KH be the solution set of the corresponding homogeneous
system Ax = 0. Then for any solution s to Ax = b,
K = {s} + K_H = {s + k : k ∈ K_H}.

Proof. Let s be any solution to Ax = b. We must show that K = {s} + K_H.
If w ∈ K, then Aw = b. Hence
A(w − s) = Aw − As = b − b = 0.
So w − s ∈ K_H. Thus there exists k ∈ K_H such that w − s = k. It follows that
w = s + k ∈ {s} + K_H, and therefore
K ⊆ {s} + K_H.
Conversely, suppose that w ∈ {s} + K_H; then w = s + k for some k ∈ K_H.
But then Aw = A(s + k) = As + Ak = b + 0 = b; so w ∈ K. Therefore
{s} + K_H ⊆ K, and thus K = {s} + K_H.
Example 3
(a) Consider the system
x_1 + 2x_2 + x_3 =  7
x_1 −  x_2 − x_3 = -4.
The corresponding homogeneous system is the system in Example 2(a). It is
easily verified that
s = ( 1
      1
      4 )
is a solution to the preceding nonhomogeneous system. So the solution set of
the system is
K = { ( 1       + t (  1      : t ∈ R }
        1             -2
        4 )            3 )
by Theorem 3.9.
(b) Consider the system x_1 − 2x_2 + x_3 = 4. The corresponding homogeneous
system is the system in Example 2(b). Since
s = ( 4
      0
      0 )
is a solution to the given system, the solution set K can be written as
K = { ( 4      + t_1 ( 2      + t_2 ( -1      : t_1, t_2 ∈ R }. •
        0              1                0
        0 )            0 )              1 )

The following theorem provides us with a means of computing solutions
to certain systems of linear equations.
Theorem 3.10. Let Ax = b be a system of n linear equations in n
unknowns. If A is invertible, then the system has exactly one solution, namely,
A^{−1}b. Conversely, if the system has exactly one solution, then A is invertible.
Proof. Suppose that A is invertible. Substituting A^{−1}b into the system, we
have A(A^{−1}b) = (AA^{−1})b = b. Thus A^{−1}b is a solution. If s is an arbitrary
solution, then As = b. Multiplying both sides by A^{−1} gives s = A^{−1}b. Thus
the system has one and only one solution, namely, A^{−1}b.
Conversely, suppose that the system has exactly one solution s. Let K_H
denote the solution set for the corresponding homogeneous system Ax = 0.
By Theorem 3.9, {s} = {s} + K_H. But this is so only if K_H = {0}. Thus
N(L_A) = {0}, and hence A is invertible.
Example 4
Consider the following system of three linear equations in three unknowns:
       2x_2 + 4x_3 = 2
2x_1 + 4x_2 + 2x_3 = 3
3x_1 + 3x_2 +  x_3 = 1.
In Example 5 of Section 3.2, we computed the inverse of the coefficient matrix
A of this system. Thus the system has exactly one solution, namely,
( x_1 )             (  1/8  -5/8   3/4 ) ( 2 )   ( -7/8 )
( x_2 ) = A^{−1}b = ( -1/4   3/4  -1/2 ) ( 3 ) = (  5/4 )
( x_3 )             (  3/8  -3/8   1/4 ) ( 1 )   ( -1/8 )
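In code, the same computation reads as follows (a sketch added for illustration; in practice np.linalg.solve, which factors A rather than forming the inverse, is the usual alternative and gives the same answer):

```python
import numpy as np

A = np.array([[0.0, 2.0, 4.0],
              [2.0, 4.0, 2.0],
              [3.0, 3.0, 1.0]])
b = np.array([2.0, 3.0, 1.0])

x = np.linalg.inv(A) @ b     # exactly the A^{-1} b of Theorem 3.10
print(x)                     # [-0.875  1.25  -0.125], i.e. (-7/8, 5/4, -1/8)

print(np.linalg.solve(A, b))  # same solution without forming A^{-1}
```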
We use this technique for solving systems of linear equations having in­
vertible coefficient matrices in the application that concludes this section.
In Example 1(c), we saw a system of linear equations that has no solutions.
We now establish a criterion for determining when a system has solutions.
This criterion involves the rank of the coefficient matrix of the system Ax = b
and the rank of the matrix (A|b). The matrix (A|b) is called the augmented
matrix of the system Ax = b.
Theorem 3.11. Let Ax = b be a system of linear equations. Then the
system is consistent if and only if rank(A) = rank(A|b).
Proof. To say that Ax = b has a solution is equivalent to saying that
b ∈ R(L_A). (See Exercise 9.) In the proof of Theorem 3.5 (p. 153), we saw
that
R(L_A) = span({a_1, a_2, ..., a_n}),

the span of the columns of A. Thus Ax = b has a solution if and only
if b ∈ span({a_1, a_2, ..., a_n}). But b ∈ span({a_1, a_2, ..., a_n}) if and only
if span({a_1, a_2, ..., a_n}) = span({a_1, a_2, ..., a_n, b}). This last statement is
equivalent to
dim(span({a_1, a_2, ..., a_n})) = dim(span({a_1, a_2, ..., a_n, b})).
So by Theorem 3.5, the preceding equation reduces to
rank(A) = rank(A|b).
Example 5
Recall the system of equations
x_1 + x_2 = 0
x_1 + x_2 = 1
in Example 1(c). Since
A = ( 1 1        and   (A|b) = ( 1 1 0
      1 1 )                      1 1 1 ),
rank(A) = 1 and rank(A|b) = 2. Because the two ranks are unequal, the
system has no solutions. •
Example 6
We can use Theorem 3.11 to determine whether (3,3,2) is in the range of the
linear transformation T: R^3 → R^3 defined by
T(a_1, a_2, a_3) = (a_1 + a_2 + a_3, a_1 − a_2 + a_3, a_1 + a_3).
Now (3,3,2) ∈ R(T) if and only if there exists a vector s = (x_1, x_2, x_3)
in R^3 such that T(s) = (3,3,2). Such a vector s must be a solution to the
system
x_1 + x_2 + x_3 = 3
x_1 − x_2 + x_3 = 3
x_1       + x_3 = 2.
Since the ranks of the coefficient matrix and the augmented matrix of this
system are 2 and 3, respectively, it follows that this system has no solutions.
Hence (3,3,2) ∉ R(T). •
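Theorem 3.11 translates directly into a numerical test: compare rank(A) with rank(A|b). A sketch (added for illustration) applied to Examples 5 and 6:

```python
import numpy as np

def is_consistent(A, b):
    """Ax = b has a solution iff rank(A) = rank(A|b) (Theorem 3.11)."""
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float).reshape(-1, 1)
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(np.hstack([A, b]))

# Example 5: x1 + x2 = 0, x1 + x2 = 1 has no solution.
print(is_consistent([[1, 1], [1, 1]], [0, 1]))           # False

# Example 6: is (3, 3, 2) in the range of T?
A = [[1, 1, 1], [1, -1, 1], [1, 0, 1]]
print(is_consistent(A, [3, 3, 2]))                       # False
```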

An Application
In 1973, Wassily Leontief won the Nobel prize in economics for his work
in developing a mathematical model that can be used to describe various
economic phenomena. We close this section by applying some of the ideas we
have studied to illustrate two special cases of his work.
We begin by considering a simple society composed of three people
(industries)-—a farmer who grows all the food, a tailor who makes all the
clothing, and a carpenter who builds all the housing. We assume that each
person sells to and buys from a central pool and that everything produced is
consumed. Since no commodities either enter or leave the system, this case
is referred to as the closed model.
Each of these three individuals consumes all three of the commodities pro­
duced in the society. Suppose that the proportion of each of the commodities
consumed by each person is given in the following table. Notice that each of
the columns of the table must sum to 1.
            Food   Clothing   Housing
Farmer      0.40     0.20       0.20
Tailor      0.10     0.70       0.20
Carpenter   0.50     0.10       0.60
Let p_1, p_2, and p_3 denote the incomes of the farmer, tailor, and carpenter,
respectively. To ensure that this society survives, we require that the con­
sumption of each individual equals his or her income. Note that the farmer
consumes 20% of the clothing. Because the total cost of all clothing is p2,
the tailor's income, the amount spent by the farmer on clothing is 0.20p2-
Moreover, the amount spent by the farmer on food, clothing, and housing
must equal the farmer's income, and so we obtain the equation
0.40p_1 + 0.20p_2 + 0.20p_3 = p_1.
Similar equations describing the expenditures of the tailor and carpenter pro­
duce the following system of linear equations:
0.40p_1 + 0.20p_2 + 0.20p_3 = p_1
0.10p_1 + 0.70p_2 + 0.20p_3 = p_2
0.50p_1 + 0.10p_2 + 0.60p_3 = p_3.
This system can be written as Ap = p, where
p = ( p_1
      p_2
      p_3 )

and A is the coefficient matrix of the system. In this context, A is called
the input-output (or consumption) matrix, and Ap = p is called the
equilibrium condition.
For vectors b = (b_1, b_2, ..., b_n) and c = (c_1, c_2, ..., c_n) in R^n, we use the
notation b ≥ c [b > c] to mean b_i ≥ c_i [b_i > c_i] for all i. The vector b is called
nonnegative [positive] if b ≥ 0 [b > 0].
At first, it may seem reasonable to replace the equilibrium condition by
the inequality Ap ≤ p, that is, the requirement that consumption not exceed
production. But, in fact, Ap ≤ p implies that Ap = p in the closed model.
For otherwise, there exists a k for which
p_k > Σ_j A_kj p_j.
Hence, since the columns of A sum to 1,
Σ_i p_i > Σ_i Σ_j A_ij p_j = Σ_j ( Σ_i A_ij ) p_j = Σ_j p_j,
which is a contradiction.
One solution to the homogeneous system (I − A)x = 0, which is equivalent
to the equilibrium condition, is
p = ( 0.25
      0.35
      0.40 ).
We may interpret this to mean that the society survives if the farmer, tailor,
and carpenter have incomes in the proportions 25 : 35 : 40 (or 5 : 7 : 8).
Notice that we are not simply interested in any nonzero solution to the
system, but in one that is nonnegative. Thus we must consider the question
of whether the system (I − A)x = 0 has a nonnegative solution, where A is a
matrix with nonnegative entries whose columns sum to 1. A useful theorem
in this direction (whose proof may be found in "Applications of Matrices to
Economic Models and Social Science Relationships," by Ben Noble, Proceed­
ings of the Summer Conference for College Teachers on Applied Mathematics,
1971, CUPM, Berkeley, California) is stated below.
Theorem 3.12. Let A be an n × n input-output matrix having the form
A = ( B  C
      D  E ),
where D is a 1 × (n − 1) positive vector and C is an (n − 1) × 1 positive vector.
Then (I − A)x = 0 has a one-dimensional solution set that is generated by a
nonnegative vector.

Observe that any input-output matrix with all positive entries satisfies
the hypothesis of this theorem. The following matrix does also:
( 0.75 0.50 0.65
  0    0.25 0.35
  0.25 0.25 0    ).
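For the closed model above, the equilibrium incomes can also be recovered numerically as a nonnegative generator of the null space of I − A. A sketch (added for illustration; normalizing the vector to sum 1 reproduces the 25 : 35 : 40 proportions found earlier):

```python
import numpy as np

# Input-output (consumption) matrix of the closed model.
A = np.array([[0.40, 0.20, 0.20],
              [0.10, 0.70, 0.20],
              [0.50, 0.10, 0.60]])

# A nonzero solution of (I - A)x = 0 is the last right singular vector
# of I - A, since its smallest singular value is 0 here.
_, _, vt = np.linalg.svd(np.eye(3) - A)
p = vt[-1]
p = p / p.sum()       # normalize so the incomes sum to 1
print(p)              # approximately [0.25 0.35 0.40]
```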
In the open model, we assume that there is an outside demand for each
of the commodities produced. Returning to our simple society, let x_1, x_2,
and x_3 be the monetary values of food, clothing, and housing produced with
respective outside demands d_1, d_2, and d_3. Let A be the 3 × 3 matrix such
that A_ij represents the amount (in a fixed monetary unit such as the dollar)
of commodity i required to produce one monetary unit of commodity j. Then
the value of the surplus of food in the society is
x_1 − (A_11 x_1 + A_12 x_2 + A_13 x_3),
that is, the value of food produced minus the value of food consumed while
producing the three commodities. The assumption that everything produced
is consumed gives us a similar equilibrium condition for the open model,
namely, that the surplus of each of the three commodities must equal the
corresponding outside demands. Hence
x_i − Σ_{j=1}^{3} A_ij x_j = d_i   for i = 1, 2, 3.
In general, we must find a nonnegative solution to (I − A)x = d, where
A is a matrix with nonnegative entries such that the sum of the entries of
each column of A does not exceed one, and d ≥ 0. It is easy to see that if
(I − A)^{−1} exists and is nonnegative, then the desired solution is (I − A)^{−1}d.
Recall that for a real number a, the series 1 + a + a^2 + ··· converges to
(1 − a)^{−1} if |a| < 1. Similarly, it can be shown (using the concept of convergence
of matrices developed in Section 5.3) that the series I + A + A^2 + ···
converges to (I − A)^{−1} if {A^n} converges to the zero matrix. In this case,
(I − A)^{−1} is nonnegative since the matrices I, A, A^2, ... are nonnegative.
To illustrate the open model, suppose that 30 cents worth of food, 10
cents worth of clothing, and 30 cents worth of housing are required for the
production of $1 worth of food. Similarly, suppose that 20 cents worth of
food, 40 cents worth of clothing, and 20 cents worth of housing are required
for the production of $1 of clothing. Finally, suppose that 30 cents worth of
food, 10 cents worth of clothing, and 30 cents worth of housing are required
for the production of $1 worth of housing. Then the input-output matrix is
A = ( 0.30 0.20 0.30
      0.10 0.40 0.10
      0.30 0.20 0.30 );

so
I − A = (  0.70 -0.20 -0.30
          -0.10  0.60 -0.10
          -0.30 -0.20  0.70 )
and
(I − A)^{−1} = ( 2.0 1.0 1.0
                 0.5 2.0 0.5
                 1.0 1.0 2.0 ).
Since (I − A)^{−1} is nonnegative, we can find a (unique) nonnegative solution to
(I − A)x = d for any demand d. For example, suppose that there are outside
demands for $30 billion in food, $20 billion in clothing, and $10 billion in
housing. If we set
d = ( 30
      20
      10 ),
then
x = (I − A)^{−1} d = ( 90
                       60
                       70 ).
So a gross production of $90 billion of food, $60 billion of clothing, and $70
billion of housing is necessary to meet the required demands.
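The open-model computation is a direct translation of (I − A)x = d. A sketch (added for illustration; amounts are in billions of dollars):

```python
import numpy as np

A = np.array([[0.30, 0.20, 0.30],
              [0.10, 0.40, 0.10],
              [0.30, 0.20, 0.30]])
d = np.array([30.0, 20.0, 10.0])     # outside demands, in billions

M = np.linalg.inv(np.eye(3) - A)     # (I - A)^{-1}, nonnegative here
print(M)
print(M @ d)                         # [90. 60. 70.]: required gross production
```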
EXERCISES
1. Label the following statements as true or false.
(a) Any system of linear equations has at least one solution.
(b) Any system of linear equations has at most one solution.
(c) Any homogeneous system of linear equations has at least one so­
lution.
(d) Any system of n linear equations in n unknowns has at most one
solution.
(e) Any system of n linear equations in n unknowns has at least one
solution.
(f) If the homogeneous system corresponding to a given system of lin­
ear equations has a solution, then the given system has a solution.
(g) If the coefficient matrix of a homogeneous system of n linear equa­
tions in n unknowns is invertible, then the system has no nonzero
solutions.
(h) The solution set of any system of m linear equations in n unknowns
is a subspace of Fn.
2. For each of the following homogeneous systems of linear equations, find
the dimension of and a basis for the solution set.

(a)
(c)
xi + 3x2 = 0
2xi + 6x2 = 0
xi + 2x2 - x3 = 0
2xi + x2 + x3 = 0
(b)
Xi -f X2 — X3 = 0
4xi + X2 - 2x3 = 0
2xi + x2 - x3 = 0
(d) xi - x2 + x3 = 0
xi + 2x2 - 2x3 = 0
(e) xi + 2x2 - 3x3 + x4 = 0
Xi + 2X2 + X3 + X4 = 0
(f)
xi + 2x2 = 0
Xi — x2 = 0
(g)
X2 — X3 + X4 = 0
3. Using the results of Exercise 2, find all solutions to the following sys­
tems.
(a)
(c)
(e)
xi + 3x2 = 5
2xi + 6x2 = 10
Xi + 2X2 — X3 = 3
2xi + x2 + x3 = 6
xi + 2x2 -^3x3 + x4 = 1
(b)
Xl + X2 - X3 = 1
4xi + x2 - 2x3 = 3
2xi + x2 - x3 = 5
(d) x1 − x2 + x3 = 1
xi + 2x2 - 2x3 = 4
(f)
xi + 2x2 = 5
Xi - x2 = -1
(g)
Xi + 2X2 + X3 + X4 = 1
X2 ~ X3 + X4 = 1
4. For each system of linear equations with the invertible coefficient matrix
A,
(1) Compute A"1.
(2) Use A-1 to solve the system.
W 2x1+5x2 = 3 M *l+*2ZX3 =
2xi - 2x2 + x3 = 4
5. Give an example of a system of n linear equations in n unknowns with
infinitely many solutions.
6. Let T: R3 —• R2 be defined by T(a, b, c) = (a + 6,2a — c). Determine
T-Kui)-
7. Determine which of the following systems of linear equations has a so­
lution.
.

X\ -r x2 — Xg = 1
2xi + x2 + 3x3 = 2
X\ + X2 + 3X3 — X4 = 0
Xl + X2+ X3 + X4 = 1
Xi - 2x2 + X3 - X4 = 1
4xi + X2 + 8x3 — X4 = 0
8.
9.
Xi + X2 — X3 + 2x4 = 2
(a) xi + x2 + 2x3 = 1 (b)
2xi + 2x2 + x3 + 2x4 = 4
xi + 2x2 + 3x3 = 1
(c) xi + x2 - x3 = 0 (d)
xi -I- 2x2 + X3 = 3
Xi + 2x2 — X3 = 1
(e) 2xi + x2 + 2x3 = 3
Xi — 4x2 + 7x3 = 4
Let T: R3 -+ R3 be defined by T(o,6,c) = (o + 6,6 - 2c, o + 2c). For
each vector v in R3, determine whether v £ R(T).
(a) v = (1,3,-2) (b) ^ = (2,1,1)
Prove that the system of linear equations Ax = b has a solution if and
only if 6 e R(L^).
10. Prove or give a counterexample to the following statement: If the co­
efficient matrix of a system of m linear equations in n unknowns has
rank m, then the system has a solution. /
11. In the closed model of Leontief with food, clothing, and housing as the
basic industries, suppose that the input-output matrix is
A =
At what ratio must the farmer, tailor, and carpenter produce in order
for equilibrium to be attained?
12. A certain economy consists of two sectors: goods and services. Suppose
that 60% of all goods and 30% of all services are used in the production
of goods. What proportion of the total economic output is used in the
production of goods?
13. In the notation of the open model of Leontief, suppose that
(&
5
16
1
1 4
1
2
1
6
1
3
ft
5
16
1
2J
A = and d =
are the input-output matrix and the demand vector, respectively. How
much of each commodity must be produced to satisfy this demand?

14. A certain economy consisting of the two sectors of goods and services
supports a defense system that consumes $90 billion worth of goods and
$20 billion worth of services from the economy but does not contribute
to economic production. Suppose that 50 cents worth of goods and 20
cents worth of services are required to produce $1 worth of goods and
that 30 cents worth of goods and 60 cents worth of services are required
to produce $1 worth of services. What must the total output of the
economic system be to support this defense system?
3.4 SYSTEMS OF LINEAR EQUATIONS-
COMPUTATIONAL ASPECTS
In Section 3.3, we obtained a necessary and sufficient condition for a system
of linear equations to have solutions (Theorem 3.11 p. 174) and learned how
to express the solutions to a nonhomogeneous system in terms of solutions
to the corresponding homogeneous system (Theorem 3.9 p. 172). The latter
result enables us to determine all the solutions to a given system if we can
find one solution to the given system and a basis for the solution set of the
corresponding homogeneous system. In this section, we use elementary row
operations to accomplish these two objectives simultaneously. The essence of
this technique is to transform a given system of linear equations into a system
having the same solutions, but which is easier to solve (as in Section 1.4).
Definition. Two systems of linear equations are called equivalent if
they have the same solution set.
The following theorem and corollary give a useful method for obtaining
equivalent systems.
Theorem 3.13. Let Ax = b be a system of m linear equations in n
unknowns, and let C be an invertible m × m matrix. Then the system
(CA)x = Cb is equivalent to Ax = b.
Proof. Let K be the solution set for Ax = b and K' the solution set for
(CA)x = Cb. If w ∈ K, then Aw = b. So (CA)w = Cb, and hence w ∈ K'.
Thus K ⊆ K'.
Conversely, if w ∈ K', then (CA)w = Cb. Hence
Aw = C^{−1}(CAw) = C^{−1}(Cb) = b;
so w ∈ K. Thus K' ⊆ K, and therefore, K = K'.
Corollary. Let Ax = b be a system of m linear equations in n unknowns.
If (A'|b') is obtained from (A|b) by a finite number of elementary row operations,
then the system A'x = b' is equivalent to the original system.

Proof. Suppose that (A'|b') is obtained from (A|b) by elementary row
operations. These may be executed by multiplying (A|b) by elementary m × m
matrices E_1, E_2, ..., E_p. Let C = E_p ··· E_2 E_1; then
(A'|b') = C(A|b) = (CA|Cb).
Since each E_i is invertible, so is C. Now A' = CA and b' = Cb. Thus by
Theorem 3.13, the system A'x = b' is equivalent to the system Ax = b.
We now describe a method for solving any system of linear equations.
Consider, for example, the system of linear equations
3x_1 + 2x_2 + 3x_3 − 2x_4 = 1
 x_1 +  x_2 +  x_3        = 3
 x_1 + 2x_2 +  x_3 −  x_4 = 2.
First, we form the augmented matrix
( 3 2 3 -2 | 1
  1 1 1  0 | 3
  1 2 1 -1 | 2 ).
By using elementary row operations, we transform the augmented matrix
into an upper triangular matrix in which the first nonzero entry of each row
is 1, and it occurs in a column to the right of the first nonzero entry of each
preceding row. (Recall that matrix A is upper triangular if A_ij = 0 whenever
i > j.)
1. In the leftmost nonzero column, create a 1 in the first row. In our
example, we can accomplish this step by interchanging the first and
third rows. The resulting matrix is
( 1 2 1 -1 | 2
  1 1 1  0 | 3
  3 2 3 -2 | 1 ).
2. By means of type 3 row operations, use the first row to obtain zeros in
the remaining positions of the leftmost nonzero column. In our example,
we must add —1 times the first row to the second row and then add —3
times the first row to the third row to obtain
( 1  2 1 -1 |  2
  0 -1 0  1 |  1
  0 -4 0  1 | -5 ).
3. Create a 1 in the next row in the leftmost possible column, without using
previous row(s). In our example, the second column is the leftmost

possible column, and we can obtain a 1 in the second row, second column
by multiplying the second row by —1. This operation produces
( 1  2 1 -1 |  2
  0  1 0 -1 | -1
  0 -4 0  1 | -5 ).
4. Now use type 3 elementary row operations to obtain zeros below the 1
created in the preceding step. In our example, we must add four times
the second row to the third row. The resulting matrix is
( 1 2 1 -1 |  2
  0 1 0 -1 | -1
  0 0 0 -3 | -9 ).
5. Repeat steps 3 and 4 on each succeeding row until no nonzero rows
remain. In our example, this can be accomplished by multiplying the third row
by −1/3. This operation produces
( 1 2 1 -1 |  2
  0 1 0 -1 | -1
  0 0 0  1 |  3 ).
We have now obtained the desired matrix. To complete the simplification
of the augmented matrix, we must make the first nonzero entry in each row
the only nonzero entry in its column. (This corresponds to eliminating certain
unknowns from all but one of the equations.)
6. Work upward, beginning with the last nonzero row, and add multiples of
each row to the rows above. (This creates zeros above the first nonzero
entry in each row.) In our example, the third row is the last nonzero
row, and the first nonzero entry of this row lies in column 4. Hence we
add the third row to the first and second rows to obtain zeros in row 1,
column 4 and row 2, column 4. The resulting matrix is
( 1 2 1 0 | 5
  0 1 0 0 | 2
  0 0 0 1 | 3 ).
7. Repeat the process described in step 6 for each preceding row until it is
performed with the second row, at which time the reduction process is
complete. In our example, we must add —2 times the second row to the
first row in order to make the first row, second column entry become
zero. This operation produces
( 1 0 1 0 | 1
  0 1 0 0 | 2
  0 0 0 1 | 3 ).

We have now obtained the desired reduction of the augmented matrix.
This matrix corresponds to the system of linear equations
x_1 + x_3 = 1
x_2       = 2
x_4       = 3.
Recall that, by the corollary to Theorem 3.13, this system is equivalent to
the original system. But this system is easily solved. Obviously x_2 = 2 and
x_4 = 3. Moreover, x_1 and x_3 can have any values provided their sum is 1.
Letting x_3 = t, we then have x_1 = 1 − t. Thus an arbitrary solution to the
original system has the form
( x_1     ( 1 − t      ( 1        ( -1
  x_2       2            2           0
  x_3   =   t        =   0    + t    1
  x_4 )     3 )          3 )         0 ).
Observe that
{ ( -1
     0
     1
     0 ) }
is a basis for the homogeneous system of equations corresponding to the given
system.
In the preceding example we performed elementary row operations on the
augmented matrix of the system until we obtained the augmented matrix of a
system having properties 1, 2, and 3 on page 27. Such a matrix has a special
name.
Definition. A matrix is said to be in reduced row echelon form if
the following three conditions are satisfied.
(a) Any row containing a nonzero entry precedes any row in which all the
entries are zero (if any).
(b) The first nonzero entry in each row is the only nonzero entry in its
column.
(c) The first nonzero entry in each row is 1 and it occurs in a column to
the right of the first nonzero entry in the preceding row.
Example 1
(a) The matrix on page 184 is in reduced row echelon form. Note that the
first nonzero entry of each row is 1 and that the column containing each such
entry has all zeros otherwise. Also note that each time we move downward to

a new row, we must move to the right one or more columns to find the first
nonzero entry of the new row.
(b) The matrix
is not in reduced row echelon form, because the first column, which contains
the first nonzero entry in row 1, contains another nonzero entry. Similarly,
the matrix
( 0 1 0 2
  1 0 0 1
  0 0 1 1 )
is not in reduced row echelon form, because the first nonzero entry of the
second row is not to the right of the first nonzero entry of the first row.
Finally, the matrix
/
is not in reduced row echelon form, because the first nonzero entry of the first
row is not 1. •
It can be shown (see the corollary to Theorem 3.16) that the reduced
row echelon form of a matrix is unique; that is, if different sequences of
elementary row operations are used to transform a matrix into matrices Q
and Q' in reduced row echelon form, then Q = Q'. Thus, although there are
many different sequences of elementary row operations that can be used to
transform a given matrix into reduced row echelon form, they all produce the
same result.
The procedure described on pages 183-185 for reducing an augmented
matrix to reduced row echelon form is called Gaussian elimination. It
consists of two separate parts.
1. In the forward pass (steps 1-5), the augmented matrix is transformed
into an upper triangular matrix in which the first nonzero entry of each
row is 1, and it occurs in a column to the right of the first nonzero entry
of each preceding row.
2. In the backward pass or back-substitution (steps 6-7), the upper trian­
gular matrix is transformed into reduced row echelon form by making
the first nonzero entry of each row the only nonzero entry of its column.

Of all the methods for transforming a matrix into its reduced row ech­
elon form, Gaussian elimination requires the fewest arithmetic operations.
(For large matrices, it requires approximately 50% fewer operations than the
Gauss-Jordan method, in which the matrix is transformed into reduced row
echelon form by using the first nonzero entry in each row to make zero all
other entries in its column.) Because of this efficiency, Gaussian elimination
is the preferred method when solving systems of linear equations on a com­
puter. In this context, the Gaussian elimination procedure is usually modified
in order to minimize roundoff errors. Since discussion of these techniques is
inappropriate here, readers who are interested in such matters are referred to
books on numerical analysis.
When a matrix is in reduced row echelon form, the corresponding sys­
tem of linear equations is easy to solve. We present below a procedure for
solving any system of linear equations for which the augmented matrix is in
reduced row echelon form. First, however, we note that every matrix can be
transformed into reduced row echelon form by Gaussian elimination. In the
forward pass, we satisfy conditions (a) and (c) in the definition of reduced
row echelon form and thereby make zero all entries below the first nonzero
entry in each row. Then in the backward pass, we make zero all entries above
the first nonzero entry in each row, thereby satisfying condition (b) in the
definition of reduced row echelon form.
Theorem 3.14. Gaussian elimination transforms any matrix into its re­
duced row echelon form.
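A compact implementation of the forward and backward passes is sketched below (added for illustration, not the authors' code; a small tolerance stands in for "nonzero" because the arithmetic is floating point). Applied to the augmented matrix of the system solved above, it reproduces the reduced row echelon form obtained there.

```python
import numpy as np

def rref(M, tol=1e-10):
    """Return the reduced row echelon form of M via Gaussian elimination."""
    M = np.array(M, dtype=float)
    m, n = M.shape
    pivots = []
    row = 0
    # Forward pass: create leading 1's and zeros below them (steps 1-5).
    for col in range(n):
        if row == m:
            break
        p = row + np.argmax(np.abs(M[row:, col]))
        if abs(M[p, col]) < tol:
            continue
        M[[row, p]] = M[[p, row]]
        M[row] /= M[row, col]
        for r in range(row + 1, m):
            M[r] -= M[r, col] * M[row]
        pivots.append(col)
        row += 1
    # Backward pass: clear the entries above each leading 1 (steps 6-7).
    for i in reversed(range(len(pivots))):
        col = pivots[i]
        for r in range(i):
            M[r] -= M[r, col] * M[i]
    return M

# The augmented matrix of the system solved above.
aug = [[3, 2, 3, -2, 1],
       [1, 1, 1,  0, 3],
       [1, 2, 1, -1, 2]]
print(rref(aug))
# [[1. 0. 1. 0. 1.]
#  [0. 1. 0. 0. 2.]
#  [0. 0. 0. 1. 3.]]
```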
We now describe a method for solving a system in which the augmented
matrix is in reduced row echelon form. To illustrate this procedure, we con­
sider the system
2x_1 + 3x_2 +  x_3 + 4x_4 − 9x_5 = 17
 x_1 +  x_2 +  x_3 +  x_4 − 3x_5 =  6
 x_1 +  x_2 +  x_3 + 2x_4 − 5x_5 =  8
2x_1 + 2x_2 + 2x_3 + 3x_4 − 8x_5 = 14,
for which the augmented matrix is
( 2 3 1 4 -9 | 17
  1 1 1 1 -3 |  6
  1 1 1 2 -5 |  8
  2 2 2 3 -8 | 14 ).
Applying Gaussian elimination to the augmented matrix of the system produces
the following sequence of matrices.
( 2 3 1 4 -9 | 17          ( 1 1 1 1 -3 |  6          ( 1 1  1 1 -3 | 6
  1 1 1 1 -3 |  6   --1->    2 3 1 4 -9 | 17   --3->    0 1 -1 2 -3 | 5
  1 1 1 2 -5 |  8            1 1 1 2 -5 |  8            0 0  0 1 -2 | 2
  2 2 2 3 -8 | 14 )          2 2 2 3 -8 | 14 )          0 0  0 1 -2 | 2 )

        ( 1 1  1 1 -3 | 6          ( 1 1  1 0 -1 | 4          ( 1 0  2 0 -2 | 3
  --1->   0 1 -1 2 -3 | 5   --2->    0 1 -1 0  1 | 1   --1->    0 1 -1 0  1 | 1
          0 0  0 1 -2 | 2            0 0  0 1 -2 | 2            0 0  0 1 -2 | 2
          0 0  0 0  0 | 0 )          0 0  0 0  0 | 0 )          0 0  0 0  0 | 0 )
The system of linear equations corresponding to this last matrix is
x_1 + 2x_3 − 2x_5 = 3
x_2 −  x_3 +  x_5 = 1
x_4 − 2x_5 = 2.
Notice that we have ignored the last row since it consists entirely of zeros.
To solve a system for which the augmented matrix is in reduced row
echelon form, divide the variables into two sets. The first set consists of
those variables that appear as leftmost variables in one of the equations of
the system (in this case the set is {xi,X2,X4}). The second set consists of
all the remaining variables (in this case, {x3,xs}). To each variable in the
second set, assign a parametric value t\,t2,... (X3 = t\, X5 = t2), and then
solve for the variables of the first set in terms of those in the second set:
xi = -2x3 + 2x5 + 3 = -2ti + 2*2 + 3
x2= x3 - x5 + 1 = t\ - t2 + 1
x4 = 2x5 + 2 = 2t2 + 2.
Thus an arbitrary solution is of the form
( x_1     ( -2t_1 + 2t_2 + 3       ( 3        ( -2        (  2
  x_2         t_1 -  t_2 + 1         1           1          -1
  x_3   =     t_1              =     0   + t_1   1   + t_2    0
  x_4               2t_2 + 2         2           0            2
  x_5 )             t_2      )       0 )         0 )          1 ),
where t_1, t_2 ∈ R. Notice that
{ ( -2        (  2
     1          -1
     1    ,      0      }
     0           2
     0 )         1 )
is a basis for the solution set of the corresponding homogeneous system of
equations and
( 3
  1
  0
  2
  0 )
is a particular solution to the original system.
Therefore, in simplifying the augmented matrix of the system to reduced
row echelon form, we are in effect simultaneously finding a particular solu­
tion to the original system and a basis for the solution set of the associated
homogeneous system. Moreover, this procedure detects when a system is in­
consistent, for by Exercise 3, solutions exist if and only if, in the reduction of
the augmented matrix to reduced row echelon form, we do not obtain a row
in which the only nonzero entry lies in the last column.
Thus to use this procedure for solving a system Ax = b of m linear equations
in n unknowns, we need only begin to transform the augmented matrix
(A|b) into its reduced row echelon form (A'|b') by means of Gaussian elimination.
If a row is obtained in which the only nonzero entry lies in the last
column, then the original system is inconsistent. Otherwise, discard any zero
rows from (A'|b'), and write the corresponding system of equations. Solve
this system as described above to obtain an arbitrary solution of the form
s = s_0 + t_1 u_1 + t_2 u_2 + ··· + t_{n−r} u_{n−r},
where r is the number of nonzero rows in A' (r ≤ m). The preceding equation
is called a general solution of the system Ax = 6. It expresses an arbitrary
solution s of Ax = b in terms of n — r parameters. The following theorem
states that s cannot be expressed in fewer than n — r parameters.
Theorem 3.15. Let Ax = b be a system of r nonzero equations in n
unknowns. Suppose that rank(A) = rank(A|b) and that (A|b) is in reduced
row echelon form. Then
(a) rank(A) = r.
(b) If the general solution obtained by the procedure above is of the form
s = s_0 + t_1 u_1 + t_2 u_2 + ··· + t_{n−r} u_{n−r},
then {u_1, u_2, ..., u_{n−r}} is a basis for the solution set of the corresponding
homogeneous system, and s_0 is a solution to the original system.
Proof. Since (A) is in reduced row echelon form, (A) must have r
nonzero rows. Clearly these rows are linearly independent by the definition
of the reduced row echelon form, and so rank(yl|6) = r. Thus rank(yl) = r.

190 Chap. 3 Elementary Matrix Operations and Systems of Linear Equations
Let K be the solution set for Ax = b. and let KH be the solution set for
Ax = 0. Setting t\ = t2 — • • • — tn-r = 0, we see that s = So G K. But by
Theorem 3.9 (p. 172), K = {.s,,} + KH. Hence
KH = {so} + K = span({u1.u2,.... «„-,})•
Because rank(A) = r, we have dim(Kn) = n — r. Thus since dim(Kn) = n — r
and KH is generated by a set {u\,u2,... ,un-r} containing at most n — r
vectors, we conclude that this set is a basis for KH- 1
An Interpretation of the Reduced Row Echelon Form
Let A be an m x n matrix with columns Oi,a2, • • • ,an< aiJd let B be the
reduced row echelon form of A. Denote the columns of B by b\,b2,...,b„. If
the rank of A is r, then the rank of B is also r by the corollary to Theorem 3.4
(p. 153). Because B is in reduced row echelon form, no nonzero row of B can
be a linear combination of the other rows of B. Hence B must have exactly
r nonzero rows, and if r > 1, the vectors e.\, e2,..., er must occur among the
columns of B. For i = 1,2,..., r, let j.t denote a column number of B such
that bjt = e». We claim that a7l, aj2..... aJr, the columns of A corresponding
to these columns of B, are linearly independent. For suppose that there are
scalars c\,c2,...,cr such that
c\Uj1 + c2aj2 -\ 1- crajr = 0.
Because B can be obtained from A by a sequence of elementary row oper­
ations, there exists (as in the proof of the corollary to Theorem 3.13) an
invertible mxm matrix M such that MA = B. Multiplying the preceding
equation by M yields
CIMOJ, +c2Ma
3i
+ crMa.; = 0.
Since Ma*. = 6, = a, it follows that
C('\ + C2C2 H r Cr('r = 0.
Hence c\ = c2 = • • • = cr = 0. proving that the vectors ctj11Q>ja, • • • ,o-jr are
linearly independent.
Because B has only r nonzero rows, every column of B has the form
dr
0
W

Sec. 3.4 Systems of Linear Equations—Computational Aspects
for scalars d\,d2,... ,dr. The corresponding column of A must be
191
r-l M '(diei +d2e2 H \-drer) = d\M 1ei+d2M *e2-t + drM_1(
= diM^bj, + d2M~1bJ2 + • • • + drM~lbjr
= diajr + d2aj2 + h drajv.
The next theorem summarizes these results.
Theorem 3.16. Let A be an m x n matrix of rank r, where r > 0, and
let B be the reduced row echelon form of A. Then
(a) The number of nonzero rows in B is r.
(b) For each i = 1,2,..., r, there is a column bji of B such that bjt = e^.
(c) The columns of A numbered ji,j2,---, jr nre linearly independent.
(d) For each k = 1,2,... n, if column k ofB is d\ei +d2e2 -\ \-drer, then
column k of A is d\Oj1 + d2aj2 + • • • + dra,jr.
Corollary. The reduced row echelon form of a matrix is unique.
Proof. Exercise. (See Exercisel5.) M
Example 2
Let
/
A =
(2 4 6 2 4
12 3 11
2 4 8 0 0
\3 6 7 5 9/
The reduced row echelon form of A is
B =
(I
0
0
^
2
0
0
0
0
1
0
0
4
-1
0
0
0
0
1
0/
Since B has three nonzero rows, the rank of A is 3. The first, third, and fifth
columns of B are ei,e2, and 63; so Theorem 3.16(c) asserts that the first,
third, and fifth columns of A are linearly independent.
Let the columns of A be denoted ai, a2, a$, 04, and a$. Because the second
column of B is 2ei, it follows from Theorem 3.16(d) that a2 = 2a-i, as is easily
checked. Moreover, since the fourth column of B is 4ei + (—l)e2, the same
result shows that
04 = 4a\ + (—l)as. •

192 Chap. 3 Elementary Matrix Operations and Systems of Linear Equations
In Example 6 of Section 1.6, we extracted a basis for R3 from the gener­
ating set
S = {(2, -3,5), (8, -12,20), (1,0, -2), (0,2, -1), (7,2,0)}.
The procedure described there can be streamlined by using Theorem 3.16.
We begin by noting that if S were linearly independent, then S would be a
basis for R3. In this case, it is clear that S is linearly dependent because
S contains more than dim(R3) = 3 vectors. Nevertheless, it is instructive
to consider the calculation that is needed to determine whether S is linearly
dependent or linearly independent. Recall that S is linearly dependent if
there are scalars Ci,C2,C3,C4, and C5, not all zero, such that
ci(2,-3,5)+C2(8,-12,20)+c3(l,0,-2)+c4(0,2,-l)+C5(7,2,0) = (0,0,0).
Thus S is linearly dependent if and only if the system of linear equations
2ci + 8c2 + c3 + 7c5 = 0
-3ci - 12c2 + 2c4 + 2c5 = 0
5ci + 20c2 - 2c3 - c4 =0
has a nonzero solution. The augmented matrix of this system of equations is
A= -3 -12
\ 5 20
and its reduced row echelon form is
B =
Using the technique described earlier in this section, we can find nonzero
solutions of the preceding system, confirming that 5 is linearly dependent.
However, Theorem 3.16(c) gives us additional information. Since the first,
third, and fourth columns of B are e,\, e2, and 63, we conclude that the first,
third, and fourth columns of A are linearly independent. But the columns
of A other than the last column (which is the zero vector) are vectors in S.
Hence
0 = {(2, -3,5), (1,0, -2), (0,2,-1)}
is a linearly independent subset of S. If follows from (b) of Corollary 2 to the
replacement theorem (p. 47) that 0 is a basis for R3.
Because every finite-dimensional vector space over F is isomorphic to Fn
for some n, a similar approach can be used to reduce any finite generating
set to a basis. This technique is illustrated in the next example.

Sec. 3.4 Systems of Linear Equations—Computational Aspects 193
Example 3
The set
S = {2+x+2x2+3x3,4+2x+4x2+6x3,6+3x+8x2+7x3,2+x+5x3,4+x+9x3}
generates a subspace V of Ps(R). To find a subset of S that is a basis for V,
we consider the subset
Sf = {(2,1,2,3), (4,2,4,6), (6,3,8,7), (2,1,0,5), (4,1,0,9)}
consisting of the images of the polynomials in S under the standard repre­
sentation of Ps(R) with respect to the standard ordered basis. Note that the
4x5 matrix in which the columns are the vectors in S' is the matrix A in
Example 2. From the reduced row echelon form of A, which is the matrix B
in Example 2, we see that the first, third, and fifth columns of A are linearly
independent and the second and fourth columns of A are linear combinations
of the first, third, and fifth columns. Hence
{(2,1,2,3), (6,3,8,7), (4,1,0,9)}
is a basis for the subspace of R4 that is generated by S'. It follows that
{2 + x + 2x2 + 3x3,6 + 3x + 8x2 + 7x3,4 + x + 9x3 }
is a basis for the subspace V of Ps(i?.). •
We conclude this section by describing a method for extending a linearly
independent subset S of a finite-dimensional vector space V to a basis for V.
Recall that this is always possible by (c) of Corollary 2 to the replacement
theorem (p. 47). Our approach is based on the replacement theorem and
assumes that we can find an explicit basis 0 for V. Let S' be the ordered set
consisting of the vectors in S followed by those in 0. Since 0 C S', the set
S' generates V. We can then apply the technique described above to reduce
this generating set to a basis for V containing S.
Example 4
Let
V = {(xi,X2,x3,X4,x5) e R5: xi +7x2 + 5x3 - 4x4 + 2x5 = 0}.
It is easily verified that V is a subspace of R5 and that
S = {(-2,0,0, -1, -1), (1,1, -2, -1, -1), (-5,1,0,1,1)}
is a linearly independent subset of V.

194 Chap. 3 Elementary Matrix Operations and Systems of Linear Equations
To extend 5 to a basis for V, we first obtain a basis 0 for V. To do so,
we solve the system of linear equations that defines V. Since in this case V is
defined by a single equation, we need only write the equation as
Xi = —7x2 — 5x3 + 4x4 — 2x5
and assign parametric values to X2, X3, X4, and X5. If X2 = £1, X3 = t2,
X4 = £3, and X5 = £4, then the vectors in V have the form
(xi,x2,X3,X4,x5) = (-7£i -5t2+4t3 -2t4,ti,t2,t3,t4)
= ii(-7,l,0,0,0)+^(-5,0,l,0,0)+£3(4,0,0,l,0) + t4(-2,0,0,0,l).
Hence
0= {(-7,1,0,0,0), (-5,0,1,0,0), (4,0,0,1,0), (-2,0,0,0,1)}
is a basis for V by Theorem 3.15.
The matrix whose columns consist of the vectors in S followed by those
in 0 is
(-2
0
0
-1
^
1
1
-2
-1
-1
-5
1
0
1
1
-7
1
0
0
0
-5
0
1
0
0
4
0
0
1
0
-2
0
0
0
l)
and its reduced row echelon form is
/l 0 0 1 1 0 -1
0 1 0 0 -.5 0 0
0 0 11 .5 0 0
0 0 0 0 0 1-1
\o 0 0 0 00 oy
Thus
{(-2,0,0,-1,-1), (1,1,-2,-1,-1), (-5,1,0,1,1), (4,0,0,1,0)}
is a basis for V containing S. •
EXERCISES
1. Label the following statements as true or false.
(a) If (A'') is obtained from (A) by a finite sequence of elementary
column operations, then the systems Ax = b and ^4'x = b' are
equivalent.

Sec. 3.4 Systems of Linear Equations—Computational Aspects 195
(b) If (A'') is obtained from (^4|6) by a finite sequence of elemen­
tary row operations, then the systems ^4x = 6 and ^4'x = b' are
equivalent.
(c) If A is an n x n matrix with rank n, then the reduced row echelon
form of A is In-
(d) Any matrix can be put in reduced row echelon form by means of
a finite sequence of elementary row operations.
(e) If (A) is in reduced row echelon form, then the system Ax = 6 is
consistent.
(f) Let J4X = 6 be a system of m linear equations in n unknowns for
which the augmented matrix is in reduced row echelon form. If
this system is consistent, then the dimension of the solution set of
Ax = 0 is n — r, where r equals the number of nonzero rows in A.
(g) If a matrix A is transformed by elementary row operations into a
matrix A' in reduced row echelon form, then the number of nonzero
rows in A' equals the rank of A.
2. Use Gaussian elimination to solve the following systems of linear equa­
tions.
(a)
(c)
(d)
(e)
(g)
(h)
Xi + 2X2 — X3 = —1
2xi + 2x2 + X3 = 1 (b)
3xi + 5x2 — 2.X3 — — 1
xi + 2x2 + 2x4 = 6
3xi + 5x2 — X3 + 6x4 = 17
2xi + 4x2 + x3 + 2x4 = 12
2xi — 7x3 + HX4 = 7
x\ — x2 — 2x3 + 3x4 = — 7
2xi — X2 + 6x3 + 6x4 = —2
— 2X] + X2 — 4X3 ~" 3X4 = 0
3xi — 2x2 + 9^3 + IOX4 = —5
Xi -
2xi -
3xi -
Xi
- 2x2 - X3
- 3x2 +- x3
- 5x2
+ 5x3
= 1
= 6
= 7
= 9
xi - 4x2 - x3 + x4 = 3
2xi - 8x2 + X3 — 4x4 = 9
-X! + 4x2 - 2x3 + 5x4 = -6
2xi — 2x2 — X3 + 6x4 — 2x5 = 1
.X'i — X2 + X3 + 2x4 — X5 = 2
4xi - 4x2 + 5x3 + 7x4 - x5 = 6
3xi — X2 + X3 — X4 + 2x5 = 5
X] - x2 - x3 - 2x4 - x5 = 2
5xi - 2x2 + x-s - 3x4 + 3x5 = 10
2xi — X2 — 2x4 + X5 = 5
xi + 2x2 — X3 + 3x4 = 2
(f) 2xi + 4x2 - x3 + 6x4 = 5
x2 + 2x4 = 3

196 Chap. 3 Elementary Matrix Operations and Systems of Linear Equations
(i)
(j)
3xi - x2 + 2x3 + 4x4 + x5 = 2
Xi — x2 + 2X3 + 3X4 + x5 = — 1
2xi — 3x2 + 6x3 + 9x4 + 4x5 = —5
7xi — 2x2 + 4x3 + 8x4 + X5 = 6
2xi + 3x3 ~ 4x5 — 5
3xi — 4x2 + 8x3 + 3x4 = 8
Xi — x2 + 2x3 + X4 — X5 = 2
—2xi + 5x2 — 9x3 — 3x4 — 5x5 = —8
3. Suppose that the augmented matrix of a system .Ax = 6 is transformed
into a matrix (A'') in reduced row echelon form by a finite sequence
of elementary row operations.
(a) Prove that vank(A') ^ rank(,4/|6/) if and only if (^4'|6') contains a
row in which the only nonzero entry lies in the last column.
(b) Deduce that Ax = 6 is consistent if and only if (^4'|6') contains no
row in which the only nonzero entry lies in the last column.
4. For each of the systems that follow, apply Exercise 3 to determine
whether the system is consistent. If the system is consistent, find all
solutions. Finally, find a basis for the solution set of the corresponding
homogeneous system.
X\ + X2 — 3x3 + X4 = —2
(b) xi + x2 + x3 - X4 = 2
xi + x2 - x3 =0
xi + 2x2 — X3 + X4 = 2
(a) 2xi + x2 + x3 - x4 = 3
xi + 2x2 ~~ 3x3 + 2.T4 = 2
Xi + X2 — 3X3 + X4 = 1
(c) Xi + X2 + X3 - X4 = 2
Xi + X2 — X3 =0
5. Let the reduced row echelon form of A be
'10 2 0 -2'
0 1-50-3
J) 0 0 1 6,
Determine A if the first, second, and fourth columns of A are
and
respectively.
6. Let the reduced row echelon form of A be
/l -3 0 4 0
0 0
0 0
\0 0
5
1 3 0
0 0 1
0 0 0
2
-1
0/

Sec. 3.4 Systems of Linear Equations—Computational Aspects
Determine A if the first, third, and sixth columns of A are
197
/ 1
-2
-1
, *)
i
>—>
1
2
\-4j
and
( 3
-9
2
V V
respectively.
7. It can be shown that the vectors u\ = (2, —3,1), u2 = (1,4, —2), uz =
(-8,12, -4), w4 = (1,37, -17), and u5 = (-3, -5,8) generate R3. Find
a subset of {u\, u2,us, U4,u$} that is a basis for R3.
8. Let W denote the subspace of R5 consisting of all vectors having coor­
dinates that sum to zero. The vectors
«i = (2,-3,4,-5,2),
«3 = (3,-2,7,-9,1),
u5 = (-1,1,2,1,-3),
u7 = (1,0,-2,3,-2), and
u2 = (-6,9,-12,15,-6),
u4 = (2,-8,2,-2,6),
u6 = (0,-3,-18,9,12),
u8 = (2,-l,l,-9,7)
generate W. Find a subset of {ui,u2,..., ug} that is a basis for W.
9. Let W be the subspace of M2x2(R) consisting of the symmetric 2x2
matrices. The set
S =
1 2
2 3
2 1
1 9
1 -2
-2 4
-1 2
2 -1
generates W. Find a subset of S that is a basis for W.
10. Let
V = {(xi,x2,X3,X4,x5) G R5: Xi - 2x2 + 3x3 - x4 + 2x5 = 0}.
(a) Show that S = {(0,1,1,1,0)} is a linearly independent subset of
V.
(b) Extend S to a basis for V.
11. Let V be as in Exercise 10.
(a) Show that S = {(1,2,1,0,0)} is a linearly independent subset of
V.
(b) Extend S to a basis for V.
12. Let V denote the set of all solutions to the system of linear equations
^i — #2 + 2x4 — 3x5 + XQ = 0
2xi — X2 — X3 + 3x4 — 4x5 + 4x6 = 0-

198 Chap. 3 Elementary Matrix Operations and Systems of Linear Equations
(a) Show that S = {(0, -1,0,1,1,0), (1,0,1,1,1,0)} is a linearly inde­
pendent subset of V.
(b) Extend 5 to a basis for V.
13. Let V be as in Exercise 12.
(a) Show that S = {(1,0,1,1,1,0), (0,2,1,1,0,0)} is a linearly inde­
pendent subset of V.
(b) Extend S to a basis for V.
14. If (A) is in reduced row echelon form, prove that A is also in reduced
row echelon form.
15. Prove the corollary to Theorem 3.16: The reduced row echelon form of
a matrix is unique.
INDEX OF DEFINITIONS FOR CHAPTER 3
Augmented matrix 161
Augmented matrix of a system of lin­
ear equations 174
Backward pass 186
Closed model of a sirnpic economy
176
Coefficient matrix of a system of lin­
ear equations 169
Consistent system of linear equations
169
Elementary column operation 148
Elementary matrix 149
Elementary operation 148
Elementary row operation 148
Equilibrium condition for a simple
economy 177
Equivalent systems of linear equa­
tions 182
Forward pass 186
Gaussian elimination 186
General solution of a system of linear
equations 189
Homogeneous system correspond­
ing to a nonhomogeneous system
172
Homogeneous system of linear equa­
tions 171
Inconsistent system of linear equa­
tions 169
Input- output matrix 177
Nonhomogeneous system of linear
equations 171
Nonnegative vector 177
Open model of a simple economy
178
Positive matrix 177
Rank of a matrix 152
Reduced row echelon form of a ma­
trix 185
Solution to a system of linear equa­
tions 169
Solution set of a system of equations
169
System of linear equations 169
Type 1, 2, and 3 elementary opera­
tions 148

4
Determinants
4.1
4.2
4.3
4.4
4.5*
Determinants of Order 2
Determinants of Order n
Properties of Determinants
Summary — Important Facts about Determinants
A Characterization of the Determinant
JL he determinant, which has played a prominent role in the theory of lin­
ear algebra, is a special scalar-valued function defined on the set of square
matrices. Although it still has a place in the study of linear algebra and its
applications, its role is less central than in former times. Yet no linear algebra
book would be complete without a systematic treatment of the determinant,
and we present one here. However, the main use of determinants in this book
is to compute and establish the properties of eigenvalues, which we discuss in
Chapter 5.
Although the determinant is not a linear transformation on Mnxn(F)
for n > 1, it does possess a kind of linearity (called n-linearity) as well
as other properties that are examined in this chapter. In Section 4.1, we
consider the determinant on the set of 2 x 2 matrices and derive its important
properties and develop an efficient computational procedure. To illustrate the
important role that determinants play in geometry, we also include optional
material that explores the applications of the determinant to the study of
area and orientation. In Sections 4.2 and 4.3, we extend the definition of the
determinant to all square matrices and derive its important properties and
develop an efficient computational procedure. For the reader who prefers to
treat determinants lightly, Section 4.4 contains the essential properties that
are needed in later chapters. Finally, Section 4.5, which is optional, offers
an axiomatic approach to determinants by showing how to characterize the
determinant in terms of three key properties.
4.1 DETERMINANTS OF ORDER 2
In this section, we define the determinant of a 2 x 2 matrix and investigate
its geometric significance in terms of area and orientation.
199

200 Chap. 4 Determinants
Definition. If
A =
a b
c d
is a 2 x 2 matrix with entries from a field F, then we define the determinant
of A, denoted det(^4) or \A\, to be the scalar ad — be.
Example 1
For the matrices
A =
1 2
and B =
3 2
3 4; ™ ~ \Q 4
in M2x2(R), we have
det(A) = 1-4-2-3 = -2 and det(B) = 3-4-2-6 = 0. •
For the matrices A and B in Example 1, we have
'4 4N
A + B =
9 8
and so
/
det(A + B) = 4-8 - 4-9 = -4.
Since det (4 + B) ^ det(A) + det(B), the function det: M2X2(-R) —* i? is
not a linear transformation. Nevertheless, the determinant does possess an
important linearity property, which is explained in the following theorem.
Theorem 4.1. The function det: M2x2(-F") —> F is a linear function of
each row of a 2 x 2 matrix when the other row is held fixed. That is, ifu, v,
and w are in F2 and k is a scalar, then
and
^lU + kv)=det("
w \w
w
A; det
w
det I I". )=dct[W)+kdetln
U + KV J \U I \V
Proof. Let u = (ai,a2), v — (b\,b2), and w = (c\,c2) be in F2 and A; be a
scalar. Then
det
ID
kdctiV)=det(ai ^
WI \Ci c2
A-d,. { h h
C] c2

Sec. 4.1 Determinants of Order 2 201
= {a(-2 - &2C1) + k{b\C2 - b><-
= (a\ + kbi)c2 - («2 + kb2)<
a\ + kb\ a,2 + /c&2
c\ c2
u + kv
w
= det
= det
A similar calculation shows that
det A-det = det
u + ki
For the 2x2 matrices A and B in Example 1, it is easily checked that A
is invertible but D is not. Note that det(i4) ^ 0 but dct(#) = 0. We now
show that this property is true in general.
Theorem 4.2. Let A G M2x2(-^)- Then the determinant of A is nonzero
if and only if A is invertible. Moreover, if A is invertible, then
A'1 =
Ai,
22
-Avl
det(A) V-^2i Au
Proof. If det(yl) ^ 0, then we can define a matrix
M =
I ,1,,, -A
i-'
det(i4) \-A2i An
A straightforward calculation shows that AM = MA = I, and so A is invert­
ible and M = A~x.
Conversely, suppose that A is invertible. A remark on page 152 shows
that the rank of
A =
Au Ai2
A2\ A22
must be 2. Hence A\ ^ 0 or A2\ ^ 0. If An =^ 0, add —A21/A11 times row 1
of A to row 2 to obtain the matrix
A22 ~
Because elementary row operations are rank-preserving by the corollary to
Theorem 3.4 (p. 153), it, follows that

202 Chap. 4 Determinants
Therefore det(A) = AnA22 - A!2A2i ^ 0. On the other hand, if A2i ^ 0,
we see that det(A) ^ 0 by adding -An/A2i times row 2 of A to row 1 and
applying a similar argument. Thus, in either case, det(A) ^ 0.
In Sections 4.2 and 4.3, we extend the definition of the determinant to
nxn matrices and show that Theorem 4.2 remains true in this more general
context. In the remainder of this section, which can be omitted if desired,
we explore the geometric significance of the determinant of a 2 x 2 matrix.
In particular, we show the importance of the sign of the determinant in the
study of orientation.
The Area of a Parallelogram
By the angle between two vectors in R2, we mean the angle with measure
9 (0 < 9 < IT) that is formed by the vectors having the same magnitude and
direction as the given vectors but emanating from the origin. (See Figure 4.1.)
Figure 4.1: Angle between two vectors in R'
If 0 = {u,v} is an ordered basis for R2, we define the orientation of 0
to be the real number
O
del
del
(The denominator of this fraction is nonzero by Theorem 4.2.) Clearly
0 1*" I I.
v
Notice that
ei
O ( w = 1 and 0 1 ~l 1 -1.
e2/ V-C2.
' I
Recall that a coordinate system {u, v} is called right-handed if u can
be rotated in a counterclockwise direction through an angle 9 (0 < 9 < n)

Sec. 4.1 Determinants of Order 2 203
to coincide with v. Otherwise {u, v} is called a left-handed system. (See
Figure 4.2.) In general (see Exercise 12),
V
I
V
A right-handed coordinate system A left-handed coordinate system
Figure 4.2
O = 1
if and only if the ordered basis {u, v} forms a right-handed coordinate system.
For convenience, we also define
() = 1
if {u.v} is linearly dependent.
Any ordered set {u, v} in R2 determines a parallelogram in the following
manner. Regarding u and v as arrows emanating from the origin of R2, we
call the parallelogram having u and v as adjacent sides the parallelogram
determined by u and v. (See Figure 4.3.) Observe that if the set {//, v)
A
V
A
^M
'
x
Figure 4.3: Parallelograms determined by u and v
is linearly dependent (i.e.. if u and r are parallel), then the "parallelogram"
determined by u and v is actually a line segment, which we consider to be a
degenerate parallelogram having area zero.

204 Chap. 4 Determinants
There is an interesting relationship between
A
the area of the parallelogram determined by u and v, and
det
which we now investigate. Observe first, however, that since
det
may be negative, we cannot expect that
< -«**:
But we can prove that
A|W =0 "l-det
v \ v I W
u
from which it follows that
Our argument that
det
»>:-»:•-:
employs a technique that, although somewhat indirect, can be generalized to
Rn. First, since
0 [l) = H,
we may multiply both sides of the desired equation by
to obtain the equivalent form
°': •*:-*»:

Sec. 4.1 Determinants of Order 2 205
We establish this equation by verifying that the three conditions of Exercise 11
are satisfied bv the function
< =°: -A :
(a) We begin by showing that for any real number c
l")-e.*("
cv J \v
Observe that this equation is valid if c = 0 because
'™J = °lo -A o1
So assume that c ^ 0. Regarding cv as the base of the parallelogram deter­
mined by u and cv, we see that
A [ J = base x altitude = |c|(length of v)(altitude) = \c\ • A ( ) .
since the altitude h of the parallelogram determined by u and cv is the same
as that in the parallelogram determined by u and v. (See Figure 4.4.) Hence
Figure 4.4
"•: =°: -M;: c v^
|c|-A
-or: -A : )=c.sr
A similar argument shows that
Si0") ~e.t[u
v \v

206 Chap. 4 Determinants
We next prove that
1, )=»•*("
mi + bw J \w
for any u, w € R2 and any real numbers a and b. Because the parallelograms
determined by u and w and by u and u + w have a common base u and the
same altitude (see Figure 4.5), it follows that
If a = 0. then
Figure 4.5
= A
u + w
= 61 ")=b-of
au + bw J w I \w
by the first paragraph of (a). Otherwise, if a ^ 0, then
81 u ) =a-6 "i, ) =a-6 [ /,'
\au + bwj \ u +-ic I \-w
So the desired conclusion is obtained in either case.
We are now able to show that
s( '•' )=s(u) +4"
\Vl+V2j V'"/ V'-'
for all u,v\,V2 G R2. Since the result is immediate if u = 0, we assume that
u / 0. Choose any vector w € R2 such that {(/.//•} is linearly independent.
Then for any vectors V[,t'2 € R2 there exist scalars a, and b, such that
Vi = a,;W + bite (i = 1, 2). Thus
si ; =8
•vi +v2
II
[O] + 0,2)11 + (6i + b2
= (bl+b2)6
ir

Sec. 4.1 Determinants of Order 2
1, )+*( !* = *("
a\u + biwl \a2u-\-b2wl \V
A similar argument shows that
51U1+U2)=S(U1)+5(U2
for all ui,U2,v 6 R2-
(b) Since
A f£\ = 0, it follows that 6 C^\ = 0 f£\ • A (^) = 0
for any u G R2.
(c) Because the parallelogram determined by ei and e2 is the unit square,
'j '2 e2
Therefore 5 satisfies the three conditions of Exercise 11, and hence 5 = det.
So the area of the parallelogram determined by u and v equals
r.detr
v I \v
Thus we see, for example, that the area of the parallelogram determined
by u = (-1,5) and v = (4, -2) is
det det
4 -2
= 18.
EXERCISES
1. Label the following statements as true or false.
(a) The function det: W\2x2(F) —* F is a linear transformation.
(b) The determinant of a 2 x 2 matrix is a linear function of each row
of the matrix when the other row is held fixed.
(c) If A e M2x2{F) and det(,4) = 0, then A is invertible.
(d) If u and v are vectors in R2 emanating from the origin, then the
area of the parallelogram having u and v as adjacent sides is
det

208 Chap. 4 Determinants
(e) A coordinate system is right-handed if and only if its orientation
equals 1.
2. Compute the determinants of the following matrices in M2x2(-ft)-
'I 1) « (1 i) W (a -f
(a)
3. Compute the determinants of the following matrices in M2x2(C).
'-1 + i 1-4A ... / 5-2i 6 + 4A , v /2i 3
3 + 2* 2-3i] (b) (-3+ i 7* (C) I 4 6i
(a)
4. For each of the following pairs of vectors u and v in R , compute the
area of the parallelogram determined by u and v.
(a) u = (3, -2) and v = (2, 5)
(b) u = (1,3) andu = (-3,1)
(c) u = (4,-l)andv = (-6,-2)
(d) u = (3,4) and v = (2, -6)
5. Prove that if B is the matrix obtained by interchanging the rows of a
2x2 matrix A, then det(B) = - det(A).
6. Prove that if the two columns of A G M2x2(F) are identical, then
dct(A) = 0.
7. Prove that det(A') = det(.A) for any A 6 M2X2(-F)-
8. Prove that if A e M2X2(^) is upper triangular, then det(^l) equals the
product of the diagonal entries of A.
9. Prove that det{AB) = det(A) • det(B) for any A, Be M2X2{F).
10. The classical adjoint of a 2 x 2 matrix A G M2X2(-F) is the matrix
C =
Aoo -A
12
-A:
21

Prove that
(a) 04 = AC = [det(A)]7.
(b) det(C) = det(A).
(c) The classical adjoint of A1 is C*.
(d) If A is invertible, then A"1 = [det^^C.
11. Let 8: N\2x2{F) —» F be a function with the following three properties.
(i) 8 is a linear function of each row of the matrix when the other row
is held fixed.
(ii) If the two rows of A E M2x2(F) are identical, then 8(A) = 0.

Sec. 4.2 Determinants of Order n
(iii) li I is the 2x2 identity matrix, then 8(1) = 1.
Prove that 8(A) = det(A) for all A G M2X2(F). (This result is general­
ized in Section 4.5.)
12. Let {u.r} be an ordered basis for R2. Prove that
•0-
if and only if {u, v} forms a right-handed coordinate system. Hint:
Recall the definition of a rotation given in Example 2 of Section 2.1.
4.2 DETERMINANTS OF ORDER 71
In this section, we extend the definition of the determinant to n x n matrices
for n > 3. For this definition, it is convenient to introduce the following
notation: Given A G MnXn(F), for n > 2, denote the (n - 1) x (n - 1) matrix
obtained from A by deleting row i and column j by Aij. Thus for
/. 2 3
A=\4 5 6 eM3x3(R),
\7 8 9/
we have
Au«
id for
5 6
8 9
B =
Aia =
I 5
7 8
and A-12 =
1 3
4 6
/ 1
-3
2
V-2
-1 2
4 1
-5 -3
6 -4
1
-1
8
1
G M4x4(tf),
we have
Bo* =
i -l -r
2-5 8 | and £4;
-2 6 1
Definitions. Let A G MnXn(F). 7f n = 1. so that A = (An), we define
det(A) — An- For n > 2, we define det(A) recursively as
n
det(A) = j^(-l)1+iAy • det(iy).

210 Chap. 4 Determinants
The scalar det (A) is called the determinant of A and is also denoted by\A\.
The scalar
(-l)i+j det(A{j)
is called the co factor of the entry of A in row i, column j.
Letting
Cij = (-l)^det(AV)
denote the cofactor of the row i, column j entry of A, we can express the
formula for the determinant of A as
det(A) = Audi + Auci2 + ••• + Ai„ci„.
Thus the determinant of A equals the sum of the products of each entry in row
1 of A multiplied by its cofactor. This formula is called cofactor expansion
along the first row of A. Note that, for 2 x 2 matrices, this definition of
the determinant of A agrees with the one given in Section 4.1 because
det(A) = Au(-l)l+1 det(An) + A12(-l)1+2det(A12) = AnA22 - A12A21.
Example 1
Let
1 3 -3
-3-5 2 € M3x3(i?).
-4 4 -6/
Using cofactor expansion along the first row of A, we obtain
det(A) = (-l)1+1An- det(An) + (-l)1+2Ai2- det(A12)
+ (-l)1+3A13-det(A13)
,2'- .<^AN , . I — 3 2
= (-l)2(l).det
4 -2)+<-ift3)-*tL2
-3 -5
+ (-l)4(-3)-detv_4 4
= 1 [-5(-6) - 2(4)] - 3 [-3(-6) - 2(-4)] - 3 [-3(4) - (-5)(-4)]
= 1(22) - 3(26) - 3(-32)
= 40. •

Sec. 4.2 Determinants of Order // 211
Example 2
Let
/ 0 1 3
B= -2 -3 -5 GM3x3(i?).
\ 4 -4 4/
Using cofactor expansion along the first row of B, we obtain
det(B) = (-l)1 + lBu • det(Bn) + (-l)1+2Bi2- det(BV2)
+ (-iy+3BV3.det(B13)
= (-l)2(0)-det(^ -^+(-l)3(l).det(-^ ~
+ (-l)4(3)-det
-2 -3
4 -4,
= 0-1 [-2(4) - (-5)(4)] + 3 [-2(-4) - (-3)(4)]
= 0- 1(12)+ 3(20)
= 48. •
Example 3
Let
C =
( 2 0 0 1
0 1 3-3
-2 -3 -5 2
\ 4 -4 4 -ey
GM4x4(fi).
Using cofactor expansion along the first row of C and the results of Examples 1
and 2, we obtain
det(C) = (-1)2(2). det(Cn) + (-1)3(0)- det(C12)
+ (-1)4(0)« det(C13) + (-1)5(1). det(C14)
/ 1 3 -3
= (-l)2(2)-det -3 -5 2+0 + 0
\-4 4 -6/
0 1
•f (-l)5(l).det I -2 -3 -5
4 -4
= 2(40)+ 0 + 0- 1(48)
= 32. •

212 Chap. 4 Determinants
Example 4
The determinant of the n x n identity matrix is 1. We prove this assertion by
mathematical induction on n. The result is clearly true for the lxl identity
matrix. Assume that the determinant of the (n — 1) x (n — 1) identity matrix
is 1 for some n > 2, and let I denote the nxn identity matrix. Using cofactor
expansion along the first row of J, we obtain
det(I) = (-1)2(1) • det(Jn) + (-1)3(0) • det(/12) + • • •
+ (-l)l+n(0).det(/ln)
= l(l) + 0 + --- + 0
= 1
because In is the (n - 1) x (n - 1) identity matrix. This shows that the
determinant of the nxn identity matrix is 1, and so the determinant of any
identity matrix is 1 by the principle of mathematical induction. •
As is illustrated in Example 3, the calculation of a determinant using
the recursive definition is extremely tedious, even for matrices as small as
4x4. Later in this section, we present a more efficient method for evaluating
determinants, but we must first learn more about them.
Recall from Theorem 4.1 (p. 200) that, although the determinant of a 2 x 2
matrix is not a linear transformation, it is a linear function of each row when
the other row is held fixed. We now show that a similar property is true for
determinants of any size.
Theorem 4.3. The determinant of an n x n matrix is a linear function
of each row when the remaining rows are held fixed. That is, for 1 < r < n,
we have
det
"I
or_i
u + kv
ar+i
= det
( «i
ar_ i
u
O-r+l
+ fcdet
( a,
ar_!
v
ar+i
whenever k is a scalar and u, v, and each a^ are row vectors in Fn.
Proof. The proof is by mathematical induction on n. The result is imme­
diate if n = 1. Assume that for some integer n > 2 the determinant of any
(n — l)x(n— 1) matrix is a linear function of each row when the remaining

Sec. 4.2 Determinants of Order n 213
rows are held fixed. Let A be an n x n matrix with rows a\, a2,..., an, respec­
tively, and suppose that for some r (1 < r < n), we have ar = u + kv for some
u, v G Fn and some scalar k. Let u = (&i, b2,..., bn) and v = (c\, c2, • • •, cn),
and let £ and C be the matrices obtained from A by replacing row r of A by
u and v, respectively. We must prove that det(A) = det(B) + fcdet(C). We
leave the proof of this fact to the reader for the case r = 1. For r > 1 and
1 ^ j < n-, the rows of Aij, JBIJ, and Cy are the same except for row r — 1.
Moreover, row r — 1 of Au is
(bi+kci,..., bj^i + kCj-i,bj+i + fccj+i,..., 6„ + fccn),
which is the sum of row r — 1 of B\j and fc times row r — 1 of Cij. Since Z?ij
and Cij are (n — 1) x (n — 1) matrices, we have
det(Aij) = det(Z?ij) + kdet(Cij)
by the induction hypothesis. Thus since A\j = Z?ij = C\j, we have
n
det(A) = ^2(-l)1+j Au • det(iij)
= y>i)'+'A-
det(fiy) +Ardet(Cij
- £(-l)1+iAii • dct(Bl3) + *2(-l)1+iAy * det(Cy)
= det(£) + fcdet(C).
This shows that the theorem is true for nxn matrices, and so the theorem
is true for all square matrices by mathematical induction.
Corollary. If A & Mnxn(F) has a row consisting entirely of zeros, then
det(A) = 0.
Proof. See Exercise 24.
The definition of a determinant requires that the determinant of a matrix
be evaluated by cofactor expansion along the first row. Our next theorem
shows that the determinant of a square matrix can be evaluated by cofactor
expansion along any row. Its proof requires the following technical result.
Lemma. Let B G Mnxn(F), where n > 2. If row i of B equals e^ for
some k (1 < k < n), then det(B) = (-l)i+k det(Bik).

214 Chap. 4 Determinants
Proof. The proof is by mathematical induction on n. The lemma is easily
proved for n = 2. Assume that for some integer n > 3, the lemma is true for
(n — 1) x (n — 1) matrices, and let B be an n x n matrix in which row i of B
equals e* for some /c (1 < k < n). The result follows immediately from the
definition of the determinant if i = 1. Suppose therefore that 1 < i < n. For
each j ^ A: (1 < j < n), let Cij denote the (n — 2) x (n — 2) matrix obtained
from B by deleting rows 1 and i and columns j and k. For each j, row i — 1
of Z?ij is the following vector in Fn_1:
efc_i ifj<fc
0 if j = A;
efc if j > k.
Hence by the induction hypothesis and the corollary to Theorem 4.3, we have
(-l)«-i)+(*-i)det(Cy if j <k
= I 0 if j = k
(-l)(*-i)+*det(Cy) )ij>k.
fV<
Therefore
n
det(B) = ^(-l)1+^lj-det(Bli;
j=i
= ^(-l)1+^ir det(By) + £(-l)1+^ • det(Bn;
j<fc j>fc
= ^(-l^ZV [(_i)(*-D+(*-i) det(C^)
3<k
+ £(-l)i+>By. [(-!)(*-!)+*det(^)
j>fc
= (-1
.i+fc ^(-l)1+^ird.et(Ci
j<fc
+ ^(_1)i+0-i)5lj.det(C.
j>fc
Because the expression inside the preceding bracket is the cofactor expan­
sion of Bik along the first row, it follows that
(let(£) = (-l)i+fcdet(Gifc)-
This shows that the lemma is true for n x n matrices, and so the lemma is
true for all square matrices by mathematical induction.

Sec. 4.2 Determinants of Order n 215
We are now able to prove that cofactor expansion along any row can be
used to evaluate the determinant of a square matrix.
Theorem 4.4. The determinant of a square matrix can be evaluated by
cofactor expansion along any row. That is, if A G Mnxn(F), then for any
integer i (I < i < n),
n
det(A) = ^(-l)J+M^.det(A^).
i-i
Proof. Cofactor expansion along the first row of A gives the determinant
of A by definition. So the result is true if i = 1. Fix i > 1. Row i of A can
be written as X)?=i ^ijej- For 1 < j < n, let Bj denote the matrix obtained
from A by replacing row i of A by ej. Then by Theorem 4.3 and the lemma,
we have
n n
det(A) =^Aij det(Bj) = ^(-1)*+JA;.,- • det(A^). I
3=1 3=1
Corollary. If A G MnXn(F) has two identical rows, then det(A) = 0.
Proof. The proof is by mathematical induction on n. We leave the proof
of the result to the reader in the case that n = 2. Assume that for some
integer n > 3, it is true for (n — 1) x (n — 1) matrices, and let rows r and
s of A G Mnxn(F) be identical for r ^ s. Because n > 3, we can choose an
integer i (1 < i < n) other than r and s. Now
det(A) = ^(-l)i+JAirdet(Ai
3=1
by Theorem 4.4. Since each Ay is an (n — 1) x (n — 1) matrix with two
identical rows, the induction hypothesis implies that each det(A^) = 0, and
hence det(A) = 0. This completes the proof for n x n matrices, and so the
lemma is true for all square matrices by mathematical induction.
It is possible to evaluate determinants more efficiently by combining co-
factor expansion with the use of elementary row operations. Before such a
process can be developed, we need to learn what happens to the determinant
of a matrix if we perform an elementary row operation on that matrix. The­
orem 4.3 provides this information for elementary row operations of type 2
(those in which a row is multiplied by a nonzero scalar). Next we turn our
attention to elementary row operations of type 1 (those in which two rows
are interchanged).

216 Chap. 4 Determinants
Theorem 4.5. If A G Mnxn(F) and B is a matrix obtained from A by
interchanging any two rows of A, then det(-B) = — det(A).
Proof. Let the rows of A G MnXn(F) be a\, a2,..., an, and let B be the
matrix obtained from A by interchanging rows r and s, where r < s. Thus
/ai
A =
a,
and B =
(ttl
as
ar
\anJ \anJ
Consider the matrix obtained from A by replacing rows r and s by ar + at
By the corollary to Theorem 4.4 and Theorem 4.3, we have
0 = det
( ai
ar + as
ar + a.
= det
( *i
ar + a4
/ a,
+ det
ar + a.
V an / \ On / \ an /
fai\ /ai\ /ai\ /aA
= det
ar
ar-
det
"/•
det
\an/ Van/
= 0 + det(A) + det(B) + 0.
Therefore det(B) = -det(A).
a.
ar
\an/
det
\an/
We now complete our investigation of how an elementary row operation
affects the determinant of a matrix by showing that elementary row operations
of type 3 do not change the determinant of a matrix.
Theorem 4.6. Let A G MnXn(F), and let B be a matrix obtained by
adding a multiple of one row of A to another row of A. Then det(B) = det(A).
L T-ZZ

Sec. 4.2 Determinants of Order 217
Proof. Suppose that B is the nxn matrix obtained from A by adding k
times row r to row s, where r ^ s. Let the rows of A be a\, a2,..., an, and
the rows of B be b\,b2,..., bn. Then bi = ai for i ^ s and bs = as + kar.
Let C be the matrix obtained from A by replacing row s with ar. Applying
Theorem 4.3 to row s of B, we obtain
det(B) = det(A) + Ardet(C) = det(A)
because det(C) = 0 by the corollary to Theorem 4.4.
In Theorem 4.2 (p. 201), we proved that a 2 x 2 matrix is invertible if
and only if its determinant is nonzero. As a consequence of Theorem 4.6, we
can prove half of the promised generalization of this result in the following
corollary. The converse is proved in the corollary to Theorem 4.7.
Corollary. If A £ Mnxn(F) has rank less than n, then det(A) = 0.
Proof. If the rank of A is less than n, then the rows ai, a2,..., an of A are
linearly dependent. By Exercise 14 of Section 1.5, some row of A, say, row r,
is a linear combination of the other rows. So there exist scalars Cj such that
ar = c\a\ H h Cr-icir-i + cr+iar+i -I h cnan.
Let B be the matrix obtained from A by adding — c? times row i to row r for
each i ^ r. Then row r of B consists entirely of zeros, and so det(B) = 0.
But by Theorem 4.6, det(B) = det(A). Hence det(A) =0. I
The following rules summarize the effect of an elementary row operation
on the determinant of a matrix A G MnXn(F).
(a) If B is a matrix obtained by interchanging any two rows of A, then
det(B) = -det(A).
(b) If B is a matrix obtained by multiplying a row of A by a nonzero scalar
k, then det(B) = fcdet(A).
(c) If B is a matrix obtained by adding a multiple of one row of A to another
row of A, then det(B) = det (A).
These facts can be used to simplify the evaluation of a determinant. Con­
sider, for instance, the matrix in Example 1:
-3
4 -6/
Adding 3 times row 1 of A to row 2 and 4 times row 1 to row 3, we obtain
M =

218 Chap. 4 Determinants
Since M was obtained by performing two type 3 elementary row operations
on A, we have det(A) = det(M). The cofactor expansion of M along the first
row gives
det(M) = (-1)1+1(1). det(Mn) + (-l)1+2(3)* det(M12)
+ (-l)1+3(-3)*det(M13).
Both Mi2 and M\3 have a column consisting entirely of zeros, and so
det(Mi2) = det(Mi3) = 0 by the corollary to Theorem 4.6. Hence
det(M) = (-1)1+1(1). det(Mn)
= (-1)^(1). det (/6 _~l
= l[4(-18)-(-7)(16)]=40.
Thus with the use of two elementary row operations of type 3, we have reduced
the computation of det (A) to the evaluation of one determinant of a 2 x 2
matrix.
But we can do even better. If we add —4 times row 2 of M to row 3
(another elementary row operation of type 3), we obtain
P =
Evaluating det(P) by cofactor expansion along the first row, we have
det(P) = (-l)1+1(l).det(Pn)
= (-l)1+1(l)-det^ ~£) =1*4.10 = 40,
as described earlier. Since det(A) = det(M) = det(P), it follows that
det(A) = 40.
The preceding calculation of det(P) illustrates an important general fact.
The determinant of an upper triangular matrix is the product of its diagonal
entries. (See Exercise 23.) By using elementary row operations of types 1
and 3 only, we can transform any square matrix into an upper triangular
matrix, and so we can easily evaluate the determinant of any square matrix.
The next two examples illustrate this technique.
Example 5
To evaluate the determinant of the matrix

Sec. 4.2 Determinants of Order n 219
in Example 2, we must begin with a row interchange. Interchanging rows 1
and 2 of B produces
C =
By means of a sequence of elementary row operations of type 3, we can
transform C into an upper triangular matrix:
-2
0
4
-3
1
-4
-5
3
4
Thus det(C) = -2-1-24 = -48. Since C was obtained from B by an inter­
change of rows, it follows that
det(B) = - det(C) = 48. •
Example 6
The technique in Example 5 can be used to evaluate the determinant of the
matrix
C =
2
0
-2
4
0
1
-3
-4
0
3
-5
4
A
-3
2
-v
in Example 3. This matrix can be transformed into an upper triangular
matrix by means of the following sequence of elementary row operations of
type 3:
/ 2 0
0 1
-2 -3
\ 4 -4
T hus det(C)
0
3
-5
4
= 2-
A
-3
2
-6)
1.4.
(2
0
0
/2
0
0

4 = 32
0
1
-3
-4
0
1
0
0

0
3
-5
4
0
3
4
0
A
-3
3 *•
f2
0
0
-8/ \0
A
-3
-6
V
0 0
1 3
0 4
0 16
A
-3
-6
-20^
Using elementary row operations to evaluate the determinant of a matrix,
as illustrated in Example 6, is far more efficient than using cofactor expansion.
Consider first the evaluation of a 2 x 2 matrix. Since
det I , 1 = ad — be,

220 Chap. 4 Determinants
the evaluation of the determinant of a 2 x 2 matrix requires 2 multiplications
(and 1 subtraction). For n > 3, evaluating the determinant of an n x n matrix
by cofactor expansion along any row expresses the determinant as a sum of n
products involving determinants of (n— 1) x (n— 1) matrices. Thus in all. the
evaluation of the determinant of an n x n matrix by cofactor expansion along
any row requires over n! multiplications, whereas evaluating the determinant
of an n x n matrix by elementary row operations as in Examples 5 and 6
can be shown to require only (n3 + 2n — 3)/3 multiplications. To evaluate
the determinant of a 20 x 20 matrix, which is not large by present standards,
cofactor expansion along a row requires over 20! ~ 2.4 x 1018 multiplica­
tions. Thus it would take a computer performing one billion multiplications
per second over 77 years to evaluate the determinant of a 20 x 20 matrix by
this method. By contrast, the method using elementary row operations re­
quires only 2679 multiplications for this calculation and would take the same
computer less than three-millionths of a second! It is easy to see why most
computer programs for evaluating the determinant of an arbitrary matrix do
not use cofactor expansion.
In this section, we have defined the determinant of a square matrix in
terms of cofactor expansion along the first row. We then showed that the
determinant of a square matrix can be evaluated using cofactor expansion
along any row. In addition, we showed that the determinant possesses a
number of special properties, including properties that enable us to calculate
det(B) from det (A) whenever B is a matrix obtained from A by means of an
elementary row operation. These properties enable us to evaluate determi­
nants much more efficiently. In the next section, we continue this approach
to discover additional properties of determinants.
EXERCISES
1. Label the following statements as true or false.
(a) The function det: MnXn(F) —> F is a linear transformation.
(b) The determinant of a square matrix can be evaluated by cofactor
expansion along any row.
(c) If two rows of a square matrix A are identical, then det(A) = 0.
(d) If B is a matrix obtained from a square matrix A by interchanging
any two rows, then det(B) = —det(A).
(e) If B is a matrix obtained from a square matrix A by multiplying
a row of A by a scalar, then det(B) = det(A).
(f) If B is a matrix obtained from a square matrix A by adding k
times row i to row j, then det(B) = kdct(A).
(g) If A G Mnx„(F) has rank n, then det(A) = 0.
(h) The determinant of an upper triangular matrix equals the product
of its diagonal entries.

Sec. 4.2 Determinants of Order n 221
2. Find the value of k that satisfies the following equation:
/3a i 3a2 3a3\ fa\ a2 a3
det I 36i 3o2 363 = kdct I &i b2 b3 .
\3ci 3c2 3c3/ \d e2 c3)
3. Find the value of k that satisfies the following equation:
/ 2a i 2a2 2a3 \ /oj a2 a3
det 36i + 5d 362 + 5c2 363 + 5c3 = fcdct [ 6X b2 b3 .
\ 7cj 7c2 7c3 / \c\ c2 c3/
4. Find the value of k that satisfies the following equation:
/bi+ci b2 + c2 b3 + c3\ /oj a2 a3'
det j ai + ci a2 + c2 a3 + c3 j = /edet I 6i 62 63
\a, +61 a2+62 a3 +63/ \d c2 c3/
In Exercises 5-12, evaluate the determinant of the given matrix by cofactor
expansion along the indicated row.
5.
7.
!).
.; : J
V 2 3 0/
along the first row
0 I 2N
-1 0 -3
2 3 0,
along the second row
0 1+i 2
-2i 0 1 r«
3 4i 0 y
along the third row
(i.
8.
10.
1 0 2
0 1 5
-1 3 0/
along the first row
1 0 2
0 1 5
^-1 3 0/
along the third row
(% 2+i 0
-1 3 2i
\ 0 -1 1 -;/
along the second row
/ 0 2 1 3
1 0-2 2
3-101
v-l 1 2 Oy
along the fourth row
12.
/ 1 -1 2 -A
-3 4 1-1
2-5-3 8
^-2 6 -4 I,
along the fourth row
11.
In Exercises 13-22, evaluate the determinant of the given matrix by any le­
gitimate method.

222 Chap. 4 Determinants
21.
23. Prove that the determinant of an upper triangular matrix is the product
of its diagonal entries.
24. Prove the corollary to Theorem 4.3.
25. Prove that det(ArA) = kn det(A) for any A G Mnxn(F).
26. Let A G Mnxn(F). Under what conditions is det(-A) = det(A)?
27. Prove that if A G MnXn(F) has two identical columns, then det(A) = 0.
28. Compute det(Fj) if F,; is an elementary matrix of type i.
29.' Prove that if E is an elementary matrix, then det(F') = det(F).
30. Let the rows of A G Mnxn(F) be a\, a2,..., an, and let B be the matrix
in which the rows are an,an_i,... ,aj. Calculate det(Z?) in terms of
det(A).
4.3 PROPERTIES OF DETERMINANTS
In Theorem 3.1, we saw that performing an elementary row operation on
a matrix can be accomplished by multiplying the matrix by an elementary
matrix. This result is very useful in studying the effects on the determinant of
applying a sequence of elementary row operations. Because the determinant


Sec. 4.3 Properties of Determinants 223
of the nxn identity matrix is 1 (see Example 4 in Section 4.2), we can interpret
the statements on page 217 as the following facts about the determinants of
elementary matrices.
(a) If E is an elementary matrix obtained by interchanging any two rows
of/, thendet(F) = -1.
(b) If E is an elementary matrix obtained by multiplying some row of / by
the nonzero scalar k, then det(F) = k.
(c) If E is an elementary matrix obtained by adding a multiple of some row
of J to another row, then det(F) = 1.
We now apply these facts about determinants of elementary matrices to
prove that the determinant is a multiplicative function.
Theorem 4.7. For any A, B G ,,(F), det(AP) = det(A)- det(P).
Proof. We begin by establishing the result when A is an elementary matrix.
If A is an elementary matrix obtained by interchanging two rows of /, then
det(A) = -1. But by Theorem 3.1 (p. 149), AB is a matrix obtained by
interchanging two rows of B. Hence by Theorem 4.5 (p. 216), det(AP) =
— dct(B) = det(A)- det(B). Similar arguments establish the result when A
is an elementary matrix of type 2 or type 3. (See Exercise 18.)
If A is an nxn matrix with rank less than n, then det(A) = 0 by the
corollary to Theorem 4.6 (p. 216). Since rank(AP>) < rank(A) < n by Theo­
rem 3.7 (p. 159), we have det(AB) = 0. Thus det(AB) = det(A)- det(P) in
this case.
On the other hand, if A has rank n, then A is invertible and hence is
the product of elementary matrices (Corollary 3 to Theorem 3.6 p. 159), say,
A = Em • • • E2Ei. The first paragraph of this proof shows that
det(AB) = det(Fr
= det(Fr
•E2ElB)
det(Fm_!---F2Fi£)
= det(Fm) det(F2) • det(Fi) • det(P)
= det(Fm---F2Fi)-det(P)
= det(A)-det(P). I
Corollary. A matrix A G Mnxn(F) is invertible if and only if det(A) 7^ 0.
Furthermore, if A is invertible, then det(A_1) = -—7—-.
v ; det(A)
Proof. If A G MnXn(F) is not invertible, then the rank of A is less than n.
So det(A) = 0 by the corollary to Theorem 4.6 (p, 217). On the other hand,
if A G MnXTl(F) is invertible, then
det(A). det(A_1) = det(AA_1) = det(/) = 1

224 Chap. 4 Determinants
by Theorem 4.7. Hence det(A) ^ 0 and det(A~1) =
det(A)
In our discussion of determinants until now, we have used only the rows
of a matrix. For example, the recursive definition of a determinant involved
cofactor expansion along a row, and the more efficient method developed in
Section 4.2 used elementary row operations. Our next result shows that the
determinants of A and A1 are always equal. Since the rows of A are the
columns of A6, this fact enables us to translate any statement about determi­
nants that involves the rows of a matrix into a corresponding statement that
involves its columns.
Theorem 4.8. For any A G Mnxn(F), det(A') = det(A).
Proof. If A is not invertible, then rank(A) < n. But rank(A') = rank(A)
by Corollary 2 to Theorem 3.G (p. 158), and so A1 is not invertible. Thus
det(A') = 0 = det(A) in this case.
On the other hand, if A is invertible, then A is a product of elementary
matrices, say A = Em- • • E2E\. Since det(F,) = det(F|) for every i by
Exercise 29 of Section 4.2, by Theorem 4.7 we have
det(A') = det(E\El2 ...fit )
= det(Fi).det(F2)
= det(F0-
= det(Fm).
= det(Fm • •
= det(A).
det(F2)----
• • • • det(F2)
•F2F0
det(F^)
• <let(Fm)
•det(Fi)
Thus, in either case, det(A') = det(A). I
Among the many consequences of Theorem 4.8 are that determinants can
be evaluated by cofactor expansion along a column, and that elementary col­
umn operations can be used as well as elementary row operations in evaluating
a determinant. (The effect on the determinant of performing an elementary
column operation is the same as the effect of performing the corresponding
elementary row operation.) We conclude our discussion of determinant prop­
erties with a well-known result that relates determinants to the solutions of
certain types of systems of linear equations.
Theorem 4.9 (Cramer's Rule). Let Ax = b be the matrix form of
a system of n linear equations in n unknowns, where x = (x\,x2,... ,xny.
If det(A) ^ 0, then tin's system has a unique solution, and for each k (k =
l,2,...,n),
_ det(Mfc)
Xk " det(A) '

Sec. 4.3 Properties of Determinants 225
where Mk is the nxn matrix obtained from A by replacing column k of A
byb.
Proof. If det (A) ^ 0, then the system Ax = b has a unique solution by
the corollary to Theorem 4.7 and Theorem 3.10 (p. 174). For each integer k
(1 < k < n), let ak denote the kth column of A and Xk denote the matrix
obtained from the nxn identity matrix by replacing column k by x. Then
by Theorem 2.13 (p. 90), AXk is the nxn matrix whose zth column is
Aei = ai if i ^ k and Ax = b if i = k.
Thus AXk = Mk- Evaluating Xk by cofactor expansion along row k produces
det(Xfc) = xk-det(I„-i) = xk-
Hence by Theorem 4.7,
det(Mfc) = det(AXk) = det(A)- det(Xfc) = det(A)-xk-
Therefore
xfc = [det(A)]-1-det(Mfc). I
Example 1
We illustrate Theorem 4.9 by using Cramer's rule to solve the following system
of linear equations:
xi + 2x2 + 3x3 = 2
x\ + x3 = 3
x\ + x2 — x3 = 1.
The matrix form of this system of linear equations is Ax = b, where
'1
ll
Because det (A) = 6^0, Cramer's rule applies. Using the notation of Theo­
rem 4.9, we have
Xi =
det (Mi)
det (A)
'2 2 3>
13 0 1
1 1 -1
det(A)
15
6
x2 =
det(M2)
det(A)
I 1
2 3s
3 1
1 -1 -6
det(A)
= -1,

226
and
Chap. 4 Determinants
det
^3 =
det(M3)
det(A) det(A) 6 2
Thus the unique solution to the given system of linear equations is
(xx,x2,x3) = ( 2'-1'2
In applications involving systems of linear equations, we sometimes need
to know that there is a solution in which the unknowns are integers. In this
situation, Cramer's rule can be useful because it implies that a system of linear
equations with integral coefficients has an integral solution if the determinant
of its coefficient matrix is ±1. On the other hand, Cramer's rule is not useful
for computation because it requires evaluating n + 1 determinants ofnxn
matrices to solve a system of n linear equations in n unknowns. The amount
of computation to do this is far greater than that required to solve the system
by the method of Gaussian elimination, which was discussed in Section 3.4.
Thus Cramer's rule is primarily of theoretical and aesthetic interest, rather
than of computational value.
As in Section 4.1, it is possible to interpret the determinant of a matrix
A G Mnxn(P) geometrically. If the rows of A are ai,a2,... ,an, respectively,
then |det(A)| is the n-dimensional volume (the generalization of area in
R2 and volume in R3) of the parallelepiped having the vectors ai,a2,... ,a„
as adjacent sides. (For a proof of a more generalized result, see Jerrold
E. Marsden and Michael J. Hoffman, Elementary Classical Analysis, W.H.
Freeman and Company, New York, 1993, p. 524.)
Example 2
The volume of the parallelepiped having the vectors O] = (1,-2,1), a2 =
(1,0, —1), and a3 = (1,1,1) as adjacent sides is
= 6.
Note that the object in question is a rectangular parallelepiped (see Fig­
ure 4.6) with sides of lengths \/6. \/2, and y/3. Hence by the familiar formula
for volume, its volume should be \/6*\/2*\/3 = 6, as the determinant calcu­
lation shows. •
In our earlier discussion of the geometric significance of the determinant
formed from the vectors in an ordered basis for R2, we also saw that this

Sec. 4.3 Properties of Determinants 227
(1,-2,1)
(1,0,-1)
Figure 4.6: Parallelepiped determined by three vectors in R3.
determinant is positive if and only if the basis induces a right-handed coor­
dinate system. A similar statement is true in R". Specifically, if 7 is any
ordered basis for Rn and (3 is the standard ordered basis for Rn, then 7 in­
duces a right-handed coordinate system if and only if det(Q) > 0, where Q is
the change of coordinate matrix changing 7-coordinates into /^-coordinates.
Thus, for instance,
7 =
induces a left-handed coordinate system in R3 because
whereas
7 =
induces a right-handed coordinate system in R3 because
r "2 °
det 2 1 0 = 5 > 0.
\0 0 1/

228 Chap. 4 Determinants
More generally, if (3 and 7 are two ordered bases for Rn, then the coordinate
systems induced by (3 and 7 have the same orientation (either both are
right-handed or both are left-handed) if and only if det(Q) > 0, where Q is
the change of coordinate matrix changing 7-coordinates into /^-coordinates.
EXERCISES
1. Label the following statements as true or false.
(a) If E is an elementary matrix, then det(F) = ±1.
(b) For any A,B G MnXn(F), det(AB) = det(A) • det(B).
(c) A matrix M G Mnxn(F) is invertible if and only if det(M) = 0.
(d) A matrix M G MnXn(F) has rank n if and only if det(M) ^ 0.
(e) For any A G Mnxn(F), det (A*) = - det(A).
(f) The determinant of a square matrix can be evaluated by cofactor
expansion along any column.
(g) Every system of n linear equations in n unknowns can be solved
by Cramer's rule.
(h) Let Ax = b be the matrix form of a system of n linear equations
in n unknowns, where x = (x\, x2,..., xn)f. If det(A) 7^ 0 and if
Mk is the nxn matrix obtained from A by replacing row k of A
by bl, then the unique solution of Ax = b is
_ det(Mfc) iQvk_l2
Xk~ det(A) tar*-*M.-.n.
In Exercises 2-7, use Cramer's rule to solve the given system of linear equa­
tions.
a-nxi + d\2X2 = b
2. a2\X\ + a22x2 = b2
where ana22 - ai2a2i ^ 0
2xi + x2 — 3x3 = 1
4. x\ - 2x2 + x3= 0
3xi + 4x2 — 2x3 = —5
x\ — x2 + Ax3 = —2
6. -8x1 + 3x2 + x3 = 0
2xi - x2 + x3 = 6
2xi + x2 - 3x3 = 5
3. xi - 2x2 + x3 = 10
3x! + 4x2 - 2x3 = 0
xi - x2 + 4x3 = -4
5. -8x1 + 3x2 + x3 = 8
2xi — x2 + x3 = 0
3xi + x2 + x3 = 4
7. -2xi - x2 =12
xi + 2x2 + x3 = -8
8. Use Theorem 4.8 to prove a result analogous to Theorem 4.3 (p. 212),
but for columns.
9. Prove that an upper triangular nxn matrix is invertible if and only if
all its diagonal entries are nonzero.

Sec. 4.3 Properties of Determinants 229
10. A matrix M ∈ M_{n×n}(C) is called nilpotent if, for some positive integer
k, M^k = O, where O is the n × n zero matrix. Prove that if M is
nilpotent, then det(M) = 0.
11. A matrix M ∈ M_{n×n}(C) is called skew-symmetric if Mᵗ = −M.
Prove that if M is skew-symmetric and n is odd, then M is not invertible.
What happens if n is even?
12. A matrix Q ∈ M_{n×n}(R) is called orthogonal if QQᵗ = I. Prove that
if Q is orthogonal, then det(Q) = ±1.
13. For M ∈ M_{n×n}(C), let \(\bar{M}\) be the matrix such that \((\bar{M})_{ij} = \overline{M_{ij}}\) for all
i, j, where \(\overline{M_{ij}}\) is the complex conjugate of \(M_{ij}\).
(a) Prove that \(\det(\bar{M}) = \overline{\det(M)}\).
(b) A matrix Q ∈ M_{n×n}(C) is called unitary if QQ* = I, where
Q* = \(\bar{Q}^{t}\). Prove that if Q is a unitary matrix, then |det(Q)| = 1.
14. Let β = {u₁, u₂, ..., uₙ} be a subset of Fⁿ containing n distinct vectors,
and let B be the matrix in M_{n×n}(F) having u_j as column j. Prove that
β is a basis for Fⁿ if and only if det(B) ≠ 0.
15.* Prove that if A, B ∈ M_{n×n}(F) are similar, then det(A) = det(B).
16. Use determinants to prove that if A, B ∈ M_{n×n}(F) are such that AB = I,
then A is invertible (and hence B = A⁻¹).
17. Let A, B ∈ M_{n×n}(F) be such that AB = −BA. Prove that if n is odd
and F is not a field of characteristic two, then A or B is not invertible.
18. Complete the proof of Theorem 4.7 by showing that if A is an elementary
matrix of type 2 or type 3, then det(AB) = det(A)·det(B).
19. A matrix A ∈ M_{n×n}(F) is called lower triangular if A_{ij} = 0 for
1 ≤ i < j ≤ n. Suppose that A is a lower triangular matrix. Describe
det(A) in terms of the entries of A.
20. Suppose that M ∈ M_{n×n}(F) can be written in the form
\[
M = \begin{pmatrix} A & B\\ O & I\end{pmatrix},
\]
where A is a square matrix. Prove that det(M) = det(A).
21.⁺ Prove that if M ∈ M_{n×n}(F) can be written in the form
\[
M = \begin{pmatrix} A & B\\ O & C\end{pmatrix},
\]
where A and C are square matrices, then det(M) = det(A)·det(C).

22. Let T: P_n(F) → F^{n+1} be the linear transformation defined in Exercise 22
of Section 2.4 by T(f) = (f(c₀), f(c₁), ..., f(cₙ)), where
c₀, c₁, ..., cₙ are distinct scalars in an infinite field F. Let β be the
standard ordered basis for P_n(F) and γ be the standard ordered basis
for F^{n+1}.
(a) Show that M = [T]_β^γ has the form
\[
M = \begin{pmatrix} 1 & c_0 & c_0^2 & \cdots & c_0^n\\ 1 & c_1 & c_1^2 & \cdots & c_1^n\\ \vdots & \vdots & \vdots & & \vdots\\ 1 & c_n & c_n^2 & \cdots & c_n^n\end{pmatrix}.
\]
A matrix with this form is called a Vandermonde matrix.
(b) Use Exercise 22 of Section 2.4 to prove that det(M) ≠ 0.
(c) Prove that
\[
\det(M) = \prod_{0 \le i < j \le n} (c_j - c_i),
\]
the product of all terms of the form c_j − c_i for 0 ≤ i < j ≤ n.
23. Let A ∈ M_{n×n}(F) be nonzero. For any m (1 ≤ m ≤ n), an m × m
submatrix is obtained by deleting any n − m rows and any n − m
columns of A.
(a) Let k (1 ≤ k ≤ n) denote the largest integer such that some k × k
submatrix has a nonzero determinant. Prove that rank(A) = k.
(b) Conversely, suppose that rank(A) = k. Prove that there exists a
k × k submatrix with a nonzero determinant.
24. Let A ∈ M_{n×n}(F) have the form
\[
A = \begin{pmatrix} 0 & 0 & \cdots & 0 & a_0\\ -1 & 0 & \cdots & 0 & a_1\\ 0 & -1 & \cdots & 0 & a_2\\ \vdots & & & & \vdots\\ 0 & 0 & \cdots & -1 & a_{n-1}\end{pmatrix}.
\]
Compute det(A + tI), where I is the n × n identity matrix.
25. Let c_{jk} denote the cofactor of the row j, column k entry of the matrix
A ∈ M_{n×n}(F).
(a) Prove that if B is the matrix obtained from A by replacing column
k by e_j, then det(B) = c_{jk}.
(b) Show that for 1 ≤ j ≤ n, we have
\[
A\begin{pmatrix} c_{j1}\\ c_{j2}\\ \vdots\\ c_{jn}\end{pmatrix} = \det(A)\, e_j.
\]
Hint: Apply Cramer's rule to Ax = e_j.
(c) Deduce that if C is the n × n matrix such that C_{ij} = c_{ji}, then
AC = [det(A)] I.
(d) Show that if det(A) ≠ 0, then A⁻¹ = [det(A)]⁻¹ C.
The following definition is used in Exercises 26 and 27.
Definition. The classical adjoint of a square matrix A is the transpose
of the matrix whose ij-entry is the ij-cofactor of A.
26. Find the classical adjoint of each of the following matrices.
'4 0 0>
(b) ( 0 4 0
0 0 4,
27. Let C be the classical adjoint of A ∈ M_{n×n}(F). Prove the following
statements.
(a) det(C) = [det(A)]^{n−1}.
(b) Cᵗ is the classical adjoint of Aᵗ.
(c) If A is an invertible upper triangular matrix, then C and A⁻¹ are
both upper triangular matrices.
28. Let y₁, y₂, ..., yₙ be linearly independent functions in C^∞. For each
y ∈ C^∞, define T(y) ∈ C^∞ by
\[
[\mathsf{T}(y)](t) = \det\begin{pmatrix}
y(t) & y_1(t) & y_2(t) & \cdots & y_n(t)\\
y'(t) & y_1'(t) & y_2'(t) & \cdots & y_n'(t)\\
\vdots & \vdots & \vdots & & \vdots\\
y^{(n)}(t) & y_1^{(n)}(t) & y_2^{(n)}(t) & \cdots & y_n^{(n)}(t)
\end{pmatrix}.
\]
The preceding determinant is called the Wronskian of y, y₁, ..., yₙ.
(a) Prove that T: C^∞ → C^∞ is a linear transformation.
(b) Prove that N(T) contains span({y₁, y₂, ..., yₙ}).
4.4 SUMMARY—IMPORTANT FACTS ABOUT DETERMINANTS
In this section, we summarize the important properties of the determinant
needed for the remainder of the text. The results contained in this section
have been derived in Sections 4.2 and 4.3; consequently, the facts presented
here are stated without proofs.
The determinant of an n x n matrix A having entries from a field F is a
scalar in F, denoted by det(A) or |A|, and can be computed in the following
manner:
1. If A is 1 × 1, then det(A) = A₁₁, the single entry of A.
2. If A is 2 × 2, then det(A) = A₁₁A₂₂ − A₁₂A₂₁. For example,
\[
\det\begin{pmatrix} -1 & 2\\ 5 & 3\end{pmatrix} = (-1)(3) - (2)(5) = -13.
\]
3. If A is n × n for n > 2, then
\[
\det(A) = \sum_{j=1}^{n} (-1)^{i+j} A_{ij}\cdot\det(\tilde{A}_{ij})
\]
(if the determinant is evaluated by the entries of row i of A) or
\[
\det(A) = \sum_{i=1}^{n} (-1)^{i+j} A_{ij}\cdot\det(\tilde{A}_{ij})
\]
(if the determinant is evaluated by the entries of column j of A), where
\(\tilde{A}_{ij}\) is the (n−1) × (n−1) matrix obtained by deleting row i and column
j from A.
In the formulas above, the scalar \((-1)^{i+j}\det(\tilde{A}_{ij})\) is called the cofactor
of the row i column j entry of A. In this language, the determinant of A is
evaluated as the sum of terms obtained by multiplying each entry of some
row or column of A by the cofactor of that entry. Thus det(A) is expressed
in terms of n determinants of (n − 1) × (n − 1) matrices. These determinants
are then evaluated in terms of determinants of (n − 2) × (n − 2) matrices, and
so forth, until 2 × 2 matrices are obtained. The determinants of the 2 × 2
matrices are then evaluated as in item 2.
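As a supplement to the description above, here is a short NumPy sketch (not from the text; the function name det_cofactor is our own) that evaluates a determinant by cofactor expansion along the first row, exactly as in item 3. It is meant to illustrate the recursion, not to be an efficient method.

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row (illustrative only)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    if n == 2:
        return A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
    total = 0.0
    for j in range(n):
        # delete row 0 and column j to obtain the submatrix A~_{1j}
        sub = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_cofactor(sub)
    return total

A = [[2, 1, 1, 5], [1, 1, -4, -1], [2, 0, -3, 1], [3, 6, 1, 2]]
print(det_cofactor(A))    # 102.0, matching the worked example below
print(np.linalg.det(A))   # ~102.0, for comparison
```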

Let us consider two examples of this technique in evaluating the determinant
of the 4 × 4 matrix
\[
A = \begin{pmatrix} 2 & 1 & 1 & 5\\ 1 & 1 & -4 & -1\\ 2 & 0 & -3 & 1\\ 3 & 6 & 1 & 2\end{pmatrix}.
\]
To evaluate the determinant of A by expanding along the fourth row, we
must know the cofactors of each entry of that row. The cofactor of A₄₁ = 3
is (−1)^{4+1} det(B), where
\[
B = \begin{pmatrix} 1 & 1 & 5\\ 1 & -4 & -1\\ 0 & -3 & 1\end{pmatrix}.
\]
Let us evaluate this determinant by expanding along the first column. We
have
\[
\det(B) = (-1)^{1+1}(1)\det\begin{pmatrix} -4 & -1\\ -3 & 1\end{pmatrix} + (-1)^{2+1}(1)\det\begin{pmatrix} 1 & 5\\ -3 & 1\end{pmatrix} + (-1)^{3+1}(0)\det\begin{pmatrix} 1 & 5\\ -4 & -1\end{pmatrix}
\]
\[
= 1(1)[(-4)(1) - (-1)(-3)] + (-1)(1)[(1)(1) - (5)(-3)] + 0 = -7 - 16 + 0 = -23.
\]
Thus the cofactor of A₄₁ is (−1)⁵(−23) = 23. Similarly, the cofactors of A₄₂,
A₄₃, and A₄₄ are 8, 11, and −13, respectively. We can now evaluate the
determinant of A by multiplying each entry of the fourth row by its cofactor;
this gives
\[
\det(A) = 3(23) + 6(8) + 1(11) + 2(-13) = 102.
\]
For the sake of comparison, let us also compute the determinant of A
by expansion along the second column. The reader should verify that the
cofactors of A₁₂, A₂₂, and A₄₂ are 14, 40, and 8, respectively. Thus
\[
\det(A) = (-1)^{1+2}(1)\det\begin{pmatrix} 1 & -4 & -1\\ 2 & -3 & 1\\ 3 & 1 & 2\end{pmatrix} + (-1)^{2+2}(1)\det\begin{pmatrix} 2 & 1 & 5\\ 2 & -3 & 1\\ 3 & 1 & 2\end{pmatrix}
\]
\[
+ (-1)^{3+2}(0)\det\begin{pmatrix} 2 & 1 & 5\\ 1 & -4 & -1\\ 3 & 1 & 2\end{pmatrix} + (-1)^{4+2}(6)\det\begin{pmatrix} 2 & 1 & 5\\ 1 & -4 & -1\\ 2 & -3 & 1\end{pmatrix}
= 14 + 40 + 0 + 48 = 102.
\]

Of course, the fact that the value 102 is obtained again is no surprise since the
value of the determinant of A is independent of the choice of row or column
used in the expansion.
Observe that the computation of det(A) is easier when expanded along
the second column than when expanded along the fourth row. The difference
is the presence of a zero in the second column, which makes it unnecessary
to evaluate one of the cofactors (the cofactor of A₃₂). For this reason, it is
beneficial to evaluate the determinant of a matrix by expanding along a row or
column of the matrix that contains the largest number of zero entries. In fact,
it is often helpful to introduce zeros into the matrix by means of elementary
row operations before computing the determinant. This technique utilizes
the first three properties of the determinant.
Properties of the Determinant
1. If B is a matrix obtained by interchanging any two rows or interchanging
any two columns of an n × n matrix A, then det(B) = −det(A).
2. If B is a matrix obtained by multiplying each entry of some row or
column of an n × n matrix A by a scalar k, then det(B) = k·det(A).
3. If B is a matrix obtained from an n × n matrix A by adding a multiple
of row i to row j or a multiple of column i to column j for i ≠ j, then
det(B) = det(A).
As an example of the use of these three properties in evaluating determinants,
let us compute the determinant of the 4 × 4 matrix A considered
previously. Our procedure is to introduce zeros into the second column of
A by employing property 3, and then to expand along that column. (The
elementary row operations used here consist of adding multiples of row 1 to
rows 2 and 4.) This procedure yields
\[
\det(A) = \det\begin{pmatrix} 2 & 1 & 1 & 5\\ 1 & 1 & -4 & -1\\ 2 & 0 & -3 & 1\\ 3 & 6 & 1 & 2\end{pmatrix}
= \det\begin{pmatrix} 2 & 1 & 1 & 5\\ -1 & 0 & -5 & -6\\ 2 & 0 & -3 & 1\\ -9 & 0 & -5 & -28\end{pmatrix}
= 1(-1)^{1+2}\det\begin{pmatrix} -1 & -5 & -6\\ 2 & -3 & 1\\ -9 & -5 & -28\end{pmatrix}.
\]
The resulting determinant of a 3 × 3 matrix can be evaluated in the same
manner: Use type 3 elementary row operations to introduce two zeros into
the first column, and then expand along that column. This results in the
value −102. Therefore
\[
\det(A) = 1(-1)^{1+2}(-102) = 102.
\]

The reader should compare this calculation of det(A) with the preceding
ones to see how much less work is required when properties 1, 2, and 3 are
employed.
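To make the comparison concrete, the following sketch (ours, not the text's; it uses NumPy, and the function name det_by_elimination is an assumption) evaluates the same determinant by reducing to an upper triangular matrix with properties 1 and 3 and then multiplying the diagonal entries, as described by property 4 below.

```python
import numpy as np

def det_by_elimination(A):
    """Evaluate det(A) by Gaussian elimination: a rough illustrative sketch."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    sign = 1.0
    for k in range(n):
        # interchange rows if necessary so the pivot is nonzero (property 1)
        pivot = int(np.argmax(np.abs(U[k:, k]))) + k
        if np.isclose(U[pivot, k], 0):
            return 0.0
        if pivot != k:
            U[[k, pivot]] = U[[pivot, k]]
            sign = -sign
        # add multiples of row k to the rows below it (property 3)
        for i in range(k + 1, n):
            U[i] -= (U[i, k] / U[k, k]) * U[k]
    # the determinant of a triangular matrix is the product of its diagonal entries
    return sign * np.prod(np.diag(U))

A = [[2, 1, 1, 5], [1, 1, -4, -1], [2, 0, -3, 1], [3, 6, 1, 2]]
print(det_by_elimination(A))   # 102.0
```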
In the chapters that follow, we often have to evaluate the determinant of
matrices having special forms. The next two properties of the determinant
are useful in this regard:
4. The determinant of an upper triangular matrix is the product of its
diagonal entries. In particular, det(I) = 1.
5. If two rows (or columns) of a matrix are identical, then the determinant
of the matrix is zero.
As an illustration of property 4, notice that
\[
\det\begin{pmatrix} -3 & 1 & 2\\ 0 & 4 & 5\\ 0 & 0 & -6\end{pmatrix} = (-3)(4)(-6) = 72.
\]
Property 4 provides an efficient method for evaluating the determinant of a
matrix:
(a) Use Gaussian elimination and properties 1, 2, and 3 above to reduce the
matrix to an upper triangular matrix.
(b) Compute the product of the diagonal entries.
For instance,
\[
\det\begin{pmatrix} 1 & -1 & 2 & 1\\ 2 & -1 & -1 & 4\\ -4 & 5 & -10 & -6\\ 3 & -2 & 10 & -1\end{pmatrix}
= \det\begin{pmatrix} 1 & -1 & 2 & 1\\ 0 & 1 & -5 & 2\\ 0 & 1 & -2 & -2\\ 0 & 1 & 4 & -4\end{pmatrix}
= \det\begin{pmatrix} 1 & -1 & 2 & 1\\ 0 & 1 & -5 & 2\\ 0 & 0 & 3 & -4\\ 0 & 0 & 9 & -6\end{pmatrix}
\]
\[
= \det\begin{pmatrix} 1 & -1 & 2 & 1\\ 0 & 1 & -5 & 2\\ 0 & 0 & 3 & -4\\ 0 & 0 & 0 & 6\end{pmatrix}
= 1\cdot 1\cdot 3\cdot 6 = 18.
\]
The next three properties of the determinant are used frequently in later
chapters. Indeed, perhaps the most significant property of the determinant
is that it provides a simple characterization of invertible matrices. (See prop­
erty 7.)
6. For any n × n matrices A and B, det(AB) = det(A)·det(B).

7. An n × n matrix A is invertible if and only if det(A) ≠ 0. Furthermore,
if A is invertible, then det(A⁻¹) = 1/det(A).
8. For any n × n matrix A, the determinants of A and Aᵗ are equal.
For example, property 7 guarantees that the matrix A on page 233 is
invertible because det (A) = 102.
The final property, stated as Exercise 15 of Section 4.3, is used in Chap­
ter 5. It is a simple consequence of properties 6 and 7.
9. If A and B are similar matrices, then det(A) = det(B).
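These properties are easy to spot-check numerically. The snippet below is an illustration we added (it uses NumPy on a random integer matrix) and verifies properties 6, 7, and 8 for one sample.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 4)).astype(float)
B = rng.integers(-3, 4, size=(4, 4)).astype(float)

# Property 6: det(AB) = det(A) det(B)
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))

# Property 7: if det(A) != 0, then A is invertible and det(A^{-1}) = 1/det(A)
if not np.isclose(np.linalg.det(A), 0):
    print(np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A)))

# Property 8: det(A) = det(A^t)
print(np.isclose(np.linalg.det(A), np.linalg.det(A.T)))
```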
EXERCISES
1. Label the following statements as true or false.
(a) The determinant of a square matrix may be computed by expand­
ing the matrix along any row or column.
(b) In evaluating the determinant of a matrix, it is wise to expand
along a row or column containing the largest number of zero en­
tries.
(c) If two rows or columns of A are identical, then det(A) = 0.
(d) If B is a matrix obtained by interchanging two rows or two columns
of A, then det(B) = det(A).
(e) If B is a matrix obtained by multiplying each entry of some row
or column of A by a scalar, then det(B) = det(A).
(f) If B is a matrix obtained from A by adding a multiple of some row
to a different row, then det(B) = det(A).
(g) The determinant of an upper triangular n × n matrix is the product
of its diagonal entries.
(h) For every A ∈ M_{n×n}(F), det(Aᵗ) = −det(A).
(i) If A, B ∈ M_{n×n}(F), then det(AB) = det(A)·det(B).
(j) If Q is an invertible matrix, then det(Q⁻¹) = [det(Q)]⁻¹.
(k) A matrix Q is invertible if and only if det(Q) ≠ 0.
2. Evaluate the determinant of the following 2x2 matrices.
(a)
(c)
4 -5
2 3
2 + z -l + 3i
1 - 2i 3 - i
(b)
(d)
-1 7
3 8
3 ii
0/ li
3. Evaluate the determinant of the following matrices in the manner indi­
cated.

(a)
(c)
(e)
.: | J
\ 2 3 0/
along the first row
0 1 2s
-1 0 -3
2 3 0,
along the second column
/ 0 1 + i 2
-2/ 0 1 - i]
\ 3 Ai 0 /
along the third row
(b)
(d)
(f)
/ ()
(g)
2 1
1 0 -2
3 -1 0
V-l 1 2 0/
along the fourth column
:S
2
I (h)
along the first column
1 0 2N
0 1 o
-1 3 0,
along the third row
i 2+i 0
-1 3 2/
0 -1 l-i/
along the third column
/ 1 -1 2 -1
-3 4 1-1
2 -5 -3 8
\-2 6 -4 )
along the fourth row
4. Evaluate the determinant of the following matrices by any legitimate
method.
(h)
( 1 -2 3 -12
-5 12 -14 19
-9 22 -20 31
\-A 9 -14 15/
5. Suppose that M ∈ M_{n×n}(F) can be written in the form
\[
M = \begin{pmatrix} A & B\\ O & I\end{pmatrix},
\]
where A is a square matrix. Prove that det(M) = det(A).

6.* Prove that if M ∈ M_{n×n}(F) can be written in the form
\[
M = \begin{pmatrix} A & B\\ O & C\end{pmatrix},
\]
where A and C are square matrices, then det(M) = det(A)·det(C).
4.5* A CHARACTERIZATION OF THE DETERMINANT
In Sections 4.2 and 4.3, we showed that the determinant possesses a number of
properties. In this section, we show that three of these properties completely
characterize the determinant; that is, the only function δ: M_{n×n}(F) → F
having these three properties is the determinant. This characterization of
the determinant is the one used in Section 4.1 to establish the relationship
between \(\det\begin{pmatrix} u\\ v\end{pmatrix}\) and the area of the parallelogram determined by u and
v. The first of these properties that characterize the determinant is the one
described in Theorem 4.3 (p. 212).
Definition. A function δ: M_{n×n}(F) → F is called an n-linear function
if it is a linear function of each row of an n × n matrix when the remaining
n − 1 rows are held fixed; that is, δ is n-linear if, for every r = 1, 2, ..., n, we
have
\[
\delta\begin{pmatrix} a_1\\ \vdots\\ a_{r-1}\\ u + kv\\ a_{r+1}\\ \vdots\\ a_n\end{pmatrix}
= \delta\begin{pmatrix} a_1\\ \vdots\\ a_{r-1}\\ u\\ a_{r+1}\\ \vdots\\ a_n\end{pmatrix}
+ k\,\delta\begin{pmatrix} a_1\\ \vdots\\ a_{r-1}\\ v\\ a_{r+1}\\ \vdots\\ a_n\end{pmatrix}
\]
whenever k is a scalar and u, v, and each aᵢ are vectors in Fⁿ.
Example 1
The function 8: M„xn(F) -> F defined by 8(A) = 0 for each A G Mnxn(F)
is an n-linear function. •
Example 2
For 1 ≤ j ≤ n, define δ_j: M_{n×n}(F) → F by δ_j(A) = A_{1j}A_{2j}⋯A_{nj} for each
A ∈ M_{n×n}(F); that is, δ_j(A) equals the product of the entries of column j of
A. Let A ∈ M_{n×n}(F) have rows a₁, a₂, ..., aₙ, where aᵢ = (A_{i1}, A_{i2}, ..., A_{in}),
and let v = (b₁, b₂, ..., bₙ) ∈ Fⁿ.
Then each δ_j is an n-linear function because, for any scalar k, we have
\[
\delta_j\begin{pmatrix} a_1\\ \vdots\\ a_{r-1}\\ a_r + kv\\ a_{r+1}\\ \vdots\\ a_n\end{pmatrix}
= A_{1j}\cdots A_{(r-1)j}(A_{rj} + kb_j)A_{(r+1)j}\cdots A_{nj}
\]
\[
= A_{1j}\cdots A_{(r-1)j}A_{rj}A_{(r+1)j}\cdots A_{nj}
+ k\bigl(A_{1j}\cdots A_{(r-1)j}\,b_j\,A_{(r+1)j}\cdots A_{nj}\bigr)
= \delta_j\begin{pmatrix} a_1\\ \vdots\\ a_{r-1}\\ a_r\\ a_{r+1}\\ \vdots\\ a_n\end{pmatrix}
+ k\,\delta_j\begin{pmatrix} a_1\\ \vdots\\ a_{r-1}\\ v\\ a_{r+1}\\ \vdots\\ a_n\end{pmatrix}. \quad\bullet
\]
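The computation in Example 2 can also be checked numerically. The sketch below is ours (the helper name delta_j is an assumption, and NumPy is used); it tests the defining n-linearity identity in one row for the column-product function δ_j.

```python
import numpy as np

def delta_j(A, j):
    """delta_j(A): the product of the entries of column j of A (Example 2)."""
    return np.prod(np.asarray(A, dtype=float)[:, j])

# Spot-check n-linearity in row r:
#   delta_j(... a_r + k*v ...) = delta_j(... a_r ...) + k * delta_j(... v ...)
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
v = rng.standard_normal(4)
k, r, j = 2.5, 2, 1

left = delta_j(np.vstack([A[:r], [A[r] + k * v], A[r + 1:]]), j)
right = delta_j(A, j) + k * delta_j(np.vstack([A[:r], [v], A[r + 1:]]), j)
print(np.isclose(left, right))   # True: delta_j is linear in row r
```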
Example 3
The function δ: M_{n×n}(F) → F defined for each A ∈ M_{n×n}(F) by δ(A) =
A₁₁A₂₂⋯A_{nn} (i.e., δ(A) equals the product of the diagonal entries of A) is
an n-linear function. •
Example 4
The function δ: M_{n×n}(R) → R defined for each A ∈ M_{n×n}(R) by δ(A) =
tr(A) is not an n-linear function for n ≥ 2. For if I is the n × n identity
matrix and A is the matrix obtained by multiplying the first row of I by 2,
then δ(A) = n + 1 ≠ 2n = 2·δ(I). •
Theorem 4.3 (p. 212) asserts that the determinant is an n-linear function.
For our purposes this is the most important example of an n-linear function.
Now we introduce the second of the properties used in the characterization
of the determinant.
Definition. An n-linear function 8: MnXn(F) —> F is called alternating
if, for each A G MnXn(F), we have 8(A) = 0 whenever two adjacent rows of
A are identical.

Theorem 4.10. Let 8: MnXn(F) —> F be an alternating n-linear function.
(a) If A G Mnxn(F) and B is a matrix obtained from A by interchanging
any two rows of A, then 8(B) = -8(A).
(b) If A e Mnxn(F) has two identical rows, then 8(A) = 0.
Proof. (a) Let A ∈ M_{n×n}(F), and let B be the matrix obtained from A
by interchanging rows r and s, where r < s. We first establish the result in
the case that s = r + 1. Because δ: M_{n×n}(F) → F is an n-linear function
that is alternating, we have
\[
0 = \delta\begin{pmatrix} a_1\\ \vdots\\ a_r + a_{r+1}\\ a_r + a_{r+1}\\ \vdots\\ a_n\end{pmatrix}
= \delta\begin{pmatrix} a_1\\ \vdots\\ a_r\\ a_r\\ \vdots\\ a_n\end{pmatrix}
+ \delta\begin{pmatrix} a_1\\ \vdots\\ a_r\\ a_{r+1}\\ \vdots\\ a_n\end{pmatrix}
+ \delta\begin{pmatrix} a_1\\ \vdots\\ a_{r+1}\\ a_r\\ \vdots\\ a_n\end{pmatrix}
+ \delta\begin{pmatrix} a_1\\ \vdots\\ a_{r+1}\\ a_{r+1}\\ \vdots\\ a_n\end{pmatrix}
= 0 + \delta(A) + \delta(B) + 0.
\]
Thus δ(B) = −δ(A).
Next suppose that s > r + 1, and let the rows of A be a₁, a₂, ..., aₙ.
Beginning with a_r and a_{r+1}, successively interchange a_r with the row that
follows it until the rows are in the sequence
a₁, a₂, ..., a_{r−1}, a_{r+1}, ..., a_s, a_r, a_{s+1}, ..., aₙ.
In all, s − r interchanges of adjacent rows are needed to produce this sequence.
Then successively interchange a_s with the row that precedes it until the rows
are in the order
a₁, a₂, ..., a_{r−1}, a_s, a_{r+1}, ..., a_{s−1}, a_r, a_{s+1}, ..., aₙ.
This process requires an additional s − r − 1 interchanges of adjacent rows
and produces the matrix B. It follows from the preceding paragraph that
δ(B) = (−1)^{(s−r)+(s−r−1)} δ(A) = −δ(A).
(b) Suppose that rows r and s of A ∈ M_{n×n}(F) are identical, where r < s.
If s = r + 1, then δ(A) = 0 because δ is alternating and two adjacent rows
of A are identical. If s > r + 1, let B be the matrix obtained from A by
interchanging rows r + 1 and s. Then δ(B) = 0 because two adjacent rows of
B are identical. But δ(B) = −δ(A) by (a). Hence δ(A) = 0.
Corollary 1. Let δ: M_{n×n}(F) → F be an alternating n-linear function.
If B is a matrix obtained from A ∈ M_{n×n}(F) by adding a multiple of some
row of A to another row, then δ(B) = δ(A).
Proof. Let B be obtained from A ∈ M_{n×n}(F) by adding k times row i of
A to row j, where j ≠ i, and let C be obtained from A by replacing row j of
A by row i of A. Then the rows of A, B, and C are identical except for row
j. Moreover, row j of B is the sum of row j of A and k times row j of C.
Since δ is an n-linear function and C has two identical rows, it follows that
δ(B) = δ(A) + kδ(C) = δ(A) + k·0 = δ(A).
The next result now follows as in the proof of the corollary to Theorem 4.6
(p. 216). (See Exercise 11.)
Corollary 2. Let 8: Mnxn(F) —• F be an alternating n-linear function.
If M G MnXn(F) has rank less than n, then 8(M) = 0.
Proof. Exercise.
Corollary 3. Let δ: M_{n×n}(F) → F be an alternating n-linear function,
and let E₁, E₂, and E₃ in M_{n×n}(F) be elementary matrices of types 1, 2,
and 3, respectively. Suppose that E₂ is obtained by multiplying some row
of I by the nonzero scalar k. Then δ(E₁) = −δ(I), δ(E₂) = k·δ(I), and
δ(E₃) = δ(I).
Proof. Exercise.
We wish to show that under certain circumstances, the only alternating
n-linear function 8: Mnx„(F) —+ F is the determinant, that is, 8(A) = det(A)
for all A G M„xri(F). In view of Corollary 3 to Theorem 4.10 and the facts
on page 223 about the determinant of an elementary matrix, this can happen
only if 8(1) = 1. Hence the third condition that is used in the characterization
of the determinant is that the determinant of the n x n identity matrix is 1.
Before we can establish the desired characterization of the determinant, we
must first show that an alternating n-linear function 8 such that 8(1) = 1 is
a multiplicative function. The proof of this result is identical to the proof of
Theorem 4.7 (p. 223), and so it is omitted. (See Exercise 12.)
Theorem 4.11. Let 8: Mnxn(F) —> F be an alternating n-linear function
such that 8(1) = 1. For any A, Be Mnxn(F), we have 8(AB) = 8(A)-8(B).

Proof. Exercise.
Theorem 4.12. If 8: M„XH(F) —• F is an alternating n-linear function
such that 8(1) = 1, then 8(A) = det(A) for every A G MnXn(F).
Proof. Let δ: M_{n×n}(F) → F be an alternating n-linear function such that
δ(I) = 1, and let A ∈ M_{n×n}(F). If A has rank less than n, then by Corollary 2
to Theorem 4.10, δ(A) = 0. Since the corollary to Theorem 4.6 (p. 217) gives
det(A) = 0, we have δ(A) = det(A) in this case. If, on the other hand, A has
rank n, then A is invertible and hence is the product of elementary matrices
(Corollary 3 to Theorem 3.6 p. 159), say A = E_m ⋯ E₂E₁. Since δ(I) = 1,
it follows from Corollary 3 to Theorem 4.10 and the facts on page 223 that
δ(E) = det(E) for every elementary matrix E. Hence by Theorems 4.11
and 4.7 (p. 223), we have
\[
\delta(A) = \delta(E_m\cdots E_2E_1) = \delta(E_m)\cdots\delta(E_2)\,\delta(E_1) = \det(E_m)\cdots\det(E_2)\,\det(E_1) = \det(E_m\cdots E_2E_1) = \det(A).
\]
Theorem 4.12 provides the desired characterization of the determinant: It
is the unique function δ: M_{n×n}(F) → F that is n-linear, is alternating, and
has the property that δ(I) = 1.
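As an added illustration, the three characterizing properties can be spot-checked for the ordinary determinant with NumPy; the helper name with_row below is our own.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
u, v, k, r = rng.standard_normal(n), rng.standard_normal(n), 1.7, 2

def with_row(M, r, row):
    """Return a copy of M with row r replaced by the given vector."""
    M = M.copy()
    M[r] = row
    return M

# n-linear in row r
lhs = np.linalg.det(with_row(A, r, u + k * v))
rhs = np.linalg.det(with_row(A, r, u)) + k * np.linalg.det(with_row(A, r, v))
print(np.isclose(lhs, rhs))

# alternating: two adjacent identical rows give determinant 0
print(np.isclose(np.linalg.det(with_row(A, r + 1, A[r])), 0))

# normalized: det(I) = 1
print(np.isclose(np.linalg.det(np.eye(n)), 1))
```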
EXERCISES
1. Label the following statements as true or false.
(a) Any n-linear function δ: M_{n×n}(F) → F is a linear transformation.
(b) Any n-linear function δ: M_{n×n}(F) → F is a linear function of each
row of an n × n matrix when the other n − 1 rows are held fixed.
(c) If δ: M_{n×n}(F) → F is an alternating n-linear function and the
matrix A ∈ M_{n×n}(F) has two identical rows, then δ(A) = 0.
(d) If δ: M_{n×n}(F) → F is an alternating n-linear function and B is
obtained from A ∈ M_{n×n}(F) by interchanging two rows of A, then
δ(B) = δ(A).
(e) There is a unique alternating n-linear function δ: M_{n×n}(F) → F.
(f) The function δ: M_{n×n}(F) → F defined by δ(A) = 0 for every
A ∈ M_{n×n}(F) is an alternating n-linear function.
2. Determine all the 1-linear functions δ: M_{1×1}(F) → F.
Determine which of the functions δ: M_{3×3}(F) → F in Exercises 3–10 are
3-linear functions. Justify each answer.

3. δ(A) = k, where k is any nonzero scalar
4. δ(A) = A₂₂
5. δ(A) = A₁₁A₂₃A₃₂
6. δ(A) = A₁₁ + A₂₃ + A₃₂
7. δ(A) = A₁₁A₂₁A₃₂
8. δ(A) = A₁₁A₃₁A₃₂
9. δ(A) = A₁₁A₂₂²A₃₃²
10. δ(A) = A₁₁A₂₂A₃₃ − A₁₁A₂₁A₃₂
11. Prove Corollaries 2 and 3 of Theorem 4.10.
12. Prove Theorem 4.11.
13. Prove that det: M2x2(F) —> F is a 2-linear function of the columns of
a matrix.
14. Let a, b, c, d ∈ F. Prove that the function δ: M_{2×2}(F) → F defined by
δ(A) = A₁₁A₂₂a + A₁₁A₂₁b + A₁₂A₂₂c + A₁₂A₂₁d is a 2-linear function.
15. Prove that δ: M_{2×2}(F) → F is a 2-linear function if and only if it has
the form
δ(A) = A₁₁A₂₂a + A₁₁A₂₁b + A₁₂A₂₂c + A₁₂A₂₁d
for some scalars a, b, c, d ∈ F.
16. Prove that if δ: M_{n×n}(F) → F is an alternating n-linear function, then
there exists a scalar k such that δ(A) = k·det(A) for all A ∈ M_{n×n}(F).
17. Prove that a linear combination of two n-linear functions is an n-linear
function, where the sum and scalar product of n-linear functions are as
defined in Example 3 of Section 1.2 (p. 9).
18. Prove that the set of all n-linear functions over a field F is a vector
space over F under the operations of function addition and scalar mul­
tiplication as defined in Example 3 of Section 1.2 (p. 9).
19. Let 8: MnXn(F) —* F be an n-linear function and F a field that does
not have characteristic two. Prove that if 8(B) = —8(A) whenever B is
obtained from A G MnXn(F) by interchanging any two rows of A, then
8(M) = 0 whenever M G MnXn(F) has two identical rows.
20. Give an example to show that the implication in Exercise 19 need not
hold if F has characteristic two.

INDEX OF DEFINITIONS FOR CHAPTER 4
Alternating n-linear function 239
Angle between two vectors 202
Cofactor 210
Cofactor expansion along the first
row 210
Cramer's rule 224
Determinant of a 2 x 2 matrix 200
Determinant of a matrix 210
Left-handed coordinate system
203
n-linear function 238
Orientation of an ordered basis
202
Parallelepiped, volume of 226
Parallelogram determined by two
vectors 203
Right-handed coordinate system
202

5
Diagonalization
5.1 Eigenvalues and Eigenvectors
5.2 Diagonalizability
5.3* Matrix Limits and Markov Chains
5.4 Invariant Subspaces and the Cayley-Hamilton Theorem
This chapter is concerned with the so-called diagonalization problem. For
a given linear operator T on a finite-dimensional vector space V, we seek
answers to the following questions.
1. Does there exist an ordered basis β for V such that [T]_β is a diagonal
matrix?
2. If such a basis exists, how can it be found?
Since computations involving diagonal matrices are simple, an affirmative
answer to question 1 leads us to a clearer understanding of how the operator T
acts on V, and an answer to question 2 enables us to obtain easy solutions to
many practical problems that can be formulated in a linear algebra context.
We consider some of these problems and their solutions in this chapter; see,
for example, Section 5.3.
A solution to the diagonalization problem leads naturally to the concepts
of eigenvalue and eigenvector. Aside from the important role that these
concepts play in the diagonalization problem, they also prove to be useful
tools in the study of many nondiagonalizable operators, as we will see in
Chapter 7.
5.1 EIGENVALUES AND EIGENVECTORS
In Example 3 of Section 2.5, we were able to obtain a formula for the
reflection of R² about the line y = 2x. The key to our success was to find a
basis β′ for which [T]_{β′} is a diagonal matrix. We now introduce the name for
an operator or matrix that has such a basis.
Definitions. A linear operator T on a finite-dimensional vector space V
is called diagonalizable if there is an ordered basis β for V such that [T]_β

is a diagonal matrix. A square matrix A is called diagonalizable if L_A is
diagonalizable.
We want to determine when a linear operator T on a finite-dimensional
vector space V is diagonalizable and, if so, how to obtain an ordered basis
β = {v₁, v₂, ..., vₙ} for V such that [T]_β is a diagonal matrix. Note that, if
D = [T]_β is a diagonal matrix, then for each vector v_j ∈ β, we have
\[
\mathsf{T}(v_j) = \sum_{i=1}^{n} D_{ij} v_i = D_{jj} v_j = \lambda_j v_j, \qquad\text{where } \lambda_j = D_{jj}.
\]
Conversely, if β = {v₁, v₂, ..., vₙ} is an ordered basis for V such that
T(v_j) = λ_j v_j for some scalars λ₁, λ₂, ..., λₙ, then clearly
\[
[\mathsf{T}]_\beta = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & \vdots & & \vdots\\ 0 & 0 & \cdots & \lambda_n\end{pmatrix}.
\]
In the preceding paragraph, each vector v in the basis β satisfies the
condition that T(v) = λv for some scalar λ. Moreover, because v lies in a
basis, v is nonzero. These computations motivate the following definitions.
Definitions. Let T be a linear operator on a vector space V. A nonzero
vector v ∈ V is called an eigenvector of T if there exists a scalar λ such
that T(v) = λv. The scalar λ is called the eigenvalue corresponding to the
eigenvector v.
Let A be in M_{n×n}(F). A nonzero vector v ∈ Fⁿ is called an eigenvector
of A if v is an eigenvector of L_A; that is, if Av = λv for some scalar λ. The
scalar λ is called the eigenvalue of A corresponding to the eigenvector v.
The words characteristic vector and proper vector are also used in place of
eigenvector. The corresponding terms for eigenvalue are characteristic value
and proper value.
Note that a vector is an eigenvector of a matrix A if and only if it is an
eigenvector of L_A. Likewise, a scalar λ is an eigenvalue of A if and only if it is
an eigenvalue of L_A. Using the terminology of eigenvectors and eigenvalues,
we can summarize the preceding discussion as follows.
Theorem 5.1. A linear operator T on a finite-dimensional vector space V
is diagonalizable if and only if there exists an ordered basis β for V consisting
of eigenvectors of T. Furthermore, if T is diagonalizable, β = {v₁, v₂, ..., vₙ}
is an ordered basis of eigenvectors of T, and D = [T]_β, then D is a diagonal
matrix and D_{jj} is the eigenvalue corresponding to v_j for 1 ≤ j ≤ n.

To diagonalize a matrix or a linear operator is to find a basis of eigenvec­
tors and the corresponding eigenvalues.
Before continuing our study of the diagonalization problem, we consider
three examples of eigenvalues and eigenvectors.
Example 1
Let
\[
A = \begin{pmatrix} 1 & 3\\ 4 & 2\end{pmatrix}, \qquad v_1 = \begin{pmatrix} 1\\ -1\end{pmatrix}, \quad\text{and}\quad v_2 = \begin{pmatrix} 3\\ 4\end{pmatrix}.
\]
Since
\[
\mathsf{L}_A(v_1) = \begin{pmatrix} 1 & 3\\ 4 & 2\end{pmatrix}\begin{pmatrix} 1\\ -1\end{pmatrix} = \begin{pmatrix} -2\\ 2\end{pmatrix} = -2v_1,
\]
v₁ is an eigenvector of L_A, and hence of A. Here λ₁ = −2 is the eigenvalue
corresponding to v₁. Furthermore,
\[
\mathsf{L}_A(v_2) = \begin{pmatrix} 1 & 3\\ 4 & 2\end{pmatrix}\begin{pmatrix} 3\\ 4\end{pmatrix} = \begin{pmatrix} 15\\ 20\end{pmatrix} = 5v_2,
\]
and so v₂ is an eigenvector of L_A, and hence of A, with the corresponding
eigenvalue λ₂ = 5. Note that β = {v₁, v₂} is an ordered basis for R² consisting
of eigenvectors of both A and L_A, and therefore A and L_A are diagonalizable.
Moreover, by Theorem 5.1,
\[
[\mathsf{L}_A]_\beta = \begin{pmatrix} -2 & 0\\ 0 & 5\end{pmatrix}.
\]
Example 2
Let T be the linear operator on R² that rotates each vector in the plane
through an angle of π/2. It is clear geometrically that for any nonzero vector
v, the vectors v and T(v) are not collinear; hence T(v) is not a multiple of
v. Therefore T has no eigenvectors and, consequently, no eigenvalues. Thus
there exist operators (and matrices) with no eigenvalues or eigenvectors. Of
course, such operators and matrices are not diagonalizable. •
Example 3
Let C^∞(R) denote the set of all functions f: R → R having derivatives of all
orders. (Thus C^∞(R) includes the polynomial functions, the sine and cosine
functions, the exponential functions, etc.) Clearly, C^∞(R) is a subspace of
the vector space F(R, R) of all functions from R to R as defined in Section
1.2. Let T: C^∞(R) → C^∞(R) be the function defined by T(f) = f′, the
derivative of f. It is easily verified that T is a linear operator on C^∞(R). We
determine the eigenvalues and eigenvectors of T.

Suppose that f is an eigenvector of T with corresponding eigenvalue λ.
Then f′ = T(f) = λf. This is a first-order differential equation whose solutions
are of the form f(t) = ce^{λt} for some constant c. Consequently, every
real number λ is an eigenvalue of T, and λ corresponds to eigenvectors of the
form ce^{λt} for c ≠ 0. Note that for λ = 0, the eigenvectors are the nonzero
constant functions. •
In order to obtain a basis of eigenvectors for a matrix (or a linear opera­
tor), we need to be able to determine its eigenvalues and eigenvectors. The
following theorem gives us a method for computing eigenvalues.
Theorem 5.2. Let A ∈ M_{n×n}(F). Then a scalar λ is an eigenvalue of A
if and only if det(A − λIₙ) = 0.
Proof. A scalar λ is an eigenvalue of A if and only if there exists a nonzero
vector v ∈ Fⁿ such that Av = λv, that is, (A − λIₙ)(v) = 0. By Theorem 2.5
(p. 71), this is true if and only if A − λIₙ is not invertible. However, this
result is equivalent to the statement that det(A − λIₙ) = 0.
Definition. Let A ∈ M_{n×n}(F). The polynomial f(t) = det(A − tIₙ) is
called the characteristic polynomial¹ of A.
Theorem 5.2 states that the eigenvalues of a matrix are the zeros of its
characteristic polynomial. When determining the eigenvalues of a matrix or
a linear operator, we normally compute its characteristic polynomial, as in
the next example.
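For a quick numerical illustration (not from the text), the characteristic polynomial and its zeros can be computed with NumPy; the matrix used here is the one of Example 4 below.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [4.0, 1.0]])

# np.poly gives the coefficients of det(tI - A) = (-1)^n det(A - tI),
# so its roots are exactly the eigenvalues.
coeffs = np.poly(A)
print(coeffs)                 # [ 1. -2. -3.]  i.e. t^2 - 2t - 3
print(np.roots(coeffs))       # [ 3. -1.]
print(np.linalg.eigvals(A))   # the same eigenvalues, computed directly
```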
Example 4
To find the eigenvalues of
\[
A = \begin{pmatrix} 1 & 1\\ 4 & 1\end{pmatrix} \in \mathsf{M}_{2\times 2}(R),
\]
we compute its characteristic polynomial:
\[
\det(A - tI_2) = \det\begin{pmatrix} 1-t & 1\\ 4 & 1-t\end{pmatrix} = t^2 - 2t - 3 = (t-3)(t+1).
\]
It follows from Theorem 5.2 that the only eigenvalues of A are 3 and −1.
¹The observant reader may have noticed that the entries of the matrix A − tIₙ
are not scalars in the field F. They are, however, scalars in another field F(t), the
field of quotients of polynomials in t with coefficients from F. Consequently, any
results proved about determinants in Chapter 4 remain valid in this context.

It is easily shown that similar matrices have the same characteristic poly­
nomial (see Exercise 12). This fact enables us to define the characteristic
polynomial of a linear operator as follows.
Definition. Let T be a linear operator on an n-dimensional vector space
V with ordered basis β. We define the characteristic polynomial f(t) of
T to be the characteristic polynomial of A = [T]_β. That is,
f(t) = det(A − tIₙ).
The remark preceding this definition shows that the definition is independent
of the choice of ordered basis β. Thus if T is a linear operator on a
finite-dimensional vector space V and β is an ordered basis for V, then λ is
an eigenvalue of T if and only if λ is an eigenvalue of [T]_β. We often denote
the characteristic polynomial of an operator T by det(T − tI).
Example 5
Let T be the linear operator on P₂(R) defined by T(f(x)) = f(x) + (x+1)f′(x),
let β be the standard ordered basis for P₂(R), and let A = [T]_β. Then
\[
A = \begin{pmatrix} 1 & 1 & 0\\ 0 & 2 & 2\\ 0 & 0 & 3\end{pmatrix}.
\]
The characteristic polynomial of T is
\[
\det(A - tI_3) = \det\begin{pmatrix} 1-t & 1 & 0\\ 0 & 2-t & 2\\ 0 & 0 & 3-t\end{pmatrix} = (1-t)(2-t)(3-t) = -(t-1)(t-2)(t-3).
\]
Hence λ is an eigenvalue of T (or A) if and only if λ = 1, 2, or 3. •
Examples 4 and 5 suggest that the characteristic polynomial of an n x n
matrix A is a polynomial of degree n. The next theorem tells us even more.
It can be proved by a straightforward induction argument.
Theorem 5.3. Let A G Mnxn(F).
(a) The characteristic polynomial of A is a polynomial of degree n with
leading coefficient ( — l)n.
(b) A has at most n distinct eigenvalues.
Proof. Exercise.

Theorem 5.2 enables us to determine all the eigenvalues of a matrix or
a linear operator on a finite-dimensional vector space provided that we can
compute the zeros of the characteristic polynomial. Our next result gives
us a procedure for determining the eigenvectors corresponding to a given
eigenvalue.
Theorem 5.4. Let T be a linear operator on a vector space V, and let λ
be an eigenvalue of T. A vector v ∈ V is an eigenvector of T corresponding
to λ if and only if v ≠ 0 and v ∈ N(T − λI).
Proof. Exercise.
Example 6
To find all the eigenvectors of the matrix
\[
A = \begin{pmatrix} 1 & 1\\ 4 & 1\end{pmatrix}
\]
in Example 4, recall that A has two eigenvalues, λ₁ = 3 and λ₂ = −1. We
begin by finding all the eigenvectors corresponding to λ₁ = 3. Let
\[
B_1 = A - \lambda_1 I = \begin{pmatrix} -2 & 1\\ 4 & -2\end{pmatrix}.
\]
Then
\[
x = \begin{pmatrix} x_1\\ x_2\end{pmatrix} \in R^2
\]
is an eigenvector corresponding to λ₁ = 3 if and only if x ≠ 0 and x ∈ N(L_{B₁});
that is, x ≠ 0 and
\[
\begin{pmatrix} -2 & 1\\ 4 & -2\end{pmatrix}\begin{pmatrix} x_1\\ x_2\end{pmatrix} = \begin{pmatrix} -2x_1 + x_2\\ 4x_1 - 2x_2\end{pmatrix} = \begin{pmatrix} 0\\ 0\end{pmatrix}.
\]
Clearly the set of all solutions to this equation is
\[
\left\{ t\begin{pmatrix} 1\\ 2\end{pmatrix} : t \in R \right\}.
\]
Hence x is an eigenvector corresponding to λ₁ = 3 if and only if
\[
x = t\begin{pmatrix} 1\\ 2\end{pmatrix} \quad\text{for some } t \neq 0.
\]
Now suppose that x is an eigenvector of A corresponding to λ₂ = −1. Let
\[
B_2 = A - \lambda_2 I = \begin{pmatrix} 1 & 1\\ 4 & 1\end{pmatrix} - \begin{pmatrix} -1 & 0\\ 0 & -1\end{pmatrix} = \begin{pmatrix} 2 & 1\\ 4 & 2\end{pmatrix}.
\]
Then
\[
x = \begin{pmatrix} x_1\\ x_2\end{pmatrix} \in \mathsf{N}(\mathsf{L}_{B_2})
\]
if and only if x is a solution to the system
\[
\begin{aligned}
2x_1 + x_2 &= 0\\
4x_1 + 2x_2 &= 0.
\end{aligned}
\]
Hence
\[
\mathsf{N}(\mathsf{L}_{B_2}) = \left\{ t\begin{pmatrix} 1\\ -2\end{pmatrix} : t \in R \right\}.
\]
Thus x is an eigenvector corresponding to λ₂ = −1 if and only if
\[
x = t\begin{pmatrix} 1\\ -2\end{pmatrix} \quad\text{for some } t \neq 0.
\]
Observe that
\[
\beta = \left\{ \begin{pmatrix} 1\\ 2\end{pmatrix}, \begin{pmatrix} 1\\ -2\end{pmatrix} \right\}
\]
is a basis for R² consisting of eigenvectors of A. Thus L_A, and hence A, is
diagonalizable. •
Suppose that β is a basis for Fⁿ consisting of eigenvectors of A. The
corollary to Theorem 2.23 assures us that if Q is the n × n matrix whose
columns are the vectors in β, then Q⁻¹AQ is a diagonal matrix. In Example 6,
for instance, if
\[
Q = \begin{pmatrix} 1 & 1\\ 2 & -2\end{pmatrix},
\]
then
\[
Q^{-1}AQ = \begin{pmatrix} 3 & 0\\ 0 & -1\end{pmatrix}.
\]
Of course, the diagonal entries of this matrix are the eigenvalues of A that
correspond to the respective columns of Q.
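The conclusion of Example 6 is easy to verify numerically. In this added sketch, Q has the eigenvectors found above as its columns, and Q⁻¹AQ comes out diagonal with the eigenvalues 3 and −1 on the diagonal.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [4.0, 1.0]])

# Eigenvectors found in Example 6, placed as the columns of Q
Q = np.array([[1.0, 1.0],
              [2.0, -2.0]])

D = np.linalg.inv(Q) @ A @ Q
print(np.round(D, 10))   # [[ 3.  0.] [ 0. -1.]] -- the eigenvalues on the diagonal
```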
To find the eigenvectors of a linear operator T on an n-dimensional vector
space V, select an ordered basis β for V and let A = [T]_β. Figure 5.1 is the
special case of Figure 2.2 in Section 2.4 in which V = W and β = γ. Recall
that for v ∈ V, φ_β(v) = [v]_β, the coordinate vector of v relative to β. We
show that v ∈ V is an eigenvector of T corresponding to λ if and only if φ_β(v)
Figure 5.1: T maps V to V, L_A maps Fⁿ to Fⁿ, and the two are connected by the coordinate map φ_β.
is an eigenvector of A corresponding to λ. Suppose that v is an eigenvector
of T corresponding to λ. Then T(v) = λv. Hence
\[
A\,\phi_\beta(v) = \mathsf{L}_A\,\phi_\beta(v) = \phi_\beta(\mathsf{T}(v)) = \phi_\beta(\lambda v) = \lambda\,\phi_\beta(v).
\]
Now φ_β(v) ≠ 0, since φ_β is an isomorphism; hence φ_β(v) is an eigenvector
of A. This argument is reversible, and so we can establish that if φ_β(v)
is an eigenvector of A corresponding to λ, then v is an eigenvector of T
corresponding to λ. (See Exercise 13.)
An equivalent formulation of the result discussed in the preceding paragraph
is that for an eigenvalue λ of A (and hence of T), a vector y ∈ Fⁿ is an
eigenvector of A corresponding to λ if and only if φ_β⁻¹(y) is an eigenvector of
T corresponding to λ.
Thus we have reduced the problem of finding the eigenvectors of a linear
operator on a finite-dimensional vector space to the problem of finding the
eigenvectors of a matrix. The next example illustrates this procedure.
Example 7
Let T be the linear operator on P₂(R) defined in Example 5, and let β be the
standard ordered basis for P₂(R). Recall that T has eigenvalues 1, 2, and 3
and that
\[
A = [\mathsf{T}]_\beta = \begin{pmatrix} 1 & 1 & 0\\ 0 & 2 & 2\\ 0 & 0 & 3\end{pmatrix}.
\]
We consider each eigenvalue separately.
Let λ₁ = 1, and define
\[
B_1 = A - \lambda_1 I = \begin{pmatrix} 0 & 1 & 0\\ 0 & 1 & 2\\ 0 & 0 & 2\end{pmatrix}.
\]
Then
\[
x = \begin{pmatrix} x_1\\ x_2\\ x_3\end{pmatrix} \in R^3
\]
is an eigenvector corresponding to λ₁ = 1 if and only if x ≠ 0 and x ∈ N(L_{B₁});
that is, x is a nonzero solution to the system
\[
\begin{aligned}
x_2 &= 0\\
x_2 + 2x_3 &= 0\\
2x_3 &= 0.
\end{aligned}
\]
Notice that this system has three unknowns, x₁, x₂, and x₃, but one of these,
x₁, does not actually appear in the system. Since the values of x₁ do not
affect the system, we assign x₁ a parametric value, say x₁ = a, and solve the
system for x₂ and x₃. Clearly, x₂ = x₃ = 0, and so the eigenvectors of A
corresponding to λ₁ = 1 are of the form
\[
a e_1 = \begin{pmatrix} a\\ 0\\ 0\end{pmatrix}
\]
for a ≠ 0. Consequently, the eigenvectors of T corresponding to λ₁ = 1 are
of the form
\[
\phi_\beta^{-1}(a e_1) = a\,\phi_\beta^{-1}(e_1) = a\cdot 1 = a
\]
for any a ≠ 0. Hence the nonzero constant polynomials are the eigenvectors
of T corresponding to λ₁ = 1.
Next let λ₂ = 2, and define
\[
B_2 = A - \lambda_2 I = \begin{pmatrix} -1 & 1 & 0\\ 0 & 0 & 2\\ 0 & 0 & 1\end{pmatrix}.
\]
It is easily verified that
\[
\mathsf{N}(\mathsf{L}_{B_2}) = \left\{ a\begin{pmatrix} 1\\ 1\\ 0\end{pmatrix} : a \in R \right\},
\]
and hence the eigenvectors of T corresponding to λ₂ = 2 are of the form
\[
\phi_\beta^{-1}(a(e_1 + e_2)) = a(1 + x)
\]
for a ≠ 0.
Finally, consider λ₃ = 3 and
\[
A - \lambda_3 I = \begin{pmatrix} -2 & 1 & 0\\ 0 & -1 & 2\\ 0 & 0 & 0\end{pmatrix}.
\]
Since
\[
\mathsf{N}(\mathsf{L}_{A - \lambda_3 I}) = \left\{ a\begin{pmatrix} 1\\ 2\\ 1\end{pmatrix} : a \in R \right\},
\]
the eigenvectors of T corresponding to λ₃ = 3 are of the form
\[
\phi_\beta^{-1}\!\left(a\begin{pmatrix} 1\\ 2\\ 1\end{pmatrix}\right) = a\,\phi_\beta^{-1}(e_1 + 2e_2 + e_3) = a(1 + 2x + x^2)
\]
for a ≠ 0.
For each eigenvalue, select the corresponding eigenvector with a = 1 in the
preceding descriptions to obtain γ = {1, 1 + x, 1 + 2x + x²}, which is an ordered
basis for P₂(R) consisting of eigenvectors of T. Thus T is diagonalizable, and
\[
[\mathsf{T}]_\gamma = \begin{pmatrix} 1 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 3\end{pmatrix}. \quad\bullet
\]
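As an added illustration of the procedure just carried out, the same eigenvectors can be recovered numerically from the matrix [T]_β and then reinterpreted as polynomials through φ_β⁻¹. The helper name as_poly and the scaling convention below are our own choices.

```python
import numpy as np

# [T]_beta for T(f) = f + (x+1)f' on P_2(R), with beta = {1, x, x^2} (Example 5)
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 2.0, 2.0],
              [0.0, 0.0, 3.0]])

eigvals, eigvecs = np.linalg.eig(A)

def as_poly(c):
    """Read a coordinate vector relative to beta as a polynomial string."""
    return " + ".join(f"{coef:g}*x^{k}" for k, coef in enumerate(c) if abs(coef) > 1e-12)

for lam, v in zip(eigvals, eigvecs.T):
    first = np.nonzero(np.abs(v) > 1e-12)[0][0]
    v = v / v[first]           # scale so the leading nonzero coordinate is 1
    print(f"lambda = {lam:g}:  eigenvector {as_poly(v)}")
# Up to scaling this reproduces 1, 1 + x, and 1 + 2x + x^2.
```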
We close this section with a geometric description of how a linear operator
T acts on an eigenvector in the context of a vector space V over R. Let v be
an eigenvector of T and A be the corresponding eigenvalue. We can think of
W = span({n}), the one-dimensional subspace of V spanned by v, as a line
in V that passes through 0 and v. For any w G W, w = cv for some scalar c,
and hence
T(iy) = T(cn) = cT(n) = cAn = Xw;
so T acts on the vectors in W by multiplying each such vector by A. There
are several possible ways for T to act on the vectors in W, depending on the
value of A. We consider several cases. (See Figure 5.2.)
CASE 1. If A > 1, then T moves vectors in W farther from 0 by a factor
of A.
CASE 2. If A = 1, then T acts as the identity operator on W.
CASE 3. If 0 < A < 1, then T moves vectors in W closer to 0 by a factor
of A.
CASE 4. If A = 0, then T acts as the zero transformation on W.
CASE 5. If A < 0, then T reverses the orientation of W; that is, T moves
vectors in W from one side of 0 to the other.

Figure 5.2: The action of T on W = span({v}) when v is an eigenvector of T, in the five cases λ > 1, λ = 1, 0 < λ < 1, λ = 0, and λ < 0.
To illustrate these ideas, we consider the linear operators in Examples 3,
4, and 2 of Section 2.1.
For the operator T on R² defined by T(a₁, a₂) = (a₁, −a₂), the reflection
about the x-axis, e₁ and e₂ are eigenvectors of T with corresponding eigenvalues
1 and −1, respectively. Since e₁ and e₂ span the x-axis and the y-axis,
respectively, T acts as the identity on the x-axis and reverses the orientation
of the y-axis.
For the operator T on R² defined by T(a₁, a₂) = (a₁, 0), the projection on
the x-axis, e₁ and e₂ are eigenvectors of T with corresponding eigenvalues 1
and 0, respectively. Thus, T acts as the identity on the x-axis and as the zero
operator on the y-axis.
Finally, we generalize Example 2 of this section by considering the operator
that rotates the plane through the angle θ, which is defined by
\[
\mathsf{T}_\theta(a_1, a_2) = (a_1\cos\theta - a_2\sin\theta,\; a_1\sin\theta + a_2\cos\theta).
\]
Suppose that 0 < θ < π. Then for any nonzero vector v, the vectors v and
T_θ(v) are not collinear, and hence T_θ maps no one-dimensional subspace of
R² into itself. But this implies that T_θ has no eigenvectors and therefore no
eigenvalues. To confirm this conclusion, let β be the standard ordered basis
for R², and note that the characteristic polynomial of T_θ is
\[
\det([\mathsf{T}_\theta]_\beta - tI_2) = \det\begin{pmatrix} \cos\theta - t & -\sin\theta\\ \sin\theta & \cos\theta - t\end{pmatrix} = t^2 - (2\cos\theta)t + 1,
\]

which has no real zeros because, for 0 < θ < π, the discriminant 4cos²θ − 4
is negative.
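This conclusion is easy to confirm numerically (an added check, not part of the text): for any θ strictly between 0 and π, the characteristic polynomial t² − (2cos θ)t + 1 has negative discriminant and the computed eigenvalues are non-real.

```python
import numpy as np

theta = 0.7   # any angle strictly between 0 and pi
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.poly(R))                        # [1, -2cos(theta), 1]: t^2 - (2 cos theta) t + 1
print(np.linalg.eigvals(R))              # cos(theta) +/- i sin(theta): no real eigenvalues
print((2 * np.cos(theta)) ** 2 - 4 < 0)  # the discriminant is negative
```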
EXERCISES
1. Label the following statements as true or false.
(a) Every linear operator on an n-dimensional vector space has n dis­
tinct eigenvalues.
(b) If a real matrix has one eigenvector, then it has an infinite number
of eigenvectors.
(c) There exists a square matrix with no eigenvectors.
(d) Eigenvalues must be nonzero scalars.
(e) Any two eigenvectors are linearly independent.
(f) The sum of two eigenvalues of a linear operator T is also an eigen­
value of T.
(g) Linear operators on infinite-dimensional vector spaces never have
eigenvalues.
(h) An n x n matrix A with entries from a field F is similar to a
diagonal matrix if and only if there is a basis for Fn consisting of
eigenvectors of A.
(i) Similar matrices always have the same eigenvalues,
(j) Similar matrices always have the same eigenvectors,
(k) The sum of two eigenvectors of an operator T is always an eigen­
vector of T.
2. For each of the following linear operators T on a vector space V and
ordered bases β, compute [T]_β, and determine whether β is a basis
consisting of eigenvectors of T.
(a) V = R2, T
10a- 66
17a - 106
, and 0 =
(b) V = Pi (R), J(a + 6x) = (6a - 66) + (12a - 116)x, and
3 = {3 + 4x, 2 + 3x}
(c) V = R
3a + 26 - 2c'
-4a - 36 + 2c . and
(d) V = P2(i?),T(a + 6x + cx2) =
(-4a + 26 - 2c) - (7a + 36 + 7c)x + (7a + 6 + 5c>2,
and 0 = {x - x2, -1 + x2, -1 - x + x2}

(e) V = P3(R), T(a + 6x + ex2 + dx3) =
-d + (-c + d)x + (a + 6 - 2c)x2 + (-6 + c - 2d)x3,
and /? = {1 -x + x3,l +x2,l,x + x2}
(f) V = M2x2(i2),T
0 =
c d
1 0
1 0
-7a -46 + Ac -Ad b
-8a - 46 + 5c - Ad d
, and
-1 2
0 0
1 0
2 0
-1 0
0 2
3. For each of the following matrices A G MnXn(F),
(i) Determine all the eigenvalues of A.
(ii) For each eigenvalue A of A, find the set of eigenvectors correspond­
ing to A.
(iii) If possible, find a basis for Fⁿ consisting of eigenvectors of A.
(iv) If successful in finding such a basis, determine an invertible matrix
Q and a diagonal matrix D such that Q⁻¹AQ = D.
= R
for F = R
(hi;
(iv;
for F = C
for F = R
4. For each linear operator T on V, find the eigenvalues of T and an ordered
basis β for V such that [T]_β is a diagonal matrix.
(a) V = R² and T(a, b) = (−2a + 3b, −10a + 9b)
(b) V = R³ and T(a, b, c) = (7a − 4b + 10c, 4a − 3b + 8c, −2a + b − 2c)
(c) V = R³ and T(a, b, c) = (−4a + 3b − 6c, 6a − 7b + 12c, 6a − 6b + 11c)
(d) V = P₁(R) and T(ax + b) = (−6a + 2b)x + (−6a + b)
(e) V = P₂(R) and T(f(x)) = xf′(x) + f(2)x + f(3)
(f) V = P₃(R) and T(f(x)) = f(x) + f(2)x
(g) V = P₃(R) and T(f(x)) = xf′(x) + f″(x) − f(2)
(h) V = M2x2(R) and T (a ^

c d
d 1 \a 6
(i) V = M2x2(fi) and T
(j) V = M2x2(/?.) and T(A) = A1 + 2 • tr(A) • I2
5. Prove Theorem 5.4.
6. Let T be a linear operator on a finite-dimensional vector space V, and
let β be an ordered basis for V. Prove that λ is an eigenvalue of T if
and only if λ is an eigenvalue of [T]_β.
7. Let T be a linear operator on a finite-dimensional vector space V. We
define the determinant of T, denoted det(T), as follows: Choose any
ordered basis 0 for V, and define det(T) = det([T]/3).
(a) Prove that the preceding definition is independent of the choice
of an ordered basis for V. That is, prove that if β and γ are two
ordered bases for V, then det([T]_β) = det([T]_γ).
(b) Prove that T is invertible if and only if det(T) ≠ 0.
(c) Prove that if T is invertible, then det(T⁻¹) = [det(T)]⁻¹.
(d) Prove that if U is also a linear operator on V, then det(TU) =
det(T)·det(U).
(e) Prove that det(T − λI_V) = det([T]_β − λI) for any scalar λ and any
ordered basis β for V.
8. (a) Prove that a linear operator T on a finite-dimensional vector space
is invertible if and only if zero is not an eigenvalue of T.
(b) Let T be an invertible linear operator. Prove that a scalar A is an
eigenvalue of T if and only if A~! is an eigenvalue of T-1.
(c) State and prove results analogous to (a) and (b) for matrices.
9. Prove that the eigenvalues of an upper triangular matrix M are the
diagonal entries of M.
10. Let V be a finite-dimensional vector space, and let A be any scalar.
(a) For any ordered basis 0 for V, prove that [Aly]/? = XI.
(b) Compute the characteristic polynomial of Aly.
(c) Show that Aly is diagonalizable and has only one eigenvalue.
11. A scalar matrix is a square matrix of the form XI for some scalar A;
that is, a scalar matrix is a diagonal matrix in which all the diagonal
entries are equal.
(a) Prove that if a square matrix A is similar to a scalar matrix XI,
then A- XL
(b) Show that a diagonalizable matrix having only one eigenvalue is a
scalar matrix.

(c) Prove that \(\begin{pmatrix} 1 & 1\\ 0 & 1\end{pmatrix}\) is not diagonalizable.
12. (a) Prove that similar matrices have the same characteristic polyno­
mial.
(b) Show that the definition of the characteristic polynomial of a linear
operator on a finite-dimensional vector space V is independent of
the choice of basis for V.
13. Let T be a linear operator on a finite-dimensional vector space V over a
field F, let 0 be an ordered basis for V, and let A = [T]/3. In reference
to Figure 5.1, prove the following.
(a) If r G V and <t>0(v) is an eigenvector of A corresponding to the
eigenvalue A, then v is an eigenvector of T corresponding to A.
(b) If A is an eigenvalue of A (and hence of T), then a vector y G Fn
is an eigenvector of A corresponding to A if and only if 4>~Q (y) is
an eigenvector of T corresponding to A.
14.* For any square matrix A, prove that A and A1 have the same charac­
teristic polynomial (and hence the same eigenvalues).
15- (a) Let T be a linear operator on a vector space V, and let x be an
eigenvector of T corresponding to the eigenvalue A. For any posi­
tive integer m, prove that x is an eigenvector of Tm corresponding
to the eigenvalue A7",
(b) State and prove the analogous result for matrices.
16. (a) Prove that similar matrices have the same trace. Hint: Use Exer­
cise 13 of Section 2.3.
(b) How would you define the trace of a linear operator on a finite-
dimensional vector space? Justify that your definition is well-
defined.
17. Let T be the linear operator on MnXn(i?) defined by T(A) = A1.
(a) Show that ±1 are the only eigenvalues of T.
(b) Describe the eigenvectors corresponding to each eigenvalue of T.
(c) Find an ordered basis 0 for M2x2(Z?) such that [T]/3 is a diagonal
matrix.
(d) Find an ordered basis 0 for Mnx„(R) such that [T]p is a diagonal
matrix for n > 2.
18. Let A,B e M„xn(C).
(a) Prove that if B is invertible, then there exists a scalar c G C such
that A + cB is not invertible. Hint: Examine det(A + cB).

(b) Find nonzero 2x2 matrices A and B such that both A and A + cB
are invertible for all e G C.
19. * Let A and B be similar nxn matrices. Prove that there exists an n-
dimensional vector space V, a linear operator T on V, and ordered bases
0 and 7 for V such that A = [T]/3 and B = [T]7. Hint: Use Exercise 14
of Section 2.5.
20. Let A be an n × n matrix with characteristic polynomial
f(t) = (−1)ⁿtⁿ + a_{n−1}t^{n−1} + ⋯ + a₁t + a₀.
Prove that f(0) = a₀ = det(A). Deduce that A is invertible if and only
if a₀ ≠ 0.
21. Let A and f(t) be as in Exercise 20.
(a) Prove that f(t) = (A₁₁ − t)(A₂₂ − t) ⋯ (A_{nn} − t) + q(t), where q(t)
is a polynomial of degree at most n − 2. Hint: Apply mathematical
induction to n.
(b) Show that tr(A) = (−1)^{n−1} a_{n−1}.
22.t
(a) Let T be a linear operator on a vector space V over the field F,
and let g(t) be a polynomial with coefficients from F. Prove that
if x is an eigenvector of T with corresponding eigenvalue A, then
g(T)(x) = g(X)x. That is, x is an eigenvector of o(T) with corre­
sponding eigenvalue g(X).
(b) State and prove a comparable result for matrices.
(c) Verify (b) for the matrix A in Exercise 3(a) with polynomial g(t) =
2t2 — t + 1, eigenvector x = I „ J, and corresponding eigenvalue
A = 4.
23. Use Exercise 22 to prove that if f(t) is the characteristic polynomial
of a diagonalizable linear operator T, then /(T) = To, the zero opera­
tor. (In Section 5.4 we prove that this result does not depend on the
diagonalizability of T.)
24. Use Exercise 21(a) to prove Theorem 5.3.
25. Prove Corollaries 1 and 2 of Theorem 5.3.
26. Determine the number of distinct characteristic polynomials of matrices
in M2x2(Z2).

5.2 DIAGONALIZABILITY
In Section 5.1, we presented the diagonalization problem and observed that
not all linear operators or matrices are diagonalizable. Although we are able
to diagonalize operators and matrices and even obtain a necessary and suf­
ficient condition for diagonalizability (Theorem 5.1 p. 246), we have not yet
solved the diagonalization problem. What is still needed is a simple test to
determine whether an operator or a matrix can be diagonalized, as well as a
method for actually finding a basis of eigenvectors. In this section, we develop
such a test and method.
In Example 6 of Section 5.1, we obtained a basis of eigenvectors by choos­
ing one eigenvector corresponding to each eigenvalue. In general, such a
procedure does not yield a basis, but the following theorem shows that any
set constructed in this manner is linearly independent.
Theorem 5.5. Let T be a linear operator on a vector space V, and let
Ai, A2,..., Xk be distinct eigenvalues of T. Ifvi,v2, • • • ,vk are eigenvectors of
T such that Xi corresponds to Vi (I < i < k), then {vi,v2,... ,vk} is linearly
independent.
Proof. The proof is by mathematical induction on k. Suppose that k = 1.
Then vi ^ 0 since v\ is an eigenvector, and hence {vi} is linearly independent.
Now assume that the theorem holds for k - 1 distinct eigenvalues, where
k — 1 > 1, and that we have k eigenvectors v\, v2,..., Vk corresponding to the
distinct eigenvalues Ai, A2,..., Xk. We wish to show that {vi, v2,..., vk} is
linearly independent. Suppose that a₁, a₂, ..., a_k are scalars such that
a₁v₁ + a₂v₂ + ⋯ + a_kv_k = 0.   (1)
Applying T − λ_kI to both sides of (1), we obtain
a₁(λ₁ − λ_k)v₁ + a₂(λ₂ − λ_k)v₂ + ⋯ + a_{k−1}(λ_{k−1} − λ_k)v_{k−1} = 0.
By the induction hypothesis {v₁, v₂, ..., v_{k−1}} is linearly independent, and
hence
a₁(λ₁ − λ_k) = a₂(λ₂ − λ_k) = ⋯ = a_{k−1}(λ_{k−1} − λ_k) = 0.
Since λ₁, λ₂, ..., λ_k are distinct, it follows that λᵢ − λ_k ≠ 0 for 1 ≤ i ≤ k − 1.
So a₁ = a₂ = ⋯ = a_{k−1} = 0, and (1) therefore reduces to a_kv_k = 0. But
v_k ≠ 0 and therefore a_k = 0. Consequently a₁ = a₂ = ⋯ = a_k = 0, and it
follows that {v₁, v₂, ..., v_k} is linearly independent.
Corollary. Let T be a linear operator on an n-dimensional vector space
V. If T has n distinct eigenvalues, then T is diagonalizable.

Proof. Suppose that T has n distinct eigenvalues λ₁, ..., λₙ. For each i
choose an eigenvector vᵢ corresponding to λᵢ. By Theorem 5.5, {v₁, ..., vₙ}
is linearly independent, and since dim(V) = n, this set is a basis for V. Thus,
by Theorem 5.1 (p. 246), T is diagonalizable.
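A quick numerical illustration of the corollary (added by us, using NumPy): a randomly chosen matrix almost surely has distinct eigenvalues, and the matrix Q of eigenvectors returned by np.linalg.eig is then invertible, so Q⁻¹AQ is diagonal.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))

eigvals, Q = np.linalg.eig(A)

# If the eigenvalues are distinct, the corresponding eigenvectors are linearly
# independent, so Q is invertible and Q^{-1} A Q is diagonal.
if len(np.unique(np.round(eigvals, 8))) == len(eigvals):
    D = np.linalg.inv(Q) @ A @ Q
    print(np.allclose(D, np.diag(np.diag(D))))   # True: A is diagonalizable
```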
Example 1
Let
\[
A = \begin{pmatrix} 1 & 1\\ 1 & 1\end{pmatrix} \in \mathsf{M}_{2\times 2}(R).
\]
The characteristic polynomial of A (and hence of L_A) is
\[
\det(A - tI) = \det\begin{pmatrix} 1-t & 1\\ 1 & 1-t\end{pmatrix} = t(t-2),
\]
and thus the eigenvalues of L_A are 0 and 2. Since L_A is a linear operator on the
two-dimensional vector space R², we conclude from the preceding corollary
that L_A (and hence A) is diagonalizable. •
The converse of Theorem 5.5 is false. That is, it is not true that if T is
diagonalizable, then it has n distinct eigenvalues. For example, the identity
operator is diagonalizable even though it has only one eigenvalue, namely,
λ = 1.
We have seen that diagonalizability requires the existence of eigenvalues.
Actually, diagonalizability imposes a stronger condition on the characteristic
polynomial.
Definition. A polynomial f(t) in P(F) splits over F if there are scalars
c, a₁, ..., aₙ (not necessarily distinct) in F such that
f(t) = c(t − a₁)(t − a₂) ⋯ (t − aₙ).
For example, t² − 1 = (t + 1)(t − 1) splits over R, but (t² + 1)(t − 2) does not
split over R because t² + 1 cannot be factored into a product of linear factors.
However, (t² + 1)(t − 2) does split over C because it factors into the product
(t + i)(t − i)(t − 2). If f(t) is the characteristic polynomial of a linear operator
or a matrix over a field F, then the statement that f(t) splits is understood
to mean that it splits over F.
Theorem 5.6. The characteristic polynomial of any diagonalizable linear
operator splits.
Proof. Let T be a diagonalizable linear operator on the n-dimensional
vector space V, and let β be an ordered basis for V such that [T]_β = D is a
diagonal matrix. Suppose that
\[
D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & \vdots & & \vdots\\ 0 & 0 & \cdots & \lambda_n\end{pmatrix},
\]
and let f(t) be the characteristic polynomial of T. Then
\[
f(t) = \det(D - tI) = \det\begin{pmatrix} \lambda_1 - t & 0 & \cdots & 0\\ 0 & \lambda_2 - t & \cdots & 0\\ \vdots & \vdots & & \vdots\\ 0 & 0 & \cdots & \lambda_n - t\end{pmatrix}
= (\lambda_1 - t)(\lambda_2 - t)\cdots(\lambda_n - t) = (-1)^n(t - \lambda_1)(t - \lambda_2)\cdots(t - \lambda_n).
\]
From this theorem, it is clear that if T is a diagonalizable linear operator
on an n-dimensional vector space that fails to have distinct eigenvalues, then
the characteristic polynomial of T must have repeated zeros.
The converse of Theorem 5.6 is false; that is, the characteristic polynomial
of T may split, but T need not be diagonalizable. (See Example 3, which
follows.) The following concept helps us determine when an operator whose
characteristic polynomial splits is diagonalizable.
Definition. Let λ be an eigenvalue of a linear operator or matrix with
characteristic polynomial f(t). The (algebraic) multiplicity of λ is the
largest positive integer k for which (t − λ)^k is a factor of f(t).
Example 2
Let A ∈ M_{3×3}(R) have characteristic polynomial f(t) = −(t − 3)²(t − 4).
Then λ = 3 is an eigenvalue of A with multiplicity 2, and λ = 4 is an
eigenvalue of A with multiplicity 1. •
If T is a diagonalizable linear operator on a finite-dimensional vector space
V, then there is an ordered basis β for V consisting of eigenvectors of T. We
know from Theorem 5.1 (p. 246) that [T]_β is a diagonal matrix in which the
diagonal entries are the eigenvalues of T. Since the characteristic polynomial
of T is det([T]_β − tI), it is easily seen that each eigenvalue of T must occur
as a diagonal entry of [T]_β exactly as many times as its multiplicity. Hence
β contains as many (linearly independent) eigenvectors corresponding to an
eigenvalue as the multiplicity of that eigenvalue. So the number of linearly
independent eigenvectors corresponding to a given eigenvalue is of interest in
determining whether an operator can be diagonalized. Recalling from Theorem
5.4 (p. 250) that the eigenvectors of T corresponding to the eigenvalue
λ are the nonzero vectors in the null space of T − λI, we are led naturally to
the study of this set.
Definition. Let T be a linear operator on a vector space V, and let
λ be an eigenvalue of T. Define E_λ = {x ∈ V : T(x) = λx} = N(T − λI_V).
The set E_λ is called the eigenspace of T corresponding to the eigenvalue
λ. Analogously, we define the eigenspace of a square matrix A to be the
eigenspace of L_A.
Clearly, E_λ is a subspace of V consisting of the zero vector and the eigenvectors
of T corresponding to the eigenvalue λ. The maximum number of
linearly independent eigenvectors of T corresponding to the eigenvalue λ is
therefore the dimension of E_λ. Our next result relates this dimension to the
multiplicity of λ.
Theorem 5.7. Let T be a linear operator on a finite-dimensional vector
space V, and let λ be an eigenvalue of T having multiplicity m. Then
1 ≤ dim(E_λ) ≤ m.
Proof. Choose an ordered basis {v₁, v₂, ..., v_p} for E_λ, extend it to an ordered
basis β = {v₁, v₂, ..., v_p, v_{p+1}, ..., vₙ} for V, and let A = [T]_β. Observe
that vᵢ (1 ≤ i ≤ p) is an eigenvector of T corresponding to λ, and therefore
\[
A = \begin{pmatrix} \lambda I_p & B\\ O & C\end{pmatrix}.
\]
By Exercise 21 of Section 4.3, the characteristic polynomial of T is
\[
f(t) = \det(A - tI_n) = \det\begin{pmatrix} (\lambda - t)I_p & B\\ O & C - tI_{n-p}\end{pmatrix}
= \det((\lambda - t)I_p)\,\det(C - tI_{n-p}) = (\lambda - t)^p g(t),
\]
where g(t) is a polynomial. Thus (λ − t)^p is a factor of f(t), and hence the
multiplicity of λ is at least p. But dim(E_λ) = p, and so dim(E_λ) ≤ m.
Example 3
Let T be the linear operator on P₂(R) defined by T(f(x)) = f′(x). The
matrix representation of T with respect to the standard ordered basis β for
P₂(R) is
\[
[\mathsf{T}]_\beta = \begin{pmatrix} 0 & 1 & 0\\ 0 & 0 & 2\\ 0 & 0 & 0\end{pmatrix}.
\]
Consequently, the characteristic polynomial of T is
\[
\det([\mathsf{T}]_\beta - tI) = \det\begin{pmatrix} -t & 1 & 0\\ 0 & -t & 2\\ 0 & 0 & -t\end{pmatrix} = -t^3.
\]
Thus T has only one eigenvalue (λ = 0) with multiplicity 3. Solving T(f(x)) =
f′(x) = 0 shows that E_λ = N(T − λI) = N(T) is the subspace of P₂(R) consisting
of the constant polynomials. So {1} is a basis for E_λ, and therefore
dim(E_λ) = 1. Consequently, there is no basis for P₂(R) consisting of eigenvectors
of T, and therefore T is not diagonalizable. •
Example 4
Let T be the linear operator on R³ defined by
\[
\mathsf{T}\begin{pmatrix} a_1\\ a_2\\ a_3\end{pmatrix} = \begin{pmatrix} 4a_1 + a_3\\ 2a_1 + 3a_2 + 2a_3\\ a_1 + 4a_3\end{pmatrix}.
\]
We determine the eigenspace of T corresponding to each eigenvalue. Let β
be the standard ordered basis for R³. Then
\[
[\mathsf{T}]_\beta = \begin{pmatrix} 4 & 0 & 1\\ 2 & 3 & 2\\ 1 & 0 & 4\end{pmatrix},
\]
and hence the characteristic polynomial of T is
\[
\det([\mathsf{T}]_\beta - tI) = \det\begin{pmatrix} 4-t & 0 & 1\\ 2 & 3-t & 2\\ 1 & 0 & 4-t\end{pmatrix} = -(t-5)(t-3)^2.
\]
So the eigenvalues of T are λ₁ = 5 and λ₂ = 3 with multiplicities 1 and 2,
respectively.

EAX is the solution space of the system of linear equations
-xi + x3 = 0
2xi - 2x2 + 2x3 = 0
xi - x3 = 0.
It is easily seen (using the techniques of Chapter 3) that
is a basis for EA, • Hence dim(EA,) = 1.
Similarly, EA2 = N(T — A2I) is the solution space of the system
x\ + x3 = 0
2xi + 2x3 = 0
x\ + x3 = 0.
Since the unknown x2 does not appear in this system, we assign it a para­
metric value, say, X2 = s, and solve the system for xi and x3, introducing
another parameter t. The result is the general solution to the system
0 , for s, t G R.
It follows that
is a basis for EA2, and dim(EA2) = 2.
In this case, the multiplicity of each eigenvalue Ai is equal to the dimension
of the corresponding eigenspace EA^ . Observe that the union of the two bases
just derived, namely,
is linearly independent and hence is a basis for R3 consisting of eigenvectors
of T. Consequently, T is diagonalizable. •
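The computations in Example 4 are easy to verify numerically. The following sketch (an illustration assuming NumPy, not part of the text) checks that each eigenspace of [T]_β has dimension equal to the multiplicity of the corresponding eigenvalue.

```python
import numpy as np

A = np.array([[4., 0., 1.],
              [2., 3., 2.],
              [1., 0., 4.]])          # [T]_beta from Example 4

for lam, mult in [(5., 1), (3., 2)]:
    # dim E_lambda = 3 - rank(A - lambda*I)
    dim_eigenspace = 3 - np.linalg.matrix_rank(A - lam * np.eye(3))
    print(lam, dim_eigenspace == mult)     # True, True
```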

Examples 3 and 4 suggest that an operator whose characteristic polynomial splits is diagonalizable if and only if the dimension of each eigenspace
is equal to the multiplicity of the corresponding eigenvalue. This is indeed
true, as we now show. We begin with the following lemma, which is a slight
variation of Theorem 5.5.
Lemma. Let T be a linear operator, and let λ1, λ2, ..., λk be distinct
eigenvalues of T. For each i = 1, 2, ..., k, let v_i ∈ E_{λi}, the eigenspace corresponding to λi. If
v1 + v2 + ··· + vk = 0,
then v_i = 0 for all i.
Proof. Suppose otherwise. By renumbering if necessary, suppose that, for
some m with 1 ≤ m ≤ k, we have v_i ≠ 0 for 1 ≤ i ≤ m, and v_i = 0 for i > m. Then, for
each i ≤ m, v_i is an eigenvector of T corresponding to λi and
v1 + v2 + ··· + vm = 0.
But this contradicts Theorem 5.5, which states that these v_i's are linearly
independent. We conclude, therefore, that v_i = 0 for all i.
Theorem 5.8. Let T be a linear operator on a vector space V, and let
λ1, λ2, ..., λk be distinct eigenvalues of T. For each i = 1, 2, ..., k, let S_i
be a finite linearly independent subset of the eigenspace E_{λi}. Then S =
S1 ∪ S2 ∪ ··· ∪ Sk is a linearly independent subset of V.
Proof. Suppose that for each i
S_i = {v_{i1}, v_{i2}, ..., v_{i n_i}}.
Then S = {v_{ij} : 1 ≤ j ≤ n_i and 1 ≤ i ≤ k}. Consider any scalars {a_{ij}} such
that
\[ \sum_{i=1}^{k}\sum_{j=1}^{n_i} a_{ij} v_{ij} = 0. \]
For each i, let
\[ w_i = \sum_{j=1}^{n_i} a_{ij} v_{ij}. \]
Then w_i ∈ E_{λi} for each i, and w1 + ··· + wk = 0. Therefore, by the lemma,
w_i = 0 for all i. But each S_i is linearly independent, and hence a_{ij} = 0 for
all j. We conclude that S is linearly independent.

Theorem 5.8 tells us how to construct a linearly independent subset of
eigenvectors, namely, by collecting bases for the individual eigenspaces. The
next theorem tells us when the resulting set is a basis for the entire space.
Theorem 5.9. Let T be a linear operator on a finite-dimensional vector
space V such that the characteristic polynomial of T splits. Let λ1, λ2, ..., λk
be the distinct eigenvalues of T. Then
(a) T is diagonalizable if and only if the multiplicity of λi is equal to
dim(E_{λi}) for all i.
(b) If T is diagonalizable and β_i is an ordered basis for E_{λi} for each i, then
β = β1 ∪ β2 ∪ ··· ∪ βk is an ordered basis² for V consisting of eigenvectors
of T.
Proof. For each i, let m_i denote the multiplicity of λi, d_i = dim(E_{λi}), and
n = dim(V).
First, suppose that T is diagonalizable. Let β be a basis for V consisting
of eigenvectors of T. For each i, let β_i = β ∩ E_{λi}, the set of vectors in β that
are eigenvectors corresponding to λi, and let n_i denote the number of vectors
in β_i. Then n_i ≤ d_i for each i because β_i is a linearly independent subset of
a subspace of dimension d_i, and d_i ≤ m_i by Theorem 5.7. The n_i's sum to n
because β contains n vectors. The m_i's also sum to n because the degree of
the characteristic polynomial of T is equal to the sum of the multiplicities of
the eigenvalues. Thus
\[ n = \sum_{i=1}^{k} n_i \le \sum_{i=1}^{k} d_i \le \sum_{i=1}^{k} m_i = n. \]
It follows that
\[ \sum_{i=1}^{k} (m_i - d_i) = 0. \]
Since (m_i − d_i) ≥ 0 for all i, we conclude that m_i = d_i for all i.
Conversely, suppose that m_i = d_i for all i. We simultaneously show that
T is diagonalizable and prove (b). For each i, let β_i be an ordered basis for
E_{λi}, and let β = β1 ∪ β2 ∪ ··· ∪ βk. By Theorem 5.8, β is linearly independent.
Furthermore, since d_i = m_i for all i, β contains
\[ \sum_{i=1}^{k} d_i = \sum_{i=1}^{k} m_i = n \]
vectors. Therefore β is an ordered basis for V consisting of eigenvectors of T,
and we conclude that T is diagonalizable.

² We regard β1 ∪ β2 ∪ ··· ∪ βk as an ordered basis in the natural way: the vectors
in β1 are listed first (in the same order as in β1), then the vectors in β2 (in the same
order as in β2), etc.
This theorem completes our study of the diagonalization problem. We
summarize our results.
Test for Diagonalization
Let T be a linear operator on an n-dimensional vector space V. Then T
is diagonalizable if and only if both of the following conditions hold.
1. The characteristic polynomial of T splits.
2. For each eigenvalue λ of T, the multiplicity of λ equals n − rank(T − λI).
These same conditions can be used to test if a square matrix A is diagonalizable because diagonalizability of A is equivalent to diagonalizability of the
operator L_A.
If T is a diagonalizable operator and β1, β2, ..., βk are ordered bases for
the eigenspaces of T, then the union β = β1 ∪ β2 ∪ ··· ∪ βk is an ordered basis
for V consisting of eigenvectors of T, and hence [T]_β is a diagonal matrix.
When testing T for diagonalizability, it is usually easiest to choose a convenient basis α for V and work with B = [T]_α. If the characteristic polynomial
of B splits, then use condition 2 above to check if the multiplicity of each
repeated eigenvalue of B equals n − rank(B − λI). (By Theorem 5.7, condition 2 is automatically satisfied for eigenvalues of multiplicity 1.) If so, then
B, and hence T, is diagonalizable.
If T is diagonalizable and a basis β for V consisting of eigenvectors of T
is desired, then we first find a basis for each eigenspace of B. The union of
these bases is a basis γ for Fⁿ consisting of eigenvectors of B. Each vector
in γ is the coordinate vector relative to α of an eigenvector of T. The set
consisting of these n eigenvectors of T is the desired basis β.
Furthermore, if A is an n × n diagonalizable matrix, we can use the corollary to Theorem 2.23 (p. 115) to find an invertible n × n matrix Q and a
diagonal n × n matrix D such that Q⁻¹AQ = D. The matrix Q has as its
columns the vectors in a basis of eigenvectors of A, and D has as its jth
diagonal entry the eigenvalue of A corresponding to the jth column of Q.
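The test translates directly into a short computation. The sketch below is only an illustration (it assumes NumPy and works numerically over C, where the characteristic polynomial always splits): it compares each multiplicity with n − rank(B − λI) and, when the test succeeds, assembles Q from bases of the eigenspaces.

```python
import numpy as np

def null_space_basis(M, tol=1e-10):
    """Columns form an orthonormal basis of N(M), computed from the SVD."""
    _, s, vh = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return vh[rank:].conj().T

def diagonalize(B, tol=1e-8):
    """Return (Q, D) with Q^{-1} B Q = D, or None if the test fails."""
    n = B.shape[0]
    eigenvalues = np.linalg.eigvals(B)
    distinct = []
    for lam in eigenvalues:                      # group numerically equal eigenvalues
        if not any(abs(lam - mu) < tol for mu in distinct):
            distinct.append(lam)
    columns = []
    for lam in distinct:
        mult = int(np.sum(np.abs(eigenvalues - lam) < tol))
        basis = null_space_basis(B - lam * np.eye(n))
        if basis.shape[1] != mult:               # condition 2 of the test fails
            return None
        columns.append(basis)
    Q = np.hstack(columns)
    D = np.linalg.inv(Q) @ B @ Q
    return Q, D

B = np.array([[4., 0., 1.], [2., 3., 2.], [1., 0., 4.]])   # the matrix of Example 4
print(diagonalize(B) is not None)                          # True: B is diagonalizable
```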
We now consider some examples illustrating the preceding ideas.
Example 5
We test the matrix
\[ A = \begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{pmatrix} \in \mathsf{M}_{3\times 3}(R) \]
for diagonalizability.
The characteristic polynomial of A is det(A − tI) = −(t − 4)(t − 3)², which
splits, and so condition 1 of the test for diagonalization is satisfied. Also A
has eigenvalues λ1 = 4 and λ2 = 3 with multiplicities 1 and 2, respectively.
Since λ1 has multiplicity 1, condition 2 is satisfied for λ1. Thus we need only
test condition 2 for λ2. Because
\[ A - \lambda_2 I = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]
has rank 2, we see that 3 − rank(A − λ2 I) = 1, which is not the multiplicity
of λ2. Thus condition 2 fails for λ2, and A is therefore not diagonalizable.
Example 6
Let T be the linear operator on P₂(R) defined by
\[ T(f(x)) = f(1) + f'(0)x + (f'(0) + f''(0))x^2. \]
We first test T for diagonalizability. Let α denote the standard ordered basis
for P₂(R) and B = [T]_α. Then
\[ B = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 2 \end{pmatrix}. \]
The characteristic polynomial of B, and hence of T, is −(t − 1)²(t − 2), which
splits. Hence condition 1 of the test for diagonalization is satisfied. Also B
has the eigenvalues λ1 = 1 and λ2 = 2 with multiplicities 2 and 1, respectively.
Condition 2 is satisfied for λ2 because it has multiplicity 1. So we need only
verify condition 2 for λ1 = 1. For this case,
\[ 3 - \operatorname{rank}(B - \lambda_1 I) = 3 - \operatorname{rank}\begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix} = 2, \]
which is equal to the multiplicity of λ1. Therefore T is diagonalizable.
We now find an ordered basis γ for R³ of eigenvectors of B. We consider
each eigenvalue separately.
The eigenspace corresponding to λ1 = 1 is
\[ \mathsf{E}_{\lambda_1} = \left\{ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \in R^3 : \begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \right\}, \]
which is the solution space for the system
x2 + x3 = 0,
and has
\[ \gamma_1 = \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix} \right\} \]
as a basis.
The eigenspace corresponding to λ2 = 2 is
\[ \mathsf{E}_{\lambda_2} = \left\{ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \in R^3 : \begin{pmatrix} -1 & 1 & 1 \\ 0 & -1 & 0 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \right\}, \]
which is the solution space for the system
−x1 + x2 + x3 = 0
x2 = 0,
and has
\[ \gamma_2 = \left\{ \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \right\} \]
as a basis.
Let
\[ \gamma = \gamma_1 \cup \gamma_2 = \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \right\}. \]
Then γ is an ordered basis for R³ consisting of eigenvectors of B.
Finally, observe that the vectors in γ are the coordinate vectors relative
to α of the vectors in the set
β = {1, −x + x², 1 + x²},
which is an ordered basis for P₂(R) consisting of eigenvectors of T. Thus
\[ [T]_\beta = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}. \quad\bullet \]

Our next example is an application of diagonalization that is of interest
in Section 5.3.
Example 7
Let
\[ A = \begin{pmatrix} 0 & -2 \\ 1 & 3 \end{pmatrix}. \]
We show that A is diagonalizable and find a 2 × 2 matrix Q such that Q⁻¹AQ
is a diagonal matrix. We then show how to use this result to compute Aⁿ for
any positive integer n.
First observe that the characteristic polynomial of A is (t − 1)(t − 2), and
hence A has two distinct eigenvalues, λ1 = 1 and λ2 = 2. By applying the
corollary to Theorem 5.5 to the operator L_A, we see that A is diagonalizable.
Moreover,
\[ \gamma_1 = \left\{ \begin{pmatrix} -2 \\ 1 \end{pmatrix} \right\} \quad\text{and}\quad \gamma_2 = \left\{ \begin{pmatrix} -1 \\ 1 \end{pmatrix} \right\} \]
are bases for the eigenspaces E_{λ1} and E_{λ2}, respectively. Therefore
\[ \gamma = \gamma_1 \cup \gamma_2 = \left\{ \begin{pmatrix} -2 \\ 1 \end{pmatrix}, \begin{pmatrix} -1 \\ 1 \end{pmatrix} \right\} \]
is an ordered basis for R² consisting of eigenvectors of A. Let
\[ Q = \begin{pmatrix} -2 & -1 \\ 1 & 1 \end{pmatrix}, \]
the matrix whose columns are the vectors in γ. Then, by the corollary to
Theorem 2.23 (p. 115),
\[ D = Q^{-1}AQ = [\mathsf{L}_A]_\gamma = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}. \]
To find Aⁿ for any positive integer n, observe that A = QDQ⁻¹. Therefore
\[ A^n = (QDQ^{-1})^n = (QDQ^{-1})(QDQ^{-1})\cdots(QDQ^{-1}) = QD^nQ^{-1} = \begin{pmatrix} -2 & -1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 2^n \end{pmatrix}\begin{pmatrix} -1 & -1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 2 - 2^n & 2 - 2^{n+1} \\ -1 + 2^n & -1 + 2^{n+1} \end{pmatrix}. \]
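The closed form for Aⁿ can be checked numerically; the sketch below (assuming NumPy; illustrative only) compares QDⁿQ⁻¹ with the formula above for a few values of n.

```python
import numpy as np

A = np.array([[0., -2.], [1., 3.]])
Q = np.array([[-2., -1.], [1., 1.]])

for n in range(1, 6):
    via_diagonalization = Q @ np.diag([1.0, 2.0 ** n]) @ np.linalg.inv(Q)
    closed_form = np.array([[2 - 2 ** n, 2 - 2 ** (n + 1)],
                            [-1 + 2 ** n, -1 + 2 ** (n + 1)]])
    assert np.allclose(via_diagonalization, closed_form)
    assert np.allclose(np.linalg.matrix_power(A, n), closed_form)
print("A^n formula verified for n = 1, ..., 5")
```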

We now consider an application that uses diagonalization to solve a system
of differential equations.
Systems of Differential Equations
Consider the system of differential equations
x1′ = 3x1 + x2 + x3
x2′ = 2x1 + 4x2 + 2x3
x3′ = −x1 − x2 + x3,
where, for each i, x_i = x_i(t) is a differentiable real-valued function of the
real variable t. Clearly, this system has a solution, namely, the solution in
which each Xi(t) is the zero function. We determine all of the solutions to
this system.
Let x: R → R³ be the function defined by
\[ x(t) = \begin{pmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{pmatrix}. \]
The derivative of x, denoted x′, is defined by
\[ x'(t) = \begin{pmatrix} x_1'(t) \\ x_2'(t) \\ x_3'(t) \end{pmatrix}. \]
Let
\[ A = \begin{pmatrix} 3 & 1 & 1 \\ 2 & 4 & 2 \\ -1 & -1 & 1 \end{pmatrix} \]
be the coefficient matrix of the given system, so that we can rewrite the
system as the matrix equation x′ = Ax.
It can be verified that for
\[ Q = \begin{pmatrix} -1 & 0 & -1 \\ 0 & -1 & -2 \\ 1 & 1 & 1 \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{pmatrix} \]
we have Q⁻¹AQ = D. Substitute A = QDQ⁻¹ into x′ = Ax to obtain
x′ = QDQ⁻¹x or, equivalently, Q⁻¹x′ = DQ⁻¹x. The function y: R → R³
defined by y(t) = Q⁻¹x(t) can be shown to be differentiable, and y′ = Q⁻¹x′
(see Exercise 16). Hence the original system can be written as y′ = Dy.

Since D is a diagonal matrix, the system y′ = Dy is easy to solve. Setting
\[ y(t) = \begin{pmatrix} y_1(t) \\ y_2(t) \\ y_3(t) \end{pmatrix}, \]
we can rewrite y′ = Dy as
\[ \begin{pmatrix} y_1'(t) \\ y_2'(t) \\ y_3'(t) \end{pmatrix} = \begin{pmatrix} 2y_1(t) \\ 2y_2(t) \\ 4y_3(t) \end{pmatrix}. \]
The three equations
y1′ = 2y1
y2′ = 2y2
y3′ = 4y3
are independent of each other, and thus can be solved individually. It is
easily seen (as in Example 3 of Section 5.1) that the general solution to these
equations is y1(t) = c1 e^{2t}, y2(t) = c2 e^{2t}, and y3(t) = c3 e^{4t}, where c1, c2, and
c3 are arbitrary constants. Finally,
\[ \begin{pmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{pmatrix} = x(t) = Qy(t) = \begin{pmatrix} -1 & 0 & -1 \\ 0 & -1 & -2 \\ 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} c_1 e^{2t} \\ c_2 e^{2t} \\ c_3 e^{4t} \end{pmatrix} = \begin{pmatrix} -c_1 e^{2t} - c_3 e^{4t} \\ -c_2 e^{2t} - 2c_3 e^{4t} \\ c_1 e^{2t} + c_2 e^{2t} + c_3 e^{4t} \end{pmatrix} \]
yields the general solution of the original system. Note that this solution can
be written as
\[ x(t) = e^{2t}\left[ c_1\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} + c_2\begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix} \right] + e^{4t}\left[ c_3\begin{pmatrix} -1 \\ -2 \\ 1 \end{pmatrix} \right]. \]
The expressions in brackets are arbitrary vectors in E_{λ1} and E_{λ2}, respectively,
where λ1 = 2 and λ2 = 4. Thus the general solution of the original system is
x(t) = e^{2t}z1 + e^{4t}z2, where z1 ∈ E_{λ1} and z2 ∈ E_{λ2}. This result is generalized
in Exercise 15.
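The same diagonalization can be carried out numerically. The sketch below is only an illustration (it assumes NumPy, and the initial condition x(0) = (1, 1, 1) is a made-up choice); the eigenvector matrix returned by NumPy will generally differ from the Q used above by scaling of its columns, which does not affect the solution.

```python
import numpy as np

A = np.array([[ 3., 1., 1.],
              [ 2., 4., 2.],
              [-1., -1., 1.]])

w, V = np.linalg.eig(A)                 # columns of V are eigenvectors of A
print(np.round(np.sort(w.real), 6))     # [2. 2. 4.]

x0 = np.array([1., 1., 1.])             # hypothetical initial condition
c = np.linalg.solve(V, x0)              # coordinates of x(0) in the eigenvector basis

def x(t):
    # x(t) = sum_i c_i e^{lambda_i t} v_i, i.e. V diag(e^{lambda_i t}) V^{-1} x(0)
    return V @ (np.exp(w * t) * c)

# Check that x'(t) = A x(t) using a central difference quotient.
t, h = 0.7, 1e-6
print(np.allclose((x(t + h) - x(t - h)) / (2 * h), A @ x(t), atol=1e-4))   # True
```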
Direct Sums*
Let T be a linear operator on a finite-dimensional vector space V. There
is a way of decomposing V into simpler subspaces that offers insight into the

behavior of T. This approach is especially useful in Chapter 7, where we study
nondiagonalizable linear operators. In the case of diagonalizable operators,
the simpler subspaces are the eigenspaces of the operator.
Definition. Let W1, W2, ..., Wk be subspaces of a vector space V. We
define the sum of these subspaces to be the set
{v1 + v2 + ··· + vk : v_i ∈ Wi for 1 ≤ i ≤ k},
which we denote by W1 + W2 + ··· + Wk or \( \sum_{i=1}^{k} W_i \).
It is a simple exercise to show that the sum of subspaces of a vector space
is also a subspace.
Example 8
Let V = R³, let W1 denote the xy-plane, and let W2 denote the yz-plane.
Then R³ = W1 + W2 because, for any vector (a, b, c) ∈ R³, we have
(a, b, c) = (a, 0, 0) + (0, b, c),
where (a, 0, 0) ∈ W1 and (0, b, c) ∈ W2. •
Notice that in Example 8 the representation of (a, b, c) as a sum of vectors
in Wi and W2 is not unique. For example, (a, b, c) = (a, b, 0) + (0,0. c) is
another representation. Because we are often interested in sums for which
representations are unique, we introduce a condition that assures this out­
come. The definition of direct sum that follows is a generalization of the
definition given in the exercises of Section 1.3.
Definition. Let W1, W2, ..., Wk be subspaces of a vector space V. We
call V the direct sum of the subspaces W1, W2, ..., Wk and write V =
W1 ⊕ W2 ⊕ ··· ⊕ Wk, if
\[ V = \sum_{i=1}^{k} W_i \]
and
\[ W_j \cap \sum_{i \ne j} W_i = \{0\} \quad\text{for each } j\ (1 \le j \le k). \]
Example 9
Let V = R⁴, W1 = {(a, b, 0, 0) : a, b ∈ R}, W2 = {(0, 0, c, 0) : c ∈ R}, and
W3 = {(0, 0, 0, d) : d ∈ R}. For any (a, b, c, d) ∈ V,
(a, b, c, d) = (a, b, 0, 0) + (0, 0, c, 0) + (0, 0, 0, d) ∈ W1 + W2 + W3.

Thus
\[ V = \sum_{i=1}^{3} W_i. \]
To show that V is the direct sum of W1, W2, and W3, we must prove that
W1 ∩ (W2 + W3) = W2 ∩ (W1 + W3) = W3 ∩ (W1 + W2) = {0}. But these
equalities are obvious, and so V = W1 ⊕ W2 ⊕ W3. •
Our next result contains several conditions that are equivalent to the
definition of a direct sum.
Theorem 5.10. Let W1, W2, ..., Wk be subspaces of a finite-dimensional
vector space V. The following conditions are equivalent.
(a) V = W1 ⊕ W2 ⊕ ··· ⊕ Wk.
(b) \( V = \sum_{i=1}^{k} W_i \) and, for any vectors v1, v2, ..., vk such that v_i ∈ Wi
(1 ≤ i ≤ k), if v1 + v2 + ··· + vk = 0, then v_i = 0 for all i.
(c) Each vector v ∈ V can be uniquely written as v = v1 + v2 + ··· + vk,
where v_i ∈ Wi.
(d) If γi is an ordered basis for Wi (1 ≤ i ≤ k), then γ1 ∪ γ2 ∪ ··· ∪ γk is an
ordered basis for V.
(e) For each i = 1, 2, ..., k, there exists an ordered basis γi for Wi such
that γ1 ∪ γ2 ∪ ··· ∪ γk is an ordered basis for V.
Proof. Assume (a). We prove (b). Clearly
\[ V = \sum_{i=1}^{k} W_i. \]
Now suppose that v1, v2, ..., vk are vectors such that v_i ∈ Wi for all i and
v1 + v2 + ··· + vk = 0. Then for any j
\[ -v_j = \sum_{i \ne j} v_i \in \sum_{i \ne j} W_i. \]
But −v_j ∈ Wj, and hence
\[ -v_j \in W_j \cap \sum_{i \ne j} W_i = \{0\}. \]
So v_j = 0, proving (b).
Now assume (b). We prove (c). Let v ∈ V. By (b), there exist vectors
v1, v2, ..., vk such that v_i ∈ Wi and v = v1 + v2 + ··· + vk. We must show
that this representation is unique. Suppose also that v = w1 + w2 + ··· + wk,
where w_i ∈ Wi for all i. Then
(v1 − w1) + (v2 − w2) + ··· + (vk − wk) = 0.
But v_i − w_i ∈ Wi for all i, and therefore v_i − w_i = 0 for all i by (b). Thus
v_i = w_i for all i, proving the uniqueness of the representation.
Now assume (c). We prove (d). For each i, let γi be an ordered basis for
Wi. Since
\[ V = \sum_{i=1}^{k} W_i \]
by (c), it follows that γ1 ∪ γ2 ∪ ··· ∪ γk generates V. To show that this
set is linearly independent, consider vectors v_{ij} ∈ γi (j = 1, 2, ..., m_i and
i = 1, 2, ..., k) and scalars a_{ij} such that
\[ \sum_{i,j} a_{ij} v_{ij} = 0. \]
For each i, set
\[ w_i = \sum_{j=1}^{m_i} a_{ij} v_{ij}. \]
Then for each i, w_i ∈ span(γi) = Wi and
\[ w_1 + w_2 + \cdots + w_k = \sum_{i,j} a_{ij} v_{ij} = 0. \]
Since 0 ∈ Wi for each i and 0 + 0 + ··· + 0 = w1 + w2 + ··· + wk, (c) implies
that w_i = 0 for all i. Thus
\[ 0 = w_i = \sum_{j=1}^{m_i} a_{ij} v_{ij} \]
for each i. But each γi is linearly independent, and hence a_{ij} = 0 for all i
and j. Consequently γ1 ∪ γ2 ∪ ··· ∪ γk is linearly independent and therefore
is a basis for V.
Clearly (e) follows immediately from (d).
Finally, we assume (e) and prove (a). For each i, let γi be an ordered
basis for Wi such that γ1 ∪ γ2 ∪ ··· ∪ γk is an ordered basis for V. Then
\[ V = \operatorname{span}(\gamma_1 \cup \gamma_2 \cup \cdots \cup \gamma_k) = \operatorname{span}(\gamma_1) + \operatorname{span}(\gamma_2) + \cdots + \operatorname{span}(\gamma_k) = \sum_{i=1}^{k} W_i \]
by repeated applications of Exercise 14 of Section 1.4. Fix j (1 ≤ j ≤ k), and
suppose that, for some nonzero vector v ∈ V,
\[ v \in W_j \cap \sum_{i \ne j} W_i. \]
Then
\[ v \in W_j = \operatorname{span}(\gamma_j) \quad\text{and}\quad v \in \sum_{i \ne j} W_i = \operatorname{span}\Bigl(\,\bigcup_{i \ne j} \gamma_i\Bigr). \]
Hence v is a nontrivial linear combination of both γj and \( \bigcup_{i \ne j} \gamma_i \), so that
v can be expressed as a linear combination of γ1 ∪ γ2 ∪ ··· ∪ γk in more than
one way. But these representations contradict Theorem 1.8 (p. 43), and so
we conclude that
\[ W_j \cap \sum_{i \ne j} W_i = \{0\}, \]
proving (a).
With the aid of Theorem 5.10, we are able to characterize diagonalizability
in terms of direct sums.
Theorem 5.11. A linear operator T on a finite-dimensional vector space
V is diagonalizable if and only if V is the direct sum of the eigenspaces of T.
Proof. Let λ1, λ2, ..., λk be the distinct eigenvalues of T.
First suppose that T is diagonalizable, and for each i choose an ordered
basis γi for the eigenspace E_{λi}. By Theorem 5.9, γ1 ∪ γ2 ∪ ··· ∪ γk is a basis
for V, and hence V is a direct sum of the E_{λi}'s by Theorem 5.10.
Conversely, suppose that V is a direct sum of the eigenspaces of T. For
each i, choose an ordered basis γi of E_{λi}. By Theorem 5.10, the union
γ1 ∪ γ2 ∪ ··· ∪ γk is a basis for V. Since this basis consists of eigenvectors of
T, we conclude that T is diagonalizable.
Example 10
Let T be the linear operator on R⁴ defined by
T(a, b, c, d) = (a, b, 2c, 3d).
It is easily seen that T is diagonalizable with eigenvalues λ1 = 1, λ2 = 2,
and λ3 = 3. Furthermore, the corresponding eigenspaces coincide with the
subspaces W1, W2, and W3 of Example 9. Thus Theorem 5.11 provides us
with another proof that R⁴ = W1 ⊕ W2 ⊕ W3. •
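Theorem 5.11 can also be illustrated numerically: for a diagonalizable operator the dimensions of the eigenspaces sum to dim(V) (compare Exercise 20 below). A small sketch (assuming NumPy; illustrative only), using the operator of Example 10 in matrix form:

```python
import numpy as np

# Matrix of T(a, b, c, d) = (a, b, 2c, 3d) with respect to the standard basis
T = np.diag([1., 1., 2., 3.])

dims = [4 - np.linalg.matrix_rank(T - lam * np.eye(4)) for lam in (1., 2., 3.)]
print(dims, sum(dims) == 4)      # [2, 1, 1] True: R^4 is the direct sum of the eigenspaces
```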
EXERCISES
1. Label the following statements as true or false.
(a) Any linear operator on an n-dimensional vector space that has
fewer than n distinct eigenvalues is not diagonalizable.
(b) Two distinct eigenvectors corresponding to the same eigenvalue
are always linearly dependent.
(c) If λ is an eigenvalue of a linear operator T, then each vector in E_λ
is an eigenvector of T.
(d) If λ1 and λ2 are distinct eigenvalues of a linear operator T, then
E_{λ1} ∩ E_{λ2} = {0}.
(e) Let A ∈ M_{n×n}(F) and β = {v1, v2, ..., vn} be an ordered basis for
Fⁿ consisting of eigenvectors of A. If Q is the n × n matrix whose
jth column is vj (1 ≤ j ≤ n), then Q⁻¹AQ is a diagonal matrix.
(f) A linear operator T on a finite-dimensional vector space is diagonalizable if and only if the multiplicity of each eigenvalue λ equals
the dimension of E_λ.
(g) Every diagonalizable linear operator on a nonzero vector space has
at least one eigenvalue.
The following two items relate to the optional subsection on direct sums.
(h) If a vector space is the direct sum of subspaces W1, W2, ..., Wk,
then Wi ∩ Wj = {0} for i ≠ j.
(i) If
\[ V = \sum_{i=1}^{k} W_i \quad\text{and}\quad W_i \cap W_j = \{0\} \text{ for } i \ne j, \]
then V = W1 ⊕ W2 ⊕ ··· ⊕ Wk.
2. For each of the following matrices A ∈ M_{n×n}(R), test A for diagonalizability, and if A is diagonalizable, find an invertible matrix Q and a
diagonal matrix D such that Q⁻¹AQ = D.
(a) \( \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} \)   (b) \( \begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix} \)   (c) \( \begin{pmatrix} 1 & 4 \\ 3 & 2 \end{pmatrix} \)

3. For each of the following linear operators T on a vector space V, test
T for diagonalizability, and if T is diagonalizable, find a basis β for V
such that [T]_β is a diagonal matrix.
(a) V = P₃(R) and T is defined by T(f(x)) = f′(x) + f″(x).
(b) V = P₂(R) and T is defined by T(ax² + bx + c) = cx² + bx + a.
(c) V = R³ and T is defined by
(d) V = P₂(R) and T is defined by T(f(x)) = f(0) + f(1)(x + x²).
(e) V = C² and T is defined by T(z, w) = (z + iw, iz + w).
(f) V = M_{2×2}(R) and T is defined by T(A) = Aᵗ.
4. Prove the matrix version of the corollary to Theorem 5.5: If A ∈
M_{n×n}(F) has n distinct eigenvalues, then A is diagonalizable.
5. State and prove the matrix version of Theorem 5.6.
6. (a) .Justify the test for diagonalizability and the method for diagonal­
ization stated in this section,
(b) Formulate the results in (a) for matrices.
7. For
find an expression for A", where n is an arbitrary positive integer.
8. Suppose that A ∈ M_{n×n}(F) has two distinct eigenvalues, λ1 and λ2,
and that dim(E_{λ1}) = n − 1. Prove that A is diagonalizable.
9. Let T be a linear operator on a finite-dimensional vector space V, and
suppose there exists an ordered basis β for V such that [T]_β is an upper
triangular matrix.
(a) Prove that the characteristic polynomial for T splits.
(b) State and prove an analogous result for matrices.
The converse of (a) is treated in Exercise 32 of Section 5.4.

10. Let T be a linear operator on a finite-dimensional vector space V with
the distinct eigenvalues λ1, λ2, ..., λk and corresponding multiplicities
m1, m2, ..., mk. Suppose that β is a basis for V such that [T]_β is an
upper triangular matrix. Prove that the diagonal entries of [T]_β are
λ1, λ2, ..., λk and that each λi occurs mi times (1 ≤ i ≤ k).
11. Let A be an n × n matrix that is similar to an upper triangular matrix and has the distinct eigenvalues λ1, λ2, ..., λk with corresponding
multiplicities m1, m2, ..., mk. Prove the following statements.
(a) \( \operatorname{tr}(A) = \sum_{i=1}^{k} m_i \lambda_i \)
(b) \( \det(A) = (\lambda_1)^{m_1}(\lambda_2)^{m_2}\cdots(\lambda_k)^{m_k} \)
12. Let T be an invertible linear operator on a finite-dimensional vector
space V.
(a) Recall that for any eigenvalue λ of T, λ⁻¹ is an eigenvalue of T⁻¹
(Exercise 8 of Section 5.1). Prove that the eigenspace of T corresponding to λ is the same as the eigenspace of T⁻¹ corresponding
to λ⁻¹.
(b) Prove that if T is diagonalizable, then T⁻¹ is diagonalizable.
13. Let A ∈ M_{n×n}(F). Recall from Exercise 14 of Section 5.1 that A and
Aᵗ have the same characteristic polynomial and hence share the same
eigenvalues with the same multiplicities. For any eigenvalue λ of A and
Aᵗ, let E_λ and E′_λ denote the corresponding eigenspaces for A and Aᵗ,
respectively.
(a) Show by way of example that for a given common eigenvalue, these
two eigenspaces need not be the same.
(b) Prove that for any eigenvalue λ, dim(E_λ) = dim(E′_λ).
(c) Prove that if A is diagonalizable, then Aᵗ is also diagonalizable.
14. Find the general solution to each system of differential equations.
(a) x′ = x + y
    y′ = 3x − y
(b) x1′ = 8x1 + 10x2
    x2′ = −5x1 − 7x2
(c) x1′ = x1 + x3
    x2′ = x2 + x3
    x3′ = 2x3
15. Let
\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \]

be the coefficient matrix of the system of differential equations
x1′ = a11 x1 + a12 x2 + ··· + a1n xn
x2′ = a21 x1 + a22 x2 + ··· + a2n xn
⋮
xn′ = an1 x1 + an2 x2 + ··· + ann xn.
Suppose that A is diagonalizable and that the distinct eigenvalues of A
are λ1, λ2, ..., λk. Prove that a differentiable function x: R → Rⁿ is a
solution to the system if and only if x is of the form
x(t) = e^{λ1 t} z1 + e^{λ2 t} z2 + ··· + e^{λk t} zk,
where zi ∈ E_{λi} for i = 1, 2, ..., k. Use this result to prove that the set
of solutions to the system is an n-dimensional real vector space.
16. Let C ∈ M_{m×n}(R), and let Y be an n × p matrix of differentiable
functions. Prove (CY)′ = CY′, where (Y′)_{ij} = (Y_{ij})′ for all i, j.
Exercises 17 through 19 are concerned with simultaneous diagonalization.
Definitions. Two linear operators T and U on a finite-dimensional vector
space V are called simultaneously diagonalizable if there exists an ordered
basis β for V such that both [T]_β and [U]_β are diagonal matrices. Similarly,
A, B ∈ M_{n×n}(F) are called simultaneously diagonalizable if there exists
an invertible matrix Q ∈ M_{n×n}(F) such that both Q⁻¹AQ and Q⁻¹BQ are
diagonal matrices.
17. (a) Prove that if T and U are simultaneously diagonalizable linear
operators on a finite-dimensional vector space V, then the matrices
[T]_β and [U]_β are simultaneously diagonalizable for any ordered
basis β.
(b) Prove that if A and B are simultaneously diagonalizable matrices,
then L_A and L_B are simultaneously diagonalizable linear operators.
18. (a) Prove that if T and U are simultaneously diagonalizable operators,
then T and U commute (i.e., TU = UT).
(b) Show that if A and B are simultaneously diagonalizable matrices,
then A and B commute.
The converses of (a) and (b) are established in Exercise 25 of Section 5.4.
19. Let T be a diagonalizable linear operator on a finite-dimensional vector
space, and let m be any positive integer. Prove that T and Tm are
simultaneously diagonalizable.
Exercises 20 through 23 are concerned with direct sums.

20. Let W1, W2, ..., Wk be subspaces of a finite-dimensional vector space V
such that
\[ \sum_{i=1}^{k} W_i = V. \]
Prove that V is the direct sum of W1, W2, ..., Wk if and only if
\[ \dim(V) = \sum_{i=1}^{k} \dim(W_i). \]
21. Let V be a finite-dimensional vector space with a basis β, and let
β1, β2, ..., βk be a partition of β (i.e., β1, β2, ..., βk are subsets of β
such that β = β1 ∪ β2 ∪ ··· ∪ βk and βi ∩ βj = ∅ if i ≠ j). Prove that
V = span(β1) ⊕ span(β2) ⊕ ··· ⊕ span(βk).
22. Let T be a linear operator on a finite-dimensional vector space V, and
suppose that the distinct eigenvalues of T are λ1, λ2, ..., λk. Prove that
span({x ∈ V : x is an eigenvector of T}) = E_{λ1} ⊕ E_{λ2} ⊕ ··· ⊕ E_{λk}.
23. Let W1, W2, K1, K2, ..., Kp, M1, M2, ..., Mq be subspaces of a vector
space V such that W1 = K1 ⊕ K2 ⊕ ··· ⊕ Kp and W2 = M1 ⊕ M2 ⊕ ··· ⊕ Mq.
Prove that if W1 ∩ W2 = {0}, then
W1 + W2 = W1 ⊕ W2 = K1 ⊕ K2 ⊕ ··· ⊕ Kp ⊕ M1 ⊕ M2 ⊕ ··· ⊕ Mq.
5.3* MATRIX LIMITS AND MARKOV CHAINS
In this section, we apply what we have learned thus far in Chapter 5 to study
the limit of a sequence of powers A, A2,..., An,..., where A is a square
matrix with complex entries. Such sequences and their limits have practical
applications in the natural and social sciences.
We assume familiarity with limits of sequences of real numbers. The
limit of a sequence of complex numbers {z_m : m = 1, 2, ...} can be defined
in terms of the limits of the sequences of the real and imaginary parts: If
z_m = r_m + i s_m, where r_m and s_m are real numbers, and i is the imaginary
number such that i² = −1, then
\[ \lim_{m \to \infty} z_m = \lim_{m \to \infty} r_m + i \lim_{m \to \infty} s_m, \]
provided that lim_{m→∞} r_m and lim_{m→∞} s_m exist.

Definition. Let L, A1, A2, ... be n × p matrices having complex entries.
The sequence A1, A2, ... is said to converge to the n × p matrix L, called
the limit of the sequence, if
\[ \lim_{m \to \infty} (A_m)_{ij} = L_{ij} \]
for all 1 ≤ i ≤ n and 1 ≤ j ≤ p. To designate that L is the limit of the
sequence, we write
\[ \lim_{m \to \infty} A_m = L. \]
Example 1
If
^T-m —
1 -
3 J 3TO2 , • ( 2m+l
4 J m2 + l T * I m-1 J
1+i
then
then
\[ \lim_{m \to \infty} A_m = \begin{pmatrix} 1 & 0 & 3 + 2i \\ 0 & 2 & e \end{pmatrix}, \]
where e is the base of the natural logarithm. •
A simple, but important, property of matrix limits is contained in the next
theorem. Note the analogy with the familiar property of limits of sequences
of real numbers that asserts that if lim_{m→∞} a_m exists, then
\[ \lim_{m \to \infty} c\,a_m = c\left( \lim_{m \to \infty} a_m \right). \]
Theorem 5.12. Let A1, A2, ... be a sequence of n × p matrices with
complex entries that converges to the matrix L. Then for any P ∈ M_{r×n}(C)
and Q ∈ M_{p×s}(C),
\[ \lim_{m \to \infty} PA_m = PL \quad\text{and}\quad \lim_{m \to \infty} A_m Q = LQ. \]
Proof. For any i (1 ≤ i ≤ r) and j (1 ≤ j ≤ p),
\[ \lim_{m \to \infty} (PA_m)_{ij} = \lim_{m \to \infty} \sum_{k=1}^{n} P_{ik}(A_m)_{kj} = \sum_{k=1}^{n} P_{ik} \lim_{m \to \infty} (A_m)_{kj} = \sum_{k=1}^{n} P_{ik} L_{kj} = (PL)_{ij}. \]
Hence lim_{m→∞} PA_m = PL. The proof that lim_{m→∞} A_m Q = LQ is similar.
Corollary. Let A ∈ M_{n×n}(C) be such that lim_{m→∞} A^m = L. Then for any
invertible matrix Q ∈ M_{n×n}(C),
\[ \lim_{m \to \infty} (QAQ^{-1})^m = QLQ^{-1}. \]
Proof. Since
\[ (QAQ^{-1})^m = (QAQ^{-1})(QAQ^{-1})\cdots(QAQ^{-1}) = QA^mQ^{-1}, \]
we have
\[ \lim_{m \to \infty} (QAQ^{-1})^m = \lim_{m \to \infty} QA^mQ^{-1} = Q\left(\lim_{m \to \infty} A^m\right)Q^{-1} = QLQ^{-1} \]
by applying Theorem 5.12 twice.
In the discussion that follows, we frequently encounter the set
S = {λ ∈ C : |λ| < 1 or λ = 1}.
Geometrically, this set consists of the complex number 1 and the interior of
the unit disk (the disk of radius 1 centered at the origin). This set is of
interest because if λ is a complex number, then lim_{m→∞} λ^m exists if and only
if λ ∈ S. This fact, which is obviously true if λ is real, can be shown to be true
for complex numbers also.
The following important result gives necessary and sufficient conditions
for the existence of the type of limit under consideration.
Theorem 5.13. Let A be a square matrix with complex entries. Then
lim_{m→∞} A^m exists if and only if both of the following conditions hold.
(a) Every eigenvalue of A is contained in S.
(b) If 1 is an eigenvalue of A, then the dimension of the eigenspace corresponding to 1 equals the multiplicity of 1 as an eigenvalue of A.
One proof of this theorem, which relies on the theory of Jordan canonical
forms (Section 7.2), can be found in Exercise 19 of Section 7.2. A second
proof, which makes use of Schur's theorem (Theorem 6.14 of Section 6.4),
can be found in the article by S. H. Friedberg and A. J. Insel, "Convergence
of matrix powers," Int. J. Math. Educ. Sci. Technoi, 1992, Vol. 23, no. 5,
pp. 765-769.

The necessity of condition (a) is easily justified. For suppose that λ is an
eigenvalue of A such that λ ∉ S. Let v be an eigenvector of A corresponding
to λ. Regarding v as an n × 1 matrix, we see that if lim_{m→∞} A^m exists, then
\[ \lim_{m \to \infty} (A^m v) = \left( \lim_{m \to \infty} A^m \right) v = Lv \]
by Theorem 5.12, where L = lim_{m→∞} A^m. But lim_{m→∞} (A^m v) = lim_{m→∞} (λ^m v)
diverges because lim_{m→∞} λ^m does not exist. Hence if lim_{m→∞} A^m exists, then
condition (a) of Theorem 5.13 must hold.
Although we are unable to prove the necessity of condition (b) here, we
consider an example for which this condition fails. Observe that the characteristic polynomial for the matrix
\[ B = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \]
is (t − 1)², and hence B has eigenvalue λ = 1 with multiplicity 2. It can
easily be verified that dim(E_λ) = 1, so that condition (b) of Theorem 5.13
is violated. A simple mathematical induction argument can be used to show
that
\[ B^m = \begin{pmatrix} 1 & m \\ 0 & 1 \end{pmatrix}, \]
and therefore that lim_{m→∞} B^m does not exist. We see in Chapter 7 that if A
is a matrix for which condition (b) fails, then A is similar to a matrix whose
upper left 2 × 2 submatrix is precisely this matrix B.
In most of the applications involving matrix limits, the matrix is diag­
onalizable, and so condition (b) of Theorem 5.13 is automatically satisfied.
In this case, Theorem 5.13 reduces to the following theorem, which can be
proved using our previous results.
Theorem 5.14. Let A ∈ M_{n×n}(C) satisfy the following two conditions.
(i) Every eigenvalue of A is contained in S.
(ii) A is diagonalizable.
Then lim_{m→∞} A^m exists.
Proof. Since A is diagonalizable, there exists an invertible matrix Q such
that Q⁻¹AQ = D is a diagonal matrix. Suppose that
\[ D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}. \]

Because λ1, λ2, ..., λn are the eigenvalues of A, condition (i) requires that for
each i, either λi = 1 or |λi| < 1. Thus
\[ \lim_{m \to \infty} \lambda_i^m = \begin{cases} 1 & \text{if } \lambda_i = 1 \\ 0 & \text{otherwise.} \end{cases} \]
But since
\[ D^m = \begin{pmatrix} \lambda_1^m & 0 & \cdots & 0 \\ 0 & \lambda_2^m & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^m \end{pmatrix}, \]
the sequence D, D², ... converges to a limit L. Hence
\[ \lim_{m \to \infty} A^m = \lim_{m \to \infty} (QDQ^{-1})^m = QLQ^{-1} \]
by the corollary to Theorem 5.12.
The technique for computing lim Am used in the proof of Theorem 5.14
TO—»00
can be employed in actual computations, as we now illustrate. Let
A =
/7 _9 _15
( A 4 4 1
Using the methods in Sections 5.1 and 5.2, we obtain
0 0N
-\ 0
o i
such that Q~lAQ = D. Hence
lim Am= lim (QDQ~l)m = lim QDmQ~1 = Q ( lim Dm) Q'1
lim
TO—»<X>
/I
0

0
Hr
0
0
0
dr

Next, we consider an application that uses the limit of powers of a ma­
trix. Suppose that the population of a certain metropolitan area remains
constant but there is a continual movement of people between the city and
the suburbs. Specifically, let the entries of the following matrix A represent
the probabilities that someone living in the city or in the suburbs on January
1 will be living in each region on January 1 of the next year.

                                      Currently        Currently
                                      living in        living in
                                      the city         the suburbs
Living next year in the city        (   0.90             0.02    )
Living next year in the suburbs     (   0.10             0.98    )   = A
For instance, the probability that someone living in the city (on January 1)
will be living in the suburbs next year (on January 1) is 0.10. Notice that
since the entries of A are probabilities, they are nonnegative. Moreover, the
assumption of a constant population in the metropolitan area requires that
the sum of the entries of each column of A be 1.
Any square matrix having these two properties (nonnegative entries and
columns that sum to 1) is called a transition matrix or a stochastic ma­
trix. For an arbitrary n x n transition matrix M, the rows and columns
correspond to n states, and the entry M^ represents t he probability of mov­
ing from state j to state i in one stage.
In our example, there are two states (residing in the city and residing in
the suburbs). So, for example, A21 is the probability of moving from the
city to the suburbs in one stage, that is, in one year. We now determine the
Figure 5.3
probability that a city resident will be living in the suburbs after 2 years.
There are two different ways in which such a move can be made: remaining
in the city for 1 year and then moving to the suburbs, or moving to the
suburbs during the first year and remaining there the second year. (See

Figure 5.3.) The probability that a city dweller remains in the city for the
first year is 0.90, whereas the probability that the city dweller moves to the
suburbs during the first year is 0.10. Hence the probability that a city dweller
stays in the city for the first year and then moves to the suburbs during the
second year is the product (0.90) (0.10). Likewise, the probability that a city
dweller moves to the suburbs in the first year and remains in the suburbs
during the second year is the product (0.10)(0.98). Thus the probability that
a city dweller will be living in the suburbs after 2 years is the sum of these
products, (0.90) (0.10) + (0.10) (0.98) = 0.188. Observe that this number is
obtained by the same calculation as that which produces (A2)21, and hence
(A2)2i represents the probability that a city dweller will be living in the
suburbs after 2 years. In general, for any transition matrix M, the entry
(Mm)ij represents the probability of moving from state j to state i in m
stages.
Suppose additionally that 70% of the 2000 population of the metropolitan
area lived in the city and 30% lived in the suburbs. We record these data as
a column vector:
Proportion of city dwellers         ( 0.70 )
Proportion of suburb residents      ( 0.30 )  = P
Notice that the rows of P correspond to the states of residing in the city and
residing in the suburbs, respectively, and that these states are listed in the
same order as the listing in the transition matrix A. Observe also that the
column vector P contains nonnegative entries that sum to 1; such a vector is
called a probability vector. In this terminology, each column of a transition
matrix is a probability vector. It is often convenient to regard the entries of a
transition matrix or a probability vector as proportions or percentages instead
of probabilities, as we have already done with the probability vector P.
In the vector AP, the first coordinate is the sum (0.90)(0.70)+(0.02)(0.30).
The first term of this sum, (0.90)(0.70), represents the proportion of the 2000
metropolitan population that remained in the city during the next year, and
the second term, (0.02) (0.30), represents the proportion of the 2000 metropoli­
tan population that moved into the city during the next year. Hence the first
coordinate of AP represents the proportion of the metropolitan population
that was living in the city in 2001. Similarly, the second coordinate of
\[ AP = \begin{pmatrix} 0.636 \\ 0.364 \end{pmatrix} \]
represents the proportion of the metropolitan population that was living in
the suburbs in 2001. This argument can be easily extended to show that the
coordinates of
\[ A(AP) = A^2P \]

represent the proportions of the metropolitan population that were living
in each location in 2002. In general, the coordinates of AmP represent the
proportion of the metropolitan population that will be living in the city and
suburbs, respectively, after m stages (m years after 2000).
Will the city eventually be depleted if this trend continues? In view of
the preceding discussion, it is natural to define the eventual proportion of
the city dwellers and suburbanites to be the first and second coordinates,
respectively, of lim_{m→∞} A^m P. We now compute this limit. It is easily shown
that A is diagonalizable, and so there is an invertible matrix Q and a diagonal
matrix D such that Q⁻¹AQ = D. In fact, we may take
\[ Q = \begin{pmatrix} 1 & -1 \\ 5 & 1 \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} 1 & 0 \\ 0 & 0.88 \end{pmatrix}. \]
Therefore
\[ \lim_{m \to \infty} A^m = \lim_{m \to \infty} QD^mQ^{-1} = Q\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}Q^{-1} = \begin{pmatrix} \frac{1}{6} & \frac{1}{6} \\[4pt] \frac{5}{6} & \frac{5}{6} \end{pmatrix} = L. \]
Consequently
\[ \lim_{m \to \infty} A^m P = LP = \begin{pmatrix} \frac{1}{6} \\[4pt] \frac{5}{6} \end{pmatrix}. \]
Thus, eventually, 1/6 of the population will live in the city and 5/6 will live in the
suburbs each year. Note that the vector LP satisfies A(LP) = LP. Hence
LP is both a probability vector and an eigenvector of A corresponding to
the eigenvalue 1. Since the eigenspace of A corresponding to the eigenvalue
1 is one-dimensional, there is only one such vector, and LP is independent
of the initial choice of probability vector P. (See Exercise 15.) For example,
had the 2000 metropolitan population consisted entirely of city dwellers, the
limiting outcome would be the same.
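These limiting proportions are easy to reproduce numerically. The sketch below (an illustration assuming NumPy) raises the transition matrix to a large power and also normalizes an eigenvector for the eigenvalue 1; both give (1/6, 5/6).

```python
import numpy as np

A = np.array([[0.90, 0.02],
              [0.10, 0.98]])
P = np.array([0.70, 0.30])

# Brute force: A^m P for a large m approximates LP.
print(np.linalg.matrix_power(A, 500) @ P)     # approximately [0.1667, 0.8333]

# Alternatively, rescale an eigenvector for the eigenvalue 1 to a probability vector.
w, V = np.linalg.eig(A)
v = V[:, np.argmin(np.abs(w - 1))].real
print(v / v.sum())                            # [1/6, 5/6]
```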
In analyzing the city-suburb problem, we gave probabilistic interpreta­
tions of A2 and AP, showing that A2 is a transition matrix and AP is a
probability vector. In fact, the product of any two transition matrices is a
transition matrix, and the product of any transition matrix and probability
vector is a probability vector. A proof of these facts is a simple corollary
of the next theorem, which characterizes transition matrices and probability
vectors.
Theorem 5.15. Let M be an n × n matrix having real nonnegative entries,
let v be a column vector in Rⁿ having nonnegative coordinates, and let u ∈ Rⁿ
be the column vector in which each coordinate equals 1. Then
(a) M is a transition matrix if and only if Mᵗu = u;
(b) v is a probability vector if and only if uᵗv = (1).
Proof. Exercise.
Corollary.
(a) The product of two nxn transition matrices is an n x n transition
matrix. In particular, any power of a transition matrix is a transition
matrix.
(b) The product of a transition matrix and a probability vector is a prob­
ability vector.
Proof. Exercise.
The city-suburb problem is an example of a process in which elements of
a set are each classified as being in one of several fixed states that can switch
over time. In general, such a process is called a stochastic process. The
switching to a particular state is described by a probability, and in general
this probability depends on such factors as the state in question, the time
in question, some or all of the previous states in which the object has been
(including the current state), and the states that other objects are in or have
been in.
For instance, the object could be an American voter, and the state of the
object could be his or her preference of political party; or the object could
be a molecule of H20, and the states could be the three physical states in
which H20 can exist (solid, liquid, and gas). In these examples, all four of
the factors mentioned above influence the probability that an object is in a
particular state at a particular time.
If, however, the probability that an object in one state changes to a differ­
ent state in a fixed interval of time depends only on the two states (and not on
the time, earlier states, or other factors), then the stochastic process is called
a Markov process. If, in addition, the number of possible states is finite,
then the Markov process is called a Markov chain. We treated the city-
suburb example as a two-state Markov chain. Of course, a Markov process is
usually only an idealization of reality because the probabilities involved are
almost never constant over time.
With this in mind, we consider another Markov chain. A certain com­
munity college would like to obtain information about the likelihood that
students in various categories will graduate. The school classifies a student
as a sophomore or a freshman depending on the number of credits that the
student has earned. Data from the school indicate that, from one fall semester
to the next, 40% of the sophomores will graduate, 30% will remain sopho­
mores, and 30% will quit permanently. For freshmen, the data show that
10% will graduate by next fall, 50% will become sophomores, 20% will re­
main freshmen, and 20% will quit permanently. During the present year,

50% of the students at the school are sophomores and 50% are freshmen. As­
suming that the trend indicated by the data continues indefinitely, the school
would like to know
1. the percentage of the present students who will graduate, the percentage
who will be sophomores, the percentage who will be freshmen, and the
percentage who will quit school permanently by next fall;
2. the same percentages as in item 1 for the fall semester two years hence;
and
3. the probability that one of its present students will eventually graduate.
The preceding paragraph describes a four-state Markov chain with the
following states:
1. having graduated
2. being a sophomore
3. being a freshman
4. having quit permanently.

The given data provide us with the transition matrix
\[ A = \begin{pmatrix} 1 & 0.4 & 0.1 & 0 \\ 0 & 0.3 & 0.5 & 0 \\ 0 & 0 & 0.2 & 0 \\ 0 & 0.3 & 0.2 & 1 \end{pmatrix} \]
of the Markov chain. (Notice that students who have graduated or have quit
permanently are assumed to remain indefinitely in those respective states.
Thus a freshman who quits the school and returns during a later semester
is not regarded as having changed states- the student is assumed to have
remained in the state of being a freshman during the time he or she was not
enrolled.) Moreover, we are told that the present distribution of students is
half in each of states 2 and 3 and none in states 1 and 4. The vector
\[ P = \begin{pmatrix} 0 \\ 0.5 \\ 0.5 \\ 0 \end{pmatrix} \]
that describes the initial probability of being in each state is called the initial
probability vector for the Markov chain.
To answer question 1, we must determine the probabilities that a present
student will be in each state by next fall. As we have seen, these probabilities
are the coordinates of the vector
\[ AP = \begin{pmatrix} 1 & 0.4 & 0.1 & 0 \\ 0 & 0.3 & 0.5 & 0 \\ 0 & 0 & 0.2 & 0 \\ 0 & 0.3 & 0.2 & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 0.5 \\ 0.5 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.25 \\ 0.40 \\ 0.10 \\ 0.25 \end{pmatrix}. \]

Hence by next fall, 25% of the present students will graduate, 40% will be
sophomores, 10% will be freshmen, and 25% will quit the school permanently.
Similarly,
\[ A^2P = A(AP) = \begin{pmatrix} 1 & 0.4 & 0.1 & 0 \\ 0 & 0.3 & 0.5 & 0 \\ 0 & 0 & 0.2 & 0 \\ 0 & 0.3 & 0.2 & 1 \end{pmatrix}\begin{pmatrix} 0.25 \\ 0.40 \\ 0.10 \\ 0.25 \end{pmatrix} = \begin{pmatrix} 0.42 \\ 0.17 \\ 0.02 \\ 0.39 \end{pmatrix} \]
provides the information needed to answer question 2: within two years 42%
of the present students will graduate, 17% will be sophomores, 2% will be
freshmen, and 39% will quit school.
Finally, the answer to question 3 is provided by the vector LP, where
L = lim_{m→∞} A^m. For the matrices
\[ Q = \begin{pmatrix} 1 & 4 & 19 & 0 \\ 0 & -7 & -40 & 0 \\ 0 & 0 & 8 & 0 \\ 0 & 3 & 13 & 1 \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0.3 & 0 & 0 \\ 0 & 0 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \]
we have Q⁻¹AQ = D. Thus
\[ L = \lim_{m \to \infty} A^m = Q\left(\lim_{m \to \infty} D^m\right)Q^{-1} = \begin{pmatrix} 1 & 4 & 19 & 0 \\ 0 & -7 & -40 & 0 \\ 0 & 0 & 8 & 0 \\ 0 & 3 & 13 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}Q^{-1} = \begin{pmatrix} 1 & \frac{4}{7} & \frac{27}{56} & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & \frac{3}{7} & \frac{29}{56} & 1 \end{pmatrix}. \]
So
\[ LP = \begin{pmatrix} 1 & \frac{4}{7} & \frac{27}{56} & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & \frac{3}{7} & \frac{29}{56} & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 0.5 \\ 0.5 \\ 0 \end{pmatrix} = \begin{pmatrix} \frac{59}{112} \\ 0 \\ 0 \\ \frac{53}{112} \end{pmatrix}, \]
and hence the probability that one of the present students will graduate is 59/112.
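These figures can be checked by brute force. The sketch below (an illustration assuming NumPy) computes AP, A²P, and an approximation to LP by raising A to a large power; the result matches (59/112, 0, 0, 53/112).

```python
import numpy as np

A = np.array([[1.0, 0.4, 0.1, 0.0],
              [0.0, 0.3, 0.5, 0.0],
              [0.0, 0.0, 0.2, 0.0],
              [0.0, 0.3, 0.2, 1.0]])
P = np.array([0.0, 0.5, 0.5, 0.0])

print(A @ P)                                   # [0.25 0.40 0.10 0.25]
print(A @ (A @ P))                             # [0.42 0.17 0.02 0.39]

LP = np.linalg.matrix_power(A, 200) @ P
print(LP, 59 / 112, 53 / 112)                  # LP is approximately [0.5268 0 0 0.4732]
```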

In the preceding two examples, we saw that lim_{m→∞} A^m P, where A is the
transition matrix and P is the initial probability vector of the Markov chain,
gives the eventual proportions in each state. In general, however, the limit of
powers of a transition matrix need not exist. For example, if
\[ M = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \]
then lim_{m→∞} M^m does not exist because odd powers of M equal M and even
powers of M equal I. The reason that the limit fails to exist is that condition (a) of Theorem 5.13 does not hold for M (−1 is an eigenvalue). In
fact, it can be shown (see Exercise 20 of Section 7.2) that the only transition
matrices A such that lim_{m→∞} A^m does not exist are precisely those matrices for
which condition (a) of Theorem 5.13 fails to hold.
But even if the limit of powers of the transition matrix exists, the computation of the limit may be quite difficult. (The reader is encouraged to work
Exercise 6 to appreciate the truth of the last sentence.) Fortunately, there is
a large and important class of transition matrices for which this limit exists
and is easily computed; this is the class of regular transition matrices.
Definition. A transition matrix is called regular if some power of the
matrix contains only positive entries.
Example 2
The transition matrix
\[ \begin{pmatrix} 0.90 & 0.02 \\ 0.10 & 0.98 \end{pmatrix} \]
of the Markov chain used in the city-suburb problem is clearly regular because
each entry is positive. On the other hand, the transition matrix
\[ A = \begin{pmatrix} 1 & 0.4 & 0.1 & 0 \\ 0 & 0.3 & 0.5 & 0 \\ 0 & 0 & 0.2 & 0 \\ 0 & 0.3 & 0.2 & 1 \end{pmatrix} \]
of the Markov chain describing community college enrollments is not regular
because the first column of A^m is
\[ \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} \]
for any power m.

Observe that a regular transition matrix may contain zero entries. For
example,
\[ M = \begin{pmatrix} 0.9 & 0.5 & 0 \\ 0 & 0.5 & 0.4 \\ 0.1 & 0 & 0.6 \end{pmatrix} \]
is regular because every entry of M2 is positive. •
The remainder of this section is devoted to proving that, for a regular
transition matrix A, the limit of the sequence of powers of A exists and
has identical columns. From this fact, it is easy to compute this limit. In
the course of proving this result, we obtain some interesting bounds for the
magnitudes of eigenvalues of any square matrix. These bounds are given in
terms of the sum of the absolute values of the rows and columns of the matrix.
The necessary terminology is introduced in the definitions that follow.
Definitions. Let A ∈ M_{n×n}(C). For 1 ≤ i, j ≤ n, define ρ_i(A) to be the
sum of the absolute values of the entries of row i of A, and define ν_j(A) to be
equal to the sum of the absolute values of the entries of column j of A. Thus
\[ \rho_i(A) = \sum_{j=1}^{n} |A_{ij}| \quad\text{for } i = 1, 2, \ldots, n \]
and
\[ \nu_j(A) = \sum_{i=1}^{n} |A_{ij}| \quad\text{for } j = 1, 2, \ldots, n. \]
The row sum of A, denoted ρ(A), and the column sum of A, denoted ν(A),
are defined as
ρ(A) = max{ρ_i(A) : 1 ≤ i ≤ n}  and  ν(A) = max{ν_j(A) : 1 ≤ j ≤ n}.
Example 3
For the matrix
A =
ρ1(A) = 7, ρ2(A) = 6 + √5, ρ3(A) = 6, ν1(A) = 4 + √5, ν2(A) = 3, and
ν3(A) = 12. Hence ρ(A) = 6 + √5 and ν(A) = 12. •
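Because the matrix of Example 3 did not survive extraction, the sketch below uses a made-up matrix (an assumption for illustration only, with NumPy) to show how the row sum, the column sum, and the eigenvalue bound of Corollary 2 below can be checked.

```python
import numpy as np

A = np.array([[1, -2, 3],
              [0, 4j, 1],
              [2, 0, -1]])              # made-up matrix, for illustration only

row_sums = np.sum(np.abs(A), axis=1)    # rho_i(A)
col_sums = np.sum(np.abs(A), axis=0)    # nu_j(A)
rho, nu = row_sums.max(), col_sums.max()
print(rho, nu)

# Every eigenvalue has absolute value at most min(rho, nu); see Corollary 2 below.
print(all(abs(lam) <= min(rho, nu) + 1e-12 for lam in np.linalg.eigvals(A)))
```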

Our next results show that the smaller of ρ(A) and ν(A) is an upper
bound for the absolute values of eigenvalues of A. In the preceding example,
for instance, A has no eigenvalue with absolute value greater than 6 + √5.
To obtain a geometric view of the following theorem, we introduce some
terminology. For an n × n matrix A, we define the ith Gerschgorin disk C_i to
be the disk in the complex plane with center A_ii and radius r_i = ρ_i(A) − |A_ii|;
that is,
C_i = {z ∈ C : |z − A_ii| ≤ r_i}.
For example, consider the matrix
\[ A = \begin{pmatrix} 1 + 2i & 1 \\ 2i & -3 \end{pmatrix}. \]
For this matrix, C1 is the disk with center 1 + 2i and radius 1, and C2 is the
disk with center −3 and radius 2. (See Figure 5.4.)
Figure 5.4
Gerschgorin's disk theorem, stated below, tells us that all the eigenvalues
of A are located within these two disks. In particular, we see that 0 is not an
eigenvalue, and hence by Exercise 8(c) of Section 5.1, A is invertible.
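The claim is easy to check numerically for this matrix. The sketch below (an illustration assuming NumPy) computes the two eigenvalues and verifies that each lies in at least one Gerschgorin disk.

```python
import numpy as np

A = np.array([[1 + 2j, 1],
              [2j, -3]])

centers = np.diag(A)
radii = np.sum(np.abs(A), axis=1) - np.abs(centers)   # r_i = rho_i(A) - |A_ii|

for lam in np.linalg.eigvals(A):
    in_some_disk = any(abs(lam - c) <= r + 1e-12 for c, r in zip(centers, radii))
    print(lam, in_some_disk)                           # True for each eigenvalue
```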
Theorem 5.16 (Gerschgorin's Disk Theorem). Let A ∈ M_{n×n}(C).
Then every eigenvalue of A is contained in a Gerschgorin disk.
Proof. Let λ be an eigenvalue of A with the corresponding eigenvector
\[ v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}. \]

Then v satisfies the matrix equation Av = λv, which can be written
\[ \sum_{j=1}^{n} A_{ij} v_j = \lambda v_i \quad (i = 1, 2, \ldots, n). \tag{2} \]
Suppose that v_k is the coordinate of v having the largest absolute value; note
that v_k ≠ 0 because v is an eigenvector of A.
We show that λ lies in C_k, that is, |λ − A_kk| ≤ r_k. For i = k, it follows
from (2) that
\[ |\lambda v_k - A_{kk} v_k| = \left|\sum_{j=1}^{n} A_{kj} v_j - A_{kk} v_k\right| = \left|\sum_{j \ne k} A_{kj} v_j\right| \le \sum_{j \ne k} |A_{kj}||v_j| \le \sum_{j \ne k} |A_{kj}||v_k| = |v_k|\sum_{j \ne k} |A_{kj}| = |v_k| r_k. \]
Thus
|v_k||λ − A_kk| ≤ |v_k| r_k;
so
|λ − A_kk| ≤ r_k
because |v_k| > 0.
Corollary 1. Let λ be any eigenvalue of A ∈ M_{n×n}(C). Then |λ| ≤ ρ(A).
Proof. By Gerschgorin's disk theorem, |λ − A_kk| ≤ r_k for some k. Hence
\[ |\lambda| = |(\lambda - A_{kk}) + A_{kk}| \le |\lambda - A_{kk}| + |A_{kk}| \le r_k + |A_{kk}| = \rho_k(A) \le \rho(A). \]
Corollary 2. Let λ be any eigenvalue of A ∈ M_{n×n}(C). Then
|λ| ≤ min{ρ(A), ν(A)}.
Proof. Since |λ| ≤ ρ(A) by Corollary 1, it suffices to show that |λ| ≤ ν(A).
By Exercise 14 of Section 5.1, λ is an eigenvalue of Aᵗ, and so |λ| ≤ ρ(Aᵗ)
by Corollary 1. But the rows of Aᵗ are the columns of A; consequently
ρ(Aᵗ) = ν(A). Therefore |λ| ≤ ν(A).
The next corollary is immediate from Corollary 2.

Corollary 3. If λ is an eigenvalue of a transition matrix, then |λ| ≤ 1.
The next result asserts that the upper bound in Corollary 3 is attained.
Theorem 5.17. Every transition matrix has 1 as an eigenvalue.
Proof. Let A be an n × n transition matrix, and let u ∈ Rⁿ be the column
vector in which each coordinate is 1. Then Aᵗu = u by Theorem 5.15, and
hence u is an eigenvector of Aᵗ corresponding to the eigenvalue 1. But since
A and Aᵗ have the same eigenvalues, it follows that 1 is also an eigenvalue of
A.
Suppose that A is a transition matrix for which some eigenvector corre­
sponding to the eigenvalue 1 has only nonnegative coordinates. Then some
multiple of this vector is a probability vector P as well as an eigenvector of
A corresponding to eigenvalue 1. It is interesting to observe that if P is the
initial probability vector of a Markov chain having A as its transition matrix,
then the Markov chain is completely static. For in this situation, AmP = P
for every positive integer m; hence the probability of being in each state never
changes. Consider, for instance, the city-suburb problem with
\[ P = \begin{pmatrix} \frac{1}{6} \\[4pt] \frac{5}{6} \end{pmatrix}. \]
Theorem 5.18. Let A ∈ M_{n×n}(C) be a matrix in which each entry is
positive, and let λ be an eigenvalue of A such that |λ| = ρ(A). Then λ = ρ(A)
and {u} is a basis for E_λ, where u ∈ Cⁿ is the column vector in which each
coordinate equals 1.
Proof. Let v be an eigenvector of A corresponding to λ, with coordinates
v1, v2, ..., vn. Suppose that v_k is the coordinate of v having the largest absolute value, and let b = |v_k|. Then
\[ |\lambda|b = |\lambda||v_k| = |\lambda v_k| = \left|\sum_{j=1}^{n} A_{kj} v_j\right| \le \sum_{j=1}^{n} |A_{kj} v_j| = \sum_{j=1}^{n} |A_{kj}||v_j| \le \sum_{j=1}^{n} |A_{kj}| b = \rho_k(A)b \le \rho(A)b. \tag{3} \]
Since |λ| = ρ(A), the three inequalities in (3) are actually equalities; that is,
(a) \( \left|\sum_{j=1}^{n} A_{kj} v_j\right| = \sum_{j=1}^{n} |A_{kj} v_j| \),
(b) \( \sum_{j=1}^{n} |A_{kj}||v_j| = \sum_{j=1}^{n} |A_{kj}| b \), and
(c) ρ_k(A) = ρ(A).
We see in Exercise 15(b) of Section 6.1 that (a) holds if and only if all
the terms A_{kj} v_j (j = 1, 2, ..., n) are nonnegative multiples of some nonzero
complex number z. Without loss of generality, we assume that |z| = 1. Thus
there exist nonnegative real numbers c1, c2, ..., cn such that
A_{kj} v_j = c_j z. (4)
By (b) and the assumption that A_{kj} ≠ 0 for all k and j, we have
|v_j| = b for j = 1, 2, ..., n. (5)
Combining (4) and (5), we obtain
\[ b = |v_j| = \frac{c_j}{A_{kj}} \quad\text{for } j = 1, 2, \ldots, n, \]
and therefore by (4), we have v_j = bz for all j. So
\[ v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = \begin{pmatrix} bz \\ bz \\ \vdots \\ bz \end{pmatrix} = bzu, \]
and hence {u} is a basis for E_λ.
Finally, observe that all of the entries of Au are positive because the same
is true for the entries of both A and u. But Au = λu, and hence λ > 0.
Therefore, λ = |λ| = ρ(A).
Corollary 1. Let A ∈ M_{n×n}(C) be a matrix in which each entry is
positive, and let λ be an eigenvalue of A such that |λ| = ν(A). Then λ = ν(A),
and dim(E_λ) = 1.
Proof. Exercise.
Corollary 2. Let A ∈ M_{n×n}(C) be a transition matrix in which each
entry is positive, and let λ be any eigenvalue of A other than 1. Then |λ| < 1.
Moreover, the eigenspace corresponding to the eigenvalue 1 has dimension 1.
Proof. Exercise.
Our next result extends Corollary 2 to regular transition matrices and thus
shows that regular transition matrices satisfy condition (a) of Theorems 5.13
and 5.14.

Theorem 5.19. Let A be a regular transition matrix, and let λ be an
eigenvalue of A. Then
(a) |λ| ≤ 1.
(b) If |λ| = 1, then λ = 1, and dim(E_λ) = 1.
Proof. Statement (a) was proved as Corollary 3 to Theorem 5.16.
(b) Since A is regular, there exists a positive integer s such that A^s has
only positive entries. Because A is a transition matrix and the entries of
A^s are positive, the entries of A^{s+1} = A^s A are positive. Suppose that
|λ| = 1. Then λ^s and λ^{s+1} are eigenvalues of A^s and A^{s+1}, respectively,
having absolute value 1. So by Corollary 2 to Theorem 5.18, λ^s = λ^{s+1} = 1.
Thus λ = 1. Let E_λ and E′_λ denote the eigenspaces of A and A^s, respectively,
corresponding to λ = 1. Then E_λ ⊆ E′_λ and, by Corollary 2 to Theorem 5.18,
dim(E′_λ) = 1. Hence E_λ = E′_λ, and dim(E_λ) = 1.
Corollary. Let A be a regular transition matrix that is diagonalizable.
Then lim_{m→∞} A^m exists.
The preceding corollary, which follows immediately from Theorems 5.19
and 5.14, is not the best possible result. In fact, it can be shown that if A is
a regular transition matrix, then the multiplicity of 1 as an eigenvalue of A is
1. Thus, by Theorem 5.7 (p. 264), condition (b) of Theorem 5.13 is satisfied.
So if A is a regular transition matrix, lim_{m→∞} A^m exists regardless of whether
A is or is not diagonalizable. As with Theorem 5.13, however, the fact that
the multiplicity of 1 as an eigenvalue of A is 1 cannot be proved at this time.
Nevertheless, we state this result here (leaving the proof until Exercise 21 of
Section 7.2) and deduce further facts about lim_{m→∞} A^m when A is a regular
transition matrix.
Theorem 5.20. Let A be an n × n regular transition matrix. Then
(a) The multiplicity of 1 as an eigenvalue of A is 1.
(b) lim_{m→∞} A^m exists.
(c) L = lim_{m→∞} A^m is a transition matrix.
(d) AL = LA = L.
(e) The columns of L are identical. In fact, each column of L is equal to
the unique probability vector v that is also an eigenvector of A corresponding to the eigenvalue 1.
(f) For any probability vector w, lim_{m→∞} (A^m w) = v.
Proof. (a) See Exercise 21 of Section 7.2.
(b) This follows from (a) and Theorems 5.19 and 5.13.
(c) By Theorem 5.15, we must show that uᵗL = uᵗ. Now A^m is a transition
matrix by the corollary to Theorem 5.15, so
\[ u^t L = u^t \lim_{m \to \infty} A^m = \lim_{m \to \infty} u^t A^m = \lim_{m \to \infty} u^t = u^t, \]

and it follows that L is a transition matrix.
(d) By Theorem 5.12,
\[ AL = A\lim_{m \to \infty} A^m = \lim_{m \to \infty} A\,A^m = \lim_{m \to \infty} A^{m+1} = L. \]
Similarly, LA = L.
(e) Since AL = L by (d), each column of L is an eigenvector of A corresponding to the eigenvalue 1. Moreover, by (c), each column of L is a
probability vector. Thus, by (a), each column of L is equal to the unique
probability vector v corresponding to the eigenvalue 1 of A.
(f) Let w be any probability vector, and set y = lim_{m→∞} A^m w = Lw. Then
y is a probability vector by the corollary to Theorem 5.15, and also Ay =
ALw = Lw = y by (d). Hence y is also an eigenvector corresponding to the
eigenvalue 1 of A. So y = v by (e).
Definition. The vector v in Theorem 5.20(e) is called the fixed probability vector or stationary vector of the regular transition matrix A.
Theorem 5.20 can be used to deduce information about the eventual dis­
tribution in each state of a Markov chain having a. regular transition matrix.
Example 4
A survey in Persia showed that on a particular day 50% of the Persians
preferred a loaf of bread, 30% preferred a jug of wine, and 20% preferred
"thou beside me in the wilderness." A subsequent survey 1 month later
yielded the following data: Of those who preferred a loaf of bread on the first
survey, 40% continued to prefer a loaf of bread, 10% now preferred a jug of
wine, and 50% preferred "thou"; of those who preferred a jug of wine on the
first survey, 20% now preferred a loaf of bread, 70% continued to prefer a jug
of wine, and 10% now preferred "thou"; of those who preferred "thou" on the
first survey, 20% now preferred a loaf of bread, 20% now preferred a jug of
wine, and 60% continued to prefer "thou."
Assuming that this trend continues, the situation described in the preceding paragraph is a three-state Markov chain in which the states are the three
possible preferences. We can predict the percentage of Persians in each state
for each month following the original survey. Letting the first, second, and
third states be preferences for bread, wine, and "thou", respectively, we see
that the probability vector that gives the initial probability of being in each
state is
\[ P = \begin{pmatrix} 0.50 \\ 0.30 \\ 0.20 \end{pmatrix}, \]

and the transition matrix is
\[ A = \begin{pmatrix} 0.40 & 0.20 & 0.20 \\ 0.10 & 0.70 & 0.20 \\ 0.50 & 0.10 & 0.60 \end{pmatrix}. \]
The probabilities of being in each state m months after the original survey
are the coordinates of the vector A^m P. The reader may check that
\[ AP = \begin{pmatrix} 0.30 \\ 0.30 \\ 0.40 \end{pmatrix}, \quad A^2P = \begin{pmatrix} 0.26 \\ 0.32 \\ 0.42 \end{pmatrix}, \quad A^3P = \begin{pmatrix} 0.252 \\ 0.334 \\ 0.414 \end{pmatrix}, \quad\text{and}\quad A^4P = \begin{pmatrix} 0.2504 \\ 0.3418 \\ 0.4078 \end{pmatrix}. \]
Note the apparent convergence of A^m P.
Since A is regular, the long-range prediction concerning the Persians' preferences can be found by computing the fixed probability vector for A. This
vector is the unique probability vector v such that (A − I)v = 0. Letting
\[ v = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}, \]
we see that the matrix equation (A − I)v = 0 yields the following system of
linear equations:
−0.60v1 + 0.20v2 + 0.20v3 = 0
 0.10v1 − 0.30v2 + 0.20v3 = 0
 0.50v1 + 0.10v2 − 0.40v3 = 0.
It is easily shown that
\[ \left\{ \begin{pmatrix} 5 \\ 7 \\ 8 \end{pmatrix} \right\} \]
is a basis for the solution space of this system. Hence the unique fixed probability vector for A is
\[ \begin{pmatrix} \frac{5}{5+7+8} \\[4pt] \frac{7}{5+7+8} \\[4pt] \frac{8}{5+7+8} \end{pmatrix}. \]
Thus, in the long run, 25% of the Persians prefer a loaf of bread, 35% prefer
a jug of wine, and 40% prefer "thou beside me in the wilderness."

Sec. 5.3 Matrix Limits and Markov Chains
Note thai if
-G
then
303
So
lim A"1 =Q
0.25 0.25 0.25
I 0.35 0.35 0.35
0.40 0.40 0.40
i o o
Q-] =Q 0 0 0 Q"
\0 0 0/
Example 5
Farmers in Lamron plant one crop per year either corn, soybeans, or wheat.
Because they believe in the necessity of rotating their crops, these farmers do
not plant the same crop in successive years, in fact. of the total acreage on
which a particular crop is planted, exactly half is planted with each of the
other two crops during the succeeding year. This year, 300 acres of corn, 200
acres of soybeans, and 100 acres of wheat were planted.
The situation just described is another three-state Markov chain in which
the three states correspond to the planting of corn, soybeans, and wheat,
respectively. In this problem, however, the amount of land devoted to each
crop, rather than the percentage of the total acreage (GOO acres), is given. By
converting these amounts into fractions of the total acreage, we see that the
transition matrix A and the initial probability vector P of the Markov chain
are
A =
/0 J h
0 i
A 0
and P =
300
600
•Jim
i>(in
100
V 600
fi
w
The fraction of the total acreage devoted to each crop in m years is given by
the coordinates of A'"P. and the eventual proportions of the total acreage1
used for each crop are the coordinates of lim A" P. Thus the (wentrial

304 Chap. 5 Diagonalization
amounts of land devoted to each crop are found by multiplying this limit by
the total acreage; that is, the eventual amounts of land used for each crop
are the coordinates of GOO • lim A'" P.
m—-oo
Since A is a regular transition matrix, Theorem 5.20 shows that lim -4'"
III —>00
is a matrix L in which each column equals the unique fixed probability vector
for A. It is easily seen that the fixed probability vector for A is
/i
i
3
Hence
L =
I 3 3 :?
i I i
:i 3 :i
1 l i
:{ 3 :5
so
GOO- lim AmP = Q00LP= \ 200
III —>TC
Thus, in the long run, we expect 200 acres of each crop to be planted each
year. (For a direct computation of 600 • lim A"1 P. see Exercise 14.) •
•III—>OC
In (his section, we have1 concentrated primarily on the theory of regular
transition matrices. There is another interesting class of transition matrices
that can be represented in the form
I D
() C
where / is an identify matrix and O is a zero matrix. (Such transition ma­
trices are not regular since the lower left block remains O in any power of
the matrix.) The states corresponding to the identity submatrix are called
absorbing states because such a. state is never left once it is entered. A
Markov chain is called an absorbing Markov chain if it is possible to go
from an arbitrary state into an absorbing state; in a finite number of stages.
Observe that the Markov chain that describes the enrollment pattern in a
community college is an absorbing Markov chain with states 1 and 4 as its ab­
sorbing states. Readers interested in learning more about absorbing Markov
chains are referred to Introduction to Finite. Mathematics (third edition) by

Sec. 5.3 Matrix Limits and Markov Chains 305
J. Kemeuy. .]. Snell. and G. Thompson (Prentice-Hall. Inc.. Englewood Cliffs,
N. J., 1974) or Discrete Mathematical Models by Fred S. Roberts (Prentice-
Hall, Inc., Englewood Cliffs, N. .).. 1976).
An Application
In species that reproduce sexually, the characteristics of an offspring with
respect to a particular genetic trait are determined by a pair of genes, one
inherited from each parent. The genes for a particular trait are of two types,
which are denoted by G and g. The gene (4 represents the dominant char­
acteristic, and g represents the recessive characteristic. Offspring with geno­
types GG or Gg exhibit the dominant characteristic, whereas offspring with
genotype gg exhibit the recessive characteristic. For example, in humans,
brown eyes are a dominant characteristic and blue eyes are the correspond­
ing recessive characteristic; thus the offspring with genotypes GG or Gg are
brown-eyed, whereas those of type gg are blue-eyed.
Let us consider the probability of offspring ol' each genotype for a male
parent of genotype Gg. (We assume that the population under consideration
is large, that mating is random with respect to genotype, and that the distri­
bution of each genotype within the population is independent of sex and life
expectancy.) Let
P =
denote the proportion of the adult population with genotypes GG, Gg, and
gg, respectively, at the start of the experiment. This experiment describes a
three-state Markov chain with the following transition matrix:
Genotype of female parent
GG Gg
Genotype GG
Gg
offspring
(
gg
°
1
2
I
2I
- 13.
It is easily checked that I}2 contains only positive entries; so II is regular.
Thus, by permitting only males of genotype Gg to reproduce, the proportion
of offspring in the population having a. certain genotype will stabilize at the
fixed probability vector for B. which is
f
w

306 Chap. 5 Diagonalization
Now suppose that similar experiments are to be performed with males of
genotypes GG and gg. As already mentioned, these experiments are three-
state Markov chains with transition matrices
A = 0
Vo o
/() 0
c =
o>
t)
respectively. In order to consider the case where all male genotypes are per­
mitted to reproduce, we must, form the transition matrix M = pA + qB + rC.
which is the linear combination of A. D. and C weighted by the proportion
of males of each genotype. Thus
(P
M =
!« n> 31 o
V
k + r ¥
0
h
b kq + r
To simplify the notation, let a = p+ \q and b = \q + r. (The numbers a and
b represent the proportions of G and g genes, respectively, in the population.)
Then
(a \a 0
M =
b \ a
0 b
where a + b = p + q+ r = 1.
Let //, q', and /•' denote the proportions of the- first-generation offspring
having genotypes GG, Gg, and gg. respectively. Then
= MP -
/ ap+\aq
bp+ k + <l
L2b<l 4 br J
In order to consider the effects of unrestricted niatings among the first-
generation offspring, a new transition matrix A/ must be determined based
upon the distribution of first-generation genotypes. As before, we find that
M =
(V ¥
¥ ¥ + ¥ * ¥
¥ + ¥
o
¥ p
¥ 4 r'
(a1
b'
0
0
V
2'
| a'
¥ b'

Sec. 5.3 Matrix Limits and Markov Chains
where a' = p' 4- hq' and 1/ = hq' 4- r'. However
307
a' = a2 + -(2a6) = a(a + b) = a and b' - -(2a6) 4- b2 = b(a + b) = 6.
Thus M = 47: so the distribution of second-generation offspring among
the three genotypes is
/ a3+a2b \ I a2(a + b) \ (a2
M(MP) = M2P = a2b 4- ah + ah2 - ab(a 4-14-6) = 2ab
\ ab2 + b3 ) \ b2(a | 6) / \lr
the same as the first-generation offspring. In other words. MP is the fixed
probability vector for M, and genetic equilibrium is achieved in the population
after only one generation. (This result is called the Hardy Weinberg law.)
Notice that in the important special case; that a — b (or equivalently, that
p — r), the distribution at equilibrium is
MP - 2ab I
rt

EXERCISES
1. Label the following statements as true or false
(a) UAe
'nx n
{(.) and lim A" = L. then, for any invertible matrix
(b)
(c)
Q e MnXn(C), we have lim QAmQ~i = QLQ
If 2 is an eigenvalue of A €
1 ii x n
(C). then lim A" does not
exist.
Anv vector
(d)
(e)
•l'2
such that .r\ + x-2 4- • • • 4- xn = 1 is a probability vector.
The sum of the entries of each row of a transition matrix equals 1.
The product of a transition matrix and a probability vector is a
probability vector.

308 Chap. 5 Diagonalization
(f) Let z be any complex number such that \z\ < 1. Then the matrix
does not have 3 as an eigenvalue,
(g) Every transition matrix has 1 as an eigenvalue.
(h) No transition matrix can have —1 as an eigenvalue,
(i) If A is a transition matrix, then lim Am exists.
m—>oo
(j) If A is a regular transition matrix, then lim Am exists and has
m—+oo
rank 1.
2. Determine whether lim Am exists for each of the following matrices
m—»oo
A, and compute the limit if it exists.
(a)
(d)
0.1 0.7
0.7 0.1
-1.8
-0.8
4.8
2.2
(b)
(e)
-1.4
-2.4
0.8
1.8
-2 -1
4 3
(c)
(f)
0.4 0.7
0.6 0.3
2.0 -0.5
3.0 -0.5
G)
3
-7 4- 2i
3
-13 4-62
3
-5 4- i
3
-5 4-Gi
G 6
7 - 2z
35 - 20?:
3. Prove that if A\, A2, • •. is a sequence of n x p matrices with complex
entries such that lim Am = L. then lim (AmY = V.
m »oo m—>oo
4. Prove that if A € MnXU(C) is diagonalizable and L = lim Am exists,
then either L = In or rank(L) < n.

Sec. 5.3 Matrix Limits and Markov Chains 309
5. Find 2x2 matrices A and B having real entries such that lim Am,
m—»oo
lim Bm. and lim (AB)m all exist, but
m—-oc in ->oc
lim (AB)m ^ ( lim Am)( lim Bm).
6. A hospital trauma unit has determined that 30% of its patients are
ambulatory and 70% are bedridden at the time of arrival at the hospital.
A month after arrival, 60% of the ambulatory patients have recovered,
20% remain ambulatory, and 20% have become bedridden. After the
same amount of time, 10% of the bedridden patients have recovered,
20% have become ambulatory, 50%: remain bedridden, and 20% have
died. Determine the percentages of patients who have recovered, are
ambulatory, are bedridden, and have died 1 month after arrival. Also
determine the eventual percentages of patients of each type.
7. A player begins a. game of chance by placing a marker in box 2, marked
Start. (See Figure 5.5.) A die is rolled, and the marker is moved one
square to the left if a I or a 2 is rolled and one square to the right if a
3. 4. 5. or G is rolled. This process continues until the marker lands in
square 1. in which case the player wins the game, or in square 4, in which
case the player loses the game. What is the probability of winning this
game? Hint: Instead of diagonali/ing the appropriate transition matrix
Win
1
Start
2 3
Lose
4
Figure 5.5
,4, it is easier to represent e<i as a linear combination of eigenvectors of
A and then apply A" to the result.
8. Which of the following transition matrices are regular?
(a)
(d)
0.2
0.3
0.5
0.5
0.5
0
0.3 0.5
0.2 0.5
0.5 0
0 1
1 0
0 «/
/0.5 o r
(b) 0.5 0 0
\ 0 1 0,
(e)
(\ 0 °
/0.5 0 0'
(c) 0.5 0 1
\ 0 1 0.
0
1
(f)
/l 0 0
0 0.7 0.2
\0 0.3 0.8

310 Chap. 5 Diagonalization
(g)
/0 \ 0 0
A o o o
i i o
I o i
(h)
0 0
V1
4 o
o
9. Compute lim A7"' if it exists, for each matrix A in Exercise 8.
m—»oc
10. Each of the matrices that follow is a regular transition matrix for a
three-state Markov chain. In all cases, the initial probability vector is
11.
12.
For each transition matrix, compute the proportions of objects in each
state after two stages and the eventual proportions of objects in each
state by determining the fixed probability vector.
/0.6 0.1 0.l\ /0.8 0.1 0.2\ /0.9 0.1 ().l
(a) 0.1 0.9 0.2 (b) 0.1 0.8 0.2 (c) 0.1 0.6 0.1
\0.3 0 0.7/ V0.1 0.1 0.6/ \ 0 0.3 0.8/
(d)
0.4
0.1
0.5
0.2
0.7
0.1
0.2
0.2
0.6
(e)
0.5 0.3
0.2 0.5
0.3 0.2
0.2
0.3
0.5
(f)
0.6 0 0.4
0.2 0.8 0.2
0.2 0.2 0.4
In 1940, a county land-use survey showed that 10% of the county land
was urban, 50%i was unused, and 40% was agricultural. Five years later.
a follow-up survey revealed that 70%. of the urban land had remained
urban, 10% had become unused, and 20% had become agricultural.
Likewise, 20% of the unused land had become urban. 60% had remained
unused, and 20% had become agricultural. Finally, the 1945 survey
showed that 20% of the agricultural land had become unused while
80% remained agricultural. Assuming that the trends indicated by the
1945 survey continue, compute the percentages of urban, unused, and
agricultural land in the county in 1950 and the corresponding eventual
percentages.
A diaper liner is placed in each diaper worn by a baby. If. after a
diaper change, the liner is soiled, then it is discarded and replaced by a
new liner. Otherwise, the liner is washed with the diapers and reused,
except that each liner is discarded and replaced after its third use (even
if it has never been soiled). The probability that the baby will soil any
diaper liner is one-third. If there are only new diaper liners at first,
eventually what proportions of the diaper liners being used will be new,

Sec. 5.3 latrix Limits and Markov Chains 311
once used, and twice used? Hint: Assume that a diaper liner ready for
use is in one of three states: new. once used, and twice used. After its
use. it then transforms into one of the three states described.
13. In 1975. the automobile industry determined that 40% of American car
owners drove Large cars, 20% drove intermediate-sized cars, and 40%
drove small cars. A second survey in 1985 showed that 70% of the large-
car owners in 1975 still owned large cars in 1985, but 30% had changed
to an intermediate-sized car. Of those who owned intermediate-sized
cars in 1975, 10% had switched to large cars. 70% continued to drive
intermediate-sized cars, and 20% had changed to small cars in 1985.
Finally, of the small-car owners in 1975, 10% owned intermediate-sized
cars and 90% owned sma.ll cars in 1985. Assuming that these trends
continue, determine the percentages of Americans who own cars of each
size in 1995 and the corresponding eventual percentages.
14. Show that if A and P are as in Example 5, then
(
rm rm I 1 ''m+1
>'m + i rin rm-\ i J .
rm I 1 7'm-l !'„
Vm = 3 om-
Deduce that
/30(r
600(4'" P) - A" 200 ]
V 100
200
200
2">
200
(-1)'"11
2'
IOO':
(100)
15. Prove that if a 1-dimensional subspace W of R" contains a nonzero vec­
tor with all nonnegative entries, then W contains a unique probability
vector.
16. Prove Theorem 5.15 and its corollary.
17. Prove the two corollaries of Theorem 5.18.
18. Prove the corollary of Theorem 5.19.
19. Suppose that M and M' are n x n transition matrices.

312 Chap. 5 Diagonalization
(a) Prove that if M is regular. N is any n x n transition matrix, and
c is a real number such that 0 < c < 1, then cM + (1 — c)N is a
regular transition matrix.
(b) Suppose that for all i. j, we have that M[- > 0 whenever A/;,- > 0.
Prove that there exists a transition matrix N and a real number c
with 0 < c. < 1 such that M''= cM 4- (1 - c)JV.
(c) Deduce that if the nonzero entries of M and M' occur in the same
positions, then M is regular if and only if M' is regular.
The following definition is used in Exercises 20-24.
Definition. For A 6 Mnx„(C), define e = lim Bm. where
III- —' oc
A2 Am
Bm - I 4 A
2!
(see Exercise 22). Thus e is the .sum of the infinite series
1 + A +
A*_
3!
and Bm is the mth partial sum of this series. (Note the analogy with the
power series
c" = 1
2!
a:'
4!
which is valid for all complex numbers a.)
20. Compute c° and c1. where () and / denote the n x n zero and identity
matrices, respectively.
21. Let P~lAP = D be a diagonal matrix. Prove that eA = PeDP~1.
22. Let A G Mnxn(C) be diagonalizable. Use the result of Exercise 21 to
show that eA exists. (Exercise 21 of Section 7.2 shows that e exists
for every Ac M„X„(C).)
23. Find A,B e M2x2(/?) such that eAeB ^ eA+B.
24. Prove that a differentiable function x: B -^ R" is a solution to the
system of differential equations defined in Exercise 15 of Section 5.2 if
and only if x(t) = e v for some u £ R", where A is defined in that
exercise.

Sec. 5.4 Invariant Subspaces and the Cayley-Hamilton Theorem 313
5.4 INVARIANT SUBSPACES AND THE CAYLEY-HAMILTON
THEOREM
In Section 5.1. we observed that if v is an eigenvector of a linear operator
T. then T maps the span of {r} into itself. Subspaces that are mapped into
themselves are of great importance in the study of linear operators (sec. e.g..
Exercises 28 42 of Section 2.1).
Definition. Let T be a linear operator on a vector space V. ,4 suhspace
W of\/ is called a T-invariant subspace of V if T(W) C_ W. thai is. if
T(v) £ W for all v 6 W.
Example 1
Suppose that T is a linear operator on a vector space V. Then the following
subspaces of V arc- T-invariant:
1. {0}
2. V
3. R(T)
f. N(T)
5. E.\- for any eigenvalue A of T.
The proofs that these subspaces are T-invariant are left as exercises. (Sec
Exercise 3.) •
Example 2
Let T be the linear operator on R'{ defined by
T(a,6,c) = (a + b,b + c,0).
Then the .///-plane = {(.r.vy.O): x.y G R) and the .r-axis - {(./:, 0.0): x E R}
are T-invariant subspaces of R5. •
Let T be a linear operator on a vector space V, and let. ;/: be a nonzero
vector in V. The? subspace
W-span({.r.T(:r).T2(.70,...})
is called the T-cyclic subspace of V generated by ./'. It is a. simple matter
to show that W is T-invariant.. In fact, W is the "smallest" T-invariant sub-
space of V containing x. That is. any T-invariant subspace of V containing x
must also contain W (see Exercise 11). Cyclic subspaces have various uses.
We apply them in this section to establish the Cayley -Hamilton theorem. In
Exercise 41. we outline a method for using cyclic subspaces to compute the
characteristic polynomial of a linear operator without resorting to determi­
nants. Cyclic subspaces also play an important role in Chapter 7, when1 we
study matrix representations of nondiagonalizable linear operators.

314 Chap. 5 Diagonalization
Example 3
Let T be the linear operator on R,{ denned by
T(a, byc) = (-b + c, a + c, 3c).
We determine the T-cyclic subspace generated by e.\ = (1.0,0). Since
T(ei)=T(l,0,0) = (0,l,0) = e2
and
T2(ex) = T(T(ei)) - T(e2) = (-1,0,0) = -c,.
it follows that
span({ei,T(e1),T2(e1),...}) = span({ei,e2}) - {(a.t.O): s.t e R}. •
Example 4
Let T be the linear operator on P(R) defined by T(f(x)) = f'(x). Then the
T-cyclic subspace generated by x is span({x , 2./:. 2}) = P-2(R). •
The existence of a T-invariant subspace provides the opportunity to define
a new linear operator whose domain is this subspace. If T is a linear operator
on V and W is a T-invariant subspace of V, then the restriction Tw of T to
W (sec Appendix B) is a mapping from W to W. and it follows that Tw is
a linear operator on W (see Exercise 7). As a linear operator, Tw inherits
certain properties from its parent operator T. The following result illustrates
one way in which the two operators are linked.
Theorem 5.21. Let T be a linear operator on a finite-dimensional vector
space V, and let W be a T-invariant subspace of V. Then the characteristic
polynomial of Tw divides the characteristic polynomial of T.
Proof. Choose an ordered basis 7 = {v\ ,t>2, •. •, Vk] for W, and extend it
to an ordered basis (3 — {v\. ??2, • • • <Vk,Vk+1, • • •, vn} for V. Let. A = [T]g and
Bi = [Tw]-y Then, by Exercise 12, A can be written in the form
A =
B\ B2
O B3
Let f(t) be the characteristic polynomial of T and g(t) the characteristic
polynomial of Tw- Then
/(/) det(.4 //„) drl[Blo*J*
Bs-tInJ •'/:/"'lM //; '
by Exercise 21 of Section 4.3. Thus g(t) divides f(t).

Sec. 5.4 Invariant Subspaces and the Cayley-Hamilton Theorem 315
Example 5
Let T be the linear operator on R' defined by
T(a, 6, c, d) = (a 4-6 4- 2c - d, b + d, 2c - d. c + d),
and let W - {(/. .s. 0, 0): t. s 6 R}. Observe that W is a T-invariant subspace
of R' because, for any vector (a,6,0,0) G R4,
T(«.6.0.0) - (a + b. 6,0,0) € W.
Lei -) — {> \.<->). which is an ordered basis for W. Extend 7 to the standard
ordered basis 0 for R1. Then
R> - [Twl-v = 4 A=[T]f3 =
(\ 1 2 -1
0 10 1
0 0 2 -1
\o 0 1 )
in the notation of Theorem 5.21. Let f(t) be the characteristic polynomial of
T and g(t) be the characteristic polynomial of Tw- Then
f(t) = det(A - tl4) = det
A-i 1 2 -1
0 1 - /. 0 1
0 0 2-t -1
\ 0 0 11- tj
''o!!.-,— 2r( r_
- g(t)- det
2 /
- <
In view of Theorem 5.21. we may use the characteristic polynomial of Tw
to gain information about the characteristic polynomial of T itself. In this re­
gard, cyclic subspaces are useful because the characteristic polynomial of the
restriction of a linear operator T to a cyclic subspace is readily computable.
Theorem 5.22. Let T be a linear operator on a finite-dimensional vector
space V. and let W denote the T-cyclic subspace ofV generated by a nonzero
vector v 6 V. Let k = dim(W). Then
(a) {v,T(v),T2(v),... .Tk '('•)} ^ a basis for W.
(b) Ifaov + aiT(v) + Hafc_iTfc" 1(v)+Tk(v) = 0, then the characteristic
polynomial of Tw is f(t) = (—1) (an 4- a. + • • • + ak-it A- 1

316 Chap. 5 Diagonalization
Proof, (a) Since v ^ 0, the set {v} is linearly independent. Let j be the
largest positive integer for which
0 = {v,T(v),...,V-1(v)}
is linearly independent. Such a j must exist because V is finite-dimensional.
Let Z = span(.tf). Then ji is a basis for Z. Furthermore, T^(v) 6 Z by
Theorem 1.7 (p. 39). We use this information to show that Z is a T-invariant
subspace of V. Let. w E Z. Since w is a linear combination of the vectors of
(3, there exist scalars bo,b\.... ,bj-^ such that
w = bnv + 6i T(i bj-iV-Hv),
and hence
T(w) = b0T(v) + b{T2{v) 4- • • • 4- fc-iT'(v).
Thus T(w) is a linear combination of vectors in Z, and hence belongs to Z.
So Z is T-invariant. Furthermore, e € Z. By Exercise 11. W is the smallest
T-invariant subspace of V that contains v. so that. W C Z. Clearly. Z C W.
and so we conclude that Z = W. It follows that d is a basis for W. and
therefore dim(W) = j. Thus j = k. This proves (a).
(b) Now view B (from (a)) as an ordered basis for W. Let a,). u\..... a,k
be the scalars such that
k I a0v + a{T(y) + ••• + ak-,Tk J(v) + Th(v) = 0.
Observe that
/() 0
I 0
an
-«i
[Tw]/s =
\0 0 ••• 1 -ak-J
which has the characteristic polynomial
/(i) = (-l)fc(a0-r-a1i4----4-afc_1ifc-1 tk)
by Exercise 19. Thus f(t) is the characteristic polynomial of Tw, proving (b).
Example 6
Let T be the linear operator of Example 3, and let W = span({ei,e2}), the
T-cyclic subspace generated by e\. We compute the characteristic polyno­
mial f(t) of Tw in two ways: by means of Theorem 5.22 and by means of
determinants.

Sec. 5.4 Invariant Subspaces and the Cayley-Hamilton Theorem 317
(a) By means of Theorem, 5.22. From Example 3, we have that {ei.e^} is
a cycle that generates W, and that T2(ci) = —e\. Hence
\ex+{)T(e,) + T2(cl)^().
Therefore, by Theorem 5.22(b),
f(t) = (-l)2(l + 0t + t2) = t2 + l.
(b) By means of determinants. Let ft = {ci,e2j, which is an ordered basis
for W. Since T(ei) = c> and T(e2) = — e.\, we have
[Tw],, -
0 -1
and therefore.
1 -t]
The Cayley-Hamilton Theorem
As an illustration of the importance of Theorem 5.22, we prove a well-
known result that is used in Chapter 7. The reader should refer to Ap­
pendix E for the definition of /(T), where T is a linear operator and f(x) is
a polynomial.
Theorem 5.23 (Cayley-Hamilton). Let T be a linear operator on a
finite-dimensional vector space V. and let f(t) be the characteristic polyno­
mial of T. Then /(T) = T0. the zero transformation. That is, T "satisfies"
its characteristic equation.
Proof. We show that /(T)(v) = 0 for all v E V. This is obvious if v = 0
because /(T) is linear: so suppose that v ^ 0. Let W be the T-cyclic subspace
generated by u, and suppose that dim(W) = k. By Theorem 5.22(a), there
exist scalars an, «\, • • • • dk l .such that
-A--1 v)+Tk(v) = 0. a0v + aiT(v) + • • • + dk-iV
Hence Theorem 5.22(b) implies that
g{t) = (-l)fc(a0 4- ait +-••• + afc-it*"1 4- tk)
is the characteristic polynomial of Tw- Combining these two equations yields
9(T)(v) = (-l)fc(a0l + oiT + • • • + afc-iT^"1 + Tk)(v) = 0.
By Theorem 5.21. g(t) divides f(i); hence there exists a polynomial q(t) such
that f(t) = q(t)g(t). So
f(T)(v) = q(T)g(T)(v) = q(T)(g(T)(v)) = q(J){0) = 0.

318 Chap. 5 Diagonalization
Example 7
Let T be the linear operator on R2 defined by T(a, b) = (a + 2b. —2a -1- b). and
let (3= {ei,e2}. Then
A =
1 2
-2 1/ '
where A = [T]^. The characteristic polynomial of T is, therefore.
/(f) = det(,4 - //) - det f ^ t %) = t2 - 2t 4- 5.
It is easily verified that T0 = /(T) = T2 - 2T + 51. Similarly.
-3 4\ , (-2 -4\ / ",
f(A) = A2-2A + 5/ =
0 0
0 ()/'
4 -3 4 -2 0 5
Example 7 suggests the following result.
Corollary (Cayley-Hamilton Theorem for Matrices). Let A be
an n x n matrix, and let f(t) be the characteristic polynomial of A. Then
f(A) = O, the n x n zero matrix.
Proof. See Exercise 15. 1
Invariant Subspaces and Direct Sums*
It is useful to decompose a finite-dimensional vector space V into a direct
sum of as many T-invariant subspaces as possible because the behavior of T
on V can be inferred from its behavior on the direct summands. For example.
T is diagonalizable if and only if V can be decomposed into a direct sum
of one-dimensional T-invariant subspaces (see Exercise 36). In Chapter 7.
we consider alternate ways of decomposing V into direct sums of T-invariant
subspaces if T is not diagonalizable. Wre proceed to gather a few facts about
direct sums of T-invariant subspaces that are used in Section 7.4. The first
of these facts is about characteristic polynomials.
Theorem 5.24. Let T be a linear operator on a finite-dimensional vector
space V, and suppose that V = W| © W-2 © • • • © \Nk, where W, is a T-
invariant subspace of V for each i (1 < i < k). Suppose that /,(/) is the
characteristic polynomial of Tw, (1 < i < k). Then fi(t)'f2(t) ,//,-(/) is
the characteristic polynomial of T.
'This subsection uses optional material on direct sums from Section 5.2.

Sec. 5.4 Invariant Subspaces and the Cayley-Hamilton Theorem 319
Proof. The proof is by mathematical induction on k. In what follows. f(l)
denotes the characteristic polynomial of T. Suppose first that k — 2. Let ;3
be an ordered basis for Wj, fa an ordered basis for W2, and 0 = fa U 02-
Then 3 is an ordered basis for V by Theorem 5.10(d) (p. 276). Let A = [T^,
B\ = [TwJ/Si) and J^2 = [Tw2]/V By Exercise 34, it follows that
A =
Bi O
O' B2y '
where O and O' are zero matrices of the appropriate sizes. Then
/(/) = det(i4 - tl) = det(B, - «/)• det(B2 - tl) = fi(t)-f2(t)
as in the proof of Theorem 5.21. proving the result for k — 2.
Now assume t hat the theorem is valid for k — \ summands, where k — 1 > 2,
and suppose that V is a direct, sum of k subspaces, say.
v = w, ©w2••].:•••• ewfe.
Let W = Wi I- W2-l hWfc 1. It is easily verified that W is T-invariant and
that V - W I Wfc. So by the case for k = 2, /(/) - g(t)-fk(t), where g(t) is
the characteristic polynomial of Tw- Clearly W = Wi ®W2©- • -©Wfc_i, and
therefore g(t) — fi(t)-f2(t) fk-i(t) by the induction hypothesis. We
conclude that f{t)=g(t)'fk(t) - fi(t)-f2{t) fk(t). I
As an illustration of this result, suppose that T is a diagonalizable lin­
ear operator on a finite-dimensional vector space V with distinct eigenvalues
A]. A2 A/,.. By Theorem 5.11 (p. 278), V is a direct sum of the eigenspaces
of T. Since each eigenspace is T-invariant, we may view this situation in the
context of Theorem 5.24. For each eigenvalue A,;, the restriction of T to E\,
has characteristic polynomial (A; f)"1' • where m., is the dimension of E^..
By Theorem 5.24. the characteristic polynomial f(t) of T is the product
/(*) = (Ai-*)mi(A2-t)ma"-(Afc-t)m*.
It follows that the multiplicity of each eigenvalue is equal to the dimension
of the corresponding eigenspace, as expected.
Example 8
Lei T be the linear operator on R4 denned by
T(a, b, c, d) = (2a - b, a \ b. c - d, c + d),
and let W, - {(s.t.().()): s.t E R] and W2 - {(0,0,s,t): s,t E R}. Notice
that W] and W2 are each T-invariant and that R1 = Wi © W2. Let fa =
{ei,e2}, 02 = {e3,e4}, and 0 = fa U ft2 == {ei,e2,e3,e4}. Then fa is an

320 Chap. 5 Diagonalization
ordered basis for Wi, fa is an ordered basis for VV2, and 0 is an ordered basis
for R4. Let A = [T]a, Bi = [Twjfr, and B2 = [Tw2k- Then
B, =
2 -1
Bo =
-1
1
and
(2 -1 0 0
A =
O Bo
1 0 0
0 0 I
\o 0 1
7
Let ./(f), /i(i), and f2(t) denote the characteristic polynomials of T. Tw,-
and Tw2r respectively. Then
f(t) = det(A - tl) = det(5i - tI)>det(B2 - II) = /, (t)-f2(t). •
The matrix A in Example 8 can be obtained by joining the mat rices B
and B2 in the manner explained in the next definition.
Definition. Let #, E UmXIII{F), and let B2 E M„xn(F). We define the
direct sum of B\ and B2) denoted B\ OjB2, as the (m + ri) x (rn + n) matrix
A such that
({Bxlij fori <i.j<m
Aij = < (B2)(i-m),(j-m) for m + l<i,j <n + m
[ 0 otherwise.
If B\.B<2,.... Bk are square matrices with entries from F, then we define the
direct sum of B\, B2,.... Bk recursively by
lh 0 B2
If A = B\ ©£2©---
®Bk = (B1®B2®---(BBk-1, Bi
Bk, then we often write
A =
(Bx O
O B2
\ O O
o
()
Bk)
Example 9
Let
Bx - ( j " ) , B2 - (3). and B3 =

Sec. 5.4 Invariant Subspaces and the Cayley-Hamilton Theorem 321
Then
Bi © B> '• lh -
( 1 2
1 1
0 0
0 0
0
0
3
0
0 0 0
^ 0 0 0
0
0
0
1
1
1
0 0 ^
0 0
0 0
2 1
2 3
1 1 J
The final result of this section relates direct sums of matrices to direct
sums of invariant subspaces. It is an extension of Exercise 34 to the case
k>2.
Theorem 5.25. Let T be a linear operator on a finite-dimensional vector
space V. and let \Ni,\N2,... .\Nk be T-invariant subspaces of V such that
V - Wi : WV! • • • I WA.. For each i, let .4, be an ordered basis for W;, and
let 3 = 0} U 02 U • • • U 0k. Let A = [T]0 and B, = [Tw,\c for i = 1,2,..., k.
Then A = BX © B2 © • • • © Bk.
Proof. See Exercise 35.
EXERCISES
1. Label the following statements as true or false.
(a) There exists a linear operator T with no T-invariant subspace.
(b) If T is a linear operator on a finite-dimensional vector space V and
W is a T-invariant subspace of V. then the characteristic polyno­
mial of Tw divides the characteristic polynomial of T.
(c) Let T be a linear operator on a finite-dimensional vector space V,
and let v and w be in V. If W is the T-cyclic subspace generated
by v. W is the T-cyclic subspace generated by w. and W — W.
then v — iv.
(d) If T is a linear operator on a finite-dimensional vector space V.
then for any r E V the T-cyclic subspace generated by v is the
same as the T-cyclic subspace generated by T(r).
(e) Let T be a linear operator on an //-dimensional vector space. Then
there exists a polynomial g(t) of degree // such that g(T) = T0.
(f) Any polynomial of degree // with leading coefficient (—1)" is the
characteristic polynomial of some linear operator.
(g) If T is a linear operator on a finite-dimensional vector space V. and
if V is the direct sum of/,- T-inva.ria.nt subspaces, then there is an
ordered basis 0 for V such that [Tj:; is a direct sum of k matrices.

322 Chap. 5 Diagonalization
2. For each of the following linear operators T on the vector space V.
determine whether the given subspace W is a T-invariant subspace of
V.
(a) V = P3(R), T(/(a:)) - fix), and W = P2(R)
(b) V - P(R), T(f(:r)) =- xf(x), and W = P2(R)
(c) V - R;i, T(a, b, c) = (a 4- b + c, a 4-6 4- c, a + 6 + c). and
W= {{Lt.t): te R
(d) V = C([0, 1]), T(/(/)) = [./;,' f(x)dx] I. and
W = {/ E V: f(l) = ai 4- b for some a and 6}
'0 1'
(e) V = M2x2(R), T(A) = A, and W = {4 G V: A* = A)
3. Let T be a linear operator on a finite-dimensional vector space V. Prove
that the following subspaces are T-invariant.
(a) {0}andV
(b) N(T) and R(T)
(c) E\, for any eigenvalue A of T
4. Let T be a linear operator on a vector space V, and let W be a T-
invariant subspace of V. Prove that W is p(T)-invariant for any poly­
nomial g(t).
5. Let T be a linear operator on a vector space V. Prove that the inter­
section of any collection of T-invariant subspaces of V is a T-invariant
subspace of V.
6. For each linear operator T on the vector space V, find an ordered basis
for the T-cyclic subspace generated by the vector z.
(a) V = R4, T(tt, b. c, d) = {a + b, b - c, a + c. a 4- d), and z = cx.
(b) V - P3(i?.), T(/(s)) = /"(./•). and z
(c) V = IV!L.
(d) v =
.3
2(R), T(A) = A'\ and z =
0 1
1 0
,(R), T(.4) - Q 2) A and z =
7. Prove that the restriction of a linear operator T to a. T-invariant sub-
space is a linear operator on that subspace.
8. Let T be a linear operator on a vector space with a T-invariant. subspace
W. Prove that if v is an eigenvector of Tw with corresponding eigenvalue
A, then the same is true for T.
9. For each linear operator T and cyclic subspace W in Exerc4.se 6, compute
the characteristic polynomial of Tw in two ways, as in Example 6.

Sec. 5.4 Invariant Subspaces and the Cayley-Hamilton Theorem 323
10. For each linear operator in Exercise 6, find the characteristic polynomial
f(t) of T, and verify that the characteristic polynomial of Tw (computed
in Exercise 9) divides f(t).
11. Let T be a linear operator on a vector space V, let ubea nonzero vector
in V, and let W be the T-cyclic subspace of V generated by v. Prove
that
(a) W is T-invariant.
(b) Any T-invariant subspace of V containing v also contains W.
12. Prove that A =
Bx Bi
O B,
in the proof of Theorem 5.21.
13. Let T be a linear operator on a vector space V, let v be a nonzero vector
in V. and let W be the T-cyclic subspace of V generated by v. For any
w E V, prove that w E W if and only if there exists a polynomial g(t)
such that w = g(T)(?;).
14. Prove that the polynomial g(i) of Exercise 13 can always be chosen so
that its degree is less than dim(W).
15. Use the Cayley-Hamilton theorem (Theorem 5.23) to prove its corol­
lary for matrices. Warning: If f(t) = det (.A — tl) is the characteristic
polynomial of A, it is tempting to "prove" that f(A) = O by saying
uf(A) = det(A - AI) = det(O) = 0." But this argument is nonsense.
Why?
16. Let T be a linear operator on a finite-dimensional vector space V.
(a) Prove that if the characteristic polynomial of T splits, then so
does the characteristic polynomial of the restriction of T to any
T-invariant subspace of V.
(b) Deduce that if the characteristic polynomial of T splits, then any
nontrivial T-invariant subspace of V contains an eigenvector of T.
17. Let A be an n x n matrix. Prove that
dim(span({/n, A,A2,.--})) < n.
18. Let A be an n x n matrix with characteristic polynomial
f(t) = (-l)ntn + an-it n-l
+ • ait + OQ.
(a) Prove that A is invertible if and only if an ^ 0.
(b) Prove that if A is invertible, then
A"1 = (-l/ao)[(-l)M n An — 1 n ,An~2 ai J.
Un

324 Chap. 5 Diagonalization
(c) Use (b) to compute A ! for
19. Let A denote the k X k matrix
/() 0 ••• 0 -a„
1 0 ••• 0 -a.!
0 1 ••• 0 -o.2
0 0
\0 0
0 -ak-2
1 -Ok-iJ
where ao,ai, • • •, ak-\ are arbitrary scalars. Prove that the character­
istic polynomial of A is
(-!)*(«<) 4-ai/, + --- + afc_,/ k-
+ f
Hint: Use mathematical induction on k, expanding the determinant
along the first row.
20. Let T be a linear operator on a vector space V, and suppose that V is
a T-cyclic subspace of itself. Prove that if U is a linear operator on V,
then UT = TU if and only if U = g(T) for some polynomial g(t). Hint:
Suppose that V is generated by /;. Choose g(t) according to Exercise 13
sothat o(T)(f) = U(t>).
21. Let T be a linear operator on a two-dimensional vector space V. Prove
that either V is a T-cyclic subspace of itself or T = cl for some scalar c.
22. Let T be a linear operator on a two-dimensional vector space V and
suppose that T ^ c\ for any scalar c. Show that if U is any linear
operator on V such that UT = TU, then U = q(T) for some polvnoniial
g(t).
23. Let T be a linear operator on a finite-dimensional vector space V, and
let W be a T-invariant subspace of V. Suppose that Vi,v2, vk are
eigenvectors of T corresponding to distinct eigenvalues. Prove that if
v\ 4- v2 4 (- vk is in W, then vi E W for all i. Hint: Use mathematical
induction on k.
24. Prove that the restriction of a diagonalizable linear operator T to any
nontrivial T-invariant subspace is also diagonalizable. Hint: Use the
result of Exercise 23.

Sec. 5.4 Invariant Subspaces and the Cayley-Hamilton Theorem 325
25. (a) Prove the converse to Exercise 18(a) of Section 5.2: If T and U
are diagonalizable linear operators on a finite-dimensional vector
space V such that UT = TU, then T and U are simultaneously
diagonalizable. (See the definitions in the exercises of Section 5.2.)
Hint: For any eigenvalue A of T, show that E\ is U-invariant, and
apply Exercise 24 to obtain a basis for EA of eigenvectors of U.
(b) State and prove a matrix version of (a).
26. Let T be a linear operator on an //-dimensional vector space V such that
T has n distinct eigenvalues. Prove that V is a T-cyclic subspace of itself.
Hint: Use Exercise 23 to find a vector v such that {v, T(v),..., Tn_1(v)}
is linearly independent.
Exercises 27 through 32 require familiarity with quotient spaces as defined
in Exercise 31 of Section 1.3. Before attempting these exercises, the reader
should first review the other exercises treating quotient spaces: Exercise 35
of Section 1.6, Exercise 40 of Section 2.1, and Exercise 24 of Section 2.4.
For the purposes of Exercises 27 through 32, T is a fixed linear operator on
a finite-dimensional vector space V, and W is a nonzero T-invariant subspace
of V. We require the following definition.
Definition. Let T be a linear operator on a vector space V, and let W
be a T-invariant subspace of V. Define T: V/W -* V/W by
T(v + W) = T(v) 4- W for any v 4- W E V/W.
27. (a) Prove; that T is well defined. That. is. show that T(e + W) =
T(v' + W) whenever v + \N = v' + W.
(b) Prove that T is a linear operator on V/W.
(c) bet //: V —> V/W be the linear transformation defined in Exer­
cise 40 of Section 2.1 by TJ(V) = v + W. Show that the diagram of
Figure 5.6 commutes; that is, prove that rjT = Tip (This exercise
does not require the assumption that V is finite-dimensional.)
V V
V/W V/W
Figure 5.6
28. Let f(t). g(t), and h(t) be the characteristic polynomials of T, Tw.
and T, respectively. Prove that f(t) = g(i)h(i). Hint: Extend an
ordered basis 7 = {v\,v2,... ,vk} for W to an ordered basis ft =
{vi, v2,..., vk, Vfc+ij • • • ,vn} for V. Then show that the collection of

326 Chap. 5 Diagonalization
cosets a = {vk+i 4- W, vk \.2 4- W,..., v„ + W} is an ordered basis for
V/W, and prove; that
where £j = [T]7 and S3 = [T]f,.
29. Use the hint in Exercise 28 to prove that if T is diagonalizable. then so
isT.
30. Prove that if both Tw and T are diagonalizable and have no common
eigenvalues, then T is diagonalizable.
The results of Theorem 5.22 and Exercise 28 are useful in devising methods
for computing characteristic polynomials without the use of determinants.
This is illustrated in the next exercise.
/I , -3
31. Let A = 2 3 4 . let, T = LA. and let W be the cyclic subspace
V 2 l)
of R generated by e±.
(a) Use Theorem 5.22 to compute1 the characteristic polynomial of Tw.
(b) Show that {e2 4 W} is a basis for R;i/W, and use this fact to
compute the characteristic polynomial of T.
(c) Use the results of (a) and (b) to find the characteristic polynomial
of A.
32. Prove the converse to Exercise 9(a) of Section 5.2: If the characteristic
polynomial of T splits, then there is an ordered basis 0 for V such
that [T].j is an upper triangular matrix. Hints: Apply mathematical
induction to diin(V). First prove that T has an eigenvector v. 1(4 W =
span({e}), and apply the induction hypothesis to T: V/W —* V/W.
Exercise 35(b) of Section 1.6 is helpful here.
Exercises 33 through 40 arc; concerned with direct sums.
33. Let T be a linear operator on a vector space V. and let Wi. W> W/,
be; T-invariant subspaces of V. Prove that W, 4- W2 4 h Wfc is also
a T-invariant subspace of V.
34. Give a direct proof of Theorem 5.25 for the case k = 2. (This result is
used in the proof of Theorem 5.24.)
35. Prove Theorem 5.25. Hint: Begin with Exercise 34 and extend it using
mathematical induction on k.. the number of subspaces.

Sec. 5.4 Invariant Subspaces and the Cayley-Hamilton Theorem 327
36. Lei T be a linear operator on a finite-dimensional vector space V.
Prove that T is diagonalizable if and only if V is the direct sum of
one-dimensional T-invariant subspaces.
37. Let T be a linear operator on a. finite-dimensional vector space1 V.
and let W|.W_> WA. be T-invariant subspaces of V such that V =
WiO W2 ] ••• DWfc. Prove that
det(T) =det(TW|)«let(Tw,) •••<!(•! (Tw,).
38. Let T be a linear operator on a finite-dimensional vector space V.
and let Wi,W-_> W,, be T-invariant subspaces of V such that V
W| | Wj : • • • W/,. Prove that T is diagonalizable if and only if Tw,
is diagonalizable for all /'.
39. Let C be a collection of diagonalizable linear operators on a finite-
dimensional vector space V. Prove that there is an ordered basis .3
such that [T].( is a diagonal matrix for all T E C if and only if the;
operators of C commute under composition. (This is an extension of
Exercise 25.) Hints for the ease that the operators commute: The result
is trivial if each operator has only OIK1 eigenvalue. Otherwise, establish
the general result by mathematical induction on dim(V), using the fact
that V is the direct sum of the eigenspaces of some operator in C that.
has more than one eigenvalue.
40. Let B\. B2..... B\, be square matrices with entries in the same field, and
let A — B\ B> > ••• i- Bk. Prove that the characteristic polynomial
of A is the product of the characteristic polynomials of the P,'s.
41. Let
A =
2
n • 2
2 - n. + I ir n 4- 2
7
Find the characteristic polynomial of A. Hint: First prove that A has
rank 2 and that span({(l, 1 1), (1,2 //)}) is L^-invariant.
42. Let .4 E M„x„(/i) be the matrix defined by ,4,;, — 1 for all i and j.
find the characteristic polynomial of .4.

328 Chap. 5 Diagonalization
INDEX OF DEFINITIONS FOR CHAPTER 5
Absorbing Markov chain 404
Absorbing state 404
Characteristic polynomial of a linear
operator 249
Characteristic polynomial of a ma­
trix 218
Column sum of a matrix 295
Convergence of matrices 284
Cyclic subspace .'514
Diagonalizable linear operator 245
Diagonalizable matrix 246
Direct sum of matrices 320
Direct sum of subspaces 275
Eigenspace of a linear operator 264
Eigenspace of a matrix 264
Eigenvalue of a linear operator 246
Eigenvalue of a matrix 246
Eigenvector of a linear operator
246
Eigenvector of a matrix 246
Fixed probability vector 301
Generator of a cyclic subspace 313
Gerschgorin disk 296
Initial probability vector for a
Markov chain 292
Invariant, subspace 313
Limit of a sequence of mat rices 284
Markov chain 291
Markov process 291
Multiplicity of an eigenvalue 263
Probability vector 289
Regular transition matrix 294
Row sum of a matrix 295
Splits 262
Stochastic process 288
Sum of subspaces 275
Transition matrix 288

6
Inner Product Spaces
6.1
6.2
Inner Products and Norms
The Gram-Schmidt Orthogonalization Process and Orthogonal
Complements
The Adjoint of a Linear Operator
Normal and Self-Adjoint Operators
Unitary and Orthogonal Operators and Their Matrices
Orthogonal Projections and the Spectral Theorem
The Singular Value Decomposition and the Pseudoinverse
Bilinear and Quadratic Forms
Einstein's Special Theory of Relativity
6.10* Conditioning and the Rayleigh Quotient
6.11* The Geometry of Orthogonal Operators
6.3
6.4
6.5
6.6
6.7*
6.8*
6.9*
M, Lost applications of mathematics are involved with the concept, of mea­
surement and hence of the magnitude or relative size of various quantities. So
it is not surprising that the fields of real and complex numbers, which have a
built-in notion of distance, should play a special role. Except for Section 6.8,
we assume that all vector spaces are over the field F, where F denotes either
R or C. (See Appendix D for properties of complex numbers.)
We introduce the idea of distance or length into vector spaces via a much
richer structure, the so-called inner product spaa structure. This added
structure provides applications to geometry (Sections 6.5 and 6.11), physics
(Section 6.9), conditioning in systems of linear equations (Section 6.10), least
squares (Section 6.3), and quadratic forms (Section G.8).
6.1 INNER PRODUCTS AND NORMS
Many geometric notions such as angle, length, and perpendicularity in R~
and R'' may be extended to more general real and complex vector spaces. All
of these ideas are related to the concept of inner product.
Definition. Let V be a vector space over F. An inner product on V
is a function that assigns, to every ordered pair of vectors .r and y in V. a
329

330 Chap. 6 Inner Product Spaces
scalar in F, denoted (x,y), such that for all x. y, and z in V and all c in F,
the following hold:
(a) (x + z,y) = (x,y) + (z,y).
(b) (cx,y) = c(x,y).
(c) (x,y) = (y,x), where the bar denotes complex conjugation.
(d) (x,x) X) ifxi= 0.
Note that (e) reduces to (x,y) = (y,x) if F = R. Conditions (a) and (b)
simply require that the; inner product be linear in the first component.
It. is easily shown that if a\ .a2,... , an E F and y, v\, v2,... , vn E V. t hen
(^aiVi,y) = ^2at (vt.y).
Example 1
For x = (ai, a2..... an) and y = (b\, b2...... bn) in F". define
n
fay) = ^2"i'>i-
i i
The verification that (•• •) satisfies conditions (a) through (d) is easy. For
example, if z — (c\. c2-. • • • • c-n). we have for (a.)
(x 4- z, y) - ^2(a-i + Ci)k = ^ a. J), + ^ c,b,
i=l i=l i=l
= fa y) + (z,. y).
Thus, for x = (1 4- i,4) and y = (2 - 3i,4 I 5i) in C2,
(x, y) = (i + i)(2 4- 30 + 4(4 - 5i) = 15-151 •
The inner product in Example 1 is called the standard inner product
on F". When F — R the conjugations are not needed, and in early courses
this standard inner product is usually called the dot product and is denoted
by X'-y instead of {x,y).
Example 2
If (;/;, y) is any inner product on a. vector space; V and r > 0, we may define
another inner product by the rule (x,y) = r{x.y). If r < 0, then (d) would
not hold. •

Sec. 6.1 Inner Products and Norms 331
Example 3
Let V — C([0,1]), the vector space of real-valued continuous functions on
[0,1]. For f.g C V. define (f.g) - ][] f(t)g(t)dt. Since the preceding integral
is linear in /. (a) and (b) are immediate, and (c) is trivial. If/ / tl. then f2
is bounded away from zero on some subinterval of [0,1] (continuity is used
here), and hence (/. /) = /J [f(t)]2 dt > 0. •
Definition. Let A E MmXn(F). We define the conjugate transpose
or adjoint of A to be the n x m matrix A* such that (A')ij — A J; for all i.j.
Example 4
Let
Then
A =
i 1 4-2/
2 3 + U
A* =
1 - 2/ 3 \i
Notice that if x and y are viewed as column vectors in F". then (x,y) =
//'•'••
The conjugate transpose of a matrix plays a very important role in the
remainder of this chapter. In the case that .4 has real entries. A': is simply
the I ranspose of A.
Example 5
Let V - M„X„(F). and define (.4. B) - tv(B'A) for A.B E V. (Recall that
the trace of a matrix A is defined by tr(.l) — Y]"_ , An.) We verify that
(a) and (d) of the definition of inner product, hold and leave (b) and (e) to
the reader, for this purpose, let. A,B,C C V. Then (using Exercise (> of
Section 1.4)
(.4 + B. C) = tr(C*(/l 4- B)) - tr(CM 4- C*B)
- triC* A) r- tr(C*B) = (A.C) + {B.C).
Also
(A. A) = tr(AM) = ^2(A*A)u = X)E(A*)**A«
i--\ ; \ I.- i
2
i=] fc=l
M,A,,-J:E^
/-I &=]
Now if .4 ^ (). then Aki ^ 0 for some k and i. So (A, A) > 0. •

332 Chap. 6 Inner Product Spaces
The inner product on MnXn(F) in Example 5 is called the Frobenius
inner product.
A vector space V over F endowed with a specific inner product is called
an inner product space. If F — C, we call V a complex inner product
space, whereas if F = R. we call V a real inner product space.
It is clear that if V has an inner product (x,y) and W is a subspace of
V, then W is also an inner product space when the same function (.x, y) is
restricted to the vectors x. y E W.
Thus Examples 1, 3, and 5 also provide examples of inner product spaces.
For the remainder of this cha,pter, Fn denotes the inner product space with
the standard inner product as defined in Exam.ple 1. Likewise, Mnxn(F)
denotes the inner product space with the Frobenius inner product as defined
in Example 5. The reader is cautioned that two distinct inner products on
a given vector space yield two distinct inner product spaces. For instance, it
can be shown that both
(f(x),g(x)), = 1 f(t)g(t)dt and (f(x),g(x))2 = j f(t)g(t)dt
are inner products on the vector space P(R). Even though the underlying
vector space is the same, however, these two inner products yield two different
inner product spaces. For example, the polynomials f(x) = x and g(x) — x2
are orthogonal in the second inner product space, but not in the first.
A very important inner product space that resembles C([0, l|) is the space
H of continuous complex-valued functions defined on the interval [0, 27t] with
the inner product
(f,g) =
I
2TT
f(t)g(t)dt.
The reason for the constant 1/27T will become evident later. This inner prod­
uct space, which arises often in the context of physical situations, is examined
more closely in later sections.
At this point, we mention a few facts about integration of complex-valued
functions. First, the imaginary number /' can be treated as a constant under
the integration sign. Second, every complex-valued function / may be written
as f — fi -\- if2, where f\ and f2 are real-valued functions. Thus we have
jf=]fi+ijf2 and //=//•
From these properties, as well as the assumption of continuity, it follows
that H is an inner product space (see Exercise 16(a)).
Some properties that follow easily from the definition of an inner product
are contained in the next theorem.

Sec. 6.1 Inner Products and Norms 333
Theorem 6.1. Let V be an inner product space. Then for x, y. 2 € V and
c E F. the following statements are true.
(a) (x.y + z) = (x.y) \ faz).
(b) facy) =c{x,y).
(c) (x,0) = (0.x) =0.
(d) (./•../•) =0 if and only if x - 0.
(e) //' {.r. //) = (./;. 2) for aii x E V. fcnen // — z.
Proof (a) We have
{./-.// 4- ;)-(// + z,x) = (y,x) + (z.x)
= (y,x) + fax) = fay) + faz).
The proofs of (b), (c), (d). and (e) are left as exercises.
The reader should observe that (a) and (b) of Theorem (i.l show thai the
inner product is conjugate linear in the second component.
In order to generalize t he notion of length in R' to arbitrary inner product
spaces, we need only observe that the length of x — (a,b,c) C R' is given by
s/a2 — b- \- c2 — sj(x,x). This leads to the following definition.
Definition. Let V be an inner product space. For x E V. we define the
norm or length of x by \x\ = \/(x.x).
Example 6
Let V — F". If ./• = (r/|. a-2 . • . -a,,), then
\x\ : ||(ai,a2 . ..,an)|| -
-1 I /2
E 2
i^ the Euclidean definition of Length. Note that if // — I. we have ||«|| — lal.
As we might expect, the well-known properties of Euclidean length in R'5
hold in general, as shown next.
Theorem 6.2. Let V be an inner product space over F. Then for all
x.y G V and c E F, the following statements are true.
(a) ||cx|| = |c|-|k|l-
(b) ||.r|| = 0 if and only if x = tl. In any ease, \x\ > 0.
(c) (Canchy Schwarz Inequality) \{.i\y)\ < ||:r|| • \y\.
(d) (Triangle Inequality) \x \ y\ < ||.r|| 4- ||/y||.

334 Chap. 6 Inner Product Spaces
Proof. We leave1 the proofs of (a) and (b) a.s exercises.
(c) ]£y = 0, then the result is immediate. So assume that y ^ 0. For any
c E F, we have
0 < \x — cy\ — (x — cy, x — cy) = (x, x - cy) — c (y. x — cy)
= (x, x) - c (x, y) - c (y, x) f cc (y. y).
In particular, if we set
the inequality becomes
fay)
(<j-y)
o<fax)-^i:-y)f =Wx\?-l^y)
[y,y)
from which (c) follows.
(d) We have
\x + y||2 = (x + y,x 4 y) = fa x) + (y. x) + (x. y) + (y. y)
= \x\2 4- 2${x,y) \\y\2
<||:r||24-2|(.r.|/)| + ||y||2
< ||;7;||2 + 2||:r||-||y|| + \y\2
= (\x\ + \y)2.
where )R(x,y) denotes the real part of the complex number (x.y). Note that
we used (c) to prove (d). 1
The case when equality results in (c) and (d) is considered in Exercise 15.
Example 7
For F", we may apply (c) and (d) of Theorem G.2 to the standard inner
product to obtain the following well-known inequalities:
J^dibi <
II
&''I2
i= 1
1 /2 r" i
£N2
J-\ J
and
1 1/2
£k I 'hi2 <
-I 1/2
Ei«.i
L; = l
-i 1/2
EN2

Sec. 6.1 Inner Products and Norms 335
The reader may recall from earlier courses that, for x and y in R3 or R2,
we have that (x.y) — ||x||*|MI cos0, where 0 (0 < 6 < TT) demotes the angle
between x and y. This equation implies (c) immediately since |cos#| < 1.
Notice also that nonzero vectors x and y are perpendicular if and only if
cos# = 0. that is. if and only if (x.y) = 0.
We are now at the point where we can generalize the notion of perpendic­
ularity to arbitrary inner product spaces.
Definitions. Let V he an inner product space. Vectors x and y in V are
orthogonal (perpendicular) if (x.y) = 0. A subset S ofV is orthogonal
if any two distinct vectors in S are orthogonal. A vector x in V is a. unit
vector if\x\ = 1. Finally, a subset S of V is orthonormal ifS is orthogonal
and consists entirely of unit vectors.
Note that if S = {v\, v2....}, then S is orthonormal if and only if (vi,Vj) =
Sij, where Sij denotes the Kronecker delta. Also, observe that multiplying
vectors by nonzero scalars does not affect, their orthogonality and that if x is
any nonzero vector, then (l/||x||)rr is a unit vector. The process of multiplying
a nonzero vector by the reciprocal of its length is called normalizing.
Example 8
In F3, {(1.1. 0), (1, —1, 1). (—1,1, 2)} is an orthogonal set of nonzero vectors,
but it is not orthonormal; however, if we normalize the vectors in the set, we
obtain the orthonormal set
{-L(u,o), -^(1,-1,1), 4=(-i,i,2)}
Our next example is of an infinite orthonormal set that is important in
analysis. This set is used in later examples in this chapter.
Example 9
Recall the inner product space H (defined on page 332). We introduce an im­
portant orthonormal subset S of H. For what follows, i is the imaginary num­
ber such that i2 = — 1. For any integer n, let fn(t) — eint, where 0 < t < 2-K.
(Recall that eint — count + isinnt.) Now define S = {/„.: n is an integer}.
Clearly S is a subset of H. Using the property that elt — e '' for every real
number t. we have, for m ^ n,
\/TOJ Jn) — 0
»2TT
eimteint dt =
2TT
1
•_'77
2iri(rn
__ i(rn — n)t
ei(m-n)t dt
= 0.

336
Also,
Chap. 6 Inner Product Spaces
In other words. {/,„./„.) = Smn. •
EXERCISES
1. Label the following statements as true or false.
(a) An inner product is a scalar-valued function on the set of ordered
pairs of vectors.
(b) An inner product space must, be over the field of real or complex
numbers.
(c) An inner product is linear in both components.
(d) There is exactly one inner product on the vector space R".
(e) The triangle inequality only holds in finite-dimensional inner prod­
uct, spaces.
(f) Only square matrices have a conjugate-transpose.
(g) If x. y. and z are vectors in an inner product space such that
(x,y) — (x.z), then y — z.
(h) If (x,y) — 0 for all x in an inner product space, then y = 0.
2. Let. x = (2.1 f i,i) and y - (2 v',2. 1 4-2/) be vectors in C3. Compute
(x.y). \x\, \y\, and ||:r + v/||. Then verify both the Cauchy Schwarz
inequality and the triangle inequality.
3. In C([0, 1]), let f(t) - I and g(t) = e'. Compute (f\g) (as defined in
Example 3). |j/||, '|<y,|. and I g\. Then verify both the Cauchy
Schwarz inequahty and the triangle; inequality.
4. (a) Complete the proof in Example 5 that (•, •) is an inner product
(the Frobenius inner product) on MnX,,(/**).
(b) Use the Frobenius inner product to compute ||.A||, \B\, and (A.B)
for
,4 -
1 2 \ i
and B =
1 4- i 0
5. In C2. show that (x.y) — xAy* is an inner product, where
.4 -
1 i
-i 2
Compute (x, y) for x = (1 - i, 2 + 3i) and y - (2 4- i, 3 -- 2i).

Sec.
6.
7.
9.
10.
11.
12.
6.1 Inner Products and Norms 337
Complete the proof of Theorem 6.1.
Complete the proof of Theorem Ci.2.
Provide reasons why each of the following is not an inner product on
the given vector spaces.
(a) ((a.b).(c.d)) =ac bd on R2.
(b) (A, B) = tv(A 4- B) on U2x2(R).
(c) (f(x),g(x)) = /J f'(t)g(t)dt on P(R), where ' denotes differentia­
tion.
Let 0 be a basis for a finite-dimensional inner product space.
(a) Prove that if (x, z) = 0 for all z E ft. then x = 0.
(b) Prove that if (x. z) = (y, z) for all z E 0, then x = y.
Let V be an inner product space, and suppose that X and y are orthog­
onal vectors in V. Prove that \x 4- y\2 = ||a:||2 4- \y\2• Deduce the
Pythagorean theorem in R2.
Prove the parallelogram law on an inner product space V: that is, show
that
\x + y\2 + \x - y\2 = 2||;c||2 4- 2\y\2 for all x. y E V.
What does this equation state about parallelograms in R2?
Let {v\. v2,..., <••/,.} be an orthogonal set in V, and let a.\, a2,.... ak be
scalars. Prove that
J^^i
i=l
Ei«.ni".n2-
i=
13. Suppose that ⟨·, ·⟩₁ and ⟨·, ·⟩₂ are two inner products on a vector space V. Prove that ⟨·, ·⟩ = ⟨·, ·⟩₁ + ⟨·, ·⟩₂ is another inner product on V.
14. Let A and B be n × n matrices, and let c be a scalar. Prove that (A + cB)* = A* + c̄B*.
15. (a) Prove that if V is an inner product space, then |⟨x, y⟩| = ||x|| · ||y|| if and only if one of the vectors x or y is a multiple of the other. Hint: If the identity holds and y ≠ 0, let
        a = ⟨x, y⟩ / ||y||²
    and let z = x − ay. Prove that y and z are orthogonal and |a| = ||x|| / ||y||. Then apply Exercise 10 to ||x||² = ||ay + z||² to obtain ||z|| = 0.
    (b) Derive a similar result for the equality ||x + y|| = ||x|| + ||y||, and generalize it to the case of n vectors.
16. (a) Show that the vector space H with ⟨·, ·⟩ defined on page 332 is an inner product space.
    (b) Let V = C([0, 1]), and define
        ⟨f, g⟩ = ∫₀^{1/2} f(t)g(t) dt.
    Is this an inner product on V?
17. Let T be a linear operator on an inner product space V, and suppose that ||T(x)|| = ||x|| for all x. Prove that T is one-to-one.
18. Let V be a vector space over F, where F = R or F = C, and let W be an inner product space over F with inner product ⟨·, ·⟩. If T: V → W is linear, prove that ⟨x, y⟩′ = ⟨T(x), T(y)⟩ defines an inner product on V if and only if T is one-to-one.
19. Let V be an inner product space. Prove that
    (a) ||x ± y||² = ||x||² ± 2ℜ⟨x, y⟩ + ||y||² for all x, y ∈ V, where ℜ⟨x, y⟩ denotes the real part of the complex number ⟨x, y⟩.
    (b) | ||x|| − ||y|| | ≤ ||x − y|| for all x, y ∈ V.
20. Let V be an inner product space over F. Prove the polar identities: For all x, y ∈ V,
    (a) ⟨x, y⟩ = (1/4)||x + y||² − (1/4)||x − y||² if F = R;
    (b) ⟨x, y⟩ = (1/4) Σ_{k=1}^{4} i^k ||x + i^k y||² if F = C, where i² = −1.
21. Let A be an n × n matrix. Define
        A₁ = (1/2)(A + A*)   and   A₂ = (1/2i)(A − A*).
    (a) Prove that A₁* = A₁, A₂* = A₂, and A = A₁ + iA₂. Would it be reasonable to define A₁ and A₂ to be the real and imaginary parts, respectively, of the matrix A?
    (b) Let A be an n × n matrix. Prove that the representation in (a) is unique. That is, prove that if A = B₁ + iB₂, where B₁* = B₁ and B₂* = B₂, then B₁ = A₁ and B₂ = A₂.

22. Let V be a real or complex vector space (possibly infinite-dimensional), and let β be a basis for V. For x, y ∈ V there exist v₁, v₂, ..., vₙ ∈ β such that
        x = Σ_{i=1}^n aᵢvᵢ   and   y = Σ_{i=1}^n bᵢvᵢ.
    Define
        ⟨x, y⟩ = Σ_{i=1}^n aᵢ b̄ᵢ.
    (a) Prove that ⟨·, ·⟩ is an inner product on V and that β is an orthonormal basis for V. Thus every real or complex vector space may be regarded as an inner product space.
    (b) Prove that if V = Rⁿ or V = Cⁿ and β is the standard ordered basis, then the inner product defined above is the standard inner product.
23. Let V = Fⁿ, and let A ∈ M_{n×n}(F).
    (a) Prove that ⟨x, Ay⟩ = ⟨A*x, y⟩ for all x, y ∈ V.
    (b) Suppose that for some B ∈ M_{n×n}(F), we have ⟨x, Ay⟩ = ⟨Bx, y⟩ for all x, y ∈ V. Prove that B = A*.
    (c) Let α be the standard ordered basis for V. For any orthonormal basis β for V, let Q be the n × n matrix whose columns are the vectors in β. Prove that Q* = Q⁻¹.
    (d) Define linear operators T and U on V by T(x) = Ax and U(x) = A*x. Show that [U]_β = [T]_β* for any orthonormal basis β for V.
The following definition is used in Exercises 24–27.
Definition. Let V be a vector space over F, where F is either R or C. Regardless of whether V is or is not an inner product space, we may still define a norm || · || as a real-valued function on V satisfying the following three conditions for all x, y ∈ V and a ∈ F:
(1) ||x|| ≥ 0, and ||x|| = 0 if and only if x = 0.
(2) ||ax|| = |a| · ||x||.
(3) ||x + y|| ≤ ||x|| + ||y||.
24. Prove that the following are norms on the given vector spaces V.
    (a) V = M_{m×n}(F); ||A|| = max_{i,j} |A_{ij}| for all A ∈ V
    (b) V = C([0, 1]); ||f|| = max_{t∈[0,1]} |f(t)| for all f ∈ V
    (c) V = C([0, 1]); ||f|| = ∫₀¹ |f(t)| dt for all f ∈ V
    (d) V = R²; ||(a, b)|| = max{|a|, |b|} for all (a, b) ∈ V
25. Use Exercise 20 to show that there is no inner product ⟨·, ·⟩ on R² such that ||x||² = ⟨x, x⟩ for all x ∈ R² if the norm is defined as in Exercise 24(d).
26. Let || · || be a norm on a vector space V, and define, for each ordered pair of vectors, the scalar d(x, y) = ||x − y||, called the distance between x and y. Prove the following results for all x, y, z ∈ V.
    (a) d(x, y) ≥ 0.
    (b) d(x, y) = d(y, x).
    (c) d(x, y) ≤ d(x, z) + d(z, y).
    (d) d(x, x) = 0.
    (e) d(x, y) ≠ 0 if x ≠ y.
27. Let || · || be a norm on a real vector space V satisfying the parallelogram law given in Exercise 11. Define
        ⟨x, y⟩ = (1/4)[||x + y||² − ||x − y||²].
    Prove that ⟨·, ·⟩ defines an inner product on V such that ||x||² = ⟨x, x⟩ for all x ∈ V.
    Hints:
    (a) Prove ⟨x, 2y⟩ = 2⟨x, y⟩ for all x, y ∈ V.
    (b) Prove ⟨x + u, y⟩ = ⟨x, y⟩ + ⟨u, y⟩ for all x, u, y ∈ V.
    (c) Prove ⟨nx, y⟩ = n⟨x, y⟩ for every positive integer n and every x, y ∈ V.
    (d) Prove m⟨(1/m)x, y⟩ = ⟨x, y⟩ for every positive integer m and every x, y ∈ V.
    (e) Prove ⟨rx, y⟩ = r⟨x, y⟩ for every rational number r and every x, y ∈ V.
    (f) Prove |⟨x, y⟩| ≤ ||x|| ||y|| for every x, y ∈ V. Hint: Condition (3) in the definition of norm can be helpful.
    (g) Prove that for every c ∈ R, every rational number r, and every x, y ∈ V,
        |c⟨x, y⟩ − ⟨cx, y⟩| = |(c − r)⟨x, y⟩ − ⟨(c − r)x, y⟩| ≤ 2|c − r| ||x|| ||y||.
    (h) Use the fact that for any c ∈ R, |c − r| can be made arbitrarily small, where r varies over the set of rational numbers, to establish item (b) of the definition of inner product.

28. Let V be a complex inner product space with an inner product ⟨·, ·⟩. Let [·, ·] be the real-valued function such that [x, y] is the real part of the complex number ⟨x, y⟩ for all x, y ∈ V. Prove that [·, ·] is an inner product for V, where V is regarded as a vector space over R. Prove, furthermore, that [x, ix] = 0 for all x ∈ V.
29. Let V be a vector space over C, and suppose that [·, ·] is a real inner product on V, where V is regarded as a vector space over R, such that [x, ix] = 0 for all x ∈ V. Let ⟨·, ·⟩ be the complex-valued function defined by
        ⟨x, y⟩ = [x, y] + i[x, iy]   for x, y ∈ V.
    Prove that ⟨·, ·⟩ is a complex inner product on V.
30. Let || · || be a norm (as defined in Exercise 24) on a complex vector space V satisfying the parallelogram law given in Exercise 11. Prove that there is an inner product ⟨·, ·⟩ on V such that ||x||² = ⟨x, x⟩ for all x ∈ V.
    Hint: Apply Exercise 27 to V regarded as a vector space over R. Then apply Exercise 29.
6.2 THE GRAM-SCHMIDT ORTHOGONALIZATION PROCESS
AND ORTHOGONAL COMPLEMENTS
In previous chapters, we have seen the special role of the standard ordered bases for Cⁿ and Rⁿ. The special properties of these bases stem from the fact that the basis vectors form an orthonormal set. Just as bases are the building blocks of vector spaces, bases that are also orthonormal sets are the building blocks of inner product spaces. We now name such bases.
Definition. Let V be an inner product space. A subset of V is an orthonormal basis for V if it is an ordered basis that is orthonormal.
Example 1
The standard ordered basis for Fⁿ is an orthonormal basis for Fⁿ. •
Example 2
The set
is an orthonormal basis for R2. •

The next theorem and its corollaries illustrate why orthonormal sets and, in particular, orthonormal bases are so important.
Theorem 6.3. Let V be an inner product space and S = {v₁, v₂, ..., vₖ} be an orthogonal subset of V consisting of nonzero vectors. If y ∈ span(S), then
        y = Σ_{i=1}^k (⟨y, vᵢ⟩ / ||vᵢ||²) vᵢ.
Proof. Write y = Σ_{i=1}^k aᵢvᵢ, where a₁, a₂, ..., aₖ ∈ F. Then, for 1 ≤ j ≤ k, we have
        ⟨y, vⱼ⟩ = ⟨Σᵢ aᵢvᵢ, vⱼ⟩ = Σᵢ aᵢ⟨vᵢ, vⱼ⟩ = aⱼ⟨vⱼ, vⱼ⟩ = aⱼ||vⱼ||².
So aⱼ = ⟨y, vⱼ⟩ / ||vⱼ||², and the result follows.
The next corollary follows immediately from Theorem 6.3.
Corollary 1. If, in addition to the hypotheses of Theorem 6.3, S is orthonormal and y ∈ span(S), then
        y = Σ_{i=1}^k ⟨y, vᵢ⟩vᵢ.
If V possesses a finite orthonormal basis, then Corollary 1 allows us to compute the coefficients in a linear combination very easily. (See Example 3.)
Corollary 2. Let V be an inner product space, and let S be an orthogonal subset of V consisting of nonzero vectors. Then S is linearly independent.
Proof. Suppose that v₁, v₂, ..., vₖ ∈ S and
        Σ_{i=1}^k aᵢvᵢ = 0.
As in the proof of Theorem 6.3 with y = 0, we have aⱼ = ⟨0, vⱼ⟩ / ||vⱼ||² = 0 for all j. So S is linearly independent.

Example 3
By Corollary 2, the orthonormal set
        {(1/√2)(1, 1, 0), (1/√3)(1, −1, 1), (1/√6)(−1, 1, 2)}
obtained in Example 8 of Section 6.1 is an orthonormal basis for R³. Let x = (2, 1, 3). The coefficients given by Corollary 1 to Theorem 6.3 that express x as a linear combination of the basis vectors are
        a₁ = (1/√2)(2 + 1) = 3/√2,   a₂ = (1/√3)(2 − 1 + 3) = 4/√3,
and
        a₃ = (1/√6)(−2 + 1 + 6) = 5/√6.
As a check, we have
        (2, 1, 3) = (3/2)(1, 1, 0) + (4/3)(1, −1, 1) + (5/6)(−1, 1, 2). •
Corollary 2 tells us that the vector space H in Section 6.1 contains an
infinite linearly independent set, and hence H is not a finite-dimensional vector
space.
Of course, we have not yet shown that every finite-dimensional inner product space possesses an orthonormal basis. The next theorem takes us most of the way in obtaining this result. It tells us how to construct an orthogonal set from a linearly independent set of vectors in such a way that both sets generate the same subspace.
Before stating this theorem, let us consider a simple case. Suppose that {w₁, w₂} is a linearly independent subset of an inner product space (and hence a basis for some two-dimensional subspace). We want to construct an orthogonal set from {w₁, w₂} that spans the same subspace. Figure 6.1 suggests that the set {v₁, v₂}, where v₁ = w₁ and v₂ = w₂ − cw₁, has this property if c is chosen so that v₂ is orthogonal to w₁.
To find c, we need only solve the following equation:
        0 = ⟨v₂, w₁⟩ = ⟨w₂ − cw₁, w₁⟩ = ⟨w₂, w₁⟩ − c⟨w₁, w₁⟩.
So
        c = ⟨w₂, w₁⟩ / ||w₁||².
Thus
        v₂ = w₂ − (⟨w₂, w₁⟩ / ||w₁||²) w₁.

Figure 6.1
The next theorem shows us that this process can be extended to any finite linearly independent subset.
Theorem 6.4. Let V be an inner product space and S = {w₁, w₂, ..., wₙ} be a linearly independent subset of V. Define S′ = {v₁, v₂, ..., vₙ}, where v₁ = w₁ and
        vₖ = wₖ − Σ_{j=1}^{k−1} (⟨wₖ, vⱼ⟩ / ||vⱼ||²) vⱼ   for 2 ≤ k ≤ n.   (1)
Then S′ is an orthogonal set of nonzero vectors such that span(S′) = span(S).
Proof. The proof is by mathematical induction on n, the number of vectors in S. For k = 1, 2, ..., n, let Sₖ = {w₁, w₂, ..., wₖ}. If n = 1, then the theorem is proved by taking S₁′ = S₁; i.e., v₁ = w₁ ≠ 0. Assume then that the set S′ₖ₋₁ = {v₁, v₂, ..., vₖ₋₁} with the desired properties has been constructed by the repeated use of (1). We show that the set S′ₖ = {v₁, v₂, ..., vₖ₋₁, vₖ} also has the desired properties, where vₖ is obtained from S′ₖ₋₁ by (1). If vₖ = 0, then (1) implies that wₖ ∈ span(S′ₖ₋₁) = span(Sₖ₋₁), which contradicts the assumption that Sₖ is linearly independent. For 1 ≤ i ≤ k − 1, it follows from (1) that
        ⟨vₖ, vᵢ⟩ = ⟨wₖ, vᵢ⟩ − Σ_{j=1}^{k−1} (⟨wₖ, vⱼ⟩ / ||vⱼ||²) ⟨vⱼ, vᵢ⟩ = ⟨wₖ, vᵢ⟩ − (⟨wₖ, vᵢ⟩ / ||vᵢ||²) ||vᵢ||² = 0,
since ⟨vⱼ, vᵢ⟩ = 0 if i ≠ j by the induction assumption that S′ₖ₋₁ is orthogonal. Hence S′ₖ is an orthogonal set of nonzero vectors. Now, by (1), we have that span(S′ₖ) ⊆ span(Sₖ). But by Corollary 2 to Theorem 6.3, S′ₖ is linearly independent; so dim(span(S′ₖ)) = dim(span(Sₖ)) = k. Therefore span(S′ₖ) = span(Sₖ).
The construction of {v₁, v₂, ..., vₙ} by the use of Theorem 6.4 is called the Gram–Schmidt process.

Example 4
In R⁴, let w₁ = (1, 0, 1, 0), w₂ = (1, 1, 1, 1), and w₃ = (0, 1, 2, 1). Then {w₁, w₂, w₃} is linearly independent. We use the Gram–Schmidt process to compute the orthogonal vectors v₁, v₂, and v₃, and then we normalize these vectors to obtain an orthonormal set.
Take v₁ = w₁ = (1, 0, 1, 0). Then
        v₂ = w₂ − (⟨w₂, v₁⟩ / ||v₁||²) v₁ = (1, 1, 1, 1) − (2/2)(1, 0, 1, 0) = (0, 1, 0, 1).
Finally,
        v₃ = w₃ − (⟨w₃, v₁⟩ / ||v₁||²) v₁ − (⟨w₃, v₂⟩ / ||v₂||²) v₂
           = (0, 1, 2, 1) − (2/2)(1, 0, 1, 0) − (2/2)(0, 1, 0, 1) = (−1, 0, 1, 0).
These vectors can be normalized to obtain the orthonormal basis {u₁, u₂, u₃}, where
        u₁ = v₁ / ||v₁|| = (1/√2)(1, 0, 1, 0),
        u₂ = v₂ / ||v₂|| = (1/√2)(0, 1, 0, 1),
and
        u₃ = v₃ / ||v₃|| = (1/√2)(−1, 0, 1, 0). •
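The computation in Example 4 is easy to check numerically. The sketch below is an addition to this text (not part of the original); it implements the update (1) of Theorem 6.4 with NumPy and reproduces v₁, v₂, v₃ and the normalized vectors.

import numpy as np

def gram_schmidt(vectors):
    # Classical Gram-Schmidt: v_k = w_k - sum_j <w_k, v_j>/||v_j||^2 * v_j, as in formula (1)
    orthogonal = []
    for w in vectors:
        v = w.astype(float)
        for u in orthogonal:
            v = v - (np.dot(w, u) / np.dot(u, u)) * u
        orthogonal.append(v)
    return orthogonal

w1, w2, w3 = np.array([1, 0, 1, 0]), np.array([1, 1, 1, 1]), np.array([0, 1, 2, 1])
v1, v2, v3 = gram_schmidt([w1, w2, w3])            # (1,0,1,0), (0,1,0,1), (-1,0,1,0)
u1, u2, u3 = (v / np.linalg.norm(v) for v in (v1, v2, v3))   # normalized basis
print(v1, v2, v3)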
Example 5
Let V = P(R) with the inner product ⟨f(x), g(x)⟩ = ∫₋₁¹ f(t)g(t) dt, and consider the subspace P₂(R) with the standard ordered basis β. We use the Gram–Schmidt process to replace β by an orthogonal basis {v₁, v₂, v₃} for P₂(R), and then use this orthogonal basis to obtain an orthonormal basis for P₂(R).
Take v₁ = 1. Then ||v₁||² = ∫₋₁¹ 1² dt = 2, and ⟨x, v₁⟩ = ∫₋₁¹ t·1 dt = 0. Thus
        v₂ = x − (⟨x, v₁⟩ / ||v₁||²) v₁ = x − (0/2)·1 = x.
Furthermore,
        ⟨x², v₁⟩ = ∫₋₁¹ t²·1 dt = 2/3   and   ⟨x², v₂⟩ = ∫₋₁¹ t²·t dt = 0.
Therefore
        v₃ = x² − (⟨x², v₁⟩ / ||v₁||²) v₁ − (⟨x², v₂⟩ / ||v₂||²) v₂ = x² − (1/3)·1 − 0·x = x² − 1/3.
We conclude that {1, x, x² − 1/3} is an orthogonal basis for P₂(R).
To obtain an orthonormal basis, we normalize v₁, v₂, and v₃ to obtain
        u₁ = 1 / √(∫₋₁¹ 1² dt) = 1/√2,
and similarly,
        u₂ = x / √(∫₋₁¹ t² dt) = √(3/2) x,
and
        u₃ = v₃ / ||v₃|| = √(5/8)(3x² − 1).
Thus {u₁, u₂, u₃} is the desired orthonormal basis for P₂(R). •
If we continue applying the Gram–Schmidt orthogonalization process to the basis {1, x, x², ...} for P(R), we obtain an orthogonal basis whose elements are called the Legendre polynomials. The orthogonal polynomials v₁, v₂, and v₃ in Example 5 are the first three Legendre polynomials.
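As an illustration (added here, not in the original text), the same process can be carried out symbolically: applying Gram–Schmidt to 1, x, x², x³ with the inner product ⟨f, g⟩ = ∫₋₁¹ f(t)g(t) dt produces the first few Legendre polynomials up to scalar multiples.

import sympy as sp

x = sp.symbols('x')
inner = lambda f, g: sp.integrate(f * g, (x, -1, 1))   # <f, g> on [-1, 1]

def gram_schmidt(polys):
    ortho = []
    for w in polys:
        v = w
        for u in ortho:
            v = v - inner(w, u) / inner(u, u) * u      # formula (1) of Theorem 6.4
        ortho.append(sp.expand(v))
    return ortho

print(gram_schmidt([1, x, x**2, x**3]))   # [1, x, x**2 - 1/3, x**3 - 3*x/5]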
The following result gives us a simple method of representing a vector as
a linear combination of the vectors in an orthonormal basis.
Theorem 6.5. Let V be a nonzero finite-dimensional inner product space. Then V has an orthonormal basis β. Furthermore, if β = {v₁, v₂, ..., vₙ} and x ∈ V, then
        x = Σ_{i=1}^n ⟨x, vᵢ⟩vᵢ.

Proof. Let β₀ be an ordered basis for V. Apply Theorem 6.4 to obtain an orthogonal set β′ of nonzero vectors with span(β′) = span(β₀) = V. By normalizing each vector in β′, we obtain an orthonormal set β that generates V. By Corollary 2 to Theorem 6.3, β is linearly independent; therefore β is an orthonormal basis for V. The remainder of the theorem follows from Corollary 1 to Theorem 6.3.
Example 6
We use Theorem 6.5 to represent the polynomial f(x) = 1 + 2x + 3x² as a linear combination of the vectors in the orthonormal basis {u₁, u₂, u₃} for P₂(R) obtained in Example 5. Observe that
        ⟨f(x), u₁⟩ = ∫₋₁¹ (1/√2)(1 + 2t + 3t²) dt = 2√2,
        ⟨f(x), u₂⟩ = ∫₋₁¹ √(3/2) t (1 + 2t + 3t²) dt = 2√6/3,
and
        ⟨f(x), u₃⟩ = ∫₋₁¹ √(5/8)(3t² − 1)(1 + 2t + 3t²) dt = 2√10/5.
Therefore f(x) = 2√2 u₁ + (2√6/3) u₂ + (2√10/5) u₃. •
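The three coefficients can be confirmed by numerical integration; the following check is my addition and is not part of the text.

import numpy as np
from numpy.polynomial.legendre import leggauss

nodes, weights = leggauss(10)                 # quadrature rule for integrals over [-1, 1]
f  = lambda t: 1 + 2*t + 3*t**2
u1 = lambda t: np.full_like(t, 1/np.sqrt(2))
u2 = lambda t: np.sqrt(3/2) * t
u3 = lambda t: np.sqrt(5/8) * (3*t**2 - 1)

coeffs = [np.sum(weights * f(nodes) * u(nodes)) for u in (u1, u2, u3)]
print(coeffs)   # approximately [2*sqrt(2), 2*sqrt(6)/3, 2*sqrt(10)/5]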
Theorem 6.5 gives us a simple method for computing the entries of the
matrix representation of a linear operator with respect to an orthonormal
basis.
Corollary. Let V be a finite-dimensional inner product space with an orthonormal basis β = {v₁, v₂, ..., vₙ}. Let T be a linear operator on V, and let A = [T]_β. Then for any i and j, A_{ij} = ⟨T(vⱼ), vᵢ⟩.
Proof. From Theorem 6.5, we have
        T(vⱼ) = Σ_{i=1}^n ⟨T(vⱼ), vᵢ⟩vᵢ.
Hence A_{ij} = ⟨T(vⱼ), vᵢ⟩.
The scalars ⟨x, vᵢ⟩ given in Theorem 6.5 have been studied extensively for special inner product spaces. Although the vectors v₁, v₂, ..., vₙ were chosen from an orthonormal basis, we introduce a terminology associated with orthonormal sets β in more general inner product spaces.

Definition. Let β be an orthonormal subset (possibly infinite) of an inner product space V, and let x ∈ V. We define the Fourier coefficients of x relative to β to be the scalars ⟨x, y⟩, where y ∈ β.
In the first half of the 19th century, the French mathematician Jean Baptiste Fourier was associated with the study of the scalars
        ∫₀^{2π} f(t) sin nt dt   and   ∫₀^{2π} f(t) cos nt dt,
or more generally,
        cₙ = (1/2π) ∫₀^{2π} f(t) e^{−int} dt,
for a function f. In the context of Example 9 of Section 6.1, we see that cₙ = ⟨f, fₙ⟩, where fₙ(t) = e^{int}; that is, cₙ is the nth Fourier coefficient for a continuous function f ∈ V relative to S. These coefficients are the "classical" Fourier coefficients of a function, and the literature concerning the behavior of these coefficients is extensive. We learn more about these Fourier coefficients in the remainder of this chapter.
Example 7
Let S = {e^{int} : n is an integer}. In Example 9 of Section 6.1, S was shown to be an orthonormal set in H. We compute the Fourier coefficients of f(t) = t relative to S. Using integration by parts, we have, for n ≠ 0,
        ⟨f, fₙ⟩ = (1/2π) ∫₀^{2π} t e^{−int} dt = (1/2π)[ −(t/(in)) e^{−int} ]₀^{2π} + (1/(2πin)) ∫₀^{2π} e^{−int} dt = −1/(in),
and, for n = 0,
        ⟨f, 1⟩ = (1/2π) ∫₀^{2π} t·1 dt = π.
As a result of these computations, and using Exercise 16 of this section, we obtain an upper bound for the sum of a special infinite series as follows:
        ||f||² ≥ Σ_{n=−k}^{k} |⟨f, fₙ⟩|² = |⟨f, 1⟩|² + Σ_{n=−k, n≠0}^{k} |⟨f, fₙ⟩|² = π² + 2 Σ_{n=1}^{k} 1/n²
for every k. Now, using the fact that ||f||² = (1/2π) ∫₀^{2π} t² dt = 4π²/3, we obtain
        4π²/3 ≥ π² + 2 Σ_{n=1}^{k} 1/n²,   or   π²/6 ≥ Σ_{n=1}^{k} 1/n².
Because this inequality holds for all k, we may let k → ∞ to obtain
        π²/6 ≥ Σ_{n=1}^{∞} 1/n².
Additional results may be produced by replacing f by other functions. •
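A quick numerical illustration (added here, not from the text): the partial sums of Σ 1/n² indeed stay below the bound π²/6 obtained above.

import math

bound = math.pi**2 / 6
for k in (10, 100, 1000, 10000):
    partial = sum(1.0 / n**2 for n in range(1, k + 1))
    print(k, partial, partial <= bound)   # always True; the partial sums approach pi^2/6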
We are now ready to proceed with the concept of an orthogonal complement.
Definition. Let S be a nonempty subset of an inner product space V. We define S⊥ (read "S perp") to be the set of all vectors in V that are orthogonal to every vector in S; that is, S⊥ = {x ∈ V : ⟨x, y⟩ = 0 for all y ∈ S}. The set S⊥ is called the orthogonal complement of S.
It is easily seen that S⊥ is a subspace of V for any subset S of V.
Example 8
The reader should verify that {0}⊥ = V and V⊥ = {0} for any inner product space V. •
Example 9
If V = R³ and S = {e₃}, then S⊥ equals the xy-plane (see Exercise 5). •
Exercise 18 provides an interesting example of an orthogonal complement in an infinite-dimensional inner product space.
Consider the problem in R³ of finding the distance from a point P to a plane W. (See Figure 6.2.) Problems of this type arise in many settings. If we let y be the vector determined by 0 and P, we may restate the problem as follows: Determine the vector u in W that is "closest" to y. The desired distance is clearly given by ||y − u||. Notice from the figure that the vector z = y − u is orthogonal to every vector in W, and so z ∈ W⊥.
The next result presents a practical method of finding u in the case that W is a finite-dimensional subspace of an inner product space.

Figure 6.2
Theorem 6.6. Let W be a finite-dimensional subspace of an inner product space V, and let y ∈ V. Then there exist unique vectors u ∈ W and z ∈ W⊥ such that y = u + z. Furthermore, if {v₁, v₂, ..., vₖ} is an orthonormal basis for W, then
        u = Σ_{i=1}^k ⟨y, vᵢ⟩vᵢ.
Proof. Let {v₁, v₂, ..., vₖ} be an orthonormal basis for W, let u be as defined in the preceding equation, and let z = y − u. Clearly u ∈ W and y = u + z.
To show that z ∈ W⊥, it suffices to show, by Exercise 7, that z is orthogonal to each vⱼ. For any j, we have
        ⟨z, vⱼ⟩ = ⟨y − Σ_{i=1}^k ⟨y, vᵢ⟩vᵢ, vⱼ⟩ = ⟨y, vⱼ⟩ − Σ_{i=1}^k ⟨y, vᵢ⟩⟨vᵢ, vⱼ⟩ = ⟨y, vⱼ⟩ − ⟨y, vⱼ⟩ = 0.
To show uniqueness of u and z, suppose that y = u + z = u′ + z′, where u′ ∈ W and z′ ∈ W⊥. Then u − u′ = z′ − z ∈ W ∩ W⊥ = {0}. Therefore u = u′ and z = z′.
Corollary. In the notation of Theorem 6.6, the vector u is the unique vector in W that is "closest" to y; that is, for any x ∈ W, ||y − x|| ≥ ||y − u||, and this inequality is an equality if and only if x = u.
Proof. As in Theorem 6.6, we have y = u + z, where z ∈ W⊥. Let x ∈ W. Then u − x is orthogonal to z, so, by Exercise 10 of Section 6.1, we have
        ||y − x||² = ||u + z − x||² = ||(u − x) + z||² = ||u − x||² + ||z||² ≥ ||z||² = ||y − u||².
Now suppose that ||y − x|| = ||y − u||. Then the inequality above becomes an equality, and therefore ||u − x||² + ||z||² = ||z||². It follows that ||u − x|| = 0, and hence x = u. The proof of the converse is obvious.
The vector u in the corollary is called the orthogonal projection of y on W. We will see the importance of orthogonal projections of vectors in the application to least squares in Section 6.3.
Example 10
Let V = P₃(R) with the inner product
        ⟨f(x), g(x)⟩ = ∫₋₁¹ f(t)g(t) dt   for all f(x), g(x) ∈ V.
We compute the orthogonal projection f₁(x) of f(x) = x³ on P₂(R).
By Example 5,
        {u₁, u₂, u₃} = {1/√2, √(3/2) x, √(5/8)(3x² − 1)}
is an orthonormal basis for P₂(R). For these vectors, we have
        ⟨f(x), u₁⟩ = ∫₋₁¹ t³ (1/√2) dt = 0,   ⟨f(x), u₂⟩ = ∫₋₁¹ t³ √(3/2) t dt = √6/5,
and
        ⟨f(x), u₃⟩ = ∫₋₁¹ t³ √(5/8)(3t² − 1) dt = 0.
Hence
        f₁(x) = ⟨f(x), u₁⟩u₁ + ⟨f(x), u₂⟩u₂ + ⟨f(x), u₃⟩u₃ = (3/5)x. •
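The projection in Example 10 can also be computed symbolically from the formula u = Σ ⟨y, vᵢ⟩vᵢ of Theorem 6.6; the sketch below is my addition and uses the orthonormal basis of Example 5.

import sympy as sp

x = sp.symbols('x')
inner = lambda f, g: sp.integrate(f * g, (x, -1, 1))

u1 = 1 / sp.sqrt(2)
u2 = sp.sqrt(sp.Rational(3, 2)) * x
u3 = sp.sqrt(sp.Rational(5, 8)) * (3*x**2 - 1)

f = x**3
projection = sum(inner(f, u) * u for u in (u1, u2, u3))   # orthogonal projection on P_2(R)
print(sp.simplify(projection))                             # 3*x/5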
It was shown (Corollary 2 to the replacement theorem, p. 47) that any linearly independent set in a finite-dimensional vector space can be extended to a basis. The next theorem provides an interesting analog for an orthonormal subset of a finite-dimensional inner product space.

Theorem 6.7. Suppose that S = {v₁, v₂, ..., vₖ} is an orthonormal set in an n-dimensional inner product space V. Then
(a) S can be extended to an orthonormal basis {v₁, v₂, ..., vₖ, vₖ₊₁, ..., vₙ} for V.
(b) If W = span(S), then S₁ = {vₖ₊₁, vₖ₊₂, ..., vₙ} is an orthonormal basis for W⊥ (using the preceding notation).
(c) If W is any subspace of V, then dim(V) = dim(W) + dim(W⊥).
Proof. (a) By Corollary 2 to the replacement theorem (p. 47), S can be extended to an ordered basis S′ = {v₁, v₂, ..., vₖ, wₖ₊₁, ..., wₙ} for V. Now apply the Gram–Schmidt process to S′. The first k vectors resulting from this process are the vectors in S by Exercise 8, and this new set spans V. Normalizing the last n − k vectors of this set produces an orthonormal set that spans V. The result now follows.
(b) Because S₁ is a subset of a basis, it is linearly independent. Since S₁ is clearly a subset of W⊥, we need only show that it spans W⊥. Note that, for any x ∈ V, we have
        x = Σ_{i=1}^n ⟨x, vᵢ⟩vᵢ.
If x ∈ W⊥, then ⟨x, vᵢ⟩ = 0 for 1 ≤ i ≤ k. Therefore,
        x = Σ_{i=k+1}^n ⟨x, vᵢ⟩vᵢ ∈ span(S₁).
(c) Let W be a subspace of V. It is a finite-dimensional inner product space because V is, and so it has an orthonormal basis {v₁, v₂, ..., vₖ}. By (a) and (b), we have
        dim(V) = n = k + (n − k) = dim(W) + dim(W⊥).
Example 11
Let W = span({e₁, e₂}) in F³. Then x = (a, b, c) ∈ W⊥ if and only if 0 = ⟨x, e₁⟩ = a and 0 = ⟨x, e₂⟩ = b. So x = (0, 0, c), and therefore W⊥ = span({e₃}). One can deduce the same result by noting that e₃ ∈ W⊥ and, from (c), that dim(W⊥) = 3 − 2 = 1. •
EXERCISES
1. Label the following statements as true or false.
(a) The Gram–Schmidt orthogonalization process allows us to construct an orthonormal set from an arbitrary set of vectors.

(b) Every nonzero finite-dimensional inner product space has an or­
thonormal basis.
(c) The orthogonal complement of any set is a subspace.
(d) If {v₁, v₂, ..., vₙ} is a basis for an inner product space V, then for any x ∈ V the scalars ⟨x, vᵢ⟩ are the Fourier coefficients of x.
(e) An orthonormal basis must be an ordered basis.
(f) Every orthogonal set is linearly independent.
(g) Every orthonormal set is linearly independent.
2. In each part, apply the Gram–Schmidt process to the given subset S of the inner product space V to obtain an orthogonal basis for span(S). Then normalize the vectors in this basis to obtain an orthonormal basis β for span(S), and compute the Fourier coefficients of the given vector relative to β. Finally, use Theorem 6.5 to verify your result.
(a) V = R³, S = {(1, 0, 1), (0, 1, 1), (1, 3, 3)}, and x = (1, 1, 2)
(b) V = R³, S = {(1, 1, 1), (0, 1, 1), (0, 0, 1)}, and x = (1, 0, 1)
(c) V = P₂(R) with the inner product ⟨f(x), g(x)⟩ = ∫₀¹ f(t)g(t) dt, S = {1, x, x²}, and h(x) = 1 + x
(d) V = span(S), where S = {(1, i, 0), (1 − i, 2, 4i)}, and x = (3 + i, 4i, −4)
(e) V = R⁴, S = {(2, −1, −2, 4), (−2, 1, −5, 5), (−1, 3, 7, 11)}, and x = (−11, 8, −4, 18)
(f) V = R⁴, S = {(1, −2, −1, 3), (3, 6, 3, −1), (1, 4, 2, 8)}, and x = (−1, 2, 1, 1)
/ \ \, K, /i»,<- 1/3 5\ (— 9\ (1 -17
(g) V = M2x2(/.), S -
A =
-1 27
-4 8/
V = M2x2{R), S
8 6
25 -13
2 2
2 1 9. n
4 -12
3 -16
and
dA =
f(t)g(t)dt, (i) V = span(.S') with the inner product {/..(/) =
S = {sin/.cos/.!./}. and h(t) ~ 21+ I
(j) V-C. 5- {(1. /,2 - /. -I), (2 1 3z,3i,l -i,2i),
(-l+7i,6+10i,ll-4i,3+4i)},anda:= ( 2 I 7i,6+9i,9-3z,4 l Ai)
(k) V = C. S= {(-4,3 2/,/, 1 -4/),
(-l-5t,5-4i,-3+5i,7-2i),(-27-i, 7 6/.-15+25/,-7 6z)},
and ./• - ( 13 - 7i, -12 + 3/. 39 11 /. -26 •+ 5z)

354 Chap. 6 Inner Product Spaces
(1) V = M2X2(C), 5 =
-25-38/ -2- 13/
12-78/ -7-r24/
>i) V- M2x2(C). 5 =
1 /' -2 - 3/ \ / 8i
2 I 2/ 1 + /' J ' 1-3-3/
and A
-II i
2 i
11 - 132/ -34-31/
7 126/ -71 - 5/
I | 3/
. and A
-2 + 8/ -13 \ i
10-10/ 9-0/
-1-7/ -9 - 8i"
1 -r 10?: -6-2/';
-7 I 5i 3+ 18/
9 - 6/ -3 + 7/
I -1
9' »/0
3. In R², let
        β = {(1/√2, 1/√2), (1/√2, −1/√2)}.
   Find the Fourier coefficients of (3, 4) relative to β.
4. Let S = {(1, 0, i), (1, 2, 1)} in C³. Compute S⊥.
5. Let S₀ = {x₀}, where x₀ is a nonzero vector in R³. Describe S₀⊥ geometrically. Now suppose that S = {x₁, x₂} is a linearly independent subset of R³. Describe S⊥ geometrically.
6. Let V be an inner product space, and let W be a finite-dimensional subspace of V. If x ∉ W, prove that there exists y ∈ V such that y ∈ W⊥, but ⟨x, y⟩ ≠ 0. Hint: Use Theorem 6.6.
7. Let β be a basis for a subspace W of an inner product space V, and let z ∈ V. Prove that z ∈ W⊥ if and only if ⟨z, v⟩ = 0 for every v ∈ β.
8. Prove that if {w₁, w₂, ..., wₙ} is an orthogonal set of nonzero vectors, then the vectors v₁, v₂, ..., vₙ derived from the Gram–Schmidt process satisfy vᵢ = wᵢ for i = 1, 2, ..., n. Hint: Use mathematical induction.
9. Let W = span({(i, 0, 1)}) in C³. Find orthonormal bases for W and W⊥.
10. Let W be a finite-dimensional subspace of an inner product space V. Prove that there exists a projection T on W along W⊥ that satisfies N(T) = W⊥. In addition, prove that ||T(x)|| ≤ ||x|| for all x ∈ V. Hint: Use Theorem 6.6 and Exercise 10 of Section 6.1. (Projections are defined in the exercises of Section 2.1.)
11. Let A be an n × n matrix with complex entries. Prove that AA* = I if and only if the rows of A form an orthonormal basis for Cⁿ.
12. Prove that for any matrix A ∈ M_{m×n}(F), (R(L_{A*}))⊥ = N(L_A).

13. Let V be an inner product space, S and S₀ be subsets of V, and W be a finite-dimensional subspace of V. Prove the following results.
(a) S₀ ⊆ S implies that S⊥ ⊆ S₀⊥.
(b) S ⊆ (S⊥)⊥; so span(S) ⊆ (S⊥)⊥.
(c) W = (W⊥)⊥. Hint: Use Exercise 6.
(d) V = W ⊕ W⊥. (See the exercises of Section 1.3.)
14. Let W₁ and W₂ be subspaces of a finite-dimensional inner product space. Prove that (W₁ + W₂)⊥ = W₁⊥ ∩ W₂⊥ and (W₁ ∩ W₂)⊥ = W₁⊥ + W₂⊥. (See the definition of the sum of subsets of a vector space on page 22.) Hint for the second equation: Apply Exercise 13(c) to the first equation.
15. Let V be a finite-dimensional inner product space over F.
(a) Parseval's Identity. Let {v₁, v₂, ..., vₙ} be an orthonormal basis for V. For any x, y ∈ V prove that
        ⟨x, y⟩ = Σ_{i=1}^n ⟨x, vᵢ⟩ \overline{⟨y, vᵢ⟩}.
(b) Use (a) to prove that if β is an orthonormal basis for V with inner product ⟨·, ·⟩, then for any x, y ∈ V
        ⟨φ_β(x), φ_β(y)⟩′ = ⟨[x]_β, [y]_β⟩′ = ⟨x, y⟩,
where ⟨·, ·⟩′ is the standard inner product on Fⁿ.
16. (a) Bessel's Inequality. Let V be an inner product space, and let S = {v₁, v₂, ..., vₙ} be an orthonormal subset of V. Prove that for any x ∈ V we have
        ||x||² ≥ Σ_{i=1}^n |⟨x, vᵢ⟩|².
Hint: Apply Theorem 6.6 to x ∈ V and W = span(S). Then use Exercise 10 of Section 6.1.
(b) In the context of (a), prove that Bessel's inequality is an equality if and only if x ∈ span(S).
17. Let T be a linear operator on an inner product space V. If ⟨T(x), y⟩ = 0 for all x, y ∈ V, prove that T = T₀. In fact, prove this result if the equality holds for all x and y in some basis for V.
18. Let V = C([−1, 1]). Suppose that W_e and W_o denote the subspaces of V consisting of the even and odd functions, respectively. (See Exercise 22 of Section 1.3.) Prove that W_e⊥ = W_o, where the inner product on V is defined by
        ⟨f, g⟩ = ∫₋₁¹ f(t)g(t) dt.
19. In each of the following parts, find the orthogonal projection of the given vector on the given subspace W of the inner product space V.
(a) V = R², u = (2, 6), and W = {(x, y) : y = 4x}.
(b) V = R³, u = (2, 1, 3), and W = {(x, y, z) : x + 3y − 2z = 0}.
(c) V = P(R) with the inner product ⟨f(x), g(x)⟩ = ∫₀¹ f(t)g(t) dt, h(x) = 4 + 3x − 2x², and W = P₁(R).
20. In each part of Exercise 19, find the distance from the given vector to the subspace W.
21. Let V = C([−1, 1]) with the inner product ⟨f, g⟩ = ∫₋₁¹ f(t)g(t) dt, and let W be the subspace P₂(R), viewed as a space of functions. Use the orthonormal basis obtained in Example 5 to compute the "best" (closest) second-degree polynomial approximation of the function h(t) = e^t on the interval [−1, 1].
22. Let V = C([0, 1]) with the inner product ⟨f, g⟩ = ∫₀¹ f(t)g(t) dt. Let W be the subspace spanned by the linearly independent set {t, √t}.
(a) Find an orthonormal basis for W.
(b) Let h(t) = t². Use the orthonormal basis obtained in (a) to obtain the "best" (closest) approximation of h in W.
23. Let V be the vector space defined in Example 5 of Section 1.2, the space of all sequences σ in F (where F = R or F = C) such that σ(n) ≠ 0 for only finitely many positive integers n. For σ, μ ∈ V, we define ⟨σ, μ⟩ = Σ_{n=1}^∞ σ(n)\overline{μ(n)}. Since all but a finite number of terms of the series are zero, the series converges.
(a) Prove that ⟨·, ·⟩ is an inner product on V, and hence V is an inner product space.
(b) For each positive integer n, let eₙ be the sequence defined by eₙ(k) = δ_{n,k}, where δ_{n,k} is the Kronecker delta. Prove that {e₁, e₂, ...} is an orthonormal basis for V.
(c) Let σₙ = e₁ + eₙ and W = span({σₙ : n ≥ 2}).
    (i) Prove that e₁ ∉ W, so W ≠ V.
    (ii) Prove that W⊥ = {0}, and conclude that W ≠ (W⊥)⊥.

Thus the assumption in Exercise 13(c) that W is finite-dimensional
is essential.
6.3 THE ADJOINT OF A LINEAR OPERATOR
In Section 6.1, we defined the conjugate transpose A* of a matrix A. For a linear operator T on an inner product space V, we now define a related linear operator on V called the adjoint of T, whose matrix representation with respect to any orthonormal basis β for V is [T]_β*. The analogy between conjugation of complex numbers and adjoints of linear operators will become apparent. We first need a preliminary result.
Let V be an inner product space, and let y ∈ V. The function g: V → F defined by g(x) = ⟨x, y⟩ is clearly linear. More interesting is the fact that if V is finite-dimensional, every linear transformation from V into F is of this form.
Theorem 6.8. Let V be a finite-dimensional inner product space over F, and let g: V → F be a linear transformation. Then there exists a unique vector y ∈ V such that g(x) = ⟨x, y⟩ for all x ∈ V.
Proof. Let β = {v₁, v₂, ..., vₙ} be an orthonormal basis for V, and let
        y = Σ_{i=1}^n \overline{g(vᵢ)} vᵢ.
Define h: V → F by h(x) = ⟨x, y⟩, which is clearly linear. Furthermore, for 1 ≤ j ≤ n we have
        h(vⱼ) = ⟨vⱼ, y⟩ = ⟨vⱼ, Σ_{i=1}^n \overline{g(vᵢ)}vᵢ⟩ = Σ_{i=1}^n g(vᵢ)⟨vⱼ, vᵢ⟩ = Σ_{i=1}^n g(vᵢ)δⱼᵢ = g(vⱼ).
Since g and h both agree on β, we have that g = h by the corollary to Theorem 2.6 (p. 73).
To show that y is unique, suppose that g(x) = ⟨x, y′⟩ for all x. Then ⟨x, y⟩ = ⟨x, y′⟩ for all x; so by Theorem 6.1(e) (p. 333), we have y = y′.
Example 1
Define g: R² → R by g(a₁, a₂) = 2a₁ + a₂; clearly g is a linear transformation. Let β = {e₁, e₂}, and let y = g(e₁)e₁ + g(e₂)e₂ = 2e₁ + e₂ = (2, 1), as in the proof of Theorem 6.8. Then g(a₁, a₂) = ⟨(a₁, a₂), (2, 1)⟩ = 2a₁ + a₂. •
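A small numerical sketch of the construction in the proof (my addition, not part of the text): with an orthonormal basis β, the representing vector is y = Σ \overline{g(vᵢ)}vᵢ; for the functional of Example 1 this recovers y = (2, 1).

import numpy as np

g = lambda v: 2*v[0] + v[1]                          # the functional of Example 1
beta = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # standard orthonormal basis of R^2

y = sum(np.conj(g(v)) * v for v in beta)
print(y)                                             # [2. 1.]
x = np.array([3.0, 5.0])
print(g(x), np.dot(x, y))                            # both equal 11.0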

Theorem 6.9. Let V be a finite-dimensional inner product space, and let T be a linear operator on V. Then there exists a unique function T*: V → V such that ⟨T(x), y⟩ = ⟨x, T*(y)⟩ for all x, y ∈ V. Furthermore, T* is linear.
Proof. Let y ∈ V. Define g: V → F by g(x) = ⟨T(x), y⟩ for all x ∈ V. We first show that g is linear. Let x₁, x₂ ∈ V and c ∈ F. Then
        g(cx₁ + x₂) = ⟨T(cx₁ + x₂), y⟩ = ⟨cT(x₁) + T(x₂), y⟩ = c⟨T(x₁), y⟩ + ⟨T(x₂), y⟩ = cg(x₁) + g(x₂).
Hence g is linear.
We now apply Theorem 6.8 to obtain a unique vector y′ ∈ V such that g(x) = ⟨x, y′⟩; that is, ⟨T(x), y⟩ = ⟨x, y′⟩ for all x ∈ V. Defining T*: V → V by T*(y) = y′, we have ⟨T(x), y⟩ = ⟨x, T*(y)⟩.
To show that T* is linear, let y₁, y₂ ∈ V and c ∈ F. Then for any x ∈ V, we have
        ⟨x, T*(cy₁ + y₂)⟩ = ⟨T(x), cy₁ + y₂⟩
                          = c̄⟨T(x), y₁⟩ + ⟨T(x), y₂⟩
                          = c̄⟨x, T*(y₁)⟩ + ⟨x, T*(y₂)⟩
                          = ⟨x, cT*(y₁) + T*(y₂)⟩.
Since x is arbitrary, T*(cy₁ + y₂) = cT*(y₁) + T*(y₂) by Theorem 6.1(e) (p. 333).
Finally, we need to show that T* is unique. Suppose that U: V → V is linear and that it satisfies ⟨T(x), y⟩ = ⟨x, U(y)⟩ for all x, y ∈ V. Then ⟨x, U(y)⟩ = ⟨x, T*(y)⟩ for all x, y ∈ V, so T* = U.
The linear operator T* described in Theorem 6.9 is called the adjoint of the operator T. The symbol T* is read "T star."
Thus T* is the unique operator on V satisfying ⟨T(x), y⟩ = ⟨x, T*(y)⟩ for all x, y ∈ V. Note that we also have
        ⟨x, T(y)⟩ = \overline{⟨T(y), x⟩} = \overline{⟨y, T*(x)⟩} = ⟨T*(x), y⟩;
so ⟨x, T(y)⟩ = ⟨T*(x), y⟩ for all x, y ∈ V. We may view these equations symbolically as adding a * to T when shifting its position inside the inner product symbol.
For an infinite-dimensional inner product space, the adjoint of a linear operator T may be defined to be the function T* such that ⟨T(x), y⟩ = ⟨x, T*(y)⟩ for all x, y ∈ V, provided it exists. Although the uniqueness and linearity of T* follow as before, the existence of the adjoint is not guaranteed (see Exercise 24). The reader should observe the necessity of the hypothesis of finite-dimensionality in the proof of Theorem 6.8. Many of the theorems we prove

about adjoints, nevertheless, do not depend on V being finite-dimensional. Thus, unless stated otherwise, for the remainder of this chapter we adopt the convention that a reference to the adjoint of a linear operator on an infinite-dimensional inner product space assumes its existence.
Theorem 6.10 is a useful result for computing adjoints.
Theorem 6.10. Let V be a finite-dimensional inner product space, and let β be an orthonormal basis for V. If T is a linear operator on V, then
        [T*]_β = [T]_β*.
Proof. Let A = [T]_β, B = [T*]_β, and β = {v₁, v₂, ..., vₙ}. Then from the corollary to Theorem 6.5 (p. 346), we have
        B_{ij} = ⟨T*(vⱼ), vᵢ⟩ = \overline{⟨vᵢ, T*(vⱼ)⟩} = \overline{⟨T(vᵢ), vⱼ⟩} = \overline{A_{ji}} = (A*)_{ij}.
Hence B = A*.
Corollary. Let A be an n × n matrix. Then L_{A*} = (L_A)*.
Proof. If β is the standard ordered basis for Fⁿ, then, by Theorem 2.16 (p. 93), we have [L_A]_β = A. Hence [(L_A)*]_β = [L_A]_β* = A* = [L_{A*}]_β, and so (L_A)* = L_{A*}.
As an illustration of Theorem 6.10, we compute the adjoint of a specific linear operator.
Example 2
Let T be the linear operator on C² defined by T(a₁, a₂) = (2ia₁ + 3a₂, a₁ − a₂). If β is the standard ordered basis for C², then
        [T]_β = (2i  3; 1  −1).
So
        [T*]_β = [T]_β* = (−2i  1; 3  −1).
Hence
        T*(a₁, a₂) = (−2ia₁ + a₂, 3a₁ − a₂). •
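Numerically, Theorem 6.10 says that the adjoint in Example 2 is represented by the conjugate transpose of [T]_β; the check below is an addition to the text and uses the standard inner product ⟨u, v⟩ = Σ uᵢ v̄ᵢ on C².

import numpy as np

A = np.array([[2j, 3], [1, -1]], dtype=complex)    # [T]_beta for T(a1, a2) = (2i a1 + 3 a2, a1 - a2)
A_star = A.conj().T                                 # [T*]_beta

inner = lambda u, v: np.vdot(v, u)                  # <u, v> = sum_i u_i * conj(v_i)
x = np.array([1 + 1j, 2 - 1j])
y = np.array([3j, 1 + 2j])
print(inner(A @ x, y), inner(x, A_star @ y))        # the two values agree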
The following theorem suggests an analogy between the conjugates of
complex numbers and the adjoints of linear operators.
Theorem 6.11. Let V be an inner product space, and let T and U be linear operators on V. Then
(a) (T + U)* = T* + U*;
(b) (cT)* = c̄T* for any c ∈ F;
(c) (TU)* = U*T*;
(d) T** = T;
(e) I* = I.
Proof. We prove (a) and (d); the rest are proved similarly. Let x, y ∈ V.
(a) Because
        ⟨x, (T + U)*(y)⟩ = ⟨(T + U)(x), y⟩ = ⟨T(x) + U(x), y⟩
                         = ⟨T(x), y⟩ + ⟨U(x), y⟩ = ⟨x, T*(y)⟩ + ⟨x, U*(y)⟩
                         = ⟨x, T*(y) + U*(y)⟩ = ⟨x, (T* + U*)(y)⟩,
T* + U* has the property unique to (T + U)*. Hence T* + U* = (T + U)*.
(d) Similarly, since
        ⟨x, T(y)⟩ = ⟨T*(x), y⟩ = ⟨x, T**(y)⟩,
(d) follows.
The same proof works in the infinite-dimensional case, provided that the existence of T* and U* is assumed.
Corollary. Let A and B be n × n matrices. Then
(a) (A + B)* = A* + B*;
(b) (cA)* = c̄A* for all c ∈ F;
(c) (AB)* = B*A*;
(d) A** = A;
(e) I* = I.
Proof. We prove only (c); the remaining parts can be proved similarly. Since L_{(AB)*} = (L_{AB})* = (L_A L_B)* = (L_B)*(L_A)* = L_{B*} L_{A*} = L_{B*A*}, we have (AB)* = B*A*.
In the preceding proof, we relied on the corollary to Theorem 6.10. An
alternative proof, which holds even for nonsquare matrices, can be given by
appealing directly to the definition of the conjugate transpose of a matrix
(see Exercise 5).
Least Squares Approximation
Consider the following problem: An experimenter collects data by taking measurements y₁, y₂, ..., yₘ at times t₁, t₂, ..., tₘ, respectively. For example, he or she may be measuring unemployment at various times during some period. Suppose that the data (t₁, y₁), (t₂, y₂), ..., (tₘ, yₘ) are plotted as points in the plane. (See Figure 6.3.) From this plot, the experimenter feels that there exists an essentially linear relationship between y and t, say y = ct + d, and would like to find the constants c and d so that the line y = ct + d represents the best possible fit to the data collected. One such estimate of fit is to calculate the error E that represents the sum of the squares of the vertical distances from the points to the line; that is,
        E = Σ_{i=1}^m (yᵢ − ctᵢ − d)².
Figure 6.3 (the least squares line y = ct + d)
Thus the problem is reduced to finding the constants c and d that minimize E. (For this reason the line y = ct + d is called the least squares line.) If we let
        A = (t₁ 1; t₂ 1; ⋮ ⋮; tₘ 1),   x = (c; d),   and   y = (y₁; y₂; ⋮; yₘ),
then it follows that E = ||y − Ax||².
We develop a general method for finding an explicit vector x₀ ∈ Fⁿ that minimizes E; that is, given an m × n matrix A, we find x₀ ∈ Fⁿ such that ||y − Ax₀|| ≤ ||y − Ax|| for all vectors x ∈ Fⁿ. This method not only allows us to find the linear function that best fits the data, but also, for any positive integer k, the best fit using a polynomial of degree at most k.

First, we need some notation and two simple lemmas. For x, y ∈ Fⁿ, let ⟨x, y⟩ₙ denote the standard inner product of x and y in Fⁿ. Recall that if x and y are regarded as column vectors, then ⟨x, y⟩ₙ = y*x.
Lemma 1. Let A ∈ M_{m×n}(F), x ∈ Fⁿ, and y ∈ Fᵐ. Then
        ⟨Ax, y⟩ₘ = ⟨x, A*y⟩ₙ.
Proof. By a generalization of the corollary to Theorem 6.11 (see Exercise 5(b)), we have
        ⟨Ax, y⟩ₘ = y*(Ax) = (y*A)x = (A*y)*x = ⟨x, A*y⟩ₙ.
Lemma 2. Let A ∈ M_{m×n}(F). Then rank(A*A) = rank(A).
Proof. By the dimension theorem, we need only show that, for x ∈ Fⁿ, we have A*Ax = 0 if and only if Ax = 0. Clearly, Ax = 0 implies that A*Ax = 0. So assume that A*Ax = 0. Then
        0 = ⟨A*Ax, x⟩ₙ = ⟨Ax, A**x⟩ₘ = ⟨Ax, Ax⟩ₘ,
so that Ax = 0.
Corollary. If A is an m × n matrix such that rank(A) = n, then A*A is invertible.
Now let A be an m × n matrix and y ∈ Fᵐ. Define W = {Ax : x ∈ Fⁿ}; that is, W = R(L_A). By the corollary to Theorem 6.6 (p. 350), there exists a unique vector in W that is closest to y. Call this vector Ax₀, where x₀ ∈ Fⁿ. Then ||Ax₀ − y|| ≤ ||Ax − y|| for all x ∈ Fⁿ; so x₀ has the property that E = ||Ax₀ − y||² is minimal, as desired.
To develop a practical method for finding such an x₀, we note from Theorem 6.6 and its corollary that Ax₀ − y ∈ W⊥; so ⟨Ax, Ax₀ − y⟩ₘ = 0 for all x ∈ Fⁿ. Thus, by Lemma 1, we have that ⟨x, A*(Ax₀ − y)⟩ₙ = 0 for all x ∈ Fⁿ; that is, A*(Ax₀ − y) = 0. So we need only find a solution x₀ to A*Ax = A*y. If, in addition, we assume that rank(A) = n, then by Lemma 2 we have x₀ = (A*A)⁻¹A*y. We summarize this discussion in the following theorem.
Theorem 6.12. Let A ∈ M_{m×n}(F) and y ∈ Fᵐ. Then there exists x₀ ∈ Fⁿ such that (A*A)x₀ = A*y and ||Ax₀ − y|| ≤ ||Ax − y|| for all x ∈ Fⁿ. Furthermore, if rank(A) = n, then x₀ = (A*A)⁻¹A*y.

To return to our experimenter, let us suppose that the data collected are (1, 2), (2, 3), (3, 5), and (4, 7). Then
        A = (1 1; 2 1; 3 1; 4 1)   and   y = (2; 3; 5; 7).
Hence
        A*A = (1 2 3 4; 1 1 1 1)(1 1; 2 1; 3 1; 4 1) = (30 10; 10 4).
Thus
        (A*A)⁻¹ = (1/20)(4 −10; −10 30).
Therefore
        x₀ = (c; d) = (1/20)(4 −10; −10 30)(1 2 3 4; 1 1 1 1)(2; 3; 5; 7) = (1.7; 0).
It follows that the line y = 1.7t is the least squares line. The error may be computed directly as E = ||Ax₀ − y||² = 0.3.
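The normal equation (A*A)x₀ = A*y of Theorem 6.12 is easy to solve by machine; the following sketch (my addition, not from the text) reproduces the least squares line and the error for the data above.

import numpy as np

t = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 7.0])
A = np.column_stack([t, np.ones_like(t)])   # columns t_i and 1, as in the text

x0 = np.linalg.solve(A.T @ A, A.T @ y)      # solves (A*A) x0 = A*y; gives c = 1.7, d = 0
E = np.sum((A @ x0 - y) ** 2)               # ||A x0 - y||^2 = 0.3
print(x0, E)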
Suppose that the experimenter chose the times tᵢ (1 ≤ i ≤ m) to satisfy
        Σ_{i=1}^m tᵢ = 0.
Then the two columns of A would be orthogonal, so A*A would be a diagonal matrix (see Exercise 19). In this case, the computations are greatly simplified.
In practice, the m × 2 matrix A in our least squares application has rank equal to two, and hence A*A is invertible by the corollary to Lemma 2. For, otherwise, the first column of A is a multiple of the second column, which consists only of ones. But this would occur only if the experimenter collects all the data at exactly one time.
Finally, the method above may also be applied if, for some k, the experimenter wants to fit a polynomial of degree at most k to the data. For instance, if a polynomial y = ct² + dt + e of degree at most 2 is desired, the appropriate model is
        x = (c; d; e),   y = (y₁; y₂; ⋮; yₘ),   and   A = (t₁² t₁ 1; t₂² t₂ 1; ⋮ ⋮ ⋮; tₘ² tₘ 1).

Minimal Solutions to Systems of Linear Equations
Even when a system of linear equations Ax = b is consistent, there may be no unique solution. In such cases, it may be desirable to find a solution of minimal norm. A solution s to Ax = b is called a minimal solution if ||s|| ≤ ||u|| for all other solutions u. The next theorem assures that every consistent system of linear equations has a unique minimal solution and provides a method for computing it.
Theorem 6.13. Let A ∈ M_{m×n}(F) and b ∈ Fᵐ. Suppose that Ax = b is consistent. Then the following statements are true.
(a) There exists exactly one minimal solution s of Ax = b, and s ∈ R(L_{A*}).
(b) The vector s is the only solution to Ax = b that lies in R(L_{A*}); that is, if u satisfies (AA*)u = b, then s = A*u.
Proof. (a) For simplicity of notation, we let W = R(L_{A*}) and W′ = N(L_A). Let x be any solution to Ax = b. By Theorem 6.6 (p. 350), x = s + y for some s ∈ W and y ∈ W⊥. But W⊥ = W′ by Exercise 12, and therefore b = Ax = As + Ay = As. So s is a solution to Ax = b that lies in W. To prove (a), we need only show that s is the unique minimal solution. Let v be any solution to Ax = b. By Theorem 3.9 (p. 172), we have that v = s + u, where u ∈ W′. Since s ∈ W and u ∈ W′ = W⊥, we have
        ||v||² = ||s + u||² = ||s||² + ||u||² ≥ ||s||²
by Exercise 10 of Section 6.1. Thus s is a minimal solution. We can also see from the preceding calculation that if ||v|| = ||s||, then u = 0; hence v = s. Therefore s is the unique minimal solution to Ax = b, proving (a).
(b) Assume that v is also a solution to Ax = b that lies in W. Then
        v − s ∈ W ∩ W′ = W ∩ W⊥ = {0};
so v = s.
Finally, suppose that (AA*)u = b, and let v = A*u. Then v ∈ W and Av = b. Therefore s = v = A*u by the discussion above.
Example 3
Consider the system
        x + 2y + z = 4
        x − y + 2z = −11
        x + 5y     = 19.
Let
        A = (1 2 1; 1 −1 2; 1 5 0)   and   b = (4; −11; 19).
To find the minimal solution to this system, we must first find some solution u to AA*x = b. Now
        AA* = (6 1 11; 1 6 −4; 11 −4 26),
so we consider the system
        6x + y + 11z = 4
        x + 6y − 4z = −11
        11x − 4y + 26z = 19,
for which one solution is
        u = (1; −2; 0).
(Any solution will suffice.) Hence
        s = A*u = (−1; 4; −3)
is the minimal solution to the given system. •
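Theorem 6.13(b) translates directly into a computation; the sketch below (an addition, not part of the text) finds a solution u of (AA*)u = b and returns s = A*u for the system of Example 3.

import numpy as np

A = np.array([[1.0, 2.0, 1.0],
              [1.0, -1.0, 2.0],
              [1.0, 5.0, 0.0]])
b = np.array([4.0, -11.0, 19.0])

# AA* is singular here, so lstsq is used to pick one solution u of (AA*)u = b;
# by Theorem 6.13(b), s = A*u is the same minimal solution for every such u.
u, *_ = np.linalg.lstsq(A @ A.T, b, rcond=None)
s = A.T @ u
print(s)          # approximately (-1, 4, -3)
print(A @ s)      # recovers b = (4, -11, 19)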
EXERCISES
1. Label the following statements as true or false. Assume that the underlying inner product spaces are finite-dimensional.
(a) Every linear operator has an adjoint.
(b) Every linear operator on V has the form x ↦ ⟨x, y⟩ for some y ∈ V.
(c) For every linear operator T on V and every ordered basis β for V, we have [T*]_β = ([T]_β)*.
(d) The adjoint of a linear operator is unique.
(e) For any linear operators T and U and scalars a and b,
        (aT + bU)* = āT* + b̄U*.
(f) For any n × n matrix A, we have (L_A)* = L_{A*}.
(g) For any linear operator T, we have (T*)* = T.
2. For each of the following inner product spaces V (over F) and linear transformations g: V → F, find a vector y such that g(x) = ⟨x, y⟩ for all x ∈ V.

(a) V = R³; g(a₁, a₂, a₃) = a₁ − 2a₂ + 4a₃
(b) V = C²; g(z₁, z₂) = z₁ − 2z₂
(c) V = P₂(R) with ⟨f, h⟩ = ∫₀¹ f(t)h(t) dt; g(f) = f(0) + f′(1)
3. For each of the following inner product spaces V and linear operators T on V, evaluate T* at the given vector in V.
(a) V = R², T(a, b) = (2a + b, a − 3b), x = (3, 5).
(b) V = C², T(z₁, z₂) = (2z₁ + iz₂, (1 − i)z₁), x = (3 − i, 1 + 2i).
(c) V = P₁(R) with ⟨f, g⟩ = ∫₋₁¹ f(t)g(t) dt, T(f) = f′ + 3f, f(t) = 4 − 2t.
4. Complete the proof of Theorem 6.11.
5. (a) Complete the proof of the corollary to Theorem 6.11 by using Theorem 6.11, as in the proof of (c).
   (b) State a result for nonsquare matrices that is analogous to the corollary to Theorem 6.11, and prove it using a matrix argument.
6. Let T be a linear operator on an inner product space V. Let U₁ = T + T* and U₂ = TT*. Prove that U₁ = U₁* and U₂ = U₂*.
7. Give an example of a linear operator T on an inner product space V such that N(T) ≠ N(T*).
8. Let V be a finite-dimensional inner product space, and let T be a linear operator on V. Prove that if T is invertible, then T* is invertible and (T*)⁻¹ = (T⁻¹)*.
9. Prove that if V = W ⊕ W⊥ and T is the projection on W along W⊥, then T = T*. Hint: Recall that N(T) = W⊥. (For definitions, see the exercises of Sections 1.3 and 2.1.)
10. Let T be a linear operator on an inner product space V. Prove that ||T(x)|| = ||x|| for all x ∈ V if and only if ⟨T(x), T(y)⟩ = ⟨x, y⟩ for all x, y ∈ V. Hint: Use Exercise 20 of Section 6.1.
11. For a linear operator T on an inner product space V, prove that T*T = T₀ implies T = T₀. Is the same result true if we assume that TT* = T₀?
12. Let V be an inner product space, and let T be a linear operator on V. Prove the following results.
(a) R(T*)⊥ = N(T).
(b) If V is finite-dimensional, then R(T*) = N(T)⊥. Hint: Use Exercise 13(c) of Section 6.2.

13. Let T be a linear operator on a finite-dimensional inner product space
V. Prove the following results.
(a) N(T*T) = N(T). Deduce that rank(T*T) = rank(T).
(b) rank(T) = rank(T*). Deduce from (a) that rank(TT*) = rank(T).
(c) For any n × n matrix A, rank(A*A) = rank(AA*) = rank(A).
14. Let V be an inner product space, and let y, z ∈ V. Define T: V → V by T(x) = ⟨x, y⟩z for all x ∈ V. First prove that T is linear. Then show that T* exists, and find an explicit expression for it.
The following definition is used in Exercises 15–17 and is an extension of the definition of the adjoint of a linear operator.
Definition. Let T: V → W be a linear transformation, where V and W are finite-dimensional inner product spaces with inner products ⟨·, ·⟩₁ and ⟨·, ·⟩₂, respectively. A function T*: W → V is called an adjoint of T if ⟨T(x), y⟩₂ = ⟨x, T*(y)⟩₁ for all x ∈ V and y ∈ W.
15. Let T: V → W be a linear transformation, where V and W are finite-dimensional inner product spaces with inner products ⟨·, ·⟩₁ and ⟨·, ·⟩₂, respectively. Prove the following results.
(a) There is a unique adjoint T* of T, and T* is linear.
(b) If β and γ are orthonormal bases for V and W, respectively, then [T*]_γ^β = ([T]_β^γ)*.
(c) rank(T*) = rank(T).
(d) ⟨T*(x), y⟩₁ = ⟨x, T(y)⟩₂ for all x ∈ W and y ∈ V.
(e) For all x ∈ V, T*T(x) = 0 if and only if T(x) = 0.
16. State and prove a result that extends the first four parts of Theorem 6.11
using the preceding definition.
17. Let T: V → W be a linear transformation, where V and W are finite-dimensional inner product spaces. Prove that (R(T*))⊥ = N(T), using the preceding definition.
18.† Let A be an n × n matrix. Prove that det(A*) = \overline{det(A)}.
19. Suppose that A is an m × n matrix in which no two columns are identical. Prove that A*A is a diagonal matrix if and only if every pair of columns of A is orthogonal.
20. For each of the sets of data that follows, use the least squares approx­
imation to find the best fits with both (i) a linear function and (ii) a
quadratic function. Compute the error E in both cases.
(a) {(-3,9), (-2,6), (0,2), (1,1)}

(b) {(1,2), (3,4), (5,7), (7,9), (9,12)}
(c) {(-2,4), (-1,3), (0,1), (1,-1), (2,-3)}
21. In physics, Hooke's law states that (within certain limits) there is a linear relationship between the length x of a spring and the force y applied to (or exerted by) the spring. That is, y = cx + d, where c is called the spring constant. Use the following data to estimate the spring constant (the length is given in inches and the force is given in pounds).

        Length x:  3.5   4.0   4.5   5.0
        Force  y:  1.0   2.2   2.8   4.3
22. Find the minimal solution to each of the following systems of linear equations.
(a) x + 2y − z = 12
    x + y − z = 0
(b) 2x + 3y + z = 2
    4x + 7y − z = 4
(c) 2x − y + z = 3
    x − y + z = 2
    x + 2y − z = 1
(d) x + y + z − w = 1
    2x − y + w = 1
23. Consider the problem of finding the least squares line y = ct + d corresponding to the m observations (t₁, y₁), (t₂, y₂), ..., (tₘ, yₘ).
(a) Show that the equation (A*A)x₀ = A*y of Theorem 6.12 takes the form of the normal equations:
        (Σ_{i=1}^m tᵢ²) c + (Σ_{i=1}^m tᵢ) d = Σ_{i=1}^m tᵢyᵢ
and
        (Σ_{i=1}^m tᵢ) c + m d = Σ_{i=1}^m yᵢ.
These equations may also be obtained from the error E by setting the partial derivatives of E with respect to both c and d equal to zero.
(b) Use the second normal equation of (a) to show that the least squares line must pass through the center of mass, (t̄, ȳ), where
        t̄ = (1/m) Σ_{i=1}^m tᵢ   and   ȳ = (1/m) Σ_{i=1}^m yᵢ.
24. Let V and {ei, e2,...} be defined as in Exercise 23 of Section 6.2. Define
T: V → V by
        T(σ)(k) = Σ_{i=k}^∞ σ(i)   for every positive integer k.
Notice that the infinite series in the definition of T converges because σ(i) ≠ 0 for only finitely many i.
(a) Prove that T is a linear operator on V.
(b) Prove that for any positive integer n, T(eₙ) = Σ_{i=1}^n eᵢ.
(c) Prove that T has no adjoint. Hint: By way of contradiction, suppose that T* exists. Prove that for any positive integer n, T*(eₙ)(k) ≠ 0 for infinitely many k.
6.4 NORMAL AND SELF-ADJOINT OPERATORS
We have seen the importance of diagonalizable operators in Chapter 5. For
these operators, it is necessary and sufficient for the vector space V to possess
a basis of eigenvectors. As V is an inner product space in this chapter, it
is reasonable to seek conditions that guarantee that V has an orthonormal
basis of eigenvectors. A very important result that helps achieve our goal is
Schur's theorem (Theorem 6.14). The formulation that follows is in terms of
linear operators. The next section contains the more familiar matrix form.
We begin with a lemma.
Lemma. Let T be a linear operator on a finite-dimensional inner product space V. If T has an eigenvector, then so does T*.
Proof. Suppose that v is an eigenvector of T with corresponding eigenvalue λ. Then for any x ∈ V,
        0 = ⟨0, x⟩ = ⟨(T − λI)(v), x⟩ = ⟨v, (T − λI)*(x)⟩ = ⟨v, (T* − λ̄I)(x)⟩,
and hence v is orthogonal to the range of T* − λ̄I. So T* − λ̄I is not onto and hence is not one-to-one. Thus T* − λ̄I has a nonzero null space, and any nonzero vector in this null space is an eigenvector of T* with corresponding eigenvalue λ̄.

Recall (see the exercises of Section 2.1 and see Section 5.4) that a subspace W of V is said to be T-invariant if T(W) is contained in W. If W is T-invariant, we may define the restriction T_W: W → W by T_W(x) = T(x) for all x ∈ W. It is clear that T_W is a linear operator on W. Recall from Section 5.2 that a polynomial is said to split if it factors into linear polynomials.
Theorem 6.14 (Schur). Let T be a linear operator on a finite-dimensional inner product space V. Suppose that the characteristic polynomial of T splits. Then there exists an orthonormal basis β for V such that the matrix [T]_β is upper triangular.
Proof. The proof is by mathematical induction on the dimension n of V. The result is immediate if n = 1. So suppose that the result is true for linear operators on (n − 1)-dimensional inner product spaces whose characteristic polynomials split. By the lemma, we can assume that T* has a unit eigenvector z. Suppose that T*(z) = λz and that W = span({z}). We show that W⊥ is T-invariant. If y ∈ W⊥ and x = cz ∈ W, then
        ⟨T(y), x⟩ = ⟨T(y), cz⟩ = ⟨y, T*(cz)⟩ = ⟨y, cT*(z)⟩ = ⟨y, cλz⟩ = c̄λ̄⟨y, z⟩ = c̄λ̄(0) = 0.
So T(y) ∈ W⊥. It is easy to show (see Theorem 5.21 p. 314, or as a consequence of Exercise 6 of Section 4.4) that the characteristic polynomial of T_{W⊥} divides the characteristic polynomial of T and hence splits. By Theorem 6.7(c) (p. 352), dim(W⊥) = n − 1, so we may apply the induction hypothesis to T_{W⊥} and obtain an orthonormal basis γ of W⊥ such that [T_{W⊥}]_γ is upper triangular. Clearly, β = γ ∪ {z} is an orthonormal basis for V such that [T]_β is upper triangular.
We now return to our original goal of finding an orthonormal basis of eigenvectors of a linear operator T on a finite-dimensional inner product space V. Note that if such an orthonormal basis β exists, then [T]_β is a diagonal matrix, and hence [T*]_β = [T]_β* is also a diagonal matrix. Because diagonal matrices commute, we conclude that T and T* commute. Thus if V possesses an orthonormal basis of eigenvectors of T, then TT* = T*T.
Definitions. Let V be an inner product space, and let T be a linear operator on V. We say that T is normal if TT* = T*T. An n × n real or complex matrix A is normal if AA* = A*A.
It follows immediately from Theorem 6.10 (p. 359) that T is normal if and only if [T]_β is normal, where β is an orthonormal basis.

Example 1
Let T: R² → R² be rotation by θ, where 0 < θ < π. The matrix representation of T in the standard ordered basis is given by
        A = (cos θ  −sin θ; sin θ  cos θ).
Note that AA* = I = A*A; so A, and hence T, is normal. •
Example 2
Suppose that A is a real skew-symmetric matrix; that is, Aᵗ = −A. Then A is normal because both AAᵗ and AᵗA are equal to −A². •
Clearly, the operator T in Example 1 does not even possess one eigenvec­
tor. So in the case of a real inner product space, we see that normality is not
sufficient to guarantee an orthonormal basis of eigenvectors. All is not lost,
however. We show that normality suffices if V is a complex inner product
space.
Before we prove the promised result for normal operators, we need some
general properties of normal operators.
Theorem 6.15. Let V be an inner product space, and let T be a normal operator on V. Then the following statements are true.
(a) ||T(x)|| = ||T*(x)|| for all x ∈ V.
(b) T − cI is normal for every c ∈ F.
(c) If x is an eigenvector of T, then x is also an eigenvector of T*. In fact, if T(x) = λx, then T*(x) = λ̄x.
(d) If λ₁ and λ₂ are distinct eigenvalues of T with corresponding eigenvectors x₁ and x₂, then x₁ and x₂ are orthogonal.
Proof. (a) For any x ∈ V, we have
        ||T(x)||² = ⟨T(x), T(x)⟩ = ⟨T*T(x), x⟩ = ⟨TT*(x), x⟩ = ⟨T*(x), T*(x)⟩ = ||T*(x)||².
The proof of (b) is left as an exercise.
(c) Suppose that T(x) = λx for some x ∈ V. Let U = T − λI. Then U(x) = 0, and U is normal by (b). Thus (a) implies that
        0 = ||U(x)|| = ||U*(x)|| = ||(T* − λ̄I)(x)|| = ||T*(x) − λ̄x||.
Hence T*(x) = λ̄x. So x is an eigenvector of T*.
(d) Let λ₁ and λ₂ be distinct eigenvalues of T with corresponding eigenvectors x₁ and x₂. Then, using (c), we have
        λ₁⟨x₁, x₂⟩ = ⟨λ₁x₁, x₂⟩ = ⟨T(x₁), x₂⟩ = ⟨x₁, T*(x₂)⟩ = ⟨x₁, λ̄₂x₂⟩ = λ₂⟨x₁, x₂⟩.
Since λ₁ ≠ λ₂, we conclude that ⟨x₁, x₂⟩ = 0.
Theorem 6.16. Let T be a linear operator on a finite-dimensional complex inner product space V. Then T is normal if and only if there exists an orthonormal basis for V consisting of eigenvectors of T.
Proof. Suppose that T is normal. By the fundamental theorem of algebra (Theorem D.4), the characteristic polynomial of T splits. So we may apply Schur's theorem to obtain an orthonormal basis β = {v₁, v₂, ..., vₙ} for V such that [T]_β = A is upper triangular. We know that v₁ is an eigenvector of T because A is upper triangular. Assume that v₁, v₂, ..., vₖ₋₁ are eigenvectors of T. We claim that vₖ is also an eigenvector of T. It then follows by mathematical induction on k that all of the vᵢ's are eigenvectors of T. Consider any j < k, and let λⱼ denote the eigenvalue of T corresponding to vⱼ. By Theorem 6.15, T*(vⱼ) = λ̄ⱼvⱼ. Since A is upper triangular,
        T(vₖ) = A_{1k}v₁ + A_{2k}v₂ + ⋯ + A_{jk}vⱼ + ⋯ + A_{kk}vₖ.
Furthermore, by the corollary to Theorem 6.5 (p. 347),
        A_{jk} = ⟨T(vₖ), vⱼ⟩ = ⟨vₖ, T*(vⱼ)⟩ = ⟨vₖ, λ̄ⱼvⱼ⟩ = λⱼ⟨vₖ, vⱼ⟩ = 0.
It follows that T(vₖ) = A_{kk}vₖ, and hence vₖ is an eigenvector of T. So by induction, all the vectors in β are eigenvectors of T.
The converse was already proved on page 370.
Interestingly, as the next example shows, Theorem 6.16 does not extend
to infinite-dimensional complex inner product spaces.
Example 3
Consider the inner product space H with the orthonormal set S from Example 9 in Section 6.1. Let V = span(S), and let T and U be the linear operators on V defined by T(f) = f₁f and U(f) = f₋₁f. Then
        T(fₙ) = fₙ₊₁   and   U(fₙ) = fₙ₋₁
for all integers n. Thus
        ⟨T(fₘ), fₙ⟩ = ⟨fₘ₊₁, fₙ⟩ = δ_{(m+1),n} = δ_{m,(n−1)} = ⟨fₘ, fₙ₋₁⟩ = ⟨fₘ, U(fₙ)⟩.
It follows that U = T*. Furthermore, TT* = I = T*T; so T is normal.
We show that T has no eigenvectors. Suppose that f is an eigenvector of T, say, T(f) = λf for some λ. Since V equals the span of S, we may write
        f = Σ_{i=n}^m aᵢfᵢ,   where aₘ ≠ 0.
Hence
        Σ_{i=n}^m aᵢfᵢ₊₁ = T(f) = λf = Σ_{i=n}^m λaᵢfᵢ.
Since aₘ ≠ 0, we can write fₘ₊₁ as a linear combination of fₙ, fₙ₊₁, ..., fₘ. But this is a contradiction because S is linearly independent. •
Example 1 illustrates that normality is not sufficient to guarantee the
existence of an orthonormal basis of eigenvectors for real inner product spaces.
For real inner product spaces, we must replace normality by the stronger
condition that T = T* in order to guarantee such a basis.
Definitions. Let T be a linear operator on an inner product space V. We say that T is self-adjoint (Hermitian) if T = T*. An n × n real or complex matrix A is self-adjoint (Hermitian) if A = A*.
It follows immediately that if β is an orthonormal basis, then T is self-adjoint if and only if [T]_β is self-adjoint. For real matrices, this condition reduces to the requirement that A be symmetric.
Before we state our main result for self-adjoint operators, we need some
preliminary work.
By definition, a linear operator on a real inner product space has only
real eigenvalues. The lemma that follows shows that the same can *be said
for self-adjoint operators on a complex inner product space. Similarly, the
characteristic polynomial of every linear operator on a complex inner product
space splits, and the same is true for self-adjoint operators on a real inner
product space.
Lemma. Let T be a self-adjoint operator on a finite-dimensional inner product space V. Then
(a) Every eigenvalue of T is real.
(b) Suppose that V is a real inner product space. Then the characteristic polynomial of T splits.
Proof. (a) Suppose that T(x) = λx for x ≠ 0. Because a self-adjoint operator is also normal, we can apply Theorem 6.15(c) to obtain
λx = T(x) = T*(x) = λ̄x.
So λ = λ̄; that is, λ is real.
(b) Let n = dim(V), let β be an orthonormal basis for V, and let A = [T]β. Then A is self-adjoint. Let TA be the linear operator on Cⁿ defined by TA(x) = Ax for all x ∈ Cⁿ. Note that TA is self-adjoint because [TA]γ = A, where γ is the standard ordered (orthonormal) basis for Cⁿ. So, by (a), the eigenvalues of TA are real. By the fundamental theorem of algebra, the characteristic polynomial of TA splits into factors of the form t − λ. Since each λ is real, the characteristic polynomial splits over R. But TA has the same characteristic polynomial as A, which has the same characteristic polynomial as T. Therefore the characteristic polynomial of T splits.
We are now able to establish one of the major results of this chapter.
Theorem 6.17. Let T be a linear operator on a finite-dimensional real
inner product space V. Then T is self-adjoint if and only if there exists an
orthonormal basis 0 for V consisting of eigenvectors of T.
Proof. Suppose that T is self-adjoint. By the lemma, we may apply Schur's theorem to obtain an orthonormal basis β for V such that the matrix A = [T]β is upper triangular. But
A* = ([T]β)* = [T*]β = [T]β = A.
So A and A* are both upper triangular, and therefore A is a diagonal matrix. Thus β must consist of eigenvectors of T.
The converse is left as an exercise. I
Theorem 6.17 is used extensively in many areas of mathematics and statis­
tics. We restate this theorem in matrix form in the next section.
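As a numerical aside (not part of the original text), Theorem 6.17 is what routines such as NumPy's eigh rely on for real symmetric matrices; the matrix below is a hypothetical example, assuming NumPy is available.

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # real symmetric, hence self-adjoint
w, Q = np.linalg.eigh(A)            # eigh assumes a symmetric/Hermitian input
print(w)                                        # eigenvalues [1. 3.]
print(np.allclose(Q.T @ Q, np.eye(2)))          # columns of Q form an orthonormal eigenbasis
print(np.allclose(Q @ np.diag(w) @ Q.T, A))     # A = Q D Q^t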
Example 4
As we noted earlier, real symmetric matrices are self-adjoint, and self-adjoint
matrices are normal. The following matrix A is complex and symmetric:
A = ( i  i )        A* = ( −i  −i )
    ( i  1 ),             ( −i   1 ).
But A is not normal, because (AA*)12 = 1 + i and (A*A)12 = 1 − i. Therefore complex symmetric matrices need not be normal. •
EXERCISES
1. Label the following statements as true or false. Assume that the under­
lying inner product spaces are finite-dimensional.
(a) Every self-adjoint operator is normal.
(b) Operators and their adjoints have the same eigenvectors.
(c) If T is an operator on an inner product space V, then T is normal if and only if [T]β is normal, where β is any ordered basis for V.
(d) A real or complex matrix A is normal if and only if LA is normal.
(e) The eigenvalues of a self-adjoint operator must all be real.

(f) The identity and zero operators are self-adjoint.
(g) Every normal operator is diagonalizable.
(h) Every self-adjoint operator is diagonalizable.
2. For each linear operator T on an inner product space V, determine
whether T is normal, self-adjoint, or neither. If possible, produce an
orthonormal basis of eigenvectors of T for V and list the corresponding
eigenvalues.
(a) V = R² and T is defined by T(a, b) = (2a − 2b, −2a + 5b).
(b) V = R³ and T is defined by T(a, b, c) = (−a + b, 5b, 4a − 2b + 5c).
(c) V = C² and T is defined by T(a, b) = (2a + ib, a + 2b).
(d) V = P2(R) and T is defined by T(f) = f′, where
(f, g) = ∫₀¹ f(t)g(t) dt.
(e) V = M2×2(R) and T is defined by T(A) = Aᵗ.
(f) V = M2×2(R) and T is defined by
T ( a  b )  =  ( c  d )
  ( c  d )     ( a  b ).
3. Give an example of a linear operator T on R² and an ordered basis for R² that provides a counterexample to the statement in Exercise 1(c).
4. Let T and U be self-adjoint operators on an inner product space V.
Prove that TU is self-adjoint if and only if TU = UT.
5. Prove (b) of Theorem 6.15.
6. Let V be a complex inner product space, and let T be a linear operator
on V. Define
T1 = (1/2)(T + T*) and T2 = (1/(2i))(T − T*).
(a) Prove that T1 and T2 are self-adjoint and that T = T1 + iT2.
(b) Suppose also that T = U1 + iU2, where U1 and U2 are self-adjoint. Prove that U1 = T1 and U2 = T2.
(c) Prove that T is normal if and only if T1T2 = T2T1.
7. Let T be a linear operator on an inner product space V, and let W be
a T-invariant subspace of V. Prove the following results.
(a) If T is self-adjoint, then T_W is self-adjoint.
(b) W⊥ is T*-invariant.
(c) If W is both T- and T*-invariant, then (T_W)* = (T*)_W.
(d) If W is both T- and T*-invariant and T is normal, then T_W is
normal.

8. Let T be a normal operator on a finite-dimensional complex inner
product space V, and let W be a subspace of V. Prove that if W is
T-invariant, then W is also T*-invariant. Hint: Use Exercise 24 of Sec­
tion 5.4.
9. Let T be a normal operator on a finite-dimensional inner product space
V. Prove that N(T) = N(T*) and R(T) = R(T*). Hint: Use Theo-
rem 6.15 and Exercise 12 of Section 6.3.
10. Let T be a self-adjoint operator on a finite-dimensional inner product space V. Prove that for all x ∈ V
||T(x) ± ix||² = ||T(x)||² + ||x||².
Deduce that T − iI is invertible and that [(T − iI)⁻¹]* = (T + iI)⁻¹.
11. Assume that T is a. linear operator on a complex (not necessarily finite-
dimensional) inner product space V with an adjoint T*. Prove the
following results.
(a) If T is self-adjoint, then (T(x), x) is real for all x ∈ V.
(b) If T satisfies (T(x), x) = 0 for all x ∈ V, then T = T0. Hint: Replace x by x + y and then by x + iy, and expand the resulting inner products.
(c) If (T(x), x) is real for all x ∈ V, then T = T*.
12. Let T be a normal operator on a finite-dimensional real inner product
space V whose characteristic polynomial splits. Prove that V has an
orthonormal basis of eigenvectors of T. Hence prove that T is self-
adjoint.
13. An n × n real matrix A is said to be a Gramian matrix if there exists a real (square) matrix B such that A = BᵗB. Prove that A is a Gramian matrix if and only if A is symmetric and all of its eigenvalues are nonnegative. Hint: Apply Theorem 6.17 to T = LA to obtain an orthonormal basis {v1, v2, ..., vn} of eigenvectors with the associated eigenvalues λ1, λ2, ..., λn. Define the linear operator U by U(vi) = √λi vi.
14. Simultaneous Diagonalization. Let V be a finite-dimensional real inner
product space, and let U and T be self-adjoint linear operators on V
such that UT = TU. Prove that there exists an orthonormal basis for
V consisting of vectors that are eigenvectors of both U and T. (The
complex version of this result appears as Exercise 10 of Section 6.6.)
Hint: For any eigenspace W = E_λ of T, we have that W is both T- and U-invariant. By Exercise 7, we have that W⊥ is both T- and U-invariant.
Apply Theorem 6.17 and Theorem 6.6 (p. 350).

15. Let A and B be symmetric n × n matrices such that AB = BA. Use Exercise 14 to prove that there exists an orthogonal matrix P such that PᵗAP and PᵗBP are both diagonal matrices.
16. Prove the Cayley-Hamilton theorem for a complex n × n matrix A. That is, if f(t) is the characteristic polynomial of A, prove that f(A) = O. Hint: Use Schur's theorem to show that A may be assumed to be upper triangular, in which case
f(t) = Π (from i = 1 to n) (Aii − t).
Now if T = LA, we have (AjjI − T)(ej) ∈ span({e1, e2, ..., e(j−1)}) for j ≥ 2, where {e1, e2, ..., en} is the standard ordered basis for Cⁿ. (The general case is proved in Section 5.4.)
The following definitions are used in Exercises 17 through 23.
Definitions. A linear operator T on a finite-dimensional inner product space is called positive definite [positive semidefinite] if T is self-adjoint and (T(x), x) > 0 [(T(x), x) ≥ 0] for all x ≠ 0.
An n × n matrix A with entries from R or C is called positive definite [positive semidefinite] if LA is positive definite [positive semidefinite].
17. Let T and U be self-adjoint linear operators on an n-dimensional inner product space V, and let A = [T]β, where β is an orthonormal basis for
V. Prove the following results.
(a) T is positive definite [semidefinite] if and only if all of its eigenval­
ues are positive [nonnegative].
(b) T is positive definite if and only if
Σ (over i, j) Aij aj āi > 0 for all nonzero n-tuples (a1, a2, ..., an).
(c) T is positive semidefinite if and only if A = B*B for some square matrix B.
(d) If T and U are positive semidefinite operators such that T² = U², then T = U.
(e) If T and U are positive definite operators such that TU = UT, then
TU is positive definite.
(f) T is positive definite [semidefinite] if and only if A is positive definite [semidefinite].
Because of (f), results analogous to items (a) through (d) hold for matrices as well as operators.

18. Let T: V —> W be a linear transformation, where V and W are finite-
dimensional inner product spaces. Prove the following results.
(a) T*T and TT* are positive semidefinite. (See Exercise 15 of Sec­
tion 6.3.)
(b) rank(T*T) = rank(TT*) = rank(T).
19. Let T and U be positive definite operators on an inner product space
V. Prove the following results.
(a) T + U is positive definite.
(b) If c > 0, then cT is positive definite.
(c) T⁻¹ is positive definite.
20. Let V be an inner product space with inner product (·, ·), and let T be a positive definite linear operator on V. Prove that (x, y)′ = (T(x), y) defines another inner product on V.
21. Let V be a. finite-dimensional inner product space, and let T and U be
self-adjoint operators on V such that T is positive definite. Prove that
both TU and UT are diagonalizable linear operators that have only real
eigenvalues. Hint: Show that UT is self-adjoint with respect to the inner product (x, y)′ = (T(x), y). To show that TU is self-adjoint, repeat the argument with T⁻¹ in place of T.
22. This exercise provides a converse to Exercise 20. Let V be a finite-dimensional inner product space with inner product (·, ·), and let (·, ·)′ be any other inner product on V.
(a) Prove that there exists a unique linear operator T on V such that (x, y)′ = (T(x), y) for all x and y in V. Hint: Let β = {v1, v2, ..., vn} be an orthonormal basis for V with respect to (·, ·), and define a matrix A by Aij = (vj, vi)′ for all i and j. Let T be the unique linear operator on V such that [T]β = A.
(b) Prove that the operator T of (a) is positive definite with respect
to both inner products.
23. Let U be a diagonalizable linear operator on a finite-dimensional inner product space V such that all of the eigenvalues of U are real. Prove that there exist positive definite linear operators T1 and T1′ and self-adjoint linear operators T2 and T2′ such that U = T2T1 = T1′T2′. Hint: Let (·, ·) be the inner product associated with V, β a basis of eigenvectors for U, (·, ·)′ the inner product on V with respect to which β is orthonormal (see Exercise 22(a) of Section 6.1), and T1 the positive definite operator determined by Exercise 22. Show that U is self-adjoint with respect to (·, ·)′ and U = T1⁻¹U*T1 (the adjoint is with respect to (·, ·)). Let T2 = T1⁻¹U*.

24. This argument gives another proof of Schur's theorem. Let T be a linear operator on a finite-dimensional inner product space V.
(a) Suppose that β is an ordered basis for V such that [T]β is an upper triangular matrix. Let γ be the orthonormal basis for V obtained by applying the Gram-Schmidt orthogonalization process to β and then normalizing the resulting vectors. Prove that [T]γ is an upper triangular matrix.
(b) Use Exercise 32 of Section 5.4 and (a) to obtain an alternate proof
of Schur's theorem.
6.5 UNITARY AND ORTHOGONAL OPERATORS
AND THEIR MATRICES
In this section, we continue our analogy between complex numbers and linear
operators. Recall that the adjoint of a. linear operator acts similarly to the
conjugate of a complex number (see, for example, Theorem 6.11 p. 359). A
complex number z has length 1 if zz̄ = 1. In this section, we study those linear operators T on an inner product space V such that TT* = T*T = I. We
will see that these are precisely the linear operators that "preserve length"
in the sense that ||T(x)|| = ||x|| for all x G V. As another characterization,
we prove that, on a finite-dimensional complex inner product space, these are
the normal operators whose eigenvalues all have absolute value 1.
In past chapters, we were interested in studying those functions that pre­
serve the structure of the underlying space. In particular, linear operators
preserve the operations of vector addition and scalar multiplication, and iso­
morphisms preserve all the vector space structure. It is now natural to con­
sider those linear operators T on an inner product space that preserve length.
We will see that this condition guarantees, in fact, that T preserves the inner
product.
Definitions. Let T be a linear operator on a finite-dimensional inner product space V (over F). If ||T(x)|| = ||x|| for all x ∈ V, we call T a unitary operator if F = C and an orthogonal operator if F = R.
It should be noted that, in the infinite-dimensional case, an operator sat­
isfying the preceding norm requirement is generally called an isometry. If,
in addition, the operator is onto (the condition guarantees one-to-one), then
the operator is called a unitary or orthogonal operator.
Clearly, any rotation or reflection in R2 preserves length and hence is
an orthogonal operator. We study these operators in much more detail in
Section 6.11.

Example 1
Let h ∈ H satisfy |h(x)| = 1 for all x. Define the linear operator T on H by T(f) = hf. Then
||T(f)||² = ||hf||² = (1/2π) ∫₀^{2π} |h(t)f(t)|² dt = ||f||²
since |h(t)|² = 1 for all t. So T is a unitary operator. •
Theorem 6.18. Let T be a linear operator on a finite-dimensional inner
product space V. Then the following statements are equivalent.
(a) TT* = T*T = I.
(b) (T(x),T(y)) = (x.y) for all x,y G V.
(c) If β is an orthonormal basis for V, then T(β) is an orthonormal basis for V.
(d) There exists an orthonormal basis β for V such that T(β) is an orthonormal basis for V.
(e) ||T(x)|| = ||x|| for all x ∈ V.
Thus all the conditions above are equivalent to the definition of a uni­
tary or orthogonal operator. From (a), it follows that unitary or orthogonal
operators are normal.
Before proving the theorem, we first prove a lemma. Compare this lemma
to Exercise 11(b) of Section 6.4.
Lemma. Let U be a self-adjoint operator on a finite-dimensional inner
product space V. If (x, U(x)) = 0 for all x ∈ V, then U = T0.
Proof. By either Theorem 6.16 (p. 372) or 6.17 (p. 374), we may choose an orthonormal basis β for V consisting of eigenvectors of U. If x ∈ β, then U(x) = λx for some λ. Thus
0 = (x, U(x)) = (x, λx) = λ̄(x, x);
so λ = 0. Hence U(x) = 0 for all x ∈ β, and thus U = T0.
Proof of Theorem 6.18. We prove first that (a) implies (b). Let x, y ∈ V. Then (x, y) = (T*T(x), y) = (T(x), T(y)).
Second, we prove that (b) implies (c). Let β = {v1, v2, ..., vn} be an orthonormal basis for V; so T(β) = {T(v1), T(v2), ..., T(vn)}. It follows that (T(vi), T(vj)) = (vi, vj) = δij. Therefore T(β) is an orthonormal basis for V.
That (c) implies (d) is obvious.
Next we prove that (d) implies (e). Let x ∈ V, and let β = {v1, v2, ..., vn}. Now
x = Σ (from i = 1 to n) ai vi
for some scalars ai, and so
||x||² = ( Σi ai vi, Σj aj vj ) = Σi Σj ai āj (vi, vj) = Σi Σj ai āj δij = Σi |ai|²
since β is orthonormal.
Applying the same manipulations to
T(x) = Σ (from i = 1 to n) ai T(vi)
and using the fact that T(β) is also orthonormal, we obtain
||T(x)||² = Σi |ai|².
Hence ||T(x)|| = ||x||.
Finally, we prove that (e) implies (a). For any x ∈ V, we have
(x, x) = ||x||² = ||T(x)||² = (T(x), T(x)) = (x, T*T(x)).
So (x, (I − T*T)(x)) = 0 for all x ∈ V. Let U = I − T*T; then U is self-adjoint, and (x, U(x)) = 0 for all x ∈ V. Hence, by the lemma, we have T0 = U = I − T*T, and therefore T*T = I. Since V is finite-dimensional, we may use Exercise 10 of Section 2.4 to conclude that TT* = I.
It follows immediately from the definition that every eigenvalue of a uni­
tary or orthogonal operator has absolute value 1. In fact, even more is true.
Corollary 1. Let T be a linear operator on a finite-dimensional real
inner product space V. Then V has an orthonormal basis of eigenvectors of
T with corresponding eigenvalues of absolute value 1 if and only ifT is both
self-adjoint and orthogonal.
Proof. Suppose that V has an orthonormal basis {v1, v2, ..., vn} such that T(vi) = λivi and |λi| = 1 for all i. By Theorem 6.17 (p. 374), T is self-adjoint. Thus
(TT*)(vi) = T(λ̄ivi) = λ̄iλivi = |λi|²vi = vi
for each i. So TT* = I, and again by Exercise 10 of Section 2.4, T*T = I; hence T is orthogonal by Theorem 6.18(a).
If T is self-adjoint, then, by Theorem 6.17, we have that V possesses an orthonormal basis {v1, v2, ..., vn} such that T(vi) = λivi for all i. If T is also orthogonal, we have
|λi| ||vi|| = ||λivi|| = ||T(vi)|| = ||vi||;
so |λi| = 1 for every i.

Corollary 2. Let T be a linear operator on a finite-dimensional complex
inner product space V. Then V has an orthonormal basis of eigenvectors ofT
with corresponding eigenvalues of absolute value 1 if and only ifT is unitary:
Proof. The proof is similar to the proof of Corollary 1. j|
Example 2
Let T: R² → R² be a rotation by θ, where 0 < θ < π. It is clear geometrically that T "preserves length," that is, that ||T(x)|| = ||x|| for all x ∈ R². The fact that rotations by a fixed angle preserve perpendicularity not only can be seen geometrically but now follows from (b) of Theorem 6.18. Perhaps the fact that such a transformation preserves the inner product is not so obvious; however, we obtain this fact from (b) also. Finally, an inspection of the matrix representation of T with respect to the standard ordered basis, which is
( cos θ  −sin θ )
( sin θ   cos θ ),
reveals that T is not self-adjoint for the given restriction on θ. As we mentioned earlier, this fact also follows from the geometric observation that T has no eigenvectors and from Theorem 6.15 (p. 371). It is seen easily from the preceding matrix that T* is the rotation by −θ. •
Definition. Let L be a one-dimensional subspace of R2. We may view L
as a line in the plane through the origin. A linear operator T on R2 is called
a reflection of R² about L if T(x) = x for all x ∈ L and T(x) = −x for all x ∈ L⊥.
As an example of a reflection, consider the operator defined in Example 3 of
Section 2.5.
Example 3
Let T be a reflection of R² about a line L through the origin. We show that T is an orthogonal operator. Select vectors v1 ∈ L and v2 ∈ L⊥ such that ||v1|| = ||v2|| = 1. Then T(v1) = v1 and T(v2) = −v2. Thus v1 and v2 are eigenvectors of T with corresponding eigenvalues 1 and −1, respectively. Furthermore, {v1, v2} is an orthonormal basis for R². It follows that T is an orthogonal operator by Corollary 1 to Theorem 6.18. •
We now examine the matrices that represent unitary and orthogonal trans­
formations.
Definitions. A square matrix A is called an orthogonal matrix if AᵗA = AAᵗ = I and unitary if A*A = AA* = I.

Since for a real matrix A we have A* = Aᵗ, a real unitary matrix is also
orthogonal. In this case, we call A orthogonal rather than unitary.
Note that the condition AA* = I is equivalent to the statement that the rows of A form an orthonormal basis for Fⁿ because
δij = Iij = (AA*)ij = Σ (from k = 1 to n) Aik (A*)kj = Σ (from k = 1 to n) Aik Ājk,
and the last term represents the inner product of the ith and jth rows of A.
A similar remark can be made about the columns of A and the condition
A* A = I.
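A small numerical check of this remark (an illustration added here, assuming NumPy is available; the rotation matrix is a hypothetical example):

import numpy as np

theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(A @ A.T, np.eye(2)))   # A A^t = I: the rows are orthonormal
print(np.allclose(A.T @ A, np.eye(2)))   # A^t A = I: the columns are orthonormal
print([round(float(np.dot(A[i], A[j])), 6) for i in range(2) for j in range(2)])  # the delta_ij pattern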
It also follows from the definition above and from Theorem 6.10 (p. 359)
that a linear operator T on an inner product space V is unitary [orthogonal]
if and only if [T]β is unitary [orthogonal] for some orthonormal basis β for V.
Example 4
From Example 2, the matrix
( cos θ  −sin θ )
( sin θ   cos θ )
is clearly orthogonal. One can easily see that the rows of the matrix form an orthonormal basis for R². Similarly, the columns of the matrix form an orthonormal basis for R². •
Example 5
Let T be a reflection of R² about a line L through the origin, let β be the standard ordered basis for R², and let A = [T]β. Then T = L_A. Since T is an orthogonal operator and β is an orthonormal basis, A is an orthogonal matrix. We describe A.
Suppose that α is the angle from the positive x-axis to L. Let v1 = (cos α, sin α) and v2 = (−sin α, cos α). Then ||v1|| = ||v2|| = 1, v1 ∈ L, and v2 ∈ L⊥. Hence γ = {v1, v2} is an orthonormal basis for R². Because T(v1) = v1 and T(v2) = −v2, we have
[T]γ = [L_A]γ = ( 1   0 )
                ( 0  −1 ).
Let
Q = ( cos α  −sin α )
    ( sin α   cos α ).
By the corollary to Theorem 2.23 (p. 115),
A = Q [L_A]γ Q⁻¹
  = ( cos α  −sin α ) ( 1   0 ) ( cos α   sin α )
    ( sin α   cos α ) ( 0  −1 ) ( −sin α  cos α )
  = ( cos²α − sin²α        2 sin α cos α   )
    ( 2 sin α cos α    −(cos²α − sin²α) )
  = ( cos 2α   sin 2α )
    ( sin 2α  −cos 2α ).
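A brief numerical check of this computation (an added illustration, assuming NumPy; alpha = pi/6 is an arbitrary sample angle):

import numpy as np

alpha = np.pi / 6                                   # sample angle from the x-axis to L
Q = np.array([[np.cos(alpha), -np.sin(alpha)],
              [np.sin(alpha),  np.cos(alpha)]])     # columns are v1 (in L) and v2 (in L-perp)
A = Q @ np.diag([1.0, -1.0]) @ Q.T                  # A = Q [T]_gamma Q^{-1}

expected = np.array([[np.cos(2*alpha),  np.sin(2*alpha)],
                     [np.sin(2*alpha), -np.cos(2*alpha)]])
print(np.allclose(A, expected))                     # matches the closed form above
print(np.allclose(A @ A.T, np.eye(2)), np.isclose(np.linalg.det(A), -1.0))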
We know that, for a complex normal [real symmetric] matrix A, there exists an orthonormal basis β for Fⁿ consisting of eigenvectors of A. Hence A is similar to a diagonal matrix D. By the corollary to Theorem 2.23 (p. 115), the matrix Q whose columns are the vectors in β is such that D = Q⁻¹AQ. But since the columns of Q are an orthonormal basis for Fⁿ, it follows that Q is unitary [orthogonal]. In this case, we say that A is unitarily equivalent [orthogonally equivalent] to D. It is easily seen (see Exercise 18) that this relation is an equivalence relation on Mn×n(C) [Mn×n(R)]. More generally, A and B are unitarily equivalent [orthogonally equivalent] if and only if there exists a unitary [orthogonal] matrix P such that A = P*BP.
The preceding paragraph has proved half of each of the next two theo­
rems.
Theorem 6.19. Let A be a complex n x n matrix. Then A is normal if
and only if A is unitarily equivalent to a diagonal matrix.
Proof. By the preceding remarks, we need only prove that if A is unitarily
equivalent to a diagonal matrix, then A is normal.
Suppose that A = P*DP, where P is a unitary matrix and D is a diagonal matrix. Then
AA* = (P*DP)(P*DP)* = (P*DP)(P*D*P) = P*DID*P = P*DD*P.
Similarly, A*A = P*D*DP. Since D is a diagonal matrix, however, we have DD* = D*D. Thus AA* = A*A.
Theorem 6.20. Let A be a real n × n matrix. Then A is symmetric if
and only if A is orthogonally equivalent to a real diagonal matrix.
Proof. The proof is similar to the proof of Theorem 6.19 and is left as an
exercise. 1
Example 6
Let
A = ( 4  2  2 )
    ( 2  4  2 )
    ( 2  2  4 ).
Since A is symmetric, Theorem 6.20 tells us that A is orthogonally equivalent to a diagonal matrix. We find an orthogonal matrix P and a diagonal matrix D such that PᵗAP = D.
To find P, we obtain an orthonormal basis of eigenvectors. It is easy to show that the eigenvalues of A are 2 and 8. The set {(−1, 1, 0), (−1, 0, 1)} is a basis for the eigenspace corresponding to 2. Because this set is not orthogonal, we apply the Gram-Schmidt process to obtain the orthogonal set {(−1, 1, 0), −(1/2)(1, 1, −2)}. The set {(1, 1, 1)} is a basis for the eigenspace corresponding to 8. Notice that (1, 1, 1) is orthogonal to the preceding two vectors, as predicted by Theorem 6.15(d) (p. 371). Taking the union of these two bases and normalizing the vectors, we obtain the following orthonormal basis for R³ consisting of eigenvectors of A:
{ (1/√2)(−1, 1, 0), −(1/√6)(1, 1, −2), (1/√3)(1, 1, 1) }.
Thus one possible choice for P is
P = ( −1/√2  −1/√6  1/√3 )
    (  1/√2  −1/√6  1/√3 )
    (   0     2/√6  1/√3 ),
and then
D = PᵗAP = ( 2  0  0 )
           ( 0  2  0 )
           ( 0  0  8 ).
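The computation in Example 6 can be verified numerically; the following is an added check (assuming NumPy), using the matrices as reconstructed above:

import numpy as np

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 4.0, 2.0],
              [2.0, 2.0, 4.0]])
P = np.column_stack([
    np.array([-1.0,  1.0, 0.0]) / np.sqrt(2),
    np.array([-1.0, -1.0, 2.0]) / np.sqrt(6),
    np.array([ 1.0,  1.0, 1.0]) / np.sqrt(3),
])
print(np.allclose(P.T @ P, np.eye(3)))   # P is orthogonal
print(np.round(P.T @ A @ P, 10))         # diag(2, 2, 8)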
Because of Schur's theorem (Theorem 6.14, p. 370), the next result is immediate. As it is the matrix form of Schur's theorem, we also refer to it as Schur's theorem.
Theorem 6.21 (Schur). Let A ∈ Mn×n(F) be a matrix whose characteristic polynomial splits over F.
(a) If F = C, then A is unitarily equivalent to a complex upper triangular matrix.
(b) If F = R, then A is orthogonally equivalent to a real upper triangular matrix.
Rigid Motions*
The purpose of this application is to characterize the so-called rigid mo­
tions of a finite-dimensional real inner product space. One may think intu­
itively of such a motion as a transformation that does not affect the shape of a figure under its action, hence the term rigid. The key requirement for such a transformation is that it preserves distances.
Definition. Let V be a real inner product space. A function f: V → V is called a rigid motion if
||f(x) − f(y)|| = ||x − y||

for all x, y ∈ V.
For example, any orthogonal operator on a finite-dimensional real inner
product space is a rigid motion.
Another class of rigid motions are the translations. A function g: V → V, where V is a real inner product space, is called a translation if there exists a vector v0 ∈ V such that g(x) = x + v0 for all x ∈ V. We say that g is the translation by v0. It is a simple exercise to show that translations, as
well as composites of rigid motions on a real inner product space, are also
rigid motions. (See Exercise 22.) Thus an orthogonal operator on a finite-
dimensional real inner product space V followed by a translation on V is a
rigid motion on V. Remarkably, every rigid motion on V may be characterized
in this way.
Theorem 6.22. Let f: V → V be a rigid motion on a finite-dimensional real inner product space V. Then there exists a unique orthogonal operator T on V and a unique translation g on V such that f = g o T.
Any orthogonal operator is a special case of this composite, in which
the translation is by 0. Any translation is also a special case, in which the
orthogonal operator is the identity operator.
Proof. Let T: V → V be defined by
T(x) = f(x) − f(0)
for all x G V. We show that T is an orthogonal operator, from which it
follows that / = g o T, where g is the translation by f(0). Observe that T is
the composite of / and the translation by —f(0); hence T is a rigid motion.
Furthermore, for any x ∈ V,
||T(x)||² = ||f(x) − f(0)||² = ||x − 0||² = ||x||²,
and consequently ||T(x)|| = ||x|| for any x ∈ V. Thus for any x, y ∈ V,
||T(x) − T(y)||² = ||T(x)||² − 2(T(x), T(y)) + ||T(y)||² = ||x||² − 2(T(x), T(y)) + ||y||²
and
||x − y||² = ||x||² − 2(x, y) + ||y||².
But ||T(x) − T(y)||² = ||x − y||²; so (T(x), T(y)) = (x, y) for all x, y ∈ V.
We are now in a position to show that T is a linear transformation. Let x, y ∈ V, and let a ∈ R. Then
||T(x + ay) − T(x) − aT(y)||² = ||[T(x + ay) − T(x)] − aT(y)||²
= ||T(x + ay) − T(x)||² + a²||T(y)||² − 2a(T(x + ay) − T(x), T(y))
= ||(x + ay) − x||² + a²||y||² − 2a[(T(x + ay), T(y)) − (T(x), T(y))]
= a²||y||² + a²||y||² − 2a[(x + ay, y) − (x, y)]
= 2a²||y||² − 2a[(x, y) + a||y||² − (x, y)]
= 0.
Thus T(x + ay) = T(x) + aT(y), and hence T is linear. Since T also preserves inner products, T is an orthogonal operator.
To prove uniqueness, suppose that u0 and v0 are in V and T and U are orthogonal operators on V such that
f(x) = T(x) + u0 = U(x) + v0
for all x ∈ V. Substituting x = 0 in the preceding equation yields u0 = v0, and hence the translation is unique. This equation, therefore, reduces to T(x) = U(x) for all x ∈ V, and hence T = U.
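The construction in the proof is easy to carry out numerically. The sketch below (an added illustration, assuming NumPy; the rotation angle and translation vector are arbitrary sample data) builds a rigid motion f and recovers T and the translation exactly as in the proof:

import numpy as np

theta, b = 0.9, np.array([3.0, -1.0])
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
f = lambda x: Q @ x + b                    # a rigid motion: rotation followed by translation

T = lambda x: f(x) - f(np.zeros(2))        # T(x) = f(x) - f(0), as in the proof
print(np.allclose(f(np.zeros(2)), b))      # the translation is by f(0) = b
x, y = np.array([1.0, 2.0]), np.array([-0.5, 4.0])
print(np.isclose(np.linalg.norm(T(x) - T(y)), np.linalg.norm(x - y)))  # T preserves distances
print(np.isclose(T(x) @ T(y), x @ y))      # and inner products, so T is orthogonal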
Orthogonal Operators on R2
Because of Theorem 6.22, an understanding of rigid motions requires a
characterization of orthogonal operators. The next result characterizes orthogonal operators on R². We postpone the case of orthogonal operators on more general spaces to Section 6.11.
Theorem 6.23. Let T be an orthogonal operator on R², and let A = [T]β, where β is the standard ordered basis for R². Then exactly one of the following
conditions is satisfied:
(a) T is a rotation, and det(A) = 1.
(b) T is a reflection about a line through the origin, and det(A) = —1.
Proof. Because T is an orthogonal operator, T(β) = {T(e1), T(e2)} is an orthonormal basis for R² by Theorem 6.18(c). Since T(e1) is a unit vector, there is a unique angle θ, 0 ≤ θ < 2π, such that T(e1) = (cos θ, sin θ). Since T(e2) is a unit vector and is orthogonal to T(e1), there are only two possible choices for T(e2). Either
T(e2) = (−sin θ, cos θ)  or  T(e2) = (sin θ, −cos θ).
First, suppose that T(e2) = (−sin θ, cos θ). Then
A = ( cos θ  −sin θ )
    ( sin θ   cos θ ).
It follows from Example 1 of Section 6.4 that T is a rotation by the angle θ. Also
det(A) = cos²θ + sin²θ = 1.

Now suppose that T(e2) = (sin θ, −cos θ). Then
A = ( cos θ   sin θ )
    ( sin θ  −cos θ ).
Comparing this matrix to the matrix A of Example 5, we see that T is the
reflection of R² about a line L, so that α = θ/2 is the angle from the positive x-axis to L. Furthermore,
det(A) = −cos²θ − sin²θ = −1.
Combining Theorems 6.22 and 6.23, we obtain the following characteriza­
tion of rigid motions on R2.
Corollary. Any rigid motion on R2 is either a rotation followed by a trans­
lation or a reflection about a line through the origin followed by a translation.
Example 7
Let
A = ( 1/√5   2/√5 )
    ( 2/√5  −1/√5 ).
We show that L_A is the reflection of R² about a line L through the origin, and then describe L.
Clearly AA* = A*A = I, and therefore A is an orthogonal matrix. Hence L_A is an orthogonal operator. Furthermore,
det(A) = −1/5 − 4/5 = −1,
and thus L_A is a reflection of R² about a line L through the origin by Theorem 6.23. Since L is the one-dimensional eigenspace corresponding to the eigenvalue 1 of L_A, it suffices to find an eigenvector of L_A corresponding to 1. One such vector is v = (2, √5 − 1). Thus L is the span of {v}. Alternatively, L is the line through the origin with slope (√5 − 1)/2, and hence is the line with the equation
y = ((√5 − 1)/2) x. •
Conic Sections
As an application of Theorem 6.20, we consider the quadratic equation
ax² + 2bxy + cy² + dx + ey + f = 0.     (2)
For special choices of the coefficients in (2), we obtain the various conic sections. For example, if a = c = 1, b = d = e = 0, and f = −1, we obtain the circle x² + y² = 1 with center at the origin. The remaining conic sections, namely, the ellipse, parabola, and hyperbola, are obtained by other choices of the coefficients. If b = 0, then it is easy to graph the equation by the method of completing the square because the xy-term is absent. For example, the equation x² + 2x + y² + 4y + 2 = 0 may be rewritten as (x + 1)² + (y + 2)² = 3, which describes a circle with radius √3 and center at (−1, −2) in the xy-coordinate system. If we consider the transformation of coordinates (x, y) → (x′, y′), where x′ = x + 1 and y′ = y + 2, then our equation simplifies to (x′)² + (y′)² = 3. This change of variable allows us to eliminate the x- and y-terms.
We now concentrate solely on the elimination of the xy-term. To accom­
plish this, we consider the expression
ax² + 2bxy + cy²,     (3)
which is called the associated quadratic form of (2). Quadratic forms are studied in more generality in Section 6.8.
If we let
A = ( a  b )      and      X = ( x )
    ( b  c )                   ( y ),
then (3) may be written as XᵗAX = (AX, X). For example, the quadratic form 3x² + 4xy + 6y² may be written as XᵗAX with
A = ( 3  2 )
    ( 2  6 ).
The fact that A is symmetric is crucial in our discussion. For, by Theorem 6.20, we may choose an orthogonal matrix P and a diagonal matrix D with real diagonal entries λ1 and λ2 such that PᵗAP = D. Now define
X′ = ( x′ )
     ( y′ )
by X′ = PᵗX or, equivalently, by PX′ = PPᵗX = X. Then
XᵗAX = (PX′)ᵗA(PX′) = X′ᵗ(PᵗAP)X′ = X′ᵗDX′ = λ1(x′)² + λ2(y′)².
Thus the transformation (x, y) → (x′, y′) allows us to eliminate the xy-term in (3), and hence in (2).
Furthermore, since P is orthogonal, we have by Theorem 6.23 (with T = L_P) that det(P) = ±1. If det(P) = −1, we may interchange the columns of P to obtain a matrix Q. Because the columns of P form an orthonormal basis of eigenvectors of A, the same is true of the columns of Q. Therefore,
QᵗAQ = ( λ2  0  )
       ( 0   λ1 ).
Notice that det(Q) = −det(P) = 1. So, if det(P) = −1, we can take Q for our new P; consequently, we may always choose P so that det(P) = 1. By Theorem 6.23 (with T = L_P), it follows that the matrix P represents a rotation.
In summary, the xy-term in (2) may be eliminated by a rotation of the x-axis and y-axis to new axes x′ and y′ given by X = PX′, where P is an orthogonal matrix and det(P) = 1. Furthermore, the coefficients of (x′)² and (y′)² are the eigenvalues of
A = ( a  b )
    ( b  c ).
This result is a restatement of a result known as the principal axis theorem for R². The arguments above, of course, are easily extended to quadratic equations in n variables. For example, in the case n = 3, by special choices of the coefficients, we obtain the quadratic surfaces: the elliptic cone, the ellipsoid, the hyperbolic paraboloid, etc.
As an illustration of the preceding transformation, consider the quadratic equation
2x² − 4xy + 5y² − 36 = 0,
for which the associated quadratic form is 2x² − 4xy + 5y². In the notation we have been using,
A = (  2  −2 )
    ( −2   5 ),
so that the eigenvalues of A are 1 and 6 with associated eigenvectors
( 2 )        ( −1 )
( 1 )  and   (  2 ).
As expected (from Theorem 6.15(d) p. 371), these vectors are orthogonal. The corresponding orthonormal basis of eigenvectors
β = { (1/√5)(2, 1), (1/√5)(−1, 2) }

determines new axes x′ and y′ as in Figure 6.4. Hence if
P = ( 2/√5  −1/√5 )
    ( 1/√5   2/√5 ),
then
PᵗAP = ( 1  0 )
       ( 0  6 ).
Under the transformation X = PX′, or
x = (2/√5)x′ − (1/√5)y′
y = (1/√5)x′ + (2/√5)y′,
we have the new quadratic form (x′)² + 6(y′)². Thus the original equation 2x² − 4xy + 5y² = 36 may be written in the form (x′)² + 6(y′)² = 36 relative to a new coordinate system with the x′- and y′-axes in the directions of the first and second vectors of β, respectively. It is clear that this equation represents an ellipse. (See Figure 6.4.) Note that the preceding matrix P has the form
( cos θ  −sin θ )
( sin θ   cos θ ),
where θ = cos⁻¹(2/√5) ≈ 26.6°. So P is the matrix representation of a rotation of R² through the angle θ. Thus the change of variable X = PX′ can be accomplished by this rotation of the x- and y-axes. There is another possibility

for P, however. If the eigenvector of A corresponding to the eigenvalue 6 is taken to be (1, −2) instead of (−1, 2), and the eigenvalues are interchanged, then we obtain the matrix
P = (  1/√5  2/√5 )
    ( −2/√5  1/√5 ),
which is the matrix representation of a rotation through the angle θ with cos θ = 1/√5 and sin θ = −2/√5, that is, θ ≈ −63.4°. This possibility produces the same ellipse as the one in Figure 6.4, but interchanges the names of the x′- and y′-axes.
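The diagonalization used in this illustration can also be carried out numerically. The sketch below (added here, assuming NumPy is available) reproduces the eigenvalues 1 and 6 and eliminates the xy-term:

import numpy as np

A = np.array([[ 2.0, -2.0],
              [-2.0,  5.0]])            # matrix of the form 2x^2 - 4xy + 5y^2
lams, P = np.linalg.eigh(A)             # eigenvalues in increasing order: [1, 6]
if np.linalg.det(P) < 0:                # force det(P) = 1 so that P is a rotation
    P = P[:, ::-1]
    lams = lams[::-1]
print(lams)                             # coefficients of (x')^2 and (y')^2
print(np.round(P.T @ A @ P, 10))        # diagonal: no xy-term in the new coordinates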
EXERCISES
1. Label the following statements as true or false. Assume that the under­
lying inner product spaces are finite-dimensional.
(a) Every unitary operator is normal.
(b) Every orthogonal operator is diagonalizable.
(c) A matrix is unitary if and only if it is invertible.
(d) If two matrices are unitarily equivalent, then they are also similar.
(e) The sum of unitary matrices is unitary.
(f) The adjoint of a unitary operator is unitary.
(g) If T is an orthogonal operator on V, then [T]β is an orthogonal matrix for any ordered basis β for V.
(h) If all the eigenvalues of a linear operator are 1, then the operator must be unitary or orthogonal.
(i) A linear operator may preserve the norm, but not the inner product.
2. For each of the following matrices A, find an orthogonal or unitary matrix P and a diagonal matrix D such that P*AP = D.
(a) ( 1  2 )      (b) ( 0  −1 )      (c) (   2      3 − 3i )
    ( 2  1 )          ( 1   0 )          ( 3 + 3i     5    )
(d) ( 0  2  2 )      (e) ( 2  1  1 )
    ( 2  0  2 )          ( 1  2  1 )
    ( 2  2  0 )          ( 1  1  2 )
3. Prove that the composite of unitary [orthogonal] operators is unitary [orthogonal].

4. For z ∈ C, define Tz: C → C by Tz(u) = zu. Characterize those z for which Tz is normal, self-adjoint, or unitary.
5. Which of the following pairs of matrices are unitarily equivalent?
'0 f
0
0 1
1 0
(b)
0
0
and
and
and
and
6. Let V be the inner product space of complex-valued continuous func­
tions on [0, 1] with the inner product
(f, g) = ∫₀¹ f(t) g̅(t) dt.
Let h ∈ V, and define T: V → V by T(f) = hf. Prove that T is a unitary operator if and only if |h(t)| = 1 for 0 ≤ t ≤ 1.
7. Prove that if T is a unitary operator on a finite-dimensional inner product space V, then T has a unitary square root; that is, there exists a unitary operator U such that T = U².
8. Let T be a self-adjoint linear operator on a finite-dimensional inner
product space. Prove that (T + iI)(T − iI)⁻¹ is unitary using Exercise 10
of Section 6.4.
9. Let U be a linear operator on a finite-dimensional inner product space
V. If ||U(x)|| = ||x|| for all x in some orthonormal basis for V, must U
be unitary? Justify your answer with a proof or a counterexample.
10. Let A be an n x /; real symmetric or complex normal matrix. Prove
that
tr(A) = Σ (from i = 1 to n) λi    and    tr(A*A) = Σ (from i = 1 to n) |λi|²,
where the λi's are the (not necessarily distinct) eigenvalues of A.

394 Chap. 6 Inner Product Spaces
11. Find an orthogonal matrix whose first row is (1/3, 2/3, 2/3).
12. Let A be an n x n real symmetric or complex normal matrix. Prove
that
det(A) = Π (from i = 1 to n) λi,
where the λi's are the (not necessarily distinct) eigenvalues of A.
13. Suppose that A and B are diagonalizable matrices. Prove or disprove that A is similar to B if and only if A and B are unitarily equivalent.
14. Prove that if A and B are unitarily equivalent matrices, then A is positive definite [semidefinite] if and only if B is positive definite [semidefinite]. (See the definitions in the exercises in Section 6.4.)
15. Let U be a unitary operator on an inner product space V. and let W be
a finite-dimensional U-invariant subspace of V. Prove that
(a) U(W) = W;
(b) W⊥ is U-invariant.
Contrast (b) with Exercise 16.
16. Find an example of a unitary operator U on an inner product space and
a U-invariant subspace W such that W⊥ is not U-invariant.
17. Prove that a matrix that is both unitary and upper triangular must be
a diagonal matrix.
18. Show that "is unitarily equivalent to" is an equivalence relation on
Mnxn(C).
19. Let W be a finite-dimensional subspace of an inner product space V.
By Theorem 6.7 (p. 352) and the exercises of Section 1.3, V = W ⊕ W⊥. Define U: V → V by U(v1 + v2) = v1 − v2, where v1 ∈ W and v2 ∈ W⊥. Prove that U is a self-adjoint unitary operator.
20. Let V be a finite-dimensional inner product space. A linear operator U
on V is called a partial isometry if there exists a subspace W of V
such that ||U(x)|| = ||x|| for all x ∈ W and U(x) = 0 for all x ∈ W⊥. Observe that W need not be U-invariant. Suppose that U is such an operator and {v1, v2, ..., vk} is an orthonormal basis for W. Prove the following results.
(a) (U(x), U(y)) = (x, y) for all x, y ∈ W. Hint: Use Exercise 20 of Section 6.1.
(b) {U(v1), U(v2), ..., U(vk)} is an orthonormal basis for R(U).
(c) There exists an orthonormal basis γ for V such that the first k columns of [U]γ form an orthonormal set and the remaining columns are zero.
(d) Let {w1, w2, ..., wj} be an orthonormal basis for R(U)⊥ and β = {U(v1), U(v2), ..., U(vk), w1, ..., wj}. Then β is an orthonormal basis for V.
(e) Let T be the linear operator on V that satisfies T(U(vi)) = vi (1 ≤ i ≤ k) and T(wi) = 0 (1 ≤ i ≤ j). Then T is well defined, and T = U*. Hint: Show that (U(x), y) = (x, T(y)) for all x, y ∈ β.
There are four cases.
(f) U* is a partial isometry.
This exercise is continued in Exercise 9 of Section 6.6.
21. Let A and B be n x n matrices that, are unitarily equivalent.
(a) Prove that tr(A*A) = tr(B*B).
(b) Use (a) to prove that
Σ (over i, j) |Aij|² = Σ (over i, j) |Bij|².
(c) Use (b) to show that the matrices
( 1  2 )        ( i  4 )
( 2  i )  and   ( 1  1 )
are not unitarily equivalent.
22. Let V be a real inner product space.
(a) Prove that any translation on V is a rigid motion.
(b) Prove that the composite of any two rigid motions on V is a rigid
motion on V.
23. Prove the following variation of Theorem 6.22: If f: V → V is a rigid
motion on a finite-dimensional real inner product space V, then there
exists a unique orthogonal operator T on V and a unique translation g
on V such that f = T o g.
24. Let T and U be orthogonal operators on R2. Use Theorem 6.23 to prove
the following results.
(a) If T and U are both reflections about lines through the origin, then
UT is a rotation.
(b) If T is a rotation and U is a reflection about a line through the
origin, then both UT and TU are reflections about lines through
the origin.

25. Suppose that T and U are reflections of R² about the respective lines L and L′ through the origin and that φ and ψ are the angles from the positive x-axis to L and L′, respectively. By Exercise 24, UT is a rotation. Find its angle of rotation.
26. Suppose that T and U are orthogonal operators on R² such that T is the rotation by the angle φ and U is the reflection about the line L through the origin. Let ψ be the angle from the positive x-axis to L. By Exercise 24, both UT and TU are reflections about lines L1 and L2, respectively, through the origin.
(a) Find the angle θ from the positive x-axis to L1.
(b) Find the angle θ from the positive x-axis to L2.
27. Find new coordinates x′, y′ so that the following quadratic forms can be written as λ1(x′)² + λ2(y′)².
(a) x² + 4xy + y²
(b) 2x² + 2xy + 2y²
(c) x² − 12xy − 4y²
(d) 3x² + 2xy + 3y²
(e) x² − 2xy + y²
28. Consider the expression XᵗAX, where Xᵗ = (x, y, z) and A is as defined in Exercise 2(e). Find a change of coordinates x′, y′, z′ so that the preceding expression is of the form λ1(x′)² + λ2(y′)² + λ3(z′)².
29. QR-Factorization. Let w1, w2, ..., wn be linearly independent vectors in Fⁿ, and let v1, v2, ..., vn be the orthogonal vectors obtained from w1, w2, ..., wn by the Gram-Schmidt process. Let u1, u2, ..., un be the orthonormal basis obtained by normalizing the vi's.
(a) Solving (1) in Section 6.2 for wk in terms of uk, show that
wk = ||vk|| uk + Σ (from j = 1 to k−1) (wk, uj) uj     (1 ≤ k ≤ n).
(b) Let A and Q denote the n × n matrices in which the kth columns are wk and uk, respectively. Define R ∈ Mn×n(F) by
Rjk = ||vj|| if j = k,    Rjk = (wk, uj) if j < k,    Rjk = 0 if j > k.
Prove A = QR.
(c) Compute Q and R as in (b) for the 3 × 3 matrix whose columns are the vectors (1, 1, 0), (2, 0, 1), and (2, 2, 1).

(d) Since Q is unitary [orthogonal] and R is upper triangular in (b), we have shown that every invertible matrix is the product of a unitary [orthogonal] matrix and an upper triangular matrix. Suppose that A ∈ Mn×n(F) is invertible and A = Q1R1 = Q2R2, where Q1, Q2 ∈ Mn×n(F) are unitary and R1, R2 ∈ Mn×n(F) are upper triangular. Prove that D = R2R1⁻¹ is a unitary diagonal matrix. Hint: Use Exercise 17.
(e) The QR factorization described in (b) provides an orthogonalization method for solving a linear system Ax = b when A is invertible. Decompose A to QR, by the Gram-Schmidt process or other means, where Q is unitary and R is upper triangular. Then QRx = b, and hence Rx = Q*b. This last system can be easily solved since R is upper triangular.
Use the orthogonalization method and (c) to solve the system
x1 + 2x2 + 2x3 = 1
x1       + 2x3 = 11
     x2  +  x3 = −1.
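For comparison, the computation asked for in (c) and (e) can be checked numerically. This is an added sketch (assuming NumPy, and using the system as reconstructed above); np.linalg.qr plays the role of the Gram-Schmidt construction in (b):

import numpy as np

A = np.array([[1.0, 2.0, 2.0],
              [1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])          # columns are the vectors from part (c)
b = np.array([1.0, 11.0, -1.0])

Q, R = np.linalg.qr(A)
x = np.linalg.solve(R, Q.T @ b)          # Rx = Q^t b, back substitution on a triangular system
print(np.round(x, 10))                   # [ 3. -5.  4.]
print(np.allclose(A @ x, b))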
30. Suppose that β and γ are ordered bases for an n-dimensional real [complex] inner product space V. Prove that if Q is an orthogonal [unitary] n × n matrix that changes γ-coordinates into β-coordinates, then β is orthonormal if and only if γ is orthonormal.
The following definition is used in Exercises 31 and 32.
Definition. Let V be a finite-dimensional complex [real] inner product space, and let u be a unit vector in V. Define the Householder operator Hu: V → V by Hu(x) = x − 2(x, u)u for all x ∈ V.
31. Let Hu be a Householder operator on a finite-dimensional inner product space V. Prove the following results.
(a) Hu is linear.
(b) Hu(x) = x if and only if x is orthogonal to u.
(c) Hu(u) = −u.
(d) Hu* = Hu and Hu² = I, and hence Hu is a unitary [orthogonal] operator on V.
(Note: If V is a real inner product space, then in the language of Section 6.11, Hu is a reflection.)
At one time, because of its great stability, this method for solving large systems of linear equations with a computer was being advocated as a better method than Gaussian elimination even though it requires about three times as much work. (Later, however, J. H. Wilkinson showed that if Gaussian elimination is done "properly," then it is nearly as stable as the orthogonalization method.)

32. Let V be a finite-dimensional inner product space over F. Let x and y be linearly independent vectors in V such that ||x|| = ||y||.
(a) If F = C, prove that there exists a unit vector u in V and a complex number θ with |θ| = 1 such that Hu(x) = θy. Hint: Choose θ so that (x, θy) is real, and set u = (1/||x − θy||)(x − θy).
(b) If F = R, prove that there exists a unit vector u in V such that Hu(x) = y.
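In matrix form, Hu is I − 2uuᵗ (real case), which makes the properties in Exercises 31 and 32 easy to check numerically. A small added sketch, assuming NumPy; x and y are sample vectors of equal norm:

import numpy as np

def householder(u):
    # Matrix of H_u(x) = x - 2<x, u>u for a unit vector u (real case).
    u = u / np.linalg.norm(u)
    return np.eye(len(u)) - 2.0 * np.outer(u, u)

x = np.array([3.0, 4.0, 0.0])
y = np.array([0.0, 0.0, 5.0])            # same norm as x, as Exercise 32(b) requires
u = (x - y) / np.linalg.norm(x - y)      # the unit vector suggested by the hint
H = householder(u)

print(np.allclose(H @ x, y))                                   # H_u maps x to y
print(np.allclose(H, H.T), np.allclose(H @ H, np.eye(3)))      # H_u* = H_u and H_u^2 = I
print(np.allclose(H @ u, -u))                                  # H_u(u) = -u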
6.6 ORTHOGONAL PROJECTIONS
AND THE SPECTRAL THEOREM
In this section, we rely heavily on Theorems 6.16 (p. 372) and 6.17 (p. 374) to develop an elegant representation of a normal (if F = C) or a self-adjoint (if F = R) operator T on a finite-dimensional inner product space. We prove that T can be written in the form λ1T1 + λ2T2 + ··· + λkTk, where λ1, λ2, ..., λk are the distinct eigenvalues of T and T1, T2, ..., Tk are orthogonal projections. We must first develop some results about these special projections.
We assume that the reader is familiar with the results about direct sums
developed at the end of Section 5.2. The special case where V is a direct sum
of two subspaces is considered in the exercises of Section 1.3.
Recall from the exercises of Section 2.1 that if V = W1 ⊕ W2, then a linear operator T on V is the projection on W1 along W2 if, whenever x = x1 + x2 with x1 ∈ W1 and x2 ∈ W2, we have T(x) = x1. By Exercise 20 of Section 2.1, we have
R(T) = W1 = {x ∈ V: T(x) = x}    and    N(T) = W2.
So V = R(T) ⊕ N(T). Thus there is no ambiguity if we refer to T as a "projection on W1" or simply as a "projection." In fact, it can be shown (see Exercise 17 of Section 2.3) that T is a projection if and only if T = T². Because V = W1 ⊕ W2 = W1 ⊕ W3 does not imply that W2 = W3, we see that W1 does not uniquely determine T. For an orthogonal projection T, however, T is uniquely determined by its range.
Definition. Let V be an inner product space, and let T: V → V be a projection. We say that T is an orthogonal projection if R(T)⊥ = N(T) and N(T)⊥ = R(T).
Note that by Exercise 13(c) of Section 6.2, if V is finite-dimensional, we need only assume that one of the preceding conditions holds. For example, if R(T)⊥ = N(T), then R(T) = R(T)⊥⊥ = N(T)⊥.
Now assume that W is a. finite-dimensional subspace of an inner product
space V. In the notation of Theorem 6.6 (p. 350), we can define a function

T: V → V by T(y) = u. It is easy to show that T is an orthogonal projection on W. We can say even more: there exists exactly one orthogonal projection on W. For if T and U are orthogonal projections on W, then R(T) = W = R(U). Hence N(T) = R(T)⊥ = R(U)⊥ = N(U), and since every projection is uniquely determined by its range and null space, we have T = U. We call T the orthogonal projection of V on W.
To understand the geometric difference between an arbitrary projection on W and the orthogonal projection on W, let V = R² and W = span({(1, 1)}). Define U and T as in Figure 6.5, where T(v) is the foot of a perpendicular from v on the line y = x and U(a1, a2) = (a1, a1). Then T is the orthogonal projection of V on W, and U is a different projection on W. Note that v − T(v) ∈ W⊥, whereas v − U(v) ∉ W⊥.
Figure 6.5
From Figure 6.5, we see that T(v) is the "best approximation in W to v"; that is, if w ∈ W, then ||w − v|| ≥ ||T(v) − v||. In fact, this approximation property characterizes T. These results follow immediately from the corollary to Theorem 6.6 (p. 350).
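The difference between T and U in this example is easy to see numerically. The following added sketch (assuming NumPy) projects a sample vector v both ways:

import numpy as np

v = np.array([2.0, 0.0])                       # a sample vector
w = np.array([1.0, 1.0]) / np.sqrt(2)          # orthonormal basis for W = span{(1, 1)}

T_v = (v @ w) * w                              # orthogonal projection of v on W
U_v = np.array([v[0], v[0]])                   # the projection U(a1, a2) = (a1, a1)

print(np.isclose((v - T_v) @ w, 0.0))          # v - T(v) lies in W-perp
print((v - U_v) @ w)                           # nonzero: v - U(v) is not in W-perp
print(np.linalg.norm(v - T_v) <= np.linalg.norm(v - U_v))   # T(v) is the best approximation in W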
As an application to Fourier analysis, recall the inner product space H and
the orthonormal set S in Example 9 of Section 6.1. Define a trigonometric
polynomial of degree n to be a function g ∈ H of the form
g(t) = Σ (from j = −n to n) aj fj(t) = Σ (from j = −n to n) aj e^{ijt},
where an or a−n is nonzero.
Let f ∈ H. We show that the best approximation to f by a trigonometric polynomial of degree less than or equal to n is the trigonometric polynomial whose coefficients are the Fourier coefficients of f relative to the orthonormal set S. For this result, let W = span({fj : |j| ≤ n}), and let T be the orthogonal projection of H on W. The corollary to Theorem 6.6 (p. 350) tells us that the best approximation to f by a function in W is
T(f) = Σ (over |j| ≤ n) (f, fj) fj.
An algebraic characterization of orthogonal projections follows in the next
theorem.
Theorem 6.24. Let V be an inner product space, and let T be a linear operator on V. Then T is an orthogonal projection if and only if T has an adjoint T* and T² = T = T*.
Proof. Suppose that T is an orthogonal projection. Since T² = T because T is a projection, we need only show that T* exists and T = T*. Now V = R(T) ⊕ N(T) and R(T)⊥ = N(T). Let x, y ∈ V. Then we can write x = x1 + x2 and y = y1 + y2, where x1, y1 ∈ R(T) and x2, y2 ∈ N(T). Hence
(x, T(y)) = (x1 + x2, y1) = (x1, y1) + (x2, y1) = (x1, y1)
and
(T(x), y) = (x1, y1 + y2) = (x1, y1) + (x1, y2) = (x1, y1).
So (x, T(y)) = (T(x), y) for all x, y ∈ V; thus T* exists and T = T*.
Now suppose that T² = T = T*. Then T is a projection by Exercise 17 of Section 2.3, and hence we must show that R(T) = N(T)⊥ and R(T)⊥ = N(T). Let x ∈ R(T) and y ∈ N(T). Then x = T(x) = T*(x), and so
(x, y) = (T*(x), y) = (x, T(y)) = (x, 0) = 0.
Therefore x ∈ N(T)⊥, from which it follows that R(T) ⊆ N(T)⊥.
Let y ∈ N(T)⊥. We must show that y ∈ R(T), that is, T(y) = y. Now
||y − T(y)||² = (y − T(y), y − T(y)) = (y, y − T(y)) − (T(y), y − T(y)).
Since y − T(y) ∈ N(T), the first term must equal zero. But also
(T(y), y − T(y)) = (y, T*(y − T(y))) = (y, T(y − T(y))) = (y, 0) = 0.
Thus y − T(y) = 0; that is, y = T(y) ∈ R(T). Hence R(T) = N(T)⊥.
Using the preceding results, we have R(T)⊥ = N(T)⊥⊥ ⊇ N(T) by Exercise 13(b) of Section 6.2. Now suppose that x ∈ R(T)⊥. For any y ∈ V, we have (T(x), y) = (x, T*(y)) = (x, T(y)) = 0. So T(x) = 0, and thus x ∈ N(T). Hence R(T)⊥ = N(T).

Let V be a finite-dimensional inner product space, W be a subspace of V, and T be the orthogonal projection of V on W. We may choose an orthonormal basis β = {v1, v2, ..., vn} for V such that {v1, v2, ..., vk} is a basis for W. Then [T]β is a diagonal matrix with ones as the first k diagonal entries and zeros elsewhere. In fact, [T]β has the form
( I_k  O1 )
( O2   O3 ).
If U is any projection on W, we may choose a basis γ for V such that [U]γ has the form above; however γ is not necessarily orthonormal.
We are now ready for the principal theorem of this section.
Theorem 6.25 (The Spectral Theorem). Suppose that T is a linear operator on a finite-dimensional inner product space V over F with the distinct eigenvalues λ1, λ2, ..., λk. Assume that T is normal if F = C and that T is self-adjoint if F = R. For each i (1 ≤ i ≤ k), let Wi be the eigenspace of T corresponding to the eigenvalue λi, and let Ti be the orthogonal projection of V on Wi. Then the following statements are true.
(a) V = W1 ⊕ W2 ⊕ ··· ⊕ Wk.
(b) If Wi′ denotes the direct sum of the subspaces Wj for j ≠ i, then Wi′ = Wi⊥.
(c) TiTj = δijTi for 1 ≤ i, j ≤ k.
(d) I = T1 + T2 + ··· + Tk.
(e) T = λ1T1 + λ2T2 + ··· + λkTk.
Proof. (a) By Theorems 6.16 (p. 372) and 6.17 (p. 374), T is diagonalizable; so
V = W1 ⊕ W2 ⊕ ··· ⊕ Wk
by Theorem 5.11 (p. 278).
(b) If x ∈ Wi and y ∈ Wj for some i ≠ j, then (x, y) = 0 by Theorem 6.15(d) (p. 371). It follows easily from this result that Wi′ ⊆ Wi⊥. From (a), we have
dim(Wi′) = Σ (over j ≠ i) dim(Wj) = dim(V) − dim(Wi).
On the other hand, we have dim(Wi⊥) = dim(V) − dim(Wi) by Theorem 6.7(c) (p. 352). Hence Wi′ = Wi⊥, proving (b).
(c) The proof of (c) is left as an exercise.
(d) Since Ti is the orthogonal projection of V on Wi, it follows from (b) that N(Ti) = R(Ti)⊥ = Wi⊥ = Wi′. Hence, for x ∈ V, we have x = x1 + x2 + ··· + xk, where Ti(x) = xi ∈ Wi, proving (d).

(e) For x ∈ V, write x = x1 + x2 + ··· + xk, where xi ∈ Wi. Then
T(x) = T(x1) + T(x2) + ··· + T(xk)
     = λ1x1 + λ2x2 + ··· + λkxk
     = λ1T1(x) + λ2T2(x) + ··· + λkTk(x)
     = (λ1T1 + λ2T2 + ··· + λkTk)(x).
The set {λ1, λ2, ..., λk} of eigenvalues of T is called the spectrum of T, the sum I = T1 + T2 + ··· + Tk in (d) is called the resolution of the identity operator induced by T, and the sum T = λ1T1 + λ2T2 + ··· + λkTk in (e) is called the spectral decomposition of T. The spectral decomposition of
T is unique up to the order of its eigenvalues.
With the preceding notation, let β be the union of orthonormal bases of the Wi's and let mi = dim(Wi). (Thus mi is the multiplicity of λi.) Then [T]β has the form
( λ1·I_{m1}     O          ···      O      )
(    O       λ2·I_{m2}     ···      O      )
(    ⋮           ⋮          ⋱       ⋮      )
(    O           O          ···  λk·I_{mk} );
that is, [T]β is a diagonal matrix in which the diagonal entries are the eigenvalues λi of T, and each λi is repeated mi times. If λ1T1 + λ2T2 + ··· + λkTk is the spectral decomposition of T, then it follows (from Exercise 7) that g(T) = g(λ1)T1 + g(λ2)T2 + ··· + g(λk)Tk for any polynomial g. This fact is used below.
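The spectral decomposition is easy to compute explicitly for a matrix. The following added sketch (assuming NumPy; the symmetric matrix is a sample with spectrum {1, 4}) builds the orthogonal projections Ti and verifies (c), (d), and (e):

import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])
w, Q = np.linalg.eigh(A)                         # orthonormal eigenvectors in the columns of Q

spectrum = sorted(set(np.round(w, 8)))           # distinct eigenvalues: [1.0, 4.0]
projections = []
for lam in spectrum:
    cols = Q[:, np.isclose(w, lam)]              # orthonormal basis of the eigenspace W_i
    projections.append(cols @ cols.T)            # orthogonal projection T_i onto W_i

print(np.allclose(sum(projections), np.eye(3)))                             # resolution of the identity
print(np.allclose(sum(l * P for l, P in zip(spectrum, projections)), A))    # spectral decomposition
print(np.allclose(projections[0] @ projections[1], np.zeros((3, 3))))       # T_i T_j = 0 for i != j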
We now list several interesting corollaries of the spectral theorem; many
more results are found in the exercises. For what follows, we assume that T
is a linear operator on a finite-dimensional inner product space V over F.
Corollary 1. If F = C, then T is normal if and only if T* = g(T) for some polynomial g.
Proof. Suppose first that T is normal. Let T = λ1T1 + λ2T2 + ··· + λkTk be the spectral decomposition of T. Taking the adjoint of both sides of the preceding equation, we have T* = λ̄1T1 + λ̄2T2 + ··· + λ̄kTk since each Ti is self-adjoint. Using the Lagrange interpolation formula (see page 52), we may choose a polynomial g such that g(λi) = λ̄i for 1 ≤ i ≤ k. Then
g(T) = g(λ1)T1 + g(λ2)T2 + ··· + g(λk)Tk = λ̄1T1 + λ̄2T2 + ··· + λ̄kTk = T*.
Conversely, if T* = g(T) for some polynomial g, then T commutes with T* since T commutes with every polynomial in T. So T is normal.
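The interpolation argument can be mimicked numerically. The added sketch below (assuming NumPy; the skew-symmetric matrix is a sample normal operator with non-real eigenvalues) evaluates the Lagrange interpolating polynomial directly at A and recovers A*:

import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]], dtype=complex)       # normal, with eigenvalues i and -i
lams = np.linalg.eigvals(A)
I = np.eye(2, dtype=complex)

# g(A) = sum_i conj(lam_i) * prod_{j != i} (A - lam_j I) / (lam_i - lam_j)
g_of_A = np.zeros_like(A)
for i, li in enumerate(lams):
    term = li.conjugate() * I
    for j, lj in enumerate(lams):
        if j != i:
            term = term @ (A - lj * I) / (li - lj)
    g_of_A = g_of_A + term

print(np.allclose(g_of_A, A.conj().T))           # g(A) = A*, so A* is a polynomial in A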

Corollary 2. If F = C, then T is unitary if and only if T is normal and
|A| = 1 for every eigenvalue A ofT.
Proof. If T is unitary, then T is normal and every eigenvalue of T has
absolute value 1 by Corollary 2 to Theorem 6.18 (p. 382).
Let T = λ1T1 + λ2T2 + ··· + λkTk be the spectral decomposition of T. If |λ| = 1 for every eigenvalue λ of T, then by (c) of the spectral theorem,
TT* = (λ1T1 + λ2T2 + ··· + λkTk)(λ̄1T1 + λ̄2T2 + ··· + λ̄kTk)
    = |λ1|²T1 + |λ2|²T2 + ··· + |λk|²Tk
    = T1 + T2 + ··· + Tk
    = I.
Hence T is unitary.
Corollary 3. If F = C and T is normal, then T is self-adjoint if and
only if every eigenvalue of T is real.
Proof. Let T = λ1T1 + λ2T2 + ··· + λkTk be the spectral decomposition of T. Suppose that every eigenvalue of T is real. Then
T* = λ̄1T1 + λ̄2T2 + ··· + λ̄kTk = λ1T1 + λ2T2 + ··· + λkTk = T.
The converse has been proved in the lemma to Theorem 6.17 (p. 374).
Corollary 4. Let T be as in the spectral theorem with spectral decomposition T = λ1T1 + λ2T2 + ··· + λkTk. Then each Tj is a polynomial in T.
Proof. Choose a polynomial gj (1 ≤ j ≤ k) such that gj(λi) = δij. Then
gj(T) = gj(λ1)T1 + gj(λ2)T2 + ··· + gj(λk)Tk = δ1jT1 + δ2jT2 + ··· + δkjTk = Tj.
EXERCISES
1. Label the following statements as true or false. Assume that the under­
lying inner product spaces are finite-dimensional.
(a) All projections are self-adjoint.
(b) An orthogonal projection is uniquely determined by its range.
(c) Every self-adjoint operator is a linear combination of orthogonal
projections.

(d) If T is a projection on W, then T(x) is the vector in W that is
closest to x.
(e) Every orthogonal projection is a unitary operator.
2. Let V = R², W = span({(1, 2)}), and β be the standard ordered basis for V. Compute [T]β, where T is the orthogonal projection of V on W. Do the same for V = R³ and W = span({(1, 0, 1)}).
3. For each of the matrices A in Exercise 2 of Section 6.5:
(1) Verify that LA possesses a spectral decomposition.
(2) For each eigenvalue of LA, explicitly define the orthogonal projec­
tion on the corresponding eigenspace.
(3) Verify your results using the spectral theorem.
4. Let W be a finite-dimensional subspace of an inner product space V.
Show that if T is the orthogonal projection of V on W, then I — T is the
orthogonal projection of V on W⊥.
5. Let T be a linear operator on a finite-dimensional inner product space
V.
(a) If T is an orthogonal projection, prove that ||T(x)|| ≤ ||x|| for all x ∈ V. Give an example of a projection for which this inequality does not hold. What can be concluded about a projection for which the inequality is actually an equality for all x ∈ V?
(b) Suppose that T is a projection such that ||T(x)|| ≤ ||x|| for x ∈ V.
Prove that T is an orthogonal projection.
6. Let T be a normal operator on a finite-dimensional inner product space.
Prove that if T is a projection, then T is also an orthogonal projection.
7. Let T be a normal operator on a finite-dimensional complex inner product
space V. Use the spectral decomposition $\lambda_1 T_1 + \lambda_2 T_2 + \cdots + \lambda_k T_k$
of T to prove the following results.
(a) If g is a polynomial, then
$$g(T) = \sum_{i=1}^{k} g(\lambda_i) T_i.$$
(b) If $T^n = T_0$ for some n, then $T = T_0$.
(c) Let U be a linear operator on V. Then U commutes with T if and
only if U commutes with each $T_i$.
(d) There exists a normal operator U on V such that $U^2 = T$.
(e) T is invertible if and only if $\lambda_i \ne 0$ for $1 \le i \le k$.
(f) T is a projection if and only if every eigenvalue of T is 1 or 0.
(g) $T = -T^*$ if and only if every $\lambda_i$ is an imaginary number.
8. Use Corollary 1 of the spectral theorem to show that if T is a normal
operator on a complex finite-dimensional inner product space and U is
a linear operator that commutes with T, then U commutes with T*.
9. Referring to Exercise 20 of Section 6.5, prove the following facts about
a partial isometry U.
(a) U*U is an orthogonal projection on W.
(b) UU*U = U.
10. Simultaneous diagonalization. Let U and T be normal operators on a
finite-dimensional complex inner product space V such that TU = UT.
Prove that there exists an orthonormal basis for V consisting of vectors
that are eigenvectors of both T and U. Hint: Use the hint of Exercise 14
of Section 6.4 along with Exercise 8.
11. Prove (c) of the spectral theorem.
6.7* THE SINGULAR VALUE DECOMPOSITION
AND THE PSEUDOINVERSE
In Section 6.4, we characterized normal operators on complex spaces and self-
adjoint operators on real spaces in terms of orthonormal bases of eigenvectors
and their corresponding eigenvalues (Theorems 6.16, p. 372, and 6.17, p. 374).
In this section, we establish a comparable theorem whose scope is the entire
class of linear transformations on both complex and real finite-dimensional
inner product spaces—the singular value theorem for linear transformations
(Theorem 6.26). There are similarities and differences among these theorems.
All rely on the use of orthonormal bases and numerical invariants. However,
because of its general scope, the singular value theorem is concerned with
two (usually distinct) inner product spaces and with two (usually distinct)
orthonormal bases. If the two spaces and the two bases are identical, then the
transformation would, in fact, be a normal or self-adjoint operator. Another
difference is that the numerical invariants in the singular value theorem, the
singular values, are nonnegative, in contrast to their counterparts, the eigen­
values, for which there is no such restriction. This property is necessary to
guarantee the uniqueness of singular values.
The singular value theorem encompasses both real and complex spaces.
For brevity, in this section we use the terms unitary operator and unitary
matrix to include orthogonal operators and orthogonal matrices in the context
of real spaces. Thus any operator T for which $\langle T(x), T(y)\rangle = \langle x, y\rangle$, or any
matrix A for which $\langle Ax, Ay\rangle = \langle x, y\rangle$, for all x and y is called unitary for the
purposes of this section.

In Exercise 15 of Section 6.3, the definition of the adjoint of an operator
is extended to any linear transformation T: V → W, where V and W are
finite-dimensional inner product spaces. By this exercise, the adjoint $T^*$ of
T is a linear transformation from W to V and $[T^*]_\gamma^\beta = ([T]_\beta^\gamma)^*$, where $\beta$ and
$\gamma$ are orthonormal bases for V and W, respectively. Furthermore, the linear
operator $T^*T$ on V is positive semidefinite and $\operatorname{rank}(T^*T) = \operatorname{rank}(T)$ by
Exercise 18 of Section 6.4.
With these facts in mind, we begin with the principal result.
Theorem 6.26 (Singular Value Theorem for Linear Transforma­
tions). Let V and W be finite-dimensional inner product spaces, and let
T: V → W be a linear transformation of rank r. Then there exist orthonormal
bases $\{v_1, v_2, \ldots, v_n\}$ for V and $\{u_1, u_2, \ldots, u_m\}$ for W and positive scalars
$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$ such that
$$T(v_i) = \begin{cases} \sigma_i u_i & \text{if } 1 \le i \le r \\ 0 & \text{if } i > r. \end{cases} \qquad (4)$$
Conversely, suppose that the preceding conditions are satisfied. Then for
$1 \le i \le n$, $v_i$ is an eigenvector of $T^*T$ with corresponding eigenvalue $\sigma_i^2$ if
$1 \le i \le r$ and 0 if $i > r$. Therefore the scalars $\sigma_1, \sigma_2, \ldots, \sigma_r$ are uniquely
determined by T.
Proof. We first establish the existence of the bases and scalars. By Exercises
18 of Section 6.4 and 15(d) of Section 6.3, $T^*T$ is a positive semidefinite
linear operator of rank r on V; hence there is an orthonormal basis
$\{v_1, v_2, \ldots, v_n\}$ for V consisting of eigenvectors of $T^*T$ with corresponding
eigenvalues $\lambda_i$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r > 0$, and $\lambda_i = 0$ for $i > r$. For
$1 \le i \le r$, define $\sigma_i = \sqrt{\lambda_i}$ and $u_i = \frac{1}{\sigma_i}T(v_i)$. We show that $\{u_1, u_2, \ldots, u_r\}$
is an orthonormal subset of W. Suppose $1 \le i, j \le r$. Then
$$\langle u_i, u_j\rangle = \left\langle \tfrac{1}{\sigma_i}T(v_i), \tfrac{1}{\sigma_j}T(v_j)\right\rangle
= \tfrac{1}{\sigma_i\sigma_j}\langle T^*T(v_i), v_j\rangle
= \tfrac{\lambda_i}{\sigma_i\sigma_j}\langle v_i, v_j\rangle
= \delta_{ij},$$

and hence $\{u_1, u_2, \ldots, u_r\}$ is orthonormal. By Theorem 6.7(a) (p. 352), this
set extends to an orthonormal basis $\{u_1, u_2, \ldots, u_r, \ldots, u_m\}$ for W. Clearly
$T(v_i) = \sigma_i u_i$ if $1 \le i \le r$. If $i > r$, then $T^*T(v_i) = 0$, and so $T(v_i) = 0$ by
Exercise 15(d) of Section 6.3.
To establish uniqueness, suppose that $\{v_1, v_2, \ldots, v_n\}$, $\{u_1, u_2, \ldots, u_m\}$,
and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ satisfy the properties stated in the first part of
the theorem. Then for $1 \le i \le m$ and $1 \le j \le n$,
$$\langle T^*(u_i), v_j\rangle = \langle u_i, T(v_j)\rangle = \begin{cases} \sigma_i & \text{if } i = j \le r \\ 0 & \text{otherwise,} \end{cases}$$
and hence for any $1 \le i \le m$,
$$T^*(u_i) = \begin{cases} \sigma_i v_i & \text{if } i \le r \\ 0 & \text{if } i > r. \end{cases} \qquad (5)$$
So for $i \le r$,
$$T^*T(v_i) = T^*(\sigma_i u_i) = \sigma_i T^*(u_i) = \sigma_i^2 v_i,$$
and $T^*T(v_i) = T^*(0) = 0$ for $i > r$. Therefore each $v_i$ is an eigenvector of
$T^*T$ with corresponding eigenvalue $\sigma_i^2$ if $i \le r$ and 0 if $i > r$.
Definition. The unique scalars $\sigma_1, \sigma_2, \ldots, \sigma_r$ in Theorem 6.26 are called
the singular values of T. If r is less than both m and n, then the term
singular value is extended to include $\sigma_{r+1} = \cdots = \sigma_k = 0$, where k is the
minimum of m and n.
Although the singular values of a linear transformation T are uniquely de­
termined by T, the orthonormal bases given in the statement of Theorem 6.26
are not uniquely determined because there is more than one orthonormal basis
of eigenvectors of T*T.
In view of (5), the singular values of a linear transformation T: V —* W
and its adjoint T* are identical. Furthermore, the orthonormal bases for V
and W given in Theorem 6.26 are simply reversed for T*.
Example 1
Let $P_2(R)$ and $P_1(R)$ be the polynomial spaces with inner products defined
by
$$\langle f(x), g(x)\rangle = \int_{-1}^{1} f(t)g(t)\,dt.$$
Let $T\colon P_2(R) \to P_1(R)$ be the linear transformation defined by $T(f(x)) =
f'(x)$. Find orthonormal bases $\beta = \{v_1, v_2, v_3\}$ for $P_2(R)$ and $\gamma = \{u_1, u_2\}$ for
$P_1(R)$ such that $T(v_i) = \sigma_i u_i$ for $i = 1, 2$ and $T(v_3) = 0$, where $\sigma_1 \ge \sigma_2 > 0$
are the nonzero singular values of T.
To facilitate the computations, we translate this problem into the corresponding
problem for a matrix representation of T. Caution is advised here
because not any matrix representation will do. Since the adjoint is defined
in terms of inner products, we must use a matrix representation constructed
from orthonormal bases for $P_2(R)$ and $P_1(R)$ to guarantee that the adjoint
of the matrix representation of T is the same as the matrix representation of
the adjoint of T. (See Exercise 15 of Section 6.3.) For this purpose, we use
the results of Exercise 21(a) of Section 6.2 to obtain orthonormal bases
$$\left\{\tfrac{1}{\sqrt{2}},\ \sqrt{\tfrac{3}{2}}\,x,\ \sqrt{\tfrac{5}{8}}(3x^2 - 1)\right\} \quad\text{and}\quad \left\{\tfrac{1}{\sqrt{2}},\ \sqrt{\tfrac{3}{2}}\,x\right\}$$
for $P_2(R)$ and $P_1(R)$, respectively.
Let A denote the matrix representation of T with respect to these bases:
$$A = \begin{pmatrix} 0 & \sqrt{3} & 0 \\ 0 & 0 & \sqrt{15} \end{pmatrix}.$$
Then
$$A^*A = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 15 \end{pmatrix},$$
which has eigenvalues (listed in descending order of size) $\lambda_1 = 15$, $\lambda_2 = 3$,
and $\lambda_3 = 0$. These eigenvalues correspond, respectively, to the orthonormal
eigenvectors $e_3 = (0,0,1)$, $e_2 = (0,1,0)$, and $e_1 = (1,0,0)$ in $R^3$. Translating
everything into the context of T, $P_2(R)$, and $P_1(R)$, let
$$v_1 = \sqrt{\tfrac{5}{8}}(3x^2 - 1), \quad v_2 = \sqrt{\tfrac{3}{2}}\,x, \quad\text{and}\quad v_3 = \tfrac{1}{\sqrt{2}}.$$
Then $\beta = \{v_1, v_2, v_3\}$ is an orthonormal basis for $P_2(R)$ consisting of eigenvectors
of $T^*T$ with corresponding eigenvalues $\lambda_1$, $\lambda_2$, and $\lambda_3$. Now set
$\sigma_1 = \sqrt{\lambda_1} = \sqrt{15}$ and $\sigma_2 = \sqrt{\lambda_2} = \sqrt{3}$, the nonzero singular values of T,
and take
$$u_1 = \frac{1}{\sigma_1}T(v_1) = \sqrt{\tfrac{3}{2}}\,x \quad\text{and}\quad u_2 = \frac{1}{\sigma_2}T(v_2) = \tfrac{1}{\sqrt{2}},$$
to obtain the required basis $\gamma = \{u_1, u_2\}$ for $P_1(R)$.
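The singular values found above can be confirmed numerically; the short sketch below assumes the NumPy library and uses the matrix representation of T computed in the example.

```python
# A quick numerical check of Example 1 (assumes NumPy): the singular values of the
# matrix of T with respect to the orthonormal bases are sqrt(15) and sqrt(3).
import numpy as np

A = np.array([[0.0, np.sqrt(3.0), 0.0],
              [0.0, 0.0, np.sqrt(15.0)]])
print(np.linalg.svd(A, compute_uv=False))    # approximately [3.873, 1.732]
```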

We can use singular values to describe how a figure is distorted by a linear
transformation. This is illustrated in the next example.
Example 2
Let T be an invertible linear operator on $R^2$ and $S = \{x \in R^2 : \|x\| = 1\}$, the
unit circle in $R^2$. We apply Theorem 6.26 to describe $S' = T(S)$.
Since T is invertible, it has rank equal to 2 and hence has singular values
$\sigma_1 \ge \sigma_2 > 0$. Let $\{v_1, v_2\}$ and $\beta = \{u_1, u_2\}$ be orthonormal bases for $R^2$ so
that $T(v_1) = \sigma_1 u_1$ and $T(v_2) = \sigma_2 u_2$, as in Theorem 6.26. Then $\beta$ determines
a coordinate system, which we shall call the $x'y'$-coordinate system for $R^2$,
where the $x'$-axis contains $u_1$ and the $y'$-axis contains $u_2$. For any vector
$u \in R^2$, if $u = x_1'u_1 + x_2'u_2$, then $[u]_\beta = \binom{x_1'}{x_2'}$ is the coordinate vector of u
relative to $\beta$. We characterize $S'$ in terms of an equation relating $x_1'$ and $x_2'$.
For any vector $v = x_1v_1 + x_2v_2 \in R^2$, the equation $u = T(v)$ means that
$$u = T(x_1v_1 + x_2v_2) = x_1T(v_1) + x_2T(v_2) = x_1\sigma_1u_1 + x_2\sigma_2u_2.$$
Thus for $u = x_1'u_1 + x_2'u_2$, we have $x_1' = x_1\sigma_1$ and $x_2' = x_2\sigma_2$. Furthermore,
$u \in S'$ if and only if $v \in S$ if and only if
$$\left(\frac{x_1'}{\sigma_1}\right)^2 + \left(\frac{x_2'}{\sigma_2}\right)^2 = x_1^2 + x_2^2 = 1.$$
If $\sigma_1 = \sigma_2$, this is the equation of a circle of radius $\sigma_1$, and if $\sigma_1 > \sigma_2$, this is
the equation of an ellipse with major axis and minor axis oriented along the
$x'$-axis and the $y'$-axis, respectively. (See Figure 6.6.) •
Figure 6.6: T maps the point $v = x_1v_1 + x_2v_2$ of S to the point $u = x_1'u_1 + x_2'u_2$ of $S'$.
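The geometric content of Example 2 is easy to check numerically. The sketch below assumes NumPy; the operator T is an arbitrary invertible $2 \times 2$ matrix chosen only for illustration.

```python
# The image of the unit circle under an invertible 2x2 matrix is an ellipse whose
# semi-axes have lengths sigma_1 and sigma_2 (assumes NumPy; T is illustrative).
import numpy as np

T = np.array([[3.0, 1.0], [1.0, 2.0]])
theta = np.linspace(0.0, 2.0 * np.pi, 2000)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # points of S
image = T @ circle                                    # points of S' = T(S)

sigma = np.linalg.svd(T, compute_uv=False)
radii = np.linalg.norm(image, axis=0)
print(sigma)                                 # sigma_1 >= sigma_2
print(radii.max(), radii.min())              # approximately sigma_1 and sigma_2
```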

The singular value theorem for linear transformations is useful in its matrix
form because we can perform numerical computations on matrices. We
begin with the definition of the singular values of a matrix.
Definition. Let A be an $m \times n$ matrix. We define the singular values
of A to be the singular values of the linear transformation $L_A$.
Theorem 6.27 (Singular Value Decomposition Theorem for Matrices).
Let A be an $m \times n$ matrix of rank r with the positive singular
values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$, and let $\Sigma$ be the $m \times n$ matrix defined by
$$\Sigma_{ij} = \begin{cases} \sigma_i & \text{if } i = j \le r \\ 0 & \text{otherwise.} \end{cases}$$
Then there exists an $m \times m$ unitary matrix U and an $n \times n$ unitary matrix
V such that
$$A = U\Sigma V^*.$$
Proof. Let $T = L_A\colon F^n \to F^m$. By Theorem 6.26, there exist orthonormal
bases $\beta = \{v_1, v_2, \ldots, v_n\}$ for $F^n$ and $\gamma = \{u_1, u_2, \ldots, u_m\}$ for $F^m$ such that
$T(v_i) = \sigma_i u_i$ for $1 \le i \le r$ and $T(v_i) = 0$ for $i > r$. Let U be the $m \times m$
matrix whose jth column is $u_j$ for all j, and let V be the $n \times n$ matrix whose
jth column is $v_j$ for all j. Note that both U and V are unitary matrices.
By Theorem 2.13(a) (p. 90), the jth column of AV is $Av_j = \sigma_j u_j$. Observe
that the jth column of $\Sigma$ is $\sigma_j e_j$, where $e_j$ is the jth standard vector of $F^m$.
So by Theorem 2.13(a) and (b), the jth column of $U\Sigma$ is given by
$$U(\sigma_j e_j) = \sigma_j U(e_j) = \sigma_j u_j.$$
It follows that AV and $U\Sigma$ are $m \times n$ matrices whose corresponding columns
are equal, and hence $AV = U\Sigma$. Therefore $A = AVV^* = U\Sigma V^*$.
Definition. Let A be an $m \times n$ matrix of rank r with positive singular
values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$. A factorization $A = U\Sigma V^*$, where U and V are
unitary matrices and $\Sigma$ is the $m \times n$ matrix defined as in Theorem 6.27, is
called a singular value decomposition of A.
In the proof of Theorem 6.27, the columns of V are the vectors in $\beta$, and
the columns of U are the vectors in $\gamma$. Furthermore, the nonzero singular
values of A are the same as those of $L_A$; hence they are the square roots of
the nonzero eigenvalues of $A^*A$ or of $AA^*$. (See Exercise 9.)

Example 3
We find a singular value decomposition for
$$A = \begin{pmatrix} 1 & 1 & -1 \\ 1 & 1 & -1 \end{pmatrix}.$$
First observe that for
$$v_1 = \frac{1}{\sqrt{3}}\begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}, \quad v_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}, \quad\text{and}\quad v_3 = \frac{1}{\sqrt{6}}\begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix},$$
the set $\beta = \{v_1, v_2, v_3\}$ is an orthonormal basis for $R^3$ consisting of eigenvectors
of $A^*A$ with corresponding eigenvalues $\lambda_1 = 6$, and $\lambda_2 = \lambda_3 = 0$.
Consequently, $\sigma_1 = \sqrt{6}$ is the only nonzero singular value of A. Hence, as in
the proof of Theorem 6.27, we let V be the matrix whose columns are the
vectors in $\beta$. Then
$$\Sigma = \begin{pmatrix} \sqrt{6} & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad\text{and}\quad
V = \begin{pmatrix} \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} \\ \tfrac{1}{\sqrt{3}} & \tfrac{-1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} \\ \tfrac{-1}{\sqrt{3}} & 0 & \tfrac{2}{\sqrt{6}} \end{pmatrix}.$$
Also, as in Theorem 6.27, we take
$$u_1 = \frac{1}{\sigma_1}L_A(v_1) = \frac{1}{\sigma_1}Av_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$
Next choose $u_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix}$, a unit vector orthogonal to $u_1$, to obtain the
orthonormal basis $\gamma = \{u_1, u_2\}$ for $R^2$, and set
$$U = \begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} & \tfrac{-1}{\sqrt{2}} \end{pmatrix}.$$
Then $A = U\Sigma V^*$ is the desired singular value decomposition. •
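Readers with access to numerical software can verify this factorization; a minimal sketch using NumPy's SVD routine follows.

```python
# A numerical check of Example 3 (assumes NumPy): numpy.linalg.svd returns U, the
# singular values, and V*, and A = U Sigma V* up to rounding.
import numpy as np

A = np.array([[1.0, 1.0, -1.0],
              [1.0, 1.0, -1.0]])
U, s, Vh = np.linalg.svd(A)
print(s)                                     # approximately [sqrt(6), 0]
Sigma = np.zeros_like(A)
Sigma[:len(s), :len(s)] = np.diag(s)         # embed the singular values in a 2x3 matrix
assert np.allclose(U @ Sigma @ Vh, A)
```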
The Polar Decomposition of a Square Matrix
A singular value decomposition of a matrix can be used to factor a square
matrix in a manner analogous to the factoring of a complex number as the
product of a complex number of length 1 and a nonnegative number. In the
case of matrices, the complex number of length 1 is replaced by a unitary
matrix, and the nonnegative number is replaced by a positive semidefinite
matrix.
Theorem 6.28 (Polar Decomposition). For any square matrix A,
there exists a unitary matrix W and a positive semidefinite matrix P such
that
A = WP.

Furthermore, if A is invertible, then the representation is unique.
Proof. By Theorem 6.27, there exist unitary matrices U and V and a
diagonal matrix $\Sigma$ with nonnegative diagonal entries such that $A = U\Sigma V^*$.
So
$$A = U\Sigma V^* = UV^*V\Sigma V^* = WP,$$
where $W = UV^*$ and $P = V\Sigma V^*$. Since W is the product of unitary matrices,
W is unitary, and since $\Sigma$ is positive semidefinite and P is unitarily equivalent
to $\Sigma$, P is positive semidefinite by Exercise 14 of Section 6.5.
Now suppose that A is invertible and factors as the products
$$A = WP = ZQ,$$
where W and Z are unitary and P and Q are positive semidefinite. Since A
is invertible, it follows that P and Q are positive definite and invertible, and
therefore $Z^*W = QP^{-1}$. Thus $QP^{-1}$ is unitary, and so
$$I = (QP^{-1})^*(QP^{-1}) = P^{-1}Q^2P^{-1}.$$
Hence $P^2 = Q^2$. Since both P and Q are positive definite, it follows that
$P = Q$ by Exercise 17 of Section 6.4. Therefore $W = Z$, and consequently
the factorization is unique.
The factorization of a square matrix A as WP, where W is unitary and P
is positive semidefinite, is called a polar decomposition of A.
Example 4
To find the polar decomposition of $A = \begin{pmatrix} 11 & -5 \\ -2 & 10 \end{pmatrix}$, we begin by finding a singular
value decomposition $U\Sigma V^*$ of A. The object is to find an orthonormal
basis $\beta$ for $R^2$ consisting of eigenvectors of $A^*A$. It can be shown that
$$v_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix} \quad\text{and}\quad v_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
are orthonormal eigenvectors of $A^*A$ with corresponding eigenvalues $\lambda_1 = 200$
and $\lambda_2 = 50$. So $\beta = \{v_1, v_2\}$ is an appropriate basis. Thus $\sigma_1 = \sqrt{200} = 10\sqrt{2}$
and $\sigma_2 = \sqrt{50} = 5\sqrt{2}$ are the singular values of A. So we have
$$V = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} \quad\text{and}\quad \Sigma = \begin{pmatrix} 10\sqrt{2} & 0 \\ 0 & 5\sqrt{2} \end{pmatrix}.$$
Next, we find the columns $u_1$ and $u_2$ of U:
$$u_1 = \frac{1}{\sigma_1}Av_1 = \frac{1}{5}\begin{pmatrix} 4 \\ -3 \end{pmatrix} \quad\text{and}\quad u_2 = \frac{1}{\sigma_2}Av_2 = \frac{1}{5}\begin{pmatrix} 3 \\ 4 \end{pmatrix}.$$
Thus
$$U = \frac{1}{5}\begin{pmatrix} 4 & 3 \\ -3 & 4 \end{pmatrix}.$$
Therefore, in the notation of Theorem 6.28, we have
$$W = UV^* = \frac{1}{5\sqrt{2}}\begin{pmatrix} 7 & -1 \\ 1 & 7 \end{pmatrix}$$
and
$$P = V\Sigma V^* = \frac{1}{\sqrt{2}}\begin{pmatrix} 15 & -5 \\ -5 & 15 \end{pmatrix}. \quad\bullet$$
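The construction used in the proof of Theorem 6.28 ($W = UV^*$, $P = V\Sigma V^*$) translates directly into a computation. The following sketch, which assumes NumPy, applies it to the matrix of Example 4.

```python
# Computing a polar decomposition A = W P from a singular value decomposition,
# as in the proof of Theorem 6.28 (assumes NumPy; A is the matrix of Example 4).
import numpy as np

A = np.array([[11.0, -5.0],
              [-2.0, 10.0]])
U, s, Vh = np.linalg.svd(A)
W = U @ Vh                                        # unitary (here orthogonal) factor
P = Vh.conj().T @ np.diag(s) @ Vh                 # positive semidefinite factor
assert np.allclose(W @ P, A)
assert np.allclose(W.T @ W, np.eye(2))            # W is orthogonal
assert np.all(np.linalg.eigvalsh(P) >= -1e-12)    # P has nonnegative eigenvalues
```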
The Pseudoinverse
Let V and W be finite-dimensional inner product spaces over the same
field, and let T: V —• W be a linear transformation. It is desirable to have a
linear transformation from W to V that captures some of the essence of an
inverse of T even if T is not invertible. A simple approach to this problem
is to focus on the "part" of T that is invertible, namely, the restriction of
T to $N(T)^{\perp}$. Let $L\colon N(T)^{\perp} \to R(T)$ be the linear transformation defined by
$L(x) = T(x)$ for all $x \in N(T)^{\perp}$. Then L is invertible, and we can use the
inverse of L to construct a linear transformation from W to V that salvages
some of the benefits of an inverse of T.
Definition. Let V and W be finite-dimensional inner product spaces
over the same field, and let $T\colon V \to W$ be a linear transformation. Let
$L\colon N(T)^{\perp} \to R(T)$ be the linear transformation defined by $L(x) = T(x)$ for all
$x \in N(T)^{\perp}$. The pseudoinverse (or Moore-Penrose generalized inverse) of
T, denoted by $T^{\dagger}$, is defined as the unique linear transformation from W to
V such that
$$T^{\dagger}(y) = \begin{cases} L^{-1}(y) & \text{for } y \in R(T) \\ 0 & \text{for } y \in R(T)^{\perp}. \end{cases}$$
The pseudoinverse of a linear transformation T on a finite-dimensional
inner product space exists even if T is not invertible. Furthermore, if T
is invertible, then $T^{\dagger} = T^{-1}$ because $N(T)^{\perp} = V$, and L (as just defined)
coincides with T.
As an extreme example, consider the zero transformation $T_0\colon V \to W$
between two finite-dimensional inner product spaces V and W. Then $R(T_0) =
\{0\}$, and therefore $T_0^{\dagger}$ is the zero transformation from W to V.

We can use the singular value theorem to describe the pseudoinverse of a
linear transformation. Suppose that V and W are finite-dimensional vector
spaces and $T\colon V \to W$ is a linear transformation of rank r. Let $\{v_1, v_2, \ldots, v_n\}$
and $\{u_1, u_2, \ldots, u_m\}$ be orthonormal bases for V and W, respectively, and let
$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$ be the nonzero singular values of T satisfying (4) in Theorem
6.26. Then $\{v_1, v_2, \ldots, v_r\}$ is a basis for $N(T)^{\perp}$, $\{v_{r+1}, v_{r+2}, \ldots, v_n\}$ is a
basis for $N(T)$, $\{u_1, u_2, \ldots, u_r\}$ is a basis for $R(T)$, and $\{u_{r+1}, u_{r+2}, \ldots, u_m\}$ is
a basis for $R(T)^{\perp}$. Let L be the restriction of T to $N(T)^{\perp}$, as in the definition
of pseudoinverse. Then $L^{-1}(u_i) = \frac{1}{\sigma_i}v_i$ for $1 \le i \le r$. Therefore
$$T^{\dagger}(u_i) = \begin{cases} \dfrac{1}{\sigma_i}v_i & \text{if } 1 \le i \le r \\ 0 & \text{if } r < i \le m. \end{cases} \qquad (6)$$
Example 5
Let $T\colon P_2(R) \to P_1(R)$ be the linear transformation defined by $T(f(x)) =
f'(x)$, as in Example 1. Let $\beta = \{v_1, v_2, v_3\}$ and $\gamma = \{u_1, u_2\}$ be the orthonormal
bases for $P_2(R)$ and $P_1(R)$ in Example 1. Then $\sigma_1 = \sqrt{15}$ and
$\sigma_2 = \sqrt{3}$ are the nonzero singular values of T. It follows that
$$T^{\dagger}\!\left(\sqrt{\tfrac{3}{2}}\,x\right) = T^{\dagger}(u_1) = \frac{1}{\sqrt{15}}\sqrt{\tfrac{5}{8}}(3x^2 - 1),$$
and hence
$$T^{\dagger}(x) = \frac{1}{6}(3x^2 - 1).$$
Similarly, $T^{\dagger}(1) = x$. Thus, for any polynomial $a + bx \in P_1(R)$,
$$T^{\dagger}(a + bx) = a\,T^{\dagger}(1) + b\,T^{\dagger}(x) = ax + \frac{b}{6}(3x^2 - 1). \quad\bullet$$
The Pseudoinverse of a Matrix
Let A be an $m \times n$ matrix. Then there exists a unique $n \times m$ matrix B
such that $(L_A)^{\dagger}\colon F^m \to F^n$ is equal to the left-multiplication transformation
$L_B$. We call B the pseudoinverse of A and denote it by $B = A^{\dagger}$. Thus
$$(L_A)^{\dagger} = L_{A^{\dagger}}.$$
Let A be an $m \times n$ matrix of rank r. The pseudoinverse of A can be
computed with the aid of a singular value decomposition $A = U\Sigma V^*$. Let
$\beta$ and $\gamma$ be the ordered bases whose vectors are the columns of V and U,

respectively, and let $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$ be the nonzero singular values of
A. Then $\beta$ and $\gamma$ are orthonormal bases for $F^n$ and $F^m$, respectively, and (4)
and (6) are satisfied for $T = L_A$. Reversing the roles of $\beta$ and $\gamma$ in the proof
of Theorem 6.27, we obtain the following result.
Theorem 6.29. Let A be an $m \times n$ matrix of rank r with a singular value
decomposition $A = U\Sigma V^*$ and nonzero singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$.
Let $\Sigma^{\dagger}$ be the $n \times m$ matrix defined by
$$\Sigma^{\dagger}_{ij} = \begin{cases} \dfrac{1}{\sigma_i} & \text{if } i = j \le r \\ 0 & \text{otherwise.} \end{cases}$$
Then $A^{\dagger} = V\Sigma^{\dagger}U^*$, and this is a singular value decomposition of $A^{\dagger}$.
Notice that $\Sigma^{\dagger}$ as defined in Theorem 6.29 is actually the pseudoinverse
of $\Sigma$.
Example 6
We find $A^{\dagger}$ for the matrix $A = \begin{pmatrix} 1 & 1 & -1 \\ 1 & 1 & -1 \end{pmatrix}$.
Since A is the matrix of Example 3, we can use the singular value decomposition
obtained in that example:
$$A = U\Sigma V^* = \begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} & \tfrac{-1}{\sqrt{2}} \end{pmatrix}
\begin{pmatrix} \sqrt{6} & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{3}} & \tfrac{-1}{\sqrt{3}} \\ \tfrac{1}{\sqrt{2}} & \tfrac{-1}{\sqrt{2}} & 0 \\ \tfrac{1}{\sqrt{6}} & \tfrac{1}{\sqrt{6}} & \tfrac{2}{\sqrt{6}} \end{pmatrix}.$$
By Theorem 6.29, we have
$$A^{\dagger} = V\Sigma^{\dagger}U^* = \begin{pmatrix} \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} \\ \tfrac{1}{\sqrt{3}} & \tfrac{-1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} \\ \tfrac{-1}{\sqrt{3}} & 0 & \tfrac{2}{\sqrt{6}} \end{pmatrix}
\begin{pmatrix} \tfrac{1}{\sqrt{6}} & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} & \tfrac{-1}{\sqrt{2}} \end{pmatrix}
= \frac{1}{6}\begin{pmatrix} 1 & 1 \\ 1 & 1 \\ -1 & -1 \end{pmatrix}. \quad\bullet$$
Notice that the linear transformation T of Example 5 is $L_A$, where A is
the matrix of Example 6, and that $T^{\dagger} = L_{A^{\dagger}}$. •
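Theorem 6.29 gives a direct recipe for computing $A^{\dagger}$. A sketch of that recipe (assuming NumPy) applied to the matrix of Example 6 follows; the result is also compared with NumPy's built-in pseudoinverse routine.

```python
# The pseudoinverse from an SVD, as in Theorem 6.29 (assumes NumPy; A is the
# matrix of Example 6): invert only the nonzero singular values.
import numpy as np

A = np.array([[1.0, 1.0, -1.0],
              [1.0, 1.0, -1.0]])
U, s, Vh = np.linalg.svd(A)

tol = 1e-12
s_dagger = np.array([1.0 / x if x > tol else 0.0 for x in s])
Sigma_dagger = np.zeros((A.shape[1], A.shape[0]))        # an n x m matrix
Sigma_dagger[:len(s), :len(s)] = np.diag(s_dagger)

A_dagger = Vh.conj().T @ Sigma_dagger @ U.conj().T       # A^dagger = V Sigma^dagger U*
assert np.allclose(A_dagger, np.linalg.pinv(A))
print(6 * A_dagger)                                      # the matrix [[1,1],[1,1],[-1,-1]]
```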
The Pseudoinverse and Systems of Linear Equations
Let A be an $m \times n$ matrix with entries in F. Then for any $b \in F^m$, the
matrix equation $Ax = b$ is a system of linear equations, and so it either has no
solutions, a unique solution, or infinitely many solutions. We know that the
system has a unique solution for every $b \in F^m$ if and only if A is invertible,
in which case the solution is given by $A^{-1}b$. Furthermore, if A is invertible,
then $A^{-1} = A^{\dagger}$, and so the solution can be written as $x = A^{\dagger}b$. If, on the
other hand, A is not invertible or the system $Ax = b$ is inconsistent, then $A^{\dagger}b$
still exists. We therefore pose the following question: In general, how is the
vector $A^{\dagger}b$ related to the system of linear equations $Ax = b$?
In order to answer this question, we need the following lemma.
Lemma. Let V and W be finite-dimensional inner product spaces, and let
$T\colon V \to W$ be linear. Then
(a) $T^{\dagger}T$ is the orthogonal projection of V on $N(T)^{\perp}$.
(b) $TT^{\dagger}$ is the orthogonal projection of W on $R(T)$.
Proof. As in the earlier discussion, we define $L\colon N(T)^{\perp} \to W$ by $L(x) =
T(x)$ for all $x \in N(T)^{\perp}$. If $x \in N(T)^{\perp}$, then $T^{\dagger}T(x) = L^{-1}L(x) = x$, and if
$x \in N(T)$, then $T^{\dagger}T(x) = T^{\dagger}(0) = 0$. Consequently $T^{\dagger}T$ is the orthogonal
projection of V on $N(T)^{\perp}$. This proves (a).
The proof of (b) is similar and is left as an exercise.
Theorem 6.30. Consider the system of linear equations $Ax = b$, where
A is an $m \times n$ matrix and $b \in F^m$. If $z = A^{\dagger}b$, then z has the following
properties.
(a) If $Ax = b$ is consistent, then z is the unique solution to the system
having minimum norm. That is, z is a solution to the system, and if y
is any solution to the system, then $\|z\| \le \|y\|$ with equality if and only
if $z = y$.
(b) If $Ax = b$ is inconsistent, then z is the unique best approximation to a
solution having minimum norm. That is, $\|Az - b\| \le \|Ay - b\|$ for any
$y \in F^n$, with equality if and only if $Az = Ay$. Furthermore, if $Az = Ay$,
then $\|z\| \le \|y\|$ with equality if and only if $z = y$.
Proof. For convenience, let $T = L_A$.
(a) Suppose that $Ax = b$ is consistent, and let $z = A^{\dagger}b$. Observe that
$b \in R(T)$, and therefore $Az = AA^{\dagger}b = TT^{\dagger}(b) = b$ by part (b) of the lemma.
Thus z is a solution to the system. Now suppose that y is any solution to the
system. Then
$$T^{\dagger}T(y) = A^{\dagger}Ay = A^{\dagger}b = z,$$
and hence z is the orthogonal projection of y on $N(T)^{\perp}$ by part (a) of the
lemma. Therefore, by the corollary to Theorem 6.6 (p. 350), we have that
$\|z\| \le \|y\|$ with equality if and only if $z = y$.
(b) Suppose that $Ax = b$ is inconsistent. By the lemma, $Az = AA^{\dagger}b =
TT^{\dagger}(b)$ is the orthogonal projection of b on $R(T)$; therefore, by the corollary
to Theorem 6.6 (p. 350), Az is the vector in $R(T)$ nearest b. That is, if

Ay is any other vector in $R(T)$, then $\|Az - b\| \le \|Ay - b\|$ with equality if
and only if $Az = Ay$.
Finally, suppose that y is any vector in $F^n$ such that $Az = Ay = c$. Then
$$A^{\dagger}c = A^{\dagger}Az = A^{\dagger}AA^{\dagger}b = A^{\dagger}b = z$$
by Exercise 23; hence we may apply part (a) of this theorem to the system
$Ax = c$ to conclude that $\|z\| \le \|y\|$ with equality if and only if $z = y$.
Note that the vector $z = A^{\dagger}b$ in Theorem 6.30 is the vector $x_0$ described
in Theorem 6.12 that arises in the least squares application on pages 360-364.
Example 7
Consider the linear systems
$$\begin{aligned} x_1 + x_2 - x_3 &= 1 \\ x_1 + x_2 - x_3 &= 1 \end{aligned}
\qquad\text{and}\qquad
\begin{aligned} x_1 + x_2 - x_3 &= 1 \\ x_1 + x_2 - x_3 &= 2. \end{aligned}$$
The first system has infinitely many solutions. Let $A = \begin{pmatrix} 1 & 1 & -1 \\ 1 & 1 & -1 \end{pmatrix}$, the
coefficient matrix of the system, and let $b = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$. By Example 6,
$$z = A^{\dagger}b = \frac{1}{6}\begin{pmatrix} 1 & 1 \\ 1 & 1 \\ -1 & -1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac{1}{3}\begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}$$
is the solution of minimal norm by Theorem 6.30(a).
The second system is obviously inconsistent. Let $b = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$. Thus, although
$$z = A^{\dagger}b = \frac{1}{2}\begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}$$
is not a solution to the second system, it is the "best approximation" to a
solution having minimum norm, as described in Theorem 6.30(b). •
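Example 7 can be reproduced numerically; the sketch below assumes NumPy and uses its pseudoinverse routine to obtain the minimum-norm solution of the consistent system and the minimum-norm best approximation for the inconsistent one.

```python
# Theorem 6.30 applied to the two systems of Example 7 (assumes NumPy).
import numpy as np

A = np.array([[1.0, 1.0, -1.0],
              [1.0, 1.0, -1.0]])
A_dagger = np.linalg.pinv(A)

b1 = np.array([1.0, 1.0])            # consistent system
z1 = A_dagger @ b1
print(z1)                            # [1/3, 1/3, -1/3], the minimum-norm solution
assert np.allclose(A @ z1, b1)

b2 = np.array([1.0, 2.0])            # inconsistent system
z2 = A_dagger @ b2
print(z2)                            # [1/2, 1/2, -1/2], the best approximation
print(np.linalg.norm(A @ z2 - b2))   # the (nonzero) residual
```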

EXERCISES
1. Label the following statements as true or false.
(a) The singular values of any linear operator on a finite-dimensional
vector space are also eigenvalues of the operator.
(b) The singular values of any matrix A are the eigenvalues of A* A.
(c) For any matrix A and any scalar c, if $\sigma$ is a singular value of A,
then $|c|\sigma$ is a singular value of cA.
(d) The singular values of any linear operator are nonnegative.
(e) If $\lambda$ is an eigenvalue of a self-adjoint matrix A, then $\lambda$ is a singular
value of A.
(f) For any $m \times n$ matrix A and any $b \in F^m$, the vector $A^{\dagger}b$ is a solution
to $Ax = b$.
(g) The pseudoinverse of any linear operator exists even if the operator
is not invertible.
2. Let T: V → W be a linear transformation of rank r, where V and W
are finite-dimensional inner product spaces. In each of the following,
find orthonormal bases $\{v_1, v_2, \ldots, v_n\}$ for V and $\{u_1, u_2, \ldots, u_m\}$ for
W, and the nonzero singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$ of T such that
$T(v_i) = \sigma_i u_i$ for $1 \le i \le r$.
(a) T: $R^2 \to R^3$ defined by $T(x_1, x_2) = (x_1,\ x_1 + x_2,\ x_1 - x_2)$
(b) T: $P_2(R) \to P_1(R)$, where $T(f(x)) = f''(x)$, and the inner products
are defined as in Example 1
(c) Let $V = W = \mathrm{span}(\{1, \sin x, \cos x\})$ with the inner product defined
by $\langle f, g\rangle = \int f(t)g(t)\,dt$, and T is defined by $T(f) = f' + 2f$
(d) T: $C^2 \to C^2$ defined by $T(z_1, z_2) = ((1 - i)z_2,\ (1 + i)z_1 + z_2)$
3. Find a singular value decomposition for each of the following matrices.
(a)
(d)
(b)
(e)
1 0 1
1 0 -1
1-M 1
1 — i —i
(c)
(0
/I
0
1
V
i
I
0
V
l
I
l
I
0
-1
1 1
-2 1
1 1
4. Find a polar decomposition for each of the following matrices.
(a) 0 -i) (b> (1 j {
5. Find an explicit formula for each of the following expressions.
(a) $T^{\dagger}(x_1, x_2, x_3)$, where T is the linear transformation of Exercise 2(a)
(b) $T^{\dagger}(a + bx + cx^2)$, where T is the linear transformation of Exercise
2(b)
(c) $T^{\dagger}(a + b\sin x + c\cos x)$, where T is the linear transformation of
Exercise 2(c)
(d) $T^{\dagger}(z_1, z_2)$, where T is the linear transformation of Exercise 2(d)
6. Use the results of Exercise 3 to find the pseudoinverse of each of the
following matrices.
7.
(a)
(d)
(b)
(e)
1 0
1 0
1-H
1-i
1
1N
(c)
(f)
/l
0
1
V
i
I
0
ly
I
I
I
I
0
-l
I I
-2 1
1 1
For each of the given linear transformations T: V —» W,
(i) Describe the subspace $Z_1$ of V such that $T^{\dagger}T$ is the orthogonal
projection of V on $Z_1$.
(ii) Describe the subspace $Z_2$ of W such that $TT^{\dagger}$ is the orthogonal
projection of W on $Z_2$.
(a) T is the linear transformation of Exercise 2(a)
(b) T is the linear transformation of Exercise 2(b)
(c) T is the linear transformation of Exercise 2(c)
(d) T is the linear transformation of Exercise 2(d)
8. For each of the given systems of linear equations,
(i) If the system is consistent, find the unique solution having mini­
mum norm.
(ii) If the system is inconsistent, find the "best approximation to a
solution" having minimum norm, as described in Theorem 6.30(b).
(Use your answers to parts (a) and (f) of Exercise 6.)
$$\text{(a)}\ \begin{aligned} x_1 + x_2 &= 1 \\ x_1 + x_2 &= 2 \\ -x_1 + x_2 &= 0 \end{aligned}
\qquad
\text{(b)}\ \begin{aligned} x_1 + x_2 + x_3 + x_4 &= 2 \\ x_1 - 2x_3 + x_4 &= -1 \\ x_1 - x_2 + x_3 + x_4 &= 2 \end{aligned}$$
9. Let V and W be finite-dimensional inner product spaces over F, and suppose
that $\{v_1, v_2, \ldots, v_n\}$ and $\{u_1, u_2, \ldots, u_m\}$ are orthonormal bases
for V and W, respectively. Let $T\colon V \to W$ be a linear transformation of
rank r, and suppose that $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ are such that
$$T(v_i) = \begin{cases} \sigma_i u_i & \text{if } 1 \le i \le r \\ 0 & \text{if } r < i. \end{cases}$$

(a) Prove that $\{u_1, u_2, \ldots, u_m\}$ is a set of eigenvectors of $TT^*$ with
corresponding eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$, where
$$\lambda_i = \begin{cases} \sigma_i^2 & \text{if } 1 \le i \le r \\ 0 & \text{if } r < i \le m. \end{cases}$$
(b) Let A be an $m \times n$ matrix with real or complex entries. Prove that
the nonzero singular values of A are the positive square roots of
the nonzero eigenvalues of $AA^*$, including repetitions.
(c) Prove that TT* and T*T have the same nonzero eigenvalues, in­
cluding repetitions.
(d) State and prove a result for matrices analogous to (c).
10. Use Exercise 8 of Section 2.5 to obtain another proof of Theorem 6.27,
the singular value decomposition theorem for matrices.
11. This exercise relates the singular values of a well-behaved linear operator
or matrix to its eigenvalues.
(a) Let T be a normal linear operator on an n-dimensional inner product
space with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Prove that the singular
values of T are $|\lambda_1|, |\lambda_2|, \ldots, |\lambda_n|$.
(b) State and prove a result for matrices analogous to (a).
12. Let A be a normal matrix with an orthonormal basis of eigenvectors
$\beta = \{v_1, v_2, \ldots, v_n\}$ and corresponding eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Let
V be the $n \times n$ matrix whose columns are the vectors in $\beta$. Prove that
for each i there is a scalar $\theta_i$ of absolute value 1 such that if U is the
$n \times n$ matrix with $\theta_i v_i$ as column i and $\Sigma$ is the diagonal matrix such
that $\Sigma_{ii} = |\lambda_i|$ for each i, then $U\Sigma V^*$ is a singular value decomposition
of A.
13. Prove that if A is a positive semidefinite matrix, then the singular values
of A are the same as the eigenvalues of A.
14. Prove that if A is a positive definite matrix and $A = U\Sigma V^*$ is a singular
value decomposition of A, then $U = V$.
15. Let A be a square matrix with a polar decomposition A = WP.
(a) Prove that A is normal if and only if WP2 = P2W.
(b) Use (a) to prove that A is normal if and only if WP = PW.
16. Let A be a square matrix. Prove an alternate form of the polar de­
composition for A: There exists a unitary matrix W and a positive
semidefinite matrix P such that A = PW.

17. Let T and U be linear operators on $R^2$ defined for all $(x_1, x_2) \in R^2$ by
$T(x_1, x_2) = (x_1, 0)$ and $U(x_1, x_2) = (x_1 + x_2, 0)$.
(a) Prove that $(UT)^{\dagger} \ne T^{\dagger}U^{\dagger}$.
(b) Exhibit matrices A and B such that AB is defined, but $(AB)^{\dagger} \ne B^{\dagger}A^{\dagger}$.
18. Let A be an $m \times n$ matrix. Prove the following results.
(a) For any $m \times m$ unitary matrix G, $(GA)^{\dagger} = A^{\dagger}G^*$.
(b) For any $n \times n$ unitary matrix H, $(AH)^{\dagger} = H^*A^{\dagger}$.
19. Let A be a matrix with real or complex entries. Prove the following
results.
(a) The nonzero singular values of A are the same as the nonzero
singular values of $A^*$, which are the same as the nonzero singular
values of $A^t$.
(b) $(A^*)^{\dagger} = (A^{\dagger})^*$.
(c) $(A^t)^{\dagger} = (A^{\dagger})^t$.
20. Let A be a square matrix such that $A^2 = O$. Prove that $(A^{\dagger})^2 = O$.
21. Let V and W be finite-dimensional inner product spaces, and let
T: V —> W be linear. Prove the following results.
(a) $TT^{\dagger}T = T$.
(b) $T^{\dagger}TT^{\dagger} = T^{\dagger}$.
(c) Both $T^{\dagger}T$ and $TT^{\dagger}$ are self-adjoint.
The preceding three statements are called the Penrose conditions,
and they characterize the pseudoinverse of a linear transformation as
shown in Exercise 22.
22. Let V and W be finite-dimensional inner product spaces. Let T: V —> W
and U: W —> V be linear transformations such that TUT = T, UTU = U,
and both UT and TU are self-adjoint. Prove that $U = T^{\dagger}$.
23. State and prove a result for matrices that is analogous to the result of
Exercise 21.
24. State and prove a result for matrices that is analogous to the result of
Exercise 22.
25. Let V and W be finite-dimensional inner product spaces, and let
T: V —* W be linear. Prove the following results.
(a) If T is one-to-one, then $T^*T$ is invertible and $T^{\dagger} = (T^*T)^{-1}T^*$.
(b) If T is onto, then $TT^*$ is invertible and $T^{\dagger} = T^*(TT^*)^{-1}$.

26. Let V and W be finite-dimensional inner product spaces with orthonormal
bases $\beta$ and $\gamma$, respectively, and let T: V → W be linear. Prove
that $([T]_\beta^\gamma)^{\dagger} = [T^{\dagger}]_\gamma^\beta$.
27. Let V and W be finite-dimensional inner product spaces, and let
T: V —> W be a linear transformation. Prove part (b) of the lemma
to Theorem 6.30: $TT^{\dagger}$ is the orthogonal projection of W on $R(T)$.
6.8* BILINEAR AND QUADRATIC FORMS
There is a certain class of scalar-valued functions of two variables defined on
a vector space that arises in the study of such diverse subjects as geometry
and multivariable calculus. This is the class of bilinear forms. We study the
basic properties of this class with a special emphasis on symmetric bilinear
forms, and we consider some of its applications to quadratic surfaces and
multivariable calculus.
Bilinear Forms
Definition. Let V be a vector space over a field F. A function H from
the set V × V of ordered pairs of vectors to F is called a bilinear form on V
if H is linear in each variable when the other variable is held fixed; that is,
H is a bilinear form on V if
(a) $H(ax_1 + x_2, y) = aH(x_1, y) + H(x_2, y)$ for all $x_1, x_2, y \in V$ and $a \in F$
(b) $H(x, ay_1 + y_2) = aH(x, y_1) + H(x, y_2)$ for all $x, y_1, y_2 \in V$ and $a \in F$.
We denote the set of all bilinear forms on V by $\mathcal{B}(V)$. Observe that an
inner product on a vector space is a bilinear form if the underlying field is
real, but not if the underlying field is complex.
Example 1
Define a function $H\colon R^2 \times R^2 \to R$ by
$$H\!\left(\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}\right) = 2a_1b_1 + 3a_1b_2 + 4a_2b_1 - a_2b_2
\quad\text{for } \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \in R^2.$$
We could verify directly that H is a bilinear form on $R^2$. However, it is more
enlightening and less tedious to observe that if
$$A = \begin{pmatrix} 2 & 3 \\ 4 & -1 \end{pmatrix}, \quad x = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \quad\text{and}\quad y = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix},$$
then
$$H(x, y) = x^t A y.$$
The bilinearity of H now follows directly from the distributive property of
matrix multiplication over matrix addition. •

The preceding bilinear form is a special case of the next example.
Example 2
Let $V = F^n$, where the vectors are considered as column vectors. For any
$A \in M_{n \times n}(F)$, define $H\colon V \times V \to F$ by
$$H(x, y) = x^t A y \quad\text{for } x, y \in V.$$
Notice that since x and y are $n \times 1$ matrices and A is an $n \times n$ matrix, $H(x, y)$
is a $1 \times 1$ matrix. We identify this matrix with its single entry. The bilinearity
of H follows as in Example 1. For example, for $a \in F$ and $x_1, x_2, y \in V$, we
have
$$H(ax_1 + x_2, y) = (ax_1 + x_2)^t A y = (ax_1^t + x_2^t)Ay = ax_1^t A y + x_2^t A y = aH(x_1, y) + H(x_2, y). \quad\bullet$$
We list several properties possessed by all bilinear forms. Their proofs are
left to the reader (see Exercise 2).
For any bilinear form H on a vector space V over a field F, the following
properties hold.
1. If, for any $x \in V$, the functions $L_x, R_x\colon V \to F$ are defined by
$$L_x(y) = H(x, y) \quad\text{and}\quad R_x(y) = H(y, x) \quad\text{for all } y \in V,$$
then $L_x$ and $R_x$ are linear.
2. $H(0, x) = H(x, 0) = 0$ for all $x \in V$.
3. For all $x, y, z, w \in V$,
$$H(x + y, z + w) = H(x, z) + H(x, w) + H(y, z) + H(y, w).$$
4. If $J\colon V \times V \to F$ is defined by $J(x, y) = H(y, x)$, then J is a bilinear
form.
Definitions. Let V be a vector space, let $H_1$ and $H_2$ be bilinear forms
on V, and let a be a scalar. We define the sum $H_1 + H_2$ and the scalar
product $aH_1$ by the equations
$$(H_1 + H_2)(x, y) = H_1(x, y) + H_2(x, y)$$
and
$$(aH_1)(x, y) = a(H_1(x, y)) \quad\text{for all } x, y \in V.$$
The following theorem is an immediate consequence of the definitions.

Theorem 6.31. For any vector space V, the sum of two bilinear forms
and the product of a scalar and a bilinear form on V are again bilinear forms
on V. Furthermore, $\mathcal{B}(V)$ is a vector space with respect to these operations.
Proof. Exercise.
Proof. Exercise. |
Let $\beta = \{v_1, v_2, \ldots, v_n\}$ be an ordered basis for an n-dimensional vector
space V, and let $H \in \mathcal{B}(V)$. We can associate with H an $n \times n$ matrix A
whose entry in row i and column j is defined by
$$A_{ij} = H(v_i, v_j) \quad\text{for } i, j = 1, 2, \ldots, n.$$
Definition. The matrix A above is called the matrix representation
of H with respect to the ordered basis $\beta$ and is denoted by $\psi_\beta(H)$.
We can therefore regard $\psi_\beta$ as a mapping from $\mathcal{B}(V)$ to $M_{n \times n}(F)$, where
F is the field of scalars for V, that takes a bilinear form H into its matrix
representation $\psi_\beta(H)$. We first consider an example and then show that $\psi_\beta$
is an isomorphism.
Example 3
Consider the bilinear form H of Example 1, and let $\beta = \left\{\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix}\right\}$
and $B = \psi_\beta(H)$. Then
$$B_{11} = H\!\left(\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \end{pmatrix}\right) = 2 + 3 + 4 - 1 = 8,$$
$$B_{12} = H\!\left(\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix}\right) = 2 - 3 + 4 + 1 = 4,$$
$$B_{21} = H\!\left(\begin{pmatrix} 1 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \end{pmatrix}\right) = 2 + 3 - 4 + 1 = 2,$$
and
$$B_{22} = H\!\left(\begin{pmatrix} 1 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix}\right) = 2 - 3 - 4 - 1 = -6.$$
So
$$\psi_\beta(H) = \begin{pmatrix} 8 & 4 \\ 2 & -6 \end{pmatrix}.$$
If $\gamma$ is the standard ordered basis for $R^2$, the reader can verify that
$$\psi_\gamma(H) = \begin{pmatrix} 2 & 3 \\ 4 & -1 \end{pmatrix}. \quad\bullet$$
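These matrix representations can be checked numerically. The sketch below (assuming NumPy) recomputes $\psi_\beta(H)$ entrywise and also verifies the congruence $\psi_\beta(H) = Q^t\psi_\gamma(H)Q$ established in Theorem 6.33 below, where Q is the change of coordinate matrix whose columns are the vectors of $\beta$.

```python
# A numerical check of Example 3 (assumes NumPy): psi_beta(H) computed entrywise,
# and the congruence psi_beta(H) = Q^t psi_gamma(H) Q, where Q has the vectors of
# beta as its columns.
import numpy as np

A = np.array([[2.0, 3.0],
              [4.0, -1.0]])                  # psi_gamma(H), gamma the standard basis
H = lambda x, y: x @ A @ y                   # H(x, y) = x^t A y

beta = [np.array([1.0, 1.0]), np.array([1.0, -1.0])]
B = np.array([[H(u, v) for v in beta] for u in beta])
print(B)                                     # [[8, 4], [2, -6]]

Q = np.column_stack(beta)
assert np.allclose(Q.T @ A @ Q, B)
```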

Theorem 6.32. For any n-dimensional vector space V over F and any
ordered basis $\beta$ for V, $\psi_\beta\colon \mathcal{B}(V) \to M_{n \times n}(F)$ is an isomorphism.
Proof. We leave the proof that $\psi_\beta$ is linear to the reader.
To show that $\psi_\beta$ is one-to-one, suppose that $\psi_\beta(H) = O$ for some $H \in
\mathcal{B}(V)$. Fix $v_i \in \beta$, and recall the mapping $L_{v_i}\colon V \to F$, which is linear by
property 1 on page 423. By hypothesis, $L_{v_i}(v_j) = H(v_i, v_j) = 0$ for all $v_j \in \beta$.
Hence $L_{v_i}$ is the zero transformation from V to F. So
$$H(v_i, x) = L_{v_i}(x) = 0 \quad\text{for all } x \in V \text{ and } v_i \in \beta. \qquad (7)$$
Next fix an arbitrary $y \in V$, and recall the linear mapping $R_y\colon V \to F$ defined
in property 1 on page 423. By (7), $R_y(v_i) = H(v_i, y) = 0$ for all $v_i \in \beta$, and
hence $R_y$ is the zero transformation. So $H(x, y) = R_y(x) = 0$ for all $x, y \in V$.
Thus H is the zero bilinear form, and therefore $\psi_\beta$ is one-to-one.
To show that $\psi_\beta$ is onto, consider any $A \in M_{n \times n}(F)$. Recall the isomorphism
$\phi_\beta\colon V \to F^n$ defined in Section 2.4. For $x \in V$, we view $\phi_\beta(x) \in F^n$ as
a column vector. Let $H\colon V \times V \to F$ be the mapping defined by
$$H(x, y) = [\phi_\beta(x)]^t A [\phi_\beta(y)] \quad\text{for all } x, y \in V.$$
A slight embellishment of the method of Example 2 can be used to prove that
$H \in \mathcal{B}(V)$. We show that $\psi_\beta(H) = A$. Let $v_i, v_j \in \beta$. Then $\phi_\beta(v_i) = e_i$ and
$\phi_\beta(v_j) = e_j$; hence, for any i and j,
$$H(v_i, v_j) = [\phi_\beta(v_i)]^t A [\phi_\beta(v_j)] = e_i^t A e_j = A_{ij}.$$
We conclude that $\psi_\beta(H) = A$ and $\psi_\beta$ is onto.
Corollary 1. For any n-dimensional vector space V, $\mathcal{B}(V)$ has dimension
$n^2$.
Proof. Exercise.
The following corollary is easily established by reviewing the proof of
Theorem 6.32.
Corollary 2. Let V be an n-dimensional vector space over F with
ordered basis $\beta$. If $H \in \mathcal{B}(V)$ and $A \in M_{n \times n}(F)$, then $\psi_\beta(H) = A$ if and
only if $H(x, y) = [\phi_\beta(x)]^t A [\phi_\beta(y)]$ for all $x, y \in V$.
The following result is now an immediate consequence of Corollary 2.
Corollary 3. Let F be a field, n a positive integer, and $\beta$ be the standard
ordered basis for $F^n$. Then for any $H \in \mathcal{B}(F^n)$, there exists a unique matrix
$A \in M_{n \times n}(F)$, namely, $A = \psi_\beta(H)$, such that
$$H(x, y) = x^t A y \quad\text{for all } x, y \in F^n.$$

Example 4
Define a function $H\colon R^2 \times R^2 \to R$ by
$$H\!\left(\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}\right) = \det\begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \end{pmatrix} = a_1b_2 - a_2b_1.$$
It can be shown that H is a bilinear form. We find the matrix A in Corollary 3
such that $H(x, y) = x^t A y$ for all $x, y \in R^2$.
Since $A_{ij} = H(e_i, e_j)$ for all i and j, we have
$$A_{11} = \det\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} = 0, \quad
A_{12} = \det\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = 1, \quad
A_{21} = \det\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = -1, \quad\text{and}\quad
A_{22} = \det\begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix} = 0.$$
Therefore
$$A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}. \quad\bullet$$
There is an analogy between bilinear forms and linear operators on finite-
dimensional vector spaces in that both are associated with unique square
matrices and the correspondences depend on the choice of an ordered basis for
the vector space. As in the case of linear operators, one can pose the following
question: How does the matrix corresponding to a fixed bilinear form change
when the ordered basis is changed? As we have seen, the corresponding
question for matrix representations of linear operators leads to the definition
of the similarity relation on square matrices. In the case of bilinear forms,
the corresponding question leads to another relation on square matrices, the
congruence relation.
Definition. Let $A, B \in M_{n \times n}(F)$. Then B is said to be congruent to
A if there exists an invertible matrix $Q \in M_{n \times n}(F)$ such that $B = Q^t A Q$.
Observe that the relation of congruence is an equivalence relation (see
Exercise 12).
The next theorem relates congruence to the matrix representation of a
bilinear form.
Theorem 6.33. Let V be a finite-dimensional vector space with ordered
bases $\beta = \{v_1, v_2, \ldots, v_n\}$ and $\gamma = \{w_1, w_2, \ldots, w_n\}$, and let Q be the change
of coordinate matrix changing $\gamma$-coordinates into $\beta$-coordinates. Then, for
any $H \in \mathcal{B}(V)$, we have $\psi_\gamma(H) = Q^t\psi_\beta(H)Q$. Therefore $\psi_\gamma(H)$ is congruent
to $\psi_\beta(H)$.
Proof. There are essentially two proofs of this theorem. One involves a
direct computation, while the other follows immediately from a clever obser­
vation. We give the more direct proof here, leaving the other proof for the
exercises (see Exercise 13).

Suppose that $A = \psi_\beta(H)$ and $B = \psi_\gamma(H)$. Then for $1 \le i, j \le n$,
$$w_i = \sum_{k=1}^{n} Q_{ki}v_k \quad\text{and}\quad w_j = \sum_{r=1}^{n} Q_{rj}v_r.$$
Thus
$$B_{ij} = H(w_i, w_j) = H\!\left(\sum_{k=1}^{n} Q_{ki}v_k,\ w_j\right)
= \sum_{k=1}^{n} Q_{ki}H(v_k, w_j)
= \sum_{k=1}^{n} Q_{ki}H\!\left(v_k, \sum_{r=1}^{n} Q_{rj}v_r\right)
= \sum_{k=1}^{n} Q_{ki}\sum_{r=1}^{n} Q_{rj}H(v_k, v_r)
= \sum_{k=1}^{n} Q_{ki}\sum_{r=1}^{n} A_{kr}Q_{rj}
= \sum_{k=1}^{n} Q^t_{ik}(AQ)_{kj} = (Q^t A Q)_{ij}.$$
Hence $B = Q^t A Q$.
The following result is the converse of Theorem 6.33.
Corollary. Let V be an n-dimensional vector space with ordered basis $\beta$,
and let H be a bilinear form on V. For any $n \times n$ matrix B, if B is congruent
to $\psi_\beta(H)$, then there exists an ordered basis $\gamma$ for V such that $\psi_\gamma(H) = B$.
Furthermore, if $B = Q^t\psi_\beta(H)Q$ for some invertible matrix Q, then Q changes
$\gamma$-coordinates into $\beta$-coordinates.
Proof. Suppose that $B = Q^t\psi_\beta(H)Q$ for some invertible matrix Q and
that $\beta = \{v_1, v_2, \ldots, v_n\}$. Let $\gamma = \{w_1, w_2, \ldots, w_n\}$, where
$$w_j = \sum_{i=1}^{n} Q_{ij}v_i \quad\text{for } 1 \le j \le n.$$

Since Q is invertible, $\gamma$ is an ordered basis for V, and Q is the change of
coordinate matrix that changes $\gamma$-coordinates into $\beta$-coordinates. Therefore,
by Theorem 6.33,
$$B = Q^t\psi_\beta(H)Q = \psi_\gamma(H).$$
Symmetric Bilinear Forms
Like the diagonalization problem for linear operators, there is an analogous
diagonalization problem for bilinear forms, namely, the problem of determin­
ing those bilinear forms for which there are diagonal matrix representations.
As we will see, there is a close relationship between diagonalizable bilinear
forms and those that are called symmetric.
Definition. A bilinear form H on a vector space V is symmetric if
$H(x, y) = H(y, x)$ for all $x, y \in V$.
As the name suggests, symmetric bilinear forms correspond to symmetric
matrices.
Theorem 6.34. Let H be a bilinear form on a finite-dimensional vector
space V, and let $\beta$ be an ordered basis for V. Then H is symmetric if and
only if $\psi_\beta(H)$ is symmetric.
Proof. Let $\beta = \{v_1, v_2, \ldots, v_n\}$ and $B = \psi_\beta(H)$.
First assume that H is symmetric. Then for $1 \le i, j \le n$,
$$B_{ij} = H(v_i, v_j) = H(v_j, v_i) = B_{ji},$$
and it follows that B is symmetric.
Conversely, suppose that B is symmetric. Let $J\colon V \times V \to F$, where F is
the field of scalars for V, be the mapping defined by $J(x, y) = H(y, x)$ for all
$x, y \in V$. By property 4 on page 423, J is a bilinear form. Let $C = \psi_\beta(J)$.
Then, for $1 \le i, j \le n$,
$$C_{ij} = J(v_i, v_j) = H(v_j, v_i) = B_{ji} = B_{ij}.$$
Thus $C = B$. Since $\psi_\beta$ is one-to-one, we have $J = H$. Hence $H(y, x) =
J(x, y) = H(x, y)$ for all $x, y \in V$, and therefore H is symmetric.
Definition. A bilinear form H on a finite-dimensional vector space V is
called diagonalizable if there is an ordered basis $\beta$ for V such that $\psi_\beta(H)$
is a diagonal matrix.
Corollary. Let H be a diagonalizable bilinear form on a finite-dimensional
vector space V. Then H is symmetric.

Proof. Suppose that H is diagonalizable. Then there is an ordered basis $\beta$
for V such that $\psi_\beta(H) = D$ is a diagonal matrix. Trivially, D is a symmetric
matrix, and hence, by Theorem 6.34, H is symmetric.
Unfortunately, the converse is not true, as is illustrated by the following
example.
Example 5
Let $F = Z_2$, $V = F^2$, and $H\colon V \times V \to F$ be the bilinear form defined by
$$H\!\left(\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}\right) = a_1b_2 + a_2b_1.$$
Clearly H is symmetric. In fact, if $\beta$ is the standard ordered basis for V, then
$$A = \psi_\beta(H) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},$$
a symmetric matrix. We show that H is not diagonalizable.
By way of contradiction, suppose that H is diagonalizable. Then there is
an ordered basis $\gamma$ for V such that $B = \psi_\gamma(H)$ is a diagonal matrix. So by
Theorem 6.33, there exists an invertible matrix Q such that $B = Q^t A Q$. Since
Q is invertible, it follows that $\operatorname{rank}(B) = \operatorname{rank}(A) = 2$, and consequently the
diagonal entries of B are nonzero. Since the only nonzero scalar of F is 1,
$$B = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
Suppose that
$$Q = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$
Then
$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = B = Q^t A Q
= \begin{pmatrix} a & c \\ b & d \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix}
= \begin{pmatrix} ac + ac & bc + ad \\ bc + ad & bd + bd \end{pmatrix}.$$
But $p + p = 0$ for all $p \in F$; hence $ac + ac = 0$. Thus, comparing the row
1, column 1 entries of the matrices in the equation above, we conclude that
$1 = 0$, a contradiction. Therefore H is not diagonalizable. •
The bilinear form of Example 5 is an anomaly. Its failure to be diagonalizable
is due to the fact that the scalar field $Z_2$ is of characteristic two. Recall
from Appendix C that a field F is of characteristic two if $1 + 1 = 0$ in F.
If F is not of characteristic two, then $1 + 1 = 2$ has a multiplicative inverse,
which we denote by $1/2$.
Before proving the converse of the corollary to Theorem 6.34 for scalar
fields that are not of characteristic two, we establish the following lemma.
Lemma. Let H be a nonzero symmetric bilinear form on a vector space
V over a field F not of characteristic two. Then there is a vector x in V such
that $H(x, x) \ne 0$.
Proof. Since H is nonzero, we can choose vectors $u, v \in V$ such that
$H(u, v) \ne 0$. If $H(u, u) \ne 0$ or $H(v, v) \ne 0$, there is nothing to prove.
Otherwise, set $x = u + v$. Then
$$H(x, x) = H(u, u) + H(u, v) + H(v, u) + H(v, v) = 2H(u, v) \ne 0$$
because $2 \ne 0$ and $H(u, v) \ne 0$.
Theorem 6.35. Let V be a finite-dimensional vector space over a field
F not of characteristic two. Then every symmetric bilinear form on V is
diagonalizable.
Proof. We use mathematical induction on $n = \dim(V)$. If $n = 1$, then every
element of $\mathcal{B}(V)$ is diagonalizable. Now suppose that the theorem is valid
for vector spaces of dimension less than n for some fixed integer $n > 1$, and
suppose that $\dim(V) = n$. If H is the zero bilinear form on V, then trivially H
is diagonalizable; so suppose that H is a nonzero symmetric bilinear form on
V. By the lemma, there exists a nonzero vector x in V such that $H(x, x) \ne 0$.
Recall the function $L_x\colon V \to F$ defined by $L_x(y) = H(x, y)$ for all $y \in V$. By
property 1 on page 423, $L_x$ is linear. Furthermore, since $L_x(x) = H(x, x) \ne 0$,
$L_x$ is nonzero. Consequently, $\operatorname{rank}(L_x) = 1$, and hence $\dim(N(L_x)) = n - 1$.
The restriction of H to $N(L_x)$ is obviously a symmetric bilinear form on
a vector space of dimension $n - 1$. Thus, by the induction hypothesis, there
exists an ordered basis $\{v_1, v_2, \ldots, v_{n-1}\}$ for $N(L_x)$ such that $H(v_i, v_j) = 0$
for $i \ne j$ ($1 \le i, j \le n - 1$). Set $v_n = x$. Then $v_n \notin N(L_x)$, and so
$\beta = \{v_1, v_2, \ldots, v_n\}$ is an ordered basis for V. In addition, $H(v_i, v_n) =
H(v_n, v_i) = 0$ for $i = 1, 2, \ldots, n - 1$. We conclude that $\psi_\beta(H)$ is a diagonal
matrix, and therefore H is diagonalizable.
Corollary. Let F be a field that is not of characteristic two. If $A \in
M_{n \times n}(F)$ is a symmetric matrix, then A is congruent to a diagonal matrix.
Proof. Exercise.

Diagonalization of Symmetric Matrices
Let A be a symmetric $n \times n$ matrix with entries from a field F not of
characteristic two. By the corollary to Theorem 6.35, there are matrices
$Q, D \in M_{n \times n}(F)$ such that Q is invertible, D is diagonal, and $Q^t A Q = D$. We
now give a method for computing Q and D. This method requires familiarity
with elementary matrices and their properties, which the reader may wish to
review in Section 3.1.
If E is an elementary $n \times n$ matrix, then AE can be obtained by performing
an elementary column operation on A. By Exercise 21, $E^t A$ can be obtained
by performing the same operation on the rows of A rather than on its columns.
Thus $E^t A E$ can be obtained from A by performing an elementary operation
on the columns of A and then performing the same operation on the rows
of AE. (Note that the order of the operations can be reversed because of
the associative property of matrix multiplication.) Suppose that Q is an
invertible matrix and D is a diagonal matrix such that $Q^t A Q = D$. By
Corollary 3 to Theorem 3.6 (p. 159), Q is a product of elementary matrices,
say $Q = E_1E_2\cdots E_k$. Thus
$$D = Q^t A Q = E_k^tE_{k-1}^t\cdots E_1^t A E_1E_2\cdots E_k.$$
From the preceding equation, we conclude that by means of several elementary
column operations and the corresponding row operations, A can be transformed
into a diagonal matrix D. Furthermore, if $E_1, E_2, \ldots, E_k$ are the
elementary matrices corresponding to these elementary column operations indexed
in the order performed, and if $Q = E_1E_2\cdots E_k$, then $Q^t A Q = D$.
Example 6
Let A be the symmetric matrix in $M_{3 \times 3}(R)$ defined by
$$A = \begin{pmatrix} 1 & -1 & 3 \\ -1 & 2 & 1 \\ 3 & 1 & 1 \end{pmatrix}.$$
We use the procedure just described to find an invertible matrix Q and a
diagonal matrix D such that $Q^t A Q = D$.
We begin by eliminating all of the nonzero entries in the first row and
first column except for the entry in column 1 and row 1. To this end, we
add the first column of A to the second column to produce a zero in row 1
and column 2. The elementary matrix that corresponds to this elementary
column operation is
$$E_1 = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
We perform the corresponding elementary operation on the rows of $AE_1$ to
obtain
$$E_1^t A E_1 = \begin{pmatrix} 1 & 0 & 3 \\ 0 & 1 & 4 \\ 3 & 4 & 1 \end{pmatrix}.$$
We now use the first column of $E_1^t A E_1$ to eliminate the 3 in row 1 column 3,
and follow this operation with the corresponding row operation. The corresponding
elementary matrix $E_2$ and the result of the elementary operations
$E_2^tE_1^t A E_1E_2$ are, respectively,
$$E_2 = \begin{pmatrix} 1 & 0 & -3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad\text{and}\quad
E_2^tE_1^t A E_1E_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 4 \\ 0 & 4 & -8 \end{pmatrix}.$$
Finally, we subtract 4 times the second column of $E_2^tE_1^t A E_1E_2$ from the
third column and follow this with the corresponding row operation. The corresponding
elementary matrix $E_3$ and the result of the elementary operations
$E_3^tE_2^tE_1^t A E_1E_2E_3$ are, respectively,
$$E_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -4 \\ 0 & 0 & 1 \end{pmatrix} \quad\text{and}\quad
E_3^tE_2^tE_1^t A E_1E_2E_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -24 \end{pmatrix}.$$
Since we have obtained a diagonal matrix, the process is complete. So we let
$$Q = E_1E_2E_3 = \begin{pmatrix} 1 & 1 & -7 \\ 0 & 1 & -4 \\ 0 & 0 & 1 \end{pmatrix} \quad\text{and}\quad
D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -24 \end{pmatrix}$$
to obtain the desired diagonalization $Q^t A Q = D$. •
The reader should justify the following method for computing Q without
recording each elementary matrix separately. The method is inspired by the
algorithm for computing the inverse of a matrix developed in Section 3.2.
We use a sequence of elementary column operations and corresponding row
operations to change the $n \times 2n$ matrix $(A|I)$ into the form $(D|B)$, where D
is a diagonal matrix and $B = Q^t$. It then follows that $D = Q^t A Q$.
Starting with the matrix A of the preceding example, this method produces
a sequence of matrices beginning with
$$(A|I) = \begin{pmatrix} 1 & -1 & 3 & 1 & 0 & 0 \\ -1 & 2 & 1 & 0 & 1 & 0 \\ 3 & 1 & 1 & 0 & 0 & 1 \end{pmatrix}$$
and ending with $(D|Q^t)$.

Therefore
$$B = Q^t = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ -7 & -4 & 1 \end{pmatrix} \quad\text{and}\quad
Q = \begin{pmatrix} 1 & 1 & -7 \\ 0 & 1 & -4 \\ 0 & 0 & 1 \end{pmatrix}.$$
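The matrices Q and D produced by this procedure are easy to check; the following sketch assumes NumPy.

```python
# Checking Example 6 (assumes NumPy): the matrix Q obtained from the column/row
# operations satisfies Q^t A Q = D = diag(1, 1, -24).
import numpy as np

A = np.array([[1.0, -1.0, 3.0],
              [-1.0, 2.0, 1.0],
              [3.0, 1.0, 1.0]])
Q = np.array([[1.0, 1.0, -7.0],
              [0.0, 1.0, -4.0],
              [0.0, 0.0, 1.0]])
assert np.allclose(Q.T @ A @ Q, np.diag([1.0, 1.0, -24.0]))
```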
Quadratic Forms
Associated with symmetric bilinear forms are functions called quadratic
forms.
Definition. Let V be a vector space over F. A function $K\colon V \to F$ is
called a quadratic form if there exists a symmetric bilinear form $H \in \mathcal{B}(V)$
such that
$$K(x) = H(x, x) \quad\text{for all } x \in V. \qquad (8)$$
If the field F is not of characteristic two, there is a one-to-one correspondence
between symmetric bilinear forms and quadratic forms given by (8).
In fact, if K is a quadratic form on a vector space V over a field F not of
characteristic two, and $K(x) = H(x, x)$ for some symmetric bilinear form H
on V, then we can recover H from K because
$$H(x, y) = \frac{1}{2}\bigl[K(x + y) - K(x) - K(y)\bigr]. \qquad (9)$$
(See Exercise 16.)
Example 7
The classic example of a quadratic form is the homogeneous second-degree
polynomial of several variables. Given the variables $t_1, t_2, \ldots, t_n$ that take
values in a field F not of characteristic two and given (not necessarily distinct)
scalars $a_{ij}$ ($1 \le i \le j \le n$), define the polynomial
$$f(t_1, t_2, \ldots, t_n) = \sum_{i \le j} a_{ij}t_it_j.$$
Any such polynomial is a quadratic form. In fact, if $\beta$ is the standard ordered
basis for $F^n$, then the symmetric bilinear form H corresponding to the
quadratic form f has the matrix representation $\psi_\beta(H) = A$, where
$$A_{ij} = A_{ji} = \begin{cases} a_{ii} & \text{if } i = j \\ \tfrac{1}{2}a_{ij} & \text{if } i < j. \end{cases}$$
To see this, apply (9) to obtain $H(e_i, e_j) = A_{ij}$ from the quadratic form f,
and verify that f is computable from H by (8) using f in place of K.
For example, given the polynomial
$$f(t_1, t_2, t_3) = 2t_1^2 - t_2^2 + 6t_1t_2 - 4t_2t_3$$
with real coefficients, let
$$A = \begin{pmatrix} 2 & 3 & 0 \\ 3 & -1 & -2 \\ 0 & -2 & 0 \end{pmatrix}.$$
Setting $H(x, y) = x^t A y$ for all $x, y \in R^3$, we see that
$$f(t_1, t_2, t_3) = (t_1, t_2, t_3)\,A\begin{pmatrix} t_1 \\ t_2 \\ t_3 \end{pmatrix}. \quad\bullet$$
Quadratic Forms Over the Field R
Since symmetric matrices over R are orthogonally diagonalizable (see Theorem
6.20, p. 384), the theory of symmetric bilinear forms and quadratic forms
on finite-dimensional vector spaces over R is especially nice. The following
theorem and its corollary are useful.
Theorem 6.36. Let V be a finite-dimensional real inner product space,
and let H be a symmetric bilinear form on V. Then there exists an orthonormal
basis $\beta$ for V such that $\psi_\beta(H)$ is a diagonal matrix.
Proof. Choose any orthonormal basis $\gamma = \{v_1, v_2, \ldots, v_n\}$ for V, and let
$A = \psi_\gamma(H)$. Since A is symmetric, there exists an orthogonal matrix Q
and a diagonal matrix D such that $D = Q^t A Q$ by Theorem 6.20. Let $\beta =
\{w_1, w_2, \ldots, w_n\}$ be defined by
$$w_j = \sum_{i=1}^{n} Q_{ij}v_i \quad\text{for } 1 \le j \le n.$$
By Theorem 6.33, $\psi_\beta(H) = D$. Furthermore, since Q is orthogonal and $\gamma$ is
orthonormal, $\beta$ is orthonormal by Exercise 30 of Section 6.5.

Corollary. Let K be a quadratic form on a finite-dimensional real inner
product space V. There exists an orthonormal basis $\beta = \{v_1, v_2, \ldots, v_n\}$ for
V and scalars $\lambda_1, \lambda_2, \ldots, \lambda_n$ (not necessarily distinct) such that if $x \in V$ and
$$x = \sum_{i=1}^{n} s_iv_i, \quad s_i \in R,$$
then
$$K(x) = \sum_{i=1}^{n} \lambda_is_i^2.$$
In fact, if H is the symmetric bilinear form determined by K, then $\beta$ can
be chosen to be any orthonormal basis for V such that $\psi_\beta(H)$ is a diagonal
matrix.
Proof. Let H be the symmetric bilinear form for which $K(x) = H(x, x)$
for all $x \in V$. By Theorem 6.36, there exists an orthonormal basis $\beta =
\{v_1, v_2, \ldots, v_n\}$ for V such that $\psi_\beta(H)$ is the diagonal matrix
$$D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.$$
Let $x \in V$, and suppose that $x = \sum_{i=1}^{n} s_iv_i$. Then
$$K(x) = H(x, x) = [\phi_\beta(x)]^tD[\phi_\beta(x)] = (s_1, s_2, \ldots, s_n)\,D\begin{pmatrix} s_1 \\ s_2 \\ \vdots \\ s_n \end{pmatrix} = \sum_{i=1}^{n} \lambda_is_i^2.$$
Example 8
For the homogeneous real polynomial of degree 2 defined by
$$f(t_1, t_2) = 5t_1^2 + 2t_2^2 + 4t_1t_2, \qquad (10)$$
we find an orthonormal basis $\gamma = \{v_1, v_2\}$ for $R^2$ and scalars $\lambda_1$ and $\lambda_2$ such
that if
$$\begin{pmatrix} t_1 \\ t_2 \end{pmatrix} \in R^2 \quad\text{and}\quad \begin{pmatrix} t_1 \\ t_2 \end{pmatrix} = s_1v_1 + s_2v_2,$$
then $f(t_1, t_2) = \lambda_1s_1^2 + \lambda_2s_2^2$. We can think of $s_1$ and $s_2$ as the coordinates of
$(t_1, t_2)$ relative to $\gamma$. Thus the polynomial $f(t_1, t_2)$, as an expression involving
the coordinates of a point with respect to the standard ordered basis for $R^2$,
is transformed into a new polynomial $g(s_1, s_2) = \lambda_1s_1^2 + \lambda_2s_2^2$ interpreted as
an expression involving the coordinates of a point relative to the new ordered
basis $\gamma$.
Let H denote the symmetric bilinear form corresponding to the quadratic
form defined by (10), let $\beta$ be the standard ordered basis for $R^2$, and let
$A = \psi_\beta(H)$. Then
$$A = \begin{pmatrix} 5 & 2 \\ 2 & 2 \end{pmatrix}.$$
Next, we find an orthogonal matrix Q such that $Q^t A Q$ is a diagonal matrix.
For this purpose, observe that $\lambda_1 = 6$ and $\lambda_2 = 1$ are the eigenvalues of A
with corresponding orthonormal eigenvectors
$$v_1 = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 \\ 1 \end{pmatrix} \quad\text{and}\quad v_2 = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ -2 \end{pmatrix}.$$
Let $\gamma = \{v_1, v_2\}$. Then $\gamma$ is an orthonormal basis for $R^2$ consisting of eigenvectors
of A. Hence, setting
$$Q = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 & 1 \\ 1 & -2 \end{pmatrix},$$
we see that Q is an orthogonal matrix and
$$Q^t A Q = \begin{pmatrix} 6 & 0 \\ 0 & 1 \end{pmatrix}.$$
Clearly Q is also a change of coordinate matrix. Consequently,
$$\psi_\gamma(H) = Q^t\psi_\beta(H)Q = Q^t A Q = \begin{pmatrix} 6 & 0 \\ 0 & 1 \end{pmatrix}.$$
Thus by the corollary to Theorem 6.36,
$$K(x) = 6s_1^2 + s_2^2$$
for any $x = s_1v_1 + s_2v_2 \in R^2$. So $g(s_1, s_2) = 6s_1^2 + s_2^2$. •
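The change of variables in Example 8 can be carried out numerically as well; a sketch assuming NumPy follows.

```python
# Diagonalizing the quadratic form f(t1, t2) = 5 t1^2 + 2 t2^2 + 4 t1 t2 with an
# orthogonal change of coordinates (assumes NumPy).
import numpy as np

A = np.array([[5.0, 2.0],
              [2.0, 2.0]])                    # matrix of the corresponding bilinear form
eigvals, Q = np.linalg.eigh(A)                # columns of Q: orthonormal eigenvectors
print(eigvals)                                # [1, 6]; eigh lists eigenvalues in ascending order
assert np.allclose(Q.T @ A @ Q, np.diag(eigvals))

t = np.array([0.3, -1.7])                     # an arbitrary test point
s = Q.T @ t                                   # its coordinates relative to the eigenvector basis
assert np.isclose(t @ A @ t, np.sum(eigvals * s**2))
```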
The next example illustrates how the theory of quadratic forms can be
applied to the problem of describing quadratic surfaces in R3.
Example 9
Let $\mathcal{S}$ be the surface in $R^3$ defined by the equation
$$2t_1^2 + 6t_1t_2 + 5t_2^2 - 2t_2t_3 + 2t_3^2 + 3t_1 - 2t_2 - t_3 + 14 = 0. \qquad (11)$$
Then (11) describes the points of $\mathcal{S}$ in terms of their coordinates relative to $\beta$,
the standard ordered basis for $R^3$. We find a new orthonormal basis $\gamma$ for $R^3$
so that the equation describing the coordinates of $\mathcal{S}$ relative to $\gamma$ is simpler
than (11).
We begin with the observation that the terms of second degree on the left
side of (11) add to form a quadratic form K on $R^3$:
$$K\begin{pmatrix} t_1 \\ t_2 \\ t_3 \end{pmatrix} = 2t_1^2 + 6t_1t_2 + 5t_2^2 - 2t_2t_3 + 2t_3^2.$$
Next, we diagonalize K. Let H be the symmetric bilinear form corresponding
to K, and let $A = \psi_\beta(H)$. Then
$$A = \begin{pmatrix} 2 & 3 & 0 \\ 3 & 5 & -1 \\ 0 & -1 & 2 \end{pmatrix}.$$
The characteristic polynomial of A is $(-1)(t - 2)(t - 7)t$; hence A has the
eigenvalues $\lambda_1 = 2$, $\lambda_2 = 7$, and $\lambda_3 = 0$. Corresponding unit eigenvectors are
$$v_1 = \frac{1}{\sqrt{10}}\begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix}, \quad
v_2 = \frac{1}{\sqrt{35}}\begin{pmatrix} 3 \\ 5 \\ -1 \end{pmatrix}, \quad\text{and}\quad
v_3 = \frac{1}{\sqrt{14}}\begin{pmatrix} -3 \\ 2 \\ 1 \end{pmatrix}.$$
Set $\gamma = \{v_1, v_2, v_3\}$ and
$$Q = \begin{pmatrix} \tfrac{1}{\sqrt{10}} & \tfrac{3}{\sqrt{35}} & \tfrac{-3}{\sqrt{14}} \\ 0 & \tfrac{5}{\sqrt{35}} & \tfrac{2}{\sqrt{14}} \\ \tfrac{3}{\sqrt{10}} & \tfrac{-1}{\sqrt{35}} & \tfrac{1}{\sqrt{14}} \end{pmatrix}.$$
As in Example 8, Q is a change of coordinate matrix changing $\gamma$-coordinates
to $\beta$-coordinates, and
$$\psi_\gamma(H) = Q^t\psi_\beta(H)Q = Q^t A Q = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 7 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$
By the corollary to Theorem 6.36, if $x = s_1v_1 + s_2v_2 + s_3v_3$, then
$$K(x) = 2s_1^2 + 7s_2^2. \qquad (12)$$

Figure 6.7
We are now ready to transform (11) into an equation involving coordinates
relative to $\gamma$. Let $x = (t_1, t_2, t_3) \in R^3$, and suppose that $x = s_1v_1 + s_2v_2 + s_3v_3$.
Then, by Theorem 2.22 (p. 111),
$$\begin{pmatrix} t_1 \\ t_2 \\ t_3 \end{pmatrix} = Q\begin{pmatrix} s_1 \\ s_2 \\ s_3 \end{pmatrix},$$
and therefore
$$t_1 = \frac{s_1}{\sqrt{10}} + \frac{3s_2}{\sqrt{35}} - \frac{3s_3}{\sqrt{14}}, \quad
t_2 = \frac{5s_2}{\sqrt{35}} + \frac{2s_3}{\sqrt{14}}, \quad\text{and}\quad
t_3 = \frac{3s_1}{\sqrt{10}} - \frac{s_2}{\sqrt{35}} + \frac{s_3}{\sqrt{14}}.$$
Thus
$$3t_1 - 2t_2 - t_3 = -\frac{14s_3}{\sqrt{14}} = -\sqrt{14}\,s_3.$$
Combining (11), (12), and the preceding equation, we conclude that if $x \in R^3$
and $x = s_1v_1 + s_2v_2 + s_3v_3$, then $x \in \mathcal{S}$ if and only if
$$2s_1^2 + 7s_2^2 - \sqrt{14}\,s_3 + 14 = 0 \quad\text{or}\quad s_3 = \frac{2}{\sqrt{14}}s_1^2 + \frac{7}{\sqrt{14}}s_2^2 + \sqrt{14}.$$
Consequently, if we draw new axes $x'$, $y'$, and $z'$ in the directions of $v_1$, $v_2$,
and $v_3$, respectively, the graph of the equation, rewritten as
$$z' = \frac{\sqrt{14}}{7}(x')^2 + \frac{\sqrt{14}}{2}(y')^2 + \sqrt{14},$$
coincides with the surface $\mathcal{S}$. We recognize $\mathcal{S}$ to be an elliptic paraboloid.
Figure 6.7 is a sketch of the surface $\mathcal{S}$ drawn so that the vectors $v_1$, $v_2$, and
$v_3$ are oriented to lie in the principal directions. For practical purposes, the
scale of the $z'$ axis has been adjusted so that the figure fits the page. •
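The eigenvalue computation that drives Example 9 can be reproduced numerically; the sketch below assumes NumPy.

```python
# The eigenvalues of the matrix of the quadratic part of (11) (assumes NumPy):
# one zero and two positive eigenvalues, consistent with the elliptic paraboloid
# identified in Example 9.
import numpy as np

A = np.array([[2.0, 3.0, 0.0],
              [3.0, 5.0, -1.0],
              [0.0, -1.0, 2.0]])
print(np.round(np.linalg.eigvalsh(A), 10))   # [0, 2, 7]
```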
The Second Derivative Test for Functions of Several Variables
We now consider an application of the theory of quadratic forms to mul­
tivariable calculus—the derivation of the second derivative test for local ex-
trema of a function of several variables. We assume an acquaintance with the
calculus of functions of several variables to the extent of Taylor's theorem.
The reader is undoubtedly familiar with the one-variable version of Taylor's
theorem. For a statement and proof of the multivariable version, consult, for
example, An Introduction to Analysis 2d ed, by William R. Wade (Prentice
Hall, Upper Saddle River, N.J., 2000).
Let z = f(t_1, t_2, ..., t_n) be a fixed real-valued function of n real variables for which all third-order partial derivatives exist and are continuous. The function f is said to have a local maximum at a point p ∈ R^n if there exists a δ > 0 such that f(p) ≥ f(x) whenever ||x − p|| < δ. Likewise, f has a local minimum at p ∈ R^n if there exists a δ > 0 such that f(p) ≤ f(x) whenever ||x − p|| < δ. If f has either a local minimum or a local maximum at p, we say that f has a local extremum at p. A point p ∈ R^n is called a critical point of f if ∂f(p)/∂t_i = 0 for i = 1, 2, ..., n. It is a well-known fact that if f has a local extremum at a point p ∈ R^n, then p is a critical point of f. For, if f has a local extremum at p = (p_1, p_2, ..., p_n), then for any i = 1, 2, ..., n the

function φ_i defined by φ_i(t) = f(p_1, p_2, ..., p_{i−1}, t, p_{i+1}, ..., p_n) has a local extremum at t = p_i. So, by an elementary single-variable argument,

    ∂f(p)/∂t_i = dφ_i(p_i)/dt = 0.

Thus p is a critical point of f. But critical points are not necessarily local extrema.
The second-order partial derivatives of / at a critical point p can often
be used to test for a local extremum at p. These partials determine a matrix
A(p) in which the row i, column j entry is
    ∂^2 f(p) / (∂t_i ∂t_j).
This matrix is called the Hessian matrix of / at p. Note that if the third-
order partial derivatives of / are continuous, then the mixed second-order
partials of / at p are independent of the order in which they are taken, and
hence A(p) is a symmetric matrix. In this case, all of the eigenvalues of A(p)
are real.
Theorem 6.37 (The Second Derivative Test). Let f(t_1, t_2, ..., t_n) be a real-valued function in n real variables for which all third-order partial derivatives exist and are continuous. Let p = (p_1, p_2, ..., p_n) be a critical point of f, and let A(p) be the Hessian of f at p.
(a) If all eigenvalues of A(p) are positive, then f has a local minimum at p.
(b) If all eigenvalues of A(p) are negative, then f has a local maximum at p.
(c) If A(p) has at least one positive and at least one negative eigenvalue, then f has no local extremum at p (p is called a saddle-point of f).
(d) If rank(A(p)) < n and A(p) does not have both positive and negative eigenvalues, then the second derivative test is inconclusive.
Proof. If p ≠ 0, we may define a function g: R^n → R by

    g(t_1, t_2, ..., t_n) = f(t_1 + p_1, t_2 + p_2, ..., t_n + p_n) − f(p).

The following facts are easily verified.
1. The function f has a local maximum [minimum] at p if and only if g has a local maximum [minimum] at 0 = (0, 0, ..., 0).
2. The partial derivatives of g at 0 are equal to the corresponding partial derivatives of f at p.
3. 0 is a critical point of g.

In view of these facts, we may assume without loss of generality that p = 0
and f(p) = 0.
Now we apply Taylor's theorem to f to obtain the second-order approximation of f around 0. We have

    f(t_1, t_2, ..., t_n) = f(0) + Σ_{i=1}^{n} (∂f(0)/∂t_i) t_i + (1/2) Σ_{i,j=1}^{n} (∂^2 f(0)/(∂t_i ∂t_j)) t_i t_j + S(t_1, t_2, ..., t_n)

                          = (1/2) Σ_{i,j=1}^{n} (∂^2 f(0)/(∂t_i ∂t_j)) t_i t_j + S(t_1, t_2, ..., t_n),        (13)

where S is a real-valued function on R^n such that

    lim_{x→0} S(x)/||x||^2 = lim_{(t_1,...,t_n)→0} S(t_1, ..., t_n)/(t_1^2 + t_2^2 + ⋯ + t_n^2) = 0.        (14)

Let K: R^n → R be the quadratic form defined by

    K(t_1, t_2, ..., t_n) = (1/2) Σ_{i,j=1}^{n} (∂^2 f(0)/(∂t_i ∂t_j)) t_i t_j,        (15)

let H be the symmetric bilinear form corresponding to K, and let β be the standard ordered basis for R^n. It is easy to verify that ψ_β(H) = (1/2)A(p). Since A(p) is symmetric, Theorem 6.20 (p. 384) implies that there exists an orthogonal matrix Q such that

    Q^t A(p) Q = ( λ_1   0   ⋯   0
                    0   λ_2  ⋯   0
                    ⋮    ⋮   ⋱   ⋮
                    0    0   ⋯  λ_n ),

a diagonal matrix whose diagonal entries are the eigenvalues of A(p). Let γ = {v_1, v_2, ..., v_n} be the orthonormal basis for R^n whose ith vector is the ith column of Q. Then Q is the change of coordinate matrix changing γ-coordinates into β-coordinates, and by Theorem 6.33

    ψ_γ(H) = Q^t ψ_β(H) Q = (1/2) Q^t A(p) Q = (1/2) ( λ_1  ⋯   0
                                                        ⋮   ⋱   ⋮
                                                        0   ⋯  λ_n ).

Suppose that A(p) is not the zero matrix. Then A(p) has nonzero eigenvalues. Choose ε > 0 such that ε < |λ_i|/2 for all λ_i ≠ 0. By (14), there exists δ > 0 such that for any x ∈ R^n satisfying 0 < ||x|| < δ, we have |S(x)| < ε||x||^2. Consider any x ∈ R^n such that 0 < ||x|| < δ. Then, by (13) and (15),

    |f(x) − K(x)| = |S(x)| < ε||x||^2,

and hence

    K(x) − ε||x||^2 < f(x) < K(x) + ε||x||^2.        (16)

Suppose that x = Σ_{i=1}^{n} s_i v_i. Then

    ||x||^2 = Σ_{i=1}^{n} s_i^2    and    K(x) = (1/2) Σ_{i=1}^{n} λ_i s_i^2.

Combining these equations with (16), we obtain

    Σ_{i=1}^{n} ((1/2)λ_i − ε) s_i^2 < f(x) < Σ_{i=1}^{n} ((1/2)λ_i + ε) s_i^2.        (17)

Now suppose that all eigenvalues of A(p) are positive. Then (1/2)λ_i − ε > 0 for all i, and hence, by the left inequality in (17),

    f(0) = 0 < Σ_{i=1}^{n} ((1/2)λ_i − ε) s_i^2 < f(x).

Thus f(0) < f(x) for 0 < ||x|| < δ, and so f has a local minimum at 0. By a similar argument using the right inequality in (17), we have that if all of the eigenvalues of A(p) are negative, then f has a local maximum at 0. This establishes (a) and (b) of the theorem.

Next, suppose that A(p) has both a positive and a negative eigenvalue, say, λ_i > 0 and λ_j < 0 for some i and j. Then (1/2)λ_i − ε > 0 and (1/2)λ_j + ε < 0. Let s be any real number such that 0 < |s| < δ. Substituting x = sv_i and x = sv_j into the left inequality and the right inequality of (17), respectively, we obtain

    f(0) = 0 < ((1/2)λ_i − ε)s^2 < f(sv_i)    and    f(sv_j) < ((1/2)λ_j + ε)s^2 < 0 = f(0).
Thus / attains both positive and negative values arbitrarily close to 0; so /
has neither a local maximum nor a local minimum at 0. This establishes (c).

To show that the second derivative test is inconclusive under the conditions stated in (d), consider the functions

    f(t_1, t_2) = t_1^2 − t_2^4    and    g(t_1, t_2) = t_1^2 + t_2^4

at p = 0. In both cases, the function has a critical point at p, and

    A(p) = ( 2  0
             0  0 ).

However, f does not have a local extremum at 0, whereas g has a local minimum at 0.
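As a numerical illustration of Theorem 6.37, the following Python sketch (an illustration only, not from the text; it uses NumPy and a finite-difference Hessian whose step size h is an assumption) classifies a critical point from the signs of the Hessian's eigenvalues. Applied to the two functions above, it correctly reports that the test is inconclusive in both cases.

    import numpy as np

    # Sketch: the second derivative test via a finite-difference Hessian.
    def hessian(f, p, h=1e-5):
        n = len(p)
        H = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                ei, ej = np.eye(n)[i], np.eye(n)[j]
                H[i, j] = (f(p + h*ei + h*ej) - f(p + h*ei - h*ej)
                           - f(p - h*ei + h*ej) + f(p - h*ei - h*ej)) / (4*h*h)
        return H

    def classify(H, tol=1e-6):
        eig = np.linalg.eigvalsh((H + H.T) / 2)
        if np.all(eig > tol):
            return "local minimum"
        if np.all(eig < -tol):
            return "local maximum"
        if np.any(eig > tol) and np.any(eig < -tol):
            return "saddle point"
        return "inconclusive"

    f = lambda t: t[0]**2 - t[1]**4
    g = lambda t: t[0]**2 + t[1]**4
    print(classify(hessian(f, np.zeros(2))))   # inconclusive (eigenvalues 2 and 0)
    print(classify(hessian(g, np.zeros(2))))   # inconclusive (eigenvalues 2 and 0)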
Sylvester's Law of Inertia
Any two matrix representations of a bilinear form have the same rank
because rank is preserved under congruence. We can therefore define the
rank of a bilinear form to be the rank of any of its matrix representations.
If a matrix representation is a diagonal matrix, then the rank is equal to the
number of nonzero diagonal entries of the matrix.
We confine our analysis to symmetric bilinear forms on finite-dimensional
real vector spaces. Each such form has a diagonal matrix representation in
which the diagonal entries may be positive, negative, or zero. Although these
entries are not unique, we show that the number of entries that are positive
and the number that are negative are unique. That is, they are independent
of the choice of diagonal representation. This result is called Sylvester's law
of inertia. We prove the law and apply it to describe the equivalence classes
of congruent symmetric real matrices.
Theorem 6.38 (Sylvester's Law of Inertia). Let 77 be a symmetric
bilinear form on a finite-dimensional real vector space V. Then the number of
positive diagonal entries and the number of negative diagonal entries in any
diagonal matrix representation of H are each independent of the diagonal
representation.
Proof. Suppose that β and γ are ordered bases for V that determine diagonal representations of H. Without loss of generality, we may assume that β and γ are ordered so that on each diagonal the entries are in the order of positive, negative, and zero. It suffices to show that both representations have the same number of positive entries because the number of negative entries is equal to the difference between the rank and the number of positive entries. Let p and q be the number of positive diagonal entries in the matrix representations of H with respect to β and γ, respectively. We suppose that p ≠ q and arrive at a contradiction. Without loss of generality, assume that p < q. Let

    β = {v_1, v_2, ..., v_p, ..., v_r, ..., v_n}    and    γ = {w_1, w_2, ..., w_q, ..., w_r, ..., w_n},

where r is the rank of H and n = dim(V). Let L: V → R^{p+r−q} be the mapping defined by

    L(x) = (H(x, v_1), H(x, v_2), ..., H(x, v_p), H(x, w_{q+1}), ..., H(x, w_r)).

It is easily verified that L is linear and rank(L) ≤ p + r − q. Hence

    nullity(L) ≥ n − (p + r − q) > n − r.

So there exists a nonzero vector v_0 such that v_0 ∉ span({v_{r+1}, v_{r+2}, ..., v_n}), but v_0 ∈ N(L). Since v_0 ∈ N(L), it follows that H(v_0, v_i) = 0 for i ≤ p and H(v_0, w_i) = 0 for q < i ≤ r. Suppose that

    v_0 = Σ_{j=1}^{n} a_j v_j = Σ_{j=1}^{n} b_j w_j.

For any i ≤ p,

    H(v_0, v_i) = H( Σ_{j=1}^{n} a_j v_j, v_i ) = Σ_{j=1}^{n} a_j H(v_j, v_i) = a_i H(v_i, v_i).

But for i ≤ p, we have H(v_i, v_i) > 0 and H(v_0, v_i) = 0, so that a_i = 0. Similarly, b_i = 0 for q + 1 ≤ i ≤ r. Since v_0 is not in the span of {v_{r+1}, v_{r+2}, ..., v_n}, it follows that a_i ≠ 0 for some p < i ≤ r. Thus

    H(v_0, v_0) = H( Σ_{j=1}^{n} a_j v_j, Σ_{j=1}^{n} a_j v_j ) = Σ_{j=1}^{n} a_j^2 H(v_j, v_j) = Σ_{j=p+1}^{r} a_j^2 H(v_j, v_j) < 0.

Furthermore,

    H(v_0, v_0) = H( Σ_{j=1}^{n} b_j w_j, Σ_{j=1}^{n} b_j w_j ) = Σ_{j=1}^{n} b_j^2 H(w_j, w_j) = Σ_{j=1}^{q} b_j^2 H(w_j, w_j) ≥ 0.

So H(v_0, v_0) < 0 and H(v_0, v_0) ≥ 0, which is a contradiction. We conclude that p = q.
Definitions. The number of positive diagonal entries in a diagonal
representation of a symmetric bilinear form on a real vector space is called
the index of the form. The difference between the number of positive and
the number of negative diagonal entries in a diagonal representation of a
symmetric bilinear form is called the signature of the form. The three terms
rank, index, and signature are called the invariants of the bilinear form
because they are invariant with respect to matrix representations. These
same terms apply to the associated quadratic form. Notice that the values of
any two of these invariants determine the value of the third.

Example 10
The bilinear form corresponding to the quadratic form K of Example 9 has
a 3 x 3 diagonal matrix representation with diagonal entries of 2, 7, and 0.
Therefore the rank, index, and signature of K are each 2. •
Example 11
The matrix representation of the bilinear form corresponding to the quadratic
form K(x, y) = x^2 − y^2 on R^2 with respect to the standard ordered basis is the diagonal matrix with diagonal entries 1 and −1. Therefore the rank of
K is 2, the index of K is 1, and the signature of K is 0. •
Since the congruence relation is intimately associated with bilinear forms,
we can apply Sylvester's law of inertia to study this relation on the set of real
symmetric matrices. Let A be an n × n real symmetric matrix, and suppose that D and E are each diagonal matrices congruent to A. By Corollary 3 to Theorem 6.32, A is the matrix representation of the bilinear form H on R^n defined by H(x, y) = x^t A y with respect to the standard ordered basis for
Rn. Therefore Sylvester's law of inertia tells us that D and E have the same
number of positive and negative diagonal entries. We can state this result as
the matrix version of Sylvester's law.
Corollary 1 (Sylvester's Law of Inertia for Matrices). Let A be
a real symmetric matrix. Then the number of positive diagonal entries and
the number of negative diagonal entries in any diagonal matrix congruent to
A is independent of the choice of the diagonal matrix.
Definitions. Let A be a real symmetric matrix, and let D be a diagonal
matrix that is congruent to A. The number of positive diagonal entries of
D is called the index of A. The difference between the number of positive
diagonal entries and the number of negative diagonal entries of D is called
the signature of A. As before, the rank, index, and signature of a matrix
are called the invariants of the matrix, and the values of any two of these
invariants determine the value of the third.
Any two of these invariants can be used to determine an equivalence class
of congruent real symmetric matrices.
Corollary 2. Two real symmetric n x n matrices are congruent if and
only if they have the same invariants.
Proof. If A and B are congruent n x n symmetric matrices, then they are
both congruent to the same diagonal matrix, and it follows that they have
the same invariants.
Conversely, suppose that A and B are nx n symmetric matrices with the
same invariants. Let D and E be diagonal matrices congruent to A and B,

respectively, chosen so that the diagonal entries are in the order of positive,
negative, and zero. (Exercise 23 allows us to do this.) Since A and B have
the same invariants, so do D and E. Let p and r denote the index and the
rank, respectively, of both D and E. Let di denote the ith diagonal entry
of D, and let Q be the n x n diagonal matrix whose ith diagonal entry qi is
given by
    q_i = 1/√(d_i)      if 1 ≤ i ≤ p,
    q_i = 1/√(−d_i)     if p < i ≤ r,
    q_i = 1             if r < i.

Then Q^t D Q = J_{pr}, where

    J_{pr} = ( I_p       O        O
                O     −I_{r−p}    O
                O        O        O ).

It follows that A is congruent to J_{pr}. Similarly, B is congruent to J_{pr}, and hence A is congruent to B.
The matrix Jpr acts as a canonical form for the theory of real symmet­
ric matrices. The next corollary, whose proof is contained in the proof of
Corollary 2, describes the role of Jpr.
Corollary 3. A real symmetric n x n matrix A has index p and rank r
if and only if A is congruent to Jpr (as just defined).
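Because a real symmetric matrix A is congruent to the diagonal matrix of its eigenvalues (Q^t A Q is diagonal for an orthogonal Q), the invariants can be read from the signs of the eigenvalues. The following Python sketch (not from the text; the sample matrix is an arbitrary choice) computes the rank, index, and signature this way and forms the canonical matrix J_{pr} of Corollary 3.

    import numpy as np

    # Sketch: invariants of a real symmetric matrix from the signs of its eigenvalues,
    # together with the canonical form J_pr of Corollary 3.
    def invariants(A, tol=1e-10):
        eig = np.linalg.eigvalsh(A)
        p = int(np.sum(eig > tol))           # index: number of positive eigenvalues
        r = int(np.sum(np.abs(eig) > tol))   # rank:  number of nonzero eigenvalues
        return p, r, 2*p - r                 # signature = (# positive) - (# negative)

    def J(p, r, n):
        return np.diag([1.0]*p + [-1.0]*(r - p) + [0.0]*(n - r))

    A = np.array([[1.0, 1.0, 3.0],
                  [1.0, 2.0, 1.0],
                  [3.0, 1.0, 1.0]])
    p, r, s = invariants(A)
    print(p, r, s)      # 2 3 1  (index 2, rank 3, signature 1)
    print(J(p, r, 3))   # A is congruent to this matrix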
Example 12
Let

    A = ( 1  1  3
          1  2  1
          3  1  1 ),

and let B and C be two further 3 × 3 real symmetric matrices.
We apply Corollary 2 to determine which pairs of the matrices A, B, and C
are congruent.
The matrix A is the 3 × 3 matrix of Example 6, where it is shown that A is congruent to a diagonal matrix with diagonal entries 1, 1, and −24. Therefore A has rank 3 and index 2. Using the methods of Example 6 (it is not necessary to compute Q), it can be shown that B and C are likewise congruent to diagonal matrices, from which their ranks and indices can be read.

It follows that both A and C have rank 3 and index 2, while B has rank 3 and
index 1. We conclude that A and C are congruent but that B is congruent
to neither A nor C. •
EXERCISES
1. Label the following statements as true or false.
(a) Every quadratic form is a bilinear form.
(b) If two matrices are congruent, they have the same eigenvalues.
(c) Symmetric bilinear forms have symmetric matrix representations.
(d) Any symmetric matrix is congruent to a diagonal matrix.
(e) The sum of two symmetric bilinear forms is a symmetric bilinear
form.
(f) Two symmetric matrices with the same characteristic polynomial
are matrix representations of the same bilinear form.
(g) There exists a bilinear form H such that H(x, y) ≠ 0 for all x and y.
(h) If V is a vector space of dimension n, then dim(B(V)) = 2n.
(i) Let 77 be a bilinear form on a finite-dimensional vector space V
with dim(V) > 1. For any x G V, there exists y G V such that
y ≠ 0, but H(x, y) = 0.
(j) If 77 is any bilinear form on a finite-dimensional real inner product
space V, then there exists an ordered basis β for V such that ψ_β(H) is a diagonal matrix.
2. Prove properties 1, 2, 3, and 4 on page 423.
3. (a) Prove that the sum of two bilinear forms is a bilinear form.
(b) Prove that the product of a scalar and a bilinear form is a bilinear
form.
(c) Prove Theorem 6.31.
4. Determine which of the mappings that follow are bilinear forms. Justify
your answers.
(a) Let V = C[0,1] be the space of continuous real-valued functions on the closed interval [0,1]. For f, g ∈ V, define

        H(f, g) = ∫_0^1 f(t)g(t) dt.
(b) Let V be a vector space over F, and let J ∈ B(V) be nonzero. Define H: V × V → F by

        H(x, y) = [J(x, y)]^2    for all x, y ∈ V.

(c) Define H: R × R → R by H(t_1, t_2) = t_1 + 2t_2.
(d) Consider the vectors of R^2 as column vectors, and let H: R^2 × R^2 → R be the function defined by H(x, y) = det(x, y), the determinant of the 2 × 2 matrix with columns x and y.
(e) Let V be a real inner product space, and let H: V × V → R be the function defined by H(x, y) = ⟨x, y⟩ for x, y ∈ V.
(f) Let V be a complex inner product space, and let H: V × V → C be the function defined by H(x, y) = ⟨x, y⟩ for x, y ∈ V.
5. Verify that each of the given mappings is a bilinear form. Then compute
its matrix representation with respect to the given ordered basis 0.
(a) H: R^3 × R^3 → R, where

        H((a_1, a_2, a_3), (b_1, b_2, b_3)) = a_1b_1 − 2a_1b_2 + a_2b_1 − a_3b_3.
(b) Let V = M_{2×2}(R) and

        β = { ( 1 0 ), ( 0 1 ), ( 0 0 ), ( 0 0 ) }.
              ( 0 0 )  ( 0 0 )  ( 1 0 )  ( 0 1 )

    Define H: V × V → R by H(A, B) = tr(A) · tr(B).
(c) Let β = {cos t, sin t, cos 2t, sin 2t}. Then β is an ordered basis for V = span(β), a four-dimensional subspace of the space of all continuous functions on R. Let H: V × V → R be the function defined by H(f, g) = f′(0) · g″(0).

6. Let H: R^2 × R^2 → R be the function defined by

        H((a_1, a_2), (b_1, b_2)) = a_1b_2 + a_2b_1    for (a_1, a_2), (b_1, b_2) ∈ R^2.
   (a) Prove that H is a bilinear form.
   (b) Find the 2 × 2 matrix A such that H(x, y) = x^t A y for all x, y ∈ R^2.
   For a 2 × 2 matrix M with columns x and y, the scalar H(x, y) is called the permanent of M.
7. Let V and W be vector spaces over the same field, and let T: V → W be a linear transformation. For any H ∈ B(W), define T̂(H): V × V → F by T̂(H)(x, y) = H(T(x), T(y)) for all x, y ∈ V. Prove the following results.
   (a) If H ∈ B(W), then T̂(H) ∈ B(V).
   (b) T̂: B(W) → B(V) is a linear transformation.
   (c) If T is an isomorphism, then so is T̂.
8. Assume the notation of Theorem 6.32.
   (a) Prove that for any ordered basis β, ψ_β is linear.
   (b) Let β be an ordered basis for an n-dimensional space V over F, and let φ_β: V → F^n be the standard representation of V with respect to β. For A ∈ M_{n×n}(F), define H: V × V → F by H(x, y) = [φ_β(x)]^t A [φ_β(y)]. Prove that H ∈ B(V). Can you establish this as a corollary to Exercise 7?
   (c) Prove the converse of (b): Let H be a bilinear form on V. If A = ψ_β(H), then H(x, y) = [φ_β(x)]^t A [φ_β(y)].
9. (a) Prove Corollary 1 to Theorem 6.32.
(b) For a finite-dimensional vector space V, describe a method for
finding an ordered basis for B(V).
10. Prove Corollary 2 to Theorem 6.32.
11. Prove Corollary 3 to Theorem 6.32.
12. Prove that the relation of congruence is an equivalence relation.
13. The following outline provides an alternative proof to Theorem 6.33.
    (a) Suppose that β and γ are ordered bases for a finite-dimensional vector space V, and let Q be the change of coordinate matrix changing γ-coordinates to β-coordinates. Prove that φ_β = L_Q φ_γ, where φ_β and φ_γ are the standard representations of V with respect to β and γ, respectively.
(b) Apply Corollary 2 to Theorem 6.32 to (a) to obtain an alternative
proof of Theorem 6.33.
14. Let V be a finite-dimensional vector space and H ∈ B(V). Prove that, for any ordered bases β and γ of V, rank(ψ_β(H)) = rank(ψ_γ(H)).
15. Prove the following results.
(a) Any square diagonal matrix is symmetric.
(b) Any matrix congruent to a diagonal matrix is symmetric.
(c) the corollary to Theorem 6.35
16. Let V be a vector space over a field F not of characteristic two, and let
77 be a symmetric bilinear form on V. Prove that if K(x) = 77(x, x) is
the quadratic form associated with 77, then, for all x, y G V,
    H(x, y) = (1/2)[K(x + y) − K(x) − K(y)].

17. For each of the given quadratic forms K on a real inner product space V, find a symmetric bilinear form H such that K(x) = H(x, x) for all x ∈ V. Then find an orthonormal basis β for V such that ψ_β(H) is a diagonal matrix.
    (a) K: R^2 → R defined by K(t_1, t_2) = −2t_1^2 + 4t_1t_2 + t_2^2
    (b) K: R^2 → R defined by K(t_1, t_2) = 7t_1^2 − 8t_1t_2 + t_2^2
    (c) K: R^3 → R defined by K(t_1, t_2, t_3) = 3t_1^2 + 3t_2^2 + 3t_3^2 − 2t_1t_3
18. Let S be the set of all (t_1, t_2, t_3) ∈ R^3 for which

        3t_1^2 + 3t_2^2 + 3t_3^2 − 2t_1t_3 + 2√2(t_1 + t_3) + 1 = 0.

    Find an orthonormal basis β for R^3 for which the equation relating the coordinates of points of S relative to β is simpler. Describe S geometrically.
19. Prove the following refinement of Theorem 6.37(d).
    (a) If 0 < rank(A(p)) < n and A(p) has no negative eigenvalues, then f has no local maximum at p.
    (b) If 0 < rank(A(p)) < n and A(p) has no positive eigenvalues, then f has no local minimum at p.
20. Prove the following variation of the second derivative test for the case n = 2: Define

        D = [∂^2 f(p)/∂t_1^2][∂^2 f(p)/∂t_2^2] − [∂^2 f(p)/(∂t_1 ∂t_2)]^2.

    (a) If D > 0 and ∂^2 f(p)/∂t_1^2 > 0, then f has a local minimum at p.
    (b) If D > 0 and ∂^2 f(p)/∂t_1^2 < 0, then f has a local maximum at p.
    (c) If D < 0, then f has no local extremum at p.
    (d) If D = 0, then the test is inconclusive.
    Hint: Observe that, as in Theorem 6.37, D = det(A(p)) = λ_1λ_2, where λ_1 and λ_2 are the eigenvalues of A(p).

21. Let A and E be in M_{n×n}(F), with E an elementary matrix. In Section 3.1, it was shown that AE can be obtained from A by means of an elementary column operation. Prove that E^t A can be obtained by means of the same elementary operation performed on the rows rather than on the columns of A. Hint: Note that E^t A = (A^t E)^t.

22. For each of the following matrices A with entries from R, find a diagonal matrix D and an invertible matrix Q such that Q^t A Q = D. Hint: Use an elementary operation other than interchanging columns.

23. Prove that if the diagonal entries of a diagonal matrix are permuted, then the resulting diagonal matrix is congruent to the original one.
24. Let T be a linear operator on a real inner product space V, and define
H: V x V -> R by H(x, y) = (x, T(y)) for all x, y € V.
(a) Prove that H is a bilinear form.
(b) Prove that H is symmetric if and only if T is self-adjoint.
(c) What properties must T have for H to be an inner product on V?
(d) Explain why H may fail to be a bilinear form if V is a complex
inner product space.
25. Prove the converse to Exercise 24(a): Let V be a finite-dimensional real inner product space, and let H be a bilinear form on V. Then there exists a unique linear operator T on V such that H(x, y) = ⟨x, T(y)⟩ for all x, y ∈ V. Hint: Choose an orthonormal basis β for V, let A = ψ_β(H), and let T be the linear operator on V such that [T]_β = A. Apply Exercise 8(c) of this section and Exercise 15 of Section 6.2 (p. 355).
26. Prove that the number of distinct equivalence classes of congruent n × n real symmetric matrices is

        (n + 1)(n + 2)/2.
6.9* EINSTEIN'S SPECIAL THEORY OF RELATIVITY
As a consequence of physical experiments performed in the latter half of the
nineteenth century (most notably the Michelson–Morley experiment of 1887),
physicists concluded that the results obtained in measuring the speed of light
are independent of the velocity of the instrument used to measure the speed of
light. For example, suppose that while on Earth, an experimenter measures
the speed of light emitted from the sun and finds it to be 186,000 miles per
second. Now suppose that the experimenter places the measuring equipment
in a spaceship that leaves Earth traveling at 100,000 miles per second in a
direction away from the sun. A repetition of the same experiment from the
spaceship yields the same result: Light is traveling at 186,000 miles per second

relative to the spaceship, rather than 86,000 miles per second as one might
expect!
This revelation led to a new way of relating coordinate systems used to
locate events in space—time. The result was Albert Einstein's special theory
of relativity. In this section, we develop via a linear algebra viewpoint the
essence of Einstein's theory.
Figure 6.8
The basic problem is to compare two different inertial (nonaccelerating)
coordinate systems S and S' in three-space (R3) that are in motion relative
to each other under the assumption that the speed of light is the same when
measured in either system. We assume that S' moves at a constant velocity
in relation to S as measured from S. (See Figure 6.8.) To simplify matters,
let us suppose that the following conditions hold:
1. The corresponding axes of S and S' (x and x', y and y', z and z') are
parallel, and the origin of S' moves in the positive direction of the x-axis
of S at a constant velocity v > 0 relative to S.
2. Two clocks C and C′ are placed in space—the first stationary relative to the coordinate system S and the second stationary relative to the coordinate system S′. These clocks are designed to give real numbers in units of seconds as readings. The clocks are calibrated so that at the instant the origins of S and S′ coincide, both clocks give the reading zero.
3. The unit of length is the light second (the distance light travels in 1
second), and the unit of time is the second. Note that, with respect to
these units, the speed of light is 1 light second per second.
Given any event (something whose position and time of occurrence can be
described), we may assign a set of space-time coordinates to it. For example,

if p is an event that occurs at position (x, y, z) relative to S and at time t as read on clock C, we can assign to p the set of coordinates

    (x, y, z, t).

This ordered 4-tuple is called the space-time coordinates of p relative to S and C. Likewise, p has a set of space-time coordinates

    (x′, y′, z′, t′)

relative to S′ and C′.
For a fixed velocity v, let T_v: R^4 → R^4 be the mapping defined by

    T_v (x, y, z, t) = (x′, y′, z′, t′),

where (x, y, z, t) and (x′, y′, z′, t′) are the space-time coordinates of the same event with respect to S and C and with respect to S′ and C′, respectively.
Einstein made certain assumptions about T_v that led to his special theory of relativity. We formulate an equivalent set of assumptions.
Axioms of the Special Theory of Relativity
(R 1) The speed of any light beam, when measured in either coordinate system
using a clock stationary relative to that coordinate system, is 1.

(R 2) The mapping T_v: R^4 → R^4 is an isomorphism.

(R 3) If

    T_v (x, y, z, t) = (x′, y′, z′, t′),

then y′ = y and z′ = z.

(R 4) If

    T_v (x, y, z, t) = (x′, y′, z′, t′)    and    T_v (x, ŷ, ẑ, t) = (x″, y″, z″, t″),

then x″ = x′ and t″ = t′.

(R 5) The origin of S moves in the negative direction of the x′-axis of S′ at the constant velocity −v < 0 as measured from S′.

Axioms (R 3) and (R 4) tell us that for p ∈ R^4, the second and third coordinates of T_v(p) are unchanged and the first and fourth coordinates of T_v(p) are independent of the second and third coordinates of p.
As we will see, these five axioms completely characterize T_v. The operator T_v is called the Lorentz transformation in direction x. We intend to compute T_v and use it to study the curious phenomenon of time contraction.
Theorem 6.39. On R^4, the following statements are true.
(a) T_v(e_i) = e_i for i = 2, 3.
(b) span({e_2, e_3}) is T_v-invariant.
(c) span({e_1, e_4}) is T_v-invariant.
(d) Both span({e_2, e_3}) and span({e_1, e_4}) are T_v*-invariant.
(e) T_v*(e_i) = e_i for i = 2, 3.
Proof. (a) By axiom (R 2), T_v(0, 0, 0, 0) = (0, 0, 0, 0), and hence, by axiom (R 4), the first and fourth coordinates of

    T_v (0, a, b, 0)

are both zero for any a, b ∈ R. Thus, by axiom (R 3),

    T_v (0, a, b, 0) = (0, a, b, 0)    for all a, b ∈ R;

in particular, T_v(e_2) = e_2 and T_v(e_3) = e_3.
The proofs of (b), (c), and (d) are left as exercises.
(e) For any j ≠ 2, ⟨T_v*(e_2), e_j⟩ = ⟨e_2, T_v(e_j)⟩ = 0 by (a) and (c). We conclude that T_v*(e_2) is a multiple of e_2, that is, T_v*(e_2) = ke_2 for some k ∈ R. Thus

    1 = ⟨e_2, e_2⟩ = ⟨e_2, T_v(e_2)⟩ = ⟨T_v*(e_2), e_2⟩ = ⟨ke_2, e_2⟩ = k,

and hence T_v*(e_2) = e_2. Similarly, T_v*(e_3) = e_3.
Suppose that, at the instant the origins of S and S′ coincide, a light flash is emitted from their common origin. The event of the light flash when measured either relative to S and C or relative to S′ and C′ has space-time coordinates (0, 0, 0, 0).

Let P be the set of all events whose space-time coordinates (x, y, z, t) relative to S and C are such that the flash is observable from the point with coordinates (x, y, z)
(as measured relative to S) at the time t (as measured on C). Let us charac­
terize P in terms of x, y, z, and t. Since the speed of light is 1, at any time
t > 0 the light flash is observable from any point whose distance to the origin
of S (as measured on S) is t • 1 = t. These are precisely the points that lie on
the sphere of radius t with center at the origin. The coordinates (relative to

S) of such points satisfy the equation x^2 + y^2 + z^2 − t^2 = 0. Hence an event lies in P if and only if its space-time coordinates (x, y, z, t) (t > 0) relative to S and C satisfy the equation x^2 + y^2 + z^2 − t^2 = 0. By virtue of axiom (R 1), we can characterize P in terms of the space-time coordinates relative to S′ and C′ similarly: An event lies in P if and only if, relative to S′ and C′, its space-time coordinates (x′, y′, z′, t′) (t′ > 0) satisfy the equation (x′)^2 + (y′)^2 + (z′)^2 − (t′)^2 = 0.
Let

    A = ( 1  0  0   0
          0  1  0   0
          0  0  1   0
          0  0  0  -1 ).
Theorem 6.40. If ⟨L_A(w), w⟩ = 0 for some w ∈ R^4, then

    ⟨T_v* L_A T_v(w), w⟩ = 0.

Proof. Let w = (x, y, z, t) ∈ R^4, and suppose that ⟨L_A(w), w⟩ = 0.

Case 1. t > 0. Since ⟨L_A(w), w⟩ = x^2 + y^2 + z^2 − t^2, the vector w gives the coordinates of an event in P relative to S and C. Because

    (x′, y′, z′, t′) = T_v (x, y, z, t)

are the space-time coordinates of the same event relative to S′ and C′, the discussion preceding Theorem 6.40 yields

    (x′)^2 + (y′)^2 + (z′)^2 − (t′)^2 = 0.

Thus ⟨T_v* L_A T_v(w), w⟩ = ⟨L_A T_v(w), T_v(w)⟩ = (x′)^2 + (y′)^2 + (z′)^2 − (t′)^2 = 0, and the conclusion follows.

Case 2. t < 0. The proof follows by applying Case 1 to −w.
We now proceed to deduce information about T_v. Let

    w_1 = (1, 0, 0, 1)    and    w_2 = (1, 0, 0, −1).

By Exercise 3, {w_1, w_2} is an orthogonal basis for span({e_1, e_4}), and span({e_1, e_4}) is T_v* L_A T_v-invariant. The next result tells us even more.
Theorem 6.41. There exist nonzero scalars a and b such that
(a) T_v* L_A T_v(w_1) = a w_2.
(b) T_v* L_A T_v(w_2) = b w_1.

Proof. (a) Because ⟨L_A(w_1), w_1⟩ = 0, we have ⟨T_v* L_A T_v(w_1), w_1⟩ = 0 by Theorem 6.40. Thus T_v* L_A T_v(w_1) is orthogonal to w_1. Since span({e_1, e_4}) = span({w_1, w_2}) is T_v* L_A T_v-invariant, T_v* L_A T_v(w_1) must lie in this subspace. But {w_1, w_2} is an orthogonal basis for this subspace, and so T_v* L_A T_v(w_1) must be a multiple of w_2. Thus T_v* L_A T_v(w_1) = a w_2 for some scalar a. Since T_v and A are invertible, so is T_v* L_A T_v. Thus a ≠ 0, proving (a).
The proof of (b) is similar to (a).
Corollary. Let B_v = [T_v]_β, where β is the standard ordered basis for R^4. Then
(a) B_v* A B_v = A.
(b) T_v* L_A T_v = L_A.

We leave the proof of the corollary as an exercise. For hints, see Exercise 4.
Now consider the situation 1 second after the origins of S and S′ have coincided as measured by the clock C. Since the origin of S′ is moving along the x-axis at a velocity v as measured in S, its space-time coordinates relative to S and C are

    (v, 0, 0, 1).

Similarly, the space-time coordinates for the origin of S′ relative to S′ and C′ must be

    (0, 0, 0, t′)

for some t′ > 0. Thus we have

    T_v (v, 0, 0, 1) = (0, 0, 0, t′)    for some t′ > 0.        (18)
By the corollary to Theorem 6.41,

    ⟨T_v* L_A T_v (v, 0, 0, 1), (v, 0, 0, 1)⟩ = ⟨L_A (v, 0, 0, 1), (v, 0, 0, 1)⟩ = v^2 − 1.        (19)
But also

    ⟨T_v* L_A T_v (v, 0, 0, 1), (v, 0, 0, 1)⟩ = ⟨L_A T_v (v, 0, 0, 1), T_v (v, 0, 0, 1)⟩
                                              = ⟨L_A (0, 0, 0, t′), (0, 0, 0, t′)⟩ = −(t′)^2.        (20)

Combining (19) and (20), we conclude that v^2 − 1 = −(t′)^2, or

    t′ = √(1 − v^2).        (21)

Thus, from (18) and (21), we obtain

    T_v (v, 0, 0, 1) = (0, 0, 0, √(1 − v^2)).        (22)
Next recall that the origin of S moves in the negative direction of the
x'-axis of S' at the constant velocity — v < 0 as measured from S'. [This fact

is axiom (R 5).] Consequently, 1 second after the origins of S and S′ have coincided as measured on clock C, there exists a time t″ > 0 as measured on clock C′ such that

    T_v (0, 0, 0, 1) = (−v t″, 0, 0, t″).        (23)

From (23), it follows in a manner similar to the derivation of (22) that

    t″ = 1/√(1 − v^2);        (24)

hence, from (23) and (24),

    T_v (0, 0, 0, 1) = ( −v/√(1 − v^2), 0, 0, 1/√(1 − v^2) ).        (25)
The following result is now easily proved using (22), (25), and Theorem 6.39.
Theorem 6.42. Let β be the standard ordered basis for R^4. Then

    [T_v]_β = B_v = (  1/√(1 − v^2)    0    0    −v/√(1 − v^2)
                            0          1    0          0
                            0          0    1          0
                      −v/√(1 − v^2)    0    0     1/√(1 − v^2)  ).
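A quick numerical check of Theorem 6.42 and of the corollary to Theorem 6.41 (a sketch, not part of the text; it assumes NumPy and an arbitrary choice of v): the code below builds B_v and verifies that B_v^t A B_v = A and that B_v carries (v, 0, 0, 1) to (0, 0, 0, √(1 − v^2)), as in (22).

    import numpy as np

    # Sketch: the Lorentz matrix B_v and the identity B_v^t A B_v = A.
    def B(v):
        g = 1.0 / np.sqrt(1.0 - v*v)
        return np.array([[g,    0.0, 0.0, -g*v],
                         [0.0,  1.0, 0.0,  0.0],
                         [0.0,  0.0, 1.0,  0.0],
                         [-g*v, 0.0, 0.0,  g  ]])

    A = np.diag([1.0, 1.0, 1.0, -1.0])
    v = 0.6
    print(np.allclose(B(v).T @ A @ B(v), A))      # True
    print(B(v) @ np.array([v, 0.0, 0.0, 1.0]))    # (0, 0, 0, 0.8) = (0, 0, 0, sqrt(1 - v^2))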
Time Contraction
A most curious and paradoxical conclusion follows if we accept Einstein's
theory. Suppose that an astronaut leaves our solar system in a space vehicle
traveling at a fixed velocity v as measured relative to our solar system. It
follows from Einstein's theory that, at the end of time t as measured on Earth,
the time that passes on the space vehicle is only t√(1 − v^2). To establish this result, consider the coordinate systems S and S′ and clocks C and C′ that
we have been studying. Suppose that the origin of S' coincides with the
space vehicle and the origin of S coincides with a point in the solar system

(stationary relative to the sun) so that the origins of S and S′ coincide and clocks C and C′ read zero at the moment the astronaut embarks on the trip.
As viewed from S, the space-time coordinates of the vehicle at any time t > 0 as measured by C are

    (vt, 0, 0, t),

whereas, as viewed from S′, the space-time coordinates of the vehicle at any time t′ > 0 as measured by C′ are

    (0, 0, 0, t′).

But if two sets of space-time coordinates

    (vt, 0, 0, t)    and    (0, 0, 0, t′)

are to describe the same event, it must follow that

    T_v (vt, 0, 0, t) = (0, 0, 0, t′).
Thus

    (0, 0, 0, t′) = T_v (vt, 0, 0, t) = B_v (vt, 0, 0, t)
                  = ( vt/√(1 − v^2) − vt/√(1 − v^2),  0,  0,  −v^2 t/√(1 − v^2) + t/√(1 − v^2) ).

From the preceding equation, we obtain

    t′ = −v^2 t/√(1 − v^2) + t/√(1 − v^2),    or    t′ = t√(1 − v^2).        (26)

This is the desired result.
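Equation (26) can also be checked numerically. The following sketch (an illustration only, with arbitrarily chosen values of v and t) applies B_v to the vehicle's space-time coordinates (vt, 0, 0, t) and compares the fourth coordinate with t√(1 − v^2).

    import numpy as np

    # Sketch: time contraction with v = 0.8 and t = 10 (arbitrary values).
    v, t = 0.8, 10.0
    g = 1.0 / np.sqrt(1.0 - v*v)
    Bv = np.array([[g, 0, 0, -g*v], [0, 1, 0, 0], [0, 0, 1, 0], [-g*v, 0, 0, g]])
    print(Bv @ np.array([v*t, 0.0, 0.0, t]))   # (0, 0, 0, 6.0)
    print(t * np.sqrt(1 - v*v))                # 6.0, in agreement with (26)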
A dramatic consequence of time contraction is that distances are con­
tracted along the line of motion (see Exercise 9).
Let us make one additional point. Suppose that we consider units of distance and time more commonly used than the light second and second, such as the mile and hour, or the kilometer and second. Let c denote the speed of light relative to our chosen units of distance. It is easily seen that if an object travels at a velocity v relative to a set of units, then it is traveling at a velocity v/c in units of light seconds per second. Thus, for an arbitrary set of units of distance and time, (26) becomes

    t′ = t√(1 − v^2/c^2).
EXERCISES
1. Prove (b), (c), and (d) of Theorem 6.39.
2. Complete the proof of Theorem 6.40 for the case t < 0.
3. For

        w_1 = (1, 0, 0, 1)    and    w_2 = (1, 0, 0, −1),

   show that
   (a) {w_1, w_2} is an orthogonal basis for span({e_1, e_4});
   (b) span({e_1, e_4}) is T_v* L_A T_v-invariant.

4. Prove the corollary to Theorem 6.41. Hints:
   (a) Prove that

           B_v* A B_v = (  p   0   0   q
                           0   1   0   0
                           0   0   1   0
                          −q   0   0  −p ),

       where

           p = (a + b)/2    and    q = (a − b)/2.

   (b) Show that q = 0 by using the fact that B_v* A B_v is self-adjoint.
   (c) Apply Theorem 6.40 to

           w = (0, 1, 0, 1)

       to show that p = 1.
5. Derive (24), and prove that

        T_v (0, 0, 0, 1) = ( −v/√(1 − v^2), 0, 0, 1/√(1 − v^2) ).        (25)

   Hint: Use a technique similar to the derivation of (22).
6. Consider three coordinate systems S, S′, and S″ with the corresponding axes (x, x′, x″; y, y′, y″; and z, z′, z″) parallel and such that the x-, x′-, and x″-axes coincide. Suppose that S′ is moving past S at a velocity v_1 > 0 (as measured on S), S″ is moving past S′ at a velocity v_2 > 0 (as measured on S′), and S″ is moving past S at a velocity v_3 > 0 (as measured on S), and that there are three clocks C, C′, and C″ such that C is stationary relative to S, C′ is stationary relative to S′, and C″ is stationary relative to S″. Suppose that when measured on any of the three clocks, all the origins of S, S′, and S″ coincide at time 0. Assuming that T_{v_3} = T_{v_2} T_{v_1} (i.e., B_{v_3} = B_{v_2} B_{v_1}), prove that

        v_3 = (v_1 + v_2)/(1 + v_1 v_2).

   Note that substituting v_2 = 1 in this equation yields v_3 = 1. This tells us that the speed of light as measured in S or S′ is the same. Why would we be surprised if this were not the case?
7. Compute (B_v)^{−1}. Show that (B_v)^{−1} = B_{(−v)}. Conclude that if S′ moves at a negative velocity v relative to S, then [T_v]_β = B_v, where B_v is of the form given in Theorem 6.42.
8. Suppose that an astronaut left Earth in the year 2000 and traveled to
a star 99 light years away from Earth at 99% of the speed of light and
that upon reaching the star immediately turned around and returned
to Earth at the same speed. Assuming Einstein's special theory of

relativity, show that if the astronaut was 20 years old at the time of
departure, then he or she would return to Earth at age 48.2 in the year
2200. Explain the use of Exercise 7 in solving this problem.
9. Recall the moving space vehicle considered in the study of time contraction. Suppose that the vehicle is moving toward a fixed star located on the x-axis of S at a distance b units from the origin of S. If the space vehicle moves toward the star at velocity v, Earthlings (who remain "almost" stationary relative to S) compute the time it takes for the vehicle to reach the star as t = b/v. Due to the phenomenon of time contraction, the astronaut perceives a time span of t′ = t√(1 − v^2) = (b/v)√(1 − v^2). A paradox appears in that the astronaut perceives a time span inconsistent with a distance of b and a velocity of v. The paradox is resolved by observing that the distance from the solar system to the star as measured by the astronaut is less than b.

   Assuming that the coordinate systems S and S′ and clocks C and C′ are as in the discussion of time contraction, prove the following results.
   (a) At time t (as measured on C), the space-time coordinates of the star relative to S and C are (b, 0, 0, t).
   (b) At time t (as measured on C), the space-time coordinates of the star relative to S′ and C′ are

           ( (b − vt)/√(1 − v^2), 0, 0, (t − bv)/√(1 − v^2) ).

   (c) For

           x′ = (b − tv)/√(1 − v^2)    and    t′ = (t − bv)/√(1 − v^2),

       we have x′ = b√(1 − v^2) − t′v.

   This result may be interpreted to mean that at time t′ as measured by the astronaut, the distance from the astronaut to the star as measured by the astronaut (see Figure 6.9) is

           b√(1 − v^2) − t′v.

Figure 6.9: the star at coordinates (b, 0, 0) relative to S and (x′, 0, 0) relative to S′.
   (d) Conclude from the preceding equation that
       (1) the speed of the space vehicle relative to the star, as measured by the astronaut, is v,
       (2) the distance from Earth to the star, as measured by the astronaut, is b√(1 − v^2).
       Thus distances along the line of motion of the space vehicle appear to be contracted by a factor of √(1 − v^2).
6.10* CONDITIONING AND THE RAYLEIGH QUOTIENT
In Section 3.4, we studied specific techniques that allow us to solve systems of linear equations in the form Ax = b, where A is an m × n matrix and b is an m × 1 vector. Such systems often arise in applications to the real world. The coefficients in the system are frequently obtained from experimental data, and, in many cases, both m and n are so large that a computer must be used in the calculation of the solution. Thus two types of errors must be considered. First, experimental errors arise in the collection of data since no instruments can provide completely accurate measurements. Second, computers introduce roundoff errors. One might intuitively feel that small relative changes in the coefficients of the system cause small relative errors in the solution. A system that has this property is called well-conditioned; otherwise, the system is called ill-conditioned.

We now consider several examples of these types of errors, concentrating primarily on changes in b rather than on changes in the entries of A. In addition, we assume that A is a square, complex (or real), invertible matrix since this is the case most frequently encountered in applications.

Example 1
Consider the system

    x_1 + x_2 = 5
    x_1 − x_2 = 1.

The solution to this system is

    ( 3
      2 ).

Now suppose that we change the system somewhat and consider the new system

    x_1 + x_2 = 5
    x_1 − x_2 = 1.0001.

This modified system has the solution

    ( 3.00005
      1.99995 ).

We see that a change of 10^{−4} in one coefficient has caused a change of less than 10^{−4} in each coordinate of the new solution. More generally, the system

    x_1 + x_2 = 5
    x_1 − x_2 = 1 + δ

has the solution

    ( 3 + δ/2
      2 − δ/2 ).

Hence small changes in b introduce small changes in the solution. Of course, we are really interested in relative changes since a change in the solution of, say, 10, is considered large if the original solution is of the order 10^{−2}, but small if the original solution is of the order 10^6.
We use the notation δb to denote the vector b′ − b, where b is the vector in the original system and b′ is the vector in the modified system. Thus we have

    δb = ( 0
           δ ).

We now define the relative change in b to be the scalar ||δb||/||b||, where ||·|| denotes the standard norm on C^n (or R^n); that is, ||b|| = √⟨b, b⟩. Most

of what follows, however, is true for any norm. Similar definitions hold for the relative change in x. In this example,

    ||δb||/||b|| = |δ|/√26    and    ||δx||/||x|| = ||(δ/2, −δ/2)|| / ||(3, 2)|| = |δ|/√26.

Thus the relative change in x equals, coincidentally, the relative change in b; so the system is well-conditioned.  •
Example 2
Consider the system

    x_1 + x_2 = 3
    x_1 + 1.00001 x_2 = 3.00001,

which has

    ( 2
      1 )

as its solution. The solution to the related system

    x_1 + x_2 = 3
    x_1 + 1.00001 x_2 = 3.00001 + δ

is

    ( 2 − (10^5)δ
      1 + (10^5)δ ).

Hence

    ||δx||/||x|| = 10^5 √(2/5) |δ| > 10^4 |δ|,

while

    ||δb||/||b|| ≈ |δ|/(3√2).

Thus the relative change in x is at least 10^4 times the relative change in b! This system is very ill-conditioned. Observe that the lines defined by the two equations are nearly coincident. So a small change in either line could greatly alter the point of intersection, that is, the solution to the system.  •
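The ill-conditioning of Example 2 is easy to reproduce in floating-point arithmetic. The sketch below (not from the text; the perturbation size delta = 10^{-8} is an arbitrary choice) solves the original and the perturbed systems and compares the relative changes.

    import numpy as np

    # Sketch: the system of Example 2 with a perturbed right-hand side.
    A = np.array([[1.0, 1.0],
                  [1.0, 1.00001]])
    b = np.array([3.0, 3.00001])
    delta = 1e-8
    b_pert = b + np.array([0.0, delta])

    x = np.linalg.solve(A, b)                # (2, 1)
    x_pert = np.linalg.solve(A, b_pert)      # about (2 - 1e5*delta, 1 + 1e5*delta)
    rel_b = np.linalg.norm(b_pert - b) / np.linalg.norm(b)
    rel_x = np.linalg.norm(x_pert - x) / np.linalg.norm(x)
    print(rel_x / rel_b)                     # roughly 2.7e5 -- severely ill-conditioned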

To apply the full strength of the theory of self-adjoint matrices to the
study of conditioning, we need the notion of the norm of a matrix. (See
Exercise 24 of Section 6.1 for further results about norms.)
Definition. Let A be a complex (or real) n × n matrix. Define the (Euclidean) norm of A by

    ||A|| = max_{x≠0} ||Ax||/||x||,

where x ∈ C^n or x ∈ R^n.
Intuitively, ||A|| represents the maximum magnification of a vector by the
matrix A. The question of whether or not this maximum exists, as well as
the problem of how to compute it, can be answered by the use of the so-called
Rayleigh quotient.
Definition. Let B be an n × n self-adjoint matrix. The Rayleigh quotient for x ≠ 0 is defined to be the scalar R(x) = ⟨Bx, x⟩/||x||^2.
The following result characterizes the extreme values of the Rayleigh quo­
tient of a self-adjoint matrix.
Theorem 6.43. For a self-adjoint matrix B ∈ M_{n×n}(F), we have that max_{x≠0} R(x) is the largest eigenvalue of B and min_{x≠0} R(x) is the smallest eigenvalue of B.
Proof. By Theorems 6.19 (p. 384) and 6.20 (p. 384), we may choose an orthonormal basis {v_1, v_2, ..., v_n} of eigenvectors of B such that Bv_i = λ_i v_i (1 ≤ i ≤ n), where λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_n. (Recall that by the lemma to Theorem 6.17, p. 373, the eigenvalues of B are real.) Now, for x ∈ F^n, there exist scalars a_1, a_2, ..., a_n such that

    x = Σ_{i=1}^{n} a_i v_i.

Hence

    R(x) = ⟨Bx, x⟩/||x||^2 = ⟨Σ_{i=1}^{n} a_i λ_i v_i, Σ_{j=1}^{n} a_j v_j⟩ / ||x||^2 = Σ_{i=1}^{n} λ_i |a_i|^2 / Σ_{i=1}^{n} |a_i|^2 ≤ λ_1.

It is easy to see that R(v_1) = λ_1, so we have demonstrated the first half of the theorem. The second half is proved similarly.

Corollary 1. For any square matrix A, ||A|| is finite and, in fact, equals √λ, where λ is the largest eigenvalue of A*A.

Proof. Let B be the self-adjoint matrix A*A, and let λ be the largest eigenvalue of B. Since, for x ≠ 0,

    0 ≤ ||Ax||^2/||x||^2 = ⟨Ax, Ax⟩/||x||^2 = ⟨A*Ax, x⟩/||x||^2 = R(x) ≤ λ,

it follows from Theorem 6.43 that ||A||^2 = λ.
Observe that the proof of Corollary 1 shows that all the eigenvalues of
A* A are nonnegative. For our next result, we need the following lemma.
Lemma. For any square matrix A, λ is an eigenvalue of A*A if and only if λ is an eigenvalue of AA*.

Proof. Let λ be an eigenvalue of A*A. If λ = 0, then A*A is not invertible. Hence A and A* are not invertible, so that λ = 0 is also an eigenvalue of AA*. The proof of the converse is similar.

Suppose now that λ ≠ 0. Then there exists x ≠ 0 such that A*Ax = λx. Apply A to both sides to obtain (AA*)(Ax) = λ(Ax). Since Ax ≠ 0 (lest λx = 0), we have that λ is an eigenvalue of AA*. The proof of the converse is left as an exercise.
Corollary 2. Let A be an invertible matrix. Then ||A^{−1}|| = 1/√λ_n, where λ_n is the smallest eigenvalue of A*A.

Proof. Recall that λ is an eigenvalue of an invertible matrix if and only if λ^{−1} is an eigenvalue of its inverse. Now let λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_n be the eigenvalues of A*A, which by the lemma are the eigenvalues of AA*. Then ||A^{−1}||^2 equals the largest eigenvalue of (A^{−1})*A^{−1} = (AA*)^{−1}, which equals 1/λ_n.
For many applications, it is only the largest and smallest eigenvalues that
are of interest. For example, in the case of vibration problems, the smallest
eigenvalue represents the lowest frequency at which vibrations can occur.
We see the role of both of these eigenvalues in our study of conditioning.
Example 3
Let A be a 3 × 3 matrix for which

    B = A*A = (  2  −1   1
                −1   2   1
                 1   1   2 ).

The eigenvalues of B are 3, 3, and 0. Therefore, ||A|| = √3. For any x = (a, b, c) ≠ 0, we may compute R(x) for the matrix B as

    R(x) = ⟨Bx, x⟩/||x||^2 = 2(a^2 + b^2 + c^2 − ab + ac + bc)/(a^2 + b^2 + c^2) ≤ 3.  •
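Corollary 1 can also be illustrated numerically. The sketch below (an illustration only; the matrix A is a random example, not the one of Example 3) computes ||A|| as the square root of the largest eigenvalue of A*A and compares it with Rayleigh quotients of B = A*A sampled at random vectors.

    import numpy as np

    # Sketch: ||A|| = sqrt(largest eigenvalue of A*A), checked against sampled
    # Rayleigh quotients of B = A*A.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))
    B = A.T @ A

    norm_A = np.sqrt(np.max(np.linalg.eigvalsh(B)))
    rayleigh = lambda x: (x @ B @ x) / (x @ x)
    samples = [rayleigh(rng.standard_normal(3)) for _ in range(10000)]
    print(norm_A)
    print(np.sqrt(max(samples)))   # approaches ||A|| from below
    print(np.linalg.norm(A, 2))    # NumPy's spectral norm agrees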
Now that we know ||A|| exists for every square matrix A, we can make use of the inequality ||Ax|| ≤ ||A||·||x||, which holds for every x.

Assume in what follows that A is invertible, b ≠ 0, and Ax = b. For a given δb, let δx be the vector that satisfies A(x + δx) = b + δb. Then A(δx) = δb, and so δx = A^{−1}(δb). Hence

    ||b|| = ||Ax|| ≤ ||A||·||x||    and    ||δx|| = ||A^{−1}(δb)|| ≤ ||A^{−1}||·||δb||.

Thus

    ||δx||/||x|| ≤ ||A^{−1}||·||δb|| · ||A||/||b|| = ||A||·||A^{−1}|| · (||δb||/||b||).

Similarly (see Exercise 9),

    (1/(||A||·||A^{−1}||)) · (||δb||/||b||) ≤ ||δx||/||x||.

The number ||A||·||A^{−1}|| is called the condition number of A and is denoted cond(A). It should be noted that the definition of cond(A) depends on how the norm of A is defined. There are many reasonable ways of defining the norm of a matrix. In fact, the only property needed to establish the inequalities above is that ||Ax|| ≤ ||A||·||x|| for all x. We summarize these results in the following theorem.
Theorem 6.44. For the system Ax = b, where A is invertible and b ≠ 0, the following statements are true.
(a) For any norm ||·||, we have

    (1/cond(A)) · ||δb||/||b|| ≤ ||δx||/||x|| ≤ cond(A) · ||δb||/||b||.

(b) If ||·|| is the Euclidean norm, then cond(A) = √(λ_1/λ_n), where λ_1 and λ_n are the largest and smallest eigenvalues, respectively, of A*A.

Proof. Statement (a) follows from the previous inequalities, and (b) follows from Corollaries 1 and 2 to Theorem 6.43.
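The following sketch (an illustration only, reusing the matrix of Example 2 and an arbitrary perturbation of b) computes cond(A) as in Theorem 6.44(b) and checks the two-sided bound of part (a).

    import numpy as np

    # Sketch: cond(A) = sqrt(lambda_1/lambda_n) and the bound of Theorem 6.44(a).
    A = np.array([[1.0, 1.0],
                  [1.0, 1.00001]])
    lam = np.linalg.eigvalsh(A.T @ A)
    cond = np.sqrt(lam[-1] / lam[0])
    print(cond)                                    # about 4e5 (cf. np.linalg.cond(A))

    b = np.array([3.0, 3.00001])
    db = 1e-8 * np.array([1.0, -1.0])
    x = np.linalg.solve(A, b)
    dx = np.linalg.solve(A, b + db) - x
    rel_x = np.linalg.norm(dx) / np.linalg.norm(x)
    rel_b = np.linalg.norm(db) / np.linalg.norm(b)
    print(rel_b / cond <= rel_x <= cond * rel_b)   # True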
It is clear from Theorem 6.44 that cond(A) ≥ 1. It is left as an exercise to prove that cond(A) = 1 if and only if A is a scalar multiple of a unitary or orthogonal matrix. Moreover, it can be shown with some work that equality can be obtained in (a) by an appropriate choice of b and δb.

We can see immediately from (a) that if cond(A) is close to 1, then a small relative error in b forces a small relative error in x. If cond(A) is large, however, then the relative error in x may be small even though the relative error in b is large, or the relative error in x may be large even though the relative error in b is small! In short, cond(A) merely indicates the potential for large relative errors.
We have so far considered only errors in the vector b. If there is an error δA in the coefficient matrix of the system Ax = b, the situation is more complicated. For example, A + δA may fail to be invertible. But under the appropriate assumptions, it can be shown that a bound for the relative error in x can be given in terms of cond(A). For example, Charles Cullen (Charles G. Cullen, An Introduction to Numerical Linear Algebra, PWS Publishing Co., Boston, 1994, p. 60) shows that if A + δA is invertible, then

    ||δx||/||x + δx|| ≤ cond(A) · ||δA||/||A||.
It should be mentioned that, in practice, one never computes cond(A) from its definition, for it would be an unnecessary waste of time to compute A^{−1} merely to determine its norm. In fact, if a computer is used to find A^{−1}, the computed inverse of A in all likelihood only approximates A^{−1}, and the error in the computed inverse is affected by the size of cond(A). So we are caught in a vicious circle! There are, however, some situations in which a usable approximation of cond(A) can be found. Thus, in most cases, the estimate of the relative error in x is based on an estimate of cond(A).
EXERCISES
1. Label the following statements as true or false.
(a) If Ax = b is well-conditioned, then cond(A) is small.
(b) If cond(A) is large, then Ax = b is ill-conditioned.
(c) If cond(A) is small, then Ax = b is well-conditioned.
(d) The norm of A equals the Rayleigh quotient.
(e) The norm of A always equals the largest eigenvalue of A.

2. Compute the norms of the following matrices,
(a)
4 0
1 s)
(b)
(-ii)
(e)
f1 % °
0 Tz l
\° 75 j
3. Prove that if B is symmetric, then ||B|| is the largest eigenvalue of B.
4. Let A and A^{−1} be as follows:

        A = (   6   13  −17             A^{−1} = (  6  −4  −1
               13   29  −38                        −4  11   7
              −17  −38   50 )                      −1   7   5 ).

    The eigenvalues of A are approximately 84.74, 0.2007, and 0.0588.
    (a) Approximate ||A||, ||A^{−1}||, and cond(A). (Note Exercise 3.)
    (b) Suppose that we have vectors x and x̃ such that Ax = b and ||b − Ax̃|| < 0.001. Use (a) to determine upper bounds for ||x̃ − A^{−1}b|| (the absolute error) and ||x̃ − A^{−1}b||/||A^{−1}b|| (the relative error).
5. Suppose that x is the actual solution of Ax = b and that a computer arrives at an approximate solution x̃. If cond(A) = 100, ||b|| = 1, and ||b − Ax̃|| = 0.1, obtain upper and lower bounds for ||x − x̃||/||x||.
6. Let
B =
Compute
| B ||, and cond(Z?).
7. Let B be a symmetric matrix. Prove that min_{x≠0} R(x) equals the smallest eigenvalue of B.

8. Prove that if λ is an eigenvalue of AA*, then λ is an eigenvalue of A*A. This completes the proof of the lemma to Corollary 2 to Theorem 6.43.

9. Prove that if A is an invertible matrix and Ax = b, then

        (1/(||A||·||A^{−1}||)) · ||δb||/||b|| ≤ ||δx||/||x||.

10. Prove the left inequality of (a) in Theorem 6.44.

11. Prove that cond(A) = 1 if and only if A is a scalar multiple of a unitary or orthogonal matrix.

12. (a) Let A and B be square matrices that are unitarily equivalent. Prove that ||A|| = ||B||.
    (b) Let T be a linear operator on a finite-dimensional inner product space V. Define

            ||T|| = max_{x≠0} ||T(x)||/||x||.

        Prove that ||T|| = ||[T]_β||, where β is any orthonormal basis for V.
    (c) Let V be an infinite-dimensional inner product space with an orthonormal basis {v_1, v_2, ...}. Let T be the linear operator on V such that T(v_k) = k v_k. Prove that ||T|| (defined in (b)) does not exist.
The next exercise assumes the definitions of singular value and pseudoinverse and the results of Section 6.7.

13. Let A be an n × n matrix of rank r with the nonzero singular values σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_r. Prove each of the following results.
    (a) ||A|| = σ_1.
    (b) ||A†|| = 1/σ_r.
    (c) If A is invertible (and hence r = n), then cond(A) = σ_1/σ_n.
6.11* THE GEOMETRY OF ORTHOGONAL OPERATORS
By Theorem 6.22 (p. 386), any rigid motion on a finite-dimensional real inner
product space is the composite of an orthogonal operator and a translation.
Thus, to understand the geometry of rigid motions thoroughly, we must ana­
lyze the structure of orthogonal operators. Such is the aim of this section. We
show that any orthogonal operator on a finite-dimensional real inner product
space is the composite of rotations and reflections.
This material assumes familiarity with the results about direct sums de­
veloped at the end of Section 5.2, and familiarity with the definition and
elementary properties of the determinant of a linear operator defined in Ex­
ercise 7 of Section 5.1.
Definitions. Let T be a linear operator on a finite-dimensional real inner
product space V. The operator T is called a rotation if T is the identity on

V or if there exists a two-dimensional subspace W of V, an orthonormal basis β = {x_1, x_2} for W, and a real number θ such that

    T(x_1) = (cos θ)x_1 + (sin θ)x_2,    T(x_2) = (−sin θ)x_1 + (cos θ)x_2,

and T(y) = y for all y ∈ W^⊥. In this context, T is called a rotation of V about W^⊥. The subspace W^⊥ is called the axis of rotation.
Rotations are defined in Section 2.1 for the special case that V = R2.
Definitions. Let T be a linear operator on a finite-dimensional real
inner product space V. The operator T is called a reflection if there exists
a one-dimensional subspace W of V such that T(x) = −x for all x ∈ W and T(y) = y for all y ∈ W^⊥. In this context, T is called a reflection of V about W^⊥.
It should be noted that rotations and reflections (or composites of these)
are orthogonal operators (see Exercise 2). The principal aim of this section
is to establish that the converse is also true, that is, any orthogonal operator on a finite-dimensional real inner product space is the composite of rotations and reflections.
Example 1
A Characterization of Orthogonal Operators on a One-Dimensional Real In­
ner Product Space
Let T be an orthogonal operator on a one-dimensional inner product space V. Choose any nonzero vector x in V. Then V = span({x}), and so T(x) = λx for some λ ∈ R. Since T is orthogonal and λ is an eigenvalue of T, λ = ±1. If λ = 1, then T is the identity on V, and hence T is a rotation. If λ = −1, then T(x) = −x for all x ∈ V; so T is a reflection of V about V^⊥ = {0}. Thus T is either a rotation or a reflection. Note that in the first case, det(T) = 1, and in the second case, det(T) = −1.  •
Example 2
Some Typical Reflections
(a) Define T: R^2 → R^2 by T(a, b) = (−a, b), and let W = span({e_1}). Then T(x) = −x for all x ∈ W, and T(y) = y for all y ∈ W^⊥. Thus T is a reflection of R^2 about W^⊥ = span({e_2}), the y-axis.
(b) Let T: R^3 → R^3 be defined by T(a, b, c) = (a, b, −c), and let W = span({e_3}). Then T(x) = −x for all x ∈ W, and T(y) = y for all y ∈ W^⊥ = span({e_1, e_2}), the xy-plane. Hence T is a reflection of R^3 about W^⊥.  •
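The maps in Example 2, together with a rotation of R^3 about the z-axis, can be written as matrices and checked numerically. The sketch below (an illustration only, assuming NumPy; the angle θ is arbitrary) confirms that both kinds of operators are orthogonal and that their determinants are 1 and −1, respectively.

    import numpy as np

    # Sketch: the reflection of Example 2(b) and a rotation of R^3 about the z-axis.
    theta = np.pi / 3
    rotation = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                         [np.sin(theta),  np.cos(theta), 0.0],
                         [0.0,            0.0,           1.0]])   # axis of rotation: the z-axis
    reflection = np.diag([1.0, 1.0, -1.0])                        # T(a, b, c) = (a, b, -c)

    for Q in (rotation, reflection):
        print(np.allclose(Q.T @ Q, np.eye(3)), round(np.linalg.det(Q)))
    # True 1   (rotation)
    # True -1  (reflection)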
Example 1 characterizes all orthogonal operators on a one-dimensional
real inner product space. The following theorem characterizes all orthogonal

operators on a two-dimensional real inner product space V. The proof fol­
lows from Theorem 6.23 (p. 387) since all two-dimensional real inner product
spaces are structurally identical. For a rigorous justification, apply Theo­
rem 2.21 (p. 104), where j3 is an orthonormal basis for V. By Exercise 15 of
Section 6.2, the resulting isomorphism (j>p: V —» R2 preserves inner products.
(See Exercise 8.)
Theorem 6.45. Let T be an orthogonal operator on a two-dimensional real inner product space V. Then T is either a rotation or a reflection. Furthermore, T is a rotation if and only if det(T) = 1, and T is a reflection if and only if det(T) = −1.
A complete description of the reflections of R2 is given in Section 6.5.
Corollary. Let V be a two-dimensional real inner product space. The
composite of a reflection and a rotation on V is a reflection on V.
Proof. If T_1 is a reflection on V and T_2 is a rotation on V, then by Theorem 6.45, det(T_1) = −1 and det(T_2) = 1. Let T = T_2T_1 be the composite. Since T_2 and T_1 are orthogonal, so is T. Moreover, det(T) = det(T_2)·det(T_1) = −1. Thus, by Theorem 6.45, T is a reflection. The proof for T_1T_2 is similar.
We now study orthogonal operators on spaces of higher dimension.
Lemma. If T is a linear operator on a nonzero finite-dimensional real
vector space V, then there exists a T-invariant subspace W of V such that
1 ≤ dim(W) ≤ 2.
Proof. Fix an ordered basis β = {y_1, y_2, ..., y_n} for V, and let A = [T]_β. Let φ_β: V → R^n be the linear transformation defined by φ_β(y_i) = e_i for i = 1, 2, ..., n. Then φ_β is an isomorphism, and, as we have seen in Section 2.4, the diagram in Figure 6.10 commutes, that is, L_A φ_β = φ_β T. As a consequence, it suffices to show that there exists an L_A-invariant subspace Z of R^n such that 1 ≤ dim(Z) ≤ 2. If we then define W = φ_β^{−1}(Z), it follows that W satisfies the conclusions of the lemma (see Exercise 13).
Figure 6.10

The matrix A can be considered as an n × n matrix over C and, as such, can be used to define a linear operator U on C^n by U(v) = Av. Since U is a linear operator on a finite-dimensional vector space over C, it has an eigenvalue λ ∈ C. Let x ∈ C^n be an eigenvector corresponding to λ. We may write λ = λ_1 + iλ_2, where λ_1 and λ_2 are real, and

    x = (a_1 + ib_1, a_2 + ib_2, ..., a_n + ib_n),

where the a_i's and b_i's are real. Thus, setting

    x_1 = (a_1, a_2, ..., a_n)    and    x_2 = (b_1, b_2, ..., b_n),

we have x = x_1 + ix_2, where x_1 and x_2 have real entries. Note that at least one of x_1 or x_2 is nonzero since x ≠ 0. Hence

    U(x) = λx = (λ_1 + iλ_2)(x_1 + ix_2) = (λ_1x_1 − λ_2x_2) + i(λ_1x_2 + λ_2x_1).

Similarly,

    U(x) = A(x_1 + ix_2) = Ax_1 + iAx_2.

Comparing the real and imaginary parts of these two expressions for U(x), we conclude that

    Ax_1 = λ_1x_1 − λ_2x_2    and    Ax_2 = λ_1x_2 + λ_2x_1.

Finally, let Z = span({x_1, x_2}), the span being taken as a subspace of R^n. Since x_1 ≠ 0 or x_2 ≠ 0, Z is a nonzero subspace. Thus 1 ≤ dim(Z) ≤ 2, and the preceding pair of equations shows that Z is L_A-invariant.
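The construction in the lemma translates directly into a computation. The sketch below (an illustration only; the matrix A is an arbitrary example with a non-real eigenvalue) extracts x_1 and x_2 from a complex eigenvector and verifies the pair of equations Ax_1 = λ_1x_1 − λ_2x_2 and Ax_2 = λ_1x_2 + λ_2x_1.

    import numpy as np

    # Sketch: an L_A-invariant subspace of dimension at most 2 from a complex
    # eigenvector, as in the proof.  A has eigenvalues +-i and 2.
    A = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 2.0]])
    w, V = np.linalg.eig(A)
    k = int(np.argmax(np.abs(w.imag)))     # pick a non-real eigenvalue
    x1, x2 = V[:, k].real, V[:, k].imag
    l1, l2 = w[k].real, w[k].imag

    print(np.allclose(A @ x1, l1*x1 - l2*x2))   # True:  A x1 = l1 x1 - l2 x2
    print(np.allclose(A @ x2, l1*x2 + l2*x1))   # True:  A x2 = l1 x2 + l2 x1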
Theorem 6.46. Let T be an orthogonal operator on a nonzero finite-dimensional real inner product space V. Then there exists a collection of pairwise orthogonal T-invariant subspaces {W_1, W_2, ..., W_m} of V such that
(a) 1 ≤ dim(W_i) ≤ 2 for i = 1, 2, ..., m.
(b) V = W_1 ⊕ W_2 ⊕ ⋯ ⊕ W_m.
Proof The proof is by mathematical induction on dim(V). If dim(V) = 1,
the result is obvious. So assume that the result is true whenever dim(V) < n
for some fixed integer n > 1.

Suppose dim(V) = n. By the lemma, there exists a T-invariant subspace
W1 of V such that 1 ≤ dim(W1) ≤ 2. If W1 = V, the result is established.
Otherwise, W1⊥ ≠ {0}. By Exercise 14, W1⊥ is T-invariant and the restriction
of T to W1⊥ is orthogonal. Since dim(W1⊥) < n, we may apply the induction
hypothesis to T_{W1⊥} and conclude that there exists a collection of pairwise
orthogonal T-invariant subspaces {W2, W3, ..., Wm} of W1⊥ such that
1 ≤ dim(Wi) ≤ 2 for i = 2, 3, ..., m and W1⊥ = W2 ⊕ W3 ⊕ ··· ⊕ Wm.
Thus {W1, W2, ..., Wm} is pairwise orthogonal, and by Exercise 13(d) of
Section 6.2,

V = W1 ⊕ W1⊥ = W1 ⊕ W2 ⊕ ··· ⊕ Wm.
Applying Example 1 and Theorem 6.45 in the context of Theorem 6.46,
we conclude that the restriction of T to Wi is either a rotation or a reflection
for each i = 2, 3, ..., m. Thus, in some sense, T is composed of rotations and
reflections. Unfortunately, very little can be said about the uniqueness of the
decomposition of V in Theorem 6.46. For example, the Wi's, the number m
of Wi's, and the number of Wi's for which T_{Wi} is a reflection are not unique.
Although the number of Wi's for which T_{Wi} is a reflection is not unique,
whether this number is even or odd is an intrinsic property of T. Moreover,
we can always decompose V so that T_{Wi} is a reflection for at most one Wi.
These facts are established in the following result.
Theorem 6.47. Let T, V, W1, ..., Wm be as in Theorem 6.46.
(a) The number of Wi's for which T_{Wi} is a reflection is even or odd according
to whether det(T) = 1 or det(T) = −1.
(b) It is always possible to decompose V as in Theorem 6.46 so that the
number of Wi's for which T_{Wi} is a reflection is zero or one according to
whether det(T) = 1 or det(T) = −1. Furthermore, if T_{Wi} is a reflection,
then dim(Wi) = 1.

Proof. (a) Let r denote the number of Wi's in the decomposition for which
T_{Wi} is a reflection. Then, by Exercise 15,

det(T) = det(T_{W1}) · det(T_{W2}) ··· det(T_{Wm}) = (−1)^r,

proving (a).
(b) Let E = {x ∈ V : T(x) = −x}; then E is a T-invariant subspace
of V. If W = E⊥, then W is T-invariant. So by applying Theorem 6.46
to T_W, we obtain a collection of pairwise orthogonal T-invariant subspaces
{W1, W2, ..., Wk} of W such that W = W1 ⊕ W2 ⊕ ··· ⊕ Wk and for 1 ≤
i ≤ k, the dimension of each Wi is either 1 or 2. Observe that, for each
i = 1, 2, ..., k, T_{Wi} is a rotation. For otherwise, if T_{Wi} is a reflection, there
exists a nonzero x ∈ Wi for which T(x) = −x. But then x ∈ Wi ∩ E ⊆
E⊥ ∩ E = {0}, a contradiction. If E = {0}, the result follows. Otherwise,
choose an orthonormal basis β for E containing p vectors (p > 0). It is
possible to decompose β into a pairwise disjoint union β = β1 ∪ β2 ∪ ··· ∪ βr
such that each βi contains exactly two vectors for i < r, and βr contains
two vectors if p is even and one vector if p is odd. For each i = 1, 2, ..., r,
let W_{k+i} = span(βi). Then, clearly, {W1, W2, ..., Wk, ..., W_{k+r}} is pairwise
orthogonal, and

V = W1 ⊕ W2 ⊕ ··· ⊕ Wk ⊕ ··· ⊕ W_{k+r}.   (27)

Moreover, if any βi contains two vectors, then

det(T_{W_{k+i}}) = det([T_{W_{k+i}}]_{βi}) = det [ -1   0 ]
                                                [  0  -1 ]   = 1.

So T_{W_{k+i}} is a rotation, and hence T_{W_j} is a rotation for j < k + r. If βr
consists of one vector, then dim(W_{k+r}) = 1 and

det(T_{W_{k+r}}) = det([T_{W_{k+r}}]_{βr}) = det(−1) = −1.

Thus T_{W_{k+r}} is a reflection by Theorem 6.46, and we conclude that the decomposition in (27) satisfies the condition of (b).
As a consequence of the preceding theorem, an orthogonal operator can
be factored as a product of rotations and reflections.

Corollary. Let T be an orthogonal operator on a finite-dimensional real
inner product space V. Then there exists a collection {T1, T2, ..., Tm} of
orthogonal operators on V such that the following statements are true.
(a) For each i, Ti is either a reflection or a rotation.
(b) For at most one i, Ti is a reflection.
(c) TiTj = TjTi for all i and j.
(d) T = T1T2···Tm.
(e) det(T) = 1 if Ti is a rotation for each i, and det(T) = −1 otherwise.

Proof. As in the proof of Theorem 6.47(b), we can write

V = W1 ⊕ W2 ⊕ ··· ⊕ Wm,

where T_{Wi} is a rotation for i < m. For each i = 1, 2, ..., m, define Ti: V → V
by

Ti(x1 + x2 + ··· + xm) = x1 + x2 + ··· + x_{i−1} + T(x_i) + x_{i+1} + ··· + xm,

where xj ∈ Wj for all j. It is easily shown that each Ti is an orthogonal
operator on V. In fact, Ti is a rotation or a reflection according to whether
T_{Wi} is a rotation or a reflection. This establishes (a) and (b). The proofs
of (c), (d), and (e) are left as exercises. (See Exercise 16.)
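The construction of the commuting factors Ti can be illustrated numerically. The sketch below is not from the text; it assumes NumPy and starts from an orthogonal operator on R^5 that is already written with respect to a decomposition into rotation blocks and one one-dimensional reflection block.

```python
import numpy as np

def rot(theta):
    """2 x 2 rotation matrix."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

# an orthogonal operator on R^5 with respect to V = W1 (+) W2 (+) W3:
# two rotation blocks and one one-dimensional reflection block (the entry -1)
blocks = [rot(0.4), rot(2.1), np.array([[-1.0]])]
sizes = [b.shape[0] for b in blocks]
n = sum(sizes)

T = np.zeros((n, n))
factors = []
start = 0
for b, s in zip(blocks, sizes):
    T[start:start + s, start:start + s] = b
    # T_i acts as T on W_i and as the identity on the other summands
    Ti = np.eye(n)
    Ti[start:start + s, start:start + s] = b
    factors.append(Ti)
    start += s

# the factors commute and their product recovers T, as in the corollary
product = np.linalg.multi_dot(factors)
assert np.allclose(product, T)
assert np.allclose(factors[0] @ factors[2], factors[2] @ factors[0])
assert np.isclose(np.linalg.det(T), -1.0)   # exactly one reflection factor
```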

Example 3
Orthogonal Operators on a Three-Dimensional Real Inner Product Space
Let T be an orthogonal operator on a three-dimensional real inner product
space V. We show that T can be decomposed into the composite of a rotation
and at most one reflection. Let

V = W1 ⊕ W2 ⊕ ··· ⊕ Wm

be a decomposition as in Theorem 6.47(b). Clearly, m = 2 or m = 3.
If m = 2, then V = W1 ⊕ W2. Without loss of generality, suppose that
dim(W1) = 1 and dim(W2) = 2. Thus T_{W1} is a reflection or the identity on
W1, and T_{W2} is a rotation. Defining T1 and T2 as in the proof of the corollary
to Theorem 6.47, we have that T = T1T2 is the composite of a rotation and
at most one reflection. (Note that if T_{W1} is not a reflection, then T1 is the
identity on V and T = T2.)
If m = 3, then V = W1 ⊕ W2 ⊕ W3 and dim(Wi) = 1 for all i. For each
i, let Ti be as in the proof of the corollary to Theorem 6.47. If T_{Wi} is not a
reflection, then Ti is the identity on V. Otherwise, Ti is a reflection. Since
T_{Wi} is a reflection for at most one i, we conclude that T is either a single
reflection or the identity (a rotation). •
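A numerical illustration of this decomposition, not part of the text: assuming NumPy, take a random 3 × 3 orthogonal matrix Q with det(Q) = −1 and factor it as a rotation composed with a single (Householder) reflection.

```python
import numpy as np

rng = np.random.default_rng(0)

# a random 3 x 3 orthogonal matrix, via the QR factorization of a random matrix
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(Q) > 0:            # force the interesting case det(Q) = -1
    Q[:, 0] = -Q[:, 0]

# any Householder reflection H = I - 2uu^T is orthogonal with det(H) = -1
u = rng.standard_normal(3)
u /= np.linalg.norm(u)
H = np.eye(3) - 2.0 * np.outer(u, u)

# R = Q H is orthogonal with det(R) = +1, so Q = R H is the composite of
# the rotation R and the single reflection H, as in Example 3
R = Q @ H
assert np.allclose(R.T @ R, np.eye(3))
assert np.isclose(np.linalg.det(R), 1.0)
assert np.allclose(R @ H, Q)
```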
EXERCISES
1. Label the following statements as true or false. Assume that the underlying vector spaces are finite-dimensional real inner product spaces.
(a) Any orthogonal operator is either a rotation or a reflection.
(b) The composite of any two rotations on a two-dimensional space is
a rotation.
(c) The composite of any two rotations on a three-dimensional space
is a rotation.
(d) The composite of any two rotations on a four-dimensional space is
a rotation.
(e) The identity operator is a rotation.
(f) The composite of two reflections is a reflection.
(g) Any orthogonal operator is a composite of rotations.
(h) For any orthogonal operator T, if det(T) = −1, then T is a reflection.
(i) Reflections always have eigenvalues.
(j) Rotations always have eigenvalues.
2. Prove that rotations, reflections, and composites of rotations and reflections are orthogonal operators.

3. Let

A = [ 1/2    √3/2 ]        and    B = [  0  -1 ]
    [ √3/2  -1/2  ]                   [ -1   0 ]

(a) Prove that L_A is a reflection.
(b) Find the axis in R² about which L_A reflects, that is, the subspace
of R² on which L_A acts as the identity.
(c) Prove that L_{AB} and L_{BA} are rotations.
4. For any real number φ, let

A = [ cos φ    sin φ ]
    [ sin φ   -cos φ ]

(a) Prove that L_A is a reflection.
(b) Find the axis in R² about which L_A reflects.
5. For any real number φ, define T_φ = L_A, where

A = [ cos φ   -sin φ ]
    [ sin φ    cos φ ]

(a) Prove that any rotation on R² is of the form T_φ for some φ.
(b) Prove that T_φ T_ψ = T_{(φ+ψ)} for any φ, ψ ∈ R.
(c) Deduce that any two rotations on R² commute.
6. Prove that the composite of any two rotations on R3 is a rotation on
R3.
7. Given real numbers φ and ψ, define matrices

A = [ 1     0        0     ]        and    B = [ cos ψ   -sin ψ   0 ]
    [ 0   cos φ   -sin φ   ]                   [ sin ψ    cos ψ   0 ]
    [ 0   sin φ    cos φ   ]                   [   0        0     1 ]

(a) Prove that L_A and L_B are rotations.
(b) Prove that L_{AB} is a rotation.
(c) Find the axis of rotation for L_{AB}.
8. Prove Theorem 6.45 using the hints preceding the statement of the
theorem.
9. Prove that no orthogonal operator can be both a rotation and a reflec­
tion.

10. Prove that if V is a two- or three-dimensional real inner product space,
then the composite of two reflections on V is a rotation of V.
11. Give an example of an orthogonal operator that is neither a reflection
nor a rotation.
12. Let V be a finite-dimensional real inner product space. Define T: V → V
by T(x) = −x. Prove that T is a product of rotations if and only if
dim(V) is even.
13. Complete the proof of the lemma to Theorem 6.46 by showing that
W = φ_β^{-1}(Z) satisfies the required conditions.
14. Let T be an orthogonal [unitary] operator on a finite-dimensional real
[complex] inner product space V. If W is a T-invariant subspace of V,
prove the following results.
(a) T_W is an orthogonal [unitary] operator on W.
(b) W⊥ is a T-invariant subspace of V. Hint: Use the fact that T_W
is one-to-one and onto to conclude that, for any y ∈ W, T*(y) =
T^{-1}(y) ∈ W.
(c) T_{W⊥} is an orthogonal [unitary] operator on W⊥.
15. Let T be a linear operator on a finite-dimensional vector space V, where
V is a direct sum of T-invariant subspaces, say, V = W1 ⊕ W2 ⊕ ··· ⊕ Wk.
Prove that det(T) = det(T_{W1}) · det(T_{W2}) ··· det(T_{Wk}).
16. Complete the proof of the corollary to Theorem 6.47.
17. Let T be a linear operator on an n-dimensional real inner product space
V. Suppose that T is not the identity. Prove the following results.
(a) If n is odd, then T can be expressed as the composite of at most
one reflection and at most ½(n − 1) rotations.
(b) If n is even, then T can be expressed as the composite of at most
½n rotations or as the composite of one reflection and at most
½(n − 2) rotations.
18. Let V be a real inner product space of dimension 2. For any x, y ∈ V
such that x ≠ y and ||x|| = ||y|| = 1, show that there exists a unique
rotation T on V such that T(x) = y.
INDEX OF DEFINITIONS FOR CHAPTER 6
Adjoint of a linear operator 358
Adjoint of a matrix 331
Axis of rotation 473
Bilinear form 422
Complex inner product space 332
Condition number 469

Congruent matrices 426
Conjugate transpose (adjoint) of a
matrix 331
Critical point 439
Diagonalizable bilinear form 428
Fourier coefficients of a vector rela­
tive to an orthonormal set 348
Frobenius inner product 332
Gram-Schmidt orthogonalization
process 344
Hessian matrix 440
Index of a bilinear form 444
Index of a matrix 445
Inner product 329
Inner product space 332
Invariants of a bilinear form 444
Invariants of a matrix 445
Least squares line 361
Legendre polynomials 346
Local extremum 439
Local maximum 439
Local minimum 439
Lorentz transformation 454
Matrix representation of a bilinear
form 424
Minimal solution of a system of equa­
tions 364
Norm of a matrix 467
Norm of a vector 333
Normal matrix 370
Normal operator 370
Normalizing a vector 335
Orthogonal complement of a subset
of an inner product space 349
Orthogonally equivalent matrices 384
Orthogonal matrix 382
Orthogonal operator 379
Orthogonal projection 398
Orthogonal projection on a subspace
351
Orthogonal subset of an inner prod­
uct space 335
Orthogonal vectors 335
Orthonormal basis 341
Orthonormal set 335
Penrose conditions 421
Permanent of a 2 × 2 matrix 448
Polar decomposition of a matrix
412
Pseudoinverse of a linear transforma­
tion 413
Pseudoinverse of a matrix 414
Quadratic form 433
Rank of a bilinear form 443
Rayleigh quotient 467
Real inner product space 332
Reflection 473
Resolution of the identity operator
induced by a linear transformation
402
Rigid motion 385
Rotation 472
Self-adjoint matrix 373
Self-adjoint operator 373
Signature of a form 444
Signature of a matrix 445
Singular value decomposition of a
matrix 410
Singular value of a linear transforma­
tion 407
Singular value of a matrix 410
Space-time coordinates 453
Spectral decomposition of a linear
operator 402
Spectrum of a linear operator 402
Standard inner product 330
Symmetric bilinear form 428
Translation 386
Trigonometric polynomial 399
Unitarily equivalent matrices 384
Unitary matrix 382
Unitary operator 379
Unit vector 335

7
Canonical Forms

7.1 The Jordan Canonical Form I
7.2 The Jordan Canonical Form II
7.3 The Minimal Polynomial
7.4* The Rational Canonical Form
As we learned in Chapter 5, the advantage of a diagonalizable linear operator lies in the simplicity of its description. Such an operator has a diagonal
matrix representation, or, equivalently, there is an ordered basis for the underlying vector space consisting of eigenvectors of the operator. However, not
every linear operator is diagonalizable, even if its characteristic polynomial
splits. Example 3 of Section 5.2 describes such an operator.
It is the purpose of this chapter to consider alternative matrix representations for nondiagonalizable operators. These representations are called
canonical forms. There are different kinds of canonical forms, and their advantages and disadvantages depend on how they are applied. The choice of a
canonical form is determined by the appropriate choice of an ordered basis.
Naturally, the canonical forms of a linear operator are not diagonal matrices
if the linear operator is not diagonalizable.
In this chapter, we treat two common canonical forms. The first of these,
the Jordan canonical form, requires that the characteristic polynomial of
the operator splits. This form is always available if the underlying field is
algebraically closed, that is, if every polynomial with coefficients from the field
splits. For example, the field of complex numbers is algebraically closed by
the fundamental theorem of algebra (see Appendix D). The first two sections
deal with this form. The rational canonical form, treated in Section 7.4, does
not require such a factorization.
7.1 THE JORDAN CANONICAL FORM I
Let T be a linear operator on a finite-dimensional vector space V, and suppose
that the characteristic polynomial of T splits. Recall from Section 5.2 that
the diagonalizability of T depends on whether the union of ordered bases
for the distinct eigenspaces of T is an ordered basis for V. So a lack of
diagonalizability means that at least one eigenspace of T is too "small."

In this section, we extend the definition of eigenspace to generalized
eigenspace. From these subspaces, we select ordered bases whose union is
an ordered basis β for V such that

[T]_β = [ A1   O   ···   O  ]
        [  O   A2  ···   O  ]
        [  ⋮    ⋮          ⋮ ]
        [  O   O   ···   Ak ]

where each O is a zero matrix, and each Ai is a square matrix of the form
(λ) or

[ λ  1  0  ···  0  0 ]
[ 0  λ  1  ···  0  0 ]
[ ⋮  ⋮  ⋮       ⋮  ⋮ ]
[ 0  0  0  ···  λ  1 ]
[ 0  0  0  ···  0  λ ]

for some eigenvalue λ of T. Such a matrix Ai is called a Jordan block
corresponding to λ, and the matrix [T]_β is called a Jordan canonical form
of T. We also say that the ordered basis β is a Jordan canonical basis
for T. Observe that each Jordan block Ai is "almost" a diagonal matrix; in
fact, [T]_β is a diagonal matrix if and only if each Ai is of the form (λ).
Example 1
Suppose that T is a linear operator on C^8, and β = {v1, v2, ..., v8} is an
ordered basis for C^8 such that

J = [T]_β = [ 2 1 0 0 0 0 0 0 ]
            [ 0 2 1 0 0 0 0 0 ]
            [ 0 0 2 0 0 0 0 0 ]
            [ 0 0 0 2 0 0 0 0 ]
            [ 0 0 0 0 3 1 0 0 ]
            [ 0 0 0 0 0 3 0 0 ]
            [ 0 0 0 0 0 0 0 1 ]
            [ 0 0 0 0 0 0 0 0 ]

is a Jordan canonical form of T. Notice that the characteristic polynomial
of T is det(J − tI) = (t − 2)^4 (t − 3)^2 t^2, and hence the multiplicity of each
eigenvalue is the number of times that the eigenvalue appears on the diagonal
of J. Also observe that v1, v4, v5, and v7 are the only vectors in β that are
eigenvectors of T. These are the vectors corresponding to the columns of J
with no 1 above the diagonal entry. •
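These observations are easy to confirm numerically. The following sketch is not part of the text; it assumes NumPy and simply rebuilds the matrix J of Example 1.

```python
import numpy as np

# the matrix J of Example 1: Jordan blocks for 2 (sizes 3 and 1),
# for 3 (size 2), and for 0 (size 2)
J = np.zeros((8, 8))
J[:3, :3] = [[2, 1, 0], [0, 2, 1], [0, 0, 2]]
J[3, 3] = 2
J[4:6, 4:6] = [[3, 1], [0, 3]]
J[6:8, 6:8] = [[0, 1], [0, 0]]

# each eigenvalue appears on the diagonal as many times as its multiplicity
vals, counts = np.unique(np.diag(J), return_counts=True)
print(dict(zip(vals, counts)))        # {0.0: 2, 2.0: 4, 3.0: 2}

# only e1, e4, e5, e7 (that is, v1, v4, v5, v7) are eigenvectors of J
for i, lam in enumerate(np.diag(J)):
    e = np.zeros(8); e[i] = 1.0
    if np.allclose(J @ e, lam * e):
        print("column", i + 1, "is an eigenvector for", lam)
```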

In Sections 7.1 and 7.2, we prove that every linear operator whose characteristic polynomial splits has a Jordan canonical form that is unique up to the
order of the Jordan blocks. Nevertheless, it is not the case that the Jordan
canonical form is completely determined by the characteristic polynomial of
the operator. For example, let T' be the linear operator on C^8 such that
[T']_β = J', where β is the ordered basis in Example 1 and

J' = [ 2 0 0 0 0 0 0 0 ]
     [ 0 2 0 0 0 0 0 0 ]
     [ 0 0 2 0 0 0 0 0 ]
     [ 0 0 0 2 0 0 0 0 ]
     [ 0 0 0 0 3 0 0 0 ]
     [ 0 0 0 0 0 3 0 0 ]
     [ 0 0 0 0 0 0 0 0 ]
     [ 0 0 0 0 0 0 0 0 ]

Then the characteristic polynomial of T' is also (t − 2)^4 (t − 3)^2 t^2. But the
operator T' has the Jordan canonical form J', which is different from J, the
Jordan canonical form of the linear operator T of Example 1.
Consider again the matrix J and the ordered basis β of Example 1. Notice
that T(v2) = v1 + 2v2 and therefore (T − 2I)(v2) = v1. Similarly, (T − 2I)(v3) =
v2. Since v1 and v4 are eigenvectors of T corresponding to λ = 2, it follows
that (T − 2I)^3(v_i) = 0 for i = 1, 2, 3, and 4. Similarly, (T − 3I)^2(v_i) = 0 for
i = 5, 6, and (T − 0I)^2(v_i) = 0 for i = 7, 8.
Because of the structure of each Jordan block in a Jordan canonical form,
we can generalize these observations: If v lies in a Jordan canonical basis for
a linear operator T and is associated with a Jordan block with diagonal entry
λ, then (T − λI)^p(v) = 0 for sufficiently large p. Eigenvectors satisfy this
condition for p = 1.
Definition. Let T be a linear operator on a vector space V, and let λ be
a scalar. A nonzero vector x in V is called a generalized eigenvector of T
corresponding to λ if (T − λI)^p(x) = 0 for some positive integer p.
Notice that if x is a generalized eigenvector of T corresponding to λ, and p
is the smallest positive integer for which (T − λI)^p(x) = 0, then (T − λI)^{p−1}(x)
is an eigenvector of T corresponding to λ. Therefore λ is an eigenvalue of T.
In the context of Example 1, each vector in β is a generalized eigenvector
of T. In fact, v1, v2, v3, and v4 correspond to the scalar 2, v5 and v6 correspond
to the scalar 3, and v7 and v8 correspond to the scalar 0.
Just as eigenvectors lie in eigenspaces, generalized eigenvectors lie in "gen­
eralized eigenspaces."
Definition. Let T be a linear operator on a vector space V, and let λ be
an eigenvalue of T. The generalized eigenspace of T corresponding to
λ, denoted K_λ, is the subset of V defined by

K_λ = {x ∈ V : (T − λI)^p(x) = 0 for some positive integer p}.
Note that K_λ consists of the zero vector and all generalized eigenvectors
corresponding to λ.
Recall that a subspace W of V is T-invariant for a linear operator T if
T(W) ⊆ W. In the development that follows, we assume the results of Exercises 3 and 4 of Section 5.4. In particular, for any polynomial g(x), if W is
T-invariant, then it is also g(T)-invariant. Furthermore, the range of a linear
operator T is T-invariant.
Theorem 7.1. Let T be a linear operator on a vector space V, and let λ
be an eigenvalue of T. Then
(a) K_λ is a T-invariant subspace of V containing E_λ (the eigenspace of T
corresponding to λ).
(b) For any scalar μ ≠ λ, the restriction of T − μI to K_λ is one-to-one.

Proof. (a) Clearly, 0 ∈ K_λ. Suppose that x and y are in K_λ. Then there
exist positive integers p and q such that

(T − λI)^p(x) = (T − λI)^q(y) = 0.

Therefore

(T − λI)^{p+q}(x + y) = (T − λI)^{p+q}(x) + (T − λI)^{p+q}(y)
                      = (T − λI)^q(0) + (T − λI)^p(0)
                      = 0,

and hence x + y ∈ K_λ. The proof that K_λ is closed under scalar multiplication
is straightforward.
To show that K_λ is T-invariant, consider any x ∈ K_λ. Choose a positive
integer p such that (T − λI)^p(x) = 0. Then

(T − λI)^p T(x) = T(T − λI)^p(x) = T(0) = 0.

Therefore T(x) ∈ K_λ.
Finally, it is a simple observation that E_λ is contained in K_λ.
(b) Let x ∈ K_λ and (T − μI)(x) = 0. By way of contradiction, suppose
that x ≠ 0. Let p be the smallest integer for which (T − λI)^p(x) = 0, and let
y = (T − λI)^{p−1}(x). Then

(T − λI)(y) = (T − λI)^p(x) = 0,

and hence y ∈ E_λ. Furthermore,

(T − μI)(y) = (T − μI)(T − λI)^{p−1}(x) = (T − λI)^{p−1}(T − μI)(x) = 0,

so that y ∈ E_μ. But E_λ ∩ E_μ = {0}, and thus y = 0, contrary to the
hypothesis. So x = 0, and the restriction of T − μI to K_λ is one-to-one.

Theorem 7.2. Let T be a linear operator on a finite-dimensional vector
space V such that the characteristic polynomial of T splits. Suppose that λ
is an eigenvalue of T with multiplicity m. Then
(a) dim(K_λ) ≤ m.
(b) K_λ = N((T − λI)^m).

Proof. (a) Let W = K_λ, and let h(t) be the characteristic polynomial of T_W.
By Theorem 5.21 (p. 314), h(t) divides the characteristic polynomial of T, and
by Theorem 7.1(b), λ is the only eigenvalue of T_W. Hence h(t) = (−1)^d (t − λ)^d,
where d = dim(W), and d ≤ m.
(b) Clearly N((T − λI)^m) ⊆ K_λ. Now let W and h(t) be as in (a). Then
h(T_W) is identically zero by the Cayley-Hamilton theorem (p. 317); therefore
(T − λI)^d(x) = 0 for all x ∈ W. Since d ≤ m, we have K_λ ⊆ N((T − λI)^m).
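Theorem 7.2(b) makes the generalized eigenspace directly computable as a null space. The sketch below is not from the text; it assumes NumPy and uses the 3 × 3 matrix of Example 2 below (as reconstructed here), whose eigenvalues are 3 (multiplicity 1) and 2 (multiplicity 2).

```python
import numpy as np

A = np.array([[ 3.,  1., -2.],
              [-1.,  0.,  5.],
              [-1., -1.,  4.]])

def null_space(M, tol=1e-10):
    """Orthonormal basis of the null space of M, via the SVD."""
    _, s, vt = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return vt[rank:].conj().T

# K_lambda = N((A - lambda I)^m), where m is the multiplicity (Theorem 7.2(b))
for lam, m in [(3, 1), (2, 2)]:
    K = null_space(np.linalg.matrix_power(A - lam * np.eye(3), m))
    E = null_space(A - lam * np.eye(3))
    print(lam, "dim K =", K.shape[1], " dim E =", E.shape[1])
# prints: 3 dim K = 1  dim E = 1   and   2 dim K = 2  dim E = 1
```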
Theorem 7.3. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits, and let
λ1, λ2, ..., λk be the distinct eigenvalues of T. Then, for every x ∈ V, there
exist vectors vi ∈ K_{λi}, 1 ≤ i ≤ k, such that

x = v1 + v2 + ··· + vk.

Proof. The proof is by mathematical induction on the number k of distinct eigenvalues of T. First suppose that k = 1, and let m be the multiplicity of λ1. Then (λ1 − t)^m is the characteristic polynomial of T, and hence
(λ1 I − T)^m = T_0 by the Cayley-Hamilton theorem (p. 317). Thus V = K_{λ1},
and the result follows.
Now suppose that for some integer k > 1, the result is established whenever T has fewer than k distinct eigenvalues, and suppose that T has k distinct
eigenvalues. Let m be the multiplicity of λk, and let f(t) be the characteristic
polynomial of T. Then f(t) = (t − λk)^m g(t) for some polynomial g(t) not
divisible by (t − λk). Let W = R((T − λk I)^m). Clearly W is T-invariant.
Observe that (T − λk I)^m maps K_{λi} onto itself for i < k. For suppose that
i < k. Since (T − λk I)^m maps K_{λi} into itself and λk ≠ λi, the restriction
of T − λk I to K_{λi} is one-to-one (by Theorem 7.1(b)) and hence is onto. One
consequence of this is that for i < k, K_{λi} is contained in W, and hence λi is
an eigenvalue of T_W with corresponding generalized eigenspace K_{λi}.
Next, observe that λk is not an eigenvalue of T_W. For suppose that T(v) =
λk v for some v ∈ W. Then v = (T − λk I)^m(y) for some y ∈ V, and it follows
that

0 = (T − λk I)(v) = (T − λk I)^{m+1}(y).

Therefore y ∈ K_{λk}. So by Theorem 7.2, v = (T − λk I)^m(y) = 0.
Since every eigenvalue of T_W is an eigenvalue of T, the distinct eigenvalues
of T_W are λ1, λ2, ..., λ_{k−1}.

Now let x ∈ V. Then (T − λk I)^m(x) ∈ W. Since T_W has the k − 1
distinct eigenvalues λ1, λ2, ..., λ_{k−1}, the induction hypothesis applies. The
corresponding generalized eigenspace of T_W for each λi is K_{λi}, and hence
there are vectors wi ∈ K_{λi}, 1 ≤ i ≤ k − 1, such that

(T − λk I)^m(x) = w1 + w2 + ··· + w_{k−1}.

Since (T − λk I)^m maps K_{λi} onto itself for i < k, there exist vectors vi ∈ K_{λi}
such that (T − λk I)^m(vi) = wi for i < k. Thus

(T − λk I)^m(x) = (T − λk I)^m(v1) + (T − λk I)^m(v2) + ··· + (T − λk I)^m(v_{k−1}),

and it follows that x − (v1 + v2 + ··· + v_{k−1}) ∈ K_{λk}. Therefore there exists a
vector vk ∈ K_{λk} such that

x = v1 + v2 + ··· + vk.
The next result extends Theorem 5.9(b) (p. 268) to all linear operators
whose characteristic polynomials split. In this case, the eigenspaces are re­
placed by generalized eigenspaces.
Theorem 7.4. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits, and let
λ1, λ2, ..., λk be the distinct eigenvalues of T with corresponding multiplicities m1, m2, ..., mk. For 1 ≤ i ≤ k, let βi be an ordered basis for K_{λi}. Then
the following statements are true.
(a) βi ∩ βj = ∅ for i ≠ j.
(b) β = β1 ∪ β2 ∪ ··· ∪ βk is an ordered basis for V.
(c) dim(K_{λi}) = mi for all i.

Proof. (a) Suppose that x ∈ βi ∩ βj ⊆ K_{λi} ∩ K_{λj}, where i ≠ j. By
Theorem 7.1(b), T − λi I is one-to-one on K_{λj}, and therefore (T − λi I)^p(x) ≠ 0
for any positive integer p. But this contradicts the fact that x ∈ K_{λi}, and the
result follows.
(b) Let x ∈ V. By Theorem 7.3, for 1 ≤ i ≤ k, there exist vectors vi ∈ K_{λi}
such that x = v1 + v2 + ··· + vk. Since each vi is a linear combination of
the vectors of βi, it follows that x is a linear combination of the vectors of β.
Therefore β spans V. Let q be the number of vectors in β. Then dim(V) ≤ q.
For each i, let di = dim(K_{λi}). Then, by Theorem 7.2(a),

q = d1 + d2 + ··· + dk ≤ m1 + m2 + ··· + mk = dim(V).

Hence q = dim(V). Consequently β is a basis for V by Corollary 2 to the
replacement theorem (p. 47).

(c) Using the notation and result of (b), we see that d1 + d2 + ··· + dk =
m1 + m2 + ··· + mk. But di ≤ mi by Theorem 7.2(a), and therefore di = mi
for all i.

Corollary. Let T be a linear operator on a finite-dimensional vector space
V such that the characteristic polynomial of T splits. Then T is diagonalizable
if and only if E_λ = K_λ for every eigenvalue λ of T.

Proof. Combining Theorems 7.4 and 5.9(a) (p. 268), we see that T is
diagonalizable if and only if dim(E_λ) = dim(K_λ) for each eigenvalue λ of T.
But E_λ ⊆ K_λ, and hence these subspaces have the same dimension if and only
if they are equal.
We now focus our attention on the problem of selecting suitable bases for
the generalized eigenspaces of a linear operator so that we may use Theorem 7.4 to obtain a Jordan canonical basis for the operator. For this purpose,
we consider again the basis β of Example 1. We have seen that the first four
vectors of β lie in the generalized eigenspace K_2. Observe that the vectors in
β that determine the first Jordan block of J are of the form

{v1, v2, v3} = {(T − 2I)^2(v3), (T − 2I)(v3), v3}.

Furthermore, observe that (T − 2I)^3(v3) = 0. The relation between these vectors is the key to finding Jordan canonical bases. This leads to the following
definitions.

Definitions. Let T be a linear operator on a vector space V, and let x
be a generalized eigenvector of T corresponding to the eigenvalue λ. Suppose
that p is the smallest positive integer for which (T − λI)^p(x) = 0. Then the
ordered set

{(T − λI)^{p−1}(x), (T − λI)^{p−2}(x), ..., (T − λI)(x), x}

is called a cycle of generalized eigenvectors of T corresponding to λ.
The vectors (T − λI)^{p−1}(x) and x are called the initial vector and the end
vector of the cycle, respectively. We say that the length of the cycle is p.

Notice that the initial vector of a cycle of generalized eigenvectors of a
linear operator T is the only eigenvector of T in the cycle. Also observe that
if x is an eigenvector of T corresponding to the eigenvalue λ, then the set {x}
is a cycle of generalized eigenvectors of T corresponding to λ of length 1.
In Example 1, the subsets β1 = {v1, v2, v3}, β2 = {v4}, β3 = {v5, v6},
and β4 = {v7, v8} are the cycles of generalized eigenvectors of T that occur
in β. Notice that β is a disjoint union of these cycles. Furthermore, setting
Wi = span(βi) for 1 ≤ i ≤ 4, we see that βi is a basis for Wi and [T_{Wi}]_{βi} is
the ith Jordan block of the Jordan canonical form of T. This is precisely the
condition that is required for a Jordan canonical basis.
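A cycle is completely determined by its end vector, which makes it convenient to generate cycles mechanically. The helper below is not from the text; it assumes NumPy, and the input x must already be a generalized eigenvector for the given eigenvalue.

```python
import numpy as np

def cycle_of_generalized_eigenvectors(A, lam, x, tol=1e-10):
    """Return the cycle {(A - lam I)^(p-1) x, ..., (A - lam I) x, x},
    where p is the smallest power annihilating the end vector x."""
    N = A - lam * np.eye(A.shape[0])
    chain = [np.asarray(x, dtype=float)]
    while np.linalg.norm(N @ chain[-1]) > tol:
        chain.append(N @ chain[-1])
    return chain[::-1]           # initial vector first, end vector last

# example: a single 3 x 3 Jordan block for lam = 5 and end vector e3
J = np.array([[5., 1., 0.],
              [0., 5., 1.],
              [0., 0., 5.]])
cycle = cycle_of_generalized_eigenvectors(J, 5.0, [0., 0., 1.])
for v in cycle:
    print(v)        # e1, e2, e3: a cycle of length 3
```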

Theorem 7.5. Let T be a linear operator on a finite-dimensional vector
space V whose characteristic polynomial splits, and suppose that β is a basis
for V such that β is a disjoint union of cycles of generalized eigenvectors of
T. Then the following statements are true.
(a) For each cycle γ of generalized eigenvectors contained in β, W = span(γ)
is T-invariant, and [T_W]_γ is a Jordan block.
(b) β is a Jordan canonical basis for V.

Proof. (a) Suppose that γ corresponds to λ, γ has length p, and x is the
end vector of γ. Then γ = {v1, v2, ..., vp}, where

vi = (T − λI)^{p−i}(x) for i < p and vp = x.

So

(T − λI)(v1) = (T − λI)^p(x) = 0,

and hence T(v1) = λv1. For i > 1,

(T − λI)(vi) = (T − λI)^{p−i+1}(x) = v_{i−1}.

Therefore T maps W into itself, and, by the preceding equations, we see that
[T_W]_γ is a Jordan block.
For (b), simply repeat the arguments of (a) for each cycle in β in order to
obtain [T]_β. We leave the details as an exercise.
In view of this result, we must show that, under appropriate conditions,
there exist bases that are disjoint unions of cycles of generalized eigenvectors.
Since the characteristic polynomial of a Jordan canonical form splits, this is
a necessary condition. We will soon see that it is also sufficient. The next
result moves us toward the desired existence theorem.

Theorem 7.6. Let T be a linear operator on a vector space V, and let
λ be an eigenvalue of T. Suppose that γ1, γ2, ..., γq are cycles of generalized
eigenvectors of T corresponding to λ such that the initial vectors of the γi's
are distinct and form a linearly independent set. Then the γi's are disjoint,
and their union γ = γ1 ∪ γ2 ∪ ··· ∪ γq is linearly independent.

Proof. Exercise 5 shows that the γi's are disjoint.
The proof that γ is linearly independent is by mathematical induction on
the number of vectors in γ. If this number is less than 2, then the result is
clear. So assume that, for some integer n > 1, the result is valid whenever γ
has fewer than n vectors, and suppose that γ has exactly n vectors. Let W
be the subspace of V generated by γ. Clearly W is (T − λI)-invariant, and
dim(W) ≤ n. Let U denote the restriction of T − λI to W.

For each i, let γi' denote the cycle obtained from γi by deleting the end
vector. Note that if γi has length one, then γi' = ∅. In the case that γi' ≠ ∅,
each vector of γi' is the image under U of a vector in γi, and conversely, every
nonzero image under U of a vector of γi is contained in γi'. Let γ' = ∪_i γi'.
Then by the last statement, γ' generates R(U). Furthermore, γ' consists of
n − q vectors, and the initial vectors of the γi''s are also initial vectors of
the γi's. Thus we may apply the induction hypothesis to conclude that γ' is
linearly independent. Therefore γ' is a basis for R(U). Hence dim(R(U)) =
n − q. Since the q initial vectors of the γi's form a linearly independent set
and lie in N(U), we have dim(N(U)) ≥ q. From these inequalities and the
dimension theorem, we obtain

n ≥ dim(W) = dim(R(U)) + dim(N(U)) ≥ (n − q) + q = n.

We conclude that dim(W) = n. Since γ generates W and consists of n vectors,
it must be a basis for W. Hence γ is linearly independent.
Corollary. Every cycle of generalized eigenvectors of a linear operator is
linearly independent.
Theorem 7.7. Let T be a linear operator on a finite-dimensional vector
space V, and let λ be an eigenvalue of T. Then K_λ has an ordered basis consisting of a union of disjoint cycles of generalized eigenvectors corresponding
to λ.

Proof. The proof is by mathematical induction on n = dim(K_λ). The
result is clear for n = 1. So suppose that for some integer n > 1 the result is
valid whenever dim(K_λ) < n, and assume that dim(K_λ) = n. Let U denote the
restriction of T − λI to K_λ. Then R(U) is a subspace of K_λ of lesser dimension,
and R(U) is the space of generalized eigenvectors corresponding to λ for the
restriction of T to R(U). Therefore, by the induction hypothesis, there exist
disjoint cycles γ1, γ2, ..., γq of generalized eigenvectors of this restriction, and
hence of T itself, corresponding to λ for which γ = γ1 ∪ γ2 ∪ ··· ∪ γq is a basis
for R(U).
For 1 ≤ i ≤ q, the end vector of γi is the image under U of a vector vi ∈ K_λ,
and so we can extend each γi to a larger cycle γ̃i = γi ∪ {vi} of generalized
eigenvectors of T corresponding to λ. For 1 ≤ i ≤ q, let wi be the initial vector
of γ̃i (and hence of γi). Since {w1, w2, ..., wq} is a linearly independent subset of E_λ, this set can be extended to a basis {w1, w2, ..., wq, u1, u2, ..., us}
for E_λ. Then γ̃1, γ̃2, ..., γ̃q, {u1}, {u2}, ..., {us} are disjoint cycles of generalized eigenvectors of T corresponding to λ such that the initial vectors of
these cycles are linearly independent. Therefore their union γ̃ is a linearly
independent subset of K_λ by Theorem 7.6.
We show that γ̃ is a basis for K_λ. Suppose that γ consists of r =
rank(U) vectors. Then γ̃ consists of r + q + s vectors. Furthermore, since
{w1, w2, ..., wq, u1, u2, ..., us} is a basis for E_λ = N(U), it follows that
nullity(U) = q + s. Therefore

dim(K_λ) = rank(U) + nullity(U) = r + q + s.

So γ̃ is a linearly independent subset of K_λ containing dim(K_λ) vectors. It
follows that γ̃ is a basis for K_λ.
The following corollary is immediate.
Corollary 1. Let T be a linear operator on a finite-dimensional vec­
tor space V whose characteristic polynomial splits. Then T has a Jordan
canonical form.
Proof. Let λ1, λ2, ..., λk be the distinct eigenvalues of T. By Theorem 7.7,
for each i there is an ordered basis βi consisting of a disjoint union of cycles
of generalized eigenvectors corresponding to λi. Let β = β1 ∪ β2 ∪ ··· ∪ βk.
Then, by Theorem 7.4(b), β is an ordered basis for V.
The Jordan canonical form also can be studied from the viewpoint of
matrices.
Definition. Let A ∈ M_{n×n}(F) be such that the characteristic polynomial
of A (and hence of L_A) splits. Then the Jordan canonical form of A is
defined to be the Jordan canonical form of the linear operator L_A on F^n.
The next result is an immediate consequence of this definition and Corol­
lary 1.
Corollary 2. Let A be an n x n matrix whose characteristic polynomial
splits. Then A has a Jordan canonical form J, and A is similar to J.
Proof. Exercise. 1
We can now compute the Jordan canonical forms of matrices and linear
operators in some simple cases, as is illustrated in the next two examples.
The tools necessary for computing the Jordan canonical forms in general are
developed in the next section.

Example 2
Let

A = [  3   1  -2 ]
    [ -1   0   5 ]      ∈ M_{3×3}(R).
    [ -1  -1   4 ]

To find the Jordan canonical form for A, we need to find a Jordan canonical
basis for T = L_A.
The characteristic polynomial of A is

f(t) = det(A − tI) = −(t − 3)(t − 2)^2.

Hence λ1 = 3 and λ2 = 2 are the eigenvalues of A with multiplicities 1
and 2, respectively. By Theorem 7.4, dim(K_{λ1}) = 1, and dim(K_{λ2}) = 2. By
Theorem 7.2, K_{λ1} = N(T − 3I), and K_{λ2} = N((T − 2I)^2). Since E_{λ1} = N(T − 3I),
we have that E_{λ1} = K_{λ1}. Observe that (−1, 2, 1) is an eigenvector of T
corresponding to λ1 = 3; therefore

β1 = { (−1, 2, 1) }

is a basis for K_{λ1}.
Since dim(K_{λ2}) = 2 and a generalized eigenspace has a basis consisting of
a union of cycles, this basis is either a union of two cycles of length 1 or a
single cycle of length 2. The former case is impossible because the vectors in
the basis would be eigenvectors, contradicting the fact that dim(E_{λ2}) = 1.
Therefore the desired basis is a single cycle of length 2. A vector v is the end
vector of such a cycle if and only if (A − 2I)v ≠ 0, but (A − 2I)^2 v = 0. It
can easily be shown that

{ (−1, 2, 0), (1, 0, 2) }

is a basis for the solution space of the homogeneous system (A − 2I)^2 x = 0.
Now choose a vector v in this set so that (A − 2I)v ≠ 0. The vector v =
(−1, 2, 0) is an acceptable candidate for v. Since (A − 2I)v = (1, −3, −1), we
obtain the cycle of generalized eigenvectors

β2 = { (A − 2I)v, v } = { (1, −3, −1), (−1, 2, 0) }

as a basis for K_{λ2}. Finally, we take the union of these two bases to obtain

β = β1 ∪ β2 = { (−1, 2, 1), (1, −3, −1), (−1, 2, 0) },

which is a Jordan canonical basis for A. Therefore,

J = [T]_β = [ 3  0  0 ]
            [ 0  2  1 ]
            [ 0  0  2 ]

is a Jordan canonical form for A. Notice that A is similar to J. In fact,
J = Q^{-1}AQ, where Q is the matrix whose columns are the vectors in β.
Example 3
Let T be the linear operator on P_2(R) defined by T(g(x)) = −g(x) − g'(x).
We find a Jordan canonical form of T and a Jordan canonical basis for T.
Let β be the standard ordered basis for P_2(R). Then

A = [T]_β = [ -1  -1   0 ]
            [  0  -1  -2 ]
            [  0   0  -1 ]

which has the characteristic polynomial f(t) = −(t + 1)^3. Thus λ = −1 is
the only eigenvalue of T, and hence K_λ = P_2(R) by Theorem 7.4. So β is a
basis for K_λ. Now

dim(E_λ) = 3 − rank(A + I) = 3 − rank [ 0  -1   0 ]
                                      [ 0   0  -2 ]   = 3 − 2 = 1.
                                      [ 0   0   0 ]

Therefore a basis for K_λ cannot be a union of two or three cycles because
the initial vector of each cycle is an eigenvector, and there do not exist two
or more linearly independent eigenvectors. So the desired basis must consist
of a single cycle of length 3. If γ is such a cycle, then γ determines a single
Jordan block

[T]_γ = [ -1   1   0 ]
        [  0  -1   1 ]
        [  0   0  -1 ]

which is a Jordan canonical form of T.
The end vector h(x) of such a cycle must satisfy (T + I)^2(h(x)) ≠ 0. In
any basis for K_λ, there must be a vector that satisfies this condition, or else
no vector in K_λ satisfies this condition, contrary to our reasoning. Testing
the vectors in β, we see that h(x) = x^2 is acceptable. Therefore

γ = {(T + I)^2(x^2), (T + I)(x^2), x^2} = {2, −2x, x^2}
is a Jordan canonical basis for T. •
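A coordinate check of this example, not from the text (it assumes NumPy): represent T by its matrix with respect to the standard basis and conjugate by the coordinate vectors of the cycle {2, −2x, x²}.

```python
import numpy as np

# matrix of T(g) = -g - g' with respect to the standard basis {1, x, x^2}
A = np.array([[-1., -1.,  0.],
              [ 0., -1., -2.],
              [ 0.,  0., -1.]])

# coordinate vectors of the cycle {2, -2x, x^2} found in Example 3
Q = np.array([[ 2.,  0.,  0.],
              [ 0., -2.,  0.],
              [ 0.,  0.,  1.]])

print(np.round(np.linalg.inv(Q) @ A @ Q, 10))
# [[-1.  1.  0.]
#  [ 0. -1.  1.]
#  [ 0.  0. -1.]]
```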
In the next section, we develop a computational approach for finding a
Jordan canonical form and a Jordan canonical basis. In the process, we prove
that Jordan canonical forms are unique up to the order of the Jordan blocks.
Let T be a linear operator on a finite-dimensional vector space V, and suppose that the characteristic polynomial of T splits. By Theorem 5.11 (p. 278),
T is diagonalizable if and only if V is the direct sum of the eigenspaces of T.
If T is diagonalizable, then the eigenspaces and the generalized eigenspaces
coincide. The next result, which is optional, extends Theorem 5.11 to the
nondiagonalizable case.

Theorem 7.8. Let T be a linear operator on a finite-dimensional vector
space V whose characteristic polynomial splits. Then V is the direct sum of
the generalized eigenspaces of T.

Proof. Exercise.
EXERCISES
1. Label the following statements as true or false.
(a) Eigenvectors of a linear operator T are also generalized eigenvectors of T.
(b) It is possible for a generalized eigenvector of a linear operator T
to correspond to a scalar that is not an eigenvalue of T.
(c) Any linear operator on a finite-dimensional vector space has a Jordan canonical form.
(d) A cycle of generalized eigenvectors is linearly independent.
(e) There is exactly one cycle of generalized eigenvectors corresponding to each eigenvalue of a linear operator on a finite-dimensional
vector space.
(f) Let T be a linear operator on a finite-dimensional vector space
whose characteristic polynomial splits, and let λ1, λ2, ..., λk be
the distinct eigenvalues of T. If, for each i, βi is a basis for K_{λi},
then β1 ∪ β2 ∪ ··· ∪ βk is a Jordan canonical basis for T.
(g) For any Jordan block J, the operator L_J has Jordan canonical
form J.
(h) Let T be a linear operator on an n-dimensional vector space whose
characteristic polynomial splits. Then, for any eigenvalue λ of T,
K_λ = N((T − λI)^n).

2. For each matrix A, find a basis for each generalized eigenspace of L_A
consisting of a union of disjoint cycles of generalized eigenvectors. Then
find a Jordan canonical form J of A.

(a) A = [  1  1 ]                  (b) A = [ 1  2 ]
        [ -1  3 ]                          [ 3  2 ]

(c) A = [ 11  -4  -5 ]             (d) A = [ 2  1   0  0 ]
        [ 21  -8 -11 ]                     [ 0  2   1  0 ]
        [  3  -1   0 ]                     [ 0  0   3  0 ]
                                           [ 0  1  -1  3 ]
3. For each linear operator T, find a basis for each generalized eigenspace
of T consisting of a union of disjoint cycles of generalized eigenvectors.
Then find a Jordan canonical form J of T.
(a) T is the linear operator on P2(R) defined by T(f(x)) = 2f(x) − f'(x).
(b) V is the real vector space of functions spanned by the set of real-valued functions {1, t, t^2, e^t, te^t}, and T is the linear operator on V
defined by T(f) = f'.
(c) T is the linear operator on M_{2×2}(R) defined by T(A) = (  )·A
for all A ∈ M_{2×2}(R).
(d) T(A) = 2A + A^t for all A ∈ M_{2×2}(R).
4. Let T be a linear operator on a vector space V, and let γ be a cycle
of generalized eigenvectors that corresponds to the eigenvalue λ. Prove
that span(γ) is a T-invariant subspace of V.
5. Let γ1, γ2, ..., γp be cycles of generalized eigenvectors of a linear operator T corresponding to an eigenvalue λ. Prove that if the initial
eigenvectors are distinct, then the cycles are disjoint.
6. Let T: V → W be a linear transformation. Prove the following results.
(a) N(T) = N(−T).
(b) N(T^k) = N((−T)^k).
(c) If V = W (so that T is a linear operator on V) and λ is an eigenvalue of T, then for any positive integer k

N((T − λI_V)^k) = N((λI_V − T)^k).

7. Let U be a linear operator on a finite-dimensional vector space V. Prove
the following results.
(a) N(U) ⊆ N(U^2) ⊆ ··· ⊆ N(U^k) ⊆ N(U^{k+1}) ⊆ ···.

(b) If rank(U^m) = rank(U^{m+1}) for some positive integer m, then
rank(U^m) = rank(U^k) for any positive integer k ≥ m.
(c) If rank(U^m) = rank(U^{m+1}) for some positive integer m, then
N(U^m) = N(U^k) for any positive integer k ≥ m.
(d) Let T be a linear operator on V, and let λ be an eigenvalue of T.
Prove that if rank((T − λI)^m) = rank((T − λI)^{m+1}) for some integer
m, then K_λ = N((T − λI)^m).
(e) Second Test for Diagonalizability. Let T be a linear operator on
V whose characteristic polynomial splits, and let λ1, λ2, ..., λk be
the distinct eigenvalues of T. Then T is diagonalizable if and only
if rank(T − λi I) = rank((T − λi I)^2) for 1 ≤ i ≤ k.
(f) Use (e) to obtain a simpler proof of Exercise 24 of Section 5.4: If
T is a diagonalizable linear operator on a finite-dimensional vector space V and W is a T-invariant subspace of V, then T_W is
diagonalizable.
8. Use Theorem 7.4 to prove that the vectors v1, v2, ..., vk in the statement
of Theorem 7.3 are unique.
9. Let T be a linear operator on a finite-dimensional vector space V whose
characteristic polynomial splits.
(a) Prove Theorem 7.5(b).
(b) Suppose that β is a Jordan canonical basis for T, and let λ be an
eigenvalue of T. Let β' = β ∩ K_λ. Prove that β' is a basis for K_λ.
10. Let T be a linear operator on a finite-dimensional vector space whose
characteristic polynomial splits, and let λ be an eigenvalue of T.
(a) Suppose that γ is a basis for K_λ consisting of the union of q disjoint
cycles of generalized eigenvectors. Prove that q ≤ dim(E_λ).
(b) Let β be a Jordan canonical basis for T, and suppose that J = [T]_β
has q Jordan blocks with λ in the diagonal positions. Prove that
q ≤ dim(E_λ).
11. Prove Corollary 2 to Theorem 7.7.
Exercises 12 and 13 are concerned with direct sums of matrices, defined in
Section 5.4 on page 320.
12. Prove Theorem 7.8.
13. Let T be a linear operator on a finite-dimensional vector space V such
that the characteristic polynomial of T splits, and let λ1, λ2, ..., λk be
the distinct eigenvalues of T. For each i, let Ji be the Jordan canonical
form of the restriction of T to K_{λi}. Prove that

J = J1 ⊕ J2 ⊕ ··· ⊕ Jk

is the Jordan canonical form of T.

7.2 THE JORDAN CANONICAL FORM II

For the purposes of this section, we fix a linear operator T on an n-dimensional
vector space V such that the characteristic polynomial of T splits. Let
λ1, λ2, ..., λk be the distinct eigenvalues of T.
By Theorem 7.7 (p. 490), each generalized eigenspace K_{λi} contains an
ordered basis βi consisting of a union of disjoint cycles of generalized eigenvectors corresponding to λi. So by Theorems 7.4(b) (p. 487) and 7.5 (p. 489),
the union β = β1 ∪ β2 ∪ ··· ∪ βk is a Jordan canonical basis for T. For each i, let Ti
be the restriction of T to K_{λi}, and let Ai = [Ti]_{βi}. Then Ai is the Jordan
canonical form of Ti, and

J = [T]_β = [ A1   O   ···   O  ]
            [  O   A2  ···   O  ]
            [  ⋮    ⋮          ⋮ ]
            [  O   O   ···   Ak ]

is the Jordan canonical form of T. In this matrix, each O is a zero matrix of
appropriate size.
In this section, we compute the matrices Ai and the bases βi, thereby
computing J and β as well. While developing a method for finding J, it
becomes evident that in some sense the matrices Ai are unique.
To aid in formulating the uniqueness theorem for J, we adopt the following
convention: The basis βi for K_{λi} will henceforth be ordered in such a way
that the cycles appear in order of decreasing length. That is, if βi is a disjoint
union of cycles γ1, γ2, ..., γ_{ni}, and if the length of the cycle γj is pj, we index
the cycles so that p1 ≥ p2 ≥ ··· ≥ p_{ni}. This ordering of the cycles limits the
possible orderings of vectors in βi, which in turn determines the matrix Ai.
It is in this sense that Ai is unique. It then follows that the Jordan canonical
form for T is unique up to an ordering of the eigenvalues of T. As we will
see, there is no uniqueness theorem for the bases βi or for β. Specifically, we
show that for each i, the number ni of cycles that form βi, and the length pj
(j = 1, 2, ..., ni) of each cycle, is completely determined by T.
Example 1
To illustrate the discussion above, suppose that, for some i, the ordered basis
βi for K_{λi} is the union of four cycles βi = γ1 ∪ γ2 ∪ γ3 ∪ γ4 with respective
lengths p1 = 3, p2 = 3, p3 = 2, and p4 = 1. Then

Ai = [ λi  1   0   0   0   0   0   0   0  ]
     [ 0   λi  1   0   0   0   0   0   0  ]
     [ 0   0   λi  0   0   0   0   0   0  ]
     [ 0   0   0   λi  1   0   0   0   0  ]
     [ 0   0   0   0   λi  1   0   0   0  ]
     [ 0   0   0   0   0   λi  0   0   0  ]
     [ 0   0   0   0   0   0   λi  1   0  ]
     [ 0   0   0   0   0   0   0   λi  0  ]
     [ 0   0   0   0   0   0   0   0   λi ]
To help us visualize each of the matrices Ai and ordered bases βi, we
use an array of dots called a dot diagram of Ti, where Ti is the restriction
of T to K_{λi}. Suppose that βi is a disjoint union of cycles of generalized
eigenvectors γ1, γ2, ..., γ_{ni} with lengths p1 ≥ p2 ≥ ··· ≥ p_{ni}, respectively.
The dot diagram of Ti contains one dot for each vector in βi, and the dots
are configured according to the following rules.
1. The array consists of ni columns (one column for each cycle).
2. Counting from left to right, the jth column consists of the pj dots that
correspond to the vectors of γj starting with the initial vector at the
top and continuing down to the end vector.
Denote the end vectors of the cycles by v1, v2, ..., v_{ni}. In the following
dot diagram of Ti, each dot is labeled with the name of the vector in βi to
which it corresponds.

• (T − λiI)^{p1−1}(v1)    • (T − λiI)^{p2−1}(v2)    ···    • (T − λiI)^{p_{ni}−1}(v_{ni})
• (T − λiI)^{p1−2}(v1)    • (T − λiI)^{p2−2}(v2)    ···    • (T − λiI)^{p_{ni}−2}(v_{ni})
        ⋮                         ⋮                                  ⋮
• (T − λiI)(v1)           • (T − λiI)(v2)           ···    • (T − λiI)(v_{ni})
• v1                      • v2                      ···    • v_{ni}

Notice that the dot diagram of Ti has ni columns (one for each cycle) and
p1 rows. Since p1 ≥ p2 ≥ ··· ≥ p_{ni}, the columns of the dot diagram become
shorter (or at least not longer) as we move from left to right.
Now let rj denote the number of dots in the jth row of the dot diagram.
Observe that r1 ≥ r2 ≥ ··· ≥ r_{p1}. Furthermore, the diagram can be reconstructed from the values of the rj's. The proofs of these facts, which are
combinatorial in nature, are treated in Exercise 9.

In Example 1, with ni = 4, p1 = p2 = 3, p3 = 2, and p4 = 1, the dot
diagram of Ti is as follows:

•   •   •   •
•   •   •
•   •

Here r1 = 4, r2 = 3, and r3 = 2.
We now devise a method for computing the dot diagram of Ti using the
ranks of linear operators determined by T and λi. Hence the dot diagram
is completely determined by T, from which it follows that it is unique. On
the other hand, βi is not unique. For example, see Exercise 8. (It is for this
reason that we associate the dot diagram with Ti rather than with βi.)
To determine the dot diagram of Ti, we devise a method for computing
each rj, the number of dots in the jth row of the dot diagram, using only T
and λi. The next three results give us the required method. To facilitate our
arguments, we fix a basis βi for K_{λi} so that βi is a disjoint union of ni cycles
of generalized eigenvectors with lengths p1 ≥ p2 ≥ ··· ≥ p_{ni}.
Theorem 7.9. For any positive integer r, the vectors in βi that are
associated with the dots in the first r rows of the dot diagram of Ti constitute
a basis for N((T − λiI)^r). Hence the number of dots in the first r rows of the
dot diagram equals nullity((T − λiI)^r).

Proof. Clearly, N((T − λiI)^r) ⊆ K_{λi}, and K_{λi} is invariant under (T − λiI)^r.
Let U denote the restriction of (T − λiI)^r to K_{λi}. By the preceding remarks,
N((T − λiI)^r) = N(U), and hence it suffices to establish the theorem for U.
Now define

S1 = {x ∈ βi : U(x) = 0}   and   S2 = {x ∈ βi : U(x) ≠ 0}.

Let a and b denote the number of vectors in S1 and S2, respectively, and let
mi = dim(K_{λi}). Then a + b = mi. For any x ∈ βi, x ∈ S1 if and only if x is
one of the first r vectors of a cycle, and this is true if and only if x corresponds
to a dot in the first r rows of the dot diagram. Hence a is the number of dots
in the first r rows of the dot diagram. For any x ∈ S2, the effect of applying
U to x is to move the dot corresponding to x exactly r places up its column to
another dot. It follows that U maps S2 in a one-to-one fashion into βi. Thus
{U(x) : x ∈ S2} is a basis for R(U) consisting of b vectors. Hence rank(U) = b,
and so nullity(U) = mi − b = a. But S1 is a linearly independent subset of
N(U) consisting of a vectors; therefore S1 is a basis for N(U).
In the case that r = 1, Theorem 7.9 yields the following corollary.
Corollary. The dimension of E_{λi} is ni. Hence in a Jordan canonical form
of T, the number of Jordan blocks corresponding to λi equals the dimension
of E_{λi}.

Proof. Exercise.

We are now able to devise a method for describing the dot diagram in
terms of the ranks of operators.

Theorem 7.10. Let rj denote the number of dots in the jth row of the
dot diagram of Ti, the restriction of T to K_{λi}. Then the following statements
are true.
(a) r1 = dim(V) − rank(T − λiI).
(b) rj = rank((T − λiI)^{j−1}) − rank((T − λiI)^j) if j > 1.

Proof. By Theorem 7.9, for 1 ≤ j ≤ p1, we have

r1 + r2 + ··· + rj = nullity((T − λiI)^j) = dim(V) − rank((T − λiI)^j).

Hence

r1 = dim(V) − rank(T − λiI),

and for j > 1,

rj = (r1 + r2 + ··· + rj) − (r1 + r2 + ··· + r_{j−1})
   = [dim(V) − rank((T − λiI)^j)] − [dim(V) − rank((T − λiI)^{j−1})]
   = rank((T − λiI)^{j−1}) − rank((T − λiI)^j).
Theorem 7.10 shows that the dot diagram of Ti is completely determined
by T and λi. Hence we have proved the following result.

Corollary. For any eigenvalue λi of T, the dot diagram of Ti is unique.

Thus, subject to the convention that the cycles of generalized eigenvectors
for the bases of each generalized eigenspace are listed in order of decreasing
length, the Jordan canonical form of a linear operator or a matrix is unique
up to the ordering of the eigenvalues.
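The rank formulas of Theorem 7.10 translate directly into a small routine. The sketch below is not from the text; it assumes NumPy, and the example matrix is the one of Example 2 below as reconstructed here.

```python
import numpy as np

def dot_diagram_rows(A, lam, tol=1e-8):
    """Row lengths r_1, r_2, ... of the dot diagram for the eigenvalue lam,
    computed from r_1 = n - rank(A - lam I) and, for j > 1,
    r_j = rank((A - lam I)^(j-1)) - rank((A - lam I)^j), as in Theorem 7.10."""
    n = A.shape[0]
    N = A - lam * np.eye(n)
    rows, prev_rank, power = [], n, np.eye(n)
    while True:
        power = power @ N
        r = np.linalg.matrix_rank(power, tol=tol)
        if prev_rank == r:          # the ranks have stabilized: no more rows
            break
        rows.append(prev_rank - r)
        prev_rank = r
    return rows

A = np.array([[2., -1.,  0.,  1.],
              [0.,  3., -1.,  0.],
              [0.,  1.,  1.,  0.],
              [0., -1.,  0.,  3.]])
print(dot_diagram_rows(A, 2.0))   # [2, 1]: one cycle of length 2, one of length 1
print(dot_diagram_rows(A, 3.0))   # [1]
```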
We apply these results to find the Jordan canonical forms of two matrices
and a linear operator.
Example 2
Let

A = [ 2  -1   0   1 ]
    [ 0   3  -1   0 ]
    [ 0   1   1   0 ]
    [ 0  -1   0   3 ]

We find the Jordan canonical form of A and a Jordan canonical basis for the
linear operator T = L_A. The characteristic polynomial of A is

det(A − tI) = (t − 2)^3 (t − 3).

Thus A has two distinct eigenvalues, λ1 = 2 and λ2 = 3, with multiplicities 3
and 1, respectively. Let T1 and T2 be the restrictions of L_A to the generalized
eigenspaces K_{λ1} and K_{λ2}, respectively.
Suppose that β1 is a Jordan canonical basis for T1. Since λ1 has multiplicity 3, it follows that dim(K_{λ1}) = 3 by Theorem 7.4(c) (p. 487); hence the
dot diagram of T1 has three dots. As we did earlier, let rj denote the number
of dots in the jth row of this dot diagram. Then, by Theorem 7.10,

r1 = 4 − rank(A − 2I) = 4 − rank [ 0  -1   0   1 ]
                                 [ 0   1  -1   0 ]   = 4 − 2 = 2,
                                 [ 0   1  -1   0 ]
                                 [ 0  -1   0   1 ]

and

r2 = rank(A − 2I) − rank((A − 2I)^2) = 2 − 1 = 1.

(Actually, the computation of r2 is unnecessary in this case because r1 = 2 and
the dot diagram only contains three dots.) Hence the dot diagram associated
with β1 is

•   •
•

So

A1 = [T1]_{β1} = [ 2  1  0 ]
                 [ 0  2  0 ]
                 [ 0  0  2 ]

Since λ2 = 3 has multiplicity 1, it follows that dim(K_{λ2}) = 1, and consequently any basis β2 for K_{λ2} consists of a single eigenvector corresponding to
λ2 = 3. Therefore

A2 = [T2]_{β2} = (3).

Setting β = β1 ∪ β2, we have

J = [L_A]_β = [ 2  1  0  0 ]
              [ 0  2  0  0 ]
              [ 0  0  2  0 ]
              [ 0  0  0  3 ]

and so J is the Jordan canonical form of A.
We now find a Jordan canonical basis for T = L_A. We begin by determining a Jordan canonical basis β1 for T1. Since the dot diagram of T1 has two
columns, each corresponding to a cycle of generalized eigenvectors, there are
two such cycles. Let v1 and v2 denote the end vectors of the first and second
cycles, respectively. We reprint below the dot diagram with the dots labeled
with the names of the vectors to which they correspond.

• (T − 2I)(v1)    • v2
• v1

From this diagram we see that v1 ∈ N((T − 2I)^2) but v1 ∉ N(T − 2I). Now

A − 2I = [ 0  -1   0   1 ]        and    (A − 2I)^2 = [ 0  -2   1   1 ]
         [ 0   1  -1   0 ]                            [ 0   0   0   0 ]
         [ 0   1  -1   0 ]                            [ 0   0   0   0 ]
         [ 0  -1   0   1 ]                            [ 0  -2   1   1 ]

It is easily seen that

{ (1, 0, 0, 0), (0, 1, 2, 0), (0, 1, 0, 2) }

is a basis for N((T − 2I)^2) = K_{λ1}. Of these three basis vectors, the last two
do not belong to N(T − 2I), and hence we select one of these for v1. Suppose
that we choose

v1 = (0, 1, 2, 0).

Then

(T − 2I)(v1) = (A − 2I)(v1) = (−1, −1, −1, −1).

Now simply choose v2 to be a vector in E_{λ1} that is linearly independent of
(T − 2I)(v1); for example, select

v2 = (1, 0, 0, 0).

Thus we have associated the Jordan canonical basis

β1 = { (−1, −1, −1, −1), (0, 1, 2, 0), (1, 0, 0, 0) }

with the dot diagram in the following manner.

• (−1, −1, −1, −1)    • (1, 0, 0, 0)
• (0, 1, 2, 0)

By Theorem 7.6 (p. 489), the linear independence of β1 is guaranteed since
v2 was chosen to be linearly independent of (T − 2I)(v1).
Since λ2 = 3 has multiplicity 1, dim(K_{λ2}) = dim(E_{λ2}) = 1. Hence any
eigenvector of L_A corresponding to λ2 = 3 constitutes an appropriate basis
β2. For example,

β2 = { (1, 0, 0, 1) }.

Thus

β = β1 ∪ β2 = { (−1, −1, −1, −1), (0, 1, 2, 0), (1, 0, 0, 0), (1, 0, 0, 1) }

is a Jordan canonical basis for L_A.
Notice that if

Q = [ -1   0   1   1 ]
    [ -1   1   0   0 ]
    [ -1   2   0   0 ]
    [ -1   0   0   1 ]

then J = Q^{-1}AQ.
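The selection of v1 above can also be automated. The sketch below is not from the text; it assumes NumPy and uses the matrix A as reconstructed in this example. The vectors it produces are an orthonormal null-space basis, so they differ from the hand-chosen ones, but the same reasoning applies.

```python
import numpy as np

A = np.array([[2., -1.,  0.,  1.],
              [0.,  3., -1.,  0.],
              [0.,  1.,  1.,  0.],
              [0., -1.,  0.,  3.]])
N = A - 2 * np.eye(4)

def null_space(M, tol=1e-10):
    _, s, vt = np.linalg.svd(M)
    return vt[int(np.sum(s > tol)):].T

# pick an end vector v1 in N((A - 2I)^2) that is not in N(A - 2I):
# any null-space basis vector with N v != 0 will do
K2 = null_space(N @ N)                        # basis of K_2 = N((A - 2I)^2)
v1 = next(v for v in K2.T if np.linalg.norm(N @ v) > 1e-8)

cycle = [N @ v1, v1]                          # {(T - 2I)(v1), v1}
assert np.allclose(N @ cycle[0], 0)           # the initial vector is an eigenvector
assert np.linalg.matrix_rank(np.column_stack(cycle)) == 2
```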

Example 3
Let

A = [  2  -4   2   2 ]
    [ -2   0   1   3 ]
    [ -2  -2   3   3 ]
    [ -2  -6   3   7 ]

We find the Jordan canonical form J of A, a Jordan canonical basis for L_A,
and a matrix Q such that J = Q^{-1}AQ.
The characteristic polynomial of A is det(A − tI) = (t − 2)^2 (t − 4)^2. Let
T = L_A, λ1 = 2, and λ2 = 4, and let Ti be the restriction of L_A to K_{λi} for
i = 1, 2.
We begin by computing the dot diagram of T1. Let r1 denote the number
of dots in the first row of this diagram. Then

r1 = 4 − rank(A − 2I) = 4 − 2 = 2;

hence the dot diagram of T1 is as follows.

•   •

Therefore

A1 = [T1]_{β1} = [ 2  0 ]
                 [ 0  2 ]

where β1 is any basis corresponding to the dots. In this case, β1 is an arbitrary
basis for E_{λ1} = N(T − 2I), for example,

β1 = { (2, 1, 0, 2), (0, 1, 2, 0) }.

Next we compute the dot diagram of T2. Since rank(A − 4I) = 3, there
is only 4 − 3 = 1 dot in the first row of the diagram. Since λ2 = 4 has
multiplicity 2, we have dim(K_{λ2}) = 2, and hence this dot diagram has the
following form:

•
•

Thus

A2 = [T2]_{β2} = [ 4  1 ]
                 [ 0  4 ]

where β2 is any basis for K_{λ2} corresponding to the dots. In this case, β2
is a cycle of length 2. The end vector of this cycle is a vector v ∈ K_{λ2} =
N((T − 4I)^2) such that v ∉ N(T − 4I). One way of finding such a vector was
used to select the vector v1 in Example 2. In this example, we illustrate
another method. A simple calculation shows that a basis for the null space
of L_A − 4I is

{ (0, 1, 1, 1) }.

Choose v to be any solution to the system of linear equations

(A − 4I)x = (0, 1, 1, 1),

for example,

v = (1, −1, −1, 0).

Thus

β2 = { (L_A − 4I)(v), v } = { (0, 1, 1, 1), (1, −1, −1, 0) }.

Therefore

β = β1 ∪ β2 = { (2, 1, 0, 2), (0, 1, 2, 0), (0, 1, 1, 1), (1, −1, −1, 0) }

is a Jordan canonical basis for L_A. The corresponding Jordan canonical form
is given by

J = [L_A]_β = [ A1   O ]   = [ 2  0  0  0 ]
              [  O  A2 ]     [ 0  2  0  0 ]
                             [ 0  0  4  1 ]
                             [ 0  0  0  4 ]

Finally, we define Q to be the matrix whose columns are the vectors of β
listed in the same order, namely,

Q = [ 2  0  0   1 ]
    [ 1  1  1  -1 ]
    [ 0  2  1  -1 ]
    [ 2  0  1   0 ]

Then J = Q^{-1}AQ. •
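The "other method" used here, solving (A − 4I)x = w for an eigenvector w, can be checked numerically. A sketch, not from the text: it assumes NumPy, and the least-squares solver returns one particular solution, which need not equal the v chosen above.

```python
import numpy as np

A = np.array([[ 2., -4.,  2.,  2.],
              [-2.,  0.,  1.,  3.],
              [-2., -2.,  3.,  3.],
              [-2., -6.,  3.,  7.]])

N = A - 4 * np.eye(4)
w = np.array([0., 1., 1., 1.])           # eigenvector spanning N(A - 4I)

# the end vector v of the length-2 cycle is any solution of (A - 4I)x = w;
# for a consistent system, lstsq returns an exact (minimum-norm) solution
v, *_ = np.linalg.lstsq(N, w, rcond=None)

assert np.allclose(N @ v, w)             # v maps onto the eigenvector w
assert np.allclose(N @ (N @ v), 0)       # and (A - 4I)^2 v = 0
```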
Example 4
Let V be the vector space of polynomial functions in two real variables x
and y of degree at most 2. Then V is a vector space over R and α =
{1, x, y, x^2, y^2, xy} is an ordered basis for V. Let T be the linear operator
on V defined by

T(f(x, y)) = ∂/∂x f(x, y).

For example, if f(x, y) = x + 2x^2 − 3xy + y, then

T(f(x, y)) = ∂/∂x (x + 2x^2 − 3xy + y) = 1 + 4x − 3y.

We find the Jordan canonical form and a Jordan canonical basis for T.
Let A = [T]_α. Then

A = [ 0  1  0  0  0  0 ]
    [ 0  0  0  2  0  0 ]
    [ 0  0  0  0  0  1 ]
    [ 0  0  0  0  0  0 ]
    [ 0  0  0  0  0  0 ]
    [ 0  0  0  0  0  0 ]

and hence the characteristic polynomial of T is

det(A − tI) = t^6.

Thus λ = 0 is the only eigenvalue of T, and K_λ = V. For each j, let rj denote
the number of dots in the jth row of the dot diagram of T. By Theorem 7.10,

r1 = 6 − rank(A) = 6 − 3 = 3,
and since

A^2 = [ 0  0  0  2  0  0 ]
      [ 0  0  0  0  0  0 ]
      [ 0  0  0  0  0  0 ]
      [ 0  0  0  0  0  0 ]
      [ 0  0  0  0  0  0 ]
      [ 0  0  0  0  0  0 ]

r2 = rank(A) − rank(A^2) = 3 − 1 = 2.

Because there are a total of six dots in the dot diagram and r1 = 3 and
r2 = 2, it follows that r3 = 1. So the dot diagram of T is

•   •   •
•   •
•

We conclude that the Jordan canonical form of T is

J = [ 0  1  0  0  0  0 ]
    [ 0  0  1  0  0  0 ]
    [ 0  0  0  0  0  0 ]
    [ 0  0  0  0  1  0 ]
    [ 0  0  0  0  0  0 ]
    [ 0  0  0  0  0  0 ]
We now find a Jordan canonical basis for T. Since the first column of the
dot diagram of T consists of three dots, we must find a polynomial f(x,y)
d2
:fi(x,y) 7^ 0. Examining the basis a = {l,x,y,x2,y2,xy} for
such that
dx2
KA = V, we see that x is a suitable candidate. Setting fi(x,y) = x , we see
that
(T - X)(h(x,y)) = T(h(x,y)) = —(x2) = 2x
and
(T - X)2(j(x,y)) = T2(h(x,y)) = ^(x2) = 2.
Likewise, since the second column of the dot diagram consists of two dots, we
must find a polynomial /2(a;, y) such that
~(f2(x,y))^0, but ^(f2(x,y)) = 0.

508 Chap. 7 Canonical Forms
Since our choice must be linearly independent of the polynomials already
chosen for the first cycle, the only choice in α that satisfies these constraints
is xy. So we set f₂(x, y) = xy. Thus

(T − λI)(f₂(x, y)) = T(f₂(x, y)) = ∂/∂x (xy) = y.

Finally, the third column of the dot diagram consists of a single dot, which
corresponds to a polynomial that lies in the null space of T. The only remaining
polynomial in α is y², and it is suitable here. So set f₃(x, y) = y². Therefore
we have identified polynomials with the dots in the dot diagram as follows:

• 2     • y     • y²
• 2x    • xy
• x²

Thus β = {2, 2x, x², y, xy, y²} is a Jordan canonical basis for T. •
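
As a check (an illustrative sketch, not from the text), one can compute the matrix of T = ∂/∂x relative to β = {2, 2x, x², y, xy, y²} and confirm that it equals the Jordan canonical form J above. The helper `coords`, which expresses a polynomial in the coordinates of α, is introduced here only for the illustration.

```python
import sympy as sp

x, y = sp.symbols('x y')
alpha = [sp.Integer(1), x, y, x**2, y**2, x*y]    # the ordered basis used in Example 4
beta  = [sp.Integer(2), 2*x, x**2, y, x*y, y**2]  # the Jordan canonical basis found above

def coords(p):
    """Coordinate vector of a polynomial of degree <= 2 relative to alpha."""
    poly = sp.Poly(sp.expand(p), x, y)
    return sp.Matrix([poly.coeff_monomial(m) for m in alpha])

T = lambda p: sp.diff(p, x)                       # T(f) = df/dx

A = sp.Matrix.hstack(*[coords(T(m)) for m in alpha])   # A = [T]_alpha
Q = sp.Matrix.hstack(*[coords(b) for b in beta])        # columns: beta in alpha-coordinates

print(Q.inv() * A * Q)    # equals the Jordan canonical form J displayed above
```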
In the three preceding examples, we relied on our ingenuity and the con­
text of the problem to find Jordan canonical bases. The reader can do the
same in the exercises. We are successful in these cases because the dimen­
sions of the generalized eigenspaces under consideration are small. We do
not attempt, however, to develop a general algorithm for computing Jordan
canonical bases, although one could be devised by following the steps in the
proof of the existence of such a basis (Theorem 7.7 p. 490).
The following result may be thought of as a corollary to Theorem 7.10.
Theorem 7.11. Let A and B be n × n matrices, each having Jordan
canonical forms computed according to the conventions of this section. Then
A and B are similar if and only if they have (up to an ordering of their
eigenvalues) the same Jordan canonical form.
Proof. If A and B have the same Jordan canonical form J, then A and B
are each similar to J and hence are similar to each other.
Conversely, suppose that A and B are similar. Then A and B have the
same eigenvalues. Let J_A and J_B denote the Jordan canonical forms of A and
B, respectively, with the same ordering of their eigenvalues. Then A is similar
to both J_A and J_B, and therefore, by the corollary to Theorem 2.23 (p. 115),
J_A and J_B are matrix representations of L_A. Hence J_A and J_B are Jordan
canonical forms of L_A. Thus J_A = J_B by the corollary to Theorem 7.10. ∎
Example 5
We determine which of the 3 × 3 matrices A, B, C, and D are similar.
Observe that A, B, and C have the same characteristic polynomial
−(t − 1)(t − 2)², whereas D has −t(t − 1)(t − 2) as its characteristic
polynomial. Because similar matrices have the same characteristic polynomials,
D cannot be similar to A, B, or C. Let J_A, J_B, and J_C be the Jordan
canonical forms of A, B, and C, respectively, using the ordering 1, 2 for their
common eigenvalues. Then (see Exercise 4)

J_A = J_C = [ 1 0 0 ]      and      J_B = [ 1 0 0 ]
            [ 0 2 1 ]                     [ 0 2 0 ]
            [ 0 0 2 ]                     [ 0 0 2 ].

Since J_A = J_C, A is similar to C. Since J_B is different from J_A and J_C, B is
similar to neither A nor C. •
The reader should observe that any diagonal matrix is a Jordan canonical
form. Thus a linear operator T on a finite-dimensional vector space V is diag­
onalizable if and only if its Jordan canonical form is a diagonal matrix. Hence
T is diagonalizable if and only if the Jordan canonical basis for T consists of
eigenvectors of T. Similar statements can be made about matrices. Thus,
of the matrices A, B, and C in Example 5, A and C are not diagonalizable
because their Jordan canonical forms are not diagonal matrices.
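
Theorem 7.11 gives a mechanical similarity test when the characteristic polynomials split: compute both Jordan canonical forms and compare the multisets of Jordan blocks. The sketch below is illustrative only; the 2 × 2 matrices used are hypothetical and are not the A, B, C, D of Example 5, and the helper `jordan_blocks` is introduced here solely for the illustration.

```python
import sympy as sp

def jordan_blocks(M):
    """Sorted list of (eigenvalue, block size) pairs of the Jordan form of M."""
    _, J = sp.Matrix(M).jordan_form()
    blocks, i, n = [], 0, J.rows
    while i < n:
        size = 1
        while i + size < n and J[i + size - 1, i + size] == 1:
            size += 1                       # extend through the 1's on the superdiagonal
        blocks.append((J[i, i], size))
        i += size
    return sorted(blocks, key=lambda b: (sp.default_sort_key(b[0]), b[1]))

A = sp.Matrix([[2, 1], [0, 2]])
B = sp.Matrix([[2, 0], [0, 2]])
C = sp.Matrix([[3, 1], [-1, 1]])            # also one 2 x 2 block for the eigenvalue 2

print(jordan_blocks(A) == jordan_blocks(C))   # True:  A and C are similar
print(jordan_blocks(A) == jordan_blocks(B))   # False: B is diagonal, A is not diagonalizable
```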
EXERCISES
1. Label the following statements as true or false. Assume that the char­
acteristic polynomial of the matrix or linear operator splits.
(a) The Jordan canonical form of a diagonal matrix is the matrix itself.
(b) Let T be a linear operator on a finite-dimensional vector space V
that has a Jordan canonical form J. If β is any basis for V, then
the Jordan canonical form of [T]_β is J.
(c) Linear operators having the same characteristic polynomial are
similar.
(d) Matrices having the same Jordan canonical form are similar.
(e) Every matrix is similar to its Jordan canonical form.
(f) Every linear operator with the characteristic polynomial
(−1)ⁿ(t − λ)ⁿ has the same Jordan canonical form.
(g) Every linear operator on a finite-dimensional vector space has a
unique Jordan canonical basis.
(h) The dot diagrams of a linear operator on a finite-dimensional vec­
tor space are unique.

2. Let T be a linear operator on a finite-dimensional vector space V such
that the characteristic polynomial of T splits. Suppose that λ₁ = 2,
λ₂ = 4, and λ₃ = −3 are the distinct eigenvalues of T and that the dot
diagrams for the restriction of T to K_{λᵢ} (i = 1, 2, 3) are as follows:

λ₁ = 2        λ₂ = 4        λ₃ = −3
Find the Jordan canonical form J of T.
3. Let T be a linear operator on a finite-dimensional vector space V with
Jordan canonical form
J = [ 2 1 0 0 0 0 0 ]
    [ 0 2 1 0 0 0 0 ]
    [ 0 0 2 0 0 0 0 ]
    [ 0 0 0 2 1 0 0 ]
    [ 0 0 0 0 2 0 0 ]
    [ 0 0 0 0 0 3 0 ]
    [ 0 0 0 0 0 0 3 ].
(a) Find the characteristic polynomial of T.
(b) Find the dot diagram corresponding to each eigenvalue of T.
(c) For which eigenvalues λᵢ, if any, does E_{λᵢ} = K_{λᵢ}?
(d) For each eigenvalue λᵢ, find the smallest positive integer pᵢ for
    which K_{λᵢ} = N((T − λᵢI)^{pᵢ}).
(e) Compute the following numbers for each i, where Uᵢ denotes the
    restriction of T − λᵢI to K_{λᵢ}.
    (i) rank(Uᵢ)
    (ii) rank(Uᵢ²)
    (iii) nullity(Uᵢ)
    (iv) nullity(Uᵢ²)
4. For each of the matrices A that follow, find a Jordan canonical form
J and an invertible matrix Q such that J = Q⁻¹AQ. Notice that the
matrices in (a), (b), and (c) are those used in Example 5.

(a) A = ⋯    (b) A = ⋯    (c) A = ⋯    (d) A = ⋯

5. For each linear operator T, find a Jordan canonical form J of T and a
Jordan canonical basis β for T.
(a) V is the real vector space of functions spanned by the set of real-valued
    functions {e^t, te^t, t²e^t, e^{2t}}, and T is the linear operator on
    V defined by T(f) = f′.
(b) T is the linear operator on P₃(R) defined by T(f(x)) = xf″(x).
(c) T is the linear operator on P₃(R) defined by
    T(f(x)) = f″(x) + 2f(x).
(d) T is the linear operator on M₂ₓ₂(R) defined by T(A) = BA − Aᵗ
    for a fixed matrix B ∈ M₂ₓ₂(R).
(e) T is the linear operator on M₂ₓ₂(R) defined by T(A) = B(A − Aᵗ),
    with B as in (d).
(f) V is the vector space of polynomial functions in two real variables
x and y of degree at most 2, as defined in Example 4, and T is the
linear operator on V defined by
T(f(x, y)) = ∂f(x, y)/∂x + ∂f(x, y)/∂y.
6. Let A be an n × n matrix whose characteristic polynomial splits. Prove
that A and Aᵗ have the same Jordan canonical form, and conclude that
A and Aᵗ are similar. Hint: For any eigenvalue λ of A and Aᵗ and any
positive integer r, show that rank((A − λI)^r) = rank((Aᵗ − λI)^r).
7. Let A be an n x n matrix whose characteristic polynomial splits, 7 be
a cycle of generalized eigenvectors corresponding to an eigenvalue A,
and W be the subspace spanned by 7. Define 7' to be the ordered set
obtained from 7 by reversing the order of the vectors in 7.
(a) Prove that [T_W]_{γ′} = ([T_W]_γ)ᵗ.
(b) Let J be the Jordan canonical form of A. Use (a) to prove that J
    and Jᵗ are similar.
(c) Use (b) to prove that A and Aᵗ are similar.
8. Let T be a linear operator on a finite-dimensional vector space, and
suppose that the characteristic polynomial of T splits. Let β be a Jordan
canonical basis for T.
(a) Prove that for any nonzero scalar c, {cx : x ∈ β} is a Jordan canonical
    basis for T.

(b) Suppose that γ is one of the cycles of generalized eigenvectors that
    forms β, and suppose that γ corresponds to the eigenvalue λ and
    has length greater than 1. Let x be the end vector of γ, and let y
    be a nonzero vector in E_λ. Let γ′ be the ordered set obtained from
    γ by replacing x by x + y. Prove that γ′ is a cycle of generalized
    eigenvectors corresponding to λ, and that if γ′ replaces γ in the
    union that defines β, then the new union is also a Jordan canonical
    basis for T.
(c) Apply (b) to obtain a Jordan canonical basis for L_A, where A is the
    matrix given in Example 2, that is different from the basis given
    in the example.
9. Suppose that a dot diagram has k columns and m rows with pⱼ dots in
column j and rᵢ dots in row i. Prove the following results.
(a) m = p₁ and k = r₁.
(b) pⱼ = max{i : rᵢ ≥ j} for 1 ≤ j ≤ k and rᵢ = max{j : pⱼ ≥ i} for
    1 ≤ i ≤ m. Hint: Use mathematical induction on m.
(c) r₁ ≥ r₂ ≥ ⋯ ≥ r_m.
(d) Deduce that the number of dots in each column of a dot diagram
    is completely determined by the number of dots in the rows.
10. Let T be a linear operator whose characteristic polynomial splits, and
let A be an eigenvalue of T.
(a) Prove that dim(K_λ) is the sum of the lengths of all the blocks
    corresponding to λ in the Jordan canonical form of T.
(b) Deduce that E_λ = K_λ if and only if all the Jordan blocks corresponding
    to λ are 1 × 1 matrices.
The following definitions are used in Exercises 11–19.
Definitions. A linear operator T on a vector space V is called nilpotent
if Tᵖ = T₀ for some positive integer p. An n × n matrix A is called nilpotent
if Aᵖ = O for some positive integer p.
11. Let T be a linear operator on a finite-dimensional vector space V, and
let β be an ordered basis for V. Prove that T is nilpotent if and only if
[T]_β is nilpotent.
12. Prove that any square upper triangular matrix with each diagonal entry
equal to zero is nilpotent.
13. Let T be a nilpotent operator on an n-dimensional vector space V, and
suppose that p is the smallest positive integer for which Tᵖ = T₀. Prove
the following results.
(a) N(Tⁱ) ⊆ N(Tⁱ⁺¹) for every positive integer i.

(b) There is a sequence of ordered bases β₁, β₂, …, β_p such that βᵢ is
    a basis for N(Tⁱ) and βᵢ₊₁ contains βᵢ for 1 ≤ i ≤ p − 1.
(c) Let β = β_p be the ordered basis for N(Tᵖ) = V in (b). Then [T]_β
    is an upper triangular matrix with each diagonal entry equal to
    zero.
(d) The characteristic polynomial of T is (−1)ⁿtⁿ. Hence the characteristic
    polynomial of T splits, and 0 is the only eigenvalue of T.
14. Prove the converse of Exercise 13(d): If T is a linear operator on an n-
dimensional vector space V and (−1)ⁿtⁿ is the characteristic polynomial
of T, then T is nilpotent.
15. Give an example of a linear operator T on a finite-dimensional vector
space such that T is not nilpotent, but zero is the only eigenvalue of T.
Characterize all such operators.
16. Let T be a nilpotent linear operator on a finite-dimensional vector space
V. Recall from Exercise 13 that λ = 0 is the only eigenvalue of T, and
hence V = K_λ. Let β be a Jordan canonical basis for T. Prove that for
any positive integer i, if we delete from β the vectors corresponding to
the last i dots in each column of the dot diagram of T, the resulting set is
a basis for R(Tⁱ). (If a column of the dot diagram contains fewer than i
dots, all the vectors associated with that column are removed from β.)
17. Let T be a linear operator on a finite-dimensional vector space V such
that the characteristic polynomial of T splits, and let λ₁, λ₂, …, λ_k be
the distinct eigenvalues of T. Let S : V → V be the mapping defined by

S(x) = λ₁v₁ + λ₂v₂ + ⋯ + λ_k v_k,

where, for each i, vᵢ is the unique vector in K_{λᵢ} such that x = v₁ +
v₂ + ⋯ + v_k. (This unique representation is guaranteed by Theorem 7.3
(p. 486) and Exercise 8 of Section 7.1.)
(a) Prove that S is a diagonalizable linear operator on V.
(b) Let U = T — S. Prove that U is nilpotent and commutes with S,
that is, SU = US.
18. Let T be a linear operator on a finite-dimensional vector space V, and
let J be the Jordan canonical form of T. Let D be the diagonal matrix
whose diagonal entries are the diagonal entries of J, and let M = J — D.
Prove the following results.
(a) M is nilpotent.
(b) MD = DM.

(c) If p is the smallest positive integer for which Mᵖ = O, then, for
    any positive integer r < p,

    Jʳ = Dʳ + rD^{r−1}M + (r(r−1)/2!) D^{r−2}M² + ⋯ + rDM^{r−1} + Mʳ,

    and, for any positive integer r ≥ p,

    Jʳ = Dʳ + rD^{r−1}M + (r(r−1)/2!) D^{r−2}M² + ⋯
         + (r!/((r−p+1)!(p−1)!)) D^{r−p+1}M^{p−1}.
19. Let

J = [ λ 1 0 ⋯ 0 ]
    [ 0 λ 1 ⋯ 0 ]
    [ 0 0 λ ⋯ 0 ]
    [ 0 0 0 ⋯ 1 ]
    [ 0 0 0 ⋯ λ ]

be the m × m Jordan block corresponding to λ, and let N = J − λI_m.
Prove the following results:
(a) N^m = O, and for 1 ≤ r < m, (Nʳ)ᵢⱼ = 1 if j = i + r, and (Nʳ)ᵢⱼ = 0
    otherwise.
(b) For any integer r ≥ m,

    Jʳ = [ λʳ  rλ^{r−1}  (r(r−1)/2!)λ^{r−2}  ⋯  (r(r−1)⋯(r−m+2)/(m−1)!)λ^{r−m+1} ]
         [ 0   λʳ        rλ^{r−1}            ⋯  (r(r−1)⋯(r−m+3)/(m−2)!)λ^{r−m+2} ]
         [ ⋮                                                              ⋮        ]
         [ 0   0         0                   ⋯  λʳ                                ]

(c) lim_{r→∞} Jʳ exists if and only if one of the following holds:
    (i) |λ| < 1.
    (ii) λ = 1 and m = 1.
    (Note that lim_{r→∞} λʳ exists under these conditions. See the discussion
    preceding Theorem 5.13 on page 285.) Furthermore, lim_{r→∞} Jʳ
    is the zero matrix if condition (i) holds and is the 1 × 1 matrix (1)
    if condition (ii) holds.
(d) Prove Theorem 5.13 on page 285.
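
The entry formula in part (b) can be explored numerically. The sketch below is an illustration only; the eigenvalue λ = 1/2, block size m = 4, and power r = 7 are hypothetical sample values, and the comparison uses the equivalent entry statement (Jʳ)_{i, i+k} = C(r, k) λ^{r−k}.

```python
import sympy as sp

lam, m, r = sp.Rational(1, 2), 4, 7      # sample eigenvalue, block size, and power
# the m x m Jordan block for lam
J = sp.Matrix(m, m, lambda i, j: lam if i == j else (1 if j == i + 1 else 0))

Jr = J**r
# claimed entries: (J^r)_{i, i+k} = binomial(r, k) * lam**(r - k), zero below the diagonal
formula = sp.Matrix(m, m, lambda i, j:
                    sp.binomial(r, j - i) * lam**(r - (j - i)) if j >= i else 0)
assert Jr == formula
print(Jr)
```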
The following definition is used in Exercises 20 and 21.
Definition. For any A ∈ M_{n×n}(C), define the norm of A by
‖A‖ = max{|A_{ij}| : 1 ≤ i, j ≤ n}.
20. Let A, B ∈ M_{n×n}(C). Prove the following results.
(a) ‖A‖ ≥ 0, and ‖A‖ = 0 if and only if A = O.
(b) ‖cA‖ = |c|·‖A‖ for any scalar c.
(c) ‖A + B‖ ≤ ‖A‖ + ‖B‖.
(d) ‖AB‖ ≤ n‖A‖·‖B‖.
21. Let A ∈ M_{n×n}(C) be a transition matrix. (See Section 5.3.) Since C is
an algebraically closed field, A has a Jordan canonical form J to which
A is similar. Let P be an invertible matrix such that P⁻¹AP = J.
Prove the following results.
(a) ‖Aᵐ‖ ≤ 1 for every positive integer m.
(b) There exists a positive number c such that ‖Jᵐ‖ ≤ c for every
    positive integer m.
(c) Each Jordan block of J corresponding to the eigenvalue λ = 1 is a
    1 × 1 matrix.
(d) lim_{m→∞} Aᵐ exists if and only if 1 is the only eigenvalue of A with
    absolute value 1.
(e) Theorem 5.20(a) using (c) and Theorem 5.19.
The next exercise requires knowledge of absolutely convergent series as well
as the definition of e^A for a matrix A. (See page 312.)
22. Use Exercise 20(d) to prove that e^A exists for every A ∈ M_{n×n}(C).
23. Let x′ = Ax be a system of n linear differential equations, where x is
an n-tuple of differentiable functions x₁(t), x₂(t), …, xₙ(t) of the real
variable t, and A is an n × n coefficient matrix as in Exercise 15 of
Section 5.2. In contrast to that exercise, however, do not assume that
A is diagonalizable, but assume that the characteristic polynomial of A
splits. Let λ₁, λ₂, …, λ_k be the distinct eigenvalues of A.

(a) Prove that if u is the end vector of a cycle of generalized eigenvectors
    of L_A of length p and u corresponds to the eigenvalue λᵢ, then
    for any polynomial f(t) of degree less than p, the function

    e^{λᵢt}[f(t)(A − λᵢI)^{p−1} + f′(t)(A − λᵢI)^{p−2} + ⋯ + f^{(p−1)}(t)I]u

    is a solution to the system x′ = Ax.
(b) Prove that the general solution to x' = Ax is a sum of the functions
of the form given in (a), where the vectors u are the end vectors of
the distinct cycles that constitute a fixed Jordan canonical basis
for L_A.
24. Use Exercise 23 to find the general solution to each of the following sys­
tems of linear equations, where x, y, and z are real-valued differentiable
functions of the real variable t.
(a) x′ = 2x + y        (b) x′ = 2x + y
    y′ = 2y − z            y′ = 2y + z
    z′ = 3z                z′ = 2z
7.3 THE MINIMAL POLYNOMIAL
The Cayley-Hamilton theorem (Theorem 5.23 p. 317) tells us that for any
linear operator T on an n-dimensional vector space, there is a polynomial
f(t) of degree n such that f(T) = T₀, namely, the characteristic polynomial
of T. Hence there is a polynomial of least degree with this property, and this
degree is at most n. If g(t) is such a polynomial, we can divide g(t) by its
leading coefficient to obtain another polynomial p(t) of the same degree with
leading coefficient 1, that is, p(t) is a monic polynomial. (See Appendix E.)
Definition. Let T be a linear operator on a finite-dimensional vector
space. A polynomial p(t) is called a minimal polynomial of T if p(t) is a
monic polynomial of least positive degree for which p(T) = T₀.
The preceding discussion shows that every linear operator on a finite-
dimensional vector space has a minimal polynomial. The next result shows
that it is unique.
Theorem 7.12. Let p(t) be a minimal polynomial of a linear operator T
on a finite-dimensional vector space V.
(a) For any polynomial g(t), if g(T) = T₀, then p(t) divides g(t). In particular,
p(t) divides the characteristic polynomial of T.
(b) The minimal polynomial of T is unique.
Proof. (a) Let g(t) be a polynomial for which g(T) = T₀. By the division
algorithm for polynomials (Theorem E.1 of Appendix E, p. 562), there exist
polynomials q(t) and r(t) such that

g(t) = q(t)p(t) + r(t),    (1)

where r(t) has degree less than the degree of p(t). Substituting T into (1)
and using that g(T) = p(T) = T₀, we have r(T) = T₀. Since r(t) has degree
less than that of p(t) and p(t) is the minimal polynomial of T, r(t) must be the zero
polynomial. Thus (1) simplifies to g(t) = q(t)p(t), proving (a).
(b) Suppose that p₁(t) and p₂(t) are each minimal polynomials of T. Then
p₁(t) divides p₂(t) by (a). Since p₁(t) and p₂(t) have the same degree, we have
that p₂(t) = c p₁(t) for some nonzero scalar c. Because p₁(t) and p₂(t) are
monic, c = 1; hence p₁(t) = p₂(t). ∎
The minimal polynomial of a linear operator has an obvious analog for a
matrix.
Definition. Let A ∈ M_{n×n}(F). The minimal polynomial p(t) of A is
the monic polynomial of least positive degree for which p(A) = O.
The following results are now immediate.
Theorem 7.13. Let T be a linear operator on a finite-dimensional vector
space V, and let β be an ordered basis for V. Then the minimal polynomial
of T is the same as the minimal polynomial of [T]_β.
Proof. Exercise. ∎
Corollary. For any A ∈ M_{n×n}(F), the minimal polynomial of A is the
same as the minimal polynomial of L_A.
Proof. Exercise. ∎
In view of the preceding theorem and corollary, Theorem 7.12 and all
subsequent theorems in this section that are stated for operators are also
valid for matrices.
For the remainder of this section, we study primarily minimal polynomials
of operators (and hence matrices) whose characteristic polynomials split. A
more general treatment of minimal polynomials is given in Section 7.4.
Theorem 7.14. Let T be a linear operator on a finite-dimensional vector
space V, and let p(t) be the minimal polynomial of T. A scalar λ is an
eigenvalue of T if and only if p(λ) = 0. Hence the characteristic polynomial
and the minimal polynomial of T have the same zeros.
Proof. Let f(t) be the characteristic polynomial of T. Since p(t) divides
f(t), there exists a polynomial q(t) such that f(t) = q(t)p(t). If λ is a zero of
p(t), then

f(λ) = q(λ)p(λ) = q(λ)·0 = 0.

So λ is a zero of f(t); that is, λ is an eigenvalue of T.

Conversely, suppose that λ is an eigenvalue of T, and let x ∈ V be an
eigenvector corresponding to λ. By Exercise 22 of Section 5.1, we have

0 = T₀(x) = p(T)(x) = p(λ)x.

Since x ≠ 0, it follows that p(λ) = 0, and so λ is a zero of p(t). ∎
The following corollary is immediate.
Corollary. Let T be a linear operator on a finite-dimensional vector space
V with minimal polynomial p(t) and characteristic polynomial f(t). Suppose
that f(t) factors as

f(t) = (λ₁ − t)^{n₁}(λ₂ − t)^{n₂} ⋯ (λ_k − t)^{n_k},

where λ₁, λ₂, …, λ_k are the distinct eigenvalues of T. Then there exist integers
m₁, m₂, …, m_k such that 1 ≤ mᵢ ≤ nᵢ for all i and

p(t) = (t − λ₁)^{m₁}(t − λ₂)^{m₂} ⋯ (t − λ_k)^{m_k}.
Example 1
We compute the minimal polynomial of the matrix

A = [ 3 −1 0 ]
    [ 0  2 0 ]
    [ 1 −1 2 ].

Since A has the characteristic polynomial

f(t) = det [ 3−t  −1   0  ]  = −(t − 2)²(t − 3),
           [  0   2−t  0  ]
           [  1   −1  2−t ]

the minimal polynomial of A must be either (t − 2)(t − 3) or (t − 2)²(t − 3)
by the corollary to Theorem 7.14. Substituting A into p(t) = (t − 2)(t − 3),
we find that p(A) = O; hence p(t) is the minimal polynomial of A. •
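
A hedged sketch (an illustration, not part of the text) of the same computation: test the monic divisors permitted by the corollary, in order of increasing degree, until one annihilates A. The helper `poly_of_matrix` is introduced here only for the illustration.

```python
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[3, -1, 0],
               [0,  2, 0],
               [1, -1, 2]])

def poly_of_matrix(p, M):
    """Evaluate the polynomial p(t) at the square matrix M (Horner's rule)."""
    result = sp.zeros(M.rows)
    for c in sp.Poly(p, t).all_coeffs():
        result = result * M + c * sp.eye(M.rows)
    return result

# candidates permitted by the corollary to Theorem 7.14, in order of increasing degree
for p in [(t - 2)*(t - 3), (t - 2)**2*(t - 3)]:
    if poly_of_matrix(p, A) == sp.zeros(3):
        print('minimal polynomial:', sp.expand(p))   # t**2 - 5*t + 6 = (t - 2)*(t - 3)
        break
```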
Example 2
Let T be the linear operator on R² defined by

T(a, b) = (2a + 5b, 6a + b)

and β be the standard ordered basis for R². Then

[T]_β = [ 2 5 ]
        [ 6 1 ],

and hence the characteristic polynomial of T is

f(t) = det [ 2−t   5  ]  = (t − 7)(t + 4).
           [  6   1−t ]

Thus the minimal polynomial of T is also (t − 7)(t + 4). •

Example 3
Let D be the linear operator on P2(/?) defined by D(g(x)) — g'(x), the deriva­
tive of g(x). We compute the minimal polynomial of T. Let 0 be the standard
ordered basis for P2(i?). Then
/(> 1 0
[O]0 = 0 0 2 ,
\o 0 0/
and it follows that the characteristic polynomial of D is — t3. So by the
corollary to Theorem 7.14, the minimal polynomial of D is t, t2, or t3. Since
D2(a;2) = 2 ^ 0, it follows that D2 ^ T0; hence the minimal polynomial of D
must be t3. •
In Example 3, it is easily verified that P₂(R) is a D-cyclic subspace (of
itself). Here the minimal and characteristic polynomials are of the same
degree. This is no coincidence.
Theorem 7.15. Let T be a linear operator on an n-dimensional vector
space V such that V is a T-cyclic subspace of itself. Then the characteristic
polynomial f(t) and the minimal polynomial p(t) have the same degree, and
hence f(t) = (-l)np(t).
Proof. Since V is a T-cyclic space, there exists an x ∈ V such that

β = {x, T(x), …, T^{n−1}(x)}

is a basis for V (Theorem 5.22 p. 315). Let

g(t) = a₀ + a₁t + ⋯ + a_k t^k

be a polynomial of degree k < n. Then a_k ≠ 0 and

g(T)(x) = a₀x + a₁T(x) + ⋯ + a_k T^k(x),

and so g(T)(x) is a linear combination of the vectors of β having at least one
nonzero coefficient, namely, a_k. Since β is linearly independent, it follows
that g(T)(x) ≠ 0; hence g(T) ≠ T₀. Therefore the minimal polynomial of T
has degree n, which is also the degree of the characteristic polynomial of T. ∎
Theorem 7.15 gives a condition under which the degree of the minimal
polynomial of an operator is as large as possible. We now investigate the
other extreme. By Theorem 7.14, the degree of the minimal polynomial of an
operator must be greater than or equal to the number of distinct eigenvalues
of the operator. The next result shows that the operators for which the
degree of the minimal polynomial is as small as possible are precisely the
diagonalizable operators.

Theorem 7.16. Let T be a linear operator on a finite-dimensional vector
space V. Then T is diagonalizable if and only if the minimal polynomial of T
is of the form

p(t) = (t − λ₁)(t − λ₂) ⋯ (t − λ_k),

where λ₁, λ₂, …, λ_k are the distinct eigenvalues of T.
Proof. Suppose that T is diagonalizable. Let λ₁, λ₂, …, λ_k be the distinct
eigenvalues of T, and define

p(t) = (t − λ₁)(t − λ₂) ⋯ (t − λ_k).

By Theorem 7.14, p(t) divides the minimal polynomial of T. Let β =
{v₁, v₂, …, vₙ} be a basis for V consisting of eigenvectors of T, and consider
any vᵢ ∈ β. Then (T − λⱼI)(vᵢ) = 0 for some eigenvalue λⱼ. Since t − λⱼ
divides p(t), there is a polynomial qⱼ(t) such that p(t) = qⱼ(t)(t − λⱼ). Hence

p(T)(vᵢ) = qⱼ(T)(T − λⱼI)(vᵢ) = 0.

It follows that p(T) = T₀, since p(T) takes each vector in a basis for V into
0. Therefore p(t) is the minimal polynomial of T.
Conversely, suppose that there are distinct scalars λ₁, λ₂, …, λ_k such that
the minimal polynomial p(t) of T factors as

p(t) = (t − λ₁)(t − λ₂) ⋯ (t − λ_k).

By Theorem 7.14, the λᵢ's are eigenvalues of T. We apply mathematical
induction on n = dim(V). Clearly T is diagonalizable for n = 1. Now
assume that T is diagonalizable whenever dim(V) < n for some n > 1, and
let dim(V) = n and W = R(T − λ_k I). Obviously W ≠ V, because λ_k is an
eigenvalue of T. If W = {0}, then T = λ_k I, which is clearly diagonalizable.
So suppose that 0 < dim(W) < n. Then W is T-invariant, and for any x ∈ W,

(T − λ₁I)(T − λ₂I) ⋯ (T − λ_{k−1}I)(x) = 0.

It follows that the minimal polynomial of T_W divides the polynomial
(t − λ₁)(t − λ₂) ⋯ (t − λ_{k−1}). Hence by the induction hypothesis, T_W is
diagonalizable. Furthermore, λ_k is not an eigenvalue of T_W by Theorem 7.14.
Therefore W ∩ N(T − λ_k I) = {0}. Now let β₁ = {v₁, v₂, …, v_m} be a basis
for W consisting of eigenvectors of T_W (and hence of T), and let β₂ =
{w₁, w₂, …, w_p} be a basis for N(T − λ_k I), the eigenspace of T corresponding
to λ_k. Then β₁ and β₂ are disjoint by the previous comment. Moreover,
m + p = n by the dimension theorem applied to T − λ_k I. We show that
β = β₁ ∪ β₂ is linearly independent. Consider scalars a₁, a₂, …, a_m and
b₁, b₂, …, b_p such that

a₁v₁ + a₂v₂ + ⋯ + a_m v_m + b₁w₁ + b₂w₂ + ⋯ + b_p w_p = 0.

Let

x = a₁v₁ + a₂v₂ + ⋯ + a_m v_m   and   y = b₁w₁ + b₂w₂ + ⋯ + b_p w_p.

Then x ∈ W, y ∈ N(T − λ_k I), and x + y = 0. It follows that x = −y ∈
W ∩ N(T − λ_k I), and therefore x = 0. Since β₁ is linearly independent, we
have that a₁ = a₂ = ⋯ = a_m = 0. Similarly, b₁ = b₂ = ⋯ = b_p = 0,
and we conclude that β is a linearly independent subset of V consisting of n
eigenvectors. It follows that β is a basis for V consisting of eigenvectors of T,
and consequently T is diagonalizable. ∎
In addition to the case of diagonalizable operators, there are methods for
determining the minimal polynomial of any linear operator on a finite-dimensional
vector space. In the case that the characteristic polynomial of the operator
splits, the minimal polynomial can be described using the Jordan canonical
form of the operator. (See Exercise 13.) In the case that the characteristic
polynomial does not split, the minimal polynomial can be described using the
rational canonical form, which we study in the next section. (See Exercise 7
of Section 7.4.)
Example 4
We determine all matrices A ∈ M₂ₓ₂(R) for which A² − 3A + 2I = O. Let
g(t) = t² − 3t + 2 = (t − 1)(t − 2). Since g(A) = O, the minimal polynomial
p(t) of A divides g(t). Hence the only possible candidates for p(t) are t − 1,
t − 2, and (t − 1)(t − 2). If p(t) = t − 1 or p(t) = t − 2, then A = I or A = 2I,
respectively. If p(t) = (t − 1)(t − 2), then A is diagonalizable with eigenvalues
1 and 2, and hence A is similar to

[ 1 0 ]
[ 0 2 ].  •
Example 5
Let A ∈ M_{n×n}(R) satisfy A³ = A. We show that A is diagonalizable. Let
g(t) = t³ − t = t(t + 1)(t − 1). Then g(A) = O, and hence the minimal
polynomial p(t) of A divides g(t). Since g(t) has no repeated factors, neither
does p(t). Thus A is diagonalizable by Theorem 7.16. •
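
As an illustration (a sketch with a hypothetical matrix, not from the text), any particular matrix satisfying A³ = A can be checked directly; SymPy's `is_diagonalizable` confirms what Theorem 7.16 predicts.

```python
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[0, 1, 0],
               [1, 0, 0],
               [0, 0, -1]])               # a hypothetical matrix satisfying A**3 == A

assert A**3 == A
print(A.is_diagonalizable())              # True, as Theorem 7.16 predicts
print(sp.factor((t*sp.eye(3) - A).det())) # (t - 1)*(t + 1)**2, with no repeated eigenvalue issues
```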
Example 6
In Example 3, we saw that the minimal polynomial of the differential operator
D on P₂(R) is t³. Hence, by Theorem 7.16, D is not diagonalizable. •

EXERCISES
1. Label the following statements as true or false. Assume that all vector
spaces are finite-dimensional.
(a) Every linear operator T has a polynomial p(t) of largest degree for
    which p(T) = T₀.
(b) Every linear operator has a unique minimal polynomial.
(c) The characteristic polynomial of a linear operator divides the min­
imal polynomial of that operator.
(d) The minimal and the characteristic polynomials of any diagonal­
izable operator are equal.
(e) Let T be a linear operator on an n-dimensional vector space V, p(t)
be the minimal polynomial of T, and f(t) be the characteristic
polynomial of T. Suppose that f(t) splits. Then f(t) divides
[p(t)]ⁿ.
(f) The minimal polynomial of a linear operator always has the same
degree as the characteristic polynomial of the operator.
(g) A linear operator is diagonalizable if its minimal polynomial splits.
(h) Let T be a linear operator on a vector space V such that V is a
T-cyclic subspace of itself. Then the degree of the minimal poly­
nomial of T equals dim(V).
(i) Let T be a linear operator on a vector space V such that T has n
distinct eigenvalues, where n = dim(V). Then the degree of the
minimal polynomial of T equals n.
2. Find the minimal polynomial of each of the following matrices.

(a) [ 2 1 ]      (b) [ 1 1 ]      (c) [ 4 −14 5 ]
    [ 1 2 ]          [ 0 1 ]          [ 1  −4 2 ]
                                      [ 1  −6 4 ]
3. For each linear operator T on V, find the minimal polynomial of T.
(a) V = R² and T(a, b) = (a + b, a − b)
(b) V = P₂(R) and T(g(x)) = g′(x) + 2g(x)
(c) V = P₂(R) and T(f(x)) = −xf″(x) + f′(x) + 2f(x)
(d) V = M_{n×n}(R) and T(A) = Aᵗ. Hint: Note that T² = I.

4. Determine which of the matrices and operators in Exercises 2 and 3 are
diagonalizable.

5. Describe all linear operators T on R² such that T is diagonalizable and
T³ − 2T² + T = T₀.

6. Prove Theorem 7.13 and its corollary.

7. Prove the corollary to Theorem 7.14.
8. Let T be a linear operator on a finite-dimensional vector space, and let
p(t) be the minimal polynomial of T. Prove the following results.
(a) T is invertible if and only if p(0) ≠ 0.
(b) If T is invertible and p(t) = tⁿ + a_{n−1}t^{n−1} + ⋯ + a₁t + a₀, then

    T⁻¹ = −(1/a₀)(T^{n−1} + a_{n−1}T^{n−2} + ⋯ + a₂T + a₁I).
9. Let T be a diagonalizable linear operator on a finite-dimensional vector
space V. Prove that V is a T-cyclic subspace if and only if each of the
eigenspaces of T is one-dimensional.
10. Let T be a linear operator on a finite-dimensional vector space V, and
suppose that W is a T-invariant subspace of V. Prove that the minimal
polynomial of Tw divides the minimal polynomial of T.
11. Let g(t) be the auxiliary polynomial associated with a homogeneous lin­
ear differential equation with constant coefficients (as defined in Section
2.7), and let V denote the solution space of this differential equation.
Prove the following results.
(a) V is a D-invariant subspace, where D is the differentiation operator
    on C^∞.
(b) The minimal polynomial of D_V (the restriction of D to V) is g(t).
(c) If the degree of g(t) is n, then the characteristic polynomial of D_V
    is (−1)ⁿ g(t).
Hint: Use Theorem 2.32 (p. 135) for (b) and (c).
12. Let D be the differentiation operator on P(R), the space of polynomials
over R. Prove that there exists no polynomial g(t) for which g(D) = T₀.
Hence D has no minimal polynomial.
13. Let T be a linear operator on a finite-dimensional vector space, and
suppose that the characteristic polynomial of T splits. Let λ₁, λ₂, …, λ_k
be the distinct eigenvalues of T, and for each i let pᵢ be the order of the
largest Jordan block corresponding to λᵢ in a Jordan canonical form of
T. Prove that the minimal polynomial of T is

(t − λ₁)^{p₁}(t − λ₂)^{p₂} ⋯ (t − λ_k)^{p_k}.
The following exercise requires knowledge of direct sums (see Section 5.2).

14. Let T be a linear operator on a finite-dimensional vector space V, and
let W₁ and W₂ be T-invariant subspaces of V such that V = W₁ ⊕ W₂.
Suppose that p₁(t) and p₂(t) are the minimal polynomials of T_{W₁} and
T_{W₂}, respectively. Prove or disprove that p₁(t)p₂(t) is the minimal
polynomial of T.
Exercise 15 uses the following definition.
Definition. Let T be a linear operator on a finite-dimensional vector
space V, and let x be a nonzero vector in V. The polynomial p(t) is called
a T-annihilator of x if p(t) is a monic polynomial of least degree for which
p(T)(x) = 0.
15. Let T be a linear operator on a finite-dimensional vector space V, and
let x be a nonzero vector in V. Prove the following results.
(a) The vector x has a unique T-annihilator.
(b) The T-annihilator of x divides any polynomial g(t) for which
    g(T)(x) = 0.
(c) If p(t) is the T-annihilator of x and W is the T-cyclic subspace
    generated by x, then p(t) is the minimal polynomial of T_W, and
    dim(W) equals the degree of p(t).
(d) The degree of the T-annihilator of x is 1 if and only if x is an
    eigenvector of T.
16. Let T be a linear operator on a finite-dimensional vector space V, and let
W₁ be a T-invariant subspace of V. Let x ∈ V be such that x ∉ W₁. Prove
the following results.
(a) There exists a unique monic polynomial g₁(t) of least positive degree
    such that g₁(T)(x) ∈ W₁.
(b) If h(t) is a polynomial for which h(T)(x) ∈ W₁, then g₁(t) divides
    h(t).
(c) g₁(t) divides the minimal and the characteristic polynomials of T.
(d) Let W₂ be a T-invariant subspace of V such that W₂ ⊆ W₁, and
    let g₂(t) be the unique monic polynomial of least degree such that
    g₂(T)(x) ∈ W₂. Then g₁(t) divides g₂(t).
7.4* THE RATIONAL CANONICAL FORM
Until now we have used eigenvalues, eigenvectors, and generalized eigenvec­
tors in our analysis of linear operators with characteristic polynomials that
split. In general, characteristic polynomials need not split, and indeed, oper­
ators need not have eigenvalues! However, the unique factorization theorem
for polynomials (see Appendix E) guarantees that the characteristic polyno­
mial f(t) of any linear operator T on an n-dimensional vector space factors

uniquely as

f(t) = (−1)ⁿ (φ₁(t))^{n₁} (φ₂(t))^{n₂} ⋯ (φ_k(t))^{n_k},

where the φᵢ(t)'s (1 ≤ i ≤ k) are distinct irreducible monic polynomials and
the nᵢ's are positive integers. In the case that f(t) splits, each irreducible
monic polynomial factor is of the form φᵢ(t) = t − λᵢ, where λᵢ is an eigenvalue
of T, and there is a one-to-one correspondence between eigenvalues of T and
of T, and there is a one-to-one correspondence between eigenvalues of T and
the irreducible monic factors of the characteristic polynomial. In general,
eigenvalues need not exist, but the irreducible monic factors always exist. In
this section, we establish structure theorems based on the irreducible monic
factors of the characteristic polynomial instead of eigenvalues.
In this context, the following definition is the appropriate replacement for
eigenspace and generalized eigenspace.
Definition. Let T be a linear operator on a finite-dimensional vector
space V with characteristic polynomial

f(t) = (−1)ⁿ (φ₁(t))^{n₁} (φ₂(t))^{n₂} ⋯ (φ_k(t))^{n_k},

where the φᵢ(t)'s (1 ≤ i ≤ k) are distinct irreducible monic polynomials and
the nᵢ's are positive integers. For 1 ≤ i ≤ k, we define the subset K_{φᵢ} of V by

K_{φᵢ} = {x ∈ V : (φᵢ(T))^p(x) = 0 for some positive integer p}.

We show that each K_{φᵢ} is a nonzero T-invariant subspace of V. Note that
if φᵢ(t) = t − λ is of degree one, then K_{φᵢ} is the generalized eigenspace of T
corresponding to the eigenvalue λ.
Having obtained suitable generalizations of the related concepts of eigen­
value and eigenspace, our next task is to describe a canonical form of a linear
operator suitable to this context. The one that we study is called the rational
canonical form. Since a canonical form is a description of a matrix represen­
tation of a linear operator, it can be defined by specifying the form of the
ordered bases allowed for these representations.
Here the bases of interest naturally arise from the generators of certain
cyclic subspaces. For this reason, the reader should recall the definition of
a T-cyclic subspace generated by a vector and Theorem 5.22 (p. 315). We
briefly review this concept and introduce some new notation and terminology.
Let T be a linear operator on a finite-dimensional vector space V, and let
x be a nonzero vector in V. We use the notation Cx for the T-cyclic subspace
generated by x. Recall (Theorem 5.22) that if dim(Ca;) = k, then the set
{^T^.T2^),...,^-1^)}
is an ordered basis for Cr- To distinguish this basis from all other ordered
bases for Cx, we call it the T-cyclic basis generated by x and denote it by

526 Chap. 7 Canonical Forms
0X. Let A be the matrix representation of the restriction of T to Cx relative
to the ordered basis 0X. Recall from the proof of Theorem 5.22 that
A = [ 0 0 ⋯ 0   −a₀       ]
    [ 1 0 ⋯ 0   −a₁       ]
    [ 0 1 ⋯ 0   −a₂       ]
    [ ⋮           ⋮        ]
    [ 0 0 ⋯ 1   −a_{k−1}  ],

where

a₀x + a₁T(x) + ⋯ + a_{k−1}T^{k−1}(x) + T^k(x) = 0.

Furthermore, the characteristic polynomial of A is given by

det(A − tI) = (−1)^k (a₀ + a₁t + ⋯ + a_{k−1}t^{k−1} + t^k).
The matrix A is called the companion matrix of the monic polynomial
h(t) = a₀ + a₁t + ⋯ + a_{k−1}t^{k−1} + t^k. Every monic polynomial has a companion
matrix, and the characteristic polynomial of the companion matrix of
a monic polynomial g(t) of degree k is equal to (−1)^k g(t). (See Exercise 19
of Section 5.4.) By Theorem 7.15 (p. 519), the monic polynomial h(t) is also
the minimal polynomial of A. Since A is the matrix representation of the
restriction of T to C_x, h(t) is also the minimal polynomial of this restriction.
By Exercise 15 of Section 7.3, h(t) is also the T-annihilator of x.
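
A short sketch (not part of the text) that builds the companion matrix of a monic polynomial in the form displayed above and checks the characteristic-polynomial claim; the helper name `companion_matrix` and the sample polynomial are illustrative choices only.

```python
import sympy as sp

t = sp.symbols('t')

def companion_matrix(h):
    """Companion matrix, in the form shown above, of a monic polynomial h(t)."""
    a = sp.Poly(h, t).all_coeffs()        # [1, a_{k-1}, ..., a_1, a_0]
    k = len(a) - 1
    M = sp.zeros(k)
    for i in range(1, k):
        M[i, i - 1] = 1                   # subdiagonal of 1's
    for i in range(k):
        M[i, k - 1] = -a[k - i]           # last column: -a_0, -a_1, ..., -a_{k-1}
    return M

h = t**4 + 2*t**2 + 1                     # = (t**2 + 1)**2, used again in Example 1 below
A = companion_matrix(h)
# det(t*I - A) = h(t); with the text's convention det(A - t*I) this is (-1)**k * h(t)
assert sp.expand((t*sp.eye(4) - A).det() - h) == 0
print(A)
```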
It is the object of this section to prove that for every linear operator T
on a finite-dimensional vector space V, there exists an ordered basis β for V
such that the matrix representation [T]_β is of the form

[ C₁  O  ⋯  O  ]
[ O   C₂ ⋯  O  ]
[ ⋮          ⋮  ]
[ O   O  ⋯  C_r ],

where each Cᵢ is the companion matrix of a polynomial (φ(t))^m such that φ(t)
is a monic irreducible divisor of the characteristic polynomial of T and m is
a positive integer. A matrix representation of this kind is called a rational
canonical form of T. We call the accompanying basis a rational canonical
basis for T.
The next theorem is a simple consequence of the following lemma, which
relies on the concept of T-annihilator, introduced in the Exercises of Sec­
tion 7.3.
Lemma. Let T be a linear operator on a finite-dimensional vector space
V, let x be a nonzero vector in V, and suppose that the T-annihilator of x
is of the form (φ(t))^p for some irreducible monic polynomial φ(t). Then φ(t)
divides the minimal polynomial of T, and x ∈ K_φ.

Proof. By Exercise 15(b) of Section 7.3, (φ(t))^p divides the minimal polynomial
of T. Therefore φ(t) divides the minimal polynomial of T. Furthermore,
x ∈ K_φ by the definition of K_φ. ∎
Theorem 7.17. Let T be a linear operator on a finite-dimensional vector
space V, and let β be an ordered basis for V. Then β is a rational canonical
basis for T if and only if β is the disjoint union of T-cyclic bases β_{vᵢ}, where
each vᵢ lies in K_φ for some irreducible monic divisor φ(t) of the characteristic
polynomial of T.
Proof. Exercise. ∎
Example 1
Suppose that T is a linear operator on R⁸ and

β = {v₁, v₂, v₃, v₄, v₅, v₆, v₇, v₈}

is a rational canonical basis for T such that

C = [T]_β = [ 0 −3  0  0  0  0  0  0 ]
            [ 1  1  0  0  0  0  0  0 ]
            [ 0  0  0  0  0 −1  0  0 ]
            [ 0  0  1  0  0  0  0  0 ]
            [ 0  0  0  1  0 −2  0  0 ]
            [ 0  0  0  0  1  0  0  0 ]
            [ 0  0  0  0  0  0  0 −1 ]
            [ 0  0  0  0  0  0  1  0 ]

is a rational canonical form of T. In this case, the submatrices C₁, C₂, and
C₃ are the companion matrices of the polynomials φ₁(t), (φ₂(t))², and φ₂(t),
respectively, where

φ₁(t) = t² − t + 3   and   φ₂(t) = t² + 1.

In the context of Theorem 7.17, β is the disjoint union of the T-cyclic bases;
that is,

β = β_{v₁} ∪ β_{v₃} ∪ β_{v₇}
  = {v₁, v₂} ∪ {v₃, v₄, v₅, v₆} ∪ {v₇, v₈}.

By Exercise 40 of Section 5.4, the characteristic polynomial f(t) of T is the
product of the characteristic polynomials of the companion matrices:

f(t) = φ₁(t)(φ₂(t))²φ₂(t) = φ₁(t)(φ₂(t))³. •
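
The factorization claimed above can be checked directly (an illustrative sketch, not part of the text): assemble C from the three companion matrices and factor its characteristic polynomial.

```python
import sympy as sp

t = sp.symbols('t')
C1 = sp.Matrix([[0, -3], [1, 1]])                 # companion matrix of t**2 - t + 3
C2 = sp.Matrix([[0, 0, 0, -1],
                [1, 0, 0,  0],
                [0, 1, 0, -2],
                [0, 0, 1,  0]])                   # companion matrix of (t**2 + 1)**2
C3 = sp.Matrix([[0, -1], [1, 0]])                 # companion matrix of t**2 + 1

C = sp.diag(C1, C2, C3)                           # the rational canonical form above
f = (t*sp.eye(8) - C).det()                       # monic form of the characteristic polynomial
phi1, phi2 = t**2 - t + 3, t**2 + 1
assert sp.expand(f - phi1*phi2**3) == 0
print(sp.factor(f))                               # (t**2 - t + 3)*(t**2 + 1)**3
```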

The rational canonical form C of the operator T in Example 1 is constructed
from matrices of the form Cᵢ, each of which is the companion matrix
of some power of a monic irreducible divisor of the characteristic polynomial
of T. Furthermore, each such divisor is used in this way at least once.
In the course of showing that every linear operator T on a finite-dimensional
vector space has a rational canonical form C, we show that the companion
matrices Cᵢ that constitute C are always constructed from powers of
the monic irreducible divisors of the characteristic polynomial of T. A key
role in our analysis is played by the subspaces K_φ, where φ(t) is an irreducible
monic divisor of the minimal polynomial of T. Since the minimal polynomial
of an operator divides the characteristic polynomial of the operator, every irreducible
divisor of the former is also an irreducible divisor of the latter. We
eventually show that the converse is also true; that is, the minimal polynomial
and the characteristic polynomial have the same irreducible divisors.
We begin with a result that lists several properties of irreducible divisors
of the minimal polynomial. The reader is advised to review the definition of
T-annihilator and the accompanying Exercise 15 of Section 7.3.
Theorem 7.18. Let T be a linear operator on a finite-dimensional vector
space V, and suppose that

p(t) = (φ₁(t))^{m₁} (φ₂(t))^{m₂} ⋯ (φ_k(t))^{m_k}

is the minimal polynomial of T, where the φᵢ(t)'s (1 ≤ i ≤ k) are the distinct
irreducible monic factors of p(t) and the mᵢ's are positive integers. Then the
following statements are true.
(a) K_{φᵢ} is a nonzero T-invariant subspace of V for each i.
(b) If x is a nonzero vector in some K_{φᵢ}, then the T-annihilator of x is of
    the form (φᵢ(t))^p for some integer p.
(c) K_{φᵢ} ∩ K_{φⱼ} = {0} for i ≠ j.
(d) K_{φᵢ} is invariant under φⱼ(T) for i ≠ j, and the restriction of φⱼ(T) to
    K_{φᵢ} is one-to-one and onto.
(e) K_{φᵢ} = N((φᵢ(T))^{mᵢ}) for each i.
Proof. If k = 1, then (a), (b), and (e) are obvious, while (c) and (d) are
vacuously true. Now suppose that k > 1.
(a) The proof that K_{φᵢ} is a T-invariant subspace of V is left as an exercise.
Let fᵢ(t) be the polynomial obtained from p(t) by omitting the factor
(φᵢ(t))^{mᵢ}. To prove that K_{φᵢ} is nonzero, first observe that fᵢ(t) is a proper
divisor of p(t); therefore there exists a vector z ∈ V such that x = fᵢ(T)(z) ≠ 0.
Then x ∈ K_{φᵢ} because

(φᵢ(T))^{mᵢ}(x) = (φᵢ(T))^{mᵢ} fᵢ(T)(z) = p(T)(z) = 0.

(b) Assume the hypothesis. Then (φᵢ(T))^q(x) = 0 for some positive integer
q. Hence the T-annihilator of x divides (φᵢ(t))^q by Exercise 15(b) of
Section 7.3, and the result follows.

(c) Assume i ≠ j. Let x ∈ K_{φᵢ} ∩ K_{φⱼ}, and suppose that x ≠ 0. By (b), the
T-annihilator of x is a power of both φᵢ(t) and φⱼ(t). But this is impossible
because φᵢ(t) and φⱼ(t) are relatively prime (see Appendix E). We conclude
that x = 0.
(d) Assume i ≠ j. Since K_{φᵢ} is T-invariant, it is also φⱼ(T)-invariant.
Suppose that φⱼ(T)(x) = 0 for some x ∈ K_{φᵢ}. Then x ∈ K_{φᵢ} ∩ K_{φⱼ} = {0}
by (c). Therefore the restriction of φⱼ(T) to K_{φᵢ} is one-to-one. Since V is
finite-dimensional, this restriction is also onto.
(e) Suppose that 1 ≤ i ≤ k. Clearly, N((φᵢ(T))^{mᵢ}) ⊆ K_{φᵢ}. Let fᵢ(t) be the
polynomial defined in (a). Since fᵢ(t) is a product of polynomials of the form
φⱼ(t) for j ≠ i, we have by (d) that the restriction of fᵢ(T) to K_{φᵢ} is onto.
Let x ∈ K_{φᵢ}. Then there exists y ∈ K_{φᵢ} such that fᵢ(T)(y) = x. Therefore

(φᵢ(T))^{mᵢ}(x) = (φᵢ(T))^{mᵢ} fᵢ(T)(y) = p(T)(y) = 0,

and hence x ∈ N((φᵢ(T))^{mᵢ}). Thus K_{φᵢ} = N((φᵢ(T))^{mᵢ}). ∎
Since a rational canonical basis for an operator T is obtained from a union
of T-cyclic bases, we need to know when such a union is linearly independent.
The next major result, Theorem 7.19, reduces this problem to the study of
T-cyclic bases within K_φ, where φ(t) is an irreducible monic divisor of the
minimal polynomial of T. We begin with the following lemma.
Lemma. Let T be a linear operator on a finite-dimensional vector space
V, and suppose that

p(t) = (φ₁(t))^{m₁} (φ₂(t))^{m₂} ⋯ (φ_k(t))^{m_k}

is the minimal polynomial of T, where the φᵢ(t)'s (1 ≤ i ≤ k) are the distinct
irreducible monic factors of p(t) and the mᵢ's are positive integers. For
1 ≤ i ≤ k, let vᵢ ∈ K_{φᵢ} be such that

v₁ + v₂ + ⋯ + v_k = 0.    (2)

Then vᵢ = 0 for all i.
Proof. The result is trivial if k = 1, so suppose that k > 1. Consider
any i. Let fᵢ(t) be the polynomial obtained from p(t) by omitting the factor
(φᵢ(t))^{mᵢ}. As a consequence of Theorem 7.18, fᵢ(T) is one-to-one on K_{φᵢ}, and
fᵢ(T)(vⱼ) = 0 for i ≠ j. Thus, applying fᵢ(T) to (2), we obtain fᵢ(T)(vᵢ) = 0,
from which it follows that vᵢ = 0. ∎
Theorem 7.19. Let T be a linear operator on a finite-dimensional vector
space V, and suppose that

p(t) = (φ₁(t))^{m₁} (φ₂(t))^{m₂} ⋯ (φ_k(t))^{m_k}

is the minimal polynomial of T, where the φᵢ(t)'s (1 ≤ i ≤ k) are the distinct
irreducible monic factors of p(t) and the mᵢ's are positive integers. For
1 ≤ i ≤ k, let Sᵢ be a linearly independent subset of K_{φᵢ}. Then
(a) Sᵢ ∩ Sⱼ = ∅ for i ≠ j;
(b) S₁ ∪ S₂ ∪ ⋯ ∪ S_k is linearly independent.
Proof. If k = 1, then (a) is vacuously true and (b) is obvious. Now
suppose that k > 1. Then (a) follows immediately from Theorem 7.18(c).
Furthermore, the proof of (b) is identical to the proof of Theorem 5.8 (p. 267)
with the eigenspaces replaced by the subspaces K_{φᵢ}. ∎
In view of Theorem 7.19, we can focus on bases of the individual spaces
K_φ, where φ(t) is an irreducible monic divisor of the minimal
polynomial of T. The next several results give us ways to construct bases for
these spaces that are unions of T-cyclic bases. These results serve the dual
purposes of leading to the existence theorem for the rational canonical form
and of providing methods for constructing rational canonical bases.
For Theorems 7.20 and 7.21 and the latter's corollary, we fix a linear
operator T on a finite-dimensional vector space V and an irreducible monic
divisor φ(t) of the minimal polynomial of T.
Theorem 7.20. Let v₁, v₂, …, v_k be distinct vectors in K_φ such that

S₁ = β_{v₁} ∪ β_{v₂} ∪ ⋯ ∪ β_{v_k}

is linearly independent. For each i, choose wᵢ ∈ V such that φ(T)(wᵢ) = vᵢ.
Then

S₂ = β_{w₁} ∪ β_{w₂} ∪ ⋯ ∪ β_{w_k}

is also linearly independent.
Proof. Consider any linear combination of vectors in S₂ that sums to zero,
say,

∑_{i=1}^{k} ∑_{j} a_{ij} T^j(wᵢ) = 0,    (3)

where, for each i, the index j ranges over the vectors T^j(wᵢ) of β_{wᵢ}. For each
i, let fᵢ(t) be the polynomial defined by

fᵢ(t) = ∑_{j} a_{ij} t^j.

Then (3) can be rewritten as

∑_{i=1}^{k} fᵢ(T)(wᵢ) = 0.    (4)
Apply φ(T) to both sides of (4) to obtain

∑_{i=1}^{k} φ(T)fᵢ(T)(wᵢ) = ∑_{i=1}^{k} fᵢ(T)φ(T)(wᵢ) = ∑_{i=1}^{k} fᵢ(T)(vᵢ) = 0.
This last sum can be rewritten as a linear combination of the vectors in S₁
so that each fᵢ(T)(vᵢ) is a linear combination of the vectors in β_{vᵢ}. Since S₁
is linearly independent, it follows that

fᵢ(T)(vᵢ) = 0 for all i.

Therefore the T-annihilator of vᵢ divides fᵢ(t) for all i. (See Exercise 15 of
Section 7.3.) By Theorem 7.18(b), φ(t) divides the T-annihilator of vᵢ, and
hence φ(t) divides fᵢ(t) for all i. Thus, for each i, there exists a polynomial
gᵢ(t) such that fᵢ(t) = gᵢ(t)φ(t). So (4) becomes

∑_{i=1}^{k} gᵢ(T)φ(T)(wᵢ) = ∑_{i=1}^{k} gᵢ(T)(vᵢ) = 0.

Again, linear independence of S₁ requires that

fᵢ(T)(wᵢ) = gᵢ(T)(vᵢ) = 0 for all i.

But fᵢ(T)(wᵢ) is the result of grouping the terms of the linear combination
in (3) that arise from the linearly independent set β_{wᵢ}. We conclude that for
each i, a_{ij} = 0 for all j. Therefore S₂ is linearly independent. ∎
We now show that K_φ has a basis consisting of a union of T-cyclic bases.
Lemma. Let W be a T-invariant subspace of K_φ, and let β be a basis for
W. Then the following statements are true.
(a) Suppose that x ∈ N(φ(T)), but x ∉ W. Then β ∪ β_x is linearly independent.
(b) For some w₁, w₂, …, w_s in N(φ(T)), β can be extended to the linearly
    independent set

    β′ = β ∪ β_{w₁} ∪ β_{w₂} ∪ ⋯ ∪ β_{w_s},

    whose span contains N(φ(T)).
Proof. (a) Let β = {v₁, v₂, …, v_k}, and suppose that

∑_{i=1}^{k} aᵢvᵢ + z = 0   and   z = ∑_{j=0}^{d−1} bⱼ T^j(x),

where d is the degree of φ(t). Then z ∈ C_x ∩ W, and hence C_z ⊆ C_x ∩ W.
Suppose that z ≠ 0. Then z has φ(t) as its T-annihilator, and therefore

d = dim(C_z) ≤ dim(C_x ∩ W) ≤ dim(C_x) = d.

It follows that C_x ∩ W = C_x, and consequently x ∈ W, contrary to hypothesis.
Therefore z = 0, from which it follows that bⱼ = 0 for all j. Since β is
linearly independent, it follows that aᵢ = 0 for all i. Thus β ∪ β_x is linearly
independent.
(b) Suppose that W does not contain N(φ(T)). Choose a vector w₁ ∈
N(φ(T)) that is not in W. By (a), β₁ = β ∪ β_{w₁} is linearly independent.
Let W₁ = span(β₁). If W₁ does not contain N(φ(T)), choose a vector w₂ in
N(φ(T)), but not in W₁, so that β₂ = β₁ ∪ β_{w₂} = β ∪ β_{w₁} ∪ β_{w₂} is linearly
independent. Continuing this process, we eventually obtain vectors w₁, w₂, …, w_s
in N(φ(T)) such that the union

β′ = β ∪ β_{w₁} ∪ β_{w₂} ∪ ⋯ ∪ β_{w_s}

is a linearly independent set whose span contains N(φ(T)). ∎
Theorem 7.21. If the minimal polynomial of T is of the form p(t) =
(φ(t))^m, then there exists a rational canonical basis for T.
Proof. The proof is by mathematical induction on m. Suppose that m = 1.
Apply (b) of the lemma to W = {0} to obtain a linearly independent subset
of V of the form β_{v₁} ∪ β_{v₂} ∪ ⋯ ∪ β_{v_k}, whose span contains N(φ(T)). Since
V = N(φ(T)), this set is a rational canonical basis for V.
Now suppose that, for some integer m > 1, the result is valid whenever the
minimal polynomial of T is of the form (φ(t))^k, where k < m, and assume
that the minimal polynomial of T is p(t) = (φ(t))^m. Let r = rank(φ(T)).
Then R(φ(T)) is a T-invariant subspace of V, and the restriction of T to this
subspace has (φ(t))^{m−1} as its minimal polynomial. Therefore we may apply
the induction hypothesis to obtain a rational canonical basis for the restriction
of T to R(φ(T)). Suppose that v₁, v₂, …, v_k are the generating vectors of the
T-cyclic bases that constitute this rational canonical basis. For each i, choose
wᵢ in V such that vᵢ = φ(T)(wᵢ). By Theorem 7.20, the union β of the sets β_{wᵢ}
is linearly independent. Let W = span(β). Then W contains R(φ(T)). Apply
(b) of the lemma and adjoin additional T-cyclic bases β_{w_{k+1}}, β_{w_{k+2}}, …, β_{w_s}
to β, if necessary, where wᵢ is in N(φ(T)) for i > k, to obtain a linearly
independent set

β′ = β_{w₁} ∪ β_{w₂} ∪ ⋯ ∪ β_{w_k} ∪ ⋯ ∪ β_{w_s}

whose span W′ contains both W and N(φ(T)).

We show that W′ = V. Let U denote the restriction of φ(T) to W′, which
is φ(T)-invariant. By the way in which W′ was obtained from R(φ(T)), it
follows that R(U) = R(φ(T)) and N(U) = N(φ(T)). Therefore

dim(W′) = rank(U) + nullity(U)
        = rank(φ(T)) + nullity(φ(T))
        = dim(V).

Thus W′ = V, and β′ is a rational canonical basis for T. ∎
Corollary. K_φ has a basis consisting of the union of T-cyclic bases.
Proof. Apply Theorem 7.21 to the restriction of T to K_φ. ∎
We are now ready to study the general case.
Theorem 7.22. Every linear operator on a finite-dimensional vector space
has a rational canonical basis and, hence, a rational canonical form.
Proof. Let T be a linear operator on a finite-dimensional vector space V,
and let p(t) = (φ₁(t))^{m₁}(φ₂(t))^{m₂} ⋯ (φ_k(t))^{m_k} be the minimal polynomial
of T, where the φᵢ(t)'s are the distinct irreducible monic factors of p(t) and
mᵢ > 0 for all i. The proof is by mathematical induction on k. The case
k = 1 is proved in Theorem 7.21.
Suppose that the result is valid whenever the minimal polynomial contains
fewer than k distinct irreducible factors for some k > 1, and suppose that p(t)
contains k distinct factors. Let U be the restriction of T to the T-invariant
subspace W = R((φ_k(T))^{m_k}), and let q(t) be the minimal polynomial of U.
Then q(t) divides p(t) by Exercise 10 of Section 7.3. Furthermore, φ_k(t) does
not divide q(t). For otherwise, there would exist a nonzero vector x ∈ W such
that φ_k(U)(x) = 0 and a vector y ∈ V such that x = (φ_k(T))^{m_k}(y). It follows
that (φ_k(T))^{m_k+1}(y) = 0, and hence y ∈ K_{φ_k} and x = (φ_k(T))^{m_k}(y) =
0 by Theorem 7.18(e), a contradiction. Thus q(t) contains fewer than k
distinct irreducible divisors. So by the induction hypothesis, U has a rational
canonical basis β₁ consisting of a union of U-cyclic bases (and hence T-cyclic
bases) of vectors from some of the subspaces K_{φᵢ}, 1 ≤ i ≤ k − 1. By the
corollary to Theorem 7.21, K_{φ_k} has a basis β₂ consisting of a union of T-cyclic
bases. By Theorem 7.19, β₁ and β₂ are disjoint, and β = β₁ ∪ β₂ is
linearly independent. Let s denote the number of vectors in β. Then

s = dim(R((φ_k(T))^{m_k})) + dim(K_{φ_k})
  = rank((φ_k(T))^{m_k}) + nullity((φ_k(T))^{m_k})
  = n.

We conclude that β is a basis for V. Therefore β is a rational canonical basis,
and T has a rational canonical form. ∎

In our study of the rational canonical form, we relied on the minimal
polynomial. We are now able to relate the rational canonical form to the
characteristic polynomial.
Theorem 7.23. Let T be a linear operator on an n-dimensional vector
space V with characteristic polynomial

f(t) = (−1)ⁿ (φ₁(t))^{n₁} (φ₂(t))^{n₂} ⋯ (φ_k(t))^{n_k},

where the φᵢ(t)'s (1 ≤ i ≤ k) are distinct irreducible monic polynomials and
the nᵢ's are positive integers. Then the following statements are true.
(a) φ₁(t), φ₂(t), …, φ_k(t) are the irreducible monic factors of the minimal
    polynomial.
(b) For each i, dim(K_{φᵢ}) = dᵢnᵢ, where dᵢ is the degree of φᵢ(t).
(c) If β is a rational canonical basis for T, then βᵢ = β ∩ K_{φᵢ} is a basis for
    K_{φᵢ} for each i.
(d) If γᵢ is a basis for K_{φᵢ} for each i, then γ = γ₁ ∪ γ₂ ∪ ⋯ ∪ γ_k is a basis
    for V. In particular, if each γᵢ is a disjoint union of T-cyclic bases, then
    γ is a rational canonical basis for T.
Proof. (a) By Theorem 7.22, T has a rational canonical form C. By
Exercise 40 of Section 5.4, the characteristic polynomial of C, and hence of
T, is the product of the characteristic polynomials of the companion matrices
that compose C. Therefore each irreducible monic divisor φᵢ(t) of f(t) divides
the characteristic polynomial of at least one of the companion matrices, and
hence for some integer p, (φᵢ(t))^p is the T-annihilator of a nonzero vector of
V. We conclude that (φᵢ(t))^p, and so φᵢ(t), divides the minimal polynomial
of T. Conversely, if φ(t) is an irreducible monic polynomial that divides the
minimal polynomial of T, then φ(t) divides the characteristic polynomial of
T because the minimal polynomial divides the characteristic polynomial.
(b), (c), and (d) Let C = [T]_β, which is a rational canonical form of T.
Consider any i (1 ≤ i ≤ k). Since f(t) is the product of the characteristic
polynomials of the companion matrices that compose C, we may multiply
those characteristic polynomials that arise from the T-cyclic bases in βᵢ to
obtain the factor (φᵢ(t))^{nᵢ} of f(t). Since this polynomial has degree nᵢdᵢ, and
the union of these bases is a linearly independent subset βᵢ of K_{φᵢ}, we have

nᵢdᵢ ≤ dim(K_{φᵢ}).

Furthermore, n = ∑_{i=1}^{k} dᵢnᵢ, because this sum is equal to the degree of f(t).
Now let s denote the number of vectors in γ. By Theorem 7.19, γ is linearly
independent, and therefore

n = ∑_{i=1}^{k} dᵢnᵢ ≤ ∑_{i=1}^{k} dim(K_{φᵢ}) = s ≤ n.

Hence n = s, and dᵢnᵢ = dim(K_{φᵢ}) for all i. It follows that γ is a basis for V
and βᵢ is a basis for K_{φᵢ} for each i. ∎
Uniqueness of the Rational Canonical Form
Having shown that a rational canonical form exists, we are now in a po­
sition to ask about the extent to which it is unique. Certainly, the rational
canonical form of a linear operator T can be modified by permuting the T-
cyclic bases that constitute the corresponding rational canonical basis. This
has the effect of permuting the companion matrices that make up the rational
canonical form. As in the case of the Jordan canonical form, we show that
except for these permutations, the rational canonical form is unique, although
the rational canonical bases are not.
To simplify this task, we adopt the convention of ordering every rational
canonical basis so that all the T-cyclic bases associated with the same irre­
ducible monic divisor of the characteristic polynomial are grouped together.
Furthermore, within each such grouping, we arrange the T-cyclic bases in
decreasing order of size. Our task is to show that, subject to this order, the
rational canonical form of a linear operator is unique up to the arrangement
of the irreducible monic divisors.
As in the case of the Jordan canonical form, we introduce arrays of dots
from which we can reconstruct the rational canonical form. For the Jordan
canonical form, we devised a dot diagram for each eigenvalue of the given
operator. In the case of the rational canonical form, we define a dot diagram
for each irreducible monic divisor of the characteristic polynomial of the given
operator. A proof that the resulting dot diagrams are completely determined
by the operator is also a proof that the rational canonical form is unique.
In what follows, T is a linear operator on a finite-dimensional vector space
with rational canonical basis β; φ(t) is an irreducible monic divisor of the
characteristic polynomial of T; β_{v₁}, β_{v₂}, …, β_{v_k} are the T-cyclic bases of β that
are contained in K_φ; and d is the degree of φ(t). For each j, let (φ(t))^{pⱼ} be the
T-annihilator of vⱼ. This polynomial has degree dpⱼ; therefore, by Exercise 15
of Section 7.3, β_{vⱼ} contains dpⱼ vectors. Furthermore, p₁ ≥ p₂ ≥ ⋯ ≥ p_k
since the T-cyclic bases are arranged in decreasing order of size. We define
the dot diagram of φ(t) to be the array consisting of k columns of dots with
pⱼ dots in the jth column, arranged so that the jth column begins at the top
and terminates after pⱼ dots. For example, if k = 3, p₁ = 4, p₂ = 2, and
p₃ = 2, then the dot diagram is

• • •
• • •
•
•
Although each column of a dot diagram corresponds to a T-cyclic basis

β_{vⱼ} in K_φ, there are fewer dots in the column than there are vectors in the
basis.
Example 2
Recall the linear operator T of Example 1 with the rational canonical basis
ii and the rational canonical form C — [T]/j. Since there are two irreducible
monic divisors of the characteristic polynomial of T, 0i(f) = t — t + 3 and
02(f) = t + 1, there are two dot diagrams to consider. Because 0i(f) is
the T-annihilator of V\ and 0Vl is a basis for K0,. the dot diagram for 0i(f)
consists of a single dot. The other two T cyclic bases, 0V3 and 0Vl, lie in K02.
Since V3 has T-annihilator (02(f))2 and V7 has T-annihilator 02(f), in the dot
diagram of 02(f) we have p\ = 2 and p2 = 1. These diagrams are as follows:
Dot diagram for 0i(f) Dot diagram for <p2(t) •
In practice, we obtain the rational canonical form of a linear operator
from the information provided by dot diagrams. This is illustrated in the
next example.
Example 3
Let T be a linear operator on a finite-dimensional vector space over R, and
suppose that the irreducible monic divisors of the characteristic polynomial
of T are
φ₁(t) = t − 1,   φ₂(t) = t² + 2,   and   φ₃(t) = t² + t + 1.
Suppose, furthermore, that the dot diagrams associated with these divisors
are as follows:
Diagram for φ₁(t)     Diagram for φ₂(t)     Diagram for φ₃(t)
    • •                   • •                   •
    •
Since the dot diagram for φ₁(t) has two columns, it contributes two companion
matrices to the rational canonical form. The first column has two dots, and
therefore corresponds to the 2 × 2 companion matrix of (φ₁(t))² = (t − 1)².
The second column, with only one dot, corresponds to the 1 × 1 companion
matrix of φ₁(t) = t − 1. These two companion matrices are given by

C₁ = [ 0 −1 ]      and      C₂ = (1).
     [ 1  2 ]
The dot diagram for φ_2(t) = t² + 2 consists of two columns, each containing a
single dot; hence this diagram contributes two copies of the 2×2 companion
matrix for φ_2(t), namely,

C_3 = C_4 = ( 0  -2 ).
            ( 1   0 )

The dot diagram for φ_3(t) = t² + t + 1 consists of a single column with a
single dot, contributing the single 2×2 companion matrix

C_5 = ( 0  -1 ).
      ( 1  -1 )
Therefore the rational canonical form of T is the 9x9 matrix
C = ( C_1  O    O    O    O  )
    ( O    C_2  O    O    O  )
    ( O    O    C_3  O    O  )
    ( O    O    O    C_4  O  )
    ( O    O    O    O    C_5 )

  = ( 0 -1  0  0  0  0  0  0  0 )
    ( 1  2  0  0  0  0  0  0  0 )
    ( 0  0  1  0  0  0  0  0  0 )
    ( 0  0  0  0 -2  0  0  0  0 )
    ( 0  0  0  1  0  0  0  0  0 )
    ( 0  0  0  0  0  0 -2  0  0 )
    ( 0  0  0  0  0  1  0  0  0 )
    ( 0  0  0  0  0  0  0  0 -1 )
    ( 0  0  0  0  0  0  0  1 -1 ).   •
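The block structure above is mechanical enough to automate. The following is a minimal sketch (not from the text; it assumes NumPy and defines its own helper, companion) showing how the 9×9 matrix C of Example 3 can be assembled from the companion matrices of its elementary divisors.

    import numpy as np

    def companion(coeffs):
        # Companion matrix of the monic polynomial t^k + c[k-1] t^(k-1) + ... + c[0],
        # where coeffs = [c0, c1, ..., c_{k-1}] lists the coefficients, constant term first.
        k = len(coeffs)
        C = np.zeros((k, k))
        C[1:, :-1] = np.eye(k - 1)          # subdiagonal of 1's
        C[:, -1] = -np.asarray(coeffs)      # last column: negated coefficients
        return C

    # Elementary divisors of Example 3: (t-1)^2, t-1, t^2+2, t^2+2, t^2+t+1
    blocks = [companion([1, -2]),   # (t-1)^2 = t^2 - 2t + 1
              companion([-1]),      # t - 1
              companion([2, 0]),    # t^2 + 2
              companion([2, 0]),    # t^2 + 2
              companion([1, 1])]    # t^2 + t + 1

    n = sum(b.shape[0] for b in blocks)
    C = np.zeros((n, n))
    i = 0
    for b in blocks:
        k = b.shape[0]
        C[i:i+k, i:i+k] = b                 # place each companion block on the diagonal
        i += k
    print(C.astype(int))

Running this reproduces the matrix displayed above, one companion block for each elementary divisor.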
We return to the general problem of finding dot diagrams. As we did
before, we fix a linear operator T on a finite-dimensional vector space and an
irreducible monic divisor φ(t) of the characteristic polynomial of T. Let U
denote the restriction of the linear operator φ(T) to K_φ. By Theorem 7.18(d),
U^q = T_0 for some positive integer q. Consequently, by Exercise 12 of Sec-
tion 7.2, the characteristic polynomial of U is (−1)^m t^m, where m = dim(K_φ).
Therefore K_φ is the generalized eigenspace of U corresponding to λ = 0, and
U has a Jordan canonical form. The dot diagram associated with the Jordan
canonical form of U gives us a key to understanding the dot diagram of T
that is associated with 0(f). We now relate the two diagrams.
Let β be a rational canonical basis for T, and let β_{v_1}, β_{v_2}, ..., β_{v_k} be the T-
cyclic bases of β that are contained in K_φ. Consider one of these T-cyclic
bases β_{v_j}, and suppose again that the T-annihilator of v_j is (φ(t))^{p_j}. Then
β_{v_j} consists of dp_j vectors in β. For 0 ≤ i < d, let γ_i be the cycle of
generalized eigenvectors of U corresponding to λ = 0 with end vector T^i(v_j),
where T^0(v_j) = v_j. Then

γ_i = {(φ(T))^{p_j−1} T^i(v_j), (φ(T))^{p_j−2} T^i(v_j), ..., (φ(T)) T^i(v_j), T^i(v_j)}.

By Theorem 7.1 (p. 485), γ_i is a linearly independent subset of C_{v_j}. Now let

α_j = γ_0 ∪ γ_1 ∪ ⋯ ∪ γ_{d−1}.

Notice that α_j contains p_j d vectors.
Lemma 1. α_j is an ordered basis for C_{v_j}.
Proof. The key to this proof is Theorem 7.4 (p. 487). Since α_j is the union
of cycles of generalized eigenvectors of U corresponding to λ = 0, it suffices
to show that the set of initial vectors of these cycles

{(φ(T))^{p_j−1}(v_j), (φ(T))^{p_j−1}T(v_j), ..., (φ(T))^{p_j−1}T^{d−1}(v_j)}

is linearly independent. Consider any linear combination of these vectors

a_0(φ(T))^{p_j−1}(v_j) + a_1(φ(T))^{p_j−1}T(v_j) + ⋯ + a_{d−1}(φ(T))^{p_j−1}T^{d−1}(v_j),

where not all of the coefficients are zero. Let g(t) be the polynomial defined
by g(t) = a_0 + a_1 t + ⋯ + a_{d−1}t^{d−1}. Then g(t) is a nonzero polynomial of
degree less than d, and hence (φ(t))^{p_j−1}g(t) is a nonzero polynomial with
degree less than p_j d. Since (φ(t))^{p_j} is the T-annihilator of v_j, it follows
that (φ(T))^{p_j−1}g(T)(v_j) ≠ 0. Therefore the set of initial vectors is linearly
independent. So by Theorem 7.4, α_j is linearly independent, and the γ_i's are
disjoint. Consequently, α_j consists of p_j d linearly independent vectors in C_{v_j},
which has dimension p_j d. We conclude that α_j is a basis for C_{v_j}.
Thus we may replace β_{v_j} by α_j as a basis for C_{v_j}. We do this for each j
to obtain a subset α = α_1 ∪ α_2 ∪ ⋯ ∪ α_k of K_φ.
Lemma 2. α is a Jordan canonical basis for K_φ.
Proof. Since β_{v_1} ∪ β_{v_2} ∪ ⋯ ∪ β_{v_k} is a basis for K_φ, and since span(α_j) =
span(β_{v_j}) = C_{v_j}, Exercise 9 implies that α is a basis for K_φ. Because α is
a union of cycles of generalized eigenvectors of U, we conclude that α is a
Jordan canonical basis.
We are now in a position to relate the dot diagram of T corresponding to
φ(t) to the dot diagram of U, bearing in mind that in the first case we are
considering a rational canonical form and in the second case we are consider-
ing a Jordan canonical form. For convenience, we designate the first diagram
D_1 and the second diagram D_2. For each j, the presence of the T-cyclic
basis β_{v_j} results in a column of p_j dots in D_1. By Lemma 1, this basis is
replaced by the union α_j of d cycles of generalized eigenvectors of U, each of
length p_j, which becomes part of the Jordan canonical basis for U. In effect,
α_j determines d columns each containing p_j dots in D_2. So each column in
D_1 determines d columns in D_2 of the same length, and all columns in D_2 are
obtained in this way. Alternatively, each row in D_2 has d times as many dots
as the corresponding row in D_1. Since Theorem 7.10 (p. 500) gives us the
number of dots in any row of D_2, we may divide the appropriate expression
in this theorem by d to obtain the number of dots in the corresponding row
of D_1. Thus we have the following result.
Theorem 7.24. Let T be a linear operator on a finite-dimensional vector
space V, let φ(t) be an irreducible monic divisor of the characteristic poly-
nomial of T of degree d, and let r_i denote the number of dots in the ith row
of the dot diagram for φ(t) with respect to a rational canonical basis for T.
Then
(a) r_1 = (1/d)[dim(V) − rank(φ(T))];
(b) r_i = (1/d)[rank((φ(T))^{i−1}) − rank((φ(T))^i)] for i > 1.
Thus the dot diagrams associated with a rational canonical form of an op­
erator are completely determined by the operator. Since the rational canoni­
cal form is completely determined by its dot diagrams, we have the following
uniqueness condition.
Corollary. Under the conventions described earlier, the rational canonical
form of a linear operator is unique up to the arrangement of the irreducible
monic divisors of the characteristic polynomial.
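Theorem 7.24 reduces the computation of a dot diagram to rank computations. The following is a minimal sketch of that computation (an illustration only, assuming NumPy; phi_of_A stands for the matrix φ(A) of the chosen irreducible divisor, d for its degree, and n for dim(V)).

    import numpy as np

    def dot_diagram_rows(phi_of_A, d, n):
        # Returns [r_1, r_2, ...] for one irreducible divisor phi of degree d,
        # using Theorem 7.24: r_1 = (n - rank phi(A))/d and
        # r_i = (rank phi(A)^(i-1) - rank phi(A)^i)/d for i > 1.
        rows = []
        prev_rank = n                        # rank of phi(A)^0 = I
        power = np.eye(phi_of_A.shape[0])
        while True:
            power = power @ phi_of_A         # phi(A)^i
            r = np.linalg.matrix_rank(power)
            if (prev_rank - r) // d == 0:    # ranks have stabilized; no more rows
                return rows
            rows.append((prev_rank - r) // d)
            prev_rank = r

Examples 4, 5, and 6 below carry out exactly these rank computations by hand.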
Since the rational canonical form of a linear operator is unique, the poly­
nomials corresponding to the companion matrices that determine this form
are also unique. These polynomials, which are powers of the irreducible monic
divisors, are called the elementary divisors of the linear operator. Since a
companion matrix may occur more than once in a rational canonical form,
the same is true for the elementary divisors. We call the number of such
occurrences the multiplicity of the elementary divisor.
Conversely, the elementary divisors and their multiplicities determine the
companion matrices and, therefore, the rational canonical form of a linear
operator.
Example 4
Let
β = {e^x cos 2x, e^x sin 2x, xe^x cos 2x, xe^x sin 2x}

be viewed as a subset of F(R, R), the space of all real-valued functions defined
on R, and let V = span(β). Then V is a four-dimensional subspace of F(R, R),
and β is an ordered basis for V. Let D be the linear operator on V defined by
D(y) = y′, the derivative of y, and let A = [D]_β. Then

A = ( 1   2   1   0 )
    (-2   1   0   1 )
    ( 0   0   1   2 )
    ( 0   0  -2   1 ),

and the characteristic polynomial of D, and hence of A, is

f(t) = (t² − 2t + 5)².
Thus φ(t) = t² − 2t + 5 is the only irreducible monic divisor of f(t). Since φ(t)
has degree 2 and V is four-dimensional, the dot diagram for φ(t) contains only
two dots. Therefore the dot diagram is determined by r_1, the number of dots
in the first row. Because ranks are preserved under matrix representations,
we can use A in place of D in the formula given in Theorem 7.24. Now

φ(A) = ( 0   0   0   4 )
       ( 0   0  -4   0 )
       ( 0   0   0   0 )
       ( 0   0   0   0 ),

and so

r_1 = (1/2)[4 − rank(φ(A))] = (1/2)[4 − 2] = 1.

It follows that the second dot lies in the second row, and the dot diagram is
as follows:

•
•
Hence V is a D-cyclic space generated by a single function with D-annihilator
(φ(t))². Furthermore, its rational canonical form is given by the companion
matrix of (φ(t))² = t⁴ − 4t³ + 14t² − 20t + 25, which is

( 0   0   0  -25 )
( 1   0   0   20 )
( 0   1   0  -14 )
( 0   0   1    4 ).

Thus (φ(t))² is the only elementary divisor of D, and it has multiplicity 1. For
the cyclic generator, it suffices to find a function g in V for which φ(D)(g) ≠ 0.

Since φ(A)(e_3) ≠ 0, it follows that φ(D)(xe^x cos 2x) ≠ 0; therefore g(x) =
xe^x cos 2x can be chosen as the cyclic generator. Hence

β_g = {xe^x cos 2x, D(xe^x cos 2x), D²(xe^x cos 2x), D³(xe^x cos 2x)}

is a rational canonical basis for D. Notice that the function h defined by
h(x) = xe^x sin 2x can be chosen in place of g. This shows that the rational
canonical basis is not unique. •
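As a quick numerical check of Example 4 (a sketch, assuming NumPy; not part of the text), one can verify that φ(A) has rank 2, that r_1 = (1/2)(4 − 2) = 1, and that (t² − 2t + 5)² expands to the quartic used above.

    import numpy as np

    A = np.array([[ 1, 2, 1, 0],
                  [-2, 1, 0, 1],
                  [ 0, 0, 1, 2],
                  [ 0, 0,-2, 1]])
    phiA = A @ A - 2 * A + 5 * np.eye(4)            # phi(A) = A^2 - 2A + 5I
    print(np.linalg.matrix_rank(phiA))              # 2
    print((4 - np.linalg.matrix_rank(phiA)) // 2)   # r_1 = 1
    print(np.polymul([1, -2, 5], [1, -2, 5]))       # [ 1 -4 14 -20 25 ]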
It is convenient to refer to the rational canonical form and elementary
divisors of a matrix, which are defined in the obvious way.
Definitions. Let A ∈ M_{n×n}(F). The rational canonical form of
A is defined to be the rational canonical form of L_A. Likewise, for A, the
elementary divisors and their multiplicities are the same as those of L_A.
Let A be an n × n matrix, let C be a rational canonical form of A, and let
β be the appropriate rational canonical basis for L_A. Then C = [L_A]_β, and
therefore A is similar to C. In fact, if Q is the matrix whose columns are the
vectors of β in the same order, then Q^{−1}AQ = C.
Example 5
For the following real matrix A, we find the rational canonical form C of A
and a matrix Q such that Q^{−1}AQ = C.

A = ( 0   2   0  -6   2 )
    ( 1  -2   0   0   2 )
    ( 1   0   1  -3   2 )
    ( 1  -2   1  -1   2 )
    ( 1  -4   3  -3   4 )
The characteristic polynomial of A is f(t) = −(t² + 2)²(t − 2); therefore
φ_1(t) = t² + 2 and φ_2(t) = t − 2 are the distinct irreducible monic divisors of
f(t). By Theorem 7.23, dim(K_{φ_1}) = 4 and dim(K_{φ_2}) = 1. Since the degree
of φ_1(t) is 2, the total number of dots in the dot diagram of φ_1(t) is 4/2 = 2,
and the number of dots r_1 in the first row is given by

r_1 = (1/2)[dim(R⁵) − rank(φ_1(A))]
    = (1/2)[5 − rank(A² + 2I)]
    = (1/2)[5 − 1] = 2.

Thus the dot diagram of φ_1(t) is

• •
and each column contributes the companion matrix

( 0  -2 )
( 1   0 )

for φ_1(t) = t² + 2 to the rational canonical form C. Consequently φ_1(t) is an
elementary divisor with multiplicity 2. Since dim(K_{φ_2}) = 1, the dot diagram
of φ_2(t) = t − 2 consists of a single dot, which contributes the 1×1 matrix
(2). Hence φ_2(t) is an elementary divisor with multiplicity 1. Therefore the
rational canonical form C is

C = ( 0  -2   0   0   0 )
    ( 1   0   0   0   0 )
    ( 0   0   0  -2   0 )
    ( 0   0   1   0   0 )
    ( 0   0   0   0   2 ).
We can infer from the dot diagram of φ_1(t) that if β is a rational canonical
basis for L_A, then β ∩ K_{φ_1} is the union of two cyclic bases β_{v_1} and β_{v_2}, where
v_1 and v_2 each have annihilator φ_1(t). It follows that both v_1 and v_2 lie in
N(φ_1(L_A)). It can be shown that

{ (1, 0, 0, 0, 0), (0, 1, 0, 0, 0), (0, 0, 2, 1, 0), (0, 0, −1, 0, 1) }

is a basis for N(φ_1(L_A)). Setting v_1 = e_1, we see that

Av_1 = (0, 1, 1, 1, 1).
Next choose v_2 in K_{φ_1} = N(φ_1(L_A)) but not in the span of β_{v_1} = {v_1, Av_1}.
For example, v_2 = e_2. Then it can be seen that

Av_2 = (2, −2, 0, −2, −4),

and β_{v_1} ∪ β_{v_2} is a basis for K_{φ_1}.
Since the dot diagram of φ_2(t) = t − 2 consists of a single dot, any nonzero
vector in K_{φ_2} is an eigenvector of A corresponding to the eigenvalue λ = 2.
For example, choose

v_3 = (0, 1, 1, 1, 2).
By Theorem 7.23, β = {v_1, Av_1, v_2, Av_2, v_3} is a rational canonical basis for
L_A. So setting

Q = ( 1   0   0   2   0 )
    ( 0   1   1  -2   1 )
    ( 0   1   0   0   1 )
    ( 0   1   0  -2   1 )
    ( 0   1   0  -4   2 ),

we have Q^{−1}AQ = C.   •
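The arithmetic in Example 5 can be checked numerically. The following sketch (an illustration only, assuming NumPy) verifies that Q⁻¹AQ reproduces the rational canonical form C found above.

    import numpy as np

    A = np.array([[0,  2, 0, -6, 2],
                  [1, -2, 0,  0, 2],
                  [1,  0, 1, -3, 2],
                  [1, -2, 1, -1, 2],
                  [1, -4, 3, -3, 4]])
    Q = np.array([[1, 0, 0,  2, 0],
                  [0, 1, 1, -2, 1],
                  [0, 1, 0,  0, 1],
                  [0, 1, 0, -2, 1],
                  [0, 1, 0, -4, 2]])
    C = np.linalg.inv(Q) @ A @ Q
    print(np.round(C).astype(int))      # the 5x5 block-diagonal matrix C above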
Example 6
For the following matrix A, we find the rational canonical form C and a
matrix Q such that Q^{−1}AQ = C:

A = ( 2   1   0   0 )
    ( 0   2   1   0 )
    ( 0   0   2   0 )
    ( 0   0   0   2 ).
Since the characteristic polynomial of A is f(t) = (t − 2)⁴, the only irreducible
monic divisor of f(t) is φ(t) = t − 2, and so K_φ = R⁴. In this case, φ(t) has
degree 1; hence in applying Theorem 7.24 to compute the dot diagram for
φ(t), we obtain

r_1 = 4 − rank(φ(A)) = 4 − 2 = 2,
r_2 = rank(φ(A)) − rank((φ(A))²) = 2 − 1 = 1,

and

r_3 = rank((φ(A))²) − rank((φ(A))³) = 1 − 0 = 1,

where r_i is the number of dots in the ith row of the dot diagram. Since there
are dim(R⁴) = 4 dots in the diagram, we may terminate these computations
with r_3. Thus the dot diagram for A is

• •
•
•
Since (t − 2)³ has the companion matrix

( 0   0    8 )
( 1   0  -12 )
( 0   1    6 )

and (t − 2) has the companion matrix (2), the rational canonical form of A
is given by

C = ( 0   0    8   0 )
    ( 1   0  -12   0 )
    ( 0   1    6   0 )
    ( 0   0    0   2 ).

Next we find a rational canonical basis for L_A. The preceding dot diagram
indicates that there are two vectors v_1 and v_2 in R⁴ with annihilators (φ(t))³
and φ(t), respectively, and such that

β = β_{v_1} ∪ β_{v_2} = {v_1, Av_1, A²v_1, v_2}

is a rational canonical basis for L_A. Furthermore, v_1 ∉ N((L_A − 2I)²) and
v_2 ∈ N(L_A − 2I). It can easily be shown that

N(L_A − 2I) = span({e_1, e_4})   and   N((L_A − 2I)²) = span({e_1, e_2, e_4}).
The standard vector e_3 meets the criteria for v_1, so we set v_1 = e_3. It follows
that

Av_1 = (0, 1, 2, 0)   and   A²v_1 = (1, 4, 4, 0).

Next we choose a vector v_2 ∈ N(L_A − 2I) that is not in the span of β_{v_1}. Clearly,
v_2 = e_4 satisfies this condition. Thus

β = { (0, 0, 1, 0), (0, 1, 2, 0), (1, 4, 4, 0), (0, 0, 0, 1) }

is a rational canonical basis for L_A.
Finally, let Q be the matrix whose columns are the vectors of β in the
same order:

Q = ( 0   0   1   0 )
    ( 0   1   4   0 )
    ( 1   2   4   0 )
    ( 0   0   0   1 ).

Then C = Q^{−1}AQ.   •
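The three ranks used in Example 6 are easy to confirm numerically; the sketch below (an illustration only, assuming NumPy) prints the ranks of (A − 2I)^i for i = 1, 2, 3.

    import numpy as np

    A = np.array([[2, 1, 0, 0],
                  [0, 2, 1, 0],
                  [0, 0, 2, 0],
                  [0, 0, 0, 2]])
    N = A - 2 * np.eye(4)                            # phi(A) for phi(t) = t - 2
    for i in (1, 2, 3):
        print(i, np.linalg.matrix_rank(np.linalg.matrix_power(N, i)))
    # ranks 2, 1, 0, so r1 = 4 - 2 = 2, r2 = 2 - 1 = 1, r3 = 1 - 0 = 1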
Direct Sums*
The next theorem is a simple consequence of Theorem 7.23.
Theorem 7.25 (Primary Decomposition Theorem). Let T be a
linear operator on an n-dimensional vector space V with characteristic poly­
nomial
f(t) = (−1)^n (φ_1(t))^{n_1}(φ_2(t))^{n_2} ⋯ (φ_k(t))^{n_k},
where the φ_i(t)'s (1 ≤ i ≤ k) are distinct irreducible monic polynomials and
the n_i's are positive integers. Then the following statements are true.
(a) V = K_{φ_1} ⊕ K_{φ_2} ⊕ ⋯ ⊕ K_{φ_k}.
(b) If T_i (1 ≤ i ≤ k) is the restriction of T to K_{φ_i} and C_i is the rational
canonical form of T_i, then C_1 ⊕ C_2 ⊕ ⋯ ⊕ C_k is the rational canonical
form of T.
Proof. Exercise.
The next theorem is a simple consequence of Theorem 7.17.
Theorem 7.26. Let T be a linear operator on a finite-dimensional vector
space V. Then V is a direct sum of T-cyclic subspaces C_{v_i}, where each v_i lies
in K_φ for some irreducible monic divisor φ(t) of the characteristic polynomial
of T.
Proof. Exercise.
EXERCISES
1. Label the following statements as true or false.
(a) Every rational canonical basis for a linear operator T is the union
of T-cyclic bases.

(b) If a basis is the union of T-cyclic bases for a linear operator T,
then it is a rational canonical basis for T.
(c) There exist square matrices having no rational canonical form.
(d) A square matrix is similar to its rational canonical form.
(e) For any linear operator T on a finite-dimensional vector space, any
irreducible factor of the characteristic polynomial of T divides the
minimal polynomial of T.
(f) Let φ(t) be an irreducible monic divisor of the characteristic poly-
nomial of a linear operator T. The dots in the diagram used to
compute the rational canonical form of the restriction of T to K_φ
are in one-to-one correspondence with the vectors in a basis for
K_φ.
(g) If a matrix has a Jordan canonical form, then its Jordan canonical
form and rational canonical form are similar.
2. For each of the following matrices A ∈ M_{n×n}(F), find the rational
canonical form C of A and a matrix Q ∈ M_{n×n}(F) such that Q^{−1}AQ = C.
(a) F = R
(b) A = ( 0  -1 ),   F = R
        ( 1  -1 )
(c) F = R
(d) F = R
(e)
3. For each of the following linear operators T, find the elementary divisors,
the rational canonical form C, and a rational canonical basis β.
(a) T is the linear operator on P_3(R) defined by
T(f(x)) = f(0)x − f′(1).
(b) Let S = {sin x, cos x, x sin x, x cos x}, a subset of F(R, R), and let
V = span(S). Define T to be the linear operator on V such that
T(f) = f′.
(c) T is the linear operator on M_{2×2}(R) defined by

T(A) = ( 0   1 ) A.
        (-1   1 )
(d) Let S = {sin x sin y, sin x cos y, cos x sin y, cos x cos y}, a subset of
F(R × R, R), and let V = span(S). Define T to be the linear
operator on V such that
T(f)(x, y) = ∂f(x, y)/∂x + ∂f(x, y)/∂y.
4. Let T be a linear operator on a finite-dimensional vector space V with
minimal polynomial (φ(t))^m for some positive integer m.
(a) Prove that R(φ(T)) ⊆ N((φ(T))^{m−1}).
(b) Give an example to show that the subspaces in (a) need not be
equal.
(c) Prove that the minimal polynomial of the restriction of T to
R(φ(T)) equals (φ(t))^{m−1}.
5. Let T be a linear operator on a finite-dimensional vector space. Prove
that the rational canonical form of T is a diagonal matrix if and only if
T is diagonalizable.
6. Let T be a linear operator on a finite-dimensional vector space V with
characteristic polynomial f(t) = (−1)^n φ_1(t)φ_2(t), where φ_1(t) and φ_2(t)
are distinct irreducible monic polynomials and n = dim(V).
(a) Prove that there exist v_1, v_2 ∈ V such that v_1 has T-annihilator
φ_1(t), v_2 has T-annihilator φ_2(t), and β_{v_1} ∪ β_{v_2} is a basis for V.
(b) Prove that there is a vector v_3 ∈ V with T-annihilator φ_1(t)φ_2(t)
such that β_{v_3} is a basis for V.
(c) Describe the difference between the matrix representation of T
with respect to β_{v_1} ∪ β_{v_2} and the matrix representation of T with
respect to β_{v_3}.
Thus, to assure the uniqueness of the rational canonical form, we re-
quire that the generators of the T-cyclic bases that constitute a rational
canonical basis have T-annihilators equal to powers of irreducible monic
factors of the characteristic polynomial of T.
7. Let T be a linear operator on a finite-dimensional vector space with
minimal polynomial
p(t) = (φ_1(t))^{m_1}(φ_2(t))^{m_2} ⋯ (φ_k(t))^{m_k},
where the φ_i(t)'s are the distinct irreducible monic factors of p(t). Prove
that for each i, m_i is the number of entries in the first column of the
dot diagram for φ_i(t).

8. Let T be a linear operator on a finite-dimensional vector space V. Prove
that for any irreducible polynomial φ(t), if φ(T) is not one-to-one, then
φ(t) divides the characteristic polynomial of T. Hint: Apply Exercise 15
of Section 7.3.
9. Let V be a vector space and β_1, β_2, ..., β_k be disjoint subsets of V whose
union is a basis for V. Now suppose that γ_1, γ_2, ..., γ_k are linearly
independent subsets of V such that span(γ_i) = span(β_i) for all i. Prove
that γ_1 ∪ γ_2 ∪ ⋯ ∪ γ_k is also a basis for V.
10. Let T be a linear operator on a finite-dimensional vector space, and
suppose that φ(t) is an irreducible monic factor of the characteristic
polynomial of T. Prove that if φ(t) is the T-annihilator of vectors x and
y, then x ∈ C_y if and only if C_x = C_y.
Exercises 11 and 12 are concerned with direct sums.
11. Prove Theorem 7.25.
12. Prove Theorem 7.26.
INDEX OF DEFINITIONS FOR CHAPTER 7
Companion matrix 526
Cycle of generalized eigenvectors
488
Cyclic basis 525
Dot diagram for Jordan canonical
form 498
Dot diagram for rational canonical
form 535
Elementary divisor of a linear oper­
ator 539
Elementary divisor of a matrix 541
End vector of a cycle 488
Generalized eigenspace 484
Generalized eigenvector 484
Generator of a cyclic basis 525
Initial vector of a cycle 488
Jordan block 483
Jordan canonical basis 483
Jordan canonical form of a linear op­
erator 483
Jordan canonical form of a matrix
491
Length of a cycle 488
Minimal polynomial of a linear oper­
ator 516
Minimal polynomial of a matrix
517
Multiplicity of an elementary divisor
539
Rational canonical basis of a linear
operator 526
Rational canonical form for a linear
operator 526
Rational canonical form of a matrix
541

Appendices
APPENDIX A SETS
A set is a collection of objects, called elements of the set. If x is an element
of the set A, then we write x ∈ A; otherwise, we write x ∉ A. For example,
if Z is the set of integers, then 3 ∈ Z and 1/2 ∉ Z.
One set that appears frequently is the set of real numbers, which we denote
by R throughout this text.
Two sets A and B are called equal, written A = B,if they contain exactly
the same elements. Sets may be described in one of two ways:
1. By listing the elements of the set between set braces { }.
2. By describing the elements of the set in terms of some characteristic
property.
For example, the set consisting of the elements 1, 2, 3, and 4 can be
written as {1,2,3,4} or as
{x: x is a positive integer less than 5}.
Note that the order in which the elements of a set are listed is immaterial;
hence
{1,2,3,4} = {3,1,2,4} = {1,3,1,4,2}.
Example 1
Let A denote the set of real numbers between 1 and 2. Then A may be
written as
A = {x ∈ R: 1 < x < 2}.   •
A set B is called a subset of a set A, written B ⊆ A or A ⊇ B, if every
element of B is an element of A. For example, {1, 2, 6} ⊆ {2, 8, 7, 6, 1}. If
B ⊆ A and B ≠ A, then B is called a proper subset of A. Observe that
A = B if and only if A ⊆ B and B ⊆ A, a fact that is often used to prove
that two sets are equal.
The empty set, denoted by ∅, is the set containing no elements. The
empty set is a subset of every set.
Sets may be combined to form other sets in two basic ways. The union
of two sets A and B, denoted A U B, is the set of elements that are in A, or
B, or both; that is,
A ∪ B = {x: x ∈ A or x ∈ B}.
The intersection of two sets A and B, denoted A ∩ B, is the set of elements
that are in both A and B; that is,
A ∩ B = {x: x ∈ A and x ∈ B}.
Two sets are called disjoint if their intersection equals the empty set.
Example 2
Let A = {1, 3, 5} and B = {1, 5, 7, 8}. Then
A ∪ B = {1, 3, 5, 7, 8}   and   A ∩ B = {1, 5}.
Likewise, if X = {1, 2, 8} and Y = {3, 4, 5}, then
X ∪ Y = {1, 2, 3, 4, 5, 8}   and   X ∩ Y = ∅.
Thus X and Y are disjoint sets. •
The union and intersection of more than two sets can be defined analo-
gously. Specifically, if A_1, A_2, ..., A_n are sets, then the union and intersec-
tion of these sets are defined, respectively, by

⋃_{i=1}^{n} A_i = {x: x ∈ A_i for some i = 1, 2, ..., n}

and

⋂_{i=1}^{n} A_i = {x: x ∈ A_i for all i = 1, 2, ..., n}.

Similarly, if Λ is an index set and {A_α: α ∈ Λ} is a collection of sets, the
union and intersection of these sets are defined, respectively, by

⋃_{α∈Λ} A_α = {x: x ∈ A_α for some α ∈ Λ}

and

⋂_{α∈Λ} A_α = {x: x ∈ A_α for all α ∈ Λ}.
Example 3
Let Λ = {α ∈ R: α ≥ 1}, and let
A_α = {x ∈ R: −1/α ≤ x ≤ 1 + α}
for each α ∈ Λ. Then
⋃_{α∈Λ} A_α = {x ∈ R: x ≥ −1}   and   ⋂_{α∈Λ} A_α = {x ∈ R: 0 ≤ x ≤ 2}.   •

By a relation on a set A, we mean a rule for determining whether or not,
for any elements x and y in A, x stands in a given relationship to y. More
precisely, a relation on A is a set S of ordered pairs of elements of A such
that (x, y) G S if and only if x stands in the given relationship to y. On the
set of real numbers, for instance, "is equal to," "is less than," and "is greater
than or equal to" are familiar relations. If S is a relation on a set A, we often
write x ∼ y in place of (x, y) ∈ S.
A relation S on a set A is called an equivalence relation on A if the
following three conditions hold:
1. For each x ∈ A, x ∼ x (reflexivity).
2. If x ∼ y, then y ∼ x (symmetry).
3. If x ∼ y and y ∼ z, then x ∼ z (transitivity).
For example, if we define x ∼ y to mean that x − y is divisible by a fixed
integer n, then ∼ is an equivalence relation on the set of integers.
APPENDIX B FUNCTIONS
If A and B are sets, then a function f from A to B, written f: A → B, is
a rule that associates to each element x in A a unique element denoted f(x)
in B. The element f(x) is called the image of x (under f), and x is called
a preimage of f(x) (under f). If f: A → B, then A is called the domain
of f, B is called the codomain of f, and the set {f(x): x ∈ A} is called the
range of f. Note that the range of f is a subset of B. If S ⊆ A, we denote
by f(S) the set {f(x): x ∈ S} of all images of elements of S. Likewise, if
T ⊆ B, we denote by f^{−1}(T) the set {x ∈ A: f(x) ∈ T} of all preimages of
elements in T. Finally, two functions f: A → B and g: A → B are equal,
written f = g, if f(x) = g(x) for all x ∈ A.
Example 1
Suppose that A = [−10, 10]. Let f: A → R be the function that assigns
to each element x in A the element x² + 1 in R; that is, f is defined by
f(x) = x² + 1. Then A is the domain of f, R is the codomain of f, and [1, 101]
is the range of f. Since f(2) = 5, the image of 2 is 5, and 2 is a preimage
of 5. Notice that −2 is another preimage of 5. Moreover, if S = [1, 2] and
T = [82, 101], then f(S) = [2, 5] and f^{−1}(T) = [−10, −9] ∪ [9, 10].   •
As Example 1 shows, the preimage of an element in the range need not be
unique. Functions such that each element of the range has a unique preimage
are called one-to-one; that is, f: A → B is one-to-one if f(x) = f(y) implies
x = y or, equivalently, if x ≠ y implies f(x) ≠ f(y).
If f: A → B is a function with range B, that is, if f(A) = B, then f is
called onto. So f is onto if and only if the range of f equals the codomain
of f.

Let f: A → B be a function and S ⊆ A. Then a function f_S: S → B,
called the restriction of f to S, can be formed by defining f_S(x) = f(x) for
each x ∈ S.
The next example illustrates these concepts.
Example 2
Let f: [−1, 1] → [0, 1] be defined by f(x) = x². This function is onto, but
not one-to-one since f(−1) = f(1) = 1. Note that if S = [0, 1], then f_S is
both onto and one-to-one. Finally, if T = [1/2, 1], then f_T is one-to-one, but
not onto.   •
Let A, B, and C be sets and f: A → B and g: B → C be functions. By
following f with g, we obtain a function g∘f: A → C called the composite
of g and f. Thus (g∘f)(x) = g(f(x)) for all x ∈ A. For example, let
A = B = C = R, f(x) = sin x, and g(x) = x² + 3. Then (g∘f)(x) =
g(f(x)) = sin²x + 3, whereas (f∘g)(x) = f(g(x)) = sin(x² + 3). Hence
g∘f ≠ f∘g. Functional composition is associative, however; that is, if
h: C → D is another function, then h∘(g∘f) = (h∘g)∘f.
A function f: A → B is said to be invertible if there exists a function
g: B → A such that (f∘g)(y) = y for all y ∈ B and (g∘f)(x) = x for all
x ∈ A. If such a function g exists, then it is unique and is called the inverse
of f. We denote the inverse of f (when it exists) by f^{−1}. It can be shown
that f is invertible if and only if f is both one-to-one and onto.
Example 3
The function f: R → R defined by f(x) = 3x + 1 is one-to-one and onto;
hence f is invertible. The inverse of f is the function f^{−1}: R → R defined
by f^{−1}(x) = (x − 1)/3.   •
The following facts about invertible functions are easily proved.
1. If f: A → B is invertible, then f^{−1} is invertible, and (f^{−1})^{−1} = f.
2. If f: A → B and g: B → C are invertible, then g∘f is invertible, and
(g∘f)^{−1} = f^{−1}∘g^{−1}.
APPENDIX C FIELDS
The set of real numbers is an example of an algebraic structure called a
field. Basically, a field is a set in which four operations (called addition,
multiplication, subtraction, and division) can be defined so that, with the
exception of division by zero, the sum, product, difference, and quotient of
any two elements in the set is an element of the set. More precisely, a field is
defined as follows.

Definitions. A field F is a set on which two operations + and · (called
addition and multiplication, respectively) are defined so that, for each pair
of elements x, y in F, there are unique elements x + y and x·y in F for which
the following conditions hold for all elements a, b, c in F.
(F 1) a + b = b + a and a·b = b·a
(commutativity of addition and multiplication)
(F 2) (a + b) + c = a + (b + c) and (a·b)·c = a·(b·c)
(associativity of addition and multiplication)
(F 3) There exist distinct elements 0 and 1 in F such that
0 + a = a and 1·a = a
(existence of identity elements for addition and multiplication)
(F 4) For each element a in F and each nonzero element b in F, there exist
elements c and d in F such that
a + c = 0 and b·d = 1
(existence of inverses for addition and multiplication)
(F 5) a·(b + c) = a·b + a·c
(distributivity of multiplication over addition)
The elements x + y and x·y are called the sum and product, respectively,
of x and y. The elements 0 (read "zero") and 1 (read "one") mentioned in
(F 3) are called identity elements for addition and multiplication, respec-
tively, and the elements c and d referred to in (F 4) are called an additive
inverse for a and a multiplicative inverse for b, respectively.
Example 1
The set of real numbers R with the usual definitions of addition and multi­
plication is a field. •
Example 2
The set of rational numbers with the usual definitions of addition and multi­
plication is a field. •
Example 3
The set of all real numbers of the form a + b√2, where a and b are rational
numbers, with addition and multiplication as in R is a field.   •
Example 4
The field Z_2 consists of two elements 0 and 1 with the operations of addition
and multiplication defined by the equations
0 + 0 = 0,   0 + 1 = 1 + 0 = 1,   1 + 1 = 0,
0·0 = 0,   0·1 = 1·0 = 0,   and   1·1 = 1.   •
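The equations above are simply addition and multiplication modulo 2. The following tiny sketch (not from the text) prints them that way.

    elements = (0, 1)
    for a in elements:
        for b in elements:
            print(f"{a} + {b} = {(a + b) % 2}    {a} * {b} = {(a * b) % 2}")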

Example 5
Neither the set of positive integers nor the set of integers with the usual
definitions of addition and multiplication is a field, for in either case (F 4)
does not hold. •
The identity and inverse elements guaranteed by (F 3) and (F 4) are
unique; this is a consequence of the following theorem.
Theorem C.1 (Cancellation Laws). For arbitrary elements a, b, and
c in a field, the following statements are true.
(a) If a + b = c + b, then a = c.
(b) If a·b = c·b and b ≠ 0, then a = c.
Proof. (a) The proof of (a) is left as an exercise.
(b) If b ≠ 0, then (F 4) guarantees the existence of an element d in the
field such that b·d = 1. Multiply both sides of the equality a·b = c·b by d
to obtain (a·b)·d = (c·b)·d. Consider the left side of this equality: By (F 2)
and (F 3), we have
(a·b)·d = a·(b·d) = a·1 = a.
Similarly, the right side of the equality reduces to c. Thus a = c.
Corollary. The elements 0 and 1 mentioned in (F 3), and the elements c
and d mentioned in (F 4), are unique.
Proof. Suppose that 0′ ∈ F satisfies 0′ + a = a for each a ∈ F. Since
0 + a = a for each a ∈ F, we have 0′ + a = 0 + a for each a ∈ F. Thus 0′ = 0
by Theorem C.1.
The proofs of the remaining parts are similar. I
Thus each element b in a field has a unique additive inverse and, if b ≠ 0,
a unique multiplicative inverse. (It is shown in the corollary to Theorem C.2
that 0 has no multiplicative inverse.) The additive inverse and the multi-
plicative inverse of b are denoted by −b and b^{−1}, respectively. Note that
−(−b) = b and (b^{−1})^{−1} = b.
Subtraction and division can be defined in terms of addition and multi-
plication by using the additive and multiplicative inverses. Specifically, sub-
traction of b is defined to be addition of −b and division by b ≠ 0 is defined
to be multiplication by b^{−1}; that is,
a − b = a + (−b)   and   a/b = a·b^{−1}.
In particular, the symbol 1/b denotes b^{−1}. Division by zero is undefined, but,
with this exception, the sum, product, difference, and quotient of any two
elements of a field are defined.

Many of the familiar properties of multiplication of real numbers are true
in any field, as the next theorem shows.
Theorem C.2. Let a and b be arbitrary elements of a field. Then each
of the following statements is true.
(a) a·0 = 0.
(b) (−a)·b = a·(−b) = −(a·b).
(c) (−a)·(−b) = a·b.
Proof. (a) Since 0 + 0 = 0, (F 5) shows that
0 + a·0 = a·0 = a·(0 + 0) = a·0 + a·0.
Thus 0 = a·0 by Theorem C.1.
(b) By definition, −(a·b) is the unique element of F with the property
a·b + [−(a·b)] = 0. So in order to prove that (−a)·b = −(a·b), it suffices
to show that a·b + (−a)·b = 0. But −a is the element of F such that
a + (−a) = 0; so
a·b + (−a)·b = [a + (−a)]·b = 0·b = b·0 = 0
by (F 5) and (a). Thus (−a)·b = −(a·b). The proof that a·(−b) = −(a·b)
is similar.
(c) By applying (b) twice, we find that
(−a)·(−b) = −[a·(−b)] = −[−(a·b)] = a·b.
Corollary. The additive identity of a field has no multiplicative inverse.
In an arbitrary field F, it may happen that a sum 1 + 1 + ⋯ + 1 (p sum-
mands) equals 0 for some positive integer p. For example, in the field Z_2
(defined in Example 4), 1 + 1 = 0. In this case, the smallest positive integer p
for which a sum of p 1's equals 0 is called the characteristic of F; if no such
positive integer exists, then F is said to have characteristic zero. Thus Z_2
has characteristic two, and R has characteristic zero. Observe that if F is a
field of characteristic p ≠ 0, then x + x + ⋯ + x (p summands) equals 0 for all
x ∈ F. In a field having nonzero characteristic (especially characteristic two),
many unnatural problems arise. For this reason, some of the results about
vector spaces stated in this book require that the field over which the vector
space is defined be of characteristic zero (or, at least, of some characteristic
other than two).
Finally, note that in other sections of this book, the product of two ele-
ments a and b in a field is usually denoted ab rather than a·b.

APPENDIX D COMPLEX NUMBERS
For the purposes of algebra, the field of real numbers is not sufficient, for
there are polynomials of nonzero degree with real coefficients that have no
zeros in the field of real numbers (for example, x2 + 1). It is often desirable
to have a field in which any polynomial of nonzero degree with coefficients
from that field has a zero in that field. It is possible to "enlarge" the field of
real numbers to obtain such a field.
Definitions. A complex number is an expression of the form z = a + bi,
where a and b are real numbers called the real part and the imaginary part
of z, respectively.
The sum and product of two complex numbers z = a + bi and w = c + di
(where a, b, c, and d are real numbers) are defined, respectively, as follows:
z + w = (a + bi) + (c + di) = (a + c) + (b + d)i
and
zw = (a + bi)(c + di) = (ac − bd) + (bc + ad)i.
Example 1
The sum and product of z = 3 − 5i and w = 9 + 7i are, respectively,
z + w = (3 − 5i) + (9 + 7i) = (3 + 9) + [(−5) + 7]i = 12 + 2i
and
zw = (3 − 5i)(9 + 7i) = [3·9 − (−5)·7] + [(−5)·9 + 3·7]i = 62 − 24i.   •
Any real number c may be regarded as a complex number by identifying c
with the complex number c + Oi. Observe that this correspondence preserves
sums and products; that is,
(c + 0i) + (d + 0i) = (c + d) + 0i   and   (c + 0i)(d + 0i) = cd + 0i.
Any complex number of the form bi = 0 + bi, where b is a nonzero real
number, is called imaginary. The product of two imaginary numbers is real
since
(bi)(di) = (0 + bi)(0 + di) = (0 − bd) + (b·0 + 0·d)i = −bd.
In particular, for i = 0 + 1i, we have i·i = −1.
The observation that i2 = i • i = — 1 provides an easy way to remember the
definition of multiplication of complex numbers: simply multiply two complex
numbers as you would any two algebraic expressions, and replace i2 by —1.
Example 2 illustrates this technique.

Example 2
The product of −5 + 2i and 1 − 3i is
(−5 + 2i)(1 − 3i) = −5(1 − 3i) + 2i(1 − 3i)
                  = −5 + 15i + 2i − 6i²
                  = −5 + 15i + 2i − 6(−1)
                  = 1 + 17i.   •
The real number 0, regarded as a complex number, is an additive identity
element for the complex numbers since
(a + bi) + 0 = (a + bi) + (0 + 0i) = (a + 0) + (b + 0)i = a + bi.
Likewise the real number 1, regarded as a complex number, is a multiplicative
identity element for the set of complex numbers since
(a + bi)·1 = (a + bi)(1 + 0i) = (a·1 − b·0) + (b·1 + a·0)i = a + bi.
Every complex number a + bi has an additive inverse, namely (−a) + (−b)i.
But also each complex number except 0 has a multiplicative inverse. In fact,
(a + bi)^{−1} = a/(a² + b²) − [b/(a² + b²)]i.
In view of the preceding statements, the following result is not surprising.
Theorem D.1. The set of complex numbers with the operations of addi-
tion and multiplication previously defined is a field.
Proof. Exercise.
Definition. The (complex) conjugate of a complex number a + bi is
the complex number a − bi. We denote the conjugate of the complex number
z by z̄.
Example 3
The conjugates of −3 + 2i, 4 − 7i, and 6 are, respectively,
−3 − 2i,   4 + 7i,   and   6 = 6 + 0i, whose conjugate is 6 − 0i = 6.   •
The next theorem contains some important properties of the conjugate of
a complex number.
Theorem D.2. Let z and w be complex numbers. Then the following
statements are true.

(a) The conjugate of z̄ is z.
(b) (z + w)‾ = z̄ + w̄.
(c) (zw)‾ = z̄·w̄.
(d) (z/w)‾ = z̄/w̄ if w ≠ 0.
(e) z is a real number if and only if z̄ = z.
Proof. We leave the proofs of (a), (d), and (e) to the reader.
(b) Let z = a + bi and w = c + di, where a, b, c, d ∈ R. Then
(z + w)‾ = [(a + c) + (b + d)i]‾ = (a + c) − (b + d)i
         = (a − bi) + (c − di) = z̄ + w̄.
(c) For z and w, we have
(zw)‾ = [(a + bi)(c + di)]‾ = [(ac − bd) + (ad + bc)i]‾
      = (ac − bd) − (ad + bc)i = (a − bi)(c − di) = z̄·w̄.
For any complex number z = a + bi, z·z̄ is real and nonnegative, for
z·z̄ = (a + bi)(a − bi) = a² + b².
This fact can be used to define the absolute value of a complex number.
Definition. Let z = a + bi, where a, b ∈ R. The absolute value (or
modulus) of z is the real number √(a² + b²). We denote the absolute value
of z by |z|.
Observe that z·z̄ = |z|². The fact that the product of a complex number
and its conjugate is real provides an easy method for determining the quotient
of two complex numbers; for if c + di ≠ 0, then

(a + bi)/(c + di) = [(a + bi)/(c + di)]·[(c − di)/(c − di)]
                  = [(ac + bd) + (bc − ad)i]/(c² + d²)
                  = (ac + bd)/(c² + d²) + [(bc − ad)/(c² + d²)]i.

Example 4
To illustrate this procedure, we compute the quotient (1 + 4i)/(3 − 2i):

(1 + 4i)/(3 − 2i) = [(1 + 4i)/(3 − 2i)]·[(3 + 2i)/(3 + 2i)] = (−5 + 14i)/(9 + 4) = −5/13 + (14/13)i.   •
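The conjugate trick above is easy to confirm; the sketch below (an illustration only, using Python's built-in complex type) compares the hand computation with the language's own complex division.

    z, w = 1 + 4j, 3 - 2j
    manual = z * w.conjugate() / (w.real**2 + w.imag**2)   # multiply by the conjugate, divide by |w|^2
    print(manual)        # (-0.3846...+1.0769...j), i.e. -5/13 + (14/13)i
    print(z / w)         # the same value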
The absolute value of a complex number has the familiar properties of the
absolute value of a real number, as the following result shows.
Theorem D.3. Let z and w denote any two complex numbers. Then the
following statements are true.

(a) |zw| = |z|·|w|.
(b) |z/w| = |z|/|w| if w ≠ 0.
(c) |z + w| ≤ |z| + |w|.
(d) |z| − |w| ≤ |z + w|.
Proof. (a) By Theorem D.2, we have
|zw|² = (zw)(zw)‾ = (zw)(z̄·w̄) = (z·z̄)(w·w̄) = |z|²|w|²,
proving (a).
(b) For the proof of (b), apply (a) to the product (z/w)·w.
(c) For any complex number x = a + bi, where a, b ∈ R, observe that
x + x̄ = (a + bi) + (a − bi) = 2a ≤ 2√(a² + b²) = 2|x|.
Thus x + x̄ is real and satisfies the inequality x + x̄ ≤ 2|x|. Taking x = wz̄,
we have, by Theorem D.2 and (a),
wz̄ + w̄z ≤ 2|wz̄| = 2|w||z̄| = 2|z||w|.
Using Theorem D.2 again gives
|z + w|² = (z + w)(z + w)‾ = (z + w)(z̄ + w̄) = z·z̄ + wz̄ + zw̄ + w·w̄
         ≤ |z|² + 2|z||w| + |w|² = (|z| + |w|)².
By taking square roots, we obtain (c).
(d) From (a) and (c), it follows that
|z| = |(z + w) − w| ≤ |z + w| + |−w| = |z + w| + |w|.
So
|z| − |w| ≤ |z + w|,
proving (d).
It is interesting as well as useful that complex numbers have both a ge­
ometric and an algebraic representation. Suppose that z = a + bi, where a
and b are real numbers. We may represent z as a vector in the complex plane
(see Figure D.l(a)). Notice that, as in R2, there are two axes, the real axis
and the imaginary axis. The real and imaginary parts of z are the first and
second coordinates, and the absolute value of z gives the length of the vector
z. It is clear that addition of complex numbers may be represented as in R2
using the parallelogram law.

[Figure D.1: (a) the complex number z = a + bi drawn as a vector in the complex plane, with real and imaginary axes; (b) the unit vector e^{iθ} making an angle θ with the positive real axis]
In Section 2.7 (p. 132), we introduce Euler's formula. The special case
e^{iθ} = cos θ + i sin θ is of particular interest. Because of the geometry we have
introduced, we may represent the vector e^{iθ} as in Figure D.1(b); that is, e^{iθ}
is the unit vector that makes an angle θ with the positive real axis. From
this figure, we see that any nonzero complex number z may be depicted as
a multiple of a unit vector, namely, z = |z|e^{iθ}, where θ is the angle that the
vector z makes with the positive real axis. Thus multiplication, as well as
addition, has a simple geometric interpretation: If z = |z|e^{iθ} and w = |w|e^{iω}
are two nonzero complex numbers, then from the properties established in
Section 2.7 and Theorem D.3, we have

zw = |z|e^{iθ}·|w|e^{iω} = |z||w|e^{i(θ+ω)}.

So zw is the vector whose length is the product of the lengths of z and w,
and makes the angle θ + ω with the positive real axis.
Our motivation for enlarging the set of real numbers to the set of complex
numbers is to obtain a field such that every polynomial with nonzero degree
having coefficients in that field has a zero. Our next result guarantees that
the field of complex numbers has this property.
Theorem D.4 (The Fundamental Theorem of Algebra). Suppose
that p(z) = a_n z^n + a_{n−1}z^{n−1} + ⋯ + a_1 z + a_0 is a polynomial in P(C) of
degree n ≥ 1. Then p(z) has a zero.
The following proof is based on one in the book Principles of Mathematical
Analysis 3d., by Walter Rudin (McGraw-Hill Higher Education. New York,
1976).
Proof. We want to find z_0 in C such that p(z_0) = 0. Let m be the greatest
lower bound of {|p(z)|: z ∈ C}. For |z| = s > 0, we have
|p(z)| = |a_n z^n + a_{n−1}z^{n−1} + ⋯ + a_0|
       ≥ |a_n|s^n − |a_{n−1}|s^{n−1} − ⋯ − |a_0|
       = s^n[|a_n| − |a_{n−1}|s^{−1} − ⋯ − |a_0|s^{−n}].
Because the last expression approaches infinity as s approaches infinity, we
may choose a closed disk D about the origin such that |p(z)| > m + 1 if z is
not in D. It follows that m is the greatest lower bound of {|p(z)|: z ∈ D}.
Because D is closed and bounded and p(z) is continuous, there exists z_0 in
D such that |p(z_0)| = m. We want to show that m = 0. We argue by
contradiction.
Assume that m ≠ 0. Let q(z) = p(z + z_0)/p(z_0). Then q(z) is a polynomial of
degree n, q(0) = 1, and |q(z)| ≥ 1 for all z in C. So we may write
q(z) = 1 + b_k z^k + b_{k+1}z^{k+1} + ⋯ + b_n z^n,
where b_k ≠ 0. Because −|b_k|/b_k has modulus one, we may pick a real number θ
such that e^{ikθ} = −|b_k|/b_k, or e^{ikθ}b_k = −|b_k|. For any r > 0, we have
q(re^{iθ}) = 1 + b_k r^k e^{ikθ} + b_{k+1}r^{k+1}e^{i(k+1)θ} + ⋯ + b_n r^n e^{inθ}
          = 1 − |b_k|r^k + b_{k+1}r^{k+1}e^{i(k+1)θ} + ⋯ + b_n r^n e^{inθ}.
Choose r small enough so that 1 − |b_k|r^k > 0. Then
|q(re^{iθ})| ≤ 1 − |b_k|r^k + |b_{k+1}|r^{k+1} + ⋯ + |b_n|r^n
           = 1 − r^k[|b_k| − |b_{k+1}|r − ⋯ − |b_n|r^{n−k}].
Now choose r even smaller, if necessary, so that the expression within the
brackets is positive. We obtain that |q(re^{iθ})| < 1. But this is a contradiction.
I
The following important corollary is a consequence of Theorem D.4 and
the division algorithm for polynomials (Theorem E.l).
Corollary. If p(z) = a_n z^n + a_{n−1}z^{n−1} + ⋯ + a_1 z + a_0 is a polynomial
of degree n ≥ 1 with complex coefficients, then there exist complex numbers
c_1, c_2, ..., c_n (not necessarily distinct) such that
p(z) = a_n(z − c_1)(z − c_2) ⋯ (z − c_n).
Proof. Exercise. |
A field is called algebraically closed if it has the property that every
polynomial of positive degree with coefficients from that field factors as a
product of polynomials of degree 1. Thus the preceding corollary asserts that
the field of complex numbers is algebraically closed.

APPENDIX E POLYNOMIALS
In this appendix, we discuss some useful properties of the polynomials with
coefficients from a field. For the definition of a polynomial, refer to Sec­
tion 1.2. Throughout this appendix, we assume that all polynomials have
coefficients from a fixed field F.
Definition. A polynomial f(x) divides a polynomial g(x) if there exists
a polynomial q(x) such that g(x) = f(x)q(x).
Our first result shows that the familiar long division process for polyno­
mials with real coefficients is valid for polynomials with coefficients from an
arbitrary field.
Theorem E.1 (The Division Algorithm for Polynomials). Let
f(x) be a polynomial of degree n, and let g(x) be a polynomial of degree
m ≥ 0. Then there exist unique polynomials q(x) and r(x) such that
f(x) = q(x)g(x) + r(x),                    (1)
where the degree of r(x) is less than m.
Proof. We begin by establishing the existence of q(x) and r(x) that sat-
isfy (1).
CASE 1. If n < m, take q(x) = 0 and r(x) = f(x) to satisfy (1).
CASE 2. When 0 ≤ m ≤ n, we apply mathematical induction on n.
First suppose that n = 0. Then m = 0, and it follows that f(x) and g(x)
are nonzero constants. Hence we may take q(x) = f(x)/g(x) and r(x) = 0 to
satisfy (1).
Now suppose that the result is valid for all polynomials with degree less
than n for some fixed n > 0, and assume that f(x) has degree n. Suppose
that
f(x) = a_n x^n + a_{n−1}x^{n−1} + ⋯ + a_1 x + a_0
and
g(x) = b_m x^m + b_{m−1}x^{m−1} + ⋯ + b_1 x + b_0,
and let h(x) be the polynomial defined by
h(x) = f(x) − a_n b_m^{−1} x^{n−m} g(x).                    (2)
Then h(x) is a polynomial of degree less than n, and therefore we may ap-
ply the induction hypothesis or CASE 1 (whichever is relevant) to obtain
polynomials q_1(x) and r(x) such that r(x) has degree less than m and
h(x) = q_1(x)g(x) + r(x).                    (3)

Combining (2) and (3) and solving for f(x) gives us f(x) = q(x)g(x) + r(x)
with q(x) = a_n b_m^{−1}x^{n−m} + q_1(x). This establishes the existence of q(x)
and r(x) for any n by mathematical induction.
We now show the uniqueness of q(x) and r(x). Suppose that q_1(x), q_2(x),
r_1(x), and r_2(x) exist such that r_1(x) and r_2(x) each has degree less than m
and
f(x) = q_1(x)g(x) + r_1(x) = q_2(x)g(x) + r_2(x).
Then
[q_1(x) − q_2(x)]g(x) = r_2(x) − r_1(x).                    (4)
The right side of (4) is a polynomial of degree less than m. Since g(x) has
degree m, it must follow that q_1(x) − q_2(x) is the zero polynomial. Hence
q_1(x) = q_2(x); thus r_1(x) = r_2(x) by (4).
In the context of Theorem E.l, we call q(x) and r(x) the quotient and
remainder, respectively, for the division of f(x) by g(x). For example,
suppose that F is the field of complex numbers. Then the quotient and
remainder for the division of
f(x) = (3 + i)x5 - (1 - i)x4 + 6x3 + (-6 + 2i)x2 + (2 + i)x + 1
by
g(x) = (3 + i)x2 -2ix + 4
arc, respectively,
q(x) = x3 + ix2 - 2 and r(x) = (2 - 3i)x + 9.
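The inductive step in the proof of Theorem E.1 (subtract a_n b_m^{−1} x^{n−m} g(x) and repeat) translates directly into a short routine. The following is a minimal sketch (not the text's algorithm verbatim) of polynomial long division over a field, with coefficients listed from the highest degree down.

    def poly_divmod(f, g):
        # Returns (q, r) with f = q*g + r and deg r < deg g; f and g are coefficient
        # lists, highest-degree coefficient first, and g is not the zero polynomial.
        f = list(f)
        m = len(g)
        q = []
        while len(f) >= m and any(f):
            c = f[0] / g[0]                  # a_n * b_m^(-1)
            q.append(c)
            for i in range(m):
                f[i] -= c * g[i]             # subtract c * x^(n-m) * g(x)
            f.pop(0)                         # the leading coefficient is now zero
        return q or [0], f or [0]

    # Divide x^3 - 2x^2 + 3 by x - 1: quotient x^2 - x - 1, remainder 2
    print(poly_divmod([1, -2, 0, 3], [1, -1]))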
Corollary 1. Let f(x) be a polynomial of positive degree, and let a ∈ F.
Then f(a) = 0 if and only if x − a divides f(x).
Proof. Suppose that x − a divides f(x). Then there exists a polynomial
q(x) such that f(x) = (x − a)q(x). Thus f(a) = (a − a)q(a) = 0·q(a) = 0.
Conversely, suppose that f(a) = 0. By the division algorithm, there exist
polynomials q(x) and r(x) such that r(x) has degree less than one and
f(x) = q(x)(x − a) + r(x).
Substituting a for x in the equation above, we obtain r(a) = 0. Since r(x)
has degree less than 1, it must be the constant polynomial r(x) = 0. Thus
f(x) = q(x)(x − a).

For any polynomial f(x) with coefficients from a field F, an element a ∈ F
is called a zero of f(x) if f(a) = 0. With this terminology, the preceding
corollary states that a is a zero of f(x) if and only if x − a divides f(x).
Corollary 2. Any polynomial of degree n ≥ 1 has at most n distinct
zeros.
Proof. The proof is by mathematical induction on n. The result is obvious
if n = 1. Now suppose that the result is true for some positive integer n, and
let f(x) be a polynomial of degree n + 1. If f(x) has no zeros, then there is
nothing to prove. Otherwise, if a is a zero of f(x), then by Corollary 1 we
may write f(x) = (x − a)q(x) for some polynomial q(x). Note that q(x) must
be of degree n; therefore, by the induction hypothesis, q(x) can have at most
n distinct zeros. Since any zero of f(x) distinct from a is also a zero of q(x),
it follows that f(x) can have at most n + 1 distinct zeros.
Polynomials having no common divisors arise naturally in the study of
canonical forms. (See Chapter 7.)
Definition. Two nonzero polynomials are called relatively prime if no
polynomial of positive degree divides each of them.
For example, the polynomials with real coefficients f(x) = x²(x − 1) and
h(x) = (x − 1)(x − 2) are not relatively prime because x − 1 divides each of
them. On the other hand, consider f(x) and g(x) = (x − 2)(x − 3), which do
not appear to have common factors. Could other factorizations of f(x) and
g(x) reveal a hidden common factor? We will soon see (Theorem E.9) that
the preceding factors are the only ones. Thus f(x) and g(x) are relatively
prime because they have no common factors of positive degree.
Theorem E.2. If f_1(x) and f_2(x) are relatively prime polynomials, there
exist polynomials q_1(x) and q_2(x) such that
q_1(x)f_1(x) + q_2(x)f_2(x) = 1,
where 1 denotes the constant polynomial with value 1.
Proof. Without loss of generality, assume that the degree of f_1(x) is greater
than or equal to the degree of f_2(x). The proof is by mathematical induction
on the degree of f_2(x). If f_2(x) has degree 0, then f_2(x) is a nonzero constant
c. In this case, we can take q_1(x) = 0 and q_2(x) = 1/c.
Now suppose that the theorem holds whenever the polynomial of lesser
degree has degree less than n for some positive integer n, and suppose that
f_2(x) has degree n. By the division algorithm, there exist polynomials q(x)
and r(x) such that r(x) has degree less than n and
f_1(x) = q(x)f_2(x) + r(x).                    (5)
Since f_1(x) and f_2(x) are relatively prime, r(x) is not the zero polynomial. We
claim that f_2(x) and r(x) are relatively prime. Suppose otherwise; then there
exists a polynomial g(x) of positive degree that divides both f_2(x) and r(x).
Hence, by (5), g(x) also divides f_1(x), contradicting the fact that f_1(x) and
f_2(x) are relatively prime. Since r(x) has degree less than n, we may apply
the induction hypothesis to f_2(x) and r(x). Thus there exist polynomials
g_1(x) and g_2(x) such that
g_1(x)f_2(x) + g_2(x)r(x) = 1.                    (6)
Combining (5) and (6), we have
1 = g_1(x)f_2(x) + g_2(x)[f_1(x) − q(x)f_2(x)]
  = g_2(x)f_1(x) + [g_1(x) − g_2(x)q(x)]f_2(x).
Thus, setting q_1(x) = g_2(x) and q_2(x) = g_1(x) − g_2(x)q(x), we obtain the
desired result.
Example 1
Let f_1(x) = x³ − x² + 1 and f_2(x) = (x − 1)². As polynomials with real
coefficients, f_1(x) and f_2(x) are relatively prime. It is easily verified that the
polynomials q_1(x) = −x + 2 and q_2(x) = x² − x − 1 satisfy
q_1(x)f_1(x) + q_2(x)f_2(x) = 1,
and hence these polynomials satisfy the conclusion of Theorem E.2.   •
Throughout Chapters 5, 6, and 7, we consider linear operators that are
polynomials in a particular operator T and matrices that are polynomials in a
particular matrix A. For these operators and matrices, the following notation
is convenient.
Definitions. Let
f(x) = a_0 + a_1 x + ⋯ + a_n x^n
be a polynomial with coefficients from a field F. If T is a linear operator on
a vector space V over F, we define
f(T) = a_0 I + a_1 T + ⋯ + a_n T^n.
Similarly, if A is an n × n matrix with entries from F, we define
f(A) = a_0 I + a_1 A + ⋯ + a_n A^n.

Example 2
Let T be the linear operator on R² defined by T(a, b) = (2a + b, a − b), and
let f(x) = x² + 2x − 3. It is easily checked that T²(a, b) = (5a + b, a + 2b); so
f(T)(a, b) = (T² + 2T − 3I)(a, b)
           = (5a + b, a + 2b) + (4a + 2b, 2a − 2b) − 3(a, b)
           = (6a + 3b, 3a − 3b).
Similarly, if

A = ( 2   1 ),
    ( 1  -1 )

then

f(A) = A² + 2A − 3I = ( 5  1 )  +  2( 2   1 )  −  3( 1  0 )  =  ( 6   3 ).
                      ( 1  2 )      ( 1  -1 )      ( 0  1 )     ( 3  -3 )
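A two-line check of the matrix computation above (a sketch, assuming NumPy):

    import numpy as np

    A = np.array([[2, 1],
                  [1, -1]])
    fA = A @ A + 2 * A - 3 * np.eye(2)   # f(A) = A^2 + 2A - 3I
    print(fA.astype(int))                # [[ 6  3]
                                         #  [ 3 -3]]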
The next three results use this notation.
Theorem E.3. Let f(x) be a polynomial with coefficients from a field F,
and let T be a linear operator on a vector space V over F. Then the following
statements are true.
(a) f(T) is a linear operator on V.
(b) If β is a finite ordered basis for V and A = [T]_β, then [f(T)]_β = f(A).
Proof. Exercise.
Theorem E.4. Let T be a linear operator on a vector space V over a
field F, and let A be a square matrix with entries from F. Then, for any
polynomials f_1(x) and f_2(x) with coefficients from F,
(a) f_1(T)f_2(T) = f_2(T)f_1(T);
(b) f_1(A)f_2(A) = f_2(A)f_1(A).
Proof. Exercise.
Theorem E.5. Let T be a linear operator on a vector space V over a
field F, and let A be an n × n matrix with entries from F. If f_1(x) and
f_2(x) are relatively prime polynomials with coefficients from F, then there exist
polynomials q_1(x) and q_2(x) with coefficients from F such that
(a) q_1(T)f_1(T) + q_2(T)f_2(T) = I;
(b) q_1(A)f_1(A) + q_2(A)f_2(A) = I.
Proof. Exercise.

In Chapters 5 and 7, we are concerned with determining when a linear
operator T on a finite-dimensional vector space can be diagonalized and with
finding a simple (canonical) representation of T. Both of these problems are
affected by the factorization of a certain polynomial determined by T (the
characteristic polynomial of T). In this setting, particular types of polynomi-
als play an important role.
Definitions. A polynomial f(x) with coefficients from a field F is called
monic if its leading coefficient is 1. If f(x) has positive degree and cannot be
expressed as a product of polynomials with coefficients from F each having
positive degree, then f(x) is called irreducible.
Observe that whether a polynomial is irreducible depends on the field F
from which its coefficients come. For example, f(x) = x2 + 1 is irreducible
over the field of real numbers, but it is not irreducible over the field of complex
numbers since x2 + 1 = (x + i)(x — i).
Clearly any polynomial of degree 1 is irreducible. Moreover, for polyno­
mials with coefficients from an algebraically closed field, the polynomials of
degree 1 are the only irreducible polynomials.
The following facts are easily established.
Theorem E.6. Let φ(x) and f(x) be polynomials. If φ(x) is irreducible
and φ(x) does not divide f(x), then φ(x) and f(x) are relatively prime.
Proof. Exercise. i
Theorem E.7. Any two distinct irreducible monic polynomials are rela-
tively prime.
Proof. Exercise. I
Theorem E.8. Let f(x), g(x), and φ(x) be polynomials. If φ(x) is ir-
reducible and divides the product f(x)g(x), then φ(x) divides f(x) or φ(x)
divides g(x).
Proof. Suppose that φ(x) does not divide f(x). Then φ(x) and f(x) are
relatively prime by Theorem E.6, and so there exist polynomials q_1(x) and
q_2(x) such that
1 = q_1(x)φ(x) + q_2(x)f(x).
Multiplying both sides of this equation by g(x) yields
g(x) = q_1(x)φ(x)g(x) + q_2(x)f(x)g(x).                    (7)
Since φ(x) divides f(x)g(x), there is a polynomial h(x) such that f(x)g(x) =
φ(x)h(x). Thus (7) becomes
g(x) = q_1(x)φ(x)g(x) + q_2(x)φ(x)h(x) = φ(x)[q_1(x)g(x) + q_2(x)h(x)].
So φ(x) divides g(x).

Corollary. Let φ(x), φ_1(x), φ_2(x), ..., φ_n(x) be irreducible monic polyno-
mials. If φ(x) divides the product φ_1(x)φ_2(x) ⋯ φ_n(x), then φ(x) = φ_i(x)
for some i (i = 1, 2, ..., n).
Proof. We prove the corollary by mathematical induction on n. For n = 1,
the result is an immediate consequence of Theorem E.7. Suppose then that for
some n > 1, the corollary is true for any n − 1 irreducible monic polynomials,
and let φ_1(x), φ_2(x), ..., φ_n(x) be n irreducible monic polynomials. If φ(x) divides
φ_1(x)φ_2(x) ⋯ φ_n(x) = [φ_1(x)φ_2(x) ⋯ φ_{n−1}(x)]φ_n(x),
then φ(x) divides the product φ_1(x)φ_2(x) ⋯ φ_{n−1}(x) or φ(x) divides φ_n(x) by
Theorem E.8. In the first case, φ(x) = φ_i(x) for some i (i = 1, 2, ..., n − 1) by
the induction hypothesis; in the second case, φ(x) = φ_n(x) by Theorem E.7.
We are now able to establish the unique factorization theorem, which is
used throughout Chapters 5 and 7. This result states that every polynomial
of positive degree is uniquely expressible as a constant times a product of
irreducible monic polynomials.
Theorem E.9 (Unique Factorization Theorem for Polynomials).
For any polynomial f(x) of positive degree, there exist a unique constant
c; unique distinct irreducible monic polynomials φ_1(x), φ_2(x), ..., φ_k(x); and
unique positive integers n_1, n_2, ..., n_k such that
f(x) = c[φ_1(x)]^{n_1}[φ_2(x)]^{n_2} ⋯ [φ_k(x)]^{n_k}.
Proof. We begin by showing the existence of such a factorization using
mathematical induction on the degree of f(x). If f(x) is of degree 1, then
f(x) = ax + b for some constants a and b with a ≠ 0. Setting φ(x) = x + b/a,
we have f(x) = aφ(x). Since φ(x) is an irreducible monic polynomial, the
result is proved in this case. Now suppose that the conclusion is true for any
polynomial with positive degree less than some integer n > 1, and let f(x)
be a polynomial of degree n. Then
f(x) = a_n x^n + ⋯ + a_1 x + a_0
for some constants a_i with a_n ≠ 0. If f(x) is irreducible, then
f(x) = a_n (x^n + (a_{n−1}/a_n)x^{n−1} + ⋯ + (a_1/a_n)x + a_0/a_n)
is a representation of f(x) as a product of a_n and an irreducible monic poly-
nomial. If f(x) is not irreducible, then f(x) = g(x)h(x) for some polynomials
g(x) and h(x), each of positive degree less than n. The induction hypothesis

guarantees that both g(x) and h(x) factor as products of a constant and pow­
ers of distinct irreducible monic polynomials. Consequently f(x) — g(x)h(x)
also factors in this way. Thus, in either case, f(x) can be factored as a product
of a constant and powers of distinct irreducible monic polynomials.
It remains to establish the uniqueness of such a factorization. Suppose
that
f(x) = c[φ_1(x)]^{n_1}[φ_2(x)]^{n_2} ⋯ [φ_k(x)]^{n_k}
     = d[ψ_1(x)]^{m_1}[ψ_2(x)]^{m_2} ⋯ [ψ_r(x)]^{m_r},                    (8)
where c and d are constants, φ_i(x) and ψ_j(x) are irreducible monic polynomi-
als, and n_i and m_j are positive integers for i = 1, 2, ..., k and j = 1, 2, ..., r.
Clearly both c and d must be the leading coefficient of f(x); hence c = d.
Dividing by c, we find that (8) becomes
[φ_1(x)]^{n_1}[φ_2(x)]^{n_2} ⋯ [φ_k(x)]^{n_k} = [ψ_1(x)]^{m_1}[ψ_2(x)]^{m_2} ⋯ [ψ_r(x)]^{m_r}.     (9)
So φ_i(x) divides the right side of (9) for i = 1, 2, ..., k. Consequently, by the
corollary to Theorem E.8, each φ_i(x) equals some ψ_j(x), and similarly, each
ψ_j(x) equals some φ_i(x). We conclude that r = k and that, by renumbering
if necessary, φ_i(x) = ψ_i(x) for i = 1, 2, ..., k. Suppose that n_i ≠ m_i for some
i. Without loss of generality, we may suppose that i = 1 and n_1 > m_1. Then
by canceling [φ_1(x)]^{m_1} from both sides of (9), we obtain
[φ_1(x)]^{n_1−m_1}[φ_2(x)]^{n_2} ⋯ [φ_k(x)]^{n_k} = [φ_2(x)]^{m_2} ⋯ [φ_k(x)]^{m_k}.     (10)
Since n_1 − m_1 > 0, φ_1(x) divides the left side of (10) and hence divides the
right side also. So φ_1(x) = φ_i(x) for some i = 2, ..., k by the corollary to
Theorem E.8. But this contradicts the fact that φ_1(x), φ_2(x), ..., φ_k(x) are distinct.
Hence the factorizations of f(x) in (8) are the same.
It is often useful to regard a polynomial f(x) = a_n x^n + ... + a_1 x + a_0 with
coefficients from a field F as a function f: F → F. In this case, the value of
f at c ∈ F is f(c) = a_n c^n + ... + a_1 c + a_0. Unfortunately, for arbitrary fields
there is not a one-to-one correspondence between polynomials and polynomial
functions. For example, if f(x) = x^2 and g(x) = x are two polynomials over
the field Z2 (defined in Example 4 of Appendix C), then f(x) and g(x) have
different degrees and hence are not equal as polynomials. But f(a) = g(a) for
all a ∈ Z2, so that f and g are equal polynomial functions. Our final result
shows that this anomaly cannot occur over an infinite field.
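The Z2 phenomenon is easy to verify directly; the following minimal Python sketch (an
illustration added here, not taken from the text) evaluates f(x) = x^2 and g(x) = x at both
elements of Z2 and confirms that the two unequal polynomials induce the same function.

# Minimal sketch (illustration only): over Z2 = {0, 1}, the distinct polynomials
# f(x) = x^2 and g(x) = x determine the same polynomial function.

def evaluate(coeffs, a, p):
    # Evaluate a polynomial (constant term first) at a, with arithmetic mod p.
    value = 0
    for c in reversed(coeffs):
        value = (value * a + c) % p
    return value

f_coeffs = [0, 0, 1]   # x^2
g_coeffs = [0, 1]      # x

print(all(evaluate(f_coeffs, a, 2) == evaluate(g_coeffs, a, 2) for a in (0, 1)))
# True, even though f(x) and g(x) have different degrees as polynomials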
Theorem E.10. Let f(x) and g(x) be polynomials with coefficients from
an infinite field F. If f(a) = g(a) for all a ∈ F, then f(x) and g(x) are equal.
Proof. Suppose that f(a) = g(a) for all a ∈ F. Define h(x) = f(x) - g(x),
and suppose that h(x) is of degree n ≥ 1. It follows from Corollary 2 to
Theorem E.1 that h(x) can have at most n zeroes. But

    h(a) = f(a) - g(a) = 0

for every a ∈ F; since F is infinite, h(x) has more than n zeroes, contradicting
the assumption that h(x) has positive degree. Thus h(x) is a constant polynomial,
and since h(a) = 0 for each a ∈ F, it follows that h(x) is the zero polynomial.
Hence f(x) = g(x).
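The counting fact behind this proof, Corollary 2 to Theorem E.1, can also be checked
experimentally; the sketch below (an added illustration, not part of the text) confirms for
one polynomial over the infinite field Q that a nonzero polynomial of degree n has no more
than n zeroes.

# Minimal sketch (illustration only): h(x) = x^3 - x is a nonzero polynomial of
# degree 3, so it can have at most 3 zeroes; a brute-force search over a range
# of integer test points finds exactly -1, 0, and 1.

def h(x):
    return x**3 - x

zeros = [x for x in range(-100, 101) if h(x) == 0]
print(zeros)              # [-1, 0, 1]
assert len(zeros) <= 3    # consistent with Corollary 2 to Theorem E.1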

Answers to Selected Exercises
CHAPTER 1
SECTION 1.1
1. Only the pairs in (b) and (c) are parallel.
2. (a) x = (3, -2,4) + t(-8,9, -3) (c) x = (3, 7, 2) + t(0,0, -10)
3. (a) x = (2, -5, -1) + s(-2,9, 7) + t(-5,12,2)
(c) x = (-8,2,0) + s(9,1,0) + t(14, -7,0)
SECTION 1.2
1. (a) T (b) F (c) F (d) F (e) T (f) F
(g)F (h)F (i)T (j)T (k)T
3. M13 = 3, M21 = 4, and M22 = 5
6 3 2\ , , /8 20 -12N
4' (a) 1-4 3 9 (C) U 0 28
(e) 2x^4 + x^3 + 2x^2 - 2x + 10 (g) 10x^7 - 30x^4 + 40x^2 - 15x
13. No, (VS 4) fails.
14. Yes
15. No
17. No, (VS 5) fails.
22. 2^(mn)
SECTION 1.3
1. (a) F (b) F (c) T (d) F (e) T (f) F (
2,(a)(_2 _i)i the trace is-5 (c) ("* J fj
( l
(e)
-1
3
< 5y
(g) (5 6 7)
(g)F
8. (a) Yes (c) Yes (e) No
11. No, the set is not closed under addition.
15. Yes

SECTION 1.4
1. (a) T (b) F (c) T (d) F (e) T (f) F
2. (a) {r(1,1,0,0) + s(-3,0,-2,1) + (5,0,4,0) : r, s ∈ R}
(c) There are no solutions.
(e) {r(10,-3,1,0,0) + s(-3,2,0,1,0) + (-4,3,0,0,5) : r, s ∈ R}
3. (a) Yes (c) No (e) No
4. (a) Yes (c) Yes (e) No
5. (a) Yes (c) No (e) Yes (g) Yes
SECTION 1.5
1. (a) F (b) T (c) F (d) F (e) T (f) T
2. (a) linearly dependent (c) linearly independent (e) linearly dependent
(g) linearly dependent (i) linearly independent
7. {(1 0; 0 0), (0 0; 0 1)}
11. 2^n
SECTION 1.6
1. (a) F (b) T (c) F (d) F (e) T (f) F (g) F (h) T (i) F (j) T (k) T (l) T
2. (a) Yes (c) Yes (e) No
3. (a) No (c) Yes (e) No
4. No
5. No
7. {u\,U2,ua}
9. (oi,02,03,04) = aioi + (02 — Oi)«2 + (o.j — a2)«3 + (04 — a,i)u4
10. (a) -4x2 - x + 8 (c) -x3 + 2x2 + 4x - 5
13. {(1,1,1)}
15. n^2 - 1
17. n(n-1)/2
26. n
30. dim(W1) = 3, dim(W2) = 2, dim(W1 + W2) = 4, and dim(W1 ∩ W2) = 1
SECTION 1.7
1. (a) F (b) F (c)F (d)T (e)T (f)T
CHAPTER 2
SECTION 2.1
1. (a) T (b) F (c)F (d)T (e)F (f)F (g) T (h) F

2. The nullity is 1, and the rank is 2. T is not one-to-one but is onto.
4. The nullity is 4, and the rank is 2. T is neither one-to-one nor onto.
5. The nullity is 0, and the rank is 3. T is one-to-one but not onto.
10. T(2,3) = (5,11). T is one-to-one. 12. No.
SECTION 2.2
1. (a) T (b) T (c) F
(2 -1
2. (a) 3 4 (c) (2 1
\i 0/
/0 0 ••• 0 1
(f)
0 0 ••• 1 0
0 1 ••• 0 0
\1 0 ••• 0 0/
/-* -1
3. mj = 0 1 and [T];
V I o/
/l 0 0 0
5. (a)
0 0 10
0 10 0
(b)
\0 0 0 1/
/l 1 0 ••• ON
10.
0 1 1 ••• 0
0 0 1 ••• 0
0 0 0 -•- 1
\0 0 0 ••• 1/
SECTION 2.3
1. (a) F (b) T (c) F
(g) F (h) F (i) T
2. (a)A(2£ + 3C7)=(2° ^
,.v „•„ /23 19 0
/2 3 0
3. (a)[T]^= 0 3 6 , [U]J
\0 0 4/
4. (a)
{ l
-1
4
< 6y
(c) (5)
(d)T
-3)
(g)(l
(~
l=

(0 1
2 2
0 0
^0 0
(d)T
(J)T
i)
and
/I
= 0
V
(e)T
/ 0 '.
f)F
I 1
(d) -1 4 5
V i o i/
0 •-• 0 1
3 3
2 3
2 *-;
3 3/
o
2
0
(e)
2/
)
/ 1
-2
0
V v
(e) F (f) F
and A(BD)
CB = (27 7
1 0
0 1 , anc
-1 o)
-(•£)
9)
/2 6
[UT]2 =00
\2 0
6
4
-6

12. (a) No. (b) No.
SECTION 2.4
1. (a) F
(g)T
2. (a) No
3. (a) No
19. (a) [T]0 =
SECTION 2.5
(d)F
(d) No
(d) No
(e)T
(e) No
(f)F
(f) Yes
7. (a) T(x, y) = (1/(1 + m^2))((1 - m^2)x + 2my, 2mx + (m^2 - 1)y)
SECTION 2.6
1. (a) F (b) T (c) T (d) T (e) F (f) T (g) T (h) F
2. The functions in (a), (c), (e), and (f) are linear functionals.
3. (a) h(x,y,z) = x- \y, h(x,y,z) = \y, and f3ix,y,z) = -x + z
5. The basis for V is {p1(x), p2(x)}, where p1(x) = 2 - 2x and p2(x) = -5 + x.
7. (a) T^t(f) = g, where g(a + bx) = -3a - 4b
w vf.=(zi;) (o[T]s=(-; -f)

SECTION 2.7
1. (a) T (b) T (c) F (d) F (e) T (f) F (g) T
2. (a) F (b) F (c) T (d) T (e) F
3. (a) {e-*,te-*} (c) {e'Ste-*,**,*^} (e) {e~*,e* cos 2t,etsin2t}
4. (aHe'14-^2,^1-^/2} (c){l,e -4* „-2t
CHAPTER 3
SECTION 3.1
1. (a) T (b) F (c) T (d) F (e) T (f) F
(g) T (h) F (i) T
2. Adding —2 times column 1 to column 2 transforms A into B.
/0 0 1\ /l 0 0
3. (a) (0 1 0 (c) (0 1 0)
\1 0 0/ \2 0 1/
(e) F (f) T
-3-
5. (a) The rank is 2, and the inverse is I .
(c) The rank is 2, and so no inverse exists.
(e) The rank is 3, and the inverse is
(g) The rank is 4, and the inverse is
6. (a) T^{-1}(ax^2 + bx + c) = -ax^2 - (4a + b)x - (10a + 2b + c)
(c) T"1(a,6,c) = (la-+\c,\a-\c,-\ + + §c)
(e) T~1(a,b,c) = (\a-b+\c)x2 + (-\a+\c)x + b
(
1 0 0\ (1 0 0\ /l 0 0\ (1 2 0\/l 0
0 1 Oil 1 00 -2 OllO 1 Olio 1
i o l/lo o i/\o o 1/ \o o 1/ Vo -l
°
0
1/
(1 0 1
10 10
Vo o i

20. (a)
/ 1 3 0 0 0
-2 10 0 0
1 0 0 0 0
0-2000
V 0 10 0 0/
SECTION 3.3
1. (a) F (b) F (c) T (d) F
2. (a)
(e)F (f)F (g)T (h)F
(e)
3. (a)
(e)
(g)
(g)
(~3
1
1
V <v
( l
-1
V i)

>
.
+ t
-3'
t G R
1
^
(1
0
0
w
+ r
(~2
1
0
V oy1
+ .s
(3
0
1
w
+ t
(-l
0
0
^ 1/
: r, s, t G It
0
0
w
+ r
1
0
: r, .s, G /?
4. (b) (1) .4-1 =
6. T-1{(1,H)} = ^
2
9
2 1
3 9/
(2) G-
' 3N
-2,
+ t -1 : t G /?
V 0/
7. The systems in parts (b), (c), and (d) have solutions.
11. The farmer, tailor, and carpenter must have incomes in the proportions 4:3:4.
13. There must be 7.8 units of the first commodity and 9.5 units of the second.
SECTION 3.4
1. (a) F (b) T (c) T (d) T (e) F (f) T (g) T

2. (a) -3 (c) (e)
f
[
/4
0
1
w
(f
H
w
+ 8
/1
U
2
w
^
: r,s e R
t
(g) <
/-23
0
7
9
V <V
+ r
/A
1
0
0
\o)
+ s
/-23
0
6
9
I l)
: r,s £ R
1
(i)
s
( 2
0
0
-1
l °y
+ r
/0
2
1
0
VV
+ s
( l
-4
0
-2
V V
1
: r, s G .ff >
J
4. (a) ^
3
1
3
0
I w
+ t
( l
-1
1
\ z)
: te R
,
(c) There are no solutions.
/ 1 0 2 1 4
5. ( -1 -1 3 -2 -7
V 3 11 0 -9/
7. {«i,«2,«5}
11. (b) {(1,2,1,0,0), (2,1,0,0,0), (1,0,0,1,0), (-2,0,0,0,1)}
13. (b) {(1,0,1,1,1,0), (0,2,1,1,0,0), (1,1,1,0,0,0), (-3,-2,0,0,0,1)}
CHAPTER 4
SECTION 4.1
1. (a) F (b) T (c) F
2. (a) 30 (c) -8
3. (a) -10 + 15i (c) -24
4. (a) 19 (c) 14
(d) F (e) T
SECTION 4.2
1. (a) F (b) T
3. 42 5. -12
13. -8 15. 0
(c)T
7. -12
17. -49
(d)T
9. 22
19. -28
(e)F
11. -3
-i 2
(f) F (g) F (h) T
21. 95

SECTION 4.3
1. (a) F (b) T (c) F (d) T (e) F (f) T (g) F (h) F
3. (4, -3, 0) 5. (-20, -48, -8) 7. (0, -12, 16)
24. tn +ari-it'
26- <•> (-£:
(e)
H +tti< + ao
/10 0
(c) 0 -20
\ 0
-An
An
-3i 0 0
4 -1 + i 0
10+16i -5-3i 3 + 3i,
0 -8
(g)
18 28
-20 -21
-6N
37
48 14 -16,
SECTION 4.4
1. (a) T (b) T (c) T (d) F (e) F
(g) T (h) K (i) T (j) T (k) T
2. (a) 22 (c) 2 - 4i
3. (a) -12 (c) -12 (e) 22 (g) -3
4. (a) 0 (c) -49 (e) -28 - i (g) 95
SECTION 4.5
1. (a) F (b) T (c) T (d) F (e) F
3. No 5. Yes 7. Yes 9. No
(f)T
(f)T
CHAPTER 5
SECTION 5.1
1. (a) F (b) T (c) T
(g) F (h) T (i) T
2. (a) [Tk3=(_$ o)' "°
(e) [TJ
(d) F (e) F (f) F
(j) F (k) F
(-1 0 0N
(c) [T]3 =01 0
V 0 0 -1,
o
0
yes
no
-1 1 0
0 -1 1
0 0-10
0 0 0 -1)
3. (a) The eigenvalues are 4 and — 1, a basis of eigenvectors is
'2\ ( lM ^ (2 L\ , „ (
3y v-vr " vs -v an<1 D=v)
(c) The eigenvalues are 1 and —1, a basis of eigenvectors is
1 M „ / 1 1
, „ i . •, and D = ,
— 11 \-l —1/1 VI —i -l—i/ \Q -1

4. (a) A = 3,4
(b) A = -1,1,2
(f) A = 1,3
(h) A = -1,1,1,1 0 =
(i) A = 1,1, -1,-1 (3 =
(j)A = -l,l,5 (3 =
26. 4
SECTION 5.2
1. (a) F (b) F (c) F
(g) T (h) T (i) F
/? = {(3,5),(1,2)}
/? = {(1,2,0),(1,-1,-1),(2,0,-1)}
(3 = {-2 + x, -4 + x2, -8 + x3, x}
-1 0\ /0 l\ /l 0
o 1/ ' \o oy ' [o i
1 0\ /0 l\ (-1 0
i o) ' vo iy ' v i o
0 A (1 0\ /0 1
-l oj'lo -l/'ll 0
0 0
1 0
0 -1
0 1
1 0
0 1
(f)F (d) T (e) T
2. (a) Not diagonalizable (c) Q = I J
/ i i i
(e) Not diagonalizable (g) Q = 2 -1 0
V-l 0 -l)
3. (a) Not diagonalizable (c) Not diagonalizable
(d) (3 = {x - x2,1 - x - x2, x + x2} (e) (3 = {(1,1), (1, -1)}
7 4«_l/5n + 2(-ir 2(5")-2(-l)"
3\5n-(-l)n 2(5)n + (-l)n
14. (b)x(t) = c1e3^-^+c2e-2^_11)
(c)x(t)=e'
CI0+C2[
+ c3e
2til
SECTION 5.3
1. (a) T (b) T (c) F (d) F (e) T
(g)T (h)F (i)F (j)T
2. (a)
0 0
0 0
(c)
/-I 0 -1
(g) -4 1 -2
\ 2 0 2/
6. One month after arrival, 25% of the patients have recovered, 20% are ambu­
latory, 41% are bedridden, and 14% have died. Eventually || recover and |£
die.

7. f.
8. Only the matrices in (a) and (b) are regular transition matrices.
9. (a)
(e)
10. (a)
(c)
(e)
l
3
l
2
3
1
3
)

0
0
V
(c) No limit exists.
(g)
0
0
l
2
'0.225'
0.441
^0.334,
'0.372^
0.225
v0.403,
'0.329N
0.334
,0.337,
after two stages and
after two stages and
'0.50N
0.20
.0.30,
eventually
eventually
after two stages and l eventually
12. yjj new, JQ once-used, and j§ twice-used
13. In 1995, 24% will own large cars, 34% will own intermediate-sized cars, and
42% will own small cars; the corresponding eventual percentages are 10%, 30%,
and 60%.
20. e° = I and e7 = el.
SECTION 5.4
1. (a) F (b) T (c) F (d) F (e) T (f) T
2. The subspaces in (a), (c), and (d) are T-invariant.
A\ /i\ / i
(g)T
6. (a) (c)
0 0-1
0 ' 1 ' 2
iw W V 2/
9. (a) -t(t^2 - 3t + 3) (c) 1 - t
10. (a) t(t - 1)(t^2 - 3t + 3) (c) (t - 1)^3(t
'2 -2 -4
1 3
1)
1
18. (c) A'1 = ± I 0
z \0 0 -2

31. (a) t^2 - 6t + 6 (c) -(t + 1)(t^2 - 6t + 6)
CHAPTER 6
SECTION 6.1
1. (a) T (b) T (c) F (d) F (e) F (f) F (g) F
2. (x,y) = 8 + 5i, \x\ = V7, \y\ = vT4, and \x + y\ = >/&.
(h)T
e'-l ll + 3e^
3- (/,o) = 1,||/|| = ^, 11011 = ^^-^, and||/ + o|
16. (b) No
SECTION 6.2
1. (a) F (b) T (c) T (d) F (e) T (f) F (g) T
2. For each part the orthonormal basis and the Fourier coefficients are given.
(b) {f (1,1,1), f (-2,1,1), f (0, -1,1)}; 2#, -f, f.
(c){l,2V3(x-i),6V5(x2-x+i)}; f, ^, 0.
(e) {±(2,-1,-2,4), ^(-4,2,-3,1), ^(-3,4,9,7)}; 10, 3^, vTo5
«fi(-! O'ifcO -8-iJlG =»)' iM~^
(i){yfSint,^cost,^T(l-lsint),v/^'(t+Jcost-f)};
Ji(2n + 2), -4yjl, y/^(l + n), y^
(k) {^(-4,3-2i,i,l-4i),^(3-i,-5i,-2 + 4i,2 + i),
-^(-17 - i, -9 + 8i, -18 + 16i, -9 + 8i)};
y/4l(-l - i), V60i-1 + 2i), V/Il60(l + i)
(m) {^I (a-V i
1 -4% 11 — 9i
-5-118i -7-26i
^39063 V ~uu _58
; v/18(2 + i), v/246(-l-i), 0
4. 5-L=span({(i,-i(l+i),l)})
5. SQ is the plane through the origin that is perpendicular to xo; S1' is the line
through the origin that is perpendicular to the plane containing xi and x2.
29
19- w h ( *
(b) * I"
M- « 75


SECTION 6.3
1. (a) T (b) F (c) F (d) T (e) F (f) T (g) T
2. (a) y = (1, -2,4) (c) y = 210x2 - 204x + 33
3. (a) T* (x) = (11, -12) (c) r (/(*)) = 12 + 6t
14. T*(x) = (x,z)y
20. (a) The linear function is y = -2t + 5/2 with E = 1, and the quadratic
function is y = t^2/3 - 4t/3 + 2 with E = 0.
(b) The linear function is y = 1.25t + 0.55 with E = 0.3, and the quadratic
function is y = t^2/56 + 15t/14 + 239/280 with E ≈ 0.22857.
21. The spring constant is approximately 2.1.
22. (a) x = f, y = f, z « \ (d) x = £, y = i, z = \, w = -^
SECTION 6.4
1. (a) T (b) F (c) F (d) T (e) T (f) T (g) F (h) T
2. (a) T is self-adjoint. An orthonormal basis of eigenvectors is
{(1/√5)(1, -2), (1/√5)(2, 1)}, with corresponding eigenvalues 6 and 1.
(c) T is normal, but not self-adjoint. An orthonormal basis of eigenvectors
is
< -(1 + i, V2), -(1 + i, —V2) > with corresponding eigenvalues
2 + l+iand2_i+!.
V2 V2
(e) T is self-adjoint. An orthonormal basis of eigenvectors is
V2V1 oJ'^Vo V'v^V1 0/'V2
with corresponding eigenvalues 1, 1, —1, 1.
SECTION 6.5
(d)T
-1 0
0 1
l. (a) T
(g)F
(b)F
(h)F
(c)F
0)F
(e) F (f) T
••«'-K-i
and D =
3 0N
0 -1
(d)P = .1
73
1
V6
v^3
1 1
V6 v^3
V 0 -is j-J
(-2 0 0N
and D= I 0 -2 0
0 0 4,
4. T2 is normal for all z G C, T2 is self-adjoint if and only if z G .ft, and T2 is
unitary if and only if |.z| = 1.
5. Only the pair of matrices in (d) are unitarily equivalent.

25. 20-0)
26. (a)tf-f (b)V+!
27. (a)x=-^x' + -^2/' and y = -L*' - ±y'
The new quadratic form is 3(x') — (y') .
(c) x = —T=X H—7=2/ and y = —-=x H—p=y
V ' vT3 vT3 \/l3 \/l3
The new quadratic form is 5(x') — 8(y') .
29. (c) Q =
/ l
1 _ 1
75 73
1
V3
6
6
and R =
/v/2 V2 2v^
0 \/3 ^
V 0 0 ^y 0 4- ^,
\ U 73 3 /
(e) xi = 3, X2 = -5, X3 = 4
SECTION 6.6
1. (a) F (b) T (c) T (d) F (e) F
2. ForW = span({(l,2)}), Fh = (\ \ )•
3. (2) (a) Ti(o,6) = |(a + 6,o + 6) andT2(a,6) = ±(a-o,-a + &)
(d) Ti(a,6,c) = i(2a-6-c,-a + 26-c,-a-6 + 2c) and
T2(a, b,c) = |(a + 6 + c,a + b + c,a + b + c)
SECTION 6.7
1. (a) F (b) F (c) T (d) T (e) F (f) F (g) T
2-(a)ui=G)'u2=(i)' ui=^(ij'M2=^(_j)'u3=^
<7i = V3, <?2 = V2
1 \ 1 • ! !
(c) f 1 = —= sin x, i>2 = —7= cos x, U3 = —==.
V71" v71" V 27T
tii =
cos x + 2 sin x
-, U2 =
2 cos x — sin x
\/57r \/57r
O"! = V5, (72 = VO) 0"3 = 2
-, u3 =
v^F'
3. (a)
h
/ 1 1 _=
'73 75 76
1 _ 1 1
73 72 76
V-73 ° TJ
VS 0^
0 0
. 0 0,
1 1
75 75
1 i_
V2 V2

(c)
l
v/lll
I
vTT)
V ~4U
1
V'2
I
V2
-2* o -
i
0
0
1
73
_u
2
vTo
2
yio
1
v2
y/2
(e)
2
1-t
2
i±i
2
-I 4-i
2
>/« 0
0 0
2
^6
l + i
4. (a) WP =
1 J_\ / s/&+s/2
V2 75 \ / 2
1 1 J 1 -V§+^2
75 N/2/ \ 2
1 -i
2
-N/S+V/2
2
2
5. (a) Tf(x,y,2) =
x + U + z y - z
(c) T^a + 6 sin a: + ccosx) = T"
a
6sinx + ccosx) =
a (26 + c) sin re + (— 6 + 2c) cos x
2+" "I"
*«SG
I -i
I -l
o>!"
-2 3 1
3 -2 1 wSCr1
1 +i
i
7. (a) Zi = N(T)1 =R2 and Z2 = R(T) = span{(l, 1,1),(0,1,-1)}
(c) Zx = N(T)l = V and Z2 = R(T) - V
8. (a) No solution
(e) T (f) F
2 VI
SECTION 6.8
1. (a) F (b) K (c) T (d) F
(g)F (h)F (i)T (j)F
4. (a) Yes (b) No (c) No (d) Yes
5. (a)
17. (a) and (b)
18. Same as Exercise 17(c)
22. (a) Q= (i ""?! i.i
f\ 0 0 l
0 0 0 0
0 0 0 0
\1 0 0 1/
(c)
(c)
(e) Yes (f) No
/ 0 0 0 ON
10-40
0 0 0 0
\-2 0 -8 0/
0
(b) Q =
1 -*
and D =
/0 0 1 N
(c) Q = I 0 1 -0.25
\1 0 2 ;
and
-1 0 0 N
0 4 0
0 0 6.75,
/1 >
V2
0
UJ
M
1
w
»
V2
0
V~3j/

SECTION 6.9
/
7. (P.)"1 =
vT^
VvT^
0 0
1 0
0 1
0 0
V
vT
s/T^v2/
SECTION 6.10
1. (a) F (b) F (c) T (d) F (e) F
2. (a) \/l8 (c) approximately 2.34
4. (a) ||A|| ≈ 84.74, ||A^{-1}|| ≈ 17.01, and cond(A) ≈ 1441
(b) ||x - A~lb\ < ||A-11| • \Ax - 6|| a 0.17 and
^-^l6"<cond(A)"6-f'U^l
IA-1?
5. 0.001 < "X„ „*" < 10
6. ft -2 = -, ||P|| = 2, and cond(P) = 2.
SECTION 6.11
1. (a) F (b) T (c) T (d) F (e) T
(g)F (h)F (i)T (j)F
(f)F
3- (b) {t ( y 1 : / C H
7. (c) There are six possibilities:
(1) Any line through the origin if 0 = ip = 0
(2) < t I 0 I : t e ft > if 0 = 0 and tp = rr
t G ft \ if 0 # 0
(
/cos ^ + F
t I -sirn/-
(4) i t I cos 0 - 1
t G ft > if0 = 7T and tp ^ ir
t G ft > if^ = 7r and 0 ^ n
(5) < t 1 : t G ft > if 0 = ^ = 7T

{
/sin 0(cos i\> + 1)N
t I — sin0sint/>
\sint/>(cos0 + 1)
t e ft otherwise
CHAPTER 7
SECTION 7.1
1. (a) T (b) F
2. (a) For A = 2,
(c) For A = -1,
(2 1 0N
3. (a) For A = 2, {2,-2x,x2} J = I 0 2 1
\0 0 2,
(f) F (g) T (h) T
J =
(c) For A = 1,
SECTION 7.2
1. (a) T (b) T
/Ax O O
2. J= I O A2 O
\0 O A:i
o o)
n
(o o) C0)
J =
A
0
0
V"
1 0
1 0
0 1
0 0
0^
0
1
(c)F (d)T (e)T (f)F (g) F
(2 1 0 0 0 0
0 2 10 0 0
0 0 2 0 0 0
0 0 0 2 10
0 0 0 0 2 0
\0 0 0 0 0 2/
where A\ =
A2 =
/4 1 o ON
0 4 10
0 0 4 0
\0 0 0 4/
and A3 =
-3 0
0 -3
3. (a) -(t-2)5(t-3)' (b)
A, =2 Ao = 3
(h)T
(c) A2 = 3 (d) pi = 3 and p2 = 1
(e) (i) rank(Ui) = 3 and rank(U2) = 0
(ii) rank(U?) = 1 and rank(Ui) = 0
(iii) nullity(Ui) = 2 and nullity(U2) = 2
(iv) nullity(U?) = 4 and nullity(Ui) = 2

/I 0 ON
4. (a) J = I 0 2 1
\0 0 2/
/o i o ON
0 0 0 0
and Q =
(d) J =
5. (a) J =
(c) J =
(d)J =
0 =
0 0 2 0
\0 0 0 2/
(1 1 0 ON
0 110
0 0 10
\0 0 0 2/
/2 1 0 0
0 2 0 0
0 0 2 1
\Q 0 0 2/
/2 1 0 ON
0 2 10
0 0 2 0
\0 0 0 4/
1 oWo 1
o oj'li o
1 l lN
2 12
1 -1 o)
(1 0
l -1
1 -2
\i o
1
0
0
1
-IN
l
l
0/
and Q =
and (3={2et,2tet,t2et,e2t}
and /?= {6x,x3,2,x2}
and
ci + c2t) j 0 j + c2 1
(c! + c2t + c3r) 0 + (c2 + 2c3t) + 2c3
SECTION 7.3
1. (a) F (b) T (c) F (d) F (e) T (f) F
(g) F (h) T (i) T
2. (a) (t - 1)(t - 3) (c) (t - 1)^2(t - 2) (d) (t - 2)^2
3. (a) t^2 - 2 (c) (t - 2)^2 (d) (t - 1)(t + 1)
4. For (2), (a); for (3), (a) and (d)
5. The operators are To, I, and all operators having both 0 and 1 as eigenvalues.
SECTION 7.4
1. (a) T (b) F
2. (a)
(e) T (f) F (g) T
(c)
±(-l + n/3)
0 \i-l-iV3)
(e)
/o
1
0
\o
-2
0
0
0
0
0
0
1
ON
0
-3
0/

3. (a) t2 + 1 and t2 C =
(c) t2 ~ t + 1 C -
P =
1 0
0 0
/() -1 0 0
1 0 0 0
0 0 0 0
Vo o o o/
(0 -1 0 0
110 0
0 0 0-1
\0 0 1 1/
0 ON (0 1\ /0 0
-1 oj'vo oj'lo -1
(3= {l,x,-2x + x2,-3x + x3}

Index
Absolute value of a complex num­
ber, 558
Absorbing Markov chain, 304
Absorbing state, 304
Addition
of matrices, 9
Addition of vectors, 6
Additive function, 78
Additive inverse
of an element of a field, 553
of a vector, 12
Adjoint
of a linear operator, 358-360
of a linear transformation, 367
of a matrix, 331, 359-360
uniqueness, 358
Algebraic multiplicity of an eigen­
value, see Multiplicity of an
eigenvalue
Algebraically closed field, 482, 561
Alternating n-linear function, 239
Angle between two vectors, 202,
335
Annihilator
of a subset, 126
of a vector, 524, 528
Approximation property of an or­
thogonal projection, 399
Area of a parallelogram, 204
Associated quadratic form, 389
Augmented matrix, 161, 174
Auxiliary polynomial, 131, 134, 137-
140
Axioms of the special theory of
relativity, 453
Axis of rotation, 473
Back substitution, 186
Backward pass, 186
Basis, 43-49, 60-61, 192-194
cyclic, 526
dual, 120
Jordan canonical, 483
ordered, 79
orthonormal, 341, 346-347, 372
rational canonical, 526
standard basis for Fn, 43
standard basis for Pn(P), 43
standard ordered basis for F",
79
standard ordered basis for P„(F),
79
uniqueness of size, 46
Bessel's inequality, 355
Bilinear form, 422-433
diagonalizable, 428
diagonalization, 428-435
index, 444
invariants, 444
matrix representation, 424-428
product with a scalar, 423
rank, 443
signature, 444
sum, 423
symmetric, 428-430, 433-435
vector space, 424
Cancellation law for vector addi­
tion, 11
Cancellation laws for a field, 554
Canonical form
Jordan, 483-516
rational, 526-548
for a symmetric matrix, 446
Cauchy-Schwarz inequality, 333
Cayley-Hamilton theorem
for a linear operator, 317
for a matrix, 318, 377
Chain of sets, 59
Change of coordinate matrix, 112-
115
Characteristic of a field, 23, 41,
42, 430, 449, 555
Characteristic polynomial, 373

of a linear operator, 249
of a matrix, 248
Characteristic value, see Eigenvalue
Characteristic vector, see Eigen­
vector
Classical adjoint
of an n x n matrix, 231
of a 2 x 2 matrix, 208
Clique, 94, 98
Closed model of a simple econ­
omy, 176-178
Closure
under addition, 17
under scalar multiplication, 17
Codomain, 551
Coefficient matrix of a system of
linear equations, 169
Coefficients
Fourier, 119
of a differential equation, 128
of a linear combination, 24, 43
of a polynomial, 9
Cofactor, 210, 232
Cofactor expansion, 210, 215, 232
Column of a matrix, 8
Column operation, 148
Column sum of matrices, 295
Column vector, 8
Companion matrix, 526
Complex number, 556 561
absolute value, 558
conjugate, 557
fundamental theorem of alge­
bra, 482, 560
imaginary part, 556
real part, 556
Composition
of functions, 552
of linear transformations, 86
89
Condition number, 469
Conditioning of a system of linear
equations, 464
Congruent matrices, 426, 445, 451
Conic sections, 388 392
Conjugate linear property, 333
Conjugate of a complex number,
557
Conjugate transpose of a matrix,
331, 359 360
Consistent system of linear equa­
tions, 169
Consumption matrix, 177
Convergence of matrices, 284 288
Coordinate function, 119 120
Coordinate system
left-handed, 203
right-handed, 202
Coordinate vector, 80, 91, 110-
111
Corresponding homogeneous sys­
tem of linear equations, 172
Coset, 23, 109
Cramer's rule, 224
Critical point, 439
Cullen, Charles G., 470
Cycle of generalized eigenvectors,
488-491
end vector, 488
initial vector, 488
length, 488
Cyclic basis, 526
Cyclic subspace, 313-317
Degree of a polynomial, 10
Determinant, 199-243
area of a parallelogram, 204
characterization of, 242
cofactor expansion, 210, 215, 232
Cramer's rule, 224
of an identity matrix, 212
of an invertible matrix, 223
of a linear operator, 258, 474,
476-477
of a matrix transpose, 224
of annxn matrix, 210, 232
n-dimensional volume, 226
properties of, 234 236
of a square matrix, 367, 394
of a 2 x 2 matrix, 200
uniqueness of, 242
of an upper triangular matrix,
218

volume of a parallelepiped, 226
Wronskian, 232
Diagonal entries of a matrix, 8
Diagonal matrix, 18, 97
Diagonalizable bilinear form, 428
Diagonalizable linear operator, 245
Diagonalizable matrix, 246
Diagonalization
of a bilinear form, 428-435
problem, 245
simultaneous, 282, 325, 327, 376,
405
of a symmetric matrix, 431-433
test, 269, 496
Diagonalize, 247
Differentiable function, 129
Differential equation, 128
auxiliary polynomial, 131, 134,
137-140
coefficients, 128
homogeneous, 128,137-140, 523
linear, 128
nonhomogeneous, 142
order, 129
solution, 129
solution space, 132, 137-140
system, 273, 516
Differential operator, 131
null space, 134-137
order, 131, 135
Dimension, 47-48, 50-51,103,119,
425
Dimension theorem, 70
Direct sum
of matrices, 320-321, 496, 545
of subspaces, 22, 58, 98, 275-
279, 318, 355, 366, 394, 398,
401, 475-478, 494, 545
Disjoint sets, 550
Distance, 340
Division algorithm for polynomi­
als, 562
Domain, 551
Dominance relation, 95-96, 99
Dot diagram
of a Jordan canonical form, 498-
500
of a rational canonical form, 535 -
539
Double dual, 120, 123
Dual basis, 120
Dual space, 119-123
Economics, see Leontief, Wassily
Eigenspace
generalized, 485-491
of a linear operator or matrix,
264
Eigenvalue
of a generalized eigenvector, 484
of a linear operator or matrix,
246, 371-374, 467-470
multiplicity, 263
Eigenvector
generalized, 484-491
of a linear operator or matrix,
246, 371-374
Einstein, Albert, see Special the­
ory of relativity
Element, 549
Elementary column operation, 148,
153
Elementary divisor
of a linear operator, 539
of a matrix, 541
Elementary matrix, 149-150, 159
Elementary operation, 148
Elementary row operation, 148, 153,
217
Ellipse, see Conic sections
Empty set, 549
End vector of a cycle of general­
ized eigenvectors, 488
Entry of a matrix, 8
Equality
of functions, 9, 551
of matrices, 9
of n-tuples, 8
of polynomials, 10
of sets, 549
Equilibrium condition for a sim­
ple economy, 177
Equivalence relation, 107, 551
congruence, 449, 451

unitary equivalence, 394, 472
Equivalent systems of linear equa­
tions, 182-183
Euclidean norm of a matrix, 467-
470
Euler's formula, 132
Even function, 15, 21, 355
Exponential function, 133 140
Exponential of a matrix, 312, 515
Extremum, see Local extremum
Field, 553 555
algebraically closed, 482, 561
cancellation laws, 554
characteristic, 23, 41, 42, 430,
449, 555
of complex numbers, 556 561
product of elements, 553
of real numbers, 549
sum of elements, 553
Field of scalars, 6-7, 47
Finite-dimensional vector space, 46-
51
Fixed probability vector, 301
Forward pass, 186
Fourier, Jean Baptiste, 348
Fourier coefficients, 119, 348, 400
Frobenius inner product, 332
Function, 551-552
additive, 78
alternating n-linear, 239
codomain of, 551
composite, 552
coordinate function, 119-120
differentiable, 129
domain of, 551
equality of, 9, 551
even, 15, 21, 355
exponential, 133-140
image of, 551
imaginary part of, 129
inverse, 552
invertible, 552
linear, see Linear transformation
n-linear, 238-242
norm, 339
odd, 21, 355
one-to-one, 551
onto, 551
polynomial, 10, 51-53, 569
preimage of, 551
range of, 551
real part of, 129
restriction of, 552
sum of, 9
vector space, 9
Fundamental theorem of algebra,
482, 560
Gaussian elimination, 186 187
back substitution, 186
backward pass, 186
forward pass, 186
General solution of a system of
linear equations, 189
Generalized eigenspace, 485-491
Generalized eigenvector, 484-491
Generates, 30
Generator of a cyclic subspace, 313
Geometry, 385, 392, 436, 472 478
Gerschgorin's disk theorem, 296
Gram-Schmidt process, 344, 396
Gramian matrix, 376
Hardy-Weinberg law, 307
Hermitian operator or matrix, see
Self-adjoint linear operator
or matrix
Hessian matrix, 440
Homogeneous linear differential equa­
tion, 128, 137 140, 523
Homogeneous polynomial of de­
gree two, 433
Homogeneous system of linear equa­
tions, 171
Hooke's law, 128, 368
Householder operator, 397
Identity element
in C, 557
in a field, 553, 554
Identity matrix, 89, 93, 212
Identity transformation, 67
Ill-conditioned system, 464

Image, see Range
Image of an element, 551
Imaginary number, 556
Imaginary part
of a complex number, 556
of a function, 129
Incidence matrix, 94-96, 98
Inconsistent system of linear equa­
tions, 169
Index
of a bilinear form, 444
of a matrix, 445
Infinite-dimensional vector space,
47
Initial probability vector, 292
Initial vector of a cycle of gener­
alized eigenvectors, 488
Inner product, 329-336
Frobenius, 332
on H, 335
standard, 330
Inner product space
complex, 332
H, 332, 343, 348-349, 380, 399
real, 332
Input-output matrix, 177
Intersection of sets, 550
Invariant subspace, 77-78, 313-315
Invariants
of a bilinear form, 444
of a matrix, 445
Inverse
of a function, 552
of a linear transformation, 99
102, 164-165
of a matrix, 100 102, 107, 161-
164
Invertible function, 552
Invertible linear transformation, 99
102
Invertible matrix, 100 102, 111,
223, 469
Irreducible polynomial, 525, 567
569
Isometry, 379
Isomorphic vector spaces, 102 105
Isomorphism, 102-105, 123, 425
Jordan block, 483
Jordan canonical basis, 483
Jordan canonical form
dot diagram, 498 500
of a linear operator, 483-516
of a matrix, 491
uniqueness, 500
Kernel, sec Null space
Kronecker delta, 89, 335
Lagrange interpolation formula, 51—
53, 125, 402
Lagrange polynomials, 51, 109, 125
Least squares approximation, 360
364
Least squares line, 361
Left shift operator, 76
Left-handed coordinate system, 203
Left-multiplication transformation,
92-94
Legendre polynomials, 346
Length of a cycle of generalized
eigenvectors, 488
Length of a vector, see Norm
Leontief
closed model, 176-178
open model, 178 179
Leontief, Wassily, 176
Light second, 452
Limit of a sequence of matrices,
284 288
Linear combination, 24-26, 28-30,
39
uniqueness of coefficients, 43
Linear dependence, 36 40
Linear differential equation, 128
Linear equations, see System of
linear equations
Linear functional, 119
Linear independence, 37 40, 59
61, 342
Linear operator, (see also Linear
transformation), 112
adjoint, 358 360
characteristic polynomial, 249

determinant, 258, 474, 476 477
diagonalizable, 245
diagonalize, 247
differential, 131
differentiation, 131, 134-137
eigenspace, 264, 401
eigenvalue, 246, 371 -374
eigenvector, 246, 371-374
elementary divisor, 539
Householder operator, 397
invariant subspace, 77 78, 313-
315
isometry, 379
Jordan canonical form, 483-516
left shift, 76
Lorentz transformation, 454-461
minimal polynomial, 516-521
nilpotent, 512
normal, 370, 401-403
orthogonal, 379-385, 472-478
partial isometry, 394, 405
positive definite, 377-378
positive semidefinite, 377 378
projection, 398 403
projection on a subspace, 86,
117
projection on the x-axis, 66
quotient space, 325-326
rational canonical form, 526 548
reflection, 66, 113, 117, 387, 472-
478
right shift, 76
rotation, 66, 382, 387, 472-478
self-adjoint, 373, 401-403
simultaneous diagonalization, 282,
405
spectral decomposition, 402
spectrum, 402
unitary, 379-385, 403
Linear space, see Vector space
Linear transformation, (see also
Linear operator), 65
adjoint, 367
composition, 86-89
identity, 67
image, see Range
inverse, 99-102, 164-165
invertible, 99 102
isomorphism, 102-105, 123, 425
kernel, see Null space
left-multiplication, 92-94
linear functional, 119
matrix representation, 80, 88
92, 347, 359
null space, 67 69, 134-137
nullity, 69-71
one-to-one, 71
onto, 71
product with a scalar, 82
pseudoinverse, 413
range, 67-69
rank, 69-71, 159
restriction, 77 -78
singular value, 407
singular value theorem, 406
sum, 82
transpose, 121, 126, 127
vector space of, 82, 103
zero, 67
Local extremum, 439, 450
Local maximum, 439, 450
Local minimum, 439, 450
Lorentz transformation, 454-461
Lower triangular matrix, 229
Markov chain, 291, 304
Markov process, 291
Matrix, 8
addition, 9
adjoint, 331, 359-360
augmented, 161, 174
change of coordinate, 112-115
characteristic polynomial, 248
classical adjoint, 208, 231
coefficient, 169
cofactor, 210, 232
column of, 8
column sum, 295
companion, 526
condition number, 469
congruent, 426, 445, 451
conjugate transpose, 331, 359
360
consumption, 177

convergence, 284-288
determinant of, 200, 210, 232,
367, 394
diagonal, 18, 97
diagonal entries of, 8
diagonalizable, 246
diagonalize, 247
direct sum, 320-321, 496, 545
eigenspace, 264
eigenvalue, 246, 467-470
eigenvector, 246
elementary, 149-150, 159
elementary divisor, 541
elementary operations, 148
entry, 8
equality of, 9
Euclidean norm, 467-470
exponential of, 312, 515
Gramian, 376
Hessian, 440
identity, 89
incidence, 94-96, 98
index, 445
input-output, 177
invariants, 445
inverse, 100-102, 107, 161-164
invertible, 100-102, 111, 223,
469
Jordan block, 483
Jordan canonical form, 491
limit of, 284-288
lower triangular, 229
minimal polynomial, 517-521
multiplication with a scalar, 9
nilpotent, 229, 512
norm, 339, 467-470, 515
normal, 370
orthogonal, 229, 382-385
orthogonally equivalent, 384-385
permanent of a 2 x 2, 448
polar decomposition, 411-413
positive definite, 377
positive semidefinite, 377
product, 87-94
product with a scalar, 9
pseudoinverse, 414
rank, 152-159
rational canonical form, 541
reduced row echelon form, 185,
190-191
regular, 294
representation of a bilinear form,
424-428
representation of a linear trans­
formation, 80, 88-92, 347,
359
row of, 8
row sum, 295
scalar, 258
self-adjoint, 373, 467
signature, 445
similarity, 115, 118, 259, 508
simultaneous diagonalization, 282
singular value, 410
singular value decomposition, 410
skew-symmetric, 23, 229, 371
square, 9
stochastic, see Transition ma­
trix
submatrix, 230
sum, 9
symmetric, 17, 373, 384, 389,
446
trace, 18, 20, 97, 118, 259, 281,
331, 393
transition, 288-291, 515
transpose, 17, 20, 67, 88, 127,
224, 259
transpose of a matrix inverse,
107
transpose of a product, 88
unitary, 229, 382-385
unitary equivalence, 384-385, 394,
472
upper triangular, 21, 218, 258,
370, 385, 397
Vandermonde, 230
vector space, 9, 331, 425
zero, 8
Maximal element of a family of
sets, 58
Maximal linearly independent sub­
set, 59-61
Maximal principle, 59

Member, see Element
Michelson-Morley experiment, 451
Minimal polynomial
of a linear operator, 516 521
of a matrix, 517-521
uniqueness, 516
Minimal solution to a system of
linear equations, 364-365
Monic polynomial, 567 569
Multiplicative inverse of an ele­
ment of a field, 553
Multiplicity of an eigenvalue, 263
Multiplicity of an elementary di­
visor, 539, 541
n-dimensional volume, 226
n-linear function, 238-242
n-tuple, 7
equality, 8
scalar multiplication, 8
sum, 8
vector space, 8
Nilpotent linear operator, 512
Nilpotent matrix, 229, 512
Nonhomogeneous linear differen­
tial equation, 142
Nonhomogeneous system of linear
equations, 171
Nonnegative vector, 177
Norm
Euclidean, 467 470
of a function, 339
of a matrix, 339, 467-470, 515
of a vector, 333-336, 339
Normal equations, 368
Normal linear operator or matrix,
370, 401-403
Normalizing a vector, 335
Null space, 67 69, 134-137
Nullity, 69-71
Numerical methods
conditioning, 464
QR factorization, 396 397
Odd function, 21, 355
One-to-one function, 551
One-to-one linear transformation,
71
Onto function, 551
Onto linear transformation, 71
Open model of a simple economy,
178-179
Order
of a differential equation, 129
of a differential operator, 131,
135
Ordered basis, 79
Orientation of an ordered basis,
202
Orthogonal complement, 349, 352,
398 401
Orthogonal equivalence of matrices,
384-385
Orthogonal matrix, 229, 382 385
Orthogonal operator, 379-385, 472
478
on R2, 387-388
Orthogonal projection, 398-403
Orthogonal projection of a vector,
351
Orthogonal subset, 335, 342
Orthogonal vectors, 335
Orthonormal basis, 341, 346 347,
372
Orthonormal subset, 335
Parallel vectors, 3
Parallelogram
area of, 204
law, 2, 337
Parseval's identity, 355
Partial isometry, 394, 405
Pendular motion, 143
Penrose conditions, 421
Periodic motion of a spring, 127,
144
Permanent of a 2 x 2 matrix, 448
Perpendicular vectors, see Orthog­
onal vectors
Physics
Hooke's law, 128, 368
pendular motion, 143
periodic motion of a spring, 144
special theory of relativity, 451-
461

spring constant, 368
Polar decomposition of a matrix,
411-413
Polar identities, 338
Polynomial, 9
annihilator of a vector, 524, 528
auxiliary, 131, 134, 137-140
characteristic, 373
coefficients of, 9
degree of a, 10
division algorithm, 562
equality, 10
function, 10, 51-53, 569
fundamental theorem of alge­
bra, 482, 560
homogeneous of degree two, 433
irreducible, 525, 567-569
Lagrange, 51, 109, 125
Legendre, 346
minimal, 516-521
monic, 567-569
product with a scalar, 10
quotient, 563
relatively prime, 564
remainder, 563
splits, 262, 370, 373
sum, 10
trigonometric, 399
unique factorization theorem, 568
vector space, 10
zero, 9
zero of a, 62, 134, 560, 564
Positive definite matrix, 377
Positive definite operator, 377 378
Positive semidefinite matrix, 377
Positive semidefinite operator, 377-
378
Positive vector, 177
Power set, 59
Preimage of an element, 551
Primary decomposition theorem,
545
Principal axis theorem, 390
Probability, see Markov chain
Probability vector, 289
fixed, 301
initial, 292
Product
of a bilinear form and a scalar,
423
of complex numbers, 556
of elements of a field, 553
of a linear transformation and
scalar, 82
of matrices, 87-94
of a matrix and a scalar, 9
of a vector and a scalar, 7
Projection
on a subspace, 76, 86, 98, 117,
398-403
on the x-axis, 66
orthogonal, 398-403
Proper subset, 549
Proper value, see Eigenvalue
Proper vector, see Eigenvector
Pseudoinverse
of a linear transformation, 413
of a matrix, 414
Pythagorean theorem, 337
QR factorization, 396-397
Quadratic form, 389, 433-439
Quotient of polynomials, 563
Quotient space, 23, 58, 79, 109,
325-326
Range, 67-69, 551
Rank
of a bilinear form, 443
of a linear transformation, 69-
71, 159
of a matrix, 152-159
Rational canonical basis, 526
Rational canonical form
dot diagram, 535-539
elementary divisor, 539, 541
of a linear operator, 526-548
of a matrix, 541
uniqueness, 539
Rayleigh quotient, 467
Real part
of a complex number, 556
of a function, 129
Reduced row echelon form of a
matrix, 185, 190-191

Reflection, 66, 117, 472-478
of R2, 113, 382-383, 387, 388
Regular transition matrix, 294
Relation on a set, 551
Relative change in a vector, 465
Relatively prime polynomials, 564
Remainder, 563
Replacement theorem, 45-46
Representation of a linear trans­
formation by a matrix, 80
Resolution of the identity opera­
tor, 402
Restriction
of a function, 552
of a linear operator on a sub-
space, 77 78
Right shift operator, 76
Right-handed coordinate system,
202
Rigid motion, 385 387
in the plane, 388
Rotation, 66, 382, 387, 472-478
Row of a matrix, 8
Row operation, 148
Row sum of matrices, 295
Row vector, 8
Rudin, Walter, 560
Saddle point, 440
Scalar, 7
Scalar matrix, 258
Scalar multiplication, 6
Schur's theorem
for a linear operator, 370
for a matrix, 385
Second derivative test, 439 443,
450
Self-adjoint linear operator or ma­
trix, 373, 401-403, 467
Sequence, 11
Set, 549 551
chain, 59
disjoint, 550
element of a, 549
empty, 549
equality of, 549
equivalence relation, 107, 394,
449, 451
equivalence relation on a, 551
intersection, 550
linearly dependent, 36-40
linearly independent, 37-40
orthogonal, 335, 342
orthonormal, 335
power, 59
proper subset, 549
relation on a, 551
subset, 549
union, 549
Signature
of a bilinear form, 444
of a matrix, 445
Similar matrices, 115, 118, 259,
508
Simpson's rule, 126
Simultaneous diagonalization, 282,
325, 327, 376, 405
Singular value
of a linear transformation, 407
of a matrix, 410
Singular value decomposition of a
matrix, 410
Singular value decomposition the­
orem for matrices, 410
Singular value theorem for linear
transformations, 406
Skew-symmetric matrix, 23, 229,
371
Solution
of a differential equation, 129
minimal, 364-365
to a system of linear equations,
169
Solution set of a system of linear
equations, 169, 182
Solution space of a homogeneous
differential equation, 132, 137
140
Space-time coordinates, 453
Span, 30, 34, 343
Special theory of relativity, 451-
461
axioms, 453
Lorentz transformation, 454 461
space-time coordinates, 453

time contraction, 459-461
Spectral decomposition, 402
Spectral theorem, 401
Spectrum, 402
Splits, 262, 370, 373
Spring, periodic motion of, 127,
144
Spring constant, 368
Square matrix, 9
Square root of a unitary operator,
393
Standard basis
for Fn, 43
for P«(F), 43
Standard inner product on Fn, 330
Standard ordered basis
for Fn, 79
for Pn(F), 79
Standard representation of a vec­
tor space, 104-105
States
absorbing, 304
of a transition matrix, 288
Stationary vector, see Fixed prob­
ability vector
Statistics, see Least squares ap­
proximation
Stochastic matrix, see Transition
matrix
Stochastic process, 291
Submatrix, 230
Subset, 549
linearly dependent, 36-40
linearly independent, 59-61
maximal linearly independent,
59-61
orthogonal, 335, 342
orthogonal complement of a, 349,
352, 398- 401
orthonormal, 335
span of a, 30, 34, 343
sum, 22
Subspace, 16-19, 50-51
cyclic, 313-317
dimension of a, 50-51
direct sum, 22, 58, 98, 275-279,
318, 355, 366, 394,398,401,
475-478, 494, 545
generated by a set, 30
invariant, 77-78
sum, 275
zero, 16
Sum
of bilinear forms, 423
of complex numbers, 556
of elements of a field, 553
of functions, 9
of linear transformations, 82
of matrices, 9
of n-tuples, 8
of polynomials, 10
of subsets, 22
of vectors, 7
Sum of subspaces, (see also Direct
sum, of subspaces), 275
Sylvester's law of inertia
for a bilinear form, 443
for a matrix, 445
Symmetric bilinear form, 428-430,
433-435
Symmetric matrix, 17, 373, 384,
389, 446
System of differential equations,
273, 516
System of linear equations, 25-30,
169
augmented matrix, 174
coefficient matrix, 169
consistent, 169
corresponding homogeneous sys­
tem, 172
equivalent, 182-183
Gaussian elimination, 186—187
general solution, 189
homogeneous, 171
ill-conditioned, 464
inconsistent, 169
minimal solution, 364-365
nonhomogeneous, 171
solution to, 169
well-conditioned, 464
T-annihilator, 524, 528
T-cyclic basis, 526
T-cyclic subspace, 313-317
T-invariant subspace, 77 78, 313-
315
Taylor's theorem, 441
Test for diagonalizability, 496
Time contraction, 459-461
Trace of a matrix, 18, 20, 97, 118,
259, 281, 331, 393
Transition matrix, 288-291, 515
regular, 294
states, 288
Translation, 386
Transpose
of an invertible matrix, 107
of a linear transformation, 121,
126, 127
of a matrix, 17, 20, 67, 88, 127,
224, 259
Trapezoidal rule, 126
Triangle inequality, 333
Trigonometric polynomial, 399
Trivial representation of zero vec­
tor, 36-38
Union of sets, 549
Unique factorization theorem for
polynomials, 568
Uniqueness
of adjoint, 358
of coefficients of a linear com­
bination, 43
of Jordan canonical form, 500
of minimal polynomial, 516
of rational canonical form, 539
of size of a basis, 46
Unit vector, 335
Unitary equivalence of matrices,
384-385, 394, 472
Unitary matrix, 229, 382 385
Unitary operator, 379-385, 403
Upper triangular matrix, 21, 218,
258, 370, 385, 397
Vandermonde matrix, 230
Vector, 7
additive inverse of a, 12
annihilator of a, 524, 528
column, 8
Index
coordinate, 80, 91, 110-111
fixed probability, 301
Fourier coefficients, 119, 348, 400
initial probability, 292
linear combination, 24
nonnegative, 177
norm, 333-336, 339
normalizing, 335
orthogonal, 335
orthogonal projection of a, 351
parallel, 3
perpendicular, see Orthogonal
vectors
positive, 177
probability, 289
product with a scalar, 8
Rayleigh quotient, 467
row, 8
sum, 7
unit, 335
zero, 12, 36-38
Vector space, 6
addition, 6
basis, 43-49, 192-194
of bilinear forms, 424
of continuous functions, 18, 67,
119, 331, 345, 356
of cosets, 23
dimension, 47-48, 103, 119, 425
dual, 119 123
finite-dimensional, 46-51
of functions from a set into a
field, 9, 109, 127
infinite-dimensional, 47
of infinitely differentiable func­
tions, 130 137, 247, 523
isomorphism, 102-105, 123, 425
of linear transformations, 82, 103
of matrices, 9, 103, 331, 425
of n-tuples, 8
of polynomials, 10, 86, 109
quotient, 23, 58, 79, 109
scalar multiplication, 6
of sequences, 11, 109, 356, 369
subspace, 16 19, 50-51
zero, 15
zero vector of a, 12

Volume of a parallelepiped, 226
Wade, William R., 439
Well-conditioned system, 464
Wilkinson, J. H., 397
Wronskian, 232
Z2, 16, 42, 429, 553
Zero matrix, 8
Zero of a polynomial, 62, 134, 560,
564
Zero polynomial, 9
Zero subspace, 16
Zero transformation, 67
Zero vector, 12, 36-38
trivial representation, 36- 38
Zero vector space, 15

List of Symbols (continued)
S1 + S2            the sum of sets S1 and S2                              page 22
span(S)            the span of the set S                                  page 30
S⊥                 the orthogonal complement of the set S                 page 349
[T]_β              the matrix representation of T in basis β              page 80
[T]_β^γ            the matrix representation of T in bases β and γ        page 80
T^{-1}             the inverse of the linear transformation T             page 99
T†                 the pseudoinverse of the linear transformation T       page 413
T*                 the adjoint of the linear operator T                   page 358
T_0                the zero transformation                                page 67
T^t                the transpose of the linear transformation T           page 121
T_θ                the rotation transformation by θ                       page 66
T_W                the restriction of T to a subspace W                   page 314
tr(A)              the trace of the matrix A                              page 18
V*                 the dual space of the vector space V                   page 119
V/W                the quotient space of V modulo W                       page 23
W1 + ... + Wk      the sum of subspaces W1 through Wk                     page 275
Σ_{i=1}^k Wi       the sum of subspaces W1 through Wk                     page 275
W1 ⊕ W2            the direct sum of subspaces W1 and W2                  page 22
W1 ⊕ ... ⊕ Wk      the direct sum of subspaces W1 through Wk              page 275
||x||              the norm of the vector x                               page 333
[x]_β              the coordinate vector of x relative to β               page 80
⟨x, y⟩             the inner product of x and y                           page 330
Z2                 the field consisting of 0 and 1                        page 553
z̄                  the complex conjugate of z                             page 557
0                  the zero vector                                        page 7

Pearson
Education
Prentice Hall
Upper Saddle River, NJ 07458
www.prenhall.com