Multiplying Vectors III: Going Beyond

(An archive question of the week)

We’ve looked at the scalar (dot) product and the vector (cross) product; but there is one answer in the Ask Dr. Math archives that was too long to fit in either post. Here we’ll see again where the two familiar products come from, while looking deeper into the math behind and beyond them.

What justifies the definitions?

Here is the question, from 2012:

Vector Products and Possibilities

I have been studying maths for a long time, but one thing keeps puzzling me: how do we get to define vector dot products and, more importantly, cross products?

When solving problems, you hear, "Now we take the vector cross product ..." or, "We take the dot product ..." I have always failed to see how they get to this step without knowing the answer in advance. What are the justifications for these definitions?

I do realize that in physics there are certain phenomena that depict these definitions, but for me that cannot be satisfactory motivation for them.

Doctor Fenton provided the information we’ve already seen on the dot and cross products. But Cyrus felt that his question was deeper than that, and wrote back:

Thank you very much for your reply; however, it does not address my problem.

I do know how to derive the cross product and dot product. It is about THE VERY DEFINITION of these two -- the justification of these definitions -- wherein lies my problem. How did we get the notion of defining the vector cross product as it is defined? Is it purely because of observation in natural phenomena, or what?

In all the textbooks I have looked at, they start by defining the dot product and the cross product, and go on from there to work them out BASED on these definitions.

I do not see where they get these definitions. 

Thank you for your consideration.

I think his question was really answered in what Doctor Fenton said, as in my last two posts on vectors, but it takes subtlety to fully understand how the derivations justify the definitions.

Doctor Fenton tried to clarify the question:

I'm not sure what you are asking. 

Is the idea of finding a vector perpendicular to two given vectors not sufficient motivation? There are certainly many applications where orthogonal directions are particularly convenient for computation, to say nothing of physical phenomena, such as the motion of a charged particle in a magnetic field where a force acts at right angles to the field and the velocity vector.

If you read the references on the dot product, the basic problem is to find the angle between two vectors, and the formula for the dot product drops out of the computation. Similarly, if you need to determine a direction orthogonal to a given plane (or two non-collinear vectors), the cross-product drops out.

We make definitions because they are useful, and it is useful to be able to compute the angle between vectors, and to find a vector perpendicular to a given pair of vectors or a plane.

What else are you looking for?

Entities that arise in the midst of attempts to solve useful problems deserve names, and that is what is happening here. Cyrus tried again:

Thank you for your kind answer. I do appreciate it. But it seems that I fail to make my problem clear.

It is clear to me that from the dot product, you get the angle between the two vectors; and from cross product, you get the vector which is perpendicular to the plane of the two vectors.

My question is: are these definitions simply based on the observation of physical phenomena or on something else? Could one have defined them differently?

Other possible products

After yet another attempt that was even less successful, Doctor Fenton asked other Math Doctors to join in, and two days later, Doctor Jacques offered a long essay on Cyrus’s last question.

I think your question is, "Are there other possible definitions for vector products?"

The answer to that question is a big YES. I will show you below how to define other kinds of vector products. Not all of them are interesting; but some of them are used in many applications, although less often than the dot and cross products.

The reason why the dot and cross products are encountered so frequently is probably linked to their use in physics. There are also deeper, purely mathematical, reasons, which I will discuss.

I will not enter into too many details, nor give proofs of all the claims I will make; I will rather focus on the reasons behind the concepts.

Incidentally, the concepts of vectors we have now arose from some of the bigger ideas to be discussed, so in some sense we will be seeing the true historical background of the dot and cross products, a background more complicated than we usually want to get into!

What is a product?

The discussion here is based on the concept of vector spaces, which is taught in Linear Algebra. I won’t try to define them here, but basically they are just sets of vectors, such as all the vectors in a plane, or all the vectors in 3D space.

In all generality, a vector product is an operation that takes two vectors u and v as operands (arguments) and produces a vector w. We can see that as a function:

   f : U x V -> W              [1]

Here, u is in U, v is in V, w is in W, and w = f(u, v). U, V, and W are vector spaces (possibly different, but over the same scalar field).

Unless explicitly specified otherwise, we will restrict ourselves to a particular case:

   * U = V

   * The base field is R (the real numbers).

   * The vector spaces have finite dimension.
     We will let n = dim U = dim V, and m = dim W.

For example, in the dot product, n can be anything, and m = 1 (i.e., the dot product is a scalar, which is essentially the same thing as a vector in a space of dimension 1). In the cross product, we have n = m = 3 (the reason for the explicit 3 will be given below).

So we’re looking for ways to define a product of two vectors in the same space that produces a vector. Note that last paragraph: The real numbers themselves can be thought of as vectors (they have magnitude and direction!), so this does not exclude the dot product, whose result is a real number!

If we consider arbitrary functions f, there is not very much to say. To have a more interesting structure, we add some conditions that express that the product operation is "compatible" with the vector space structure. Specifically, we require the function f to be bilinear, i.e., for all vectors a, b, c in U and for all scalars k in R, we must have:

   (1) f(a + b, c) = f(a, c) + f(b, c)

   (2) f(a, b + c) = f(a, b) + f(a, c)

   (3) f(ka, b) = f(a, kb) = kf(a, b)

That is, we wouldn’t call something a product if it didn’t follow these rules, which amount to the familiar distributive and associative properties that are true of ordinary multiplication:

Note that we can also write the product in operator notation. For example, we can write f(a, b) = a # b, where '#' is a symbol that describes the specific product (we write a.b for the dot product and a x b for the cross product). Using that notation, the bilinearity axioms above can be written as:

   (1') (a + b) # c = (a # c) + (b # c)

   (2') a # (b + c) = (a # b) + (a # c)

   (3') (ka) # b = a # (kb) = k(a # b)

(1') and (2') are very similar to the distributive law. (3') is vaguely similar to the associative law, although it is not the same thing, because k, a, and b do not belong to the same set: k is a scalar, and a and b are vectors (and a # b can be a vector of another dimension). This property is sometimes called "mixed associativity." This shows that requiring the axioms (1) - (3) is not unreasonable.

That’s a mathematician’s reason for setting those rules for a “product”; they are similar to ideas I discussed in a different context in What is Multiplication … Really?.

In addition, (bi-)linearity is very important in physics. Many physical laws are linear; this is somehow related to the fact that space and time "look the same" everywhere.

From now on, a vector product will mean a function like [1] that meets all the requirements above.
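To make the three axioms concrete, here is a minimal Python sketch that spot-checks them for scalar-valued products. The helper names (`add`, `scale`, `is_bilinear_on`, `not_a_product`) are made up for illustration, and checking one choice of vectors can only refute bilinearity, never prove it.

```python
def dot(u, v):
    """Ordinary dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))

def add(u, v):
    """Componentwise vector sum."""
    return [x + y for x, y in zip(u, v)]

def scale(k, u):
    """Scalar multiple k*u."""
    return [k * x for x in u]

def is_bilinear_on(f, a, b, c, k):
    """Spot-check axioms (1)-(3) for a scalar-valued f at one choice of a, b, c, k."""
    ax1 = f(add(a, b), c) == f(a, c) + f(b, c)
    ax2 = f(a, add(b, c)) == f(a, b) + f(a, c)
    ax3 = f(scale(k, a), b) == f(a, scale(k, b)) == k * f(a, b)
    return ax1 and ax2 and ax3

def not_a_product(u, v):
    """A function that is NOT bilinear: the added constant breaks axiom (1)."""
    return dot(u, v) + 1

# Integer test data keeps the comparisons exact.
a, b, c, k = [1, 2, 3], [4, -5, 6], [-7, 8, 9], 3
print(is_bilinear_on(dot, a, b, c, k))            # True
print(is_bilinear_on(not_a_product, a, b, c, k))  # False
```

The second function shows how little it takes to fail: adding a constant already breaks the distributive-style rule.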

“Scalar” products

Now we can try inventing some products, starting in the simplest case:

The case m = 1
-----
Let us consider the case where m = 1, i.e., when f(u, v) is a scalar.

Such a vector product is called a bilinear form. (The expression "scalar product" would also be appropriate, but it is often used with a more restricted meaning.)

Recall that m is the dimension of the output, so we are multiplying two vectors and getting a one-dimensional “vector”, a scalar.

The simplest possible product we can define is the trivial product:  f(u, v) = 0 for all u and v. It is easy to see that this satisfies all the requirements, but I think you will agree that it is not very interesting.

This just squashes everything down to zero, which isn’t useful.

Another simple bilinear form is the elementary product given by:

   f_{ij}(u, v) = u[i] * v[j]

Here, i and j are *fixed* integers in the range {1, ..., n}, and u[i] is the i-th component of the vector u. This is as simple as a non-trivial bilinear form can get. However, such a product depends heavily on the choice of a coordinate system.

This means that, for example, we might define $\langle a,b,c\rangle\#\langle d,e,f\rangle = b\cdot f$, always multiplying the second component of the first by the third component of the other. Ignoring all but one component of each factor isn’t very useful. (But there may be times when that’s exactly what you want!)

We can next combine elementary products to get something more interesting. 

Let us define:

   f(u, v) = SUM (a[i, j]*u[i]*v[j])

Here, i and j range from 1 to n, and the a[i, j] are a set of fixed real numbers. It is easy to check that this defines a bilinear form. If we write A for the matrix {a[i, j]}, and if we represent u and v as column vectors, we can write the above product as:

   f(u, v) = u' A v

Here, u' is the transpose of the vector u (i.e., u' is a row vector).

Now, the important point is that any bilinear form can be written in this way: given a bilinear form f, you simply define a[i, j] as f(e_i, e_j), where the {e_i} are basis vectors.

The trivial product corresponds to the matrix A = 0. The elementary product f_{ij} corresponds to a matrix that has 1 in position [i, j] and 0 everywhere else.

As a simple example, in two dimensions we might (arbitrarily) define $$\mathbf{A} = \begin{bmatrix}1 & 2\\ 0 & 3\end{bmatrix}$$ so $$\mathbf{u}\#\mathbf{v} = \begin{pmatrix}u_1 & u_2\end{pmatrix}\begin{bmatrix}1 & 2\\ 0 & 3\end{bmatrix}
\begin{pmatrix}v_1\\v_2\end{pmatrix} = u_1v_1+2u_1v_2+3u_2v_2$$
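As a rough sketch of the matrix description, the following Python computes u' A v as a double sum and then recovers the matrix from the form by evaluating it on basis vectors. The function names are hypothetical, and indices here start at 0, so the entry called a[1, 2] in the text is `A[0][1]` in the code.

```python
def bilinear_form(A, u, v):
    """Return u' A v = SUM a[i][j] * u[i] * v[j] (0-based indices)."""
    n = len(u)
    return sum(A[i][j] * u[i] * v[j] for i in range(n) for j in range(n))

def matrix_of(f, n):
    """Recover a[i][j] = f(e_i, e_j) from a bilinear form f on R^n."""
    basis = [[1 if r == i else 0 for r in range(n)] for i in range(n)]
    return [[f(basis[i], basis[j]) for j in range(n)] for i in range(n)]

# The arbitrary 2x2 example matrix from the text.
A = [[1, 2],
     [0, 3]]

def f(u, v):
    return bilinear_form(A, u, v)

print(f([1, 0], [0, 1]))     # 2: picks out the top-right entry of A
print(f([1, 1], [1, 1]))     # 6: the sum of all entries of A
print(matrix_of(f, 2) == A)  # True: the form determines its matrix
```

Evaluating on basis vectors is exactly the recipe a[i, j] = f(e_i, e_j): each pair of basis vectors isolates one entry of the matrix.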

None of the specific products we’ve tried so far are interesting. But …

The dot product

Another simple case is when A is the identity matrix. In that case, we have:

   f(u, v) = SUM (u[i]*v[i])

You will recognize this as the dot product: f(u, v) = u.v. This product is also very simple, and it has more symmetry than the previous ones: it is invariant under many coordinate transformations, including a permutation of the basis vectors. This is one of the reasons why it occurs so frequently.

We’re defining it here as: $$\mathbf{u}\cdot\mathbf{v} = \begin{pmatrix}u_1 & u_2\end{pmatrix}\begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}
\begin{pmatrix}v_1\\v_2\end{pmatrix} = u_1v_1+u_2v_2$$

There’s a lot more to be said for the dot product:

Bilinear forms can have some additional properties.

A bilinear form f is symmetric if f(u, v) = f(v, u) for all vectors u and v. This will be the case if the matrix A is symmetric; in particular, the dot product is symmetric.

A bilinear form f is said to be positive-definite if f(u, u) > 0 for all non-zero vectors u (the axioms imply that f(0, 0) = 0). Symmetric bilinear forms are interesting because they allow us to define the length of a vector: we define the length |u| of the vector u as sqrt(f(u, u)), which is legitimate if f(u, u) >= 0. The square root is important, because it ensures that multiplying a vector by a real k multiplies its length by |k|. This allows us to define a geometric structure on the vector space.

The dot product is positive-definite, because u.u = SUM (u[i]^2). A very important point is that some kind of converse is true: any real symmetric and positive-definite bilinear form is equivalent to the dot product in a suitable coordinate system. This universality property is probably one of the main reasons for the importance of the dot product.
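Here is a small Python sketch of the length defined by a form, checking that scaling a vector by k scales its length by |k|. The matrix `B` is just one illustrative symmetric, positive-definite choice of mine, not something from the original answer.

```python
import math

def form(A, u, v):
    """Bilinear form u' A v as a double sum."""
    n = len(u)
    return sum(A[i][j] * u[i] * v[j] for i in range(n) for j in range(n))

def length(A, u):
    """Length |u| = sqrt(f(u, u)); only meaningful when f(u, u) >= 0."""
    return math.sqrt(form(A, u, u))

I2 = [[1, 0], [0, 1]]  # identity matrix: the ordinary dot product
B = [[2, 1], [1, 2]]   # another symmetric, positive-definite matrix (illustrative)

u, k = [3, 4], -2
ku = [k * x for x in u]

print(length(I2, u))                                         # 5.0
print(math.isclose(length(I2, ku), abs(k) * length(I2, u)))  # True
print(math.isclose(length(B, ku), abs(k) * length(B, u)))    # True
```

Without the square root, scaling by k would multiply the "length" by k^2 instead, which is why the root matters.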

The dot product also has many physical interpretations; for example, if a force F produces a displacement u, the work done is the dot product F.u.

There are other interesting bilinear forms that are not positive-definite. For example, in special relativity, one deals with 4-dimensional vectors (t, x, y, z), and the following form plays a special role:

  f((t1, x1, y1, z1), (t2, x2, y2, z2)) = c^2*t1*t2 - x1*x2 - y1*y2 - z1*z2

This form is not positive-definite, because, for example, if u = (0, 1, 0, 0), then f(u, u) = -1.

This is part of the weirdness of relativity!
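For the curious, here is a minimal Python sketch of that form. The function name `minkowski` and the default c = 1 (units in which the speed of light is 1) are my assumptions for illustration.

```python
def minkowski(p, q, c=1):
    """f(p, q) = c^2*t1*t2 - x1*x2 - y1*y2 - z1*z2 for 4-vectors (t, x, y, z)."""
    t1, x1, y1, z1 = p
    t2, x2, y2, z2 = q
    return c**2 * t1 * t2 - x1 * x2 - y1 * y2 - z1 * z2

space_like = (0, 1, 0, 0)
time_like = (1, 0, 0, 0)
light_like = (1, 1, 0, 0)

print(minkowski(space_like, space_like))  # -1
print(minkowski(time_like, time_like))    # 1
print(minkowski(light_like, light_like))  # 0, although the vector is not zero
```

So f(u, u) can be positive, negative, or zero for non-zero u, and sqrt(f(u, u)) no longer works as a length for every vector.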

The tensor product

Before talking about the cross product, we will first describe two other, more general, vector products.

The tensor product
------------------
In the case of m = 1, we have seen that any vector product can be constructed as a linear combination of the elementary products f_{ij}. In the general case, we can use these elementary products in another way: we can consider all the products u[i]*v[j] as separate coordinates of a vector in another space W. As there are n^2 such products, the dimension of W will be m = n^2.

This product is called the tensor product, and we write f(u, v) = u ⊗ v. A basis of W is the set {e_i ⊗ e_j}, where the e_k are basis vectors of U.

This product is, in some sense, the most general product that can be defined on U. Any such product f: U x U -> E, where E is a real vector space, can be defined by a normal linear map:

   g : W (= U ⊗ U) -> E

The map g can be described by a matrix G of dimension (dim E) by n^2, since W has dimension n^2.

This universality makes the tensor product (and other generalizations) omnipresent in many fields of mathematics and physics.

We can get other universal constructions by imposing some additional conditions, like symmetry or antisymmetry, on the tensor product.

There is, of course, a lot more that could be said!
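As one small illustration of the universality claim, the Python sketch below builds the n^2 coordinates of u ⊗ v and then recovers two scalar products from them by applying a linear map. It covers only the case where E is one-dimensional, and the names `tensor` and `apply_linear` are invented for this example.

```python
def tensor(u, v):
    """Coordinates of u (x) v: all n*n products u[i]*v[j], as a nested list."""
    return [[ui * vj for vj in v] for ui in u]

def apply_linear(G, w):
    """A linear map g: W -> R with coefficients G: g(w) = SUM G[i][j] * w[i][j]."""
    n = len(w)
    return sum(G[i][j] * w[i][j] for i in range(n) for j in range(n))

u, v = [1, 2, 3], [4, 5, 6]
w = tensor(u, v)

# Coefficients equal to the identity matrix give back the dot product.
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(apply_linear(I3, w))  # 32 = 1*4 + 2*5 + 3*6

# A single 1 in position [0][2] gives the elementary product u[0]*v[2].
E02 = [[0, 0, 1], [0, 0, 0], [0, 0, 0]]
print(apply_linear(E02, w))  # 6 = 1*6
```

The coefficients G play the same role as the matrix A of a bilinear form: the tensor product does the "multiplying", and everything specific to a given product is left to a linear map.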

The wedge product

The wedge product
-----------------
We turn now to antisymmetric vector products, i.e., products where f(u, v) = -f(v, u). Note that this implies that f(u, u) = 0.

Like the tensor product, we can define a "most general" antisymmetric vector product by taking these elements as coordinates of a vector in a vector space W:

   w[i,j] = u[i]*v[j] - u[j]*v[i]

This product is called the wedge product, and written u ∧ v. As w[i, i] = 0 and w[i, j] = -w[j, i], there are n(n - 1)/2 independent coordinates, and we can take W as a vector space of dimension n(n - 1)/2. Like the tensor product, this product is "universal" in the sense that any antisymmetric vector product can be obtained from a linear map defined on the wedge product.
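A short Python sketch of these coordinates, keeping only the independent ones with i < j. Storing them in a dictionary keyed by (i, j) is my choice for readability, and indices start at 0 here.

```python
def wedge(u, v):
    """Independent coordinates w[i,j] = u[i]*v[j] - u[j]*v[i] for i < j (0-based)."""
    n = len(u)
    return {(i, j): u[i] * v[j] - u[j] * v[i]
            for i in range(n) for j in range(i + 1, n)}

u, v = [1, 2, 3, 4], [5, 6, 7, 8]
w = wedge(u, v)

print(len(w))  # 6 = n(n - 1)/2 for n = 4
print(all(wedge(v, u)[key] == -w[key] for key in w))  # True: antisymmetric
print(all(value == 0 for value in wedge(u, u).values()))  # True: u ^ u = 0
```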

The cross product

In the particular case n = 3, we have n(n - 1)/2 = 3, and W has the same dimension as U. This suggests that we may try to associate the product u ∧ v with a vector of U, as they have the same dimension.

It turns out that, if we impose some additional conditions (like the behavior under changes of coordinates), there are essentially two ways to do this, corresponding to two possible orientations of the space. The usual cross product is simply a representation of the universal wedge product with a particular choice of orientation (a "right-hand rule"). The fact that this is equivalent to a universal construction is one of the main reasons for its importance.

Note that this only works with n = 3, because only in that case have we n = n(n - 1)/2.

So the cross product only exists for 3D vectors, effectively because the number of components is the same as the number of pairs of coordinates.
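To see the identification in coordinates, here is a hedged Python sketch for n = 3. In the 1-based notation of the text, the right-handed choice is (w[2,3], w[3,1], w[1,2]); the code uses 0-based indices, and the function names are mine.

```python
def wedge3(u, v):
    """The three independent wedge coordinates for n = 3 (0-based, i < j)."""
    return {(i, j): u[i] * v[j] - u[j] * v[i]
            for i in range(3) for j in range(i + 1, 3)}

def cross_from_wedge(u, v):
    """Right-handed identification: (w[1,2], -w[0,2], w[0,1]) in 0-based indices."""
    w = wedge3(u, v)
    return [w[(1, 2)], -w[(0, 2)], w[(0, 1)]]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

u, v = [1, 2, 3], [4, 5, 6]
c = cross_from_wedge(u, v)

print(c)                                          # [-3, 6, -3]
print(dot(c, u), dot(c, v))                       # 0 0: perpendicular to both
print(cross_from_wedge([1, 0, 0], [0, 1, 0]))     # [0, 0, 1]
```

Negating all three components gives the other orientation, the "left-hand rule" version of the same construction.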

Algebras

The case m = n
--------------
A vector space U with a (bilinear) vector product whose values lie in U itself is called an algebra. In that case, the vector product is an internal operation. Depending on that product's additional properties, you can have different types of algebras -- for example, commutative or associative.

As we have seen, if n = 3, the cross product defines an algebra on R^3; this algebra is anti-commutative, and is not associative.

This, of course, is not what we mean by “algebra” in high school!
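Both properties are easy to check on the standard basis vectors. This is a minimal Python sketch using the usual component formula for the cross product; the names `e1` and `e2` are just labels for the first two basis vectors.

```python
def cross(u, v):
    """Cross product of two 3D vectors given as lists."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

e1, e2 = [1, 0, 0], [0, 1, 0]

# Anti-commutative: swapping the factors flips the sign.
print(cross(e1, e2))  # [0, 0, 1]
print(cross(e2, e1))  # [0, 0, -1]

# Not associative: the grouping changes the answer.
print(cross(e1, cross(e1, e2)))  # [0, -1, 0]
print(cross(cross(e1, e1), e2))  # [0, 0, 0]
```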

Conclusion
----------
To summarize, it is possible to define many things that can be called vector products. The fact that the examples above are more frequently encountered stems from the fact that these products are, in some sense, the most general products that can be defined under some reasonable conditions. This comes also from their importance in physics, although the two reasons are probably related.

I hope that this sheds some light on the subject; please feel free to write back if you want to discuss this further.

Cyrus responded,

Thank you very much for your kind and detailed answer. I really did appreciate it. Not being a mathematician but rather an engineer, I may not have followed all the arguments presented. However, you resolved a very important, nagging curiosity that I had, and that was that, certainly, other definitions of vector product can and do exist, but more popularly we take these two definitions simply because they fit the natural physical laws that we observe. In effect, these definitions are merely the result of physical observations of nature.

I would rather say these definitions are used because they have been found useful, both abstractly and practically.