Let's take a closer look at the 3D dot product. From the algebraic formula $\mathbf x \cdot \mathbf y = x_1y_1+x_2y_2+x_3y_3$, it is easy to show that the dot product has four properties: (1) Positivity: $\mathbf x \cdot \mathbf x \ge 0$, with 0 only occurring when $\mathbf x = \mathbf 0 $. (2) Symmetry: $\mathbf x \cdot \mathbf y = \mathbf y \cdot \mathbf x$. (3) Homogeneity: $(c\mathbf x) \cdot \mathbf y = c(\mathbf x \cdot \mathbf y)$. (4) Additivity: $( \mathbf x+ \mathbf z) \cdot \mathbf y = \mathbf x\cdot \mathbf y + \mathbf z \cdot \mathbf y$. Even though we used coordinates to get these properties, they hold generally, independent of any coordinate system.
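As a quick numerical sanity check, here is a minimal sketch using NumPy; the vectors $\mathbf x$, $\mathbf y$, $\mathbf z$ and the scalar $c$ are arbitrary illustrative choices, not values from the text.

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])
y = np.array([4.0, 0.0, -1.0])
z = np.array([2.0, 5.0, -3.0])
c = -2.5

assert np.dot(x, x) >= 0                                          # positivity
assert np.isclose(np.dot(x, y), np.dot(y, x))                     # symmetry
assert np.isclose(np.dot(c * x, y), c * np.dot(x, y))             # homogeneity
assert np.isclose(np.dot(x + z, y), np.dot(x, y) + np.dot(z, y))  # additivity
```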
Surprisingly, these properties are sufficient to define "dot products" on other vector spaces and to obtain "geometries" for these spaces. Instead of "dot product", we will use the term "inner product." Let $V$ be a vector space and $\langle \cdot, \cdot\rangle$ a function from $V\times V$ into the real or complex numbers, depending on whether $V$ is real or complex, i.e., on whether the scalars are real or complex.
Definition. We say that $\langle \cdot, \cdot\rangle$ is an inner product on $V$ if the following properties hold: (1) Positivity: $\langle v,v\rangle \ge 0$, with $\langle v,v\rangle = 0$ only when $v=\mathbf 0$. (2) (Conjugate) symmetry: $\langle u,v\rangle = \overline{\langle v,u\rangle}$; in the real case this reduces to $\langle u,v\rangle = \langle v,u\rangle$. (3) Homogeneity: $\langle cu,v\rangle = c\langle u,v\rangle$ for every scalar $c$. (4) Additivity: $\langle u+w,v\rangle = \langle u,v\rangle + \langle w,v\rangle$. Note that in the complex case, (2) and (3) together give $\langle u,cv\rangle = \bar c\,\langle u,v\rangle$.
Theorem (Schwarz's inequality). Suppose that a vector space $V$ is equipped with an inner product, $\langle u,v\rangle$. Let $\|u\|:=\sqrt{\langle u,u\rangle}$. Then, \[ \big|\langle u,v\rangle\big| \le \|u\|\|v\|. \] Proof. If either $u$ or $v$ is $\mathbf 0$, the inequality is trivially true. Thus, we may suppose neither $u$ nor $v$ is $\mathbf 0$. We will suppose that $V$ is complex. The real case is easier. Let $t,\alpha\in \mathbb R$. From the four properties of the inner product, we can show that \[ 0\le \langle u+te^{i\alpha} v, u+ te^{i\alpha} v\rangle = \|u\|^2+ t\,\overline{e^{-i\alpha} \langle u,v\rangle} + te^{-i\alpha} \langle u,v\rangle + t^2\|v\|^2. \] Using the polar form of a complex number, we can write $\langle u,v\rangle = \big|\langle u,v\rangle\big|e^{i\theta}$. Choose $\alpha = \theta$. Then the previous inequality becomes \[ p(t):=\|u\|^2+2t\big|\langle u,v\rangle\big| + t^2\|v\|^2\ge 0. \] Because $p(t)\ge 0$ for every real $t$, the quadratic $p$ has either no real roots or a double real root. In either case, the discriminant of $p$ satisfies $4\big|\langle u,v\rangle\big|^2 - 4\|u\|^2\|v\|^2\le 0$. Schwarz's inequality follows immediately from this. $\square$
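As a concrete check, consider an illustrative example in $\mathbb R^3$ with the dot product: let $u=(1,2,2)$ and $v=(3,0,4)$. Then $\langle u,v\rangle = 3+0+8 = 11$, $\|u\| = \sqrt{1+4+4} = 3$, and $\|v\| = \sqrt{9+0+16} = 5$, so indeed $\big|\langle u,v\rangle\big| = 11 \le 15 = \|u\|\|v\|$; the inequality is strict here because neither vector is a scalar multiple of the other (compare the corollary below).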
Corollary. Equality holds in Schwarz's inequality if and only if $\{u,v\}$ is linearly dependent.
Proof. Exercise.
Theorem (The triangle inequality). Suppose that a vector space $V$ is equipped with an inner product, $\langle u,v\rangle$. Let $\|u\|:=\sqrt{\langle u,u\rangle}$. Then, \[ \big| \|u\|-\|v\|\big| \le \|u+v\|\le \|u\|+\|v\|. \] Proof. Recall that in the proof of Schwarz's inequality, we used \[ \|u+te^{i\alpha} v\|^2=\|u\|^2+te^{-i\alpha} \langle u,v\rangle + t\,\overline{e^{-i\alpha} \langle u,v\rangle} + t^2\|v\|^2. \] If we set $t=1$, $\alpha =0$, use the identity $z+\bar z=2\,\text{Re}(z)$, the inequality $|\text{Re}(z)|\le |z|$ and Schwarz's inequality, we get \[ \|u+v\|^2 \le \|u\|^2 + 2\big|\langle u,v\rangle\big|+\|v\|^2\le \big( \|u\|+\|v\| \big)^2. \] Taking square roots gives the right side of the triangle inequality. Similarly, we have \[ \|u+v\|^2 \ge \|u\|^2 - 2\big|\langle u,v\rangle\big|+\|v\|^2\ge \big( \|u\|-\|v\| \big)^2. \] Again taking square roots yields the left side. $\square$
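Continuing the illustrative example above, with $u=(1,2,2)$ and $v=(3,0,4)$ in $\mathbb R^3$: $u+v=(4,2,6)$, so $\|u+v\| = \sqrt{16+4+36} = \sqrt{56}\approx 7.48$, which indeed lies between $\big|\|u\|-\|v\|\big| = |3-5| = 2$ and $\|u\|+\|v\| = 8$.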
The triangle inequality is one of the essential properties of length: the sum of the lengths of two sides of a (nondegenerate) triangle is greater than the length of the third side. In a real vector space, we can also define the angle between two vectors. Suppose that neither $u$ nor $v$ is $\mathbf 0$. Schwarz's inequality implies that \[ -1 \le \frac{\langle u,v\rangle}{\|u\|\|v\|}\le 1. \] Thus we may define the angle between $u$ and $v$ to be \[ \theta(u,v) = \arccos\bigg(\frac{\langle u,v\rangle}{\|u\|\|v\|}\bigg). \]
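This formula is easy to compute with directly. Here is a minimal NumPy sketch (the `clip` call is only a guard against floating-point round-off pushing the ratio slightly outside $[-1,1]$; the vectors are the illustrative ones used above):

```python
import numpy as np

def angle(u, v):
    """Angle between nonzero vectors u and v, from <u,v> / (||u|| ||v||)."""
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(c, -1.0, 1.0))   # clip guards against round-off

u = np.array([1.0, 2.0, 2.0])
v = np.array([3.0, 0.0, 4.0])
print(angle(u, v))   # about 0.747 radians, roughly 42.8 degrees
```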
When $\langle u,v\rangle = 0$, the vectors are orthogonal (perpendicular) to each other. We will say this is true even if one or both of $u$ and $v$ are $\mathbf 0$. In the complex case, the concept of the angle between two vectors is not so important, except when $\langle u,v\rangle = 0$. When this happens, we will also say that $u$ and $v$ are orthogonal.
Standard inner products. Here is a list of a few standard inner product spaces: $\mathbb R^n$, with $\langle \mathbf x,\mathbf y\rangle := \mathbf x\cdot\mathbf y = \sum_{j=1}^n x_jy_j$; $\mathbb C^n$, with $\langle \mathbf x,\mathbf y\rangle := \sum_{j=1}^n x_j\overline{y_j}$; and $C[0,1]$, the space of real-valued continuous functions on $[0,1]$, with $\langle f,g\rangle := \int_0^1 f(x)g(x)\,dx$. Example. Show that $\langle f,g\rangle = \int_0^1 f(x)g(x)\,dx$ defines an inner product on $C[0,1]$.
Solution. Symmetry, homogeneity and additivity are all simple consequences of the properties of the integral. Thus, we only need to show positivity. The definition of the Riemann integral implies that $\langle f,f\rangle = \int_0^1 f^2(x)dx\ge 0$, so what remains is showing that the only function $f\in C[0,1]$ for which $\langle f,f\rangle=0$ is $f\equiv 0$.
Suppose this is false. Then there is an $f\in C[0,1]$ such that $\int_0^1 f^2(x)dx=0$ and there is also an $x_0\in [0,1]$ for which $f(x_0) \ne 0$. Let $F=f^2$. Products of continuous functions are continuous, so $F\in C[0,1]$. Also, $F(x_0) = (f(x_0))^2>0$. Using a continuity argument, one can show that there is a closed interval $[a,b]\subseteq [0,1]$, with $a<b$, that contains $x_0$ and on which $F(x)\ge \frac12 F(x_0)$. (Exercise.) Consequently, \[ \int_0^1 f^2(x)dx=\int_0^1 F(x)dx \ge \int_a^b F(x)dx \ge \frac12 F(x_0)(b-a)>0, \] which contradicts the assumption that $\int_0^1 f^2(x)dx=0$. Hence, positivity holds. $\square$
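This inner product is also easy to explore numerically. The sketch below approximates $\langle f,g\rangle = \int_0^1 f(x)g(x)\,dx$ with a midpoint Riemann sum and checks Schwarz's inequality for it; the functions $\sin$ and $\exp$ are arbitrary illustrative choices.

```python
import numpy as np

n = 100_000
x = (np.arange(n) + 0.5) / n          # midpoints of n equal subintervals of [0, 1]

def inner(f, g):
    # Midpoint-rule approximation of the C[0,1] inner product \int_0^1 f(x) g(x) dx.
    return np.sum(f(x) * g(x)) / n

f, g = np.sin, np.exp
norm_f = np.sqrt(inner(f, f))
norm_g = np.sqrt(inner(g, g))
print(abs(inner(f, g)) <= norm_f * norm_g)   # Schwarz's inequality: prints True
```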
Orthogonality
We will begin with a few definitions. In an inner product space $V$, we say that a (possibly infinite) set of vectors $S= \{v_1,\ldots,v_n, \ldots\}$ is orthogonal if and only if (i) none of the vectors are $\mathbf 0$ and (ii) $\langle v_j,v_k\rangle =0$ for all $j\ne k$. Part (i) excludes $\mathbf 0$ from the set. We do this to avoid having to write the phrase "orthogonal set of nonzero vectors." However, be aware that some authors do allow for including $\mathbf 0$. It is easy to see that an orthogonal set of vectors is linearly independent: if $c_1v_1+\cdots+c_nv_n=\mathbf 0$, then taking the inner product of both sides with $v_k$ gives $c_k\|v_k\|^2=0$, so $c_k=0$ for every $k$. We will frequently use normalized sets of orthogonal vectors. An orthogonal set is termed orthonormal (o.n.) if all of the vectors in it have unit length; that is, $\langle v_j,v_k\rangle =\delta_{j,k}$. We say that two subspaces of $V$, $U$ and $W$, are orthogonal if and only if all of the vectors in $U$ are orthogonal to all of the vectors in $W$. When this happens we write $U\perp W$. Finally, we define the orthogonal complement of $U$ in $V$ to be $U^\perp := \{v\in V\colon \langle v,u\rangle = 0 \ \forall\ u\in U\}$.
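As a simple illustration in $\mathbb R^3$ with the dot product: if $U = \text{span}\{(1,1,0)\}$, then $U^\perp = \{(x,y,z)\colon x+y = 0\} = \text{span}\{(1,-1,0),\,(0,0,1)\}$, and these two spanning vectors form an orthogonal set which, after dividing each by its length, becomes an o.n. basis for $U^\perp$.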
Minimization problems. A common way of fitting data, either discrete or continuous, is least-squares minimization. The familiar straight line fit to a set of data is a good example of this technique and we will discuss it briefly. Suppose that we have collected data $\{y_j\in \mathbb R,\ j=1,\ldots, n\}$ at times $\{t_1,\ldots,t_n\}$. To get a good straight line $y(t)=a+bt$ that fits the data, we choose the intercept and slope to minimize the sum of the squares of the residuals $y_j-y(t_j)= y_j - a -bt_j$. Specifically, we will minimize over all $a$, $b$ the quantity $D^2 = \sum_{j=1}^n(y_j- a -bt_j)^2$. We can put this problem in terms of $\mathbb R^n$. Let $\mathbf y= [y_1\ y_2 \ \cdots \ y_n]^T$, $\mathbf 1= [1\ 1\ \cdots \ 1]^T$, and $\mathbf t = [t_1\ t_2 \ \cdots \ t_n]^T$. In this notation, we have $D^2 = \|\mathbf y - a\mathbf 1 - b \mathbf t\|^2$. Next, let $U = \text{span}\{\mathbf 1,\mathbf t \}=\{a\mathbf 1 + b\mathbf t \colon a,b\in \mathbb R\}$. Using this notation, we can thus recast the problem in its final form: Find $\mathbf p\in U$ such that \[ \min_{a,b} D = \|\mathbf y - \mathbf p\|=\min_{\mathbf u\in U}\|\mathbf{y}-\mathbf u\|. \] As you have shown in exercises 1.3 and 1.4, this problem has a unique solution that can be found from the normal equations, which in matrix form are \[ \left(\begin{array}{c} a\\ b\end{array}\right) = \left(\begin{array}{cc} \mathbf 1^T\mathbf 1 & \mathbf 1^T \mathbf t\\ \mathbf t^T\mathbf 1 & \mathbf t^T \mathbf t \end{array}\right)^{-1}\left(\begin{array}{c} \mathbf 1^T\mathbf y\\ \mathbf t^T\mathbf y\end{array}\right). \] Stated another way, the solution is given by $\mathbf p = P\mathbf y$, where $P$ is the orthogonal projection onto $U$.
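Here is a minimal NumPy sketch of this computation; the data values $t_j$ and $y_j$ are made up purely for illustration.

```python
import numpy as np

# Hypothetical data: times t_j and measurements y_j (made-up values for illustration).
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

one = np.ones_like(t)
# Normal equations: the 2x2 Gram matrix of {1, t} times (a, b)^T equals (1^T y, t^T y)^T.
G = np.array([[one @ one, one @ t],
              [t @ one,   t @ t]])
rhs = np.array([one @ y, t @ y])
a, b = np.linalg.solve(G, rhs)

p = a * one + b * t          # orthogonal projection of y onto U = span{1, t}
print(a, b)
# The residual y - p should be orthogonal to both spanning vectors:
print(np.isclose(one @ (y - p), 0.0), np.isclose(t @ (y - p), 0.0))
```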
The general "least squares" minimization problem is this: Given an inner product space $V$, a vector $v\in V$, and a subspace $U\subset V$, find $p\in U$ such that $\|v - p\|=\min_{u\in U}\|v - u\|$. By the exercises 1.3 and 1.4, a solution $p$ exists for every $v\in V$ if and only if there is a vector $p\in U$ such that $v-p\in U^\perp$. When this happens $p$ is unique and $p=Pv$, where as before $P$ is the orthogonal projection of $v$ onto $u$. In particular, when $U$ is finite dimensional, this is always true. Furthermore, if $B=\{u_1,\ldots,u_n\}$ is an orthonormal (o.n) basis for $U$, then by exercise 1.4(c), the formula for $Pv$ is especially simple: \[ Pv = \sum_{j=1}^n \langle v,u_j\rangle u_j. \] Gram-Schmidt process. Unlike the situation for fitting data, we don't need to invert the matrix $G$. This raises two questions: Does an o.n. basis exist for an inner product space and, if so, how can it be found?
We will deal with a finite dimensional space having a basis $B=\{v_1,\ldots,v_n\}$. Our aim will be to produce an orthogonal basis for the space. This can be converted to an o.n. basis by simply dividing each vector by its length. To begin, define the spaces $U_k=\text{span}\{v_1,\ldots,v_k\}$, $k=1,\ldots,n$. Let $w_1=v_1$. Next, let $w_2=v_2 - P_1v_2=v_2 - \frac{\langle v_2,w_1\rangle}{\|w_1\|^2}w_1$, where $P_1$ is the orthogonal projection onto $U_1$. An easy computation shows that $w_2$ is orthogonal to $w_1$ and, consequently, $\{w_1,w_2\}$ is an orthogonal basis for $U_2$. Similarly, we let $w_3 = v_3 - P_2v_3 = v_3 - \frac{\langle v_3,w_1\rangle}{\|w_1\|^2}w_1 -\frac{\langle v_3,w_2\rangle}{\|w_2\|^2}w_2$, where $P_2$ is the orthogonal projection onto $U_2$. As before, $w_3$ is orthogonal to $w_1,w_2$ and $\{w_1,w_2, w_3\}$ is an orthogonal basis for $U_3$. We can continue in this way. Let $w_k = v_k - P_{k-1}v_k$. It follows that $w_k$ is orthogonal to $U_{k-1}$, so $U_k$ has the orthogonal basis $\{w_1,\ldots, w_k\}$. Eventually, we obtain an orthogonal basis $\{w_1,\ldots,w_n\}$ for $V$.
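Below is a minimal NumPy sketch of the Gram-Schmidt process as described above, followed by a use of the resulting o.n. basis in the projection formula $Pv=\sum_j\langle v,u_j\rangle u_j$; the input vectors are arbitrary illustrative choices.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthogonalize a list of linearly independent vectors: w_k = v_k - P_{k-1} v_k."""
    ws = []
    for v in vectors:
        w = v - sum(np.dot(v, wj) / np.dot(wj, wj) * wj for wj in ws)
        ws.append(w)
    return ws

# Illustrative basis of R^3 (not orthogonal to start with).
v1 = np.array([1.0, 1.0, 0.0])
v2 = np.array([1.0, 0.0, 1.0])
v3 = np.array([0.0, 1.0, 1.0])
w1, w2, w3 = gram_schmidt([v1, v2, v3])
print(np.dot(w1, w2), np.dot(w1, w3), np.dot(w2, w3))   # all (numerically) zero

# Normalize to get an o.n. basis, then project an arbitrary vector onto span{u1, u2}.
u1, u2 = w1 / np.linalg.norm(w1), w2 / np.linalg.norm(w2)
v = np.array([2.0, -1.0, 3.0])
Pv = np.dot(v, u1) * u1 + np.dot(v, u2) * u2
print(np.dot(v - Pv, u1), np.dot(v - Pv, u2))           # residual is orthogonal to U
```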