I ran into an interesting problem when studying algorithms that perform causal discovery. As these algorithms progress they need to perform linear regressions on many subsets of the data. One might regress $x_1$ and $x_2$ onto variable $y$, calculate a score, then add an additional predictor and recalculate that score. Performing linear regressions amounts to solving a system of equations:

$$X\beta = y \quad\Longrightarrow\quad \beta = X^{-1}y$$
where each column in $X$ is a predictor, $y$ is the outcome, and $\beta$ is a vector of coefficients for the regression. There's some nuance to this. To perform a regression with an intercept term you need to append a column of 1's to the $X$ matrix. Additionally, a number of algorithms are available to perform this matrix inverse, but many conventional software packages default to a method called $QR$ decomposition. More interestingly, some software packages are clever and allow you to save this decomposition to be reused if $y$ changes.
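As a small, hypothetical illustration of that last point, the Julia sketch below (random placeholder data) fits a regression through a saved QR factorization and then reuses the same factorization for a different outcome vector:

using LinearAlgebra

n = 100                      # hypothetical sample size
X = [ones(n) randn(n, 2)]    # predictors, with a column of 1's for the intercept
y = randn(n)                 # outcome

F = qr(X)                    # factorize once...
β = F \ y                    # ...solve the least-squares problem
β2 = F \ randn(n)            # ...and reuse the factorization for a new outcome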
Causal algorithms will change $X$, so it would appear that saving the decomposition would be pointless. However, $X$ doesn't change "too much". We only ever take subsets of $X$'s columns as the causal algorithm iterates over multiple predictors. If we compute the decomposition for the entire $X$ matrix, can we modify this decomposition slightly to solve a subsystem? If you guessed yes, you're correct; otherwise this would be a pretty boring blog post.
Q-less QR Decomposition
$QR$ decomposition can be used to solve linear regression problems due to the properties of the decomposed matrices: $Q$ is orthogonal, meaning its inverse and transpose are equal, and $R$ is an upper triangular matrix, so forward/backward substitution can be utilized. If $X$ is a square matrix, then the solution to $X\beta = QR\beta = y$ becomes:

$$\beta = R^{-1}Q^{T}y$$
You can see that once we have the decomposition, substituting different outcome variables (i.e., different $y$ vectors) becomes trivial. However, as mentioned before, $Q$ and $R$ change whenever $X$ changes, so how can we use the above equation? The answer comes in two parts: eliminating $Q$ and Givens rotations.
To eliminate $Q$ from the solution we can make use of the normal equation form. Through some clever substitutions we can arrive at a solution for $\beta$ that does not depend on $Q$:

$$X^{T}X\beta = X^{T}y \quad\Longrightarrow\quad (QR)^{T}QR\,\beta = X^{T}y \quad\Longrightarrow\quad R^{T}R\,\beta = X^{T}y$$

where the $Q$'s cancel because $Q^{T}Q = I$.
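A quick numerical check (with arbitrary random data) confirms that $Q$ really does drop out of the normal equations:

using LinearAlgebra

X = randn(50, 3)
R = qr(X).R

R' * R ≈ X' * X    # true: the Gram matrix only needs R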
This can be solved efficiently in Julia using "in-place" functions that will not allocate any additional memory. Provided the variables A, b, x, and the upper triangular factor R from A's QR decomposition are already defined:
using LinearAlgebra

mul!(x, A', b)   # x = A'b
ldiv!(R', x)     # forward substitution: solve R'z = A'b, overwriting x with z
ldiv!(R, x)      # backward substitution: solve Rβ = z, overwriting x with β
Note that we effectively have to solve two systems of equations, one for $R^{T}$ and one for $R$. Fortunately, both triangular systems can be solved in $O(n^{2})$ time.
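As a self-contained sketch (placeholder data, and assuming R is wrapped as an UpperTriangular so the triangular solvers are dispatched), the whole Q-less solve can be checked against Julia's built-in least-squares backslash:

using LinearAlgebra

A = randn(100, 4)
b = randn(100)
R = UpperTriangular(qr(A).R)   # wrap so ldiv! uses triangular substitution
x = zeros(4)

mul!(x, A', b)   # x = A'b
ldiv!(R', x)     # solve R'z = A'b
ldiv!(R, x)      # solve Rβ = z
x ≈ A \ b        # true, up to floating point error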
Givens Rotations
When you take a subset of columns from $X$, you then have to remove those same columns from $R$. This will more than likely cause $R$ to no longer be an upper triangular matrix, defeating the purpose of saving it in the first place. Our saving grace here is a technique called Givens rotations. These rotations are represented by orthogonal matrices that rotate vectors counterclockwise in a given plane by a specified angle. These additional matrices can be "absorbed" into $Q$ because they are orthogonal and, more importantly, they have the property of "zeroing out" any desired value in a matrix.
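Julia exposes these rotations through LinearAlgebra.givens. As a minimal sketch with arbitrary numbers, a single rotation can zero out one entry of a vector in place:

using LinearAlgebra

v = [3.0, 4.0]
G, r = givens(v[1], v[2], 1, 2)   # rotation that zeros the 2nd entry of v
lmul!(G, v)                       # v is now ≈ [5.0, 0.0]; r == 5.0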
This might be best demonstrated with an example. Take the following matrix and its QR decomposition:
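A small Julia sketch works just as well as a hand-written example; the numbers below are arbitrary:

using LinearAlgebra

X = [2.0  1.0  3.0;
     4.0  5.0  6.0;
     7.0  8.0  10.0]

F = qr(X)
Q = Matrix(F.Q)   # 3×3 orthogonal factor
R = F.R           # 3×3 upper triangular factor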
You can check that these matrices satisfy all the required properties: $Q$ should be an orthogonal matrix and $R$ should be an upper triangular matrix. Now say we wanted to solve a linear system but remove the second column in $X$. Using Q-less decomposition, we know only $R$ needs to be modified to remain an upper triangular matrix. Here $R_{sub}$ only contains the 1st and 3rd columns of $R$.
$R_{sub}$ needs to be upper triangular; however, this is not the case after removing the second column. To "zero out" the last entry in $R_{sub}$ we can apply the following Givens rotation:

$$G = \begin{bmatrix} 1 & 0 & 0 \\ 0 & c & s \\ 0 & -s & c \end{bmatrix}, \qquad c = \cos(\theta), \quad s = \sin(\theta)$$

Here $\theta$ is the angle of rotation, but in practice we never need to explicitly calculate its value.
The product $G R_{sub}$ reestablishes the upper-triangular requirement and, because $G$ is orthogonal, the decomposition remains valid. What's great about this approach is that $G$ never needs to be explicitly formed, as we are just modifying individual entries in $R_{sub}$. What's more, we don't have to worry about what happens to $Q$ because we have a solution that only depends on $R$.
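Continuing the sketch from above, the column removal and re-triangularization take only a couple of lines; givens builds the rotation from the matrix entries, and lmul! applies it without ever materializing $G$:

Rsub = R[:, [1, 3]]            # keep the 1st and 3rd columns of R
G, r = givens(Rsub, 2, 3, 2)   # rotation over rows 2 and 3 that zeros Rsub[3,2]
lmul!(G, Rsub)                 # apply the rotation in place
Rsub[3, 2] = 0.0               # clean up any floating-point residue at the zeroed entry
istriu(Rsub)                   # true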
Putting it all together
The following Julia code shows how one might use this in practice. First we determine what columns to keep, next we loop over $R_{sub}$ to remove non-zero entries below the diagonal, and finally we solve the linear system.
#Loop through lower triangular matrix
#the 1st axis is reversed because we're moving non-zeros *up* toward the diagonal
lowTriIter(A::AbstractMatrix) = ((i,j) for i in reverse(axes(A,1)), j in axes(A,2) if i>j)

function qless!(x, A, b, R, Rsub, columnsKeep)

    numCol = length(columnsKeep)

    #Selecting the columns to keep
    for (k,j) in enumerate(columnsKeep)
        @views Rsub[:,k] = R[:,j]
    end

    #Use a givens rotation to zero out values below diagonal
    for (i,j) in lowTriIter(Rsub)
        if Rsub[i,j] ≠ 0.0
            G, r = givens(Rsub, i-1, i, j)
            Rsub[i-1,j] = r
            Rsub[i,j] = 0.0

            #Givens impact columns to the right
            for k in j+1:numCol
                (r1, r2) = Rsub[i-1,k], Rsub[i,k]
                Rsub[i-1,k] =  G.c*r1 + G.s*r2
                Rsub[i,k]   = -G.s*r1 + G.c*r2
            end
        end
    end

    Rv = UpperTriangular(view(Rsub, 1:numCol, 1:numCol))

    #Solve R'R*x = A'b
    mul!(x, A', b)
    ldiv!(Rv', x)
    ldiv!(Rv, x)

    return nothing
end
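Here is one hypothetical way to call qless!; the workspace shapes and the use of a view for the predictor subset are assumptions about how the pieces fit together, not something fixed by the function itself:

using LinearAlgebra

Xfull = randn(100, 5)                            # full design matrix
y     = randn(100)
R     = qr(Xfull).R                              # factorize the full matrix once

columnsKeep = [1, 3, 4]                          # subset chosen by the causal algorithm
Asub = view(Xfull, :, columnsKeep)               # matching subset of predictors
Rsub = zeros(size(R, 1), length(columnsKeep))    # workspace for the rotated R
x    = zeros(length(columnsKeep))                # regression coefficients

qless!(x, Asub, y, R, Rsub, columnsKeep)
x ≈ Asub \ y                                     # true, up to conditioning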
Potential Numerical Instability
A way to measure how accurate a solution is for the given problem uses a property called the condition number, defined as the ratio of the largest and smallest singular values of the matrix:

$$\kappa(X) = \frac{\sigma_{\max}(X)}{\sigma_{\min}(X)}$$
Provided the condition number of $X$ is small, the error in the solution to $X\beta = y$ will also remain small; the two scale roughly linearly with each other. However, because we are using the normal equation form to solve our system, we effectively square our condition number:

$$\kappa(X^{T}X) = \kappa(X)^{2}$$
This amplifies errors that occur due to machine precision. Where we might get 9 digits of accuracy before, now we only get 3. This is, of course, dependent on the matrix $X$, so its condition number should be checked before using Q-less decomposition. One rule-of-thumb is to check the resulting coefficients in $\beta$ for values close to machine precision, which would indicate a problem. In this case we might fall back on typical $QR$ decomposition.
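As a final sketch (arbitrary random matrix), the squaring effect is easy to see by comparing the two condition numbers directly in Julia:

using LinearAlgebra

X = randn(100, 4)
cond(X)        # condition number of the design matrix
cond(X' * X)   # ≈ cond(X)^2, the effective condition number of the normal equations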