Solving Linear Systems with Q-less Decomposition

I ran into an interesting problem when studying algorithms that perform causal discovery. As these algorithms progress, they need to perform linear regressions on many subsets of the data. One might regress x1 and x2 onto variable x3, calculate a score, then add an additional predictor and recalculate that score. Performing a linear regression amounts to solving a system of equations:

$$x = A \backslash b$$

where each column in A is a predictor, b is the outcome, and x is a vector of regression coefficients. There's some nuance to this. To perform a regression with an intercept term, you need to append a column of 1's to the A matrix. Additionally, a number of algorithms are available to solve this system, but many conventional software packages default to a method called QR decomposition. More interestingly, some packages are clever and allow you to save this decomposition so it can be reused if b changes.
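
In Julia, for example, a factorization object can be reused directly with the backslash operator. A minimal sketch (the sizes, random data, and variable names are purely illustrative):

using LinearAlgebra

#Hypothetical design matrix: two predictors plus an intercept column of ones
A = [randn(100, 2) ones(100)]
F = qr(A)       #factor once...

b1 = randn(100)
b2 = randn(100)
x1 = F \ b1     #...then reuse the factorization for as many outcomes as we like
x2 = F \ b2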

Causal algorithms will change both b and A, so it would appear that saving the QR decomposition would be pointless. However, A doesn't change "too much". We only ever take subsets of A's columns as the causal algorithm iterates over multiple predictors. If we compute the decomposition for the entire A matrix, can we modify this decomposition slightly to solve a subsystem? If you guessed yes, you're correct, otherwise this would be a pretty boring blog post.😉

Q-less QR Decomposition

QR decomposition can be used to solve linear regression problems due to the properties of the decomposed matrices. Q is orthogonal, meaning its inverse and transpose are equal, and R is an upper triangular matrix, so forward/backward substitution can be utilized. If A is a square matrix, then the solution to Ax=b becomes:

$$x = R^{-1}Q^\top b$$
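
To make the formula concrete, here is a minimal sketch in Julia (the random square system is purely illustrative):

using LinearAlgebra

A = rand(3, 3); b = rand(3)              #a small, hypothetical square system
F = qr(A)
x = UpperTriangular(F.R) \ (F.Q' * b)    #x = R⁻ÂčQ⊀b
x ≈ A \ b                                #true, up to rounding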

You can see that once we have the decomposition, substituting different outcome variables (i.e., different b vectors) becomes trivial. However, as mentioned before, Q and R change whenever A changes, so how can we use the above equation? The answer comes in two parts: eliminating Q and Givens rotations.

To eliminate Q from the solution we can make use of the Normal Equation form. Through some clever substitutions we can arrive at a solution for x that does not depend on Q.

$$
\begin{aligned}
Ax &= b && \text{Original equation} \\
A^\top Ax &= A^\top b && \text{Multiply by } A^\top \\
(QR)^\top QRx &= A^\top b && \text{Substitute } A = QR \\
R^\top Q^\top QRx &= A^\top b && \text{Rule of transposes } (QR)^\top = R^\top Q^\top \\
R^\top Rx &= A^\top b && Q \text{ is orthogonal by definition, } Q^\top Q = I \\
x &= R^{-1}R^{-\top}A^\top b && \text{Solving for } x
\end{aligned}
$$

This can be solved efficiently in Julia using "in-place" functions that will not allocate any additional memory. Provided A, b, a preallocated x, and R (the upper triangular factor from the QR decomposition, wrapped as an UpperTriangular matrix so the fast triangular solver is dispatched) are already defined:

using LinearAlgebra

mul!(x, A', b)   #x ← A'b
ldiv!(R', x)     #x ← (R')⁻Âčx  (forward substitution)
ldiv!(R, x)      #x ← R⁻Âčx    (back substitution)

Note that we effectively have to solve two systems of equations, one for R⊀ and one for R. Fortunately, both are triangular matrices and can be solved in O(nÂČ) time.
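
Putting those calls into a complete, runnable sketch (the sizes, random data, and variable names are purely illustrative):

using LinearAlgebra

A = [randn(100, 2) ones(100)]    #hypothetical predictors plus an intercept column
b = randn(100)

R = UpperTriangular(qr(A).R)     #only the R factor is kept
x = zeros(size(A, 2))

mul!(x, A', b)                   #x ← A'b
ldiv!(R', x)                     #x ← (R')⁻Âčx
ldiv!(R, x)                      #x ← R⁻Âčx

x ≈ A \ b                        #matches the usual least-squares solution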

Givens Rotations

When you take a subset of columns from A, you then have to remove those same columns from R. This will more than likely cause R to no longer be an upper triangular matrix, defeating the purpose of saving it in the first place. Our saving grace here is a technique called Givens rotations. These rotations are represented by orthogonal matrices that rotate a vector counterclockwise by a desired angle within a given plane. These additional matrices can be "absorbed" into Q because they are orthogonal and, more importantly, they have the property of "zeroing out" any desired value in a matrix.

This might be best demonstrated with an example. Take the following matrix A and its QR decomposition:

$$
A = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \qquad
Q = \begin{bmatrix} 0 & 0 & -1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{bmatrix} \qquad
R = \begin{bmatrix} -1 & -1 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}
$$

You can check that these matrices satisfy all the required properties. Q should be an orthogonal matrix and R should be an upper triangular matrix. Now say we wanted to solve a linear system but remove the second column in A. Using Q-less decomposition, we know only R needs to be modified to remain an upper triangular matrix. Here Rsub only contains the 1st and 3rd columns of R.

$$
R_{sub} = \begin{bmatrix} -1 & -1 \\ 0 & 0 \\ 0 & -1 \end{bmatrix}
$$

Rsub needs to be upper triangular; however, this is not the case after removing the second column. To "zero out" the last entry in Rsub we can apply the following Givens rotation:

$$
G = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(\theta) & \sin(\theta) \\ 0 & -\sin(\theta) & \cos(\theta) \end{bmatrix}
  = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}
$$

Here Ξ = −π/2, but in practice we never need to explicitly calculate its value.

$$
G R_{sub} = \begin{bmatrix} -1 & -1 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}
$$

The product GRsub reestablishes the upper-triangular requirement and, because G is orthogonal, the QR decomposition remains valid. What's great about this approach is that G never needs to be explicitly formed, as we are just modifying individual entries in Rsub. What's more, we don't have to worry about what happens to Q because we have a solution that only depends on R.
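
The same rotation can be constructed with the givens function from LinearAlgebra, which the code in the next section relies on. A small sketch reproducing the example above (up to a possible sign difference in the rotation):

using LinearAlgebra

#Rsub from the example: R with its second column removed
Rsub = [-1.0 -1.0;
         0.0  0.0;
         0.0 -1.0]

#Rotation acting on rows 2 and 3 that zeros the (3,2) entry
G, r = givens(Rsub, 2, 3, 2)

lmul!(G, Rsub)   #Rsub is now upper triangular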

Putting it all together

The following Julia code shows how one might use this in practice. First we determine which columns to keep, next we loop over Rsub zeroing out any entries below the diagonal, and finally we solve the linear system.

#Loop through lower triangular matrix
#the 1st axis is reversed because we're moving non-zeros *up* above diagonal
lowTriIter(A::AbstractMatrix) = ((i,j) for i in reverse(axes(A,1)), j in axes(A,2) if i>j)

#x: preallocated solution vector, b: outcome vector
#A: design matrix restricted to the kept columns (so A'b matches x)
#R: R factor of the full matrix, Rsub: workspace with the same number of rows as R
function qless!(x,A,b,R,Rsub,columnsKeep)

    numCol = length(columnsKeep)

    #Selecting the columns to keep
    for (k,j) in enumerate(columnsKeep)
        @views Rsub[:,k] = R[:,j]
    end

    #Use a givens rotation to zero out values below diagonal
    for (i,j) in lowTriIter(Rsub)        
        if Rsub[i,j] ≠ 0.0
            G,r = givens(Rsub,i-1,i,j)
            Rsub[i-1,j] = r
            Rsub[i,j] = 0.0

            #Givens impact columns to the right
            for k in j+1:numCol
                (r1, r2) = Rsub[i-1,k], Rsub[i,k]
                Rsub[i-1,k] = G.c*r1 + G.s*r2
                Rsub[i,k] = -G.s*r1 + G.c*r2
            end
        end
    end

    Rv = UpperTriangular(view(Rsub,1:numCol,1:numCol))

    #Solve R'R*x = A'b
    mul!(x,A',b)
    ldiv!(Rv',x)
    ldiv!(Rv,x)

    return nothing
end
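
As a usage sketch (the sizes, random data, and column choices are purely illustrative), the function might be called like this:

using LinearAlgebra

#Hypothetical full design matrix: four predictors plus an intercept column
A = [randn(100, 4) ones(100)]
b = randn(100)

R = qr(A).R                           #factor the full matrix once

columnsKeep = [1, 3, 5]               #keep predictors 1 and 3 plus the intercept
Asub = view(A, :, columnsKeep)
Rsub = similar(R, size(R, 1), length(columnsKeep))
x = zeros(length(columnsKeep))

qless!(x, Asub, b, R, Rsub, columnsKeep)

x ≈ Asub \ b                          #matches refitting the subsystem from scratch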

Potential Numerical Instability

A way to measure how accurate we can expect a solution x to be for a given problem uses a property of A called the condition number, defined as:

$$\kappa(A) = \|A\| \cdot \|A^{-1}\|$$

Provided the condition number is small, the error in the solution to Ax=b will also remain small; roughly speaking, the relative error in the solution scales linearly with Îș(A). However, because we are using the normal equation form to solve our system, we effectively square the condition number:

$$\kappa(A^\top A) = \kappa(A)^2$$

This amplifies errors that occur due to machine precision. Where we might have gotten 9 digits of accuracy before, now we only get 3. This is, of course, dependent on the matrix A, so its condition number should be checked before using the Q-less decomposition. One rule of thumb is to check the resulting coefficients in x for values close to machine precision, which would indicate a problem. In that case we might fall back on the typical QR decomposition.
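
Julia's cond function makes this check straightforward. A small sketch, using a hypothetical matrix of mildly correlated predictors:

using LinearAlgebra

A = [1.0  1.0;
     1.0  1.001;
     1.0  2.0]

cond(A)        #condition number of A
cond(A' * A)   #roughly cond(A)^2, the price of the normal equations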