jblas: 1.1 Released and Some Examples

Monday, August 16, 2010

I have just released jblas 1.1 (release notes). The main additions are Singular Value Decomposition (both “full” and “sparse”) and a fix for a nasty bug with complex return values. Basically, g77 and gfortran treat complex return values in drastically different ways, leading to some very odd bugs resulting from a confused stack frame.

Unfortunately, I also had to remove support for 32-bit Mac OS X from the precompiled jar file. The reason is that I lost access to the 32-bit machine I originally used to compile ATLAS for Mac, and every other machine I could get my hands on runs 64-bit. I also didn’t quite get the 32-bit version of gcc working. But if you have a 32-bit machine and need jblas, please contact me and I’ll help you compile the beast.


To make up for that, here are two examples of how to use jblas. These are taken from my poster at this year’s MLOSS workshop at ICML.

You can download the slides here.

I’ve also put the source up on github. Note that these sources are not meant as an example of good coding style or project layout; they simply show how to use jblas!


Kernel Ridge Regression

The first example is taken from machine learning. Let’s implement the following: a noisy sinc data set learned with Kernel Ridge Regression (KRR) with a Gaussian kernel.

The data set

First, we define the sinc function, which is usually defined as follows (up to scaling of the $x$):

$$\operatorname{sinc}(x) = \begin{cases} \dfrac{\sin(x)}{x} & x \neq 0, \\ 1 & x = 0. \end{cases}$$

A first attempt to code this in Java using jblas just focuses on the part where $x \neq 0$:

  DoubleMatrix sinc(DoubleMatrix x) {
    return sin(x).div(x);
  }

Note how sin and div “vectorize”, operating elementwise on all the entries of the matrix.

Now let’s add the $x = 0$ case, where the division would produce NaN. Since sin and div operate elementwise, we can fix the code like this:

  DoubleMatrix safeSinc(DoubleMatrix x) {
    DoubleMatrix xIsZero = x.eq(0);
    return sin(x).div(x.add(xIsZero)).add(xIsZero);
  }

Note how we first “patch” the denominator where x is zero (so the division yields 0 instead of NaN), and then add 1 back at those points to get the required output.
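The same trick in scalar form makes the mechanics explicit: at $x = 0$ the denominator becomes $0 + 1 = 1$, $\sin(0) = 0$, and the final addition restores the value $1$. Here is a plain-Java sketch (no jblas; the helper name is made up for illustration):

```java
public class SincPatch {
    // Scalar version of the zero-patching trick used in safeSinc.
    // At x == 0: sin(0) / (0 + 1) + 1 == 1, the correct limit value.
    static double patchedSinc(double x) {
        double isZero = (x == 0.0) ? 1.0 : 0.0;
        return Math.sin(x) / (x + isZero) + isZero;
    }

    public static void main(String[] args) {
        System.out.println(patchedSinc(0.0));     // prints 1.0
        System.out.println(patchedSinc(Math.PI)); // close to 0.0
    }
}
```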

Next, we draw some data from the sinc function and add noise. Our data model looks like this:

$$y_i = \operatorname{sinc}(x_i) + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2),$$

with the $x_i$ drawn uniformly from $[-4, 4]$.

The following function generates a data set by returning an array of two DoubleMatrices representing $X$ and $Y$.

  DoubleMatrix[] sincDataset(int n, double noise) {
    DoubleMatrix X = rand(n).mul(8).sub(4);
    DoubleMatrix Y = sinc(X).add(randn(n).mul(noise));
    return new DoubleMatrix[] {X, Y};
  }

The Gaussian kernel

For KRR, we need to compute the whole kernel matrix. The kernel function is defined as

$$k(x, z) = \exp\left( -\frac{\|x - z\|^2}{w} \right).$$

You can easily compute the kernel matrix using Geometry.pairwiseSquaredDistances().

  DoubleMatrix gaussianKernel(double w,
                              DoubleMatrix X,
                              DoubleMatrix Z) {
    // data points are stored as rows, but
    // pairwiseSquaredDistances() expects them as columns
    DoubleMatrix d =
      Geometry.pairwiseSquaredDistances(X.transpose(), Z.transpose());
    return exp(d.div(w).neg());
  }

Kernel Ridge Regression

KRR learns a “normal” kernel model of the form

$$f(x) = \sum_{i=1}^n \alpha_i k(x, x_i)$$

with

$$\alpha = (K + \lambda I)^{-1} Y,$$

where $K$ is the kernel matrix with entries $K_{ij} = k(x_i, x_j)$.

With jblas, you would compute the $\alpha$ as follows:

  DoubleMatrix learnKRR(DoubleMatrix X, DoubleMatrix Y,
                        double w, double lambda) {
    int n = X.rows;
    DoubleMatrix K = gaussianKernel(w, X, X);
    K.addi(eye(n).mul(lambda)); // add the ridge term lambda * I
    DoubleMatrix alpha = Solve.solveSymmetric(K, Y);
    return alpha;
  }

  DoubleMatrix predictKRR(DoubleMatrix XE, DoubleMatrix X,
                          double w, DoubleMatrix alpha) {
    DoubleMatrix K = gaussianKernel(w, XE, X);
    return K.mmul(alpha);
  }

The function predictKRR computes predictions on new points stored as rows of the matrix XE.

Computing the mean-squared error

Finally, in order to compute the error of our fit, we would like to compute the mean-squared error $\frac{1}{n} \sum_{i=1}^n (y_i - y'_i)^2$.

In jblas:

  double mse(DoubleMatrix Y1, DoubleMatrix Y2) {
    DoubleMatrix diff = Y1.sub(Y2);
    return pow(diff, 2).mean();
  }
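To see the whole pipeline without any jblas dependency, here is a plain-Java sketch of the same computation on a tiny 1-D data set. All names here are hypothetical; the kernel matches gaussianKernel above, and the linear system is solved by naive Gaussian elimination instead of Solve.solveSymmetric:

```java
public class KrrSketch {
    // Gaussian kernel k(x, z) = exp(-(x - z)^2 / w), as in gaussianKernel.
    static double kernel(double x, double z, double w) {
        return Math.exp(-(x - z) * (x - z) / w);
    }

    // Solve A t = b by Gaussian elimination with partial pivoting.
    static double[] solve(double[][] A, double[] b) {
        int n = b.length;
        double[][] M = new double[n][n + 1]; // augmented matrix [A | b]
        for (int i = 0; i < n; i++) {
            System.arraycopy(A[i], 0, M[i], 0, n);
            M[i][n] = b[i];
        }
        for (int col = 0; col < n; col++) {
            int piv = col; // pick the largest pivot in this column
            for (int r = col + 1; r < n; r++)
                if (Math.abs(M[r][col]) > Math.abs(M[piv][col])) piv = r;
            double[] tmp = M[col]; M[col] = M[piv]; M[piv] = tmp;
            for (int r = col + 1; r < n; r++) {
                double f = M[r][col] / M[col][col];
                for (int c = col; c <= n; c++) M[r][c] -= f * M[col][c];
            }
        }
        double[] t = new double[n]; // back substitution
        for (int i = n - 1; i >= 0; i--) {
            double s = M[i][n];
            for (int c = i + 1; c < n; c++) s -= M[i][c] * t[c];
            t[i] = s / M[i][i];
        }
        return t;
    }

    // alpha = (K + lambda I)^{-1} y
    static double[] learnKrr(double[] x, double[] y, double w, double lambda) {
        int n = x.length;
        double[][] K = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                K[i][j] = kernel(x[i], x[j], w) + (i == j ? lambda : 0.0);
        return solve(K, y);
    }

    // f(xe) = sum_i alpha_i k(xe, x_i)
    static double predict(double xe, double[] x, double[] alpha, double w) {
        double s = 0.0;
        for (int i = 0; i < x.length; i++) s += alpha[i] * kernel(xe, x[i], w);
        return s;
    }

    static double sinc(double x) { return x == 0.0 ? 1.0 : Math.sin(x) / x; }

    public static void main(String[] args) {
        double[] x = {-4, -2, 0, 2, 4};
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; i++) y[i] = sinc(x[i]);
        double[] alpha = learnKrr(x, y, 1.0, 1e-6);
        double mse = 0.0; // training mean-squared error
        for (int i = 0; i < x.length; i++) {
            double d = predict(x[i], x, alpha, 1.0) - y[i];
            mse += d * d;
        }
        System.out.println("training MSE = " + mse / x.length); // tiny
    }
}
```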

Conjugate Gradients

As a second example, let’s implement the conjugate gradients algorithm. This is an iterative algorithm for solving linear equations of the form $Ax = b$, where $A$ is symmetric and positive definite.

The pseudo-code looks as follows:

  1. $r \gets b - Ax$
  2. $p \gets r$
  3. repeat
  4.   $\alpha \gets \frac{r^T r}{p^T A p}$
  5.   $x \gets x + \alpha p$
  6.   $r' \gets r - \alpha A p$
  7.   if $r'$ is sufficiently small, exit loop
  8.   $\beta \gets \frac{r'^T r'}{r^T r}$
  9.   $p \gets r' + \beta p$
  10.  $r \gets r'$
  11. end repeat

In jblas, the algorithm looks as follows (numbers in comments indicate steps in the algorithm above.)

  DoubleMatrix cg(DoubleMatrix A, DoubleMatrix b,
                  DoubleMatrix x, double thresh) {
    int n = x.length;
    DoubleMatrix r = b.sub(A.mmul(x)); // 1
    DoubleMatrix p = r.dup();          // 2
    double alpha = 0, beta = 0;
    DoubleMatrix r2 = zeros(n), Ap = zeros(n);
    while (true) {                     // 3
      A.mmuli(p, Ap);
      alpha = r.dot(r) / p.dot(Ap);    // 4
      x.addi(p.mul(alpha));            // 5
      r.subi(Ap.mul(alpha), r2);       // 6
      double error = r2.norm2();       // 7
      System.out.printf("Residual error = %f\n", error);
      if (error < thresh)
        break;
      beta = r2.dot(r2) / r.dot(r);    // 8
      r2.addi(p.mul(beta), p);         // 9
      DoubleMatrix temp = r;           // 10
      r = r2;
      r2 = temp;
    }
    return x;
  }

For better speed, I have tried to reduce the number of matrix allocations and used the in-place variants of the arithmetic operations where possible. These kinds of tweaks can often give some performance boost. For example, Ap is used to hold the result of $Ap$, and so on.

As a side-effect, the assignment $r \gets r'$ actually becomes a swap of the matrices stored in r and r2.
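To sanity-check the translation, here is the same loop on plain double arrays (a sketch without jblas; the class and helper names are made up), solving a small symmetric positive definite system:

```java
public class CgSketch {
    // Conjugate gradients on plain arrays, following steps 1-11 above.
    // Here the residual is updated in place, merging steps 6 and 10.
    static double[] cg(double[][] A, double[] b, double[] x, double thresh) {
        int n = x.length;
        double[] r = new double[n], p = new double[n], Ap = new double[n];
        for (int i = 0; i < n; i++) {               // 1, 2
            r[i] = b[i] - dotRow(A, i, x);
            p[i] = r[i];
        }
        while (true) {                              // 3
            for (int i = 0; i < n; i++) Ap[i] = dotRow(A, i, p);
            double rr = dot(r, r);
            double alpha = rr / dot(p, Ap);         // 4
            double err2 = 0.0;
            for (int i = 0; i < n; i++) {
                x[i] += alpha * p[i];               // 5
                r[i] -= alpha * Ap[i];              // 6, 10 (in place)
                err2 += r[i] * r[i];
            }
            if (Math.sqrt(err2) < thresh) return x; // 7
            double beta = err2 / rr;                // 8
            for (int i = 0; i < n; i++) p[i] = r[i] + beta * p[i]; // 9
        }
    }

    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    static double dotRow(double[][] A, int row, double[] v) {
        return dot(A[row], v);
    }

    public static void main(String[] args) {
        double[][] A = {{4, 1}, {1, 3}};            // symmetric positive definite
        double[] b = {1, 2};
        double[] x = cg(A, b, new double[] {0, 0}, 1e-10);
        System.out.println(x[0] + " " + x[1]);      // approx 0.0909 and 0.6364
    }
}
```

For this 2x2 system CG converges in two iterations, as expected for an SPD matrix with two distinct eigenvalues.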

Posted by Mikio L. Braun at 2010-08-16 16:55:00 +0200
