In Part 2 I discussed different scripting options for a command line environment for machine learning data analysis. In this final part, I want to mention the two areas where I currently see the most need for improvement.
You need some minimal editing capabilities on the command line to be productive. The most well-known project seems to be JLine. It is used by practically all scripting languages for their shells, for example JRuby, Groovy, and Scala. There is also an interface to GNU readline from Java, but readline is deliberately distributed under the GPL, which does not mix well with less restrictive licenses like BSD.
However, in its current form, JLine is quite buggy. Most importantly,
it lacks the convenient “Search Backward in History” feature, which I
use a lot to find lines in the history. I and many others
have forked JLine to clean up the code base and add features. For
example, I’ve added the search facility (works ok on Linux, try at
your own risk), and
Jason Dillon has cleaned up the code base.
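To make the search feature concrete, here is a minimal sketch of a backward substring search over a list of history lines. It is not the code from any of the JLine forks; the class `HistorySearch` and the method `searchBackward` are hypothetical names chosen for illustration, and a real implementation would also have to handle incremental key input and redrawing the prompt.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of JLine: reverse substring search over history lines.
class HistorySearch {
    /**
     * Returns the index of the most recent history entry before `from`
     * (exclusive) that contains `term`, or -1 if there is none.
     */
    static int searchBackward(List<String> history, String term, int from) {
        int start = Math.min(from, history.size()) - 1;
        for (int i = start; i >= 0; i--) {
            if (history.get(i).contains(term)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> history = Arrays.asList(
            "x = load('data.txt')",
            "plot(x)",
            "y = x * 2",
            "plot(y)");
        // Start from the end of the history, as Ctrl-R does in readline.
        int hit = searchBackward(history, "plot", history.size());
        System.out.println(history.get(hit));
        // Searching again from the previous hit finds the next older match.
        hit = searchBackward(history, "plot", hit);
        System.out.println(history.get(hit));
    }
}
```

Passing the index of the previous hit as `from` is what lets repeated key presses step further back through older matches.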
Still, JLine is actually quite a hack. It uses the
stty command to
control the terminal, meaning that it integrates quite poorly with
changes of the terminal window size, or signals. On Windows, it has
the annoying bug that you cannot see the cursor as you move it around.
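For readers who have not seen this trick, the following is a rough sketch of how a Java program can drive the terminal through stty on a Unix-like system. It illustrates the general technique only and is not JLine's actual code; the class `SttySketch` and its methods are hypothetical names, and the sketch assumes that `sh`, `stty` and `/dev/tty` are available.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of terminal control via the external stty command.
class SttySketch {
    /** Builds the shell command that runs stty against the controlling terminal. */
    static List<String> sttyCommand(String sttyArgs) {
        // The child process gets a pipe as stdin by default, not the terminal,
        // so stty has to be pointed at /dev/tty explicitly.
        return Arrays.asList("sh", "-c", "stty " + sttyArgs + " < /dev/tty");
    }

    /** Runs stty and returns whatever it printed on stdout. */
    static String stty(String sttyArgs) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(sttyCommand(sttyArgs)).start();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = p.getInputStream()) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        p.waitFor();
        return out.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        String saved = stty("-g");          // save the current settings
        try {
            stty("-icanon min 1 -echo");    // character-at-a-time input, no echo
            int c = System.in.read();       // returns after a single key press
            System.out.println("key code: " + c);
        } finally {
            stty(saved);                    // always restore the terminal
        }
    }
}
```

Every change of terminal state costs a new process, and the JVM is not told when the window is resized or a signal arrives, which is why this approach integrates so poorly with both.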
Some work should be put into cleaning up the code base and adding sensible terminal control and more features, but since it sort of works, nobody (including me, of course) feels the urge or has the time to really do something about it.
Concerning the plotting library, probably the most well-known is JFreeChart, but I’m not really satisfied with that library for a number of reasons. Although it is open source, you have to buy a book to get some decent documentation (javadocs are available, though). JFreeChart produces some nice plots, but I think they are closer to what you get in Excel than to what MATLAB provides. JFreeChart also comes with its own classes for handling the data, which means that you have to copy your data into those structures to display it. There are some more options, but none of them seems as feature-rich as JFreeChart.
One other problem is that printing is more or less broken under Linux when you’re relying on CUPS. On my Debian box, I invariably get a “No Printing Services found” error every time I try to print from any Java program. There are also some bugs which haven’t been fixed in years. The bottom line is that you cannot really rely on the built-in printing capabilities of Java to generate plots for your paper, which is really a shame.
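If you want to check what your own JVM sees, the standard `javax.print` API can list the print services it discovers. The following is only a small diagnostic sketch (the class name `PrintServiceCheck` is made up for this example); it does not fix anything, it just shows whether the lookup comes back empty on your machine.

```java
import javax.print.PrintService;
import javax.print.PrintServiceLookup;

// Hypothetical diagnostic: list the print services visible to the JVM.
class PrintServiceCheck {
    /** Returns the names of all print services the JVM can discover. */
    static String[] printServiceNames() {
        // Passing null for both flavor and attributes asks for all services.
        PrintService[] services = PrintServiceLookup.lookupPrintServices(null, null);
        String[] names = new String[services.length];
        for (int i = 0; i < services.length; i++) {
            names[i] = services[i].getName();
        }
        return names;
    }

    public static void main(String[] args) {
        String[] names = printServiceNames();
        if (names.length == 0) {
            System.out.println("No print services visible to the JVM");
        }
        for (String name : names) {
            System.out.println(name);
        }
        // The default service may be null even when printers are configured.
        PrintService def = PrintServiceLookup.lookupDefaultPrintService();
        System.out.println("default: " + (def == null ? "none" : def.getName()));
    }
}
```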
So in summary, there are two main missing features: a feature-rich, stable readline replacement, and a flexible plotting solution which also prints.
I haven’t talked about this at all until now, but of course there are also already several machine learning toolboxes in Java or other JVM-related languages. These projects are still more or less ignorant of one another, so more work would be required to write some common interfaces. Here is just a short list to get you started; also have a look at mloss.org.
Don’t hesitate to post more links in the comments!
Posted by Mikio L. Braun at 2010-04-19 12:55:00 +0200