What’s with all the anti-intellectualism in the “Java-killers”?

April 16, 2012

I’m guessing it’s their need to dethrone Scala as the Java-killer that explains why the copy for this latest round of mainstream language wannabes – Ceylon, Kotlin, Fantom, etc. – includes an absurd amount of anti-intellectualism.

Since Scala was designed by a famous researcher (never mind the fact that Odersky was also responsible for Java generics and the freakin’ Java compiler!), the go-to criticism from those pushing languages still in their infancy is that Scala is too “academic”, “ivory-tower”, etc. Never mind that this isn’t true; it betrays a disturbing unwillingness to learn. That’s counterintuitive, given that we’re talking about software developers here. These are supposed to be tech-savvy, curious early adopters.

The rhetoric in some of the press-releases/blog-posts about these new languages is absurd. I’m going to pick on Fantom because I was just reading an article about it when this outrageous paragraph prompted me to post:

Fantom is a language developed for software engineers by software engineers. We didn’t create Fantom as an academic experiment to publish research papers. We take pride in the fact that Fantom is a “boring language” — a workhorse, everyday language to build large software systems. Fantom is easy to learn and bundled with all the tools a typical project needs…

I mean, damn, throw in a couple references to “cheesy grits” and this language could be running for president! Fantom is a good, honest hardworking language. Not one of those uppity, fancy-pants languages.

I agree there is something to the idea that simple tasks require simple solutions, and that you can’t expect experienced programmers to throw out everything they know when trying to learn a new language for production (spare-time languages are another story entirely). That’s why I like the functional/OO hybrid provided by Scala – it allows for a continuum of use. I similarly get annoyed when someone asks a simple question on, say, the Scala mailing list, and is met with many references to comonads (though this represents a tiny minority of mailing list activity – nearly everyone there is awesomely helpful).

However, the idea is to provide a language that gives developers the tools to write better software, along with a smooth “upgrade” path – not to abandon all these unfamiliar (read: statically typed and functional) features entirely in the name of familiarity. “Easy to learn” is just another way to say “business as usual.” And having written a fairly gimungous amount of “enterprise software”, I can say that “business as usual” definitely does not cut it. Maybe I’m just doing it wrong, but the amount of time spent wrestling with the paradigm when writing OO software is just absurd – at least half of the patterns in the GOF book have a simpler name: “higher order function.”

In conclusion: I applaud the Scalas and C#’s of the world, whose enthusiastic embrace of the functional paradigm has done more for my understanding of how to write good software than a thousand pattern books to the face.

P.S. I don’t know if anyone else has followed the series of Ceylon blog posts by Gavin King, but I get the sense that even though he started in the anti-intellectual camp, the process of designing the language has slowly been pushing him over into serious language-geek territory. The language that started out by criticizing the “ivory tower” syntax for function calls in languages like Haskell and ML is now probably going to include such arcana as intersection types. I love it! I used to think the language seemed dumb, but now I think the combination of untagged union types, intersection types, and variadic generics could lead to a really interesting language. I’m excited to see how it turns out, and I recommend those blog posts to anyone interested in a smart guy’s exploration of the insanity of F-sub.


Non-empty lists

March 10, 2012

I recently watched a video about the library scalaz, which among other things discussed a “non-empty list” data structure provided by the library. I very much enjoyed the moment when someone in the audience asked “isn’t that just Cons?”… This is an awesome implication of the fact that in Scala, data constructors are actually full-blown types, something that functional programming guys – like myself – can easily forget.

Of course, scalaz’s NonEmptyList is a bit more than just Cons (or :: or whatever it’s called in Scala’s standard library), because it implements standard collection functions like map, flatMap, etc. such that they preserve the NonEmptyList result type.

Anyhow, this video, and my subsequent investigation of scalaz’s NonEmptyList, got me thinking about this data type. List is such a fundamental data structure – lists and folds on lists (remember, a data type is completely defined by its catamorphism) basically embody the concept of induction. So, it seemed like NonEmptyList should have some sort of similarly deep meaning, and a similarly pithy definition.
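To illustrate the catamorphism remark above: everything you can do to a List factors through its fold. A quick sketch (names are mine, for illustration only):

```scala
object ListCata {
  // foldRight is List's catamorphism: supply a value for Nil
  // and a function for Cons, and the data type does the rest
  def cata[A, B](xs: List[A])(nil: B)(cons: (A, B) => B): B =
    xs match {
      case Nil    => nil
      case h :: t => cons(h, cata(t)(nil)(cons))
    }

  // familiar list functions are just instances of the fold
  def length[A](xs: List[A]): Int = cata(xs)(0)((_, n) => n + 1)

  def map[A, B](xs: List[A])(f: A => B): List[B] =
    cata(xs)(List.empty[B])((a, acc) => f(a) :: acc)
}
```

In this sense the fold completely characterizes the data type: any function out of List can be written this way.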

I was sort of disappointed to find that the scalaz definition of NonEmptyList actually appeals to the definition of List directly, looking something like this (scalaz doesn’t use a case class, but this gets the idea across):

case class NonEmptyList[+A](hd: A, tl: List[A])

I started reading a Haskell-cafe thread about NonEmptyLists, and saw Edward Kmett (who is one of my favorite writers on functional programming) use a NonEmptyList that was defined in a very similar way, once again appealing to the definition of List:

data NE a = NE a [a] deriving (Eq,Ord,Show,Read)

I felt kind of dissatisfied with these definitions. I find using the definition of List inside NonEmptyList to be sort of ugly – it seems to get in the way of understanding the essence of NonEmptyList, taken as its own concept. From a library implementer’s perspective, I understand the practicality – plenty of functions are already written to work with lists, and this is probably a fairly efficient encoding, performance-wise, as only the first cons cell needs to have any special information compared to a regular List.

With all those practical considerations aside, I thought this was a much slicker definition:

case class NonEmptyList[+A](hd: A, tl: Option[NonEmptyList[A]])

This seems to get a bit more at the fundamental nature of NonEmptyLists. Of course it is trivially isomorphic to the above definition, but I like it better since it’s a bit more self-contained – Option is simpler than List (something like forall X. 1 + X vs. forall X. mu Y. 1 + X * Y).
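For concreteness, here’s a sketch of the Option-based definition fleshed out with a couple of operations (the method names and the conversion helper are mine, not scalaz’s):

```scala
case class NonEmptyList[+A](hd: A, tl: Option[NonEmptyList[A]]) {
  // map preserves non-emptiness right in the result type
  def map[B](f: A => B): NonEmptyList[B] =
    NonEmptyList(f(hd), tl map { _ map f })

  // forget the non-emptiness and recover an ordinary List
  def toList: List[A] =
    hd :: (tl map { _.toList } getOrElse Nil)
}

object NonEmptyList {
  // the isomorphism with the (hd, List) encoding, for comparison
  def fromHdTl[A](hd: A, tl: List[A]): NonEmptyList[A] =
    tl match {
      case Nil     => NonEmptyList(hd, None)
      case x :: xs => NonEmptyList(hd, Some(fromHdTl(x, xs)))
    }
}
```

Note that map never has to consider an empty case at the top level – the possibility of “nothing more” lives entirely inside the Option.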

As far as I understand it, this is the equivalent of calling someNonEmptyList.tail.toNel (toNel produces an Option[NonEmptyList[A]]) in the scalaz library, so they certainly support using it this way. Still, I think it’s slicker to do it by default rather than relying on extension methods on List. Don’t get me wrong, I’m not so foolish as to think they hadn’t considered doing it the way I propose – these guys know a hell of a lot more about types and FP than I do (I’ve quoted their blogs a bunch, and I just bought a pre-release of their upcoming book).

As I continued to read the Haskell-cafe thread, I was excited to see that the next (and final) post was by Conor McBride, and showed that same slick formulation:

data NE x = x :> Maybe (NE x)

Conor McBride is a rock-star of type theory – the guy who showed that the definition of “differentiation” (think calculus) can be extended to datatypes! What a boss.

It’s always a fun experience when you see people discussing a problem, you think “why don’t you solve it *this* way??”, and then someone whom you look up to offers that same idea. The occasional affirmation that you’re not totally out to lunch. Unfortunately for me, this only ever seems to happen while looking back at things – if I could just have decent ideas *before* other people have them, then I might be on to something.

All kidding aside, I feel like this phenomenon is common among many aspiring academics and budding programmers (at least among my friends and me): first you start reinventing wheels that were invented 50 years ago, then 20, then 5, then 1, then before you know it you’re actually doing some original research!

Anyhow, I just really liked the idea of NonEmptyList, and this particular slick little formulation, and wanted to share it with everybody. Above I speculated about whether NonEmptyList has some similarly deep meaning to List’s – I’ll let you know if I read of (or think of) one.

Java VisualVM is a sweet, free Scala profiler that comes with the JDK!

March 7, 2012

Scala is a very sweet language, but many of its abstractions come with hidden costs, so you might sometimes wind up writing some really slick code that is dog-slow, and be wondering where to start.

That’s where VisualVM comes in. You probs already have it installed on your computer. This was an especially pleasant surprise for me: as primarily a .NET developer, I’m used to the decent profilers being expensive (my favorite is the one that comes with Visual Studio Ultimate, as I think sampling > instrumenting).

Just use Launchy to open up Search Everything, type in “visualvm”, and open the program. Then:

  1. Start running the code you want to profile inside IntelliJ.
  2. You’ll see a new java icon, representing the running process, appear under the “Local” section of the leftmost “Applications” pane of VisualVM – right-click the icon.
  3. From the menu that appears, click “Sample”.
  4. Now wait a while for it to collect enough data, stop running your program, and you should have a sweet list of the hotspots in your code.
  5. When you want to dig into a hotspot and figure out the stack trace leading up to it, select it and click the “Snapshot” button. Bam!

Of course, this isn’t perfect – in the ~10 minutes I’ve been playing with it, I still haven’t figured out how to get good line numbers. Perhaps running in debug mode will add better symbols for the tool to take advantage of. Even so, the traces are more than enough to point you in the right direction.

Thanks to http://visualvm.java.net/gettingstarted.html and http://stackoverflow.com/questions/1340082/scala-profiler for pointing me to this stuff!


Scala code to generate Markov chains from a text file

March 6, 2012

(Technical note: the following code is not quite correct, as it can lead to 0 probability for picking certain words, which isn’t right… the correct approach is to have a uniform prior (i.e. add 1 to everything). I’ll update it soon. Learning is fun!)

As a followup to my last post on Markov chains, I thought I’d post some code I wrote to generate those sorts of word-chains given a training .txt file.

I will be putting this example along with some of my helper libraries and other introductory machine learning examples up on my GitHub in the near future. I’ve been self-studying machine learning for a while now, and writing little models and snippets of code here and there, but I thought it would be a good exercise to have a bit more discipline about it and try to make public a set of worked examples, factored into a mini-library of sorts.

I’ve been using Project Gutenberg as a nice free source of training text, and the code below is configured to use a local copy of War and Peace as the training data. Just set the path to point to wherever you download the text. If you can’t get this stuff to work, keep an eye out for some posts about setting up Scala + IntelliJ that I’ve been putting off writing…

Here’s some sample output:

the mob pointing to act and earned him princess mary listened to
the disordered battalion three and wait said something to be a small
officer said his coat over the village of count bezukhov and yellow
legs went on all the strange voice this a memorandum is so on their
masters and he moved on the country seat of the dorogomilov gates of
oldenburg and there to commit murder he had saved by the young
armenian family had taken from the meeting and the morning
alpatych donned a quiet gentle irony because it offended by its all
that a third of moscow no farther end in nothing he was such


Anyhow, here’s the code (apologies for the funky formatting, I know I should just use Gists or figure out a blog plugin or something, but it’s haaaaard. Wah.)

object StandaloneMarkovChainGenerator {

  // replace this with a path containing your training text
  val localPath = "C:\\Users\\Luke\\Desktop\\Files For Machine Learning\\"

  def readLocalTextFile(path: String): String =
    io.Source.fromFile(localPath + path).mkString

  def getWordSequenceFromString(str: String): Seq[String] =
    str.toLowerCase filter { c => c.isLetter || c.isSpaceChar } split "\\s+"

  def getWordSequenceFromLocalFile(fileName: String): Seq[String] =
    getWordSequenceFromString(readLocalTextFile(fileName))

  def getConsecutivePairs[T](items: Seq[T]): Seq[(T, T)] =
    items zip (items drop 1)

  def getCountsDouble[T](items: Seq[T]): Map[T, Double] =
    items groupBy identity mapValues { _.length.toDouble }

  // a distribution is just a thunk that samples a value when called
  type Dist[+T] = () => T

  def getWeightedCasesDistribution[T](
    weightedCases: Seq[(T, Double)]): Dist[T] = {

    val cases = weightedCases map { _._1 }
    val weights = weightedCases map { _._2 }
    val summedWeights = weights.scan(0d) { _ + _ } drop 1
    val sumOfWeights = summedWeights.last
    val probs = summedWeights map { _ / sumOfWeights }
    val casesAndProbs = cases zip probs

    // walk the cumulative distribution until it exceeds the roll
    { () =>
      val roll = scala.math.random
      casesAndProbs find {
        case (_, prob) => prob > roll
      } map { _._1 } getOrElse sys.error("Impossible!")
    }
  }

  def generateMarkovChainFromFile(
    fileName: String, startWord: String = "the"): Stream[String] = {

    val words = getWordSequenceFromLocalFile(fileName)
    val pairs = getConsecutivePairs(words)
    val pairsWithCounts = getCountsDouble(pairs)

    val wordsToFollowingWordsAndCounts =
      pairsWithCounts.toSeq
        .map { case ((a, b), num) => (a, (b, num)) }
        .groupBy { case (a, (b, num)) => a }
        .mapValues { _ map { case (a, (b, num)) => (b, num) } }

    val sortedByFrequency =
      wordsToFollowingWordsAndCounts mapValues { _ sortBy { _._2 } }

    // cache the sampling functions so each word's distribution is built once
    var wordsToNextWordDists = Map.empty[String, Dist[String]]

    def pickNext(word: String): String = {
      val possibleNextWordsAndWeights =
        sortedByFrequency.getOrElse(word, Seq.empty)
      if (possibleNextWordsAndWeights.isEmpty) return startWord
      if (!(wordsToNextWordDists contains word))
        wordsToNextWordDists = wordsToNextWordDists.updated(
          word, getWeightedCasesDistribution(possibleNextWordsAndWeights))
      wordsToNextWordDists(word)()
    }

    Stream.iterate(startWord)(pickNext)
  }

  def main(args: Array[String]) {
    // you can replace this with a different book if you'd like
    val filePath = "MarkovChain\\WarAndPeace.txt"
    val chain = generateMarkovChainFromFile(filePath)

    println(chain take 100 reduce { _ + " " + _ })
  }
}

Miles Sabin’s Shapeless is awesome!

March 6, 2012

So, I just found out about this cool Scala library, Shapeless.

It contains (among many things) what the author describes as “the mother of all HList implementations.” If you’ll remember, I did a post on using HLists to kinda-sorta simulate variable-arity polymorphism, as a followup to my very first post, which used Typed Racket’s variable-arity features to make an “fmap*” function that “lifts” functions of arbitrary arity into functions over functorized values – in that case, the functor was the type of probability distributions.

Shapeless has the ability to convert from functions of arbitrary arity to functions which accept a single HList parameter. Using that, he’s defined a variadic lifting function, lift0, which takes a function (A, B, C, ...) => D and turns it into (Option[A], Option[B], Option[C], ...) => Option[D]. This is exactly my holy-grail “fmap*” for the Option functor, and it’s done perfectly!
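To make the idea concrete without any shapeless machinery, here’s a hand-rolled sketch of a single fixed-arity case for Option (the object and method names here are mine; shapeless’s contribution is doing this once for every arity via its HList encoding, rather than writing lift2, lift3, … by hand):

```scala
object OptionLift {
  // lift a 2-ary function into the Option functor: the result is
  // defined only when both inputs are defined
  def lift2[A, B, C](f: (A, B) => C): (Option[A], Option[B]) => Option[C] =
    (oa, ob) => for (a <- oa; b <- ob) yield f(a, b)
}
```

Writing one of these per arity is exactly the boilerplate that shapeless’s HList-based conversion eliminates.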

In Typed Racket I had to (mostly) disable type checking to use fmap* without requiring a frightening amount of type annotations, which made it more of a moral victory than something practical. In my Scala attempts at HLists, I acknowledged the connection, but certainly didn’t get anywhere close to an ability to convert between regular functions and functions over HLists, nor did I know it was possible.

Anyhow, I just think it’s very cool that somebody solved this puzzle 1) which I didn’t know how to solve, 2) for which I doubted a solution was possible, and 3) packaged it all up into a nicely reusable library that borders on the practical. This puts Miles Sabin up there with Oleg for smashing apart my preconceptions of what is and isn’t possible in the static type systems of actually usable languages.

I recommend you check out the library. It’s mostly above my head, which is always a good sign. Next step is to use recursive types to develop a similar facility for curried functions…

A quick note on Markov chains, or “where does spam come from?”

February 26, 2012

(Technical note: the following explanation is not quite correct, as it can lead to 0 probability for picking certain words, which isn’t right… the correct approach is to have a uniform prior (i.e. add 1 to everything) but this seemed to me to be a little easier to explain…)

Have you ever wondered why spam has all those hilarious nonsensical strings of words in it? They’re pretty odd constructions… not as random as if you picked words randomly out of a hat, almost grammatical much of the time, but still clearly gibberish.

I know this has been addressed in a bazillion places (see for example this), but most explanations say that these are randomly generated words designed to fool your spam filter, without explaining the interesting part – why they appear to have more structure than completely random choices. So, I thought I’d get into it in a bit more detail.

What are they? The answer is found in the concept of Markov chains.

You take a piece of “training” text. Make a list of all the words in it. For each word, make a list of all the other words that come after it, with the number of times each appears. So with the sentence: “the quick brown fox jumped over the lazy dog”, I’d end up with the list:

1. the    -> (1, quick), (1, lazy) 
2. quick  -> (1, brown)
3. brown  -> (1, fox)
4. fox    -> (1, jumped)
5. jumped -> (1, over)
6. over   -> (1, the)
7. lazy   -> (1, dog)
8. dog    ->
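
In Scala, building that word-to-followers table is a short pipeline over consecutive word pairs (a sketch; the names are mine):

```scala
object FollowerCounts {
  // map each word to the words that follow it, with occurrence counts
  def followerCounts(text: String): Map[String, Map[String, Int]] = {
    val words = text.toLowerCase.split("\\s+").toSeq
    // pair each word with its successor, group by the leading word,
    // then count how often each following word appears
    (words zip words.drop(1))
      .groupBy(_._1)
      .map { case (lead, pairs) =>
        lead -> pairs.groupBy(_._2).map { case (follow, ps) => follow -> ps.size }
      }
  }
}
```

Note that the last word (“dog”) has no entry at all, since nothing ever follows it.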

Turn that into a matrix, where the rows represent the “leading” words and the columns represent “following” words, and each number in the matrix says how many times the following word appeared after the leading word. So we’d get:

       the quick brown fox jumped over lazy dog
the    0   1     0     0   0      0    1    0
quick  0   0     1     0   0      0    0    0
brown  0   0     0     1   0      0    0    0 
fox    0   0     0     0   1      0    0    0
jumped 0   0     0     0   0      1    0    0
over   0   0     0     0   0      0    1    0
lazy   0   0     0     0   0      0    0    1
dog    0   0     0     0   0      0    0    0

Now, divide every number in the matrix by the total of its row, and you’ll notice that each row becomes a sort of probability distribution.

       the quick brown fox jumped over lazy dog
the    0   0.5   0     0   0      0    0.5  0
quick  0   0     1     0   0      0    0    0
brown  0   0     0     1   0      0    0    0 
fox    0   0     0     0   1      0    0    0
jumped 0   0     0     0   0      1    0    0
over   0   0     0     0   0      0    1    0
lazy   0   0     0     0   0      0    0    1
dog    0   0     0     0   0      0    0    0

You can interpret this as saying “if the first word is a ‘the’ there’s a 50% chance the next word is ‘quick’, and a 50% chance the next word is ‘lazy’. For all the other words, there is only one possible word following it.”

Now so far this hasn’t been too interesting. Almost every word has only one possible following word because the text is so short. But, if you train it with a larger text, and interpret the rows as a probability distribution, you can start to see for every word what sort of word tends to follow it. This gives a very interesting insight into the nature of written text.

If you then take that big “transition matrix” you’ve trained from a large text, you can use it to actually generate new text in the following way:

1. Pick a “seed” word from the text at random. For best results use one with many possible following words.

2. Find the row in the matrix corresponding to that word. Choose the next word at random, weighted according to the probabilities in the row. That is, if the column corresponding to the word “blue” has the number .05 in it, you have a 5% chance of picking “blue” as the next word, and so on (when we divided each number by the total of its row we made sure that these probabilities would add up to 1).

3. Go back to step 2 using this second word as the new “seed” word. Continue this process to generate as long a string of words as you want. If you end up with a word that no other words follow (uncommon when you train on a large text, but possible – imagine if the last word of a novel was the only occurrence of the word “xylophone”, or whatever), just pick a random word.
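The core of step 2 is weighted sampling from the current word’s row. A minimal sketch (names are mine; I pass the roll in explicitly rather than calling a random number generator, so the behavior is easy to check):

```scala
object ChainStep {
  // pick the next word given a row of (word, probability) entries summing
  // to 1 and a roll in [0, 1): walk the cumulative distribution until the
  // running total exceeds the roll
  def pickNext(row: Seq[(String, Double)], roll: Double): String = {
    val cumulative =
      row.scanLeft(("", 0.0)) { case ((_, acc), (w, p)) => (w, acc + p) }.drop(1)
    cumulative.find { case (_, c) => c > roll }.map(_._1)
      .getOrElse(row.last._1) // guard against floating-point shortfall
  }
}
```

With the “the” row from the example above – 0.5 for “quick” and 0.5 for “lazy” – rolls below 0.5 pick “quick” and rolls at or above 0.5 pick “lazy”.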

You can see how strings of words generated with this method will follow the “trends” of the training data, meaning that if you were to generate a new transition matrix from the generated words it would, on average, look the same as the original transition matrix since you picked the words according to those weights. This completely mechanical process can generate data which looks, statistically, like meaningful English. Of course, it is not necessarily grammatical, and is certainly devoid of higher meaning since it was generated through this completely simplistic process.

Those “chains” of words constructed by the above process are an example of Markov chains. And they are (as far as I know) the answer to the question “where does spam come from?” posed above. Those uncannily-almost-grammatical ramblings below the “Viagra” ads, generated through the above process, are the spam-creators’ way of fooling your spam filter. They include these chains to give their advertisements statistical similarity to meaningful human correspondence. This works because the spam filters are (at least in part) using probabilistic models that depend on word-transitions and word frequencies to classify incoming email as spam. The spammers and the filter-writers are engaged in an eternal game of randomly-generated cat-and-mouse.

As with many algorithms in Bayesian statistics and machine learning, the fact that such a simple process can give birth to such seemingly, for lack of a better word, “intelligent” results, is mind-boggling.

Getting started with Scala, Part I: Rambling Introduction!

February 25, 2012

Until recently, my Scala experimentation has mostly made use of only code that I personally had written, and avoided the complexities of IDEs/build scripts/DLL hell/etc. The PushGP implementation I wrote uses no third-party libraries, and is all in one big file with a single “main” method!

I’ve been using C# with .NET at work for the last 4 years or so, and I’ve mostly figured out how to code in that environment by now (read: my co-workers, especially Sean, have with long-suffering patience taught me how to do what I need to do and let me copy their keybindings.) Up until recently with Scala, I’ve been trying to avoid learning how to really use the IDE and environment, concentrating instead on the neato things I could do with type systems and DSLs and whatnot.

Unfortunately, once you want to do anything besides write programming language implementations from scratch (thanks, built-in parser combinator library!), you’re going to need to reference other libraries. You’re going to need to learn how to build those libraries, which are frequently distributed as balls of source code. You’re going to need to learn how to make your IDE reference them from your project. And you’re going to need to learn wtf a JAR is. This, in my experience, is quite a pain.

So, I decided to put together a few bits of advice that might be useful to other people who are used to how rad C# + Studio + ReSharper is (or to a lesser extent F# – I’m looking at you, guy who was too lazy to let me color type names and variable names differently!), and are trying to get started with Scala. Read on through the next few posts and find out how!