Letterpress Cheater Algorithm

Sep 2013

A few months ago when Letterpress was hot, I thought about an efficient algorithm for a cheater application. A trivial solution in Ruby is quite slow for this problem, which motivated me to check out alternative approaches. Abbreviated, the problem is the following:

Given 25 random letters (letters), find every string in an array of strings (words) that consists of only those letters.

I’ll start by walking through a naive solution before presenting a data structure to solve this problem efficiently.

Naive algorithm

The naive approach for this problem is to simply loop through all elements of words and check whether these words can be formed from the characters in letters with a frequency map:

letters = "ovrkqlwislrecrtgmvpfprzey"

# Create the frequency map
letters = letters.each_char.inject(Hash.new(0)) { |map, char| (map[char] += 1) && map }

words.select { |word|
  word.each_char.inject(letters.clone) { |freq, char| 
    (freq[char] -= 1) < 0 ? break : freq
  } && word
}.uniq

A frequency map looks like this:

{"o"=>1, "v"=>2, "r"=>4, "k"=>1, "q"=>1, "l"=>2, "w"=>1, "i"=>1, 
"s"=>1, "e"=>2, "c"=>1, "t"=>1, "g"=>1, "m"=>1, "p"=>2, "f"=>1,
"z"=>1, "y"=>1}

For an average word length m and n words in words this runs in O(n m) time. In my tests, this algorithm runs in about 2-4s on my Macbook on the dictionary in /usr/share/dict/words. That is not fast enough if this was to be used for a web application for example, so we dig deeper.

A simple iteration on this naive algorithm, is the observation that if a character is not in letters, then we do not have to loop through all the words that start with this letter in words. E.g. if there’s no letter c in letters, we can skip to the words beginning with d when we encounter the first word which begins with c in words and so on.

groups = words.group_by { |s| s.first }

groups.map { |char, group|
  next if letters[char] == 0

  group.select { |word|
    word.each_char.inject(letters.clone) { |freq, char| 
      (freq[char] -= 1) < 0 ? break : freq
    } && word
  }
}.flatten.uniq

The speed of this iteration depends on how many words in letters start with each character in words. Say that k is the highest number of words that starts with any character in words, and d is the number of distinct characters from which we can form words, then this runs in worst-case O(d k) time. In my tests, this was about twice as fast as the previous algorithm. Another iteration is threading the processing of each group.

Using a Trie

The idea of grouping is similar to the original solution I had in mind, but instead of just grouping on the first character, group on all characters! This creates a neat, recursive structure called a Trie.

For instance, if we put in the word “band”, “ban” and “boo” in the data structure, it will look like this:

{
  b: {
    o: {
      o: {
      }
    }
    a: {
      n: {
        d: {
        }
      }
    }
  }
}

That way we can check that “boo” is in the structure with something like: map[:b][:o][:o]. However, that would also imply that “bo” is in the structure, which it is not. We need a state on each of the maps that tells whether a word ends at this letter.

class Trie
  attr_accessor :word, :nodes

  def initialize
    @word, @nodes = false, {}
  end
end

With that in place, we can create a method to create the data structure described above, by going through each character in the added string, creating new Tries as we go:

def <<(word)
  node = word.each_char.inject(self) { |node, char| 
    node.nodes[char] ||= Trie.new 
  }
  node.word = true
end

With that comes the interesting part. The problem is now, given this data structure, how do we find all the entries in the data structure, which is the end of a word that we can get to with letters?

Once again we make use of the frequency map explained in the previous section, and then we recursively visit nodes in the data structure. The frequency map is updated as we go down the recursion, so invalid paths can be detected.

def find(letters)
  recursive_find frequency_map(letters), ""
end

def recursive_find(used, word)
  words = nodes.reject { |c, v| used[c] == 0 }.map { |char, node|
  node.recursive_find(used.merge(char => used[char] - 1),
                      word + char)
  }.flatten

  words << word if self.word
  words
end

The full implementation of the data structure can be seen in this Gist.

Benchmarks

I made a few small optimizations to the trie showed here to make it faster. There’s still plenty of things that could be done to make it faster, such as concurrent searching, and there’s probably a bunch of things that can be found profiling. Either way, it’s clear that it is much faster to use a trie over the naive methods presented earlier. The trie uses a lot of memory, this could be optimized by converting it into a radix tree, which could also yield small performance benefits.

       user     system      total        real
Benchmarking 'ovrkqlwislrecrtgmvpfprzey'
faster trie  0.090000   0.000000   0.090000 (  0.089411)
blog trie  0.390000   0.040000   0.430000 (  0.452724)
naive  3.290000   0.140000   3.430000 (  3.467491)
group  2.230000   0.050000   2.280000 (  2.290674)

Benchmarking 'abcdefghifghijklmnopqrstuvxyz'
faster trie  0.670000   0.000000   0.670000 (  0.671967)
blog trie  3.020000   0.070000   3.090000 (  3.094881)
naive  4.450000   0.050000   4.500000 (  4.596610)
group  4.120000   0.050000   4.170000 (  4.208492)

Benchmarking 'odidwocswkbafvydehsbiviez'
faster trie  0.030000   0.000000   0.030000 (  0.030700)
blog trie  0.100000   0.010000   0.110000 (  0.103797)
naive  2.550000   0.010000   2.560000 (  2.566604)
group  1.640000   0.010000   1.650000 (  1.658194)

Benchmarking 'rtlyifebuzkxndovzyzodelap'
faster trie  0.150000   0.000000   0.150000 (  0.154702)
blog trie  0.630000   0.010000   0.640000 (  0.633183)
naive  2.810000   0.010000   2.820000 (  2.822617)
group  2.140000   0.000000   2.140000 (  2.147578)

About 0.1-0.2s in the worst case on actual Letterpress games is absolutely fine for use in a service. The second case, which takes the most time, is a stress test with every letter of the alphabet. It would never happen in Letterpress.

Subscribe through email, RSS or Twitter to new articles!

3,890 subscribers