One for the discrete mathematics wonks:

Consider strings of symbols taken from an alphabet (a set of symbols). A string contains an echo if it contains a substring of the form SS, where S stands for any substring. So the string hello contains an echo (S = l), as does banana (S = an, or S = na). But bandana is contains no echo: although the substring an occurs twice, these occurrences are not contiguous. A string without an echo is echo-free.

So here's the question: Given an alphabet, what is the longest echo-free string of symbols from that alphabet? Is there a echo-free string (or strings) of length n such that all strings with length greater than n do contain an echo?

With an alphabet of only one symbol, a, the longest echo-free string is obviously just a, since aa contains an echo.

For an alphabet of two symbols, a and b, the longest echo-free strings are aba and bab. Extending these strings always introduces an echo (e.g. abaa, abab).

What about an alphabet of three symbols? It turns out that with three symbols, there are echo-free strings which continue without end, and so there is no longest echo-free string. And since such strings exist for alphabets of three symbols, they exist for all larger alphabets.

Here is a definition in Haskell of an echo-free infinite string using three symbols (Haskell being a wonderful language for describing infinite sequences):

-- three magical strings
expand 'a' = "abcbabcacbc"
expand 'b' = "abcbacabacbc"
expand 'c' = "abcbacbcacbabcbacbc"

expands = concatMap expand

-- an infine echo-free string
noecho = (expand 'a') ++ (expands $ drop 1 noecho)

Using ghc to run it:

*Main> take 1000 noecho

It's straightforward to write a program to check that some prefix of the string generated by noecho is in fact echo-free. Although, I have not found an algorithm for this with time complexity better than O(n2), making it time-consuming to check more than a few thousand symbols.

Of course, empirically verifying that some finite substring of noecho is echo-free is not the same as proving that noecho in its entirety is echo-free. The three magical strings from the program are carefully chosen to allow such a proof, which I will outline in a future post.

I expect that this result was described long ago in some corner of the mathematics literature. If anyone reading this knows where, please get in touch.

Update: thanks to a commenter, I know now that the established term for what I dubbed echo-free sequences is squarefree words.


Vanity genome searches

So Craig Venter and his team put a secret message in their arficial genome. But perhaps your name is present in a natural genome? Stephen Matheson explains how to search a public gene database.

I went searching, and found that DWRAGG appears in a gene in Oryza sativa var. japonicaJapanese rice

(This reminded me of a Futureshock strip in 2000AD comic back in the 80s. In that story, when scientists decode the human genome, they find a message saying something like “copyright Jehovah Labs”. The letter O isn't used to refer to one of the 20 amino acids, so I searched for YHWHLABS instead. No matches.)

1 comment