2008-04-18
Linking
Ian Lance Taylor recently released gold, a new ELF- and x86-specific linker implementation, as part of GNU binutils.
The main feature of gold is performance: It's much more efficient than the old binutils ld, and it is multithreaded (given that linking is not an immediately obvious source of low-hanging parallelism, this is an interesting case-study). Ian compares the performance of gold and ld when linking a “900M C++ executable”. Ian works for Google, and I can't help wondering if this is just a test case or a real use case for them.
Ian has been part of the binutils and GCC communities for many years, and he writes about them regularly on his blog. While developing gold, he wrote a 20 part series of posts about linking in general, and linking ELF for x86 in particular. And if you find that kind of thing interesting, there's Tom Tromey's blog too.
2008-03-10
Pet peeve #1682: Unnecessary spaces in XML empty-element tags
That is, spaces before the slash in XML empty-element tags. E.g. <foo attr="val" />, as opposed to <foo attr="val"/>.
This habit originates in guideline in the XHTML1 spec. It's a hack to get old browsers that only know about HTML to accept XHTML1. But it only addresses the issue of the difference between XML empty-element tags and SGML empty element types, and even then it is dubious (see about halfway through this document). There are many other more subtle differences between XHTML1 and HTML.
But even if one accepts this custom in the context of XHTML1, it has no sensible place in XML documents that are not XHTML. But I have seen it infecting XML documents in general.
2008-02-10
ABC
One for the discrete mathematics wonks:
Consider strings of symbols taken from an alphabet (a set of symbols). A string contains an echo if it contains a substring of the form SS, where S stands for any substring. So the string hello contains an echo (S = l), as does banana (S = an, or S = na). But bandana is contains no echo: although the substring an occurs twice, these occurrences are not contiguous. A string without an echo is echo-free.
So here's the question: Given an alphabet, what is the longest echo-free string of symbols from that alphabet? Is there a echo-free string (or strings) of length n such that all strings with length greater than n do contain an echo?
With an alphabet of only one symbol, a, the longest echo-free string is obviously just a, since aa contains an echo.
For an alphabet of two symbols, a and b, the longest echo-free strings are aba and bab. Extending these strings always introduces an echo (e.g. abaa, abab).
What about an alphabet of three symbols? It turns out that with three symbols, there are echo-free strings which continue without end, and so there is no longest echo-free string. And since such strings exist for alphabets of three symbols, they exist for all larger alphabets.
Here is a definition in Haskell of an echo-free infinite string using three symbols (Haskell being a wonderful language for describing infinite sequences):
-- three magical strings expand 'a' = "abcbabcacbc" expand 'b' = "abcbacabacbc" expand 'c' = "abcbacbcacbabcbacbc" expands = concatMap expand -- an infine echo-free string noecho = (expand 'a') ++ (expands $ drop 1 noecho)
Using ghc to run it:
*Main> take 1000 noecho "abcbabcacbcabcbacabacbcabcbacbcacbabcbacbcabcbacabacbcabcbabcacbcabcbacabacbcab cbacbcacbabcbacbcabcbabcacbcabcbacbcacbabcbacbcabcbacabacbcabcbacbcacbabcbacbcab cbabcacbcabcbacabacbcabcbacbcacbabcbacbcabcbacabacbcabcbabcacbcabcbacbcacbabcbac bcabcbabcacbcabcbacabacbcabcbabcacbcabcbacbcacbabcbacbcabcbacabacbcabcbacbcacbab cbacbcabcbabcacbcabcbacabacbcabcbacbcacbabcbacbcabcbacabacbcabcbabcacbcabcbacbca cbabcbacbcabcbacabacbcabcbacbcacbabcbacbcabcbabcacbcabcbacbcacbabcbacbcabcbacaba cbcabcbabcacbcabcbacabacbcabcbacbcacbabcbacbcabcbacabacbcabcbabcacbcabcbacbcacba bcbacbcabcbacabacbcabcbacbcacbabcbacbcabcbabcacbcabcbacabacbcabcbacbcacbabcbacbc abcbacabacbcabcbabcacbcabcbacbcacbabcbacbcabcbabcacbcabcbacabacbcabcbabcacbcabcb acbcacbabcbacbcabcbacabacbcabcbacbcacbabcbacbcabcbabcacbcabcbacabacbcabcbacbcaca bcbacbcabcbacabacbcabcbabcacbcabcbacabacbcabcbacbcacbabcbacbcabcbabcacbcabcbacbc acbabcbacbcabcbacabacbcabcbacbcacbabcbacbcabcbabcacbcabcbacabacbcabcbacbcacbabcb acbcabcbacabacbcabcbabcacbcabcbacbcacbab"
It's straightforward to write a program to check that some prefix of the string generated by noecho is in fact echo-free. Although, I have not found an algorithm for this with time complexity better than O(n2), making it time-consuming to check more than a few thousand symbols.
Of course, empirically verifying that some finite substring of noecho is echo-free is not the same as proving that noecho in its entirety is echo-free. The three magical strings from the program are carefully chosen to allow such a proof, which I will outline in a future post.
I expect that this result was described long ago in some corner of the mathematics literature. If anyone reading this knows where, please get in touch.
Update: thanks to a commenter, I know now that the established term for what I dubbed echo-free sequences is squarefree words.
2008-02-01
Vanity genome searches
So Craig Venter and his team put a secret message in their arficial genome. But perhaps your name is present in a natural genome? Stephen Matheson explains how to search a public gene database.
I went searching, and found that DWRAGG appears in a gene in Oryza sativa var. japonica — Japanese rice
(This reminded me of a Futureshock strip in 2000AD comic back in the 80s. In that story, when scientists decode the human genome, they find a message saying something like “copyright Jehovah Labs”. The letter O isn't used to refer to one of the 20 amino acids, so I searched for YHWHLABS instead. No matches.)
2007-12-03
Emacs font problem
There have been no updates here for a while. That's because I've been settling into a new job and a new flat/apartment (delete according to your prefered dialect). So, to keep this blog warm while I adapt to the new regime, a small Linux-related note:
My new place of employment is an Ubuntu shop, so I have been getting used to Gutsy Gibbon. I also recently upgraded my personal laptop to Fedora 8. Both of these have emacs 22.1.1. And both exhibited a new problem with emacs font handling: Emacs would ignore my font customizations, even though it took note of my color preferences, which are part of the same settings. This was quite annoying, so I spent many hours trying to track the cause down. I did find that it has nothing to do with the emacs version, but is actually related to the X version (running the new emacs under the old version of X is problem-free). But I gave up investigating further because I found a workaround: If I set the desired font using X resources instead of (or in addition to) the emacs customization system, then emacs always picks it up. So now I have
emacs.font: fixed
in my .Xdefaults, and emacs is back to its good old self.
2007-10-29
One size mostly fits all
Database pioneer Michael Stonebraker has published a series of papers arguing that the traditional relational database engine is obsolete. His claim is that, for every database workload important today, there is a specialized database architecture that can dramatically outperform (by an order of magnitude or more) established “one size fits all” databases.
The latest paper in this series is The End of an Architectural Era (It's Time for a Complete Rewrite) (Michael Stonebraker, Samuel Madden, Daniel J. Abadi, Stavros Harizopoulos, Nabil Hachem, Pat Helland). This paper concentrates on transaction processing workloads. It presents a research database called H-Store, which outperforms a traditional RDBMS by 82 times on the TPC-C benchmark. H-Store has some features that may be familiar to some of my readers: It's an in-memory database, with the application logic in the same process as the database. Because it is an in-memory database, and because it targets OLTP workloads, transactions are very short-lived, and so it confines transactions to a single thread, avoiding the overhead of locking or any other form of concurrency control.
(Some aspects of H-Store are less than elegant. The transaction classification scheme used to support clustered operation seems rather ad-hoc. And the idea of handling conflicting transactions by requiring that “each execution site must wait a small period of time (meant to account for network delays) for transactions arriving from other initiators” (section 4.4) is positively scary.)
Other workloads are discussed in an earlier paper: One Size Fits All? Part 2: Benchmarking Results (Michael Stonebraker, Chuck Bear, Uğur Çetintemel, Mitch Cherniack, Tingjian Ge, Nabil Hachem, Stavros Harizopoulos, John Lifter, Jennie Rogers, and Stan Zdonik). One particular workload discussed in that paper is data warehousing (i.e. reporting). For these applications, they propose column stores (database engines storing tables as multiple sequences of column values, rather than as a single sequence of row tuples). They demonstrate a performance advantage for a research database called C-Store, and its commercial descendant Vertica, in a telco call analysis application and a simplified variant of the TPC-H benchmark.
Despite these results, I doubt that there is an imminent threat to established “one size fits all” databases. The market is much more likely to stick with traditional databases and SQL whenever they provide adequate performance, complementing them with specialized database architectures only when there is a compelling need for the performance advantages that they can provide.
2007-10-23
ExpressCard
While browsing the Inquirer today I discovered the reason why I can only fit PC Card and CardBus cards into one of the two slots on my year-old laptop: One of the slots is actually an ExpressCard slot. I'd heard of this format, but hadn't realized that ExpressCard is incompatible with the PC Card format, and some of the cards are a wierd L-shape. But externally, ExpressCard slots look identical to PC Card slots, and a PC Card can fit halfway into an ExpressCard slot before it gets stuck. What was that committee smoking?
The way to tell the slots apart is that the little eject buttons have a tiny “O” or an ”X“ symbol on them to indicate the slot type. I noticed these symbols when I first got the laptop, but I though they were just decorative. Now I know the secret.
2007-10-15
Enabling MP3 support in XMMS on Fedora 7
Recently I received some MP3 files. Fedora doesn't include MP3 support, due the patents surrounding MP3. This wasn't a problem for me until now, since my music collection is in FLAC and Ogg Vorbis.
But livna makes it easy to fix this:
$ wget http://rpm.livna.org/fedora/7/`uname -i`/livna-release-7-2.noarch.rpm $ rpm -i livna-release-7-2.noarch.rpm $ yum install xmms-mp3
(That's for xmms, but livna also has packages to provide MP3 support for gstreamer etc.)
2007-10-02
Dear music industry
I don't want to discourage your progress with DRM-free online music stores. They are interesting to me in a way that DRM-encumbered services are not. But while you only offer MP3 files, and not files using an open lossless format, it just doesn't work for me.
(Radiohead's experiment excluded. I can't explain why I am willing to pay 40 pounds for their new album in a nice box two months in advance of receiving it. But being able to download it while I wait for it to arrive is a bonus.)