Online news is full of quotes from politicians and other famous people. Quite often, though, those quotes are transformed just a tiny bit when they are copied from one source into another article, be it on a blog or a news outlet. This process is most likely unconscious, but it's not random: the changes are significant, and we measured some of them with data-mining techniques. Looking at substitutions of one word for another shows that more complex words tend to be replaced with simpler, better-known words.
This led to our paper, "Semantic drift of quotations in blogspace". The whole analysis is free software and thoroughly documented, so it can be remixed, or serve as inspiration or as an example for other projects. The code lives here, the documentation here.
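To give a flavour of what a word substitution looks like in practice, here is a minimal Python sketch. It is not the code from the paper. The function names, the tokenization rule, and the tiny frequency table are all hypothetical, and using word frequency as a stand-in for "better known" is an assumption made only for illustration.

```python
import re

def tokenize(quote):
    """Lowercase a quote and split it into word tokens."""
    return re.findall(r"[a-z']+", quote.lower())

def single_substitution(source, copy):
    """Return (old_word, new_word) if the two quotes have the same length
    and differ by exactly one word at the same position, else None."""
    a, b = tokenize(source), tokenize(copy)
    if len(a) != len(b):
        return None
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    return diffs[0] if len(diffs) == 1 else None

# Hypothetical counts, made up for this example (not real corpus data).
TOY_FREQUENCY = {"purchase": 120, "buy": 950}

def moves_to_more_frequent(substitution, frequency=TOY_FREQUENCY):
    """True if the replacement word is more frequent than the original.
    Assumption: frequency is a rough proxy for how well known a word is."""
    old, new = substitution
    return frequency.get(new, 0) > frequency.get(old, 0)

sub = single_substitution("We will purchase the company.",
                          "We will buy the company.")
print(sub, moves_to_more_frequent(sub))
```

With real data, the toy table would be replaced by counts from a large corpus, and the same check would be run over many pairs of quotes rather than a single one.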