Scientific paper as a poem

January 28, 2013

Gramicidin S Interrogation

[Image: gramicidin]

Gramicidin S was not like his fellow gramicidins
There was nothing linear about him

Two identical pentapeptides joined head to tail
Synthesized by gramicidin S synthetase

He was recruited for antibiotic duty in 1942
Fighting against bacteria and some fungi too

His fighting technique was well known in vivo
Binding with phospholipids of cell membranes was the way to go

In silico GS was more of a mystery man
Perhaps his intrinsic and native structure could reveal his plan

Know your enemy is what Machiavelli and Sun Tzu say
Maybe GS will reveal all with molecular H-bonding interplay

A good place to start is gas phase isolation
Progressive water(molecule)boarding helps control interaction

Add a bit of cryogenic cooling to avoid thermal congestion
A unique spectroscopic fingerprint is the final destination

GS he seemed like such a tough guy
But it took only 2 water molecules to make him cry

The complex spectroscopic fingerprint of GS was cracked
With these constraints it should be easy for the theorists to attack.

REFERENCES

Nagornova NS, Rizzo TR and Boyarkin OV (2012). Interplay of Intra- and Intermolecular H-Bonding in a Progressively Solvated Macrocyclic Peptide. Science 336, 320. DOI: 10.1126/science.1218709

Image credit: Wikimedia Commons

A fictional article I wrote for the Swiss Federal Institute of Technology in Lausanne (EPFL).

EPFL will join forces with the search-engine giant Google and other scientific partners to develop a translation tool for scientific literature. EPFL’s Probabilistic Machine Learning Lab will be part of a global initiative called linguaSCIENCE, which aims to let scientists translate scientific papers from English into their native language and vice versa. The tool is intended to overcome the language barrier often faced by scientists from non-English-speaking countries, and thus promote global scientific collaboration.

[Figure: facebook map]

Collaboration lost in translation?

Science has always been about collaboration. The globalisation of science has resulted in scientists from all over the world working together to add to the body of scientific knowledge. It is now almost the norm for scientific papers to have multiple authors of diverse nationalities. However, the extent of this collaboration could be overestimated.

In the spirit of the well-circulated Facebook friendship map by Paul Butler, research analyst Olivier Beauchesne at Science-Metrix mapped scientific collaboration from 2005 to 2009 by extracting and aggregating collaborations between cities all over the world (see fig.). Looking at the map, there doesn’t seem to be much collaboration outside the United States and Europe. Beauchesne is unsure whether that reflects a limited dataset or a genuine lack of collaboration in those areas. Could language be one of the contributing factors?

Machine learning to the rescue

If so, “statistical machine learning” (SML) could help prevent science from being lost in translation. SML is the process of seeking patterns in large amounts of text. It has already helped Google carve a niche for itself in translation with Google Translate, a free translation service that provides instant translations between 64 different languages.

When Google Translate generates a translation, it looks for patterns in hundreds of millions of documents to help decide on the best translation. By detecting patterns in documents that have already been translated by human translators, Google Translate can make intelligent guesses as to what an appropriate translation should be. Examples of human-translated documents used include those produced by the United Nations and the European Parliament.
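To make the idea of "detecting patterns in already-translated documents" concrete, here is a toy sketch. It is not how Google Translate works internally; it only counts which words tend to appear together in a few sentence pairs and scores each pairing with the Dice coefficient. The four-sentence English/German corpus and the function name `best_translations` are invented for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical miniature "human-translated" corpus (English, German).
PAIRS = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a house", "ein haus"),
    ("a book", "ein buch"),
]

def best_translations(pairs):
    """For each source word, pick the target word it co-occurs with most
    consistently, scored by the Dice coefficient."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)                      # sentences containing each source word
        tgt_count.update(t_words)                      # sentences containing each target word
        pair_count.update(product(s_words, t_words))   # sentence pairs containing both

    best = {}
    for (s, t), c in pair_count.items():
        dice = 2 * c / (src_count[s] + tgt_count[t])
        if s not in best or dice > best[s][1]:
            best[s] = (t, dice)
    return {s: t for s, (t, _) in best.items()}

if __name__ == "__main__":
    print(best_translations(PAIRS))
```

With only four sentence pairs the counts are already enough to pair "house" with "haus" and "the" with "das". Real systems rely on vastly more text and far more sophisticated models.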

However, SML has its limitations. The more human-translated documents that Google Translate can analyse in a specific language, the better the translation quality will be. This is why translation accuracy will sometimes vary across languages.

EPFL is part of the solution

EPFL’s role in the project is to help improve translation accuracy for languages that are relatively poor in the availability of human-translated documents, particularly in the area of scientific literature. EPFL’s expertise in Bayesian inference (a principled way of combining new evidence with prior information) could hold the key to improving translation accuracy. Bayesian inference is already being applied in fields as diverse as spam filtering and analysing evidence in a courtroom.
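As a rough illustration of "combining new evidence with prior information", the sketch below applies Bayes' rule to the spam-filtering case mentioned above. The function name `posterior` and all the probabilities are made-up numbers for the example, not figures from any real filter.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' rule for a yes/no hypothesis H given one piece of evidence.

    prior                   -- P(H) before seeing the evidence
    p_evidence_given_h      -- P(evidence | H)
    p_evidence_given_not_h  -- P(evidence | not H)
    """
    numerator = prior * p_evidence_given_h
    evidence = numerator + (1 - prior) * p_evidence_given_not_h
    return numerator / evidence

if __name__ == "__main__":
    # Assumed numbers: 20% of mail is spam; a given word appears in 60% of
    # spam messages and in 5% of legitimate ones.
    print(posterior(0.2, 0.6, 0.05))
```

Under these assumed numbers, seeing the word raises the probability that a message is spam from 20% to 75%: the prior belief is updated, not discarded.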

“Applying Bayesian inference to statistical machine learning could give global scientific collaboration a turbo boost”, says Michael Singer of EPFL’s Probabilistic Machine Learning Lab. Singer adds that Bayesian inference has been shown to outperform other standard methods (such as the expectation-maximization method), especially in the scientific domain. This is because standard methods take into account only the single most likely set of word-translation probabilities and ignore contributions from other plausible values. Bayesian inference provides a bigger picture and better results in the long term, which could make all the difference when translating languages that lack a significant body of scientific literature. Bayesian inference and EPFL could thus make a valuable contribution to improving the accuracy of linguaSCIENCE and overcoming the language barrier in science.
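The difference between a single "most likely point" and a Bayesian estimate can be sketched with a deliberately simplified example. It compares the maximum-likelihood estimate of word-translation probabilities with the posterior mean under a symmetric Dirichlet prior. This is much simpler than the method in the reference below; the candidate words, the counts, and the prior strength `alpha` are all assumptions chosen for illustration.

```python
def maximum_likelihood(counts):
    """Point estimate: relative frequencies only. Unseen candidates get 0."""
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

def posterior_mean(counts, alpha=1.0):
    """Bayesian estimate: mean of the Dirichlet posterior, which averages over
    all probability vectors weighted by how plausible each one is."""
    total = sum(counts.values()) + alpha * len(counts)
    return {word: (c + alpha) / total for word, c in counts.items()}

if __name__ == "__main__":
    # Hypothetical counts for one English word observed only once in a
    # sparse parallel corpus, with three candidate French translations.
    counts = {"cellule": 1, "pile": 0, "cachot": 0}
    print(maximum_likelihood(counts))
    print(posterior_mean(counts))
```

With a single observation, the point estimate commits fully to one translation and rules out the others, while the Bayesian estimate still favours the observed word but keeps some probability for the alternatives. That caution matters most when translated text is scarce.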

REFERENCES

Mermer C and Saraclar M (2011). Bayesian Word Alignment for Statistical Machine Translation. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 2, Pages 182-187. ISBN: 978-1-932432-88-6.