Peter Murray-Rust is a chemist specialising in informatics, recently retired but still active in research at Cambridge University, England. He aims to create “chemically-artificial-intelligent machines” that can understand simple chemical discussion (written and spoken) and extract the chemical information from it.
We spoke to him about his work.
How can you make a machine that “understands” chemicals? Isn’t that science fiction?
I have a vision in which machines can read all the chemical information on the web and in publications, and filter out the important parts for each reader and re-user. Ten million reactions are published in research articles each year, and our tools can process a useful proportion of them. It’s possible today for machines to answer some first-year chemistry exam questions.
You can try this for yourself at the Chemical Tagger website. Just paste in some chemistry (or use the sample provided) and click the “Process Text” button. The machine interprets all the chemical names. Click on one and it will draw its chemical structure.
How does it know all this?
There is a thousand-page book of rules, which we’ve encoded into the machine. We did most of this manually, but in some cases we use automatic machine learning.
What are the benefits of your work to broader society?
It creates higher-quality science (by validating data). It allows mashups that lead to new discoveries in biochemistry, pharmacology, cell biology, metabolism and toxicity. It helps you design new drugs and find out which chemicals are beneficial and which are harmful. What this means is that you can find out which compounds cause cancer this week, and which ones cure it the next week.
Well, this is amazing! Full speed ahead?
No! It’s stalled because we need the published research to instruct the machines and we are legally prevented from using it.
Surely as an eminent researcher at a prestigious university you have all the access you need?
I have all the access, but I am flatly refused permission to use it for what I want. I have written to publishers several times, and they refuse to answer or to let me do it.
Can’t you just set up web robots to harvest the papers you need?
No. The publishers have already cut the university off for my (legal) activities, and they have the power to sue me. Unfortunately, you can only find out what is legal by being taken to court. There is no clear case law or authority. I believe it is legal to extract facts from scientific publications manually, and therefore also by machine. The publishers disagree and forbid the latter.
So what changes do you want to see, to help your work be more effective?
I’d like to see the scientific literature become a global Open public resource with a new generation of tools built to extract information and customise it for all types of readers — policy makers, doctors, startup businesses, schools, museums … The list goes on.