OpenAI's GPT2 Now Writes Scientific Paper Abstracts
Earlier this year, OpenAI released GPT2, a "large-scale unsupervised language model which generates coherent paragraphs of text," according to OpenAI's blog. This transformer neural network generates whole paragraphs one word at a time.
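To make "one word at a time" concrete, here is a toy sketch of that generation loop. It is not GPT2: a small hand-written lookup table (the hypothetical `TOY_MODEL`) stands in for the neural network, and the loop always picks the highest-scoring next word, whereas real systems usually sample from the scores. All names and numbers here are invented for illustration.

```python
# Toy stand-in for a language model: maps the previous word to candidate
# next words with made-up scores. GPT2 instead uses a transformer that
# looks at the whole preceding text, not just the last word.
TOY_MODEL = {
    "<start>": {"the": 0.9, "a": 0.1},
    "the": {"trial": 0.6, "patients": 0.4},
    "trial": {"enrolled": 0.7, "<end>": 0.3},
    "enrolled": {"patients": 0.8, "<end>": 0.2},
    "patients": {"<end>": 1.0},
}


def generate(model, max_words=10):
    """Build a sentence one word at a time by always taking the top-scoring word."""
    words = []
    previous = "<start>"
    for _ in range(max_words):
        # Unknown words fall back to ending the sentence.
        candidates = model.get(previous, {"<end>": 1.0})
        next_word = max(candidates, key=candidates.get)
        if next_word == "<end>":
            break
        words.append(next_word)
        previous = next_word
    return " ".join(words)


if __name__ == "__main__":
    print(generate(TOY_MODEL))
```

Each pass through the loop feeds the word just produced back in as context for the next one, which is the basic idea behind how such models write long passages.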
In an intriguing turn of events, Dr. James Howard, a cardiology trainee at Imperial College London, decided to test GPT2's scientific abilities.
Dr. Howard prompted GPT2 with made-up scientific titles and watched as it wrote abstracts right in front of his eyes. He then shared the results on Twitter.
Here's what GPT2 was able to create
Dr. Howard re-trained GPT2 on the PubMed/MEDLINE database, which holds over 30 million biomedical literature citations. As a result, when Dr. Howard supplied a scientific title, the transformer neural network responded with an abstract written in scientific language.
By Dr. Howard's account, re-training GPT2 in this way took more than 24 hours.
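Dr. Howard did not describe how he laid out his training data, so the following is only a guess at one common way to set up this kind of title-to-abstract re-training: each record becomes a block of text with the title first and the abstract after it, and at generation time the model is handed just the title part and left to continue. The `TITLE:` and `ABSTRACT:` labels and both function names are hypothetical; `<|endoftext|>` is GPT2's standard end-of-text marker, though whether it was used here is an assumption.

```python
# Hypothetical labels; the actual format Dr. Howard used is not known.
TITLE_TAG = "TITLE: "
ABSTRACT_TAG = "ABSTRACT: "
END_TAG = "<|endoftext|>"  # GPT2's end-of-text marker


def format_training_example(title, abstract):
    """Turn one title/abstract pair into a single block of training text."""
    return f"{TITLE_TAG}{title.strip()}\n{ABSTRACT_TAG}{abstract.strip()}\n{END_TAG}\n"


def format_prompt(title):
    """Build the prompt for generation: the same layout, cut off before the abstract."""
    return f"{TITLE_TAG}{title.strip()}\n{ABSTRACT_TAG}"


if __name__ == "__main__":
    # Invented example record, not a real PubMed entry.
    example = format_training_example(
        "An example study title",
        "BACKGROUND: An example abstract body.",
    )
    print(example)
    print(format_prompt("An example study title"))
```

Because the prompt matches the start of every training example, a model trained this way would tend to continue with abstract-like text, which fits Dr. Howard's remark that he never taught it how to structure an abstract.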
Remarkably, what Dr. Howard received in response were concise, well-structured medical abstracts. Some of them are reproduced below.
A word of warning from Dr. Howard:
I've created a monster. I've re-trained @OpenAI's GPT2 transformer neural network on the Pubmed/MEDLINE database, so that if I give it an article's title, it spits out a abstract for me. I didn't teach it how to structure an abstract, how long it is, or any of the lingo.— James Howard (@DrJHoward) October 26, 2019
In each of the screenshots below, the top line shows the title I provided the network with, and everything below that is the network's work. It takes around 30 seconds to generate an abstract, though it took over 24 hours of training the network to get it to this level.— James Howard (@DrJHoward) October 26, 2019
The first abstract:
First, I tried to give it a title to a made-up randomised controlled trial. It seems a _bit_ unreasonable to compare renal denervation against apixaban for hypertension. Fasciantingly, it volunteered a clinical trial registration data at the end of the abstract. pic.twitter.com/Z9794BpgdJ— James Howard (@DrJHoward) October 26, 2019
The second one:
Next I gave it a title for a meta-analysis. Obviously, the title I chose is ludicrous, but I wanted to see what it did. Amazingly, it decided to put a search strategy in the methods section. It also provides relative risks, though the choice of the modified Rankin Scale... pic.twitter.com/Vjp6fhlFW2— James Howard (@DrJHoward) October 26, 2019
The third one:
I thought I'd try something which I have particularly little experience with: cost-effectiveness analysis. While the conclusion might be correct, I'm not sure it adequately conveys the study's findings... I do enjoy how the significance matches the p values (> vs < 0.05). pic.twitter.com/5rXlTUh8Hr— James Howard (@DrJHoward) October 26, 2019
GPT2 just keeps giving:
Finally, I thought I'd try something incendiary. It turns out the prevalence of narcotic use amongst adult cardiologists is 71%! And it must be true, they used eigenvectors! Sorry for the slander, my South American colleagues. pic.twitter.com/j700yVryve— James Howard (@DrJHoward) October 26, 2019
It's quite incredible, if perhaps a little worrying, what OpenAI's transformer was able to do and how quickly it could be re-trained.
I hope you found this interesting. I'm happy to provide more examples if people want to give me titles. I might try to get the neural network working online like I did for pacemakers (https://t.co/CMnspA0N8R) - though the hardware requirements for this are much steeper!— James Howard (@DrJHoward) October 26, 2019
Dr. Howard was open and transparent about how he produced these abstracts, and you can try 'talking' to the transformer yourself here.