Why did I think of this?
Recently I stumbled upon an interesting subreddit, r/nosleep, where Redditors share their horror stories. I am not really into the horror genre, but the stories there were creative and unique, something you don't get to read very often. So I thought it would be interesting to build an AI model that can generate such stories on command, and I decided to use OpenAI's famous GPT-2 together with data from r/nosleep to create some chilling horror stories.
What are OpenAI and GPT-2?
OpenAI is an artificial intelligence research organization, originally founded as a non-profit by Elon Musk, Sam Altman and others, whose mission is to discover the path to safe artificial general intelligence.
Almost a year ago they released GPT-2, a large transformer-based language model with 1.5 billion parameters, trained on a dataset of 8 million web pages. GPT-2 begins to learn language tasks such as question answering, reading comprehension, summarization, and translation from raw text alone, without any task-specific training data. Its scores on these downstream tasks are far from state-of-the-art, but they suggest that such tasks can benefit from unsupervised techniques, given sufficient data and compute.
In a nutshell, GPT-2 is a language model: given some input text (a prompt), it generates a paragraph that continues it. When you fine-tune the model on a particular dataset, you can steer the context, style and subject of the generated output. In my case, that was horror stories.
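To make "continues the prompt" concrete: a model like GPT-2 generates one token at a time, predicting the next token from everything written so far and appending it before predicting again. The sketch below shows that loop with greedy decoding (always pick the most likely next word). The "model" here is a hypothetical stand-in, a tiny hand-written next-word table, and nothing like GPT-2's real network or vocabulary.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a language model: maps the last word to scores
# for possible next words. GPT-2 instead conditions on the whole context and
# scores tens of thousands of subword tokens.
TOY_SCORES: Dict[str, Dict[str, float]] = {
    "it": {"was": 0.9, "is": 0.1},
    "was": {"a": 0.7, "the": 0.3},
    "a": {"dark": 0.6, "scary": 0.4},
    "dark": {"room": 0.8, "night": 0.2},
    "room": {"<eos>": 1.0},
}


def toy_model(context: List[str]) -> Dict[str, float]:
    """Return next-word scores for the context (this toy only looks at the last word)."""
    return TOY_SCORES.get(context[-1], {"<eos>": 1.0})


def generate(prompt: str, model: Callable[[List[str]], Dict[str, float]], max_new_words: int = 10) -> str:
    """Greedy autoregressive decoding: repeatedly append the highest-scoring next word."""
    words = prompt.lower().split()
    for _ in range(max_new_words):
        scores = model(words)
        next_word = max(scores, key=scores.get)
        if next_word == "<eos>":  # end-of-sequence marker stops generation
            break
        words.append(next_word)
    return " ".join(words)


print(generate("It was", toy_model))  # -> it was a dark room
```

Real GPT-2 generation usually samples from the predicted distribution instead of always taking the top word, which is what makes each story different.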
How did I build it?
The entire program has 2 parts:
- Scraping Reddit for Horror Stories
- Training the Model
You can find the code for the entire program at the bottom of this article.
Scraping Reddit
After reading a lot of articles and doing some research, I figured out that the easiest way to scrape information from Reddit is PRAW (the Python Reddit API Wrapper), a Python library that lets you use Reddit's developer API. I followed this article to get the data from the r/nosleep subreddit.
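A minimal sketch of the scraping step with PRAW might look like this. It assumes you have created a Reddit "script" app and stored its credentials in the environment variables named below; those variable names, the user-agent string and the choice of all-time top posts are my assumptions for illustration, not necessarily what the linked article does. The fetching function takes the client as an argument, so it can be tried without network access.

```python
import os
from typing import Dict, List


def make_reddit_client():
    """Create a read-only PRAW client from (hypothetically named) environment variables."""
    import praw  # imported here so the rest of the module works without PRAW installed

    return praw.Reddit(
        client_id=os.environ["REDDIT_CLIENT_ID"],
        client_secret=os.environ["REDDIT_CLIENT_SECRET"],
        user_agent="nosleep-scraper (hypothetical user agent)",
    )


def fetch_stories(reddit, subreddit: str = "nosleep", limit: int = 100) -> List[Dict[str, str]]:
    """Return title/body pairs for a subreddit's all-time top posts, skipping pinned posts."""
    stories = []
    for submission in reddit.subreddit(subreddit).top(time_filter="all", limit=limit):
        if submission.stickied:  # pinned moderator posts are announcements, not stories
            continue
        stories.append({"title": submission.title, "body": submission.selftext})
    return stories
```

Usage, once the credentials are set: `stories = fetch_stories(make_reddit_client(), limit=1000)`.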
I did some clean-up on the fetched data, such as removing links and empty posts.
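The clean-up step can be sketched with a few regular expressions. This is an assumption about what "removing links and empty posts" involves: Markdown links keep their visible text, bare URLs are dropped, whitespace is tidied, and bodies that end up empty (including Reddit's "[removed]" and "[deleted]" placeholders) are discarded.

```python
import re
from typing import Dict, List, Optional

# Markdown links like [text](https://...): keep only the link text.
MARKDOWN_LINK = re.compile(r"\[([^\]]*)\]\((?:https?://|www\.)[^)]*\)")
# Bare URLs anywhere in the text.
BARE_URL = re.compile(r"(?:https?://|www\.)\S+")
# Bodies that carry no story: empty, or Reddit's removed/deleted placeholders.
PLACEHOLDERS = {"", "[removed]", "[deleted]"}


def clean_body(text: str) -> Optional[str]:
    """Strip links and extra whitespace; return None when no usable text remains."""
    text = MARKDOWN_LINK.sub(r"\1", text)
    text = BARE_URL.sub("", text)
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)  # keep at most one blank line between paragraphs
    text = text.strip()
    return None if text in PLACEHOLDERS else text


def clean_stories(stories: List[Dict[str, str]]) -> List[str]:
    """Return the cleaned story bodies, dropping empty ones."""
    cleaned = (clean_body(story["body"]) for story in stories)
    return [body for body in cleaned if body]
```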
Training the Model
To fine-tune the model I used Transformers by Hugging Face.
Hugging Face's Transformers is an immensely popular Python library that provides pre-trained models for a wide variety of natural language processing (NLP) tasks.
Hugging Face has made many pre-trained Transformer models available, and they can be put to work very quickly with PyTorch.
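Fine-tuning GPT-2 as a causal language model typically means joining all the stories into one long token stream, with GPT-2's `<|endoftext|>` token between stories, and cutting that stream into equal-length blocks that the model learns to continue token by token. The plain-Python sketch below shows only that data-preparation step on already-tokenized stories; it is modelled on the grouping used in Hugging Face's language-modeling examples and is an assumption, not a record of my exact pipeline. 50256 is the id of `<|endoftext|>` in the standard GPT-2 tokenizer, and the block size of 4 is purely illustrative (real runs use hundreds of tokens, up to GPT-2's 1,024-token context).

```python
from typing import Dict, List

EOS_ID = 50256  # id of "<|endoftext|>" in the standard GPT-2 tokenizer


def build_blocks(tokenized_stories: List[List[int]], block_size: int) -> List[Dict[str, List[int]]]:
    """Concatenate stories (separated by EOS) and split the stream into fixed-size blocks.

    Leftover tokens that do not fill a whole block are dropped.
    """
    stream: List[int] = []
    for ids in tokenized_stories:
        stream.extend(ids)
        stream.append(EOS_ID)  # marks where one story ends and the next begins

    usable = (len(stream) // block_size) * block_size
    blocks = []
    for start in range(0, usable, block_size):
        chunk = stream[start:start + block_size]
        # For causal LM training the labels are the inputs themselves;
        # GPT2LMHeadModel shifts them by one position internally.
        blocks.append({"input_ids": chunk, "labels": list(chunk)})
    return blocks


print(build_blocks([[1, 2, 3], [4, 5]], block_size=4))
# -> [{'input_ids': [1, 2, 3, 50256], 'labels': [1, 2, 3, 50256]}]
```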
Does it work?
Yes! Quite well, in fact; it turned out better than I expected. Sometimes it generates gibberish or keeps repeating itself, mainly because I fine-tuned only a lighter version of GPT-2 on a relatively small dataset. You could make it better by fine-tuning GPT-2's largest model, with 1.5 billion parameters, on a larger dataset, but that would cost a huge amount of compute. Still, the current model works 60% of the time! Look at the chilling samples below.
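One common mitigation for repetition, shown here as an illustration and not necessarily part of the setup behind these samples, is to forbid the model from producing any n-gram it has already produced. Hugging Face's `generate()` exposes this as the `no_repeat_ngram_size` parameter; the plain-Python sketch below implements just the rule: given the tokens so far, it returns the next tokens that would recreate an existing n-gram, so a decoder can skip them.

```python
from typing import Hashable, Sequence, Set


def banned_next_tokens(tokens: Sequence[Hashable], n: int) -> Set[Hashable]:
    """Return tokens that would complete an n-gram already present in `tokens`.

    If the last n-1 tokens appeared earlier, whatever followed them there is banned now.
    """
    if n <= 0 or len(tokens) < n:
        return set()
    prefix = tuple(tokens[len(tokens) - (n - 1):])  # the last n-1 tokens (empty when n == 1)
    banned = set()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned


print(banned_next_tokens("the door opened and the door".split(), 3))  # -> {'opened'}
```

With Transformers itself you would not reimplement this; passing `no_repeat_ngram_size=3` to `model.generate(...)` applies the same rule during decoding.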
Samples
Sample 1:
It was a dark room, with a bed and a mattress in its place. There was a door leading to the basement and it creaked open, opening only a narrow staircase. There were three people in there, a young woman with a small child in her arms, and a man in a wheelchair. He sat at the table in the corner, holding the knife in a pointed, bloody hand. I saw blood on his hand. He looked like he was on the verge of death. There was a sharp pain in his right hand and he screamed in pain. “Please, please let me in!” he cried. He looked at me and saw me staring at him. “What happened to you?” I asked. “You killed me.” he said, his eyes looking at me and then he disappeared. “Didn’t you die?” I asked.
Sample 2:
It was the same clown, the same mask, the same grin. I was in the living room, watching a movie, and when I saw him, I didn’t want to die. I thought about the way he was smiling at me, how much he was smiling at me. It was like he had given me a bad kiss that would have hurt me, or worse, killed me. But when I saw him again, I was in shock. He was in a dark room, with the same mask and the same smile. I looked up and saw him staring at the ceiling with a sick expression on his face and then staring at something. I looked back down and saw the eyes. I knew that this was it. I knew that I would not be saved, and I would never see anyone again. But then, it was all stopped. And I was back in my apartment with a box. I opened it, and the first thing I noticed was the mask in the corner of the box
Sample 3:
It was a scary night. We heard a loud crash and the sound of a heavy metal door slamming behind us. We ran, I saw my brother standing right in the doorway of my room at the time. He looked like he was frozen in place, and his skin was pale. I saw my brother, standing there. “What’s wrong?” I asked him. My brother’s voice cracked open and he said something in a very low voice, “We’re going home. I’m going to get you some rest.” His words were almost a whisper, like the sound of a person being held prisoner, with no voice, no memory, and no reason to talk, no one to speak to. We were alone, and we were in the house, I could feel the door swing open and there we were. I saw my brothers face. I was in shock, I couldn’t believe what I’ve seen
Try it yourself
The model is available on the Hugging Face Hub as abbas/gpt2-horror-stories.
You can test it yourself with this notebook.
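If you would rather call the model from Python, a minimal sketch using the Transformers `pipeline` API might look like this. It assumes the model id above is still available on the Hub and that `transformers` and PyTorch are installed; the sampling settings are illustrative, not the ones used for the samples in this article. The generator can be passed in, which makes the function easy to try with a stub.

```python
from typing import Callable, Optional

MODEL_ID = "abbas/gpt2-horror-stories"  # model id as given in this article


def generate_story(prompt: str, generator: Optional[Callable] = None, max_length: int = 200) -> str:
    """Generate one story that continues `prompt` (the returned text includes the prompt)."""
    if generator is None:
        from transformers import pipeline  # imported lazily; downloads the model on first use

        generator = pipeline("text-generation", model=MODEL_ID)
    # do_sample/top_k are illustrative sampling settings, not tuned values.
    outputs = generator(prompt, max_length=max_length, do_sample=True, top_k=50, num_return_sequences=1)
    return outputs[0]["generated_text"]
```

Usage: `print(generate_story("It was a dark room"))`. Because sampling is random, each call produces a different story.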
Conclusion
The results were pretty impressive, but there's more work that could be done. I am also looking for other problems NLP could be applied to; if you have one in mind, leave a comment below.
If you enjoyed my work, do consider following me on Twitter, where I post content for beginners as well as experienced developers. I write mainly about full-stack development, DevOps, cloud, SaaS, machine learning and NLP. I plan on posting a lot of blogs this year, so keep an eye out.