
Create Your Own Harry Potter Short Story Using RNNs and TensorFlow

Amisha Jodhani

Published in Towards AI · 8 min read · Aug 3, 2020

“Of course it is happening inside your head, Harry, but why on earth should that mean that it is not real?”¹

Still waiting for your Hogwarts letter?
Want to enjoy the feast in the Great Hall?
Explore the secret passages in Hogwarts?
Buy your first wand from Ollivander’s?
*sigh* You are not alone.

I have (after all this time?) always been obsessed with Harry Potter, and I recently started learning about neural networks. It’s fascinating to see how creative you can get with Deep Learning, so I thought, why not brew the two together?

So I built a simple text generation model using TensorFlow to create my own version of a Harry Potter short story (can’t get as good as J.K. Rowling, duh!).

This article walks you through the entire code I wrote to implement it.
But for all the Hermiones out there, you can directly find the GitHub code here and run it yourself!

So here’s something which will cast a Banishing Charm on your boredom while you’re quarantined.

Background

What is an RNN?

A Recurrent Neural Network differs from other neural networks in that it has a memory: a hidden state that carries information about the elements of the sequence it has processed so far. It computes each new output from the current input together with this memory. For a simple introduction to RNNs, you can refer to this.
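To make the idea of "memory" concrete, here is a minimal sketch of one recurrent step in plain Python. The scalar weights `w_x`, `w_h`, and `b` are made-up values for illustration, not anything learned from the Harry Potter text, and a real layer uses weight matrices rather than single numbers.

```python
import math

def rnn_step(x, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One step of a toy scalar RNN: the new state mixes the input with the old state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(inputs, h0=0.0):
    """Feed a sequence through the cell, carrying the hidden state forward."""
    h = h0
    states = []
    for x in inputs:
        h = rnn_step(x, h)
        states.append(h)
    return states

# Both sequences end with the same input (1.0), but the final states differ
# because the hidden state still carries what came before.
states_a = run_rnn([0.0, 0.0, 1.0])
states_b = run_rnn([1.0, 1.0, 1.0])
print(states_a[-1], states_b[-1])
```

The only thing linking one step to the next is `h`, which is exactly the role the hidden state plays in the GRU layers used later.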

GRU vs LSTM

Both of these are great for text generation, but GRUs are a newer concept, and there isn’t really a way to say which one is better in general. Tuning your hyper-parameters well will often improve your model’s performance more than choosing a good architecture.²

However, if the amount of data is not a problem, LSTMs tend to perform better. If you have less data, GRUs have fewer parameters, so they train faster and tend to generalize better from the smaller dataset.
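The "fewer parameters" point can be checked with back-of-the-envelope arithmetic. This sketch uses the textbook formulation, in which an LSTM layer has four weight blocks and a GRU has three. Library implementations can count biases slightly differently, so treat the numbers as approximate. The sizes 300 and 512 are the embedding and first-layer sizes used later in this article.

```python
def recurrent_params(input_dim, units, n_blocks):
    """Each block has an input kernel, a recurrent kernel, and a bias vector."""
    per_block = units * input_dim + units * units + units
    return n_blocks * per_block

def lstm_params(input_dim, units):
    # input, forget, and output gates plus the candidate cell state
    return recurrent_params(input_dim, units, n_blocks=4)

def gru_params(input_dim, units):
    # update and reset gates plus the candidate hidden state
    return recurrent_params(input_dim, units, n_blocks=3)

lstm = lstm_params(300, 512)
gru = gru_params(300, 512)
print(lstm, gru, gru / lstm)
```

Under these assumptions the GRU needs three quarters of the weights of an LSTM of the same width.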

Feel free to check out this article for a more detailed explanation.

Why character-based?

When working with a large dataset like this one, the number of unique words in the corpus is much higher than the number of unique characters. A large dataset will have many, many unique words, and if we one-hot encode such a large vocabulary we are likely to run into memory issues. The labels alone can take up terabytes of RAM.

So, the same principles that you use to predict words can be applied here, but now you’ll be working with a much smaller vocabulary size.
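A rough calculation shows why vocabulary size matters so much. The token counts and vocabulary sizes below are hypothetical round numbers, not measurements of the Harry Potter text. They only illustrate how a dense one-hot label matrix scales.

```python
def one_hot_bytes(num_tokens, vocab_size, bytes_per_value=4):
    """Bytes needed for a dense float32 one-hot matrix of shape (num_tokens, vocab_size)."""
    return num_tokens * vocab_size * bytes_per_value

# Hypothetical round numbers, chosen only to show the scale of the difference.
word_level = one_hot_bytes(num_tokens=1_000_000, vocab_size=50_000)
char_level = one_hot_bytes(num_tokens=6_000_000, vocab_size=100)

print(f"word-level one-hot labels: {word_level / 1e9:.1f} GB")
print(f"char-level one-hot labels: {char_level / 1e9:.1f} GB")
```

Even though a character-level corpus has more tokens, the much smaller vocabulary keeps the total far lower.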

The code

So let’s get started!

First, import the libraries you need

import tensorflow as tf
import numpy as np
import os
import time

Now, read the data

You can find and download transcripts of all the Harry Potter books from this Kaggle dataset. Here, I am combining all the seven books into one text file named ‘harrypotter.txt’. You can also train your model on any one book if you like. Just experiment with it!

files = ['1SorcerersStone.txt', '2ChamberofSecrets.txt', '3ThePrisonerOfAzkaban.txt', '4TheGobletOfFire.txt', '5OrderofthePhoenix.txt', '6TheHalfBloodPrince.txt', '7DeathlyHollows.txt']
# Concatenate the seven books into a single file
with open('harrypotter.txt', 'w') as outfile:
  for file in files:
    with open(file) as infile:
      outfile.write(infile.read())
# Read the combined text back in
with open('harrypotter.txt') as f:
  text = f.read()

Looking at the data

print(text[:300])

“Harry Potter and the Sorcerer’s Stone
CHAPTER ONE
THE BOY WHO LIVED
Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much. They were the last people you’d expect to be involved in anything strange or mysterious, because they”³

Processing the data

We map all the unique character strings in vocab to numbers by making two look-up tables:

mapping the characters to numbers (char2index)
mapping the numbers back to the characters (index2char)

Then we convert our text to numbers.

vocab = sorted(set(text))
char2index = {u:i for i, u in enumerate(vocab)}
index2char = np.array(vocab)
text_as_int = np.array([char2index[c] for c in text])
# how it looks:
print('{} -- characters mapped to int -- > {}'.format(repr(text[:13]), text_as_int[:13]))

'Harry Potter ' -- characters mapped to int -- > [39 64 81 81 88 3 47 78 83 83 68 81 3]
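If you want to convince yourself that the two look-up tables are inverses of each other, here is a small round trip on a toy string, using plain Python instead of NumPy. The indices differ from the output above because the vocabulary here is built from this one string only.

```python
text = "Harry Potter"
vocab = sorted(set(text))
char2index = {ch: i for i, ch in enumerate(vocab)}
index2char = vocab  # a plain list works as the reverse table

# Encode to integers, then decode back to characters
encoded = [char2index[ch] for ch in text]
decoded = "".join(index2char[i] for i in encoded)

print(vocab)
print(encoded)
print(decoded)
```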

Each input sequence for our model will contain seq_length number of characters from the text, and its corresponding target sequence will be of the same length with all characters shifted one place to the right. So we break the text into chunks of seq_length+1.⁴

tf.data.Dataset.from_tensor_slices converts the text vector into a stream of character indices, and the batch method lets us group these characters into sequences of the required length.

By using the map method to apply a simple function to each batch, we create our inputs and targets.

seq_length = 100
examples_per_epoch = len(text)//(seq_length+1)
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

def split_input_target(data):
  input_text = data[:-1]
  target_text = data[1:]
  return input_text, target_text

dataset = sequences.map(split_input_target)
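The chunking and the input/target split are easier to see on a tiny example. This is a plain-Python sketch of the same idea, working on a string with a seq_length of 4 instead of the tf.data pipeline; `make_chunks` is a helper name I made up for illustration.

```python
def make_chunks(ids, seq_length):
    """Cut ids into consecutive chunks of seq_length + 1, dropping a short remainder."""
    size = seq_length + 1
    return [ids[i:i + size] for i in range(0, len(ids) - size + 1, size)]

def split_input_target(chunk):
    """The target is the input shifted by one position."""
    return chunk[:-1], chunk[1:]

chunks = make_chunks("Hello Harry", seq_length=4)
pairs = [split_input_target(c) for c in chunks]

for input_text, target_text in pairs:
    print(repr(input_text), "->", repr(target_text))
```

At every position the model sees the input character and is asked to predict the character that sits one step ahead of it in the target.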

Before feeding this data into the model, we shuffle the data and divide it into batches. tf.data maintains a buffer in which it shuffles elements.

BATCH_SIZE = 64
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
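To get a feel for what a shuffle buffer does, here is a simplified pure-Python model of the idea. It is not TensorFlow's actual implementation: it fills a buffer, emits a random element from it, and refills that slot with the next incoming element.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)       # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]             # emit a random buffered element...
        buffer[idx] = item            # ...and put the new element in its slot
    rng.shuffle(buffer)               # input exhausted: drain what is left
    yield from buffer

out = list(buffered_shuffle(range(20), buffer_size=5))
print(out)
```

One consequence of this design is that the first element emitted always comes from the first buffer_size elements, so a buffer much smaller than the dataset only shuffles locally.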

Building the Model

Given all the characters computed until this moment, what will the next character be? This is what we will be training our RNN model to predict.

I have used tf.keras.Sequential to define the model since all the layers in it only have a single input and produce a single output. The different layers used are:

tf.keras.layers.Embedding : This is the input layer. An embedding is used to map all the unique characters to vectors in multi-dimensional space, having embedding_dim dimensions.
tf.keras.layers.GRU: A type of RNN with rnn_units number of units. (You can also use an LSTM layer here to see what works best for your data.)
tf.keras.layers.Dense: This is the output layer, with vocab_size outputs.

It is also useful to define all the hyper-parameters separately so that it’s easier for you to change them later without editing the model definition.

text generation training example. Source

vocab_size = len(vocab)
embedding_dim = 300
# Number of RNN units in each GRU layer
rnn_units1 = 512
rnn_units2 = 256
rnn_units = [rnn_units1, rnn_units2]

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  # rnn_units is a list with one entry per GRU layer
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
       batch_input_shape=[batch_size, None]),
    tf.keras.layers.GRU(rnn_units[0], return_sequences=True,
       stateful=True, recurrent_initializer='glorot_uniform'),
    tf.keras.layers.GRU(rnn_units[1], return_sequences=True,
       stateful=True, recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)
  ])
  return model

model = build_model(
  vocab_size=vocab_size,
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)
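It can help to trace how the tensor shape changes as a batch flows through these layers. The helper below is a plain-Python bookkeeping sketch, not a TensorFlow call, and the vocab_size of 90 is a placeholder, since the real value is len(vocab) for your text.

```python
def trace_shapes(batch_size, seq_length, vocab_size, embedding_dim, rnn_units):
    """Return (layer name, output shape) pairs for the stacked model described above."""
    shapes = [("input ids", (batch_size, seq_length))]
    shapes.append(("Embedding", (batch_size, seq_length, embedding_dim)))
    for i, units in enumerate(rnn_units, start=1):
        # return_sequences=True keeps one output per time step
        shapes.append((f"GRU {i}", (batch_size, seq_length, units)))
    shapes.append(("Dense (logits)", (batch_size, seq_length, vocab_size)))
    return shapes

# vocab_size=90 is a placeholder; use len(vocab) for your own text.
shapes = trace_shapes(64, 100, 90, 300, [512, 256])
for name, shape in shapes:
    print(f"{name:15s} {shape}")
```

The last line is the important one: for every character position in every sequence, the model outputs one score per character in the vocabulary.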

Training the model

The standard tf.keras.losses.sparse_categorical_crossentropy loss function works for our model because it is applied across the last dimension of the predictions. We set from_logits to True because the model returns logits. Then we choose the adam optimizer and compile our model.

def loss(labels, logits):
  return tf.keras.losses.sparse_categorical_crossentropy(labels,
         logits, from_logits=True)

model.compile(optimizer='adam', loss=loss, metrics=['accuracy'])
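To see what this loss computes for a single character, here is a plain-Python sketch of sparse categorical cross-entropy from logits: the negative log of the softmax probability assigned to the true class. The logits are made-up values over a four-character vocabulary.

```python
import math

def sparse_categorical_crossentropy(label, logits):
    """Negative log-probability of the true class, computed from raw logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

uniform = sparse_categorical_crossentropy(0, [0.0, 0.0, 0.0, 0.0])
confident_right = sparse_categorical_crossentropy(2, [0.0, 0.0, 5.0, 0.0])
confident_wrong = sparse_categorical_crossentropy(0, [0.0, 0.0, 5.0, 0.0])
print(uniform, confident_right, confident_wrong)
```

A model that spreads its probability evenly scores log(vocabulary size), which makes a handy sanity check for the loss of an untrained model.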

You can configure a checkpoint callback like this so that the model weights are saved after each epoch during training.

# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
   filepath=checkpoint_prefix, save_weights_only=True)

The training time of each epoch depends on your model layers and the hyper-parameters used. I have set epochs to 50 to see how accuracy and loss change over time, but you may not need to train for all 50 epochs. Stop training when the loss starts to increase or stays constant for a few epochs. The path of the most recent checkpoint is stored in latest_check. If you are using Google Colab, set the runtime to GPU to reduce training time.

EPOCHS= 50
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])
latest_check = tf.train.latest_checkpoint(checkpoint_dir)
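The advice to stop when the loss stops improving can be written down as a simple patience rule. Keras provides a tf.keras.callbacks.EarlyStopping callback for this purpose; the helper below is only a plain-Python illustration of the rule, with a made-up name and hypothetical loss values.

```python
def should_stop(losses, patience=3, min_delta=0.0):
    """True if the last `patience` epochs did not beat the best loss seen before them."""
    if len(losses) <= patience:
        return False  # not enough history yet
    best_before = min(losses[:-patience])
    recent_best = min(losses[-patience:])
    return recent_best >= best_before - min_delta

# Hypothetical per-epoch training losses
still_improving = [2.0, 1.5, 1.2, 1.0]
plateaued = [2.0, 1.0, 1.0, 1.1, 1.2]

print(should_stop(still_improving))  # loss keeps falling
print(should_stop(plateaued))        # no improvement over the last 3 epochs
```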

Text generation

The model above was built with a fixed batch size of 64, so to use a different batch size you need to rebuild the model and reload the weights from the checkpoint before running it. I have used a batch_size of 1 to keep it simple.

(You can run a model.summary() to get insights on the layers of your model and the output shape after each layer)

model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
model.load_weights(latest_check)
model.build(tf.TensorShape([1, None]))
model.summary()

The following function now generates the text:

It accepts a start_string, initializes the RNN state, and sets the number of output characters to num_generate.
It gets the prediction distribution of the next character using start_string and the RNN state, then samples the index of the predicted character from that distribution. This index is the next input to the model.
The state returned by the model is fed back into the model so that it now has more context. After predicting the next character, the cycle continues. The network is not learning at this stage; it simply builds up its memory from the characters generated so far.⁴

The scaling value (often called the temperature) divides the logits before sampling. A lower scaling results in more predictable text, whereas a higher scaling gives more surprising text.

def generate_text(model, start_string):
  num_generate = 1000  # number of characters to generate; can be anything you like
  # Convert the start string to numbers
  input_eval = [char2index[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)
  text_generated = []
  scaling = 0.5  # kept at a lower value here
  # Here batch size == 1
  model.reset_states()
  for i in range(num_generate):
    predictions = model(input_eval)
    # remove the batch dimension
    predictions = tf.squeeze(predictions, 0)
    predictions = predictions / scaling
    # sample from the distribution at the last time step
    predicted_id = tf.random.categorical(predictions,
       num_samples=1)[-1, 0].numpy()
    # feed the predicted character back in as the next input
    input_eval = tf.expand_dims([predicted_id], 0)
    text_generated.append(index2char[predicted_id])
  return (start_string + ''.join(text_generated))
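The effect of scaling in generate_text is easiest to see on a toy distribution. This plain-Python sketch divides some made-up logits by different scaling values before applying softmax, which mirrors the division done before sampling in the function above.

```python
import math

def softmax_with_scaling(logits, scaling):
    """Divide the logits by `scaling`, then turn them into probabilities."""
    scaled = [z / scaling for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate characters
for scaling in (0.5, 1.0, 2.0):
    probs = softmax_with_scaling(logits, scaling)
    print(scaling, [round(p, 3) for p in probs])
```

With a small scaling almost all of the probability lands on the top character, so sampling rarely strays from it. A large scaling flattens the distribution and lets less likely characters through.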

And you’re done!

Outputs

You can try giving it different start strings to get different outputs.

Here is a part of the output using my favorite character:

print(generate_text(model, start_string=u"Severus Snape"))

Severus Snape moved to the scarlet Hogwarts students. Hermione said, “Well, I think it’s all right, all right, a bit dead before. . . .”
“I think I’ll have to go to the other than you be to help him a question of the staff table and the doors opened and he stared at the clock to Harry. “I think it make the sword of Gryffindor, who was there too, he was on his pillows, and he and Ron stared at him. “I am sure we can bother the boy — “
“You should have been there,” said Ron, and he took a strange and color.
“I mean, he was a really good …

You can also try different sentences:

Voldemort died of coronavirus.”
“You didn’t know what to do,” said Harry, “it was a surrounding cloak, he was the one who sustain you to go to the way.”
“Yeah, well, I think you might have done that!” she said, striding up the steps, and the strength were so far as he was a pretty great tent that was the first time they might have realized I saw him to be devastated and screaming of the crowd through the darkness at the time shouts and silence.
“You see, Harry!”
“I don’t know, see you haven’t got anything to do with a prater of the Ministry of Magic …

Here is one example if you train the model using just the first book, Sorcerer’s Stone³:

Dumbledore in the Leaky Cauldron, now empty. Harry had never been to London before. Although Hagrid seemed very cold and green eyes. He was still shaking.
Harry sat down next to the bowl of peas. “What did you talk to Professor Dumbledore.”
She eyed him with a mixture of shock and suspicion.
“Who’s there?” he said suddenly as they climbed the street. He could just see the bundle of blankets on the step of number four.
Dudley’s favorite punching bag was Harry, but he couldn’t often catch him. Harry didn’t say anything …

You’ll see that the model knows when to capitalize words and when to start a new paragraph, and that it imitates a magical writing vocabulary!

Mischief Managed.

To make the sentences more coherent, you can improve the model by

changing parameter values like seq_length, rnn_units, embedding_dim, and scaling to find the best settings
training it for more epochs
adding more layers of GRU / LSTM

This model can be trained on any other series you like. Do share your own stories in the comments and have fun!

References:

[1] J.K. Rowling, Harry Potter and the Deathly Hallows, 2007

[2] Denny Britz, Recurrent Neural Network Tutorial, Part 4: Implementing a GRU/LSTM RNN with Python and Theano, October 27, 2015

[3] J.K. Rowling, Harry Potter and the Sorcerer’s Stone, 1998

[4] TensorFlow, Text generation with an RNN