Mastering the Fundamentals: Differences Between Sample, Batch, Iteration, and Epoch

A Beginner’s Guide to Mastering the Fundamentals of Machine Learning

Clément Delteil 🌱
Towards AI


Two pedestrian signs pointing in different directions
Photo by Robert Ruggiero on Unsplash

When you are new to Machine Learning, it is common to feel overwhelmed by information. There are so many different models and types of learning, and the scientific literature is so prolific these days that it gets even worse!

As a reminder, when we talk about Machine Learning, we refer to

The field of study that gives computers the ability to learn without being explicitly programmed — Arthur Samuel

It breaks down into many subcategories, from supervised learning to clustering. The concepts we will talk about today, however, are most frequently used in Deep Learning.

So, if you’re interested in this field, you’ve probably already heard of Sample, Batch, Iteration, and Epoch, but do you really know the difference between them?

In this story, I will introduce these 4 concepts using Stochastic Gradient Descent and the MNIST dataset as illustrations.

Hopefully, at the end of this story, you won’t confuse them anymore!

Stochastic Gradient Descent

The key to machine learning is the optimization of the loss function and the cost function. The loss function measures the difference between the model's predicted output and the actual output for a single training example. The cost function is more general: it is a function of the loss computed over the entire training set, and it is this quantity that the model's parameters are adjusted to minimize during training.
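To make the distinction concrete, here is a minimal sketch using a squared-error loss (my own choice for illustration; the article does not commit to a specific loss): the loss is computed per example, and the cost averages those losses over the training set.

```python
import numpy as np

# Hypothetical predictions and targets for a 5-sample training set (made-up numbers)
y_pred = np.array([2.5, 0.0, 2.1, 7.8, 4.2])
y_true = np.array([3.0, -0.5, 2.0, 7.0, 4.0])

def loss(pred, true):
    """Squared-error loss for a SINGLE training example."""
    return (pred - true) ** 2

def cost(preds, trues):
    """Cost: average of the per-example losses over the whole training set."""
    return np.mean([loss(p, t) for p, t in zip(preds, trues)])

print(loss(y_pred[0], y_true[0]))  # loss of one sample
print(cost(y_pred, y_true))        # cost over all samples
```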

Machine learning problems are usually formulated as the minimization of this cost function. This is where the gradient descent algorithm comes in: when the function is differentiable, it can be used to find a minimum.

Graphical illustration of the gradient descent algorithm
Gradient Descent Illustration — Source: Sebastian Raschka

The only difference between Gradient Descent and Stochastic Gradient Descent (SGD) is the way the model’s parameters are updated.

The first takes the average of the gradients of the cost function with respect to the parameters, calculated over the entire training set, whereas the second updates the model's parameters after each training example.

This means that the algorithm takes a single example and computes the gradient of the cost function with respect to the parameters for that example only.

Now that everything is clear for gradient descent, let's define the other terms.

Sample

A sample refers to a single instance of data that is used to train or test a model. It can consist of one or more features, which are the attributes or measurements of the data.

For example, if we take the MNIST dataset used to recognize images of handwritten digits, each number in the image below represents a sample of data.

Screenshot of a sample of handwritten numbers from the MNIST dataset
Josef Steppan, CC BY-SA 4.0, via Wikimedia Commons

A machine learning model is trained on a large number of such samples and then tested on new samples to evaluate its performance.
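As a quick check of what a sample looks like in practice, here is a sketch that loads MNIST through Keras (the article does not specify a library, so this is just one convenient way): each sample is a 28×28 grayscale image paired with its digit label.

```python
from tensorflow.keras.datasets import mnist

# MNIST ships as 60,000 training samples and 10,000 test samples (70,000 in total)
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# One sample = one 28x28 grayscale image and its digit label
sample_image, sample_label = x_train[0], y_train[0]
print(sample_image.shape)  # (28, 28) -> 784 pixel features
print(sample_label)        # the digit this image represents (an integer from 0 to 9)
```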

Now that we know what a sample is, let’s define a batch!

Batch

A batch refers to a set of samples used during the training of a model. The model is updated after every batch is processed.

Suppose, for example, that we define a batch size of 5. For every 5 samples of the MNIST dataset, the predicted digits are compared to the expected ones and an error is calculated.

We can link this concept to the gradient descent algorithm detailed earlier.

  • Batch Gradient Descent:

Batch gradient descent is when the batch size equals the size of the training set.

  • Stochastic Gradient Descent:

As explained above, SGD is when the batch size equals 1.

  • Mini-Batch Gradient Descent:

Finally, mini-batch gradient descent is when the batch size is greater than 1 but strictly smaller than the dataset size.

That is to say: 1 < Batch Size < Dataset size
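In other words, the three variants differ only in how the training set is sliced before each parameter update. A small sketch (the helper function and batch sizes are mine, for illustration):

```python
import numpy as np

def make_batches(n_samples, batch_size):
    """Split sample indices into consecutive batches of the given size."""
    indices = np.arange(n_samples)
    return [indices[i:i + batch_size] for i in range(0, n_samples, batch_size)]

n = 70_000  # size of the MNIST dataset used in this story

print(len(make_batches(n, n)))    # 1      -> Batch Gradient Descent
print(len(make_batches(n, 1)))    # 70000  -> Stochastic Gradient Descent
print(len(make_batches(n, 100)))  # 700    -> Mini-Batch Gradient Descent
```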

Epoch

An epoch corresponds to one complete pass of the learning algorithm through the entire training dataset; the number of epochs is the number of times this pass is repeated.

In neural networks, for example, every sample in an epoch goes through a forward propagation and a back-propagation. For those not familiar with these concepts: during the training phase, we first traverse the network from left to right to compute the estimated value, compare it to the real value, and then traverse the network from right to left to propagate the resulting error.

This works because by propagating the error backwards through the network we can adjust the weights in each layer to decrease the error.

Gif of the training and error calculation of a neural network on the MNIST dataset
Neural Network Training Process by 3Blue1Brown — Under Youtube Standard License

Above, you can see the two steps mentioned. A set of features with its label arrives at the input, propagates through the neurons, and is associated with one of the output classes. Then we take the reverse route by propagating the error backwards.
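In practice, deep learning frameworks expose epochs and batch size directly. Here is a hedged sketch with Keras (the architecture and hyperparameters are arbitrary choices of mine): each epoch runs a forward pass and a back-propagation pass over every batch of the training set.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Load and flatten MNIST: 60,000 training images of 28x28 pixels
(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

# A small fully connected classifier (architecture chosen only for illustration)
model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# batch_size: how many samples are processed before each weight update
# epochs: how many complete passes are made over the training set
model.fit(x_train, y_train, batch_size=100, epochs=5)
```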

Iteration

An iteration corresponds to the processing of one batch (and the resulting parameter update); the number of iterations needed to complete 1 epoch is therefore the number of batches in the training set.

Now that we know the definition of an epoch, we can say that the number of iterations per epoch is the number of batches the learning algorithm must process to work through the entire training dataset once.

Put into practice

Let’s continue illustrating what we have learned with the MNIST dataset.

It contains 70 000 samples. Let’s arbitrarily choose a batch size of 100 and a number of epochs equal to 500.

If we apply what we’ve learned in this story we can say that:

  • The dataset will be divided into 700 batches (# batches per epoch = 70 000 / 100), and for every 100 samples, the model’s weights will be updated.
  • One epoch will therefore represent 700 updates of the model’s parameters. And since we set the number of epochs to 500, we’ll have to go through the dataset 500 times, which represents a total of 350 000 batches (# total batches = # epochs × # batches per epoch = 500 × 700), as verified in the short sketch below.

Note: In the above equations, the symbol “#” stands for “number of”.
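The same arithmetic can be checked in a few lines (a small sketch mirroring the numbers above):

```python
dataset_size = 70_000  # MNIST samples
batch_size = 100
epochs = 500

batches_per_epoch = dataset_size // batch_size   # 700 iterations (weight updates) per epoch
total_batches = batches_per_epoch * epochs       # 350 000 batches over the whole training run

print(batches_per_epoch, total_batches)  # 700 350000
```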

Conclusion

In this story, I’ve introduced 4 key concepts to Machine Learning: Sample, Batch, Iteration, and Epoch, using Stochastic Gradient Descent and the MNIST dataset as illustrations.

Here are the takeaways from this short story:

  • A Sample refers to a single instance of data used to train or test a model.
  • A Batch refers to a set of samples used during the training of a model.
  • An Epoch is one complete pass of the learning algorithm through the entire training dataset.
  • An Iteration corresponds to the processing of one batch; the number of iterations needed to complete 1 epoch equals the number of batches in the training set.
  • Gradient Descent is an algorithm used to minimize the cost function in Machine Learning problems.
  • The difference between all the variations of Gradient Descent presented is the number of samples used to calculate the gradient.
