Getting into Deep Learning? Here are 5 Things you Should Absolutely Know

Starting your Deep Learning Career?

Deep learning can be a complex and daunting field for newcomers. Concepts like hidden layers, convolutional neural networks, and backpropagation keep coming up as you try to grasp deep learning topics.

The five essentials for starting your deep learning journey are:

  1. Getting your system ready
  2. Python programming
  3. Linear Algebra and Calculus
  4. Probability and Statistics
  5. Key Machine Learning Concepts

1. Getting your System Ready for Deep Learning

For learning a new skill, say cooking, you would first need to have all the equipment. You would need tools like a knife, a cooking pan, and of course, a gas stove! You would also need to know how to use the tools given to you.

GPU (Graphics Processing Unit):

You would need a GPU to work with image and video data for most deep learning projects. You can build a deep learning model on your laptop/PC without the GPU as well, but then it would be extremely time-consuming to do. The main advantages a GPU has to offer are:

  1. In a CPU+GPU combination, the CPU assigns complex tasks to the GPU and keeps the other tasks for itself, thereby saving a lot of time


TPU, or Tensor Processing Unit, is essentially a co-processor which you use with the CPU. For many deep learning workloads, a TPU can be faster and more cost-effective than a GPU, which makes building deep learning models more affordable.

2. Python Programming

Continuing the same analogy of learning to cook, you have now got the hang of operating a knife and a gas stove. But what about the skills and the recipes needed to actually cook food?

  • That being said, rather than mastering the vast ocean that is programming in Python, you can start off by learning about some specific libraries exclusively geared towards machine learning and dealing with data

1. Variables and data types in Python

The main data types in Python are:

  • Float: Decimal numbers
  • String: a single character or a sequence of characters
  • Bool: to hold the 2 boolean values — True and False
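To make these data types concrete, here is a minimal sketch. The variable names and values are my own illustrative assumptions, not part of any library.

```python
# Each value below has one of the basic built-in types listed above.
price = 19.99            # float: a decimal number
name = "deep learning"   # str: a sequence of characters
is_ready = True          # bool: one of the two values True and False

# type() tells you which data type Python has assigned to a value
for value in (price, name, is_ready):
    print(repr(value), "->", type(value).__name__)
```

Notice that you never declare the type yourself. Python infers it from the value you assign.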

2. Operators in Python

There are 5 main types of operators in Python:

  • Comparison operators: like <, >, <=, >=, ==, !=
  • Logical operators: and, or, not
  • Identity operators: is, is not
  • Membership operators: in, not in
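Here is a small sketch of the comparison, logical, identity, and membership operators in action. The variables (marks, subjects, and so on) are hypothetical values chosen only for illustration.

```python
marks = 72
subjects = ["maths", "physics"]
same_list = subjects            # a second name for the very same object
copied_list = list(subjects)    # equal contents, but a different object

passed = marks >= 40                         # comparison operator
passed_with_merit = passed and marks >= 70   # logical operator
is_same_object = same_list is subjects       # identity: same object
is_copy_same = copied_list is subjects       # identity: different object
copy_is_equal = copied_list == subjects      # comparison: equal contents
has_physics = "physics" in subjects          # membership operator

print(passed, passed_with_merit, is_same_object, is_copy_same, has_physics)
```

The difference between `is` and `==` trips up many beginners: `==` compares values, while `is` checks whether two names point to the same object.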

3. Data Structures in Python

Python offers a variety of data structures that we can use for different purposes. Each data structure has its unique properties that we can leverage to store different types of data. These properties are:

  • Immutable: This means that the data structure cannot be changed. If a data structure is mutable, it means that it can be changed
my_list = [1, 3, 7, 9]
my_set = {'apple', 'banana', 'cherry'}
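To see what mutable and immutable mean in practice, here is a minimal sketch that reuses the list and set above and adds a tuple (my own addition, as an example of an immutable structure).

```python
my_list = [1, 3, 7, 9]
my_list[0] = 2          # lists are mutable: changing an item in place is allowed
my_list.append(11)      # ...and so is adding a new item

my_tuple = (1, 3, 7, 9)
try:
    my_tuple[0] = 2     # tuples are immutable: this raises a TypeError
    tuple_changed = True
except TypeError:
    tuple_changed = False

my_set = {'apple', 'banana', 'cherry'}
my_set.add('apple')     # sets keep only unique items, so nothing changes here

print(my_list, tuple_changed, len(my_set))
```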

4. Control Flow in Python

Control flow means controlling the flow of the execution of your code. We execute the code line by line, and what we execute on a line affects how we write the next line of code:

Conditional statements

These are used to set a condition with the comparison operators we saw earlier.

if marks >= 40:
    print("Pass")


Example: We have a list having values from 1 to 5 and we need to multiply each value in this list with 3:

numbers_list = [1, 2, 3, 4, 5]
for each_number in numbers_list:
    print(each_number * 3)
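Conditional statements and loops are usually used together. As a rough sketch, assuming a hypothetical list of marks and a pass mark of 40, you could label each score like this:

```python
marks_list = [35, 40, 82, 12, 67]   # hypothetical marks, for illustration only
PASS_MARK = 40                      # assumed passing mark

results = []
for marks in marks_list:
    # the condition decides which branch runs for this particular value
    if marks >= PASS_MARK:
        results.append("pass")
    else:
        results.append("fail")

print(results)
```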

5. Pandas Python

This is one of the first libraries you would come across when you start Machine Learning and Deep Learning. An extremely popular library, Pandas is just as essential for deep learning as it is for machine learning.

3. Linear Algebra and Calculus for Deep Learning

There is a common myth that Deep Learning requires advanced knowledge of linear algebra and calculus. Well, let me dispel that myth right here.

Linear Algebra for Deep Learning

1. Scalars and vectors: While scalars only have magnitude, vectors have both direction and magnitude.

  • Cross product: The Cross product of two vectors returns another vector which is orthogonal (right-angled) to both
For example, with a = [1, -3, 5] and b = [4, -2, -1]:
a . b = (a1 * b1) + (a2 * b2) + (a3 * b3)
      = (1 * 4) + (-3 * -2) + (5 * -1)
      = 5
a X b = [c1, c2, c3], where
c1 = (a2*b3) - (a3*b2) = 13
c2 = (a3*b1) - (a1*b3) = 21
c3 = (a1*b2) - (a2*b1) = 10
so a X b = [13, 21, 10]
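  • Matrix Multiplication: Multiplying 2 matrices means calculating the dot product of each row of the first matrix with each column of the second. The result has as many rows as the first matrix and as many columns as the second, so its dimensions can differ from those of both input matrices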
  • Transpose of the matrix: We swap the rows and the columns in a matrix to get its transpose
  • Inverse of the matrix: Conceptually similar to inverting numbers, the inverse of a matrix multiplied with the matrix gives you an identity matrix (only square, non-singular matrices have an inverse)
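The operations above are simple enough to sketch in plain Python. The helper functions below (dot, cross, transpose, matmul) are my own illustrative names, and the vectors a = [1, -3, 5] and b = [4, -2, -1] are assumed because they reproduce the cross product [13, 21, 10] shown above. In practice you would use a numerical library rather than hand-written loops.

```python
def dot(a, b):
    """Sum of the element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product of two 3-dimensional vectors."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return [a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1]

def transpose(m):
    """Swap the rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(p, q):
    """Each output cell is the dot product of a row of p and a column of q."""
    q_cols = transpose(q)
    return [[dot(row, col) for col in q_cols] for row in p]

a = [1, -3, 5]
b = [4, -2, -1]
c = cross(a, b)
print(dot(a, b), c)
print(dot(a, c), dot(b, c))      # both are 0: c is orthogonal to a and b

P = [[1, 2, 3], [4, 5, 6]]       # 2 x 3 matrix
Q = [[1, 0], [0, 1], [2, 2]]     # 3 x 2 matrix
print(matmul(P, Q))              # the result is a 2 x 2 matrix
```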

Calculus for Deep Learning

The value we are trying to predict, say, ‘y’, is whether the image is a cat or a dog. This value can be expressed as a function of the input variables/input vectors. Our main aim is to make this predicted value as close as possible to the actual value.

If y = f(x),
then the derivative of y with respect to x is given as
dy/dx = change in y / change in x (for a very small change in x)
If y = f(g(x)),
where g(x) is a function of x and f is a function of g(x), then by the chain rule
dy/dx = df/dg * dg/dx
For example, if y = sin(x^2), then
dy/dx = d(sin(x^2))/d(x^2) * d(x^2)/dx = cos(x^2) * 2x
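If you want to convince yourself that the chain rule result cos(x^2) * 2x is right, you can compare it against a numerical estimate of "change in y / change in x". This is only a sketch: the function names, the step size h, and the test point x0 are my own assumptions.

```python
import math

def f(x):
    # y = sin(x^2)
    return math.sin(x ** 2)

def analytic_derivative(x):
    # the chain rule result: cos(x^2) * 2x
    return math.cos(x ** 2) * 2 * x

def numeric_derivative(func, x, h=1e-6):
    # central difference: change in y divided by a small change in x
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 1.5
print(analytic_derivative(x0), numeric_derivative(f, x0))
```

The two printed numbers should agree to several decimal places.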

4. Probability and Statistics for Deep Learning

Just like Linear Algebra, ‘Statistics and Probability’ is its own new world of mathematics. It can be quite intimidating for beginners and even seasoned data scientists sometimes find it challenging to recall advanced statistical concepts.

  • Descriptive statistics is the study of the mathematical tools to describe and represent the data
  • Probability measures the likelihood that an event will occur

Descriptive Statistics

Let me give you a simple example. Suppose you have the marks scored by 1000 students on an entrance exam (the marks are out of 100). Someone asks you — how did the students perform in this exam? Would you present that person with a detailed study of the scores of the students? In the future, you might, but initially, you can start off by saying that the average score was 68. This is the mean of the data.

  • Variance
  • Normal distribution
  • Central Limit Theorem
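Python's built-in statistics module is enough to try out the basic descriptive measures. The sketch below uses five made-up scores (a stand-in for the 1000 students) that happen to average 68.

```python
import statistics

scores = [55, 68, 72, 81, 64]   # hypothetical marks out of 100

mean_score = statistics.mean(scores)       # the average score
variance = statistics.pvariance(scores)    # average squared distance from the mean
std_dev = statistics.pstdev(scores)        # square root of the variance

print("mean:", mean_score)
print("variance:", variance)
print("standard deviation:", std_dev)
```

I used the population versions (pvariance, pstdev) here because the list is treated as the whole group. For a sample drawn from a larger group, statistics.variance and statistics.stdev are the usual choice.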


Based on the same example, let’s say that you are asked a question: if I pick a student randomly from these 1000 students, what are the chances that he/she has passed the test? The concept of probability will help you answer this question. If you get a probability of 0.6, it implies that there is a 60% chance that he/she passed it (assuming the passing mark is 40).
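One simple way to arrive at such a number is to count: the probability of picking a student who passed is the number of students who passed divided by the total number of students. Here is a sketch with ten made-up marks and an assumed pass mark of 40.

```python
marks = [35, 40, 82, 12, 67, 90, 38, 55, 73, 29]   # hypothetical marks
PASS_MARK = 40                                     # assumed passing mark

passed = [m for m in marks if m >= PASS_MARK]

# probability = favourable outcomes / all possible outcomes
p_pass = len(passed) / len(marks)
print("P(pass) =", p_pass)
```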

  • Was a high score by the student the result of studying hard or because the questions in the test were easy?

5. Key Machine Learning Concepts for Deep Learning

Here’s the good news — you don’t need to know the entire gamut of the Machine Learning algorithms that exist today. Not to say that they are insignificant, but just from the point of view of starting deep learning, there are not many you need to be acquainted with.

Supervised and Unsupervised algorithms

  • Supervised Learning: In these algorithms, we know the target variable (what we want to predict) and we know the input variables (the independent features which contribute to the target variable). We then generate an equation that gives the relationship between the input and the target variables and apply it to the data we have. Examples: kNN, SVM, Linear Regression, etc.
  • Unsupervised Learning: In unsupervised learning, we do not know the target variable. It is mainly used to cluster our data into groups, and we can identify the groups after we have clustered the data. Examples of unsupervised learning include k-means clustering, apriori algorithm, etc.

Evaluation Metrics

Building a predictive model is not the only step required in deep learning. You need to check how good the model is and keep improving it until you reach the best model you can.

  • Accuracy
  • Precision and Recall
  • F1-score
  • Log Loss
  • R2 and adjusted R2
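The first three metrics in this list can be computed by hand for a binary classifier, which is a good way to understand them. In this sketch, y_true and y_pred are made-up labels (1 is the positive class) that I chose only for illustration.

```python
y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # hypothetical actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical predicted labels

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # false negatives

accuracy = (tp + tn) / len(pairs)    # share of all predictions that are correct
precision = tp / (tp + fp)           # share of predicted positives that are correct
recall = tp / (tp + fn)              # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two

print(accuracy, precision, recall, f1)
```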

Validation Techniques

A deep learning model trains itself on the data provided to it. However, as I mentioned above, we need to improve this model and we need to check its performance. The true mettle of the model can only be observed when we give it totally new (although cleaned) data.
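The simplest validation technique is to hold back part of your data and never show it to the model during training. Here is a minimal sketch of such a hold-out split. The function name train_valid_split, the 20% validation fraction, and the seed are my own assumptions, not a library API.

```python
import random

def train_valid_split(rows, valid_fraction=0.2, seed=42):
    """Shuffle the rows and hold back a fraction of them for validation."""
    rows = list(rows)                    # copy, so the caller's data is untouched
    random.Random(seed).shuffle(rows)    # a fixed seed makes the split repeatable
    n_valid = int(len(rows) * valid_fraction)
    return rows[n_valid:], rows[:n_valid]

data = list(range(10))                   # stand-in for 10 rows of real data
train, valid = train_valid_split(data)
print(len(train), "training rows,", len(valid), "validation rows")
```

The model is trained only on train, and valid plays the role of the totally new data.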

Gradient Descent

Let us go back to the calculus we saw earlier and the need for optimization. How do we know that we have achieved the best model there can be? We can make small changes in the equation and at each change, we check if we are closer to the actual value.
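Here is a rough sketch of that idea for the smallest possible model, y = w * x, with a single weight w. The data, the learning rate, and the number of steps are assumptions I picked so that the example converges. At every step we use the derivative of the mean squared error to decide which way to nudge w.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]     # generated from y = 3x, so the best w is 3

w = 0.0                        # start from an arbitrary guess
learning_rate = 0.01           # size of each small change

for step in range(500):
    # derivative of the mean of (w*x - y)^2 with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    # move w a little in the direction that reduces the error
    w -= learning_rate * grad

print("learned w:", w)
```

If the learning rate is too large, the updates overshoot and the error grows instead of shrinking, so it is worth experimenting with this value.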

Linear Models

What is the simplest equation you can think of? Let me list a few:

  1. 4x + 3y -2z = 56
  2. Y = x/(1-x)

Underfitting and Overfitting

You will often come across situations where your deep learning model performs very well on the training set but gives you poor accuracy on the validation set. This happens because the model has memorized each and every pattern in the training set, including the noise, and so it fails to generalize to the unseen data in the validation set. This is called overfitting the data, and it usually means the model is too complex for the amount of data it has.

  • Underfitting is that student who doesn’t perform well in the class, nor in the exam. We aim for that model/student who need not know all the problems discussed in the class but performs well enough in the exam to show that he/she knows the concepts


In simplest terms, bias is the difference between the actual value and the predicted value. Variance is measured by the change in the output when we change the training data.

  1. Top right: The predicted data points are centered around the bullseye (low bias), but they are far from each other (high variance)
  2. Bottom left: The predicted values are clustered together (low variance), but are pretty far from the bulls-eye (high bias)
  3. Bottom right: The predicted data points are neither close to the bullseye (high bias) nor are close to each other (high variance)


Just like the Pandas library, there is another library that forms the foundation of machine learning. The sklearn (scikit-learn) library is one of the most popular libraries in machine learning. It contains a large number of machine learning algorithms which you can apply to your data in the form of functions.

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# X_train, y_train, X_valid and y_valid are assumed to be already-prepared splits of your data
reg = LinearRegression()
# train your model - remember how we train the model on our train set?
reg.fit(X_train, y_train)
# predict on our validation set to improve it
y_pred = reg.predict(X_valid)
# evaluation metric: MSE, which guides further improvement of our model
print('Mean Squared Error:', mean_squared_error(y_valid, y_pred))

End Notes

In this article, we covered 5 essential things you need to know before building your first deep learning model. It is here that you will encounter the popular deep learning frameworks like PyTorch and TensorFlow. They have been built with Python in mind and you can now easily understand working with them since you have a good grasp of Python.



Data Science Product Manager at Analytics Vidhya. Masters in Data Science from University of Mumbai. Research Interest: NLP

Purva Huilgol
