Getting into Deep Learning? Here are 5 Things you Should Absolutely Know

Starting your Deep Learning Career?

It’s not easy, especially if you take an unstructured learning path and don’t cover the fundamentals first. You’ll be stumbling around like a tourist in a foreign city without a map!

Here’s the good news — you don’t need an advanced degree or a Ph.D. to learn and master deep learning. But there are certain key concepts you should know (and be well versed in) before you plunge into the deep learning world.

I’ll be covering five such essential concepts in this article. I also recommend going through the below resources to augment your deep learning experience:

The five essentials for starting your deep learning journey are:

  1. Getting your System Ready
  2. Python Programming
  3. Linear Algebra and Calculus
  4. Probability and Statistics
  5. Key Machine Learning Concepts

Let’s go over them one by one.

1. Getting your System Ready for Deep Learning

Before you write any code, it is important to set up your system for deep learning, know which tools you will need, and know how to use them.

Regardless of your operating system, Windows, Linux or Mac, it is important to know the basic commands. Here is a handy table for your reference:

Here is a great tutorial to get started with Git and the basic Git commands: Git — Tutorial.

The Deep learning boom has not only brought path-breaking research in the field of AI but has also broken new barriers in computer hardware.

GPU (Graphics Processing Unit):

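  1. It allows for parallel processing, running many simple computations at the same time
  2. In a CPU+GPU combination, the CPU hands compute-heavy, highly parallel work (such as large matrix operations) to the GPU and handles the remaining tasks itself, thereby saving a lot of time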

Here is a great video explaining the difference between a GPU and CPU:

The best part? You don’t need to buy a GPU or get one installed on your machine. There are multiple cloud computing platforms that provide GPUs either for free or at an extremely low cost. Additionally, a few of these platforms come with practice datasets and their own tutorials preloaded. Some of them are Paperspace Gradient, Google Colab, and Kaggle Kernels.

On the other hand, there are also full-fledged servers, such as Amazon Web Services EC2, which require some installation steps and customization.

Here is a table illustrating the options you have:

Deep learning has also led Google to develop its own type of processing unit, built specifically for neural networks and deep learning tasks: the TPU (Tensor Processing Unit).

TPUs

Google Colab also provides free usage of the TPU (not its full-fledged enterprise version, but a cloud version). Here is Google’s own Colab tutorial on working with TPUs and building models on them: Colab notebooks | Cloud TPU.

To summarize, here are the basic minimum hardware requirements to start cooking your deep learning model:

2. Python Programming

This is where we encounter the software required for deep learning. Python is a programming language that is used across industries for deep learning.

However, we can’t use only Python for the level of computations and operations that deep learning needs. Additional functionalities are provided by what are known as libraries in Python. A library can have hundreds of small tools, called functions, that we can use for programming.

  • While you don’t need to be a coding ninja for Deep Learning, you do need to know the basics of programming in Python
  • That being said, rather than mastering the vast ocean that is programming in Python, you can start off by learning about some specific libraries exclusively geared towards machine learning and dealing with data

Anaconda is a Python distribution that helps you keep track of your Python versions and libraries. It is a handy all-in-one tool that is quite popular, easy to work with, and has simple documentation as well. Here is how you can install Anaconda.

So what do I mean by the basics of Python? Let’s discuss this in a bit more detail.

Note: You can start learning Python in our free and popular course — Python for Data Science.

1. Variables and data types in Python

  • Int: Integer numbers, can be signed
  • Float: Decimal numbers
  • String: a single character or a sequence of characters
  • Bool: to hold the 2 boolean values — True and False
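As a small illustration of the four types above, here is a sketch with made-up values (the variable names and values are assumptions chosen only for this example):

```python
# Hypothetical values chosen only to illustrate the four basic types.
age = 25              # int
height = 1.72         # float
name = "Asha"         # str
is_student = True     # bool

for value in (age, height, name, is_student):
    # type() reports the data type Python inferred for each value
    print(value, type(value).__name__)
```

Notice that we never declare a type explicitly; Python infers it from the value we assign.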

2. Operators in Python

  • Arithmetic operators: Like +, -, *, /, etc
  • Comparison operators: like <, >, <=, >=, ==, !=
  • Logical operators: and, or, not
  • Identity operators: is, is not
  • Membership operators: in, not in
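The operator families listed above can be tried out in a few lines. This is only a sketch; the variables a, b, x, y, and z are arbitrary examples:

```python
a, b = 7, 2
print(a + b, a - b, a * b, a / b)   # arithmetic; / always returns a float
print(a > b, a == b, a != b)        # comparison operators return booleans
print(a > 5 and b > 5, a > 5 or b > 5, not a > 5)  # logical operators

x = [1, 2]
y = [1, 2]
z = x
print(x == y, x is y, x is z)       # equal values, but only z is the same object as x
print(2 in x, 5 not in x)           # membership tests
```

The identity operators are the ones that surprise beginners most often: == compares values, while is checks whether two names refer to the very same object.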

3. Data Structures in Python

  • Ordered: This means that there is a specific order in which the elements in the data structure are stored. No matter how and when we use it, this order remains the same (unless we change it explicitly)
  • Immutable: This means that the data structure cannot be changed. If a data structure is mutable, it means that it can be changed

In data science, the most frequently used data structures are lists, tuples, sets, and dictionaries:

Example: We have a list like this:

my_list = [1, 3, 7, 9]

This order will remain the same everywhere we use this list. Also, we can change this list, like removing 7, adding 11, etc.

Example: A tuple can be declared as:

my_tuple = ('apple', 'banana', 'cherry')

Now, again, this order will remain the same, but unlike a list, we cannot remove ‘cherry’, or add ‘orange’ to the tuple.

Example: A set uses curly braces like this:

my_set = {'apple', 'banana', 'cherry'}

The order is not defined for a set.

Example: A dictionary also uses curly braces with a key-value format:

my_dict = {'brand': 'Ford', 'model': 'Mustang', 'year': 1964}

Here, ‘brand’, ‘model’, and ‘year’ are the keys, with the values ‘Ford’, ‘Mustang’, and 1964 respectively. Since Python 3.7, a dictionary preserves the order in which its keys were inserted; in older versions the order was not guaranteed.
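To see what "ordered" and "immutable" mean in practice, here is a short sketch that pokes at all four structures. The literal values mirror the ones described in the examples; where the article does not show them, they are assumptions made for illustration:

```python
my_list = [1, 3, 7, 9]
my_list.remove(7)      # lists are mutable: remove an element...
my_list.append(11)     # ...and add a new one at the end

my_tuple = ('apple', 'banana', 'cherry')
try:
    my_tuple[2] = 'orange'   # tuples are immutable, so this raises TypeError
except TypeError as error:
    tuple_error = error

my_set = {'apple', 'banana', 'cherry', 'apple'}  # the duplicate is dropped

my_dict = {'brand': 'Ford', 'model': 'Mustang', 'year': 1964}
my_dict['year'] = 1965       # dictionaries are mutable: update a value by key

print(my_list, my_tuple, len(my_set), my_dict)
```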

4. Control Flow in Python

Conditional statements

Example: You need to check if a student has passed or failed. If he has obtained marks >= 40, he has passed, otherwise, he has failed.

In that case, our conditional statement would be:

marks = 55  # example score
if marks >= 40:
    print("Pass")
else:
    print("Fail")

Loops

numbers_list = [1, 2, 3, 4, 5]
for each_number in numbers_list:
    print(each_number * 3)

Try out the above snippets and you can see how easy Python is!

Fun note: Unlike many statically typed programming languages, Python doesn’t require all the elements in a data structure to be of the same type. We can totally have a list like this ["John", 153, 78.5, "A+"] or even a list of lists like [["A", 56], ["B", 36.5]]. It is this variety and flexibility of Python that has made it so popular among data scientists!

You can also avail the below free courses that cover Python and Pandas essentials:

5. Pandas Python

We store data in a variety of formats, such as CSV (Comma Separated Values) file, Excel sheets, etc. In order to work with the data in these files, Pandas provides a data structure called a Pandas dataframe (you can think of it as a table).

Dataframes and the sheer number of manipulation operations Pandas provides on dataframes make it the workhorse library for machine and deep learning.
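Here is a minimal sketch of a dataframe in action, assuming Pandas is installed. The student records and column names are invented for this example; in practice you would usually load the table from a file with a function such as pd.read_csv:

```python
import pandas as pd

# Hypothetical student records, made up for this illustration.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dina"],
    "marks": [82, 35, 67, 91],
})

df["passed"] = df["marks"] >= 40          # add a derived column
top_scorers = df[df["marks"] > 80]        # filter rows with a boolean mask
average_marks = df["marks"].mean()        # aggregate a column

print(df)
print(top_scorers["name"].tolist(), average_marks)
```

Each of these operations works on a whole column at once, which is a large part of why dataframes are so convenient for preparing data.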

You can take this free and easy course to get started with Pandas if you haven’t already: Pandas for Data Analysis in Python.

Now, if you read the list of 5 things we started out with, you might have a question: What do we do with all the mathematics in deep learning?

Well, let’s find out!

3. Linear Algebra and Calculus for Deep Learning

You only need to recollect your high school-level math to start your Deep Learning journey!

Let us take a simple example. We have images of cats and dogs and we want the machine to tell us which animal is present in any given image:

Now, we can easily identify the cat and the dog here. But how will the machine distinguish the two? The only way is to give this data to the model in the form of numbers, and that is where we need linear algebra. We basically convert the images of a cat and a dog into numbers. These numbers can be either expressed as vectors or as matrices.

We will cover some key terms and some great resources you can learn from.

Linear Algebra for Deep Learning

  • Dot product: The Dot product of 2 vectors returns a scalar value
  • Cross product: The Cross product of two vectors returns another vector which is orthogonal (right-angled) to both

Example: If we have 2 vectors a = [1, -3, 5] and b = [4, -2, -1], then:

a) Dot product:

a . b = (a1 * b1) + (a2 * b2) + (a3 * b3)
= (1 * 4) + (-3 * -2) + (5 * -1)
= 5

b) Cross product:

a X b = [c1, c2, c3] = [13, 21, 10]

where,

c1 = (a2*b3) - (a3*b2)
c2 = (a3*b1) - (a1*b3)
c3 = (a1*b2) - (a2*b1)
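If you would like to check these calculations yourself, the two formulas translate almost directly into Python. This is a plain-Python sketch (the helper names dot and cross are our own); libraries such as NumPy provide optimized equivalents:

```python
def dot(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product of two 3-dimensional vectors."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return [a2 * b3 - a3 * b2,
            a3 * b1 - a1 * b3,
            a1 * b2 - a2 * b1]

a = [1, -3, 5]
b = [4, -2, -1]
c = cross(a, b)
print(dot(a, b), c)
# The cross product is orthogonal to both inputs, so these dot products are zero.
print(dot(a, c), dot(b, c))
```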

2. Matrices and Matrix Operations: A matrix is an array of numbers in the form of rows and columns. Now, for example, the above image of a cat can be written as a matrix of pixels:

Just like numbers, we can perform operations like adding and subtracting two matrices. However, operations like multiplication and division are performed slightly differently from the regular way:

  • Scalar Multiplication: When we multiply a single scalar value with a matrix, we multiply the scalar with all the elements in the matrix
  • Matrix Multiplication: Each entry of the product is the dot product of a row of the first matrix and a column of the second. The number of columns in the first matrix must equal the number of rows in the second, so an m x n matrix multiplied by an n x p matrix gives an m x p matrix
  • Transpose of the matrix: We swap the rows and the columns in a matrix to get its transpose
  • Inverse of a matrix: Conceptually similar to taking the reciprocal of a number, the inverse of a matrix multiplied with the matrix gives you an identity matrix. Only square matrices with a non-zero determinant have an inverse
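The four operations above can be sketched in plain Python using nested lists as matrices. The helper names and the example matrices A and B are assumptions made for illustration, and the inverse is only implemented for the 2 x 2 case to keep things short; in real projects you would typically reach for NumPy instead:

```python
def scalar_multiply(k, m):
    return [[k * value for value in row] for row in m]

def transpose(m):
    return [list(column) for column in zip(*m)]

def matmul(a, b):
    # Entry (i, j) is the dot product of row i of a and column j of b.
    columns_of_b = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in columns_of_b]
            for row in a]

def inverse_2x2(m):
    (p, q), (r, s) = m
    det = p * s - q * r
    if det == 0:
        raise ValueError("matrix is singular and has no inverse")
    return [[s / det, -q / det], [-r / det, p / det]]

A = [[4, 7], [2, 6]]
B = [[1, 2, 3], [4, 5, 6]]

print(scalar_multiply(2, A))
print(matmul(A, B))               # (2 x 2) times (2 x 3) gives a 2 x 3 matrix
print(transpose(B))               # 3 x 2
print(matmul(A, inverse_2x2(A)))  # approximately the identity matrix
```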

You can refer to this excellent Khan Academy course on Linear Algebra to learn the above concepts in detail. You can also check out 10 powerful applications of linear algebra here.

Calculus for Deep Learning

Now, imagine dealing with thousands of cat images and dog images. These are surely cute to look at, but you can imagine that working on these images and numbers is not easy at all!

Since deep learning essentially involves large amounts of data and complex machine learning models, working with both is often time and resource expensive. That is why it is important to optimize our deep learning model in such a way that it is able to predict as accurately as possible without using too many resources and time.

This is where the crux of the calculus used in deep learning lies: Optimization.

In any deep learning or machine learning model, we can express the output as a mathematical function of the input variables. Thus, we need to see how our output changes with changes in each of the input variables. We need derivatives to do this since derivatives express the rate of change.

If y = f(x),
then the derivative of y with respect to x is given as
dy/dx = change in y / change in x (in the limit as the change in x approaches zero)

Geometrically, if we express f(x) as a graph, the derivative at a point is also the slope of the tangent to the graph at that point.

Here is a figure to help you understand it:

The derivative we have seen above talks only of one variable, x. However, in deep learning, there can be hundreds of variables on which our final output, y, depends. In such cases, we need to calculate the rate of change in y with respect to each of these input variables. Here is where partial derivatives come into the picture.

Partial derivatives: Basically, we pick one variable and treat all the other variables as constants. Then, we calculate the derivative of y with respect to that one variable. We repeat this for each variable in turn.
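To make this concrete, here is a rough numerical sketch. The function f and the helper partial are hypothetical, chosen only for illustration; the helper nudges one input slightly while holding the others fixed and measures how much the output changes:

```python
def f(x, z):
    # Hypothetical example function: y = x^2 * z + 3z
    return x**2 * z + 3 * z

def partial(func, args, index, h=1e-6):
    """Central-difference estimate of d(func)/d(args[index]), other args held fixed."""
    up = list(args)
    down = list(args)
    up[index] += h
    down[index] -= h
    return (func(*up) - func(*down)) / (2 * h)

point = (2.0, 5.0)
dy_dx = partial(f, point, 0)   # analytically 2*x*z = 20
dy_dz = partial(f, point, 1)   # analytically x^2 + 3 = 7
print(dy_dx, dy_dz)
```

The printed values are approximations, so expect them to match the analytical results only up to a small numerical error.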

Chain Rule: Oftentimes, the function of y in terms of the input variables can be much more complicated. How do we calculate the derivative then? The chain rule helps us compute this:

If y = f(g(x)),
where g is a function of x, and f is a function of g(x), then
dy/dx = df/dg * dg/dx

Let us consider a relatively simple example:

y = sin(x^2)

Thus, using the Chain Rule:

dy/dx = d(sin(x^2))/d(x^2) * d(x^2)/dx = cos(x^2) * 2x
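One way to convince yourself that the chain rule works is to compare its result with a direct numerical estimate of the slope. The sketch below does this for y = sin(x^2); the function names and the test point x0 are arbitrary choices made for this example:

```python
import math

def y(x):
    return math.sin(x**2)

def dy_dx_chain_rule(x):
    # outer derivative cos(x^2) times inner derivative 2x
    return math.cos(x**2) * 2 * x

def dy_dx_numeric(x, h=1e-6):
    # central-difference approximation of the slope at x
    return (y(x + h) - y(x - h)) / (2 * h)

x0 = 1.3  # arbitrary test point
print(dy_dx_chain_rule(x0), dy_dx_numeric(x0))
```

The two printed numbers should agree to several decimal places.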

Learning Resources for Calculus in Deep Learning:

4. Probability and Statistics for Deep Learning

It cannot be denied that statistics forms the backbone of machine learning and deep learning. Concepts from probability and statistics, such as descriptive statistics and hypothesis testing, are crucial in the industry, where the interpretability of your deep learning model is often the topmost priority.

Let us start with the basic definitions:

  • Statistics is the study of data
  • Descriptive statistics is the study of the mathematical tools to describe and represent the data
  • Probability measures the likelihood that an event will occur

Descriptive Statistics

Similarly, we can figure out more simple statements based on the data:

There you go — just with these few lines, we can say that a majority of the students performed well, but not many were able to score really high marks in the test. This is what descriptive statistics is. We represented the data of 1000 students using just 5 values.
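As a rough sketch of the idea, Python’s built-in statistics module can compute a handful of summary values in a few lines. The marks below are invented, and the particular five summary values chosen here are an assumption rather than the ones from the original example:

```python
import statistics

# Hypothetical marks for ten students, invented for this illustration.
marks = [35, 48, 52, 60, 61, 67, 70, 74, 81, 95]

summary = {
    "min": min(marks),
    "max": max(marks),
    "mean": statistics.mean(marks),
    "median": statistics.median(marks),
    "std_dev": statistics.pstdev(marks),  # population standard deviation
}
for name, value in summary.items():
    print(name, value)
```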

There are other key terms used in descriptive statistics as well, such as:

  • Standard Deviation
  • Variance
  • Normal distribution
  • Central Limit Theorem

Probability

Other questions on the same data, such as the ones below, can be answered using hypothesis testing and inferential statistics:

  • Can the entrance test be considered to be tough?
  • Was a high score by the student the result of studying hard or because the questions in the test were easy?

You can learn all about statistics and probability from the below resources:

5. Key Machine Learning Concepts for Deep Learning

There are, however, a few concepts that are crucial to build your foundation and acquaint yourself with. Let us go over these concepts.

Supervised and Unsupervised algorithms

  • Unsupervised Learning: In unsupervised learning, we do not know the target variable. It is mainly used to cluster our data into groups, and we can identify the groups after we have clustered the data. Examples of unsupervised learning include k-means clustering, apriori algorithm, etc.

Evaluation Metrics

So how do we judge the performance of a deep learning model? We use some evaluation metrics. Depending on the task, we have different evaluation metrics for regression and classification.

Evaluation metrics for Classification:

  • Confusion Matrix
  • Accuracy
  • Precision and Recall
  • F1-score
  • AUC-ROC
  • Log Loss

Evaluation metrics for Regression:

  • RMSE
  • RMSLE
  • R2 and adjusted R2

Evaluation metrics are extremely crucial in deep learning. Be it in the research domain or in the industry, your deep learning model will be judged on the value of the evaluation metric.
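To show what a few of these metrics actually compute, here is a from-scratch sketch for binary classification and for regression. The function names, labels, and predictions are all made up for illustration; in practice, sklearn.metrics offers tested implementations:

```python
import math

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def rmse(y_true, y_pred):
    """Root mean squared error for regression targets."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical labels and predictions
clf = classification_metrics([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0])
reg_error = rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
print(clf, reg_error)
```

The four counts (tp, tn, fp, fn) are exactly the cells of the confusion matrix, which is why it appears first in the list of classification metrics.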

Validation Techniques

But then, how do we improve on the model? Do we give it new data every time we want to change even a single parameter? You can imagine how time-consuming and costly such a task would be!

This is why we use validation. We divide our entire data into 3 parts: training, validation, and testing. Here is a single sentence to help you remember:

We train the model on the training set, improve it on the validation set, and finally predict on the so-far unseen test set.

Some common strategies for Cross-validation are: k-fold Cross-Validation and Leave-One-Out Cross-Validation (LOOCV).
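The idea behind k-fold cross-validation can be sketched in plain Python: split the sample indices into k folds and let each fold take a turn as the validation set. The helper below is a simplified illustration (it does not shuffle the data, and its name is our own); sklearn.model_selection provides a ready-made KFold class:

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Setting k equal to the number of samples turns this into Leave-One-Out Cross-Validation, since every fold then holds exactly one sample.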

Here’s a comprehensive article covering validation techniques and how to implement them in Python: Improve Your Model Performance using Cross-Validation (in Python / R).

Gradient Descent

It is this act of taking small steps in a promising direction that is the basic intuition behind gradient descent: at each step, we nudge the model’s parameters in the direction that reduces the error. Gradient descent is one of the most important concepts you will come across and revisit often in deep learning.
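Here is a minimal sketch of that intuition for a function of a single variable. The loss function, starting point, learning rate, and number of steps are all assumptions picked to keep the example small:

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to approach a minimum."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * gradient(x)
    return x

# Hypothetical loss f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
minimum = gradient_descent(lambda x: 2 * (x - 3), start=10.0)
print(minimum)
```

The learning rate controls the size of each step: too small and the descent is slow, too large and the steps can overshoot the minimum.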

Explanation and implementation of Gradient Descent in Python: Introduction to Gradient Descent Algorithm (along with variants) in Machine Learning.

Linear Models

  1. Y = x + 1
  2. 4x + 3y -2z = 56
  3. Y = x/(1-x)

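Did you notice what the first two functions have in common? They are linear: every variable appears only to the first power, multiplied by a constant. The third is not linear, because x also appears in the denominator. What if we could predict the value of y using linear functions like the first two?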

These would then be called linear models. You would be surprised to know how popular linear models are in the industry. They are not too complicated, are interpretable, and with well-tuned gradient descent, we can get strong evaluation metrics too! Not only this, linear models form the basis of deep learning. For instance, did you know that you can create a logistic regression model using a simple neural network?

Here’s a detailed guide covering not only linear and logistic regression but other linear models as well: 7 Regression Types and Techniques in Data Science.

Underfitting and Overfitting

On the other hand, if your deep learning model is performing poorly on both the training set as well as the validation set, it is most likely underfitting. Think of it as applying a linear equation (a too simple model) on our data when it is, in fact, non-linear (complex):

A simple analogy for overfitting and underfitting is a student’s example in a math class:

  • Overfitting is associated with that student who does rote learning of all the problems discussed in the class but is unable to answer different questions pertaining to the same concepts during the exam
  • Underfitting is that student who doesn’t perform well in the class, nor in the exam. We aim for that model/student who need not know all the problems discussed in the class but performs well enough in the exam to show that he/she knows the concepts

Check out this intuitive explanation of overfitting and underfitting, along with the comparison between them: Underfitting vs. Overfitting in Machine Learning.

Bias-Variance

Let’s quickly summarize what we can interpret from the above image:

  1. Top left: A model that is very accurate, therefore the error of our model will be low, meaning a low bias and low variance. All the data points fit within the bullseye
  2. Top right: The predicted data points are centered around the bullseye on average (low bias), but they are scattered far from each other (high variance)
  3. Bottom left: The predicted values are clustered together (low variance), but are pretty far from the bulls-eye (high bias)
  4. Bottom right: The predicted data points are neither close to the bullseye (high bias) nor are close to each other (high variance)

Both high bias and high variance lead to an increase in the error. Typically, a high bias implies underfitting, and a high variance implies overfitting. It is very difficult to achieve both low bias and low variance — one usually comes at the cost of the other.

In terms of model complexity, we can use the below diagram to decide on the optimal complexity of our model:

I encourage you to go through this awesome essay by Scott Fortmann-Roe on Bias Variance using examples: Understanding the Bias-Variance Tradeoff.

sklearn

What’s more, sklearn even has the functionalities for all the evaluation metrics, cross-validation, and scaling/normalizing your data as well.

Here’s a quick example of sklearn in action:

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# X_train, y_train, X_valid and y_valid are assumed to be already-split arrays
reg = LinearRegression()
# train the model on the training set
reg.fit(X_train, y_train)
# predict on the validation set, which we use to improve the model
y_pred = reg.predict(X_valid)
# evaluation metric: MSE, computed against the validation targets
print('Mean Squared Error:', mean_squared_error(y_valid, y_pred))
# further improvement of our model
....

There you go! We built a simple linear regression model in fewer than 10 lines of code!

Here are a couple of excellent resources to learn more about sklearn:

End Notes

Here are a couple of great articles to get started on these frameworks:

Once you have built your foundations on these 5 pillars, you can always explore more advanced concepts like Hyperparameter Tuning, Backpropagation, etc. These are the concepts I built my knowledge of deep learning on.

How would you go about starting your deep learning journey? Please reply in the comments below!

Originally published at https://www.analyticsvidhya.com on March 10, 2020.

Data Science Product Manager at Analytics Vidhya. Masters in Data Science from University of Mumbai. Research Interest: NLP
