Deep Learning: Fundamentals and Practical Examples with Python and Numpy

Deep learning has made incredible advances in recent years and is becoming increasingly powerful and relevant. In March 2016, Google’s AlphaGo program beat 9-dan professional Go player Lee Sedol at Go, an ancient Chinese board game previously thought to be beyond the reach of machines.

Experts in the field of Artificial Intelligence believed that a victory of this kind was at least 10 years away, but deep learning’s progress has proven to be incredibly rapid. Despite deep learning being a complex topic, it is not any more difficult to learn than other machine learning algorithms.

This article is an introduction to the basics of neural networks and deep learning. It contains practical examples and code that can be downloaded and run for free. We will use the Python programming language, along with the numerical computing library Numpy. The article will also show how to build a deep network using Theano and TensorFlow, libraries built specifically for deep learning that can accelerate computation by taking advantage of the GPU.

What is Deep Learning?

Deep learning is part of a broader family of machine learning methods based on artificial neural networks. With deep learning, a computer can learn from observation. It can understand the world in terms of a hierarchy of concepts, with each concept defined in relation to simpler concepts, and represented by increasingly complex data. For example, in image recognition, the first layer might learn different strokes, and in the next layer put the strokes together to learn shapes, and in the next layer put the shapes together to form facial features, and in the next layer have a high level representation of faces.

Unlike other machine learning algorithms, deep learning is particularly powerful because it automatically learns features. This means that you don’t have to spend time trying to create and test “kernels” or “interaction effects” – something statisticians love to do. Instead, let the neural network learn these things for you.

Getting Started with Deep Learning

If you want to get started with deep learning, it’s important to understand the fundamentals. All of deep learning depends on one fundamental algorithm, the “secret sauce”, if you will. It’s important that you understand what this algorithm is and how it works.

You don’t need advanced mathematics or expert programming skills to get started with deep learning. A basic understanding of undergraduate-level mathematics and programming is enough, and all the materials in this article can be downloaded and installed for free.

If you skip over these important fundamentals, you may find yourself in a situation where you can talk about machine learning endlessly but can’t actually use Sci-Kit Learn. You may be able to regurgitate that convolutional neural networks “do convolution” so that “they can find features in different places in an image”, but can’t actually make one work, much less write one. You may be able to regurgitate that LSTMs can “remember long-term dependencies!” and “circumvent the vanishing gradient problem!” but have no idea what formulas actually govern an LSTM or how to write one other than using the Keras built-ins.

It’s important to understand deep learning on a mathematical and algorithmic level. A true computer scientist can take an algorithm, transform it into pseudocode, and transform that into real, working code.

At the highest level, all we are doing is “minimizing cost”. Even business people can understand this intuitive concept. All businesses try to minimize their costs and maximize their profits. This article will show you how to take an intuitive objective like “minimizing cost” and how that eventually results in deep learning.
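
To see what “minimizing cost” looks like as an algorithm, here is a minimal sketch of gradient descent on a one-dimensional cost function. The cost function, learning rate, and number of steps are illustrative choices, but the same iterative update, applied to millions of weights, is what trains a neural network.

# minimal sketch: minimize the cost J(w) = (w - 3)**2 by gradient descent
w = 0.0                 # initial guess (illustrative)
learning_rate = 0.1     # illustrative step size
for step in range(50):
    grad = 2 * (w - 3)  # dJ/dw
    w -= learning_rate * grad
print(w)  # converges toward 3, the minimizer of J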

Using Python and Numpy

The code used in this article is available for free on GitHub. All you need to invest is your effort and your time. The prerequisites are that you are comfortable with Python and the Numpy stack and that you know the basics of probability.

The relevant folders for this article are: linear_regression_class, logistic_regression_class, ann_class, and ann_class2. Make sure you always “git pull” so you have the latest version.

Conclusion

Deep learning is an incredibly powerful and relevant tool for the modern world. It can be used to create incredible results, but it’s important to understand the fundamentals before attempting to use it. This article has introduced some of the basics of neural networks and deep learning, and has also provided practical examples and code which can be downloaded and installed for free. With some basic understanding of undergraduate-level mathematics and programming, and the materials in this article, you can be well on your way to mastering deep learning.

Introduction to Linear Regression in Multiple Dimensions
Linear regression is a powerful tool used in machine learning, statistics and data analysis. It is used to model the relationship between a dependent variable and one or more independent variables. In this article, we will explore linear regression in multiple dimensions, and how it can be used to solve real-world problems.

What Is Linear Regression?
Linear regression is a type of predictive modelling that uses a set of input variables (known as independent variables) to predict an output variable (known as the dependent variable). It is based on the assumption that the relationship between the dependent variable and the independent variables is linear, meaning that the output variable can be expressed as a linear combination of the input variables.

For example, if we want to predict the price of a house, we can use linear regression to make a prediction based on the size of the house, the location, the neighborhood, and any other factors that influence the price.

Extending Linear Regression to Multiple Dimensions
In real-world machine learning problems, we often have more than one input feature, so each input xi becomes a D-dimensional vector rather than a scalar. When x is 1-dimensional, we get a line. When x is 2-dimensional, we get a plane. When x is 3-dimensional or higher, we get a hyperplane.

Using vectorized operations is crucial when coding in MATLAB or Python, as it is much faster than looping. For example, if we want to take the dot product of w = [1, 2, 3] and x = [4, 5, 6], the answer is 1*4 + 2*5 + 3*6 = 32. The looping version of the code would look like this:

w = [1, 2, 3]
x = [4, 5, 6]
answer = 0
for i in range(3):
    answer += w[i] * x[i]

We can use a library like numpy and call the dot function instead:

import numpy as np
w = np.array([1,2,3])
x = np.array([4,5,6])
answer = w.dot(x)

This is much faster. It is also more efficient to look at an entire training set at the same time, instead of considering individual input vectors xi and individual outputs yi.
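
As a rough illustration of the speed difference, we can time both approaches on the same data. This is just a sketch; the exact numbers depend on your machine, but the vectorized version is typically orders of magnitude faster on large arrays.

import numpy as np
from timeit import timeit

# illustrative micro-benchmark: Python loop vs. numpy's vectorized dot
w_list = list(range(1000))
x_list = list(range(1000))
w_arr = np.array(w_list)
x_arr = np.array(x_list)

loop_time = timeit(lambda: sum(w_list[i] * x_list[i] for i in range(1000)), number=1000)
dot_time = timeit(lambda: w_arr.dot(x_arr), number=1000)
print("loop:", loop_time, "dot:", dot_time)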

We can express the output of our model as:

y = w0 + wᵀx

We usually use a dummy variable x0 = 1, and combine w0 with [w1, …, wD] so that we can make the entire thing one dot product.
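
In code, absorbing the bias term looks something like the following sketch (the sizes and random data are only for illustration):

import numpy as np

# absorb the bias w0 into w by prepending a column of ones (the dummy feature x0 = 1)
N, D = 5, 2                                        # illustrative sizes
X = np.random.randn(N, D)                          # original N x D data matrix
Xb = np.concatenate((np.ones((N, 1)), X), axis=1)  # now N x (D + 1)

w = np.random.randn(D + 1)                         # w[0] plays the role of w0
yhat = Xb.dot(w)                                   # one dot product per sample: w0 + w[1:] . x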

We can turn our objective function:

J = Σ_{i=1..N} (ti - wᵀxi)²

Into matrix form:

J = ‖T - Xw‖²

Note that the weight vector w moves to the other side of the product, because the matrix X holds one sample per row. When we talk about individual vectors like xi, we usually mean column vectors.
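
We can check numerically that the summation form and the matrix form of J agree. The following sketch, using random illustrative data, computes J both ways:

import numpy as np

# verify that the sum over samples of (ti - w'xi)^2 equals |T - Xw|^2
N, D = 10, 3
X = np.random.randn(N, D)
T = np.random.randn(N)
w = np.random.randn(D)

J_sum = sum((T[i] - w.dot(X[i]))**2 for i in range(N))
residual = T - X.dot(w)
J_matrix = residual.dot(residual)    # the squared norm |T - Xw|^2
print(np.allclose(J_sum, J_matrix))  # True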

Taking the gradient or vector derivative of J with respect to w, and setting it to 0, we can solve for w. This should give us the answer:

w = (XᵀX)⁻¹XᵀT

In numpy we can use the following code:

w = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(T)

A better way, which avoids explicitly computing the matrix inverse and is more numerically stable, would be:

w = np.linalg.solve(X.T.dot(X), X.T.dot(T))

Putting It All Together
We can create a LinearRegression class that uses these methods. It can be used just like the Sci-Kit Learn LinearRegression class. The class should have the following methods:

fit(X, Y): Takes an input matrix X (size N×D) and a target vector Y (length N), and finds the best parameters for our model given this data.
predict(X): Makes predictions for which we do not yet know the output.

These two methods are the basic capabilities of machine learning – fitting a model to the data, and making predictions.
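
Here is a minimal sketch of what such a class might look like, built on the closed-form solution derived above. The synthetic data in the usage example is purely illustrative.

import numpy as np

class LinearRegression:
    # minimal sketch of a Sci-Kit Learn-style linear regression class

    def fit(self, X, Y):
        # add the bias column of ones, then solve (X'X)w = X'Y
        Xb = np.concatenate((np.ones((len(X), 1)), X), axis=1)
        self.w = np.linalg.solve(Xb.T.dot(Xb), Xb.T.dot(Y))
        return self

    def predict(self, X):
        # apply the learned weights to new inputs
        Xb = np.concatenate((np.ones((len(X), 1)), X), axis=1)
        return Xb.dot(self.w)

# usage sketch on synthetic data
X = np.random.randn(100, 2)
Y = 1.0 + X.dot(np.array([2.0, -3.0])) + 0.1 * np.random.randn(100)
model = LinearRegression().fit(X, Y)
print(model.predict(X[:5]))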

Conclusion
Linear regression in multiple dimensions is a powerful tool for machine learning, statistics, and data analysis. It can be used to predict outcomes based on input variables, and can be used to solve real-world problems. Understanding vectors and matrices is crucial to understanding linear regression in multiple dimensions, and using vectorized operations makes it much faster. Additionally, understanding the two main methods used in machine learning – fit and predict – is essential.

Using linear regression on some real data (1-D)

This code reads in a data set of X and Y values, computes the line of best fit, and displays both the original data points and the line of best fit in a graph. It then calculates the r-squared, which tells us how much better our prediction is than just predicting the mean of Y. If the prediction is perfect, the r-squared will be 1; if the prediction is exactly the mean, the r-squared will be 0; if the prediction is worse than the mean, the r-squared will be negative.

import numpy as np
import matplotlib.pyplot as plt

# load the data
X = []
Y = []
for line in open('data_1d.csv'):
    x, y = line.split(',')
    X.append(float(x))
    Y.append(float(y))

# turn X and Y into numpy arrays
X = np.array(X)
Y = np.array(Y)

# plot the data
plt.scatter(X, Y)
plt.show()

# calculate the best fit
denominator = X.dot(X) - X.mean() * X.sum()
a = (X.dot(Y) - Y.mean() * X.sum()) / denominator
b = (Y.mean() * X.dot(X) - X.mean() * X.dot(Y)) / denominator

# calculate the predicted Y
Yhat = a * X + b

# plot the predicted values
plt.scatter(X, Y)
plt.plot(X, Yhat)
plt.show()

# calculate the r-squared
d1 = Y - Yhat
d2 = Y - Y.mean()
r2 = 1 - d1.dot(d1) / d2.dot(d2)
print("the r-squared is:", r2)

Using linear regression on some real data (multi-D)

The following is Python/Numpy code that reads in a data set, computes the plane of best fit, and displays the original data points in 3 dimensions.
It can also be found at: https://github.com/lazyprogrammer/machine_learning_examples/blob/master/linear_regression_class/lr_2d.py
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt

# load the data
X = []
Y = []
for line in open('data_2d.csv'):
    x1, x2, y = line.split(',')
    X.append([float(x1), float(x2), 1])  # add the bias term
    Y.append(float(y))

# let's turn X and Y into numpy arrays since that will be useful later
X = np.array(X)
Y = np.array(Y)

# let's plot the data to see what it looks like
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:,0], X[:,1], Y)
plt.show()

# apply the equations we learned to calculate w
# numpy has a special method for solving Ax = b
# so we don't use x = inv(A)*b
# note: the * operator does element-by-element multiplication in numpy
# np.dot() does what we expect for matrix multiplication
w = np.linalg.solve(np.dot(X.T, X), np.dot(X.T, Y))
Yhat = np.dot(X, w)

# determine how good the model is by computing the r-squared
d1 = Y - Yhat
d2 = Y - Y.mean()
r2 = 1 - d1.dot(d1) / d2.dot(d2)
print("the r-squared is:", r2)

Check out the following article, which continues this series:

https://srecontracting.com/what-is-maximum-likelihood-estimation-mle/
