Introduction to Neural Networks and Deep Learning

Society of AI
Sep 22, 2020

Introduction to Neural Networks

● A neural network is the functional unit of deep learning.

● Deep Learning uses neural networks to mimic human brain activity to solve complex data-driven problems.

● A neural network functions when some input data is fed to it. This data is then processed via layers of perceptrons to produce a desired output.

● There are three types of layers.

○ Input Layer

■ The input layer brings the initial data into the system for further processing by subsequent layers of artificial neurons.

○ Hidden Layers

■ A hidden layer is a layer between the input and output layers, where artificial neurons take in a set of weighted inputs and produce an output through an activation function.

○ Output Layer

■ The output layer is the last layer of neurons; it produces the final outputs of the program.

Layers : Hidden, Input and Output

● Let’s understand neural networks with an example.

● Suppose we have to classify leaf images as either diseased or non-diseased.

● Each leaf image is broken down into pixels according to the dimensions of the image. For example, if each image is 30 * 30 pixels, then the total number of pixels is 900.

● These pixels are later represented as matrices which are then fed into the input layer of the neural network.

● A perceptron is a neural network unit (an artificial neuron) that performs certain computations to detect features or business intelligence in the input data.

● Just as our brains have neurons for building and connecting thoughts, an artificial neural network has perceptrons that accept inputs and process them by passing them from the input layer to the hidden layers and finally to the output layer.

● As the input is passed from the input layer to the hidden layer, an initial random weight is assigned to each input.

● The inputs are then multiplied by their corresponding weights, and the sum is passed further through the network.

● An additional value called the bias is then added for each perceptron.

● The result is then passed through the activation function (also called a transformation function), which determines whether a particular perceptron gets activated or not.

● An activated perceptron transfers data to the next layer. In this way the data is propagated forward through the neural network until it reaches the output layer.

● The probability is decided at the output layer which determines whether the data belongs to class A or class B.

● Let’s assume a case where the predicted output is wrong. In such a case, we train the neural network using the backpropagation method.

● Initially, while designing the neural network, we initialize the weight of each input with a random value.

● The importance of each input variable is denoted by its weight.

● In the backpropagation method we propagate backward through the neural network, compare the actual output with the predicted output, and then readjust the weight of each input in such a way that the error is minimized.
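The steps above can be sketched for a single perceptron in plain Python. This is only a minimal illustration, not a full network: the random "image", the label, the squared-error measure, the learning rate of 0.1 and all function names are assumptions made for the example.

```python
import math
import random

def sigmoid(z):
    """Activation function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def flatten(image):
    """Turn a 2-D grid of pixel values into a flat list of inputs."""
    return [pixel for row in image for pixel in row]

def forward(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

def train_step(inputs, target, weights, bias, learning_rate=0.1):
    """One backpropagation step for a single sigmoid perceptron (squared error)."""
    prediction = forward(inputs, weights, bias)
    # Chain rule: d(error)/dz = (prediction - target) * sigmoid'(z)
    delta = (prediction - target) * prediction * (1.0 - prediction)
    new_weights = [w - learning_rate * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - learning_rate * delta
    return new_weights, new_bias

random.seed(0)
# Made-up 30 * 30 "leaf image" with pixel intensities between 0 and 1.
image = [[random.random() for _ in range(30)] for _ in range(30)]
inputs = flatten(image)                                  # 900 input values
weights = [random.uniform(-0.01, 0.01) for _ in inputs]  # small random initial weights
bias = 0.0
target = 1.0  # hypothetical label: 1 = diseased, 0 = non-diseased

before = forward(inputs, weights, bias)
for _ in range(20):
    weights, bias = train_step(inputs, target, weights, bias)
after = forward(inputs, weights, bias)
print(f"probability before training: {before:.3f}, after: {after:.3f}")
```

After a few updates the predicted probability moves toward the target label, which is the "readjust the weights so that the error is minimized" step in miniature.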

Some Real-World Applications of Neural Networks

○ With the help of deep learning techniques, Google can instantly translate between more than 100 different human languages.

○ With the help of neural networks, self-driving cars are being perfected by companies from Tesla to Waymo, which is owned by Google’s parent company. Virtual assistants are largely based on technologies such as deep learning, machine learning and natural language processing.

Activation and Loss functions

Activation Function

○ An activation function is also called a non-linearity in machine learning.

○ The activation function determines whether or not to activate a neuron by calculating the weighted sum of its inputs and adding the bias to it.

○ The purpose of the activation function is to introduce non-linearity into the output of a neuron.

○ The activation function of a neuron defines the output of that neuron given a set of inputs.

○ There are seven types of activation functions that we can use when building a neural network.

○ Activation functions:

■ Binary step function

● Formula: f(x) = 1 if x > 0, else 0

■ The linear or identity function

● Formula: f(z) = mz (the identity function when m = 1)

■ Sigmoid or logistic function

● Formula: f(x) = 1/(1 + e^(-x))

■ Hyperbolic tangent or tanh function

● Formula: tanh(z) = 2/(1 + e^(-2z)) - 1

■ The rectified linear unit (ReLU) function

● Formula: f(z) = max(0, z)

■ The leaky ReLU function

■ The softmax function


● Its graph is different every time, because each output depends on the whole input vector rather than on a single value.
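The seven functions listed above can be written in a few lines of plain Python. This is a rough sketch: the function names are chosen for this example, the leaky ReLU slope of 0.01 is an assumed (though common) default, and the leaky ReLU and softmax definitions used here are the standard ones, f(z) = z if z > 0 else slope * z and softmax(z)_i = e^(z_i) / sum_j e^(z_j).

```python
import math

def binary_step(x):
    """1 when the input is positive, otherwise 0."""
    return 1.0 if x > 0 else 0.0

def linear(z, m=1.0):
    """Scales the input by m; with m = 1 this is the identity function."""
    return m * z

def sigmoid(x):
    """Maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(z):
    """Maps any real number into (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-2.0 * z)) - 1.0

def relu(z):
    """Passes positive inputs through and clips negative inputs to 0."""
    return max(0.0, z)

def leaky_relu(z, slope=0.01):
    """Like ReLU, but keeps a small slope for negative inputs (0.01 is assumed)."""
    return z if z > 0 else slope * z

def softmax(values):
    """Turns a list of scores into probabilities that sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0), relu(-2.0), leaky_relu(-2.0))
print(softmax([1.0, 2.0, 3.0]))
```

Unlike the other six, softmax takes a whole list of values at once, which is why it is typically used on the output layer of a multi-class classifier.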

Loss Function

○ The loss function is one of the essential components of Neural Networks.

○ Loss is nothing but the prediction error of the neural net, and the method used to calculate the loss is called the loss function.

○ The loss is used to calculate the gradients, and the gradients are used to update the weights of the neural net. Several common loss functions are provided by theanets.

○ The theanets package provides tools for defining and optimizing several common types of neural network models.

○ These losses often measure the squared or absolute error between a network’s output and some target or desired output. Other loss functions are designed specifically for classification models; the cross-entropy is a common loss designed to minimize the distance between the network’s distribution over class labels and the distribution that the dataset defines.

○ Models in theanets have at least one loss to optimize during training. There are default losses for the built-in model types, but we can override such defaults by providing a non-default value for the loss keyword argument when creating a model. For example, a regression model can be created with a mean absolute error loss instead of its default loss.

● There are some loss functions available for neural network models.
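To make the losses described above concrete, here is a small sketch of squared error, absolute error and cross-entropy in plain Python. It does not use theanets; the function names, the sample numbers and the eps guard against log(0) are all assumptions for illustration.

```python
import math

def mean_squared_error(targets, outputs):
    """Average of the squared differences between targets and outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)

def mean_absolute_error(targets, outputs):
    """Average of the absolute differences between targets and outputs."""
    return sum(abs(t - o) for t, o in zip(targets, outputs)) / len(targets)

def cross_entropy(true_dist, predicted_dist, eps=1e-12):
    """Distance between the true class distribution and the predicted one."""
    # eps avoids taking log(0) when a predicted probability is exactly 0.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(true_dist, predicted_dist))

# Regression example with made-up numbers.
targets = [1.0, 2.0, 3.0]
outputs = [1.5, 2.0, 2.0]
mse = mean_squared_error(targets, outputs)
mae = mean_absolute_error(targets, outputs)

# Classification example: the true class is the second of three.
ce = cross_entropy([0.0, 1.0, 0.0], [0.1, 0.8, 0.1])

print(mse, mae, ce)
```

Squared error punishes the single large miss (3.0 vs 2.0) more heavily than absolute error does, which is one reason to choose between them.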

Gradient Descent

○ Gradient descent is what makes our network learn.

○ Basically, gradient descent calculates by how much our weights and biases should be updated so that our cost moves toward its minimum (ideally 0). This is done using partial derivatives.

○ Gradient descent is based on the fact that at the minimum value of a function, its partial derivatives are equal to zero.

○ The cost depends on the weight and bias values in our layers, so we take the derivative of the cost with respect to the weights and biases.

○ The equation used to make this update is called the learning equation.
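As a rough illustration, the update usually has the form w = w - learning_rate * dC/dw (and likewise for b). The sketch below applies it to a one-weight, one-bias model with a mean squared error cost; the data, the learning rate of 0.05 and the step count are assumptions chosen so the example converges.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by repeatedly stepping against the gradient of the cost."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error cost.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        # The update: move each parameter a small step downhill.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Made-up data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = gradient_descent(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

At the end both partial derivatives are close to zero, which is exactly the condition for a minimum mentioned above.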

Batch Normalization

● Normalization and standardization have the same goal of transforming the data to put all the data points on the same scale.

● A typical normalization process consists of scaling numerical data down to be on a scale from zero to one, and a typical standardization process consists of subtracting the mean of the dataset from each data point, and then dividing that difference by the dataset’s standard deviation.

● By normalizing our inputs, we put all our data on the same scale, which increases training speed.

● But in a neural network another problem can arise even with normalized data.

● In the neural network, the weights in the model are updated over each epoch during training via the process of stochastic gradient descent.

● The problem occurs when, during training, one of the weights ends up becoming drastically larger than the other weights.

● This large weight will then cause the output from its corresponding neuron to be extremely large, and this imbalance will continue to cascade through the network, causing instability. This is why we use batch normalization.
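The two input-scaling processes described above can be sketched as follows. The function names and the sample data are assumptions, and the code assumes the values are not all identical (otherwise it would divide by zero).

```python
import statistics

def normalize(values):
    """Min-max scaling: maps the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Subtract the mean, then divide by the standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]

data = [10.0, 20.0, 30.0, 40.0, 50.0]  # made-up feature values
normalized = normalize(data)
standardized = standardize(data)

print(normalized)
print(standardized)
```

After standardization the data has a mean of 0 and a standard deviation of 1, while normalization keeps everything between 0 and 1.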

Process Of Batch Normalization

○ Normalize the output from the activation function.

■ z=(x-mean)/std

○ Multiply the normalized output z by an arbitrary (trainable) parameter g.

■ z * g

○ Add an arbitrary (trainable) parameter b to the resulting product (z * g).

■ (z * g) + b
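The three steps can be sketched for one batch of activations in plain Python. This is a simplified illustration: the small eps added inside the square root is an assumption (commonly used to avoid dividing by zero), and in a real network g and b would be learned during training rather than passed in by hand.

```python
import math

def batch_norm(batch, g=1.0, b=0.0, eps=1e-5):
    """Normalize a batch of activations, then scale by g and shift by b."""
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    std = math.sqrt(variance + eps)            # eps is an assumed safety term
    z = [(x - mean) / std for x in batch]      # step 1: z = (x - mean) / std
    return [zi * g + b for zi in z]            # steps 2 and 3: (z * g) + b

# Made-up activations with one drastically larger value.
activations = [1.0, 2.0, 3.0, 100.0]
out = batch_norm(activations, g=2.0, b=0.5)
print(out)
```

Even with one very large activation in the batch, the outputs stay on a controlled scale centred on b.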

TensorFlow and Keras for Neural Networks

Introduction To TensorFlow

● The official definition of TensorFlow is “TensorFlow is an open source software library for numerical computation using dataflow graphs. Nodes in the graph represent mathematical operations, while graph edges represent multidimensional data arrays (aka tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API.”

● Installation Of TensorFlow:

○ We can install TensorFlow using the following command: “pip install tensorflow”

● Basic Components:

○ Tensor:

■ Tensors are the basic data structure in TensorFlow. They store data in any number of dimensions, similar to multi-dimensional arrays in NumPy. In TensorFlow’s graph-based (1.x) API there are three main types of tensors: constants, variables, and placeholders.

■ Constants are immutable types of tensors. They may be seen as nodes without inputs, outputting a single value they store internally.

■ Variables are mutable types of tensors whose value can change during a run of a graph. In ML applications, the variables typically store the parameters which need to be optimized (e.g. the weights between nodes in a neural network). Variables need to be initialized before running the graph by explicitly calling a special operation.

■ Placeholders are tensors which store data from external sources. They represent a “promise” that a value will be given when the graph is run. In ML applications, placeholders are usually used for inputting data to the learning model.

● Graph:

○ A graph is basically an arrangement of nodes that represent the operations in our model.

○ The graph is composed of a series of nodes connected to each other by edges. Each node in the graph is called an operation, so we have one node for each operation, whether it operates on tensors (like math operations) or generates tensors (like variables and constants).

● Sessions

○ Our graph should be run inside a session. Variables are initialized beforehand, while placeholder tensors receive concrete values through the feed_dict argument.
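To make the idea of "build the graph first, run it later" concrete, here is a toy dataflow graph in plain Python. It is deliberately not TensorFlow code: the Node class and the constant, variable, placeholder, operation and run helpers are all hypothetical names invented for this sketch, and run only plays the role that a session plays in TensorFlow.

```python
class Node:
    """One node in a toy dataflow graph (not a real TensorFlow class)."""
    def __init__(self, kind, value=None, inputs=(), op=None, name=None):
        self.kind = kind      # "constant", "variable", "placeholder" or "operation"
        self.value = value    # stored value for constants and variables
        self.inputs = inputs  # upstream nodes that feed an operation
        self.op = op          # Python function applied by an operation
        self.name = name

def constant(value):
    return Node("constant", value=value)

def variable(initial_value):
    return Node("variable", value=initial_value)

def placeholder(name):
    return Node("placeholder", name=name)

def operation(op, *inputs):
    return Node("operation", inputs=inputs, op=op)

def run(node, feed_dict=None):
    """Evaluate a node; placeholders take their values from feed_dict."""
    feed_dict = feed_dict or {}
    if node.kind in ("constant", "variable"):
        return node.value
    if node.kind == "placeholder":
        return feed_dict[node]  # the "promise" is fulfilled at run time
    args = [run(n, feed_dict) for n in node.inputs]
    return node.op(*args)

# Build the graph for y = w * x + b; nothing is computed yet.
x = placeholder("x")
w = variable(3.0)
b = constant(1.0)
y = operation(lambda p, q: p + q, operation(lambda p, q: p * q, w, x), b)

print(run(y, feed_dict={x: 2.0}))  # 3.0 * 2.0 + 1.0
w.value = 4.0                      # a variable can change between runs
print(run(y, feed_dict={x: 2.0}))  # 4.0 * 2.0 + 1.0
```

The same graph gives different results depending on what is fed to the placeholder and on the current value of the variable, while the constant never changes.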

Sample code to create and train a TensorFlow model of a neural network.

Introduction To Keras:

○ Keras is a simple-to-use but powerful deep learning library for Python.

○ Keras is a high-level API for building deep learning models.

○ With Keras, a complex deep learning model can be built with only a few lines of code.

○ Keras normally runs on top of a low-level library such as TensorFlow, so we first have to install and import TensorFlow.

Different types of models in Keras:

Sequential API:

■ It’s basically a linear stack of layers.

■ It is best for a simple stack of layers which have 1 input tensor and 1 output tensor.

■ It is more useful for building simple models like

● Simple classification network

● Encoder-decoder model

■ This model is not suited when any of the layers in the stack has multiple inputs or outputs. It is also not suited if we want a non-linear topology.

Functional API:

■ It provides more flexibility to define a model and add layers in Keras. The functional API lets us build models with multiple inputs or outputs. It also allows us to share layers. In other words, we can make graphs of layers using the Keras functional API.

As a functional model is a data structure, it is easy to save it as a single file, which helps in recreating the exact model without having the original code. It is also easy to plot the graph and access its nodes.

Steps for implementing a neural network with Keras

■ Prepare input:

● Prepare the input and specify the input dimensions (size)

● Images, videos, text and audio

■ Define the ANN model

● In this step we define the model architecture and build the computational graph

● Sequential or Functional Style


■ Optimizers

● Specify the optimizer and configure the learning process

● SGD, RMSprop, Adam

■ Loss Function

● Specify the inputs and outputs of the computational graph (model) and the loss function

● MSE, Cross Entropy, Hinge

■ Train and Evaluate Model

● Train the model based on the training data

● Test the model on the testing data.

○ Sample Code

Hyper Parameter Tuning

● Hyperparameters are parameters that cannot be learned directly from the regular training process.

● Generally they are set before starting the actual training phase.

● These parameters express important properties of the model such as its complexity or how fast it should learn.

● Examples of Hyperparameters are:

○ The penalty in Logistic Regression Classifier i.e. L1 or L2 regularization

○ The learning rate for training a neural network.

○ Hyperparameters for support vector machines are C and sigma.

○ The k in k-nearest neighbors.

● Two widely used hyperparameter tuning techniques are:

○ GridSearchCV

■ In the GridSearchCV approach, machine learning models are evaluated for a range of hyperparameter values. This approach is called GridSearchCV, because it searches for the best set of hyperparameters from a grid of hyperparameter values.

■ For example, suppose we want to set two hyperparameters, C and Alpha, of a Logistic Regression Classifier model, each with a different set of values.

The grid search technique constructs many versions of the model with all possible hyperparameter combinations, and returns the best one. For C = [0.1, 0.2, 0.3, 0.4, 0.5] and Alpha = [0.1, 0.2, 0.3, 0.4], the grid contains 5 * 4 = 20 combinations. For the combination C=0.3 and Alpha=0.2, the output score is 0.726 (the highest), so it is chosen.

○ RandomizedSearchCV

■ RandomizedSearchCV addresses the main drawback of GridSearchCV, since it tries only a fixed number of hyperparameter settings. It moves through the grid in random fashion to find the best set of hyperparameters. This approach reduces unnecessary computation.
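The difference between the two techniques can be sketched in plain Python. This is not scikit-learn's GridSearchCV or RandomizedSearchCV: the score function below is a hypothetical stand-in for training and cross-validating a model, built only so that it peaks at C=0.3 and Alpha=0.2 like the example above, and the function names, n_iter=8 and the seed are assumptions.

```python
import itertools
import random

def score(C, alpha):
    """Hypothetical stand-in for training a model and returning its score."""
    return 0.726 - (C - 0.3) ** 2 - (alpha - 0.2) ** 2

def grid_search(grid):
    """Evaluate every combination in the grid and return the best one."""
    names = list(grid)
    combos = [dict(zip(names, values)) for values in itertools.product(*grid.values())]
    best = max(combos, key=lambda params: score(**params))
    return best, len(combos)

def randomized_search(grid, n_iter, seed=0):
    """Evaluate only n_iter randomly chosen combinations from the grid."""
    rng = random.Random(seed)
    names = list(grid)
    all_combos = list(itertools.product(*grid.values()))
    sampled = rng.sample(all_combos, n_iter)  # sample without repeats
    combos = [dict(zip(names, values)) for values in sampled]
    best = max(combos, key=lambda params: score(**params))
    return best, len(combos)

grid = {"C": [0.1, 0.2, 0.3, 0.4, 0.5], "alpha": [0.1, 0.2, 0.3, 0.4]}

best_grid, n_grid = grid_search(grid)
best_random, n_random = randomized_search(grid, n_iter=8)

print(f"grid search tried {n_grid} settings, best: {best_grid}")
print(f"random search tried {n_random} settings, best: {best_random}")
```

Grid search is guaranteed to find the best setting in the grid but pays for all 20 evaluations; random search does less work and may settle for a slightly worse setting.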
