Last Updated : 30 Aug, 2024
The XOR (exclusive OR) problem is a classic logic-gate problem that cannot be solved by a single-layer perceptron (the most basic neural network model). It can, however, be solved by a multi-layer neural network, which makes it a useful illustration of why hidden layers matter.
In this article, we discuss what the XOR problem is, why it defeats a single-layer perceptron, how a multi-layer network solves it, and a simple code example that demonstrates the solution.
Table of Content
- What is the XOR Problem?
- Why Do Single-Layer Perceptrons Fail?
- How Do Multi-Layer Neural Networks Solve XOR?
- Mathematics Behind the MLP Solution
- Geometric Interpretation
- Training the Neural Network to Solve XOR Problem
- Conclusion
What is the XOR Problem?
The XOR operation is a binary operation that takes two binary inputs and produces a binary output. The output of the operation is 1 only when the inputs are different.
Below is the truth table for XOR:
Input A | Input B | XOR Output |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |
The main problem is that a single-layer perceptron cannot learn this mapping because the data is not linearly separable, i.e., no single straight line can separate the inputs that produce 0 from the inputs that produce 1.
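The truth table above can be reproduced directly with Python's bitwise XOR operator (`^`):

```python
# Enumerate the XOR truth table using Python's bitwise XOR operator.
table = {(a, b): a ^ b for a in (0, 1) for b in (0, 1)}
for (a, b), out in table.items():
    print(f"A={a}, B={b} -> XOR output: {out}")
```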
Why Do Single-Layer Perceptrons Fail?
A single-layer perceptron can solve problems that are linearly separable by learning a linear decision boundary.
Mathematically, the decision boundary is represented by:
[Tex]y = \text{step}(\mathbf{w} \cdot \mathbf{x} + b)[/Tex]
Where:
- [Tex]\mathbf{w}[/Tex] is the weight vector.
- [Tex]\mathbf{x}[/Tex] is the input vector.
- [Tex]b[/Tex] is the bias term.
- [Tex]\text{step}[/Tex] is the activation function, often a Heaviside step function that outputs 1 if the input is positive and 0 otherwise.
For linearly separable data, the perceptron can adjust the weights [Tex]\mathbf{w}[/Tex] and bias [Tex]b[/Tex] during training to correctly classify the data. However, because XOR is not linearly separable, no single line (or hyperplane) can separate the outputs 0 and 1, making a single-layer perceptron inadequate for solving the XOR problem.
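To make this failure concrete, here is a short sketch (an illustrative implementation, not from the original article) that trains a perceptron with the classic update rule on the four XOR points. Because XOR is not linearly separable, a fixed linear threshold can classify at most 3 of the 4 points, no matter how long training runs:

```python
import numpy as np

# The four XOR input pairs and their targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

w = np.zeros(2)   # weight vector
b = 0.0           # bias
lr = 0.1          # learning rate

def accuracy(w, b):
    preds = (X @ w + b > 0).astype(int)   # Heaviside step activation
    return (preds == y).mean()

best = 0.0
for epoch in range(200):
    for xi, target in zip(X, y):
        pred = int(w @ xi + b > 0)          # current prediction
        w = w + lr * (target - pred) * xi   # perceptron update rule
        b = b + lr * (target - pred)
    best = max(best, accuracy(w, b))

print(f"Best accuracy over 200 epochs: {best:.2f}")  # never reaches 1.00
```

The weights keep oscillating rather than converging, which is exactly the behavior the perceptron convergence theorem predicts for non-separable data.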
How Do Multi-Layer Neural Networks Solve XOR?
A multi-layer neural network, also known as a feedforward neural network or multi-layer perceptron (MLP), can solve the XOR problem. It consists of multiple layers of neurons: an input layer, a hidden layer, and an output layer.
The role of each layer:
- Input Layer: This layer takes the two inputs (A and B).
- Hidden Layer: This layer applies non-linear activation functions to create new, transformed features that help separate the classes.
- Output Layer: This layer produces the final XOR result.
Mathematics Behind the MLP Solution
Let’s break down the mathematics behind how an MLP can solve the XOR problem.
Step 1: Input to Hidden Layer Transformation
Consider an MLP with two neurons in the hidden layer, each applying a non-linear activation function (like the sigmoid function). The output of the hidden neurons can be represented as:
[Tex]h_1 = \sigma(w_{11} A + w_{12} B + b_1)[/Tex]
[Tex]h_2 = \sigma(w_{21} A + w_{22} B + b_2)[/Tex]
Where:
- [Tex]\sigma(x) = \frac{1}{1 + e^{-x}}[/Tex] is the sigmoid activation function.
- [Tex]w_{ij}[/Tex] are the weights from the input neurons to the hidden neurons.
- [Tex]b_i[/Tex] are the biases for the hidden neurons.
Activation functions such as the sigmoid or ReLU (Rectified Linear Unit) introduce non-linearity into the model, enabling the neural network to learn complex patterns like XOR. Without them, stacking layers would still yield a simple linear model, which is insufficient for solving XOR.
Step 2: Hidden Layer to Output Layer Transformation
The output neuron combines the outputs of the hidden neurons to produce the final output:
[Tex]\text{Output} = \sigma(w_{31} h_1 + w_{32} h_2 + b_3)[/Tex]
Where [Tex]w_{3i}[/Tex] are the weights from the hidden neurons to the output neuron, and [Tex]b_3[/Tex] is the bias for the output neuron.
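These two transformations can be sketched in NumPy. The weights below are hand-picked for illustration (scaled up so the sigmoids saturate near 0 and 1); they are not values produced by training:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(A, B, W1, b1, W2, b2):
    """Input -> hidden (h1, h2) -> output, matching the equations above."""
    x = np.array([A, B], dtype=float)
    h = sigmoid(W1 @ x + b1)     # hidden activations h1, h2
    return sigmoid(W2 @ h + b2)  # network output (scalar)

# Hand-picked, illustrative parameters (large so the sigmoids saturate).
W1 = np.array([[10.0, 10.0],     # w11, w12
               [10.0, 10.0]])    # w21, w22
b1 = np.array([-5.0, -15.0])     # b1, b2
W2 = np.array([10.0, -20.0])     # w31, w32
b2 = -5.0                        # b3

outs = [float(forward(A, B, W1, b1, W2, b2)) for A, B in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print([round(o, 3) for o in outs])  # close to [0, 1, 1, 0]
```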
Step 3: Learning Weights and Biases
During the training process, the network adjusts the weights [Tex]w_{ij}[/Tex] and biases [Tex]b_i[/Tex] using backpropagation and gradient descent to minimize the error between the predicted output and the actual XOR output.
Example Configuration:
Let’s consider a specific configuration of weights and biases that solves the XOR problem (using a hard-threshold step activation, which outputs exactly 0 or 1, in place of the sigmoid):
- For the hidden layer:
- [Tex]w_{11} = 1, w_{12} = 1, b_1 = -0.5[/Tex]
- [Tex]w_{21} = 1, w_{22} = 1, b_2 = -1.5[/Tex]
- For the output layer:
- [Tex]w_{31} = 1, w_{32} = -2, b_3 = -0.5[/Tex]
With these weights and biases, [Tex]h_1[/Tex] fires whenever A OR B is 1, [Tex]h_2[/Tex] fires only when both are 1 (A AND B), and the output neuron fires when [Tex]h_1[/Tex] is active but [Tex]h_2[/Tex] is not, which is exactly the XOR function.
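As a sanity check, one workable set of weights like this (with a hard-threshold step standing in for the sigmoid, so the outputs are exactly 0 or 1) can be evaluated exhaustively in NumPy:

```python
import numpy as np

def step(x):
    return (x > 0).astype(int)  # Heaviside step: 1 if positive, else 0

# One workable configuration: h1 acts like OR, h2 like AND,
# and the output computes (h1 AND NOT h2), i.e. XOR.
W1 = np.array([[1, 1],          # w11, w12
               [1, 1]])         # w21, w22
b1 = np.array([-0.5, -1.5])     # b1, b2
W2 = np.array([1, -2])          # w31, w32
b2 = -0.5                       # b3

results = []
for A, B in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(W1 @ np.array([A, B]) + b1)  # hidden activations (h1, h2)
    out = int(W2 @ h + b2 > 0)            # output neuron
    results.append(out)
    print(f"A={A}, B={B} -> hidden {h.tolist()} -> XOR={out}")
```

Printing the hidden activations also shows the geometric point made below: in (h1, h2) space the four inputs land at points that a single straight line can separate.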
Geometric Interpretation
In the hidden layer, the network effectively transforms the input space into a new space where the XOR problem becomes linearly separable. This can be visualized as bending or twisting the input space such that the points corresponding to different XOR outputs (0s and 1s) are now separable by a linear decision boundary.
Training the Neural Network to Solve XOR Problem
The neural network learns to solve the XOR problem by adjusting the weights during training. This is done using backpropagation, where the network calculates the error in its output and adjusts its internal weights to minimize this error over time. This process continues until the network can correctly predict the XOR output for all given input combinations.
The following Python code demonstrates how a neural network can tackle the XOR problem using TensorFlow and Keras:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Define the XOR input and output data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Build the neural network model
model = Sequential()
model.add(Dense(2, input_dim=2, activation='relu'))  # Hidden layer with 2 neurons
model.add(Dense(1, activation='sigmoid'))            # Output layer with 1 neuron

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X, y, epochs=10000, verbose=0)

# Evaluate the model
_, accuracy = model.evaluate(X, y)
print(f"Accuracy: {accuracy * 100:.2f}%")

# Make predictions
predictions = model.predict(X)
predictions = np.round(predictions).astype(int)
print("Predictions:")
for i in range(len(X)):
    print(f"Input: {X[i]} => Predicted Output: {predictions[i]}, Actual Output: {y[i]}")
Output:
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 168ms/step - accuracy: 0.5000 - loss: 0.6931
Accuracy: 50.00%
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step
Predictions:
Input: [0 0] => Predicted Output: [0], Actual Output: [0]
Input: [0 1] => Predicted Output: [0], Actual Output: [1]
Input: [1 0] => Predicted Output: [0], Actual Output: [1]
Input: [1 1] => Predicted Output: [0], Actual Output: [0]
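Note that the run above stalled at 50% accuracy: with only two ReLU hidden units, an unlucky random initialization can leave the network stuck (for example, a "dead" ReLU that never activates). Re-running usually helps, as does using tanh hidden units or a few more neurons. As a framework-free illustration of the same training process, here is a small NumPy MLP trained with backpropagation on XOR (the 4-unit tanh hidden layer and hyperparameters are illustrative choices, not from the Keras example above):

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR dataset.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 2 -> 4 -> 1 network: tanh hidden layer, sigmoid output.
W1 = rng.normal(0.0, 1.0, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0.0, 1.0, (4, 1)); b2 = np.zeros(1)
lr = 0.5

for epoch in range(10000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: binary cross-entropy with a sigmoid output
    # yields the simple error term (out - y).
    d_out = (out - y) / len(X)
    d_h = (d_out @ W2.T) * (1.0 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2

    # Gradient-descent updates.
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

# Final loss and hard predictions.
h = np.tanh(X @ W1 + b1)
out = sigmoid(h @ W2 + b2)
loss = -np.mean(y * np.log(out) + (1 - y) * np.log(1 - out))
preds = (out > 0.5).astype(int).ravel()
print(f"loss={loss:.4f}, predictions={preds.tolist()}")
```

Because tanh units do not die the way ReLUs can, this configuration trains much more reliably on tiny problems like XOR.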
Conclusion
The XOR problem is a classic example that highlights the limitations of simple neural networks and the need for multi-layer architectures. By introducing a hidden layer and non-linear activation functions, an MLP can solve the XOR problem by learning complex decision boundaries that a single-layer perceptron cannot. Understanding this solution provides valuable insight into the power of deep learning models and their ability to tackle non-linear problems in various domains.