Teaching a Neural Network to Add Two Numbers with PyTorch (Full Working Example)

Neural networks are often associated with advanced AI systems, large language models, and image generation. But underneath all of that complexity lies a surprisingly simple foundation: learning mathematical relationships from data.

One of the best beginner exercises in PyTorch is teaching a neural network how to add two numbers.

It may sound trivial, but this small example demonstrates many of the core mechanisms behind deep learning systems:

  • Forward propagation
  • Hidden representations
  • Loss calculation
  • Gradient descent
  • Backpropagation
  • Weight updates
  • Generalization to unseen data

In this article, we will build a complete PyTorch program that learns addition from examples.

Teaching a Neural Network to Add Two Numbers with PyTorch (Full Working Example)
Teaching a Neural Network to Add Two Numbers with PyTorch (Full Working Example)

What Are We Trying to Teach?

We want a neural network to learn this relationship:

y=x1+x2y = x_1 + x_2y=x1​+x2​

Instead of manually programming addition logic, we allow the neural network to discover the pattern from training examples.

The model will observe examples such as:

InputOutput
1 + 23
2 + 35
5 + 611

Over time, the model adjusts internal weights until it can correctly predict sums for numbers it has never seen before.

Why This Example Matters

Although adding two numbers is simple for humans, this example introduces nearly every important concept in neural network training.

A neural network does not “understand” addition symbolically like a calculator. Instead, it approximates numerical relationships through learned parameters.

This same learning mechanism scales into:

  • Image recognition
  • Language translation
  • AI agents
  • Recommendation systems
  • Transformer architectures
  • Reasoning models

Understanding this small example builds intuition for much larger systems.

Full Working PyTorch Code

Below is the complete working program.

You can copy this directly into a Python file such as:

add_two_numbers_with_pytorch.py

and run it immediately.

👉 You can experiment with a practical Python implementation of this concept in the official GitHub repository for the Reasoning Systems examples: https://github.com/BenardoKemp/reasoningsystems/tree/main/practical-python/add-two-numbers-with-pytorch

Step-by-Step Breakdown

1. Importing PyTorch

import torch
import torch.nn as nn
import torch.optim as optim

These imports provide:

  • Tensor operations
  • Neural network layers
  • Optimization algorithms

PyTorch tensors are similar to NumPy arrays but support automatic differentiation and GPU acceleration.

2. Creating the Training Data

X = torch.tensor([
[1.0, 2.0],
[2.0, 3.0],
...
])

Each row contains two input numbers.

The targets are:

y = torch.tensor([
[3.0],
[5.0],
...
])

which represent the correct sums.

The neural network will try to learn the mapping:

Input → Correct Output

3. Building the Neural Network

model = nn.Sequential(
nn.Linear(2, 8),
nn.ReLU(),
nn.Linear(8, 1)
)

This architecture contains:

LayerPurpose
Linear(2,8)Converts 2 inputs into 8 learned features
ReLUAdds non-linearity
Linear(8,1)Produces the final prediction

The hidden layer computes an intermediate representation:

h=ReLU(Wx+b)h = \mathrm{ReLU}(Wx + b)h=ReLU(Wx+b)

This is where the network internally learns numerical relationships.

4. Loss Function

The loss function measures prediction error.

criterion = nn.MSELoss()

Mean Squared Error computes:

MSE=1ni=1n(yiy^i)2\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2MSE=n1​∑i=1n​(yi​−y^​i​)2

Smaller loss values mean better predictions.

The goal of training is to minimize this error.

5. Optimizer

optimizer = optim.Adam(model.parameters(), lr=0.01)

Adam is a popular optimization algorithm that updates model weights efficiently.

The optimizer adjusts the neural network parameters after every training step.

6. Training Loop

The training loop repeatedly shows data to the model.

for epoch in range(epochs):

Each epoch performs:

  1. Forward pass
  2. Loss calculation
  3. Gradient computation
  4. Weight update

Forward Pass

predictions = model(X)

Compute Error

loss = criterion(predictions, y)

The model compares predictions to correct answers.

Clear Old Gradients

optimizer.zero_grad()

Backpropagation

loss.backward()

PyTorch automatically computes gradients using automatic differentiation.

This is one of the most important mechanisms in deep learning.

Update Weights

optimizer.step()

The optimizer adjusts parameters to reduce error.

Over many iterations, the model improves.

7. Testing the Model

After training, we test the network on unseen data.

test_input = torch.tensor([[10.0, 15.0]])

Expected answer:

25

The network predicts a value close to this despite never seeing the example before.

This ability to generalize is central to machine learning.

Example Output

You may see output similar to:

Epoch 0, Loss: 58.231247
Epoch 200, Loss: 0.341912
Epoch 400, Loss: 0.021457
Epoch 600, Loss: 0.002194
Test Result
------------------
Input Numbers : 10 and 15
Predicted Sum : 24.99

The decreasing loss shows that the network is learning.

What Is Actually Happening Internally?

The neural network learns parameters that approximate addition behavior.

Internally, the model performs operations like:

y^=W2ReLU(W1x+b1)+b2\hat{y} = W_2\cdot\mathrm{ReLU}(W_1x+b_1)+b_2

The weights gradually shift until predictions become accurate.

The network is not explicitly programmed with arithmetic rules.

Instead, it statistically approximates the relationship between inputs and outputs.

Why Use a Neural Network for Addition?

In practice, you would never use a neural network to perform basic arithmetic.

A calculator is faster, exact, and more efficient.

However, this example is valuable because it demonstrates:

  • How neural networks learn
  • How training loops work
  • How gradients update parameters
  • How loss minimization functions
  • How generalization emerges

This is foundational knowledge for:

Next Steps

Once you understand this example, you can extend it into:

More Complex Arithmetic

  • Subtraction
  • Multiplication
  • Division

Sequence-Based Reasoning

Teach the model step-by-step arithmetic.

Transformer-Based Numerical Reasoning

Token-based arithmetic similar to language models.

Chain-of-Thought Reasoning

Intermediate reasoning steps before final answers.

Symbolic + Neural Hybrid Systems

Combining neural networks with explicit reasoning logic.

Final Thoughts

Teaching a neural network to add two numbers may appear simple, but it reveals the mechanics behind modern AI systems.

At its core, deep learning is about discovering patterns through optimization.

Even advanced reasoning models begin with the same underlying principles:

  • Inputs
  • Hidden representations
  • Loss functions
  • Gradient updates
  • Iterative learning

Mastering these fundamentals makes larger neural architectures far easier to understand.

Designed with WordPress