Interactive Convolution Visualizer

What is Convolution?

Convolution is a fundamental mathematical operation in CNNs that involves sliding a filter (or kernel) over an input image, performing element-wise multiplications, and summing the results.

This operation enables the network to detect features like edges, textures, and patterns in the image.

Experiment with different filters and parameters to see how they affect the convolution result.
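A minimal sketch of this sliding-window operation, assuming a single-channel input and NumPy (the convolve2d name, the random image, and the Sobel-like filter are illustrative, not taken from the visualizer):

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide a kernel over a 2D image: multiply element-wise and sum each window."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out_h = (ih - kh) // stride + 1
    out_w = (iw - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i*stride:i*stride + kh, j*stride:j*stride + kw]
            feature_map[i, j] = np.sum(window * kernel)
    return feature_map

# Illustrative example: a Sobel-like vertical-edge filter on a toy 8x8 image
image = np.random.rand(8, 8)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])
print(convolve2d(image, sobel_x).shape)  # (6, 6): a 3x3 kernel shrinks each side by 2
```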

Visualization: Input Image and Feature Map panels

Interpretation

The feature map shows the result of applying the selected filter to the input image.

Brighter areas indicate high filter activation, meaning the filter has detected a pattern similar to the one it is designed to find.

The 3D visualization shows the intensity of activations as height, giving an additional perspective on how the filter responds to different parts of the image.

Advanced Features

Multichannel Convolution

In real CNNs, images usually have multiple channels (RGB), and multiple filters are applied to detect different features.

Each filter generates a different feature map, and these maps are combined in later layers to form more complex representations.

The filter size affects the receptive field: larger filters can capture broader patterns but require more computation.
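A rough sketch of how a bank of filters could be applied to an RGB input, assuming NumPy; the convolve_multichannel name and the random filter values are hypothetical, not the visualizer's code:

```python
import numpy as np

def convolve_multichannel(image, filters, stride=1):
    """Apply a bank of filters to a multi-channel image of shape (H, W, C).

    filters has shape (num_filters, kh, kw, C); each filter spans all input
    channels and produces one feature map.
    """
    ih, iw, c = image.shape
    nf, kh, kw, fc = filters.shape
    assert c == fc, "filter depth must match the number of input channels"
    out_h = (ih - kh) // stride + 1
    out_w = (iw - kw) // stride + 1
    maps = np.zeros((out_h, out_w, nf))
    for f in range(nf):
        for i in range(out_h):
            for j in range(out_w):
                window = image[i*stride:i*stride + kh, j*stride:j*stride + kw, :]
                maps[i, j, f] = np.sum(window * filters[f])
    return maps

rgb = np.random.rand(16, 16, 3)          # toy RGB image
bank = np.random.randn(4, 3, 3, 3)       # 4 random filters of size 3x3x3
print(convolve_multichannel(rgb, bank).shape)  # (14, 14, 4): one feature map per filter
```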

Multichannel Visualizer: RGB Image and feature map panels

Interpretation

Each feature map shows the response of a different filter applied to the input image.

Observe how different filters detect different aspects of the image, such as horizontal or vertical edges, or specific textures.

In a real CNN, these feature maps would serve as input for the next convolutional layer.

Activation Functions

Activation functions introduce non-linearity into neural networks, allowing them to learn complex relationships.

ReLU (Rectified Linear Unit): f(x) = max(0, x). Simple and efficient, but it can suffer from "dead neurons".

Sigmoid: f(x) = 1/(1 + e^(-x)). Maps values to the [0,1] range, but suffers from the vanishing gradient problem.

Tanh: f(x) = tanh(x). Similar to sigmoid but with a range of [-1,1].

Leaky ReLU: f(x) = max(αx, x). Solves the dead neuron problem by allowing a small, non-zero gradient for negative inputs.
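These four functions can be sketched in a few lines of NumPy (illustrative only, not the visualizer's implementation):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)                # zeroes out negative values

def sigmoid(x):
    return 1 / (1 + np.exp(-x))            # squashes values into (0, 1)

def tanh(x):
    return np.tanh(x)                      # squashes values into (-1, 1)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)   # small slope for negative inputs

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # [0.     0.     0.     1.5]
print(leaky_relu(x))  # [-0.02  -0.005  0.     1.5]
```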

Activation Visualizer: Before Activation and After Activation panels

Interpretation

See how the activation function transforms the values in the feature map.

ReLU sets all negative values to zero, while Sigmoid compresses all values into the range [0,1].

The chart shows the applied activation function, where the X-axis represents input values and the Y-axis shows transformed values.

Pooling Operations

Pooling reduces the spatial dimensionality of feature maps, decreasing computational cost and providing invariance to small translations.

Max Pooling: Selects the maximum value in each window, preserving the most prominent features.

Average Pooling: Calculates the average of the values in each window, preserving background information.

Global Average Pooling: Calculates the average of the entire feature map, reducing each map to a single value.
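A minimal NumPy sketch of the three operations (the pool2d and global_average_pool names are illustrative, not the visualizer's API):

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Max or average pooling over strided windows of a 2D feature map."""
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    reduce = np.max if mode == "max" else np.mean
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i*stride:i*stride + size, j*stride:j*stride + size]
            pooled[i, j] = reduce(window)
    return pooled

def global_average_pool(feature_map):
    """Reduce an entire feature map to a single value."""
    return feature_map.mean()

fmap = np.random.rand(8, 8)
print(pool2d(fmap, mode="max").shape)  # (4, 4): 2x2 window, stride 2
print(pool2d(fmap, mode="avg").shape)  # (4, 4)
print(global_average_pool(fmap))       # a single scalar
```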

Pooling Visualization: Before Pooling and After Pooling panels

Interpretation

Observe how pooling reduces the spatial resolution while trying to preserve important information.

Max Pooling tends to preserve edges and prominent features, while Average Pooling smooths the image.

Stride controls how far the pooling window moves: a higher stride results in a more aggressive dimensionality reduction.
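A quick worked example of that relationship, assuming a 2x2 window, an 8x8 input, and no padding:

```python
# Output size per dimension: out = (input - window) // stride + 1  (no padding assumed)
for stride in (1, 2, 4):
    out = (8 - 2) // stride + 1                      # 8x8 input, 2x2 pooling window
    print(f"stride {stride} -> {out}x{out} output")  # 7x7, 4x4, 2x2
```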