Neural Networks: Back to Basics, Part I — Lucas Martin Calderon

From a one-weight pricing model to gradient descent — the cleanest path I know into the fundamentals.

Introduction

Diving into the basics, I realized a need for more beginner-friendly resources. This guide is a response to that need.

Kickoff with Basics

Imagine you're helping someone decide whether a $500,000 price tag for a 2,500 sq ft apartment (about 230 m²) is fair.

Without comparisons, it's challenging. So, after some research, you gather data from recent apartment sales.

A logical first approach is to average the price per sq ft across those sales. In this example, that comes to $200 per sq ft.

Congratulations — you've just constructed your first, albeit basic, neural network. Not quite AI-chatbot level, but it's the fundamental block.

As a diagram, the network is a single input node on the left connected to a single output node on the right. The calculation starts at the input node: the input value flows rightward, is multiplied by the weight, and the result emerges as our output.

For a 2,500 sq ft apartment, multiplying by the $200 weight gives $500,000. At this level, prediction is nothing more than multiplication. Before we could predict, though, we had to find the weight to multiply by. Finding that weight is what we call the "training" phase: "training" a neural network essentially means determining the weights it uses to predict.
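The one-weight network above fits in a few lines of Python. This is a minimal sketch: the names `WEIGHT_DOLLARS_PER_SQFT` and `predict_price` are made up for illustration, and the only numbers used are the ones from the text.

```python
# Hypothetical names; the weight is the $200 per sq ft figure from the text.
WEIGHT_DOLLARS_PER_SQFT = 200


def predict_price(area_sqft, weight=WEIGHT_DOLLARS_PER_SQFT):
    """One input, one weight, one output: multiply and return."""
    return area_sqft * weight


print(predict_price(2500))  # 500000
```

Everything interesting about training is hidden in how the value 200 was chosen; the prediction step itself stays this simple.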

In essence, it's a prediction model. Technically, since the output can range continuously, this model is a "regression model."

To keep the numbers manageable, let's express prices in units of $1,000 instead of $1. The weight becomes 0.2 instead of 200, and the same apartment is predicted at 500 (that is, $500,000): the same prediction at a different scale.

Enhancement and Precision

Is a mere average of data points the best we can do? Let's refine. To improve the model, we first need a clear definition of "better." Evaluating our model against our data points shows that its predictions miss the actual prices.

In the diagram, yellow marks the errors: the gap between each actual price and the model's prediction. We aim to make those gaps as small as possible.

Here, we note the actual price, the predicted price, and their difference for each data point. Averaging these differences would give us a measure of the model's error, but negative values, like the −78, pose a problem: they cancel out positive ones, so a model that misses badly in both directions could look accurate. Squaring each error before averaging removes the sign. The resulting "Mean Squared Error" becomes our loss function, and refining the model means minimizing it.
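To make the loss concrete, here is a small sketch of the Mean Squared Error calculation. The three sales below are invented purely for illustration (they are not the data behind the article's figures), prices are in $1,000s, and the weight is the 0.2 from earlier.

```python
# Hypothetical sales data, invented purely for illustration.
areas_sqft = [2100, 1600, 2400]
prices_k = [400, 330, 370]  # actual prices, in $1,000s


def predict(area_sqft, weight):
    """Single-weight model: price = weight * area."""
    return weight * area_sqft


def mean_squared_error(areas, prices, weight):
    """Average of the squared gaps between actual and predicted prices."""
    squared_errors = [
        (price - predict(area, weight)) ** 2
        for area, price in zip(areas, prices)
    ]
    return sum(squared_errors) / len(squared_errors)


# Raw differences (actual minus predicted) carry a sign and can cancel out.
differences = [p - predict(a, 0.2) for a, p in zip(areas_sqft, prices_k)]
print(differences)

# Squaring removes the sign, so every miss adds to the loss.
print(mean_squared_error(areas_sqft, prices_k, weight=0.2))  # roughly 4200 here
```

With these made-up numbers the raw differences are about −20, 10, and −110. Their plain average would understate how far off the model is, while the squared version counts every miss.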

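Experimenting with various weights, we find that adjusting the weight alone only gets us so far, because a weight-only model always predicts a price of zero for an area of zero. Introducing a bias, a constant added to the output, can improve the model. With one input and one output (and no hidden layers), it appears as: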

Here, W (weight) and b (bias) are determined during training. X is our input (square footage), and Y is our predicted price.

The prediction formula now evolves to:

Y = (W × X) + b
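Here is a sketch of the formula in code, comparing the loss of a weight-only setting with a weight-plus-bias setting. The sales data is hypothetical, and the values W = 0.1 and b = 150 are hand-picked guesses rather than trained results.

```python
# Hypothetical sales data, invented for illustration only.
areas_sqft = [2100, 1600, 2400]
prices_k = [400, 330, 370]  # actual prices, in $1,000s


def predict(x, w, b=0.0):
    """Y = (W * X) + b"""
    return w * x + b


def mean_squared_error(xs, ys, w, b=0.0):
    """Mean of squared (actual - predicted) over all data points."""
    return sum((y - predict(x, w, b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


weight_only = mean_squared_error(areas_sqft, prices_k, w=0.2)
with_bias = mean_squared_error(areas_sqft, prices_k, w=0.1, b=150)
print(weight_only, with_bias)  # roughly 4200 vs roughly 800 on this toy data
```

On this toy data, a smaller weight combined with a bias fits better than the weight-only model, which is the effect the bias is meant to provide.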

Interactive Training Session

Why not have a go at training this basic neural network? Your objective: minimize the loss function by adjusting weight and bias. Can you achieve an error below 2,000?

A proposed solution, done manually in the command line, lands you within striking distance — but the trial-and-error gets tedious fast.

Automating the Process

Kudos on your manual neural network training. Next, let's explore automation. Consider this autopilot functionality.

The autopilot applies gradient descent: each step adjusts the weight and bias slightly to reduce the loss function. Graphs of the error over the training steps help monitor progress. The essence of training is error reduction.

Gradient descent knows which direction to move thanks to calculus. Given our loss function and the current weight and bias, the derivatives of the loss with respect to each parameter tell us which way, and roughly how far, to adjust each one.

For a deeper dive into gradient descent, see the opening lectures of Coursera's Machine Learning course.
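As a rough sketch of what such an autopilot might do, here is plain gradient descent on the Mean Squared Error for Y = (W × X) + b. The data is hypothetical, areas are expressed in thousands of sq ft so that a single learning rate works for both parameters, and the learning rate and step count are assumptions chosen for this toy example.

```python
# Hypothetical data: areas in 1,000s of sq ft, prices in $1,000s.
areas_k = [2.1, 1.6, 2.4]
prices_k = [400, 330, 370]


def mse(xs, ys, w, b):
    """Mean squared error of the line w * x + b over the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate=0.1, steps=5000):
    """Repeatedly step w and b against the gradient of the MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Derivatives of mean((w*x + b - y)^2) with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = gradient_descent(areas_k, prices_k)
print(w, b, mse(areas_k, prices_k, w, b))
```

If the learning rate is too large the updates overshoot and the loss grows instead of shrinking; if it is too small, training takes many more steps. On this toy data the loop settles near W ≈ 59 and b ≈ 246.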

Adding Complexity

Is apartment size the sole price determinant? Obviously not. Let's incorporate another factor: number of bedrooms.

The updated neural network has two weights (one for each input) and one bias. The prediction formula evolves to:

Y = (w₁ × x₁) + (w₂ × x₂) + b

Finding good values for w₁, w₂, and b by hand is much harder than tuning a single weight. Gradient descent is, once again, our ally.
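The same idea extends to two inputs. The sketch below uses a list of weights so it works for any number of features. The five apartments are invented for illustration (area in thousands of sq ft, then bedrooms), and the learning rate and step count are assumptions, not tuned values.

```python
# Hypothetical data: each row is [area in 1,000s of sq ft, bedrooms].
rows = [[2.1, 3], [1.6, 3], [2.4, 3], [1.4, 2], [3.0, 4]]
targets = [400, 330, 370, 230, 540]  # prices in $1,000s


def predict(features, weights, bias):
    """Y = (w1 * x1) + (w2 * x2) + ... + b"""
    return sum(w * x for w, x in zip(weights, features)) + bias


def mse(rows, targets, weights, bias):
    """Mean squared error over all rows."""
    return sum(
        (predict(r, weights, bias) - t) ** 2 for r, t in zip(rows, targets)
    ) / len(rows)


def train(rows, targets, learning_rate=0.01, steps=20000):
    """Gradient descent with one partial derivative per weight, plus the bias."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(steps):
        errors = [predict(r, weights, bias) - t for r, t in zip(rows, targets)]
        grad_w = [
            2 * sum(e * r[j] for e, r in zip(errors, rows)) / n
            for j in range(len(weights))
        ]
        grad_b = 2 * sum(errors) / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


weights, bias = train(rows, targets)
print(weights, bias, mse(rows, targets, weights, bias))
```

Nothing about the update rule changes with more features; each weight simply gets its own derivative.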

Feature Implementation

Having explored networks with one or two features, it's evident how to scale. As features increase, weight optimization becomes complex. Feature selection is pivotal and is an art in itself. For a feature-selection example, refer to "A Journey Through Titanic" by Omar El Gabry, which tackles Kaggle's Titanic challenge.

Categorization

Taking our example further, imagine a list of apartments, each described by its size and number of bedrooms and labelled "Good" or "Poor."

The objective is predicting apartment desirability. The networks we have built so far have been regression models, producing continuous values. Often, however, neural networks are used for classification, producing discrete outputs like "Good" or "Poor."

For instance, TensorFlow's app, discussed previously, is a classification model. In practice, a classification network outputs a probability for each class, such as "Good" or "Poor." The softmax operation converts the network's raw output scores into those probabilities.

For example, feeding the array [3, 5] into softmax yields approximately [0.12, 0.88]. If the second entry corresponds to "Poor," that means an 88% probability for the "Poor" label.

Softmax outputs are all positive and sum to 1, which makes them suitable as probabilities. Because softmax exponentiates its inputs, it exaggerates the differences between them, which aids training.
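A minimal softmax sketch in plain Python reproduces the [3, 5] example. Subtracting the largest score before exponentiating is a common numerical-stability trick added here as an assumption; it does not change the result.

```python
import math


def softmax(scores):
    """Turn raw scores into positive values that sum to 1."""
    largest = max(scores)
    # Shifting by the max avoids overflow and leaves the output unchanged.
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([3, 5])
print([round(p, 2) for p in probs])  # [0.12, 0.88]
```

The inputs differ by a ratio of less than 2, yet the outputs differ by a factor of about 7: that is the exaggeration at work.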

Closing Thoughts

This guide provides foundational insights into neural networks. As AI and machine learning continue to evolve, understanding these basics empowers us to grasp more intricate concepts and applications.