Categories

## Linear and Logistic Regressions as Degenerate Neural Networks in Keras

by Elod Pal Csirmaz

If you are tasked with predicting some measure, you may wonder whether a simple linear or multiple regression would be sufficient (or, if you want to predict a binary value, a logistic regression), or whether you should use a neural network instead, and how much coding trying out the different models would involve. The good news is that the high-level neural network framework Keras is sufficient for all of these purposes, as I will show using a simple example.

Aaron Zhu’s overview of the regression models used here is a great start, so here we will only mention the basics.

## Linear regressions

The goal with simple linear regression is to model an outcome (Y[i]), a continuous variable, as a linear function of a continuous input variable (X[i]) using two constants:

𝛃X[i] + 𝛂 = Y[i]*

The error in the prediction (Y[i]* – Y[i]) is usually measured using the sum of squared errors because, assuming the errors are normally distributed, minimizing this sum also maximizes the likelihood that 𝛂 and 𝛃 provide the underlying model given the observed data X and Y.

Notice that the mapping from X to Y* is actually the same as the one provided by a fully connected (or dense) layer in a neural network. This dense layer is very simple: it has one input, and one output; 𝛃 is the weight associated with the single input, and 𝛂 is the bias.

While a simple linear regression can be solved directly (that is, the values of 𝛂 and 𝛃 can be found using some algebra), we may instead decide to optimize them using gradient descent. What we get then is a very simple, degenerate neural network with a single dense layer and a squared-error loss function, both of which are readily available in Keras. I call this neural network degenerate because it has no activation function, so it does not even contain actual perceptrons.
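As a minimal sketch of this equivalence, the plain-Python code below (no Keras, so every step is visible) fits 𝛃 and 𝛂 twice on made-up data: once with the direct least-squares formulas, and once by gradient descent, which is what training a single one-input, one-output dense layer without activation amounts to. It minimizes the mean rather than the sum of squared errors (the optimum is the same; only the scale of the gradients differs). The data, learning rate and epoch count are illustrative assumptions.

```python
import random

def closed_form(xs, ys):
    """Solve Y* = beta * X + alpha directly (ordinary least squares)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    beta = cov_xy / var_x
    return beta, mean_y - beta * mean_x

def gradient_descent(xs, ys, lr=0.5, epochs=1000):
    """Fit the same model as a one-input, one-output dense layer
    (weight = beta, bias = alpha) by minimizing the mean squared error."""
    beta, alpha, n = 0.0, 0.0, len(xs)
    for _ in range(epochs):
        errors = [beta * x + alpha - y for x, y in zip(xs, ys)]
        # Partial derivatives of mean((beta*x + alpha - y)**2)
        grad_beta = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_alpha = 2 * sum(errors) / n
        beta -= lr * grad_beta
        alpha -= lr * grad_alpha
    return beta, alpha

rng = random.Random(0)
xs = [rng.uniform(-0.5, 0.5) for _ in range(200)]
# Illustrative data: true beta = 0.7, alpha = 0.1, plus Gaussian noise
ys = [0.7 * x + 0.1 + rng.gauss(0, 0.05) for x in xs]
print("direct:          ", closed_form(xs, ys))
print("gradient descent:", gradient_descent(xs, ys))
```

Both lines should print practically identical values, close to the 0.7 and 0.1 used to generate the data.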

Why stop here, though? If we have multiple input variables, X1, X2, … Xn, we may consider multiple regression, where the single outcome (Y) is predicted using a vector of factors:

𝛃1X1[i] + 𝛃2X2[i] + … + 𝛃nXn[i] + 𝛂 = Y[i]*

The direct algebraic method to obtain 𝛃1, …, 𝛃n and 𝛂 is to solve a system of linear equations (the normal equations). The alternative, using gradient descent as above, can be less resource-intensive when there are many inputs or many examples, and it again yields a degenerate neural network: this time a dense layer with n inputs and a single output, with 𝛃1, 𝛃2, …, 𝛃n as the weights and 𝛂 as the bias. For the same reason as in the case of simple linear regression, the error function of choice is the sum of squared errors.
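The same idea with n inputs, again as a hypothetical plain-Python sketch rather than Keras code: `dense()` computes 𝛃1X1 + … + 𝛃nXn + 𝛂, and `fit_multiple_regression()` runs batch gradient descent on the mean squared error. The noise-free data and the "true" coefficients are made up so that the recovered weights can be checked.

```python
import random

def dense(weights, bias, inputs):
    """One dense layer with n inputs, one output and no activation:
    beta_1*x_1 + ... + beta_n*x_n + alpha."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def fit_multiple_regression(rows, targets, lr=0.5, epochs=1000):
    """Batch gradient descent on the mean squared error."""
    n_inputs, m = len(rows[0]), len(rows)
    weights, bias = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        errors = [dense(weights, bias, r) - t for r, t in zip(rows, targets)]
        # All gradients use the same errors, so the update is simultaneous
        grads = [2 * sum(e * r[j] for e, r in zip(errors, rows)) / m
                 for j in range(n_inputs)]
        weights = [w - lr * g for w, g in zip(weights, grads)]
        bias -= lr * 2 * sum(errors) / m
    return weights, bias

rng = random.Random(1)
rows = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(200)]
true_weights, true_bias = [0.4, -0.3, 0.2], 0.05  # illustrative values
targets = [dense(true_weights, true_bias, r) for r in rows]
weights, bias = fit_multiple_regression(rows, targets)
print([round(w, 3) for w in weights], round(bias, 3))
```

Because the data contains no noise, gradient descent recovers the generating weights and bias, which is exactly what solving the normal equations would give.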

## Logistic regression

Sometimes, though, we need to predict not a continuous value but a true-or-false one, which is where logistic regression enters the picture. It is a linear regression model whose output is fed into the logistic function 𝝈(x) = 1/(1+exp(-x)), a sigmoid function that maps all real numbers to the interval between 0 and 1. This makes the output of a logistic regression interpretable as a probability: if we are trying to predict whether a person will buy a red or a green balloon, this could be the probability that they go for the red one.

𝝈(𝛃1X1[i] + 𝛃2X2[i] + … + 𝛃nXn[i] + 𝛂) = Y[i]* = prob(person[i] buys red balloon)

Here the usual sum-of-squared-errors loss would not directly maximize the likelihood of 𝛃1, 𝛃2, …, 𝛃n, 𝛂 being the underlying model. The loss function that does this is called binary cross-entropy. (See Aaron Zhu’s article for a derivation.) Because of the logistic function, there is also no closed-form solution for the optimal parameters, so an iterative method is needed anyway, and, as with multiple regression, gradient descent can offer a fast and less resource-intensive option.
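To see why binary cross-entropy is the right loss, note that it is the mean negative log-likelihood of the observed labels, so minimizing it maximizes the likelihood. The sketch below is plain Python; the clipping constant `eps` and the example probabilities are assumptions for illustration.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean negative log-likelihood of 0/1 labels given predicted
    probabilities; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)

probs, labels = [0.9, 0.2, 0.7], [1, 0, 1]
likelihood = 0.9 * (1 - 0.2) * 0.7  # probability of observing these labels
print(binary_cross_entropy(probs, labels), -math.log(likelihood) / 3)
```

The two printed numbers agree: the loss is -log of the probability the model assigns to the observed labels, divided by the number of examples.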

It will come as no surprise that, with gradient descent, optimizing a logistic regression is equivalent to training a simple neural network built from components available in the arsenal of all relevant software packages. In fact, logistic regression is a single neuron with a sigmoid activation, which in Keras can be modeled as a dense layer with one output and a sigmoid activation. Training this model using the binary cross-entropy loss function gives us exactly what we want.
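A minimal plain-Python sketch of such a unit trained by gradient descent, assuming made-up, linearly separable data with two inputs: the dense layer computes a weighted sum, the sigmoid turns it into a probability, and with binary cross-entropy the gradient with respect to the weighted sum is simply the prediction minus the label. The function names, learning rate, epoch count and data are illustrative choices, not taken from the gist.

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, bias, inputs):
    """Dense layer with one output followed by a sigmoid activation."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def fit_logistic_regression(rows, labels, lr=1.0, epochs=500):
    """Gradient descent on the mean binary cross-entropy. For a sigmoid
    output, d(loss)/d(weighted sum) simplifies to (prediction - label)."""
    n_inputs, m = len(rows[0]), len(rows)
    weights, bias = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        deltas = [predict(weights, bias, r) - y for r, y in zip(rows, labels)]
        grads = [sum(d * r[j] for d, r in zip(deltas, rows)) / m
                 for j in range(n_inputs)]
        weights = [w - lr * g for w, g in zip(weights, grads)]
        bias -= lr * sum(deltas) / m
    return weights, bias

rng = random.Random(2)
rows = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(200)]
# Illustrative labels: 1 when 0.6*x1 + 0.4*x2 > 0
labels = [1 if 0.6 * r[0] + 0.4 * r[1] > 0 else 0 for r in rows]
weights, bias = fit_logistic_regression(rows, labels)
accuracy = sum((predict(weights, bias, r) > 0.5) == (y == 1)
               for r, y in zip(rows, labels)) / len(rows)
print(weights, bias, accuracy)
```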

## An example

That is, there is no need to write separate code or call separate libraries to try linear and logistic regressions or full-fledged neural networks. Simply varying the neural network model allows us to try all three types of models quickly and easily.

To demonstrate this, we create a toy example. Given a person’s age, relationship status and number of children, we try to predict how many balloons they buy, and whether the balloons are red or green. (Or how many chairs they buy and whether they need a wide or narrow dining table.) We represent relationship status as a single value that is either -0.5 for partnered, or +0.5 for single.

In our toy example we choose an underlying model that is not entirely linear, to see whether full neural networks fare better than the linear models. We generate our inputs using uniform random values from the [-0.5, +0.5] interval, so the inputs are already normalized: they have zero mean and the same range. Then we calculate:

```python
if relationship == -0.5:  # partnered
    number_of_balloons = 1. * children - .2 * age
    balloon_color = 1 if (.8 * children + .2 * age > 0) else 0
else:  # single
    number_of_balloons = .8 * children + .5 * age
    balloon_color = 1 if (.5 * children + .5 * age > 0) else 0
```
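The inputs and targets could be generated along the lines of the following sketch. It assumes that the two relationship statuses are equally likely (consistent with the analysis further on, which relies on equal numbers of examples for each) and that age and number of children are drawn uniformly from [-0.5, +0.5]; `generate_example()` and `generate_dataset()` are hypothetical helpers, not the code from the gist.

```python
import random

def generate_example(rng):
    """Draw one person and apply the underlying (not entirely linear) model."""
    relationship = rng.choice([-0.5, 0.5])  # -0.5 partnered, +0.5 single
    children = rng.uniform(-0.5, 0.5)
    age = rng.uniform(-0.5, 0.5)
    if relationship == -0.5:
        number_of_balloons = 1. * children - .2 * age
        balloon_color = 1 if (.8 * children + .2 * age > 0) else 0
    else:
        number_of_balloons = .8 * children + .5 * age
        balloon_color = 1 if (.5 * children + .5 * age > 0) else 0
    # Input order (relationship, children, age) matches the weights discussed later
    return (relationship, children, age), number_of_balloons, balloon_color

def generate_dataset(n, seed=42):
    """Generate n examples reproducibly from a seeded random generator."""
    rng = random.Random(seed)
    return [generate_example(rng) for _ in range(n)]

data = generate_dataset(5)
for inputs, balloons, color in data:
    print(inputs, round(balloons, 3), color)
```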

We create four models. To predict the number of balloons, we use a multiple linear regression and a neural network without a sigmoid function at the end, both trained with a squared-error loss. To predict the color of the balloons, we use a logistic regression and a neural network with a final sigmoid function, both trained with binary cross-entropy as the loss.

See this gist for the code that trains these models and, for the regression models, also displays the weights of the dense layers (corresponding to 𝛃 and 𝛂).

One example run produced the following output – reproduced here with slight modifications for readability:

```
======= Output type: num_balloons Model type: regression
Epoch 1/1000
loss: 0.0526 - val_loss: 0.0205
Epoch 2/1000
loss: 0.0205 - val_loss: 0.0201
Epoch 3/1000
loss: 0.0204 - val_loss: 0.0207

Weights:
[<'dense/kernel:0' ([[-0.00324172], [0.43457505], [0.1423042]])>,
<'dense/bias:0' ([0.00032589])>]

======= Output type: num_balloons Model type: neural
Epoch 1/1000
loss: 0.0206 - val_loss: 1.1117e-04
Epoch 2/1000
loss: 1.2853e-04 - val_loss: 1.1776e-04

======= Output type: color Model type: regression
Epoch 1/1000
loss: 0.5130 - val_loss: 0.2387
Epoch 2/1000
loss: 0.2213 - val_loss: 0.2005
Epoch 3/1000
loss: 0.2009 - val_loss: 0.1935
Epoch 4/1000
loss: 0.1965 - val_loss: 0.1969

Weights:
[<'dense/kernel:0' ([[0.02176554], [14.170614], [8.668548]])>,
<'dense/bias:0' ([0.02175274])>]

======= Output type: color Model type: neural
Epoch 1/1000
loss: 0.4589 - val_loss: 0.0651
Epoch 2/1000
loss: 0.0518 - val_loss: 0.0345
Epoch 3/1000
loss: 0.0271 - val_loss: 0.0199
Epoch 4/1000
loss: 0.0161 - val_loss: 0.0147
Epoch 5/1000
loss: 0.0119 - val_loss: 0.0098
Epoch 6/1000
loss: 0.0104 - val_loss: 0.0098
```

The first observation is that the neural models fared better than the regressions in both cases (a validation loss of 0.000118 vs. 0.0207 for the number of balloons, and 0.0098 vs. 0.1969 for the color). As expected, they could model the non-linear relationships.

The weights returned by the regressions merit a bit more analysis and sanity checking. For the number of balloons, the multiple regression predicts

-0.003 relationship + 0.435 num_children + 0.142 age + 0.000 = num_of_balloons

Since the number of children and the age both have a mean of zero, the mean number of balloons returned by the underlying model is also zero for both relationship statuses. This is reflected in the factor for the relationship input and the bias both being practically zero.

The underlying model produces the same amount of examples for the two relationship statuses, so we expect that the best linear model approximating their combination is in the middle between the two linear expressions in the underlying model. Indeed this is what we find: the factor for the number of children is close to (1.0 + 0.8) / 2 = 0.45, and the factor for the age is close to (0.5 – 0.2) / 2 = 0.15.

For the color of the balloons, the logistic regression gives the model

𝝈(0.0218 relationship + 14.171 num_children + 8.669 age + 0.0218) = prob(red)

The same symmetries apply as before, and again we find that the factor for the relationship and the bias are close to zero, especially compared to the other two numbers. We again expect the linear model to lie between the two linear expressions: for the number of children we expect (0.8 + 0.5) / 2 = 0.65, and for the age (0.2 + 0.5) / 2 = 0.35. However, since the underlying model only depends on whether the linear expression is positive or negative, the weights are only determined up to a positive scaling factor, so we should compare their ratios. Taking this into account, we see that the factors do make sense: 14.171 / 8.669 = 1.635, which is close to 0.65 / 0.35 = 1.857.

In fact, the scaling factor is not entirely arbitrary: because the weights are large, the input to the sigmoid function is large in absolute value, which forces its output to be very close to zero or one.
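A small illustration of both points, assuming the expected factors (0.65, 0.35) and a handful of made-up (children, age) points that all lie on the correct side of the boundary: multiplying the weights by a positive scale leaves every predicted class unchanged but pushes the sigmoid outputs towards zero or one, and for these correctly classified points the binary cross-entropy falls as the scale grows.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Made-up (children, age) points, labelled by the sign of
# 0.65 * children + 0.35 * age, so all are correctly classified
points = [((0.3, 0.1), 1), ((-0.2, 0.1), 0), ((0.05, -0.05), 1), ((-0.4, -0.3), 0)]

def mean_loss(scale):
    """Mean binary cross-entropy when the weights (0.65, 0.35) are multiplied by scale."""
    total = 0.0
    for (children, age), label in points:
        p = sigmoid(scale * (0.65 * children + 0.35 * age))
        total -= math.log(p) if label == 1 else math.log(1 - p)
    return total / len(points)

for scale in (1, 5, 20):
    print(scale, round(mean_loss(scale), 4))
```

With real data, points that the linear model misclassifies would pay an increasing penalty as the scale grows, so training settles on large but finite weights.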

## Conclusion

We have seen how neural networks are supersets of linear and logistic regressions, and how with existing software components used to build neural networks we can very easily implement regression models. Implementing regression models this way can make it very easy to upgrade them to a neural network if necessary.

## Nervous chair

One of the so-called “nervous chairs” by Grant Wilkinson and Teresa Rivera, designers based in London. #furniture #interior #design

Check out more wooden dining and living room chairs on our website.

## Wardrobes on Furniture Ferret

We have just finished the work to include wardrobes on Furniture Ferret! With our unique search based on measurements and over 5000 wardrobes to choose from, there should now be a beautiful wardrobe for a bedroom nook of any size, adding to your storage space.

## Extracting text from web code

Here at Furniture Ferret most of what we do is extract descriptions of items of furniture from web pages. Even if we know where it is on the page, it is often not a trivial task to get the text from the HTML code web pages use: words and sentences can run into each other unless we do something clever.

This is why we created semantic text, a software library that is a drop-in replacement for BeautifulSoup’s get_text() in Python.

get_text() in BeautifulSoup simply concatenates the strings of all descendant elements. This can produce unexpected results when so-called block-level HTML elements, which are expected to separate portions of the text semantically, are used.

For example, for the following HTML:

``<ul><li><strong>V</strong>ery interesting</li><li>Thing it is</li></ul>``

get_text() returns “Very interestingThing it is” instead of the expected “Very interesting Thing it is” as it disregards that <li> is a block-level element.

beautifulsoup_semantic_text.bs_semantic_text() overcomes this problem by adding a space in front of each block-level element. Block-level versus inline is a historical categorisation of HTML elements that is no longer part of the HTML standard, which uses content categories instead, but the distinction is still useful to approximate the expected presentation of the HTML as a string.
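The idea can be illustrated without BeautifulSoup using Python's standard `html.parser` module. This is a minimal sketch of the technique, not the library's implementation: the `SemanticTextParser` class, the `semantic_text()` function and the list of block-level elements are all assumptions made for the example.

```python
from html.parser import HTMLParser

# An illustrative subset of block-level elements (an assumption,
# not the list used by beautifulsoup_semantic_text)
BLOCK_ELEMENTS = {
    "address", "article", "aside", "blockquote", "dd", "div", "dl", "dt",
    "figure", "footer", "form", "h1", "h2", "h3", "h4", "h5", "h6",
    "header", "li", "main", "nav", "ol", "p", "pre", "section", "table",
    "td", "th", "tr", "ul",
}

class SemanticTextParser(HTMLParser):
    """Collects text, adding a space in front of each block-level element."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased from HTMLParser
        if tag in BLOCK_ELEMENTS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)

def semantic_text(html):
    parser = SemanticTextParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()

print(semantic_text("<ul><li><strong>V</strong>ery interesting</li><li>Thing it is</li></ul>"))
```

Inline elements such as `<strong>` add no space, so "V" and "ery" still join into "Very", while each `<li>` starts a new word.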

## Help to furnish a whole room

Wouldn’t it be great if your computer could suggest more pieces of furniture that would go with what you are buying, even if they are not from the same retailer? Furniture Ferret can now do so, and help you redesign your room, or furnish some empty space. Just click on any product, and the website will automatically load suggestions for other pieces that may go with it.

This feature is currently only available in English, but we plan to roll it out to all our supported languages and regions soon.

## JYSK leaves Russia

After four weeks of temporary closure of JYSK’s stores in Russia, JYSK announces that the closure will be permanent, and that JYSK will withdraw completely from Russia. As of 31 March, the JYSK stores in Russia will reopen temporarily to carry out a discontinuation sale to empty the stores. Once the products in the stores have been sold, the stores will be closed permanently.

JYSK is an international home furnishing retailer with Scandinavian roots that makes it easy to furnish every room in any home and garden. Its products are indexed by Furniture Ferret in many countries.

Source

## 🌟 150,000 🌟

We are thrilled to have reached the milestone of indexing 150,000 products in 10 countries. We sincerely hope we can help everyone make the best of their homes, flats and houses. With space being limited and housing crises looming in many regions, we believe that being able to find practical solutions and pieces of furniture by size (and price!) is a step towards better living.

## Help to get you to the right place

We have rolled out a new feature to get you to the right furniture shops even if you land on a page for a different country. Every computer and device on the internet has an address (its IP address), and it is often possible to guess from this address which country the device is in. We now use this, together with a system that matches equivalent pages in different countries (like 80cm wide desks in the UK and 80cm wide desks in Australia), to get you to the right place. Happy shopping!

## The start of our blog

We are excited to launch the Furniture Ferret Blog, where we include updates about our platform, information from our friends and partners, and news from the world of home and furniture.

As the first piece of news, we are happy to have started the work to expand our services to Spain. There is still much to do, but the first pages are already up and running with a small selection of products.