Categories
Furniture Heaven

Ripple table

Beautiful coffee table by Noe Duchaufour-Lawrence. Not the right size? Take a look at all tables by size.

Categories
Savings

Save £100 on cookers at Appliances Direct

Use the code “RANGE100” at checkout to save £100 on selected range cookers. Promotion ends on 6 June. #kitchen #savings

Start shopping for range cookers… and maybe a new dishwasher that fits just fine between the wall and the cabinet.

Categories
Furniture Heaven

Creepy cabinet

This creepy wooden cabinet by Caleb Woodard looks like something from another world. #interiordesign #livingroom #furniture

Categories
Technology

Linear and Logistic Regressions as Degenerate Neural Networks in Keras

by Elod Pal Csirmaz

If you are tasked with predicting some measure, you may wonder whether a simple or multiple linear regression would be sufficient (or logistic regression, if the value to predict is binary), or whether you should use a neural network instead. You may also wonder how much coding it would take to try out the different models. The good news is that the high-level neural network framework Keras is sufficient for all these purposes, as I will show using a simple example.

Aaron Zhu’s overview of the regression models used here is a great start, so here we will only mention the basics.

Linear regressions

The goal with simple linear regression is to model an outcome (Y[i]), a continuous variable, as a linear function of a continuous input variable (X[i]) using two constants:

𝛃X[i] + 𝛂 = Y[i]*

The error in the prediction (Y[i]* – Y[i]) is usually measured using the sum of squared errors, because, assuming normally distributed errors, minimizing this sum conveniently maximizes the likelihood that 𝛂 and 𝛃 provide the underlying model given the observed data of X and Y.
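As a brief sketch of why this holds, assume (this is an assumption added here for illustration) that each observation is the linear model plus independent Gaussian noise with a fixed standard deviation s. Writing Y_i and X_i for Y[i] and X[i], the log-likelihood of the n observations is:

```latex
Y_i = \beta X_i + \alpha + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, s^2)

\log L(\alpha, \beta)
  = -\frac{n}{2} \log\left(2 \pi s^2\right)
    - \frac{1}{2 s^2} \sum_{i=1}^{n} \left(Y_i - \beta X_i - \alpha\right)^2
```

The first term does not depend on 𝛂 or 𝛃, so under this assumption maximizing the log-likelihood is the same as minimizing the sum of squared errors.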

Notice that the mapping from X to Y* is actually the same as the one provided by a fully connected (or dense) layer in a neural network. This dense layer is very simple: it has one input, and one output; 𝛃 is the weight associated with the single input, and 𝛂 is the bias.

While a simple linear regression can be solved directly, that is, the values for 𝛂 and 𝛃 can be found using some algebra, should we decide to use gradient descent to optimize their values, what we would get is a very simple, degenerate neural network with a single dense layer and a squared error loss function. Both of these are readily available in Keras. I call this neural network degenerate since it does not have an activation function, and so it does not even have actual perceptrons in it.
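To make the equivalence concrete, here is a minimal sketch in plain Python (no Keras) that fits the same line twice: once with the usual closed-form formulas and once by gradient descent on the mean squared error, which is what training the single dense layer amounts to. The data points, learning rate and step count are made up for illustration.

```python
def fit_closed_form(xs, ys):
    """Least-squares slope (beta) and intercept (alpha) from the textbook formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    beta = cov_xy / var_x
    alpha = mean_y - beta * mean_x
    return beta, alpha


def fit_gradient_descent(xs, ys, learning_rate=0.1, steps=5000):
    """Minimize the mean squared error of beta * x + alpha by gradient descent."""
    beta, alpha = 0.0, 0.0  # the "weight" and "bias" of the one-input dense layer
    n = len(xs)
    for _ in range(steps):
        errors = [beta * x + alpha - y for x, y in zip(xs, ys)]
        grad_beta = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_alpha = 2 * sum(errors) / n
        beta -= learning_rate * grad_beta
        alpha -= learning_rate * grad_alpha
    return beta, alpha


# Made-up data, roughly following y = 2x.
xs = [-0.5, -0.25, 0.0, 0.25, 0.5]
ys = [-0.9, -0.6, 0.1, 0.4, 1.1]

print(fit_closed_form(xs, ys))
print(fit_gradient_descent(xs, ys))
```

Both calls should print practically the same slope and intercept. The mean is used instead of the sum of squared errors only to keep the step size independent of the number of examples; the two losses have the same minimum.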

Why stop here, though? If we have multiple input variables, X1, X2, … Xn, we may consider multiple regression, where the single outcome (Y) is predicted using a vector of factors:

𝛃1X1[i] + 𝛃2X2[i] + … + 𝛃nXn[i] + 𝛂 = Y[i]*

The direct algebraic method to obtain 𝛃1, 𝛃2, …, 𝛃n and 𝛂 is to solve a system of linear equations. The alternative, gradient descent as above, can prove less resource intensive when there are many inputs or examples, and it again yields a degenerate neural network: this time a dense layer with n inputs and a single output, with 𝛃1, 𝛃2, …, 𝛃n being the weights and 𝛂 the bias. For the same reason as in the case of simple linear regression, the error function of choice is the sum of squared errors.
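For reference, the system of linear equations mentioned above is usually written as the normal equations. In the sketch below, X is assumed to be the matrix whose i-th row is (1, X1[i], …, Xn[i]), Y is the column of observed outcomes, and the product of X transposed with X is assumed to be invertible:

```latex
\hat{\theta} = \left(X^{\top} X\right)^{-1} X^{\top} Y,
\qquad
\theta = (\alpha, \beta_1, \ldots, \beta_n)^{\top}
```

Solving this directly involves an (n+1) × (n+1) matrix built from all examples, which is where the cost of the algebraic route comes from.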

Logistic regression

Sometimes though we need to predict not a continuous value, but a true-or-false one, which is where logistic regression enters the picture. It is a linear regression model the outcome of which is fed into the logistic function 𝝈(x) = 1/(1+exp(-x)), a sigmoid function that maps all real numbers to the 0-1 interval. This makes the output of a logistic regression interpretable as a probability: if we are trying to predict whether a person will buy a red or a green balloon, this could be the probability that they go for the red one.

𝝈(𝛃1X1[i] + 𝛃2X2[i] + … + 𝛃nXn[i] + 𝛂) = Y[i]* = prob(person[i] buys red balloon)

Here the usual sum of squared errors loss would not maximize the likelihood of 𝛃1, 𝛃2, …, 𝛃n, 𝛂 being the underlying model directly. The correct loss function we need to use to achieve that is called binary cross-entropy. (See Aaron Zhu’s article for a derivation of this.) The complexity of finding an optimal solution directly is increased by the logistic function, so, similarly to multiple regression, gradient descent can offer a faster and less resource-intensive alternative.

It will come as no surprise that with gradient descent, optimizing a logistic regression is equivalent to training a simple neural network, using components available in the arsenal of all relevant software packages. Actually, logistic regression represents a single layer of perceptrons, which in Keras can be modeled as a dense layer with a sigmoid activation. Training this model using the binary cross-entropy loss function gives us exactly what we want.
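The two ingredients named above, a dense layer with a sigmoid activation and the binary cross-entropy loss, are small enough to write out by hand. The following plain-Python sketch is only meant to show what they compute; the function names are made up for this illustration and the clipping constant is an arbitrary choice to avoid taking the logarithm of zero.

```python
import math


def sigmoid(z):
    """The logistic function, mapping any real number into the (0, 1) interval."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_unit(inputs, weights, bias):
    """A dense layer with a single output, followed by a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def binary_cross_entropy(labels, probabilities, eps=1e-12):
    """Mean negative log-likelihood of 0/1 labels under the predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


labels = [1, 0]
confident_and_right = binary_cross_entropy(labels, [0.9, 0.1])
confident_and_wrong = binary_cross_entropy(labels, [0.1, 0.9])
print(confident_and_right, confident_and_wrong)
```

The loss is small when the predicted probabilities agree with the labels and grows quickly when the model is confidently wrong, which is the behavior gradient descent exploits when training the logistic regression.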

An example

In other words, there is no need to write separate code or call separate libraries to try linear and logistic regressions, or full-fledged neural networks. Simply varying the neural network model allows us to try all three types of models quickly and easily.

To demonstrate this, we create a toy example. Given a person’s age, relationship status and number of children, we try to predict how many balloons they buy, and whether the balloons are red or green. (Or how many chairs they buy and whether they need a wide or narrow dining table.) We represent relationship status as a single value that is either -0.5 for partnered, or +0.5 for single.

In our toy example we choose an underlying model that is not entirely linear to see if full neural networks fare better than the linear models. We generate our inputs using uniform random values. We draw them from the [-0.5, +0.5] interval so that the inputs are already normalized: they have zero mean and share the same scale. Then we calculate:

# Partnered (-0.5) and single (+0.5) people follow different linear rules.
if relationship == -0.5:
    number_of_balloons = .1 * children - .2 * age
    balloon_color = 1 if (.8 * children + .2 * age > 0) else 0  # 1 = red, 0 = green
else:
    number_of_balloons = .8 * children + .5 * age
    balloon_color = 1 if (.5 * children + .5 * age > 0) else 0

We create four models: a multiple linear regression and a neural network with no sigmoid function at the end to predict the number of balloons; here we use a sum of squares loss. Then a logistic regression one and a neural network with a final sigmoid function to predict the color of the balloons; here we use binary cross-entropy as the loss.
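As a rough sketch of how the four models could be declared with Keras (this is not the author's code), the helper below first describes each model as a plain dictionary and then builds it. The labels "num_balloons", "color", "regression" and "neural" follow the output shown further down; the hidden layer sizes, the ReLU activation and the Adam optimizer are assumptions made for this illustration.

```python
def model_spec(output_type, model_type):
    """Describe one of the four models as plain data."""
    if output_type not in ("num_balloons", "color"):
        raise ValueError("unknown output type: %r" % (output_type,))
    if model_type not in ("regression", "neural"):
        raise ValueError("unknown model type: %r" % (model_type,))
    is_binary = output_type == "color"
    return {
        # A regression is a single dense layer: no hidden layers at all.
        # The two hidden layers of 16 units are a hypothetical choice.
        "hidden_units": [] if model_type == "regression" else [16, 16],
        # A final sigmoid turns the output into a probability.
        "final_activation": "sigmoid" if is_binary else None,
        # "mse" is the mean of squared errors; it has the same minimum as the sum.
        "loss": "binary_crossentropy" if is_binary else "mse",
    }


def build_model(output_type, model_type, n_inputs=3):
    """Turn a spec into a compiled Keras model (requires TensorFlow)."""
    # Imported here so that model_spec() stays usable without TensorFlow installed.
    from tensorflow import keras

    spec = model_spec(output_type, model_type)
    layers = [keras.Input(shape=(n_inputs,))]
    for units in spec["hidden_units"]:
        layers.append(keras.layers.Dense(units, activation="relu"))
    layers.append(keras.layers.Dense(1, activation=spec["final_activation"]))
    model = keras.Sequential(layers)
    model.compile(optimizer="adam", loss=spec["loss"])
    return model
```

With this layout, build_model("num_balloons", "regression") is the multiple linear regression and build_model("color", "regression") is the logistic regression; switching the second argument to "neural" only adds hidden layers and leaves the rest of the training code unchanged.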

See this gist for the code that trains these models and, in the case of the regression ones, also displays the weights of the dense layers (corresponding to 𝛃 and 𝛂).

One example run produced the following output – reproduced here with slight modifications for readability:

======= Output type: num_balloons Model type: regression
Epoch 1/1000
loss: 0.0526 - val_loss: 0.0205
Epoch 2/1000
loss: 0.0205 - val_loss: 0.0201
Epoch 3/1000
loss: 0.0204 - val_loss: 0.0207

Weights:
[<'dense/kernel:0' ([[-0.00324172], [0.43457505], [0.1423042]])>,
<'dense/bias:0' ([0.00032589])>]

======= Output type: num_balloons Model type: neural
Epoch 1/1000
loss: 0.0206 - val_loss: 1.1117e-04
Epoch 2/1000
loss: 1.2853e-04 - val_loss: 1.1776e-04

======= Output type: color Model type: regression
Epoch 1/1000
loss: 0.5130 - val_loss: 0.2387
Epoch 2/1000
loss: 0.2213 - val_loss: 0.2005
Epoch 3/1000
loss: 0.2009 - val_loss: 0.1935
Epoch 4/1000
loss: 0.1965 - val_loss: 0.1969

Weights:
[<'dense/kernel:0' ([[0.02176554], [14.170614], [8.668548]])>,
<'dense/bias:0' ([0.02175274])>]

======= Output type: color Model type: neural 
Epoch 1/1000
loss: 0.4589 - val_loss: 0.0651
Epoch 2/1000
loss: 0.0518 - val_loss: 0.0345
Epoch 3/1000
loss: 0.0271 - val_loss: 0.0199
Epoch 4/1000
loss: 0.0161 - val_loss: 0.0147
Epoch 5/1000
loss: 0.0119 - val_loss: 0.0098
Epoch 6/1000
loss: 0.0104 - val_loss: 0.0098

The first observation is that the neural models fared better in both cases than the regressions (0.0001178 validation loss vs. 0.0207; 0.0098 vs. 0.1969). As expected, they could model the non-linear relationships.

The weights returned by the regressions merit a bit more analysis and sanity checking. For the number of balloons, the multiple regression predicts

-0.003 relationship + 0.435 num_children + 0.142 age + 0.000 = num_of_balloons

Since the number of children and age both have a mean of zero, the mean of the number of balloons returned by the underlying model is also zero for both relationship statuses. This is reflected in that the factor for the relationship input and the bias are both practically zero.

The underlying model produces roughly the same number of examples for the two relationship statuses, so we expect the best linear model approximating their combination to lie midway between the two linear expressions in the underlying model. Indeed, this is what we find: the factor for the number of children is close to (0.1 + 0.8) / 2 = 0.45, and the factor for the age is close to (0.5 – 0.2) / 2 = 0.15.
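The "midway" argument can be checked on a stripped-down case. The sketch below uses a single input and two made-up slopes (0.2 and 0.6, chosen only for illustration and unrelated to the balloon model), gives both groups the same inputs, and fits one least-squares line to the pooled data.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov_xy / var_x


# The same evenly spaced inputs on [-0.5, 0.5] for both groups.
xs = [i / 10 - 0.5 for i in range(11)]

slope_a, slope_b = 0.2, 0.6  # illustrative slopes for the two groups

pooled_x = xs + xs
pooled_y = [slope_a * x for x in xs] + [slope_b * x for x in xs]

pooled_slope = ols_slope(pooled_x, pooled_y)
print(pooled_slope)  # close to (0.2 + 0.6) / 2 = 0.4
```

The result is the average of the two slopes because the groups are equally sized and share the same input distribution. With several inputs the same reasoning applies per factor as long as the inputs are independent of each other, as they are in the toy example.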

For the color of the balloons, the logistic regression gives the model

𝝈(0.0218 relationship + 14.171 num_children + 8.669 age + 0.0218) = prob(red)

The same symmetries apply as before, and again we find that the factor for the relationship and the bias are close to zero, especially if compared to the other two numbers. We again expect the linear model to be between the two linear expressions. For the number of children we expect (0.8 + 0.5) / 2 = 0.65, and for the age (0.2 + 0.5) / 2 = 0.35. However, due to the sigmoid function the model is only interested in whether the output is negative or positive, so there is an arbitrary scaling factor; taking this into account we see the factors do make sense: 14.171 / 8.669 = 1.635, which is close to 0.65 / 0.35 = 1.857.

In fact, the scaling factor is not arbitrary: the fact that the weights are large makes the input to the sigmoid function large in absolute terms, which forces its output to be very close to zero or one.
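A small sketch of this effect, using made-up values: four examples whose linear score already has the right sign are passed through the sigmoid after multiplying the score by 1, 10 and 100. The predicted classes never change, but the binary cross-entropy keeps shrinking, which is why training pushes the weights up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(labels, probabilities, eps=1e-12):
    """Mean negative log-likelihood, with clipping to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Made-up linear scores and labels: negative scores are class 0, positive are class 1.
scores = [-0.4, -0.1, 0.2, 0.5]
labels = [0, 0, 1, 1]

losses = {}
for scale in (1, 10, 100):
    probabilities = [sigmoid(scale * z) for z in scores]
    predicted = [1 if p > 0.5 else 0 for p in probabilities]
    assert predicted == labels  # scaling never flips a prediction
    losses[scale] = binary_cross_entropy(labels, probabilities)

print(losses)
```

This toy data is perfectly separable, so the loss would keep falling for ever larger scales. In the balloon example the two relationship statuses use different boundaries, so some examples end up on the wrong side of any single line and very large weights would be penalized there, which limits how far the weights grow.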

Conclusion

We have seen how neural networks are supersets of linear and logistic regressions, and how with existing software components used to build neural networks we can very easily implement regression models. Implementing regression models this way can make it very easy to upgrade them to a neural network if necessary.

Categories
Furniture Heaven

Nervous chair

One of the so-called “nervous chairs” by Grant Wilkinson and Teresa Rivera, designers based in London. #furniture #interior #design

Check out more wooden dining and living room chairs on our website.

Categories
Furniture Heaven

Furniture Heaven: The Ghost of a Chair

“The Ghost of a Chair,” Valentina Gonzalez Wohlers

Categories
Updates

Wardrobes on Furniture Ferret

We have just finished work to include wardrobes on Furniture Ferret! With the unique search based on measurements, and over 5000 wardrobes to choose from, no bedroom nook of any size need go without a beautiful wardrobe to add to your storage space.

Categories
Savings

Lifestyle Furniture Spring Sale

Lifestyle Furniture’s spring #sale runs until midnight on 9 May with up to 10% off all items, and a price match promise on all furniture to boot. They also offer 14-day money-back returns, free deliveries and 0% finance with no deposit.

Start the spring furniture sale!

Categories
Technology

Extracting text from web code

Here at Furniture Ferret most of what we do is extract descriptions of items of furniture from web pages. Even if we know where it is on the page, it is often not a trivial task to get the text from the HTML code web pages use: words and sentences can run into each other unless we do something clever.

This is why we created semantic text, a software library that is a drop-in replacement for BeautifulSoup’s get_text() in Python.

get_text() in BeautifulSoup simply concatenates strings among the descendants. This can create unexpected results when so-called block-level HTML elements are used, which are expected to semantically separate portions of the text.

For example, for the following HTML:

<ul><li><strong>V</strong>ery interesting</li><li>Thing it is</li></ul>

get_text() returns “Very interestingThing it is” instead of the expected “Very interesting Thing it is” as it disregards that <li> is a block-level element.

beautifulsoup_semantic_text.bs_semantic_text() overcomes this problem by adding a space in front of each block-level element. Note that the split into block-level and inline elements is a historical categorisation and is not defined for every HTML element; the distinction is still useful for approximating the expected presentation of the HTML in a string.
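To illustrate the idea without depending on BeautifulSoup, here is a minimal sketch built on Python's standard html.parser module. It is not the library's implementation: the class and function names, the short list of block-level tags and the final whitespace clean-up are all assumptions made for this example.

```python
from html.parser import HTMLParser

# A partial, hand-picked list of block-level tags (hypothetical, for this sketch only).
BLOCK_TAGS = {
    "blockquote", "br", "div", "h1", "h2", "h3", "h4", "h5", "h6",
    "li", "ol", "p", "pre", "table", "td", "th", "tr", "ul",
}


class BlockAwareText(HTMLParser):
    """Collect text, adding a space in front of every block-level element."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def block_aware_text(html):
    parser = BlockAwareText()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace and trim the ends.
    return " ".join("".join(parser.parts).split())


html = "<ul><li><strong>V</strong>ery interesting</li><li>Thing it is</li></ul>"
print(block_aware_text(html))  # prints "Very interesting Thing it is"
```

Because <strong> is not in the list, "V" and "ery" stay joined, while the two <li> elements are kept apart. Simply passing a separator to get_text() would not achieve this, since it would also split text at inline elements.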

Categories
Savings

Aosom bank holiday sale

Get 15% off from Aosom with a new voucher code in the UK and Ireland! Use Holiday15 until midnight on 3 May (UK time) to get the reward.

Start shopping on Aosom UK

Start shopping on Aosom IE