Neural network weights too large?



I'm implementing a neural network with backpropagation. The weights are initialized uniformly in (-0.5, 0.5). However, after the first time the inputs are fed forward and the errors are propagated back, the weights between the input layer and the hidden layer grow to around 1000, sometimes even 2000.



The topology of the network consists of 3 layers: 1 input layer, 1 hidden layer, and 1 output layer.
The input layer has 95 nodes, the hidden layer has 3 nodes, and the output layer has 2 nodes.
The training set has 40,000 entries, normalized to their z-scores.
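For concreteness, here is a minimal sketch of that setup in plain Python/NumPy (the uniform initialization and the 95-3-2 layer sizes are from the description above; the random matrix is only a placeholder for the real training data):

    import numpy as np

    rng = np.random.default_rng(0)

    # Uniform weight initialization in (-0.5, 0.5) for the 95-3-2 topology.
    W1 = rng.uniform(-0.5, 0.5, size=(95, 3))   # input layer  -> hidden layer
    W2 = rng.uniform(-0.5, 0.5, size=(3, 2))    # hidden layer -> output layer

    # Per-feature z-score normalization of the 40,000-entry training set.
    X = rng.random((40_000, 95))                # placeholder data
    X_norm = (X - X.mean(axis=0)) / X.std(axis=0)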



After seeing such high numbers I doubted my implementation. But then again, with the learning rate set to 1 on the first pass, if each entry contributes around (output * error) = 0.25 to the update, which seems reasonable, then a cumulative weight change of about 1000 is plausible.
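As a back-of-the-envelope check of that estimate (plain per-sample SGD with no batching is assumed; the 0.25 and 40,000 are the figures above):

    learning_rate = 1.0
    per_entry_update = 0.25      # the illustrative (output * error) value above
    n_entries = 40_000

    # Upper bound on the cumulative change to a single weight after one pass,
    # if every per-sample update happened to have the same sign.
    max_total_change = learning_rate * per_entry_update * n_entries
    print(max_total_change)      # 10000.0

In practice the per-sample updates partly cancel in sign, so a net change in the hundreds or low thousands, like the 1000-2000 I'm seeing, is well within this bound.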



Anyway, are weights in a neural network supposed to be this high?



Thanks.

tensorflow machine-learning neural-network

asked Mar 8 at 23:14 by lildoodilydo
  • What activation function are you using?
    – Cristy, Mar 8 at 23:16

  • @Cristy 1/(1+e^(-input))
    – lildoodilydo, Mar 8 at 23:20

  • What optimization algorithm are you using? For most of them, including basic stochastic gradient descent, a learning rate of 1 is much, much too large, and your model may never converge with such a rate. See e.g. medium.com/octavian-ai/… for some discussion.
    – Peteris, Mar 9 at 3:53

  • 1 is a pretty high learning rate. I suggest you try something smaller, say 0.1, and see what happens.
    – Ray Tayek, Mar 9 at 5:53
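Following the last two comments, a minimal sketch of the plain SGD step with the smaller step size suggested above (the 0.1 is the commenters' suggestion, not a tuned value):

    learning_rate = 0.1   # instead of 1.0

    def sgd_step(weight, gradient, lr=learning_rate):
        # Standard stochastic-gradient-descent update for a single weight.
        return weight - lr * gradient

Cutting the rate by a factor of 10 shrinks every weight change by the same factor, which directly addresses the runaway updates described in the question.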

















1 Answer
A value that high isn’t necessarily a bad thing. Weights can be very high, or very low. They can even be zero!



Let’s say that you have two classes: A & B



Inputs for class A are typically around 0.00001. Inputs for class B are mostly the same, but some are around 0.001.

The input to a node is w * x, so with a weight of 1000:

A) 0.00001 * 1000 = 0.01
B) 0.001 * 1000 = 1

When you feed a pre-activation like A into the sigmoid (your activation function), the output sits almost exactly at 0.5, the midpoint of its range, and carries essentially no class information. The signal dies.

But for a pre-activation like B, sigmoid(1) ≈ 0.73, which is clearly distinguishable from the class-A case. So the signal is propagated forward.
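To check those numbers, a quick sketch in plain Python (the weight of 1000 and the two input scales are the illustrative values above):

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    w = 1000.0
    x_a = 0.00001   # typical class-A input
    x_b = 0.001     # the distinguishing class-B input

    print(sigmoid(w * x_a))  # ~0.5025 -- stuck at the sigmoid's midpoint
    print(sigmoid(w * x_b))  # ~0.7311 -- clearly separated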



The values of your weights depend on many things:



  • the data

  • the problem being solved

  • your activation function choices

  • the number of neurons in each layer

  • the number of layers

  • the value of other weights!





answered Mar 8 at 23:48 by dijksterhuis
  • What if most of the weights have values that high (like 1000), and some are even higher (like 40000)? And what if some weights are negative (like -700)? I should have mentioned in my question that the weights fluctuate greatly. Is this normal behavior for weights in a network? Thanks.
    – lildoodilydo, Mar 9 at 0:03

  • Weights can be any real number. During training, backprop is trying to identify which signal pathways will minimise the output error. Here's a question for you: is the model giving you accurate results? What is your error rate?
    – dijksterhuis, Mar 9 at 12:37










