Neural network weights too large?
I'm implementing a neural network with backpropagation. The weights are initialized in (-0.5, 0.5). However, after the inputs are first sent forward and the errors are propagated back, the weights between the input layer and the hidden layer grow to around 1000, sometimes even 2000.
The network has 3 layers: an input layer with 95 nodes, a hidden layer with 3 nodes, and an output layer with 2 nodes.
The training data set has 40,000 entries, normalized with their z-scores.
After seeing such high numbers I doubted my implementation. But then again, with the learning rate set to 1, if each entry contributes around output * error = 0.25 on the first pass, which seems reasonable, then a weight change of about 1000 is plausible.
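To spell out that back-of-envelope estimate, here is a minimal sketch (the 0.25 per-entry figure is an assumption, as above):

    # Rough upper bound on first-epoch weight drift with learning rate 1.
    # The per-entry contribution of 0.25 is an assumed typical output * error.
    learning_rate = 1.0
    per_entry_delta = 0.25
    n_entries = 40_000

    # Worst case: every entry pushes the weight in the same direction.
    max_drift = learning_rate * per_entry_delta * n_entries
    print(max_drift)  # 10000.0 -- so reaching ~1000-2000 is well within range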
Anyway, are weights in a neural network supposed to be this high?
Thanks.
tensorflow machine-learning neural-network

asked Mar 8 at 23:14 by lildoodilydo
What activation function are you using?
– Cristy
Mar 8 at 23:16
@Cristy 1/(1+e^(-input))
– lildoodilydo
Mar 8 at 23:20
What optimization algorithm are you using? For most of them, including the basic stochastic gradient descent, a learning rate of 1 is much, much, much too large, and your model may never converge with such a learning rate. See e.g. medium.com/octavian-ai/… for some discussion.
– Peteris
Mar 9 at 3:53
1 is a pretty high learning rate. I suggest you try something smaller, say 0.1, and see what happens.
– Ray Tayek
Mar 9 at 5:53
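To make the learning-rate advice above concrete, here is a minimal sketch of a single-weight SGD update using the sigmoid from the comments; the data, targets, and squared-error loss are illustrative assumptions, not the asker's actual setup:

    import math
    import random

    def sigmoid(z):
        # The activation from the comments: 1 / (1 + e^(-input)).
        return 1.0 / (1.0 + math.exp(-z))

    random.seed(0)
    # Hypothetical z-scored inputs paired with arbitrary 0/1 targets.
    data = [(random.gauss(0.0, 1.0), float(random.getrandbits(1)))
            for _ in range(40_000)]

    for lr in (1.0, 0.1, 0.01):
        w = random.uniform(-0.5, 0.5)  # same init range as in the question
        for x, target in data:
            out = sigmoid(w * x)
            # Gradient of 0.5 * (out - target)^2 with respect to w.
            grad = (out - target) * out * (1.0 - out) * x
            w -= lr * grad
        print(f"lr={lr}: final weight {w:+.4f}")

Each tenfold cut in the learning rate makes every individual step tenfold smaller, which is what keeps the weights from swinging wildly.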
1 Answer
A value that high isn’t necessarily a bad thing. Weights can be very high, or very low. They can even be zero!
Let’s say that you have two classes, A and B. Inputs for class A are typically around 0.00001. Inputs for class B are mostly the same, but some are around 0.001.
The input to a node is w * x. With a weight w = 1000:
A) 0.00001 * 1000 = 0.01
B) 0.001 * 1000 = 1
Feed a value like A into the sigmoid (your activation function) and you get essentially the midpoint output, sigmoid(0.01) ≈ 0.50: the activation barely moves, so the signal effectively dies. But a value like B gives a noticeably larger output, sigmoid(1) ≈ 0.73, so the signal is propagated forward.
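A quick numeric check of the two cases, using the figures above:

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    # Case A: the tiny input barely moves the activation off its midpoint.
    print(sigmoid(0.00001 * 1000))  # sigmoid(0.01) ~ 0.5025
    # Case B: the larger input produces a clearly distinguishable output.
    print(sigmoid(0.001 * 1000))    # sigmoid(1.0)  ~ 0.7311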
The values of your weights depend on many things:
- the data
- the problem being solved
- your activation function choices
- the number of neurons in each layer
- the number of layers
- the value of other weights!

answered Mar 8 at 23:48 by dijksterhuis
What if most of the weights are that high (like 1000) and some are even higher (like 40,000)? And what if some weights are negative (like -700)? I should have mentioned in my question that the weights fluctuate greatly. I'm wondering if this is normal behavior for weights in a network. Thanks.
– lildoodilydo
Mar 9 at 0:03
Weights can be any real number. Backprop, during training, is trying to identify which signal pathways will minimise the output error. Here’s a question for you: is the model giving you accurate results? What is your error rate?
– dijksterhuis
Mar 9 at 12:37