Neural network weights too large?



I'm implementing a neural network with backpropagation. The weights are initialized uniformly in (-0.5, 0.5). However, after the first time the inputs are fed forward and the errors are propagated back, the weights between the input layer and the hidden layer grow to around 1000, sometimes even 2000.



The topology of the network consists of 3 layers: 1 input layer, 1 hidden layer, and 1 output layer.
The input layer has 95 nodes, the hidden layer has 3 nodes, and the output layer has 2 nodes.
The training set has 40,000 entries, normalized to their z-scores.
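For concreteness, here is a minimal sketch of that setup in plain Python/NumPy (the uniform initialization and the 95-3-2 layer sizes are from the description above; the random matrix is only a placeholder for the real training data):

    import numpy as np

    rng = np.random.default_rng(0)

    # Uniform weight initialization in (-0.5, 0.5) for the 95-3-2 topology.
    W1 = rng.uniform(-0.5, 0.5, size=(95, 3))   # input layer  -> hidden layer
    W2 = rng.uniform(-0.5, 0.5, size=(3, 2))    # hidden layer -> output layer

    # Per-feature z-score normalization of the 40,000-entry training set.
    X = rng.random((40_000, 95))                # placeholder data
    X_norm = (X - X.mean(axis=0)) / X.std(axis=0)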



After seeing such high numbers I doubted my implementation. But then again, with the learning rate set to 1 on the first pass, if each entry contributes around (output * error) = 0.25 to the update, which seems reasonable, then a cumulative weight change of about 1000 is plausible.
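As a back-of-the-envelope check of that estimate (plain per-sample SGD with no batching is assumed; the 0.25 and 40,000 are the figures above):

    learning_rate = 1.0
    per_entry_update = 0.25      # the illustrative (output * error) value above
    n_entries = 40_000

    # Upper bound on the cumulative change to a single weight after one pass,
    # if every per-sample update happened to have the same sign.
    max_total_change = learning_rate * per_entry_update * n_entries
    print(max_total_change)      # 10000.0

In practice the per-sample updates partly cancel in sign, so a net change in the hundreds or low thousands, like the 1000-2000 I'm seeing, is well within this bound.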



Anyway, are weights in a neural network supposed to be this high?



Thanks.

tensorflow machine-learning neural-network

asked Mar 8 at 23:14 by lildoodilydo
  • What activation function are you using?
    – Cristy, Mar 8 at 23:16

  • @Cristy 1/(1+e^(-input))
    – lildoodilydo, Mar 8 at 23:20

  • What optimization algorithm are you using? For most of them, including basic stochastic gradient descent, a learning rate of 1 is much, much too large, and your model may never converge with such a rate. See e.g. medium.com/octavian-ai/… for some discussion.
    – Peteris, Mar 9 at 3:53

  • 1 is a pretty high learning rate. I suggest you try something smaller, say 0.1, and see what happens.
    – Ray Tayek, Mar 9 at 5:53
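Following the last two comments, a minimal sketch of the plain SGD step with the smaller step size suggested above (the 0.1 is the commenters' suggestion, not a tuned value):

    learning_rate = 0.1   # instead of 1.0

    def sgd_step(weight, gradient, lr=learning_rate):
        # Standard stochastic-gradient-descent update for a single weight.
        return weight - lr * gradient

Cutting the rate by a factor of 10 shrinks every weight change by the same factor, which directly addresses the runaway updates described in the question.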

















1 Answer
A value that high isn’t necessarily a bad thing. Weights can be very high, or very low. They can even be zero!



Let’s say that you have two classes: A & B



Inputs for class A are typically around 0.00001. Inputs for class B are mostly the same, but some are around 0.001.

The input to a node is w * x, so with a weight of 1000:

A) 0.00001 * 1000 = 0.01
B) 0.001 * 1000 = 1

When you feed a pre-activation like A into the sigmoid (your activation function), the output sits almost exactly at 0.5, the midpoint of its range, and carries essentially no class information. The signal dies.

But for a pre-activation like B, sigmoid(1) ≈ 0.73, which is clearly distinguishable from the class-A case. So the signal is propagated forward.
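To check those numbers, a quick sketch in plain Python (the weight of 1000 and the two input scales are the illustrative values above):

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    w = 1000.0
    x_a = 0.00001   # typical class-A input
    x_b = 0.001     # the distinguishing class-B input

    print(sigmoid(w * x_a))  # ~0.5025 -- stuck at the sigmoid's midpoint
    print(sigmoid(w * x_b))  # ~0.7311 -- clearly separated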



The values of your weights depend on many things:



  • the data

  • the problem being solved

  • your activation function choices

  • the number of neurons in each layer

  • the number of layers

  • the value of other weights!





answered Mar 8 at 23:48 by dijksterhuis
  • What if most of the weights have values that high (like 1000), and some are even higher (like 40000)? And what if some weights are negative (like -700)? I should have mentioned in my question that the weights fluctuate greatly. Is this normal behavior for weights in a network? Thanks.
    – lildoodilydo, Mar 9 at 0:03

  • Weights can be any real number. During training, backprop is trying to identify which signal pathways will minimise the output error. Here's a question for you: is the model giving you accurate results? What is your error rate?
    – dijksterhuis, Mar 9 at 12:37










