Masked language model processing, deeper explanation



I'm looking at the BERT model (you can find the description here) in detail, and I'm having trouble understanding clearly why, for the masked language model, the selected word is kept unchanged or replaced with a random word 20% of the time, instead of always being replaced with the [MASK] token.

We pre-train with the bidirectional technique, and the article explains that the "[MASK] token is never seen during fine-tuning". But these are two separate steps to me: first we pre-train bidirectionally, and afterwards we fine-tune on the downstream task.

Can someone explain where my understanding goes wrong?
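
For reference, my understanding of the masking rule from the paper: 15% of the token positions are selected for prediction; of those, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged. A minimal sketch of that rule (the sentence and vocabulary are made up for illustration):

```python
import random

MASK = "[MASK]"

def bert_mask(tokens, vocab, select_prob=0.15):
    """BERT masking rule: select ~15% of positions; of those,
    80% -> [MASK], 10% -> a random token, 10% -> left unchanged.
    The MLM loss is computed only at the selected positions."""
    corrupted = list(tokens)
    targets = []  # (position, original token) pairs the loss is computed on
    for i, tok in enumerate(tokens):
        if random.random() >= select_prob:
            continue                             # not selected: no loss here
        targets.append((i, tok))
        r = random.random()
        if r < 0.8:
            corrupted[i] = MASK                  # 80%: replace with [MASK]
        elif r < 0.9:
            corrupted[i] = random.choice(vocab)  # 10%: random token
        # else: 10% of the time the token is kept unchanged
    return corrupted, targets

corrupted, targets = bert_mask("the cat sat on the mat".split(),
                               vocab=["the", "cat", "dog", "sat", "on", "mat"])
```

What puzzles me is the 10% kept-unchanged case: the position still contributes to the loss even though the input token is already the correct answer.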










      nlp stanford-nlp






      asked Mar 8 at 15:09









Jonor






















          1 Answer
If you don't use random replacement during training, your network won't learn to extract useful features from the non-masked tokens.

In other words, if you only mask tokens and try to predict them, extracting good features for the non-masked tokens would be a waste of resources for your network (remember that your network is only as good as its task, and it will try to find the easiest way to solve that task).
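
To make the "loss only for some words" point concrete, here is a minimal PyTorch-style sketch (the shapes, token ids, and label values are illustrative, not the actual BERT code). Positions that were not selected get the label -100, which nn.CrossEntropyLoss ignores by default, so gradients flow only through the selected positions:

```python
import torch
import torch.nn as nn

# Illustrative shapes: batch of 2 sequences, length 6, vocabulary of 100
logits = torch.randn(2, 6, 100)                      # model output per position
labels = torch.full((2, 6), -100, dtype=torch.long)  # -100 = not selected
labels[0, 2] = 17   # a [MASK]ed position: must recover token id 17
labels[1, 4] = 42   # a kept-unchanged position: must "predict" the token it sees

# ignore_index=-100 is the default, so only the two selected positions
# above contribute to the masked-language-model loss.
loss = nn.CrossEntropyLoss()(logits.view(-1, 100), labels.view(-1))
```

Because the encoder never knows which positions will be scored, it has to build a useful contextual representation of every input token, masked or not.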






answered Mar 10 at 20:56 by Separius
• Thanks, but in that case I don't see the point of leaving the sentence unchanged, rather than just using random words sometimes.
  – Jonor, Mar 11 at 10:17






• If you always replaced the selected words, then your network would always try to guess something other than the word it is given (remember that the loss is calculated only for some of the words), and that would be harmful and destructive to training.
  – Separius, Mar 11 at 10:41
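
In short, each case of the 80/10/10 rule plays a role: [MASK] teaches prediction from context alone, random replacement forces the model not to blindly trust the input token, and the kept-unchanged case teaches it that the observed token can itself be the correct answer. The paper makes a similar point: since the encoder does not know which tokens have been masked or replaced, it is forced to keep a distributional contextual representation of every input token, which is exactly what fine-tuning (where no [MASK] appears) relies on.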










