Multiple CPU producers with few GPUs do not utilize 100% of the GPUs (PyTorch)
I am trying to implement parallel self-play data generation for a board game, using multiple CPUs to run self-play concurrently. In the parent process I create 4 NN models for 30 CPUs (3 self-play models that each serve 10 CPUs, plus 1 model to train), each on a different GPU. (The model is a 20-block ResNet-like architecture with batchnorm.) Pseudocode follows:
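nnet = NN(gpu_num=0)   # main model, the one that gets trained
nnet1 = NN(gpu_num=1)  # self-play models, one per remaining GPU
nnet2 = NN(gpu_num=2)
nnet3 = NN(gpu_num=3)
for i in range(num_iteration):
    # copy the latest weights from the main model into the self-play models
    nnet1.load_state_dict(nnet.state_dict())
    nnet2.load_state_dict(nnet.state_dict())
    nnet3.load_state_dict(nnet.state_dict())
    samples = parallel_self_play()
    nnet.train(samples)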
parallel_self_play() is implemented as follows:
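pool = mp.Pool(processes=num_cpu)  # 30 worker processes
results = []  # created once, before the loop, so earlier handles are kept
for i in range(self.args.numEps):
    # round-robin over the three self-play models
    if i % 3 == 0:
        net = self.nnet1
    elif i % 3 == 1:
        net = self.nnet2
    else:
        net = self.nnet3
    # args must be a tuple, hence the trailing comma
    results.append(pool.apply_async(AsyncSelfPlay, args=(net,)))
# get results from results array then return it
return results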
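The dispatch-and-collect pattern described above can be sketched in a self-contained form. This is only an illustration under stated assumptions: `fake_self_play` and `dispatch_round_robin` are hypothetical names standing in for `AsyncSelfPlay` and `parallel_self_play()`, and the sketch passes a small model identifier to each task instead of the network object, purely to keep the example runnable without a GPU.

```python
import multiprocessing as mp


def fake_self_play(model_id, episode):
    """Hypothetical stand-in for AsyncSelfPlay: returns a tagged sample."""
    return (model_id, episode)


def dispatch_round_robin(pool, num_episodes, model_ids):
    """Submit one task per episode, cycling through model_ids, then gather."""
    handles = []  # created once, before the loop
    for i in range(num_episodes):
        model_id = model_ids[i % len(model_ids)]
        # args is a tuple; a one-element tuple would need a trailing comma
        handles.append(pool.apply_async(fake_self_play, args=(model_id, i)))
    # .get() blocks until each task finishes and re-raises worker exceptions
    return [h.get() for h in handles]


if __name__ == "__main__":
    with mp.Pool(processes=2) as pool:
        print(dispatch_round_robin(pool, 6, [1, 2, 3]))
```

Calling `.get()` on every handle is one way to do the "get results from results array" step. It also surfaces any exception raised inside a worker, which would otherwise stay hidden in the `AsyncResult` object.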
My code works fine, with almost 100% GPU utilization throughout the first self-play (less than 10 minutes per iteration). But after the first iteration (training), once I load the new weights into nnet1-3, GPU utilization never reaches 80% again (~30 minutes to 1 hour per iteration). I noticed a few things while experimenting with my code:
The model includes batchnorm layers. Switching the model to train() mode, training, and then switching back to eval() causes self-play (which uses the model's forward pass) to not use the GPU at all.
If I don't switch from eval() to train() (that is, I train in eval mode), GPU utilization is lower (30-50%) but not entirely gone.
If the self-play models don't load the weights from the main model, self-play still utilizes 100% of the GPU, so my guess is that something happens during training that changes some state in the model.
This also happens with a setup of only 8 CPUs and 1 GPU, training the model on the fly (no intermediate model).
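Since the observations above revolve around train()/eval() switching and weight copying, a small sanity check of the state involved may help narrow things down. The sketch below is not a diagnosis of the slowdown. It only shows, on a tiny hypothetical stand-in network (`make_net`, run on CPU), that `load_state_dict` copies parameters and batchnorm buffers but not the train/eval flag, and that a forward pass in eval() mode under `torch.no_grad()` leaves the running statistics untouched.

```python
import torch
import torch.nn as nn


def make_net():
    # Tiny stand-in for the 20-block ResNet-like model (hypothetical)
    return nn.Sequential(
        nn.Conv2d(1, 4, 3, padding=1), nn.BatchNorm2d(4), nn.ReLU()
    )


trainer, player = make_net(), make_net()

# One forward pass in train() mode updates the BatchNorm running statistics.
trainer.train()
trainer(torch.randn(8, 1, 5, 5))

# Copies weights and buffers; the train/eval flag is not part of the state dict.
player.load_state_dict(trainer.state_dict())
player.eval()

bn = player[1]
before = bn.running_mean.clone()
with torch.no_grad():  # inference only: no autograd graph is built
    out = player(torch.randn(8, 1, 5, 5))

mode_ok = not player.training                        # module is in eval mode
stats_frozen = torch.equal(before, bn.running_mean)  # buffers unchanged
needs_grad = out.requires_grad                       # False under no_grad
print(mode_ok, stats_frozen, needs_grad)
```

Running the same three checks on the real self-play models right after each weight sync would confirm whether they are actually in eval mode and running without autograd bookkeeping before looking elsewhere for the cause.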
Can someone point out what I should fix in my code, or how I should train my model?
python parallel-processing deep-learning pytorch reinforcement-learning
asked Mar 8 at 8:31 by 51616, edited Mar 8 at 8:42