How can I use Puppeteer as a cache warmer?
I have a long txt file with ~1000 URLs that need to be visited to warm the Varnish cache.
I need Puppeteer because important content on these pages is loaded by AJAX calls.
This is my first attempt, and I'm no master in Node.
The real issue is that it pushes the CPU to 100% load and starts far too many browser processes at once.
const puppeteer = require('puppeteer');
const readline = require('readline');
const fs = require('fs');

const rl = readline.createInterface({
  input: fs.createReadStream('varnish-warmer.txt')
});

rl.on('line', (line) => {
  (async () => {
    if (line !== '') {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto(line);
      await page.waitFor(1000);
      browser.close();
    }
  })();
});
javascript node.js web-scraping puppeteer
asked Mar 8 at 5:59 by ramlev
This code launches 1000 browser instances in parallel. What Node.js version do you use? And would it be appropriate if you open these pages in series, one by one?
– vsemozhetbyt
Mar 8 at 6:22
1 Answer
As you already noticed, your code launches all browsers in parallel, which overloads your system. You could either visit each URL one after another (option 1) or use a pool of browsers to speed the process up (option 2).
Option 1
Launches one browser and visits all pages one after another:
const puppeteer = require('puppeteer');
const fs = require('fs');

const lines = fs.readFileSync('varnish-warmer.txt').toString().split('\n');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  for (const line of lines) {
    await page.goto(line);
    await page.waitFor(1000);
  }
  await browser.close();
})();
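Since the asker's file apparently contains blank lines (the original script guarded with `if (line != '')`), here is a small variation of option 1, not part of the original answer, that skips blank lines and keeps going when a single URL fails to load:

const puppeteer = require('puppeteer');
const fs = require('fs');

// Same varnish-warmer.txt file, one URL per line; blank lines are skipped,
// mirroring the original script's `if (line != '')` check.
const lines = fs.readFileSync('varnish-warmer.txt').toString().split('\n')
  .map(line => line.trim())
  .filter(line => line !== '');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  for (const line of lines) {
    try {
      await page.goto(line);
      await page.waitFor(1000);
    } catch (err) {
      // Log the failure and continue warming the remaining URLs.
      console.error(`Failed to warm ${line}: ${err.message}`);
    }
  }
  await browser.close();
})();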
Option 2
As option 1 might take a while for 1000 URLs, you might want to use a pool of browsers to visit the pages in parallel and speed things up. You can use puppeteer-cluster for that (disclaimer: I'm the author of the library).
const { Cluster } = require('puppeteer-cluster');
const fs = require('fs');

(async () => {
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_BROWSER,
    maxConcurrency: 10, // how many URLs should be visited in parallel
    // monitor: true, // uncomment to see information about progress
  });

  // Define the task for each URL
  await cluster.task(async ({ page, data: url }) => {
    await page.goto(url);
    await page.waitFor(1000);
  });

  // Queue the URLs
  const lines = fs.readFileSync('varnish-warmer.txt').toString().split('\n');
  lines.forEach(line => cluster.queue(line));

  // Wait for the tasks to finish and close the cluster after that
  await cluster.idle();
  await cluster.close();
})();
You can play around with the value of maxConcurrency to change the number of workers depending on the capabilities (CPU/memory) of your system.
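If you prefer not to hard-code that value, one possibility (an addition here, not from the original answer) is to derive maxConcurrency from the number of CPU cores reported by Node's built-in os module; the "cores minus one" heuristic below is just an assumption, not a recommendation from the answer:

const os = require('os');
const fs = require('fs');
const { Cluster } = require('puppeteer-cluster');

(async () => {
  // Assumed heuristic: leave one core free for the rest of the system.
  const maxConcurrency = Math.max(1, os.cpus().length - 1);

  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_BROWSER,
    maxConcurrency,
  });

  // Same task as in option 2: load the page and wait a second.
  await cluster.task(async ({ page, data: url }) => {
    await page.goto(url);
    await page.waitFor(1000);
  });

  // Queue non-empty lines from the same varnish-warmer.txt file.
  fs.readFileSync('varnish-warmer.txt').toString().split('\n')
    .filter(url => url.trim() !== '')
    .forEach(url => cluster.queue(url));

  await cluster.idle();
  await cluster.close();
})();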
answered Mar 8 at 16:28 by Thomas Dondorf
Thank you, I'm going with option 2; puppeteer-cluster works like a charm here.
– ramlev
Mar 11 at 7:09