

How can I use Puppeteer as a cache warmer?


I have a long txt file with ~1000 URLs that need to be visited in order to warm the Varnish cache.

The reason I need Puppeteer is that important content on the pages is loaded by AJAX calls.

This is my first attempt, but I am no master in Node. The real issue is that it puts the machine under 100% load and starts far too many threads.



const puppeteer = require('puppeteer');
const readline = require('readline');
const fs = require('fs');

const rl = readline.createInterface({
  input: fs.createReadStream('varnish-warmer.txt')
});

rl.on('line', (line) => {
  (async () => {
    if (line != '') {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto(line);
      await page.waitFor(1000);
      browser.close();
    }
  })();
});









javascript node.js web-scraping puppeteer














edited Mar 8 at 6:53 by Tiw

asked Mar 8 at 5:59 by ramlev












  • This code launches 1000 browser instances in parallel. What Node.js version do you use? And would it be appropriate to open these pages in series, one by one?

    – vsemozhetbyt
    Mar 8 at 6:22





























1 Answer




As you already noticed, your code launches all browsers in parallel, which overloads your system. You could either visit each URL one after another (option 1) or use a pool of browsers to speed the process up (option 2).



Option 1



Launches one browser and visits all pages one after another:



const puppeteer = require('puppeteer');
const fs = require('fs');

const lines = fs.readFileSync('varnish-warmer.txt').toString().split('\n');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  for (const line of lines) {
    await page.goto(line);
    await page.waitFor(1000);
  }
  await browser.close();
})();
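
Since the question mentions that important content is loaded by AJAX, one variation of option 1 is to let Puppeteer wait until the network goes idle instead of using a fixed 1000 ms delay. This is only a sketch under the assumption that the AJAX requests finish and leave no long-polling connections open:

const puppeteer = require('puppeteer');
const fs = require('fs');

// Skip empty lines so blank rows in the file don't trigger navigation errors
const lines = fs.readFileSync('varnish-warmer.txt').toString().split('\n').filter(Boolean);

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  for (const line of lines) {
    // 'networkidle0' resolves once there have been no network connections for 500 ms
    await page.goto(line, { waitUntil: 'networkidle0', timeout: 60000 });
  }
  await browser.close();
})();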


Option 2



As option 1 might take a while for 1000 URLs, you might want to use a pool of browsers to visit the pages in parallel and speed things up. You can use puppeteer-cluster for that (disclaimer: I'm the author of the library).



const { Cluster } = require('puppeteer-cluster');
const fs = require('fs');

(async () => {
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_BROWSER,
    maxConcurrency: 10, // how many URLs should be visited in parallel
    // monitor: true, // uncomment to see information about progress
  });

  // Define the task for each URL
  await cluster.task(async ({ page, data: url }) => {
    await page.goto(url);
    await page.waitFor(1000);
  });

  // Queue the URLs
  const lines = fs.readFileSync('varnish-warmer.txt').toString().split('\n');
  lines.forEach(line => cluster.queue(line));

  // Wait for the tasks to finish and close the cluster after that
  await cluster.idle();
  await cluster.close();
})();


You can play around with the value of maxConcurrency to change the number of workers depending on the capabilities (CPU/memory) of your system.
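
As a rough heuristic (an assumption on my part, not something prescribed by puppeteer-cluster), one starting point is to derive maxConcurrency from the number of CPU cores and then adjust based on observed memory usage:

const os = require('os');

// Heuristic starting point: one browser per CPU core, capped at 10.
// Chromium instances use a lot of memory, so lower this if the machine starts swapping.
const maxConcurrency = Math.min(os.cpus().length, 10);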






answered Mar 8 at 16:28 by Thomas Dondorf







  • Thank you, I'm going with option 2; puppeteer-cluster works like a charm here.

    – ramlev
    Mar 11 at 7:09











