Why is JSoup timing out at random places in my code?

I am currently trying to use JSoup in Java to scrape retrosheet.org for a baseball coding project I am working on.

I perform multiple JSoup connections in my code, and some of these connections are made in a loop (and are therefore executed many times). In total, my program makes hundreds of connections to scrape the necessary data.

The program works for ~5 seconds but then hangs on a connection (a different one each time). When I then try to access the website separately in my browser, the website will not load. What could be causing this? Is there an issue with performing too many connections?



Here is an example of a connection I am performing (all connections follow this same format).



doc = Jsoup.connect("https://www.retrosheet.org/boxesetc/index.html").maxBodySize(0).userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0.3 Safari/605.1.15").get();


This is the error I am getting










      java web-scraping connection timeout jsoup






      asked Mar 8 at 23:33 (edited Mar 8 at 23:43) by Jacob Snyder






















          1 Answer
          This is almost certainly load protection on the target website's side: it detects too many requests from the same IP and either blocks that IP for a while or throttles the number of connections/requests from it. That is also why you can't open the website in your browser. The problem isn't JSoup or Java at all; connections/requests from your IP to the target website are being blocked or throttled.







          answered Mar 8 at 23:50 by mvmn












          • Is there a way around this? Thank you for the answer.

            – Jacob Snyder
            Mar 9 at 0:00











          • Well, you could throttle your requests, e.g. by inserting delays in the code that makes them. You could also implement retries (optionally with a delay between retries as well). The number of connections you create might also be a problem: JSoup will probably not reuse connections, but Commons HttpClient with a pooling connection manager will. You could retrieve the HTML via Commons HttpClient and then use JSoup for parsing only (not using its HTTP client capabilities). Best of all, do all of this (delays + retries + Commons HttpClient for retrieval).

            – mvmn
            Mar 9 at 0:04
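
The delay and retry suggestions in the comment above could be sketched roughly as follows. This is only an illustration under assumptions: `PoliteFetcher`, its parameters, and the linear back-off are hypothetical choices, not part of jsoup, and suitable delay values depend on what the target site tolerates.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/**
 * Hypothetical helper (not part of jsoup): pauses before every request and
 * retries requests that fail with an I/O error such as a timeout.
 *
 * Possible use with jsoup, assuming jsoup is on the classpath:
 *   Document doc = fetcher.fetch(() -> Jsoup.connect(url).userAgent(ua).get());
 */
final class PoliteFetcher {
    private final long delayMillis; // base pause before each attempt
    private final int maxAttempts;  // total tries per request, including the first

    PoliteFetcher(long delayMillis, int maxAttempts) {
        if (delayMillis < 0 || maxAttempts < 1) {
            throw new IllegalArgumentException("delayMillis must be >= 0 and maxAttempts >= 1");
        }
        this.delayMillis = delayMillis;
        this.maxAttempts = maxAttempts;
    }

    /** Runs the request, sleeping before each attempt and waiting longer after each failure. */
    <T> T fetch(Callable<T> request) throws Exception {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Throttle: the pause grows linearly with the attempt number.
            Thread.sleep(delayMillis * attempt);
            try {
                return request.call();
            } catch (IOException e) {
                last = e; // remember the I/O failure and try again
            }
        }
        throw last; // every attempt failed with an I/O error
    }
}
```

Only `IOException` is retried here, since timeouts and connection failures surface as I/O errors; any other exception propagates immediately. Calling every request in the scraping loop through one such helper keeps the overall request rate bounded.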











          • Here's the method to parse a String as HTML via JSoup (the base URL parameter lets JSoup resolve relative URLs into absolute ones, BTW): jsoup.org/apidocs/org/jsoup/…

            – mvmn
            Mar 9 at 0:06
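
Fetching the HTML with a separate, connection-reusing client and leaving only the parsing to JSoup might look something like the sketch below. Note the substitution: instead of Commons HttpClient it uses the JDK's built-in `java.net.http.HttpClient` (Java 11+), which keeps connections alive when one client instance is shared. The class name `PageFetcher` and the timeout values are assumptions made for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical helper: one shared HTTP client for every request; jsoup is used only for parsing. */
final class PageFetcher {
    // A single shared client lets the JDK keep connections open and reuse them.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))        // assumed value
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    /** Builds a GET request with a per-request timeout and the caller's user agent. */
    static HttpRequest buildRequest(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(20))           // assumed value
                .header("User-Agent", userAgent)
                .GET()
                .build();
    }

    /** Downloads the page body as a String; throws on I/O failure or a non-200 status. */
    static String fetchHtml(String url, String userAgent) throws IOException, InterruptedException {
        HttpResponse<String> response =
                CLIENT.send(buildRequest(url, userAgent), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode() + " for " + url);
        }
        return response.body();
    }
}
```

The returned string could then be handed to `Jsoup.parse(html, url)`, with the page URL as the base URI so that relative links resolve correctly. Reusing connections reduces the number of connections opened, but it does not lift a server-side rate limit, so delays between requests are still needed.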












          • P.S. If my answer addresses your problem, would you mind upvoting it and/or marking it as the accepted answer? Thanks!

            – mvmn
            Mar 9 at 11:41
















