Scrape main content using PHP



















I am building an import tool like the medium.com story import tool. So far I have used this code:



include('includes/import/simple_html_dom.php');
// get DOM from URL or file
$html = file_get_html('https://neilpatel.com/blog/starting-over/');

// find all links
foreach($html->find('a') as $e)
    echo $e->href . '<br>';

// find all images
foreach($html->find('img') as $e)
    echo $e->src . '<br>';

// find all images with the full tag
foreach($html->find('img') as $e)
    echo $e->outertext . '<br>';

// find all div tags with id=gbar
foreach($html->find('div#gbar') as $e)
    echo $e->innertext . '<br>';

// find all span tags with class=gb1
foreach($html->find('span.gb1') as $e)
    echo $e->outertext . '<br>';

// find all td tags with attribute align=center
foreach($html->find('td[align=center]') as $e)
    echo $e->innertext . '<br>';

// extract text from table
echo $html->find('td[align="center"]', 1)->plaintext.'<br><hr>';

// extract text from HTML
echo $html->plaintext;


But this scrapes the whole page. Is it possible to find and scrape only the main content, the way the Medium import tool does for any link?

How can I achieve this kind of result?


































  • Please tell us what you have tried so far to solve the problem.

    – Arikael
    Mar 8 at 13:39











  • The main issue is probably how you recognise the main content. If you can define how to identify it, that would help.

    – Nigel Ren
    Mar 8 at 13:41











  • I have tried the above code and got the whole page. I just want the main content, from where the main article starts to where it ends.

    – donm
    Mar 8 at 13:41











  • @NigelRen Yes, you are right, but we want to create a general tool for every URL. So how do I identify where the main article starts and ends, keeping only the text content of the article?

    – donm
    Mar 8 at 13:43











  • @NigelRen I hope you got my point: every URL's content and tags are different, so how can I identify where the article content starts and ends?

    – donm
    Mar 8 at 13:44
























javascript php jquery html regex














edited Mar 8 at 14:12







donm

















asked Mar 8 at 13:32









donm
























1 Answer














I'm not completely sure what you are asking or trying to do, but I'll give it a try.

You are trying to identify the main content area so that you scrape only the needed information, without any garbage or unneeded content.

My approach is to rely on the common structures and good practices of well-formatted HTML pages. Consider this:



  • The main article is usually wrapped in a single ARTICLE tag on the page.

  • The H1 tag inside the article is usually its headline.

  • Some IDs recur across many sites, such as main_content, main_article, etc.

Summarize those rules for your targets and build a list of identifiers sorted by priority. Then parse the target page, trying each identifier in turn until one is found, which indicates that you have identified the main content area.
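The priority list described above can be sketched as a small reusable function. This is a minimal sketch, not the code from this answer: the function name findMainContainer, the XPath expressions, and the inline sample page are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: returns the first element matched by a prioritized
// list of XPath expressions, or null when nothing matches.
function findMainContainer(DOMDocument $dom, array $queries): ?DOMElement
{
    $xpath = new DOMXPath($dom);
    foreach ($queries as $query) {
        $nodes = $xpath->query($query);
        if ($nodes !== false && $nodes->length > 0) {
            $node = $nodes->item(0);
            if ($node instanceof DOMElement) {
                return $node;
            }
        }
    }
    return null;
}

// Small inline page (made up) so the sketch runs without network access.
$page = '<html><body><div id="sidebar"><p>Links</p></div>'
      . '<article><h1>Sample title</h1><p>Body text.</p></article>'
      . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($page);
libxml_clear_errors();

// Highest priority first: known IDs, then the generic ARTICLE tag.
$queries = [
    '//*[@id="main_content"]',
    '//*[@id="main_article"]',
    '//*[@id="main"]',
    '//article',
];

$main  = findMainContainer($dom, $queries);
$h1    = $main ? $main->getElementsByTagName('h1')->item(0) : null;
$title = $h1 ? $h1->textContent : null;
echo $title, "\n";
```

Writing every identifier as an XPath query keeps IDs and tag names in one list, so adding a new rule is a one-line change.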



Here is an example using the URL you provided:



$search_logic = [
    "#main_content",
    "#main_article",
    "#main",
    "article",
];

// Fetch the page and parse it; @ silences warnings about malformed HTML
$html = file_get_contents('https://neilpatel.com/blog/starting-over/');
$dom = new DOMDocument();
@$dom->loadHTML($html);

// Try each identifier in priority order
foreach ($search_logic as $logic) {

    $main_container = null;

    // Search by ID or by tag name:
    if ($logic[0] === "#") {
        // Search by ID:
        $main_container = $dom->getElementById(ltrim($logic, '#'));
    } else {
        // Search by tag name; take the first match (null if there is none):
        $main_container = $dom->getElementsByTagName($logic)->item(0);
    }

    // Do we have a result?
    if ($main_container !== null) {

        echo "> Found main part identified by: " . $logic . "\n";

        // Parse the $main_container:
        echo " - Example get the title:\n";
        $h1 = $main_container->getElementsByTagName("h1")->item(0);
        echo "\t" . ($h1 !== null ? $h1->textContent : "(no H1 found)") . "\n\n";

        // You can stop the iteration:
        // break;

    } else {
        echo "> Nothing on the page containing: " . $logic . "\n\n";
    }
}




As you can see, the first two IDs were not found, so we keep trying down the list until we hit the result we want. A good set of those tag names and IDs will be good enough.



Here is the result:



> Nothing on the page containing: #main_content

> Nothing on the page containing: #main_article

> Found main part identified by: #main
- Example get the title:
If I Had to Start All Over Again, I Would…

> Found main part identified by: article
- Example get the title:
If I Had to Start All Over Again, I Would…
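If none of the identifiers in the list match a page, one possible fallback is a rough text-density heuristic: pick the element whose direct paragraph children hold the most text. The sketch below is only an illustration and makes no claim about how Medium's import tool works. The function name guessMainByParagraphs and the sample markup are hypothetical, and real pages will likely need extra rules (for example, skipping comment sections).

```php
<?php
// Hypothetical fallback: score every parent of a <p> by the total length of
// its direct <p> children and return the highest-scoring element.
function guessMainByParagraphs(DOMDocument $dom): ?DOMElement
{
    $scores = [];   // node path => characters of paragraph text
    $nodes  = [];   // node path => element
    foreach ($dom->getElementsByTagName('p') as $p) {
        $parent = $p->parentNode;
        if (!$parent instanceof DOMElement) {
            continue;
        }
        $key = $parent->getNodePath();   // unique path such as /html/body/div[2]
        $nodes[$key]  = $parent;
        $scores[$key] = ($scores[$key] ?? 0) + strlen(trim($p->textContent));
    }
    if (!$scores) {
        return null;
    }
    $best = array_keys($scores, max($scores))[0];
    return $nodes[$best];
}

// Made-up page without an ARTICLE tag or any of the known IDs.
$page = '<html><body>'
      . '<div id="nav"><p>Home</p><p>About</p></div>'
      . '<div id="post"><h1>Sample title</h1>'
      . '<p>First long paragraph of the sample article body.</p>'
      . '<p>Second long paragraph of the sample article body.</p></div>'
      . '<div id="footer"><p>Copyright</p></div>'
      . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($page);
libxml_clear_errors();

$main = guessMainByParagraphs($dom);
echo $main ? $main->getAttribute('id') : 'nothing found', "\n";
```

A function like this would be called only after every entry in the identifier list has failed, so well-structured pages still take the cheaper and more reliable path.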


Hope I helped.





























  • Thanks for the help. We can go for this option, but what if the URL content does not contain any of the above-mentioned tags? Is there any other way we can do this, maybe in jQuery or JavaScript?

    – donm
    Mar 8 at 15:21











  • Have you ever used the medium.com story import tool?

    – donm
    Mar 8 at 15:23





















answered Mar 8 at 15:09









Shlomi Hassid











