How to parse XML using Scrapy
How can I scrape XML using Scrapy?
My XML looks something like this:
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <generator>NFE/5.0</generator>
    <title>"python" - Google News</title>
    <link>https://news.google.com/search?q=python&hl=en-IN&gl=IN&ceid=IN:en</link>
    <language>en-IN</language>
    <webMaster>news-webmaster@google.com</webMaster>
    <copyright>2019 Google Inc.</copyright>
    <lastBuildDate>Thu, 07 Mar 2019 16:48:55 GMT</lastBuildDate>
    <description>Google News</description>
    <item>
      <title>Brown snake attacks python eating a rat - NEWS.com.au</title>
    </item>
  </channel>
</rss>
My code looks like this:
from scrapy.spiders import XMLFeedSpider
from scrapy.http import HtmlResponse
from scrapy.selector import Selector
response = HtmlResponse(url='https://news.google.com/rss/search?q=python&hl=en-IN&gl=IN&ceid=IN:en')
xxs = Selector(response)
obj = xxs.xpath('//title/text()').extract()
I want to get the text of the title tags, but I'm getting an empty list instead. Please help me out.
Thanks a lot
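As a sanity check on the XPath itself, here is a minimal sketch that parses a trimmed copy of the feed above from a string. It uses the standard library's xml.etree.ElementTree rather than Scrapy's Selector, so it only shows that the title elements can be found once the parser has actual XML text to work with. The names SAMPLE_RSS and extract_titles are made up for this illustration, and the link element is left out of the sample. One thing worth checking in the code above: constructing HtmlResponse with only a url does not download anything, so the selector may have no body to search.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the feed from the question (hypothetical sample data,
# with the <item> element closed so the document is well-formed).
SAMPLE_RSS = """<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
<channel>
<title>"python" - Google News</title>
<item>
<title>
Brown snake attacks python eating a rat - NEWS.com.au
</title>
</item>
</channel>
</rss>"""


def extract_titles(xml_text):
    """Return the stripped text of every <title> element, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter("title") if el.text]


titles = extract_titles(SAMPLE_RSS)
print(titles)
```

If the same text is handed to Scrapy instead, Selector accepts text and type arguments, so Selector(text=SAMPLE_RSS, type="xml") would be the closest equivalent to try.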
python xml web-scraping scrapy
asked Mar 7 at 16:58
A. Anand
1 Answer
Your request is being forbidden by robots.txt.
To change this behavior, open your project's settings.py and change ROBOTSTXT_OBEY=True to ROBOTSTXT_OBEY=False.
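For reference, the change described above comes down to one line. This is a sketch of the relevant fragment of settings.py, assuming a project created with scrapy startproject, whose generated settings file sets the value to True. Note that the setting only affects requests that Scrapy itself downloads during a crawl.

```python
# settings.py (fragment)
# Tell Scrapy not to consult robots.txt before downloading a page.
ROBOTSTXT_OBEY = False
```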
answered Mar 7 at 17:35
H. Ducati
It's still not working
– A. Anand
Mar 8 at 4:23