Scrape a table from a non-HTML website with R, yet examples shown are for HTML


I am trying to scrape the two tables from a website that I believe is not HTML.
This is the website:

https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp

The examples I have found are all for HTML pages, and I cannot find an answer for this case. This is what I have tried:

library(tidyverse)
library(rvest)
library(XML)
library(httr)



url <- "https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp"

poptable <- readHTMLTable(url, which = 1)

And I get this error:

Error in (function (classes, fdef, mtable) : unable to find an
inherited method for function ‘readHTMLTable’ for signature ‘"NULL"’
In addition: Warning message: XML content does not seem to be XML:
'https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp'

I thought that, regardless of the .asp page type, I could still use the function readHTMLTable. Is there an alternative? I haven't found one yet and have struggled for hours to get something out.

edited Mar 7 at 14:52

asked Mar 7 at 14:03

GaB


It isn't a "non-HTML" site. Maybe the HTML is rendered dynamically, in which case there are tools like Selenium that can help and are documented in other SO questions

– camille
Mar 7 at 15:14

r web-scraping

1 Answer

Actually, that's pretty straightforward (based on @lukeA's answer):

library(rvest)

url <- "https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp"

page <- read_html(url)
nodes <- html_nodes(page, "table") # you can use Selectorgadget to identify the node
table <- html_table(nodes[[1]]) # each element of the nodes list is one table that can be extracted
head(table)
 Code Main Specialty Title
1 Surgical Specialties Surgical Specialties Surgical Specialties
2 100 GENERAL SURGERY
3 101 UROLOGY
4 110 TRAUMA & ORTHOPAEDICS
5 120 ENT
6 130 OPHTHALMOLOGY

Selectorgadget can be installed here: Selectorgadget by Hadley Wickham
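
Since the question asks about two tables, the same approach can be extended to every table on a page. The following is a minimal sketch, assuming a small inline HTML string (made up for illustration, using a few rows from the output above, and not the real NHS page) so that it runs without network access. It relies on html_table() returning one data frame per table when given the whole set of nodes.

```r
library(rvest)

# Hypothetical inline HTML standing in for a downloaded page (an assumption for this sketch)
html <- '
<html><body>
  <table>
    <tr><th>Code</th><th>Title</th></tr>
    <tr><td>100</td><td>GENERAL SURGERY</td></tr>
    <tr><td>101</td><td>UROLOGY</td></tr>
  </table>
  <table>
    <tr><th>Code</th><th>Title</th></tr>
    <tr><td>120</td><td>ENT</td></tr>
    <tr><td>130</td><td>OPHTHALMOLOGY</td></tr>
  </table>
</body></html>'

page   <- read_html(html)            # read_html() accepts a URL or a literal HTML string
nodes  <- html_nodes(page, "table")  # one node per <table> element
tables <- html_table(nodes)          # a list with one data frame per table

length(tables)   # how many tables were found
tables[[2]]      # the second table
```

With the real page, replacing the inline string with the URL should be enough, provided the tables are present in the HTML that the server returns.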

answered Mar 7 at 14:11

ha_pu


I see, it worked. Thank you. I had not found this in any previous questions, yet it was there. Can you please help: is the ASP website HTML? lukeA's answer shows it for an HTML table. What is the difference?

– GaB
Mar 7 at 14:50


I guess that all websites are HTML pages with various elements. The tricky part is to identify the "names" of the elements that you are interested in. That's what you can use Selectorgadget for. With Selectorgadget I figured out that the relevant tables are "table" nodes, and that's what I put into the html_nodes command. If you are not interested in tables, you might use html_text instead of html_table in the next step.

– ha_pu
Mar 7 at 14:56
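
To illustrate the difference between html_table and html_text described in the comment above, here is a small sketch. The inline HTML, the heading, and the "note" class are all hypothetical and exist only so the example runs offline. The selector string decides which elements are returned, and the extractor function decides what is pulled out of them.

```r
library(rvest)

# Hypothetical inline HTML; the heading text and the "note" class are made up
html <- '
<html><body>
  <h1>Specialty codes</h1>
  <p class="note">Codes are reviewed periodically.</p>
  <table>
    <tr><th>Code</th><th>Title</th></tr>
    <tr><td>100</td><td>GENERAL SURGERY</td></tr>
  </table>
</body></html>'

page <- read_html(html)

# A "table" selector returns table nodes; html_table() turns one into a data frame
codes <- html_table(html_nodes(page, "table")[[1]])

# Other selectors return other elements; html_text() extracts their text content
heading <- html_text(html_nodes(page, "h1"))
note    <- html_text(html_nodes(page, "p.note"))
```

Here codes is a data frame, while heading and note are plain character vectors.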

Thank you, ha_pu.

– GaB
Mar 12 at 14:18
