scrap a table from a non-html website with R yet examples shown are for hmtl The Next CEO of Stack OverflowScraping a complex HTML table into a data.frame in RGetting with errors with ChIPQC - “Error in (function (classes, fdef, mtable)”Can't close mysql connections in Runable to find an inherited method for function ‘vif’ for signature ‘“integer”’plot graphs in RGetting the centroids of Lat and Longitude in a data frameHow would I load data from a website table into R environment?Variogram for a “gridded” dataCan't submit and scrap html table using rvestWebscraping in R with XML Package “Null” errorpid func doesn't work with pairwiseAlignment R

When airplanes disconnect from a tanker during air to air refueling, why do they bank so sharply to the right?

How can I quit an app using Terminal?

Does it take more energy to get to Venus or to Mars?

Was a professor correct to chastise me for writing "Prof. X" rather than "Professor X"?

Visit to the USA with ESTA approved before trip to Iran

Term for the "extreme-extension" version of a straw man fallacy?

Describing a person. What needs to be mentioned?

How do I get the green key off the shelf in the Dobby level of Lego Harry Potter 2?

What is the purpose of the Evocation wizard's Potent Cantrip feature?

How to write the block matrix in LaTex?

Why Were Madagascar and New Zealand Discovered So Late?

How can I get through very long and very dry, but also very useful technical documents when learning a new tool?

The King's new dress

Trouble understanding the speech of overseas colleagues

Apart from "berlinern", do any other German dialects have a corresponding verb?

Is it my responsibility to learn a new technology in my own time my employer wants to implement?

MAZDA 3 2006 (UK) - poor acceleration then takes off at 3250 revs

Why didn't Theresa May consult with Parliament before negotiating a deal with the EU?

Why doesn't a table tennis ball float on the surface? How do we calculate buoyancy here?

Any way to transfer all permissions from one role to another?

Return of the Riley Riddles in Reverse

Text adventure game code

Science fiction novels about a solar system spanning civilisation where people change their bodies at will

A pseudo-riley?



scrap a table from a non-html website with R yet examples shown are for hmtl



The Next CEO of Stack OverflowScraping a complex HTML table into a data.frame in RGetting with errors with ChIPQC - “Error in (function (classes, fdef, mtable)”Can't close mysql connections in Runable to find an inherited method for function ‘vif’ for signature ‘“integer”’plot graphs in RGetting the centroids of Lat and Longitude in a data frameHow would I load data from a website table into R environment?Variogram for a “gridded” dataCan't submit and scrap html table using rvestWebscraping in R with XML Package “Null” errorpid func doesn't work with pairwiseAlignment R










0















I have an issue. I am trying to scrap the two tables from a non-html website.
This is the website:



https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp



Yet, I am following something I should not but do not find any answer. This is what I have tried:



library(tidyverse)
library(rvest)
library(XML)
library(httr)



url <- "https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp"

poptable <- readHTMLTable(url, which = 1)


And get this error:




Error in (function (classes, fdef, mtable) : unable to find an
inherited method for function ‘readHTMLTable’ for signature ‘"NULL"’
In addition: Warning message: XML content does not seem to be XML:
'https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp'




I thought regardless of the asp website type, I still can use the function readHTMLTable. Is there any alternative to this. I haven't found any yet and struggled for hours to get something out.










share|improve this question
























  • It isn't a "non-HTML" site. Maybe the HTML is rendered dynamically, in which case there are tools like Selenium that can help and are documented in other SO questions

    – camille
    Mar 7 at 15:14















0















I have an issue. I am trying to scrap the two tables from a non-html website.
This is the website:



https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp



Yet, I am following something I should not but do not find any answer. This is what I have tried:



library(tidyverse)
library(rvest)
library(XML)
library(httr)



url <- "https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp"

poptable <- readHTMLTable(url, which = 1)


And get this error:




Error in (function (classes, fdef, mtable) : unable to find an
inherited method for function ‘readHTMLTable’ for signature ‘"NULL"’
In addition: Warning message: XML content does not seem to be XML:
'https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp'




I thought regardless of the asp website type, I still can use the function readHTMLTable. Is there any alternative to this. I haven't found any yet and struggled for hours to get something out.










share|improve this question
























  • It isn't a "non-HTML" site. Maybe the HTML is rendered dynamically, in which case there are tools like Selenium that can help and are documented in other SO questions

    – camille
    Mar 7 at 15:14













0












0








0








I have an issue. I am trying to scrap the two tables from a non-html website.
This is the website:



https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp



Yet, I am following something I should not but do not find any answer. This is what I have tried:



library(tidyverse)
library(rvest)
library(XML)
library(httr)



url <- "https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp"

poptable <- readHTMLTable(url, which = 1)


And get this error:




Error in (function (classes, fdef, mtable) : unable to find an
inherited method for function ‘readHTMLTable’ for signature ‘"NULL"’
In addition: Warning message: XML content does not seem to be XML:
'https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp'




I thought regardless of the asp website type, I still can use the function readHTMLTable. Is there any alternative to this. I haven't found any yet and struggled for hours to get something out.










share|improve this question
















I have an issue. I am trying to scrap the two tables from a non-html website.
This is the website:



https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp



Yet, I am following something I should not but do not find any answer. This is what I have tried:



library(tidyverse)
library(rvest)
library(XML)
library(httr)



url <- "https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp"

poptable <- readHTMLTable(url, which = 1)


And get this error:




Error in (function (classes, fdef, mtable) : unable to find an
inherited method for function ‘readHTMLTable’ for signature ‘"NULL"’
In addition: Warning message: XML content does not seem to be XML:
'https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp'




I thought regardless of the asp website type, I still can use the function readHTMLTable. Is there any alternative to this. I haven't found any yet and struggled for hours to get something out.







r web-scraping






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Mar 7 at 14:52







GaB

















asked Mar 7 at 14:03









GaBGaB

1078




1078












  • It isn't a "non-HTML" site. Maybe the HTML is rendered dynamically, in which case there are tools like Selenium that can help and are documented in other SO questions

    – camille
    Mar 7 at 15:14

















  • It isn't a "non-HTML" site. Maybe the HTML is rendered dynamically, in which case there are tools like Selenium that can help and are documented in other SO questions

    – camille
    Mar 7 at 15:14
















It isn't a "non-HTML" site. Maybe the HTML is rendered dynamically, in which case there are tools like Selenium that can help and are documented in other SO questions

– camille
Mar 7 at 15:14





It isn't a "non-HTML" site. Maybe the HTML is rendered dynamically, in which case there are tools like Selenium that can help and are documented in other SO questions

– camille
Mar 7 at 15:14












1 Answer
1






active

oldest

votes


















3














Actually, that's pretty straightfoward (based on @lukeA's answer):



library(rvest)

url <- "https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp"

page <- read_html(url)
nodes <- html_nodes(page, "table") # you can use Selectorgadget to identify the node
table <- html_table(nodes[[1]]) # each element of the nodes list is one table that can be extracted
head(table)
Code Main Specialty Title
1 Surgical Specialties Surgical Specialties Surgical Specialties
2 100 GENERAL SURGERY
3 101 UROLOGY
4 110 TRAUMA & ORTHOPAEDICS
5 120 ENT
6 130 OPHTHALMOLOGY


Selectorgadget can be installed here: Selectorgadget by Hadley Wickham






share|improve this answer























  • I see, it worked. Thank you. I haven't found any previous issues. Yet it was there. Can you pleas help, is the asp website a html? Because lukeA's answer shows it for an html table. What is the difference?

    – GaB
    Mar 7 at 14:50






  • 1





    I guest that all websites are html-pages with various elements. The tricky part is to identify the "names" those elements that you are interested in. That's what you can use the Selectorgadget for. With Selectorgadget I figured out that the relevant tables are "table" nodes and that's what I put into the html_nodes command. If you are not interested in tables, you might use html_text instead of html_table in the next step.

    – ha_pu
    Mar 7 at 14:56












  • thank you ha_pu

    – GaB
    Mar 12 at 14:18











Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55045643%2fscrap-a-table-from-a-non-html-website-with-r-yet-examples-shown-are-for-hmtl%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









3














Actually, that's pretty straightfoward (based on @lukeA's answer):



library(rvest)

url <- "https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp"

page <- read_html(url)
nodes <- html_nodes(page, "table") # you can use Selectorgadget to identify the node
table <- html_table(nodes[[1]]) # each element of the nodes list is one table that can be extracted
head(table)
Code Main Specialty Title
1 Surgical Specialties Surgical Specialties Surgical Specialties
2 100 GENERAL SURGERY
3 101 UROLOGY
4 110 TRAUMA & ORTHOPAEDICS
5 120 ENT
6 130 OPHTHALMOLOGY


Selectorgadget can be installed here: Selectorgadget by Hadley Wickham






share|improve this answer























  • I see, it worked. Thank you. I haven't found any previous issues. Yet it was there. Can you pleas help, is the asp website a html? Because lukeA's answer shows it for an html table. What is the difference?

    – GaB
    Mar 7 at 14:50






  • 1





    I guest that all websites are html-pages with various elements. The tricky part is to identify the "names" those elements that you are interested in. That's what you can use the Selectorgadget for. With Selectorgadget I figured out that the relevant tables are "table" nodes and that's what I put into the html_nodes command. If you are not interested in tables, you might use html_text instead of html_table in the next step.

    – ha_pu
    Mar 7 at 14:56












  • thank you ha_pu

    – GaB
    Mar 12 at 14:18















3














Actually, that's pretty straightfoward (based on @lukeA's answer):



library(rvest)

url <- "https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp"

page <- read_html(url)
nodes <- html_nodes(page, "table") # you can use Selectorgadget to identify the node
table <- html_table(nodes[[1]]) # each element of the nodes list is one table that can be extracted
head(table)
Code Main Specialty Title
1 Surgical Specialties Surgical Specialties Surgical Specialties
2 100 GENERAL SURGERY
3 101 UROLOGY
4 110 TRAUMA & ORTHOPAEDICS
5 120 ENT
6 130 OPHTHALMOLOGY


Selectorgadget can be installed here: Selectorgadget by Hadley Wickham






share|improve this answer























  • I see, it worked. Thank you. I haven't found any previous issues. Yet it was there. Can you pleas help, is the asp website a html? Because lukeA's answer shows it for an html table. What is the difference?

    – GaB
    Mar 7 at 14:50






  • 1





    I guest that all websites are html-pages with various elements. The tricky part is to identify the "names" those elements that you are interested in. That's what you can use the Selectorgadget for. With Selectorgadget I figured out that the relevant tables are "table" nodes and that's what I put into the html_nodes command. If you are not interested in tables, you might use html_text instead of html_table in the next step.

    – ha_pu
    Mar 7 at 14:56












  • thank you ha_pu

    – GaB
    Mar 12 at 14:18













3












3








3







Actually, that's pretty straightfoward (based on @lukeA's answer):



library(rvest)

url <- "https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp"

page <- read_html(url)
nodes <- html_nodes(page, "table") # you can use Selectorgadget to identify the node
table <- html_table(nodes[[1]]) # each element of the nodes list is one table that can be extracted
head(table)
Code Main Specialty Title
1 Surgical Specialties Surgical Specialties Surgical Specialties
2 100 GENERAL SURGERY
3 101 UROLOGY
4 110 TRAUMA & ORTHOPAEDICS
5 120 ENT
6 130 OPHTHALMOLOGY


Selectorgadget can be installed here: Selectorgadget by Hadley Wickham






share|improve this answer













Actually, that's pretty straightfoward (based on @lukeA's answer):



library(rvest)

url <- "https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp"

page <- read_html(url)
nodes <- html_nodes(page, "table") # you can use Selectorgadget to identify the node
table <- html_table(nodes[[1]]) # each element of the nodes list is one table that can be extracted
head(table)
Code Main Specialty Title
1 Surgical Specialties Surgical Specialties Surgical Specialties
2 100 GENERAL SURGERY
3 101 UROLOGY
4 110 TRAUMA & ORTHOPAEDICS
5 120 ENT
6 130 OPHTHALMOLOGY


Selectorgadget can be installed here: Selectorgadget by Hadley Wickham







share|improve this answer












share|improve this answer



share|improve this answer










answered Mar 7 at 14:11









ha_puha_pu

27219




27219












  • I see, it worked. Thank you. I haven't found any previous issues. Yet it was there. Can you pleas help, is the asp website a html? Because lukeA's answer shows it for an html table. What is the difference?

    – GaB
    Mar 7 at 14:50






  • 1





    I guest that all websites are html-pages with various elements. The tricky part is to identify the "names" those elements that you are interested in. That's what you can use the Selectorgadget for. With Selectorgadget I figured out that the relevant tables are "table" nodes and that's what I put into the html_nodes command. If you are not interested in tables, you might use html_text instead of html_table in the next step.

    – ha_pu
    Mar 7 at 14:56












  • thank you ha_pu

    – GaB
    Mar 12 at 14:18

















  • I see, it worked. Thank you. I haven't found any previous issues. Yet it was there. Can you pleas help, is the asp website a html? Because lukeA's answer shows it for an html table. What is the difference?

    – GaB
    Mar 7 at 14:50






  • 1





    I guest that all websites are html-pages with various elements. The tricky part is to identify the "names" those elements that you are interested in. That's what you can use the Selectorgadget for. With Selectorgadget I figured out that the relevant tables are "table" nodes and that's what I put into the html_nodes command. If you are not interested in tables, you might use html_text instead of html_table in the next step.

    – ha_pu
    Mar 7 at 14:56












  • thank you ha_pu

    – GaB
    Mar 12 at 14:18
















I see, it worked. Thank you. I haven't found any previous issues. Yet it was there. Can you pleas help, is the asp website a html? Because lukeA's answer shows it for an html table. What is the difference?

– GaB
Mar 7 at 14:50





I see, it worked. Thank you. I haven't found any previous issues. Yet it was there. Can you pleas help, is the asp website a html? Because lukeA's answer shows it for an html table. What is the difference?

– GaB
Mar 7 at 14:50




1




1





I guest that all websites are html-pages with various elements. The tricky part is to identify the "names" those elements that you are interested in. That's what you can use the Selectorgadget for. With Selectorgadget I figured out that the relevant tables are "table" nodes and that's what I put into the html_nodes command. If you are not interested in tables, you might use html_text instead of html_table in the next step.

– ha_pu
Mar 7 at 14:56






I guest that all websites are html-pages with various elements. The tricky part is to identify the "names" those elements that you are interested in. That's what you can use the Selectorgadget for. With Selectorgadget I figured out that the relevant tables are "table" nodes and that's what I put into the html_nodes command. If you are not interested in tables, you might use html_text instead of html_table in the next step.

– ha_pu
Mar 7 at 14:56














thank you ha_pu

– GaB
Mar 12 at 14:18





thank you ha_pu

– GaB
Mar 12 at 14:18



















draft saved

draft discarded
















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55045643%2fscrap-a-table-from-a-non-html-website-with-r-yet-examples-shown-are-for-hmtl%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

AWS Lex not identifying response if by a variable The 2019 Stack Overflow Developer Survey Results Are In Announcing the arrival of Valued Associate #679: Cesar Manara Planned maintenance scheduled April 17/18, 2019 at 00:00UTC (8:00pm US/Eastern) The Ask Question Wizard is Live! Data science time! April 2019 and salary with experienceEnforcing custom enumeration in AWS LEX for slot valuesHow to give response based on user response in Amazon Lex?Intercepting AWS Lambda Response to a AWS Lex QueryLex chat bot error: Reached second execution of fulfillment lambda on the same utteranceamazon lex showing invalid responseLambda response send back to Lex slot?Response card in Amazon lexAmazon Lex - Lambda response return HTML to botHow can I solve 424 (Failed Dependency) (python) obtained from Amazon lex?

Алба-Юлія

Захаров Федір Захарович