Scraping Table Python
I've been programming for a short time. I want to scrape football statistics from the website totalcorner.com and save them to a CSV file. I only want the columns that contain values. The code I have written is the following:
from bs4 import BeautifulSoup
import requests
import csv
url = ("https://www.totalcorner.com/match/schedule/20190305")
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find('table')
rows = table.find_all('tr')
columns =[v.text for v in rows[0].find_all('th')]
for row in soup.find_all('tr'):
    for col in row.find_all('td'):
        print(col.text)
The problem is that everything is printed in a single column instead of being split into separate columns and rows. What I want is to save the table in a CSV file. How can I accomplish this?
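Everything lands in one column because print(col.text) writes each cell on its own line, so the row boundaries are lost. A minimal sketch of one way to keep them, assuming the wanted values are the <td> cells of each <tr> in the first <table> (as in the code above): build one list per row and pass the list of lists to csv.writer. The helper names table_to_rows and save_csv are hypothetical, and the HTML is taken as a string so the parsing can be tried without requesting the live page.

```python
import csv

from bs4 import BeautifulSoup


def table_to_rows(html):
    """Split the first <table> in `html` into a header list and a list of data rows."""
    table = BeautifulSoup(html, 'html.parser').find('table')
    header = [th.get_text(strip=True) for th in table.find_all('th')]
    rows = []
    for tr in table.find_all('tr'):
        # One list per <tr>, so row boundaries survive (unlike printing cell by cell)
        cells = [td.get_text(strip=True) for td in tr.find_all('td')]
        if cells:  # rows holding only <th> cells yield an empty list and are skipped
            rows.append(cells)
    return header, rows


def save_csv(path, header, rows):
    """Write the header followed by the data rows to a CSV file."""
    # newline='' lets the csv module control line endings itself
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```

With the question's code this would be used as header, rows = table_to_rows(response.text) followed by save_csv('totalcorner.csv', header, rows) (the filename is arbitrary). The live page may include empty or decorative cells, so the header and data rows are not guaranteed to line up and may need further cleaning.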
python python-3.x beautifulsoup
edited Mar 6 at 17:08 by Maaz
asked Mar 6 at 16:31 by Fran2000
2 Answers
pandas is a convenient way to parse <table> tags (pd.read_html relies on a parser such as lxml, or BeautifulSoup with html5lib, under the hood).
Normally you could simply call pd.read_html(url), but here you need to fetch the page with requests first and pass its HTML to read_html.
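from io import StringIO

import pandas as pd
import requests

url = 'https://www.totalcorner.com/match/schedule/20190305'
response = requests.get(url)
# read_html returns a list of DataFrames, one per <table> in the HTML;
# wrapping the text in StringIO avoids the deprecated literal-string input
tables = pd.read_html(StringIO(response.text))
table = tables[0]
# Drop rows, then columns, that contain nothing but NaN
table = table.dropna(how='all', axis=0)
table = table.dropna(how='all', axis=1)
# Save the cleaned table to CSV, leaving out pandas' row index
table.to_csv('totalcorner_results.csv', index=False)
print(table)
Output: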
League ... Analysis
1 Mexico Liga MX Femenil ... C. O. L.
2 Argentina Nacional B ... C. O. L.
3 Ecuador Campeonato Nacional ... C. O. L.
4 Argentina Primera Division ... C. O. L.
5 Peru Primera Division ... C. O. L.
6 Colombia Primera A ... C. O. L.
7 Mexico Liga MX Femenil ... C. O. L.
8 Jamaica Premier League ... C. O. L.
9 Mexico Clausura ... C. O. L.
10 Mexico Liga MX Femenil ... C. O. L.
11 Mexico Liga MX Femenil ... C. O. L.
12 Bangladesh Championship League ... C. O. L.
13 India Mumbai Super Division ... C. O. L.
14 Womens International ... C. O. L.
15 AFC Champions League ... C. O. L.
16 Indonesia Cup ... C. O. L.
17 India Mumbai Super Division ... C. O. L.
18 Australia South Australia State League 1 ... C. O. L.
19 Bangladesh Championship League ... C. O. L.
20 Australia Queensland Premier League Women ... C. O. L.
21 Australia South Australia State League 1 ... C. O. L.
22 AFC Champions League ... C. O. L.
23 Vietnam V-League ... C. O. L.
24 Vietnam V-League ... C. O. L.
25 India I-League 2nd Division ... C. O. L.
26 World Club Friendlies ... C. O. L.
27 AFC Champions League ... C. O. L.
28 Algeria Youth League ... C. O. L.
29 Iran Div 2 ... C. O. L.
30 Iran Div 2 ... C. O. L.
.. ... ... ...
135 England National League South ... C. O. L.
136 England National League South ... C. O. L.
137 England National League South ... C. O. L.
138 England Southern Premier League Central ... C. O. L.
139 England Southern Premier League South ... C. O. L.
140 England Southern Premier League South ... C. O. L.
141 England Isthmian Premier Division ... C. O. L.
142 England Isthmian Premier Division ... C. O. L.
143 England League 1 ... C. O. L.
144 England Northern League Division One ... C. O. L.
145 England Northern League Division One ... C. O. L.
146 Republic of Ireland League Cup ... C. O. L.
147 England Isthmian Division One North ... C. O. L.
148 Republic of Ireland League Cup ... C. O. L.
149 Northern Ireland Mid Ulster Cup ... C. O. L.
150 UEFA Champions League ... C. O. L.
151 UEFA Champions League ... C. O. L.
152 Argentina Primera B Metropolitana ... C. O. L.
153 Argentina Primera C Metropolitana ... C. O. L.
154 Argentina Primera D Metropolitana ... C. O. L.
155 Iceland U19 Cup ... C. O. L.
156 Republic of Ireland Leinster Senior League ... C. O. L.
157 Republic of Ireland Munster Senior Cup ... C. O. L.
158 Argentina Cup ... C. O. L.
159 Copa Libertadores ... C. O. L.
160 Copa Libertadores ... C. O. L.
161 Copa Libertadores ... C. O. L.
162 Womens International ... C. O. L.
163 Argentina Torneo Regional Amateur ... C. O. L.
164 Mexico Liga de Ascenso Clausura ... C. O. L.
[164 rows x 13 columns]
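The two dropna calls are what strip out completely empty rows and columns: with how='all', a row (axis=0) or column (axis=1) is dropped only if every value in it is missing, so partially filled match rows survive. The sketch below shows this on a small made-up frame (the values and the Spacer column are illustrative, not taken from the site) and then turns the result into CSV text with to_csv(index=False), which leaves pandas' row index out.

```python
import pandas as pd

# Made-up frame with one all-missing row and one all-missing column,
# standing in for the table returned by read_html
raw = pd.DataFrame({
    'League': ['Peru Primera Division', None, 'Colombia Primera A'],
    'Spacer': [None, None, None],
    'Score': ['0 - 3', None, '1 - 2'],
})

clean = raw.dropna(how='all', axis=0)    # drop rows in which every value is missing
clean = clean.dropna(how='all', axis=1)  # drop columns in which every value is missing

# With no path argument, to_csv returns the CSV text as a string
csv_text = clean.to_csv(index=False)
print(csv_text)
```

Passing a filename instead, e.g. clean.to_csv('totalcorner_results.csv', index=False), writes the same text to disk.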
I didn't have the time to read the docs and find a solution using pandas (I don't use it regularly). But I was thinking about this package. Thanks, I think this is a good way to do it :-)
– Maaz
Mar 7 at 7:51
You can use a nested list comprehension to format the table properly into a list of lists, which can then easily be written to a CSV file:
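import csv
import re

import requests
from bs4 import BeautifulSoup as soup

d = soup(requests.get('https://www.totalcorner.com/match/schedule/20190305').text, 'html.parser')
# Select the match table by its id; the attributes must be passed as a dict
table = d.find('table', {'id': 'inplay_match_table'})
# Raw header texts, and the raw <td> texts of every row after the first
_headers, _data = [i.text for i in table.find_all('th')], [[i.text for i in b.find_all('td')] for b in table.find_all('tr')[1:]]
# Remove newlines, drop empty headers/cells, and skip the first two collected rows
headers, data = [re.sub(r'\n+', '', i) for i in _headers if i], [list(filter(None, [re.sub(r'\n+', '', i) for i in b if i])) for b in _data[2:]]
# newline='' prevents blank lines between rows on Windows
with open('totalcorner_results.csv', 'w', newline='') as f:
    write = csv.writer(f)
    write.writerows([headers, *data])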
Output (only the first eight lines, because of SO's character limit):
League,Time,Home,Score,Away,Handicap,Corner,Goal Line,Tips,Dangerous Attack,Shots,Live Events,Analysis
Mexico Liga MX Femenil,00:00,Full,1Toluca Women,1 - 1,Club America Women21,+0.5,4 - 4(2-3),2.75 1.25 ,6.7-13.2,81 - 6037 - 33,20 - 811 - 4,C.O.L.
Argentina Nacional B,00:05,Full,3Sarmiento,1 - 1,CD Moron5,-0.75,5 - 3(4-2),2.0 0.75 ,62 - 6232 - 23,11 - 108 - 4,C.O.L.
Ecuador Campeonato Nacional,00:15,Full,[10]Universidad Catolica Del Ecuador,6 - 0,Fuerza Amarilla SC[13]1,-1.25,11 - 0(7-0),2.5 1.25 ,88 - 2243 - 10,23 - 512 - 4,C.O.L.
Argentina Primera Division,00:30,Full,5CA Aldosivi,0 - 1,Defensa y Justicia4,+0.25,3 - 6(1-2),1.75 0.75 ,Corner Over6.6-12.7,82 - 8836 - 46,13 - 96 - 6,C.O.L.
Peru Primera Division,01:00,Full,2[14]Sport Huancayo,0 - 3,Academia Deportiva Cantolao[17]2,-1.25,8 - 2(2-0),2.75 1.0 ,7.3-13.9,106 - 4547 - 19,12 - 56 - 1,C.O.L.
Colombia Primera A,01:00,Full,12Atletico Huila,1 - 2,Alianza Petrolera31,-0.25,3 - 4(1-4),2.0 0.75 ,5.6-12.3,47 - 5017 - 31,6 - 82 - 3,C.O.L.
Mexico Liga MX Femenil,01:00,Full,2Chivas Guadalajara Women,1 - 2,Atlas Women2,-0.25,2 - 2(0-0),2.5 1.0 ,6.1-11.8,49 - 7118 - 32,6 - 143 - 8,C.O.L.
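The comprehension that builds data does two clean-up jobs per row: re.sub(r'\n+', '', ...) removes the line breaks that .text keeps inside each cell, and filter(None, ...) drops the cells that end up empty. The same steps written as a plain loop, on hypothetical raw cell texts standing in for _data (so no request is made), with the CSV written to an in-memory io.StringIO buffer instead of a file:

```python
import csv
import io
import re


def clean_rows(raw_rows):
    """Remove runs of newlines from each cell and drop cells that end up empty."""
    cleaned = []
    for row in raw_rows:
        cells = [re.sub(r'\n+', '', cell) for cell in row]
        cleaned.append([cell for cell in cells if cell])  # same effect as filter(None, ...)
    return cleaned


# Hypothetical raw cell texts, shaped like the [[td.text, ...], ...] lists built from the page
raw = [
    ['\nPeru Primera Division\n', '01:00', '', '0 - 3'],
    ['Colombia Primera A', '\n\n01:00', '\n', '1 - 2'],
]
headers = ['League', 'Time', 'Score']

buf = io.StringIO()
csv.writer(buf).writerows([headers, *clean_rows(raw)])
print(buf.getvalue())
```

One consequence of dropping empty cells rather than keeping them as '': a row with a genuinely blank field comes out shorter, and its later values shift one column to the left relative to the headers.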
Thank you very much for your help. I will follow your advice.
– Fran2000
Mar 6 at 17:01
Why the downvote?
– Ajax1234
Mar 8 at 22:13
pandas answer: answered Mar 6 at 17:19 by chitown88
I didn't have the time to read the doc and find a solution using panda (I don't use it regularly). But I was thinking about this package. Thanks, I think this is a good way to get it :-)
– Maaz
Mar 7 at 7:51
add a comment |
I didn't have the time to read the doc and find a solution using panda (I don't use it regularly). But I was thinking about this package. Thanks, I think this is a good way to get it :-)
– Maaz
Mar 7 at 7:51
I didn't have the time to read the doc and find a solution using panda (I don't use it regularly). But I was thinking about this package. Thanks, I think this is a good way to get it :-)
– Maaz
Mar 7 at 7:51
I didn't have the time to read the doc and find a solution using panda (I don't use it regularly). But I was thinking about this package. Thanks, I think this is a good way to get it :-)
– Maaz
Mar 7 at 7:51
add a comment |
You can use a nested list comprehension to properly formate the table into a list of lists, which can then be easily written to a csv file:
import csv, requests, re
from bs4 import BeautifulSoup as soup
d = soup(requests.get('https://www.totalcorner.com/match/schedule/20190305').text, 'html.parser')
table = d.find('table', 'id':'inplay_match_table')
_headers, _data = [i.text for i in table.find_all('th')], [[i.text for i in b.find_all('td')] for b in table.find_all('tr')[1:]]
headers, data = [re.sub('n+', '', i) for i in _headers if i], [list(filter(None, [re.sub('n+', '', i) for i in b if i])) for b in _data[2:]]
with open('totalcorner_results.csv', 'w') as f:
write = csv.writer(f)
write.writerows([headers, *data])
Output (Top eight results due to SO's character limit):
League,Time,Home,Score,Away,Handicap,Corner,Goal Line,Tips,Dangerous Attack,Shots,Live Events,Analysis
Mexico Liga MX Femenil,00:00,Full,1Toluca Women,1 - 1,Club America Women21,+0.5,4 - 4(2-3),2.75 1.25 ,6.7-13.2,81 - 6037 - 33,20 - 811 - 4,C.O.L.
Argentina Nacional B,00:05,Full,3Sarmiento,1 - 1,CD Moron5,-0.75,5 - 3(4-2),2.0 0.75 ,62 - 6232 - 23,11 - 108 - 4,C.O.L.
Ecuador Campeonato Nacional,00:15,Full,[10]Universidad Catolica Del Ecuador,6 - 0,Fuerza Amarilla SC[13]1,-1.25,11 - 0(7-0),2.5 1.25 ,88 - 2243 - 10,23 - 512 - 4,C.O.L.
Argentina Primera Division,00:30,Full,5CA Aldosivi,0 - 1,Defensa y Justicia4,+0.25,3 - 6(1-2),1.75 0.75 ,Corner Over6.6-12.7,82 - 8836 - 46,13 - 96 - 6,C.O.L.
Peru Primera Division,01:00,Full,2[14]Sport Huancayo,0 - 3,Academia Deportiva Cantolao[17]2,-1.25,8 - 2(2-0),2.75 1.0 ,7.3-13.9,106 - 4547 - 19,12 - 56 - 1,C.O.L.
Colombia Primera A,01:00,Full,12Atletico Huila,1 - 2,Alianza Petrolera31,-0.25,3 - 4(1-4),2.0 0.75 ,5.6-12.3,47 - 5017 - 31,6 - 82 - 3,C.O.L.
Mexico Liga MX Femenil,01:00,Full,2Chivas Guadalajara Women,1 - 2,Atlas Women2,-0.25,2 - 2(0-0),2.5 1.0 ,6.1-11.8,49 - 7118 - 32,6 - 143 - 8,C.O.L.
Thank you very much for your help. I will follow your advice.
– Fran2000
Mar 6 at 17:01
Why the downvote?
– Ajax1234
Mar 8 at 22:13
add a comment |
You can use a nested list comprehension to properly formate the table into a list of lists, which can then be easily written to a csv file:
import csv, requests, re
from bs4 import BeautifulSoup as soup
d = soup(requests.get('https://www.totalcorner.com/match/schedule/20190305').text, 'html.parser')
table = d.find('table', 'id':'inplay_match_table')
_headers, _data = [i.text for i in table.find_all('th')], [[i.text for i in b.find_all('td')] for b in table.find_all('tr')[1:]]
headers, data = [re.sub('n+', '', i) for i in _headers if i], [list(filter(None, [re.sub('n+', '', i) for i in b if i])) for b in _data[2:]]
with open('totalcorner_results.csv', 'w') as f:
write = csv.writer(f)
write.writerows([headers, *data])
Output (Top eight results due to SO's character limit):
League,Time,Home,Score,Away,Handicap,Corner,Goal Line,Tips,Dangerous Attack,Shots,Live Events,Analysis
Mexico Liga MX Femenil,00:00,Full,1Toluca Women,1 - 1,Club America Women21,+0.5,4 - 4(2-3),2.75 1.25 ,6.7-13.2,81 - 6037 - 33,20 - 811 - 4,C.O.L.
Argentina Nacional B,00:05,Full,3Sarmiento,1 - 1,CD Moron5,-0.75,5 - 3(4-2),2.0 0.75 ,62 - 6232 - 23,11 - 108 - 4,C.O.L.
Ecuador Campeonato Nacional,00:15,Full,[10]Universidad Catolica Del Ecuador,6 - 0,Fuerza Amarilla SC[13]1,-1.25,11 - 0(7-0),2.5 1.25 ,88 - 2243 - 10,23 - 512 - 4,C.O.L.
Argentina Primera Division,00:30,Full,5CA Aldosivi,0 - 1,Defensa y Justicia4,+0.25,3 - 6(1-2),1.75 0.75 ,Corner Over6.6-12.7,82 - 8836 - 46,13 - 96 - 6,C.O.L.
Peru Primera Division,01:00,Full,2[14]Sport Huancayo,0 - 3,Academia Deportiva Cantolao[17]2,-1.25,8 - 2(2-0),2.75 1.0 ,7.3-13.9,106 - 4547 - 19,12 - 56 - 1,C.O.L.
Colombia Primera A,01:00,Full,12Atletico Huila,1 - 2,Alianza Petrolera31,-0.25,3 - 4(1-4),2.0 0.75 ,5.6-12.3,47 - 5017 - 31,6 - 82 - 3,C.O.L.
Mexico Liga MX Femenil,01:00,Full,2Chivas Guadalajara Women,1 - 2,Atlas Women2,-0.25,2 - 2(0-0),2.5 1.0 ,6.1-11.8,49 - 7118 - 32,6 - 143 - 8,C.O.L.
Thank you very much for your help. I will follow your advice.
– Fran2000
Mar 6 at 17:01
Why the downvote?
– Ajax1234
Mar 8 at 22:13
add a comment |
answered Mar 6 at 16:51
Ajax1234