I'm a real beginner in Python, and I want to use BeautifulSoup4 to scrape structured data from the web, but I ran into trouble and don't know how to solve it. Here is my code:
import requests
from bs4 import BeautifulSoup
page = requests.get('https://www.nba.com/players')
soup = BeautifulSoup(page.text, 'html.parser')
players = soup.find(class_ ='row nba-player-index__row')
players_info = players.find_all(class_='nba-player-index__trending-item small-4 medium-3 large-2 team-okc-thunder')
players_name = [players_info.find(class_ ='nba-player-index__details').get_text() for player in players_info]
print(players_name)
And this is my error:
C:\Users\moham\PycharmProjects\WebScrape\Nba\venv\Scripts\python.exe C:/Users/moham/PycharmProjects/WebScrape/Nba/Nba.py
Traceback (most recent call last):
File "C:/Users/moham/PycharmProjects/WebScrape/Nba/Nba.py", line 10, in <module>
players_name = [players_info.find(class_ ='nba-player-index__details').get_text() for player in players_info]
File "C:/Users/moham/PycharmProjects/WebScrape/Nba/Nba.py", line 10, in <listcomp>
players_name = [players_info.find(class_ ='nba-player-index__details').get_text() for player in players_info]
File "C:\Users\moham\PycharmProjects\WebScrape\Nba\venv\lib\site-packages\bs4\element.py", line 2080, in __getattr__
raise AttributeError(
AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
Process finished with exit code 1
It keeps telling me
'Did you call find_all() when you meant to call find()?'
I tried changing it to find(), but it still doesn't work. I've tried every solution I could find and nothing works. I hope someone can help me. Thanks, everyone.
I guess you're looking for something like this:
import requests
from bs4 import BeautifulSoup

nba_players_url = 'https://www.nba.com/players'
nba_players_section = {'class': 'row nba-player-index__row'}
nba_info_section = {'class': 'nba-player-index__trending-item'}
nba_player_name_p = {'class': 'nba-player-index__name'}

r = requests.get(nba_players_url)
if r.ok:
    soup = BeautifulSoup(r.content, 'html.parser')
    players_section = soup.find('section', nba_players_section)
    players = players_section.find_all('section', nba_info_section)
    names = [
        player.find('p', nba_player_name_p).get_text(separator=' ')
        for player in players
    ]
    print(names)
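For the record, the error in your original snippet comes from calling find() on the ResultSet itself (players_info.find(...)) inside the comprehension, instead of on each player. A ResultSet is essentially a list of tags, and find() only exists on the individual tags. Here's a minimal, bs4-free sketch of the same mistake (Tag is a hypothetical stand-in, not the real bs4 class):

```python
class Tag:
    """Hypothetical stand-in for a bs4 tag with a find() method."""
    def __init__(self, text):
        self.text = text

    def find(self):
        return self.text


# Stand-in for a bs4 ResultSet: just a list of tags
result_set = [Tag("a"), Tag("b")]

# result_set.find()  -> AttributeError: a list has no find() method,
# which is exactly the error your traceback shows for ResultSet.

# The fix: call find() on each element of the list, not on the list itself
names = [tag.find() for tag in result_set]
print(names)  # ['a', 'b']
```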
But your life will be much easier with this:
nba_players_url = 'https://www.nba.com/players/active_players.json'
This gives you a JSON with all the data you're looking for, so there's no need to parse anything with BeautifulSoup.
Take a look at this:
import requests
# just for printing players in a nice way
from pprint import pprint

nba_players_url = 'https://www.nba.com/players/active_players.json'

r = requests.get(nba_players_url)
if r.ok:
    players = r.json()
    pprint(players, width=40)
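Once you have the JSON, pulling out the names is a plain list comprehension, no HTML parsing needed. A sketch on sample data (the entries below are made up for illustration, but the firstName/lastName keys match the columns the endpoint returns, as the pandas output further down shows):

```python
# Hypothetical sample mimicking two entries from active_players.json
players = [
    {"firstName": "Steven", "lastName": "Adams", "isAllStar": False},
    {"firstName": "Bam", "lastName": "Adebayo", "isAllStar": True},
]

# Build "First Last" strings -- the same result the bs4 version scraped
names = [f"{p['firstName']} {p['lastName']}" for p in players]
print(names)  # ['Steven Adams', 'Bam Adebayo']
```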
And while we're at it, here's another option:
import pandas as pd
df = pd.read_json('https://www.nba.com/players/active_players.json')
which gives you something like this:
>>> df[['firstName', 'lastName', 'isAllStar']]
firstName lastName isAllStar
0 Steven Adams False
1 Bam Adebayo True
2 LaMarcus Aldridge False
3 Nickeil Alexander-Walker False
4 Kyle Alexander False
.. ... ... ...
498 Thaddeus Young False
499 Trae Young True
500 Cody Zeller False
501 Ante Zizic False
502 Ivica Zubac False
[503 rows x 3 columns]
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 503 entries, 0 to 502
Data columns (total 14 columns):
firstName 503 non-null object
lastName 503 non-null object
jersey 503 non-null int64
pos 503 non-null object
posExpanded 503 non-null object
heightFeet 503 non-null object
heightInches 503 non-null object
weightPounds 503 non-null object
personId 503 non-null int64
teamData 503 non-null object
isAllStar 503 non-null bool