I'm a real beginner in Python, and I want to use BeautifulSoup4 to scrape structured data from the web, but I ran into trouble and don't know how to solve it. Here is my code:
import requests
from bs4 import BeautifulSoup
page = requests.get('https://www.nba.com/players')
soup = BeautifulSoup(page.text, 'html.parser')
players = soup.find(class_ ='row nba-player-index__row')
players_info = players.find_all(class_='nba-player-index__trending-item small-4 medium-3 large-2 team-okc-thunder')
players_name = [players_info.find(class_ ='nba-player-index__details').get_text() for player in players_info]
print(players_name)
And this is my error:
C:\Users\moham\PycharmProjects\WebScrape\Nba\venv\Scripts\python.exe C:/Users/moham/PycharmProjects/WebScrape/Nba/Nba.py
Traceback (most recent call last):
File "C:/Users/moham/PycharmProjects/WebScrape/Nba/Nba.py", line 10, in <module>
players_name = [players_info.find(class_ ='nba-player-index__details').get_text() for player in players_info]
File "C:/Users/moham/PycharmProjects/WebScrape/Nba/Nba.py", line 10, in <listcomp>
players_name = [players_info.find(class_ ='nba-player-index__details').get_text() for player in players_info]
File "C:\Users\moham\PycharmProjects\WebScrape\Nba\venv\lib\site-packages\bs4\element.py", line 2080, in __getattr__
raise AttributeError(
AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
Process finished with exit code 1
It keeps telling me
'Did you call find_all() when you meant to call find()?'
I tried changing it to find(), but it still doesn't work. I've tried every solution I could find and nothing works. I hope someone can help me. Thanks, everyone.
I guess you're looking for something like this:
import requests
from bs4 import BeautifulSoup

nba_players_url = 'https://www.nba.com/players'
nba_players_section = {'class': 'row nba-player-index__row'}
nba_info_section = {'class': 'nba-player-index__trending-item'}
nba_player_name_p = {'class': 'nba-player-index__name'}

r = requests.get(nba_players_url)
if r.ok:
    soup = BeautifulSoup(r.content, 'html.parser')
    players_section = soup.find('section', nba_players_section)
    players = players_section.find_all('section', nba_info_section)
    names = [
        player.find('p', nba_player_name_p).get_text(separator=' ')
        for player in players
    ]
    print(names)
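For the record, the error in your original snippet comes from calling find() on the ResultSet itself (players_info.find(...)) inside the comprehension, instead of on each player. A ResultSet is essentially a list of tags, and find() only exists on the individual tags. Here's a minimal, bs4-free sketch of the same mistake (Tag is a hypothetical stand-in, not the real bs4 class):

```python
class Tag:
    """Hypothetical stand-in for a bs4 tag with a find() method."""
    def __init__(self, text):
        self.text = text

    def find(self):
        return self.text


# Stand-in for a bs4 ResultSet: just a list of tags
result_set = [Tag("a"), Tag("b")]

# result_set.find()  -> AttributeError: a list has no find() method,
# which is exactly the error your traceback shows for ResultSet.

# The fix: call find() on each element of the list, not on the list itself
names = [tag.find() for tag in result_set]
print(names)  # ['a', 'b']
```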
But your life will be much easier with this:
nba_players_url = 'https://www.nba.com/players/active_players.json'
This gives you a JSON with all the data you're looking for, so there's no need to parse anything with BeautifulSoup.
Take a look at this:
import requests
# just for printing players in a nice way
from pprint import pprint

nba_players_url = 'https://www.nba.com/players/active_players.json'

r = requests.get(nba_players_url)
if r.ok:
    players = r.json()
    pprint(players, width=40)
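Once you have the JSON, pulling out the names is a plain list comprehension, no HTML parsing needed. A sketch on sample data (the entries below are made up for illustration, but the firstName/lastName keys match the columns the endpoint returns, as the pandas output further down shows):

```python
# Hypothetical sample mimicking two entries from active_players.json
players = [
    {"firstName": "Steven", "lastName": "Adams", "isAllStar": False},
    {"firstName": "Bam", "lastName": "Adebayo", "isAllStar": True},
]

# Build "First Last" strings -- the same result the bs4 version scraped
names = [f"{p['firstName']} {p['lastName']}" for p in players]
print(names)  # ['Steven Adams', 'Bam Adebayo']
```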
And while we're at it, here's another option:
import pandas as pd
df = pd.read_json('https://www.nba.com/players/active_players.json')
which gives you something like this:
>>> df[['firstName', 'lastName', 'isAllStar']]
firstName lastName isAllStar
0 Steven Adams False
1 Bam Adebayo True
2 LaMarcus Aldridge False
3 Nickeil Alexander-Walker False
4 Kyle Alexander False
.. ... ... ...
498 Thaddeus Young False
499 Trae Young True
500 Cody Zeller False
501 Ante Zizic False
502 Ivica Zubac False
[503 rows x 3 columns]
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 503 entries, 0 to 502
Data columns (total 14 columns):
firstName 503 non-null object
lastName 503 non-null object
jersey 503 non-null int64
pos 503 non-null object
posExpanded 503 non-null object
heightFeet 503 non-null object
heightInches 503 non-null object
weightPounds 503 non-null object
personId 503 non-null int64
teamData 503 non-null object
isAllStar 503 non-null bool