Scraping links from a website that returns a 403 error (Python)

I'm trying to extract links from a list of URLs (all different pages of the same website), but I keep getting a 403 error. Here are examples of the URLs I'm trying to scrape:

https://www.spectatornews.com/page/6/?s=band

https://www.spectatornews.com/page/7/?s=band

etc.

Here is my code:

getarticles = []

from bs4 import BeautifulSoup
import urllib.request

for i in listoflinks:
    resp = urllib.request.urlopen(i)
    soup = BeautifulSoup(resp, from_encoding=resp.info().get_param('charset'))

    for link in soup.find_all('a', href=True):

        getarticles.append(link['href'])

I tried some of the answers from "HTTP error 403 in Python 3 Web Scraping", but without much success. I'm not sure whether I'm applying them correctly to the whole list of links. I tried one of the solutions there that sets a header, but it returns HTTP Error 406: Not Acceptable.

Here is the code I tried as a fix:

getarticles = []
from bs4 import BeautifulSoup
import urllib.request

for i in listoflinks:
    req=urllib.request.Request(i, headers={'User-Agent': 'Mozilla/5.0'})
    resp = urllib.request.urlopen(req)
    soup = BeautifulSoup(resp, from_encoding=resp.info().get_param('charset'))

    for link in soup.find_all('a', href=True):

        getarticles.append(link['href'])

Any help is appreciated. I'm very new to this, so if you can explain as well as help, that would be great. I just want to collect the links from my list of pages!

Thanks


Asked by kaci155 on 13.01.2019


Answers (2)


I'll say up front that I rarely use the urllib/3 library. However, I tried the scrapy shell command in a terminal, and I also tried the requests library without setting a User-Agent, and got a 200 response both times.

I noticed that you didn't specify a parser when creating `soup`:

 soup = BeautifulSoup(resp, from_encoding=resp.info().get_param('charset'))

I'm much more comfortable with scrapy's parser, even though it is heavier, but if I remember correctly you should declare the parser explicitly (otherwise BeautifulSoup picks one itself and emits a warning), for example:

soup = BeautifulSoup(resp, "lxml")

Bitto Bennichan says he managed to get a 200 response with urllib.request, so try his changes, which simply spell out a full User-Agent string.

My own suggestion would be to use the requests library. I think a simple change like this would be enough:

from bs4 import BeautifulSoup
import requests

listoflinks = ['https://www.spectatornews.com/page/6/?s=band', 'https://www.spectatornews.com/page/7/?s=band']

getarticles = []

for i in listoflinks:
    resp = requests.get(i)
    soup = BeautifulSoup(resp.content, "lxml")

    for link in soup.find_all('a', href=True):

        getarticles.append(link['href'])

The getarticles list came out as:

'https://www.spectatornews.com/category/showcase/',
 'https://www.spectatornews.com/showcase/2003/02/06/minneapolis-band-trips-into-eau-claire/',
 'https://www.spectatornews.com/category/showcase/',
 'https://www.spectatornews.com/page/5/?s=band',
 'https://www.spectatornews.com/?s=band',
 'https://www.spectatornews.com/page/2/?s=band',
 'https://www.spectatornews.com/page/3/?s=band',
 'https://www.spectatornews.com/page/4/?s=band',
 'https://www.spectatornews.com/page/5/?s=band',
 'https://www.spectatornews.com/page/7/?s=band',
 'https://www.spectatornews.com/page/8/?s=band',
 'https://www.spectatornews.com/page/9/?s=band',
 'https://www.spectatornews.com/page/127/?s=band',
 'https://www.spectatornews.com/page/7/?s=band',
 'https://www.spectatornews.com',
 'https://www.spectatornews.com/feed/rss/',
 '#',
 'https://www.youtube.com/channel/UC1SM8q3lk_fQS1KuY77bDgQ',
 'https://www.snapchat.com/add/spectator news',
 'https://www.instagram.com/spectatornews/',
 'http://twitter.com/spectatornews',
 'http://facebook.com/spectatornews',
 '/',
 'https://snosites.com/why-sno/',
 'http://snosites.com',
 'https://www.spectatornews.com/wp-login.php',
 '#top',
 '/',
 'https://www.spectatornews.com/category/campus-news/',
 'https://www.spectatornews.com/category/currents/',
 'https://www.spectatornews.com/category/sports/',
 'https://www.spectatornews.com/category/opinion/',
 'https://www.spectatornews.com/category/multimedia-2/',
 'https://www.spectatornews.com/ads/banner-advertise-with-the-spectator/',
 'https://www.spectatornews.com/category/campus-news/',
 'https://www.spectatornews.com/category/currents/',
 'https://www.spectatornews.com/category/sports/',
 'https://www.spectatornews.com/category/opinion/',
 'https://www.spectatornews.com/category/multimedia-2/',
 '/',
 'https://www.spectatornews.com/about/',
 'https://www.spectatornews.com/about/editorial-policy/',
 'https://www.spectatornews.com/about/correction-policy/',
 'https://www.spectatornews.com/about/bylaws/',
 'https://www.spectatornews.com/advertise/',
 'https://www.spectatornews.com/contact/',
 'https://www.spectatornews.com/staff/',
 'https://www.spectatornews.com/submit-a-letter/',
 'https://www.spectatornews.com/submit-a-news-tip/',
 '/',
 'https://www.spectatornews.com',
 'https://www.spectatornews.com/category/campus-news/',
 'https://www.spectatornews.com/category/currents/',
 'https://www.spectatornews.com/category/sports/',
 'https://www.spectatornews.com/category/opinion/',
 'https://www.spectatornews.com/category/multimedia-2/',
 '/',
 'https://www.spectatornews.com/feed/rss/',
 '#',
 'https://www.youtube.com/channel/UC1SM8q3lk_fQS1KuY77bDgQ',
 'https://www.snapchat.com/add/spectator news',
 'https://www.instagram.com/spectatornews/',
 'http://twitter.com/spectatornews',
 'http://facebook.com/spectatornews',
 'https://www.spectatornews.com/campus-news/2002/05/09/late-night-bus-service-idea-abandoned-due-to-expense/',
 'https://www.spectatornews.com/category/campus-news/',
 'https://www.spectatornews.com/opinion/2002/03/21/yates-deserved-what-she-got-husband-also-to-blame/',
 'https://www.spectatornews.com/category/opinion/',
 'https://www.spectatornews.com/opinion/2001/11/29/air-force-concert-band-inspires-zorn-arena-audience/',
 'https://www.spectatornews.com/category/opinion/',
 'https://www.spectatornews.com/campus-news/2001/10/25/goth-style-bands-will-entertain-at-halloween-costume-concert/',
 'https://www.spectatornews.com/category/campus-news/',
 'https://www.spectatornews.com/campus-news/2001/04/19/campus-group-will-host-hemp-event-with-bands-information/',
 'https://www.spectatornews.com/category/campus-news/',
 'https://www.spectatornews.com/currents/2018/12/10/geekin-out/',
 'https://www.spectatornews.com/currents/2018/12/10/geekin-out/',
 'https://www.spectatornews.com/staff/?writer=Alanna%20Huggett',
 'https://www.spectatornews.com/category/currents/',
 'https://www.spectatornews.com/tag/geekcon/',
 'https://www.spectatornews.com/tag/tv10/',
 'https://www.spectatornews.com/tag/uwec/',
 'https://www.spectatornews.com/opinion/2018/12/07/keeping-up-with-the-kar-fashions-11/',
 'https://www.spectatornews.com/opinion/2018/12/07/keeping-up-with-the-kar-fashions-11/',
 'https://www.spectatornews.com/staff/?writer=Kar%20Wei%20Cheng',
 'https://www.spectatornews.com/category/column-2/',
 'https://www.spectatornews.com/category/multimedia-2/',
 'https://www.spectatornews.com/category/opinion/',
 'https://www.spectatornews.com/tag/accessories/',
 'https://www.spectatornews.com/tag/fashion/',
 'https://www.spectatornews.com/tag/multimedia/',
 'https://www.spectatornews.com/tag/winter/',
 'https://www.spectatornews.com/multimedia-2/2018/12/07/a-magical-night/',
 'https://www.spectatornews.com/multimedia-2/2018/12/07/a-magical-night/',
 'https://www.spectatornews.com/staff/?writer=Julia%20Van%20Allen',
 'https://www.spectatornews.com/category/multimedia-2/',
 'https://www.spectatornews.com/tag/dancing/',
 'https://www.spectatornews.com/tag/harry-potter/',
 'https://www.spectatornews.com/tag/smom/',
 'https://www.spectatornews.com/tag/student-ministry-of-magic/',
 'https://www.spectatornews.com/tag/uwec/',
 'https://www.spectatornews.com/tag/yule/',
 'https://www.spectatornews.com/tag/yule-ball/',
 'https://www.spectatornews.com/campus-news/2018/11/26/old-news-5/',
 'https://www.spectatornews.com/campus-news/2018/11/26/old-news-5/',
 'https://www.spectatornews.com/staff/?writer=Madeline%20Fuerstenberg',
 'https://www.spectatornews.com/category/column-2/',
 'https://www.spectatornews.com/category/campus-news/',
 'https://www.spectatornews.com/tag/1950/',
 'https://www.spectatornews.com/tag/1975/',
 'https://www.spectatornews.com/tag/2000/',
 'https://www.spectatornews.com/tag/articles/',
 'https://www.spectatornews.com/tag/spectator/',
 'https://www.spectatornews.com/tag/throwback/',
 'https://www.spectatornews.com/currents/2018/11/21/boss-women-highlighting-businesswomen-in-eau-claire-6/',
 'https://www.spectatornews.com/currents/2018/11/21/boss-women-highlighting-businesswomen-in-eau-claire-6/',
 'https://www.spectatornews.com/staff/?writer=Taylor%20Reisdorf',
 'https://www.spectatornews.com/category/column-2/',
 'https://www.spectatornews.com/category/currents/',
 'https://www.spectatornews.com/tag/altoona/',
 'https://www.spectatornews.com/tag/boss-women/',
 'https://www.spectatornews.com/tag/business-women/',
 'https://www.spectatornews.com/tag/cherish-woodford/',
 'https://www.spectatornews.com/tag/crossfit/',
 'https://www.spectatornews.com/tag/crossfit-river-prairie/',
 'https://www.spectatornews.com/tag/eau-claire/',
 'https://www.spectatornews.com/tag/fitness/',
 'https://www.spectatornews.com/tag/gym/',
 'https://www.spectatornews.com/tag/local/',
 'https://www.spectatornews.com/tag/nicole-randall/',
 'https://www.spectatornews.com/tag/river-prairie/',
 'https://www.spectatornews.com/currents/2018/11/20/bad-art-good-music/',
 'https://www.spectatornews.com/currents/2018/11/20/bad-art-good-music/',
 'https://www.spectatornews.com/staff/?writer=Lea%20Kopke',
 'https://www.spectatornews.com/category/currents/',
 'https://www.spectatornews.com/tag/bad-art/',
 'https://www.spectatornews.com/tag/fmdown/',
 'https://www.spectatornews.com/tag/ghosts-of-the-sun/',
 'https://www.spectatornews.com/tag/music/',
 'https://www.spectatornews.com/tag/pablo-center/',
 'https://www.spectatornews.com/opinion/2018/11/14/the-tator-21/',
 'https://www.spectatornews.com/opinion/2018/11/14/the-tator-21/',
 'https://www.spectatornews.com/staff/?writer=Stephanie%20Janssen',
 'https://www.spectatornews.com/category/column-2/',
 'https://www.spectatornews.com/category/opinion/',
 'https://www.spectatornews.com/tag/satire/',
 'https://www.spectatornews.com/tag/sleepy/',
 'https://www.spectatornews.com/tag/tator/',
 'https://www.spectatornews.com/tag/uw-eau-claire/',
 'https://www.spectatornews.com/tag/uwec/',
 'https://www.spectatornews.com/page/6/?s=band',
 'https://www.spectatornews.com/?s=band',
 'https://www.spectatornews.com/page/2/?s=band',
 'https://www.spectatornews.com/page/3/?s=band',
 'https://www.spectatornews.com/page/4/?s=band',
 'https://www.spectatornews.com/page/5/?s=band',
 'https://www.spectatornews.com/page/6/?s=band',
 'https://www.spectatornews.com/page/8/?s=band',
 'https://www.spectatornews.com/page/9/?s=band',
 'https://www.spectatornews.com/page/10/?s=band',
 'https://www.spectatornews.com/page/127/?s=band',
 'https://www.spectatornews.com/page/8/?s=band',
 'https://www.spectatornews.com',
 'https://www.spectatornews.com/feed/rss/',
 '#',
 'https://www.youtube.com/channel/UC1SM8q3lk_fQS1KuY77bDgQ',
 'https://www.snapchat.com/add/spectator news',
 'https://www.instagram.com/spectatornews/',
 'http://twitter.com/spectatornews',
 'http://facebook.com/spectatornews',
 '/',
 'https://snosites.com/why-sno/',
 'http://snosites.com',
 'https://www.spectatornews.com/wp-login.php',
 '#top',
 '/',
 'https://www.spectatornews.com/category/campus-news/',
 'https://www.spectatornews.com/category/currents/',
 'https://www.spectatornews.com/category/sports/',
 'https://www.spectatornews.com/category/opinion/',
 'https://www.spectatornews.com/category/multimedia-2/']
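
The list above contains many repeats, in-page anchors such as '#' and '#top', relative links such as '/', and links to other sites. As a rough sketch of one way to tidy it up, the helper below resolves relative links, drops anchors and external hosts, and removes duplicates while keeping the order. The name `clean_links` and the choice to keep only same-host links are assumptions made for illustration, not part of the original answer.

```python
from urllib.parse import urljoin, urlparse


def clean_links(hrefs, base_url):
    """Resolve, filter, and de-duplicate hrefs scraped from base_url."""
    base_host = urlparse(base_url).netloc
    seen = set()
    cleaned = []
    for href in hrefs:
        if href.startswith('#'):
            continue  # in-page anchors such as '#' and '#top'
        absolute = urljoin(base_url, href)  # turns '/' into a full URL
        if urlparse(absolute).netloc != base_host:
            continue  # links to other sites (YouTube, Twitter, ...)
        if absolute not in seen:
            seen.add(absolute)
            cleaned.append(absolute)
    return cleaned


sample = [
    'https://www.spectatornews.com/category/showcase/',
    'https://www.spectatornews.com/category/showcase/',
    '#',
    '/',
    'http://twitter.com/spectatornews',
    'https://www.spectatornews.com/page/5/?s=band',
]
result = clean_links(sample, 'https://www.spectatornews.com/page/6/?s=band')
print(result)
```

In the scraping loop, the same helper could be called as `clean_links(getarticles, i)` once per page, or once at the end if every page shares the same host.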
Answered by Erick Guerra on 13.01.2019

403 FORBIDDEN

The server understood the request but refuses to authorize it.

406 NOT ACCEPTABLE

The target resource does not have a current representation that would be acceptable to the user agent, according to the proactive negotiation header fields received in the request, and the server is unwilling to supply a default representation.

Your User-Agent may be the problem. I was able to get a result by changing it:
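
As a small aside, the standard library knows both status codes, which can be handy when logging failures. This sketch only looks the codes up locally and sends no request; the `phrases` dictionary is a name chosen for illustration.

```python
from http import HTTPStatus

# Map each numeric code to its standard reason phrase.
phrases = {code: HTTPStatus(code).phrase for code in (403, 406)}

for code, phrase in phrases.items():
    # description is a short explanation bundled with the enum member
    print(code, phrase, '-', HTTPStatus(code).description)
```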

from bs4 import BeautifulSoup
import urllib.request
listoflinks=['https://www.spectatornews.com/page/6/?s=band','https://www.spectatornews.com/page/6/?s=band']
getarticles = []
for i in listoflinks:
    req = urllib.request.Request(
    i,
    headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
    }
    )
    resp= urllib.request.urlopen(req)
    soup = BeautifulSoup(resp, from_encoding=resp.info().get_param('charset'),features="lxml")
    for link in soup.find_all('a', href=True):
        getarticles.append(link['href'])
print(getarticles)

Output

['https://www.spectatornews.com/ads/banner-advertise-with-the-spectator/', 'https://www.spectatornews.com/category/campus-news/', 'https://www.spectatornews.com/category/currents/', 'https://www.spectatornews.com/category/sports/', 'https://www.spectatornews.com/category/opinion/', 'https://www.spectatornews.com/category/multimedia-2/', '/', 'https://www.spectatornews.com/about/', 'https://www.spectatornews.com/about/editorial-policy/', 'https://www.spectatornews.com/about/correction-policy/', 'https://www.spectatornews.com/about/bylaws/', 'https://www.spectatornews.com/advertise/', 'https://www.spectatornews.com/contact/', 'https://www.spectatornews.com/staff/', 'https://www.spectatornews.com/submit-a-letter/', 'https://www.spectatornews.com/submit-a-news-tip/', '/', 'https://www.spectatornews.com', 'https://www.spectatornews.com/category/campus-news/', 'https://www.spectatornews.com/category/currents/', 'https://www.spectatornews.com/category/sports/', 'https://www.spectatornews.com/category/opinion/', 'https://www.spectatornews.com/category/multimedia-2/', '/', 'https://www.spectatornews.com/feed/rss/', '#', 'https://www.youtube.com/channel/UC1SM8q3lk_fQS1KuY77bDgQ', 'https://www.snapchat.com/add/spectator news', 'https://www.instagram.com/spectatornews/', 'http://twitter.com/spectatornews', 'http://facebook.com/spectatornews', 'https://www.spectatornews.com/campus-news/2004/05/06/english-fest-draws-speakers-bands/', 'https://www.spectatornews.com/category/campus-news/', 'https://www.spectatornews.com/campus-news/2004/05/03/burgers-on-the-grill-bands-on-the-scene/', 'https://www.spectatornews.com/category/campus-news/', 'https://www.spectatornews.com/showcase/2004/04/29/hempfest-celebrates-its-10th-year-with-11-bands/', 'https://www.spectatornews.com/category/showcase/', 'https://www.spectatornews.com/showcase/2004/04/29/pat-mcgee-band-rocks-mad-town/', 'https://www.spectatornews.com/category/showcase/', 
'https://www.spectatornews.com/showcase/2004/04/22/leinenkugels-battle-of-the-bands/', 'https://www.spectatornews.com/category/showcase/', 'https://www.spectatornews.com/showcase/2004/04/08/on-the-music-scene-band-makes-mondays-better/', 'https://www.spectatornews.com/category/showcase/', 'https://www.spectatornews.com/showcase/2004/03/18/on-the-music-scene-band-carries-on-duluozs-work/', 'https://www.spectatornews.com/category/showcase/', 'https://www.spectatornews.com/showcase/2003/10/09/jamband-grooving-to-eau-claire/', 'https://www.spectatornews.com/category/showcase/', 'https://www.spectatornews.com/showcase/2003/05/01/joepalooza-set-with-5-bands-one-drummer/', 'https://www.spectatornews.com/category/showcase/', 'https://www.spectatornews.com/campus-news/2003/05/01/hempfest-features-nine-bands/', 'https://www.spectatornews.com/category/campus-news/', 'https://www.spectatornews.com/showcase/2003/02/17/houston-based-band-reaching-out-to-college-students-on-tour/', 'https://www.spectatornews.com/category/showcase/', 'https://www.spectatornews.com/showcase/2003/02/06/minneapolis-band-trips-into-eau-claire/', 'https://www.spectatornews.com/category/showcase/', 'https://www.spectatornews.com/page/5/?s=band', 'https://www.spectatornews.com/?s=band', 'https://www.spectatornews.com/page/2/?s=band', 'https://www.spectatornews.com/page/3/?s=band', 'https://www.spectatornews.com/page/4/?s=band', 'https://www.spectatornews.com/page/5/?s=band', 'https://www.spectatornews.com/page/7/?s=band', 'https://www.spectatornews.com/page/8/?s=band', 'https://www.spectatornews.com/page/9/?s=band', 'https://www.spectatornews.com/page/127/?s=band', 'https://www.spectatornews.com/page/7/?s=band', 'https://www.spectatornews.com', 'https://www.spectatornews.com/feed/rss/', '#', 'https://www.youtube.com/channel/UC1SM8q3lk_fQS1KuY77bDgQ', 'https://www.snapchat.com/add/spectator news', 'https://www.instagram.com/spectatornews/', 'http://twitter.com/spectatornews', 
'http://facebook.com/spectatornews', '/', 'https://snosites.com/why-sno/', 'http://snosites.com', 'https://www.spectatornews.com/wp-login.php', '#top', '/', 'https://www.spectatornews.com/category/campus-news/', 'https://www.spectatornews.com/category/currents/', 'https://www.spectatornews.com/category/sports/', 'https://www.spectatornews.com/category/opinion/', 'https://www.spectatornews.com/category/multimedia-2/', 'https://www.spectatornews.com/ads/banner-advertise-with-the-spectator/', 'https://www.spectatornews.com/category/campus-news/', 'https://www.spectatornews.com/category/currents/', 'https://www.spectatornews.com/category/sports/', 'https://www.spectatornews.com/category/opinion/', 'https://www.spectatornews.com/category/multimedia-2/', '/', 'https://www.spectatornews.com/about/', 'https://www.spectatornews.com/about/editorial-policy/', 'https://www.spectatornews.com/about/correction-policy/', 'https://www.spectatornews.com/about/bylaws/', 'https://www.spectatornews.com/advertise/', 'https://www.spectatornews.com/contact/', 'https://www.spectatornews.com/staff/', 'https://www.spectatornews.com/submit-a-letter/', 'https://www.spectatornews.com/submit-a-news-tip/', '/', 'https://www.spectatornews.com', 'https://www.spectatornews.com/category/campus-news/', 'https://www.spectatornews.com/category/currents/', 'https://www.spectatornews.com/category/sports/', 'https://www.spectatornews.com/category/opinion/', 'https://www.spectatornews.com/category/multimedia-2/', '/', 'https://www.spectatornews.com/feed/rss/', '#', 'https://www.youtube.com/channel/UC1SM8q3lk_fQS1KuY77bDgQ', 'https://www.snapchat.com/add/spectator news', 'https://www.instagram.com/spectatornews/', 'http://twitter.com/spectatornews', 'http://facebook.com/spectatornews', 'https://www.spectatornews.com/campus-news/2004/05/06/english-fest-draws-speakers-bands/', 'https://www.spectatornews.com/category/campus-news/', 
'https://www.spectatornews.com/campus-news/2004/05/03/burgers-on-the-grill-bands-on-the-scene/', 'https://www.spectatornews.com/category/campus-news/', 'https://www.spectatornews.com/showcase/2004/04/29/hempfest-celebrates-its-10th-year-with-11-bands/', 'https://www.spectatornews.com/category/showcase/', 'https://www.spectatornews.com/showcase/2004/04/29/pat-mcgee-band-rocks-mad-town/', 'https://www.spectatornews.com/category/showcase/', 'https://www.spectatornews.com/showcase/2004/04/22/leinenkugels-battle-of-the-bands/', 'https://www.spectatornews.com/category/showcase/', 'https://www.spectatornews.com/showcase/2004/04/08/on-the-music-scene-band-makes-mondays-better/', 'https://www.spectatornews.com/category/showcase/', 'https://www.spectatornews.com/showcase/2004/03/18/on-the-music-scene-band-carries-on-duluozs-work/', 'https://www.spectatornews.com/category/showcase/', 'https://www.spectatornews.com/showcase/2003/10/09/jamband-grooving-to-eau-claire/', 'https://www.spectatornews.com/category/showcase/', 'https://www.spectatornews.com/showcase/2003/05/01/joepalooza-set-with-5-bands-one-drummer/', 'https://www.spectatornews.com/category/showcase/', 'https://www.spectatornews.com/campus-news/2003/05/01/hempfest-features-nine-bands/', 'https://www.spectatornews.com/category/campus-news/', 'https://www.spectatornews.com/showcase/2003/02/17/houston-based-band-reaching-out-to-college-students-on-tour/', 'https://www.spectatornews.com/category/showcase/', 'https://www.spectatornews.com/showcase/2003/02/06/minneapolis-band-trips-into-eau-claire/', 'https://www.spectatornews.com/category/showcase/', 'https://www.spectatornews.com/page/5/?s=band', 'https://www.spectatornews.com/?s=band', 'https://www.spectatornews.com/page/2/?s=band', 'https://www.spectatornews.com/page/3/?s=band', 'https://www.spectatornews.com/page/4/?s=band', 'https://www.spectatornews.com/page/5/?s=band', 'https://www.spectatornews.com/page/7/?s=band', 'https://www.spectatornews.com/page/8/?s=band', 
'https://www.spectatornews.com/page/9/?s=band', 'https://www.spectatornews.com/page/127/?s=band', 'https://www.spectatornews.com/page/7/?s=band', 'https://www.spectatornews.com', 'https://www.spectatornews.com/feed/rss/', '#', 'https://www.youtube.com/channel/UC1SM8q3lk_fQS1KuY77bDgQ', 'https://www.snapchat.com/add/spectator news', 'https://www.instagram.com/spectatornews/', 'http://twitter.com/spectatornews', 'http://facebook.com/spectatornews', '/', 'https://snosites.com/why-sno/', 'http://snosites.com', 'https://www.spectatornews.com/wp-login.php', '#top', '/', 'https://www.spectatornews.com/category/campus-news/', 'https://www.spectatornews.com/category/currents/', 'https://www.spectatornews.com/category/sports/', 'https://www.spectatornews.com/category/opinion/', 'https://www.spectatornews.com/category/multimedia-2/']
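
To apply the same header to every link without repeating it, the request construction can be pulled into a small helper. This is a sketch only: `build_request` and `BROWSER_UA` are hypothetical names, and the `Accept` header is an assumption added because a 406 concerns content negotiation. Whether this particular site needs it has not been verified.

```python
import urllib.request

# Full browser User-Agent string taken from the code above.
BROWSER_UA = (
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
)


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request carrying browser-like headers (nothing is sent yet)."""
    return urllib.request.Request(
        url,
        headers={
            'User-Agent': user_agent,
            # Assumption: advertise that HTML is acceptable.
            'Accept': 'text/html,application/xhtml+xml;q=0.9,*/*;q=0.8',
        },
    )


req = build_request('https://www.spectatornews.com/page/6/?s=band')
# urllib stores header names capitalized as 'User-agent'.
print(req.get_header('User-agent'))
```

Inside the loop this would replace the inline `urllib.request.Request(...)` call with `req = build_request(i)`.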

Edit: handling 404 errors

Some links in your list may be unavailable. One option is to use a try-except block to catch those and carry on with the remaining links.

So the final code would be:

from bs4 import BeautifulSoup
import urllib.error
import urllib.request

listoflinks=['https://www.spectatornews.com/page/6/?s=band','https://www.spectatornews.com/page/6/?s=band','https://www.spectatornews.com/page/100099?s=band','http://sdfgsdjhgfjsgdhfgsj.com']
getarticles = []
for i in listoflinks:
    req = urllib.request.Request(
        i,
        headers={
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
        }
    )
    try:
        resp = urllib.request.urlopen(req)
    except urllib.error.HTTPError as e:
        if e.code == 404:
            # The page does not exist: skip it and move on to the next link
            print("Unavailable link",i," skipping---")
            continue
        raise
    except urllib.error.URLError:
        # The host could not be reached at all (e.g. the name does not resolve)
        print("Unreachable link",i," skipping---")
        continue
    soup = BeautifulSoup(resp, from_encoding=resp.info().get_param('charset'),features="lxml")
    for link in soup.find_all('a', href=True):
        getarticles.append(link['href'])
print(getarticles)
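
The last URL in the list above does not resolve at all, and urllib reports that case as `URLError` rather than `HTTPError`. Since `HTTPError` is a subclass of `URLError`, the more specific class has to be checked first. The sketch below shows the distinction on hand-built exception objects, with no network access; `describe_failure` is a hypothetical helper name.

```python
import urllib.error


def describe_failure(url, exc):
    """Summarize why fetching url failed, checking HTTPError before URLError."""
    if isinstance(exc, urllib.error.HTTPError):
        # The server was reached and answered with an error status.
        return f'{url}: server answered with HTTP {exc.code}'
    if isinstance(exc, urllib.error.URLError):
        # No usable answer at all (DNS failure, refused connection, ...).
        return f'{url}: could not reach server ({exc.reason})'
    raise exc


# Hand-built exceptions standing in for what urlopen() would raise.
not_found = urllib.error.HTTPError(
    'https://www.spectatornews.com/page/100099?s=band', 404, 'Not Found', None, None
)
unreachable = urllib.error.URLError('name resolution failed')

print(describe_failure(not_found.filename, not_found))
print(describe_failure('http://sdfgsdjhgfjsgdhfgsj.com', unreachable))
```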
Answered by Bitto Bennichan on 13.01.2019
Comment
Hmm, for some reason it doesn't work. With your code I get a 404 error. Any other ideas? Thanks for trying! - kaci155, 13.01.2019
Comment
I was finally able to find my own user agent, and that change fixed it. Thank you so much!!! - kaci155, 14.01.2019
Comment
I'll also try your edit if the problem persists on other pages. Thank you! - kaci155, 14.01.2019