Web scraping in Python (Part 3): Building a dataset


This is part 3 of an introductory web scraping tutorial. In this video, we’ll create a structured dataset from a New York Times article using Python’s Beautiful Soup library.
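
As a rough sketch of what "building a dataset" means here: each entry in the article is parsed into a (date, lie, explanation, url) tuple. The HTML string below is a hand-written stand-in for a single entry, so its exact markup is an assumption rather than a copy of the page. The slicing offsets follow the loop quoted in the comments further down.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one entry of the article (assumed markup).
html = (
    '<span class="short-desc"><strong>Jan. 21&nbsp;</strong>'
    '\u201cI wasn\'t a fan of Iraq. I didn\'t want to go into Iraq.\u201d '
    '<span class="short-truth">'
    '<a href="https://www.buzzfeed.com/andrewkaczynski/'
    'in-2002-donald-trump-said-he-supported-invading-iraq-on-the">'
    '(He was for an invasion before he was against it.)</a></span></span>'
)

soup = BeautifulSoup(html, 'html.parser')
results = soup.find_all('span', attrs={'class': 'short-desc'})

records = []
for result in results:
    # Drop the trailing non-breaking space, then append the year.
    date = result.find('strong').text[0:-1] + ', 2017'
    # The second child is the quoted text: strip the opening curly quote,
    # the closing curly quote, and the trailing space.
    lie = result.contents[1][1:-2]
    # Strip the surrounding parentheses from the link text.
    explanation = result.find('a').text[1:-1]
    url = result.find('a')['href']
    records.append((date, lie, explanation, url))

print(records[0])
```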

Watch the 4-video series: https://www.youtube.com/playlist?list=PL5-da3qGB5IDbOi0g5WFh1YPDNzXw4LNL

== RESOURCES ==
Download the Jupyter notebook: https://github.com/justmarkham/trump-lies
New York Times article: https://www.nytimes.com/interactive/2017/06/23/opinion/trumps-lies.html
Beautiful Soup documentation:…

29 thoughts on “Web scraping in Python (Part 3): Building a dataset”

  1. I have never commented on any YouTube video, but man, you explained it very well; you are an awesome teacher. Right from the start you explained every single thing well and clearly. Thank you so much for this video tutorial. God bless you with much more teaching ability.

  2. I keep running into one problem:
    AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
    Can you tell me how to solve this? My code is below:
    import requests
    r = requests.get('https://www.xxxx.com/xxxx_xxxx_1-0-1.html')
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(r.text, 'html.parser')
    results = soup.find_all('div', attrs={'class':'product-item item-template-0 alternative'})
    records = []
    for result in results:
        name = results.find('div', attrs={'class':'name'}).text
        price = results.find('div', attrs={'class':'price'}).text[13:-11]
        records.append((name, price,))
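
The loop in the comment above calls `results.find(...)`, where `results` is the whole ResultSet returned by `find_all()`. The loop variable `result` (singular) is the individual tag that supports `find()`. Below is a minimal sketch of the corrected loop. The commenter's page is not shown, so the HTML and the product names and prices are made up for illustration, and the `[13:-11]` slice is left out because it depends on the real page's text.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the commenter's page.
html = """
<div class="product-item item-template-0 alternative">
  <div class="name">Widget</div>
  <div class="price">9.99</div>
</div>
<div class="product-item item-template-0 alternative">
  <div class="name">Gadget</div>
  <div class="price">19.50</div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
results = soup.find_all('div', attrs={'class': 'product-item item-template-0 alternative'})

records = []
for result in results:
    # Call find() on the single tag `result`, not on the list `results`.
    name = result.find('div', attrs={'class': 'name'}).text
    price = result.find('div', attrs={'class': 'price'}).text
    records.append((name, price))

print(records)
```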

  3. Hi, dear Sir, thank you very much for your lesson. I have a question: why is there a u' in front of every one of my results,
    just like this?
    records[0:3]
    u"I wasn't a fan of Iraq. I didn't want to go into Iraq.",
    u'He was for an invasion before he was against it.',
    u'https://www.buzzfeed.com/andrewkaczynski/in-2002-donald-trump-said-he-supported-invading-iraq-on-the'),
    (u'Jan. 21, 2017',
    u'A reporter for Time magazine \u2014 and I have been on their cover 14 or 15 times. I think we have the all-time record in the history of Time magazine.',
    u'Trump was on the cover 11 times and Nixon appeared 55 times.',
    u'http://nation.time.com/2013/11/06/10-things-you-didnt-know-about-time/'),
    (u'Jan. 23, 2017',
    u'Between 3 million and 5 million illegal votes caused me to lose the popular vote.',
    u"There's no evidence of illegal voting.",
    u'https://www.nytimes.com/2017/01/23/us/politics/donald-trump-congress-democrats.html')]
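
The `u'` prefix in the output above is how Python 2 displays Unicode strings when it shows their repr, for example when a list is echoed at the end of a notebook cell. It is not part of the stored text. The short sketch below runs under Python 3, where the prefix is accepted in source code but no longer shown. The variable names are illustrative only.

```python
# A record shaped like the ones in the comment; the u prefix is optional in Python 3.
record = (u'Jan. 23, 2017', u"There's no evidence of illegal voting.")

# Echoing a container shows each string's repr. Python 2 would include the
# u prefix here; Python 3 does not.
print(repr(record))

# Printing an individual string shows only its text, in either version.
for field in record:
    print(field)

# The prefix makes no difference to the value itself in Python 3.
same_value = (u'Jan. 23, 2017' == 'Jan. 23, 2017')
```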

  4. great video series – really enjoying it. I did however have an error pop up and I was wondering if you could resolve it:

    I got this error twice:

    records = []
    for result in results:
        date = result.find('strong').text[0:-1] + ', 2017'
        lie = result.contents[1][1:-2]
        explanation = result.find('a').text[1:-1]
        url = result.find('a')['href']
        records.append((date, lie, explanation, url))
    len(records)

    TypeError Traceback (most recent call last)
    <ipython-input-92-ec0caa6d8f0e> in <module>()
    ----> 1 len(records)

    TypeError: 'list' object is not callable

    results = soup.find_all('span', attrs={'class':'short-desc'})
    len(results)
    ---------------------------------------------------------------------------
    TypeError Traceback (most recent call last)
    <ipython-input-74-391947dba58b> in <module>()
    ----> 1 len(results)

    TypeError: 'list' object is not callable
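
Both tracebacks above fail on the call to `len` itself, which means the name `len` is bound to a list in that notebook session. The most likely cause is an earlier cell that assigned something like `len = [...]`, although that cell is not shown, so this is an assumption. The sketch below reproduces the error and restores the built-in; in a notebook, running `del len` or restarting the kernel should have the same effect.

```python
import builtins

records = [('Jan. 21, 2017', 'lie', 'explanation', 'url')]

# Accidentally shadowing the built-in with a list (hypothetical earlier cell).
len = [1, 2, 3]

try:
    len(records)
    error_message = None
except TypeError as exc:
    error_message = str(exc)  # "'list' object is not callable"

# Point the name back at the real built-in function.
len = builtins.len
record_count = len(records)
print(error_message, record_count)
```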

  5. You know what, because of your perfect explanation, I'm hitting like on every video I watch on your channel, and I really don't press the like button unless I'm truly happy with the content.
    Big thanks, man.

  6. I rarely write comments on any YouTube video, but seriously, man, the way you explained everything, from HTML tags right through to the extraction of data, was easy to understand and awesome.
    Thank you so much for making such an informative tutorial.

  7. for result in results:
    File "<ipython-input-101-f2643cad9b93>", line 1
    for result in results:
    ^
    SyntaxError: unexpected EOF while parsing

    This is the error I am getting. What am I supposed to do? Please advise.
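
The traceback above shows a cell containing only the line `for result in results:`. A `for` statement needs at least one indented line after the colon, so Python reaches the end of the input while still expecting the loop body. The sketch below checks this with `compile()` and then runs a complete loop. The list contents and the loop body are placeholders, and the exact error wording varies between Python versions.

```python
results = ['first', 'second']  # placeholder data

# A loop header with no body is not valid Python on its own.
incomplete_cell = "for result in results:\n"
try:
    compile(incomplete_cell, '<cell>', 'exec')
    header_alone_is_valid = True
except SyntaxError:
    header_alone_is_valid = False

# The same header followed by an indented body runs fine.
collected = []
for result in results:
    collected.append(result.upper())

print(header_alone_is_valid, collected)
```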

  8. You are so far the best teacher that I have encountered in the whole online space.
    Your methodology is second to none.
    You are so precise in your build up of topics covered, order of presentation, speech speed, thought structure and the list goes on.
    Simply brilliant. Thank you.

    P.S. Do you have trouble living in such a disorderly world? 🙂
