Use Python to read an HTML element by XPath
Ubuntu 16.04.4 LTS
- Use Python 3 and lxml to read the data-endpoint attribute of an HTML element by XPath.
 HTML file:
 ...<div data-listing="article" data-endpoint="https://www.sample.com/article-list.json" ...
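The attribute lookup can be tried offline before touching the network. The sketch below assumes lxml is installed; `SAMPLE_HTML` is a hypothetical stand-in for the real page, and the second div (with its video-list URL) is invented only to show that the predicate filters on data-listing.

```python
from lxml import html

# Hypothetical stand-in for the downloaded page. Only the first div mirrors
# the snippet above; the "video" div is made up to show the filter at work.
SAMPLE_HTML = """
<html><body>
  <div data-listing="article" data-endpoint="https://www.sample.com/article-list.json"></div>
  <div data-listing="video" data-endpoint="https://www.sample.com/video-list.json"></div>
</body></html>
"""

tree = html.fromstring(SAMPLE_HTML)
# "/@data-endpoint" returns the attribute values (strings), not the elements.
endpoints = tree.xpath('//div[@data-listing="article"]/@data-endpoint')
print(endpoints)
```

Only the div whose data-listing equals "article" contributes a value, so the list holds a single URL.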
- Use the data-endpoint URL to fetch and parse the JSON data of the article list.
 JSON of an article list:
 [
   {
     chunks:
     [
       {
         title: "Test Title",
         url: "https://www.sample.com/article1.html"
       },
       {...},
       {...}
     ]
   }
 ]
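To see how this structure is walked, here is a small sketch using only the standard library. `SAMPLE_JSON` is a hypothetical payload that keeps just the one article spelled out above, with keys quoted so that it is valid JSON; the key is written as chunks, which is the name the script reads.

```python
import json

# Hypothetical payload: the single spelled-out article from the listing above,
# with quoted keys so json.loads accepts it.
SAMPLE_JSON = """
[
  {
    "chunks": [
      {"title": "Test Title", "url": "https://www.sample.com/article1.html"}
    ]
  }
]
"""

data = json.loads(SAMPLE_JSON)
# The top level is a list; its first item holds the "chunks" array of articles.
articles = data[0]["chunks"]
for article in articles:
    print(article["url"], "\t", article["title"])
```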
- Install lxml
 sudo apt-get install python3-lxml
- Python script test.py:
 from lxml import html
 import requests

 urls = ["https://www.sample.com/sample.html"]
 for url in urls:
     page = requests.get(url)
     content = html.fromstring(page.content)
     # Select the data-endpoint attribute of every article listing div
     endpoints = content.xpath('//div[@data-listing="article"]/@data-endpoint')
     for endpoint in endpoints:
         r = requests.get(endpoint)
         data = r.json()[0]
         for article in data['chunks']:
             print(url, "\t", article['url'], "\t", article['title'])
- Run the script
 python3 test.py
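The script makes no allowance for slow servers or HTTP errors. One possible variant, offered as a sketch rather than a drop-in replacement, wraps the same steps in a generator. It assumes requests and lxml are installed; the function name `fetch_articles`, the `get` parameter (there only so the HTTP call can be swapped out), and the 10-second timeout are all hypothetical choices, not part of the original script.

```python
import requests
from lxml import html

TIMEOUT = 10  # seconds; an assumed value, not taken from the original script


def fetch_articles(url, get=requests.get):
    """Yield (article_url, title) pairs for each article listing on the page.

    `get` is a hypothetical hook: any callable with the same shape as
    requests.get can be passed in, which makes the function easy to test.
    """
    page = get(url, timeout=TIMEOUT)
    page.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    tree = html.fromstring(page.content)
    for endpoint in tree.xpath('//div[@data-listing="article"]/@data-endpoint'):
        r = get(endpoint, timeout=TIMEOUT)
        r.raise_for_status()
        for article in r.json()[0]["chunks"]:
            yield article["url"], article["title"]
```

Looping over `fetch_articles("https://www.sample.com/sample.html")` and printing each pair would give the same columns as test.py, minus the page URL.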