
Scraping HTML Content Rendered by JavaScript Plugins

Behold, warrior: we are going past the gates of the unknown.

I am sure all of you have, at some point, scraped a web page using your favourite programming language. It’s pretty straightforward: just load the page and run a regex over it. Some sage individuals would prefer parsing the DOM instead.
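To make the two approaches concrete, here is a minimal sketch in Python using only the standard library. The HTML string and the function names are made up for illustration; in a real scraper the markup would come from the page you loaded.

```python
import re
from html.parser import HTMLParser

# Hypothetical static page, inlined so the example runs without a network call.
SAMPLE_HTML = """
<html><body>
  <h1>Plugins</h1>
  <a href="/plugins/comments">Comments</a>
  <a href="/plugins/feed">Feed</a>
</body></html>
"""


def links_with_regex(html):
    """Quick but brittle: pull href values out with a regular expression."""
    return re.findall(r'<a\s+[^>]*href="([^"]*)"', html)


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag the parser encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def links_with_parser(html):
    """Sturdier: let an HTML parser walk the markup instead of a regex."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links
```

Both functions find the same links on this simple page. The parser version tends to cope better once attributes are reordered or quoted differently. Neither will see anything that a script injects after the page has loaded, which is the problem the rest of this post deals with.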

But what about plugins whose content is rendered dynamically by JavaScript?

Don’t fret! Here’s how you can do it:

1. Go to the plugin’s page.
2. Select the plugin you want to scrape content from.
3. It will give you a snippet of JavaScript code.

Take that code and embed it in a page of your own. Now open the web page you just created; it will render the plugin as expected. But here’s the trick: most, if not all, of these plugins fire an XHR request to pull content from their servers. This is where you need to dive into your browser’s developer tools (Inspect Element, Firebug, etc.).

Open your browser’s developer tools and go to the Network tab. Reload the page and look for the XHR requests being made. If you look closely enough, you can zero in on the request made by the plugin. Copy that URL and paste it into another window.

Voila!!

You now have the data that was rendered by the plugin. The format of the returned data will depend on how the plugin’s website implements it, but you will have the data in some format that you can process.
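Once you have the URL, you can request it from a script instead of a browser window. The sketch below is one way to do that in Python, under a few loudly stated assumptions: `PLUGIN_XHR_URL` is a placeholder for whatever URL you copied from the Network tab, the helper names are my own, and the endpoint is assumed to return JSON or JSONP (JSON wrapped in a callback). Your plugin may well return something else, such as HTML or XML, in which case only the fetching half applies.

```python
import json
import re
import urllib.request

# Placeholder: replace with the XHR URL copied from the Network tab.
PLUGIN_XHR_URL = "https://example.com/plugin/data?id=123"


def fetch_raw(url, timeout=10):
    """Request the endpoint directly and return the response body as text."""
    # Some servers reject requests without a browser-like User-Agent
    # (an assumption; yours may not care, or may require other headers).
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def decode_payload(text):
    """Decode JSON, first unwrapping a JSONP callback like cb({...}); if present."""
    text = text.strip()
    match = re.match(r"^[\w$.]+\s*\((.*)\)\s*;?$", text, re.DOTALL)
    if match:
        text = match.group(1)
    return json.loads(text)
```

With a real URL in place, `decode_payload(fetch_raw(PLUGIN_XHR_URL))` should hand you ordinary Python dictionaries and lists to work with. If the request fails outside the browser, compare the request headers shown in the Network tab with the ones the script sends; the endpoint may expect a referrer or a cookie.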

Published by

Neeraj Kumar

#technologist #musician #traveller