Python, Grab me a Cartoon
Scraping the satire

Everyone loves cartoons and comics, eh? And when it comes to political satire, most of you won't say no. Like this one: you may never laugh like that again in your life. Fortunately, I had a nice night scraping the Cartoon-scape from The Hindu, India's century-old newspaper, in which renowned cartoonists like R. K. Laxman and K. Shankar Pillai once ruled the pen.

What?

Scrape all visible Cartoon-scape images (in .jpg format) from the English website of The Hindu and store them in a folder, naming each one after its respective date.
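The date-based naming can be sketched with the standard library alone. This is a minimal sketch, not the code from the gist: it assumes the image bytes have already been downloaded, and the `YYYY-MM-DD.jpg` pattern (with a hypothetical `_1`, `_2` suffix for extra cartoons on the same day) plus the helper names `cartoon_path` and `save_cartoon` are all illustrative assumptions.

```python
from datetime import date
from pathlib import Path


def cartoon_path(folder: Path, day: date, index: int = 0) -> Path:
    """Build the save path for a cartoon published on `day`.

    Hypothetical convention: 2016-04-29.jpg for the first cartoon of a day,
    then 2016-04-29_1.jpg, 2016-04-29_2.jpg, ... for any extras.
    """
    stem = day.isoformat()  # e.g. "2016-04-29"
    if index:
        stem = f"{stem}_{index}"
    return folder / f"{stem}.jpg"


def save_cartoon(folder: Path, day: date, image_bytes: bytes, index: int = 0) -> Path:
    """Write already-downloaded JPEG bytes to the date-named path."""
    folder.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    path = cartoon_path(folder, day, index)
    path.write_bytes(image_bytes)
    return path
```

For example, `save_cartoon(Path("cartoons"), date(2016, 4, 29), jpeg_bytes)` would write `cartoons/2016-04-29.jpg`. ISO dates are used because they sort chronologically when the folder is listed by name.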

Why?

It's fun to catch up on the cartoons you missed. Each one is just a small image column in a corner of the morning paper, yet it reveals the nation's hot topic, or lands a gentle kick on whatever or whoever caused yesterday's mess. Gaining that knowledge this way is pure ecstasy, and it's also a chance to test your skills.

One of my Dronas (mentors) mentioned the book on his blog, and that got me hooked on cartoons and comics all over again.

How?

Below is the gist containing the code. It's Python, so there's not much to explain: the whole pizza is in plain sight.

Interestingly, I had never used the mechanize module much, having curled up under the invisibility cloak of requests/bs4. But mechanize has some handy features, such as listing the current page's links with regex filters like url_regex and text_regex. As far as I can tell, that let me get the job done in just a few lines of code.
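To show what that filtering does, here is a standard-library stand-in that roughly mimics mechanize's `url_regex`/`text_regex` link filters (in mechanize itself this is a call along the lines of `br.links(url_regex=...)`). It is a sketch, not the author's code: `LinkCollector` and `filter_links` are hypothetical names, and matching against the *absolute* URL with `re.search` is an assumption of this sketch.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute_url, link_text) pairs for every <a href> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def filter_links(html, base_url, url_regex=None, text_regex=None):
    """Keep links whose URL and text match the given patterns (re.search semantics)."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return [
        (url, text)
        for url, text in parser.links
        if (url_regex is None or re.search(url_regex, url))
        and (text_regex is None or re.search(text_regex, text))
    ]
```

For instance, `filter_links(page_html, "https://example.com/", url_regex=r"(?i)\.jpg$")` keeps only links ending in `.jpg`, which is the kind of one-line filter that made mechanize attractive here.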

References:

  1. Interestingly, while searching Google for earlier projects or code on scraping The Hindu, I came across this one.
  2. @2020saurav's random script repo has one such script, but it is outdated.
© *Nobody*
*****
Written by Thilip Varadharajan on 29 April 2016