Archive: Web crawling and data extraction

Budget: $516 per month
Posted: 5 years ago
Closed
Description
We have a list of around 100 UK websites that each publish certain data sets in a particular section of the site. They publish at different times: some monthly, some quarterly, some at irregular intervals.
We would like a piece of software that scans these websites (either continuously or on a daily schedule) and alerts us as soon as any of them publishes a new file of this particular data.
Some of the pages have pop-ups that must be clicked away, and some publish several other data sets alongside the one we're interested in.
The data sets are normally published as xls or csv files.
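
To illustrate the requirement, here is a minimal sketch of how a scanner might pick out only the data-file links on a page and ignore everything else. It uses the Python standard library only. The names `DataLinkParser`, `find_data_links`, the `keyword` filter and the extension list are assumptions made for illustration, not part of any existing program, and pages that need a pop-up dismissed would likely need a browser automation tool on top of this.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed file types; extend if a site publishes other formats.
DATA_EXTENSIONS = (".xls", ".xlsx", ".csv")


class DataLinkParser(HTMLParser):
    """Collects absolute URLs of <a href> links that point at data files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)
        # Check the path only, so query strings such as ?v=2 are ignored.
        if urlparse(absolute).path.lower().endswith(DATA_EXTENSIONS):
            self.links.append(absolute)


def find_data_links(html, base_url, keyword=None):
    """Return data-file links in page order, optionally filtered by keyword.

    `keyword` is a hypothetical per-site hint (part of the file name of the
    data set of interest) used to skip unrelated data sets on the same page.
    """
    parser = DataLinkParser(base_url)
    parser.feed(html)
    if keyword is None:
        return parser.links
    return [url for url in parser.links if keyword.lower() in url.lower()]
```

A per-site keyword is one simple way to separate the wanted data set from the others published on the same page; sites with less predictable file names would need a different rule.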

We had a small program written 6 months ago (via PPH), but it is malfunctioning and the original author will not respond to our messages. It also created many false positives, since it sent an alert whenever anything changed on the page, not just the data we wanted.
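
One way to avoid that kind of false positive is to compare the set of data-file URLs seen on each run rather than the whole page. The sketch below is an assumption about how this could be done, not a description of the earlier program. The function name `new_files` and the JSON state file are hypothetical.

```python
import json
from pathlib import Path


def new_files(current_links, state_path):
    """Return links not seen on earlier runs, then record them as seen.

    `state_path` is a hypothetical JSON file holding the URLs already
    reported, so unrelated page changes never trigger an alert.
    """
    path = Path(state_path)
    seen = set(json.loads(path.read_text())) if path.exists() else set()
    # dict.fromkeys drops duplicate links while keeping page order.
    fresh = [url for url in dict.fromkeys(current_links) if url not in seen]
    if fresh:
        path.write_text(json.dumps(sorted(seen | set(fresh))))
    return fresh
```

An alert would be sent only when the returned list is non-empty. Note that a site that overwrites a file under the same URL would go unnoticed by this approach; comparing a hash of the file contents would be needed for that case.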
Skills:
microsoft excel, data extraction, resume (CV) writing, software development, web, web crawling
Category