Web Scraper That Can Bypass Distil Protection For theknot.com
Budget: $194 per month
Posted: 5 years ago
Opened
- Description
I need an experienced web scraping specialist who is well versed in scraping methods and architecture that can get past Distil anti-bot protection on theknot.com.
The goal is to scrape basic data (listed below) for all the wedding venues in the United States on theknot.com.
Starting with this page:
https://www.theknot.com/marketplace/wedding-reception-venues?redirectToCity=false
Then go through every state listed at the bottom of that page, and every city listed at the bottom of each state's page. For each city, cycle through all the pages of results and capture the URL of every venue.
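The traversal order described above (start page, then states, then cities, then result pages, then venues) could be sketched roughly as follows. This is only an outline under assumptions: the four `list_*` callables are hypothetical placeholders for whatever page-parsing code is actually used, and nothing here fetches or parses theknot.com itself.

```python
from typing import Callable, Iterable, List

# Hypothetical type: given a page URL, return the URLs found on it.
LinkLister = Callable[[str], Iterable[str]]


def collect_venue_urls(
    start_url: str,
    list_states: LinkLister,
    list_cities: LinkLister,
    list_result_pages: LinkLister,
    list_venues: LinkLister,
) -> List[str]:
    """Walk start page -> states -> cities -> result pages -> venue URLs.

    Duplicates are kept on purpose; deduplication is a separate later step.
    """
    venue_urls: List[str] = []
    for state_url in list_states(start_url):
        for city_url in list_cities(state_url):
            for page_url in list_result_pages(city_url):
                venue_urls.extend(list_venues(page_url))
    return venue_urls
```

Passing the page-parsing functions in as arguments keeps the crawl order separate from the parsing details, so the loop can be tested against a small fake site before it is pointed at real pages.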
Once all URLs are captured, deduplicate them, since there will be a lot of crossover between cities.
(I would just use a sitemap to find all the URLs instead of scraping, but it appears this site either doesn't have a marketplace sitemap or hides it very well.)
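For the deduplication step, a minimal sketch might look like the one below. The normalization rules (lowercasing the scheme and host, dropping the query string and fragment, stripping a trailing slash) are assumptions about what counts as "the same venue"; they would need to be checked against the real URLs, since a query string could in principle be significant. The `example.com` URLs in the usage note are made up for illustration.

```python
from typing import Iterable, List
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to a canonical form used as the deduplication key.

    Assumption: query strings and fragments do not identify a different venue.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def dedupe_urls(urls: Iterable[str]) -> List[str]:
    """Return normalized URLs with duplicates removed, keeping first-seen order."""
    seen = set()
    unique: List[str] = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

For example, `dedupe_urls(["https://example.com/venue-a/", "https://EXAMPLE.com/venue-a?ref=city"])` collapses both entries into a single `https://example.com/venue-a`.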
Once the final list of wedding venues is complete and deduplicated, go to each URL and scrape the following into a CSV:
• Domain (website) of the venue
• Address of venue
• Facebook URL
• Instagram URL
• Twitter URL
• Pinterest URL
• Guest Capacity
• Settings (a field under amenities)
• Phone Number
• Array of the image URLs used in the slideshow
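The CSV layout implied by the list above could be sketched as follows. The column names and the choice to join the slideshow URLs with a `|` separator inside one cell are assumptions for illustration, not something the listing specifies; the venue records are assumed to arrive as plain dictionaries.

```python
import csv
from typing import Dict, Iterable, TextIO

# Hypothetical column names, one per requested field.
FIELDS = [
    "domain",
    "address",
    "facebook_url",
    "instagram_url",
    "twitter_url",
    "pinterest_url",
    "guest_capacity",
    "settings",
    "phone_number",
    "slideshow_urls",
]


def venue_to_row(venue: Dict) -> Dict[str, str]:
    """Flatten one venue record into a CSV row; missing fields become empty."""
    row = {field: venue.get(field, "") for field in FIELDS}
    # The slideshow is a list, so it is packed into a single cell.
    row["slideshow_urls"] = "|".join(venue.get("slideshow_urls", []))
    return row


def write_venues_csv(venues: Iterable[Dict], fileobj: TextIO) -> None:
    """Write a header line followed by one row per venue."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    for venue in venues:
        writer.writerow(venue_to_row(venue))
```

When writing to disk, the file would typically be opened with `open("venues.csv", "w", newline="", encoding="utf-8")` so that the `csv` module controls the line endings.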
Skills:
architectural design, facebook, instagram, marketplace, pinterest, sitemap, software development, twitter, web, web scraping
Source: peopleperhour.com