PDF to Excel and data scraping program
Budget: $581 per month
Posted: 5 years ago
Opened
- Description
- Problem: I have data in PDF format that does not convert well to CSV with the automated software available, because some data ends up in random rows and columns. The PDFs are updated periodically, so I need a repeatable way to convert them to CSV each time. I also need to scrape a third-party website based on data in the PDF.
Sample PDF:
https://drive.google.com/file/d/1zNyYUWRmHRgxIao210N6Nn27k3P-2qV_/view?usp=sharing
Need the CSV file to contain the following (correctly extracted) columns:
(Sample manually pulled from first entry of the PDF above)
Name (ANDERSON WALTER LEE & MARY LUE)
Amount Bid at Tax Sale (213.46)
Year (2000) - note: some years begin with 19xx, some with 20xx. The conversion needs to recognize both and format them properly.
Parcel ID (3800163016004000) - note: remove the first two digits (68) and the last four digits (0000) from the value as printed in the PDF
Description (LOT 3 KUCHINS 1ST ADD TO BESS)
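A minimal Python sketch of the two cleanup notes above. The function names are hypothetical. It assumes the raw parcel number in the PDF is the 16-digit ID wrapped in two leading and four trailing digits. It also assumes that a year printed with only two digits should be expanded using a cutoff (`pivot`), which is a guess to confirm against the real PDFs.

```python
import re


def clean_parcel_id(raw: str) -> str:
    """Drop the first two and last four digits of the parcel number as printed."""
    digits = re.sub(r"\D", "", raw)  # keep digits only (ignore spaces, dots)
    if len(digits) <= 6:
        raise ValueError(f"parcel value too short: {raw!r}")
    return digits[2:-4]


def normalize_year(raw: str, pivot: int = 30) -> int:
    """Return a four-digit year.

    Assumption: a two-digit year below `pivot` means 20xx, otherwise 19xx.
    """
    text = raw.strip()
    if re.fullmatch(r"(19|20)\d{2}", text):
        return int(text)
    if re.fullmatch(r"\d{2}", text):
        yy = int(text)
        return 2000 + yy if yy < pivot else 1900 + yy
    raise ValueError(f"unrecognized year: {raw!r}")
```

Rows that raise `ValueError` can be logged and reviewed by hand instead of being written to the CSV with a wrong value.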
------------------------------
Once the data is in CSV format, I need to scrape this website using the Parcel ID (above).
Visit http://eringcapture.jccal.org/caportal/
Click "• Search your Real Property. Click Here."
Select "Parcel #"
Search for each Parcel ID (sample from above 3800163016004000)
Click on "38 00 16 3 016 004.000" link (link name will change to match each parcel ID searched)
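To find the right result link for each parcel, the 16-digit Parcel ID can be turned into the spaced form shown in the link text. This Python sketch infers the grouping (2, 2, 2, 1, 3, 3 digits, then a dot and 3 digits) from the single sample above, so it is an assumption to check against other parcels. The function name is hypothetical.

```python
def parcel_link_text(parcel_id: str) -> str:
    """Format a 16-digit Parcel ID like the link text on the results page."""
    if len(parcel_id) != 16 or not parcel_id.isdigit():
        raise ValueError(f"expected 16 digits, got {parcel_id!r}")
    p = parcel_id
    # Grouping inferred from the sample link "38 00 16 3 016 004.000"
    return f"{p[0:2]} {p[2:4]} {p[4:6]} {p[6]} {p[7:10]} {p[10:13]}.{p[13:16]}"
```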
Add the following data in columns to the right of the data above:
(Sample manually pulled from the website above)
Address (321 BLACK AVE BESSEMER AL 35020-6800)
Location (321 BLACK AVE BESSEMER AL 35020) - note: Location differs from the Address field most of the time, and may be blank at times. Need to make sure that Address and Location are pulled from the correct fields. VERY IMPORTANT!
Land (2700)
Imp (7800) - note: could be blank or missing from the page altogether. Need to account for this variation.
Total (10500)
Total Market Value (10450)
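One way to handle the blank or missing fields noted above is to read each value by its own label and default to an empty string. This Python sketch assumes the scraper has already collected the page into a dict keyed by label. The dict shape and the function name are hypothetical, not taken from the website.

```python
WEB_FIELDS = ["Address", "Location", "Land", "Imp", "Total", "Total Market Value"]


def web_columns(scraped: dict) -> dict:
    """Return the six website columns, using "" for blank or missing values."""
    row = {}
    for field in WEB_FIELDS:
        # Each field is looked up by its own label, so Address never
        # falls back to Location (or the reverse) when one is blank.
        value = scraped.get(field)
        row[field] = value.strip() if isinstance(value, str) else ""
    return row
```

Keeping Location blank when the page has no Location, instead of copying Address into it, preserves the distinction the two columns are meant to show.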
-------------------------------------------------------------------------------------------------
At the end, the resulting CSV file should have the following data:
Name (ANDERSON WALTER LEE & MARY LUE)
Amount Bid at Tax Sale (213.46)
Year (2000)
Parcel ID (3800163016004000)
Description (LOT 3 KUCHINS 1ST ADD TO BESS)
Address (321 BLACK AVE BESSEMER AL 35020-6800)
Location (321 BLACK AVE BESSEMER AL 35020)
Land (2700)
Imp (7800)
Total (10500)
Total Market Value (10450)
That is 11 columns of information. Each PDF processed will have up to 20,000 rows of data.
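As a sketch of the final output, the 11 columns can be written with Python's standard `csv` module. The column order follows the list above. The function name and the one-dict-per-row shape are assumptions made for illustration.

```python
import csv

COLUMNS = [
    "Name", "Amount Bid at Tax Sale", "Year", "Parcel ID", "Description",
    "Address", "Location", "Land", "Imp", "Total", "Total Market Value",
]


def write_rows(rows, out) -> None:
    """Write a header plus one CSV line per row dict to the file-like `out`."""
    # restval="" leaves a column empty when a row has no value for it
    # (for example Imp); an unexpected key raises ValueError.
    writer = csv.DictWriter(out, fieldnames=COLUMNS, restval="", extrasaction="raise")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```

When writing to a real file, open it with `newline=""` as the `csv` documentation recommends, for example `open("output.csv", "w", newline="")`.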
Please let me know if any clarification is needed on the process. I look forward to working with you on this project!
Skills:
data scraping, tax, adobe portable document format (pdf), automation, comma separated values (csv), extraction, IMP programming language, marketing, web
Source: peopleperhour.com