Budget 46$ per month
Posted: 5 years ago
Opened
Description
Here are the specifications for the content parser.

index.php
A page with an HTML textarea for pasting URLs in bulk and a Submit button to start content parsing.

When parsing of a URL has finished, that URL is removed from the textarea list (without reloading the page).

We also parse the source code (HTML markup), not only the plain text.
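
On the server side, the textarea content first has to be turned into a list of URLs. A minimal sketch of that step, assuming one URL per line; the function name parse_url_list is hypothetical and not part of the specification:

```php
<?php
/**
 * Split raw textarea input into a de-duplicated list of http(s) URLs.
 * Assumption: one URL per line; blank and malformed lines are ignored.
 */
function parse_url_list(string $raw): array
{
    $urls = [];
    foreach (preg_split('/\R/', $raw) as $line) {
        $line = trim($line);
        if ($line === '' || filter_var($line, FILTER_VALIDATE_URL) === false) {
            continue; // skip empty lines and anything that is not a URL
        }
        $scheme = strtolower((string) parse_url($line, PHP_URL_SCHEME));
        if ($scheme !== 'http' && $scheme !== 'https') {
            continue; // only web pages can be parsed
        }
        $urls[$line] = true; // array keys de-duplicate and keep input order
    }
    return array_keys($urls);
}
```

The page could then submit these URLs one at a time with an asynchronous request and delete each line from the textarea once the server reports that it is done; that client-side part is not shown here.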

Config.php
2.1 Define where parsing should start, for example:

=> div class "XXX" (phrase match, meaning class="yyy XXX zzz" counts as a match)
=> div id "XXX" (phrase match)
=> H1 or H2
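
One possible way to implement the phrase match is an XPath query that treats the attribute as a space-separated list of tokens. This is a sketch under assumptions: the rule format (tag, attr, value) and the function name find_start_node are hypothetical, and the value is assumed to contain no quote characters.

```php
<?php
/**
 * Return the first node where parsing should start, or null if none matches.
 * $rule is a hypothetical config entry such as
 * ['tag' => 'div', 'attr' => 'class', 'value' => 'XXX'] or ['tag' => 'h1'].
 */
function find_start_node(DOMDocument $doc, array $rule): ?DOMNode
{
    $query = '//' . $rule['tag'];
    if (isset($rule['attr'], $rule['value'])) {
        // Pad the attribute with spaces so "XXX" matches "yyy XXX zzz"
        // but does not match a longer token such as "XXXL".
        $query .= sprintf(
            "[contains(concat(' ', normalize-space(@%s), ' '), ' %s ')]",
            $rule['attr'],
            $rule['value']
        );
    }
    $nodes = (new DOMXPath($doc))->query($query);
    return $nodes->length > 0 ? $nodes->item(0) : null;
}
```

The same helper could be reused for the end marker in 2.2, since the rules there have the same shape.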

2.2 Define where parsing should end, for example:

=> div class "XXX" (phrase match)
=> div id "XXX" (phrase match)

2.3 Define elements (array) to skip, so that they are not saved in the database, for example:

=> div class "XXX" (phrase match)
=> div id "XXX" (phrase match)
=> script
=> images
=> empty elements, i.e. tags that do not contain any words
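
The skip rules could be applied by deleting the matching nodes from the parsed document before anything is saved. A rough sketch, assuming the rules are expressed as XPath strings; the function name remove_skipped and the choice of p, span and div for the "no words" rule are illustrative guesses, not taken from the specification.

```php
<?php
/**
 * Remove unwanted nodes from the parsed document.
 * $xpaths is a hypothetical list of XPath expressions built from the skip rules.
 */
function remove_skipped(DOMDocument $doc, array $xpaths): void
{
    $xpath = new DOMXPath($doc);
    foreach ($xpaths as $expr) {
        // Copy the result into an array so removals cannot disturb iteration.
        foreach (iterator_to_array($xpath->query($expr)) as $node) {
            if ($node->parentNode !== null) {
                $node->parentNode->removeChild($node);
            }
        }
    }
}

// Example rule set (assumed): scripts, images, then elements without any text.
$skip = [
    '//script',
    '//img',
    '//p[not(normalize-space())] | //span[not(normalize-space())] | //div[not(normalize-space())]',
];
```

The order matters in this sketch: images are removed first, so a container that held nothing but an image is then caught by the "no words" rule.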

2.4 Define Proxy List

A proxy list to make sure that parsing does not stop after 100 pages.

2.5 Define Parser User-Agent

For example: "Mozilla/5.0 (Windows NT x.y; Win64; x64; rv:10.0) Gecko/20100101 Firefox/10.0"

2.6 Define interval per URL

For example, '10' for 10 seconds between finishing one URL and starting the next.
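
The proxy list, User-Agent and interval settings could fit together roughly as follows. This is only a sketch: the array keys, the proxy addresses (documentation-range placeholders), the 30-second timeout and the function names are all assumptions, and it uses PHP's built-in HTTP stream wrapper rather than any particular HTTP library.

```php
<?php
// Hypothetical Config.php fragment; keys and values are placeholders.
$config = [
    'proxies'    => ['tcp://203.0.113.10:8080', 'tcp://203.0.113.11:8080'],
    'user_agent' => 'Mozilla/5.0 (Windows NT x.y; Win64; x64; rv:10.0) Gecko/20100101 Firefox/10.0',
    'interval'   => 10, // seconds between a finished URL and the next start
];

/**
 * Build HTTP stream-context options for the n-th request (0-based),
 * rotating through the proxy list round-robin.
 */
function http_options(array $config, int $requestIndex): array
{
    $options = ['user_agent' => $config['user_agent'], 'timeout' => 30];
    if (!empty($config['proxies'])) {
        $options['proxy'] = $config['proxies'][$requestIndex % count($config['proxies'])];
        $options['request_fulluri'] = true; // many HTTP proxies expect the full URI
    }
    return ['http' => $options];
}

/** Fetch each URL in turn, pausing $config['interval'] seconds between requests. */
function fetch_all(array $urls, array $config): array
{
    $pages = [];
    foreach (array_values($urls) as $i => $url) {
        if ($i > 0) {
            sleep($config['interval']);
        }
        $context = stream_context_create(http_options($config, $i));
        $pages[$url] = @file_get_contents($url, false, $context); // false on failure
    }
    return $pages;
}
```

Round-robin rotation is just one simple policy; a real parser might also drop proxies that fail repeatedly.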

Cleaning.php
The cleaned up text should contain only the following elements.

3.1 Allowed HTML Elements that should stay in the final version of the text are , , to , , , , , , , should be should be should be should be
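
An allow-list of this kind could be applied with PHP's strip_tags(). The sketch below is illustrative only: the function name clean_html is hypothetical, and the tag list in the example is a placeholder, not the list intended by the specification.

```php
<?php
/**
 * Strip every tag that is not on the allow-list and collapse runs of whitespace.
 * Note: strip_tags() keeps the attributes of allowed tags unchanged.
 */
function clean_html(string $html, array $allowedTags): string
{
    // strip_tags() expects the allow-list in the form "<p><h1>".
    $allowed = $allowedTags === [] ? '' : '<' . implode('><', $allowedTags) . '>';
    $clean = strip_tags($html, $allowed);
    return trim(preg_replace('/\s+/', ' ', $clean));
}

// Placeholder allow-list for illustration only.
$allowedTags = ['p', 'h1', 'h2', 'ul', 'li'];
```

Because strip_tags() removes only the tags and keeps their inner text, elements such as script should already have been removed by the skip rules before this step runs.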

Save the cleaned-up English version of the text in the database. Columns can be "headline", "english", "german", "original url".

Translate.php
Translate API

Use the following PHP scripts to translate the cleaned-up text into German and save the translated text in the database:

https://github.com/GoogleCloudPlatform/php-docs-samples/tree/master/translate
https://github.com/GoogleCloudPlatform/php-docs-samples/blob/master/translate/src/translate.php
Skills:
html/html5,application programming interface (API),microsoft windows,mozilla (firefox),PHP programming language,software development
Category
Source: peopleperhour.com
