Archive: Finding ineffective Regexp filters used in text filtering

Budget: By arrangement
Posted: 5 years ago
Closed
Description
MySQL dump is provided.

Basic idea: find ineffective regexps in the database log and display them, letting me narrow down the results by choosing which type of search to perform. The server runs PHP 7.1 on CentOS 7 with Apache 2.4; the database is MariaDB 10.1.

PHP code should have minimal comments and follow standard variable naming conventions.

I made a dump of our database using:

SELECT id, unfiltered_title FROM jobs WHERE unfiltered_title != '';

The result is a MySQL table dump, compressed with 7-Zip. You should test the code you write against it.

The dump contains a table that stores the unfiltered job title, which our REGEXP filters then clean of unnecessary words.

=> is a marker that shows what happened to the job title after it was filtered by the Regexp shown to the left of the marker.

A series of Regexp filters can be applied to the job title text: after each filtering step we run the next filter, until the filters find nothing else to remove.
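The iterative filtering described above (run the filters, then run them again until nothing changes) can be sketched as follows. This is a minimal Python sketch for illustration only, although the posting asks for PHP. The function name `filter_title`, the sample patterns and the exact `pattern => result` log layout are assumptions, since the real filters and log format are not shown here.

```python
import re


def filter_title(title, patterns):
    """Apply regex filters repeatedly until a full pass changes nothing.

    Every filter application that changes the title is logged as one
    "pattern => result" line (a hypothetical stand-in for the real log format).
    Each change removes characters, so the loop always terminates.
    """
    title = title.strip()
    log = []
    changed = True
    while changed:
        changed = False
        for pattern in patterns:
            new_title = re.sub(pattern, "", title).strip()
            if new_title != title:
                title = new_title
                log.append(f"{pattern} => {title}")
                changed = True
    return title, log


# Hypothetical filters, not the ones used by the real system.
final, log = filter_title(
    "Facility Maintenance Worker (12 PM START-OUTDOOR WORK)",
    [r"\(12 PM START-", r"\s*WORK\)"],
)
print(final)     # Facility Maintenance Worker OUTDOOR
print(len(log))  # 2
```

Each log line corresponds to one filtering cycle, which is what the => count in the real log is meant to capture.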

For example, take this job title:

Facility Maintenance Worker (12 PM START-OUTDOOR WORK)

After running all our filters on it, the final result is:

Facility Maintenance Worker/OUTDOOR

The database records the whole process, so we can see each filtering step, which filter was applied and what result it produced:

Unfiltered title: "Facility Maintenance Worker (12 PM START-OUTDOOR WORK)"
(REGEXP REMOVED DUE TO LIMITATIONS ON CHARACTER COUNT HERE)

So 4 filters were applied here, as we can see from the 4 => markers.
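Counting cycles by counting the => markers can be sketched in Python like this. The one-step-per-line layout `<regex> => <result>`, the sample log and the names `parse_steps` and `count_cycles` are all hypothetical.

```python
def parse_steps(log_text):
    """Split a filtering log into (pattern, result) pairs.

    Assumes a hypothetical layout of one "<regex> => <result>" step per line;
    lines without the marker (such as the "Unfiltered title:" header) are skipped.
    """
    steps = []
    for line in log_text.splitlines():
        if " => " in line:
            pattern, result = line.split(" => ", 1)
            steps.append((pattern.strip(), result.strip()))
    return steps


def count_cycles(log_text):
    """Filtering cycles, counted as the number of => markers in the log."""
    return log_text.count("=>")


sample_log = "\n".join([
    'Unfiltered title: "Night Nurse (RN) - URGENT"',
    r"\(RN\) => Night Nurse  - URGENT",
    r"\s*-\s*URGENT => Night Nurse",
])
print(count_cycles(sample_log))            # 2
print(parse_steps(sample_log)[-1][1])      # Night Nurse
```

A plain marker count would be off if a regex itself contained `=>`, so the real log format should be checked for that case.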

That is not many filtering cycles, but if the system tells me there were 4, I can probably optimize the filters down to 2-3 cycles.

Looking at the filters, I notice that "12 PM" is a common phrase and could be grouped, so I improve the existing Regexp filters and get only 3 cycles, which makes the program faster. The filtering would now look like this in the database log:

Unfiltered title: "Facility Maintenance Worker (12 PM START-OUTDOOR WORK)"

(REGEXP REMOVED DUE TO LIMITATIONS ON CHARACTER COUNT HERE)
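One possible reading of "grouping" the 12 PM filters is merging several patterns into a single alternation, so one pass does the work of several. The Python sketch below compares the two approaches on hypothetical patterns; `merge_filters` and `count_changing_passes` are illustrative names, not part of the existing system.

```python
import re


def merge_filters(patterns):
    """Combine several regex filters into one alternation (one pass instead of many).

    Caveat: one re.sub removes non-overlapping matches left to right, so the
    result can differ from running the filters one after another when the
    patterns overlap.
    """
    return "|".join(f"(?:{p})" for p in patterns)


def count_changing_passes(title, patterns):
    """Apply each filter once, counting the applications that changed the title."""
    steps = 0
    for pattern in patterns:
        new_title = re.sub(pattern, "", title)
        if new_title != title:
            title, steps = new_title, steps + 1
    return title, steps


# Hypothetical filters that both target the "12 PM START-" phrase.
title = "Facility Maintenance Worker (12 PM START-OUTDOOR WORK)"
separate = [r"\(12 PM\s*", r"START-"]
print(count_changing_passes(title, separate))
# ('Facility Maintenance Worker OUTDOOR WORK)', 2)
print(count_changing_passes(title, [merge_filters(separate)]))
# ('Facility Maintenance Worker OUTDOOR WORK)', 1)
```

Both versions produce the same title here, but the merged filter needs one logged step instead of two.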

The task is to write a PHP script that analyses the database log of the filtering done on job titles and shows me the ones that are potentially suboptimal.

The PHP script should let me find filtering where x to y filtering cycles were applied (you can count the filtering cycles by counting the => markers). For example, I want to see which filters were applied to job titles that went through 2 to 4 filtering cycles.
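Selecting job titles with x to y filtering cycles can be sketched as a filter over the dumped rows. This Python sketch assumes the rows are `(id, unfiltered_title)` tuples from the dump query and that the filtering log is stored in `unfiltered_title`; `titles_with_cycles` is a hypothetical name.

```python
def titles_with_cycles(rows, min_cycles, max_cycles):
    """Keep the (id, log_text) rows whose number of => markers is in range.

    `rows` is assumed to hold (id, unfiltered_title) tuples as returned by the
    dump query, with the filtering log stored in unfiltered_title.
    """
    return [
        (row_id, log_text)
        for row_id, log_text in rows
        if min_cycles <= log_text.count("=>") <= max_cycles
    ]


# Toy rows with 1, 3 and 0 filtering cycles.
rows = [
    (1, "a => b"),
    (2, "a => b\nc => d\ne => f"),
    (3, "no filtering"),
]
print([row_id for row_id, _ in titles_with_cycles(rows, 2, 4)])  # [2]
```

The same count can also be pushed into MariaDB, for example `(CHAR_LENGTH(unfiltered_title) - CHAR_LENGTH(REPLACE(unfiltered_title, '=>', ''))) / 2 BETWEEN 2 AND 4` in the WHERE clause, so only matching rows are fetched.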
It should also let me select only filters that removed x to y single words, select filters that remove partial words, and select a job title length of x to y words. These search criteria can be combined in any way.
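The per-filter criteria (whole words removed, partial words, title length in words) need some way to tell what a single step removed. The Python sketch below uses a token-comparison heuristic of its own, not anything described in the posting: it compares the title before and after one step. `step_stats`, `step_matches` and the heuristic itself are assumptions; passing `None` skips a criterion, so the criteria combine freely.

```python
from collections import Counter


def step_stats(before, after):
    """Heuristic stats for one step: (whole words removed, partial word cut?).

    Tokens found in `after` but not in `before` are treated as leftovers of
    partially filtered words; each leftover offsets one missing token, so only
    the rest of the missing tokens count as whole words removed.
    """
    before_words, after_words = Counter(before.split()), Counter(after.split())
    missing = sum((before_words - after_words).values())
    leftovers = sum((after_words - before_words).values())
    return max(0, missing - leftovers), leftovers > 0


def step_matches(before, after, removed_range=None, partial=None, length_range=None):
    """True when the step satisfies every criterion that is not None.

    Title length is counted in words on `before`; the posting does not say
    which version of the title it means, so this is an assumption.
    """
    removed, is_partial = step_stats(before, after)
    if removed_range is not None and not removed_range[0] <= removed <= removed_range[1]:
        return False
    if partial is not None and is_partial != partial:
        return False
    length = len(before.split())
    if length_range is not None and not length_range[0] <= length <= length_range[1]:
        return False
    return True


print(step_stats("Worker (12 PM START-OUTDOOR WORK)", "Worker OUTDOOR WORK)"))  # (2, True)
print(step_stats("Senior Developers", "Senior Developer"))                      # (0, True)
```

In the first example "(12" and "PM" count as whole words removed, while "START-OUTDOOR" is flagged as a partially filtered word because "OUTDOOR" survives.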
There should be a search option for the maximum length of the job title remaining after filtering: 0 (an empty job title), 1, 2, 3 or unlimited symbols.
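The maximum remaining length option can be sketched as below. The posting says "symbols", which could mean words or characters, so this hypothetical `within_max_length` helper supports both through an assumed `unit` parameter; `None` stands for "unlimited".

```python
def within_max_length(final_title, max_length=None, unit="words"):
    """True if the title left after filtering is at most `max_length` long.

    max_length=None means "unlimited"; 0 matches only empty titles.
    unit is "words" or "chars" (an assumption about what "symbols" means).
    """
    if max_length is None:
        return True
    if unit == "words":
        size = len(final_title.split())
    elif unit == "chars":
        size = len(final_title.strip())
    else:
        raise ValueError(f"unknown unit: {unit}")
    return size <= max_length


MAX_LENGTH_CHOICES = (0, 1, 2, 3, None)  # None = unlimited, as in the posting
titles = ["", "Nurse", "Night Nurse", "Facility Maintenance Worker OUTDOOR"]
print([t for t in titles if within_max_length(t, 1)])  # ['', 'Nurse']
```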

Results should be paginated at around 25-50 per page, as a search may sometimes return thousands of results.
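Pagination at 25-50 results per page boils down to offset arithmetic. A minimal Python sketch, with the hypothetical name `paginate` and a default of 25 results per page:

```python
import math


def paginate(results, page, per_page=25):
    """Return (items on the 1-based `page`, total page count).

    per_page=25 is the low end of the 25-50 range from the posting;
    out-of-range page numbers are clamped to the first or last page.
    """
    if per_page < 1:
        raise ValueError("per_page must be positive")
    total_pages = max(1, math.ceil(len(results) / per_page))
    page = min(max(page, 1), total_pages)
    start = (page - 1) * per_page
    return results[start:start + per_page], total_pages


items, pages = paginate(list(range(60)), 3)
print(pages, items[0], len(items))  # 3 50 10
```

For thousands of rows it is cheaper to let MariaDB return a single page, for example with `LIMIT 25 OFFSET 50`, which uses the same offset arithmetic.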
Skills: MySQL, Apache HTTP Server, database, PHP programming language, website development