Java web crawler and text extraction modules
$250-750 USD
Paid on delivery
Part A) Extract information from a given set of URLs (BID URLs), each of which links to many PDFs in Spanish, and extract text from those PDFs using regular expressions.
Example:
The URL [login to view URL] should produce the following: Gerente de proyecto, Desarrollador Java, Desarrollador PHP, Desarrollador Forms, Desarrollador .NET, Arquitecto de Software. This text is on page 47 of one of the files listed at the URL. Keep in mind that you have to parse all the documents at the URL.
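A minimal sketch of the two regex steps in Part A, assuming the listing page is plain HTML with `href` links to the PDFs and that the text of each PDF page has already been extracted by a PDF library such as Apache PDFBox (not shown here). The class name `BidPdfScanner`, both method names, and the role pattern are hypothetical; the real matching criteria would come from the project owner.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: names and patterns are illustrative assumptions.
final class BidPdfScanner {

    // Matches href="...pdf" or href='...pdf', ignoring case (.pdf / .PDF).
    private static final Pattern PDF_HREF = Pattern.compile(
            "href\\s*=\\s*[\"']([^\"']+\\.pdf)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Assumed role pattern built from the example output in the posting.
    // \s+ tolerates line breaks that PDF text extraction often introduces.
    private static final Pattern ROLE = Pattern.compile(
            "\\b(?:Gerente\\s+de\\s+Proyecto"
                    + "|Desarrollador\\s+\\.?\\w+"
                    + "|Arquitecto\\s+de\\s+Software)",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE
                    | Pattern.UNICODE_CHARACTER_CLASS);

    // Returns the absolute URLs of all PDF links in the page, in order, without duplicates.
    static List<String> findPdfLinks(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        Set<String> links = new LinkedHashSet<>();
        Matcher m = PDF_HREF.matcher(html);
        while (m.find()) {
            try {
                links.add(base.resolve(m.group(1).trim()).toString());
            } catch (IllegalArgumentException e) {
                // Skip hrefs that are not valid URIs (for example, unescaped spaces).
            }
        }
        return new ArrayList<>(links);
    }

    // Returns every role title found in the extracted text, whitespace-normalized.
    static List<String> findRoles(String text) {
        List<String> roles = new ArrayList<>();
        Matcher m = ROLE.matcher(text);
        while (m.find()) {
            roles.add(m.group().replaceAll("\\s+", " "));
        }
        return roles;
    }
}
```

In a full crawler, `findPdfLinks` would be applied to the downloaded HTML of each BID URL, each linked PDF would be downloaded and converted to text, and `findRoles` would be run on every page of every document.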
Part B) After extracting the text, the idea is to store the parts of it that match certain criteria in a relational database (MySQL). With the above example, the idea would be to store it in a table with three fields:
| URL
| [login to view URL] | Gerente de Proyecto | Ingeniero de Sistemas
Un (1) año en Gerencia de proyectos informáticos | 1
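A rough sketch of how Part B could be stored through plain JDBC, under stated assumptions: only the `URL` header is visible in the posting, so the table name `bid_requirement` and the columns `role`, `requirement`, and `quantity` are guesses based on the sample row, not the owner's schema. The class name `BidRequirementStore` is likewise hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical storage helper: table and column names are assumptions.
final class BidRequirementStore {

    // MySQL DDL for the assumed table layout.
    static final String CREATE_TABLE_SQL =
            "CREATE TABLE IF NOT EXISTS bid_requirement ("
                    + "id BIGINT AUTO_INCREMENT PRIMARY KEY, "
                    + "url VARCHAR(2048) NOT NULL, "
                    + "role VARCHAR(255) NOT NULL, "
                    + "requirement TEXT, "
                    + "quantity INT)";

    static final String INSERT_SQL =
            "INSERT INTO bid_requirement (url, role, requirement, quantity) "
                    + "VALUES (?, ?, ?, ?)";

    // Inserts one extracted row and returns the number of rows written.
    // A PreparedStatement keeps the Spanish text (accents, quotes) safely escaped.
    static int save(Connection conn, String url, String role,
                    String requirement, int quantity) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, url);
            ps.setString(2, role);
            ps.setString(3, requirement);
            ps.setInt(4, quantity);
            return ps.executeUpdate();
        }
    }
}
```

The `Connection` would typically come from `DriverManager.getConnection` with a `jdbc:mysql://` URL and the MySQL Connector/J driver declared as a Maven dependency; using a character set such as `utf8mb4` on the table helps preserve characters like "ñ" and "á".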
Conditions:
1. Automatic replies that do not ask for specific information will be automatically discarded.
2. The deliverable MUST be configured as a working Java Maven project and does NOT have to be a web application.
3. Only one payment will be made, once the deliverables work and are fully tested.
4. Project will be awarded to the first programmer to submit a working prototype of part A.
Project ID: #4618081
About the project
15 freelancers are bidding an average of $655 for this job
Hello Sir, I can do this project for you, and Part A is ready. Please check your PM for more details. Thanks, Bing
I have good experience with Java web scraping applications. Please check your PMB, sir.
Hi, I am an expert web-scraping application maker and am also very comfortable with extracting text from PDFs and with regex. Please see my private message for more details. Thanks
Dear sir, I have experience with text extraction in Java. Please see PMB for more details. Thanks.