I have hundreds of PDF exam files, all in the same format. I want to load all the exams into a database, but first I need them converted into something I can parse easily, such as XML or HTML.
The info I need from each exam is:
For each question:
1. Question number (and, if the exam is divided into topics, the topic it belongs to).
2. Question text (the actual question).
3. If the question is multiple choice, the text of each choice (the question title specifies whether it is a multiple-choice question).
4. Question answer.
5. Question answer explanation.
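To make the target format concrete, here is a minimal sketch of how one question with the five fields above could be written as XML, using only Python's standard library. The element and attribute names (`exam`, `question`, `topic`, `choices`, and so on) and the sample question are assumptions for illustration; the posting does not fix a schema.

```python
import xml.etree.ElementTree as ET

def build_question(number, text, answer, explanation, topic=None, choices=None):
    """Build one <question> element. All tag/attribute names are hypothetical."""
    q = ET.Element("question", number=str(number))
    if topic is not None:
        # Only set when the exam is divided into topics.
        q.set("topic", topic)
    ET.SubElement(q, "text").text = text
    if choices:
        # Only multiple-choice questions carry a <choices> element.
        choices_el = ET.SubElement(q, "choices")
        for label, choice_text in choices:
            ET.SubElement(choices_el, "choice", label=label).text = choice_text
    ET.SubElement(q, "answer").text = answer
    ET.SubElement(q, "explanation").text = explanation
    return q

# Made-up sample data, not taken from the attached exam.
exam = ET.Element("exam")
exam.append(build_question(
    1, "What is 2 + 2?", "B", "Basic addition.",
    topic="Arithmetic", choices=[("A", "3"), ("B", "4")],
))
xml_text = ET.tostring(exam, encoding="unicode")
print(xml_text)
```

A flat layout like this should map directly onto database rows: one row per question, plus one row per choice.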
The hard part is that fields 2-5 might contain images. If a field contains an image, the image should be extracted to a file and referenced from the correct place within that field.
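One possible way to keep an image reference "in the correct place" is mixed XML content: text, then an image element, then the remaining text. The sketch below assumes the PDF extraction step (not shown, and dependent on whichever PDF library is used) already yields each field as an ordered list of text strings and raw image bytes. The `img` tag, the file-naming scheme, the `.png` extension, and the helper name are all hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def add_field_with_images(parent, tag, parts, image_dir, prefix):
    """Append a field mixing text and images.

    `parts` holds str (text) or bytes (raw image data) in reading order.
    Each image is written to `image_dir` and referenced by an <img> element.
    """
    field = ET.SubElement(parent, tag)
    last_img = None
    count = 0
    for part in parts:
        if isinstance(part, bytes):
            count += 1
            name = f"{prefix}_{tag}_{count}.png"  # assumes PNG data
            (Path(image_dir) / name).write_bytes(part)
            last_img = ET.SubElement(field, "img", src=name)
        elif last_img is None:
            # Text before the first image goes in the element's own text.
            field.text = (field.text or "") + part
        else:
            # Text after an image is stored as that image's tail.
            last_img.tail = (last_img.tail or "") + part
    return field

image_dir = tempfile.mkdtemp()
question = ET.Element("question", number="4")
add_field_with_images(
    question, "text",
    ["See the diagram: ", b"\x89PNG placeholder, not a real image", " What does it show?"],
    image_dir, "q4",
)
result = ET.tostring(question, encoding="unicode")
print(result)
```

Because the `<img>` element sits between the two text runs, a later parser can recover exactly where in the question the image appeared.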
I don't mind if the script or program you supply handles one exam at a time; I'll write a script that runs it on all the files.
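The wrapper mentioned here could be as small as the following sketch. `convert_exam` is a hypothetical stand-in for the per-exam converter (it only writes an empty stub), and the directory layout is an assumption.

```python
import tempfile
from pathlib import Path

def convert_exam(pdf_path, out_dir):
    """Hypothetical placeholder for the real per-exam converter.

    It only writes an empty XML stub named after the input PDF.
    """
    out_path = Path(out_dir) / (Path(pdf_path).stem + ".xml")
    out_path.write_text("<exam/>", encoding="utf-8")
    return out_path

def convert_all(in_dir, out_dir):
    """Run the single-exam converter on every PDF in `in_dir`."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    results = []
    for pdf_path in sorted(Path(in_dir).glob("*.pdf")):
        results.append(convert_exam(pdf_path, out_dir))
    return results

# Demo with two empty placeholder files standing in for real exams.
in_dir = Path(tempfile.mkdtemp())
out_dir = in_dir / "xml"
(in_dir / "exam_a.pdf").write_bytes(b"")
(in_dir / "exam_b.pdf").write_bytes(b"")
outputs = convert_all(in_dir, out_dir)
print([p.name for p in outputs])
```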
I'm attaching a sample exam file; there is an image in question #4. Later I'll supply two more exams that should cover essentially all the possible cases of how an exam can look.
Hello, I'm new to freelancing but have extensive development experience, and I'd like to complete this job quickly and efficiently. Please send more details about the job. Any questions are welcome! Best regards, Vasiliy