This is a short project and the attached file has everything you will need.
It contains a very small Perl script that we would like help optimising.
You can run the script with this command:
[login to view URL] ./html_files/[login to view URL]
This will print out every line from [login to view URL] that appears in guardian.html.
It uses Perl's built-in match operator. We are running a lot of matches against a lot of HTML files, and when we parallelise the work it becomes very CPU intensive. I want to find a way to reduce the CPU usage. One approach may be to tokenise the HTML file first and sort the tokens.
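To make the tokenising idea concrete, here is a minimal sketch. It is not the attached script. The helper names `tokenize_html` and `matching_lines` and the sample data are hypothetical, and it stores the tokens in a hash rather than a sorted list, so each lookup is a single hash probe. It also assumes every search line is a single whole word. That is a different rule from the match operator, which also finds substrings, so it would only be a drop-in replacement if the real search lines fit that assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split the HTML text into a set of word tokens.
# Assumption: a "token" is any run of word characters (\w+), so tag
# names such as "p" or "body" end up in the set as well.
sub tokenize_html {
    my ($html) = @_;
    my %seen;
    $seen{$_} = 1 for $html =~ /(\w+)/g;
    return \%seen;
}

# Hypothetical helper: return the search lines that occur as whole
# tokens. Each check is one hash lookup instead of a scan of the file.
sub matching_lines {
    my ($patterns, $tokens) = @_;
    return grep { exists $tokens->{$_} } @$patterns;
}

# Made-up sample data, only to show the behaviour.
my $html     = '<html><body><p>herds cross the river</p></body></html>';
my @patterns = ('river', 'zebra', 'herd');

my $tokens = tokenize_html($html);
print "$_\n" for matching_lines(\@patterns, $tokens);
```

With this sample data the sketch prints only "river". The line "herd" is not printed even though it is a substring of "herds", which is exactly where this approach differs from a plain regex or substring match.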
Your solution should be simple and well documented. We do not need solutions that fork or use threads, as we have that already (I removed that code for simplicity). The final code you deliver should behave in exactly the same way, using a single thread, and it's my hope that you will find a more efficient way of doing the matches.
Your solution should be as simple as possible (like the attached), and very well commented.
In your bid, please mention the word 'Antelope' so that I know you have read this far down!
Thanks!