small PHP project: parse HTML files, write XML - repost
$30-250 USD
Cancelled
Posted over 10 years ago
Paid on delivery
Hey PHP experts!

We need a PHP 5 program that parses HTML files, extracts specific content, and writes it to an XML file.
A zip is attached; please have a look at the sample files.

Here is a task description in pseudocode:

// PHP5
class Utilities
{

    public $pathToSourceDirectory = 'someSourceDirectory';
    public $pathToTargetDirectory = 'someTargetDirectory';
    public $nameXMLfile = 'newXMLFile';
    public $targetNode = 'misc_texts';
    public $targetTag = 'body';
    public $ignoreDate; // Boolean

    $xmlFile = $pathToTargetDirectory.'/'.$nameXMLfile;
    if exists, open $xmlFile
    else create $xmlFile first

    write one or several methods that perform the following routine:

    loop through all files inside $pathToSourceDirectory and all its subdirectories
        if the file is an HTML file (any extension like .html || .htm || .HTML etc.)
            if date of the file is newer than date of $xmlFile || $ignoreDate == true
                open file

                parse it: loop through all the top-level tags (do not loop through child tags)
                    if div does not have the class 'private'
                        extract content of div
                        write it to $xmlFile
                            as a child of the $targetNode
                            if node with this page name already exists (compare page name) replace content
                            else add new node

    structure of the resulting $xmlFile:

    // $targetNode

        // path to file (slashes (/) replaced with double underscores ('__')) + filename (without extension)
            // id of first div
            content to be extracted
            ]]>

            // id of second div
            content to be extracted
            ]]>

            content of file B to be extracted
            ]]>

    save $xmlFile
    close all files
    return success or error

}

Looking forward to hearing from you!
Andreas
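For anyone sizing up the task, the routine in the pseudocode could be sketched roughly as below. This is only a minimal sketch under assumptions, not the requested deliverable: it uses plain functions rather than the `Utilities` class, the function names (`pageNameFor`, `extractPublicDivs`, `syncHtmlToXml`) are hypothetical, and the `<page name="...">` and `<text id="...">` element names are guesses, since the XML sample in the post lost its tags. It relies on PHP's bundled DOM extension and SPL iterators.

```php
<?php
// Sketch only. The <page name="..."> and <text id="..."> element names are
// assumptions; the XML sample in the post did not survive intact.

// "sub/dir/page.html" -> "sub__dir__page"
function pageNameFor($relativePath)
{
    $path = str_replace('\\', '/', $relativePath);
    $path = preg_replace('/\.[^.\/]+$/', '', $path); // drop the extension
    return str_replace('/', '__', $path);
}

// Returns array(divId => innerHtml) for top-level, non-private divs of $targetTag.
function extractPublicDivs($html, $targetTag = 'body')
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $result = array();
    $root = $doc->getElementsByTagName($targetTag)->item(0);
    if ($root === null) {
        return $result;
    }
    foreach ($root->childNodes as $child) { // direct children only, no recursion
        if (!($child instanceof DOMElement) || strtolower($child->tagName) !== 'div') {
            continue;
        }
        $classes = preg_split('/\s+/', trim($child->getAttribute('class')));
        if (in_array('private', $classes, true)) {
            continue;
        }
        $inner = '';
        foreach ($child->childNodes as $node) {
            $inner .= $doc->saveHTML($node);
        }
        $result[$child->getAttribute('id')] = trim($inner);
    }
    return $result;
}

// Walks $sourceDir recursively and writes or replaces one page node per HTML file.
function syncHtmlToXml($sourceDir, $xmlFile, $targetNode = 'misc_texts', $ignoreDate = false)
{
    $sourceDir = rtrim($sourceDir, '/\\');
    $xml = new DOMDocument('1.0', 'UTF-8');
    $xml->formatOutput = true;
    $xmlTime = 0;
    if (is_file($xmlFile)) {
        $xmlTime = filemtime($xmlFile);
        if (!$xml->load($xmlFile)) {
            return false;
        }
    }
    $parent = $xml->getElementsByTagName($targetNode)->item(0);
    if ($parent === null) {
        $parent = $xml->createElement($targetNode);
        $owner = $xml->documentElement ? $xml->documentElement : $xml;
        $owner->appendChild($parent);
    }
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($sourceDir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if (!$file->isFile() || !preg_match('/^html?$/i', $file->getExtension())) {
            continue;
        }
        if (!$ignoreDate && $file->getMTime() <= $xmlTime) {
            continue; // not newer than the XML file
        }
        $html = file_get_contents($file->getPathname());
        if ($html === false || trim($html) === '') {
            continue;
        }
        $name = pageNameFor(substr($file->getPathname(), strlen($sourceDir) + 1));
        foreach (iterator_to_array($parent->childNodes) as $old) { // drop an existing page node
            if ($old instanceof DOMElement && $old->getAttribute('name') === $name) {
                $parent->removeChild($old);
            }
        }
        $page = $xml->createElement('page');
        $page->setAttribute('name', $name);
        foreach (extractPublicDivs($html) as $id => $content) {
            $text = $xml->createElement('text');
            $text->setAttribute('id', $id);
            $text->appendChild($xml->createCDATASection($content));
            $page->appendChild($text);
        }
        $parent->appendChild($page);
    }
    return $xml->save($xmlFile) !== false;
}
```

A call such as `syncHtmlToXml('someSourceDirectory', 'someTargetDirectory/newXMLFile')` would then return `true` on success. Note that divs without an `id` all share the empty key in this sketch, so only the last one would be kept; a real implementation would need a rule for that case.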