Your job is to create a simple, short string classifier in Perl. The input to the system is a person's name, i.e. a UTF-8 encoded string usually 10-40 characters in length, and the system determines which of the predefined classes the string belongs to. The classes are ethnicity groups.
For example, if the input to the system is "John Smith", the system would output the class "English"; if the input is "hiromi akiyama", it would output "Japanese". There are 18 different classes (ethnicity groups).
The system has two parts: 1) A training script called [url removed, login to view], which trains the system using the given training data (a list of known 'name = ethnicity' pairs) and saves the "trained state" of the system to disk. The script is called as "perl [url removed, login to view] [url removed, login to view]".
2) An analyzer script, [url removed, login to view], which loads the "trained state" from disk (generated earlier by the training script) and uses it to classify a given string. The script is called either as "perl [url removed, login to view] [url removed, login to view]", in which case it loads and classifies the given test file, or as "perl [url removed, login to view] "john smith"", in which case it simply classifies the string given on the command line ("john smith" in this case).
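The hand-off between the two scripts could be sketched as follows. This is only a minimal illustration, assuming the core Storable module for the "trained state"; the file name model.sto, the helper names, and the shape of the saved structure are hypothetical and not part of the spec.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

# Training side: persist whatever data structure the trainer built.
sub save_state {
    my ($model, $path) = @_;
    store($model, $path);
}

# Analyzer side: load the structure back from disk.
sub load_state {
    my ($path) = @_;
    return retrieve($path);
}

# Decide how the analyzer should treat its single argument:
# an existing file is a test file, anything else is a name to classify.
sub input_mode {
    my ($arg) = @_;
    return -f $arg ? 'file' : 'string';
}

# Demonstration in a temporary directory (hypothetical file name).
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/model.sto";
save_state({ classes => ['English', 'Japanese'], counts => {} }, $path);

my $model = load_state($path);
print join(',', @{ $model->{classes} }), "\n";
print input_mode($path), "\n";
print input_mode('john smith'), "\n";
```

Checking for an existing file first keeps both calling conventions behind one argument, at the cost of misreading a name that happens to match a file in the current directory.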
Attached is data.zip. It contains [url removed, login to view] and testing_data.txt. The data is in the format "name:class", where the name is base64 encoded.
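Reading one line of that format might look like the sketch below. It assumes the base64 payload decodes to UTF-8 bytes (as the name description suggests) and uses only core modules; the helper name parse_line and the sample line are made up for illustration.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64 encode_base64);
use Encode qw(decode encode);

# Parse one "name:class" line whose name part is base64-encoded UTF-8.
# Returns (name, class), or an empty list if the line has no separator.
sub parse_line {
    my ($line) = @_;
    $line =~ tr/\r\n//d;              # strip line endings
    my $sep = index($line, ':');      # ':' never occurs inside base64 text
    return if $sep < 0;
    my $name  = decode('UTF-8', decode_base64(substr($line, 0, $sep)));
    my $class = substr($line, $sep + 1);
    return ($name, $class);
}

# Hypothetical sample line, built here rather than taken from the data set.
my $sample = encode_base64(encode('UTF-8', 'John Smith'), '') . ":English\n";
my ($name, $class) = parse_line($sample);

binmode(STDOUT, ':encoding(UTF-8)');
print "$name => $class\n";
```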
Your system must be trainable with the given [url removed, login to view] such that it classifies [url removed, login to view] with 90% or better accuracy.
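The accuracy requirement can be checked as the fraction of test names whose predicted class equals the labelled class. A small sketch, where the stand-in classifier and the four test pairs are invented purely for illustration:

```perl
use strict;
use warnings;

# Fraction of [name, class] pairs for which the classifier
# (a code reference taking a name) returns the labelled class.
sub accuracy {
    my ($classify, $pairs) = @_;
    return 0 unless @$pairs;
    my $correct = grep { $classify->($_->[0]) eq $_->[1] } @$pairs;
    return $correct / @$pairs;
}

# Stand-in classifier for the demo only: always answers "English".
my $always_english = sub { return 'English' };

# Invented test pairs, not taken from the real data.
my @pairs = (
    ['john smith',     'English'],
    ['mary johnson',   'English'],
    ['peter brown',    'English'],
    ['hiromi akiyama', 'Japanese'],
);

my $acc = accuracy($always_english, \@pairs);
printf "accuracy: %.1f%%\n", 100 * $acc;
print $acc >= 0.90 ? "meets the 90% target\n" : "below the 90% target\n";
```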
Note: The solution must be training based, for example a Bayesian classifier, an n-gram analyzer, or some other form of machine learning. It must not be based on regular expressions or any fixed (human-written) set of detection rules.
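One way to combine two of the named options is a naive Bayes classifier over character n-grams. The sketch below is only an illustration of that idea, not a tuned solution: the trigram size, the add-one smoothing, and the four training names are assumptions, and no claim is made about the accuracy it would reach on the real data.

```perl
use strict;
use warnings;

# Character n-grams of a lowercased name, padded with one space per side.
sub ngrams {
    my ($name, $n) = @_;
    my $s = ' ' . lc($name) . ' ';
    return map { substr($s, $_, $n) } 0 .. length($s) - $n;
}

# Count names per class and n-grams per class from [name, class] pairs.
sub train {
    my ($pairs, $n) = @_;
    my %m = (n => $n, docs => {}, grams => {}, total => {}, vocab => {});
    for my $p (@$pairs) {
        my ($name, $class) = @$p;
        $m{docs}{$class}++;
        for my $g (ngrams($name, $n)) {
            $m{grams}{$class}{$g}++;
            $m{total}{$class}++;
            $m{vocab}{$g} = 1;
        }
    }
    return \%m;
}

# Pick the class with the highest log prior plus summed log likelihoods,
# using add-one (Laplace) smoothing for unseen n-grams.
sub classify {
    my ($m, $name) = @_;
    my $vocab_size = keys %{ $m->{vocab} };
    my $ndocs = 0;
    $ndocs += $_ for values %{ $m->{docs} };
    my ($best, $best_score);
    for my $class (sort keys %{ $m->{docs} }) {
        my $score = log($m->{docs}{$class} / $ndocs);
        for my $g (ngrams($name, $m->{n})) {
            my $count = $m->{grams}{$class}{$g} // 0;
            $score += log(($count + 1) / ($m->{total}{$class} + $vocab_size));
        }
        if (!defined $best_score || $score > $best_score) {
            ($best, $best_score) = ($class, $score);
        }
    }
    return $best;
}

# Toy training set, invented for the demo (the real data has 18 classes).
my $model = train([
    ['John Smith',       'English'],
    ['Mary Johnson',     'English'],
    ['Hiromi Akiyama',   'Japanese'],
    ['Takeshi Yamamoto', 'Japanese'],
], 3);

print "$_ => ", classify($model, $_), "\n" for 'james smithson', 'akira yamada';
```

Everything the classifier knows comes from the counts gathered in train, so the model hash is also what a training script would save to disk as its "trained state".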
You are free to use any existing free Perl code, libraries and modules, such as AI or data classifier libraries.
5 freelancers are bidding on average $156 for this job
Hi, I've checked the project spec. I can write a Perl script for you, for example using Algorithm::NaiveBayes, that predicts a person's ethnicity from the training data sets.