Perl short-text classifier to guess a person's ethnicity from their name

Your job is to create a simple short-string classifier in Perl. The input to the system is a person's name, i.e. a UTF-8 encoded string usually 10 to 40 characters long, and the system determines which of the predefined classes the string belongs to. The classes are ethnic groups.

For example, if the input is "John Smith", the system should output the class "English"; if the input is "hiromi akiyama", it should output "Japanese". There are 18 different classes (ethnic groups).

The system has two parts:

1) A training script, [url removed, login to view], which trains the system on the given training data (a list of known "name = ethnicity" pairs) and saves the system's "trained state" to disk. The script is invoked as "perl [url removed, login to view] [url removed, login to view]".

2) An analyzer script, [url removed, login to view], which loads the "trained state" previously generated by the training script and uses it to determine which class a given string belongs to. The script can be invoked as "perl [url removed, login to view] [url removed, login to view]", in which case it loads the given test file, or as "perl [url removed, login to view] "john smith"", in which case it simply classifies the string given on the command line ("john smith" in this case).
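A minimal sketch of how the two scripts could share the "trained state", assuming the model is a plain Perl hash and using the core Storable module (nstore/retrieve) for persistence. The file name model.bin, the helpers save_model, load_model and inputs_from_arg, and the rule "treat the argument as a test file if such a file exists, otherwise as a name" are illustrative assumptions, not part of the spec. For the provided data files, each line would additionally need its base64 name decoded.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);
use File::Spec;

# Write any nested Perl data structure (here: a hypothetical hash of counts)
# to disk in network byte order, so the file is portable between machines.
sub save_model {
    my ($model, $path) = @_;
    nstore($model, $path) or die "cannot store model to $path\n";
}

# Read the structure back exactly as it was saved.
sub load_model {
    my ($path) = @_;
    my $model = retrieve($path) or die "cannot load model from $path\n";
    return $model;
}

# Hypothetical dispatch rule: an existing regular file is read line by line,
# anything else is treated as a single name typed on the command line.
sub inputs_from_arg {
    my ($arg) = @_;
    return ($arg) unless -f $arg;
    open my $fh, '<:encoding(UTF-8)', $arg or die "cannot open $arg: $!\n";
    chomp(my @lines = <$fh>);
    close $fh;
    return grep { length } @lines;    # skip empty lines
}

my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'model.bin');    # hypothetical name
save_model({ class_docs => { English => 2, Japanese => 1 } }, $path);
my $model = load_model($path);
print "English docs: $model->{class_docs}{English}\n";
print "inputs: ", join(', ', inputs_from_arg('john smith')), "\n";
```

Storable is a reasonable fit because the trained state is just nested hashes of counts; a real training script would call save_model once at the end and the analyzer would call load_model once at startup.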

An attachment is provided. It contains [url removed, login to view] and testing_data.txt. The data is in the format "name:class", where the name is base64-encoded.
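Because each line is "name:class" with a base64-encoded name, a loader first has to split the line and decode the name. A minimal sketch using the core MIME::Base64 and Encode modules, assuming the decoded bytes are UTF-8 (as stated for names above) and that the class label is plain text; parse_line is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);
use Encode qw(encode decode);

binmode STDOUT, ':encoding(UTF-8)';

# Parse one "name:class" line. The base64 alphabet (A-Z, a-z, 0-9, +, /, =)
# contains no ':', so the first colon separates the name from the class.
sub parse_line {
    my ($line) = @_;
    $line =~ tr/\r\n//d;                  # drop line endings (LF or CRLF)
    my $pos = index($line, ':');
    return if $pos < 0;                   # skip malformed lines
    my $b64   = substr($line, 0, $pos);
    my $class = substr($line, $pos + 1);
    # base64 -> raw bytes -> Perl character string (assumed UTF-8)
    my $name = decode('UTF-8', decode_base64($b64));
    return ($name, $class);
}

# Build a sample line the way the data files are described.
my $sample = encode_base64(encode('UTF-8', 'hiromi akiyama'), '') . ':Japanese';
my ($name, $class) = parse_line($sample);
print "$name => $class\n";
```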

After being trained on the given [url removed, login to view], your system must classify [url removed, login to view] with 90% or better accuracy.
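Accuracy here is presumably the share of test names whose predicted class matches the label in the test file. A small helper to check the 90% target, assuming the analyzer collects [predicted, actual] pairs; the sample predictions are made up purely for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Accuracy = correctly classified test names / all test names.
# Input: array ref of [predicted_class, true_class] pairs.
sub accuracy {
    my ($pairs) = @_;
    return 0 unless @$pairs;
    my $correct = grep { $_->[0] eq $_->[1] } @$pairs;   # count of matches
    return $correct / scalar @$pairs;
}

# Hypothetical predictions: 9 correct, 1 wrong.
my @pairs = ((['English', 'English']) x 9, ['Japanese', 'English']);
my $acc = accuracy(\@pairs);
printf "accuracy: %.1f%% (%s)\n", 100 * $acc,
    $acc >= 0.9 ? 'meets target' : 'below target';
```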

Note: the solution must be training-based, for example a Bayesian classifier, an n-gram analyzer, or some other form of artificial intelligence or machine learning. It must not be based on regular expressions or on a fixed (human-written) set of detection rules.
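One training-based approach that fits these rules is a multinomial naive Bayes classifier over character n-grams: training only counts n-grams per class, so no hand-written detection rules are involved. The sketch below is hand-rolled for illustration and is not Algorithm::NaiveBayes or any other library; the n-gram range (1 to 3), the '^'/'$' boundary markers, add-one smoothing and the toy training names are all assumptions, and whether it reaches 90% on the real data would have to be measured.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical choice: use character 1-, 2- and 3-grams.
my $N = 3;

# Character n-grams of a name, padded with '^' and '$' so that the start
# and end of the name become features of their own.
sub ngrams {
    my ($name) = @_;
    my $s = '^' . lc($name) . '$';
    my @grams;
    for my $n (1 .. $N) {
        push @grams, substr($s, $_, $n) for 0 .. length($s) - $n;
    }
    return @grams;
}

# Train: count names per class and n-gram occurrences per class.
sub train {
    my (@examples) = @_;                   # list of [name, class]
    my %m = (docs => 0);
    for my $ex (@examples) {
        my ($name, $class) = @$ex;
        $m{docs}++;
        $m{class_docs}{$class}++;
        for my $g (ngrams($name)) {
            $m{count}{$class}{$g}++;
            $m{total}{$class}++;
            $m{vocab}{$g} = 1;
        }
    }
    return \%m;
}

# Classify: pick the class maximizing log P(class) + sum of log P(gram|class),
# with add-one (Laplace) smoothing over the training vocabulary.
sub classify {
    my ($m, $name) = @_;
    my $V = keys %{ $m->{vocab} };
    my ($best, $best_score);
    for my $class (sort keys %{ $m->{class_docs} }) {
        my $score = log($m->{class_docs}{$class} / $m->{docs});
        my $denom = $m->{total}{$class} + $V;
        for my $g (ngrams($name)) {
            my $c = $m->{count}{$class}{$g} // 0;
            $score += log(($c + 1) / $denom);
        }
        ($best, $best_score) = ($class, $score)
            if !defined $best_score || $score > $best_score;
    }
    return $best;
}

# Toy data for illustration only; not taken from the real training file.
my $model = train(
    ['john smith',     'English'],  ['william brown',    'English'],
    ['hiromi akiyama', 'Japanese'], ['takeshi yamamoto', 'Japanese'],
);
print classify($model, 'kenji nakamura'), "\n";
```

Working in log space avoids floating-point underflow when many small probabilities are multiplied, and the model is just nested hashes of counts, so it can be saved and loaded as a whole.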

You are free to use any existing free Perl code, libraries and modules, such as AI or data classifier libraries.

Skills: Perl


About the Employer:
(601 reviews) Turku, Thailand

Project ID: #16538057

Awarded to:


Hi, I am quite an experienced programmer and know several programming languages. Your project is interesting. In the past I studied Computer Science, and AI is a topic I like to think about. Unfortunately t…

$165 USD in 10 days
(2 Reviews)

5 freelancers are bidding on average $156 for this job


Hi, I've checked the project spec. I can write a Perl script for you that uses, for example, Algorithm::NaiveBayes to predict a person's ethnicity from the training data sets.

$155 USD in 3 days
(49 Reviews)
$155 USD in 3 days
(0 Reviews)
$150 USD in 3 days
(0 Reviews)

I would like to work on this project, as I have enough experience in Perl. If interested, please let me know.

$155 USD in 3 days
(0 Reviews)