Fortran clustering for making groups of related terms

$250-750 USD

Closed
Posted over 10 years ago

Paid on delivery
Write custom clustering code for making groups of related terms. Note that this is the same as this project: https://www.freelancer.com/projects/C-Programming-Algorithm/Custom-clustering-code-for-making.html but, for several reasons, I prefer it written in Fortran. Please bid on this if you have strong experience with both Fortran and clustering algorithms.

SOURCE: The source data will be a set of N-dimensional vectors, where the N dimensions are words that often appear in the same paragraphs as other words. The inputs are topics generated from a proprietary corpus using latent Dirichlet allocation (LDA). We currently have a dozen vectors (each vector is one LDA topic), and N ≈ 300. We have a simple file format delimited with newlines, "|" and ";".

OUTPUT: The code should be in Fortran. You will probably use a Group Average Agglomerative Clusterer. We used Python NLTK as a proof of concept and had preliminary success; you can see our simple Python code. There will also be additional weighting information, because we have data about the pairwise weights between some of the N words. The algorithm is intended to make the degree of clustering depend on the initial similarity of the clusters.

There are five tightly related tasks:
1) Write compiled code for merging our source vectors. The result will be analogous to our Python NLTK sample.
2) Add the weighting information we provide. (We have weighting scores for some of the N terms, which will make any cluster they are in more or less important.) Specifically, we have 100 themes, for example "sports" and "food". We know that the word "apple" has a high weight for the "food" theme and a low weight for the "sports" theme. Therefore a cluster containing [apple, THEME:sports] would be weighted lower than a cluster containing [apple, THEME:food].
3) Adjust similarities for a subset M of the N terms so that they are less likely to be combined.
For example, if M = [orange, apple], then the two sets [orange, banana] and [pear, apple] would be considered more distant. (Note: the subset M is the same as the THEMES in #2.) The pairs in M do not all have the same relationship; some are negative and some positive, e.g., food:sports = -1 but computer:science = 0.8. We will provide a list.
4) Add information from an additional set of W vectors. These vectors are sets of terms extracted from Wikipedia. For example, a vector in W could be all the outgoing links from a Wikipedia article, weighted higher the closer they are to the start of the article.
5) Filter the results to omit stopwords (a list will be provided), irrelevant parts of speech (TBD), duplicates (i.e., no word should appear in more than one final cluster), and low-probability groups.

The output will be a list of potentially related terms.
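Task 1 can be illustrated with a small Python sketch (Python only because the proof of concept used NLTK; the deliverable itself would be Fortran). It assumes a hypothetical layout for the "|"/";" file: one topic per line, terms separated by "|", each term written as word;weight. The clustering is plain group-average linkage (mean cosine similarity over all cross-cluster pairs) with a similarity threshold as the stopping rule. It is not a copy of NLTK's GAAClusterer, and the names parse_topics, cosine, group_average_cluster and threshold are illustrative.

```python
import math
from itertools import combinations


def parse_topics(text):
    """Parse topics in a HYPOTHETICAL layout: one topic per line,
    terms separated by '|', each term written as 'word;weight'."""
    topics = []
    for line in text.strip().splitlines():
        vec = {}
        for entry in line.split("|"):
            word, weight = entry.split(";")
            vec[word.strip()] = float(weight)
        topics.append(vec)
    return topics


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def group_average_cluster(vectors, threshold):
    """Greedy agglomerative clustering with group-average linkage.

    Cluster-to-cluster similarity is the mean cosine similarity over all
    cross-cluster pairs. Merging stops when the best pair falls below
    `threshold`, so the number of clusters depends on the data."""
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best, pair = -math.inf, None
        for a, b in combinations(range(len(clusters)), 2):
            ca, cb = clusters[a], clusters[b]
            avg = sum(sim[i][j] for i in ca for j in cb) / (len(ca) * len(cb))
            if avg > best:
                best, pair = avg, (a, b)
        if best < threshold:
            break
        a, b = pair  # a < b, so deleting b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


if __name__ == "__main__":
    sample = """apple;0.9|banana;0.7|pear;0.4
apple;0.8|banana;0.6|orange;0.5
goal;0.9|team;0.8|score;0.6"""
    print(group_average_cluster(parse_topics(sample), threshold=0.5))
    # prints [[0, 1], [2]]
```

Because merging stops once the best group-average similarity drops below the threshold, the number of clusters follows from how similar the topics are to begin with, which is one way to read the requirement that the degree of clustering depend on initial similarity. A Fortran port would map naturally onto a dense similarity matrix over the topic vectors.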
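Tasks 2 and 3 could be layered on top of a base similarity, as in the following sketch. Everything here is an assumption rather than the posting's specification: theme scores are stored as theme_scores[theme][word], a cluster's theme weight is the mean of the known scores for its (THEME:, word) pairs, and the task 3 adjustment is additive (alpha times the mean relation between distinct M terms across the two clusters), with unlisted pairs defaulting to -1 so that, as in the orange/apple example, different M terms push clusters apart. The relation values are the two examples from the posting; the function names are hypothetical.

```python
from itertools import product

# Relation table for terms in M (task 3). Values are the posting's examples;
# the dict layout itself is a hypothetical choice.
RELATIONS = {("food", "sports"): -1.0, ("computer", "science"): 0.8}


def theme_weight(cluster, theme_scores, neutral=1.0):
    """Task 2 sketch: mean score of the cluster's words for the cluster's
    'THEME:' tokens. theme_scores[theme][word] -> score (assumed layout).
    Returns `neutral` when no (theme, word) pair has a known score."""
    themes = [t.split(":", 1)[1] for t in cluster if t.startswith("THEME:")]
    words = [t for t in cluster if not t.startswith("THEME:")]
    scores = [theme_scores[th][w] for th in themes for w in words
              if w in theme_scores.get(th, {})]
    return sum(scores) / len(scores) if scores else neutral


def relation(a, b, relations, default):
    """Symmetric lookup of the relation between two M terms."""
    return relations.get((a, b), relations.get((b, a), default))


def adjusted_similarity(base_sim, cluster_a, cluster_b, m_terms,
                        relations, alpha=0.5, default=-1.0):
    """Task 3 sketch: shift a cluster-pair similarity using M-term relations.

    Assumed rule: add alpha times the mean relation over cross-cluster pairs
    of *different* M terms; pairs missing from the table use `default`."""
    ma = [t for t in cluster_a if t in m_terms]
    mb = [t for t in cluster_b if t in m_terms]
    pairs = [(a, b) for a, b in product(ma, mb) if a != b]
    if not pairs:
        return base_sim
    mean_rel = sum(relation(a, b, relations, default) for a, b in pairs) / len(pairs)
    return base_sim + alpha * mean_rel
```

With theme_scores = {"food": {"apple": 0.9}, "sports": {"apple": 0.1}}, the cluster [apple, THEME:food] gets weight 0.9 and [apple, THEME:sports] gets 0.1, matching the ordering the posting asks for. Likewise [orange, banana] and [pear, apple] with M = {orange, apple} and a base similarity of 0.6 drop to about 0.1.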
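For task 4, one possible way to turn a Wikipedia article into a W vector is to weight its outgoing links by their position on the page. The decay formula 1 / (1 + decay * k) and the function name wikipedia_vector are assumptions; the posting only says that links closer to the start of the article should weigh more.

```python
def wikipedia_vector(links, decay=0.1):
    """Turn an article's outgoing links (in page order) into a sparse vector.

    Assumed scheme: the k-th link (0-based) gets weight 1 / (1 + decay * k),
    so links nearer the start of the article weigh more. A link that appears
    several times keeps its earliest (highest) weight."""
    vec = {}
    for k, title in enumerate(links):
        weight = 1.0 / (1.0 + decay * k)
        if title not in vec:
            vec[title] = weight
    return vec


print(wikipedia_vector(["Fruit", "Apple", "Fruit"], decay=1.0))
# prints {'Fruit': 1.0, 'Apple': 0.5}
```

The result is a sparse word-to-weight dict, so it could be clustered alongside the LDA topic vectors or used as an extra similarity signal; the posting does not say which.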
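Task 5's filtering could look roughly like this. It assumes each cluster is a dict of word to weight, resolves duplicates by keeping a word only in the cluster where its weight is highest, and treats a total cluster weight below min_score as "low probability"; both rules and the name filter_clusters are illustrative stand-ins. Part-of-speech filtering is left out because the posting marks it TBD.

```python
def filter_clusters(clusters, stopwords, min_score):
    """Post-process clusters of {word: weight} dicts (task 5 sketch).

    1. Drop stopwords.
    2. Keep each word only in the cluster where its weight is highest
       (ties go to the earlier cluster), so no word appears in more than
       one final cluster.
    3. Drop clusters whose remaining total weight is below `min_score`
       (an assumed stand-in for "low-probability groups")."""
    best_home = {}  # word -> (cluster index, weight) of its best cluster
    for idx, cluster in enumerate(clusters):
        for word, weight in cluster.items():
            if word in stopwords:
                continue
            if word not in best_home or weight > best_home[word][1]:
                best_home[word] = (idx, weight)
    result = []
    for idx, cluster in enumerate(clusters):
        kept = {w: wt for w, wt in cluster.items()
                if w in best_home and best_home[w][0] == idx}
        if kept and sum(kept.values()) >= min_score:
            result.append(kept)
    return result
```

For example, with clusters [{"apple": 0.9, "the": 0.5, "banana": 0.4}, {"apple": 0.3, "pear": 0.2}, {"goal": 0.8, "team": 0.7}], stopwords {"the"} and min_score 0.5, the second cluster loses "apple" to the first and is then dropped, leaving [{"apple": 0.9, "banana": 0.4}, {"goal": 0.8, "team": 0.7}].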
Project ID: 4758928

About the project

5 proposals
Remote project
Active 11 yrs ago

5 freelancers are bidding on average $775 USD for this job
Hello, expert in Fortran programming here. I also have some experience with Natural Language Processing. Thanks, Paul
$1,600 USD in 21 days
5.0 (123 reviews)
6.9
Should the source code be in Fortran 90 or earlier?
$400 USD in 10 days
4.1 (12 reviews)
4.2
Hi. Read in private please. Thanks.
$400 USD in 7 days
4.7 (8 reviews)
3.8
Contact me for further details.
$444 USD in 15 days
0.0 (0 reviews)
0.0

About the client

Rockville, United States
5.0
83
Payment method verified
Member since Jun 26, 2010

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)