data processing command line software

$30-100 USD

Completed
Posted over 15 years ago

Paid on delivery
This software will operate from the command line (bash preferred) and process a set of files in a directory. Each input file will be used to generate a modified output file. The input file consists of two columns of data. The second column needs to be parsed into 25 columns and output as a tab-delimited text file. In addition to parsing, the numbers in the column need to be de-scaled and de-normalized according to equations that will be provided. Statistics also need to be calculated by comparing each of the 25 output columns to a column in a reference file. All the columns in the reference file need to be included as the first columns in the output file. A statistic from the statistical processing will be concatenated onto the name of the input file to create the output file name.

The number of rows in each of the 25 columns can be determined by reading the reference file. The numerical values for the de-scale and de-normalize operations can be included in this file as well, as opposed to passing them in with the command line arguments. Sample files will be provided along with the .xls macro that is currently used for processing.

Software should be in C or C++, compiled under Cygwin GNU g++ using makefiles. A Java, Ruby, scripting language (Perl, Python, etc.) or database solution using H2 or another freeware database would also be acceptable. Feel free to make other suggestions.

## Deliverables

This software will be used to process the output of an artificial neural net. The neural net outputs a predicted value for a set of input patterns (rows) every 100 training epochs. This output is appended to a file, resulting in a single column of data. There are currently no separators between the sets of output (the column is continuous), although this could be changed if absolutely necessary. The software will convert this single column of data into multiple columns (one for each 100-epoch output) in spreadsheet format.
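The column-splitting step described above could be sketched roughly as follows, in Python (one of the scripting languages the posting allows). The function names `split_into_blocks` and `to_tab_delimited` are hypothetical, and rejecting a column whose length is not a multiple of the block size is an assumption; the posting does not say how a short final block should be handled.

```python
from typing import List


def split_into_blocks(values: List[float], block_size: int) -> List[List[float]]:
    """Split one continuous column into consecutive blocks of block_size rows.

    Each block becomes one output column (one 100-epoch snapshot).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Assumption: a partial trailing block is treated as an error.
    if len(values) % block_size != 0:
        raise ValueError("column length is not a multiple of the block size")
    return [values[i:i + block_size] for i in range(0, len(values), block_size)]


def to_tab_delimited(columns: List[List[float]]) -> str:
    """Lay the blocks side by side: one row per input pattern, tab-separated."""
    rows = zip(*columns)
    return "\n".join("\t".join(str(v) for v in row) for row in rows)


if __name__ == "__main__":
    # Toy data: 6 values with a block size of 3 gives 2 columns of 3 rows.
    column = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(to_tab_delimited(split_into_blocks(column, block_size=3)))
```

In the real tool the block size (571 in the sample) would be read from the reference file rather than passed as a literal.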
In the included example, column one of the output will be the first 571 rows, the next column will be rows 572-1142, etc. The software must read the block size (571 in this example) from the reference file so that any dimension can be processed. Currently there are 25 columns in the final output, but this number should be flexible.

Along with parsing the single column into multiple columns, some numeric manipulation is necessary. The numbers have been normalized (z-score) and scaled (from 0.2-0.8). This must be reversed. I will provide equations if you do not know how to do this. The equations may also be taken from the included spreadsheet.

After parsing, de-scaling and de-normalizing, statistics must be calculated by comparing the values in each column to values in the reference file. A squared correlation coefficient (r2) and a mean absolute error (MAE) will be calculated. Ask for equations if you need them. There are three subsets of data: train, cross validate and external validate. In the included example reference file, rows 2-422 are the train rows (T in the group column), rows 423-467 are the cross validate rows (S in the group column) and rows 468-572 are the external validate rows (V in the group column). Three separate sets of statistics need to be calculated. If you open the sample output file in Excel, you will see the statistics in the cells above each of the 25 columns of output, starting with column F for epoch 100. These statistics are calculated in the included spreadsheet [login to view URL]. The minimum EV-MAE value, in this case cell AD7, must be determined and concatenated with an underscore to the beginning of the output filename.

I think that C or C++ would provide good performance for this, as there could be a large number of files to process. Input to the command line should be the directory where files are to be processed (./ as default) and the name of the reference file.
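A minimal Python sketch of the numeric steps above, under several assumptions. The client's actual equations are not in the posting, so the code assumes a linear scaling of z-scores onto [0.2, 0.8] and a standard z-score; the parameters `z_min`, `z_max`, `mean` and `std` are hypothetical values that would come from the reference file. It also assumes r2 means the squared Pearson correlation, that EV-MAE is the MAE over the external validate (V) rows, and that the filename prefix uses four decimal places.

```python
from typing import Dict, Sequence


def descale(s: float, lo: float, hi: float, z_min: float, z_max: float) -> float:
    """Undo an assumed linear scaling of z-scores onto [lo, hi]."""
    return z_min + (s - lo) / (hi - lo) * (z_max - z_min)


def denormalize(z: float, mean: float, std: float) -> float:
    """Undo a z-score normalization: x = z * std + mean."""
    return z * std + mean


def mean_absolute_error(pred: Sequence[float], ref: Sequence[float]) -> float:
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def r_squared(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Squared Pearson correlation (assumed meaning of r2 in the posting)."""
    n = len(ref)
    mp, mr = sum(pred) / n, sum(ref) / n
    cov = sum((p - mp) * (r - mr) for p, r in zip(pred, ref))
    vp = sum((p - mp) ** 2 for p in pred)
    vr = sum((r - mr) ** 2 for r in ref)
    if vp == 0 or vr == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov * cov / (vp * vr)


def group_stats(pred: Sequence[float], ref: Sequence[float],
                groups: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Compute r2 and MAE separately for each group label (T, S, V)."""
    stats = {}
    for label in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == label]
        p = [pred[i] for i in idx]
        r = [ref[i] for i in idx]
        stats[label] = {"r2": r_squared(p, r), "mae": mean_absolute_error(p, r)}
    return stats


def output_name(input_name: str, ev_maes: Sequence[float]) -> str:
    """Prefix the input file name with the minimum EV-MAE and an underscore.

    The four-decimal format is an assumption, not taken from the posting.
    """
    return f"{min(ev_maes):.4f}_{input_name}"
```

Here `group_stats` would be called once per output column, and the `"mae"` values of the `V` group across all columns would be passed to `output_name`. The results should be checked against the sample output file before relying on any of these formulas.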
Other languages and solutions are permissible, such as Java, Ruby or a database application. All components should be freeware unless cleared in advance. Software needs to be compiled under Windows using Cygwin (Eclipse if it is a Java app). I am open to other suggestions, just let me know.

Thanks for considering my project,
LMH_medchemist

Sample Files:
sample file to be processed [login to view URL]
reference file to go with input file [login to view URL]
sample output file (output of software) [login to view URL]
.xls file and macro currently used for processing [login to view URL]
plots of final output (for context) [login to view URL]
Project ID: 3049342

About the project

3 proposals
Remote project
Active 16 yrs ago

Awarded to:
See private message.
$7 USD in 14 days
5.0 (116 reviews)
6.7
3 freelancers are bidding on average $59 USD for this job
See private message.
$85 USD in 14 days
4.9 (55 reviews)
5.0
See private message.
$85 USD in 14 days
5.0 (41 reviews)
4.8

About the client

Quincy, United States
5.0
39
Member since May 7, 2007
