auto-summarization tool

$35-50 USD

Closed
Posted over 12 years ago

Paid on delivery
Abstract of the project

Auto-summarization is a technique for generating summaries of electronic documents. Its applications include summarizing search-engine results and providing briefs of large documents that have no abstract. There are two categories of summarizers: linguistic and statistical. Linguistic summarizers use knowledge about the language (syntax, semantics, usage etc.) to summarize a document. Statistical summarizers find the important sentences using statistical methods (such as the frequency of a particular word) and normally do not use any linguistic information.

In this project, an auto-summarization tool is developed using statistical techniques. The techniques involve finding the frequency of words, scoring the sentences and ranking the sentences. The summary is obtained by selecting a number of sentences (specified by the user) from the top of the ranked list. The tool operates on a single document (but can be made to work on multiple documents by choosing proper algorithms for integration) and provides a summary of that document. The size of the summary can be specified by the user when invoking the tool. Pre-processing interfaces handle the following document types: Plain Text, HTML, Word Document.

Keywords

Generic technology keywords: Algorithm, Programming
Specific technology keywords: C, C++, Java, C-Sharp
Project type keywords: Statistics, User Interface

Functional components of the project

The following is a list of the functional components of the tool.

1. Text pre-processor. This works on HTML or Word documents and converts them to plain text for processing by the rest of the system.

2. Sentence separator. This goes through the document and separates the sentences based on some rules (for example, a sentence ending is determined by a dot followed by a space). Any other appropriate criteria might also be added to separate the sentences.

3. Word separator. This separates the words based on some criteria (for example, a space denotes the end of a word).

4. Stop-words eliminator. This eliminates common English words such as ‘a, an, the, of, from’ before further processing. These words are known as ‘stop-words’. A list of applicable stop-words for English is available on the Internet.

5. Word-frequency calculator. This calculates the number of times a word appears in the document (stop-words have already been eliminated and do not figure in this calculation) and also the number of sentences in which that word appears. For example, the word ‘Unix’ may appear a total of 100 times in a document, spread over 80 sentences (some sentences have more than one occurrence of the word). Min-max thresholds can be set for the frequencies (the thresholds to be determined by trial and error).

6. Scoring algorithm. This algorithm determines the score of each sentence. Several possibilities exist. The score can be made proportional to the sum of the frequencies of the words comprising the sentence (i.e., if a sentence has 3 words A, B and C, then the score is proportional to the sum of how many times A, B and C have occurred in the document). The score can also be made inversely proportional to the number of sentences in which the words of the sentence appear in the document. Likewise, many such heuristic rules can be applied to score the sentences.

7. Ranking. The sentences are ranked according to their scores. Other criteria, such as the position of a sentence in the document, can be used to control the ranking. For example, even when their scores are high, consecutive sentences would not be placed together in the summary.

8. Summarizing. Based on the user input on the size of the summary, the sentences are picked from the ranked list and concatenated. The resulting summary file could be stored with a name like _summary.txt.

9. User Interface. The tool could use a GUI or a plain command-line interface. In either case, it should have
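
As an illustration of components 2 through 8, the statistical pipeline might be sketched in Python as follows. This is only a minimal sketch under stated assumptions, not the tool the posting asks for: the function names, the tiny stop-word list, the regular-expression splitting rules and the choice to print the selected sentences in their original document order are all assumptions made for the example. It uses the first scoring rule described above (sum of word frequencies) and computes the per-sentence counts without using them in the score.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "of", "from", "is", "in", "and", "to", "it"}


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (a simple rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_words(sentence):
    """Lowercase the sentence and extract runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def content_words(sentence):
    """Words of the sentence with stop-words removed."""
    return [w for w in split_words(sentence) if w not in STOP_WORDS]


def word_frequencies(sentences):
    """Return two counters: total occurrences per word, and the number of
    sentences each word appears in."""
    term_freq = Counter()
    sentence_freq = Counter()
    for s in sentences:
        words = content_words(s)
        term_freq.update(words)
        sentence_freq.update(set(words))  # count each word once per sentence
    return term_freq, sentence_freq


def score_sentence(sentence, term_freq):
    """Score = sum of the document-wide frequencies of the sentence's words."""
    return sum(term_freq[w] for w in content_words(sentence))


def summarize(text, num_sentences):
    """Pick the top-scoring sentences and join them into a summary."""
    sentences = split_sentences(text)
    term_freq, _ = word_frequencies(sentences)
    # Rank sentence indices by score, highest first; ties keep document order.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score_sentence(sentences[i], term_freq),
        reverse=True,
    )
    # Assumption: emit the chosen sentences in their original order.
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Calling summarize(text, 2) returns the two highest-scoring sentences of text. The sentence counts returned by word_frequencies could be used to try the inverse-proportional rule from component 6, and a rule such as skipping consecutive sentences (component 7) could be added where the ranked list is cut.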
Project ID: 1306206

About the project

2 proposals
Remote project
Active 12 yrs ago

2 freelancers are bidding on average $98 USD for this job
Hi, I can deliver the required code. Please see PMB for further details.
$45 USD in 1 day
5.0 (4 reviews)
3.1
HELLO SIR PLEASE CHECK PMB
$150 USD in 2 days
0.0 (0 reviews)
0.0

About the client

Chennai, India
0.0
0
Member since Jan 16, 2008
