The download of ethernet.txt (external link: SourceForge.net, 12,683 bytes) will begin shortly. If it does not, click ethernet.txt (external link: SourceForge.net).

File information

File size
12,683 bytes
MD5
7a0d4741c3f4f7d05ee3e3bb5d130fe0

Project description

Automatic summarization produces a concise summary of a document. In this project, I present a statistical approach to the text generation problem in domain-independent, single-document summarization.
My thesis includes Salton's vector space model, which divides sentences into categories and can also be used to summarize the contents of web pages.

The summarizer first breaks the entire document into sentences, based on separator characters.
In the second step, unnecessary words (stop words) are removed from the document.
The document, with its stop words removed, is then processed again to find its unique words. Words that share the same root, and are therefore redundant variants of one another in the document, are merged by a method called stemming.
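
The three steps above can be sketched roughly as follows. This is only an illustration, not the project's actual code: the separator set, the tiny stop-word list, and the suffix-stripping rule in naive_stem are all assumptions made for the example, and a real system would more likely use an established stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "and", "to", "it"}

# Naive suffix list standing in for a real stemmer (assumption for illustration).
SUFFIXES = ("ing", "ed", "es", "s")


def split_sentences(text):
    """Split text into sentences on '.', '!' and '?' separators."""
    parts = re.split(r"[.!?]+", text)
    return [p.strip() for p in parts if p.strip()]


def remove_stop_words(sentence):
    """Lowercase a sentence, tokenize it, and drop stop words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Return one list of stemmed, non-stop-word tokens per sentence."""
    return [
        [naive_stem(w) for w in remove_stop_words(s)]
        for s in split_sentences(text)
    ]


if __name__ == "__main__":
    sample = "The summarizer splits documents. It is counting the words in the document!"
    for tokens in preprocess(sample):
        print(tokens)
```

With this sketch, "documents" and "document" end up as the same token, which is the effect stemming is meant to have on redundant word forms.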

After stemming, the occurrences of each word are counted, and the results are displayed as the number of times each word occurs and the number of sentences in which it appears.
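
As a minimal sketch of this counting step, the function below takes sentences that are assumed to be already tokenized, filtered, and stemmed, and reports both figures per word. The name word_statistics and the input format (a list of token lists) are hypothetical choices for this example, not taken from the project.

```python
from collections import Counter


def word_statistics(sentences):
    """Map each word to (total occurrences, number of sentences containing it).

    `sentences` is a list of token lists, one per sentence, assumed to be
    already stop-word-filtered and stemmed.
    """
    term_counts = Counter()
    sentence_counts = Counter()
    for tokens in sentences:
        term_counts.update(tokens)
        # A set counts each word at most once per sentence.
        sentence_counts.update(set(tokens))
    return {w: (term_counts[w], sentence_counts[w]) for w in term_counts}


if __name__ == "__main__":
    stemmed = [
        ["summarizer", "split", "document"],
        ["count", "word", "document", "word"],
    ]
    stats = word_statistics(stemmed)
    for word, (occurrences, n_sentences) in sorted(stats.items()):
        print(f"{word}: {occurrences} occurrence(s) in {n_sentences} sentence(s)")
```

Keeping the two counts separate matters because a word repeated many times in one sentence is different evidence from a word spread across many sentences.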