Text-Statistics-Latin version 0.06
==================================

ABSTRACT

This module performs statistical analysis (2) of corpora (1).

DESCRIPTION

Given a corpus as input, Text::Statistics::Latin creates a seven-column
CSV file as output, with one line for each token per text.

Names of input files must match the following pattern:

    '1 (1).txt', '1 (2).txt', ..., '1 (n).txt'

or, as a regular expression:

    1 \(([1-9]|[1-9][0-9]+)\)\.txt

The columns store the following statistical information:

(1) number of word forms in document d;
(2) number of tokens in d;
(3) ID number of d, i.e., the n of its file name '1 (n).txt';
(4) frequency of term t in d;
(5) corpus frequency of t;
(6) document frequency of t (number of documents where t occurs at
    least once);
(7) t, a UTF-8 encoded Latin token string, delimited by

        /[ -@]|[\[-`]|[{-¿]|[ɐ-˩]|[ʹ-�]/

The main output file is named '1 (n + 5).txt' and is stored in the same
directory as the corpus, together with residual files for each input
file, which carry the ad hoc extensions .txu and .txv.

This code was written under CAPES BEX-09323-5

Example:

    #!/usr/bin/perl
    use strict;
    use Text::CStatiBR;

    # 5 files are analysed.
    # The main output file created is '1 (10).txt'.
    &Text::CStatiBR::CSTATIBR("5");

INSTALLATION

To install this module type the following:

    perl Makefile.PL
    make
    make test
    make install

DEPENDENCIES

This module requires these other modules and libraries:

    utf8
    Text::ParseWords

SEE ALSO

    http://search.cpan.org/~ambs/
    http://search.cpan.org/~tpederse/

REFERENCES

(1) BERBER-SARDINHA, Tony. Linguistica de Corpus. Manole, 2004.
(2) http://www-csli.stanford.edu/~schuetze/information-retrieval-book.html

COPYRIGHT AND LICENCE

Copyright (C) 2007 by Rodrigo Panchiniak Fernandes

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.8 or,
at your option, any later version of Perl 5 you may have available.
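
ILLUSTRATIVE SKETCH

The file name pattern and the seven columns listed under DESCRIPTION can be illustrated with a minimal sketch. This is not the module's implementation: doc_id and stat_rows are hypothetical helpers written for this README, the corpus is passed in memory instead of being read from '1 (n).txt' files, and tokens are split on any run of non-letter characters, which is a simplifying assumption and not the delimiter expression given above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): returns the document
# number if $name matches the documented file name pattern, else undef.
sub doc_id {
    my ($name) = @_;
    return $name =~ /\A1 \(([1-9]|[1-9][0-9]+)\)\.txt\z/ ? $1 : undef;
}

# Hypothetical helper (not part of the module): builds one CSV line per
# term per document from a list of (document number => text) pairs.
# ASSUMPTION: tokens are separated by runs of non-letter characters.
sub stat_rows {
    my (%corpus) = @_;
    my (%tf, %cf, %df, %tokens);

    for my $d (keys %corpus) {
        for my $t (grep { length } split /\P{L}+/, $corpus{$d}) {
            $tf{$d}{$t}++;    # column 4: frequency of t in d
            $cf{$t}++;        # column 5: corpus frequency of t
            $tokens{$d}++;    # column 2: number of tokens in d
        }
        # column 6: count each document once per term it contains
        $df{$_}++ for keys %{ $tf{$d} || {} };
    }

    my @rows;
    for my $d (sort { $a <=> $b } keys %tf) {
        my $forms = keys %{ $tf{$d} };    # column 1: distinct word forms in d
        for my $t (sort keys %{ $tf{$d} }) {
            push @rows, join ',', $forms, $tokens{$d}, $d,
                                   $tf{$d}{$t}, $cf{$t}, $df{$t}, $t;
        }
    }
    return @rows;
}

# Two tiny in-memory documents, numbered 1 and 2.
print "$_\n" for stat_rows(1 => 'arma virumque cano', 2 => 'arma cano arma');
```

Each printed line follows the column order (1) to (7) described above, so the output of a real run could be compared field by field against a sketch like this one on a small corpus.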