InfoSequenceFiles.pl - List information about sequence and alignment files
InfoSequenceFiles.pl SequenceFile(s) AlignmentFile(s)...
InfoSequenceFiles.pl [-a, --all] [-c, --count] [-d, --detail infolevel] [-f, --frequency] [--FrequencyBins number | "number, number, [number,...]"] [-h, --help] [-i, --IgnoreGaps yes | no] [-l, --longest] [-s, --shortest] [--SequenceLengths] [-w, --workingdir dirname] SequenceFile(s)...
List information about the contents of SequenceFile(s) and AlignmentFile(s): number of sequences, shortest and longest sequences, distribution of sequence lengths, and so on. The file names are separated by spaces. All the sequence files in the current directory can be specified by *.aln, *.msf, *.fasta, *.fta, *.pir, or any other supported format; additionally, DirName corresponds to all the sequence files in the current directory with any of the supported file extensions: .aln, .msf, .fasta, .fta, and .pir.
Supported sequence formats are: ALN/ClustalW, GCG/MSF, PILEUP/MSF, Pearson/FASTA, and NBRF/PIR. Instead of using file extensions, file formats are detected by parsing the contents of SequenceFile(s) and AlignmentFile(s).
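Content-based format detection can be illustrated with a short Python sketch. This is not the detection code used by InfoSequenceFiles.pl; the function name guess_sequence_format and the specific heuristics (a leading "CLUSTAL" header, a ">XX;" PIR entry line, a ">" FASTA header, an "MSF:" header line plus a "//" divider) are assumptions based on the usual layout of these formats.

```python
import re


def guess_sequence_format(text):
    """Guess a sequence/alignment format from file contents (heuristic sketch).

    Hypothetical helper for illustration only; returns one of
    "ALN", "PIR", "FASTA", "MSF", or None when nothing matches.
    """
    # Work on non-empty, whitespace-trimmed lines.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return None
    first = lines[0]

    # ClustalW alignments begin with a header line starting with "CLUSTAL".
    if first.upper().startswith("CLUSTAL"):
        return "ALN"

    # NBRF/PIR entries begin with ">" + two-character type code + ";",
    # for example ">P1;". This must be checked before the FASTA rule.
    if re.match(r"^>[A-Za-z0-9]{2};", first):
        return "PIR"

    # Pearson/FASTA entries begin with ">" followed by an identifier.
    if first.startswith(">"):
        return "FASTA"

    # GCG/MSF and PILEUP/MSF files have a header line containing "MSF:"
    # and a "//" line separating the header from the alignment.
    if any("MSF:" in line for line in lines) and "//" in lines:
        return "MSF"

    return None
```

The PIR test precedes the FASTA test because both start with ">"; a FASTA identifier that happens to look like ">ab;..." would be misclassified by this sketch.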
This option is ignored for input files containing only a single sequence.
The bin range list is used to group sequence lengths into different groups; it must contain values in ascending order. Examples:
The frequency value calculated for a specific bin corresponds to all the sequence lengths which are greater than the previous bin value and less than or equal to the current bin value.
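The binning rule above (greater than the previous bin value, less than or equal to the current one) can be sketched in Python as follows. This is an illustration, not the script's own code: the helper names strip_gaps and bin_frequencies are hypothetical, treating "-" and "." as gap characters is an assumption, and so is the choice to count lengths above the last bin value in a separate overflow tally.

```python
from bisect import bisect_left


def strip_gaps(sequence, gap_chars="-."):
    """Remove gap characters so the length reflects residues only.

    The default gap characters are an assumption for this sketch.
    """
    return "".join(ch for ch in sequence if ch not in gap_chars)


def bin_frequencies(lengths, bin_values):
    """Count lengths per bin, where bin i holds prev < length <= bin_values[i].

    Returns (counts, overflow); overflow counts lengths greater than the
    last bin value (an assumption, since the source does not say how
    such lengths are reported).
    """
    bin_values = list(bin_values)
    if bin_values != sorted(bin_values):
        raise ValueError("bin values must be in ascending order")

    counts = [0] * len(bin_values)
    overflow = 0
    for length in lengths:
        # bisect_left returns the first index whose bin value is >= length,
        # which is exactly the bin with prev < length <= current.
        i = bisect_left(bin_values, length)
        if i < len(bin_values):
            counts[i] += 1
        else:
            overflow += 1
    return counts, overflow


# Example: lengths of 100 and 200 fall in the bins they equal, not the next.
lengths = [50, 100, 101, 200, 250, 301]
counts, overflow = bin_frequencies(lengths, [100, 200, 300])
print(counts, overflow)  # [2, 2, 1] 1
```

With gapped alignment rows, the lengths passed in would be computed as len(strip_gaps(row)) when gaps are to be ignored.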
To count the number of sequences in sequence files, type:
To list all available information at the maximum level of detail for the sequence alignment file Sample1.msf, type:
To list sequence length information after ignoring sequence gaps in the Sample1.aln file, type:
To list the shortest and longest sequence length information after ignoring sequence gaps in the Sample1.aln file, type:
To list the distribution of sequence lengths after ignoring sequence gaps in the Sample1.aln file and report the frequency distribution in 10 bins, type:
To list the distribution of sequence lengths after ignoring sequence gaps in the Sample1.aln file and report the frequency distribution using a specified bin range, type:
See also: AnalyzeSequenceFilesData.pl, ExtractFromSequenceFiles.pl, InfoAminoAcids.pl, InfoNucleicAcids.pl
Copyright (C) 2004-2012 Manish Sud. All rights reserved.
This file is part of MayaChemTools.
MayaChemTools is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.