NAME
InfoFingerprintsTextFiles.pl - List information about fingerprints data in TextFile(s)
SYNOPSIS
InfoFingerprintsTextFiles.pl TextFile(s)...
InfoFingerprintsTextFiles.pl [-a, --all] [--AverageBitDensity] [--BitDensity]
[--count] [-c, --ColMode ColNum | ColLabel] [--DataCheck]
[-d, --detail InfoLevel] [-e, --empty] [--FingerprintsCol col number | col name]
[--FingerprintsType] [--FingerprintsDescription] [--FingerprintsSize]
[--FingerprintsBitStringFormat] [--FingerprintsBitOrder]
[--FingerprintsVectorValuesType] [--FingerprintsVectorValuesFormat]
[-h, --help] [--InDelim comma | semicolon]
[--NumOfOnBits] [--NumOfNonZeroValues]
[-w, --WorkingDir dirname] TextFile(s)...
DESCRIPTION
List information about fingerprints data in TextFile(s): number of rows containing
fingerprints data, type of fingerprints vector, description and size of fingerprints, bit density
and average bit density for bit-vector fingerprints strings, and so on.
The valid file extensions are .csv for comma or semicolon delimited text files and .tsv
for tab delimited text files. All other file names are ignored. All the text files in the
current directory can be specified by *.csv, *.tsv, or the current directory
name. The --InDelim option determines the format of TextFile(s). Any file
which doesn't correspond to the format indicated by the --InDelim option is ignored.
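The file selection rule above can be sketched in Python as follows. This is only an illustration and not the script's actual code: the helper names delimiter_for and select_text_files are hypothetical, extension matching is assumed to be case-insensitive, and no check of the file contents against the chosen delimiter is attempted.

```python
import os

# Delimiter characters for the two --InDelim values named in this document.
DELIMS = {"comma": ",", "semicolon": ";"}

def delimiter_for(filename, in_delim="comma"):
    """Return the delimiter implied by the file extension, or None to skip."""
    ext = os.path.splitext(filename)[1].lower()  # assumption: case-insensitive
    if ext == ".csv":
        return DELIMS[in_delim.lower()]
    if ext == ".tsv":
        return "\t"  # --InDelim is ignored for TSV files
    return None  # any other extension: the file is ignored

def select_text_files(filenames, in_delim="comma"):
    """Keep only files with a recognized extension, paired with a delimiter."""
    selected = []
    for name in filenames:
        delim = delimiter_for(name, in_delim)
        if delim is not None:
            selected.append((name, delim))
    return selected
```

For example, select_text_files(["a.csv", "b.txt", "c.tsv"]) keeps a.csv and c.tsv and drops b.txt.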
The format of fingerprints string data in TextFile(s) is detected automatically. The current release
of MayaChemTools supports the following types of fingerprints bit-vector and vector strings:
FingerprintsBitVector;PathLengthBits:AtomicInvariantsAtomTypes;1024;
HexadecimalString;Ascending;00000000000000000000000040000000000000
000000000000000000000000000020200000000000000000004000000000000...
FingerprintsBitVector;PathLengthBits:AtomicInvariantsAtomTypes;1024;
BinaryString;Ascending;0000000000000000000000000000000000000000000
000000000000001000000001000000000010000000000000000000010000000...
FingerprintsVector;PathLengthCount:AtomicInvariantsAtomTypes;27;
NumericalValues;IDsAndValuesPairsString;C 8 O 1 C:C 8 C:O 2 C:C:C 9
C:C:O 3 C:O:C 1 C:C:C:C 10 C:C:C:O 4 C:C:O:C 3 C:C:C:C:C 10 ...
FingerprintsBitVector;MACCSKeyBits;166;BinaryString;Ascending;000000000
000000000000000000000000000000000000000000000000001000000000000
000000000010000000000001001000000000000000000001000000000000000...
FingerprintsBitVector;MACCSKeyBits;166;HexadecimalString;Ascending;0000
002000002010008040084010080100902805e1
FingerprintsBitVector;MACCSKeyBits;322;BinaryString;Ascending;1100000000
0000001000001000010011000001100000001000000000000000101000000000
0000000000000000000000000000000000000000000000000000100000000000...
FingerprintsBitVector;MACCSKeyBits;322;HexadecimalString;Ascending;3001
48c060400041000000000000000100000000000000000000000000000000500
000000000000000
FingerprintsVector;MACCSKeyCount;166;OrderedNumericalValues;ValuesString;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
FingerprintsVector;MACCSKeyCount;322;OrderedNumericalValues;ValuesString;
2 1 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 1 0 0 0 0 1 0 0 7 1 0 0 0 0 0 2 1 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 2 0 0 0 0 0 0 0 0 ...
FingerprintsVector;ExtendedConnectivity:AtomicInvariantsAtomTypes;14;
AlphaNumericalValues;ValuesString;333564680 1142173602 14814699391
977749791 2006158649 291020918 443330853 692611812 816539344173
1657806 2039728782 931045615 1273931663 1317501190
FingerprintsVector;ExtendedConnectivity:FunctionalClassAtomTypes;11;
AlphaNumericalValues;ValuesString;862102353 981185303 12517955598
10600886 885767127 1452087973 1878436093 2029559552 1465773182
1530666307 2113761516
FingerprintsVector;TopologicalAtomPairs:AtomicInvariantsAtomTypes;23;
NumericalValues;IDsAndValuesString;C.X1.BO1.H3-D1-C.X3.BO4 C.X2.BO3.H1-
D1-C.X2.BO3.H1 C.X2.BO3.H1-D1-C.X3.BO4 C.X2.BO3.H1-D1-N.X2.BO2.H1
C.X3.BO4-D1-C.X3.BO4 C.X3.BO4-D1-O.X1.BO2 C.X1.BO1.H3-D2-C.X2.BO3.H1
C.X1.BO1.H3-D2-C.X3.BO4...; 1 1 2 2 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 2 1 1 1
FingerprintsVector;TopologicalAtomPairs:AtomicInvariantsAtomTypes;23;
NumericalValues;IDsAndValuesPairsString;C.X1.BO1.H3-D1-C.X3.BO4 1
C.X2.BO3.H1-D1-C.X2.BO3.H1 1 C.X2.BO3.H1-D1-C.X3.BO4 2 C.X2.BO3.H1-
D1-N.X2.BO2.H1 2 C.X3.BO4-D1-C.X3.BO4 1 C.X3.BO4-D1-O.X1.BO2 1
C.X1.BO1.H3-D2-C.X2.BO3.H1 1 C.X1.BO1.H3-D2-C.X3.BO4 1
C.X2.BO3.H1-D2-C.X2.BO3.H1 1 C.X2.BO3.H1-D2-C.X3.BO4 3...
FingerprintsVector;TopologicalAtomTorsions:AtomicInvariantsAtomTypes;11;
NumericalValues;IDsAndValuesString;C.X1.BO1.H3-C.X3.BO4-C.X2.BO3.H1-
N.X2.BO2.H1 C.X1.BO1.H3-C.X3.BO4-C.X3.BO4-C.X2.BO3.H1 C.X1.BO1.H3-
C.X3.BO4-C.X3.BO4-O.X1.BO2 C.X2.BO3.H1-C.X2.BO3.H1-C.X3.BO4-C.X3.BO4
C.X2.BO3.H1-C.X2.BO3.H1-C.X3.BO4-O.X1.BO2...;
1 1 1 1 1 1 1 1 1 1 1
FingerprintsVector;TopologicalAtomTorsions:AtomicInvariantsAtomTypes;11;
NumericalValues;IDsAndValuesPairsString;C.X1.BO1.H3-C.X3.BO4-C.X2.BO3.H1-
N.X2.BO2.H1 1 C.X1.BO1.H3-C.X3.BO4-C.X3.BO4-C.X2.BO3.H1 1 C.X1.BO1.H3-
C.X3.BO4-C.X3.BO4-O.X1.BO2 1 C.X2.BO3.H1-C.X2.BO3.H1-C.X3.BO4-
C.X3.BO4 1 C.X2.BO3.H1-C.X2.BO3.H1-C.X3.BO4-O.X1.BO2 1 C.X2.BO3.H1-
C.X2.BO3.H1-N.X2.BO2.H1-C.X2.BO3.H1 1 C.X2.BO3.H1-C.X3.BO4-C.X3.BO4-
C.X2.BO3.H1 1...
FingerprintsVector;TopologicalPharmacophoreAtomPairs;150;
OrderedNumericalValues;ValuesString;1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 2 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0
0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
FingerprintsVector;TopologicalPharmacophoreAtomPairs;150;
OrderedNumericalValues;IDsAndValuesString;H-D1-H H-D1-HBA H-D1-HBD
H-D1-NI H-D1-PI HBA-D1-HBA HBA-D1-HBD HBA-D1-NI HBA-D1-PI HBD-D1-HBD
HBD-D1-NI HBD-D1-PI NI-D1-NI NI-D1-PI PI-D1-PI H-D2-H H-D2-HBA ...;
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 2 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 ...
FingerprintsVector;TopologicalPharmacophoreAtomTriplets;4960;
OrderedNumericalValues;ValuesString;0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
FingerprintsVector;TopologicalPharmacophoreAtomTriplets;4960;
OrderedNumericalValues;IDsAndValuesString;Ar1-Ar1-Ar1 Ar1-Ar1-H1
Ar1-Ar1-HBA1 Ar1-Ar1-HBD1 Ar1-Ar1-NI1 Ar1-Ar1-PI1 Ar1-H1-H1 ...;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0...
FingerprintsVector;AtomNeighborhoods:AtomicInvariantsAtomTypes;10;
AlphaNumericalValues;ValuesString;NR0-C.X2.BO3.H1-ATC1:NR1-C.X2.BO3.H1-
ATC1:NR1-C.X3.BO4-ATC1:NR2-C.X2.BO3.H1-ATC1:NR2-C.X3.BO4-ATC1:NR2-
N.X1.BO1.H2-ATC1 NR0-C.X2.BO3.H1-ATC1:NR1-C.X2.BO3.H1-ATC1:NR1-
C.X3.BO4-ATC1:NR2-C.X2.BO3.H1-ATC1:NR2-C.X3.BO4-ATC2 NR0-C.X2.BO3.H1-
ATC1:NR1-C.X2.BO3.H1-ATC2:NR2-C.X2.BO3.H1-ATC1:NR2-C.X3.BO4-ATC1...
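The sample strings above share a semicolon-delimited layout: type, description, size, two format fields, and then the data. A minimal Python sketch of splitting such a string into its fields is shown below. It is not the MayaChemTools parser; the function name and dictionary keys are hypothetical, each string is assumed to sit on a single line (the samples above are wrapped for display), and the ID/value ordering of the ValuesAndIDs formats is inferred from their names only.

```python
def parse_fingerprints_string(fp_string):
    """Split a fingerprints string into labeled fields (illustrative only)."""
    # The first five fields never contain ';', so split at most five times.
    fields = fp_string.strip().split(";", 5)
    if len(fields) != 6:
        raise ValueError("expected six ';'-separated fields")
    kind, description, size = fields[0], fields[1], int(fields[2])
    data = fields[5]

    if kind == "FingerprintsBitVector":
        return {"type": kind, "description": description, "size": size,
                "bit_string_format": fields[3], "bit_order": fields[4],
                "bits": data}

    if kind == "FingerprintsVector":
        info = {"type": kind, "description": description, "size": size,
                "values_type": fields[3], "values_format": fields[4]}
        fmt = fields[4]
        if fmt == "IDsAndValuesString":
            ids, values = data.split(";", 1)      # "ids...;values..."
            info["ids"], info["values"] = ids.split(), values.split()
        elif fmt == "ValuesAndIDsString":
            values, ids = data.split(";", 1)      # assumed "values...;ids..."
            info["ids"], info["values"] = ids.split(), values.split()
        elif fmt == "IDsAndValuesPairsString":
            tokens = data.split()                 # "id value id value ..."
            info["ids"], info["values"] = tokens[0::2], tokens[1::2]
        elif fmt == "ValuesAndIDsPairsString":
            tokens = data.split()                 # assumed "value id value id ..."
            info["values"], info["ids"] = tokens[0::2], tokens[1::2]
        else:                                     # ValuesString
            info["values"] = data.split()
        return info

    raise ValueError("unknown fingerprints string type: " + kind)
```

Values are kept as strings because AlphaNumericalValues entries are not necessarily numbers.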
OPTIONS
- -a, --all
List all the available information.
- --AverageBitDensity
List average bit density of fingerprint bit-vector strings.
- --BitDensity
List bit density of fingerprints bit-vector strings data in each row.
- --count
List number of rows containing fingerprints bit-vector or vector strings data. This
is the default behavior.
- -c, --ColMode ColNum | ColLabel
Specify how columns are identified in TextFile(s): using column number or column
label. Possible values: ColNum or ColLabel. Default value: ColNum
- -d, --detail InfoLevel
Level of information to print about lines being ignored. Default: 1. Possible values:
1, 2 or 3.
- --DataCheck
Validate fingerprints data specified using --FingerprintsCol and list information
about missing and invalid data.
- -e, --empty
List number of rows containing no fingerprints data.
- --FingerprintsCol col number | col name
Column in TextFile(s) containing fingerprints data. The value is interpreted according
to -c, --ColMode. Possible values: col number or col name (label).
Default value: first column containing the word Fingerprints in its column label.
- --FingerprintsType
List types of fingerprint strings: FingerprintsBitVector or FingerprintsVector.
- --FingerprintsDescription
List types of fingerprints: PathLengthBits, PathLengthCount, MACCSKeyCount,
ExtendedConnectivity and so on.
- --FingerprintsSize
List size of fingerprints.
- --FingerprintsBitStringFormat
List format of fingerprint bit-vector strings: BinaryString or HexadecimalString.
- --FingerprintsBitOrder
List order of bits in fingerprints bit-vector strings: Ascending or Descending.
- --FingerprintsVectorValuesType
List type of values in fingerprint vector strings: OrderedNumericalValues, NumericalValues or
AlphaNumericalValues.
- --FingerprintsVectorValuesFormat
List format of values in fingerprint vector strings: ValuesString, IDsAndValuesString,
IDsAndValuesPairsString, ValuesAndIDsString or ValuesAndIDsPairsString.
- -h, --help
Print this help message.
- --InDelim comma | semicolon
Input delimiter for CSV TextFile(s). Possible values: comma or semicolon.
Default value: comma. For TSV files, this option is ignored and tab is used as a
delimiter.
- --NumOfOnBits
List number of on bits in fingerprints bit-vector strings data in each row.
- --NumOfNonZeroValues
List number of non-zero values in fingerprints vector strings data in each row.
- -w, --WorkingDir DirName
Location of working directory. Default: current directory.
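The interplay of -c, --ColMode and --FingerprintsCol described above can be sketched in Python as shown below. This is an illustration, not the script's implementation: the function name is hypothetical, column numbers are assumed to start at 1, and the default search for the word Fingerprints in column labels is assumed to be case-insensitive.

```python
import csv
import io

def find_fingerprints_column(header, col_mode="ColNum", fingerprints_col=None):
    """Return the zero-based index of the fingerprints column, or None."""
    if fingerprints_col is None:
        # Default: first column whose label contains the word Fingerprints.
        for index, label in enumerate(header):
            if "fingerprints" in label.lower():  # assumption: case-insensitive
                return index
        return None
    mode = col_mode.lower()
    if mode == "colnum":
        index = int(fingerprints_col) - 1  # assumption: 1-based column numbers
        return index if 0 <= index < len(header) else None
    if mode == "collabel":
        return header.index(fingerprints_col) if fingerprints_col in header else None
    raise ValueError("ColMode must be ColNum or ColLabel")

# Usage on a small in-memory CSV file with made-up column labels.
sample = "CompoundID,PathLengthFingerprints\nCmpd1,FingerprintsBitVector;...\n"
header = next(csv.reader(io.StringIO(sample), delimiter=","))
default_index = find_fingerprints_column(header)
```

Here default_index refers to the second column, since its label is the first one containing Fingerprints.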
EXAMPLES
To count the number of lines containing fingerprints bit-vector or vector strings data present in a
column whose name contains the Fingerprint substring, type:
% InfoFingerprintsTextFiles.pl SampleFPBin.csv
% InfoFingerprintsTextFiles.pl SampleFPHex.csv
% InfoFingerprintsTextFiles.pl SampleFPcount.csv
To list all available information about fingerprints bit-vector or vector strings data present in a
column whose name contains the Fingerprint substring, type:
% InfoFingerprintsTextFiles.pl -a SampleFPHex.csv
% InfoFingerprintsTextFiles.pl -a SampleFPcount.csv
To list all available information about fingerprints bit-vector or vector strings data present in a
column named Fingerprints, type:
% InfoFingerprintsTextFiles.pl -a --ColMode ColLabel --FingerprintsCol
Fingerprints SampleFPHex.csv
% InfoFingerprintsTextFiles.pl -a --ColMode ColLabel --FingerprintsCol
Fingerprints SampleFPcount.csv
To list bit density, average bit density, and number of on bits for fingerprints bit-vector strings data
present in a column whose name contains the Fingerprint substring, type:
% InfoFingerprintsTextFiles.pl --BitDensity --AverageBitDensity
--NumOfOnBits SampleFPBin.csv
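The quantities listed by the command above can be illustrated with a short Python sketch. It rests on assumptions that this document does not confirm: bit density is taken to be the number of on bits divided by the declared fingerprints size, any padding bits in a hexadecimal string are assumed to be zero, and average bit density is taken as the mean of the per-row densities. The function names are hypothetical.

```python
def count_on_bits(bit_string, bit_string_format):
    """Count set bits in a BinaryString or HexadecimalString bit string."""
    if bit_string_format == "BinaryString":
        return bit_string.count("1")
    if bit_string_format == "HexadecimalString":
        # Each hexadecimal digit encodes four bits.
        return sum(bin(int(digit, 16)).count("1") for digit in bit_string)
    raise ValueError("unsupported bit string format: " + bit_string_format)

def bit_density(bit_string, bit_string_format, size):
    """Assumed definition: on bits divided by the declared size."""
    return count_on_bits(bit_string, bit_string_format) / size

def average_bit_density(rows):
    """Assumed definition: mean of per-row densities over (bits, format, size)."""
    densities = [bit_density(bits, fmt, size) for bits, fmt, size in rows]
    return sum(densities) / len(densities) if densities else 0.0
```

The bit order (Ascending or Descending) does not affect any of these counts, so it is not a parameter here.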
To list vector values type, format, and number of non-zero values for fingerprints vector strings
data present in a column whose name contains the Fingerprint substring, along with fingerprints type
and description, type:
% InfoFingerprintsTextFiles.pl --FingerprintsType --FingerprintsDescription
--FingerprintsVectorValuesType --FingerprintsVectorValuesFormat
--NumOfNonZeroValues SampleFPcount.csv
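As a rough illustration of what a non-zero value count means for a vector string, the Python sketch below counts numeric values different from zero in the data portion of a ValuesString or IDsAndValuesPairsString. The function name is hypothetical, other value formats are not handled, and values are assumed to be numeric.

```python
def count_non_zero_values(data, values_format="ValuesString"):
    """Count numeric values different from zero in vector string data."""
    tokens = data.split()
    if values_format == "ValuesString":
        values = tokens                 # "v1 v2 v3 ..."
    elif values_format == "IDsAndValuesPairsString":
        values = tokens[1::2]           # "id1 v1 id2 v2 ..."
    else:
        raise ValueError("format not handled in this sketch: " + values_format)
    return sum(1 for value in values if float(value) != 0.0)
```

For instance, the data "1 0 0 3 2 0" in ValuesString format holds three non-zero values.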
AUTHOR
Manish Sud
SEE ALSO
InfoFingerprintsSDFiles.pl, SimilarityMatrixSDFiles.pl, SimilarityMatrixTextFiles.pl, 
AtomNeighborhoodsFingerprints.pl, ExtendedConnectivityFingerprints.pl, 
MACCSKeysFingerprints.pl, PathLengthFingerprints.pl, 
TopologicalAtomPairsFingerprints.pl, TopologicalAtomTorsionsFingerprints.pl, 
TopologicalPharmacophoreAtomPairsFingerprints.pl, TopologicalPharmacophoreAtomTripletsFingerprints.pl
COPYRIGHT
Copyright (C) 2004-2010 Manish Sud. All rights reserved.
This file is part of MayaChemTools.
MayaChemTools is free software; you can redistribute it and/or modify it under
the terms of the GNU Lesser General Public License as published by the Free
Software Foundation; either version 3 of the License, or (at your option)
any later version.