MayaChemTools

NAME

InfoFingerprintsTextFiles.pl - List information about fingerprints data in TextFile(s)

SYNOPSIS

InfoFingerprintsTextFiles.pl TextFile(s)...

InfoFingerprintsTextFiles.pl [-a, --all] [--AverageBitDensity] [--BitDensity] [--count] [-c, --ColMode ColNum | ColLabel] [--DataCheck] [-d, --detail InfoLevel] [-e, --empty] [--FingerprintsCol col number | col name] [--FingerprintsType] [--FingerprintsDescription] [--FingerprintsSize] [--FingerprintsBitStringFormat] [--FingerprintsBitOrder] [--FingerprintsVectorValuesType] [--FingerprintsVectorValuesFormat] [-h, --help] [--InDelim comma | semicolon] [--NumOfOnBits] [--NumOfNonZeroValues] [-w, --WorkingDir dirname] TextFile(s)...

DESCRIPTION

List information about fingerprints data in TextFile(s): the number of rows containing fingerprints data, the type of fingerprints vector, the description and size of fingerprints, the bit density and average bit density of fingerprint bit-vector strings, and so on.

The valid file extensions are .csv and .tsv for comma/semicolon and tab delimited text files, respectively. All other file names are ignored. All the text files in the current directory can be specified by *.csv, *.tsv, or the current directory name. The --InDelim option determines the format of TextFile(s); any file that doesn't correspond to the format indicated by --InDelim is ignored.
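The extension and delimiter rules above can be summarized in a short Python sketch. It is not MayaChemTools code: the function name input_delimiter is hypothetical, and matching extensions case-insensitively is an assumption. It returns the delimiter a file would be read with, or None for a file that would be skipped.

```python
import os


def input_delimiter(filename, indelim="comma"):
    """Return the field delimiter for a text file, or None if it is skipped.

    Hypothetical helper mirroring the rules above: .tsv files always use a tab,
    .csv files use the --InDelim value (comma or semicolon), and any other
    extension is ignored. Case-insensitive extension matching is an assumption.
    """
    extension = os.path.splitext(filename)[1].lower()
    if extension == ".tsv":
        return "\t"  # --InDelim is ignored for TSV files
    if extension == ".csv":
        if indelim == "comma":
            return ","
        if indelim == "semicolon":
            return ";"
        raise ValueError("InDelim must be 'comma' or 'semicolon'")
    return None  # not a .csv or .tsv file


for name in ["SampleFPHex.csv", "SampleFPcount.tsv", "README.txt"]:
    print(name, repr(input_delimiter(name, "semicolon")))
```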

The format of fingerprint strings data in TextFile(s) is detected automatically. The current release of MayaChemTools supports the following types of fingerprint bit-vector and vector strings:

FingerprintsBitVector;PathLengthBits:AtomicInvariantsAtomTypes;1024;
HexadecimalString;Ascending;00000000000000000000000040000000000000
000000000000000000000000000020200000000000000000004000000000000...

FingerprintsBitVector;PathLengthBits:AtomicInvariantsAtomTypes;1024;
BinaryString;Ascending;0000000000000000000000000000000000000000000
000000000000001000000001000000000010000000000000000000010000000...

FingerprintsVector;PathLengthCount:AtomicInvariantsAtomTypes;27;
NumericalValues;IDsAndValuesPairsString;C 8 O 1 C:C 8 C:O 2 C:C:C 9
C:C:O 3 C:O:C 1 C:C:C:C 10 C:C:C:O 4 C:C:O:C 3 C:C:C:C:C 10 ...

FingerprintsBitVector;MACCSKeyBits;166;BinaryString;Ascending;000000000
000000000000000000000000000000000000000000000000001000000000000
000000000010000000000001001000000000000000000001000000000000000...

FingerprintsBitVector;MACCSKeyBits;166;HexadecimalString;Ascending;0000
002000002010008040084010080100902805e1

FingerprintsBitVector;MACCSKeyBits;322;BinaryString;Ascending;1100000000
0000001000001000010011000001100000001000000000000000101000000000
0000000000000000000000000000000000000000000000000000100000000000...

FingerprintsBitVector;MACCSKeyBits;322;HexadecimalString;Ascending;3001
48c060400041000000000000000100000000000000000000000000000000500
000000000000000

FingerprintsVector;MACCSKeyCount;166;OrderedNumericalValues;ValuesString;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 ...

FingerprintsVector;MACCSKeyCount;322;OrderedNumericalValues;ValuesString;
2 1 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 1 0 0 0 0 1 0 0 7 1 0 0 0 0 0 2 1 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 2 0 0 0 0 0 0 0 0 ...

FingerprintsVector;ExtendedConnectivity:AtomicInvariantsAtomTypes;14;
AlphaNumericalValues;ValuesString;333564680 1142173602 14814699391
977749791 2006158649 291020918 443330853 692611812 816539344173
1657806 2039728782 931045615 1273931663 1317501190

FingerprintsVector;ExtendedConnectivity:FunctionalClassAtomTypes;11;
AlphaNumericalValues;ValuesString;862102353 981185303 12517955598
10600886 885767127 1452087973 1878436093 2029559552 1465773182
1530666307 2113761516

FingerprintsVector;TopologicalAtomPairs:AtomicInvariantsAtomTypes;23;
NumericalValues;IDsAndValuesString;C.X1.BO1.H3-D1-C.X3.BO4 C.X2.BO3.H1-
D1-C.X2.BO3.H1 C.X2.BO3.H1-D1-C.X3.BO4 C.X2.BO3.H1-D1-N.X2.BO2.H1
C.X3.BO4-D1-C.X3.BO4 C.X3.BO4-D1-O.X1.BO2 C.X1.BO1.H3-D2-C.X2.BO3.H1
C.X1.BO1.H3-D2-C.X3.BO4...; 1 1 2 2 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 2 1 1 1

FingerprintsVector;TopologicalAtomPairs:AtomicInvariantsAtomTypes;23;
NumericalValues;IDsAndValuesPairsString;C.X1.BO1.H3-D1-C.X3.BO4 1
C.X2.BO3.H1-D1-C.X2.BO3.H1 1 C.X2.BO3.H1-D1-C.X3.BO4 2 C.X2.BO3.H1-
D1-N.X2.BO2.H1 2 C.X3.BO4-D1-C.X3.BO4 1 C.X3.BO4-D1-O.X1.BO2 1
C.X1.BO1.H3-D2-C.X2.BO3.H1 1 C.X1.BO1.H3-D2-C.X3.BO4 1
C.X2.BO3.H1-D2-C.X2.BO3.H1 1 C.X2.BO3.H1-D2-C.X3.BO4 3...

FingerprintsVector;TopologicalAtomTorsions:AtomicInvariantsAtomTypes;11;
NumericalValues;IDsAndValuesString;C.X1.BO1.H3-C.X3.BO4-C.X2.BO3.H1-
N.X2.BO2.H1 C.X1.BO1.H3-C.X3.BO4-C.X3.BO4-C.X2.BO3.H1 C.X1.BO1.H3-
C.X3.BO4-C.X3.BO4-O.X1.BO2 C.X2.BO3.H1-C.X2.BO3.H1-C.X3.BO4-C.X3.BO4
C.X2.BO3.H1-C.X2.BO3.H1-C.X3.BO4-O.X1.BO2...;
1 1 1 1 1 1 1 1 1 1 1

FingerprintsVector;TopologicalAtomTorsions:AtomicInvariantsAtomTypes;11;
NumericalValues;IDsAndValuesPairsString;C.X1.BO1.H3-C.X3.BO4-C.X2.BO3.H1-
N.X2.BO2.H1 1 C.X1.BO1.H3-C.X3.BO4-C.X3.BO4-C.X2.BO3.H1 1 C.X1.BO1.H3-
C.X3.BO4-C.X3.BO4-O.X1.BO2 1 C.X2.BO3.H1-C.X2.BO3.H1-C.X3.BO4-
C.X3.BO4 1 C.X2.BO3.H1-C.X2.BO3.H1-C.X3.BO4-O.X1.BO2 1 C.X2.BO3.H1-
C.X2.BO3.H1-N.X2.BO2.H1-C.X2.BO3.H1 1 C.X2.BO3.H1-C.X3.BO4-C.X3.BO4-
C.X2.BO3.H1 1...

FingerprintsVector;TopologicalPharmacophoreAtomPairs;150;
OrderedNumericalValues;ValuesString;1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 2 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0
0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...

FingerprintsVector;TopologicalPharmacophoreAtomPairs;150;
OrderedNumericalValues;IDsAndValuesString;H-D1-H H-D1-HBA H-D1-HBD
H-D1-NI H-D1-PI HBA-D1-HBA HBA-D1-HBD HBA-D1-NI HBA-D1-PI HBD-D1-HBD
HBD-D1-NI HBD-D1-PI NI-D1-NI NI-D1-PI PI-D1-PI H-D2-H H-D2-HBA ...;
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 2 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 ...

FingerprintsVector;TopologicalPharmacophoreAtomTriplets;4960;
OrderedNumericalValues;ValuesString;0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...

FingerprintsVector;TopologicalPharmacophoreAtomTriplets;4960;
OrderedNumericalValues;IDsAndValuesString;Ar1-Ar1-Ar1 Ar1-Ar1-H1
Ar1-Ar1-HBA1 Ar1-Ar1-HBD1 Ar1-Ar1-NI1 Ar1-Ar1-PI1 Ar1-H1-H1 ...;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0...

FingerprintsVector;AtomNeighborhoods:AtomicInvariantsAtomTypes;10;
AlphaNumericalValues;ValuesString;NR0-C.X2.BO3.H1-ATC1:NR1-C.X2.BO3.H1-
ATC1:NR1-C.X3.BO4-ATC1:NR2-C.X2.BO3.H1-ATC1:NR2-C.X3.BO4-ATC1:NR2-
N.X1.BO1.H2-ATC1 NR0-C.X2.BO3.H1-ATC1:NR1-C.X2.BO3.H1-ATC1:NR1-
C.X3.BO4-ATC1:NR2-C.X2.BO3.H1-ATC1:NR2-C.X3.BO4-ATC2 NR0-C.X2.BO3.H1-
ATC1:NR1-C.X2.BO3.H1-ATC2:NR2-C.X2.BO3.H1-ATC1:NR2-C.X3.BO4-ATC1...
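Every sample above starts with the same five semicolon-delimited header fields (type, description, size, then either bit string format and bit order, or vector values type and format) followed by the data. The Python sketch below splits a string along those lines; it is a hypothetical illustration, not the MayaChemTools parser, and it assumes each fingerprint string occupies a single line or cell in the file (the samples above are wrapped and truncated only for display). It shows where the values reported by --FingerprintsType, --FingerprintsDescription, --FingerprintsSize, --FingerprintsBitStringFormat, --FingerprintsBitOrder, --FingerprintsVectorValuesType and --FingerprintsVectorValuesFormat come from.

```python
def parse_fingerprints_string(fp):
    """Split a fingerprints string into its header fields and data.

    Hypothetical parser based on the sample strings above. Both string types
    carry five semicolon-delimited header fields before the data; anything
    after the fifth semicolon is kept intact, because IDsAndValuesString data
    itself contains a ';' between the IDs and the values.
    """
    fp = fp.strip()
    if fp.startswith("FingerprintsBitVector;"):
        names = ["Type", "Description", "Size",
                 "BitStringFormat", "BitOrder", "BitString"]
    elif fp.startswith("FingerprintsVector;"):
        names = ["Type", "Description", "Size",
                 "VectorValuesType", "VectorValuesFormat", "Values"]
    else:
        raise ValueError("not a FingerprintsBitVector or FingerprintsVector string")
    parts = fp.split(";", 5)
    if len(parts) != 6:
        raise ValueError("expected five header fields followed by data")
    fields = dict(zip(names, parts))
    fields["Size"] = int(fields["Size"])
    return fields


# The MACCS hexadecimal sample above, joined back into one line.
maccs_hex = ("FingerprintsBitVector;MACCSKeyBits;166;HexadecimalString;Ascending;"
             "0000002000002010008040084010080100902805e1")
info = parse_fingerprints_string(maccs_hex)
print(info["Type"], info["Description"], info["Size"],
      info["BitStringFormat"], info["BitOrder"])
```

Splitting with a maximum of five splits keeps the data portion whole, which matters for IDsAndValuesString, where the IDs and the values are themselves separated by a semicolon.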

OPTIONS

-a, --all
List all the available information.

--AverageBitDensity
List average bit density of fingerprint bit-vector strings.

--BitDensity
List bit density of fingerprints bit-vector strings data in each row.

--count
List the number of rows containing fingerprints bit-vector or vector strings data. This is the default behavior.

-c, --ColMode ColNum | ColLabel
Specify how columns are identified in TextFile(s): using column number or column label. Possible values: ColNum or ColLabel. Default value: ColNum.

-d, --detail InfoLevel
Level of information to print about lines being ignored. Default: 1. Possible values: 1, 2 or 3.

--DataCheck
Validate fingerprints data specified using --FingerprintsCol and list information about missing and invalid data.

-e, --empty
List number of rows containing no fingerprints data.

--FingerprintsCol col number | col name
The interpretation of this value depends on -c, --ColMode. It identifies the column in TextFile(s) containing fingerprints data. Possible values: col number or col label. Default value: the first column containing the word Fingerprints in its column label.

--FingerprintsType
List types of fingerprint strings: FingerprintsBitVector or FingerprintsVector.

--FingerprintsDescription
List types of fingerprints: PathLengthBits, PathLengthCount, MACCSKeyCount, ExtendedConnectivity and so on.

--FingerprintsSize
List size of fingerprints.

--FingerprintsBitStringFormat
List format of fingerprint bit-vector strings: BinaryString or HexadecimalString.

--FingerprintsBitOrder
List the order of bits in fingerprint bit-vector strings: Ascending or Descending.

--FingerprintsVectorValuesType
List type of values in fingerprint vector strings: OrderedNumericalValues, NumericalValues or AlphaNumericalValues.

--FingerprintsVectorValuesFormat
List format of values in fingerprint vector strings: ValuesString, IDsAndValuesString, IDsAndValuesPairsString, ValuesAndIDsString or ValuesAndIDsPairsString.

-h, --help
Print this help message.

--InDelim comma | semicolon
Input delimiter for CSV TextFile(s). Possible values: comma or semicolon. Default value: comma. For TSV files, this option is ignored and tab is used as a delimiter.

--NumOfOnBits
List number of on bits in fingerprints bit-vector strings data in each row.

--NumOfNonZeroValues
List number of non-zero values in fingerprints vector strings data in each row.

-w, --WorkingDir DirName
Location of working directory. Default: current directory.

EXAMPLES

To count the number of lines containing fingerprint bit-vector or vector strings data in the first column whose label contains the word Fingerprints, type:

% InfoFingerprintsTextFiles.pl SampleFPBin.csv
% InfoFingerprintsTextFiles.pl SampleFPHex.csv
% InfoFingerprintsTextFiles.pl SampleFPcount.csv
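The default count, and the complementary -e, --empty option, amount to tallying rows whose fingerprints column is filled or blank. The Python sketch below shows that tally under stated assumptions: the function name count_fingerprint_rows is hypothetical, the first line is treated as the column label line, and the column index (0-based here) is assumed to have been resolved already. It only checks that the cell is non-empty; it does not validate the fingerprint string the way --DataCheck might.

```python
import csv
import io


def count_fingerprint_rows(text, col_index, delimiter=","):
    """Return (rows with fingerprints data, rows without it).

    Hypothetical sketch of the default count and of the -e, --empty option.
    col_index is a 0-based column index that is assumed to be resolved already;
    the first line is treated as the column label line.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    next(reader, None)  # skip the column label line
    with_data = 0
    empty = 0
    for row in reader:
        value = row[col_index].strip() if col_index < len(row) else ""
        if value:
            with_data += 1
        else:
            empty += 1
    return with_data, empty


# Made-up rows for illustration; the second compound has no fingerprints data.
sample = (
    "CmpdID,Fingerprints\n"
    "Cmpd1,FingerprintsVector;PathLengthCount:AtomicInvariantsAtomTypes;2;"
    "NumericalValues;IDsAndValuesPairsString;C 8 O 1\n"
    "Cmpd2,\n"
    "Cmpd3,FingerprintsVector;PathLengthCount:AtomicInvariantsAtomTypes;2;"
    "NumericalValues;IDsAndValuesPairsString;C 6 N 1\n"
)
print(count_fingerprint_rows(sample, 1))  # (2, 1)
```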

To list all available information about fingerprint bit-vector or vector strings data in the first column whose label contains the word Fingerprints, type:

% InfoFingerprintsTextFiles.pl -a SampleFPHex.csv
% InfoFingerprintsTextFiles.pl -a SampleFPcount.csv

To list all available information about fingerprint bit-vector or vector strings data in a column labeled Fingerprints, type:

% InfoFingerprintsTextFiles.pl -a --ColMode ColLabel --FingerprintsCol Fingerprints SampleFPHex.csv
% InfoFingerprintsTextFiles.pl -a --ColMode ColLabel --FingerprintsCol Fingerprints SampleFPcount.csv
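How -c, --ColMode and --FingerprintsCol combine to pick a column can be sketched in Python. This is a hypothetical helper, not the script's own code, and two details are assumptions: ColNum values are taken to be 1-based, and the default search for Fingerprints in the column labels is done case-insensitively.

```python
def resolve_fingerprints_col(labels, col_mode="ColNum", fingerprints_col=None):
    """Return the 0-based index of the fingerprints column.

    Hypothetical sketch of -c, --ColMode and --FingerprintsCol. Assumptions:
    ColNum values are 1-based, and the default search for "Fingerprints" in
    the column labels is case-insensitive.
    """
    if fingerprints_col is None:
        # Default: first column whose label contains "Fingerprints".
        for index, label in enumerate(labels):
            if "fingerprints" in label.lower():
                return index
        raise ValueError("no column label contains 'Fingerprints'")
    if col_mode == "ColNum":
        number = int(fingerprints_col)
        if not 1 <= number <= len(labels):
            raise ValueError(f"column number {number} is out of range")
        return number - 1
    if col_mode == "ColLabel":
        if fingerprints_col not in labels:
            raise ValueError(f"no column labeled {fingerprints_col!r}")
        return labels.index(fingerprints_col)
    raise ValueError("ColMode must be 'ColNum' or 'ColLabel'")


labels = ["CmpdID", "MolWeight", "Fingerprints"]  # made-up column labels
print(resolve_fingerprints_col(labels))                              # 2
print(resolve_fingerprints_col(labels, "ColLabel", "Fingerprints"))  # 2
print(resolve_fingerprints_col(labels, "ColNum", "1"))               # 0
```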

To list the bit density, average bit density, and number of on bits for fingerprint bit-vector strings data in the first column whose label contains the word Fingerprints, type:

% InfoFingerprintsTextFiles.pl --BitDensity --AverageBitDensity --NumOfOnBits SampleFPBin.csv
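The quantities reported by --NumOfOnBits, --BitDensity and --AverageBitDensity can be sketched in Python as follows. The definitions are assumptions rather than taken from the MayaChemTools source: bit density is computed as the number of on bits divided by the fingerprint size, and the average bit density as the mean of the per-row densities. Bit order (Ascending or Descending) does not change either count, so the sketch ignores it, and it assumes any hexadecimal padding bits beyond the fingerprint size are zero.

```python
def num_of_on_bits(bit_string, string_format):
    """Count the set bits in a BinaryString or HexadecimalString."""
    if string_format == "BinaryString":
        return bit_string.count("1")
    if string_format == "HexadecimalString":
        # Popcount of each hex digit; padding bits past Size are assumed to be 0.
        return sum(bin(int(digit, 16)).count("1") for digit in bit_string)
    raise ValueError(f"unknown bit string format: {string_format}")


def bit_density(fp):
    """Number of on bits divided by the fingerprint size (assumed definition)."""
    _type, _desc, size, string_format, _order, bits = fp.strip().split(";", 5)
    return num_of_on_bits(bits, string_format) / int(size)


def average_bit_density(fps):
    """Mean of the per-row bit densities (assumed definition)."""
    densities = [bit_density(fp) for fp in fps]
    return sum(densities) / len(densities)


# The MACCS hexadecimal sample from the DESCRIPTION, joined into one line,
# plus a made-up 8-bit binary string.
maccs_hex = ("FingerprintsBitVector;MACCSKeyBits;166;HexadecimalString;Ascending;"
             "0000002000002010008040084010080100902805e1")
toy_bin = "FingerprintsBitVector;ToyBits;8;BinaryString;Ascending;11000000"
print(num_of_on_bits(maccs_hex.split(";")[-1], "HexadecimalString"))  # 20
print(round(bit_density(maccs_hex), 4))                               # 0.1205
print(bit_density(toy_bin))                                           # 0.25
print(round(average_bit_density([maccs_hex, toy_bin]), 4))
```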

To list the vector values type, vector values format, and number of non-zero values for fingerprint vector strings data in the first column whose label contains the word Fingerprints, along with the fingerprints type and description, type:

% InfoFingerprintsTextFiles.pl --FingerprintsType --FingerprintsDescription --FingerprintsVectorValuesType --FingerprintsVectorValuesFormat --NumOfNonZeroValues SampleFPcount.csv
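Counting non-zero values, as --NumOfNonZeroValues does, first requires pulling the values out of the vector string according to its values format. The Python sketch below is a hypothetical illustration covering three of the formats named under --FingerprintsVectorValuesFormat (ValuesString, IDsAndValuesString and IDsAndValuesPairsString); the ValuesAndIDs variants are left out, and it assumes the values parse as numbers, so it does not apply to AlphaNumericalValues strings.

```python
def vector_values(fp):
    """Return the list of value tokens from a FingerprintsVector string.

    Hypothetical sketch covering three of the formats listed under
    --FingerprintsVectorValuesFormat; ValuesAndIDsString and
    ValuesAndIDsPairsString are not handled here.
    """
    _type, _desc, _size, _vtype, values_format, data = fp.strip().split(";", 5)
    if values_format == "ValuesString":
        return data.split()
    if values_format == "IDsAndValuesString":
        _ids, values = data.split(";", 1)  # IDs, then ';', then values
        return values.split()
    if values_format == "IDsAndValuesPairsString":
        return data.split()[1::2]  # ID value ID value ...
    raise ValueError(f"format not handled in this sketch: {values_format}")


def num_of_non_zero_values(fp):
    """Count non-zero values; assumes the values parse as numbers."""
    return sum(1 for value in vector_values(fp) if float(value) != 0)


# Shortened, made-up variants of the sample vector strings above.
pairs = ("FingerprintsVector;PathLengthCount:AtomicInvariantsAtomTypes;6;"
         "NumericalValues;IDsAndValuesPairsString;C 8 O 1 C:C 8 C:O 2 C:C:C 9 C:O:C 0")
ordered = ("FingerprintsVector;TopologicalPharmacophoreAtomPairs;6;"
           "OrderedNumericalValues;ValuesString;1 0 0 3 2 0")
print(num_of_non_zero_values(pairs))    # 5
print(num_of_non_zero_values(ordered))  # 3
```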

AUTHOR

Manish Sud

SEE ALSO

InfoFingerprintsSDFiles.pl, SimilarityMatrixSDFiles.pl, SimilarityMatrixTextFiles.pl, AtomNeighborhoodsFingerprints.pl, ExtendedConnectivityFingerprints.pl, MACCSKeysFingerprints.pl, PathLengthFingerprints.pl, TopologicalAtomPairsFingerprints.pl, TopologicalAtomTorsionsFingerprints.pl, TopologicalPharmacophoreAtomPairsFingerprints.pl, TopologicalPharmacophoreAtomTripletsFingerprints.pl

COPYRIGHT

Copyright (C) 2004-2010 Manish Sud. All rights reserved.

This file is part of MayaChemTools.

MayaChemTools is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

July 5, 2010