NAME
AtomTypesFingerprints
SYNOPSIS
use AtomTypesFingerprints;
use AtomTypesFingerprints qw(:all);
DESCRIPTION
The AtomTypesFingerprints class provides the following methods:
new, GenerateFingerprints, SetAtomIdentifierType, SetAtomTypesSetToUse,
SetAtomicInvariantsToUse, SetFunctionalClassesToUse, SetType,
StringifyAtomTypesFingerprints
AtomTypesFingerprints is derived from the Fingerprints class, which in turn
is derived from the ObjectProperty base class. ObjectProperty uses Perl's AUTOLOAD
functionality to provide methods not explicitly defined in the AtomTypesFingerprints,
Fingerprints or ObjectProperty classes. These methods are generated on-the-fly for a
specified object property:
Set<PropertyName>(<PropertyValue>);
$PropertyValue = Get<PropertyName>();
Delete<PropertyName>();
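The on-the-fly accessors described above can be pictured with a small pure-Perl sketch. The package name Demo::ObjectProperty and its internals are hypothetical and are not the MayaChemTools implementation; the sketch only illustrates how AUTOLOAD can serve Set, Get and Delete calls for arbitrary property names. Returning the object from Set and Delete is an assumption of this sketch.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in class; not the actual ObjectProperty code.
    package Demo::ObjectProperty;

    our $AUTOLOAD;

    sub new {
        my ($Class, %NamesAndValues) = @_;
        return bless { %NamesAndValues }, $Class;
    }

    # Perl calls AUTOLOAD for any method that is not explicitly defined.
    sub AUTOLOAD {
        my ($This, @Args) = @_;
        my $Name = $AUTOLOAD;
        $Name =~ s/.*:://;               # strip the package qualifier
        return if $Name eq 'DESTROY';    # never treat DESTROY as a property

        if ($Name =~ /^Set(\w+)$/) {
            $This->{$1} = $Args[0];
            return $This;                # assumption: setters return the object
        }
        if ($Name =~ /^Get(\w+)$/) {
            return $This->{$1};
        }
        if ($Name =~ /^Delete(\w+)$/) {
            delete $This->{$1};
            return $This;
        }
        die "Unknown method $Name\n";
    }
}

my $Object = Demo::ObjectProperty->new();
$Object->SetIgnoreHydrogens(1);
print $Object->GetIgnoreHydrogens(), "\n";
$Object->DeleteIgnoreHydrogens();
```

No accessor is declared ahead of time here; each call is resolved by name when it is made, which is the behavior the paragraph above describes.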
The current release of MayaChemTools supports generation of AtomTypesFingerprints
corresponding to the following AtomIdentifierTypes:
AtomicInvariantsAtomTypes, DREIDINGAtomTypes, EStateAtomTypes,
FunctionalClassAtomTypes, MMFF94AtomTypes, SLogPAtomTypes,
SYBYLAtomTypes, TPSAAtomTypes, UFFAtomTypes
Based on the values specified for AtomIdentifierType along with other specified
parameters such as AtomicInvariantsToUse and FunctionalClassesToUse, initial
atom types are assigned to all non-hydrogen atoms or all atoms in a molecule.
Using the assigned atom types and the specified Type, one of the following types of
fingerprints is generated:
AtomTypesCount - A vector containing count of atom types
AtomTypesBits - A bit vector indicating presence/absence of atom types
For AtomTypesCount fingerprints, two atom types set sizes are allowed:
ArbitrarySize - Corresponds to only atom types detected in molecule
FixedSize - Corresponds to fixed number of atom types previously defined
For AtomTypesBits fingerprints, only FixedSize atom type set is allowed.
ArbitrarySize corresponds to atom types detected in a molecule, whereas FixedSize implies
a fixed number of all possible atom types previously defined for a specific AtomIdentifierType.
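The difference between the two fingerprint types and the two set sizes can be sketched in plain Perl. The helper names below are hypothetical and the fixed set is deliberately truncated to the first 14 DREIDING atom types for brevity; the atom type data mirrors the DREIDING example strings shown later in this document. This is an illustration of the idea, not the MayaChemTools implementation.

```perl
use strict;
use warnings;

# Atom types assigned to the atoms of one molecule (illustrative data).
my @AssignedAtomTypes = (('C_2') x 3, ('C_3') x 12, 'N_2', ('N_3') x 3,
                         ('O_2') x 2, ('O_3') x 2);

# Truncated, illustrative fixed set: only the first 14 DREIDING atom types.
my @FixedAtomTypesSet = qw(B_3 B_2 C_3 C_R C_2 C_1 N_3 N_R N_2 N_1
                           O_3 O_R O_2 O_1);

# Count how often each atom type occurs.
sub CountAtomTypes {
    my (@AtomTypes) = @_;
    my %Count;
    $Count{$_}++ for @AtomTypes;
    return \%Count;
}

# ArbitrarySize: only the atom types detected, in sorted order.
sub ArbitrarySizeCounts {
    my ($CountRef) = @_;
    my @IDs = sort keys %{$CountRef};
    return (\@IDs, [ map { $CountRef->{$_} } @IDs ]);
}

# FixedSize: one value per predefined atom type, zero when absent.
sub FixedSizeCounts {
    my ($CountRef, @FixedSet) = @_;
    return [ map { $CountRef->{$_} // 0 } @FixedSet ];
}

# AtomTypesBits: presence/absence over the fixed set.
sub FixedSizeBits {
    my ($CountRef, @FixedSet) = @_;
    return join '', map { $CountRef->{$_} ? 1 : 0 } @FixedSet;
}

my $CountRef = CountAtomTypes(@AssignedAtomTypes);
my ($IDsRef, $ValuesRef) = ArbitrarySizeCounts($CountRef);
print "ArbitrarySize: @{$IDsRef}; @{$ValuesRef}\n";
print "FixedSize:     @{ FixedSizeCounts($CountRef, @FixedAtomTypesSet) }\n";
print "Bits:          ", FixedSizeBits($CountRef, @FixedAtomTypesSet), "\n";
```

An ArbitrarySize vector needs its IDs to be interpretable, while a FixedSize vector or bit string can be compared position by position across molecules because the ordering is predefined.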
The fixed numbers of all possible atom types for supported AtomIdentifierTypes in the
current release of MayaChemTools are:
AtomIdentifier       Total    TotalWithoutHydrogens
DREIDINGAtomTypes    37       34
EStateAtomTypes      109      87
MMFF94AtomTypes      212      171
SLogPAtomTypes       72       67
SYBYLAtomTypes       45       44
TPSAAtomTypes        47       47
UFFAtomTypes         126      124
The combination of Type and AtomTypesSetToUse along with AtomIdentifierType
allows generation of the following atom types fingerprints:
Type              AtomIdentifierType           AtomTypesSetToUse
AtomTypesCount    AtomicInvariantsAtomTypes    ArbitrarySize
AtomTypesCount    DREIDINGAtomTypes            ArbitrarySize
AtomTypesCount    DREIDINGAtomTypes            FixedSize
AtomTypesBits     DREIDINGAtomTypes            FixedSize
AtomTypesCount    EStateAtomTypes              ArbitrarySize
AtomTypesCount    EStateAtomTypes              FixedSize
AtomTypesBits     EStateAtomTypes              FixedSize
AtomTypesCount    FunctionalClassAtomTypes     ArbitrarySize
AtomTypesCount    MMFF94AtomTypes              ArbitrarySize
AtomTypesCount    MMFF94AtomTypes              FixedSize
AtomTypesBits     MMFF94AtomTypes              FixedSize
AtomTypesCount    SLogPAtomTypes               ArbitrarySize
AtomTypesCount    SLogPAtomTypes               FixedSize
AtomTypesBits     SLogPAtomTypes               FixedSize
AtomTypesCount    SYBYLAtomTypes               ArbitrarySize
AtomTypesCount    SYBYLAtomTypes               FixedSize
AtomTypesBits     SYBYLAtomTypes               FixedSize
AtomTypesCount    TPSAAtomTypes                FixedSize
AtomTypesBits     TPSAAtomTypes                FixedSize
AtomTypesCount    UFFAtomTypes                 ArbitrarySize
AtomTypesCount    UFFAtomTypes                 FixedSize
AtomTypesBits     UFFAtomTypes                 FixedSize
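As a convenience, the combinations in the table above can be encoded as a lookup so that a request is checked before any fingerprints are generated. The helper IsSupportedCombination is hypothetical and not part of MayaChemTools; the data is transcribed from the table, so it is only as current as this document.

```perl
use strict;
use warnings;

# Allowed AtomTypesSetToUse values per Type and AtomIdentifierType,
# transcribed from the table above.
my @BothSizes = qw(DREIDINGAtomTypes EStateAtomTypes MMFF94AtomTypes
                   SLogPAtomTypes SYBYLAtomTypes UFFAtomTypes);
my %Allowed;
for my $ID (@BothSizes) {
    $Allowed{AtomTypesCount}{$ID} = { ArbitrarySize => 1, FixedSize => 1 };
    $Allowed{AtomTypesBits}{$ID}  = { FixedSize => 1 };
}
for my $ID (qw(AtomicInvariantsAtomTypes FunctionalClassAtomTypes)) {
    $Allowed{AtomTypesCount}{$ID} = { ArbitrarySize => 1 };
}
$Allowed{AtomTypesCount}{TPSAAtomTypes} = { FixedSize => 1 };
$Allowed{AtomTypesBits}{TPSAAtomTypes}  = { FixedSize => 1 };

# Hypothetical helper: returns 1 when the combination appears in the table,
# 0 otherwise. Each level is checked with exists to avoid autovivification.
sub IsSupportedCombination {
    my ($Type, $AtomIdentifierType, $AtomTypesSetToUse) = @_;
    return exists $Allowed{$Type}
        && exists $Allowed{$Type}{$AtomIdentifierType}
        && exists $Allowed{$Type}{$AtomIdentifierType}{$AtomTypesSetToUse}
        ? 1 : 0;
}

for my $Combination (
    [ 'AtomTypesBits',  'MMFF94AtomTypes',           'FixedSize' ],
    [ 'AtomTypesBits',  'AtomicInvariantsAtomTypes', 'FixedSize' ],
) {
    printf "%s %s %s: %s\n", @{$Combination},
        IsSupportedCombination(@{$Combination}) ? 'supported' : 'not supported';
}
```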
The current release of MayaChemTools generates the following types of atom types
fingerprints bit-vector and vector strings:
FingerprintsVector;AtomTypesCount:AtomicInvariantsAtomTypes;9;Numerical
Values;IDsAndValuesString;C.X1.BO1.H3 C.X2.BO2.H2 C.X3.BO3.H1 C.X3.BO4
N.X1.BO1.H2 N.X1.BO2.H1 N.X2.BO2.H1 O.X1.BO1.H1 O.X1.BO2;3 3 6 3 1 1 2
2 2
FingerprintsVector;AtomTypesCount:FunctionalClassAtomTypes;2;Numerical
Values;IDsAndValuesString;Ar Ar.HBA;4 1
FingerprintsVector;AtomTypesCount:DREIDINGAtomTypes;6;NumericalValues;
IDsAndValuesString;C_2 C_3 N_2 N_3 O_2 O_3;3 12 1 3 2 2
FingerprintsVector;AtomTypesCount:DREIDINGAtomTypes;34;OrderedNumerical
Values;IDsAndValuesString;B_3 B_2 C_3 C_R C_2 C_1 N_3 N_R N_2 N_1 O_3 O
_R O_2 O_1 F_ Al3 Si3 P_3 S_3 Cl Ga3 Ge3 As3 Se3 Br In3 Sn3 Sb3 Te3 I_
Na Ca Fe Zn;0 0 12 0 3 0 3 0 1 0 2 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0
FingerprintsBitVector;AtomTypesBits:DREIDINGAtomTypes;34;BinaryString;
Ascending;0010101010101000000000000000000000000000
FingerprintsVector;AtomTypesCount:EStateAtomTypes;9;NumericalValues;IDs
AndValuesString;dNH dO dssC sCH3 sNH2 sOH ssCH2 ssNH sssCH;1 2 3 3 1 2
3 2 6
FingerprintsVector;AtomTypesCount:EStateAtomTypes;87;OrderedNumerical
Values;IDsAndValuesString;sLi ssBe ssssBem sBH2 ssBH sssB ssssBm sCH3 d
CH2 ssCH2 tCH dsCH aaCH sssCH ddC tsC dssC aasC aaaC ssssC sNH3p sNH2 s
sNH2p dNH ssNH aaNH tN sssNHp dsN aaN sssN ddsN aasN ssssNp sOH dO s...
;0 0 0 0 0 0 0 3 0 3 0 0 0 6 0 0 3 0 0 0 0 1 0 1 2 0 0 0 0 0 0 0 0 0 2
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
FingerprintsBitVector;AtomTypesBits:EStateAtomTypes;87;BinaryString;
Ascending;0000000101000100100001011000000000110000000000000000000000000
000000000000000000000000000
FingerprintsVector;AtomTypesCount:MMFF94AtomTypes;11;NumericalValues;IDs
AndValuesString;C=ON CGD COO CR N=C NC=N NC=O O=CN O=CO OC=O OR;1 1 1 12
1 2 1 1 1 1 1
FingerprintsVector;AtomTypesCount:MMFF94AtomTypes;171;OrderedNumerical
Values;IDsAndValuesString;CR C=C CSP2 C=O C=N CGD C=OR C=ON CONN COO CO
ON COOO C=OS C=S C=SN CSO2 CS=O CSS C=P CSP =C= OR OC=O OC=C OC=N OC=S
ONO2 ON=O OSO3 OSO2 OSO OS=O -OS OPO3 OPO2 OPO -OP -O- O=C O=CN O=CR O=
;12 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 ...
FingerprintsBitVector;AtomTypesBits:MMFF94AtomTypes;171;BinaryString;
Ascending;1000010101000000000001100000000000000001010000101000000000000
00000000000000000000000000000000000001000000000000000000000000000000000
0000000000000000000000000000000000000000000
FingerprintsVector;AtomTypesCount:SLogPAtomTypes;9;NumericalValues;IDs
AndValuesString;C1 C2 C5 CS N1 N2 N5 O2 O9;6 3 3 3 1 2 1 2 2
FingerprintsVector;AtomTypesCount:SLogPAtomTypes;67;OrderedNumerical
Values;IDsAndValuesString;C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C1
4 C15 C16 C17 C18 C19 C20 C21 C22 C23 C24 C25 C26 C27 CS N1 N2 N3 N4 N5
N6 N7 N8 N9 N10 N11 N12 N13 N14 NS O1 O2 O3 O4 O5 O6 O7 O8 O9 O10 O1...
;6 3 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 1 2 0 0 1 0 0
0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
FingerprintsBitVector;AtomTypesBits:SLogPAtomTypes;67;BinaryString;
Ascending;1100100000000000000000000001110010000000000010000001000000000
00000000000
FingerprintsVector;AtomTypesCount:SYBYLAtomTypes;8;NumericalValues;IDs
AndValuesString;C.2 C.3 C.cat N.am N.pl3 O.2 O.3 O.co2;2 12 1 1 3 1 1 2
FingerprintsVector;AtomTypesCount:SYBYLAtomTypes;44;OrderedNumerical
Values;IDsAndValuesString;C.3 C.2 C.1 C.ar C.cat N.3 N.2 N.1 N.ar N.am
N.pl3 N.4 O.3 O.2 O.co2 S.3 S.2 S.o S.o2 P.3 F Cl Br I ANY HAL HET Li N
a Mg Al Si K Ca Cr.th Cr.oh Mn Fe Co.oh Cu Zn Se Mo Sn;12 2 0 0 1 0 0 0
0 1 3 0 1 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
FingerprintsBitVector;AtomTypesBits:SYBYLAtomTypes;44;BinaryString;
Ascending;110010000110111000000000000000000000000000000000
FingerprintsVector;AtomTypesCount:TPSAAtomTypes;47;OrderedNumerical
Values;IDsAndValuesString;N1 N2 N3 N4 N5 N6 N7 N8 N9 N10 N11 N12 N13 N1
4 N15 N16 N17 N18 N19 N20 N21 N22 N23 N24 N25 N26 N O1 O2 O3 O4 O5 O6 O
S1 S2 S3 S4 S5 S6 S7 S P1 P2 P3 P4 P;0 0 0 0 0 0 2 0 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
FingerprintsBitVector;AtomTypesBits:TPSAAtomTypes;47;BinaryString;
Ascending;000000101100000000000000000001100000000000000000
FingerprintsVector;AtomTypesCount:UFFAtomTypes;6;NumericalValues;IDsAnd
ValuesString;C_2 C_3 N_2 N_3 O_2 O_3;3 12 1 3 2 2
FingerprintsVector;AtomTypesCount:UFFAtomTypes;124;OrderedNumerical
Values;IDsAndValuesString;He4+4 Li Be3+2 B_3 B_2 C_3 C_R C_2 C_1 N_3 N_
R N_2 N_1 O_3 O_3_z O_R O_2 O_1 F_ Ne4+4 Na Mg3+2 Al3 Si3 P_3+3 P_3+5 P
_3+q S_3+2 S_3+4 S_3+6 S_R S_2 Cl Ar4+4 K_ Ca6+2 Sc3+3 Ti3+4 Ti6+4 V_3+
;0 0 0 0 0 12 0 3 0 3 0 1 0 2 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
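The vector strings above are semicolon-delimited, so they can be taken apart with a few lines of plain Perl once the display line wraps are removed. The sketch below handles only FingerprintsVector strings in IDsAndValuesString layout; the function name and the field labels in the returned hash are this sketch's own and are not MayaChemTools identifiers.

```perl
use strict;
use warnings;

# Split a vector string with IDsAndValuesString layout into its fields.
sub ParseFingerprintsVectorString {
    my ($String) = @_;
    my ($Kind, $Description, $Size, $ValuesType, $Format, $IDs, $Values)
        = split /;/, $String;
    die "Not an IDsAndValuesString vector\n"
        unless defined $Values
            && $Kind eq 'FingerprintsVector'
            && $Format eq 'IDsAndValuesString';

    my @IDs    = split ' ', $IDs;
    my @Values = split ' ', $Values;
    die "IDs and values differ in number\n" unless @IDs == @Values;

    # Pair each atom type ID with its count.
    my %ValueByID;
    @ValueByID{@IDs} = @Values;
    return {
        Description => $Description,
        Size        => $Size,
        ValuesType  => $ValuesType,
        IDs         => \@IDs,
        ValueByID   => \%ValueByID,
    };
}

my $Parsed = ParseFingerprintsVectorString(
    'FingerprintsVector;AtomTypesCount:DREIDINGAtomTypes;6;NumericalValues;'
    . 'IDsAndValuesString;C_2 C_3 N_2 N_3 O_2 O_3;3 12 1 3 2 2');
print "$_ = $Parsed->{ValueByID}{$_}\n" for @{ $Parsed->{IDs} };
```

Strings that were abbreviated with "..." in this document would fail the count check, which is intended: the sketch refuses input whose IDs and values do not line up.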
METHODS
- new
-
$NewAtomTypesFingerprints = new AtomTypesFingerprints(%NamesAndValues);
-
Using the specified AtomTypesFingerprints property names and values hash, the new method creates
a new object and returns a reference to the newly created AtomTypesFingerprints object. By default,
the following properties are initialized:
-
Molecule = ''
Type = ''
AtomIdentifierType = ''
AtomTypesSetToUse = ''
IgnoreHydrogens = 1
AtomicInvariantsToUse = ['AS', 'X', 'BO', 'H', 'FC', 'MN']
FunctionalClassesToUse = ['HBD', 'HBA', 'PI', 'NI', 'Ar', 'Hal']
-
Examples:
-
$AtomTypesFingerprints = new AtomTypesFingerprints(
'Molecule' => $Molecule,
'Type' => 'AtomTypesCount',
'AtomIdentifierType' =>
'AtomicInvariantsAtomTypes');
-
$AtomTypesFingerprints = new AtomTypesFingerprints(
'Molecule' => $Molecule,
'Type' => 'AtomTypesCount',
'AtomIdentifierType' =>
'AtomicInvariantsAtomTypes',
'AtomicInvariantsToUse' =>
['AS', 'X', 'BO', 'H', 'FC'] );
-
$AtomTypesFingerprints = new AtomTypesFingerprints(
'Molecule' => $Molecule,
'Type' => 'AtomTypesCount',
'AtomIdentifierType' =>
'DREIDINGAtomTypes');
-
$AtomTypesFingerprints = new AtomTypesFingerprints(
'Molecule' => $Molecule,
'Type' => 'AtomTypesCount',
'AtomIdentifierType' =>
'EStateAtomTypes',
'AtomTypesSetToUse' =>
'ArbitrarySize');
-
$AtomTypesFingerprints = new AtomTypesFingerprints(
'Molecule' => $Molecule,
'Type' => 'AtomTypesCount',
'AtomIdentifierType' =>
'SLogPAtomTypes',
'AtomTypesSetToUse' =>
'FixedSize');
-
$AtomTypesFingerprints = new AtomTypesFingerprints(
'Molecule' => $Molecule,
'Type' => 'AtomTypesBits',
'AtomIdentifierType' =>
'MMFF94AtomTypes',
'AtomTypesSetToUse' =>
'FixedSize');
-
$AtomTypesFingerprints->GenerateFingerprints();
print "$AtomTypesFingerprints\n";
- GenerateFingerprints
-
$AtomTypesFingerprints->GenerateFingerprints();
-
Generates atom types fingerprints and returns AtomTypesFingerprints.
- SetAtomIdentifierType
-
$AtomTypesFingerprints->SetAtomIdentifierType($IdentifierType);
-
Sets atom IdentifierType to use during atom types fingerprints generation and
returns AtomTypesFingerprints.
-
Possible values: AtomicInvariantsAtomTypes, DREIDINGAtomTypes, EStateAtomTypes,
FunctionalClassAtomTypes, MMFF94AtomTypes, SLogPAtomTypes, SYBYLAtomTypes,
TPSAAtomTypes, UFFAtomTypes.
- SetAtomTypesSetToUse
-
$AtomTypesFingerprints->SetAtomTypesSetToUse($Value);
-
Sets the value of AtomTypesSetToUse and returns AtomTypesFingerprints. Possible
values: ArbitrarySize or FixedSize. Default value when Type is AtomTypesCount:
ArbitrarySize.
- SetAtomicInvariantsToUse
-
$AtomTypesFingerprints->SetAtomicInvariantsToUse($ValuesRef);
$AtomTypesFingerprints->SetAtomicInvariantsToUse(@Values);
-
Sets atomic invariants to use when AtomIdentifierType is AtomicInvariantsAtomTypes
during atom types fingerprints generation and returns AtomTypesFingerprints.
-
Possible values for atomic invariants are: AS, X, BO, LBO, SB, DB, TB,
H, Ar, RA, FC, MN, SM. Default value: AS,X,BO,H,FC.
-
The atomic invariants abbreviations correspond to:
-
AS = Atom symbol corresponding to element symbol
-
X<n> = Number of non-hydrogen atom neighbors or heavy atoms
BO<n> = Sum of bond orders to non-hydrogen atom neighbors or heavy atoms
LBO<n> = Largest bond order of non-hydrogen atom neighbors or heavy atoms
SB<n> = Number of single bonds to non-hydrogen atom neighbors or heavy atoms
DB<n> = Number of double bonds to non-hydrogen atom neighbors or heavy atoms
TB<n> = Number of triple bonds to non-hydrogen atom neighbors or heavy atoms
H<n> = Number of implicit and explicit hydrogens for atom
Ar = Aromatic annotation indicating whether atom is aromatic
RA = Ring atom annotation indicating whether atom is in a ring
FC<+n/-n> = Formal charge assigned to atom
MN<n> = Mass number indicating isotope other than most abundant isotope
SM<n> = Spin multiplicity of atom. Possible values: 1 (singlet), 2 (doublet) or
3 (triplet)
-
Atom type generated by AtomTypes::AtomicInvariantsAtomTypes class corresponds to:
-
AS.X<n>.BO<n>.LBO<n>.SB<n>.DB<n>.TB<n>.H<n>.Ar.RA.FC<+n/-n>.MN<n>.SM<n>
-
Except for AS which is a required atomic invariant in atom types, all other atomic invariants are
optional. Atom type specification doesn't include atomic invariants with zero or undefined values.
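-
The composition rule above can be sketched as a small pure-Perl function. ComposeAtomicInvariantsAtomType and its input hash are hypothetical and only illustrate the rule (AS always present, other invariants appended in the listed order, zero or undefined values skipped); actual atom types are assigned by AtomTypes::AtomicInvariantsAtomTypes.
-
```perl
use strict;
use warnings;

# Order of invariants in the atom type string, as listed above.
my @InvariantsOrder = qw(X BO LBO SB DB TB H Ar RA FC MN SM);
my %IsFlag = (Ar => 1, RA => 1);   # written without a numeric suffix

# Hypothetical helper: $InvariantsRef maps abbreviations to values.
sub ComposeAtomicInvariantsAtomType {
    my ($AtomSymbol, $InvariantsRef, @InvariantsToUse) = @_;
    my %Use = map { $_ => 1 } @InvariantsToUse;
    my @Parts = ($AtomSymbol);                 # AS is always present
    for my $Name (@InvariantsOrder) {
        next unless $Use{$Name};
        my $Value = $InvariantsRef->{$Name};
        next unless $Value;                    # skip zero or undefined
        if    ($IsFlag{$Name}) { push @Parts, $Name }
        elsif ($Name eq 'FC')  { push @Parts, sprintf('FC%+d', $Value) }
        else                   { push @Parts, "$Name$Value" }
    }
    return join '.', @Parts;
}

print ComposeAtomicInvariantsAtomType(
    'C', { X => 1, BO => 1, H => 3, FC => 0 }, qw(AS X BO H FC)), "\n";
print ComposeAtomicInvariantsAtomType(
    'O', { X => 1, BO => 2, H => 0 }, qw(AS X BO H FC)), "\n";
```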
-
In addition to usage of abbreviations for specifying atomic invariants, the following descriptive words
are also allowed:
-
X : NumOfNonHydrogenAtomNeighbors or NumOfHeavyAtomNeighbors
BO : SumOfBondOrdersToNonHydrogenAtoms or SumOfBondOrdersToHeavyAtoms
LBO : LargestBondOrderToNonHydrogenAtoms or LargestBondOrderToHeavyAtoms
SB : NumOfSingleBondsToNonHydrogenAtoms or NumOfSingleBondsToHeavyAtoms
DB : NumOfDoubleBondsToNonHydrogenAtoms or NumOfDoubleBondsToHeavyAtoms
TB : NumOfTripleBondsToNonHydrogenAtoms or NumOfTripleBondsToHeavyAtoms
H : NumOfImplicitAndExplicitHydrogens
Ar : Aromatic
RA : RingAtom
FC : FormalCharge
MN : MassNumber
SM : SpinMultiplicity
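-
When descriptive words and abbreviations are mixed in user input, it can help to normalize them to abbreviations first. The mapping below is transcribed from the list above; the helper NormalizeAtomicInvariants is hypothetical, does exact (case-sensitive) matching as an assumption of this sketch, and is not a MayaChemTools function.
-
```perl
use strict;
use warnings;

# Descriptive words and their abbreviations, transcribed from the list above.
my %AbbreviationByName = (
    NumOfNonHydrogenAtomNeighbors      => 'X',
    NumOfHeavyAtomNeighbors            => 'X',
    SumOfBondOrdersToNonHydrogenAtoms  => 'BO',
    SumOfBondOrdersToHeavyAtoms        => 'BO',
    LargestBondOrderToNonHydrogenAtoms => 'LBO',
    LargestBondOrderToHeavyAtoms       => 'LBO',
    NumOfSingleBondsToNonHydrogenAtoms => 'SB',
    NumOfSingleBondsToHeavyAtoms       => 'SB',
    NumOfDoubleBondsToNonHydrogenAtoms => 'DB',
    NumOfDoubleBondsToHeavyAtoms       => 'DB',
    NumOfTripleBondsToNonHydrogenAtoms => 'TB',
    NumOfTripleBondsToHeavyAtoms       => 'TB',
    NumOfImplicitAndExplicitHydrogens  => 'H',
    Aromatic                           => 'Ar',
    RingAtom                           => 'RA',
    FormalCharge                       => 'FC',
    MassNumber                         => 'MN',
    SpinMultiplicity                   => 'SM',
);

# Hypothetical helper: replace descriptive words by abbreviations and leave
# anything else (such as AS or an existing abbreviation) unchanged.
sub NormalizeAtomicInvariants {
    my (@Names) = @_;
    return map { $AbbreviationByName{$_} // $_ } @Names;
}

my @Normalized = NormalizeAtomicInvariants(
    qw(AS NumOfHeavyAtomNeighbors BO NumOfImplicitAndExplicitHydrogens FormalCharge));
print "@Normalized\n";
```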
-
AtomTypes::AtomicInvariantsAtomTypes module is used to assign atomic invariant
atom types.
- SetFunctionalClassesToUse
-
$AtomTypesFingerprints->SetFunctionalClassesToUse($ValuesRef);
$AtomTypesFingerprints->SetFunctionalClassesToUse(@Values);
-
Sets functional classes to use when AtomIdentifierType is FunctionalClassAtomTypes
during atom types fingerprints generation and returns AtomTypesFingerprints.
-
Possible values for atom functional classes are: Ar, CA, H, HBA, HBD, Hal, NI, PI, RA.
Default value [ Ref 24 ]: HBD,HBA,PI,NI,Ar,Hal.
-
The functional class abbreviations correspond to:
-
HBD: HydrogenBondDonor
HBA: HydrogenBondAcceptor
PI : PositivelyIonizable
NI : NegativelyIonizable
Ar : Aromatic
Hal : Halogen
H : Hydrophobic
RA : RingAtom
CA : ChainAtom
-
Functional class atom type specification for an atom corresponds to:
-
Ar.CA.H.HBA.HBD.Hal.NI.PI.RA or None
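-
The specification above can be illustrated with a short pure-Perl sketch. ComposeFunctionalClassAtomType is a hypothetical helper that assumes the functional classes matched by an atom are already known; detecting them is the job of AtomTypes::FunctionalClassAtomTypes. Perl's default string sort happens to reproduce the order shown above.
-
```perl
use strict;
use warnings;

# Hypothetical helper: join the functional classes found for an atom,
# limited to the classes in use, or return 'None' when nothing remains.
sub ComposeFunctionalClassAtomType {
    my ($ClassesFoundRef, @ClassesToUse) = @_;
    my %Use    = map { $_ => 1 } @ClassesToUse;
    my @Kept   = grep { $Use{$_} } @{$ClassesFoundRef};
    my @Sorted = sort @Kept;     # yields Ar CA H HBA HBD Hal NI PI RA order
    return @Sorted ? join('.', @Sorted) : 'None';
}

my @DefaultClasses = qw(HBD HBA PI NI Ar Hal);
print ComposeFunctionalClassAtomType([qw(HBA Ar)], @DefaultClasses), "\n";
print ComposeFunctionalClassAtomType([qw(RA)],     @DefaultClasses), "\n";
```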
-
AtomTypes::FunctionalClassAtomTypes module is used to assign functional class atom
types. It uses following definitions [ Ref 60-61, Ref 65-66 ]:
-
HydrogenBondDonor: NH, NH2, OH
HydrogenBondAcceptor: N[!H], O
PositivelyIonizable: +, NH2
NegativelyIonizable: -, C(=O)OH, S(=O)OH, P(=O)OH
- SetType
-
$AtomTypesFingerprints->SetType($Type);
-
Sets the type of atom types fingerprints and returns AtomTypesFingerprints. Possible values:
AtomTypesBits or AtomTypesCount.
- StringifyAtomTypesFingerprints
-
$String = $AtomTypesFingerprints->StringifyAtomTypesFingerprints();
-
Returns a string containing information about AtomTypesFingerprints object.
AUTHOR
Manish Sud
SEE ALSO
Fingerprints.pm, FingerprintsStringUtil.pm, AtomNeighborhoodsFingerprints.pm, 
EStateIndiciesFingerprints.pm, ExtendedConnectivityFingerprints.pm, MACCSKeys.pm, 
PathLengthFingerprints.pm, TopologicalAtomPairsFingerprints.pm, TopologicalAtomTripletsFingerprints.pm, 
TopologicalAtomTorsionsFingerprints.pm, TopologicalPharmacophoreAtomPairsFingerprints.pm, 
TopologicalPharmacophoreAtomTripletsFingerprints.pm
COPYRIGHT
Copyright (C) 2004-2010 Manish Sud. All rights reserved.
This file is part of MayaChemTools.
MayaChemTools is free software; you can redistribute it and/or modify it under
the terms of the GNU Lesser General Public License as published by the Free
Software Foundation; either version 3 of the License, or (at your option)
any later version.