MayaChemTools


NAME

AtomTypesFingerprints

SYNOPSIS

use AtomTypesFingerprints;

use AtomTypesFingerprints qw(:all);

DESCRIPTION

The AtomTypesFingerprints class provides the following methods:

new, GenerateFingerprints, SetAtomIdentifierType, SetAtomTypesSetToUse, SetAtomicInvariantsToUse, SetFunctionalClassesToUse, SetType, StringifyAtomTypesFingerprints

AtomTypesFingerprints is derived from the Fingerprints class, which in turn is derived from the ObjectProperty base class. ObjectProperty uses Perl's AUTOLOAD functionality to provide methods not explicitly defined in the AtomTypesFingerprints, Fingerprints or ObjectProperty classes. These methods are generated on-the-fly for a specified object property:

Set<PropertyName>(<PropertyValue>);
$PropertyValue = Get<PropertyName>();
Delete<PropertyName>();
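
The AUTOLOAD mechanism described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the ObjectProperty implementation: the package name DemoObjectProperty and its hash-based storage are hypothetical.

```perl
package DemoObjectProperty;    # hypothetical stand-in, not the MayaChemTools class

use strict;
use warnings;

our $AUTOLOAD;

sub new {
    my ($Class, %NamesAndValues) = @_;
    return bless { %NamesAndValues }, $Class;
}

# Defined explicitly so object destruction never reaches AUTOLOAD.
sub DESTROY { }

# Catch calls to undefined methods and treat Set<Name>, Get<Name> and
# Delete<Name> as operations on the property <Name>.
sub AUTOLOAD {
    my ($This, $Value) = @_;
    my ($Method) = $AUTOLOAD =~ /::(\w+)$/;

    my ($Action, $Name) = $Method =~ /^(Set|Get|Delete)(\w+)$/
        or die "Unknown method: $Method\n";

    if ($Action eq 'Set') { $This->{$Name} = $Value; return $This; }
    if ($Action eq 'Get') { return $This->{$Name}; }

    delete $This->{$Name};     # Delete<Name>
    return $This;
}

package main;

my $Object = DemoObjectProperty->new();
$Object->SetIgnoreHydrogens(1);
print $Object->GetIgnoreHydrogens(), "\n";    # prints 1
$Object->DeleteIgnoreHydrogens();
```

Returning the object from the Set and Delete branches is a design choice of this sketch that allows calls to be chained.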

The current release of MayaChemTools supports generation of AtomTypesFingerprints corresponding to the following AtomIdentifierTypes:

AtomicInvariantsAtomTypes, DREIDINGAtomTypes, EStateAtomTypes,
FunctionalClassAtomTypes, MMFF94AtomTypes, SLogPAtomTypes,
SYBYLAtomTypes, TPSAAtomTypes, UFFAtomTypes

Based on the values specified for AtomIdentifierType along with other specified parameters such as AtomicInvariantsToUse and FunctionalClassesToUse, initial atom types are assigned to all non-hydrogen atoms or all atoms in a molecule.

Using the assigned atom types and specified Type, one of the following types of fingerprints is generated:

AtomTypesCount - A vector containing count of atom types
AtomTypesBits - A bit vector indicating presence/absence of atom types

For AtomTypesCount fingerprints, two atom types set sizes are allowed:

ArbitrarySize - Corresponds to only atom types detected in molecule
FixedSize - Corresponds to fixed number of atom types previously defined

For AtomTypesBits fingerprints, only FixedSize atom type set is allowed.

ArbitrarySize corresponds to the atom types detected in a molecule, whereas FixedSize implies a fixed number of all possible atom types previously defined for a specific AtomIdentifierType.
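
The difference between the two set sizes, and between counts and bits, can be illustrated with a small self-contained sketch. It does not call MayaChemTools. The list of 34 DREIDING atom types is copied from the FixedSize example string on this page, while the per-atom assignments are hypothetical input chosen to match that example. The bit string here has exactly 34 characters; any padding applied by the library's own bit-vector strings is not reproduced.

```perl
use strict;
use warnings;

# The 34 non-hydrogen DREIDING atom types, in the order used by the
# FixedSize example string on this page.
my @FixedAtomTypes = qw(
    B_3 B_2 C_3 C_R C_2 C_1 N_3 N_R N_2 N_1 O_3 O_R O_2 O_1 F_ Al3 Si3
    P_3 S_3 Cl Ga3 Ge3 As3 Se3 Br In3 Sn3 Sb3 Te3 I_ Na Ca Fe Zn
);

# Hypothetical per-atom assignments for one molecule (23 heavy atoms).
my @AssignedAtomTypes = (
    ('C_3') x 12, ('C_2') x 3, ('N_3') x 3, 'N_2', ('O_3') x 2, ('O_2') x 2,
);

my %Count;
$Count{$_}++ for @AssignedAtomTypes;

# ArbitrarySize: only the atom types detected in the molecule.
my @ArbitraryIDs    = sort keys %Count;
my @ArbitraryValues = map { $Count{$_} } @ArbitraryIDs;

# FixedSize: one slot per predefined atom type, zero when absent.
my @FixedValues = map { $Count{$_} || 0 } @FixedAtomTypes;

# AtomTypesBits: presence/absence over the same fixed set.
my $Bits = join '', map { $_ ? 1 : 0 } @FixedValues;

print "@ArbitraryIDs;@ArbitraryValues\n";    # prints C_2 C_3 N_2 N_3 O_2 O_3;3 12 1 3 2 2
print scalar(@FixedValues), " $Bits\n";
```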

The fixed numbers of all possible atom types for the supported AtomIdentifierTypes in the current release of MayaChemTools are:

AtomIdentifier       Total    TotalWithoutHydrogens

DREIDINGAtomTypes    37       34
EStateAtomTypes      109      87
MMFF94AtomTypes      212      171
SLogPAtomTypes       72       67
SYBYLAtomTypes       45       44
TPSAAtomTypes        47       47
UFFAtomTypes         126      124

The combination of Type and AtomTypesSetToUse along with AtomIdentifierType allows generation of the following atom types fingerprints:

Type              AtomIdentifierType           AtomTypesSetToUse

AtomTypesCount    AtomicInvariantsAtomTypes    ArbitrarySize

AtomTypesCount    DREIDINGAtomTypes            ArbitrarySize
AtomTypesCount    DREIDINGAtomTypes            FixedSize
AtomTypesBits     DREIDINGAtomTypes            FixedSize

AtomTypesCount    EStateAtomTypes              ArbitrarySize
AtomTypesCount    EStateAtomTypes              FixedSize
AtomTypesBits     EStateAtomTypes              FixedSize

AtomTypesCount    FunctionalClassAtomTypes     ArbitrarySize

AtomTypesCount    MMFF94AtomTypes              ArbitrarySize
AtomTypesCount    MMFF94AtomTypes              FixedSize
AtomTypesBits     MMFF94AtomTypes              FixedSize

AtomTypesCount    SLogPAtomTypes               ArbitrarySize
AtomTypesCount    SLogPAtomTypes               FixedSize
AtomTypesBits     SLogPAtomTypes               FixedSize

AtomTypesCount    SYBYLAtomTypes               ArbitrarySize
AtomTypesCount    SYBYLAtomTypes               FixedSize
AtomTypesBits     SYBYLAtomTypes               FixedSize

AtomTypesCount    TPSAAtomTypes                FixedSize
AtomTypesBits     TPSAAtomTypes                FixedSize

AtomTypesCount    UFFAtomTypes                 ArbitrarySize
AtomTypesCount    UFFAtomTypes                 FixedSize
AtomTypesBits     UFFAtomTypes                 FixedSize
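
A caller may want to check a combination against this table before constructing an object. The following is a sketch under stated assumptions: the lookup hash is transcribed from the table, and IsSupportedCombination is a hypothetical helper, not part of MayaChemTools.

```perl
use strict;
use warnings;

# Identifier types that allow both set sizes for counts and FixedSize for bits.
my @FixedOrArbitrary = qw(
    DREIDINGAtomTypes EStateAtomTypes MMFF94AtomTypes
    SLogPAtomTypes SYBYLAtomTypes UFFAtomTypes
);

my %Supported;
for my $IdentifierType (@FixedOrArbitrary) {
    $Supported{AtomTypesCount}{$IdentifierType} = { ArbitrarySize => 1, FixedSize => 1 };
    $Supported{AtomTypesBits}{$IdentifierType}  = { FixedSize => 1 };
}

# Identifier types restricted to a single set size.
$Supported{AtomTypesCount}{AtomicInvariantsAtomTypes} = { ArbitrarySize => 1 };
$Supported{AtomTypesCount}{FunctionalClassAtomTypes}  = { ArbitrarySize => 1 };
$Supported{AtomTypesCount}{TPSAAtomTypes}             = { FixedSize => 1 };
$Supported{AtomTypesBits}{TPSAAtomTypes}              = { FixedSize => 1 };

# Hypothetical helper: returns 1 when the table lists the combination, else 0.
# Each level is tested with exists() so the lookup never autovivifies keys.
sub IsSupportedCombination {
    my ($Type, $IdentifierType, $SetToUse) = @_;
    return 0 unless exists($Supported{$Type});
    return 0 unless exists($Supported{$Type}{$IdentifierType});
    return exists($Supported{$Type}{$IdentifierType}{$SetToUse}) ? 1 : 0;
}

print IsSupportedCombination('AtomTypesBits', 'MMFF94AtomTypes', 'FixedSize'), "\n";            # prints 1
print IsSupportedCombination('AtomTypesBits', 'AtomicInvariantsAtomTypes', 'FixedSize'), "\n";  # prints 0
```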

The current release of MayaChemTools generates the following types of atom types fingerprints bit-vector and vector strings:

FingerprintsVector;AtomTypesCount:AtomicInvariantsAtomTypes;9;Numerical
Values;IDsAndValuesString;C.X1.BO1.H3 C.X2.BO2.H2 C.X3.BO3.H1 C.X3.BO4
N.X1.BO1.H2 N.X1.BO2.H1 N.X2.BO2.H1 O.X1.BO1.H1 O.X1.BO2;3 3 6 3 1 1 2
2 2
FingerprintsVector;AtomTypesCount:FunctionalClassAtomTypes;2;Numerical
Values;IDsAndValuesString;Ar Ar.HBA;4 1
FingerprintsVector;AtomTypesCount:DREIDINGAtomTypes;6;NumericalValues;
IDsAndValuesString;C_2 C_3 N_2 N_3 O_2 O_3;3 12 1 3 2 2
FingerprintsVector;AtomTypesCount:DREIDINGAtomTypes;34;OrderedNumerical
Values;IDsAndValuesString;B_3 B_2 C_3 C_R C_2 C_1 N_3 N_R N_2 N_1 O_3 O
_R O_2 O_1 F_ Al3 Si3 P_3 S_3 Cl Ga3 Ge3 As3 Se3 Br In3 Sn3 Sb3 Te3 I_
Na Ca Fe Zn;0 0 12 0 3 0 3 0 1 0 2 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0
FingerprintsBitVector;AtomTypesBits:DREIDINGAtomTypes;34;BinaryString;
Ascending;0010101010101000000000000000000000000000
FingerprintsVector;AtomTypesCount:EStateAtomTypes;9;NumericalValues;IDs
AndValuesString;dNH dO dssC sCH3 sNH2 sOH ssCH2 ssNH sssCH;1 2 3 3 1 2
3 2 6
FingerprintsVector;AtomTypesCount:EStateAtomTypes;87;OrderedNumerical
Values;IDsAndValuesString;sLi ssBe ssssBem sBH2 ssBH sssB ssssBm sCH3 d
CH2 ssCH2 tCH dsCH aaCH sssCH ddC tsC dssC aasC aaaC ssssC sNH3p sNH2 s
sNH2p dNH ssNH aaNH tN sssNHp dsN aaN sssN ddsN aasN ssssNp sOH dO s...
;0 0 0 0 0 0 0 3 0 3 0 0 0 6 0 0 3 0 0 0 0 1 0 1 2 0 0 0 0 0 0 0 0 0 2
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
FingerprintsBitVector;AtomTypesBits:EStateAtomTypes;87;BinaryString;
Ascending;0000000101000100100001011000000000110000000000000000000000000
000000000000000000000000000
FingerprintsVector;AtomTypesCount:MMFF94AtomTypes;11;NumericalValues;IDs
AndValuesString;C=ON CGD COO CR N=C NC=N NC=O O=CN O=CO OC=O OR;1 1 1 12
1 2 1 1 1 1 1
FingerprintsVector;AtomTypesCount:MMFF94AtomTypes;171;OrderedNumerical
Values;IDsAndValuesString;CR C=C CSP2 C=O C=N CGD C=OR C=ON CONN COO CO
ON COOO C=OS C=S C=SN CSO2 CS=O CSS C=P CSP =C= OR OC=O OC=C OC=N OC=S
ONO2 ON=O OSO3 OSO2 OSO OS=O -OS OPO3 OPO2 OPO -OP -O- O=C O=CN O=CR O=
;12 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 ...
FingerprintsBitVector;AtomTypesBits:MMFF94AtomTypes;171;BinaryString;
Ascending;1000010101000000000001100000000000000001010000101000000000000
00000000000000000000000000000000000001000000000000000000000000000000000
0000000000000000000000000000000000000000000
FingerprintsVector;AtomTypesCount:SLogPAtomTypes;9;NumericalValues;IDs
AndValuesString;C1 C2 C5 CS N1 N2 N5 O2 O9;6 3 3 3 1 2 1 2 2
FingerprintsVector;AtomTypesCount:SLogPAtomTypes;67;OrderedNumerical
Values;IDsAndValuesString;C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C1
4 C15 C16 C17 C18 C19 C20 C21 C22 C23 C24 C25 C26 C27 CS N1 N2 N3 N4 N5
N6 N7 N8 N9 N10 N11 N12 N13 N14 NS O1 O2 O3 O4 O5 O6 O7 O8 O9 O10 O1...
;6 3 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 1 2 0 0 1 0 0
0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
FingerprintsBitVector;AtomTypesBits:SLogPAtomTypes;67;BinaryString;
Ascending;1100100000000000000000000001110010000000000010000001000000000
00000000000
FingerprintsVector;AtomTypesCount:SYBYLAtomTypes;8;NumericalValues;IDs
AndValuesString;C.2 C.3 C.cat N.am N.pl3 O.2 O.3 O.co2;2 12 1 1 3 1 1 2
FingerprintsVector;AtomTypesCount:SYBYLAtomTypes;44;OrderedNumerical
Values;IDsAndValuesString;C.3 C.2 C.1 C.ar C.cat N.3 N.2 N.1 N.ar N.am
N.pl3 N.4 O.3 O.2 O.co2 S.3 S.2 S.o S.o2 P.3 F Cl Br I ANY HAL HET Li N
a Mg Al Si K Ca Cr.th Cr.oh Mn Fe Co.oh Cu Zn Se Mo Sn;12 2 0 0 1 0 0 0
0 1 3 0 1 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
FingerprintsBitVector;AtomTypesBits:SYBYLAtomTypes;44;BinaryString;
Ascending;110010000110111000000000000000000000000000000000
FingerprintsVector;AtomTypesCount:TPSAAtomTypes;47;OrderedNumerical
Values;IDsAndValuesString;N1 N2 N3 N4 N5 N6 N7 N8 N9 N10 N11 N12 N13 N1
4 N15 N16 N17 N18 N19 N20 N21 N22 N23 N24 N25 N26 N O1 O2 O3 O4 O5 O6 O
S1 S2 S3 S4 S5 S6 S7 S P1 P2 P3 P4 P;0 0 0 0 0 0 2 0 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
FingerprintsBitVector;AtomTypesBits:TPSAAtomTypes;47;BinaryString;
Ascending;000000101100000000000000000001100000000000000000
FingerprintsVector;AtomTypesCount:UFFAtomTypes;6;NumericalValues;IDsAnd
ValuesString;C_2 C_3 N_2 N_3 O_2 O_3;3 12 1 3 2 2
FingerprintsVector;AtomTypesCount:UFFAtomTypes;124;OrderedNumerical
Values;IDsAndValuesString;He4+4 Li Be3+2 B_3 B_2 C_3 C_R C_2 C_1 N_3 N_
R N_2 N_1 O_3 O_3_z O_R O_2 O_1 F_ Ne4+4 Na Mg3+2 Al3 Si3 P_3+3 P_3+5 P
_3+q S_3+2 S_3+4 S_3+6 S_R S_2 Cl Ar4+4 K_ Ca6+2 Sc3+3 Ti3+4 Ti6+4 V_3+
;0 0 0 0 0 12 0 3 0 3 0 1 0 2 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
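
The vector strings above are semicolon-delimited, so they can be taken apart with plain string operations. The sketch below assumes the field order seen in the examples (kind, description, size, values type, format, IDs, values) and handles only the IDsAndValuesString form. ParseFingerprintsVectorString is a hypothetical helper; it is not claimed to be part of FingerprintsStringUtil.pm or any other MayaChemTools module.

```perl
use strict;
use warnings;

# Hypothetical helper: split a FingerprintsVector string of the
# IDsAndValuesString form into its semicolon-delimited fields.
sub ParseFingerprintsVectorString {
    my ($String) = @_;
    my ($Kind, $Description, $Size, $ValuesType, $Format, $IDs, $Values)
        = split /;/, $String, 7;

    die "Not a FingerprintsVector string\n"
        unless defined($Kind) && $Kind eq 'FingerprintsVector';
    die "Unsupported or incomplete format\n"
        unless defined($Values) && $Format eq 'IDsAndValuesString';

    my @IDs    = split ' ', $IDs;
    my @Values = split ' ', $Values;
    die "IDs and values differ in number\n" unless @IDs == @Values;

    # Pair each atom type ID with its count.
    my %CountByAtomType;
    @CountByAtomType{@IDs} = @Values;

    return {
        Description => $Description,
        Size        => $Size,
        ValuesType  => $ValuesType,
        Counts      => \%CountByAtomType,
    };
}

my $Parsed = ParseFingerprintsVectorString(
    'FingerprintsVector;AtomTypesCount:DREIDINGAtomTypes;6;NumericalValues;'
  . 'IDsAndValuesString;C_2 C_3 N_2 N_3 O_2 O_3;3 12 1 3 2 2'
);
print "$Parsed->{Description}: C_3 = $Parsed->{Counts}{C_3}\n";
# prints AtomTypesCount:DREIDINGAtomTypes: C_3 = 12
```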

METHODS

new
$NewAtomTypesFingerprints = new AtomTypesFingerprints(%NamesAndValues);

Using the specified AtomTypesFingerprints property names and values hash, the new method creates a new object and returns a reference to the newly created AtomTypesFingerprints object. By default, the following properties are initialized:

Molecule = ''
Type = ''
AtomIdentifierType = ''
AtomTypesSetToUse = ''
IgnoreHydrogens = 1
AtomicInvariantsToUse = ['AS', 'X', 'BO', 'H', 'FC', 'MN']
FunctionalClassesToUse = ['HBD', 'HBA', 'PI', 'NI', 'Ar', 'Hal']

Examples:

$AtomTypesFingerprints = new AtomTypesFingerprints( 'Molecule' => $Molecule, 'Type' => 'AtomTypesCount', 'AtomIdentifierType' => 'AtomicInvariantsAtomTypes');
$AtomTypesFingerprints = new AtomTypesFingerprints( 'Molecule' => $Molecule, 'Type' => 'AtomTypesCount', 'AtomIdentifierType' => 'AtomicInvariantsAtomTypes', 'AtomicInvariantsToUse' => ['AS', 'X', 'BO', 'H', 'FC'] );
$AtomTypesFingerprints = new AtomTypesFingerprints( 'Molecule' => $Molecule, 'Type' => 'AtomTypesCount', 'AtomIdentifierType' => 'DREIDINGAtomTypes');
$AtomTypesFingerprints = new AtomTypesFingerprints( 'Molecule' => $Molecule, 'Type' => 'AtomTypesCount', 'AtomIdentifierType' => 'EStateAtomTypes', 'AtomTypesSetToUse' => 'ArbitrarySize');
$AtomTypesFingerprints = new AtomTypesFingerprints( 'Molecule' => $Molecule, 'Type' => 'AtomTypesCount', 'AtomIdentifierType' => 'SLogPAtomTypes', 'AtomTypesSetToUse' => 'FixedSize');
$AtomTypesFingerprints = new AtomTypesFingerprints( 'Molecule' => $Molecule, 'Type' => 'AtomTypesBits', 'AtomIdentifierType' => 'MMFF94AtomTypes', 'AtomTypesSetToUse' => 'FixedSize');
$AtomTypesFingerprints->GenerateFingerprints();
print "$AtomTypesFingerprints\n";

GenerateFingerprints
$AtomTypesFingerprints->GenerateFingerprints();

Generates atom types fingerprints and returns AtomTypesFingerprints.

SetAtomIdentifierType
$AtomTypesFingerprints->SetAtomIdentifierType($IdentifierType);

Sets atom IdentifierType to use during atom types fingerprints generation and returns AtomTypesFingerprints.

Possible values: AtomicInvariantsAtomTypes, DREIDINGAtomTypes, EStateAtomTypes, FunctionalClassAtomTypes, MMFF94AtomTypes, SLogPAtomTypes, SYBYLAtomTypes, TPSAAtomTypes, UFFAtomTypes.

SetAtomTypesSetToUse
$AtomTypesFingerprints->SetAtomTypesSetToUse($Value);

Sets the value of AtomTypesSetToUse and returns AtomTypesFingerprints. Possible values: ArbitrarySize or FixedSize. Default when Type is AtomTypesCount: ArbitrarySize.

SetAtomicInvariantsToUse
$AtomTypesFingerprints->SetAtomicInvariantsToUse($ValuesRef);
$AtomTypesFingerprints->SetAtomicInvariantsToUse(@Values);

Sets the atomic invariants to use when AtomIdentifierType is AtomicInvariantsAtomTypes during atom types fingerprints generation and returns AtomTypesFingerprints.

Possible values for atomic invariants are: AS, X, BO, LBO, SB, DB, TB, H, Ar, RA, FC, MN, SM. Default value: AS,X,BO,H,FC.

The atomic invariants abbreviations correspond to:

AS = Atom symbol corresponding to element symbol
X<n> = Number of non-hydrogen atom neighbors or heavy atoms
BO<n> = Sum of bond orders to non-hydrogen atom neighbors or heavy atoms
LBO<n> = Largest bond order of non-hydrogen atom neighbors or heavy atoms
SB<n> = Number of single bonds to non-hydrogen atom neighbors or heavy atoms
DB<n> = Number of double bonds to non-hydrogen atom neighbors or heavy atoms
TB<n> = Number of triple bonds to non-hydrogen atom neighbors or heavy atoms
H<n> = Number of implicit and explicit hydrogens for atom
Ar = Aromatic annotation indicating whether atom is aromatic
RA = Ring atom annotation indicating whether atom is in a ring
FC<+n/-n> = Formal charge assigned to atom
MN<n> = Mass number indicating isotope other than most abundant isotope
SM<n> = Spin multiplicity of atom. Possible values: 1 (singlet), 2 (doublet) or 3 (triplet)

Atom type generated by AtomTypes::AtomicInvariantsAtomTypes class corresponds to:

AS.X<n>.BO<n>.LBO<n>.SB<n>.DB<n>.TB<n>.H<n>.Ar.RA.FC<+n/-n>.MN<n>.SM<n>

Except for AS which is a required atomic invariant in atom types, all other atomic invariants are optional. Atom type specification doesn't include atomic invariants with zero or undefined values.
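
The composition rule above (AS always present, other invariants dot-separated in a fixed order, zero or undefined values omitted) can be sketched as follows. ComposeAtomicInvariantsAtomType and its input hash are hypothetical and written for illustration; the actual assignment is done by AtomTypes::AtomicInvariantsAtomTypes, whose internals are not shown here.

```perl
use strict;
use warnings;

# Optional invariants in the order they appear in the atom type string.
my @OptionalInvariantsOrder = qw(X BO LBO SB DB TB H Ar RA FC MN SM);

# Hypothetical helper: build an atom type string from a hash of invariant
# values, restricted to the invariants the caller asks for.
sub ComposeAtomicInvariantsAtomType {
    my ($ValuesRef, @InvariantsToUse) = @_;
    my %Use   = map { $_ => 1 } @InvariantsToUse;
    my @Parts = ($ValuesRef->{AS});              # AS is always required

    for my $Name (@OptionalInvariantsOrder) {
        next unless $Use{$Name};
        my $Value = $ValuesRef->{$Name};
        next unless $Value;                      # omit zero or undefined values

        if ($Name eq 'Ar' || $Name eq 'RA') {
            push @Parts, $Name;                  # annotations carry no number
        }
        elsif ($Name eq 'FC') {
            push @Parts, sprintf('FC%+d', $Value);   # signed, e.g. FC+1 or FC-1
        }
        else {
            push @Parts, "$Name$Value";
        }
    }
    return join '.', @Parts;
}

my %MethylCarbon = (AS => 'C', X => 1, BO => 1, H => 3, FC => 0);
print ComposeAtomicInvariantsAtomType(\%MethylCarbon, qw(AS X BO H FC)), "\n";
# prints C.X1.BO1.H3
```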

In addition to usage of abbreviations for specifying atomic invariants, the following descriptive words are also allowed:

X : NumOfNonHydrogenAtomNeighbors or NumOfHeavyAtomNeighbors
BO : SumOfBondOrdersToNonHydrogenAtoms or SumOfBondOrdersToHeavyAtoms
LBO : LargestBondOrderToNonHydrogenAtoms or LargestBondOrderToHeavyAtoms
SB : NumOfSingleBondsToNonHydrogenAtoms or NumOfSingleBondsToHeavyAtoms
DB : NumOfDoubleBondsToNonHydrogenAtoms or NumOfDoubleBondsToHeavyAtoms
TB : NumOfTripleBondsToNonHydrogenAtoms or NumOfTripleBondsToHeavyAtoms
H : NumOfImplicitAndExplicitHydrogens
Ar : Aromatic
RA : RingAtom
FC : FormalCharge
MN : MassNumber
SM : SpinMultiplicity
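
Because either form may be supplied, a caller-side normalization step can map the descriptive words back to abbreviations. This is a sketch only: the mapping is transcribed from the list above, NormalizeAtomicInvariants is a hypothetical helper, and exact, case-sensitive matching is an assumption rather than documented library behavior.

```perl
use strict;
use warnings;

# Descriptive words from the list above mapped back to abbreviations.
my %AbbreviationFor = (
    NumOfNonHydrogenAtomNeighbors       => 'X',
    NumOfHeavyAtomNeighbors             => 'X',
    SumOfBondOrdersToNonHydrogenAtoms   => 'BO',
    SumOfBondOrdersToHeavyAtoms         => 'BO',
    LargestBondOrderToNonHydrogenAtoms  => 'LBO',
    LargestBondOrderToHeavyAtoms        => 'LBO',
    NumOfSingleBondsToNonHydrogenAtoms  => 'SB',
    NumOfSingleBondsToHeavyAtoms        => 'SB',
    NumOfDoubleBondsToNonHydrogenAtoms  => 'DB',
    NumOfDoubleBondsToHeavyAtoms        => 'DB',
    NumOfTripleBondsToNonHydrogenAtoms  => 'TB',
    NumOfTripleBondsToHeavyAtoms        => 'TB',
    NumOfImplicitAndExplicitHydrogens   => 'H',
    Aromatic                            => 'Ar',
    RingAtom                            => 'RA',
    FormalCharge                        => 'FC',
    MassNumber                          => 'MN',
    SpinMultiplicity                    => 'SM',
);

# Hypothetical helper: replace descriptive words with abbreviations and
# pass anything else (including existing abbreviations) through unchanged.
sub NormalizeAtomicInvariants {
    return map { exists $AbbreviationFor{$_} ? $AbbreviationFor{$_} : $_ } @_;
}

my @Normalized = NormalizeAtomicInvariants('AS', 'NumOfHeavyAtomNeighbors', 'BO', 'FormalCharge');
print "@Normalized\n";    # prints AS X BO FC
```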

AtomTypes::AtomicInvariantsAtomTypes module is used to assign atomic invariant atom types.

SetFunctionalClassesToUse
$AtomTypesFingerprints->SetFunctionalClassesToUse($ValuesRef);
$AtomTypesFingerprints->SetFunctionalClassesToUse(@Values);

Sets the functional classes to use when AtomIdentifierType is FunctionalClassAtomTypes during atom types fingerprints generation and returns AtomTypesFingerprints.

Possible values for atom functional classes are: Ar, CA, H, HBA, HBD, Hal, NI, PI, RA. Default value [ Ref 24 ]: HBD,HBA,PI,NI,Ar,Hal.

The functional class abbreviations correspond to:

HBD: HydrogenBondDonor
HBA: HydrogenBondAcceptor
PI : PositivelyIonizable
NI : NegativelyIonizable
Ar : Aromatic
Hal : Halogen
H : Hydrophobic
RA : RingAtom
CA : ChainAtom

Functional class atom type specification for an atom corresponds to:

Ar.CA.H.HBA.HBD.Hal.NI.PI.RA or None
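
The specification above can be read as: keep the classes that both apply to the atom and are in FunctionalClassesToUse, join them with dots in the listed order, and fall back to None when nothing remains. A minimal sketch, assuming the classes for each atom have already been detected; ComposeFunctionalClassAtomType is a hypothetical helper and does not reproduce the detection rules of AtomTypes::FunctionalClassAtomTypes.

```perl
use strict;
use warnings;

# Order of class abbreviations in the atom type string, as listed above.
my @ClassOrder = qw(Ar CA H HBA HBD Hal NI PI RA);

# Hypothetical helper: combine the classes detected for one atom into an
# atom type string, keeping only the classes selected for use.
sub ComposeFunctionalClassAtomType {
    my ($ClassesToUseRef, @AtomClasses) = @_;
    my %Use = map { $_ => 1 } @$ClassesToUseRef;
    my %Has = map { $_ => 1 } @AtomClasses;

    my @Parts = grep { $Use{$_} && $Has{$_} } @ClassOrder;
    return @Parts ? join('.', @Parts) : 'None';
}

my @DefaultClassesToUse = qw(HBD HBA PI NI Ar Hal);
print ComposeFunctionalClassAtomType(\@DefaultClassesToUse, qw(HBA Ar RA)), "\n";   # prints Ar.HBA
print ComposeFunctionalClassAtomType(\@DefaultClassesToUse, qw(RA)), "\n";          # prints None
```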

The AtomTypes::FunctionalClassAtomTypes module is used to assign functional class atom types. It uses the following definitions [ Ref 60-61, Ref 65-66 ]:

HydrogenBondDonor: NH, NH2, OH
HydrogenBondAcceptor: N[!H], O
PositivelyIonizable: +, NH2
NegativelyIonizable: -, C(=O)OH, S(=O)OH, P(=O)OH

SetType
$AtomTypesFingerprints->SetType($Type);

Sets the type of atom types fingerprints and returns AtomTypesFingerprints. Possible values: AtomTypesBits or AtomTypesCount.

StringifyAtomTypesFingerprints
$String = $AtomTypesFingerprints->StringifyAtomTypesFingerprints();

Returns a string containing information about AtomTypesFingerprints object.

AUTHOR

Manish Sud

SEE ALSO

Fingerprints.pm, FingerprintsStringUtil.pm, AtomNeighborhoodsFingerprints.pm, EStateIndiciesFingerprints.pm, ExtendedConnectivityFingerprints.pm, MACCSKeys.pm, PathLengthFingerprints.pm, TopologicalAtomPairsFingerprints.pm, TopologicalAtomTripletsFingerprints.pm, TopologicalAtomTorsionsFingerprints.pm, TopologicalPharmacophoreAtomPairsFingerprints.pm, TopologicalPharmacophoreAtomTripletsFingerprints.pm

COPYRIGHT

Copyright (C) 2004-2010 Manish Sud. All rights reserved.

This file is part of MayaChemTools.

MayaChemTools is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

August 29, 2010