ExtractFromSDFiles.pl - Extract specific data from SDFile(s)
ExtractFromSDFiles.pl SDFile(s)...
ExtractFromSDFiles.pl [-h, --help] [-d, --datafields "fieldlabel,..." | "fieldlabel,value,criteria..." | "fieldlabel,value,value..."] [--datafieldsfile filename] [--indelim comma | tab | semicolon] [-m, --mode alldatafields | commondatafields | datafields | datafieldsbyvalue | datafieldbylist | datafielduniquebylist | molnames | randomcmpds | recordnum | recordsrange] [-n, --numofcmpds number] [--outdelim comma | tab | semicolon] [--output SD | text | both] [-o, --overwrite] [-q, --quote yes | no] [--record recnum | startrecnum,endrecnum] [-r, --root rootname] [-s, --seed number] [-v, --violations number] [-w, --workingdir dirname] SDFile(s)...
Extract specific data from SDFile(s) and generate appropriate SD or CSV/TSV text file(s). The structure data from SDFile(s) is not transferred to CSV/TSV text file(s). Multiple SDFile names are separated by spaces. The valid file extensions are .sdf and .sd; all other file names are ignored. All the SD files in the current directory can be specified either by *.sdf or by the current directory name.
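To make the distinction between structure data and data fields concrete, here is a minimal Python sketch of pulling only the data fields out of SD-format text and writing them as CSV. It is not the script's own code: the function names read_sd_data_fields and to_csv, the sample records, and the choice to quote every value are assumptions made for illustration.

```python
import csv
import io
import re


def read_sd_data_fields(sd_text):
    """Return one {field label: value} dict per record in SD-format text.

    The structure block of each record is skipped; only data items
    (a '> <label>' header followed by value lines) are collected.
    """
    records, fields, label = [], {}, None
    for line in sd_text.splitlines():
        if line.startswith("$$$$"):          # record delimiter
            records.append(fields)
            fields, label = {}, None
            continue
        header = re.match(r"^>.*<([^<>]+)>", line)
        if header:                            # start of a data item
            label = header.group(1)
            fields[label] = ""
        elif label is not None:
            if line.strip():                  # value line(s) of the item
                fields[label] = (fields[label] + " " + line.strip()).strip()
            else:                             # blank line ends the item
                label = None
    if fields:                                # last record without '$$$$'
        records.append(fields)
    return records


def to_csv(records, labels):
    """Write the chosen field labels as quoted, comma-delimited text."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(labels)
    for record in records:
        writer.writerow([record.get(name, "") for name in labels])
    return out.getvalue()


# Hypothetical two-record sample; the structure blocks are empty placeholders.
SAMPLE = """\
Cmpd1
  hypothetical

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
>  <MolID>
A1

>  <MolWt>
180.2

$$$$
Cmpd2
  hypothetical

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
>  <MolID>
A2

>  <MolWt>
94.1

$$$$
"""
```

Calling to_csv(read_sd_data_fields(SAMPLE), ["MolID", "MolWt"]) yields a header row plus one row per compound, with none of the structure block carried over.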
For datafields mode, input value format is: fieldlabel,.... Examples:
For datafieldsbyvalue mode, input value format contains these triplets: fieldlabel,value,criteria,.... Possible values for criteria: le, ge or eq. Examples:
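As one illustration, a value such as MolWt,450,le,LogP,5,le (a hypothetical input built from the triplet format above) can be read as two triplets. The Python sketch below shows how such a string might be split and applied to a compound's data fields. It is an assumption-laden sketch rather than the script's implementation: the names parse_triplets and matches are invented here, values are compared numerically when both sides parse as numbers and as text otherwise, a compound missing a field is treated as failing, and every triplet must hold for a compound to be kept.

```python
import operator

# The three criteria named in the documentation.
CRITERIA = {"le": operator.le, "ge": operator.ge, "eq": operator.eq}


def parse_triplets(spec):
    """Split 'label,value,criteria,...' into (label, value, criteria) tuples."""
    parts = [part.strip() for part in spec.split(",")]
    if len(parts) % 3 != 0:
        raise ValueError("expected fieldlabel,value,criteria triplets")
    triplets = []
    for i in range(0, len(parts), 3):
        label, value, criterion = parts[i:i + 3]
        if criterion not in CRITERIA:
            raise ValueError("unknown criteria: " + criterion)
        triplets.append((label, value, criterion))
    return triplets


def _compare(field_value, value, criterion):
    # Assumption: numeric comparison when possible, text comparison otherwise.
    try:
        left, right = float(field_value), float(value)
    except ValueError:
        left, right = field_value, value
    return CRITERIA[criterion](left, right)


def matches(record, triplets):
    """True when the record satisfies every triplet (assumed AND semantics)."""
    return all(
        label in record and _compare(record[label], value, criterion)
        for label, value, criterion in triplets
    )
```

For example, with triplets = parse_triplets("MolWt,450,le,LogP,5,le"), a record {"MolWt": "320.4", "LogP": "3.1"} passes matches, while one with MolWt of 512.0 does not.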
For datafieldbylist and datafielduniquebylist mode, input value format is: fieldlabel,value1,value2.... This is equivalent to datafieldsbyvalue mode with this input value format: fieldlabel,value1,eq,fieldlabel,value2,eq,.... For datafielduniquebylist mode, only unique compounds identified by the first occurrence of a value associated with fieldlabel in SDFile(s) are kept; any subsequent compounds are simply ignored.
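The list-based selection and the first-occurrence rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the script's code: the names as_triplet_spec and select_by_list are hypothetical, records are plain dictionaries of field labels to values, and a compound is kept when its field value equals any one of the listed values.

```python
def as_triplet_spec(spec):
    """Rewrite 'label,value1,value2' as 'label,value1,eq,label,value2,eq'."""
    label, *values = [part.strip() for part in spec.split(",")]
    return ",".join(f"{label},{value},eq" for value in values)


def select_by_list(records, spec, unique=False):
    """Keep records whose field value is in the list.

    With unique=True, only the first record seen for each listed value
    is kept; later records carrying the same value are skipped.
    """
    label, *values = [part.strip() for part in spec.split(",")]
    wanted = set(values)
    seen = set()
    kept = []
    for record in records:
        value = record.get(label)
        if value not in wanted:
            continue
        if unique and value in seen:
            continue
        seen.add(value)
        kept.append(record)
    return kept


# Hypothetical records; the MolID value A1 appears twice.
RECORDS = [
    {"MolID": "A1", "Name": "first"},
    {"MolID": "A2", "Name": "second"},
    {"MolID": "A1", "Name": "third"},
    {"MolID": "A3", "Name": "fourth"},
]
```

With the spec MolID,A1,A3, the non-unique selection keeps the first, third, and fourth records, while unique=True drops the third because A1 has already been seen.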
For datafields mode, input file lines contain comma delimited field labels: fieldlabel,.... Example:
For datafieldsbyvalue mode, input file lines contain these comma separated triplets: fieldlabel,value,criteria. Possible values for criteria: le, ge or eq. Examples:
For datafieldbylist and datafielduniquebylist mode, input file line format is:
For datafielduniquebylist mode, only unique compounds identified by first occurrence of value associated with fieldlabel in SDFile(s) are kept; any subsequent compounds are simply ignored. Example:
For alldatafields and molnames modes, only a CSV/TSV text file is generated; for all other modes, an SD file is generated by default. You can use the --output option to generate a text file instead.
To retrieve all data fields from SD files and generate CSV text files, type:
To retrieve common data fields that exist for all the compounds in an SD file and generate a TSV text file NewSample.tsv, type:
To retrieve the MolId, ExtReg, and CompoundName data fields from an SD file and generate a CSV text file NewSample.csv, type:
To retrieve compounds that meet a specific set of criteria - MolWt <= 450, LogP <= 5 and SumNO < 10 - from an SD file and generate a new SD file NewSample.sdf, type:
To retrieve compounds from an SD file with a specific set of values for MolID and generate a new SD file NewSample.sdf, type:
To retrieve 10 random compounds from an SD file and generate a new SD file RandomSample.sdf, type:
To retrieve compound record number 10 from an SD file and generate a new SD file NewSample.sdf, type:
To retrieve compound records 10 through 20 from an SD file and generate a new SD file NewSample.sdf, type:
FilterSDFiles.pl, InfoSDFiles.pl, SplitSDFiles.pl, MergeTextFilesWithSD.pl
Copyright (C) 2004-2008 Manish Sud. All rights reserved.
This file is part of MayaChemTools.
MayaChemTools is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.