MayaChemTools

Previous  TOC  NextMergeTextFilesWithSD.plCode | PDF | PDFGreen | PDFA4 | PDFA4Green

NAME

MergeTextFilesWithSD.pl - Merge CSV or TSV TextFile(s) into SDFile

SYNOPSIS

MergeTextFilesWithSD.pl SDFile TextFile(s)...

MergeTextFilesWithSD.pl [-h, --help] [--indelim comma | semicolon] [-c, --columns colnum,...;... | collabel,...;...] [-k, --keys colkeynum;... | colkeylabel;...] [-m, --mode colnum | collabel] [-o, --overwrite] [-r, --root rootname] [-s, --sdkey sdfieldname] [-w, --workingdir dirname] SDFile TextFile(s)...

DESCRIPTION

Merge multiple CSV or TSV TextFile(s) into SDFile. Unless -k --keys option is used, data rows from all TextFile(s) are added to SDFile in a sequential order, and the number of compounds in SDFile is used to determine how many rows of data are added from TextFile(s).

Multiple TextFile(s) names are separated by spaces. The valid file extensions are .csv and .tsv for comma/semicolon and tab delimited text files respectively. All other file names are ignored. All the text files in a current directory can be specified by *.csv, *.tsv, or the current directory name. The --indelim option determines the format of TextFile(s). Any file which doesn't correspond to the format indicated by --indelim option is ignored.

OPTIONS

-h, --help
Print this help message

--indelim comma | semicolon
Input delimiter for CSV TextFile(s). Possible values: comma or semicolon. Default value: comma. For TSV files, this option is ignored and tab is used as a delimiter.

-c, --columns colnum,...;... | collabel,...;...
This value is mode specific. It is a list of columns to merge into SDFile specified by column numbers or labels for each text file delimited by '';''. All TextFile(s) are merged into SDFile.

Default value: all;all;.... By default, all columns from TextFile(s) are merged into SDFile.

For colnum mode, input value format is: colnum,...;colnum,...;.... Example:

"1,2;1,3,4;7,8,9"

For collabel mode, input value format is: collabel,...;collabel,...;.... Example:

"MW,SumNO;SumNHOH,ClogP,PSA;MolName,Mol_Id,Extreg"

-k, --keys colkeynum;... | colkeylabel;...
This value is mode specific. It specifies column keys to use for merging TextFile(s) into SDFile. The column keys, delimited by '';'', are specified by column numbers or labels for TextFile(s).

By default, data rows from TextFile(s) are merged into SDFile in the order they appear.

For colnum mode, input value format is:colkeynum, colkeynum;.... Example:

"1;3;7"

For collabel mode, input value format is:colkeylabel, colkeylabel;.... Example:

"Mol_Id;Mol_Id;Cmpd_Id"

-m, --mode colnum | collabel
Specify how to merge TextFile(s) into SDFile: using column numbers or column labels. Possible values: colnum or collabel. Default value: colnum

-o, --overwrite
Overwrite existing files

-r, --root rootname
New SD file name is generated using the root: <Root>.sdf. Default file name: <InitialSDFileName>MergedWith<FirstTextFileName>1To<Count>.sdf

-s, --sdkey sdfieldname
SDFile data field name used as a key to merge data from TextFile(s). By default, data rows from TextFile(s) are merged into SDFile in the order they appear.

-w, --workingdir dirname
Location of working directory. Default: current directory

EXAMPLES

To merge Sample1.csv and Sample2.csv into Sample.sdf and generate NewSample.sdf, type:

% MergeTextFileswithSD.pl -r NewSample -o Sample.sdf Sample1.csv Sample2.csv

To merge all Sample*.tsv into Sample.sdf and generate NewSample.sdf file, type:

% MergeTextFilesWithSD.pl -r NewSample --indelim tab -o Sample.sdf Sample*.tsv

To merge column numbers ''1,2'' and ''3,4,5'' from Sample2.csv and Sample3.csv into Sample.sdf and to generate NewSample.sdf, type:

% MergeTextFilesWithSD.pl -r NewSample -m colnum -c "1,2;3,4,5" -o Sample.sdf Sample1.csv Sample2.csv

To merge column ''Mol_ID,Formula,MolWeight'' and ''Mol_ID,ChemBankID,NAME'' from Sample1.csv and Sample2.csv into Sample.sdf using ''Mol_ID'' as SD and column keys to generate NewSample.sdf, type:

% MergeTextFilesWithSD.pl -r NewSample -s Mol_ID -k "Mol_ID;Mol_ID" -m collabel -c "Mol_ID,Formula,MolWeight;Mol_ID,ChemBankID,NAME" -o Sample1.sdf Sample1.csv Sample2.csv

AUTHOR

Manish Sud

SEE ALSO

ExtractFromSDFiles.plFilterSDFiles.plInfoSDFiles.plJoinSDFiles.plJoinTextFiles.plMergeTextFiles.plModifyTextFilesFormat.plSplitSDFiles.plSplitTextFiles.pl

COPYRIGHT

Copyright (C) 2004-2008 Manish Sud. All rights reserved.

This file is part of MayaChemTools.

MayaChemTools is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

 

 

Previous  TOC  NextApril 29, 2008MergeTextFilesWithSD.pl