ExtractFromTextFiles.pl - Extract specific data from TextFile(s)
ExtractFromTextFiles.pl TextFile(s)...
ExtractFromTextFiles.pl [-c, --colmode colnum | collabel] [--categorycol number | string] [--columns "colnum,[colnum]..." | "collabel,[collabel]..."] [-h, --help] [--indelim comma | semicolon] [-m, --mode columns | rows | categories] [-o, --overwrite] [--outdelim comma | tab | semicolon] [-q, --quote yes | no] [--rows "colid,value,criteria..." | "colid,value..." | "colid,mincolvalue,maxcolvalue" | "rownum,rownum,..." | colid | "minrownum,maxrownum"] [--rowsmode rowsbycolvalue | rowsbycolvaluelist | rowsbycolvaluerange | rowbymincolvalue | rowbymaxcolvalue | rownums | rownumrange] [-r, --root rootname] [-w, --workingdir dirname] TextFile(s)...
Extract column(s)/row(s) data from TextFile(s), identified by column numbers or labels, or categorize data using a specified category column. During categorization, a summary text file is generated containing each category name and its count; an additional text file, containing the data for that category, is generated for each category. Multiple file names are separated by spaces. The valid file extensions are .csv for comma/semicolon delimited text files and .tsv for tab delimited text files. All other file names are ignored. All the text files in the current directory can be specified by *.csv, *.tsv, or the current directory name. The --indelim option determines the format of TextFile(s). Any file that doesn't correspond to the format indicated by the --indelim option is ignored.
For the colnum value of the -c, --colmode option, the --categorycol input value is a column number. Example: 1
For the collabel value of the -c, --colmode option, the --categorycol input value is a column label. Example: Mol_ID
For the colnum value of the -c, --colmode option, the --columns input value format is: colnum,colnum,.... Example: 1,3,5
For the collabel value of the -c, --colmode option, the --columns input value format is: collabel,collabel,.... Example: Mol_ID,MolWeight
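The two ways of identifying columns described above can be illustrated with a short Python sketch. It is not the script's own implementation: the function names, the sample data, and the assumption that column numbers are 1-based (suggested by the example 1,3,5 and the use of "first column") are all hypothetical.

```python
import csv
import io


def resolve_columns(header, spec, colmode="colnum"):
    """Map a comma-delimited column spec to zero-based indices."""
    items = [item.strip() for item in spec.split(",") if item.strip()]
    if colmode == "colnum":
        # Assumption: column numbers in the spec are 1-based.
        return [int(item) - 1 for item in items]
    # collabel: look each label up in the header row.
    return [header.index(item) for item in items]


def extract_columns(text, spec, colmode="colnum", delimiter=","):
    """Return the selected columns for every row, header included."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    indices = resolve_columns(rows[0], spec, colmode)
    return [[row[i] for i in indices] for row in rows]


# Made-up sample data, for illustration only.
sample = "Mol_ID,NAME,MolWeight\nM1,Aspirin,180.16\nM2,Caffeine,194.19\n"
by_label = extract_columns(sample, "Mol_ID,MolWeight", colmode="collabel")
by_number = extract_columns(sample, "1,3", colmode="colnum")
```

In this sketch both calls select the same two columns, once by label and once by number.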
For columns mode, data for appropriate columns specified by --columns option is extracted from TextFile(s) and placed into new text files.
For rows mode, appropriate rows specified in conjunction with the --rowsmode and --rows options are extracted from TextFile(s) and placed into new text files.
For categories mode, the column specified by --categorycol is used to categorize data, and a summary text file is generated containing each category name and its count; an additional text file, containing the data for that category, is generated for each category.
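As a rough illustration of categories mode, the following Python sketch groups rows by a category column and builds the summary of names and counts. It works on in-memory data rather than writing files, and the function name, the "Category"/"Count" summary labels, and the sample data are assumptions made for this example, not details taken from the script.

```python
import csv
import io
from collections import defaultdict


def categorize(text, category_index=0, delimiter=","):
    """Group data rows by the value found in one column."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    groups = defaultdict(list)
    for row in data:
        groups[row[category_index]].append(row)
    # Summary table: one line per category with its row count.
    # The "Category" and "Count" labels are hypothetical.
    summary = [["Category", "Count"]]
    summary += [[name, str(len(members))] for name, members in groups.items()]
    # One table per category, each starting with the original header.
    per_category = {name: [header] + members for name, members in groups.items()}
    return summary, per_category


# Made-up sample data, for illustration only.
sample = "Mol_ID,Class\nM1,acid\nM2,base\nM3,acid\n"
summary, per_category = categorize(sample, category_index=1)
```

Each entry of per_category corresponds to what would be written to one per-category text file.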
The -r, --root option is ignored for multiple input files.
The first line, containing column labels, is always written out, and value comparisons assume numerical column data.
For rowsbycolvalue mode, input value format contains these triplets: colid,value,criteria.... Possible values for criteria: le, ge or eq. Examples:
For rowsbycolvaluelist mode, input value format is: colid,value.... Examples:
For rowsbycolvaluerange mode, input value format is: colid,mincolvalue,maxcolvalue. Examples:
For rowbymincolvalue and rowbymaxcolvalue modes, input value format is: colid.
For rownums mode, input value format is: rownum. Default value: 2
For rownumrange mode, input value format is: minrownum,maxrownum. Examples:
Use the --rows option to list the row criteria used to extract rows from TextFile(s).
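The value-based row modes can be pictured with the Python sketch below, which covers rowsbycolvalue, rowsbycolvaluerange and rowbymincolvalue using column labels as colid. It is only an approximation of the documented behavior: the function names and sample data are hypothetical, and it assumes that multiple triplets must all be satisfied and that range bounds are inclusive, neither of which is stated above.

```python
import csv
import io
import operator

# The three criteria named for rowsbycolvalue mode.
CRITERIA = {"le": operator.le, "ge": operator.ge, "eq": operator.eq}


def rows_by_col_value(header, data, spec):
    """Keep rows that satisfy every colid,value,criteria triplet."""
    parts = [part.strip() for part in spec.split(",")]
    triplets = [parts[i:i + 3] for i in range(0, len(parts), 3)]
    kept = []
    for row in data:
        # Comparisons are numerical, so values are converted to float.
        if all(
            CRITERIA[crit](float(row[header.index(col)]), float(value))
            for col, value, crit in triplets
        ):
            kept.append(row)
    return kept


def rows_by_col_value_range(header, data, spec):
    """Keep rows whose column value lies in [mincolvalue, maxcolvalue]."""
    col, low, high = [part.strip() for part in spec.split(",")]
    index = header.index(col)
    return [row for row in data if float(low) <= float(row[index]) <= float(high)]


def row_by_min_col_value(header, data, colid):
    """Return the single row holding the smallest value in the column."""
    index = header.index(colid.strip())
    return min(data, key=lambda row: float(row[index]))


# Made-up sample data, for illustration only.
sample = "Mol_ID,MolWeight\nM1,380.5\nM2,450\nM3,512.3\n"
rows = list(csv.reader(io.StringIO(sample)))
header, data = rows[0], rows[1:]

le_450 = rows_by_col_value(header, data, "MolWeight,450,le")
in_range = rows_by_col_value_range(header, data, "MolWeight,400,500")
lightest = row_by_min_col_value(header, data, "MolWeight")
```

Because the header row is kept separate from the data rows here, a caller would write it out first and then append the selected rows, matching the rule that the column-label line is always written.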
To extract the first column from a text file and generate a new CSV text file NewSample1.csv, type:
To extract columns Mol_ID, MolWeight, and NAME from Sample1.csv and generate a new text file NewSample1.tsv with no quotes, type:
To extract rows with MolWeight column values of less than 450 from Sample1.csv and generate a new text file NewSample1.csv, type:
To extract rows with MolWeight column values between 400 and 500 from Sample1.csv and generate a new text file NewSample1.csv, type:
To extract the row containing the minimum value for column MolWeight from Sample1.csv and generate a new text file NewSample1.csv, type:
JoinTextFiles.pl, MergeTextFilesWithSD.pl, ModifyTextFilesFormat.pl, SplitTextFiles.pl
Copyright (C) 2004-2008 Manish Sud. All rights reserved.
This file is part of MayaChemTools.
MayaChemTools is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.