split-data.pl(1) - Linux man page

Name

split-data.pl - Divide a text file into N approximately equal parts

Synopsis

Splits a given data file into N parts such that each part has approximately the same number of lines.

Usage

split-data.pl [Options] DATA

Type 'split-data.pl --help' for a quick summary of the Options.

Input

Required Arguments:

DATA

DATA should be a plain text file in which each line contains a single training example.

Optional Arguments:

--parts N

Splits the DATA file into N approximately equal parts. If the DATA file has M lines, each part except the last will have int(M/N) lines, while the last part will have all the remaining lines, M - (N-1) * int(M/N).

Default N is 10.
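The part sizes described above can be sketched in a few lines of Python. This is a minimal illustration of the arithmetic only, not the script's actual Perl code; the function name part_sizes is hypothetical, and the sketch assumes N is at least 1.

```python
def part_sizes(m, n):
    """Return the number of lines in each of the n parts of an m-line file.

    Hypothetical helper that mirrors the rule stated for --parts:
    every part except the last gets int(m/n) lines, and the last
    part gets whatever remains.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    base = m // n                  # int(M/N) for a non-negative line count
    last = m - (n - 1) * base      # M - (N-1) * int(M/N)
    return [base] * (n - 1) + [last]


if __name__ == "__main__":
    print(part_sizes(23, 10))
```

With M = 23 and the default N = 10, the first nine parts get 2 lines each and the last part gets the remaining 5.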

Other Options:

--help

Displays a quick summary of the options.

--version

Displays the version information.

Output

split-data.pl creates exactly N files in the current directory. If the DATA file is named DATA-file, the N files are named DATA-file1, DATA-file2, DATA-file3, ..., DATA-fileN. For example, if the DATA filename is ANC, the N files created by split-data.pl are named ANC1, ANC2, ..., ANCN.

A DATA file containing a total of M lines is split into N parts such that each part/file contains approximately M/N lines.

Thus, if N = 1, the output file will be exactly the same as the given DATA file. If N = M, where N is the value of --parts and M is the number of lines in DATA, each part will have a single line.
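The naming scheme and the two boundary cases above can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the script's implementation: plan_split is a hypothetical name, the data is held in memory instead of being written to files, and the last part is assumed to take all remaining lines, as documented for --parts.

```python
def plan_split(name, lines, n):
    """Pair each output filename with the lines it would receive.

    Hypothetical helper: nothing is written to disk. Names follow the
    documented pattern name1, name2, ..., nameN.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    base = len(lines) // n         # int(M/N)
    plan = []
    for i in range(n):
        start = i * base
        # The last part takes everything that is left over.
        end = start + base if i < n - 1 else len(lines)
        plan.append((f"{name}{i + 1}", lines[start:end]))
    return plan


if __name__ == "__main__":
    data = ["a\n", "b\n", "c\n", "d\n", "e\n"]
    for filename, chunk in plan_split("ANC", data, 2):
        print(filename, len(chunk))
```

Run on a five-line input named ANC with N = 2, the sketch assigns 2 lines to ANC1 and 3 lines to ANC2. With N = 1 the single part holds the whole input, and with N = 5 each part holds one line.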

Author

Amruta Purandare, Ted Pedersen. University of Minnesota, Duluth.

Copyright

Copyright © 2004,

Amruta Purandare, University of Minnesota, Duluth. pura0010@umn.edu

Ted Pedersen, University of Minnesota, Duluth. tpederse@umn.edu

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to

The Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.