RemoveDuplicates (DDUP)
Given a sequence of elements that can be hashed and compared for equality, remove all duplicates from the sequence. The result must contain exactly one copy of each distinct element in the input, and the elements can appear in any order.
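As an illustration of the problem statement, here is a minimal sequential sketch in Python. It is not the benchmark's reference implementation; the function name `remove_duplicates` is hypothetical, and it assumes the elements are hashable Python values. It happens to keep first occurrences in input order, which the specification allows but does not require.

```python
from typing import Hashable, Iterable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def remove_duplicates(seq: Iterable[T]) -> List[T]:
    """Return one copy of each distinct element of seq (hypothetical helper)."""
    seen = set()   # elements already emitted
    result = []    # one copy of each distinct element
    for x in seq:
        if x not in seen:   # relies on hashing and equality comparison
            seen.add(x)
            result.append(x)
    return result
```

For example, `remove_duplicates([3, 1, 3, 2, 1])` returns a list containing 3, 1, and 2 once each. Any permutation of that result would be an equally valid answer.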
Default Input Distributions
The test distributions are the following:
- A random sequence of n integers in the range [0:n), as generated by:
    randomSeq -t int -r <n> <n> <filename>
- An exponential random sequence of n integers in the range [0:n), as generated by:
    exptSeq -t int -r <n> <n> <filename>
- Strings from a tri-gram distribution, as generated by:
    trigramSeq <n> <filename>
For the large inputs, n = 100 million; for the small inputs, n = 10 million.
Input and Output File Formats
The input and output data need to be in the sequence file format, both with the same element type.
The elements in the output file can be in any order.
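Because any ordering is accepted, a correctness check has to compare the output to the input as sets rather than position by position. The sketch below shows one way to do that on sequences already loaded into memory. The name `is_valid_output` is hypothetical, and parsing of the sequence file format is deliberately left out.

```python
from typing import Hashable, Sequence


def is_valid_output(inp: Sequence[Hashable], out: Sequence[Hashable]) -> bool:
    """Check an output against its input, ignoring order (hypothetical helper)."""
    distinct_out = set(out)
    # No element may appear twice in the output.
    no_duplicates = len(distinct_out) == len(out)
    # The output must cover exactly the distinct elements of the input.
    same_elements = distinct_out == set(inp)
    return no_duplicates and same_elements
```

With this check, `[3, 1, 2]` and `[1, 2, 3]` are both accepted as outputs for the input `[1, 2, 2, 3]`, while `[1, 2, 2, 3]` (a duplicate remains) and `[1, 2]` (an element is missing) are rejected.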