
The PBBS Benchmarks

New version of pbbs benchmarks

Word Counts (WC)

Given an input string, count the number of occurrences of each word in the string. All non-alphabetic characters (those not in [a-z] or [A-Z]) are replaced with a blank, and all upper-case characters are converted to lower case. A word is a maximal contiguous sequence of non-blank characters in the resulting string.

The output is a sequence of pairs, each consisting of a string (word) and a count of how many times it appears. Ordering does not matter.
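The specification above can be illustrated with a minimal sequential sketch. It is not the benchmark's implementation, only a reference for checking results on small inputs; the function name `word_counts` is hypothetical.

```python
from collections import Counter


def word_counts(text: str) -> list[tuple[str, int]]:
    """Sequential reference sketch of the Word Counts specification."""
    # Replace every character outside [a-z] / [A-Z] with a blank and
    # convert the remaining letters to lower case.
    cleaned = "".join(
        c.lower() if ("a" <= c <= "z" or "A" <= c <= "Z") else " "
        for c in text
    )
    # After cleaning, the string holds only lower-case letters and blanks,
    # so splitting on whitespace yields the maximal runs of letters.
    counts = Counter(cleaned.split())
    # The order of the resulting pairs is not significant.
    return list(counts.items())
```

For example, `word_counts("The cat's hat; the CAT.")` yields the pairs `("the", 2)`, `("cat", 2)`, `("s", 1)` and `("hat", 1)` in some order. Note that the apostrophe becomes a blank, so `cat's` contributes the two words `cat` and `s`.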

Default Input Distributions

Instances consist of both synthetic and real strings.

The large instances are:

The small instances are:

Input and Output File Formats

The input is a text file, and the output needs to be in the sequence file format, with type StringIntPair.
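As a rough illustration of producing such an output file, the sketch below formats (word, count) pairs as text. The layout is an assumption, not confirmed by this page: a header line naming the element type (assumed here to be `sequenceStringIntPair`), followed by one `word count` pair per line. Check the suite's sequence file format documentation before relying on it. The function name `format_string_int_pairs` is hypothetical.

```python
def format_string_int_pairs(pairs, header="sequenceStringIntPair"):
    """Render (word, count) pairs as text, one pair per line.

    ASSUMPTION: the header string and the space-separated, line-per-pair
    layout are guesses at the sequence file format, not taken from this page.
    """
    lines = [header]
    lines.extend(f"{word} {count}" for word, count in pairs)
    return "\n".join(lines) + "\n"
```

The returned string can then be written to the output file with an ordinary `open(path, "w")` call.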