combinatoradix
is a Java library which enumerates partial permutations,
combinations, and repeated combinations of a set that you give to it in a natural
Java manner and suitable for concurrent execution.
About what permutations, combinations, and repeated combinations are in more detail, please refer to the Wikipedia articles(4 and 5).
combinatoradix
is created and maintained by Hiroshi Ukai (dakusui@gmail.com).
This library requires Java 6 or later as its execution environment. To build this library from source, Maven 3.0.5 or later will be necessary.
<dependency>
<groupId>com.github.dakusui</groupId>
<artifactId>combinatoradix</artifactId>
<version>0.8.2</version>
</dependency>
You can build combinatoradix
by getting the source code from github.
$ git clone https://github.com/dakusui/combinatoradix.git
...
$ cd combinatoradix
...
$ mvn package
...
$
You will find a compiled jar file combinatoradix-{X.Y.Z}-SNAPSHOT.jar
under
target/
directory.
There are 3 main classes in this library,
- Permutator (com.github.dakusui.combinatoradix)
- Combinator (com.github.dakusui.combinatoradix)
- HomogeniousCombinator (com.github.dakusui.combinatoradix)
All of them can be used in the same manner.
Through the constructor of Permutator
, Combinator
, or HomogeniousCombinator
you can specify a list of values to be enumerated and how many elements in the set
should be chosen at once. And then you can iterate all the possible
partial permutations, combinations, or repeated combinations respectively.
Simplest example to enumerate all the possible combinations in a set would be like following.
@Test
public void printCombinations() {
List<String> list = Arrays.asList("A", "B", "C", "D", "E");
for (List<String> each : new Combinator<>(list, 2)) {
System.out.println(each);
}
}
And more importantly you can pick up Nth partial permutation, combination, or repeated combination directly without iterating upto N.
@Test
public void printNthCombinationDirectly() {
List<String> list = Arrays.asList("A", "B", "C", "D", "E");
System.out.println(new Combinator<>(list, 2).get(3));
}
This feature will allow you flexible concurrent enumerations.
About the benefit of this behaviour, you can refer to "Why you want to use combinatoradix
"
sub-section.
As already mentioned, by changing Combinator
to Permutator
or HomogeniousCombinator
,
you can enumerate partial permutations or repeated combinations respectively.
Please refer to "Examples" sections for more samples.
As its name suggests, this library relies on "factorial number system"2 "combinatorial number system"3. Basic idea is to use a number system where each digit specifies an element in a given set and map integers from 0 to P(n,k) - 1 to all the k-sequences without repetition on n-set.
For instance, to enumerate permutations P(5,3), combinatoradix
internally uses a number system below
2nd | 1st | 0th |
---|---|---|
0 | 0 | 0 |
1 | 1 | 1 |
2 | 2 | 2 |
(n/a) | 3 | 3 |
(n/a) | (n/a) | 4 |
All the possible numbers in this system are following.
0 1 2 3 4
10 11 12 13 14
20 21 22 23 24
30 31 32 33 34
100 101 102 103 104
110 111 112 113 114
120 121 122 123 124
130 131 132 133 134
200 201 202 203 204
210 211 212 213 214
220 221 222 223 224
230 231 232 233 234
As shown, there are only 60 numbers and you may notice that P(5, 3) = 5! / 2! = 5 * 4 * 3 * 2 * 1 / 2 * 1 = 60. We can translate each of those sixty numbers into one permutation respectively for instance,
0 -> (000) -> [Christopher, Bob, Alice]
To describe the algorithm, for convenience, let's consider that those are strings and
suppose a function nth(s, i)
extracts ith digit from a string.
i.e.,
nth(234, 0) = 4
nth(234, 1) = 3
nth(234, 2) = 2
nth( 31, 2) = 0
Then we can come up with a function that translates those strings into decimals as follows.
dec(s) = nth(s, 2) * 4 * 3 + nth(s, 1) * 4 + nth(s, 0)
And suppose that S
is a number represented in the system above. Now the
algorithm can be described as following.
# "initial" can be [Alice, Bob, Christopher, Derek, Elizabeth], for example.
initial=[...]
remaining=initial
tmp=[]
for i in [0 1 2...k-1]; do
each_digit=nth(S, i)
tmp.insert(remaining[each_digt])
done
result=tmp
That is, if the number is 234
,
- pick up 4 and take the 4th item in [Alice, Bob, Christopher, Derek, Elizabeth], which is "Elizabeth".
And it will be inserted in
tmp
([Elizabeth]). The remaining elements will be [Alice, Bob, Christopher, Derek]. - pick up 3 and take the 3rd item in [Alice, Bob, Christopher, Derek], which is "Derek"
And it will be inserted in
tmp
([Derek, Elizabeth]). The remaining elements will be [Alice, Bob, Christopher]. - pick up 2 and take the 2nd item in [Alice, Bob, Christopher], which is "Christopher".
And it will be inserted in
tmp
([Christopher, Derek, Elizabeth]). tmp
will be assigned toresult
.
Of course we can generalize this number system so that we can apply it to any P(n, k). The detail is discussed in a Wikipedia article.
This approach is clearly slower than other normal 'iterating' algorithm such as used by "combinatorics". But there are still very good reason to use it.
- Easy to resume: As long as you have an index (
combinatoradix
classes haveget(long index): List<T>
method, which always return the same same list as long as the same index is given to the same type of object created from the same constructor parameters.), you can stop and resume your operation at any point. - Suitable for parallel execution: For the same reason as the previous one, you can split a big enumeration task into pieces and execute them concurrently.
- Predictable order: As it will be shown in the example, if you give a list sorted in dictionary order, the result will be sorted in the dictionary order, too.
For the first and second point, probably we need some more explanation.
As mentioned, main classes of combinatoradix
, all of which extend Enumerator<T>
,
have get(long index): List<T>
method and size(): long
method.
And it is guaranteed that the method return the same List<T>
if the same initial data set, k
(parameter to specify how many elements
should be picked up from initial data set), and ```index`` are given.
Therefore, for instance, combinatoradix
can answer a question like "There
are 26 alphabets from A to Z, if we create 5 character words from them. But each
character can be used only once. What is the 1,000,000th word in dictionary order?".
The answer is [D, I, K, H, Q]
.
And it is a single line job for combinatoradix
once data set is given.
@Test
public void print1000000thWord() {
List<String> list = Arrays.asList(
/*
1 2 3 4 5 6 7 8 9 10
*/
"A", "B", "C", "D", "E", "F", "G", "H", "I", "J",
"K", "L", "M", "N", "O", "P", "Q", "R", "S", "T",
"U", "V", "W", "X", "Y", "Z"
);
System.out.println(new Permutator<>(list, 5).get(1_000_000));
}
Probably other libraries can enumerate all the 5 character length words faster than
combinatoradix
. But for figuring out "1,000,000th word", it can be much more
efficient and do it instantly.
This might look a trivial characteristic, when you want to process all the possible patterns, which can easily become huge number, and you have multiple cores (or servers with Hadoop), this becomes useful.
In the above mentioned example, we will have 7,893,600 possible words in total.
If you have N
cores (Core_0
, Core_1
, ... Core_i
,...Core_N-1
)
and in order to process all of them as fast as possible what you can do is to
let Core_i
process words returned by get(j)
, where j mod N = i
.
Most of examples shown in this section is written for Java 7. But by just
changing <>
into <String>
(or other types depending on contexts),
they should work.
To enumerate all the partial permutations, you can use "Permutator" class. An example follows.
@Test
public void printPartialPermutations() {
List<String> list = Arrays.asList("A", "B", "C", "D", "E");
for (List<String> each : new Permutator<>(list, 2)) {
System.out.println(each);
}
}
This program will print output below.
[A, B]
[A, C]
[A, D]
[A, E]
[B, A]
[B, C]
[B, D]
[B, E]
[C, A]
[C, B]
[C, D]
[C, E]
[D, A]
[D, B]
[D, C]
[D, E]
[E, A]
[E, B]
[E, C]
[E, D]
You may notice that the elements are sorted in dictionary order.
This is an important characteristic that combinatorix
offers as already discussed.
Similarly you can enumerate all the combination just by changing the enumerator
class from Permutator
to Combinator
.
@Test
public void printCombinations() {
List<String> list = Arrays.asList("A", "B", "C", "D", "E");
for (List<String> each : new Combinator<>(list, 2)) {
System.out.println(each);
}
}
Same as Permutator
the output is sorted in dictionary order.
[A, B]
[A, C]
[A, D]
[A, E]
[B, C]
[B, D]
[B, E]
[C, D]
[C, E]
[D, E]
The class to be used for repeated combinations is HomogeniousCombinator
.
public void printRepeatedCombinations() {
List<String> list = Arrays.asList("A", "B", "C", "D", "E");
for (List<String> each : new HomogeniousCombinator<>(list, 2)) {
System.out.println(each);
}
}
Similarly, dictionary ordered output will be printed.
But an element in the original list can appear multiple times like [A,A]
,
[B,B]
, etc.
[A, A]
[A, B]
[A, C]
[A, D]
[A, E]
[B, B]
[B, C]
[B, D]
[B, E]
[C, C]
[C, D]
[C, E]
[D, D]
[D, E]
[E, E]
Following example will try to find the longest 3-sequence among all possibles.
In real life, please use Combinator
, instead of Permutator
if you
really want to know the length of the longest sequence.
Essentially this is doing the
@Test
public void findLongestSequenceSequentially() {
int lengthOfLongestSequence = 0;
for (List<String> seq :
new Permutator<>(
Arrays.asList("Alice", "Bob", "Christopher", "Derek", "Elizabeth"),
3)) {
int l = 0;
for (String s : seq) {
l += s.length();
}
if (l > lengthOfLongestSequence) lengthOfLongestSequence = l;
}
System.out.println(lengthOfLongestSequence);
}
This will print "25"
In Jave 8 single node map/reduce is an out-of-box feature. (single-node MapReduce is available in Java 8).
Following example will try to find the longest 3-sequence among all possibles same as the previous one but the difference is that it will be executed concurrently.
@Test
public void findLongestSequenceConcurrently() {
int lengthOfLongestSequence = new EnumeratorAdapter<>(
new Permutator<>(
Arrays.asList("Alice", "Bob", "Christopher", "Derek", "Elizabeth"),
3))
.parallelStream()
.map(seq -> {
int ret = 0;
for (String each : seq) {
ret += each.length();
}
return ret;
}).reduce(0, Integer::max);
System.out.println(lengthOfLongestSequence);
}
EnumeratorAdapter
is a trivial adapter class defined as follows.
public class EnumeratorAdapter<E> extends AbstractList<List<E>> {
final Enumerator<E> enumerator;
public EnumeratorAdapter(Enumerator<E> enumerator) {
this.enumerator = enumerator;
}
@Override
public List<E> get(int index) {
return this.enumerator.get(index);
}
@Override
public int size() {
long ret = this.enumerator.size();
if (ret > Integer.MAX_VALUE) {
throw new IllegalStateException();
}
return (int) ret;
}
}
This example will print "25", same as the previous one. Of course you can use the same technique for combinations and repeated combinations, too.
- 1 "combinatorics" library, Christian Trimble
- 2 "Factorial number system", Wikipedia.org
- 3 "Combinatorial number system", Wikipedia.org
- 4 "Permutation", Wikipedia.org
- 5 "Combination", Wikipedia.org
- 6 "mapreduce - Simple Java Map/Reduce framework"
- 7 "Java SE downloads", Oracle
- 8 "Maven - Download Apache Maven"
Copyright 2013 Hiroshi Ukai.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this work except in compliance with the License. You may obtain a copy of the License in the LICENSE file, or at:
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.