Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation updates #1

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 10 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -62,8 +62,9 @@ $ scripts/grid start zookeeper
You can run directly within the project using maven:

```
$ mvn package
$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args="--runner=SamzaRunner" -P samza-runner
-Dexec.args="--inputFile=pom.xml --output=counts --runner=SamzaRunner" -P samza-runner
```

### Packaging Your Application
Expand All @@ -77,15 +78,15 @@ After packaging, we deploy and explode the tgz in the deploy folder:

### Standalone Cluster with Zookeeper
You can use the `run-beam-standalone.sh` script included in this repo to run an example
in standalone mode. The config file is provided as `config/standalone.properties`. Note by
default we create one single input partition for the whole input. To set the number of
in standalone mode. The config file is provided as `config/standalone.properties`. Note that by
default we create a single input partition for the whole input. To set the number of
partitions, you can add "--maxSourceParallelism=" argument. For example, "--maxSourceParallelism=2"
will create two partitions of the input file, based on size.

```
$ deploy/examples/bin/run-beam-standalone.sh org.apache.beam.examples.WordCount \
--configFilePath=$PWD/deploy/examples/config/standalone.properties \
--inputFile=/Users/xiliu/opensource/samza-beam-examples/pom.xml --output=word-counts.txt \
--inputFile=$PWD/pom.xml --output=word-counts.txt \
--maxSourceParallelism=2
```

Expand All @@ -101,13 +102,15 @@ $ deploy/examples/bin/run-beam-standalone.sh org.apache.beam.examples.KafkaWordC

### Yarn Cluster
Similar to running standalone, we can use the `run-beam-yarn.sh` to run the examples
in Yarn cluster. The config file is provided as `config/yarn.properties`. To run the
WordCount example in yarn:
in Yarn cluster. The config file is provided as `config/yarn.properties`.
Note that for yarn, we don't need to wait after submitting the job, so there is no need for `waitUntilFinish()`.
Please change `p.run().waitUtilFinish()` to `p.run()` in the `WordCount.java` class.
To run the WordCount example in yarn:

```
$ deploy/examples/bin/run-beam-yarn.sh org.apache.beam.examples.WordCount \
--configFilePath=$PWD/deploy/examples/config/yarn.properties \
--inputFile=/Users/xiliu/opensource/samza-beam-examples/pom.xml \
--inputFile=$PWD/pom.xml \
--output=/tmp/word-counts.txt --maxSourceParallelism=2
```

Expand Down
13 changes: 8 additions & 5 deletions src/main/java/org/apache/beam/examples/KafkaWordCount.java
Original file line number Diff line number Diff line change
Expand Up @@ -67,13 +67,16 @@
* <p>To run in standalone with zookeeper:
* (large parallelism will enforce each partition in a task)
* <pre>{@code
* $ deploy/examples/bin/run-beam-standalone.sh org.apache.beam.examples.KafkaWordCount --configFilePath=$PWD/deploy/examples/config/standalone.properties --maxSourceParallelism=1024
* $ ./deploy/examples/bin/run-beam-standalone.sh org.apache.beam.examples.KafkaWordCount --configFilePath=$PWD/deploy/examples/config/standalone.properties --maxSourceParallelism=1024
* }</pre>
*
* <p>To run in yarn:
* For yarn, we don't need to wait after submitting the job, so there is no need for
* waitUntilFinish(). Please change p.run().waitUtilFinish() to p.run().
*
* (large parallelism will enforce each partition in a task)
* <pre>{@code
* $ deploy/examples/bin/run-beam-yarn.sh org.apache.beam.examples.KafkaWordCount --configFilePath=$PWD/deploy/examples/config/yarn.properties --maxSourceParallelism=1024
* $ ./deploy/examples/bin/run-beam-yarn.sh org.apache.beam.examples.KafkaWordCount --configFilePath=$PWD/deploy/examples/config/yarn.properties --maxSourceParallelism=1024
* }</pre>
*
* <p>To produce some test data:
Expand Down Expand Up @@ -127,9 +130,9 @@ public static void main(String[] args) {
.withKeySerializer(StringSerializer.class)
.withValueSerializer(StringSerializer.class));

//For yarn, we don't need to wait after submitting the job,
//so there is no need for waitUntilFinish(). Please use
//p.run()
// For yarn, we don't need to wait after submitting the job,
// so there is no need for waitUntilFinish(). Please use
// p.run()
p.run().waitUntilFinish();
}
}
19 changes: 11 additions & 8 deletions src/main/java/org/apache/beam/examples/WordCount.java
Original file line number Diff line number Diff line change
Expand Up @@ -70,18 +70,21 @@
* <p>To execute this example in standalone with zookeeper:
* (split the input by 2)
* <pre>{@code
* $ deploy/examples/bin/run-beam-standalone.sh org.apache.beam.examples.WordCount \
* $ ./deploy/examples/bin/run-beam-standalone.sh org.apache.beam.examples.WordCount \
* --configFilePath=$PWD/deploy/examples/config/standalone.properties \
* --inputFile=/Users/xiliu/opensource/samza-beam-examples/pom.xml --output=word-counts.txt \
* --inputFile=$PWD/pom.xml --output=word-counts.txt \
* --maxSourceParallelism=2
* }</pre>
*
* <p>To execute this example in yarn:
* For yarn, we don't need to wait after submitting the job, so there is no need for
* waitUntilFinish(). Please change p.run().waitUtilFinish() to p.run().
*
* (split the input by 2)
* <pre>{@code
* $ deploy/examples/bin/run-beam-yarn.sh org.apache.beam.examples.WordCount \
* $ ./deploy/examples/bin/run-beam-yarn.sh org.apache.beam.examples.WordCount \
* --configFilePath=$PWD/deploy/examples/config/yarn.properties \
* --inputFile=/Users/xiliu/opensource/samza-beam-examples/pom.xml \
* --inputFile=$PWD/pom.xml \
* --output=/tmp/word-counts.txt --maxSourceParallelism=2
* }</pre>
*/
Expand Down Expand Up @@ -187,10 +190,10 @@ static void runWordCount(WordCountOptions options) {
.apply(MapElements.via(new FormatAsTextFn()))
.apply("WriteCounts", TextIO.write().to(options.getOutput()).withoutSharding());

//For yarn, we don't need to wait after submitting the job,
//so there is no need for waitUntilFinish(). Please use
//p.run()
p.run().waitUntilFinish();
// For yarn, we don't need to wait after submitting the job,
// so there is no need for waitUntilFinish(). Please use
// p.run()
p.run();
}

public static void main(String[] args) {
Expand Down