pranab edited this page Oct 29, 2014 · 3 revisions

Each configuration parameter is described with name, description and a default value if any.

Hadoop Common configuration

field.delim.regex

Field delimiter in input file ,

field.delim

Field delimiter in output file ,

num.reducer

Number of reducers. Should be set based on data size and cluster size 1

debug.on

Set to true if debug log is desired FALSE
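The two delimiter settings differ in kind: field.delim.regex is a regular expression used to split input records, while field.delim is a literal string written between output fields. The sketch below illustrates that difference in Python. It is not the project's Java code; the function name `reformat_line` and the sample records are hypothetical.

```python
import re

def reformat_line(line, field_delim_regex=",", field_delim=","):
    """Split a record with the input regex, then join it with the output delimiter."""
    # field.delim.regex is a pattern, so it can match variable-width separators.
    fields = re.split(field_delim_regex, line)
    # field.delim is a plain string placed between the output fields.
    return field_delim.join(fields)

# Defaults: comma in, comma out.
print(reformat_line("u1,i7,5"))
# Whitespace-separated input rewritten as comma-separated output.
print(reformat_line("u1 \t i7   5", field_delim_regex=r"\s+", field_delim=","))
```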

ImplicitRatingEstimator Map Reduce

rating.mapper.config.path

JSON metadata file HDFS path

rating.estimator.output.detail

If set to true additional data is output by ImplicitRatingEstimator MR FALSE

ItemDynamicAttributeSimilarity Map Reduce

bucket.count

Bucket count for hash bucketing of data. Should increase with increasing data size 10

hash.pair.multiplier

Multiplier for hashed bucket values. Generally, there is no need to set this 1000
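This page does not spell out how bucket.count and hash.pair.multiplier interact. One plausible reading, shown here purely as an assumption, is that each ID is hashed into one of bucket.count buckets and a pair of buckets is packed into a single integer key using the multiplier. The function names and the CRC32 hash are hypothetical choices for this Python sketch, not the job's actual implementation.

```python
import zlib

def bucket_of(item_id, bucket_count=10):
    """Map an ID to a bucket in [0, bucket_count) using a stable hash (assumed: CRC32)."""
    return zlib.crc32(item_id.encode("utf-8")) % bucket_count

def pair_key(bucket_a, bucket_b, hash_pair_multiplier=1000):
    """Pack an unordered pair of buckets into one integer key (assumed encoding)."""
    low, high = sorted((bucket_a, bucket_b))
    return low * hash_pair_multiplier + high

def split_pair_key(key, hash_pair_multiplier=1000):
    """Recover the two buckets from a packed key."""
    return divmod(key, hash_pair_multiplier)

key = pair_key(7, 3)
print(key, split_pair_key(key))
```

Under this assumed encoding the key is unambiguous only while the multiplier is larger than the highest bucket number, which would explain why the default multiplier rarely needs changing.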

similarity.algorithm

Similarity algorithm. Choices are: cosine, jaccard, semantic cosine

vec.type.boolean

Set to true if data type is boolean FALSE

vec.type.semantic

Set to true if data type is semantic FALSE

vec.count.included

Set to true if count is included in vector TRUE

jaccard.srcNonMatchingTermWeight

Weight for source non matching terms for Jaccard similarity

jaccard.trgNonMatchingTermWeight

Weight for target non matching terms for Jaccard similarity
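The two Jaccard weights suggest a weighted form of the Jaccard coefficient, in which terms found only in the source or only in the target count for more or less than matching terms. The Python sketch below shows one such form (a Tversky-style ratio). The formula, the function name `weighted_jaccard`, and the default weights of 1.0 are assumptions for illustration; no defaults are listed for these parameters.

```python
def weighted_jaccard(src_terms, trg_terms,
                     src_non_matching_weight=1.0, trg_non_matching_weight=1.0):
    """Similarity in [0, 1]; with both weights at 1.0 this is the plain Jaccard coefficient."""
    src, trg = set(src_terms), set(trg_terms)
    matching = len(src & trg)   # terms present in both vectors
    src_only = len(src - trg)   # source terms with no match in the target
    trg_only = len(trg - src)   # target terms with no match in the source
    denom = (matching
             + src_non_matching_weight * src_only
             + trg_non_matching_weight * trg_only)
    return matching / denom if denom else 0.0

src = ["a", "b", "c"]
trg = ["b", "c", "d", "e"]
print(weighted_jaccard(src, trg))                                # plain Jaccard
print(weighted_jaccard(src, trg, trg_non_matching_weight=0.5))   # unmatched target terms count half
```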

distance.scale

Scaling for distance value 1000

output.correlation

Set to true if output should have correlation instead of distance FALSE

min.intersection.length

Min length of intersection between the two vectors to be considered for distance calculation 2

paritioning.field.ordinal

Ordinal value of field to be used for partitioning. Used when data is partitioned before distance calculation -1

semantic.matcher.class

Java class name for semantic matching

semantic.top.match.count

Top match count for semantic matching 5

semantic.matcher.params

List of parameter names for semantic matching

semantic.match.scale

Scale for semantic matching 10

semantic.rdf.modelFilePath

Semantic model metadata file path

UtilityPredictor Map Reduce

correlation.linear

Set to true if correlation algorithm is linear. Set to false if additive TRUE

max.rating

Rating scale 100

sub.field.delim

User rating file sub field delimiter :

rating.file.prefix

Rating data file prefix rating

rating.stat.file.prefix

Rating statistics file prefix stat

correlation.scale

Scale for correlation values 1000

correlation.modifier

Correlation can be modified by raising to a power greater than 1 to give more weight to high correlation values 1
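To see why an exponent above 1 favours high correlations, the Python sketch below normalizes a scaled correlation to the 0..1 range, raises it to the power given by correlation.modifier, and scales it back. The order of operations and the function name `modify_correlation` are assumptions; the sketch also assumes non-negative correlation values.

```python
def modify_correlation(correlation, correlation_scale=1000, correlation_modifier=1.0):
    """Raise a scaled, non-negative correlation to a power and rescale it to an integer."""
    normalized = correlation / correlation_scale        # bring into the 0..1 range
    modified = normalized ** correlation_modifier       # powers above 1 shrink low values more
    return round(modified * correlation_scale)

# With a modifier of 2, a correlation of 900 keeps most of its value
# while a correlation of 500 is halved.
print(modify_correlation(900, correlation_modifier=2))
print(modify_correlation(500, correlation_modifier=2))
```

With the default modifier of 1 the value is returned unchanged.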

rating.time.window.hour

Rating time window threshold for time sensitive rating matrix -1

min.input.rating

Only input rating above this value will be used -1

min.correlation

Only item correlation value above this will be used -1

UtilityAggregator Map Reduce

corr.length.weighted.average

Set to true if weighted average is to be taken based on correlation length FALSE

input.rating.stdDev.weighted.average

Set to true if input rating is to be weighted by rating std deviation FALSE

rating.aggregator.average

Set to true if rating aggregation is by averaging TRUE

BusinessGoalInjector Map Reduce

biz.goal.file.prefix

Business goal file name prefix biz

biz.goal.weights

Comma separated weights for all business goals

max.biz.goal.weight

Max value for sum of business goal weights 70

biz.goal.min.threshold

Comma separated list of minimum thresholds for business goals
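Because biz.goal.weights is a comma separated list and max.biz.goal.weight caps the sum, a quick check before submitting the job can catch a bad setting. The Python sketch below is such a check, not the job's own validation. The function name `parse_goal_weights`, the integer weights, and the choice to raise an error are all assumptions.

```python
def parse_goal_weights(biz_goal_weights, max_biz_goal_weight=70):
    """Parse comma separated weights and verify that their sum stays within the cap."""
    # Assumed: weights are whole numbers; surrounding spaces are tolerated.
    weights = [int(w.strip()) for w in biz_goal_weights.split(",")]
    total = sum(weights)
    if total > max_biz_goal_weight:
        raise ValueError(
            f"sum of business goal weights {total} exceeds {max_biz_goal_weight}")
    return weights

print(parse_goal_weights("30, 20"))
```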

SameTypeSimilarity Map Reduce

bucket.count

Bucket count for hash bucketing of data. Should increase with increasing data size 1000

distance.scale

Scaling for distance value 1000

sub.field.delim.regex

Sub field delimiter for time window start and end ::

faceted.field.ordinal

Comma separated field ordinals for faceted fields. If set, only these fields are considered

include.passive.fields

If set to true fields not participating in distance calculations are included in the output FALSE

dist.threshold

If distance is above the defined threshold it's excluded from the output dist.scale

output.id.first

If set to true id is the first field in the output TRUE

inter.set.matching

If set to true only entities belonging to different sets are included in distance calculation FALSE

set.ID.size

The number of characters at the beginning of the ID field that represent the set ID 0
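Taken together, inter.set.matching and set.ID.size describe a filter on record pairs: the set ID is a prefix of the record ID, and with inter-set matching on, only pairs from different sets are compared. The Python sketch below expresses that rule. The function names and sample IDs are hypothetical, and the actual job may apply the rule at a different stage.

```python
def set_id(record_id, set_id_size):
    """The set ID is the first set_id_size characters of the record ID."""
    return record_id[:set_id_size]

def should_compare(id_a, id_b, inter_set_matching=False, set_id_size=0):
    """Decide whether a pair of records takes part in distance calculation."""
    if not inter_set_matching:
        return True  # default: every pair is compared
    # Only pairs whose IDs carry different set prefixes are compared.
    return set_id(id_a, set_id_size) != set_id(id_b, set_id_size)

print(should_compare("A001", "B002", inter_set_matching=True, set_id_size=1))
print(should_compare("A001", "A002", inter_set_matching=True, set_id_size=1))
```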

auto.generate.id

If set to true, record ID is auto generated as the first field (false)

output.record

If set to true, the whole record is included in the output (false)

TextAnalyzer Map Reduce

consolidate.field

If true all text fields are consolidated into one text field FALSE

text.field.ordinals

Comma separated list of text field ordinals

raw.schema.file.path

Schema file HDFS path

PearsonCorrelator Map Reduce

bucket.count

Bucket count for hash bucketing of data. Should increase with increasing data size 10

hash.pair.multiplier

Multiplier for hashed bucket values. Generally, there is no need to set this 1000

subfield.delim

Sub field delimiter for rating :

rating.scale

Scale for rating 100

correlation.scale

Scale for correlation values 1000

min.rating.intersection.set

Minimum number of common user ratings between 2 rating vectors 3
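The effect of min.rating.intersection.set and correlation.scale can be sketched with a standard Pearson correlation computed over the users two rating vectors share. The Python code below is a minimal illustration, not the MapReduce job. The function name `pearson_scaled`, the dict-based input, and returning `None` for skipped pairs are assumptions.

```python
from math import sqrt

def pearson_scaled(ratings_a, ratings_b,
                   min_rating_intersection_set=3, correlation_scale=1000):
    """Pearson correlation over common users, scaled to an integer.

    ratings_a and ratings_b map user ID to rating. Returns None when too few
    users are shared or when either vector has no variance over them.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < min_rating_intersection_set:
        return None
    xs = [ratings_a[u] for u in common]
    ys = [ratings_b[u] for u in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return round(cov / sqrt(var_x * var_y) * correlation_scale)

item_a = {"u1": 20, "u2": 40, "u3": 60, "u4": 80}
item_b = {"u1": 10, "u2": 20, "u3": 30}
print(pearson_scaled(item_a, item_b))
```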

StructuredTextAnalyzer Map Reduce

text.country

Country for structured text USA

text.language

Language for structured text en

raw.schema.file.path

HDFS path for schema file
