Configuration
Each configuration parameter is described with name, description and a default value if any.
Field delimiter in input file ,
Field delimiter in output file ,
Number of reducers. Should be set based on data size and cluster size 1
Set to true if debug log is desired FALSE
JSON metadata file HDFS path
If set to true additional data is output by ImplicitRatingEstimator MR FALSE
Bucket count for hash bucketing of data. Should increase with increasing data size 10
Multiplier for hashed bucket values. Generally, there is no need to set this 1000
Similarity algorithm. Choices are: cosine, jaccard, semantic cosine
Set to true if data type is boolean FALSE
Set to true if data type is semantic FALSE
Set to true if count is included in vector TRUE
Weight for source non-matching terms for Jaccard similarity
Weight for target non-matching terms for Jaccard similarity
Scaling for distance value 1000
Set to true if output should have correlation instead of distance FALSE
Min length of intersection between the two vectors to be considered for distance calculation 2
Ordinal value of field to be used for partitioning. Used when data is partitioned before distance calculation -1
Java class name for semantic matching
Top match count for semantic matching 5
List of parameter names for semantic matching
Scale for semantic matching 10
Semantic model metadata file path
Set to true if correlation algorithm is linear. Set to false if additive TRUE
Rating scale 100
User rating file sub field delimiter :
Rating data file prefix rating
Rating statistics file prefix stat
Scale for correlation values 1000
Correlation can be modified by raising to a power greater than 1 to give more weight to high correlation values 1
Rating time window threshold for time sensitive rating matrix -1
Only input rating above this value will be used -1
Only item correlation values above this will be used -1
Set to true if weighted average is to be taken based on correlation length FALSE
Set to true if input rating is to be weighted by rating std deviation FALSE
Set to true if rating aggregation is by averaging TRUE
Business goal file name prefix biz
Comma separated weights for all business goals
Max value for sum of business goal weights 70
Comma separated list of minimum thresholds for business goals
Bucket count for hash bucketing of data. Should increase with increasing data size 1000
Scaling for distance value 1000
Sub field delimiter for time window start and end ::
Comma separated field ordinals for faceted fields. If set, only these fields are considered
If set to true fields not participating in distance calculations are included in the output FALSE
If distance is above the defined threshold it's excluded from the output dist.scale
If set to true id is the first field in the output TRUE
If set to true only entities belonging to different sets are included in distance calculation FALSE
The number of characters at the beginning of the ID field that represent the set ID 0
If set to true, record ID is auto generated as the first field (false)
If set to true, the whole record is included in the output (false)
If true all text fields are consolidated into one text field FALSE
Comma separated list of text field ordinals
Schema file HDFS path
Bucket count for hash bucketing of data. Should increase with increasing data size 10
Multiplier for hashed bucket values. Generally, there is no need to set this 1000
Sub field delimiter for rating :
Scale for rating 100
Scale for correlation values 1000
Minimum number of common user ratings between 2 rating vectors 3
Country for structured text USA
Language for structured text en
HDFS path for schema file
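
Several of the distance parameters above work together: the source and target non-matching term weights for Jaccard similarity, the minimum intersection length (default 2), and the distance scale (default 1000). The following is a minimal sketch of one plausible way they could combine, not the project's actual code. The function and argument names are hypothetical, the Tversky-style formula is an assumption about how the two weights enter the similarity, and the weight default of 1.0 (which reduces to plain Jaccard) is chosen only for illustration because no default is listed.

```python
def weighted_jaccard_distance(src_terms, trg_terms,
                              src_weight=1.0, trg_weight=1.0,
                              scale=1000, min_intersection=2):
    """Return a scaled integer distance, or None if the pair is skipped.

    All names and the formula are illustrative assumptions:
      similarity = matched / (matched + src_weight * src_only
                                      + trg_weight * trg_only)
      distance   = round((1 - similarity) * scale)
    """
    src, trg = set(src_terms), set(trg_terms)
    matched = len(src & trg)

    # Pairs with too few common terms are not considered at all.
    if matched < min_intersection:
        return None

    src_only = len(src - trg)   # terms present only in the source vector
    trg_only = len(trg - src)   # terms present only in the target vector
    denominator = matched + src_weight * src_only + trg_weight * trg_only
    if denominator == 0:
        return None             # both vectors empty; nothing to compare

    similarity = matched / denominator
    # Scale to an integer so the value can be emitted as a plain field.
    return int(round((1.0 - similarity) * scale))


if __name__ == "__main__":
    a = ["red", "green", "blue"]
    b = ["green", "blue", "black"]
    print(weighted_jaccard_distance(a, b))            # plain Jaccard weights
    print(weighted_jaccard_distance(a, b, 0.5, 0.5))  # non-matches count half
```

Under this reading, lowering either weight makes non-matching terms on that side count less, so the resulting distance shrinks for the same pair of vectors.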