<tool id="neurodoc" name="K-means clustering" version="0.9.2">
<requirements>
<container type="docker">visatm/neurodoc</container>
</requirements>
<description>on a “document × term” datafile</description>
<command><![CDATA[
neurodoc -i "$input" -o "$output" -c $clusters --json
#if $metadata
-m "$metadata"
#end if
#if $frequency != 2
-f $frequency
#end if
#if $term != 1.0
-t $term
#end if
#if $document != 0.3
-d $document
#end if
]]></command>
<inputs>
<param name="input" type="data" format="tabular" label="Input file “document × term”" />
<param name="metadata" type="data" format="tabular" optional="true" label="Metadata file" />
<param name="clusters" type="integer" value="" min="2" max="200" label="Number of clusters" />
<param name="frequency" type="integer" value="2" min="1" label="Minimum frequency of terms (= nb. of documents)" />
<param name="term" type="float" value="1.0" min="0.0" label="Term threshold" />
<param name="document" type="float" value="0.3" min="0.0" label="Document threshold" />
</inputs>
<outputs>
<data format="json" name="output" />
</outputs>
<tests>
<test>
<param name="input" value="ndocDocsMots.txt" />
<param name="clusters" value="10" />
<output name="output" file="ndocClusters.xml" />
</test>
</tests>
<help><![CDATA[
This clustering tool applies the **axial K-means** algorithm, as well as a **PCA**, to a *“document × term”* datafile.

.. class:: warningmark

This UTF-8-encoded datafile is made of 2 tab-separated columns and contains on each line a document identifier and a term “indexing” that document.

.. class:: warningmark

There are as many lines as there are *“document, term”* pairs.

-----

**Options**

The programme has several arguments, some **mandatory**, some *optional*.

+ *“document × term”* **datafile name**
+ **number of expected clusters**
+ *metadata file name*
+ *minimum frequency of terms*, i.e. the minimum number of documents in which a term must be found (by default: 2)
+ *term threshold*, i.e. the minimum weight a term needs for inclusion in a cluster (by default: 1.0)
+ *document threshold*, i.e. the minimum weight a document needs for inclusion in a cluster (by default: 0.3)

.. class:: infomark

It is better not to change the last 2 options.

-----

**Input data**

Example:

::

  GS2_0000067	abrupt transition
  GS2_0000067	apparent contrast
  GS2_0000067	arc collision
  ...
  GS2_0000067	wide variability
  GS2_0000592	anomalous change
  GS2_0000592	atomic oxygen
  ...

-----

**Metadata**

The metadata file is composed of tab-separated columns. The first line is reserved for the header containing the field names. Only the fields appearing in boldface in the following list will be used:

+ **Filename**
+ **IstexId** or **Istex Id**
+ **ARK**
+ **DOI**
+ **Title**
+ **Source**
+ **PublicationDate** or **Publication date**
+ **Author** or **Authors**

Please note that the field names are case-insensitive. “**IstexId**”, “**istexId**” and “**istexid**” are accepted, as well as “**ISTEXID**”.
]]></help>
</tool>