
Commit 022f98b

- Updated release number
- Updated documentation for release 1.2.4
- Updated missing format params

1 parent 984ccf8

14 files changed: +316 −89 lines changed


CHANGELOG

Lines changed: 53 additions & 0 deletions
@@ -1,3 +1,56 @@
+========
+1.2.4
+========
+---------------
+New features
+---------------
+* https://github.com/JohnSnowLabs/spark-nlp/commit/c17ddac7a5a9e775cddc18d672e80e60f0040e38
+ResourceHelper now allows input files to be read as a Spark Dataset, implicitly enabling HDFS paths and allowing larger annotator input files. Set 'TXTDS' as the input format Param to have annotators read this way (a usage sketch follows). Allowed in: Lemmatizer, EntityExtractor, RegexMatcher, Sentiment Analysis models, Spell Checker and Dependency Parser.
+
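A minimal sketch of the new format Param, assuming the 1.2.x Scala API; setDictionary and setLemmaFormat appear in this release's docs, while the HDFS path is illustrative:

```scala
import com.johnsnowlabs.nlp.annotators.Lemmatizer

// 'TXTDS' makes ResourceHelper read the dictionary as a Spark Dataset
// instead of a plain local file, so HDFS paths work.
val lemmatizer = new Lemmatizer()
  .setInputCols(Array("token"))
  .setOutputCol("lemma")
  .setDictionary("hdfs:///data/lemmas.txt") // illustrative path
  .setLemmaFormat("TXTDS")
```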
+---------------
+Enhancements and progress
+---------------
+* https://github.com/JohnSnowLabs/spark-nlp/commit/4920e5ce394b25937969cc4cab1d81172be722a3
+CRF NER benchmarking progress
+* https://github.com/JohnSnowLabs/spark-nlp/pull/64
+EntityExtractor refactored. This annotator uses an input file containing a list of entities to look for inside the target text. It has been refactored for better usability and, above all, speed, by using a Trie search algorithm; a sketch of the idea follows. Proper examples are included in the Python notebooks.
+
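A generic sketch of the technique, not the annotator's actual code: each dictionary phrase is inserted into a trie token by token, so matching at a given text position walks shared prefixes once instead of re-scanning every phrase in the list.

```scala
// Token-level trie for phrase matching; illustrative only.
import scala.collection.mutable

class TrieNode {
  val children = mutable.Map.empty[String, TrieNode]
  var isPhraseEnd = false
}

class PhraseTrie {
  private val root = new TrieNode

  def insert(phrase: Seq[String]): Unit = {
    var node = root
    for (token <- phrase)
      node = node.children.getOrElseUpdate(token, new TrieNode)
    node.isPhraseEnd = true
  }

  // Returns the end index (exclusive) of the longest phrase
  // starting at `start`, or -1 if none matches.
  def matchAt(tokens: IndexedSeq[String], start: Int): Int = {
    var node = root
    var end = -1
    var i = start
    while (i < tokens.length && node.children.contains(tokens(i))) {
      node = node.children(tokens(i))
      if (node.isPhraseEnd) end = i + 1
      i += 1
    }
    end
  }
}
```

Scanning the target text is then proportional to text length times trie depth, rather than text length times the number of phrases.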
+---------------
+Bug fixes
+---------------
+* Issue https://github.com/JohnSnowLabs/spark-nlp/issues/41 <> https://github.com/JohnSnowLabs/spark-nlp/commit/d3b9086e834233f3281621d7c82e32195479fc82
+Fixed default resources not being loaded properly when using the library through spark-packages. Improved input reading from resources and folder resources, falling back to disk, with better error handling.
+* https://github.com/JohnSnowLabs/spark-nlp/commit/08405858c6186e6c3e8b668233e30df12fa50374
+Corrected param names in DocumentAssembler
+* Issue https://github.com/JohnSnowLabs/spark-nlp/issues/58 <> https://github.com/JohnSnowLabs/spark-nlp/commit/5a533952cdacf67970c5a8042340c8a4c9416b13
+Deleted a leftover deprecated function which was misleading.
+* https://github.com/JohnSnowLabs/spark-nlp/commit/c02591bd683db3f615150d7b1d121ffe5d9e4535
+Added filtering to ensure no empty sentences reach unnormalized Vivekn Sentiment Analysis
+
+---------------
+Documentation and examples
+---------------
+* https://github.com/JohnSnowLabs/spark-nlp/commit/b81e95ce37ed3c4bd7b05e9f9c7b63b31d57e660
+Added additional resources to the FAQ page.
+* https://github.com/JohnSnowLabs/spark-nlp/commit/0c3f43c0d3e210f3940f7266fe84426900a6294e
+Added Spark Summit example notebook with a full Pipeline use case
+* Issue https://github.com/JohnSnowLabs/spark-nlp/issues/53 <> https://github.com/JohnSnowLabs/spark-nlp/commit/20efe4a3a5ffbceedac7bf775466b7a8cde5044f
+Fixed Scala and Python documentation mistakes
+* https://github.com/JohnSnowLabs/spark-nlp/commit/782eb8dce171b69a615887b3defaf8b729b735f2
+Fixed typos
+
+---------------
+Other
+---------------
+* https://github.com/JohnSnowLabs/spark-nlp/commit/91d8acb1f0f4840dad86db3319d0b062bd63b8c6
+Removed Regex NER due to slowness and little use. CRF NER replaces it.
+
 ========
 1.2.3
 ========

README.md

Lines changed: 6 additions & 6 deletions
@@ -13,15 +13,15 @@ This library has been uploaded to the spark-packages repository https://spark-pa
 To use the most recent version just add the `--packages JohnSnowLabs:spark-nlp:1.0.0` to your spark command

 ```sh
-spark-shell --packages JohnSnowLabs:spark-nlp:1.2.3
+spark-shell --packages JohnSnowLabs:spark-nlp:1.2.4
 ```

 ```sh
-pyspark --packages JohnSnowLabs:spark-nlp:1.2.3
+pyspark --packages JohnSnowLabs:spark-nlp:1.2.4
 ```

 ```sh
-spark-submit --packages JohnSnowLabs:spark-nlp:1.2.3
+spark-submit --packages JohnSnowLabs:spark-nlp:1.2.4
 ```

 If you want to use an old version, check the spark-packages website to see all the releases.
@@ -36,19 +36,19 @@ Our package is deployed to maven central. In order to add this package as a depe
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
     <artifactId>spark-nlp_2.11</artifactId>
-    <version>1.2.3</version>
+    <version>1.2.4</version>
 </dependency>
 ```

 #### SBT
 ```sbtshell
-libraryDependencies += "com.johnsnowlabs.nlp" % "spark-nlp_2.11" % "1.2.3"
+libraryDependencies += "com.johnsnowlabs.nlp" % "spark-nlp_2.11" % "1.2.4"
 ```

 If you are using `scala 2.11`

 ```sbtshell
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "1.2.3"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "1.2.4"
 ```

 ## Using the jar manually

build.sbt

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ name := "spark-nlp"

 organization := "com.johnsnowlabs.nlp"

-version := "1.2.3"
+version := "1.2.4"

 scalaVersion := scalaVer
docs/components.html

Lines changed: 77 additions & 5 deletions
@@ -336,6 +336,21 @@ <h4 id="Lemmatizer" class="section-block"> 5. Lemmatizer: Lemmas</h4>
   setDictionary(path): Path to file containing multiple key to value
   dictionary, or key,value lemma dictionary. Default: Not provided
 </li>
+<li>
+  setLemmaFormat(format): TXT for txt files or TXTDS for text files read as dataset (allows HDFS).
+  Default: Looks up path in configuration
+</li>
+<li>
+  setLemmaKeySep(sep): Separator between keys and multiple values.
+  Default: "->" or looks up path in configuration
+</li>
+<li>
+  setLemmaValSep(sep): Separator among values.
+  Default: "\t" or looks up path in configuration
+</li>
 </ul>
 <b>Example:</b><br>
 </p>
@@ -361,6 +376,21 @@ <h4 id="Lemmatizer" class="section-block"> 5. Lemmatizer: Lemmas</h4>
   setDictionary(path): Path to file containing multiple key to value
   dictionary, or key,value lemma dictionary. Default: Not provided
 </li>
+<li>
+  setLemmaFormat(format): TXT for txt files or TXTDS for text files read as dataset (allows HDFS).
+  Default: Looks up path in configuration
+</li>
+<li>
+  setLemmaKeySep(sep): Separator between keys and multiple values.
+  Default: "->" or looks up path in configuration
+</li>
+<li>
+  setLemmaValSep(sep): Separator among values.
+  Default: "\t" or looks up path in configuration
+</li>
 </ul>
 <b>Example:</b><br>
 </p>
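A hedged sketch combining the three new Params with setDictionary, per the list above; the dictionary path and its line layout are assumptions:

```scala
import com.johnsnowlabs.nlp.annotators.Lemmatizer

// Assumed dictionary layout, one lemma per line, e.g.:
//   see -> saw<TAB>seen<TAB>sees
val lemmatizer = new Lemmatizer()
  .setInputCols(Array("token"))
  .setOutputCol("lemma")
  .setDictionary("lemmas.txt") // illustrative path
  .setLemmaFormat("TXT")
  .setLemmaKeySep("->")
  .setLemmaValSep("\t")
```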
@@ -396,10 +426,20 @@ <h4 id="RegexMatcher" class="section-block"> 6. RegexMatcher: Rule matching</h4>
   MATCH_FIRST|MATCH_ALL|MATCH_COMPLETE
 </li>
 <li>
-  setRules(path): Path to file containing a set of regex,key pairs.
+  setRulesPath(path): Path to file containing a set of regex,key pairs.
   Default: Looks up path in configuration
 </li>
+<li>
+  setRulesFormat(format): TXT for txt files or TXTDS for text files read as dataset (allows HDFS).
+  Default: TXT or looks up path in configuration
+</li>
+<li>
+  setRulesSeparator(sep): Separator for the rules file.
+  Default: "," or looks up path in configuration
+</li>
 </ul>
 <b>Example:</b><br>
 </p>
@@ -424,10 +464,20 @@ <h4 id="RegexMatcher" class="section-block"> 6. RegexMatcher: Rule matching</h4>
   MATCH_FIRST|MATCH_ALL|MATCH_COMPLETE
 </li>
 <li>
-  setRules(path): Path to file containing a set of regex,key pairs.
+  setRulesPath(path): Path to file containing a set of regex,key pairs.
   Default: Looks up path in configuration
 </li>
+<li>
+  setRulesFormat(format): TXT for txt files or TXTDS for text files read as dataset (allows HDFS).
+  Default: TXT or looks up path in configuration
+</li>
+<li>
+  setRulesSeparator(sep): Separator for the rules file.
+  Default: "," or looks up path in configuration
+</li>
 </ul>
 <b>Example:</b><br>
 </p>
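A hedged sketch of the renamed and new Params, assuming the Scala API above; the rules file content, the path, and the input column are illustrative:

```scala
import com.johnsnowlabs.nlp.annotators.RegexMatcher

// Assumed rules file layout, one "<regex>,<identifier>" per line:
//   \d{4}/\d{2}/\d{2},date
val regexMatcher = new RegexMatcher()
  .setInputCols(Array("document")) // assumed input column
  .setOutputCol("regex")
  .setStrategy("MATCH_ALL")
  .setRulesPath("rules.csv")       // illustrative path
  .setRulesFormat("TXT")
  .setRulesSeparator(",")
```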
@@ -467,10 +517,15 @@ <h4 id="EntityExtractor" class="section-block"> 7. EntityExtractor: Phrase match
   boundaries for better precision
 </li>
 <li>
-  setEntities(path): Provides a file with phrases to match. Default:
+  setEntitiesPath(path): Provides a file with phrases to match. Default:
   Looks up path in configuration
 </li>
+<li>
+  setEntitiesFormat(format): TXT for txt files or TXTDS for text files read as dataset (allows HDFS).
+  Default: TXT or looks up path in configuration
+</li>
 </ul>
 <b>Example:</b><br>
 </p>
@@ -498,10 +553,15 @@ <h4 id="EntityExtractor" class="section-block"> 7. EntityExtractor: Phrase match
   boundaries for better precision
 </li>
 <li>
-  setEntities(path): Provides a file with phrases to match. Default:
+  setEntitiesPath(path): Provides a file with phrases to match. Default:
   Looks up path in configuration
 </li>
+<li>
+  setEntitiesFormat(format): TXT for txt files or TXTDS for text files read as dataset (allows HDFS).
+  Default: TXT or looks up path in configuration
+</li>
 </ul>
 <b>Example:</b><br>
 </p>
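Tying this to the EntityExtractor refactor in the CHANGELOG above, a hedged sketch using the renamed setters; the path, format, and input column are illustrative:

```scala
import com.johnsnowlabs.nlp.annotators.EntityExtractor

// Assumed phrase list: one entity phrase per line, e.g. "John Snow Labs".
val entityExtractor = new EntityExtractor()
  .setInputCols(Array("token"))                 // assumed input column
  .setOutputCol("entity")
  .setEntitiesPath("hdfs:///data/entities.txt") // illustrative path
  .setEntitiesFormat("TXTDS")                   // dataset read enables HDFS
```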
@@ -710,6 +770,12 @@ <h4 id="SentimentDetector" class="section-block"> 11. SentimentDetector: Sentime
 <li>
   setDictPath(path)
 </li>
+<li>
+  setDictFormat(format)
+</li>
+<li>
+  setDictSeparator(sep)
+</li>
 </ul>
 <br>
 <b>Input:</b>
@@ -739,6 +805,12 @@ <h4 id="SentimentDetector" class="section-block"> 11. SentimentDetector: Sentime
 <li>
   setDictPath(path)
 </li>
+<li>
+  setDictFormat(format)
+</li>
+<li>
+  setDictSeparator(sep)
+</li>
 </ul>
 <br>
 <b>Input:</b>
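A hedged sketch of these Params together, using the setter names from this page; the class import path, input columns, and dictionary layout are assumptions:

```scala
// Import path assumed for the 1.2.x pragmatic sentiment detector.
import com.johnsnowlabs.nlp.annotators.sda.pragmatic.SentimentDetector

// Assumed dictionary layout, one "word,sentiment" pair per line:
//   superb,positive
val sentimentDetector = new SentimentDetector()
  .setInputCols(Array("token", "sentence")) // assumed input columns
  .setOutputCol("sentiment")
  .setDictPath("sentiment-dict.txt")        // illustrative path
  .setDictFormat("TXT")
  .setDictSeparator(",")
```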
@@ -884,7 +956,7 @@ <h4 id="SpellChecker" class="section-block"> 13. SpellChecker: Token spell
   setCorpusPath: path to training corpus. Can be any good text.
 </li>
 <li>
-  setCorpusFormat(format): Allowed “txt” or “txtds”. The latter uses spark dataframes from text
+  setCorpusFormat(format): Allowed “txt” or “txtds”. The latter uses spark dataframes from text
 </li>
 <li>
   setSlangPath: path to custom dictionaries, separated by comma
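A sketch of the corpus Params, assuming the Norvig-based class name used by later releases (NorvigSweetingApproach); if the 1.2.x class name differs, only the constructor changes:

```scala
// Class name and import path assumed.
import com.johnsnowlabs.nlp.annotators.spell.norvig.NorvigSweetingApproach

val spellChecker = new NorvigSweetingApproach()
  .setInputCols(Array("token"))
  .setOutputCol("spell")
  .setCorpusPath("corpus/")   // illustrative path; any good text works
  .setCorpusFormat("txtds")   // read via Spark DataFrames, enabling HDFS
```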

python/example/vivekn-sentiment/sentiment.ipynb

Lines changed: 5 additions & 13 deletions
@@ -38,9 +38,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {
-"collapsed": true
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "#Load the input data to be annotated\n",
@@ -160,9 +158,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {
-"collapsed": true
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "pipeline = Pipeline(stages=[\n",
@@ -182,9 +178,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {
-"collapsed": true
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "for r in sentiment_data.take(5):\n",
@@ -217,9 +211,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {
-"collapsed": true
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "Pipeline.read().load(\"./ps\")\n",
@@ -239,7 +231,7 @@
 "metadata": {
 "anaconda-cloud": {},
 "kernelspec": {
-"display_name": "Python [default]",
+"display_name": "Python 3",
 "language": "python",
 "name": "python3"
 },
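The notebook's later cells rely on Spark MLlib pipeline persistence, seen above as `Pipeline.read().load("./ps")`. For reference, a minimal Scala sketch of the same round trip, assuming only stock Spark MLlib and an active SparkSession; the empty stage list is illustrative:

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}

// An empty pipeline is enough to demonstrate the persistence API.
val pipeline = new Pipeline().setStages(Array.empty[PipelineStage])

// Save the (unfit) pipeline definition and read it back,
// mirroring the notebook's Pipeline.read().load("./ps").
pipeline.write.overwrite().save("./ps")
val restored = Pipeline.read.load("./ps")
```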
