Fixing several bugs in the microservice implementation #273

Merged 17 commits on Sep 22, 2023
34 changes: 31 additions & 3 deletions README.md
@@ -17,9 +17,12 @@ of _[KASTEL - Institute of Information Security and Dependability](https://kaste
the [KIT](https://www.kit.edu).

## User Interfaces
To be able to execute the core algorithms from this repository, you can write own user interfaces that (should) use the [ArDoCoRunner](https://github.com/ArDoCo/Core/blob/main/pipeline/pipeline-core/src/main/java/edu/kit/kastel/mcse/ardoco/core/execution/runner/ArDoCoRunner.java).

We provide an example Command Line Interface (CLI) at [ArDoCo/CLI](https://github.com/ArDoCo/CLI) as well as a simple Graphical User Interface (GUI) at [ArDoCo/GUI](https://github.com/ArDoCo/GUI).
To be able to execute the core algorithms from this repository, you can write own user interfaces that (should) use
the [ArDoCoRunner](https://github.com/ArDoCo/Core/blob/main/pipeline/pipeline-core/src/main/java/edu/kit/kastel/mcse/ardoco/core/execution/runner/ArDoCoRunner.java).

We provide an example Command Line Interface (CLI) at [ArDoCo/CLI](https://github.com/ArDoCo/CLI) as well as a simple Graphical User Interface (GUI)
at [ArDoCo/GUI](https://github.com/ArDoCo/GUI).

Future user interfaces like an enhanced GUI or a web interface are planned.

@@ -39,6 +42,7 @@ To test the Core, you could use case studies and benchmarks provided in ..
## Maven

```xml

<dependencies>
<dependency>
<groupId>io.github.ardoco.core</groupId>
@@ -49,7 +53,9 @@ To test the Core, you could use case studies and benchmarks provided in ..
```

For snapshot releases, make sure to add the following repository

```xml

<repositories>
<repository>
<releases>
@@ -64,9 +70,31 @@ For snapshot releases, make sure to add the following repository
</repositories>
```

## Microservice for text preprocessing

Text preprocessing works locally, but there is also the option to host a microservice for this.
The benefit is that the models do not need to be loaded each time, saving some runtime (and local memory).

The microservice can be found at [ArDoCo/StanfordCoreNLP-Provider-Service](https://github.com/ArDoCo/StanfordCoreNLP-Provider-Service/).

The microservice is secured with credentials. To use it, you need to activate the microservice usage and configure the URL of the microservice.
These settings can be provided to the execution via environment variables.
To do so, set the following variables:

```env
NLP_PROVIDER_SOURCE=microservice
MICROSERVICE_URL=[microservice_url]
SCNLP_SERVICE_USER=[your_username]
SCNLP_SERVICE_PASSWORD=[your_password]
```

The first variable, `NLP_PROVIDER_SOURCE=microservice`, activates the microservice usage.
The remaining three variables configure the connection; set them to the URL and credentials of your deployed microservice.
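
As an illustration of how such settings could be consumed, here is a minimal sketch that reads the four variables from an environment map. Only the variable names are taken from the README text above. The class `NlpProviderSettings`, its fields, and the choice to fail fast on missing values are assumptions made for this example and are not ArDoCo's actual implementation.

```java
import java.util.Map;

/** Hypothetical helper, not part of ArDoCo: collects the microservice settings. */
final class NlpProviderSettings {
    final boolean useMicroservice;
    final String url;
    final String user;
    final String password;

    private NlpProviderSettings(boolean useMicroservice, String url, String user, String password) {
        this.useMicroservice = useMicroservice;
        this.url = url;
        this.user = user;
        this.password = password;
    }

    /** Reads the settings from the given map; pass System.getenv() in a real run. */
    static NlpProviderSettings fromEnvironment(Map<String, String> env) {
        // Assumption: any value other than "microservice" means local preprocessing.
        if (!"microservice".equals(env.get("NLP_PROVIDER_SOURCE"))) {
            return new NlpProviderSettings(false, null, null, null);
        }
        return new NlpProviderSettings(true,
                require(env, "MICROSERVICE_URL"),
                require(env, "SCNLP_SERVICE_USER"),
                require(env, "SCNLP_SERVICE_PASSWORD"));
    }

    /** Returns the value of a mandatory variable or fails with a descriptive message. */
    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value;
    }
}
```

Calling `NlpProviderSettings.fromEnvironment(System.getenv())` would then yield either the local default or a fully populated connection configuration. Taking the map as a parameter keeps the sketch easy to test without touching the real process environment.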

## Attribution

The initial version of this project is based on the master thesis [Linking Software Architecture Documentation and Models](https://doi.org/10.5445/IR/1000126194).
The initial version of this project is based on the master
thesis [Linking Software Architecture Documentation and Models](https://doi.org/10.5445/IR/1000126194).

## Acknowledgements

@@ -1,12 +1,16 @@
/* Licensed under MIT 2021-2023. */
package edu.kit.kastel.mcse.ardoco.core.api.text;

import java.io.IOException;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonValue;

/**
* This class represents all valid part-of-speech (pos) tags
*
*/
public enum POSTag {
//@formatter:off
@@ -77,4 +81,19 @@ public boolean isVerb() {
public boolean isNoun() {
return getTag().startsWith("NN");
}

@JsonValue
public String toValue() {
return getTag();
}

@JsonCreator
public static POSTag forValue(String value) throws IOException {
try {
return get(value);
} catch (IllegalArgumentException e) {
throw new IOException("Cannot deserialize POSTag: " + value, e);
}
}

}
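
The `toValue`/`forValue` pair above follows a common pattern: an enum serializes to a short string and is looked up again from that string, with unknown input reported as an `IOException`. The following sketch shows the pattern with the standard library only. The enum `SketchTag` and its three constants are hypothetical stand-ins, and the Jackson annotations are left out so that the example runs without extra dependencies.

```java
import java.io.IOException;

/** Hypothetical stand-in for POSTag with three Penn Treebank style tags. */
enum SketchTag {
    NOUN("NN"), VERB("VB"), ADJECTIVE("JJ");

    private final String tag;

    SketchTag(String tag) {
        this.tag = tag;
    }

    /** Serialized form; the method in the diff carries @JsonValue. */
    String toValue() {
        return tag;
    }

    /** Reverse lookup; the method in the diff carries @JsonCreator. */
    static SketchTag forValue(String value) throws IOException {
        for (SketchTag candidate : values()) {
            if (candidate.tag.equals(value)) {
                return candidate;
            }
        }
        // A checked exception signals a malformed payload to the caller.
        throw new IOException("Cannot deserialize tag: " + value);
    }
}
```

With this shape, `SketchTag.forValue(SketchTag.VERB.toValue())` returns `SketchTag.VERB` again, which is the round-trip property a JSON mapper relies on.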
@@ -24,6 +24,14 @@ default int getLength() {
*/
ImmutableList<Word> words();

/**
* Returns the word at the given index
*
* @param index the index
* @return the word at the given index
*/
Word getWord(int index);

/**
* Returns the sentences of the text, ordered by appearance.
*
@@ -9,10 +9,14 @@

import org.eclipse.collections.api.list.ImmutableList;

import edu.kit.kastel.mcse.ardoco.core.api.text.*;
import edu.kit.kastel.mcse.ardoco.core.api.text.DependencyTag;
import edu.kit.kastel.mcse.ardoco.core.api.text.POSTag;
import edu.kit.kastel.mcse.ardoco.core.api.text.Phrase;
import edu.kit.kastel.mcse.ardoco.core.api.text.Sentence;
import edu.kit.kastel.mcse.ardoco.core.api.text.Text;
import edu.kit.kastel.mcse.ardoco.core.api.text.Word;
import edu.kit.kastel.mcse.ardoco.core.textproviderjson.dto.IncomingDependencyDto;
import edu.kit.kastel.mcse.ardoco.core.textproviderjson.dto.OutgoingDependencyDto;
import edu.kit.kastel.mcse.ardoco.core.textproviderjson.dto.PosTag;
import edu.kit.kastel.mcse.ardoco.core.textproviderjson.dto.SentenceDto;
import edu.kit.kastel.mcse.ardoco.core.textproviderjson.dto.TextDto;
import edu.kit.kastel.mcse.ardoco.core.textproviderjson.dto.WordDto;
@@ -27,7 +31,7 @@ public class ObjectToDtoConverter {

/**
* converts an ArDoCo text into a text DTO
*
*
* @param text the ArDoCo text
* @return the text DTO
*/
@@ -74,7 +78,7 @@ private WordDto convertToWordDTO(Word word) throws NotConvertableException {
wordDTO.setText(word.getText());
wordDTO.setLemma(word.getLemma());
try {
wordDTO.setPosTag(PosTag.forValue(word.getPosTag().toString()));
wordDTO.setPosTag(POSTag.forValue(word.getPosTag().toString()));
} catch (IOException e) {
throw new NotConvertableException(String.format("IOException when converting word with id %d to WordDto: PosTag not found.", wordDTO.getId()));
}

This file was deleted.

@@ -5,7 +5,9 @@
import java.util.List;
import java.util.Objects;

import com.fasterxml.jackson.annotation.*;
import com.fasterxml.jackson.annotation.JsonProperty;

import edu.kit.kastel.mcse.ardoco.core.api.text.POSTag;

/**
* Definition of a word
@@ -17,7 +19,7 @@ public class WordDto {
private List<OutgoingDependencyDto> outgoingDependencies = new ArrayList<>();
private long sentenceNo;
private String text;
private PosTag posTag;
private POSTag posTag;

/**
* The id of the word. Should be ascending from 1 for the first word in the text.
@@ -72,12 +74,12 @@ public void setOutgoingDependencies(List<OutgoingDependencyDto> value) {
}

@JsonProperty("posTag")
public PosTag getPosTag() {
public POSTag getPosTag() {
return posTag;
}

@JsonProperty("posTag")
public void setPosTag(PosTag value) {
public void setPosTag(POSTag value) {
this.posTag = value;
}

@@ -22,24 +22,28 @@
public class PhraseImpl implements Phrase {
private static final String PUNCTUATION_WITH_SPACE = "\\s+([.,;:?!])";
private static final String BRACKETS_WITH_SPACE = "\\s+([()\\[\\]{}<>])";
private final PhraseType type;
private final ImmutableList<Phrase> childPhrases;
private final ImmutableList<Word> nonPhraseWords;
private ImmutableList<Word> phraseWords;

private ImmutableList<Word> containedWords;
private ImmutableList<Phrase> subPhrases;
private ImmutableSortedMap<Word, Integer> phraseVector;
private int sentenceNo = -1;
private String text;

private final PhraseType type;

private final List<Phrase> childPhrases;

public PhraseImpl(ImmutableList<Word> nonPhraseWords, PhraseType type, List<Phrase> childPhrases) {
this.nonPhraseWords = nonPhraseWords == null ? Lists.immutable.empty() : nonPhraseWords;
this.type = type;
this.childPhrases = childPhrases;
this.childPhrases = Lists.immutable.ofAll(childPhrases);
}

@Override
public int getSentenceNo() {
return getContainedWords().get(0).getSentenceNo();
public synchronized int getSentenceNo() {
if (sentenceNo < 0) {
sentenceNo = getContainedWords().get(0).getSentenceNo();
}
return sentenceNo;
}

@Override
@@ -60,70 +64,78 @@ public PhraseType getPhraseType() {
}

@Override
public ImmutableList<Word> getContainedWords() {
if (phraseWords == null) {
List<Word> collectedWords = new ArrayList<>();
for (Phrase subphrase : childPhrases) {
collectedWords.addAll(subphrase.getContainedWords().castToList());
public synchronized ImmutableList<Word> getContainedWords() {
if (containedWords == null) {
if (phraseWords == null) {
List<Word> collectedWords = new ArrayList<>();
for (Phrase subphrase : childPhrases) {
collectedWords.addAll(subphrase.getContainedWords().castToList());
}
this.phraseWords = Lists.immutable.ofAll(collectedWords);
}
this.phraseWords = Lists.immutable.ofAll(collectedWords);

MutableList<Word> words = Lists.mutable.ofAll(nonPhraseWords);
words.addAllIterable(phraseWords);
words.sortThis(Comparator.comparingInt(Word::getPosition));
containedWords = words.toImmutable();
}

MutableList<Word> words = Lists.mutable.ofAll(nonPhraseWords);
words.addAllIterable(phraseWords);
words.sortThis(Comparator.comparingInt(Word::getPosition));
return words.toImmutable();
return containedWords;
}

@Override
public ImmutableList<Phrase> getSubPhrases() {
List<Phrase> subPhrases = new ArrayList<>(childPhrases);
for (Phrase childPhrase : childPhrases) {
subPhrases.addAll(childPhrase.getSubPhrases().toList());
public synchronized ImmutableList<Phrase> getSubPhrases() {
if (subPhrases == null) {
MutableList<Phrase> tempSubPhrases = Lists.mutable.ofAll(childPhrases);
for (Phrase childPhrase : childPhrases) {
tempSubPhrases.addAll(childPhrase.getSubPhrases().toList());
}
subPhrases = tempSubPhrases.toImmutable();
}
return Lists.immutable.ofAll(subPhrases);
return subPhrases;
}

@Override
public boolean isSuperPhraseOf(Phrase other) {
List<Phrase> subphrases = this.childPhrases;
MutableList<Phrase> subphrases = Lists.mutable.ofAll(this.getSubPhrases());
while (!subphrases.isEmpty()) {
if (subphrases.contains(other)) {
return true;
}
List<Phrase> newSubphrases = new ArrayList<>();
for (Phrase subphrase : subphrases) {
newSubphrases.addAll(subphrase.getSubPhrases().castToList());
}
subphrases = newSubphrases;
subphrases = getSubPhrasesOfPhrases(subphrases);
}
return false;
}

private static MutableList<Phrase> getSubPhrasesOfPhrases(MutableList<Phrase> subphrases) {
MutableList<Phrase> subPhrasesOfPhrases = Lists.mutable.empty();
for (Phrase subphrase : subphrases) {
subPhrasesOfPhrases.addAll(subphrase.getSubPhrases().castToList());
}
return subPhrasesOfPhrases;
}

@Override
public boolean isSubPhraseOf(Phrase other) {
List<Phrase> subphrases = other.getSubPhrases().castToList();
MutableList<Phrase> subphrases = Lists.mutable.ofAll(other.getSubPhrases());
while (!subphrases.isEmpty()) {
if (subphrases.contains(this)) {
return true;
}
List<Phrase> newSubphrases = new ArrayList<>();
for (Phrase subphrase : subphrases) {
newSubphrases.addAll(subphrase.getSubPhrases().castToList());
}
subphrases = newSubphrases;
subphrases = getSubPhrasesOfPhrases(subphrases);
}
return false;
}

@Override
public ImmutableSortedMap<Word, Integer> getPhraseVector() {
MutableSortedMap<Word, Integer> phraseVector = SortedMaps.mutable.empty();

var grouped = getContainedWords().groupBy(Word::getText).toMap();
grouped.forEach((key, value) -> phraseVector.put(value.getAny(), value.size()));

return phraseVector.toImmutable();
public synchronized ImmutableSortedMap<Word, Integer> getPhraseVector() {
if (this.phraseVector == null) {
MutableSortedMap<Word, Integer> tempPhraseVector = SortedMaps.mutable.empty();
var grouped = getContainedWords().groupBy(Word::getText).toMap();
grouped.forEach((key, value) -> tempPhraseVector.put(value.getAny(), value.size()));
this.phraseVector = tempPhraseVector.toImmutable();
}
return this.phraseVector;
}

@Override
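
The `PhraseImpl` changes above replace repeated recomputation with values that are computed once inside `synchronized` getters and then reused. A minimal sketch of that caching idea, using only the standard library, could look as follows. The class `LazyPhrase`, its string-based words, the alphabetical sort, and the `computations` counter are all assumptions for illustration; the real class works on `Word` and `Phrase` objects and sorts by word position.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical, simplified phrase that caches its contained words. */
final class LazyPhrase {
    private final List<String> ownWords;
    private final List<LazyPhrase> children;
    private List<String> containedWords; // filled on first access
    int computations = 0; // only here to make the caching observable

    LazyPhrase(List<String> ownWords, List<LazyPhrase> children) {
        this.ownWords = List.copyOf(ownWords);
        this.children = List.copyOf(children);
    }

    /** Computes the word list once; later calls return the cached list. */
    synchronized List<String> getContainedWords() {
        if (containedWords == null) {
            computations++;
            List<String> collected = new ArrayList<>(ownWords);
            for (LazyPhrase child : children) {
                collected.addAll(child.getContainedWords());
            }
            Collections.sort(collected); // stand-in for sorting by position
            containedWords = Collections.unmodifiableList(collected);
        }
        return containedWords;
    }
}
```

Marking the getter `synchronized` keeps the check and the assignment atomic, so two threads cannot both see `null` and compute the list twice. The price is a lock acquisition on every call, which is usually acceptable when the computation it avoids walks a whole phrase tree.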