Commit
Added net6.0 target to Lucene.Net.Analysis.OpenNLP and changed to using MavenReference (#892)

* upgrade targets to target .net core 6, in addition to .net framework
* update net 4.6 version
* Lucene.Net.Tests.OpenNLP: Patched IDE behavior to use net48 when net461 is selected and net7.0 when net5.0 is selected. In CI, we set IsTestProject=false and IsPublishable=false to skip these tests.
* publish-test-results-for-test-projects.yml: Added support for net7.0 and net6.0 for Lucene.Net.Tests.Analysis.OpenNLP tests.
* .github/workflows: Regenerated to add net7.0 as a test framework for Lucene.Net.Tests.Analysis.OpenNLP
* .build/dependencies.props: Upgrade System.Memory to 4.5.5 to match IKVM 8.2.0
* .build/dependencies.props: Bumped System.Runtime.CompilerServices.Unsafe to 6.0.0 to match IKVM 8.5.0
* Lucene.Net.csproj: Added direct dependency on System.Runtime.CompilerServices.Unsafe for netstandard2.0 and net462 to ensure the version will work with any combination of Lucene.Net components. This is a transitive dependency in a few 3rd party DLLs, but there may be version conflicts if this isn't done on .NET Framework.
* Lucene.Net.Facet.csproj: Added explicit dependency on System.Memory for netstandard2.0 and net462, since it is being used in Lucene.Net.Facet.
* Lucene.Net.TestFramework.csproj: Added dependency on System.Text.Json to pin the version so it matches the reference of IKVM 8.5.0 (6.0.6).
* upgrade targets to target .net core 6, in addition to .net framework
* Lucene.Net.Analysis.OpenNLP.csproj, .build/dependencies.props: Changed to use <MavenReference> to build opennlp-tools instead of using the pre-built OpenNLP.NET NuGet package.
* .build/dependencies.props: bumped IKVM to 8.7.3 and IKVM.Maven.Sdk to 1.6.7
* Lucene.Net.Analysis.OpenNLP.csproj: Removed duplicate TargetFrameworks declaration
* Lucene.Net.Analysis.OpenNLP: Changed target from net462 > net472, the minimum supported by IKVM.
* Directory.Build.targets: Updated FEATURE_OPENNLP to be available on .NET Core
* Lucene.Net.Tests.AllProjects: Updated references so we can successfully compile with IKVM in the mix, both on .NET Framework and .NET Core
* .build/dependencies.props: Added OpenNLP MavenReference version so it can be managed with the other packages
* Lucene.Net.Analysis.OpenNLP: Added Maven dependency on org.osgi.core to eliminate build warnings (at least 1 type is referenced in opennlp-tools)
* Lucene.Net.Analysis.OpenNLP: For now, making net472 conditional based on Windows due to lack of non-Windows build support in IKVM 8.7.3 (see: ikvmnet/ikvm-maven#49).
* .build/dependencies.props: Reverted back to OpenNLP 1.9.1 because of build issues with opennlp-uima on 1.9.4. This aligns with Lucene 8.2.0.
* publish-nuget-packages.yml: Remove forward slash
* .build/dependencies.props: Bumped IKVM to 8.7.5
* Lucene.Net.Analysis.OpenNLP/overview.md: Added missing docs from Lucene and link to MavenReference demo. Fixes #890.
* FEATURE: Lucene.Net.Analysis.Miscellaneous: Added TypeAsSynonymFilter from Lucene 8.2.0 because it is called out in the docs as part of the process of configuring Lucene.Net.Analysis.OpenNLP. Changed CannedTokenStream to set ITypeAttribute.Type because it is required by the tests for TypeAsSynonymFilter.
* Lucene.Net.Analysis.Miscellaneous.TestTypeAsSynonymFilterFactory: Added comment with lucene version compatibility level (to indicate we ported it from Lucene 8.2.0)
* Lucene.Net.Analysis.OpenNLP.overview.md: Corrected information about which filters are included in the package (there is no NER filter in the box)

---------

Co-authored-by: Laimonas Simutis <laimis@gmail.com>
1 parent 1cb1eaf, commit 41ad676
Showing 18 changed files with 405 additions and 42 deletions.
97 changes: 97 additions & 0 deletions
src/Lucene.Net.Analysis.Common/Analysis/Miscellaneous/TypeAsSynonymFilter.cs
@@ -0,0 +1,97 @@
// Lucene version compatibility level 8.2.0
// LUCENENET NOTE: Ported because Lucene.Net.Analysis.OpenNLP requires this to be useful.
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;
#nullable enable

namespace Lucene.Net.Analysis.Miscellaneous
{
    /*
     * Licensed to the Apache Software Foundation (ASF) under one or more
     * contributor license agreements. See the NOTICE file distributed with
     * this work for additional information regarding copyright ownership.
     * The ASF licenses this file to You under the Apache License, Version 2.0
     * (the "License"); you may not use this file except in compliance with
     * the License. You may obtain a copy of the License at
     *
     * http://www.apache.org/licenses/LICENSE-2.0
     *
     * Unless required by applicable law or agreed to in writing, software
     * distributed under the License is distributed on an "AS IS" BASIS,
     * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     * See the License for the specific language governing permissions and
     * limitations under the License.
     */

    /// <summary>
    /// Adds the <see cref="ITypeAttribute.Type"/> as a synonym,
    /// i.e. another token at the same position, optionally with a specified prefix prepended.
    /// </summary>
    public sealed class TypeAsSynonymFilter : TokenFilter
    {
        private readonly ICharTermAttribute termAtt;
        private readonly ITypeAttribute typeAtt;
        private readonly IPositionIncrementAttribute posIncrAtt;
        private readonly string? prefix;

        private State? savedToken = null;

        /// <summary>
        /// Initializes a new instance of <see cref="TypeAsSynonymFilter"/> with
        /// the specified token stream.
        /// </summary>
        /// <param name="input">Input token stream.</param>
        public TypeAsSynonymFilter(TokenStream input)
            : this(input, null)
        {
        }

        /// <summary>
        /// Initializes a new instance of <see cref="TypeAsSynonymFilter"/> with
        /// the specified token stream and prefix.
        /// </summary>
        /// <param name="input">Input token stream.</param>
        /// <param name="prefix">Prepend this string to every token type emitted as token text.
        /// If <c>null</c>, nothing will be prepended.</param>
        public TypeAsSynonymFilter(TokenStream input, string? prefix)
            : base(input)
        {
            this.prefix = prefix;
            termAtt = AddAttribute<ICharTermAttribute>();
            typeAtt = AddAttribute<ITypeAttribute>();
            posIncrAtt = AddAttribute<IPositionIncrementAttribute>();
        }


        public override bool IncrementToken()
        {
            if (savedToken != null)
            {
                // Emit last token's type at the same position
                RestoreState(savedToken);
                savedToken = null;
                termAtt.SetEmpty();
                if (prefix != null)
                {
                    termAtt.Append(prefix);
                }
                termAtt.Append(typeAtt.Type);
                posIncrAtt.PositionIncrement = 0;
                return true;
            }
            else if (m_input.IncrementToken())
            {
                // No pending token type to emit
                savedToken = CaptureState();
                return true;
            }
            return false;
        }

        public override void Reset()
        {
            base.Reset();
            savedToken = null;
        }
    }
}
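The two-state logic in `IncrementToken()` above (pass a token through, then emit its type at position increment 0) can be illustrated with a small dependency-free model. This is only a sketch: `Token`, `TypeAsSynonymSketch`, and `AddTypeSynonyms` are hypothetical names invented for illustration and are not part of Lucene.NET, and an iterator stands in for the `CaptureState()`/`RestoreState()` mechanism of the real filter.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a token's term text, type, and position increment.
// This is NOT a Lucene.NET type; it only models the three attributes the filter touches.
public record Token(string Text, string Type, int PositionIncrement);

public static class TypeAsSynonymSketch
{
    // For each input token, yields the token unchanged and then a synonym token whose
    // text is (prefix + type) and whose position increment is 0, i.e. the same position.
    public static IEnumerable<Token> AddTypeSynonyms(IEnumerable<Token> input, string? prefix = null)
    {
        foreach (var token in input)
        {
            yield return token;
            yield return new Token((prefix ?? "") + token.Type, token.Type, 0);
        }
    }

    public static void Main()
    {
        var tokens = new[]
        {
            new Token("visit", "<ALPHANUM>", 1),
            new Token("example.com", "<URL>", 1),
        };

        foreach (var t in AddTypeSynonyms(tokens, "_type_"))
        {
            Console.WriteLine($"{t.Text} (type={t.Type}, posInc={t.PositionIncrement})");
        }
    }
}
```

In this model every original token is followed by exactly one synonym, so a downstream consumer sees twice as many tokens while the positions of the original tokens are unchanged.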
62 changes: 62 additions & 0 deletions
src/Lucene.Net.Analysis.Common/Analysis/Miscellaneous/TypeAsSynonymFilterFactory.cs
@@ -0,0 +1,62 @@
// Lucene version compatibility level 8.2.0
// LUCENENET NOTE: Ported because Lucene.Net.Analysis.OpenNLP requires this to be useful.
using Lucene.Net.Analysis.Util;
using System;
using System.Collections.Generic;
#nullable enable

namespace Lucene.Net.Analysis.Miscellaneous
{
    /*
     * Licensed to the Apache Software Foundation (ASF) under one or more
     * contributor license agreements. See the NOTICE file distributed with
     * this work for additional information regarding copyright ownership.
     * The ASF licenses this file to You under the Apache License, Version 2.0
     * (the "License"); you may not use this file except in compliance with
     * the License. You may obtain a copy of the License at
     *
     * http://www.apache.org/licenses/LICENSE-2.0
     *
     * Unless required by applicable law or agreed to in writing, software
     * distributed under the License is distributed on an "AS IS" BASIS,
     * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     * See the License for the specific language governing permissions and
     * limitations under the License.
     */

    /// <summary>
    /// Factory for <see cref="TypeAsSynonymFilter"/>.
    /// <code>
    /// <fieldType name="text_type_as_synonym" class="solr.TextField" positionIncrementGap="100">
    ///   <analyzer>
    ///     <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    ///     <filter class="solr.TypeAsSynonymFilterFactory" prefix="_type_" />
    ///   </analyzer>
    /// </fieldType>
    /// </code>
    ///
    /// <para/>
    /// If the optional <c>prefix</c> parameter is used, the specified value will be prepended
    /// to the type, e.g. with prefix = "_type_", for a token "example.com" with type "<URL>",
    /// the emitted synonym will have text "_type_<URL>".
    /// </summary>
    public class TypeAsSynonymFilterFactory : TokenFilterFactory
    {
        private readonly string prefix;

        public TypeAsSynonymFilterFactory(IDictionary<string, string> args)
            : base(args)
        {
            prefix = Get(args, "prefix"); // default value is null
            if (args.Count > 0)
            {
                throw new ArgumentException(string.Format(J2N.Text.StringFormatter.CurrentCulture, "Unknown parameters: {0}", args));
            }
        }

        public override TokenStream Create(TokenStream input)
        {
            return new TypeAsSynonymFilter(input, prefix);
        }
    }
}
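The constructor above follows a consume-then-check pattern: read the known `prefix` argument, then reject anything left in `args`. The sketch below models that pattern without Lucene.NET. It assumes, as the `args.Count > 0` check implies, that reading an argument also removes it from the dictionary; `FactoryArgsSketch`, `Take`, and `ParsePrefix` are hypothetical names used only for illustration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class FactoryArgsSketch
{
    // Hypothetical helper: removes the named key from args and returns its value,
    // or returns null when the key is absent.
    public static string? Take(IDictionary<string, string> args, string name)
    {
        if (args.TryGetValue(name, out var value))
        {
            args.Remove(name);
            return value;
        }
        return null;
    }

    // Consumes the optional "prefix" argument, then fails if any unrecognized
    // arguments remain, which catches misspelled parameter names early.
    public static string? ParsePrefix(IDictionary<string, string> args)
    {
        string? prefix = Take(args, "prefix");
        if (args.Count > 0)
        {
            throw new ArgumentException("Unknown parameters: " + string.Join(", ", args.Keys));
        }
        return prefix;
    }

    public static void Main()
    {
        var ok = new Dictionary<string, string> { ["prefix"] = "_type_" };
        Console.WriteLine(ParsePrefix(ok));

        // "prefx" is a deliberate misspelling to trigger the unknown-parameter check.
        var bad = new Dictionary<string, string> { ["prefix"] = "_type_", ["prefx"] = "oops" };
        try
        {
            ParsePrefix(bad);
        }
        catch (ArgumentException e)
        {
            Console.WriteLine(e.Message);
        }
    }
}
```

Failing on leftover keys means a typo in a configuration file surfaces as an error at construction time instead of being silently ignored.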