Extract substrings matching a lexical pattern.
Load the paclet from the Paclet Repository:
PacletInstall[ResourceObject["FaizonZaman/LexicalCases"]]
Needs["LexicalCases`"]
Supports v14.0+
Search strings, files, or Wikipedia articles for a lexical pattern.
oosp = ExampleData[{"Text", "OriginOfSpecies"}];
oospPattern = BoundToken[WordToken[2], BoundToken["specie"|"species"]];
oospResults = LexicalCases[oosp, oospPattern]
All Text Content Types can be used; however, some take an unreasonably long time to expand, especially types that represent a large body of text, such as topic types. The basic part-of-speech types are a good place to start:
alice = ExampleData[{"Text", "AliceInWonderland"}];
alicePattern = "Alice" ~~ TypeToken["Verb"] ~~ TypeToken["Adverb"];
aliceResults = LexicalCases[alice, alicePattern]
Use lexical patterns in StringCases, StringPosition, and StringMatchQ by wrapping the pattern with LexicalPattern.
Here's an example that creates an operator form of StringCases:
aliceOp = StringCases[LexicalPattern["Alice" ~~ TypeToken["Verb"] ~~ TypeToken["Adverb"]]];
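As a rough sketch of how the wrapped pattern might be reused, the lines below store the LexicalPattern expression once, apply the StringCases operator to the Alice text, and pass the same pattern to StringPosition. The variable name aliceLexical is introduced here only for illustration, and the sketch assumes the paclet is already installed and that LexicalPattern behaves with StringPosition as described above; no output is shown because the results depend on the paclet's expansion of the type tokens.

```wolfram
(* Assumes the paclet has been installed as shown earlier *)
Needs["LexicalCases`"]

(* The example text used above *)
alice = ExampleData[{"Text", "AliceInWonderland"}];

(* aliceLexical is an illustrative name: the wrapped pattern, stored for reuse *)
aliceLexical = LexicalPattern["Alice" ~~ TypeToken["Verb"] ~~ TypeToken["Adverb"]];

(* Operator form of StringCases, applied to the text *)
aliceOp = StringCases[aliceLexical];
aliceOp[alice]

(* The same wrapped pattern passed to StringPosition to get match locations *)
StringPosition[alice, aliceLexical]
```

Storing the wrapped pattern in one variable keeps the two calls in sync if the pattern is later changed.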
The paclet documentation includes additional examples, or visit LexicalCases on the Wolfram Paclet Repository.