This repository has been archived by the owner on Oct 4, 2022. It is now read-only.

How does morphology on yoastseo work?

agnieszkaszuba edited this page Dec 3, 2019 · 1 revision

General architecture

Morphological research

Morphological analysis of the keyphrase and synonyms is implemented as a research. This means that:

  • The relevant script lives in the researches folder, next to the other, more conventional researches.
  • The results of the morphological analysis can be requested through researcher.getResearch( "morphology" ).

The language-specific information about how morphological forms of words should be built is supplied separately from the researcher, in a data file in the private repository Yoast/YoastSEO.js-premium-configuration. This makes it possible to control who has access to this file (Premium, but not Free) and, in the future, should make data distribution more efficient, because users only need to access the data for their own language. Currently, morphological analysis is available for Premium users and for the English language only. More guidelines on how morphology for new languages should be added will follow shortly.

What does the morphological research do

The morphological research receives a paper containing a keyphrase and, optionally, synonyms. It relies on the language of the paper (parsed from the locale) for the subsequent analysis. The default language is English. The research:

  • Splits the keyphrase or a synonym phrase into words - A boy reads a book > A, boy, reads, a, book.
  • Filters out function words (words with little or no conceptual meaning, e.g. prepositions, enumerations) if a list of function words is available; otherwise it keeps all words > boy, reads, book.
  • For English, for Premium: builds all possible forms of the remaining words, including hypothetical ones, as if each word were a noun, an adjective, an adverb and a verb > [boy, boys, boying, boyed], [read, reads, reading], [book, books, booking, booked, bookly]. The research makes use of regexes and lists of exceptions.
  • For English Free and for all other languages, each array of forms contains only one word form (the word itself).
  • Collects the keyphrase and synonym forms into one structure:
{
      keyphraseForms: [
              // forms of every word from the keyphrase
             [ form1, form2, ... ],  // 1st content word from the keyphrase
             [ form1, form2, ... ],  // 2nd content word from the keyphrase
             ...
      ],
      synonymsForms: [
             [  // forms of every word from the 1st synonym
                   [ form1, form2, ... ],  // 1st content word from the 1st synonym
                   [ form1, form2, ... ],  // 2nd content word from the 1st synonym
                   ...
             ],
             [  // forms of every word from the 2nd synonym
                   [ form1, form2, ... ],  // 1st content word from the 2nd synonym
                   [ form1, form2, ... ],  // 2nd content word from the 2nd synonym
                   ...
             ],
             ...
      ],
}
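
The steps above can be illustrated with a small, self-contained sketch. This is not the yoastseo implementation: FUNCTION_WORDS, buildFormsNaive, collectForms and collectTopicForms are hypothetical names, the function-word list is a toy, and the form builder blindly appends suffixes instead of using the regexes and exception lists of the real research (so it produces "readss" where the real research would produce "read").

```javascript
// Toy function-word list (assumption); real lists are language-specific and much longer.
const FUNCTION_WORDS = new Set( [ "a", "an", "the", "of", "in", "and" ] );

// Hypothetical stand-in for the Premium form builder: appends a few English
// suffixes and ignores all regexes and exception lists.
function buildFormsNaive( word ) {
    return [ word, word + "s", word + "ing", word + "ed" ];
}

// Splits a phrase into words, drops function words and builds the forms of
// every remaining word. Without a builder, every word keeps a single form.
function collectForms( phrase, buildForms = ( word ) => [ word ] ) {
    return phrase
        .toLowerCase()
        .split( /\s+/ )
        .filter( ( word ) => word.length > 0 && ! FUNCTION_WORDS.has( word ) )
        .map( ( word ) => buildForms( word ) );
}

// Collects keyphrase and synonym forms into the structure shown above.
function collectTopicForms( keyphrase, synonyms, buildForms ) {
    return {
        keyphraseForms: collectForms( keyphrase, buildForms ),
        synonymsForms: synonyms.map( ( synonym ) => collectForms( synonym, buildForms ) ),
    };
}

// Premium-like behaviour: several forms per content word.
const premiumLike = collectTopicForms( "A boy reads a book", [ "the novel" ], buildFormsNaive );
// Free-like behaviour: one form per content word.
const freeLike = collectTopicForms( "A boy reads a book", [ "the novel" ] );

console.log( JSON.stringify( premiumLike.keyphraseForms[ 0 ] ) );
console.log( JSON.stringify( freeLike.keyphraseForms ) );
```

Passing the form builder as an argument keeps the sketch close to the split described earlier: the same research code runs in both cases, and only the supplied morphology logic decides whether more than one form is produced.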

Who calls whom

  1. The plugin requires the morphological data from the private repository Yoast/YoastSEO.js-premium-configuration and supplies this data to the webworker as researchData.
  2. The webworker creates a Researcher with the provided morphological data and supplies this Researcher as an argument to the SEO assessors (regular and cornerstone) that it calls.

Right now, content assessors do not receive this Researcher as input; they create a new one (without morphological data available) on the fly every time it is needed. As soon as word lists for readability analysis (e.g., transition words) are transferred to data (on-demand functionality), this will have to be adjusted.

  3. The SEO assessor calls the SEO assessments, and the SEO assessments call their specific researches as normal.
  4. Some SEO-specific researches require morphological analysis of the keyphrase and synonyms, and some do not. Almost all researches that search for the keyword or synonyms (in text, headings, tags, meta description, etc.) require morphological analysis. You can see here if your research in question requires morphological analysis.
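
As a rough illustration of this call chain, the sketch below wires a data-carrying researcher to a toy morphology research. Everything in it is an assumption made for the example: SketchResearcher is a hypothetical minimal stand-in rather than the yoastseo Researcher class, morphologySketch and createResearcher are invented names, and the word-to-forms map is not the real format of the morphology data.

```javascript
// Hypothetical minimal stand-in for a Researcher (not the yoastseo class).
class SketchResearcher {
    constructor( paper ) {
        this.paper = paper;
        this.data = {};
        this.researches = {};
    }

    // Stores externally supplied data (e.g. morphology data) under a name.
    addResearchData( name, data ) {
        this.data[ name ] = data;
    }

    // Returns the stored data, or null when none was supplied.
    getData( name ) {
        return this.data[ name ] || null;
    }

    addResearch( name, research ) {
        this.researches[ name ] = research;
    }

    // Runs a registered research, handing it the paper and this researcher.
    getResearch( name ) {
        if ( ! this.researches[ name ] ) {
            throw new Error( "Unknown research: " + name );
        }
        return this.researches[ name ]( this.paper, this );
    }
}

// Toy morphology research: looks up forms in the supplied data and falls back
// to the word itself when no data is available.
function morphologySketch( paper, researcher ) {
    const data = researcher.getData( "morphology" ) || {};
    const words = paper.keyword.toLowerCase().split( /\s+/ ).filter( Boolean );
    const hasForms = ( word ) => Object.prototype.hasOwnProperty.call( data, word );
    return {
        keyphraseForms: words.map( ( word ) => ( hasForms( word ) ? data[ word ] : [ word ] ) ),
    };
}

// Mirrors steps 1 and 2: the caller supplies the data, the worker builds the researcher.
function createResearcher( paper, morphologyData ) {
    const researcher = new SketchResearcher( paper );
    researcher.addResearch( "morphology", morphologySketch );
    if ( morphologyData ) {
        researcher.addResearchData( "morphology", morphologyData );
    }
    return researcher;
}

const paper = { keyword: "book" };
const withData = createResearcher( paper, { book: [ "book", "books", "booking", "booked" ] } );
const withoutData = createResearcher( paper );

console.log( withData.getResearch( "morphology" ).keyphraseForms );
console.log( withoutData.getResearch( "morphology" ).keyphraseForms );
```

The researcher created without data behaves like the one that content assessors currently build on the fly: the research still runs, but every word keeps a single form.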

In order for an SEO research to use keyphrase or synonym word-forms, it should call the morphological research within itself. Something like:

export default function( paper, researcher ) {
   const topicForms = researcher.getResearch( "morphology" );
   ...
}

The function that builds morphological forms is memoized, so do not worry about inefficiency.
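
To show why memoization removes the cost of repeated calls, here is a minimal sketch with a hand-written cache. The names buildForms, memoize and buildFormsMemoized are hypothetical, and the actual yoastseo code may key its cache differently.

```javascript
// Counts how often the expensive builder actually runs.
let buildCalls = 0;

// Hypothetical, deliberately simple form builder.
function buildForms( word ) {
    buildCalls += 1;
    return [ word, word + "s" ];
}

// Minimal memoizer for one-argument functions: results are cached per argument.
function memoize( fn ) {
    const cache = new Map();
    return ( key ) => {
        if ( ! cache.has( key ) ) {
            cache.set( key, fn( key ) );
        }
        return cache.get( key );
    };
}

const buildFormsMemoized = memoize( buildForms );

buildFormsMemoized( "book" );
buildFormsMemoized( "book" ); // Served from the cache, buildForms is not called again.
buildFormsMemoized( "boy" );

console.log( buildCalls ); // 2
```

Several researches can therefore request forms for the same words within one analysis run without rebuilding them each time.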

Depending on the exact functionality of the SEO research, it can make use of one of the helper functions, which were created to search for keyphrase forms or synonym forms in any supplied text string.
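
What such a helper might look like is sketched below, under stated assumptions: findFormsInText and findTopicInText are hypothetical names, the result fields countWordMatches, percentWordMatches and keyphraseOrSynonym are illustrative, and the tokenizer only handles lowercase ASCII words. The real helpers may differ in names, signatures and matching rules.

```javascript
// Hypothetical helper: counts the content words that have at least one form in the text.
function findFormsInText( wordForms, text ) {
    const tokens = new Set( text.toLowerCase().split( /[^a-z0-9]+/ ).filter( Boolean ) );
    const countWordMatches = wordForms
        .filter( ( forms ) => forms.some( ( form ) => tokens.has( form ) ) )
        .length;
    const percentWordMatches = wordForms.length === 0
        ? 0
        : Math.round( ( countWordMatches / wordForms.length ) * 100 );
    return { countWordMatches, percentWordMatches };
}

// Hypothetical helper: checks the keyphrase and every synonym and returns the
// best match. On a tie the keyphrase wins, because it is checked first.
function findTopicInText( topicForms, text ) {
    const keyphraseResult = {
        ...findFormsInText( topicForms.keyphraseForms, text ),
        keyphraseOrSynonym: "keyphrase",
    };
    const synonymResults = topicForms.synonymsForms.map( ( synonymForms ) => ( {
        ...findFormsInText( synonymForms, text ),
        keyphraseOrSynonym: "synonym",
    } ) );
    return [ keyphraseResult, ...synonymResults ].reduce(
        ( best, candidate ) => ( candidate.percentWordMatches > best.percentWordMatches ? candidate : best )
    );
}

const topicForms = {
    keyphraseForms: [ [ "boy", "boys" ], [ "read", "reads", "reading" ], [ "book", "books" ] ],
    synonymsForms: [ [ [ "novel", "novels" ] ] ],
};

console.log( findTopicInText( topicForms, "The boys were reading." ) );
// { countWordMatches: 2, percentWordMatches: 67, keyphraseOrSynonym: 'keyphrase' }
console.log( findTopicInText( topicForms, "A great novel." ) );
// { countWordMatches: 1, percentWordMatches: 100, keyphraseOrSynonym: 'synonym' }
```

Matching per content word, rather than per exact phrase, is what lets a text containing "boys" and "reading" count towards the keyphrase "A boy reads a book".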

Refactor current researches and assessments: Cheat-sheet

  1. Pick an SEO assessment to work on. All specifications are available in the overview issue of this project.
  2. In the research of the assessment:

  • Pass researcher as an argument to the main research function that is being exported.

  • Request the results of the morphological research.

  • Adjust the content of the research function to match the specification. Notice that the helper functions can return:

    • the number of words (word forms) matched,
    • the percentage of words (word forms) matched,
    • whether the match was found with the keyword or a synonym.

  • Remember that we import from lodash-es instead of lodash and that we use export default function instead of module.exports.

  3. In the spec of your research (for now the morphology data can be supplied as an internal JSON file, but this will soon change):

  • For every spec (or every time you create a new paper), create a Researcher, supply the morphology data and use this researcher as an argument for the SEO research that you are testing.

  • Adjust the expected values of the tests.

  4. In the SEO assessment file: Adjust the criteria, boundaries and feedback strings to match the specifications.
  5. In the spec file of the SEO assessment: Adjust the expected scores and feedback strings.
  6. In the full-text specs: Adjust the expected scores and feedback strings for your assessment.
  7. In the full-text specs runner: Add researcher as a second parameter to the call of your SEO research.