Entity to Text

This suite is primarily a set of APIs and tools to improve the developer experience.

This module provides a number of utility and helper APIs for developers to transform content into plain text.

Use Entity to Text if

You need the plain-text content of Nodes to index it in a search engine (Solr, Elasticsearch, ...).
You want the plain-text content of Nodes or Paragraphs for SEO or JSON-LD.
You need to transform "Node entity" field(s) into plain-text content.
You need to transform "Paragraphs entity" field(s) into plain-text content.
You need to transform a "File entity" into plain-text content through Apache Tika.

Dependencies

The main module requires the library ezyang/htmlpurifier.

The submodule entity_to_text_tika requires the library vaites/php-apache-tika. The submodule entity_to_text_paragraphs requires the library drupal/paragraphs.

Which version should I use?

Drupal Core	Entity to Text
8.x	-
9.x	1.0.x
10.x	1.1.x
11.x	1.1.x

Getting Started

We highly recommend installing the module with Composer.

$ composer require drupal/entity_to_text

Examples

Node fields to text

Usage

/** @var string $field_body_content */
$field_body_content = \Drupal::service('entity_to_text.extractor.node_to_text')->fromFieldtoText('body', $node);
/** @var string $field_foo_content */
$field_foo_content = \Drupal::service('entity_to_text.extractor.node_to_text')->fromFieldtoText('field_foo', $node);
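The extracted string can then feed the SEO or JSON-LD use case listed earlier. The following is a minimal sketch, not part of the module: the function name example_build_article_jsonld(), the 50-word limit, and the choice of the schema.org Article type are all assumptions made for illustration. It only relies on plain PHP, so the sample text stands in for the value returned by fromFieldtoText().

```php
<?php

declare(strict_types=1);

/**
 * Builds a schema.org Article array from already-extracted plain text.
 *
 * Hypothetical helper for illustration; not provided by Entity to Text.
 */
function example_build_article_jsonld(string $headline, string $plain_text, int $max_words = 50): array {
  // Split on any whitespace so line breaks and repeated spaces collapse.
  $words = preg_split('/\s+/u', trim($plain_text), -1, PREG_SPLIT_NO_EMPTY) ?: [];
  // Keep only the first $max_words words for the description.
  $description = implode(' ', array_slice($words, 0, $max_words));

  return [
    '@context' => 'https://schema.org',
    '@type' => 'Article',
    'headline' => $headline,
    'description' => $description,
  ];
}

// In Drupal, $plain_text would be the string returned by the extractor above.
// A literal sample is used here so the sketch runs on its own.
$plain_text = "Entity to Text\n  turns   fields into plain text.";
$jsonld = example_build_article_jsonld('Hello', $plain_text, 5);

echo json_encode($jsonld, JSON_UNESCAPED_SLASHES | JSON_PRETTY_PRINT), PHP_EOL;
```

Truncating by words rather than bytes avoids cutting a multi-byte character in half.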

Paragraphs to text

Prerequisite

The entity_to_text_paragraphs module is enabled.

Usage

/** @var array[] $bodies */
$bodies = \Drupal::service('entity_to_text_paragraphs.extractor.paragraphs_to_text')->fromParagraphToText($node->field_paragraphs);

File to text

Prerequisite

Access to Apache Tika as a RESTful API via the Tika server.
The entity_to_text_tika module is enabled.
The Tika connection is configured in settings.php:

/**
 * Apache Tika connection.
 */
$settings['entity_to_text_tika.connection']['host'] = 'tika';
$settings['entity_to_text_tika.connection']['port'] = '9998';
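When the Tika host differs between environments, the same two settings can be read from environment variables. This is only a sketch of one possible setup: the variable names TIKA_HOST and TIKA_PORT are assumptions, not something the module defines, and the fallbacks are the values shown above.

```php
<?php

// settings.php already defines $settings; this line only keeps the sketch
// runnable on its own.
$settings = $settings ?? [];

// TIKA_HOST and TIKA_PORT are hypothetical variable names chosen for this
// example. getenv() returns FALSE when a variable is unset, so the defaults
// from the configuration above apply.
$settings['entity_to_text_tika.connection']['host'] = getenv('TIKA_HOST') ?: 'tika';
$settings['entity_to_text_tika.connection']['port'] = getenv('TIKA_PORT') ?: '9998';
```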

Usage

/** @var \Drupal\file\Entity\File $file */
$file = $file_item->entity;
$body = \Drupal::service('entity_to_text_tika.extractor.file_to_text')->fromFileToText($file, 'eng+fra');

For advanced usage, cache the OCR output to avoid repeated calls to Tika:

// Call this at least once anywhere in the code (e.g. in module.install) to prepare the storage.
\Drupal::service('entity_to_text_tika.storage.local_file')->prepareStorage();

// Load the already OCR'ed file, if available, to avoid unnecessary calls to Tika.
$body = \Drupal::service('entity_to_text_tika.storage.local_file')->load($file, 'eng+fra');

if (!$body) {
  // When the OCR'ed file is not available, then run Tika over it and store it for the next run.
  $body = \Drupal::service('entity_to_text_tika.extractor.file_to_text')->fromFileToText($file, 'eng+fra');
  // Save the OCR'ed file for the next run.
  \Drupal::service('entity_to_text_tika.storage.local_file')->save($file, $body, 'eng+fra');
}
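The load, extract, and save steps above follow a cache-aside pattern, which can be factored into a small helper when several call sites need it. The sketch below is illustrative only: example_load_or_extract() is a hypothetical name, and an in-memory array stands in for the storage and extractor services so the block runs without Drupal.

```php
<?php

declare(strict_types=1);

/**
 * Returns the cached text when available, otherwise extracts and stores it.
 *
 * Hypothetical helper for illustration; not provided by Entity to Text.
 */
function example_load_or_extract(callable $load, callable $extract, callable $save): string {
  $body = $load();
  // Same emptiness check as the snippet above: any falsy value means a miss.
  if (!$body) {
    $body = $extract();
    $save($body);
  }
  return $body;
}

// Stand-ins for the real services. In Drupal, the three closures would wrap
// the load(), fromFileToText() and save() calls shown above.
$cache = [];
$calls = 0;
$load = function () use (&$cache): ?string {
  return $cache['file-1:eng+fra'] ?? NULL;
};
$extract = function () use (&$calls): string {
  $calls++;
  return 'extracted text';
};
$save = function (string $body) use (&$cache): void {
  $cache['file-1:eng+fra'] = $body;
};

$first = example_load_or_extract($load, $extract, $save);
$second = example_load_or_extract($load, $extract, $save);
```

Passing callables keeps the helper independent of the service container, which also makes it easy to unit test.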

Generate OCR via CLI

The module provides a Drush command that generates OCR (Optical Character Recognition) output for all files within Drupal. Use it sparingly, as it can be resource-intensive.

Its primary purpose is to process files that have not been OCR'ed yet. It works with the cached OCR files from the advanced usage described above, and is especially useful after a fresh installation, the addition of a new OCR language, or a file migration.

# Warm up all files that do not already have an associated .ocr file.
drush e2t:t:w
# Warm up all files, even those that have already been processed.
drush e2t:t:w --force
# Warmup the file with FID 2.
drush e2t:t:w --fid=2

Supporting organizations

This project is sponsored by Antistatique, a Swiss Web Agency. Visit us at www.antistatique.net or contact us.

Credits

Entity to Text is currently maintained by Kevin Wenger. Thank you to all our wonderful contributors too.
