A PHP component to convert HTML into a plain text format

mailpoet/html2text

This branch is 4 commits ahead of, 50 commits behind soundasleep/html2text:master.

html2text

html2text is a small PHP library that loads HTML with PHP's DOM methods and then walks the resulting DOM to produce plain text. For example:

<html>
<title>Ignored Title</title>
<body>
  <h1>Hello, World!</h1>

  <p>This is some e-mail content.
  Even though it has whitespace and newlines, the e-mail converter
  will handle it correctly.

  <p>Even mismatched tags.</p>

  <div>A div</div>
  <div>Another div</div>
  <div>A div<div>within a div</div></div>

  <a href="http://foo.com">A link</a>

</body>
</html>

Will be converted into:

Hello, World!

This is some e-mail content. Even though it has whitespace and newlines, the e-mail converter will handle it correctly.

Even mismatched tags.
A div
Another div
A div
within a div
[A link](http://foo.com)

See the original blog post or the related StackOverflow answer.
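
To make the "load with DOM, then walk the tree" idea concrete, here is a minimal sketch of the technique using only PHP's built-in DOMDocument. It is not the html2text source: the function names sketch_node_to_text and sketch_html_to_text are hypothetical, the set of handled tags is deliberately tiny, and the spacing rules are simplified, so its output will not match the library's in general.

```php
<?php
// Illustrative sketch only; NOT the html2text implementation.
// Both function names below are hypothetical.

/** Recursively collect plain text from a DOM node. */
function sketch_node_to_text(DOMNode $node): string
{
    if ($node instanceof DOMText) {
        // Collapse runs of whitespace (including newlines) into one space.
        return (string) preg_replace('/\s+/', ' ', $node->nodeValue);
    }

    $name = strtolower($node->nodeName);
    // Skip elements whose content should never appear in the text output.
    if (in_array($name, ['head', 'title', 'script', 'style'], true)) {
        return '';
    }
    if (!$node->hasChildNodes()) {
        return '';
    }

    $text = '';
    foreach ($node->childNodes as $child) {
        $text .= sketch_node_to_text($child);
    }

    // Render links in a Markdown-like [text](href) form.
    if ($name === 'a' && $node instanceof DOMElement && $node->hasAttribute('href')) {
        return '[' . trim($text) . '](' . $node->getAttribute('href') . ')';
    }
    // Put a few block-level elements on their own lines.
    if (in_array($name, ['h1', 'p', 'div'], true)) {
        return "\n" . trim($text) . "\n";
    }
    return $text;
}

/** Parse an HTML string and return a simplified plain-text rendering. */
function sketch_html_to_text(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally so sloppy HTML does not emit notices.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $text = sketch_node_to_text($doc->documentElement);
    // Collapse consecutive line breaks (and surrounding spaces) into one.
    $text = (string) preg_replace('/[ \t]*\n[ \t\n]*/', "\n", $text);
    return trim($text);
}

echo sketch_html_to_text(
    '<h1>Hello</h1><p>Some   text</p><a href="http://foo.com">A link</a>'
), "\n";
```

The sketch assumes ASCII input; DOMDocument::loadHTML applies its own default encoding to non-ASCII input unless the document declares one, which a real converter has to deal with.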

Installing

You can use Composer to add the package to your project:

{
  "require": {
    "soundasleep/html2text": "~0.5"
  }
}

And then use it quite simply:

$text = Html2Text\Html2Text::convert($html);

You can also include the supplied html2text.php and use $text = convert_html_to_text($html); instead.

Tests

Some very basic tests are provided in the tests/ directory. Run them with composer install --dev && vendor/bin/phpunit.

Troubleshooting

Class 'DOMDocument' not found

Install the PHP XML extension that matches your PHP version, for example apt-get install php7.1-xml on Debian or Ubuntu.
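
If you are not sure whether the extension is loaded in the PHP runtime that actually executes your code, a quick check like the following may help. This is a sketch, not part of html2text, and the helper name sketch_dom_available is hypothetical; it relies only on PHP's built-in extension_loaded and class_exists functions.

```php
<?php
// Hypothetical helper, not part of html2text: reports whether the DOM
// extension (which provides DOMDocument) is available in this runtime.
function sketch_dom_available(): bool
{
    return extension_loaded('dom') && class_exists('DOMDocument');
}

if (sketch_dom_available()) {
    echo "DOMDocument is available\n";
} else {
    echo "DOMDocument is missing: install the PHP XML extension for PHP ",
        PHP_VERSION, "\n";
}
```

Run it through the same SAPI that reports the error (CLI, FPM, or the web server module), since each can load a different set of extensions.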

License

html2text is dual licensed under both EPL v1.0 and LGPL v3.0, making it suitable for both Eclipse and GPL projects.

Other versions

Also see html2text_ruby, a Ruby implementation.
