PixxxeL/python-html-purifier

Python HTML purifier

About

Strips tags and attributes that are not on the whitelist from HTML. The content of removed tags is kept. The whitelist has the following form:

{'enabled tag name' : ['list of enabled tag\'s attributes']}

You can use the symbol * to allow all tags and/or attributes.

Note that script and style tags are removed together with their content.

This module is built on the HTMLParser class from the Python standard library. It has no other dependencies, which can sometimes be a plus.
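To illustrate the idea, here is a minimal sketch of how whitelist filtering can be layered on top of the standard-library parser. It is not the package's actual code: the class name `WhitelistSketch`, the helper `purify`, and the treatment of a `'*'` key as "any tag" are assumptions made for this example, and it targets Python 3 (`html.parser`).

```python
from html import escape
from html.parser import HTMLParser


class WhitelistSketch(HTMLParser):
    """Hypothetical illustration, not the html-purifier implementation."""

    DROP_WITH_CONTENT = {"script", "style"}

    def __init__(self, whitelist):
        super().__init__()
        self.whitelist = whitelist
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def _allowed_attrs(self, tag):
        # Returns None when the tag is not whitelisted.
        # Assumption: a '*' key stands for "any tag".
        if tag in self.whitelist:
            return self.whitelist[tag]
        return self.whitelist.get("*")

    def handle_starttag(self, tag, attrs):
        if tag in self.DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        allowed = self._allowed_attrs(tag)
        if self.skip_depth or allowed is None:
            return  # drop the tag itself; its text still reaches handle_data
        kept = [(k, v) for k, v in attrs if "*" in allowed or k in allowed]
        rendered = "".join(
            ' %s="%s"' % (k, escape(v or "", quote=True)) for k, v in kept
        )
        self.out.append("<%s%s>" % (tag, rendered))

    def handle_endtag(self, tag):
        if tag in self.DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and self._allowed_attrs(tag) is not None:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # The parser hands over unescaped text, so escape it again on output.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def purify(html, whitelist):
    parser = WhitelistSketch(whitelist)
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.out)
```

The sketch keeps the text of dropped tags because only the tag handlers consult the whitelist, while `handle_data` always emits text unless a script or style element is open. Self-closing tags, comments, and malformed markup are not handled specially here.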

Some additional information is available on my blog

Package on PyPI

Installation

$ pip install html-purifier

Basic Usage

>>> from purifier.purifier import HTMLPurifier
>>> purifier = HTMLPurifier({
...     'div': ['*'],  # all attributes are allowed on div
...     'span': ['attr-2'],  # only the "attr-2" attribute is allowed on span
...     # all other tags and attributes are removed, but their content is kept
... })
>>> print(purifier.feed('<div class="e1" id="e1">Some <b>HTML</b> for <span attr-1="1" attr-2="2">purifying</span></div>'))
<div class="e1" id="e1">Some HTML for <span attr-2="2">purifying</span></div>

Django Usage

Use the fields as you would any other field in models and forms. The package provides purifier.models.PurifyedCharField and purifier.models.PurifyedTextField for the Django ORM, and purifier.forms.PurifyedCharField for Django forms.
