Python HTML purifier

About

Removes tags and attributes that are not on the whitelist from HTML. The content of removed tags is kept. The whitelist has the following form:

{'enabled tag name' : ['list of enabled tag\'s attributes']}

You can use the symbol * to allow all tags and/or attributes.

Note that script and style tags are removed together with their content.

This module is built on the HTMLParser class from the Python standard library. It has no other dependencies, which can sometimes be a plus.
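To make the whitelist rules above concrete, here is a minimal sketch of how such a filter could be built on the standard library's HTMLParser. It is not this package's implementation: the names WhitelistSketch and sketch_purify are hypothetical, the sketch targets Python 3, and treating a '*' key as "any tag" is an assumption about how the wildcard behaves.

```python
from html import escape
from html.parser import HTMLParser


class WhitelistSketch(HTMLParser):
    """Illustrative whitelist filter (hypothetical, not the package's code)."""

    DROP_WITH_CONTENT = ("script", "style")

    def __init__(self, whitelist):
        super().__init__()  # convert_charrefs=True: text arrives unescaped
        self.whitelist = whitelist
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside script/style

    def _allowed_attrs(self, tag):
        # Returns the allowed attribute list, or None if the tag is not allowed.
        # Assumption: a '*' key in the whitelist matches any tag.
        if tag in self.whitelist:
            return self.whitelist[tag]
        return self.whitelist.get("*")

    def handle_starttag(self, tag, attrs):
        if tag in self.DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        allowed = self._allowed_attrs(tag)
        if self.skip_depth or allowed is None:
            return  # drop the tag itself; its text is still emitted later
        kept = [(k, v) for k, v in attrs if "*" in allowed or k in allowed]
        rendered = "".join(
            ' %s="%s"' % (k, escape(v or "", quote=True)) for k, v in kept
        )
        self.out.append("<%s%s>" % (tag, rendered))

    def handle_endtag(self, tag):
        if tag in self.DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and self._allowed_attrs(tag) is not None:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape text, since the parser handed it over unescaped.
            self.out.append(escape(data, quote=False))


def sketch_purify(markup, whitelist):
    parser = WhitelistSketch(whitelist)
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

Calling sketch_purify(markup, {'div': ['*'], 'span': ['attr-2']}) keeps every attribute on div, keeps only attr-2 on span, and unwraps every other tag while leaving its text in place. Self-closing tags, comments and malformed markup are not handled specially here.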

Additional information is available on my blog

Package on PyPI

Installation

$ pip install html-purifier

Basic Usage

>>> from purifier.purifier import HTMLPurifier
>>> purifier = HTMLPurifier({
...     'div': ['*'],  # all attributes are allowed for div
...     'span': ['attr-2'],  # only the "attr-2" attribute is allowed for span
...     # all other tags and attributes are removed, but their content is kept
... })
>>> print(purifier.feed('<div class="e1" id="e1">Some <b>HTML</b> for <span attr-1="1" attr-2="2">purifying</span></div>'))
<div class="e1" id="e1">Some HTML for <span attr-2="2">purifying</span></div>

Django Usage

Use the fields in models and forms as usual. The package provides purifier.models.PurifyedCharField and purifier.models.PurifyedTextField for the Django ORM, and purifier.forms.PurifyedCharField for Django forms.