All notable changes to this project will be documented in this file.
We are excited to release mf2py 2.0. This release incorporates months of work from contributors, informed by active discussions among implementers and users.
This release officially deprecates support for Python versions older than 3.8.
Below are the changes we have made in this release.
- Enable `img_with_alt` by default (#184)
- Add timezone offset normalisation (#206)
- Add option for exposing DOM for embedded properties (#208)
- Add srcset support (#209)
- Add language support (#210)
- Add option for filtering root class names (#211)
- Add option for metaformats support (#213)
- Remove `img_with_alt` option entirely (#200)
- Resolve implied photo relative paths (#205)
- Make relative URLs in embedded properties absolute (#201)
- Fix whitespace in plaintext conversion (#207)
- Replace `dict_class` with standard `dict` (#196)
- Update tests to include alt texts by default (#190)
- Add Windows and macOS tests (#198)
- Use poetry for dependency management (#189)
- Deprecate Python 2 support (#179)
- Lint code with `black` and `isort`
- Add linting CI actions (#193)
- Move from `nosetests` to `pytest` (#186)
- Add Python 3.11 and 3.12 and drop PyPy from the test matrix; upgrade poetry action (#204)
- Prepare tests to test options (#214)
- Bring README doctests up-to-date (#215)
- reduce instances where photo is implied (#135)
- always do relative URL resolution (#138)
- VCP now handles tz offsets without leading zeros (#142)
- implement id parsing (#143)
- fix outdated syntax causing SyntaxWarning (#157)
- add parsing for iframe.u-*[src] (#116)
- bug fix: reduced implied urls (#117)
- bug fix: don't collapse whitespace between tags
- specify explicit versions for dependencies
- revert BeautifulSoup copying added in 1.1.1 due to bugs (e.g. #108)
- misc performance improvements
- streamline backcompat to use JSON only.
- fix multiple mf1 root rel-tag parsing
- correct url and photo for hreview.
- add rules for nested hreview. update backcompat to use multiple matches in old properties.
- fix `rel-tag` to `p-category` conversion so that other classes are not lost.
- use original authored html for `e-*` parsing in backcompat
- make classes and rels into unordered (alphabetically ordered) deduped arrays.
- only use class names for mf2 which follow the naming rules
- fix `parse` method to use default html parser.
- always use the first value for attributes for rels.
- correct AM/PM conversion in datetime value class pattern.
- add ordinal date parsing to datetimes value class pattern. ordinal date is normalised to YYYY-MM-DD
- remove hack for html tag classes since that is fixed in new BS
- better whitespace algorithm for `name` and `html.value` parsing
- experimental flag for including `alt` in `u-photo` parsing
- make a copy of the BeautifulSoup given by user to work on for parsing to prevent changes to original doc
- bump version to 1.1.1
- bump version to 1.1.0 since it is a "major" change
- added tests for new implied name rules
- modified earlier tests to accommodate new rules
- use space separator instead of "T"
- Don't add "00" seconds unless authored
- use TZ authored in separate `value` element
- only use first found `value` of a particular type: `date`, `time`, or `timezone`.
- move backcompat rules into JSON files
- reorganise value class pattern parsing into new files
- add datetime_helpers to organise datetime parsing rules
- reorganise tests
- remove Heroku frontend, point to mf2py-web and python.microformats.io instead in README.
- remove Flask and gunicorn requirements
- add debug info with description, version, url and the html parser used
- strip leading/trailing white space for `e-*[html]`. update the corresponding tests
- blank values explicitly authored are allowed as property values
- include `alt` or `src` from `<img>` in parsing for `p-*` and `e-*[value]`
- parse `title` from `<link>` for `p-*`, resolves #84
- also parse `poster` from `<video>` for `u-*`, resolves #76
- use `html5lib` as default parser
- use the final redirect URL, resolves #62
- update requirements to use BS4 v4.6.0 and html5lib v1.0.1
- drop support for Python 2.6 as html5lib dropped support
- Implied property checks now ignore alt="", treating it the same as if no alt value is defined.
- Support for using a custom dict implementation by setting mf2py.Parser.dict_class. collections.OrderedDict yields much nicer output for hosted parsers.
- Performance improvement changing simple calls to soup.find_all to a manual iteration over .contents.
- Performance improvement by limiting number of calls to soup.find_all in backcompat module. Should not be any functional changes.
- Backward compatibility parsing for rel=tag properties. These are now converted to p-category based on the last path segment of the tag URI as spec'd in http://microformats.org/wiki/h-entry#Parser_Compatibility
- Optional property html_parser to specify the html parser that BeautifulSoup should use (e.g., "lxml" or "html5lib")
- `u-*` properties are now parsed from `<link>` elements per the updated spec http://microformats.org/wiki/microformats2-parsing-issues#link_elements_and_u-_parsing
- Version number bumped to 1.0.0 following community discussion.
- Stricter checks that `Parser.__init__` params are actually None before ignoring them.
- Now produces unicode strings for every key and value, no more byte strings anywhere.
- Do not add 'T' between date and time when normalizing dates
- Unit tests for running the microformats test suite
- New top-level "rel-urls" entry, contains rich data parsed from rel links, organized by URL.
- convenience method `mf2py.parse` that takes the same arguments as Parser and returns a dict.
- nested h-* classes now parse their "value" based on the property they represent (p-*, u-*, dt-*), so for example "p-in-reply-to h-cite" would have a name as its value and "u-in-reply-to h-cite" will have a URL.
- Add rel=bookmark to backward-compat parsing rules (translated to u-url in mf2)
- Parser constructor now takes explicit named arguments instead of **kwargs, for saner behavior when called with unnamed arguments.
- Bugfix: Empty href="" attributes are now properly interpreted as the current document's URL.
- Minor Py3 compatibility fix
- Correct typo `test_requires` -> `tests_require` in setup.py
- Started keeping a changelog!
- Use a better method for extracting HTML for an e-* property
- Correct BeautifulSoup4 dependency in setup.py to fix error with installation from PyPI.
- Buffed up docstrings for public methods.