Description
Hi
I'm in the middle of migrating from MIME::Lite to Email::MIME, but I have run into performance issues, so I profiled this sample code:
```perl
#!/usr/bin/env perl
use utf8;
use strict;
use warnings;
use Email::MIME;

# Build and serialize the same small message 10,000 times under the profiler.
Email::MIME->create(
    'attributes' => {
        'content_type' => 'text/plain',
        'disposition'  => 'attachment',
        'charset'      => 'UTF-8',
        'encoding'     => 'quoted-printable',
    },
    'header_str' => [
        'From'    => 'me@example.com',
        'To'      => 'you@example.com',
        'Subject' => 'Zażółć gęślą jaźń',
    ],
    'body_str' => 'Zażółć gęślą jaźń',
)->as_string for 1..10000;
```

The full result is available at http://bbkr.org/email_mime_prof/, and here are the parts that can be improved:
MIME encoding is a giant performance killer
maybe_mime_encode_header took a quarter of the total time (inclusive).
Looking deeper, I found that the bottleneck here has two causes: memory is reallocated every time a single character is chopped from the beginning of $text (why not use a simple split?), and encoding those characters one at a time floods Encode with calls.
http://bbkr.org/email_mime_prof/Email-MIME-Encode-pm-27-line.html#104
Possible optimisation: you can estimate how much Base64 encoding will bloat the output string, since the MIME::Base64::encoded_base64_length function can tell you that. Then chop many characters from $text at once, which will greatly reduce the number of calls to Encode.
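A minimal sketch of that idea, assuming a hypothetical helper encode_words_chunked (not part of Email::MIME) that produces RFC 2047 "B" encoded-words of at most 75 characters. It guesses a whole chunk of characters, encodes it once, checks the fit with encoded_base64_length, and shrinks the chunk in proportion to the overshoot, so Encode is called a few times per encoded-word rather than once per character. Chunks are always whole characters, so each word decodes to valid UTF-8 on its own.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);
use MIME::Base64 qw(encode_base64 encoded_base64_length);

# Hypothetical helper (not Email::MIME's API): split $text into "B"
# encoded-words of at most $max_len characters, encoding whole chunks
# of characters instead of one character at a time.
sub encode_words_chunked {
    my ($text, $charset, $max_len) = @_;
    $charset //= 'UTF-8';
    $max_len //= 75;    # assumed per-word limit (RFC 2047 allows 75)

    my $overhead  = length "=?$charset?B??=";
    # Largest byte count whose Base64 form still fits next to the overhead.
    my $max_bytes = int(($max_len - $overhead) / 4) * 3;
    # Leave room for at least one 4-byte UTF-8 character per word.
    die "max_len too small for charset\n" if $max_bytes < 6;

    my @words;
    my $pos = 0;
    while ($pos < length $text) {
        # First guess: one byte per character, the most that could fit.
        my $n = length($text) - $pos;
        $n = $max_bytes if $n > $max_bytes;
        my $bytes = encode($charset, substr($text, $pos, $n));

        # Too long? Shrink in proportion to the overshoot and re-encode.
        while ($n > 1 && $overhead + encoded_base64_length($bytes, '') > $max_len) {
            $n = int($n * $max_bytes / length $bytes) || 1;
            $bytes = encode($charset, substr($text, $pos, $n));
        }

        push @words, "=?$charset?B?" . encode_base64($bytes, '') . '?=';
        $pos += $n;
    }
    return @words;
}

print "$_\n" for encode_words_chunked('Zażółć gęślą jaźń' x 3);
```

The proportional shrink always makes the chunk strictly smaller, so the inner loop terminates; how encoded-words are then joined and folded into the header is left out of this sketch.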
header is parsed even if given
http://bbkr.org/email_mime_prof/Email-Simple-pm-6-line.html#108
This is a pure waste of CPU cycles: when Email::MIME->create calls Email::Simple->new, the end of the header is already known, so there is no need to invoke a costly regexp.
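To illustrate the difference, here is a sketch built on a hypothetical Hypothetical::Message class (not Email::Simple). The parse path must search the text for the blank line that ends the header; a create path, where the caller already holds header and body separately, can store them directly and skip that search.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical message class, not Email::Simple.
package Hypothetical::Message;

sub new_from_string {
    my ($class, $text) = @_;
    # Regexp search for the first empty line (CRLF or LF line endings).
    my ($head, $body) = $text =~ /\A(.*?\x0d?\x0a)\x0d?\x0a(.*)\z/s
        or die "no header/body separator\n";
    return bless { head => $head, body => $body }, $class;
}

sub new_from_parts {
    my ($class, $head, $body) = @_;
    # Boundary already known: store the parts directly, no regexp needed.
    return bless { head => $head, body => $body }, $class;
}

package main;

my $msg = Hypothetical::Message->new_from_parts("From: me\x0d\x0a", "Hi\x0d\x0a");
print $msg->{head}, "\x0d\x0a", $msg->{body};
```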
folding header
http://bbkr.org/email_mime_prof/Email-Simple-Header-pm-11-line.html#322
It reallocates the string on every regexp substitution, which is costly.
Possible optimisation 1: a cheap early return $line . $crlf if length $line < $limit. Most headers are NOT folded, so this should save a great many CPU cycles.
Possible optimisation 2: use cheaper substr calls (combined with index to locate whitespace) instead of substitution.
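Optimisations 1 and 2 could be combined roughly as follows. This is a sketch using a hypothetical fold_line helper, not Email::Simple::Header's code, and it assumes a 78-character default limit, folding only at spaces, and that each continuation line starts with the space it was split on (so unfolding just removes the CRLF).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper; the 78-character default limit is an assumption.
sub fold_line {
    my ($line, $limit, $crlf) = @_;
    $limit //= 78;
    $crlf  //= "\x0d\x0a";

    # Optimisation 1: most headers fit, so return without any scanning.
    return $line . $crlf if length $line <= $limit;

    my ($out, $start, $len) = ('', 0, length $line);
    while ($len - $start > $limit) {
        # Optimisation 2: locate the break with rindex/index and copy with
        # substr, instead of repeatedly substituting inside one string.
        my $space = rindex $line, ' ', $start + $limit;
        if ($space <= $start) {
            # No space in range: break at the next space rather than
            # inside a word (that line then exceeds $limit).
            $space = index $line, ' ', $start + 1;
            last if $space < 0;
        }
        $out  .= substr($line, $start, $space - $start) . $crlf;
        $start = $space;    # continuation line begins with the space
    }
    return $out . substr($line, $start) . $crlf;
}

print fold_line('Subject: ' . join(' ', ('lorem') x 20), 40);
```

Each pass only scans backwards from the limit and copies one slice, so the cost is proportional to the header length, and the common short-header case does a single length check.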
Possible optimisation 3: use formats. Folding can be expressed as:

```perl
format FOLDED =
^<<<<<<<<<<
$string
# ~~ repeats this continuation line until $string is exhausted
 ^<<<<<<<<< ~~
$string
.
```

However, I don't know how to make this solution respect word boundaries.
constants as methods
Some methods, such as CRLF (http://bbkr.org/email_mime_prof/Email-Simple-Header-pm-11-line.html#286), are called very often. I'm not sure there is a real need to allow subclasses to override this method when the CRLF value could also be passed to the constructor. Reading $self->{'mycrlf'} in the Email::Simple::Header package would be much faster.
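A sketch of the constructor-argument approach, using a hypothetical Hypothetical::Header class (not Email::Simple::Header). The line ending is stored once in $self->{mycrlf}; the serialisation loop reads that hash slot instead of making a method call per line, and a subclass or caller can still choose a different line ending by passing crlf to new.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical header class, not Email::Simple::Header.
package Hypothetical::Header;

sub new {
    my ($class, %args) = @_;
    return bless {
        mycrlf => $args{crlf} // "\x0d\x0a",
        fields => $args{fields} // [],    # [ [name, value], ... ]
    }, $class;
}

# Still available as a method for callers that want it.
sub crlf { return $_[0]{mycrlf} }

sub as_string {
    my ($self) = @_;
    my $crlf = $self->{mycrlf};    # one lookup, reused for every field
    return join '', map { "$_->[0]: $_->[1]$crlf" } @{ $self->{fields} };
}

package main;

my $h = Hypothetical::Header->new(crlf => "\x0a", fields => [ [ 'From', 'me@example.com' ] ]);
print $h->as_string;
```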
explosion of regexps
http://bbkr.org/email_mime_prof/Email-Simple-Header-pm-11-line.html#69
I'm not sure what it does, but running 18 regexps for even the simplest message burns CPU.
content type attribute setters dynamically dispatched
http://bbkr.org/email_mime_prof/Email-MIME-pm-5-line.html#527
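A new method name is composed for every content type attribute, and then the method is invoked dynamically.
Possible optimisation: a static hash that maps each attribute name to a method reference (or to a closure, to allow subclassing) would be much faster:

```perl
use feature 'state';
# Built once; each closure calls the named setter, so subclasses can still override it.
state $dispatch = {
    'name1' => sub { $_[0]->set_name1($_[1]) },
};
my $setter = $dispatch->{$name} or die "unknown attribute: $name";
return $setter->($self, $value);
```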
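For a side-by-side comparison, here is a self-contained sketch with a hypothetical Hypothetical::Part class (not Email::MIME) whose setters are assumed to be named charset_set and name_set. The dynamic version builds a method-name string on every call; the table version builds its name-to-closure map once, and because the closures call the setters as methods, subclass overrides are still honoured.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'state';

# Hypothetical class, not Email::MIME.
package Hypothetical::Part;

sub new         { return bless { ct => {} }, shift }
sub charset_set { $_[0]{ct}{charset} = $_[1]; return $_[0] }
sub name_set    { $_[0]{ct}{name}    = $_[1]; return $_[0] }

# Current style: compose a method name string and call it dynamically.
sub set_attr_dynamic {
    my ($self, $attr, $value) = @_;
    my $method = "${attr}_set";
    return $self->$method($value);
}

# Suggested style: a name => closure table built once per process.
sub set_attr_table {
    my ($self, $attr, $value) = @_;
    state $dispatch = {
        charset => sub { $_[0]->charset_set($_[1]) },
        name    => sub { $_[0]->name_set($_[1]) },
    };
    my $setter = $dispatch->{$attr} or die "unknown attribute: $attr\n";
    return $setter->($self, $value);
}

package main;

my $part = Hypothetical::Part->new;
$part->set_attr_table(charset => 'UTF-8');
$part->set_attr_dynamic(name => 'report.txt');
print "$part->{ct}{charset} $part->{ct}{name}\n";
```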
content type is parsed even if given explicitly to the Email::MIME->create method
http://bbkr.org/email_mime_prof/Email-MIME-pm-5-line.html#462
http://bbkr.org/email_mime_prof/Email-MIME-pm-5-line.html#822
That invokes lots of completely useless regexps.
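One way to avoid that, sketched with a hypothetical Hypothetical::CTPart class (not Email::MIME): attributes that arrive explicitly are stored in structured form and only rendered, while the regexp-based parser runs only for a raw header value. The parser here is deliberately naive (no quoted semicolons, no RFC 2231 continuations).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical class, not Email::MIME.
package Hypothetical::CTPart;

# Explicit attributes: stored as-is, nothing to parse.
sub from_attributes {
    my ($class, %attr) = @_;
    my $type = delete $attr{content_type} // 'text/plain';
    return bless { type => lc $type, params => \%attr }, $class;
}

# Raw header value: the only path that runs regexps.
sub from_header {
    my ($class, $value) = @_;
    my ($type, @rest) = split /\s*;\s*/, $value;
    my %params;
    for my $pair (@rest) {
        my ($k, $v) = split /=/, $pair, 2;
        $v //= '';
        $v =~ s/\A"(.*)"\z/$1/;    # strip surrounding quotes
        $params{ lc $k } = $v;
    }
    return bless { type => lc $type, params => \%params }, $class;
}

# Render the header value; parameters are sorted for stable output.
sub header_value {
    my ($self) = @_;
    my $p = $self->{params};
    return join '; ', $self->{type}, map { my $v = $p->{$_}; qq{$_="$v"} } sort keys %$p;
}

package main;

print Hypothetical::CTPart->from_attributes(content_type => 'text/plain', charset => 'UTF-8')->header_value, "\n";
```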
There are also a few memory-related issues (the module requires more than twice the message size for its str/raw internal storage), which are not included in this report.