Performance issues #27

@bbkr

Hi

I'm in the middle of migrating from MIME::Lite to Email::MIME, but I have run into performance issues, so I profiled this sample code:

#!/usr/bin/env perl

use utf8;
use strict;
use warnings;

use Email::MIME;

Email::MIME->create(
    'attributes' => {
        'content_type' => 'text/plain',
        'disposition'  => 'attachment',
        'charset'      => 'UTF-8',
        'encoding'     => 'quoted-printable',
    },
    'header_str' => [
        'From' => 'me@example.com',
        'To' => 'you@example.com',
        'Subject' => 'Zażółć gęślą jaźń',
        ],
    'body_str' => 'Zażółć gęślą jaźń'
)->as_string for 1..10000;

The full result is available at http://bbkr.org/email_mime_prof/. Here are the parts that can be improved:

MIME encoding is a giant performance killer

maybe_mime_encode_header took 1/4 of the total time (inclusive).
Looking deeper, I found that the bottleneck here has two causes: memory is reallocated by chopping single characters off the beginning of $text (why not use a simple split?), and each of those single characters is encoded separately, which floods Encode with calls.
http://bbkr.org/email_mime_prof/Email-MIME-Encode-pm-27-line.html#104
Possible optimisation: estimate how much the Base64 encoding will bloat the output string (the MIME::Base64::encoded_base64_length function can tell you that) and chop many characters from $text at once. That will greatly reduce the number of calls to Encode.
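
To make the suggestion concrete, here is a rough sketch of chunked encoding. encode_header_chunked is a hypothetical helper, not an Email::MIME function, and it assumes UTF-8 with Base64 ("B") encoded-words of at most 75 characters. The byte budget is derived from the 3-bytes-to-4-characters Base64 ratio, which is the same figure MIME::Base64::encoded_base64_length reports, so Encode is called once per chunk instead of once per character.

```perl
use strict;
use warnings;
use Encode ();
use MIME::Base64 ();

# Hypothetical helper (not part of Email::MIME): split $text into
# RFC 2047 encoded-words, encoding one whole chunk at a time.
sub encode_header_chunked {
    my ($text, $max_len) = @_;
    $max_len //= 75;                        # assumed limit for one encoded-word
    my ($head, $tail) = ('=?UTF-8?B?', '?=');

    # Base64 turns every 3 bytes into 4 characters, so this is the largest
    # number of UTF-8 bytes that still fits between $head and $tail.
    my $max_bytes = int(($max_len - length($head) - length($tail)) / 4) * 3;

    my @chunks;
    my ($start, $bytes) = (0, 0);
    for my $i (0 .. length($text) - 1) {
        my $cp = ord substr($text, $i, 1);
        # UTF-8 width computed from the code point, no Encode call needed
        my $w = $cp < 0x80 ? 1 : $cp < 0x800 ? 2 : $cp < 0x10000 ? 3 : 4;
        if ($bytes && $bytes + $w > $max_bytes) {
            push @chunks, substr($text, $start, $i - $start);
            ($start, $bytes) = ($i, 0);
        }
        $bytes += $w;
    }
    push @chunks, substr($text, $start) if $start < length $text;

    # One Encode call and one Base64 call per chunk
    return map {
        $head . MIME::Base64::encode_base64(Encode::encode('UTF-8', $_), '') . $tail
    } @chunks;
}
```

The caller would still have to join the returned words with folding whitespace; that part is left out here.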

header is parsed even if given

http://bbkr.org/email_mime_prof/Email-Simple-pm-6-line.html#108
This is a pure waste of CPU cycles: when Email::MIME->create calls Email::Simple->new, the end of the header is already known, so there is no need to invoke a costly regexp.

folding header

http://bbkr.org/email_mime_prof/Email-Simple-Header-pm-11-line.html#322
It uses costly string reallocation on regexp replace.
Possible optimisation 1: a cheap return $line . $crlf if length $line < $limit. Most headers are NOT folded, so this should save a great number of CPU cycles.
Possible optimisation 2: use cheaper substr calls (combined with index to locate whitespace) instead of substitution.
Possible optimisation 3: use formats. Folding can be expressed as

format FOLDED =
# first line of the header
^<<<<<<<<<<
$string
# indented continuation lines; ~~ repeats until $string is exhausted
  ^<<<<<<<<< ~~
$string
.

However, I don't know how to respect word boundaries in this solution.
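
Optimisations 1 and 2 could look roughly like the sketch below. fold_line is a hypothetical name, not the actual Email::Simple::Header code, and the defaults (78 characters, CRLF) are assumptions. It only breaks at spaces, not tabs, and it places the line break in front of existing whitespace so that unfolding gives back the original line.

```perl
use strict;
use warnings;

# Hypothetical replacement for the folding routine: no regexp substitution,
# only length, rindex and substr.
sub fold_line {
    my ($line, $limit, $crlf) = @_;
    $limit //= 78;          # assumed maximum line length
    $crlf  //= "\x0d\x0a";

    # Fast path: most headers are short and never need folding.
    return $line . $crlf if length($line) <= $limit;

    my $out = '';
    my $pos = 0;
    my $len = length $line;
    while ($len - $pos > $limit) {
        # Last space that keeps the current piece within the limit
        my $cut = rindex($line, ' ', $pos + $limit);
        last if $cut <= $pos;   # nowhere to break; leave the rest unfolded
        $out .= substr($line, $pos, $cut - $pos) . $crlf;
        $pos = $cut;            # the space itself starts the continuation line
    }
    return $out . substr($line, $pos) . $crlf;
}

print fold_line('Subject: ' . join(' ', ('word') x 40));
```

Because the break always falls on a space, this version respects word boundaries, at the cost of leaving over-long words unfolded.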

constants as methods

Some methods, such as CRLF http://bbkr.org/email_mime_prof/Email-Simple-Header-pm-11-line.html#286, are called very often. I'm not sure there is a real need to subclass this method if the CRLF value can also be passed to the constructor. Using $self->{'mycrlf'} in the header package will be much faster.
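
As an illustration of the idea only, here is a minimal sketch. My::Header is a made-up class, not Email::Simple::Header, and the crlf constructor argument is an assumption. The line ending is stored on the object once and read with a single hash lookup instead of one method call per header line.

```perl
use strict;
use warnings;

package My::Header;    # hypothetical class for illustration

sub new {
    my ($class, %arg) = @_;
    # The line ending is fixed at construction time and kept on the object.
    return bless { mycrlf => $arg{crlf} // "\x0d\x0a", lines => [] }, $class;
}

sub add {
    my ($self, $name, $value) = @_;
    push @{ $self->{lines} }, "$name: $value";
    return $self;
}

sub as_string {
    my ($self) = @_;
    my $crlf = $self->{mycrlf};    # one hash lookup, reused for every line
    return join '', map { $_ . $crlf } @{ $self->{lines} };
}

package main;

print My::Header->new(crlf => "\n")->add(From => 'me@example.com')->as_string;
```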

explosion of regexps

http://bbkr.org/email_mime_prof/Email-Simple-Header-pm-11-line.html#69
I'm not sure what it does, but 18 regexps for even the simplest message burn CPU.

content type attribute setters dynamically dispatched

http://bbkr.org/email_mime_prof/Email-MIME-pm-5-line.html#527
A new method name is composed for every content type attribute and then dynamic invocation is used.
Possible optimisation 1: a static hash that maps each attribute name to a method reference (or to a closure, to allow subclassing) would be much faster:

use feature 'state';    # initialising a state hash needs perl 5.28+

state %dispatch = (
    'name1' => sub { $_[0]->set_name1($_[1]) },
);
return $dispatch{$name}->($self, $value);

content type is parsed even if given explicitly to the Email::MIME->create method

http://bbkr.org/email_mime_prof/Email-MIME-pm-5-line.html#462
http://bbkr.org/email_mime_prof/Email-MIME-pm-5-line.html#822
That invokes tons of completely useless regexps.

There are also a few memory-related issues (the module requires over twice the message size for its str/raw internal storage) that are not included in this report.
