
Bug #36806

Performance issue with large plainlist

Added by Tomas Norre Mikkelsen about 1 year ago. Updated about 1 year ago.

Status: Closed
Start date: 2012-05-03
Priority: Must have
Due date:
Assignee: Ivan Dharma Kartolo
% Done: 100%
Category: -
Spent time: -
Target version: -
TYPO3 Version:
PHP Version:
Votes: 0

Description

With a large plainlist (e.g. 15,000+ recipients), the function tx_directmail_static::cleanPlainList($plainlist) in class.tx_directmail_static.php is very slow, and for lists of 50,000+ recipients the server times out (runtime is 30+ seconds).

With inspiration (read: mostly copy/paste) from this comment, http://dk.php.net/manual/en/function.array-unique.php#97285, the function can be rewritten and made a lot faster.

This increased the performance for cleanPlainList() from timeout to ~3 seconds for a list of 50000+ records.
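The patch itself is only attached, not shown here, so the following is just a rough sketch of the technique from the linked php.net comment, using the expression quoted later in this thread. The helper name dedupeRows() and the sample addresses are made up for illustration; this is not the extension's actual code.

```php
<?php
// Hypothetical helper, not the real cleanPlainList(): drops rows that are
// identical in every field by comparing their serialized form.
function dedupeRows(array $plainlist): array
{
    // serialize() turns each row into a string that array_unique() can compare;
    // unserialize() then restores the rows that survived.
    return array_map('unserialize', array_unique(array_map('serialize', $plainlist)));
}

// Made-up sample data in the email/name shape used by direct_mail.
$plainlist = [
    ['email' => 'a@domain.tld', 'name' => ''],
    ['email' => 'b@domain.tld', 'name' => ''],
    ['email' => 'a@domain.tld', 'name' => ''],
];

$clean = dedupeRows($plainlist);
echo count($clean), "\n"; // two rows remain, under their original keys 0 and 1
```

array_unique() keeps the first occurrence of each value and preserves keys, so the surviving rows keep their original positions.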

I have attached a patch so the rest of you can test this optimization. Let me know how your tests go, and feel free to ask if you have any questions.

Test details:
Allowed Memory Size: 128M
Timeout: 30 sec
plainlist size: 50000+ recipients

performance.patch (725 Bytes) Tomas Norre Mikkelsen, 2012-05-03 11:46

compareArrayUnique.php (318.5 kB) Ivan Dharma Kartolo, 2012-05-05 00:15

Associated revisions

Revision 61804
Added by Ivan Dharma Kartolo about 1 year ago

  • Bug #36806: Speed boosting on cleaning duplicates in plain list array (thx to Tomas Norre Mikkelsen)

Revision 37b1520b
Added by Ivan Dharma Kartolo about 1 year ago

  • Bug #36806: Speed boosting on cleaning duplicates in plain list array (thx to Tomas Norre Mikkelsen)

git-svn-id: https://svn.typo3.org/TYPO3v4/Extensions/direct_mail/branches/swiftmailer@61804 735d13b6-9817-0410-8766-e36946ffe9aa

Revision 61813
Added by Ivan Dharma Kartolo about 1 year ago

  • Bug #36806: Speed boosting on cleaning duplicates in plain list array (thx to Tomas Norre Mikkelsen)

Revision 43495f6b
Added by Ivan Dharma Kartolo about 1 year ago

  • Bug #36806: Speed boosting on cleaning duplicates in plain list array (thx to Tomas Norre Mikkelsen)

git-svn-id: https://svn.typo3.org/TYPO3v4/Extensions/direct_mail/branches/swiftmailer@61813 735d13b6-9817-0410-8766-e36946ffe9aa

History

Updated by Ivan Dharma Kartolo about 1 year ago

  • Status changed from New to Under Review
  • Assignee set to Tomas Norre Mikkelsen

Hi,

Thanks for the patch. I guess the foreach block should be removed as well, right?

I have also given this problem some thought, and there is another solution that I haven't tested yet:

array_flip(array_flip(array_reverse($input,true)));

It could be much faster, since the flipping preserves only one entry per value.
The reverse is there to take the last element and remove all the previous elements (if the key matters).
What do you think? Can you run it against your 50k data set and check whether it's faster?
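To make the flip trick easier to follow, here is a small sketch on a flat list (the addresses are made up). It only illustrates how the three calls interact; it says nothing about whether the approach fits the real data.

```php
<?php
// Made-up flat list with one duplicate value.
$input = [0 => 'a@domain.tld', 1 => 'b@domain.tld', 2 => 'a@domain.tld'];

// Step 1: reverse while preserving keys
//   [2 => 'a@domain.tld', 1 => 'b@domain.tld', 0 => 'a@domain.tld']
$reversed = array_reverse($input, true);

// Step 2: values become keys; a repeated value overwrites the earlier entry
//   ['a@domain.tld' => 0, 'b@domain.tld' => 1]
$flipped = array_flip($reversed);

// Step 3: flip back to key => value
$result = array_flip($flipped);

print_r($result); // [0 => 'a@domain.tld', 1 => 'b@domain.tld']
```

In this sketch the key that survives for the duplicated value is 0, i.e. the key of its first occurrence in the original list.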

Either way, I will take this patch into the swiftmailer branch, since that will be the next release.

Updated by Tomas Norre Mikkelsen about 1 year ago

Hi.

First, no problem. I just ran into the problem and found a solution, so I had to share :)

Actually, I'm not sure whether the foreach can be skipped or not, but I will test it of course.

I will test your suggestion too, but I can't do that before Monday at work.

When will the next version be released?

Updated by Ivan Dharma Kartolo about 1 year ago

I compared the two implementations using "only" an array of 10k records.

Array flipping is roughly 3 times faster:

1 start: 1336169168.2174
1 end: 1336169168.2537
1: 0.036357879638672
1 count:9742
2 start: 1336169173.2538
2 end: 1336169173.2669
2: 0.013092041015625
2 count:9742

1 => array_map("unserialize", array_unique(array_map("serialize", $plainlist)));
2 => array_flip(array_flip(array_reverse($plainlist,true)));

My test implementation is attached.
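The attached compareArrayUnique.php is not reproduced in this thread, so the following is only a guess at what such a comparison could look like. The timeIt() helper and the generated test data are assumptions; timings will differ from machine to machine.

```php
<?php
// Hypothetical test data: a flat list of 10,000 strings with some duplicates.
$plainlist = [];
for ($i = 0; $i < 10000; $i++) {
    $plainlist[] = 'test' . mt_rand(0, 200000) . '@domain.tld';
}

// Hypothetical helper: runs $fn on $data, returns [elapsed seconds, result count].
function timeIt(callable $fn, array $data): array
{
    $start = microtime(true);
    $result = $fn($data);
    return [microtime(true) - $start, count($result)];
}

// Variant 1: serialize + array_unique
[$t1, $c1] = timeIt(
    fn(array $d) => array_map('unserialize', array_unique(array_map('serialize', $d))),
    $plainlist
);

// Variant 2: double array_flip on the reversed list
[$t2, $c2] = timeIt(
    fn(array $d) => array_flip(array_flip(array_reverse($d, true))),
    $plainlist
);

printf("1: %.6f s, count: %d\n", $t1, $c1);
printf("2: %.6f s, count: %d\n", $t2, $c2);
```

On a flat list of strings both variants should report the same count, which is a quick sanity check before comparing the timings.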

I can't tell you the ETA... It's better to take a little more time for testing than to release half-baked software :) Since integrating swiftmailer means restructuring almost the whole dmailer class, I need to test it thoroughly :)

Updated by Ivan Dharma Kartolo about 1 year ago

Tomas Norre Mikkelsen wrote:

First, no problem. I just ran into the problem and found a solution, so I had to share :)

Of course, thanks for pointing out the problem :)

Actually, I'm not sure whether the foreach can be skipped or not, but I will test it of course.

Yes, the foreach is not needed anymore :)

Updated by Tomas Norre Mikkelsen about 1 year ago

Hi,

I just tested the array_flip() approach, and there are problems related to the data structure.

The $plainlist = array() you are testing with has the "wrong" structure compared to the direct_mail data structure.

The array from direct_mail looks like this:

0 =>
array
'email' => string 'test0@domain.tld' (length=16)
'name' => string '' (length=0)
1 =>
array
'email' => string 'test1@domain.tld' (length=16)
'name' => string '' (length=0)
2 =>
array
'email' => string 'test2@domain.tld' (length=16)
'name' => string '' (length=0)
3 =>
array
'email' => string 'test3@domain.tld' (length=16)
'name' => string '' (length=0)
4 =>
array
'email' => string 'test4@domain.tld' (length=16)
'name' => string '' (length=0)

Compare that to your test array:

0 => '' 
1 => ''
2 => ''
3 => ''
4 => ''

That's the reason we have the foreach: the array is multidimensional. So with the direct_mail data structure, array_flip is not a solution.
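A short sketch of why the flip approach breaks down on this structure (sample rows made up, warning suppressed with @ to keep the output readable): array_flip() can only turn strings and integers into keys, so nested arrays are skipped.

```php
<?php
// Two identical rows in the direct_mail shape (made-up addresses).
$plainlist = [
    ['email' => 'test0@domain.tld', 'name' => ''],
    ['email' => 'test0@domain.tld', 'name' => ''],
];

// array_flip() skips every value that is not a string or integer and raises
// a warning for each one. The rows here are arrays, so nothing survives.
$flipped = @array_flip($plainlist);
var_dump(count($flipped)); // int(0)

// Serializing each row first produces strings, which can be compared.
$unique = array_map('unserialize', array_unique(array_map('serialize', $plainlist)));
var_dump(count($unique)); // int(1)
```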

Updated by Ivan Dharma Kartolo about 1 year ago

Hi Tomas,

Yes, you're right. My solution only works with a one-dimensional array. But the array_map solution only filters out array values that match exactly (same email and same name). Consider the following example:

array(
  0 => array(
    'email' => 'test0@mail.com',
    'name' => 'test0'
  ),
  1 => array(
    'email' => 'test1@mail.com',
    'name' => 'test1'
  ),
  2 => array(
    'email' => 'test0@mail.com',
    'name' => 'test2'
  ),
  3 => array(
    'email' => 'test0@mail.com',
    'name' => 'test0'
  ),
);

The array_map solution only removes the fourth element. What about the third element? Does it count as a duplicate?
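To make the question concrete, here is a sketch that runs the example above through the serialize-based filter and, for contrast, through a hypothetical email-only filter. The email-only loop is purely an illustration of the other possible reading of "duplicate"; it is not something the patch does.

```php
<?php
$plainlist = [
    0 => ['email' => 'test0@mail.com', 'name' => 'test0'],
    1 => ['email' => 'test1@mail.com', 'name' => 'test1'],
    2 => ['email' => 'test0@mail.com', 'name' => 'test2'],
    3 => ['email' => 'test0@mail.com', 'name' => 'test0'],
];

// Whole-row comparison: only key 3 (identical to key 0) is dropped.
$byRow = array_map('unserialize', array_unique(array_map('serialize', $plainlist)));
print_r(array_keys($byRow)); // keys 0, 1 and 2 remain

// Hypothetical alternative: treat the email alone as the identity and keep
// the first row seen for each address. Keys 2 and 3 would both be dropped.
$byEmail = [];
foreach ($plainlist as $row) {
    if (!isset($byEmail[$row['email']])) {
        $byEmail[$row['email']] = $row;
    }
}
print_r(array_keys($byEmail)); // test0@mail.com and test1@mail.com
```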

Updated by Tomas Norre Mikkelsen about 1 year ago

Hi,

I get your point, but what is the idea behind having one email registered multiple times with different names?

In my opinion this should be controlled at submission not at filtering.

Updated by Ivan Dharma Kartolo about 1 year ago

  • Status changed from Under Review to Closed
  • Assignee changed from Tomas Norre Mikkelsen to Ivan Dharma Kartolo
  • % Done changed from 0 to 100

Committed in SVN branch, r61804.
I left the foreach out, because we treat each second-level array as a single value: only when both the email and the name (in the second-level array) are identical do we assume it is a duplicate. This means the same email with a different name will NOT be treated as a duplicate.

thanks for the patch :)
