Bug #19115

closed

Pasting text from MS Word brings a lot of garbage

Added by Steffen no-lastname-given almost 16 years ago. Updated over 5 years ago.

Status: Closed
Priority: Should have
Category: -
Target version: -
Start date: 2008-07-16
Due date:
% Done: 0%
Estimated time:
TYPO3 Version:
PHP Version:
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

After saving, I have this text before my pasted MS-Word-text (Firefox 3 only).

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta name="ProgId" content="Word.Document" /><meta name="Generator" content="Microsoft Word 9" /><meta name="Originator" content="Microsoft Word 9" />

Windows W2K, XP
Firefox 3
Typo3 4.2.1
MS Word 9 ... 11
(issue imported from #M8988)


Files

rtehtmlarea_bugfix_8988_v2.patch (6.48 KB), Administrator Admin, 2008-08-12 21:05
Actions #1

Updated by Daniel Felix almost 16 years ago

This is normal behaviour.

Just copy the text into your RTE and then select "remove format". A self-explanatory dialog box will pop up. Choose "remove ms word", click OK and you're done.

Actions #2

Updated by Steffen no-lastname-given almost 16 years ago

Hi Daniel,
thank you for your answer. I know this useful button, but the new behaviour is what I described.
I tried this with a lot of installations, removing formats both before and after saving the content element. Regrettably, always with the same effect (only in FF3; IE works).
Today I also tested OpenOffice and was surprised to see this before my text:
<meta http-equiv="CONTENT-TYPE" content="text/html; charset=utf-8" /><title></title><meta name="GENERATOR" content="OpenOffice.org 2.4 (Win32)" />

Any ideas?

Actions #3

Updated by Daniel Felix almost 16 years ago

Hm, well, it's a bit strange. Which version of FF3 do you use? Do you have any add-ons which could cause this problem?

I have used FF3 since the alpha and beta phase and have never had such behaviour. Currently I use FF3 v. 3.0.1.

I once had a similar problem, but the cause was a bug in an extension.

Is everything up to date?

Actions #4

Updated by Steffen no-lastname-given almost 16 years ago

I think everything is up to date. I use FF 3.0.1 (German) on W2K and two XP machines with Typo3 4.2.1 in 4 different installations.
In FF I have no add-ons. I also tried deactivating all my pageTS, but it did not help.
Then I tried tinyRTE, and it works if I use the insert button within tinyRTE. But if I use Ctrl+V, I get the same behaviour (tested only with FF 3.0.1 on W2K). I also set up a new installation because I thought it could have something to do with my database encoding. But ... no. Same problem in the new installation.
Now I'm a little bit helpless and I'm going home ;-)

Thanks for your patience

Actions #5

Updated by Daniel Felix almost 16 years ago

Well, I just tested this bug with my Office 2007 and I get the same result.

And I got another funny bug. When I chose to remove all MS Word styles, nothing was removed. After I removed the HTML formats, the text between the HTML format tags was removed as well, so only the unformatted text was left.
The text which was bold or italic was lost.

The "undo" button couldn't bring it back. The button just does nothing! It skipped one step back and then one step forward, so it was at the same place as before.

After I deleted the whole content of the textarea (without saving) and copied the text into the text field again ... and used the MS format remover again ... this output appears above my whole text:

"Normal 0 21 false false false DE X-NONE X-NONE MicrosoftInternetExplorer4 "

I hope it helps!?

By the way ... the MS formats are removed correctly, but the general MS stylesheets etc. are still there. Maybe a bug with the new Word version?

Actions #6

Updated by Steffen no-lastname-given almost 16 years ago

What do you mean by "general ms stylesheets etc"? Is it similar to what OpenOffice brings?: <meta http-equiv="CONTENT-TYPE" content="text/html; charset=utf-8" /><title></title><meta name="GENERATOR" content="OpenOffice.org 2.4 (Win32)" />

Actions #7

Updated by Daniel Felix almost 16 years ago

Yes.

Here is my text BEFORE I let the tool remove the styles:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta name="ProgId" content="Word.Document" /><meta name="Generator" content="Microsoft Word 12" /><meta name="Originator" content="Microsoft Word 12" /><link rel="File-List" href="file:///C:\Users\User\AppData\Local\Temp\msohtmlclip1\01\clip_filelist.xml" /><link rel="themeData" href="file:///C:\Users\User\AppData\Local\Temp\msohtmlclip1\01\clip_themedata.thmx" /><link rel="colorSchemeMapping" href="file:///C:\Users\User\AppData\Local\Temp\msohtmlclip1\01\clip_colorschememapping.xml" />

<p class="MsoNormal"><i>Testext<o:p></o:p></i></p>

<p class="MsoNormal">W<u>it</u>h </p>

<p class="MsoNormal"><b>Some</b> Test</p>

<p class="MsoNormal">Format</p>

After Removing:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta name="ProgId" content="Word.Document" /><meta name="Generator" content="Microsoft Word 12" /><meta name="Originator" content="Microsoft Word 12" /><link rel="File-List" href="file:///C:%5CUsers%5CUser%5CAppData%5CLocal%5CTemp%5Cmsohtmlclip1%5C01%5Cclip_filelist.xml" /><link rel="themeData" href="file:///C:%5CUsers%5CUser%5CAppData%5CLocal%5CTemp%5Cmsohtmlclip1%5C01%5Cclip_themedata.thmx" /><link rel="colorSchemeMapping" href="file:///C:%5CUsers%5CUser%5CAppData%5CLocal%5CTemp%5Cmsohtmlclip1%5C01%5Cclip_colorschememapping.xml" /> <p><i>Testext</i></p> <p>W<u>it</u>h </p> <p><b>Some</b> Test</p> <p>Format</p>
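The "after" output above still starts with meta and link tags, while the Word-specific class attributes and o:p elements are gone. As a rough way to check a pasted fragment for such leftovers, here is a minimal Python sketch using the standard-library HTML parser. The names TagCollector and leftover_head_tags are hypothetical, and the sample string is a shortened piece of the output quoted above; this is not how the RTE itself inspects content.

```python
from html.parser import HTMLParser

# Document-head tags that should not appear inside RTE body content.
HEAD_TAGS = ("meta", "link", "title", "style")


class TagCollector(HTMLParser):
    """Collects the names of all start tags in the order they appear."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here, because
        # the default handle_startendtag delegates to handle_starttag.
        self.tags.append(tag)


def leftover_head_tags(html):
    """Hypothetical helper: return the head-level tags found in a fragment."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return [tag for tag in collector.tags if tag in HEAD_TAGS]


after_cleaning = (
    '<meta name="ProgId" content="Word.Document" />'
    '<link rel="File-List" href="file:///C:%5CUsers%5CUser%5CAppData%5CLocal'
    '%5CTemp%5Cmsohtmlclip1%5C01%5Cclip_filelist.xml" />'
    "<p><i>Testext</i></p>"
)
print(leftover_head_tags(after_cleaning))
```

An empty list would mean the fragment is free of these tags; for the sample above, the meta and link tags are reported.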

Actions #8

Updated by Stanislas Rolland almost 16 years ago

Seems similar to https://bugzilla.mozilla.org/show_bug.cgi?id=429640

Perhaps remove format should clean meta and link tags.

Have you tried to use enableWordClean with a HTMLparser configuration?

Actions #9

Updated by Daniel Felix almost 16 years ago

Some parts of my rte config:

enableWordClean = 1

HTMLparser_rte {
    # tags which are allowed/denied
    allowTags < RTE.default.proc.allowTags
    denyTags < RTE.default.proc.denyTags
    # tags which should be removed
    removeTags = font
    # remove html-comments
    removeComments = 1
    # tags which don't conform won't be removed (protect / 1 / 0)
    keepNonMatchedTags = 0
}
entryHTMLparser_db = 1
entryHTMLparser_db {
    # tags which are allowed/denied
    allowTags < RTE.default.proc.allowTags
    denyTags < RTE.default.proc.denyTags
    # CLEAN TAGS
    noAttrib = b, i, u, strike, sub, sup, strong, em, quote, blockquote, cite, tt, br, center
    rmTagIfNoAttrib = span,div,font
    # htmlSpecialChars = 1
    # align attributes are allowed
    tags {
        p.fixAttrib.align.unset >
        p.allowedAttribs = class,style,align

        div.fixAttrib.align.unset >

        hr.allowedAttribs = class

        # b and i tags will be replaced (em / strong)
        b.remap = strong
        i.remap = em
        # img tags are allowed
        img >
    }
}
# Use same processing as on entry to database to clean content pasted into the editor
RTE.default.enableWordClean.HTMLparser < RTE.default.proc.entryHTMLparser_db
Actions #10

Updated by Stanislas Rolland almost 16 years ago

When pasting from word processors, FF3 is adding unwanted meta tags, and sometimes title and style tags.
The attached patch fixes the issue:
1. the remove MS format option in Remove format dialogue will remove meta, style and title tags and their contents;
2. if enableWordClean is set, but not HTMLparser, then the default cleaning on paste will take effect and will remove meta, style and title tags and their contents;
3. the default RTE transformation processing options will remove meta, style and title tags, but not their contents (the contents of the style tags will be cleaned because they are commented and the processing options are already set to remove comments); I was unable to make FF3 insert some text inside the inserted title tags, which I find rather odd; the meta tag of course has no content;
4. if enableWordClean is set with the default HTMLparser configuration, then the default RTE transformation will be applied on paste;
5. the Typical and Demo default configurations are modified so that the RTE will remove tags meta, title and style and their contents on saving and on toggling to text mode.
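To illustrate the kind of cleaning described in the list above, here is a minimal Python sketch that removes meta tags, and title and style tags together with their contents, from a pasted fragment. It is not the code of the attached patch (which is not shown in this thread); the function name strip_head_tags is hypothetical, and the regular expressions assume attribute values contain no ">" character. The link tag is included because the follow-up comment says it is handled like meta.

```python
import re

# Tags without content: only the tag itself is removed.
VOID_TAGS = ("meta", "link")
# Tags that are removed together with everything between open and close.
CONTAINER_TAGS = ("title", "style")


def strip_head_tags(html):
    """Hypothetical helper: drop meta/link/title/style from pasted markup."""
    for tag in CONTAINER_TAGS:
        # Remove <tag ...>...</tag>, non-greedy, also across line breaks.
        pattern = rf"<{tag}\b[^>]*>.*?</{tag}\s*>"
        html = re.sub(pattern, "", html, flags=re.IGNORECASE | re.DOTALL)
    for tag in VOID_TAGS:
        # Remove <tag ...> as well as the self-closing form <tag ... />.
        html = re.sub(rf"<{tag}\b[^>]*>", "", html, flags=re.IGNORECASE)
    return html.strip()


# Sample based on the OpenOffice paste reported earlier in this issue.
pasted = (
    '<meta http-equiv="CONTENT-TYPE" content="text/html; charset=utf-8" />'
    "<title></title>"
    '<meta name="GENERATOR" content="OpenOffice.org 2.4 (Win32)" />'
    "<p>Some text</p>"
)
print(strip_head_tags(pasted))  # prints: <p>Some text</p>
```

A regex-based pass like this is only a sketch; a real cleaner would more likely walk the DOM or use the HTMLparser configuration discussed in this issue.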

Actions #11

Updated by Stanislas Rolland almost 16 years ago

Sorry. Posting a new version of the patch because I forgot about the link tag. The link tag will be handled the same way as the meta tag.

Since I do not have MS Word, and the link tag does not seem to occur when pasting from OO, it would be nice if someone could test this patch on TYPO3 4.2.

After applying the patch, make sure to clear the TYPO3 configuration cache, delete all files in typo3temp/rtehtmlarea and clear the browser cache.
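The file-deletion step above can be scripted. The following Python sketch is only an illustration: the function name clear_rtehtmlarea_temp and the site root argument are assumptions, and it removes only regular files directly inside typo3temp/rtehtmlarea. Clearing the TYPO3 configuration cache and the browser cache still has to be done separately.

```python
from pathlib import Path


def clear_rtehtmlarea_temp(site_root):
    """Delete the files directly inside typo3temp/rtehtmlarea.

    site_root is the TYPO3 installation directory (an assumption of this
    sketch). Returns the number of files removed; subdirectories are left
    untouched.
    """
    temp_dir = Path(site_root) / "typo3temp" / "rtehtmlarea"
    if not temp_dir.is_dir():
        return 0
    removed = 0
    for entry in temp_dir.iterdir():
        if entry.is_file():
            entry.unlink()
            removed += 1
    return removed
```

Calling clear_rtehtmlarea_temp with the path of the installation returns how many cached files were deleted, or 0 if the directory does not exist.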

Actions #12

Updated by Stanislas Rolland almost 16 years ago

Committed to SVN TYPO3core branches TYPO3_4-2 revision 3971 and trunk revision 3972.

Actions #13

Updated by Benni Mack over 5 years ago

  • Status changed from Resolved to Closed