Bug #19115
Closed
Pasting text from MS Word brings a lot of garbage
Description
After saving, the following markup appears before the text I pasted from MS Word (Firefox 3 only):
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta name="ProgId" content="Word.Document" /><meta name="Generator" content="Microsoft Word 9" /><meta name="Originator" content="Microsoft Word 9" />
Windows W2K, XP
Firefox 3
Typo3 4.2.1
MS Word 9 ... 11
(issue imported from #M8988)
Updated by Daniel Felix over 16 years ago
This is normal behaviour.
Just paste the text into your RTE and then select "remove format". A self-explanatory dialog box will pop up. Choose "remove ms word", click OK, and you're done.
Updated by Steffen no-lastname-given over 16 years ago
Hi Daniel,
Thank you for your answer. I know this useful button, but the new behaviour is what I described.
I tried this on many installations, removing formats both before and after saving the content element. Regrettably, the result is always the same (only in FF3; IE works).
Today I also tested OpenOffice and was surprised to see this before my text:
<meta http-equiv="CONTENT-TYPE" content="text/html; charset=utf-8" /><title></title><meta name="GENERATOR" content="OpenOffice.org 2.4 (Win32)" />
Any ideas?
Updated by Daniel Felix over 16 years ago
Hm, well, it's a bit strange. Which version of FF3 do you use? Do you have any add-ons which could cause this problem?
I have used FF3 since the alpha and beta phases and have never seen such behaviour. I currently use FF3 v. 3.0.1.
I had a similar problem once, but the cause was a bug in an extension.
Is everything up to date?
Updated by Steffen no-lastname-given over 16 years ago
I think everything is up to date. I use FF 3.0.1 (German) on W2K and on two XP machines, with Typo3 4.2.1 in four different installations.
In FF I have no add-ons. I also tried deactivating all my pageTS, but it made no difference.
Then I tried tinyRTE, and it works if I use the insert button within tinyRTE. But if I use Ctrl+V, I get the same behaviour (tested only with FF 3.0.1 on W2K). I also set up a new installation because I thought it could have something to do with my database encoding. But no: same problem in the new installation.
Now I'm a little bit helpless, and I'm going home ;-)
Thanks for your patience
Updated by Daniel Felix over 16 years ago
Well, I just tested this with my Office 2007 and I get the same result.
And I found another funny bug. When I removed all MS Word styles, nothing was removed. When I then removed the HTML formats, the text enclosed in the HTML formatting tags was removed as well, so only the unformatted text was left.
The text which was bold or italic was lost.
The undo button couldn't bring it back; it did nothing. It went one step back and then one step forward again, so it ended up at the same place as before.
After I deleted the whole content of the text area (without saving), pasted the text into the text field again, and used the MS format remover again, this output appeared above my whole text:
"Normal 0 21 false false false DE X-NONE X-NONE MicrosoftInternetExplorer4 "
I hope this helps!
By the way, the MS formats are removed correctly, but the general MS stylesheets etc. are still there. Maybe a bug with the new Word version?
Updated by Steffen no-lastname-given over 16 years ago
What do you mean by "general ms stylesheets etc"? Is it similar to what OpenOffice brings: <meta http-equiv="CONTENT-TYPE" content="text/html; charset=utf-8" /><title></title><meta name="GENERATOR" content="OpenOffice.org 2.4 (Win32)" />
Updated by Daniel Felix over 16 years ago
Yes.
Here is my text BEFORE I let the tool remove the styles:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta name="ProgId" content="Word.Document" /><meta name="Generator" content="Microsoft Word 12" /><meta name="Originator" content="Microsoft Word 12" /><link rel="File-List" href="file:///C:\Users\User\AppData\Local\Temp\msohtmlclip1\01\clip_filelist.xml" /><link rel="themeData" href="file:///C:\Users\User\AppData\Local\Temp\msohtmlclip1\01\clip_themedata.thmx" /><link rel="colorSchemeMapping" href="file:///C:\Users\User\AppData\Local\Temp\msohtmlclip1\01\clip_colorschememapping.xml" />
<p class="MsoNormal"><i>Testext<o:p></o:p></i></p>
<p class="MsoNormal">W<u>it</u>h </p>
<p class="MsoNormal"><b>Some</b> Test</p>
<p class="MsoNormal">Format</p>
After Removing:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta name="ProgId" content="Word.Document" /><meta name="Generator" content="Microsoft Word 12" /><meta name="Originator" content="Microsoft Word 12" /><link rel="File-List" href="file:///C:%5CUsers%5CUser%5CAppData%5CLocal%5CTemp%5Cmsohtmlclip1%5C01%5Cclip_filelist.xml" /><link rel="themeData" href="file:///C:%5CUsers%5CUser%5CAppData%5CLocal%5CTemp%5Cmsohtmlclip1%5C01%5Cclip_themedata.thmx" /><link rel="colorSchemeMapping" href="file:///C:%5CUsers%5CUser%5CAppData%5CLocal%5CTemp%5Cmsohtmlclip1%5C01%5Cclip_colorschememapping.xml" /> <p><i>Testext</i></p> <p>W<u>it</u>h </p> <p><b>Some</b> Test</p> <p>Format</p>
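The leftover tags in the "After Removing" output can be spotted mechanically. The following is a minimal sketch in Python and is not part of TYPO3 or the RTE: the function name leftover_head_tags and the list of head-only tags are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Tags that belong in a document head and should not survive in pasted
# body content. This list is an assumption made for illustration.
HEAD_ONLY_TAGS = {"meta", "link", "title", "style"}


class _HeadTagCollector(HTMLParser):
    """Collects the names of head-only tags in the order they appear."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this method.
        if tag in HEAD_ONLY_TAGS:
            self.found.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> arrive here.
        if tag in HEAD_ONLY_TAGS:
            self.found.append(tag)


def leftover_head_tags(fragment):
    """Return the head-only tags still present in an HTML fragment."""
    collector = _HeadTagCollector()
    collector.feed(fragment)
    collector.close()
    return collector.found
```

Calling leftover_head_tags on the cleaned output above would list the meta and link tags that the format remover left behind, while an empty list would indicate a clean fragment.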
Updated by Stanislas Rolland over 16 years ago
Seems similar to https://bugzilla.mozilla.org/show_bug.cgi?id=429640
Perhaps remove format should clean meta and link tags.
Have you tried to use enableWordClean with a HTMLparser configuration?
Updated by Daniel Felix over 16 years ago
Some parts of my rte config:
enableWordClean = 1
HTMLparser_rte {
# tags which are allowed/denied
allowTags < RTE.default.proc.allowTags
denyTags < RTE.default.proc.denyTags
# tags which should be removed
removeTags = font
# remove html-comments
removeComments = 1
# tags which don't conform won't be removed (protect / 1 / 0)
keepNonMatchedTags = 0
}
entryHTMLparser_db = 1
entryHTMLparser_db {
# tags which are allowed/denied
allowTags < RTE.default.proc.allowTags
denyTags < RTE.default.proc.denyTags
# CLEAN TAGS
noAttrib = b, i, u, strike, sub, sup, strong, em, quote, blockquote, cite, tt, br, center
rmTagIfNoAttrib = span,div,font
# htmlSpecialChars = 1
# align attribute is allowed
tags {
p.fixAttrib.align.unset >
p.allowedAttribs = class,style,align
div.fixAttrib.align.unset >
hr.allowedAttribs = class
# b and i tags will be replaced (em / strong)
b.remap = strong
i.remap = em
# img tags are allowed
img >
}
}
# b and i tags will be replaced (em / strong)
# Use same processing as on entry to database to clean content pasted into the editor
RTE.default.enableWordClean.HTMLparser < RTE.default.proc.entryHTMLparser_db
Updated by Stanislas Rolland over 16 years ago
When pasting from word processors, FF3 is adding unwanted meta tags, and sometimes title and style tags.
The attached patch fixes the issue:
1. the remove MS format option in Remove format dialogue will remove meta, style and title tags and their contents;
2. if enableWordClean is set, but not HTMLparser, then the default cleaning on paste will take effect and will remove meta, style and title tags and their contents;
3. the default RTE transformation processing options will remove meta, style and title tags, but not their contents (the contents of the style tags will be cleaned because they are commented and the processing options are already set to remove comments); I was unable to make FF3 insert some text inside the inserted title tags, which I find rather odd; the meta tag of course has no content;
4. if enableWordClean is set with the default HTMLparser configuration, then the default RTE transformation will be applied on paste;
5. the Typical and Demo default configurations are modified so that the RTE will remove tags meta, title and style and their contents on saving and on toggling to text mode.
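The difference between removing a tag together with its contents (points 1, 2 and 5) and removing only the tag itself (point 3) can be sketched as follows. This is a simplified Python illustration based on regular expressions, not the code from the patch; the function names and the DEFAULT_TAGS tuple are hypothetical.

```python
import re

# Tags named in the list above; the tuple itself is a hypothetical default.
DEFAULT_TAGS = ("meta", "style", "title")


def _patterns(tag):
    """Build regex source strings for the opening and closing form of a tag."""
    name = re.escape(tag)
    # The lookahead keeps <meta ...> from also matching e.g. <metadata>.
    opening = rf"<{name}(?=[\s/>])[^>]*>"
    closing = rf"</{name}\s*>"
    return opening, closing


def remove_tags_with_contents(html, tags=DEFAULT_TAGS):
    """Remove the listed tags and everything between their open and close tags."""
    for tag in tags:
        opening, closing = _patterns(tag)
        # Paired form, e.g. <style> ... </style>, including the enclosed text.
        html = re.sub(opening + r".*?" + closing, "", html, flags=re.I | re.S)
        # Any remaining unpaired or self-closing form, e.g. <meta ... />.
        html = re.sub(opening + "|" + closing, "", html, flags=re.I)
    return html


def remove_tags_keep_contents(html, tags=DEFAULT_TAGS):
    """Remove only the tags themselves and leave the enclosed text in place."""
    for tag in tags:
        opening, closing = _patterns(tag)
        html = re.sub(opening + "|" + closing, "", html, flags=re.I)
    return html
```

With the second function, the commented contents of a style tag stay in the markup, which is why a separate comment-removal step is still needed in that case.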
Updated by Stanislas Rolland over 16 years ago
Sorry. Posting a new version of the patch because I forgot about the link tag. The link tag will be handled the same way as the meta tag.
Since I do not have MS Word, and the link tag does not seem to occur when pasting from OO, it would be nice if someone could test this patch on TYPO3 4.2.
After applying the patch, make sure to clear the TYPO3 configuration cache, delete all files in typo3temp/rtehtmlarea and clear the browser cache.
Updated by Stanislas Rolland over 16 years ago
Committed to SVN TYPO3core branches TYPO3_4-2 revision 3971 and trunk revision 3972.