Subject: Re: Cut and paste Unicode plain text
From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Tue Jun 05 2001 - 09:53:21 CDT
Mike Nordell wrote:
>
> Andrew Dunbar wrote:
> >
> > > Oh, I didn't know that. Does it add a CF_LOCALE for every
> > > SetClipboardData? (a plain URL will do)
> >
> > I thought CF_LOCALE was just for plain text but RTF and HTML
> > also could behave a bit differently depending on locale.
>
> AFAIK it's only CF_TEXT that is (to be) affected, but I might be wrong.
> Imagine you're going from one locale where e.g. 0xd7 means one thing (and
> copy some text using that locale), to another locale where it means
> something completely different and then try to paste it into a non-RTF
> editor.
You might also have two apps running in different locales, especially
if you're bilingual.
> I don't know if e.g. MSWord cares about this, but even if it doesn't I don't
> think that's a reason to behave as badly. :-)
>
> I've heard e.g. Thai has a bunch of 8-bit chars above 0x80 that have no
> correlation to e.g. 8859-1.
Sure does. So do Arabic, Hebrew, Greek, etc...
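To make the 0xd7 example concrete, here is a small sketch in Python (not code from our tree) that decodes the same byte under a few Windows code pages. The choice of code pages and the helper name `decode_under` are only for illustration.

```python
import unicodedata

# Windows code pages picked for illustration; all are standard Python codecs.
CODE_PAGES = ["cp1252", "cp1251", "cp1253", "cp874"]


def decode_under(raw, code_pages):
    """Decode the same bytes under each code page and return {codec: text}."""
    return {cp: raw.decode(cp) for cp in code_pages}


if __name__ == "__main__":
    # One byte, four different characters depending on the locale's code page.
    for cp, ch in decode_under(b"\xd7", CODE_PAGES).items():
        print(f"{cp}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Under cp1252 the byte is the multiplication sign, under cp1251 a Cyrillic letter, under cp1253 a Greek letter and under cp874 a Thai vowel sign, which is exactly why raw CF_TEXT bytes are ambiguous without locale info.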
> > > > But since the user can change the input locale at
> > > > any time we may be able to improve it.
> > >
> > > Hence my question about adding CF_LOCALE to our "clips". Imagine
> > > copying (8-bit) text using one locale, changing locale and then pasting.
> > > We'd still assume the original chars to be pasted I think.
> >
> > Well we just treat the clipboard as raw bytes now so we will
> > interpret anything outside ASCII wrongly if we paste under a different
> > locale.
>
> Then we will have to change that, won't we? If we get a "Paste" command and
> there is only CF_TEXT available, we'll have to check the clipboard for
> the existence of locale info.
Yes, and we will have to pass that info to the importer.
And when we get a "Copy" command we have to put the locale info on
the clipboard.
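A rough sketch of what "pass that info to the importer" could look like, written in Python rather than against the real Win32 calls. The function name `import_clipboard_text`, the fallback code page and the hard-coded table are all assumptions for illustration; a real implementation would ask the OS for the locale's default ANSI code page instead of keeping its own table.

```python
# Hypothetical lookup from a Windows locale identifier (the kind of value
# CF_LOCALE carries) to the matching Python codec. Illustrative subset only.
LCID_TO_CODEC = {
    0x0409: "cp1252",  # English (United States)
    0x0419: "cp1251",  # Russian
    0x0408: "cp1253",  # Greek
    0x040D: "cp1255",  # Hebrew
    0x041E: "cp874",   # Thai
    0x0401: "cp1256",  # Arabic (Saudi Arabia)
}


def import_clipboard_text(raw, lcid=None, fallback="cp1252"):
    """Decode 8-bit clipboard bytes using the locale they were copied under.

    If no locale info is available, or it is unknown, fall back to a default
    code page (an assumption here; the current locale would be the usual pick).
    """
    codec = LCID_TO_CODEC.get(lcid, fallback)
    return raw.decode(codec, errors="replace")


if __name__ == "__main__":
    print(import_clipboard_text(b"\xd7", lcid=0x0419))  # decoded as Cyrillic
    print(import_clipboard_text(b"\xd7"))               # decoded with fallback
```

The point is only that the importer needs the locale as an extra argument; without it the same bytes come out as different characters.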
> [using CF_UNICODETEXT also]
> > I haven't explored all cases but Windows does do automatic conversions
> > when possible.
>
> It does? Are you talking about trying to *get* Unicode as CF_TEXT (where no
> CF_TEXT really is available) and it "automagically" converts what it can to
> your locale? (I might be a Win32 guru, but these areas are *really* dark to
> me... :-) ).
Both directions. It's documented in MSDN. Of course it's lossy, so
we should always prefer Unicode text. When somebody tries to *get*
from the clipboard, NT will "automagically" convert Unicode -> 8-bit
or 8-bit -> Unicode if they try to get the one that's not there.
I've just been testing it and even Windows 98 seems to have some
incomplete support - the docs don't say this though...
> > There are at least some bonuses to always putting
> > CF_UNICODETEXT on the clipboard.
>
> But there's also a drawback that I really don't want. If only pasting 8-bit
> text that we can express in the source locale charset, I'd hate to waste
> more than three times more memory than needed (says the one that thinks it's
> nothing out of the ordinary to open a 3 MB text file and copy & paste inside
> it) to put it on the clipboard.
The problem is we'd have to test the document first to see if it will
encode to the source locale. There's nothing to say all the characters
came from the keyboard, or from the keyboard's current locale. It is
switchable, remember! If we don't test, the conversion *silently* falls
back to "?" for unsupported characters, giving us the old "dvo?ak" problem.
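Here is a minimal Python sketch of the test I mean, plus the silent fallback it is meant to catch. The helper name `fits_code_page` is made up, and Python's `errors="replace"` only stands in for the Windows conversion's default-character substitution; it is not the same code path.

```python
def fits_code_page(text, codec):
    """Return True if every character of text is representable in codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    word = "dvo\u0159ak"  # contains U+0159, r with caron

    print(fits_code_page(word, "cp1250"))  # True: Central European has it
    print(fits_code_page(word, "cp1252"))  # False: Western European does not

    # Without the check, a lossy conversion quietly substitutes "?".
    print(word.encode("cp1252", errors="replace"))  # b'dvo?ak'

    # Size of the Unicode form versus the 8-bit form, in bytes.
    print(len(word.encode("utf-16-le")), len(word.encode("cp1250")))  # 12 6
```

The check costs a full pass over the text, and the Unicode form is twice the size of the 8-bit one for text like this, so putting both on the clipboard is where the "three times" figure comes from.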
> On a sidenote: perhaps we should keep track of how much we have put on the
> clipboard and ask the user if they want to discard it if it reaches a
> threshold? (Some other programs do.)
>
> > One is that it means we always take
> > control, another I can think of is that our smart quotes will be
> > preserved, so will anything the user got from the "insert symbol"
> > dialog.
>
> But we have to cooperate with other apps. Smart quotes or not.
We have to cooperate both with the apps that can handle smart quotes
(and other Unicode) and the ones that can't.
> Wait a minute! Does anyone know what WinWord puts in the different clipboard
> data "streams" when its smart quotes are turned on?
It puts smart quotes in the Unicode buffer and "dumbed down" quotes
in the 8-bit buffer. At least that's what results from pasting.
I'm not sure if Word puts the plain quotes in the 8-bit buffer or
if the "automagic" conversion does this...
Andrew.
-- http://linguaphile.sourceforge.net
This archive was generated by hypermail 2b25 : Tue Jun 05 2001 - 09:51:33 CDT