[R-lang] Re: writing Unicode files from R

Newman, John john.newman@ualberta.ca
Wed Oct 20 07:25:26 PDT 2010


This is not a solution within R to the problem of exporting tables with Unicode characters, but it is a way to get the characters displayed correctly in Excel, another spreadsheet, or a text file.

Richard Ishida's website (http://people.w3.org/rishida/tools/conversion/) lets you batch-convert Unicode code points (in various formats, including hexadecimal numbers, \x escapes, etc.) into the corresponding characters. This works for a column of data in Excel, and I have successfully converted thousands of rows of data exported from R into Chinese characters this way. There may be a limit on the number of rows that can be converted in one batch, but I haven't hit it yet. Often only one column of a data frame contains the Unicode data, so this approach is usually feasible.
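
For comparison, the same kind of conversion can be sketched in R itself. This is only a minimal sketch, not what the website does: it assumes the exported text contains escapes of the form <U+XXXX> (4 to 6 hex digits) and that the R session can represent the resulting characters (e.g. a UTF-8 locale). The function name unescape_unicode is made up for illustration.

```r
# Hypothetical helper: turn "<U+XXXX>" escapes back into real characters.
unescape_unicode <- function(x) {
  pattern <- "<U\\+[0-9A-Fa-f]{4,6}>"
  matches <- gregexpr(pattern, x)
  regmatches(x, matches) <- lapply(regmatches(x, matches), function(tokens) {
    # Strip the "<U+" prefix and ">" suffix, leaving only the hex digits.
    hex <- gsub("^<U\\+|>$", "", tokens)
    # Parse each hex string as a code point and turn it into a UTF-8 string.
    vapply(strtoi(hex, base = 16L), intToUtf8, character(1))
  })
  x
}

# Example: apply it to the one column that holds the escaped text.
escaped <- c("<U+4E2D><U+6587>", "plain ASCII stays as it is")
converted <- unescape_unicode(escaped)
```

Text without any escapes passes through unchanged, so the function can be applied to a whole column without checking each cell first.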

John
________________________________________
From: ling-r-lang-l-bounces@mailman.ucsd.edu [ling-r-lang-l-bounces@mailman.ucsd.edu] On Behalf Of Scott Jackson [scottuba@gmail.com]
Sent: October 20, 2010 7:25 AM
To: r-lang@ling.ucsd.edu
Subject: [R-lang]  writing Unicode files from R

hi R-langers,

Has anyone worked with getting Unicode *out* of R?  I've had some
success getting Unicode that began life in an Excel doc, text file,
etc. into R and getting R to display the Unicode characters (e.g.,
Arabic, Cyrillic) in the GUI and in plots.  However, whenever I've
wanted to manipulate some data that has some Unicode text and get it
out of R into a file that could be then opened in Excel and read by
humans, I've gotten stuck.  write.table(), write(), cat(), and even
writeClipboard() have all produced ASCII renderings of the Unicode
(e.g., <U+0001>), which I can't manage to turn back into
human-readable Unicode characters in Excel, Notepad, or other text
editors.

Note that I'm not just interested in printing out a *rendering* of the
Unicode output (like you might be able to get via a
Sweave/LaTeX-generated PDF), but in actually writing a file from R that
can be opened and worked with in Excel (or any other Unicode-reading
program), in such a way that the Unicode is displayed correctly.
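
One possible way to produce such a file is sketched below. It is a hedged sketch rather than a confirmed fix for the problem described here: it assumes the <U+XXXX> renderings come from R translating strings to the native encoding on output, and it sidesteps that step by converting the text to UTF-8 with enc2utf8() and writing it with writeLines(..., useBytes = TRUE). The helper name write_utf8_table and the tab-separated layout are choices made for illustration.

```r
# Hypothetical helper: write a data frame as tab-separated UTF-8 text.
write_utf8_table <- function(df, path) {
  # Convert every column to character and make sure it is UTF-8.
  cols <- lapply(df, function(col) enc2utf8(as.character(col)))
  # Build a header line followed by one tab-separated line per row.
  lines <- c(paste(enc2utf8(names(df)), collapse = "\t"),
             do.call(paste, c(unname(cols), sep = "\t")))
  # Open a plain text connection; no encoding conversion is requested.
  con <- file(path, open = "w")
  on.exit(close(con))
  # useBytes = TRUE passes the UTF-8 bytes through without re-encoding.
  writeLines(lines, con, useBytes = TRUE)
  invisible(path)
}

# Example with made-up data containing Chinese and Cyrillic text.
example_df <- data.frame(word = c("\u4e2d\u6587", "\u0440\u0443\u0441"),
                         n = 1:2, stringsAsFactors = FALSE)
out_path <- tempfile(fileext = ".txt")
write_utf8_table(example_df, out_path)
```

The file contains no byte-order mark, so Excel may still need to be told that the file is UTF-8 when importing it.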

My latest frustration is that I can get the Unicode to display in the
Rgui, and if I literally highlight, copy & paste from the Rgui into
Excel, it looks fine.  But that's not a workable solution for what I
really want to do.  So it feels like I'm just one simple, obvious step
away...

anyone have any ideas/suggestions?

thanks,
-scott



