[R-lang] Re: writing Unicode files from R

Scott Jackson scottuba@gmail.com
Wed Oct 20 09:53:59 PDT 2010


Thanks to everyone for responses so far!

I have tried both Lucien's and Nathaniel's suggestions with no luck.
I'm on Windows XP, by the way, which may be making the difference; I'm
not sure.  Their strategies (as well as everything else I've tried)
have ended up with strings like <U+0430> in the output, even if I open
the file as UTF-8 within Excel, Notepad, etc.

The site John recommended does seem to work quite nicely for my
current purposes.  It ends up not doing anything with the angle
brackets, but it's easy enough to get rid of those once the Unicode
(Cyrillic) is displayed correctly.  Not the most efficient procedure
overall, but it (eventually) gets the right output, and that's the
first time I've managed that.
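
For reference, the same clean-up can probably be done inside R rather
than on a web page.  What follows is only a minimal sketch: the helper
name unescape_unicode() is hypothetical (it is not from this thread or
from any package), and it assumes the escapes always have the form
<U+XXXX> with 4 to 8 hex digits.  It turns each escape back into the
character it stands for and removes the angle brackets in the same pass.

```r
# Hypothetical helper: turn "<U+0430>"-style escapes back into characters.
unescape_unicode <- function(x) {
  # Locate every <U+XXXX> escape in each element of x.
  m <- gregexpr("<U\\+[0-9A-Fa-f]{4,8}>", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    if (length(hits) == 0) return(hits)  # nothing to replace in this string
    # Drop the leading "<U+" and trailing ">" and parse the hex code point.
    codes <- strtoi(substr(hits, 4, nchar(hits) - 1), base = 16L)
    # Convert each code point to a UTF-8 encoded one-character string.
    vapply(codes, intToUtf8, character(1))
  })
  x
}

fixed <- unescape_unicode("<U+043F><U+0440><U+0438><U+0432><U+0435><U+0442>")
```

The result is marked as UTF-8 inside R; whether it then survives being
written to a file on Windows is a separate question.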

Thanks again for the fast and helpful responses, and if anyone can
help sort out why Lucien's and Nathaniel's techniques may not work for
everyone (at least not for me), I'd love to hear it, since it seems
like they should work, and either would be a much simpler solution if
they did work.
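
One possible explanation, not confirmed by anyone in this thread: on
Windows, R may translate strings to the native code page before it
re-encodes them for the connection, so any character outside that code
page has already become a <U+XXXX> escape by then.  If that is what is
happening, writing the UTF-8 bytes directly might avoid it.  The sketch
below assumes exactly that; write_utf8_csv() is a hypothetical helper,
it quotes every field (numbers included), and it has not been tried on
Windows XP.

```r
# Hypothetical helper: write a data frame as CSV with UTF-8 bytes written as-is.
write_utf8_csv <- function(df, path) {
  quote_field <- function(v) {
    v <- enc2utf8(as.character(v))   # make sure strings are UTF-8 encoded
    v[is.na(v)] <- "NA"
    # Wrap in double quotes, doubling any embedded double quotes.
    paste0('"', gsub('"', '""', v, fixed = TRUE), '"')
  }
  header <- paste(quote_field(names(df)), collapse = ",")
  # Paste the quoted columns together row by row.
  rows <- do.call(paste, c(unname(lapply(df, quote_field)), sep = ","))
  con <- file(path, open = "wb")     # binary mode, no encoding argument
  on.exit(close(con))
  # useBytes = TRUE asks writeLines() not to re-encode the strings.
  writeLines(c(header, rows), con, sep = "\r\n", useBytes = TRUE)
  invisible(path)
}

out <- tempfile(fileext = ".csv")
write_utf8_csv(data.frame(word = c("\u0430\u0431\u0432", "abc"), n = 1:2,
                          stringsAsFactors = FALSE), out)
```

Excel may additionally want a byte-order mark at the start of the file
before it treats a CSV as UTF-8; the sketch does not write one.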

-scott

On Wed, Oct 20, 2010 at 10:58 AM, Lucien Carroll <lucien@ling.ucsd.edu> wrote:
> Hi,
>
> To write a file in a particular encoding I found I needed to pass a
> file connection with the specified encoding, rather than just the
> filename.
>
>> con <- file("pass_len.csv", "w", encoding="UTF-8")
>> write.csv(pass.len, con)
>> close(con)
>
> ~Lucien
>
> On Wed, Oct 20, 2010 at 6:25 AM, Scott Jackson <scottuba@gmail.com> wrote:
>> hi R-langers,
>>
>> Has anyone worked with getting Unicode *out* of R?  I've had some
>> success getting Unicode that began life in an Excel doc, text file,
>> etc. into R and getting R to display the Unicode characters (e.g.,
>> Arabic, Cyrillic) in the GUI and in plots.  However, whenever I've
>> wanted to manipulate some data that has some Unicode text and get it
>> out of R into a file that could be then opened in Excel and read by
>> humans, I've gotten stuck.  Using write.table() or write() or cat() or
>> even writeClipboard() has always resulted in ASCII renderings of the
>> Unicode (e.g., <U+0001>), which I can't manage to get back into
>> human-readable Unicode characters in Excel, Notepad, or other text
>> editors.
>>
>> Note I'm not just interested in printing out a *rendering* of the
>> Unicode output (like you might be able to get via a
>> Sweave/LaTeX-generated PDF), but actually spit out a file from R that
>> can be opened and worked with in Excel (or any other Unicode-reading
>> program), in such a way that the Unicode is displayed correctly.
>>
>> My latest frustration is that I can get the Unicode to display in the
>> Rgui, and if I literally highlight, copy & paste from the Rgui into
>> Excel, it looks fine.  But that's not a workable solution for what I
>> really want to do.  So it feels like I'm just one simple, obvious step
>> away...
>>
>> anyone have any ideas/suggestions?
>>
>> thanks,
>> -scott
>>
>
>
>
> --
> Lucien S. Carroll
> Graduate Student
> UCSD Linguistics
> http://ling.ucsd.edu/~lucien
>
