Oggenc doesn't properly place UTF8 comments when CHARSET is null.
While using a script I wrote to process a flac-as-cd file into separate, tagged, Vorbis-encoded tracks, I found that accented characters in metadata were being replaced with '#' characters despite UTF8 encoding being used at every step. Thanks to the very kind help at #vorbis, it was narrowed down to the environment variable CHARSET being set to nothing. The problem goes away when CHARSET is set to "UTF-8".
Now, I use Slackware 10.1 which may be at fault for not setting CHARSET properly with the other locale variables, so I don't really know if it's an oggenc bug here. However, this problem does not happen when using LAME instead of oggenc in the same situation**, I haven't experienced any other problems of this sort in other programs, and MikeS asked me to file it. ;)
** With amaroK set to interpret id3v1 as UTF8.
(LANG and LC_ALL are set to "en_US.UTF-8")
> export CHARSET=
> oggenc -q 6 -c "TITLE=J'y suis jamais allé" blah.wav
[...]
> ogginfo blah.ogg
[...]
User comments section follows...
TITLE=J'y suis jamais all##
[...]
> export CHARSET=UTF-8
> oggenc -q 6 -c "TITLE=J'y suis jamais allé" blah.wav
[...]
> ogginfo blah.ogg
[...]
User comments section follows...
TITLE=J'y suis jamais allé
[...]
Also, oggenc was built from a vanilla vorbis-tools-1.1.1.tar.gz with './configure --prefix=/usr'
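For anyone trying to reproduce this, here is a small sketch (assuming a POSIX-style shell) that tells apart a CHARSET that is unset from one that is set but empty, which is the state I was in. The function name charset_state is just something made up for this example, and the script only inspects the environment; it says nothing about how oggenc itself reads CHARSET.

```shell
#!/bin/sh
# Hypothetical helper: report whether CHARSET is unset, set but empty,
# or set to a value. ${CHARSET+x} expands to "x" only if CHARSET is set,
# even when it is set to the empty string.
charset_state() {
    if [ -z "${CHARSET+x}" ]; then
        echo "unset"
    elif [ -z "$CHARSET" ]; then
        echo "empty"
    else
        echo "set:$CHARSET"
    fi
}

echo "CHARSET is: $(charset_state)"

# For comparison, print the character map the locale settings
# (LANG, LC_ALL, ...) resolve to, if the locale utility is available.
if command -v locale >/dev/null 2>&1; then
    echo "locale charmap: $(locale charmap 2>/dev/null || echo unknown)"
fi
```

If the first line reports "empty" while the locale variables point at a UTF-8 locale, that matches the situation described in this report.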