SBS/Lyrion incorrectly encodes title when scanning unusual Unicode characters #1225
PS. SBS release: Logitech Media Server Version: 8.5.2 - 1716215514 @ Sun 26 May 2024 05:43:14 PM CEST |
That album looks fine here. It was imported from Spotify, but it shows that the problem must be elsewhere, not in the database. Please give LMS9 (https://downloads.lyrion.org) another try. It should be pretty stable: I plan to release it in the next few days. |
I have now installed the latest Lyrion nightly build (easy to install and looking good!). Lyrion Music Server Version: 9.0.0 - 1732300171. Tracks 1 and 9 are still not showing their correct (pictorial) titles. Since your tracks from Spotify are scanned OK, I am now looking elsewhere for the source of the problem. I wonder if the LMS scanner uses a different encoding when writing the tags into the database. I have been reading it as UTF-8, but I also tried decoding what was in the LMS database with several other encodings (UTF-16, UTF-32, Latin-1), and none of them worked. I'm not quite sure how to proceed, as the tags in the files seem to be correct as far as I can tell. I realise you are probably pretty busy now, and the priority is getting LMS9 released and dealing with the questions that arise to make it a successful launch. I don't think I can legally send you the actual track, but I have applied the same tags to another, non-copyrighted file from the Free Music Archive and attached it here. If you get a moment, could you please scan it and see whether it works on your test system? You can find it by searching for path contains "Jahzzar". This one still has problems showing its title in LMS. As another possibility, we could compare the contents of the title tag of the Coldplay track in your version with mine. I could write a short Python program to do that. |
I'm pretty sure this is not a database issue, because if it were, all of those characters would be broken. What browser are you using? What operating system is the browser running on? And what are the LMS details (Settings/Information)? I wonder whether the client platform is a bit dated and fails to render some Unicode characters. |
Could you please provide me with a copy of your library.db and two sample music files with those tags? Feel free to send me a file with all the audio removed, or just silence, but with the tags you'd use. https://www.dropbox.com/request/T3RctyzGgNg0oFDubq6a Your system seems to be using pretty much the latest of everything, so that should definitely be fine. |
Thanks for the files. This is just to confirm that I'm seeing the same thing with them as you are. |
Something's odd about those files: I opened them in Meta (a tagger for Mac), and it rendered the titles correctly. Then I dumped all the metadata using exiftool:
I duplicated the emoji, saved the file again in Meta, and ran exiftool again:
And now it's rendering correctly in LMS as well. I suspect the metadata was not saved correctly, and some applications are more forgiving than others; LMS renders the title perfectly well once the file has been saved correctly. Do you have an alternative tagger you could use to save the metadata once again? |
I don't have another tagging tool available, I'm afraid, but I have discovered that the TIT2 tag is encoded using UTF-16. I'm not sure whether that is what the scanner expects.
I know the ID3 standard supports several text encodings, but I'm not sure how common UTF-16 is. (The track artist tag, TPE1, is encoded using UTF-8.) Does the scanner take note of the encoding declared in the tag, or does it always assume UTF-8? However, some of the other Coldplay tracks, for example '06 - ❤️.mp3', also have the TIT2 tag encoded using UTF-16, and these seem to make it into library.db OK.
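To check which text encoding a frame declares without a tagger, a short Python script can read the byte that starts every ID3v2 text frame: 0 = ISO-8859-1, 1 = UTF-16 with BOM, 2 = UTF-16BE, 3 = UTF-8 (2 and 3 exist only in ID3v2.4). This is a minimal sketch, not a full ID3 parser: frame_encodings is a hypothetical helper, and it assumes an ID3v2.3 or v2.4 tag at the very start of the file with no extended header and no unsynchronisation.

```python
import struct
import sys

# Meaning of the first byte of an ID3v2 text frame (T*** frames).
# Values 2 and 3 are only defined in ID3v2.4.
TEXT_ENCODINGS = {0: "ISO-8859-1", 1: "UTF-16 (BOM)", 2: "UTF-16BE", 3: "UTF-8"}


def syncsafe(raw: bytes) -> int:
    """Decode a 4-byte syncsafe integer (7 significant bits per byte)."""
    return (raw[0] << 21) | (raw[1] << 14) | (raw[2] << 7) | raw[3]


def frame_encodings(data: bytes) -> dict:
    """Map each text frame ID to the encoding its first byte declares.

    Assumption: ID3v2.3/2.4 tag at offset 0, no extended header and
    no unsynchronisation (a sketch, not a complete parser).
    """
    if len(data) < 10 or data[:3] != b"ID3" or data[3] not in (3, 4):
        raise ValueError("no ID3v2.3/2.4 tag at start of data")
    major = data[3]
    end = 10 + syncsafe(data[6:10])  # tag size excludes the 10-byte header
    pos, found = 10, {}
    while pos + 10 <= end:
        frame_id = data[pos:pos + 4]
        if frame_id[0] == 0:  # reached the padding after the last frame
            break
        size_bytes = data[pos + 4:pos + 8]
        # v2.4 frame sizes are syncsafe; v2.3 sizes are plain big-endian.
        if major == 4:
            size = syncsafe(size_bytes)
        else:
            size = struct.unpack(">I", size_bytes)[0]
        body = data[pos + 10:pos + 10 + size]
        if frame_id[:1] == b"T" and body:
            enc = body[0]
            found[frame_id.decode("latin-1")] = TEXT_ENCODINGS.get(enc, f"unknown ({enc})")
        pos += 10 + size
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python id3enc.py "09 - 🌎.mp3"
    with open(sys.argv[1], "rb") as f:
        print(frame_encodings(f.read()))
```

Run against '09 - 🌎.mp3' and '06 - ❤️.mp3', it should show whether both declare UTF-16 for TIT2, as reported above.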
The byte encoding of the earth symbol differs between UTF-8 and UTF-16:
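The comparison can be reproduced in Python. In the sketch below, the helper name describe is hypothetical. It prints each title's code points and its UTF-8 and UTF-16BE bytes. It also highlights one difference between the two titles: 🌎 (U+1F30E) lies outside the Basic Multilingual Plane, so UTF-16 needs a surrogate pair (D83C DF0E) to represent it, while both code points of ❤️ (U+2764 U+FE0F) each fit in a single UTF-16 unit. That might explain why the UTF-16 ❤️ title survives the scan and 🌎 does not, but this is an inference from the bytes, not something confirmed in this thread.

```python
def describe(title: str) -> dict:
    """Code points plus UTF-8 and UTF-16 (big-endian, no BOM) bytes of a title."""
    return {
        "code_points": " ".join(f"U+{ord(c):04X}" for c in title),
        "utf8": title.encode("utf-8").hex(" "),
        "utf16be": title.encode("utf-16-be").hex(" "),
        # Anything above U+FFFF needs a surrogate pair in UTF-16.
        "needs_surrogate_pair": any(ord(c) > 0xFFFF for c in title),
    }


for title in ("\U0001F30E", "\u2764\ufe0f"):  # 🌎 and ❤️
    print(describe(title))
```

For 🌎 this gives UTF-8 f0 9f 8c 8e and UTF-16BE d8 3c df 0e; for ❤️ it gives e2 9d a4 ef b8 8f and 27 64 fe 0f.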
Does this help at all? |
TBH I'm not sure about UTF-16 support. We certainly don't have code specific to this encoding type, so this could very well be the problem here. Maybe you could give mp3tag a try (https://www.mp3tag.de/en/download.html)? Or can you configure your tagger to use only UTF-8? |
Michael, leave it with me for a bit... I'm going to investigate how much of my music library is encoded using UTF-16, to see whether it's an outlier. I don't know of a way to configure which encoding is used; it's all handled under the covers by some low-level library, but I will check. I suspect there might be a difference between the way Python and Perl handle the encoding of certain characters, which would be a bug in either Python or Perl, but I'll do some research first. I am pretty busy for the next few days, so I hope to get back to you by the end of the week. Thank you for all your help so far. |
FWIW: ExifTool (https://exiftool.org) is written in Perl, too, which would support your theory. I'd rather think it's a shortcoming of Perl, as my tagger reads the data correctly. |
The Coldplay album Music of the Spheres has tracks titled with unusual Unicode characters, such as ❤️.
The track metadata, including titles, has been downloaded from MusicBrainz and seems to be encoded correctly.
However, after rescanning, the tracks table in the Squeezebox Server/Lyrion database holds an incorrectly encoded title, which cannot be decoded to a Python string.
The url/filename, which is also set to the track title from MusicBrainz, is encoded correctly.
Track 9, entitled 🌎, has this problem, while others such as ❤️ above (track 6) work fine.
Data printed from the tracks table using SELECT id, title, url, with an sqlite3 text_factory that returns repr(exception) if a UnicodeError is raised while decoding the bytes to a string:
track: (9, "UnicodeDecodeError('utf-8', b'\xed\xa0\xbc\xed\xbc\x8e', 0, 1, 'invalid continuation byte')", 'file:///srv/share2/music/Coldplay/Music%20of%20the%20Spheres/09%20-%20%F0%9F%8C%8E.mp3')
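A minimal reconstruction of that diagnostic, assuming the table and column names from the query above (tracks: id, title, url) and a local copy of library.db; lenient_text and dump_tracks are hypothetical names. Python's sqlite3 module passes the raw bytes of every TEXT value to the connection's text_factory, so a decoding failure comes back as a string instead of aborting the whole query.

```python
import sqlite3


def lenient_text(raw: bytes) -> str:
    """text_factory: decode as UTF-8, or return the error's repr on failure."""
    try:
        return raw.decode("utf-8")
    except UnicodeError as exc:
        return repr(exc)


def dump_tracks(con: sqlite3.Connection) -> list:
    """Return (id, title, url) rows, tolerating badly encoded text."""
    con.text_factory = lenient_text
    return con.execute("SELECT id, title, url FROM tracks ORDER BY id").fetchall()
```

Usage would be along the lines of dump_tracks(sqlite3.connect("library.db")), run against a copy rather than the live database.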
From this I assume the correct byte encoding is b'\xf0\x9f\x8c\x8e', which in Python correctly decodes using UTF-8 to the 🌎 character.
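The stored bytes fit a recognisable pattern: ED A0 BC and ED BC 8E are the UTF-16 surrogates U+D83C and U+DF0E, each encoded separately as a three-byte UTF-8 sequence (a form sometimes called CESU-8), rather than the combined code point U+1F30E being encoded as F0 9F 8C 8E. The Python sketch below reproduces that pattern and reverses it. to_surrogates, mangle and repair are hypothetical helpers for illustration, not anything in LMS or its scanner, and the idea that the scanner produces the data this way is an inference from the bytes alone.

```python
def to_surrogates(ch: str) -> str:
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    offset = ord(ch) - 0x10000
    return chr(0xD800 + (offset >> 10)) + chr(0xDC00 + (offset & 0x3FF))


def mangle(text: str) -> bytes:
    """Reproduce the suspected bug: UTF-8-encode each surrogate on its own."""
    units = "".join(to_surrogates(c) if ord(c) > 0xFFFF else c for c in text)
    return units.encode("utf-8", "surrogatepass")


def repair(raw: bytes) -> str:
    """Decode surrogate-encoded UTF-8 back to proper text.

    surrogatepass lets the lone surrogates through the UTF-8 decoder;
    round-tripping via UTF-16 pairs them up again. Valid UTF-8 input
    comes back unchanged.
    """
    units = raw.decode("utf-8", "surrogatepass")
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


stored = b"\xed\xa0\xbc\xed\xbc\x8e"        # title bytes from library.db
print(mangle("\U0001F30E") == stored)        # True
print(repair(stored) == "\U0001F30E")        # True
print(repair(stored).encode("utf-8").hex())  # f09f8c8e
```

Because characters inside the Basic Multilingual Plane pass through mangle untouched, a title such as ❤️ would come out as ordinary UTF-8 under this hypothesis, which is consistent with track 6 scanning fine.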
Please let me know if I can provide further information.
Thanks in advance for your help with this.