The full documentation of ftfy covers a lot more than this README. One place to start is the section "Fixing problems and getting explanations".

What users have said about it:

- “Saved me a large amount of frustrating dev work”
- “ftfy did the right thing right away, with no faffing about. Excellent work, solving a very tricky real-world (whole-world!) problem.”
- “I have no idea when I’m gonna need this, but I’m definitely bookmarking it.”

Here are some examples (found in the real world) of what ftfy can do:

ftfy can fix mojibake (encoding mix-ups), by detecting patterns of characters that were clearly meant to be UTF-8 but were decoded as something else:

    >>> import ftfy
    >>> ftfy.fix_text('âœ” No problems')
    '✔ No problems'

Does this sound impossible? It's really not. UTF-8 is a well-designed encoding that makes it obvious when it's being misused, and a string of mojibake usually contains all the information we need to recover the original string.

ftfy can fix multiple layers of mojibake simultaneously:

    >>> ftfy.fix_text('The Mona Lisa doesnÃƒÂ¢Ã¢â€šÂ¬Ã¢â€žÂ¢t have eyebrows.')
    "The Mona Lisa doesn't have eyebrows."

It can fix mojibake that has had "curly quotes" applied on top of it, which cannot be consistently decoded until the quotes are uncurled:

    >>> ftfy.fix_text("l’humanitÃ©")
    "l'humanité"

ftfy can fix mojibake that would have included the character U+A0 (non-breaking space), but the U+A0 was turned into an ASCII space and then combined with another following space:

    >>> ftfy.fix_text('Ã\xa0 perturber la rÃ©flexion')
    'à perturber la réflexion'
    >>> ftfy.fix_text('Ã perturber la rÃ©flexion')
    'à perturber la réflexion'

ftfy can also decode HTML entities that appear outside of HTML, even in cases where the entity has been incorrectly capitalized:

    >>> # by the HTML 5 standard, only 'P&Eacute;REZ' is acceptable
    >>> ftfy.fix_text('P&EACUTE;REZ')
    'PÉREZ'

These fixes are not applied in all cases, because ftfy has a strongly-held goal of avoiding false positives: it should never change correctly-decoded text to something else.

The following text could be encoded in Windows-1252 and decoded in UTF-8, and it would decode as 'MARQUɅ'. However, the original text is already sensible, so it is unchanged.

    >>> ftfy.fix_text('IL Y MARQUÉ…')
    'IL Y MARQUÉ…'
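To see why a string of mojibake usually carries enough information to recover the original, and why avoiding false positives still takes care, the mix-up can be reproduced with Python's standard library alone. The sketch below is not ftfy's implementation: `make_mojibake` and `naive_unmojibake` are hypothetical helpers written for this illustration, and they assume the only mistake involved is UTF-8 bytes being read as Windows-1252.

```python
# Sketch only: a naive mojibake round trip using the standard library.
# This is NOT how ftfy works internally; both helpers are hypothetical.


def make_mojibake(text: str, layers: int = 1) -> str:
    """Simulate the mix-up: UTF-8 bytes misread as Windows-1252, `layers` times."""
    for _ in range(layers):
        text = text.encode("utf-8").decode("windows-1252")
    return text


def naive_unmojibake(text: str, max_layers: int = 5) -> str:
    """Blindly reverse the mix-up until the round trip fails or changes nothing."""
    for _ in range(max_layers):
        try:
            candidate = text.encode("windows-1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # the text is not Windows-1252-decoded UTF-8; stop here
        if candidate == text:
            break  # pure ASCII: the round trip is a no-op
        text = candidate
    return text


# One layer: the check mark's three UTF-8 bytes become three separate characters.
garbled = make_mojibake("✔ No problems")   # 'âœ” No problems'
recovered = naive_unmojibake(garbled)      # '✔ No problems'

# Three layers of the same mistake are peeled off one at a time.
triple = make_mojibake("The Mona Lisa doesn’t have eyebrows.", layers=3)
untangled = naive_unmojibake(triple)

# A false positive: this text is already correct, but its Windows-1252 bytes
# happen to form valid UTF-8, so the blind round trip damages it.
false_positive = naive_unmojibake("IL Y MARQUÉ…")  # 'IL Y MARQUɅ'
```

The last case is the reason a blind round trip is not enough. A fixer has to judge whether the result is more plausible than the input before accepting it, which is the kind of decision ftfy makes and this sketch does not.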