Unicode

From VistrailsWiki

Latest revision as of 21:05, 19 September 2014

This page discusses the steps toward proper Unicode support in VisTrails and the issues involved.

Goal

VisTrails should be able to handle any kind of Unicode string. This means any string should be accepted as input from the console, the API, or GUI settings, and should be serialized and deserialized correctly. In practice, every string in the program should be a unicode object; although some places can accept a bytestring (e.g. from the API or from modules), it should be cast to unicode.

Uses of str should mostly disappear in favor of unicode (for text) and bytes (for binary data).
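A minimal sketch of the boundary conversion described above. The helper name `to_unicode` is hypothetical, and decoding incoming bytestrings as UTF-8 is an assumption; some sources may need a different encoding. On Python 2.6+ `bytes` is an alias of `str`, so the same check works on both Python 2 and Python 3.

```python
def to_unicode(value, encoding="utf-8"):
    """Return value as a unicode string.

    Bytestrings (e.g. received from the API or from a module) are decoded
    once, at the boundary; text is returned unchanged.
    """
    if isinstance(value, bytes):
        # Assumption: bytestrings are UTF-8 unless the caller says otherwise.
        return value.decode(encoding)
    return value
```

Decoding once, where data enters the program, keeps the rest of the code free of str()/unicode() casts.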

About Python 3

Proper Unicode support is a prerequisite for Python 3 support. Once VisTrails is fully Unicode-safe and uses unicode_literals, replacing unicode with six.text_type is straightforward.
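A sketch of what that looks like once unicode_literals is in place. The `text_type` alias below mirrors what `six.text_type` provides (so `six` itself is not required here), and `describe` is a hypothetical stand-in for a former `unicode(...)` call site.

```python
from __future__ import unicode_literals

# Compatibility alias; six.text_type provides the same thing.
try:
    text_type = unicode  # Python 2: the unicode type  # noqa: F821
except NameError:
    text_type = str  # Python 3: str is already unicode


def describe(value):
    # Formerly unicode(value); now runs unchanged on Python 2 and 3.
    return text_type(value)


# With unicode_literals, this literal is unicode on Python 2 as well.
label = "données"
```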

Issues

The GUI has no problem with Unicode: PyQt is fully Unicode-safe. Since we are using API v2, PyQt already returns (and accepts) native unicode objects. It used to return QString objects, which (I assume) is why there are str() casts everywhere in the code. There are probably too many of them to handle manually, so I propose replacing all of them with unicode() automatically and dealing with special cases afterwards.

Serializing to and deserializing from XML is not a problem, since XML documents are already Unicode. Properly encoding filenames in them might require some work (storing properly URL-encoded URLs instead is probably a good idea); see Locators.
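A sketch of the URL-encoding idea for filenames stored in XML, assuming POSIX-style unicode paths and that percent-encoding their UTF-8 bytes is the chosen scheme. It uses Python 3's `urllib.parse` (on Python 2 the equivalent functions live in `urllib`); the function names are hypothetical, and the actual locator format is still undecided.

```python
from urllib.parse import quote, unquote

FILE_PREFIX = "file://"


def path_to_url(path):
    """Percent-encode a unicode path so it can be stored as an ASCII-safe URL."""
    # quote() accepts bytes; encoding explicitly makes the UTF-8 choice visible.
    # "/" is left unescaped so the path structure stays readable.
    return FILE_PREFIX + quote(path.encode("utf-8"), safe="/")


def url_to_path(url):
    """Inverse of path_to_url(): strip the scheme and decode %XX as UTF-8."""
    if not url.startswith(FILE_PREFIX):
        raise ValueError("not a file:// URL: %r" % url)
    return unquote(url[len(FILE_PREFIX):], encoding="utf-8")
```

Windows paths (drive letters, backslashes) would need extra handling that this sketch does not attempt.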

Guidelines

  • Please familiarize yourself with Unicode and encodings, and do not put casts in the code unless they are necessary. Be aware that some Python modules (like tarfile, zipfile, ...) only accept str, so some compatibility code may be needed around them.
  • Stop using str() when unnecessary. Document why you use str(), unicode() or bytes() when you do.
  • Stop using type() at all.
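The last guidelines can be illustrated with a sketch. A `type(x) == str` comparison misses unicode objects and str subclasses, whereas `isinstance()` accepts both; and for modules that only accept `str` on Python 2, encoding at the call site keeps the workaround local and documented. The helper names and the use of the filesystem encoding are assumptions for illustration.

```python
import sys

try:
    string_types = (str, unicode)  # Python 2  # noqa: F821
except NameError:
    string_types = (str,)  # Python 3


def is_string(value):
    """Preferred over type(value) == str: accepts unicode and subclasses."""
    return isinstance(value, string_types)


def to_native_str(text):
    """Convert text for APIs that require the native str type.

    On Python 2, str is a bytestring, so text is encoded with the filesystem
    encoding (an assumption that suits file names passed to tarfile/zipfile).
    On Python 3, str is already unicode and is returned unchanged.
    """
    if sys.version_info[0] >= 3:
        return text
    # str(text) would raise UnicodeEncodeError on non-ASCII text; encode explicitly.
    encoding = sys.getfilesystemencoding() or sys.getdefaultencoding()
    return text.encode(encoding)


class Name(str):
    """Hypothetical str subclass showing why type() checks are fragile."""
```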

Tasks

Mass-replacing str() casts

From (regex)                       To
(?<!def )(?<![a-z_.])str\(         unicode(
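A sketch of applying this replacement with Python's `re` module. The two lookbehinds skip definitions (`def str(`) and calls preceded by a lowercase letter, underscore, or dot (`obj.str(`, `mystr(`); bare `str` without a following parenthesis, as in `isinstance(x, str)`, is not touched. Since the character class only covers lowercase letters, a name such as `Xstr(` would still match, so the result needs review. The `rewrite_file` helper and the UTF-8 source encoding are assumptions.

```python
import io
import re

# Pattern from the table above: a str( call that is not a definition
# and is not part of a longer lowercase identifier or attribute access.
STR_CALL = re.compile(r"(?<!def )(?<![a-z_.])str\(")


def replace_str_casts(source):
    """Return source with every matching str( call replaced by unicode(."""
    return STR_CALL.sub("unicode(", source)


def rewrite_file(path):
    """Rewrite one Python file in place; return the number of replacements."""
    with io.open(path, encoding="utf-8") as f:
        source = f.read()
    new_source, count = STR_CALL.subn("unicode(", source)
    if count:
        with io.open(path, "w", encoding="utf-8") as f:
            f.write(new_source)
    return count
```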

Locators

Tests

We need to make sure everything works with Unicode, so it's probably a good idea to insert non-ASCII characters here and there in the tests (filenames, versions, parameters...).
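A sketch of such tests using the standard `unittest` module. It is written for Python 3, where literals are unicode (on Python 2, `from __future__ import unicode_literals` would be needed). The test names and the sample string are hypothetical, and the XML test uses `xml.etree.ElementTree` rather than VisTrails' own serialization code.

```python
import os
import shutil
import sys
import tempfile
import unicodedata
import unittest
import xml.etree.ElementTree as ET

NON_ASCII = "Ünïcødé ☃ 日本語"


class NonAsciiRoundTripTest(unittest.TestCase):
    def test_xml_attribute_round_trip(self):
        # A parameter value survives serialization to UTF-8 bytes and back.
        elem = ET.Element("parameter", {"name": "label", "value": NON_ASCII})
        data = ET.tostring(elem, encoding="utf-8")
        self.assertEqual(ET.fromstring(data).get("value"), NON_ASCII)

    def test_non_ascii_filename(self):
        try:
            NON_ASCII.encode(sys.getfilesystemencoding())
        except UnicodeEncodeError:
            self.skipTest("filesystem encoding cannot represent the test name")
        tmpdir = tempfile.mkdtemp()
        try:
            name = NON_ASCII + ".vt"
            with open(os.path.join(tmpdir, name), "wb") as f:
                f.write(b"test")
            # Some filesystems (e.g. HFS+) return decomposed names; compare in NFC.
            listed = [unicodedata.normalize("NFC", n) for n in os.listdir(tmpdir)]
            self.assertIn(unicodedata.normalize("NFC", name), listed)
        finally:
            shutil.rmtree(tmpdir)
```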