Python scripts in unicode

How to write Python scripts that contain non-ASCII (Unicode) characters.
By default, the Python 2 interpreter assumes that the script it runs is encoded in ASCII. To use Unicode characters in the source, you have to declare the file's encoding.
To do this, put a comment matching the regexp "coding[:=]\s*([-\w.]+)" on the first or second line of the script.
Some examples:
  • # -*- coding: utf-8 -*-
  • # encoding= iso-8859-2
  • # this file uses encoding: utf-8
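The declaration rule can be tried out with the re module. The sketch below is not the interpreter's own implementation: find_declared_encoding is a hypothetical helper, written in Python 3 syntax, that applies the regexp quoted above to the comment part of the first two lines only. The real interpreter applies somewhat stricter rules, and this helper would be fooled by a "#" inside a string literal.

```python
import re

# The regexp quoted in the text above.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def find_declared_encoding(source):
    """Return the encoding named in a comment on line 1 or 2, else None.

    Hypothetical helper, for illustration only.
    """
    for line in source.splitlines()[:2]:
        hash_pos = line.find("#")
        if hash_pos == -1:
            continue  # no comment on this line
        match = CODING_RE.search(line[hash_pos:])
        if match:
            return match.group(1)
    return None


examples = [
    "# -*- coding: utf-8 -*-\nprint('x')\n",
    "#!/usr/bin/env python\n# encoding= iso-8859-2\n",
    "# this file uses encoding: utf-8\n",
    "import os\nimport sys\n# coding: utf-8 (third line, ignored)\n",
]
for source in examples:
    print(find_declared_encoding(source))
```

The first three sources yield utf-8, iso-8859-2 and utf-8. The last one yields None, because its declaration sits on the third line.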
The following two programs demonstrate the difference between a script that does not declare its encoding and one that does.
They also show the u prefix, which marks a unicode string literal, that is, a string whose bytes are decoded using the declared source encoding.
Source: (unicode2-1.py)
    text = u"Příliš zluťoučký koníček, αβψ"
    print len(text)   # wrong length: every UTF-8 byte is counted as one character
    #print text.encode('utf-8')   # this would print gibberish
stdout:
40
stderr:
sys:1: DeprecationWarning: Non-ASCII character '\xc5' in file unicode2-1.py on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
Run time: 20.7 ms
Source: (unicode2-2.py)
    # encoding: utf-8

    text = u"Příliš zluťoučký koníček, αβψ"
    print len(text)
    print text.encode('utf-8')
stdout:
29
Příliš zluťoučký koníček, αβψ
Run time: 21.3 ms
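The two lengths printed above, 40 and 29, can be reproduced side by side. This is a minimal sketch in Python 3 syntax (an assumption, since the listings here are Python 2), where every string literal is already unicode. It suggests that 40 is the number of bytes in the UTF-8 encoding and 29 the number of characters. The Latin-1 step only illustrates what happens when each byte is read as one character; it is not a claim about the interpreter's internals.

```python
# encoding: utf-8
text = "Příliš zluťoučký koníček, αβψ"   # in Python 3 every str is unicode

utf8_bytes = text.encode("utf-8")

char_count = len(text)         # number of characters (code points)
byte_count = len(utf8_bytes)   # number of bytes in the UTF-8 encoding

# Reading each byte as one character, as a single-byte encoding such as
# Latin-1 does, reproduces the "wrong length".
misread = utf8_bytes.decode("latin-1")

print(char_count)     # 29
print(byte_count)     # 40
print(len(misread))   # 40
```

The 11 non-ASCII characters in the sample text take two bytes each in UTF-8, which accounts for the difference of 11 between the two counts.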
The program below shows in more detail how unicode strings and plain byte strings behave.
Source: (unicode2-3.py)
    # encoding: utf-8

    # this is OK: the u prefix makes a unicode string
    text1 = u"Příliš zluťoučký koníček, αβψ"
    print len(text1)
    print text1.encode('utf-8')

    # without the u prefix this is a byte string; it is not decoded by default
    text2 = "Příliš zluťoučký koníček, αβψ"
    print len(text2)   # counts bytes
    print len(text2.decode('utf-8'))   # decode it, and it works
    print text2.encode('utf-8')   # problem
stdout:
29
Příliš zluťoučký koníček, αβψ
40
29
stderr:
Traceback (most recent call last):
  File "unicode2-3.py", line 12, in <module>
    print text2.encode('utf-8')   # problem
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 1: ordinal not in range(128)
Run time: 88.9 ms
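Line 12 of unicode2-3.py fails because, in Python 2, calling encode on a byte string first decodes it implicitly with the default ASCII codec, and that hidden step stops at the first non-ASCII byte. The sketch below makes the hidden step explicit. It is written in Python 3 syntax (an assumption; Python 3 bytes objects have no encode method, so the decode has to be spelled out) and is meant only to reproduce the error from the traceback.

```python
# encoding: utf-8
data = "Příliš zluťoučký koníček, αβψ".encode("utf-8")   # a byte string

error = None
try:
    # The step Python 2 performs silently before encoding a byte string.
    data.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# Decoding with the encoding the bytes were really written in works.
text = data.decode("utf-8")
print(len(text))
```

The error reports byte 0xc5 at position 1, the first byte of "ř", as in the traceback above. The usual remedy is to decode byte strings explicitly with their real encoding, or to use unicode literals from the start.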