[Skip to Content]
Copyright © 2011 Marcin Jankowski. Original version is available at http://www.marcin.jankowski.com/oracle/utf8-convert.html

Problem with Dbms_Xmlgen.Convert function

by Marcin Jankowski
1st February 2011 AD
Version 1.2

Main purpose of Oracle's Dbms_Xmlgen.Convert function is to convert HTML markers (for example &) to regular characters and vice versa.

Query 1
With T As (
  Select 'Johnny "Mad" McDonald' As Sample_Text From Dual 

Select Sample_Text, 
       Dbms_Xmlgen.Convert(Sample_Text,0) As Converted_Text 
From   T

Results for query 1
SAMPLE_TEXT CONVERTED_TEXT
Johnny "Mad" McDonald Johnny "Mad" McDonald
Query 2
With T As (
  Select 'Shop '||chr(38)||'quot;Toys '||chr(38)||'amp; Furnitures'||chr(38)||'quot;' As Sample_Text From Dual 

Select Sample_Text, 
       Dbms_Xmlgen.Convert(Sample_Text,1) As Converted_Text 
From   T

Results for query 2
SAMPLE_TEXT CONVERTED_TEXT
Shop "Toys & Furnitures" Shop "Toys & Furnitures"

There is a problem when applying this function to strings with UTF-8 characters. It happens only when:

When above conditions are met the output seems to be missing the last byte. It can be observed by dumping the result.

Query 3
Select Dump('στο') As My_Dumped_Val From Dual

Results for query 3
MY_DUMPED_VAL
Typ=96 Len=6: 207,131,207,132,206,191
Query 4
Select Dump(Dbms_Xmlgen.Convert('στο',1)) As My_Conversion From Dual

Results for query 4
MY_CONVERSION
Typ=1 Len=6: 207,131,207,132,206

The last byte (191) is missing after the conversion.

It does not happen when the last character is not UTF-8 character.

Query 5
Select Dump('στοx') As My_Dumped_Val From Dual

Results for query 5
MY_DUMPED_VAL
Typ=96 Len=7: 207,131,207,132,206,191,120

It does not happen when the conversion is applied from characters to HTML codes (second parameter of the function is 0).

Query 6
Select Dump(Dbms_Xmlgen.Convert('στο',0)) As My_Conversion From Dual

Results for query 6
MY_CONVERSION
Typ=1 Len=6: 207,131,207,132,206,191

It can cause some problems when passing the string to other applications. For example Coldfusion raises error related to transliteration of the UTF-8 characters.

So far I haven't found solution other than writing my own "convert" function.

Update

This issue was marked as Oracle bug 5259741 ('DBMS_XMLGEN.CONVERT trims output for multibyte database'). It was corrected with 10.2.0.4 Patch set.