Problem with Dbms_Xmlgen.Convert function
by Marcin Jankowski
1st February 2011 AD
Version 1.2
Main purpose of Oracle's Dbms_Xmlgen.Convert function is to convert HTML markers (for example &) to regular characters and vice versa.
Select 'Johnny "Mad" McDonald' As Sample_Text From Dual
)
Select Sample_Text,
Dbms_Xmlgen.Convert(Sample_Text,0) As Converted_Text
From T
SAMPLE_TEXT | CONVERTED_TEXT |
---|---|
Johnny "Mad" McDonald | Johnny "Mad" McDonald |
Select 'Shop '||chr(38)||'quot;Toys '||chr(38)||'amp; Furnitures'||chr(38)||'quot;' As Sample_Text From Dual
)
Select Sample_Text,
Dbms_Xmlgen.Convert(Sample_Text,1) As Converted_Text
From T
SAMPLE_TEXT | CONVERTED_TEXT |
---|---|
Shop "Toys & Furnitures" | Shop "Toys & Furnitures" |
There is a problem when applying this function to strings with UTF-8 characters. It happens only when:
- The last character is UTF-8 character.
- Conversion is applied from HTML codes to characters (second parameter of the function is 1)
When above conditions are met the output seems to be missing the last byte. It can be observed by dumping the result.
MY_DUMPED_VAL |
---|
Typ=96 Len=6: 207,131,207,132,206,191 |
MY_CONVERSION |
---|
Typ=1 Len=6: 207,131,207,132,206 |
The last byte (191) is missing after the conversion.
It does not happen when the last character is not UTF-8 character.
MY_DUMPED_VAL |
---|
Typ=96 Len=7: 207,131,207,132,206,191,120 |
It does not happen when the conversion is applied from characters to HTML codes (second parameter of the function is 0).
MY_CONVERSION |
---|
Typ=1 Len=6: 207,131,207,132,206,191 |
It can cause some problems when passing the string to other applications. For example Coldfusion raises error related to transliteration of the UTF-8 characters.
So far I haven't found solution other than writing my own "convert" function.
Update
This issue was marked as Oracle bug 5259741 ('DBMS_XMLGEN.CONVERT trims output for multibyte database'). It was corrected with 10.2.0.4 Patch set.