Parsing XML with German specific characters

Hello,
I have issue with parsing XML using XMLParser when XML contains German specific characters, for example ü

This code:
Code Block swift
import Foundation
let xml = "<Xml><Tag>Martin Hübner</Tag><Tag>Value</Tag></Xml>"
let data = xml.data(using: .utf8)!
let parser = XMLParser(data: data)
let parserDelegate = ParserDelegate()
parser.delegate = parserDelegate
parser.parse()
class ParserDelegate: NSObject, XMLParserDelegate {
  func parser(_ parser: XMLParser, foundCharacters string: String) {
    print("foundCharacters: \(string)")
  }
}

has following output in Playground:
Code Block
foundCharacters: Martin H
foundCharacters: übner
foundCharacters: Value

It looks to me like that XMLParser is not able to parse XML with German characters correctly, because with following XML:
Code Block swift
let xml = "<Xml><Tag>Hello World</Tag><Tag>Value</Tag></Xml>"

output is correct:
Code Block
foundCharacters: Hello World
foundCharacters: Value

Do you have any idea how to solve this issue?

Thank you
Accepted Answer

Do you have any idea how to solve this issue?

In XMLParser, parser(_:foundCharacters:) would be called separately in many cases other than containing German characters.

Code Block
import Foundation
let xml = "<Xml><Tag>An &apos;example&apos;</Tag><Tag>Value</Tag></Xml>"
let data = xml.data(using: .utf8)!
let parser = XMLParser(data: data)
let parserDelegate = ParserDelegate()
parser.delegate = parserDelegate
parser.parse()
class ParserDelegate: NSObject, XMLParserDelegate {
func parser(_ parser: XMLParser, foundCharacters string: String) {
print("foundCharacters: \(string)")
}
}


Outputs:
Code Block
foundCharacters: An
foundCharacters: '
foundCharacters: example
foundCharacters: '
foundCharacters: Value


You need to write some logic to connect strings where you want a single string.


An example:
Code Block
import Foundation
let xml = "<Xml><Tag>Martin Hübner</Tag><Tag>Value</Tag></Xml>"
//let xml = "<Xml><Tag>An &apos;example&apos;</Tag><Tag>Value</Tag></Xml>"
let data = xml.data(using: .utf8)!
let parser = XMLParser(data: data)
let parserDelegate = ParserDelegate()
parser.delegate = parserDelegate
parser.parse()
class ParserDelegate: NSObject, XMLParserDelegate {
var textForTag: String? = nil
func parser(_ parser: XMLParser, foundCharacters string: String) {
print("foundCharacters: \(string)")
textForTag? += string
}
func parser(_ parser: XMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: [String : String] = [:]) {
if elementName == "Tag" {
textForTag = ""
}
}
func parser(_ parser: XMLParser, didEndElement elementName: String, namespaceURI: String?, qualifiedName qName: String?) {
if elementName == "Tag" {
print("textForTag=\(textForTag ?? "Something wrong")")
textForTag = nil
}
}
}


Outputs:
Code Block
foundCharacters: Martin H
foundCharacters: übner
textForTag=Martin Hübner
foundCharacters: Value
textForTag=Value


I don't see any problem. You can't just look at characters. Those are always supposed to be concatenated. You have to look at tags. When you find a start tag, then you start collecting characters. You append all the characters together until you get to that start tag's end tag. And if you find a new tag before you get to the end tag, well then you've got a new tree. You will have to connect the characters for the inner tag. Then, when you get back to the outer tag, you'll start a new text node (assuming you want to do everything correctly).
Parsing XML with German specific characters
 
 
Q