8/30/13

SAX Parser truncation problems

Have you ever met with strange problems regarding SAX parser when textual content seems to be truncated? When it seems that the parser transmits only a fragment of the content which is inside an XML element. Maybe not. Maybe yes but haven't noticed.

Parser reads blocks of stream and it may call characters method more than one times. Well, it's written in the Javadoc and it's quite logical. If I have a very long text content, it couldn't had been processed in one go. It has to be split into parts.

So, rather than assigning the content to a simple string, use concatenation instead and evaluate the content on endElement.

By the way, the magical number is 2048. The parser implementation typically uses this block size. Unfortunately it's a kind of thing which easily creeps under the radar of tests. Nobody writes tests for long data.

See also this on Stackoverflow.