SAX Parser truncation problems

Have you ever run into a strange problem with a SAX parser where textual content seems to be truncated, as if the parser delivered only a fragment of the text inside an XML element? Maybe not. Or maybe you have, but you haven't noticed.

The parser reads the stream in blocks, and it may call the characters method more than once for a single element. This is documented in the Javadoc and it's quite logical: very long text content can't be processed in one go, so it has to be split into parts.

So, rather than assigning the content to a plain string in characters, where each call overwrites the previous chunk, append every chunk to a buffer and evaluate the accumulated content in endElement.
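A minimal sketch of this accumulation pattern, assuming a hypothetical element named `note` whose text we want to collect. The class name `NoteTextHandler` and the helper `parseNote` are made up for illustration; only the `DefaultHandler` callbacks and the JAXP parser calls come from the standard library.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of a <note> element.
class NoteTextHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private String noteText;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("note".equals(qName)) {
            buffer.setLength(0); // start with an empty buffer for each <note>
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append instead of assigning.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("note".equals(qName)) {
            noteText = buffer.toString(); // the content is complete only here
        }
    }

    // Hypothetical helper: parses the XML and returns the text of the last <note>.
    static String parseNote(String xml) {
        try {
            NoteTextHandler handler = new NoteTextHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.noteText;
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Build a text long enough that the parser is likely to deliver it in several chunks.
        StringBuilder longText = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            longText.append('x');
        }
        String parsed = parseNote("<note>" + longText + "</note>");
        System.out.println(parsed.length());
    }
}
```

Whether the parser actually splits a given text depends on the implementation, which is why the handler appends unconditionally: it behaves the same whether characters is called once or many times.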

By the way, the magic number is 2048: the parser implementation typically uses this block size. Unfortunately, this is the kind of bug that easily slips under the radar of tests, because nobody writes tests for long data.

See also the related discussion on Stack Overflow.