Hacking Apache POI

In the near future, presumably, I'll have to read and process complex xlsx documents containing a lots of formulas. Jexcelapi can't even open those kinds documents, so I turned to Apache POI. Unfortunately, POI also have problems with some formulas becase support of this field is not fully laboured.

[Update 2010.03.10: After some experimentations I found that Apache POI can read and process new Excel (xlsx) documents including calculating very complex formulas: Basically, cell.getStringCellValue() gives the evaluated outcome. This way, I didn't have to hack POI. That's great!]

I choosed to start two methods:
  1. Knowing exact static structure of the Excel template document, I do problematic calculations in the reader program itself, eliminating POI's formula support.
  2. I study Apache POI's internal structure and try to make up the missing functionality. Unfortunately, POI isn't amendable in a non-invasive way with registering components at certain extension points, so I had to dig in the source code in order to make modifications. Here are the basic experiences regarding setting up the POI sources locally:

Sources can be obtained from subversion http://svn.apache.org/repos/asf/poi/trunk/ or from direct download. There are some "how to build info" on the homepage but it may be useful to record my own experiences, because there are some tricks. Downloaded source has pdf and html docs, but SVN repo contains those in xml format which can be converted by running docs Ant target after installing Apache Forrest v0.5.1 (unzip & set FORREST_HOME system variable to the forrest's root directory). I had to edit forrest.build.xml at line 630, changing failonerror value to false, otherwise the build failed. Place of generated docs is written on the console after a successful build.

For running dist target, JDepend jar have to be on the classpath and setting -Xmx512m VM argument is needed. It completes in 7 minutes for me but no need to run it regularly, just to see if everything can go smoothly. Target jar completes in 45 seconds from scratch and makes all needed jars.

For building with an interactive IDE, add jars to the project from lib and ooxml-lib directories. Setting up JUnit3 may be also needed for running test cases. When running unit tests, add test-data directory to the classpath or specify POI.testdata.path system property.

Target maven-poms creates jar and POM files for Maven and a Unix shell script to install the artifacts in the build/dist directory. I created a DOS batch file with commands like this:
call mvn install:install-file -Dfile=poi-scratchpad-@VERSION@.jar -DpomFile=poi-scratchpad-@VERSION@.pom
I ommited date timestamp info from the artifacts' name. build.xml must have been modified in the jar target to ommit timestamp postfixes. maven-poms also must have been modified to copy the new maven-deploy batch file. version.id property may be modified in build.xml to flag the forked version.

Apache POI is now ready to be hacked.


Json-lib vs Hibernate

Subtitle: java.util.Date vs java.sql.Timestamp

One of my projects has a data path where Hibernate entities are converted to Json objects by using json-lib. I have some special formatters, e.g. for Date objects for having an easy readable format instead of this:


A JsonConfig object can be suited by special formatters which later will be given to the fromObject method:
JsonConfig cfg = new JsonConfig();  
java.util.Date.class, new JsonValueProcessor() {
private static final SimpleDateFormat SDF =
new SimpleDateFormat("yyyy-MM-dd");
public Object processArrayValue(Object value, JsonConfig c) {
return format((Date)value);
public Object processObjectValue(
String key, Object v, JsonConfig c) {
return format((Date)v);
public static String format(Date date) {
return SDF.format(date);
JSONObject json = JSONObject.fromObject(objToConvert, cfg);

The problem is, that the code above sometimes doesn't work, because json-lib's default search mechanism regarding value formatters is based on exact class match, however, Hibernate sometimes replaces values for something else, say java.util.Date to java.sql.Timestamp.

It's no matter when using final classes, but we simply cannot apply substitution principle (LSP) generally when using getClass-match. In this case, fromObject won't convert descendants properly when we register a formatter for the superclass. An ugly solution is to register formatters for all descendant classes explicitly but this violates Open Closed Principle. When a new descendant appears, we must amend the code which registers the formatters.

Fortunately it's possible to replace the default search mechanisn too:
new JsonValueProcessorMatcher() {
public Object getMatch(Class target, Set set) {
if( target != null && set != null) {
for(Object obj : set) {
Class c = (Class) obj;
if(c.isAssignableFrom(target)) return c;
return null;

target is the runtime class type of the object to convert, set contains the registered formatters' keys. getMatch contract method returns the key for the appropriate formatter or null if no such one.

This solution still have drawbacks. Say, if I register java.util.Date and java.sql.Timestamp too, the matcher will randomly find them, based on the order of the keys in the set. So, some logic is missing which searches for closest match or something like that. But it's good for a demo code.



In an intranet production environment we have running a Glassfish v2 appserver with several J2EE applications which all use JexcelApi, a.k.a JXL, which is an open source library for accessing, generating or manipulating Microsoft Excel documents. We use version 2.6.3 of JXL because it's the recent one in the Maven repository which we use, however, at the official JXL site there are newer versions. Additionally we have log4j and Java Commons Logging (JCL), ignoring Glassfish's JSR-47 Java Util Logging (JUL) facility.

Application #1 uses purely log4j and gets its log4j.xml config from a custom location.
Application #2 runs Java Commons Logging with no explicite configuration file given, so JCL uses the default JUL facility of the appserver.

Application #1 had been running for a long time without problems but when we installed #2 we realized that a jxl.log file had been created in the glassfish/domain/domain1/config directory and it's rapidly growing. As it happens, we realized that when the partition had been fulled and all enterprise applications had been badly crashed.

I looked into the JexcelApi source and saw that it has a log4.xml which set everything into debug mode and set up a file appender for file jxl.log. It was clear that it's a demo config file and because of the JCL configuration mechanism, the Log4 DOM Configurator gained control and did its job. As Andy Khan wrote "The sample log4j file included along with the distribution is only indicative" and as Kevin replied at the Yahoo JexcelApi group back in 2007: "I think the problem occurs when using jexcelapi in an environment that where multiple logging frameworks are used. Some libraries use log4j, some use commons logging and some use the jdk logging framework. The ones that use log4j or commons logging sometimes look for a log4j.xml on the classpath. When jexcelapi is jarred up, the log4j.xml gets stuck in the "root" of the jar and often gets picked up by log4j or commons logging. Since the default log4j config of jexcelapi is so broad, it ends up overwriting the policies of other libraries." Yes, I can confirm it's definitively a problem.

So, the narrower task was to fix this particular problem but the broader task was to prepare for those cases when a jar in a j2ee application contains a log4j config for any reasons.
  • It's a good fix to remove log4j.xml from the jar, but say when we talk about Maven build system, it's not a trivial task to dig into an artifact and change it. What's more, who want to care about changing third party dependencies even when using Ant? Fortunately, later versions of JXL don't contain this config file so it could be a good fix to use one of them. Conclusion for component builders to not to put log4j (and any) config into the root of the artifact.
  • However, it could happen that it's not enough to change the jar to a newer one, make a new build and deploy to the appserver, because the appserver may hold the configured loggers running and some jar files in a devious working directory and couldn't release them when undeploying. Additionally, when it restarts it may pick up again the old wrong jars. Particularly, we have had the old jar files in the application/j2ee-apps and the webapps' directory of the Glassfish's domain directory and couldn't delete it even by hand until the appserver had been stopped.
  • As Application #1 showed us, it may be a rough but useful practice to get the log4j config from a custom location by giving an absolute file name in an environment variable. (-D argument of the JVM.) But this defends only the particular application. And when we don't want to use log4j in that application we also have to give a log4j config just for hide the corrupted one, which is unacceptable.
  • A good practice according to me, to follow Andreas Shaefer's tip regarding set up log4j logging in Glassfish in his blog entry: To the hell with JDK logging II.. In our case it prevented Application #1 from corrupt logging by placing a log4j.xml to the very front of the classpath, however, it enabledApplication #2 to use its special config.
Otherwise, a correct and useful logging tip collection would be very handsome for J2EE environment, but not a one which says that every logger instance method invocation should be an EJB call.


JUM VIII. - IX. - X.

In the recent months there were a few Java-related microconferences organized at Budapest.

At the eighth occassion of the Java User Meeting (JUM) in January we had a presentation about OSGi by local guys and about Flex by Cornel Creanga from the Romanian Adobe office. I've read his blog and I must say life of an evangelist can be very cool (if there's no wife or girlfriend:)). Visiting Java and Adobe user groups all around in Europe, it's great.

At the ninth occasion in March we had three presentations: introduction to the Maven build system, SOAP testing with SOAPui and various other components, and OLAP architectures. The latter was driven by the magnificant prezi presentation engine, which is a very young Hungarian startup project and which will be introduced soon at the Silicon Walley and I'm sure it will have a great success. Otherwise, I guess this time we beat down the highest participant number with about 40 attendees.

At the tenth occassion we had a presenter again from outside of the country: Corsin Decurtins from Switzerland came and talked about Rapid prototyping with object oriented databases. The discourse had also been presented at the Jazoon conference in 2007. He also talked about the conference itself which is organized in Zurich every summer. I think this time I skip it due to various domesticies. Other two discourses were about SOAP testing with Groovy and Amazon Web Services.

The next Java User Meeting will be held at the third wednesday of September. If you are an ardent evangelist or your company would appreciate if you would talk about their product, (and this product can be connected to Java somehow) contact with the JUM organizers. Budapest is still beautiful in September and we know the right pubs.


Budapest Newtech Meetup

The originally planned programme of the January Budapest Newtech Meetup contained some interesting presentations like the smart 3D visualizer solution (3dForAll) from a young Hungarian inventor or the Google App Engine which is an alternative server-side solution for those who don't have own server facilities. It can be used mainly for OpenSocial applications which is actual nowadays because of leader social network of Hungary (Wiw: Who is Who) had been announced to support opensocial applications in the near future. Further live demo would had been the Surface which is a Microsoft hardware but it had been also cancelled.

The event had been organized this time in a small art movie near the Buda-side of the Margareth bridge. About 130 people were present, some of them couldn't fit into the room.

The first demo was about the Zoomery application which is built on Microsoft Silverlight technology and used for deep-zooming a set of pictures or documents. Several kinds of documents can be converted to this special image format which is highly zoomable, filterable and orderable with the application itself.

Next presentation was about the Sensenet, a unique open source .NET based Enterprise Content Management System. They have some working project with this CMS at some larger company or organization in Hungary and abroad. It can be a deliberate solution for those who adhere Microsoft technology. In Java world, Alfresco is a similar open source Enterprise CMS.

UStream is a web-based video streaming solution. The company have some Hungarian relations and not least most development happen at Budapest. We saw a video stream on the movie screen in a web browser which was being captured by a 3G phone in the room and was passed through a server placed in the US. Not the technology was the main point but the possibility to manage the stream as a service: embedding it into own webpage, placing own trademark on the stream, managing user-access. Afterall, it was a great demonstration.

Last presentation was about TurulMeme, which is a webpage based on sharing facilities of Google Reader. People can register their sharings into the TurulMeme and they appear on the page as tagged, commentable, filterable news with information on how many other users read it and something like this. It wasn't a big technological breakthrough according to me, there are many more similar services on the web, but the main difference that this site focuses mainly on Hungarian news sources. As the developer said it had been completed in days (MS techonology) and it's rather fun than a serious competitor of other link sharing services.

Afterall, it was a great event in a great place with many people and hot vine.