Skip to main content

Hacking Apache POI

In the near future, presumably, I'll have to read and process complex xlsx documents containing a lots of formulas. Jexcelapi can't even open those kinds documents, so I turned to Apache POI. Unfortunately, POI also have problems with some formulas becase support of this field is not fully laboured.

[Update 2010.03.10: After some experimentations I found that Apache POI can read and process new Excel (xlsx) documents including calculating very complex formulas: Basically, cell.getStringCellValue() gives the evaluated outcome. This way, I didn't have to hack POI. That's great!]

I choosed to start two methods:
  1. Knowing exact static structure of the Excel template document, I do problematic calculations in the reader program itself, eliminating POI's formula support.
  2. I study Apache POI's internal structure and try to make up the missing functionality. Unfortunately, POI isn't amendable in a non-invasive way with registering components at certain extension points, so I had to dig in the source code in order to make modifications. Here are the basic experiences regarding setting up the POI sources locally:

Sources can be obtained from subversion http://svn.apache.org/repos/asf/poi/trunk/ or from direct download. There are some "how to build info" on the homepage but it may be useful to record my own experiences, because there are some tricks. Downloaded source has pdf and html docs, but SVN repo contains those in xml format which can be converted by running docs Ant target after installing Apache Forrest v0.5.1 (unzip & set FORREST_HOME system variable to the forrest's root directory). I had to edit forrest.build.xml at line 630, changing failonerror value to false, otherwise the build failed. Place of generated docs is written on the console after a successful build.

For running dist target, JDepend jar have to be on the classpath and setting -Xmx512m VM argument is needed. It completes in 7 minutes for me but no need to run it regularly, just to see if everything can go smoothly. Target jar completes in 45 seconds from scratch and makes all needed jars.

For building with an interactive IDE, add jars to the project from lib and ooxml-lib directories. Setting up JUnit3 may be also needed for running test cases. When running unit tests, add test-data directory to the classpath or specify POI.testdata.path system property.

Target maven-poms creates jar and POM files for Maven and a Unix shell script to install the artifacts in the build/dist directory. I created a DOS batch file with commands like this:
call mvn install:install-file -Dfile=poi-scratchpad-@VERSION@.jar -DpomFile=poi-scratchpad-@VERSION@.pom
I ommited date timestamp info from the artifacts' name. build.xml must have been modified in the jar target to ommit timestamp postfixes. maven-poms also must have been modified to copy the new maven-deploy batch file. version.id property may be modified in build.xml to flag the forked version.

Apache POI is now ready to be hacked.

Comments

Popular posts from this blog

Client's transaction aborted

I've met the above error message using a Wicket 1.2 / EJB3 intranet application under Glassfish v2 . Here is the more particular head of the stack trace: javax.ejb.TransactionRolledbackLocalException: Client's transaction aborted at com.sun.ejb.containers.BaseContainer.useClientTx(BaseContainer.java:3394) at com.sun.ejb.containers.BaseContainer.preInvokeTx(BaseContainer.java:3274) at com.sun.ejb.containers.BaseContainer.preInvoke(BaseContainer.java:1244) at com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:195) at com.sun.ejb.containers.EJBLocalObjectInvocationHandlerDelegate.invoke(EJBLocalObjectInvocationHandlerDelegate.java:127) This exception raised on the integration server sometimes, randomly, for simple page fetch operations. After pressing reload on the browser, the operation was usually successful. I couldn't reproduce the failure on the local machine where I regularly restart the app server and

jxl.log

In an intranet production environment we have running a Glassfish v2 appserver with several J2EE applications which all use JexcelApi , a.k.a JXL, which is an open source library for accessing, generating or manipulating Microsoft Excel documents. We use version 2.6.3 of JXL because it's the recent one in the Maven repository which we use, however, at the official JXL site there are newer versions. Additionally we have log4j and Java Commons Logging (JCL), ignoring Glassfish's JSR-47 Java Util Logging (JUL) facility. Application #1 uses purely log4j and gets its log4j.xml config from a custom location. Application #2 runs Java Commons Logging with no explicite configuration file given, so JCL uses the default JUL facility of the appserver. Application #1 had been running for a long time without problems but when we installed #2 we realized that a jxl.log file had been created in the glassfish/domain/domain1/config directory and it's rapidly growing. As it happens, we

Brillien

There were three presentation as usual at the 14th Java User Meeting at Budapest yesterday, but this time I'll write only about the first one. Imre Fazekas had been invited from Debrecen to make a presentation about Brillien , their Java enterprise platform. It's an alternative way of modeling and implementing business logic in place of Java Enterprise (J2EE) Technology. It uses invasive programming model which means we have to extend classes from Brillien superclasses and we may use annotations. The whole system based on asynchronous XMPP communication and a set-context-flow concept. It's something like an application server, but uses a more logical and set-based approach. As the presenter said, it has very good speed and performance characteristics. Currently there are no IDE or other tools for editing models because this is a very young project, however a Maven plugin and a Maven archetype is accessible on the homepage. A very good introduction and tutorial can be also