2013-04-28

Tycho Test Trouble - Expectations and Realities

As part of my job, I have converted some of our Eclipse-based products to use the Tycho build system instead of PDE Build and various internal build tools. For the most part, this has been a pleasant experience; most of the trouble has been with the tests and getting them to work in the new environment. At EclipseCon Europe 2012, I talked about this in the presentation "Beware: Testing RCP Applications in Tycho can cause Serious Harm to your Brain".
One of the things I really like about Tycho is that when a build fails, you can re-build the failed module by running Tycho in just that module and expect the same results as if you had started the build again from the top parent POM.

But now I have a situation where this is not the case: when I run the (global) build from the top parent POM, the build fails consistently, whereas the "local" build in the failing module succeeds consistently! My expectations do not match reality! This is a particularly bad thing when it comes to build systems, as we really want to rely on them and not think too much about them in our daily work.

It has been very hard to find the reason behind this behavior, hence this blog entry, in the hope that others will not have to go through the same trouble.


A little Background

The bird's-eye view of the operation of Tycho - and Maven - is rather simple: first it reads the build information of all the modules in the build configuration, then it reorders them to satisfy the dependencies between them, and lastly it builds the modules in sequence (see the Maven documentation for the details). Tycho automatically adds the extra dependencies from the OSGi/Eclipse related build files such as MANIFEST.MF, feature.xml, category.xml, etc.

When you run your test plug-ins, you sometimes want to add additional plug-ins to the launch configuration - dependencies that Tycho cannot determine from the usual build configuration files. This can be for many different reasons: optional dependencies, RAP versus RCP differences, use of OSGi Declarative Services, use of the Equinox Extension Registry, use of Update Sites, use of JSR 223 and Buddy Class Loading, just to name a few. These extra dependencies can be declared relatively easily in the surefire section of pom.xml, as described on the Tycho Wiki.


<plugin>
  <groupId>org.eclipse.tycho</groupId>
  <artifactId>tycho-surefire-plugin</artifactId>
  <configuration>
    <bundleStartLevel>
      ...
    </bundleStartLevel>
    <dependencies>
      <dependency>
        <!-- RAP -->
        <type>eclipse-plugin</type>
        <artifactId>org.eclipse.rap.ui</artifactId>
        <version>0.0.0</version>
      </dependency>
      <dependency>
        <!-- Groovy support -->
        <type>eclipse-plugin</type>
        <artifactId>com.agetor.core.jsr223.groovy</artifactId>
        <version>0.0.0</version>
      </dependency>
      <dependency>
        <!-- Logging via DS -->
        <type>eclipse-plugin</type>
        <artifactId>com.agetor.core.logging.impl</artifactId>
        <version>0.0.0</version>
      </dependency>
      ...
    </dependencies>
  </configuration>
</plugin>


You can have dependencies on both bundles and features - the latter can be very useful in cases where you have fragments that depend on the environment!
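
As a minimal sketch of what a feature dependency could look like, the entry below uses the dependency type eclipse-feature inside the same dependencies section of the tycho-surefire-plugin configuration. The feature id com.agetor.core.feature is purely hypothetical and only stands in for a feature that collects the environment-specific fragments.

<dependency>
  <!-- Hypothetical feature id, used for illustration only -->
  <type>eclipse-feature</type>
  <artifactId>com.agetor.core.feature</artifactId>
  <version>0.0.0</version>
</dependency>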

If the build fails - usually because a test fails - Tycho stops and ignores the rest of the modules in the build sequence. At this point you usually fix the problem and then either re-run the complete build or try to re-build only the affected modules (sometimes a little dangerous, but often much faster).

Like most other developers, I always run my top parent POM build with the option -Dtycho.localArtifacts=ignore (see the Tycho Wiki for the details). This ensures that only controlled artifacts are used in the build product: the artifacts must come from the target platform, from the Maven repositories or be the result of other modules in the build reactor. Thus any artifacts from previous builds are simply ignored and cannot sneak into the product. Of course, when you have to (re-)build a single module, you have to leave out this option.


The Problem

Which brings me to the problem we experienced yesterday.

Yesterday, two things happened: I added a new test plug-in (com.agetor.core.tests) to the application... and suddenly the build failed consistently. I have added many test plug-ins to the build before, and this plug-in was very similar to all the others. I just wanted to test some very basic utility functions that had been left untested until now - and thus the new module was added near the top of the parent POM along with the base module that it tests.

<modules>
  <module>../com.agetor.test.utils</module>
  <module>../com.agetor.test.utils.rap</module>

  <module>../com.agetor.core</module>
  <module>../com.agetor.core.tests</module>
  <module>../com.agetor.core.utils</module>

  <module>../com.agetor.core.logging</module>
  ...
</modules>

When I tried to re-run the build on just the (new) failed test plug-in, it consistently succeeded!

The error messages from the failed tests seemed to indicate that some OSGi Declarative Services had not been properly started because some classes were missing - ClassNotFoundException - but when I used the ss command of the OSGi console to look at the started bundles, all bundles had been started as expected.

Then I tried to dissect the two OSGi configurations built by Tycho - one for the failing build and one for the succeeding build. Here I noticed a peculiar difference: config.ini for the failing build contained references to the plug-in folders of some of the used plug-ins rather than to the jar files for these:

...
osgi.bundles=
    reference\:file\:/Eclipse/workspaces/agetor5/com.agetor.core/target/com.agetor.core-1.1.0-SNAPSHOT.jar,
    ...,
    reference\:file\:/Eclipse/workspaces/agetor5/com.agetor.core.logging.impl@2\:start,
    ...
...

The first line is correct, the last line is not! (The latter line could have made some sort of sense, if only dev.properties had included the appropriate line, but... not for a Tycho build!) In the succeeding build, the references above pointed directly to my Maven repository.

This was rather weird! But it did explain the problem with the missing classes: OSGi would find the MANIFEST.MF files just fine, but not the class files as these are not located in the root of the plug-in.

I spent the next hour or two checking all the various files in this test plug-in against an older working test plug-in, yet finding nothing that could account for any of this.

We use OSGi Declarative Services as well as a number of fragments in the product, and thus we have added a number of surefire dependencies for these directly in pom.xml (as shown above). And on reflection, the plug-ins with incorrect references in config.ini were almost exactly the set that had been added to the POM in this way. Until yesterday this worked fine. Weird indeed!

(At this point I had tried to compile with various different versions of Tycho, but 0.16.0, 0.17.0-SNAPSHOT and the newly staged 0.17.0 all had the same behavior.)

It was at this point that I noticed that the build order for the modules in the product was a little strange!

[INFO] Reactor Build Order:
[INFO] 
[INFO] com.agetor.parent
[INFO] com.agetor.target
[INFO] com.agetor.core.parent
[INFO] com.agetor.core
[INFO] com.agetor.core.logging
[INFO] com.agetor.test.utils
[INFO] com.agetor.test.utils.rap
[INFO] com.agetor.core.tests
[INFO] com.agetor.core.utils.pde
[INFO] com.agetor.core.utils
[INFO] com.agetor.server.parent
[INFO] com.agetor.server
[INFO] com.agetor.core.utils.tests
[INFO] com.agetor.core.utils.tests.fragment
[INFO] ....

The test plug-in - com.agetor.core.utils.tests - had a pom.xml dependency on com.agetor.core.logging.impl (shown above), and thus I expected the latter to appear in the build order before the test plug-in. But it did not! Until this point, I had always assumed (or expected) that these extra surefire dependencies were taken into consideration in the build sequence in the same manner as the dependencies from MANIFEST.MF, feature.xml, etc.

Now I had something to google, and searching for "tycho surefire dependency reactor" returned a relatively new Bugzilla issue by Tobias Oberlies: Dependencies configured in tycho-surefire-plugin don't affect build order. Bingo!

The Solution

The immediate solution to the problem was simple, as Tobias was so kind as to include two possible workarounds in the bug report:
Workaround: change the module order in the root POM to bundle-1, bundle-3, bundle-2, or add Require-Bundles from the test bundle to the other two.
I have re-ordered the modules in the parent POM to reflect the dependencies a little more closely, and also added a few explicit dependencies in the MANIFEST.MF of the test plug-ins where the former didn't work.
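
To illustrate the first workaround, here is a rough sketch of how the modules section shown earlier might be re-ordered so that the logging modules are listed before the test plug-in that needs them at test time. This is not the actual final order from our parent POM, and the module path for com.agetor.core.logging.impl is an assumption based on the naming of the other modules.

<modules>
  <module>../com.agetor.test.utils</module>
  <module>../com.agetor.test.utils.rap</module>

  <module>../com.agetor.core</module>
  <module>../com.agetor.core.logging</module>
  <!-- Assumed path: must be built before the tests that list it as a surefire dependency -->
  <module>../com.agetor.core.logging.impl</module>

  <module>../com.agetor.core.tests</module>
  <module>../com.agetor.core.utils</module>
  ...
</modules>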

2013-04-16

Eclipse and command line arguments

If you ever wanted to run an Eclipse based application on one machine and debug it from another machine, you have probably run into a rather annoying problem: if you specify -vmargs on the command line, you must (re-)specify all the Java VM arguments from eclipse.ini as these are replaced and not just appended to...

So the following will likely not work as intended:

# eclipse -vmargs -Xdebug -Xrunjdwp:transport=dt_socket,server=n,address=...

But, there is an easy way around this: if you also specify --launcher.appendVmargs before -vmargs, then the following arguments are appended to the Java VM arguments.

# eclipse --launcher.appendVmargs -vmargs -Xdebug -Xrunjdwp:transport=dt_socket,server=n,address=...

There are still some limitations to what you can do, as the Java VM in some cases uses the first occurrence of an argument, so you cannot replace an argument from eclipse.ini this way.

See the Eclipse Wiki for all the gory details.