Runtime Dependency Analysis

2011-07-14

I was wondering: if I change class Foo, how do I determine with certainty which use-cases to include in my regression tests? It would be useful to know for sure that I must consider the Acme login process and the WidgetCo webservice authentication, and nothing else. Can my IDE help me with this?

Well, in some cases it’s straightforward to analyse backward dependencies. If I change class Foo, static analysis tells me that webservice WSFoo and controller Bar are the only upstream entry points to my application affected by this change. So I test those flows, and that’s about it.

But what if your application behaviour can change at run-time? If you want to build products rapidly for your clients, you must be able to tweak functionality outside of the development cycle. If customer Acme wants to insert a new step into their login process, you eventually want to be able to switch that on from an admin interface, not by changing the application code.

So for example, you could implement “hooks” in your generic flows, which pull dynamic behaviour in from outside, e.g. from a database. This is nice, but it means you can no longer reason about dependencies just by looking at the code.
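To make the “hooks” idea concrete, here is a minimal sketch. The names LoginHooks, Step and register are hypothetical, and an in-memory map stands in for the database. The point is that the extra steps run by login() are not visible to static analysis of the flow itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical hook registry: extra steps per customer are looked up at run time. */
final class LoginHooks {
    /** A step that the generic flow can run without knowing what it does. */
    interface Step {
        String run(String user);
    }

    // Assumption: this map stands in for a database table of customer -> configured steps.
    private final Map<String, List<Step>> stepsByCustomer = new ConcurrentHashMap<>();

    /** What an admin interface would do when a customer asks for a new step. */
    void register(String customer, Step step) {
        stepsByCustomer.computeIfAbsent(customer, c -> new CopyOnWriteArrayList<>()).add(step);
    }

    /** The generic login flow: fixed code, plus whatever the configuration adds. */
    List<String> login(String customer, String user) {
        List<String> log = new ArrayList<>();
        log.add("authenticate:" + user);
        for (Step step : stepsByCustomer.getOrDefault(customer, Collections.emptyList())) {
            log.add(step.run(user));
        }
        return log;
    }
}
```

Registering a step for Acme changes what login("Acme", ...) does, without any change to the LoginHooks source.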

So going back to the original question: how can you be sure a new code release won’t break any of your use-cases? An interesting approach would be to develop a tool which learns about code coverage dynamically. This tool would be configured with a list of use-cases, each of which maps to an “entry point” within your application.

For example, one entry point could be the method where the Acme login process first hits your application namespace, such as the doPost() method of a Servlet. The tool would be loaded into the JVM alongside your application and would wait for a “hit” on an entry point. Once it detects a hit, it would record the call trace that follows, until the entry point exits.
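The recording window could be sketched roughly as follows. FlowRecorder and its methods are hypothetical names, and the enter(), visit() and exit() calls are made by hand here; in the real tool they would be triggered by whatever instrumentation detects method entry and exit. Nested entry points are not handled.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical recorder: collects the components seen between entry and exit of a flow. */
final class FlowRecorder {
    private final Map<String, Set<String>> componentsByFlow = new ConcurrentHashMap<>();

    // One active flow per thread, so concurrent requests do not mix their traces.
    private final ThreadLocal<String> activeFlow = new ThreadLocal<>();

    /** Called when an entry point is hit. */
    void enter(String flow) {
        activeFlow.set(flow);
    }

    /** Called for every downstream component; ignored when no flow is being recorded. */
    void visit(String component) {
        String flow = activeFlow.get();
        if (flow != null) {
            componentsByFlow.computeIfAbsent(flow, f -> ConcurrentHashMap.newKeySet()).add(component);
        }
    }

    /** Called when the entry point returns. */
    void exit() {
        activeFlow.remove();
    }

    /** Sorted snapshot of everything recorded for a flow so far. */
    Set<String> componentsOf(String flow) {
        return new TreeSet<>(componentsByFlow.getOrDefault(flow, Collections.emptySet()));
    }
}
```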

The information recorded by this tool would tell you exactly which downstream components - e.g. classes or external services - are dependencies of the flow being recorded. By reversing these mappings, you can instantly tell which flows will be impacted when you change a component.
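Reversing the mappings is just building an inverted index. A minimal sketch, assuming the recordings are available as a map from flow name to the set of components it touched (DependencyIndex and invert are hypothetical names):

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper: turns "flow -> components" into "component -> flows". */
final class DependencyIndex {
    static Map<String, Set<String>> invert(Map<String, Set<String>> componentsByFlow) {
        Map<String, Set<String>> flowsByComponent = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : componentsByFlow.entrySet()) {
            String flow = entry.getKey();
            for (String component : entry.getValue()) {
                // Each component collects every flow in which it was seen.
                flowsByComponent.computeIfAbsent(component, c -> new TreeSet<>()).add(flow);
            }
        }
        return flowsByComponent;
    }
}
```

Looking up "Foo" in the result then gives the flows to regression-test when Foo changes.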

You could just drop this tool into your UAT server and let it “discover” this information for as long as customers are testing. The presence of this tool would encourage a wide variety of tests - to “train” the tool as comprehensively as possible.

How would the recorded coverage information be presented? A simple approach would be a tree of auto-generated, static HTML files - much like the output of the javadoc tool. These pages could either be created on the filesystem, or hosted over HTTP.

How would the tool interact with the application being recorded? Naturally your application should have minimal awareness of the tool. Ideally, a jar file could be dropped into the classpath, and bootstrapped with a Servlet Listener, for example.

Still, the “entry points” in your application need to be defined somehow. One idea is to annotate entry points in your application at method-level. The tool would then need to scan for these annotations at bootstrap time. This could probably be achieved by using a bytecode manipulation library such as Javassist, to avoid loading all the classes prematurely. The Scannotation library provides a ready-rolled solution for this.
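As a rough illustration of method-level annotations, the sketch below defines a hypothetical @EntryPoint annotation and finds it with plain reflection. Note that this is a simplification: reflection loads each class it inspects, which is exactly what the Javassist or Scannotation route would avoid. AcmeLoginServlet here is a stand-in class, not a real Servlet.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical marker for the method where a use-case enters the application. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface EntryPoint {
    String useCase();
}

/** Stand-in for a real Servlet, to keep the example self-contained. */
final class AcmeLoginServlet {
    @EntryPoint(useCase = "Acme login")
    void doPost() { }

    void helper() { }
}

/** Hypothetical scanner: maps each use-case name to its annotated method. */
final class EntryPointScanner {
    static Map<String, String> scan(Class<?>... classes) {
        Map<String, String> methodByUseCase = new TreeMap<>();
        for (Class<?> type : classes) {
            for (Method method : type.getDeclaredMethods()) {
                EntryPoint entryPoint = method.getAnnotation(EntryPoint.class);
                if (entryPoint != null) {
                    methodByUseCase.put(entryPoint.useCase(),
                            type.getSimpleName() + "#" + method.getName());
                }
            }
        }
        return methodByUseCase;
    }
}
```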

So once your application is running, how would the resulting “stack trace” be captured? Possibly, JVMTI could be used to capture this information, much as a profiler does. Ideally, it would be possible to whitelist (or blacklist) by package name, so only relevant information is recorded.
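The whitelist part is easy to sketch independently of how the frames are captured. The example below filters ordinary StackTraceElement objects by package prefix; it does not use JVMTI, and PackageFilter is a hypothetical name. Prefixes are expected to end with a dot, so that "com.acme." does not also match "com.acmecorp".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical whitelist: keeps only frames whose class lives under an allowed package. */
final class PackageFilter {
    private final List<String> allowedPrefixes;

    PackageFilter(String... allowedPrefixes) {
        this.allowedPrefixes = Arrays.asList(allowedPrefixes);
    }

    boolean accepts(String className) {
        for (String prefix : allowedPrefixes) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** Returns "class#method" for every whitelisted frame, in the original order. */
    List<String> filter(StackTraceElement[] frames) {
        List<String> kept = new ArrayList<>();
        for (StackTraceElement frame : frames) {
            if (accepts(frame.getClassName())) {
                kept.add(frame.getClassName() + "#" + frame.getMethodName());
            }
        }
        return kept;
    }
}
```

The frames could come from Thread.currentThread().getStackTrace() in a quick experiment, though that only gives a snapshot of the current stack rather than a full call trace.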

One caveat is that this “profiling” could hit performance within your application. This could be minimised by flushing the recordings (e.g. to a file) asynchronously. For example, Chronon (a “time travelling debugger”) uses this approach, utilising background threads to periodically flush its buffers to a file.
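A minimal sketch of asynchronous flushing, assuming events are plain strings and the sink is any java.io.Writer. AsyncTraceWriter is a hypothetical name, and this is not a description of how Chronon is implemented. The application thread only pays for an enqueue; a daemon thread does the I/O.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical recorder sink: buffers events and writes them from a background thread. */
final class AsyncTraceWriter implements AutoCloseable {
    private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>();
    private final Writer out;
    private final Thread flusher;
    private volatile boolean running = true;

    AsyncTraceWriter(Writer out, long flushIntervalMillis) {
        this.out = out;
        this.flusher = new Thread(() -> {
            while (running) {
                try {
                    Thread.sleep(flushIntervalMillis);
                } catch (InterruptedException e) {
                    break; // close() interrupts us to stop promptly
                }
                flush();
            }
        }, "trace-flusher");
        this.flusher.setDaemon(true);
        this.flusher.start();
    }

    /** Called on the application thread: no I/O happens here. */
    void record(String event) {
        buffer.offer(event);
    }

    // Synchronized so that the background flush and the final flush never interleave.
    private synchronized void flush() {
        List<String> batch = new ArrayList<>();
        buffer.drainTo(batch);
        try {
            for (String event : batch) {
                out.write(event + "\n");
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stops the background thread and writes whatever is still buffered. */
    @Override
    public void close() {
        running = false;
        flusher.interrupt();
        try {
            flusher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        flush();
    }
}
```

The queue here is unbounded, so a real tool would also need a policy for what to do when the application produces events faster than they can be written.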

Another downside of this approach is that the tool’s “knowledge” is only as comprehensive as the coverage which occurs during the training period. A blunt response to this issue would be to simply deploy the tool to production. This way, as well as building a more comprehensive picture of runtime dependencies, interesting statistics would emerge. A “one-stop shop” would result, where Service Delivery teams could determine which functionality customers are using, along with timing information, for example.

In summary, a tool like this would raise confidence in software quality, increase visibility of coverage, and reduce the pain of manually hunting through code for usages. Perhaps it could also offer an interesting bird’s-eye view of your application usage patterns.

Choudary Kothapalli - Aug 2, 2011

This tool MaintainJ is capable of doing exactly what you are dreaming about. MaintainJ captures runtime data (the classes used, methods called, method parameters, the actual runtime sql statements sent to the database) while you are running a use case. All this information is stored in a database, which can be used for what you are describing. MaintainJ allows you to either set a start and end point of your recording or to specify an ‘entry point’ to the application. Check the site (www.maintainj.com) and contact me if you need any more details about the product. The current public version just shows UML diagrams of the call trace. Next version by end of August will show the call data and sql in sequence diagrams. The next version by the end of September will allow you to search. Cheers, Choudary Kothapalli.

[…] cannot answer these questions better than Ben Rowland did on his blog titled Runtime Dependency Analysis. So please check that blog to appreciate the power of this new feature in […]

Ben, the just-released version 3.3 of MaintainJ supports Runtime Dependency Analysis as described in your post.

1. How does MaintainJ capture the runtime dependencies?

MaintainJ simply captures all the method calls and the call context when a test case is executed. The user can specify the classes that should be captured.

a) One may manually record the call trace for each test case by clicking the ‘Start Tracing’ and ‘Stop Tracing’ buttons and providing a file name for this information to be stored. Check the MaintainJ overview demo to get a quick sense of what I mean.

b) Or the user can specify an ‘entry point’ like you described, and the call trace will be logged whenever that ‘entry point’ is hit. More details of this kind of recording are here and here.

2. Technology used to record

MaintainJ currently uses AspectJ to capture the runtime information. Application source code need not be changed in any way, and MaintainJ supports applications on JRE 1.4 and up. Several wizards are provided to make it easy for the user to install and configure MaintainJ on various runtime configurations like J2EE applications, core Java apps, JUnit and Applets. In addition to Java classes and methods, MaintainJ currently captures the JSPs called and all the SQL statements going out to the database. This gives the user the ability to search for dependencies on a database table or field as well. The SQL statements captured are the actual runtime SQL statements going out to the database, with the parameters filled in.

3. How is the recorded information presented?

The call sequence is presented as sequence diagrams. The recorded information is of method calls, after all. The sequence diagrams are dynamic, can be explored and are easy to read. The context information is shown in tool tips and also in the Eclipse ‘Properties view’. The Eclipse ‘Outline view’ allows the user to show all the calls occurring in the test case and also provides several ways to filter out unnecessary calls.

4. How is the analysis performed?

a) All the runtime call trace information is internally stored in a database.

b) The MaintainJ sequence diagrams must be seen in Eclipse (MaintainJ provides Eclipse plug-ins for this purpose). The dependency analysis is integrated into the Eclipse search functionality. Once the runtime call trace information is recorded, one can simply use the search functionality to find all the test cases that are affected by a change to a class, method, JSP, or database table or field. The search results are presented in the standard Eclipse search results view. For example, when the user searches for a table name, the search results view shows all the trace files and SQL calls that have a SQL statement with the given table name. When the user double-clicks the call in the search results view, it opens the breadcrumb trail of the call sequence to that particular database call in the sequence diagram, where the user can see the actual SQL statement going out to the database.

–Choudary Kothapalli, MaintainJ Inc.