Java is the new COBOL August 03, 2011 at 08:00 PM | categories: cobol, java
Apart from complaining about Clearcase I have actually done some real work the past few years.
Chapter 1: How we converted our entire COBOL code-base to Java (sort-of).
Already you might start to feel a sick feeling in the back of your throat. That's normal and should pass shortly after you finish reading.
COBOL? WTF?!? This is usually the first thing people say to me when I tell them about what I've been working on. I'm sad to say that COBOL is still very much in use.
The dilemma our company faced was that our main product has been around for many years, and over time we have built up an unbelievably large COBOL code-base consisting of millions of lines. Re-writing it would be costly and time-consuming, not to mention risky. We needed another option.
Another serious problem is cost: CICS, the COBOL 'application server' if you will, runs on dedicated mainframe hardware, and compiling COBOL requires Micro Focus licenses; both are bloody expensive. If we could run on a 100% Java stack, using open-source technologies, we could save ourselves, and our customers, cold, hard cash.
At this point I need to mention something 'special' about how we use COBOL. To support a wide range of transaction systems and databases we developed a custom variation of the language, with custom-built 'macros' that generate different code depending on the environment. While not especially relevant to this article, this leads to larger-than-expected COBOL (which is large enough as it is). The size of the programs is significant for a few reasons, which I'll discuss below.
Initially we started with LegacyJ, a commercial product that promised workable COBOL-to-Java conversion. The nice thing about LegacyJ was that we quickly discovered it was, in fact, possible to convert our code successfully and have a running system. However, we ran into a few serious problems that made us hesitate.
Firstly, the Java generated by LegacyJ was quite lengthy and often didn't compile, due to the length of some methods and the number of fields. Java does have limits here (64KB of bytecode per method, and 65,535 fields per class), not that you would ever conceivably reach them by hand. To work around this I had to re-parse the generated Java to break long methods into smaller chunks, and introduce a hierarchy of classes to dodge the field limit. Yuck.
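The splitting workaround can be sketched roughly like this. All the names here are illustrative, not LegacyJ's actual output; the point is simply that one oversized generated method becomes a driver over smaller chunk methods, each safely under the JVM's per-method bytecode limit:

```java
// Hypothetical sketch: a generated COBOL paragraph too large to compile
// as a single Java method is split into sequential chunks. The original
// method becomes a driver that calls each chunk in order, preserving the
// straight-line execution of the original paragraph.
public class SplitParagraph {
    public static int total;

    // The oversized generated method, reduced to a driver over its chunks.
    public static void mainParagraph() {
        chunk1();
        chunk2();
    }

    static void chunk1() { total += 1; } // first run of generated statements
    static void chunk2() { total += 2; } // second run of generated statements

    public static void main(String[] args) {
        mainParagraph();
        System.out.println(total); // prints 3
    }
}
```

Because the chunks run strictly in sequence and share the class's state, behaviour is unchanged; only the method boundaries move.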
Secondly, the classes generated by LegacyJ didn't separate 'meta' information, such as variable types, from runtime data. Each instance of a program effectively carried its own duplicate copy of the type information, resulting in an extra-large memory footprint.
The other issue, and perhaps the most compelling, was money; LegacyJ was not cheap. We would have been trading one expensive platform, CICS, for another.
At the same time the following article appeared, introducing an open-source COBOL-to-Java converter called NACA. I tried it almost immediately but quickly found that many of our COBOL programs didn't compile, due to some commands NACA hadn't implemented. At first I gave up and went back to our LegacyJ integration. It was only later, after taking a second look, that I realised there was much more potential in NACA's generated Java and its general approach.
The most obvious advantage was that the Java was actually readable! At least if you count this as readable. NACA actually checked in their Java files after the conversion, so the code had to be both readable and maintainable. This also had the nice side-effect of allowing our absolutely massive generated COBOL programs to compile (in 99% of cases, anyway).
In addition, the generated code used a separate, static class structure to represent the program definition, meaning each program instance required less memory.
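The memory win from that separation can be sketched as follows. The class and field names are purely illustrative (not NACA's actual generated classes): the idea is that field *definitions* live once in static structures shared by every instance, while each program instance carries only its runtime values:

```java
// Hypothetical sketch: type/layout metadata is static and shared, so a
// thousand instances of a program hold a thousand values but only one
// copy of the definitions. Names are illustrative, not NACA's output.
public class CobolProgram {
    // Immutable description of a COBOL variable: its name and PIC length.
    public static final class VarDef {
        public final String name;
        public final int length;
        public VarDef(String name, int length) { this.name = name; this.length = length; }
    }

    // One shared definition per program, not one copy per instance.
    public static final VarDef CUSTOMER_NAME = new VarDef("CUSTOMER-NAME", 30);

    // Per-instance runtime data only: a fixed-length, space-padded field.
    private String customerName = " ".repeat(CUSTOMER_NAME.length);

    // COBOL-style MOVE: pad or truncate to the declared field length.
    public void move(String value) {
        customerName = (value + " ".repeat(CUSTOMER_NAME.length))
                .substring(0, CUSTOMER_NAME.length);
    }

    public String get() { return customerName; }

    public static void main(String[] args) {
        CobolProgram p = new CobolProgram();
        p.move("ACME");
        System.out.println("[" + p.get() + "]"); // 30-character padded value
    }
}
```

With per-instance metadata (the LegacyJ approach described above), every instance would also drag around its own `VarDef`-equivalent objects, which is exactly the duplication that bloated the footprint.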
I was given some time to investigate the possibility of making NACA work with our unique flavour of COBOL. Fortunately it turned out there wasn't too much missing and I managed to get a working prototype in a reasonably short period of time. After that the decision to switch to a cheaper and open-source alternative which we could control wasn't hard to make and we haven't looked back since.
To avoid making this post longer than it already is I might save the important discussion of performance for another day. In short, our pure-Java application runs surprisingly quickly. The biggest bottleneck is, without a doubt, memory. Running an entire COBOL runtime within the JVM is obviously costly in terms of memory, not helped by our generated COBOL and vast code-base.
Do I recommend this approach to others? Absolutely, without a doubt. There seem to be people advising against a direct port, or at least advising you to re-think the problem first. For us the issue is one of scale: there simply isn't enough time or money to re-write everything, at least not in this decade. We needed to do something now; something we could guarantee would continue to work.
The benefits of running a pure-Java stack are, alone, compelling. One example that springs to mind is that of tracing. Once upon a time we would need to ask customers with a bug to recompile specific applications in trace mode in the vain hope that we actually knew where the problem was. Now we can leverage powerful Java logging (no, not that useless java.util.logging) and have full tracing across the entire stack; something that is invaluable for customer support.
So, while I hate the idea of granting further life to our hideous COBOL demon, from a business point-of-view it has been crucial in the continued success and evolution of our product; giving us breathing room to slowly migrate COBOL logic to 'normal' Java applications while guaranteeing our business logic continues to serve our customers. Or at least that's what our marketing brochures say; for me it was fun.
Java logging and per-user tracing May 14, 2011 at 05:20 AM | categories: logging, java, logback
Let me just say this straight-up - Java Logging you are completely fucking useless. It boggles the mind how badly Sun screwed up the implementation of something so simple but yet so fundamental. It's logging for Christ's sake; how hard can it be?!? This captures the essence of my feelings, so I'll leave it at that.
One thing I did want to mention is Logback and the awesome SiftingAppender. Logback is the successor to the much loved Log4J, written by the same dude and addressing some of the problems with the original.
My manager wanted a way for our users to enable individual tracing on the server without requiring any meddling by an administrator. A quick Google revealed this, and from there it wasn't hard to implement a start/stop trace button on each page, harnessing MDC for per-user logging. On completion the trace can either be downloaded or emailed directly from the server to the system administrator or bug tracker.
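The mechanism is roughly this (a stdlib-only sketch of the idea; the real thing uses `org.slf4j.MDC` together with a `SiftingAppender` configured in logback.xml, and all names below are illustrative): a per-thread context tags every log line with the current user, so each session's output can be split into its own trace.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of what MDC + SiftingAppender do: a ThreadLocal map of
// diagnostic context, consulted on every log call. Logback's MDC is
// essentially this, with the sifting appender routing each distinct
// "user" value to its own log file.
public class PerUserTrace {
    private static final ThreadLocal<Map<String, String>> CTX =
            ThreadLocal.withInitial(HashMap::new);

    public static void put(String key, String value) { CTX.get().put(key, value); }
    public static void clear() { CTX.get().clear(); }

    // Every message is stamped with the user from the current thread's context.
    public static String log(String message) {
        String user = CTX.get().getOrDefault("user", "anonymous");
        return "[user=" + user + "] " + message;
    }

    public static void main(String[] args) {
        put("user", "alice");                       // set when the trace button is pressed
        System.out.println(log("loading account")); // [user=alice] loading account
        clear();                                    // cleared when tracing stops
        System.out.println(log("background job"));  // [user=anonymous] background job
    }
}
```

The start/stop button then reduces to putting and clearing the user's id in the context for that session's request threads.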
Honestly, if/when I ever work on another online application I will almost certainly re-implement something very similar again. Being able to capture a trace of everything from the server for a single user session has proven to be an invaluable tool for diagnosing bugs. Give it a whirl!
ClearCase strikes back July 12, 2009 at 01:57 AM | categories: svn, git, java, python, dvcs, linux, clearcase
Recently I made the switch to Ubuntu at work. This left me without ClearCase, as it requires some derivative of enterprise Red Hat or SUSE, which didn't interest me in the slightest. I can only assume this is partly due to the complexity of managing the separate binary kernel module required by MVFS, their virtual filesystem. As an alternative, Rational seem to be pushing their Java implementation, ClearCase Remote Client (CCRC). CCRC is essentially an Eclipse plugin communicating, via a mixture of web services and RPC, with a server which in turn has the 'real' client installed. As you would expect this comes with some performance trade-offs, and it also feels quite unstable at the moment, often throwing strange errors unrelated to ClearCase. It's certainly better than nothing, though.
This, of course, raised immediate problems for me, as I refuse to go back to using ClearCase directly now that I've tasted the good (and fast) Git life. Because I obviously can't seem to help myself, I ported gitcc to Java to use their new libraries, although it's definitely the last time.
One good thing to come out of this new version is the cross-platform ability to log in as any user, bypassing the domain-specific authentication that is normally used. In my situation, at least, ClearCase authenticated via our Windows domain and I could see no quick and safe way to switch between users. This was, and is, essential for preserving the identity of the author of each commit in ClearCase. I strongly suspect there is a way around this, but for the life of me I couldn't think of one. The one alternative that occurred to me would have been to run gitcc on something other than Windows, configured with NFS, and 'sudo su' to each user as required. For reasons I am unable to articulate I wasn't all that happy with the idea and never bothered.
I've been running a gitcc daemon at work for the past week with great success. It's funny how the speed difference of pushing/pulling affects your work habits. Before, I used to watch the checkin and rebase console output like being glued to the TV. Especially because we're using UCM, this could take up to a few minutes to do its rebase/deliver business, which would distract me, and I would often 'hold off' on pushing or pulling because I wanted to do it all in one hit to save time. I no longer have that problem, being able to 'fetch' whenever I like, knowing it only takes (literally) a second to complete. Of course in the background the daemon is chugging away, syncing with ClearCase at its own slow-and-steady pace.
Git seems to be finally taking off at work, after a slow and bumpy start. Personally I'm thrilled to be using Git at work regardless of what anyone else is doing. However some part of me isn't satisfied with just stopping there. It pains me to see others at work using a sub-par tool when I know there is a much better one available. The worst part is that most of them don't even know what they're missing, having never used anything else. Having originally come from SVN I couldn't bear to have to checkout files every time I wanted to make a change and I knew it didn't have to be that way. A co-worker even commented that they 'like' knowing when they're modifying files just in case they accidentally typed something without realising it. That may be true, but I strongly suspect after using Git for a few days they would be unlikely to revert. That's not even taking into consideration the many other benefits of using a DVCS - like local branching.
What remains is the biggest hurdle: taking it to the 'next level'. A handful of developers are currently using it individually, but now that we have a daemon available we can start to look at essentially ditching ClearCase, at least for our team. Part of the problem with working for a 'largish' company is its slow and cautious approach when introducing new technologies. From their perspective they have to consider what happens if I get run over by a bus tomorrow (those damn buses - always threatening to run us over). No one else really has the same intimate knowledge of gitcc and its inner workings. How much time and money would be wasted having to fix it, or revert to ClearCase, if I was no longer there? The other issue is one of training and expertise: now you have the problem of supporting two completely different source control systems, including training for, and maintenance of, both.
The final problem that I face is one of Git's maturity. I love Git. Without a doubt it is the most advanced version control system available today. Sure it's got more than its fair share of warts, but I wouldn't use anything else (yes I know about Mercurial, and in any other universe without Git I would be using it quite happily). Unfortunately what it lacks at the moment is all that boring Windows GUI and administration tools that managers and non-technical people need (and I don't just mean TortoiseGit). I'm sure they'll be here soon enough, but in the meantime I would still hesitate if asked 'is Git ready'. The problem is that the viable alternative, at this very second in time, would most likely be SVN, which isn't a bad system, but when compared to any modern DVCS it looks increasingly archaic. If we switched to SVN now I strongly suspect there would be pressure to switch (again) to something more powerful in the not-too-distant future.
I'd love to use Git at work as our sole version control. Unfortunately as a developer I don't really have the time or patience to play politics and ensure that all aspects of a migration are planned for. However, it seems silly to me that if a majority of your developers start to use an alternative system for any aspect of work then there isn't some reflection on your current one and how you can make your working environment that much more productive. I'm not suggesting for a second that we replace ClearCase tomorrow. Instead I am simply proposing a gradual adoption, or at least trial, of Git over ClearCase. Hell, we're pretty much doing that now anyway, just not officially.
Finally, the other revelation to come out of this is how truly entrenched in Java I have become. I found with the Python implementation I would often worry about making changes because I would have to spend part of the time investigating how to implement something 'in Python' rather than just being able to focus on the problem at hand. This has nothing to do with Python, it still remains a fantastic language, but perhaps says something about me getting older and less adaptable when learning new things. I hope I never reach that point where I'm too afraid to try something new, although I'm certainly becoming far less likely to (at least seriously) learn a new language in my spare time. I'm just hoping one of these days I get to use Haskell at work. :)