Tuesday, November 12, 2013

FDS and the Challenge of "Big Data"

Just the other day Kris Overholt stumbled across a blog post written by a post-doctoral fellow at the University of Washington, Jake VanderPlas:


The article speaks to the difficulty of maintaining sophisticated computational tools in an academic setting, and helps explain some of the challenges we face in developing and maintaining FDS and Smokeview. In particular, the article points out how the traditional form of academic scholarship has become counter-productive:

    This brings us to Academia's core problem: despite the centrality of well-documented, well-written software to the current paradigm of scientific research, academia has been singularly successful at discouraging these very practices that would contribute to its success. In the "publish-or-perish" model which dominates most research universities, any time spent building and documenting software tools is time spent not writing research papers, which are the primary currency of the academic reward structure. As a result, except in certain exceptional circumstances, those who focus on reproducible and open software are less likely to build the resume required for promotion within the academic system.

While NIST and VTT are not universities, they have much in common with them and share the same reward structure. Our management has, to some extent, adopted a broader set of criteria for evaluating our impact, but the fact remains that we and our collaborators are under pressure to publish papers rather than do what is necessary to maintain FDS and Smokeview. You might say that the two need not be mutually exclusive, but they often are.

Take, for example, the recent request for an OpenMP (i.e., shared-memory parallel) version of FDS 6. OpenMP is a set of compiler directives that, when added to the FDS Fortran source code, enable FDS to use multiple processors or cores on a single computer. This is different from the MPI version of FDS, which enables multiple meshes to be processed by multiple computers. There was an OpenMP version of FDS 5 contributed by a volunteer, but he can no longer work on it because he has to earn a living (one does wonder where all this wonderful free stuff on the Internet comes from). OpenMP is not new, and implementing it in FDS is not going to win anyone fame, fortune, or even a published journal article. Academically, it is a thankless task. And OpenMP is just one example. Consider the time and effort needed to assemble all those verification and validation case studies, automate the procedure for running and processing them, prepare the manuals, distribute the software, and so on. Much of this work is not the stuff of academic journals, but it is absolutely necessary.

So what is the solution to this problem? The first step is to develop a new definition of the word "collaboration." To most, collaboration means that we publish our work in journals, or meet at conferences and exchange ideas. In many areas of science and engineering, this has worked reasonably well for centuries. So what's wrong now? What has changed? As Jake VanderPlas points out, what has changed is the advent of complex numerical codes that cannot be maintained by a single professor and a few students; nor can they be successfully commercialized. There has been some limited success in developing useful add-ons to FDS, like the graphical user interface PyroSim by Thunderhead Engineering, but the core development work is underwritten by NIST and VTT. In our opinion, this is a good model because it allows for continued research in algorithms that probably could not be supported by a completely commercialized program. On the other hand, it means that these various "maintenance" activities fall through the cracks. They appeal to neither the core developers nor the commercial partners.

So who is to do these thankless tasks? At the moment, we the developers. This partly explains why it took us three years to release FDS 6. As you've read in the blog posts over the past few years, we have redoubled our efforts to better document our work, to develop more accurate and robust routines, and to do the verification and validation work that is absolutely necessary if FDS-SMV is to be used in a regulatory setting. But there is so much more that could be done to improve this software, and what might surprise most of you is that this work IS being done, sort of, by all those students out there who are working with FDS. The problem, and this takes us back to the definition of the word "collaboration", is that all this work is not finding its way into the FDS code, manuals, or repository. Rather, it is making its way, sort of, into the various fire journals and conference proceedings. I say "sort of" because much of it just ends with the master's or PhD thesis. Even the work that gets into the literature is not easily converted into genuine improvements to the code or documentation. As much as we try to open up lines of communication early in the process, we cannot break the traditional publish-or-perish model.

Usually, we are sent a paper describing a student's master's or PhD work with FDS only at the end of the student's tenure and asked if the work can be incorporated into the code or added to the validation suite. This is frustrating on several levels. First, we have often already solved the problem being addressed, but as discussed above we do not have time to write a paper every time we change a few lines of code. Second, the work is often very good and we would like to incorporate it, but this amounts to us redoing the student's two years (plus) of work within a few days, which is about how much time we can usually spare. Why could this work not have been done within the FDS-SMV framework to begin with? Why won't the professors and students contact us on the front end? The answer we usually get is that they are worried about someone stealing the idea and beating them to publication. Lest you feel we are picking on academia, we get the same response from industry, with its concerns about intellectual property (IP). Here is a list of reasons why this concern is unfounded:
  1. Students have almost nothing to lose and a tremendous amount to gain by collaborating with us early in the process. FDS was designed as a platform for the entire research community to make small incremental improvements in fire modeling without having each and every new student re-invent the wheel.
  2. The developers can usually code something up much faster than the students, but we have little time to thoroughly shake it down. Programming is not the problem – it is the work done over months, even years, to do a thorough job of verifying and validating the algorithm. This is perfect for students – they learn from watching experts, and then they develop the necessary skills to do it themselves, all while making a tangible contribution to FDS. For this to work, the algorithm must be coded into FDS. If it is not ready for prime time, that is OK, as we just hide the new algorithm with simple IF-THEN statements.
  3. We have never had a case where a student’s work was hijacked by another. In fact, what students ought to worry about is being ignored, not copied. Publishing FDS-related papers in journals is NOT the best way to make a contribution in fire research. An improved algorithm in FDS is, and you cannot do that by hiding your work for 3-4 years.
  4. If we adopted the same attitude toward IP, there would be no FDS. As developers we have little time to publish, and we are punished for this in various ways. Is it fair that students hide the work they are doing and publish results atop the painstaking work of the developers to improve the core functionality of the code? We are not asking for co-authorship, just a little give and take. The "payment" for using our software should be to help improve it.
  5. Finally, if you really think that your ideas are going to be “scooped”, understand this: all commits to the FDS repository are recorded and maintained forever. We know exactly when anyone has “touched” any routine. No one who has committed source code or experimental data to the FDS repository has ever been scooped. If it were to happen, we would be the first to write to a journal editor with undeniable proof that a researcher has violated the spirit of our open source development process.
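
The IF-THEN gating mentioned in item 2 might look something like the sketch below. FDS itself is written in Fortran; this is a Python illustration only, and every name in it (`established_average`, `experimental_average`, `cell_average`, `use_experimental`) is hypothetical and not taken from the FDS source. The point is simply that a new routine can live in the shared code base, switched off by default, while it is being verified and validated.

```python
def established_average(values):
    """Established routine: plain arithmetic mean (stand-in for existing code)."""
    return sum(values) / len(values)


def experimental_average(values):
    """Hypothetical candidate routine: mean with the smallest and largest value dropped."""
    if len(values) < 3:
        # Too few samples to trim; fall back to the established routine.
        return established_average(values)
    trimmed = sorted(values)[1:-1]
    return sum(trimmed) / len(trimmed)


def cell_average(values, use_experimental=False):
    """Dispatch to the experimental routine only when explicitly requested."""
    # The IF-THEN guard: default behavior is unchanged for every other user.
    if use_experimental:
        return experimental_average(values)
    return established_average(values)


samples = [1.0, 2.0, 3.0, 10.0]
print(cell_average(samples))                          # established routine
print(cell_average(samples, use_experimental=True))   # routine under test
```

Because the default path is untouched, a student could commit early and often under this kind of arrangement without affecting anyone who has not switched the flag on.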

The irony of the situation is that the majority of the students working with FDS leave with a master's degree and do not pursue academic jobs. What we have to offer these students in return for helping us develop and maintain FDS-SMV is an education in current software development practices, in addition to a far better understanding of how these models really work. Simply running a piece of software is hardly an enviable skill, but being able to get under the hood and work with its various components is. Furthermore, the skills you can acquire go way beyond fire. Many of you will move into other areas of engineering, or beyond, in the next few decades. Our blogger Jake is a perfect example of a student who has acquired a tremendous range of IT skills that go far beyond his chosen course of study. And that's a good thing because there are only so many jobs available in cosmology.