We have developed a technique to make sense of change information from a typical software project's history. The core of our approach is to treat the program text as a tree, to find differences in the tree structure, to group similar differences together, and then finally to extract a pattern that represents each group.
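The four steps above can be illustrated with a minimal sketch. It assumes trees are nested tuples of the form (label, child, ...), with plain strings as leaves. The naive top-down comparison in `diff`, the grouping key, and the "?" hole marker are illustrative assumptions, not the algorithms from the paper. The generalization step is a simplified form of anti-unification that does not track which holes must hold the same value.

```python
from collections import defaultdict
from functools import reduce

HOLE = "?"  # hypothetical marker for "anything may appear here"

def label(tree):
    """Root label of a tree: the first element of a tuple, or the leaf itself."""
    return tree[0] if isinstance(tree, tuple) else tree

def same_shape(a, b):
    """True when both are interior nodes with the same label and child count."""
    return (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b))

def diff(old, new):
    """Return (old_subtree, new_subtree) pairs where the two trees differ."""
    if old == new:
        return []
    if same_shape(old, new):
        changes = []
        for a, b in zip(old[1:], new[1:]):
            changes.extend(diff(a, b))
        return changes
    return [(old, new)]

def generalize(a, b):
    """Keep the structure two trees share; replace disagreements with HOLE."""
    if a == b:
        return a
    if same_shape(a, b):
        return (a[0],) + tuple(generalize(x, y) for x, y in zip(a[1:], b[1:]))
    return HOLE

def extract_patterns(versions):
    """Group changes by (old root label, new root label), then generalize each group."""
    groups = defaultdict(list)
    for before, after in versions:
        for old, new in diff(before, after):
            groups[(label(old), label(new))].append((old, new))
    return {
        key: (reduce(generalize, [o for o, _ in changes]),
              reduce(generalize, [n for _, n in changes]))
        for key, changes in groups.items()
    }

# Two made-up edits that each wrap a call in a null check.
before1 = ("Block", ("Call", "f", "x"))
after1 = ("Block", ("If", ("NotNull", "x"), ("Call", "f", "x")))
before2 = ("Block", ("Call", "g", "y"))
after2 = ("Block", ("If", ("NotNull", "y"), ("Call", "g", "y")))

patterns = extract_patterns([(before1, after1), (before2, after2)])
# patterns[("Call", "If")] is the pair
#   ("Call", "?", "?")  and  ("If", ("NotNull", "?"), ("Call", "?", "?"))
```

Both edits collapse into a single "wrap the call in a null check" pattern, which is the kind of summary a programmer would give when asked what changed.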
Jason Dagit presented our work at the DChanges workshop in September, part of the DocEng conference. The paper is available from the workshop site, along with the full workshop proceedings. The slides from the talk are also available.
The problem we looked at is simply stated:
What does the change history of a piece of software as represented in a source control system tell us about what people did to it over time?
Anyone who has worked on a project for any substantial amount of time knows that working on code isn't dominated by adding new features -- it is mostly an exercise in cleaning up and repairing flaws, reorganizing code to make it easier to add to in the future, and adding things that make the code more robust. While making these changes, we have often found that it feels like we do similar things over and over -- add a null pointer check here, rearrange loop counters there, add parameters to functions elsewhere. Odds are, if you asked a programmer "what did you have to do to address issue X in your code," they would describe a pattern, such as "We had to add a status parameter and tweak loop termination tests," instead of an explicit set of specific changes.
We started with some work Matt Sottile had developed as part of a Department of Energy project called "COMPOSE-HPC", where we built infrastructure to manipulate programs in their abstract syntax form via a generic text representation of their syntax trees. The representation we chose was the Annotated Term (ATerm) form used by the Stratego/XT and Spoofax Workbench projects. A benefit of the ATerm form for programs is that it allows us to separate the language parser from the analyzer -- parsing takes place in whatever compiler front end is available, and all we require is a traversal of the resulting parse tree or abstract syntax tree that can emit terms that conform to the ATerm format.
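As a rough illustration of what such a traversal might look like, the sketch below walks a tree held as nested Python data and emits ATerm text. The tuple encoding and the node names (`Assign`, `Var`, `Plus`, `Int`) are hypothetical and are not the constructors our tooling or any particular parser produces. Only a subset of the ATerm format is covered (constructor applications, lists, integers, and quoted strings); annotations are omitted.

```python
def to_aterm(node):
    """Render nested Python data as ATerm text.

    Assumed encoding (illustrative only):
      tuple (name, arg, ...) -> constructor application name(arg,...)
      list  [a, b, ...]      -> ATerm list [a,b,...]
      int                    -> integer literal
      str                    -> double-quoted string
    """
    if isinstance(node, tuple):
        name, *args = node
        return name + "(" + ",".join(to_aterm(a) for a in args) + ")"
    if isinstance(node, list):
        return "[" + ",".join(to_aterm(a) for a in node) + "]"
    if isinstance(node, int):
        return str(node)
    if isinstance(node, str):
        # Escape backslashes first, then double quotes.
        escaped = node.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    raise TypeError(f"cannot render {type(node).__name__} as an ATerm")

# A made-up syntax tree for the statement `x = x + 1`.
tree = ("Assign", ("Var", "x"), ("Plus", ("Var", "x"), ("Int", 1)))
text = to_aterm(tree)
# text == 'Assign(Var("x"),Plus(Var("x"),Int(1)))'
```

Because the output is plain text, any front end that can walk its own tree this way can feed the same downstream analysis, regardless of the source language.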
To show the idea at work, we used the existing Haskell language-java parser to parse code and wrote a small amount of code to emit an ATerm representation that could be analyzed. We applied it to two real open source repositories -- those of the ANTLR parser project and the Clojure compiler. It was satisfying to apply it to real repositories instead of contrived toy repositories -- we felt that the fact that the idea didn’t fall over when faced with the complexity and size of real projects indicated that we had something of real interest to share with the world here.
What can you learn from your software history?
Galois is delighted to announce that our proposal "Practical Roots of Trust for Mobile Devices" has been selected for award by the Department of Homeland Security. In this Phase I Small Business Innovative Research (SBIR) award, Galois will be investigating secure yet practical methods for mobile devices to authenticate to critical systems. This work builds on Galois' expanding experience in mobile device security, including our previous work with SRI and the United States Marine Corps.
Contact Leah Daniels at 503-808-7152 for more information.
Galois Awarded Phase II SBIR with Office of Naval Research: Programmer Intention Capture Tool (PICT)
Galois has been selected by the Office of Naval Research (ONR) for a Phase II Small Business Innovative Research (SBIR) award, for its PICT tool that interactively captures and manages programmers' intentions.
The design of a software product often isn't fully captured by the semantics and syntax of the language - many aspects of the design reside in documents, in the heads of programmers, and other places not easily analyzed. PICT is a mechanism to capture these programmer "intentions" in such a way that static analysis tools can be guided to check how the code and these intentions match up. By providing a mechanism for essentially scripting static analysis methods, we reduce the difficulty of taking advantage of them to analyze code for a wide variety of intentions.
Contact Leah Daniels at 503-808-7152 for more information.
Galois is pleased to host the following tech talk. These talks are open to the interested public--please join us! (There is no need to pre-register for the talk.)
This talk is on Friday at 2pm.
title: Using Drones in Agriculture
time: Friday, 20 September 2013, 2pm
location: 421 SW 6th Ave. Suite 300,
Portland, OR, USA
(3rd floor of the Commonwealth building)
abstract: Small unmanned aircraft, more often called drones, are set to make a big impact on agriculture. You already know about military drones operating overseas, and perhaps you've even seen recreational drones starring in YouTube videos, but as the FAA begins permitting commercial use of unmanned aircraft in 2015, you'll see drones taking over all sorts of roles that used to require a manned aircraft, and taking on new roles made possible by their low cost, versatility, and safety.
Chris Anderson will speak about the upcoming role of drones in agriculture, a field Chris says "is a big data problem without the big data." Chris will describe how farmers can use drones to curb plant disease, conserve water, and reduce pesticide and fertilizer use. He'll discuss the challenges ahead in integrating air vehicle systems with sensors, specialized cameras, and data processing.
bio: Chris Anderson is the CEO of 3D Robotics, a manufacturer of drone autopilots and complete drone systems based on open source hardware and software. He is also the founder of DIYDrones.com, the bestselling author of "Makers", and the former Editor in Chief of Wired Magazine.