Mathematical Formulas in Real-World Java Projects

We use the term formula code to refer to fragments of source code that implement a mathematical formula. In our empirical studies, we analyze the diversity and frequency of formula code in open-source software projects. In an exploratory study, we investigated what kinds of formulas are implemented in real-world Java projects and derived syntactical patterns and constraints. We refined these patterns for sum and product formulas to automatically detect formula code in software archives and to reconstruct the implemented formula in mathematical notation. In a quantitative study of a large sample of engineered Java projects on GitHub, we analyzed the frequency of formula code and estimated that one in 700 lines of code in this sample implements a sum or product formula. For a sample of scientific-computing projects, we found that one in 100 lines of code implements a sum or product formula. To assess the need for tool support, we investigated how helpful comments are for understanding a sample of formula-code fragments and conducted an online survey. Our findings provide first insights into the characteristics of formula code and can motivate further studies on the role of formula code in software projects and on the design of formula-related tools.
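
To make the notion concrete, the following is a minimal sketch of what formula code for a sum formula and a product formula might look like in Java. The class and method names are hypothetical and chosen for illustration only; the fragment is not taken from the studied projects, and it does not reproduce the detection patterns used in the studies.

```java
// Hypothetical illustration of "formula code"; not taken from the study's dataset.
final class FormulaCodeExample {

    private FormulaCodeExample() {}

    // Sum formula: mean(x) = (1/n) * sum_{i=0}^{n-1} x_i
    static double mean(double[] x) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i]; // accumulate the summand x_i
        }
        return sum / x.length;
    }

    // Product formula: n! = prod_{i=1}^{n} i
    static long factorial(int n) {
        long product = 1L;
        for (int i = 1; i <= n; i++) {
            product *= i; // accumulate the factor i
        }
        return product;
    }

    public static void main(String[] args) {
        System.out.println(mean(new double[] {1.0, 2.0, 3.0}));
        System.out.println(factorial(5));
    }
}
```

In both methods, an accumulator variable is initialized with the neutral element (0 for a sum, 1 for a product) and updated inside a counting loop. That loop-plus-accumulator shape is one plausible form such fragments could take.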

Dataset for Replication:

Moseler, Oliver, Lemmer, Felix, Baltes, Sebastian, & Diehl, Stephan. (2020). On the Diversity and Frequency of Code Related to Mathematical Formulas in Real-World Java Projects [Data set]. Journal of Systems and Software. Zenodo. http://doi.org/10.5281/zenodo.4065367

Related Publications:

  • On the Diversity and Frequency of Code Related to Mathematical Formulas in Real-World Java Projects
    Oliver Moseler, Felix Lemmer, Sebastian Baltes, and Stephan Diehl
    in Journal of Systems and Software, Elsevier, volume 172, February 2021. [preprint on arXiv, see publisher site:
    https://doi.org/10.1016/j.jss.2020.110863]
  • Visual Breakpoint Debugging for Sum and Product Formulae
    Oliver Moseler, Michael Wolz, and Stephan Diehl
    in Proceedings of the IEEE Working Conference on Software Visualization (VISSOFT 2020, NIER Track), Adelaide, Australia, 2020. [see publisher site: https://doi.org/10.1109/VISSOFT51673.2020.00019]