GSoC Gupta Lab


GCC is one of the best-known and most widely used open-source compilers for C programs. GNU itself documents known problems that affect users of GCC: “Some of the problems may be due to bugs in other software, or some are missing features that are too much work to add, and some are places where people's opinions differ as to what is best”. In this project, we focus on the errors and warnings produced by the GNU compiler and analyze them in order to develop a predictive model that pinpoints the root cause of an error or warning (the actual cause, rather than a list of line numbers). C is one of the most widely used programming languages in the world. Despite this popularity, to our knowledge no predictive model exists that pinpoints the sources of the errors programmers most commonly come across. This is the motivation behind the project. Once completed, the model should reduce debugging effort and cost and support a better understanding of the language and its usage.

Software Homepage: CVisualyzer


GNU Compiler, C Programming

Programming Skills:

Java, Python, C++, C

Road Map:

So far, we have collected and analyzed the error and warning logs produced by the GNU compiler for more than 400 programs. We have also identified several patterns in these logs and are now implementing them in Java. Meanwhile, we aim to collect data for at least 1000 C programs.

Roadmap of the Project:

We started by collecting the various types of errors and warnings produced by the GNU compiler and logging them in a database. In this process, we are trying to identify patterns among the errors and warnings so that the predictive model can later suggest remedies. In parallel, we will apply Natural Language Processing (NLP) techniques to parse the errors and warnings produced by GCC, so that we can match them against our database and predict the flaw in the code accurately. Our main focus is on the errors and warnings produced by the GNU compiler (more than 400 buggy programs have been compiled in order to understand the patterns in the error and warning logs), followed by building a predictive model that maps each error or warning to the actual cause of the bug. Once the database is complete, we will use student-developed programs, written as part of the course, to train the predictive model, and other programs to test it, so that the model is both efficient and accurate. We have tested as many different kinds of C code as we could find, so that the database covers a wide array of commonly encountered errors and we have a better chance of finding patterns among them. We have also examined various mutation testing tools, so that we can obtain varied errors from the same erroneous code.
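The parsing step described above can be sketched as follows. This is only a minimal illustration in Python, not the project's Java implementation: it assumes GCC's usual one-line diagnostic header of the form `file:line:column: severity: message`, and the names `Diagnostic`, `DIAG_RE`, `parse_diagnostics`, and `SAMPLE_LOG` are hypothetical. The sample log is hand-written for illustration and is not taken from the project's database.

```python
import re
from collections import Counter
from typing import List, NamedTuple


class Diagnostic(NamedTuple):
    """One parsed compiler diagnostic (hypothetical record layout)."""
    file: str
    line: int
    column: int      # 0 when the log line carries no column number
    severity: str    # "error", "fatal error", "warning" or "note"
    message: str


# Assumed header shape: file:line[:column]: severity: message
# Note: paths that themselves contain ':' are not handled by this sketch.
DIAG_RE = re.compile(
    r"^(?P<file>[^:\n]+):(?P<line>\d+):(?:(?P<column>\d+):)?\s*"
    r"(?P<severity>fatal error|error|warning|note):\s*(?P<message>.*)$"
)


def parse_diagnostics(log: str) -> List[Diagnostic]:
    """Extract diagnostics from a compiler log, skipping all other lines."""
    found = []
    for raw in log.splitlines():
        match = DIAG_RE.match(raw.strip())
        if match is None:
            continue  # context lines, source excerpts, caret markers, etc.
        found.append(Diagnostic(
            file=match["file"],
            line=int(match["line"]),
            column=int(match["column"] or 0),
            severity=match["severity"],
            message=match["message"],
        ))
    return found


# Hand-written sample in the assumed format (illustrative only).
SAMPLE_LOG = """\
demo.c: In function 'main':
demo.c:5:5: error: expected ';' before 'return'
demo.c:3:9: warning: unused variable 'x' [-Wunused-variable]
"""

if __name__ == "__main__":
    diagnostics = parse_diagnostics(SAMPLE_LOG)
    for d in diagnostics:
        print(d)
    print(Counter(d.severity for d in diagnostics))
```

Records in this shape could then be stored in the database and compared against known patterns; how that matching is done is left open here.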

Project Details:

Our project aims to report to the user the root source of an error, or its most probable cause. We aim to achieve this by studying existing C errors obtained through random mutation and from randomly selected student programs containing compilation errors of many kinds. The database generated in the process reveals the underlying patterns of C compilation errors that stem from a single mistake. Examples illustrating such patterns are given below. The predictive model will be trained on a random sample of 70-80% of the dataset, and the remainder will be used as test data. The model will then be deployed on working data and live code, where it will produce a summary of the original compilation errors. At this stage, we will use NLP (preferably Stanford NLP) to parse the error strings generated by the compiler. Finally, we aim to develop an automatic tool that visualizes the errors and warnings in C programs, which will help users (e.g., students and practitioners) debug their programs and may improve learning by providing a comprehensive compiling environment. Our tool also implements the “Options for Debugging Your Program or GCC” provided by GNU.
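The random 70-80% training split mentioned above could look like the following sketch. It is written in Python for brevity and makes several assumptions: the function name `train_test_split`, the fixed seed, the 0.75 fraction, and the program file names are all hypothetical, and the 400 items simply mirror the number of programs collected so far.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(
    samples: Sequence[T],
    train_fraction: float = 0.8,
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Shuffle the samples reproducibly and cut them into train and test parts."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    shuffled = list(samples)               # copy, so the input stays untouched
    random.Random(seed).shuffle(shuffled)  # seeded generator for repeatability
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Hypothetical identifiers standing in for the collected buggy programs.
    programs = [f"prog_{i:03d}.c" for i in range(400)]
    train, test = train_test_split(programs, train_fraction=0.75, seed=42)
    print(len(train), len(test))
```

Fixing the seed keeps the split reproducible between runs, which makes it easier to compare model variants on the same test data.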

Success Criteria:

  1. The tool recognizes errors and warnings and shows the intended predictive output for each of them.
  2. Training and testing the predictive model strengthens the database and results in a larger number of errors and warnings being incorporated.
  3. The tool helps to debug programs as per the guidelines provided by GNU.

Copyright © 2015- Dixita Limbachiya