Digging Deeper

During the third and fourth weeks, I was exposed to more of the yt codebase. I worked on the data_objects module, writing more unit tests and fixing the issues I found along the way.


Merged PRs

  • Removed coverage report from Travis logs: With the Codecov dashboard working properly, it was time to remove the coverage report output from the Travis logs. I had originally added it there to verify the integration between Travis and Codecov. Strangely, after removing the report, the build started failing the flake8 checks. This led to decoupling the flake8 checks from the unit tests and splitting the Travis build into separate Lint and Tests stages.

  • Added more tests for Particle Filter: In this PR, I added a few more test cases to increase the coverage of the ParticleFilter class (a sketch of the kind of test involved follows this list).

  • Removed particle_io: I removed this deprecated and undocumented particle-IO handler from the codebase as part of the cleanup; I came across it while writing test cases to increase coverage.

  • RegionExpression: I added more test cases for RegionExpression to exercise its various error messages and execution paths (see the second sketch after this list).
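    To give a flavor of the ParticleFilter work, here is a minimal sketch of the kind of unit test involved, built on yt's fake in-memory datasets. The filter name left_half and its 0.5 cutoff are made up for illustration; the actual tests in the PR differ.

        from yt.data_objects.particle_filters import add_particle_filter
        from yt.testing import fake_random_ds

        def left_half(pfilter, data):
            # Keep only the particles in the left half of the domain.
            x = data[(pfilter.filtered_type, "particle_position_x")]
            return x.in_units("code_length").d < 0.5

        def test_particle_filter_left_half():
            add_particle_filter("left_half", function=left_half,
                                filtered_type="io",
                                requires=["particle_position_x"])
            ds = fake_random_ds(16, particles=100)
            assert ds.add_particle_filter("left_half")
            ad = ds.all_data()
            # A filtered type can never hold more particles than its source.
            assert ad["left_half", "particle_position_x"].size <= 100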
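    For RegionExpression, the new tests check that bad inputs to the ds.r[...] shorthand fail loudly. Below is a hedged sketch of one such case, assuming that indexing a three-dimensional dataset with only two slices raises YTDimensionalityError:

        from numpy.testing import assert_raises

        from yt.testing import fake_random_ds
        from yt.utilities.exceptions import YTDimensionalityError

        def test_region_expression_dimensionality():
            ds = fake_random_ds(16)  # a 3D in-memory dataset
            # Two slices for a 3D dataset should not silently succeed.
            with assert_raises(YTDimensionalityError):
                ds.r[0.2:0.5, 0.2:0.5]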


Open PRs

  • DataContainers: This PR is one of the biggest in terms of the number of lines added/deleted and the number of files changed. A few highlights of this PR are as follows:

    • YTDataContainer’s write_out method is fixed. It takes a data object and produces a tab-delimited text file containing the fields already present on the object along with the fields given in its fields list (a usage sketch follows this list).

    • For the class YTSmoothedCoveringGrid(YTCoveringGrid), the _setup_data_source method silently returned None when level_state was None, as shown below:
        def _setup_data_source(self, level_state = None):
            if level_state is None: return
      

      Extensive testing helped me identify this bug, and it is now fixed by calling the corresponding parent method (a sketch of the fix follows this list).

    • Added support for setting the center to the point of minimum (‘gas’, ‘density’) by simply calling ds.sphere("min", (0.25, 'unitary')), mirroring the existing "max" shortcut (see the last sketch after this list).

    • Opened issue #1836 describing the inability to load an object after saving it with YTDataContainer.save_object().

    • One of the tests required the pandas package, which meant installing pandas on Travis and AppVeyor. With the latest version of pandas, the build started failing on Python 3.4, since pandas had dropped support for it. I pinned pandas for that environment and, taking the opportunity, also moved our Lint stage from Python 3.4 to Python 3.6.

    • Many other tests were added to cover previously uncovered code.

    • This PR will increase the code coverage of the entire yt project by +0.34%.
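    To illustrate the repaired write_out, here is a minimal usage sketch on a fake dataset; the filename and field choice are illustrative only.

        from yt.testing import fake_random_ds

        ds = fake_random_ds(16)
        ad = ds.all_data()
        # Produces a tab-delimited text file with one row per element and
        # one column per requested field.
        ad.write_out("cell_data.txt", fields=["density"])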
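    A sketch of the _setup_data_source fix described above, assuming the parent-class call below is the shape the final code took:

        def _setup_data_source(self, level_state=None):
            if level_state is None:
                # Fall back to the parent class instead of silently
                # returning None, so the data source is always set up.
                return super(YTSmoothedCoveringGrid, self)._setup_data_source()
            # ... level_state-specific setup continues here ...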
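    And the new "min" center shortcut in action, again on a fake dataset:

        from yt.testing import fake_random_ds

        ds = fake_random_ds(16)
        # The sphere is centered on the point of minimum ('gas', 'density').
        sp = ds.sphere("min", (0.25, "unitary"))
        print(sp.center)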


  • Input validation for disk geometry: Issue #1768 reported that the error messages generated by the data_objects module were not informative enough. To address it, I added error checking of the input parameters for the disk geometry. Using the approach of this PR, error checking can be extended to other geometries very easily.
    At first, I enforced this validation using decorators. That approach was neither feasible nor scalable here, since the decorator performed the validation based on the geometry type and would have grown very bloated as more geometries were added. After discussion, I made the checks atomic by implementing them per argument type; e.g., a value passes the float check if it is a single numeric value, a tuple of the form (float, string), a YTQuantity, or a list/array of length one composed of the types just listed (a sketch follows below).
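    A minimal sketch of what one such atomic check could look like; the name validate_float and the exact error messages are illustrative, not necessarily the PR's final code:

        import numpy as np

        from yt.units.yt_array import YTQuantity

        def validate_float(obj):
            # Accept a bare number, a (float, unit-string) tuple, a
            # YTQuantity, or a length-one list/array holding any of those.
            if isinstance(obj, tuple):
                if (len(obj) != 2
                        or not isinstance(obj[0], (int, float, np.number))
                        or not isinstance(obj[1], str)):
                    raise TypeError("Expected a (float, unit) tuple, "
                                    "received %s" % repr(obj))
            elif isinstance(obj, (list, np.ndarray)):
                if len(obj) != 1:
                    raise TypeError("Expected a scalar or a length-one "
                                    "list/array, received %s" % repr(obj))
                validate_float(obj[0])
            elif not isinstance(obj, (int, float, np.number, YTQuantity)):
                raise TypeError("Expected a numeric value, received %s"
                                % repr(obj))

    The disk constructor can then run validate_float over its radius and height arguments and report exactly which parameter is malformed.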

Lessons learned

  • Good to use Git’s advanced features: I had always been content with Git’s basic commands like branch, checkout, pull, and push; git stash made me nervous about what would happen to my unsaved work. When I started working on multiple branches, it became difficult to manage with that limited set of commands, so it was time to take the leap and patiently read up on the rest. My mentors also explained the benefits of using git’s interactive patch mode (-p) and other commands like squash, rebase, and reset. After learning these, my :heart: for git has grown even further.

  • Avoid bulky PRs: It was tough for both the reviewers and me to work on big PRs like #1831. It is best to keep PRs small: they not only get a closer look but also need less turnaround time.

  • Brainstorm before coding: To work efficiently, it is always better to brainstorm the problem and its solution with others first. This helps in arriving at the best possible approach and saves development time.


The road ahead…

  1. ParticleTrajectories: This class is a collection of particle trajectories over a series of datasets. At present, we use a real dataset to test it. Since fake datasets are sufficient to exercise its functionality, I created a PR (removing the real dataset from the tests) that uses static data instead. After further discussion on the PR, I decided to add a fake_timeseries function that would produce time-series datasets. In the coming weeks, I will update the PR to use fake_timeseries (a first sketch of such a helper follows this list).

  2. Answer Testing: The main culprit behind the long runtimes on Travis is the testing done in the visualization and volume_rendering modules. The following screenshot of a Travis build shows that the time these modules take is among the highest in the entire test suite.

    Many of the tests in these modules are answer tests and execute only on Jenkins, which leaves a lot of scope for writing unit tests for them. Test coverage on Travis for these modules is currently 45.34%. Hence, being already a bit familiar with the yt codebase, it is time for me to jump in and get my hands dirty.

    The strategy is to identify the time-consuming tests and come up with an approach to optimize them where possible. Another aspect is to reduce the reliance on answer tests by writing unit tests for those scenarios. These unit tests, as opposed to image-comparison tests, will check the data returned by the plotting functions (see the second sketch after this list).

  3. Coverage Integration: At present, only the Travis test reports are shown on our Codecov dashboard. In the coming days, I will integrate the reports from Jenkins and AppVeyor as well.
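    A first, hedged sketch of the fake_timeseries helper mentioned in item 1; the signature and the use of fake_particle_ds are my assumptions about how it could look, not settled design:

        from yt.testing import fake_particle_ds

        def fake_timeseries(n_steps=4, npart=100):
            # Hypothetical helper: a sequence of in-memory particle
            # datasets standing in for a real time series. A real
            # implementation would also evolve the particle positions
            # and current_time between steps.
            return [fake_particle_ds(npart=npart) for _ in range(n_steps)]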
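    And a sketch of the kind of data-level unit test (item 2) meant to reduce the reliance on answer tests; the specific checks are illustrative:

        import numpy as np

        import yt
        from yt.testing import fake_random_ds

        def test_slice_plot_data():
            ds = fake_random_ds(16)
            slc = yt.SlicePlot(ds, "z", "density")
            # Inspect the numbers behind the plot instead of the rendered
            # image: frb is the plot's fixed-resolution buffer.
            buf = slc.frb["density"].d
            assert buf.ndim == 2
            assert np.all(np.isfinite(buf))
            assert np.all(buf >= 0)  # fake densities are non-negative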

This marks the end of the first phase of GSoC, and I am as excited as on day one to dive into phase two!