
astropy@GSoC Blog Post #6.5 - Week 10, Final Evaluations

Hey there,

How are you holding up?

So, this is the end of the astropy@GSoC period. There was a time, growing up, when I used to wonder about the Astropy people: those science wizards who wrote computer programs for an esoteric piece of software that made calculations of the universe, with a bizarre portmanteau of a name, astro-pie. Haha, and now, as the project finishes up, I am one of those fabled wizards. Well, it has been quite a ride, hasn't it?

Let's first take a step back to recap what the context of my Google Summer of Code project was and then have a detailed look at what has been done. Below is part of the short project synopsis I had submitted during the application process back in April.

One important feature of Astropy is reading and writing tabular data in a wide variety of useful formats. One such astronomical data storage format is the CDS-ASCII format employed by Centre de Données astronomiques de Strasbourg (CDS) for maintaining its VizieR catalogue. Currently Astropy only supports reading data from a CDS-ASCII table and not writing to it. The present project is meant to address this issue by adding a CDS-ASCII writer to Astropy. CDS already has an available package to generate standardized ASCII tables for submission of data to its catalogue. The proposed Astropy CDS-ASCII writer can be based on the methods used by that package.

I am glad to say that, more or less, together with the zealous support from my considerate mentors Aarya and Moritz, I have been able to fulfil the aim this project started with.

The work started with directly importing cdspyreadme code into the Astropy writer and using its features to write out the required table. We soon realized that not all of the cdspyreadme code could be used as is: first, it was meant for cases where the user had multiple tables to write, and second, perhaps more significantly, it relied on raw string manipulation instead of Astropy's various high-level capabilities. So, we kept only the cdspyreadme code that was absolutely necessary as a reference and started writing the code with a bottom-up approach. Proceeding thus, by the time the First Evaluations ended, I had taught an Astropy writer to write in the aforementioned CDS format, albeit without support for optional keywords yet. The requisite tests and documentation, which apparently tag along with Astropy PRs all the time, were also done. There had also been a minor hiccup along the way about the writer's code style, which was duly dealt with (more about that here).

At this point, Tom Aldcroft, who had already hinted at this earlier, offered some cool insights and inquisitive suggestions. There is something to keep in mind while writing tables in the CDS format, and even more so when you are teaching a writer to write it. This seemingly all-essential point is that there are not one but two highly alike but ultimately distinct formats: the CDS format, with its immaculate details and any number of optional keyword fields, and the more down-to-earth and popular Machine Readable Table, aka the MRT format. We had been working on the former and then decided to switch to completing the latter first.

Developed in 1993, the CDS-ASCII file format (more simply called CDS after the eponymous organisation) solved the problem of saving already published lengthy tables, for which FITS wasn't an ideal choice, by separating the given tabular data into two individual files: the description or ReadMe file, containing a description of the table, and the plain ASCII file(s), containing the actual table data. The ReadMe included numerous metadata keywords akin to the ones used in FITS; however, its more prominent section was the Byte-By-Byte description of the table. Over time, the American Astronomical Society (AAS) developed its own format based on CDS for submissions to its journals, dropping all the optional metadata keywords while retaining the Byte-By-Byte, and called it the MRT.

The MRT format differs from CDS in two key aspects:

  • It has a much smaller template with only 4 metadata keywords in addition to the Byte-By-Byte, namely the Title, Authors, table Caption and Notes if present.
  • MRT consists of a single file with the ReadMe and the Data parts combined into one, whereas CDS can also have multiple Data files.
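To make the Byte-By-Byte idea a little more concrete, here is a minimal pure-Python sketch of how 1-based start and end byte positions and Fortran-style format codes (I, F, A) might be derived for fixed-width columns. The function names `fortran_format` and `byte_by_byte`, the fixed two decimal places for floats and the single blank byte between columns are all illustrative assumptions; this is not the Astropy writer's actual code, nor the exact CDS layout rules.

```python
def fortran_format(values):
    """Return (format_code, width) for a hypothetical column of values."""
    if all(isinstance(v, int) for v in values):
        width = max(len(str(v)) for v in values)
        return f"I{width}", width
    if all(isinstance(v, (int, float)) for v in values):
        # Assumption for this sketch: floats are always written with 2 decimals.
        width = max(len(f"{v:.2f}") for v in values)
        return f"F{width}.2", width
    width = max(len(str(v)) for v in values)
    return f"A{width}", width


def byte_by_byte(columns):
    """Return (start, end, format, label) rows with 1-based inclusive byte ranges.

    `columns` is a list of (label, values) pairs; columns are assumed to be
    separated by exactly one blank byte.
    """
    rows = []
    start = 1
    for label, values in columns:
        fmt, width = fortran_format(values)
        end = start + width - 1
        rows.append((start, end, fmt, label))
        start = end + 2  # skip the single separator byte
    return rows


cols = [("ID", [1, 12, 123]), ("Vmag", [9.5, 10.25, 12.0]), ("Name", ["Vega", "Sirius", "Deneb"])]
for start, end, fmt, label in byte_by_byte(cols):
    print(f"{start:4d}-{end:3d} {fmt:<6} {label}")
```

A real Byte-By-Byte section also carries units, explanations and notes per column; the sketch only shows why the writer has to know every column's printed width before it can describe the data lines.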

After much deliberation, we opted to separate the two formats into different writer classes of their own within Astropy, ascii.cds and ascii.mrt, rather than having an argument such as template be given during the function call. Some specially-abled table columns, the so-called mixin columns in Astropy parlance, also needed separate attention.
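The trade-off between separate classes and a template argument can be sketched in plain Python: each subclass declares which metadata keywords its template writes, while the shared base class holds the common logic. The class names, the `keywords` attribute and the exact keyword lists below are hypothetical and chosen only for illustration; they are not Astropy's actual classes or the complete CDS keyword set.

```python
class ReadmeWriterBase:
    """Hypothetical base class holding the logic shared by both templates."""

    # Metadata keywords written above the Byte-By-Byte section, in order.
    keywords = ()

    def header_lines(self, meta):
        lines = []
        for key in self.keywords:
            value = meta.get(key)
            if value:  # keywords the user did not supply are simply skipped
                lines.append(f"{key}: {value}")
        return lines


class MrtWriter(ReadmeWriterBase):
    # MRT: only the small set of keywords mentioned in the post.
    keywords = ("Title", "Authors", "Caption", "Notes")


class CdsWriter(ReadmeWriterBase):
    # CDS: an illustrative subset of its many optional keywords.
    keywords = ("Title", "Authors", "Caption", "Notes", "Keywords", "Abstract")


meta = {"Title": "Bright stars", "Authors": "Doe J.", "Abstract": "Photometry of bright stars."}
print(MrtWriter().header_lines(meta))  # the CDS-only Abstract is ignored
print(CdsWriter().header_lines(meta))
```

With separate classes, each format's template lives in one place, and a user picks the format by name instead of remembering which template value goes with which writer.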

Since Astropy is hosted on GitHub, this project work has been done in the form of 4 Pull Requests (PR) to Astropy's main repository. These PRs are listed as follows.

  • GSoC21a: Add MRT format writer to cds.py #11897. Contains the bulk of the work, including the Byte-By-Byte writer and automatic conversion of coordinate columns. Fully complete with proper tests and documentation. Can be merged immediately.
  • GSoC21b: Support CDS/MRT format writing for Time columns #12027. Builds on the previous PR. Fully complete with proper tests and documentation. Can be easily merged after rebasing once the previous PR is merged.
  • GSoC21c: MRT metadata input and subsequent writing #12039. Builds on the previous PRs to provide writing of common CDS/MRT metadata. Proper tests have been added. Documentation is only partially complete at the time of writing. Can be merged after rebasing once the previous PRs are merged.
  • GSoC21d: Add CDS template to CDS/MRT writer #12096. Builds on the previous PRs to provide the CDS template for writing with CDS metadata. Also separates the MRT and CDS writers into two different classes. Proper tests have been added. Documentation is only partially complete at the time of writing. Can be merged after rebasing once the previous PRs are merged.
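The automatic conversion of coordinate columns mentioned for GSoC21a boils down to splitting each position into sexagesimal components, which CDS tables conventionally store in separate columns such as RAh, RAm, RAs and DE-, DEd, DEm, DEs. Below is a minimal pure-Python sketch of that arithmetic, assuming positions in decimal degrees; the function names are hypothetical, and inside Astropy itself `Angle.hms` and `Angle.signed_dms` already provide this split.

```python
def ra_to_hms(ra_deg):
    """Split a right ascension in degrees into (hours, minutes, seconds)."""
    total_seconds = (ra_deg % 360.0) / 15.0 * 3600.0  # 15 degrees per hour
    h, rem = divmod(total_seconds, 3600.0)
    m, s = divmod(rem, 60.0)
    return int(h), int(m), s


def dec_to_dms(dec_deg):
    """Split a declination in degrees into (sign, degrees, arcmin, arcsec).

    The sign is kept separately so that e.g. -0.5 deg does not become 0 deg 30'.
    """
    sign = "-" if dec_deg < 0 else "+"
    total_seconds = abs(dec_deg) * 3600.0
    d, rem = divmod(total_seconds, 3600.0)
    m, s = divmod(rem, 60.0)
    return sign, int(d), int(m), s


print(ra_to_hms(37.5), dec_to_dms(-0.5))  # (2, 30, 0.0) ('-', 0, 30, 0.0)
```

A production writer also has to handle rounding, so that seconds printed as 60.0 carry over into the minutes column; the sketch leaves that out.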

Note that these PRs haven't been merged into the Astropy main repository at the time of writing, even though the first of them, GSoC21a, was finished more than 2 weeks earlier. This was because some Astropy tests had been failing due to NumPy deprecation warnings.

With the release of version 1.20, NumPy deprecated its aliases for builtin Python types, for example np.int. Most existing tests in Astropy appear to use these now-deprecated NumPy types, which raises warnings when testing our code. I was first able to remedy the majority of these by additionally using np.issubdtype(col.dtype, np.integer) while checking whether a column has integer values, but for tests using older versions of the dependencies, this wasn't much help. To solve this for all test cases, we thought it pertinent to change the condition to what the FutureWarning suggested, i.e. to np.issubdtype(col.dtype, np.dtype(type).type). And voilà! All tests pass now. Since this is done, the code from this work can be merged, and the Astropy library will then have a fully functioning CDS and MRT format writer, with only support for some obscure optional features left behind.
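The two kinds of dtype check discussed above can be compared side by side. This is a small illustrative sketch, not the actual Astropy code; the helper names `is_integer_column` and `matches_builtin` are hypothetical. It shows that `np.integer` is an abstract type that matches integers of any width, whereas `np.dtype(int).type` is one concrete scalar type.

```python
import numpy as np


def is_integer_column(values):
    """True for integer data of any width, signed or unsigned."""
    # np.integer is abstract, so int8 ... int64 and uint8 ... uint64 all match,
    # and no deprecated alias such as np.int is needed.
    return np.issubdtype(np.asarray(values).dtype, np.integer)


def matches_builtin(values, pytype):
    """The form suggested by NumPy's FutureWarning: np.dtype(pytype).type."""
    # np.dtype(int).type is NumPy's default integer scalar type (np.int_),
    # so narrower integer dtypes such as int16 do not match it.
    return np.issubdtype(np.asarray(values).dtype, np.dtype(pytype).type)


print(is_integer_column(np.array([1, 2], dtype=np.int16)))  # True
print(matches_builtin(np.array([1, 2], dtype=np.int16), int))  # False
print(matches_builtin([1.5, 2.0], float))  # True
```

Which check is appropriate depends on whether the writer should treat every integer width the same way or only the default one.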

On a related note, Astropy is in the process of dropping support for Python 3.7 and NumPy 1.17 in #11934 and #11935. I locally checked the output of the built-in Astropy tests after rebasing on these PRs and found that all the tests pass except the ones with NumPy 1.18. Tests with later versions of NumPy and with Python 3.8 and 3.9 pass fine. So, it is evident that the problem lies solely with those menacing deprecation warnings, which would soon go away in updated versions anyway. According to the NumPy drop schedule (NEP 29), support for NumPy 1.18 will be dropped on Dec 22, 2021. Astropy may perhaps plan to drop it before then.
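The Dec 22, 2021 date follows from NEP 29's rule that projects support NumPy minor versions released in the prior 24 months (the rule also asks for at least the three latest minor versions, which this sketch ignores). Here is a tiny date-arithmetic sketch, assuming NumPy 1.18.0 was released on 22 December 2019; the helper name `nep29_drop_date` is hypothetical.

```python
import calendar
from datetime import date


def nep29_drop_date(release, months=24):
    """Date after which a NumPy minor release falls outside a 24-month window."""
    index = release.month - 1 + months
    year, month = release.year + index // 12, index % 12 + 1
    # Clamp the day for shorter months (e.g. a 29 February release date).
    day = min(release.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


print(nep29_drop_date(date(2019, 12, 22)))  # 2021-12-22
```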

Astropy and me after GSoC

I intend to continue a long-term, long-distance, healthy relationship with Astropy now that the GSoC period is ending. Astropy seems to agree. Long-term commitments are vastly favoured, after all.

So, in addition to the PRs mentioned above, I am also working on a few touch-ups to the writer and an enhancement to the CDS/MRT reader. These changes did not fall within the scope of this GSoC project, but I reckon they will nonetheless be significant improvements to the CDS and MRT format writer/reader in Astropy. Thus, I will be working on them gradually in the time to come. They will be submitted as new PRs, for instance:

  • On writing out the respective CDS or MRT templates when the passed table is empty, i.e. doesn't contain any columns.
  • On saving the unit name as a table column attribute when the column values have magnitude as the unit.
  • On supporting automatic division of coordinate columns into coordinate components for tables with multiple coordinate columns.
  • On enhancing the CDS/MRT reader so that a table round trip is possible using the table meta dictionary.

Uhm... well, that's all. It has been an enriching experience working on the astropy@GSoC project.

I am pretty sure I will keep contributing to Astropy. Hope to see you around.
Bon voyage!
