astropy@GSoC Blog Post #6, Weeks 8 & 9

Heads-up about the Progress of #11897

In summary, the situation of the concerned PR a few days back was: four types of CI test errors, one bug, and possibly a need to modify part of the code copied from cdspyreadme. All of these have been taken care of as detailed below, except for the numpy deprecation warnings that keep coming up. I don't think we can do anything about their persistence as of now; I shall comment more about it on GitHub as well.
  1. File not found error: Moritz's HW, i.e. using the get_pkg_data_filename import, directly took care of this (a minimal sketch follows this list).
  2. Error in coord col decimal places: The precision of the coordinate component columns was getting set arbitrarily, which created differences in the output between 32-bit and 64-bit machines, and possibly between different operating systems. This has been corrected by fixing the precision to 12 digits after the decimal point for the RA and DE columns and the latitude/longitude columns of Galactic and Ecliptic coordinates. This error also relates to the Formats bug discussed below.
  3. SphericalRepresentation col error: This was a somewhat bigger issue than the two above, although the solution was only a two-line change. When the coordinate columns were checked for and split into components, the original SkyCoord column was deleted right within the loop. After the deletion, every later column receded by one place in the list, so the loop index that should have pointed at the column immediately after the SkyCoord column (index i+1, where i is the index of the original SkyCoord col) effectively pointed at the original i+2 column, and the intermediate column was skipped (a small sketch of this pitfall follows the list). I fixed this by popping the original SkyCoord column only after all the columns in the table have been iterated over. This way all object-type columns are converted to Column objects with str values.
  4. ~table.tests and test_write failures: All these errors were warnings due to deprecation of numpy-specific aliases for different Python types. Most previous tests in Astropy appear to use these now-deprecated numpy types, which raises warnings while testing our code. I have been able to remedy the majority of these by additionally using np.issubdtype(col.dtype, np.integer) when checking whether a column has integer values (a snippet of this check follows the list); however, the test with the oldest supported versions of all dependencies still fails. See my GitHub comment for more info.
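
For point 1, a minimal sketch of the fix, assuming the test data file lives under the test package's data/ directory (the file name here is only illustrative):

    from astropy.utils.data import get_pkg_data_filename

    # Resolve a packaged test file relative to the test module's package,
    # instead of relying on the current working directory.
    path = get_pkg_data_filename('data/cds.dat', package='astropy.io.ascii.tests')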
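
For point 3, here is a minimal, self-contained sketch (not the actual PR code) of why deleting a list element inside an index-based loop skips its successor, and of the deferred-pop fix:

    names = ['coord', 'Vmag', 'e_Vmag']  # 'coord' stands in for the SkyCoord column

    # Buggy pattern: popping at index i inside the loop shifts 'Vmag' into slot i,
    # so the next iteration lands on 'e_Vmag' and 'Vmag' is never visited.
    # Fix: record the index and pop only after the loop has finished.
    to_pop = []
    for i, name in enumerate(names):
        if name == 'coord':
            to_pop.append(i)

    for i in reversed(to_pop):
        names.pop(i)

    print(names)  # ['Vmag', 'e_Vmag'] -- every remaining column was still visited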
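
And for point 4, a small sketch of the dtype check that avoids the deprecated numpy scalar aliases (column name and values are made up):

    import numpy as np
    from astropy.table import Column

    col = Column([1, 2, 3], name='Nobs')

    # np.issubdtype works on the dtype itself, so no deprecated aliases like
    # np.int are needed to decide whether the column holds integer values.
    if np.issubdtype(col.dtype, np.integer):
        print(f'{col.name} is an integer column')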

The formats bug

This was another major problem we had stumbled upon. It took me a while to skim through various docs and code to find the optimum fix for it.

Our initial insight was that the difference between the Byte-By-Byte description and the data part of the written table, when the formats argument is passed to the write function, related in some manner to the string formatting part of the code. At first glance, it was evident that there isn't any provision in the writer for cases when the columns already carry a format attribute, which is what gets assigned when formats is passed, as I had written here back then. Making allowance for this was easy enough and right away corrected the test outputs. Now both the Byte-By-Byte and the table data have the number of decimal digits, or whatever other format for that matter, we want them to have, apart from the internally created coordinate component columns, for which the number of digits after the decimal is fixed.
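
As a quick illustration of what passing formats does (a generic fixed-width example with made-up columns, not the MRT writer itself):

    import sys
    from astropy.io import ascii
    from astropy.table import Table

    t = Table({'RAdeg': [10.684583], 'Vmag': [12.345678]})

    # `formats` assigns a format to each named column before writing, so the
    # writer can pick it up instead of ignoring it.
    ascii.write(t, sys.stdout, format='fixed_width', formats={'Vmag': '.2f'})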

It is when we want to go a step further than this and truncate or remove the string formatting part of the code to obtain the column format that we stumble upon a road block. There are two concerns:
  • If no formats argument is passed, col.format will be set to None.
  • Even if we already know the column format, say .5f, we still need to evaluate the maximum size of the value strings of the column in most cases, and do some formatting to get the format into the CDS/MRT recommended form, Fx.5.
The column formats passed in the formats argument are set using the built-in Python function format (https://docs.python.org/3/library/functions.html#format). When no formats argument is passed, the default behaviour when writing the table data, for instance in the FixedWidth writer, is to set the column format to '', which is equivalent to saying val = str(val) (https://docs.astropy.org/en/stable/table/construct_table.html#table-format-string). FixedWidth then uses the maximum length of these strings to get the column widths. So the string formatting part of the code is essential if we want to know the correct format for columns without string values.
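
A tiny check of that claim, plus a sketch of the kind of reformatting the second bullet above refers to (the helper below is hypothetical, not code from the PR):

    # An empty format spec is equivalent to str() for typical column values.
    for val in (3.14159, 42, 'abc'):
        assert format(val, '') == str(val)

    # Hypothetical helper: turn a Python spec like '.5f' plus the maximum string
    # width of the column into a CDS/MRT-style Fortran format such as 'F10.5'.
    def to_cds_float_format(spec, max_width):
        precision = spec.strip('f').lstrip('.')
        return f'F{max_width}.{precision}'

    print(to_cds_float_format('.5f', 10))  # F10.5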

However, there may be another solution to this that can be tried in the long term. I was curious to know what other writers in Astropy do in such situations, when the column format needs to be given explicitly in the header of the written table. There aren't many such use cases, but FITS standard tables do have format keywords in the header that serve the purpose well. Looking over the Astropy FITS writer, I found that the way it deals with the problem of assigning column formats is by separately defining all the formats that can occur and then using a custom Column class which has some default format attributes. (See: https://github.com/astropy/astropy/blob/main/astropy/io/fits/column.py). The ASCII writers also have a custom Column class, but the attributes it currently has are far too limited to be of any use to us now. (https://github.com/astropy/astropy/blob/79323de928e87827526ed8fce04986a5dd459794/astropy/io/ascii/core.py#L270) In the long run, we could take motivation from the FITS writer and make changes therein.
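
For reference, this is roughly what the explicit per-column format looks like on the FITS side (a minimal sketch using the public astropy.io.fits API, with made-up data):

    import numpy as np
    from astropy.io import fits

    # Each FITS binary-table column carries its format (TFORMn) explicitly:
    # 'E' is a 32-bit float, 'J' a 32-bit integer.
    cols = [
        fits.Column(name='Vmag', format='E', array=np.array([12.34, 9.87], dtype=np.float32)),
        fits.Column(name='Nobs', format='J', array=np.array([5, 8], dtype=np.int32)),
    ]
    hdu = fits.BinTableHDU.from_columns(cols)
    print(hdu.header['TFORM1'], hdu.header['TFORM2'])  # E J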

Other updates

I have begun to work on the other two branches, for Time columns and MRT metadata respectively, and should have them done in some time.
On an unrelated note, I found that the test_cds_header_from_readme.py test file in astropy.io.ascii.tests contains some CDS reading tests. It was recently modified by PR 11593 (https://github.com/astropy/astropy/pull/11593/files). I imagine that these tests could be incorporated within test_cds.py, and then perhaps we won't have to move the CDS/MRT tests to any other test file?
