Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-43077: Convert all tasks to use CalibrateImageTask outputs #979

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

parejkoj
Copy link
Contributor

No description provided.

class WriteSourceTableConnections(pipeBase.PipelineTaskConnections,
defaultTemplates={"catalogType": ""},
dimensions=("instrument", "visit", "detector")):

catalog = connectionTypes.Input(
doc="Input full-depth catalog of sources produced by CalibrateTask",
name="{catalogType}src",
name="{catalogType}initial_stars_footprints_detector",
storageClass="SourceCatalog",
dimensions=("instrument", "visit", "detector")
)
outputCatalog = connectionTypes.Output(
doc="Catalog of sources, `src` in DataFrame/Parquet format. The 'id' column is "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Legacy usage of src in this doc string.

@@ -188,30 +188,33 @@ def run(self, catalogs, tract, patch):
return catalog


# TODO: should deprecate this?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning on performing these deprecations on this ticket?

@@ -281,6 +284,7 @@ class WriteRecalibratedSourceTableConfig(WriteSourceTableConfig,
)


# TODO: deprecate, since reprocessVisitImage does this more thoroughly?
Copy link
Contributor

@leeskelvin leeskelvin Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're asking for an opinion, I can see the merit of adding deprecation flags here. If reprocessVisitImage can do a better job, I don't see the need to keep this around. With that said, is this beyond the scope of this ticket?

If deprecations are set up, please also set up the future removal ticket, blocked by the appropriate version release.

@@ -1122,6 +1126,7 @@ def _combineExposureMetadata(self, visit, dataRefs):
class ConsolidateSourceTableConnections(pipeBase.PipelineTaskConnections,
defaultTemplates={"catalogType": ""},
dimensions=("instrument", "visit")):
# TODO: Deprecate the dataframe connection?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And replace it with another storage class? Or just a deprecation and removal?

@@ -95,8 +95,9 @@ calexpFlags:
- base_PixelFlags_flag_saturatedCenter
- base_PixelFlags_flag_crCenter
- base_PixelFlags_flag_suspectCenter
- base_PixelFlags_flag_streak
- base_PixelFlags_flag_streakCenter
# Streak flags not yet propagated from difference imaging.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these are being left in but commented, please add a TODO and a DM ticket number for their eventual reinstatement.

@@ -769,8 +769,9 @@ flags:
- base_PixelFlags_flag_sensor_edgeCenter
- base_PixelFlags_flag_suspect
- base_PixelFlags_flag_suspectCenter
- base_PixelFlags_flag_streak
- base_PixelFlags_flag_streakCenter
# Streak flags not yet propagated from difference imaging.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO and ticket link here too please.

@@ -380,8 +381,9 @@ flags:
- base_PixelFlags_flag_saturatedCenter
- base_PixelFlags_flag_suspect
- base_PixelFlags_flag_suspectCenter
- base_PixelFlags_flag_streak
- base_PixelFlags_flag_streakCenter
# Streak flags not yet propagated from difference imaging.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO and ticket link here please.

parejkoj and others added 11 commits October 10, 2024 11:52
Now that reprocessVisitImage is directly writing an ArrowAstropy, it
no longer has an index column, so we can just treat the id like a normal
column.
They will get added back in once we start propagating them from the
diffim.
Remove STREAK from MeasureMergedCoaddSources.
calib_detected is now meaningless, as the icSource table is no more.
This is needed for compatibility with outputs produced by
CalibrateImageTask.
Some doc parameter lists had gotten out of sync with the code, and it's
always good to clarify whether you've got a DeferredDatasetHandle or a
real object.
Background modeling is done on the image without the calibration
applied, and that's how it should be subtracted, too.

This was a clear bug in the old code, but probably a very minor one,
because the photometric calibration is close to constant, and we were
previously only incorrectly multiplying the skyCorr background by the
ratio of the photometric calibration to its average.
This is needed for compatibility with outputs produced by
CalibrateImageTask.
@@ -4,7 +4,8 @@
# See the DPDD for more information about the output: https://lse-163.lsst.io
funcs:
sourceId:
functor: Index
functor: Column
args: id
Copy link
Member

@TallJimbo TallJimbo Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@parejkoj, do you recall what led you to make this change? I'm finding that I don't need it, which is very convenient because it means that (at least for now) I can use the one Source.yaml for both calibrateImage outputs and writeRecalibratedSourceTable despite one writing ArrowAstropy and one writing DataFrame (I do need to make other changes to the file and various plugin configurations, but that's orthogonal). But I want to make sure I'm not missing anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comment on this commit: 1df4911

The Index line didn't work with ArrowAstropy when I tried it, and this treats it just like a normal column.

I'm surprised this file works with calibrateImage outputs, since that task is not configured to produce an output catalog with many of the fields that this Source.yaml expects. That's why there's a new dedicated .yaml file for that task's outputs on another commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Index line didn't work with ArrowAstropy when I tried it, and this treats it just like a normal column.

Do you remember what didn't work? I also expected it not to work, but unless I didn't test what I thought I did, restoring the original didn't cause a failure in transform[Pre]SourceTable or consolidate[Pre]SourceTable for me. I'm running a ci_hsc in Jenkins now with this one bit reverted (on DM-46991) so that'll hopefully check what I tried to do locally.

I'm surprised this file works with calibrateImage outputs, since that task is not configured to produce an output catalog with many of the fields that this Source.yaml expects.

In my incremental approach I've reconfigured the task to produce all of those fields in DRP (except calib_detected). We've got analysis tools that need some of them, but the pipeline steps we have now are really not amenable to adding the multiple invocations we need of those tools (for initial stars, recalibrated stars, and final sources), so I want to fix that on DM-47320 before trying to review which analysis tools we run when, and I want to do that before trimming down the calibrateImage output fields back to [close to] the defaults.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you remember what didn't work?

No, I don't remember any of the details at this point. I'm pretty sure one of those listed tasks just failed when I tried to use it with ArrowAstropy inputs. It was probably ci_imsim or ci_hsc where the failure occurred.

I've reconfigured the task to produce all of those fields in DRP

Oof, that's what I was afraid of. Good luck. I hope we really can get it back to the defaults, and/or any non-default values are ones that we can look over with AP, too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I figured it out - without this change, transformSourceTable does misbehave, but only by replacing the real source ID column with range(0, N), and it turns out nothing actually raises an exception (even downstream anywhere in the pipeline). A lot of tasks just produce no outputs, because finalizeCharacterization doesn't produce any PSFs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, that sounds very familiar. I think it took me a while to figure it out; the "there were no outputs for tasks" was the secret.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants