[BUG] data ingestion commands - flags/user control for creating new mini files (orig segdtoph5 command line not following user input) #393

Open
hrotman-pic opened this issue Apr 8, 2020 · 7 comments
Labels
enhancement GeoHDF Things to consider for GeoHDF

Comments

@hrotman-pic
Collaborator

hrotman-pic commented Apr 8, 2020

Enhancement
This flag requires creating new mini files starting from a given index.
To meet that request, Das_t should be able to link to multiple external mini files. However, one PyTables node can link to only one external mini file. In the future, Das_t should be an array or list of nodes, each of which links to a mini file.
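A minimal sketch of the proposed layout, in Python. It assumes Das_t entries are keyed by DAS serial number and that each mini file holds a DAS's data under a node such as /Experiment_g/Receivers_g/Das_g_<S/N>; the node path, the class names (MiniLink, DasLinks) and the serial number are illustrative assumptions, not the current PH5 schema. Each DAS keeps a list of links, so a later service run adds a link to a new mini instead of appending to the one already linked:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MiniLink:
    """One link from a DAS entry to the node holding its data in a mini file."""
    mini_index: int
    node_path: str

    @property
    def target(self) -> str:
        # "file:/node" string form, as PyTables uses for external-link targets.
        return "miniPH5_{:05d}.ph5:{}".format(self.mini_index, self.node_path)


@dataclass
class DasLinks:
    """Hypothetical Das_t replacement: a list of mini links per DAS S/N."""
    links: Dict[str, List[MiniLink]] = field(default_factory=dict)

    def add(self, serial: str, mini_index: int) -> None:
        # Append a new link rather than replacing the existing one, so a later
        # service run can be stored in its own mini file.
        node = "/Experiment_g/Receivers_g/Das_g_{}".format(serial)
        self.links.setdefault(serial, []).append(MiniLink(mini_index, node))

    def minis_for(self, serial: str) -> List[int]:
        """Mini indices linked to this DAS, in the order they were added."""
        return [link.mini_index for link in self.links.get(serial, [])]


das_t = DasLinks()
das_t.add("12345", 12)  # original deployment
das_t.add("12345", 38)  # later service run, stored in a new mini
print(das_t.minis_for("12345"))  # [12, 38]
```

Both links stay in place, so the mini already archived (00012 here) is never rewritten when the service run is added.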

Describe the bug
When adding new .fcnt files to an existing node archive, segdtoph5 does not create mini files starting at the specified mini and may not create the specified number of additional mini files. Commonly, the data gets appended to existing mini files in what appears to be an unpredictable order. This is undesirable for adding service runs.
segdtoph5 also does not accept the immediately following mini index. With an existing PH5 containing minis 00001-00037, -S 00038 fails with: segdtoph5: error: option -S: invalid integer value: '00038'
All experimentation took place on ph51/Working/Holly_append_test
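The error text has exactly the shape that Python's optparse module produces for an option declared with type="int". Assuming, without having checked the segdtoph5 source, that -S is declared that way, optparse treats a leading "0" as an octal prefix: 00038 is rejected because 8 is not an octal digit, while 00040 is silently read as 32 and 00101 as 65. This stand-alone parser (not segdtoph5's own) reproduces that behavior:

```python
from optparse import OptionParser

# Assumption: -S is an optparse option with type="int". optparse's "int" type
# reads "0x"/"0b" prefixes as hex/binary and a bare leading "0" as octal.
parser = OptionParser(prog="segdtoph5")
parser.add_option("-S", dest="first_mini", type="int")


def parsed_first_mini(value):
    """Return what optparse stores for -S, or None if it rejects the value."""
    try:
        opts, _ = parser.parse_args(["-S", value])
    except SystemExit:  # parser.error() prints the message and exits
        return None
    return opts.first_mini


for value in ["00038", "00040", "00101", "40"]:
    print(value, "->", parsed_first_mini(value))
# 00038 -> None   (stderr: segdtoph5: error: option -S: invalid integer value: '00038')
# 00040 -> 32
# 00101 -> 65
# 40 -> 40
```

Under this assumption, passing the index without leading zeros (for example -S 38) is read as decimal.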

Environment (please complete the following information):

  • OS: Ubuntu 18.04, ph51
  • Program Version: segd2ph5 2019.252

To Reproduce
For a case where some new files have the same DAS S/N as in the existing PH5, use
segdtoph5 -f ../WSTF.list -n master.ph5 -U 13N -M 7 -S 00040 on a copy of FaultScan. A fresh copy can be obtained from ph51/completed. This approach mostly appends data to existing minis, including for DAS S/Ns that do not exist in the PH5 being appended to, and creates 1 new mini from 5 fcnt files.

For a case where the DAS S/Ns are different:
segdtoph5 -f ../Valles_3C.list -n master.ph5 -U 13N -M 9 -S 00040
on a copy of FaultScan. This attempt partially succeeds: it creates a few new minis, but starts at mini 00038 and then reverts to adding data to existing minis, as shown below (leading zeros removed from the mini indices):

| .fcnt file | Mini file |
|---|---|
| 1 | 38 |
| 2 | 39 |
| 3 | 40 |
| 4 | 12 |
| 5 | 2 |
| 6 | 3 |
| 7 | 8 |
| 8 | 1 |
| 9 | 10 |

OR
segdtoph5 -f ../ShaleHills.list -n master.ph5 -U 18N -M 5 -S 00040 on a copy of FaultScan. This attempt does not create any new minis:

| .fcnt file | Mini file |
|---|---|
| 401 | 12 |
| 403 | 2 |
| 404 | 3 |
| 405 | 8 |
| 406 | 1 |
| 407 | 10 |
| 408 | 9 |
| 409 | 6 |

OR
segdtoph5 -f ../Playa2.list -n master.ph5 -U 14N -M 24 -S 00101 on a copy of FaultScan. This attempt creates new minis, but starts at mini 00038 instead of 00101, and creates 47 new minis instead of 24.

An example of the most successful attempt:
segdtoph5 -f ../Valles_3C.list -n master.ph5 -U 13N -M 9 -S 00101
However, this was not completely successful because it started creating new minis at 00038.

Expected behavior
segdtoph5 to start at the mini specified.
segdtoph5 to create the specified number of mini files.
segdtoph5 to accept a valid mini index (e.g., 00038) as a valid integer.
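If -S is in fact an optparse type="int" option (as the error message suggests), one way to meet the last expectation is to register a strictly base-10 option type, following optparse's documented pattern for adding new types. The "decimal" type name, check_decimal and DecimalOption are hypothetical names for this sketch. (With argparse, type=int already parses "00038" as 38.)

```python
from copy import copy
from optparse import Option, OptionParser, OptionValueError


def check_decimal(option, opt, value):
    """Parse an option value strictly as base 10, so "00038" -> 38."""
    try:
        return int(value, 10)
    except ValueError:
        raise OptionValueError(
            "option %s: invalid decimal integer value: %r" % (opt, value))


class DecimalOption(Option):
    # Register a new "decimal" type alongside optparse's built-in types.
    TYPES = Option.TYPES + ("decimal",)
    TYPE_CHECKER = copy(Option.TYPE_CHECKER)
    TYPE_CHECKER["decimal"] = check_decimal


parser = OptionParser(prog="segdtoph5", option_class=DecimalOption)
parser.add_option("-S", dest="first_mini", type="decimal")

opts, _ = parser.parse_args(["-S", "00038"])
print(opts.first_mini)  # 38
```

Zero-padded indices then mean the same thing everywhere, and anything that is not a base-10 integer is still rejected with an error.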

@hrotman-pic hrotman-pic added the bug label Apr 8, 2020
@dsentinel
Contributor

Let's look at making a more general merge tool for:
1. adding a service run
2. merging 2 different PH5s, i.e. 2 distinct metadata and data sets

@hrotman-pic
Collaborator Author

I wonder if there's something in ph5_merge_helper that would help with those items? I've never used it, and I'm uncertain how functional it was when it was used.

@dsentinel
Contributor

That's a good thought... I vaguely remember that process was also pretty manual, but I've never used it.

@damhuonglan
Contributor

I have researched segdtoph5, and this is how it currently works:

  • Option -S (FIRST_MINI) identifies the very first miniPH5_XXXXX.ph5 on the first run.
  • If the serial number is already in a mini file, data will be written to that file.
  • If the serial number isn't in any mini file, get the HIGHEST_MINI file and check the following:
    • If option -M (NUM_MINI) is used:
      • HIGHEST_MINI - (FIRST_MINI - 1) < NUM_MINI: file HIGHEST_MINI + 1 will be created.
      • HIGHEST_MINI - (FIRST_MINI - 1) >= NUM_MINI: data will be written to the SMALLEST_MINI file.
    • If option -M is NOT used:
      • If the total size of the data, including the new data, is greater than 100 GB, file HIGHEST_MINI + 1 will be created.
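The -M branch of these rules can be replayed in a few lines. This is a sketch with hypothetical function names, assuming one previously unseen DAS serial number per .fcnt file and not modelling file sizes. It also assumes, as an unconfirmed hypothesis, that -S 00040 reaches the code as 32 (optparse's type="int" reads a leading 0 as octal). The archive starts with minis 00001-00037:

```python
def choose_mini(serial, serial_to_mini, highest_mini, first_mini, num_mini):
    """Apply the -M branch of the rules to one DAS serial number.

    Returns ("existing", n), ("new", n) or ("smallest", None). File sizes are
    not modelled, so the smallest-mini branch carries no index.
    """
    if serial in serial_to_mini:
        # Serial already has a mini: its data goes there.
        return ("existing", serial_to_mini[serial])
    if highest_mini - (first_mini - 1) < num_mini:
        # Still under the NUM_MINI budget counted from FIRST_MINI.
        return ("new", highest_mini + 1)
    # Budget used up: data goes to the smallest existing mini.
    return ("smallest", None)


def new_minis_created(serials, highest_mini, first_mini, num_mini):
    """Replay unseen serials in order and return the indices of minis created."""
    serial_to_mini = {}
    created = []
    for serial in serials:
        kind, mini = choose_mini(serial, serial_to_mini, highest_mini,
                                 first_mini, num_mini)
        if kind == "new":
            highest_mini = mini
            serial_to_mini[serial] = mini
            created.append(mini)
    return created


serials = ["DAS%d" % i for i in range(9)]
print(new_minis_created(serials, 37, 32, 9))      # -M 9, 9 files: [38, 39, 40]
print(new_minis_created(serials[:8], 37, 32, 5))  # -M 5, 8 files: []
print(new_minis_created(serials[:5], 37, 32, 7))  # -M 7, 5 files: [38]
```

Under these assumptions the replay gives minis 00038-00040, no new minis, and a single 00038, which lines up with the new-mini counts reported for the Valles_3C (-M 9), ShaleHills (-M 5) and WSTF (-M 7) runs. It also shows that a new mini is always HIGHEST_MINI + 1, so once an archive exists -S cannot move the starting index past the existing minis.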

Looking at the comments in the file, it works as designed. There are no bugs in it.
@dsentinel @ascire-pic Please review the rules above and let me know if you want to change it to work differently.

@ascire-pic
Collaborator

I need to logic my way through all of it but overall it looks like it makes sense.

We might want to be able to override adding to an existing file if the serial number already exists. If you have a PH5 for 10 stations and then add more data for those stations after another service run, it will currently add to the existing mini files because the serial numbers already exist. However, if that original PH5 archive is already at the DMC, then you have to replace the entire archive because you just added more data to the already existing mini files.

@damhuonglan @dsentinel How complicated would it be to add a flag to override the serial number check so you could create new mini files even if a serial number already exists? I don't think it should be the default behavior, but make it an optional flag?
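A sketch of what such an opt-in override could look like, layered on the -M rule described earlier in the thread. The parameter name force_new_minis is hypothetical, not an existing segdtoph5 flag, and the default keeps the current behavior:

```python
def choose_mini_with_override(serial, serial_to_mini, highest_mini,
                              first_mini, num_mini, force_new_minis=False):
    """-M mini-selection rule with a hypothetical opt-in override.

    With force_new_minis=True the "serial already has a mini" check is skipped,
    so a known serial is routed like an unseen one.
    """
    if not force_new_minis and serial in serial_to_mini:
        return ("existing", serial_to_mini[serial])
    if highest_mini - (first_mini - 1) < num_mini:
        return ("new", highest_mini + 1)
    return ("smallest", None)


# Archive with minis 00001-00037; the service run should start at mini 38.
existing = {"12345": 12}
print(choose_mini_with_override("12345", existing, 37, 38, 5))
# ('existing', 12)
print(choose_mini_with_override("12345", existing, 37, 38, 5,
                                force_new_minis=True))
# ('new', 38)
```

With the override and -S set to the service run's first mini (read as decimal), the -M budget counts only the new minis. Once that budget is used, though, the rule as described still falls back to the smallest existing mini, which could be one already at the DMC, so that branch would need to be restricted as well.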

@hrotman-pic
Collaborator Author

May I ask, then, that the help for segdtoph5 be updated to reflect the logic being used and to make it clear that some PH5 logic will override user input? The current help makes it look like more behavior is under the user's control than is actually the case, and is misleading in its current form.

I'd like to note that I don't want the 'invalid integer value' error to be lost in the shuffle, especially since segdtoph5 will report that, for example, mini 00038 is not valid, and then, when a larger integer (e.g., 00040) is specified as the starting mini, segdtoph5 may create mini 00038. If a mini index is not valid, it should be invalid across the board. (I would have to look up the paths to the test archives and the segdtoph5 input that resulted in this error.)

@ascire-pic
Collaborator

ascire-pic commented Sep 9, 2020

I believe we should make similar changes to the other data ingestion commands:

  • mstoph5
  • 130toph5
  • seg2toph5
  • 125atoph5

All data ingestion commands should have similar options to the new functionality in segdtoph5 for appending to an existing PH5. 125atoph5 is probably lowest priority as the Texans are being phased out.

@ascire-pic ascire-pic changed the title [BUG] segdtoph5 command line not following user input [BUG] data ingestion commands - flags/user control for creating new mini files (orig seg2toph5 command line not following user input) Sep 9, 2020
@hrotman-pic hrotman-pic changed the title [BUG] data ingestion commands - flags/user control for creating new mini files (orig seg2toph5 command line not following user input) [BUG] data ingestion commands - flags/user control for creating new mini files (orig segdtoph5 command line not following user input) Sep 28, 2020
@damhuonglan damhuonglan mentioned this issue Oct 16, 2020
5 tasks
@ascire-pic ascire-pic added the GeoHDF Things to consider for GeoHDF label Nov 3, 2020
@ascire-pic ascire-pic removed the bug label May 18, 2021
4 participants