Feature: fill data gaps and bug fix trim start end times #84
Merged
…n value; process function 'merge_and_trim_start_end_times' now takes fill value in order to fill in gappy data; added test to test gap filling
…ntage length of allowable data gaps. defaults to 1 (100%). added test to cover this
…and trim function; trim function now takes actual expected start and end times rather than inferring from the data; bugfix: gap percentage not accurately determining total trace length
… which would create incomplete data
…ng an additional x% of data from data center because downloading the exact time bounds may lead to shorter waveforms after resampling or preprocessing. the extra downloaded data is trimmed away before being saved
…n to allow for direct preprocessing of streams without event information
This was referenced Feb 24, 2023
bch0w added a commit that referenced this pull request Jul 19, 2023
* Fix various bugs and adds features to code (#60) * renamed specfem station writing function, and added an ordering component * bumping version in setup and changing url and author * small mt typo change * added an example config that gathers a small amount of data for testing/dev purposes * added a new kwarg 'order_stations_list_by' which sets the order of the output stations list, related to #36 * bugfix typo parameter call * bugfix rotating was not actually rotating streams in place for arbitrary components ->ENZ * bugfix fixed rotation testing, which was not actually evaluating as expected. some confusion about editing streams and inventories in place which was not actually happening. Fixed and now rotate test works as expected * added 12Z test data for rotating 12Z -> ENZ which works now, added test to cover this case * moved 2012 central alaska IU gather script (which gathers 12Z component) into examples * bugfix pysep was not setting log level properly from the config file. now in the load() function, config log level is allowed to override logger level * changed config examples dir name to test, to be used as a test bed for checking specific data gathering features such as data gap removal * bugfix #57, reading event origin time from sac header was shifting times unexpectedly * added utility function to convert obspy catalog to event input list * Bugfix missing rotation metadata throwing TypeError (#63) * bugfix clipped amplitude check was not properly checking the data array * added error catch and release during rotate code to avoid processing breakdown. also rotate now goes by channel rather than the entire stream at once * rearranged rotation algorithm to be a bit cleaner, previously too many different streams floating around and being fed to one another. 
* pysep rotation now has additional checks for station metadata and error catching to get around any rotation bugs that used to cause the rotation to fail added new tests to cover rotation and fixed a rotation test that was failing due to change added a config for testing the rotation bug, providing station info * updating bug config file * feature mass download (#64) * fixed exclude string making and added a bool check for deleting tmpdir in mass download function * removed accidental debug statement * Remove llnl_db_client hard dependency (#65) * moved llnl_db_client as a hard dep of pysep and moved it into an optional dependency hid llnl_db_client import statement behind a try-except block and threw a check in the init of pysep only requiring this dep if the client choice is LLNL * Added setup.py legacy file * removed llnl dep from conda env yml file * tested install with and without llnl dep and tested check statement for missing llnl import * Feature: event declustering and source receiver weighting (#66) * started adding declustering rewrite with test and test data * finished cartesian gridding decluster script, added radial gridding decluster script, added tests to cover both and plotting * decluster catalog added feature to threshold events by min magnitude and data availability separate from declustering added plot feture to connect source receiver pairs based on data availability added feature to allow sorting by data availability * adjusting default figure names for declust plotting scripts * cleaned up plotting functions into functions to avoid repeat plotting calls, added removed events to plotting routines * small typo updte declust * moved some declustering functions into util * declust started srcrcv weighting function * individual source or receiver list weighting working with the smart scan feature next up is to implement the entire weighting scheme with normalization and based on data availability * declust slow progression * finished srcrcv 
weighting scheme * finished declustering and weighting algos with additional plotting and text file writing for weights * added basic test for srcrcv weight calc * clean up docstrings of pysep/recsec, bugfix pysep restrictions (#68) * reformatted PySEP init docstring, added missing parameters and categorized parameters to make things easier to find * removed hard requirement that user provides event depth and magnitude if event selection is default * added boolean flag to toggle insufficient length checker bugfix: added remove_clipped boolean toggle on the actual processing step, which was not there before * bugfix phase list passed from pysep into util function for phase getting * last minute touch ups on docstring * recsec removed unneeded myround function from top of script * categorized recsec init docstring for easier readability * bugfix: rotate parameter check function was setting rotate parameter as an empty list, but it was expected as NoneType within the main processing function. 
removed this type conversion * bugfix: log message failing when magnitude or depth set to NoneType * bugfix allow NoneType event magnitude and depth, ass ign dummy values to cap header because these are required by record section * Update docs (#69) * moved docstring of pysep into class description for autoapi * instantiated sphinx for doc creation and autoapi building, reorganized pysep and recsec docstrings to be well formatted for autoapi * migrated wiki docs into docs directory * fixed docstrings and corrected links for API references in docs * fixed up typos and cleaned up docs * Delete jekyll-gh-pages.yml * Bugfix: unable to set event_depth_km or event_magnitude as NoneType (#72) * allow taup arrival time get to be skipped if event depth not provided * sac header functions that required 'evdp' are now skipped over if 'evdp' is not present in the sac header * Update pysep.py bugfix: removing debug statement left in pysep * Feature recsec tmarks (#73) * replace plotw_rs with record section run command in pysep plotting, sets default sorting to 'distance' and not 'distance_r' for pysep-generated record section * working on ordering of multiple pages * added feature tmark to add static lines at given time values, does not address time shifts or move out * Bugfix: plotw rs sort (#76) * reformatting plotw_rs page separation logic * sort order of multi-page record sections is now reversed to be more natural * Update CHANGELOG.md * version bump v0.3.0 * Bugfix trace merge (#81) * added a try-except block over trace merging to avoid errors caused by incompatible sampling rates * re-ordered processing steps, resample now occurs before merge and start/endtime trims to avoid sampling rate inconsistency during merge process * added comment for merge command * Feature: fill data gaps and bug fix trim start end times (#84) * pysep new parameter which allows users to fill gappy data with a given value process function 'merge_and_trim_start_end_times' now takes fill value in 
order to fill in gappy data added test to test gap filling * added gap_fraction parameter which allows selecting the overall percentage length of allowable data gaps. defaults to 1 (100%). added test to cover this * added gap_fraction to pysep docstring and parameters * cleaned up docstrings for new parameters * split up processing function 'merge_and_trim...' into merge function and trim function trim function now takes actual expected start and end times rather than inferring from the data bugfix gap percentage not accurately determinig total trace length * bugfix merge gapped data not returning any data * remove debugger statement * added a check in trim function for late starttimes and early endtimes which would create incomplete data * changed default 'fill_data_gaps' value as False to be more intuitive * added new hidden variable 'extra_download_pct' which allows downloading an additional x% of data from data center because downloading the exact time bounds may lead to shorter waveforms after resampling or preprocessing. 
the extra downloaded data is trimmed away before being saved * fixing tests, trim start and end times ignored if no origin time given to allow for direct preprocessing of streams without event ifnormation * bugfix: location code hardcoded for bulk waveform request bugfix: debugger left in test suite * Bugfix: stations removed insufficient time (#88) * bugfix: seconds before and after ref were allowed to be integers which caused floating point rounding errors when trimming start and end times curtail remove for insufficient lenght now checks for streams with time less than the mode rather than no equal, to not kick out streams with sufficient data * check statement forces time buggers to be floats obspy trim function set nearest sample to False to ensure we are taking inner samples and getting same sampling rate * remove last data gap check from merge gapped data * allow user to fill boundary data gaps with parameter 'fill_data_gaps' which sidesteps rejection for late starttimes and early endtimes * update changelog * added another postprocessing check to remove masked data from array which has a flag toggle in PySEP to turn on or off, on by default * fixed merging tests * moved remove masked data check into end of preprocessing because it affects the rotation operations if not done prior to rotation. fixed tests * bugfix insufficient length check remove '=' which was causing all traces to be thrown out and not just ones with shorter lengths * bugfix-xticks * Bugfix: plotw_rs sort order and multi-page order (#90) * reverse sort on absolute sorting to get smallest values on top * absolute sorting default y-axis label now plots on the y-axis rather than inside the figure, co-existing with distance labels * fixing docstrings with new parameters * added option y_axis_abs_right for absolute scale plotting which places waveform text labels outside the right border of the figure, with the normal y axis showing distance or azimuth values. 
this is the default option for absolute sorting * bugfix azimuth bins now recognize multi-page azimuth sorting and do not show azimuth bins if no data are available at those azimiuths on a given page' * bugfix azimuth bin divider was not searching properly causing no bins to be plotted * typo fix: missing space in log message * bugfix bool check on fill value during trim was skipping over when fill was 0, even though that was a requested value * source reciver map plot plots event before stations to center on epicenter * fixed broken preprocessing test, minor break, still working as expected * improve source receiver map plotting (#91) * moved figure generation attributes into catalog plotting which comes first * srcrv map was not ever plotting global projection even though ortho was not useful for global scales now has an internal check for projection and also resizes for local scales also contains a subsetting feature to pick certain stations based on event id * Update CHANGELOG.md * improve TauP theoretical arrival usage (#94) * Update README.md Update readme with better description of PySEP * remove hardcode phase list for getting TauP arrival times * change phase_list default value NoneType -> ('ttall',) to match ObsPy default which gets all phase arrivals * REMOVED function 'get_taup_arrival_w_sac_headers' which was merged into 'get_taup_arrivals' to avoid redundant code. 'get_taup_arrivals' now either takes a Stream or Inventory + Event object" * changed the keys of the phase dict outputted by TauP arrival getting to be more uniform, and better warning messages when phase arrivals can not be found for a station. 
arrival time fetching now finds earliest arriving P and S wave, rather than just the P or S arrival * removed 'model' from SAC header TauP arrival time labels because SAC cuts off the values for long phase names (ie pdiff_ak135 -> pdiff_ak), which seems more confusing than its worht * allowed SAC header value 'a' to be earliest arriving phase regardless of P or S derived. P and S are still hardcoded later * removed s and p phase names from incident angle and takeoff angle names in SAC header because long phase names were cutting them off (8 char max), hardcoded to string that sorta looks like the words' * BUGFIX: SAC header incident angle was being incorrectly set to takeoff angle value, causing no takeoff angle name and incorrect incidient angle name to be attributed * added 'earliest arrival' to SAC DICT for use in record section * RecSec now accepts phase arrival times as an acceptable time shift value to align on specific phases * add log message about how time shift is applied in RecSec * flip tshift value for phase arrivals to align rather than space out * fixed failing test for sac header * related to #93, remove hard no-NoneType restriction on 'output_unit' and 'water_level' during pysep checks because 1. ObsPy allows NoneType water level and 2. there is already a check for valid output units earlier in the check function * Update CHANGELOG.md * add readthedocs yaml file to shift docs to RTD, and bumped docs version number and trimmed down doc environment to bare essentials (holdovers from previous docs builds not required) * General updates and bugfixes (#97) * added kwarg parameters and to let user determine if they want to show figures during pysep processing. defaults to false bugfix: pysep load was not retaining keyword arguments for command line usage of pysep * added version number to __init__ so that its accessible. 
bumped version number of 0.3.2 so that it is uniquely identifiable from the master version 0.3.1 * related to #95 display pysep version in debug log * playing around with aspect ratio on maps * changing default value of 'remove_clipped' parameter to True to be more restrictive by default, related to #80 * map plotting at local scales now auto scales based on the figure size, using obspy internal logic, which fixes the problem of having very narrow map plots or not showing the event, or not showing all the stations * Bugfix: RecSec abs sorting and y-tick scaling (#100) * fixed the polarity of waveforms for the yabsolute y-axis sorting by inverting waveforms when inverting the y-axis, #99 * added automatic y-axis tick mark scaling for absolute sorting based on total distance span shown on record section * bugfix ytick implementation was conflicting with absolute plotting, merged the two operations together * bugfix allow abs_backazimuth as an option for recsec sorting * Update documentation (#102) * changing recsec doc page to .rst to get relative linking working util.plot.set_plot_aesthetic, changed kwargs to args and added all options to docstring * bugfix addresses #101 allow RecSec 'save' to be None * added plotw_rs to init * Added source receiver map plotting to cookbook and added a link to it in the recsec documentation * convert pysep doc from md-> rst for better internal file linking * added general error catching to pysep plotting because these are non-critical tasks #71 (comment) * addresses #71 comment about maps not plotting when no depth or magnitude information provided. 
This was used in the title of the map, so added some if statements to avoid these quantities if they are nto rpesent * General bugfixing and updates (#105) * API change, parameters 'mindistance' -> 'mindistance_km', 'maxdistance' -> 'maxdistance_km' * related to #98 allow list of station ids as input updated docs to reflect station id parameter starting script to update example config files * formatting * moved default phase list value into init and out of hard set parameter since its a list (avoiding mutable input values) * update all example config files with updated API * bugfix recsec spines not showing by default and bugfix allow command line users ability to input spine flags which are input as strings * typo incorrect default inventory name in cookbook docs * overhaul file writing system plus general updates (#106) * #75 turn OFF azimuth bins when sorting by absolute (back)azimuth * typo fixing, adding 'event_depth_km' and 'event_magnitude' to pysep docstring * replaced 'stations_list' -> 'station_list' throughout package for consistentcy' * one more stations -> station * slightly modified how the plot and write functions check for validity of file selection, same outward behavior but internally a little cleaner * removed unncessary 'cha' aprameter in RTZ rotation updating rotation implementation * updated write system to be able to write out individual channel SAC files completed intermediate writing so that files are written as the program progresses * overwrite_event_tag new behavior, allow users to dump files directly into output directory with no event tags' * allow SAC sub directory bypass * added file logger to pysep * pysep added ability to write out specific components for SAC files added docs page for explaining how to ignore event tag and sac sub directory * fixing up docs page for output control * cleanup docs * bump version number for easier version tracking * bump version in toml and docs files * adjusting log behavior, allow turning off log 
with 'save_log' kwarg, and move created log to output directory after run has completed * #107 incorrect directory name for pysep doc page * #107 add overwrite catch for '-W' option * #109 allow log file creation to create the output directory if specified by the user but not already existing. Bump devel version number for debug checking * added a -v/--version option to pysep command line tool to check pysep version number * removing debugger statement * bugfix: multi-page record section was using the incorrect number of traces to traverse through the list causing index errors * bugfix: avoid silently skipping nonexistent pysep path or syn path in recsec initialization * bugfix: read_sem_cartesian was incorrectly defining azimuth based for two points in cartesian space * bugfix: implemented separate amplitude scaling arrays tfor data (st) and synthetics (st_syn) because originally all scaling was done w.r.t data array which would cause incorrect normalizations for per-trace normalizing * fixing broken tests due to indexing errors * bump dev version number * bugfix typo * bugfix: read sem cartesian azimuth calculation takes sign of coordinate into account * bugfix: checking bool of max amplitudes for recsec syn check * feature: kwarg 'title' allows user to overwrite default recsec title * recsec docstring typo fix type -> param * bugfix: subsetting routine in source receiver map plot had the ability to ignore repeat stations even if they came from different networks, due to exclusion based on station name only. Now it excludes based on a NN.SSS (network + station), which is assumed to be unique for all seismic stations * typo fix recsec docstring parameter: x_min -> x_max * add warning log message to alert User if Config file parameter is not explicitely used by PySEP to help debug argument mispelling or future API changes not registering. Related to #110
Addresses Issues #83 and #67 dealing with gappy data and waveform start and end times.
In the current version of PySEP, gapped data is not handled; any data with gaps is thrown out during preprocessing.
Additionally, waveforms are gathered from data centers directly on the requested bounds (+/- s around origin time), which may cause saved data to be shorter than expected due to time series being slightly truncated during rotation, resampling or other preprocessing steps.
This PR addresses these issues by:
- Adding parameters `fill_data_gaps` and `gap_fraction` to address data gaps
  - `fill_data_gaps` is False by default, which handles gapped data like the current code, by removing stations with gaps
  - `fill_data_gaps` can also be a series of values which tell PySEP to keep gapped data, and how to deal with data gaps (see: https://docs.obspy.org/packages/autogen/obspy.core.stream.Stream.merge.html and PySEP class docstring)
  - `gap_fraction` tells PySEP what allowable fraction of data can be data gaps. Defaults to 1 (100% of data can be gaps). This lets users allow short or long data gaps depending on their use.
- Adding parameter `extra_download_pct`, which handles waveform start and end time by adding a small time buffer around downloaded data (default 1%). This is then trimmed off during preprocessing, to help ensure that all Traces in the Stream have the same start and end time.

Under the hood:
- `merge_and_trim_start_end_times` has been split into `merge_gapped_data` (to handle merging) and `trim_start_end_times` (to handle data trimming)
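
For reviewers who want a feel for the two checks described above, here is a minimal, standard-library-only sketch. It is not the PySEP implementation: the function names `padded_window` and `gap_fraction_ok` are hypothetical, and it assumes that `extra_download_pct` pads each side of the request by that fraction of the total requested length, and that the gap fraction is measured against the expected window rather than the span of the data.

```python
from datetime import datetime, timedelta


def padded_window(origin, seconds_before, seconds_after,
                  extra_download_pct=0.01):
    """Return (start, end) of a download window widened by a small buffer.

    ASSUMPTION: the buffer on each side is `extra_download_pct` times the
    total requested length; the actual PySEP rule may differ.
    """
    start = origin - timedelta(seconds=seconds_before)
    end = origin + timedelta(seconds=seconds_after)
    pad = timedelta(
        seconds=(seconds_before + seconds_after) * extra_download_pct
    )
    return start - pad, end + pad


def gap_fraction_ok(segments, start, end, gap_fraction=1.0):
    """Check whether gaps take up an allowable fraction of the window.

    `segments` is a list of non-overlapping (seg_start, seg_end) datetime
    pairs describing where data exist. The total length comes from the
    expected `start` and `end`, not from the data themselves.
    """
    total = (end - start).total_seconds()
    covered = 0.0
    for seg_start, seg_end in segments:
        # Clip each segment to the expected window before counting it
        clipped_start = max(seg_start, start)
        clipped_end = min(seg_end, end)
        if clipped_end > clipped_start:
            covered += (clipped_end - clipped_start).total_seconds()
    return (total - covered) / total <= gap_fraction


origin = datetime(2023, 1, 1)
window_start = origin - timedelta(seconds=100)
window_end = origin + timedelta(seconds=300)

# Two data segments separated by a 50 s gap inside a 400 s window
segments = [
    (window_start, origin),
    (origin + timedelta(seconds=50), window_end),
]
download_start, download_end = padded_window(origin, 100, 300)
```

In this example the gap occupies 50 s of a 400 s window (12.5%), so the trace passes with `gap_fraction=0.2` and is rejected with `gap_fraction=0.1`. In ObsPy itself, the gap filling and trimming would typically go through `Stream.merge(fill_value=...)` and `Stream.trim(starttime, endtime)`.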