Before new files are generated to the DH data directory for use in Benchmark tests, existing files are checked to ensure duplicate data is not produced. This is a big time-saver at larger scales (e.g. 100,000,000), assuming a significant number of generator files already exist in the data directory.
- Change the file name convention so that the gen.def and gen.parquet files contain a hash of the generator definition (see the sketch after this list)
- Change the file list glob to use the hash
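
A minimal sketch of the hash-based naming idea, assuming illustrative helper names (`gen_def_hash`, `data_file_names`) and a truncated SHA-256 of the definition text; the project's actual naming scheme may differ:

```python
import hashlib


def gen_def_hash(gen_def_text: str) -> str:
    """Return a short, stable hash of the generator definition contents."""
    return hashlib.sha256(gen_def_text.encode("utf-8")).hexdigest()[:12]


def data_file_names(prefix: str, gen_def_text: str) -> tuple[str, str]:
    """Build gen.def and gen.parquet file names that embed the definition hash."""
    h = gen_def_hash(gen_def_text)
    return f"{prefix}.{h}.gen.def", f"{prefix}.{h}.gen.parquet"
```

Because the hash is derived only from the generator definition, regenerating files for an unchanged definition produces the same names, which is what makes reuse detection a simple name match.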
This story is not an immediate concern because of the expected high reuse of data generator files.
Updated Ids.uniqueName to allow a prefix to be provided. Used that to name files with a hash of the contents, and updated the Python "data file reuse detection" code to search on the hash with a glob.
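
A hedged sketch of what the glob-based reuse check could look like, self-contained and using hypothetical names (`find_reusable_files`, the `{prefix}.{hash}.gen.*` layout) rather than the project's actual code:

```python
import hashlib
from pathlib import Path


def find_reusable_files(data_dir: Path, prefix: str, gen_def_text: str) -> list[Path]:
    """Glob for existing generator files whose names embed the same definition hash."""
    h = hashlib.sha256(gen_def_text.encode("utf-8")).hexdigest()[:12]
    return sorted(data_dir.glob(f"{prefix}.{h}.gen.*"))


# Usage: skip generation when matching files already exist
# if find_reusable_files(Path("/data"), "orders", gen_def_text):
#     print("Reusing existing generator files; skipping data generation")
```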