Download only specific files from a measurement #19

Open
KochTobi opened this issue Jul 9, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@KochTobi
Member

KochTobi commented Jul 9, 2024

Is your feature request related to a problem? Please describe.

Projects with many samples can produce a large amount of data in one measurement.
To analyse effects observed only in a subset of the generated data, users currently have to download the whole measurement, including all of its files. Many of these files are irrelevant to the problem being analysed, for example data corruption or data quality issues.
Such issues could occur during multiplexed runs, where only a subset of the output is of interest.

Describe the solution you'd like
Because the files of interest are known beforehand, a filter on part of the file name would make it possible to download only those files.

Describe alternatives you've considered
@qbicStefanC any ideas?

Additional context

---
title: Example
---
flowchart LR
   s1("sample 1") --> sequencer 
   s2("sample 2") --> sequencer 
   s3("sample 3") --> sequencer 
   sequencer --> multi_out("multiplexed output (BCL)")
   multi_out --> demultiplex("de-multiplexing")
   demultiplex --> o11("2020-01-01_sample_1_L001.fastq.gz")
   demultiplex --> o12("2020-01-01_sample_1_L002.fastq.gz")
   demultiplex -- "error" --> o13("2020-01-01_sample_1_L003.fastq.gz")
   demultiplex --> o21("2020-01-01_sample_2_L001.fastq.gz")
   demultiplex --> o22("2020-01-01_sample_2_L002.fastq.gz")
   demultiplex --error--> o23("2020-01-01_sample_2_L003.fastq.gz")
   demultiplex --> o31("2020-01-01_sample_3_L001.fastq.gz")
   demultiplex --> o32("2020-01-01_sample_3_L002.fastq.gz")
   demultiplex --error--> o33("2020-01-01_sample_3_L003.fastq.gz")
@KochTobi KochTobi added the enhancement New feature or request label Jul 9, 2024
@qbicStefanC

Most of it looks good. I am not 100% sure about the 'multiplexed runs' part, though: what is meant by this? As far as I know, a multiplexed BCL file from, say, a MiSeq is usually demultiplexed into FASTQ files (one to many files per sample barcode). Demultiplexing is thus the step from BCL to FASTQ.

@qbicStefanC

The example would look more like this:
Screenshot 2024-07-09 at 11 00 55

This is also simplified, but the key point is that lane 003 is affected across samples in this example. Let's assume all lane 003 files might be corrupted and should be investigated. A file name search with a regex for "L003" would then help.
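
As a minimal sketch of that idea (Python, not the downloader's actual code), a regex search over the file names from the example diagram could select the lane 003 files across all samples. The function name `select_files` and the hard-coded file list are assumptions made for illustration only.

```python
import re


def select_files(file_names, pattern):
    """Return the file names in which the regex pattern matches anywhere."""
    regex = re.compile(pattern)
    return [name for name in file_names if regex.search(name)]


# File names as shown in the example diagram of this issue.
files = [
    f"2020-01-01_sample_{sample}_L00{lane}.fastq.gz"
    for sample in (1, 2, 3)
    for lane in (1, 2, 3)
]

# Anchoring on "_L003" plus the extension avoids accidental matches
# in other parts of a file name.
lane3 = select_files(files, r"_L003\.fastq\.gz$")
print(lane3)
```

A plain `"L003"` pattern would select the same files here; the stricter pattern only matters once file names become less regular.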

@sven1103 sven1103 moved this to In Progress in Issue triage Aug 19, 2024
@sven1103
Contributor

To me it looks like the download API could make use of query parameters, e.g. filterType and filter.

I suggest:

filterType:

  • fileName
  • size
  • date

filter:

  • regex
  • timestamp
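
A rough sketch of how the two suggested query parameters might fit together, written in Python purely for illustration. The base URL, the helper names, and the shape of the file metadata are all hypothetical; the thread does not define them. The meaning of a `date` filter (files modified at or after the given ISO 8601 timestamp) is likewise an assumption, and `size` is left out because no value format has been proposed for it yet.

```python
import re
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; the real download URL is not given in this issue.
BASE_URL = "https://example.org/measurements/MEASUREMENT_ID/download"


def build_download_url(filter_type, filter_value):
    """Append filterType and filter as URL-encoded query parameters."""
    query = urlencode({"filterType": filter_type, "filter": filter_value})
    return f"{BASE_URL}?{query}"


def matches(file_info, filter_type, filter_value):
    """Decide whether one file passes the filter.

    file_info is an assumed dict with 'name' (str) and
    'modified' (timezone-aware datetime).
    """
    if filter_type == "fileName":
        return re.search(filter_value, file_info["name"]) is not None
    if filter_type == "date":
        # Assumption: keep files modified at or after the timestamp.
        return file_info["modified"] >= datetime.fromisoformat(filter_value)
    raise ValueError(f"unsupported filterType: {filter_type}")


# Client side: build the request URL.
url = build_download_url("fileName", "L003")

# Server side: read the parameters back and apply them.
params = parse_qs(urlsplit(url).query)
filter_type = params["filterType"][0]
filter_value = params["filter"][0]

files = [
    {"name": "2020-01-01_sample_1_L001.fastq.gz",
     "modified": datetime(2020, 1, 1, tzinfo=timezone.utc)},
    {"name": "2020-01-01_sample_1_L003.fastq.gz",
     "modified": datetime(2020, 1, 2, tzinfo=timezone.utc)},
]
selected = [f["name"] for f in files if matches(f, filter_type, filter_value)]
print(url)
print(selected)
```

Keeping the filter type and the filter value as separate parameters means new filter types could be added later without changing the URL scheme.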
