1) What types of data are supported?

The analysis pipeline is designed for Illumina paired-end libraries; other data types (e.g. color space) are not supported. Two files (Read1 and Read2 sequences) are required for each sample.
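As a sanity check before upload, the pairing of the two files can be verified locally. The following is a minimal sketch, not part of the pipeline: it assumes gzip-compressed FASTQ files with CASAVA 1.8+ style headers, where the read identifier is the text before the first space, and the function names are hypothetical.

```python
import gzip
from itertools import zip_longest


def read_ids(path):
    """Yield the read identifier of each record in a gzipped FASTQ file.

    Assumption: four lines per record, and the identifier is the header
    text between the leading '@' and the first space (CASAVA 1.8+ style).
    """
    with gzip.open(path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 0:
                yield line[1:].strip().split(" ", 1)[0]


def first_pair_mismatch(r1_path, r2_path):
    """Return the 1-based index of the first record that does not pair up.

    Returns None when both files hold the same identifiers in the same order.
    A missing record in the shorter file also counts as a mismatch.
    """
    pairs = zip_longest(read_ids(r1_path), read_ids(r2_path))
    for index, (id_1, id_2) in enumerate(pairs, start=1):
        if id_1 != id_2:
            return index
    return None
```

A return value of None suggests the two files are consistently paired; any other value points at the first record worth inspecting.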

2) What is the input format?

The supported input format is FASTQ sequence files compressed with gzip. This is the default format generated by current Illumina sequencers (e.g. by CASAVA 1.8+). The pipeline assumes that quality scores use the CASAVA 1.8+/Sanger encoding (see http://en.wikipedia.org/wiki/FASTQ_format for clarification and a list of format converters). The system will refuse to upload files that do not have the ".gz" extension (indicating gzip compression).
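The requirements above can be checked locally before upload. The sketch below is an illustration only and does not reproduce the checks performed by the upload system: the function name and the number of records inspected are hypothetical, and the test for the older Phred+64 quality encoding is a common heuristic (all quality characters at or above '@'), not a definitive rule.

```python
import gzip
import zlib


def check_fastq_gz(path, max_records=1000):
    """Return a list of problems found in the first records of a FASTQ file.

    An empty list means no problem was detected in the inspected records.
    """
    if not path.endswith(".gz"):
        return ["file name does not end with .gz"]
    problems = []
    quality_codes = set()
    records = 0
    try:
        with gzip.open(path, "rt", encoding="ascii") as handle:
            for _ in range(max_records):
                header = handle.readline().rstrip("\r\n")
                if not header:
                    break  # end of file
                seq = handle.readline().rstrip("\r\n")
                plus = handle.readline().rstrip("\r\n")
                qual = handle.readline().rstrip("\r\n")
                records += 1
                if not header.startswith("@") or not plus.startswith("+"):
                    problems.append(f"record {records}: not a four-line FASTQ record")
                    break
                if len(seq) != len(qual):
                    problems.append(f"record {records}: sequence and quality lengths differ")
                    break
                quality_codes.update(ord(char) for char in qual)
    except (OSError, EOFError, ValueError, zlib.error) as exc:
        return [f"cannot read file as gzip-compressed text: {exc}"]
    if records == 0:
        problems.append("no FASTQ records found")
    if quality_codes:
        if min(quality_codes) < 33:
            problems.append("quality characters below ASCII 33: not a valid Sanger encoding")
        elif min(quality_codes) >= 64:
            # Heuristic: real Phred+33 data almost always contains characters below '@'.
            problems.append("all quality characters >= '@': file may use the older Phred+64 encoding")
    return problems
```

Calling check_fastq_gz on each of the two files of a sample and printing the returned list is enough to spot the most common problems, such as an uncompressed file renamed to ".gz".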

3) My disease of interest is not in your list. How can I use the tool?

Our tool is specific to Mendelian disorders and our resources are limited, so please use it only with sequences from patients affected by a rare genetic disorder, or from their relatives. Association with one of the available diseases is mandatory, and it is not possible to enter a disease name that is not in the list. We adopted the MEDIC hierarchy developed for the CTD database (http://ctdbase.org/help/diseaseDetailHelp.jsp) but limited it to the child terms of MeSH ID D009358, "Congenital, Hereditary, and Neonatal Diseases and Abnormalities", with the aim of selecting only diseases with a proven genetic aetiology. Please contact us at exome-support@tigem.it if you want to suggest adding a disease that is not in the list, e.g. when its genetic cause has recently been published or is suspected but still unproven. We will evaluate each request case by case.

4) The diagnosis of my patient is ambiguous and I cannot choose a specific OMIM ID. What should I do?

Start typing a relevant keyword to search the list using the auto-completion feature, and choose the term that best describes the patient's phenotype. If the diagnosis is unclear or ambiguous, you can initially choose a more general term and later use the analysis results to change the disease association to a more specific term. If a causative mutation is found in the results and has been experimentally validated, the association can be confirmed by checking the "Confirm Disease Association" option.

5) After looking at the results, I realized my initial diagnosis was incorrect. Can I change it?

You can change the disease associated with each analysis by editing it in the Analysis archive. The corresponding samples will then be moved to the correct disease group, and the allele frequencies will reflect the change after the next refresh.

6) What is the meaning of the "Confirm Disease Association" option?

Please check this option only when the diagnosis has been proven by experimental validation. Once enough disease associations have been confirmed, we will use only those to calculate allele frequencies.

7) I have a group of isolated cases all affected by the same disease. How can I analyze them?

Each isolated case should be submitted as a separate analysis. In the allele frequency calculation, each case will contribute to the disease group it is assigned to.

8) How can I obtain updated allele frequencies in my results that include samples analyzed after my analysis?

We periodically regenerate all the variation reports, so that each one is updated to include the samples currently present in the database.

9) How is the database updated when the pipeline is updated? Are the allele frequencies in each analysis calculated using only the samples imported up to that point?

When a major change in the pipeline requires re-running all the analyses, we first re-run all of them to populate the variation database and only afterwards generate the variation reports. This way, every report benefits from the information in the whole database.

10) What is the procedure to delete my sequences?

Please send an email to exome-support@tigem.it with the name of the analysis you want to delete.

11) How long does it take to run the analysis?

The running time depends on the number of samples analyzed and on the number of reads produced for each sample. It typically ranges from 2-3 hours for a single, small sample to more than 12 hours for groups of larger samples.

12) How many variants are produced and what is their functional distribution?

The number of variants depends primarily on the target size and secondarily on the coverage depth; for a whole exome (~50 Mbases) there can be ~10,000 raw calls. On average, <5% are highly disruptive mutations (e.g. frameshift insertions/deletions, stopgain/stoploss SNVs, or variations affecting splice donor/acceptor sites), ~45% are amino acid substitutions and ~50% are synonymous SNVs.
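To make these proportions concrete, the short calculation below turns the approximate percentages into variant counts for a whole exome with 10,000 raw calls. The figures are the rough averages quoted in this answer, not measurements from a specific sample, and the value for highly disruptive mutations is an upper bound.

```python
# Approximate figures taken from the answer above (whole exome, ~50 Mbases).
raw_calls = 10_000

# Percentages per functional class; "highly disruptive" is an upper bound (<5%).
percent_by_class = {
    "highly disruptive (upper bound)": 5,
    "amino acid substitutions": 45,
    "synonymous SNVs": 50,
}

# Integer arithmetic avoids floating-point rounding in the counts.
expected_counts = {
    name: raw_calls * percent // 100 for name, percent in percent_by_class.items()
}

for name, count in expected_counts.items():
    print(f"{name}: ~{count} variants")
```

Under these assumptions, a typical whole exome yields at most about 500 highly disruptive candidates to review.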

13) What is the file size of the analysis output?

The alignment .bam files are the largest, usually ranging from <1 GB to >5 GB; the raw variation call .vcf files are ~20-30 MB, and the annotated variation table .xlsx file is ~10 MB or less.

For any other questions, please contact exome-support@tigem.it.