FASTQ format - Principum

FASTQ is an extension of [[FASTA format]] that makes some significant changes in order to carry more data: it is designed to hold the per-base quality scores generated by sequencing machines. A FASTQ record is made up of 4 lines (see Figure 1), unlike the 2 lines of a [[FASTA format]] record.

![[fastq.png]]
_Figure 1: An example of the FASTQ format. Taken from CompGenomR_

The header/identifier line now starts with `@` instead of FASTA's `>`. Lines beginning with a semi-colon `;`, which mark comments in some FASTA files, are not part of the FASTQ format.

1. Line 1, the identifier, begins with `@` and holds the unique name of the sequence, followed by optional descriptors. This is typically data given by the sequencer (machine ID, flow cell ID, lane number).
2. Line 2 is the sequence, much like FASTA.
3. Line 3 begins with a `+` symbol. This marks the end of the sequence data and may optionally repeat the identifier from Line 1.
4. Line 4 encodes the quality value for each base of the sequence, one character per base, so it must be the same length as Line 2.

Although any quality system could be used in Line 4, by convention it is the _Phred quality score_. A Phred score $Q$ represents the probability $P$ that a base call is incorrect, with $Q = -10 \log_{10} P$: the higher the score, the less likely the base is to be wrong (a score of 20 means a 1 in 100 chance of error, a score of 30 means 1 in 1000). Each symbol in Line 4 is a single ASCII character standing for one score. Most current files store the score as the character's ASCII code minus 33, and any software you use should be able to report the score behind each symbol.
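
To make the four-line layout and the symbol-to-score mapping concrete, here is a minimal Python sketch. It assumes strict four-line records (no wrapped sequences) and the Phred+33 offset used by Sanger and recent Illumina files; some older Illumina files use an offset of 64 instead. The names `FastqRecord`, `parse_fastq`, `phred_scores` and `error_probability`, as well as the example read, are hypothetical and only for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    identifier: str  # Line 1 without the leading '@'
    sequence: str    # Line 2
    quality: str     # Line 4, one character per base


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records, assuming exactly four lines per record."""
    # Drop trailing newlines and skip blank lines between records.
    rows = [line.rstrip("\r\n") for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of 4 lines")
    for i in range(0, len(rows), 4):
        header, seq, plus, qual = rows[i:i + 4]
        if not header.startswith("@"):
            raise ValueError(f"expected '@' at start of header: {header!r}")
        if not plus.startswith("+"):
            raise ValueError(f"expected '+' separator line: {plus!r}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], seq, qual)


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode each quality character as ASCII code minus the offset.

    offset=33 is an assumption (Phred+33); pass 64 for old Illumina files.
    """
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(P) to get the probability of a wrong call."""
    return 10 ** (-q / 10)


# Hypothetical single-record example.
EXAMPLE = """@read1 hypothetical description
GATTACA
+
II5+!?I
"""

record = next(parse_fastq(EXAMPLE.splitlines()))
scores = phred_scores(record.quality)
print(record.identifier)  # read1 hypothetical description
print(scores)             # [40, 40, 20, 10, 0, 30, 40]
```

For real data, an established parser such as Biopython's `SeqIO` is a safer choice than a hand-rolled one, since it handles the different quality encodings explicitly.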