Frequently Asked Questions
1) as an internal quality control: after cycle 25, the instrument's Real Time Analysis software (RTA) aligns all reads to the PhiX genome and calculates the alignment rate and error rate in real time. This tells us how the instrument is performing during the run and helps with troubleshooting if anything goes wrong. At the USC Genome Core, we routinely spike in around 1-2% of PhiX in all lanes.
2) as a way to balance the base composition: Illumina's technology requires the base composition of the library being sequenced to be balanced; that is, at every cycle, the proportion of each base (A, C, G, T) should be ~25%. This is even more critical during the first 5 cycles of sequencing, which RTA uses to locate the clusters in the lane. RTA can tolerate some deviation from these ideal proportions, but heavily biased libraries will have fewer clusters passing the filter and reduced overall quality. In such cases, to balance the base composition of the lane and avoid these issues, we need to spike in some percentage of PhiX. The amount of PhiX depends on the type of library, and it can be up to 15-20%. Examples of unbalanced libraries are: ddRAD, amplicon mixes, libraries with a common sequence after the Illumina adapters, and libraries with custom barcodes at either the 5' or 3' end. If your library is unbalanced, please let the Core personnel know when you submit it, so that we can spike in the right amount of PhiX to balance the lane.
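To get a rough idea of whether a library is balanced before submitting it, you can tabulate the base composition per cycle from a handful of reads. The sketch below is only an illustration: the function names, the example reads, and the 0.15 tolerance are assumptions made for this example, and they do not reflect the thresholds that RTA applies internally.

```python
from collections import Counter

BASES = "ACGT"

def per_cycle_base_fractions(reads, n_cycles):
    """Return one dict per cycle with the fraction of A, C, G and T at that position."""
    fractions = []
    for cycle in range(n_cycles):
        # Count the base seen at this cycle in every read that is long enough.
        counts = Counter(read[cycle] for read in reads if len(read) > cycle)
        total = sum(counts[b] for b in BASES)
        fractions.append({b: (counts[b] / total if total else 0.0) for b in BASES})
    return fractions

def unbalanced_cycles(fractions, tolerance=0.15):
    """Return the 1-based cycles where any base deviates from 25% by more than
    `tolerance` (an arbitrary illustrative cutoff, not an RTA value)."""
    return [
        i + 1
        for i, f in enumerate(fractions)
        if any(abs(f[b] - 0.25) > tolerance for b in BASES)
    ]

# Hypothetical reads that all start with the same two bases ("AC"),
# as would happen with a common sequence right after the adapter.
reads = ["ACAC", "ACCG", "ACGT", "ACTA"]
fractions = per_cycle_base_fractions(reads, 4)
flagged = unbalanced_cycles(fractions)
print(flagged)
```

In this toy example the first two cycles are flagged because every read carries the same base there, while cycles 3 and 4 contain each base exactly once and pass.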
You can find more information here.
The base composition, if biased, can be compensated for by adding PhiX (see "What is PhiX and how much do you spike-in?" above for details).

The fragment size should ideally be between 100 and 600 bp, depending on the run type, though shorter and longer fragments can also be sequenced with some adjustments. However, libraries with an average fragment size of 1 Kb or longer might not cluster efficiently in the flowcell, and in such cases we typically recommend repeating the library preparation whenever possible.

The run format is determined by your needs, so there is not much to adjust here; that said, short SR runs can usually tolerate higher cluster densities (meaning more reads per lane) without a loss of quality.

The loading concentration is how much of the library we put into the flowcell; it is measured as a molar concentration, and the optimal range to target depends on all the other factors just mentioned (i.e. base composition, size, run type). To calculate the loading concentration, we need to estimate an average fragment size, and the broader the size distribution of the library, the harder it is to estimate a meaningful average. In addition, small fragments tend to cluster more efficiently in the flowcell, so the relationship between fragment size and clustering efficiency is not linear, which makes the estimate even less reliable for broadly distributed libraries.
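To see why the average fragment size matters so much, here is a minimal sketch of the usual conversion from mass concentration to molarity for double-stranded DNA. It assumes the commonly used approximation of 660 g/mol per base pair; the function name and the example values are hypothetical, and this is not necessarily the exact calculation the Core performs.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate mass of one base pair of dsDNA

def library_molarity_nM(conc_ng_per_ul, avg_fragment_bp):
    """Convert a dsDNA concentration in ng/uL to nM for a given average size in bp."""
    if avg_fragment_bp <= 0:
        raise ValueError("average fragment size must be positive")
    # (ng/uL) / (g/mol per fragment) gives mol/L scaled by 1e-6, hence the 1e6 factor for nM.
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * avg_fragment_bp)

# Same measured concentration, three different size estimates (hypothetical values).
for size_bp in (300, 400, 500):
    print(size_bp, round(library_molarity_nM(2.0, size_bp), 2))
```

For the same 2 ng/uL reading, the computed molarity ranges from about 10.1 nM at 300 bp to about 6.1 nM at 500 bp, so an error in the size estimate translates directly into an error in the amount of library loaded.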
At the USC Genome Core, we perform a very thorough 3-step QC process for all libraries prior to sequencing. Following the results of this QC process, we calculate the optimal loading concentration for your library.
For the record, our best Rapid Run so far yielded a total of 215 million reads passing the filter, and our best High-Output run yielded 315 million reads passing the filter. Now, that is impressive!
However, on rare occasions we have failed runs, or runs where the data quality or the number of reads is less than ideal. In such cases, we start an internal investigation, typically in collaboration with the Illumina Technical Support team. If we find that the machine is at fault (e.g. due to a technical problem), we repeat the run at no extra cost. Similarly, if we determine that the problem was due to a mistake on our side (e.g. we miscalculated the loading concentration), we will also repeat the run at no cost. If, however, our investigation concludes that the issue was with the library, you will be responsible for the cost of the run. In such cases, we will work with you to figure out the likely cause and how to fix it. In our experience, more often than not, this happens because the user forgot to mention some important detail about the library (e.g. there is a custom unique sequence at the 5' end, or there is a custom index that is longer than the default 6 bp Illumina indexes). To avoid such issues, please include as much information as you can about the library in the submission form.