---
Name: FastQC
URL: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Description: >
  FastQC is a quality control tool for high throughput sequence data,
  written by Simon Andrews at the Babraham Institute in Cambridge.
---

The FastQC module parses results generated by
[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/),
a quality control tool for high throughput sequence data written
by Simon Andrews at the Babraham Institute.

FastQC generates an HTML report, which is what most people use when
they run the program. However, it also generates a file called
`fastqc_data.txt`, which is relatively easy to parse.
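As an illustration of why `fastqc_data.txt` is easy to parse, here is a minimal Python sketch. It assumes the usual layout: each analysis module opens with a `>>Module Name<TAB>status` line, closes with `>>END_MODULE`, and uses `#`-prefixed lines for column headers. The function `parse_fastqc_data` is hypothetical and is not MultiQC's internal parser.

```python
from typing import Dict


def parse_fastqc_data(text: str) -> Dict[str, dict]:
    """Split fastqc_data.txt content into modules (hypothetical helper).

    Assumes modules start with '>>Name<TAB>status' and end with '>>END_MODULE'.
    """
    modules: Dict[str, dict] = {}
    current = None
    for line in text.splitlines():
        if line.startswith(">>END_MODULE"):
            current = None
        elif line.startswith(">>"):
            # Module header, e.g. '>>Basic Statistics\tpass'
            name, _, status = line[2:].partition("\t")
            current = {"status": status, "header": [], "rows": []}
            modules[name] = current
        elif current is None or not line.strip():
            # Skip the '##FastQC' version line and blank lines outside modules
            continue
        elif line.startswith("#"):
            # Column header line, e.g. '#Base\tMean'; a later '#' line replaces an earlier one
            current["header"] = line[1:].split("\t")
        else:
            current["rows"].append(line.split("\t"))
    return modules


example = (
    "##FastQC\t0.11.9\n"
    ">>Basic Statistics\tpass\n"
    "#Measure\tValue\n"
    "Filename\tmysample.fastq.gz\n"
    ">>END_MODULE\n"
)
print(parse_fastqc_data(example)["Basic Statistics"]["status"])  # pass
```

Keeping each module's rows as raw string lists leaves type conversion to whichever plot or table consumes them.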

A typical run will produce the following files:

```txt
mysample_fastqc.html
mysample_fastqc/
  Icons/
  Images/
  fastqc.fo
  fastqc_data.txt
  fastqc_report.html
  summary.txt
```

Sometimes the directory is zipped, leaving just `mysample_fastqc.zip`.

The FastQC MultiQC module looks for files called `fastqc_data.txt`
or ending in `_fastqc.zip`. If zip files are found, they are read
in memory and the `fastqc_data.txt` file inside them is parsed.
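Reading a zip "in memory" means the archive member is read directly without being extracted to disk. A minimal sketch using Python's standard `zipfile` module follows. It assumes the data file sits at `<name>_fastqc/fastqc_data.txt` inside the archive, and the helper name `read_fastqc_data_from_zip` is hypothetical, not MultiQC's code.

```python
import io
import zipfile
from typing import BinaryIO, Optional, Union


def read_fastqc_data_from_zip(source: Union[str, BinaryIO]) -> Optional[str]:
    """Return the text of fastqc_data.txt inside a FastQC zip, or None (hypothetical helper)."""
    with zipfile.ZipFile(source) as zf:
        for member in zf.namelist():
            # Members are usually nested, e.g. 'mysample_fastqc/fastqc_data.txt'
            if member == "fastqc_data.txt" or member.endswith("/fastqc_data.txt"):
                # zf.read() returns the decompressed bytes without touching disk
                return zf.read(member).decode("utf-8")
    return None


# Usage: build a tiny archive in memory and read it back
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mysample_fastqc/fastqc_data.txt", "##FastQC\t0.11.9\n")
print(read_fastqc_data_from_zip(buf))
```

`zipfile.ZipFile` accepts either a path or a binary file object, so the same helper works for files on disk and for in-memory buffers.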

> **Note:** The directory and zip file are often both present. To speed
> up MultiQC execution, zip files will be skipped if the file name suggests
> that they will share a sample name with data that has already been parsed.
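The skipping heuristic can be pictured with the toy sketch below: derive a likely sample name from each zip's file name and skip the zip if that name has already been parsed. The `_fastqc.zip` suffix stripping and the helper `zips_to_parse` are assumptions for illustration only; MultiQC's real check may clean names differently.

```python
import os
from typing import Iterable, List, Set

SUFFIX = "_fastqc.zip"


def zips_to_parse(zip_paths: Iterable[str], parsed_samples: Set[str]) -> List[str]:
    """Hypothetical: keep only zips whose name-derived sample is not yet parsed."""
    keep = []
    for path in zip_paths:
        name = os.path.basename(path)
        # 'mysample_fastqc.zip' -> 'mysample' (assumed naming convention)
        guess = name[: -len(SUFFIX)] if name.endswith(SUFFIX) else name
        if guess in parsed_samples:
            continue  # probably the same data as an already-parsed directory
        keep.append(path)
    return keep


print(zips_to_parse(["run1/a_fastqc.zip", "run1/b_fastqc.zip"], {"a"}))
```

Checking names before opening archives avoids paying the decompression cost for data that would be discarded as a duplicate anyway.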

You can customise the patterns used for finding these files in your
MultiQC config (see [Module search patterns](#module-search-patterns)).
The default file patterns are:

```yaml
sp:
  fastqc/data:
    fn: "fastqc_data.txt"
  fastqc/zip:
    fn: "*_fastqc.zip"
```
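To get a feel for which files these `fn` globs select, the sketch below applies them to base file names with Python's `fnmatch`. Treating `fn` as a shell-style glob on the file name is an assumption for illustration; the `SEARCH_PATTERNS` dict and `match_search_pattern` helper are hypothetical and do not reproduce MultiQC's file search.

```python
import fnmatch
import os
from typing import Dict, Optional

# Default patterns copied from the YAML above (illustrative mapping)
SEARCH_PATTERNS: Dict[str, str] = {
    "fastqc/data": "fastqc_data.txt",
    "fastqc/zip": "*_fastqc.zip",
}


def match_search_pattern(path: str) -> Optional[str]:
    """Return the first search-pattern key whose `fn` glob matches the file name."""
    name = os.path.basename(path)
    for key, pattern in SEARCH_PATTERNS.items():
        # fnmatchcase gives the same case-sensitive result on every OS
        if fnmatch.fnmatchcase(name, pattern):
            return key
    return None


for p in ["out/mysample_fastqc/fastqc_data.txt", "out/mysample_fastqc.zip", "out/mysample_fastqc.html"]:
    print(p, "->", match_search_pattern(p))
```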

> **Note:** Sample names are discovered by parsing the line beginning
> `Filename` in `fastqc_data.txt`, _not_ based on the FastQC report names.
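A minimal sketch of reading that `Filename` line, assuming it has the form `Filename<TAB>value` inside the Basic Statistics module. The helper name is hypothetical. It returns the raw value (for example `mysample.fastq.gz`); MultiQC is expected to clean this further according to its sample-name settings, which this sketch does not reproduce.

```python
from typing import Optional


def sample_name_from_fastqc_data(text: str) -> Optional[str]:
    """Return the raw value of the first 'Filename' line, or None (hypothetical helper)."""
    for line in text.splitlines():
        if line.startswith("Filename\t"):
            return line.split("\t", 1)[1].strip()
    return None


print(sample_name_from_fastqc_data("#Measure\tValue\nFilename\tmysample.fastq.gz\n"))  # mysample.fastq.gz
```

Because the name comes from the file contents, renaming the report directory or zip does not change the sample name shown in MultiQC.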

### Theoretical GC Content

It is possible to plot a dashed line showing the theoretical GC content for a
reference genome. MultiQC comes with genome and transcriptome guides for human
and mouse. You can use these in your reports by adding the following MultiQC
config keys (see [Configuring MultiQC](http://multiqc.info/docs/#configuring-multiqc)):

```yaml
fastqc_config:
  fastqc_theoretical_gc: "hg38_genome"
```

Only one theoretical distribution can be plotted.
The following guides are available: _(txome = transcriptome)_

- `hg38_genome`
- `hg38_txome`
- `mm10_genome`
- `mm10_txome`

Alternatively, a custom theoretical guide can be used in reports. To do this,
create a file with `fastqc_theoretical_gc` in the filename and place it alongside your
analysis files. It should be tab-delimited with the following format (column 1 = %GC,
column 2 = % of genome):

```txt
# FastQC theoretical GC content curve: YOUR REFERENCE NAME
0	0.005311768
1	0.004108502
2	0.004060371
3	0.005066476
[...]
```
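The format above can be read with a few lines of Python. This sketch assumes `#` lines are comments and that the reference name follows the colon in the header line, as in the example. The helper `read_theoretical_gc` is hypothetical and not part of MultiQC.

```python
from typing import List, Tuple


def read_theoretical_gc(text: str) -> Tuple[str, List[Tuple[float, float]]]:
    """Parse '# header' lines plus '%GC<TAB>% of genome' rows (hypothetical helper)."""
    name = ""
    points: List[Tuple[float, float]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # e.g. '# FastQC theoretical GC content curve: YOUR REFERENCE NAME'
            if ":" in line:
                name = line.split(":", 1)[1].strip()
            continue
        gc, frac = line.split("\t")
        points.append((float(gc), float(frac)))
    return name, points


name, points = read_theoretical_gc(
    "# FastQC theoretical GC content curve: My Reference\n0\t0.005311768\n1\t0.004108502\n"
)
print(name, points)
```

A quick round trip like this is a cheap way to check that a custom file is genuinely tab-delimited before pointing MultiQC at it.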

You can generate these files using an R package called
[fastqcTheoreticalGC](https://github.com/mikelove/fastqcTheoreticalGC)
written by [Mike Love](https://github.com/mikelove).
Please see the [package readme](https://github.com/mikelove/fastqcTheoreticalGC)
for more details.

MultiQC searches for result files from this package using the following default
pattern, which can be customised as described above:

```yaml
sp:
  fastqc/theoretical_gc:
    fn: "*fastqc_theoretical_gc*"
```

If you want to always use a specific custom file for MultiQC reports without having to
add it to the analysis directory, add the full file path to the same MultiQC config
variable described above:

```yaml
fastqc_config:
  fastqc_theoretical_gc: "/path/to/your/custom_fastqc_theoretical_gc.txt"
```
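The `fastqc_theoretical_gc` key therefore accepts either one of the built-in guide names or a file path. The sketch below shows one way such a value could be classified. The helper `resolve_theoretical_gc` and its error handling are assumptions for illustration, not MultiQC's actual logic.

```python
import os
from typing import Tuple

# Built-in guide names listed in this documentation
BUILTIN_GUIDES = {"hg38_genome", "hg38_txome", "mm10_genome", "mm10_txome"}


def resolve_theoretical_gc(value: str) -> Tuple[str, str]:
    """Classify a config value as ('builtin', name) or ('file', absolute path).

    Hypothetical helper: raises ValueError if the value is neither.
    """
    if value in BUILTIN_GUIDES:
        return "builtin", value
    if os.path.isfile(value):
        return "file", os.path.abspath(value)
    raise ValueError(f"Unknown theoretical GC guide or missing file: {value!r}")


print(resolve_theoretical_gc("hg38_genome"))
```

Checking built-in names first means a stray local file called `hg38_genome` would not shadow the bundled guide.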

### Changing the order of sections

You can customise the order in which the different module sections appear
in the report.
See [the docs](https://multiqc.info/docs/#order-of-module-and-module-subsection-output) for more information.

For example, to show the _Status Checks_ section at the top, use the following config:

```yaml
report_section_order:
  fastqc_status_checks:
    order: -1000
```
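The config above implies that a lower `order` value places a section earlier within the module. The toy model below illustrates that reading: default sections receive evenly spaced values and overrides replace them. The spacing of 10, the helper `order_sections`, and the IDs other than `fastqc_status_checks` are assumptions for illustration and are not taken from MultiQC's source.

```python
from typing import Dict, List


def order_sections(default_ids: List[str], overrides: Dict[str, int]) -> List[str]:
    """Sort section IDs by order value, lowest first (hypothetical model).

    Defaults are 10, 20, 30, ... in the original order; overrides mimic
    `report_section_order` `order:` values.
    """
    order = {sid: (i + 1) * 10 for i, sid in enumerate(default_ids)}
    order.update({k: v for k, v in overrides.items() if k in order})
    # sorted() is stable, so ties keep their original relative order
    return sorted(default_ids, key=lambda sid: order[sid])


print(order_sections(["section_a", "section_b", "fastqc_status_checks"], {"fastqc_status_checks": -1000}))
# ['fastqc_status_checks', 'section_a', 'section_b']
```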
