Example Usage
The following example demonstrates how to run ArtiCull.
Example Files
example/example.maf: Mutation Annotation Format (MAF) file containing candidate variants
example/example.bam: BAM file with sequencing data
example/example.bam.bai: Index file for the BAM
Instructions
Follow setup and activate conda environment
Follow the directions in Installation to create a conda environment and download required genomic tracks.
conda env create -f requirements.yml -n articull-env
conda activate articull-env
bash scripts/setup_mappability_track.bash [output_directory]
If you have already done this, ensure that the articull-env environment is activated.
conda activate articull-env
Run classification
This step first extracts model features and saves them to example/example_features.tsv. It then runs the pretrained model in the models/preprint_model/ directory to classify the variants and saves the results to example/example_result.tsv.
maf=example/example.maf
bam=example/example.bam
output_prefix=example/example
model_dir=models/preprint_model/
python -m articull classify $maf $output_prefix $model_dir $bam --cores 1
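For scripted runs, the same invocation can be assembled from Python. The sketch below is only an illustration: `build_classify_command` and `run_classify` are hypothetical helpers, not part of ArtiCull, and they assume the positional order (MAF, output prefix, model directory, BAM) and the `--cores` option exactly as they appear in the command above.

```python
import subprocess
import sys
from typing import List


def build_classify_command(maf: str, output_prefix: str, model_dir: str,
                           bam: str, cores: int = 1) -> List[str]:
    """Assemble the argument list for `python -m articull classify`.

    Hypothetical helper: it mirrors the shell command shown in this
    example and makes no claim about other options ArtiCull may accept.
    """
    return [
        sys.executable, "-m", "articull", "classify",
        maf, output_prefix, model_dir, bam,
        "--cores", str(cores),
    ]


def run_classify(maf: str, output_prefix: str, model_dir: str,
                 bam: str, cores: int = 1) -> None:
    """Run classification; raises CalledProcessError on a non-zero exit."""
    cmd = build_classify_command(maf, output_prefix, model_dir, bam, cores)
    subprocess.run(cmd, check=True)
```

Calling `run_classify("example/example.maf", "example/example", "models/preprint_model/", "example/example.bam")` from inside the activated articull-env environment should be equivalent to the shell command. Using `sys.executable` keeps the call on the interpreter of the active environment.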
Output
The output file is written to <output_prefix>_result.tsv (here, example/example_result.tsv). It is a tab-separated values (TSV) file with the following columns:
| Column | Description |
|---|---|
| chrm | Chromosome where the variant is located. |
| pos | Position of the variant on the chromosome. |
| ref_allele | Reference allele at the given position. |
| alt_allele | Alternate allele identified at the given position. |
| result | Classification of the variant (ARTIFACT, PASS, or SKIP). |
| prob_artifact | Probability that the variant is an artifact (only provided for classified variants). |
Example output:
| chrm | pos | ref_allele | alt_allele | result | prob_artifact |
|---|---|---|---|---|---|
| 1 | 201206823 | G | A | ARTIFACT | 0.9729175263160109 |
| 1 | 203994039 | C | A | PASS | 0.01078797299659806 |
| 1 | 201226655 | T | C | SKIP | |
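To work with the result file downstream, it can be read with Python's standard `csv` module. This is a minimal sketch, not part of ArtiCull. It assumes the file has a header row with the column names listed above and that SKIP rows leave `prob_artifact` empty, as in the example output. The function names are hypothetical.

```python
import csv
from collections import Counter


def read_results(handle):
    """Parse a result TSV into a list of dicts, one per variant."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        prob = row.get("prob_artifact")
        # Assumption: SKIP rows have an empty (or missing) probability field;
        # represent that as None rather than a number.
        row["prob_artifact"] = float(prob) if prob else None
        row["pos"] = int(row["pos"])
        rows.append(row)
    return rows


def summarize(rows):
    """Count variants per classification (ARTIFACT, PASS, SKIP)."""
    return Counter(row["result"] for row in rows)


def passing_variants(rows):
    """Keep only the variants classified as PASS."""
    return [row for row in rows if row["result"] == "PASS"]
```

For the example run, this would be used as `with open("example/example_result.tsv", newline="") as fh: rows = read_results(fh)`.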
Classification criteria:
Variants are classified as ARTIFACT if prob_artifact > 0.5.
Variants are classified as PASS if prob_artifact ≤ 0.5.
Variants are classified as SKIP if no supporting variant reads are found in the BAM file (e.g., due to realignment during variant calling).
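The criteria above can be restated as a small function. This is a sketch for illustration, not ArtiCull's implementation: `label_variant` is a hypothetical name, `None` is this sketch's stand-in for "no supporting variant reads found", and the `threshold` parameter is an assumption for experimenting with other cutoffs rather than a documented ArtiCull option.

```python
from typing import Optional


def label_variant(prob_artifact: Optional[float],
                  threshold: float = 0.5) -> str:
    """Map an artifact probability to ARTIFACT, PASS, or SKIP."""
    if prob_artifact is None:
        # No probability available: the variant was not classified.
        return "SKIP"
    # Strictly greater than the threshold, matching the rule above.
    return "ARTIFACT" if prob_artifact > threshold else "PASS"
```

Applied to the example output, the first variant (0.9729...) maps to ARTIFACT and the second (0.0107...) to PASS. A probability of exactly 0.5 maps to PASS because the comparison is strict.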