Example Usage

The following example demonstrates how to run ArtiCull.

Example Files

  • example/example.maf: Mutation Annotation Format (MAF) file containing candidate variants

  • example/example.bam: BAM file with sequencing data

  • example/example.bam.bai: Index file for the BAM file

Instructions

Complete setup and activate the conda environment

Follow the directions in Installation to create a conda environment and download required genomic tracks.

conda env create -f requirements.yml -n articull-env
conda activate articull-env
bash scripts/setup_mappability_track.bash [output_directory]

If you have already done this, ensure that the articull-env environment is activated.

conda activate articull-env

Run classification

This command first extracts model features and saves them to example/example_features.tsv. It then runs the pretrained model in the models/preprint_model/ directory to classify the variants and saves the results to example/example_result.tsv.

maf=example/example.maf
bam=example/example.bam
output_prefix=example/example
model_dir=models/preprint_model/

python -m articull classify $maf $output_prefix $model_dir $bam --cores 1
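
For scripted runs, the same invocation can be assembled from Python. The following is a minimal sketch, not part of ArtiCull: the helper name build_classify_command is hypothetical, and it only reproduces the argument order and the --cores option shown in the command above.

```python
import sys
from typing import List


def build_classify_command(maf: str, output_prefix: str, model_dir: str,
                           bam: str, cores: int = 1) -> List[str]:
    """Return the argv list for the classify command shown above.

    Hypothetical helper: it mirrors the documented argument order
    (maf, output_prefix, model_dir, bam) and the --cores option only.
    """
    return [
        sys.executable, "-m", "articull", "classify",
        maf, output_prefix, model_dir, bam,
        "--cores", str(cores),
    ]


if __name__ == "__main__":
    cmd = build_classify_command(
        "example/example.maf",
        "example/example",
        "models/preprint_model/",
        "example/example.bam",
    )
    # Print the command; pass it to subprocess.run(cmd, check=True) to execute it.
    print(" ".join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.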

Output

The output file, <output_prefix>_result.tsv (example/example_result.tsv in this example), is a tab-separated values (TSV) file with the following columns:

Column         Description
chrm           Chromosome where the variant is located.
pos            Position of the variant on the chromosome.
ref_allele     Reference allele at the given position.
alt_allele     Alternate allele identified at the given position.
result         Classification of the variant (ARTIFACT, PASS, or SKIP).
prob_artifact  Probability that the variant is an artifact (only provided for classified variants).

Example output:

chrm  pos        ref_allele  alt_allele  result    prob_artifact
1     201206823  G           A           ARTIFACT  0.9729175263160109
1     203994039  C           A           PASS      0.01078797299659806
1     201226655  T           C           SKIP
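
To work with the results downstream, the TSV can be read with Python's standard csv module. The sketch below keeps only the PASS variants. It assumes the file has a header row with the column names listed above and that SKIP rows leave prob_artifact empty; the function name passing_variants is hypothetical, and the inline data is copied from the example output.

```python
import csv
import io
from typing import Dict, Iterable, List


def passing_variants(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return the rows whose result column is PASS from a tab-separated result file."""
    reader = csv.DictReader(lines, delimiter="\t")
    return [row for row in reader if row["result"] == "PASS"]


# Rows copied from the example output; the empty last field on the SKIP row
# is an assumption about how a missing prob_artifact is written.
EXAMPLE = (
    "chrm\tpos\tref_allele\talt_allele\tresult\tprob_artifact\n"
    "1\t201206823\tG\tA\tARTIFACT\t0.9729175263160109\n"
    "1\t203994039\tC\tA\tPASS\t0.01078797299659806\n"
    "1\t201226655\tT\tC\tSKIP\t\n"
)

if __name__ == "__main__":
    for row in passing_variants(io.StringIO(EXAMPLE)):
        print(row["chrm"], row["pos"], row["ref_allele"], row["alt_allele"])
```

For a real run, pass an open file handle instead of the inline string, for example open("example/example_result.tsv", newline="").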

Classification criteria:

  • Variants are classified as ARTIFACT if prob_artifact > 0.5.

  • Variants are classified as PASS if prob_artifact ≤ 0.5.

  • Variants are classified as SKIP if no supporting variant reads are found in the BAM file (e.g., due to realignment during variant calling).
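
The criteria above can be written as a small decision rule. This is an illustrative sketch only, not ArtiCull's implementation: the function name label_variant is hypothetical, and representing "no supporting variant reads" as None is an assumption made for the example.

```python
from typing import Optional

ARTIFACT_THRESHOLD = 0.5  # threshold stated in the classification criteria


def label_variant(prob_artifact: Optional[float]) -> str:
    """Map a probability to a label; None stands for a variant with no supporting reads."""
    if prob_artifact is None:
        return "SKIP"
    # Strictly greater than the threshold is ARTIFACT; everything else is PASS.
    return "ARTIFACT" if prob_artifact > ARTIFACT_THRESHOLD else "PASS"


if __name__ == "__main__":
    for p in (0.9729175263160109, 0.01078797299659806, None):
        print(p, label_variant(p))
```

Note that a probability of exactly 0.5 falls on the PASS side, since the ARTIFACT condition is a strict inequality.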