Command-line Usage

python -m articull classify \
        <input_file> <output_prefix> <model_dir> <bams> \
        [--resources_dir <path>] [--chunksize <n>] [--features_file <path>] [--cores <ncores>] \
        [--no_vcf_filter] [--extract_features_only]

Arguments

  • <input_file> (required): MAF or VCF file containing candidate variants

  • <output_prefix> (required): Output prefix, consisting of a directory and sample name (e.g., /path/to/sample1). Output files will be saved as /path/to/sample1_features.tsv and /path/to/sample1_result.tsv

  • <model_dir> (required): Directory containing model.pkl and scaler.pkl

  • <bams> (required): List of BAM files containing sequencing data

  • --resources_dir (optional): Path to the directory that contains the folder of mappability tracks (default: resources/hg19_mappability.bedGraph)

  • --chunksize (optional): Number of rows per worker for parallel processing (default: 5000)

  • --features_file (optional): File containing features (e.g., generated by a previous run of ArtiCull). If not provided, features will be extracted from input BAM files

  • --cores (optional): Number of CPU cores for parallel processing (default: all available cores)

  • --no_vcf_filter (optional): Do not filter variants in the VCF input file (default: filter out variants whose FILTER field is not “PASS”)

  • --extract_features_only (optional): Extract features but do not classify variants
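
The arguments above can also be assembled from a script, which may be convenient when running many samples. The following is a minimal sketch, not part of ArtiCull itself: `build_classify_command` and `expected_outputs` are hypothetical helpers, the file names in the example are placeholders, and the sketch assumes that multiple BAM files are passed as separate space-separated arguments and that options may follow the positional arguments, as in the usage line.

```python
import shlex
import sys
from typing import List, Optional, Sequence, Tuple


def expected_outputs(output_prefix: str) -> Tuple[str, str]:
    """Return the (features, result) paths documented for an output prefix."""
    return f"{output_prefix}_features.tsv", f"{output_prefix}_result.tsv"


def build_classify_command(
    input_file: str,
    output_prefix: str,
    model_dir: str,
    bams: Sequence[str],
    resources_dir: Optional[str] = None,
    chunksize: Optional[int] = None,
    features_file: Optional[str] = None,
    cores: Optional[int] = None,
    no_vcf_filter: bool = False,
    extract_features_only: bool = False,
) -> List[str]:
    """Assemble the argument list for `python -m articull classify`.

    Hypothetical helper: it only mirrors the documented usage line and
    does not import or call ArtiCull.
    """
    if not bams:
        raise ValueError("at least one BAM file is required")

    # Required positional arguments, in the documented order.
    # Assumption: each BAM path is passed as its own argument.
    cmd = [sys.executable, "-m", "articull", "classify",
           input_file, output_prefix, model_dir, *bams]

    # Optional arguments are added only when set, so that ArtiCull's
    # own defaults apply otherwise.
    if resources_dir is not None:
        cmd += ["--resources_dir", resources_dir]
    if chunksize is not None:
        cmd += ["--chunksize", str(chunksize)]
    if features_file is not None:
        cmd += ["--features_file", features_file]
    if cores is not None:
        cmd += ["--cores", str(cores)]
    if no_vcf_filter:
        cmd.append("--no_vcf_filter")
    if extract_features_only:
        cmd.append("--extract_features_only")
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace with real files before running.
    cmd = build_classify_command(
        "sample1.maf", "/path/to/sample1", "model_dir",
        ["tumor.bam"], cores=4,
    )
    print(shlex.join(cmd))
    print(expected_outputs("/path/to/sample1"))
    # To launch the run: subprocess.run(cmd, check=True)
```

Returning a list rather than a single string lets the command be passed to `subprocess.run` without a shell, which avoids quoting problems with paths that contain spaces.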