18 comments on “SAMtool bitwise flag meaning explained: how to understand samflags without pains

  1. Flag for properly paired reads.

    samtools view -f 2
    will give you properly mapped pair.

    Usage: samtools view [options] | [region1 […]]

    Options: -b output BAM
    -h print header for the SAM output
    -H print header only (no alignments)
    -S input is SAM
    -u uncompressed BAM output (force -b)
    -x output FLAG in HEX (samtools-C specific)
    -X output FLAG in string (samtools-C specific)
    -t FILE list of reference names and lengths (force -S) [null]
    -T FILE reference sequence file (force -S) [null]
    -o FILE output file name [stdout]
    -f INT required flag, 0 for unset [0]
    -F INT filtering flag, 0 for unset [0]
    -q INT minimum mapping quality [0]
    -l STR only output reads in library STR [null]
    -r STR only output reads in read group STR [null]
    -? longer help

  2. Hello, need help for samtools or other methods.
    You know that the Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences. If a patient’s BAM file is given, how can you extract the sequence from the storing read alignments(not from the reference) within a region? For example, chr22: 1000000-1234000.
    Somebody in an online forum totally misunderstood my question and I haven’t get a solution. So I ask for help whereever it is possible. The guy used “samtools faidx ref.fasta”, but this result from the reference instead of the patient’s read.
    Thanks for any hint or a piece of code.

  3. Thanks for this post. It was definitely helpful. I was wondering if you could elaborate a little more on the:

    “the alignment is not primary (a read having split hits may have multiple primary alignment records)”

    Does this mean that if a read has multiple alignments, then one of the alignments is considered the primary alignment? How does this primary alignment get assigned? Is it based on the best of all the alignments?

    Thanks,

  4. ardmore: If a patient’s BAM file is given, how can you extract the sequence from the storing read alignments(not from the reference) within a region? For example, chr22: 1000000-1234000.

    Sorry I did not check this for a while. It is indeed tricky: you need reverse complement reverse reads: 1 st mates.
    Here is bam2 fastq for a given region.

    perl Bam2fastqR.pl my.bam chr_name start end
    perl Bam2fastqR.pl my.bam chr01 1000 3000
    makes
    my.2.fastq

    ###Bam2fastqR.pl (For reverse reads )=======================================
    use strict;
    use Cwd;
    my ($in,$chr,$sta,$end)=@ARGV;
    open F1,”samtools view -f 64 $in ‘$chr\:$sta\-$end’ |” or die;
    my (@x,@z,$pr1,$pr2,$rb,$rs);
    my (@y)=split ‘\.’,$in;
    open R1, “>$y[0].1.fastq”;
    while (my $s=){
    (@x)=split ‘\s+’,$s;
    $rs=revdnacomp($x[9]);
    $rb=reverse($x[10]);# you need to reverse complement
    print R1 “\@$x[0]/1\n$rs\n+\n$rb\n”;
    }
    sub revdnacomp {
    my $dna = shift;
    my $revcomp = reverse($dna);
    $revcomp =~ tr/ACGTacgt/TGCAtgca/;
    return $revcomp;
    }
    #### END =======================================

    ###Bam2fastqF.pl (For forward reads )
    use strict;
    use Cwd;
    my ($in,$chr,$sta,$end)=@ARGV;
    open F1,”samtools view -f 128 $in ‘$chr\:$sta\-$end’ |” or die;
    my (@x,@z,$pr1,$pr2);
    my (@y)=split ‘\.’,$in;
    open R1, “>$y[0].2.fastq”;
    while (my $s=){
    (@x)=split ‘\s+’,$s;
    print R1 “\@$x[0]/2\n$x[9]\n+\n$x[10]\n”;
    }
    ================================================

    You need to run for both scripts. If they do not work, please let me know.

  5. “the alignment is not primary (a read having split hits may have multiple primary alignment records)”

    Does this mean that if a read has multiple alignments, then one of the alignments is considered the primary alignment? How does this primary alignment get assigned? Is it based on the best of all the alignments?
    ============
    I just copied it from the samtools manual. And the updated one tells us that
    “Bit 0x100 marks the alignment not to be used in certain analyses when the tools in use are
    aware of this bit.” So basically we should not use this one, unless an aligner specifically say we should use this flag. I talked with Heng before and he said this flag was to eliminate bad reads. (or something like that.)

  6. I borrowed the code from http://edwards.sdsu.edu/labsite/index.php/robert/289-how-to-convert-fastq-to-fasta
    But I got something in my fasta file like this, is it right?
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    f[“c`_]d]aWi]c]dbgfi[aZ_d_n_nr[`jlvnmdimUbmb]mfUkkjg^faZ[ih
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{~~~q~wq[\SVT\]]X`_a_c^XZZYT
    `zdsobkpRNRqx}l~`yoyvwo\c~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    zxtuu~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~zsoml\R

  7. I think you are getting something like this. The first part is base and the second part is its quality.

    @Ld01_v01s1
    CTAACCCTAACCCTAACCCTAACCCTAACCAGYACACCAGTACACCGTCACGCCCCCGTC
    CTGTTGGAGAGGGTGTCGCKGTGCAAGGAATCAGTCGAGAGAAAAACCCTAACCCGTACC
    GGTACCGAGTTATCGTTTTTATTAGGTTACTAACCTCTGGCACGCTTGTCGCTGCTCTTC
    AGAACGAAACGCACAATGCTCTTCGATAAACGTGCTGAAATAGAAAAAAAAAACGAAAGA
    TCTCGGCTACGTTTGCTGCCGTTGGCCTCACCTCGTCATGGCGCGTCGGTCCACCAAAGA
    +
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~d~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ~~~~~~~~~~~~~~~~~~~O~~~~~~~~~~~~~~~:~~~~~~~~~~~~~~~~~~~~~~~~
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  8. A:
    samtools view -f 16
    0 + 16 => 16

    Usage: samtools view [options] | [region1 […]]

    Options: -b output BAM
    -h print header for the SAM output
    -H print header only (no alignments)
    -S input is SAM
    -u uncompressed BAM output (force -b)
    -1 fast compression (force -b)
    -x output FLAG in HEX (samtools-C specific)
    -X output FLAG in string (samtools-C specific)
    -c print only the count of matching records
    -L FILE output alignments overlapping the input BED FILE [null]
    -t FILE list of reference names and lengths (force -S) [null]
    -T FILE reference sequence file (force -S) [null]
    -o FILE output file name [stdout]
    -R FILE list of read groups to be outputted [null]
    -f INT required flag, 0 for unset [0]
    -F INT filtering flag, 0 for unset [0]
    -q INT minimum mapping quality [0]
    -l STR only output reads in library STR [null]
    -r STR only output reads in read group STR [null]
    -s FLOAT fraction of templates to subsample; integer part as seed [-1]
    -? longer help

  9. how would I select alignments with flag 0 OR 16 using samtools?
    samtools view -f 16
    samtools view -f 0
    samtools view -f 17
    You needs pick 3 cases, I guess …

  10. Hi! Great post! 🙂

    Do you have any idea how to collect the uniquely mapped reads whose 5′ ends falls within a target exon? I’ve filtered for 99, 147, 86 and 163 flags but saddly I don’t have any idea how to get it…

    Can someone propose an approach? Thank you in advance!

  11. >Do you have any idea how to collect the uniquely mapped reads whose 5′ ends falls within a target exon? I’ve filtered for 99, 147, 86 and 163 flags but saddly I don’t have any idea how to get it…

    It seems manipulating flags is not enough to get the info … Let me see.

  12. Maybe filtering from a sam file? If you discard the reads according to the fourth column… should work!

Leave a comment