How to read a VCF file in R
Solution 1
This approach may work for you:
# read the vcf file twice: first for the column names, then for the data
tmp_vcf <- readLines("test.vcf")
# read.table skips every line starting with "#" (its default comment.char), including the header row
tmp_vcf_data <- read.table("test.vcf", sep = "\t", stringsAsFactors = FALSE)
# take the column names from the "#CHROM" header line
header_line <- grep("^#CHROM", tmp_vcf, value = TRUE)[1]
vcf_names <- unlist(strsplit(header_line, "\t"))
names(tmp_vcf_data) <- vcf_names
P.S.: If you have several VCF files, you can apply the same steps to each of them with the lapply function.
Best, Robert
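The lapply suggestion could look roughly like the sketch below. It is only an illustration under stated assumptions: read_vcf_table is a hypothetical helper name, the two temporary files are made-up stand-ins for real VCF files, and the helper uses read.delim with skip instead of the two-pass renaming shown above.

```r
# Hypothetical helper: read one VCF into a data frame, dropping every line
# that comes before the "#CHROM" header row.
read_vcf_table <- function(path) {
  lines <- readLines(path)
  header_row <- grep("^#CHROM", lines)[1]
  if (is.na(header_row)) stop("No #CHROM header line found in ", path)
  # comment.char = "" keeps the "#CHROM" line so it can serve as the header;
  # check.names = FALSE keeps the column name "#CHROM" unchanged.
  read.delim(path, skip = header_row - 1, header = TRUE, sep = "\t",
             comment.char = "", quote = "", stringsAsFactors = FALSE,
             check.names = FALSE)
}

# Made-up example files (assumption: real files would be tab-separated too)
file_a <- tempfile(fileext = ".vcf")
file_b <- tempfile(fileext = ".vcf")
writeLines(c("#junk line",
             "#CHROM\tPOS\tID\tREF\tALT",
             "11\t33443\t3\tA\tT",
             "12\t33445\t5\tA\tG"), file_a)
writeLines(c("#another junk line",
             "#CHROM\tPOS\tID\tREF\tALT",
             "1\t100\trs1\tC\tG",
             "2\t200\trs2\tG\tA"), file_b)

# A named vector gives a named list of data frames, one per file
vcf_files <- c(sample_a = file_a, sample_b = file_b)
vcf_list <- lapply(vcf_files, read_vcf_table)
```

Each element of vcf_list is then an ordinary data frame, for example vcf_list$sample_a.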
Solution 2
data.table::fread reads the file as intended; see this example:
library(data.table)
#try this example vcf from GitHub
vcf <- fread("https://raw.githubusercontent.com/vcflib/vcflib/master/samples/sample.vcf")
#or if the file is local:
vcf <- fread("path/to/my/vcf/sample.vcf")
We can also use the vcfR package; see the package manuals for details.
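When a file has extra lines before the header, as in the question, fread's skip argument can also be given a string, in which case reading starts at the first line containing it. A minimal sketch, assuming data.table is installed; the temporary file is a made-up stand-in modeled on the question's sample:

```r
library(data.table)

# Made-up example file (assumption: tab-separated, like a real VCF)
vcf_path <- tempfile(fileext = ".vcf")
writeLines(c("#not wanted line",
             "#unnecessary line",
             "#junk line",
             "#CHROM\tPOS\tID\tREF\tALT",
             "11\t33443\t3\tA\tT",
             "12\t33445\t5\tA\tG"), vcf_path)

# Start reading at the line that contains "#CHROM" and use it as the header
vcf <- fread(vcf_path, skip = "#CHROM", sep = "\t", header = TRUE)

# Optional: drop the leading "#" from the first column name
setnames(vcf, "#CHROM", "CHROM")
```

The result is a data.table with one row per variant line; the renaming step is a matter of taste and can be left out.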
Author by
MAPK
Updated on August 05, 2022
Comments
-
MAPK almost 2 years
I have this VCF format file and want to read it in R. However, the file contains some redundant lines that I want to skip. I want a result that starts at the line matching #CHROM. This is what I have tried:
chromo1<-try(scan(myfile.vcf,what=character(),n=5000,sep="\n",skip=0,fill=TRUE,na.strings="",quote="\""))
## find the start of the vcf file
skip.lines<-grep("^#CHROM",chromo1)
column.labels<-read.delim(myfile.vcf,header=F,nrows=1,skip=(skip.lines-1),sep="\t",fill=TRUE,stringsAsFactors=FALSE,na.strings="",quote="\"")
num.vars<-dim(column.labels)[2]
myfile.vcf
#not wanted line
#unnecessary line
#junk line
#CHROM POS ID REF ALT
11 33443 3 A T
12 33445 5 A G
result
#CHROM POS ID REF ALT
11 33443 3 A T
12 33445 5 A G
-
Rich Scriven over 8 years
How about using a sequencing package? There are a few if you google "read vcf R".
-
hrbrmstr over 8 years
Bioconductor has a few VCF readers.
-
MAPK over 8 years
@RichardScriven That VCF reader is not appropriate in my case. I just want to skip the lines and get the tab-separated table.
-
Calimo over 5 years
Possible duplicate of Extract sample data from VCF files
-
Ricardo Guerreiro over 5 years
Great answer, but do you always use points in your variable names? I find it confusing (especially if you also know Python) and much prefer underscores. I guess it's a matter of taste though, cheers.
-
Calimo over 5 years
@RicardoGuerreiro Dots are idiomatic in variable names in R. Widely used and perfectly acceptable.