Import all txt files in folder, concatenate into data frame, use file names as variable in R?
Solution 1
add to the loop dat$file <- unlist(strsplit(f,split=".",fixed=T))[1]
files <- list.files(path="~/Documents/ForR/.", pattern=".txt")
DF <- NULL
for (f in files) {
dat <- read.csv(f, header=F, sep="\t", na.strings="", colClasses="character")
dat$file <- unlist(strsplit(f,split=".",fixed=T))[1]
DF <- rbind(DF, dat)
}
Shouldn't the row.names from the do.call be in the format names(list)[n].i where i is 1:number_of_rows_for_data.frame n? so you can just make a column from the row.names
data <- lapply(myFiles, read.table, sep="\t", header=FALSE)
combined.data <- do.call(rbind, data)
combined.data$file_origin <- row.names(combined.data)
Solution 2
You can use basename
to get the last path element( filename) , for example:
(files = file.path("~","Documents","ForR",c("file1.txt", "file2.txt")))
"~/Documents/ForR/file1.txt" "~/Documents/ForR/file2.txt"
(basename(files))
[1] "file1.txt" "file2.txt"
Then sub
to remove the extension ".txt":
sub('.txt','',basename(files),fixed=TRUE)
[1] "file1" "file2"
Ros1920
Updated on June 12, 2022Comments
-
Ros1920 almost 2 years
I have a folder with 142 tab-delimited text files. Each file has 19 variables, and then a number of rows beneath (usually no more than 30 rows, but it varies). I want to do several things with these files in R automatically, and I can't seem to get exactly what I want with my code. I am new to loops, I got both sections of code from previous posts here at stackoverflow but can't seem to figure out how to combine their functions.
I want to turn the filename into a variable when reading the files into R, so that each row has the identifying file name
Concatenate all files (with filename variable and no header) into one dataframe with dimensions Yx19, where Y=however many resulting rows there are.
I am able to create a list of the 142 dataframes using this code:
myFiles = list.files(path="~/Documents/ForR/", pattern="*.txt") data <- lapply(myFiles, read.table, sep="\t", header=FALSE) names(data) <- myFiles for(i in myFiles) data[[i]]$Source = i do.call(rbind, data)
I am able to create the dataframe I want with 19 variables, but the filename is not present:
files <- list.files(path="~/Documents/ForR/.", pattern=".txt") DF <- NULL for (f in files) { dat <- read.csv(f, header=F, sep="\t", na.strings="", colClasses="character") DF <- rbind(DF, dat) }
How do I add the file name (without .txt if possible) as a variable to the loop?