Import all txt files in folder, concatenate into data frame, use file names as variable in R?

Solution 1

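Add a line to the loop that stores the file name, without its directory or extension, in a new column: dat$file <- sub("\\.txt$", "", basename(f)). Also pass full.names=TRUE to list.files so that read.csv can find the files whatever the working directory is, and anchor the pattern so it matches only names ending in ".txt".

files <- list.files(path="~/Documents/ForR", pattern="\\.txt$", full.names=TRUE)
DF <- NULL
for (f in files) {
    dat <- read.csv(f, header=FALSE, sep="\t", na.strings="", colClasses="character")
    # File name without the directory or the ".txt" extension
    dat$file <- sub("\\.txt$", "", basename(f))
    DF <- rbind(DF, dat)
}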

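Alternatively, if the list passed to do.call(rbind, ...) is named, the row names of the result have the form names(list)[n].i, where i runs from 1 to the number of rows of data frame n (a one-row data frame gets just the name). So you can build the column from the row names:

data <- lapply(myFiles, read.table, sep="\t", header=FALSE)
names(data) <- basename(myFiles)  # rbind builds the row names from these
combined.data <- do.call(rbind, data)
# Row names look like "file1.txt.1"; strip the extension and the row index
combined.data$file_origin <- sub("\\.txt(\\.[0-9]+)?$", "", row.names(combined.data))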
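
The row-name behaviour is easy to check on a toy example. The sketch below uses two hand-made data frames in place of files on disk; `toy_list` and its element names are made up for illustration and are not part of the question's data.

```r
# Two small data frames standing in for files read with read.table()
# (toy_list and its names are hypothetical, for illustration only)
toy_list <- list(
  "file1.txt" = data.frame(V1 = c("a", "b")),
  "file2.txt" = data.frame(V1 = c("c", "d", "e"))
)

# rbind on a named list prefixes each row name with the list name
combined <- do.call(rbind, toy_list)
print(row.names(combined))

# Strip the ".txt" extension and the trailing row index to recover the source
combined$file_origin <- sub("\\.txt(\\.[0-9]+)?$", "", row.names(combined))
print(table(combined$file_origin))
```

This relies on the list being named before the rbind; with an unnamed list the row names are plain integers and carry no file information.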

Solution 2

You can use basename to get the last path element (the file name), for example:

(files = file.path("~","Documents","ForR",c("file1.txt", "file2.txt")))
[1] "~/Documents/ForR/file1.txt" "~/Documents/ForR/file2.txt"

(basename(files))
[1] "file1.txt" "file2.txt"

Then use sub to remove the ".txt" extension:

 sub('.txt','',basename(files),fixed=TRUE)
[1] "file1" "file2"
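
Putting basename and sub together with the reading step might look like the sketch below. It writes two tiny tab-delimited files to a temporary folder so that it runs anywhere; `demo_dir`, the file contents, and the helper `read_with_source` are all assumptions made for illustration, not part of the original question.

```r
# Create two small tab-delimited files in a temporary folder
# (demo_dir and the file contents are hypothetical, for illustration only)
demo_dir <- file.path(tempdir(), "ForR_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("a\t1", "b\t2"), file.path(demo_dir, "file1.txt"))
writeLines(c("c\t3"), file.path(demo_dir, "file2.txt"))

# full.names = TRUE returns paths that can be opened from any working directory
files <- list.files(path = demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Hypothetical helper: read one file and tag every row with its source name
read_with_source <- function(f) {
  dat <- read.table(f, sep = "\t", header = FALSE, colClasses = "character")
  dat$file <- sub(".txt", "", basename(f), fixed = TRUE)
  dat
}

# Read every file, then stack the results into one data frame
combined <- do.call(rbind, lapply(files, read_with_source))
print(combined)
```

Adding the column inside the helper, before the rbind, avoids having to parse row names afterwards.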
Author by

Ros1920

Updated on June 12, 2022

Comments

  • Ros1920
    Ros1920 almost 2 years

    I have a folder with 142 tab-delimited text files. Each file has 19 variables and a varying number of rows (usually no more than 30). I want to do several things with these files in R automatically, and I can't seem to get exactly what I want with my code. I am new to loops. I got both sections of code from previous posts here at Stack Overflow, but I can't figure out how to combine their functions.

    1. I want to turn the filename into a variable when reading the files into R, so that each row has the identifying file name

    2. Concatenate all files (with filename variable and no header) into one dataframe with dimensions Yx19, where Y=however many resulting rows there are.

    I am able to create a list of the 142 dataframes using this code:

    myFiles = list.files(path="~/Documents/ForR/", pattern="*.txt")
    data <- lapply(myFiles, read.table, sep="\t", header=FALSE)
    names(data) <- myFiles
    for(i in myFiles)
        data[[i]]$Source = i
    do.call(rbind, data)
    

    I am able to create the dataframe I want with 19 variables, but the filename is not present:

    files <- list.files(path="~/Documents/ForR/.", pattern=".txt")
    DF <- NULL
    for (f in files) {
        dat <- read.csv(f, header=F, sep="\t", na.strings="", colClasses="character")
        DF <- rbind(DF, dat)
    }
    

    How do I add the file name (without .txt if possible) as a variable to the loop?