How to Find duplicates files (by name and/or size) in a specific folder of my google drive?
For better readability I add the next part of this answer in a second answer.
Here is the code to handle the duplicates : it works in 2 steps :
- detect the duplicates and mark them in the sheet to check if everything is fine (markDuplicates())
- remove the red background on the files you want to preserve (they will become yellow after step 3 is done)
- delete the selected (marked) files, actually move them to trash, you'll have to empty the trash manually from the drive Ui. (trashDuplicates()) The deleted files will be colored in deep red to confirm deletion.
LAST EDIT :
Placed the full code here and made some functions private to avoid errors due to bad script calls.
function startProcess(){
PropertiesService.getScriptProperties().deleteAllProperties();
try{
ScriptApp.deleteTrigger(ScriptApp.getProjectTriggers()[0]);
}catch(e){}
var sh = SpreadsheetApp.getActiveSheet();
sh.getDataRange().clear();
sh.getRange(1,1,1,4).setValues([['fileName (logged @'+Utilities.formatDate(new Date(),Session.getScriptTimeZone(),'MMM-dd-yyyy HH:mm')+')','fileSize','parent folders tree','fileID']]);
var trig = ScriptApp.newTrigger('getDriveFilesList_').timeBased().everyMinutes(5).create();
Logger.log(trig.getUniqueId()+' '+trig.getHandlerFunction());
getDriveFilesList_();
}
function getDriveFilesList_(){
var content = [];
var startTime = new Date().getTime();
var sh = SpreadsheetApp.getActiveSheet();
if( ! PropertiesService.getScriptProperties().getProperty('numberOfFiles')){
PropertiesService.getScriptProperties().setProperty('numberOfFiles',0);
}
var numberOfFiles = Number(PropertiesService.getScriptProperties().getProperty('numberOfFiles'));
Logger.log(numberOfFiles);
var max = numberOfFiles+10000;
if( ! PropertiesService.getScriptProperties().getProperty('continuationToken')){
var files = DriveApp.getFiles();
// var files = DriveApp.getFolderById('0B3qSFd_____MTFZMDQ').getFiles();// use this line and comment the above if you want to process a single folder
// use your chozen folder ID of course (available from the browser url , the part after "https://drive.google.com/?authuser=0#folders/")
}else{
var files = DriveApp.continueFileIterator(PropertiesService.getScriptProperties().getProperty('continuationToken'))
}
while(files.hasNext() && numberOfFiles<(max)){
var file = files.next()
if(file.getSize()>0){
numberOfFiles++;
var folder = '(shared)';
if(file.getParents().hasNext()){folder = getTree_(file)}
content.push([file.getName(),file.getSize(),folder,file.getId()])
}
if(new Date().getTime()-startTime > 250000){break};
}
sh.getRange(sh.getLastRow()+1,1,content.length,content[0].length).setValues(content);
if(!files.hasNext()){ScriptApp.deleteTrigger(ScriptApp.getProjectTriggers()[0]);Logger.log('done !'); sh.getRange(sh.getLastRow()+1,1).setValue('All files processed ('+numberOfFiles+' found)')};
var continuationToken = files.getContinuationToken()
PropertiesService.getScriptProperties().setProperty('numberOfFiles',numberOfFiles);
PropertiesService.getScriptProperties().setProperty('continuationToken',continuationToken);
}
function markDuplicates(){
handleDuplicates_(false)
}
function trashDuplicates(){
handleDuplicates_(true)
}
function handleDuplicates_(trash){
var sh = SpreadsheetApp.getActiveSheet();
sh.setFrozenRows(1);
sh.sort(1);
var data = sh.getDataRange().getValues()
var headers = data.shift()
var lastComment = data.pop();
var toDelete = [];
var item = data[0];
for(var n=1 ; n<data.length; n++){
if(data[n][0]==item[0] && data[n][1]==item[1]){
toDelete.push('delete '+n);
}
item=data[n];
}
var marker = sh.getRange(2,1,data.length,1).getBackgrounds();
for(var n in data){
if(!trash){marker.push(['#FFF'])};
if(toDelete.indexOf('delete '+n)>-1 && !trash){
marker[n][0] = '#F99';
}
if(toDelete.indexOf('delete '+n)>-1 && trash){
if(marker[n][0]=='#ff9999'){
try{
DriveApp.getFileById(data[n][3]).setTrashed(trash);
marker[n][0] = '#F33';
}catch(err){Logger.log(err)}
}else{
marker[n][0] = '#FF9';
}
}
}
sh.getRange(2,1,marker.length,1).setBackgrounds(marker);
}
function getTree_(file){
var tree = [];
var folderP = file.getParents()
while (folderP.hasNext()){
var folder = folderP.next();
folderP = folder.getParents();
tree.push(folder.getName());
}
return tree.reverse().join('/');
}
Related videos on Youtube
miodf
Updated on June 04, 2022Comments
-
miodf almost 2 years
Is it possible to create a Google apps script which would select automatically duplicate files inside a specific (current) folder (then I just had to delete them with one click) ?
https://developers.google.com/apps-script/reference/drive/
I am asking this because Google drive pc client create random duplicates files when updating them ! They want to solve this bug but I have to manually select them in each folders and then delete them. It is very annoying and time consuming. https://productforums.google.com/forum/#!category-topic/drive/report-a-problem/_ApOopI-Xkw
Edit: Alternatively, as windows cannot create the same filename inside the same folder, find and put in the trash all duplicate filenames inside a specific main folder (and its each of its related subfolders).
-
miodf about 10 yearsThanks. For : "I left aside the duplicate detection(...)" in my case I am uploading some folders of photos and videos (per year), so there are no google docs inside those folder. ;) Re: you "note", I am afraid I am just a too small beginner in GAS for doing that already. Anyway I will try your script right away.Thanks. ;)
-
miodf about 10 yearsThanks. No problem. ;) Here my test results: I create a new google spreadsheet. Go to Tools/Script Editor. Copy/paste your code. Save it. Run script. Authorize permission. Then wait a few seconds and go to View/logs to see the results: it displays: 100 filenames with their size. ;) I was not able to detect for now if files are found in a particular order or folder.
-
miodf about 10 yearsThanks for your second version: I run it and at the end of the Execution transcript I have : Execution failed: Cannot find method deleteTrigger((class)). (line 39, file "Code") [250.835 seconds total runtime]. It displays about 2500 rows in the spreadsheet (new version). Thanks in advance ;)
-
miodf about 10 yearsThanks for the edit : I have tried to add a folder only :
var folder = 'PHOTOS';
instead ofvar folder = 'root';
but it didn't work. -
Serge insas about 10 yearsif you want to get only one folder I'd suggest to sort the files in the spreadsheet, that would be the easiest way... else you'd have to change var files = DriveApp.getFiles(); (line 23 in code) and replace driveApp by a folder object. (the var=(shared) is for display only)
-
Serge insas about 10 yearsThe error you get is because you didnt call the right function : you have to use startProcess and in case of failure you have to clear the trigger and scriptProperties manually.
-
Serge insas about 10 yearssee next episode in answer #2 ;)
-
miodf about 10 yearsThanks. Yesterday, I had the error "Service using too much computer time for one day" for "Summary of failures for Google Apps Script: find_duplicates". I have changed the notifications settings in the current project trigger to immediate.
-
miodf about 10 yearsYour code is working. For handleduplicate,I have about 15,000 rows in the spreadsheet. It is highlighting in red duplicate filenames that have the same size. ;) Would it be possible to add not only "parent folder" but also "parent folders" of "parent folder" ? Example: not just"2010" but "PHOTOS/2010". Thanks in advance ;)
-
miodf about 10 yearsAnother example: with html files. hpics.li/66439ae Row 1 = row 3 (what I find surprising is that even the file id is the same!). Same thing for row 2 = row 4.
-
Serge insas about 10 yearsif you see the same id then it is not a duplicate but the very same file in 2 or more different folders... . In that case, you should not delete it. I guess the script should be tuned to allow selective delete depending on color in the sheet... what do you think ? As for the folder, it is of course doable to show the whole folder tree , there has been a few SO post on that subject, feel free to append some necessary code.
-
miodf about 10 yearsAlas it is really a duplicate ! If I try a search with the filename, you can see that there are 2 files in the same folder here: hpics.li/6f5103f . Then, if I go to the folder, there is only one file shown (see top right hpics.li/dbe8a05 ) ! I did not run your function trashDuplicates just handleDuplicates so it is not in the trash either : hpics.li/8dd4546 note: I have only one folder with the same name : hpics.li/2ff3db9
-
miodf about 10 yearsI think I may know why I have only 1 of the 2 files shown in the folder : I have removed this folder from my pc yesterday ! And Google drive client for pc has not already removed them online.
-
miodf about 10 yearsHere is another better test with only one file I have created with "Fastone Capture" in my pc inside the folder "screenshots_me" located inside the Google Drive local PC folder. Google drive pc client created alone a duplicate online and locally on my pc! In my PC: it has appended it "(1).jpg" to the filename. And online it has kept the same filename ! Your duplicate spreadsheet shows 4 same filenames and size located in the same folder but with only 2 that have the same fileID hpics.li/0d0de03 If I do a search in my drive I have : hpics.li/7e64a6c (...)
-
miodf about 10 years(...) If I look inside the online folder (see top right) : hpics.li/cb58df1 My local pc folder view hpics.li/7fb2e52 (I manually created only one file 2014-03-29_095332.jpg)
-
miodf about 10 yearsImho, I should choose at the beginning a specific folder like "PHOTOS". As it has a far smaller number of files (and contains also the biggest files sizes) than my other folders in drive, I won't have again, like yesterday and today, the error "Service using too much computer time for one day" (i.e.: one hour of computer time in my case). Per your suggestion, I have changed
var files = DriveApp.getFiles();
withvar files = getFoldersByName('PHOTOS').getFiles();
alas it didn't work as I expected ! Thanks in advance ;) -
Serge insas about 10 yearsShould be var files = DriveApp.getFoldersByName('PHOTOS').getFiles();
-
miodf about 10 yearsThanks. ;) But it didn't work either on my part ! I also tried unsucessfully
var folderToSearch = "PHOTOS";
withvar files = DriveApp.getFoldersByName(folderToSearch).getFiles();
Both list files inside other folders ! My PHOTO folder is located closer to the root "My Drive". Thanks in advance ;) -
Serge insas about 10 yearslast version : I added a folder tree display and the ability to select which file to delete or not.
-
miodf about 10 yearsMany thanks. When I run the function getTree I have this error : TypeError: Cannot call method "getParents" of undefined. (line 90, file "Code"). Thanks in advance ;)
-
Serge insas about 10 yearsYou didn't read my instructions : only 2 functions to run in this part : markDuplicate and trashDuplicates. The other are called from within the script, don't use them directly since they need parameters.In the first part use only startProcess. That said, I begin to think that you could accept this answer... don't you ?