How to retrieve the most recent file in a Cloud Storage bucket?
Solution 1
gsutil still doesn't seem to support this directly, but there is a workaround described in another post.
The command used is this one:
gsutil ls -l gs://[bucket-name]/ | sort -k 2
Because this sorts the listing by its date column, the most recent object ends up at the end of the output, and you can extract that line with another pipe if you need to.
Solution 2
gsutil ls -l gs://<bucket-name> | sort -k 2 | tail -n 2 | head -1 | cut -d ' ' -f 7
It will not work well if there are fewer than two objects in the bucket, though.
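As a hedged variation on the pipelines above, here is a minimal sketch that avoids counting space-separated columns with `cut`. It assumes each object row of `gsutil ls -l` has the shape `<size>  <ISO 8601 timestamp>  <gs:// URL>`, that the summary row starts with `TOTAL:`, and that object names contain no spaces. The function name `latest_object` is hypothetical.

```shell
# Read `gsutil ls -l` output on stdin and print the URL of the newest object.
# Assumption: object rows look like "<size>  <timestamp>  <url>".
latest_object() {
  # Keep only rows whose first field is a numeric size; this drops the
  # "TOTAL:" summary row and any prefix ("folder") rows without a timestamp.
  awk '$1 ~ /^[0-9]+$/ && NF >= 3 { print $2, $3 }' |
    LC_ALL=C sort |   # ISO 8601 timestamps sort chronologically as plain text
    tail -n 1 |       # last row is the most recent object
    cut -d ' ' -f 2   # awk emitted "<timestamp> <url>", so the URL is field 2
}

# Usage (the bucket name is a placeholder):
# gsutil ls -l gs://<bucket-name>/ | latest_object
```

With an empty bucket this prints nothing rather than a stray summary line.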
Author by
Chris Stryczynski
Updated on June 19, 2022
Comments
-
Chris Stryczynski about 2 years
Is this something that can be done with gsutil?
https://cloud.google.com/storage/docs/gsutil/commands/ls does not seem to mention any sorting functionality - only filtering by a date - which wouldn't work for my use case.
-
John Hanley almost 3 years
Read this link regarding sequentially naming objects: cloud.google.com/storage/docs/best-practices#naming. "Avoid using sequential object names such as timestamp-based object names if you are uploading many objects in parallel. Objects with sequential names are stored consecutively, so they are likely to hit the same backend server. When this happens, throughput is constrained. In order to achieve optimal throughput, add the hash of the sequence number as part of the object name to make it non-sequential."
-
Codemonkey almost 3 years
I've been doing it this way for years with no issues. I have root folders 0001, 0002, 0003, 0004, etc.; each of those is limited to 75 GB in size, and when one fills up, I move on to the next. The filenames within the folders are MD5 hashes of the file contents, so maybe that's suitable given the wording above?
-
John Hanley almost 3 years
Cloud Storage does not have folders. What you think is a folder is just a prefix that is part of the object name. Buckets are a flat namespace. Unless you need optimum performance, this probably does not matter for you. For customers that require high performance for millions or billions of objects: objects with sequential names are stored consecutively, so they are likely to hit the same backend server. I commented on your answer so that others do not copy your naming scheme without understanding the impact on performance.
-
Codemonkey almost 3 years
I know that, but I'm using this as a backup of my server. I should have clarified that I meant that's my file structure on the server.
-
John Hanley almost 3 years
I am not trying to inform you. I am commenting for future readers of your answer.