How to remove old SVN revisions

You can remove, or rather "shrink", the history of your SVN repository. Say you have 1000 revisions and want to keep only r950 through r1000. You can do the following:

svnadmin dump /path/to/current/repo -r950:1000 > small_svn.dump
svnadmin create /path/to/new/repo
svnadmin load /path/to/new/repo < small_svn.dump
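
The three commands above can be wrapped in a small helper. This is only a sketch: the function name `shrink_repo`, its argument order, and the default dump file name are assumptions made for illustration, and the final `svnadmin verify` is an extra sanity check that is not part of the original recipe.

```shell
#!/bin/sh
# Hypothetical helper: copy revisions START..END of OLD_REPO into a fresh NEW_REPO.
# Usage: shrink_repo OLD_REPO NEW_REPO START END [DUMP_FILE]
shrink_repo() {
    old_repo=$1
    new_repo=$2
    start_rev=$3
    end_rev=$4
    dump_file=${5:-small_svn.dump}

    # Never load into a path that already exists.
    if [ -e "$new_repo" ]; then
        echo "refusing to overwrite existing path: $new_repo" >&2
        return 1
    fi

    # Without --incremental, the first dumped revision holds the full tree,
    # so the new repository starts from a complete snapshot of START.
    svnadmin dump "$old_repo" -r "$start_rev:$end_rev" > "$dump_file" || return 1
    svnadmin create "$new_repo" || return 1
    svnadmin load "$new_repo" < "$dump_file" || return 1

    # Extra check (an addition to the recipe): verify the loaded repository.
    svnadmin verify "$new_repo"
}
```

For the example above, the call would be `shrink_repo /path/to/current/repo /path/to/new/repo 950 1000`. The old repository is only read, so it can be kept until the new one has been checked.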

However, there are two caveats:

1st: All your tags and branches will end up as standalone copies and so will take much more space than before (this could even result in a bigger repository; you have to try). You can use svndumpfilter to remove tags and branches, but then you need the old repository to get information about these tags/branches.

2nd: If your branches stay in your new repository, all mergeinfo will show wrong revisions, because the new repository starts numbering at revision 0 again, and the branching history is also gone (due to point 1).
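
Removing tags and branches with svndumpfilter, as mentioned in the first caveat, might look roughly like this. It is a sketch under assumptions: the helper name `filter_dump` is made up, and the path prefixes `tags` and `branches` assume the standard layout at the repository root. Note that svndumpfilter can refuse to filter when a path you keep was copied from a path you exclude.

```shell
#!/bin/sh
# Hypothetical helper: write a copy of IN_DUMP without the given path prefixes.
# Usage: filter_dump IN_DUMP OUT_DUMP PATH_PREFIX...
filter_dump() {
    in_dump=$1
    out_dump=$2
    shift 2

    if [ "$#" -eq 0 ]; then
        echo "usage: filter_dump IN_DUMP OUT_DUMP PATH_PREFIX..." >&2
        return 1
    fi

    # --drop-empty-revs removes revisions that end up empty after filtering.
    svndumpfilter exclude --drop-empty-revs "$@" < "$in_dump" > "$out_dump"
}
```

A call such as `filter_dump small_svn.dump filtered.dump tags branches` would produce `filtered.dump` (an assumed file name), which can then be loaded with `svnadmin load` in place of the unfiltered dump.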

A much better solution:

  • Find the revision(s) responsible for the growth of your repository (search for large files in your repository's data storage, usually located under /path/to/repo/db/revs/[0...X]).
  • Check the log history of these revisions and locate the files that are responsible.
  • If you do not need these files, remove them via svndumpfilter.
  • Teach your users how to avoid committing unnecessary, large files.

Otherwise you will have to shrink your repository again in a few weeks!
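
The first step of the list above, finding the largest revision files, can be sketched as follows. The function name `largest_rev_files` and the default of ten results are assumptions. In a repository that has been packed, several revisions share one pack file, so a large file there points at a range of revisions rather than a single one.

```shell
#!/bin/sh
# Hypothetical helper: print "SIZE_IN_BYTES PATH" for the largest files
# under REPO/db/revs, biggest first.
# Usage: largest_rev_files REPO [COUNT]
largest_rev_files() {
    repo=$1
    count=${2:-10}

    find "$repo/db/revs" -type f | while IFS= read -r file; do
        # $(( ... )) strips the padding some wc implementations add.
        size=$(( $(wc -c < "$file") ))
        printf '%s %s\n' "$size" "$file"
    done | sort -rn | head -n "$count"
}
```

In an unpacked repository the file name is the revision number, so for a large file named, say, 973 you can list what that commit touched with `svnlook changed -r 973 /path/to/repo` (973 is only an example number).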

Author by

Wes

I'm just me.

Updated on July 17, 2022

Comments

  • Wes
    Wes almost 2 years

    Our SVN repository is approaching 0.5 GB. We have nowhere near that amount of code in our production system.

    Is it possible to remove old revisions? I tried svn dump with a beginning revision number, but to no avail. I couldn't import that into a clean SVN repository.

    We don't need the history over a year old.

    Any ideas?

    • Mark Byers
      Mark Byers over 13 years
      Have you considered buying a larger harddisk? 1 terabyte disks are affordable for most companies these days.
    • Ish
      Ish over 13 years
    • Nick T
      Nick T over 13 years
      Not sure about this, but doesn't SVN slow down as it has to run through all the revs to figure out what the HEAD files actually are? @Mark: Usually the cost of everything is at least double or triple, as redundancy is desired, then the cost of backup space.
    • sbi
      sbi over 13 years
      @Nick: No, SVN does not slow down if a file has more revisions. (The SVN project has been self-hosting for a very long time, so it must be the oldest SVN repo around. If many older revisions were annoying, the SVN developers themselves would be the first to notice.) If you waste 5 hours on this, you have spent the cost of a 1 TB disk, including copying data and swapping the physical disks, with enough room to wiggle in a 0.5 TB external backup disk.
    • Ates Goral
      Ates Goral over 13 years
      @Wes - could you specify how large a clean checkout of each 'trunk' (depending on if you use /trunk/projects or /projects/trunk repo layout) is? It may be that there is not much savings to be had, and the problem is more that large files are being committed that should not be.
    • Wes
      Wes over 13 years
      @joshua you're right, I'll check this out when at work.
    • Ates Goral
      Ates Goral over 13 years
      @Wes also keep in mind the space of a checkout is 2 copies of the files (one in .svn folders, one actual), and that svn repository itself is highly compressed - still, it will give you a ballpark number, and you can compare it to the size of a full checkout a year ago and look for large files...
    • Sander Rijken
      Sander Rijken over 13 years
      Also, SVN doesn't have to run through all revs to find HEAD. It saves a full copy once in a while (not sure how often), somewhat like key frames in video compression.
    • Wes
      Wes over 6 years
      @Peter Mortensen Thanks for the edit, but I'm sure that this question is really not relevant to people now. 1) Disk space is a lot cheaper than it was in those days. 2) Not a lot of people use SVN now (that's a fact I just made up). 3) There is decent cloud hosting specifically for source control.
  • Wes
    Wes over 13 years
    Well I wanted to back it up each night
  • Mark Byers
    Mark Byers over 13 years
    @Wes: 366 * 0.5 Gig < 1 Terabyte.
  • Wes
    Wes over 13 years
    First, the cost of a disk is much more than the cost of a disk: remote hosting. Secondly, I struggle to get them to pay £30 for a book. Bandwidth isn't free either, and the time to transfer 1/2 a gig every time isn't trivial either. Oh, and the rate of growth is crazy. We had most of the work done in the first year and the size of the repo was 80 MB; one year later it's around 500.
  • sbi
    sbi over 13 years
    Supposedly, remote hosting is done because it's cheaper than hosting yourself? Then how could "it's more expensive because we're remote-hosting" ever be a valid argument?
  • sbi
    sbi over 13 years
    And as for £30 for a book: I used to have a boss who, when asked to buy a certain book, would ask back "Does this have a chance to save you X hours?" with the book's price/X being my monthly rate. When answered with "Yes", he'd buy the book. (I never had to answer with "No", but I suppose that, had I done so, he'd probably have told me to close his office's door and sit down, so he could have a talk with me to find out why I come bothering him about a book that I don't think is worth its money. :) Too bad I had to leave there.)
  • Wes
    Wes over 13 years
    Nope, it's not cheaper than hosting it ourselves, not by a looooong shot. Cost isn't the reason for the hosting.
  • Goran Jovic
    Goran Jovic over 13 years
    Just out of curiosity, what exactly are you storing on source control that takes up that much space?
  • ldav1s
    ldav1s over 13 years
    A differential backup scheme might be better than performing a full backup every night.
  • Mohammad Nikravan
    Mohammad Nikravan over 11 years
    We use SVN to keep some big binary image files on a cloud SVN service. How can we buy a disk for a cloud service? It is billed per GB/user. By the way, we don't need more than 10 revisions, so why should we pay for 10 GB of old revision files per month/user? We should find a way to delete the old revisions.
  • alfonx
    alfonx about 10 years
    Wes has a clear question about the possibilities to drop old svn revisions. How does it matter whether it's 1/2GB or 1/2TB...
  • sbi
    sbi about 10 years
    @alfonx: I generally prefer to point out what I consider an erroneous approach rather than answer the question it led to. Sometimes this triggers heated rejections like yours, and sometimes enthusiastically thankful replies by the OP. Shrug.