How can I mirror a yum repository but download only the newest versions of each package?


Solution 1

reposync is the only reliable way to do this. Write a small bash script that uses reposync's -a (--arch) option to download each architecture into a separate folder, then run createrepo on each folder to generate the metadata.

Here is a small script that I use (it runs on Ubuntu, but that doesn't matter; you get the idea):

cat sync-repos

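#!/bin/bash
# reposync options used below:
#   -n  download only the newest version of each package
#   -d  delete local packages that are no longer in the repository
#   -p  download path; reposync creates one subdirectory per repo id under it

# CentOS 6 repositories
reposync -n -c /etc/yum/yum.conf -p /repos/centos6 -d -r base -r updates -r extras -r centosplus -r contrib
# -g adds the package group file (comps.xml) to the base metadata
createrepo -g /repos/centos6/base/repodata/comps.xml /repos/centos6/base
createrepo /repos/centos6/updates
createrepo /repos/centos6/extras
createrepo /repos/centos6/centosplus
# contrib is synced above, so it needs metadata too (skipped if nothing was downloaded)
[ -d /repos/centos6/contrib ] && createrepo /repos/centos6/contrib

# Third-party repositories
reposync -n -c /etc/yum/yum.conf -p /repos -d -r vmware -r home_xtreemfs
createrepo /repos/vmware
createrepo /repos/home_xtreemfs

reposync -n -c /etc/yum/yum.conf -p /repos/vz -d -r openvz-utils -r openvz-kernel-rhel6
createrepo /repos/vz/openvz-utils
createrepo /repos/vz/openvz-kernel-rhel6

reposync -n -c /etc/yum/yum.conf -p /repos/nginx -d -r nginx-stable -r nginx-mainline
createrepo /repos/nginx/nginx-stable
createrepo /repos/nginx/nginx-mainline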
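
The script above syncs every architecture of a repo into one folder. The per-architecture variant mentioned at the start of this answer could look roughly like the sketch below. DEST, REPOID and ARCHES are hypothetical values, not taken from the original script, and the run helper only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Sketch only: DEST, REPOID and ARCHES are assumed example values.
DEST=${DEST:-/repos/puppet/el/6}
REPOID=${REPOID:-puppetlabs-products}
ARCHES=${ARCHES:-"x86_64 i386"}
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands, 0 = actually run them

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Sync the newest packages of one architecture, then build its metadata.
sync_arch() {
    local arch=$1
    # -n: newest only, -a: architecture, -p: download path
    run reposync -n -c /etc/yum/yum.conf -r "$REPOID" -a "$arch" -p "$DEST/$arch"
    # By default reposync appends the repo id to the download path.
    run createrepo "$DEST/$arch/$REPOID"
}

for arch in $ARCHES; do
    sync_arch "$arch"
done
```

Run it once as-is to review the printed commands, then again with DRY_RUN=0 to perform the sync.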

Solution 2

You can do this with pulp and the yum rpm distributor plugin.

When configuring a new repo, set the retain_old_count parameter to limit how many versions of each rpm are kept:

retain_old_count
Count indicating how many old rpm versions to retain; by default it will download all versions available.

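So something along the lines of the following (the option counts old versions, so it may need to be 0 rather than 1 to keep only the newest; check this against your Pulp version):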

$ pulp-admin rpm repo create \
          --repo-id=rhel6-puppet-products \
          --relative-url=rhel6-puppet-products \
          --feed=http://yum.puppetlabs.com/el/6/products/ \
          --retain-old-count 1
$ pulp-admin rpm repo sync run \
          --repo-id=rhel6-puppet-products

This should achieve what you want. There is a quick start guide that should give you an idea of how Pulp works, in case you have not tried it before.


Stefan Lasiewski
Author by

Updated on September 18, 2022

Comments

  • Stefan Lasiewski over 1 year

    I would like to mirror the following Yum/RPM repositories at http://yum.puppetlabs.com/ :

    The Puppet repository contains every Puppet product ever released and is quite large at about 8GB. I only need to mirror the newest versions of the files.

    I have tried to mirror the repository using reposync --newest-only:

    reposync --config=puppetlabs.repo.el6 --repoid=puppetlabs-products --repoid=puppetlabs-deps --newest-only --download_path=el/6 --quiet --downloadcomps
    

    and this downloads the newest packages, as I need. However, reposync doesn't automatically create the regular directory structure (x86_64, noarch, SRPMS, etc.) and doesn't mirror the repodata directory (repomd.xml). As a result, my yum clients get errors like this:

    [root@web1 ~]# yum --quiet install puppet
    http://mirrors.example.org/pub/puppet/el/6/puppetlabs-deps/x86_64/repodata/repomd.xml: [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404 Not Found"
    Trying other mirror.
    Error: Cannot retrieve repository metadata (repomd.xml) for repository: puppetlabs-deps. Please verify its path and try again
    [root@web1 ~]# 
    

    Is there a way to programmatically mirror only the new files from a Yum repo and follow the standard repository directory structure?

    • Michael Hampton about 10 years
      I don't know offhand, but it's an interesting question. Personally I don't worry about it. 8GB is a tiny fraction of my 276GB of local mirrors...
    • Stefan Lasiewski about 10 years
      Sure I know, why fret about 8GB. I'm just trying to be efficient :) In addition, sometimes I need to quickly set up another yum mirror of the EPEL or CentOS repos, and those are quite large. I really only need the latest N versions of the packages.
  • Stefan Lasiewski about 10 years
    Spacewalk looks nice, but it does way more than I need. We're already attempting provisioning using Puppet, Foreman and other systems.
  • Stefan Lasiewski about 10 years
    Thanks for the tips. This sort of works, but reposync insists on appending the repoid to the directory structure. This means a command like 'reposync --config=puppetlabs.repo.el6 --repoid=puppetlabs-products --newest-only --arch=x86_64' creates a bizarrely named directory like 'x86_64/puppetlabs-products' when I simply need 'x86_64/'.
  • Florin Asăvoaie about 10 years
    Yeah, I find that annoying as well, but maybe it will work if you make it a symlink or something like that. On the other hand, if you have time to invest, reposync and createrepo are Python scripts that you can modify to fit your needs :).
  • mikejonesey almost 7 years
    This probably wasn't available at the time of the post, but for the benefit of future readers, reposync has --norepopath: "Don't add the reponame to the download path. Can only be used when syncing a single repository (default is to add the reponame)."
  • Tim almost 7 years
    Your answer really needs more context to be useful. Are you able to edit it? For example, how does this provide a repository? Is this answer in addition to another answer?
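
Putting the --norepopath comment together with the original question, a minimal sketch for mirroring one repo and one architecture straight into the layout the yum clients request might look like this. MIRROR_ROOT is a hypothetical local path, has_repomd is an illustrative helper rather than part of yum-utils, and the commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Sketch only: MIRROR_ROOT is an assumed local path for the mirror.
MIRROR_ROOT=${MIRROR_ROOT:-/var/www/mirror/puppet/el/6}
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands, 0 = actually run them

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Sync one repo for one architecture into <root>/<repoid>/<arch>/.
# --norepopath keeps reposync from appending the repo id again; it only
# works when a single repository is synced per call.
mirror_repo() {
    local repoid=$1 arch=$2
    local target="$MIRROR_ROOT/$repoid/$arch"
    run reposync --config=puppetlabs.repo.el6 --repoid="$repoid" \
        --newest-only --arch="$arch" --norepopath --download_path="$target"
    # createrepo writes repodata/repomd.xml, the file the clients could not find.
    run createrepo "$target"
}

# Hypothetical helper: succeeds only if the directory holds yum metadata.
has_repomd() {
    [ -f "$1/repodata/repomd.xml" ]
}

mirror_repo puppetlabs-deps x86_64
```

After a real run (DRY_RUN=0), has_repomd "$MIRROR_ROOT/puppetlabs-deps/x86_64" gives a quick check that the 404 on repomd.xml should be gone.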