How can I mirror a yum repository but download only the newest versions of each package?
Solution 1
reposync is the only reliable way to do this. You will need a small bash script that uses the reposync -a (--arch) parameter to download each architecture into a separate folder, and then runs createrepo on each folder to generate the metadata.
Here is a small script that I use (it runs on Ubuntu, but that doesn't matter; you get the idea):
cat sync-repos
#!/bin/bash
reposync -n -c /etc/yum/yum.conf -p /repos/centos6 -d -r base -r updates -r extras -r centosplus -r contrib
createrepo -g /repos/centos6/base/repodata/comps.xml /repos/centos6/base
createrepo /repos/centos6/updates
createrepo /repos/centos6/extras
createrepo /repos/centos6/centosplus
createrepo /repos/centos6/contrib
reposync -n -c /etc/yum/yum.conf -p /repos -d -r vmware -r home_xtreemfs
createrepo /repos/vmware
createrepo /repos/home_xtreemfs
reposync -n -c /etc/yum/yum.conf -p /repos/vz -d -r openvz-utils -r openvz-kernel-rhel6
createrepo /repos/vz/openvz-utils
createrepo /repos/vz/openvz-kernel-rhel6
reposync -n -c /etc/yum/yum.conf -p /repos/nginx -d -r nginx-stable -r nginx-mainline
createrepo /repos/nginx/nginx-stable
createrepo /repos/nginx/nginx-mainline
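The script above does not actually show the per-architecture (-a) step, so here is a minimal sketch of it. The repo id, destination path and architecture list are placeholders of mine, not values from the script above, and it assumes your reposync supports --norepopath (older yum-utils releases may not). DRY_RUN defaults to 1, so the commands are only printed until you set DRY_RUN=0.

```shell
#!/bin/bash
# Sketch: download the newest packages of one repo, one folder per architecture,
# then generate metadata in each folder.
# REPO_ID, DEST and ARCHES are hypothetical defaults; adjust them to your setup.

REPO_ID="${REPO_ID:-puppetlabs-products}"
DEST="${DEST:-/repos/puppet/el/6}"
ARCHES="${ARCHES:-x86_64 i386}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Sync a single architecture into $DEST/<arch> and build its repodata.
sync_arch() {
    local arch="$1"
    # -n: newest only, -a: architecture, -r: repo id, -p: download path
    run reposync -n -a "$arch" -r "$REPO_ID" -p "$DEST/$arch" --norepopath
    run createrepo "$DEST/$arch"
}

# Loop over every architecture listed in ARCHES.
sync_all() {
    local arch
    for arch in $ARCHES; do
        sync_arch "$arch"
    done
}

sync_all
```

Run it once as-is to inspect the commands, then again with DRY_RUN=0 when they look right.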
Solution 2
You can do this with pulp and the yum rpm distributor plugin.
When configuring a new repo, to get only one version of each rpm, set the retain_old_count parameter:
retain_old_count
Count indicating how many old rpm versions to retain; by default it will
download all versions available.
So something along the lines of:
$ pulp-admin rpm repo create \
--repo-id=rhel6-puppet-products \
--relative-url=rhel6-puppet-products \
--feed=http://yum.puppetlabs.com/el/6/products/ \
--retain-old-count 1
$ pulp-admin rpm repo sync run \
--repo-id=rhel6-puppet-products
Should achieve what you want. There is a quick start guide which should give you an idea of how the thing works, in case you have not tried it before.
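If you want all four Puppet Labs feeds from the question, the same two commands can be looped. This is only a sketch: the repo id naming scheme (puppet-el-6-products and so on) is something I made up for illustration, and the pulp-admin options are just the ones shown above. DRY_RUN defaults to 1, so nothing is created until you set DRY_RUN=0.

```shell
#!/bin/bash
# Sketch: create and sync one Pulp repo per Puppet Labs feed.
# The feed paths come from the question; the repo ids are hypothetical.

BASE_URL="http://yum.puppetlabs.com"
FEEDS="el/6/products el/6/dependencies el/5/products el/5/dependencies"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Turn a feed path into a repo id, e.g. el/6/products -> puppet-el-6-products.
repo_id_for() {
    echo "puppet-$(echo "$1" | tr '/' '-')"
}

# Create the repo for one feed, then start a sync.
mirror_feed() {
    local feed="$1"
    local id
    id="$(repo_id_for "$feed")"
    run pulp-admin rpm repo create \
        --repo-id="$id" \
        --relative-url="$id" \
        --feed="$BASE_URL/$feed/" \
        --retain-old-count 1
    run pulp-admin rpm repo sync run --repo-id="$id"
}

for feed in $FEEDS; do
    mirror_feed "$feed"
done
```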
Stefan Lasiewski
Updated on September 18, 2022
Comments
-
Stefan Lasiewski over 1 year
I would like to mirror the following Yum/RPM repositories at http://yum.puppetlabs.com/ :
- http://yum.puppetlabs.com/el/6/products/
- http://yum.puppetlabs.com/el/6/dependencies/
- http://yum.puppetlabs.com/el/5/products
- http://yum.puppetlabs.com/el/5/dependencies/
The Puppet repository contains every Puppet product ever released and is quite large at about 8GB. I only need to mirror the newest versions of the files.
I have tried to mirror the repository using reposync --newest-only:
reposync --config=puppetlabs.repo.el6 --repoid=puppetlabs-products --repoid=puppetlabs-deps --newest-only --download_path=el/6 --quiet --downloadcomps
and this downloads the newest packages like I need. However, reposync doesn't automatically create the regular directory structure (x86_64, noarch, SRPMS, etc.) and doesn't mirror repodata.xml. As a result, my yum clients get errors like this:
[root@web1 ~]# yum --quiet install puppet
http://mirrors.example.org/pub/puppet/el/6/puppetlabs-deps/x86_64/repodata/repomd.xml: [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404 Not Found"
Trying other mirror.
Error: Cannot retrieve repository metadata (repomd.xml) for repository: puppetlabs-deps. Please verify its path and try again
[root@web1 ~]#
Is there a way to programmatically mirror only the new files from a Yum repo and follow the standard repository directory structure?
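For context on the 404 above: yum requests repodata/repomd.xml under each baseurl, so any directory served as a repo needs that file. A minimal sketch for checking a local mirror before pointing clients at it follows. The function names are mine, and the directories in the usage note are only what the reposync command above would be expected to produce.

```shell
#!/bin/bash
# Sketch: report which mirror directories are missing yum metadata.

# Succeeds if DIR contains repodata/repomd.xml, the file yum clients fetch first.
has_repomd() {
    [ -f "$1/repodata/repomd.xml" ]
}

# Print one status line per directory passed as an argument.
report_repos() {
    local dir
    for dir in "$@"; do
        if has_repomd "$dir"; then
            echo "ok: $dir"
        else
            echo "missing repomd.xml: $dir (run createrepo on it)"
        fi
    done
}

report_repos "$@"
```

Saved under a name of your choosing, it would be called with the directories to check, for example el/6/puppetlabs-products el/6/puppetlabs-deps.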
-
Michael Hampton about 10 years
I don't know offhand, but it's an interesting question. Personally I don't worry about it. 8GB is a tiny fraction of my 276GB of local mirrors...
-
Stefan Lasiewski about 10 years
Sure I know, why fret about 8GB. I'm just trying to be efficient :) In addition, sometimes I need to quickly set up another yum mirror of the EPEL or CentOS repos, and those are quite large. I really only need the latest N versions of the packages.
-
Stefan Lasiewski about 10 years
Spacewalk looks nice, but it does way more than I need. We're already attempting provisioning using Puppet, Foreman and other systems.
-
Stefan Lasiewski about 10 years
Thanks for the tips. This sort of works, as reposync insists on appending the repoid to the directory structure. This means a command like 'reposync --config=puppetlabs.repo.el6 --repoid=puppetlabs-products --newest-only --arch=x86_64' creates a bizarrely named directory like 'x86_64/puppetlabs-products' when I simply need 'x86_64/'.
-
Florin Asăvoaie about 10 years
Yeah, I find that annoying as well, but maybe if you make it a symlink or something like that it will work. On the other hand, if you have time to invest, reposync and createrepo are Python scripts that you can modify to fit your needs :).
-
mikejonesey almost 7 years
Probably wasn't available at the time of the post, but for the benefit of future readers:
--norepopath
Don't add the reponame to the download path. Can only be used when syncing a single repository (default is to add the reponame).
-
Tim almost 7 years
Your answer really needs more context to be useful. Are you able to edit it? For example, how does this provide a repository? Is this answer in addition to another answer?