How does apt-get really work?

You need to take a look at https://wiki.debian.org/Packaging — the packaging tutorial there will help you a lot, as well as parts of the new maintainer's guide.

As to your questions, in order:

  1. The repository contains "list" files, e.g. http://http.us.debian.org/debian/dists/stretch/main/binary-amd64/Packages.xz. apt-get update downloads these list files and stores them in /var/lib/apt/lists. The list files list all the packages, including a bunch of metadata and a relative path to each .deb. (Once decompressed, they're human-readable plain-text files, so you can just look at them.)

  2. OS doesn't matter. You could host it on Windows if you wanted. (Well, you might have trouble with file names Windows doesn't like.) (See also #4 and #5.)

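  3. Yes, it's inside the deb file. A deb file is actually an ar archive. Inside are a small debian-binary version file and two tar files: control.tar.* (package metadata and maintainer scripts) and data.tar.* (the package's files), which is (essentially) extracted to /.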

  4. It's just HTTP (or HTTPS, or FTP, or... apt-get supports a lot of protocols). Nothing special, though. Note that the Release files are signed with gpg, which guarantees integrity even without HTTPS. Debian mirrors mostly use HTTP, not HTTPS. (A few support HTTPS as well, for confidentiality.)

  5. It's just a structured filesystem.
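To make item 3 concrete, here is a minimal Python sketch that lists the members of a .deb's ar container. It assumes the common ar layout (an 8-byte "!<arch>\n" magic, then a 60-byte header per member, with member data padded to an even length), and the function name ar_members is hypothetical. In practice you would just run ar t on the file, or dpkg-deb -c to list the files it would install.

```python
import sys
from typing import BinaryIO, Iterator, Tuple

AR_MAGIC = b"!<arch>\n"  # global header at the start of every ar archive
HEADER_LEN = 60          # fixed size of each member header


def ar_members(f: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, data) for each member of an ar archive such as a .deb."""
    if f.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    while True:
        header = f.read(HEADER_LEN)
        if not header:
            return  # clean end of archive
        if len(header) != HEADER_LEN or header[58:60] != b"`\n":
            raise ValueError("corrupt ar member header")
        # Name is space-padded; GNU ar may also append a trailing "/".
        name = header[0:16].decode("ascii").rstrip(" ").rstrip("/")
        size = int(header[48:58].decode("ascii"))  # decimal size field
        data = f.read(size)
        if len(data) != size:
            raise ValueError("truncated ar member")
        if size % 2:
            f.read(1)  # member data is padded to an even offset
        yield name, data


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as deb:
        for member_name, member_data in ar_members(deb):
            print(f"{member_name}: {len(member_data)} bytes")
```

Pointed at a real .deb, it should print debian-binary followed by the control and data tarballs; the compression suffix (.gz, .xz, .zst) depends on how the package was built.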

A quick, high-level overview of how apt-get interacts with a package source:

  1. You configure which sources to look at in your sources.list file. Consider a line like:

    deb http://http.us.debian.org/debian/ stretch main
    

    deb says this is a source for getting .deb (binary) files; then come the URL prefix, the suite/release ("stretch"), and the component ("main").

  2. apt-get has a list of architectures, which it gets from dpkg. Let's say dpkg --print-architecture prints amd64. apt-get can now build the URLs it's actually going to download from by combining the URL prefix, the word "dists", the suite, the component, and the architecture (as binary-amd64). Then it tacks on a few fixed file names, like "Packages.xz". That gives the URL above (in #1). There are a few more files with defined names/paths, like the Release file http://http.us.debian.org/debian/dists/stretch/Release and its signature (same, with .gpg appended). These are all (possibly compressed) plain-text files. The Release file contains checksums for the other files apt-get is going to download, like Packages.xz.

  3. The Packages.xz file lists all the packages in that suite/component/architecture. For each package it also gives the path where its .deb file is located, for example pool/main/0/0ad/0ad_0.0.21-2_amd64.deb.

  4. When you ask apt-get to download a package, it appends that path to the base URL, so that package is at http://http.us.debian.org/debian/pool/main/0/0ad/0ad_0.0.21-2_amd64.deb

  5. The other interesting directory is source, which sits alongside binary-amd64. It's used for your deb-src entries; it contains info about source packages (and is otherwise fairly similar).

  6. There are some other things (all of them optional, I believe) that can be part of the repository (i.e., available via HTTP): diffs between different versions of the Packages.xz file; translations of package descriptions; a complete list of every installable file and which package it belongs to (Contents-amd64.gz, used by e.g. apt-file, not by apt-get); etc. These likely aren't relevant to you, but you can see them all by browsing around http://http.us.debian.org/debian/dists/stretch/; most of them are plain-text files.
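Steps 1 through 4 can be sketched in Python as well. This is a minimal illustration, not how apt is implemented: it assumes the one-line sources.list format without [option] brackets, always asks for Packages.xz (real apt consults the Release file to see which index files and compressions exist), and leaves out downloading, decompression, and signature checking. SourceEntry, parse_sources_line, index_urls, parse_stanzas, and deb_url are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SourceEntry:
    kind: str              # "deb" (binary packages) or "deb-src" (source packages)
    uri: str               # URL prefix, normalized to end with "/"
    suite: str             # e.g. "stretch"
    components: List[str]  # e.g. ["main"]


def parse_sources_line(line: str) -> SourceEntry:
    """Parse a one-line sources.list entry (no [option] brackets supported)."""
    fields = line.split()
    if len(fields) < 4 or fields[0] not in ("deb", "deb-src"):
        raise ValueError(f"unsupported sources.list line: {line!r}")
    return SourceEntry(fields[0], fields[1].rstrip("/") + "/", fields[2], fields[3:])


def index_urls(entry: SourceEntry, arch: str) -> List[str]:
    """URL prefix + "dists" + suite + component + binary-ARCH (or source) + index name."""
    base = f"{entry.uri}dists/{entry.suite}/"
    if entry.kind == "deb":
        return [f"{base}{c}/binary-{arch}/Packages.xz" for c in entry.components]
    return [f"{base}{c}/source/Sources.xz" for c in entry.components]


def parse_stanzas(text: str) -> List[Dict[str, str]]:
    """Split decompressed Packages text into blank-line-separated 'Field: value' stanzas."""
    stanzas: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    last: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():              # a blank line ends the current stanza
            if current:
                stanzas.append(current)
            current, last = {}, None
        elif line[0] in " \t" and last:   # continuation of the previous field
            current[last] += "\n" + line[1:]
        else:
            key, _, value = line.partition(":")
            last = key.strip()
            current[last] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def deb_url(entry: SourceEntry, stanza: Dict[str, str]) -> str:
    """The Filename field is relative to the same URL prefix as the sources.list line."""
    return entry.uri + stanza["Filename"]
```

With the sources.list line from step 1 and amd64, index_urls produces the Packages.xz URL from #1, and passing a stanza whose Filename is pool/main/0/0ad/0ad_0.0.21-2_amd64.deb to deb_url gives the pool URL from step 4.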

All these files are plain text. They can, in theory, be created by hand. In practice, everyone uses a repository generation tool. Here (and I caution that this was a choice made a long time ago, so it may be outdated) we use mini-dinstall. The output of those tools is ordinary files or, at worst, symlinks. You can rsync them over to whatever web server you want.
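As a rough illustration of the checksum part, the Python sketch below writes the SHA256 field of a Release file for a few index files, then performs the check a client makes after downloading one. It assumes paths are given relative to the directory holding Release (dists/stretch/ in the example above) and uses single spaces between digest, size, and path. It does not produce the other Release fields or the gpg signature, which is what actually makes the chain trustworthy. sha256_field, parse_sha256_field, and verify are hypothetical names; a real repository tool handles all of this for you.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def sha256_field(dist_dir: Path, rel_paths: List[str]) -> str:
    """Build the 'SHA256:' field of a Release file: digest, size, relative path."""
    lines = ["SHA256:"]
    for rel in rel_paths:
        data = (dist_dir / rel).read_bytes()
        lines.append(f" {hashlib.sha256(data).hexdigest()} {len(data)} {rel}")
    return "\n".join(lines) + "\n"


def parse_sha256_field(release_text: str) -> Dict[str, Tuple[str, int]]:
    """Map relative path -> (hex digest, size), read from the SHA256 field."""
    entries: Dict[str, Tuple[str, int]] = {}
    in_field = False
    for line in release_text.splitlines():
        if line.startswith("SHA256:"):
            in_field = True
        elif in_field and line[:1] in (" ", "\t"):
            digest, size, rel = line.split()
            entries[rel] = (digest, int(size))
        else:
            in_field = False  # any other field ends the SHA256 block
    return entries


def verify(data: bytes, rel: str, release_text: str) -> bool:
    """Client-side check after a download: size and SHA-256 must match Release."""
    expected = parse_sha256_field(release_text).get(rel)
    actual = (hashlib.sha256(data).hexdigest(), len(data))
    return expected == actual
```

apt makes this kind of comparison for the index files it fetches, after checking the Release signature; a mismatch is where its "Hash Sum mismatch" errors come from.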

Author: user1032531

Updated on September 18, 2022

Comments

  • user1032531 (over 1 year ago)

    Okay, I understand how I may use apt-get {install|remove} mypackages and apt-get upgrade to install, upgrade, or remove binaries as well as their configuration data files and dependencies (actually, remove will only remove the binaries unless additional flags are provided).

    I am not looking for how it is used, as the man page describes this, but for what it is doing at a high level. My end goal is to create a means for me to install and manage some custom software (built by a makefile) on multiple remote machines, and I need to learn more about the process. If answers to this question depend on which distribution is used, please tailor them to Debian.

    In addition to generally how it works, I have the following specific questions:

    1. How does the client that is accessing the apt repository keep track of the files?
    2. Must the repository be hosted on the same operating system (i.e., can an apt repository be hosted on Red Hat)?
    3. How are the locations to install files specified? Is this specified by the .deb file?
    4. How is a remote machine accessing the repository? Is it just ftp(s) or http(s)?
    5. Is the machine that is hosting the repository running special software (like gitlab for a git repository), or is it just some structured file system?
  • Stephen Kitt (almost 7 years ago)
    Beat me to it ;-). unix.stackexchange.com/q/285635/86440 covers the integrity aspect of things (point 4). FTP support on the mirror side was de-activated recently IIRC.
  • user1032531 (almost 7 years ago)
    Regarding #2, debian.org/doc/manuals/distribute-deb/… states differently. Thanks
  • user1032531 (almost 7 years ago)
    Regarding #5, what is the point of wiki.debian.org/DebianRepository/…?
  • derobert (almost 7 years ago)
    @user1032531 #2. Creating packages is best done on Debian. But your web server can be anything. (Typically, you create the package on your build host, possibly even the entire repository structure, then upload it to the web server.) #5. Those tools help you build the structured filesystem, including all the list files, signed Release files, etc. (They are also probably easiest to run on Debian.)
  • user1032531 (almost 7 years ago)
    Thanks derobert. Makes sense now. Packages are specific. Distribution is agnostic.
  • user1032531 (almost 7 years ago)
    I reviewed wiki.debian.org/Packaging and the debian.org/doc/manuals/packaging-tutorial/… tutorial per your recommendation. While very comprehensive, it does not provide a high-level narrative describing how packaging repositories work. Your responses to my specific questions and your comments help, but I believe a concise high-level answer would be a welcome addition to this forum, should you be willing to write one. Regardless, thanks again for your help.
  • derobert (almost 7 years ago)
    @user1032531 I'd like to try to write one—what still remains unclear? I can try to fix that.
  • derobert (almost 7 years ago)
    @user1032531 I've added something (more than doubling the size of the answer...) — does that help?
  • user1032531 (almost 7 years ago)
    Yes, derobert, it does. While details are important, having high level context allows me to organize the details. Thank you
  • Alen Milakovic (almost 7 years ago)
    It might be worth emphasizing that Packages and Contents normally live server-side, i.e., they are not downloaded, by apt-get at least.
  • derobert (almost 7 years ago)
    @FaheemMitha Packages (of some compression, modern apt prefers .xz) is definitely downloaded, winds up in /var/lib/apt/lists/. Contents is downloaded by apt-file and auto-apt.
  • Marco Dufal (almost 4 years ago)
    What do you guys know about amazon linux repositories?