On GitHub API - what is the best way to get the last commit message associated with each file?

Solution 1

Ok, after figuring out that what you need is the latest commit message for each file, here's what you can do.

First, get the list of files in your repository. To do this, you need to:

1) fetch the reference object of the branch that you want to list files for:

GET https://api.github.com/repos/:owner/:repo/git/refs/heads/:branch

You probably want the master branch, so this is an example of the request you will make:

https://api.github.com/repos/izuzak/pmrpc/git/refs/heads/master

The response you will get will look like this:

{
  "ref": "refs/heads/master",
  "url": "https://api.github.com/repos/izuzak/pmrpc/git/refs/heads/master",
  "object": {
    "sha": "fd6973f430a3367ad718ff049f1b075843913d6f",
    "type": "commit",
    "url": "https://api.github.com/repos/izuzak/pmrpc/git/commits/fd6973f430a3367ad718ff049f1b075843913d6f"
  }
}

2) fetch the commit object that the reference points to, using the object.url property of the response you received in the previous step:

GET https://api.github.com/repos/izuzak/pmrpc/git/commits/fd6973f430a3367ad718ff049f1b075843913d6f

The response you will get will look like this:

{
  "sha": "fd6973f430a3367ad718ff049f1b075843913d6f",
  "url": "https://api.github.com/repos/izuzak/pmrpc/git/commits/fd6973f430a3367ad718ff049f1b075843913d6f",
  "html_url": "https://github.com/izuzak/pmrpc/commits/fd6973f430a3367ad718ff049f1b075843913d6f",
  "author": {
    "name": "Ivan Zuzak",
    "email": "[email protected]",
    "date": "2013-04-09T08:55:45Z"
  },
  "committer": {
    "name": "Ivan Zuzak",
    "email": "[email protected]",
    "date": "2013-04-09T08:55:45Z"
  },
  "tree": {
    "sha": "f5f5de80f67dd794ffbd4abb855fb7d1a573660e",
    "url": "https://api.github.com/repos/izuzak/pmrpc/git/trees/f5f5de80f67dd794ffbd4abb855fb7d1a573660e"
  },
  "message": "fix typos",
  "parents": [
    {
      "sha": "d3617ae56dda793131e743b2ff394984bbab6ca3",
      "url": "https://api.github.com/repos/izuzak/pmrpc/git/commits/d3617ae56dda793131e743b2ff394984bbab6ca3",
      "html_url": "https://github.com/izuzak/pmrpc/commits/d3617ae56dda793131e743b2ff394984bbab6ca3"
    }
  ]
}

3) fetch the tree object of the commit object fetched in the previous step. You will do this by following the tree.url link provided in the response of the previous step:

GET https://api.github.com/repos/izuzak/pmrpc/git/trees/f5f5de80f67dd794ffbd4abb855fb7d1a573660e

The response will look like this:

{
  "sha": "f5f5de80f67dd794ffbd4abb855fb7d1a573660e",
  "url": "https://api.github.com/repos/izuzak/pmrpc/git/trees/f5f5de80f67dd794ffbd4abb855fb7d1a573660e",
  "tree": [
    {
      "mode": "100644",
      "type": "blob",
      "sha": "726f21a4adec8c24c2fab6cf5b455d094a8b21bf",
      "path": "LICENSE.markdown",
      "size": 568,
      "url": "https://api.github.com/repos/izuzak/pmrpc/git/blobs/726f21a4adec8c24c2fab6cf5b455d094a8b21bf"
    },
    {
      "mode": "100644",
      "type": "blob",
      "sha": "eb94760b81441b34a73d1b085d9f153ae48b0e63",
      "path": "README.markdown",
      "size": 5772,
      "url": "https://api.github.com/repos/izuzak/pmrpc/git/blobs/eb94760b81441b34a73d1b085d9f153ae48b0e63"
    },
    {
      "mode": "040000",
      "type": "tree",
      "sha": "2e72b217b8644ce6874cda03387a7ab2d8eee55e",
      "path": "examples",
      "url": "https://api.github.com/repos/izuzak/pmrpc/git/trees/2e72b217b8644ce6874cda03387a7ab2d8eee55e"
    },
    {
      "mode": "100644",
      "type": "blob",
      "sha": "64b0dbe4981759c0f9640c8e882c97c7324fc798",
      "path": "pmrpc.js",
      "size": 24546,
      "url": "https://api.github.com/repos/izuzak/pmrpc/git/blobs/64b0dbe4981759c0f9640c8e882c97c7324fc798"
    }
  ]
}

These are all the files and folders at the top level of the repository. Note, however, that for folders you need to recursively fetch the folder's tree object to get the list of files it contains. In the response above, examples is a folder, which you can tell from the tree value of its type property. So, you would make another GET request to the url provided with the folder:

  GET https://api.github.com/repos/izuzak/pmrpc/git/trees/2e72b217b8644ce6874cda03387a7ab2d8eee55e
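
The ref and commit lookups above can be chained in code to arrive at the tree URL. Here is a minimal Go sketch of one way to do that, using only the standard library. The names fetcher, resolveTreeURL, refURL and httpFetch are made up for illustration; only the JSON field names (object.url, tree.url) come from the responses shown above. The HTTP call is injected as a function so the chaining logic can run offline against canned responses.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// fetcher returns the raw response body for a URL (hypothetical helper type).
type fetcher func(url string) ([]byte, error)

// gitRef holds the one field needed from the refs response (step 1).
type gitRef struct {
	Object struct {
		URL string `json:"url"`
	} `json:"object"`
}

// gitCommit holds the one field needed from the commit response (step 2).
type gitCommit struct {
	Tree struct {
		URL string `json:"url"`
	} `json:"tree"`
}

// refURL builds the step 1 URL for a branch.
func refURL(owner, repo, branch string) string {
	return fmt.Sprintf("https://api.github.com/repos/%s/%s/git/refs/heads/%s", owner, repo, branch)
}

// resolveTreeURL follows ref -> commit and returns the commit's tree URL (the URL used in step 3).
func resolveTreeURL(fetch fetcher, owner, repo, branch string) (string, error) {
	body, err := fetch(refURL(owner, repo, branch))
	if err != nil {
		return "", err
	}
	var ref gitRef
	if err := json.Unmarshal(body, &ref); err != nil {
		return "", fmt.Errorf("decode ref: %w", err)
	}
	if ref.Object.URL == "" {
		return "", fmt.Errorf("ref response has no object.url")
	}
	body, err = fetch(ref.Object.URL)
	if err != nil {
		return "", err
	}
	var commit gitCommit
	if err := json.Unmarshal(body, &commit); err != nil {
		return "", fmt.Errorf("decode commit: %w", err)
	}
	if commit.Tree.URL == "" {
		return "", fmt.Errorf("commit response has no tree.url")
	}
	return commit.Tree.URL, nil
}

// httpFetch is a plain, unauthenticated fetcher for use against the live API.
func httpFetch(url string) ([]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GET %s: %s", url, resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	// Canned responses trimmed from the examples above, so this runs offline.
	// Pass httpFetch instead of canned to query the live API.
	const commitURL = "https://api.github.com/repos/izuzak/pmrpc/git/commits/fd6973f430a3367ad718ff049f1b075843913d6f"
	const treeURL = "https://api.github.com/repos/izuzak/pmrpc/git/trees/f5f5de80f67dd794ffbd4abb855fb7d1a573660e"
	responses := map[string]string{
		refURL("izuzak", "pmrpc", "master"): `{"object":{"url":"` + commitURL + `"}}`,
		commitURL:                           `{"tree":{"url":"` + treeURL + `"}}`,
	}
	canned := func(url string) ([]byte, error) {
		body, ok := responses[url]
		if !ok {
			return nil, fmt.Errorf("no canned response for %s", url)
		}
		return []byte(body), nil
	}
	got, err := resolveTreeURL(canned, "izuzak", "pmrpc", "master")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```

Injecting the fetcher is just a convenience for testing; a real client would probably also add authentication headers.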

An alternative approach is to get the list of all files (in all folders) with just one request, by adding the recursive=1 parameter to the tree request, as described in the Git Trees API documentation. I suggest you use this approach since it requires just a single HTTP request.
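
As a rough illustration of the recursive=1 approach, the Go sketch below builds the recursive tree URL and separates files from folders by the type property. The helper names (recursiveTreeURL, splitEntries) are invented for this example. The truncated field is part of GitHub's tree response and signals that the listing was cut off for a very large repository, in which case you would have to fall back to fetching subtrees one by one. The sample JSON is trimmed from the non-recursive response shown earlier; in a recursive response the path values include the folder prefix.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// treeEntry mirrors the fields of one item in the "tree" array.
type treeEntry struct {
	Path string `json:"path"`
	Type string `json:"type"` // "blob" for files, "tree" for folders
	SHA  string `json:"sha"`
}

// gitTree mirrors the parts of the tree response used here.
type gitTree struct {
	Tree      []treeEntry `json:"tree"`
	Truncated bool        `json:"truncated"`
}

// recursiveTreeURL builds the URL that lists every entry under a tree in one request.
func recursiveTreeURL(owner, repo, treeSHA string) string {
	return fmt.Sprintf("https://api.github.com/repos/%s/%s/git/trees/%s?recursive=1", owner, repo, treeSHA)
}

// splitEntries decodes a tree response and returns file paths and folder paths separately.
func splitEntries(body []byte) (files, folders []string, truncated bool, err error) {
	var t gitTree
	if err = json.Unmarshal(body, &t); err != nil {
		return nil, nil, false, err
	}
	for _, e := range t.Tree {
		switch e.Type {
		case "blob":
			files = append(files, e.Path)
		case "tree":
			folders = append(folders, e.Path)
		}
	}
	return files, folders, t.Truncated, nil
}

func main() {
	// Trimmed copy of the tree response shown earlier.
	sample := []byte(`{
	  "sha": "f5f5de80f67dd794ffbd4abb855fb7d1a573660e",
	  "tree": [
	    {"type": "blob", "path": "LICENSE.markdown"},
	    {"type": "blob", "path": "README.markdown"},
	    {"type": "tree", "path": "examples"},
	    {"type": "blob", "path": "pmrpc.js"}
	  ]
	}`)
	files, folders, truncated, err := splitEntries(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("request:", recursiveTreeURL("izuzak", "pmrpc", "f5f5de80f67dd794ffbd4abb855fb7d1a573660e"))
	fmt.Println("files:", files)
	fmt.Println("folders:", folders)
	fmt.Println("truncated:", truncated)
}
```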

Next, now that you have the list of files and folders in the repo, get the last commit that changed each file or folder. To do that, make this request:

GET https://api.github.com/repos/:owner/:repo/commits?path=FILE_OR_FOLDER_PATH

So, for example, this is a request to fetch the commits for the examples folder mentioned above:

GET https://api.github.com/repos/izuzak/pmrpc/commits?path=examples

The response you will get is a list of commit objects. Look only at the first object in that list (since you are interested in the last commit for the file) and read its commit.message property to get the message you need:

[
  {
    "sha": "3437f015257683a86e3b973b3279754df9ac2b24",
    "commit": {
      "author": { ... },
      "committer": { ... },
      "message": "change mode",
      "tree": { ... },
      "url": "https://api.github.com/repos/izuzak/pmrpc/git/commits/3437f015257683a86e3b973b3279754df9ac2b24",
      "comment_count": 0
    },
    ...
  },
  {
    ...
  }
]

In this case, the message for the latest commit that changed the folder examples is "change mode."
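
Here is a small Go sketch of this step, under a couple of assumptions. The function names commitsURL and latestMessage are made up for the example, and the request adds per_page=1 so that only the newest commit is returned, as suggested in the comments. The sample body is a trimmed version of the response above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/url"
)

// commitItem mirrors the fields used from one element of the commits list.
type commitItem struct {
	SHA    string `json:"sha"`
	Commit struct {
		Message string `json:"message"`
	} `json:"commit"`
}

// commitsURL builds the list-commits URL for a file or folder path.
// per_page=1 limits the response to the most recent commit.
func commitsURL(owner, repo, path string) string {
	q := url.Values{}
	q.Set("path", path)
	q.Set("per_page", "1")
	return fmt.Sprintf("https://api.github.com/repos/%s/%s/commits?%s", owner, repo, q.Encode())
}

// latestMessage returns the message of the first (newest) commit in the response body.
func latestMessage(body []byte) (string, error) {
	var items []commitItem
	if err := json.Unmarshal(body, &items); err != nil {
		return "", err
	}
	if len(items) == 0 {
		return "", errors.New("no commits found for this path")
	}
	return items[0].Commit.Message, nil
}

func main() {
	// Trimmed copy of the response shown above.
	sample := []byte(`[{"sha":"3437f015257683a86e3b973b3279754df9ac2b24","commit":{"message":"change mode"}}]`)

	fmt.Println(commitsURL("izuzak", "pmrpc", "examples"))
	msg, err := latestMessage(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(msg)
}
```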

So, basically, you need to make 3 HTTP requests to fetch the list of files, and then 1 HTTP request for each file and folder. The bad news is that if you have lots of files, you will be making lots of HTTP requests. The good news is that you can cache responses so that you don't need to repeat requests when nothing has changed (see the GitHub API documentation on conditional requests for more info). Also, you will not be fetching all the commit messages at once; you will fetch them as the user navigates through the folders (just as on GitHub when you click on folders). Thus you should easily be able to stay within the limit of 5000 requests per hour.
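
Assuming the caching mentioned above means HTTP conditional requests (sending back the ETag from a previous response in an If-None-Match header and reusing the stored body on 304 Not Modified), a minimal Go sketch could look like this. The etagCache type and its methods are hypothetical, the cache is in-memory only and not safe for concurrent use, and the ETag value in main is a placeholder rather than a real one.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// cached holds the last body seen for a URL together with its ETag.
type cached struct {
	ETag string
	Body []byte
}

// etagCache is a minimal in-memory cache keyed by URL.
type etagCache map[string]cached

// newRequest builds a GET request, adding If-None-Match when the URL was seen before.
func (c etagCache) newRequest(url string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	if entry, ok := c[url]; ok && entry.ETag != "" {
		req.Header.Set("If-None-Match", entry.ETag)
	}
	return req, nil
}

// get returns the body for url, reusing the cached copy on 304 Not Modified.
func (c etagCache) get(client *http.Client, url string) ([]byte, error) {
	req, err := c.newRequest(url)
	if err != nil {
		return nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusNotModified:
		// Nothing changed since the cached response.
		return c[url].Body, nil
	case http.StatusOK:
		body, err := io.ReadAll(resp.Body)
		if err != nil {
			return nil, err
		}
		c[url] = cached{ETag: resp.Header.Get("ETag"), Body: body}
		return body, nil
	default:
		return nil, fmt.Errorf("GET %s: %s", url, resp.Status)
	}
}

func main() {
	const u = "https://api.github.com/repos/izuzak/pmrpc/commits?path=examples"

	// Pretend an earlier call stored this entry; the ETag is a placeholder value.
	cache := etagCache{u: {ETag: `"placeholder-etag"`, Body: []byte("[]")}}

	req, err := cache.newRequest(u)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("If-None-Match:", req.Header.Get("If-None-Match"))
	// A real call would be: body, err := cache.get(http.DefaultClient, u)
}
```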

Hope this helps! And let me know if you find an easier way to do this :). I don't know if there's a way to achieve this with just 1-2 requests, which is probably what you expected.

Solution 2

I'm listing commits on the repository, then grabbing the first one and reading its SHA, and it works great:

https://developer.github.com/v3/repos/commits/#list-commits-on-a-repository

In Go it looks something like this:

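import (
    "context"
    "errors"
    "log"

    "github.com/google/go-github/github" // newer releases use a versioned path (.../vNN/github)
)

// GetLatestCommit returns the SHA of the newest commit on the default branch.
// Assumes a go-github version whose ListCommits takes a context first.
func GetLatestCommit(ctx context.Context, owner, repo string, sgc *github.Client) (string, error) {
    commits, res, err := sgc.Repositories.ListCommits(ctx, owner, repo, &github.CommitsListOptions{})
    if err != nil {
        log.Printf("err: %v res: %v", err, res)
        return "", err
    }
    if len(commits) == 0 {
        return "", errors.New("repository has no commits")
    }

    sha := commits[0].GetSHA() // nil-safe accessor for the *string SHA field
    log.Printf("last commit: %s", sha)
    return sha, nil
}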

Author: Marcos Scriven

Updated on June 04, 2022

Comments

  • Marcos Scriven almost 2 years

    So far as I understand it, messages are associated with commits. But when you look at a repo on GitHub it helpfully lists the message by each file, for when it was last changed.

    I'd like to replicate that in a web view of a repo I have. Looking at the GitHub API, it seems to me the only way to get that info is to download all the commits (which can be paged) and work back from the most recent ones, assigning commit messages to the files in a local cache, going further and further back until you have the message for every file. That could mean going all the way to the very first commit if any of the files have not been changed since then.

    Question is, is that the right way to do it? Is that not going to kill even the 5000/hr quota?

  • Marcos Scriven about 11 years
    Ticked as the answer as it's very comprehensive, thanks. However, it's the call to get the last commit for every file that I was hoping to avoid. I think what I'll do is have a background cron job that gets a few every so often, limiting the rate it does so. It's not crucial information, but it is useful I think
  • Ivan Zuzak about 11 years
    @MarcosScriven Thanks! And yeah, I understand why that's an issue for you. The cron job idea sounds cool, though. Nice!
  • novemberkilo over 10 years
    Nice! If you just want the latest commit that affected the file, you can make pagination work for you like so: GET https://api.github.com/repos/izuzak/pmrpc/commits?path=examples&page=1&per_page=1
  • mutant_america about 7 years
    What if repo is private?