How to auto-deploy Git repositories with submodules on AWS?

Solution 1

Edit: CodeBuild now has a "submodules" flag: https://docs.aws.amazon.com/codebuild/latest/APIReference/API_GitSubmodulesConfig.html
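
For reference, here is a rough sketch of turning that flag on from the command line. It assumes a standalone CodeBuild project with a GitHub source; the project name and repository URL are placeholders I made up, and the script only prints the command unless APPLY=1 is set.

```shell
#!/bin/sh
# Hypothetical project name and repository URL; replace with your own.
PROJECT_NAME="my-build-project"
REPO_URL="https://github.com/example/app.git"

# Source definition with GitSubmodulesConfig.fetchSubmodules enabled.
SOURCE_JSON=$(printf '{"type":"GITHUB","location":"%s","gitSubmodulesConfig":{"fetchSubmodules":true}}' "$REPO_URL")

# Print the command for review; set APPLY=1 to actually call the AWS CLI.
if [ "${APPLY:-0}" = "1" ]; then
  aws codebuild update-project --name "$PROJECT_NAME" --source "$SOURCE_JSON"
else
  echo "aws codebuild update-project --name $PROJECT_NAME --source '$SOURCE_JSON'"
fi
```

Whether this helps when the project is driven by CodePipeline is a separate question; the answers below cover that case.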

Here's what worked for me.

We're going to reinitialize the Git repository and then trigger a submodule clone at the start of the build, essentially patching submodule support into CodePipeline / CodeBuild.

  • Generate a new SSH key for your GitHub account. If you are using an organization, you may want to create a dedicated deploy user.
  • Store the private key in the AWS Parameter Store using aws ssm put-parameter --name build_ssh_key --type String --value "$(cat id_rsa)". Ideally use SecureString instead of String; the guide I was following simply used String, so I'm not sure whether the command line requires any extra parameters for that.
  • Go into IAM and grant your CodePipeline user read access to the Parameter Store. I just granted read access to SSM.
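
The first two steps could look roughly like the sketch below. It assumes a passphrase-less RSA key and the SecureString type, which uses the account's default KMS key when no --key-id is given; the key file name is a placeholder. Commands are only printed unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry-run helper: prints each command unless APPLY=1 is set in the environment.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

KEY_FILE="./codebuild_deploy_key"   # hypothetical file name
PARAM_NAME="build_ssh_key"          # parameter name used by this answer

# 1. Generate a key pair without a passphrase (a build cannot type one in).
run ssh-keygen -t rsa -b 4096 -N "" -C "codebuild-deploy" -f "$KEY_FILE"

# 2. Store the private key encrypted; file:// makes the CLI read the value from disk.
run aws ssm put-parameter --name "$PARAM_NAME" --type SecureString --value "file://$KEY_FILE"
```

The matching public key (the .pub file) is what gets added to the GitHub account or deploy user.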

Then make your buildspec.yml look like the following:

version: 0.2

env:
  parameter-store:
    build_ssh_key: "build_ssh_key"

phases:
  install:
    commands:
      - mkdir -p ~/.ssh
      - echo "$build_ssh_key" > ~/.ssh/id_rsa
      - chmod 600 ~/.ssh/id_rsa
      - ssh-keygen -F github.com || ssh-keyscan github.com >>~/.ssh/known_hosts
      - git config --global url."git@github.com:".insteadOf "https://github.com/"
      - git init
      - git remote add origin <Your Repo url here using the git protocol>
      - git fetch
      - git checkout -t origin/master
      - git submodule update --init --recursive
  build:
    commands:
      - echo '...replace with real build commands...'

artifacts:
  files:
    - '**/*'
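
If you want to see what the insteadOf line in the buildspec does before running a real build, a small local check along these lines may help. It uses a throwaway repository so the global config stays untouched, and "example/lib.git" is a made-up repository path.

```shell
#!/bin/sh
# Create a scratch repository and apply the same URL rewrite rule to it.
DEMO_REPO=$(mktemp -d)
git init -q "$DEMO_REPO"
git -C "$DEMO_REPO" config url."git@github.com:".insteadOf "https://github.com/"

# --get-url prints the URL git would actually contact, without touching the network.
REWRITTEN=$(git -C "$DEMO_REPO" ls-remote --get-url "https://github.com/example/lib.git")
echo "$REWRITTEN"
```

With the rule in place, HTTPS submodule URLs in .gitmodules are fetched over SSH, which is why the deploy key is enough.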

Solution 2

After banging my head against this all day, I've found a simple solution (for CodePipeline) that doesn't require any SSH key juggling in the buildspec. I am using Bitbucket, but I would think this works for other providers too. I'm also cloning my submodule via HTTPS; I'm not sure whether that's a requirement.

  1. Configure your source to do a full clone of the repository. This will pass along the git metadata that you need.

  2. Configure your build role to add a customer-managed UseConnection permission to give your build action access to the credentials you configured for your source. Documentation from AWS here: https://docs.aws.amazon.com/codepipeline/latest/userguide/troubleshooting.html#codebuild-role-connections

  3. Set up your env to include git-credential-helper: yes and clone the submodule in your buildspec.yml:
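
The buildspec from this step was only posted as a screenshot, so the following is a guess at its shape rather than the author's exact file. The sketch writes a minimal buildspec.yml into a scratch directory; the submodule command and the build placeholder are assumptions.

```shell
#!/bin/sh
# Write a minimal buildspec.yml into a scratch directory (hypothetical layout).
WORK_DIR=$(mktemp -d)
cat > "$WORK_DIR/buildspec.yml" <<'EOF'
version: 0.2

env:
  git-credential-helper: yes   # let git reuse the credentials of the source connection

phases:
  install:
    commands:
      # Fetch every submodule listed in .gitmodules at its recorded commit
      - git submodule update --init --recursive
  build:
    commands:
      - echo '...replace with real build commands...'
EOF
cat "$WORK_DIR/buildspec.yml"
```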

And that's it! The submodule will be available for the build, without having to do a bunch of key configuration for every submodule you want to use.

Maybe a good addition to the documentation if this ends up being useful for people.

Solution 3

I ran into this issue myself and, thanks to the awesome suggestions by @matt-bucci I was able to come up with what seems like a robust solution.

My specific use-case is slightly different - I am using Lambda Layers to reduce lambda redundancy, but still need to include the layers as submodules in the Lambda function repositories so that CodeBuild can build and test PRs. I am also using CodePipeline to assist with continuous delivery - so I need a system that works with both CodePipeline and CodeBuild by itself

  1. I created a new SSH key for use by a "machine user" following these instructions. I am using a machine user in this case so that a new ssh key doesn't need to be generated for every project, as well as for potential support of multiple private submodules

  2. I stored the private key in the AWS Parameter Store as a SecureString. This doesn't actually change anything within CodeBuild, since it's smart enough to just know how to decrypt the key

  3. I gave the "codebuild" role the AWS managed policy AmazonSSMReadOnlyAccess, allowing CodeBuild to access the private key

  4. I made my buildspec.yml file, using a bunch of the commands suggested by @matt-bucci, as well as some new ones

# This example buildspec will enable submodules for CodeBuild projects that are both 
# triggered directly and via CodePipeline
#
# This buildspec is designed with help from Stack Overflow: 
# https://stackoverflow.com/questions/42712542/how-to-auto-deploying-git-repositories-with-submodules-on-aws
version: 0.2  # Always use version 0.2
env:
  variables:
    # The remote origin that will be used if building through CodePipeline
    remote_origin: "git@github.com:your/gitUri"
  parameter-store:
    # The SSH RSA Key used by our machine user
    ssh_key: "ssh_key_name_goes_here"
phases:
  install:
    commands:
      # Add the "machine user's" ssh key and activate it - this allows us to get private (sub) repositories
      - mkdir -p ~/.ssh                   # Ensure the .ssh directory exists
      - echo "$ssh_key" > ~/.ssh/ssh_key  # Save the machine user's private key
      - chmod 600 ~/.ssh/ssh_key          # Adjust the private key permissions (avoids a critical error)
      - eval "$(ssh-agent -s)"            # Initialize the ssh agent
      - ssh-add ~/.ssh/ssh_key            # Add the machine user's key to the ssh "keychain"
      # SSH Credentials have been set up. Check for a .git directory to determine if we need to set up our git package
      - |
        if [ ! -d ".git" ]; then
          git init                                              # Initialize Git
          git remote add origin "$remote_origin"                # Add the remote origin so we can fetch
          git fetch                                             # Get all the things
          git checkout -f "$CODEBUILD_RESOLVED_SOURCE_VERSION"  # Checkout the specific commit we are building
        fi
      # Now that setup is complete, get submodules
      - git submodule update --init --recursive
      # Additional install steps... (npm install, etc)
  build:
    commands:
      # Build commands...
artifacts:
  files:
    # Artifact Definitions...

This install script performs three discrete steps:

  1. It installs and enables the ssh private key used to access private repositories

  2. It determines if there is a .git folder - if there isn't, the script will initialize git and check out the exact commit that is being built. Note: According to the AWS docs, the $CODEBUILD_RESOLVED_SOURCE_VERSION environment variable is not guaranteed to be present in CodePipeline builds. However, I have not seen this fail

  3. Finally, it actually gets the submodules
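
Since the resolved-version variable is not guaranteed in CodePipeline builds, one defensive option is to fall back to a branch when it is empty. This is only a sketch of that idea, not part of the buildspec above; the helper name and the origin/master default are assumptions.

```shell
#!/bin/sh
# Pick the ref to check out: the resolved commit when CodeBuild provides it,
# otherwise a fallback branch passed as the first argument.
resolve_checkout_ref() {
  fallback_ref="${1:-origin/master}"
  if [ -n "${CODEBUILD_RESOLVED_SOURCE_VERSION:-}" ]; then
    echo "$CODEBUILD_RESOLVED_SOURCE_VERSION"
  else
    echo "$fallback_ref"
  fi
}

# Inside the buildspec's "if [ ! -d .git ]" branch this could be used as:
#   git checkout -f "$(resolve_checkout_ref origin/master)"
echo "Would check out: $(resolve_checkout_ref origin/master)"
```

The trade-off is that a fallback build may not correspond to the exact commit that triggered the pipeline.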

Obviously, this is not a great solution to this problem. However, it's the best I can come up with given the (unnecessary) limitations of CodePipeline. A side effect of this process is that the "Source" CodePipeline stage is completely worthless, since we just overwrite the archived source files - it's only used to listen for changes to the repository

Better functionality has been requested for over 2 years now: https://forums.aws.amazon.com/thread.jspa?threadID=248267

Edited January 23, 2019

I realized (the hard way) that my previous response didn't support CodePipeline builds, only builds run through CodeBuild directly. When CodeBuild responds to a GitHub Webhook, it will clone the entire GitHub repository, including the .git folder

However, when using CodePipeline, the "Source" action will clone the repository, check out the appropriate branch, then package the raw files as an artifact without the .git folder. This means that we do have to initialize the Git repository ourselves to get access to submodules

Solution 4

While @MattBucci's answer works, it has the caveat that you can only pull a specific branch, and not the specific commit that the submodule is pinned to.

In order to handle that case, which is likely when using submodules, several things need to be done:

1) Create a git pre-commit hook with the following content:

#!/bin/bash

#   This file is used in post-commit hook
#   if .commit exists you know a commit has just taken place but a post-commit hasn't run yet
#
touch .commit

If you already have one, you can add that line at the beginning.

2) Create a git post-commit hook with the following content:

#!/bin/bash


DIR=$(git rev-parse --show-toplevel);

if [[ -e $DIR/.commit ]]; then
    echo "Generating submodule integrity file"
    rm .commit

    SUBMODULE_TRACKING_FILE=$DIR/.submodule-hash
    MODULE_DIR=module
    #   Get submodule hash, this will be used by AWS Code Build to pull the correct version.
    #   AWS Code Build does not support git submodules at the moment
    #   https://forums.aws.amazon.com/thread.jspa?messageID=764680#764680
    git ls-tree $(git symbolic-ref --short HEAD) $MODULE_DIR/ | awk '{ print $3 }' > $SUBMODULE_TRACKING_FILE

    git add $SUBMODULE_TRACKING_FILE
    git commit --amend -C HEAD --no-verify
fi

exit 0

This hook writes the submodule's current commit hash to the .submodule-hash file. This file needs to be committed to version control, which is why the hook amends the commit.
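
To make the awk step less magic, here is a small sketch of what it parses. git ls-tree prints a submodule as a "gitlink" entry, and the third field is the pinned commit. The hash below is a made-up placeholder, not a real commit.

```shell
#!/bin/sh
# A gitlink (submodule) entry as printed by `git ls-tree <ref> <path>`:
#   <mode> SP <type> SP <object> TAB <path>
SAMPLE_ENTRY=$(printf '160000 commit %s\t%s' "0123456789abcdef0123456789abcdef01234567" "module")

# Field 3 is the commit the parent repository pins the submodule to.
PINNED_COMMIT=$(printf '%s\n' "$SAMPLE_ENTRY" | awk '{ print $3 }')
echo "$PINNED_COMMIT"
```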

3) Go to your AWS CodeBuild project:

Developer Tools > CodeBuild > Build projects > YOUR_PROJECT > Edit Environment

Add an environment variable called GIT_KEY whose value is the SSH private key, base64-encoded (without line breaks, otherwise it won't work).

You can convert it online, or use any tool or programming language.
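
If you would rather not paste a private key into a website, the encoding can be done locally. A minimal sketch, using a fake two-line file as a stand-in for the real key:

```shell
#!/bin/sh
# Stand-in for a private key file; a real key would come from ssh-keygen.
KEY_FILE=$(mktemp)
printf 'line one of a fake key\nline two of a fake key\n' > "$KEY_FILE"

# Encode and strip the line wrapping that some base64 implementations add.
GIT_KEY=$(base64 < "$KEY_FILE" | tr -d '\n')

# Decoding, as build/aws-pre-build.sh does, should reproduce the file byte for byte.
DECODED_FILE=$(mktemp)
printf '%s' "$GIT_KEY" | base64 -d > "$DECODED_FILE"
cmp -s "$KEY_FILE" "$DECODED_FILE" && echo "round trip OK"
```

The value of GIT_KEY is what goes into the CodeBuild environment variable.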

4) In your buildspec.yml, add a pre_build script:

version: 0.2

phases:
  pre_build:
    commands:
      - bash build/aws-pre-build.sh
...

5) Create build/aws-pre-build.sh with the following content:

#!/bin/bash

set -e

#   Get root path
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && cd .. && pwd )"

MODULE_HASH=$(cat $DIR/.submodule-hash);
GIT_HOST=bitbucket.org
MODULE_DIR=module
REPO=user/repo.git


if [[ ! -d ~/.ssh ]]; then
    mkdir ~/.ssh
fi

if [[ ! -f ~/.ssh/known_hosts ]]; then
    touch ~/.ssh/known_hosts
fi

#   Base64 decode private key, and save it to ~/.ssh/git
echo "- Adding git private key"

echo "$GIT_KEY" | base64 -d > ~/.ssh/git

#   Add correct permissions to key
chmod 600 ~/.ssh/git

#   Add $GIT_HOST to ssh config
echo "- Adding ssh config file"

cat > ~/.ssh/config <<_EOF_
Host $GIT_HOST
    User git
    IdentityFile ~/.ssh/git
    IdentitiesOnly yes
_EOF_

#   Check if host is present in known_hosts
echo "- Checking $GIT_HOST in known_hosts"

if ! ssh-keygen -F $GIT_HOST > /dev/null; then
    echo "- Adding $GIT_HOST to known hosts"
    ssh-keyscan -t rsa $GIT_HOST >> ~/.ssh/known_hosts
fi

#   AWS CodeBuild does not send submodules, remove the empty folder
rm -rf "$DIR/$MODULE_DIR"

# Clone submodule in the right folder (paths are anchored at the repo root)
git clone "git@$GIT_HOST:$REPO" "$DIR/$MODULE_DIR"

# cd to submodule
cd "$DIR/$MODULE_DIR"

# Checkout the right commit
echo "- Checking out $MODULE_HASH"

git checkout $MODULE_HASH
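
One thing the script above does not do is check that .submodule-hash actually contains a commit hash before calling git checkout. A possible guard, assuming SHA-1 repositories (40 hex characters); the function name and sample values are hypothetical:

```shell
#!/bin/sh
# Return success only for a full 40-character lowercase hex SHA-1.
is_commit_hash() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{40}$'
}

# Hypothetical values for illustration.
GOOD_HASH="0123456789abcdef0123456789abcdef01234567"
BAD_HASH="master"

for candidate in "$GOOD_HASH" "$BAD_HASH"; do
  if is_commit_hash "$candidate"; then
    echo "ok: $candidate"
  else
    echo "rejected: $candidate"
  fi
done
```

In the pre-build script this could run right after MODULE_HASH is read, failing the build early if the file is empty or stale.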


Extras

If you have an extra step before going to AWS CodeBuild, something like Bitbucket Pipelines or similar, you can check that the actual git submodule hash matches the hash from the generated file, .submodule-hash.

If it does not match, it means whoever pushed didn't have the git hooks installed.

#!/bin/bash

MODULE_DIR=module

echo "- Checking submodules integrity"

SUBMODULE_TRACKING_FILE=.submodule-hash


#   Check submodule hash, this will be used by AWS Code Build to pull the correct version.
#   AWS Code Build does not support git submodules at the moment
#   https://forums.aws.amazon.com/thread.jspa?messageID=764680#764680

#   Git submodule actual hash
SUBMODULE_HASH=$(git ls-tree $(git symbolic-ref --short HEAD) $MODULE_DIR/ | awk '{ print $3 }')

if [[ ! -e $SUBMODULE_TRACKING_FILE ]]; then

    echo "ERROR: $SUBMODULE_TRACKING_FILE file not found."

    exit 1
fi

#   Custom submodule hash - This is used by AWS CodeBuild
SUBMODULE_TRACKING_FILE_HASH=$(cat $SUBMODULE_TRACKING_FILE)

if [[ "$SUBMODULE_TRACKING_FILE_HASH" != "$SUBMODULE_HASH"  ]]; then

    echo "ERROR: $SUBMODULE_TRACKING_FILE file content does not match submodule hash: $SUBMODULE_HASH"

    echo -e "\tYou should have pre-commit && post-commit hook enabled or update $SUBMODULE_TRACKING_FILE manually:"
    echo -e "\tcmd: git ls-tree $(git symbolic-ref --short HEAD) $MODULE_DIR/ | awk '{ print \$3 }' > $SUBMODULE_TRACKING_FILE"

    exit 1
fi

NOTE: You can also create that file on the pipeline before AWS Code Build, create a commit, tag it, and push it so the AWS Code Build pipeline begins.

git ls-tree $(git symbolic-ref --short HEAD) module/ | awk '{ print $3 }' > .submodule-hash

Solution 5

SSH is not needed if you're using CodeCommit as a repository. Use the AWS CLI credential helper and clone over HTTPS.

git config --global credential.helper '!aws codecommit credential-helper $@'
git config --global credential.UseHttpPath true
git clone https://git-codecommit.[region].amazonaws.com/v1/repos/[repo]
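
For submodules, this presumably means the URLs in .gitmodules should be CodeCommit HTTPS URLs too, so the credential helper applies to them. A sketch of writing such an entry with git config; the submodule name, region, and repository name are placeholders:

```shell
#!/bin/sh
# Write a .gitmodules entry into a scratch directory (hypothetical names throughout).
GITMODULES="$(mktemp -d)/.gitmodules"
git config -f "$GITMODULES" submodule.module.path module
git config -f "$GITMODULES" submodule.module.url "https://git-codecommit.us-east-1.amazonaws.com/v1/repos/my-lib"
cat "$GITMODULES"
```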

Author: Varun Nayyar

Updated on June 03, 2022

Comments

  • Varun Nayyar almost 2 years

    I have a submodule in my git repository and my directory structure is like,

    app
      -- folder1
      -- folder2
      -- submodule @5855
    

    I have deployed my code on AWS by using autodeploy service. Now, on server I have code in the parent-directory but submodule directories are empty.

    Q1) How can I get data in submodules. My repository on server is not git repository. Do I need to convert it firstly into git repo and then run submodule commands to get it ?

    Q2) How can I automate the submodule deployment as well?

    Thanks

  • Martin Kosicky over 5 years
    This AWS CodeCommit is really trash when we can't use submodules. I guess I will just make a batch script, which is a lot easier than these non-working CI/CD "tools"
  • Sergey Nikitin about 5 years
    What if ssh-add requires passphrase to be entered? How can it be automated/bypassed?
  • imekinox over 3 years
    This actually works. For those using CodePipeline, you need to do this: go to the CodeBuild project, Edit, Sources, add a new source, select GitHub, connect via OAuth or a personal access token, set the repo, and save (you can delete the source later). What you need is to create a connection in CodeBuild, not CodePipeline. This will enable the credential helper in the CodeBuild container. You can debug by checking the build logs: right below "Git credential helper enabled" you should not see: GITHUB Git credential unavailable. Or BITBUCKET if that's the case. Hope this helps someone.
  • Jeremy over 3 years
    This doesn't work for me with BitBucket: Git credential helper enabled GITHUB Git credential unavailable. GITHUB_ENTERPRISE Git credential unavailable.. Not sure why it thinks it's GitHub. I don't use that for any projects. Also confirmed I have the correct codestar statement for the role. This is strange.. man this is insane they still lack support for submodules. CDK is such a pain
  • Jason Yu about 3 years
    Thanks @imekinox ! This works with our CodePipeline + CodeBuild + GitHub submodules setup. Without the CodeBuild source update, it will prompt remote: Invalid username or password. and then fatal: Authentication failed for 'https://github.com/xxx.git/' and fatal: clone of 'https://github.com/xxx.git' into submodule path '/codebuild/output/xxx' failed
  • megapixel23 almost 3 years
    This + @imekinox guide works perfectly. I can only add that for bitbucket you must specify the HTTPS link to the repo, not the GIT based.
  • Joe Keene almost 3 years
    Thanks for this it worked perfectly - I also had to add - ssh-keygen -F github.com || ssh-keyscan github.com >>~/.ssh/known_hosts to get it to connect to the github repo. Cheers
  • Kushan Gunasekera almost 3 years
    You're welcome @JoeKeene, glad to hear that.
  • Hexy over 2 years
    Did you write this blog? adrianhesketh.com/2018/05/02/…
  • imekinox over 2 years
    For Github, you should use HTTPS too, not GIT-based URL.
  • kewur over 2 years
    This is the only solution that works with CodePipeline + CodeBuild. Hopefully they will add submodule support in the CodePipeline source actions so we don't have to do this hack anymore.
  • shaikhspear almost 2 years
    CodeBuild supports GitSubmodulesConfig. How can I integrate it with my pipeline of type "AWS::CodePipeline::Pipeline"?