How to validate a code repo migration: tags, commit history, etc.

Our source repos are currently in Bitbucket and contain about 10 years of check-ins. We are now moving our code to GitLab.

Pipeline fixes are NOT part of the current migration plan. Still, how do we ensure and validate that the code migration is a success? We want to confirm that the structure, tags, commit history, and branches are migrated correctly, including each branch's structure, tags, and commit history. Are there any Git commands we can run on both the source and target repos after migration to help us validate?

We have >100 repos to be migrated…

To ensure and validate that your code migration from Bitbucket to GitLab is successful, you need to verify key aspects of your repositories, such as branch structure, commit history, tags, and overall integrity. Below is a step-by-step process, including Git commands and automation ideas, to validate the migration:


1. Migration Process

Ensure you migrate using a reliable approach that preserves the Git history. Common approaches include:

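  • Using git clone --mirror to clone the Bitbucket repository. Unlike a normal clone, a mirror clone copies every ref (all branches and tags), not only the default branch.
  • Pushing the cloned repository to GitLab using git push --mirror. This pushes all refs and removes any ref on the GitLab side that does not exist in the mirror.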
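The two steps above can be sketched as a small Bash function. This is a minimal sketch, not a finished tool. It assumes the empty GitLab project already exists and that your Git credentials can read from Bitbucket and push to GitLab. The name migrate_repo and its three arguments are hypothetical.

```bash
#!/usr/bin/env bash
# migrate_repo (hypothetical helper): copy ONE repository, with every branch
# and tag, from a source URL to a target URL via a local mirror clone.
# Assumes the (empty) target project already exists and you can push to it.
# Usage: migrate_repo SOURCE_URL TARGET_URL MIRROR_DIR
migrate_repo() {
  local src_url="$1" tgt_url="$2" mirror_dir="$3"
  # Step 1: --mirror fetches all refs (branches, tags, ...) into a bare repository.
  # Step 2: --mirror pushes every ref and removes target refs not in the mirror.
  git clone --quiet --mirror "$src_url" "$mirror_dir" &&
    git -C "$mirror_dir" push --quiet --mirror "$tgt_url"
}
```

The function returns a non-zero status if either step fails, so a batch loop can detect it. Keeping the mirror directory afterwards is useful, because it gives you a local copy of the source for the validation commands in section 3.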

2. Key Elements to Validate

The main elements to validate after migration are:

  • Branches: Ensure all branches in the source repository are present in the target.
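  • Commit History: Verify that every branch points to the same commit SHA. A commit ID is a hash over the commit's tree, metadata, message, and parent IDs, so identical SHAs mean identical history.
  • Tags: Check that all tags, both annotated and lightweight, have been migrated and point to the same objects.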
  • File Structure: Validate that the file and folder structure matches.
  • Authors: Verify that commit authors are preserved.

3. Git Commands for Validation

a. List All Branches

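Run the following command against both the Bitbucket and the GitLab repository URLs:

git ls-remote --heads <repo_url>

This lists every branch on the server together with the commit SHA it points to. Identical output therefore means both the branch names and the branch tips match. Running git branch -a in a local clone also works, but its output has drawbacks for comparison:
  • It depends on what has been fetched.
  • It marks the checked-out branch with *.
Both make a direct diff noisy.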
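A minimal sketch of that comparison follows. It assumes both repository URLs are reachable with your current Git credentials, and compare_branches is a hypothetical helper name. The function captures each listing first, so an unreachable URL is reported as an error. Otherwise two empty lists would look "identical".

```bash
#!/usr/bin/env bash
# compare_branches (hypothetical helper): diff the branch lists of two repos.
# Each line is "<commit-sha><TAB>refs/heads/<name>", so the diff catches both
# missing branches and branches whose tip commit differs.
# Returns 0 if identical, 1 if they differ, 2 if a repository cannot be read.
compare_branches() {
  local src tgt
  src=$(git ls-remote --heads "$1") || { echo "cannot read $1" >&2; return 2; }
  tgt=$(git ls-remote --heads "$2") || { echo "cannot read $2" >&2; return 2; }
  diff <(sort <<<"$src") <(sort <<<"$tgt")
}
```

Call it with the two clone URLs, for example compare_branches <bitbucket_clone_url> <gitlab_clone_url>. Lines starting with < exist only in Bitbucket, and lines starting with > exist only in GitLab.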

b. List All Tags

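Run the following command against both repository URLs to compare tags:

git ls-remote --tags <repo_url>

Unlike git tag, which lists only tag names in a local clone, this shows each tag with the object it points to. For annotated tags it also prints a peeled refs/tags/<name>^{} line with the commit the tag refers to.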
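When tag lists are long, a plain diff is hard to read. The sketch below uses a hypothetical helper named report_tag_names. It runs git ls-remote --tags --refs, which omits the peeled ^{} lines, and pipes the names through comm to print two lists:
  • tags missing from the target;
  • tags that exist only in the target.
It compares names only. Checking what each tag points to is covered in step e.

```bash
#!/usr/bin/env bash
# report_tag_names (hypothetical helper): compare tag NAMES between two repos.
# Prints missing/extra tags; returns 0 only if the name sets are identical.
report_tag_names() {
  local src tgt missing extra
  src=$(git ls-remote --tags --refs "$1") || return 2
  tgt=$(git ls-remote --tags --refs "$2") || return 2
  # Keep only the ref name (2nd column) and sort, as comm requires sorted input.
  src=$(awk '{print $2}' <<<"$src" | sort)
  tgt=$(awk '{print $2}' <<<"$tgt" | sort)
  missing=$(comm -23 <(printf '%s\n' "$src") <(printf '%s\n' "$tgt"))
  extra=$(comm -13 <(printf '%s\n' "$src") <(printf '%s\n' "$tgt"))
  if [ -n "$missing" ]; then printf 'Missing in target:\n%s\n' "$missing"; fi
  if [ -n "$extra" ]; then printf 'Only in target:\n%s\n' "$extra"; fi
  [ -z "$missing" ] && [ -z "$extra" ]
}
```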

c. Compare Commit Histories

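For each branch, compare the commit history:

git log <branch_name> --oneline

Alternatively, print full commit hashes. Then the output does not depend on how many characters Git uses for abbreviated IDs in each repository:

git log <branch_name> --pretty=format:"%H %an %ad %s"

A faster check is git rev-parse <branch_name> on both sides. A commit ID is a hash over the commit's tree, metadata, message, and parent IDs, so if the branch tips match, the entire history behind them matches. git rev-list --count <branch_name> adds a commit count that is easy to report. Automate this comparison using a script to iterate through all branches.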
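A minimal sketch of that automation follows. It assumes both repositories are available as local mirror clones (git clone --mirror), so every branch exists as a local ref. compare_history is a hypothetical helper. It relies on the fact that a commit ID covers its parents' IDs: a matching branch tip SHA implies the whole history behind it matches, so one rev-parse per branch replaces a full log diff.

```bash
#!/usr/bin/env bash
# compare_history (hypothetical helper): compare every branch's tip commit
# between two local mirror clones. Identical tip SHAs imply identical history,
# because each commit ID covers its parents' IDs.
compare_history() {
  local src="$1" tgt="$2" ref src_tip tgt_tip status=0
  for ref in $(git -C "$src" for-each-ref --format='%(refname)' refs/heads); do
    src_tip=$(git -C "$src" rev-parse "$ref")
    # --verify --quiet: print nothing and fail if the branch is missing.
    tgt_tip=$(git -C "$tgt" rev-parse --verify --quiet "$ref") || tgt_tip="(missing)"
    if [ "$src_tip" = "$tgt_tip" ]; then
      echo "OK       $ref ($(git -C "$src" rev-list --count "$ref") commits)"
    else
      echo "MISMATCH $ref source=$src_tip target=$tgt_tip"
      status=1
    fi
  done
  return "$status"
}
```

This only walks branches that exist in the source. Branches that exist only in GitLab show up when the branch lists are compared (step a).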

d. Compare File Structures

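For each branch, list every file path and compare the lists. No checkout is needed, because git ls-tree reads the branch's commit directly. This also works in a bare mirror clone.

git ls-tree -r <branch_name> --name-only

To compare contents as well as names, compare the tree ID instead. git rev-parse <branch_name>^{tree} is identical in both repositories only if every path, file mode, and file content is identical. Automate this check using a script.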
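A minimal sketch of that check for all branches follows. It assumes two local mirror clones, and compare_trees is a hypothetical helper. It first compares each branch's tree ID, which matches only if every path, file mode, and file content is the same. On a mismatch it diffs the full git ls-tree -r listings to show which entries differ.

```bash
#!/usr/bin/env bash
# compare_trees (hypothetical helper): compare the file tree at the tip of every
# branch in two local mirror clones, without checking anything out.
compare_trees() {
  local src="$1" tgt="$2" ref src_tree tgt_tree status=0
  for ref in $(git -C "$src" for-each-ref --format='%(refname)' refs/heads); do
    src_tree=$(git -C "$src" rev-parse "$ref^{tree}")
    tgt_tree=$(git -C "$tgt" rev-parse --verify --quiet "$ref^{tree}") || tgt_tree=""
    [ "$src_tree" = "$tgt_tree" ] && continue
    status=1
    if [ -z "$tgt_tree" ]; then
      echo "Branch missing in target: $ref"
    else
      echo "Tree differs on $ref:"
      # Each line is "<mode> <type> <object-id><TAB><path>", so content changes show too.
      diff <(git -C "$src" ls-tree -r "$ref") <(git -C "$tgt" ls-tree -r "$ref")
    fi
  done
  return "$status"
}
```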

e. Verify Annotated Tags

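Run this command in a local clone of each repository to verify that all tags exist and point to the same objects:

git show-ref --tags -d

Without -d (--dereference), an annotated tag is shown with the ID of the tag object itself. With -d, an extra <tag>^{} line shows the commit ID the tag points to.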
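To compare tags across two local clones in a single pass, the sketch below uses git for-each-ref. It assumes local mirror clones, and list_tag_objects and compare_tag_objects are hypothetical helpers. For each tag it prints:
  • the object type;
  • the object ID;
  • for annotated tags, the commit the tag points to.
This also catches an annotated tag that was recreated as a lightweight tag, even though its name and commit still match.

```bash
#!/usr/bin/env bash
# list_tag_objects (hypothetical helper): one line per tag:
#   <refname> <objecttype> <objectname> <dereferenced object>
# objecttype is "tag" for annotated tags and "commit" for lightweight tags;
# %(*objectname) is the pointed-to object for annotated tags, empty otherwise.
list_tag_objects() {
  git -C "$1" for-each-ref \
    --format='%(refname) %(objecttype) %(objectname) %(*objectname)' refs/tags
}

# compare_tag_objects (hypothetical helper): returns 0 only if every tag matches.
compare_tag_objects() {
  # Fail loudly if a path is not a repository, instead of diffing two empty lists.
  git -C "$1" rev-parse --git-dir >/dev/null || return 2
  git -C "$2" rev-parse --git-dir >/dev/null || return 2
  diff <(list_tag_objects "$1") <(list_tag_objects "$2")
}
```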

f. Validate Commit Authors

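Use the following to list all commit authors across every branch and tag. Without --all, only the current branch is scanned.

git log --all --pretty=format:"%an <%ae>" | sort -u

Run it in both repositories and compare the output to ensure all authors are preserved after migration.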
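The sketch below runs the author listing on two local clones and diffs the results. It assumes local mirror clones, and author_counts and compare_authors are hypothetical helpers. It counts commits per author (uniq -c) instead of listing names only, so commits missing from a particular author also show up as a count difference.

```bash
#!/usr/bin/env bash
# author_counts (hypothetical helper): "<count> <name> <email>" per author,
# over every commit reachable from any ref (--all). --format adds a newline
# after every entry, including the last one.
author_counts() {
  git -C "$1" log --all --format='%an <%ae>' | sort | uniq -c
}

# compare_authors (hypothetical helper): returns 0 only if the counts match.
compare_authors() {
  git -C "$1" rev-parse --git-dir >/dev/null || return 2
  git -C "$2" rev-parse --git-dir >/dev/null || return 2
  diff <(author_counts "$1") <(author_counts "$2")
}
```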


4. Automate Validation Using Scripts

Given the number of repositories, automating these checks with a script is essential. Below is an example Bash script to compare two repositories:

#!/bin/bash
# Compares two local MIRROR clones (created with git clone --mirror), so that
# every branch and tag exists as a local ref in both paths.
# Exits non-zero if any check finds a difference.

if [ $# -ne 2 ]; then
  echo "usage: $0 SOURCE_REPO TARGET_REPO" >&2
  exit 2
fi

SOURCE_REPO="$1"  # mirror clone of the Bitbucket repo
TARGET_REPO="$2"  # mirror clone of the GitLab repo
status=0

# Stop early if a path is not a repository (two empty lists would "match").
for repo in "$SOURCE_REPO" "$TARGET_REPO"; do
  git -C "$repo" rev-parse --git-dir >/dev/null || exit 2
done

# Print "<object-id> <full-ref-name>" for every ref under the given prefix.
list_refs() {
  git -C "$1" for-each-ref --format='%(objectname) %(refname)' "$2"
}

validate_branches() {
  echo "Validating branches (names and tip commits)..."
  diff <(list_refs "$SOURCE_REPO" refs/heads) <(list_refs "$TARGET_REPO" refs/heads) || status=1
}

validate_tags() {
  echo "Validating tags (names and tag/commit IDs)..."
  diff <(list_refs "$SOURCE_REPO" refs/tags) <(list_refs "$TARGET_REPO" refs/tags) || status=1
}

validate_commit_history() {
  echo "Validating commit history for each branch..."
  for ref in $(git -C "$SOURCE_REPO" for-each-ref --format='%(refname)' refs/heads); do
    echo "Checking branch: $ref"
    # Full hashes (%H) avoid false differences caused by abbreviation length.
    diff <(git -C "$SOURCE_REPO" log "$ref" --pretty=format:"%H %an %ad %s") \
         <(git -C "$TARGET_REPO" log "$ref" --pretty=format:"%H %an %ad %s") || status=1
  done
}

validate_structure() {
  echo "Validating file structure for each branch..."
  for ref in $(git -C "$SOURCE_REPO" for-each-ref --format='%(refname)' refs/heads); do
    echo "Checking branch: $ref"
    # ls-tree reads the commit directly: no checkout needed, works in bare clones.
    diff <(git -C "$SOURCE_REPO" ls-tree -r --name-only "$ref") \
         <(git -C "$TARGET_REPO" ls-tree -r --name-only "$ref") || status=1
  done
}

# Run validation functions
validate_branches
validate_tags
validate_commit_history
validate_structure
exit "$status"

  • Save this script as validate_repos.sh and make it executable (chmod +x validate_repos.sh).
  • Execute it with the source and target repository paths:
./validate_repos.sh /path/to/bitbucket/repo /path/to/gitlab/repo

5. Handle Large Number of Repositories

For more than 100 repositories, use a repository list file and process them in a batch:

  1. Create a file repo_list.txt with one source/target repository pair per line:
bitbucket_repo1 gitlab_repo1
bitbucket_repo2 gitlab_repo2
...
  2. Loop through this list and run the validation script for each pair. Redirecting the script's stdin from /dev/null keeps it from consuming the rest of the list:
while read -r src tgt; do
  ./validate_repos.sh "$src" "$tgt" < /dev/null
done < repo_list.txt
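If the list holds clone URLs rather than local paths, the batch step can also create the mirror clones. The sketch below is a hypothetical helper, validate_all. It assumes each line of the list file is "SOURCE_URL TARGET_URL" and that repository names (the last URL segment) are unique across the list. For each pair it:
  • clones both sides;
  • diffs all branch and tag refs with their object IDs;
  • writes one .diff file per repository.
It then prints a PASS/FAIL summary.

```bash
#!/usr/bin/env bash
# validate_all (hypothetical helper): mirror-clone each pair listed in LIST_FILE
# ("SOURCE_URL TARGET_URL" per line) into WORK_DIR and compare all branch and
# tag refs. Prints PASS/FAIL per repository; returns non-zero on any FAIL.
validate_all() {
  local list="$1" work="$2" src tgt name failed=0
  mkdir -p "$work"
  while read -r src tgt; do
    [ -z "$src" ] && continue                 # skip blank lines
    name=$(basename "$src" .git)              # assumes unique repository names
    rm -rf "$work/$name-src.git" "$work/$name-tgt.git"
    # </dev/null keeps git from reading the rest of the list from stdin.
    if git clone -q --mirror "$src" "$work/$name-src.git" </dev/null &&
       git clone -q --mirror "$tgt" "$work/$name-tgt.git" </dev/null &&
       diff <(git -C "$work/$name-src.git" for-each-ref --format='%(objectname) %(refname)' refs/heads refs/tags) \
            <(git -C "$work/$name-tgt.git" for-each-ref --format='%(objectname) %(refname)' refs/heads refs/tags) \
            >"$work/$name.diff"; then
      echo "PASS $name"
    else
      echo "FAIL $name (clone error on stderr, or ref differences in $work/$name.diff)"
      failed=$((failed + 1))
    fi
  done <"$list"
  echo "$failed repositories failed"
  [ "$failed" -eq 0 ]
}
```

The mirror clones left in the work directory can be passed straight to validate_repos.sh for the per-branch history and structure checks, so each repository is downloaded only once.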

6. Post-Migration Checklist

  • Review the output of the validation scripts, for example by saving it to one log file per repository.
  • Document any discrepancies for resolution.
  • Set up GitLab CI/CD pipelines for future commits if necessary.

7. Optional: Visual Comparison Tools

For more advanced validation, you can use visual tools or services:

  • GitKraken: To visually compare commit histories and branches.
  • GitLab Import Feature: GitLab provides some built-in import validation, but this is limited.

By following these steps, you can systematically validate your migration, ensuring that every aspect of your repositories—structure, tags, branches, and commit history—has been migrated correctly. Let me know if you need help with specific parts of the process!