How to use Congregate

Quick Start

  1. Download and install Docker, Podman, or any other application that can run Docker containers.
  2. Generate a personal access token with the read_registry scope from your gitlab.com account by clicking User icon (top right) > Settings > Access Tokens > Generate Access Token.
  3. Run docker login registry.gitlab.com, enter your username, and paste the access token when prompted for a password.
  4. Pull the docker image from the container registry:
    • ✅ For official versioned releases
      • docker pull registry.gitlab.com/gitlab-org/professional-services-automation/tools/migration/congregate:<version> (or :latest-<debian|centos>)
    • ⚠ For rolling releases (latest on master):
      • docker pull registry.gitlab.com/gitlab-org/professional-services-automation/tools/migration/congregate:rolling-debian (or :rolling-centos)
  5. Run docker images -a to list the images, then copy the IMAGE ID of the Congregate image.
  6. Create and run the Congregate docker container:

    # Expose the docker socket as a volume, share the host's DNS mapping (/etc/hosts),
    # and publish the UI port
    docker run \
      --name <name> \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v /etc/hosts:/etc/hosts \
      -p 8000:8000 \
      -it <image-id> \
      /bin/bash   # or zsh

  7. Set /var/run/docker.sock permissions for ps-user by running sudo chmod 666 /var/run/docker.sock.

  8. Exit the container with Ctrl+d (this stops it).
  9. To restart the container and keep it up, and to copy over the congregate.conf template file:

    docker start <container-name>
    docker cp ./congregate.conf.template <container-name>:/opt/congregate/data/congregate.conf
    docker exec -it <container-name> /bin/bash   # OR -itd (zsh)

  10. Modify the configuration file in /opt/congregate/data/congregate.conf using the congregate.conf.template as a guide.

  11. Check out the fundamental congregate commands below.

Full Congregate setup with test environment

Follow the steps in this issue template for a full guide on how to set up a source, destination, and Congregate system to test Congregate functionality.

Congregate commands

During migrations, we generally run the same commands every time, in roughly the same order.

Congregate Process Flow

NOTE: Migrations from GitHub and Bitbucket Server (source SCMs) rely on incoming (ingress) REST API connections from GitLab (destination).

List

./congregate.sh list is the command that gathers metadata from the source system(s) to prepare for the later steps of a migration. Listing pulls all of the metadata from each of these systems and typically requires an admin token, or read-only credentials for all data, to succeed. To set these, run ./congregate.sh configure or modify data/congregate.conf with the SCM and/or CI source hostname(s) and credentials provided by the customer. If you're editing the .conf file directly, be sure to reference the congregate config template to format it correctly.

Listing can take quite a long time and is directly dependent on the amount of data in the source system(s). To tune the concurrency of a list, you can add --processes=16 (GitHub only) to the list command. Without this flag, it will use one process per hardware processing unit (nproc). Be careful going over 16 processes, as it increases the chance of putting undue load on the source systems (which, if their API rate limit isn't set, could cause stability problems).

If you need to re-list and don't want to overwrite data you have listed previously, run the command with --partial. Additional --skip-* arguments allow you to skip users, groups, projects, and CI; see the examples below.
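
Putting these options together (the exact --skip-* flag names below, --skip-users and --skip-ci, are assumed from the pattern described above):

    # Full listing; concurrency defaults to one process per hardware processing unit (nproc)
    ./congregate.sh list

    # Cap concurrency at 16 processes (GitHub sources only)
    ./congregate.sh list --processes=16

    # Re-list without overwriting previously listed data, skipping users and CI
    ./congregate.sh list --partial --skip-users --skip-ci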

If you are migrating data from CI sources alongside an SCM source, listing will also perform a mapping function to map CI jobs to SCM repositories. This positions migrations of build config XMLs into repositories for future transformation into .gitlab-ci.yml.

Stage

Staging data for migration follows the same usage pattern as staging data in a git-based workflow. Now that we've run ./congregate.sh list to get ALL of the data onto the local filesystem (a combination of a local MongoDB and .json files), we need to select which users, groups, and projects to migrate. The output of any flavor of stage command is the staged_groups.json, staged_projects.json, and staged_users.json files in the data directory.

When running stage in a shell (without the UI), we need to identify the project and group IDs that are in scope for the current migration. There are a few different ways to stage.

stage-projects

To get the project IDs from the names of the projects (which the customer should have provided), run cat data/projects.json | grep <ProjectName> -A10 -B10, where ProjectName is the name of the repository (GitHub and Bitbucket) or project (GitLab) that is in scope to migrate. From there you will see the IDs to note for the stage command, e.g. ./congregate.sh stage-projects 13 78 951 446. Note that you can list multiple project IDs, space-delimited, after the stage-projects verb. To make this command actually write staged_projects.json and the other staged .json files, add the --commit flag, as shown below.
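
For example, the full project-staging flow looks like this (the IDs are the example values from above):

    # Find the project IDs by name, with surrounding context
    cat data/projects.json | grep <ProjectName> -A10 -B10

    # Dry run: preview staging of the selected project IDs
    ./congregate.sh stage-projects 13 78 951 446

    # Write staged_projects.json and the other staged .json files
    ./congregate.sh stage-projects 13 78 951 446 --commit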

stage-groups

This process is very similar to stage-projects, but you need to search for group IDs in the groups.json file. Run cat data/groups.json | grep <GroupName> -A10 -B10. Then run ./congregate.sh stage-groups 78 651 997 --commit to produce the staged_*.json files, as shown below.
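
The same flow as for projects, collected for reference:

    # Find the group IDs by name, with surrounding context
    cat data/groups.json | grep <GroupName> -A10 -B10

    # Stage the groups by ID and write the staged_*.json files
    ./congregate.sh stage-groups 78 651 997 --commit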

stage-wave

For larger migrations, customers often want to define migration waves in a spreadsheet that Congregate can read, staging many different groups and projects at once without the initial investigation for project and group IDs. To set this up, we need to add a few lines to data/congregate.conf that look like the ones from the template. This configuration template refers to the stage-wave-template.csv file. A more verbose description of how this configuration works is in the wave definition section below.

Once this is in place, you can run ./congregate.sh stage-wave <WaveName> --commit to stage all projects and groups defined by the spreadsheet.
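
As with the other stage commands, a cautious sequence is to preview first (assuming stage-wave follows the same --commit convention as stage-projects):

    # Preview what the wave would stage (no files written without --commit)
    ./congregate.sh stage-wave <WaveName>

    # Write the staged_*.json files
    ./congregate.sh stage-wave <WaveName> --commit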

Stage using the UI

If you are running Congregate from a full desktop experience (not SSH'd or shelled into a container running on a cluster), you can use the UI to stage data after it is listed by running ./congregate.sh ui &. This gives you the ability to select the specific groups, projects, and users that you want to stage for a migration. Once you have checked all the boxes, click stage to generate the staged_*.json files required for the final step of migration.

Migrate

./congregate.sh migrate initiates the data imports on GitLab based on the information in staged_projects.json, staged_groups.json, and staged_users.json. Run it first without --commit as a dry run and check the log output to see which projects and groups will be migrated to which locations. When you are satisfied with these results, add the --commit flag to run the migration for real. Note that if you want to adjust the concurrency, you can add the --processes=n flag to this command as well.
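
A typical invocation pair looks like this (the process count of 8 is an arbitrary example value):

    # Dry run: check the log output for what will migrate where
    ./congregate.sh migrate

    # Live run with adjusted concurrency
    ./congregate.sh migrate --commit --processes=8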

In general, we like to migrate all users (excluding internal and bot users) into the destination system before doing any group or project migrations. This is because, if a user is not found during a project or group migration, there will be attribution errors on import that manifest as MRs or other objects in GitLab being owned by or attributed to the import user, which is usually root or some admin.

Migrate Users

Best practice is to first migrate ONLY users by running:

  • ./congregate.sh ui & - Open the UI in your browser (by default localhost:8000), then select and stage all users.
  • ./congregate.sh migrate --skip-group-export --skip-group-import --skip-project-export --skip-project-import - Inspect the dry-run output in:
    • data/results/dry_run_user_migration.json
    • data/logs/congregate.log
  • Inspect data/staged_users.json to see whether any of the NOT found users are inactive since, by default, they will not be migrated.
    • To explicitly remove inactive users from staged users, groups, and projects, run ./congregate.sh remove-inactive-users --commit.
  • ./congregate.sh migrate --skip-group-export --skip-group-import --skip-project-export --skip-project-import --commit
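
Collected into one sequence (the commands are exactly those listed above):

    # Dry run: migrate users only
    ./congregate.sh migrate --skip-group-export --skip-group-import \
      --skip-project-export --skip-project-import

    # Optionally drop inactive users from the staged data first
    ./congregate.sh remove-inactive-users --commit

    # Live run
    ./congregate.sh migrate --skip-group-export --skip-group-import \
      --skip-project-export --skip-project-import --commit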

Migrate Groups and Sub-Groups

Once all the users are migrated:

  • Go back to the UI, then select and stage all groups and sub-groups.
    • Only the top-level groups will be staged, as they comprise the entire tree structure.
  • ./congregate.sh search-for-staged-users - Check the output for found and NOT found users on the destination.
    • Adding the --table argument will also output the result, in the form of a table, to data/user_stats.csv.
    • All users should be found.
    • Inspect data/staged_users.json to see whether any of the NOT found users are inactive since, by default, they will not be migrated.
      • To explicitly remove inactive users from staged users, groups, and projects, run ./congregate.sh remove-inactive-users --commit.
  • ./congregate.sh migrate --skip-users --skip-project-export --skip-project-import - Inspect the dry-run output in:
    • data/results/dry_run_group_migration.json
    • data/logs/congregate.log
  • ./congregate.sh migrate --skip-users --skip-project-export --skip-project-import --commit

Migrate Projects

Once all the users, groups, and sub-groups are migrated:

  • Go back to the UI, then select and stage projects (either all, or in waves).
  • ./congregate.sh search-for-staged-users - Check the output for found and NOT found users on the destination.
    • All users should be found.
    • Inspect data/staged_users.json to see whether any of the NOT found users are inactive since, by default, they will not be migrated.
      • To explicitly remove inactive users from staged users, groups, and projects, run ./congregate.sh remove-inactive-users --commit.
  • ./congregate.sh migrate --skip-users --skip-group-export --skip-group-import - Inspect the dry-run output in:
    • data/results/dry_run_project_migration.json
    • data/logs/congregate.log
  • ./congregate.sh migrate --skip-users --skip-group-export --skip-group-import --commit

Rollback

To remove all of the staged users, groups (with sub-groups), and projects on the destination, run:

⚠ This will delete everything that was previously staged and migrated. If a significant period of time has passed since the migration, you risk losing data that users added after it completed. A default timeout on rollback, 24 hours from the time of migration, acts as a guard against accidental rollbacks.

  • ./congregate.sh rollback --hard-delete - Inspect the output in:
    • data/logs/congregate.log
  • ./congregate.sh rollback --commit
  • For more granular rollback, see Usage.

Checking the results of a migration

Much of this section is covered in our runbooks, but an overview is provided below.

Spot checking features

  • Group creation
  • Project creation
  • Membership
  • Branches
  • Commits
  • Tags
  • Merge requests
  • User attribution
  • Branch protection settings
  • Merge request approver settings

Automated diff report

TODO

Migration results.json

TODO

Congregate Configuration Items

TODO

Migration Reporting

Migration reporting is typically an exercise in data gathering and normalization to determine the success of a migration. At scale this can be difficult, since the sign-off of a migration is distributed across many people. Congregate can be configured to automatically create issues to gather migration sign-off agreement from application owners.

To configure this functionality, we need to add some lines to congregate.conf from the template. Note:

  • Congregate expects the issue1.md file to be in data/issue_templates
  • pmi_project_id is the ID of the project that will contain all of the sign-off issues
  • subs is a key-value dictionary that you can use to replace specific strings from the template with customer-specific info
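
As a rough sketch only (the section name and all keys except pmi_project_id and subs are assumptions; copy the exact lines from the congregate.conf template):

    ; hypothetical section/key names - consult the template for the real ones
    [reporting]
    ; issue template file, expected under data/issue_templates
    issue_template = issue1.md
    ; ID of the project that will contain all of the sign-off issues
    pmi_project_id = 1234
    ; key-value pairs that replace template strings with customer-specific info
    subs = {"<CUSTOMER_NAME>": "Example Corp"}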

Once issues are automatically created on ./congregate.sh migrate --commit, we can configure a stand-alone utility to poll these issues and build a CSV that can be ingested by the customer's data-analysis tool of choice.

TODO: Add info on this once we have it more complete.

Wave definition spreadsheet ingestion

The data in the stage-wave-template.csv represents the Repo/Project URLs that are in scope for a wave of migration. The required fields are:

Mandatory columns

  • Wave name - Name of a migration wave
  • Source URL - Full URL to the source repository
  • Parent Path - Destination GitLab group full_path, e.g. parent/group/path
  • Override - Binary value
    • Empty (``) - Migrate only the source repository to the destination Parent Path
    • Non-empty (any string value, e.g. "yes", "x", etc.) - Migrate the entire source repository structure (e.g. parent/sub_parent/repository) to the destination Parent Path
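
For illustration, a minimal wave definition with the mandatory columns might look like this (all values are hypothetical):

    Wave name,Source URL,Parent Path,Override
    Wave1,https://github.example.com/parent/sub_parent/repository,parent/group/path,
    Wave1,https://github.example.com/parent/sub_parent/other-repo,parent/group/path,yes

The first row's empty Override migrates only the repository directly under parent/group/path; the second row's non-empty Override also recreates the parent/sub_parent structure under it.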

Optional columns (suggestions)

  • Wave date - To keep multiple waves in a single .csv file and keep track of dates when they should be migrated
  • Approval - To keep track whether teams approve and are ready to migrate
  • etc.

Reporting

If the migration reporting feature is configured, there are two additional optional fields that facilitate creating "sign-off" issues and assigning them to application owners: Application ID and Application Owner Email. These column names are mapped in the config file to the variable names that Congregate expects.
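
If those columns are in use, the hypothetical CSV from above might be extended like so (APP-123 and owner@example.com are placeholder values):

    Wave name,Source URL,Parent Path,Override,Application ID,Application Owner Email
    Wave1,https://github.example.com/parent/sub_parent/repository,parent/group/path,,APP-123,owner@example.com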

To exercise this configuration, follow the steps in the stage-wave section above.