# How to use Congregate
## Quick Start
- Download and install Docker, Podman, or any other application that can run Docker containers.
- Generate a personal access token from your gitlab.com account with the `read_registry` permission by clicking User icon (top right) > Settings > Access Tokens > Generate Access Token.
- Run `docker login registry.gitlab.com`, provide your username, and paste the access token when prompted for a password.
- Pull the Docker image from the container registry:
  - For official versioned releases: `docker pull registry.gitlab.com/gitlab-org/professional-services-automation/tools/migration/congregate:<version>` (or `:latest-<debian|centos>`)
  - For rolling releases (latest on `master`): `docker pull registry.gitlab.com/gitlab-org/professional-services-automation/tools/migration/congregate:rolling-debian` (or `:rolling-centos`)
- Run `docker images -a` to list and copy the `image-id` of the Congregate image.
- Create and run the Congregate Docker container:

  ```bash
  # Mount the docker socket as a volume, expose the DNS mapping from
  # /etc/hosts, and publish the UI port (8000).
  docker run \
    --name <name> \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /etc/hosts:/etc/hosts \
    -p 8000:8000 \
    -it <image-id> \
    /bin/bash   # or zsh
  ```
- Set `/var/run/docker.sock` permissions for `ps-user` by running `sudo chmod 666 /var/run/docker.sock`.
- Exit (Ctrl+d) the container (this stops it).
- To resume the container and keep it up, and to copy in the `congregate.conf` template file:

  ```bash
  docker start <container-name>
  docker cp ./congregate.conf.template <container-name>:/opt/congregate/data/congregate.conf
  docker exec -it <container-name> /bin/bash   # OR -itd (zsh)
  ```

- Modify the configuration file in `/opt/congregate/data/congregate.conf` using the `congregate.conf.template` as a guide.
- Check out the fundamental Congregate commands below.
## Full Congregate setup with test environment

Follow the steps in this issue template for a full guide on how to set up a source system, a destination system, and Congregate to test Congregate functionality.
## Congregate commands

During migrations, we generally run the same types of commands every time, in roughly the same order.

NOTE: Migrations from GitHub and Bitbucket Server (source SCMs) rely on incoming (ingress) REST API connections from GitLab (destination).
### List

`./congregate.sh list` gathers metadata from the source system(s) to prepare for the later steps of a migration. Listing pulls all of the metadata from each of these systems and typically requires an admin token or read-only credentials for everything to be successful. To set these, run `./congregate.sh configure` or modify `data/congregate.conf` with the SCM and/or CI source hostname(s) and credentials provided by the customer. If you are editing the `.conf` file directly, be sure to reference the Congregate config template to format it correctly.
Listing can take quite a long time and is directly dependent on the amount of data in the source system(s). To tune the concurrency of a list, you can add `--processes=16` (GitHub only) to a `list` command. Without this flag, Congregate uses a number of processes equal to `nproc`, the hardware processor count. Be careful about going over 16 processes, as it increases the chance of causing undue harm to the source systems (which, if their API rate limit isn't set, could cause stability problems).
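The default and the suggested cap can be sketched in shell. Note that the capping logic here is manual guidance derived from the paragraph above, not something Congregate enforces for you:

```shell
# Default concurrency: one process per hardware processor, per nproc.
PROCS=$(nproc)

# Cap at 16 to limit load on the source system APIs (per the guidance above).
if [ "$PROCS" -gt 16 ]; then
  PROCS=16
fi

# The list command you would then run:
echo "./congregate.sh list --processes=$PROCS"
```

On a 32-core host this prints `--processes=16`; on a 4-core laptop, `--processes=4`.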
If you need to re-list and don't want to overwrite data you have listed previously, run the command with `--partial`. Additional `--skip-*` arguments allow you to skip users, groups, projects, and CI.

If you are migrating data from CI sources alongside an SCM source, listing will also perform a mapping function to map CI jobs to SCM repositories. This functionality positions migrations of build config XMLs into repositories for future transformation into `gitlab-ci.yml`.
### Stage

Staging data for migration follows the same usage pattern as staging data in a git-based workflow. Now that we've run `./congregate.sh list` to get ALL of the data onto the local filesystem (a combination of local MongoDB and `.json` files), we need to select which users, projects, and groups to migrate. The output of any flavor of `stage` command is the `staged_groups.json`, `staged_projects.json`, and `staged_users.json` files in the `data` directory.

When running stage in a shell (without the UI), we need to identify the project and group IDs that are in scope for the current migration. We can stage in a few different ways.
#### stage-projects

To get the project IDs from the names of the projects (which the customer should have provided), you can run `cat data/projects.json | grep <ProjectName> -A10 -B10`, where `ProjectName` is the name of the repository (GitHub and Bitbucket) or project (GitLab) that is in scope to migrate. From there you will see the ID to note for the following command: `./congregate.sh stage-projects 13 78 951 446`. Note that you can list multiple project IDs, space-delimited, after the `stage-projects` verb. To make this command actually write the `staged_projects.json` and other staged JSON files, use the `--commit` flag.
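For example, with a minimal, hypothetical stand-in for `data/projects.json` (real entries contain many more fields, and the names here are made up), the grep-then-stage flow looks like this:

```shell
# Hypothetical sample of data/projects.json; field names are illustrative.
cat > /tmp/projects.json <<'EOF'
[
  {"id": 13, "name": "billing-api", "path_with_namespace": "finance/billing-api"},
  {"id": 951, "name": "web-frontend", "path_with_namespace": "apps/web-frontend"}
]
EOF

# Find the entry for the project in scope; the matching line shows "id": 13.
grep '"billing-api"' /tmp/projects.json

# Note the ID, then stage it (add --commit to actually write staged_*.json):
# ./congregate.sh stage-projects 13 --commit
```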
#### stage-groups

This process is very similar to `stage-projects`, but you need to search for group IDs in the `groups.json` file. Run `cat data/groups.json | grep <GroupName> -A10 -B10`. Then run `./congregate.sh stage-groups 78 651 997 --commit` to produce the `staged_*.json` files.
#### stage-wave

For larger migrations, customers often want to define waves of migration in a spreadsheet that Congregate can read to stage many different groups and projects at once, without having to do the initial investigation for project and group IDs. To set this up, we need to add a few lines to `data/congregate.conf` that look like the ones from the template. The configuration template refers to the `stage-wave-template.csv` file. A more verbose description of how this configuration works is in the configuration section below.

Once this is in place, you can run `./congregate.sh stage-wave <WaveName> --commit` to stage all projects and groups defined by the spreadsheet.
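As a sketch, a wave spreadsheet using the mandatory columns documented under "Wave definition spreadsheet ingestion" below might look like this. The repository URLs and paths are made up for illustration:

```shell
# Hypothetical wave-definition CSV with the mandatory columns.
cat > /tmp/stage-wave.csv <<'EOF'
Wave name,Source URL,Parent Path,Override
Wave1,https://scm.example.com/org/repo-a,parent/group/path,
Wave1,https://scm.example.com/org/repo-b,parent/group/path,yes
Wave2,https://scm.example.com/org/repo-c,other/path,
EOF

# Rows that a run for "Wave1" would pick up (skip the header row):
awk -F, 'NR > 1 && $1 == "Wave1"' /tmp/stage-wave.csv

# Then stage that wave:
# ./congregate.sh stage-wave Wave1 --commit
```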
#### Stage using the UI

If you are running Congregate from a full desktop experience (not SSH'ed or BASH'ed into a container running on a cluster), you can use the UI to stage data after it is listed by running `./congregate.sh ui &`. This gives you the ability to select the specific groups, projects, and users that you want to stage for a migration. Once you have checked all the boxes, click stage to generate the `staged_*.json` files required for the final step of migration.
### Migrate

`./congregate.sh migrate` initiates the data imports on GitLab based on the information in `staged_projects.json`, `staged_groups.json`, and `staged_users.json`. Run it first without `--commit` as a dry run and check the log output to see which projects and groups will be migrated to which locations. When you are satisfied with these results, add the `--commit` flag to run the migration for real. Note that if you want to adjust the concurrency, you can add the `--processes=n` flag to this command as well.

In general, we like to migrate all users (excluding internal and bot users) into the destination system before doing any group or project migrations. If a user is not found during a project or group migration, there will be attribution errors on import that manifest as MRs or other objects in GitLab being owned by or attributed to the import user, which is usually root or some admin.
#### Migrate Users

Best practice is to first migrate ONLY users by running:

- `./congregate.sh ui &`
  - Open the UI in your browser (by default `localhost:8000`), then select and stage all users.
- `./congregate.sh migrate --skip-group-export --skip-group-import --skip-project-export --skip-project-import`
  - Inspect the dry-run output in:
    - `data/results/dry_run_user_migration.json`
    - `data/logs/congregate.log`
  - Inspect `data/staged_users.json` to see whether any of the NOT found users are inactive, as, by default, they will not be migrated.
  - To explicitly remove inactive users from staged users, groups, and projects, run `./congregate.sh remove-inactive-users --commit`.
- `./congregate.sh migrate --skip-group-export --skip-group-import --skip-project-export --skip-project-import --commit`
#### Migrate Groups and Sub-Groups

Once all the users are migrated:

- Go back to the UI, then select and stage all groups and sub-groups.
  - Only the top-level groups will be staged, as they comprise the entire tree structure.
- `./congregate.sh search-for-staged-users`
  - Check the output for found and NOT found users on the destination.
  - Adding the `--table` argument will output the result, in the form of a table, to `data/user_stats.csv`.
  - All users should be found.
  - Inspect `data/staged_users.json` to see whether any of the NOT found users are inactive, as, by default, they will not be migrated.
  - To explicitly remove inactive users from staged users, groups, and projects, run `./congregate.sh remove-inactive-users --commit`.
- `./congregate.sh migrate --skip-users --skip-project-export --skip-project-import`
  - Inspect the dry-run output in:
    - `data/results/dry_run_group_migration.json`
    - `data/logs/congregate.log`
- `./congregate.sh migrate --skip-users --skip-project-export --skip-project-import --commit`
#### Migrate Projects

Once all the users, groups, and sub-groups are migrated:

- Go back to the UI, then select and stage projects (either all, or in waves).
- `./congregate.sh search-for-staged-users`
  - Check the output for found and NOT found users on the destination.
  - All users should be found.
  - Inspect `data/staged_users.json` to see whether any of the NOT found users are inactive, as, by default, they will not be migrated.
  - To explicitly remove inactive users from staged users, groups, and projects, run `./congregate.sh remove-inactive-users --commit`.
- `./congregate.sh migrate --skip-users --skip-group-export --skip-group-import`
  - Inspect the dry-run output in:
    - `data/results/dry_run_project_migration.json`
    - `data/logs/congregate.log`
- `./congregate.sh migrate --skip-users --skip-group-export --skip-group-import --commit`
### Rollback

To remove all of the staged users, groups (with sub-groups), and projects on the destination, run the commands below.

This will delete everything that was previously staged and migrated. If a significant period of time has passed since the migration, you risk losing data added by users in the time since the migration completed. A default timeout on rollback, 24 hours from the time of migration, acts as a guard against accidental rollbacks.

- `./congregate.sh rollback --hard-delete`
  - Inspect the output in:
    - `data/logs/congregate.log`
- `./congregate.sh rollback --commit`
- For more granular rollback, see Usage.
## Checking the results of a migration

Much of this section is covered in our runbooks, but an overview is provided below.

### Spot checking features
- Group creation
- Project creation
- Membership
- Branches
- Commits
- Tags
- Merge requests
- User attribution
- Branch protection settings
- Merge request approver settings
### Automated diff report
TODO
### Migration `results.json`
TODO
## Congregate Configuration Items
TODO
## Migration Reporting

Migration reporting is typically an exercise in data gathering and normalization to determine the success of a migration. At scale this can be difficult, since sign-off on a migration is distributed across many people. Congregate can be configured to automatically create issues to gather migration sign-off agreement from application owners.
To configure this functionality, we need to add some lines to `congregate.conf` from the template. Note:

- `issue1.md` is expected to be in `data/issue_templates`.
- `pmi_project_id` is the ID of the project that will contain all of the sign-off issues.
- `subs` is a key-value dictionary that you can use to replace specific strings in the template with customer-specific info.
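As an illustration only, those keys might appear in `congregate.conf` along these lines. The section name, value formats, and all values below are assumptions for the sketch, so defer to the shipped `congregate.conf.template`:

```
# Hypothetical sketch -- check congregate.conf.template for the real
# section and key names.
pmi_project_id = 1234
subs = {"CUSTOMER_NAME": "Example Corp", "WAVE": "Wave1"}
```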
Once the issues are being created automatically on `./congregate.sh migrate --commit`, we can configure a stand-alone utility to poll these issues and build a CSV that can be ingested by the customer's data-analyzer tool of choice.

TODO: Add info on this once we have it more complete.
## Wave definition spreadsheet ingestion

The data in `stage-wave-template.csv` represents the repo/project URLs that are in scope for a wave of migration. The required fields are:

### Mandatory columns

- `Wave name` - Name of a migration wave
- `Source URL` - Full URL to the source repository
- `Parent Path` - Destination GitLab group `full_path`, e.g. `parent/group/path`
- `Override` - Binary value
  - Empty (`""`) - Migrate only the source repository to the destination `Parent Path`
  - Non-empty (any string value, e.g. "yes", "x", etc.) - Migrate the entire source repository structure (e.g. `parent/sub_parent/repository`) to the destination `Parent Path`
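To make the `Override` semantics concrete, here is a small illustration of how an empty versus non-empty value changes the destination layout. This is not Congregate's actual parsing code, and the URLs and paths are made up:

```shell
# Two rows that differ only in the Override column.
cat > /tmp/wave.csv <<'EOF'
Wave name,Source URL,Parent Path,Override
Wave1,https://scm.example.com/parent/sub/repo-a,dest/group,
Wave1,https://scm.example.com/parent/sub/repo-b,dest/group,x
EOF

# Empty Override: only the repository itself lands under Parent Path.
# Non-empty Override: the full source structure is recreated under Parent Path.
awk -F, 'NR > 1 {
  print $2, "->", ($4 == "" ? $3 "/<repo>" : $3 "/<full source structure>")
}' /tmp/wave.csv
```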
### Optional columns (suggestions)

- `Wave date` - To keep multiple waves in a single `.csv` file and track the dates when they should be migrated
- `Approval` - To track whether teams approve and are ready to migrate
- etc.
### Reporting

If the migration reporting feature is configured, there are two additional optional fields that facilitate creating "sign-off" issues and assigning them to application owners: `Application ID` and `Application Owner Email`. These column names are mapped in the config file to the variable names that Congregate expects.

To exercise this configuration, follow the steps in the stage-wave section.