Long Luong · 8 min read · Resources

Git Workflow for Collaborative Economics Research Projects

A practical Git workflow for starting a cookiecutter research repository, deciding what to track, and syncing with coauthors when local work is still unfinished.

Git is especially useful in economics research because a project usually changes in several places at once: data cleaning scripts, analysis code, tables, figures, and the paper itself.

This post is about a practical Git workflow for collaborative research projects. I will focus on three questions:

  • how to start a new research repository when you use a cookiecutter project template
  • how and when to add files to tracking and commit them
  • what to do each morning if you still have unfinished local work but also want to pull your collaborator’s latest changes

I assume a cookiecutter-style project with folders such as data/, src/, paper/, output/, README.md, and .gitignore. The exact folder names may differ, but the Git logic is the same.

A simple mental model

Before the commands, separate these three things:

  • your working directory: the files on your computer
  • your local repository: the Git history stored in .git/
  • your remote repository: the copy on GitHub

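Most Git confusion comes from mixing these three together. For example, git commit records a change only in your local repository; your coauthors see nothing until you run git push.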
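
To make the three layers concrete, here is a minimal sketch that runs entirely in a throwaway directory. It assumes a local bare repository as a stand-in for GitHub; the file name clean.do and the demo identity are placeholders, not part of any real project.

```shell
set -e
# Placeholder identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME="Demo Author" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo Author" GIT_COMMITTER_EMAIL="demo@example.com"

demo=$(mktemp -d)
git init -q --bare "$demo/remote.git"    # stand-in for the copy on GitHub
git init -q "$demo/project"
cd "$demo/project"

echo "use data.dta" > clean.do           # working directory: a file on disk
git add clean.do
git commit -q -m "Add cleaning script"   # local repository: history in .git/
git branch -M main
git remote add origin "$demo/remote.git"
git push -q -u origin main               # remote repository: now has the commit

echo "drop if missing(x)" >> clean.do    # a new edit lives only in the working directory
git status --short                       # lists clean.do as modified
git log --oneline origin/main            # the remote still has exactly one commit
```

The last two commands answer different questions: git status describes the working directory, while git log origin/main describes what Git last saw on the remote.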

1. Start a new research project

If you use a cookiecutter template, local-first is usually the cleanest workflow. The template already creates the research folder structure, so it is better to generate the project locally and then connect it to GitHub.

  1. Create an empty repository on GitHub.
  2. Generate the project locally from your cookiecutter template.
  3. Enter the new project folder.
  4. Check whether the template already initialized Git.
  5. If not, initialize Git, add the remote, and push the first commit.

Terminal window
cookiecutter path/to/your-research-template
cd project-name
git status

If the template did not initialize Git, do this:

Terminal window
git init
git branch -M main
git remote add origin git@github.com:your-name/project-name.git
git add .
git commit -m "Initialize research project from cookiecutter template"
git push -u origin main

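This is one of the few times git add . is usually fine. At the beginning, the template files belong to one logical unit: the initial project structure. Run git status first to confirm that the template's .gitignore already excludes anything you do not want in the first commit, such as raw data.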
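
If you want to script step 4, a minimal sketch is below. It assumes you are already inside the generated project folder (simulated here with a temporary directory and a one-line README.md) and it leaves out the remote, since that part depends on your GitHub account. The branch rename comes after the first commit, which is the order GitHub's own quick-setup instructions use.

```shell
set -e
# Placeholder identity so the demo commit works on any machine.
export GIT_AUTHOR_NAME="Demo Author" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo Author" GIT_COMMITTER_EMAIL="demo@example.com"

project=$(mktemp -d)              # stand-in for the folder cookiecutter generated
cd "$project"
echo "# Project" > README.md

if [ -d .git ]; then
    echo "Template already initialized Git; skipping git init."
else
    git init -q
    git add .
    git commit -q -m "Initialize research project from cookiecutter template"
    git branch -M main            # make sure the branch is called main
fi
git status -sb                    # shows the current branch and any pending changes
```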

Important detail about folders

Git does not track empty folders by themselves. This matters in research projects because cookiecutter templates often create directories such as:

  • data/raw/
  • data/derived/
  • output/tables/
  • output/figures/

If you want these folders to remain in the repository even when empty, keep a placeholder file inside, such as .gitkeep or a short README.md.
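
A quick way to see this behavior, sketched in a throwaway repository. The folder names follow the list above; .gitkeep is only a naming convention, not a Git feature, so any file name would work.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir -p data/raw data/derived output/tables output/figures
git status --short        # prints nothing: empty folders are invisible to Git

# Add one placeholder file per folder so Git has something to track.
for dir in data/raw data/derived output/tables output/figures; do
    touch "$dir/.gitkeep"
done
git add .
git status --short        # now lists the four placeholder files as added
```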

2. What should be tracked in an economics research repository?

As a general rule, track files that another researcher would need in order to understand, reproduce, review, or continue the project.

Usually track these:

  • data cleaning and analysis scripts such as .do, .R, .py, or carefully maintained notebooks
  • paper source files such as .tex, .bib, markdown notes, slides, and presentations
  • README.md, codebooks, variable definitions, and workflow notes
  • project configuration such as Makefile, environment.yml, renv.lock, or package lists
  • small hand-built input files that are part of the project itself

Usually do not track these:

  • licensed or confidential raw data from sources such as WRDS, CRSP, Compustat, Orbis, or administrative records
  • large derived datasets that can be rebuilt from scripts
  • temporary logs, cache files, autosave files, and editor junk
  • secrets such as credentials, API keys, or .env files
  • LaTeX build byproducts such as .aux and .log, and often the compiled .pdf, unless your team explicitly wants them in Git

In economics, the important thing is usually to track the code that rebuilds the data and results, not every large file produced along the way.
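
One way to turn the lists above into a .gitignore is sketched below. The patterns and the example paths (crsp_monthly.csv, paper/main.aux) are hypothetical; adjust them to your own template and data agreements. The sketch then uses git check-ignore to confirm which rule catches each path.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical ignore rules for a research project.
cat > .gitignore <<'EOF'
# Licensed or confidential raw data, but keep the folder placeholder
data/raw/*
!data/raw/.gitkeep
# Large derived datasets that the scripts can rebuild
data/derived/*
!data/derived/.gitkeep
# LaTeX build byproducts and logs
*.aux
*.log
# Secrets
.env
EOF

# -v prints the matching rule for each ignored path; the files need not exist.
git check-ignore -v data/raw/crsp_monthly.csv paper/main.aux .env
```

Scripts under src/ match none of these rules, so they stay visible to git status and git add.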

3. How and when to use git add

git add does not mean “save my work.” It means “stage this change for the next commit.”

That distinction matters. A good research workflow is:

  1. edit files
  2. run git status
  3. stage only the files that belong to one research task
  4. review the staged changes
  5. commit that one logical unit

Stage by research task

In a research project, one good commit often corresponds to one clear research step.

Good examples:

  • add one sample restriction and document it in the paper
  • revise one regression specification and update the matching table note
  • add one new data cleaning step and update the codebook

Bad examples:

  • mix literature review edits with sample construction changes
  • commit several unrelated robustness checks together
  • stage every modified file just because git add . was quick

Example:

Terminal window
git status
git add src/02_build_sample.do
git add paper/data-section.tex
git add notes/sample-notes.md
git diff --staged
git commit -m "Add sample screen and document it"
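
The same idea as a runnable sketch in a throwaway repository. The file contents and the extra file paper/lit-review.tex are made up for illustration; the point is that three files change but only two belong to the commit.

```shell
set -e
# Placeholder identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME="Demo Author" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo Author" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p src paper
echo "use sample" > src/02_build_sample.do
echo "Data section." > paper/data-section.tex
echo "Literature review." > paper/lit-review.tex
git add .
git commit -q -m "Initial files"

# Three files change, but only two belong to the sample screen.
echo "drop if assets < 0" >> src/02_build_sample.do
echo "We drop firms with negative assets." >> paper/data-section.tex
echo "Add two recent citations." >> paper/lit-review.tex

git add src/02_build_sample.do paper/data-section.tex
git diff --staged --name-only     # only the two staged files
git commit -q -m "Add sample screen and document it"
git status --short                # lit-review.tex is still modified, waiting for its own commit
```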

If the same file contains several unrelated edits, use:

Terminal window
git add -p

That command walks through the file one change (hunk) at a time and asks whether to stage each one. It is very useful when one .do file or one paper section contains both a real change and some unrelated cleanup.

4. When should you commit?

A commit should represent one research decision or one coherent task that you can describe in one sentence.

Good commit messages:

  • Add winsorization step for firm characteristics
  • Revise baseline leverage specification
  • Document WRDS download process
  • Update Table 3 note after sample change

Weak commit messages:

  • update
  • changes
  • fix stuff

Practical rule

Commit when:

  • the change has one clear purpose
  • the files staged belong to the same research step
  • a coauthor could understand the change from the commit message

Do not wait until the end of the day if one clean unit of work is already done. Small commits are easier to review and much easier to revisit later when you are trying to remember why a result changed.

Are WIP commits okay?

Yes, especially on your own branch.

In research, a WIP commit is often safer than leaving important changes only in your working directory.

Terminal window
git add -A
git commit -m "WIP: continue event-study appendix"

You can later clean up the branch history before merging, for example with an interactive rebase (git rebase -i) or a squash merge, if your team prefers that.
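
One non-interactive way to do that cleanup is a soft reset, sketched below. This is only one option, and it assumes the branch was created from the current main and has not been shared with anyone. The branch name and file are placeholders.

```shell
set -e
# Placeholder identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME="Demo Author" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo Author" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "baseline" > analysis.do
git add analysis.do
git commit -q -m "Add baseline analysis"
git branch -M main

git switch -q -c feature/event-study
echo "event window" >> analysis.do
git commit -q -am "WIP: continue event-study appendix"
echo "placebo test" >> analysis.do
git commit -q -am "WIP: more event-study work"

# Move the branch pointer back to main while keeping every change staged,
# then record the work as one commit with a real message.
git reset -q --soft main
git commit -q -m "Add event-study appendix with placebo test"
git log --oneline main..HEAD      # a single commit on top of main
```

The file contents are unchanged by the reset; only the commit history on the branch is rewritten.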

5. A branch workflow that fits research collaboration

If you have collaborators, do not do daily research work directly on main.

A safer pattern is:

  • keep main as the stable, reproducible version of the project
  • create one branch for one task
  • merge back only when the task is in good shape

Example branch names:

  • feature/sample-construction
  • feature/table-4-revision
  • feature/new-robustness-check
  • feature/lit-review-update

This structure is useful because research tasks are often naturally separated. One branch may be about data cleaning, another about a new identification test, and another about paper writing.
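
A minimal sketch of the one-branch-per-task pattern, run locally in a throwaway repository. In a real project the final merge would often happen through a pull request on GitHub instead; the file name and commit messages are placeholders.

```shell
set -e
# Placeholder identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME="Demo Author" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo Author" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "use raw" > build_sample.do
git add build_sample.do
git commit -q -m "Add sample construction script"
git branch -M main

git switch -q -c feature/sample-construction    # one branch for one task
echo "keep if year >= 1990" >> build_sample.do
git commit -q -am "Restrict sample to 1990 onward"

git switch -q main                              # main stays untouched until the task is ready
git merge -q feature/sample-construction        # bring the finished task into main
git log --oneline
```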

6. What to do each morning if you still have unfinished local work

This is the situation that causes most Git stress:

  • you still have local unfinished work
  • your collaborator may have pushed new changes
  • you want the latest remote updates without losing your own work

My practical workflow is below.

Step 1: inspect your current state

Terminal window
git status -sb
git branch

You need to know:

  • which branch you are on
  • whether you have uncommitted changes

Step 2: protect your unfinished work

If your changes are already somewhat coherent, make a WIP commit on your branch:

Terminal window
git add src/03_analysis.do
git add paper/results.tex
git commit -m "WIP: continue baseline specification revision"

If the changes are too messy for even a temporary commit, stash them:

Terminal window
git stash push -u -m "wip before morning sync"

I would use stash as short-term parking, not as long-term storage.

Step 3: fetch remote changes without touching your files yet

Terminal window
git fetch origin

This downloads the latest remote history and updates remote-tracking branches such as origin/main, but it does not change your local branches or your working tree.

If you want to see whether main changed:

Terminal window
git log --oneline main..origin/main
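
To see what fetch does and does not touch, here is a sketch with two local clones of a bare repository. The bare repository stands in for GitHub, the directories me and coauthor are hypothetical, and the file contents are placeholders.

```shell
set -e
# Placeholder identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME="Demo Author" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo Author" GIT_COMMITTER_EMAIL="demo@example.com"

demo=$(mktemp -d)
git init -q --bare "$demo/remote.git"          # stand-in for GitHub

# My copy: create main and push it.
git init -q "$demo/me"
cd "$demo/me"
echo "v1" > analysis.do
git add analysis.do
git commit -q -m "Add analysis script"
git branch -M main
git remote add origin "$demo/remote.git"
git push -q -u origin main

# A coauthor clones, commits, and pushes.
git clone -q -b main "$demo/remote.git" "$demo/coauthor"
cd "$demo/coauthor"
echo "v2" >> analysis.do
git commit -q -am "Cluster standard errors by firm"
git push -q origin main

# Back in my copy: fetch updates origin/main but not main or my files.
cd "$demo/me"
git fetch -q origin
git log --oneline main..origin/main            # the coauthor's one new commit
cat analysis.do                                # still only the original line
```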

Step 4: update your local main

Terminal window
git switch main
git pull --rebase origin main

Now your local main is aligned with the latest shared version.

Step 5: bring the new main into your research branch

Terminal window
git switch feature/table-4-revision
git rebase main

If your team prefers merge commits, use:

Terminal window
git merge main

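For a personal research branch, I usually prefer rebase because it keeps the branch history easier to read. If the branch has already been pushed, rebasing rewrites its commits, so the next push needs git push --force-with-lease. Avoid rebasing a branch that a coauthor is also working on.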

Step 6: restore your local work if you used stash

Terminal window
git stash pop

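Then resolve any conflicts. If the pop stops on a conflict, Git keeps the stash entry, so remove it with git stash drop once the conflicts are resolved.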
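
A small round trip shows what the stash holds. This sketch uses a throwaway repository with placeholder files; the -u flag is what makes the stash include the untracked notes.txt as well as the edit to the tracked file.

```shell
set -e
# Placeholder identity; git stash records its entries as commits.
export GIT_AUTHOR_NAME="Demo Author" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo Author" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "reg y x" > analysis.do
git add analysis.do
git commit -q -m "Add analysis script"

echo "reg y x z" > analysis.do      # unfinished edit to a tracked file
echo "scratch" > notes.txt          # new, untracked file
git stash push -q -u -m "wip before morning sync"
git status --short                  # prints nothing: both changes are parked
git stash list                      # one entry with the message above

git stash pop -q                    # restores both changes and drops the entry
git status --short                  # the edit and the untracked file are back
```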

Step 7: rerun the relevant research pipeline

This last step matters a lot in research. Pulling your collaborator’s changes is not enough. You also need to check whether the updated code changes your results or paper text.

After syncing, rerun the relevant pieces of the project:

  • rebuild the affected dataset
  • rerun the relevant regression or estimation script
  • regenerate the affected tables or figures
  • check whether the manuscript still matches the outputs

In research, reproducibility is part of the Git workflow.
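
To decide which pieces to rerun, it helps to list the scripts that changed during the sync. A minimal sketch, assuming you note where main pointed before pulling; the variable name before and the file names are placeholders, and the coauthor's change is simulated with a local commit.

```shell
set -e
# Placeholder identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME="Demo Author" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo Author" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p src paper
echo "build" > src/02_build_sample.do
echo "reg y x" > src/03_analysis.do
echo "Results." > paper/results.tex
git add .
git commit -q -m "Initial project"

before=$(git rev-parse HEAD)        # where the branch was before syncing

# Stand-in for a coauthor's change arriving through git pull.
echo "drop if year < 1990" >> src/02_build_sample.do
git commit -q -am "Tighten sample period"

# Scripts that changed since the last sync point to what needs rerunning.
git diff --name-only "$before" HEAD -- src/
```

Here only src/02_build_sample.do is listed, so the sample build and everything downstream of it are the pieces to rerun.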

7. The short version of the daily collaborator workflow

If I have unfinished local work but want the latest collaborator changes, my default sequence is:

Terminal window
git status -sb
git add src/03_analysis.do paper/results.tex
git commit -m "WIP: checkpoint before sync"
git fetch origin
git switch main
git pull --rebase origin main
git switch feature/table-4-revision
git rebase main

If I do not want a WIP commit, I replace the commit step with:

Terminal window
git stash push -u -m "wip before sync"

and later:

Terminal window
git stash pop

Then I rerun the relevant scripts and check the outputs.
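
To rehearse the whole sequence safely, the sketch below replays it end to end in a temporary directory. A local bare repository stands in for GitHub, the coauthor directory simulates a collaborator, and the file contents are placeholders. The coauthor edits a different file, so the rebase applies without conflicts.

```shell
set -e
# Placeholder identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME="Demo Author" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo Author" GIT_COMMITTER_EMAIL="demo@example.com"

demo=$(mktemp -d)
git init -q --bare "$demo/remote.git"           # stand-in for GitHub

git init -q "$demo/me"
cd "$demo/me"
echo "baseline" > analysis.do
echo "Results." > results.tex
git add .
git commit -q -m "Initial project"
git branch -M main
git remote add origin "$demo/remote.git"
git push -q -u origin main
git switch -q -c feature/table-4-revision
echo "Table 4 draft." >> results.tex            # unfinished local work

# Coauthor pushes a change to a different file on main.
git clone -q -b main "$demo/remote.git" "$demo/coauthor"
( cd "$demo/coauthor" \
  && echo "cluster(firm)" >> analysis.do \
  && git commit -q -am "Cluster standard errors" \
  && git push -q origin main )

# Morning sync, in the same order as the short version above.
git status -sb
git add results.tex
git commit -q -m "WIP: checkpoint before sync"
git fetch -q origin
git switch -q main
git pull -q --rebase origin main
git switch -q feature/table-4-revision
git rebase -q main
git log --oneline                               # WIP commit now sits on top of the coauthor's commit
```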

8. Habits that prevent Git pain in research projects

  • Keep main reproducible.
  • Pull or fetch early in the day before doing several more hours of work.
  • Track scripts, documentation, and paper source files more carefully than generated outputs.
  • If a code change affects a table or figure, update the manuscript notes close to the same commit when possible.
  • Avoid leaving large uncommitted changes in your working directory for too long.
  • Write commit messages that explain the research step, not just the file change.

Final thought

For collaborative economics research, the goal of Git is not just version control. It is to keep your project understandable and reproducible while several people are editing scripts, data notes, and paper files at the same time.

If you build the habit of checking git status, committing small research tasks, and syncing main carefully before continuing your own branch, Git becomes much easier to manage.
