This dataset contains over 7 hours of footage, spread across 360 videos, of the natural gestures people use when collaborating on a shared task. We recorded 20 pairs of participants who were placed in separate rooms and connected via video chat. One person (the actor) was given blocks, and the other (the signaler) was given a picture showing an arrangement of blocks. The signaler's goal was to get the actor to replicate that arrangement. In some trials, the participants could not hear each other and had to communicate through gestures alone. In other trials, audio was available, and gestures supplemented spoken language.