The day after Pearl Harbor, on a Monday morning, December 8, 1941, the US Army embarked on a novel experiment in the social and behavioral sciences. In a large, dimly lit theatre at Fort Bragg, North Carolina, the private who had been put in charge explained to the 40 to 50 men seated at three- to four-seat intervals why they had been plucked from the morning roll call and ordered to report in a shroud of mystery. “You are going to be given some questions, and the same questions will be put to hundreds of other soldiers here in this [the Ninth] division,” he began. “The questions are not personal; they simply ask you what you think about a lot of things that the War Department is interested in.” As he went on to explain to the soldiers dispersed across the theatre, “[T]his type of thing has never been done in this Army, or in any other Army in the world. It is being done for the first time right here in the 9th Division, and because of this we are asking you to help, and to answer these questions just as carefully as you possibly can.” Over the course of the next three days, with the divisional commander’s approval, the Special Services Research Branch administered the very same mimeographed “planning survey” to some 1,900 men, a cross section of the Ninth Division—one soldier out of every six, selected by lot.

(Image: an excerpted page from the American Soldier survey. The soldier’s additional remarks read: “I HAVE JUST ANSWERED YOUR QUESTIONS IN THIS BOOK TO THE BEST OF MY KNOWLEDGE, SO I AM NOW LOOKING FOR A DISCHARGE, AND I DON’T SEE WHY I CAN’T HAVE OF VERY LITTLE USE TO THE U.S. ARMY. TO HELL WITH HITLER”.)

This survey led to another, then to another, and another. Over the course of the conflict, the Research Branch would administer more than 200 surveys to approximately half a million servicemembers at points across the globe, from Burma to Panama to Iceland. Questions covered everything from the reading and radio-listening habits and preferences of GIs to the quality of rations, training, and winter clothing materials to the impact on morale of integrating Black soldiers into all-white units late in the war. (As it turned out, ad hoc integration of Black replacements had the opposite effect of what was feared: it improved race relations and did so without negatively impacting performance or morale.) During World War II, the War Department accomplished a feat that had never before been attempted, not only in the history of the United States but by any army anywhere. With instruments of the social and behavioral sciences—many well tested, others adapted, some invented to fit the situation and mission—the Branch created a composite portrait of a national army of “citizen-soldiers,” some 16 million strong. Debunking Jim Crow, though significant, was just one of a number of contributions.

A team based at Virginia Tech is now working on a digital humanities and social sciences project dedicated to restoring the Research Branch’s composite portrait. At the end of the war, cabinets full of IBM punch cards that had been used for processing the survey responses went with the Branch’s director of research, Samuel Stouffer, up to Harvard, where he and other former Branch social and behavioral scientists reprocessed their data and wrote an authoritative four-volume report of their methods and findings, known collectively as The American Soldier (1949–50)—a title that itself conveys its purported comprehensiveness. These efforts were supported and staffed by the Social Science Research Council, then only a few decades old, through a grant from the Carnegie Corporation. The IBM cards themselves, representing 83 studies, remained in Stouffer’s possession until his death in 1960, when they were nearly thrown away but were instead donated to the Roper Center for Public Opinion Research, then located at Williams College.

Soldiers did more than fill out multiple-choice questionnaires. Research Branch personnel went out into the field and conducted one-on-one interviews, and a number of surveys included open-ended prompts, much as that first post–Pearl Harbor survey did. Question 118 was more of an invitation than a query: “If you have any remarks to add to this Survey, please write them here just as fully as you like.” The Branch collected tens of thousands of handwritten commentaries that, in 1947, were also preserved, on one set of 44 microfilm rolls—as orphaned documents, detached from their parent surveys. We need to reunite these long-orphaned commentaries with their corresponding surveys if we hope to reconstitute the Branch’s composite portrait. And to do that, we need the help both of computational tools and techniques and of an army of citizen-archivists.

What makes this collection unique and worth the effort is not only its breadth and depth but also the conditions under which the information was obtained from these hundreds of thousands of servicemembers: each of them was promised absolute anonymity. “Your officers know your names, but we don’t. We have your answer sheets, but your officers can’t see them. So you can be sure that no one will ever know just who filled out these 2000 questionnaires,” the private in charge promised. “They will be deliberately mixed up, so that the way the men in any particular company or battalion or regiment made their answers can not be figured out.” To ensure that anonymity, the soldiers were ordered not to write their names or serial numbers. They were otherwise free to write whatever they wanted and could do so without fear of censorship or censure. And write they did. That Monday morning of December 8, only one or two men had turned in a questionnaire at the end of an hour, though it could have been completed superficially in about twenty minutes. Ten men were still writing a half hour later, when the theatre had to be cleared.

One soldier, a 22-year-old inductee from Indiana, wrote in a fine script and, to conserve space, filled the pages to the very edge. He was invited back to the Branch’s administrative office to complete the questionnaire, and there he remained until 2:25 p.m., long after mess call. The Indianan wrote a carefully considered 855-word response. “Unlike many of my buddies, I am very concerned with the defeat of the Axis, as I don’t think democracy has a chance of working itself out if the Axis wins in the rest of the world,” he worried. “Feeling as I do about the war it is very disheartening to see obvious faults ruining the efficiency of the Army and nullifying the efforts of so many. Many things that go on would be funny if they weren’t serious.” This Dartmouth alumnus was not one of hundreds or a few thousand servicemembers who took the opportunity afforded by these open-ended questions to share their thoughts. He was one of tens of thousands. “The negro soldier would easily be one of the best and loyalist men in the army if given a half way chance. But the way this army is working you have no chance,” wrote one disgruntled soldier. One week he was a wing assembler on the B-17; the next, he was a hospital porter—a “job that a 70 year old man or woman could do.” He concluded, “The co. commander says I will never see a gun. Do you think I feel as if I was doing anything in this war. Hell No.”

(Image: an excerpted page from the American Soldier survey. The soldier’s additional remarks read: “To Whom it May Concern; The 28th Division as a whole is run not for the soldier but for the officers. I have no grudge nor am I jealous of them, I have even some good ones but they are regular Army Men. This division is run on a, who you know, basis, if the officer doesn’t know me I haven’t a chance. The men here are all alright but have lost all respect for the Army and don’t care what they do due to the officers. All in all it adds up to anything—the men are O.K. but the officers stink. (sorry to be so vulgar).” In the margins, he has added, “This Is My True And Honest Belief And I Wish I Could Put My Name Down.”)

The very quality that makes these records unique—their anonymity—has presented our team with an equally distinctive challenge. Only a small portion of the 65,000 pages of microfilmed commentary contains traces of custody, in the form of a serial number or a code assigned by the Branch. These clues might help us reunite those pages with their parent surveys. A number of microfilmed sheets show both open-ended responses and one or more multiple-choice answers, so there is a chance that we can deduce the pool of respondents who might have written a given open-ended response. Still, the bulk of these handwritten open-ended responses are truly orphaned: we have only a page and a soldier’s commentary. A National Endowment for the Humanities Preservation and Access start-up grant, together with advances in computational methods, has allowed our team to plan an innovative hybrid approach to reconnecting Research Branch surveys and these handwritten documents, leveraging both human and artificial intelligence.
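Where a page does show legible multiple-choice marks, narrowing the pool of possible authors is essentially a filtering problem: keep only the respondents whose recorded answers are consistent with the marks on the page. A minimal sketch (the question labels and answer codes below are invented for illustration; actual Research Branch variables differ study by study):

```python
def candidate_pool(survey_rows, visible_answers):
    """Return respondents whose recorded multiple-choice answers are
    consistent with the marks visible on a microfilmed commentary page."""
    return [
        row for row in survey_rows
        if all(row.get(q) == a for q, a in visible_answers.items())
    ]

# Toy extracted survey data: one dict per (anonymous) respondent.
rows = [
    {"id": 1, "q12": "A", "q37": "C"},
    {"id": 2, "q12": "A", "q37": "D"},
    {"id": 3, "q12": "B", "q37": "C"},
]

# Marks legible on a hypothetical page: q12 = A, q37 = C.
pool = candidate_pool(rows, {"q12": "A", "q37": "C"})
print([row["id"] for row in pool])  # → [1]
```

Each additional legible answer shrinks the candidate pool further, even when no single mark identifies a respondent on its own.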

First, we need to turn those microfilmed handwritten commentaries into text-searchable documents. This past year we have worked with the National Archives and Records Administration to get all 65,000-plus images digitized. We also created a project site on the crowdsourcing platform Zooniverse. Zooniverse describes itself as “the world’s largest and most popular platform for people-powered research,” and one of its strongest selling points is a community with over one million registered volunteers. Zooniverse was originally built for citizen-science projects in domains like astronomy and zoology, but having built up a critical mass of users and a dedicated staff, the platform has expanded in recent years to support projects in other research areas, including the social sciences and history. Among the latter are two significant World War I–related projects, Measuring the Anzacs and Operation War Diary, as well as the recently inaugurated African American Civil War Soldiers. The platform leverages redundant user contributions and aggregation algorithms to ensure that the quality of crowd contributions meets or exceeds what trained specialists would produce. On May 8, the anniversary of VE Day, we will launch our project’s transcription drive on Zooniverse with a coordinated national transcribe-a-thon. We want to get as many of these 65,000-plus pages of commentary transcribed as possible and estimate that this will take between one and two years to complete.
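Zooniverse’s production aggregation pipeline is considerably more sophisticated than anything shown here, but the core idea—combining several volunteers’ redundant transcriptions of the same line into one consensus reading—can be sketched as a per-token majority vote (a toy illustration, not the platform’s actual algorithm):

```python
from collections import Counter

def consensus(transcriptions):
    """Majority-vote consensus across redundant transcriptions of one line.

    Each transcription is split into tokens; at every token position, the
    most common reading among volunteers wins. Real aggregation must also
    handle insertions, deletions, and alignment, which this toy skips.
    """
    token_lists = [t.split() for t in transcriptions]
    width = max(len(tokens) for tokens in token_lists)
    result = []
    for i in range(width):
        votes = Counter(tokens[i] for tokens in token_lists if i < len(tokens))
        result.append(votes.most_common(1)[0][0])
    return " ".join(result)

# Three volunteers transcribe the same handwritten line slightly differently.
readings = [
    "the men are O.K. but the officers stink",
    "the men are OK but the officers stink",
    "the men are O.K. but the officers stink",
]
print(consensus(readings))  # → the men are O.K. but the officers stink
```

With three or more independent transcriptions per page, isolated misreadings by any single volunteer tend to be outvoted.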

This, though, is only one of several steps. At the moment, the Research Branch’s extant codebooks provide only aggregate survey responses. To reunite open-ended and multiple-choice responses, we need to unlock individual survey responses. To extract this quantitative survey data, we brought in Dr. Sallie Keller, the director of Virginia Tech’s Social and Decision Analytics Lab (SDAL). Founded in 2013, SDAL has developed world-class statistical and data science capabilities and, quite serendipitously, has established a collaborative partnership with the US Army Research Institute for the Behavioral & Social Sciences. SDAL data research scientists can now automate much of the process of extracting the survey results with customized computer code, dramatically cutting the time that it will take to harvest the Branch’s quantitative social data. The 83 surviving studies promise to yield many thousands of responses at the individual level.

But then comes the hardest part of reuniting the quantitative and qualitative data. For some transcribed documents, we will have serial numbers, codes, and other clues that might help us reconnect an individual soldier’s open-ended response to the rest of their survey answers. But in the vast majority of cases, we are unlikely to be so lucky, which is where artificial intelligence comes into play. We will use natural language processing techniques and a topic modeling algorithm, such as the widely used Latent Dirichlet allocation, to perform a first pass on the transcribed responses. The goal is to extract salient topics from a sea of text. These AI-selected topics will then be improved and enriched using human intelligence and domain knowledge, the latter coming from social and behavioral scientists who will have access to the extracted survey data.

Our aim is not to reconnect every open-ended commentary to the rest of the multiple-choice responses. That would be a fool’s errand. Yet we find tantalizing the prospect of simply approximating the Research Branch’s composite portrait of the largest, but quickly fading, citizen-soldier Army in the country’s history. Our end goal is to create an open-access website that employs rich data visualizations of the survey responses and is accessible to the wider public—to educators, students, and family members of veterans—as well as to scholars, who will have access to this vast trove of wartime data—unvarnished and uncensored.