After more than a year of testing, an effort to enlist volunteers in transcribing historical documents and data for online use is opening to the public.
Volunteers are powering an effort by the Smithsonian Institution to create online, searchable versions of its vast collections of diaries, journals, biological specimens and other historical gems. After more than a year of testing, the project came out of beta Aug. 12, with officials inviting the public to join in the massive transcription and labeling effort.
The Transcription Center attracted about 1,000 active volunteers during its testing phase, and that group has grown by more than 800 since the public launch, according to project coordinator Meghan Ferriter. Volunteers dive into a variety of projects, including transcribing texts that are often handwritten and occasionally in languages other than English. Volunteers also review submitted work before it is published.
So far, more than 13,000 transcribed pages have been produced, and several projects have been completed, including the archives of the Monuments, Fine Arts and Archives Section, popularized in the book and film "The Monuments Men," and the Charles Henry Hart autograph collection, which includes letters from notable artists and sculptors.
Some projects focus on gathering the full text of diaries, notebooks and other primary source material. Other projects, such as cataloging specimen records for the U.S. National Herbarium, focus on collecting structured data.
There's no shortage of work. Sylvia Orli, an information manager in the Department of Botany at the National Museum of Natural History, estimates that it would take about 110 years, at current rates, to digitize the 3.5 million uncataloged items in the National Herbarium's collection of 5 million specimens.
In a panel discussion about the Transcription Center at the annual meeting of the Society of American Archivists on Aug. 15, Orli said she hoped the volunteer effort would greatly improve the pace of digitizing the collection.
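Orli's estimate implies a current cataloging rate that the article does not state outright. The short Python sketch below works backward from the quoted figures (3.5 million uncataloged items, 110 years) to that implied rate, then shows how the timeline would shrink under a faster pace. The speedup factor is a purely hypothetical assumption for illustration, not a figure from the Smithsonian, and the function names are invented here.

```python
# Figures quoted in the article (Orli's estimate).
UNCATALOGED_ITEMS = 3_500_000
TOTAL_SPECIMENS = 5_000_000
YEARS_AT_CURRENT_RATE = 110


def implied_items_per_year(items: int, years: float) -> float:
    """Average yearly throughput implied by finishing `items` in `years`."""
    return items / years


def years_to_finish(items: int, items_per_year: float) -> float:
    """Years needed to work through `items` at a constant yearly rate."""
    return items / items_per_year


current_rate = implied_items_per_year(UNCATALOGED_ITEMS, YEARS_AT_CURRENT_RATE)
uncataloged_share = UNCATALOGED_ITEMS / TOTAL_SPECIMENS

# Hypothetical assumption: volunteers raise throughput fivefold.
HYPOTHETICAL_SPEEDUP = 5
faster_years = years_to_finish(
    UNCATALOGED_ITEMS, current_rate * HYPOTHETICAL_SPEEDUP
)

print(f"Share of collection uncataloged: {uncataloged_share:.0%}")
print(f"Implied current rate: {current_rate:,.0f} items per year")
print(f"Years at {HYPOTHETICAL_SPEEDUP}x the current rate: {faster_years:.0f}")
```

On the quoted figures, the implied pace is roughly 31,800 items a year, and the backlog covers 70 percent of the collection. Because the timeline scales inversely with throughput, any multiple of the current rate divides the 110-year estimate by the same factor.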
Ferriter leads the push to promote the Transcription Center on social media and keep in touch with volunteers, who, she said, are morphing into a self-sustaining community, answering one another's questions and providing help via Twitter and other platforms.
Inside individual projects, volunteers can share notes on specific challenges, such as rendering marginal notes or interpreting scientific symbols. Ferriter also sees the possibility of communities of interest springing up around individual projects. For instance, a project to transcribe the diary of Earl Shaffer, the first man to walk the Appalachian Trail in one continuous hike, was completed in just two weeks thanks to a Reddit group that generated volunteers. Although that process happened organically, social media could be used to seed interest in projects.
The Transcription Center was built on Drupal, but its source code hasn't been released. Although the front-end display and the back-end collection components are open source, customization was required to connect them with the Smithsonian's enterprise system. Still, the Smithsonian is prepared to help libraries, institutions and museums that want to launch a similar service by providing code and support in the future, Ferriter said.