Digital Government

Smithsonian transcription project moves out of beta

Volunteers are powering an effort by the Smithsonian Institution to create online, searchable versions of its vast collections of diaries, journals, biological specimens and other historical gems. After more than a year of testing, the project came out of beta Aug. 12, with officials inviting the public to join in the massive transcription and labeling effort.

The Transcription Center attracted about 1,000 active volunteers during its testing phase, and that group has grown by more than 800 since the public launch, according to project coordinator Meghan Ferriter. Volunteers dive into a variety of projects, including transcribing texts that are often handwritten and occasionally in languages other than English. Volunteers also review submitted work before it is published.

So far, more than 13,000 transcribed pages have been produced, and several projects have been completed, including the archives of the Monuments, Fine Arts and Archives Section, popularized in the book and film "The Monuments Men," and the Charles Henry Hart autograph collection, which includes letters from notable artists and sculptors.

Some projects focus on gathering the full text of diaries, notebooks and other primary source material. Other projects, such as cataloging specimen records for the U.S. National Herbarium, focus on collecting structured data.

There's no shortage of work. Sylvia Orli, an information manager in the Department of Botany at the National Museum of Natural History, estimates that it would take about 110 years, at current rates, to digitize the 3.5 million uncataloged items in the National Herbarium's collection of 5 million specimens.

In a panel discussion about the Transcription Center at the annual meeting of the Society of American Archivists on Aug. 15, Orli said she hoped the volunteer effort would greatly improve the pace of digitizing the collection.

Ferriter leads the push to promote the Transcription Center on social media and keep in touch with volunteers, who, she said, are morphing into a self-sustaining community -- answering one another's questions and providing help via Twitter and other platforms.

Inside individual projects, volunteers can share notes on specific challenges, such as rendering marginal notes or interpreting scientific symbols. Ferriter also sees the possibility of communities of interest springing up around individual projects. For instance, a project to transcribe the diary of Earl Shaffer, the first man to walk the Appalachian Trail in one continuous hike, was completed in just two weeks thanks to volunteers recruited through a Reddit group. Although that happened organically, social media could also be used deliberately to seed interest in projects.

The Transcription Center was built in Drupal, but its source code hasn't been released. Although the front-end display and the back-end collection components are open source, customization was required to connect them with the Smithsonian's enterprise system. In the future, however, the Smithsonian is prepared to help libraries, institutions and museums that want to launch a similar service by providing code and support, Ferriter said.

About the Author

Adam Mazmanian is executive editor of FCW.

Before joining the editing team, Mazmanian was an FCW staff writer covering Congress, government-wide technology policy, health IT and the Department of Veterans Affairs. Prior to joining FCW, Mazmanian was technology correspondent for National Journal and served in a variety of editorial roles at B2B news service SmartBrief. Mazmanian started his career as an arts reporter and critic, and has contributed reviews and articles to the Washington Post, the Washington City Paper, Newsday, Architect magazine, and other publications. He was an editorial assistant and staff writer at the now-defunct New York Press and arts editor at the online network in the 1990s, and was a weekly contributor of music and film reviews to the Washington Times from 2007 to 2014.

Connect with Mazmanian on Twitter at @thisismaz.


