Books United States

Librarians Are Finding Thousands of Books No Longer Protected By Copyright Law (vice.com) 11

An anonymous reader quotes a report from Motherboard: On January 1, 2023, a swath of books, films, and songs entered the public domain. The public domain is not a place -- it refers to all the creative works not protected by an intellectual property law like copyright. Creative works may not have intellectual property protections for a number of reasons. In most cases, the rights have expired or have been forfeited. Basically, no one holds the exclusive rights to these works, meaning that living artists today can sample and build off those works legally without asking anyone's permission to do so. That's why the New York Public Library (NYPL) has been reviewing the U.S. Copyright Office's official registration and renewals records for creative works whose copyrights haven't been renewed, and have thus been overlooked as part of the public domain.

The books in question were published between 1923 and 1964, before changes to U.S. copyright law removed the requirement for rights holders to renew their copyrights. According to Greg Cram, associate general counsel and director of information policy at NYPL, an initial overview of books published in that period shows that around 65 to 75 percent of rights holders opted not to renew their copyrights. "That's sort of a staggering figure," Cram told Motherboard. "That's 25 to 35 percent of books that were renewed, while the rest were not. That's interesting for me as we think about copyright policy going forward." [...]

The U.S. Copyright Office and the Internet Archive collaborate to digitize these records, and while that digitization effort has been foundational for NYPL to even be able to conduct their investigation, the digital experience isn't much different from the physical one: To navigate the records, you have to click on a picture of an antique card catalog and then sift through volumes of digitized cards without the help of Optical Character Recognition (OCR) software, which converts books into machine-readable text. Cram says that use of these tools today still requires some sort of specialized knowledge, like which drawer to open and which category to look for. Those searches can take a lot of time and produce a lot of false positives for researchers. Plus, what Cram is looking for within the records is exactly what's missing: A copyright renewal registration, or a renewal, or a registration to begin with. [trying to find absence of information]

"We started the pilot with, I think it was just around 10,000 records, and then we started to realize, okay, we can start making some rules here," said Marianne Calilhanna, vice president of marketing with DCL. "So we're able to start making these conversion rules that then we can kind of put into our automation processes to start to structure this."

"Ultimately, the output we're creating is XML," she added. "XML is a series of tags that tell the computer, this is a title of a book, this is the title of a journal article. This is the author of that. And then we would also apply extra metadata on top of that record." NYPL plans to make their XML open source for other libraries across the nation and the world to use.
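To illustrate what "a series of tags" means in practice, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The element names (`record`, `title`, `author`, `metadata`, `registrationYear`, `renewed`) and the sample values are hypothetical, chosen only for illustration. They are not NYPL's or DCL's actual schema, which the article does not describe.

```python
import xml.etree.ElementTree as ET


def build_record(title, author, registration_year, renewed):
    """Build one illustrative catalog record as an XML element.

    All tag names here are assumptions for the sake of the example,
    not the schema used in the NYPL project.
    """
    record = ET.Element("record")
    # Tags label what each piece of text is: a title, an author, and so on.
    ET.SubElement(record, "title").text = title
    ET.SubElement(record, "author").text = author
    # Extra metadata layered on top of the basic record.
    meta = ET.SubElement(record, "metadata")
    ET.SubElement(meta, "registrationYear").text = str(registration_year)
    ET.SubElement(meta, "renewed").text = "true" if renewed else "false"
    return record


# Made-up sample data, not a real catalog entry.
record = build_record("An Example Title", "A. N. Author", 1931, False)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)

# Because the fields are tagged, software can read them back reliably.
parsed = ET.fromstring(xml_text)
print(parsed.find("title").text)
print(parsed.find("metadata/renewed").text)
```

Once records are tagged this way, a program can look up a field by name (for instance, whether a renewal was recorded) instead of a person reading each scanned card.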

"For us to advance the progress and knowledge, which is the goal of copyright, I think we need access to this data so that we can understand how to answer that question of how can I use this?" Cram noted. "Having the data helps get us closer to an answer for that question, which ultimately is the goal, to use works lawfully, in a way that advances knowledge."

  • by Another Random Kiwi ( 6224294 ) on Friday February 10, 2023 @09:57PM (#63283737)
    One hopes that NYPL is not generating yet another ad-hoc XML description of their data, but the article seems to have been written by someone with limited comprehension of bibliographic information standards, or the librarian they interviewed did a poor job of explaining what they were doing. BIBFRAME [loc.gov] would be the current model for bibliographic information and, like its predecessor MARC [loc.gov], has ways of representing copyright status in bibliographic records that can be effectively shared between institutions.
  • Copyright does not protect a work. It shackles the work.