Librarians Are Finding Thousands of Books No Longer Protected By Copyright Law (vice.com)
An anonymous reader quotes a report from Motherboard: On January 1, 2023, a swath of books, films, and songs entered the public domain. The public domain is not a place -- it refers to all the creative works not protected by an intellectual property law like copyright. Creative works may not have intellectual property protections for a number of reasons. In most cases, the rights have expired or have been forfeited. Basically, no one holds the exclusive rights to these works, meaning that living artists today can sample and build off those works legally without asking anyone's permission to do so. That's why the New York Public Library (NYPL) has been reviewing the U.S. Copyright Office's official registration and renewals records for creative works whose copyrights haven't been renewed, and have thus been overlooked as part of the public domain.
The books in question were published between 1923 and 1964, before changes to U.S. copyright law removed the requirement for rights holders to renew their copyrights. According to Greg Cram, associate general counsel and director of information policy at NYPL, an initial overview of books published in that period shows that around 65 to 75 percent of rights holders opted not to renew their copyrights. "That's sort of a staggering figure," Cram told Motherboard. "That's 25 to 35 percent of books that were renewed, while the rest were not. That's interesting for me as we think about copyright policy going forward." [...]
The U.S. Copyright Office and the Internet Archive collaborate to digitize these records, and while that digitization effort has been foundational for NYPL to even be able to conduct their investigation, the digital experience isn't much different from the physical one: To navigate the records, you have to click on a picture of an antique card catalog and then sift through volumes of digitized cards without the help of Optical Character Recognition (OCR) software, which converts books into machine-readable text. Cram says that use of these tools today still requires some sort of specialized knowledge, like which drawer to open and which category to look for. Those searches can take a lot of time and produce a lot of false positives for researchers. Plus, what Cram is looking for within the records is exactly what's missing: A copyright renewal registration, or a renewal, or a registration to begin with. [trying to find absence of information]
"We started the pilot with, I think it was just around 10,000 records, and then we started to realize, okay, we can start making some rules here," said Marianne Calilhanna, vice president of marketing with DCL. "So we're able to start making these conversion rules that then we can kind of put into our automation processes to start to structure this."
"Ultimately, the output we're creating is XML," she added. "XML is a series of tags that tell the computer, this is a title of a book, this is the title of a journal article. This is the author of that. And then we would also apply extra metadata on top of that record." NYPL plans to make their XML open source for other libraries across the nation and the world to use.
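To make the "series of tags" idea concrete, here is a minimal sketch of what a tagged record might look like and how a program could look for the thing that is missing, a renewal. The element names (record, title, author, registration, renewal) and the sample data are assumptions invented for illustration; they are not NYPL's or DCL's actual schema, and a missing renewal tag in this toy data says nothing about the legal status of any real book.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout: every element and attribute name here is an
# assumption for illustration, not the schema NYPL or DCL actually uses.
SAMPLE = """
<records>
  <record>
    <title>Example Book A</title>
    <author>Author One</author>
    <registration date="1930-05-01" number="A00001"/>
    <renewal date="1957-06-10" number="R00001"/>
  </record>
  <record>
    <title>Example Book B</title>
    <author>Author Two</author>
    <registration date="1941-03-15" number="A00002"/>
  </record>
</records>
"""


def titles_without_renewal(xml_text):
    """Return the titles of records that carry no <renewal> element."""
    root = ET.fromstring(xml_text.strip())
    return [
        record.findtext("title")
        for record in root.iter("record")
        if record.find("renewal") is None
    ]


if __name__ == "__main__":
    print(titles_without_renewal(SAMPLE))
```

Once the cards are structured this way, "finding the absence of information" becomes a simple filter over records instead of a manual search through drawers of scanned cards.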
"For us to advance the progress and knowledge, which is the goal of copyright, I think we need access to this data so that we can understand how to answer that question of how can I use this?" Cram noted. "Having the data helps get us closer to an answer for that question, which ultimately is the goal, to use works lawfully, in a way that advances knowledge."
Re: (Score:1, Insightful)
"Basically, no one holds the exclusive rights to these works, meaning that living artists today can sample and build off those works legally without asking anyone's permission to do so. "
Erm, meaning anyone can copy them without permission. As in "Copy" right.
It goes on to describe how a list of those works, in a magical format called XML, was produced.
BUT NOT HOW THAT HELPS PEOPLE ACCESS THE ACTU
Re: (Score:2, Funny)
It goes on to describe how a list of those works, in a magical format called XML, was produced.
XML? Ewwww.....
Re: (Score:3)
Most of these books are probably already found in WorldCat, run by OCLC - they also are the backbone of interlibrary loan world-wide (ILL is my field). The XML will probably be integrated into OCLC's databases at some point to make it easier to search, but this sounds like a very lengthy project.
I expect at some point this will be easily searchable, and with it being tied into OCLC/ILL, you'll be able to borrow physical copies that way. And they'll probably be digitized progressively
Re: I disagree (Score:2)
I can testify to that too. Went there for vacation last year. It was alright, but I would say Bali is better.
Re: (Score:2)
That was my assumption: Publishing their taxonomy allows others to form queries for the NYPL to process, or to apply to their own data.
Journalist unclear on the concept(s)... (Score:3)
Copyright does not protect (Score:2)
Copyright does not protect a work. It shackles the work.