Boston Public Library Aims To Increase Access To a Vast Historic Archive Using AI 30

Posted by BeauHD on Tuesday August 12, 2025 @07:20PM from the two-way-street dept.

An anonymous reader quotes a report from NPR: Boston Public Library, one of the oldest and largest public library systems in the country, is launching a project this summer with OpenAI and Harvard Law School to make its trove of historically significant government documents more accessible to the public. The documents date back to the early 1800s and include oral histories, congressional reports and surveys of different industries and communities. "It really is an incredible repository of primary source materials covering the whole history of the United States as it has been expressed through government publications," said Jessica Chapel, the Boston Public Library's chief of digital and online services. Currently, members of the public who want to access these documents must show up in person. The project will enhance the metadata of each document and will enable users to search and cross-reference entire texts from anywhere in the world. Chapel said Boston Public Library plans to digitize 5,000 documents by the end of the year, and if all goes well, grow the project from there. Because of this historic collection's massive size and fragility, getting to this goal is a daunting process. Every item has to be run through a scanner by hand. It takes about an hour to do 300-400 pages.

Harvard University said it could help. Researchers at the Harvard Law School Library's Institutional Data Initiative are working with libraries, museums and archives on a number of fronts, including training new AI models to help libraries enhance the searchability of their collections. AI companies help fund these efforts, and in return get to train their large language models on high-quality materials that are out of copyright and therefore less likely to lead to lawsuits. "Having information institutions like libraries involved in building a sustainable data ecosystem for AI is critical, because it not just improves the amount of data we have available, it improves the quality of the data and our understanding of what's in it," said Burton Davis, vice president of Microsoft's intellectual property group. [...] OpenAI is helping Boston Public Library cover such costs as scanning and project management. The tech company does not have exclusive rights to the digitized data.

I heard this story on NPR today.. (Score:4, Insightful)

by ndsurvivor ( 891239 ) writes: on Tuesday August 12, 2025 @07:31PM (#65586152) Journal

It seems like a decent use of AI, and of the profits the companies are generating from it. Seems like a win-win.

- Re: (Score:2)
  
  by drinkypoo ( 153816 ) writes:
  
  Yes, finally a sensible use is being reported here. LLMs are pretty good at summarization and categorization. I have a pretty big media library that I'm thinking of categorizing with local models. Since it doesn't much matter how long it takes, I can underclock everything and let it just plod along.
  - Re: (Score:2)
    
    by ambrandt12 ( 6486220 ) writes:
    
    Yes, if it can be a thing (like Windows did for the Media Center thing).
    The whole ball of wax for Media Center is a whole thing.

Better hurry (Score:4, Informative)

by quonset ( 4839537 ) writes: on Tuesday August 12, 2025 @07:39PM (#65586174)

Before the fascist goon makes them rewrite history to fit his agenda [cbsnews.com].

- Re: (Score:2)
  
  by ndsurvivor ( 891239 ) writes:
  
  Eh, so Trump rewrites history and fakes the GDP, inflation, and jobs numbers for a while. I am still optimistic that in a year and a half, the other party will get control of the House, and I think Trump will be "toast" then. Then we are looking at two years of paralysis... then maybe someone better will come along who can move the US forward instead of backwards.
  - Re: (Score:2)
    
    by ambrandt12 ( 6486220 ) writes:
    
    (Offtopic)
    Of course... that's if the 'other party' has a worthy candidate.
    (On topic)
    Why does this need AI? I could see maybe converting a scanned page or book to a text file that a computer can search through... but Adobe has been able to do that for quite a while (OCR).
    "AI" is not the answer to everything. Humans (last I checked, anyway) are capable of reading books and researching stuff.
    - Re: (Score:2)
      
      by ndsurvivor ( 891239 ) writes:
      
      (On Topic) I find that AI has a particular ability to summarize information, and an ability to answer queries about information. Of course they "hallucinate", and are wrong sometimes. I think it is important to drill down, ask questions about the sources, and double check them, but it is much faster than reading all the books yourself, and the human mind misses things, or "hallucinates" too.
      - Re: (Score:2)
        
        by ambrandt12 ( 6486220 ) writes:
        
        Of course it does... it removes the extraneous crap and gives you a few highlights... what about the in-between information?
        My mind has never hallucinated anything... the closest approximation would be dreams. Not that any medical establishment has any clue about the whole mind/brain thing... the 'soul' is still a mystery (even though people lose a couple dozen grams when they die... they don't know why).
        If I want to know something (say 'Chernobyl'), I'll read the first article that seems like it has info,
        
        Re: (Score:2)
        
        by ndsurvivor ( 891239 ) writes:
        
        "Can "AI" make me a 3D model to take to the resin printer for a specific 'thing'? Can it write a book for me? Can it edit a whole video for me? If 'no... what is it's purpose?"...
        I would say yes. People are spitting out AI books and flooding the market with them. They may or may not be able to make a 3D model yet; I don't know. I know they can do the basics of software programming well, and their syntax is almost perfect. It can edit or even create a video based on specific verbal instructions for anybo
        
        Re: (Score:2)
        
        by ambrandt12 ( 6486220 ) writes:
        
        Yeah... and, are those "AI" books worth taking 5 minutes to glance at? I would much rather read a good book written by Michael Crichton (or whoever), than something written "in the style of Crichton".
        What about the potential authors (the humans) who are trying to be authors? Should they lose their livelihood because of AI? What about the people who write movies or even act in movies? Should they be unemployed because of AI?
        "AI" can't make the 3D model for me because it can't understand that I want to ma
        
        Re: (Score:2)
        
        by ndsurvivor ( 891239 ) writes:
        
        I hear you, and I respect your point of view. I think a lot of our thoughts align.
        
        Re: (Score:2)
        
        by ambrandt12 ( 6486220 ) writes:
        
        Thank you... that's rare on here.
        The "AI" rush is ignoring the fact of what it actually is... it's the same thing that your phone does when you text someone... predictive text.
        If you stuff it (the hardware and crap) in a closet without 'net connectivity... can it solve the problem of how to get out?
        If it can't (consistently... run the test a dozen times), is it intelligent?
        Sure... if I give it access to my Arduino code stores, it can reference the code pile and make something following the standards.
        If I as
        
        Re: (Score:2)
        
        by ndsurvivor ( 891239 ) writes:
        
        I have no delusions about "AI". I tried to get it to write code for me on the RP2040 using its PIOs so I could get 8 channels of PWM waveforms with 8 ns precision. The RP2040 has PWM generators, and I looked into them for this purpose, but they didn't seem to do the job. I got the AI to write the code to interface between the serial port and my computer, and tried to get it to write everything else. Ultimately, it wrote me a skeleton and I did the rest. It did speed things up, I think, but ultimately
        
        Re: (Score:2)
        
        by ambrandt12 ( 6486220 ) writes:
        
        Y'know... you could just write the code.
        Funny how that works... the shit (can I use that here?) that is supposed to help us ends up holding us back.
        
        Re: (Score:2)
        
        by ndsurvivor ( 891239 ) writes:
        
        Well, I started my career learning machine code and microcomputer architecture, then learned LabVIEW and used it for about 30 years, then was tasked with doing precision work at low cost. The RP2040 is an extraordinary processor. Looking at the spec sheet, I knew it could do it. I did not know the Python syntax that is popular with the processor, but damn if it doesn't look a lot like BASIC! So I had the AI(s) write the skeleton and handle the syntax, and I tweaked it to make it work.
        
        Re: (Score:2)
        
        by ambrandt12 ( 6486220 ) writes:
        
        Yeah, I could do that sitting here listening to Tatu
        What advantage does the "AI" give you?
        I'm not gonna do a full standoff because there's no real-world thing.
        We'd have to sync watches and crap... my watch sets itself to the atomic clock... a Casio PAW-1500
        If anyone messes with the door (without saying anything)... I'm up (without the glasses), and have the katana in hand
    - Re: (Score:2)
      
      by nightflameauto ( 6607976 ) writes:
      
      (Offtopic) Of course... that's if the 'other party' has a worthy candidate.
      (On topic) Why does this need AI? I could see maybe converting a scanned page or book to a text file that a computer can search through... but Adobe has been able to do that for quite a while (OCR). "AI" is not the answer to everything. Humans (last I checked, anyway) are capable of reading books and researching stuff.
      Simple search *can* work, if you know precisely what set of words you're looking for. AI-enabled search works well when it comes to documents. I have a PrivateGPT instance at home running on a not particularly high-spec machine for finding info in my fictional universe, currently somewhere around 400k words. Instead of having to remember precise word patterns, I can just ask, "What rank was $character on $date" or "what color eyes does $character have" or "what was the name of the captain on $shipname in th
      - Re: (Score:2)
        
        by ambrandt12 ( 6486220 ) writes:
        
        So... why do you need a LLM indexing your Google stories folder?
        I know what's in my stories folder, and which has what... I don't need to launch a search across all 8 drives on the tower to find one thing.
        It's not "AI"... it's predictive text (it's the same thing your phone does when you text someone... just on a bigger scale).
        It can be useful for certain things... but it's not going to replace "remembering where that file is" anytime soon.
        
        Re: (Score:3)
        
        by nightflameauto ( 6607976 ) writes:
        
        "So... why do you need a LLM indexing your Google stories folder?"
        I don't use Google anything for my written work. It's a private, local-only LLM.
        "I know what's in my stories folder, and which has what... I don't need to launch a search across all 8 drives on the tower to find one thing."
        So do I, in theory. But I can't tell you the number of times I've spent half an hour or more trying to track down precisely where in that massive tome of documents I mentioned this fact or that fact, when a machine-enabled search can find it in seconds.
        "It's not 'AI'... it's predictive text (it's the same thing your phone does when you text someone... just on a bigger scale)."
        I think we're well past the point where we can argue about the definition of AI. Words no longer mean things. It's marketing labels that mean things.
        "It can be useful for certain things... but it's not going to replace 'remembering where that file is' anytime soon."
        Actually, that's PRECISELY wh
  - Re: (Score:2)
    
    by nightflameauto ( 6607976 ) writes:
    
    eh, so Trump re-writes history, and fakes the GDP, inflation, and jobs numbers for awhile. I am still optimistic that in a year and a half, the other party will get control of the house, and I think Trump will be "toast" then. Then we are looking at 2 years of paralysis... then maybe someone better will come along that can move the US forward, instead of backwards.
    Forty years of stalemate/backwards/stalemate/backwards and you have hope? I admire your optimism, but I think the best we can hope for is a return to stalemate.

Who will own the results? (Score:2)

by david.emery ( 127135 ) writes:

Of course, all those books Google digitized are now behind Google's login.
- Re: (Score:2)
  
  by ambrandt12 ( 6486220 ) writes:
  
  Well... it's Google or a competitor... I can't think of a competitor other than M$.
  It depends on what that scanned book/article is behind... depending on copyright, it should (maybe has to) be available to the public.
  Whether it's Google or ArsTechnica or whoever... the actual thing needs to be public, no excuses, especially for rare stuff.
  Paywalled articles about new drugs or science papers shouldn't exist... maybe do a "publish the article, but contact info is paid-for" or something.
  For the people "complaining" abo

I have approximate knowledge of many things (Score:1)

by Won Ton Hammer ( 556262 ) writes:

Oh wonderful. In exchange for company access to the entire library catalogue, the public gets a shitty semantic search engine that can only work by probabilistically stringing words together. This will do nothing for people who need factual information.
- Re: (Score:2)
  
  by ndsurvivor ( 891239 ) writes:
  
  I think a researcher could ask the AI interesting questions, ask for a direct quote, and then get it. The AI should be able to give the page and paragraph number in the book. It can then be verified. This is the outcome I imagine.
  - Re: (Score:2)
    
    by ambrandt12 ( 6486220 ) writes:
    
    Sure... you can ask the "AI" to directly quote an ArsTechnica article... and it'll give you what you want. But you could just as easily go to the site itself and read the article... what did the "AI" do for you?
    It's a very basic search engine, that replies using (more or less) "plain text", and I'm sure there's a lot of censorship and filtering built into it.
    I can type something into a search engine and review the results, and click on the closest relative to what I wanted to find.
    Of course, I can find my article a
    - Re: (Score:3)
      
      by ndsurvivor ( 891239 ) writes:
      
      I think that you are boiling down what the current state of AI is: a search engine. That is what the point of the article is. In the future, I am hoping that it can do an emulation of understanding physics and material science, and can actually design rockets, computer chips, and robots.
      - Re: (Score:2)
        
        by ambrandt12 ( 6486220 ) writes:
        
        Let me boil it down for ya... the entirety of any of them is relying on search engines (which kind of defeats the whole purpose)
        Why do you need it to do emulations of physics and material science crap... you can't be bothered to do that stuff on your own?
      - Re: (Score:1)
        
        by Won Ton Hammer ( 556262 ) writes:
        
        If it's an LLM-powered search engine, then it's strictly worse than what we used to have, because it doesn't return information; it returns a statistically likely arrangement of words. Hallucinations are how LLMs work: they mash together words/tokens from their training set to give you a series of words that are likely to occur. "Training" is just a cover; the industry cannot solve a problem that is the foundation of how the system works.

more important (Score:2)

by groobly ( 6155920 ) writes:

The most important legal documents, however, are the annotated decisions explaining precedent and interpretation, which are very expensive to buy and available only in law libraries, which are closed to the public.
- Re: (Score:2)
  
  by ambrandt12 ( 6486220 ) writes:
  
  That would be a good use, because I don't wanna have to read a 500-page law document just to find that "someone VS someone" relied on some obscure law referenced in some other case 50 years ago... a version that gives the highlights of the first case, including mentions of the 'law referenced' (so you could try and look that up if you were so inclined) would be a good thing, instead of paying a bunch to buy the file and finding the information you want is actually in a different one.
  I doubt that'll happen,