Well, no one can guarantee that the same text they found at a link will be there the next time they follow it. It can simply have changed since the last time you saw it.
Now, building an index means that once you are done you can drop the source data and use that space for newer data, so you might not even have it any more. It's probably smart to keep the data anyway so the crawler can be less aggressive about re-downloading the whole internet, but that data isn't necessarily globally replicated and ready to serve; it might sit at a replication factor of barely 1.5, on hard disk drives that aren't publicly reachable because they are dedicated to running the indexing service, not to serving end users.
Could Google do it? Yeah, sure, but they are busy doing LLM things.
True, although only if the link is actually cached, which I find to be the case far more rarely than before. But it does work and it's still helpful...
Cool to see the Internet Archive getting some official promotion from a megacorp like Google. I hope there's also proportional financial (and legal?) support.
Doesn't work for me – probably one of those things that Google rolls out over the course of months, with no way of telling whether the feature flag is already enabled for you or not.
In the meantime, I'll keep using this handy bookmarklet: https://gist.github.com/n-st/0dd03b2323e7f9acd98e (which obviously only works for pages that are still available; for others, it requires copy-pasting the URL).
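For reference, a bookmarklet like that boils down to a one-liner. A minimal sketch (not the exact code from the gist) that sends the current page to the Wayback Machine:

    javascript:(function () {
      /* web.archive.org/web/<url> redirects to the newest
         capture of <url>; no API key or encoding needed for
         typical URLs. Drag this to the bookmarks bar. */
      location.href = 'https://web.archive.org/web/' + location.href;
    })();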
Also, Google/IA and I seem to have very different definitions of "easy":
> [...] conduct a search on Google as usual. Next to each search result, you’ll find three dots—clicking on these will bring up the “About this Result” panel. Within this panel, select “More About This Page” to reveal a link to the Wayback Machine page for that website.
The only thing that's missing is the "Beware of the Leopard" sign.
That said, it's one full page-scroll down, and that only after two clicks through not-too-obvious menus. Not exactly in-your-face in terms of discoverability.
Hopefully this is just a first step – keeping archived sites indexed and/or automatically forwarding people to IA links once the original goes down would be amazing!
Within reason, of course; there's possibly a point where it would be a bit "too discoverable" and make people exclude their site from archiving as a general precaution. (I could see people being generally fine with being archived, but not with being google-able forever.)
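Re the auto-forwarding idea: a client-side approximation is already possible with the Archive's public availability API. A rough sketch (the function name and flow are mine; the endpoint and JSON shape are the documented ones):

    // Fall back to the Wayback Machine when the original is gone,
    // using https://archive.org/wayback/available. Caveat: a plain
    // fetch from page JS is subject to CORS, so in practice this
    // belongs in a browser extension or a server-side proxy.
    async function waybackFallback(url) {
      try {
        const res = await fetch(url, { method: 'HEAD' });
        if (res.ok) return url;            // original still up
      } catch (_) { /* DNS/network error: treat as down */ }
      const api = 'https://archive.org/wayback/available?url='
        + encodeURIComponent(url);
      const { archived_snapshots } = await (await fetch(api)).json();
      const snap = archived_snapshots && archived_snapshots.closest;
      return (snap && snap.available) ? snap.url : url;  // best effort
    }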
The Internet Archive is amazing, and vulnerable: it's a single entity in a single jurisdiction that is unfriendly to this kind of effort. Are there efforts to duplicate it? For the book side (and some magazines) there is libgen and such. Is there something for the web side? Music, photos, software? Any current effort by the Internet Archive themselves?
A quick look now at archive.org didn't find much.
A hint that "partner institutions" can maintain local copies, some of which might be one-off copies that are not kept up to date.
There was an IA.BAK project.
There was a useful discussion 4 years ago, here:
https://old.reddit.com/r/DataHoarder/comments/h02jl4/lets_sa...
You can ask for the oldest version of any page that is on the Wayback Machine. That may or may not be the first posting, depending on the date. The further back in time you go, the spottier the archive becomes; the late 1990s and early 2000s are quite patchy. Just enter the URL into the Wayback Machine and you'll see two dates (first and last) and a timeline (all the archived versions).
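If you'd rather do this programmatically, the Wayback Machine's public CDX API returns captures oldest-first by default, so limit=1 yields the earliest snapshot. A quick sketch (the function name is mine; the endpoint and row layout are the documented ones):

    // Fetch the oldest Wayback capture of a URL via the public
    // CDX API (https://web.archive.org/cdx/search/cdx). Captures
    // come back oldest-first by default, so limit=1 is the earliest.
    async function oldestCapture(url) {
      const api = 'https://web.archive.org/cdx/search/cdx'
        + '?url=' + encodeURIComponent(url)
        + '&output=json&limit=1';
      const rows = await (await fetch(api)).json();
      // rows[0] is a header row; rows[1] is the first capture:
      // [urlkey, timestamp, original, mimetype, statuscode, digest, length]
      if (!rows[1]) return null;  // never archived
      const [, timestamp, original] = rows[1];
      return 'https://web.archive.org/web/' + timestamp + '/' + original;
    }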
I don't know about "easier than ever". Easier than it has been since Google killed their own version of this, which worked well until they killed it, sure.
My guess is that Google doesn't want to share the version of each page that they get served for indexing.
This is a step in the right direction, even though navigating the Wayback Machine often results in the tab crashing.
https://imgur.com/a/z4D8aDo