Saving and Verifying the Internet

I believe that user is pointing out OP's grammar.

Attached: 1465503543634.png (521x243, 136.87K)

Sounds like a personal problem to me.

But having those hashes is proof that the site possessed CP. Unlike the major sites, which receive only the hashes from the government, this site would have to generate them from the content itself. Even though that doesn't prove intent to obtain CP, and the site would delete the content immediately after hashing (two things that would make courts lenient), it would probably be enough for the feds to shut the site down.

...

Implementing the hash check client-side would keep the content from ever touching the site's servers, although a court could still construe the client-side code as part of the site owner's domain.
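For what it's worth, the client-side check itself is easy to sketch, assuming the client just fetches the page locally and only ever submits the digest (the URL below is a placeholder, and I'm not claiming anything about what the verification endpoint would look like):

import hashlib
import urllib.request

def hash_page(url: str) -> str:
    # Fetch the page locally and hash the raw bytes. The content never
    # leaves the client; only this digest would be submitted for
    # third-party verification later.
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
    return hashlib.sha256(body).hexdigest()

print(hash_page("https://example.com/"))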

That's true, although OP said they wanted some centralized way to verify the hashes. I guess it's either a central server prone to takedown or decentralized verification prone to poisoning. It's poz all the way down.

Another way to think about this: just as reproducible builds should produce an identical executable every time, you'd have reproducible page archives, with one copy's hash held on a central server for third-party verification. Multiple clients archiving the same page at the same time should end up with the same archive.
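A toy version of the normalization step that would require, assuming you only strip the obvious volatile stuff (scripts, comments, whitespace) before hashing. This is nowhere near a real DOM walk, which is exactly the problem the next reply gets into:

import hashlib
import re

def canonicalize(html: str) -> bytes:
    # Rough pass: drop script blocks and HTML comments, collapse
    # whitespace, so two clients archiving the same page are more
    # likely to end up with identical bytes.
    html = re.sub(r"<script\b.*?</script>", "", html, flags=re.S | re.I)
    html = re.sub(r"<!--.*?-->", "", html, flags=re.S)
    html = re.sub(r"\s+", " ", html)
    return html.strip().encode("utf-8")

def archive_hash(html: str) -> str:
    return hashlib.sha256(canonicalize(html)).hexdigest()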

Even with a bunch of back and forth between clients (or the server) to strip out differences, there's no guarantee it will catch all of them, even with the DOM walking; two clients might agree while a third, fourth, and fifth don't. Clients could also poison each other:
Client 1: I don't have this or it's different
Client 2(or server): Okay let's delete this then, does it match now?
Client 1: It matches now.
(The deleted content was the important thing all along, and it has now been intentionally removed.)
This could happen easily even with a text-only mode.
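To spell that failure out, here's a toy reconciliation in Python. The "blocks" are made up; the point is that deleting whatever the other client claims not to have always converges on agreement, even when what got deleted was the whole reason for archiving:

import hashlib

def digest(blocks):
    return hashlib.sha256("\n".join(blocks).encode()).hexdigest()

def naive_reconcile(mine, theirs):
    # "Strip until we agree": any block the other client doesn't report
    # gets thrown away. A lying (or just unlucky) peer can silently
    # poison the result.
    return [b for b in mine if b in theirs]

client1 = ["header", "the important post", "footer"]
client2 = ["header", "footer"]  # claims the important post isn't there

merged = naive_reconcile(client1, client2)
assert digest(merged) == digest(client2)  # hashes "agree" now...
print(merged)                             # ...but the important post is gone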

The only way to do this is going to be the simpler alternative:

The server generates the "official" version, offers a one-time download, and then keeps the hash for verification later. I don't think this is especially takedown-resistant. Legally, offering copyrighted content for download once is the same as offering it for download a million times. If a site wanted to get butt-hurt, all it would take is one site with one lawyer and it's game over, unless a bunch of money got dumped into legal fees to fight it.
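Bare-bones sketch of what that server would have to keep around (class and method names are mine, nothing here is from an existing codebase):

import hashlib
import secrets

class OneShotArchive:
    # The server builds the "official" copy, hands it out exactly once
    # via a random token, and afterwards only answers hash checks.
    def __init__(self):
        self._pending = {}  # token -> archive bytes, until downloaded
        self._hashes = {}   # token -> sha256 hex, kept for verification

    def archive(self, content: bytes) -> str:
        token = secrets.token_urlsafe(16)
        self._pending[token] = content
        self._hashes[token] = hashlib.sha256(content).hexdigest()
        return token

    def download_once(self, token: str) -> bytes:
        # pop() means a second request for the same token fails;
        # only the hash stays on the server after this.
        return self._pending.pop(token)

    def verify(self, token: str, content: bytes) -> bool:
        return hashlib.sha256(content).hexdigest() == self._hashes.get(token)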

Google and archive seem to be running with the excuse that they are a "direct agent of a human user", which allows them to ignore robots.txt and, I guess, copyright?

The question then is whether the "simpler alternative", offering one-time downloads and hash verification, is worth building.

Attached: archive_excuse.png (596x274 11.96 KB, 26.25K)

A client-side hash check that still allows any form of verification would require checking that the client itself hasn't been altered, which sounds like DRM. The client could act as a simple proxy for the server when it grabs the page, but that doesn't sound appealing at all.

Just drop it already; it simply doesn't work with non-static (shit) sites. Any workaround you come up with will be too complex.

Maybe I'll just start saving pages to .TXT files with "lynx -dump" (possibly also with -nolist, if links aren't needed).
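Something like this would do it, just wrapping lynx from Python (assumes lynx is installed; the URL and filename are placeholders):

import subprocess

def save_text(url: str, path: str, keep_links: bool = False) -> None:
    # -dump writes the rendered page as plain text; -nolist drops the
    # numbered link list at the bottom of the dump.
    cmd = ["lynx", "-dump"]
    if not keep_links:
        cmd.append("-nolist")
    cmd.append(url)
    text = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

save_text("https://example.com/", "example.txt")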