Universal Decentralized Database

Due to the unreliable nature of information on the Internet, I've been thinking about something I haven't seen mentioned anywhere: a decentralized universal database of everything. Is there a project attempting to do such a thing?

The idea here is to fill it with data from all around. It won't be a Wikipedia, though it will surely link to it quite a lot. If a datapoint (say, a URL or an article) has conflicting versions, then all of those can be collected, but the validation system will ensure no one can store a fake quote about someone and have it propagate to the nodes.

Basically the tenets would be along these lines:

* Mostly decentralized, with some servers acting as coordinators to reduce the amount of spam/crap. If those main servers go down, the network can still function.

* Many UIs, both native and web, easy enough that normies can't fuck up anything other than the info they pump into the DB.

* A system of reviews/validation in which the whole community can rate the trustworthiness of the data.

* Sharding the whole DB into manageable chunks. Nodes can choose how many GB of data they want to store and how much bandwidth they're willing to provide.

* Maybe use blockchain to validate new blocks of data ingested into the system.

* Hashes of all media stored on it or referenced by it.

* Autochecks for major websites. A stored tweet will automatically get compared to the copy on Twitter's servers; same for Facebook, YouTube, Wikipedia, etc. (rough sketch below).
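To make the autocheck tenet concrete, here's a minimal Python sketch. The fetch function is a placeholder (real autochecks would go through the Twitter/Facebook/Wikipedia APIs or scrape the page), and the normalization is just one plausible choice:

```python
import hashlib

def canonical_hash(text):
    """Hash a whitespace-normalized copy so trivial formatting changes don't trip the check."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def autocheck(stored_text, fetch_live_copy):
    """Compare our stored copy against whatever the origin server returns right now.

    fetch_live_copy is a placeholder for an API call or scrape of the original site.
    """
    return canonical_hash(stored_text) == canonical_hash(fetch_live_copy())

if __name__ == "__main__":
    stored = "Example tweet text as archived."
    ok = autocheck(stored, lambda: "Example tweet text as archived.")
    print("still matches origin" if ok else "DIVERGED: flag for review")
```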

Anyway, I hope you get the gist of it. Of course I'm already running into many problems, such as my lack of knowledge in databases and the time it will take to build all this (not to mention that I'm almost a pajeet in programming proficiency). Scalability is also a huge issue that I have no clue how to overcome.

I know there are some technologies that will really help in creating this universal database, such as IPFS and torrents to propagate confirmed chunks. Maybe even stuff like Storj or Ethereum as a reference in case of sabotage (i.e. check hashes periodically stored in Ethereum whenever there are conflicting versions).
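The conflict check against an anchored hash could be as simple as this sketch. Here the anchored digests live in a local dict standing in for whatever Ethereum contract read would actually be used:

```python
import hashlib

# Stand-in for digests periodically anchored on Ethereum; in reality this
# would be a smart-contract lookup, not a local dict.
ANCHORED_DIGESTS = {
    "chunk-0001": hashlib.sha256(b"original chunk contents").hexdigest(),
}

def verify_chunk(chunk_id, local_chunk):
    """On conflicting versions, trust whichever copy matches the anchored digest."""
    return hashlib.sha256(local_chunk).hexdigest() == ANCHORED_DIGESTS.get(chunk_id)

print(verify_chunk("chunk-0001", b"original chunk contents"))  # True
print(verify_chunk("chunk-0001", b"tampered chunk contents"))  # False
```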

The rationale behind this is that I'm fucking sick of not being able to have a basic semblance of confidence every time I read something on the Internet.

I know there's stuff like Everipedia, but that's not what I'm aiming for here. This is not an encyclopedia. This is a reference for all (politically) relevant datapoints, guaranteed with a decent level of confidence not to have been tampered with since it was stored, and where any crap stored on it gets rated as such.

Every time some retard "ideaguy" like you talks about shit like this, they are utterly incompetent. The people who can actually code and do practical shit could (and should be able to) think up your shitty thread in 2 seconds; it's actually doing it that's HARD. Go grab a book and start implementing that shit yourself. Dickhead.

Which one?

Depends on what you currently know and what knowledge you want to end up with. Just pick one that looks interesting; if it sucks, so be it, get another one; if it's good, good for you. All of these books are available for free. I still can't understand how people can consciously decide not to read when they clearly have an interest. Use the internet, for fuck's sake; it's probably your greatest asset at the stage you're at right now.

I'm constantly reading. And I'm aware implementing is the hard part. I'm also aware this will be a years-long project. I'm also willing to pay someone for the parts I don't care about learning to implement. But I don't really know where to start.

What kind of DB would you use? Would you go for SQL or NoSQL? The only thing that comes close to what I'm picturing is something from Apache, but even there there are plenty of DB systems.

Should I learn Cassandra as a starting point? I'm already familiar with SQL but I don't think it will be that useful here. I don't want to start pumping stuff into a random MySQL DB only to realize down the road it was never the proper tool for this task.

OP, everything you want can be done with IPFS, except the automatic caching of web content. Someone needs to make a webcrawler that crawls and stores webpages as IPFS/IPNS addresses.

We used to talk about it in the early '90s in tech, before the web caught on. The idea was that you could send out 'agents' (small pieces of code) that could look through another computer's files for desired information and report back. You'd be charged for the resources the agent used on the machine it ran on, and you'd pay the machine's owner. So you could do a global search of any complexity with enough money. Unsurprisingly, the professor I worked with on this for a bit was a Jew, and everything revolved around how to implement decentralized payment. You could probably dig up those old CS papers and have a look, since with buttcoins we now have the missing piece.

This project will probably have IPFS as a key component, but it will also contain metadata about files or resources too big to cache completely. The rating system (which will help decide whether something is true or fake) is also not implemented in IPFS, as far as I know.

But yeah, maybe what I'm thinking of can just be extended from IPFS instead of being a standalone project.


Any idea how I can look for those? Any keywords or names would be appreciated.

sounds like IPFS

You just described exactly what Freenet is.
/thread

why does it have to be universal and a database? fuck you

Do you understand what a database is or am I failing in my explanations?

Please show me how I can query all Trump quotes said between 2003 and 2008, with references to all of them, in IPFS or Freenet.


It doesn't have to be a database. Open to suggestions. Anything to share?
It has to be universal because it will be open to anybody except malicious actors.

Easy. Have the person who initially stores the tweets create an index mapping each tweet to metadata (like a timestamp). You can then use the index to query the info. It's not possible to query the information directly because data is identified only by its hash; hash-based databases are key-value databases.
What you really want in your database is an index of tag metadata for each file. At that point it all basically becomes a JSON datastore.
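Something like this, as a minimal Python sketch (the hashes and metadata fields are invented):

```python
import json

# Metadata index: IPFS hash -> tags. The hashes and entries here are made up.
index = {
    "QmFakeHash1": {"author": "trump", "type": "quote", "date": "2005-06-01"},
    "QmFakeHash2": {"author": "trump", "type": "quote", "date": "2011-03-14"},
    "QmFakeHash3": {"author": "someone_else", "type": "article", "date": "2006-01-09"},
}

def query(author=None, kind=None, date_from=None, date_to=None):
    """Filter the index; IPFS itself can't do this, but the index layer can."""
    for ipfs_hash, meta in index.items():
        if author and meta["author"] != author:
            continue
        if kind and meta["type"] != kind:
            continue
        if date_from and meta["date"] < date_from:
            continue
        if date_to and meta["date"] > date_to:
            continue
        yield ipfs_hash, meta

# "All Trump quotes between 2003 and 2008" then becomes:
for h, meta in query(author="trump", kind="quote",
                     date_from="2003-01-01", date_to="2008-12-31"):
    print(h, meta)

# The index is plain JSON, so it can itself be published on IPFS.
print(json.dumps(index))
```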

Right, but what I mean is that I can't do that right now by querying IPFS, because it lacks such a function. And that's the kind of functionality I'm looking to create with this universal database. As I understand it, IPFS doesn't contain a queryable database of atomic datapoints tied to non-automatable metadata.

I don't want to reinvent the wheel though so thanks for your input.

I just told you how to do it. All you need to do is create a metadata index of files you add.

Nigger you have no clue what you're talking about.

Bots galore.

So let's say the database you envision is complete. You have access to it, and the information you want is all Trump quotes said between 2003 and 2008, with references to all of them. Is that the exact query you enter into your database? If not, what is? And what do you expect the database to return to you?

Wouldn't it be much easier to start from the torrent format and just extend it to work better in cases where you want to work with a local torrent without having to copy the whole thing, and implement a way to authenticate versions of files that supersede each other, a la packages?

I just figured out a way for OP to do that. Use IPFS to store the content, but then use >>>/hydra/ to sort it. For example, the Trump quotes: you would use a giant webcrawler to obtain the data off the internet, feed it into hydra, which automatically tags and sorts it (such as by Trump quotes), and then store it using IPFS.

Now, this is a gigantic undertaking, as you would have to crawl a lot of pages (almost NSA levels of storage) to store the initial IPFS hashes. You would have to make your own IPFS CDN, or several, based on content: an IPFS CDN for Trump quotes, one for cookbooks, etc. With hydra automatically sorting it, all you have to do is download the hydra metadata files, look for what you want, and then download it using IPFS.
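Roughly, as a Python sketch: this assumes the ipfs binary is installed, the URL is a placeholder, and the tagging function is a trivial stand-in for what hydra actually does:

```python
import subprocess
import urllib.request

def crawl(url):
    """Fetch one page; a real crawler would follow links, throttle, dedupe, etc."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def ipfs_add(data):
    """Pipe content into a local IPFS node; -Q prints only the resulting hash."""
    proc = subprocess.run(["ipfs", "add", "-Q"], input=data,
                          capture_output=True, check=True)
    return proc.stdout.decode().strip()

def tag(data):
    """Trivial stand-in for hydra-style automatic tagging."""
    return ["trump_quotes"] if b"Trump" in data else []

if __name__ == "__main__":
    page = crawl("https://example.com/")  # placeholder URL
    print(ipfs_add(page), tag(page))      # hash + tags feed the metadata index
```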

Whoops, I fail. It's called >>>/hydrus/, not hydra. The developer of hydrus is doing it as a pet project though, and it's meant to sort anything file-wise. From their website:
The hydrus network client is a desktop application written for Anonymous and other internet-enthusiasts who have large media collections. It organises your files into an internal database and browses them with tags instead of folders, a little like a *booru on your desktop. Tags and files can be anonymously shared through custom servers that any user may run. Everything is free, nothing phones home, and the source code is included with the release. It is developed mostly for Windows, but reasonably functional builds for Linux and OS X are available.

Currently importable filetypes are:
images - jpg, gif (including animated), png (including animated!) and bmp
audio - mp3, flac, ogg and wma
video - webm, mp4, mpeg, flv and wmv
misc - swf, pdf, zip, rar, 7z
I am sure it supports text too since it supports pdf.

So you could webcrawl all of Trump's twitter history and sort it into a tag called "trump_quotes", for example. Then upload all that history to IPFS and upload the hydrus metadata file with it. All the user has to do is install IPFS (or a browser addon for it), then install hydrus and download the hydrus database file for Trump quotes.

But it gets better, because you aren't limited to one tag. Say you wanted to sort Trump quotes by type: you could tag them "trump_quotes" and also make tags for "funny", "random", "emotional", or whatever else you want. This sorts images too, so you could throw those into the mix along with all the files hydrus supports.

You have a problem though: you need the storage space for downloading all those files with trump_quotes in them. You could sidestep this by making a program that hashes the trump_quote file (text, imagery, etc.) and then, when a user requests the hash from IPFS, downloads it from the clearnet on the fly. That would be exchanging storage space for internet bandwidth and speed, which is fine as long as you don't have more users than your network connection can handle wanting different things at the same time.
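That trade-off could look like this sketch (the hash-to-URL map is hypothetical; the point is verifying the clearnet copy against the stored hash before serving it):

```python
import hashlib
import urllib.request

# Hypothetical map from a content hash to its clearnet source.
FALLBACK_URLS = {
    # "<sha256 hex digest>": "https://twitter.com/...",
}

def fetch_on_demand(content_hash):
    """Instead of storing the file, fetch it from the clearnet when requested,
    and refuse to serve it if it no longer matches the stored hash."""
    url = FALLBACK_URLS[content_hash]
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    if hashlib.sha256(data).hexdigest() != content_hash:
        raise ValueError("clearnet copy no longer matches the stored hash")
    return data
```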

Your big problem is getting enough people interested in "trump_quotes" to download the hydrus metadata file from IPFS and then download the quote itself. Since IPFS distributes the data, you no longer have to host the quote yourself; you can just link to it on the clearnet as a fallback CDN. But you need people to use it on something they care about, like porn, for it to be shared in the first place.

I could see this working for very popular niches (oxymoron, I know), since users have to be technically apt, or you'd have to bundle IPFS+hydrus+GUI in a .exe for normalfags to even use it.

You want a futuristic database, with infinite depth and instant querying?
Sounds nice, but don't you think an app that lets you send nudes and funny pictures in realtime is more interesting?

Thanks for all the info. I'm really happy someone has been working on it for some time now.

Here's the thing: I'm not interested in hosting video or images in this database. I'm mostly interested in text, and text is easily compressible. I bet all of Trump's tweets, 7-zipped, could fit on a floppy disk.

Those exist and are willing to host content for free.

Gonna study up on hydrus now.

That larp...

lol

"Cocksucking For Retards"

Cut him some slack; he seems willing to learn, which is better than most.

I guess they first need to implement the Internet Archive to IPFS thingie: archiveteam.org/index.php?title=INTERNETARCHIVE.BAK

For the time being I think I'll focus on hydrus. It has a lot of what I want, even if it's currently buggy or feature-incomplete. Its dev also updates it at a more than reasonable pace, which is great.

There's a project called "tauchain" that approximates what you're aiming at. It's not just data but logic. It uses blockchain for consensus and to incentivize its population. I haven't checked the status of the project in years, but a quick Google search shows that it's been rolled into one of those generic crypto-coin ICO-style websites.

Even a few years ago I saw multiple projects based around donating your node to a collective for implicit distribution. IPFS nodes are easy to control and have multiple methods of connecting peers directly or via pubsub groups: publicly via the DHT and privately via libp2p sockets.

You could easily build a Freenet-style distribution system around IPFS and its pubsub system.
Publish a stream of hashes with low peer counts, have nodes subscribe to it, and do some evaluation to determine whether they should store the content themselves, most likely based on local metrics.
Or really, you could do it however you wanted.
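For example, a rough Python sketch over the ipfs CLI. The topic name and the announcement format are invented, the storage metric is just an illustration, and pubsub is an experimental IPFS feature:

```python
import json
import shutil
import subprocess

TOPIC = "low-peercount-hashes"  # invented topic name

def should_pin(announcement):
    """Local policy: pin rarely-held hashes if we have disk to spare."""
    free_bytes = shutil.disk_usage("/").free
    return announcement["peercount"] < 3 and free_bytes > 10 * 1024**3

def listen_and_pin():
    """Subscribe to the topic and pin whatever our local metric approves of."""
    proc = subprocess.Popen(["ipfs", "pubsub", "sub", TOPIC],
                            stdout=subprocess.PIPE)
    for line in proc.stdout:
        ann = json.loads(line)  # e.g. {"hash": "Qm...", "peercount": 1}
        if should_pin(ann):
            subprocess.run(["ipfs", "pin", "add", ann["hash"]], check=True)

if __name__ == "__main__":
    listen_and_pin()
```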

I have a feeling people are going to fork the Filecoin project for this purpose. It would give you everything you need to coordinate storing, and to verify that data is actually being stored and distributed, but you could remove the tokens/cost and use your own metrics for choosing nodes (instead of the cheapest bidder).
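Swapping the auction for your own metric could be as simple as this (Python sketch; the node records and their fields are invented, standing in for whatever accounting a Filecoin fork would actually keep):

```python
# Invented node records; a fork would get these from its own accounting.
nodes = [
    {"id": "nodeA", "uptime": 0.999, "bandwidth_mbps": 50, "failed_audits": 0},
    {"id": "nodeB", "uptime": 0.90, "bandwidth_mbps": 500, "failed_audits": 2},
    {"id": "nodeC", "uptime": 0.98, "bandwidth_mbps": 100, "failed_audits": 0},
]

def score(node):
    """Rank nodes by reliability instead of cheapest bid."""
    return node["uptime"] * node["bandwidth_mbps"] / (1 + node["failed_audits"])

# Pick the best-scoring nodes to hold replicas of a chunk.
replicas = sorted(nodes, key=score, reverse=True)[:2]
print([n["id"] for n in replicas])
```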