Semantic Filesystem

I'm working on a FUSE filesystem that represents file tags as dynamically
nested directories. It's heavily inspired by Tagsistant. I would have been
happy with Tagsistant if it didn't segfault and fill my console with
errors every time I access the mounted directory.

The idea is that you can tag a file by copying it to a path with tags in it.
[code]
$ cp goblin_slayer_s1e1.mkv mnt/mkv/anime/rape/@
[/code]

You can get all files with the given tags by listing a path with those tags in it.
[code]
$ ls mnt/anime/tiddies/@
[/code]
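Here is a minimal sketch of how paths like these might be split into tags and a filename. It assumes `@` marks the end of the tag list and that at most one filename follows it. `parse_tag_path` is a hypothetical helper, not code from the tagfs repo.

```python
def parse_tag_path(path):
    """Split a path relative to the mount point into (tags, closed, filename).

    `closed` is True once the "@" terminator has been seen; `filename` is the
    component after "@", or None for a plain listing like "anime/@".
    """
    # Drop empty components so "/a//b/" and "a/b" parse the same way.
    parts = [p for p in path.split("/") if p]
    if "@" not in parts:
        # Still choosing tags; nothing to list or write yet.
        return parts, False, None
    end = parts.index("@")
    rest = parts[end + 1:]
    if len(rest) > 1:
        raise ValueError(f"expected at most one name after '@': {path!r}")
    return parts[:end], True, (rest[0] if rest else None)


print(parse_tag_path("/mkv/anime/@/goblin_slayer_s1e1.mkv"))
# (['mkv', 'anime'], True, 'goblin_slayer_s1e1.mkv')
```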

So far, my filesystem can only read files and copy files in from outside the filesystem. I plan
to add the following features someday:

* copy from one tag to another
* modify files
* delete files or tags
* ontological reasoning (anime is show)
* triple (machine) tags (show:episode:1)
* negate tags (ls mnt/anime/-gayshit/@)
* or-ing tags (ls mnt/anime/ass/+/anime/tiddies/@)
* multithreading
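The planned negation and OR syntax could be evaluated by treating `+` as a separator between alternative groups, and a leading `-` as an exclusion inside the current group. This is a hypothetical in-memory sketch, and the tag names are made up. A real implementation would more likely translate the groups into SQL.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    """One AND-ed group of tags; a query matches if any group matches."""
    include: set[str] = field(default_factory=set)
    exclude: set[str] = field(default_factory=set)


def parse_query(parts: list[str]) -> list[Group]:
    # "+" starts a new OR-ed group; "-tag" excludes a tag within the current group.
    groups = [Group()]
    for part in parts:
        if part == "+":
            groups.append(Group())
        elif part.startswith("-") and len(part) > 1:
            groups[-1].exclude.add(part[1:])
        else:
            groups[-1].include.add(part)
    # Ignore empty groups produced by a leading, trailing or doubled "+".
    return [g for g in groups if g.include or g.exclude]


def matches(file_tags: set[str], groups: list[Group]) -> bool:
    return any(
        g.include <= file_tags and not (g.exclude & file_tags) for g in groups
    )


# anime/-horror/@  ->  anime AND NOT horror
no_horror = parse_query(["anime", "-horror"])
# anime/comedy/+/anime/action/@  ->  (anime AND comedy) OR (anime AND action)
either = parse_query(["anime", "comedy", "+", "anime", "action"])
```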

Tags are stored in an SQLite database with file hashes. Files are stored in
the Tagfs home directory in subdirectories named after their hash. Files may
have multiple names, which are stored as hardlinks to the original data file.
The reason I store the names as hardlinks is so they can be easily copied out
of the database if it ever becomes corrupted.

[code]
$ tree .local/share/tagfs
.local/share/tagfs
├── 25
│   └── c
│       └── fe17c3006dd3f5d5f46a8a54d699c
│           ├── data
│           └── names
│               └── spooked_cat.gif
├── 99
│   └── 4
│       └── d9b168731bf38e93de20ead7600cb
│           ├── data
│           └── names
│               └── wiggle_cat.gif
└── tagfs.sqlite3

8 directories, 5 files
[/code]
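The layout above can be reproduced in a few lines of Python. This sketch makes two assumptions: the hash is MD5 (the directory names add up to 32 hex digits, and a later post says OP uses MD5), and the store lives on a single filesystem so hardlinks work. `hash_dir`, `file_md5` and `store` are hypothetical names, not the repo's code, and the SQLite side is left out.

```python
import hashlib
import os
import shutil
from pathlib import Path


def hash_dir(home: Path, digest: str) -> Path:
    # Fan out on the first three hex digits: "25cfe17c..." -> home/25/c/fe17c...
    return home / digest[:2] / digest[2] / digest[3:]


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    # Hash in chunks so large video files are never read into memory at once.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def store(home: Path, src: Path, name: str) -> Path:
    """Copy src into the store once, then add `name` as a hardlink to its data."""
    d = hash_dir(home, file_md5(src))
    data, names = d / "data", d / "names"
    names.mkdir(parents=True, exist_ok=True)
    if not data.exists():
        shutil.copyfile(src, data)
    link = names / name
    if not link.exists():
        os.link(data, link)  # hardlink: same inode, no extra copy of the bytes
    return d
```

Storing each name as a hardlink to `data` is what lets the files be copied straight out of the tree if the database is ever lost, as described above.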

Is this something you'd use? Have any suggestions or code review?

Here's a link to my GitLab repo.
gitlab.com/elitist_neckbeard/tagfs

Here's a link to Tagsistant which inspired my project.
tagsistant.net/

I fucked up my code markup.

You had me until I saw that it was GPL3 cancer. Public domain that shit faggot.

Write and license your own code, faggot.

I don't know my licenses that well. I know that GPL requires that the code always be open source and anything that uses the code must also be open source. That seems to give more freedom to the end user.

The MIT and BSD licenses allowed Intel to hide MINIX in their processors, and Apple to build OS X on BSD.

I'm not opposed to changing, but I'd like to hear what others have to say about it.

I agree. And by the same train of thought, none of us should complain about Linux or FreeBSD getting a CoC.

Gno.

Public domain isn't a license that applies correctly for software. It's even worse than MIT/BSD.
You're probably a political/religious npc who's following the sqlite license because of its coc.


GPLv3 basically means that:
-If you distribute a binary, you have to make the source code available too, in a form customarily used for sharing software (a CD, a download, etcetera).
Example: if you distribute your binary to a company, you have to give that company the source, but you aren't forced to make it public. If you put a binary on the internet in a publicly available space, then you have to make the source code available publicly as well.
-You can't tivoize: if you ship the software inside a device, you can't stop people from replacing it (if the device checks signatures, people must be given a way to install their own builds or remove the check).
-People can read, edit, compile and execute the source code however they want.
The GPL gives the same freedoms to users and developers (who are users too, btw), so none of them can get technically stuck because of legal inanities.
Apart from the old BSD license with the advertising clause, the MIT and BSD licenses are alike: people can do whatever they want, even restrict other users. They give developers and companies the right to restrict users.
I hope this helped you. If you have questions, just ask; I'll gladly answer them.
Most major licenses are compatible with GPLv3 (and exceptions can be made if the developers explicitly state it); only some, like the Open Watcom license, are not.
See: gnu.org/licenses/license-list.html


Implying that we are forbidden to talk about it.

Why did I think I could make an OP on Zig Forums without getting flooded with bikeshedding? Is there anywhere on the internet I can discuss Zig Forums without getting the tranny CoCk or this gay shit?

GPL users are cucks, BSD developers are cucks. It is that simple. Write proprietary, use public domain, to be the least cucked.

So OP, going over your list there, you have implemented exactly nothing. Just another Zig Forums project that will never amount to anything. Even cuckchan got Tox going, slowly; nothing started here ever gets done, just "durr durr look at me, I wrote the readme."

Reads like shit. I'd rather interface directly with pgsql and put the tags in a table. Plus, daemonizing might be the optimal choice, since you could interface it with any other program on your system, including your FS.

Fuck off SQL shill.

Currently, files can be written to tags and queried by tag. That's enough to host an FTP server with tags for directories. Deletion and copying between tags shouldn't take me long, and then it will have all its basic functionality, and I can focus on stuff like ontology and triple tags.

bloat, can be done way easier OP tbh.
But thanks for the idea, I'll try to improve it.

I mean after all, that's the power of open source.

Post a link here when you're done.

nice
epic, can i delete a hard link to delete the file or do i have to interact with the database and do sql magic (fine by me but not fine for retards)

You're posting cartoons on a gook message board run by some weirdo freemason who stole the board from a dying cripple. WHAT DID YOU EXPECT?

OP, have you checked out TMSU? It's basically what you just described here, except the file system is read-only and it's all symbolic links.

Your brain must be on acid.
I'd just set up a single read-only user on pgsql and be done. If I needed to, I'd just set up nginx and modulate from there. Make a simple *booru and done.

You'd need to delete the file hash from the database, then delete the whole directory that all the hardlinks are in for that file.

I'm not sure how I want to implement deletion yet, which is why I haven't finished it. I have some ideas you can add to.

* When a file is removed from all tags, it's deleted from the database.
* Files can't be removed from the filesystem normally. An admin has to use a tool that cleans files without tags, and can back them up
* Files can be moved to a special "trash" tag that either deletes them immediately, or marks them for deletion by the admin tool.
* All files are in a special "all" tag; deleting them from that tag deletes them for real.

Any suggestions?

If you're going to try deleting things now, remember to turn on the foreign_keys pragma in SQLite, since that DBMS doesn't enforce foreign keys (and so doesn't cascade deletes) by default. With the tag table's foreign key declared ON DELETE CASCADE, you can just remove the hash from the file table, and all its tag entries will be deleted too.
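Here is a minimal sketch of that with Python's `sqlite3` module. Both pieces matter. The foreign key has to be declared with `ON DELETE CASCADE`. `PRAGMA foreign_keys = ON` also has to be issued on every new connection, because SQLite leaves foreign key enforcement off by default. The `files`/`file_tags` schema is hypothetical, not the one in the tagfs repo.

```python
import sqlite3

HASH = "25cfe17c3006dd3f5d5f46a8a54d699c"


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Per-connection setting; without it the REFERENCES clause is not enforced.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS files (hash TEXT PRIMARY KEY);
        CREATE TABLE IF NOT EXISTS file_tags (
            hash TEXT NOT NULL REFERENCES files(hash) ON DELETE CASCADE,
            tag  TEXT NOT NULL,
            PRIMARY KEY (hash, tag)
        );
        """
    )
    return conn


conn = open_db()
conn.execute("INSERT INTO files VALUES (?)", (HASH,))
conn.executemany("INSERT INTO file_tags VALUES (?, ?)", [(HASH, "gif"), (HASH, "cat")])
# Deleting the file row removes its tag rows too.
conn.execute("DELETE FROM files WHERE hash = ?", (HASH,))
remaining = conn.execute("SELECT COUNT(*) FROM file_tags").fetchone()[0]
```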


I haven't heard of that. Is that from the other thread here, by the user who made a booru by accident? I think the symlinks-in-directories method isn't scalable. It requires a file write for each tag a file has, and a tag with many files will be slow to search through. See the directory tree I posted in the OP? The first three characters of the file hash are used as directories (two, then one), so the root directory never has more than 256 hash subdirectories.

elaborate or stop posting

Ah blooblooobloo, muh ancap rice.


Don't listen to the suckless faggot, GPLv3 is what you want.
If anyone wants to see the end result of too permissive license look at the bitching of Antirez; "OH GOD I SAID THEY COULD DO ANYTHING AND THEY DID, THEY'RE NOT EVEN CONTRIBUTING BACK!".
GPL solves this problem.

Sure, will do in a few days since I've got school shit to do, but I think you could tag your files just by organizing them in folders.

see my response here

There are scalability issues with simply putting symlinks in directories. If you look at the overhead, an SQL database is less bloated than symlinks in directories, albeit more complicated. There are fewer reads and writes, and listing a directory with too many files in it is slower than querying a database with that many entries. I also can't imagine how to implement ontology or triple tags when using symlinks in directories.
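For comparison, here is one way the "all files with every tag in the path" listing could be answered with a single query over a hypothetical `file_tags(hash, tag)` table. The query keeps the rows whose tag appears in the path, groups them by file, and requires that every requested tag was seen. The extra index on `(tag, hash)` is an assumption that would likely help SQLite avoid scanning the whole table.

```python
import sqlite3


def files_with_all_tags(conn: sqlite3.Connection, tags) -> list:
    tags = sorted(set(tags))  # duplicates in the path shouldn't change the count
    if not tags:
        return []
    placeholders = ", ".join("?" * len(tags))  # "?, ?, ..." (values are bound, not pasted)
    sql = f"""
        SELECT hash FROM file_tags
        WHERE tag IN ({placeholders})
        GROUP BY hash
        HAVING COUNT(DISTINCT tag) = ?
        ORDER BY hash
    """
    return [row[0] for row in conn.execute(sql, [*tags, len(tags)])]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file_tags (hash TEXT, tag TEXT, PRIMARY KEY (hash, tag))")
conn.execute("CREATE INDEX file_tags_by_tag ON file_tags (tag, hash)")
conn.executemany(
    "INSERT INTO file_tags VALUES (?, ?)",
    [("aa", "anime"), ("aa", "mkv"), ("bb", "anime"), ("cc", "mkv")],
)
print(files_with_all_tags(conn, ["anime", "mkv"]))  # ['aa']
```

With symlinks in directories, the same listing would need to read every tag directory in the path and intersect the results in the filesystem layer.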

TMSU's backend is sqlite3, and the symlinks are generated on the fly when you browse the virtual file system. An OK workflow is:

1) Download file into some directory structure
2) Use a `tmsu tag myfile tag1 tag2 tag3` command to tag the file
3) When you want to find your file, you navigate to `/path/to/tmsu-vfs-mount/queries/"(tag1 or tag2) tag3 not tag4"` and you see a bunch of symlinks to relevant files.

You can even tell TMSU that tag1 implies tag2. The only real drawback imo is that you have to manually tag files using commands (there's no writing to the virtual file system).

Basically, just consider hacking on github.com/oniony/TMSU instead of rewriting everything from scratch. It's written in Go, and it works on Alpine as long as you remove the GNU-specific extensions from the makefile.

Start over from scratch, in C, or fork an existing simple FS (like OpenBSD's FFS) and toy with the file table code instead, and use the BSD two-clause.
On second thought, you don't seem ready. Good luck nooblet!

sqlite is the least pozzed and most highly tested DB ever. wtf do you have against it?

It's so reliable the US military uses it, it's an actually successful FOSS project that puts Linux to shame, it has the best license I have ever seen, and the entire dev team is both politically sane and gifted with a good sense of humor (see their CoC).

blake2.net/


he is probably an atheist

Blake2 is interesting, but SHA-1 is more tested. In any case, if this wants to be anything more than a toy project, it's clear that hashes alone won't cut it in the long term.

Python was made to integrate native components, and that's how I'm using it. All the heavy computing is done by sqlite3 and I'm sure the hashing function is also written in C.


I'll take a look at it. I want to avoid collisions though.


What else do you think I'll need?

tech thread, sage

Actual retard spotted. How is it relevant that SHA-1 is more tested? You are aware that OP uses MD5 as a non-cryptographic checksum, right?
Also SHA-1 is broken, slow, and bloated. There is absolutely no reason to ever use SHA-1.


? Blake2 is a cryptographic hash function, so collision resistance is a necessary property of it.
I posted Blake2 because it is faster than MD5. It has been in Python's standard library (hashlib) since 3.6.
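Switching would be a small change with `hashlib`. This sketch assumes `digest_size=16`, which keeps digests at 32 hex digits so the existing directory fan-out keeps working. Whether that 128-bit size is enough is the same birthday-bound question as with MD5. `blake2b_hex` is a hypothetical helper.

```python
import hashlib
import io


def blake2b_hex(stream, digest_size: int = 16, chunk_size: int = 1 << 20) -> str:
    """Hash a binary file object with BLAKE2b, reading it in chunks."""
    h = hashlib.blake2b(digest_size=digest_size)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


digest = blake2b_hex(io.BytesIO(b"wiggle cat"))
print(len(digest))  # 32
```

For a real file this would be `with open(path, "rb") as f: blake2b_hex(f)`.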

How is it NOT relevant when you are using hashes and want to avoid collisions?
More tested without being broken means collisions are harder to create, intentionally or not.
MD5 is broken, so it's worse than both, of course, and I would personally recommend SHA-256 for the reasons that made Hydrus choose it.
It's a hash you dumb fuck, there's no bloat.


A way to distinguish between files with the same hash.
Even if collisions are incredibly unlikely now, eventually hashes get broken and I/O failing because of them is unacceptable.
I would suggest something such as the hash plus an incrementing ID for files with the same hash. That way there is no noticeable performance loss in most cases (when the hash lookup finds only one file or none), and a collision won't fuck up the system in weird ways.
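Here is a minimal in-memory sketch of that scheme. It assumes files are keyed by `(hash, n)`, where `n` only grows past 0 when two different contents share a hash. The full byte comparison only runs when the hash already exists, and on disk the key could become a directory name such as `<hash>-<n>`. `HashIdStore` is hypothetical, and keeping the contents in a dict stands in for comparing files on disk.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


class HashIdStore:
    def __init__(self, hash_fn=lambda b: hashlib.md5(b).hexdigest()):
        self.hash_fn = hash_fn
        # hash -> list of distinct contents; a content's list index is its ID.
        self.buckets: dict[str, list[bytes]] = defaultdict(list)

    def put(self, content: bytes) -> tuple[str, int]:
        digest = self.hash_fn(content)
        bucket = self.buckets[digest]
        for n, existing in enumerate(bucket):
            if existing == content:  # same hash and same bytes: a real duplicate
                return digest, n
        bucket.append(content)  # new content, possibly a genuine collision
        return digest, len(bucket) - 1


# A deliberately terrible hash makes every input collide, to exercise the IDs.
weak = HashIdStore(hash_fn=lambda b: "00")
print(weak.put(b"a"), weak.put(b"b"), weak.put(b"a"))  # ('00', 0) ('00', 1) ('00', 0)
```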

Wew lad you are retarded.
Kill yourself you dumb nigger.
wew fucking lad. Now we truly have entered NaN levels of IQ.

not op but this is actually how every fucking hash table works. they use intentionally weak very very fast hashes and then compare the value when they have to.

Sqlite is a datastore, not a database, lol!

NIST, nuff said.

Only 1 cipher, and depending on compromised hardware.

I wonder what the probability of collision with >= 128 bit hashes is...
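The usual birthday-bound approximation answers that for accidental collisions. With n files and a b-bit hash, p ≈ 1 - exp(-n(n-1) / 2^(b+1)). Here is a small sketch. It says nothing about deliberately crafted collisions, which are practical against MD5.

```python
import math


def collision_probability(n_items: int, bits: int) -> float:
    """Approximate chance that at least two of n_items random b-bit hashes match."""
    exponent = n_items * (n_items - 1) / 2 ** (bits + 1)
    # -expm1(-x) == 1 - exp(-x), but stays accurate when x is tiny.
    return -math.expm1(-exponent)


print(collision_probability(10**9, 128))  # roughly 1.5e-21
```

Even a billion files give roughly a 1.5 × 10^-21 chance of an accidental 128-bit collision.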


Blake2b and Blake2s are both faster than MD5 on all hardware.

you glow too much

Not them, but using more than one hash to check files has always been the standard. I still laugh at people who publish MD5 & SHA1 in their release .checksums.
Variety is key, signing is proper.

What if I specialized the filesystem for images, audio, and video, and used perceptual hashing like someone suggested in another thread?
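For reference, the simplest perceptual hash, the average hash, fits in a few lines. This sketch assumes some image library has already converted the image to grayscale and shrunk it to a small grid (commonly 8×8). The 2×2 grids below are only toy input. Similar images give hashes with a small Hamming distance. Near-duplicates are meant to collide, so a perceptual hash would suit finding duplicates better than it would replace the content hash as a storage key.

```python
def average_hash(pixels: list) -> int:
    """aHash: one bit per pixel, set when the pixel is at least the mean brightness."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | int(p >= mean)
    return bits


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


original = [[0, 255], [255, 0]]
brighter = [[10, 250], [250, 10]]  # same picture, slightly different levels
print(hamming(average_hash(original), average_hash(brighter)))  # 0
```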

(I just realized I got Satan trips.)
I've decided that files can be deleted by writing them to a DELETE tag. This seems to be the least confusing and least clunky way to handle deletion. That's what I'll be working on next.
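Here is a minimal sketch of that rule. It follows the deletion steps described earlier in the thread: drop the hash from the database, then remove the file's whole hash directory. It assumes a hypothetical `files(hash)` table whose tag rows cascade on delete, the two-level hash directory layout from the OP, and a write handler that already knows the tags and the hash. `handle_tag_write` and `DELETE_TAG` are hypothetical names.

```python
from __future__ import annotations

import shutil
import sqlite3
from pathlib import Path

DELETE_TAG = "DELETE"  # hypothetical reserved tag name


def handle_tag_write(conn: sqlite3.Connection, home: Path, tags: list[str], digest: str) -> bool:
    """Return True if writing under `tags` meant "delete this file"."""
    if DELETE_TAG not in tags:
        return False
    # With foreign_keys on and ON DELETE CASCADE, this also drops the tag rows.
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM files WHERE hash = ?", (digest,))
    # Remove the data file and every hardlinked name in one go.
    shutil.rmtree(home / digest[:2] / digest[2] / digest[3:], ignore_errors=True)
    return True
```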


TMSU looks ideal for running locally, but adding files remotely would require an SSH session.

...

All of it: you are trying to yield everything replicable. Every step must be duplicable. If I use ZFS, UFS, RFS, etc., how do you get another peer to replicate it?

I've been ignoring that thread exactly because Autobooru + Illustration2Vec has existed for 3 years now, and Spic/tech/ love to reinvent wheels for the nth time. Phproject when Haskell already does httpds with functional monads? What year is this, 1998?

-Inf IQ


>>>/reddit/


LMAO. The absolute state of Zig Forums.
en.wikipedia.org/wiki/Collision_resistance
en.wikipedia.org/wiki/Birthday_attack
>

This larper faggot still thinks SHA256 is the only hash and that everything has the same properties it does.

stop calling things "semantic". a heirarchical filesystem is no less semantic than a tagged one. you can tag a file as gay or you can make a folder called /gayshit. i'm not saying tagged systems aren't better, they're just not "semantic"

gentoo uses blake2 now, your argument is invalid

can't wait to use NortonFS

I never mentioned SHA256. I mentioned Blake2.
Also btw collision resistance is by definition a property of every cryptographic hash function. If a hash function isn't collision resistant, like MD5 and SHA1, then it isn't cryptographic.
Kill yourself, LARPer.

Was interested until this post. I really don't want terms and conditions on a filesystem.


Is it faster than xxhash yet?
Does it really need to be cryptographic for a tag based, read only filesystem where speed is probably paramount?

Faggot you think you can backpedal now? We are talking about hash functions as they are useful for the data structure that is a file system. Op specifically talked about using faster hash functions that have higher collision (which is very very common in fast data structures) and then checking for it when needed.

collision chance is too high


You fucking nigger. How can you be so fucking retarded? I'm not backtracking, you are just too stupid to read properly.
Absolute rubbish. See
Have you even tried reading the source code OP posted? Probably not because you are LARPing too hard.
Also how do you think collision resolution works in "fast data structures"? You first compare the hash, then the values. Now what happens when the values are big and on a hard drive instead of in memory?
You absolutely do not want to have hash collisions in a hash addressed filesystem.

I've added deleting tags and hardlinking files to new tags since I posted last. Now that a user has shown me TMSU, I see it does nearly everything I need. All I wanted to do was tag my files, browse them in a filesystem with my other software, and share them on a LAN with FTP. So I may not continue work on this project, but I might finish it to amuse myself. TMSU's virtual filesystem is read-only, so remote tagging through FTP is something mine can do that TMSU can't. Maybe I could start over by simply wrapping my filesystem around TMSU, so that writing a file to a tag uses TMSU to tag it.
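Here is a sketch of that wrapper idea. When a file is written under a tag path, the write handler could shell out to TMSU's `tag` subcommand with the path's tags, using the `tmsu tag myfile tag1 tag2 tag3` form quoted earlier in the thread. The function names are hypothetical. The sketch also assumes `tmsu` is on `PATH` and already set up with a database.

```python
from __future__ import annotations

import subprocess


def tmsu_tag_command(path: str, tags: list[str]) -> list[str]:
    # Argument list rather than a shell string, so odd filenames need no quoting.
    return ["tmsu", "tag", path, *tags]


def tag_with_tmsu(path: str, tags: list[str]) -> None:
    # check=True raises CalledProcessError if tmsu exits non-zero.
    subprocess.run(tmsu_tag_command(path, tags), check=True)
```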

I think adding ontology will make it semantic. I've seen the term "semantic data" used for metadata added to content so that a program can easily sort through it. I admit that I don't know the full implications of calling something semantic.
