Symantic Filesystem

Ethan Foster

I'm working on a FUSE filesystem that represents file tags as dynamically
nested directories. It's heavily inspired by Tagsistant. I would have been
happy with Tagsistant if it didn't segfault and fill my console with
errors every time I access the mounted directory.

The idea is that you can tag a file by copying it to a path with tags in it.
[code]
$ cp goblin_slayer_s1e1.mkv mnt/mkv/anime/rape/@
[\code]

You can get all files with the given tags by listing in a path with tags in it.
[code]
$ ls mnt/anime/tiddies/@
[\code]

My filesystem can only read files and copy from outside the filesystem. I plan
to get the following features in someday:

* copy from one tag to another
* modify files
* delete files or tags
* ontological reasoning (anime is show)
* triple (machine) tags (show:episode:1)
* negate tags (ls mnt/anime/-gayshit/@)
* or-ing tags (ls mnt/anime/ass/+/anime/tiddies/@)
* multithreading

Tags are stored in an SQLite database with file hashes. Files are stored in
the Tagfs home directory in subdirectories named after their hash. Files may
have multiple names, which are stored as hardlinks to the original data file.
The reason I store the names as hardlinks is so they can be easily copied out
of the database if it ever becomes corrupted.

[code]
$ tree .local/share/tagfs
.local/share/tagfs
├── 25
│   └── c
│       └── fe17c3006dd3f5d5f46a8a54d699c
│           ├── data
│           └── names
│               └── spooked_cat.gif
├── 99
│   └── 4
│       └── d9b168731bf38e93de20ead7600cb
│           ├── data
│           └── names
│               └── wiggle_cat.gif
└── tagfs.sqlite3

8 directories, 5 files
[\code]
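
The storage layout described above could be sketched roughly like this. It assumes MD5 hex digests split as 2 + 1 + 29 characters, which matches the directory names in the listing; the function names `hash_dir` and `store_file` are hypothetical and not taken from the repo.

```python
import hashlib
import os
import shutil


def hash_dir(home: str, digest: str) -> str:
    """Map a hex digest to home/<2 chars>/<1 char>/<remaining chars>."""
    return os.path.join(home, digest[:2], digest[2:3], digest[3:])


def store_file(home: str, src: str, name: str) -> str:
    """Copy src into the store once, then hardlink it under names/<name>."""
    with open(src, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    target = hash_dir(home, digest)
    os.makedirs(os.path.join(target, "names"), exist_ok=True)
    data = os.path.join(target, "data")
    if not os.path.exists(data):
        shutil.copyfile(src, data)
    link = os.path.join(target, "names", name)
    if not os.path.exists(link):
        os.link(data, link)  # hardlink: shares the inode of the data file
    return target
```

Because every name is a hardlink to `data`, the files can still be copied out with plain `cp` if the SQLite database is lost.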

Is this something you'd use? Have any suggestions or code review?

Here's a link to my GitLab repo.
gitlab.com/elitist_neckbeard/tagfs

Here's a link to Tagsistant which inspired my project.
tagsistant.net/

Attached: cfcf34be6705583f6f675200049760.png (776x795, 933.92K)

October 22, 2018 - 02:33

Other urls found in this thread:

gnu.org/licenses/license-list.html
github.com/oniony/TMSU
blake2.net/
en.wikipedia.org/wiki/Collision_resistance
en.wikipedia.org/wiki/Birthday_attack
twitter.com/SFWRedditImages

Dylan Butler

I fucked up my code markup.

Attached: 2253df3dc81bed1e7270c9bdb65f55.gif (500x281, 1016.03K)

October 22, 2018 - 02:50

Josiah Perez

You had me until I saw that it was GPL3 cancer. Public domain that shit faggot.

October 22, 2018 - 03:08

Nicholas Green

Write and license your own code, faggot.

October 22, 2018 - 03:17

Christian Morales

I don't know my licenses that well. I know that GPL requires that the code always be open source and anything that uses the code must also be open source. That seems to give more freedom to the end user.

The MIT and BSD licenses allowed Intel to hide MINIX in all their processors, and Mac to make OSX from BSD.

I'm not opposed to changing, but I'd like to hear what others have to say about it.

October 22, 2018 - 03:19

Brandon Powell

I agree. And following the same train none of us should complain about Linux or FreeBSD getting a CoC.

October 22, 2018 - 03:54

Ryan Campbell

Gno.

October 22, 2018 - 04:01

Oliver Morris

Public domain isn't a license that applies correctly for software. It's even worse than MIT/BSD.
You're probably a political/religious npc who's following the sqlite license because of it's coc.

GPLv3 means basically that:
-You have to distribute the source code if you give out a binary. You can distribute the source in any form you want: paper, CD, etc.
Example: if you distribute your binary to a company, you have to share the source with them in whatever form you want, but you aren't forced to make it public. If you put a binary on the internet in a publicly available space, then you have to share the source code publicly, when requested, in any form.
-You can't enforce tivoization, meaning that you can't stop people from replacing software that was integrated into hardware (i.e. if there's a signature check, people must be able to replace or remove it).
-People can read, edit, compile, and execute the source code however they want.
The GPL gives the same freedoms to users and developers (who are users too, btw), so none of them can get stuck because of legal inanities.
Apart from the advertising clause in the original BSD license, the MIT and BSD licenses are alike: people can do whatever they want, even restrict other users. They give developers and companies the right to restrict users.
I hope this helped you. If you have questions just ask, I'll gladly respond to them.
All the major licenses are compatible with the GPLv3 (and exceptions can be made if developers explicitly state it); only minor licenses like the Open Watcom one are not compatible.
See: gnu.org/licenses/license-list.html

Implying that we are forbidden to talk about it.

October 22, 2018 - 04:31

Chase Lee

Why did I think I could make an OP on Zig Forums without getting flooded with bikeshedding? Is there anywhere on the internet I can discuss Zig Forums without getting the tranny CoCk or this gay shit?

Attached: 620fe5872aecfcf0d350ad315d06124e03b92d7c38f124abe7234d9d259389cd-pol.jpg (640x432, 35.22K)

October 22, 2018 - 04:36

Jeremiah Barnes

GPL users are cucks, BSD developers are cucks. It is that simple. Write proprietary, use public domain, to be the least cucked.

October 22, 2018 - 04:57

Jaxson Wilson

So op going over your list there you have implemented exactly nothing. Just another Zig Forums project that will never amount to anything. Even cuckchan has get tox going slowly, nothing started here ever gets done just durr durr look at me I wrote the readme.

October 22, 2018 - 05:01

Carter Robinson

Reads like shit. I rather interface directly with pgsql and table the tags. Plus, daemonizing might be the optimal choice since you could interface it with any other program on your system, including your FS.

October 22, 2018 - 05:11

Evan Flores

Fuck off SQL shill.

October 22, 2018 - 05:17

Jeremiah Brooks

Currently, files can be written to tags and queried by tag. That's enough to host an FTP server with tags for directories. Deletion and copying between tags shouldn't take me long, and then it will have all its basic functionality, and I can focus on stuff like ontology and triple tags.

October 22, 2018 - 05:18

Kayden Torres

bloat, can be done way easier OP tbh.
But thanks for the idea, I'll try to improve it.

I mean after all, that's the power of open source.

October 22, 2018 - 05:20

Jaxson Young

Post a link here when you're done.

October 22, 2018 - 05:34

William Davis

nice
epic, can i delete a hard link to delete the file or do i have to interact with the database and do sql magic (fine by me but not fine for retards)

October 22, 2018 - 06:20

Evan Hall

You're posting cartoons on a gook message board run by some weirdo freemason who stole the board from a dying cripple. WHAT DID YOU EXPECT?

October 22, 2018 - 06:25

Charles Hall

OP, have you checked out TMSU? It's basically what you just described here, except the file system is read-only and it's all symbolic links.

October 22, 2018 - 06:27

Camden Sanders

Your brain must be on acid.
I just setup a single read only user on pgsql and be done. If I need to, I just setup nginx and modulate from there. Making a simple *booru and done.

October 22, 2018 - 10:17

Ian Gonzalez

You'd need to delete the file hash from the database, then delete the whole directory that all the hardlinks are in for that file.

I'm not sure how I want to implement deletion yet, which is why I haven't finished it. I have some ideas you can add to.

* When a file is removed from all tags, it's deleted from the database.
* Files can't be removed from the filesystem normally. An admin has to use a tool that cleans files without tags, and can back them up.
* Files can be moved to a special "trash" tag that either deletes them immediately, or marks them for deletion by the admin tool.
* All files are in a special "all" tag; deleting them from that tag deletes them for real.

Any suggestions?

October 22, 2018 - 11:45

Julian Torres

If you're going to try deleting things now, remember to use the foreign key pragma in SQLite, since that dbms doesn't cascade delete by default. Just remove the hash from the file table, and all its tag entries will be deleted too.
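
A minimal sketch of the foreign key pragma with Python's `sqlite3` module. The table and column names (`files`, `file_tags`, `hash`, `tag`) are made up for illustration and are not the repo's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Foreign key enforcement is off by default and must be enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE files (hash TEXT PRIMARY KEY);
    CREATE TABLE file_tags (
        hash TEXT NOT NULL REFERENCES files(hash) ON DELETE CASCADE,
        tag  TEXT NOT NULL,
        PRIMARY KEY (hash, tag)
    );
    INSERT INTO files VALUES ('abc');
    INSERT INTO file_tags VALUES ('abc', 'gif'), ('abc', 'cat');
""")

# Deleting the file row also removes its tag rows through the cascade.
conn.execute("DELETE FROM files WHERE hash = 'abc'")
remaining = conn.execute("SELECT COUNT(*) FROM file_tags").fetchone()[0]
```

Without the pragma, the `DELETE` would succeed and leave orphaned rows behind in `file_tags`.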

I haven't heard of that. Is that from the other thread here, by the user who made a Booru accidentally? I think the symlinks-in-directories method isn't scalable. It requires a file write for each tag a file has, and a tag with many files will be slow to search through. See the directory tree I posted in OP? The first 3 characters of the file hash become directories (two characters, then one), so there are never more than 256 hash directories in the root directory.

October 22, 2018 - 11:57

Jaxon Allen

elaborate or stop posting

October 22, 2018 - 12:50

Aiden Collins

Ah blooblooobloo, muh ancap rice.

Don't listen to the suckless faggot, GPLv3 is what you want.
If anyone wants to see the end result of too permissive license look at the bitching of Antirez; "OH GOD I SAID THEY COULD DO ANYTHING AND THEY DID, THEY'RE NOT EVEN CONTRIBUTING BACK!".
GPL solves this problem.

October 22, 2018 - 13:12

Isaiah Morales

Sure will do in some days since I got school shit to do but I think you could tag your files by just organizing them in folders.

October 22, 2018 - 13:29

Wyatt Moore

see my response here

There are scalability issues with simply putting symlinks in directories. If you look at the overhead, an SQL database is less bloated than symlinks in directories, albeit more complicated. There are fewer reads and writes, and listing a directory with too many files in it is slower than querying a database with that many entries. I also can't imagine how to implement ontology or triple tags when using symlinks in directories.
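
For comparison, here is roughly what a tag lookup looks like on the database side. This is only a sketch: the `file_tags` table and the `files_with_tags` helper are hypothetical names, and the real schema may differ.

```python
import sqlite3


def files_with_tags(conn, tags):
    """Return the hashes of files that carry every tag in `tags`."""
    tags = sorted(set(tags))
    marks = ",".join("?" for _ in tags)
    rows = conn.execute(
        f"SELECT hash FROM file_tags WHERE tag IN ({marks}) "
        "GROUP BY hash HAVING COUNT(DISTINCT tag) = ? ORDER BY hash",
        (*tags, len(tags)),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE file_tags (hash TEXT, tag TEXT, PRIMARY KEY (hash, tag));
    CREATE INDEX idx_file_tags_tag ON file_tags(tag);
    INSERT INTO file_tags VALUES
        ('25c', 'gif'), ('25c', 'cat'), ('994', 'gif'), ('994', 'dog');
""")
```

With an index on `tag`, one query covers a whole path like `mnt/gif/cat/@`, and adding a tag is a single row insert rather than a new symlink on disk.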

October 22, 2018 - 15:16

Austin Russell

TMSU's backend is sqlite3, and the symlinks are generated on the fly when you browse the virtual file system. An OK workflow is:

1) Download file into some directory structure
2) Use a `tmsu tag myfile tag1 tag2 tag3` command to tag the file
3) When you want to find your file, you navigate to `/path/to/tmsu-vfs-mount/queries/"(tag1 or tag2) tag3 not tag4"` and you see a bunch of symlinks to relevant files.

You can even tell TMSU that tag1 implies tag2. The only real drawback imo is that you have to manually tag files using commands (there's no writing to the virtual file system).

Basically just consider hacking on github.com/oniony/TMSU instead of rewriting everything from scratch. It's written in golang, and it works on Alpine as long as you remove GNU specific extensions from the makefile.

October 24, 2018 - 04:22

Jose White

Start over from scratch, in C, or fork an existing simple FS (like OpenBSD's FFS) and toy with the file table code instead, and use the BSD two-clause.
On second thought, you don't seem ready. Good luck nooblet!

October 24, 2018 - 04:56

Nathan Anderson

sqlite is the least pozzed and most highly tested DB ever. wtf do you have against

October 24, 2018 - 05:17

James Morales

It's so reliable the US military uses it, it's an actually successful FOSS project that puts Linux to shame, it has the best license I have ever seen, and the entire dev team is both politically sane and gifted with a good sense of humor (see their CoC).

October 24, 2018 - 08:56

Andrew Price

blake2.net/

he is probably an atheist

October 24, 2018 - 09:31

Jayden Martin

Blake2 is interesting but SHA-1 is more tested, in any case if this wants to be anything more than a toy project it's clear hashes alone won't cut it in the long term.

October 24, 2018 - 10:47

Blake Russell

Python was made to integrate native components, and that's how I'm using it. All the heavy computing is done by sqlite3 and I'm sure the hashing function is also written in C.

I'll take a look at it. I want to avoid collisions though.

What else do you think I'll need?

October 24, 2018 - 12:15

Anthony Murphy

tech thread, sage

October 24, 2018 - 13:04

Jack Cruz

Actual retard spotted. How is it relevant that SHA-1 is more tested? You are aware that OP uses MD5 as a non-cryptographic checksum, right?
Also SHA-1 is broken, slow, and bloated. There is absolutely no reason to ever use SHA-1.

? Blake2 is a cryptographic hash function. So collision resistance is a necessary property of it.
I posted Blake2 because it is faster than MD5. It is in python's std since 3.6.
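
A minimal sketch of hashing a file with `hashlib.blake2b` from the standard library. The helper name `file_digest` and the 16-byte digest size are assumptions for illustration; 16 bytes gives 32 hex characters, the same length as an MD5 digest, so the existing directory layout would not have to change.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are never fully loaded into memory."""
    h = hashlib.blake2b(digest_size=16)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

A larger `digest_size` (up to 64 for `blake2b`) buys more collision resistance at the cost of longer directory names.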

October 24, 2018 - 13:49

Easton Peterson

How is it NOT relevant when you are using hashes and want to avoid collisions?
More tested without being broken= collisions are harder to create, intentionally or not.
MD5 is broken so worse than both, of course, and I would personally recommend SHA-256 for the reasons that made Hydrus choose it.
It's a hash you dumb fuck, there's no bloat.

A way to distinguish between files with the same hash.
Even if collisions are incredibly unlikely now, eventually hashes get broken and I/O failing because of them is unacceptable.
I would suggest something such as hash plus incrementing ID for files with the same hash, that way there is no noticeable perf loss in most cases (when hash lookup finds only one file or zero files) and a collision won't fuck up the system in weird ways.
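
A toy sketch of the "hash plus incrementing ID" idea, kept in memory to stay short. The `ContentStore` class is hypothetical; a real store would compare the on-disk data files instead of in-memory bytes.

```python
class ContentStore:
    """Toy store keyed by (hash, seq), where seq separates colliding contents."""

    def __init__(self, hash_fn):
        self.hash_fn = hash_fn
        self.buckets = {}  # hash -> list of contents; the list index is the seq ID

    def put(self, data: bytes):
        digest = self.hash_fn(data)
        bucket = self.buckets.setdefault(digest, [])
        # Only contents sharing a hash are compared, so the common case
        # (empty or single-entry bucket) costs almost nothing extra.
        for seq, existing in enumerate(bucket):
            if existing == data:
                return digest, seq
        bucket.append(data)
        return digest, len(bucket) - 1

    def get(self, key):
        digest, seq = key
        return self.buckets[digest][seq]
```

With a deliberately weak hash such as `lambda d: str(len(d))`, two different files of the same length land in the same bucket but get distinct keys, so neither one overwrites the other.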

October 24, 2018 - 15:23

Jack Campbell

Wew lad you are retarded.
Kill yourself you dumb nigger.
wew fucking lad. Now we truly have entered NaN levels of IQ.

October 24, 2018 - 15:42

Nicholas Powell

not op but this is actually how every fucking hash table works. they use intentionally weak very very fast hashes and then compare the value when they have to.

October 24, 2018 - 18:13

Dylan Campbell

Sqlite is a datastore, not a database, lol!

NIST, nuff said.

Only 1 cipher, and depending on compromised hardware.

October 24, 2018 - 18:36

Evan Richardson

I wonder what the probability of collision with >= 128 bit hashes is...
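
A rough way to estimate it is the birthday-bound approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n files and a b-bit hash. This assumes the hash behaves like a uniform random function and nobody is deliberately crafting collisions, which does not hold for MD5 against an attacker. The function name is made up.

```python
import math


def collision_probability(n_files: int, bits: int) -> float:
    """Approximate chance that any two of n_files share a bits-wide hash."""
    exponent = -n_files * (n_files - 1) / 2 ** (bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)
```

For a billion files and a 128-bit hash this comes out on the order of 1e-21, so accidental collisions are not the practical worry; deliberate ones against a broken hash are.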

Blake2 b and s are both faster than MD5 on all hardware.

October 24, 2018 - 19:17

Joshua Jackson

you glow too much

October 24, 2018 - 19:42

John Campbell

Not them, but using more than one cipher to test files has always been the standard. I still laugh at people that publish MD5&SHA1 on their release .checksums.
Variety is key, signing is proper.

October 24, 2018 - 21:14

Jason Miller

What if I specialized the filesystem for images, audio, and video, and used perceptual hashing like suggests in another thread?

(I just realized I got Satan trips.)
I've decided that files can be deleted by writing them to a DELETE tag. This seems to be least confusing or clunky way to handle deletion. That's what I'll be working on next.

TMSU looks ideal for running locally, but adding files remotely would require an SSH session.

October 24, 2018 - 22:03

Camden Reed

...

October 24, 2018 - 22:20

Ian Lopez

All of it: you are trying to yield everything replicable. Every step must be duplicable. If I use ZFS, UFS, RFS, etc., how do you get another peer to replicate.

I've been ignoring that thread exactly because Autobooru + Illustration2Vec has existed for 3 years now, and Spic/tech/ love to reinvent wheels for the nth time. Phproject when Haskell already does httpds with functional monads? What year is this, 1998?

October 24, 2018 - 22:40

Sebastian Green

-Inf IQ

>>>/reddit/

LMAO. The absolute state of Zig Forums.
en.wikipedia.org/wiki/Collision_resistance
en.wikipedia.org/wiki/Birthday_attack

October 26, 2018 - 13:57

Asher Diaz

This larper faggot still thinks SHA256 is the only hash and that everything has the same properties it does.

October 26, 2018 - 19:40

Adrian Evans

stop calling things "semantic". a hierarchical filesystem is no less semantic than a tagged one. you can tag a file as gay or you can make a folder called /gayshit. i'm not saying tagged systems aren't better, they're just not "semantic"

October 26, 2018 - 20:50

Adrian Ross

gentoo uses blake2 now, your argument is invalid

October 26, 2018 - 20:51

Easton Wilson

can't wait to use NortonFS

October 27, 2018 - 12:26

Cameron Martin

I never mentioned SHA256. I mentioned Blake2.
Also btw collision resistance is by definition a property of every cryptographic hash function. If a hash function isn't collision resistant, like MD5 and SHA1, then it isn't cryptographic.
Kill yourself, LARPer.

October 27, 2018 - 15:43

Jeremiah Morales

Was interested until this post. I really don't want terms and conditions on a filesystem.

Is it faster than xxhash yet?
Does it really need to be cryptographic for a tag based, read only filesystem where speed is probably paramount?

October 27, 2018 - 16:08

Ethan Campbell

Faggot you think you can backpedal now? We are talking about hash functions as they are useful for the data structure that is a file system. Op specifically talked about using faster hash functions that have higher collision (which is very very common in fast data structures) and then checking for it when needed.

October 27, 2018 - 16:09

Justin Ortiz

collision chance is too high

You fucking nigger. How can you be so fucking retarded? I'm not backtracking, you are just too stupid to read properly.
Absolute rubbish. See
Have you even tried reading the source code OP posted? Probably not because you are LARPing too hard.
Also how do you think collision resolution works in "fast data structures"? You first compare the hash, then the values. Now what happens when the values are big and on a hard drive instead of in memory?
You absolutely do not want to have hash collisions in a hash addressed filesystem.

October 27, 2018 - 16:31

Cameron King

I've added deleting tags and hardlinking files to new tags since I posted last. Since a user showed me TMSU, I see it does nearly everything I need. All I wanted to do was tag my files and browse them in a filesystem with my other software, and share them on a LAN with FTP. So, I may not continue work on this project, but I might finish it to amuse myself. TMSU's virtual filesystem is read only, so remote tagging through FTP is something mine does that TMSU can't. Maybe I could start over by simply wrapping my filesystem around TMSU, so writing a file to a tag uses TMSU to tag it.

I think adding ontology will make it semantic. I've seen the term "semantic data" used for metadata added to content so that a program can easily sort through it. I admit that I don't know the full implications of calling something semantic.

Attached: serveimage (1).jpeg (1178x1600, 889.55K)

November 5, 2018 - 04:51
