So I accidentally a new imageboard

I'm one of those people who saves images off of *booru-type boards, and while I tried to keep it organized I inevitably found myself with folders like "good1", "good17", "abs", etc.

Well, I noticed my sloppy habits were using lots of HDD space, so I wrote a little bash script which would put all the images in one central folder, md5 them, then link them in my categories. Saved some HDD space.
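The original bash script isn't shown, so here is only a rough Python sketch of the pool-and-link idea described above. The function names (`file_md5`, `pool_and_link`) are made up for illustration, and it assumes the category folders and the pool live on the same filesystem and that symlinks are acceptable as the "links".

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pool_and_link(category_dirs: list[Path], pool_dir: Path) -> int:
    """Move each file into pool_dir under its MD5 name and leave a symlink behind.

    Returns the number of duplicate copies that were dropped.
    """
    pool_dir.mkdir(parents=True, exist_ok=True)
    duplicates = 0
    for category in category_dirs:
        # sorted() snapshots the listing before the folder is modified
        for entry in sorted(category.iterdir()):
            if entry.is_symlink() or not entry.is_file():
                continue  # skip links from an earlier run and subfolders
            pooled = pool_dir / (file_md5(entry) + entry.suffix.lower())
            if pooled.exists():
                entry.unlink()  # same content is already pooled, drop this copy
                duplicates += 1
            else:
                entry.rename(pooled)  # assumes same filesystem
            entry.symlink_to(pooled.resolve())
    return duplicates
```

Every category folder keeps its file names, but each distinct image is stored only once in the pool.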

But then I f'd up, and accidentally scrambled my folders. With everything in one big fat disorganized folder with md5 names I really didn't want to put effort into sorting them... ... So I wrote another script which would scrape various booru sites and get the tags for the images I had, dumping them into a text file. I didn't know what to do with it, but I knew it was a good step.

A week later I went back and made another script that would create and populate folders based on searches I threw at it. I could just run something like "prondir big_breasts flat_chest -o bigandsmall", and it worked nicely!

But this was ugly. I like nice things. So this morning I started the bare bones of a proper SQL database, tossed a custom MVC CMS I wrote into it, jammed my tag database in there, and put together a mini-site for my glorious porn browsing.

Then I made it nicer... and I started adding features. I crossed the event horizon of "this is for me" into "hey, I think this could be a thing?" It's only day 1 into "formal" development, but I re-wrote my Bash stuff into PHP, and started on features that I would want. E.g., when you view an image it will also recommend other images you might be interested in.

So, if I keep going on this thing - what would anyone want to see in a new booru board? Since it's only got bare bones, there's not any tech debt, so planning now around what everyone wants seems like a good idea.

Attached: Screenshot_2018-10-01 Borehole(1).jpg (1302x1776 1.81 MB, 606.53K)

sudo rm -rf / should fix the problem

Post screenshots of how it looks so far

shit never mind I'm an idiot, didn't see images until after I posted
plus no PW so slim jim can't track me
mods delet my post plz

user...
>>>/hydrus/

support for in-line images.
I want long form posts like you'd find on Usenet or 90s forums, with inline image bbcode-esque support, but with the ease and anonymity of posting on a chan.

Make it happen.

Attached: akira_by_torei.jpg (595x856, 479.19K)

I do appreciate your post, Zig Forums needs more people who actually do stuff, but you should seriously consider suicide.

Attached: yiffinhell.jpg (779x519, 57.31K)

So what I'm thinking with that (just to prevent abuse) is to allow only on-site post images to be inserted. Remote images open the door to things like a 1px transparent image that tracks you, an image being swapped out later, or simple breakage. I could add remote images anyway (it's not hard), but in whatever default configuration it might come with I'd have them disabled.

But yeah, I've done BBCode stuff before, no reason not to have all the basics.

On an aside, users will have "albums". I was always going to have them so you can add images to albums and run on-site slideshows; but maybe I could also give users a "stickers" album people could use to collect reaction images and the like, with a "quick insert" button when posting offering those images. Sort of like how you can collect them on Telegram.

tmsu is better

Attached: 58dc408df11916b2004dce79b97bddf88066f1aa8e4885f438b46e3b43ccb6db.jpg (1500x1000, 191.67K)

So what if I want to fuck PHP and code in furry!?!

Tbh, this is actually one time where the >python meme is right
Hydrus is a slow bloated piece of shit.

...

'cept it's not loading a full fucking gui in 1000.
It's a simple script that integrates with your regular browsing.
It works fine here.

fucking furies haha

Yeah, I've been thinking of doing this for years. Kinda started on it. Made the symlink tagger, and made a search function. My idea was instead of md5ing the files in the pool, I'd md5 just the search results, so you can post them somewhere else without leaking trackable names, while it still has whatever name you want on your system. It's something I never got around to, like most of my programming projects. github.com/SpaceBudokan/SackyTag

I haven't been able to figure out why anybody would use python for web dev, but then again I have no idea why people use php either.

THERE IS NOTHING FUNNY ABOUT FURRY
youtube.com/watch?v=sbzUOajs2d8

Why not use an algo specifically designed to match images based on visual similarity? There are several.

Fuck off kike

Sorry guy, but after hearing about that dead dog fucker idk if i can tolerate furries anymore.

Sweet logic bro

Attached: 1537819952791.gif (201x342, 2.51K)

Furry detected...

Even if not, while it is an overgeneralization in most cases, it isn't in this case given all the other ridiculous shit from the furry community, including (but not limited to) plushie sex, vore, beastiality, diaperfurs, and so on...

It is a very safe bet to say that whole fandom is flaming trash.

Maybe off topic? But on the subject of...
I've played this game. Dir reads take a mega long time with tons and tons of files in them. In my testing it helped to do something like:
    # pseudo code
    s = generate_sum(filename)
    dirname = s[0]s[1]/s[2]s[3]/
    mkdir -p dirname
    mv filename dirname
And I don't know if you're interested in this, but note that visually identical (down to the pixel) images can have different md5sums due to so many different variables (the hash covers the file's bytes, not its pixels) that simple cryptographic hashing isn't sufficient for something like this. You're going to need to do perceptual hashing if you care about that. At the very least you can turn the image greyscale, scale it down to a reasonable size you can easily store, and compute a Hamming distance against other hashes. That will catch fewer false positives than you may think.
Here's a shitty little testing thing with hardly any functionality I did a couple of years ago that should illustrate what I mean. 0x0.st/sYpT.tgz
It's what I used to generate 5.png from 4.jpg. 5.png just visually represents a 64byte hash that can be stored in a file name perhaps.
These links may hook you up.
phash.org/
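A rough sketch of the greyscale, downscale, and Hamming-distance idea from this post. This is a plain "average hash", not necessarily what the linked tarball or pHash does. To stay dependency-free it assumes the image has already been decoded to rows of 0-255 grey values and is at least 8x8 pixels; `average_hash` and `hamming_distance` are illustrative names.

```python
def average_hash(pixels: list[list[int]], size: int = 8) -> int:
    """Hash a greyscale image (rows of 0-255 values) into size*size bits."""
    height, width = len(pixels), len(pixels[0])
    cells = []
    for row in range(size):
        y0, y1 = row * height // size, (row + 1) * height // size
        for col in range(size):
            x0, x1 = col * width // size, (col + 1) * width // size
            # downscale by averaging each block of source pixels
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # one bit per cell: is it brighter than the image's average?
        bits = (bits << 1) | (value > mean)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")
```

Two files whose hashes are a small Hamming distance apart are candidates for being the same picture; where to put the cut-off is something you'd have to tune on your own collection.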

Attached: 5.png (2700x1761 15.17 KB, 1.32M)

Meant to say 256

Try again, smartass.

The furry fandom is way too vast to be generalizing like this.

In a way, they kind of asked for it, with their constant attempts to absorb other fandoms and the like, as well as the rejection of some anti-bestiality people in the earlier days. Not going to judge them based on zoophiles hiding in their ranks though.

Attached: smug.jpg (386x386, 63.74K)

It's just for naming the images, if I wanted to scrape tags from other sites I'd need to md5 them at some point. It's also a decent way of avoiding name collisions with uploaded material.

I did something like this back in the day. What I did is downscale the image to ~50 pixels of data while logging the ratio. After that I converted the image to a limited indexed palette I created. Then it was simple to store the images as strings and compare them. The one requirement was that the image have roughly the same aspect ratio, within a certain tolerance, for the algorithm to be computationally light on larger imagesets.

People bitching about your porn taste OP, but good for you for doing something productive. Was wondering how fast that textfile was over the SQL database? I know it'll be slower but I don't mind waiting a second or two for the sake of not having to set up a local server on my toaster of a machine. I've thought of doing something similar for my own image folders, but I pull images from so many places I'd have to manually tag like 20GB worth of shit and gave up.

Attached: 1524513857036.jpg (798x809, 48.16K)

suicide might also be a fix.>>980631

It was actually pretty respectable. For a 10.1GB folder (5534 files, not including video) running on my HDD, all operations seemed pretty much instant. That being said, I also have a strong i7 CPU.

I even had the "you may also like" feature working through the text file, and even comparing literally every tag in every image it was 'instant'. Had to be < half a second. But I'd also expect the performance to degrade with larger datasets... not sure how badly. You must be hitting ~10,000 images.

The downside is that there was a limit to what I could do with only the text file. E.g. tag aliases, view counting, etc. That being said, probably no reason it couldn't have a SQLite back-end. I'm focusing on getting things running and fleshing out the DB, but the core CMS underneath could eventually make that possible. It will have a comfy installer, I like comfy installers, so no matter what setup should be painless.

I tested the text-driven model again. It generated the heaviest page in 0.267 seconds cold, and 0.060 seconds hot. So performance is decent.

So here's a screenshot of a "tag-heavy" image in the old text version. Specifically, the "you may also like" section compares all the tags in every image and applies a formula; those numbers are the "weight" of the matches. The bottom left corner shows the generation time: comparing every tag in that image with every other tag in every other image took 0.041 seconds for that one.

Attached: Screenshot_2018-10-01 Borehole(2).jpg (1077x2418, 544.29K)

Shoot, sorry for not spoilering that; mods - could you help me out? Sorry.

reddit

detected

I use hydrus but his viewer is honestly shit. Not to mention all the times that it fails to grab booru tags after scraping.

The similar search at the bottom looks like it would be a good feature for any booru-like board to have, so why not? Are the numbers at the bottom based on a best match and decline from there? How does the "You may also like" algorithm work?

I'm not sure why it isn't a super-common feature either. This is how my own first stab roughly works:

1. Find all images with at least one tag in common with the current image to create the sample set.
2. Inversely weight all tags in the image by their commonality among the sample set. 1/sqrt(totalOccurencesInResults). This makes it so 'rare' tags (e.g. a character name or distinctive feature) will give higher weights.
3. Sum up the weighted tag scores for each sample.
4. Divide the score by square root of the total number of tags in that image. This is so images with large numbers of tags don't steal the limelight from more relevant images.
5. arsort & slice.
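The five steps above can be sketched roughly like this, in Python rather than the PHP the site uses. The "fiddlybits" aren't covered, and details such as leaving the current image out of the sample set are guesses; `recommend` and `tags_by_image` are illustrative names.

```python
import math


def recommend(current_id: str, tags_by_image: dict[str, set[str]], limit: int = 5):
    """Return up to `limit` (image_id, score) pairs, best match first."""
    current_tags = tags_by_image[current_id]

    # 1. Sample set: every other image sharing at least one tag.
    samples = {
        image_id: tags
        for image_id, tags in tags_by_image.items()
        if image_id != current_id and tags & current_tags
    }

    # 2. Rare tags weigh more: 1/sqrt(occurrences within the sample set).
    weights = {}
    for tag in current_tags:
        occurrences = sum(tag in tags for tags in samples.values())
        if occurrences:
            weights[tag] = 1 / math.sqrt(occurrences)

    scores = {}
    for image_id, tags in samples.items():
        # 3. Sum the weights of the tags this sample shares with the current image.
        shared = sum(weights[tag] for tag in tags & current_tags)
        # 4. Damp images that simply carry a lot of tags.
        scores[image_id] = shared / math.sqrt(len(tags))

    # 5. Sort descending and slice (PHP's arsort + array_slice).
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```

With a handful of images where only one shares the rare tags, that image comes out clearly on top, while images that only share a common tag like "solo" trail far behind.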

There's a couple little fiddlybits I left out, but that sums it up. By the looks of it, "great matches" seems to hover around a score of ~3.1+, O.K. images are ~2.8, and it drops off from there. I (think) the system will get more accurate with a larger database though.

This is still mostly done in PHP, but I'm wanting to make it more SQL-driven.

Sounds good to me, rare tags having more weight is smart. Hydrus and other boorus use categorization for characters and series, which is a simple lock, but how would you differentiate a distinctive feature from the rest of the noise?

I've been thinking about that. In the DB all tags have taxonomies in addition to types already.

Instead of complicating the general "you might like" formula, other things could be done. Probably down the road. I'm getting somewhat happy with the recommendations it's giving with the formula tweaks. It sounds obvious, but the more you complicate a formula the harder it is to get predictable results - and it's near impossible to adjust a complex formula without reverberating repercussions.

Instead I'm interested in integrating taxonomies and tag parentage to improve the exploration level of the system. For example, if you search for "pink_skin" I'm aiming to have it suggest stuff like popular characters with that trait, like "majin android 21" and "emelie" if it finds strong correlations. Maybe there's an author who draws 'em thick, and it might suggest that author when it picks up on what you're looking for.

Princess Zelda by Didi-Esmeralda.

Attached: Bh3bcym.jpg (666x1200 643.92 KB, 130.12K)

Download when?

Is there even an option in Hydrus to scrape tags for multiple images at once? I somehow still can't find it. I'd use OP's thing for that alone and even if it doesn't Hydrus doesn't seem to support multiple databases and I would like one for porn and one for memes. I like using imgbrd-grabber to browse and individually download the best pics over downloading entire tags and it doesn't seem to have an option for saving tags besides in the filename. (which isn't long enough)

Isn't this one of the main features of hydrus? It's nuts how much file bloat you can get because you threw in a general tag to search for. That's why there's a file limit set up, so you don't download 2gb in one sitting.

Default hydrus is set up for one database, but more than one user, including the dev of hydrus, has run more than one db.

github.com/oniony/TMSU

Attached: DoQNCQ8U4AEebSg.jpg (826x1168, 152.92K)

You've been doing it for that long? Or is this a fork or something?


Maybe I'm retarded after all. What do you mean by more than one user though? As in actually making a new system user and running it with su/sudo?

i mean more than one user has done it. not that you need a different system user, since there's no accounts. the hydrus help has a write-up on making more than one db. i never did it though, just decided to split the db load between drives.

This isnt OPs project and you're a faggot. tmsu is a meme program but with little practical use unless you want to be anal about every file. it doesnt come anywhere near hydrus's PTR functionality. Who the fuck wants to manually tag tens of thousands of files?

TMSU actually provides more functionality than hydrus, since it uses command line options instead of a bloated gui.

Also, who the fuck keeps shilling it? I've seen it in like 3 threads today.

you dont need to tag thats the point of the feature you stupid ass shill.

Your project is shit.
It's not integratable into normal filemanagers at all, it's made in python, it's bloated and slow.
TMSU manages to do all of what hydrus does and more without any of the drawbacks.
If someone wanted to use tag sorting in hydrus they're going to need to learn the hideous ui that looks like it was made for fucking windows 95.
TMSU uses fusermount so all you need to do is use the same filemanager and media players that you've always had.

The entire reason why hydrus is shit is because it's not simple, or does the single task it's supposed to in a way that's easy to understand.
On top of that, once you actually learn how it works, you still can't do much around it.
Like i said previously, you can use TMSU with bash or any other scripting language you want to make everything nice and automatic.
You can't do that in hydrus.

Stopped reading there.

I'd say the chances of you being the maintainer of it is quite high tbh.

Takers shouldn't talk like that to Makers.

Having image sibling/alternates support would be a big help. That's a huge feature missing from hydrus right now. I've tagged thousands of images as alternates of each other using its phash deduplicator but it's not able to use any of the pairings. It's also advisable to integrate a REST api so other frontends can be plugged in if wanted/needed. Hydrus is a cumbersome monolith with an overly complicated GUI.

As a side note I've been planning on making my own booru browser too in a month when I get some free time. I want to experiment with ArangoDB to see the advantages of graph databases. I'm interested in hearing your progress with the recommendation algorithm. Adding namespace support might help. As a test set I have the 1.8TB Danbooru 2017 archive along with all the images I've downloaded on my own. Let's see what we can do with 2.9 million tagged files.

You know its true.

Improvements!

There's a spangly homepage now... and a logo!

"You may also like" is much better. I added stats & info to the debug output, and with that I re-wrote the logic since it was easier to see what it was doing. I'm shocked at how well it's actually working now. When I re-scrape the tags I know a few more improvements to do. Scores are now normalized and offered as a % match, which also means the system now discards based on a score threshold so it doesn't serve vague trash. Posts below a certain number of tags (5) are not included in results, and don't show recommendations.

I've started a proper query parser which can handle a few things, like OR pipes (|) so you can search something like "breasts|ass".
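A minimal sketch of how a query with OR pipes could be parsed and matched. The site's actual parser isn't shown; the assumption here is that space-separated terms must all match (AND) and that a term with pipes matches if any alternative is present. `parse_query` and `matches` are illustrative names.

```python
def parse_query(query: str) -> list[set[str]]:
    """Turn 'a|b c' into [{'a', 'b'}, {'c'}]: an AND-list of OR-groups."""
    groups = []
    for term in query.split():
        # drop empty pieces from stray pipes such as 'a||b' or 'a|'
        alternatives = {tag for tag in term.lower().split("|") if tag}
        if alternatives:
            groups.append(alternatives)
    return groups


def matches(image_tags: set[str], groups: list[set[str]]) -> bool:
    """True if the image has at least one tag from every OR-group."""
    return all(image_tags & group for group in groups)
```

For example, `parse_query("cat|dog outdoors")` gives two groups, and an image tagged `{"dog", "outdoors"}` matches while one tagged only `{"dog"}` does not.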

Attached: Screenshot_2018-10-02 Borehole.png (1154x783, 61.11K)

...

Good shit OP, keep it up.

Why are some tags green? Are those your favorite tags using the + button? If so it could take those into consideration when recommending images.

Very nice OP, I've been wanting to tag all my images for a while and considered using hydrus at one point, but I'd basically have to manually tag 18,000 images if I wanted to go beyond just using existing folder names. Some sort of basic booru lookup to get tags for an existing image would be excellent.

How much space would it require to archive, say, Danbooru? I archive tons of shit and I'm afraid it'll suddenly disappear someday.

It's trivial to use TMSU with PTR functionality if you know what you're doing. It's trivial to either use the hydrus libs directly or program something yourself to iterate through files and auto-tag recognized ones.

That was just me fucking with tag attributes, but I like your thinking! I made a glyph font and made sure to include the

I can't really give any hard numbers at this point. I know the data storage is negligible, thumbnails are a bit heavier than current booru boards (~15-20kb), and I'm not using downscaled images yet... those will affect the storage quite a bit.

See my post

+1 to this.
parent/child images for edits or alternates is definitely needed. Also the ability to differentiate between an official artist edits or fan edits would be nice.

I'm a fucking dumbshit. I just accidentally overwrote all my jpgs with their thumbnails because I forgot a slash in my old script.

Attached: 1445719043480.gif (305x200, 4.34M)

congratulations user, you are the proud new owner of three terabytes of furry porn for ants

Attached: hyper.gif (300x300, 160.26K)

At least you know where to find the originals, right?

Attached: why god.PNG (182x169, 25.73K)

He said he overwrote the pictures.

Well, it seems only the images in the database were thumbnailed (... which makes sense), so I took it as a challenge to add an image downloader to the system. I have a temporary one running and recovering things now - though I can't tell if I'll be able to recover images only found on rule34.xxx. They have stupid urls.

But so far it seems like I'm recovering about an image a second, so that's fine for me.

Recovered all but 108 images! XD

I thought the +/- buttons were for the tag search only. Like how e621 does it, with adding the tag to the current search filter.

Except e621's is half-assed and it's only good if you are too lazy to type it in yourself. It isn't keeping track of the current tags like a checklist filter on some store sites. So if you add +tag then -tag they will just cancel each other out and you'll be left with nothing.

First, try Hydron (written in Go) github.com/bakape/hydron
Then, understand that Hydrus has been rapidly prototyped over the last 5 years to support thousands of websites.


Not enough features.


SQLite is always the answer for speed


Looks good


Hydrus Dev is implementing it in the near future


What do you mean by "taxonomies"?

Learn 2 get scripts github.com/CuddleBear92/Hydrus-Presets-and-Scripts
There is also default tag scraping for the major *boorus


Then rewrite the code to use Qt instead of WxPython whiny bitch, or make a CLI if you want things to be fast


Nah, the fanbase is large.


Already implemented
Go on, make it happen.


Put it on GitHub/GitGud


Tag characteristics


Hydrus WINS!


>>>/oven/ >>>/trannypol/

Why are you not in an institution?

Because PHP

In addition to types (character, author, copyright, meta) I'm adding taxonomies. Taxo will describe what the tag is describing. For example, "black_and_white", "3d", and "photo" would go under the "style" taxonomy. "sleeping", "screaming", "running" would have a verb taxo. "2girls", "solo", "female/male" would fall under a subjects taxo, etc. It will also differentiate tags of the same name a little more nicely; is the image a photo, or is there a photo in the image?

Essentially, tag type will be required for a functioning tag, taxo will not be, but taxo will in general help a lot of things. When all the types of taxo are set in stone, images will require tags with certain taxonomies before they can be fully approved. Every image needs a noun, every image needs a subject, a verb, etc.
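One way the type/taxonomy split could be modelled, purely as a sketch. The real schema lives in the site's SQL database and isn't shown. The "general" type and the exact set of required taxonomies below are placeholders, since the post says the taxonomy list isn't set in stone yet.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Types named in the post, plus "general" as an assumed catch-all.
TAG_TYPES = {"character", "author", "copyright", "meta", "general"}

# Placeholder: taxonomies an image must cover before full approval.
REQUIRED_TAXONOMIES = {"subject", "verb"}


@dataclass(frozen=True)
class Tag:
    name: str
    type: str                       # required for a functioning tag
    taxonomy: Optional[str] = None  # optional, e.g. "style", "verb", "subject"

    def __post_init__(self) -> None:
        if self.type not in TAG_TYPES:
            raise ValueError(f"unknown tag type: {self.type!r}")


def missing_taxonomies(tags: Iterable[Tag]) -> set[str]:
    """Return the required taxonomies that none of the image's tags cover."""
    present = {tag.taxonomy for tag in tags if tag.taxonomy}
    return REQUIRED_TAXONOMIES - present
```

An image would only be fully approved once `missing_taxonomies` comes back empty; a tag without a taxonomy still works, it just doesn't count toward approval.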

I have to go through and strip my name out of the source headers first; I borrowed a lot of code from another project, and this isn't something I want to associate myself with. Future employers might get squeamish, so it will be developed under my "18+" alias "Dizmal". :/

Woowee, retard in the end
news.microsoft.com/2018/06/04/microsoft-to-acquire-github-for-7-5-billion/
Did you like attempt to upload your patches ITT?

I also didn't get what you meant but now I get it. I find illustrating ideas helps a lot. Basically it's tag relationships. You can have top level categories and have tags branch from them.

Category1
/ | \
Tag1 Tag2 Namespace:Tag3 Tag4

The image siblings would be at one level but could be combined with parent/children hierarchy.

Sibling1 --- Sibling2 --- Sibling3

Parent
|
Child1&Sibling1 --- Sibling2

What are you talking about? None of the source code I have has ever been public. I also never said where specifically I'd be publishing the code.

Then Hydrus already has taxonomy, but you should try make your own.


Not talking about TMSU, fag. I am talking to Dizmal's Borehole (make a better name please).

lul

alright, i'm listening.
Will this have cli arguments?

>how do i use scripts on more than a single file at a time
???

Shit like this is why I have everything important on zfs, with automatic snapshots using pyznap or sanoid. Instant and painless rollback.

I tried to write a script to detect and link duplicate images. Somehow the script didn't save the hash of the new images properly, so any duplicate file was replaced with the first duplicate file the script found. (After recovering the files I could, I ended up deleting that image entirely because I was so sick of seeing it.)

To this booru please op. I wanna look at that sexy dog pussy porn in the search results there.

Link please

Good

It will come... In the meantime...

IT NOW HAS SEXY PORNHUB STYLE VIDEO THUMBNAILS ON HOVER!

Huehuehaha, sample attached. Though they have the "gifv" extension on the site.

Attached: 4ea1b3fb4122309db6774faec49a5297.temp-X.webm (360x203, 174.46K)

Hype hype hype!
BTFO FIoCoder

Oh? Do tell us now.

It uses FFMPEG and FFPROBE to do the heavy lifting. To make the thumbnails it takes & scales 5 2-second slices from the video, then saves the concatenated result as a gifv with webm encoding.

For the static thumbnail it takes a still from the half-way mark. I hate it when videos fade in from black at the beginning, making the thumbnail a useless black rectangle.
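A sketch of how the slicing and the half-way still described above might be driven from a script. It only builds the command lines; running them and joining the slices into one file is left out. Where exactly the 5 slices are taken from isn't stated in the post, so spreading them evenly through the video is an assumption, as are the function names, the output file names, and the 360px width (borrowed from the attached sample's dimensions).

```python
def probe_command(src: str) -> list[str]:
    """ffprobe call that prints the container duration in seconds."""
    return ["ffprobe", "-v", "error", "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1", src]


def slice_starts(duration: float, slices: int = 5, slice_length: float = 2.0) -> list[float]:
    """Start offsets for slices centred in equal segments of the video."""
    last_start = max(duration - slice_length, 0.0)
    starts = []
    for i in range(slices):
        centre = (i + 0.5) * duration / slices
        # keep every slice inside the video, even for very short clips
        starts.append(min(max(centre - slice_length / 2, 0.0), last_start))
    return starts


def slice_commands(src: str, duration: float, width: int = 360,
                   slice_length: float = 2.0) -> list[list[str]]:
    """One ffmpeg call per slice: seek, cut, drop audio, scale down."""
    return [
        ["ffmpeg", "-y", "-ss", str(start), "-t", str(slice_length), "-i", src,
         "-an", "-vf", f"scale={width}:-1", f"slice_{i}.webm"]
        for i, start in enumerate(slice_starts(duration, slice_length=slice_length))
    ]


def still_command(src: str, dst: str, duration: float) -> list[str]:
    """Grab a single frame from the half-way mark for the static thumbnail."""
    return ["ffmpeg", "-y", "-ss", str(duration / 2), "-i", src,
            "-frames:v", "1", dst]
```

Each list can be handed to `subprocess.run`; the duration would come from parsing the output of `probe_command` as a float.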

Although i like the idea of hydrus it fucking chugs on my thinkpad, if your implementation supports mass-scraping media from booru sites with tags I'll give it a shot.

Are you using Q_invsqrt?

I'll forgive your terminally shit taste in porn on account of the fact that this is more programming than Zig Forums has done in years.

sauce?

e621.net/post/show/1623850
There are other angles in the description.

Why does this shit always look better in the thumbnails?