What Zig Forums projects are you currently procrastinating on?

Attached: image.jpg (1920x1280, 350.55K)


I gotta pull the GPU out of my iMac and send it to get repaired at some electronics repair service on eBay. It's a top-tier i7 from 2011 and it runs like a dream, but like every fucking machine with a discrete GPU Apple makes, it fried itself. Fucking Apple can't into thermal dissipation.

I'm procrastinating because of what a pain in the dick it is to take the machine apart.

Attached: TacnA.jpg (3264x2448, 505.98K)

why don't you send the whole thing?

I installed Windows 10 on my two machines and all of a sudden I have no more "projects" to "invest time in" (I no longer spend my free time tinkering with Linux).

I might end up doing something worthwhile with my life if I keep this up.

$130 for them to fix and return the GPU, i'd have to pay several times that to get the machine serviced. Never mind shipping costs.

Really wish they'd go back to making mac pros like they used to. I like their platform and their general hardware design, i guess there's more profit in faggy fully integrated, no-repair, no-upgrade shit.

Always a good idea to limit the time you spend on places like this.

Unironically fizzbuzz.

Why don't you just torch it?

Ok you got me motivated.
/* hurr fizzbuzz */
#include <stdio.h>

int main(void)
{
    int i, hurr;

    for (i = 1, hurr = 0; i <= 100; ++i, hurr = 0) {
        if (i % 3 == 0) {
            printf("Fizz");
            ++hurr;
        }
        if (i % 5 == 0) {
            printf("Buzz");
            ++hurr;
        }
        if (!hurr)
            printf("%d", i);
        printf("\n");
    }
    return 0;
}

v1.01 is moar elegant due to elimination of "muh magick numbers" (except 0, but literal zeros are fine).
/* hurr fizzbuzz v1.01 */
#include <stdio.h>

#define MIN 1
#define MAX 100
#define FIZZ 3
#define BUZZ 5

int main(void)
{
    int i, hurr;

    for (i = MIN, hurr = 0; i <= MAX; ++i, hurr = 0) {
        if (i % FIZZ == 0) {
            printf("Fizz");
            ++hurr;
        }
        if (i % BUZZ == 0) {
            printf("Buzz");
            ++hurr;
        }
        if (!hurr)
            printf("%d", i);
        printf("\n");
    }
    return 0;
}

Don't many come here to get informed about Zig Forums things and inspired to do Zig Forums things (or at least aren't they making that excuse)?

Three and five are intrinsic to fizzbuzz; they don't need define statements. And your fizzbuzz annoys me, because at your buzz conditional you already have all the information you need to decide whether to print %d, but instead you increment and then branch again.

Use the continue statement instead.

It means limit your time. If you're arguing about licenses back and forth with some guy for a few hours, that's not a productive use of your time.


You can't continue right after printing "Buzz" and incrementing hurr because no matter what is output you still need the newline. You could rewrite the main loop as
for (i = MIN, hurr = 0; i

Kill yourselves.

Realtime panorama stitching from mjpeg video feeds. Why the fuck does OpenCV only support Nvidia bullshit CUDA and not OpenCL?

No, it makes it look like you have command of the language. It's always nice to avoid unnecessary conditionals in loops.

Intrinsic or not, inserting literal numeric constants other than 0 into the code (and there are cases where even another representation of 0 is preferred, such as '\0', NULL, FALSE, etc.) is still bad form in general and should be avoided unless the specific application calls for very terse and possibly hacky code.

No. This is just bad advice parroted around which makes your intent less clear. There are genuine reasons to avoid magic numbers, but sadly when the literals are intrinsic to the algorithm, they should be used directly.

Similarly, I see militant opposition to multiple return statements, even when avoiding them would reduce clarity.

But, as mentioned, in this particular case the performance gain is negligible while code readability and consistency suffer. Anyway, anyone else can express their opinion on which version they prefer.

Not liking someone else's fizzbuzz (because "muh subjective feels and reasons") is intrinsic to the universe, isn't it.

Is there any thread here at all that is both of non-trivial length *and* contains no vile suggestions of contemplating suicide?

Meant to say:
*There are genuine reasons to avoid magic numbers, but sadly this is taken as gospel at the expense of clarity. When the literals are intrinsic to the algorithm, they should be used directly.

Why is it negligible? This is the kind of thinking that gets us the death of a thousand cuts of performance problems. And by stating continue, you're making it very clear. If you were going for "clarity", you'd have used the usual 3-conditional if-else tree.

But they really aren't in this case. By tradition fizzbuzz involves 3 and 5 (probably because they're the two smallest odd primes), but that's nevertheless arbitrary. Obviously you can trivially implement any of the possible fizzbuzz algorithms with any two positive integers instead of 3 and 5 - the output will be different if the data is different, but the algorithm itself will still be the same. Heck, you could even mix and match different implementations of the algorithm (or parts thereof) - the below example may be a bit contrived, but you get the idea.
/* hurr fizzbuzz v1.03*/#include #define SUCCESS 0#define FALSE 0#define TRUE 1#define MIN 1#define MAX 100#define FIZZ 3#define BUZZ 5#define DIVIDES1 func1int func1(int n, int m);int func2(int n, int m);int main(void){ int i, hurr; int (*divides2)(int, int) = func2; for (i = MIN, hurr = 0; i


Remove the loop and do it with a y combinator.

Fixing my computer

You ain't seen nothing yet.

Only if it's not full of Macfags and people saying that Windows 10 is better than Linux.
Do you not know where you are?

Let me tell you why Linux isn't an operating system...

I am confused as to whether having a cuck code of conduct to go with that is a part of the satire or not.

Sorry, haven't read muh SICP today.

/* hurr fizzbuzz really gud edition*/#include #define SUCCESS 0#define FALSE 0#define TRUE 1#define TWO 2#define MIN 1#define MAX 100#define FIZZ_NUM 3#define FIZZ_STR "Fizz"#define BUZZ_NUM 5#define BUZZ_STR "Buzz"int func1(int n, int m);int func2(int n, int m);typedef struct { int num; char *str;} fizzbuzz_t; void *main(void){ int i, j, hurr; fizzbuzz_t durr[] = { {FIZZ_NUM, FIZZ_STR}, {BUZZ_NUM, BUZZ_STR} }; int (*zurr[])(int, int) = { func1, func2 }; int (*divides)(int, int) = zurr[SUCCESS]; for (i = MIN, hurr = SUCCESS; i str); ++hurr; } } if (!hurr) printf("%d", i); printf("\n"); } return NULL;}int func1(int n, int m){ return m % n ? FALSE : TRUE;}int func2(int n, int m){ return (m - (m / n * n)) ? FALSE : TRUE;}
We're getting there.

The remaining string literals were still jarring.
/* hurr fizzbuzz even moar gud edition*/#include #define SUCCESS 0#define FALSE 0#define TRUE 1#define TWO 2#define MIN 1#define MAX 100#define FIZZ_NUM 3#define FIZZ_STR "Fizz"#define BUZZ_NUM 5#define BUZZ_STR "Buzz"#define DECIMAL_REPRESENTED_INT "%d"#define NEWLINE "\n"int func1(int n, int m);int func2(int n, int m);int print_newline(void);typedef struct { int num; char *str;} fizzbuzz_t; void *main(void){ int i, j, hurr; fizzbuzz_t durr[] = { {FIZZ_NUM, FIZZ_STR}, {BUZZ_NUM, BUZZ_STR} }; int (*zurr[])(int, int) = { func1, func2 }; int (*divides)(int, int) = zurr[SUCCESS]; for (i = MIN, hurr = SUCCESS; i str); ++hurr; } } if (!hurr) printf(DECIMAL_REPRESENTED_INT, i); print_newline(); } return NULL;}int func1(int n, int m){ return m % n ? FALSE : TRUE;}int func2(int n, int m){ return (m - (m / n * n)) ? FALSE : TRUE;}int print_newline(void){ printf(NEWLINE);}
Actual quality code, at last.

Ok, two more minor corrections (print_newline() return type was wrong, and typedefs should go before prototypes as the latter might want to use the former).
/* hurr fizzbuzz actually gud edition*/#include #define SUCCESS 0#define FALSE 0#define TRUE 1#define TWO 2#define MIN 1#define MAX 100#define FIZZ_NUM 3#define FIZZ_STR "Fizz"#define BUZZ_NUM 5#define BUZZ_STR "Buzz"#define DECIMAL_REPRESENTED_INT "%d"#define NEWLINE "\n"typedef struct { int num; char *str;} fizzbuzz_t;int func1(int n, int m);int func2(int n, int m);void print_newline(void);void *main(void){ int i, j, hurr; fizzbuzz_t durr[] = { {FIZZ_NUM, FIZZ_STR}, {BUZZ_NUM, BUZZ_STR} }; int (*zurr[])(int, int) = { func1, func2 }; int (*divides)(int, int) = zurr[SUCCESS]; for (i = MIN, hurr = SUCCESS; i str); ++hurr; } } if (!hurr) printf(DECIMAL_REPRESENTED_INT, i); print_newline(); } return NULL;}int func1(int n, int m){ return m % n ? FALSE : TRUE;}int func2(int n, int m){ return (m - (m / n * n)) ? FALSE : TRUE;}void print_newline(void){ printf(NEWLINE);}

Can devoting time to coming up with ridiculous implementations of fizzbuzz be classified as a form of procrastination?

rate my fizzbuzz

#include #define MIN 1#define MAX 100#define STEP 1#define FIZZ 3#define BUZZ 5#define IF_BUZZ(n) if(n%BUZZ==0) printf("Buzz");#define IF_FIZZ(n) if(n%FIZZ==0) { printf("Fizz"); IF_BUZZ(n);} int main(void){ for(int i=MIN;i

What is the reaction?

A job offer, $300k starting.

If you #define a compound expression (i.e. one which involves operators rather than being just a string literal or a numeric/character constant), then put parentheses around it for safety; without them, surrounding operators might steal precedence and mess up the resulting expression.

Substituting the macros (and reformatting for clarity), it amounts to
#include #define MIN 1#define MAX 100#define STEP 1#define FIZZ 3#define BUZZ 5 int main(void){ int i; for (i = MIN; i

Should be in the standard library by now tbh

I'm trying to rewrite a CPU-heavy webext that I use heavily for work to do the work in a native binary instead, to offload the insane amount of JavaScript cancer into C.

it's easier than I thought it would be, webext already has an API for it, as far as the native binary is concerned

webext -> stdin -> binary -> stdout -> webext

there's no other bullshit than that. json goes into the binary, json comes out of the binary. I thought I was going to have to do some real bs and have it launch a webserver or something for communication, but no. if this works other cpu heavy extensions could be rewritten in future projects to take advantage of the speed boost, i'm thinking adblocking.

This, right?

You can use the below code to benchmark the performance of both variants of this implementation on your system.
/* hurr fizzbuzz v1.01t1 (perf. test)*/#include #include #define FIZZ 3#define BUZZ 5#define MIN 1#define MAX 25000#define LEN_TMP 6#define LEN_BUF 140000#define OUTPUT_CON 0#define OUTPUT_FILE 0#if OUTPUT_FILE#define FILENAME "fizzbuzz.txt"#define FILEMODE "w"#endif#define PRINTSTRLEN 0#define HACKY 0int main(void){ int i, hurr; char buf[LEN_BUF]; char tmp[LEN_TMP];#if OUTPUT_FILE FILE *fp = fopen(FILENAME, FILEMODE);#endif buf[0] = '\0'; for (i = MIN, hurr = 0; i

ya, and Firefox's docs on it

Attached: native-messaging.png (1344x1096, 263.9K)

More than I thought to be honest, certainly nothing to sneeze at.

This program (which is >>881442's program with pre-substituted macros) seems to be about 20% faster on average. Below is a version suitable for algorithm performance testing (i.e. one which builds a buffer with the output and does actual output only optionally).
/* hurr fizzbuzz v1.01t2 (perf. test)*/#include #include #define FIZZ 3#define BUZZ 5#define MIN 1#define MAX 25000#define LEN_TMP 6#define LEN_BUF 140000#define OUTPUT_CON 0#define OUTPUT_FILE 0#if OUTPUT_FILE#define FILENAME "fizzbuzz.txt"#define FILEMODE "w"#endif#define PRINTSTRLEN 0int main(void){ int i; char buf[LEN_BUF]; char tmp[LEN_TMP];#if OUTPUT_FILE FILE *fp = fopen(FILENAME, FILEMODE);#endif buf[0] = '\0'; for (i = MIN; i

Whoops, wrong (forgot to add some necessary braces). This
else
    sprintf(tmp, "%d", i);
strcat(buf, tmp);
needs to be corrected to this:
else {
    sprintf(tmp, "%d", i);
    strcat(buf, tmp);
}
The corrected version seems to be only about 5% faster than with the HACKY flag set to 0 (i.e. it's actually slower than when HACKY is set to 1). Interesting.

Any higher-performance fizzbuzz propositions?

'Hacky' for the win once again ;) I don't really have much time to mess around further, but I'd start by looking at the assembly code being generated. Are you compiling with any optimizations?

gcc.godbolt.org if you want something quick and dirty.


I was going to try uncucking an older version of minecraft, but I am literally retarded and cannot decipher java. I want to just give up on it, the things I wanted to do to it, for the sake of making it easily mod-able by anyone, are so far beyond my grasp of programming that I don't think I can make it happen.
I don't know if going through java tutorials would even be of any help because I have so much trouble retaining information.

Attached: 1344607554302.jpg (300x544, 15.78K)

Learn c

Well that's not a good attitude to start with, is it? Don't talk about yourself that way, otherwise it can become a self-fulfilling prophecy.

Looks like strcat() is a huge performance hog - getting rid of it and writing to the buffer directly improved general performance by as much as a factor of 15(!).
/* hurr fizzbuzz v1.01t1c (perf. test)*/#include #include #include #define FIZZ 3#define BUZZ 5#define MIN 1#define MAX 9999999#define LEN_TMP 8#define LEN_BUF 70000000#define OUTPUT_CON 0#define OUTPUT_FILE 0#if OUTPUT_FILE#define FILENAME "fizzbuzz.txt"#define FILEMODE "w"#endif#define PRINTSTRLEN 0#define HACKY 0int main(void){ int i, hurr; char *buf = (char *)malloc(LEN_BUF); char *tmp = (char *)malloc(LEN_TMP); char *bufptr = buf;#if OUTPUT_FILE FILE *fp = fopen(FILENAME, FILEMODE);#endif for (i = MIN, hurr = 0; i

On the rare occasion that I nab a girlfriend, she will invariably have some brilliant idea about an app that is tangentially related to her job. Procrastinating on this task until I get dumped has been honed to an art form.

This is what we call "biting off more than you can chew". What you're supposed to do is spit it out and bite off a smaller fucking amount. In short, evaluate your unobtainable goals and create smaller, more obtainable goals from them instead of choking.

Attached: 48d583317374159ee06b3e6be6d2c2e710087a03e508eeddc8d320e19e507c68.jpg (1469x2031, 195.26K)

Redoing my static site generator

Making some kind of website that's like a less shit HN

Got some others too but these I've started

Attached: newyearsresolutions.jpg (540x540, 44.23K)

Instagram bot, I need to scrape huge list of certain types of people so I can get more followers.


congratulations you fucking retards. you don't even have to calculate the remainder.
notice the pattern? fucking kill yourselves and stop posting you stupid fucking retards.



Imagine being this underage.

that code with strcat was doomed to be slow because it didn't advance the dest ptr; every strcat call starts searching for the end of buf from the very beginning, so you get quadratic performance. the code here fixes that, which is probably where the performance comes from.

More ideas...
- unroll the loop: the LCM of 5 and 3 is 15. This gets rid of your branches and your modulus operator.
- don't use sprintf to convert the number to a string; instead, keep a buffer that stores the number as a string and mutate its characters. 90% of mutations need to increment only a single character, 99% will touch two, etc. The idea here is that most of the string from the last number can already be reused.

Another idea is to try down counters, instead of modulus.

Is this fastest fizzbuzz?
#include int main(void){ printf("1\n2\nfizz\n4\nbuzz\nfizz\n7\n8\nfizz\nbuzz\n11\nfizz\n13\n14\nfizzbuzz\n16\n16\n17\nfizz\n19\nbuzz\nfizz\n22\n23\nfizz\nbuzz\n26\nfizz\n28\n29\nfizzbuzz\n31\n31\n32\nfizz\n34\nbuzz\nfizz\n37\n38\nfizz\nbuzz\n41\nfizz\n43\n44\nfizzbuzz\n46\n46\n47\nfizz\n49\nbuzz\nfizz\n52\n53\nfizz\nbuzz\n56\nfizz\n58\n59\nfizzbuzz\n61\n61\n62\nfizz\n64\nbuzz\nfizz\n67\n68\nfizz\nbuzz\n71\nfizz\n73\n74\nfizzbuzz\n76\n76\n77\nfizz\n79\nbuzz\nfizz\n82\n83\nfizz\nbuzz\n86\nfizz\n88\n89\nfizzbuzz\n91\n91\n92\nfizz\n94\nbuzz\nfizz\n97\n98\nfizz\nbuzz\n"); return 0;}

Are you purposely trying to make it bloated?
On a serious note, gcc will fix this for you automatically at some optimization level

Very innovative Leroy, not one of our top-school applicants put forward such a solution. When can you start working for us?

O0 already does it. The 0 doesn't really mean no optimizations whatsoever.

anime recommendation system based on myanimelist ratings, running on the assumption that because otaku are already database animals, their ratings will naturally be more reflective of their taste than most other rating sites. Combined with the knowledge that myanimelist primarily has two functions: forum and rating list, and yet has some 4M users. Assuming the forum isn't that popular (I never use it, anyways), then you have a significant population that simply wants to keep ratings for their own sake. Scraped all the data, but procrastinating on reading papers for modern rec systems

simple blog system with hierarchical tag system


learning Unreal Engine to build a King's Field-like dungeon crawler w/ a semi-realistic economics system. Core gameplay/economics are worked out in-head, but stalled like all hell on picking up a game engine. Not really set on Unreal, but I figure it'll give me less trouble in the long haul since I don't know much game programming as it is, I'm not interested in building from scratch, and Unity appears to me incompetently developed.

If you ever get that shit off the ground please post here. I've wanted to do one for ages but never managed to make anything good, the recommendations are always garbage.

The problem with this is that you cannot make a recommendation without numbers, and if your numbers come from user ratings on a shithole with terrible taste like myanimelist, your recommendations are going to be shit too.

1. Getting a 3D printer and other tools to help here: 8ch.net/robowaifu
2. Getting ahead with Python, html5, JS, Flask, ...
3. Learning Kotlin and how to write Apps

Attached: 8ch_net_slash_robowaifu_-001.jpg (681x1024, 248.53K)

Writing a booru in Python (with flask).
Got most of the basic functionality done, but it's still very bare bones and a nightmare to administrate.

What methodologies did you employ?

I don't think the general taste of myanimelist is too important, unless of course it's entirely absent of good taste (which I don't believe). What's required is a sufficient number of people who have taste similar to a given user, and from there you derive the recommendations. You're not building a list so much as finding the nearest neighbor in terms of taste and stealing the anime they like that you don't have.

i.e. if you give me a bunch of mecha anime as your top 10, then the first thing I want to do is find other users who also rate those mecha highly. I don't care about the shonen, isekai or whatever people if they don't even look at the anime you do. Obviously if this results in an empty set, then I can't provide you any recommendations. But also obviously, if this leads to only 10 people, then I can very easily provide recommendations (just take whatever other things they like). The worst case is returning 10,000 people, in which case further filtering is required.

The reason you need a large dataset is not to average out the noise, I think, so much as have a large swath of tastes to match with.

So it doesn't matter that the average taste of myanimelist is awful. As long as we assume people with good taste also use the site, then its sufficient. Assuming, of course, we can efficiently and correctly measure taste from the rating list, and rank user-taste similarity against each other.

If we can't, then we're fucked, no matter how good the dataset is. At best, all we can do is say, return whatever's currently most popular. (and if this is a community of fantastic taste, then it's an acceptable practical result, but it's not a recommendation system).

Just to be clear, we're taking a rating list as input, and returning a list of recommended anime.


I wonder if something simple like this would work.

Find set of all the people who like at least one anime that you like.
To each person in this set, assign a number of points equal to the number of anime that you both like, and build the set of anime that they like but you don't have listed. If you have a list of anime that you dislike, then purge those from the set. If you have lists of anime that other people dislike, then give them a negative point for each anime that you like and they dislike.
Find the sum of points for each anime by going through the set of people and adding the number of points they have to each anime on their list of liked anime.
Select top 10 anime with most points.

From what I can remember and referring to information retrieval (document search), the basic thing to do is to generate an inverted index, a hash table mapping
anime -> [(user, value)]
where the (user, value) pair records how heavily that user weights the anime

(it's called inverted because you intuitively start with a mapping of user -> [anime] )

the weighting being something along the lines of rarity*rating
where rarity is how rare the anime is amongst the entire corpus.

ie an anime that only appears in your list, and one other person's list, is significant, even if you hate it and I loved it. That we're even on the same page implies similar taste in genres. But the presence of naruto is irrelevant; everyone has it, and everyone has an opinion on it. And most importantly, because everyone has it, it's not very useful as a filter.

and then the rating would have to be normalized: you give 10s to 20/100 anime in your list, so if the anime in question is rated a 10, it's unclear how valuable it really is. But if I have one 10 in my 1000 anime, it's likely highly indicative of my taste.

so then what you do when the user provides an animelist

1) compute the metric for each anime in his list
2) select the top-N anime (ideally stripping out the ones that are both too-popular, and too-low rated)
3) for each of those anime, select the top-M users
4) group the users, sum their total scores, and select the top X-users with the highest score
5) for each of those X users, take the top-Z anime and return the set. Ordered by the same metric perhaps.

so you have an N*M calculation for every input, instead of the full set of 4M, and there's not much real work going on; the biggest chunk of work is the index, but that's precomputed. Every future request is just reading the hash table.

This is still too simple of course: the weight applied to rarity and rating, and selecting values for N, M, X, Z are all likely to fail especially in edge cases. We're also ignoring information given from the set disliked anime, and I'm not entirely clear on how to normalize the rating vector. (number-of-x * x) applied to the vector is probably incredibly dumb.

One paper I skimmed through made the interesting note that people don't really use the rating ranks similarly, but they generally agree on the meaning of three numbers: 1, 5 and 10. Anything above 5 is generally "liked", and below is "disliked"; its just unclear how big a distance 5->6 and 8->9 is.

There's too many subjective components to this particular algorithm that I don't like. I'm sure there are much better solutions out there, but then, that's what I'm procrastinating on. Reading the damn papers.

It's also a task that is likely trivially transformed into an ML problem. MAL ratings is the training set, and for each animelist split it randomly 70/30. The ML alg gets the 70% as input, and has to predict the latter 30%. But I don't know jack shit about ML so I'll probably stick to classical recommender systems when I finally bother doing the research

It's also unclear to me how to check if the algorithm is useful to anyone; users giving manual feedback isn't going to happen, but I'm not going to try to do something absurd like track changes on the MAL account to see if they watched and liked an anime I recommended them.

I should be flashing my switch with Brocade TurboIron to get actually functioning STP, and I should be learning C to continue working on my message-via-SSH project, but here I am, reading fullchan and distrohopping.

Attached: 1520490509123-g.png (2880x1800, 2.75M)

Attached: 1402056196212.jpg (1490x1188, 321.58K)

yes, there obviously *is* a pattern, but you have to check each number to see whether it fits the pattern or not, and that's exactly what fizzbuzz algorithms do.

This is Zig Forums. We use Linux/BSD/TempleOS. Lurk moar or GTFO

AVL tree, Red-Black tree and Splay tree.

The commits are where the real gold lies.

Haha whoops thanks for reminding me user I meant to start that one a few decades back

I don't remember exactly. I recall doing a graph using tags for the strength of the edges (so for example if Cowboy Bebop and Space Dandy both have 'space' and 'cowboy' there's an edge with a weight of 2 between them, but 1 between Cowboy Bebop and Trigun since the latter doesn't have 'space'). One problem was the program I used for the graph sucked donkey balls and I didn't have enough memory to render it (on the other hand it had many clustering algorithms to plot it nicely); another problem was that tags don't actually have the same weights (the 'cowboy' tag for Bebop, for example, describes it better than a 'has_a_dog' tag would). I also looked into pybrain, but stopped once I couldn't easily get the data for individuals (you have to scrape some website, but they often have low limits on the number of requests, etc.; it gets bothersome). Btw, if you want to play with the recommendation engine first and do the data gathering later, you can use last.fm's dataset, although I'm not sure if it has ratings instead of just what was played by whom.

Attached: serveimage.jpg (2048x1025, 805.83K)

You see this?
This is an almost finished 3D renderer, based on the github.com/ssloy/tinyrenderer tutorial.
The twist is I'm making it in 9front, and I use .stl files instead of whatever the tutorial suggests.
Since I'm only making an .stl viewer here, I don't care about textures and shaders and such, so the only thing left to do is the matrix transformation for camera movement, and I even have most of the matrix math already done.
But I've been staring at the editor window for almost a week now and haven't typed in a single line.

There are also some Z-buffer issues to deal with, but I think I need a working camera to understand them anyway.

Attached: stlview.png (1224x645, 42.35K)

I'm thinking about patching mawk to add some non-standard features like:
- real exit (abort, if you want)
- per file BEGIN/END rules/sections
- cut-like field reference; $1..NF, for example
- split into $0
- feed the output of a cmd to the corresponding section from its BEGIN
- set -eu equivalent
- peek to get a line without consuming it
- maybe some useful gawk stuff like @include/@load and sorting
- backreferences in regexp? Sounds pretty hard

Wait, didn't the front fall off?

Taking a short break from making indirect command rendering in OpenGL work.
The intermediate glDrawArrays and glDrawArraysInstancedBaseInstance calls work, so it's something wrong with the structure I'm sending, but I can't seem to find where yet.

I actually already scraped all of MAL, some 4 mil users and their rating list, which ended up like 40GB in the database ( [id, anime], [id, username], [userid, animeid, rating] was essentially all I stored). I just spawned 5 AWS boxes to do the scraping over a couple weeks, iirc costing me $15 after exceeding the free tier limit for the month.

I was considering also scraping and merging something like anidb, because I figure their genre classification is probably more detailed and consistent than MAL's, if I ever make use of genres anyways

I'm not too sure I like genre classification though, at least, I wouldn't like it to be terribly high weighted, since I think the ideal is to find someone who shares your taste and anime you know nothing about. ie if you and I both like the same mecha anime, but I also like a bunch of little girl anime, there's a good chance you'll also like that anime. Maybe cluster on genre and rating, but do something on the recommendation-generation side (ie when selecting from nearest-neighbor's list, favor alternative genres)

I suppose it depends on the user's goals. If they want to explore more anime they might like in general, or more in the vein they're already exploring. If that decision can be kept to only the last step, then I could probably just have it as a flag without affecting the precomputed stuff

As a side note, 15 users have over 12,000 anime on their list. Most are something like 600 watched, 11,000 plan to watch, 400 dropped. But this user myanimelist.net/profile/Reiterz is particularly interesting. He has 13,282 anime total; 12,760 of those are dropped. I'm not sure if he has godly taste or not.

(i'll probably drop the top 25% / bottom 25% of the counts anyways; 40GB is already too much fucking data, and I figure someone with 12k anime is probably just watching anything and everything; if he has any taste, it's not likely to significantly show from his animelist.)

just to be clear, the 5 boxes over 2 weeks was specifically because MAL has ip-based rate limiting. Used Python Scrapy to handle it, which has a convenient auto-throttle flag to handle rate limiting. Just had to fiddle with the numbers till it stopped 404'ing me. Also conveniently, scrapy auto-retries 404'd links.

They send back a fucking 404 instead of a proper 429 retry-after?


iirc, yes

Also of note
myanimelist.net/users.php?q=&agelow=0&agehigh=&g=4&show=40000 is the query to fetch all users, with each page returning 24 users, and the SHOW variable setting the initial starting point for the page.

They seem to have stopped doing so now, but at least a few months ago, SHOW was clearly being plugged into an OFFSET statement, presumably in mysql; somewhere around 3 million, each page was taking a good few minutes to fetch.

There's also no proper way to fetch users from that webpage without applying some kind of filter (agelow, agehigh, location or gender). At least one field is required server-side, or it just returns you a default page. But apparently setting the age filter to 0 < x < 0 somehow evaluates to true for all rows (assuming ~4M is the total number of users). I'm still not sure what kind of logic could have resulted in true

And one more interesting thing is that apparently MAL has added another ~600,000 users since I scraped.. that's some damn surprising growth

but yeah no MAL is not a well built site

but its got data

Is it that big even compressed?
Good idea, you can't make much use of someone with 3 anime or 12k. At best the 3 one could be used to confirm likes(a) -> likes(b).

you all suck

int three = 0;
int five = 0;
for (int num = 1; num <= 100; num++) {
    int gg = 0;
    if (++three == 3) {
        three = 0;
        printf("Fizz");
        gg = 1;
    }
    if (++five == 5) {
        five = 0;
        printf("Buzz");
        gg = 1;
    }
    if (!gg)
        printf("%d", num);
    printf("\n");
}

not sure what it'd be fully compressed, if you mean like archival quality compression, but for the DB I haven't put any real effort into making the schema more efficiently typed; ie it's all postgres integer and text, so variable length storage on everything

there's also a few additional meta-data columns more that I didn't note earlier, ie the [userid, animeid, score] table is actually [userid, animeid, watched_episodes, score, status, rewatching, rewatching_ep, last_updated, tags]

which were fields in MAL's XML that I thought might be useful, but for the most part they're rarely used by users, and it's all user shit. out of that list of columns, the only additional field worth keeping is status, which denotes the states { WATCHED, PLAN-TO-WATCH, DROPPED }.

So in the biggest table, I've basically got 2x the number of fields that I should, and I've assigned variable-length integers to columns that have pretty explicit length requirements (ie rating is 0 to 10).

So 40GB isn't really a fair value; Culling the crap columns alone would probably take me down to 20-30 GB.