Zero terminated "strings"

blog.ircmaxell.com/2015/03/security-issue-combining-bcrypt-with.html

When will this stupidity finally come to an end?

Even in C, it's absolutely possible to use a struct with a pointer and a length, and add a library with replacements for the functions that work with zero terminated strings.

Why would anyone still use zero terminated "strings"? They make no fucking sense, almost the worst idea ever.
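It's not even hard. A minimal sketch (all the names here are made up):

[code]
#include <stdlib.h>
#include <string.h>

/* hypothetical pointer+length string type */
typedef struct {
    char  *data;
    size_t len;
} str_t;

/* length is O(1), and embedded zero bytes are fine */
size_t str_len(str_t s) { return s.len; }

/* replacement for strcat-style concatenation (error handling omitted) */
str_t str_concat(str_t a, str_t b) {
    str_t r;
    r.len  = a.len + b.len;
    r.data = malloc(r.len);
    memcpy(r.data, a.data, a.len);
    memcpy(r.data + a.len, b.data, b.len);
    return r;
}
[/code]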

But user, null terminated strings take up less space and are the same format regardless of CPU architecture.

they don't.
a 3-byte difference is irrelevant for long strings.
if you want to make short strings small, there is such a thing as variable length coding for integers, but the gain would be so small it's not worth it.
CPU architecture is irrelevant, the pointer size would change in both cases, and the internal representation of the length doesn't mean jack shit anyway.
also zero terminated strings are less efficient, because calculating the length is O(n).

Ah so that 8 byte size prefix is not a waste of space then?

Yeah we will just do a big num hack to size our strings! Good idea.

What if my string is more than 4,294,967,295 characters!

No one would ever need to store a file more than 4 gigabytes. We don't need to design our system to handle that.

You know Windows internally uses sized strings. Look how well that turned out.

4 bytes are enough for strings for all practical purposes. if you have more, obviously it's time to use specialized data structures anyway.
it's not a waste of space, it's only 3 bytes more than the terminating null byte, which is about 1-2 characters on average if you use UTF-8, and less than 1 character if you use fixed-size Unicode.
unless you store single characters in strings, this doesn't fucking matter, and it's a lot better than turning all code dealing with strings into a potential minefield and sacrificing run time wherever you can't reuse length information.


it's only big if your brain is small. it's not gonna be used in most application-level code.
anyway, in the same sentence I also said that the gain is minimal, so it's not worth it. learn to read. still, that would be better than zero terminated strings.

which of Windows' problems, exactly, are a consequence of this?

a few words escaped, fixed

With any kind of input you usually don't even know the string length beforehand, so some kind of string termination is necessary.

and you don't always need to know the string length either

I agree, no one could possibly need a hard drive bigger than 16 megabytes. A 200 MHz CPU is blazing fast.

For 32-bit max-length strings. For small strings especially it's a waste of space.

Well good thing no one uses that for the same reason no one uses size prefixed strings. Because it has a different representation on different machines. Byte order and what not.

Well that's one particular operation

Do you not know what bignum is? You are saying that everyone is going to be using a bignum implementation for basic fucking strings.


Idk, it's proprietary, they only let us know so much

if you read a file that's that big, and need to keep all of the content in RAM for some reason, you aren't using a fucking string type for it. it would be almost useless in that form anyway.


a legit use case, please.
also keep in mind that scanning 4 GiB for the zero byte would take a really long time.

So no one reads files into char* then? lol

I'm saying no one will re-implement it.
And using bignum already implemented in a library doesn't make shit any more complex.

when it is used somewhere and turns an O(n) algorithm into O(n^2) that would be a big deal and a PITA to fix.

For most use cases it's a bullshit reason. Byte order only matters for data exchange — files and network. Files do not need to store their size, their size is known. So prefixing size in files would be simply excessive, just as adding zero byte to the end. On network, size counts even more, so it's a normal practice to use variable length coding (protobuf, etc.).
But we are talking about in-memory representation. And you know, you don't swap a fucking CPU on a running machine while keeping RAM and CPU cache and registers, etc.
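And the variable length coding is trivial. A protobuf-style (LEB128-flavored) sketch; a length under 128 costs exactly one byte, i.e. no more than the null terminator:

[code]
#include <stdint.h>
#include <stddef.h>

/* base-128 varint: 7 bits per byte, high bit means "more bytes follow" */
size_t varint_encode(uint64_t v, uint8_t *out) {
    size_t n = 0;
    while (v >= 0x80) {
        out[n++] = (uint8_t)(v | 0x80);  /* low 7 bits + continuation bit */
        v >>= 7;
    }
    out[n++] = (uint8_t)v;
    return n;  /* bytes written: 1 for v < 128, at most 10 for 64-bit v */
}
[/code]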

Meanwhile people are happily making programs in javascript where every variable is some abominable super object.

char* doesn't say anything about whether the referenced content is zero terminated.
it's just a fucking pointer.

You think linking in foreign dependencies to use strings does not increase the complexity? HAHAHA

Like uhhhh text files, websites, spreadsheets, literally everything.

And this is a bad thing

Not for strings where the smallest representation is null terminated.

you never wrote software which was able to do that, obviously.

and? they don't use zero terminated strings.
and we are talking about in-memory representation.

HAHAHA, go program some stuff without libc.

Of course. My point was that 3 bytes on a string is nothing. It's the difference between "Hello world!" and "Hello world!gay". You're more likely to waste bytes on shit software design than a structured string.

String copy is like 5 lines of code to write yourself, you want to add on a bunch of big num shit to make it 100.
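And it really is about 5 lines (the classic K&R-style sketch):

[code]
/* copy src into dst, terminator and all */
void copy(char *dst, const char *src) {
    while ((*dst++ = *src++) != '\0')
        ;
}
[/code]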

You think file formats don't use null termination literally everywhere?

It's really fucking easy, you just mmap it into memory and start reading. The OS will load and unload the pages for you.
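Roughly like this (a POSIX sketch, error handling mostly skipped). Note the length comes from fstat, not from scanning for some terminator:

[code]
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int dump_file(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    fstat(fd, &st);

    /* the kernel pages the file in and out as you touch it */
    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }

    fwrite(p, 1, st.st_size, stdout);  /* "start reading" */

    munmap(p, st.st_size);
    close(fd);
    return 0;
}
[/code]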

People wasting space at the high level is no excuse to waste shit at the low level. Even worse wasting space at the low level is going to make all that high level shit even shittier.

Yes.

Then you aren't using any null termination.

You're free to null terminate your own esoteric string use cases if you really need to squeeze the fuck out of every byte of memory, but the chances that you do are probably 0. I don't think you understand how meaningless those 3 bytes really are in this case.

combining that with the other text, it's pretty much obvious that he doesn't.

mmap is not dependent on the file type you dolt

No reason to optimize! Not like computers process hundreds of billions of strings a day! That would not be beneficial at all!

which retarded file format are you using which has zero termination at the end?

Null-terminated strings have the advantage of behaving extremely well with recursive functions, as constructing a suffix substring is just a matter of incrementing a pointer.
Think about how Lisp handles lists: they are chained cons cells, with the last cons having an empty (null) list as its CDR. This is how you work with recursion in Lisp.
The fact that you think null-terminated strings are useless proves you still have much to learn, youngling.
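e.g. the tail of a string is just str + 1, exactly like taking the CDR (a toy sketch):

[code]
#include <stddef.h>

/* '\0' plays the role of the empty list at the end of a cons chain */
size_t len(const char *str) {
    if (*str == '\0')
        return 0;
    return 1 + len(str + 1);  /* str + 1 is the suffix, i.e. the CDR */
}
[/code]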


Wrong. First of all, a size_t variable in a 64-bit environment is 64 bits, so we're talking about 7 extra bytes. Second of all, those bytes are nothing on a text, but when you have a plethora of max-10-character strings (quite frequent), on a 32-bit system it increases their size by about 27%, and on a 64-bit system it's about a 64% increase in memory usage.
Imagine storing an associative array that uses, as keys, 5-char strings with an 8-byte size variable. That's 13 bytes per key. With null-terminated strings, it's 6 bytes. That's about 117% more space used by the size+string solution.

The absolute state of C idiots in this board, everyone.

Go back to your javascript bullshit. Clearly you don't care about efficiency and interchangeability.

Do you know that EOF is an integer value, not an unsigned char, that most strings in a system are very small, that strlen() is rarely needed, and that you can still store a string's length in C, while not losing the advantages of the NULL termination?

You mean Unix.

the virgin FOReskin:
[code]
void map(char *str, size_t str_len, char (*func)(char)) {
    for (size_t i = 0; i < str_len; ++i) {
        str[i] = func(str[i]);
    }
}
[/code]

The Chad Elegant Recursion:
[code]
void map(char *str, char (*func)(char)) {
    if (*str) {
        *str = func(*str);
        map(str, func);
    }
}
[/code]

should be map(str + 1, func), LOL

This is why you don't use languages without proper tail recursion.

Remove yourself from premises.

Null-terminated strings suck. C weenies defend it because that's what C uses. Common Lisp strings are arrays, and they can be adjustable (grow and shrink) and have a fill pointer (anything less than it is the currently used part). This covers all the uses of dynamically sized strings, length-prefixed strings, and fixed-length strings. Lisp strings are arrays, so all arrays can have these properties.


Bullshit. You always need to know the length. If you really added up all the waste from C and UNIX "comparing characters to zero and adding one to pointers", it would be more efficient to have GC and dynamic typing and store files as arrays of strings. I'm not kidding. C malloc overhead is huge too, but on a Lisp machine, allocating a list only uses one word of memory per element. Allocating a 1D array only uses one header word to store the actual length of the array (which malloc has to do too, but it doesn't provide useful information to you) followed by the words for the array data. Lisp machine overhead is much smaller than C overhead, and the GC compacts to eliminate memory fragmentation.


That's because C sucks. malloc in C has more than 3 bytes of waste. JavaScript is a better language than C even though it sucks too.


You're going to read an entire 10 GB file into memory (not memory mapping) and stick a 0 byte on the end, but you think an 8 byte length is wasteful? I have no idea why anyone would do things like that.

> Subject: More On Compiler Jibberings...
>
> ...
> There's nothing wrong with C as it was originally
> designed,
> ...

bullshite.

Since when is it acceptable for a language to incorporate two entirely diverse concepts such as setf and cadr into the same operator (=), the sole semantic distinction being that if you mean cadr and not setf, you have to bracket your variable with the characters that are used to represent swearing in cartoons? Or do you have to do that if you mean setf, not cadr? Sigh.

Wouldn't hurt to have an error handling hook, real memory allocation (and garbage collection) routines, real data types with machine independent sizes (and string data types that don't barf if you have a NUL in them), reasonable equality testing for all types of variables without having to call some heinous library routine like strncmp, and... and... and... Sheesh.

I've always loved the "elevator controller" paradigm, because C is well suited to programming embedded controllers and not much else. Not that I'd knowingly risk my life in an elevator that was controlled by a program written in C, mind you...

And what can you say about a language which is largely used for processing strings (how much time does Unix spend comparing characters to zero and adding one to pointers?) but which has no string data type? Can't decide if an array is an aggregate or an address? Doesn't know if strings are constants or variables? Allows them as initializers sometimes but not others?

(I realize this does not really address the original topic, but who really cares. "There's nothing wrong with C as it was originally designed" is a dangerously positive sweeping statement to be found in a message posted to this list.)

dropped

citation needed
what about uint32_t?

...

lol, what a retarded syntax

I bet you are the type of larper that gets autistic about where parentheses are placed or using spaces vs tabs, wasting all our fucking time.

...

...

Who said anything about keeping it in RAM? You can process something without it being in RAM. Ever heard of streams? Ever heard of memory mapped files? Guess not lol.

And there's absolutely nothing wrong with that. Having been in both worlds, it's such a pleasure to write software in the more abstract languages.

not.
it's a lot less clear than `char -> char` for example, or even `Function`.

try to spell (in C) a type of a variable which is a function which takes a char and returns a function which returns a function which returns a function which returns a char, for example.
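for the record, here is one way to spell it (strictly, pointers to functions, since C functions can't return functions by value), first raw, then untangled with typedefs:

[code]
/* f takes a char and returns a (pointer to a) function returning a
   (pointer to a) function returning a (pointer to a) function returning char */
char (*(*(*(*f)(char))(void))(void))(void);

/* the same thing, readable */
typedef char (*step3)(void);   /* function returning char         */
typedef step3 (*step2)(void);  /* function returning the above    */
typedef step2 (*step1)(void);  /* function returning the above    */
typedef step1 (*fn)(char);     /* takes a char, returns the above */
fn f2;
[/code]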


if you read from a file, you already know the size, because files have a size. adding 1 terminator byte on top is useless and stupid.

Javascript is the C of high level languages

People ITT apparently don't know null terminated strings and reading from files have nothing to do with each other at all, that's my point.
Did you know SQL databases solved this ages ago with fixed size char fields, variable size char fields and text fields? Fuck, we could solve this the same way we solved numeric types of different sizes, with short strings, regular strings, long strings, etc.

That alone is a reason to make it better, but it's also clearer, the function takes one less argument, and it doesn't need to push a new variable onto the stack.


While it is true that ANSI C says nothing about tail call recursion, GCC does it.


Mr. Common Lisp[1] here apparently does not understand the value of a null terminator in a linear collection of elements (like a string), even though it is the principle upon which cons cell lists are constructed.

[1] yuck!


Look up any software, and see how long most strings are.
Though there is no reason for it not to be used, it's not recommended to hardcode the size of your length type. Also, uint32_t is not defined by ANSI standards older than C99.


How would you do it, Mr. Smart Man?

...

I bet you think a linked list is good too because it doesn't need an iterator variable to loop through.

People ITT don't know that null terminated strings are used in file formats all the time.

ftfy
although C is crap too, so… not a big difference after all.

Go write in javascript faggot, its where you belong.

That's the point, dingus

Made my day.


who told you so?
>what is registers?

and the most widely used example is … ?

so many newfag CS undergrads ITT smh


Older standards than C99 belong to the garbage bin.

Your beloved C does size promotion all the time. Fuck, getchar(), which is used to read a single character from a file, and which is about as wasteful as a function gets, performs promotions with every single call. And it's negligible.

Really, fuck off. You don't even want assembly, your autism should only allow you to use ASICs that waste zero cycles at all.

I hate C, I just like null termination.

SQL databases are much different from C storage. For starters, the length of a VARCHAR is stored only once, in the column definition. When the length is dynamic, we're talking about text, which will indeed make the extra 8 bytes literally nothing.

Well tough shit then…

Tough shit for you, attacking a strawman this whole time.

You sure you're not projecting m8?


How many years of programming do you have on your CV, again?

Linked lists waste all that space on pointers though, terrible cache properties, jumping around to different pages all the time. Big O time complexity has little to do with the real world when constant factors dominate at realistic sizes of N.

(checked)
Is an array of pointers that get reallocated all the time a better solution when the list is not changing often?

Well I've never heard of a situation where a linked list is the best solution, so here's your chance to educate me.

That's where you're wrong, kiddo. C99 is one of the worst standards to come, and everyone in the industry uses C95 exclusively.


If that array of pointers fits within a few pages then it's absolutely faster compared to chasing down pages wherever they get allocated.

...

Thx. Is this the best solution for small lists? Is there a special list type you'd recommend?

Have you ever benchmarked this shit on relatively modern computers?

You and I must have different definitions of "real world". When you need to constantly resize (queues, lists, stacks), using vectors is extremely expensive. When the size of your vector remains constant, or changes very little, using a vector is better.
You wouldn't cut a steak with a wood saw, or cut a plank with a steak knife. Two different tools serve two different purposes, and so do two different data structures.

Not an argument

That vulnerability mentioned in that blog post is developer error. The function takes in a string, but you pass in a byte array. Why would you expect it to work? If you pass in the wrong type of variable then of course it might not work right.

Any evidence?

That's just it, it's not extremely expensive. It has an expensive big O cost, but almost every benchmark will show that vectors are faster. This is because cache pages exist. The cache changes how all of this works.

Look, your CS 101 data structures class and its big O notation are not an accurate description of how caches work.

in C, char* is also used for byte arrays.
this is a programmer error, but it could be prevented if the design of the language and the stdlib was less shit.
programmers will always make some errors, but some of them can be prevented entirely as a class.

No. But performance always takes priority.
And I think we should listen to that anon's practical advice and not some stupid theory developed by java shitcoders at some university.

it doesn't.
amortized cost of adding an item is still O(1).
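the usual doubling trick; a sketch:

[code]
#include <stdlib.h>

typedef struct {
    int   *data;
    size_t len, cap;
} vec_t;

/* doubling the capacity means each element gets copied O(1) times on
   average, so N pushes cost O(N) total: amortized O(1) per push */
int vec_push(vec_t *v, int x) {
    if (v->len == v->cap) {
        size_t ncap = v->cap ? v->cap * 2 : 8;
        int *p = realloc(v->data, ncap * sizeof *p);
        if (!p) return -1;
        v->data = p;
        v->cap  = ncap;
    }
    v->data[v->len++] = x;
    return 0;
}
[/code]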

Adding an item to the middle of a vector is not amortized to O(1).

You can change how often a vector reallocates itself, but really, the default behavior is sufficient for most implementations.

neither is it in a linked list if you first need to find the place to insert; you'll need an O(n) traversal first.

Yours neither, loser. You literally made a bold statement without backing it, or providing proof. Your nodev ass can't even write a reverse polish calculator, LOL.

Again, you keep using all this fucking big O notation when talking about the speed of these data structures. The real world does not follow big O. Iterating over a vector that's all in one page is thousands of times faster than jumping between pages where linked list nodes are allocated, despite the same time complexity.
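If you don't believe it, here's the kind of micro-benchmark that shows it (a sketch; exact numbers vary by machine, and freshly malloc'd nodes are still friendlier to the cache than a fragmented long-running heap would be):

[code]
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct node { long v; struct node *next; };

#define N 10000000

int main(void) {
    long *arr = malloc(N * sizeof *arr);     /* one contiguous block */
    struct node *head = NULL;
    for (long i = 0; i < N; i++) {
        arr[i] = i;
        struct node *n = malloc(sizeof *n);  /* scattered allocations */
        n->v = i;
        n->next = head;
        head = n;
    }

    clock_t t0 = clock();
    long sum = 0;
    for (long i = 0; i < N; i++)             /* prefetcher-friendly */
        sum += arr[i];
    clock_t t1 = clock();
    for (struct node *n = head; n; n = n->next)  /* pointer chasing */
        sum += n->v;
    clock_t t2 = clock();

    printf("array: %fs  list: %fs  (sum %ld)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
    return 0;
}
[/code]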

Yup. This is why compiler warnings exist when you try to do implicit conversion, and this is why Apps Hungarian Notation is useful.


1st year CS theory that you ought to know if you want to be taken seriously here.


Those are use cases where vectors are indeed better.


Do you think data structures stop existing outside of RAM?


Filesystems make extensive use of linked data structures.


In the middle, or anywhere besides the end. Dynamic vectors can be used somewhat effectively as stacks because of that, but that's about it.


L M A O
M M
A A
O O

I can even write an infix calculator without any problem.
I actually wrote a compiler for a simple language, and a lot of other shit too. Fix your detector.

You're still claiming shit you've never done, and don't provide proof.
>>>/reddit/

Look here, retard. Iterating over a list and a vector have the same big O cost. In the case of an actual list, though, you will be chasing down pointers in different pages. Big O does not model this cost at all. If you knew more about CS theory than an undergrad simpleton you would understand this.

oh shit watch out there's a troll in here.

Insert a new value at the head of a 10-million-record vector.
Now do it at the head of a 10-million-record linked list.
Come back and tell everyone how it went.

When you make claims based on your invalid mental model of the modern computing hardware, of course you need to prove your bullshit to be taken seriously.

Lol, are you a brainlet or what?

For different reasons altogether.
We are talking about in-memory data structures.

...

if you need to insert at the head, you use a deque and not a vector.
for a deque, this is not a problem at all, and it will be faster than a linked list (amortized).

don't worry I got him right here:

forgot pic
