Zero terminated "strings"

blog.ircmaxell.com/2015/03/security-issue-combining-bcrypt-with.html

When will this stupidity finally come to an end?

Even in C, it's absolutely possible to use a struct with a pointer and a length, and add a library with replacements for the functions that work with zero terminated strings.

Why would anyone still use zero terminated "strings"? They make no fucking sense, almost the worst idea ever.
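It's not even hard. A minimal sketch (all the names here are made up):

[code]
#include <stdlib.h>
#include <string.h>

/* hypothetical pointer+length string type */
typedef struct {
    char  *data;
    size_t len;
} str_t;

/* length is O(1), and embedded zero bytes are fine */
size_t str_len(str_t s) { return s.len; }

/* replacement for strcat-style concatenation (error handling omitted) */
str_t str_concat(str_t a, str_t b) {
    str_t r;
    r.len  = a.len + b.len;
    r.data = malloc(r.len);
    memcpy(r.data, a.data, a.len);
    memcpy(r.data + a.len, b.data, b.len);
    return r;
}
[/code]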

But user, null terminated strings take up less space and are the same format regardless of CPU architecture.

they don't.
a 3-byte difference is irrelevant for long strings.
if you want to make short strings small, there is such a thing as variable length coding for integers, but the gain would be so small it's not worth it.
CPU architecture is irrelevant, the pointer size would change in both cases, and the internal representation of the length doesn't mean jack shit anyway.
also zero terminated strings are less efficient, because calculating the length is O(n).

Ah so that 8 byte size prefix is not a waste of space then?

Yeah we will just do a big num hack to size our strings! Good idea.

What if my string is more than 4,294,967,295 characters!

No one would ever need to store a file more than 4 gigabytes. We don't need to design our system to handle that.

You know Windows internally uses sized strings. Look how well that turned out.

4 bytes are enough for strings for all practical purposes. if you have more, obviously it's time to use specialized data structures anyway.
it's not a waste of space, it's only 3 bytes more than the terminating null byte, which is about 1-2 characters on average if you use UTF-8, and less than 1 character if you use fixed-size Unicode.
unless you store single characters in strings, this doesn't fucking matter, and it's a lot better than turning all code dealing with strings into a potential minefield and sacrificing run time wherever you can't reuse length information.


it's only big if your brain is small. it's not gonna be used in most application-level code.
anyway, in the same sentence I also said that the gain is minimal, so it's not worth it. learn to read. still, that would be better than zero terminated strings.

which of Windows' problems, exactly, are a consequence of this?

a few words escaped, fixed

With any kind of input you usually don't even know the string length beforehand, so some kind of string termination is necessary.

and you don't always need to know the string length either

I agree, no one could possibly need a hard drive bigger than 16 megabytes. A 200 MHz CPU is blazing fast.

For 32-bit max-length strings. For small strings especially it's a waste of space.

Well good thing no one uses that for the same reason no one uses size prefixed strings. Because it has a different representation on different machines. Byte order and what not.

Well that's one particular operation

Do you not know what bignum is? You are saying that everyone is going to be using a bignum implementation for basic fucking strings.


Idk, it's proprietary, they only let us know so much

if you read a file that's that big, and need to keep all of the content in RAM for some reason, you aren't using a fucking string type for it. it would be almost useless in that form anyway.


a legit use case, please.
also keep in mind that scanning 4 GiB for the zero byte would take a really long time.

So no one reads files into char* then? lol

I'm saying no one will re-implement it.
And using bignum already implemented in a library doesn't make shit any more complex.

when it is used somewhere and turns an O(n) algorithm into O(n^2) that would be a big deal and a PITA to fix.

For most use cases it's a bullshit reason. Byte order only matters for data exchange — files and network. Files do not need to store their size, their size is known. So prefixing size in files would be simply excessive, just as adding zero byte to the end. On network, size counts even more, so it's a normal practice to use variable length coding (protobuf, etc.).
But we are talking about in-memory representation. And you know, you don't swap a fucking CPU on a running machine while keeping RAM and CPU cache and registers, etc.
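And the variable length coding is trivial. A protobuf-style (LEB128-flavored) sketch; a length under 128 costs exactly one byte, i.e. no more than the null terminator:

[code]
#include <stdint.h>
#include <stddef.h>

/* base-128 varint: 7 bits per byte, high bit means "more bytes follow" */
size_t varint_encode(uint64_t v, uint8_t *out) {
    size_t n = 0;
    while (v >= 0x80) {
        out[n++] = (uint8_t)(v | 0x80);  /* low 7 bits + continuation bit */
        v >>= 7;
    }
    out[n++] = (uint8_t)v;
    return n;  /* bytes written: 1 for v < 128, at most 10 for 64-bit v */
}
[/code]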

Meanwhile people are happily making programs in javascript where every variable is some abominable super object.

char* doesn't say anything about whether the referenced content is zero terminated.
it's just a fucking pointer.

You think linking in foreign dependencies to use strings does not increase the complexity? HAHAHA

Like uhhhh text files, websites, spreadsheets, literally everything.

And this is a bad thing

Not for strings where the smallest representation is null terminated.

you never wrote software which was able to do that, obviously.

and? they don't use zero terminated strings.
and we are talking about in-memory representation.

HAHAHA, go program some stuff without libc.

Of course. My point was that 3 bytes on a string is nothing. It's the difference between "Hello world!" and "Hello world!gay". You're more likely to waste bytes on shit software design than a structured string.

String copy is like 5 lines of code to write yourself, you want to add on a bunch of big num shit to make it 100.
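And it really is about 5 lines (the classic K&R-style sketch):

[code]
/* copy src into dst, terminator and all */
void copy(char *dst, const char *src) {
    while ((*dst++ = *src++) != '\0')
        ;
}
[/code]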

You think file formats don't use null termination literally everywhere?

It's really fucking easy, you just mmap it into memory and start reading. The OS will load and unload the pages for you.
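Roughly like this (a POSIX sketch, error handling mostly skipped). Note the length comes from fstat, not from scanning for some terminator:

[code]
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int dump_file(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    fstat(fd, &st);

    /* the kernel pages the file in and out as you touch it */
    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }

    fwrite(p, 1, st.st_size, stdout);  /* "start reading" */

    munmap(p, st.st_size);
    close(fd);
    return 0;
}
[/code]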

People wasting space at the high level is no excuse to waste shit at the low level. Even worse wasting space at the low level is going to make all that high level shit even shittier.

Yes.

Then you aren't using any null termination.

You're free to null terminate your own esoteric string use cases if you really need to squeeze the fuck out of every byte of memory, but the chances that you do are probably 0. I don't think you understand how meaningless those 3 bytes really are in this case.

combining that with the other text, it's pretty much obvious that he doesn't.

mmap is not dependent on the file type you dolt

No reason to optimize! Not like computers process hundreds of billions of strings a day! That would not be beneficial at all!

which retarded file format are you using which has zero termination at the end?

Null-terminated strings have the advantage of behaving extremely well with recursive functions, as constructing a suffix substring is just a matter of incrementing a pointer.
Think about how Lisp handles lists: they are chained cons cells, with the last cons having an empty (null) list as its CDR. This is how you work with recursion in Lisp.
The fact that you think null-terminated strings are useless proves you still have much to learn, youngling.
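e.g. the tail of a string is just str + 1, exactly like taking the CDR (a toy sketch):

[code]
#include <stddef.h>

/* '\0' plays the role of the empty list at the end of a cons chain */
size_t len(const char *str) {
    if (*str == '\0')
        return 0;
    return 1 + len(str + 1);  /* str + 1 is the suffix, i.e. the CDR */
}
[/code]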


Wrong. First of all, a size_t variable in a 64-bit environment is 64 bits, so we're talking about 7 extra bytes. Second of all, those bytes are nothing on a text, but when you have a plethora of max-10-character strings (quite frequent), on a 32-bit system it increases their size by about 27%, and on a 64-bit system it's about a 64% increase in memory usage.
Imagine storing an associative array that uses, as keys, 5-char strings with an 8-byte size variable. That's 13 bytes per key. With null-terminated strings, it's 6 bytes. That's about 117% more space used by the size+string solution.

The absolute state of C idiots in this board, everyone.

Go back to your javascript bullshit. Clearly you don't care about efficiency and interchangeability.

Do you know that EOF is an integer value, not an unsigned char, that most strings in a system are very small, that strlen() is rarely needed, and that you can still store a string's length in C, while not losing the advantages of the NULL termination?

You mean Unix.

the virgin FOReskin:
[code]
void map(char *str, size_t str_len, char (*func)(char)) {
    for (size_t i = 0; i < str_len; ++i) {
        str[i] = func(str[i]);
    }
}
[/code]

The Chad Elegant Recursion:
[code]
void map(char *str, char (*func)(char)) {
    if (*str) {
        *str = func(*str);
        map(str, func);
    }
}
[/code]

should be map(str + 1, func), LOL

This is why you don't use languages without proper tail recursion.

Remove yourself from premises.

Null-terminated strings suck. C weenies defend it because that's what C uses. Common Lisp strings are arrays, and they can be adjustable (grow and shrink) and have a fill pointer (anything less than it is the currently used part). This covers all the uses of dynamically sized strings, length-prefixed strings, and fixed-length strings. Lisp strings are arrays, so all arrays can have these properties.


Bullshit. You always need to know the length. If you really added up all the waste from C and UNIX "comparing characters to zero and adding one to pointers", it would be more efficient to have GC and dynamic typing and store files as arrays of strings. I'm not kidding. C malloc overhead is huge too, but on a Lisp machine, allocating a list only uses one word of memory per element. Allocating a 1D array only uses one header word to store the actual length of the array (which malloc has to do too, but it doesn't provide useful information to you) followed by the words for the array data. Lisp machine overhead is much smaller than C overhead, and the GC compacts to eliminate memory fragmentation.


That's because C sucks. malloc in C has more than 3 bytes of waste. JavaScript is a better language than C even though it sucks too.


You're going to read an entire 10 GB file into memory (not memory mapping) and stick a 0 byte on the end, but you think an 8 byte length is wasteful? I have no idea why anyone would do things like that.

> Subject: More On Compiler Jibberings...
>
> ...
> There's nothing wrong with C as it was originally
> designed,
> ...

bullshite.

Since when is it acceptable for a language to incorporate two entirely diverse concepts such as setf and cadr into the same operator (=), the sole semantic distinction being that if you mean cadr and not setf, you have to bracket your variable with the characters that are used to represent swearing in cartoons? Or do you have to do that if you mean setf, not cadr? Sigh.

Wouldn't hurt to have an error handling hook, real memory allocation (and garbage collection) routines, real data types with machine independent sizes (and string data types that don't barf if you have a NUL in them), reasonable equality testing for all types of variables without having to call some heinous library routine like strncmp, and... and... and... Sheesh.

I've always loved the "elevator controller" paradigm, because C is well suited to programming embedded controllers and not much else. Not that I'd knowingly risk my life in an elevator that was controlled by a program written in C, mind you...

And what can you say about a language which is largely used for processing strings (how much time does Unix spend comparing characters to zero and adding one to pointers?) but which has no string data type? Can't decide if an array is an aggregate or an address? Doesn't know if strings are constants or variables? Allows them as initializers sometimes but not others?

(I realize this does not really address the original topic, but who really cares. "There's nothing wrong with C as it was originally designed" is a dangerously positive sweeping statement to be found in a message posted to this list.)

dropped

citation needed
what about uint32_t?

...

lol, what a retarded syntax

I bet you are the type of larper that gets autistic about where parentheses are placed or using spaces vs tabs, wasting all our fucking time.

...

...

Who said anything about keeping it in RAM? You can process something without it being in RAM. Ever heard of streams? Ever heard of memory mapped files? Guess not lol.

And there's absolutely nothing wrong with that. Having been in both worlds, it's such a pleasure to write software in the more abstract languages.

not.
it's a lot less clear than `char -> char` for example, or even `Function`.

try to spell (in C) a type of a variable which is a function which takes a char and returns a function which returns a function which returns a function which returns a char, for example.
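for the record, here is one way to spell it (strictly, pointers to functions, since C functions can't return functions by value), first raw, then untangled with typedefs:

[code]
/* f takes a char and returns a (pointer to a) function returning a
   (pointer to a) function returning a (pointer to a) function returning char */
char (*(*(*(*f)(char))(void))(void))(void);

/* the same thing, readable */
typedef char (*step3)(void);   /* function returning char         */
typedef step3 (*step2)(void);  /* function returning the above    */
typedef step2 (*step1)(void);  /* function returning the above    */
typedef step1 (*fn)(char);     /* takes a char, returns the above */
fn f2;
[/code]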


if you read from a file, you already know the size, because files have a size. adding 1 terminator byte on top is useless and stupid.

Javascript is the C of high level languages

People ITT apparently don't know null terminated strings and reading from files have nothing to do with each other at all, that's my point.
Did you know SQL databases solved this ages ago with fixed size char fields, variable size char fields and text fields? Fuck, we could solve this the same way we solved numeric types of different sizes, with short strings, regular strings, long strings, etc.

That alone is a reason to make it better, but it's also clearer, the function takes one less argument, and it doesn't need to push a new variable onto the stack.


While it is true that ANSI C says nothing about tail call recursion, GCC does it.


Mr. Common Lisp[1] here apparently does not understand the value of a null terminator in a linear collection of elements (like a string), even though it is the principle upon which cons cell lists are constructed.

[1] yuck!


Look up any software, and see how long most strings are.
Though there is no reason for it not to be used, it's not recommended to hardcode the size of your length type. Also, uint32_t is not defined by ANSI standards older than C99.


How would you do it, Mr. Smart Man?

...

I bet you think a linked list is good too because it doesn't need an iterator variable to loop through.

People ITT don't know that null terminated strings are used in file formats all the time.

ftfy
although C is crap too, so… not a big difference after all.

Go write in javascript faggot, its where you belong.

That's the point, dingus

Made my day.


who told you so?
>what is registers?

and the most widely used example is … ?

so many newfag CS undergrads ITT smh


Older standards than C99 belong to the garbage bin.

Your beloved C does size promotion all the time. Fuck, getchar(), which is used to read a single character from a file, and which is about as wasteful as a function gets, performs promotions with every single call. And it's negligible.

Really, fuck off. You don't even want assembly, your autism should only allow you to use ASICs that waste zero cycles at all.

I hate C, I just like null termination.

SQL databases are much different from C storage. For starters, the length of a VARCHAR is stored only once, in the column definition. When the length is dynamic, we're talking about text, which will indeed make the extra 8 bytes literally nothing.

Well tough shit then…

Tough shit for you, attacking a strawman this whole time.

You sure you're not projecting m8?


How many years of programming do you have on your CV, again?

Linked lists waste all that space on pointers though, terrible cache properties, jumping around to different pages all the time. Big O time complexity has little to do with the real world when constant factors dominate at realistic sizes of N.

(checked)
Is an array of pointers that get reallocated all the time a better solution when the list is not changing often?

Well I've never heard of a situation where a linked list is the best solution, so here's your chance to educate me.

That's where you're wrong, kiddo. C99 is one of the worst standards to come, and everyone in the industry uses C95 exclusively.


If that array of pointers fits within a few pages then it's absolutely faster compared to chasing down pages wherever they get allocated.

...

Thx. Is this the best solution for small lists? Is there a special list type you'd recommend?

Have you ever benchmarked this shit on relatively modern computers?

You and I must have different definitions of "real world". When you need to constantly resize (queues, lists, stacks), using vectors is extremely expensive. When the size of your vector remains constant, or changes very little, using a vector is better.
You wouldn't cut a steak with a wood saw, or cut a plank with a steak knife. Two different tools serve two different purposes, and so do two different data structures.

Not an argument

That vulnerability mentioned in that blog post is developer error. The function takes in a string, but you pass in a byte array. Why would you expect it to work? If you pass in the wrong type of variable then of course it might not work right.

Any evidence?

That's just it, it's not extremely expensive. It has an expensive big O cost, but almost every benchmark will show that vectors are faster. This is because cache pages exist. The cache changes how all of this works.

Look, your CS 101 data structures class and its big O notation are not an accurate description of how caches work.

in C, char* is also used for byte arrays.
this is a programmer error, but it could be prevented if the design of the language and the stdlib was less shit.
programmers will always make some errors, but some of them can be prevented entirely as a class.

No. But performance always takes priority.
And I think we should listen to that anon's practical advice and not some stupid theory developed by java shitcoders at some university.

it doesn't.
amortized cost of adding an item is still O(1).
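the usual doubling trick; a sketch:

[code]
#include <stdlib.h>

typedef struct {
    int   *data;
    size_t len, cap;
} vec_t;

/* doubling the capacity means each element gets copied O(1) times on
   average, so N pushes cost O(N) total: amortized O(1) per push */
int vec_push(vec_t *v, int x) {
    if (v->len == v->cap) {
        size_t ncap = v->cap ? v->cap * 2 : 8;
        int *p = realloc(v->data, ncap * sizeof *p);
        if (!p) return -1;
        v->data = p;
        v->cap  = ncap;
    }
    v->data[v->len++] = x;
    return 0;
}
[/code]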

Adding an item to the middle of a vector is not amortized to O(1).

You can change how often a vector reallocates itself, but really, the default behavior is sufficient for most implementations.

neither is it in a linked list if you first need to find the place to insert; you'll need an O(n) traversal first.

Yours neither, loser. You literally made a bold statement without backing it, or providing proof. Your nodev ass can't even write a reverse polish calculator, LOL.

Again, you keep using all this fucking big O notation when talking about the speed of these data structures. The real world does not follow big O. Iterating over a vector that's all in one page is thousands of times faster than jumping between pages where linked list nodes are allocated, despite the same time complexity.
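If you don't believe it, here's the kind of micro-benchmark that shows it (a sketch; exact numbers vary by machine, and freshly malloc'd nodes are still friendlier to the cache than a fragmented long-running heap would be):

[code]
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct node { long v; struct node *next; };

#define N 10000000

int main(void) {
    long *arr = malloc(N * sizeof *arr);     /* one contiguous block */
    struct node *head = NULL;
    for (long i = 0; i < N; i++) {
        arr[i] = i;
        struct node *n = malloc(sizeof *n);  /* scattered allocations */
        n->v = i;
        n->next = head;
        head = n;
    }

    clock_t t0 = clock();
    long sum = 0;
    for (long i = 0; i < N; i++)             /* prefetcher-friendly */
        sum += arr[i];
    clock_t t1 = clock();
    for (struct node *n = head; n; n = n->next)  /* pointer chasing */
        sum += n->v;
    clock_t t2 = clock();

    printf("array: %fs  list: %fs  (sum %ld)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
    return 0;
}
[/code]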

Yup. This is why compiler warnings exist when you try to do implicit conversion, and this is why Apps Hungarian Notation is useful.


1st year CS theory that you ought to know if you want to be taken seriously here.


Those are use cases where vectors are indeed better.


Do you think data structures stop existing outside of RAM?


Filesystems make extensive use of linked data structures.


In the middle, or anywhere besides the end. Dynamic vectors can be used somewhat effectively as stacks because of that, but that's about it.


L M A O
M M
A A
O O

I can even write an infix calculator without any problem.
I actually wrote a compiler for a simple language, and a lot of other shit too. Fix your detector.

You're still claiming shit you've never done, and don't provide proof.
>>>/reddit/

Look here, retard. Iterating over a list and a vector have the same big O cost. In the case of an actual list, though, you will be chasing down pointers in different pages. Big O does not model this cost at all. If you knew more about CS theory than an undergrad simpleton you would understand this.

oh shit watch out there's a troll in here.

Insert a new value at the head of a 10-million-record vector.
Now do it at the head of a 10-million-record linked list.
Come back and tell everyone how it went.

When you make claims based on your invalid mental model of the modern computing hardware, of course you need to prove your bullshit to be taken seriously.

Lol, are you a brainlet or what?

For different reasons altogether.
We are talking about in-memory data structures.

...

if you need to insert at the head, you use a deque and not a vector.
for a deque, this is not a problem at all, and it will be faster than a linked list (amortized).

don't worry I got him right here:

forgot pic
