Meltdown and Spectre

Eli Anderson

I am using an old Intel C2D CPU and the recent Meltdown and Spectre patches shipped by Windows crippled the fuck out of it. If I had to guess, I would say I lost between 20% and 30% on average. Some software is now out of my reach, and video recording/streaming is pretty much impossible.

Is there a CPU that isn't affected by this shit? Did Intel fix the flaws in recently designed CPUs?

Attached: meltdone-and-spectre.jpg (138.57 KB, 1191x855)

Other urls found in this thread:

libreboot.org
github.com/speed47/spectre-meltdown-checker
rentry.co/7o6ey
warosu.org/g/thread/S67942994
stedolan.net/research/mov.pdf

Thomas Evans

libreboot.org

Xavier Stewart

What computer model?

Ryan Murphy

Ice Lake is supposed to have in-silicon mitigations, whatever the fuck that means. Ryzen hasn't been hit as hard. Every architecture has been affected to some degree by this. The only safe CPUs have no speculative execution. It sounds like a no-brainer to use one of these, but they're either too old or slow as shit, like the Raspberry Pi. You can disable mitigations in Linux with a kernel boot option set in GRUB. I wouldn't do it, but if you don't care about security, go for it.
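Before turning anything off it helps to see what the kernel thinks is mitigated. A minimal sketch, assuming a Linux kernel new enough to expose the /sys/devices/system/cpu/vulnerabilities directory (4.15 or later, as far as I know); the function name is made up for illustration. On newer kernels the GRUB switch itself is the mitigations=off boot parameter, but check your own kernel's documentation before relying on that.

```python
from pathlib import Path

# Standard sysfs location on recent Linux kernels; pass another path to test.
SYSFS_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(base=SYSFS_DIR):
    """Return {vulnerability_name: status_line} for every file under base."""
    base = Path(base)
    if not base.is_dir():
        return {}  # older kernel or not Linux: nothing to report
    status = {}
    for entry in sorted(base.iterdir()):
        if entry.is_file():
            status[entry.name] = entry.read_text().strip()
    return status


if __name__ == "__main__":
    for name, line in read_mitigation_status().items():
        print(f"{name:24} {line}")
```

Each file holds one line such as "Vulnerable", "Not affected" or "Mitigation: ...", so comparing the output before and after a boot option change shows what actually got switched off.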

Michael Mitchell

patches
I wouldn't bother, what's the point choking your C2D to death for muh safety - just disable HT if you haven't

Attached: good-for-you.jpg (62.61 KB, 526x442)

Sebastian Lee

what is this shit?
I'm using an Intel E7500
if that's your question
So what you're saying is that there is no "patched" CPU yet?

Angel Thomas

what is this shit?
lurk moar, or learn to searx
"patched" cpu
How the fuck are they supposed to "patch" a piece of hardware remotely? If you are talking about microcode, most modern x86 cpus are "patched".

Alternatively, use old shit cpus such as allwinner A20 with open source hardware

Jack Sullivan

How the fuck are they supposed to "patch" a piece of hardware remotely?
Who said anything about remotely? I was asking if the newly designed CPUs were fixed.

Cameron Hall

There is no fix. They got caught ignoring security in favor of speed for nearly two decades.

Jonathan Cooper

so what you're saying is that there is no "patched" cpu yet?

In software, yes. In hardware, no, at least not in CPUs with speculative execution. I'm also skeptical of Intel's fix. There's another hardware problem in the form of rowhammer and rambleed. I'm not sure how you fix this beyond using older ram.

Jack Richardson

Since CPUs now prefetch data and only check later whether you have the credentials to look at it, and it all happens on the hardware side, no patch will help you.
I think the affected CPUs are only the ones from 2007ish and later.

Brayden Nguyen

even playing HD videos became laggy

Colton James

C2D were good enough, now we all have to upgrade because reasons

Attached: your-betters.jpg (65.79 KB, 848x424)

Julian Gonzalez

tfw Pentium D

Attached: 1282170980270.jpg (48.05 KB, 308x547)

Jordan Cook

Luckily that doesn't fucking matter because process isolation never worked anyway (due to UNIX braindamage, etc.). Just run everything as a single user, or fuck off and use one machine per process.

Bentley Mitchell

what is this shit?
i'm using an Intel E7500
if thats your question
No, you nigger, I asked what kind of computer you own, specifically so I could lookup the Coreboot or Libreboot compatibility. These softwares replace the Intel firmware almost entirely and mitigate a lot of these vulnerabilities. Also removes built in back doors.

Logan Sullivan

Can you disable speculative execution on a librebooted/corebooted computer? I thought libre/coreboot only applied to the BIOS and not wherever the fuck the CPU stores microcode.
The "patch" would be CPU manufacturers letting you disable speculative execution, which will never happen. Speculative execution is the processor, via some function, executing the differing paths that code could take. So say you told the compiler
a = 2
b = 3
1 + a + b
Without speculative execution the processor would do this one thing every time, which is deterministic
1 + 2 + 3
With speculative execution it executes all of these combinations every time, barring some form of cache based on previous execution, which just adds more complexity and wastes processing power, time and energy
1 + 2 + 3
1 + 3 + 2
2 + 3 + 1
3 + 2 + 1
3 + 1 + 2
This is in order to guess what the programmer actually wanted to do; it then feeds the result that makes the most sense back to wherever the data was to be stored. It does this shit for every instruction, and it is a huge speed decrease in exchange for devs being able to write shit code. Meltdown and Spectre abuse the results of speculative execution to do what they do.

Older CPU manufacturers fucked up and disabled it on older CPUs, since it is a backdoor falsely sold in the name of speed. Except that disabling it ironically improves your performance, since compilers are better than two decades ago and can path the instructions for the programmer without needing to guess at the processor level in case the compiler/programmer made an error. See ARM or PowerPC speculative vs. non-speculative benchmarks for proof.

Ayden Brooks

github.com/speed47/spectre-meltdown-checker

rentry.co/7o6ey

Kernel is Linux 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1+deb9u2 (2019-05-13) x86_64
CPU is Intel(R) Core(TM)2 CPU P8600 @ 2.40GHz

SUMMARY:
CVE-2017-5753:OK
CVE-2017-5715:OK
CVE-2017-5754:OK
CVE-2018-3640:KO
CVE-2018-3639:KO
CVE-2018-3615:OK
CVE-2018-3620:OK
CVE-2018-3646:OK
CVE-2018-12126:KO
CVE-2018-12130:KO
CVE-2018-12127:KO
CVE-2019-11091:KO

CVE-2017-5753 [bounds check bypass] aka 'Spectre Variant 1'
CVE-2017-5715 [branch target injection] aka 'Spectre Variant 2'
CVE-2017-5754 [rogue data cache load] aka 'Meltdown' aka 'Variant 3'
CVE-2018-3640 [rogue system register read] aka 'Variant 3a'
CVE-2018-3639 [speculative store bypass] aka 'Variant 4'
CVE-2018-3615 [L1 terminal fault] aka 'Foreshadow (SGX)'
CVE-2018-3620 [L1 terminal fault] aka 'Foreshadow-NG (OS)'
CVE-2018-3646 [L1 terminal fault] aka 'Foreshadow-NG (VMM)'
CVE-2018-12126 [microarchitectural store buffer data sampling (MSBDS)] aka 'Fallout'
CVE-2018-12130 [microarchitectural fill buffer data sampling (MFBDS)] aka 'ZombieLoad'
CVE-2018-12127 [microarchitectural load port data sampling (MLPDS)] aka 'RIDL'
CVE-2019-11091 [microarchitectural data sampling uncacheable memory (MDSUM)] aka 'RIDL'
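If you want to keep track of which ones are still open after a kernel or microcode update, the SUMMARY block is easy to parse. A rough sketch that assumes the "CVE-id:OK" / "CVE-id:KO" layout exactly as pasted above; that layout is taken from this paste, not from the tool's documentation, and parse_summary is a made-up helper name.

```python
import re

# One SUMMARY line looks like "CVE-2018-3639:KO" in the paste above.
SUMMARY_RE = re.compile(r"^(CVE-\d{4}-\d+):(OK|KO)$")


def parse_summary(text):
    """Map each CVE id to True if the checker reported it as KO (still vulnerable)."""
    result = {}
    for line in text.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if match:
            result[match.group(1)] = (match.group(2) == "KO")
    return result


# A few lines copied from the report above, used as sample input.
sample = """\
CVE-2017-5753:OK
CVE-2017-5754:OK
CVE-2018-3639:KO
CVE-2018-12130:KO
"""

still_open = sorted(cve for cve, ko in parse_summary(sample).items() if ko)
print(still_open)
```

Diffing still_open between two runs tells you whether an update actually closed anything on that machine.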

Nicholas Ramirez

Imagine being this ignorant lol. What speculative execution does is let you execute more instructions at once than you could otherwise, even with the most perfect x86 code.
Without speculative execution you could only fill the pipeline up to a branch. With speculative execution the pipeline is filled with the instructions in the most probable branch, and if that wasn't the path that was actually taken, then the results from the extra useless instructions that were executed are ignored.
The security problems come from the fact that whether those branches are guessed correctly depends on what code was executed previously, which allows programs to deduce information they shouldn't have access to about the other programs running in the system (including kernel structures).
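A toy model of that last point, in case it helps. It assumes a single 2-bit saturating counter shared between a "victim" and an "attacker", which is a textbook predictor and not how any particular CPU is built. It also only covers the shared predictor state described here; the published Spectre attacks additionally rely on the cache footprint left behind by the discarded instructions, which this sketch does not model. All names are invented for illustration.

```python
class TwoBitCounter:
    """Saturating 2-bit counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.state = 1  # start at "weakly not taken"

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def victim(counter, secret_bit, rounds=4):
    # The victim's branch direction depends on a secret value.
    for _ in range(rounds):
        counter.update(taken=bool(secret_bit))


def attacker_probe(counter):
    # The attacker runs its own always-taken branch that shares the counter.
    # A real attacker would infer a misprediction from timing; the model
    # reads the prediction directly to stay simple.
    mispredicted = not counter.predict()
    counter.update(taken=True)
    return 0 if mispredicted else 1  # the attacker's guess of the secret bit


def leak(secret_bits):
    recovered = []
    for bit in secret_bits:
        counter = TwoBitCounter()  # hypothetical reset between trials
        victim(counter, bit)
        recovered.append(attacker_probe(counter))
    return recovered


print(leak([1, 0, 1, 1, 0]))
```

The attacker never reads the secret; it only observes whether its own branch was predicted correctly, and that alone is enough to recover every bit in this model.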

Julian Parker

What speculative execution does is let you execute more instructions at once than you could otherwise
You are wrong, that is what ALUs are for. What you mean to say is that a scheduler on the die creates latency in order to hide the complexity of branch prediction overhead. There are only so many logic units or FPUs that can take decoded instructions at a time. There's no getting around that physical limit in software.

Why waste die space on more complexity and heat with speculative execution and its necessary components when you could add more ALUs to execute more at once?
you could only fill the pipeline up to a branch
Are you cnile? Why would you ever want control taken away from you? Let the compiler/programmer/you deal with scheduling instructions in parallel, not the processor. The processor processes; it is not meant to schedule. If you want a hardware scheduler, such as modern speculative execution schedulers, instead of doing it at the software level, make it an add-on card for cnile programmers such as yourself. Let the competent engineers program without the handholding.

Landon Brown

Most speculative explanation of speculative execution I've ever seen. The penalty for misprediction is getting called a nigger, nigger.
You are wrong, that is what ALU's are for.
No, you absolute doublenigger. He was right. YOU are wrong.

John Cooper

You are wrong, that is what ALU's are for.
While you are right that some architectures (for instance Itanium, the Cell processor or GPUs) rely on very long instruction words or wide SIMD instructions that tell the processor to do multiple things at once, there is a fundamental limitation with those. The compiler can't know at compile time what the result of a conditional branch will be, because if it knew, the branch would be unnecessary in the first place. At most it can give you the percentage of times the branch goes one way or the other. But that is always going to be inferior to taking that a-priori percentage and then adding the information that can be gathered at run time about the state of the registers, and even of memory locations the CPU has seen accessed just a few instructions before the branch. That lets you push the certainty toward 0% or 100% as far as you want by making the silicon more complex so it takes in more data for that probability calculation. Now, I'll give it to you that, as far as I know, x64 doesn't give the programmer/compiler a way to hint to the scheduler which path he/it thinks is more likely to be taken, but it's not clear how much performance would be gained by that, and hinting alone is not going to be faster than runtime speculative execution because of what I explained above.
What you mean to say is that a schedular on the die creates latency in order to hide the complexity of branch prediction overhead.
There is supposed to be no prediction overhead. It's all supposed to be done in parallel at sub-clock speeds, otherwise, like you say, what's the point. You mentioned ARM works faster with speculative execution disabled. I can see that being true if the underlying silicon doesn't support speculative execution (most ARM hardware doesn't, if not all of it) and whoever designed the CPU decided to implement speculative execution in microcode as a marketing gimmick, in such a way that the speculation takes longer than just executing code until a branch and calling it a day. But that's not how it's supposed to work. Now, I can see what you mean about badly written code being helped by it if the code is riddled with conditional branches, and being slowed down if it's mostly linear arithmetical code. But if you write in a high-level language you have very little control over how many branches are produced. C compilers have a very hard time optimizing code to take advantage of obscure SIMD instructions. For instance, go to warosu.org/g/thread/S67942994 and try to write a fast implementation of that program using any high-level language of your choice. I assure you it won't be faster than using C (or ASM) with Intel intrinsics sprinkled in (despite the name, they work on AMD CPUs too). But you have to include them explicitly; no compiler for any language known to man can figure out how to use them without being told explicitly. Compilers are actually dumb as fuck.
There's only so many logic units or FPU's that can take decoded instructions at a time. There's no getting around that physical limit in software.
Yes... So? The point of speculative execution is to use all of them, all the time (ideally) rather than only fill them up with the instructions you know need to be executed for sure.
Why waste die space on more complexity and heat with speculative execution and its neccessary components when you could add more ALU's to execute more at once?
Because no matter how many you add, without speculative execution you can only use as many of them as you have instructions left before your program has to make a decision. The moment you fill them up with code after a branch you don't know the result of, you're doing speculative execution. Now, we can argue about how complex the silicon should be that decides to what extent each path after the branch gets loaded into the instruction decoding units. But even if you split it 50/50 it's still considered speculative execution; it's a type of speculative execution called "eager execution", actually. How much silicon to use for instruction decoding and for scheduling, I don't know, that's a deeper aspect of CPU design, but I can imagine a scheduler taking the space of, say, 2 ALUs being able to save you from misfilling 5 ALUs, for example. Or maybe not. But you haven't presented any evidence for that except the ARM stuff, which is not valid for the reasons I stated earlier (its speculative execution probably not having enough dedicated silicon).

Jose Jenkins

Are you cnile?
Do you have any evidence Lisp/Rust/whatever code runs faster than C on that ARM CPU with speculative execution disabled? Yeah, didn't think so. C, Fortran or assembly code is always going to be faster than higher level code, no matter how your CPU works, just because compilers don't have general AI yet.
Why would you ever want control taken away from you?
You know what really gives you total control over your computations? Pen and paper. No more jews you have to pay for electricity. You can change your CPU's architecture at any time instantly. You can know the state of every point of the circuit at any time. You don't need a decapping setup to know there aren't any backdoors. Why are you still using jewish silicon-based CPUs?
Let the compiler/programmer/you deal with scheduling instructions in parrallel and not the processor.
Again, I can't pop my head in there and predict in real time which instructions are going to be executed each time. I leave that to the computer, which can do it faster.
The proccessor processes, it is not meant to schedule.
Says who, you? With that logic, let's get the memory fetching circuitry out of the CPU too, since the processor's job is to process the data, not to fetch it from memory. Let's get the instruction decoding out of it too and turn it into a fixed pipeline device while we're at it, since the processor's job is to process, not to fetch and execute instructions which might or might not process anything.
If you want a hardware schedular instead of doing it at the software level
t. I'll just pull out shit from my ass
If you want a hardware schedular instead of doing it at the software level such as modern speculative execution schedulars make it a addon card for cnile programmers such as yourself.
Find me ONE program which runs faster on an ARM processor without speculative execution when written in the language of your choice vs C, Fortran or assembly.
Let the competent engineers program without the handholding.
Oh, you mean the magic compiler that's somehow supposed to turn your high level program into assembly code that doesn't ever have conditional branches? The one that turned Itanium into the fastest processor ever known to man when used in combination with Lisp?

Liam Clark

getting this triggered over being called a cnile
hurr durr C is so low level and fast

Adrian Harris

Is that the smartass way of admitting you were talking out of your ass?

Carter Jenkins

The compiler can't know at compile time what the result of a conditional branch will be
Full stop. Your whole argument depends on shit software, i.e. just about all compilers, generating shit code, i.e. conditional branching. Don't have conditional branches at the assembly level; let the compiler decide, along with the programmer using that compiler. Don't take control away from the user.
But if you write in a high level language you have very little control over how many branches are produced. C compilers have a very hard time optimizing code to take advantage of obscure SIMD instructions
C is a shit language, as are Rust and Lisp, for these purposes. They have no way of accounting for branching, nor for extreme parallelization, at the software level. No language, as far as I am aware, has ever attempted such a thing, since neither hardware with speculative execution nor CISC-based microarchitectures support fully controlling the instruction pipeline from a programmer's standpoint. It's always a hint which the processor does not have to obey, which takes control away from you.
The moment you fill them up with code after a branch you don't know the result of
Don't branch, or use lazy speculation to simulate branching but yet have the ability to remove the entire dedicated hardware to speculation.
Do you have any evidence Lisp/Rust/whatever code runs faster than C
Did you forget the title of the thread? This is about software mitigations against a hardware/processor-level bug. How do you know that said ARM CPU with the equivalent of Meltdown/Spectre mitigations has the same or better security than with speculative execution disabled? The point of this is security. If it is faster to disable speculative execution and give the programmer more control, while remaining secure, then do it. If it's faster to enable speculative execution and mitigate its inherent downsides with constant jumps, then decide.
compiler that's somehow supposed to turn your high level program into assembly code that doesn't ever have conditional branches?
As mentioned above, all modern software including the compilers are shit.
Again, I can't pop my head in there and predict what instruction are going to be executed each time in real time. I leave that to the computer, which can do it faster.
You are a normalfaggot. You are giving the proccessor manufacturer control over something that you could control and possibly do better. Let the compiler, that you can edit, have control over what executes when in assembly. Which means by extension let the assembly programmers have control over the execution pipeline.

let's get the memory fetching circuitry out of the CPU too, since the processor's job is to process the data, not to fetch it from memory. Let's get the instruction decoding out of it too and turn it into a fixed pipeline device while we're at it, since the processor's job is to process, not to fetch and execute instructions which might or might not process anything.
Those are all great ideas. Why haven't Intel and other processor companies implemented such a thing? Oh right, unix brain damage.
The one that turned Itanium
Itanium has speculative execution, you dumbass. On top of the fact that the very, very early Itaniums are heavily undocumented by Intel, and have no compilers built for them that are free of unix braindamage and cnilety.
Find me ONE program which runs faster on an ARM processor without speculative execution when written in the language of your choice vs C, Fortran or assembly.
I don't have access to ARM hardware on which I can compile my very own OS, one build that doesn't use speculation and one that does use it but with mitigations, to show you proof. Phone OSes like Android and Google's Fuchsia do not include said mitigations vs. no speculation, so I can't use those to prove it. You are asking for a double negative, in that ARM is a shit architecture as it is, without much FOSS hardware control and documentation, and you want me to do or find intensive testing on such a thing. I can tell you right now that doing no branching and executing the equivalent of
1 + 2 + 3
Is way faster than speculatively executing and doing all of
1 + 2 + 3
1 + 3 + 2
2 + 3 + 1
3 + 2 + 1
3 + 1 + 2
As a very abstract example.
No, that was just a random poster calling you a very unwise piece of technological scrapings from somewhere.

Why are any of you arguing for speculative execution? It reduces your control over your programs and processor while being a security vulnerability, all at the same time. On top of that, there is the speed impact of mitigating it vs. not having it at all.

Zachary Gutierrez

Yes... So? The point of speculative execution is to use all of them, all the time (ideally) rather than only fill them up with the instructions you know need to be executed for sure.
Using all of them all the time for things not relevant to the data you want processed creates heat, wastes energy, and is inefficient.

Blake Powell

let's get the memory fetching circuitry out of the CPU too, since the processor's job is to process the data, not to fetch it from memory. Let's get the instruction decoding out of it too and turn it into a fixed pipeline device while we're at it, since the processor's job is to process, not to fetch and execute instructions which might or might not process anything.

<Those are all great ideas. Why haven't intelaviv and other proccessor companies implemented such a thing? Oh right, unix brain damage.
Actually, I just comprehended what you wrote. Getting the memory fetching circuitry out and turning the processor into a fixed pipeline device are great ideas, but the processor should still execute the instructions, because by definition it processes the data by taking it and executing some fixed function on it, then returning a result that can be fetched by some other device like the memory fetching circuitry.

Gabriel Mitchell

When you don't know what you're talking about, you should just stop talking.

John Evans

Let's begin by establishing a reasonable definition of "branching". A computer is constituted by memory cells, each of which can be in any of a finite number of states, and a set of rules that determines what the next state of these cells will be, depending on the previous states of these cells (let's ignore IO for simplicity).
These cells comprise the registers (including the Program Counter), RAM, and any permanent storage present on the machine (internal or external). Since in a Von Neumann architecture both programs and data are stored on the same memory, we can't know whether a certain set of cells are program code or data until they are effectively executed by the CPU (or we trace the execution ourselves on pen and paper until that point). This is why we'll be forced to consider a "branch" not an instruction on memory, but some transitions from one vector of states of the cells to another vector of states which happen when the PC is pointing at a set of cells with certain particular states (instructions).
Sometimes, the transition from one vector of states (which I'll call "machine state" from now on, to differentiate it from individual cell states) to another is a branch, and other times it isn't. When is it a branch? I would say it is a branch when the states of the set of cells pointed at by the PC immediately before transition_0 determine which cells will determine the mathematical function used to obtain the machine state after transition_1 from the machine state before transition_1. How do we know if this is the case? Well, we ask ourselves the question: if the states of the cells pointed at by the PC had been different at the moment immediately before transition_0, would this mathematical function be a different one? If the answer is affirmative, then transition_0 can be considered a branch, and the cells which were pointed at by the PC immediately before transition_0 can be said to contain an instruction which causes a branch.

Josiah Stewart

Why did I go through all this trouble to define a branch? Because not all branches are the result of obvious instruction combinations like cmp and jne. For example, in x86 assembly, mov can cause different code paths to be executed depending on various conditions, which means it can branch. In fact, you can write a whole program with mov instructions (stedolan.net/research/mov.pdf). What's more, you can write any program using mov instructions. The title of that paper is "mov is Turing-complete". This isn't a coincidence.
Being able to emulate a Turing machine is a necessary and sufficient condition for being a Turing-complete architecture. Sure, you can't have infinite memory on an actual physical computer, but you can emulate it for an arbitrarily long amount of time by swapping external storage media according to the movements of the emulated Turing machine's head.
The fact is that a Turing machine exhibits branching behavior, as defined above, at least (but not exclusively) because when the head reads a cell, it can move left or right depending on the cell's state (symbol in Turing's terminology), which affects the mathematical function that is applied to the machine state in the next iteration of the read-write-move cycle. There is no mapping between machine states from a device with no branching to a Turing machine, because on a non-branching machine you have no way of changing the mathematical function that is applied to the machine state on each iteration, unlike on a Turing-machine or a Turing-complete machine. Thus your proposed fixed-pipeline device is not Turing-complete.
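To make the mov point concrete, here is a small sketch of how plain loads and stores can act as a branch under the definition above. It mirrors the comparison trick as I understand it from the paper (store 0, store 1, load back), with a Python list standing in for memory; the function names are mine, not the paper's.

```python
def equal_via_memory(mem, a, b):
    """Compare two addresses using only stores and a load (no if, no ==)."""
    mem[a] = 0     # mov [a], 0
    mem[b] = 1     # mov [b], 1   (overwrites the 0 when a and b coincide)
    return mem[a]  # mov r, [a]   -> 1 if a == b, otherwise 0


def select_via_table(cond, if_zero, if_one):
    """Pick a value by indexing a two-entry table instead of branching."""
    table = [if_zero, if_one]
    return table[cond]


mem = [0] * 8  # scratch memory for the comparison
path = select_via_table(equal_via_memory(mem, 3, 3), "else-path", "then-path")
print(path)
```

There is no jump anywhere, yet which cell gets read next depends on data, so the state of earlier cells decides which function is applied afterwards. That is exactly the kind of transition the definition counts as a branch.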
Is the above definition of branching the relevant one to use in the context of speculative execution? I'd argue so, because without speculative execution any kind of branching behavior as defined (such as the mov instructions from the paper I mentioned, which can be used to write any possible program) will prevent the processor's pipeline from being fully filled whenever the PC is closer to the branching instruction than the length of the pipeline.
Sure, as we discussed before, you could give a way for the compiler or the programmer to explicitly tell the processor with which instructions to fill the pipeline. But as I mentioned before, this is always going to be less efficient than the hardware deciding so at runtime. Why? Because if you implement it as a hard and fast rule the best you can do is tell the CPU to follow the path which is most often followed in the general program execution. This will make a percentage of the instructions on the pipeline useless thus reducing performance.

Charles Stewart

For instance, if the most likely path is followed 60% of the time, 40% of the time you're going to have to discard a percentage of the pipeline. On the other hand, if you allow the processor more freedom, the prediction could become more accurate for each time the program goes through this branch, reducing the number of mispredictions.
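A quick simulation of that 60/40 case, as a sketch only. It compares a fixed compile-time hint against one 2-bit saturating counter, which is a textbook predictor and far simpler than what real CPUs ship. The outcome stream is hypothetical and deliberately phased (60 taken, then 40 not taken); on a stream with no pattern at all, a counter like this would not beat the static hint.

```python
def static_hint_misses(outcomes, hint):
    """Mispredictions if the branch is always predicted as `hint`."""
    return sum(1 for taken in outcomes if taken != hint)


def two_bit_misses(outcomes, state=1):
    """Mispredictions of one 2-bit saturating counter (0-1: not taken, 2-3: taken)."""
    misses = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        # Move the counter toward the outcome that actually happened.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return misses


# Hypothetical branch: taken for 60 iterations, then not taken for 40.
outcomes = [True] * 60 + [False] * 40

static = static_hint_misses(outcomes, hint=True)
dynamic = two_bit_misses(outcomes)
print(static, dynamic)  # prints "40 3"
```

The static hint is wrong on every one of the 40 not-taken iterations, while the counter only pays once at the start and twice at the phase change.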
You could choose to make this path hinting more conditional based on more criteria, for example you could tell the CPU what registers to look at and what values would result in what path, but there is a limit to how programmable you could make it, because remember that the CPU has to process all these hints in a single clock cycle, so you would likely benefit from including most of the intelligence in the silicon itself, because you can't include complex logic in a blob of binary and expect it to be able to be executed in a quarter of a nanosecond, unlike you can with silicon. And if you do that you still have to be careful about security when designing the chip. Not to mention all this hinting would increase the size of your code in memory, and likely require you to write assembly while investing 3 or 4 times the effort of writing x86 without all the hinting, because compilers are dumb and wouldn't be able to figure out what criteria to use by themselves.
And that's why I mentioned Itanium, because it's a famous case of the hardware designers putting more responsibility on the compiler and the compiler being unable to fulfill those responsibilities. Even with current x86 which is helped a lot by the silicon transparently, it's very hard for the compilers to output decent code. It's a lie that the compilers are able to output faster code than hand-written assembly, as mentioned in the example provided in my previous posts.
Your whole arguement depends on shit software, i.e just about all compilers, generating shit code
Nah, not my whole argument. There are other arguments as I explained above. But yeah, compilers output shit code. And if no compiler ever managed to output decent code in the history of computers, what makes you believe it is possible to write such a compiler that would match a skilled human in code quality?
Don't have conditional branches at the assembly level, let the compiler decide and the programmer using that compiler to compile.
You need conditional branches to have a Turing-complete computer, as I explained.
C is a shit language as is rust and lisp for these purposes
Do you have any evidence that a "good" language can be created when no such language for the purposes relevant here has ever been created?
How do you know that said ARM cpu with the equivalent of meltdown/specter mitigations has the same or better security then with speculative execution disabled?
If it has it disabled then it shouldn't need mitigations, because spectre and meltdown are caused by speculative execution.
So yeah, it should have better security without any mitigations than x86 silicon with mitigations.

Jackson Myers

The point of this is security. If it is faster to disable speculative execution and give the programmer more control, while remaining secure, then do it. If its faster to enable speculative execution and mitigate its inherent downsides with constant jumps, then decide.
Don't move the goalposts now. You said speculative execution creates additional latency and makes good code run slower than it would run without it. I'm disputing that. I agree that not having speculative execution is more secure.
As mentioned above, all modern software including the compilers are shit.
Compilers have never outputted code matching human written assembly, not now nor ever. That leads to the conclusion that it is not possible, if you disagree please give evidence. And even with perfect code, user programmed speculation would probably be slower than speculation that's mainly done in fixed function silicon, as I've explained.
You are a normalfaggot. You are giving the proccessor manufacturer control over something that you could control and possibly do better. Let the compiler, that you can edit, have control over what executes when in assembly. Which means by extension let the assembly programmers have control over the execution pipeline.
Nonsense. A programmable CPU running at 3 GHz is always going to be slower than an ASIC or FPGA with the same number of elements running at 3 GHz. If you want performance you need to give up some control. If you want total control you can mine the silicon, purify it, make transistors and build a CPU yourself, but it's going to be slower unless you spend millions of dollars on it. Same with programming your own FPGA. Same with emulating a CPU on top of another CPU. Same with programmable vs. fixed-function branch prediction.
Those are all great ideas. Why haven't intelaviv and other proccessor companies implemented such a thing? Oh right, unix brain damage.
Because PCB traces have higher inductance than on-die connections, which reduces signal integrity, and they also have higher propagation delays. Not to mention that cutting and packaging multiple dies costs more than cutting and packaging a single die.

Michael Butler

I don't have access to ARM hardware that I can compile my very own OS that doesn't use speculation
It doesn't have to be an OS. It can be application code too.
and that does use it but with mitigations to show you proof
Your argument up until this point was that speculation slows down (good) code, period; you didn't mention you were comparing it with CPUs running mitigations. In any case, if the code is mainly just logic, it shouldn't make a difference, because mitigations only affect performance when the code does system calls. Not to mention CPU designs vary in the extent to which they're affected, which points to the likelihood that it would be possible to build a secure CPU with speculative execution that doesn't need software or microcode mitigations, albeit admittedly with a higher transistor count.

Eli Collins

I can tell you right now that doing no branching and executing the equivalent of
1 + 2 + 3
Is way faster than speculatively executing and doing all of
1 + 2 + 3
1 + 3 + 2
2 + 3 + 1
3 + 2 + 1
3 + 1 + 2
You have a fundamental misunderstanding of how speculative execution works. Speculative execution doesn't execute multiple combinations of operands for a single addition instruction if it is not productive or there are more gainful uses for the ALUs to be had, which is almost always. It executes instructions that ARE part of the code, except it's not known if they need to be executed, but the idea is that they'll be part of the path your program is in fact going to take more often than not, and the CPU will guess the right values for the register operands more often than not.
You could have successive mispredictions for the same register of the same instruction, but the concept you need to understand is that without speculation that ALU wouldn't have been doing useful work instead of the mispredicted operation, it would've remained empty through those clock cycles. And if you have operations that can be performed in parallel without needing to predict operands or branches, like you would on a SIMD ISA, the CPU will prioritize those and put them in the ALUs to be calculated first before performing computations for which the usefulness is less certain.
And while it is true that an individual CPU with speculative execution will use more power than one without, the power per computation is actually lower than it would be without it, because to reach the same performance a single CPU with speculative execution achieves, you would need multiple CPUs without it, and you would need to duplicate certain circuitry that doesn't need to be increased as you increase the size of the pipeline for an individual CPU.
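The "guess right more often than not" part can be illustrated with a toy model. This is a minimal sketch of a textbook two-bit saturating-counter branch predictor, not a description of what any particular Intel, AMD or ARM core does; the function name and the two test patterns are made up for the illustration.

```python
def simulate_two_bit_predictor(outcomes, initial_state=2):
    """Return the fraction of branch outcomes predicted correctly.

    The predictor is a single counter with four states:
    states 0-1 predict "not taken", states 2-3 predict "taken".
    """
    state = initial_state
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Move the counter toward the observed outcome, saturating at 0 and 3.
        if taken:
            state = min(3, state + 1)
        else:
            state = max(0, state - 1)
    return correct / len(outcomes)


# A loop branch: taken 9 times, then falls through once, repeated 100 times.
loop_branch = ([True] * 9 + [False]) * 100
# A branch that flips every time, which this simple scheme handles badly.
alternating_branch = [True, False] * 500

loop_accuracy = simulate_two_bit_predictor(loop_branch)
alternating_accuracy = simulate_two_bit_predictor(alternating_branch)
print(f"loop branch accuracy:        {loop_accuracy:.2f}")
print(f"alternating branch accuracy: {alternating_accuracy:.2f}")
```

In this model the loop branch is only mispredicted on the exit iteration, while the alternating branch is wrong half the time. Real predictors keep history to handle patterns like the second one, but the idea is the same: work is started on the likely path, and it is only thrown away on a miss.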

Blake Cooper

This is a pleb question, but how do you access libreboot BIOS like a normal BIOS on startup like pressing F2 or F10?

Liam Peterson

Power efficiency is a very subjective thing. Any modern DSP is many times more power efficient than any modern amd64 CPU. They're much more power efficient specifically because they leave out all the OOO machinery in favor of encoding N-issue directly into the ISA and running with what's basically an exposed (albeit short) pipeline. This lets you do the scheduling ahead of time rather than in real-time, which on the kind of code that runs on DSPs is going to yield much better results. Of course their benefit here is that they branch very predictably so it's easy to do an ahead-of-time strategy for branch-always-taken and just get away with it... and also that no one really cares that much about maintaining 100% ISA compatibility between the hardware generations.

Landon Jackson

The main purpose of DSPs is doing FFTs, which require lots of matrix multiplications, which in turn require very little branching. Same thing for GPUs. That's why these devices are better served by having SIMD ISAs allowing you to "do the scheduling ahead of time", so to speak. Because there's not much scheduling needed, just lots of arithmetic that can be done in parallel. But this doesn't apply to the logic-heavy code used for most things.
With respect to power consumption, sure, different chips are optimized for different things from the ground up, even having transistors with different characteristics.
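To show what "lots of arithmetic, very little branching" looks like, here is a rough sketch of a naive discrete Fourier transform written as a matrix-vector product. It is deliberately the slow O(n^2) form and not an actual FFT (which factors the same matrix to get O(n log n)); the function names and test signals are made up for the example.

```python
import cmath


def dft_matrix(n):
    """Build the n x n DFT matrix W[k][j] = exp(-2*pi*i*j*k/n)."""
    return [[cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n)]
            for k in range(n)]


def dft(signal):
    """Compute the DFT as a plain matrix-vector product."""
    n = len(signal)
    w = dft_matrix(n)
    # Every output bin is the same multiply-accumulate pattern; nothing in
    # the arithmetic depends on the data values.
    return [sum(w[k][j] * signal[j] for j in range(n)) for k in range(n)]


impulse = [1, 0, 0, 0]
constant = [1, 1, 1, 1]
print([round(abs(x), 6) for x in dft(impulse)])
print([round(abs(x), 6) for x in dft(constant)])
```

The only control flow is counted loops whose trip counts are known up front, which is why this kind of code maps well onto SIMD or statically scheduled hardware.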

Hudson Hernandez

tfw too brainlet to understand what this speculative execution business is
Forever webdev I guess.

Attached: wtf.jpg (104.01 KB, 464x495)

Brayden Cox

You don't. If you want to make changes, you get the libreboot or coreboot source and do it there. Be prepared to manually reflash with a Pomona clip if SHTF.

Logan Clark

They're not SIMD. By any definition they'd have to be MIMD, but explicitly rather than implicitly so. The mechanism is for all practical purposes the same, it's just that one's done with a compiler and the other's done with a ROB in silicon. When Intel calls something quad-issue it means it can issue 4 instructions in a clock cycle, ROB permitting. If a DSP is quad-issue it usually means that YOU can issue 4 instructions in a clock cycle. Other than the explicit encoding allowing one to do it ahead of time they're really no different.
DSPs are not just FFT machines either. Usually they run lots of tight-loop algorithms and heuristics to perform whatever task you need - otherwise who'd be stupid enough to buy a 200 MHz DSP when an 8 MHz DSP can do god knows how many channels' worth of 44.1 kHz FFT in real time. They run plenty of logic-heavy code, it's just that the state is usually set, then the algorithms run without changing it willy-nilly.
There's no hard and fast rule that says you can't have a general purpose integer processor that has an exposed pipeline to let you (try to) pull the same stunt outside of the DSP space. Dual-issue processors were not at all weird in the past. The EPIC architecture (Itanium) certainly gave it the old college try. It's hard to say if it was good or bad given the complete lack of market adoption, almost entirely because of poor x86 compatibility in its very x86 market.
In fact OOO started as a way of maintaining software compatibility - not to make faster computers that were more power efficient. The S360 M91 used it to make execution of your already written S360 programs faster, to save programmer time - not because they couldn't make a faster CPU otherwise. Most of the things that are sold as absolute truths today were basically just hacks, shortcuts and marketing wank a few decades ago.
Maybe I'm just biased from working on power optimization of a dual fxp+single integer pipeline DSP in the recent past.
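For anyone wondering what "doing it ahead of time" looks like, here is a toy sketch of a compiler-style scheduler that packs instructions into fixed-width issue bundles. It is a simplified model under loud assumptions: every instruction takes one cycle, each register is written only once, and only read-after-write dependencies are tracked. The instruction and register names are invented and do not come from any real DSP ISA or toolchain.

```python
def schedule_bundles(instructions, issue_width):
    """Greedily pack instructions into fixed-width issue bundles.

    Each instruction is (name, dest_register, source_registers). An
    instruction may only be issued in a bundle strictly after the bundles
    that produce its sources. Other hazards and multi-cycle latencies are
    ignored in this toy model.
    """
    bundles = []       # list of lists of instruction names
    ready_after = {}   # register -> index of the bundle that writes it
    for name, dest, sources in instructions:
        # Earliest bundle allowed by the data dependencies.
        earliest = 0
        for reg in sources:
            if reg in ready_after:
                earliest = max(earliest, ready_after[reg] + 1)
        # First bundle at or after that point with a free issue slot.
        slot = earliest
        while slot < len(bundles) and len(bundles[slot]) >= issue_width:
            slot += 1
        while len(bundles) <= slot:
            bundles.append([])
        bundles[slot].append(name)
        ready_after[dest] = slot
    return bundles


# Hypothetical program computing (a * b) + (c * d).
program = [
    ("load_a", "r1", []),
    ("load_b", "r2", []),
    ("load_c", "r3", []),
    ("load_d", "r4", []),
    ("mul_ab", "r5", ["r1", "r2"]),
    ("mul_cd", "r6", ["r3", "r4"]),
    ("add", "r7", ["r5", "r6"]),
]

for width in (1, 2, 4):
    bundles = schedule_bundles(program, width)
    print(f"issue width {width}: {len(bundles)} cycles -> {bundles}")
```

An out-of-order core finds the same kind of independent work at run time with its ROB; here the packing is decided once, before the program runs, and the hardware just issues whatever is in each bundle.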

Brody Collins

They're not SIMD. By any definition they'd have to be MIMD, but explicitly rather than implicitly so. The mechanism is for all practical purposes the same, it's just that one's done with a compiler and the other's done with a ROB in silicon. When Intel calls something quad-issue it means it can issue 4 instructions in a clock cycle, ROB permitting. If a DSP is quad-issue it usually means that YOU can issue 4 instructions in a clock cycle. Other than the explicit encoding allowing one to do it ahead of time they're really no different.
What specific architecture are you talking about? Can you back that up with any documentation?
By any definition they'd have to be MIMD
The definition of a DSP is they process signals. I don't see how that implies they're MIMD.
DSPs are not just FFT machines either. Usually they run lots of tight-loop algorithms and heuristics to perform whatever task you need - otherwise who'd be stupid enough to buy a 200 MHz DSP when an 8 MHz DSP can do god knows how many channels' worth of 44.1 kHz FFT in real time. They run plenty of logic-heavy code, it's just that the state is usually set, then the algorithms run without changing it willy-nilly.
Not all DSPs process audio. Right now I'm reverse engineering the firmware for an ARCompact DSP that's used to process digital TV signals. The bandwidth of the performed FFTs is in the MHz rather than kHz range. It does use the mulu64 instruction a lot (which I'd consider SIMD) but it doesn't have any MIMD stuff I know of. You can look up the programming manual for the ISA if you don't believe me, there's nothing about issuing multiple instructions.
The EPIC architecture (Itanium) certainly gave it the old college try. It's hard to say if it was good or bad given the complete lack of market adoption, almost entirely because of poor x86 compatibility in its very x86 market.
Itanic is famous for nobody being able to write a compiler that could output decent code for it.

Leo Peterson

reasonable definition of branching
<based upon the x86 pipeline
Thank you for at least going through the effort, but nice joke there.
For example, in x86 assembly, mov can cause different code paths to be executed depending on various conditions,
X86 is a shit architecture. Very few people understand why and what operands get executed when you send a x86 assembly instruction. I don't understand it and it's completely undocumented, as that would give away (((trade secrets))) that hide the x86 > RISC transition happening in microcode. Which is just more bloat.
<proceeds to spam Turing
Turing completeness is not necessary for computing. Is it nice and useful? In some cases. Do users need it to watch a video or browse the web? It depends technically, but no if some sanity to standards were implemented. Take the ARM or powerpc CPUs without speculative execution. They run web browsers and watch lower res videos just fine.
It executes instructions that ARE part of the code, except it's not known if they need to be executed, but the idea is that they'll be part of the path your program is in fact going to take more often than not, and the CPU will guess the right values for the register operands more often than not.
I understood that before this conversation began you utter retard. Hence the multiple combinations of 1 + 2 + 3. The point being that the order could be separated by branching because of cosmic rays or some such utter nonsense from CPU manufacturers' silicon defects. I am still arguing that using the speculative execution's waste of ALUs to execute operands that are never needed, but executed ahead of time, is a waste of time, energy, and power. Especially since control is taken away from the programmer.
to reach the same performance a single CPU with speculative execution achieves, you would need multiple CPUs
No you don't. You just need more cores dedicated to simple tasks like a RISC architecture and to be less of a faggot user. Less heat with more cores means more execution in parallel and better efficiency assuming you time it all correctly which is git gud for programmers.
You said speculative execution creates additional latency and makes good code run slower than it would run without it
It does, it just depends on the architecture as some don't let you disable it whatsoever. Speculative execution as an idea is shit and useless.
If you want performance you need to give up some control.
The microcode on modern CPUs is changeable and editable. It's just hidden behind proprietary shit so you can't touch it. Those ASICs and microcodes were programmed by someone were they not? The whole point of a Field Programmable Gate Array is that it can be programmed by you is it not? You don't have to give up control. Just stop buying/making shit hardware like x86, arm, and other such architectures that steal your freedom.
they branch very predictably so it's easy to do an ahead-of-time strategy for branch-always-taken and just get away with it...
Wow, so you admit they are more efficient and do exactly what you said couldn't be done. No speculative execution and no OOO yet still have branching. It's just that the programmer can control the branching so effectively that it never needs to go down a useless branch, also known as lazy execution/speculation as mentioned in a previous post.

Most of the things that are sold as absolute truths today were basically just hacks, shortcuts and marketing wank a few decades ago.
Exactly user. The die shrinkage and heat death of x86 and related architectures has arrived. It's time for a change in how processors are made and programming is done or there will be no progress, neither in security nor performance, for computers from this point forward. At some point some turbo autist will get enough money to implement a CPU and program it without OOO, speculative execution, and other modern nightmares like coprocessors dedicated to a backdoor wasting energy and power. I'm pretty sure the naysayer is some x86 hardware employee brainlet as it keeps making pajeet tier spelling mistakes but not consistently like if it was avoiding typing style fingerprinting.
Itanic is famous for nobody being able to write a compiler that could output decent code for it.
Picture is related.

Jayden Rogers

Forgot pic

Attached: 5bcc558c8a5c00481d09156e18c85a0bb9432555bb87e58f83c0f7b96aac8fcf.jpg (62.46 KB, 421x421)

Joseph Lewis

Don't have conditional branches at the assembly level, let the compiler decide and the programmer using that compiler to compile.

Attached: he's-right-you-know.jpg (71.76 KB, 800x598)

Kayden Young

based upon the x86 pipeline
It's not based on "the x86 pipeline". It's a conceptual description of branching. I welcome you to provide your own definition of branching and show us how it's possible to have Turing-complete machines without branching, but I know you won't because you don't know what you're talking about.
Very few people understand why and what operands get executed when you send a x86 assembly instruction.
Operands don't get executed. Sounds like you don't even know what an operand is. What gets executed is opcodes, shorthand for operation codes.
I don't understand it
Most of what happens in the pipeline is exposed through debug registers. I'm sure you could find out more about that if you cared to ever read a fucking book.
Take the ARM or powerpc CPUs without speculative execution. They run web browsers and watch lower res videos just fine.
Yes, and they are Turing complete you absolute brainlet. Branching is not the same thing as speculative execution.
I understood that before this conversation began you utter retard. Hence the multiple combinations of 1 + 2 + 3.
If you give the machine an operation with operands that aren't conditional on previous instructions, it will run that instruction a single time without ever trying different combinations of operands. Thus it doesn't take control away from the programmer. If it does try different operands for a single instruction it's because you didn't specify the operands in a predictable enough manner.
The point being that the order could be separated by branching because of cosmic rays or some such utter nonsense from CPU manufacturers' silicon defects.
I don't even know what you're trying to say here.
using the speculative execution's waste of ALUs to execute operands that are never needed
See the above. If the ALUs can be used to run instructions with predictable operands they will be, instead of trying the same instruction with different arguments multiple times. More predictable instructions are prioritized over uncertain instructions, which are only speculated on when the ALUs would otherwise be idle.
You can turn your SIMD program into continuous instructions that are always unconditionally executed with the same arguments, and the CPU won't ever speculate because it won't ever have idle ALUs to use for that.
No you don't. You just need more cores dedicated to simple tasks like a RISC architecture and to be less of a faggot user. Less heat with more cores means more execution in parallel and better efficiency assuming you time it all correctly which is git gud for programmers.
More cores rather than more capable cores means you are duplicating circuitry, such as the cache, the addressing and instruction fetching, the instruction decoding, etc.
It does, it just depends on the architecture as some don't let you disable it whatsoever. Speculative execution as an idea is shit and useless.
Claiming something repeatedly doesn't make it true.
The microcode on modern CPUs is changeable and editable. It's just hidden behind proprietary shit so you can't touch it. Those ASICs and microcodes were programmed by someone were they not?
ASICs are not programmed, they're designed. Sure, the microcode was programmed by somebody. I don't deny there might be some configurability to the scheduling that isn't exposed to the user, but not a whole lot. When you're dealing with multi-GHz clocks, the higher order harmonics to get clean square waves are in the tens of GHz range. There might be some fields in the microcode to say "enable X set of gates, disable Y set of gates", but not a whole lot of programmability, because for anything that takes multiple steps to perform you need multiple clock steps, and the internal clocks of the CPU can't go much higher than the base clock because of the physical limitations I mentioned.
The whole point of a Field Programmable Gate Array is that it can be programmed by you is it not? You don't have to give up control. Just stop buying/making shit hardware like x86, arm, and other such architectures that steal your freedom.
Yes, and if you ever actually used them or even cared to research about them a little bit, you'd know any CPU you can implement with them is going to be orders of magnitude slower than the CPUs you can buy in any computer shop, and orders of magnitude more expensive.
Wow, so you admit they are more efficient and do exactly what you said couldn't be done.
I'm not that guy.
I'm pretty sure the naysayer is some x86 hardware employee brainlet as it keeps making pajeet tier spelling mistakes but not consistently like if it was avoiding typing style fingerprinting.
I hope your Spanish is as good as my English.

Colton Martin

Wow, so you admit
You're replying to two different people.

Carter Perez

Just turn the mitigations off, nobody will hack you anyway.

Hunter Ward

Has nothing to do with meltdown or spectre you absolute buzzword spewing retard.

Ryan Diaz

is there a CPU that isn't affected by this shit?
Anything AMD and pre-Pentium Intel. Pre-Zen 2 CPUs, however, are still vulnerable to Spectre, though AMD's Spectre mitigations actually make their chips faster somehow.
did Intel fix the flaws in recently designed CPUs ?
Not really, they partially fixed Spectre in hardware on some models but all the other shit is still set to be software mitigated in BIOS/cpu microcode on new CPUs until 2020 at the very least.

Attached: ClipboardImage.png (736.24 KB, 1200x651)

Oliver Ortiz

Just turn off the patches with InSpectre from Gibson Research. If a wild exploit appears, it will quickly be added to antivirus libraries everywhere. No point in making your machines uselessly slow over a "maybe". Just keep your shit backed up in case a malicious idiot manages to sneak out something that fucks your shit up for fun or profit. Data loss is the only worry for 99.99% of people, because anyone capable of using the exploits to steal your precious bodily fluid data probably doesn't give a flying squirrel's porn-bleached asshole about your "sooper sekrets". Most people just tell Alexa all their damning illegal fetishes anyway.

Samuel Smith

what a fucking retard.
ever thought about branch predictions happening at runtime?
compiler can't do shit
you dont know what branch prediction is faggot

Austin Clark

no antivirus for linux
you are the antivirus

Eli Green

This. FUD is to keep security researchers employed. Every time you ask them about real-world examples, they go "Errr, hypothetically..." or "Assuming root access, ...". Bunch of morons.

Robert Johnson

don't run untrusted code from random computers all over the world locally, problem mostly solved. (= disable javascript)

"Let's stick a turing complete programming language interpreter in this here program and let it be fed with code by thousands of random computers all over the world" is peak braindamage.

Elijah Scott

4chan
go back, faggot

Samuel Adams

tl;dr
install gentoo
disable all spectre/meltdown mitigations
it's not that difficult, in linux 5.1 they've turned it into a couple of switches in kernel compilation and a linux boot argument to disable most of it.

In reality, for desktop use most of the spectre/meltdown security concerns are about being able to escape out of a VM, or read host system memory from inside of a guest VM. In these cases, to be affected you're already going to have to have fucked up inside of that guest VM: you would have had to install a virus which takes advantage of spectre/meltdown to be fucked by it, and it may or may not require root access (not sure on that).

You have control over the guest VM in desktop use cases, which is already 99% of the battle.
spectre/meltdown fucked cloud providers, not desktop users
Imagine a guest VM being able to read all host system memory in your desktop computer, so fucking what; you're already running shit that can do that on your desktop computer, you're the dumbass that installed it and ran it.

In a cloud situation it's far different: they don't have full control over the guest VMs, and they don't want one retard, intentionally or not, fucking over other VMs or the host system itself.
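If you go this route it's worth checking what the kernel actually reports before and after. Below is a small sketch that reads the per-vulnerability status files Linux exposes under /sys/devices/system/cpu/vulnerabilities (present on kernels from roughly 4.15 on). The function name is made up; and the boot argument mentioned above is most likely the `mitigations=off` kernel parameter, but treat that as an assumption and check the kernel documentation for the version you run.

```python
from pathlib import Path

# Linux sysfs directory with one file per known CPU vulnerability.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(vuln_dir=VULN_DIR):
    """Return {vulnerability_name: status_string}, or {} if unavailable."""
    if not vuln_dir.is_dir():
        return {}
    status = {}
    for entry in sorted(vuln_dir.iterdir()):
        try:
            status[entry.name] = entry.read_text().strip()
        except OSError:
            # Keep going if a single file can't be read.
            status[entry.name] = "unreadable"
    return status


if __name__ == "__main__":
    report = read_mitigation_status()
    if not report:
        print("No vulnerability information exposed on this system.")
    for name, state in report.items():
        print(f"{name:28s} {state}")
```

Each line of output is whatever string the kernel reports for that issue, typically something starting with "Not affected", "Vulnerable" or "Mitigation:". On a non-Linux system or an old kernel the directory doesn't exist and the function just returns an empty dict.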

Daniel Flores

it's hardware implementation you brainlet, no software can "disable it"

Nathaniel Powell

I'm not talking about disabling speculative execution, big brain. I'm talking about disabling the mitigations to get your performance back, because the security problem is irrelevant for the majority of users.

Tyler Hill

itt
denying the fact that increasingly aggressive speculative execution produces massive across-the-board gains in ipc

Attached: IPC-gains.jpg (40.03 KB, 580x365)
Attached: IPC-Over-Sandy.png (189.6 KB, 2685x1735)

Hunter Gonzalez

hardware doesn't do its job correctly
but it doesn't do its job correctly SUPER FAST OMG OPTIMIZED I CAN FEEL THE --funroll-loops

William Collins

tfw Pentium 4

Sebastian Nelson

linux 5.1
Is Linux 5 completely gluten- and lactose-free?

Nolan Hughes

caring about hypothetical exploits that can only ever affect emulators running potentially hostile software
Oh, wow, gee gosh golly, colo server farms & JS in muh browser will run slowly.

Attached: what-a-shame.jpg (53.64 KB, 603x393)

Jeremiah Wright

no, there's actually a slight performance drop with some things, but you have no choice if you want updated drivers.

Charles Cook

havent noticed it being slower but i have all the memes disabled

Nicholas Bennett

Are there any known exploits based on Meltdown/Spectre in the wild? What is the actual risk for a Pentium III/4/Core 2/early Core i CPU?

Aiden Evans

Before you get a space-heater P4 you might as well get one of those ARM SBCs with CPUs that do in-order execution. Not only are they immune by definition, they will probably also be faster.

Owen Allen

I doubt my A20 (dual-core Cortex-A7) is faster than a P4. I think it's more equivalent to an Intel Atom like they used to put in netbooks. And I guess the older Atoms also aren't vulnerable to Meltdown/Spectre bugs, but they do have other x86 bugs, and there are many of those. Look up the papers/talks by Christopher Domas for a small glimpse of that. The x86 rabbit hole goes deep...
You can of course get other SoCs with 4-core (R40) and even 8-core Cortex-A7 (A83t, A80, H8, and such) but that won't help much without multithreaded programs. Also some of those boards are designed like shit, especially the Banana Pi ones.
I can't compare Cortex-A53 since I haven't got one yet. But here's a cute rover that used an A64 Olimex board in Antarctica.

Attached: r3.jpg (535.45 KB, 682x1024)

Jackson Hall

Cortex-A53 is quite snappy, especially compared to Cortex-A7. It always depends on what you want to do. If "webbrowsing" is not in that list, you suddenly can do a lot more with a lot less. I also have an A20 setup and am quite happy with it. I also stay in the framebuffer on that machine and remote control others via scripts I wrote.

Gabriel Lopez

You can still browse the web on that if you take the lynxpill.

Isaiah Gutierrez

netsurf might work too. it wont run your meme javascript frameworks but everything thats worth browsing should work.