How do multi-core CPUs work?


Solution 1

A transistor can't "work on a problem". It's a basic building block of CPUs, useless on its own, but required to build logic gates (which can then compute simple operations like addition). There's a lot of hardware in a single core, far more than just one transistor.

There's also much more in a CPU than just "doing stuff". There's a virtual memory manager, a hardware cache controller, various interfaces that connect the CPU to the motherboard and to system memory, and so on. The cores of a multicore CPU often share a lot of this hardware inside the actual chip.

A "program" is a software concept; the CPU doesn't know what it is. All the CPU does is execute instructions, and the operating system decides which program's instructions it runs. In this sense, a single-core CPU can only follow one instruction stream at a time. You can still do several things at once on a single-core processor because the operating system switches which program is currently running at a very fast rate. Multicore CPUs let you run more than one task truly at the same time, which the operating system can exploit either by letting you run more programs at once comfortably, or by letting one program use multiple cores to run faster.

Technically, a running program is a process made up of one or more threads. Each thread executes independently, with its own stack and CPU context (registers, etc.), while all the threads of a process share its memory, which is how they can communicate with each other within the process.
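A minimal Python sketch of that idea, using the standard `threading` module: each thread runs the same function with its own local variables (living on its own stack), while all of them append to one list owned by the process. The names `worker` and `shared_results` are purely illustrative. In the standard CPython build the global interpreter lock keeps these threads from running Python code truly in parallel, so this shows the memory model, not a speedup.

```python
import threading

# One list owned by the process; every thread can see and modify it.
shared_results = []
results_lock = threading.Lock()  # guards concurrent updates to the shared list


def worker(thread_id: int, n: int) -> None:
    # `total` is local: each thread has its own copy on its own stack.
    total = 0
    for i in range(n):
        total += i
    # Communicate back through memory shared by the whole process.
    with results_lock:
        shared_results.append((thread_id, total))


threads = [threading.Thread(target=worker, args=(t, 1000 * (t + 1))) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every thread to finish

print(sorted(shared_results))
```

The lock is not strictly needed for a single `append` in CPython, but it makes the "shared data needs coordination" point explicit.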

Solution 2

The classical cycle that CPUs follow is:

Fetch Instruction -> Decode It -> Execute It

The reason is that most problems people have historically wanted to solve with computers involve following a number of steps, one at a time and in order, where the result of some steps may affect later steps. With such problems, jumping in and attacking the work from the middle with multiple "workers" doesn't work well. So this model has served those kinds of programs well, and they are still very common. (That is, until 3D graphics rendering became common...)
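To make the fetch, decode, execute cycle concrete, here is a toy interpreter for a made-up accumulator machine. The instruction set (`LOADI`, `SETC`, `ADDC`, `DECC`, `JNZ`, `HALT`) and its encoding are entirely hypothetical and far simpler than any real CPU, but the loop has the same three steps: fetch the word at the program counter, decode it into an opcode and operand, then execute it and update the machine state.

```python
# Hypothetical opcodes for a toy accumulator machine (not a real ISA).
HALT, LOADI, SETC, ADDC, DECC, JNZ = range(6)


def encode(opcode: int, operand: int = 0) -> int:
    # Pack one instruction into an integer word: opcode in the high bits, operand in the low 8 bits.
    return (opcode << 8) | operand


def run(memory: list[int]) -> int:
    pc, acc, cnt = 0, 0, 0
    while True:
        word = memory[pc]                          # 1. fetch
        opcode, operand = word >> 8, word & 0xFF   # 2. decode
        pc += 1
        # 3. execute
        if opcode == HALT:
            return acc
        elif opcode == LOADI:
            acc = operand
        elif opcode == SETC:
            cnt = operand
        elif opcode == ADDC:
            acc += cnt
        elif opcode == DECC:
            cnt -= 1
        elif opcode == JNZ:
            if cnt != 0:
                pc = operand  # a taken branch changes where the next fetch happens
        else:
            raise ValueError(f"unknown opcode {opcode}")


# Sum 5 + 4 + 3 + 2 + 1 with a loop.
program = [
    encode(LOADI, 0),
    encode(SETC, 5),
    encode(ADDC),      # address 2: loop body
    encode(DECC),
    encode(JNZ, 2),
    encode(HALT),
]
print(run(program))  # 15
```

Note how each step depends on the previous one (the jump depends on the counter, which depends on the decrement), which is exactly the kind of sequential work described above.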

The above model has been optimized and modified over the years, of course, with much of that work aimed at keeping less of the CPU idle. Even as early as the 68000 you had "pipelining", where multiple instructions are "in flight" in different parts of the CPU at once. (This is why branch prediction was developed: if you have multiple instructions pipelined and then have to throw away their results because of a branch, you lose performance.) Today there are additional features that keep the CPU from stalling or waiting on something, such as:

  • cache (saves the CPU from having to wait on slow memory some of the time)
  • out-of-order execution (rearranges fetched instructions into an order that executes more efficiently)
  • register renaming (allows out-of-order execution to work better, by giving instructions their own copies of registers while other instructions finish their work)
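A back-of-the-envelope model shows why pipelining helps and why branch prediction matters. This sketch assumes an idealized k-stage pipeline whose only stalls are branch flushes, and it assumes branches resolve in the last stage, so a misprediction discards the k - 1 instructions fetched behind the branch. Real CPUs add hazards, cache misses, and out-of-order logic that this ignores, and the 3-stage/100-instruction numbers are made up for illustration.

```python
def unpipelined_cycles(n_instructions: int, stages: int) -> int:
    # Each instruction passes through every stage before the next one starts.
    return n_instructions * stages


def pipelined_cycles(n_instructions: int, stages: int, mispredictions: int = 0) -> int:
    # Idealized pipeline: the first instruction needs `stages` cycles to fill the pipe,
    # after which one instruction completes per cycle.
    if n_instructions == 0:
        return 0
    ideal = stages + n_instructions - 1
    # Assumption: each misprediction flushes the (stages - 1) younger instructions.
    return ideal + mispredictions * (stages - 1)


print(unpipelined_cycles(100, 3))                    # 300
print(pipelined_cycles(100, 3))                      # 102
print(pipelined_cycles(100, 3, mispredictions=10))   # 122
```

The deeper the pipeline, the larger the `(stages - 1)` penalty per misprediction, which is why accurate branch prediction becomes more valuable as pipelines grow.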

So each processor or core contains a number of subsystems that work together to interpret and execute an instruction stream. In a sense, parts of a modern CPU really are working on one thing while other parts work on something else, at the same time.

But while these techniques can make a core very efficient, ultimately all of its parts are working together on the same instruction stream, so they cannot be totally independent of each other. If you want to execute two instruction streams at once, you need two CPUs or cores.

A modern multitasking OS bounces between the various instruction streams (i.e. programs) stored in memory. The OS cuts a program off when it has used up its time slice (most CPUs designed for multitasking environments have a timer that raises an IRQ after a certain interval, or a similar mechanism), or switches to another task when the current process is waiting on I/O or input. A single CPU never physically executes two instruction streams at once.
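A simplified sketch of that time slicing, as a round-robin simulation. Each step of the simulated CPU executes one instruction of exactly one task; when a task uses up its quantum (standing in for the timer IRQ), it goes to the back of the queue. Real schedulers also handle priorities and tasks blocked on I/O, which this leaves out, and the task names and lengths are hypothetical.

```python
from collections import deque


def round_robin(tasks: dict[str, int], quantum: int) -> list[str]:
    """Simulate one CPU time-slicing between tasks.

    `tasks` maps a task name to the number of instructions it still has to run.
    Returns which task ran at each step, showing that only one task runs at any instant.
    """
    ready = deque(tasks.items())
    trace: list[str] = []
    while ready:
        name, remaining = ready.popleft()
        # Run until the time slice expires (the "timer IRQ") or the task finishes.
        run_for = min(quantum, remaining)
        trace.extend([name] * run_for)
        remaining -= run_for
        if remaining > 0:
            ready.append((name, remaining))  # preempted: back of the queue
    return trace


print("".join(round_robin({"A": 5, "B": 3, "C": 4}, quantum=2)))  # AABBCCAABCCA
```

Interleaved quickly enough, the trace looks to a user as if A, B and C run at the same time, even though each step belongs to exactly one task.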

I think something like the idea you are talking about was tried with the Itanium and its VLIW architecture. Read the Wikipedia article on it; it explains things better and in more depth than I'm trying to here.

Solution 3

Actually, the "switching tasks" part is performed by the OS. A processor is a relatively "dumb" piece of hardware that just "crunches numbers". From a technical point of view, a processor can't work on more than one task, because all tasks are written under the assumption that they have full control of the processor while they execute. This is partly a legacy of required backward compatibility.

With a multi-core processor, more than one "full processor" is available, so more than one program can have a "full processor" at any moment, and they can be executed simultaneously.

Solution 4

The OS can run only ONE thread on ONE core at any given moment. But once you know that each process is divided into one or more threads before it can run on a core, and that any modern OS is normally running around 100 or more processes, you can imagine how fast the switching has to happen. A core clocked at 1 GHz goes through around a billion clock cycles per second, and that is the budget the OS divides among the threads waiting to execute.
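To put the clock rate and the switching rate side by side, here is the arithmetic as a small function. The 1 GHz clock comes from the answer above; the switch rates of 100 and 1000 per second are assumptions chosen for illustration, not measured values, and the cost of the switch itself is ignored.

```python
def cycles_per_slice(clock_hz: float, switches_per_second: float) -> float:
    # Clock cycles one thread gets between two context switches,
    # ignoring the cycles spent on the switch itself.
    return clock_hz / switches_per_second


clock = 1e9  # 1 GHz, as in the answer above
for rate in (100, 1000):  # assumed switch rates, for illustration only
    print(f"{rate:>5} switches/s -> {cycles_per_slice(clock, rate):,.0f} cycles per slice")
```

At either assumed rate, each thread gets millions of cycles of uninterrupted execution between switches.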

In Intel processors, you may have heard of their Hyper-Threading technology, which lets ONE core run TWO threads at the exact SAME time, theoretically doubling the core's performance.


Author: flannel_bioinformatician

Updated on September 18, 2022

Comments

  • flannel_bioinformatician
    flannel_bioinformatician over 1 year

    I have written the following code

    import pandas as pd
    
    arr_coord = []
    
    for chains in structure:
        for chain in chains:
            for residue in chain:                             
                for atom in residue:
                    x = atom.get_coord()
                    arr_coord.append({'X': [x[0]],'Y':[x[1]],'Z':[x[2]]})                
    
    
    coord_table = pd.DataFrame(arr_coord)
    print(coord_table)
    

    To generate the following dataframe

                 X         Y         Z
    0      [-5.43]  [28.077]  [-0.842]
    1     [-3.183]  [26.472]   [1.741]
    2     [-2.574]  [22.752]    [1.69]
    3     [-1.743]  [21.321]   [5.121]
    4      [0.413]  [18.212]   [5.392]
    5      [0.714]  [15.803]   [8.332]
    6      [4.078]  [15.689]  [10.138]
    7      [5.192]    [12.2]   [9.065]
    8      [4.088]   [12.79]   [5.475]
    9      [5.875]  [16.117]   [4.945]
    10     [8.514]  [15.909]    [2.22]
    11    [12.235]   [15.85]   [2.943]
    12    [13.079]  [16.427]  [-0.719]
    13    [10.832]  [19.066]  [-2.324]
    14    [12.327]  [22.569]  [-2.163]
    15     [8.976]  [24.342]  [-1.742]
    16     [7.689]  [25.565]   [1.689]
    17     [5.174]  [23.336]   [3.467]
    18     [2.339]  [24.135]   [5.889]
    19       [0.9]  [22.203]   [8.827]
    20    [-1.217]  [22.065]  [11.975]
    21     [0.334]  [20.465]   [15.09]
    22       [0.0]  [20.066]  [18.885]
    23     [2.738]  [21.762]  [20.915]
    24     [4.087]  [19.615]  [23.742]
    25     [7.186]  [21.618]  [24.704]
    26     [8.867]  [24.914]   [23.91]
    27    [11.679]  [27.173]  [24.946]
    28     [10.76]  [30.763]  [25.731]
    29    [11.517]  [33.056]  [22.764]
    ..         ...       ...       ...
    431    [8.093]  [34.654]  [68.474]
    432    [7.171]  [32.741]  [65.298]
    433    [5.088]  [35.626]  [63.932]
    434    [7.859]   [38.22]  [64.329]
    435   [10.623]  [35.908]    [63.1]
    436   [12.253]  [36.776]  [59.767]
    437    [10.65]  [35.048]  [56.795]
    438    [7.459]  [34.084]  [58.628]
    439    [4.399]  [35.164]  [56.713]
    440    [0.694]  [35.273]  [57.347]
    441   [-1.906]  [34.388]  [54.667]
    442   [-5.139]  [35.863]  [55.987]
    443   [-8.663]  [36.808]  [55.097]
    444   [-9.629]  [40.233]  [56.493]
    445  [-12.886]   [42.15]  [56.888]
    446  [-12.969]  [45.937]  [56.576]
    447  [-14.759]  [47.638]  [59.485]
    448  [-14.836]  [51.367]  [60.099]
    449  [-11.607]  [51.863]  [58.176]
    450   [-9.836]  [48.934]  [59.829]
    451    [-8.95]  [45.445]  [58.689]
    452   [-9.824]  [42.599]  [61.073]
    453   [-8.559]  [39.047]  [60.598]
    454  [-11.201]  [36.341]  [60.195]
    455  [-11.561]   [32.71]  [59.077]
    456   [-7.786]  [32.216]  [59.387]
    457   [-5.785]  [29.886]  [61.675]
    458   [-2.143]  [29.222]  [62.469]
    459   [-0.946]  [25.828]  [61.248]
    460    [2.239]  [25.804]  [63.373]
    
    [461 rows x 3 columns]
    

    What I intend to do is to create a Euclidean distance matrix using these X, Y, and Z values. I tried to do this using the pdist function

    from scipy.spatial.distance import pdist, squareform
    
    dist = pdist(coord_table, metric = 'euclidean')
    distance_matrix = squareform(dist)
    print(distance_matrix)
    

    However, the interpreter gives the following error

    ValueError: setting an array element with a sequence.
    

    I am not sure how to interpret this error or how to fix it.

    • user541686
      user541686 over 11 years
      Worth noting: Intel CPUs do have the notion of "tasks", but OSes ignore them and implement task switching manually.
    • offeltoffel
      offeltoffel over 5 years
      Not 100% sure, but all your coords are single-element lists. Why do you construct the df with [x[0]] instead of x[0]? I believe pdist expects numeric array elements, but gets lists instead.
  • MarcusJ
    MarcusJ over 11 years
    Yeah, I know that they're literally just crunching numbers and everything needs to be mathematical. Is it the processor or the OS that has that backward-compatibility "full processor" thing, though? Thanks for the answers, by the way.
  • Petr Abdulin
    Petr Abdulin over 11 years
    Both processor and OS have to deal with backward compatibility, but processor is much more restricted in that sense, since it should be compatible with any OS or program that runs on it (Windows, Linux, etc.)
  • MarcusJ
    MarcusJ about 11 years
    Sorry for responding so late, but does that mean that an OS could theoretically allow two or more programs to use a fraction of the transistors available on a CPU?
  • vonbrand
    vonbrand about 11 years
    It won't switch threads more than around 100 times a second (switching is rather expensive). And hyperthreading technology gives around a 5 to 20% performance gain for the second thread on one processor (or at least those were the measurements I saw when it came out or a bit later).
  • Achraf Almouloudi
    Achraf Almouloudi about 11 years
    Yes, the refreshes made by the clock frequency are meant to free the processor from the currently executing instructions; it would have nothing left to do if it were switching threads a billion times per second. I am sure it won't exceed the 20% performance increase mark, and that's exactly why I said "theoretically"; in real-world usage, normal users will rarely see any difference.
  • flannel_bioinformatician
    flannel_bioinformatician over 5 years
    Thank you, this worked. Out of curiosity, what was wrong with my previous code?
  • Mortz
    Mortz over 5 years
    The function expects the points to be in an m x n array, with m observations and n dimensions. In your version, each entry in the array was not a number but a one-element list, so it could not be converted to a numeric array.
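A corrected sketch, assuming SciPy's `scipy.spatial.distance.pdist`/`squareform` and pandas as in the question. Since `structure` (a Biopython object) isn't available here, the coordinates are three rows copied from the table above. The only real change to the original code is storing `x[0]` rather than `[x[0]]`, as suggested in the comments; the last lines show one way to unwrap a DataFrame that already holds one-element lists.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Three (x, y, z) coordinates from the first rows of the table above;
# in the original code these come from atom.get_coord().
coords = [
    (-5.43, 28.077, -0.842),
    (-3.183, 26.472, 1.741),
    (-2.574, 22.752, 1.69),
]

# Store plain numbers, not one-element lists: x[0] instead of [x[0]].
arr_coord = [{"X": x[0], "Y": x[1], "Z": x[2]} for x in coords]
coord_table = pd.DataFrame(arr_coord)

# pdist expects an m x n numeric array: m points (rows), n dimensions (columns).
dist = pdist(coord_table.to_numpy(), metric="euclidean")
distance_matrix = squareform(dist)  # condensed vector -> full symmetric m x m matrix
print(distance_matrix)

# If a DataFrame already holds one-element lists in every cell,
# unwrap each cell before calling pdist.
list_table = pd.DataFrame([{"X": [x[0]], "Y": [x[1]], "Z": [x[2]]} for x in coords])
unwrapped = list_table.apply(lambda col: col.map(lambda v: v[0]))
assert np.allclose(squareform(pdist(unwrapped.to_numpy())), distance_matrix)
```

For the full 461-atom table this yields a 461 x 461 matrix whose entry [i, j] is the distance between atoms i and j.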