Programming for Multi-core Processors


Solution 1

That is correct. Your program will not run any faster (except for the fact that the core is handling fewer other processes, because some of them are scheduled onto the other core) unless you employ concurrency. If you do use concurrency, though, more cores improve the actual parallelism (with fewer cores, the concurrency is interleaved, whereas with more cores you can get true parallelism between threads).
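As a minimal sketch of what "employing concurrency" can look like (this is not from the answer; it assumes C++11 std::thread, and task_a/task_b are hypothetical placeholders for real work), two independent tasks are handed to two threads so the operating system can run them on two cores at once:

    #include <iostream>
    #include <thread>

    // Two independent, CPU-heavy placeholder tasks. Run one after the other they
    // take t1 + t2 seconds; run on two threads they can execute on two cores at
    // the same time, so the wall-clock time approaches max(t1, t2).
    void task_a() { /* ... some CPU-heavy work ... */ }
    void task_b() { /* ... some other, unrelated CPU-heavy work ... */ }

    int main() {
        std::thread a(task_a);   // starts running immediately, possibly on another core
        std::thread b(task_b);
        a.join();                // wait for both tasks before continuing
        b.join();
        std::cout << "both tasks finished\n";
    }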

Making programs efficiently concurrent is no simple task. If done poorly, making your program concurrent can actually make it slower! For example, you can seriously harm performance if you spend lots of time spawning threads (thread construction is relatively expensive) while doing the work in very small chunks (so that the overhead of thread construction dominates the actual work), if you frequently synchronize on shared data (which not only forces those operations to run serially but also adds a high overhead of its own), or if multiple threads frequently write to data in the same cache line (false sharing, which causes the cache line to be repeatedly invalidated in the other cores' caches).
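To make the synchronization and chunk-size points concrete, here is a hedged sketch (C++ with std::async; parallel_sum and the four-task split are made up for illustration): each task accumulates into its own local variable over a large slice of the data, and the partial results are combined only once at the end, rather than locking or writing a shared counter inside the hot loop.

    #include <future>
    #include <vector>

    // Sums a large vector by giving each task its own contiguous slice.
    // Each task accumulates into a local variable, so there is no locking and
    // no shared cache line being written inside the hot loop; results are
    // combined only once, when the futures are collected.
    long long parallel_sum(const std::vector<int>& values, unsigned n_tasks) {
        std::vector<std::future<long long>> parts;
        std::size_t chunk = values.size() / n_tasks;
        for (unsigned t = 0; t < n_tasks; ++t) {
            std::size_t begin = t * chunk;
            std::size_t end = (t + 1 == n_tasks) ? values.size() : begin + chunk;
            parts.push_back(std::async(std::launch::async, [&values, begin, end] {
                long long local = 0;
                for (std::size_t i = begin; i < end; ++i) local += values[i];
                return local;
            }));
        }
        long long total = 0;
        for (auto& p : parts) total += p.get();
        return total;
    }

    int main() {
        std::vector<int> values(1000000, 1);
        long long total = parallel_sum(values, 4);  // chunks are large, so thread
                                                    // start-up cost is amortised
        return total == 1000000 ? 0 : 1;
    }

Going the other way, locking the shared total on every element, or having all tasks increment adjacent elements of a shared array, would reintroduce exactly the serialization and cache-line traffic warned about above.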

It is also important to note that if you have N cores, that DOES NOT mean your program will run N times faster; N is only the theoretical upper limit on the speedup. In practice, maybe it is twice as fast with two cores, about three times as fast with four cores, and about three and a half times as fast with eight cores, and so on. How well your program is actually able to take advantage of additional cores is called its parallel scalability. Communication and synchronization overhead often prevent a linear speedup, although, in the ideal case, if you can avoid communication and synchronization as much as possible, you can get close to linear.
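The answer does not name it, but the usual way to put a number on this tail-off is Amdahl's law: if a fraction p of the running time can be parallelised and the rest stays serial (setup, communication, synchronization), the speedup on N cores is bounded by

    S(N) = 1 / ((1 - p) + p / N)

For example, with p = 0.9, eight cores give at most 1 / (0.1 + 0.9/8) ≈ 4.7x rather than 8x, which is the kind of sub-linear scaling described above.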

It would not be possible to give a complete answer on how to write efficient parallel programs on StackOverflow. This is really the subject of at least one (probably several) computer science courses. I suggest that you sign up for such a course or buy a book. I'd recommend a book to you if I knew of a good one, but the parallel algorithms course I took did not have a textbook. You might also be interested in writing a handful of programs using a serial implementation, a parallel implementation with multithreading (regular threads, thread pools, etc.), and a parallel implementation with message passing (such as with Hadoop, Apache Spark, Cloud Dataflows, asynchronous RPCs, etc.), and then measuring their performance, varying the number of cores for the parallel implementations. This was the bulk of the coursework for my parallel algorithms course and can be quite insightful. Some computations you might try parallelizing include computing Pi using the Monte Carlo method (this is trivially parallelizable, assuming you can create random number generators whose outputs in different threads are independent), performing matrix multiplication, computing the row echelon form of a matrix, and summing the squares of the numbers 1...N for some very large N; I'm sure you can think of others.
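As a concrete starting point for the Monte Carlo suggestion (a sketch only, assuming C++ <random> and std::async; the seed scheme, sample counts, and task count are arbitrary choices), the key detail called out above, independent random numbers per thread, is handled by giving each task its own generator:

    #include <future>
    #include <iostream>
    #include <random>
    #include <vector>

    // Counts how many of `samples` random points in the unit square fall inside
    // the quarter circle of radius 1. Each call owns its generator, so tasks
    // never share random-number state.
    long long count_hits(unsigned seed, long long samples) {
        std::mt19937_64 gen(seed);                       // per-task generator
        std::uniform_real_distribution<double> dist(0.0, 1.0);
        long long hits = 0;
        for (long long i = 0; i < samples; ++i) {
            double x = dist(gen), y = dist(gen);
            if (x * x + y * y <= 1.0) ++hits;
        }
        return hits;
    }

    int main() {
        const long long samples_per_task = 5000000;
        const unsigned n_tasks = 4;                      // vary this to measure scaling

        std::vector<std::future<long long>> parts;
        for (unsigned t = 0; t < n_tasks; ++t) {
            // Distinct seed per task so the streams are independent.
            parts.push_back(std::async(std::launch::async, count_hits,
                                       12345u + t, samples_per_task));
        }

        long long hits = 0;
        for (auto& p : parts) hits += p.get();
        double pi = 4.0 * double(hits) / double(n_tasks * samples_per_task);
        std::cout << "pi is approximately " << pi << "\n";
    }

Seeding each task with a fixed offset keeps runs reproducible; for serious statistics you would want better-separated seeds (for example std::seed_seq or one std::random_device draw per task), but the structure stays the same.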

Solution 2

I don't know if it's the best possible place to start, but I subscribed to the article feed from the Intel Software Network some time ago and have found a lot of interesting things there, presented in a pretty simple way. You can find some very basic articles on the fundamental concepts of parallel computing, like this. Here you have a quick dive into OpenMP, which is one possible approach to start parallelizing the slowest parts of your application without changing the rest (provided those parts contain exploitable parallelism, of course). Also check the Intel Guide for Developing Multithreaded Applications. Or just browse the article section; there aren't too many articles, so you can quickly figure out what suits you best. They also have a forum and a weekly webcast called Parallel Programming Talk.
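To make the OpenMP suggestion concrete, here is a minimal hedged sketch (C++, compiled with an OpenMP-enabled compiler, e.g. g++ -fopenmp; the loop body is a made-up stand-in for a slow part of a real program). The appeal is exactly what is described above: you annotate the hot loop and leave the rest of the program unchanged.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main() {
        const long long n = 10000000;
        std::vector<double> data(n, 1.0);
        double sum = 0.0;

        // The only change to an otherwise serial program: OpenMP splits the loop
        // iterations across the available cores and safely combines the
        // per-thread partial sums declared by the reduction clause.
        #pragma omp parallel for reduction(+:sum)
        for (long long i = 0; i < n; ++i) {
            sum += std::sqrt(data[i] * i);
        }

        std::printf("sum = %f\n", sum);
    }

If the compiler is not invoked with OpenMP support, the pragma is simply ignored and the loop runs serially, which is part of what makes this approach low-risk for incrementally parallelizing an existing program.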

Solution 3

Yes, simply adding more cores to a system without altering the software would yield no improvement (with the exception that the operating system would be able to schedule multiple concurrent processes on separate cores).

To have your operating system utilise your multiple cores, you need to do one of two things: increase the thread count per process, or increase the number of processes running at the same time (or both!).

Utilising the cores effectively, however, is a beast of a different colour. If you spend too much time synchronising shared data access between threads/processes, your level of concurrency will take a hit as threads wait on each other. This also assumes that you have a problem/computation that can relatively easily be parallelised, since the parallel version of an algorithm is often much more complex than the sequential version thereof.

That said, especially for CPU-bound computations with work units that are independent of each other, you'll most likely see a linear speed-up as you throw more threads at the problem. As you add serial segments and synchronisation blocks, this speed-up will tend to decrease.
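A hedged way to see this scaling behaviour for yourself (a sketch only; burn is a made-up, purely CPU-bound work function with no shared data, and the thread counts are arbitrary) is to keep the total amount of work fixed, divide it over a varying number of threads, and time the result:

    #include <chrono>
    #include <future>
    #include <iostream>
    #include <vector>

    // Made-up CPU-bound work with no shared data: the ideal case for scaling.
    double burn(long long iterations) {
        double x = 0.0;
        for (long long i = 1; i <= iterations; ++i) x += 1.0 / double(i);
        return x;
    }

    // Runs a fixed total amount of work split across n_threads tasks and
    // reports the wall-clock time taken.
    void run_with_threads(unsigned n_threads, long long total_iterations) {
        auto start = std::chrono::steady_clock::now();

        std::vector<std::future<double>> parts;
        for (unsigned t = 0; t < n_threads; ++t)
            parts.push_back(std::async(std::launch::async, burn,
                                       total_iterations / n_threads));

        double sink = 0.0;                       // consume the results so the
        for (auto& p : parts) sink += p.get();   // work cannot be optimised away

        std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        std::cout << n_threads << " thread(s): " << elapsed.count()
                  << " s (checksum " << sink << ")\n";
    }

    int main() {
        const long long work = 400000000;        // fixed total work
        for (unsigned n : {1u, 2u, 4u, 8u}) run_with_threads(n, work);
    }

On a machine with at least as many idle cores as threads, the 2- and 4-thread runs should come out close to 2x and 4x faster than the single-threaded run; once the thread count exceeds the number of physical cores, or once synchronisation is added, the curve flattens out as described above.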

I/O-heavy computations typically fare the worst in a multi-threaded environment, since access to the physical storage (especially if it is on the same controller or the same media) is also serial. In that case threading is still useful, but in a different sense: while one thread waits on I/O, the others are free to continue with user interaction or CPU-bound work.


Comments

  • Chathuranga Chandrasekara (almost 2 years ago)

    As far as I know, the multi-core architecture in a processor does not affect the program; the actual instruction execution is handled at a lower layer.

    My question is:

    Given that you have a multi-core environment, can I use any programming practices to utilize the available resources more effectively? How should I change my code to gain more performance in multi-core environments?