cant add user to sudo group in centOS 7 i386(no GUI,Its minimal)
1,548
You just need to add user Smit to group wheel which is have permission to run all commands with sudo command And you can accomplish it by entering the following command
vim /etc/group
look for wheel group and add smit to it
save and exit and thats it.
Related videos on Youtube
Author by
karnajitsen
Updated on September 18, 2022Comments
-
karnajitsen over 1 year
I have to implement Khatri Rao product between 2 matrices in C. Mathematically this is a column major access of data and I can not change that. But if I use preload ( PLD instruction in ARMv7 ) to prefetch every next loop data will that solve the problem of performance in stead of using a row major access of data.
If yes how to preload properly?
Please check my preload code below,
void khatrirao_pref(double *C, double *A, double *B, int nmax, int mmax, int pmax) { int i,k,l; for (i=0;i<nmax;i++) { for (k=0;k<mmax;k++) { asm("PLD [%0]\n\t" :: "r" (A+i+((nmax+1)*k))); for (l=0;l<pmax;l++) { asm("PLD [%0]\n\t" :: "r" (B+i+((nmax+1)*l))); C[i+(nmax*((k*pmax)+l))]=A[i+(nmax*k)]*B[i+(nmax*l)]; }}} }
-
Jonathan Leffler about 8 yearsIf you're always going to use column-major order, consider whether you can treat columns as rows and rows as columns by reversing the sense of array indices — where you had
A[row][col]
, useA[col][row]
. That gives you the caching benefit of accessing the data in memory sequence. It is not something to undertake lightly — measure and test very carefully. -
karnajitsen about 8 years@JonathanLeffler Hi Jonathan, Thank you for the reply. But I cannot do so. I have to strictly stay in collumn major access. I can not change the inner equation or the order of the 3 loops or the dimensions of arrays. I can only use prefetch to get the next loop data of A and B of the same column. I know this is kind of odd I am asking. What do you think ?
-
BitBank about 8 yearsGoing "against the grain" with your memory access pattern will definitely hurt performance. A better plan would be to prefetch the entire array into cache before it arrives at this code. As written, you're prefetching won't help; you need to give a hint to the CPU and then give it time to actually do the reading.
-
karnajitsen about 8 years@BitBank: I tried to prefetch everything before, it was not helpful so much .
-
artless noise about 8 yearsTry compiler switches. gcc knows about the
PLD
instruction and should use it. You can use compile specific options for a function. It is not likely to help manually if gcc doesn't think it will by itself;pld
will function best with some unrolling to only issue apld
to fill the cache lines. Too manypld
is harmful. -
karnajitsen about 8 yearsI receive 7% improvement from PLD. How much maximum speed up can be achieved? any guess.
-
Todd Christensen about 8 yearsIt really depends, but if it were me I would try for a little more than 7%. Since you're multiplying doubles, memory access may not be the most dominant part of the time... PLD can't help with time spent in the FP unit. 10-15% might be achievable, as a guess.
-
Admin almost 7 yearsi had used the command in both manner.without
<>
and with<>
-
-
karnajitsen about 8 yearsbut I am using ARMv7 processor. Will __builtin_prefetch work there? I think PLD should be the instruction to do so.
-
Todd Christensen about 8 yearsYes, that's how you tell gcc/clang to emit that instruction.