## May 19, 2015


Those numbers don't sound right at all. NumPy finds the product of two 5k*5k matrices in about 6.5 seconds on my MacBook Air. Raising to 10**17 should take about a hundred multiplies (depending on what exactly the power is), which is about ten minutes.

In your other example, raising a 10k*10k matrix to the 100th power should take 10 multiplies, each lasting about 61 seconds, or about ten minutes.
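The multiply counts in these estimates can be checked with a short sketch. It assumes the power is formed by repeated squaring; the function name `multiplies_for_power` is made up for this illustration.

```python
def multiplies_for_power(n: int) -> int:
    """Matrix multiplies needed to form A**n by repeated squaring."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    squarings = n.bit_length() - 1      # A^2, A^4, A^8, ... up to the top bit
    extra = bin(n).count("1") - 1       # one more multiply per additional set bit
    return squarings + extra


for n in (17, 100, 10**17):
    print(n, multiplies_for_power(n))
```

For n = 100 this gives 8, a little under the rough figure of 10. For 10**17 the count depends on the bit pattern of the exponent, but it lies between 56 and 112, in line with "about a hundred".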

Also if your matrices are symmetric, maybe try doing a partial SVD, and then you can just write down the SVD of the power matrix, and multiply it back out.
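To spell out the symmetric case (a sketch, assuming A is real and symmetric; the symbols Q, Λ and the truncation rank r are notation introduced here, not taken from the post):

$$
A = Q \Lambda Q^{\mathsf T},\quad Q^{\mathsf T} Q = I
\;\Longrightarrow\;
A^{k} = Q \Lambda^{k} Q^{\mathsf T},
\qquad
A^{k} \approx Q_r \Lambda_r^{k} Q_r^{\mathsf T}
$$

For a symmetric matrix the singular values are the absolute values of the eigenvalues, so a partial decomposition that keeps only the r largest-magnitude eigenvalues gives the approximation on the right. How good it is depends on how quickly the discarded |λ_i|^k decay.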

Also I just noticed they were markov matrices, so you're going to have a very friendly eigenspectrum.

Please post links to where you bought the motherboard and the Phi coprocessors so cheaply.

I hope you realize that you can compute the 17th power of A with five matrix multiplies (compute A^2, then A^4, then A^8, then A^16, and finally A^17).
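A minimal sketch of that idea, in plain Python so it runs without any libraries. The helper names `matmul` and `matrix_power` are made up for this illustration, and in real use each multiply would be a BLAS call rather than a Python loop.

```python
def matmul(x, y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_power(a, n):
    """Return (a**n, multiplies_used) using repeated squaring, for n >= 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    result = None       # product of the powers selected by the set bits of n
    square = a          # holds a^(2^k) on the k-th pass
    multiplies = 0
    while True:
        if n & 1:
            if result is None:
                result = square             # first factor costs nothing
            else:
                result = matmul(result, square)
                multiplies += 1
        n >>= 1
        if n == 0:
            break
        square = matmul(square, square)     # a^2, a^4, a^8, ...
        multiplies += 1
    return result, multiplies


a = [[1, 1],
     [0, 1]]
a17, used = matrix_power(a, 17)
print(a17, used)
```

For the 17th power the loop performs four squarings (A^2, A^4, A^8, A^16) and one final multiply by A, five in total. NumPy users can get the same effect from `numpy.linalg.matrix_power`.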

You might also be interested in pqR (see pqR-project.org), which can automatically parallelize various R operations. It probably wouldn't help for this calculation, however, which will be dominated by how well your BLAS does matrix multiplies.

By the way, if your cores can run two threads (via hyperthreading), there may appear to be twice as many processors, and it may appear that a program not using these extra processors is using only 50% of the CPU, but it's probably really using something like 85% of the CPU, since two threads running on the same core are far from being the equivalent of two threads on separate cores.

Looks like you have already invested a lot of time in the configuration. I really think you should try Linux (Ubuntu can be quite good for Windows users). Using just a small part of the time you've already invested, you would have a good base to start HPC without any M$ artifacts.

The matrix shown is a projection matrix, which has the property that c raised to any power is c, so if that is the form of your actual matrix you can avoid the multiplication entirely.

I have been looking into the Phi cards as well. Our simulation software is in FORTRAN. I wonder what the performance would be with FORTRAN compiled with the Intel compiler specifically made for use with the Phi cores? This can be found as a bundle package 'starting for under $5k': https://software.intel.com/en-us/xeon-phi-starter-kit

Also, has anybody recompiled the R core using the Intel starter kit, specifically optimizing for the Phi cores? I assume that is what Revolution has done, to gain access to the Phi cores and the Intel math libraries?

Hi Everyone,

@Jake

The numbers are correct. I ran them multiple times. I lost a month+ earlier in the winter semester because my original software took so long.

I tried a couple of different methods earlier from my numerical analysis class. When I talked to my prof about the issues I had, it came down to floating-point errors. My eigenvalues all ended up being 1.00.

I got the Phi from Sabrepc.com. They have 3 Phis at a low price.

@buddy

On my gaming computer, I get 100% utilization. On the workstation and my laptop (both Intel processors), I get 50% utilization. I run simulations from BOINC, which use 100% of my processors on all my computers. The core temps on the Xeon processor and the cooling fan speeds seem to correlate with 50% utilization too. One of my fans is really loud at full throttle when I run BOINC for several hours. When I run my larger program, the fans are not that loud.

I did download Ubuntu. The Phi drivers work on Red Hat, not Ubuntu. I spent a few days trying to get the Phi working on Ubuntu. I gave up. Since I wasted so much time earlier in the semester, I didn't have a week or two more to figure out Ubuntu. As it is, I turned in my paper 2 days before the end of the semester.

@Jang
I'm not sure what you are getting at. Matrix c is a sparse matrix where c(n, n) is approximately 0.999999998. I ran multiple iterations of the program with a=500, 1000, 2500, 5000, 10000. To cut down on programming mistakes, I used a=# to declare the size of the matrix.

@Groth
From my (poor) understanding, if you can call upon the Intel MKL and set the environment variables, the automatic offload does the rest. You can then optimize the programming further. I have a copy of Intel's parallel suite, which I got free for a year. (Sometimes being a student isn't so bad.) I'm not sure if I even need it. I haven't done anything with it yet, that I know of.

Thanks for reading and commenting.

Dear Andrew:

For your own good, learn about Sparse Arrays...

For example, using Mathematica:
a = 10000; b = 10^-9;
cc = SparseArray[{{1, 1} -> 1 - b, {a, a} -> 1, {a, a - 1} -> 1,
{i_, i_} -> 1 - 1.75*b, {i_, j_} /; i - 1 == j ->
b, {i_, j_} /; i == j - 1 -> .75}, {a, a}, 0]

It will take LESS THAN A SECOND to calculate this.
Timing[dd = MatrixPower[cc, 100]]

These are the values of the first row:
1., 75.0002, 2784.38, 68217.3, ...
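For readers without Mathematica, here is a rough Python sketch of the same idea: store only the nonzero entries and multiply only those. It is not a translation of the code above. The dict-of-dicts storage, the helper names, and the constant-band test matrix are all assumptions made for illustration, and a real program would more likely use a library such as `scipy.sparse`.

```python
def sparse_matmul(x, y):
    """Multiply sparse matrices stored as {row: {col: value}}, touching nonzeros only."""
    out = {}
    for i, row in x.items():
        acc = {}
        for k, xv in row.items():
            for j, yv in y.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + xv * yv
        if acc:
            out[i] = acc
    return out


def sparse_power(m, n):
    """Return m**n by repeated squaring, for n >= 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    result = None
    square = m
    while n:
        if n & 1:
            result = square if result is None else sparse_matmul(result, square)
        n >>= 1
        if n:
            square = sparse_matmul(square, square)
    return result


def tridiagonal(size, lower, diag, upper):
    """Hypothetical helper: a size x size tridiagonal matrix with constant bands."""
    m = {}
    for i in range(size):
        row = {i: diag}
        if i > 0:
            row[i - 1] = lower
        if i < size - 1:
            row[i + 1] = upper
        m[i] = row
    return m


b = 1e-9
m = tridiagonal(1000, b, 1 - 1.75 * b, 0.75 * b)   # illustrative, not the post's matrix
p = sparse_power(m, 16)
print(max(len(row) for row in p.values()))          # widest row of the result
```

The 16th power of a tridiagonal matrix has at most 33 stored entries per row, so the work grows with the bandwidth rather than with the full 1000 x 1000 size. That is why a sparse representation is so much faster here than a dense multiply.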

Great post. What motherboard did you get for the Xeon Phi coprocessors?
