{"id":5601,"date":"2019-12-04T13:00:52","date_gmt":"2019-12-04T18:00:52","guid":{"rendered":"https:\/\/blogs.mathworks.com\/cleve\/?p=5601"},"modified":"2019-12-04T13:00:52","modified_gmt":"2019-12-04T18:00:52","slug":"a-matrix-for-the-new-hpl-ai-benchmark","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/cleve\/2019\/12\/04\/a-matrix-for-the-new-hpl-ai-benchmark\/","title":{"rendered":"A Matrix for the New HPL-AI Benchmark"},"content":{"rendered":"<div class=\"content\"><!--introduction--><p>My colleagues are looking for a matrix to be used in a new benchmark. They've come to the right place.<\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#6f5315d6-ad46-4b35-8764-677b167da963\">Email<\/a><\/li><li><a href=\"#ec0c12b2-744f-4a38-950a-cc5be7fd721c\">Symmetric Hilbert Matrix<\/a><\/li><li><a href=\"#c4ed2873-690a-44c2-9287-ef44ffcc9484\">Nonsymmetric Hilbert Matrix<\/a><\/li><li><a href=\"#5aeb53c5-275a-4d87-bb57-1cd00c4a7bdd\">Benchmark Matrix<\/a><\/li><li><a href=\"#d3942f58-edb5-48f0-891c-7a7ecec1af6b\">Example<\/a><\/li><li><a href=\"#58364797-0112-4455-89cb-4d4e759bbd52\">Animation<\/a><\/li><li><a href=\"#781fa356-fe10-46fb-b54d-8924359bdada\">Gaussian elimination<\/a><\/li><li><a href=\"#532aaf80-f812-47ae-9a20-0b12d101aacb\">Condition<\/a><\/li><li><a href=\"#0245bd74-6fc1-470b-b562-907250503a72\">sigma_1<\/a><\/li><li><a href=\"#574d19eb-ad2d-4ebd-b451-8141696e0431\">sigma_n<\/a><\/li><li><a href=\"#8e373526-14f7-4cf3-b08e-807cc5106d2f\">sigma_1\/sigma_n<\/a><\/li><li><a href=\"#d064d4d3-26d6-4255-b4d3-e4f6669b328b\">Overflow<\/a><\/li><li><a href=\"#a031341e-49db-4d70-ba25-4115efd35419\">Thanks<\/a><\/li><\/ul><\/div><h4>Email<a name=\"6f5315d6-ad46-4b35-8764-677b167da963\"><\/a><\/h4><p>A couple of weeks ago I got this email from Jack Dongarra at the University of Tennessee.<\/p><p><tt>For this new HPL-AI benchmark, I'm looking for a matrix that is not symmetric, is easily generated (roughly O(n^2) ops to 
generate), is dense, doesn&#8217;t require pivoting to control growth, and has a smallish condition number (&lt;O(10^5)).<\/tt><\/p><p><tt>There are two steps for the benchmark:<\/tt><\/p><p><tt>1) do LU on the matrix in short precision and compute an approximate solution x. (On Nvidia this is 6 times faster in half precision than in DP.)<\/tt><\/p><p><tt>2) do GMRES with L and U as preconditioners with x as the starting point (this usually takes 3-4 iterations to converge.)<\/tt><\/p><p><tt>The benchmark details are here: <a href=\"http:\/\/bit.ly\/hpl-ai\">http:\/\/bit.ly\/hpl-ai<\/a>.<\/tt><\/p><p><tt>Any ideas for a matrix?<\/tt><\/p><h4>Symmetric Hilbert Matrix<a name=\"ec0c12b2-744f-4a38-950a-cc5be7fd721c\"><\/a><\/h4><p>One of the first matrices I ever studied was the <a href=\"https:\/\/blogs.mathworks.com\/cleve\/2017\/06\/07\/hilbert-matrices-2\/\">Hilbert matrix.<\/a> With the singleton expansion features now in MATLAB, the traditional symmetric Hilbert matrix can be generated by<\/p><pre class=\"codeinput\">  format <span class=\"string\">rat<\/span>\r\n  n = 5;\r\n  i = 1:n\r\n  j = i'\r\n  H = 1.\/(i+j-1)\r\n<\/pre><pre class=\"codeoutput\">i =\r\n       1              2              3              4              5       \r\nj =\r\n       1       \r\n       2       \r\n       3       \r\n       4       \r\n       5       \r\nH =\r\n       1              1\/2            1\/3            1\/4            1\/5     \r\n       1\/2            1\/3            1\/4            1\/5            1\/6     \r\n       1\/3            1\/4            1\/5            1\/6            1\/7     \r\n       1\/4            1\/5            1\/6            1\/7            1\/8     \r\n       1\/5            1\/6            1\/7            1\/8            1\/9     \r\n<\/pre><p>When you add the row vector <tt>i<\/tt> to the column vector <tt>j<\/tt>, the resulting <tt>i+j<\/tt> is a matrix.  
Compare this with the <i>inner product<\/i>, <tt>i*j<\/tt>, which produces a scalar, and the <i>outer product<\/i>, <tt>j*i<\/tt>, which produces another matrix.  The expression <tt>i+j<\/tt> can be thought of as an \"outer sum\".<\/p><p>Two more scalar expansions complete the computation: <tt>i+j-1<\/tt> subtracts 1 from each element of the sum and <tt>1.\/(i+j-1)<\/tt> divides 1 by each element in the denominator.  Both of these scalar expansions have been in MATLAB from its earliest days.<\/p><p>The elements of Hilbert matrices depend only on the sum <tt>i+j<\/tt>, so they are constant on the antidiagonals.  Because the elements have the form $1\/(x_i+y_j)$, Hilbert matrices are also Cauchy matrices.  There are fast algorithms for solving systems of equations involving Cauchy matrices in $O(n^2)$ time.<\/p><h4>Nonsymmetric Hilbert Matrix<a name=\"c4ed2873-690a-44c2-9287-ef44ffcc9484\"><\/a><\/h4><p>Some years ago, I wanted a nonsymmetric test matrix, so I changed the plus sign in the denominator of the Hilbert matrix to a minus sign. But this created a division by zero along the subdiagonal, so I also changed the -1 to +1\/2.  I call this the \"nonsymmetric Hilbert matrix.\"<\/p><pre class=\"codeinput\">   H = 1.\/(i-j+1\/2)\r\n<\/pre><pre class=\"codeoutput\">H =\r\n       2              2\/3            2\/5            2\/7            2\/9     \r\n      -2              2              2\/3            2\/5            2\/7     \r\n      -2\/3           -2              2              2\/3            2\/5     \r\n      -2\/5           -2\/3           -2              2              2\/3     \r\n      -2\/7           -2\/5           -2\/3           -2              2       \r\n<\/pre><p>The elements depend only on <tt>i-j<\/tt> and so are constant on the diagonals. The matrices are Toeplitz and, again, such systems can be solved with $O(n^2)$ operations.<\/p><p>I was astonished when I first computed the singular values of nonsymmetric Hilbert matrices -- most of them were nearly equal to $\\pi$.  
Already with only the 5-by-5 we see five decimal places.<\/p><pre class=\"codeinput\">   format <span class=\"string\">long<\/span>\r\n   sigma = svd(H)\r\n<\/pre><pre class=\"codeoutput\">sigma =\r\n   3.141589238341321\r\n   3.141272220456508\r\n   3.129520593545258\r\n   2.921860834913688\r\n   1.461864493324889\r\n<\/pre><p>This was <a href=\"https:\/\/blogs.mathworks.com\/cleve\/2014\/02\/03\/surprising-svd-square-waves-and-pi\/\">a big surprise.<\/a>  The eventual explanation, provided by <a href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/0024379586902806\">Seymour Parter,<\/a> involves a theorem of Szego from the 1920's relating the eigenvalues of Toeplitz matrices to Fourier series and square waves.<\/p><h4>Benchmark Matrix<a name=\"5aeb53c5-275a-4d87-bb57-1cd00c4a7bdd\"><\/a><\/h4><p>I have been working with Jack's colleague Piotr Luszczek. This is our suggestion for the HPL-AI Benchmark Matrix. It depends upon two parameters, <tt>n<\/tt> and <tt>mu<\/tt>.  As <tt>mu<\/tt> goes from 1 to -1, the core of this matrix goes from the symmetric to the nonsymmetric Hilbert matrix.  Adding <tt>n<\/tt> in the denominator and 1's on the diagonal produces diagonal dominance.<\/p><pre class=\"codeinput\">   A = 1.\/(i + mu*j + n) + double(i==j);\r\n<\/pre><pre class=\"codeinput\">   clear <span class=\"string\">A<\/span>\r\n   type <span class=\"string\">A<\/span>\r\n<\/pre><pre class=\"codeoutput\">\r\nfunction Aout = A(n,mu)\r\n% A(n,mu).  Matrix for the HPL-AI Benchmark.  mu usually in [-1, 1].\r\n    i = 1:n;\r\n    j = i';\r\n    Aout = 1.\/(i+mu*j+n) + eye(n,n);\r\nend\r\n\r\n<\/pre><p>If <tt>mu<\/tt> is not equal to +1, this matrix is not Cauchy. And, if <tt>mu<\/tt> is not equal to -1, it is not Toeplitz. Furthermore, if <tt>mu<\/tt> is greater than -1, it is diagonally dominant and, as we shall see, very well-conditioned.<\/p><h4>Example<a name=\"d3942f58-edb5-48f0-891c-7a7ecec1af6b\"><\/a><\/h4><p>Here is the 5x5 with rational output.  
When <tt>mu<\/tt> is +1, we get a section of a symmetric Hilbert matrix plus 1's on the diagonal.<\/p><pre class=\"codeinput\">   format <span class=\"string\">rat<\/span>\r\n   mu = 1;\r\n   A(n,mu)\r\n<\/pre><pre class=\"codeoutput\">ans =\r\n       8\/7            1\/8            1\/9            1\/10           1\/11    \r\n       1\/8           10\/9            1\/10           1\/11           1\/12    \r\n       1\/9            1\/10          12\/11           1\/12           1\/13    \r\n       1\/10           1\/11           1\/12          14\/13           1\/14    \r\n       1\/11           1\/12           1\/13           1\/14          16\/15    \r\n<\/pre><p>And when <tt>mu<\/tt> is -1, the matrix is far from symmetric. The element in the far southwest corner almost dominates the diagonal.<\/p><pre class=\"codeinput\">   mu = -1;\r\n   A(n,mu)\r\n<\/pre><pre class=\"codeoutput\">ans =\r\n       6\/5            1\/6            1\/7            1\/8            1\/9     \r\n       1\/4            6\/5            1\/6            1\/7            1\/8     \r\n       1\/3            1\/4            6\/5            1\/6            1\/7     \r\n       1\/2            1\/3            1\/4            6\/5            1\/6     \r\n       1              1\/2            1\/3            1\/4            6\/5     \r\n<\/pre><h4>Animation<a name=\"58364797-0112-4455-89cb-4d4e759bbd52\"><\/a><\/h4><p>Here is an animation of the matrix without the addition of <tt>eye(n,n)<\/tt>. 
As <tt>mu<\/tt> goes from -1 to 1, <tt>A(n,mu)-eye(n,n)<\/tt> goes from nonsymmetric to symmetric.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"http:\/\/blogs.mathworks.com\/cleve\/files\/HPLAI_matrix.gif\" alt=\"\"> <\/p><h4>Gaussian elimination<a name=\"781fa356-fe10-46fb-b54d-8924359bdada\"><\/a><\/h4><p>If mu &gt; -1, the largest element is on the diagonal and <tt>lu(A(n,mu))<\/tt> does no pivoting.<\/p><pre class=\"codeinput\">   format <span class=\"string\">short<\/span>\r\n   mu = -1\r\n   [L,U] = lu(A(n,mu))\r\n\r\n   mu = 1\r\n   [L,U] = lu(A(n,mu))\r\n<\/pre><pre class=\"codeoutput\">mu =\r\n    -1\r\nL =\r\n    1.0000         0         0         0         0\r\n    0.2083    1.0000         0         0         0\r\n    0.2778    0.1748    1.0000         0         0\r\n    0.4167    0.2265    0.1403    1.0000         0\r\n    0.8333    0.3099    0.1512    0.0839    1.0000\r\nU =\r\n    1.2000    0.1667    0.1429    0.1250    0.1111\r\n         0    1.1653    0.1369    0.1168    0.1019\r\n         0         0    1.1364    0.1115    0.0942\r\n         0         0         0    1.1058    0.0841\r\n         0         0         0         0    1.0545\r\nmu =\r\n     1\r\nL =\r\n    1.0000         0         0         0         0\r\n    0.1094    1.0000         0         0         0\r\n    0.0972    0.0800    1.0000         0         0\r\n    0.0875    0.0729    0.0626    1.0000         0\r\n    0.0795    0.0669    0.0580    0.0513    1.0000\r\nU =\r\n    1.1429    0.1250    0.1111    0.1000    0.0909\r\n         0    1.0974    0.0878    0.0800    0.0734\r\n         0         0    1.0731    0.0672    0.0622\r\n         0         0         0    1.0581    0.0542\r\n         0         0         0         0    1.0481\r\n<\/pre><h4>Condition<a name=\"532aaf80-f812-47ae-9a20-0b12d101aacb\"><\/a><\/h4><p>Let's see how the 2-condition number, $\\sigma_1\/\\sigma_n$, varies as the parameters vary.  
Computing the singular values for all the matrices in the following plots takes a little over 10 minutes on my laptop.<\/p><h4>sigma_1<a name=\"0245bd74-6fc1-470b-b562-907250503a72\"><\/a><\/h4><pre class=\"codeinput\">   load <span class=\"string\">HPLAI.mat<\/span> <span class=\"string\">s1<\/span> <span class=\"string\">sn<\/span> <span class=\"string\">n<\/span> <span class=\"string\">mu<\/span>\r\n\r\n   surf(mu,n,s1)\r\n   view(25,12)\r\n   xlabel(<span class=\"string\">'mu'<\/span>)\r\n   ylabel(<span class=\"string\">'n'<\/span>)\r\n   set(gca,<span class=\"string\">'xlim'<\/span>,[-0.9 1.0], <span class=\"keyword\">...<\/span>\r\n       <span class=\"string\">'xtick'<\/span>,[-0.9 -0.5 0 0.5 1.0], <span class=\"keyword\">...<\/span>\r\n       <span class=\"string\">'ylim'<\/span>,[0 2500])\r\n   title(<span class=\"string\">'\\sigma_1'<\/span>)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"http:\/\/blogs.mathworks.com\/cleve\/files\/HPLAI_blog_01.png\" alt=\"\"> <p>The colors vary as <tt>mu<\/tt> varies from -0.9 up to 1.0.  I am staying away from <tt>mu<\/tt> = -1.0 where the matrix loses diagonal dominance.  The other horizontal axis is the matrix order <tt>n<\/tt>.  
I have limited <tt>n<\/tt> to 2500 in these experiments.<\/p><p>The point to be made is that if <tt>mu<\/tt> &gt; -0.9, then the largest singular value is bounded, in fact <tt>sigma_1<\/tt> &lt; 2.3.<\/p><h4>sigma_n<a name=\"574d19eb-ad2d-4ebd-b451-8141696e0431\"><\/a><\/h4><pre class=\"codeinput\">   surf(mu,n,sn)\r\n   view(25,12)\r\n   xlabel(<span class=\"string\">'mu'<\/span>)\r\n   ylabel(<span class=\"string\">'n'<\/span>)\r\n   set(gca,<span class=\"string\">'xlim'<\/span>,[-0.9 1.0], <span class=\"keyword\">...<\/span>\r\n       <span class=\"string\">'xtick'<\/span>,[-0.9 -0.5 0 0.5 1.0], <span class=\"keyword\">...<\/span>\r\n       <span class=\"string\">'ylim'<\/span>,[0 2500])\r\n   title(<span class=\"string\">'\\sigma_n'<\/span>)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"http:\/\/blogs.mathworks.com\/cleve\/files\/HPLAI_blog_02.png\" alt=\"\"> <p>If <tt>mu<\/tt> &gt; -0.9, then the smallest singular value is bounded away from zero, in fact <tt>sigma_n<\/tt> &gt; .75.<\/p><h4>sigma_1\/sigma_n<a name=\"8e373526-14f7-4cf3-b08e-807cc5106d2f\"><\/a><\/h4><pre class=\"codeinput\">   surf(mu,n,s1.\/sn)\r\n   view(25,12)\r\n   xlabel(<span class=\"string\">'mu'<\/span>)\r\n   ylabel(<span class=\"string\">'n'<\/span>)\r\n   set(gca,<span class=\"string\">'xlim'<\/span>,[-0.9 1.0], <span class=\"keyword\">...<\/span>\r\n       <span class=\"string\">'xtick'<\/span>,[-0.9 -0.5 0 0.5 1.0], <span class=\"keyword\">...<\/span>\r\n       <span class=\"string\">'ylim'<\/span>,[0 2500])\r\n   title(<span class=\"string\">'\\sigma_1\/\\sigma_n'<\/span>)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"http:\/\/blogs.mathworks.com\/cleve\/files\/HPLAI_blog_03.png\" alt=\"\"> <p>If <tt>mu<\/tt> &gt; -0.9, then the condition number in the 2-norm is bounded, <tt>sigma_1\/sigma_n<\/tt> &lt; 2.3\/.75 $\\approx$ 3.0 .<\/p><h4>Overflow<a name=\"d064d4d3-26d6-4255-b4d3-e4f6669b328b\"><\/a><\/h4><p>The HPL-AI benchmark would begin by 
computing the LU decomposition of this matrix using 16-bit <a href=\"https:\/\/blogs.mathworks.com\/cleve\/2019\/02\/18\/experiments-with-variable-format-half-precision\/\">half-precision floating point arithmetic<\/a>, FP16. So, there is a significant problem if the matrix order <tt>n<\/tt> is larger than 65504. This value is <tt>realmax<\/tt>, the largest number that can be represented in FP16.  Any larger value of <tt>n<\/tt> overflows to <tt>Inf<\/tt>.<\/p><p>Some kind of scaling is going to have to be done for <tt>n<\/tt> &gt; 65504. Right now, I don't see what this might be.<\/p><p>The alternative half-precision format to FP16 is Bfloat16, which has three more bits in the exponent to give <tt>realmax<\/tt> = 3.39e38. There should be no overflow problems while generating this matrix with Bfloat16.<\/p><h4>Thanks<a name=\"a031341e-49db-4d70-ba25-4115efd35419\"><\/a><\/h4><p>Thanks to Piotr Luszczek for working with me on this.  We have more work to do.<\/p><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br>Published with MATLAB&reg; R2019b<br><\/p><\/div>","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img src=\"https:\/\/blogs.mathworks.com\/cleve\/files\/HPLAI_blog_03.png\" class=\"img-responsive attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"\" decoding=\"async\" loading=\"lazy\" \/><\/div><!--introduction--><p>My colleagues are looking for a matrix to be used in a new benchmark. They've come to the right place.... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/cleve\/2019\/12\/04\/a-matrix-for-the-new-hpl-ai-benchmark\/\">read more >><\/a><\/p>","protected":false},"author":78,"featured_media":5611,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[6,16,14,7,19],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/5601"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/users\/78"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/comments?post=5601"}],"version-history":[{"count":2,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/5601\/revisions"}],"predecessor-version":[{"id":5641,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/5601\/revisions\/5641"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/media\/5611"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/media?parent=5601"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/categories?post=5601"},{"taxonomy":"post_tag","embeddable":true,"hre
f":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/tags?post=5601"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
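The construction of the proposed benchmark matrix A(n,mu) and its no-pivoting property are easy to reproduce outside MATLAB. Below is a minimal NumPy sketch; this translation, and the name `hplai_matrix`, are mine, not code from the post. NumPy broadcasting plays the role of MATLAB's singleton expansion in forming the outer sum `i + mu*j`, and the checks mirror the post's claims: for `mu > -1` the largest element of each row lies on the diagonal, and the 2-norm condition number stays small.

```python
# Sketch (assumed translation of the post's MATLAB A(n,mu), not official code):
#   A = 1./(i + mu*j + n) + eye(n),  i = 1:n a row vector, j = i' a column vector.
import numpy as np

def hplai_matrix(n, mu):
    """Candidate HPL-AI benchmark matrix; mu is usually in [-1, 1]."""
    i = np.arange(1, n + 1)        # row vector of indices 1..n
    j = i.reshape(-1, 1)           # column vector, j = i'
    # Broadcasting forms the "outer sum" i + mu*j, an n-by-n matrix.
    return 1.0 / (i + mu * j + n) + np.eye(n)

A = hplai_matrix(100, 0.5)

# For mu > -1 the diagonal entry 1 + 1/((1+mu)*r + n) exceeds every
# off-diagonal entry in its row, so LU needs no pivoting.
row_max_on_diag = bool(np.all(np.argmax(np.abs(A), axis=1) == np.arange(100)))

# 2-norm condition number sigma_1/sigma_n; the post reports < ~3 for mu > -0.9.
cond = np.linalg.cond(A)
```

For `mu = 0.5` and `n = 100` the condition number should come out comfortably below the ~3 bound observed in the surface plots, consistent with the requirement of a smallish condition number in the email.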