<div dir="ltr">Michael,<div>The problem looks very strange. We haven&#39;t come across such an issue in glusterfs so far. However, I do recall seeing bit flips like this at a customer site in the past, and in the end they were diagnosed as a hardware issue. Can you retry a few runs of the same rsync directly to the backends through nfs or rsyncd (without involving gluster or ssh), from the same client and with the same data set, to both servers, and see if you can reproduce the md5sum/sha1sum mismatch?</div>
<div><br></div><div>Avati</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Aug 26, 2013 at 7:03 AM, Michael Peek <span dir="ltr">&lt;<a href="mailto:peek@nimbios.org" target="_blank">peek@nimbios.org</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div text="#000000" bgcolor="#FFFFFF">
    Hi gurus,<br>
    <br>
    This is a follow-up to a previous report about data integrity
    problems with Gluster 3.4.0.  I will be as thorough as I can, but
    this is already a pretty long post.  So feel free to see my previous
    post for more information specific to my previous run of tests.<br>
    <br>
    <ol>
      <li>I am running a fully up-to-date version of Ubuntu 12.04, with
        Gluster 3.4.0final-ubuntu1~precise1.<br>
        <br>
      </li>
      <li>My cluster consists of four nodes.  Each node consists of:</li>
      <ol>
        <li>Hostnames: bkupc1-a -to- bkupc1-d</li>
        <li>Bricks: Each host has /export/a/glusterfs/ and
          /export/b/glusterfs/, which are 4TB ext4 drives</li>
        <li>Clients: I have a client that mounts the volume as
          /data/bkupc1/ using the fuse driver.<br>
          <br>
        </li>
      </ol>
      <li>My volume was created with:<br>
        /usr/sbin/gluster peer probe bkupc1-a<br>
        /usr/sbin/gluster peer probe bkupc1-b<br>
        /usr/sbin/gluster peer probe bkupc1-c<br>
        /usr/sbin/gluster peer probe bkupc1-d<br>
        /usr/sbin/gluster volume create bkupc1 replica 2 transport tcp \<br>
            bkupc1-a:/export/a/glusterfs   bkupc1-b:/export/a/glusterfs
        \<br>
            bkupc1-c:/export/a/glusterfs   bkupc1-d:/export/a/glusterfs
        \<br>
            bkupc1-a:/export/b/glusterfs   bkupc1-b:/export/b/glusterfs
        \<br>
            bkupc1-c:/export/b/glusterfs   bkupc1-d:/export/b/glusterfs<br>
        /usr/sbin/gluster volume set bkupc1 auth.allow {list of IP
        addresses}<br>
        <br>
      </li>
      <li>On the client I have a 1TB drive filled with 900+GB of data in
        156,554 test files.  These files are encrypted backups that are
        dispersed throughout many subdirectories.  They are ugly to look
        at.  Here&#39;s an example:<br>
        <br>
        data/<br>
        884b9a38-0443-11e3-b8fb-f46d04e15793/<br>
        884a7040-0443-11e3-b8fb-f46d04e15793/<br>
        8825c6c8-0443-11e3-b8fb-f46d04e15793/<br>
        880f8f0c-0443-11e3-b8fb-f46d04e15793/<br>
iMmV,UqdiqZRie5QUu341iRS7s,-OK7PzXSuPgr0o30yNDXNG6uvqA0Wyr7RRR3MBE4<br>
        <br>
        &lt;Line breaks for readability&gt;<br>
        <br>
        I have pre-calculated MD5 and SHA1 checksums for all of these
        files, and I have verified that the checksums are correct on the
        client drive.<br>
        <br>
      </li>
      <li>My first set of runs involved using rsync.  Nothing fancy
        here:</li>
      <ol>
        <li>The volume is empty when I begin</li>
        <li>I create /data/bkupc1/BACKUPS-rsync.${timestamp}/</li>
        <li>Use rsync to copy files from the client to the volume</li>
        <li>Here&#39;s my script:<br>
          <tt>#!/bin/bash -x</tt><tt><br>
          </tt><tt><br>
          </tt><tt>timestamp=&quot;${1}&quot;</tt><tt><br>
          </tt><tt><br>
          </tt><tt>/bin/date</tt><tt><br>
          </tt><tt><br>
          </tt><tt>mkdir /data/bkupc1/BACKUPS-rsync.${timestamp}</tt><tt><br>
          </tt><tt><br>
          </tt><tt>rsync \</tt><tt><br>
          </tt><tt>    -a \</tt><tt><br>
          </tt><tt>    -v \</tt><tt><br>
          </tt><tt>    --delete \</tt><tt><br>
          </tt><tt>    --delete-excluded \</tt><tt><br>
          </tt><tt>    --force \</tt><tt><br>
          </tt><tt>    --ignore-errors \</tt><tt><br>
          </tt><tt>    --one-file-system \</tt><tt><br>
          </tt><tt>    --stats \</tt><tt><br>
          </tt><tt>    --inplace \</tt><tt><br>
          </tt><tt>    ./ \</tt><tt><br>
          </tt><tt>    /data/bkupc1/BACKUPS-rsync.${timestamp}/ \</tt><tt><br>
          </tt><tt>    #</tt><tt><br>
          </tt><tt><br>
          </tt><tt>/bin/date</tt><tt><br>
          </tt><tt><br>
          </tt><tt>(\</tt><tt><br>
          </tt><tt>    cd /data/bkupc1/BACKUPS-rsync.${timestamp}/ \</tt><tt><br>
          </tt><tt>    &amp;&amp; md5sum -c --quiet md5sums \</tt><tt><br>
          </tt><tt>)</tt><tt><br>
          </tt><tt><br>
          </tt><tt>/bin/date</tt><tt><br>
          </tt><tt><br>
          </tt><tt>(\</tt><tt><br>
          </tt><tt>    cd /data/bkupc1/BACKUPS-rsync.${timestamp}/ \</tt><tt><br>
          </tt><tt>    &amp;&amp; sha1sum -c --quiet sha1sums \</tt><tt><br>
          </tt><tt>)</tt><tt><br>
          </tt><tt><br>
          </tt><tt>/bin/date</tt><tt><br>
          </tt><tt><br>
          </tt><tt>/usr/bin/diff -r -q ./
            /data/bkupc1/BACKUPS-rsync.${timestamp}/</tt><tt><br>
          </tt><tt><br>
          </tt><tt>/bin/date</tt><br>
          <br>
        </li>
        <li>As you can see from the script, after rsyncing, I check the
          files on the volume</li>
        <ol>
          <li>Against their MD5 checksums</li>
          <li>Then against their SHA1 checksums</li>
          <li>Then, just to beat a dead horse, I use diff to do a
            byte-for-byte check between the files on the client and the
            files on the volume.  (Note to self: I should replace diff
            with cmp, as I have run into &quot;out of memory&quot; errors with
            diff on files that cmp can handle just fine.)<br>
            <br>
          </li>
        </ol>
        <li>What I have found is that about 50% of the time, there will
          be one or two files out of those 156,554 that differ.  I
          documented my findings in more detail in my previous email.<br>
          <br>
        </li>
      </ol>
      <li>One thought that occurred to me is that this could be the fault
        of rsync.  So I have repeated the tests using plain old
        /bin/cp.  Here&#39;s my (very similar) script:<br>
        <tt>#!/bin/bash -x</tt><tt><br>
        </tt><tt><br>
        </tt><tt>timestamp=&quot;${1}&quot;</tt><tt><br>
        </tt><tt><br>
        </tt><tt>/bin/date</tt><tt><br>
        </tt><tt><br>
        </tt><tt>mkdir /data/bkupc1/BACKUPS-cp.${timestamp}</tt><tt><br>
        </tt><tt><br>
        </tt><tt>/bin/cp -ar ./ /data/bkupc1/BACKUPS-cp.${timestamp}/</tt><tt><br>
        </tt><tt><br>
        </tt><tt>/bin/date</tt><tt><br>
        </tt><tt><br>
        </tt><tt>(\</tt><tt><br>
        </tt><tt>    cd /data/bkupc1/BACKUPS-cp.${timestamp}/ \</tt><tt><br>
        </tt><tt>    &amp;&amp; md5sum -c --quiet md5sums \</tt><tt><br>
        </tt><tt>)</tt><tt><br>
        </tt><tt><br>
        </tt><tt>/bin/date</tt><tt><br>
        </tt><tt><br>
        </tt><tt>(\</tt><tt><br>
        </tt><tt>    cd /data/bkupc1/BACKUPS-cp.${timestamp}/ \</tt><tt><br>
        </tt><tt>    &amp;&amp; sha1sum -c --quiet sha1sums \</tt><tt><br>
        </tt><tt>)</tt><tt><br>
        </tt><tt><br>
        </tt><tt>/bin/date</tt><tt><br>
        </tt><tt><br>
        </tt><tt>/usr/bin/diff -r -q ./
          /data/bkupc1/BACKUPS-cp.${timestamp}/</tt><tt><br>
        </tt><tt><br>
        </tt><tt>/bin/date</tt><br>
        <br>
      </li>
      <li>Results:</li>
      <ol>
        <li>Output from the script:<tt><br>
          </tt><tt>+ timestamp=20130821-081918</tt><tt><br>
          </tt><tt>+ /bin/date</tt><tt><br>
          </tt><tt>Wed Aug 21 08:19:18 EDT 2013</tt><tt><br>
          </tt><tt>+ mkdir /data/bkupc1/BACKUPS-cp.20130821-081918</tt><tt><br>
          </tt><tt>+ /bin/cp -ar ./
            /data/bkupc1/BACKUPS-cp.20130821-081918/</tt><tt><br>
          </tt><tt>+ /bin/date</tt><tt><br>
          </tt><tt>Wed Aug 21 13:51:53 EDT 2013</tt><tt><br>
          </tt><tt>+ cd /data/bkupc1/BACKUPS-cp.20130821-081918/</tt><tt><br>
          </tt><tt>+ md5sum -c --quiet md5sums</tt><tt><br>
          </tt><tt>data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87f54d22-0443-11e3-b8fb-f46d04e15793/KAfe4MUAlmO-Lt4N0KqVQTtf0im3mcoTuAyJvSP,t0o2Lc,FGce49pe9wEPDiIIt201oEks-taGDbc5-Nph6AacR:
            FAILED</tt><tt><br>
          </tt><tt>data/a34bc588-0443-11e3-b8fb-f46d04e15793/a34a8bf0-0443-11e3-b8fb-f46d04e15793/a3494b3c-0443-11e3-b8fb-f46d04e15793/a34808b2-0443-11e3-b8fb-f46d04e15793/a346cd08-0443-11e3-b8fb-f46d04e15793/a3456e2c-0443-11e3-b8fb-f46d04e15793/a344366a-0443-11e3-b8fb-f46d04e15793/8c9e94a0-0443-11e3-b8fb-f46d04e15793/NLrXi5u80FoUV6Gi2ouEybAebOgnF7p1PtEYmPbd0huh,1:
            FAILED</tt><tt><br>
          </tt><tt>md5sum: WARNING: 2 computed checksums did NOT match</tt><tt><br>
          </tt><tt>+ /bin/date</tt><tt><br>
          </tt><tt>Wed Aug 21 16:54:13 EDT 2013</tt><tt><br>
          </tt><tt>+ cd /data/bkupc1/BACKUPS-cp.20130821-081918/</tt><tt><br>
          </tt><tt>+ sha1sum -c --quiet sha1sums</tt><tt><br>
          </tt><tt>data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/8825c6c8-0443-11e3-b8fb-f46d04e15793/8810e21c-0443-11e3-b8fb-f46d04e15793/7LOu,NZ5eMXxxrqjZHv5a9-4aHd641hN2tGaneMa1D2Kl9wLXf1f71nX6g-8ps2BpABovO7w68Wy63pH0gU3yLnyLEfFfT25Zk5jNvpDU6eQ,1:
            FAILED</tt><tt><br>
          </tt><tt>sha1sum: WARNING: 1 computed checksum did NOT match</tt><tt><br>
          </tt><tt>+ /bin/date</tt><tt><br>
          </tt><tt>Wed Aug 21 19:54:29 EDT 2013</tt><tt><br>
          </tt><tt>+ /usr/bin/diff -r -q ./
            /data/bkupc1/BACKUPS-cp.20130821-081918/</tt><tt><br>
          </tt><tt>+ /bin/date</tt><tt><br>
          </tt><tt>Thu Aug 22 00:16:53 EDT 2013</tt><br>
          <br>
        </li>
        <li>A listing of files that were reported as different (line
          breaks for readability):</li>
        <ol>
          <li>MD5 failure:<br>
            data/<br>
            884b9a38-0443-11e3-b8fb-f46d04e15793/<br>
            884a7040-0443-11e3-b8fb-f46d04e15793/<br>
            87fdc790-0443-11e3-b8fb-f46d04e15793/<br>
            87f54d22-0443-11e3-b8fb-f46d04e15793/<br>
KAfe4MUAlmO-Lt4N0KqVQTtf0im3mcoTuAyJvSP,t0o2Lc,FGce49pe9wEPDiIIt201oEks-taGDbc5-Nph6AacR<br>
            <br>
          </li>
          <li>MD5 failure:<br>
            data/<br>
            a34bc588-0443-11e3-b8fb-f46d04e15793/<br>
            a34a8bf0-0443-11e3-b8fb-f46d04e15793/<br>
            a3494b3c-0443-11e3-b8fb-f46d04e15793/<br>
            a34808b2-0443-11e3-b8fb-f46d04e15793/<br>
            a346cd08-0443-11e3-b8fb-f46d04e15793/<br>
            a3456e2c-0443-11e3-b8fb-f46d04e15793/<br>
            a344366a-0443-11e3-b8fb-f46d04e15793/<br>
            8c9e94a0-0443-11e3-b8fb-f46d04e15793/<br>
            NLrXi5u80FoUV6Gi2ouEybAebOgnF7p1PtEYmPbd0huh,1<br>
            <br>
          </li>
          <li>SHA1 failure:<br>
            data/<br>
            884b9a38-0443-11e3-b8fb-f46d04e15793/<br>
            884a7040-0443-11e3-b8fb-f46d04e15793/<br>
            8825c6c8-0443-11e3-b8fb-f46d04e15793/<br>
            8810e21c-0443-11e3-b8fb-f46d04e15793/<br>
7LOu,NZ5eMXxxrqjZHv5a9-4aHd641hN2tGaneMa1D2Kl9wLXf1f71nX6g-8ps2BpABovO7w68Wy63pH0gU3yLnyLEfFfT25Zk5jNvpDU6eQ,1<br>
            <br>
          </li>
        </ol>
        <li>A byte-for-byte comparison:</li>
        <ol>
          <li>File from 7.2.1 above:<br>
(KAfe4MUAlmO-Lt4N0KqVQTtf0im3mcoTuAyJvSP,t0o2Lc,FGce49pe9wEPDiIIt201oEks-taGDbc5-Nph6AacR)</li>
          <ol>
            <li>After the test, this file exists in three locations:<br>
              client:/export/d/eraseme/  &lt;-- the original<br>
              bkupc1-a:/export/a/glusterfs/  &lt;-- replicated volume
              copy 1 of 2<br>
              bkupc1-b:/export/a/glusterfs/  &lt;-- replicated volume
              copy 2 of 2<br>
              <br>
            </li>
            <li>MD5sums:<br>
              68ce7073e462fda42d4b551a843bd71f &lt;-- bkupc1-a (directly
              from the brick)<br>
              68ce7073e462fda42d4b551a843bd71f &lt;-- bkupc1-b (directly
              from the brick)<br>
              68ce7073e462fda42d4b551a843bd71f &lt;-- client<br>
              68ce7073e462fda42d4b551a843bd71f &lt;-- volume (from the
              mount via the fuse driver)<br>
              <br>
              NOTE: There is no difference between the MD5 checksums<br>
              <br>
            </li>
            <li>SHA1sums:<br>
              c5c59c18f5cc0c1b6e4dd80b2d41fc3bc7148509 &lt;-- bkupc1-a<br>
              c5c59c18f5cc0c1b6e4dd80b2d41fc3bc7148509 &lt;-- bkupc1-b<br>
              c5c59c18f5cc0c1b6e4dd80b2d41fc3bc7148509 &lt;-- client<br>
              c5c59c18f5cc0c1b6e4dd80b2d41fc3bc7148509 &lt;-- volume<br>
              <br>
              NOTE: There is no difference between the SHA1 checksums<br>
              <br>
            </li>
            <li>Both /usr/bin/diff and /usr/bin/cmp report no difference
              between these files.<br>
              <br>
            </li>
          </ol>
          <li>File 7.2.2 from above:<br>
            (NLrXi5u80FoUV6Gi2ouEybAebOgnF7p1PtEYmPbd0huh,1)</li>
          <ol>
            <li>After the test, this file exists in three locations:<br>
              client:/export/d/eraseme/  &lt;-- the original<br>
              bkupc1-a:/export/a/glusterfs/  &lt;-- replicated volume
              copy 1 of 2<br>
              bkupc1-b:/export/a/glusterfs/  &lt;-- replicated volume
              copy 2 of 2<br>
              <br>
            </li>
            <li>MD5sums:<br>
              78696407263ef75ae2795ed7cb4eb24a &lt;-- bkupc1-a<br>
              77fdce4ebe9e94f611848d174de01357 &lt;-- bkupc1-b<br>
              78696407263ef75ae2795ed7cb4eb24a &lt;-- client<br>
              78696407263ef75ae2795ed7cb4eb24a &lt;-- volume<br>
              <br>
            </li>
            <li>SHA1sums:<br>
              de93bcc7b64458926505dfc5ac4c597f3fefe6db &lt;-- bkupc1-a<br>
              0254c117b92ca95987aa7389980fb0bcc850e9c5 &lt;-- bkupc1-b<br>
              de93bcc7b64458926505dfc5ac4c597f3fefe6db &lt;-- client<br>
              de93bcc7b64458926505dfc5ac4c597f3fefe6db &lt;-- volume<br>
              <br>
            </li>
            <li>Byte differences: The output of /usr/bin/cmp -l, when
              comparing the version of the file on the client with the
              version of the file on bkupc1-b:<br>
              <br>
               3262724555 274 234<br>
              <br>
              If I&#39;m reading this right, then this means that the files
              differ by only one byte (274 vs. 234).<br>
              <br>
            </li>
          </ol>
          <li>File 7.2.3 from above:<br>
(7LOu,NZ5eMXxxrqjZHv5a9-4aHd641hN2tGaneMa1D2Kl9wLXf1f71nX6g-8ps2BpABovO7w68Wy63pH0gU3yLnyLEfFfT25Zk5jNvpDU6eQ,1)</li>
          <ol>
            <li>After the test, this file exists in three locations:<br>
              client:/export/d/eraseme/  &lt;-- the original<br>
              bkupc1-c:/export/a/glusterfs/  &lt;-- replicated volume
              copy 1 of 2<br>
              bkupc1-d:/export/a/glusterfs/  &lt;-- replicated volume
              copy 2 of 2<br>
              <br>
            </li>
            <li>MD5sums:<br>
              05ea9e04df984cc7ed514f93dc79067e  &lt;-- bkupc1-c<br>
              868c0eafa2bde7386d808b722166a283  &lt;-- bkupc1-d<br>
              05ea9e04df984cc7ed514f93dc79067e  &lt;-- client<br>
              868c0eafa2bde7386d808b722166a283  &lt;-- volume<br>
              <br>
            </li>
            <li>SHA1sums:<br>
              a6cf53c106b1856826db8de8b947273b05eb6391  &lt;-- bkupc1-c<br>
              f56e01f982028eed9c115f0696346861bf3b7169  &lt;-- bkupc1-d<br>
              a6cf53c106b1856826db8de8b947273b05eb6391  &lt;-- client<br>
              f56e01f982028eed9c115f0696346861bf3b7169  &lt;-- volume<br>
              <br>
            </li>
            <li>Byte differences: The output of /usr/bin/cmp -l, when
              comparing the version of the file on the client with the
              version of the file on bkupc1-d:<br>
              <br>
                181361479 226 206<br>
              <br>
            </li>
          </ol>
        </ol>
      </ol>
    </ol>
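    <p>Following the note under step 5 about replacing diff with cmp,
      the byte-for-byte pass could look roughly like the sketch below.
      The function name compare_trees is made up for illustration; it
      walks every regular file under the source tree and runs cmp -s
      against the same relative path in the destination tree.</p>

```shell
#!/bin/bash
# compare_trees SRC DST  (hypothetical helper, not part of any package)
# For every regular file under SRC, compare it byte-for-byte with the
# file at the same relative path under DST. Prints one "DIFFERS:" line
# for each file that differs or is missing in DST.
compare_trees() {
    local src="$1" dst="$2" f rel
    find "$src" -type f -print0 |
    while IFS= read -r -d '' f; do
        rel="${f#"$src"/}"            # path relative to SRC
        # cmp -s prints no comparison output; only its exit status is used
        cmp -s "$f" "$dst/$rel" || echo "DIFFERS: $rel"
    done
}

# Example, run from the source directory as in the scripts above:
# compare_trees . "/data/bkupc1/BACKUPS-cp.${timestamp}"
```

    <p>Unlike diff -r -q, this reports only files that exist on the
      source side, so extra files on the volume would go unnoticed.</p>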
    <p>For 7.3.1, md5sum reported a difference between the client and
      the volume, even though re-comparing it with md5sum, sha1sum, cmp,
      and diff on both the mounted volume and the individual bricks
      showed no difference at all.  This would imply that there may be a
      data read error somewhere in gluster.<br>
    </p>
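    <p>One way to probe a transient mismatch like 7.3.1 would be to
      checksum the same file several times and count how many distinct
      digests come back.  This is only a sketch: the function name
      distinct_digests and the default of 5 runs are assumptions, and
      repeated reads may be served from cache rather than from the
      bricks, so a stable result does not rule out a read problem.</p>

```shell
#!/bin/bash
# distinct_digests FILE [RUNS]  (hypothetical helper)
# Run md5sum on FILE RUNS times (default 5, an arbitrary choice) and
# print the number of distinct digests seen. 1 means every read agreed.
distinct_digests() {
    local file="$1" runs="${2:-5}" i=0
    while [ "$i" -lt "$runs" ]; do
        md5sum "$file" | cut -d' ' -f1   # keep only the hex digest
        i=$(( i + 1 ))
    done | sort -u | wc -l
}

# Example (path is a placeholder for one of the files that FAILED):
# distinct_digests "/data/bkupc1/BACKUPS-cp.20130821-081918/path/to/file" 10
```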
    <p>For 7.3.2 and 7.3.3, one replica brick in each pair differed from
      the client copy by exactly one byte (if I&#39;ve read the output
      from cmp correctly); for 7.3.3 the mounted volume also returned
      the differing copy.  This would imply that there may be a data
      write error in gluster.<br>
    </p>
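    <p>As a check on the reading of the cmp output: cmp -l prints the
      byte offset in decimal and the two differing byte values in
      octal.  Here is a minimal sketch that decodes the two lines quoted
      above and counts how many bits differ in each byte.  The function
      name flipped_bits is made up for this illustration.</p>

```shell
#!/bin/bash
# flipped_bits A B  (hypothetical helper)
# A and B are byte values in octal, as printed by cmp -l.
# Prints the number of bit positions in which they differ.
flipped_bits() {
    local x=$(( 8#$1 ^ 8#$2 ))   # XOR leaves a 1 in each differing bit
    local n=0
    while [ "$x" -ne 0 ]; do
        n=$(( n + (x % 2) ))     # add the lowest bit
        x=$(( x / 2 ))           # shift right by one
    done
    echo "$n"
}

# The two cmp -l lines from 7.3.2 and 7.3.3: offset, client byte, brick byte.
printf '%s\n' '3262724555 274 234' '181361479 226 206' |
while read -r offset a b; do
    printf 'offset %s: 0x%02x vs 0x%02x, %s bit(s) differ\n' \
        "$offset" "$(( 8#$a ))" "$(( 8#$b ))" "$(flipped_bits "$a" "$b")"
done
```

    <p>For both lines this reports one differing bit (0xbc vs 0x9c and
      0x96 vs 0x86), so each mismatch would be a single-bit difference
      rather than a scrambled byte.  That by itself does not say where
      in the path the bit changed.</p>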
    (And please, for the love of Pete, tell me if I have done anything
    stupid, b/c I really wanted gluster to be the silver bullet solution
    to my data storage problems.)<span class="HOEnZb"><font color="#888888"><br>
    <p>Michael Peek<br>
    </p>
  </font></span></div>

</blockquote></div><br></div>