<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    Hi gurus,<br>
    <br>
    This is a follow-up to my previous report about data integrity
    problems with Gluster 3.4.0.&nbsp; I will be as thorough as I can, but
    since this is already a long post, please see my previous post for
    details specific to my earlier test runs.<br>
    <br>
    <ol>
      <li>I am running a fully up-to-date version of Ubuntu 12.04, with
        Gluster 3.4.0final-ubuntu1~precise1.<br>
        <br>
      </li>
      <li>My cluster consists of four nodes.&nbsp; The setup:</li>
      <ol>
        <li>Hostnames: bkupc1-a through bkupc1-d</li>
        <li>Bricks: Each host has /export/a/glusterfs/ and
          /export/b/glusterfs/, which are 4TB ext4 drives</li>
        <li>Clients: I have a client that mounts the volume as
          /data/bkupc1/ using the fuse driver.<br>
          <br>
        </li>
      </ol>
      <li>My volume was created with:<br>
        /usr/sbin/gluster peer probe bkupc1-a<br>
        /usr/sbin/gluster peer probe bkupc1-b<br>
        /usr/sbin/gluster peer probe bkupc1-c<br>
        /usr/sbin/gluster peer probe bkupc1-d<br>
        /usr/sbin/gluster volume create bkupc1 replica 2 transport tcp \<br>
        &nbsp;&nbsp;&nbsp; bkupc1-a:/export/a/glusterfs &nbsp; bkupc1-b:/export/a/glusterfs
        \<br>
        &nbsp;&nbsp;&nbsp; bkupc1-c:/export/a/glusterfs &nbsp; bkupc1-d:/export/a/glusterfs
        \<br>
        &nbsp;&nbsp;&nbsp; bkupc1-a:/export/b/glusterfs &nbsp; bkupc1-b:/export/b/glusterfs
        \<br>
        &nbsp;&nbsp;&nbsp; bkupc1-c:/export/b/glusterfs &nbsp; bkupc1-d:/export/b/glusterfs<br>
        /usr/sbin/gluster volume set bkupc1 auth.allow {list of IP
        addresses}<br>
        <br>
      </li>
      <li>On the client I have a 1TB drive filled with 900+GB of data in
        156,554 test files.&nbsp; These files are encrypted backups that are
        dispersed throughout many subdirectories.&nbsp; They are ugly to look
        at.&nbsp; Here's an example:<br>
        <br>
        data/<br>
        884b9a38-0443-11e3-b8fb-f46d04e15793/<br>
        884a7040-0443-11e3-b8fb-f46d04e15793/<br>
        8825c6c8-0443-11e3-b8fb-f46d04e15793/<br>
        880f8f0c-0443-11e3-b8fb-f46d04e15793/<br>
iMmV,UqdiqZRie5QUu341iRS7s,-OK7PzXSuPgr0o30yNDXNG6uvqA0Wyr7RRR3MBE4<br>
        <br>
        &lt;Line breaks for readability&gt;<br>
        <br>
        I have pre-calculated MD5 and SHA1 checksums for all of these
        files, and I have verified that the checksums are correct on the
        client drive.<br>
        <br>
      </li>
      <li>My first set of runs involved using rsync.&nbsp; Nothing fancy
        here:</li>
      <ol>
        <li>The volume is empty when I begin</li>
        <li>I create /data/bkupc1/BACKUPS-rsync.${timestamp}/</li>
        <li>Use rsync to copy files from the client to the volume</li>
        <li>Here's my script:<br>
          <tt>#!/bin/bash -x</tt><tt><br>
          </tt><tt><br>
          </tt><tt>timestamp="${1}"</tt><tt><br>
          </tt><tt><br>
          </tt><tt>/bin/date</tt><tt><br>
          </tt><tt><br>
          </tt><tt>mkdir /data/bkupc1/BACKUPS-rsync.${timestamp}</tt><tt><br>
          </tt><tt><br>
          </tt><tt>rsync \</tt><tt><br>
          </tt><tt>&nbsp;&nbsp;&nbsp; -a \</tt><tt><br>
          </tt><tt>&nbsp;&nbsp;&nbsp; -v \</tt><tt><br>
          </tt><tt>&nbsp;&nbsp;&nbsp; --delete \</tt><tt><br>
          </tt><tt>&nbsp;&nbsp;&nbsp; --delete-excluded \</tt><tt><br>
          </tt><tt>&nbsp;&nbsp;&nbsp; --force \</tt><tt><br>
          </tt><tt>&nbsp;&nbsp;&nbsp; --ignore-errors \</tt><tt><br>
          </tt><tt>&nbsp;&nbsp;&nbsp; --one-file-system \</tt><tt><br>
          </tt><tt>&nbsp;&nbsp;&nbsp; --stats \</tt><tt><br>
          </tt><tt>&nbsp;&nbsp;&nbsp; --inplace \</tt><tt><br>
          </tt><tt>&nbsp;&nbsp;&nbsp; ./ \</tt><tt><br>
          </tt><tt>&nbsp;&nbsp;&nbsp; /data/bkupc1/BACKUPS-rsync.${timestamp}/ \</tt><tt><br>
          </tt><tt>&nbsp;&nbsp;&nbsp; #</tt><tt><br>
          </tt><tt><br>
          </tt><tt>/bin/date</tt><tt><br>
          </tt><tt><br>
          </tt><tt>(\</tt><tt><br>
          </tt><tt>&nbsp;&nbsp;&nbsp; cd /data/bkupc1/BACKUPS-rsync.${timestamp}/ \</tt><tt><br>
          </tt><tt>&nbsp;&nbsp;&nbsp; &amp;&amp; md5sum -c --quiet md5sums \</tt><tt><br>
          </tt><tt>)</tt><tt><br>
          </tt><tt><br>
          </tt><tt>/bin/date</tt><tt><br>
          </tt><tt><br>
          </tt><tt>(\</tt><tt><br>
          </tt><tt>&nbsp;&nbsp;&nbsp; cd /data/bkupc1/BACKUPS-rsync.${timestamp}/ \</tt><tt><br>
          </tt><tt>&nbsp;&nbsp;&nbsp; &amp;&amp; sha1sum -c --quiet sha1sums \</tt><tt><br>
          </tt><tt>)</tt><tt><br>
          </tt><tt><br>
          </tt><tt>/bin/date</tt><tt><br>
          </tt><tt><br>
          </tt><tt>/usr/bin/diff -r -q ./
            /data/bkupc1/BACKUPS-rsync.${timestamp}/</tt><tt><br>
          </tt><tt><br>
          </tt><tt>/bin/date</tt><br>
          <br>
        </li>
        <li>As you can see from the script, after rsyncing, I check the
          files on the volume</li>
        <ol>
          <li>Against their MD5 checksums</li>
          <li>Then against their SHA1 checksums</li>
          <li>Then, just to beat a dead horse, I use diff to do a
            byte-for-byte check between the files on the client and the
            files on the volume.&nbsp; (Note to self: I should replace diff
            with cmp, as I have run into "out of memory" errors with
            diff on files that cmp can handle just fine.)<br>
            <br>
          </li>
        </ol>
        <li>What I have found is that about 50% of the time, there will
          be one or two files out of those 156,554 that differ.&nbsp; I
          documented my findings in more detail in my previous email.<br>
          <br>
        </li>
      </ol>
      <li>One thought that occurred to me is that this could be the fault
        of rsync.&nbsp; So I have repeated the tests using plain old
        /bin/cp.&nbsp; Here's my (very similar) script:<br>
        <tt>#!/bin/bash -x</tt><tt><br>
        </tt><tt><br>
        </tt><tt>timestamp="${1}"</tt><tt><br>
        </tt><tt><br>
        </tt><tt>/bin/date</tt><tt><br>
        </tt><tt><br>
        </tt><tt>mkdir /data/bkupc1/BACKUPS-cp.${timestamp}</tt><tt><br>
        </tt><tt><br>
        </tt><tt>/bin/cp -ar ./ /data/bkupc1/BACKUPS-cp.${timestamp}/</tt><tt><br>
        </tt><tt><br>
        </tt><tt>/bin/date</tt><tt><br>
        </tt><tt><br>
        </tt><tt>(\</tt><tt><br>
        </tt><tt>&nbsp;&nbsp;&nbsp; cd /data/bkupc1/BACKUPS-cp.${timestamp}/ \</tt><tt><br>
        </tt><tt>&nbsp;&nbsp;&nbsp; &amp;&amp; md5sum -c --quiet md5sums \</tt><tt><br>
        </tt><tt>)</tt><tt><br>
        </tt><tt><br>
        </tt><tt>/bin/date</tt><tt><br>
        </tt><tt><br>
        </tt><tt>(\</tt><tt><br>
        </tt><tt>&nbsp;&nbsp;&nbsp; cd /data/bkupc1/BACKUPS-cp.${timestamp}/ \</tt><tt><br>
        </tt><tt>&nbsp;&nbsp;&nbsp; &amp;&amp; sha1sum -c --quiet sha1sums \</tt><tt><br>
        </tt><tt>)</tt><tt><br>
        </tt><tt><br>
        </tt><tt>/bin/date</tt><tt><br>
        </tt><tt><br>
        </tt><tt>/usr/bin/diff -r -q ./
          /data/bkupc1/BACKUPS-cp.${timestamp}/</tt><tt><br>
        </tt><tt><br>
        </tt><tt>/bin/date</tt><br>
        <br>
      </li>
      <li>Results:</li>
      <ol>
        <li>Output from the script:<tt><br>
          </tt><tt>+ timestamp=20130821-081918</tt><tt><br>
          </tt><tt>+ /bin/date</tt><tt><br>
          </tt><tt>Wed Aug 21 08:19:18 EDT 2013</tt><tt><br>
          </tt><tt>+ mkdir /data/bkupc1/BACKUPS-cp.20130821-081918</tt><tt><br>
          </tt><tt>+ /bin/cp -ar ./
            /data/bkupc1/BACKUPS-cp.20130821-081918/</tt><tt><br>
          </tt><tt>+ /bin/date</tt><tt><br>
          </tt><tt>Wed Aug 21 13:51:53 EDT 2013</tt><tt><br>
          </tt><tt>+ cd /data/bkupc1/BACKUPS-cp.20130821-081918/</tt><tt><br>
          </tt><tt>+ md5sum -c --quiet md5sums</tt><tt><br>
          </tt><tt>data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87f54d22-0443-11e3-b8fb-f46d04e15793/KAfe4MUAlmO-Lt4N0KqVQTtf0im3mcoTuAyJvSP,t0o2Lc,FGce49pe9wEPDiIIt201oEks-taGDbc5-Nph6AacR:
            FAILED</tt><tt><br>
          </tt><tt>data/a34bc588-0443-11e3-b8fb-f46d04e15793/a34a8bf0-0443-11e3-b8fb-f46d04e15793/a3494b3c-0443-11e3-b8fb-f46d04e15793/a34808b2-0443-11e3-b8fb-f46d04e15793/a346cd08-0443-11e3-b8fb-f46d04e15793/a3456e2c-0443-11e3-b8fb-f46d04e15793/a344366a-0443-11e3-b8fb-f46d04e15793/8c9e94a0-0443-11e3-b8fb-f46d04e15793/NLrXi5u80FoUV6Gi2ouEybAebOgnF7p1PtEYmPbd0huh,1:
            FAILED</tt><tt><br>
          </tt><tt>md5sum: WARNING: 2 computed checksums did NOT match</tt><tt><br>
          </tt><tt>+ /bin/date</tt><tt><br>
          </tt><tt>Wed Aug 21 16:54:13 EDT 2013</tt><tt><br>
          </tt><tt>+ cd /data/bkupc1/BACKUPS-cp.20130821-081918/</tt><tt><br>
          </tt><tt>+ sha1sum -c --quiet sha1sums</tt><tt><br>
          </tt><tt>data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/8825c6c8-0443-11e3-b8fb-f46d04e15793/8810e21c-0443-11e3-b8fb-f46d04e15793/7LOu,NZ5eMXxxrqjZHv5a9-4aHd641hN2tGaneMa1D2Kl9wLXf1f71nX6g-8ps2BpABovO7w68Wy63pH0gU3yLnyLEfFfT25Zk5jNvpDU6eQ,1:
            FAILED</tt><tt><br>
          </tt><tt>sha1sum: WARNING: 1 computed checksum did NOT match</tt><tt><br>
          </tt><tt>+ /bin/date</tt><tt><br>
          </tt><tt>Wed Aug 21 19:54:29 EDT 2013</tt><tt><br>
          </tt><tt>+ /usr/bin/diff -r -q ./
            /data/bkupc1/BACKUPS-cp.20130821-081918/</tt><tt><br>
          </tt><tt>+ /bin/date</tt><tt><br>
          </tt><tt>Thu Aug 22 00:16:53 EDT 2013</tt><br>
          <br>
        </li>
        <li>A listing of files that were reported as different (line
          breaks for readability):</li>
        <ol>
          <li>MD5 failure:<br>
            data/<br>
            884b9a38-0443-11e3-b8fb-f46d04e15793/<br>
            884a7040-0443-11e3-b8fb-f46d04e15793/<br>
            87fdc790-0443-11e3-b8fb-f46d04e15793/<br>
            87f54d22-0443-11e3-b8fb-f46d04e15793/<br>
KAfe4MUAlmO-Lt4N0KqVQTtf0im3mcoTuAyJvSP,t0o2Lc,FGce49pe9wEPDiIIt201oEks-taGDbc5-Nph6AacR<br>
            <br>
          </li>
          <li>MD5 failure:<br>
            data/<br>
            a34bc588-0443-11e3-b8fb-f46d04e15793/<br>
            a34a8bf0-0443-11e3-b8fb-f46d04e15793/<br>
            a3494b3c-0443-11e3-b8fb-f46d04e15793/<br>
            a34808b2-0443-11e3-b8fb-f46d04e15793/<br>
            a346cd08-0443-11e3-b8fb-f46d04e15793/<br>
            a3456e2c-0443-11e3-b8fb-f46d04e15793/<br>
            a344366a-0443-11e3-b8fb-f46d04e15793/<br>
            8c9e94a0-0443-11e3-b8fb-f46d04e15793/<br>
            NLrXi5u80FoUV6Gi2ouEybAebOgnF7p1PtEYmPbd0huh,1<br>
            <br>
          </li>
          <li>SHA1 failure:<br>
            data/<br>
            884b9a38-0443-11e3-b8fb-f46d04e15793/<br>
            884a7040-0443-11e3-b8fb-f46d04e15793/<br>
            8825c6c8-0443-11e3-b8fb-f46d04e15793/<br>
            8810e21c-0443-11e3-b8fb-f46d04e15793/<br>
7LOu,NZ5eMXxxrqjZHv5a9-4aHd641hN2tGaneMa1D2Kl9wLXf1f71nX6g-8ps2BpABovO7w68Wy63pH0gU3yLnyLEfFfT25Zk5jNvpDU6eQ,1<br>
            <br>
          </li>
        </ol>
        <li>A byte-for-byte comparison:</li>
        <ol>
          <li>File 7.2.1 from above:<br>
(KAfe4MUAlmO-Lt4N0KqVQTtf0im3mcoTuAyJvSP,t0o2Lc,FGce49pe9wEPDiIIt201oEks-taGDbc5-Nph6AacR)</li>
          <ol>
            <li>After the test, this file exists in three locations:<br>
              client:/export/d/eraseme/&nbsp; &lt;-- the original<br>
              bkupc1-a:/export/a/glusterfs/&nbsp; &lt;-- replicated volume
              copy 1 of 2<br>
              bkupc1-b:/export/a/glusterfs/&nbsp; &lt;-- replicated volume
              copy 2 of 2<br>
              <br>
            </li>
            <li>MD5sums:<br>
              68ce7073e462fda42d4b551a843bd71f &lt;-- bkupc1-a (directly
              from the brick)<br>
              68ce7073e462fda42d4b551a843bd71f &lt;-- bkupc1-b (directly
              from the brick)<br>
              68ce7073e462fda42d4b551a843bd71f &lt;-- client<br>
              68ce7073e462fda42d4b551a843bd71f &lt;-- volume (from the
              mount via the fuse driver)<br>
              <br>
              NOTE: There is no difference between the MD5 checksums<br>
              <br>
            </li>
            <li>SHA1sums:<br>
              c5c59c18f5cc0c1b6e4dd80b2d41fc3bc7148509 &lt;-- bkupc1-a<br>
              c5c59c18f5cc0c1b6e4dd80b2d41fc3bc7148509 &lt;-- bkupc1-b<br>
              c5c59c18f5cc0c1b6e4dd80b2d41fc3bc7148509 &lt;-- client<br>
              c5c59c18f5cc0c1b6e4dd80b2d41fc3bc7148509 &lt;-- volume<br>
              <br>
              NOTE: There is no difference between the SHA1 checksums<br>
              <br>
            </li>
            <li>Both /usr/bin/diff and /usr/bin/cmp report no difference
              between these files.<br>
              <br>
            </li>
          </ol>
          <li>File 7.2.2 from above:<br>
            (NLrXi5u80FoUV6Gi2ouEybAebOgnF7p1PtEYmPbd0huh,1)</li>
          <ol>
            <li>After the test, this file exists in three locations:<br>
              client:/export/d/eraseme/&nbsp; &lt;-- the original<br>
              bkupc1-a:/export/a/glusterfs/&nbsp; &lt;-- replicated volume
              copy 1 of 2<br>
              bkupc1-b:/export/a/glusterfs/&nbsp; &lt;-- replicated volume
              copy 2 of 2<br>
              <br>
            </li>
            <li>MD5sums:<br>
              78696407263ef75ae2795ed7cb4eb24a &lt;-- bkupc1-a<br>
              77fdce4ebe9e94f611848d174de01357 &lt;-- bkupc1-b<br>
              78696407263ef75ae2795ed7cb4eb24a &lt;-- client<br>
              78696407263ef75ae2795ed7cb4eb24a &lt;-- volume<br>
              <br>
            </li>
            <li>SHA1sums:<br>
              de93bcc7b64458926505dfc5ac4c597f3fefe6db &lt;-- bkupc1-a<br>
              0254c117b92ca95987aa7389980fb0bcc850e9c5 &lt;-- bkupc1-b<br>
              de93bcc7b64458926505dfc5ac4c597f3fefe6db &lt;-- client<br>
              de93bcc7b64458926505dfc5ac4c597f3fefe6db &lt;-- volume<br>
              <br>
            </li>
            <li>Byte differences: The output of /usr/bin/cmp -l, when
              comparing the version of the file on the client with the
              version of the file on bkupc1-b:<br>
              <br>
              &nbsp;3262724555 274 234<br>
              <br>
              If I'm reading this right, the files differ in only one
              byte, at byte 3262724555 (values 274 vs. 234, which cmp
              prints in octal).<br>
              <br>
            </li>
          </ol>
          <li>File 7.2.3 from above:<br>
(7LOu,NZ5eMXxxrqjZHv5a9-4aHd641hN2tGaneMa1D2Kl9wLXf1f71nX6g-8ps2BpABovO7w68Wy63pH0gU3yLnyLEfFfT25Zk5jNvpDU6eQ,1)</li>
          <ol>
            <li>After the test, this file exists in three locations:<br>
              client:/export/d/eraseme/&nbsp; &lt;-- the original<br>
              bkupc1-c:/export/a/glusterfs/&nbsp; &lt;-- replicated volume
              copy 1 of 2<br>
              bkupc1-d:/export/a/glusterfs/&nbsp; &lt;-- replicated volume
              copy 2 of 2<br>
              <br>
            </li>
            <li>MD5sums:<br>
              05ea9e04df984cc7ed514f93dc79067e&nbsp; &lt;-- bkupc1-c<br>
              868c0eafa2bde7386d808b722166a283&nbsp; &lt;-- bkupc1-d<br>
              05ea9e04df984cc7ed514f93dc79067e&nbsp; &lt;-- client<br>
              868c0eafa2bde7386d808b722166a283&nbsp; &lt;-- volume<br>
              <br>
            </li>
            <li>SHA1sums:<br>
              a6cf53c106b1856826db8de8b947273b05eb6391&nbsp; &lt;-- bkupc1-c<br>
              f56e01f982028eed9c115f0696346861bf3b7169&nbsp; &lt;-- bkupc1-d<br>
              a6cf53c106b1856826db8de8b947273b05eb6391&nbsp; &lt;-- client<br>
              f56e01f982028eed9c115f0696346861bf3b7169&nbsp; &lt;-- volume<br>
              <br>
            </li>
            <li>Byte differences: The output of /usr/bin/cmp -l, when
              comparing the version of the file on the client with the
              version of the file on bkupc1-d:<br>
              <br>
              &nbsp; 181361479 226 206<br>
              <br>
            </li>
          </ol>
        </ol>
      </ol>
    </ol>
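    <p>On the note in 5.5.3 about replacing diff with cmp: a minimal
      sketch of a cmp-based tree walk, assuming GNU find and cmp are
      available.&nbsp; The function name compare_trees is made up for
      illustration, and the destination must be an absolute path
      because the walk runs from inside the source directory.&nbsp; It
      prints one line per file that differs or is missing on the
      destination side.<br>
    </p>
    <pre>#!/bin/bash
# Hypothetical helper: byte-compare every regular file under SRC with
# the file at the same relative path under DST (DST must be absolute).
compare_trees() {
    local src=$1 dst=$2
    (
        cd "$src" || exit 2
        # Inside sh -c, $0 is DST and the remaining arguments are the
        # relative paths that find collected.
        find . -type f -exec sh -c '
            for f; do
                cmp -s "$f" "$0/$f" || echo "DIFFERS: $f"
            done
        ' "$dst" {} +
    )
}

# Example call, using the destination from the cp script:
# compare_trees "$PWD" "/data/bkupc1/BACKUPS-cp.${timestamp}"
</pre>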
    <p>For 7.3.1, md5sum reported a mismatch on the volume during the
      run, even though re-checking the file afterwards with md5sum,
      sha1sum, cmp, and diff on both the mounted volume and the
      individual bricks showed no difference at all.&nbsp; This suggests
      that there may be an intermittent data read error somewhere in
      gluster.<br>
    </p>
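    <p>One way to probe for an intermittent read problem like this is to
      hash the same file several times through the fuse mount and see
      whether the digest ever changes.&nbsp; A minimal sketch; the function
      name rehash is made up, and repeated reads may be served from a
      cache rather than from the bricks, so identical digests here
      would not rule a read error out.<br>
    </p>
    <pre>#!/bin/bash
# Hypothetical helper: hash FILE RUNS times and count distinct digests.
# More than one output line means the reads were not consistent.
rehash() {
    local file=$1 runs=$2 i
    for i in $(seq 1 "$runs"); do
        md5sum "$file" | cut -d' ' -f1
    done | sort | uniq -c
}

# Example call (the path is a placeholder):
# rehash /data/bkupc1/path/to/file 10
</pre>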
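    <p>For 7.3.2 and 7.3.3, one replica brick differs from the client's
      original by exactly one byte (if I've read the output from cmp
      correctly): bkupc1-b for 7.3.2, where the volume currently returns
      the copy that matches the client, and bkupc1-d for 7.3.3, where
      the volume returns the copy that does not.&nbsp; This would imply
      that there may be a data write error in gluster.<br>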
    </p>
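    <p>On reading the cmp output: cmp -l prints the byte number in
      decimal followed by the two differing byte values in octal, so
      each output line is one differing byte.&nbsp; A small sketch (the
      function name bit_diff is made up) that XORs the two values to
      show how many bits differ within that byte:<br>
    </p>
    <pre>#!/bin/bash
# Hypothetical helper: take the two octal byte values from one line of
# cmp -l output and report their XOR and the number of differing bits.
bit_diff() {
    # 8#NNN makes bash read NNN as a base-8 number.
    local a=$(( 8#$1 )) b=$(( 8#$2 ))
    local x=$(( a ^ b )) bits=0
    local y=$x
    # Count the set bits in the XOR result.
    while [ "$y" -gt 0 ]; do
        bits=$(( bits + y % 2 ))
        y=$(( y / 2 ))
    done
    printf '%s vs %s (octal): xor=0x%02x, differing bits=%d\n' \
        "$1" "$2" "$x" "$bits"
}

bit_diff 274 234    # file 7.3.2
bit_diff 226 206    # file 7.3.3
</pre>
    <p>For the values above this reports xor=0x20 for 274 vs. 234 and
      xor=0x10 for 226 vs. 206, i.e. a single flipped bit in each
      file.<br>
    </p>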
    (And please, for the love of Pete, tell me if I have done anything
    stupid, b/c I really wanted gluster to be the silver bullet solution
    to my data storage problems.)<br>
    <p>Michael Peek<br>
    </p>
  </body>
</html>