<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    On 10/27/2014 01:34 PM, Tytus Rogalewski wrote:<br>

    <blockquote

cite="mid:CANfXJztiuuFKvtezUXV=LS+OKDFPBTveZXD5MrQQ2StCP9W+fg@mail.gmail.com"

      type="cite">

      <div dir="ltr"><span

          style="font-family:arial,sans-serif;font-size:13.3333339691162px">Hi

          guys,</span>

        <div

          style="font-family:arial,sans-serif;font-size:13.3333339691162px">I

          wanted to ask you what happens in case of a power failure.</div>

        <div

          style="font-family:arial,sans-serif;font-size:13.3333339691162px">I

          have a 2-node Proxmox cluster with GlusterFS on sdb1 (XFS),
          mounted on each node as localhost/glusterstorage.</div>

        <div

          style="font-family:arial,sans-serif;font-size:13.3333339691162px">I

          am storing VMs on it as qcow2 images (with ext4 inside).</div>

        <div

          style="font-family:arial,sans-serif;font-size:13.3333339691162px"> Live

          migration works OK, wow. Everything works fine.</div>

        <div

          style="font-family:arial,sans-serif;font-size:13.3333339691162px">But

          tell me, will something bad happen if the power fails in the
          whole datacenter?</div>

        <div

          style="font-family:arial,sans-serif;font-size:13.3333339691162px">Will

          data be corrupted, and will the same thing happen if I am using
          DRBD?</div>

        <div

          style="font-family:arial,sans-serif;font-size:13.3333339691162px">DRBD

          doesn't give me as much flexibility (because I can't use qcow2
          or store files like ISOs or backups on DRBD), but GlusterFS
          gives me a lot of flexibility!</div>

        <div

          style="font-family:arial,sans-serif;font-size:13.3333339691162px">Anyway,
          yesterday I created GlusterFS on ext4, with a qcow2 VM using ext4
          inside it, and when I ran "reboot -f" (I assume this is the same as
          pulling the power cord?), after the node came back online the
          VM data was corrupted and I had many ext4 errors inside the VM.</div>

        <div

          style="font-family:arial,sans-serif;font-size:13.3333339691162px">Tell

          me: was that because I used ext4 on sdb1 for the GlusterFS
          storage, or will it behave the same with XFS?</div>

        <div

          style="font-family:arial,sans-serif;font-size:13.3333339691162px">Does
          DRBD give better protection in case of power failure?</div>

      </div>

    </blockquote>

    My experience with DRBD is really old, but I became a Gluster user
    because of my experience with DRBD. After it destroyed my filesystem
    for the third time, it was "replace that or find somewhere else to
    work" time.<br>

    <br>

    I chose Gluster because you can build a fully redundant system, from
    the client to each replica server and through all the hardware in
    between, by creating parallel network paths.<br>

    <br>

    What you experienced is a result of the ping timeout. Ping-timeouts
    happen when the TCP connection is not closed cleanly, as when you pull
    the plug: the client never receives a FIN or RST, so it keeps waiting
    until the timeout expires. The timeout exists to allow the filesystem
    to recover gracefully from a temporary network problem. Without it,
    every brief network blip would force a reconnect, and the server takes
    on extra load while all the file descriptors are re-established. That
    can be a fairly heavy load, to the point where TCP pings are delayed.
    If they're delayed longer than the ping-timeout, you have a race
    condition from which you'll never recover. For that reason, the
    ping-timeout is fairly long (network.ping-timeout defaults to 42
    seconds). You *can* adjust that timeout as long as you sufficiently
    test around the actual loads you're expecting.<br>
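    <br>
    To make the race concrete, here is a toy Python model (not GlusterFS
    code). The per-descriptor reconnect cost, the base delay and the
    descriptor count are made-up numbers; the only point is that once the
    reply delay caused by re-establishing descriptors exceeds the
    ping-timeout, every reconnect triggers another timeout.<br>
    <pre>
```python
# Toy model only, not GlusterFS code. All numbers are hypothetical.
# After a reconnect the server re-establishes every open file descriptor;
# that work is modelled as extra delay on the reply to the client's ping.


def ping_reply_delay(fds, base_delay=0.01, per_fd_cost=0.002):
    """Seconds before a ping is answered while `fds` descriptors are re-opened."""
    return base_delay + per_fd_cost * fds


def survives_reconnect(ping_timeout, fds):
    """True if the first ping after a reconnect is answered in time.

    If the reply is late, the client declares the server dead, disconnects,
    reconnects, and causes the same re-establishment load again: the loop
    never settles, which is the unrecoverable race.
    """
    return ping_reply_delay(fds) < ping_timeout


for timeout in (5, 10, 42):
    print(timeout, survives_reconnect(timeout, fds=10_000))
# 5 False, then 10 False, then 42 True
```
</pre>
    The real knob is the volume option network.ping-timeout: lowering it
    shortens the time clients hang on a dead server, at the cost of hitting
    this race sooner under load.<br>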

    <br>

    Keep in mind your SLA/OLA expectations and engineer for them using
    actual mathematical calculations, not gut feelings. Your DC power
    should be more reliable than most industries require.<br>
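    <br>
    For the SLA math, a minimal Python sketch of the two standard
    availability formulas: components that must all work (series) multiply
    their availabilities, and redundant components (parallel) fail only
    when all of them fail. It assumes failures are independent, which a
    datacenter-wide power outage is not, since it takes out both replicas
    at once. The 99% and 99.99% figures are made up.<br>
    <pre>
```python
# Availability arithmetic, assuming independent failures.
# Availabilities are fractions between 0 and 1 (0.999 means "three nines").

HOURS_PER_YEAR = 365.25 * 24


def downtime_hours_per_year(availability):
    """Expected hours of downtime per year for the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


def series(*availabilities):
    """Every component must be up (e.g. power AND switch AND server)."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities):
    """Any one redundant component is enough (e.g. either replica server)."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


# Hypothetical figures: two 99% servers behind one 99.99% power feed.
servers = parallel(0.99, 0.99)      # 0.9999
system = series(servers, 0.9999)    # about 0.9998
print(round(downtime_hours_per_year(system), 2))  # 1.75
```
</pre>
    The power feed enters as a series term, so no amount of storage
    replication removes it from the result.<br>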

    <br>

    <blockquote

cite="mid:CANfXJztiuuFKvtezUXV=LS+OKDFPBTveZXD5MrQQ2StCP9W+fg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div

          style="font-family:arial,sans-serif;font-size:13.3333339691162px"><br>

        </div>

        <div

          style="font-family:arial,sans-serif;font-size:13.3333339691162px">Anyway

          second question: say I have 2 nodes with GlusterFS.</div>

        <div

          style="font-family:arial,sans-serif;font-size:13.3333339691162px">node1

          is changing file1.txt</div>

        <div

          style="font-family:arial,sans-serif;font-size:13.3333339691162px">node2

          is changing file2.txt</div>

        <div

          style="font-family:arial,sans-serif;font-size:13.3333339691162px">Then
          I disconnect GlusterFS from the network, and data keeps
          changing on both nodes.</div>

        <div

          style="font-family:arial,sans-serif;font-size:13.3333339691162px">After

          I reconnect GlusterFS, how will this go?</div>

        <div

          style="font-family:arial,sans-serif;font-size:13.3333339691162px">Will
          the newer file1 changed on node1 overwrite file1 on node2,</div>

        <div

          style="font-family:arial,sans-serif;font-size:13.3333339691162px">and

          will the newer file2 changed on node2 overwrite file2 on node1?</div>

        <div

          style="font-family:arial,sans-serif;font-size:13.3333339691162px">Am

          I correct?<br>

        </div>

        <div

          style="font-family:arial,sans-serif;font-size:13.3333339691162px"><br>

        </div>

        <div

          style="font-family:arial,sans-serif;font-size:13.3333339691162px">Thx

          for answer :)</div>

        <br>

      </div>

    </blockquote>

    Each client intends to write to both (all) replicas. The intent
    count is incremented in extended attributes, the write executes on a
    replica, and the intent count is decremented for that replica. With
    the disconnect, each of those files will show pending changes destined
    for the other replica. When they are reconnected, the self-heal
    daemon (or a client attempting to access those files) will note the
    changes destined for the other brick and repair it. So in your
    example, file1 would be healed from node1 to node2 and file2 from
    node2 to node1.<br>
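    <br>
    A simplified Python model of that bookkeeping, using the two-file
    scenario from your question. The dictionaries stand in for the
    extended attributes on each brick, and the names (BrickCopy,
    replicated_write, self_heal) are made up for illustration; real AFR
    tracks data, metadata and entry changes separately and handles many
    more cases.<br>
    <pre>
```python
from dataclasses import dataclass, field


@dataclass
class BrickCopy:
    """One brick's copy of a file plus its pending-change counters."""
    name: str
    data: str
    # other brick name -> changes this copy knows the other brick is missing
    pending: dict = field(default_factory=dict)


def replicated_write(copies, reachable, new_data):
    """Simplified AFR-style write to the copies the client can reach.

    1. Pre-op: each reachable copy marks a pending change for every replica.
    2. Op: the data is written to the reachable copies.
    3. Post-op: the mark is cleared for each replica that took the write.
    An unreachable replica stays marked as "owed" a change.
    """
    live = [c for c in copies if c.name in reachable]
    for c in live:
        for other in copies:
            c.pending[other.name] = c.pending.get(other.name, 0) + 1
    for c in live:
        c.data = new_data
    for c in live:
        for other in live:
            c.pending[other.name] -= 1


def self_heal(copies):
    """Heal from the one copy that blames the others; return its name.

    Returns None when no copy blames another (nothing to do) or when more
    than one does (split-brain, left for manual intervention).
    """
    blamers = [
        c for c in copies
        if any(v > 0 for n, v in c.pending.items() if n != c.name)
    ]
    if len(blamers) != 1:
        return None
    source = blamers[0]
    for c in copies:
        if c is not source:
            c.data = source.data
            source.pending[c.name] = 0
    return source.name


# Your scenario: during the netsplit node1 only reaches brick1 and
# node2 only reaches brick2.
file1 = [BrickCopy("brick1", "old"), BrickCopy("brick2", "old")]
file2 = [BrickCopy("brick1", "old"), BrickCopy("brick2", "old")]
replicated_write(file1, {"brick1"}, "new from node1")
replicated_write(file2, {"brick2"}, "new from node2")

# Reconnect: the self-heal daemon (or a client) inspects each file.
print(self_heal(file1), self_heal(file2))  # brick1 brick2
```
</pre>
    Note that the heal source is chosen from the pending counters, not by
    comparing which copy looks newer.<br>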

    <br>

    Split-brain occurs when each side of that netsplit writes to the
    same file. Each copy of that file then indicates pending changes for
    the other brick. When the connection returns, the pending flags are
    compared and each copy shows changes that were never written to the
    other. Gluster refuses to pick a winner and leaves each copy intact,
    forcing manual intervention to clear the split-brain.<br>
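    <br>
    The same idea reduced to the split-brain check, as a hypothetical
    Python helper (heal_decisions is a made-up name): for each file, count
    the writes each side made during the netsplit. Pending changes on one
    side give a heal direction; pending changes on both sides are
    split-brain. vm.qcow2 is an invented file name.<br>
    <pre>
```python
from collections import Counter


def heal_decisions(partition_writes):
    """Classify each file after a netsplit between brick "a" and brick "b".

    partition_writes holds one (filename, side) pair per write made while
    the bricks could not see each other. A write on side "a" leaves brick a
    holding a pending change owed to brick b, and vice versa.
    """
    owed = {}  # filename -> Counter of writes per side
    for filename, side in partition_writes:
        owed.setdefault(filename, Counter())[side] += 1

    decisions = {}
    for filename, counts in owed.items():
        if counts["a"] and counts["b"]:
            # Each copy blames the other: refuse to pick a winner.
            decisions[filename] = "split-brain"
        elif counts["a"]:
            decisions[filename] = "heal a -> b"
        else:
            decisions[filename] = "heal b -> a"
    return decisions


writes = [
    ("file1.txt", "a"),   # node1 changes file1.txt
    ("file2.txt", "b"),   # node2 changes file2.txt
    ("vm.qcow2", "a"),    # both sides write the same (hypothetical) image
    ("vm.qcow2", "b"),
]
print(heal_decisions(writes))
# {'file1.txt': 'heal a -> b', 'file2.txt': 'heal b -> a', 'vm.qcow2': 'split-brain'}
```
</pre>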

    <br>

    You can avoid split-brain by using replica 3 and volume-level
    quorum, or by using replica 2 with a third observer server and server
    quorum. It is also possible to have quorum with only 2 servers or
    replicas, but I wouldn't recommend it. With volume-based quorum on
    two replicas, the volume will go read-only if the client loses its
    connection to either server. With server quorum and only two servers,
    a server that loses quorum will shut down its bricks, completely
    removing access to the volume.<br>
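    <br>
    A rough Python sketch of the majority arithmetic behind both kinds of
    quorum. It uses a plain strict-majority rule; the real volume options
    (cluster.quorum-type for volume-level quorum, cluster.server-quorum-type
    for server quorum) have additional settings and tie-breaking rules not
    modelled here, and the function names are made up.<br>
    <pre>
```python
# Plain strict-majority quorum; GlusterFS's real rules have more options.


def has_quorum(reachable, total):
    """True when more than half of the voters are reachable."""
    return 2 * reachable > total


def client_can_write(replica_count, bricks_reachable):
    """Volume-level quorum: without it the client sees the volume read-only."""
    return has_quorum(bricks_reachable, replica_count)


def server_keeps_bricks(pool_size, servers_reachable_incl_self):
    """Server quorum: a server that loses quorum stops serving its bricks."""
    return has_quorum(servers_reachable_incl_self, pool_size)


# replica 3, client loses one brick: 2 of 3 reachable, still writable
print(client_can_write(3, 2))  # True
# replica 2, client loses either brick: 1 of 2, read-only
print(client_can_write(2, 1))  # False
# replica 2 plus a 3rd observer server, one replica server cut off:
# the isolated server stops its bricks, the other side keeps serving
print(server_keeps_bricks(3, 1), server_keeps_bricks(3, 2))  # False True
# only two servers and a netsplit: neither side keeps quorum
print(server_keeps_bricks(2, 1))  # False
```
</pre>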

  </body>

</html>