<div dir="ltr"><div><div><div>Hello,<br><br>I&#39;m running into some serious problems with Gluster + CTDB and Samba. What I have:<br><br></div>A two-node replicated Gluster cluster that shares volumes over Samba, set up according to this guide: <a href="https://download.gluster.org/pub/gluster/glusterfs/doc/Gluster_CTDB_setup.v1.pdf">https://download.gluster.org/pub/gluster/glusterfs/doc/Gluster_CTDB_setup.v1.pdf</a><br>

<br></div>When we edit files in the volume or copy files into it via SMB (from a Windows client going through a Samba file share), this inevitably leads to a split-brain scenario. For example:<br><br>gluster&gt; volume heal fl-webroot info<br>

Brick ankh.int.rdmedia.com:/export/glu/web/flash/webroot/<br>&lt;gfid:0b162618-e46f-4921-92d0-c0fdb5290bf5&gt;<br>&lt;gfid:a259de7d-69fc-47bd-90e7-06a33b3e6cc8&gt;<br>Number of entries: 2<br><br>Brick morpork.int.rdmedia.com:/export/glu/web/flash/webroot/<br>

/LandingPage_Saturn_Production/images<br>/LandingPage_Saturn_Production<br>/LandingPage_Saturn_Production/Services/v2<br>/LandingPage_Saturn_Production/images/country/be<br>/LandingPage_Saturn_Production/bin<br>/LandingPage_Saturn_Production/Services<br>

/LandingPage_Saturn_Production/images/generic<br>/LandingPage_Saturn_Production/aspnet_client/system_web<br>/LandingPage_Saturn_Production/images/country<br>/LandingPage_Saturn_Production/Scripts<br>/LandingPage_Saturn_Production/aspnet_client<br>

/LandingPage_Saturn_Production/images/country/fr<br>Number of entries: 12<br><br><br><br></div><div>Sometimes self-heal works, sometimes it doesn&#39;t:<br>

<br>[2014-08-06 19:32:17.986790] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-fl-webroot-replicate-0:  entry self heal  failed,   on /LandingPage_Saturn_Production/Services/v2<br>[2014-08-06 19:32:18.008330] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-fl-webroot-client-0: remote operation failed: No such file or directory. Path: &lt;gfid:a89d7a07-2e3d-41ee-adcc-cb2fba3d2282&gt; (a89d7a07-2e3d-41ee-adcc-cb2fba3d2282)<br>

[2014-08-06 19:32:18.024057] I [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-fl-webroot-replicate-0:  gfid or missing entry self heal  is started, metadata self heal  is successfully completed, backgroung data self heal  is successfully completed,  data self heal from fl-webroot-client-1  to sinks  fl-webroot-client-0, with 0 bytes on fl-webroot-client-0, 168 bytes on fl-webroot-client-1,  data - Pending matrix:  [ [ 0 0 ] [ 1 0 ] ]  metadata self heal from source fl-webroot-client-1 to fl-webroot-client-0,  metadata - Pending matrix:  [ [ 0 0 ] [ 2 0 ] ], on /LandingPage_Saturn_Production/Services/v2/PartnerApiService.asmx<br>
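<br>The pending matrices in the log above are built from the trusted.afr.* changelog extended attributes that each brick stores per file. As a minimal sketch for anyone comparing those attributes by hand: assuming the value is the usual 12 bytes (three big-endian 32-bit counters for data, metadata and entry operations) as printed by getfattr -e hex, it could be decoded like this. The function name and the sample value are hypothetical and not taken from these bricks.<br>

```python
import struct


def decode_afr_changelog(hex_value: str) -> dict:
    """Decode a trusted.afr.* xattr value as printed by `getfattr -e hex`.

    Assumption: the value is 12 bytes holding three big-endian 32-bit
    counters, in the order data, metadata, entry.
    """
    text = hex_value.strip()
    if text.lower().startswith("0x"):
        text = text[2:]
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


if __name__ == "__main__":
    # Hypothetical sample value, NOT read from the bricks in this report.
    print(decode_afr_changelog("0x000000010000000200000000"))
```

Non-zero counters on one brick for the other brick&#39;s client mean that brick considers operations still pending on the other copy.<br>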

<br></div><div><b>More seriously, some files are simply missing on one of the nodes, with no error in the logs and no entry in the output of gluster volume heal $volume info.</b><br></div><div><br></div><div>Of course, I can provide any log files needed.<br clear="all">
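<br>For the files that are missing on one node, one way to pin them down would be to list each brick directory and compare the listings. This is only a rough sketch under assumptions: it skips the internal .glusterfs directory, the helper names are hypothetical, and since the bricks live on different nodes, list_brick would have to be run on each node and the results brought together before calling diff_listings.<br>

```python
import os


def list_brick(root: str) -> set:
    """Return all file and directory paths under a brick, relative to root."""
    paths = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Gluster's internal metadata directory on the brick.
        dirnames[:] = [d for d in dirnames if d != ".glusterfs"]
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            paths.add(os.path.relpath(full, root))
    return paths


def diff_listings(a, b):
    """Return (paths only in a, paths only in b), each sorted."""
    a, b = set(a), set(b)
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    import sys

    # Usage: python compare_bricks.py <brick_dir_a> <brick_dir_b>
    # Both directories must be readable from this host, e.g. copies of the
    # listings' source trees; this is an assumption about the setup.
    only_a, only_b = diff_listings(list_brick(sys.argv[1]), list_brick(sys.argv[2]))
    for p in only_a:
        print("only in first:", p)
    for p in only_b:
        print("only in second:", p)
```

Any path reported for only one side is a candidate for a file that never reached the other brick.<br>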

</div><div><div><div><div><br>-- <br><div dir="ltr">Tiemen Ruiten<br>Systems Engineer<br>R&amp;D Media<br></div>
</div></div></div></div></div>