<p dir="ltr">On 19-Jul-2014 11:06 pm, &quot;Niels de Vos&quot; &lt;<a href="mailto:ndevos@redhat.com">ndevos@redhat.com</a>&gt; wrote:<br>

&gt;<br>

&gt; On Sat, Jul 19, 2014 at 08:23:29AM +0530, Pranith Kumar Karampuri wrote:<br>

&gt; &gt; Guys,<br>

&gt; &gt;      Does anyone know why device-id can be different even though it<br>

&gt; &gt; is all a single XFS filesystem?<br>

&gt; &gt; We see the following log in the brick-log.<br>

&gt; &gt;<br>

&gt; &gt; [2014-07-16 00:00:24.358628] W [posix-handle.c:586:posix_handle_hard]<br>

&gt; &gt; 0-home-posix: mismatching ino/dev between file<br>

&gt;<br>

&gt; The device-id (major:minor number) of a block-device can change, but<br>

&gt; will not change while the device is in use. Device-mapper (DM) is part<br>

&gt; of the stack that includes multipath and lvm (and more, but these are<br>

&gt; most common). The stack for the block-devices is built dynamically, and<br>

&gt; the device-id is assigned when the block-device is made active. The<br>

&gt; ordering of making devices active can change, hence the device-id too.<br>

&gt; It is also possible to deactivate some logical-volumes, and activate<br>

&gt; them in a different order. (You cannot deactivate a dm-device when it<br>

&gt; is in use, for example mounted.)<br>

&gt;<br>

&gt; Without device-mapper in the io-stack, re-ordering disks is possible<br>

&gt; too, but requires a little more (advanced sysadmin) work.<br>

&gt;<br>

&gt; So, the main questions I&#39;d ask would be:<br>

&gt; 1. What kind of block storage is used, LVM, multipath, ...?</p>

<p dir="ltr">A single RAID10 XFS partition</p>
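
<p dir="ltr">For reference, a minimal sketch of how a raw device-id can be split into the major:minor pair Niels mentions. It assumes the second number in each logged pair (2431) is the raw st_dev value, which I have not verified against the posix-handle.c source. The names split_dev and describe are only illustrative:</p>

<pre>
```python
import os

def split_dev(dev):
    """Split a raw st_dev value into a (major, minor) tuple."""
    return os.major(dev), os.minor(dev)

def describe(path):
    """Return 'major:minor inode=N' for the filesystem object at path."""
    st = os.lstat(path)
    major, minor = split_dev(st.st_dev)
    return f"{major}:{minor} inode={st.st_ino}"

if __name__ == "__main__":
    # 2431 is the second number of each pair in the brick log;
    # treating it as st_dev is an assumption.
    print(split_dev(2431))
    print(describe("/"))
```
</pre>

<p dir="ltr">On Linux the first call gives (9, 127). Major 9 is the number registered for md software-RAID devices, so if the assumption about the log format holds, this would match a RAID10 array assembled as an md device.</p>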

<p dir="ltr">&gt; 2. Were there any issues on the block-layer, scsi-errors, reconnects?</p>

<p dir="ltr">Yes, one of the servers had a bad disk that was replaced.</p>

<p dir="ltr">&gt; 3. Were there changes in the underlying disks or their structure? Disks<br>

&gt;    added, removed or new partitions created.</p>

<p dir="ltr">No </p>

<p dir="ltr">&gt; 4. Were disks deactivated+activated again, for example for creating<br>

&gt;    backups or snapshots on a level below the (XFS) filesystem?<br>

&gt;</p>

<p dir="ltr">No</p>

<p dir="ltr">&gt; HTH,<br>

&gt; Niels<br>

&gt;<br>

&gt; &gt; /data/gluster/home/techiebuzz/techie-buzz.com/wp-content/cache/page_enhanced/techie-buzz.com/social-networking/facebook-will-permanently-remove-your-deleted-photos.html/_index.html.old<br>


&gt; &gt; (1077282838/2431) and handle<br>

&gt; &gt; /data/gluster/home/.glusterfs/ae/f0/aef0404b-e084-4501-9d0f-0e6f5bb2d5e0<br>

&gt; &gt; (1077282836/2431)<br>

&gt; &gt; [2014-07-16 00:00:24.358646] E [posix.c:823:posix_mknod] 0-home-posix:<br>

&gt; &gt; setting gfid on<br>

&gt; &gt; /data/gluster/home/techiebuzz/techie-buzz.com/wp-content/cache/page_enhanced/techie-buzz.com/social-networking/facebook-will-permanently-remove-your-deleted-photos.html/_index.html.old<br>


&gt; &gt; failed<br>

&gt; &gt;<br>

&gt; &gt;<br>

&gt; &gt; Pranith<br>

&gt; &gt; On 07/17/2014 07:06 PM, Nilesh Govindrajan wrote:<br>

&gt; &gt; &gt;log1 was the log from client of node2. The filesystems are mounted<br>

&gt; &gt; &gt;locally. /data is a raid10 array and /data/gluster contains 4 volumes,<br>

&gt; &gt; &gt;one of which is home which is a high read/write one (the log of which<br>

&gt; &gt; &gt;was attached here).<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt;On Thu, Jul 17, 2014 at 11:54 AM, Pranith Kumar Karampuri<br>

&gt; &gt; &gt;&lt;<a href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>&gt; wrote:<br>

&gt; &gt; &gt;&gt;On 07/17/2014 08:41 AM, Nilesh Govindrajan wrote:<br>

&gt; &gt; &gt;&gt;&gt;log1 and log2 are brick logs. The others are client logs.<br>

&gt; &gt; &gt;&gt;I see a lot of logs like the ones below in the &#39;log1&#39; you attached. It seems that the<br>

&gt; &gt; &gt;&gt;device ID where the file is actually stored and the device ID where the gfid-link<br>

&gt; &gt; &gt;&gt;of the same file is stored, i.e. inside &lt;brick-dir&gt;/.glusterfs/, are different.<br>

&gt; &gt; &gt;&gt;Which devices/filesystems are present inside the brick represented by<br>

&gt; &gt; &gt;&gt;&#39;log1&#39;?<br>

&gt; &gt; &gt;&gt;<br>

&gt; &gt; &gt;&gt;[2014-07-16 00:00:24.358628] W [posix-handle.c:586:posix_handle_hard]<br>

&gt; &gt; &gt;&gt;0-home-posix: mismatching ino/dev between file<br>

&gt; &gt; &gt;&gt;/data/gluster/home/techiebuzz/techie-buzz.com/wp-content/cache/page_enhanced/techie-buzz.com/social-networking/facebook-will-permanently-remove-your-deleted-photos.html/_index.html.old<br>


&gt; &gt; &gt;&gt;(1077282838/2431) and handle<br>

&gt; &gt; &gt;&gt;/data/gluster/home/.glusterfs/ae/f0/aef0404b-e084-4501-9d0f-0e6f5bb2d5e0<br>

&gt; &gt; &gt;&gt;(1077282836/2431)<br>

&gt; &gt; &gt;&gt;[2014-07-16 00:00:24.358646] E [posix.c:823:posix_mknod] 0-home-posix:<br>

&gt; &gt; &gt;&gt;setting gfid on<br>

&gt; &gt; &gt;&gt;/data/gluster/home/techiebuzz/techie-buzz.com/wp-content/cache/page_enhanced/techie-buzz.com/social-networking/facebook-will-permanently-remove-your-deleted-photos.html/_index.html.old<br>


&gt; &gt; &gt;&gt;failed<br>

&gt; &gt; &gt;&gt;<br>

&gt; &gt; &gt;&gt;Pranith<br>

&gt; &gt; &gt;&gt;<br>

&gt; &gt; &gt;&gt;<br>

&gt; &gt; &gt;&gt;&gt;On Thu, Jul 17, 2014 at 8:08 AM, Pranith Kumar Karampuri<br>

&gt; &gt; &gt;&gt;&gt;&lt;<a href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>&gt; wrote:<br>

&gt; &gt; &gt;&gt;&gt;&gt;On 07/17/2014 07:28 AM, Nilesh Govindrajan wrote:<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;On Thu, Jul 17, 2014 at 7:26 AM, Nilesh Govindrajan &lt;<a href="mailto:me@nileshgr.com">me@nileshgr.com</a>&gt;<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;wrote:<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;Hello,<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;I&#39;m having a weird issue. I have this config:<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;node2 ~ # gluster peer status<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;Number of Peers: 1<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;Hostname: sto1<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;Uuid: f7570524-811a-44ed-b2eb-d7acffadfaa5<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;State: Peer in Cluster (Connected)<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;node1 ~ # gluster peer status<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;Number of Peers: 1<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;Hostname: sto2<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;Port: 24007<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;Uuid: 3a69faa9-f622-4c35-ac5e-b14a6826f5d9<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;State: Peer in Cluster (Connected)<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;Volume Name: home<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;Type: Replicate<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;Volume ID: 54fef941-2e33-4acf-9e98-1f86ea4f35b7<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;Status: Started<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;Number of Bricks: 1 x 2 = 2<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;Transport-type: tcp<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;Bricks:<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;Brick1: sto1:/data/gluster/home<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;Brick2: sto2:/data/gluster/home<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;Options Reconfigured:<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;performance.write-behind-window-size: 2GB<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;performance.flush-behind: on<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;performance.cache-size: 2GB<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;cluster.choose-local: on<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;storage.linux-aio: on<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;transport.keepalive: on<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;performance.quick-read: on<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;performance.io-cache: on<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;performance.stat-prefetch: on<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;performance.read-ahead: on<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;cluster.data-self-heal-algorithm: diff<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;nfs.disable: on<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;sto1 and sto2 are aliases for node1 and node2 respectively.<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;As you see, NFS is disabled so I&#39;m using the native fuse mount on both<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;nodes.<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;The volume contains files and php scripts that are served on various<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;websites. When both nodes are active, I get split brain on many files<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;and the mount on node2 returns &#39;input/output error&#39; on many of them,<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;which causes HTTP 500 errors.<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;I delete the files from the brick using find -samefile. That fixes it for a<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;few minutes and then the problem is back.<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;What could be the issue? This happens even if I use the NFS mounting<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;method.<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;&gt;Gluster 3.4.4 on Gentoo.<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;And yes, network connectivity is not an issue between them as both of<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;them are located in the same DC. They&#39;re connected via 1 Gbit line<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;(common for internal and external network) but external network<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;doesn&#39;t cross 200-500 Mbit/s leaving quite a good window for gluster.<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;I also tried enabling quorum but that doesn&#39;t help either.<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;_______________________________________________<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;Gluster-users mailing list<br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;<a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>

&gt; &gt; &gt;&gt;&gt;&gt;&gt;<a href="http://supercolony.gluster.org/mailman/listinfo/gluster-users">http://supercolony.gluster.org/mailman/listinfo/gluster-users</a><br>

&gt; &gt; &gt;&gt;&gt;&gt;Hi Nilesh,<br>

&gt; &gt; &gt;&gt;&gt;&gt;        Could you attach the mount and brick logs so that we can inspect what<br>

&gt; &gt; &gt;&gt;&gt;&gt;is going on in the setup?<br>

&gt; &gt; &gt;&gt;&gt;&gt;<br>

&gt; &gt; &gt;&gt;&gt;&gt;Pranith<br>

&gt; &gt; &gt;&gt;<br>

&gt; &gt;<br>


</p>
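
<p dir="ltr">The check that the quoted warning describes (the file and its gfid handle should share device-id and inode) can be reproduced by hand on a brick. A minimal sketch, assuming the handle of a regular file is a hard link at &lt;brick-dir&gt;/.glusterfs/ under two subdirectories taken from the first four hex characters of the gfid, as the paths in the log suggest. handle_path and same_file are illustrative names, not GlusterFS APIs, and the demo runs on a scratch directory rather than a real brick:</p>

<pre>
```python
import os
import tempfile

def handle_path(brick, gfid):
    """Build the .glusterfs handle path for a gfid (layout assumed from the log)."""
    return os.path.join(brick, ".glusterfs", gfid[0:2], gfid[2:4], gfid)

def same_file(path_a, path_b):
    """True when both paths share device-id and inode, i.e. are hard links."""
    a = os.lstat(path_a)
    b = os.lstat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

if __name__ == "__main__":
    # Scratch directory standing in for a brick; nothing here touches Gluster.
    with tempfile.TemporaryDirectory() as brick:
        gfid = "aef0404b-e084-4501-9d0f-0e6f5bb2d5e0"
        data = os.path.join(brick, "file.txt")
        with open(data, "w") as fh:
            fh.write("x")
        handle = handle_path(brick, gfid)
        os.makedirs(os.path.dirname(handle))
        os.link(data, handle)           # hard link: same dev and inode
        print(same_file(data, handle))  # True
        copy = os.path.join(brick, "copy.txt")
        with open(copy, "w") as fh:
            fh.write("x")
        print(same_file(copy, handle))  # False: a separate inode
```
</pre>

<p dir="ltr">On a real brick, same_file(path, handle_path(brick, gfid)) returning False for a regular file would correspond to the mismatch reported in the log, again under the layout assumption above.</p>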