[Gluster-devel] test failure reports for last 15 days

FNU Raghavendra Manjunath rabhat at redhat.com
Thu Apr 11 19:18:41 UTC 2019


While analysing the logs of the runs where uss.t failed, I made the
following observations.

1) In the first iteration of uss.t, the time difference between the first
test of the .t file and the last test of the .t file is within 1 minute.

So I think it is the cleanup sequence that is taking most of the time. One
reason I suspect this is that we don't see the brick process's shutdown
message in the logs.
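
As a quick sanity check for this theory, one can grep the brick logs of a
failed run for the shutdown message that cleanup_and_exit() prints (see the
"shutting down" line in the brick log excerpt further below). A sketch,
assuming the brick log path used on the regression builders:

    # Does any brick log contain a shutdown message? If not, the cleanup
    # sequence probably never got the brick processes to exit.
    grep -l "shutting down" /var/log/glusterfs/bricks/*.log ||
        echo "no brick logged a shutdown message"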


2) In the 2nd iteration of uss.t (run because the 1st iteration timed out),
the test fails because something in the cleanup sequence of the previous
iteration has not completed.

The volume start command itself fails in the 2nd iteration, and because of
that the remaining tests also fail.

This is from cmd_history.log

uster.org:/d/backends/2/patchy_snap_mnt builder202.int.aws.gluster.org:/d/backends/3/patchy_snap_mnt ++++++++++
[2019-04-10 19:54:09.145086]  : volume create patchy builder202.int.aws.gluster.org:/d/backends/1/patchy_snap_mnt builder202.int.aws.gluster.org:/d/backends/2/patchy_snap_mnt builder202.int.aws.gluster.org:/d/backends/3/patchy_snap_mnt : SUCCESS
[2019-04-10 19:54:09.156221]:++++++++++ G_LOG:./tests/basic/uss.t: TEST: 39 gluster --mode=script --wignore volume set patchy nfs.disable false ++++++++++
[2019-04-10 19:54:09.265138]  : volume set patchy nfs.disable false : SUCCESS
[2019-04-10 19:54:09.274386]:++++++++++ G_LOG:./tests/basic/uss.t: TEST: 42 gluster --mode=script --wignore volume start patchy ++++++++++
[2019-04-10 19:54:09.565086]  : volume start patchy : FAILED : Commit failed on localhost. Please check log file for details.
[2019-04-10 19:54:09.572753]:++++++++++ G_LOG:./tests/basic/uss.t: TEST: 44 _GFS --attribute-timeout=0 --entry-timeout=0 --volfile-server=builder202.int.aws.gluster.org --volfile-id=patchy /mnt/glusterfs/0 ++++++++++
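
For context, TEST numbers 39/42/44 above correspond roughly to this
sequence near the top of uss.t (paraphrased using the standard
test-framework variables, so treat the exact names as illustrative):

    TEST $CLI volume create $V0 $H0:$B0/1/patchy_snap_mnt \
                                $H0:$B0/2/patchy_snap_mnt \
                                $H0:$B0/3/patchy_snap_mnt
    TEST $CLI volume set $V0 nfs.disable false       # TEST 39: succeeds
    TEST $CLI volume start $V0                       # TEST 42: fails here
    TEST _GFS --attribute-timeout=0 --entry-timeout=0 \
         --volfile-server=$H0 --volfile-id=$V0 $M0   # TEST 44: fails too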


And this is from the brick log, showing that the export directory is not
set up properly.

[2019-04-10 19:54:09.544476] I [MSGID: 100030] [glusterfsd.c:2857:main] 0-/build/install/sbin/glusterfsd: Started running /build/install/sbin/glusterfsd version 7dev (args: /build/install/sbin/glusterfsd -s builder202.int.aws.gluster.org --volfile-id patchy.builder202.int.aws.gluster.org.d-backends-1-patchy_snap_mnt -p /var/run/gluster/vols/patchy/builder202.int.aws.gluster.org-d-backends-1-patchy_snap_mnt.pid -S /var/run/gluster/7ac65190b72da80a.socket --brick-name /d/backends/1/patchy_snap_mnt -l /var/log/glusterfs/bricks/d-backends-1-patchy_snap_mnt.log --xlator-option *-posix.glusterd-uuid=695c060d-74d3-440e-8cdb-327ec297f2d2 --process-name brick --brick-port 49152 --xlator-option patchy-server.listen-port=49152)
[2019-04-10 19:54:09.549394] I [socket.c:962:__socket_server_bind] 0-socket.glusterfsd: closing (AF_UNIX) reuse check socket 9
[2019-04-10 19:54:09.553190] I [MSGID: 101190] [event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-04-10 19:54:09.553209] I [MSGID: 101190] [event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2019-04-10 19:54:09.556932] I [rpcsvc.c:2694:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2019-04-10 19:54:09.557859] E [MSGID: 138001] [index.c:2392:init] 0-patchy-index: Failed to find parent dir (/d/backends/1/patchy_snap_mnt/.glusterfs) of index basepath /d/backends/1/patchy_snap_mnt/.glusterfs/indices. [No such file or directory]  ============================> (.glusterfs is absent)
[2019-04-10 19:54:09.557884] E [MSGID: 101019] [xlator.c:629:xlator_init] 0-patchy-index: Initialization of volume 'patchy-index' failed, review your volfile again
[2019-04-10 19:54:09.557892] E [MSGID: 101066] [graph.c:409:glusterfs_graph_init] 0-patchy-index: initializing translator failed
[2019-04-10 19:54:09.557900] E [MSGID: 101176] [graph.c:772:glusterfs_graph_activate] 0-graph: init failed
[2019-04-10 19:54:09.564154] I [io-stats.c:4033:fini] 0-patchy-io-stats: io-stats translator unloaded
[2019-04-10 19:54:09.564748] W [glusterfsd.c:1592:cleanup_and_exit] (-->/build/install/sbin/glusterfsd(mgmt_getspec_cbk+0x806) [0x411f32] -->/build/install/sbin/glusterfsd(glusterfs_process_volfp+0x272) [0x40b9b9] -->/build/install/sbin/glusterfsd(cleanup_and_exit+0x88) [0x4093a5] ) 0-: received signum (-1), shutting down
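
For reference, the .glusterfs directory is created inside the export
directory when the brick first comes up, and the index translator expects
its basepath underneath it. On a healthy brick the layout looks roughly
like this (a sketch; the exact set of index subdirectories varies by
release):

    ls -a /d/backends/1/patchy_snap_mnt
    # expected (roughly): .  ..  .glusterfs
    ls /d/backends/1/patchy_snap_mnt/.glusterfs/indices
    # expected (roughly): dirty  entry-changes  xattrop

In the failed run .glusterfs itself is missing, so index init fails and
the brick never starts.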


And this is from the cmd_history.log file of the 2nd iteration of uss.t,
from another Jenkins run:

[2019-04-10 15:35:51.927343]:++++++++++ G_LOG:./tests/basic/uss.t: TEST: 39 gluster --mode=script --wignore volume set patchy nfs.disable false ++++++++++
[2019-04-10 15:35:52.038072]  : volume set patchy nfs.disable false : SUCCESS
[2019-04-10 15:35:52.057582]:++++++++++ G_LOG:./tests/basic/uss.t: TEST: 42 gluster --mode=script --wignore volume start patchy ++++++++++
[2019-04-10 15:35:52.104288]  : volume start patchy : FAILED : Failed to find brick directory /d/backends/1/patchy_snap_mnt for volume patchy. Reason : No such file or directory  =========> (export directory is not present)
[2019-04-10 15:35:52.117735]:++++++++++ G_LOG:./tests/basic/uss.t: TEST: 44 _GFS --attribute-timeout=0 --entry-timeout=0 --volfile-server=builder205.int.aws.gluster.org --volfile-id=patchy /mnt/glusterfs/0 ++++++++++


I suspect something wrong in the cleanup sequence: it causes the timeout
of the test in the 1st iteration, and the export directory state it leaves
behind then causes the failure of uss.t in the 2nd iteration.
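
If that theory holds, one fix direction is to make the cleanup sequence
wait for the brick processes to actually exit before tearing down the
backend directories, instead of assuming the kill completed. A minimal
sketch of the idea (illustrative only; this is not the real cleanup() from
tests/include.rc, and the process names and timeout are assumptions):

    # Wait for brick processes to exit before wiping the backends,
    # escalating to SIGKILL if they ignore the initial TERM.
    wait_for_glusterfsd_exit () {
        local tries=30
        while pgrep -x glusterfsd >/dev/null && [ "$tries" -gt 0 ]; do
            sleep 1
            tries=$((tries - 1))
        done
        pgrep -x glusterfsd >/dev/null && pkill -9 -x glusterfsd
    }

    pkill glusterfsd            # polite TERM first
    wait_for_glusterfsd_exit    # block until the bricks are really gone
    rm -rf /d/backends/*        # only now is it safe to remove the exports

That way the next iteration would start from a genuinely clean slate.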


Regards,
Raghavendra



On Wed, Apr 10, 2019 at 4:07 PM FNU Raghavendra Manjunath <rabhat at redhat.com>
wrote:

>
>
> On Wed, Apr 10, 2019 at 9:59 AM Atin Mukherjee <amukherj at redhat.com>
> wrote:
>
>> And now for last 15 days:
>>
>>
>> https://fstat.gluster.org/summary?start_date=2019-03-25&end_date=2019-04-10
>>
>> ./tests/bitrot/bug-1373520.t     18  ==> Fixed through
>> https://review.gluster.org/#/c/glusterfs/+/22481/; I don't see this
>> failing in brick mux after 5th April.
>>
>
> The above patch has been sent to fix the failure with brick mux enabled.
>
>
>> ./tests/bugs/ec/bug-1236065.t     17  ==> happens only in brick mux,
>> needs analysis.
>> ./tests/basic/uss.t             15  ==> happens in both brick-mux and
>> non-brick-mux runs, the test simply times out. Needs urgent analysis.
>>
>
> Nothing has changed in snapview-server and snapview-client recently.
> Looking into it.
>
> ./tests/basic/ec/ec-fix-openfd.t 13  ==> Fixed through
>> https://review.gluster.org/#/c/22508/, patch merged today.
>> ./tests/basic/volfile-sanity.t      8  ==> Some race, though this
>> succeeds on the second attempt every time.
>>
>> There are plenty more tests with 5 instances of failure each. We need
>> all maintainers/owners to look through these failures and fix them; we
>> certainly don't want to get into a state where master is unstable and we
>> have to lock down merges until all these failures are resolved. So
>> please help.
>>
>> (Please note that the fstat stats count retries as failures too, which
>> in a way is right.)
>>
>>
>> On Tue, Feb 26, 2019 at 5:27 PM Atin Mukherjee <amukherj at redhat.com>
>> wrote:
>>
>>> [1] captures the test failure report for the last 30 days, and we need
>>> volunteers/component owners to see why the number of failures is so high
>>> for a few tests.
>>>
>>> [1]
>>> https://fstat.gluster.org/summary?start_date=2019-01-26&end_date=2019-02-25&job=all
>>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
>