<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Al 04/09/13 02:55, En/na Anand Avati ha
escrit:<br>
</div>
<blockquote
cite="mid:CAFboF2w2BTVefSNMaBDexQrVqSYFmjwLsVWj2uUXFXjU3dwqdQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, Sep 3, 2013 at 1:42 AM,
Xavier Hernandez <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:xhernandez@datalab.es" target="_blank">xhernandez@datalab.es</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div>Al 03/09/13 09:33, En/na Anand Avati ha escrit:<br>
</div>
<div class="im">
<blockquote type="cite">
<div dir="ltr">On Mon, Sep 2, 2013 at 7:24 AM,
Xavier Hernandez <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:xhernandez@datalab.es"
target="_blank">xhernandez@datalab.es</a>></span>
wrote:<br>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">Hi,<br>
<br>
dict_t structures are widely used in
glusterfs. I've some ideas that could
improve its performance.<br>
<br>
* On delete operations, return the current
value if it exists.<br>
<br>
This is very useful when we want to get a
value and remove it from the dictionary.
This way it can be done accessing and
locking the dict_t only once (and it is
atomic).<br>
</blockquote>
<div><br>
</div>
<div>Makes sense.</div>
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> * On add
operations, return the previous value if it
existed.<br>
<br>
This avoids to use a lookup and a
conditional add (and it is atomic).<br>
</blockquote>
<div><br>
</div>
<div>Do you mean dict_set()? If so, how do you
propose we differentiate between "failure"
and "previous value did not exist"? Do you
propose setting the previous value into a
pointer to pointer, and retain the return
value as is today?</div>
</div>
</div>
</div>
</blockquote>
</div>
Yes, I'm thinking to something similar to dict_set() (by
the way, I would remove the dict_add() function).</div>
</blockquote>
<div><br>
</div>
<div style="">dict_add() is used in unserialization routines
where dict_set() for a big set of keys guaranteed not to
repeat is very expensive (unserializing would otherwise
have a quadratic function as its asymptote). What is the
reason you intend to remove it?</div>
</div>
</div>
</div>
</blockquote>
Yes, but it is used only inside dict.c itself. It should not be
published in the dict.h. This is only a cosmetic change but I think
it would be better. Any other use of dict_add() by a translator will
be incorrect unless it is made with much care, which is not
desirable for future maintenance.<br>
<br>
<blockquote
cite="mid:CAFboF2w2BTVefSNMaBDexQrVqSYFmjwLsVWj2uUXFXjU3dwqdQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> What you propose
would be the simplest solution right now. However I
think it would be interesting to change the return value
to an error code (this would supply more detailed
information in case of failure and we could use EEXIST
to know if the value already existed. In fact I think it
would be interesting to progressively change the -1
return code of many functions by an error code). The
pointer to pointer argument could be NULL if the
previous value is not needed.<br>
<br>
Of course this would change the function signature,
breaking a lot of existing code. Another possibility
could be to create a dict_replace() function, and
possibly make it to fail if the value didn't exist.</div>
</blockquote>
<div><br>
</div>
<div style="">It is best we do not change the meaning of
existing APIs, and just add new APIs instead. The new API
can be:</div>
<div style=""><br>
</div>
<div style="">int dict_replace (dict_t *dict, const char
*key, data_t *newval, data_t **oldval);</div>
<div style=""><br>
</div>
<div style="">.. and leave dict_set() as is.</div>
<div> </div>
</div>
</div>
</div>
</blockquote>
That would be good. I would allow that dict_replace() sets *oldval
to NULL to indicate that the key did not exist, and maintain the
0/-1 return code to indicate error (this will maintain homogeneity
with other APIs, though I still think that an error code would be
more useful in general, but this would be another topic).<br>
<br>
<blockquote
cite="mid:CAFboF2w2BTVefSNMaBDexQrVqSYFmjwLsVWj2uUXFXjU3dwqdQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div class="im"> <br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> * Always
return the data_pair_t structure instead of
data_t or the data itself.<br>
<br>
This can be useful to avoid future lookups
or other operations on the same element.
Macros can be created to simplify writing
code to access the actual value.<br>
</blockquote>
<div><br>
</div>
<div>The use case is not clear. A more
concrete example will help..</div>
<div><br>
</div>
</div>
</div>
</div>
</blockquote>
</div>
Having a data_pair_t could help to navigate from an
existing element (getting next or previous. This is
really interesting if dict where implemented using a
sorted structure like a trie since it would allow to
process a set of similar entries very fast, like the
trusted.afr.<brick> values for example) or
removing or replacing it without needing another lookup
(a more detailed analysis would be needed to see how to
handle race conditions).<br>
<br>
By the way, is really the dict_t structure used
concurrently ? I haven't analyzed all the code deeply,
but it seems to me that every dict_t is only accessed
from a single place at once.</div>
</blockquote>
<div><br>
</div>
<div style="">There have been instances of dict_t getting
used concurrently, when used as xdata and in xattrop (by
AFR). There have been bugs in the past with concurrent
dict access.</div>
<div> </div>
</div>
</div>
</div>
</blockquote>
I missed that. Sorry.<br>
<br>
<blockquote
cite="mid:CAFboF2w2BTVefSNMaBDexQrVqSYFmjwLsVWj2uUXFXjU3dwqdQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div class="im">
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> * Use a trie
instead of a hash.<br>
<br>
A trie structure is a bit more complex than
a hash, but only processes the key once and
does not need to compute the hash. A test
implementation I made with a trie shows a
significant improvement in dictionary
operations.<br>
</blockquote>
<div><br>
</div>
<div>There is already an implementation of
trie in libglusterfs/src/trie.c. Though it
does not compact (collapse) single-child
nodes upwards into the parent. In any case,
let's avoid having two implementations of
tries.</div>
</div>
</div>
</div>
</blockquote>
</div>
I know. The current implementation wastes a lot of
memory because it uses an array of 256 pointers, and in
some places it needs to traverse the array. Not a b¡g
deal, but if it is made many times it could be
noticeable. In my test I used a trie with 4 child
pointers (with collapsing single-child nodes) that runs
a bit faster than the 256 implementation and uses much
less memory. I tried with 2, 4, 16 and 256 childs per
node, and 4 seems to be the best (at least for
dictionary structures) though there are very little
difference between 4 and 16 in terms of speed.<br>
</div>
</blockquote>
<div><br>
</div>
<div style="">The 256 child pointers give you constant time
lookup for the next level child with just an offset
indirection. With smaller fan-out, do you search through
the list? Can you show an example of this? Collapsing
single child node upwards is badly needed though.</div>
</div>
</div>
</div>
</blockquote>
For the case of 4 childs I split each byte in 4 elements of 2 bits.
This makes it very easy to access the next child. It only needs some
basic logic operations. This is the current implementation for the
lookup function. It shows how it is accessed:<br>
<br>
<pre>/* TRIE_DIMENSION values:
*
* 1: 1 bit per element, 8 elements per byte
* 2: 2 bits per element, 4 elements per byte
* 3: 4 bits per element, 2 elements per byte
* 4: 8 bits per element, 1 element per byte
*/
#define TRIE_DIMENSION 2
#define TRIE_INDEX_BITS (4 - TRIE_DIMENSION)
#define TRIE_ELEM_BITS (1 << (TRIE_DIMENSION - 1))
#define TRIE_CHILDS (1 << TRIE_ELEM_BITS)
#define TRIE_ELEMS_PER_BYTE (1 << TRIE_INDEX_BITS)
#define TRIE_INDEX_MASK (TRIE_ELEMS_PER_BYTE - 1)
#define TRIE_ELEM_MASK (TRIE_CHILDS - 1)
#define sys_trie_value(_data, _offs) \
(((_data) >> (((_offs) & TRIE_INDEX_MASK) * TRIE_ELEM_BITS)) & \
TRIE_ELEM_MASK)
#define sys_trie_value_idx(_data, _offs) \
({ \
off_t __tmp_offs = (_offs); \
sys_trie_value(((_data)[(__tmp_offs) >> TRIE_INDEX_BITS]), \
__tmp_offs); \
})
typedef struct _sys_trie_node
{
struct _sys_trie_node * childs[TRIE_CHILDS];
struct _sys_trie_node * parent;
void * data;
uint32_t count;
uint32_t length;
uint8_t key[0];
} sys_trie_node_t;
typedef struct _sys_trie
{
sys_trie_node_t root;
} sys_trie_t;
sys_trie_node_t * sys_trie_lookup(sys_trie_t * trie,
const uint8_t * key,
size_t length)
{
sys_trie_node_t * node;
size_t len;
len = length << TRIE_INDEX_BITS;
for (node = trie->root.childs[sys_trie_value(*key, 0)];
(node != NULL) && (node->length < len);
node = node->childs[sys_trie_value_idx(key, node->length)]);
if ((node != NULL) && (node->length == len) && (node->data != NULL) &&
(memcmp(node->key, key, length) == 0))
{
return node;
}
return NULL;
}
</pre>
<br>
It's true that the collapsing feature is not really used for
dictionaries, however I think it would be interesting to have it to
use tries for other purposes that may require a long duration data
structure that eventually adds and removes items.<br>
<br>
<blockquote
cite="mid:CAFboF2w2BTVefSNMaBDexQrVqSYFmjwLsVWj2uUXFXjU3dwqdQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> I agree that it is
not good to maintain two implementations of the same
thing. Maybe we could change the trie implementation. It
should be transparent.</div>
</blockquote>
<div><br>
</div>
<div style="">Yes, I believe the current API can accommodate
such internal changes.</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div class="im"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> * Implement
dict_foreach() as a macro (similar to
kernel's list_for_each()).<br>
<br>
This gives more control and avoids the need
of helper functions.<br>
</blockquote>
<div><br>
</div>
<div>This makes sense too, but there are quite
a few users of dict_foreach in the existing
style. Moving them all over might be a pain.</div>
</div>
</div>
</div>
</blockquote>
</div>
Maybe we could create a differently named macro to
implement this feature and allow the developers to
slowly change it. The old implementation could be
flagged as deprecated and use the new one for new code.
Old code will have enough time to change it until
eventually the old implementation is removed.<br>
<br>
If we make important changes to the dict_t structure, we
could replace current functions by macros that use the
new implementation but simulates the old behavior.</div>
</blockquote>
<div><br>
</div>
<div style="">Sounds OK.</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div class="im"> <br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> Additionally,
I think it's possible to redefine structures
to reduce the number of allocations and
pointers used for each element (actual data,
data_t, data_pair_t and key).<br>
</blockquote>
<div><br>
</div>
<div>This is highly desirable. There was some
effort from Amar in the past (<a
moz-do-not-send="true"
href="http://review.gluster.org/3910"
target="_blank">http://review.gluster.org/3910</a>)
but it has been in need of attention for
some time. It would be intersting to know if
you were thinking along similar lines?</div>
<div><br>
</div>
</div>
</div>
</div>
</blockquote>
</div>
Yes, it is quite similar though I should analyze it more
deeply. I would also try to remove some unused/unneeded
fields that are used in very few places, add complexity
and can be replaced easily, like extra_free and
extra_stdfree in dict_t for example.
<div class="im"><br>
</div>
</div>
</blockquote>
<div><br>
</div>
<div style="">Thanks,</div>
<div style="">Avati</div>
<div style=""> </div>
</div>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
Gluster-devel mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Gluster-devel@nongnu.org">Gluster-devel@nongnu.org</a>
<a class="moz-txt-link-freetext" href="https://lists.nongnu.org/mailman/listinfo/gluster-devel">https://lists.nongnu.org/mailman/listinfo/gluster-devel</a>
</pre>
</blockquote>
<br>
</body>
</html>