<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">I would also like to note that each
node can store multiple elements. Current implementation creates a
node for each byte in the key. In my implementation I only create
a node if there is a prefix coincidence between 2 or more keys.
This reduces the number of nodes and the number of indirections.<br>
<br>
Each node stores the key from the beginning to allow very fast
lookup by a single memcmp().<br>
<br>
In normal situations, this implementation is very fast and very
efficient in space.<br>
<br>
Greetings,<br>
<br>
Xavi<br>
<br>
Al 04/09/13 10:10, En/na Xavier Hernandez ha escrit:<br>
</div>
<blockquote cite="mid:5226EAEE.7060901@datalab.es" type="cite">
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
<div class="moz-cite-prefix">Al 04/09/13 02:55, En/na Anand Avati
ha escrit:<br>
</div>
<blockquote
cite="mid:CAFboF2w2BTVefSNMaBDexQrVqSYFmjwLsVWj2uUXFXjU3dwqdQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, Sep 3, 2013 at 1:42 AM,
Xavier Hernandez <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:xhernandez@datalab.es" target="_blank">xhernandez@datalab.es</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div>Al 03/09/13 09:33, En/na Anand Avati ha escrit:<br>
</div>
<div class="im">
<blockquote type="cite">
<div dir="ltr">On Mon, Sep 2, 2013 at 7:24 AM,
Xavier Hernandez <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:xhernandez@datalab.es"
target="_blank">xhernandez@datalab.es</a>></span>
wrote:<br>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">Hi,<br>
<br>
dict_t structures are widely used in
glusterfs. I've some ideas that could
improve its performance.<br>
<br>
* On delete operations, return the current
value if it exists.<br>
<br>
This is very useful when we want to get a
value and remove it from the dictionary.
This way it can be done accessing and
locking the dict_t only once (and it is
atomic).<br>
</blockquote>
<div><br>
</div>
<div>Makes sense.</div>
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> * On add
operations, return the previous value if
it existed.<br>
<br>
This avoids to use a lookup and a
conditional add (and it is atomic).<br>
</blockquote>
<div><br>
</div>
<div>Do you mean dict_set()? If so, how do
you propose we differentiate between
"failure" and "previous value did not
exist"? Do you propose setting the
previous value into a pointer to pointer,
and retain the return value as is today?</div>
</div>
</div>
</div>
</blockquote>
</div>
Yes, I'm thinking to something similar to dict_set()
(by the way, I would remove the dict_add() function).</div>
</blockquote>
<div><br>
</div>
<div style="">dict_add() is used in unserialization
routines where dict_set() for a big set of keys
guaranteed not to repeat is very expensive
(unserializing would otherwise have a quadratic function
as its asymptote). What is the reason you intend to
remove it?</div>
</div>
</div>
</div>
</blockquote>
Yes, but it is used only inside dict.c itself. It should not be
published in the dict.h. This is only a cosmetic change but I
think it would be better. Any other use of dict_add() by a
translator will be incorrect unless it is made with much care,
which is not desirable for future maintenance.<br>
<br>
<blockquote
cite="mid:CAFboF2w2BTVefSNMaBDexQrVqSYFmjwLsVWj2uUXFXjU3dwqdQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> What you propose
would be the simplest solution right now. However I
think it would be interesting to change the return
value to an error code (this would supply more
detailed information in case of failure and we could
use EEXIST to know if the value already existed. In
fact I think it would be interesting to progressively
change the -1 return code of many functions by an
error code). The pointer to pointer argument could be
NULL if the previous value is not needed.<br>
<br>
Of course this would change the function signature,
breaking a lot of existing code. Another possibility
could be to create a dict_replace() function, and
possibly make it to fail if the value didn't exist.</div>
</blockquote>
<div><br>
</div>
<div style="">It is best we do not change the meaning of
existing APIs, and just add new APIs instead. The new
API can be:</div>
<div style=""><br>
</div>
<div style="">int dict_replace (dict_t *dict, const char
*key, data_t *newval, data_t **oldval);</div>
<div style=""><br>
</div>
<div style="">.. and leave dict_set() as is.</div>
<div> </div>
</div>
</div>
</div>
</blockquote>
That would be good. I would allow that dict_replace() sets *oldval
to NULL to indicate that the key did not exist, and maintain the
0/-1 return code to indicate error (this will maintain homogeneity
with other APIs, though I still think that an error code would be
more useful in general, but this would be another topic).<br>
<br>
<blockquote
cite="mid:CAFboF2w2BTVefSNMaBDexQrVqSYFmjwLsVWj2uUXFXjU3dwqdQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div class="im"> <br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> * Always
return the data_pair_t structure instead
of data_t or the data itself.<br>
<br>
This can be useful to avoid future lookups
or other operations on the same element.
Macros can be created to simplify writing
code to access the actual value.<br>
</blockquote>
<div><br>
</div>
<div>The use case is not clear. A more
concrete example will help..</div>
<div><br>
</div>
</div>
</div>
</div>
</blockquote>
</div>
Having a data_pair_t could help to navigate from an
existing element (getting next or previous. This is
really interesting if dict where implemented using a
sorted structure like a trie since it would allow to
process a set of similar entries very fast, like the
trusted.afr.<brick> values for example) or
removing or replacing it without needing another
lookup (a more detailed analysis would be needed to
see how to handle race conditions).<br>
<br>
By the way, is really the dict_t structure used
concurrently ? I haven't analyzed all the code deeply,
but it seems to me that every dict_t is only accessed
from a single place at once.</div>
</blockquote>
<div><br>
</div>
<div style="">There have been instances of dict_t getting
used concurrently, when used as xdata and in xattrop (by
AFR). There have been bugs in the past with concurrent
dict access.</div>
<div> </div>
</div>
</div>
</div>
</blockquote>
I missed that. Sorry.<br>
<br>
<blockquote
cite="mid:CAFboF2w2BTVefSNMaBDexQrVqSYFmjwLsVWj2uUXFXjU3dwqdQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div class="im">
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> * Use a trie
instead of a hash.<br>
<br>
A trie structure is a bit more complex
than a hash, but only processes the key
once and does not need to compute the
hash. A test implementation I made with a
trie shows a significant improvement in
dictionary operations.<br>
</blockquote>
<div><br>
</div>
<div>There is already an implementation of
trie in libglusterfs/src/trie.c. Though it
does not compact (collapse) single-child
nodes upwards into the parent. In any
case, let's avoid having two
implementations of tries.</div>
</div>
</div>
</div>
</blockquote>
</div>
I know. The current implementation wastes a lot of
memory because it uses an array of 256 pointers, and
in some places it needs to traverse the array. Not a
b¡g deal, but if it is made many times it could be
noticeable. In my test I used a trie with 4 child
pointers (with collapsing single-child nodes) that
runs a bit faster than the 256 implementation and uses
much less memory. I tried with 2, 4, 16 and 256 childs
per node, and 4 seems to be the best (at least for
dictionary structures) though there are very little
difference between 4 and 16 in terms of speed.<br>
</div>
</blockquote>
<div><br>
</div>
<div style="">The 256 child pointers give you constant
time lookup for the next level child with just an offset
indirection. With smaller fan-out, do you search through
the list? Can you show an example of this? Collapsing
single child node upwards is badly needed though.</div>
</div>
</div>
</div>
</blockquote>
For the case of 4 childs I split each byte in 4 elements of 2
bits. This makes it very easy to access the next child. It only
needs some basic logic operations. This is the current
implementation for the lookup function. It shows how it is
accessed:<br>
<br>
<pre>/* TRIE_DIMENSION values:
*
* 1: 1 bit per element, 8 elements per byte
* 2: 2 bits per element, 4 elements per byte
* 3: 4 bits per element, 2 elements per byte
* 4: 8 bits per element, 1 element per byte
*/
#define TRIE_DIMENSION 2
#define TRIE_INDEX_BITS (4 - TRIE_DIMENSION)
#define TRIE_ELEM_BITS (1 << (TRIE_DIMENSION - 1))
#define TRIE_CHILDS (1 << TRIE_ELEM_BITS)
#define TRIE_ELEMS_PER_BYTE (1 << TRIE_INDEX_BITS)
#define TRIE_INDEX_MASK (TRIE_ELEMS_PER_BYTE - 1)
#define TRIE_ELEM_MASK (TRIE_CHILDS - 1)
#define sys_trie_value(_data, _offs) \
(((_data) >> (((_offs) & TRIE_INDEX_MASK) * TRIE_ELEM_BITS)) & \
TRIE_ELEM_MASK)
#define sys_trie_value_idx(_data, _offs) \
({ \
off_t __tmp_offs = (_offs); \
sys_trie_value(((_data)[(__tmp_offs) >> TRIE_INDEX_BITS]), \
__tmp_offs); \
})
typedef struct _sys_trie_node
{
struct _sys_trie_node * childs[TRIE_CHILDS];
struct _sys_trie_node * parent;
void * data;
uint32_t count;
uint32_t length;
uint8_t key[0];
} sys_trie_node_t;
typedef struct _sys_trie
{
sys_trie_node_t root;
} sys_trie_t;
sys_trie_node_t * sys_trie_lookup(sys_trie_t * trie,
const uint8_t * key,
size_t length)
{
sys_trie_node_t * node;
size_t len;
len = length << TRIE_INDEX_BITS;
for (node = trie->root.childs[sys_trie_value(*key, 0)];
(node != NULL) && (node->length < len);
node = node->childs[sys_trie_value_idx(key, node->length)]);
if ((node != NULL) && (node->length == len) && (node->data != NULL) &&
(memcmp(node->key, key, length) == 0))
{
return node;
}
return NULL;
}
</pre>
<br>
It's true that the collapsing feature is not really used for
dictionaries, however I think it would be interesting to have it
to use tries for other purposes that may require a long duration
data structure that eventually adds and removes items.<br>
<br>
<blockquote
cite="mid:CAFboF2w2BTVefSNMaBDexQrVqSYFmjwLsVWj2uUXFXjU3dwqdQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> I agree that it
is not good to maintain two implementations of the
same thing. Maybe we could change the trie
implementation. It should be transparent.</div>
</blockquote>
<div><br>
</div>
<div style="">Yes, I believe the current API can
accommodate such internal changes.</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div class="im"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> * Implement
dict_foreach() as a macro (similar to
kernel's list_for_each()).<br>
<br>
This gives more control and avoids the
need of helper functions.<br>
</blockquote>
<div><br>
</div>
<div>This makes sense too, but there are
quite a few users of dict_foreach in the
existing style. Moving them all over might
be a pain.</div>
</div>
</div>
</div>
</blockquote>
</div>
Maybe we could create a differently named macro to
implement this feature and allow the developers to
slowly change it. The old implementation could be
flagged as deprecated and use the new one for new
code. Old code will have enough time to change it
until eventually the old implementation is removed.<br>
<br>
If we make important changes to the dict_t structure,
we could replace current functions by macros that use
the new implementation but simulates the old behavior.</div>
</blockquote>
<div><br>
</div>
<div style="">Sounds OK.</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div class="im"> <br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
Additionally, I think it's possible to
redefine structures to reduce the number
of allocations and pointers used for each
element (actual data, data_t, data_pair_t
and key).<br>
</blockquote>
<div><br>
</div>
<div>This is highly desirable. There was
some effort from Amar in the past (<a
moz-do-not-send="true"
href="http://review.gluster.org/3910"
target="_blank">http://review.gluster.org/3910</a>)
but it has been in need of attention for
some time. It would be intersting to know
if you were thinking along similar lines?</div>
<div><br>
</div>
</div>
</div>
</div>
</blockquote>
</div>
Yes, it is quite similar though I should analyze it
more deeply. I would also try to remove some
unused/unneeded fields that are used in very few
places, add complexity and can be replaced easily,
like extra_free and extra_stdfree in dict_t for
example.
<div class="im"><br>
</div>
</div>
</blockquote>
<div><br>
</div>
<div style="">Thanks,</div>
<div style="">Avati</div>
<div style=""> </div>
</div>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
Gluster-devel mailing list
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:Gluster-devel@nongnu.org">Gluster-devel@nongnu.org</a>
<a moz-do-not-send="true" class="moz-txt-link-freetext" href="https://lists.nongnu.org/mailman/listinfo/gluster-devel">https://lists.nongnu.org/mailman/listinfo/gluster-devel</a>
</pre>
</blockquote>
<br>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
Gluster-devel mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Gluster-devel@nongnu.org">Gluster-devel@nongnu.org</a>
<a class="moz-txt-link-freetext" href="https://lists.nongnu.org/mailman/listinfo/gluster-devel">https://lists.nongnu.org/mailman/listinfo/gluster-devel</a>
</pre>
</blockquote>
<br>
</body>
</html>