<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">I would also like to note that each

      node can store multiple elements. Current implementation creates a

      node for each byte in the key. In my implementation I only create

      a node if there is a prefix coincidence between 2 or more keys.

      This reduces the number of nodes and the number of indirections.<br>

      <br>

      Each node stores the key from the beginning to allow very fast

      lookup by a single memcmp().<br>

      <br>

      In normal situations, this implementation is very fast and very

      efficient in space.<br>

      <br>

      Greetings,<br>

      <br>

      Xavi<br>

      <br>

      Al 04/09/13 10:10, En/na Xavier Hernandez ha escrit:<br>

    </div>

    <blockquote cite="mid:5226EAEE.7060901@datalab.es" type="cite">

      <meta content="text/html; charset=ISO-8859-1"

        http-equiv="Content-Type">

      <div class="moz-cite-prefix">Al 04/09/13 02:55, En/na Anand Avati

        ha escrit:<br>

      </div>

      <blockquote

cite="mid:CAFboF2w2BTVefSNMaBDexQrVqSYFmjwLsVWj2uUXFXjU3dwqdQ@mail.gmail.com"

        type="cite">

        <div dir="ltr">

          <div class="gmail_extra"><br>

            <div class="gmail_quote">On Tue, Sep 3, 2013 at 1:42 AM,

              Xavier Hernandez <span dir="ltr">&lt;<a

                  moz-do-not-send="true"

                  href="mailto:xhernandez@datalab.es" target="_blank">xhernandez@datalab.es</a>&gt;</span>

              wrote:<br>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex">

                <div text="#000000" bgcolor="#FFFFFF">

                  <div>Al 03/09/13 09:33, En/na Anand Avati ha escrit:<br>

                  </div>

                  <div class="im">

                    <blockquote type="cite">

                      <div dir="ltr">On Mon, Sep 2, 2013 at 7:24 AM,

                        Xavier Hernandez <span dir="ltr">&lt;<a

                            moz-do-not-send="true"

                            href="mailto:xhernandez@datalab.es"

                            target="_blank">xhernandez@datalab.es</a>&gt;</span>

                        wrote:<br>

                        <div class="gmail_extra">

                          <div class="gmail_quote">

                            <blockquote class="gmail_quote"

                              style="margin:0 0 0 .8ex;border-left:1px

                              #ccc solid;padding-left:1ex">Hi,<br>

                              <br>

                              dict_t structures are widely used in

                              glusterfs. I've some ideas that could

                              improve its performance.<br>

                              <br>

                              * On delete operations, return the current

                              value if it exists.<br>

                              <br>

                              This is very useful when we want to get a

                              value and remove it from the dictionary.

                              This way it can be done accessing and

                              locking the dict_t only once (and it is

                              atomic).<br>

                            </blockquote>

                            <div><br>

                            </div>

                            <div>Makes sense.</div>

                            <div><br>

                            </div>

                            <div>&nbsp;</div>

                            <blockquote class="gmail_quote"

                              style="margin:0 0 0 .8ex;border-left:1px

                              #ccc solid;padding-left:1ex"> * On add

                              operations, return the previous value if

                              it existed.<br>

                              <br>

                              This avoids to use a lookup and a

                              conditional add (and it is atomic).<br>

                            </blockquote>

                            <div><br>

                            </div>

                            <div>Do you mean dict_set()? If so, how do

                              you propose we differentiate between

                              "failure" and "previous value did not

                              exist"? Do you propose setting the

                              previous value into a pointer to pointer,

                              and retain the return value as is today?</div>

                          </div>

                        </div>

                      </div>

                    </blockquote>

                  </div>

                  Yes, I'm thinking to something similar to dict_set()

                  (by the way, I would remove the dict_add() function).</div>

              </blockquote>

              <div><br>

              </div>

              <div style="">dict_add() is used in unserialization

                routines where dict_set() for a big set of keys

                guaranteed not to repeat is very expensive

                (unserializing would otherwise have a quadratic function

                as its asymptote). What is the reason you intend to

                remove it?</div>

            </div>

          </div>

        </div>

      </blockquote>

      Yes, but it is used only inside dict.c itself. It should not be

      published in the dict.h. This is only a cosmetic change but I

      think it would be better. Any other use of dict_add() by a

      translator will be incorrect unless it is made with much care,

      which is not desirable for future maintenance.<br>

      <br>

      <blockquote

cite="mid:CAFboF2w2BTVefSNMaBDexQrVqSYFmjwLsVWj2uUXFXjU3dwqdQ@mail.gmail.com"

        type="cite">

        <div dir="ltr">

          <div class="gmail_extra">

            <div class="gmail_quote">

              <div>&nbsp;</div>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex">

                <div text="#000000" bgcolor="#FFFFFF"> What you propose

                  would be the simplest solution right now. However I

                  think it would be interesting to change the return

                  value to an error code (this would supply more

                  detailed information in case of failure and we could

                  use EEXIST to know if the value already existed. In

                  fact I think it would be interesting to progressively

                  change the -1 return code of many functions by an

                  error code). The pointer to pointer argument could be

                  NULL if the previous value is not needed.<br>

                  <br>

                  Of course this would change the function signature,

                  breaking a lot of existing code. Another possibility

                  could be to create a dict_replace() function, and

                  possibly make it to fail if the value didn't exist.</div>

              </blockquote>

              <div><br>

              </div>

              <div style="">It is best we do not change the meaning of

                existing APIs, and just add new APIs instead. The new

                API can be:</div>

              <div style=""><br>

              </div>

              <div style="">int dict_replace (dict_t *dict, const char

                *key, data_t *newval, data_t **oldval);</div>

              <div style=""><br>

              </div>

              <div style="">.. and leave dict_set() as is.</div>

              <div>&nbsp;</div>

            </div>

          </div>

        </div>

      </blockquote>

      That would be good. I would allow that dict_replace() sets *oldval

      to NULL to indicate that the key did not exist, and maintain the

      0/-1 return code to indicate error (this will maintain homogeneity

      with other APIs, though I still think that an error code would be

      more useful in general, but this would be another topic).<br>

      <br>

      <blockquote

cite="mid:CAFboF2w2BTVefSNMaBDexQrVqSYFmjwLsVWj2uUXFXjU3dwqdQ@mail.gmail.com"

        type="cite">

        <div dir="ltr">

          <div class="gmail_extra">

            <div class="gmail_quote">

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex">

                <div text="#000000" bgcolor="#FFFFFF">

                  <div class="im"> <br>

                    <blockquote type="cite">

                      <div dir="ltr">

                        <div class="gmail_extra">

                          <div class="gmail_quote">

                            <div>&nbsp;</div>

                            <blockquote class="gmail_quote"

                              style="margin:0 0 0 .8ex;border-left:1px

                              #ccc solid;padding-left:1ex"> * Always

                              return the data_pair_t structure instead

                              of data_t or the data itself.<br>

                              <br>

                              This can be useful to avoid future lookups

                              or other operations on the same element.

                              Macros can be created to simplify writing

                              code to access the actual value.<br>

                            </blockquote>

                            <div><br>

                            </div>

                            <div>The use case is not clear. A more

                              concrete example will help..</div>

                            <div><br>

                            </div>

                          </div>

                        </div>

                      </div>

                    </blockquote>

                  </div>

                  Having a data_pair_t could help to navigate from an

                  existing element (getting next or previous. This is

                  really interesting if dict where implemented using a

                  sorted structure like a trie since it would allow to

                  process a set of similar entries very fast, like the

                  trusted.afr.&lt;brick&gt; values for example) or

                  removing or replacing it without needing another

                  lookup (a more detailed analysis would be needed to

                  see how to handle race conditions).<br>

                  <br>

                  By the way, is really the dict_t structure used

                  concurrently ? I haven't analyzed all the code deeply,

                  but it seems to me that every dict_t is only accessed

                  from a single place at once.</div>

              </blockquote>

              <div><br>

              </div>

              <div style="">There have been instances of dict_t getting

                used concurrently, when used as xdata and in xattrop (by

                AFR). There have been bugs in the past with concurrent

                dict access.</div>

              <div>&nbsp;</div>

            </div>

          </div>

        </div>

      </blockquote>

      I missed that. Sorry.<br>

      <br>

      <blockquote

cite="mid:CAFboF2w2BTVefSNMaBDexQrVqSYFmjwLsVWj2uUXFXjU3dwqdQ@mail.gmail.com"

        type="cite">

        <div dir="ltr">

          <div class="gmail_extra">

            <div class="gmail_quote">

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex">

                <div text="#000000" bgcolor="#FFFFFF">

                  <div class="im">

                    <blockquote type="cite">

                      <div dir="ltr">

                        <div class="gmail_extra">

                          <div class="gmail_quote">

                            <div>&nbsp;</div>

                            <blockquote class="gmail_quote"

                              style="margin:0 0 0 .8ex;border-left:1px

                              #ccc solid;padding-left:1ex"> * Use a trie

                              instead of a hash.<br>

                              <br>

                              A trie structure is a bit more complex

                              than a hash, but only processes the key

                              once and does not need to compute the

                              hash. A test implementation I made with a

                              trie shows a significant improvement in

                              dictionary operations.<br>

                            </blockquote>

                            <div><br>

                            </div>

                            <div>There is already an implementation of

                              trie in libglusterfs/src/trie.c. Though it

                              does not compact (collapse) single-child

                              nodes upwards into the parent. In any

                              case, let's avoid having two

                              implementations of tries.</div>

                          </div>

                        </div>

                      </div>

                    </blockquote>

                  </div>

                  I know. The current implementation wastes a lot of

                  memory because it uses an array of 256 pointers, and

                  in some places it needs to traverse the array. Not a

                  b&iexcl;g deal, but if it is made many times it could be

                  noticeable. In my test I used a trie with 4 child

                  pointers (with collapsing single-child nodes) that

                  runs a bit faster than the 256 implementation and uses

                  much less memory. I tried with 2, 4, 16 and 256 childs

                  per node, and 4 seems to be the best (at least for

                  dictionary structures) though there are very little

                  difference between 4 and 16 in terms of speed.<br>

                </div>

              </blockquote>

              <div><br>

              </div>

              <div style="">The 256 child pointers give you constant

                time lookup for the next level child with just an offset

                indirection. With smaller fan-out, do you search through

                the list? Can you show an example of this? Collapsing

                single child node upwards is badly needed though.</div>

            </div>

          </div>

        </div>

      </blockquote>

      For the case of 4 childs I split each byte in 4 elements of 2

      bits. This makes it very easy to access the next child. It only

      needs some basic logic operations. This is the current

      implementation for the lookup function. It shows how it is

      accessed:<br>

      <br>

      <pre>/* TRIE_DIMENSION values:

 *

 *    1: 1 bit per element, 8 elements per byte

 *    2: 2 bits per element, 4 elements per byte

 *    3: 4 bits per element, 2 elements per byte

 *    4: 8 bits per element, 1 element per byte

 */

#define TRIE_DIMENSION      2

#define TRIE_INDEX_BITS     (4 - TRIE_DIMENSION)

#define TRIE_ELEM_BITS      (1 &lt;&lt; (TRIE_DIMENSION - 1))

#define TRIE_CHILDS         (1 &lt;&lt; TRIE_ELEM_BITS)

#define TRIE_ELEMS_PER_BYTE (1 &lt;&lt; TRIE_INDEX_BITS)

#define TRIE_INDEX_MASK     (TRIE_ELEMS_PER_BYTE - 1)

#define TRIE_ELEM_MASK      (TRIE_CHILDS - 1)

#define sys_trie_value(_data, _offs) \

    (((_data) &gt;&gt; (((_offs) &amp; TRIE_INDEX_MASK) * TRIE_ELEM_BITS)) &amp; \

     TRIE_ELEM_MASK)

#define sys_trie_value_idx(_data, _offs) \

    ({ \

        off_t __tmp_offs = (_offs); \

        sys_trie_value(((_data)[(__tmp_offs) &gt;&gt; TRIE_INDEX_BITS]), \

                       __tmp_offs); \

    })

typedef struct _sys_trie_node

{

    struct _sys_trie_node * childs[TRIE_CHILDS];

    struct _sys_trie_node * parent;

    void *                  data;

    uint32_t                count;

    uint32_t                length;

    uint8_t                 key[0];

} sys_trie_node_t;                                                              

typedef struct _sys_trie

{

    sys_trie_node_t root;

} sys_trie_t;

sys_trie_node_t * sys_trie_lookup(sys_trie_t * trie,

                                  const uint8_t * key,

                                  size_t length)

{

    sys_trie_node_t * node;

    size_t len;

    len = length &lt;&lt; TRIE_INDEX_BITS;

    for (node = trie-&gt;root.childs[sys_trie_value(*key, 0)];

         (node != NULL) &amp;&amp; (node-&gt;length &lt; len);

         node = node-&gt;childs[sys_trie_value_idx(key, node-&gt;length)]);

    if ((node != NULL) &amp;&amp; (node-&gt;length == len) &amp;&amp; (node-&gt;data != NULL) &amp;&amp;

        (memcmp(node-&gt;key, key, length) == 0))

    {

        return node;

    }

    return NULL;

}

</pre>

      <br>

      It's true that the collapsing feature is not really used for

      dictionaries, however I think it would be interesting to have it

      to use tries for other purposes that may require a long duration

      data structure that eventually adds and removes items.<br>

      <br>

      <blockquote

cite="mid:CAFboF2w2BTVefSNMaBDexQrVqSYFmjwLsVWj2uUXFXjU3dwqdQ@mail.gmail.com"

        type="cite">

        <div dir="ltr">

          <div class="gmail_extra">

            <div class="gmail_quote">

              <div>&nbsp;</div>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex">

                <div text="#000000" bgcolor="#FFFFFF"> I agree that it

                  is not good to maintain two implementations of the

                  same thing. Maybe we could change the trie

                  implementation. It should be transparent.</div>

              </blockquote>

              <div><br>

              </div>

              <div style="">Yes, I believe the current API can

                accommodate such internal changes.</div>

              <div>&nbsp;</div>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex">

                <div text="#000000" bgcolor="#FFFFFF">

                  <div class="im"><br>

                    <br>

                    <blockquote type="cite">

                      <div dir="ltr">

                        <div class="gmail_extra">

                          <div class="gmail_quote">

                            <blockquote class="gmail_quote"

                              style="margin:0 0 0 .8ex;border-left:1px

                              #ccc solid;padding-left:1ex"> * Implement

                              dict_foreach() as a macro (similar to

                              kernel's list_for_each()).<br>

                              <br>

                              This gives more control and avoids the

                              need of helper functions.<br>

                            </blockquote>

                            <div><br>

                            </div>

                            <div>This makes sense too, but there are

                              quite a few users of dict_foreach in the

                              existing style. Moving them all over might

                              be a pain.</div>

                          </div>

                        </div>

                      </div>

                    </blockquote>

                  </div>

                  Maybe we could create a differently named macro to

                  implement this feature and allow the developers to

                  slowly change it. The old implementation could be

                  flagged as deprecated and use the new one for new

                  code. Old code will have enough time to change it

                  until eventually the old implementation is removed.<br>

                  <br>

                  If we make important changes to the dict_t structure,

                  we could replace current functions by macros that use

                  the new implementation but simulates the old behavior.</div>

              </blockquote>

              <div><br>

              </div>

              <div style="">Sounds OK.</div>

              <div>&nbsp;</div>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex">

                <div text="#000000" bgcolor="#FFFFFF">

                  <div class="im"> <br>

                    <blockquote type="cite">

                      <div dir="ltr">

                        <div class="gmail_extra">

                          <div class="gmail_quote">

                            <div>&nbsp;</div>

                            <blockquote class="gmail_quote"

                              style="margin:0 0 0 .8ex;border-left:1px

                              #ccc solid;padding-left:1ex">

                              Additionally, I think it's possible to

                              redefine structures to reduce the number

                              of allocations and pointers used for each

                              element (actual data, data_t, data_pair_t

                              and key).<br>

                            </blockquote>

                            <div><br>

                            </div>

                            <div>This is highly desirable. There was

                              some effort from Amar in the past (<a

                                moz-do-not-send="true"

                                href="http://review.gluster.org/3910"

                                target="_blank">http://review.gluster.org/3910</a>)

                              but it has been in need of attention for

                              some time. It would be intersting to know

                              if you were thinking along similar lines?</div>

                            <div><br>

                            </div>

                          </div>

                        </div>

                      </div>

                    </blockquote>

                  </div>

                  Yes, it is quite similar though I should analyze it

                  more deeply. I would also try to remove some

                  unused/unneeded fields that are used in very few

                  places, add complexity and can be replaced easily,

                  like extra_free and extra_stdfree in dict_t for

                  example.

                  <div class="im"><br>

                  </div>

                </div>

              </blockquote>

              <div><br>

              </div>

              <div style="">Thanks,</div>

              <div style="">Avati</div>

              <div style="">&nbsp;</div>

            </div>

          </div>

        </div>

        <br>

        <fieldset class="mimeAttachmentHeader"></fieldset>

        <br>

        <pre wrap="">_______________________________________________

Gluster-devel mailing list

<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:Gluster-devel@nongnu.org">Gluster-devel@nongnu.org</a>

<a moz-do-not-send="true" class="moz-txt-link-freetext" href="https://lists.nongnu.org/mailman/listinfo/gluster-devel">https://lists.nongnu.org/mailman/listinfo/gluster-devel</a>

</pre>

      </blockquote>

      <br>

      <br>

      <fieldset class="mimeAttachmentHeader"></fieldset>

      <br>

      <pre wrap="">_______________________________________________

Gluster-devel mailing list

<a class="moz-txt-link-abbreviated" href="mailto:Gluster-devel@nongnu.org">Gluster-devel@nongnu.org</a>

<a class="moz-txt-link-freetext" href="https://lists.nongnu.org/mailman/listinfo/gluster-devel">https://lists.nongnu.org/mailman/listinfo/gluster-devel</a>

</pre>

    </blockquote>

    <br>

  </body>

</html>