<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">Al 04/09/13 02:55, En/na Anand Avati ha
      escrit:<br>
    </div>
    <blockquote
cite="mid:CAFboF2w2BTVefSNMaBDexQrVqSYFmjwLsVWj2uUXFXjU3dwqdQ@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Tue, Sep 3, 2013 at 1:42 AM,
            Xavier Hernandez <span dir="ltr">&lt;<a
                moz-do-not-send="true"
                href="mailto:xhernandez@datalab.es" target="_blank">xhernandez@datalab.es</a>&gt;</span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div text="#000000" bgcolor="#FFFFFF">
                <div>Al 03/09/13 09:33, En/na Anand Avati ha escrit:<br>
                </div>
                <div class="im">
                  <blockquote type="cite">
                    <div dir="ltr">On Mon, Sep 2, 2013 at 7:24 AM,
                      Xavier Hernandez <span dir="ltr">&lt;<a
                          moz-do-not-send="true"
                          href="mailto:xhernandez@datalab.es"
                          target="_blank">xhernandez@datalab.es</a>&gt;</span>
                      wrote:<br>
                      <div class="gmail_extra">
                        <div class="gmail_quote">
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex">Hi,<br>
                            <br>
                            dict_t structures are widely used in
                            glusterfs. I've some ideas that could
                            improve its performance.<br>
                            <br>
                            * On delete operations, return the current
                            value if it exists.<br>
                            <br>
                            This is very useful when we want to get a
                            value and remove it from the dictionary.
                            This way it can be done accessing and
                            locking the dict_t only once (and it is
                            atomic).<br>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>Makes sense.</div>
                          <div><br>
                          </div>
                          <div>&nbsp;</div>
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex"> * On add
                            operations, return the previous value if it
                            existed.<br>
                            <br>
                            This avoids to use a lookup and a
                            conditional add (and it is atomic).<br>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>Do you mean dict_set()? If so, how do you
                            propose we differentiate between "failure"
                            and "previous value did not exist"? Do you
                            propose setting the previous value into a
                            pointer to pointer, and retain the return
                            value as is today?</div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                Yes, I'm thinking to something similar to dict_set() (by
                the way, I would remove the dict_add() function).</div>
            </blockquote>
            <div><br>
            </div>
            <div style="">dict_add() is used in unserialization routines
              where dict_set() for a big set of keys guaranteed not to
              repeat is very expensive (unserializing would otherwise
              have a quadratic function as its asymptote). What is the
              reason you intend to remove it?</div>
          </div>
        </div>
      </div>
    </blockquote>
    Yes, but it is used only inside dict.c itself. It should not be
    published in the dict.h. This is only a cosmetic change but I think
    it would be better. Any other use of dict_add() by a translator will
    be incorrect unless it is made with much care, which is not
    desirable for future maintenance.<br>
    <br>
    <blockquote
cite="mid:CAFboF2w2BTVefSNMaBDexQrVqSYFmjwLsVWj2uUXFXjU3dwqdQ@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div>&nbsp;</div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div text="#000000" bgcolor="#FFFFFF"> What you propose
                would be the simplest solution right now. However I
                think it would be interesting to change the return value
                to an error code (this would supply more detailed
                information in case of failure and we could use EEXIST
                to know if the value already existed. In fact I think it
                would be interesting to progressively change the -1
                return code of many functions by an error code). The
                pointer to pointer argument could be NULL if the
                previous value is not needed.<br>
                <br>
                Of course this would change the function signature,
                breaking a lot of existing code. Another possibility
                could be to create a dict_replace() function, and
                possibly make it to fail if the value didn't exist.</div>
            </blockquote>
            <div><br>
            </div>
            <div style="">It is best we do not change the meaning of
              existing APIs, and just add new APIs instead. The new API
              can be:</div>
            <div style=""><br>
            </div>
            <div style="">int dict_replace (dict_t *dict, const char
              *key, data_t *newval, data_t **oldval);</div>
            <div style=""><br>
            </div>
            <div style="">.. and leave dict_set() as is.</div>
            <div>&nbsp;</div>
          </div>
        </div>
      </div>
    </blockquote>
    That would be good. I would allow that dict_replace() sets *oldval
    to NULL to indicate that the key did not exist, and maintain the
    0/-1 return code to indicate error (this will maintain homogeneity
    with other APIs, though I still think that an error code would be
    more useful in general, but this would be another topic).<br>
    <br>
    <blockquote
cite="mid:CAFboF2w2BTVefSNMaBDexQrVqSYFmjwLsVWj2uUXFXjU3dwqdQ@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div text="#000000" bgcolor="#FFFFFF">
                <div class="im"> <br>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div class="gmail_extra">
                        <div class="gmail_quote">
                          <div>&nbsp;</div>
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex"> * Always
                            return the data_pair_t structure instead of
                            data_t or the data itself.<br>
                            <br>
                            This can be useful to avoid future lookups
                            or other operations on the same element.
                            Macros can be created to simplify writing
                            code to access the actual value.<br>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>The use case is not clear. A more
                            concrete example will help..</div>
                          <div><br>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                Having a data_pair_t could help to navigate from an
                existing element (getting next or previous. This is
                really interesting if dict where implemented using a
                sorted structure like a trie since it would allow to
                process a set of similar entries very fast, like the
                trusted.afr.&lt;brick&gt; values for example) or
                removing or replacing it without needing another lookup
                (a more detailed analysis would be needed to see how to
                handle race conditions).<br>
                <br>
                By the way, is really the dict_t structure used
                concurrently ? I haven't analyzed all the code deeply,
                but it seems to me that every dict_t is only accessed
                from a single place at once.</div>
            </blockquote>
            <div><br>
            </div>
            <div style="">There have been instances of dict_t getting
              used concurrently, when used as xdata and in xattrop (by
              AFR). There have been bugs in the past with concurrent
              dict access.</div>
            <div>&nbsp;</div>
          </div>
        </div>
      </div>
    </blockquote>
    I missed that. Sorry.<br>
    <br>
    <blockquote
cite="mid:CAFboF2w2BTVefSNMaBDexQrVqSYFmjwLsVWj2uUXFXjU3dwqdQ@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div text="#000000" bgcolor="#FFFFFF">
                <div class="im">
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div class="gmail_extra">
                        <div class="gmail_quote">
                          <div>&nbsp;</div>
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex"> * Use a trie
                            instead of a hash.<br>
                            <br>
                            A trie structure is a bit more complex than
                            a hash, but only processes the key once and
                            does not need to compute the hash. A test
                            implementation I made with a trie shows a
                            significant improvement in dictionary
                            operations.<br>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>There is already an implementation of
                            trie in libglusterfs/src/trie.c. Though it
                            does not compact (collapse) single-child
                            nodes upwards into the parent. In any case,
                            let's avoid having two implementations of
                            tries.</div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                I know. The current implementation wastes a lot of
                memory because it uses an array of 256 pointers, and in
                some places it needs to traverse the array. Not a b&iexcl;g
                deal, but if it is made many times it could be
                noticeable. In my test I used a trie with 4 child
                pointers (with collapsing single-child nodes) that runs
                a bit faster than the 256 implementation and uses much
                less memory. I tried with 2, 4, 16 and 256 childs per
                node, and 4 seems to be the best (at least for
                dictionary structures) though there are very little
                difference between 4 and 16 in terms of speed.<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div style="">The 256 child pointers give you constant time
              lookup for the next level child with just an offset
              indirection. With smaller fan-out, do you search through
              the list? Can you show an example of this? Collapsing
              single child node upwards is badly needed though.</div>
          </div>
        </div>
      </div>
    </blockquote>
    For the case of 4 childs I split each byte in 4 elements of 2 bits.
    This makes it very easy to access the next child. It only needs some
    basic logic operations. This is the current implementation for the
    lookup function. It shows how it is accessed:<br>
    <br>
    <pre>/* TRIE_DIMENSION values:
 *
 *    1: 1 bit per element, 8 elements per byte
 *    2: 2 bits per element, 4 elements per byte
 *    3: 4 bits per element, 2 elements per byte
 *    4: 8 bits per element, 1 element per byte
 */

#define TRIE_DIMENSION      2
#define TRIE_INDEX_BITS     (4 - TRIE_DIMENSION)
#define TRIE_ELEM_BITS      (1 &lt;&lt; (TRIE_DIMENSION - 1))
#define TRIE_CHILDS         (1 &lt;&lt; TRIE_ELEM_BITS)
#define TRIE_ELEMS_PER_BYTE (1 &lt;&lt; TRIE_INDEX_BITS)
#define TRIE_INDEX_MASK     (TRIE_ELEMS_PER_BYTE - 1)
#define TRIE_ELEM_MASK      (TRIE_CHILDS - 1)

#define sys_trie_value(_data, _offs) \
    (((_data) &gt;&gt; (((_offs) &amp; TRIE_INDEX_MASK) * TRIE_ELEM_BITS)) &amp; \
     TRIE_ELEM_MASK)

#define sys_trie_value_idx(_data, _offs) \
    ({ \
        off_t __tmp_offs = (_offs); \
        sys_trie_value(((_data)[(__tmp_offs) &gt;&gt; TRIE_INDEX_BITS]), \
                       __tmp_offs); \
    })

typedef struct _sys_trie_node
{
    struct _sys_trie_node * childs[TRIE_CHILDS];
    struct _sys_trie_node * parent;
    void *                  data;
    uint32_t                count;
    uint32_t                length;
    uint8_t                 key[0];
} sys_trie_node_t;                                                              

typedef struct _sys_trie
{
    sys_trie_node_t root;
} sys_trie_t;

sys_trie_node_t * sys_trie_lookup(sys_trie_t * trie,
                                  const uint8_t * key,
                                  size_t length)
{
    sys_trie_node_t * node;
    size_t len;

    len = length &lt;&lt; TRIE_INDEX_BITS;
    for (node = trie-&gt;root.childs[sys_trie_value(*key, 0)];
         (node != NULL) &amp;&amp; (node-&gt;length &lt; len);
         node = node-&gt;childs[sys_trie_value_idx(key, node-&gt;length)]);

    if ((node != NULL) &amp;&amp; (node-&gt;length == len) &amp;&amp; (node-&gt;data != NULL) &amp;&amp;
        (memcmp(node-&gt;key, key, length) == 0))
    {
        return node;
    }
    return NULL;
}
</pre>
    <br>
    It's true that the collapsing feature is not really used for
    dictionaries, however I think it would be interesting to have it to
    use tries for other purposes that may require a long duration data
    structure that eventually adds and removes items.<br>
    <br>
    <blockquote
cite="mid:CAFboF2w2BTVefSNMaBDexQrVqSYFmjwLsVWj2uUXFXjU3dwqdQ@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div>&nbsp;</div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div text="#000000" bgcolor="#FFFFFF"> I agree that it is
                not good to maintain two implementations of the same
                thing. Maybe we could change the trie implementation. It
                should be transparent.</div>
            </blockquote>
            <div><br>
            </div>
            <div style="">Yes, I believe the current API can accommodate
              such internal changes.</div>
            <div>&nbsp;</div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div text="#000000" bgcolor="#FFFFFF">
                <div class="im"><br>
                  <br>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div class="gmail_extra">
                        <div class="gmail_quote">
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex"> * Implement
                            dict_foreach() as a macro (similar to
                            kernel's list_for_each()).<br>
                            <br>
                            This gives more control and avoids the need
                            of helper functions.<br>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>This makes sense too, but there are quite
                            a few users of dict_foreach in the existing
                            style. Moving them all over might be a pain.</div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                Maybe we could create a differently named macro to
                implement this feature and allow the developers to
                slowly change it. The old implementation could be
                flagged as deprecated and use the new one for new code.
                Old code will have enough time to change it until
                eventually the old implementation is removed.<br>
                <br>
                If we make important changes to the dict_t structure, we
                could replace current functions by macros that use the
                new implementation but simulates the old behavior.</div>
            </blockquote>
            <div><br>
            </div>
            <div style="">Sounds OK.</div>
            <div>&nbsp;</div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div text="#000000" bgcolor="#FFFFFF">
                <div class="im"> <br>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div class="gmail_extra">
                        <div class="gmail_quote">
                          <div>&nbsp;</div>
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex"> Additionally,
                            I think it's possible to redefine structures
                            to reduce the number of allocations and
                            pointers used for each element (actual data,
                            data_t, data_pair_t and key).<br>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>This is highly desirable. There was some
                            effort from Amar in the past (<a
                              moz-do-not-send="true"
                              href="http://review.gluster.org/3910"
                              target="_blank">http://review.gluster.org/3910</a>)
                            but it has been in need of attention for
                            some time. It would be intersting to know if
                            you were thinking along similar lines?</div>
                          <div><br>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                Yes, it is quite similar though I should analyze it more
                deeply. I would also try to remove some unused/unneeded
                fields that are used in very few places, add complexity
                and can be replaced easily, like extra_free and
                extra_stdfree in dict_t for example.
                <div class="im"><br>
                </div>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div style="">Thanks,</div>
            <div style="">Avati</div>
            <div style="">&nbsp;</div>
          </div>
        </div>
      </div>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
      <pre wrap="">_______________________________________________
Gluster-devel mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Gluster-devel@nongnu.org">Gluster-devel@nongnu.org</a>
<a class="moz-txt-link-freetext" href="https://lists.nongnu.org/mailman/listinfo/gluster-devel">https://lists.nongnu.org/mailman/listinfo/gluster-devel</a>
</pre>
    </blockquote>
    <br>
  </body>
</html>