Monday, 30 July 2007

XEP33: Tell limits in disco#info response using XEP128

Some time ago I explained why The limit of addresses in XEP33 must be fixed. Of the proposed solutions, I prefer #3: 'Configurable limit, and method to inform'.

A method to inform is using XEP-0128: Service Discovery Extensions, as proposed by Ralphm in a comment.

How does this work? Currently, when an entity wants to send a XEP33 stanza, it first checks if there is a XEP33-enabled service available. To check that, it queries disco#info to the service, and looks for in the response.

If we also use XEP128, the disco#info response will not only announce XEP33 support, but also announce which are the exact limits in effect in the service.

Let's see an example. The Jabber server capulet.com wants to send a stanza with XEP33 addresses to the Jabber server shakespeare.lit. The response announces XEP33 support, and also provides information of several limits:

<iq type='get'
from='capulet.com'
to='shakespeare.lit'
id='disco1'>
<query xmlns='http://jabber.org/protocol/disco#info'/>
</iq>

<iq type='result'
from='shakespeare.lit'
to='capulet.com'
id='disco1'>
<query xmlns='http://jabber.org/protocol/disco#info'>
<identity
category='server'
type='im'
name='shakespeare.lit jabber server'/>
...
<feature var='http://jabber.org/protocol/address'/>
<x xmlns='jabber:x:data' type='result'>
<field var='FORM_TYPE' type='hidden'>
<value>http://jabber.org/protocol/address</value>
</field>
<field var='limit-remote-user'>
<value>20</value>
</field>
<field var='limit-remote-service'>
<value>100</value>
</field>
<field var='limit-local-user'>
<value>300</value>
</field>
</x>
...
</query>
</iq>

Saturday, 28 July 2007

XEP33: types of limits and default values

I've talked in the past that, if we want to limit the number of destination addresses in a XEP33 stanza, we must fix the XEP. One improvement I propose is to tell the limits in disco#info response using XEP128.

Let's suppose that limiting the number of destination addresses in a XEP33 stanza really serves a purpose, for example, to prevent or reduce abuse of the multicast service. If you accept this supposition, then you probably want fine-grained limits, so you can define different limits depending in the sending entity, type of stanza...

The different characteristics of a XEP33 stanza are:

  • sender is: local or remote
  • sender is: user or server/service/component
  • the stanza type is: message or presence. Note that iq stanzas don't directly include XEP33 addresses.
There are eight possible permutations, and I associate a limit to each one. Let's review them in detail:
  • limit-remote-user-message: A spammer can create accounts in a remote friendly server, and send XEP33 stanzas to our multicast service. I think that if a user wants to use a multicast service, he should ask the administrator of his server to install it, instead of using our multicast service. So, this limit should be 0 by default, which means users of remote servers can't send XEP33 stanzas directly to us.
  • limit-remote-user-presence: Same as above, default: 0.
  • limit-remote-server-message: There are many reasons for a remote server to send us a message stanza with XEP33 addresses: MUC message, pubsub message, user message... I propose a default limit of 20.
  • limit-remote-server-presence: A remote server sends presence stanzas when a user logins, logouts or changes presence.
    I consider that presence stanzas are not as annoying as message or iq stanzas. I propose a default limit of 100.
  • limit-local-user-message: In a publicly accesible Jabber server with unrestricted account registration (such as jabber.org), spammers can create accounts in jabber.org and use the local multicast service to send spam messages both to local and remote users/servers. I propose a default value of 20.
  • limit-local-user-presence: I don't see any reason for a user to send a presence stanza to several destinations. Same as with remote servers, I propose a default limit of 100.
  • limit-local-server-message: Obviously, this limit should be infinite always, since a multicast service is expected to trust a local server.
  • limit-local-server-presence: Same as above, default: infinite.
Summarizing I propose those limits::
  • zero: remote-user-*
  • infinite: local-server-*
  • variable: remote-server-* and local-user-*
Once XEP33 defines exact default values, if the limits in a XEP33 deployment are the default values, those limits SHOULD NOT be reported in disco#info responses.


PD: Another possible limitations to reduce abuse of a multicast service are # of messages per minute, # of addresses per minute, # of total bytes sent... But I don't expect them to be interesting for inclusion in XEP33.

Tuesday, 17 July 2007

Midterm GSoC project status, and remaining work

The Google Summer of Code program is half way to the end. This is a summary of the accomplished work until now:

  • Implemented or updated several small XEPs in ejabberd: Contact Addresses, Delayed Delivery... Patches are awaiting review and integration in ejabberd.
  • Implemented XEP33 in a Jabber component for ejabberd. All the code lives in mod_multicast.erl. Code is currently in ejabberd-modules SVN.
  • Describe Sever Active Multicast
  • Describe how to implement XEP33 in the server's C2S.
  • Added XEP33 support to ejabberd core (sending presence updates). Code in ejabberd-modules SVN.
  • Describe how to implement XEP33 in a MUC service
  • Added XEP33 support to ejabberd's mod_muc. Code in ejabberd-modules SVN.
  • Alpha-testing the code with small examples. Fix any bug, improve any potential drawback in the existing code.
The remaining tasks that I'm aware of, from now until the end of my GSoC project are:
  • Discuss improvements in the current XEP33 definition of limits.
  • Implement the improvements for limits.
  • Discuss potential security and spam vulnerabilities, and how to prevent them.
  • Propose improvements for XEP33. Most of the text can be reused from my previous blog posts.
  • Perform benchmarks to check mod_multicast's effect in CPU, RAM and traffic consumption.
  • Test compatibility with other XEP33 existing implementations.
  • Add XEP33 support to mod_pubsub and/or mod_pep if their codebase is stable at the time
  • Write documentation for ejabberd Guide.
  • Implement or update ejabberd's XEP133 Service Administration
  • Wait for ejabberd code reviewers, in case I need to fix any problem in my code before commiting to ejabberd.
I weekly update the project timeline. Considering my speed at working, and the difficulty of the remaining tasks, I consider my project to be on track, and will be completed on time.

mod_multicast - Big rewrite: replaced loop with pool

During the past days I concentrated in improving a critical part of mod_multicast: the code that checks for protocol support on remote servers.

Originally all the processing for a user stanza was done in a single, large, procedural run. This included a loop that slowly checks XEP33 support for each remote server, sends the stanza accordingly, and then proceeds to the next server.

Now, the loop only sends the stanzas to the servers which support is already known. For the unknown servers, it only sends the iq:query disco#info request, but does not wait for an answer. Instead, the group of destinations related to that server (which are now considered 'Waiters') are temporarily stored in a 'Pool'. Eventually, a server answers, or an error is received; mod_multicast reads from the Pool the Waiters group and finally sends the stanza.

Future work: I must fix new bugs, related to the new code. Later I'll reconsider if the current Pool needs further changes. And finally, I can go back to the XEP33 problem with the addresses limits.

Monday, 9 July 2007

The limit of addresses in XEP33 must be fixed

Problem

XEP33 says that a server should have a limit for the maximum number of addresses allowed on a single packet: the limit SHOULD be more than 20 and less than 100.

That limit is easy to implement on the receiving party. But what happens with the sender? How many addresses can a sender put on each packet? If it puts too many, the packet will be rejected. If it puts too few, it is not profiting of XEP33 as much as it could do.

On the current version of XEP33, remote servers allow as few as 20 and as much as 100 addresses. This means that a sender have to reduce to the minimum common in order to not get rejects: it can only send as much as 20 addresses on each packet.

If we already know that 20 is the maximum limit in practice, then why bother telling some admins that they can put 30, 40 or more on their servers? Nobody will send more than 20 addresses on each packet!

Proposed solutions

I can see three solutions to the current situation:

  1. Unlimited addresses
  2. Strict limit
  3. Configurable limit, and method to inform
Since each one has pros and cons, let's see them in more detail.

#1: Unlimited addresses

Don't force any limit at all: XEP33 servers must accept a packet with as many addresses as the sender desires. The limit in this case is not imposed by XEP33, but by the server implementation, which usually limits the size of any XMPP packet.

This solution is the easier of all them to implement.

This solution can be potentially damaging. For example ejabberd limits the size of any received packet to 64KB. A spammer could construct stanzas with 4KB of message, and 60KB of destination addresses.
In general, the size of an 'address' element is around 50 bytes.
In practice, this allows a spammer to send one packet to a Jabber server that implements XEP33, and the server itself will send 4KB of spam to 1200 local accounts. And all this damage with just a single XMPP stanza.

#2: Strict limit

Define a strict limit on XEP33, and all senders and receivers must agree. For example 100. This way all Jabber servers, components and clients that implement XEP33 know how much addresses can be send on a packet.

This solution is easy to understand and to implement.

However, it limits the power of XEP33 on certain situations. For example, on a restricted and controlled private network where spammers are not an issue, it may be preferable to allow up to 1000 addresses.

#3: Configurable limit, and method to inform

Allow a configurable limit on the protocol, for example between 20 and 100, like currently. And describe a method for senders to know which limit is applied on each destination server.

This solution is the hardest to implement: it makes XEP33 more complex, and requires more code on both senders and receivers. The benefit of this solution is that it allows XEP33 to adapt to different network conditions.

Some topics that must be addressed if this solution is used:
  • The disco#info response that reports XEP33 support could indicate the limit:
    <feature var='http://jabber.org/protocol/address' limit=50 />
  • If a stanza is sent with more than the allowed addresses, the resulting error stanza informs of the limit:
    <not-acceptable limit=50 />
Temporary solution in ejabberd

Until a definitive solution is adopted for this problem, I implemented in ejabberd the easiest possible solution ever: never send a packet with more than 20 addresses. If this limit is reached, simply send several packets with smaller list of destination addresses.

Tuesday, 3 July 2007

ejabberd's C2S with XEP33 support

I have improved ejabberd's C2S code to use XEP-0033: Extended Stanza Addressing when the users login, logout or update presence.

This means that, when a user sends initial presence to login, or logout, or updates his presence, the server instead of sending a single packet to each contact on his roster, tries to send a packet to each destination server.

As usual, I've commited the change to ejabberd-modules SVN and the changes can be seen here: ejabberd_c2s.erl.

As previously with the MUC service, I followed the Guidelines for Server Active Multicasting on XEP33.