--- 2000.08.18 ---

A lot of this is c&p or transcribed from rfc2060

* wire messages

All messages between the client and server are a bunch of characters
terminated by a CRLF. There are some exceptions about continuation with
specific commands.

* In IMAP every message has a bunch of attributes that are controlled by
  the IMAP server.

** In IMAP every message in a mailbox has two id's:

   o unique identifier
   o message sequence number

The UID is a 64-bit number consisting of two 32-bit numbers:

   o a per message unique id within a mailbox (aka folder)
   o a per mailbox unique identifier validity value

The per message uid must be unique and increasing. If you re-order
messages in the mailbox you must assign new id's to all the messages that
are no longer in the same order as they were before the re-order.

The per mailbox unique identifier validity value is what the client uses
to tell if the uid's in a mailbox are still the same. If they have changed
then this value must be higher than it was before. I am not sure but it
sounds like these have to be unique as well between mailboxes.

The RFC recommends a 32-bit timestamp. We could just map that to the
creation time of the mh folder directory.

The per mailbox unique identifier validity value would also be modified
when the messages in the folder are re-ordered - causing us to have to
regenerate the unique message identifiers.

** Every message has a Flags attribute

This is a list of zero or more named tokens associated with a message.
Flags can either be permanent or session only.

There are two types of flags: "System" and "keyword."

System flags are pre-defined by the server. System flags begin with "\".
The currently defined system flags are:

   o \Seen
   o \Answered
   o \Flagged
   o \Deleted
   o \Draft
   o \Recent

A keyword is defined by the server implementation. Keywords do not begin
with "\". Servers may permit a client to define new keywords in the
mailbox. (which means they only exist in a specific folder/mailbox?)
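A minimal sketch of the system/keyword split described above. The names
SYSTEM_FLAGS and classify_flag are made up for illustration, and the
exact-case comparison is a simplifying assumption of this sketch, not
something taken from the RFC.

```python
# The six system flags listed above.
SYSTEM_FLAGS = frozenset(
    ["\\Seen", "\\Answered", "\\Flagged", "\\Deleted", "\\Draft", "\\Recent"]
)


def classify_flag(flag):
    """Return 'system' or 'keyword' for a flag name (hypothetical helper).

    Anything that begins with a backslash must be one of the known system
    flags; anything else is treated as a keyword.
    """
    if flag.startswith("\\"):
        # Assumption: compare names exactly as written.
        if flag not in SYSTEM_FLAGS:
            raise ValueError("unknown system flag: %r" % flag)
        return "system"
    return "keyword"
```

For example, classify_flag("\\Seen") gives "system" and
classify_flag("todo") gives "keyword".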
A flag can be permanent or session-only on a per-flag basis. It looks like
the client sets all of these flags except for \Recent.

** Internal Date Message Attribute

Not the rfc-822 header date & time, but a system date & time of when the
message was received. For messages received via an SMTP server this is the
date and time of final delivery. The special defined cases deal with the
IMAP COPY and APPEND commands.

** [rfc-822] size message attribute

Size of the message in octets, as expressed in [rfc-822] format.

** Envelope structure message attr

A parsed representation of the [RFC-822] envelope information (not to be
confused with an [SMTP] envelope) of the message.

** Body structure message attribute

A parsed representation of the [MIME-IMB] body structure information of
the message.

* Message texts

In addition to being able to fetch the full [RFC-822] text of a message,
IMAP4rev1 permits the fetching of portions of the full message text.
Specifically, it is possible to fetch the [RFC-822] message header,
[RFC-822] message body, a [MIME-IMB] body part, or a [MIME-IMB] header.

* IMAP Server states

An IMAP4rev1 server is in one of four states. Most commands are valid in
only certain states. It is a protocol error for the client to attempt a
command while the connection is in an inappropriate state. In this case, a
server will respond with a BAD or NO (depending upon server
implementation) command completion result.

** Non-Authenticated State

In non-authenticated state, the client MUST supply authentication
credentials before most commands will be permitted. This state is entered
when a connection starts unless the connection has been pre-authenticated.

** Authenticated State

In authenticated state, the client is authenticated and MUST select a
mailbox to access before commands that affect messages will be permitted.
This state is entered when a pre-authenticated connection starts, when
acceptable authentication credentials have been provided, or after an
error in selecting a mailbox.

** Selected State

In selected state, a mailbox has been selected to access. This state is
entered when a mailbox has been successfully selected.

** Logout State

In logout state, the connection is being terminated, and the server will
close the connection. This state can be entered as a result of a client
request or by unilateral server decision.

** State diagram

            +--------------------------------------+
            |initial connection and server greeting|
            +--------------------------------------+
                      || (1)       || (2)        || (3)
                      VV           ||            ||
            +-----------------+    ||            ||
            |non-authenticated|    ||            ||
            +-----------------+    ||            ||
             || (7)   || (4)       ||            ||
             ||       VV           VV            ||
             ||     +----------------+           ||
             ||     | authenticated  |<=++       ||
             ||     +----------------+  ||       ||
             ||       || (7)   || (5)   || (6)   ||
             ||       ||       VV       ||       ||
             ||       ||    +--------+  ||       ||
             ||       ||    |selected|==++       ||
             ||       ||    +--------+           ||
             ||       ||       || (7)            ||
             VV       VV       VV                VV
            +--------------------------------------+
            |     logout and close connection      |
            +--------------------------------------+

   (1) connection without pre-authentication (OK greeting)
   (2) pre-authenticated connection (PREAUTH greeting)
   (3) rejected connection (BYE greeting)
   (4) successful LOGIN or AUTHENTICATE command
   (5) successful SELECT or EXAMINE command
   (6) CLOSE command, or failed SELECT or EXAMINE command
   (7) LOGOUT command, server shutdown, or connection closed

* Data forms

IMAP4rev1 uses textual commands and responses. Data in IMAP4rev1 can be in
one of several forms: atom, number, string, parenthesized list, or NIL.

** Atom

An atom consists of one or more non-special characters.

** Number

A number consists of one or more digit characters, and represents a
numeric value.

   re: [0-9]+

** String

A string is in one of two forms: literal and quoted string. The literal
form is the general form of string.
The quoted string form is an alternative that avoids the overhead of
processing a literal at the cost of limitations of characters that can be
used in a quoted string.

A literal is a sequence of zero or more octets (including CR and LF),
prefix-quoted with an octet count in the form of an open brace ("{"), the
number of octets, close brace ("}"), and CRLF. In the case of literals
transmitted from server to client, the CRLF is immediately followed by the
octet data. In the case of literals transmitted from client to server, the
client MUST wait to receive a command continuation request (described
later in this document) before sending the octet data (and the remainder
of the command).

A quoted string is a sequence of zero or more 7-bit characters, excluding
CR and LF, with double quote (<">) characters at each end.

The empty string is represented as either "" (a quoted string with zero
characters between double quotes) or as {0} followed by CRLF (a literal
with an octet count of 0).

   Note: Even if the octet count is 0, a client transmitting a literal
   MUST wait to receive a command continuation request.

*** 8-bit and Binary Strings

8-bit textual and binary mail is supported through the use of a [MIME-IMB]
content transfer encoding. IMAP4rev1 implementations MAY transmit 8-bit or
multi-octet characters in literals, but SHOULD do so only when the
[CHARSET] is identified.

Although a BINARY body encoding is defined, unencoded binary strings are
not permitted. A "binary string" is any string with NUL characters.
Implementations MUST encode binary data into a textual form such as BASE64
before transmitting the data. A string with an excessive amount of CTL
characters MAY also be considered to be binary.

** Parenthesized List

Data structures are represented as a "parenthesized list"; a sequence of
data items, delimited by space, and bounded at each end by parentheses.
A parenthesized list can contain other parenthesized lists, using multiple
levels of parentheses to indicate nesting.

The empty list is represented as () -- a parenthesized list with no
members.

** NIL

The special atom "NIL" represents the non-existence of a particular data
item that is represented as a string or parenthesized list, as distinct
from the empty string "" or the empty parenthesized list ().

* Operational Considerations

** Mailbox Naming

The interpretation of mailbox names is implementation-dependent. However,
the case-insensitive mailbox name INBOX is a special name reserved to mean
"the primary mailbox for this user on this server".

So when a client refers to INBOX in any case we are probably going to have
it refer to the mh folder "inbox" - and if that folder does not exist, we
will create it for the user. (They do not have to use it, it just has to
exist.. B-) )

*** Mailbox Hierarchy Naming

If it is desired to export hierarchical mailbox names, mailbox names MUST
be left-to-right hierarchical using a single character to separate levels
of hierarchy. The same hierarchy separator character is used for all
levels of hierarchy within a single name.

*** Mailbox Namespace Naming Convention

By convention, the first hierarchical element of any mailbox name which
begins with "#" identifies the "namespace" of the remainder of the name.
This makes it possible to disambiguate between different types of mailbox
stores, each of which have their own namespaces.

For example, implementations which offer access to USENET newsgroups MAY
use the "#news" namespace to partition the USENET newsgroup namespace from
that of other mailboxes. Thus, the comp.mail.misc newsgroup would have a
mailbox name of "#news.comp.mail.misc", and the name "comp.mail.misc"
could refer to a different object (e.g. a user's private mailbox).
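A rough sketch of the naming rules above as they might apply to an MH
backend. All three function names are hypothetical, the "." hierarchy
delimiter is just the one used in the #news example, and mh_path stands
for wherever the user's MH mail directory lives.

```python
import os


def mh_folder_for(mailbox):
    """Map an IMAP mailbox name to an MH folder name.

    INBOX is matched case-insensitively and always maps to the MH folder
    "inbox"; every other name is passed through unchanged.
    """
    if mailbox.upper() == "INBOX":
        return "inbox"
    return mailbox


def ensure_inbox(mh_path):
    """Create the "inbox" folder directory under mh_path if it is missing."""
    path = os.path.join(mh_path, "inbox")
    os.makedirs(path, exist_ok=True)
    return path


def split_namespace(mailbox, delimiter="."):
    """Split off a leading "#namespace" element, if there is one.

    Returns (namespace, rest); namespace is None when the first
    hierarchical element does not begin with "#".
    """
    if mailbox.startswith("#"):
        first, _sep, rest = mailbox.partition(delimiter)
        return first, rest
    return None, mailbox
```

With these, "InBoX" and "INBOX" both end up at the MH folder "inbox", and
"#news.comp.mail.misc" splits into the namespace "#news" and the name
"comp.mail.misc".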
*** Mailbox International Naming Convention

By convention, international mailbox names are specified using a modified
version of the UTF-7 encoding described in [UTF-7]. The purpose of these
modifications is to correct the following problems with UTF-7:

   1) UTF-7 uses the "+" character for shifting; this conflicts with the
      common use of "+" in mailbox names, in particular USENET newsgroup
      names.

   2) UTF-7's encoding is BASE64 which uses the "/" character; this
      conflicts with the use of "/" as a popular hierarchy delimiter.

   3) UTF-7 prohibits the unencoded usage of "\"; this conflicts with the
      use of "\" as a popular hierarchy delimiter.

   4) UTF-7 prohibits the unencoded usage of "~"; this conflicts with the
      use of "~" in some servers as a home directory indicator.

   5) UTF-7 permits multiple alternate forms to represent the same
      string; in particular, printable US-ASCII characters can be
      represented in encoded form.

In modified UTF-7, printable US-ASCII characters except for "&" represent
themselves; that is, characters with octet values 0x20-0x25 and 0x27-0x7e.
The character "&" (0x26) is represented by the two-octet sequence "&-".

All other characters (octet values 0x00-0x1f, 0x7f-0xff, and all Unicode
16-bit octets) are represented in modified BASE64, with a further
modification from [UTF-7] that "," is used instead of "/". Modified BASE64
MUST NOT be used to represent any printing US-ASCII character which can
represent itself.

"&" is used to shift to modified BASE64 and "-" to shift back to US-ASCII.
All names start in US-ASCII, and MUST end in US-ASCII (that is, a name
that ends with a Unicode 16-bit octet MUST end with a "-").

For example, here is a mailbox name which mixes English, Japanese, and
Chinese text:

   ~peter/mail/&ZeVnLIqe-/&U,BTFw-

=== 2002.12.12

Some implementation details - to keep things relatively simple I plan on
implementing this in C (lets me not worry about java issues with using
things like berkeley db, etc.)
and I do plan on using state threads. I also wanted to use Boehm-GC but
Boehm-GC & state threads do not play well together.

Part of the reasoning for using state threads is that I get to write a
multi-threaded program without having to worry about all of the MT issues.
Part of it is that I want to write a server that can serve many users at
once without having to have a separate process for each user - and without
having to code a single server that has to worry about using C structures
to separate all of the users' states.

With ST I plan on having a demon that will serve a certain number of
clients, and when that number exceeds a limit it will fork off another
copy of the process to deal with more clients. This will give us a
scalable service that is generally written as if it were a single process
for each client.

o About sequences and mailboxes and the like...

  It may be neat to do something a little more than MH does for our IMAP
  with MH backend - that is, we want to make a virtual mailbox that has in
  it all of the currently unread email. This mailbox, being virtual, would
  probably not have persistent mail uid's, which means it would not perform
  very well as an IMAP mailbox, but it would be neat in that it would give
  us an easy way of seeing all unread mail in all boxes - which is probably
  better than having to visit all the boxes, or have everything end up in
  the inbox.

o About incr..

  If the user does not have something that inc's mail out of their mail
  spool and in to their mh directories we could have the server offer to
  do that inc'ing for them - with the option of running it through some
  filter program. Basically the idea is -- "tell me the command to inc
  mail for you." "tell me the command to check whether I should inc mail
  or not" and "should I use a command to check whether there is mail or
  not, or should I just look at a mail spool file (and what is the mail
  spool file..
  default /var/mail/")

o About our startup approach

  When we start up we iterate through all of the directories in the mail
  directory. We also iterate through all of the databases in our bdb file.

  If we find a directory we did not know about (and it is not on some sort
  of exclusion list) then we create a db for it and begin a db indexing
  operation of that directory.

  NOTE: Some of this could be done via "flists -recurse -fast"

  If we find a db that is supposed to represent a real mailbox (as opposed
  to a db that represents a virtual mailbox?) for which we have no
  directory, then we delete that db. db's will be named after the real
  path to that mailbox directory.

  If any of the directories have a last modify date that is after the one
  recorded in the db for that directory then we begin a "check and update
  or rebuild" process.

  The "check and update or rebuild" process checks to see if the unique
  message ids are still valid for that mailbox and if all we have are
  _new_ messages. If this is _not_ the case then something fiddled with
  the mailbox in such a way that we need to create a new version record
  (or whatever imap calls it) for that mailbox and begin a re-indexing
  thread for that mailbox (same process we undergo when we find a new
  mailbox.) Otherwise all we need to do is add the new messages to the
  mailbox.

=== 2002.12.27

A short note about state threads and berkeley db.

We plan to use db4 with the "Berkeley DB Concurrent Data Store
applications." This is the single writer multiple readers scenario where
no process can be granted a write lock as long as a read lock or higher is
currently on the db. Any process that tries to gain such a lock will block
at the process level. Experience from coworkers has shown that we should
not try to use the db4 lock detection mechanisms as they are more for
transactions.
Now one thing we want to support is to have multiple login instances,
either within a single unix process, or between multiple processes, be
able to access a single mailbox. We intend to have a db only represent a
single mailbox (we may need to have more than one db represent a single
mailbox but we do not intend to have one db represent more than one
mailbox.)

Between processes this is mildly okay as the process that blocks while
waiting for another process to finish its work will, in general, not block
that other process unless we have two interdependent locks outstanding.

Within a process different state threads could get in to deadlock issues
on a single db and this would be bad. The plan is to create a state thread
condition variable for each db opened within a process and use those to
prevent the deadlock.

We still want to reduce the issues between processes as a single process
will be serving multiple clients. We do not want a single login instance
in a process to lock up the entire process while waiting for _another_
single login instance in a separate process to do its business -- at least
not for too long.

I am thinking of seeing if the suggested way of doing inter-process
communications is good enough. We may not catch all the cases but even if
we do not, we just lock up one process for a short amount of time -- ie:
the worst case is still the worst case but it will frequently be better
than that.

The method recommended by the state thread programming notes involves
creating a semaphore out of standard IPC calls:

    Ideally all user sessions are completely independent, so there is no
    need for inter-process communication. It is always better to have
    several separate smaller process-specific resources (e.g., data
    caches) than to have one large resource shared (and modified) by all
    processes. Sometimes, however, there is a need to share a common
    resource among different processes. In that case, standard UNIX IPC
    facilities can be used.
    In addition to that, there is a way to synchronize different processes
    so that only the thread accessing the shared resource will be
    suspended (but not the entire process) if that resource is
    unavailable. In the following code fragment a pipe is used as a
    counting semaphore for inter-process synchronization:

    #ifndef PIPE_BUF
    #define PIPE_BUF 512  /* POSIX */
    #endif

    /* Semaphore data structure */
    typedef struct ipc_sem {
      st_netfd_t rdfd;  /* read descriptor */
      st_netfd_t wrfd;  /* write descriptor */
    } ipc_sem_t;

    /* Create and initialize the semaphore. Should be called before fork(2). */
    /* 'value' must be less than PIPE_BUF. */
    /* If 'value' is 1, the semaphore works as mutex. */
    ipc_sem_t *ipc_sem_create(int value)
    {
      ipc_sem_t *sem;
      int p[2];
      char b[PIPE_BUF];

      /* Error checking is omitted for clarity */
      sem = malloc(sizeof(ipc_sem_t));

      /* Create the pipe */
      pipe(p);
      sem->rdfd = st_netfd_open(p[0]);
      sem->wrfd = st_netfd_open(p[1]);

      /* Initialize the semaphore: put 'value' bytes into the pipe */
      write(p[1], b, value);

      return sem;
    }

    /* Try to decrement the "value" of the semaphore. */
    /* If "value" is 0, the calling thread blocks on the semaphore. */
    int ipc_sem_wait(ipc_sem_t *sem)
    {
      char c;

      /* Read one byte from the pipe */
      if (st_read(sem->rdfd, &c, 1, -1) != 1)
        return -1;

      return 0;
    }

    /* Increment the "value" of the semaphore. */
    int ipc_sem_post(ipc_sem_t *sem)
    {
      char c;

      if (st_write(sem->wrfd, &c, 1, -1) != 1)
        return -1;

      return 0;
    }

The above is a way to have a single count shared by all processes for a
single pipe. We want something like that but want it to do more.. ie: have
a name (used to refer to what db) and a count.

Hm.. That will be hard to do with a single pipe. Would need one pipe per
db.. UG!!!
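To make the "one pipe per db" idea concrete, here is a rough sketch in
Python rather than C. The class names PipeSemaphore and DbSemaphores are
invented for illustration, and it uses plain blocking os.read/os.write
where the fragment above uses st_read/st_write, so a wait here would
suspend the whole process rather than a single state thread.

```python
import os


class PipeSemaphore:
    """Counting semaphore backed by a pipe, mirroring the C fragment above.

    Like ipc_sem_create, it should be created before fork so that every
    child process shares the same pipe.
    """

    def __init__(self, value=1):
        self.rfd, self.wfd = os.pipe()
        # One byte in the pipe per available unit of the semaphore.
        os.write(self.wfd, b"\0" * value)

    def wait(self):
        # Blocks while the pipe is empty, i.e. while the count is 0.
        os.read(self.rfd, 1)

    def post(self):
        # Put one byte back, incrementing the count.
        os.write(self.wfd, b"\0")


class DbSemaphores:
    """Hypothetical registry holding one pipe semaphore per db name."""

    def __init__(self, db_names, value=1):
        self._sems = {name: PipeSemaphore(value) for name in db_names}

    def for_db(self, name):
        return self._sems[name]
```

The catch is the one already noted: the set of db names has to be known
before forking, and each db costs two file descriptors in every process.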
Another alternative is that when a process gets a connection it
authenticates and identifies the user, then queries all of the other
processes to see if they are currently processing any connections for that
user, and if they are it passes the fd to that process (if that process is
willing to accept it.)

This does have issues in that a process will have a maximum # of clients
it will support. Also need some rules for shutting down and starting new
such processes.

Do IMAP clients tend to open up a connection to the IMAP server and just
leave it there all the time?

We should probably have a soft limit and hard limit for each process. Soft
limit could be set by # of unique users a process will support and hard
limit will be # of clients a process will support.. OR soft limit is # of
clients, and we will only accept more clients if they are of a user we are
already supporting a client connection for. This would be done by: once we
hit our soft limit we close our listening socket (and send a message to
our parent that we are no longer listening.. and when all of our current
clients go away we will exit.)

Hm, and if a process is not willing to accept an fd then we just
degenerate to the mode above -- when one process has to do a write it will
cause the other process to block. We can even make that our starting
implementation and add the fd passing later.

I am assuming a central keeper process that could be used for such
communication (of which sub-processes currently have connections for which
users.)

Oh, hm.. On another note entirely I just realized that my current design
assumes that a single process can access files owned by different users.
Part of the whole solution is that users can store their entire mail spool
and associated db files in locations of their choosing. If we can not do
this then we have to either have a central repository or super user access
or a writeable group or a single process per connection.
I am currently thinking writeable group but that has large security
issues.. means that there will be a unix group that can access everyone's
email that uses this system (and that for setup they need to enable this
group's access to their files.)

Now that I think of it again that all may be too much.. and we may just
need to "fork" a new demon for every single user that connects.. in which
case we may just want to use inetd and do it all via stdin/stdout.

Do we still want to use state threads in this case? Perhaps have requests
from the user create a new thread.. and also have some background thread
(or threads) that checks on the last-modify times of folders.

=== 2002.12.31

more thoughts on the mh integration.

We will need to be able to parse the .mh_profile file for various things.
We will also need to be aware of how to properly use the "flist" command
as well as 'scan unseen'.

We also want to make a new MH context for each IMAP client that connects
to us. We could just have one for each user -- it would be simpler.. but
they would interfere with each other if they were doing commands in
different mailboxes that update the context and it is best just to avoid
that.

This means that we need to set up an MHCONTEXT env. variable before we
execute any MH command and we had better make sure that this context is
unique. We also need to clean up the context files we create when the user
logs off (or when the demon shuts down.)

The question I need to figure out is do we just run commands like "flist"
and "scan unseen" via popen(3), or do we do it all ourselves by scanning
last modify times on directories that indicate folders, or do we do it via
some combination of both.

"flist" forks a process and could be rather resource intensive - but it
does do all the work for us. A combination approach would be to iterate
over the directories and check for last modify times and if any of them
have changed, do the appropriate flist command, and other actions.
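A small sketch of the per-client context idea, in Python. The helper names
make_client_context and remove_client_context are made up, and it assumes
that the MH commands are happy to start from an empty context file, which
I have not checked.

```python
import os
import tempfile


def make_client_context(context_dir):
    """Create a unique, empty context file for one IMAP client.

    Returns (path, env) where env is a copy of our environment with
    MHCONTEXT pointing at the new file. mkstemp guarantees the name is
    unique within context_dir.
    """
    fd, path = tempfile.mkstemp(prefix="imap-context-", dir=context_dir)
    os.close(fd)
    env = dict(os.environ)
    env["MHCONTEXT"] = path
    return path, env


def remove_client_context(path):
    """Clean up the context file when the client logs off."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        # Already gone, e.g. cleaned up at shutdown.
        pass
```

The env mapping would then be handed to whatever runs the MH command, for
example subprocess.run(["flist"], env=env), so that two clients of the
same user never share a context.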
The question, I suppose, becomes what do we need to be aware of?

=== 2003.08.08

Good prime.. no comments since end of 2002..

So we are now down to a single process per user. This process is launched
when they log in to the IMAP server. It is launched either by a single
demon that is sitting waiting for connections and spawns a process for
each user, or by inetd.

Furthermore, if a single user connects more than once we need to have both
connections go to the same process. This means that we need to have a way
of moving connections over to that process.

In order to do that we need to establish a connection to the existing
process for a user if it already exists and then transfer the new
connection to the existing process.

I believe we will leave authentication of the user and of the connection
in the original process. This way it will not spawn the first user process
or pass a new connection off to the existing user process unless the new
connection can authenticate. This does somewhat limit the ability to let
the user process use user private data for authentication.

=== 2003.09.16

Okay. So, after looking at libraries to do rfc822 message header parsing
and all that guff the only thing that was really out there seemed to be
c-client. It was pretty grody in that it had a lot of baggage that,
although some of it looked very useful, a lot of it did not.

I needed to make some progress so I decided I would look at the Python
stuff. I used python at work but it had been a while since I wrote a
program of any real size that did anything really complex. The performance
seemed acceptable.

As usual I am leaping in to a rather complex project right away with a
tool I am rusty or not totally familiar with.

So I am trying to write the complete server in Python. It has a lot of
support for the stuff I need.
(I was most worried about having to have it parse the entire message when
I just want the header but it does seem that the header-only parsing
functions only read the header and parse it leaving the rest of the file
unread.. and the MH & bsddb modules will make other aspects easier.)

I am not sure if I will use threads or try to just use the asyncore
routines. Some of these file operations will lock the server for a while.

So back to how I want to store the data. We are going to use berkeley db
(using the bsddb3 interface, which is now just bsddb in Python 2.3+). We
will have one database per folder.

I am wondering if I need a database that lists all the folders - if this
is faster than seeing if a particular db exists for a particular folder.

So, for now setting aside the MH meta/toplevel/configuration db..

folder db's:

I do not need the path name to the actual folder. I can derive that at run
time from the folder name and the MH path.

I need to know the last time the db and the folder were in sync - I will
use the mtime on the directory. I want this instead of ctime because,
going by stat(2), mtime is changed by mknod, utimes, and write - but not
by chmod, chown, link, rename, or unlink applied to the directory itself.
(Adding, removing, or renaming a message file inside the folder does
update the directory's mtime, which is the change we want to catch.)

I need to have a serial number for the folder itself. This will be used
for the unique identifier validity value (see rfc2060).

I also need to have a unique identifier for every single message. This uid
is monotonically increasing but it does not have to be contiguous. They
have to be unique though. (If I reset the serial number for the folder
(the unique identifier validity value) I am allowed to reset this uid..
the combination of folder/mbox uid-vv and message uid has to be unique.)

If we remove a message from the middle of this sequence in a folder - that
is okay.. we do not need to change the uid-vv or any of the uid's assigned
to messages.
However, if we re-order the messages in the folder such that the uid's are
no longer monotonically increasing, then we need to re-assign uid's to all
of the messages starting with the first one that is out of order, and
these uid's must be higher than anything previously assigned in this
folder.

We could also bump the uid-vv for the folder and start the entire uid
sequence over.

I think that this will not happen that often in most folders. Resequencing
usually only happens when you resort a folder by date (you can also resort
it by sender or subject (by any means for that matter) but that does not
happen that often - and usually only in archive folders.)

I also need a message sequence number which is monotonically increasing
and contiguous for every message in the mailbox/folder. This we will just
build on the fly as we go through a folder.

This means that for both of these I can not use the message number that MH
uses.

NOTE: We may want other databases for things like looking up all messages
to/cc/bcc a particular destination and from a particular destination.
These db's need some way to refer to messages in other folders reliably -
I guess that would at least be by folder name, uid-vv/uid. We would need
to be able to quickly and easily update this ancillary db when we detected
changes - and it may mean a lot of rebuilding. So we will leave this for
later.

NOTE: We may want a way to build a temporary or virtual folder of all the
messages in all the folders that are part of a certain sequence (such as
unseen)

Okay, so in the folder db we have:

o last mtime

o uid-vv - we are going to have the uid-vv's be unique among all folders.
  If we have to change a folder's uid-vv then we pick a uid-vv greater
  than the currently max assigned uid-vv. I imagine this will be in a
  meta-db. If we do this we will need to remove the old entry from that db
  and add the new entry.
  This is useful in that since we store uid-vv & uid's in each message,
  when we move a message to a new folder it will be easy for that folder
  to see that that message is new to that folder (and needs to have its
  uid-vv.uid reassigned.)

o current highest uid (although we probably should just go to the last
  element in the db and get its uid.. )

o we may want to have elements in the db that are like secondary indices:
  all of the messages that are flagged a certain way, such as \Deleted -
  this may be simpler to do than having a secondary index db.

o Users can define flags on a folder by folder basis. I believe that means
  we have to record what flags they have defined.

o for each message a uid - This will be a key! Maybe we should have a db
  just for the uid's and another db that is just tied to a dictionary for
  the folder attributes.

o for each message a set of flags

  These are things like \Seen, \Answered, \Flagged, \Deleted, \Draft. Some
  of these flags we derive from headers in the message. Some we set based
  on commands by the user. Some are derived by other means (for instance,
  if all messages in the drafts/ folders are drafts, then they all
  automatically get the \Draft flag.) There can also be user-defined
  flags.

  I wonder if I should tack some of the flags in to the message's header?
  Probably not.. they would get forwarded with the message. It would be
  nice to preserve the flags in case we have to re-assign the message's
  uid.

  NOTE: If our server is the one that is told to move a message from one
  folder to another it should somehow try to preserve persistent flags.
  This, to me, almost means that I should probably have the key not be the
  uid alone but the uid-vv.. then we could stick an entry in the
  destination folder's db under this uid-vv which would let us store the
  flags there. Or perhaps we should have an entry in the destination
  folder's db that lists "new messages: uid-vv/uid (data we want to
  persist.)"
  Or just have the move operation, if it is moving the message to the end
  of the folder, which I think is the only thing we have to allow, add this
  new message to the folder db with its proper uid and all.

o for each message what file it should refer to

o for each message how we know that a message is the message we think it
  is. We shall call this the identifier (to differentiate it from the
  uid.. this is something that is in the message that we can use to
  uniquely identify the message. I wonder if Message-ID is good enough..
  or if I should perhaps re-write the header adding the uid?)

  NOTE: Message-id is not good enough if you get two copies of the same
  message, which does happen from time to time, so maybe we will re-write
  the message adding our own header line and making that unique.

  NOTE: in experience these messages are exact duplicates usually received
  via more than one address or mailing list. I should probably make a tool
  that will go through a folder and remove duplicates.. should be simple
  enough with the tools at hand in python. Question is how to trigger it -
  I suppose could add a button to exmh.. but may want to do this remotely.

  NOTE: I have decided that I will re-write each message and add the
  uid & uid-vv in a header on these messages. This takes the place of the
  identifier. So the key in the db that uniquely identifies the message is
  a concatenation of folder uid-vv and message uid.

NOTE: We may need a way to indicate to any other processes inside this
server that we are fiddling with a folder so that other
threads/whathaveyou will block while this fiddling is happening. Perhaps a
dictionary with semaphores.

When we start up in a folder, if the folder has been modified, we need to
re-check the mappings for identifier -> file name and identifier -> uid.

We go through the messages in the folder in the order of the MH folder
message number.
We take the message, determine its identifier, look it up in the folder db
and see if the identifier -> file name mapping is valid. If it is we can
move on to the next message - the uid will not have changed either.

If the identifier -> file name mapping does not match then either:

  1) the folder was packed
  2) the message was removed from this folder
  3) the folder was re-sorted.

if the identifier in the message is the same as the identifier that we
expect in the db then the order is preserved. The folder was probably
packed.
     Re-set the identifier -> file name mapping
     go to the next message

Now, either the folder was re-shuffled from this message onward and we may
need to re-uid all further messages, or there is a gap in the sequence.
Whether this gap was caused by the messages being removed from this folder
or by them being shuffled later in the folder we do not care.

If the identifier from this message is in our folder db anywhere:
     delete all the entries in the db between the record we found in the
     db and the last good record we had in the db.
     Re-set the identifier -> file name mapping and go to the next record

NOTE: There should be a way to tell the server to ignore a specific folder
      unless it is told by the IMAP client or by some other signal to go
      and re-check that folder specifically. This will be useful for the
      spam folder that gets a lot of email that is mostly ignored.

=== 2003.09.24

Now we go back to how we want to organize these objects and what methods
they have and how this relates to how the server actually operates.

So we have these objects:

o Server
o UserMHDir
o Mailbox
o Message
o Client

===

o Server:

  The Server object is what accepts the connections and authenticates
  them. Once it authenticates a connection it sees if there is already a
  process handling a UserMHDir object for that user. If there is, it passes
  the fd off to that process to handle.
  If not, it forks and execs a new process that setuid's to the user id of
  the user that authenticated (we may need a mapping to handle virtual
  users) and that process instantiates a UserMHDir object.

  The Server object is running as root.

  We will use the recvmsg/sendmsg stuff implemented as an imported C module
  for passing the fd's around.

o UserMHDir:

  When it is created it will set up its environment. This includes opening
  the database for the MH Mail dir it represents and loading some state.

  One of the things it can begin is a check to see if the list of folders
  it has corresponds to the list of folders that are actually in the MH
  directory. At the same time it can see if any of these folders are out
  of date.

  We now have:

    a list of folders that exist in MH but not in our db
    a list of folders that are out of date wrt our db
    a list of folders in our db that are not in MH

  We should immediately delete the objects in our dbs related to folders
  that no longer exist.

  For folders that we found that we do not know about we probably should
  add them to our internal objects immediately - then add them to our list
  of folders that are out of date.

  For folders that are out of date we want to re-sync them - however we
  actually do not need to resync them until the user attempts to use them.

  One thing we want to do is re-sync the most important ones first. The
  user is likely to select some more quickly than others.

  Also while we are idle we should re-sync the other folders - however a
  resync process may be expensive - it may be worth just waiting until a
  user selects a folder before we even try to sync it - we do not want a
  pre-emptive re-sync to cause performance to degrade for other operations.

  Perhaps some sort of signal, raised when we begin processing a client
  request, that pauses background re-syncs that have started up. Use the
  threading.Event object for this: once we receive a command we indicate
  we are doing something, and when the command is done we wait for like 20
  seconds before unsetting this flag.

  NOTE: NOOP commands and other light weight commands could be excluded
        from this behaviour?

  The Event object would exist on the UserMHDir. Client objects would be
  what set it and say it is okay to think about unsetting it - causing a
  timer to be started that would unset it.

  Hm, need a way to make sure it is set for the longest possible time, as
  well.. every Client that sets it increments a count, and every Client
  that is done decrements it; when the count reaches 0 we start the timer..
  if you attempt to set it when the count is 0, but the timer is running,
  it cancels the timer and sets the count to 1.

o Mailbox

  NOTE: A Mailbox object will use weak references to see if any other
        objects are manipulating it or have references to it besides the
        UserMHDir object. If after a time out no one else is referring to
        a Mailbox instance, and it is still in sync with the MH folder it
        shadows, it will "turf itself out" which means that the object
        instance will delete itself (although all the persistent data
        remains.)

        This way, over time, unused Mailbox objects will silently go away,
        freeing up resources - of course if a Client is constantly probing
        all of the mailboxes, they will never get a chance to clean up.

  delete() - This deletes a mailbox.. not the object instance but the
             persistent data store that represents this mailbox. We assume
             that once this delete operation is done our caller will
             delete the instance of this object as well.

             NOTE: This will call up in to the UserMHDir() object that
                   this mbox is within and delete it from the data
                   structures and db's that the UserMHDir() has that refer
                   to and name this mailbox -- with appropriate locking.
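
The busy-flag bookkeeping described above for UserMHDir (an Event, a count
of Clients that have set it, and a timer that unsets it after a quiet
period) could look roughly like the sketch below. The class name BusyFlag,
its method names, and the generation counter used to ignore stale timers
are made up for illustration; only threading.Event, threading.Lock and
threading.Timer come from the standard library.

```python
import threading


class BusyFlag:
    """Hypothetical helper: set while any Client is busy, cleared after a quiet period."""

    def __init__(self, quiet_seconds=20.0):
        self.event = threading.Event()   # set => background re-syncs should pause
        self._lock = threading.Lock()
        self._count = 0                  # number of Clients currently busy
        self._timer = None               # pending "unset" timer, if any
        self._generation = 0             # lets a stale timer callback be ignored
        self._quiet_seconds = quiet_seconds

    def acquire(self):
        """A Client started processing a command."""
        with self._lock:
            if self._timer is not None:
                # Count was 0 but the timer was still running: cancel it.
                self._timer.cancel()
                self._timer = None
            self._count += 1
            self.event.set()

    def release(self):
        """A Client finished its command (assumes a matching acquire())."""
        with self._lock:
            self._count -= 1
            if self._count == 0:
                self._generation += 1
                self._timer = threading.Timer(
                    self._quiet_seconds, self._clear, args=(self._generation,))
                self._timer.daemon = True
                self._timer.start()

    def _clear(self, generation):
        """Timer callback: unset the flag unless someone became busy again."""
        with self._lock:
            if self._count == 0 and generation == self._generation:
                self.event.clear()
                self._timer = None
```

A background re-sync thread would then check `flag.event.is_set()` between
messages and back off while it is set. Excluding light weight commands such
as NOOP would simply mean not calling acquire()/release() around them.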
=== 2003.10.02

Okay, we have enough code written that we need to address the mh folder /
mailbox resync issue.

We are going to try and start with the approach of going through the
messages in the MH folder and matching them up with what we expect to be
in the db.

uids_covered = Set()
for msg in mh_folder:
    get_headers(msg)
    if msg.has_key(imap_uid):
        recent = False
        msg_uid = msg[imap_uid]
    else:
        recent = True
        msg_uid = None

    if in "re-writing all subsequent entries" mode then:
        next_uid = get_next_uid()
        if msg_uid and msg_db.has_key(msg_uid):
            msg_db_entry = msg_db[msg_uid]
            msg_db.delete(msg_uid)
            msg_db_entry[flags] += NotRecent
        else:
            msg_db_entry = {}
            msg_db_entry[flags] += Recent
        msg_db_entry[msg-file] = msg
        if msg in unseen sequence:
            msg_db_entry[flags] += unseen
        else:
            msg_db_entry[flags] += seen
        if 'internal date' not in msg_db_entry:
            if msg.has_key(delivery-date):
                msg_db_entry['internal date'] = msg['delivery-date']
            else:
                msg_db_entry['internal date'] = msg.getmtime()
        msg_db[next_uid] = msg_db_entry
        re-write_message(msg, uidvv, next_uid)
        uids_covered.add(next_uid)
        continue
    fi

    if recent:
        # This message has no uid.. this means we have never seen it
        # before. This means that every message _after_ this message,
        # whether we have seen it before or not, will need to be
        # re-written with a new uid.
        're-writing all subsequent entries' = True
        next_uid = get_next_uid()
        msg_db_entry = {}
        msg_db_entry[flags] += Recent
        msg_db_entry[msg-file] = msg
        if msg in unseen sequence:
            msg_db_entry[flags] += unseen
        else:
            msg_db_entry[flags] += seen
        if 'internal date' not in msg_db_entry:
            if msg.has_key(delivery-date):
                msg_db_entry['internal date'] = msg['delivery-date']
            else:
                msg_db_entry['internal date'] = msg.getmtime()
        msg_db[next_uid] = msg_db_entry
        re-write_message(msg, uidvv, next_uid)
        uids_covered.add(next_uid)
        continue
    fi

    # Okay - we got here because the message had a uid. This means
    # this server has probably seen it.
    #
    if msg_uid[0] != folder.uid_vv:
        # However, if the uid-vv is not the uid-vv of this folder then
        # this message was either copied here from another folder or is
        # from a previous version of this folder. In either case, we need
        # to make a new uid for this message AND all following messages
        # will get re-numbered as well (since all the messages must have
        # monotonically increasing uid's.)
        #
        're-writing all subsequent entries' = True
        Everything else just like the first case!
        continue

    # Okay, so the message has a uid and the uid-vv is valid.
    # This is where we should be ending up if our mailbox is mostly up
    # to date with our MH folder. The uid of this current message
    # should be greater than the uid of our previous message. Also we
    # should be finding this uid in our msg db. If these are true then
    # all we need to do is potentially update what file this message
    # refers to. If the message is on the unseen sequence, add that
    # flag.
    if msg_uid[1] > prev_uid:
        changed = False
        if msg_uid not in msg_db:
            # Strange.. the uid-vv was correct, but it was not in the
            # db.
            changed = True
            msg_db_entry = {}
            msg_db_entry[flags] += Recent
        else:
            msg_db_entry = msg_db[msg_uid]
        fi
        if msg_db_entry[msg-file] != msg:
            msg_db_entry[msg-file] = msg
            changed = True
        fi
        if msg in unseen sequence:
            if 'unseen' not in msg_db_entry[flags]:
                changed = True
                msg_db_entry[flags] += unseen
            fi
        else:
            if 'seen' not in msg_db_entry[flags]:
                changed = True
                msg_db_entry[flags] += seen
            fi
        fi
        if 'internal date' not in msg_db_entry:
            changed = True
            if msg.has_key(delivery-date):
                msg_db_entry['internal date'] = msg['delivery-date']
            else:
                msg_db_entry['internal date'] = msg.getmtime()
            fi
        fi
        if changed:
            msg_db[msg_uid] = msg_db_entry
        fi
        uids_covered.add(msg_uid)
        prev_uid = msg_uid[1]
        continue

    # Okay.. we got here, the message had a uid, the uid-vv was valid
    # but the uid was NOT greater than the previous uid. This means
    # that this message was shuffled from somewhere earlier in the
    # folder to somewhere later. This means that this message and
    # all the messages that follow it need to be re-uid'd. Just like
    # the other cases..
    #
    're-writing all subsequent entries' = True
    Everything else just like the first case!
rof

# We now have a db that has in it a uid-vv.uid entry, strictly
# ascending, for every message in the MH folder, in the same order as
# the MH folder. We preserved what uid's we could. We preserved what
# information associated with what uid's we could.
#
# Now we have one remaining issue: there may be uid's in the db that
# do not correspond to any message we saw while going through the MH
# folder. This would be from messages deleted from the MH folder, for
# instance, but not deleted from the db. This is why we were building
# up a Set of all the uid's we have seen. Now we get all the keys of
# our message db and create a set out of those. This should be a
# superset of the uid's we have seen. Generate a new set that is all
# of the keys in the db minus the uid's we have seen. This is
# the set of keys that is in the db that corresponds to no message in
# our folder. We can just delete these keys outright from the db.

uids_in_db = Set(msg_db.get_keys())
uids_in_db -= uids_covered
for uid in uids_in_db:
    msg_db.delete(uid)

and we are DONE!

=== 2003.10.08

Finally have implemented the resync folder code. This was previously the
biggest barrier to getting this project done.

We still need to write a lot of code. One of the things we need to do is
constantly rescan the mailboxes to see if they get out of date.

One thing that we probably should also do is scan the mtime of the
.mh_sequences files as well. If a .mh_sequences file changes then also do
a resync on the folder... perhaps this should only happen if we have an
active mailbox - so make this a mailbox thread.

Actually, we could make resyncing the mailbox a mailbox thread in general?
As long as a mailbox is active it watches its mtime & .mh_sequences mtime
and resyncs if they change.
This way the "resync_all_mailboxes" function in the UserMHDir object just
goes through each mailbox and starts a resync watcher thread for each of
them.

Perhaps have a semaphore in the UserMHDir object to make sure that there
is no more than a certain number of resync threads running at once. I
think that can wait a bit though.

I think the thing to do next is start adding methods to the Mailbox object
that will provide the answers needed to meet the protocol.

We also need a Client object. We will probably start out with a
"test_receive_message()" method or something more appropriately named.
Someone on a python cli can invoke this method with a message as received
from a client and it will cause the appropriate things to happen.

So, looking at rfc2060 here are the commands we may receive from an IMAP
client and an attempt at discussing what each command does inside our
server.

Format: We will take the text from the RFC, and add our text after it.

6.3.1. SELECT Command

   Arguments:  mailbox name

   Responses:  REQUIRED untagged responses: FLAGS, EXISTS, RECENT
               OPTIONAL OK untagged responses: UNSEEN, PERMANENTFLAGS

   Result:     OK - select completed, now in selected state
               NO - select failure, now in authenticated state: no
                    such mailbox, can't access mailbox
               BAD - command unknown or arguments invalid

   The SELECT command selects a mailbox so that messages in the
   mailbox can be accessed. Before returning an OK to the client,
   the server MUST send the following untagged data to the client:

      FLAGS       Defined flags in the mailbox. See the description
                  of the FLAGS response for more detail.

      <n> EXISTS  The number of messages in the mailbox. See the
                  description of the EXISTS response for more detail.

      <n> RECENT  The number of messages with the \Recent flag set.
                  See the description of the RECENT response for more
                  detail.

      OK [UIDVALIDITY <n>]
                  The unique identifier validity value. See the
                  description of the UID command for more detail.

   to define the initial state of the mailbox at the client.

   The server SHOULD also send an UNSEEN response code in an OK
   untagged response, indicating the message sequence number of the
   first unseen message in the mailbox.

   If the client can not change the permanent state of one or more of
   the flags listed in the FLAGS untagged response, the server SHOULD
   send a PERMANENTFLAGS response code in an OK untagged response,
   listing the flags that the client can change permanently.

   Only one mailbox can be selected at a time in a connection;
   simultaneous access to multiple mailboxes requires multiple
   connections. The SELECT command automatically deselects any
   currently selected mailbox before attempting the new selection.
   Consequently, if a mailbox is selected and a SELECT command that
   fails is attempted, no mailbox is selected.

   If the client is permitted to modify the mailbox, the server
   SHOULD prefix the text of the tagged OK response with the
   "[READ-WRITE]" response code.

   If the client is not permitted to modify the mailbox but is
   permitted read access, the mailbox is selected as read-only, and
   the server MUST prefix the text of the tagged OK response to
   SELECT with the "[READ-ONLY]" response code. Read-only access
   through SELECT differs from the EXAMINE command in that certain
   read-only mailboxes MAY permit the change of permanent state on a
   per-user (as opposed to global) basis. Netnews messages marked in
   a server-based .newsrc file are an example of such per-user
   permanent state that can be modified with read-only mailboxes.

   Example:    C: A142 SELECT INBOX
               S: * 172 EXISTS
               S: * 1 RECENT
               S: * OK [UNSEEN 12] Message 12 is first unseen
               S: * OK [UIDVALIDITY 3857529045] UIDs valid
               S: * FLAGS (\Answered \Flagged \Deleted \Seen \Draft)
               S: * OK [PERMANENTFLAGS (\Deleted \Seen \*)] Limited
               S: A142 OK [READ-WRITE] SELECT completed

SO what does this mean for us?

The Client object will get a "select_mailbox()" method. It will take the
name of the mailbox being selected; it will also take an examine boolean.
It will return an IMAPResponse object.

The select_mailbox() method will do:

o if we already have a selected mailbox we will remove this client from
  that mailbox (mailbox.remclient())
o set the selected_mbox field to None
o If the Mbox has the \Noselect flag then we return a "NO" IMAPResponse
o It will ask the UserMHDir object for a Mailbox by the given name.
o If this fails it will generate and return a "NO" IMAPResponse.
o set the selected_mbox field to the mailbox we got.
o check the mtime of the mailbox. Call resync if it is not up to date
o check the mtime of the .mh_sequences file. Call resync_seq_and_flags()
  if it is not up to date.
o Lock the mailbox so no one else can fiddle with it while we are doing
  this (?)
o It will create an "OK" IMAPResponse.
o It will add this client to this mailbox (mailbox.addclient())
o It will query the mailbox for the number of messages
  (mailbox.num_messages()) and add the result to the IMAPResponse.
o It will query the mailbox for the recent messages and add the length of
  that list to the IMAPResponse (mailbox.getrecent()?)
o It will get the uid-vv of the mailbox and add that to the IMAPResponse
o It will query the mailbox for the list of unseen messages and take the
  first part of the first element of this list
o It will query the mailbox for the list of valid flags and add this to
  the IMAPResponse
o if the examine boolean is false:
     It will query the mailbox for the list of client settable permanent
     flags and add this to the IMAPResponse
  else:
     It will add "OK [PERMANENTFLAGS ()]" to the IMAPResponse
  fi
o It will unlock the mailbox.
o if the examine boolean is true:
     add OK [READ-ONLY] to the IMAPResponse
     selected_rw = False
  else
     add OK [READ-WRITE] to the IMAPResponse
     selected_rw = True
  fi
o it will return the generated IMAP response

6.3.2. EXAMINE Command

just call client.select_mailbox(mbox, examine = True)

6.3.3. CREATE Command

call the usermhdir.create_mailbox(mbox) command:

o create an IMAPResponse object
o If the folder is "INBOX" or if the folder specified already exists:
     add "NO" to the IMAPResponse object and return
  fi
o for each dir in the mailbox listed:
     if the folder does not exist:
        create the mh folder
        create a related Mailbox object
        mailbox.resync() ? (should be very fast)
o add "OK" to the IMAPResponse
o return IMAPResponse

6.3.4. DELETE Command

call the usermhdir.delete_mailbox(mbox) command:

o if the MH folder does not exist then:
     usermhdir.big_lock.acquire()
     try to delete the mailbox db's and the mbox entry in the mailbox_db
     except: return a "NO" IMAPResponse
     else:   return an "OK" IMAPResponse
     usermhdir.big_lock.release()
o see if the MH folder has any sub-folders.
o if mbox entry in the mailbox_db does not exist:
     if the MH folder has no sub folders then:
        delete the MH folder
     else:
        create a database_entry for this mailbox, set the \Noselect flag
     fi
     return an "OK" IMAPResponse.
o get an instance of the Mailbox object
o usermhdir.big_lock.acquire()
o if the MH folder has no sub folders then:
     delete the MH folder
     delete the Mailbox
     return an "OK" IMAPResponse.
     usermhdir.big_lock.release()
o if the mailbox has the \Noselect flag then
     return a "NO" IMAPResponse.
     usermhdir.big_lock.release()
o set the \Noselect flag
o delete the mailbox specific dbs
o delete all the messages in the MH folder
o usermhdir.big_lock.release()

6.3.5. RENAME Command

call the usermhdir.rename_mailbox() method:

o make list of mailboxes be up to date.
o if source mailbox does not exist:
     return "NO" IMAPResponse
o if the destination MH folder exists:
     return "NO" IMAPResponse
o is the mailbox object active and does it have any clients?
     return "NO" IMAPResponse - Mailbox in use.
o usermhdir.big_lock.acquire()
o if source mailbox is INBOX:
     make the destination mailbox
     lock the destination mailbox
     lock the INBOX
     for i in INBOX-msgs do:
        mv message file to destination MH folder
        copy db entry to destination mailbox
        remove db entry from INBOX
     unlock the INBOX
     unlock the destination mailbox
     usermhdir.big_lock.release()
     spawn a resync thread (that waits for a non-busy moment?)
     return "OK" IMAPResponse
o else:
     rename the source mh folder to be the destination mh folder
     do recursive rename of subdir db related stuff (mbox)
     return "OK" IMAPResponse

recursive rename of subdir db related stuff (old_name, new_name):
   if the mailbox is instantiated:
      mailbox = usermhdir.mboxes[old_name]
      call mailbox.rename():
         close the mailbox specific db's
         rename the mailbox specific db's
         mailbox.name = new_name
         re-open the mailbox specific db's
      del usermhdir.mboxes[old_name]
      usermhdir.mboxes[new_name] = mailbox
   else:
      call Mailbox.rename():
         create db handle
         rename the mailbox specific db's
         drop db handle
   for folder in subdirs of current mh folder (new_name):
      construct old_name for folder, new_name = folder
      call recursive rename of subdir db related stuff(old_name, new_name)

6.3.6. SUBSCRIBE Command

NOTE: This is supposed to add 'the specified mailbox name to the server's
      set of "active" or "subscribed" mailboxes as returned by the LSUB
      command.'

      However, we have no real concept of a mailbox being active or
      subscribed. The RFC gives an example of subscribing to a news group.
      In terms of our IMAP server I can not think of this having any
      meaning at all - except perhaps making us keep those Mailbox objects
      instantiated and in memory if they exist.

      I think we may want to have a mapping.. the key is a subscribed
      mailbox, the value is the list of the client objects that have
      subscribed to this mailbox.

      I believe it does not make sense for us to allow someone to
      subscribe to a mailbox that does not exist, although according to
      the RFC we can not remove such a mailbox from our list if it no
      longer exists.

6.3.7. UNSUBSCRIBE Command

See the discussion for the 'SUBSCRIBE' command.

6.3.8. LIST Command

Our specialized IMAP server only understands MH Mail folders. This whole
business about a "reference" and the like is just totally confusing. I
wish rfc2060 simply stated what the point of the reference was.

I suppose if they give a reference of anything but "" we return "NO" that
we can not interpret that reference.

"*" means match this level and all sub-folders.
"%" means only match mailboxes that match at this level.

The reference in the LIST command will always be interpreted from the root
of the Mail directory. The root will always be \Noselect.

I think for now I will just say this only works on unix and the hierarchy
separator character will always be "/". This means that we can just concat
the reference with the mailbox name. The reference must be a complete
mailbox name.

usermhdir.list(reference, mailbox_pattern):

o update the list of mailboxes - usermhdir.update_mbox_list()
o if reference is not "" and not a valid mailbox name:
     return "NO" IMAPResponse: "not a valid reference"
o Generate a re we can use to match mailbox names according to the pattern
  we got:

     pattern = reference + mailbox_pattern
     pattern = "(" + re.escape(pattern) + ")"
     # "*" is always escaped by re.escape(); "%" is only escaped by older
     # Pythons, so accept it with or without the backslash.
     pattern = re.sub(r'\\\*', '.*', pattern)
     pattern = re.sub(r'\\?%', '[^/]*', pattern)
     pattern = re.compile(pattern)

o matches = []
o for mbox in self.mailboxes.keys():
     res = pattern.match(mbox)
     if res:
        match = res.group(1)
        if match not in matches:
           matches.append(match)
        fi
     fi
  rof
o create IMAPResponse
o for folder in matches:
     attrs = eval(self.mailboxes_db[folder])['attributes']
     append to IMAPResponse("* LIST (attrs) \"/\" folder")
o append "OK" to IMAPResponse
o return

6.3.9. LSUB Command

o Apply the same rules to generate a pattern as for the LIST command
o apply the "pattern" to the entries on this client's active list
o when building the response, if a mailbox does not exist add the
  \Noselect flag.
o append "OK" to IMAPResponse
o return

6.3.10. STATUS Command

o if items contains an invalid item return "BAD" IMAPResponse
o if mailbox does not exist return "NO" IMAPResponse

call mailbox.status(items):

o self.resync()
o self.resync_sequences()
o for item in items:
     res.append(mailbox."item"())
o append "OK"
o return IMAPResponse

6.3.11. APPEND Command

o if mailbox does not exist:
     if mailbox name is not "/" and < maxpathlen char:
        return "NO" IMAPResponse with [TRYCREATE]
     else
        return "NO" IMAPResponse
o read full message
o parse in to message structure
o if parse fails return "BAD" IMAPResponse - 'bad message'
o parse flags - if any of the flags not proper:
     return "BAD" IMAPResponse - 'can not set flag '
o if mailbox is not instantiated, instantiate it.
o mailbox.append(message, flags):
o return "OK" IMAPResponse

mailbox.append(message, flags, date):

o self.resync()
o self.resync_sequences()
o self.big_lock.acquire()
o find greatest msg-file in db
o incr msg-file, open file 'w' with interlock
o set uid field on message object
o write out message to MH folder
o create db entry under uid.
o if date not None:
     msgs_db[uid]['internal-date'] = date
  else
     msgs_db[uid]['internal-date'] = time()
o set flags as given, add \Recent
o check sequences mtime, update in memory cache if not
o if no \Seen flag is specified:
     add msg-no to unseen sequence
     write out sequences file
     update sequences mtime
o self.set_attribute('\Marked')
o self.update_mtime()
o self.big_lock.release()
o self.notify([uid])
o return

6.4. Client Commands - Selected State

   In selected state, commands that manipulate messages in a mailbox
   are permitted.
   In addition to the universal commands (CAPABILITY, NOOP, and
   LOGOUT), and the authenticated state commands (SELECT, EXAMINE,
   CREATE, DELETE, RENAME, SUBSCRIBE, UNSUBSCRIBE, LIST, LSUB, STATUS,
   and APPEND), the following commands are valid in the selected
   state: CHECK, CLOSE, EXPUNGE, SEARCH, FETCH, STORE, COPY, and UID.

6.4.1. CHECK Command

Just runs house keeping for this mailbox - lets us do a resync when the
client asks us to (although I think we are doing a resync with pretty much
every operation...)

o client.mailbox.resync()
o client.mailbox.resync_sequences()
o return "OK" IMAPResponse

6.4.2. CLOSE Command

o if client selected mailbox is not readonly:
     client.mailbox.expunge() - Note we discard the response from this
     command.
o return "OK" IMAPResponse

6.4.3. EXPUNGE Command

o if client selected mailbox is readonly:
     return "NO" IMAPResponse - mailbox is readonly
o msgs = client.mailbox.expunge()
o for msg in msgs:
     IMAPResponse.append("* %d EXPUNGE" % msg[0])
o IMAPResponse.append("OK")

mailbox.expunge():

o msg_num = 0
o expunged = []
o self.resync()
o self.big_lock.acquire()
o for msg in self.msgs_db.keys():
     msg_num += 1
     if msg has \Deleted flag:
        rm msg-file
        del msgs_db[msg]
        expunged.append((msg_num, msg-file, uid))
        msg_num -= 1
o if not self.sequences_uptodate()
     update sequences
  for each sequence
     if any msg-file in expunged is in the sequence:
        remove it
        changed = True
  if changed:
     re-write sequences file
     update sequences mtime
o self.big_lock.release()
o self.notify() ?
o return expunged

6.4.4. SEARCH Command

This one is a doozy. I am wondering if I can generate a lambda form of the
function that will match the criteria.. probably not. I probably need to
make some sort of interpreter of the search criteria that is passed each
message in turn.

We will start out only supporting US-ASCII.
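
To make the "interpreter of the search criteria that is passed each message
in turn" idea concrete, here is a minimal sketch. It assumes the criteria
have already been parsed into a list of tuples (an implicit AND, as in the
RFC), and that a message is a plain dict with 'flags', 'headers' and 'size'
keys. Those data shapes and the names matches() and _match_one() are
invented for the example, and only a handful of search keys are covered.

```python
# Search keys that map directly onto a system flag.
FLAG_KEYS = {
    "ANSWERED": "\\Answered",
    "DELETED": "\\Deleted",
    "DRAFT": "\\Draft",
    "FLAGGED": "\\Flagged",
    "SEEN": "\\Seen",
}


def matches(criteria, msg):
    """Return True if msg satisfies every key in criteria (implicit AND)."""
    return all(_match_one(key, msg) for key in criteria)


def _match_one(key, msg):
    """Evaluate one parsed search key, e.g. ("FROM", "alice"), against msg."""
    name, args = key[0], key[1:]
    flags = msg["flags"]
    if name == "ALL":
        return True
    if name in FLAG_KEYS:
        return FLAG_KEYS[name] in flags
    if name.startswith("UN") and name[2:] in FLAG_KEYS:
        return FLAG_KEYS[name[2:]] not in flags
    if name == "RECENT":
        return "\\Recent" in flags
    if name == "OLD":
        return "\\Recent" not in flags
    if name == "NEW":
        return "\\Recent" in flags and "\\Seen" not in flags
    if name in ("FROM", "TO", "CC", "BCC", "SUBJECT"):
        # String keys are case-insensitive substring matches.
        header = msg["headers"].get(name.lower(), "")
        return args[0].lower() in header.lower()
    if name == "LARGER":
        return msg["size"] > args[0]
    if name == "SMALLER":
        return msg["size"] < args[0]
    if name == "NOT":
        return not _match_one(args[0], msg)
    if name == "OR":
        return _match_one(args[0], msg) or _match_one(args[1], msg)
    raise ValueError("unsupported search key: %r" % (name,))
```

For example, `matches([("SEEN",), ("OR", ("FROM", "alice"), ("LARGER", 5000))], msg)`
corresponds to the client sending `SEARCH SEEN OR FROM alice LARGER 5000`.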
o parse the search criteria in to a useable structure
o if parse fails:
     IMAPResponse.append("BAD Bad search criteria")
     return
o res = self.selected.search(search criteria)
o search_result = ""
o for msg in res:
     search_result = "%s %d" % (search_result, msg[0])
o IMAPResponse.append("* SEARCH %s" % search_result)
o IMAPResponse.append("OK")
o return

search.parse(search_criteria_string) - creates a new Search object
search.match(message mapping)

mailbox.search(search_object):

o self.resync()
o self.resync_sequences()
o self.big_lock()
o

Actually, what I will do is create an IMAPSearch() class that you pass the
search criteria from the client when you instantiate it. This will have a
"match()" method that you can pass the file name of a message.

We will do this by creating a compiled code object that is an expression
and can be passed to the 'eval()' builtin function. The expression will be
invoking builtin functions and functions that are provided by the
IMAPSearch module.

expression generator:

result_expr = ""

# We go through our search expression string chipping off parts of it
# until we have chipped all of it.
#
parse_search_expr( search_expr, first_or_clause = False ):
    if len(search_expr) < 1:
        raise "woo, bad syntax"
    if search_expr.startswith("ALL"):
        return "True"
        if (len(result_expr) > 0):
            result_expr += " and True"
        else:
            result_expr = "True"
    elif search_expr.startswith("ANSWERED"):
    elif search_expr.startswith("BCC "):
    elif search_expr.startswith("BEFORE "):
    elif search_expr.startswith("BODY "):
    elif search_expr.startswith("CC "):
    elif search_expr.startswith("DELETED"):
    elif search_expr.startswith("FLAGGED"):
    elif search_expr.startswith("FROM "):
    elif search_expr.startswith("KEYWORD "):
    elif search_expr.startswith("NEW"):
    elif search_expr.startswith("OLD"):
    elif search_expr.startswith("ON "):
    elif search_expr.startswith("RECENT"):
    elif search_expr.startswith("SEEN"):
    elif search_expr.startswith("SINCE "):
    elif search_expr.startswith("SUBJECT "):
    elif search_expr.startswith("TEXT "):
    elif search_expr.startswith("TO "):
    elif search_expr.startswith("UNANSWERED"):
    elif search_expr.startswith("UNDELETED"):
    elif search_expr.startswith("UNFLAGGED"):
    elif search_expr.startswith("UNKEYWORD "):
    elif search_expr.startswith("UNSEEN"):
    elif search_expr.startswith("DRAFT"):
        return '"\Draft" in flags'
    elif search_expr.startswith("HEADER "):
    elif search_expr.startswith("LARGER "):
    elif search_expr.startswith("NOT "):
    elif search_expr.startswith("OR "):
    elif search_expr.startswith("SENTBEFORE "):
    elif search_expr.startswith("SENTON "):
    elif search_expr.startswith("SENTSINCE "):
    elif search_expr.startswith("SMALLER "):
    elif search_expr.startswith("UID "):
    elif search_expr.startswith("UNDRAFT"):
    elif IMAPSearch.isset(search_expr (up to space)):
    elif search_expr.startswith("("):
        find matching ")"
        recurse( search_expr.between("(", ")") )

while len(remainder) > 0:
    (parsed, remainder) = parse_expr(remainder)
    if len(remainder) > 0:
        if remainder[0] == ' ':
            parsed += " and "
            remainder = remainder[1:]
        else:
            raise "bad search expression"

6.4.5. FETCH Command

=== 2006.09.26

We have actually done a lot of work on this now. We parse all possible
IMAP messages. We actually _process_ some IMAP messages now as well. We
have added support for some of the additional rfc's.

Right now we are having issues with how we really should and want to
handle messages that are removed or moved from underneath the mh-imap
server.

The issue here is with how to properly detect this and send EXPUNGE
messages to the client. We still have issues with the timing of sending
EXPUNGE messages as there are some weird rules in IMAP4rev1 about when you
can do this. I think we have those sussed right now.

So we need to focus on re-thinking our resync() method a bit.

The main thing is that we need to be able to maintain the imap message
numbering rules when we have found messages have been deleted or moved
around.

That is, if you have a folder with 5 messages you know that the IMAP
message numbering is:

   1,2,3,4,5

What is more, if someone deletes message 2 and then 3 you will see
expunges like this:

   EXPUNGE 2
   EXPUNGE 2

ie: we have already renumbered the messages the instant one is removed, as
far as what we tell the IMAP client.

The same rules apply for when messages are added to the mail box.

We have currently decided to make the MH message numbers correspond to the
IMAP message numbers. This means that the mh-imap server will be fairly
aggressively 'packing' MH folders.

I can see this causing issues with users who move messages around in the
mailbox frequently via mh command line tools _while_ the mh-imap server is
running for their mailbox - as their tools, if they are not using
sequences, may have out of date information.

We could try to keep these synchronized separately - ie: the ordinal
position of a message in an MH mailbox is its IMAP message number. At the
time that did not seem like it would be possible.
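
The renumbering rule above (deleting messages 2 and then 3 of a five message
folder yields "EXPUNGE 2" twice) can be sketched as follows. The function
name expunge_responses() and its arguments are made up for illustration: it
takes the uids in mailbox order plus the set of uids found to be gone, and
returns the untagged responses together with the surviving uids, whose list
position + 1 is the new IMAP message number.

```python
def expunge_responses(uids_in_order, removed_uids):
    """Compute untagged EXPUNGE lines for removed messages.

    Each sequence number is reported as the client sees it at that
    moment, i.e. after all earlier expunges have already shifted the
    numbering down.
    """
    responses = []
    remaining = []
    removed_so_far = 0
    for position, uid in enumerate(uids_in_order, start=1):
        if uid in removed_uids:
            # Earlier removals have already shifted this message down.
            responses.append("* %d EXPUNGE" % (position - removed_so_far))
            removed_so_far += 1
        else:
            remaining.append(uid)
    return responses, remaining


responses, remaining = expunge_responses([10, 11, 12, 13, 14], {11, 12})
for line in responses:
    print(line)
```

This is a single pass over the mailbox, so it stays cheap even for a folder
with thousands of messages; what it does not address is when the server is
allowed to send these responses.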
One issue with ordinal numbers is that when we have a mailbox with like
8,000 messages in it, it may take a considerable amount of time to update
even for trivial changes.

Probably best for us to, again, enumerate what the cases are of what might
happen to the underlying files through manipulations outside of the
mh-imap-server.

With each case we need to indicate what is the desired result that the
mh-imap-server will send to the appropriate IMAP clients. If we can,
sensibly, we should also describe what the mh-imap-server will do to its
internal representations when it encounters this case.

Finally we need to describe how this can be unequivocally detected (and
differentiated from every other case.)

Luckily the number of cases is not that large, but differentiating between
them during detection is likely to be confusing.

o Message removed from mailbox
  o first message removed
  o last message removed
  o message in middle removed
o Existing message moved from one location in mailbox to another location
o New Message added to mailbox
  o message added to front of mailbox
  o message added to end of mailbox
  o message inserted in to middle of mailbox
o New message replacing existing message in mailbox -- should be detected
  as old message being deleted and then new message being added

Now we already have a resync() method that will go through a mailbox

=== 2007.06.21

hm. re-think of resync. Here is the new gotcha. In response to certain
messages we can not send EXPUNGE messages. The point being we are not
supposed to change the understood mapping of IMAP message numbers to
messages while certain commands are running.

Now, this does not really address multiple people accessing the mailbox at
the same time causing race conditions, but outside of what should be a very
narrow window we can do a lot better job of following the word of rfc2060
than we have up to this point.

o We tag every actual message with a uid and uid_vv.
o We can relate any mh message file to this uid because it is in the
  actual message.
o I want to now maintain that the sorted position of an entry in the
  message db shelf is the imap number of that message
o from the imap side when we operate, we operate using the msg shelf in
  order to return a result to a user. if we are doing a search, we search
  the messages, and then look up each entry via the uid in the msg shelf
  and use that to determine the IMAP message number to send to the client.
o on initial open of a mailbox we can do the resync/pack such that the mh
  folder is packed.
o on a final close of a mailbox we can do that also (but we will probably
  skip it because it will be done when it is opened.)
o on an 'expunge', 'noop' or any command that is not a FETCH, STORE, or
  SEARCH we can do that as well.

I think the key thing is that when doing those commands we keep our own
mapping of IMAP messages that may not correspond to the messages in the mh
folder. In the case where the underlying message is not there we return a
generic failure message. A following sync should re-fix everything with
respect to the client.

o if the underlying MH folder has messages moved around and deleted this
  is okay, we are operating on them by uid. We use our last known mapping
  of uid to mh msg number to find them. If that fails then the folder has
  had messages