public final class EmailAddressParser extends Object
Regarding the parameter extractCfwsPersonalNames:
This flag controls the behavior of getInternetAddress and extractHeaderAddresses. If enabled, it allows the not-totally-kosher-but-happens-in-the-real-world practice of:
<[email protected]> (Bob Smith)
In this case, "Bob Smith" is not technically the personal name, just a comment. If the flag is enabled, the methods will convert this into: Bob Smith <[email protected]>
This also happens somewhat more often and appropriately with [email protected] (Mail Delivery System)
If a personal name appears to the left and CFWS appears to the right of an address, the methods will favor the personal name to the left. If the methods need to use the CFWS following the address, they will take the first comment token they find.
e.g.:
"bob smith" <[email protected]> (Bobby)
yields personal name "bob smith"
<[email protected]> (Bobby)
yields personal name "Bobby"
[email protected] (Bobby)
yields personal name "Bobby"
[email protected] (Bob) (Smith)
yields personal name "Bob"
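The precedence rule above can be sketched in a few lines. This is only an illustration under simplifying assumptions, not this class's implementation: the class name, the choosePersonalName helper, and the flat comment regex are hypothetical, and the caller is assumed to have already split the input into the phrase left of the address and the text right of it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PersonalNamePrecedenceSketch {
    // Simplified assumption: a comment is "( ... )" with no nesting and no escapes.
    private static final Pattern COMMENT = Pattern.compile("\\(([^()]*)\\)");

    /**
     * Hypothetical helper.
     * leftPhrase: unquoted phrase found left of the address, or null.
     * trailingCfws: raw text found right of the address, or null.
     */
    static String choosePersonalName(String leftPhrase, String trailingCfws,
                                     boolean extractCfwsPersonalNames) {
        // A personal name to the left of the address always wins.
        if (leftPhrase != null && !leftPhrase.trim().isEmpty()) {
            return leftPhrase.trim();
        }
        // Otherwise, if the flag is set, take the first comment to the right.
        if (extractCfwsPersonalNames && trailingCfws != null) {
            Matcher m = COMMENT.matcher(trailingCfws);
            if (m.find()) {
                return m.group(1).trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(choosePersonalName("bob smith", "(Bobby)", true));
        System.out.println(choosePersonalName(null, "(Bob) (Smith)", true));
        System.out.println(choosePersonalName(null, "(Bobby)", false));
    }
}
```

With the flag off, the trailing comment is ignored and the sketch returns null when there is no phrase to the left.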
Modifier and Type | Method | Description |
---|---|---|
static @Nullable String | cleanupPersonalString(@Nullable String string, @NotNull EnumSet<EmailAddressCriteria> criteria) | Given a string, if the string is a quoted string (without CFWS around it, although it will be trimmed) then remove the bounding quotations and then unescape it. |
static @NotNull javax.mail.internet.InternetAddress[] | extractHeaderAddresses(@Nullable String header_txt, @NotNull EnumSet<EmailAddressCriteria> criteria, boolean extractCfwsPersonalNames) | Given the value of a header, like the From:, extract valid 2822 addresses from it and place them in an array. |
static @Nullable String[] | getAddressParts(@Nullable String email, @NotNull EnumSet<EmailAddressCriteria> criteria, boolean extractCfwsPersonalNames) | See getInternetAddress; does the same thing but returns the constituent parts of the address in a three-element array (or null if the address is invalid). |
static @Nullable String | getDomain(@Nullable String email, @NotNull EnumSet<EmailAddressCriteria> criteria, boolean extractCfwsPersonalNames) | See getInternetAddress; does the same thing but returns the domain part in string form (essentially, the part to the right of the @). |
static @Nullable String | getFirstComment(@Nullable String text, @NotNull EnumSet<EmailAddressCriteria> criteria) | Given a string, extract the first matched comment token as defined in 2822, trimmed; return null on all errors or non-findings |
static @Nullable javax.mail.internet.InternetAddress | getInternetAddress(@Nullable String email, @NotNull EnumSet<EmailAddressCriteria> criteria, boolean extractCfwsPersonalNames) | Given a 2822-valid single address string, give us an InternetAddress object holding that address, otherwise returns null. |
static @Nullable String | getLocalPart(@Nullable String email, @NotNull EnumSet<EmailAddressCriteria> criteria, boolean extractCfwsPersonalNames) | See getInternetAddress; does the same thing but returns the local part that would have been returned from getInternetAddress() in String form (essentially, the part to the left of the @). |
static @NotNull String[] | getMatcherParts(@NotNull Matcher m, @NotNull EnumSet<EmailAddressCriteria> criteria, boolean extractCfwsPersonalNames) | |
static @Nullable String | getPersonalName(@Nullable String email, @NotNull EnumSet<EmailAddressCriteria> criteria, boolean extractCfwsPersonalNames) | See getInternetAddress; does the same thing but returns the personal name that would have been returned from getInternetAddress() in String form. |
static @Nullable String | getReturnPathAddress(@Nullable String email, @NotNull EnumSet<EmailAddressCriteria> criteria, boolean extractCfwsPersonalNames) | Pull out the cleaned-up return path address. |
static @Nullable String | getReturnPathBracketContents(@Nullable String email, @NotNull EnumSet<EmailAddressCriteria> criteria) | WARNING: You may want to use getReturnPathAddress() instead if you're looking for a clean version of the return path without CFWS, etc. |
static boolean | isValidAddressList(@NotNull String header_txt, @NotNull EnumSet<EmailAddressCriteria> criteria) | Tells us if a header line is valid, i.e. |
static boolean | isValidMailboxList(@NotNull String header_txt, @NotNull EnumSet<EmailAddressCriteria> criteria) | Tells us if a header line is valid, i.e. |
static boolean | isValidReturnPath(@Nullable String email, @NotNull EnumSet<EmailAddressCriteria> criteria) | Tells us if the email represents a valid return path header string. |
static @Nullable javax.mail.internet.InternetAddress | pullFromGroups(@NotNull Matcher m, @NotNull EnumSet<EmailAddressCriteria> criteria, boolean extractCfwsPersonalNames) | Using knowledge of the group-ID numbers (see comments at top) pull the data relevant to us from an already-successfully-matched matcher. |
static @Nullable String | removeAnyBounding(char s, char e, @Nullable String str) | If the string starts and ends with s and e, remove them, otherwise return the string as it was passed in. |
public static boolean isValidReturnPath(@Nullable String email, @NotNull EnumSet<EmailAddressCriteria> criteria)
NOTE: legit forms like <(comment here)> will return true.
You can check isValidReturnPath(); if it returns true and getInternetAddress() returns null, you know you have a DSN, whether the return path is empty or has only CFWS inside the brackets (which is legit, as demonstrated above). Note that you can also simply call getReturnPathAddress() to have that operation done for you.
Note that <""> is not a valid return-path.
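The two-step check described above can be written as a small decision helper. This is a sketch, assuming the caller has already obtained the results of isValidReturnPath() and getInternetAddress(); the ReturnPathKindSketch class, its enum, and classify are hypothetical names, and the address is typed as Object only so the example compiles without JavaMail on the classpath.

```java
final class ReturnPathKindSketch {
    enum Kind { INVALID, EMPTY_OR_CFWS_ONLY, HAS_ADDRESS }

    /**
     * validReturnPath: the result of isValidReturnPath(), computed by the caller.
     * address: the result of getInternetAddress(), computed by the caller (may be null).
     */
    static Kind classify(boolean validReturnPath, Object address) {
        if (!validReturnPath) {
            return Kind.INVALID;
        }
        // Valid return path with no extractable address: empty or CFWS only.
        return address == null ? Kind.EMPTY_OR_CFWS_ONLY : Kind.HAS_ADDRESS;
    }

    public static void main(String[] args) {
        System.out.println(classify(true, null));
        System.out.println(classify(true, "stand-in for an InternetAddress"));
        System.out.println(classify(false, null));
    }
}
```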
public static @Nullable String getReturnPathBracketContents(@Nullable String email, @NotNull EnumSet<EmailAddressCriteria> criteria)
Pull whatever's inside the angle brackets out, without alteration or cleaning. This is more secure than a simple substring() since paths like:
<(my > path) >
...are legal return-paths and may throw a simpler parser off. However this method will return all CFWS (comments, whitespace) that may be between the brackets as well. So the example above will return:
(my > path)_
(where the _ is the trailing space from the original string)
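To see why a plain substring() is fragile here, the sketch below compares a naive cut at the first '>' with a scanner that ignores '>' while inside a comment. It is an illustration only and does not reproduce how this class parses return paths: the class and method names are hypothetical, and the scanner assumes comments are the only place a stray '>' can appear (it does not handle quoted strings or comments before the opening bracket).

```java
final class BracketContentsSketch {
    // Naive approach: cut at the first '>' after the first '<'.
    static String naive(String s) {
        int open = s.indexOf('<');
        int close = s.indexOf('>', open + 1);
        return (open < 0 || close < 0) ? null : s.substring(open + 1, close);
    }

    // Comment-aware approach: a '>' only closes the path at comment depth 0.
    static String commentAware(String s) {
        int open = s.indexOf('<');
        if (open < 0) {
            return null;
        }
        int depth = 0;
        for (int i = open + 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\') {
                i++;            // skip the escaped character
            } else if (c == '(') {
                depth++;
            } else if (c == ')' && depth > 0) {
                depth--;
            } else if (c == '>' && depth == 0) {
                return s.substring(open + 1, i);
            }
        }
        return null;            // no closing bracket found
    }

    public static void main(String[] args) {
        String path = "<(my > path) >";
        System.out.println("[" + naive(path) + "]");
        System.out.println("[" + commentAware(path) + "]");
    }
}
```

On the example path, the naive version stops inside the comment, while the comment-aware version keeps the whole comment and the trailing space.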
public static @Nullable String getReturnPathAddress(@Nullable String email, @NotNull EnumSet<EmailAddressCriteria> criteria, boolean extractCfwsPersonalNames)
extractCfwsPersonalNames - See EmailAddressParser
public static boolean isValidMailboxList(@NotNull String header_txt, @NotNull EnumSet<EmailAddressCriteria> criteria)
This method seems quick enough so far, but I'm not totally convinced it couldn't be slow given a complicated near-miss string. You may just want to call extractHeaderAddresses() instead, unless you must confirm that the format is perfect. I think that in 99.9999% of real-world cases this method will work fine.
See also: isValidAddressList(String, EnumSet)
public static boolean isValidAddressList(@NotNull String header_txt, @NotNull EnumSet<EmailAddressCriteria> criteria)
This method seems quick enough so far, but I'm not totally convinced it couldn't be slow given a complicated near-miss string. You may just want to call extractHeaderAddresses() instead, unless you must confirm that the format is perfect. I think that in 99.9999% of real-world cases this method will work fine and quickly enough. Let me know what your testing reveals.
See also: isValidMailboxList(String, EnumSet)
public static @Nullable javax.mail.internet.InternetAddress getInternetAddress(@Nullable String email, @NotNull EnumSet<EmailAddressCriteria> criteria, boolean extractCfwsPersonalNames)
If your String is an email header, you should probably use extractHeaderAddresses instead, since most headers can have multiple addresses in them. (see that method for more info.) This method will indeed fail if you use it on a header line with more than one address.
Exception: You CAN and should use this for the Sender header, and probably you want to use it for the X-Original-To as well.
Another exception: You can use this for the Return-Path, but if you want to know that a Return-Path is valid and you want to extract it, you will have to call both this method and isValidReturnPath; this operation can be done for you by simply calling getReturnPathAddress() instead of this method. In terms of this method's application to the return-path, note that the common valid Return-Path value <> will return null. So will the illegitimate "" or legitimate empty-string, but other illegitimate Return-Paths like
"hi" <[email protected]>
will return an address, so the moral is that you may want to check isValidReturnPath() first, if you care. This method is useful if you trust the return path and want to extract a clean address from it without CFWS (getReturnPathBracketContents() will return any CFWS), or if you want to determine if a validated return path actually contains an address in it and isn't just empty or full of CFWS. Except for empty return paths (those lacking an address) the Return-Path specification is a subset of valid 2822 addresses, so this method will work on all non-empty return-paths, failing only on the empty ones.
In general for this method, note: although this method does not use InternetAddress to parse/extract the information, it does ensure that InternetAddress
can use the results (i.e. that there are no encoding issues), but note that an InternetAddress object can hold (and use) values for the address which it
could not have parsed itself. Thus, it's possible that for InternetAddress addr, which came as the result of this method, the following may throw an
exception or may silently fail:
InternetAddress addr2 = InternetAddress.parse(addr.toString());
The InternetAddress objects returned by this method will not do any decoding of RFC-2047 encoded personal names. See the documentation for this overall class (above) for more.
Again, all other uses of that addr object should work OK. It is recommended that if you are using this class that you never create an InternetAddress object using InternetAddress's own constructors or parsing methods; rather, retrieve them through this class. Perhaps the addr.clone() would work OK, though.
The personal name will include any and all phrase token(s) to the left of the address, if they exist, and the string will be trim()'ed, but note that InternetAddress, when generating the getPersonal() result or the toString() result, if it encounters any quotes or backslashes in the personal name String, will put the entire thing in a big quoted-escaped chunk.
This method will do some smart unescaping to prevent that from happening unnecessarily; specifically, if there are unnecessary quotes around a personal name, it will remove them. E.g.
"Bob" <[email protected]>
becomes:
Bob <[email protected]>
(apologies to [email protected] for everything i've done to him)
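The quote removal described above can be approximated as follows. This is a minimal sketch, not the code behind cleanupPersonalString(): the class and method names are hypothetical, and it assumes that any string which starts and ends with a double quote after trimming is a well-formed quoted string.

```java
final class PersonalCleanupSketch {
    // If the trimmed input is wrapped in double quotes, drop them and
    // resolve backslash escapes; otherwise return the trimmed input.
    static String cleanup(String s) {
        if (s == null) {
            return null;
        }
        String t = s.trim();
        boolean quoted = t.length() >= 2
                && t.charAt(0) == '"'
                && t.charAt(t.length() - 1) == '"';
        if (!quoted) {
            return t;
        }
        String inner = t.substring(1, t.length() - 1);
        StringBuilder out = new StringBuilder(inner.length());
        for (int i = 0; i < inner.length(); i++) {
            char c = inner.charAt(i);
            if (c == '\\' && i + 1 < inner.length()) {
                out.append(inner.charAt(++i)); // keep the escaped character only
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(cleanup("\"Bob\""));
        System.out.println(cleanup(" \"Bob \\\"The Builder\\\" Smith\" "));
    }
}
```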
extractCfwsPersonalNames - See EmailAddressParser
public static @Nullable String[] getAddressParts(@Nullable String email, @NotNull EnumSet<EmailAddressCriteria> criteria, boolean extractCfwsPersonalNames)
This may be useful because, even with a cleaned-up address extracted by this class, the parsing needed to split it into its parts is not trivial.
To actually use these values in an email, you should construct an InternetAddress object (or equivalent) which can handle the various quoting, adding of the angle brackets around the address, etc., necessary for presenting the whole address.
To construct the email address, you can safely use:
result[1] + "@" + result[2]
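Because getAddressParts() returns null for an invalid address, the concatenation above is best guarded. The helper below is a sketch with hypothetical names; it assumes index 0 holds the personal name (the text above only states that indices 1 and 2 are the local part and the domain), and the sample array is made up rather than produced by this class.

```java
final class AddressPartsSketch {
    // parts is assumed to be the three-element array from getAddressParts(),
    // or null if the address was invalid.
    static String bareAddress(String[] parts) {
        if (parts == null || parts.length < 3 || parts[1] == null || parts[2] == null) {
            return null;
        }
        return parts[1] + "@" + parts[2];
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for a real getAddressParts() result.
        String[] parts = { "Bob Smith", "bob", "example.com" };
        System.out.println(bareAddress(parts));
        System.out.println(bareAddress(null));
    }
}
```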
extractCfwsPersonalNames - See EmailAddressParser
public static @Nullable String getPersonalName(@Nullable String email, @NotNull EnumSet<EmailAddressCriteria> criteria, boolean extractCfwsPersonalNames)
The Strings returned by this method will not reflect any decoding of RFC-2047 encoded personal names. See the documentation for this overall class (above) for more.
extractCfwsPersonalNames - See EmailAddressParser
public static @Nullable String getLocalPart(@Nullable String email, @NotNull EnumSet<EmailAddressCriteria> criteria, boolean extractCfwsPersonalNames)
extractCfwsPersonalNames - See EmailAddressParser
public static @Nullable String getDomain(@Nullable String email, @NotNull EnumSet<EmailAddressCriteria> criteria, boolean extractCfwsPersonalNames)
extractCfwsPersonalNames - See EmailAddressParser
public static @NotNull javax.mail.internet.InternetAddress[] extractHeaderAddresses(@Nullable String header_txt, @NotNull EnumSet<EmailAddressCriteria> criteria, boolean extractCfwsPersonalNames)
This method can handle group addresses, but it does not preserve the group name or the structure of any groups; rather, it flattens them all into the same array. You can call this method on the From header or any other header that uses the mailbox-list form (which doesn't use groups), or on the To, Cc, Bcc, Reply-To, or any other header that uses the address-list format, which may contain groups. This method doesn't enforce any group structure syntax either. If you care to test for 2822 validity of a list of addresses (including group format), use isValidMailboxList() or isValidAddressList() as appropriate. This method will dependably extract addresses from a valid list. If the list is invalid, it may extract them anyway, or it may fail somewhere along the line.
You should not use this method on the Return-Path header; instead use getInternetAddress() or getReturnPathAddress() (see that doc for info about Return-Path). However, you could use this on the Sender header if you didn't care to check it for validity, since single mailboxes are valid subsets of valid mailbox-lists and address-lists.
header_txt - the text of the header, not including the header name and ": ". I don't think the String needs to be unfolded, but I haven't tested that.
See getInternetAddress() for more info; this method extracts addresses the same way.
extractCfwsPersonalNames - See EmailAddressParser
public static @Nullable javax.mail.internet.InternetAddress pullFromGroups(@NotNull Matcher m, @NotNull EnumSet<EmailAddressCriteria> criteria, boolean extractCfwsPersonalNames)
You could roll your own method that does what you care about.
This should work on the matcher for MAILBOX_LIST_PATTERN or MAILBOX_PATTERN, but only those. With some tweaking it could easily be adapted to some others.
May return null on encoding errors.
Also cleans up the address: tries to strip bounding quotes off the local part without damaging its parsability (by this class). If it can, it does; in all other cases, it leaves the local part as it is.
e.g. "bob"@example.com becomes [email protected]
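One way to decide whether the quotes are safe to drop is to check that the unquoted text is a plain dot-atom in the RFC 2822 sense. The sketch below takes that approach as an assumption; the class's actual test may differ, and the class and method names here are hypothetical.

```java
import java.util.regex.Pattern;

final class LocalPartQuoteSketch {
    // RFC 2822 dot-atom-text: runs of atext characters separated by single dots.
    private static final String ATEXT = "[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+";
    private static final Pattern DOT_ATOM =
            Pattern.compile(ATEXT + "(?:\\." + ATEXT + ")*");

    // Strip bounding quotes only when the result is still a valid dot-atom.
    static String stripQuotesIfSafe(String localPart) {
        if (localPart == null || localPart.length() < 2) {
            return localPart;
        }
        boolean quoted = localPart.charAt(0) == '"'
                && localPart.charAt(localPart.length() - 1) == '"';
        if (!quoted) {
            return localPart;
        }
        String inner = localPart.substring(1, localPart.length() - 1);
        return DOT_ATOM.matcher(inner).matches() ? inner : localPart;
    }

    public static void main(String[] args) {
        System.out.println(stripQuotesIfSafe("\"bob\""));        // quotes are unnecessary
        System.out.println(stripQuotesIfSafe("\"bob smith\""));  // space requires the quotes
    }
}
```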
extractCfwsPersonalNames - See EmailAddressParser
public static @NotNull String[] getMatcherParts(@NotNull Matcher m, @NotNull EnumSet<EmailAddressCriteria> criteria, boolean extractCfwsPersonalNames)
extractCfwsPersonalNames - See EmailAddressParser
public static @Nullable String getFirstComment(@Nullable String text, @NotNull EnumSet<EmailAddressCriteria> criteria)
This is probably not super-useful. Included just in case.
Note for future improvement: if COMMENT_PATTERN could handle nested comments, then this should be able to as well, but if this method were to be used to find the CFWS personal name (see boolean option) then such a nested comment would probably not be the one you were looking for?
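The nesting concern can be illustrated with a flat comment pattern. The regex below is a simplified stand-in, not the class's COMMENT_PATTERN, and the class and method names are hypothetical; it shows that a pattern without nesting support matches the innermost comment of a nested input, which is likely not the text a caller wants as a personal name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstCommentSketch {
    // Flat comment: "(" then any run of non-parenthesis characters or
    // backslash-escaped characters, then ")". Nesting is not supported.
    private static final Pattern FLAT_COMMENT =
            Pattern.compile("\\(((?:[^()\\\\]|\\\\.)*)\\)");

    // Returns the trimmed contents of the first flat comment, or null.
    static String firstComment(String text) {
        if (text == null) {
            return null;
        }
        Matcher m = FLAT_COMMENT.matcher(text);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstComment("x (Bob) (Smith)"));
        System.out.println(firstComment("(outer (inner) text)"));
    }
}
```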
public static @Nullable String cleanupPersonalString(@Nullable String string, @NotNull EnumSet<EmailAddressCriteria> criteria)
Copyright © 2016–2021. All rights reserved.