Step 3: Initial design

After simplifying the grammar from RFC 2396 without changing the semantics, we get the following grammar:

URL               = [ absoluteURL | relativeURL ] [ "#" fragment ]
absoluteURL       = protocol ":" ( hier_part | opaque_part )
relativeURL       = ( net_path | abs_path | rel_path ) [ "?" query ]
hier_part         = ( net_path | abs_path ) [ "?" query ]
opaque_part (s)   = urichar_no_slash *urichar
 
net_path          = "//" authority [ abs_path ]
abs_path (s)      = "/"  path_segments
rel_path (s)      = 1*relsegchar [ abs_path ]
   
protocol (s)      = alpha *( alphanum | "+" | "-" | "." )
authority (s)     = server | reg_name
reg_name (s)      = 1*( pathchar | ";" )
server            = [ [ userinfo "@" ] host [ ":" port ] ]
userinfo          = user [ ":" password ]
user (s)          = *relsegchar
password (s)      = *relsegchar
   
host (s)          = hostname | IPv4address
hostname          = *( domainlabel "." ) toplabel [ "." ]
domainlabel       = alphanum | alphanum *( alphanum | "-" ) alphanum
toplabel          = alpha | alpha *( alphanum | "-" ) alphanum
IPv4address       = 1*digit "." 1*digit "." 1*digit "." 1*digit
port (s)          = *digit
   
path_segments     = segment   *( "/" segment )
segment           = *pathchar *( ";" *pathchar )
   
query (s)         = *urichar
fragment (s)      = *urichar
   
escaped           = "%" hex hex
pathchar          = relsegchar | ":"
relsegchar        = unreserved | escaped | "@" | "&" | "=" | "+" | "$" | ","
urichar           = unreserved | escaped | reserved
urichar_no_slash  = unreserved | escaped | reserved_no_slash
reserved          = reserved_no_slash | "/"
reserved_no_slash = ";" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
unreserved        = alphanum | mark
mark              = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
alphanum          = alpha | digit
alpha             = ( "a" ... "z" ) | ( "A" ... "Z" )
hex               = ( "a" ... "f" ) | ( "A" ... "F" ) | digit
digit             = "0" ... "9"
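To see how directly these rules translate into code, the host rules can be written as one recognizer method per left-hand side. This is a minimal sketch under our own assumptions: the class HostRecognizer and its method names are hypothetical, and the methods only answer yes or no instead of storing the parsed parts.

```java
// Hypothetical recognizer for the host rules of the simplified grammar:
//   host        = hostname | IPv4address
//   hostname    = *( domainlabel "." ) toplabel [ "." ]
//   domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
//   toplabel    = alpha | alpha *( alphanum | "-" ) alphanum
//   IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
// Names are illustrative only; this is not the tutorial's own code.
final class HostRecognizer {

    static boolean isAlpha(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    static boolean isAlphanum(char c) {
        return isAlpha(c) || isDigit(c);
    }

    // host = hostname | IPv4address
    static boolean isHost(String s) {
        return isHostname(s) || isIPv4address(s);
    }

    // hostname = *( domainlabel "." ) toplabel [ "." ]
    static boolean isHostname(String s) {
        // Drop the optional trailing "."
        String body = s.endsWith(".") ? s.substring(0, s.length() - 1) : s;
        if (body.isEmpty()) return false;
        // Limit -1 keeps empty labels, so "a..b" is rejected
        String[] labels = body.split("\\.", -1);
        for (int i = 0; i < labels.length - 1; i++) {
            if (!isDomainlabel(labels[i])) return false;
        }
        return isToplabel(labels[labels.length - 1]);
    }

    static boolean isDomainlabel(String s) {
        return isLabel(s, false);
    }

    static boolean isToplabel(String s) {
        return isLabel(s, true);
    }

    // Shared label check: the first char is alpha (toplabel) or alphanum
    // (domainlabel), the last char is alphanum, inner chars are alphanum or "-".
    private static boolean isLabel(String s, boolean firstMustBeAlpha) {
        if (s.isEmpty()) return false;
        char first = s.charAt(0);
        if (firstMustBeAlpha ? !isAlpha(first) : !isAlphanum(first)) return false;
        if (!isAlphanum(s.charAt(s.length() - 1))) return false;
        for (int i = 1; i < s.length() - 1; i++) {
            char c = s.charAt(i);
            if (!isAlphanum(c) && c != '-') return false;
        }
        return true;
    }

    // IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
    static boolean isIPv4address(String s) {
        String[] parts = s.split("\\.", -1);
        if (parts.length != 4) return false;
        for (String p : parts) {
            if (p.isEmpty()) return false;
            for (int i = 0; i < p.length(); i++) {
                if (!isDigit(p.charAt(i))) return false;
            }
        }
        return true;
    }
}
```

Because toplabel must start with a letter, a string such as 192.168.0.1 is not a hostname and can only match IPv4address. As in RFC 2396, IPv4address only requires 1*digit per part, so 999.999.999.999 is accepted; a range check would be a separate step.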

All grammar variables whose values should be available to users of the parser have been marked with '(s)'. Our first idea, implementing the parser with one method per left-hand side (LHS), can now be refined. The last block of rules, from escaped down to digit, describes sets of characters. These rules can be separated from the remaining ones, that is, from rules whose right-hand side (RHS) contains variables that are not character classes or whose LHS value should be stored.
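The character-class rules need no state, so they fit naturally into a class of static predicates. The sketch below shows one way URLCharacter could look; the method names (isUnreserved, isEscapedAt, uricharLength and so on) are our assumptions, not necessarily the ones used later in the tutorial. Since escaped spans three characters, it cannot be a single-char predicate and is checked at a position in the input instead.

```java
// Hypothetical sketch of the character-class rules as static predicates.
// Method names are illustrative assumptions, not the tutorial's actual API.
final class URLCharacter {

    private URLCharacter() {} // static helpers only

    // alpha = ( "a" ... "z" ) | ( "A" ... "Z" )
    static boolean isAlpha(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // digit = "0" ... "9"
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // alphanum = alpha | digit
    static boolean isAlphanum(char c) {
        return isAlpha(c) || isDigit(c);
    }

    // hex = ( "a" ... "f" ) | ( "A" ... "F" ) | digit
    static boolean isHex(char c) {
        return (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F') || isDigit(c);
    }

    // mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
    static boolean isMark(char c) {
        return "-_.!~*'()".indexOf(c) >= 0;
    }

    // unreserved = alphanum | mark
    static boolean isUnreserved(char c) {
        return isAlphanum(c) || isMark(c);
    }

    // reserved_no_slash = ";" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
    static boolean isReservedNoSlash(char c) {
        return ";?:@&=+$,".indexOf(c) >= 0;
    }

    // reserved = reserved_no_slash | "/"
    static boolean isReserved(char c) {
        return isReservedNoSlash(c) || c == '/';
    }

    // escaped = "%" hex hex, checked starting at index i of s
    static boolean isEscapedAt(String s, int i) {
        return i + 2 < s.length()
                && s.charAt(i) == '%'
                && isHex(s.charAt(i + 1))
                && isHex(s.charAt(i + 2));
    }

    // Single-character alternatives of relsegchar; escaped is checked with isEscapedAt
    static boolean isRelSegChar(char c) {
        return isUnreserved(c) || "@&=+$,".indexOf(c) >= 0;
    }

    // pathchar = relsegchar | ":" (single-character alternatives only)
    static boolean isPathChar(char c) {
        return isRelSegChar(c) || c == ':';
    }

    // urichar = unreserved | escaped | reserved
    // Returns the length of the urichar starting at index i of s:
    // 3 for an escape such as "%2F", 1 for a single allowed char, 0 for no match.
    static int uricharLength(String s, int i) {
        if (isEscapedAt(s, i)) return 3;
        if (i < s.length()) {
            char c = s.charAt(i);
            if (isUnreserved(c) || isReserved(c)) return 1;
        }
        return 0;
    }
}
```

Returning a match length from uricharLength lets a caller advance its input position by the right amount without knowing whether it just consumed an escape or a plain character.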
From what we know so far, we can plan the following classes:

URL.java contains all parts of a URL, broken down into its components.
Parser.java implements all rules that have non-character-class variables on their RHS.
URLCharacter.java contains methods that distinguish the character classes. This class does most of the lexical analysis of URL strings.
InvalidURLException is thrown if the parser fails to parse a URL string.
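A first skeleton of some of the planned classes might look like the sketch below. It is only an illustration: the fields of URL, the position carried by InvalidURLException, and the method parseProtocol are hypothetical, and Parser repeats two character checks that in the planned design would live in URLCharacter.

```java
// Hypothetical skeleton of the planned classes; all names beyond the class
// names from the plan are illustrative assumptions.

// Thrown when a URL string does not match the grammar.
class InvalidURLException extends Exception {
    private final int position;

    InvalidURLException(String message, int position) {
        super(message + " at position " + position);
        this.position = position;
    }

    // Index in the input where parsing failed
    int getPosition() {
        return position;
    }
}

// Holds the parsed parts of a URL; null means "not present".
// Only a few of the (s)-marked variables are shown.
class URL {
    String protocol;
    String host;
    String port;
    String query;
    String fragment;
}

// Recursive-descent parser: one method per rule whose RHS is not a pure
// character class. Only the protocol rule is shown.
class Parser {
    private final String input;
    private int pos = 0;

    Parser(String input) {
        this.input = input;
    }

    int position() {
        return pos;
    }

    // protocol = alpha *( alphanum | "+" | "-" | "." )
    // Consumes the protocol at the current position and stores it in url.
    void parseProtocol(URL url) throws InvalidURLException {
        int start = pos;
        if (pos >= input.length() || !isAlpha(input.charAt(pos))) {
            throw new InvalidURLException("protocol must start with a letter", pos);
        }
        pos++;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (isAlpha(c) || isDigit(c) || c == '+' || c == '-' || c == '.') {
                pos++;
            } else {
                break;
            }
        }
        url.protocol = input.substring(start, pos);
    }

    // Character-class checks; in the planned design these belong in URLCharacter
    private static boolean isAlpha(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }
}
```

Each further rule with non-character-class variables on its RHS would get its own method in the same style, advancing pos and storing the (s)-marked values in the URL object.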

With the grammar in place and the class names chosen, we have taken the first steps of the design. Next we create a Together project and the initial UML diagrams. If you prefer to use JUnit and JUnitX manually, you can follow the next sections without running Together. Snapshots of all files for every tutorial step are provided, including all build files needed to work without Together.


© 2001 A. Heilwagen