Note that this does not include the jar file, even though the build script does generate it. I figure it can be regenerated, so I excluded it.
This assignment seemed really simple in the beginning - just build a hash table and store the counts in there. But there were many hitches along the way.
The first involved the build.xml files; a lot of editing had to be done to get them to work the way I wanted. I don't think Dr. Johnson did this on purpose, but then again maybe he did, to give us an interesting exercise to work with. The issues I can recall offhand were:
- base build.xml needed to be updated to include junit's jar file
- dist.build.xml needed to include my name in the generated filename
- emma.build.xml needed its paths adjusted for generating the html (there's no way I want to read the xml every time I want to see the coverage reports)
- javadoc.build.xml needed the overview.html path set (it initially said stack)
On to the actual work now...
Part 0: Package creation + tests
It was initially painless... until adding the JUnit jar became an issue. That took me a while to figure out.
Part 1: Totallinks implementation
Part 2: Mostpopular implementation
I realized parts 1 and 2 were very similar, so I decided to make their implementations as similar as possible, with only the final result differing. However...
HttpUnit does not like parsing JavaScript, and many links do contain it. Kevin English did have a nice way to disable the exceptions from being thrown. (I just caught them all.) But that certainly complicated matters. The one thing I am unsure of is whether it still processes the pages or just suppresses the exceptions.
Also, I initially used a HashMap as the data structure. Then I realized I would need to traverse the structure, so I changed it to a TreeMap. After that, though, I realized I would want a queue to determine which URL would be next, so I actually ended up using two separate data structures - a queue for which URL would be next to process, and a TreeMap to keep track of the counts.
Using parent classes made changing data structures easy, luckily.
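For anyone curious what the queue-plus-TreeMap idea looks like, here is a rough sketch. It is not the code I turned in: the class name CrawlSketch, the method names, and the in-memory "web" map that stands in for real HttpUnit page fetches are all made up for illustration, and the tie-breaking rule in mostPopular is just an assumption.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: a fake in-memory "web" replaces real page fetches.
class CrawlSketch {

  /** Counts how often each URL is linked to, processing at most maxPages pages. */
  static Map<String, Integer> countLinks(Map<String, List<String>> web,
                                         String start, int maxPages) {
    Queue<String> toVisit = new ArrayDeque<>();    // which URL to process next
    Set<String> seen = new HashSet<>();            // URLs that were already queued
    // Declared as Map, so swapping HashMap for TreeMap is a one-line change.
    Map<String, Integer> counts = new TreeMap<>();

    toVisit.add(start);
    seen.add(start);
    int visited = 0;
    while (!toVisit.isEmpty() && visited < maxPages) {
      String page = toVisit.remove();
      visited++;
      for (String link : web.getOrDefault(page, Collections.<String>emptyList())) {
        counts.merge(link, 1, Integer::sum);       // one more link pointing at this URL
        if (seen.add(link)) {                      // true only the first time we see it
          toVisit.add(link);
        }
      }
    }
    return counts;
  }

  /** Part 1 style result: total number of links found. */
  static int totalLinks(Map<String, Integer> counts) {
    int total = 0;
    for (int c : counts.values()) {
      total += c;
    }
    return total;
  }

  /** Part 2 style result: the most linked-to URL (first one wins on a tie). */
  static String mostPopular(Map<String, Integer> counts) {
    String best = null;
    int bestCount = -1;
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      if (e.getValue() > bestCount) {
        best = e.getKey();
        bestCount = e.getValue();
      }
    }
    return best;
  }

  public static void main(String[] args) {
    Map<String, List<String>> web = new TreeMap<>();
    web.put("a", Arrays.asList("b", "c"));
    web.put("b", Arrays.asList("c"));
    web.put("c", Arrays.asList("a", "c"));

    Map<String, Integer> counts = countLinks(web, "a", 10);
    System.out.println("total links: " + totalLinks(counts));
    System.out.println("most popular: " + mostPopular(counts));
  }
}
```

Both parts share the same crawl and only differ in how the counts are summarized at the end, which is the "similar implementation, different final result" idea from above.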
Part 3: Logging
Easy. I just use System.out.println.... I can't? Then I notice the line in the assignment:
You can implement logging using System.out.println, but that's lame.
Well, there goes that idea. However, the HackystatLogger class fit the description of what I needed for this task very well, so I just used that class (and attributed it in the Javadoc). Then I built a WebSpiderLog on top of that, which is enabled if logging is turned on at the command line.
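To give an idea of the shape of such a wrapper, here is a minimal sketch. It uses the JDK's own java.util.logging instead of HackystatLogger, so it is a stand-in rather than my WebSpiderLog; the class name SpiderLogSketch, the logger name, the message formats, and the "-logging" flag are all assumptions for illustration.

```java
import java.util.Arrays;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch of a log wrapper that can be switched on or off.
class SpiderLogSketch {
  private final Logger logger;

  SpiderLogSketch(boolean enabled) {
    // The logger name is made up; loggers with the same name share state.
    logger = Logger.getLogger("webspider.sketch");
    // Level.OFF silences everything, so callers never need to check a flag.
    logger.setLevel(enabled ? Level.INFO : Level.OFF);
  }

  boolean isEnabled() {
    return logger.isLoggable(Level.INFO);
  }

  void retrieving(String url) {
    logger.info("Retrieving " + url);
  }

  void found(int links) {
    logger.info("Found " + links + " links");
  }

  public static void main(String[] args) {
    // "-logging" is an assumed flag name, not necessarily the assignment's.
    boolean enabled = Arrays.asList(args).contains("-logging");
    SpiderLogSketch log = new SpiderLogSketch(enabled);
    log.retrieving("http://www.example.com");
    log.found(3);
  }
}
```

The nice part of this arrangement is that the spider code calls the log methods unconditionally, and the command-line switch only decides whether anything is actually written.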
Part 4: Extra Credit
It's supposed to be a separate entry, but I didn't attempt it.
Conclusions
What an annoying assignment. I initially thought it would be a simple task, and then found all these wonderful humps that slowed down progress. In the end... it's a good thing I started moderately early on it (around Wednesday), because otherwise I'd be burning the midnight oil into the early hours tonight.