Sunday, September 30, 2007

12. WebSpiderReview

For this assignment I reviewed Michal Zielinski's Webspider program from http://michalzics413.blogspot.com/2007/09/11-webspider-revided.html


1. Installation Review

Installation was straightforward. Since everyone used the same base package with the XML build files provided, running the Ant builds was the same as for my own package. The JUnit, Checkstyle, PMD, and FindBugs tasks were all present and all ran successfully. However, Emma did not report 100% coverage - this is discussed in further detail in section 3, Test Case Review.

2. Code format and conventions review

No errors were found from the automated check tools.

Manual code violations:

File | Lines | Violation | Comments
WebSpiderExample.java | N/A | ? | Class name should reflect what the class does. ('example' no longer should be in class name)
WebSpiderExample.java | 16, 186 | EJS-35 | Use descriptive Javadoc comments.
WebSpiderExample.java | 75, 118 | EJS-9 | Use meaningful variable names. (even when they appear as function parameters)
WebSpiderExample.java | 82, 83, 140, * | ICS-SE-Java-9 | Use iterators.


3. Test Case Review

Black Box Testing
The following equivalence classes were considered and tested:

Regular URL - the program performs as expected. A JUnit test reflects this.
URL with no links - Also produces expected results (0 links). No JUnit test is present for this case.
URL that isn't a HTML file - threw a NotHTML exception.
404 link - HttpNotFoundException thrown.
non-HTTP link - At first glance it is OK, since the program checks for "http" at the beginning of the URL. However, "httpa://" passes that check and then fails with a MalformedURLException.
Invalid command line parameters - If the fourth parameter isn't "-logging", the program shows the example usage screen as well as the results (which end up being zero). Aside from that, it does pass the following parameter tests:

- ensuring first parameter is either -totallinks or -mostpopular
- ensuring second parameter is a http URL
- ensuring third parameter is a nonnegative number
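
The "httpa://" case suggests the check only looks at the first few characters of the string. Here is a minimal sketch of a stricter check, assuming a hypothetical helper named isHttpUrl (this is not from Michal's code, and the class name is made up too). It parses the argument with java.net.URI and compares the scheme exactly, so "httpa" and "httpnot" are rejected before any connection is attempted.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper for illustration; not part of the reviewed WebSpider code. */
final class UrlCheck {

  /** Returns true only when the text parses as a URI whose scheme is exactly "http". */
  static boolean isHttpUrl(String text) {
    if (text == null) {
      return false;
    }
    try {
      URI uri = new URI(text);
      // Compare the whole scheme rather than testing a string prefix.
      return "http".equalsIgnoreCase(uri.getScheme()) && uri.getHost() != null;
    }
    catch (URISyntaxException e) {
      // Text that is not a URI at all (for example, it contains spaces).
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isHttpUrl("http://www.hawaii.edu/"));
    System.out.println(isHttpUrl("httpa://foo.bar"));
    System.out.println(isHttpUrl("httpnot://foo.bar"));
  }
}
```

With a check like this, the bad scheme could be reported on the usage screen along with the other parameter errors instead of surfacing later as an exception.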

White Box Testing
Code coverage is not 100%. The following is a list of the code that is not executed in the test cases:

- Case where MostPopularPage has all pages with 0 links (line 100). This can be checked by using 0 as the depth factor for a URL.

- Exception catching statements at lines 169-174. traversePages has its own catch block, so exceptions get caught inside the recursive call rather than in the calling function. These lines could possibly be removed.

- Main function accepting an invalid URL (httpnot://foo.bar).


Break da buggah:
Any case mentioned above where an exception gets thrown is a case where the program crashed due to unexpected behavior. These cases again are:

- invalid URL that starts with "http"
- 404 link
- non-html link
- fourth parameter that is not '-logging'

Conclusions:

In conclusion, I learned about a lot of the problems in my own code by reading through Michal's implementation. One of the things about testing is that it is difficult to hold a scalpel to your own work, but it is easy to look for faults in other people's work. Doing so helps you think about the choices you made in your own version.

Reading through Michal's code started to make me wonder if I implemented my own WebSpider implementation correctly - I interpreted the number parameter of the program as the "maximum number of pages to visit" rather than the "maximum depth from starting page" parameter. I am unsure which is correct, but I can start to understand the advantage of groups - having more perspectives helps to reduce possible errors like this.

The other difficulty comes in devising test cases. Personally I find it a little awkward to devise a set of test cases prior to the implementation of a system, and with pressure on finishing, it reduces the emphasis on testing, which shouldn't be neglected. There are many test cases I would've wanted to put into my code but didn't, mainly due to being exhausted from coding. It does seem like there may be an advantage to writing the test cases before the actual code in this case.

I did enjoy reading through another person's code, however, and it did rattle my brain over ideas on improvements on my own code.

Monday, September 24, 2007

11. WebSpider

Link

Note this does not include the jar file even if the script does build it. I figure it can be generated, so I excluded it.

This assignment seemed really simple in the beginning - just build a hash table and store the counts in there. But there were many hitches along the way.

The first involved the build.xml files. It seems like a lot of editing had to be done in order to get them to work the way I wanted them to. I don't think Dr. Johnson did this on purpose, but then again maybe he did to give us an interesting exercise to work with. The issues I can recall offhand were:
  • base build.xml needed to be updated to include junit's jar file
  • dist.build.xml needed to include my name in the generated filename
  • emma.build.xml needed to be adjusted for the paths for generating the html (there's no way I want to read the xml every time I want to see the coverage reports)
  • javadoc.build.xml needed the overview.html path set (it initially said stack)
All in all, transitioning the system over from stack probably wasn't as smooth as Dr. Johnson wanted ^^

On to the actual work now...

Part 0: Package creation + tests

It was initially painless... until the missing JUnit jar became an issue. That took me a while to figure out.

Part 1: Totallinks implementation
Part 2: Mostpopular implementation

I realized parts 1 and 2 were very similar, so I decided to try to make their implementations similar, with the final result differing. However...

HttpUnit does not like parsing JavaScript. Many pages do contain it; however, Kevin English did have a nice way to disable the exceptions from being thrown. (I just caught them all.) But that certainly complicated matters. The only thing is I am unsure if it still processes the pages, or if it just suppresses the exceptions.

Also, I initially used the data structure of a HashMap. Then I realized I would need to traverse the structure, so I changed it to a TreeMap. After that, though, I realized I would want a queue to determine which value would be next, so I actually ended up implementing two separate data structures - a queue for which URL would be next to process, and a TreeMap to keep track of the counts.

Using parent classes made changing data structures easy, luckily.
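
To make the two-structure idea concrete, here is a rough sketch (not my actual submission). The names CrawlSketch and countLinks are made up for illustration, the "web" is a hypothetical in-memory map of page to links instead of real HttpUnit fetches, and I am assuming the count being tracked is how many times each URL shows up as a link. The number parameter is treated as the maximum number of pages to visit. Declaring the variables as Queue and Map is what makes swapping the concrete class painless.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.TreeMap;

/** Illustration only: a queue for "what to process next" plus a TreeMap for the counts. */
final class CrawlSketch {

  /**
   * Visits at most maxPages pages breadth-first and counts how often each URL is linked to.
   * The web parameter is a stand-in for fetching a page and extracting its links.
   */
  static Map<String, Integer> countLinks(Map<String, List<String>> web, String start,
      int maxPages) {
    Queue<String> pending = new ArrayDeque<>();   // which URL is processed next
    Set<String> seen = new HashSet<>();           // avoids queueing a URL twice
    Map<String, Integer> counts = new TreeMap<>(); // sorted, so it is easy to traverse
    pending.add(start);
    seen.add(start);
    int visited = 0;
    while (!pending.isEmpty() && visited < maxPages) {
      String page = pending.remove();
      visited++;
      for (String link : web.getOrDefault(page, List.of())) {
        counts.merge(link, 1, Integer::sum);
        if (seen.add(link)) {
          pending.add(link);
        }
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    Map<String, List<String>> web = Map.of("a", List.of("b", "c"), "b", List.of("c"));
    System.out.println(countLinks(web, "a", 10));
  }
}
```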

Part 3: Logging

Easy. I just use System.out.println.... I can't? Then I notice the line in the assignment:

You can implement logging using System.out.println, but that's lame.

Well, there goes that idea. However, the HackystatLogger class seemed to fit the description of what I needed for this task very well, so I just used that class (and attributed it in the Javadoc). Then I built a WebSpiderLog on top of that, which becomes enabled if logging is enabled at the command line.
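
For anyone curious what "enabled from the command line" can look like, here is a small sketch. It uses the standard java.util.logging package rather than HackystatLogger (I am not reproducing that class's API here), and the class name SpiderLog and the logger name are made up. The only thing taken from the assignment is that the fourth parameter is "-logging".

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Illustration only: a thin wrapper that is silent unless logging was requested. */
final class SpiderLog {
  // Hypothetical logger name, chosen for this sketch.
  private final Logger logger = Logger.getLogger("webspider.sketch");

  SpiderLog(boolean enabled) {
    // Level.OFF drops every message, so callers never need their own if-checks.
    logger.setLevel(enabled ? Level.INFO : Level.OFF);
  }

  void info(String message) {
    logger.info(message);
  }

  boolean isEnabled() {
    return logger.isLoggable(Level.INFO);
  }

  /** True when the fourth command line argument is exactly "-logging". */
  static boolean loggingRequested(String[] args) {
    return args.length >= 4 && "-logging".equals(args[3]);
  }

  public static void main(String[] args) {
    SpiderLog log = new SpiderLog(loggingRequested(args));
    log.info("Retrieving the start page");
  }
}
```

The nice part of this arrangement is that the spider code just calls info() everywhere and the switch lives in one place.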

Part 4: Extra Credit

It's supposed to be a separate entry, but I didn't attempt it.

Conclusions

What an annoying assignment. I initially thought it would be a simple task, and then found all these wonderful humps that slowed down progress. In the end... it's a good thing I started moderately early on it (around Wednesday), because otherwise I'd be burning the midnight oil until early hours tonight.

Sunday, September 16, 2007

10. Stack

Summary: I think this was our first 'serious' project in this class. Luckily a lot of the difficulties were overcome since we had everything pre-packaged for us in a handy ZIP file.

Which of the five tasks were you able to complete successfully? All.


What problems did you run into, and how did you resolve them? What is your impression of Ant?

I'll cover this for each part.

Task 1: Installation - A recurring theme of these open source tools is that they don't have the usual setup procedures that I'm used to. I'm starting to see that as part of the reason why these tools aren't mainstream, at least not yet.

Aside from that, the process was generally straightforward - download the package, unzip to a directory, update the environment variable, and test if it's installed correctly with the build.xml in the stack package. The only other difficulty involved the PMD version available, since the tools page stated 3.7, whilst the PMD sourceforge page only had 4.0 and 3.9 available to download.

Task 2: Creating the project - Straightforward.

Task 3: Fixing verify.build.xml errors - Nothing too difficult. It did scare me for a while about how when I would save files it would sometimes state that certain classes couldn't be found - I initially thought this was because I renamed an existing project and it would cause me to change class names to be the same as the project. Luckily that isn't what needed to be done - I just had to remake the project.

Also, I was surprised by some of the things pointed out by the tools - I would never think of using the cached instances rather than creating a new Integer object for values from -128 to 127.
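
Here is a tiny sketch of what the tool is getting at. Integer.valueOf is documented to always cache values from -128 to 127, so two calls with the same small value hand back the same object, whereas new Integer(...) always allocates (and that constructor has since been deprecated). The class and method names are just for illustration.

```java
/** Illustration only: shows the cached-instance behaviour of Integer.valueOf. */
final class IntegerCacheDemo {

  /** True when two valueOf calls for the same value return the very same object. */
  static boolean sameCachedInstance(int value) {
    Integer first = Integer.valueOf(value);
    Integer second = Integer.valueOf(value);
    // Reference comparison on purpose: this checks identity, not numeric equality.
    return first == second;
  }

  public static void main(String[] args) {
    System.out.println(sameCachedInstance(127));
    System.out.println(sameCachedInstance(-128));
  }
}
```

Outside of that range the result depends on the JVM, which is exactly why comparing Integer objects with == is a bad habit in real code.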

Task 4: Writing an Ant script for JavaNCSS - This is the portion I had the most difficulty with. I think this is largely due to me not knowing Java too well - and how the tools (jar, javac, ant, etc) look for their files. The Ant script documentation was helpful; however, the following errors came up at one point when 'testing' the script:

- Main class javancss.JavancssAntTask could not be found

fixed by adding javancss.jar to classpath; then a similar error came up for ccl.jar. Eventually I just put all the JAR files in the classpath separating them with semicolons; Brian did suggest a better way using wildcard modifiers.

- Parse error

... mainly due to not using the correct syntax for the source input files.

3 days later and 4 headaches later, it works.

Task 5: Code Coverage - Was easy compared to task 4, although I didn't notice some tasks involved ClearStack rather than Stack, and so I ended up writing two new test cases.


Are there standards that we are using that you don't understand the motivation for?

The only standard I find questionable is the one involving the asterisk in package declarations - if we are using the entire package, why is it wrong to do so?


What is the difference between SCLC and JavaNCSS, and which counting tool do you prefer?

SCLC looks more general (e.g. it supports multiple types of files), while JavaNCSS is specifically geared towards Java. I think this is a question of which you'd prefer between a specialized knife and a Swiss army knife. I personally would take the specialized knife in this case; however, for my own everyday use (i.e. outside of Java-specific programming) I would probably prefer SCLC, since I rarely do Java development outside of this class.


My code: http://www2.hawaii.edu/~wongandr/stack-5.0.916.zip

Saturday, September 8, 2007

CodeRuler Tournament ^^

This isn't a required entry for the class, but I thought it'd be fun to run a CodeRuler tournament including my CodeRuler, especially since my submission didn't show up in class.

My basic strategy didn't change too much - instantly charge towards the nearest castle, and pray you don't get wiped out. Peasants use a 1-square lookahead to determine the best square to move to, prioritizing enemy land first, then unclaimed. It works great against the bots, but aside from that... that's what I'm about to find out ^^
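
The peasant rule is simple enough to sketch. This is not my actual ruler and it does not use the CodeRuler API: the board here is a hypothetical int grid of owner ids, and the names PeasantLookahead, priority, and bestMove are made up. The only part taken from my strategy is the ordering - enemy land first, then unclaimed, then my own.

```java
/** Illustration only: a 1-square lookahead over a hypothetical grid of land owners. */
final class PeasantLookahead {
  static final int UNCLAIMED = 0; // assumption: 0 marks land nobody owns
  static final int MINE = 1;      // assumption: any other value is an enemy's id

  /** Higher is better: enemy land 2, unclaimed land 1, my own land 0. */
  static int priority(int owner) {
    if (owner == MINE) {
      return 0;
    }
    return owner == UNCLAIMED ? 1 : 2;
  }

  /** Returns {row, col} of the best adjacent square; ties go to the first one found. */
  static int[] bestMove(int[][] owners, int row, int col) {
    int[] best = {row, col};
    int bestScore = -1;
    for (int dr = -1; dr <= 1; dr++) {
      for (int dc = -1; dc <= 1; dc++) {
        int r = row + dr;
        int c = col + dc;
        boolean stay = dr == 0 && dc == 0;
        boolean inside = r >= 0 && r < owners.length && c >= 0 && c < owners[r].length;
        if (stay || !inside) {
          continue;
        }
        int score = priority(owners[r][c]);
        if (score > bestScore) {
          bestScore = score;
          best = new int[] {r, c};
        }
      }
    }
    return best;
  }

  public static void main(String[] args) {
    int[][] owners = {{1, 1, 0}, {1, 1, 0}, {1, 2, 0}};
    int[] move = bestMove(owners, 1, 1);
    System.out.println(move[0] + "," + move[1]);
  }
}
```

Looking only one square ahead keeps each turn cheap, which matters when the ruler can time out.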

So I decided to run three tournaments:

1) Run 1 vs. 1 against each other person's bot (7 total: Ben, Shaoxuan, Kevin, Randy, Paul, Brian, and Laura/Lisa who I'll refer to as LL for the rest of this entry). Rank based on ratio of final scores (my score divided by opponent's score)

2) Using the seedings from tournament 1, run a 4-way free-for-all. Winner is removed, next person is put in.

3) Same as #2, but 6-way.


Results, with my score always listed first:

Tournament 1: (with commentary as I recall)

LL: 36-817. This ruler did the best against mine, interestingly. I'm not sure what it is - although it might be because it always has at least one castle producing knights, and my strategy is probably weak to that.

Ben: 42-913. I pondered doing as many lookaheads in my own code, but didn't want my ruler to time out.

Shaoxuan: 59-902. IMO this strategy was the most similar to mine, although I think mine wasn't as well-refined.

Kevin: 452-552. This was actually a pretty close match, and I tried replaying the match to remember what it was that gave you the edge here, but I kept winning afterwards :-(

Randy: 608-296. I honestly thought this ruler would give me the most trouble since the troops stayed back on defense for the first half of the game - and it was true, but at least my castle could keep producing knights. And once the troops started sweeping, my knights would go in for the capture. (If you stalled till 80% of the battle, though, it might've been different)

Paul: 799-82. My knights met a fair amount of opposition before capturing the castle.

Brian: 806-74. Same idea as Kevin's match, except now every time I try to figure out how I won, I keep losing ^^
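
For the curious, the ranking rule (my score divided by the opponent's score) can be worked out from the scores above. The sketch below sorts opponents from lowest ratio (toughest for me) to highest. Slotting myself in at a ratio of 1.0 is an assumption I'm making for the sketch, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: seeds rulers by the ratio of my score to the opponent's score. */
final class SeedingSketch {

  /** Tournament 1 results as {my score, opponent's score}. */
  static Map<String, int[]> tournamentOne() {
    Map<String, int[]> scores = new LinkedHashMap<>();
    scores.put("LL", new int[] {36, 817});
    scores.put("Ben", new int[] {42, 913});
    scores.put("Shaoxuan", new int[] {59, 902});
    scores.put("Kevin", new int[] {452, 552});
    scores.put("Randy", new int[] {608, 296});
    scores.put("Paul", new int[] {799, 82});
    scores.put("Brian", new int[] {806, 74});
    return scores;
  }

  /** Returns names ordered from lowest ratio to highest, with "me" placed at 1.0. */
  static List<String> seedOrder(Map<String, int[]> scores, String me) {
    Map<String, Double> ratios = new LinkedHashMap<>();
    for (Map.Entry<String, int[]> entry : scores.entrySet()) {
      int[] s = entry.getValue();
      ratios.put(entry.getKey(), (double) s[0] / s[1]);
    }
    ratios.put(me, 1.0); // assumption: I rank as if I had tied myself
    List<String> order = new ArrayList<>(ratios.keySet());
    order.sort(Comparator.comparingDouble((String name) -> ratios.get(name)));
    return order;
  }

  public static void main(String[] args) {
    System.out.println(seedOrder(tournamentOne(), "Andrew"));
  }
}
```

For example, LL works out to 36/817 (about 0.04) and Brian to 806/74 (about 10.9), which is why they end up at opposite ends of the seeding.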


Tournament 2: (4-way free-for-alls)

Seeded order from 1: LL, Ben, Shaoxuan, Kevin, Andrew, Randy, Paul, Brian
Results: Shaoxuan, LL, Andrew (!), Ben, Brian, Kevin, Paul, Randy

Tournament 3: (6 way free-for-alls)

Result: Shaoxuan, LL, Andrew (!!), Ben, Brian, Paul, Kevin, Randy

I honestly don't know why my ruler does better in free-for-all matches than in 1 vs 1, but that's really... interesting.


I'll finish up with my overall impression of each ruler:

Shaoxuan: ... wow. Your ruler is just... yea. I can't describe it with words.
LL: I was amazed how it managed to wipe me out so quickly. (I think it was the knights)
Ben: Your ruler is a good example of move look-ahead. I should've done more of that ^^
Brian: Basically it's a smarter version of split-up ruler. And... it's actually pretty smart ^^
Kevin: I liked the idea of having various 'tactics' (as assigned by battalion) assigned to groups of knights. It's like those real-time strategy games that I could never seem to get good at.
Paul: Your strategy is nice and simple.
Randy: Well... So maybe it doesn't win matches, but I think this is the most aesthetically pleasing ruler of them all. The 'sweep' is probably one of the funniest things to watch in this game, and winning isn't necessarily everything...

Wednesday, September 5, 2007

08. CodeRulerRedux

URL: http://www2.hawaii.edu/~wongandr/coderulerredux-wongandr.zip

Revisions made: Kevin English's code review was applied. The most notable of these changes is changing Vector to ArrayList - amazingly, this change was nearly seamless, aside from removing a few typecastings (which is good).

Strategy changes: Nothing significant. The criteria for deciding to choose peasants or knights changed slightly - now knights will always be produced until at least 20 are in play under my control. This is because it is generally more useful to own all the castles rather than gain more land, plus peasants in the beginning have a fairly good survival rate (unless being hunted by multiple mobs of knights).

It was considered that the criteria for which castle to target would be changed - this was tossed out for the following reason: Suppose there is castle A and B, and B has more knights guarding it. Then A should be targeted. But suppose that right when we're about to capture A, a knight comes back to A and tons of knights flee from B. Then B is the better target, and all the knights will flock to B instead.

Thus, it still targets the nearest castle.
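
As a rough sketch of "target the nearest castle" (again, not my actual ruler and not the CodeRuler API): castles are hypothetical {x, y} pairs, and I am assuming distance is measured in moves where a unit can step one square in any of eight directions. The names NearestCastle, distance, and nearest are made up.

```java
/** Illustration only: picks the closest castle from a list of {x, y} positions. */
final class NearestCastle {

  /** Moves needed when one step can go in any of eight directions (an assumption). */
  static int distance(int x1, int y1, int x2, int y2) {
    return Math.max(Math.abs(x1 - x2), Math.abs(y1 - y2));
  }

  /** Returns the index of the closest castle, or -1 when there are no castles. */
  static int nearest(int[][] castles, int x, int y) {
    int best = -1;
    int bestDistance = Integer.MAX_VALUE;
    for (int i = 0; i < castles.length; i++) {
      int d = distance(x, y, castles[i][0], castles[i][1]);
      if (d < bestDistance) {
        bestDistance = d;
        best = i;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    int[][] castles = {{10, 10}, {3, 4}, {20, 1}};
    System.out.println(nearest(castles, 0, 0));
  }
}
```

Because the choice depends only on position, it doesn't flip back and forth the way a "fewest defenders" rule would when knights move between castles.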


Other notes: I really like how Eclipse allows you to do variable renaming (i.e. refactoring) with minimal pain - no other IDE I have used has such a useful feature (that I could find, that is).