Our Crawl4neo project is moving along. We have our first tests, and they demonstrate Mockito and PowerMock. They are not complete tutorials on either library, but they do show cleanly how to wire things up.
We have also created a project on GitHub where you can follow the code as it evolves. Each feature branch corresponds roughly to one part of the tutorial, so if you want the project as it stood at a given point, you can grab that branch.
What’s Next?
In the next installment we will take real advantage of Java 8’s lambda expressions. We’ll use Predicates to make the crawler configurable. That will get our project to a solid working base for crawling.
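As a rough preview of the idea, here is a minimal sketch of how Predicates could make a crawler configurable. It is not the project's actual code: the class name `CrawlFilterSketch`, the rules `SAME_HOST` and `IS_IMAGE`, the `selectUrls` method, and the example.com URLs are all hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these names come from Crawl4neo itself.
public class CrawlFilterSketch {

    // Assumed rule: only follow links that stay on the (made-up) site being crawled.
    static final Predicate<String> SAME_HOST =
            url -> url.startsWith("http://example.com/");

    // Assumed rule: treat a couple of common image extensions as images.
    static final Predicate<String> IS_IMAGE =
            url -> url.endsWith(".png") || url.endsWith(".jpg");

    // The crawler only needs one Predicate; callers decide what it means.
    static List<String> selectUrls(List<String> candidates, Predicate<String> shouldVisit) {
        return candidates.stream()
                .filter(shouldVisit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> found = Arrays.asList(
                "http://example.com/index.html",
                "http://example.com/logo.png",
                "http://other.org/page.html");

        // Compose small rules into one configuration: same host, but no images.
        Predicate<String> shouldVisit = SAME_HOST.and(IS_IMAGE.negate());

        System.out.println(selectUrls(found, shouldVisit));
    }
}
```

The appeal of this shape is that rules stay small and can be combined with `and`, `or`, and `negate`, so changing what gets crawled means passing in a different Predicate rather than editing the crawler.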
Then it will be time to introduce Neo4j. Our goal here is to come up with a graph model (vs. a relational model) that represents an entire website. We’ll go so far as to put every single page into the database and fully ‘normalize’ it in a way that uses very little storage (you’ll have to wait and see that magic).
Then we can move on to analysis and presentation of the data. Stay tuned!