Crawl4neo gets tests and GitHub


Our Crawl4neo project is moving along. We have our first tests, and they show Mockito and PowerMock in action. They are not complete tutorials on either library, but they give a clean demonstration of how to wire things up.

We have also created a project on GitHub where you can see the evolution of the project.  Each feature branch is roughly one part in the tutorial.  So if you want to download the project up to a point, you can grab a branch.

What’s Next?

In the next installment we will take real advantage of Java 8’s lambda expressions.  We’ll use Predicates to make the crawler configurable.  That will get our project to a solid working base for crawling.
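
To give a feel for the idea ahead of the next installment, here is a minimal sketch of predicate-based configuration. It is not the Crawl4neo code: the class name `CrawlFilterSketch` and the helpers `sameHost`, `hasExtension`, and `selectLinks` are hypothetical names made up for this illustration, and the URL checks are deliberately simplistic string tests.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these names come from the Crawl4neo project.
class CrawlFilterSketch {

    // Rule: only follow links on the given host (simplified prefix check).
    static Predicate<String> sameHost(String host) {
        return url -> url.startsWith("http://" + host + "/")
                || url.startsWith("https://" + host + "/");
    }

    // Rule: match links that end with the given extension, ignoring case.
    static Predicate<String> hasExtension(String extension) {
        return url -> url.toLowerCase().endsWith(extension.toLowerCase());
    }

    // The "crawler" side: it knows nothing about the rules themselves,
    // it only asks the supplied predicate whether a link should be visited.
    static List<String> selectLinks(List<String> links, Predicate<String> shouldVisit) {
        return links.stream()
                .filter(shouldVisit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Compose small rules into one policy: stay on example.com, skip PDFs.
        Predicate<String> shouldVisit =
                sameHost("example.com").and(hasExtension(".pdf").negate());

        List<String> links = Arrays.asList(
                "https://example.com/about",
                "https://example.com/report.PDF",
                "https://other.org/index.html");

        System.out.println(selectLinks(links, shouldVisit));
    }
}
```

The appeal of this style is that each rule stays small and testable on its own, while `and`, `or`, and `negate` on `java.util.function.Predicate` let us combine rules without touching the crawling code.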

Then it will be time to introduce Neo4j. Our goal here is to come up with a graph model (vs. a relational model) that represents an entire website. We’ll go so far as to put every single page into the database and fully ‘normalize’ it in a way that uses very little storage (you’ll have to wait to see that magic).

Then we can move on to analysis and presentation of the data.  Stay tuned!



About the Author


Steve works with successful software startups and tech companies throughout Silicon Valley. Most recently he has been developing content migration tools for large websites. He has a deep passion for all things software engineering, from design concepts, to team management, to final delivery.
