Debian packages as Graph Database

I have been playing around with Neo4j as graph database, and searching for a big dataset I decided to look at Debian packages (source and binary) from stable, testing, sid, and experimental, and represent all of that in a big graph database.

While this is far from ready, the following entities and relations a represented:

  • source packages, unversioned and versioned
  • binary packages, unversioned and versioned
  • maintainers
  • all dependencies, including alternatives and versioned dependencies
  • relations like maintains, builds, etc
  • suites (stable, testing, sid, experimental)

The graph currently has 220618 nodes and 782323 edges, and my first trial to import this into the database was by generating a long cypher statement, and then throwing that at cypher-shell. Well, that was not the best idea. After 24h I stopped the process and rewrote the generation script to generate csv files. Using neo4j-import the same amount of data was imported in 5secs (!!!).

What I would like to get in the future is the whole package history as well, and maybe also include all the bugs into the database … if I only would have easily accessible and parseable information about these items (Debian Q&A maybe?). If you have any suggestions, please let me know.

More to come, stay tuned.

Email this to someonePrint this pageShare on FacebookShare on Google+Tweet about this on TwitterShare on LinkedInFlattr the author

3 thoughts on “Debian packages as Graph Database”

Leave a comment

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>