Monday, August 1, 2011

Rethinking the Database

After a meeting with our local humanities computing director, I decided that it would be worthwhile to begin recording individual submission data for the poems in my database. Rather than simply counting how many times a poet appears in a particular journal, the idea is to track every single submission as a separate and distinct edge (or link). Such an approach should also make it easier to create dynamic network visualizations that represent change over time.
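To make the idea concrete, here is a minimal Python sketch of one-edge-per-submission records and of pulling out the edges for a single time slice. The record layout, the field names, the `edges_in_window` helper, and all of the sample poets, journals, titles, and dates are hypothetical placeholders, not the actual database.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Submission:
    """One poem in one journal, treated as a single poet-to-journal edge."""
    poet: str
    journal: str
    title: str
    published: date


# Hypothetical placeholder records; not real data from the database.
submissions = [
    Submission("Poet A", "Journal X", "Poem 1", date(1912, 3, 1)),
    Submission("Poet A", "Journal X", "Poem 2", date(1913, 6, 1)),
    Submission("Poet A", "Journal Y", "Poem 3", date(1913, 9, 1)),
    Submission("Poet B", "Journal X", "Poem 4", date(1914, 1, 1)),
]


def edges_in_window(records, start, end):
    """Return the submissions dated within [start, end], one edge each."""
    return [r for r in records if start <= r.published <= end]


# One time slice for a dynamic visualization: everything dated in 1913.
window_1913 = edges_in_window(submissions, date(1913, 1, 1), date(1913, 12, 31))
for edge in window_1913:
    print(edge.poet, "->", edge.journal, "|", edge.title, edge.published)
```

Because every submission keeps its own date, a sequence of such windows could serve as the frames of a change-over-time visualization.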

With this in mind, today I began working with a very small dataset to test how the workflow might look when going from the kind of database described above (with each poem constituting a link) to any of the social network analysis (SNA) and visualization tools. I had little trouble loading the edgelist into UCINET or ORA, and both programs automatically translated the multiple links between a poet and a journal into a link weight. The problem, however, is that both programs did this by reducing the connection to a single line, which rendered the data for each poem (i.e., title, date) irrelevant. I can imagine adding this information as an edge attribute in a program like Cytoscape, but given the editing limitations, this could mean a lot of labor-intensive input. For while I can easily bring up a list of edges and add an attribute, I can't seem to sort the edge list the way I can sort my database. And I'm still stuck with the problem of having all submissions to one journal condensed into a single edge. Is there a way to prevent this from happening?
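The difference between the two representations can be sketched in a few lines of Python. This only illustrates the general idea of collapsing parallel edges into a weight; it is not how UCINET or ORA work internally, and the rows below are hypothetical placeholders.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (poet, journal, title, year). Not real data.
rows = [
    ("Poet A", "Journal X", "Poem 1", 1912),
    ("Poet A", "Journal X", "Poem 2", 1913),
    ("Poet B", "Journal X", "Poem 3", 1914),
]

# Collapsed view: one weighted edge per poet-journal pair.
# The titles and years are discarded along the way.
weights = Counter((poet, journal) for poet, journal, _, _ in rows)

# Preserved view: every submission stays a separate (parallel) edge
# and keeps its own attributes.
parallel = defaultdict(list)
for poet, journal, title, year in rows:
    parallel[(poet, journal)].append({"title": title, "year": year})

print(weights[("Poet A", "Journal X")])   # a bare count
print(parallel[("Poet A", "Journal X")])  # the individual poems behind it
```

The weight can always be recomputed from the parallel edges, but the poems cannot be recovered from the weight, which is why the uncollapsed edgelist seems the safer thing to store.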

Update (8/26): Soon after writing this post I discovered that Cytoscape can import the edgelist from a .CSV file and that, rather than condensing the edges into a single weighted line, it preserves each one as a distinct link. This is extremely useful for me, though the graph itself (at least for the total time span) is obviously quite messy.
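For anyone trying the same route, here is a rough Python sketch of writing such a file with one row per submission. The column names (`source`, `target`, `title`, `date`), the `to_edge_csv` helper, and the sample rows are my own assumptions for illustration; the columns still have to be mapped to source node, target node, and edge attributes by hand during the import.

```python
import csv
import io

# Hypothetical rows: (poet, journal, title, date). Not real data.
rows = [
    ("Poet A", "Journal X", "Poem 1", "1912-03-01"),
    ("Poet A", "Journal X", "Poem 2", "1913-06-01"),
    ("Poet B", "Journal X", "Poem 3", "1914-01-01"),
]


def to_edge_csv(records):
    """Serialize one CSV row per submission, so no edges are merged."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Assumed header names; choose whatever suits the import dialog.
    writer.writerow(["source", "target", "title", "date"])
    writer.writerows(records)
    return buffer.getvalue()


edge_csv = to_edge_csv(rows)
print(edge_csv)

# To save it for import, write the same text to a file:
# with open("edges.csv", "w", newline="") as handle:
#     handle.write(edge_csv)
```

Keeping the date as its own column also leaves open the option of filtering the messy full graph down to shorter time spans.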
