Friday, June 17, 2011

Working with Sci2 and Cytoscape

Figure: Weighted bipartite network. Line thickness indicates the relative number of contributions.
This peculiar, fish-like graph is the result of another morning's efforts to get my initial data into a meaningful and interpretable form. Using the Sci2 Tool (developed at Indiana University for use in analyzing scientific data and co-author networks in scientific literature), I loaded a .CSV file containing my bipartite network data and edge attributes (i.e., weight = # of contributions). I then extracted a bipartite network from this file and used the Cytoscape visualization tool to begin looking at the results.
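For readers who want to check the input before it goes into Sci2, here is a minimal sketch of reading such a file with plain Python. The column names (`poet`, `journal`, `weight`) and the sample rows are assumptions made for illustration. They are not the layout of my actual .CSV file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real file's columns and contents may differ.
SAMPLE_CSV = """poet,journal,weight
Poet A,Journal X,3
Poet A,Journal Y,1
Poet B,Journal X,5
"""

def load_bipartite(handle):
    """Read poet,journal,weight rows into two node sets and a weighted edge dict."""
    poets, journals = set(), set()
    edges = defaultdict(int)
    for row in csv.DictReader(handle):
        poet = row["poet"].strip()
        journal = row["journal"].strip()
        poets.add(poet)
        journals.add(journal)
        # Repeated poet/journal rows are summed into a single edge weight.
        edges[(poet, journal)] += int(row["weight"])
    return poets, journals, dict(edges)

poets, journals, edges = load_bipartite(io.StringIO(SAMPLE_CSV))
print(sorted(poets), sorted(journals), edges)
```

Keeping poets and journals in separate sets preserves the two-mode structure, so no edge can accidentally connect two nodes of the same kind.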

While the workflow just described might sound straightforward enough, I did run into trouble loading the edge attribute data into Sci2. Failing to find a solution, I ended up entering the data manually in the Cytoscape application, a process made somewhat easier by the fact that I could sort the edges into a list roughly similar to the list contained in my original .CSV file.

Once the data was entered, I could begin the fun part of manipulating the visualization into something that was readable. So, for example, I set the poet and journal nodes to be different shapes and colors. The yellow triangles on the outer rim represent the journals and the green squares just to the right of center represent the 25 poets who make up my initial database. I then set the thickness of the lines to reflect their weight, with the thinnest lines representing a value of 1 and the thickest representing a value of 48. I also split the journals so as to see more readily which of them had the most contributors in common. Besides 「学校」, of course, these journals included 「詩神」, 「太平洋詩人」, 「暦程」, and 「弾道」.
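The weight-to-thickness mapping can be pictured as a simple linear interpolation. The sketch below assumes a linear scale and a hypothetical width range of 1 to 12 pixels. Only the weight range of 1 to 48 comes from my data, and this is not necessarily how Cytoscape computes the mapping internally.

```python
def edge_width(weight, w_min=1, w_max=48, px_min=1.0, px_max=12.0):
    """Linearly map a contribution count onto a line width in pixels.

    w_min and w_max are the smallest and largest weights in the data;
    px_min and px_max are illustrative display widths, not Cytoscape defaults.
    """
    if not w_min <= weight <= w_max:
        raise ValueError(f"weight {weight} is outside {w_min}..{w_max}")
    fraction = (weight - w_min) / (w_max - w_min)
    return px_min + fraction * (px_max - px_min)

for w in (1, 10, 48):
    print(w, round(edge_width(w), 2))
```

With a range as skewed as 1 to 48, a logarithmic scale might keep the many thin lines easier to tell apart, but that is a design choice I have not tested.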

As I did these manipulations, a couple of questions and ideas arose as to what we might be able to learn and display just at the level of visualization. Would it be interesting, for example, to show just the journals that had contributions from a majority of the poets involved? Might this reveal some of the underlying groupings in the poetic field once we compared it with poets involved in another modernist journal from the period? Could we add a temporal dimension by displaying only those journals that were in publication during some smaller unit of time (e.g., one year)? What would the graph look like if we set 「学校」 at the center, set the poets at the next level up, and then had the rest of the journals form an outer ring or a line at the top? Might this give a better sense of how weakly or strongly connected these poets were in relation to other journals? These are some of the things I would like to continue to work on next week. I think I will also return to the UCINET program to see if I can extract the poet-to-poet and journal-to-journal graphs that will hopefully prove most conducive to meaningful network analysis (i.e., measuring betweenness centrality, etc.).
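Two of these ideas, the majority filter and the poet-to-poet graph, can be sketched in a few lines of Python. The edge data below is invented for illustration. The projection counts the journals two poets share, which is only one of several possible weightings and not necessarily the one UCINET applies.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weighted edges: (poet, journal) -> number of contributions.
edges = {
    ("Poet A", "Journal X"): 3,
    ("Poet B", "Journal X"): 5,
    ("Poet C", "Journal X"): 1,
    ("Poet A", "Journal Y"): 2,
    ("Poet B", "Journal Y"): 1,
    ("Poet C", "Journal Z"): 4,
}

def contributors_by_journal(edges):
    """Group poets under each journal they contributed to."""
    members = defaultdict(set)
    for poet, journal in edges:
        members[journal].add(poet)
    return members

def majority_journals(edges):
    """Journals with contributions from more than half of all poets."""
    poets = {poet for poet, _ in edges}
    members = contributors_by_journal(edges)
    return sorted(j for j, ps in members.items() if len(ps) > len(poets) / 2)

def project_poets(edges):
    """One-mode poet-to-poet graph; weight = number of journals in common."""
    shared = defaultdict(int)
    for ps in contributors_by_journal(edges).values():
        for a, b in combinations(sorted(ps), 2):
            shared[(a, b)] += 1
    return dict(shared)

print(majority_journals(edges))
print(project_poets(edges))
```

Swapping the roles of poet and journal in `project_poets` would give the journal-to-journal graph in the same way.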

My apologies for the technical and overly detailed nature of these posts. They do not excite in the way that meta-level analysis hopefully will, but it's only by slogging through the mud that we can reach the other shore. 

Thursday, June 16, 2011

Back to Work

After a long hiatus, I've finally been able to find time to get back into my SNA work. I completed my preliminary data entry last month, which involved inputting information for all poets affiliated with the poetry journal "Gakko" and the number of contributions each made to modernist poetry journals between 1920 and 1944. Today, my primary goal was to input the data into some SNA tools and see if I could produce a two-mode graph in which links are weighted according to the number of contributions made to a particular journal. I was able to do this quite easily in ORA, which then allowed me to produce a predictably messy graph. But what I was really keen to try was loading the weighted data into UCINET so as to pull apart the affiliation network and see how the poets are connected to one another through the journals. This proved more difficult than I had thought, as ORA couldn't output to UCINET format. Without this step, I will likely have to input the weights manually in UCINET and go from there.
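One possible way around the manual entry would be to write the weighted edge list out as a text file that UCINET can import. The sketch below assumes UCINET's DL text format with the two-mode `edgelist2` layout, as I understand it. The header keywords should be checked against the UCINET documentation before relying on them, and the data and function name are hypothetical.

```python
# Hypothetical weighted edges: (poet, journal) -> number of contributions.
edges = {
    ("Poet A", "Journal X"): 3,
    ("Poet A", "Journal Y"): 1,
    ("Poet B", "Journal X"): 5,
}

def to_dl_edgelist2(edges):
    """Render a two-mode weighted edge list as DL-style text.

    Assumption: the header follows UCINET's 'edgelist2' convention
    (nr = row nodes, nc = column nodes, labels embedded in the data).
    """
    def label(name):
        # Avoid spaces inside labels so each line splits into three fields.
        return name.replace(" ", "_")

    poets = sorted({p for p, _ in edges})
    journals = sorted({j for _, j in edges})
    lines = [
        f"dl nr={len(poets)}, nc={len(journals)}, format=edgelist2",
        "labels embedded",
        "data:",
    ]
    for (poet, journal), weight in sorted(edges.items()):
        lines.append(f"{label(poet)} {label(journal)} {weight}")
    return "\n".join(lines) + "\n"

print(to_dl_edgelist2(edges))
```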

Beyond such technical details, the broader problem that I need to begin addressing is what I'm going to be able to do at the analytical stage that will reveal anything interesting. Part of the problem is learning what sort of analytical algorithms are possible and sensible given the structure of the underlying data. The other part of the problem is coming up with the right questions to ask about the data. Is it centrality measures that I am most interested in? Am I looking for clusters within the network? Do I want to create a kind of network DNA fingerprint for each poet to see how the participation of each does or does not align with others? I guess what I need to really think about are the questions that can't be answered when one has just individual data. Or even if the results do point to realities that would seem obvious to anyone familiar with the poet, the point is to look at the results in aggregate and consider what they might be able to tell us about the modern field of poetic production seen from a broader scale.
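To make the "fingerprint" idea a little more concrete, one could treat each poet's contribution counts across all journals as a vector and compare poets pairwise. The sketch below uses cosine similarity on invented data. Both the measure and the names are assumptions for illustration, not a method I have settled on.

```python
import math

# Hypothetical weighted edges: (poet, journal) -> number of contributions.
edges = {
    ("Poet A", "Journal X"): 3,
    ("Poet A", "Journal Y"): 2,
    ("Poet B", "Journal X"): 5,
    ("Poet B", "Journal Y"): 1,
    ("Poet C", "Journal X"): 1,
    ("Poet C", "Journal Z"): 4,
}

def fingerprints(edges):
    """Return the journal order and each poet's contribution vector."""
    journals = sorted({j for _, j in edges})
    poets = sorted({p for p, _ in edges})
    profiles = {p: [edges.get((p, j), 0) for j in journals] for p in poets}
    return journals, profiles

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

journals, profiles = fingerprints(edges)
print(journals)
for a, b in [("Poet A", "Poet B"), ("Poet A", "Poet C")]:
    print(a, b, round(cosine(profiles[a], profiles[b]), 3))
```

A high score means two poets spread their contributions across journals in similar proportions, regardless of how much each wrote in total.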