Multivariate Network Exploration and Presentation

In “Multivariate Network Exploration and Presentation,” authors Stef van den Elzen and Jarke J. van Wijk introduce an approach they call “Detail to Overview via Selections and Aggregations,” or DOSA. I was going to make fun of them for naming their approach after a delicious South Indian dish, but since they comment that their name “resonates with our aim to combine existing ingredients into a tasteful result,” I’ll have to just leave it there.

The DOSA approach – and now I am hungry – aims to allow a user to explore the complex interplay between network topology and node attributes. For example, in company email data, you may wish to simultaneously examine assortativity by gender and department over time. That is, you may need to consider both structure and multivariate data.

This is a non-trivial problem, and I particularly appreciated van den Elzen and van Wijk’s practical framing of why this is a problem:

“Multivariate networks are commonly visualized using node-link diagrams for structural analysis. However, node-link diagrams do not scale to large numbers of nodes and links and users regularly end up with hairball-like visualizations. The multivariate data associated with the nodes and links are encoded using visual variables like color, size, shape or small visualization glyphs. From the hairball-like visualizations no network exploration or analysis is possible and no insights are gained or even worse, false conclusions are drawn due to clutter and overdraw.”

YES. From my own experience, I can attest that this is a problem.

So what do we do about it?

The authors suggest a multi-pronged approach that allows non-expert users to select nodes and edges of interest, to see a detail view and an infographic-like overview simultaneously, and to examine the aggregated attributes of a selection.

Overall, this approach looks really cool and very helpful. (The paper did win the “best paper” award at the IEEE Information Visualization 2014 Conference, so perhaps that shouldn’t be too surprising.) I was a little disappointed that I couldn’t find the GUI implementation of this approach online, though, which makes it a little hard to judge how useful the tool really is.

From their screenshots and online video, however, I find that while this is a really valiant effort to tackle a difficult problem, there is still more work to do in this area. The challenge with visualizing complex networks is indeed that they are complex, and while DOSA gives a user some control over how to filter and interact with this complexity, there is still a whole lot going on.

While I appreciate the inclusion of examples and use cases, I would have also liked to see a user study evaluating how well their tool met their goal of providing a navigation and exploration tool for non-experts. I also think that the issues of scalability with respect to attributes and selection that they raise in the limitations section are important topics which, while reasonably beyond the scope of this paper, ought to be tackled in future work.

Facts, Power, and the Bias of AI

I spent last Friday and Saturday at the 7th Annual Text as Data conference, which draws together scholars from many different universities and disciplines to discuss developments in text as data research. This year’s conference, hosted by Northeastern, featured a number of great papers and discussions.

I was particularly struck by a comment from Joanna J. Bryson as she presented her work with Aylin Caliskan-Islam and Arvind Narayanan on A Story of Discrimination and Unfairness: Using the Implicit Bias Task to Assess Cultural Bias Embedded in Language Models:

There is no neutral knowledge.

This argument becomes especially salient in the context of artificial intelligence: we tend to think of algorithms as neutral, fact-based processes which are free from the biases we experience as humans. But such a simplification is deeply faulty. As Bryson argued, AI won’t be neutral if it’s based on human culture; there is no neutral knowledge.

This argument resonates quite deeply with me, and I find it particularly interesting through the lens of an increasingly relativistic world, in which facts are more and more treated as matters of opinion.

To complicate matters, there is no clear normative judgment that can be applied to such relativism: on the one hand this allows for embracing diverse perspectives, which is necessary for a flourishing, pluralistic world. On the other hand, nearly a quarter of high school government teachers in the U.S. report that parents or others would object if they discussed politics in a government classroom.

Discussing “current events” in a neutral manner is becoming increasingly challenging if not impossible.

This comment also reminds me of the work of urban planner Bent Flyvbjerg who turns an old axiom on its head to argue that “power is knowledge.” Flyvbjerg’s concern doesn’t require a complete collapse into relativism, but rather argues that “power procures the knowledge which supports its purposes, while it ignores or suppresses that knowledge which does not serve it.” Power, thus, selects what defines knowledge and ultimately shapes our understanding of reality.

In his work with rural coal miners, John Gaventa further showed how such power dynamics can become deeply entrenched, so the “powerless” don’t even realize the extent to which their reality is dictated by those with power.

It is these elements which make Bryson’s comments so critical; it is not just that there is no neutral knowledge, but that “knowledge” is fundamentally controlled and defined by those in power. Thus it is imperative that any algorithm take these biases into account – because they are not just the biases of culture, but rather the biases of power.

Reflections from the Trenches and the Stacks

In my Network Visualization class, we’ve been talking a lot about methodologies for design research studies. On that topic, I recently read an interesting article by Michael Sedlmair, Miriah Meyer, and Tamara Munzner: Design Study Methodology: Reflections from the Trenches and the Stacks. After conducting a literature review to determine best practices, the authors realized that there were no best practices – at least none organized in a coherent, practical-to-follow way.

Thus, the authors aim to develop “holistic methodological approaches for conducting design studies,” drawn from their combined experiences as researchers as well as from their review of the literature in this field. They define the scope of their work very clearly: they aim to develop a practical guide to determine methodological approaches in “problem-driven research,” that is, research where “the goal is to work with real users to solve their real-world problems.”

Their first step in doing so is to define a 2-dimensional space in which any proposed research task can be placed. One axis looks at task clarity (from fuzzy to crisp) and the other looks at information location (from head to computer). These strike me as helpful axes for positioning a study and for thinking about what kinds of methodologies are appropriate. If your task is very fuzzy, for example, you may want to start with a study that clarifies the specific tasks which need to be examined. If your task is very crisp, and can be articulated computationally…perhaps you don’t need a visualization study but can rather do everything algorithmically.

From my own experience of user studies in a marketing context, I found these axes a very helpful framework for thinking about specific needs and outcomes – and therefore appropriate methodologies – of a research study.

The authors then go into their nine-stage framework for practical guidance in conducting design studies and their 32 identified pitfalls which can occur throughout the framework.

The paper can be distilled into five steps a researcher should go through in designing, implementing, and sharing a study. These five stages should feed into each other and are not necessarily neatly chronological:

  1. Before designing a study, think carefully about what you hope to accomplish and what approach you need. (The clarity/information-location axes described above are one tool for doing this.)
  2. Think about what data you have and who needs to be part of the conversation.
  3. Design and implement the study.
  4. Reflect on and share your results.
  5. Throughout the process, be sure to think carefully about goals, timelines, and roles.

Their paper, of course, goes into much greater detail about each of these five steps. But overall, I find this a helpful heuristic in thinking about the steps one should go through.

Text As Data Conference

At the end of this week, Northeastern will host the seventh annual research conference on “New Directions in Analyzing Text as Data.”

I’m very excited for this conference which brings together scholars from many different universities and disciplines to discuss developments in text as data research.  This year’s conference is cohosted by David Smith and my advisor Nick Beauchamp, and I’ve been busily working on getting everything in order for it.

Here is the description from the conference website:

The main purpose of this conference is to bring together researchers from the social sciences, computer science and linguistics to investigate new approaches to utilizing text in social science research. Text has always been a valuable resource for research, and recent developments in automatic language-processing methodologies from the fields of information retrieval, natural language processing, and machine learning are creating unprecedented opportunities for searching, categorizing, and extracting social science information from text.

Previous conferences took place at Harvard University, Northwestern University, the London School of Economics, and New York University. Selection of participants and papers for the conferences is the responsibility of a team led by Nick Beauchamp (Northeastern) and David Smith (Northeastern), along with Ken Benoit (LSE), Yejin Choi (University of Washington), and Arthur Spirling (NYU).

Politics as Associated Living

When I consider ‘politics’ as a field, I’m generally referring to something much broader than simply electoral politics.

‘Electoral politics’ is a relatively narrow field, concerned with the intricacies of voting and otherwise selecting elected officials. Politics is much broader.

John Dewey argued that ‘democracy’ is not simply a form of government but rather, more broadly, a way of living. Similarly, I take ‘politics’ to mean not merely electoral details but rather the art of associated living.

The members of any society face a collective challenge: we have divergent and conflicting needs and interests, but we must find ways of living together. The ‘must’ in that imperative is perhaps a little strong: without political life to moderate our interactions we would no doubt settle into some sort of equilibrium, but I suspect that equilibrium would be deeply unjust and unpredictable.

The greatest detractors of human nature imagine a world without politics, a world without laws, to be a desolate dystopia; where people maim and murder because they can get away with it or simply because that’s what is needed to survive.

But even without such horrific visions of lawlessness, I imagine a world without thoughtful, associated living to be, at best – distasteful. It would be a society where people yell past each other, consistently put their own interests first, and deeply deride anyone with different needs or perspectives.

Unfortunately, this description of such a mad society may ring a little too true. It certainly sounds like at least one society with which I am familiar.

And this emphasizes why I find it so important to consider politics broadly as associated living. In this U.S. presidential election, I’ve heard people ask again and again: are any of the candidates worthy role models? Before the second presidential debate Sunday night, the discomfort was palpable: how did our electoral politics become so distasteful?

Those are good and important questions. But I find myself more interested in the broader questions: are we good role models in the challenging task of associated living? Do we shut down and deride our opponents or try, in some way, to understand? If understanding is impossible, do we try, at the very least, to find ways of living together?

In many ways, the poisonous tone of our national politics is not that surprising. It reflects, I believe, a general loss of political awareness, of civic life. Not that the “good old days” were ever really that good. Political life has always been a little rough-and-tumble, and goodness knows we have many, many dark spots in our past.

But we should still aspire to be better. To welcome the disagreements inherent in hearing diverse perspectives, and to try, as best we can, to engage thoughtfully in the political life that is associated living.

Node Overlap Removal by Growing a Tree

I recently read Lev Nachmanson, Arlind Nocaj, Sergey Bereg, Leishi Zhang, and Alexander Holroyd’s article on “Node Overlap Removal by Growing a Tree,” which presents a really interesting method.

Using a minimum spanning tree to deal with overlapping nodes seems like a really innovative technique. It made me wonder how the authors came up with this approach!

As outlined in the paper, the algorithm begins with a Delaunay triangulation on the node centers – more information on Delaunay triangulations here – but it’s essentially a maximal planar subdivision of the graph: e.g., you draw triangles connecting the centers of all the nodes.

From here, the algorithm finds the minimal spanning tree, where the cost of an edge is defined so that the greater the node overlap, the lower the cost. The minimal spanning tree, then, finds the maximal overlaps in the graph. The algorithm then “grows” the tree: increasing the cost of the tree by lengthening edges. Starting at the root, the lengthening propagates outwards. The algorithm repeats until no overlaps exist on the edges of the triangulation.
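
Out of curiosity, here’s a minimal sketch of that skeleton in Python – Delaunay triangulation, overlap-based edge costs, minimum spanning tree – using SciPy. The cost function and toy data are my own stand-ins, and the paper’s actual tree-growing (edge-lengthening) step is omitted:

```python
# A minimal sketch of the triangulate-then-MST skeleton, not the authors' code.
import numpy as np
from scipy.spatial import Delaunay
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def overlap_mst(centers, radii):
    """Delaunay-triangulate the node centers, then extract a minimum
    spanning tree under a cost that is lower for deeper overlaps."""
    tri = Delaunay(centers)
    edges = set()
    for simplex in tri.simplices:  # each simplex is a triangle (i, j, k)
        i, j, k = sorted(simplex)
        edges.update({(i, j), (j, k), (i, k)})
    rows, cols, costs = [], [], []
    for i, j in edges:
        dist = np.linalg.norm(centers[i] - centers[j])
        # dist / (r_i + r_j) < 1 means the circles overlap; smaller
        # ratios are cheaper, so the MST hugs the worst overlaps.
        rows.append(i)
        cols.append(j)
        costs.append(max(dist / (radii[i] + radii[j]), 1e-9))
    graph = csr_matrix((costs, (rows, cols)), shape=(len(centers),) * 2)
    return minimum_spanning_tree(graph)  # sparse matrix of tree edges

# Toy usage: three mutually overlapping unit circles plus one far away.
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.8], [4.0, 4.0]])
radii = np.array([1.0, 1.0, 1.0, 1.0])
print(overlap_mst(centers, radii))
```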

Impressively, this algorithm runs in O(|V|) time per iteration, making it fast as well as effective.

On the Rights of a Black Man

I was struck by a comment from today’s coverage of the shooting death of an unarmed black man. To be clear, this was coverage of the death of an unarmed black man – whose name has not yet been released – in San Diego; not the recent shooting of Keith Scott in Charlotte, or of Terence T. Crutcher in Tulsa.

In San Diego, a woman called 911 to get help for her mentally ill brother. Details are contested, but police shot and killed the man they’d been called to help.

In an interview this morning, a woman protesting the murder said: “Because he was black he automatically had no rights.”

That was a profound statement.

Because he was black, he automatically had no rights.

Regardless of whether or not you agree with that statement, the mere perception of that reality should be disturbing. And, incidentally, if you don’t agree with that statement, it is worth noting that it is factually indisputable that many, many unarmed black men have been killed by police under questionable circumstances.

We are a country that prides itself on individual rights, inalienable rights. Rights that can never, ever, be taken away from us.

Unless you are black.

Because he was black, he automatically had no rights.

Presidential Debate Quick Take

While I could endlessly pontificate about last night’s presidential debate, there’s not much I could add that hasn’t already been said by the many, many pundits and posters covering this race.

So I decided for today to do a very quick analysis of the debate transcript, as provided by the New York Times.

The transcript captures three speakers – Clinton, Trump, and moderator Lester Holt; and three interactions – crosstalk, laughter, and applause. The audience was under clear instructions to neither laugh nor applaud, but they did so anyway, getting, I think, a bit rowdier as the night went on.

The transcript recorded 5 instances of audience laughter – 4 in response to Clinton and 1 in response to Trump (“I also have a much better temperament than she has, you know?”). Of the 12 instances of applause, 4 were in response to the moderator, 3 were in response to Clinton, and 5 were in response to Trump.

For crosstalk, the meaning is a little less clear – crosstalk is marked after 4 Trump comments, 3 Holt comments, and 1 Clinton comment…but this doesn’t explicitly indicate who was the actual interrupter.

While some have argued that Holt did an insufficient job of keeping time, Clinton and Trump did have roughly comparable coverage – at least in terms of word count. Clinton spoke somewhat less, using a total of 2403 words to Trump’s 2951. Interestingly, Clinton used more unique words – 788 to Trump’s 730.

And if you’re wondering, Lester Holt spoke a total of 1431 words, 481 of which were unique.

Using a simple log-likelihood technique, we can look at which words are most distinctive by speaker. That is, by comparing the frequency of words in one speaker’s text to the full transcript, we can see which words are overrepresented in that subsample.

In the role of moderator, for example, we see that Holt was much more likely to use words like “Mr”, “question”, “segment” and “minutes.”

Typically, you’d use log-likelihood on a much larger corpus, but it can still be fun for a single debate transcript.

Among Clinton’s most distinctive words were: “right”, “war”, and “country”.

Among Trump’s most distinctive words were: “business”, “new”, and “judgment”. (Note that “bigly” does not appear in the transcript, since he actually said “big league”.)

This is a very rudimentary text analysis, but it’s still interesting to think about what we can learn from these simple assessments.
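
If you want to try this yourself, here’s a rough sketch of the counting and distinctive-word steps in Python, using Dunning’s log-likelihood statistic. The speaker strings below are placeholders, not the actual transcript:

```python
# A rough sketch of the word-count and log-likelihood comparison above.
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def distinctive_words(speaker_text, rest_text, top=10):
    """Rank words overrepresented in the speaker's text relative to
    the rest of the transcript, by Dunning log-likelihood."""
    a_counts = Counter(tokenize(speaker_text))
    b_counts = Counter(tokenize(rest_text))
    c, d = sum(a_counts.values()), sum(b_counts.values())
    scores = {}
    for w, a in a_counts.items():
        b = b_counts.get(w, 0)
        # Expected counts if the word were spread evenly across both texts.
        e1 = c * (a + b) / (c + d)
        e2 = d * (a + b) / (c + d)
        ll = 2 * (a * math.log(a / e1) + (b * math.log(b / e2) if b else 0))
        if a / c > b / d:  # keep only words the speaker over-uses
            scores[w] = ll
    return sorted(scores, key=scores.get, reverse=True)[:top]

clinton = "we need an economy that works for everyone ..."  # placeholder
trump = "our jobs are fleeing the country believe me ..."   # placeholder

tokens = tokenize(clinton)
print(len(tokens), "total words,", len(set(tokens)), "unique")
print(distinctive_words(clinton, trump))
```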

Big Tent Social Justice

I’ve been trained to think like a marketer, and I tend, at times, to think of social justice efforts through this lens too.

That is, if you’re trying to bring about behavior change among a large portion of the population, what communication strategies and tactics do you use to bring about this change? This way of thinking is somewhat distasteful given the manipulative reputation of marketing as a profession, but I find it useful nonetheless.

From this perspective, the strategy of a social justice movement would be to appeal to the largest possible number of people – to welcome everyone under a “big tent” vision of the cause. If this is your goal, then the strategy becomes relatively straightforward: create messages with broad appeal, take actions which generate sympathy, in all things go for the broadest reach and broadest appeal possible.

This is all very reasonable from a marketing perspective.

However, there’s a problem with this approach: the bigger your tent, the more diluted your vision. The more you try to please a broad group of people, the more you will have to relax your core stance.

This balance applies to any issue, not just social justice. Robert Heinlein used to argue that it was impossible to make a decision if more than three people were involved. Any time you have a large number of people in one place, the number of things they can really, deeply agree to will be minimal.

If you’re a marketer trying to maximize your profits, finding the right balance takes skill but is relatively straightforward: appeal to the largest number of people possible while also creating a coherent brand identity. There’s a trade-off between the two, but no real sacrifice either way.

The calculation is more complex when it comes to social justice: just how much are you willing to let go?

This is an important question with a non-trivial answer: appeal to many people and you increase your chances of accomplishing something – but you also make it more likely that what you accomplish will be a toothless, meaningless shadow of your original goal.

There are varied opinions on which side of this spectrum it’s better to be on, and there’s no easy answer. When doing nothing is disastrous, is it better to accomplish something ineffective or to accomplish nothing at all?

Perhaps doing something is better than doing nothing; or perhaps an empty victory only serves to alleviate the sense that something needs to be done – making it virtually impossible for any real change to occur.

I don’t have an answer to this question – certainly not a generalizable one which could be applied to any issue at any time. But I do think that both arguments are reasonable – that we must appreciate the efforts of all who strive towards social justice and value their input and perspectives – even when we disagree.
