Mobile Log Data

I had the opportunity today to attend a talk by Jeffrey Boase of the University of Toronto. Boase has done extensive work around mobile log data – having research participants install apps that gather their (anonymized) call data and engaging participants in short, mobile-based surveys.

The motivation for this work can be seen in part from his earlier research – while 40% of mobile phone use studies base their findings on self-reported data, self-reported data correlates only moderately with server log data. In other words, self-reported data has notable validity issues, while log data provides a much more accurate picture.

Of course, phone records of call time and duration lack the context needed to make useful inferences. So Boase works to supplement log data with more traditional data collection techniques.

A research participant, for example, may complete a daily survey asking them to self-report data on how they know a certain person in their address book. Researchers can also probe further, not only getting at familial and social relationships but also asking whether a participant enjoys discussing politics with someone.

By using this survey data in concert with log data, Boase can build real-time social networks and track how they change.
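A minimal sketch of how that combination might work, assuming entirely hypothetical record formats and field names (this is not Boase’s actual pipeline): aggregate calls per participant–contact pair, then annotate each edge with the survey’s relationship label.

```python
from collections import defaultdict

# Hypothetical inputs: anonymized call-log records (caller, contact,
# seconds) and survey answers keyed by contact ID.
call_log = [
    ("p1", "c17", 120), ("p1", "c17", 45), ("p1", "c02", 300),
]
survey = {"c17": "family", "c02": "coworker"}

# Aggregate the log into a weighted, survey-annotated edge list.
edges = defaultdict(lambda: {"calls": 0, "seconds": 0, "relation": None})
for caller, contact, seconds in call_log:
    edge = edges[(caller, contact)]
    edge["calls"] += 1
    edge["seconds"] += seconds
    edge["relation"] = survey.get(contact)
```

Each edge now carries both behavioral weight (calls, total seconds) and relational context (how the participant says they know the contact), which is what makes the resulting network interpretable.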

His current work, the E-Rhythms Project, seeks to provide a rich understanding of mobile phone based peer bonding during adolescence and its consequences for social capital using an innovative data collection technique that triangulates smartphone log data, on-screen survey questions, and in-depth interviews.

Cris Moore on computational complexity

I am very excited that Northeastern is hosting a series of lectures by Cris Moore of the Santa Fe Institute. Moore has his background in physics, but he has also distinguished himself in the fields of mathematics, computer science, and network science.

Basically, he knows pretty much everything.

He began his lecture series yesterday with a talk on “computational complexity and landscapes.” Exploring qualitative differences between different types of algorithms, he essentially sought to answer the question: why are some problems “hard”?

Of course, “hard” is a relative term. In computer science, the word has a very specific meaning which requires some history to explain.

In 1971, Stephen Cook posed a question which remains open in computer science today: if a solution to a problem can be (quickly) verified by a computer can that problem also be (quickly) solved by a computer?

This question naturally sorts problems into two categories: those that can be solved in polynomial time (P) and those for which an answer can be checked in polynomial time (NP, for “nondeterministic polynomial time”).

Consider the famous Bridges of Königsberg problem. In 1736, Euler took up the question of whether one could find a path through the city that crossed each of its seven bridges exactly once.

Now you may start to appreciate the difference between finding a solution and checking a solution. Given a map of Königsberg, you might try a path and then try another path. If you find that neither of these paths work, you have accomplished exactly nothing: you still don’t have a solution but neither have you proven that none exists.

Only if you spent all day trying each and every possibility would you know for certain if a solution existed.

Now that is hard.

If, on the other hand, you were equipped with both a map of Königsberg and a Marauder’s map indicating the perfect route, it would be relatively easy to verify that you’d been handed a perfect solution. Once you have the right route it is easy to show that it works.

Of course, if the story were that simple, “P versus NP” wouldn’t have remained an open problem in computer science for forty years. Euler famously solved the Bridges of Königsberg problem without using the exhaustive technique described above. Instead, in a proof that laid the foundations of network science, Euler reduced the problem to one of vertices and edges. Since the connected path that he was looking for requires that you both enter and leave a land mass, an odd number of bridges meeting in one place poses problems. Eventually you will get stuck.

In this way, Euler was able to prove that there was no such path without having to try every possible solution. In computer science terms, he was able to solve it in polynomial time.
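Euler’s degree argument translates directly into code. A sketch, assuming the multigraph is connected (the region names are my own labels for Königsberg’s four land masses):

```python
from collections import Counter

# Königsberg's seven bridges as an edge list over its four land
# masses (repeated edges represent parallel bridges).
bridges = [
    ("north", "island"), ("north", "island"),
    ("south", "island"), ("south", "island"),
    ("east", "island"), ("north", "east"), ("south", "east"),
]

def has_euler_path(edges):
    """An Euler path exists in a connected multigraph iff zero or
    two vertices have odd degree (Euler, 1736)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    odd = sum(1 for d in degree.values() if d % 2 == 1)
    return odd in (0, 2)
```

In Königsberg all four land masses have odd degree, so `has_euler_path(bridges)` is false – the same conclusion Euler reached, without enumerating a single candidate route.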

This is the fundamental challenge of the P versus NP dilemma: a problem which initially appears to be “NP” may be reducible to something that is simply “P.”

There are many problems which have not been successfully reduced, and there’s good reason to think that they never will be. But, of course, proving that has the flavor of an NP problem itself: you can’t show these problems can’t be reduced just by pointing to all the attempts that have failed.

As Cris Moore said, “the sky would fall” if it turned out that all NP problems are reducible to P.

To get a sense of the repercussions of such a discovery: a problem is defined as NP-hard if it is “at least as hard as the hardest problems in NP.” That is, other NP problems can be reduced, in polynomial time, to NP-hard problems.

That is to say, if you found a quick solution to an NP-hard problem it could also be applied as a quick solution to every NP problem. Thousands of problems which currently require exhaustive permutations to solve could suddenly be solved in polynomial time.

The sky would fall indeed.

His lecture series will continue on Thursdays through November 12.

Six Degrees of Wonder Woman: Part 1

Over the course of the semester, I’ll be studying a network of my choosing for my Complex Networks class. So, of course, I’m studying superheroes.

I pulled data from the Grand Comics Database, which really is pretty grand. Their database covers all printed comics throughout the world and includes:

8,738 publishers
5,814 brands
4,611 indicia publishers
88,408 series
1,188,029 issues
46,814 variant issues
226,893 issue indexes
549,708 covers
1,521,152 stories

I’m filtering down all that data to look at a specific subset I’m interested in: female characters in comic books.

This is a non-trivial task since the database does not include gender information. But, once I finish cleaning and processing the data, it looks like I’ll be left with about 10K female characters who appear in about 65K issues. There are about 100K links connecting the two sets of nodes, with a link connecting a character to each issue they appear in.
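As a sketch of that bipartite structure (the character and issue names below are invented placeholders, not rows from the actual Grand Comics Database extract):

```python
from collections import defaultdict

# Hypothetical character-to-issue links; illustrative only.
links = [
    ("Wonder Woman", "issue_001"), ("Wonder Woman", "issue_002"),
    ("Batgirl", "issue_002"), ("Batgirl", "issue_003"),
    ("Storm", "issue_004"),
]

# Index both sides of the bipartite network.
issues_by_character = defaultdict(set)
characters_by_issue = defaultdict(set)
for character, issue in links:
    issues_by_character[character].add(issue)
    characters_by_issue[issue].add(character)

def share_issue(a, b):
    """Projection rule: two characters are tied if they co-appear
    in at least one issue."""
    return bool(issues_by_character[a] & issues_by_character[b])
```

Projecting the bipartite network onto the character side like this is one standard way to turn character–issue links into a character–character network of co-appearances.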

From this, I’ll be able to look at the network of female superheroes and explore questions of female representation in comic books. So…get excited.

Local and Global Solutions

I am having great fun taking a Graph Theory class this semester. A graduate level math class, it focuses very much on the theory of graphs, which is remarkably different from the real-world networks I’ve been growing accustomed to.

(And for those of you playing at home, a “graph” and a “network” are basically the same thing, but “graph” is the math/theoretical term and “network” is the science/real-world term.)

In each class, we’re basically asked to prove or disprove properties of a given graph. This is harder than it sounds.

The hardest part, actually, is that I usually think I know the answer. There’s something about the functioning of networks – sorry, graphs – that generally seems intuitively clear. But even if I know the answer, I have no idea how to actually prove it. That’s where the fun comes in.

Now about a month into the semester, I’ve noticed an interesting trend in my (flawed) approach to proofs. Asked to explain why a certain property cannot be true, I immediately argue why it couldn’t possibly happen at the local level.

If it’s not a problem at the local level, it ought not to be a problem at the global level.

That’s essentially every argument I’ve made so far.

And it’s not necessarily that I’m wrong about the local level, but – networks are fickle things, and a localized approach runs the danger of aggregating into something unintended. That is – you can’t just aggregate the local to make inferences about the global.

Frankly, this is one of the reasons I’m studying network science. Networks are complex, dynamic models which can’t simply be broken down and analyzed piece by piece. To really understand what’s going on you need to appreciate both the local and the global, and think more broadly about how the whole structure interacts.

Networks and the Rise of the Medici

So, here’s a fun thing. In 1993, John Padgett and Christopher Ansell explored the social network of the great Medici family, with specific attention to the rise of Cosimo de Medici in 15th century Florence.

The paper, published in the American Journal of Sociology, is intended as a historical case study in a theoretical contradiction of state building: that founders cannot be both judge and boss. That is, a regime’s legitimacy hinges on “the conviction that judges and rulers are not motivated by self-interest.” Yet a founder doesn’t want to give up control of their “organized creation.”

Cosimo’s leadership led to three centuries of Medici rule in Florence, raising the question – how did that happen?

Drawing on the “thorough and impressive work” of many historians of Florence, Padgett and Ansell carefully reconstructed a network of elite families in Florence. Their data set included 215 elite families – where “family” is more akin to clan than household – and their network accounted for nine different types of connections, including kinship as well as political, economic, and personal ties.

After disproving many common arguments for the Medici’s rise to power – it turns out they were just as wealthy as the “oligarchs” they displaced – Padgett and Ansell turn to the network for possible explanations.

Their findings are remarkable. Looking at the political turmoil of the day, Padgett and Ansell make a bold claim: “Rather than parties being generated by social groups, we argue, both parties and social groups were induced conjointly by underlying networks.”

Their analysis of the network provides some interesting insights into that claim:

The Medici party was an extraordinarily centralized, and simple, “star” or “spoke” network system, with very few relations among Medici followers: the party consisted almost entirely of direct ties to the Medici family. One important consequence for central control was that Medici partisans were connected to other Medici partisans almost solely through the Medici themselves. In addition, Medici partisans were connected to the rest of the oligarchic elite only through the intermediation of the Medici family. Medici partisans in general possessed remarkably few intraelite network ties compared to oligarchs; they were structurally impoverished. In such an impoverished network context, it is easy to understand how a solo dependence on a powerful family would loom very large indeed.

Meanwhile, the rival oligarch party was densely interconnected. But rather than leading to cohesive collective action, this caused conflict. “The oligarchs were composed of too many status equals, each with plausible network claims to leadership,” they explain.
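One way to see the structural contrast is to ask what happens when a key node is removed: a hub-and-spoke network shatters without its hub, while a dense clique survives the loss of any single member. A toy sketch, with illustrative node names:

```python
def components(nodes, edges):
    """Count connected components via depth-first search."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, count = set(), 0
    for n in nodes:
        if n in seen:
            continue
        count += 1
        stack = [n]
        while stack:
            cur = stack.pop()
            if cur not in seen:
                seen.add(cur)
                stack.extend(adj[cur] - seen)
    return count

partisans = ["a", "b", "c", "d"]
star = [("hub", p) for p in partisans]            # Medici-style spokes
clique = [(u, v) for i, u in enumerate(partisans)
          for v in partisans[i + 1:]]             # densely tied oligarchs

# Remove the hub from the star: the partisans fall apart completely.
star_minus_hub = components(partisans, [e for e in star if "hub" not in e])
# Remove one member from the clique: the rest stay connected.
clique_minus_one = components([p for p in partisans if p != "a"],
                              [e for e in clique if "a" not in e])
```

The star fragments into four isolated nodes, while the clique remains one component – a crude illustration of why Medici partisans depended so completely on the family at the center.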

These network structures had deep consequences for politics and power. Ultimately, along with the capable leadership of Cosimo, it was this structure which allowed for the Medici rise.

Deliberation in Practice

While there are many rich debates around the theory of deliberation, I turn today to its practice. How are real-world deliberations structured, and how do those implementations relate to the competing theories of deliberation?

The National Coalition for Dialogue & Deliberation (NCDD) offers a great starting point for examining these questions. A network of more than 2,200 deliberative practitioners, NCDD “serves as a gathering place, a resource center, a news source, and a facilitative leader for this vital community of practice.”

NCDD actively embraces a pluralistic approach to deliberation, arguing, “no method works in all situations.” The context-dependent nature of deliberation is implicit throughout the practical literature – as each approach typically introduces itself with a short explanation of where it can be of use.

To help communities “decide which types of approaches are the best fit for your circumstances,” NCDD publishes a useful Engagement Streams Framework, which breaks deliberative techniques into four categories:

Exploration: Encourage people and groups to learn more about themselves, their community, or an issue, and possibly discover innovative solutions

Conflict Transformation: Resolve conflicts, to foster personal healing and growth, and to improve relations among groups

Decision Making: Influence public decisions and public policy and improve public knowledge

Collaborative Action: Empower people and groups to solve complicated problems and take responsibility for the solution

NCDD then takes 22 of the most popular deliberative processes and assigns each to one or more of these categories. I have visualized their chart as a network, showing how different deliberative approaches connect to the four categories NCDD identified.

[Figure: a network visualization connecting NCDD’s deliberative practices to its four engagement streams]

While perhaps not too much can be inferred from this sample of deliberative practices, it is interesting to note that half of the “Decision Making” practices are focused solely on that stream, while “Collaborative Action” processes are always connected to another stream as well. In this model, “Decision Making” and “Exploration” are the most common approaches, with 12 and 11 practices respectively. Additionally, NCDD’s list captures at least one way to combine any two of their identified streams.

It is worth spending some time briefly describing one model from each of the three streams that have dedicated approaches – Decision Making, Exploration, and Conflict Transformation.

National Issues Forum – Decision Making
While a National Issues Forum (NIF) is not a formal decision making body, their facilitated deliberations aim to help groups weigh different options. “We are here to move toward a public decision or CHOICE on a difficult issue through CHOICE WORK,” they explain.

They take this approach of choice work quite literally – each of their dozens of issue guides present a topic along with three possible approaches. Participants are asked to reflect on their own experience of the issue and deliberate about the pros and cons of each outlined approach.

This focus on “choice work” may be somewhat misleading, though. NIF is careful to indicate that successful deliberation does not have to end in agreement or action. “Sometimes, forum participants find the use of the word ‘choice’ confusing,” they write. “Some assume that they are being asked to choose one of the approaches. And, of course, they are not.”

The NIF definition of deliberation similarly rejects consensus as a mandatory outcome. “It’s not about reaching agreement or seeing eye-to-eye. It’s about looking at the costs and consequences of possible solutions to daunting problems, and finding out what we, as a people, will or will not accept as a solution.”

Finally, while NIF facilitators are encouraged to begin their session with ground rules, their issue guides don’t provide any suggested ground rules to start from. This seems to be an intentional choice embedded in their philosophy: “The responsibility for doing the work of deliberation belongs to the group,” they write.

NIF expects most forums will last around 2 hours, though they leave room for communities to organize multi-session discussions. Typically, a session will have a hundred or more participants, and NIF encourages communities to determine for themselves the mix of small group versus plenary discussion.

World Café – Exploration
World Café gatherings may be large, but their conversations are intimate. While the total number of attendees can venture into the hundreds, hosts are instructed to seat no more than five people together. Conversations take place in at least three rounds of twenty minutes each.

After each round, one person is encouraged to stay as “table host” for the next round, “while the others serve as travelers or ‘ambassadors of meaning.’ The travelers carry key ideas, themes and questions into their new conversations, while the table host welcomes the new set of travelers.”

World Café hosts are encouraged to develop their own questions, which can be the same or different for each round of inquiry. “Good questions need not imply immediate action steps or problem solving. They should invite inquiry and discovery vs. advocacy and advantage,” they write.

A light and flexible model, World Cafés can be easily implemented in a range of situations to create “a living network of collaborative dialogue around questions that matter in service to real work.”

The model is subtly action-oriented – hardly so in comparison to some other deliberation models, but World Cafés are built around the core idea that there are problems in our communities and only we have the power to address them.

As they describe in their host guide: “The World Café is built on the assumption that people already have within them the wisdom and creativity to confront even the most difficult challenges; that the answers we need are available to us; and that we are Wiser Together than we are alone.”

While some might charge the World Café with being “just talk,” the World Café would retort: “The power of conversation is so invisible and natural that we usually overlook it.”

Public Conversations Project – Conflict Transformation
The Public Conversations Project is a leader in Reflective Structured Dialogue, a technique “designed to help people have the conversation they want to have about some of the most difficult topics.”

Their work is focused on dialogue, “a conversation that is animated by a search for mutual understanding…distinct from conversations focused directly on problem solving.” The Public Conversations Project has led these dialogues in some of the most deeply divided communities, providing spaces for participants to get to authentically know each other without trying to sway each other’s view on an issue.

Dialogues are heavily structured, outlining time for silent reflection, equal time for each person to speak, and a noticeable pause between each person’s response. After every participant responds to the question posed by the facilitator there is an equal time for “questions of genuine interest” which can be posed by any participant to any other participant. These questions seek to “encourage constructive inquiry and exploration that enhances clarity and mutual understanding.”

This highly structured model “empowers participants to share experiences and explore questions that both clarify their own perspectives and help them become more comfortable around, and curious about, those with whom they are in conflict,” and “helps participants engage in constructive, often groundbreaking conversations that can restore trust and lay the foundation for collaborative action.”

Dialogues are typically small group discussions that happen over multiple two-hour sessions. A community may have multiple dialogues happening at once, but there is typically not a plenary portion of the exchange.

Analyzing Political Speech

I had the opportunity today to attend a talk by Oren Tsur, a post-doc in my lab who has done a lot of work around Natural Language Processing (NLP). He spoke about his work analyzing political text, which was published by the Association for Computational Linguistics (ACL) earlier this year.

Tsur noted that political writing and speech is intentionally crafted to influence audiences. This provides an interesting framework to explore the question: can we automatically identify and quantitatively measure topical framing and agenda setting campaigns?

That is, using Natural Language Processing techniques, can a computer identify framing, spin, and agenda setting in political speech?

Tsur and his coauthors used a dataset from VoteSmart of “all individual statements and press releases in a span of four years (2010-2013), a total of 134000 statements made by 641 representatives.”

It’s data sets like that which make “unsupervised” analysis so important. It’s not practical for a human to read through and categorize that many statements…but can a computer be taught to do so effectively?

Each document was considered as a “bag of words,” and each word was associated with various topics with different probabilities.  Topics might be similar, but were fine-grained enough to pick up subtle differences.

One topic caught words like “Obamacare” and “repeal” while another caught words like “social” and “benefits.” And, yes, you can then connect each category to who is saying it to determine which of those topics is “owned” by Republicans and which is “owned” by Democrats.

Furthermore, Tsur could compare how frequently the same words or phrases (ngrams) appeared in different documents, demonstrating that Republicans tend to be much more “on message.” That is, at any given time, Republican politicians are more likely to have phrases in common with each other – perhaps sticking to the same talking points.
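A rough sketch of that kind of ngram comparison (the statements below are invented, and the paper’s actual methodology was surely more sophisticated): extract word bigrams from each statement and measure their Jaccard overlap.

```python
def bigrams(text):
    """Lowercase word bigrams of a statement."""
    words = text.lower().split()
    return {(a, b) for a, b in zip(words, words[1:])}

def jaccard(a, b):
    """Share of bigrams two statements have in common."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Invented statements standing in for the VoteSmart press releases.
s1 = bigrams("we must repeal this job killing law")
s2 = bigrams("repeal this job killing law now")
s3 = bigrams("social security benefits protect working families")

on_message = jaccard(s1, s2)   # heavy phrase reuse
off_message = jaccard(s1, s3)  # no shared phrasing
```

Averaging this kind of overlap within each party over a time window gives a crude “on message” score: the more talking points are shared, the higher the within-party overlap.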

Data-ism

I had the opportunity today to hear a talk by Steve Lohr, New York Times technology reporter and author of the recent book, Data-ism: The Revolution Transforming Decision Making, Consumer Behavior, and Almost Everything Else.

Lohr said that “big data” is more than just a large collection of digital information, it’s a philosophical framework – a way of approaching the world. Big data, he said, allows people to see patterns in the world and to make better sense of the world around them.

Ultimately, he argued, big data is a revolution in decision-making.

This revolution can have many positive implications, making our lives simpler, faster, and better.

For example, according to Lohr, the 1880 U.S. census took eight years to conduct. Though the population had swelled by 1890, that census took only a few weeks to complete. The difference was due to a technological innovation: the creation of a machine-readable punch card by a company that later became IBM.

Of course there are also possible pitfalls – one can imagine using big data to determine who gets a loan going terribly wrong. And, yes, this is something that “data science lenders” do, claiming that their methodology is more accurate than more traditional approaches.

Lohr was somewhat wary of these big data, automated decision-making processes, arguing that when data is used to make decisions affecting people’s lives, that process needs to be transparent.

But, he was more casual about the change than I might have thought. Perhaps it’s because he has covered technology’s evolution for nearly a decade, but – he was somewhat skeptical of concerns about privacy and the de-humanization of our lives.

Technology evolves and our mores will evolve with it, he seemed to say.

Lohr commented that when the handheld Kodak camera was originally introduced, it was seen as an invasion of privacy. Banned from beaches and the Washington Monument, it was seen as a danger, a possible corrupting force.

Until privacy expectations evolved to meet the new technology.

Perhaps it is just nostalgia that makes us fear this brave new world.

It’s an interesting argument, and I think it’s good to be skeptical of our instinctual reactions to things. But pointing to the mistakes of our past fears seems insufficient – perhaps we should be more concerned with privacy, but have simply become slowly accustomed to not having it.

That could be a natural evolution, or it could be a slow degradation – with serious and lasting consequences.

Possible Research Questions

One of my current tasks is to sharpen the theoretical questions I am interested in exploring through my doctoral work.

While I am thankfully still some time away from having to select a dissertation topic, the process of thinking through and refining my interests will help me identify possible research projects and guide me as I select from a seemingly infinite array of classes, readings, and activities.

So what are the theoretical questions I am interested in exploring? Well, since “ALL OF THEM” doesn’t seem to be a productive answer in this regard, I will attempt to articulate a somewhat more narrow scope.

Broadly, I am interested in civil society – and I am convinced that network approaches can bring value to our understanding. To be perfectly clear, I’m not referring solely to the understanding of academics, but of all of us as individuals, people living in communities.

That is, while network science can certainly be used to help political scientists better model the societies they study, I am more deeply interested in the civic studies question: what should we do?

There are many ways I can envision network science contributing to our collective attempts to answer this question.

First, on a broad scale, I think network analysis can help us more accurately conceive of the communities of which we are a part. I live in a very engaged community with a robust level of social capital, and yet I am also very close friends with some people whom I barely see in person. Is one of these settings more accurately a “community”? How should I make sense of my place in each of them?

I am particularly interested in trying to capture “layers” of community. If I were to do a power analysis in my local, geographic community I would find many individuals and institutions I could have direct interaction with. I could imagine multiple ways for my own voice and agency to have a real impact on policy or the culture of my community.

But if I were to do a similar analysis at a national or perhaps international scale, I would quickly find myself feeling powerless. Can I change international law? Perhaps some individuals are positioned to do so, but I most certainly am not.

In such a setting, then, I am left little but a foolish choice between inaction and the vain hope that my representatives will represent me, that my voice among thousands will carry some weight.

I’d previously seen this as an argument for more local engagement. Why shout in the wind of national politics when real work can be done at the local level?

But I wonder now if this is simply a false dichotomy. We envision local work and national work, and perhaps other scales of regional work or international work as well. We treat our communities as tiers – scaled up versus hyperlocal.

But what if there’s a better way to think of it? A better way to conceive of our multiple communities, overlapping, intersecting, complex and ever changing. What might that look like?

A second area of interest is around interactions within a given community. While the first set of questions struggles with how we might define the borders of a single community, the second explores what we do once we know what “we” means.

More explicitly, this area centers around questions of dialogue and deliberation. What does “good” dialogue look like? How are ideas exchanged and opinions altered? Using strategies of epistemic network analysis one might even ask questions such as: what does a “good” deliberator look like? What does a good moderator look like? Is there a way of thinking that can be categorized as “good” deliberative thinking?

Finally, I’m very interested in applying network science to better understanding the network of ideas and morals held by an individual. This line of thinking can be closely tied to questions of deliberation – asking what idea structure a person ought to have in order to be a “good” deliberator.

But there are other ways to take this question as well – are there features of an individual’s moral network which are better or worse than others? If so, what ought a good person’s moral network look like? What network characteristics should we each seek to cultivate?

These are not entirely disparate questions, but they do each take the confluence of civics and networks in different ways. I’m not sure where the next five years will lead me, but I look forward to delving in!

Modeling Networks of Individuals and Institutions

One of the topics that I’m interested in as I delve into my Ph.D. program involves using networks to model a community’s interactions. A critical first question in this process is simply: what is a community network a network of?

Social network analysis and its face-to-face equivalent focus on networks of individuals. Each person is a node in the network, linked to the individuals they know or communicate with. This is a robust and helpful way of looking at communities.

It allows for mapping information flows and exploring community dynamics. Do most people know most other people? Are there segments of the community that are isolated from each other, like cliques in the high school cafeteria? How diverse is the average person’s network?
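The first of those questions is related to the clustering coefficient: for a given person, what fraction of their friends are also friends with each other? A toy sketch with a made-up friendship network:

```python
# A toy friendship network (names are invented; the adjacency is
# symmetric, as friendship ties are undirected).
friends = {
    "ana": {"ben", "cal", "dia"},
    "ben": {"ana", "cal"},
    "cal": {"ana", "ben"},
    "dia": {"ana", "eli"},
    "eli": {"dia"},
}

def clustering(node, adj):
    """Fraction of a node's friend-pairs who are also friends with
    each other ('do my friends know each other?')."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i, u in enumerate(nbrs)
                for v in nbrs[i + 1:] if v in adj[u])
    return links / (k * (k - 1) / 2)
```

Here ana’s friends ben and cal know each other but dia knows neither, so ana’s clustering is 1/3 – one closed triangle out of three possible. Averaging this over everyone gives one summary of how tight-knit the community is.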

These are valuable ways of looking at a community, but this approach doesn’t tell the full story.

There is also great work being done looking at the network of institutions within a community. Can the characteristics of a community’s institutional network predict how well that community will fare during an economic crisis?

This approach is often not devoid of interest in the individual – asking, too, questions of how strong institutional networks can build social capital, benefiting the community as a whole as well as the individuals who comprise it.

Again, this is a valuable approach that can yield many interesting and helpful results.

But somehow, I find myself unsatisfied with either approach.

Communities are complex systems of individuals and institutions. An institution may be composed of individuals, but its ultimate character is more than the sum of its parts: individuals can change institutions and institutions can change individuals.

And there may be yet more factors that influence how communities function: policies, norms, historical sensibilities, regional or even international networks.

So for now, the question I’m pondering is this: what would a detailed, robust, network model of a community look like? What are its nodes and connections, and is there some fundamental unit which could be used to model all these complex layers together?
