Jupyter notebook Case_study 4- Dolphin Social Networks.ipynb


Kernel: Python 2

By Taylor Tobin & Chris Wadibia

/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

Introduction

The bottlenose dolphins (genus Tursiops) of Shark Bay, Australia live in a highly social fission-fusion society. Researchers who study these dolphins can gain insight into different patterns of foraging, socializing, and caring for young. Some dolphins have very close bonds with a few friends, some are relative loners, while others are social butterflies. Some hunt with tools and others beach themselves to catch fish. (See monkeymiadolphins.org for more information about Shark Bay dolphins.)

Today, our goal will be to study the behavior of sponging through the social network of these bottlenose dolphins.

(source: monkeymiadolphins.org)

Our workflow will be:

Load the network

1 - Import libraries

2 - Load the network

Explore and visualize the network

3 - Explore the basic properties of the network

4 - Visualize the network

Hypothesis testing

6 - Test hypothesis # 1

7 - Test hypothesis # 2

Implications

As always, our libraries are imported first. (Today, we will need the igraph, random and numpy libraries.)

In [2]:

import igraph
import random
import numpy

Don't forget that you can access the igraph package documentation at any time! Almost any question about how to use igraph is answered somewhere on:

http://igraph.org/python/doc/igraph.Graph-class.html

To study the dolphin social network, we first need to read it in. We import the graph from a GML file. Each node has ATTRIBUTES, and the edges are weighted by EDGE WEIGHTS. Examine the .gml file to see how this data is stored!

The nodes of this graph represent individual dolphins, and edges represent social interactions between individuals. Notice that there are three node attributes of interest for each node: "age", "sex" and "behavior". The behavior attribute records sponging behavior (0 = non-sponger, 1 = sponger).

We can load a GML file with a function that is part of the igraph library: igraph.Graph.Read_GML(filename)

In [3]:

dolphin_net = igraph.Graph.Read_GML('dolphin_net.gml')

/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

Brief Network Exploration

Today, we will do only a short exploratory analysis, so we can get to hypothesis testing.

First, write code to figure out how many nodes and how many edges there are in this graph.

In [4]:

# 1 code to find number of nodes and number of edges
dolphin_net.summary()

Out[4]:

'IGRAPH UN-- 151 490 -- \n+ attr: age (v), behavior (v), id (v), name (v), sex (v), numsightings (e)'

Next, let's explore the types of individuals in the population. Let's determine the number of males and females in this population. To do this, we can use an igraph function that lets us select the nodes that have a specific value for an attribute. All the nodes with the value are put into a list. To determine the final number of nodes having the attribute value of interest, we can see how long the list is.

Python has a built-in function called len() that returns the number of elements (the length) of a list.

In [5]:

# find the list of males and females by using the igraph select function 
males = dolphin_net.vs.select(sex="MALE")
females = dolphin_net.vs.select(sex="FEMALE")

# count the number of males and females by using the python len function 
male_count = len(males)
female_count = len(females)

# print out the counts
print "The number of males is: ", male_count
print "The number of females is: ", female_count

Out[5]:

The number of males is:  75
The number of females is:  68
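
The two counts above add up to 143, while the graph summary reports 151 nodes, so some nodes must carry a different value for "sex". One way to see every value at once is to tally them. The sketch below uses only the Python standard library on a made-up list; the values (including the "UNKNOWN" label) and the helper name tally are assumptions for illustration, not taken from the dataset. It is written to run under both Python 2 and Python 3.

```python
from collections import Counter

# Hypothetical attribute values standing in for the "sex" attribute of each node.
# The "UNKNOWN" label is an assumption; the real file may use a different value.
sex_values = ["MALE", "FEMALE", "FEMALE", "UNKNOWN", "MALE", "FEMALE"]


def tally(values):
    """Count how many times each distinct value occurs in a list."""
    return Counter(values)


sex_counts = tally(sex_values)

# Print one line per distinct value, in alphabetical order
for value, count in sorted(sex_counts.items()):
    print("%s: %d" % (value, count))
```

With the real network, the same helper could be applied to dolphin_net.vs["sex"], which returns the attribute values as a list.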

Your turn! Write code to count the number of spongers and non-spongers in the population.

In [6]:

# get the list of spongers and non-spongers 
spongers = dolphin_net.vs.select(behavior=1)
non_spongers = dolphin_net.vs.select(behavior=0)

# count the number of spongers and non-spongers 
spongers_count = len(spongers)
non_spongers_count = len(non_spongers)

# print out the counts
print "The number of spongers is: ", spongers_count
print "The number of non-spongers is: ", non_spongers_count

Out[6]:

The number of spongers is:  26
The number of non-spongers is:  125

Next, let's visualize the network:

For this lab, we will investigate how to create color attributes for our network visualization. To do this, we will pick an attribute, like "behavior", and use a loop to build a list of colors for that attribute. In the cell below, we assign a color to each of the two possible values of "behavior", using a conditional (an if statement).

In [ ]:

# a list of colors for each node; this list is empty right now, but we will populate it one by one for each node
color_list = []

# iterate over each node in the network (don't forget, dolphin_net.vs is how we access the node list)
node_list = dolphin_net.vs
for node in node_list:
    
    # if the behavior attribute is 1 (i.e. sponger), we color the node red
    if node["behavior"]==1:
        color_list.append("red")
    
    # any other attribute value will be colored black!
    else:
        color_list.append("black")

We can then create a new node attribute "color" and use the color_list we just created to be the values of that attribute. After doing this, each node will be associated with a color representing an attribute.

In [ ]:

dolphin_net.vs["color"] = color_list
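
The loop-plus-conditional approach above works well for two values. If an attribute had more values, a dictionary lookup with a default color is one possible alternative. The sketch below is plain Python on a made-up list of behavior values; the names behavior_values, color_for_behavior and colors_from_values are hypothetical and do not come from igraph. It is written to run under both Python 2 and Python 3.

```python
# Hypothetical behavior values, one per node (1 = sponger, 0 = non-sponger)
behavior_values = [0, 1, 0, 0, 1]

# Map attribute values to colors; any value not listed gets the default color
color_for_behavior = {1: "red"}


def colors_from_values(values, color_map, default="black"):
    """Return one color per value, falling back to a default color."""
    return [color_map.get(value, default) for value in values]


node_colors = colors_from_values(behavior_values, color_for_behavior)
print(node_colors)
```

The resulting list has one color per node, in node order, so it could be assigned to a "color" attribute in the same way as the list built by the loop.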

Now that we have colors for each node, we can plot the network!

In [ ]:

igraph.plot(dolphin_net, 
            layout = dolphin_net.layout_auto(), 
            bbox=(500,500), 
            vertex_size=10, 
            vertex_color=dolphin_net.vs['color'])

What do you notice about the network from the visualizations? [enter some observations below]

3 - Observations: The network above contains disconnected components. Some components are completely isolated from the rest of the network, while others are far more densely connected.
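
The observation about disconnected components can be checked with code instead of by eye. The sketch below finds connected components with a breadth-first search in plain Python; the toy edge list and the function name connected_components are made up for illustration, and igraph has its own routines for this in the Graph class documentation linked earlier. It is written to run under both Python 2 and Python 3.

```python
from collections import deque


def connected_components(num_nodes, edges):
    """Group the nodes 0..num_nodes-1 of an undirected graph into components."""
    # Build an adjacency list: node -> list of neighbors
    neighbors = {node: [] for node in range(num_nodes)}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    seen = set()
    components = []
    for start in range(num_nodes):
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(sorted(component))
    return components


# Toy graph: a triangle, a separate pair, and one isolated node
toy_edges = [(0, 1), (1, 2), (2, 0), (3, 4)]
toy_components = connected_components(6, toy_edges)
print(toy_components)
```

The number of components and the size of each one give a compact summary of how fragmented a network is.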

/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

Hypothesis Testing

Based on the observations above, we are going to investigate two questions:

1) Are spongers more solitary than non-spongers?

2) Do dolphins socialize predominantly with individuals of similar behavior (i.e., spongers with spongers, and non-spongers with non-spongers)?

First, we turn these research questions into concrete testable hypotheses.

Below, state the above research questions as testable hypotheses:

4 - Hypothesis 1 and Hypothesis 2

Research question one: Are spongers more solitary than non-spongers? Stated as a testable hypothesis: Spongers are more solitary than non-spongers.

Research question two: Do dolphins socialize predominantly with individuals of similar behavior (i.e., spongers with spongers, and non-spongers with non-spongers)? Stated as a testable hypothesis: Dolphins do not socialize predominantly with individuals of similar behavior.

Hypothesis # 1

To test Hypothesis 1, we need to first measure the average degrees of spongers and non-spongers, and the difference between them.

Write code below to calculate the average degree of nodes that are spongers only:

In [8]:

# Add code to get average degree of spongers only
degree_spongers = spongers.degree()
meandegree_spongers = numpy.mean(degree_spongers)
# Print the average degree
print "The average degree of spongers is: ", meandegree_spongers

Out[8]:

The average degree of spongers is:  3.84615384615

Write code below to calculate the average degree of nodes that are non-spongers only:

In [9]:

# Add code to get average degree of non-spongers only
degree_non_spongers = non_spongers.degree()
meandegree_non_spongers = numpy.mean(degree_non_spongers)
# Print the average degree
print "The average degree of non-spongers is: ", meandegree_non_spongers

Out[9]:

The average degree of non-spongers is:  7.04

Write code below to calculate the difference in average degrees between spongers and non-spongers:

In [ ]:

# Calculate difference in average degrees 
observed_diff = meandegree_non_spongers - meandegree_spongers
# Print the observed difference
print "The observed diff:", observed_diff
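
As a sanity check on the numbers reported so far, recall that every edge contributes to the degree of two nodes, so the degrees of all nodes should sum to twice the number of edges. The sketch below redoes that arithmetic in plain Python using only values printed earlier in this notebook (490 edges, 26 spongers, 125 non-spongers, and the two average degrees); the variable names are chosen here for illustration. It is written to run under both Python 2 and Python 3.

```python
# Values reported earlier in this notebook
num_edges = 490
num_spongers = 26
num_non_spongers = 125
mean_degree_spongers = 3.84615384615
mean_degree_non_spongers = 7.04

# Total degree = (group size x group average degree), summed over both groups
degree_sum = (num_spongers * mean_degree_spongers
              + num_non_spongers * mean_degree_non_spongers)

# Difference in average degree, in the same direction as observed_diff
difference = mean_degree_non_spongers - mean_degree_spongers

print("Sum of degrees: %.2f (twice the edge count is %d)" % (degree_sum, 2 * num_edges))
print("Difference in average degree: %.4f" % difference)
```

The two group totals (about 100 for spongers and 880 for non-spongers) add up to 980, which matches twice the 490 edges, so the group averages are consistent with the graph summary.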

Next, to identify if this difference is significant, we want to calculate the same difference for "control" graphs (i.e. graphs where the node attributes have been shuffled) and compare.

To do this, we will use three functions. These functions do not already exist, so we will write them ourselves and call them in our program. The code for these three functions is below.

Remember, a function is just a set of code designed to carry out a specific task. For now, we are just going to practice calling a function. During future labs, you will practice writing your own function.

In [ ]:

#####################    
# This function calculates if the observed difference is significant
# when compared to a random shuffle
# Incoming parameters:
# - Network 
# - Observed difference
# Return value:
# - p value 
def calc_significance(network, observed_diff):
    # We want to check how often (out of 100 shuffles) the shuffled average degree
    # difference is at least as large as the observed difference

    num_trials = 100
    count = 0 # how many times the random difference is at least as big as the observed difference

    for counter in range(0, num_trials):
        diff = find_degree_diff_random(network)
        if diff >= abs(observed_diff):
            count = count + 1

    # convert the count to a proportion, which is the estimated p value
    return count / float(num_trials)


#####################    
# This function shuffles node attributes, and calculates the difference in avg degree between two groups
# Incoming parameters:
# - Network 
# Return value:
# - difference 
def find_degree_diff_random(network):
    
    # get behavior node attributes for graph
    original_behavior = network.vs["behavior"]
    
    # shuffle a copy of the behavior node attributes using the shuffle function from the random library (which we imported earlier)
    shuffled_behavior = list(original_behavior)
    random.shuffle(shuffled_behavior)
    
    # copy the network to a new network, and assign the shuffled behavior to this network
    shuffled_network = network.copy()
    shuffled_network.vs["behavior"] = shuffled_behavior
    
    # get lists of spongers and non-spongers
    spongers = shuffled_network.vs.select(behavior=1)
    non_spongers = shuffled_network.vs.select(behavior=0)
    
    # get average degree of each group by using the get_group_average_degree() function below
    sponger_avgdeg = get_group_average_degree(spongers)
    nonsponger_avgdeg = get_group_average_degree(non_spongers)
    
    # calculate difference between averages
    avgdeg_diff = abs(nonsponger_avgdeg - sponger_avgdeg)
    
    # return the difference in average degrees
    return avgdeg_diff


#####################
# This function calculates average degrees of a list of nodes
# Incoming parameters:
# - list of nodes
# Return value:
# - average degree 
def get_group_average_degree(nodes_in_group):

    # get degree list for group
    degree_list = nodes_in_group.degree()
    
    # calculate average using function mean() from numpy library
    average_degree = numpy.mean(degree_list)
    
    # return the average degree of the group of nodes
    return average_degree

Read the code above and write out the names of the three functions, and what the input parameters and return values represent.

calc_significance. The inputs are the network and the observed difference in average degree. The function repeatedly shuffles the behavior attribute (100 times) and checks how often the shuffled difference is as large as the observed difference. The return value is p, which measures how often that happens; a small value indicates statistical significance.

find_degree_diff_random. The input is the network. The function shuffles the behavior attribute across the nodes of a copy of the network. The return value is avgdeg_diff, the absolute difference in average degree between spongers and non-spongers in the shuffled network.

get_group_average_degree. The input is a list of nodes (nodes_in_group) and the return value is the average degree of those nodes.

8 - Function details

To use these functions, we can just call them. To call the first function, use calc_significance(dolphin_net, observed_diff)

Write code below to calculate and print the statistical significance of the difference in average degree between spongers and non-spongers

In [ ]:

#9 - Calculate and print the statistical significance 
significance = calc_significance(dolphin_net, observed_diff)
print "The significance result is: ", significance
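
The shuffle test used for Hypothesis 1 can also be sketched without igraph, which makes the logic easier to follow. The example below is a minimal, self-contained version on made-up data: the degree and label lists, the function names, the 1000 trials and the fixed seed are all assumptions for illustration, not values from the dolphin network. It is written to run under both Python 2 and Python 3.

```python
import random


def mean(values):
    """Average of a non-empty list of numbers."""
    return sum(values) / float(len(values))


def group_mean_diff(degrees, labels):
    """Absolute difference in average degree between label 0 and label 1."""
    ones = [d for d, label in zip(degrees, labels) if label == 1]
    zeros = [d for d, label in zip(degrees, labels) if label == 0]
    return abs(mean(zeros) - mean(ones))


def permutation_p_value(degrees, labels, num_trials=1000, seed=0):
    """Fraction of label shuffles whose difference is at least the observed one."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    observed = group_mean_diff(degrees, labels)
    shuffled = list(labels)  # copy, so the original labels are left untouched
    hits = 0
    for _ in range(num_trials):
        rng.shuffle(shuffled)
        if group_mean_diff(degrees, shuffled) >= observed:
            hits += 1
    return hits / float(num_trials)


# Toy data: four low-degree nodes labeled 1 and six high-degree nodes labeled 0
toy_degrees = [1, 2, 2, 3, 8, 9, 7, 10, 9, 8]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

p_value = permutation_p_value(toy_degrees, toy_labels)
print("Estimated p value: %.3f" % p_value)
```

Because the shuffling keeps the degrees fixed and only moves the labels around, a small p value says that a difference this large is unlikely to arise from a random assignment of labels. Using more trials than 100 gives a finer-grained estimate at the cost of run time.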

Hypothesis # 2

To test Hypothesis 2, we need to measure the assortativity coefficient by sponging behavior.

You can read more about the igraph assortativity function here: http://igraph.org/python/doc/igraph.GraphBase-class.html#assortativity

In [ ]:

sponging_assortativity = dolphin_net.assortativity("behavior")
print sponging_assortativity

Is there statistical significance for Hypothesis # 2?

10 - significance: Yes, the statistical significance of hypothesis 2 is dependent upon the ratios uncovered by monitoring the sponging behavior in question.
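
One way to approach the significance question for Hypothesis 2 is to reuse the shuffling idea from Hypothesis 1: shuffle the behavior values across nodes, recompute the assortativity each time, and see how often the shuffled value is at least as large as the observed one. The sketch below does this in plain Python on a made-up graph. It computes assortativity as the correlation between the attribute values at the two ends of each edge, which is assumed here to correspond to what igraph computes for a numeric attribute; the toy graph, function names, trial count and seed are all illustrative assumptions. It is written to run under both Python 2 and Python 3.

```python
import random


def attribute_assortativity(edges, values):
    """Correlation of a numeric node attribute across the ends of each edge."""
    xs, ys = [], []
    for a, b in edges:
        # list every undirected edge in both directions so the measure is symmetric
        xs.extend([values[a], values[b]])
        ys.extend([values[b], values[a]])
    n = float(len(xs))
    mean_end = sum(xs) / n  # xs and ys hold the same values, so they share a mean
    cov = sum((x - mean_end) * (y - mean_end) for x, y in zip(xs, ys)) / n
    var = sum((x - mean_end) * (x - mean_end) for x in xs) / n
    if var == 0:
        raise ValueError("assortativity is undefined when all edge ends share one value")
    return cov / var


def assortativity_p_value(edges, values, num_trials=1000, seed=0):
    """Observed assortativity and the fraction of shuffles that match or exceed it."""
    rng = random.Random(seed)
    observed = attribute_assortativity(edges, values)
    shuffled = list(values)  # copy, so the original values are left untouched
    hits = 0
    for _ in range(num_trials):
        rng.shuffle(shuffled)
        if attribute_assortativity(edges, shuffled) >= observed:
            hits += 1
    return observed, hits / float(num_trials)


# Toy graph: two 4-node rings joined by a single edge; behavior splits along the rings
toy_values = [1, 1, 1, 1, 0, 0, 0, 0]
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 0),
             (4, 5), (5, 6), (6, 7), (7, 4),
             (3, 4)]

toy_r, toy_p = assortativity_p_value(toy_edges, toy_values)
print("Assortativity: %.3f, estimated p value: %.3f" % (toy_r, toy_p))
```

A coefficient near 1 means that ties mostly connect nodes with the same value, a coefficient near -1 means ties mostly connect nodes with different values, and the shuffle test indicates whether the observed coefficient is larger than would be expected by chance.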

/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

Implications

What implications are there for the results you have found today?

11 - Implications: The results from today's analysis help describe how solitary spongers are compared with non-spongers, and also provide insight into the socialization practices of dolphins.

BIO 044/COSC 044 Case Study 4: Dolphin Social Networks
