# Week 7 Notes: Bayesian Modeling

- Bayesian models are sometimes counterintuitive

Bayesian Probability:
- based on the basic rule of conditional probability
- Bayes' rule:
  - P(A|B) = P(B|A) * P(A) / P(B)
- example: a medical test
  - true positive rate: 98%
  - false positive rate: 8%
  - 1% of the population has the disease, and 8.9% of people test positive
  - if someone tests positive, what is the probability they actually have the disease?
  - write out the equation in Bayes' rule:
    - A = has the disease
    - B = tested positive
    - P(A|B) = P(B|A) * P(A) / P(B) = 98% * 1% / 8.9% ≈ 11%
  - even after testing positive, a person has only an 11% chance of having the disease
  - why? so many more people don't have the disease that false positives far outnumber true positives (see the check below)

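A minimal numeric check of the example above in Python; the 8.9% overall positive rate is itself implied by the other three numbers:

```python
# Bayes' rule check for the medical-test example.
p_pos_given_disease = 0.98   # P(B|A): true positive rate
p_pos_given_healthy = 0.08   # false positive rate
p_disease = 0.01             # P(A): prevalence

# P(B): overall positive rate = 0.98*0.01 + 0.08*0.99 = 0.089
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(positive) = {p_pos:.3f}")                          # 0.089
print(f"P(disease | positive) = {p_disease_given_pos:.1%}")  # ~11.0%
```
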
Empirical Bayes Modeling:
- the overall distribution of something is known or estimated
- only a little data is available for the specific problem
- example: predicting NCAA basketball outcomes
  - X = difference in points scored by the home team and the road team
  - approximately normal: X ~ N(m + h, sigma^2)
    - h = home court advantage
    - m = true difference in the teams' strength (unknown)
    - sigma^2 = variance
  - Bayes' rule lets us estimate the unknown m
  - first model the difference between teams' strengths: m ~ N(0, tau^2)
  - then look at the observed data:
    - x = observed point difference in the game
    - m = real difference between the two teams, with m != x in general
  - Bayes' rule: look for the probability of a true point difference m given the observation x
    - P(M = m | X = x) = P(X = x | M = m) * P(M = m) / P(X = x)
    - the probability of m given x!
  - if team A beats team B by x points, we can find the distribution of how much better one team is
  - we can also integrate that distribution from zero to infinity to get the probability that the home team is actually better:
    - P(home team better | X = x) = integral from 0 to infinity of P(M = m | X = x) dm

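For this normal-normal model, Bayes' rule has a standard closed form (a conjugacy result, stated here because the worked example below relies on it; the prior mean of m is 0):

$$ M \mid X = x \;\sim\; N\!\left(\frac{\tau^2}{\tau^2 + \sigma^2}\,(x - h),\ \frac{\tau^2 \sigma^2}{\tau^2 + \sigma^2}\right) $$
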
what are we actually saying?
- the home team won by 20 points
- estimated home court advantage h = 4 points
- standard deviation in team strength difference tau = 6 points
- standard error from random variance sigma = 11 points
- of the 20-point victory:
  - about 4 points was home court advantage
  - about 12.5 points was due to random variation
  - only about 3.5 points was due to the difference between the teams
- this seems counterintuitive: a 20-point win but just a 3.5-point difference?
  - there is much more variance due to randomness (sigma = 11) than to team strength (tau = 6)
  - so a 20-point win is more likely to reflect randomness
  - Bayes' rule shrinks the estimate toward the prior mean of 0

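A short computation reproducing these numbers, using the closed form above; the exact values come out near the rounded ones in the notes:

```python
from scipy.stats import norm

x, h = 20.0, 4.0        # observed margin, home court advantage
tau, sigma = 6.0, 11.0  # sd of team strength difference, game-level sd

shrink = tau**2 / (tau**2 + sigma**2)   # 36/157, about 0.23
m_post = shrink * (x - h)               # ~3.7 points: true team difference
random_part = (x - h) - m_post          # ~12.3 points: random variation
post_sd = (tau**2 * sigma**2 / (tau**2 + sigma**2)) ** 0.5

# probability the home team is actually better: P(M > 0 | X = x)
p_home_better = 1 - norm.cdf(0, loc=m_post, scale=post_sd)

print(f"home court: {h}, team difference: {m_post:.1f}, "
      f"randomness: {random_part:.1f}")
print(f"P(home team better | x = 20) = {p_home_better:.2f}")  # ~0.76
```
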
Summary:
- take a single observation
- combine it with a broader set of observations
- then make a deduction or prediction
- Bayesian models work especially well in the absence of lots of data

P(A) = prior distribution
P(A|B) = posterior distribution
# Week 7 Notes: Neural Networks and Deep Learning

- used to react to patterns that we don't even understand ourselves
  - CAPTCHA-type questions
- the idea of deep learning is to train a system to react correctly without telling it what it is reacting to
- powerful in image recognition, speech recognition, and NLP

Neural Networks:
- neural networks are modeled after the way neurons work in our brains
- Artificial Neural Network
- three layers of neurons: input layer, hidden layer, output layer
  - input > hidden > output
- each input neuron accepts a single piece of information
- each neuron: gets inputs from the previous layer > calculates a function of the weighted inputs > gives its output to the next layer
- there may be several hidden layers of neurons
- finally, the output is a combination of all the weighted hidden-layer results
  - the output layer chooses the 'best' answer based on the results from all the hidden layers
- then the error is fed back through the entire system and the weights are adjusted based on how incorrect the first output was
  - gradient descent is a simple way to do this
- if the network learns well, with enough data all of the weights will be adjusted so that the network generates correct outputs from the inputs
- neural networks require a lot of data to train
- it is hard to choose and tune the learning algorithm: re-weighting too fast or too slow can both be problematic (see the sketch below)

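A minimal sketch of this loop in Python/numpy, assuming a toy problem (XOR, which a single linear model cannot represent) and plain squared-error gradient descent; the layer size, learning rate, and iteration count are illustrative choices, not anything prescribed by the notes:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: XOR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# one hidden layer of 8 neurons; weights start small and random
W1 = rng.normal(scale=1.0, size=(2, 8))
b1 = np.zeros(8)
W2 = rng.normal(scale=1.0, size=(8, 1))
b2 = np.zeros(1)

lr = 1.0  # re-weighting speed: too fast or too slow is problematic
for step in range(10000):
    # forward pass: input > hidden > output
    hidden = sigmoid(X @ W1 + b1)
    out = sigmoid(hidden @ W2 + b2)

    # backward pass: feed the error back and re-weight (gradient descent)
    d_out = (out - y) * out * (1 - out)
    d_hidden = (d_out @ W2.T) * hidden * (1 - hidden)
    W2 -= lr * hidden.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_hidden
    b1 -= lr * d_hidden.sum(axis=0)

print(out.round(2))  # typically close to [[0], [1], [1], [0]]
```
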
Deep Learning:
- the idea of neural networks adapted to many more layers
- similar approach to neural networks: input > "deep" layers > output > restart with re-weighting
- powerful in NLP, speech, and image recognition
# Week 7 Notes: Competitive Models

- competitive decision making
- previous models were 'us against the data':
  - descriptive models = get an understanding of reality
  - predictive models = find hidden relationships and predict the future
  - prescriptive models = find the best thing to do, assuming the system does not react
- what if the system reacts intelligently?
- we need to use analytics to consider all sides of the system

Examples:
- pricing
  - using past purchase data and competitor data to price products
  - once one price is set, competitors may change their prices too, giving different results than the model predicted
- government corporate tax policies
  - companies decide how to store and spend their money based on the government's tax policy
- employee incentives to change behavior

- we need to consider not just our own situation but the competitive situation
- these situations need competitive decision making = game theory
- game theory includes both competitive and cooperative settings

Timing:
- simultaneous game = decisions made at the same time
  - decisions can't be changed once made
  - strategy > counter-strategy > counter-counter-strategy: the best strategy emerges after many iterations
- sequential game = decisions made in series

Types of Strategies:
- pure strategy = always make the same single choice
- mixed strategy = randomize decisions according to probabilities
- example: rock paper scissors
  - a pure strategy will eventually lose once the opponent catches on
  - a mixed strategy works best, as the simulation below shows

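A small simulation of this point; the always-rock strategy and the exploiting opponent are illustrative assumptions, not part of the notes:

```python
import random

random.seed(42)
MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def score(a, b):
    """+1 if move a beats move b, -1 if it loses, 0 on a tie."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

# pure strategy: always rock; an observant opponent answers with paper
pure = sum(score("rock", "paper") for _ in range(1000))

# mixed strategy: uniform randomization; even against paper it breaks
# even on average
mixed = sum(score(random.choice(MOVES), "paper") for _ in range(1000))

print(pure, mixed)  # pure loses every round (-1000); mixed averages near 0
```
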
Information Levels:
- perfect information: everyone knows all the information about everyone's situation
- imperfect information: some players have more information than others - a competitive advantage - information is not symmetric across competitors

Zero Sum and Non-Zero Sum:
- zero sum: whatever one side gains the other side loses, and vice versa
  - bet $1 on a game: win a dollar vs. lose a dollar
- non-zero sum: the total benefit might be higher or lower
  - example: economics

Summary:
- competitive decision making = game theory
- how do we determine the best strategy? = optimization models
- we want to find the optimal strategy!

# Week 7 Notes: Game Theory Models

- a basic demo of game theory models
- how does it work?
- what analysis is involved?

Game Theory Example:
- two gas stations
- each sets a price: $2.00 or $2.50
- same price = 50/50 split of demand
- otherwise = all demand goes to the lower-priced station
- what's the best price to choose?
  - if they could talk to each other, they would set $2.50 - half the demand each at the higher profit margin
  - the cost matters in determining which price should be chosen!
- stable equilibrium = no one has an incentive to change
- prisoner's dilemma = there is an incentive to agree to the higher price and then back out and charge the lower price
- in reality the stations can choose any price points they want
  - both can keep lowering prices until the price is about equal to the cost
  - "race to the bottom" = this simple model causes each to lower prices until they hit their margin
  - competition drives down prices for consumers
  - game theory says there is an incentive to charge a slightly lower price than the competition (see the sketch below)

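A minimal sketch of this game in Python, assuming hypothetical numbers (100 total customers and a $1.00 cost per unit; neither figure comes from the notes), that checks each price pair for a stable equilibrium:

```python
PRICES = [2.00, 2.50]

def profit(mine, theirs, demand=100, cost=1.00):
    """Profit for one station given both prices (assumed demand/cost)."""
    if mine < theirs:
        share = demand       # all demand goes to the cheaper station
    elif mine > theirs:
        share = 0
    else:
        share = demand / 2   # same price: 50/50 split
    return share * (mine - cost)

# stable equilibrium = neither station can gain by changing price alone
for p1 in PRICES:
    for p2 in PRICES:
        best1 = all(profit(p1, p2) >= profit(q, p2) for q in PRICES)
        best2 = all(profit(p2, p1) >= profit(q, p1) for q in PRICES)
        tag = "  <- stable equilibrium" if best1 and best2 else ""
        print(f"(${p1:.2f}, ${p2:.2f}): profits "
              f"({profit(p1, p2):.0f}, {profit(p2, p1):.0f}){tag}")
```

With these assumed numbers only the ($2.00, $2.00) pair is stable: from ($2.50, $2.50) either station gains by undercutting, which is exactly the prisoner's dilemma incentive described above.
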
# Week 7 Notes: Communities in Graphs

- analysis of large interconnected networks
- automated ways of finding highly interconnected subpopulations
  - social media 'influencers'
  - disease outbreaks
- a model to automatically find 'communities'

- community = a set of circles that is highly connected within itself
- graph = the collection of circles and lines
  - circles = nodes / vertices
  - lines = arcs / edges
- clique = a set of nodes that all have edges between each other

- we don't need a full clique (a complete subgraph)
- the goal is to decompose the graph into communities
- we do this using the Louvain algorithm
  - the goal of the Louvain algorithm is to maximize the modularity of the graph

Louvain Algorithm:
- a_ij = weight on the arc between nodes i and j
  - if there is no arc between i and j, then a_ij = 0
- w_i = total weight of the arcs connected to i
- W = total weight of all the arcs in the graph
- Modularity = (1 / 2W) * sum over pairs i, j in the same community of (a_ij - w_i * w_j / 2W)
- modularity = a measure of how well the graph is separated into communities or modules that are densely connected internally but only sparsely connected to each other
- Step 0: each node starts as its own community
- Step 1: make the biggest modularity increase possible by moving one node from its current community to an adjacent community
- Step 2: repeat this process until there are no more increases in modularity
- Step 3: treat each community as a super-node and repeat from Step 1 using the super-nodes
- Louvain is a heuristic:
  - not guaranteed to find the absolute best partition of a graph into communities
  - gives very good solutions very quickly
  - best for finding communities inside a large network (see the sketch below)

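A minimal sketch in Python using networkx (assuming a version where louvain_communities is available, roughly 2.8 and later; the karate club graph is just a convenient built-in example):

```python
import networkx as nx

# classic small social network with a known two-faction split
G = nx.karate_club_graph()

# Louvain heuristic: greedily move nodes to raise modularity, then
# collapse communities into super-nodes and repeat
communities = nx.community.louvain_communities(G, seed=1)

# modularity of the resulting partition (higher = cleaner separation)
Q = nx.community.modularity(G, communities)

print(f"{len(communities)} communities, modularity = {Q:.3f}")
for i, nodes in enumerate(communities):
    print(i, sorted(nodes))
```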