# Week 7 Notes: Bayesian Modeling

- Bayesian models are sometimes counterintuitive

Bayesian Probability:
- based on the basic rule of conditional probability
- Bayes' rule:
  - P(A|B) = P(B|A) * P(A) / P(B)
- example: a medical test
  - true positive rate: 98%
  - false positive rate: 8%
  - 1% of the population has the disease, and 8.9% of people test positive
  - if someone tests positive, what is the probability they actually have the disease?
  - write out the equation in Bayes' rule:
    - A = has the disease
    - B = tested positive
    - P(A|B) = P(B|A) * P(A) / P(B) = 98% * 1% / 8.9% ≈ 11%
  - even after testing positive, a person has only an 11% chance of having the disease
  - why? so many more people don't have the disease that false positives far outnumber true positives (see the check below)

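A minimal numeric check of the example above in Python; the 8.9% overall positive rate is itself implied by the other three numbers:

```python
# Bayes' rule check for the medical-test example.
p_pos_given_disease = 0.98   # P(B|A): true positive rate
p_pos_given_healthy = 0.08   # false positive rate
p_disease = 0.01             # P(A): prevalence

# P(B): overall positive rate = 0.98*0.01 + 0.08*0.99 = 0.089
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(positive) = {p_pos:.3f}")                          # 0.089
print(f"P(disease | positive) = {p_disease_given_pos:.1%}")  # ~11.0%
```
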
Empirical Bayes Modeling:
- the overall distribution of something is known or estimated
- only a little data is available for the specific problem
- example: predicting NCAA basketball outcomes
  - X = difference in points scored by the home team and the road team
  - approximately normal: X ~ N(m + h, sigma^2)
    - h = home court advantage
    - m = true difference in the teams' strength (unknown)
    - sigma^2 = variance
  - Bayes' rule lets us estimate the unknown m
  - first model the difference between teams' strengths: m ~ N(0, tau^2)
  - then look at the observed data:
    - x = observed point difference in the game
    - m = real difference between the two teams, with m != x in general
  - Bayes' rule: look for the probability of a true point difference m given the observation x
    - P(M = m | X = x) = P(X = x | M = m) * P(M = m) / P(X = x)
    - the probability of m given x!
  - if team A beats team B by x points, we can find the distribution of how much better one team is
  - we can also integrate that distribution from zero to infinity to get the probability that the home team is actually better:
    - P(home team better | X = x) = integral from 0 to infinity of P(M = m | X = x) dm

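For this normal-normal model, Bayes' rule has a standard closed form (a conjugacy result, stated here because the worked example below relies on it; the prior mean of m is 0):

$$ M \mid X = x \;\sim\; N\!\left(\frac{\tau^2}{\tau^2 + \sigma^2}\,(x - h),\ \frac{\tau^2 \sigma^2}{\tau^2 + \sigma^2}\right) $$
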
what are we actually saying?
- the home team won by 20 points
- estimated home court advantage h = 4 points
- standard deviation in team strength difference tau = 6 points
- standard error from random variance sigma = 11 points
- of the 20-point victory:
  - about 4 points was home court advantage
  - about 12.5 points was due to random variation
  - only about 3.5 points was due to the difference between the teams
- this seems counterintuitive: a 20-point win but just a 3.5-point difference?
  - there is much more variance due to randomness (sigma = 11) than to team strength (tau = 6)
  - so a 20-point win is more likely to reflect randomness
  - Bayes' rule shrinks the estimate toward the prior mean of 0

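A short computation reproducing these numbers, using the closed form above; the exact values come out near the rounded ones in the notes:

```python
from scipy.stats import norm

x, h = 20.0, 4.0        # observed margin, home court advantage
tau, sigma = 6.0, 11.0  # sd of team strength difference, game-level sd

shrink = tau**2 / (tau**2 + sigma**2)   # 36/157, about 0.23
m_post = shrink * (x - h)               # ~3.7 points: true team difference
random_part = (x - h) - m_post          # ~12.3 points: random variation
post_sd = (tau**2 * sigma**2 / (tau**2 + sigma**2)) ** 0.5

# probability the home team is actually better: P(M > 0 | X = x)
p_home_better = 1 - norm.cdf(0, loc=m_post, scale=post_sd)

print(f"home court: {h}, team difference: {m_post:.1f}, "
      f"randomness: {random_part:.1f}")
print(f"P(home team better | x = 20) = {p_home_better:.2f}")  # ~0.76
```
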
Summary:
- take a single observation
- combine it with a broader set of observations
- then make a deduction or prediction
- Bayesian models work especially well in the absence of lots of data

P(A) = prior distribution
P(A|B) = posterior distribution
# Week 7 Notes: Neural Networks and Deep Learning

- used to react to patterns that we don't even understand ourselves
  - CAPTCHA-type questions
- the idea of deep learning is to train a system to react correctly without telling it what it is reacting to
- powerful in image recognition, speech recognition, and NLP

Neural Networks:
- neural networks are modeled after the way neurons work in our brains
- Artificial Neural Network
- three layers of neurons: input layer, hidden layer, output layer
  - input > hidden > output
- each input neuron accepts a single piece of information
- each neuron: gets inputs from the previous layer > calculates a function of the weighted inputs > gives its output to the next layer
- there may be several hidden layers of neurons
- finally, the output is a combination of all the weighted hidden-layer results
  - the output layer chooses the 'best' answer based on the results from all the hidden layers
- then the error is fed back through the entire system and the weights are adjusted based on how incorrect the first output was
  - gradient descent is a simple way to do this
- if the network learns well, with enough data all of the weights will be adjusted so that the network generates correct outputs from the inputs
- neural networks require a lot of data to train
- it is hard to choose and tune the learning algorithm: re-weighting too fast or too slow can both be problematic (see the sketch below)

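A minimal sketch of this loop in Python/numpy, assuming a toy problem (XOR, which a single linear model cannot represent) and plain squared-error gradient descent; the layer size, learning rate, and iteration count are illustrative choices, not anything prescribed by the notes:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: XOR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# one hidden layer of 8 neurons; weights start small and random
W1 = rng.normal(scale=1.0, size=(2, 8))
b1 = np.zeros(8)
W2 = rng.normal(scale=1.0, size=(8, 1))
b2 = np.zeros(1)

lr = 1.0  # re-weighting speed: too fast or too slow is problematic
for step in range(10000):
    # forward pass: input > hidden > output
    hidden = sigmoid(X @ W1 + b1)
    out = sigmoid(hidden @ W2 + b2)

    # backward pass: feed the error back and re-weight (gradient descent)
    d_out = (out - y) * out * (1 - out)
    d_hidden = (d_out @ W2.T) * hidden * (1 - hidden)
    W2 -= lr * hidden.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_hidden
    b1 -= lr * d_hidden.sum(axis=0)

print(out.round(2))  # typically close to [[0], [1], [1], [0]]
```
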
Deep Learning:
- the idea of neural networks adapted to many more layers
- similar approach to neural networks: input > "deep" layers > output > restart with re-weighting
- powerful in NLP, speech, and image recognition
# Week 7 Notes: Competitive Models

- competitive decision making
- previous models were 'us against the data':
  - descriptive models = get an understanding of reality
  - predictive models = find hidden relationships and predict the future
  - prescriptive models = find the best thing to do, assuming the system does not react
- what if the system reacts intelligently?
- we need to use analytics to consider all sides of the system

Examples:
- pricing
  - using past purchase data and competitor data to price products
  - once one price is set, competitors may change their prices too, giving different results than the model predicted
- government corporate tax policies
  - companies decide how to store and spend their money based on the government's tax policy
- employee incentives to change behavior

- we need to consider not just our own situation but the competitive situation
- these situations need competitive decision making = game theory
- game theory includes both competitive and cooperative settings

Timing:
- simultaneous game = decisions made at the same time
  - decisions can't be changed once made
  - strategy > counter-strategy > counter-counter-strategy: the best strategy emerges after many iterations
- sequential game = decisions made in series

Types of Strategies:
- pure strategy = always make the same single choice
- mixed strategy = randomize decisions according to probabilities
- example: rock paper scissors
  - a pure strategy will eventually lose once the opponent catches on
  - a mixed strategy works best, as the simulation below shows

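A small simulation of this point; the always-rock strategy and the exploiting opponent are illustrative assumptions, not part of the notes:

```python
import random

random.seed(42)
MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def score(a, b):
    """+1 if move a beats move b, -1 if it loses, 0 on a tie."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

# pure strategy: always rock; an observant opponent answers with paper
pure = sum(score("rock", "paper") for _ in range(1000))

# mixed strategy: uniform randomization; even against paper it breaks
# even on average
mixed = sum(score(random.choice(MOVES), "paper") for _ in range(1000))

print(pure, mixed)  # pure loses every round (-1000); mixed averages near 0
```
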
Information Levels:
- perfect information: everyone knows all the information about everyone's situation
- imperfect information: some players have more information than others - a competitive advantage - information is not symmetric across competitors

Zero Sum and Non-Zero Sum:
- zero sum: whatever one side gains the other side loses, and vice versa
  - bet $1 on a game: win a dollar vs. lose a dollar
- non-zero sum: the total benefit might be higher or lower
  - example: economics

Summary:
- competitive decision making = game theory
- how do we determine the best strategy? = optimization models
- we want to find the optimal strategy!

# Week 7 Notes: Game Theory Models

- a basic demo of game theory models
- how does it work?
- what analysis is involved?

Game Theory Example:
- two gas stations
- each sets a price: $2.00 or $2.50
- same price = 50/50 split of demand
- otherwise = all demand goes to the lower-priced station
- what's the best price to choose?
  - if they could talk to each other, they would set $2.50 - half the demand each at the higher profit margin
  - the cost matters in determining which price should be chosen!
- stable equilibrium = no one has an incentive to change
- prisoner's dilemma = there is an incentive to agree to the higher price and then back out and charge the lower price
- in reality the stations can choose any price points they want
  - both can keep lowering prices until the price is about equal to the cost
  - "race to the bottom" = this simple model causes each to lower prices until they hit their margin
  - competition drives down prices for consumers
  - game theory says there is an incentive to charge a slightly lower price than the competition (see the sketch below)

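A minimal sketch of this game in Python, assuming hypothetical numbers (100 total customers and a $1.00 cost per unit; neither figure comes from the notes), that checks each price pair for a stable equilibrium:

```python
PRICES = [2.00, 2.50]

def profit(mine, theirs, demand=100, cost=1.00):
    """Profit for one station given both prices (assumed demand/cost)."""
    if mine < theirs:
        share = demand       # all demand goes to the cheaper station
    elif mine > theirs:
        share = 0
    else:
        share = demand / 2   # same price: 50/50 split
    return share * (mine - cost)

# stable equilibrium = neither station can gain by changing price alone
for p1 in PRICES:
    for p2 in PRICES:
        best1 = all(profit(p1, p2) >= profit(q, p2) for q in PRICES)
        best2 = all(profit(p2, p1) >= profit(q, p1) for q in PRICES)
        tag = "  <- stable equilibrium" if best1 and best2 else ""
        print(f"(${p1:.2f}, ${p2:.2f}): profits "
              f"({profit(p1, p2):.0f}, {profit(p2, p1):.0f}){tag}")
```

With these assumed numbers only the ($2.00, $2.00) pair is stable: from ($2.50, $2.50) either station gains by undercutting, which is exactly the prisoner's dilemma incentive described above.
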
# Week 7 Notes: Communities in Graphs

- analysis of large interconnected networks
- automated ways of finding highly interconnected subpopulations
  - social media 'influencers'
  - disease outbreaks
- a model to automatically find 'communities'

- community = a set of circles that is highly connected within itself
- graph = the collection of circles and lines
  - circles = nodes / vertices
  - lines = arcs / edges
- clique = a set of nodes that all have edges between each other

- we don't need a full clique (a complete subgraph)
- the goal is to decompose the graph into communities
- we do this using the Louvain algorithm
  - the goal of the Louvain algorithm is to maximize the modularity of the graph

Louvain Algorithm:
- a_ij = weight on the arc between nodes i and j
  - if there is no arc between i and j, then a_ij = 0
- w_i = total weight of the arcs connected to i
- W = total weight of all the arcs in the graph
- Modularity = (1 / 2W) * sum over pairs i, j in the same community of (a_ij - w_i * w_j / 2W)
- modularity = a measure of how well the graph is separated into communities or modules that are densely connected internally but only sparsely connected to each other
- Step 0: each node starts as its own community
- Step 1: make the biggest modularity increase possible by moving one node from its current community to an adjacent community
- Step 2: repeat this process until there are no more increases in modularity
- Step 3: treat each community as a super-node and repeat from Step 1 using the super-nodes
- Louvain is a heuristic:
  - not guaranteed to find the absolute best partition of a graph into communities
  - gives very good solutions very quickly
  - best for finding communities inside a large network (see the sketch below)

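A minimal sketch in Python using networkx (assuming a version where louvain_communities is available, roughly 2.8 and later; the karate club graph is just a convenient built-in example):

```python
import networkx as nx

# classic small social network with a known two-faction split
G = nx.karate_club_graph()

# Louvain heuristic: greedily move nodes to raise modularity, then
# collapse communities into super-nodes and repeat
communities = nx.community.louvain_communities(G, seed=1)

# modularity of the resulting partition (higher = cleaner separation)
Q = nx.community.modularity(G, communities)

print(f"{len(communities)} communities, modularity = {Q:.3f}")
for i, nodes in enumerate(communities):
    print(i, sorted(nodes))
```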