2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
text.xml
macbeth.xml
2 changes: 2 additions & 0 deletions .rvmrc
@@ -0,0 +1,2 @@
rvm use ruby-1.9.3-p327
rvm gemset use rails3213
11 changes: 11 additions & 0 deletions bin/shakespeare_analyzer
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
#!/usr/bin/env ruby
load "lib/shakespeare_analyzer.rb"

if ARGV[0].nil? then
  puts "usage: shakespeare_analyzer <input-file>"
  exit 1
end

analyzer = ShakespeareAnalyzer.new(ARGV[0])
analyzer.analyze
analyzer.list_by_speaker_count
Empty file added empty.xml
Empty file.
73 changes: 73 additions & 0 deletions lib/shakespeare_analyzer.rb
@@ -0,0 +1,73 @@
require 'pp'
require 'nokogiri'
require 'net/http'
require 'uri'

class ShakespeareAnalyzer
  def initialize(file)
    @file = file
    @persona = {}
  end

  def analyze
Contributor

This is a pretty large method. I'd try to extract several methods out of this to make things more readable.
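
One way to read this suggestion, as a minimal sketch rather than the PR's actual code: give each step (normalizing a name, tallying one speech, tallying a list of speeches) its own small method. The class name `SpeakerTally`, its method names, and the `[speaker, line_count]` input shape are all hypothetical. XML parsing is left out on purpose so the sketch runs on plain Ruby.

```ruby
# Hypothetical sketch: small, single-purpose methods for tallying lines.
# SpeakerTally and its input shape are assumptions, not part of the PR.
class SpeakerTally
  def initialize
    # Hash.new(0) lets unknown speakers start at zero without a nil check.
    @counts = Hash.new(0)
  end

  # Record one speech: a speaker name and the number of lines it contains.
  def add_speech(speaker, line_count)
    @counts[normalize(speaker)] += line_count
  end

  # Tally a whole list of [speaker, line_count] pairs; returns self for chaining.
  def add_all(speeches)
    speeches.each { |speaker, line_count| add_speech(speaker, line_count) }
    self
  end

  # Return a copy so callers cannot mutate the tally.
  def counts
    @counts.dup
  end

  private

  # Remove double quotes from a name, mirroring the tr('"', '') call in the PR.
  def normalize(name)
    name.tr('"', '')
  end
end

tally = SpeakerTally.new.add_all([["MACBETH", 3], ["\"ROSS\"", 2], ["MACBETH", 1]])
p tally.counts
```

Each method here can be read and tested on its own, which is the readability gain the comment is asking for.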

    #TODO: Probably better name of 'parse'
    @file = checked_file_for_open
    doc = Nokogiri::XML(open(@file)) { |config| config.noerror }
    get_persona(doc)
    analyze_speeches(doc)
  end

  def get_persona(doc)
    doc.css('PERSONA').each do |p|
      pname = p.children.text.tr('"','')
      @persona[pname] = 0
    end
  end

  def analyze_speeches(doc)
    doc.css('SPEECH').each do |speech|
      speech.css('SPEAKER').each do |speaker|
        get_speaker_lines(speaker, speech)
      end
    end
  end

  def get_speaker_lines(speaker, speech)
    ## Sometimes the speaker is 'ALL', but who that is depends on who's on stage...
    ## Ignoring this for now
    ## Turns out there are speakers without @persona!
    speaker_name = speaker.children.text.tr('"','')
    @persona[speaker_name] = 0 if @persona[speaker_name].nil?
    speech.css('LINE').each do |line|
      @persona[speaker_name] += 1
    end
  end

  def list_by_speaker_count
    sorted_output = (@persona.sort_by {|k,v| v}).reverse
    sorted_output.each do |a|
      name = (a[0].split(' ').map {|n| n.capitalize }).join(" ")
      puts "#{a[1]} #{name}"
    end
  end

  def checked_file_for_open
    return @file if File.file?(@file)
    return get_http_file
  end

  def get_http_file
    uri = URI(@file)
    raise "Unreadable file" if uri.scheme != "http"
    @file = 'play.xml'
    Net::HTTP.start(uri.host) do |http|
      resp = http.get(uri.path)
      raise "Unreadable file" unless (resp.code.to_i >= 200 && resp.code.to_i <= 299)
      open(@file, 'wb') do |file|
        file.write(resp.body)
      end
    end
    @file
  end

end
55 changes: 55 additions & 0 deletions macbeth.output
@@ -0,0 +1,55 @@
719 Macbeth
265 Lady Macbeth
212 Malcolm
180 Macduff
135 Ross
113 Banquo
74 Lennox
70 Duncan
62 First Witch
46 Porter
45 Doctor
41 Lady Macduff
39 Hecate
35 Sergeant
30 Siward
30 First Murderer
27 Third Witch
27 Second Witch
23 Gentlewoman
23 Messenger
21 Lord
21 Angus
20 Son
15 Second Murderer
12 Menteith
11 Caithness
11 Old Man
10 Donalbain
8 Third Murderer
7 Young Siward
5 Third Apparition
5 Seyton
5 Servant
4 Second Apparition
3 Lords
2 Fleance
2 Both Murderers
2 First Apparition
1 Attendant
1 Soldiers
0 Lords, Gentlemen, Officers, Soldiers, Murderers, Attendants, And Messengers.
0 Young Siward, His Son.
0 Siward, Earl Of Northumberland, General Of The English Forces.
0 Fleance, Son To Banquo.
0 Boy, Son To Macduff.
0 An English Doctor.
0 A Scotch Doctor.
0 A Soldier.
0 A Porter.
0 An Old Man.
0 Gentlewoman Attending On Lady Macbeth.
0 Three Witches.
0 Apparitions.
0 Seyton, An Officer Attending On Macbeth.
0 Duncan, King Of Scotland.
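
In the output above, tied entries (for example the two 30-line rows) appear in whatever order `sort_by` followed by `reverse` produces, and Ruby does not guarantee `sort_by` to be stable. Cast-list descriptions with zero lines are also mixed in with the actual speakers. The following is a minimal sketch of one alternative, not the PR's behavior: sort on `[-count, name]` for a deterministic order and skip zero-count entries. The helper names `ranked_lines` and `titleize` and the sample hash are hypothetical.

```ruby
# Hypothetical helper: format counts as "<count> <Name>" strings.
# Ties are broken alphabetically; entries with zero lines are dropped.
def ranked_lines(counts)
  counts
    .reject { |_name, count| count.zero? }
    .sort_by { |name, count| [-count, name] }
    .map { |name, count| "#{count} #{titleize(name)}" }
end

# Capitalize each space-separated word, as list_by_speaker_count does in the PR.
def titleize(name)
  name.split(' ').map(&:capitalize).join(' ')
end

# Sample data is illustrative only; the counts are taken from the rows above.
sample = {
  "SIWARD" => 30,
  "FIRST MURDERER" => 30,
  "LADY MACBETH" => 265,
  "A Porter." => 0
}
puts ranked_lines(sample)
```

Whether to hide the zero-count rows is a design choice: they show that the `PERSONA` descriptions never match a `SPEAKER` name, which may be worth keeping visible while debugging.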