But the "--master local[1]" setting they're using for Spark will run it on a single thread.
And in the article they state: "The algorithm took around 500 seconds to train on the NETFLIX dataset on a SINGLE processor, which is good for data as large as 1 billion ratings."
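For anyone unfamiliar with the setting: "local[N]" tells Spark to run everything in a single JVM with N worker threads, so "local[1]" is single-threaded by construction. A minimal sketch of the equivalent in code (the app name and thread count are placeholders, not taken from the article):

    import org.apache.spark.{SparkConf, SparkContext}

    // "local[1]" = one worker thread; "local[N]" = N threads;
    // "local[*]" = one thread per available core.
    val conf = new SparkConf()
      .setAppName("als-benchmark")   // hypothetical app name
      .setMaster("local[30]")        // same effect as spark-submit --master local[30]
    val sc = new SparkContext(conf)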
Being the one who conducted these experiments, I can confirm that the number of threads was varied (the graph shows performance scaling across thread counts). I'm sorry for the confusion caused; this was a typo and should have been "--master local[N]".
edit: local[1] has been updated to local[N]; thank you for the update!
Ok, thanks, I didn't know that's what "local[1]" did. So the more relevant comparison would be with "--master local[30]"?
"The algorithm took around 500 seconds to train on the NETFLIX dataset on a SINGLE processor, which is good for data as large as 1 billion ratings."
- this is from the sequential portion of the test; the parallel portion is covered in the next section.
And in the article they state: "The algorithm took around 500 seconds to train on the NETFLIX dataset on a SINGLE processor, which is good for data as large as 1 billion ratings."
- emphasis mine