Warris, Sven / r-big-data
Commit 6706c19d, authored Sep 29, 2017 by Sven Warris
Merge branch 'master' of git.wageningenur.nl:warri004/r-big-data
Parents: 9e42c7c0, 90ab6764
Changes: 4 files
EvaluationForm.odt
No preview for this file type
SoftwareRequirements.txt
...
...
@@ -15,7 +15,7 @@ Part 4, Big data and Machine learning
doParallel, GGally, ggplot2, numbers, OpenCL, parallel, plot3D, randomForest, rgl, Rmpi, rpart, rpart.plot, snow
Part 3, Big data (Ron):
R packages: pryr, bigmemory, biganalytics, biglm, DBI
R packages: curl, pryr, bigmemory, biganalytics, biglm, DBI
slides/Ron/lecture2.pdf
No preview for this file type
slides/Ron/lecture2.tex
...
...
@@ -553,7 +553,7 @@ welchTest <- function(mns, vrs, ns) {
\begin{column}<+->[t]{.45\textwidth}
The \code{biglm} package:
\begin{itemize}[<+->]
\item create a linear model using only $p^2$ memory for $p$
\item create a linear model using only $O(p^2)$ memory for $p$
variables
\item add more data using \code{update}
\item in this way data sets larger than memory can be handled!
...
...
@@ -574,6 +574,15 @@ welchTest <- function(mns, vrs, ns) {
\begin{frame}[fragile,containsverbatim]
\frametitle{Your turn! The \code{flights14} data again.}
\begin{Schunk}
\begin{Sinput}
> library(data.table)
> flights14 <- fread("https://github.com/arunsrinivasan/
satrdays-workshop/raw/master/flights_2014.csv")
\end{Sinput}
\end{Schunk}
\begin{itemize}
\item compare the speed of selecting a subset for
\begin{itemize}
...
...
@@ -582,18 +591,33 @@ welchTest <- function(mns, vrs, ns) {
\item \code{bigmemory} without file backing
\item \code{bigmemory} with file backing
\end{itemize}
\end{itemize}
\end{frame}
\item Compare \code{biglm} and \code{lm} regression models in terms
of time and memory use: fit, \emph{e.g.}, \code{arr\_delay} as a
function of \code{distance}
\begin{frame}[fragile,containsverbatim]
\frametitle{Big K Means}
\begin{itemize}
\item Perform a k-means clustering on the four last columns of the
flight data. Compare the results of the regular \code{kmeans} and
the \code{bigkmeans} functions. Are the clusters related to,
flight data, using the \code{bigkmeans} function. Choose any $k$
that you like...
\item Compare the results with the results of the regular \code{kmeans}
function (from the stats package).
\item Are the clusters related to,
\emph{e.g.}, season? Is the clustering meaningful, anyway?
\end{itemize}
\end{frame}
\begin{frame}[fragile,containsverbatim]
\frametitle{Big lm}
\begin{itemize}
\item Compare \code{biglm} and \code{lm} regression models in terms
of time and memory use: fit, \emph{e.g.}, \code{arr\_delay} as a
function of \code{distance}.
\item Add a second explanatory variable using \code{update}. Again,
compare memory usage in \code{lm} and \code{biglm}.
\end{itemize}
\end{frame}
\begin{frame}[fragile,containsverbatim]
\frametitle{Caveats and remarks}
\begin{itemize}[<+->]
...
...
@@ -653,7 +677,7 @@ welchTest <- function(mns, vrs, ns) {
\begin{frame}[fragile,containsverbatim]
\frametitle{Sources}
\begin{itemize}
\item Hadley Wickham, once again...: ``Advanced R'', Chapman and
\item Hadley Wickham: ``Advanced R'', Chapman and
Hall, \url{adv-r.had.co.nz}
\item Manual ``Writing R extensions''\\
\url{cran.r-project.org/doc/manuals/r-release/R-exts.html}
...
...
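A note on the $O(p^2)$ memory claim in lecture2.tex: the sketch below may help show why a linear model can be fitted chunk by chunk with only a $p \times p$ amount of state. It uses base R only and accumulates the normal equations ($X^\top X$ and $X^\top y$) over chunks. This is an illustration of the idea, not the algorithm that `biglm` itself uses, and the data are simulated here; only the column names `arr_delay` and `distance` are taken from the exercise.

```r
# Sketch: chunk-wise least squares with O(p^2) state (here p = 2).
# ASSUMPTION: simulated data stand in for the flights data.
# NOTE: accumulating X'X and X'y is NOT biglm's internal method;
# it only illustrates why memory need not grow with the number of rows.
set.seed(1)
n <- 10000
dat <- data.frame(distance = runif(n, 100, 3000))
dat$arr_delay <- 5 + 0.002 * dat$distance + rnorm(n, sd = 10)

# Pretend each chunk is read from disk separately
chunks <- split(dat, rep(1:10, each = n / 10))

xtx <- matrix(0, 2, 2)  # running X'X, p x p
xty <- numeric(2)       # running X'y, length p
for (chunk in chunks) {
  X <- cbind(1, chunk$distance)                      # intercept + distance
  xtx <- xtx + crossprod(X)                          # add this chunk's X'X
  xty <- xty + drop(crossprod(X, chunk$arr_delay))   # add this chunk's X'y
}

beta_chunked <- solve(xtx, xty)  # solve the normal equations
beta_lm <- unname(coef(lm(arr_delay ~ distance, data = dat)))
```

Under these assumptions `beta_chunked` and `beta_lm` should agree up to numerical error, while the loop never holds more than one chunk plus the 2 x 2 and length-2 accumulators.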