Commit 6706c19d authored by Sven Warris

Merge branch 'master' of git.wageningenur.nl:warri004/r-big-data

parents 9e42c7c0 90ab6764
@@ -15,7 +15,7 @@ Part 4, Big data and Machine learning
doParallel, GGally, ggplot2, numbers, OpenCL, parallel, plot3D, randomForest, rgl, Rmpi, rpart, rpart.plot, snow
Part 3, Big data (Ron):
-R packages: pryr, bigmemory, biganalytics, biglm, DBI
+R packages: curl, pryr, bigmemory, biganalytics, biglm, DBI
@@ -553,7 +553,7 @@ welchTest <- function(mns, vrs, ns) {
\begin{column}<+->[t]{.45\textwidth}
The \code{biglm} package:
\begin{itemize}[<+->]
-\item create a linear model using only $p^2$ memory for $p$
+\item create a linear model using only $O(p^2)$ memory for $p$
variables
\item add more data using \code{update}
\item in this way data sets larger than memory can be handled!
@@ -574,6 +574,15 @@ welchTest <- function(mns, vrs, ns) {
\begin{frame}[fragile,containsverbatim]
\frametitle{Your turn! The \code{flights14} data again.}
+\begin{Schunk}
+\begin{Sinput}
+> library(data.table)
+> flights14 <- fread("https://github.com/arunsrinivasan/
+satrdays-workshop/raw/master/flights_2014.csv")
+\end{Sinput}
+\end{Schunk}
\begin{itemize}
\item compare the speed of selecting a subset for
\begin{itemize}
@@ -582,18 +591,33 @@ welchTest <- function(mns, vrs, ns) {
\item \code{bigmemory} without file backing
\item \code{bigmemory} with file backing
\end{itemize}
+\end{itemize}
+\end{frame}
-\item Compare \code{biglm} and \code{lm} regression models in terms
-of time and memory use: fit, \emph{e.g.}, \code{arr\_delay} as a
-function of \code{distance}
+\begin{frame}[fragile,containsverbatim]
+\frametitle{Big K Means}
+\begin{itemize}
\item Perform a k-means clustering on the four last columns of the
-flight data. Compare the results of the regular \code{kmeans} and
-the \code{bigkmeans} functions. Are the clusters related to,
+flight data, using the \code{bigkmeans} function. Choose any $k$
+that you like...
+\item Compare the results with the results of the regular
+\code{kmeans} function (from the stats package).
+\item Are the clusters related to,
\emph{e.g.}, season? Is the clustering meaningful, anyway?
\end{itemize}
\end{frame}
+\begin{frame}[fragile,containsverbatim]
+\frametitle{Big lm}
+\begin{itemize}
+\item Compare \code{biglm} and \code{lm} regression models in terms
+of time and memory use: fit, \emph{e.g.}, \code{arr\_delay} as a
+function of \code{distance}.
+\item Add a second explanatory variable using \code{update}. Again,
+compare memory usage in \code{lm} and \code{biglm}.
+\end{itemize}
+\end{frame}
\begin{frame}[fragile,containsverbatim]
\frametitle{Caveats and remarks}
\begin{itemize}[<+->]
@@ -653,7 +677,7 @@ welchTest <- function(mns, vrs, ns) {
\begin{frame}[fragile,containsverbatim]
\frametitle{Sources}
\begin{itemize}
-\item Hadley Wickham, once again...: ``Advanced R'', Chapman and
+\item Hadley Wickham: ``Advanced R'', Chapman and
Hall, \url{adv-r.had.co.nz}
\item Manual ``Writing R extensions''\\
\url{cran.r-project.org/doc/manuals/r-release/R-exts.html}
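
The k-means comparison asked for in the "Big K Means" frame could be sketched roughly as follows. This is a minimal illustration on simulated data, not on the flights14 columns; the object names x and bx and the choice k = 2 are assumptions made for the example. The regular kmeans function is the one shipped in the stats package, and bigkmeans comes from biganalytics.

```r
library(bigmemory)     # provides as.big.matrix
library(biganalytics)  # provides bigkmeans

set.seed(42)

# Simulated stand-in for four numeric columns: two well-separated groups
x <- rbind(matrix(rnorm(200, mean = 0), ncol = 4),
           matrix(rnorm(200, mean = 5), ncol = 4))

# In-memory big.matrix copy of the same data
bx <- as.big.matrix(x)

# Regular k-means (stats package) and the big.matrix variant
km  <- kmeans(x, centers = 2)
bkm <- bigkmeans(bx, centers = 2)

# Cross-tabulate the assignments; cluster labels are arbitrary,
# so agreement may show up on the off-diagonal
table(kmeans = km$cluster, bigkmeans = bkm$cluster)
```

Because the cluster numbering is arbitrary in both functions, the cross-table is more informative than comparing the label vectors element by element.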
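
For the "Big lm" exercise, the idea that biglm can take in data piece by piece through update can be sketched as below. The data are simulated (an assumption standing in for flights14), the four-way split into chunks is arbitrary, and the names dat, chunks, fit_big and fit_lm are hypothetical.

```r
library(biglm)

set.seed(1)
n <- 100000

# Simulated stand-in for the flight data (not flights14 itself)
dat <- data.frame(distance = runif(n, 100, 3000))
dat$arr_delay <- 5 + 0.002 * dat$distance + rnorm(n, sd = 10)

# Split the rows into four chunks, as if each arrived separately
chunks <- split(dat, rep(1:4, length.out = n))

# Fit on the first chunk, then fold in the remaining chunks with update()
fit_big <- biglm(arr_delay ~ distance, data = chunks[[1]])
for (chunk in chunks[-1]) {
  fit_big <- update(fit_big, chunk)
}

# Ordinary lm on the full data for comparison
fit_lm <- lm(arr_delay ~ distance, data = dat)

# The two sets of coefficients should agree up to numerical tolerance
all.equal(coef(fit_big), coef(fit_lm))

# lm stores residuals, fitted values and the model frame; biglm does not
print(object.size(fit_lm), units = "Kb")
print(object.size(fit_big), units = "Kb")
```

Wrapping either fit in system.time() gives the timing half of the comparison; with real data each chunk would be read from disk in turn rather than split from an in-memory data frame.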