.. _assignments:
Homework
========
.. sidebar:: Page Contents
.. contents::
:local:
Assignments
----------------------------------------------------------------------
Unless otherwise stated, the homework is the same in all sections and
classes. Lectures are assigned on Fridays and the homework is due on
the Friday of the following week. The first week of the semester is
the exception: the lectures are assigned on Monday (22nd of August)
and the first homework is due Friday. We have therefore not posted
explicit due dates, as they follow from the calendar. You are welcome
to work ahead, but check back in case the homework has been updated.
Any additional due dates will be posted in Canvas, so please check
Canvas for them.
As you will be taking part in discussions, please PREFACE YOUR POSTS
with your full name.
Homework submission works as follows:
#. All assignments will be posted through Canvas.
#. You will be provided with a GitLab folder once you register at https://about.gitlab.com/
#. You will complete your assignments and check in your solutions to
your gitlab.com repository (see :doc:`gitlab`).
#. You will submit a link to your solution in GitLab to Canvas.
Study groups
~~~~~~~~~~~~
Forming study groups to discuss the content of the class with each
other is common and encouraged. However, such groups must not be
used to copy homework assignments that are intended for individual
submission.
When working in a team, we recommend that you use English as the
communication language. This will help those who are not native
English speakers.
Week 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Communication
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. _Piazza: https://piazza.com/class/irqfvh1ctrg2vt
* Enroll in the class at Piazza_
* Register in https://about.gitlab.com/
* Register in https://www.chameleoncloud.org
Resources res1
^^^^^^^^^^^^^^
* If you do not have a computer on which you can do your
assignments, please apply for an account with Chameleon Cloud.
You will have to ask to be added to project
CH-818144: https://www.chameleoncloud.org/user/projects/33130/
Note: You will only be allowed to use VMs for a duration of 6
hours.
* Register in https://portal.futuresystems.org/
.. _SURVEY1:
.. _d1: https://piazza.com/class/irqfvh1ctrg2vt?cid=10
.. _d2: https://piazza.com/class/irqfvh1ctrg2vt?cid=11
.. _d3: https://piazza.com/class/irqfvh1ctrg2vt?cid=12
.. _d4: https://piazza.com/class/irqfvh1ctrg2vt?cid=16
.. _d5: https://piazza.com/class/irqfvh1ctrg2vt?cid=17
.. _d6: https://piazza.com/class/irqfvh1ctrg2vt?cid=18
.. _d7: https://piazza.com/class/irqfvh1ctrg2vt?cid=19
.. _d8: https://piazza.com/class/irqfvh1ctrg2vt?cid=20
.. _d9: https://piazza.com/class/irqfvh1ctrg2vt?cid=21
.. _d10: https://piazza.com/class/irqfvh1ctrg2vt?cid=22
.. _d11: https://piazza.com/class/irqfvh1ctrg2vt?cid=23
.. _d12: https://piazza.com/class/irqfvh1ctrg2vt?cid=24
.. _d13: https://piazza.com/class/irqfvh1ctrg2vt?cid=25
.. _d14: https://piazza.com/class/irqfvh1ctrg2vt?cid=26
Survey 1
^^^^^^^^^
Please fill out the `Survey `_
to let us help you better with the course
Video V1
^^^^^^^^^
Watch Videos in Section 1: Units 1 and 2 at the Course Page :doc:`course`
Video V2
^^^^^^^^^
Watch Videos in Section 2: Units 3, 4, and 5. Note these units
have overlap with Unit 2 of Section 1. (see :doc:`course`)
Discussion d1
^^^^^^^^^^^^^^
Consider Discussion d1_ after Section 1. Please create a new post on the topic "Why
is Big Data interesting to me" and also comment on at least 2
other posts.
.. _P1:
Paper p1
^^^^^^^^^
This assignment may be done in a group of at most two
students. It is up to you to find another student, or you can
write the paper by yourself. You do not need to keep this team for the
rest of the semester or for the project assignment; you can build new
teams throughout the semester for different homework. Make sure
both team members contribute equally.
This assignment requires you to write a paper that is 2 pages in
length. Please use the 2 column ACM proceedings format.
- Conduct the Discussion homework first.
- Review what plagiarism is and how to avoid it.
- Install JabRef and organize your citations with it.
Write a paper discussing all of the following topics:
- What is Big Data?
- Why is Big Data interesting to me? (Summarize and/or contrast
positions in the discussion list. This is not just your
position. See our note below.)
- What limitations does Big Data Analytics have?
- If you work in a team, please also discuss differing positions
if there are any. Make sure the work is shared and that no academic
honesty policy is violated.
Please note that you need to analyze the discussion that took place
on the discussion list. It is important that you summarize the
positions and identify a mechanism to evaluate the students'
responses. One option is to augment your discussion
with classifications and statistics, which you may include
as figures in the paper. Alternatively, you may just highlight selected
points raised by the course members.
You will be submitting the paper in gitlab.com as discussed in:
http://bdaafall2016.readthedocs.io/en/latest/gitlab.html
You will be uploading the following files into the paper1
directory::
paper1.tex
sample.bib
paper1.pdf
After you upload the files, please go to Canvas and fill out the
form for the paper1 submission. You will have to upload the
appropriate links.
----------------------------------------------------------------------
Week 2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Video V3
^^^^^^^^^
Please watch Section 3 Unit 6. Total length: 2.5 hours (see :doc:`course`).
Discussion d3
^^^^^^^^^^^^^^
Consider Discussion d3_ after Section 3. Please post
about the topic "Where are the Big Data Jobs now and in
future? Discuss anything you can share -- areas that are
hot, good online sites etc." and also comment on at least
2 other posts.
.. _P2:
Paper p2
^^^^^^^^^
This assignment requires you to write a paper that is two pages in
length. Please use the 2 column ACM proceedings format.
Write a paper discussing the following topics:
* What is the role of Big Data in health?
* Discuss any or all areas from telemedicine, personalized
(precision) medicine, personal monitors like Fitbit,
privacy issues.
You will be submitting the paper in gitlab.com as discussed in:
http://bdaafall2016.readthedocs.io/en/latest/gitlab.html
You will be uploading the following files into the paper2
directory::
paper2.tex
sample.bib
paper2.pdf
After you upload the files, please go to Canvas and fill out the
form for the paper2 submission. You will have to upload the
appropriate links.
A video showing how to use the web browser to upload the paper is
available at:
* https://youtu.be/b3OvgQhTFow
Video in cc: TBD
.. _R1:
References R1
^^^^^^^^^^^^^
It is important that you know how to cite. Please see the page
:doc:`n-resources` for guidelines.
Bonus points: Use d2_ to discuss the topic of
crowdsourcing in relation to Big Data. Conduct research
if needed.
----------------------------------------------------------------------
Week 3
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Video V4
^^^^^^^^^
Please watch Section 4 Units 7-9. Total length: 3.5 hours (see :doc:`course`).
Discussion d4
^^^^^^^^^^^^^^
Consider Discussion d4_ after Section 4. Please post on the
topic "Sports and Health Informatics":
* Which are the most interesting job areas?
* Which are likely to make the most progress?
* Which one would you work in, given similar offers in both
fields?
* Comment on at least 2 other posts.
.. _P3:
Paper p3
^^^^^^^^^
This assignment requires you to write a paper that is one to two pages in
length. Please use the 2 column ACM proceedings format.
This assignment may be done in a group of at most two
students. It is up to you to find another student, or
you can write the paper by yourself. You do not need to keep
this team for the rest of the semester or for the project assignment; you
can build new teams throughout the semester for different
homework. Make sure both team members contribute equally.
Choose one of the alternatives:
Alternative A:
Using what we call Big Data (such as video) and Little Data
(such as Baseball numerical statistics) in Sports
Analytics. Write a paper discussing the following topics:
* Which offers the most opportunity, and in which sports?
* How are Big Data and Little Data applied to the 2016 Olympics?
Alternative B (This assignment gives bonus points if done right):
How can Big Data and Little Data be used in wildlife
conservation, pets, farming, and other related areas that
involve animals? Write a 2 page paper that covers the topic
and addresses:
* Which opportunities are there related to animals?
* Which opportunities are there for wildlife preservation?
* What limitations are there?
* How can Big Data best be used? Give concrete examples.
* This paper could be longer than two pages if you like
* You are allowed to work in a team of six. The number of
pages is determined by team members while the minimum page
number is 2. The team must identify who did what.
* However the paper must be coherent and consistent.
* Additional pages are allowed.
* When building teams the entire team must approve the team
members.
* If a team does not want to have you join, you need to
accept this. Look for another team or work alone.
* Use gitlab to share your LaTeX document or use Microsoft
OneDrive to write it collaboratively.
----------------------------------------------------------------------
Week 4
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Video V5
^^^^^^^^^
see next section
.. comment::
Please watch Section 5 Units 10, 11. Total Length 2.5 hours.
Development Virtual Machine
^^^^^^^^^^^^^^^^^^^^^^^^^^^
To develop code easily without affecting your local machine, we will
be using Ubuntu Desktop in a virtual machine running on your
computer. Please make sure your hardware supports this. For example, a
Chromebook is insufficient.
The detailed description, including 3 videos, is posted at:
* http://bdaafall2016.readthedocs.io/en/latest/ubuntu.html
Please complete Homework 1, 2, and 3 from that page.
Next you will be using Python in that virtual machine.
.. note:: You can use your native OS to do the programming
assignment. However, if you would like to use any cloud environment,
you must also do the Development Virtual Machine homework, as we want
you to get a feeling for how to use Ubuntu before you go on
the cloud.
.. _PRG1:
Programming prg1: Python
^^^^^^^^^^^^^^^^^^^^^^^^
Hardware:
Identify a suitable hardware environment that works
for you to conduct the assignments. First you must have access
to a sufficiently powerful computer. This could be your Laptop
or Desktop, or you could get access to machines at IU's
computer labs or virtual machines.
Setup Python:
Next you will need to set up Python on the machine or
verify that Python works. We recommend that you use Python 2.7
and *NOT* Python 3. We recommend that you follow the
instructions from python.org and use virtualenv. As an editor
we recommend PyCharm or Emacs.
Canopy and Anaconda:
We have had bad experiences with Canopy as well
as Anaconda on a Teaching Assistant's machine. Therefore
we recommend against using these systems. It is up to you
to determine whether these systems work for you. We do recommend
that you use python.org and virtualenv. If you have already
started using Canopy or Anaconda, you can continue to do so (but we
do not recommend it).
Useful software:
- `Python `_
- `NumPy `_
- `SciPy `_
- `Matplotlib `_
- `Pandas `_
Tasks:
* Learn Python, e.g. go through the :doc:`python_big_data`
(and :doc:`python_intro` if you need to) lesson.
* Use *virtualenv* and *pip* to customize your environment.
* Learn `Python pandas ` and write a
simple Python application demonstrating:
* a line chart
* a bar chart, e.g. a histogram
Find some real, meaningful data, such as the number of people born
in a year or some other more interesting data set, to
demonstrate the various features.
* Review SciPy: look at the SciPy manual and be aware of what
you can do with it in case you choose a project.
Deliverables prg1:
The goal of this assignment is to choose one or two datasets
(see :doc:`datasets`), preprocess them to clean them up, and
generate a line graph and a histogram plot. Your figures must
provide labels for the axes along with units.
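As a rough illustration of the labeling requirement, a line chart script could be organized as in the sketch below. It is not a reference solution. It assumes a two-column CSV file with a header row whose column names already carry the units (for example ``births (thousands)``); the function names ``load_series`` and ``plot_line`` are made up for this example.

```python
import csv
import sys


def load_series(path):
    """Read (x, y) pairs from a two-column CSV file with a header row."""
    with open(path) as handle:
        reader = csv.reader(handle)
        header = next(reader)
        # Skip empty lines; every other row must hold two numbers.
        rows = [(float(row[0]), float(row[1])) for row in reader if row]
    return header, rows


def plot_line(header, rows, out_path="linechart.png"):
    """Draw the series as a line chart and save it as a PNG file."""
    # Imported here so that load_series works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    fig, ax = plt.subplots()
    ax.plot(xs, ys)
    ax.set_xlabel(header[0])  # axis labels come from the CSV header
    ax.set_ylabel(header[1])
    fig.savefig(out_path)
    plt.close(fig)


# Runs only when called as: python linechart.py data-line.csv
if __name__ == "__main__" and len(sys.argv) == 2:
    plot_line(*load_series(sys.argv[1]))
```

Taking the axis labels from the CSV header keeps the units next to the data, so the README and the figure cannot drift apart.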
Submit your programs in a folder called ``prg1``, which
must contain the following:
* ``requirements.txt``: list of python libraries your programs
need as installable by: ``pip install -r requirements.txt``
* ``fetchdata.py``: a python program that, when run as ``python
fetchdata.py`` will produce dataset files in CSV format
called ``data-line.csv`` and ``data-hist.csv``.
* ``linechart.py``: a python program that, when run as
``python linechart.py data-line.csv`` will generate a
line chart and save it in PNG format to a file called ``linechart.png``.
* ``histogram.py``: a python program that, when run as
``python histogram.py data-hist.csv`` will generate
a histogram plot and save it in PNG format to a file called
``histogram.png``
* ``README.rst``: an RST format file which documents the
datasets you used, where you fetched them from, and how ``fetchdata.py``
cleans them to generate the ``data-{line,hist}.csv`` files.
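How ``fetchdata.py`` cleans a dataset depends on the data you pick, so the following is only a minimal sketch of one possible cleaning step. It assumes that every column is numeric and that rows with the wrong number of fields or with unparsable values should simply be dropped. The helper names ``clean_rows`` and ``write_csv`` are hypothetical, and the download step is left out because it differs per data source.

```python
import csv


def clean_rows(rows, n_columns=2):
    """Keep rows that have n_columns fields which all parse as numbers."""
    cleaned = []
    for row in rows:
        fields = [field.strip() for field in row]
        if len(fields) != n_columns:
            continue  # wrong number of columns
        try:
            cleaned.append([float(field) for field in fields])
        except ValueError:
            continue  # empty or non-numeric value
    return cleaned


def write_csv(path, header, rows):
    """Write a header line followed by the cleaned rows."""
    with open(path, "w") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
```

Whatever rule you choose for dropping or repairing rows, describe it in ``README.rst`` so the cleaning can be reproduced.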
.. warning::
Missing items will result in zero points being given
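Because missing items cost all points, it may be worth checking the folder before you submit. The sketch below lists the five files named in the deliverables above; the function name ``missing_items`` is made up for this example.

```python
import os

# File names taken from the list of prg1 deliverables.
REQUIRED_FILES = [
    "requirements.txt",
    "fetchdata.py",
    "linechart.py",
    "histogram.py",
    "README.rst",
]


def missing_items(folder="prg1"):
    """Return the required file names that are absent from folder."""
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(folder, name))]
```

Calling ``missing_items("prg1")`` from the top of your repository returns an empty list when all required files are present.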
Term Paper and Term Project Report Assignment T1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Please prepare for the selection process for a project or a term paper:
* Review the guidelines for the project and term paper.
* Identify if you are likely to do a project or a term paper
* Build teams and choose your team members wisely. For example, if you
have 3 people in the team and only two do the work, you still get
graded based on a 3 person team.
* Decide on a topic that you want to do and on the team. Commit to
it by the end of Week 5.
* For that week the homework also includes making a plan for
your term paper and writing a one page summary, which we will
approve and give comments on. If you are in a team, each student must
submit an (identical) plan with a notation as to teaming. Note that
teaming can change in the actual final project.
* You will be completing this
`Form `_
throughout the semester. In it you will upload the title,
the team members, the location of your proposal in gitlab
with a direct URL, a description of the artifacts, and the final
project report.
Discussion d5
^^^^^^^^^^^^^^
Create a NEW post to discuss your final project you want to do
and look for team members (if you want to build a team).
----------------------------------------------------------------------
Week 5
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Video S6
^^^^^^^^^
Watch the video in Section 6 (see :doc:`course`).
Futuresystems
^^^^^^^^^^^^^^
* Obtain an account on Futuresystems.org and join project
FG511. Note that this will take time and you need to do this
ASAP. No late assignments will be accepted. If you are late,
this assignment will receive 0 points.
Which account name should I use?:
The same name as you use at IU to register. If you have
had a previous class and used a different name, please
let us know, so we can make a note of it. Please do not
apply for two accounts. If your account name is already
taken, please use a different one.
ChameleonCloud
^^^^^^^^^^^^^^^
* Obtain an account on https://www.chameleoncloud.org. Fill
out the Poll TBD. (This assignment is optional, but we have
had good experiences with Chameleon Cloud, so we advise you
to get an account. As you are a student, you will not be able
to create a project. We will announce in due time the project
that you can join to use Chameleon Cloud.)
OpenStack
^^^^^^^^^^
* Inform yourself about OpenStack and how to start and stop
virtual machines via the command line.
* Optionally, you can use cloudmesh_client for this (If you
use cloudmesh client you will get bonus points).
prg2 (canceled)
^^^^^^^^^^^^^^^
Consider the Python code available on the Section 6 Unit 13
“Files” tab (the third one) as HiggsClassIIUniform.py. This
software is also available at the GitHub link at the end of this
section. When run, it should produce results like the file
TypicalResultsHW5.docx on the same tab. This code corresponds to
42000 background events and 300 Higgs events. The background is
uniformly distributed and the Higgs signal is a Normal (Gaussian)
distribution centered at 126 with a width of 2. Produce 2 more
figures (plots) corresponding to experiments with a factor of 10
more or a factor of 10 less data. (Both Higgs and background
increase or decrease by the same factor.) Return the two new
figures and your code as homework in github under the folder
*prg2*.

What do you conclude from the figures about the ability to see the
Higgs particle with different amounts of data (corresponding to
different lengths of time the experiment runs)? Due date: October
25.

Video V6: Review and study Section 7, Units 12-15 (total 3 hours
7 minutes). This is the Physics Informatics section.

https://github.com/cglmoocs/bdaafall2015/tree/master/PythonFiles/Section-4_Physics-Units-9-10-11/Unit-9_The-Elusive-Mr.-Higgs
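The experiment described above can be imitated with the standard library alone. The sketch below is not the course's HiggsClassIIUniform.py. The event counts and the Gaussian centered at 126 come from the text; reading "width of 2" as the standard deviation, the mass window from 110 to 140, the 15 bins, and the names ``simulate`` and ``histogram`` are assumptions made for this example.

```python
import random


def simulate(scale=1.0, low=110.0, high=140.0, seed=0):
    """Return simulated masses: uniform background plus a Gaussian peak."""
    rng = random.Random(seed)
    n_background = int(round(42000 * scale))
    n_higgs = int(round(300 * scale))
    background = [rng.uniform(low, high) for _ in range(n_background)]
    # Assumption: the "width of 2" is the standard deviation of the peak.
    higgs = [rng.gauss(126.0, 2.0) for _ in range(n_higgs)]
    return background + higgs


def histogram(values, low=110.0, high=140.0, n_bins=15):
    """Count values per equal-width bin; values outside [low, high) are dropped."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for value in values:
        index = int((value - low) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts
```

Calling ``histogram(simulate(scale))`` for ``scale`` values of 0.1, 1, and 10 gives the bin counts for the three data volumes, which can then be plotted and compared.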
Discussion d6
^^^^^^^^^^^^^^
Post on Discussion d6_ after Section 7, the “Physics” topic:
* What you found interesting, remarkable or shocking about
the search for Higgs Bosons.
* Was it worth all that money?
* Please also comment on at least 2 other posts.
----------------------------------------------------------------------
Week 6
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Video S7
^^^^^^^^^
Watch the videos in section 7 (see :doc:`course`).
Discussion d7
^^^^^^^^^^^^^^
Post on Discussion d7_ on the topic:
* Which is the most interesting/important of the 51
use cases in Section 7?
* Why?
* What is the most interesting/important use case not
in the group of 51?
* Please write one post and comment on at least 2 other
posts in the discussions.
----------------------------------------------------------------------
Week 7
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This week's lecture will be determined at a later time.
----------------------------------------------------------------------
Week 8
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Video S9
^^^^^^^^^
Watch the videos related to Section 9 (see :doc:`course`).
Discussion d9
^^^^^^^^^^^^^^
Post on Discussion d9_:
* What are the benefits for e-Commerce?
* What are the limitations for e-Commerce?
* What are the risks and benefits for the banking industry of using
Big Data?
----------------------------------------------------------------------
Week 9
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Video S10
^^^^^^^^^
Watch the videos related to Section 10 (see :doc:`course`).
Discussion d10
^^^^^^^^^^^^^^^^^
Use Discussion d10_ in case you have questions about PRG-GEO
Programming prg-geo
^^^^^^^^^^^^^^^^^^^
PRG-GEO can be found here: :ref:`geolocation`
.. comment::
Develop a python program conducting k-means. Use a
meaningful dataset of your choice but not just
random. Produce a histogram that shows the distance of all
points to their nearest cluster center.
For visualization you can choose a Python library, or you
can use D3.js and a histogram library based on it, if you
are familiar with it.
Submit your solution to gitlab in the directory *prg3*
Discuss in your solution the details of the dataset.
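A k-means run followed by the distance-to-nearest-center computation could be sketched as follows. This is a plain Python illustration of Lloyd's iteration on 2-D points, not a reference solution. The function names, the fixed number of iterations, and the choice of initial centers by random sampling are assumptions made for this example.

```python
import math
import random


def nearest(point, centers):
    """Return (index, distance) of the center closest to point."""
    distances = [math.hypot(point[0] - c[0], point[1] - c[1]) for c in centers]
    index = min(range(len(centers)), key=distances.__getitem__)
    return index, distances[index]


def mean_point(cluster):
    """Return the coordinate-wise mean of a non-empty list of 2-D points."""
    n = float(len(cluster))
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)


def kmeans(points, k, iterations=20, seed=0):
    """Run Lloyd's iteration on 2-D points and return the final centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[nearest(point, centers)[0]].append(point)
        # An empty cluster keeps its previous center.
        centers = [mean_point(cluster) if cluster else center
                   for cluster, center in zip(clusters, centers)]
    return centers


def distances_to_nearest(points, centers):
    """Distance of every point to its nearest center (the histogram input)."""
    return [nearest(point, centers)[1] for point in points]
```

The list returned by ``distances_to_nearest`` is what the requested histogram would be built from.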
----------------------------------------------------------------------
Week 10
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Discussion d11
^^^^^^^^^^^^^^^
Discuss what you learnt from the video you watched in
S11: Parallel Computing and Clouds under Discussion d11_.
.. _p11:
Paper p11
^^^^^^^^^^
Consider any 5 cloud or cloud-like activities from the list of 11 below.
Describe the ones you chose and explain in what ways they could be used
to generate an X-Informatics for some X. Write a 2 page paper with
the paper format from Section :ref:`s_paper_format`:
* http://aws.amazon.com/
* http://www.windowsazure.com/en-us/
* https://cloud.google.com/compute/
* https://portal.futuresystems.org/
* http://joyent.com/
* https://pod.penguincomputing.com/
* http://www.rackspace.com/cloud/
* http://www.salesforce.com/cloudcomputing/
* http://earthengine.google.org/
* http://www.openstack.org/
* https://www.docker.com/
----------------------------------------------------------------------
Week 11 - Week 13
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Project or Term Report
^^^^^^^^^^^^^^^^^^^^^^^
Work on your project
Discussion 11, 12, 13, 14
^^^^^^^^^^^^^^^^^^^^^^^^^^
Discuss what you learnt from the videos you watched in the last 2
weeks of class, Sections 12-15. Choose one of the topics: Web
Search and Text Mining, Big Data Technology, Sensors, Radar.
Each `Discussion `_ about a topic is to be conducted in the
week it is introduced. Due dates are Fridays.
Week 13 - Dec. 2nd
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Continue to work on your Term Paper or Project.
The due date for the project is Dec 2nd. It will take a considerable
amount of time to grade your projects and term papers, so
the deadline is mandatory. Late projects and term papers
will receive a 10% grade reduction. Furthermore, depending on
when the project is handed in, it may not be graded over the
Christmas break.
Assignment Guidelines
----------------------------------------------------------------------
Getting Access and Systems Support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For some projects you will need access to a cloud. We recommend you
evaluate which cloud would be most appropriate for your project. This
includes:
* chameleoncloud.org
* futuresystems.org
* AWS (you will be responsible for charges)
* Azure (you will be responsible for charges)
* VirtualBox, if you have a powerful computer and like to prototype
* other clouds
We intend to make a small number of virtual machines available for
use in project FG511 on FutureSystems:
* https://portal.futuresystems.org/projects/511
.. note:: The FutureSystems OpenStack cloud is currently being updated and will
not be available until September.
Documentation about FutureSystems can be found at
:ref:`OpenStackFutureSystems `
Once you have created an account on FutureSystems and you do a project, you
can add yourself to the project so you gain access. Systems staff are
available only during regular business hours, Mon-Fri 10am - 4pm.
You could also use the cloudmesh client software on Linux and OSX to
access multiple clouds easily. A section will introduce this
software.
----------------------------------------------------------------------
.. _s_paper_format:
Report and Paper Format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All reports and paper assignments will be using the ACM proceedings
format. The MSWord template can be found here:
* :download:`paper-report.docx `
* The URL is
https://gitlab.com/cloudmesh/fall2016/blob/master/docs/source/files/paper-report.docx
A LaTeX version can be found at
* https://www.acm.org/publications/proceedings-template
However, you have to remove the ACM copyright notice in the LaTeX version.
There will be **NO EXCEPTION** to this format. If you are in a
team, you can either use gitlab to develop the LaTeX document
collaboratively or use Microsoft OneDrive, which offers collaborative
editing features. All bibliographical entries must be put into a
bibliography manager such as jabref, endnote, or Mendeley. This will
help ensure that you follow proper citation styles. You can use either
ACM or IEEE reference styles. Your final submission will include the
bibliography file as a separate document.
Documents that do not follow the ACM format and are not accompanied by
references managed with jabref or endnote or are not spell checked
will be returned without review.
Please do not use figures or tables to artificially inflate the
length of the report. Make figures readable and provide the original
images. Use PDF for figures and not png, gif, or jpeg. This way the
figures you produce are scalable and zooming into the paper will be
possible.
Report Checklist:
* [ ] Have you written the report in Word or LaTeX in the specified
format?
* [ ] In case of LaTeX, have you removed the ACM copyright information?
* [ ] Have you included the report in gitlab?
* [ ] Have you specified the names and e-mails of all team members in
your report, e.g. the username in Canvas?
* [ ] Have you included all images in native and PDF format in gitlab
in the images folder?
* [ ] Have you added the bibliography file (such as an endnote or bibtex
file, e.g. from jabref) in a directory bib?
* [ ] Have you submitted an additional page that describes who did
what in the project or report?
* [ ] Have you spellchecked the paper?
* [ ] Have you made sure you do not plagiarize?
----------------------------------------------------------------------
Software Project
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Develop a software system with OpenStack available on FutureSystems or
Chameleoncloud to support it. Only choose the software option if you
are prepared to take on programming tasks.
In case of a software project, we encourage a group project with up to
three members. You can use the discussion list for the
`Software Project `_
to form project teams or just communicate privately with other class
members to formulate a team. The following artifacts are part of the
deliverables for a project
Code:
You must deliver the code in gitlab. The code must be compilable,
and a TA may try to reproduce your setup and run your code. You MUST avoid
lengthy install descriptions, and everything must be installable
from the command line.
Project Report:
A report must be produced while using the format discussed in the
Report Format section. The following length is required:
* 4 pages, one student in the project
* 6 pages, two students in the project
* 8 pages, three students in the project
Reports can be longer up to 10 pages if needed. Your high quality
scientific report should describe a) What you did b) results
obtained and c) Software documentation including how to install,
and run it. If c) is longer than half a page and can not be
reproduced with shell scripts or easy to follow steps you will get
points deducted.
Work Breakdown:
This document is only needed for team projects. It is a one page PDF
document describing who did what. It includes pointers to
the git history and its statistics, which demonstrate that more than
one student has worked on the project.
License:
All projects are developed under an open source license such as
Apache 2.0 License, or similar. You will be required to add a
LICENCE.txt file and if you use other software identify how it can be
reused in your project. If your project uses different licenses,
please add in a README.rst file which packages are used and which
license these packages have.
Code Repository:
Code repositories are for code. If you need additional libraries,
you need to develop a script or use a DevOps
framework to install such software. Thus zip files and .class or .o
files are not permissible in the project. Each project must be
reproducible with a simple script. An example is::
git clone ....
make install
make run
make view
This would use a simple makefile to install, run, and view the
results. Naturally you can also use ansible or shell scripts. It is not
permissible to use GUI based, preinstalled DevOps
frameworks. Everything must be installable from the command line.
Datasets that may inspire projects can be found in :doc:`datasets`.
You should also review :ref:`sampleprojects`.
----------------------------------------------------------------------
Term Paper
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Term Report:
If you choose the term paper, you or your team will pick a topic
relevant for the class. You will write a high quality scholarly paper
about this topic. This includes scientifically examining technologies and
applications.
Content Rules:
Material may be taken from other sources, but it
must amount to at most 25% of the paper and must be cited. Figures may
be used (citations in the figure caption are required).
As usual, proper citations and quotations must be given for such
content. The quality should be similar to a publishable paper or
technical report. Plagiarism is not allowed.
Proposal:
The topic should be close to what you propose. Please contact
me if you change the topic significantly. Also inform me if you change teaming.
These changes are allowed; we just need to know, review, and approve.
You can use the discussion list for the
`Term Paper `_
to form project teams or just communicate privately with other class
members to formulate a team.
Deliverables:
The following artifacts are part of the deliverables for a term
paper. A report must be produced while using the format discussed
in the Report Format section. The following length is required:
* 6 pages, one student in the project
* 9 pages, two students in the project
* 12 pages, three students in the project
A gitlab repository will contain the paper you wrote in PDF and
in docx or LaTeX. All images will be in an images folder and be
clearly marked. All bibtex or endnote files will be included in
the repository.
Work Breakdown:
This document is only needed for team projects. It is a one page PDF
document describing who did what. The document is called
workbreakdown.pdf.
The directory structure thus looks like::
./paper.docx
./paper.pdf
./references.enl
./images/myniftyimage-fig1.pptx
./images/myniftyimage-fig1.pdf
Possible Term Paper Topics:
* Big Data and Agriculture
* Big Data and Transportation
* Big Data and Home Automation
* Big Data and Internet of Things
* Big Data and Olympics
* Big Data and Environment
* Big Data and Astrophysics
* Big Data and Deep Learning
* Big Data and Biology
* Survey of Big Data Applications (Difficult as it is a lot of work. This is
a 3 person project only and at least 15 pages are required, where an
additional three pages are given for references.)
* Big Data and "Suggest your own"
* Review of Recommender Systems: technology & applications
* Review of Big Data in Bioinformatics
* Review of Data visualization including high dimensional data
* Design of a NoSQL database for a specialized application
----------------------------------------------------------------------
Project Proposal
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Project and Term Paper Proposal Format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Please submit a one page ACM style 2 column paper in which you include
the following information, depending on whether you do a term paper or a
project. The title will be preceded by the keyword "PROJECT" or "REPORT".
A project proposal should contain in the proposal section:
* The nature of the project and its context
* The technologies used
* Any proprietary issues
* Specific aims you intend to complete
* A list of intended deliverables (artifacts produced)
Title:
* REPORT: Your title
or
* PROJECT: Your title
Authors:
The authors need to be listed in the proposal with full name,
e-mail, and gitlab username. If you use futuresystems or
chameleoncloud, you will also need to add your futuresystems or
chameleoncloud name. Please put the prefix futuresystems: and/or
chameleon: in the author field accordingly. Please only include them if
you have used the resources. If you do not use the resources for
the project or report, there is no need to include them.
Example::
Gregor von Laszewski
laszewski@gmail.com
chameleon: gregor
futuresystems: gvl
Abstract:
Include in your abstract a short summary of the report or
project
Proposal:
Include a section called proposal in which you in detail
describe what you will do.
Artifacts:
Include a section Artifacts describing what you will produce
and where you will store it.
Examples are:
* A Survey Paper
* Code on gitlab
* Screenshots
* ...
Homework upload
---------------
A video showing how to use the web browser to upload the paper is available at:
Video: https://youtu.be/b3OvgQhTFow
Video in cc: TBD
Naturally, if you know how to use the git command line tool, use that;
you will have to master it once you start working on your project or
term paper.