European Data Science Academy Register - V1.4 (July 2016)

| Work Package | Associated costs and how these will be covered: Do you need to purchase storage? How much time will it take for a person to manage the data, and how will this be covered? | Generated or collected | Standards: What standards and methodologies will be utilised for data collection and management? | Data sharing: Licensing, data protection, ownership and copyright | Archiving and preservation: How long should the data be preserved? How will preservation extend beyond the length of the project if necessary? |
|---|---|---|---|---|---|
| WP1 | Approximately 1 day person effort per month | Collected | All collected data is translated into CSV format. | The terms of the LinkedIn user agreement now forbid harvesting and collection of data without express permission. This was not the case when the data was collected. https://www.linkedin.com/legal/user-agreement?trk=hb_ft_userag | Until the end of the project |
| WP1 | Approximately 1 day person effort per month | Collected | All collected data is translated into CSV format. | The data will be available for use via the EDSA dashboard; however, it will not be available for download, as this would contravene Trovit’s terms and conditions. | Until the end of the project |
| WP1 | Approximately 1 day effort per month | Generated | Data collection methods are outlined in D1.4. Data is translated into CSV format. | Raw data will be owned by the project and unlicensed. It will not be available for reuse. | Until the end of the project |
| WP1 | GitHub (free and public) | Generated | Data collection methods are outlined in D1.4. Data is translated into CSV format. | Creative Commons Attribution (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/ | As long as GitHub exists, as a minimum. Beyond that, a value judgement would have to be made. |
| WP1 | GitHub (free and public) | Generated | Data collection methods are outlined in D1.4. Data is translated into CSV format. | Creative Commons Attribution (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/ | As long as GitHub exists, as a minimum. Beyond that, a value judgement would have to be made. |
| WP1 | Covered as part of the WP1 subcontracting costs | Generated | The qualitative research methodology used for collection is outlined in D1.4. | Raw data will be owned by the project and unlicensed. It will not be available for reuse. | Until the end of the project |
| WP1 | Approximately 2 person days per month; no other external costs | Collected | The ideXlab search engine will use the sampling approach outlined in D1.2 for data collection. The output will be created as CSV data. | Raw data will be owned by the project and unlicensed. It will not be available for reuse. | Until the end of the project |
| WP2 | 0.5 days per month | Collected | Systematic search and review of available data science courses. The search terms were Data Science, Big Data, Data Analytics, Business Analytics, Machine Learning, Distributed Computing, Advanced Computing Data Science Stream, Data Analytics stream. | The data is licensed under a Creative Commons CC BY 4.0 licence. | Until the end of the project |
| WP2 | None | Both (generated and collected) | None | GNU GPL v3, http://www.gnu.org/licenses/gpl-3.0.en.html | As long as the owners do not remove the datasets. If they are no longer accessible, other similar datasets will be used in the module. |
| WP2 | None | Collected | Managed through the 3TU data center | Non-commercial licence | As long as the owners do not remove the datasets. If they are no longer accessible, other similar datasets will be used in the module. |
| WP3 | Approximately 1 day per month during the project’s lifetime | Collected | CSV is used for the Videolectures API. | The data is licensed under a Creative Commons CC BY 4.0 licence. | The data will be available after the project ends as part of the project's learning materials. |
| WP3 | Server storage has already been purchased. Effort for analysing the data has been allocated in Task 3.4. | Generated | The xAPI specification is used for expressing the data; the open source Learning Locker software is used for storing and visualising it. | Creative Commons Attribution (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/ | At least until the end of the project |
| WP3 | N/A | Collected | JSON is used for the Videolectures API. | Raw data will be owned by the project and unlicensed. It will not be available for reuse. | At least until the end of the project |
| WP3 | N/A | Collected | JSON is used for the Videolectures API. | Raw data will be owned by the project and unlicensed. It will not be available for reuse. | At least until the end of the project |
| WP3 | N/A | Generated | JSON is used for the Videolectures API. | Raw data will be owned by the project and unlicensed. It will not be available for reuse. | At least until the end of the project |
| WP3 | N/A | Collected | Data collection is managed by Coursera. | Raw data is owned by TU/e and cannot be shared due to Coursera's restrictions on use. | N/A |
| WP4 | Free storage; 0.5 days per month | Collected | Quantitative recording of website traffic via the Google Analytics dashboard, analysed using a variety of analytic tools. | Raw data will be owned by the project and unlicensed. It will not be available for reuse. | At least until the end of the project |
| WP4 | Free storage; 1 day per month | Collected | Regular retrieval of data from analytics.twitter.com | Data will be licensed in compliance with each social network's terms and conditions. | Until the end of the project |
| WP5 | Free storage; 1 day per month | Generated | Report detailing results from interviews and exploitation activities | Raw data will be owned by the project and unlicensed. It will not be available for reuse. | Until the end of the project |
| WP5 | Stored in external repositories (EDSA website and GitHub); approximately 2 days per month of maintenance effort | Generated | Project partners update the data every three months until the end of the project. The ODI is responsible for conversion to CSV and publication as open data. | This dataset is published on GitHub under a CC BY licence. | As long as GitHub exists, as a minimum. Beyond that, a value judgement would have to be made. |
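For the WP3 dataset expressed using the xAPI specification, each learning event is recorded as a statement made of an actor, a verb and an object. The sketch below builds one such statement as a Python dictionary and serialises it to JSON. It is illustrative only: the learner details and the activity IRI are hypothetical placeholders, the function name `make_statement` is invented for this example, and sending the statement to Learning Locker is not shown. The verb IRI is one of the ADL vocabulary verbs.

```python
import json
from datetime import datetime, timezone


def make_statement(learner_email: str, learner_name: str,
                   activity_id: str, activity_name: str) -> dict:
    """Build a minimal xAPI statement: <actor> completed <activity>.

    All argument values used below are hypothetical examples.
    """
    return {
        # Who did it: an Agent identified by a mailto: IRI.
        "actor": {
            "objectType": "Agent",
            "name": learner_name,
            "mbox": f"mailto:{learner_email}",
        },
        # What they did: an ADL vocabulary verb with a display label.
        "verb": {
            "id": "http://adlnet.gov/expapi/verbs/completed",
            "display": {"en-US": "completed"},
        },
        # What they did it to: an Activity identified by an IRI.
        "object": {
            "objectType": "Activity",
            "id": activity_id,
            "definition": {"name": {"en-US": activity_name}},
        },
        # When it happened, as an ISO 8601 timestamp in UTC.
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


statement = make_statement(
    "learner@example.org",
    "Example Learner",
    "https://example.org/edsa/courses/intro-data-science",  # hypothetical IRI
    "Introduction to Data Science",
)
print(json.dumps(statement, indent=2))
```

Because `mbox` identifies a person directly, a deployment concerned with data protection could use the xAPI `mbox_sha1sum` identifier (a SHA-1 hash of the mailto: IRI) instead.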