SeedMe workshop: Collaborative data sharing infrastructure for researchers
Venue: Synthesis Center/Vis Lab E-B143, San Diego Supercomputer Center, UCSD (see directions)
Maximum attendees: 25 (3 spots available as of Aug 17, 2018)
Lodging: The workshop will cover up to two days of on-campus room and board (arriving Aug 23 and departing Aug 25) in a dorm-style private room with shared common area and bathrooms. Meals are included for those two days. Attendees may choose to stay elsewhere, but must then cover the associated costs on their own. Lodging will be provided only for academic attendees who are not San Diego residents, and must be requested no later than Jul 31, 2018.
Travel: Attendees must cover travel costs on their own
Food: Light refreshments, lunch & dinner will be provided on the workshop day
Contact us: Email any questions about the workshop to "amit AT sdsc.edu"
Registration fee: $50
Data is an integral part of scientific research. With rapid growth in data collection and generation capability and the increasingly collaborative nature of research activities, data management and data sharing have become central to accomplishing research goals. Researchers today have a variety of solutions at their disposal, from local storage to cloud-based storage. However, these solutions rely solely on hierarchical file and folder organization. While such an organization is pervasively used and quite useful, it relegates information about the context of the data, such as descriptions and associated collaborative notes, to external systems. Dispersing this vital information into different silos not only impedes the flow of research activities in the near term, but also affects mid- and long-term retention of knowledge about intermediate steps.
In this workshop, we will introduce and provide hands-on experience with tools designed to mitigate this critical gap via the NSF-supported SeedMe2 platform. The SeedMe2 platform leverages the familiar hierarchical file and folder organization structure, but extends it with the ability to keep data, its description and its discussion in one system. It also allows a folder to be shared either privately with collaborators or publicly for wider dissemination of information. Users may interact with the system via a web browser, a command line utility or a REST API, using familiar concepts for each method. The platform will enable users to rapidly share and access transient data and preliminary results with collaborators in consumable form. The workshop aims to provide practical training to customize and utilize this infrastructure, and to enable attendees to overcome existing gaps in collaboration as well as realize several aspects of research data management.
Note: The SeedMe2 platform focuses on data that can be transferred easily on the web with standard tools such as stock web browsers. This limits the upload size to 2GB per file; however, any number of files may be uploaded. Moreover, derived products from large-scale raw data tend to be small, so this workshop and platform are still highly relevant to groups producing large data. In the future the platform will likely support larger uploads.
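As a rough illustration of the per-file limit described in the note above, the following Python sketch sorts a set of files into those that fit under the limit and those that do not. It is only a sketch under stated assumptions: it reads "2GB" as 2 * 1024**3 bytes (the platform's exact cutoff may differ), the function names are hypothetical, and it does not use or represent any SeedMe2 tool or API.

```python
from pathlib import Path

# Assumption: "2GB" is interpreted as 2 * 1024**3 bytes.
# The platform's exact cutoff is not stated in this announcement.
MAX_UPLOAD_BYTES = 2 * 1024**3


def split_by_upload_limit(sizes, limit=MAX_UPLOAD_BYTES):
    """Partition a {name: size_in_bytes} mapping into two sorted name lists:
    files at or under the limit, and files over it."""
    within, too_large = [], []
    for name in sorted(sizes):
        if sizes[name] <= limit:
            within.append(name)
        else:
            too_large.append(name)
    return within, too_large


def file_sizes(folder):
    """Map each regular file under `folder` (searched recursively) to its size in bytes."""
    root = Path(folder)
    return {
        str(path.relative_to(root)): path.stat().st_size
        for path in root.rglob("*")
        if path.is_file()
    }


if __name__ == "__main__":
    # Hypothetical usage: check the current directory before sharing its contents.
    ok, oversized = split_by_upload_limit(file_sizes("."))
    print(f"{len(ok)} file(s) within the limit, {len(oversized)} over it")
    for name in oversized:
        print("over the limit:", name)
```

Files reported as over the limit would need to be reduced to smaller derived products (plots, summaries, subsets) before sharing; there is no cap on how many files fall within the limit.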
- A laptop computer is required (tablets will not be sufficient for this workshop)
- Software requirements
- Web Browser: A recent version of one of the following web browsers for your operating system is required:
Chrome, Edge, Firefox, Opera or Konqueror. (Internet Explorer is not sufficient for this workshop)
- SSH Client:
Windows: Download and install the PuTTY software
Linux and Mac: Built into the operating system
- Attendees must be proficient at using a web browser and able to make simple edits to text files in a terminal when given instructions
- (Optional) Ability to use command line interface
Who may wish to attend this workshop?
We welcome a broad set of attendees to this workshop with complementary interests such as
- Researchers: Interested in setting up or using data sharing for project or personal use
- Research IT/Cyberinfrastructure providers: Provide a predefined or custom data-sharing configuration to your users
- Scientific application/Gateway developers: Integrate/extend your Application/Science Gateways to provide data sharing capabilities and increase your impact by disseminating exemplar content from your users.
- Data curators: Create powerful repositories with custom fields to allow easy discovery via search and interactive exploration. Integrate other tools with repository content via powerful web services.
We anticipate that by the end of the workshop, attendees will be able to accomplish the following
- Gain understanding and working knowledge of the SeedMe2 platform and how to leverage it for research data needs
- Take away a working research data sharing website with your own branding that provides
- Ability to configure and manage site wide data sharing
- Customized data properties tailored to your project needs
- Customized website for other uses such as to disseminate project news, publications, etc.
- Automation and integration: Learn to use command line and web services tools that could be used from remote resources (such as HPC clusters) as well as for automation
- Deployment options: Learn where such a data infrastructure may be hosted
- Do-it-yourself: On site
- Third party vendor
- Regulatory compliant hosting
Workshop agenda (Tentative)
Instructors: Amit Chourasia and David Nadeau
Intern assistants: Aniruddha Alwani, Austin Chen, Rahul Kulkarni, Ashwanth Muruhathasan, Ryan Wei
Logistics: Susan Rathbun
- 9:00am: Welcome and opening remarks by Dr. Mike Norman, Director, SDSC
- Logistics and attendee introduction
- 9:30am: Keynote: Collaborative Curation through Generative Value by Mark Parsons (Sr. Scientist at Rensselaer Polytechnic Institute, former Secretary General of Research Data Alliance)
- 10:30am: Break
- 10:40am: Lightning overview of web servers, databases and content management systems (David Nadeau)
- 11:30am: Overview of SeedMe2 platform (Amit Chourasia)
- 12:00pm-1:00pm: Lunch
- 1:00pm: Invited talk: Handling protected data by Sandeep Chandra (Director - Health Cyberinfrastructure Division, SDSC)
- 1:20pm: Intro to Drupal + Exploring/setting up your new data sharing website (Hands on - Amit Chourasia)
- 3:00pm: Break
- 3:10pm: Configuring SeedMe2 building blocks (Hands on - David Nadeau)
- 3:30pm: Customizing and extending your SeedMe2 website (Hands on - Amit Chourasia)
- 4:00pm: Using command line to interact with SeedMe2 (Hands on - David Nadeau)
- 4:30pm: Capstone talk: Dr. Alexey Arefiev, Laser Plasma Group @ UC San Diego
- 5:00pm: Discussion
- 5:30pm: Dinner
Keynote: Collaborative Curation through Generative Value
Speaker: Mark Parsons (Sr. Scientist, Rensselaer Polytechnic Institute, former Secretary General of Research Data Alliance)
Abstract: Research data sharing and management is increasingly recognized as an important part of the scientific endeavor because it can increase the value of data to research and society. Nonetheless, it is not well understood how to measure the value of research data. Guidance to researchers about data management has largely been limited to data sharing mandates and management plan requirements. Scant attention is given to the need for active and ongoing data curation. In this talk, I suggest we adopt the concept of “generative value” for data by extending Zittrain’s (2008) definition of generativity: “the capacity to produce unanticipated change through unfiltered contributions from broad and varied audiences.” For data this means we must accommodate unanticipated use and unfiltered modifications. Building from decades of international experience and several case-studies, I present an argument and some initial methods for how data can be curated collaboratively by data management and disciplinary experts to maximize generative value.
Venue, directions and parking
Venue: Synthesis Center/Vis Lab E-B143
Address: San Diego Supercomputer Center (SDSC)
10100 Hopkins Drive, La Jolla, CA 92093
Google maps exact location
San Diego Supercomputer Center’s Synthesis Center E-B143 is located on the B1 floor at SDSC’s east entrance. Take the stairs just off the driveway on Hopkins Dr, close to the Hopkins Parking Structure, at the northwest end of the UC San Diego campus.
Local map: Download map with SDSC location, housing, parking, coffee, restaurants.
Airport: The San Diego International Airport (SAN) is the closest airport to UC San Diego and SDSC.
Driving: For driving directions see the visitors page on the SDSC website
Taxi / Shuttle
Cab or shuttle Pick-up/Drop-off: 10100 Hopkins Drive, La Jolla, CA 92093
- Ride sharing services: Lyft & Uber
- Yellow Cab: 619-444-4444
- Super Shuttle: 800-974-8885
Public transportation: Surrounding UC San Diego
The SeedMe workshop is based upon work supported by the National Science Foundation under Grant No. 1443083. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.