2019-06-17 Mini ML Workshop

You can find the formal agenda on indico (will be filled!). Remote participants can join via zoom. This is an informal (but probably better) memo about the event. Find out about...

Goals

  • Day 1: ML-crash-course
    • Learn the basics of gradient-based training of ML algorithms
    • Learn how CNNs work and how they are applied in computer vision
  • Day 2-5: Code-sprint
    • Bring in your challenge and solve it! It does not need to be ML: developing software stacks for ML algorithm development, jupyter notebooks for newcomers, anything is fine.
    • Build connections with other researchers applying machine learning. Share techniques and insights. How to access data fast? What's the best way to visualize? Which ML algorithms suit your problem best?
    • Provide an excuse to skip all boring meetings and nullify the responsibility to reply to emails during the workshop, so that you can just code!

Code-sprint Teams

Find the list of participants here. Contact each other, form a team, and join the code-sprint!

If you want to join an existing team, contact its leaders. Note that some teams cannot take people on the fly (e.g. they might expect some domain-expert knowledge). Make sure to check with the leaders.

  1. HEP - Neutrino - Patrick Tsang / Michael Mooney lead about 10 people who will work on the LArTPC (neutrino detector) physics analysis chain implementation using 3D sparse CNN techniques.

  2. Accelerator - Auralee Edelen will work on developing tools to apply sparse CNNs or GNNs to accelerator datasets. She will be there until Wednesday but absent on Thursday/Friday. If you know this problem and are interested, please contact Auralee and make a plan!

  3. Cryo-EM - Frederic Poitevin will work on developing jupyter notebooks to interface Cryo-EM data for newcomers to his group (and possibly some algorithm development, TBD).

  4. LCLS-CookieBox - Audrey Corbeil Therrien will work on the LCLS CookieBox data pipeline, in preparation for a future implementation on FPGA.

  5. Your Project - Your Name Here

Preparation for the workshop

Sorry if you hate slack, but that will be our communication channel. Please join from here. If you have trouble joining, contact me.

For the lectures, there is not much to prepare: we assume you are familiar with basic calculus (e.g. the chain rule) and have no allergy to comic-sans. Kidding, we won't use comic-sans.

For the hands-on, we will use python(3) and assume you are familiar with the language. In addition, we assume you are familiar with numpy and Jupyter notebooks to some extent. We will also use data visualization tools like matplotlib.

You don't have to be a master of those tools (I'm not)! If you use them in your daily research work, that's probably enough. Read the next section to see if you need extra preparation. If you are sure you don't know those tools well enough, skip to the last section.
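
If you want a quick self-check of the level we assume, here is a minimal sketch (not taken from the workshop notebooks): fitting a straight line with plain gradient descent in numpy. The toy data, learning rate, and number of steps are arbitrary choices made up for illustration.

```python
import numpy as np

# Toy data: y = 2x + 1 plus a little noise (made-up values for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * x + 1.0 + 0.01 * rng.normal(size=200)

w, b = 0.0, 0.0   # parameters to learn
lr = 0.3          # learning rate (arbitrary choice)

for step in range(200):
    pred = w * x + b              # model prediction
    err = pred - y                # residual
    loss = np.mean(err ** 2)      # mean squared error
    # Chain rule: dL/dw = mean(2 * err * d(pred)/dw) = mean(2 * err * x)
    grad_w = np.mean(2.0 * err * x)
    # Chain rule: dL/db = mean(2 * err * d(pred)/db) = mean(2 * err)
    grad_b = np.mean(2.0 * err)
    # Gradient descent update: step against the gradient.
    w -= lr * grad_w
    b -= lr * grad_b

print(f"w = {w:.2f}, b = {b:.2f}, loss = {loss:.5f}")
```

If the chain-rule comments and the numpy calls read naturally to you, you are probably fine. The fitted `w` and `b` should end up close to the slope and intercept used to generate the data.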

Do I need preparation?

Here is a list of pre-workshop practice materials. We assume you understand the contents of these notebooks. The links below will open your notebook on Google colaboratory, and you can try to run the code by yourself. All notebooks are available in the github repository. Feel free to shoot me questions.

The last item requires a GPU, which is available for free on Google colaboratory. Read the first notebook to remind yourself how to enable the GPU!

I do need preparation

Below is a list of online example materials that may be useful for learning the basics of the tools we will use.

Q&A

  • Q: No fee?

    • No fee 😊 but your $2-5 contributions will be appreciated for coffee/sweets.
  • Q: Should I come on day 1?

    • You should! But you don't have to come to the ML-crash-course (which will be mostly neural nets). We have 2 rooms reserved: 1 room for the ML crash course, and the other room for the code-sprint from day 1. The ML crash course will cover the very basics of how gradient-based learning works and typical choices of activation functions, optimizers, etc., plus hands-on examples. If you are new to ML, come over! If you have done ML but never bothered to learn how exactly SGD works, come over!
  • Q: What's "code sprint"?

    • On days 2-5 (Tue. - Fri.), you/your-team work on a problem you/your-team bring in. Use this workshop as a reason to block other meetings and emails, and focus on problem solving! Also, I want everyone to meet and get to know each other. What kind of problem do you work on? What does your data look like? What techniques are you using? You might learn some techniques from others to attack your problems, and that is the point of this kind of code sprint.
  • Q: What can we do in "code sprint"?

    • Anything! It could be a machine learning (ML) problem or not. For instance, you can work on the software base for ML algorithm development: how to store your data, how to read/write fast, how to visualize ... none of this is ML, but it is super vital for developing any algorithm (including ML). You can work on jupyter notebooks for newcomers in your group. You can work on pytorch/tensorflow APIs to read in your data and solve a simple problem. I know MANY groups at SLAC need these basic software workflows, and you might find collaboration opportunities.
  • Q: Is "code sprint" same as "data challenge"?

    • No. To me, a data challenge is where the host provides the data and a predefined task. We don't do a data challenge this time (so you bring in your own problem to work on :) ).
  • Q: How is "code sprint" different from "hackathon"?

    • To me, the difference is that we won't lock you in a room with a bed and an excellent food/drink supply to work 24/48/72 hours straight. The door is unlocked and you go home when you wish :) Though we cannot provide a bed or food beyond coffee/snacks.