
My journey with Bioconductor


My experience with Bioconductor


I have been involved with the Bioconductor project for several years, and last year I attended the BioC2019 conference in New York. It was my first time at an international conference abroad. I'm posting here my experience with the project.

Pre-conference

The first time I heard about Bioconductor was in 2013, during an internship in which my main goal was to analyze data from Pichia pastoris and look at the functions of the differentially expressed genes through gene ontology. I did it with topGO and other packages included in Bioconductor.

Later, during the master's in Bioinformatics for Health Science that I studied, we used several Bioconductor packages to analyze other data (the main project is still online).

As part of the master's thesis I used some other packages too (the thesis involved using WGCNA), but the main roadblock was knowing the function of the genes and what they were doing. So I developed a package to compare the annotation of genes according to which pathways they are involved in (there was a similar package, GOSemSim, that did the same but using the gene ontology). In March 2017 the BioCor package was included in a Bioconductor release. Shortly after, I contributed to GOSemSim to speed up the process.

As I finished the master's and started the PhD, I realized that the annotation data used by BioCor is its biggest problem. There is bias towards some pathways and genes: some genes are highly studied and others are neglected... I wanted to compare two or three pathway databases but didn't have the steam to do it on my own time on top of everything else I was doing.

However, I thought that I could expand the basic class for gene sets used in Bioconductor, which is implemented in the GSEABase package. This package is maintained by the Bioconductor core team, but when I raised some issues I was told they wouldn't be addressed. Pull requests were welcome, but when I asked for clarification I didn't get a reply.

Seeing that I couldn't help improve that package any further, I went on to create a new one that extended some capabilities of the classes implemented in GSEABase. The new package, GSEAdv, implemented some new methods for gene sets, and I was working on simulating gene sets to assess the accuracy of the pathway databases.

After some months I realized that I was starting from a so at the end of November 2018 I started developing a new package. This package aimed to replace the class used to represent sets, allow arbitrary information to be included, and improve the way to work with gene sets.

In December 2018 I signed up to the Bioconductor Slack and got in touch with a group of people discussing how to improve or change the gene set process. There were three of us involved, but instead of joining efforts it was decided to develop three different packages "to explore strategies".

We had a shared document with ideas about the implementation of the new class and what we would like to be able to do with it. Around the same time I also helped rewrite the submission process. We had two teleconferences where we talked and showed our respective packages. At one point it was discussed to present this work at the BioC2019 conference.

BioC2019

I started considering going to the conference in February. I applied for the conference's travel scholarship, and to my employer for the other travel expenses. In April I was awarded the scholarship, booked the flights and prepared for the conference.

It was the first time I left Europe, and I was going alone. I left on the 23rd of June, a day widely celebrated in Catalonia with fireworks and social gatherings. I had booked the flights about three months in advance, and I was flying via Portugal.

The flight was delayed about 2 hours, and when we got there we had to run to reach the next plane; they were waiting for us. But several hours and a couple of films later I reached New York.

Once I reached the hotel provided by the scholarship I realized there wasn't a room for me. Fortunately, I found some other people from the conference who reached the organization (thanks Gabriel!). We were a small group and decided to go out to eat something and have some drinks. The conference started the next day with the developer day.

The developer day had plenary sessions and lightning talks about new packages or results. Then came the Birds of a Feather (BoF) sessions about several topics.
For our Birds of a Feather session we were given the Farkas Auditorium, which exceeded my expectations. I thought we would discuss the topic in a small group of at most 20 people. I started the BoF by presenting why we set out on this task, then we (Kayla, Kevin and I) presented our respective packages and the ideas behind them. We got many questions and feedback, with different points of view about what and how to rethink the gene set storage in Bioconductor.
Simina M. Boca ending her presentation

The following two days the conference was at another location, and it was full of interesting sessions and workshops. I found the Docker session from Nitesh particularly interesting, as well as the enrichment workshop from Ludwig. I wasn't involved in single-cell data but went to the amazing presentation by Helen.

I could also present my poster, and was rewarded by meeting a user for the first time. Several people came over to ask about BioCor. I hope they found it interesting.

Besides the official sessions and activities there were many meetings and people going out for dinner or walking around. The sticker exchange was lively (I got plenty, including MultiAssayExperiment and limma). The atmosphere was friendly, and I could meet in person several people I had been reading for years, like Leo or Martin. Surprisingly I only met one other person from Spain, one of my professors, Robert, and we could talk a bit while visiting Central Park, led by Levi.

While I enjoyed the conference a lot and learned many things, my main takeaway was the people I met: all doing different science and with different backgrounds, but awesome.

Post-conference

A couple of months after the conference I asked how we would continue with the set packages, whether we would merge efforts or something else. Unfortunately the response was "'We' have invested considerable effort in developing this, and it will be submitted". The BiocSet package was included in Bioconductor release 3.10. Honestly, this made me reconsider my efforts on submitting more packages to Bioconductor. Other packages that I have developed will not be hosted on Bioconductor.

Recently there have been more efforts to be more transparent, with three new boards: the Scientific, Technical and Community boards. Also the developer forum has helped bring the issues developers find when using or managing Bioconductor packages to a wider group. For instance, it was proposed that I hold a developer forum about how to maintain a package in the long term and how and when the Bioconductor team intervenes in maintaining packages.

However, my impression is that, depending on who you are connected to, you get things done more easily within the project. Or perhaps those who are better connected know how and what to ask, so they get more from the project.


Closing


The methods and packages on Bioconductor are generally of high quality, and the project management is improving.

One of the few photos of a plenary session on the developer day
