Thoughts on JetBrains' 2018 Data Science Survey

As I’m considering pursuing a career in data science, I found JetBrains’ 2018 Data Science Survey interesting because it gives me a sense (albeit an imperfect one) of which tools and technologies might be most useful to learn.

Here are my takeaways from the survey:

  • The most popular programming languages regularly used for data analysis are:
      • Python 72%
      • Java 62%
      • R 23%
  • As an aside, Kotlin runs on the Java Virtual Machine, integrates with Hadoop and Spark, and is more concise than Java. Kotlin is developed by JetBrains, the company that ran the survey, and the survey acknowledges that its results likely carry some bias, but Kotlin may be an up-and-coming language.
  • Spark is the most popular big data tool, followed closely by Hadoop.
  • Jupyter notebooks and PyCharm are the most popular IDEs/editors.
  • TensorFlow is the most popular deep learning library. (TensorFlow is lower-level than scikit-learn, according to some answers on Quora.)
  • Spreadsheet editors and Tableau are the most popular statistics packages for analyzing and visualizing data.
  • The most popular operating systems are:
      • Windows 62%
      • Linux 44%
      • macOS 37%
  • Computations are performed on:
      • local machines 78%
      • clusters 36%
      • cloud services 32%
  • The most popular cloud services are:
      • Amazon Web Services (AWS) 56%
      • Google Cloud Platform 41%
      • Microsoft Azure 28%
  • There seems to be a correlation: the more expertise one’s manager has in data science, the more likely one is to agree with this statement: "My manager gives me realistic assignments that are relevant to my skills and responsibilities, with a clear and specific description of the requirements."

It’s nice that I already have experience with Python, Jupyter, PyCharm, spreadsheet editors, Windows and Linux, and AWS.

Next, I intend to learn pandas.

After that, my priorities would probably be:

  • scikit-learn
  • Spark (Hadoop?)
  • TensorFlow
  • Tableau
  • Java