Companies that want to gain new insights from existing data often face a decision: have their own data scientists build a custom application with a Python AI toolkit, or use a ready-made platform with built-in AI functionality, possibly even with AutoML.
Even for data scientists and developers, ML-based data analysis is a complex task that begins with the choice of frameworks. The number of frameworks and tools in the rapidly evolving world of AI and machine learning (ML) is almost impossible to keep track of. It becomes even more complex if you want to gain insight into unstructured content and tap the potential of this "dark data". Unstructured texts contain a great deal of company information but are difficult to evaluate systematically.
If natural language processing (NLP) is added to machine learning for text and content analytics, the data volume grows considerably and data processing becomes more difficult – and more important. Tokenizers, stemmers, lemmatizers, word embeddings, TF-IDF and the like must be mastered by data scientists and developers, on top of the diverse world of ML, neural networks and deep learning.
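To make these building blocks less abstract, here is a minimal, hand-rolled sketch of tokenization and TF-IDF weighting in plain Python. The toy corpus and the naive whitespace tokenizer are invented for illustration; real pipelines would use a library such as NLTK or spaCy for the tokenization step.

```python
import math
from collections import Counter

# Toy corpus: three short "documents"
docs = [
    "the customer cancels the contract",
    "the customer praises the support",
    "the invoice for the contract is wrong",
]

def tokenize(text):
    # Naive whitespace tokenizer; real pipelines use NLTK or spaCy
    return text.lower().split()

def tf_idf(term, doc_tokens, all_docs_tokens):
    # Term frequency: how often the term occurs in this document
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    # Inverse document frequency: rare terms across the corpus get more weight
    df = sum(1 for d in all_docs_tokens if term in d)
    idf = math.log(len(all_docs_tokens) / df)
    return tf * idf

tokens = [tokenize(d) for d in docs]
# "the" appears in every document, so its idf is log(1) = 0
print(tf_idf("the", tokens[0], tokens))       # 0.0
print(tf_idf("contract", tokens[0], tokens))  # > 0: more informative term
```

The effect is exactly what text analytics needs: ubiquitous filler words are weighted down to zero, while content-bearing terms such as "contract" keep a positive weight.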
Soon the question arises whether the data scientist should build a custom application for the company with his "Python AI kit" and manage this complexity himself. Or is it better to adopt a platform with integrated AI features that experts have assembled for specific use cases, and that may even offer an AutoML function? As usual with "make or buy" decisions, there is no universal answer here. What is needed is an open-minded look at the advantages and disadvantages of both approaches.
BYOAIT – Bring your own AI tools
When an enterprise data scientist has to solve a machine learning problem, he faces numerous challenges. Typical tasks such as "document types in the inbox should be classified automatically" or "customer complaints should be categorized by topic" are usually easy to understand. Whether the problem can be solved well with AI, however – at what recognition rate, with which algorithms and with which frameworks – is difficult to assess. It also depends on the available data and the experience of the data scientist. Here, trial and error often beats theory.
This is certainly one of the main reasons why Python is so popular as a scripting language among data scientists. Together with the many ML and NLP frameworks and the Jupyter notebook – the data scientist's preferred IDE – a playground for data analysis is available. Data can be processed and transformed ad hoc, training sets built, models trained and then tested. This flexibility is a key argument for this approach.
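The ad-hoc workflow just described – transform text into features, train a model, test it – can be sketched in a few lines, assuming scikit-learn is installed. The example texts and labels are invented for illustration:

```python
# Minimal sketch of the ad-hoc workflow: texts in, predictions out.
# Assumes scikit-learn is installed; the data is a made-up toy example.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "invoice amount is wrong",
    "invoice not received",
    "delivery arrived damaged",
    "package was late and damaged",
]
labels = ["billing", "billing", "shipping", "shipping"]

# One pipeline: TF-IDF feature extraction followed by a linear classifier
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["my invoice is incorrect"]))  # ['billing']
```

In a Jupyter notebook, each of these steps typically lives in its own cell, so data, features and models can be inspected and reworked interactively – precisely the flexibility the text describes.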
Python's most important ML frameworks are scikit-learn, PyTorch, Keras, Theano and Pandas. scikit-learn in turn builds on NumPy and SciPy and is currently among the most popular frameworks. Of course, Python developers also have access to other popular libraries developed in other programming languages. This is where the big five come in with their libraries: TensorFlow (Google), CNTK/DMTK (Microsoft), MXNet (AWS), CoreML (Apple) – Facebook is represented by the aforementioned PyTorch. Nor should DeepLearning4J, Apache Spark MLlib or Caffe be forgotten. One can quickly lose one's way in this wide playing field. The article "Brain building kit" in iX 10/2019 provides a classification of the frameworks.
In the context of text and content analytics, additional NLP libraries are usually needed. Without knowledge of the Natural Language Toolkit (NLTK) or spaCy, Python developers will have a hard time. But are these libraries enough? And how meaningful are the algorithms they offer?
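As a small taste of what such a library provides out of the box, here is stemming with NLTK's `PorterStemmer`, which works without any extra corpus downloads (the example assumes only that the `nltk` package itself is installed):

```python
# Stemming with NLTK's PorterStemmer: reduce inflected forms to a stem.
# Assumes the nltk package is installed; no corpus downloads are needed.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["running", "complaints"]:
    print(word, "->", stemmer.stem(word))
# running -> run
# complaints -> complaint
```

Steps like this are what make terms comparable before TF-IDF weighting or classification – "complaint" and "complaints" end up counted as the same feature.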
In fact, this is the crux of the matter: which algorithm is suitable for which application, and which features are actually being learned? Amid all the mathematics, the real goal can quickly be lost from sight.