Question

I am trying to fit and test a linear model as follows:


lm_model <- lm(Purchase ~., data = train)
lm_prediction <- predict(lm_model, test)

This results in the following error, which states that the Product_Category_1 column has values that are present in the test data frame but not in the train data frame:

factor Product_Category_1 has new levels 7, 9, 14, 16, 17, 18

However, when I check, these values definitely appear in both data frames:

> nrow(subset(train, Product_Category_1 == "7"))
[1] 2923
> nrow(subset(test, Product_Category_1 == "7"))
[1] 745
> nrow(subset(train, Product_Category_1 == "9"))
[1] 312
> nrow(subset(test, Product_Category_1 == "9"))
[1] 92

Tabulating Product_Category_1 for train and test also shows that both data frames have the same factor levels.

Answer 1

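A simple walk-through example
Suggestion for users
Helpful information that we can get from the fitted model object
OK, I see what the problem is, but how do I make predict work?
Is there a better way to avoid this kind of problem altogether?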

A simple walk-through example

Here is a reproducible example that is simple enough to hint at what has happened.

train <- data.frame(y = runif(4), x = c(runif(3), NA), f = factor(letters[1:4]))
test <- data.frame(y = runif(4), x = runif(4), f = factor(letters[1:4]))
fit <- lm(y ~ x + f, data = train)
predict(fit, newdata = test)
#Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
#  factor f has new levels d

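I am fitting a model with more parameters than data, so the model is rank-deficient (to be explained at the end). However, this does not affect how lm and predict work.

Checking only table(train$f) and table(test$f) does not help, because the problem is caused not by the variable f but by the NA in x. lm and glm drop incomplete cases, i.e., rows with at least one NA (see ?complete.cases), for model fitting. They have to, because otherwise the underlying FORTRAN routine for QR factorization would fail, as it cannot handle NA. If you check the documentation at ?lm, you will see that this function has an argument na.action, which defaults to na.omit. You can also set it to na.exclude, but na.pass, which retains NA, will cause a FORTRAN error:

fit <- lm(y ~ x + f, data = train, na.action = na.pass)
#Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
#  NA/NaN/Inf in 'x'

Let's remove NA from the training dataset.

train <- na.omit(train)
train$f
#[1] a b c
#Levels: a b c d

f now has an unused level "d". lm and glm will drop unused levels when building the model frame (and later the model matrix):

## source code of lm; don't run
mf$drop.unused.levels <- TRUE
mf[[1L]] <- quote(stats::model.frame)
mf <- eval(mf, parent.frame())

This is not user controllable. The reason is that if an unused level were included, it would generate a column of zeros in the model matrix.

mf <- model.frame(y ~ x + f, data = train, drop.unused.levels = FALSE)
model.matrix(y ~ x + f, data = mf)
#  (Intercept)          x fb fc fd
#1           1 0.90021178  0  0  0
#2           1 0.10188534  1  0  0
#3           1 0.05881954  0  1  0
#attr(,"assign")
#[1] 0 1 2 2 2
#attr(,"contrasts")
#attr(,"contrasts")$f
#[1] "contr.treatment"

This is undesirable because it produces an NA coefficient for the dummy variable fd. With drop.unused.levels = TRUE, as forced by lm and glm:

mf <- model.frame(y ~ x + f, data = train, drop.unused.levels = TRUE)
model.matrix(y ~ x + f, data = mf)
#  (Intercept)          x fb fc
#1           1 0.90021178  0  0
#2           1 0.10188534  1  0
#3           1 0.05881954  0  1
#attr(,"assign")
#[1] 0 1 2 2
#attr(,"contrasts")
#attr(,"contrasts")$f
#[1] "contr.treatment"

fd is gone, and

mf$f
#[1] a b c
#Levels: a b c

The now non-existent "d" level will cause the "new factor level" error in predict.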

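Suggestion for users

It is highly recommended that all users do the following manually when fitting models:

[No. 1] remove incomplete cases;
[No. 2] drop unused factor levels.

This is exactly the procedure recommended here: How to debug "contrasts can be applied only to factors with 2 or more levels" error? It makes users aware of what lm and glm do under the hood, and makes debugging much easier.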
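A minimal sketch of these two steps on a toy data frame similar to the one in the walk-through. The values and the name train_clean are illustrative only, not taken from the question's data.

```r
# Toy training data: one missing value in x, four factor levels in f
train <- data.frame(y = c(0.2, 0.5, 0.1, 0.9),
                    x = c(0.3, 0.7, 0.4, NA),
                    f = factor(letters[1:4]))

# [No. 1] remove incomplete cases
train_clean <- na.omit(train)

# [No. 2] drop factor levels that no longer occur in the data
train_clean <- droplevels(train_clean)

levels(train$f)        # levels before cleaning
levels(train_clean$f)  # levels a model fit would actually see
```

After both steps, the levels stored in the data frame match the levels that lm or glm will use, so table() on the cleaned data is no longer misleading.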

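Note that there should be one more recommendation in the list:

[No. 0] do the subsetting yourself

Users may occasionally use the subset argument. But there is a potential pitfall: not all factor levels may appear in the subsetted dataset, so you may get "new factor levels" when using predict later.

The above advice is particularly important when you write functions that wrap lm or glm. You want your functions to be robust. Have your function return an informative error rather than waiting for lm and glm to complain.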
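One possible shape for such a check, as a rough sketch. The function name check_model_data and its error messages are hypothetical and not part of any package; a wrapper around lm or glm could call something like it before fitting.

```r
# Hypothetical helper: validate data before handing it to lm() or glm()
check_model_data <- function(data) {
  if (anyNA(data)) {
    stop("data contains incomplete cases; apply na.omit() first")
  }
  for (v in names(data)) {
    col <- data[[v]]
    if (is.factor(col)) {
      if (any(table(col) == 0L)) {
        stop("factor '", v, "' has unused levels; apply droplevels() first")
      }
      if (nlevels(col) < 2L) {
        stop("factor '", v, "' has fewer than 2 levels")
      }
    }
  }
  invisible(data)
}

# Manual subsetting leaves the level "d" unused
raw <- data.frame(y = c(1, 2, 3, 4), f = factor(c("a", "b", "c", "d")))
sub <- raw[1:3, ]

msg <- tryCatch(check_model_data(sub), error = function(e) conditionMessage(e))
msg

# After dropping unused levels the check passes
ok <- check_model_data(droplevels(sub))
```

The point of the design is only that the error names the offending variable and the remedy, instead of surfacing later as a less obvious message from predict.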

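Helpful information that we can get from the fitted model object

lm and glm return an xlevels value in the fitted object. It contains the factor levels actually used for model fitting.

fit$xlevels
#$f
#[1] "a" "b" "c"

So if you have not followed the recommendations listed above and have run into trouble with factor levels, xlevels should be the first thing to inspect.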
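As a sketch of that inspection, the levels in xlevels can be compared with the levels present in the prediction data. The toy data and the name new_levels are illustrative, and the loop assumes each factor enters the formula under its plain column name.

```r
# Toy data: the only row with level "d" is incomplete, so lm() drops it
train <- data.frame(y = c(0.2, 0.5, 0.1, 0.9, 0.4, 0.6),
                    x = c(0.3, 0.7, 0.4, 0.8, 0.2, NA),
                    f = factor(c("a", "b", "c", "a", "b", "d")))
test <- data.frame(x = c(0.5, 0.6, 0.1, 0.9),
                   f = factor(c("a", "b", "c", "d")))

fit <- lm(y ~ x + f, data = train)

# For every factor used in the fit, list test levels absent from xlevels
new_levels <- lapply(names(fit$xlevels), function(v) {
  setdiff(unique(as.character(test[[v]])), fit$xlevels[[v]])
})
names(new_levels) <- names(fit$xlevels)
new_levels
```

Any non-empty entry in new_levels names a level that predict would reject, even though table() on the raw training data still lists it.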

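If you want to use something like table to count how many cases there are for each factor level, here is one way: Get number of data in each factor level (as well as interaction) from a fitted lm or glm [R], although building a model matrix can use a lot of RAM.

OK, I see what the problem is now, but how do I make predict work?

If you cannot choose to work with a different pair of train and test datasets (see the next section), you need to set to NA those factor levels that are in test but not in xlevels. predict will then simply return NA for such incomplete cases.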
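A minimal sketch of that workaround, assuming the factors appear in the formula under their plain column names. The toy data and the name test_fixed are illustrative.

```r
# Toy data: level "d" is not used in the fit because its only row has NA in x
train <- data.frame(y = c(0.2, 0.5, 0.1, 0.9, 0.4, 0.6),
                    x = c(0.3, 0.7, 0.4, 0.8, 0.2, NA),
                    f = factor(c("a", "b", "c", "a", "b", "d")))
test <- data.frame(x = c(0.5, 0.6, 0.1, 0.9),
                   f = factor(c("a", "b", "c", "d")))

fit <- lm(y ~ x + f, data = train)

test_fixed <- test
for (v in names(fit$xlevels)) {
  # factor() with explicit levels turns any value outside them into NA
  test_fixed[[v]] <- factor(as.character(test[[v]]), levels = fit$xlevels[[v]])
}

pred <- predict(fit, newdata = test_fixed)
pred
```

The rows whose factor value was unseen during fitting keep their place in the output, with NA as the prediction, so the result still lines up with the rows of test.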

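Is there a better way to avoid this kind of problem altogether?

People split data into train and test because they want to do cross-validation. The first step is to apply na.omit to the full dataset to get rid of the NA noise. We could then do a random partition of what is left, but this naive approach may end up with

some factor levels in test but not in train (oops, we get the "new factor level" error when using predict);
some factor variables in train having only 1 level after unused levels are removed (oops, we get the "contrasts" error when using lm and glm);

So it is highly recommended that you use a more sophisticated partitioning scheme, such as stratified sampling.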
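A base R sketch of stratified partitioning on a single factor. The data, the 70% proportion, and the names dat and train_idx are assumptions for illustration; with several factors one might stratify on their interaction instead.

```r
set.seed(1)  # only to make the sketch reproducible
dat <- data.frame(y = runif(30),
                  f = factor(rep(c("a", "b", "c"), times = c(15, 10, 5))))

# Within each level of f, send roughly 70% of the rows to train.
# sample.int() on the group size avoids the sample() pitfall for groups of size 1.
train_idx <- unlist(lapply(split(seq_len(nrow(dat)), dat$f), function(idx) {
  idx[sample.int(length(idx), size = ceiling(0.7 * length(idx)))]
}), use.names = FALSE)

train <- droplevels(dat[train_idx, ])
test  <- dat[-train_idx, ]

# Levels present in test but missing from train (ideally none)
setdiff(unique(as.character(test$f)), levels(train$f))
```

Because every non-empty level contributes at least one row to train, no level can appear in test without also appearing in train.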

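There is in fact another hazard, although it does not cause a programming error:

the model matrix for train is rank-deficient (oops, we get a "prediction from a rank-deficient fit may be misleading" warning when using predict).

Regarding rank-deficiency in model fitting, see lme4::lmer reports "fixed-effect model matrix is rank deficient", do I need a fix and how to? Rank-deficiency does not cause problems for model estimation and checking, but it can be a hazard for prediction: R lm, Could anyone give me an example of the misleading case on “prediction from a rank-deficient”? However, this kind of issue is harder to avoid, particularly if you have many factors, possibly with interactions.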

Answer 2

Examples of poor binning

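It is a little unclear what your data look like; you should plot your predictors to get a better idea of what you are dealing with. Here is an example of how deficiency can be an issue in general.

When you cut count data into factors, you need to make sure that you do not have degenerate classes, i.e., classes with zero or near-zero representation. Use a bar plot of your class levels. You will notice in the image that several classes are problematic in how this dataset splits into dummy classes. If this is how the data were collected, then you are stuck with missing data. You can try K-nearest-neighbors imputation, but if too much data is missing and it is research data, you will likely have to recollect it (redo the experiment, re-observe the process, etc.). If the data are not reproducible, you will need to remove that predictor and annotate your findings to inform your audience.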
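A small sketch of that check on made-up count data. The numbers, the break points, and the threshold of 2 observations are arbitrary assumptions, chosen only to show the idea.

```r
# Hypothetical count data, binned into classes with cut()
counts <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 12, 30)
classes <- cut(counts, breaks = c(0, 5, 10, 20, 40))

# Number of observations per class
tab <- table(classes)
tab

# Flag classes with fewer than 2 observations (threshold chosen arbitrarily)
sparse <- names(tab)[as.vector(tab) < 2]
sparse

# barplot(tab) would show the same information graphically
```

Classes that show up in sparse are the ones likely to end up in only one of train and test after a random split.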

Answer 3

See https://www.r-bloggers.com/2016/08/data-splitting/

The function createDataPartition in the caret package can be used to create balanced splits of the data, i.e., random stratified splits.

How to debug "factor has new levels" error for linear model and prediction
